
NUMERICAL LINEAR ALGEBRA
and
Applications

Department of Mathematical Sciences
Northern Illinois University
DeKalb, IL 60115

e-mail: dattab@math.niu.edu

The book is dedicated to my parents, my father-in-law, and my mother-in-law, whose endless blessings have made the writing of this book possible.

PREFACE

Numerical Linear Algebra is no longer just a subtopic of Numerical Analysis; it has grown into an independent topic for research over the past few years. Because of its crucial role in scientific computing, which is a major component of modern applied and engineering research, numerical linear algebra has become an integral component of undergraduate and graduate curricula in mathematics and computer science, and is increasingly becoming so in other curricula as well, especially in engineering.

The currently available books completely devoted to the subject of numerical linear algebra are Introduction to Matrix Computations by G. W. Stewart, Matrix Computations by G. H. Golub and Charles Van Loan, Fundamentals of Matrix Computations by David Watkins, and Applied Numerical Linear Algebra by William Hager. These books, along with the most celebrated book, The Algebraic Eigenvalue Problem by J. H. Wilkinson, are sources of knowledge in the subject. I personally salute the books by Stewart and by Golub and Van Loan, because I have learned "my numerical linear algebra" from them. Wilkinson's book is a major reference, and the books by Stewart and by Golub and Van Loan are considered mostly to be "graduate texts" and reference books for researchers in scientific computing.

I have taught numerical linear algebra and numerical analysis at Northern Illinois University, the University of Illinois, Pennsylvania State University, the University of California, San Diego, and the State University of Campinas, Brazil. I have used with great success the books by Golub and Van Loan and by Stewart in teaching courses at the graduate level.

As for introductory undergraduate numerical linear algebra courses, I, like many other instructors, have taught topics of numerical linear algebra from the popular "numerical analysis" books. These texts typically treat numerical linear algebra merely as a subtopic, so I have found they do not adequately cover all that needs to be taught in a numerical linear algebra course. In some undergraduate books on numerical analysis, numerical linear algebra is barely touched upon. Therefore, in frustration I have occasionally prescribed the books by Stewart and by Golub and Van Loan as texts at the introductory level, although only selected portions of these books have been used in the classroom, and frequently, supplementary class notes had to be provided. When I have used these two books as "texts" in introductory courses, a major criticism (or compliment, in the view of some) coming from students on these campuses has been that they are "too rich" and "too vast" for students new to the subject.

As an instructor, I have always felt the need for a book that is geared toward the undergraduate, and which can be used as an independent text for an undergraduate course in Numerical Linear Algebra. In writing this book, I hope to fulfill such a need. The more recent books Fundamentals of Matrix Computations, by David Watkins, and Applied Numerical Linear Algebra, by William Hager, address this need to some extent.

This book, Numerical Linear Algebra and Applications, is more elementary than most existing books on the subject. It is an outgrowth of the lecture notes I have compiled over the years for use in undergraduate courses in numerical linear algebra, and which have been "class-tested" at Northern Illinois University and at the University of California, San Diego. I have deliberately chosen only those topics which I consider essential to a study of numerical linear algebra. The book is intended for use as a textbook at the undergraduate and beginning graduate levels in mathematics, computer science, and engineering. It can also serve as a reference book for scientists and engineers. However, it is primarily written for use in a first course in numerical linear algebra, and my hope is that it will bring numerical linear algebra to the undergraduate classroom.

Here the principal topics of numerical linear algebra, such as Linear Systems, the Matrix Eigenvalue Problem, Singular Value Decomposition, Least Squares Methods, etc., have been covered at a basic level. The book focuses on the development of the basic tools and concepts of numerical linear algebra and their effective use in algorithm development. The algorithms are explained in "step-by-step" fashion. Wherever necessary, I have referred the reader to the exact locations of advanced treatments of the more difficult concepts, relying primarily on the aforementioned books by Stewart and by Golub and Van Loan, and occasionally on that of Wilkinson.

I have also drawn heavily on applications from different areas of science and engineering, such as electrical, mechanical, and chemical engineering; physics and chemistry; statistics; control theory; and signal and image processing. At the beginning of each chapter, some illustrative case studies from applications of practical interest are provided to motivate the student. The algorithms are then outlined, followed by implementational details. MATLAB codes are provided in the appendix for some selected algorithms. A MATLAB toolkit, called MATCOM, implementing the major algorithms in Chapters 4 through 8 of the book, is included with the book.

I will consider myself successful and my efforts rewarded if the students taking a first course in numerical linear algebra and applications, using this book as a text, develop a firm grasp of the basic concepts of round-off errors, stability, conditioning, and accuracy, and leave with a knowledge and appreciation of the core numerical linear algebra algorithms, their basic properties, and their implementations. I truly believe that the book will serve as the right text for most of the existing undergraduate and first-year graduate courses in numerical linear algebra. Furthermore, it will provide enough incentive for educators to introduce numerical linear algebra courses into their curricula, if such courses do not exist already. Prerequisites are a first course in linear algebra and a good knowledge of scientific programming.

The following is a suggested format for instruction using Numerical Linear Algebra and Applications as a text. These guidelines have been drawn from my own teaching experience and from discussions with several other colleagues.

1. A First Course in Numerical Linear Algebra (Undergraduate; one semester)

Chapter 2
Chapter 3 (except possibly Section 3.8)
Chapter 4
Chapter 5: 5.1-5.4, 5.5.1, 5.6
Chapter 6: 6.2, (some selected topics from Section 6.3), 6.4, 6.5.1, 6.5.3, 6.5.4, 6.6, 6.7, 6.9, 6.10.1-6.10.4
Chapter 7: 7.2, 7.3, 7.4, 7.5, 7.6, 7.8.1, 7.8.2
Chapter 8: 8.2, (some selected topics of Section 8.3), 8.4, 8.5.1, 8.5.3, 8.6.1, 8.6.2, 8.7, 8.9.1, 8.9.2, 8.9.3, 8.9.4

Possibly also some very selected portions of Chapter 9 and Chapter 10, depending upon the availability of time and the interests of the students and instructors.

2. A Second Course in Numerical Linear Algebra (Advanced Undergraduate / First-Year Graduate; one semester)

Chapter 1: 1.3.5, 1.3.6
Chapter 5: 5.5, 5.6, 5.7, 5.8
Chapter 6: 6.3, 6.5.2, 6.5.5, 6.8, 6.10.5, 6.10.6
Chapter 7: 7.7, 7.8, 7.9, 7.10, 7.11
Chapter 8: 8.3, 8.8, 8.9, 8.10, 8.11, 8.12
Chapter 9: 9.2, 9.3, 9.4, 9.5, 9.6.1, 9.8, 9.9, 9.10
Chapter 10
Chapter 11

Chapter 1
Chapter 2
Chapter 3 (except possibly Section 3.8)
Chapter 4
Chapter 5: 5.1, 5.2, 5.3, 5.4
Chapter 6: 6.2, 6.3, 6.4, 6.5.1, 6.5.3, 6.6.3 (only the statement and implication of Theorem 6.6.3), 6.7.1, 6.7.2, 6.7.3, 6.7.8, 6.8, 6.9, 6.10.1, 6.10.2, 6.10.3, 6.10.4, 6.10.5
Chapter 7: 7.3, 7.5, 7.8.1, 7.8.2
Chapter 8: 8.2, 8.3, 8.4, 8.5, 8.6.1, 8.6.2, 8.7.1, 8.9.1, 8.9.2, 8.9.3, 8.9.4, 8.9.6, 8.12
Chapter 9
Chapter 10: 10.2, 10.3, 10.4, 10.5, 10.6.1, 10.6.3, 10.6.4, 10.8.1, 10.9.1, 10.9.2

CHAPTER-WISE BREAKDOWN

Chapter 1, Some Required Concepts from Core Linear Algebra, describes some important results from theoretical linear algebra. Of special importance here are vector and matrix norms, special matrices, convergence of sequences of matrix powers, etc., which are essential to the understanding of numerical linear algebra, and which are not usually covered in an introductory linear algebra course.

Chapter 2 is on Floating Point Numbers and Errors in Computations. Here the concepts of floating point number systems and rounding errors have been introduced, and it has been shown through examples how round-off errors due to cancellation and recursive computations can "pop up", even in simple calculations, and how these errors can be reduced in certain cases. The IEEE floating point standard has been discussed.

Chapter 3 deals with Stability of Algorithms and Conditioning in Problems. The basic concepts of conditioning and stability, including strong and weak stability, have been introduced, and examples have been given of unstable and stable algorithms and of ill-conditioned and well-conditioned problems. It has been my experience, as an instructor, that many students, even after taking a few courses on numerical analysis, do not clearly understand that conditioning is a property of the problem, stability is a property of the algorithm, and both have effects on the accuracy of the solution. Attempts have been made to make this as clear as possible.

It is important to understand the distinction between a "bad" algorithm and a numerically effective algorithm, and the fact that popular mathematical software is based only on numerically effective algorithms. This is done in Chapter 4, Numerically Effective Algorithms and Mathematical Software. The important properties, such as efficiency, numerical stability, and storage economy, that make an algorithm and the associated software "numerically effective" are explained with examples. In addition, a brief statement regarding important matrix software such as LINPACK, EISPACK, IMSL, MATLAB, NAG, and LAPACK is given in this chapter.

Chapter 5 is on Some Useful Transformations in Numerical Linear Algebra and Their Applications. Transformations such as elementary transformations, Householder reflections, and Givens rotations form the principal tools of most algorithms of numerical linear algebra. These important tools are introduced in this chapter, and it is shown how they are applied to achieve important decompositions such as LU and QR, and reduction to Hessenberg forms. This chapter is a sort of "preparatory" chapter for the rest of the topics treated in the book.

Chapter 6 deals with the most important topic of numerical linear algebra, Numerical Solutions of Linear Systems. Discussed in this chapter are the direct methods, such as Gaussian elimination with and without pivoting, QR factorization methods, the method based on the Cholesky decomposition, and methods that take advantage of special structures of matrices; the standard iterative methods, such as Jacobi, Gauss-Seidel, successive overrelaxation, and iterative refinement; the perturbation analysis of linear systems; and computations of determinants, inverses, and leading principal minors. Some motivating examples from applications areas are given before the techniques are discussed.

The Least Squares Solutions to Linear Systems, discussed in Chapter 7, are so important in applications that the techniques for finding them should be discussed as much as possible, even in an introductory course in numerical linear algebra. There are users who still routinely use the normal equations method for computing least squares solutions; the numerical difficulties associated with this approach are described in some detail, and then a better method, based on the QR decomposition, is discussed for the least squares problems. The most reliable general-purpose method, based on the singular value decomposition, is mentioned in this chapter and treated in full in Chapter 10. The QR methods for the rank-deficient least squares problem and for the underdetermined problem, and iterative refinement procedures, are also discussed in this chapter. Some discussion of perturbation analysis is also included.

Chapter 8 picks up another important topic, probably the second most important, Numerical Matrix Eigenvalue Problems. There are users who still believe that eigenvalues should be computed by finding the zeros of the characteristic polynomial. It is clearly explained why this is not a good general rule. The standard and most widely used techniques for eigenvalue computations, the QR iteration with and without shifts, are then discussed in some detail. The popular techniques for eigenvector computations, such as the inverse power method and the Rayleigh Quotient Iteration, are described, along with techniques for eigenvalue location. The most common methods for the symmetric eigenvalue problem and the symmetric Lanczos method are described very briefly at the end. Discussions of the stability of differential and difference equations, engineering applications to the vibration of structures, and a stock market example from statistics are included, which will serve as motivating examples for the students.

Chapter 9 deals with The Generalized Eigenvalue Problem (GEP). The GEP arises in many practical applications, such as mechanical vibrations and the design of structures. In fact, in these applications almost all eigenvalue problems are generalized eigenvalue problems, and most of them are symmetric definite problems. We first present a generalized QR iteration for the pair (A, B), commonly known as the QZ iteration for the GEP. Then we discuss in detail techniques of simultaneous diagonalization for generalized symmetric definite problems. Some applications of simultaneous diagonalization techniques, such as decoupling of a system of second-order differential equations, are described in some detail. Since several practical applications, e.g., the design of large sparse structures, give rise to very large-scale generalized definite eigenvalue problems, a brief discussion of Lanczos-based algorithms for such problems is also included. In addition, several case studies from vibration and structural engineering are presented. A brief mention is made of how to reduce a quadratic eigenvalue problem to a standard eigenvalue problem, or to a generalized eigenvalue problem.

The Singular Value Decomposition (SVD) and singular values play important roles in a wide variety of applications. In Chapter 10, we first show how the SVD can be used effectively to solve computational linear algebra problems arising in applications, such as finding the structure of a matrix (rank, nearness to rank deficiency, orthonormal bases for the range and the null space of a matrix, etc.), finding least squares solutions to linear systems, and computing the pseudoinverse. We then describe the most widely used method, the Golub-Kahan-Reinsch method, for computing the SVD, and its modification by Chan. The chapter concludes with a description of a very recent method, by Demmel and Kahan, for computing the smallest singular values of a bidiagonal matrix with high accuracy. A real-life example on separating the fetal ECG from the maternal ECG is provided in this chapter as a motivating example.

The stability (or instability) of an algorithm is usually established by means of backward round-off error analysis, introduced and made popular by James Wilkinson. Working out the details of the round-off error analysis of an algorithm can be quite tedious, and presenting such analysis for every algorithm is certainly beyond the scope of this book. At the same time, I feel that every student of numerical linear algebra should have some familiarity with the way the rounding analysis of an algorithm is performed. We have given the readers A Taste of Round-off Error Analysis in Chapter 11 of the book by presenting such analyses for two popular algorithms: the solution of a triangular system and Gaussian elimination for triangularization. For other algorithms, we just present the results (without proof) in the appropriate places in the book, and refer the readers to the classic text The Algebraic Eigenvalue Problem by James H. Wilkinson, and occasionally to the book by Golub and Van Loan, for more details and proofs.

The appendix contains MATLAB codes for a selected number of basic algorithms. Students will be able to use these codes as templates for writing codes for more advanced algorithms. A MATLAB toolkit containing implementations of some of the most important algorithms has been included in the book as well. Students can use this toolkit to compare different algorithms for the same problem with respect to efficiency, accuracy, and stability. Finally, some discussion of how to write MATLAB programs is also included.

Some Basic Features of the Book


Clear explanation of the basic concepts. The two most fundamental concepts of numerical linear algebra, namely the conditioning of the problem and the stability of an algorithm via backward round-off error analysis, are introduced at a very early stage of the book with simple motivating examples. Specific results on these concepts are then stated with respect to each algorithm and problem in the appropriate places, and their influence on the accuracy of the computed results is clearly demonstrated. The concepts of weak and strong stability, recently introduced by James Bunch, appear for the first time in this book. Most undergraduate numerical analysis textbooks are somewhat vague in explaining these concepts, which, I believe, are fundamental to numerical linear algebra.

Discussion of fundamental tools in a separate chapter. Elementary, Householder, and Givens matrices are the three most basic tools of numerical linear algebra. Most computationally effective numerical linear algebra algorithms have been developed using these basic tools as principal ingredients. A separate chapter (Chapter 5) has been devoted to the introduction and discussion of these basic tools. It has been clearly demonstrated how a simple but very powerful property of these matrices, namely the ability to introduce zeros in specific positions of a vector or of a matrix, can be exploited to develop algorithms for useful matrix factorizations such as LU and QR, and for reduction of a matrix to a simpler form such as Hessenberg.

In my experience as a teacher, I have seen that once students have been made familiar with these basic tools and have learned some of their most immediate applications, the remainder of the course goes very smoothly and quite fast. Throughout the text, soon after describing a basic algorithm, it is shown how the algorithm can be made cost-effective and storage-efficient using the rich structures of these matrices.

Step-by-step explanation of the algorithms. The following approach has been adopted in the book for describing an algorithm: the first few steps of the algorithm are described in detail and in an elementary way, and then it is shown how the general kth step can be written following the pattern of these first few steps. This is particularly helpful to the understanding of an algorithm at the undergraduate level.

Before an algorithm is presented, the basic ideas, the underlying principles, and a clear goal of the algorithm are discussed. This approach appeals to the student's creativity and stimulates his interest. I have seen from my own experience that once the basic ideas, the mechanics of the development, and the goals of the algorithm have been laid out for the student, he may then be able to reproduce some of the well-known algorithms himself, even before learning them in class.

Clear discussion of numerically effective algorithms and high-quality mathematical software. Along with mathematical software, a clear and concise definition of a "numerically effective" algorithm is introduced in Chapter 4, and the important properties, such as efficiency, numerical stability, and storage economy, that make an algorithm and its associated software numerically effective are explained with ample simple examples. This will help students not only to understand the distinction between a "bad" algorithm and a numerically effective one, but also to learn how to transform a bad algorithm into a good one, whenever possible. These ideas are not clearly spelled out in undergraduate texts, and as a result, I have seen students who, despite having taken a few basic courses in numerical analysis, remain confused about these issues. For example, an algorithm that is merely efficient is often mistaken by students for a "good" algorithm, without their understanding the fact that an efficient algorithm can be highly unstable (e.g., Gaussian elimination without pivoting).

Applications. A major strength of the book is its applications. As a teacher, I have often been faced with questions such as: "Why is it important to study such-and-such problems?", "Why do such-and-such problems need to be solved numerically?", or "What is the physical significance of the computed quantities?" I therefore felt it important to include real-life examples, as often as possible, for each computational problem discussed in the book. I have done so at the outset of each chapter where numerical solutions of a computational problem are discussed. The motivating examples have been drawn from applications areas, mainly from engineering; however, some examples from statistics, business, bioscience, and control theory have also been given. I believe these examples will provide sufficient motivation for the curious student to study numerical linear algebra. After a physical problem has been posed, the physical and engineering significance of its solution is explained to some extent. The currently available numerical linear algebra and numerical analysis books do not provide sufficiently motivating examples.

MATLAB codes and the MATLAB toolkit. The use of MATLAB is becoming increasingly popular in all areas of scientific and engineering computing. I feel that numerical linear algebra courses should be taught using MATLAB wherever possible. Of course, this does not mean that students should not learn to write FORTRAN codes for their favorite algorithms; knowledge of FORTRAN is a great asset to a numerical linear algebra student. MATLAB codes for some selected basic algorithms have therefore been provided to help students use these codes as templates for writing codes for more advanced algorithms. Also, a MATLAB toolkit implementing the major algorithms presented in the book has been provided. Students will be able to compare different algorithms for the same problem with regard to efficiency, stability, and accuracy. For example, students will be able to see instantly, through numerical examples, why Gaussian elimination is more efficient than the QR factorization method for linear systems problems, or why the computed Q in a QR factorization may be more accurate with the Householder or Givens method than with the Gram-Schmidt methods.

Thorough discussions and the most up-to-date information. Each topic has been

very thoroughly discussed, and the most current information on the state of the problem has

been provided. The most frequently asked questions by the students have also been answered.

Solutions and answers to selected problems. Partial solutions for selected important

problems and, in some cases, complete answers, have been provided. I feel this is important

for our undergraduate students. In selecting the problems, emphasis has been placed on those

problems that need proofs.

Above all, I have imparted to the book my enthusiasm and my unique style of presenting material in an undergraduate course at the level of the majority of students in the class, which has made me a popular teacher. My teaching evaluations at every school at which I have taught (e.g., the State University of Campinas, Brazil; Pennsylvania State University; the University of Illinois at Urbana-Champaign; the University of California, San Diego; and Northern Illinois University) have been consistently "excellent" or "very good". As a matter of fact, the consistently excellent feedback that I receive from my students provided me with enough incentive to write this book.

0. LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES

0.1 Introduction
0.2 Fundamental Linear Algebra Problems and Their Importance
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches

CHAPTER 0

LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES

0.1 Introduction

The main objectives of this chapter are to state the fundamental linear algebra problems at the outset, to make brief mention of their importance, and to point out the difficulties that one faces in a computational setting when trying to solve these problems using obvious approaches.

The fundamental linear algebra problems are:

A. The Linear System Problem: Given an n x n matrix A and an n-vector b, the problem is to find an n-vector x such that Ax = b.

A practical variation of the problem requires solutions of several linear systems with the same matrix A on the left-hand side. That is, the problem there is to find a matrix X = [x_1, x_2, ..., x_m] such that

AX = B,

where B = [b_1, b_2, ..., b_m] is an n x m matrix.

Associated with linear system problems are the problems of finding the inverse of a matrix; finding the rank, the determinant, the leading principal minors, and orthonormal bases for the range and the null space of A; and finding various projection matrices associated with A. Solutions of some of these latter problems require matrix factorizations, and the problems of matrix factorization and linear systems are intimately related.

It is perhaps not an exaggeration to say that the linear system problem arises in almost all

branches of science and engineering: applied mathematics, biology, chemistry, physics, electrical,

mechanical, civil, and vibration engineering, etc.

The most common source is the numerical solution of differential equations. Many mathematical models of physical and engineering systems are systems of differential equations, ordinary and partial. A system of differential equations is normally solved numerically by discretizing the system by means of finite difference or finite element methods. The process of discretization, in general, leads to a linear system, the solution of which is an approximate solution of the differential equations (see Chapter 6 for more details).

B. The Least Squares Problem: Given an m x n matrix A and an m-vector b, the least squares problem is to find an n-vector x such that the norm of the residual vector, ||Ax - b||_2, is as small as possible.

Least squares problems arise in statistical and geometric applications that require fitting a polynomial or curve to experimental data, and in engineering applications such as signal and image processing. See Chapter 7 for some specific applications of least squares problems. It is worth mentioning here that methods for numerically solving least squares problems invariably lead to solutions of linear system problems (see again Chapter 7 for details).

C. The Eigenvalue Problem: Given an n x n matrix A, the problem is to find numbers λ_i and n-vectors x_i such that

Ax_i = λ_i x_i,  i = 1, ..., n.

The eigenvalue problem typically arises in the explicit solution and stability analysis of a homogeneous system of first-order differential equations. The stability analysis requires only implicit knowledge of the eigenvalues, whereas the explicit solution requires both the eigenvalues and the eigenvectors explicitly.

Applications such as buckling problems, stock market analysis, study of behavior of dynamical

systems, etc. require computations of only a few eigenvalues and eigenvectors, usually the few

largest or smallest ones.

In many practical instances, the matrix A is symmetric, and thus the eigenvalue problem becomes a symmetric eigenvalue problem. For details of some specific applications, see Chapter 8. A great number of eigenvalue problems arising in engineering applications are, however, generalized eigenvalue problems, as stated below.

D. The Generalized and Quadratic Eigenvalue Problems: Given the n x n matrices A, B, and C, the problem is to find λ_i and x_i such that

(λ_i^2 A + λ_i C + B) x_i = 0,  i = 1, ..., n.

This is known as the quadratic eigenvalue problem. In the special case when C is a zero matrix, the problem reduces to a generalized eigenvalue problem. That is, if we are given n x n matrices A and B, we must find λ and x such that

Ax = λBx.

The leading equations of vibration engineering (a branch of engineering dealing with vibrations of structures, etc.) are systems of homogeneous or nonhomogeneous second-order differential equations. A homogeneous second-order system has the form

Az'' + Cz' + Bz = 0,

the solution and stability analysis of which lead to a quadratic eigenvalue problem. Vibration problems are usually solved by setting C = 0. Moreover, in many practical instances the matrices A and B are symmetric and positive definite. This leads to a symmetric definite generalized eigenvalue problem. See Chapter 9 for details of some specific applications of these problems.

E. The Singular Value Decomposition Problem: Given an m x n matrix A, the problem is to find unitary matrices U and V, and a "diagonal" matrix Σ, such that

A = U Σ V*.

The above decomposition is known as the Singular Value Decomposition of A. The entries of Σ are the singular values. The column vectors of U and V are called the singular vectors.

Many areas of engineering, such as control and systems theory, biomedical engineering, signal and image processing, and statistical applications, give rise to the singular value decomposition problem. These applications typically require the rank of A, an orthonormal basis, projections, the distance of a matrix from another matrix of lower rank, etc., in the presence of certain impurities (known as noise) in the data. The singular values and singular vectors are the most numerically reliable tools for finding these entities. The singular value decomposition is also the most numerically effective approach to solving the least squares problem, especially in the rank-deficient case.

0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches

In this section we would like to point out some computational difficulties one might face while attempting to solve some of the above-mentioned linear algebra problems in "obvious" ways.

Solving a linear system by Cramer's Rule: Cramer's Rule, as taught in an undergraduate linear algebra course, is of significant theoretical and historical importance (for a statement of this rule, see Chapter 6). Unfortunately, it cannot be recommended as a practical computational procedure. Solving a 20 x 20 linear system with this rule, even on a fast modern-day computer, might take more than a million years.
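A back-of-the-envelope MATLAB computation makes the claim plausible (an illustrative sketch, not from the text; the operation count and machine speed are assumptions): Cramer's Rule requires n + 1 determinants, and evaluating a determinant by cofactor expansion costs about n! multiplications.

% Rough cost estimate for solving a 20 x 20 system by Cramer's rule
% with determinants evaluated by cofactor expansion (illustrative).
n = 20;
ops = (n + 1) * factorial(n);          % about 5.1e19 arithmetic operations
speed = 1e6;                           % assumed: 10^6 operations per second
years = ops / speed / (3600*24*365)    % roughly 1.6 million years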

Computing the unique solution of a linear system by matrix inversion: The unique solution of a nonsingular linear system can be written explicitly as x = A^{-1}b. Unfortunately, computing a solution to a linear system by first explicitly computing the matrix inverse is not practical. The computation of the matrix inverse is about three times as expensive as solving the linear system itself by a standard elimination procedure (see Chapter 6), and it often leads to more inaccuracies. Consider a trivial example: solve 3x = 27. An elimination procedure will give x = 9 and require only one division. On the other hand, solving the equation using matrix inversion will be cast as x = (1/3) x 27, giving x = 0.3333 x 27 = 8.999 (in four-digit arithmetic), and will require one division and one multiplication.

Note that the computer time consumed by an algorithm is theoretically measured by the number of arithmetic operations needed to execute the algorithm.
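The phenomenon is easy to observe in MATLAB (an illustrative sketch with an assumed test matrix, not the book's code):

% Solve Ax = b by elimination (backslash) and by explicit inversion.
A = hilb(12); b = A * ones(12, 1);   % an ill-conditioned test system
x1 = A \ b;                          % Gaussian elimination with pivoting
x2 = inv(A) * b;                     % explicit inverse: costlier, less accurate
norm(A*x1 - b), norm(A*x2 - b)       % the residual from the inverse is typically larger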

Solving a least squares problem by normal equations: If the m x n matrix A has full rank, and m is greater than or equal to n, then the least squares problem has a unique solution, and this solution is theoretically given by the solution x of the linear system

A^T A x = A^T b.

The above equations are known as the normal equations. Unfortunately, this procedure has some severe numerical limitations. First, in finite precision arithmetic, some vital information might be lost during an explicit formation of A^T A. Second, the normal equations are more sensitive to perturbations than the ordinary linear system Ax = b, and this sensitivity, in certain instances, corrupts the accuracy of the computed least squares solution to an extent not warranted by the data. (See Chapter 7 for more details.)
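A small MATLAB experiment illustrates the danger (a sketch; the test problem is an assumption, not from the book):

% Least squares via the normal equations versus a QR-based solve.
m = 30; n = 10;
A = vander(linspace(0, 1, m));
A = A(:, end-n+1:end);               % ill-conditioned polynomial basis
b = A * ones(n, 1);                  % exact least squares solution: all ones
x_qr = A \ b;                        % backslash uses a QR factorization here
x_ne = (A' * A) \ (A' * b);          % normal equations square the conditioning
norm(x_qr - ones(n,1)), norm(x_ne - ones(n,1))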

Computing the eigenvalues of a matrix by finding the roots of its characteristic polynomial: The eigenvalues of a matrix A are the zeros of its characteristic polynomial. Thus an "obvious" procedure for finding the eigenvalues would be to compute the characteristic polynomial of A and then find its zeros by a standard, well-established root-finding procedure. Unfortunately, this is not a numerically viable approach. The round-off errors produced during the computation of the characteristic polynomial will very likely produce small perturbations in the computed coefficients. These small errors in the coefficients can affect the computed zeros very drastically in certain cases. The zeros of certain polynomials are known to be extremely sensitive to small perturbations in the coefficients. A classic example of this is the Wilkinson polynomial (see Chapter 3). Wilkinson took a polynomial of degree 20 with the distinct roots 1 through 20, and perturbed the coefficient of x^19 by a very small amount. The zeros of this slightly perturbed polynomial were then computed by a well-established root-finding procedure, only to find that some zeros changed completely; some even became complex.
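Wilkinson's experiment is easy to repeat in MATLAB (a sketch; the size of the perturbation follows Wilkinson's classic choice of 2^{-23}):

% Perturb the x^19 coefficient of (x-1)(x-2)...(x-20) and recompute the roots.
p = poly(1:20);              % coefficients of the Wilkinson polynomial
p(2) = p(2) - 2^(-23);       % tiny perturbation of the x^19 coefficient
r = roots(p);                % a standard root-finding procedure
max(abs(imag(r)))            % nonzero: several roots have become complex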

Solving the generalized and quadratic eigenvalue problems by matrix inversion: The generalized eigenvalue problem

Ax = λBx,

in the case where B is nonsingular, is theoretically equivalent to the ordinary eigenvalue problem

B^{-1} A x = λx.

However, if the nonsingular matrix B is sensitive to perturbations, then forming the matrix on the left-hand side by explicitly computing the inverse of B will lead to inaccuracies, which in turn will lead to the computation of inaccurate generalized eigenvalues.

Similar results hold for the quadratic eigenvalue problem. In major engineering applications, such as vibration engineering, the matrix A is symmetric positive definite, and thus nonsingular. In that case the quadratic eigenvalue problem is equivalent to the eigenvalue problem

Eu = λu,  where  E = [      0           I
                       -A^{-1}B    -A^{-1}C ].

But numerically it is not advisable to solve the quadratic eigenvalue problem by actually computing the matrix E explicitly. If A is sensitive to small perturbations, the matrix E cannot be formed accurately, and the computed eigenvalues will then be inaccurate.
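In MATLAB, the call eig(A, B), based on the QZ iteration of Chapter 9, works on the pair directly and avoids the explicit inverse (an illustrative sketch with assumed data):

% Generalized eigenvalues via QZ versus an explicit inverse.
n = 8;
A = randn(n);
B = hilb(n);                         % nonsingular but very ill-conditioned
e_qz  = eig(A, B);                   % QZ: works directly on the pair (A, B)
e_inv = eig(inv(B) * A);             % explicit inverse: accuracy degrades with cond(B)
sort(abs(e_qz)) - sort(abs(e_inv))   % the discrepancies reveal the lost accuracy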

Finding the singular values by computing the eigenvalues of A^T A: Theoretically, the singular values of A are the nonnegative square roots of the eigenvalues of A^T A. However, finding the singular values this way is not advisable. Again, explicit formation of the matrix might lead to the loss of significant relevant information. Consider a rather trivial example:

A = [ 1  1
      ε  0
      0  0 ],

where ε is such that, in finite precision computation, 1 + ε^2 = 1. Then computationally we have

A^T A = [ 1  1
          1  1 ].

The eigenvalues now are 2 and 0, so the computed singular values will be given by √2 and 0. The exact singular values, however, are √2 and ε/√2. (See Chapter 10 for details.)
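In IEEE double precision the loss is easy to demonstrate (an illustrative sketch of the example above):

% The small singular value disappears when computed through A'*A.
ep = 1e-9;                     % 1 + ep^2 rounds to 1 in double precision
A = [1 1; ep 0; 0 0];
sqrt(eig(A' * A))              % via A'*A: returns 0 and sqrt(2); ep is lost
svd(A)                         % via the SVD: sqrt(2) and about ep/sqrt(2)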

Conclusion: Above we have merely pointed out how certain obvious theoretical approaches to linear algebra problems might lead to computational difficulties and inaccuracies in computed results. Numerical linear algebra deals with in-depth analysis of such difficulties, investigation of how these difficulties can be overcome in certain instances, and the formulation and implementation of viable numerical algorithms for scientific and engineering use.


1. A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA

1.1 Introduction
1.2 Vectors
    1.2.1 Subspace and Basis
1.3 Matrices
    1.3.1 Range and Null Spaces
    1.3.2 Rank of a Matrix
    1.3.3 The Inverse of a Matrix
    1.3.4 Similar Matrices
    1.3.5 Orthogonality and Projections
    1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix
1.4 Some Special Matrices
1.5 The Cayley-Hamilton Theorem
1.6 Singular Values
1.7 Vector and Matrix Norms
    1.7.1 Vector Norms
    1.7.2 Matrix Norms
    1.7.3 Convergence of a Matrix Sequence and Convergent Matrices
    1.7.4 Norms and Inverses
1.8 Norm Invariant Properties of Orthogonal Matrices
1.9 Review and Summary
1.10 Suggestions for Further Reading

CHAPTER 1

A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA

1.1 Introduction

Although a first course in linear algebra is a prerequisite for this book, for the sake of completeness we establish some notation and quickly review the basic definitions and concepts on matrices and vectors in this chapter, and then discuss in somewhat greater detail the concepts and fundamental results on vector and matrix norms and their applications to the study of convergent matrices. These results will be used frequently in the later chapters of the book.

1.2 Vectors

An ordered set of numbers is called a vector; the numbers themselves are called the components of the vector. A lower-case italic letter is usually used to denote a vector. A vector v having n components has the form

v = [ v_1
      v_2
      ...
      v_n ].

A vector in this form is referred to as a column vector, and its transpose is a row vector. The set of all n-vectors (that is, vectors having n components each) will be denoted by R^{n x 1}, or simply by R^n. The transpose of a vector v will be denoted by v^T. Unless otherwise stated, a column vector will simply be called a vector.

If u and v are two vectors in R^n, then their sum u + v is defined by

u + v = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n)^T.

If c is a scalar, then cu = (cu_1, cu_2, ..., cu_n)^T. The inner product of two vectors u and v is the scalar given by

u^T v = u_1 v_1 + u_2 v_2 + ... + u_n v_n.

The length of a vector v, denoted by ||v||, is √(v^T v); that is, the length of v (or the Euclidean length of v) is √(v_1^2 + v_2^2 + ... + v_n^2).

A set of vectors {m_1, ..., m_k} in R^n is said to be linearly dependent if there exist scalars c_1, ..., c_k, not all zero, such that

c_1 m_1 + ... + c_k m_k = 0 (the zero vector).

Otherwise, the set is called linearly independent.

Example 1.2.1

The set of vectors

e_i = (0, 0, ..., 0, 1, 0, ..., 0)^T,  i = 1, ..., n,

with the 1 in the ith component, is linearly independent, because

c_1 e_1 + c_2 e_2 + ... + c_n e_n = (c_1, c_2, ..., c_n)^T = 0

is true if and only if

c_1 = c_2 = ... = c_n = 0.

Example 1.2.2

The vectors (1, -2)^T and (-3, 6)^T are linearly dependent, because

3 (1, -2)^T + (-3, 6)^T = (0, 0)^T.

Thus c_1 = 3, c_2 = 1.

Orthogonality of two vectors: The angle θ between two vectors u and v is given by

cos(θ) = u^T v / (||u|| ||v||).

Two vectors u and v are orthogonal if θ = 90°, that is, if u^T v = 0. The symbol ⊥ is used to denote orthogonality.

Let S be a set of vectors in R^n. Then S is called a subspace of R^n if s_1, s_2 in S implies c_1 s_1 + c_2 s_2 in S, where c_1 and c_2 are any scalars. That is, S is a subspace if any linear combination of two vectors in S is also in S. Note that the space R^n itself is a subspace of R^n. For every subspace there is a unique smallest positive integer r such that every vector in the subspace can be expressed as a linear combination of at most r vectors in the subspace; r is called the dimension of the subspace and is denoted by dim[S]. Any set of r linearly independent vectors from S with dim[S] = r forms a basis of the subspace.

Orthogonality of Two Subspaces. Two subspaces S_1 and S_2 of R^n are said to be orthogonal if s_1^T s_2 = 0 for every s_1 in S_1 and every s_2 in S_2. Two orthogonal subspaces S_1 and S_2 will be denoted by S_1 ⊥ S_2.

1.3 Matrices

A collection of mn numbers arranged in a rectangular array of m rows and n columns is called a matrix. A matrix A, therefore, has the form

A = [ a_11  a_12  ...  a_1n
      a_21  a_22  ...  a_2n
      ...
      a_m1  a_m2  ...  a_mn ],

with entries a_ij, i = 1, ..., m and j = 1, ..., n. A is said to be of order m x n. The set of all m x n matrices is denoted by R^{m x n}.

A matrix A having the same number of rows and columns is called a square matrix. The square matrix having 1's along the main diagonal and zeros everywhere else is called the identity matrix and is denoted by I.

The sum of two matrices A = (a_ij) and B = (b_ij) in R^{m x n} is a matrix of the same order as A and B, given by

A + B = (a_ij + b_ij).

If c is a scalar, then cA is the matrix given by

cA = (c a_ij).

Let A be m x n and B be n x p. Then their product AB is an m x p matrix given by

AB = ( Σ_{k=1}^{n} a_ik b_kj ),  i = 1, ..., m;  j = 1, ..., p.

Note that if b is a column vector, then Ab is a column vector. On the other hand, if a is a column vector and b^T is a row vector, then ab^T is a matrix, known as the outer product of the two vectors a and b. Thus

ab^T = [ a_1
         a_2
         ...
         a_n ] (b_1, b_2, ..., b_m) = [ a_1 b_1  ...  a_1 b_m
                                        a_2 b_1  ...  a_2 b_m
                                        ...
                                        a_n b_1  ...  a_n b_m ].

Example 1.3.1

a = (1, 2, 3)^T,  b = (2, 3, 4)^T.

Outer product: ab^T = [ 2  3   4
                        4  6   8
                        6  9  12 ]  (a matrix).

Inner product: a^T b = 1·2 + 2·3 + 3·4 = 20  (a scalar).
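Both products are immediate to check in MATLAB (an illustrative fragment):

% Outer and inner products of the vectors in Example 1.3.1.
a = [1; 2; 3]; b = [2; 3; 4];
a * b'          % outer product: a 3-by-3 matrix
a' * b          % inner product: the scalar 20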

The transpose of a matrix A of order m x n, denoted by A^T, is a matrix of order n x m with rows and columns interchanged:

A^T = (a_ji),  i = 1, ..., n;  j = 1, ..., m.

Note that the matrix product is not commutative; that is, in general

AB ≠ BA.

Also, (AB)^T = B^T A^T.

A complex matrix A is Hermitian if A* = (Ā)^T = A, where Ā is the complex conjugate of A. A real matrix is symmetric if A^T = A.

Writing B = (b_1, ..., b_p), where b_i is the ith column of B, the matrix product AB can be written as

AB = (Ab_1, ..., Ab_p).

Similarly, writing A by rows, with a_i as the ith row of A,

AB = [ a_1 B
       a_2 B
       ...
       a_m B ].

Block Matrices

If two matrices A and B can be partitioned as

A = [ A11  A12        B = [ B11  B12
      A21  A22 ],           B21  B22 ],

then, considering each block as an element of the matrix, we can perform addition, scalar multiplication, and matrix multiplication in the usual way. Thus,

A + B = [ A11 + B11   A12 + B12
          A21 + B21   A22 + B22 ]

and

AB = [ A11 B11 + A12 B21   A11 B12 + A12 B22
       A21 B11 + A22 B21   A21 B12 + A22 B22 ],

assuming that the partitioning has been done conformably, so that the corresponding matrix multiplications are possible. The concept of two-by-two block partitioning can be easily generalized. Thus, if A = (Aij) and B = (Bij) are two block matrices, then C = AB is given by

C = (Cij) = ( Σ_k Aik Bkj ),

where each Aik, Bkj, and Cij is a block matrix.

A block diagonal matrix is a diagonal matrix in which each diagonal element is a square matrix. That is,

A = diag(A11, ..., Ann),

where the Aii are square matrices.
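The block formulas can be verified numerically in MATLAB (an illustrative sketch with arbitrary test matrices):

% Conformable 2-by-2 block multiplication agrees with the ordinary product.
A = magic(4); B = randn(4);
A11 = A(1:2,1:2); A12 = A(1:2,3:4); A21 = A(3:4,1:2); A22 = A(3:4,3:4);
B11 = B(1:2,1:2); B12 = B(1:2,3:4); B21 = B(3:4,1:2); B22 = B(3:4,3:4);
C = [A11*B11 + A12*B21, A11*B12 + A12*B22;
     A21*B11 + A22*B21, A21*B12 + A22*B22];
norm(C - A*B)       % essentially zero (rounding errors only)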

The Determinant of a Matrix

For every square matrix A, there is a unique number associated with the matrix, called the determinant of A and denoted by det(A). For a 2 x 2 matrix A, det(A) = a_11 a_22 - a_12 a_21; for a 3 x 3 matrix A = (a_ij), det(A) = a_11 det(A_11) - a_12 det(A_12) + a_13 det(A_13), where A_1i is the 2 x 2 submatrix obtained by deleting the first row and the ith column. This can be easily generalized. For an n x n matrix A = (a_ij) we have

det(A) = (-1)^{i+1} a_i1 det(A_i1) + (-1)^{i+2} a_i2 det(A_i2) + ... + (-1)^{i+n} a_in det(A_in),

where A_ij is the submatrix of A of order n - 1 obtained by deleting the ith row and the jth column.

Example 1.3.2

A = [ 1  2  3
      4  5  6
      7  8  9 ].

Set i = 1. Then

det(A) = 1 · det[ 5 6; 8 9 ] - 2 · det[ 4 6; 7 9 ] + 3 · det[ 4 5; 7 8 ]
       = 1(-3) - 2(-6) + 3(-3) = 0.

Theorem 1.3.1 The following simple properties of det(A) hold:

1. det(A) = det(A^T).
2. det(αA) = α^n det(A), where α is a scalar and A is n x n.
3. det(AB) = det(A) det(B).
4. If two rows or two columns of A are identical, then det(A) = 0.
5. If B is a matrix obtained from A by interchanging two rows or two columns, then det(B) = -det(A).
6. The determinant of a triangular matrix is the product of its diagonal entries. (A square matrix A is triangular if its elements below or above the diagonal are all zero.)

The Characteristic Polynomial, the Eigenvalues and Eigenvectors of a Matrix

Let A be an n x n matrix. Then the polynomial p_n(λ) = det(λI - A) is called the characteristic polynomial. The zeros of the characteristic polynomial are called the eigenvalues of A. Note that this is equivalent to the following: λ is an eigenvalue of A if and only if there exists a nonzero vector x such that Ax = λx. The vector x is called a right eigenvector (or just an eigenvector), and a vector y satisfying y*A = λy* is called a left eigenvector associated with λ.

Definition 1.3.1 An n x n matrix A having fewer than n linearly independent eigenvectors is called a defective matrix.

Example 1.3.3

The matrix A = [ 1 2; 0 1 ] is defective. Its only eigenvalue is 1, and the two associated eigenvectors (1, 0)^T and (-1, 0)^T are linearly dependent, so A does not have two linearly independent eigenvectors.

The Determinant of a Block Matrix

Let

A = [ A11  A12
        0  A22 ],

where A11 and A22 are square matrices. Then det(A) = det(A11) det(A22).

1.3.1 Range and Null Spaces

For every m x n matrix A, there are two important associated subspaces: the range of A, denoted by R(A), and the null space of A, denoted by N(A):

R(A) = { b in R^m : b = Ax for some x in R^n },
N(A) = { x in R^n : Ax = 0 }.

Let S be a subspace of R^m. Then the subspace S^⊥ defined by

S^⊥ = { y in R^m : y^T x = 0 for all x in S }

is called the orthogonal complement of S. It can be shown (Exercise) that

(i) N(A)^⊥ = R(A^T), and
(ii) R(A)^⊥ = N(A^T).

Let A be an m x n matrix. Then the subspace spanned by the row vectors of A is called the row space of A, and the subspace spanned by the columns of A is called the column space of A.

1.3.2 Rank of a Matrix

The rank of a matrix A is the dimension of the column space of A. It is denoted by rank(A). A square matrix A in R^{n x n} is called nonsingular if rank(A) = n; otherwise it is singular.

An m x n matrix A is said to have full column rank if its columns are linearly independent; full row rank is defined similarly. A matrix A is said to have full rank if it has either full row rank or full column rank. If A does not have full rank, it is rank deficient.

Example 1.3.4

A = [ 1  2
      3  4
      5  6 ]

has full rank: rank(A) = 2 (it has full column rank), and null(A) = 0.

Example 1.3.5

A = [ 1  2
      2  4
      0  0 ]

is rank deficient: rank(A) = 1 and null(A) = 1.

Let A be an m x n matrix. Then:

1. rank(A) = rank(A^T).
2. rank(A) + null(A) = n.
3. rank(AB) ≥ rank(A) + rank(B) - n, where B is n x p.
4. rank(BA) = rank(A) = rank(AC), where B and C are nonsingular matrices of orders m and n, respectively.
5. rank(AB) ≤ min{rank(A), rank(B)}.

1.3.3 The Inverse of a Matrix

Let A be an n x n matrix. Then a matrix B such that

AB = BA = I,

where I is the n x n identity matrix, is called the inverse of A. The inverse of A is denoted by A^{-1}; when it exists, it is unique.

An interesting property of the inverse of the product of two matrices is:

(AB)^{-1} = B^{-1} A^{-1}.

For an n x n matrix A, the following statements are equivalent:

1. A is nonsingular.
2. det(A) is nonzero.
3. rank(A) = rank(A^T) = n.
4. N(A) = {0}.
5. A^{-1} exists.
6. A has linearly independent rows and columns.
7. The eigenvalues of A are nonzero.

1.3.4 Similar Matrices

Two matrices A and B are called similar if there exists a nonsingular matrix T such that

T^{-1} A T = B.

An important property of similar matrices: two similar matrices have the same eigenvalues. (For a proof, see Chapter 8, Section 8.2.)

1.3.5 Orthogonality and Projections

A set of vectors {v_1, ..., v_m} in R^n is orthogonal if

v_i^T v_j = 0,  i ≠ j.

If, in addition, v_i^T v_i = 1 for each i, then the vectors are called orthonormal.

A basis for a subspace that is also orthonormal is called an orthonormal basis for the subspace.

Example 1.3.6

A = [ 0   0
      1  1/2
      2   1 ].

The vector

V = ( 0, -1/√5, -2/√5 )^T

forms an orthonormal basis for R(A). (See Section 5.6.1.)

Example 1.3.7

A = [ 1  2
      0  1
      1  0 ].

The matrix

V = [ -1/√2  -1/√3
        0    -1/√3
      -1/√2   1/√3 ]

forms an orthonormal basis for R(A). (See Section 5.6.1.)

Orthogonal Projection

Let S be a subspace of R^n. Then an n x n matrix P having the properties

(i) R(P) = S,
(ii) P^T = P (P is symmetric),
(iii) P^2 = P (P is idempotent),

is called the orthogonal projection onto S, or simply a projection matrix. We denote the orthogonal projection P onto S by P_S. The orthogonal projection onto a subspace is unique. If the columns of V form an orthonormal basis for S, then

P_S = V V^T

is the unique orthogonal projection onto S. Note that V is not unique, but P_S is.

If P_S is the orthogonal projection onto S, then I - P_S, where I is the identity matrix of the same order as P_S, is the orthogonal projection onto S^⊥. (Exercise 14(a))

The Orthogonal Projections onto R(A) and N(A^T)

When the subspace S is R(A) or N(A^T) associated with a matrix A, we will denote the unique orthogonal projections onto R(A) and N(A^T) by P_A and P_N, respectively. It can be shown (Exercise 14(b)) that if A is m x n (m ≥ n) and has full rank, then

P_A = A (A^T A)^{-1} A^T,
P_N = I - A (A^T A)^{-1} A^T.

Example 1.3.8

A = [ 1  2
      0  1
      1  0 ],

A^T A = [ 2  2
          2  5 ],

(A^T A)^{-1} = (1/6) [  5  -2
                       -2   2 ],

P_A = A (A^T A)^{-1} A^T = [ 5/6   1/3   1/6
                             1/3   1/3  -1/3
                             1/6  -1/3   5/6 ].

1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix

Any vector b can be written as

b = b_S + b_{S⊥},

where b_S is in S and b_{S⊥} is in S^⊥. Let S be R(A) for a matrix A. Then b_S is in R(A) and b_{S⊥} is in N(A^T). We will therefore denote b_S by b_R and b_{S⊥} by b_N, meaning that b_R is in the range of A and b_N is in the null space of A^T. Then

b_R = P_A b  and  b_N = P_N b.

b_R and b_N are called the orthogonal projection of b onto R(A) and the orthogonal projection of b onto N(A^T), respectively. From the above, we easily see that

b_R^T b_N = 0.

Example 1.3.9

A = [ 0   0
      1  1/2
      1  1/2 ],   b = ( 1, 1, 1 )^T.

V = an orthonormal basis for R(A) = ( 0, -1/√2, -1/√2 )^T,

P_A = V V^T = [ 0   0    0
                0  1/2  1/2
                0  1/2  1/2 ],

b_R = P_A b = ( 0, 1, 1 )^T,

P_N = I - P_A = [ 1    0     0
                  0   1/2  -1/2
                  0  -1/2   1/2 ],

b_N = P_N b = ( 1, 0, 0 )^T.

Note that b = b_R + b_N.
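The computation of Example 1.3.9 can be reproduced in MATLAB (an illustrative sketch; orth is used here to build the orthonormal basis, and its sign may differ from V above):

% Projections of b onto R(A) and N(A') using an orthonormal basis for R(A).
A = [0 0; 1 1/2; 1 1/2]; b = [1; 1; 1];
V  = orth(A);              % orthonormal basis for R(A)
PA = V * V';               % orthogonal projection onto R(A)
PN = eye(3) - PA;          % orthogonal projection onto N(A')
bR = PA * b, bN = PN * b   % bR + bN recovers b, and bR'*bN = 0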

1.4 Some Special Matrices

1. Diagonal Matrix: A square matrix A = (a_ij) is a diagonal matrix if a_ij = 0 for i ≠ j. We write A = diag(a_11, ..., a_nn).

2. Triangular Matrix: A square matrix A = (a_ij) is an upper triangular matrix if a_ij = 0 for i > j.

The transpose of an upper triangular matrix is lower triangular; that is, A = (a_ij) is lower triangular if a_ij = 0 for i < j. Schematically, a lower triangular matrix has zeros above the diagonal, and an upper triangular matrix has zeros below it.

The following properties of triangular matrices are useful.

1. The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix. The diagonal entries of the product matrix are just the products of the diagonal entries of the individual matrices. (Exercise 19(a))

2. The inverse of a nonsingular upper (lower) triangular matrix is an upper (lower) triangular matrix. The diagonal entries of the inverse are the reciprocals of the diagonal entries of the original matrix. (Exercise 19(b))

3. The eigenvalues of a triangular matrix are its diagonal entries. (Exercise 19(d))

4. The determinant of a triangular matrix is the product of its diagonal entries. (Exercise 19(c))

Thus, a triangular matrix is nonsingular if and only if all of its diagonal entries are nonzero.

3. Unitary (Orthogonal) Matrix: A square complex matrix U is unitary if

U* U = U U* = I,

where U* = (Ū)^T and Ū is the complex conjugate of U. If U is real, then U is orthogonal if

U^T U = U U^T = I.

Orthogonal matrices play a very important role in numerical matrix computations. The following two important properties of orthogonal matrices make them so attractive for numerical computation:

1. The inverse of an orthogonal matrix is just its transpose: O^{-1} = O^T.
2. The product of two orthogonal matrices is an orthogonal matrix.

4. Permutation Matrix: A square matrix P is a permutation matrix if there is exactly one nonzero entry in each row and column, which is 1, and the rest of the entries are all zero. Thus, if (i_1, ..., i_n) is a permutation of (1, 2, ..., n), then

P = [ e_{i_1}
      ...
      e_{i_n} ],

where e_i is the ith row of the n x n identity matrix I, is a permutation matrix. Similarly,

P = ( e_{i_1}, e_{i_2}, ..., e_{i_n} ),

where e_i is now the ith column of I, is a permutation matrix.

Example 1.4.1

P_1 = [ 0 1 0      P_2 = [ 1 0 0      P_3 = [ 1 0 0
        0 0 1              0 1 0              0 0 1
        1 0 0 ],           0 0 1 ],           0 1 0 ]

are all permutation matrices.

Effects of pre-multiplication and post-multiplication by a permutation matrix.

If

P_1 = [ e_{i_1}
        ...
        e_{i_n} ],

then

P_1 A = [ i_1th row of A
          i_2th row of A
          ...
          i_nth row of A ].

Similarly, if P_2 = (e_{i_1}, e_{i_2}, ..., e_{i_n}), where e_i is the ith column of I, then

A P_2 = ( i_1th column of A, i_2th column of A, ..., i_nth column of A ).

Example 1.4.2

1. Let A = (a_ij) be 3 x 3, and let

P_1 = [ 0 1 0
        0 0 1
        1 0 0 ],

whose rows are e_2, e_3, e_1. Then

P_1 A = [ a_21  a_22  a_23
          a_31  a_32  a_33
          a_11  a_12  a_13 ]   (the 2nd, 3rd, and 1st rows of A).

2. The columns of the same P_1 are (e_3, e_1, e_2), and

A P_1 = [ a_13  a_11  a_12
          a_23  a_21  a_22
          a_33  a_31  a_32 ]   (the 3rd, 1st, and 2nd columns of A).
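The effects above are easy to see in MATLAB (an illustrative fragment):

% Row and column permutations as in Example 1.4.2.
A  = magic(3);
P1 = [0 1 0; 0 0 1; 1 0 0];
P1 * A                     % the rows of A in the order 2, 3, 1
A * P1                     % the columns of A in the order 3, 1, 2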

Thus:

1. The inverse of a permutation matrix is its transpose, which is again a permutation matrix.
2. The product of two permutation matrices is a permutation matrix, and therefore is orthogonal.

5. Hessenberg Matrix: A square matrix A = (a_ij) is an upper Hessenberg matrix if a_ij = 0 for i > j + 1. The transpose of an upper Hessenberg matrix is a lower Hessenberg matrix; that is, a square matrix A = (a_ij) is a lower Hessenberg matrix if a_ij = 0 for j > i + 1. A square matrix A that is both upper and lower Hessenberg is tridiagonal. Schematically, a lower Hessenberg matrix has zeros above its superdiagonal, and an upper Hessenberg matrix has zeros below its subdiagonal.

An upper Hessenberg matrix A = (a_ij) is unreduced if

a_{i,i-1} ≠ 0 for i = 2, 3, ..., n.

Similarly, a lower Hessenberg matrix A = (a_ij) is unreduced if

a_{i,i+1} ≠ 0 for i = 1, 2, ..., n - 1.

Example 1.4.3

A = [ 1  2  0
      2  3  4
      1  1  1 ]  is an unreduced lower Hessenberg matrix.

A = [ 1  1  1
      1  1  1
      0  2  3 ]  is an unreduced upper Hessenberg matrix.

Some Useful Properties

1. Every square matrix A can be transformed to an upper (or lower) Hessenberg matrix by means of a unitary similarity; that is, given a complex matrix A, there exists a unitary matrix U such that

U* A U = H,

where H is a Hessenberg matrix. (A constructive proof in the case where A is real is given in Chapter 5.)

2. If A is symmetric (or complex Hermitian), then the transformed Hessenberg matrix obtained in 1 is tridiagonal.

3. An arbitrary Hessenberg matrix can always be partitioned into diagonal blocks such that each diagonal block is an unreduced Hessenberg matrix.

Example 1.4.4

    (1 2 3 4)
A = (2 1 1 1).
    (0 0 1 1)
    (0 0 1 1)

Partition

    (A1 A2)
A = (0  A3).

Note that

     (1 2)           (1 1)
A1 = (2 1)  and A3 = (1 1)

are unreduced Hessenberg matrices.

6. Companion Matrix. A normalized upper Hessenberg matrix of the form

    (0 0 ... 0 a1)
    (1 0 ... 0 a2)
C = (0 1 ... 0 a3)
    (...  .    ..)
    (0 0 ... 1 an)

is called an upper companion matrix. The transpose of an upper companion matrix is a lower companion matrix.

The characteristic polynomial of a companion matrix can be easily written down:

det(C - λI) = det(C^T - λI) = (-1)^n (λ^n - a_n λ^{n-1} - a_{n-1} λ^{n-2} - ... - a2 λ - a1).

A matrix A is called nonderogatory if it is similar to a companion matrix; that is, A is nonderogatory if there exists a nonsingular T such that TAT^{-1} is a companion matrix.

There are, of course, other equivalent characterizations of a nonderogatory matrix. For example, a matrix A is nonderogatory if and only if there exists a vector b such that

rank(b, Ab, ..., A^{n-1}b) = n.

The matrix (b, Ab, ..., A^{n-1}b) is called the controllability matrix in control theory. If the rank condition is satisfied, then the pair (A, b) is called controllable.

Every unreduced upper Hessenberg matrix is nonderogatory, but the converse is not true.

Example 1.4.5

    (1 2)
A = (0 3)

is an upper Hessenberg matrix with (2,1) entry equal to zero, but A is nonderogatory.

Pick

    (1)
b = (2).

Then

          (1 5)
(b, Ab) = (2 6)

is nonsingular.

A matrix that is not nonderogatory is called derogatory. A derogatory matrix is similar to a direct sum of a number of companion matrices:

(C1        0 )
(   C2       )
(      ...   )
(0         Ck)

where each Ci is a companion matrix, k > 1, and the characteristic polynomial of each Ci divides the characteristic polynomial of all the preceding Ci's. The above form is also known as the Frobenius Canonical Form.

7. Diagonally Dominant Matrix. A matrix A = (aij) is row diagonally dominant if

|aii| > Σ_{j≠i} |aij| for all i.

A column diagonally dominant matrix is similarly defined. The matrix

    (10  1  1)
A = ( 1 10  1)
    ( 1  1 10)

is both row and column diagonally dominant.

Note: Sometimes in the literature of linear algebra, a matrix A having the above property is called a strictly diagonally dominant matrix.

8. Positive Definite Matrix. A symmetric matrix A is positive definite if for every nonzero vector x,

x^T A x > 0.

Let x = (x1, x2, ..., xn)^T. Then

x^T A x = Σ_{i,j=1}^{n} aij xi xj

is called the quadratic form associated with A.

Example 1.4.6

Let

    (2 1)            (x1)
A = (1 5)   and  x = (x2).

Then

x^T A x = (x1, x2) (2x1 + x2, x1 + 5x2)^T
        = 2x1^2 + 2x1x2 + 5x2^2
        = 2(x1^2 + x1x2 + (1/4)x2^2) + (9/2)x2^2
        = 2(x1 + (1/2)x2)^2 + (9/2)x2^2 > 0

for every nonzero x, so A is positive definite.

A positive semidefinite matrix is similarly defined: a symmetric matrix A is positive semidefinite if x^T A x ≥ 0 for every x.

A commonly used notation for a symmetric positive definite (positive semidefinite) matrix is A > 0 (A ≥ 0).

Some Characterizations and Properties of Positive Definite Matrices

Here are some useful characterizations of positive definite matrices:

1. A matrix A is positive definite if and only if all its eigenvalues are positive. Note that in the above example the eigenvalues are 1.6972 and 5.3028.

2. A matrix A is positive definite if and only if all its leading principal minors are positive.

There are n leading principal minors of an n x n matrix A. The ith leading principal minor is the determinant of the submatrix of A formed out of the first i rows and first i columns of A. For example, for

    (10  1  1)
A = ( 1 10  1),
    ( 1  1 10)

the first leading principal minor = 10;

the second leading principal minor = det (10 1; 1 10) = 99;

the third leading principal minor = det A = 972.

3. A symmetric diagonally dominant matrix with positive diagonal entries is positive definite. Note that the matrix A in the example above is diagonally dominant with positive diagonal entries.

4. If A = (aij) is positive definite, then aii > 0 for all i.

5. If A = (aij) is positive definite, then the largest element (in magnitude) of the whole matrix must lie on the diagonal.

6. The sum of two positive definite matrices is positive definite.

Remarks: Note that (4) and (5) are only necessary conditions for a symmetric matrix to be positive definite. They can serve only as initial tests for positive definiteness. For example, the matrices

    (4 1 1 1)        (20 12 25)
A = (1 0 1 2),   B = (12 15  2)
    (1 1 2 3)        (25  2  5)
    (1 2 3 4)

cannot be positive definite, since in the matrix A there is a zero entry on the diagonal, and in B the largest entry, 25, is not on the diagonal.
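As a quick numerical illustration of these characterizations, here is a small sketch in Python (NumPy assumed; it is not needed for the theory):

    import numpy as np

    A = np.array([[10.0, 1.0, 1.0],
                  [1.0, 10.0, 1.0],
                  [1.0, 1.0, 10.0]])

    # Characterization 1: all eigenvalues positive.
    print(np.linalg.eigvalsh(A))             # all > 0

    # Characterization 2: all leading principal minors positive.
    minors = [np.linalg.det(A[:i, :i]) for i in range(1, 4)]
    print(minors)                            # approximately [10, 99, 972]

    # Necessary conditions (4) and (5) rule out B of the Remarks:
    B = np.array([[20.0, 12.0, 25.0],
                  [12.0, 15.0, 2.0],
                  [25.0, 2.0, 5.0]])
    print(np.linalg.eigvalsh(B))             # at least one negative eigenvalue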

1.5 The Cayley-Hamilton Theorem

A square matrix A satisfies its own characteristic equation; that is, if A = (aij) is an n x n matrix and Pn(λ) is the characteristic polynomial of A, then

Pn(A) is a ZERO matrix.

Proof. See Matrix Theory by Franklin, pp. 113-114.

Example 1.5.1

Let

    (0 1)
A = (1 2).

P2(λ) = λ^2 - 2λ - 1. Then

          (1 2)     (0 1)   (1 0)   (0 0)
P2(A) =   (2 5) - 2 (1 2) - (0 1) = (0 0).
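The computation in this example can be checked mechanically; a short Python sketch (NumPy assumed):

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [1.0, 2.0]])

    # P(lambda) = lambda^2 - 2*lambda - 1 is the characteristic polynomial of A.
    P_of_A = A @ A - 2.0 * A - np.eye(2)

    print(P_of_A)                     # the 2 x 2 zero matrix
    assert np.allclose(P_of_A, 0.0)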

1.6 Singular Values

Let A be m x n. Then the eigenvalues of the n x n Hermitian matrix A*A are real and nonnegative. Let these eigenvalues be denoted by σi^2, where σ1^2 ≥ σ2^2 ≥ ... ≥ σn^2. Then σ1, σ2, ..., σn are called the singular values of A. Every m x n matrix A can be decomposed into

A = U Σ V*,

where U (m x m) and V (n x n) are unitary and Σ is an m x n "diagonal" matrix (for real A, A = U Σ V^T with U and V orthogonal). This decomposition is called the Singular Value Decomposition or SVD. The singular values σi, i = 1, ..., n, are the diagonal entries of Σ. The number of nonzero singular values is equal to the rank of the matrix A. For real A, the singular values of A are the nonnegative square roots of the eigenvalues of A^T A (see Chapter 10, Section 10.3).

Example 1.6.1

Let

    (0 2)
A = (1 2).

Then

        (1 2)
A^T A = (2 8).

The eigenvalues of A^T A are (9 ± √65)/2. Thus

σ1 = √[(9 + √65)/2],
σ2 = √[(9 - √65)/2].
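The singular values in Example 1.6.1 are easily checked numerically; a short Python sketch (NumPy assumed):

    import numpy as np

    A = np.array([[0.0, 2.0],
                  [1.0, 2.0]])

    # Singular values directly ...
    print(np.linalg.svd(A, compute_uv=False))

    # ... and as square roots of the eigenvalues of A^T A.
    eigs = np.linalg.eigvalsh(A.T @ A)        # (9 -/+ sqrt(65))/2, ascending
    print(np.sqrt(eigs[::-1]))                # same values, largest first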

1.7 Vector and Matrix Norms

1.7.1 Vector Norms

Let

    (x1)
x = (x2)
    (..)
    (xn)

be an n-vector and let V be a vector space. Then a vector norm, denoted by the symbol ||x||, is a real-valued continuous function of the components x1, x2, ..., xn of x, defined on V, that has the following properties:

1. ||x|| > 0 for every nonzero x; ||x|| = 0 if and only if x is the zero vector.

2. ||αx|| = |α| ||x|| for all x in V and for all scalars α.

3. ||x + y|| ≤ ||x|| + ||y|| for all x and y in V.

Property (3) is known as the Triangle Inequality.

Note:

||-x|| = ||x||,
| ||x|| - ||y|| | ≤ ||x - y||.

It is simple to verify that the following are vector norms:

(a) ||x||_1 = |x1| + |x2| + ... + |xn|  (sum norm or one norm)

(b) ||x||_2 = √(x1^2 + x2^2 + ... + xn^2)  (Euclidean norm or two norm)

(c) ||x||_∞ = max_i |xi|  (infinity norm or maximum norm)

In general, if p is a real number greater than or equal to 1, the p-norm or Hölder norm is defined by

||x||_p = (|x1|^p + ... + |xn|^p)^{1/p}.

Example 1.7.1

Let x = (1, 1, -2)^T. Then

||x||_1 = 4,
||x||_2 = √(1^2 + 1^2 + (-2)^2) = √6,
||x||_∞ = 2.

An important property of the Hölder norm is the Hölder inequality:

|x^T y| ≤ ||x||_p ||y||_q, where 1/p + 1/q = 1.

A special case of the Hölder inequality is the Cauchy-Schwarz inequality:

|x^T y| ≤ ||x||_2 ||y||_2;

that is,

| Σ_{j=1}^{n} xj yj | ≤ √(Σ_{j=1}^{n} xj^2) √(Σ_{j=1}^{n} yj^2).
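These norms are trivial to compute; the following Python sketch (NumPy assumed) verifies Example 1.7.1 and spot-checks the Cauchy-Schwarz inequality for one arbitrarily chosen y:

    import numpy as np

    x = np.array([1.0, 1.0, -2.0])

    print(np.linalg.norm(x, 1))        # 4.0
    print(np.linalg.norm(x, 2))        # sqrt(6), about 2.449
    print(np.linalg.norm(x, np.inf))   # 2.0

    # Cauchy-Schwarz: |x^T y| <= ||x||_2 * ||y||_2 for a sample y.
    y = np.array([3.0, -1.0, 2.0])
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)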

Equivalence Property of Vector Norms

All vector norms are equivalent in the sense that there exist positive constants α and β such that

α ||x||_μ ≤ ||x||_ν ≤ β ||x||_μ

for all x. For the 2, 1, and ∞ norms, we can compute α and β easily:

||x||_2 ≤ ||x||_1 ≤ √n ||x||_2,
||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞,
||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞.

1.7.2 Matrix Norms

Let A be an m x n matrix. Then, analogous to the vector norm, we define a matrix norm ||A|| with the following properties:

1. ||A|| > 0; ||A|| = 0 if and only if A is the zero matrix.

2. ||αA|| = |α| ||A|| for any scalar α.

3. ||A + B|| ≤ ||A|| + ||B||.

4. ||AB|| ≤ ||A|| ||B||

for all A and B of compatible dimensions.

Subordinate Matrix Norms

Given a matrix A and a vector norm || . ||_p, the nonnegative number defined by

||A||_p = max_{x≠0} ||Ax||_p / ||x||_p

satisfies all the properties of a matrix norm. This norm is called the matrix norm subordinate to the vector norm.

A very useful and frequently used property of a subordinate matrix norm (we shall sometimes call it the p-norm of a matrix A) is

||Ax||_p ≤ ||A||_p ||x||_p.

This property follows easily from the definition: for any particular nonzero vector x,

||A||_p ≥ ||Ax||_p / ||x||_p,

and multiplying both sides by ||x||_p gives the inequality.

The two easily computable p-norms are:

||A||_1 = max_{1≤j≤n} Σ_{i=1}^{m} |aij|  (maximum column-sum norm)

||A||_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |aij|  (maximum row-sum norm)

Example 1.7.2

    ( 1 -2)
A = ( 3  4)
    (-5  6)

||A||_1 = 12,
||A||_∞ = 11.

Another useful p-norm is the spectral norm:

Definition 1.7.1

||A||_2 = √(maximum eigenvalue of A^T A).

Example 1.7.3

    (2 5)
A = (1 3),

        ( 5 13)
A^T A = (13 34).

The eigenvalues of A^T A are 0.0257 and 38.9743. Thus

||A||_2 = √38.9743 = 6.2429.
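A numerical check of Examples 1.7.2 and 1.7.3 (Python sketch, NumPy assumed):

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [3.0, 4.0],
                  [-5.0, 6.0]])

    print(np.linalg.norm(A, 1))        # 12.0, the maximum column sum
    print(np.linalg.norm(A, np.inf))   # 11.0, the maximum row sum

    B = np.array([[2.0, 5.0],
                  [1.0, 3.0]])

    # Spectral norm: square root of the largest eigenvalue of B^T B.
    print(np.sqrt(np.linalg.eigvalsh(B.T @ B).max()))   # about 6.2429
    print(np.linalg.norm(B, 2))                          # same value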

The Frobenius Norm

An important matrix norm compatible with the vector norm ||x||_2 is the Frobenius norm:

||A||_F = [ Σ_{j=1}^{n} Σ_{i=1}^{m} |aij|^2 ]^{1/2}.

A matrix norm || . ||_M and a vector norm || . ||_v are compatible if

||Ax||_v ≤ ||A||_M ||x||_v.

Example 1.7.4

    (1 2)
A = (3 4),

||A||_F = √30.

Notes:

1. For the identity matrix I,

||I||_F = √n,

whereas ||I||_1 = ||I||_2 = ||I||_∞ = 1.

2. ||A||_F^2 = trace(A^T A), where trace(A) is defined as the sum of the diagonal entries of A; that is, if A = (aij), then trace(A) = a11 + a22 + ... + ann.

As in the case of vector norms, the matrix norms are also related: there exist scalars α and β such that

α ||A||_μ ≤ ||A||_ν ≤ β ||A||_μ.

In particular, the following inequalities relating various matrix norms are true and are used very frequently in practice.

Theorem 1.7.1 Let A be m x n. Then

(1) (1/√n) ||A||_∞ ≤ ||A||_2 ≤ √m ||A||_∞;

(2) ||A||_2 ≤ ||A||_F ≤ √n ||A||_2;

(3) (1/√m) ||A||_1 ≤ ||A||_2 ≤ √n ||A||_1;

(4) ||A||_2 ≤ √(||A||_1 ||A||_∞).

We prove here inequalities (1) and (2) and leave the rest as exercises.

Proof of (1)

By definition,

||A||_∞ = max_{x≠0} ||Ax||_∞ / ||x||_∞.

From the equivalence property of the vector norms, we have

||Ax||_∞ ≤ ||Ax||_2

and

||x||_2 ≤ √n ||x||_∞, that is, (1/√n) ||x||_2 ≤ ||x||_∞.

It therefore follows that

||Ax||_∞ / ||x||_∞ ≤ √n ||Ax||_2 / ||x||_2,

so

max_{x≠0} ||Ax||_∞ / ||x||_∞ ≤ √n max_{x≠0} ||Ax||_2 / ||x||_2 = √n ||A||_2,

i.e.,

(1/√n) ||A||_∞ ≤ ||A||_2.

The first part is proved. To prove the second part, we again use the definition of ||A||_2 and the appropriate equivalence properties of the vector norms:

||A||_2 = max_{x≠0} ||Ax||_2 / ||x||_2,

||Ax||_2 ≤ √m ||Ax||_∞,

||x||_∞ ≤ ||x||_2.

Thus,

||Ax||_2 / ||x||_2 ≤ √m ||Ax||_∞ / ||x||_∞,

and taking the maximum over x ≠ 0 on both sides gives ||A||_2 ≤ √m ||A||_∞.

We prove (2) using a different technique. Recall that

||A||_F^2 = trace(A^T A).

Since A^T A is symmetric, there exists an orthogonal matrix O such that

O^T (A^T A) O = D = diag(d1, ..., dn).

Now, the trace is invariant under a similarity transformation (Exercise). We then have

trace(A^T A) = trace(D) = d1 + ... + dn.

Let dk = max_i(di). Then, since d1, ..., dn are also the eigenvalues of A^T A, we have

||A||_2^2 = dk.

Thus,

||A||_F^2 = trace(A^T A) = d1 + ... + dn ≥ dk = ||A||_2^2,

and, on the other hand (noting that each di ≥ 0),

||A||_F^2 = d1 + ... + dn ≤ dk + dk + ... + dk = n dk.

That is, ||A||_F^2 ≤ n dk = n ||A||_2^2, so ||A||_F ≤ √n ||A||_2.

1.7.3 Convergence of Vector and Matrix Sequences

A sequence of vectors v^(1), v^(2), ... is said to converge to the vector v if

lim_{k→∞} v_i^(k) = v_i, i = 1, ..., n.

Similarly, a sequence of matrices A^(1), A^(2), ... is said to converge to the matrix A if

lim_{k→∞} a_ij^(k) = a_ij, i, j = 1, 2, ..., n;

in that case we write

lim_{k→∞} A^(k) = A.

We now state, without proof, necessary and sufficient conditions for the convergence of vector and matrix sequences. The proofs can be easily worked out.

Theorem 1.7.2 The sequence v^(1), v^(2), ... converges to v if and only if, for any vector norm,

lim_{k→∞} ||v^(k) - v|| = 0.

Theorem 1.7.3 The sequence of matrices A^(1), A^(2), ... converges to the matrix A if and only if, for any matrix norm,

lim_{k→∞} ||A^(k) - A|| = 0.

We now state and prove a result on the convergence of the sequence of powers of a matrix to the zero matrix.

Theorem 1.7.4 The sequence A, A^2, A^3, ... of the powers of the matrix A converges to the zero matrix if and only if |λi| < 1 for each eigenvalue λi of A.

Proof. For every n x n matrix A, there exists a nonsingular matrix T such that

             (J1      0 )
T^{-1} A T = (   ...    ) = J,
             (0       Jr)

where each Ji has the form

     (λi  1          0)
     (    λi  ...     )
Ji = (        ...   1 )
     (0            λi).

The above form is called the Jordan Canonical Form of A, and the diagonal blocks Ji are called Jordan matrices. It is an easy computation to see that Ji^k is upper triangular with λi^k on the diagonal, k λi^{k-1} on the first superdiagonal, C(k,2) λi^{k-2} on the second superdiagonal, and so on, where C(k,j) denotes the binomial coefficient "k choose j". From this we see that Ji^k → 0 if and only if |λi| < 1. Since A^k = T J^k T^{-1}, it follows that lim_{k→∞} A^k = 0 if and only if |λi| < 1 for each i.

Definition 1.7.2 A matrix A is called a convergent matrix if A^k → 0 as k → ∞.
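Theorem 1.7.4 is easy to observe numerically; in the following Python sketch (NumPy assumed), the matrices A and B are arbitrary illustrative choices:

    import numpy as np

    A = np.array([[0.5, 0.4],
                  [0.1, 0.3]])      # spectral radius < 1

    print(max(abs(np.linalg.eigvals(A))))   # about 0.62, so A is convergent

    Ak = np.linalg.matrix_power(A, 50)
    print(np.linalg.norm(Ak))               # essentially zero

    B = np.array([[1.1, 0.0],
                  [0.0, 0.5]])      # eigenvalue 1.1 > 1
    print(np.linalg.norm(np.linalg.matrix_power(B, 50)))  # grows without bound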

We now prove a sufficient condition for a matrix A to be a convergent matrix in terms of a norm of the matrix A. We first prove the following result.

A Relationship between Norms and Eigenvalues

Theorem 1.7.5 Let λ be an eigenvalue of a matrix A. Then for any subordinate matrix norm,

|λ| ≤ ||A||.

Proof. By definition, there exists a nonzero vector x such that

Ax = λx.

Taking the norm of each side, we have

||Ax|| = ||λx|| = |λ| ||x||.

However, ||Ax|| ≤ ||A|| ||x||, so |λ| ||x|| = ||Ax|| ≤ ||A|| ||x||, giving |λ| ≤ ||A||.

Definition 1.7.3 The quantity ρ(A) defined by

ρ(A) = max_i |λi|

is called the spectral radius of A. Thus Theorem 1.7.5 says that

ρ(A) ≤ ||A||

for any subordinate matrix norm.

Corollary. A is a convergent matrix if ||A|| < 1 for some subordinate matrix norm; for then ρ(A) ≤ ||A|| < 1, so |λi| < 1 for each eigenvalue λi, and Theorem 1.7.4 applies.

Convergence of an Infinite Matrix Series

Theorem 1.7.6 The matrix series

I + A + A^2 + ...

converges if and only if A is a convergent matrix, and in that case the sum of the series is (I - A)^{-1}.

Proof. For the series to converge, A^k must approach the zero matrix as k approaches infinity. Thus the necessity is obvious.

Next, let A be a convergent matrix; that is, A^k → 0 as k → ∞. Then from Theorem 1.7.4, we must have |λi| < 1 for each eigenvalue λi of A. This means that the matrix (I - A) is nonsingular, because the eigenvalues of (I - A) are 1 - λ1, 1 - λ2, ..., 1 - λn, and |λi| < 1 implies that none of them is zero. (Recall that a matrix is nonsingular if and only if its eigenvalues are nonzero.) Thus, from the identity

(I - A)(I + A + A^2 + ... + A^k) = I - A^{k+1},

we have

(I + A + A^2 + ... + A^k) = (I - A)^{-1} - (I - A)^{-1} A^{k+1}.

Since A is a convergent matrix,

A^{k+1} → 0 as k → ∞,

and the partial sums therefore converge to (I - A)^{-1}.

1.7.4 Norms and Inverses

While analyzing the errors in an algorithm, we sometimes need to know, given a nonsingular matrix A, how much it can be perturbed so that the perturbed matrix remains nonsingular, and how to estimate the error in the inverse of the perturbed matrix.

We start with the identity matrix. In the following, || . || is a matrix norm for which ||I|| = 1.

Theorem 1.7.7 Let ||E|| < 1. Then (I - E) is nonsingular and

||(I - E)^{-1}|| ≤ (1 - ||E||)^{-1}.

Proof. Let λ1, ..., λn be the eigenvalues of E. Since ρ(E) ≤ ||E|| < 1, we have |λi| < 1 for each i. Thus, none of the quantities 1 - λ1, 1 - λ2, ..., 1 - λn, which are the eigenvalues of I - E, is zero. This proves that I - E is nonsingular. (Note that a matrix is nonsingular if and only if all its eigenvalues are nonzero.)

To prove the second part, we write

(I - E)^{-1} = I + E + E^2 + ...

Since ||E|| < 1,

lim_{k→∞} E^k = 0,

so the series on the right side is convergent. Taking norms on both sides, we have

||(I - E)^{-1}|| ≤ ||I|| + ||E|| + ||E||^2 + ... = (1 - ||E||)^{-1} (since ||I|| = 1).

(Note that 1 + x + x^2 + ... = 1/(1 - x) if and only if |x| < 1.)

Theorem 1.7.8 If ||E|| < 1, then

||(I - E)^{-1} - I|| ≤ ||E|| / (1 - ||E||).

Proof. We use the identity

A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}.

Setting B = I and A = I - E, we have

(I - E)^{-1} - I = (I - E)^{-1} E.

(Note that I^{-1} = I.) Taking norms on both sides yields

||(I - E)^{-1} - I|| ≤ ||(I - E)^{-1}|| ||E||,

and by Theorem 1.7.7,

||(I - E)^{-1}|| ≤ (1 - ||E||)^{-1},

which gives the result.

Implication of the Result

If the matrix E is very small, then 1 - ||E|| is close to unity. Thus the above result implies that if we invert a slightly perturbed identity matrix, then the error in the inverse of the perturbed matrix does not exceed the order of the perturbation.

Theorem 1.7.9 Let A be nonsingular and let ||A^{-1}E|| < 1. Then A - E is nonsingular, and

||A^{-1} - (A - E)^{-1}|| / ||A^{-1}|| ≤ ||A^{-1}E|| / (1 - ||A^{-1}E||).

Proof. Write

A - E = A(I - A^{-1}E).

Since ||A^{-1}E|| < 1, from Theorem 1.7.7 we have that I - A^{-1}E is nonsingular. Thus A - E, which is the product of two nonsingular matrices, is nonsingular. Using the identity

A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}

with B = A - E gives

A^{-1} - (A - E)^{-1} = -A^{-1}E(A - E)^{-1},

so

||A^{-1} - (A - E)^{-1}|| ≤ ||A^{-1}E|| ||(A - E)^{-1}||.

Now, since

B = A - (A - B) = A[I - A^{-1}(A - B)],

we have

B^{-1} = [I - A^{-1}(A - B)]^{-1} A^{-1}.

(Note that (XY)^{-1} = Y^{-1}X^{-1}.) If we now substitute B = A - E, we then have

(A - E)^{-1} = [I - A^{-1}E]^{-1} A^{-1}.

Taking norms, we get

||(A - E)^{-1}|| ≤ ||A^{-1}|| ||(I - A^{-1}E)^{-1}||.

But from Theorem 1.7.7, we know that

||(I - A^{-1}E)^{-1}|| ≤ (1 - ||A^{-1}E||)^{-1}.

So we have

||(A - E)^{-1}|| ≤ ||A^{-1}|| / (1 - ||A^{-1}E||).

Combining this with the earlier inequality,

||A^{-1} - (A - E)^{-1}|| ≤ ||A^{-1}E|| ||A^{-1}|| / (1 - ||A^{-1}E||),

or

||A^{-1} - (A - E)^{-1}|| / ||A^{-1}|| ≤ ||A^{-1}E|| / (1 - ||A^{-1}E||).

The above result states that if ||A^{-1}E|| is small and much less than unity, then the relative error in (A - E)^{-1} is bounded essentially by ||A^{-1}E||.
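Theorem 1.7.9 can be tested numerically as well. In the following Python sketch (NumPy assumed), A and E are arbitrary random choices with E small enough that ||A^{-1}E||_2 < 1:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    E = 1e-6 * rng.standard_normal((5, 5))   # small perturbation of A

    Ainv = np.linalg.inv(A)
    t = np.linalg.norm(Ainv @ E, 2)          # ||A^{-1}E||_2, tiny here

    rel_err = (np.linalg.norm(Ainv - np.linalg.inv(A - E), 2)
               / np.linalg.norm(Ainv, 2))
    bound = t / (1.0 - t)                    # bound of Theorem 1.7.9

    print(rel_err, "<=", bound)              # the bound holds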

1.8 Norm Properties of Orthogonal Matrices

We conclude the chapter by listing some very useful norm properties of orthogonal matrices that are often used in practice.

Theorem 1.8.1 Let O be an orthogonal matrix. Then

||O||_2 = 1.

Proof. By definition,

||O||_2 = √(ρ(O^T O)) = √(ρ(I)) = 1.

Theorem 1.8.2 Let O be an orthogonal matrix. Then

||AO||_2 = ||A||_2.

Proof.

||AO||_2 = √(ρ(O^T A^T A O)) = √(ρ(A^T A)) = ||A||_2.

(Note that the spectral radius is invariant under a similarity transformation; see Chapter 8.)

Theorem 1.8.3

||AO||_F = ||A||_F.

Proof. ||AO||_F^2 = trace(O^T A^T A O) = trace(A^T A) = ||A||_F^2.

1.9 Review and Summary

The very basic concepts that will be required for smooth reading of the rest of the book have been briefly summarized in this chapter. The most important ones are:

1. Special Matrices: Diagonal, triangular, orthogonal, permutation, Hessenberg, tridiagonal, diagonally dominant, and positive definite matrices have been defined and important properties discussed.

2. Vector and Matrix Norms: Some important matrix norms are the row-sum norm, the column-sum norm, the Frobenius norm, and the spectral norm.

A result on the relationship between different matrix norms is stated and proved in Theorem 1.7.1.

Of special importance is the norm property of orthogonal matrices. Three simple but important results have been stated and proved in Section 1.8 (Theorems 1.8.1, 1.8.2, and 1.8.3). These results say that

(i) the spectral norm of an orthogonal matrix is 1, and

(ii) the spectral and the Frobenius norms remain invariant under multiplication by an orthogonal matrix.

3. Convergence of a Matrix Sequence: The notion of the convergence of the sequence of matrix powers {A^k} is important in the study of convergence of iterative methods for linear systems. The most important results in this context are:

(i) The sequence {A^k} converges to the zero matrix if and only if |λi| < 1 for each eigenvalue λi of A (Theorem 1.7.4).

(ii) The sequence {A^k} converges to the zero matrix if ||A|| < 1 (Corollary to Theorem 1.7.5).

4. Norms and Inverses: If a nonsingular matrix A is perturbed by a matrix E, it is sometimes of interest to know if the perturbed matrix remains nonsingular and how to estimate the error in its inverse. Three theorems (Theorems 1.7.7, 1.7.8, and 1.7.9) are proved in this context in Section 1.7.4. These results will play an important role in the perturbation analysis of linear systems (Chapter 6).

1.10 Suggestions for Further Reading

The material covered in this chapter can be found in any standard book on linear algebra and matrix theory. In particular, we suggest the following books for further reading:

1. Matrix Theory by Joel N. Franklin, Prentice Hall, Englewood Cliffs, NJ, 1968.

2. Linear Algebra with Applications by Steven J. Leon, Macmillan, New York, 1986.

3. Linear Algebra and Its Applications (Second Edition) by Gilbert Strang, Academic Press, New York, 1980.

4. Introduction to Linear Algebra by Gilbert Strang, Wellesley-Cambridge Press, 1993.

5. The Algebraic Eigenvalue Problem by James H. Wilkinson, Clarendon Press, Oxford, 1965 (Chapter 1).

6. Matrix Analysis by Roger Horn and Charles Johnson, Cambridge University Press, 1985.

7. The Theory of Matrices by Peter Lancaster, Academic Press, New York, 1969.

8. The Theory of Matrices with Applications (Second Edition) by Peter Lancaster and M. Tismenetsky, Academic Press, New York, 1985.

9. The Theory of Matrices in Numerical Analysis by A. S. Householder, Dover Publications, Inc., New York, 1964.

10. Matrices and Linear Algebra by Hans Schneider and George Philip Barker, Dover Publications, Inc., New York, 1989.

11. Linear Algebra and Its Applications by David C. Lay, Addison-Wesley, New York, 1994.

12. Elementary Linear Algebra with Applications by Richard O. Hill, Jr., Harcourt Brace Jovanovich, 1991.

Exercises on Chapter 1

PROBLEMS ON SECTIONS 1.2 AND 1.3

1. Prove that

(a) a set of n linearly independent vectors in R^n is a basis for R^n.

(b) the set {e1, e2, ..., en} is a basis of R^n.

(c) a set of m vectors in R^n, where m > n, is linearly dependent.

(d) any two bases in a vector space V have the same number of vectors.

(e) dim(R^n) = n.

(f) span{v1, ..., vn} is a subspace of V, where span{v1, ..., vn} is the set of linear combinations of the n vectors v1, ..., vn from a vector space V.

(g) span{v1, ..., vn} is the smallest subspace of V containing v1, ..., vn.

2. Prove that if S = {s1, ..., sk} is an orthogonal set of nonzero vectors, then S is linearly independent.

3. Let S be an m-dimensional subspace of R^n. Then prove that S has an orthonormal basis.

(Hint: Let {v1, ..., vm} be a basis of S. Define a set of vectors {uk} by

u1 = v1 / ||v1||,

u_{k+1} = v'_{k+1} / ||v'_{k+1}||,

where

v'_{k+1} = v_{k+1} - (v_{k+1}^T u1)u1 - (v_{k+1}^T u2)u2 - ... - (v_{k+1}^T uk)uk, k = 1, 2, ..., m - 1.

Then show that {u1, ..., um} is an orthonormal basis of S.) This is the classical Gram-Schmidt process.

4. Using the Gram-Schmidt process construct an orthonormal basis of R^3.

5. Construct an orthonormal basis of R(A), where

    (1 2)
A = (2 3).
    (4 5)

6. Let S1 and S2 be two subspaces of R^n. Then prove that

dim(S1 + S2) = dim(S1) + dim(S2) - dim(S1 ∩ S2).

8. Prove that

(a) null(A) = 0 if and only if A has linearly independent columns.

(b) rank(A) = rank(A^T).

(c) rank(A) + null(A) = n.

(d) if A is m x n and m < n, then rank(A) ≤ m.

(e) if A and B are m x n and n x p matrices, then

rank(AB) ≤ min{rank(A), rank(B)}.

(f) the rank of a matrix remains unchanged when the matrix is multiplied by an invertible matrix.

(g) if B = UAV, where U and V are invertible, then rank(B) = rank(A).

(h) N(A) = R(A^T)^⊥ and R(A)^⊥ = N(A^T).

9. Let A be m x n. Then A has rank 1 if and only if A can be written as A = ab^T, where a and b are column vectors.

10. Prove the following basic facts on nonsingularity and the inverse of A:

(a) (A^{-1})^{-1} = A

(b) (A^T)^{-1} = (A^{-1})^T

(c) (cA)^{-1} = (1/c) A^{-1}, where c is a nonzero scalar

(d) (AB)^{-1} = B^{-1} A^{-1}

11. Suppose a matrix A can be written as

A = LU,

where L is a lower triangular matrix with 1's along the diagonal and U = (uij) is an upper triangular matrix. Prove that

det A = Π_{i=1}^{n} uii.

12. Let

    (A1  0)
A = (A2 A3),

where A1 and A3 are square. Prove that det(A) = det(A1) det(A3).

13. Suppose a symmetric matrix A can be written as

A = LDL^T,

where L is lower triangular with 1's as diagonal entries and D = diag(d11, ..., dnn) is a diagonal matrix. Prove that the leading principal minors (determinants) of A are d11, d11 d22, ..., d11 d22 ... dnn.

14. (a) Show that if PS is the orthogonal projection onto S, then I - PS is the orthogonal projection onto S^⊥.

(b) Prove that

i. PA = A(A^T A)^{-1} A^T.
ii. PN = I - A(A^T A)^{-1} A^T.
iii. ||PA||_2 = 1.

(c) Prove that

i. bR = PA b
ii. bN = PN b

15. (a) Find PA and PN for the matrices

    (1 2)        (   1        1   )
A = (2 3),   A = (10^{-4}     0   ).
    (0 0)        (   0     10^{-4})

(b) For the vector b = (1, 0, 1)^T, find bR and bN for each of the above matrices.

(c) Find an orthonormal basis for each of the above matrices using the Gram-Schmidt process and then find PA, PN, bR, and bN. For a description of the Gram-Schmidt algorithm, see Chapter 7 or problem #3 of this chapter.

16. Let A be an m x n matrix with rank r. Consider the singular value decomposition of A:

A = U Σ V^T, where U = (Ur, Ur-hat) and V = (Vr, Vr-hat).

Then prove that

(a) Vr Vr^T is the orthogonal projection onto N(A)^⊥ = R(A^T).

(b) Ur Ur^T is the orthogonal projection onto R(A).

(c) (Ur-hat)(Ur-hat)^T is the orthogonal projection onto R(A)^⊥ = N(A^T).

(d) (Vr-hat)(Vr-hat)^T is the orthogonal projection onto N(A).

17. (Distance between two subspaces.) Let S1 and S2 be two subspaces of R^n such that dim(S1) = dim(S2). Let P1 and P2 be the orthogonal projections onto S1 and S2, respectively. Then ||P1 - P2||_2 is defined to be the distance between S1 and S2. Prove that the distance between S1 and S2 is dist(S1, S2) = sin(θ), where θ is the angle between S1 and S2.

18. Prove that if PS is an orthogonal projection onto S, then I - 2PS is an orthogonal matrix.

19. Prove the following.

(a) The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix. In general, if A = (aij) is an upper (lower) triangular matrix, then p(A), where p(x) is a polynomial, is an upper (lower) triangular matrix whose diagonal elements are p(aii), i = 1, ..., n.

(b) The inverse of a nonsingular lower (upper) triangular matrix is another lower (upper) triangular matrix, whose diagonal entries are the reciprocals of the diagonal entries of the triangular matrix.

(c) The determinant of a triangular matrix is the product of its diagonal entries.

(d) The eigenvalues of a triangular matrix are its diagonal entries.

(e) If A ∈ R^{n x n} is strictly upper triangular, then A^n = 0. (A = (aij) is strictly upper triangular if A is upper triangular and aii = 0 for each i.)

(f) The inverse of a nonsingular matrix A can be written as a polynomial in A (use the Cayley-Hamilton Theorem).

20. Prove that the product of an upper Hessenberg matrix and an upper triangular matrix is an upper Hessenberg matrix.

21. Prove that a symmetric Hessenberg matrix is symmetric tridiagonal.

22. A square matrix A = (aij) is a band matrix of bandwidth 2k + 1 if |i - j| > k implies that aij = 0. What are the bandwidths of tridiagonal and pentadiagonal matrices? Is the product of two banded matrices having the same bandwidth a banded matrix of the same bandwidth? Give reasons for your answer.

23. (a) Show that the matrix

H = I - 2 (uu^T)/(u^T u),

where u is a nonzero vector, is orthogonal and symmetric. (The matrix H is called a Householder matrix.)

(b) Show that the matrix

    ( c  s)
J = (-s  c),  where c^2 + s^2 = 1,

is orthogonal. (The matrix J is called a Givens matrix.)

(c) Prove that the product of two orthogonal matrices is an orthogonal matrix.

(d) Prove that a triangular matrix that is orthogonal is diagonal.

24. Let A and B be two symmetric matrices.

(a) Prove that (A + B) is symmetric.

(b) Prove that AB is not necessarily symmetric. Derive a condition under which AB is symmetric.

(c) If A and B are symmetric positive definite, prove that (A + B) is positive definite. Is AB also positive definite? Give reasons for your answer. When is (A - B) symmetric positive definite?

25. Let A = (aij) be an n x n symmetric positive definite matrix. Prove the following.

(a) Each diagonal entry of A must be positive.

(b) A is nonsingular.

(c) (aij)^2 < aii ajj for i, j = 1, ..., n, i ≠ j.

(d) The largest element of the matrix must lie on the diagonal.

26. Let A be a symmetric positive definite matrix and x be a nonzero n-vector. Prove that A + xx^T is positive definite.

27. Prove that a diagonally dominant matrix is nonsingular, and that a diagonally dominant symmetric matrix with positive diagonal entries is positive definite.

28. Prove that if the eigenvalues of a matrix A are all distinct, then A is nonderogatory.

29. Prove that a symmetric matrix A is positive definite if and only if A^{-1} exists and is positive definite.

30. Let A be an m x n matrix (m ≥ n) having full rank. Then prove that A^T A is positive definite.

31. Prove the following basic facts on eigenvalues and eigenvectors.

(a) A matrix A is nonsingular if and only if A does not have a zero eigenvalue. (Hint: det A = λ1 λ2 ... λn.)

(b) The eigenvalues of A^T and A are the same.

(c) If two matrices have the same eigenvalues, they need not be similar (construct an example to show this).

(d) A symmetric matrix is positive definite if and only if all its eigenvalues are positive.

(e) The eigenvalues of a triangular matrix are its diagonal elements.

(f) The eigenvalues of a unitary (orthogonal) matrix have moduli 1.

(g) The eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal.

(h) Let A be a symmetric matrix and let Q be orthogonal such that QAQ^T is diagonal. Then show that the columns of Q^T are the eigenvectors of A.

32. Let

    (c_{n-1} c_{n-2} ... c1 c0)
    (   1       0    ...  0  0)
C = (   0       1    ...  0  0)
    (  ...             ...   )
    (   0       0    ...  1  0)

Then show that

(a) the matrix V defined by

    (λ1^{n-1} λ2^{n-1} ... λn^{n-1})
    (λ1^{n-2} λ2^{n-2} ... λn^{n-2})
V = (  ...       ...    ...   ... ),
    (   λ1       λ2    ...    λn  )
    (   1        1     ...    1   )

where λ1, ..., λn are the eigenvalues of C with λi ≠ λj for i ≠ j, is such that V^{-1} C V = diag(λi), i = 1, ..., n.

(b) The eigenvector xi corresponding to the eigenvalue λi of C is given by xi^T = (λi^{n-1}, λi^{n-2}, ..., λi, 1).

33. Let H be an unreduced upper Hessenberg matrix. Let X = (x1, ..., xn) be defined by

x1 = e1 = (1, 0, ..., 0)^T, x_{i+1} = H xi, i = 1, 2, ..., n - 1.

Then prove that X is nonsingular and X^{-1} H X is a companion matrix (an upper Hessenberg companion matrix).

34. What are the singular values of a symmetric matrix? What are the singular values of a symmetric positive definite matrix? Prove that a square matrix A is nonsingular if and only if it has no zero singular value.

35. Prove that

(a) trace(AB) = trace(BA).

(b) trace(AA^T) = Σ_{i=1}^{m} Σ_{j=1}^{n} aij^2, where A = (aij) is a real m x n matrix.

(c) trace(A + B) = trace(A) + trace(B).

(d) trace(TAT^{-1}) = trace(A).

36. Show that ||x||_1, ||x||_∞, ||x||_2 (as defined in Section 1.7 of the book) are vector norms.

37. Show that, if x and y are two vectors, then

| ||x|| - ||y|| | ≤ ||x - y|| ≤ ||x|| + ||y||.

38. Prove the following:

(a) |x^T y| ≤ ||x||_2 ||y||_2 (Cauchy-Schwarz inequality)

(b) ||xy^T||_2 ≤ ||x||_2 ||y||_2 (Schwarz inequality)

39. Let x and y be two orthogonal vectors. Then prove that

||x + y||_2^2 = ||x||_2^2 + ||y||_2^2.

40. Prove that for any vector x, we have

||x||_∞ ≤ ||x||_2 ≤ ||x||_1.

42. Let A = (aij) be m x n. Define ||A||_L = max_{i,j} |aij|. Is ||A||_L a matrix norm? Give reasons for your answer.

43. (a) Prove that vector length is preserved by orthogonal matrix multiplication. That is, if x ∈ R^n and Q ∈ R^{n x n} is orthogonal, then ||Qx||_2 = ||x||_2 (Isometry Lemma).

(b) Is the statement in part (a) true if the one and infinity norms are used? Give reasons. What if the Frobenius norm is used?

44. Prove that ||I||_2 = 1. Prove that ||I||_F = √n.

45. Let Q and P be orthogonal matrices. Prove that

(a) ||QAP||_F = ||A||_F

(b) ||QAP||_2 = ||A||_2

46. Prove that the spectral norm of a symmetric matrix is the same as its spectral radius.

47. Let A ∈ R^{n x n} be nonsingular and let x, y, and z be n-vectors such that Ax = b and Ay = b + z. Then prove that

||z||_2 / ||A||_2 ≤ ||x - y||_2 ≤ ||A^{-1}||_2 ||z||_2.

48. Prove that ||A||_2 ≤ ||A||_F (use the Cauchy-Schwarz inequality).

49. Prove that ||A||_2 is just the largest singular value of A. How is ||A^{-1}||_2 related to a singular value of A?

50. Prove that ||A^T A||_2 = ||A||_2^2.

51. Prove that

||AB||_F ≤ ||A||_F ||B||_2.

52. Let A = (a1, ..., an), where aj is the jth column of A. Then prove that

||A||_F^2 = Σ_{i=1}^{n} ||ai||_2^2.

53. Let A and A + E be nonsingular. Then prove that

||(A + E)^{-1} - A^{-1}|| ≤ ||E|| ||A^{-1}|| ||(A + E)^{-1}||.

54. Let A ∈ R^{m x n} have full rank. Then prove that A + E also has full rank if E is such that ||E||_2 < 1/||A†||_2, where A† = (A^T A)^{-1} A^T.

55. Prove that the matrices

    (1 2 3)       (  1    1    1 )       (1 2 3)       (0 1 0)
A = (2 3 4),  A = ( 1/2  1/3  1/4),  A = (0 5 4),  A = (0 0 1)
    (4 5 6)       (  1    1    1 )       (0 0 1)       (1 2 3)

are not convergent matrices.

56. Construct a simple example where the norm test for convergent matrices fails, but the matrix is still convergent.

57. Prove that the series (I + A + A^2 + ...) converges if ||B|| < 1, where B = PAP^{-1}. What is the implication of this result? Construct a simple example to see the usefulness of the result in practical computations. (For details, see Wilkinson AEP, p. 60.)

2. FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS

2.1 Floating Point Number Systems ... 53
2.2 Rounding Errors ... 56
2.3 Laws of Floating Point Arithmetic ... 59
2.4 Addition of n Floating Point Numbers ... 63
2.5 Multiplication of n Floating Point Numbers ... 65
2.6 Inner Product Computation ... 66
2.7 Error Bounds for Floating Point Matrix Computations ... 69
2.8 Round-off Errors Due to Cancellation and Recursive Computations ... 71
2.9 Review and Summary ... 75
2.10 Suggestions for Further Reading ... 77

CHAPTER 2

FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS

2.1 Floating Point Number Systems

Because of limited storage capacity, a real number may or may not be represented exactly on a computer. Thus, while using a computer, we have to deal with approximations of the real number system using finite computer representations. This chapter will be confined to the study of the arithmetic of such approximate numbers. In particular, we will examine the widely accepted IEEE standard for binary floating-point arithmetic (IEEE 1985).

A nonzero normalized floating point number in base 2 has the form

(-1)^s d1.d2d3 ... dt x 2^e,

that is,

x = ±.d1d2 ... dt x 2^e, or x = ±.r x 2^e,

where e is the exponent, r is the significand, d2d3 ... dt is called the fraction, t is the precision, and (-1)^s is the sign. (Note that t is finite.) For a normalized number,

d1 = 1; di = 0 or 1, 2 ≤ i ≤ t.

Three parameters specify all numerical values that can be represented: the precision t, and L and U, the minimum and maximum exponents. The numbers L and U vary among computers, even those that adhere to the IEEE standard, since the standard recommends only minimums. As an example, the standard recommends, for single precision, that t = 24, L = -126, and U = 127. The recommendation for double precision is t = 53, L = -1022, and U = 1023.

Consider the layout of a 32-bit word:

| s (1 bit) | e (8 bits) | f (23 bits) |

Here s is the sign of the number, e is the field for the exponent, and f is the fraction. Note that for normalized floating point numbers in base 2, it is known that d1 = 1, and d1 can thus be stored implicitly.

The actual storage of the exponent is accomplished by storing the true exponent plus an offset, or bias. The bias is chosen so that the stored e is always nonnegative. The IEEE standard also requires that the unbiased exponent have two reserved values, L - 1 and U + 1. L - 1 is used to encode 0 and denormalized numbers (i.e., those for which d1 ≠ 1). U + 1 is used to encode ±∞ and nonnumbers, such as (+∞) + (-∞), which are denoted by NaN.

Note that for the single precision example given above, the bias is 127. Thus, if the biased exponent is 255, then ±∞ or a NaN is inferred. Likewise, if the biased exponent is 0, then 0 or a denormalized number is inferred. The standard specifies how to determine the different cases for the special situations. It is not important here to go into such detail. Curious readers should consult the reference (IEEE 1985).

From the discussion above, one sees that the IEEE standard for single precision provides approximately 7 decimal digits of accuracy, since 2^{-23} ≈ 1.2 x 10^{-7}. Similarly, double precision provides approximately 16 decimal digits of accuracy (2^{-52} ≈ 2.2 x 10^{-16}).

There is also an IEEE standard for floating point numbers which are not necessarily of base 2. By allowing one to choose the base β, we see that the set of all floating point numbers, called the Floating Point Number System, is thus characterized by four parameters:

β : the number base;
t : the precision;
L, U : lower and upper limits of the exponent.

This set has exactly

2(β - 1)β^{t-1}(U - L + 1) + 1

numbers in it.

We denote the set of normalized floating point numbers of precision t by Ft. The set Ft is NOT closed under arithmetic operations; that is, the sum, difference, product, or quotient of two floating point numbers in Ft is not necessarily a number in Ft. To see this, consider the simple example in the floating point system with β = 10, t = 3, L = -1, U = 2:

a = 11.2 = .112 x 10^2,
b = 1.13 = .113 x 10^1;
a + b = 12.33 = .1233 x 10^2, whose fraction requires four digits and which therefore does not belong to Ft.

The above example shows that during the course of a computation, a computed number may very well fall outside of Ft.

There are, of course, two ways a computed number can fall outside the range of Ft: first, the exponent of the number may fall outside the interval [L, U]; second, the fractional part may contain more than t digits (this is exactly what happened in the above example).

If the computations produce an exponent too large (too small) to fit in a given computer, then the situation is called overflow (underflow).

Overflow is a serious problem; for most systems the result of an overflow is ±∞. Underflow is usually considerably less serious. The result of an underflow can be to set the value to zero, a denormalized number, or ±2^L.

Example 2.1.1 Overflow and Underflow

1. Let β = 10, t = 3, L = -3, U = 3, and

a = .111 x 10^3,
b = .120 x 10^3,
c = a x b = .133 x 10^5.

The exponent 5 exceeds U = 3: overflow.

2. Let β = 10, t = 3, L = -2, U = 3, and

a = .1 x 10^{-1},
b = .2 x 10^{-1},
c = a x b = .2 x 10^{-3}.

The exponent -3 is smaller than L = -2: underflow.

Simple mathematical computations, such as finding a square root or the exponential of a number, or computing factorials, can give overflow. For example, consider computing

c = √(a^2 + b^2).

Even when a, b, and the result c are all within range, the intermediate quantities a^2 or b^2 may overflow if a or b is large enough.

The IEEE standard also sets forth the results of operations with infinities and NaNs. All operations with infinities correspond to the limiting case in real analysis. Those ambiguous situations, such as 0 x ∞, result in NaNs, and all binary operations with one or two NaNs result in a NaN.

Computing the Length of a Vector

Overflow and underflow can sometimes be avoided just by organizing the computations differently. Consider, for example, the task of computing the length of an n-vector x with components x1, ..., xn:

||x||_2^2 = x1^2 + x2^2 + ... + xn^2.

If some xi is too big or too small, then we can get overflow or underflow with the usual way of computing ||x||_2. However, if we normalize each component of the vector by dividing it by m = max(|x1|, ..., |xn|) and then form the squares and the sum, then overflow problems can be avoided. Thus, a better way to compute ||x||_2 is:

1. m = max(|x1|, ..., |xn|)
2. yi = xi/m, i = 1, ..., n
3. ||x||_2 = m √(y1^2 + y2^2 + ... + yn^2)
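A direct transcription of this three-step scheme in Python (the function name scaled_norm2 is ours, chosen only for illustration):

    import math

    def scaled_norm2(x):
        """Compute the 2-norm of x, scaling to avoid overflow/underflow."""
        m = max(abs(xi) for xi in x)        # step 1: the scale factor
        if m == 0.0:
            return 0.0                      # the zero vector
        s = sum((xi / m) ** 2 for xi in x)  # step 2: squares of scaled entries
        return m * math.sqrt(s)             # step 3: undo the scaling

    # Naive squaring of 1e200 would overflow, but the scaled version is fine:
    print(scaled_norm2([1e200, 1e200]))      # about 1.414e200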

2.2 Rounding Errors

If a given real number is not machine representable, then there are two ways it can be approximated in the machine. Consider a real number with digits

d1 d2 ... dt dt+1 ...

The first method, chopping, is the method in which the digits from dt+1 on are simply chopped off. The second method is rounding, in which the digits from dt+1 on are not only chopped off, but the digit dt is also rounded up or left unchanged according as dt+1 ≥ β/2 or dt+1 < β/2.

Let fl(x) denote the floating point representation of a real number x.

Example 2.2.1 Rounding

Consider base 10 and let x = 3.141596. Then

t = 2: fl(x) = 3.1
t = 3: fl(x) = 3.14
t = 4: fl(x) = 3.142

We now give an expression to measure the error made in representing a real number x on the computer, and then show how this measure can be used to give bounds for errors in other floating point computations.

Definition 2.2.1 Let x-hat denote an approximation of x. Then there are two ways we can measure the error:

Absolute Error = |x-hat - x|,

Relative Error = |x-hat - x| / |x|, x ≠ 0.

Note that the relative error makes more sense than the absolute error. The following simple example shows this.

Example 2.2.2

Consider

x1 = 1.31, x1-hat = 1.30, and x2 = 0.12, x2-hat = 0.11.

Then

|x1-hat - x1| = |x2-hat - x2| = 0.01,

but

|x1-hat - x1| / |x1| = 0.0076335,
|x2-hat - x2| / |x2| = 0.0833333.

Thus, the relative errors show that x1-hat is closer to x1 than x2-hat is to x2, whereas the absolute errors give no indication of this at all.

The relative error gives an indication of the number of significant digits in an approximate answer. If the relative error is about 10^{-s}, then x and x-hat agree to about s significant digits. More specifically:

Definition 2.2.2 x-hat is said to approximate x to s significant digits if s is the largest nonnegative integer for which the relative error satisfies

|x-hat - x| / |x| < 5 x 10^{-s}.

Thus, in the above examples, x1-hat and x1 agree to about two significant digits, while x2-hat and x2 agree to about only one significant digit.

We now give an expression for the relative error in representing a real number x by its floating point representation fl(x).

Theorem 2.2.1 Let fl(x) denote the floating point representation of a real number x. Then

|fl(x) - x| / |x| ≤ (1/2) β^{1-t} for rounding,
|fl(x) - x| / |x| ≤ β^{1-t} for chopping.    (2.2.1)

Proof. We establish the bound for rounding and leave the other part as an exercise.

Let x be written as

x = (.d1 d2 ... dt dt+1 ...) β^e,

where d1 ≠ 0 and 0 ≤ di < β. When we round off x we obtain one of the following floating point numbers:

x' = (.d1 d2 ... dt) β^e,
x'' = x' + β^{e-t}.

Obviously we have x ∈ [x', x'']. Assume, without any loss of generality, that x is closer to x'. We then have

|x - x'| ≤ (1/2) |x'' - x'| = (1/2) β^{e-t}.

Thus, the relative error satisfies

|x - x'| / |x| ≤ (1/2) β^{e-t} / ((.d1 d2 ...) β^e)
            ≤ (1/2) β^{-t} / β^{-1}   (since .d1 d2 ... ≥ β^{-1}, as d1 ≥ 1)
            = (1/2) β^{1-t}.

Example 2.2.3

Consider the three-digit representation of the decimal number x = 0.2346 (β = 10, t = 3). Then, if rounding is used, we have

fl(x) = 0.235, Relative Error = 0.001705 < (1/2) x 10^{-2}.

Similarly, if chopping is used, we have

fl(x) = 0.234, Relative Error = 0.0025575 < 10^{-2}.

Definition: The number μ in (2.2.1) is called the machine precision or unit round-off error. It is the smallest positive floating point number such that

fl(1 + μ) > 1.

μ is usually between 10^{-16} and 10^{-6} (on most machines), for double and single precision, respectively. For the IBM 360 and 370, β = 16, t = 6, and μ = 4.77 x 10^{-7}.

If the parameters t, β, L, and U for a computer are not known, the following simple FORTRAN program can be run to estimate μ for that computer (Forsythe, Malcolm and Moler (1977), p. 14):

      REAL MEU, MEU1
      MEU = 1.0
C     Halve MEU until adding it to 1.0 no longer changes the result.
   10 MEU = 0.5 * MEU
      MEU1 = MEU + 1.0
      IF (MEU1 .GT. 1.0) GOTO 10
C     MEU now underestimates the unit round-off by at most a factor of 2.

The above FORTRAN program computes an approximation of μ which differs from μ by at most a factor of 2. This approximation is quite acceptable, since an exact value of μ is not that important and is seldom needed.

The book by Forsythe, Malcolm and Moler (CMMC 1977) also contains an extensive list of L and U for various computers.

The bound (2.2.1),

|fl(x) - x| / |x| ≤ μ, where μ = (1/2)β^{1-t} for rounding and μ = β^{1-t} for chopping,

can be written as

fl(x) = x(1 + δ), where |δ| ≤ μ.

2.3 Laws of Floating Point Arithmetic

Assuming that the IEEE standard holds, we can easily derive the following simple laws of floating point arithmetic.

Theorem 2.3.1 Let x and y be two floating point numbers, and let fl(x + y), fl(x - y), fl(xy), and fl(x/y) denote the computed sum, difference, product and quotient. Then

1. fl(x ± y) = (x ± y)(1 + δ), where |δ| ≤ μ.

2. fl(xy) = (xy)(1 + δ), where |δ| ≤ μ.

3. If y ≠ 0, then fl(x/y) = (x/y)(1 + δ), where |δ| ≤ μ.

On computers that do not use the IEEE standard, the following floating point law of addition might hold instead:

4. fl(x + y) = x(1 + δ1) + y(1 + δ2), where |δ1| ≤ μ and |δ2| ≤ μ.

Let β = 10 and t = 3 in examples 1 through 3 below.

1. x = .999 x 10^2, y = .111 x 10^0.

x + y = 100.011 = .100011 x 10^3,
fl(x + y) = .100 x 10^3.

Thus fl(x + y) = (x + y)(1 + δ) with

δ = -1.0999 x 10^{-4}, |δ| < (1/2) x 10^{-2}.

2. With the same x and y,

xy = 11.0889,
fl(xy) = .111 x 10^2.

Thus fl(xy) = xy(1 + δ), where

δ = 1.00100 x 10^{-3}, |δ| ≤ (1/2) x 10^{1-3}.

3. x = 900 = .900 x 10^3, y = .900 x 10^3.

fl(x/y) = 1 = x/y exactly, so δ = 0.

4. Let β = 10, t = 4, and

x = 0.1112, y = .2245 x 10^5.

Then

xy = .249644 x 10^4, fl(xy) = .2496 x 10^4.

Thus |fl(xy) - xy| = .44 and

|δ| = 1.7625 x 10^{-4} < (1/2) x 10^{-3}.

Theorem 2.3.1 and the examples following this theorem show that the relative errors in computing the sum, difference, product and quotient in floating point arithmetic are small. However, there are computers without guard digits, such as the Cybers and the current CRAYs (the CRAY arithmetic is changing), in which additions and subtractions may not be accurate. We describe this aspect in some detail below.

A guard digit is an extra digit on the lower end of the arithmetic register whose purpose is to catch the low order digit which would otherwise be pushed out of existence when the decimal points are aligned. The following examples show the difference between the two models.

Examples of floating point addition

Let β = 10, t = 3, μ = 0.001.

Example 2.3.2 Addition with a Guard Digit

x = 0.101 x 10^2, y = -0.994 x 10^1.

Step 1. Align the exponents (the fourth digit is the guard digit):

x = 0.101 0 x 10^2
y = -0.099 4 x 10^2

Step 2. Add (with the extra digit):

   0.1010 x 10^2
 - 0.0994 x 10^2
 = 0.0016 x 10^2

Step 3. Normalize:

fl(x + y) = 0.160 x 10^0.

Result: fl(x + y) = (x + y)(1 + δ) with δ = 0.

Example 2.3.3 Addition without a Guard Digit

x = 0.101 x 10^2, y = -0.994 x 10^1.

Step 1. Align the exponents (the last digit of y is lost, since there is no guard digit):

x = 0.101 x 10^2
y = -0.099[4] x 10^2

Step 2. Add:

   0.101 x 10^2
 - 0.099 x 10^2
 = 0.002 x 10^2

Step 3. Normalize:

fl(x + y) = 0.200 x 10^0.

Result: fl(x + y) = (x + y)(1 + δ) with δ = 0.25 = 250μ.

Thus, we repeat: for computers with a guard digit,

fl(x ± y) = (x ± y)(1 + δ), |δ| ≤ μ.

However, for those without a guard digit,

fl(x ± y) = x(1 + δ1) ± y(1 + δ2), |δ1| ≤ μ, |δ2| ≤ μ.

A FINAL REMARK: Throughout this book, we will assume that the computations have been performed with a guard digit, as they are on almost all available machines.

We shall call results 1 through 3 of Theorem 2.3.1, along with (2.2.1), the fundamental laws of floating point arithmetic. These fundamental laws form the basis for establishing bounds for floating point computations.

For example, consider the floating point computation of x(y + z):

fl(x(y + z)) = [x fl(y + z)](1 + δ1)
             = x(y + z)(1 + δ2)(1 + δ1)
             = x(y + z)(1 + δ1δ2 + δ1 + δ2)
             ≈ x(y + z)(1 + δ3),

where δ3 = δ1 + δ2; since δ1 and δ2 are small, their product is neglected.

We can now easily establish the bound of δ3. Suppose β = 10, and that rounding is used. Then

|δ3| = |δ1 + δ2| ≤ |δ1| + |δ2| ≤ (1/2) x 10^{1-t} + (1/2) x 10^{1-t} = 10^{1-t}.

Thus, the relative error due to round-off in computing fl(x(y + z)) is about 10^{1-t} in the worst case.

2.4 Addition of n Floating Point Numbers

Consider adding n floating point numbers x1, x2, ..., xn with rounding. Define s2 = fl(x1 + x2). Then

s2 = fl(x1 + x2) = (x1 + x2)(1 + δ2),

where |δ2| ≤ (1/2)β^{1-t} = μ; that is, s2 - (x1 + x2) = δ2(x1 + x2). Define s3, s4, ..., sn recursively by

s_{i+1} = fl(s_i + x_{i+1}), i = 2, 3, ..., n - 1.

Then s3 = (s2 + x3)(1 + δ3), so

s3 - (x1 + x2 + x3) ≈ (x1 + x2)δ2 + (x1 + x2 + x3)δ3

(neglecting the term δ2δ3, which is small, and so on). Thus, by induction we can show that

sn - (x1 + x2 + ... + xn) ≈ (x1 + x2)δ2 + (x1 + x2 + x3)δ3 + ... + (x1 + x2 + ... + xn)δn.

The above can be rearranged as follows. Defining δ1 = 0, we can write

sn - (x1 + ... + xn) ≈ x1(δ1 + δ2 + ... + δn) + x2(δ2 + ... + δn) + x3(δ3 + ... + δn) + ... + xnδn,

where each |δi| ≤ (1/2)β^{1-t} = μ, i = 1, 2, ..., n.

Remark: From the above formula we see that we should expect a smaller error, in general, when adding n floating point numbers in ascending order of magnitude:

|x1| ≤ |x2| ≤ |x3| ≤ ... ≤ |xn|.

If the numbers are arranged in ascending order of magnitude, then the larger errors will be associated with the smaller numbers.
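The effect of the ordering is easy to observe. The following Python sketch (NumPy assumed; the terms 1/k^2 and the sample size are arbitrary illustrative choices) sums in single precision in both orders:

    import numpy as np

    x = np.array([1.0 / k**2 for k in range(1, 100001)], dtype=np.float32)

    exact = np.sum(x.astype(np.float64))        # double precision reference

    descending = np.float32(0.0)
    for xi in x:                                 # large terms first
        descending += xi

    ascending = np.float32(0.0)
    for xi in x[::-1]:                           # small terms first
        ascending += xi

    print(abs(descending - exact), abs(ascending - exact))
    # the ascending-order error is typically noticeably smaller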

2.5 Multiplication of n Floating Point Numbers

Proceeding as in the case of addition of n floating point numbers in the last section, it can be shown that:

Theorem 2.5.1

fl(x1 x2 ... xn) = (1 + η) Π_{i=1}^{n} xi,

where η = (1 + δ2)(1 + δ3) ... (1 + δn) - 1 and |δi| ≤ μ, i = 2, ..., n.

A bound for η

Assuming that (n - 1)μ < .01, we will prove that |η| < 1.06(n - 1)μ. (This assumption is quite realistic; on most machines it will hold for fairly large values of n.)

Since |δi| ≤ μ, we have |η| ≤ (1 + μ)^{n-1} - 1. Again, since 1 + μ ≤ e^μ, we have

(1 + μ)^{n-1} ≤ e^{(n-1)μ}.

Thus,

(1 + μ)^{n-1} - 1 ≤ e^{(n-1)μ} - 1 = (n - 1)μ + ((n - 1)μ)^2/2! + ...
                 = (n - 1)μ [1 + (n - 1)μ/2! + ((n - 1)μ)^2/3! + ...]
                 < (n - 1)μ [1 + 0.05/(1 - 0.05)]

(note that (n - 1)μ < .01). Thus,

(1 + μ)^{n-1} - 1 < 1.06(n - 1)μ.    (2.5.1)

Combining Theorem 2.5.1 and (2.5.1), we can write:

Theorem 2.5.2 The relative error in computing the product of n floating point numbers is at most 1.06(n - 1)μ, assuming that (n - 1)μ < .01.

2.6 Inner Product Computation

A frequently arising computational task in numerical linear algebra is the computation of the inner product of two n-vectors x and y:

x^T y = x1 y1 + x2 y2 + ... + xn yn,    (2.6.1)

where xi and yi, i = 1, ..., n, are the components of x and y.

Let xi and yi, i = 1, ..., n, be floating point numbers. Define

S1 = fl(x1 y1),    (2.6.2)
S2 = fl(S1 + fl(x2 y2)),    (2.6.3)
...
Sk = fl(S_{k-1} + fl(xk yk)), k = 3, 4, ..., n.    (2.6.4)

We then have, using Theorem 2.3.1,

S1 = x1 y1 (1 + ε1),    (2.6.5)
S2 = [S1 + x2 y2 (1 + ε2)](1 + δ2),    (2.6.6)
...
Sn = [S_{n-1} + xn yn (1 + εn)](1 + δn),    (2.6.7)

where each |εi| ≤ μ and |δi| ≤ μ. Substituting the values of S1 through S_{n-1} in Sn and making some rearrangements, we can write

Sn = Σ_{i=1}^{n} xi yi (1 + γi),    (2.6.8)

where

1 + γi = (1 + εi)(1 + δi)(1 + δ_{i+1}) ... (1 + δn)
       ≈ 1 + εi + δi + δ_{i+1} + ... + δn  (with δ1 = 0)    (2.6.9)

(ignoring the products εiδj and δjδk, which are small).

For example, when n = 2, it is easy to check that

S2 = x1 y1 (1 + γ1) + x2 y2 (1 + γ2),    (2.6.10)

where 1 + γ1 ≈ 1 + ε1 + δ2 and 1 + γ2 ≈ 1 + ε2 + δ2 (neglecting the products ε1δ2 and ε2δ2, which are small).

As in the last section, it can be shown (see Forsythe and Moler CSLAS, pp. 92-93) that if nμ < 0.01, then

|γi| ≤ 1.01(n + 1 - i)μ, i = 1, 2, ..., n.    (2.6.11)

From (2.6.8) and (2.6.11), we have

|fl(x^T y) - x^T y| ≤ Σ_{i=1}^{n} |xi| |yi| |γi| ≤ 1.01 n μ |x|^T |y| ≤ 1.01 n μ ||x||_2 ||y||_2

(using the Cauchy-Schwarz inequality (Chapter 1, Section 1.7)), where |x| = (|x1|, |x2|, ..., |xn|)^T.

Theorem 2.6.1

|fl(x^T y) - x^T y| ≤ 1.01 n μ |x|^T |y| ≤ 1.01 n μ ||x||_2 ||y||_2.

While talking about inner product computation, let us mention that since most computers allow double precision computations, it is recommended that the inner product be computed in double precision (using 2t-digit arithmetic) to retain greater accuracy. The rationale here is that if we use single precision to compute x^T y, then there will be (2n - 1) single precision rounding errors (one for each multiplication and each addition). A better strategy is to convert each xi and yi to double precision by extending their mantissas with zeros, multiply them in double precision, add the products in double precision and, finally, round the final result to single precision. This process is known as accumulation of inner products in double precision (or extended precision). We summarize the process in the following.

Accumulation of Inner Product in Double Precision

1. Convert each xi and yi to double precision.

2. Compute the individual products xi yi in double precision.

3. Compute the sum Σ_{i=1}^{n} xi yi in double precision.

4. Round the sum to single precision.

The process gives low round-off error at little extra cost. It can be shown (Wilkinson AEP, pp. 117-118) that the error in this case is essentially independent of n. Specifically, if the inner product is accumulated in double precision and fl2(x^T y) denotes the result of such a computation, then:

Theorem 2.6.2

|fl2(x^T y) - x^T y| ≤ c μ |x^T y|,

where c is a small constant of order unity, provided that severe cancellation does not take place in forming x^T y.

Remark: The last sentence in Theorem 2.6.2 is important. One can construct a very simple example (Exercise #6(b)) to see that if cancellation takes place, the conclusion of Theorem 2.6.2 does not hold. The phenomenon of catastrophic cancellation is discussed in the next section.
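A rough numerical analogue of this process, with float32 playing the role of single precision and float64 the role of the double precision accumulator (a Python sketch, NumPy assumed; the vectors are arbitrary random choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10000).astype(np.float32)
    y = rng.standard_normal(10000).astype(np.float32)

    exact = np.dot(x.astype(np.float64), y.astype(np.float64))

    # Single precision throughout: (2n - 1) float32 rounding errors.
    single = np.float32(0.0)
    for xi, yi in zip(x, y):
        single += xi * yi

    # Products and sums accumulated in double precision,
    # with a single rounding to float32 at the very end.
    acc = np.float64(0.0)
    for xi, yi in zip(x, y):
        acc += np.float64(xi) * np.float64(yi)
    accumulated = np.float32(acc)

    print(abs(single - exact), abs(accumulated - exact))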

2.7 Error Bounds for Floating Point Matrix Computations

Theorem 2.7.1 Let |M| denote the matrix (|mij|). Let A and B be two floating point matrices and c a floating point number. Then

1. fl(cA) = cA + E, |E| ≤ μ|cA|;

2. fl(A + B) = (A + B) + E, |E| ≤ μ|A + B|.

If A and B are two matrices compatible for matrix multiplication, then

3. fl(AB) = AB + E, |E| ≤ nμ|A| |B| + O(μ^2).

Proof. See Wilkinson AEP, pp. 114-115, and Golub and Van Loan MC (1989, p. 66).

Meaning of O(μ^2)

In the above expression the notation O(μ^2) stands for a complicated expression that is bounded by cμ^2, where c is a constant depending upon the problem. The expression O(μ^2) will be used frequently in this book.

Remark: The last result shows that matrix multiplication in floating point arithmetic can be very inaccurate, since |A| |B| may be much larger than |AB| itself (Exercise #9). For this reason, whenever possible, while computing matrix-matrix or matrix-vector products, accumulation of inner products in double precision should be used, because in this case the entries of the error matrix can be shown to be bounded predominantly by the entries of the matrix |AB|, rather than those of |A||B|; see Wilkinson AEP, p. 118.

Error Bounds in Terms of Norms

Traditionally, for matrix computations the bounds for error matrices are given in terms of the norms of the matrices, rather than in terms of absolute values of the matrices as given above. Here we rewrite the bound for the error matrix for matrix multiplication using norms, for easy reference later in the book. We must note, however, that entry-wise error bounds are more meaningful than norm-wise errors (see remarks in Section 3.2).

Consider again the equation

fl(AB) = AB + E, |E| ≤ nμ|A| |B| + O(μ^2).

Since ||E||_2 ≤ || |E| ||_2, we may rewrite the equation as

fl(AB) = AB + E, where ||E||_2 ≤ || |E| ||_2 ≤ nμ || |A| ||_2 || |B| ||_2 + O(μ^2).

Two special cases are important.

A. Matrix-vector multiplication:

fl(Ab) = Ab + e, where ||e||_2 ≤ nμ ||A||_2 ||b||_2 + O(μ^2).

B. Multiplication by an orthogonal matrix Q:

fl(QA) = Q(A + E), where ||E||_2 ≤ n^2 μ ||A||_2 + O(μ^2).

Implication of the above result

The result says that, although matrix multiplication can be inaccurate in general, if one of the matrices is orthogonal, then floating point matrix multiplication gives only a small and acceptable error. As we will see in later chapters, this result forms the basis of many numerically viable algorithms discussed in this book.

For example, the following result, to be used very often in this book, forms the basis of the QR factorization of a matrix A (see Chapter 5) and is a consequence of the above result. Let

P = I - 2 (uu^T)/(u^T u)

be a Householder matrix, and let the product PA be computed in floating point arithmetic. Then

fl(PA) = P(A + E), where ||E||_2 ≤ c n^2 μ ||A||_2.

Moreover, if the inner products are accumulated in double precision, then the bound will be independent of n^2.

Proof. See Wilkinson AEP, pp. 152-160.

2.8 Round-off Errors Due to Cancellation and Recursive Computations

Intuitively, it is clear that if a large number of floating point computations is done, then the accumulated error can be quite large. However, round-off error can be disastrous even at a single step of computation. For example, consider the subtraction of two numbers:

x = .54617, y = .54601.

The exact value is

d = x - y = .00016.

Suppose now we use four-digit arithmetic with rounding. Then we have

x-hat = .5462 (correct to four significant digits),
y-hat = .5460 (correct to four significant digits),
d-hat = x-hat - y-hat = .0002.

How good an approximation is d-hat to d? The relative error is

|d - d-hat| / |d| = .25 (quite large!).

What happened above is the following. In four-digit arithmetic, the numbers .5462 and .5460 are of almost the same size. So, when the second was subtracted from the first, the most significant digits canceled and only the very least significant digit was left in the answer. This phenomenon, known as catastrophic cancellation, occurs when two numbers of approximately the same size are subtracted. Fortunately, in many cases catastrophic cancellation can be avoided. For example, consider the case of solving the quadratic equation

ax^2 + bx + c = 0, a ≠ 0.

The usual way the two roots x1 and x2 are computed is:

x1 = (-b + √(b^2 - 4ac)) / (2a),
x2 = (-b - √(b^2 - 4ac)) / (2a).

It is clear from the above that if a, b, and c are numbers such that -b is about the same size as √(b^2 - 4ac) (with respect to the arithmetic used), then catastrophic cancellation will occur in computing x2, and as a result the computed value of x2 can be completely erroneous.

Example 2.8.1

As an illustration, take a = 1, b = -10^5, c = 1 (Forsythe, Malcolm and Moler CMMC, pp. 20-22). Then, using β = 10, t = 8, U = -L = 50, we see that

x1 = (10^5 + √(10^{10} - 4)) / 2 = 10^5 (true answer),

x2 = (10^5 - 10^5) / 2 = 0 (completely wrong).

The true x2 = 0.000010000000001 (correctly rounded to 11 significant digits). The catastrophic cancellation took place in computing x2, since -b and √(b^2 - 4ac) are of the same order. Note that in 8-digit arithmetic, √(10^{10} - 4) = 10^5.

How Cancellation Can Be Avoided

Cancellation can be avoided if an equivalent pair of formulas is used:

x1 = -(b + sign(b)√(b^2 - 4ac)) / (2a),
x2 = c / (a x1),

where sign(b) is the sign of b. Using these formulas, we easily see that

x1 = 100000.00,
x2 = 1.0000000 / 100000.00 = 0.000010000.
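In code, the cancellation-free formulas amount to a one-line change; a short Python sketch (the function name stable_quadratic_roots is ours, chosen only for illustration):

    import math

    def stable_quadratic_roots(a, b, c):
        """Roots of a*x^2 + b*x + c = 0, avoiding subtractive cancellation."""
        disc = math.sqrt(b * b - 4.0 * a * c)    # assumes real roots
        sign_b = 1.0 if b >= 0 else -1.0
        x1 = -(b + sign_b * disc) / (2.0 * a)    # no cancellation here
        x2 = c / (a * x1)                        # product of roots is c/a
        return x1, x2

    print(stable_quadratic_roots(1.0, -1e5, 1.0))
    # both roots accurate: about 1e5 and 1.0000000000e-05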

Example 2.8.2

For yet another example of how cancellation can lead to inaccuracy, consider the problem of evaluating

$$f(x) = e^x - x - 1 \quad \text{at } x = .01.$$

Using five digit arithmetic, the correct answer is .000050167. If $f(x)$ is evaluated directly from the expression, we have

$$f(.01) = 1.0101 - (.01) - 1 = .0001,$$

$$\text{Relative Error} = \frac{.0001 - .000050167}{.000050167} \approx .99,$$

indicating that we cannot trust even the first significant digit.

Fortunately, cancellation can again be avoided by using the convergent series

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots,$$

so that

$$e^x - x - 1 = \Bigl(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\Bigr) - x - 1 = \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

For $x = .01$, this formula gives

$$\frac{(.01)^2}{2!} + \frac{(.01)^3}{3!} + \frac{(.01)^4}{4!} + \cdots = .00005 + .000000166666 + .00000000004166 + \cdots = .000050167$$

(correct to five significant figures).

Remark: Note that if $x$ were negative, then use of the convergent series for $e^x$ would not have helped. For example, to compute $e^x$ for a negative value of $x$, cancellation can be avoided by using:

$$e^{-x} = \frac{1}{e^x} = \frac{1}{1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \cdots}$$
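In IEEE double precision the cancellation in the direct formula is milder than in five digit arithmetic, but the same series idea applies. A small Python sketch (the function name and stopping tolerance are ours):

    import math

    def exp_minus_x_minus_1(x, tol=1e-16):
        """Evaluate e^x - x - 1 from the series x^2/2! + x^3/3! + ...,
        which involves no subtractive cancellation for small x > 0."""
        term = x * x / 2.0
        total = 0.0
        k = 2
        while abs(term) > tol * max(abs(total), 1.0):
            total += term
            k += 1
            term *= x / k
        return total

    print(exp_minus_x_minus_1(0.01))     # 5.0167084...e-05
    print(math.exp(0.01) - 0.01 - 1.0)   # direct formula: several digits lost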

Recursive Computations

In the above examples, we saw how subtractive cancellations can give inaccurate answers. There are, however, other common sources of round-off errors, e.g., recursive computations, which are computations performed recursively so that the computation of one step depends upon the results of previous steps. In such cases, even if the error made in the first step is negligible, due to the accumulation and magnification of error at every step, the final error can be quite large, giving a completely erroneous answer. Certain recursions propagate error in very unhealthy fashions. Consider a very nice example involving recursive computations, again from the book by Forsythe, Malcolm, and Moler (CMMC, pp. 16-17).

Example 2.8.3

Suppose we need to compute the integral

$$E_n = \int_0^1 x^n e^{x-1}\,dx.$$

Integrating by parts,

$$E_n = \int_0^1 x^n e^{x-1}\,dx = \bigl[x^n e^{x-1}\bigr]_0^1 - n\int_0^1 x^{n-1}e^{x-1}\,dx,$$

or

$$E_n = 1 - nE_{n-1}, \qquad n = 2, 3, \ldots$$

Thus, if $E_1$ is known, then for different values of $n$, $E_n$ can be computed using the above recursive formula.

Indeed, with $\beta = 10$ and $t = 6$, and starting with $E_1 = 0.367879$ as a six-digit approximation to $E_1 = 1/e$, we have from above:

$$E_1 = 0.367879$$
$$E_2 = 0.264242$$
$$E_3 = 0.207274$$
$$E_4 = 0.170904$$
$$\vdots$$
$$E_9 = -0.068480$$

Although the integrand is positive throughout the interval $[0, 1]$, the computed value of $E_9$ is negative. This phenomenon can be explained as follows.

The error in computing $E_2$ was $-2$ times the error in computing $E_1$; the error in computing $E_3$ was $-3$ times the error in $E_2$ (therefore, the error at this step was exactly six times the error in $E_1$). Thus, the error in computing $E_9$ was $(-2)(-3)(-4)\cdots(-9) = 9!$ times the error in $E_1$. The error in $E_1$ was due to the rounding of $1/e$ using six significant digits, which is about $4.412 \times 10^{-7}$. However, this small error multiplied by $9!$ gives $9! \times 4.412 \times 10^{-7} \approx .16$, which is quite large.

Again, for this example, it turned out that we could get a much better result by simply rearranging the recursion so that the error at every step, instead of being magnified, is reduced. Indeed, if we rewrite the recursion as

$$E_{n-1} = \frac{1 - E_n}{n}, \qquad n = \ldots, 3, 2,$$

then the error at each step will be reduced by a factor of $1/n$. Thus, starting with a large value of $n$ (say, $n = 20$) and working backward, we will see that $E_9$ will be accurate to full six-digit precision. To obtain a starting value, we note that

$$E_n = \int_0^1 x^n e^{x-1}\,dx \le \int_0^1 x^n\,dx = \frac{1}{n+1}.$$

With $n = 20$, $E_{20} \le \frac{1}{21}$. Let's take $E_{20} = 0$. Then, starting with $E_{20} = 0$, it can be shown (Forsythe, Malcolm, and Moler CMMC, p. 17) that $E_9 = 0.0916123$, which is correct to full six-digit precision. The reason for obtaining this accuracy was that the error in $E_{20}$ was at most $\frac{1}{21}$; this error was multiplied by $\frac{1}{20}$ in computing $E_{19}$, giving an error of at most $\frac{1}{20}\cdot\frac{1}{21} = 0.0024$ in the computation of $E_{19}$, and so on.
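Both directions of the recursion are easy to try; the following Python sketch (in double precision, with the starting value rounded to six digits as in the text) reproduces the effect:

    import math

    # Forward recursion E_n = 1 - n*E_{n-1}: the rounding error in the
    # six-digit start E_1 is multiplied by n at every step.
    E = round(1.0 / math.e, 6)          # E_1 = 0.367879
    for n in range(2, 10):
        E = 1.0 - n * E
    print("forward  E_9 =", E)          # about -0.0685, wildly wrong

    # Backward recursion E_{n-1} = (1 - E_n)/n: the error is divided by n
    # at every step, so even the crude start E_20 = 0 gives E_9 accurately.
    E = 0.0                             # E_20
    for n in range(20, 9, -1):
        E = (1.0 - E) / n               # produces E_{n-1}
    print("backward E_9 =", E)          # 0.0916123...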

2.9 Review and Summary

The concepts of floating point numbers and rounding errors have been introduced and discussed in this chapter.

1. Floating Point Numbers: A normalized floating point number has the form

$$x = r \times \beta^e,$$

where $e$ is called the exponent, $r$ is the significand, and $\beta$ is the base of the number system. The floating point number system is characterized by four parameters: $\beta$, the base; $t$, the precision; and $L$ and $U$, the lower and upper limits of the exponent.

2. Errors: The error(s) in a computation is measured either by the absolute error or the relative error.

The relative error generally makes more sense than the absolute error: it gives an indication of the number of significant digits in an approximate answer.

The relative error in representing a real number $x$ by its floating point representation $fl(x)$ is bounded by a number $\mu$, called the machine precision (Theorem 2.2.1).

3. Laws of Floating Point Arithmetic:

$$fl(x * y) = (x * y)(1 + \delta),$$

where $*$ indicates any of the four basic arithmetic operations $+,\ -,\ \times$, or $\div$, and $|\delta| \le \mu$.

4. Addition, Multiplication, and Inner Product Computations: The results on addition and multiplication of $n$ floating point numbers are given in Theorems 2.4.1 and 2.5.1, respectively.

While adding $n$ floating point numbers, it is advisable that they be added in ascending order of magnitude.

While computing the inner product of two vectors, accumulation of the inner product in double precision, whenever possible, is suggested.

5. Floating Point Matrix Multiplications: The entry-wise and normwise error bounds for the multiplication of two floating point matrices are given in Theorems 2.7.1 and 2.7.2, respectively.

Matrix multiplication in floating point arithmetic can be very inaccurate, unless one of the matrices is orthogonal (or unitary, if complex). Accumulation of inner products is suggested, whenever possible, in computing a matrix-matrix or a matrix-vector product. The high accuracy of a matrix product computation involving an orthogonal matrix makes the use of orthogonal matrices in matrix computations quite attractive.

6. Round-off Errors Due to Cancellation and Recursive Computation: Two major sources of round-off errors are subtractive cancellation and recursive computations. They have been discussed in some detail in Section 2.8, and examples have been given to show how these errors come up in many basic computations. An encouraging message here is that in many instances, computations can be reorganized so that cancellation is avoided and the error in recursive computations is diminished at each step of computation.

2.10 Suggestions for Further Reading

For details of the IEEE standard, see the monograph "An American National Standard: IEEE Standard for Binary Floating-Point Arithmetic," IEEE publication, 1985.

For results on error bounds for basic floating point matrix operations, the books by James H. Wilkinson, (i) The Algebraic Eigenvalue Problem (AEP) and (ii) Rounding Errors in Algebraic Processes (Prentice-Hall, New Jersey, 1963), are extremely useful and valuable resources.

Discussions of basic floating point operations and of rounding errors due to cancellation and recursive computations are given nowadays in many elementary numerical analysis textbooks. We name a few here which we have used and found useful.

1. Elementary Numerical Analysis by Kendall Atkinson, John Wiley and Sons, 1993.

2. Numerical Mathematics and Computing by Ward Cheney and David Kincaid,

Brooks/Cole Publishing Company, California, 1980.

3. Computer Methods for Mathematical Computations by George E. Forsythe, Michael

A. Malcolm and Cleve B. Moler, Prentice Hall, Inc., 1977.

4. Numerical Methods: A Software Approach by R. L. Johnston, John Wiley and Sons,

Toronto, 1982.

5. Numerical Methods and Software by D. Kahaner, C. B. Moler and S. Nash, Prentice Hall, Englewood Cliffs, NJ, 1988.

Exercises on Chapter 2

1. (a) Show that

$$\frac{|fl(x) - x|}{|x|} \le \begin{cases} \frac{1}{2}\beta^{1-t} & \text{for rounding} \\[2pt] \beta^{1-t} & \text{for chopping} \end{cases}$$

(b) Hence conclude that $fl(x) = x(1 + \delta)$, where $|\delta| \le \mu$.

2. Let $x$ be a floating point number and $k$ a positive integer. Show that

$$fl\Bigl(\frac{x^k}{k!}\Bigr) = \frac{x^k}{k!}(1 + e_k),$$

where

$$|e_k| \le 2k\mu + O(\mu^2).$$

3. Construct examples to show that the distributive law for floating point addition and multiplication does not hold. What can you say about the commutativity and associativity of these operations? Give reasons for your answers.

4. Let $x_1, x_2, \ldots, x_n$ be $n$ floating point numbers. Define

$$s_2 = fl(x_1 + x_2), \qquad s_k = fl(s_{k-1} + x_k), \quad k = 3, \ldots, n.$$

Show that

$$s_n = fl(x_1 + x_2 + \cdots + x_n) = x_1(1 + \eta_1) + x_2(1 + \eta_2) + \cdots + x_n(1 + \eta_n),$$

and find bounds for the $\eta_i$.

5. (a) Construct an example to show that, when adding a list of floating point numbers, in general, the rounding error will be less if the numbers are added in order of increasing magnitude.

(b) Find another example to show that this is not always necessarily true.

6. (a) Prove that the error in computing an inner product with accumulation in double precision is essentially independent of $n$. That is, show that if $fl_2(x^Ty)$ denotes the computation of the inner product with accumulation in double precision, then, unless severe cancellation takes place,

$$|fl_2(x^Ty) - x^Ty| \le c\mu|x^Ty| + O(\mu^2)$$

(Wilkinson AEP, pp. 116-117).

(b) Show by means of a simple example that if there is cancellation, then $fl_2(x^Ty)$ can differ significantly from $x^Ty$ (take $t = 3$).

(c) If $s$ is a scalar, then prove that

$$fl\Bigl(\frac{x^Ty}{s}\Bigr) = \frac{x_1y_1(1 + \epsilon_1) + \cdots + x_ny_n(1 + \epsilon_n)}{s/(1 + \epsilon)}.$$

Find bounds for $\epsilon$ and $\epsilon_i$, $i = 1, \ldots, n$. (See Wilkinson AEP, p. 118.)

7. Show that

(a) $fl(cA) = cA + E$, $\ |E| \le \mu|cA|$

(b) $fl(A + B) = (A + B) + E$, $\ |E| \le \mu(|A| + |B|)$

(c) $fl(AB) = AB + E$, $\ |E| \le n\mu|A|\,|B| + O(\mu^2)$

(Wilkinson AEP, p. 115.)

8. Construct a simple example to show that matrix multiplication in floating point arithmetic need not be accurate. Rework your example using accumulation of inner products in double precision.

9. Let $A$ and $B$ be $n \times n$ matrices. Show that

$$fl(AB) = AB + E,$$

where $E = (e_{ij})$ satisfies

$$|e_{ij}| \le n\mu f_{ij} + O(\mu^2),$$

the $f_{ij}$ being the entries of the matrix $|A|\,|B|$.

10. Prove that if $Q$ is orthogonal, then

$$fl(QA) = Q(A + E), \quad \text{where } \|E\|_2 \le n^2\mu\|A\|_2 + O(\mu^2).$$

11. Let $b$ be a column vector and $x = Ab$. Let $\hat{x} = fl(x)$. Then show that

$$\frac{\|\hat{x} - x\|}{\|x\|} \le p(n)\,\|A^{-1}\|\,\|A\|\,\mu,$$

where $p(n)$ is a polynomial in $n$ of low degree.

(The number $\|A^{-1}\|\,\|A\|$ is called the condition number of $A$. There are matrices for which this number can be very big. For those matrices we then conclude that the relative error in a matrix-vector product can be quite large.)


12. Using Theorem 2.7.1, prove that, if $B$ is nonsingular,

$$\frac{\|fl(AB) - AB\|_F}{\|AB\|_F} \le n\mu\,\|B\|_F\,\|B^{-1}\|_F + O(\mu^2).$$

13. Define the sequence of vectors

$$y_{i+1} = Ay_i, \qquad i = 1, 2, \ldots, n - 1.$$

Let $\hat{y}_i = fl(y_i)$. Find a bound for the relative error in computing each $y_i$, $i = 1, \ldots, n$.
14. Let $\beta = 10$, $t = 4$. Compute $fl(A^TA)$, where

$$A = \begin{pmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{pmatrix}.$$

Repeat your computation with $t = 9$. Compare the results.

15. Show how to arrange the computation in each of the following so that the loss of significant digits can be avoided. Do one numerical example in each case to support your answer.

(a) $e^x - x - 1$, for negative values of $x$.

(b) $\sqrt{x^4 + 1} - x^2$, for large values of $x$.

(c) $\dfrac{1}{x} - \dfrac{1}{x + 1}$, for large values of $x$.

(d) $x - \sin x$, for values of $x$ near zero.

(e) $1 - \cos x$, for values of $x$ near zero.

16. What are the relative and absolute errors in approximating

(a) $\pi$ by $\frac{22}{7}$?

(b) $\frac{1}{3}$ by .333?

(c) $\frac{1}{6}$ by .166?

How many significant digits are there in each approximation?

17. Let $\beta = 10$, $t = 4$. Consider computing

$$a = \bigl(\tfrac{1}{6} - .1666\bigr)/.1666.$$

How many correct digits of the exact answer will you get?

18. Consider evaluating

$$e = \sqrt{a^2 + b^2}.$$

How can the computation be organized so that overflow in computing $a^2 + b^2$ for large values of $a$ or $b$ can be avoided?

19. What answers will you get if you compute the following numbers on your calculator or computer?

(a) $\sqrt{10^8} - 1$

(b) $\sqrt{10^{-20}} - 1$

(c) $10^{16} - 50$

20. What problem do you foresee in solving the quadratic equations

(a) $x^2 - 10^6x + 1 = 0$

(b) $10^{-10}x^2 - 10^{10}x + 10^{10} = 0$

using the well-known formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\,?$$

What remedy do you suggest? Now solve the equations using your suggested remedy, with $t = 4$.

21. Show that the integral

$$y_i = \int_0^1 \frac{x^i}{x + 5}\,dx$$

can be computed by using the recursion formula:

$$y_i = \frac{1}{i} - 5y_{i-1}.$$

Compute $y_1, y_2, \ldots, y_{10}$ using this formula, taking

$$y_0 = \bigl[\ln(x + 5)\bigr]_{x=0}^{1} = \ln 6 - \ln 5 = \ln(1.2).$$

Now rearrange the recursion so that the values of $y_i$ can be computed more accurately.

22. Suppose that $x^*$ approximates $10^4$, 50000, and 55596 to five significant figures. Find the largest interval in each case containing $x^*$.

3. STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS

3.1 Some Basic Algorithms
  3.1.1 Computing the Norm of a Vector
  3.1.2 Computing the Inner Product of Two Vectors
  3.1.3 Solution of an Upper Triangular System
  3.1.4 Computing the Inverse of an Upper Triangular Matrix
  3.1.5 Gaussian Elimination for Solving Ax = b
3.2 Definitions and Concepts of Stability
3.3 Conditioning of the Problem and Perturbation Analysis
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution
3.5 The Wilkinson Polynomial
3.6 An Ill-conditioned Linear System Problem
3.7 Examples of Ill-conditioned Eigenvalue Problems
3.8 Strong, Weak and Mild Stability
3.9 Review and Summary
3.10 Suggestions for Further Reading

CHAPTER 3

STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS

3.1 Some Basic Algorithms

Definition 3.1.1 An algorithm is an ordered set of operations, logical and arithmetic, which when applied to a computational problem defined by a given set of data, called the input data, produces a solution to the problem. A solution is comprised of a set of data called the output data.

In this book, for the sake of convenience and simplicity, we will very often describe algorithms by means of pseudocodes which can be translated into computer codes easily. Describing algorithms by pseudocodes has been made popular by Stewart through his book IMC (1973).

3.1.1 Computing the Norm of a Vector

Given $x = (x_1, \ldots, x_n)^T$, compute $\|x\|_2$.

Input Data: $n, x_1, \ldots, x_n$.

Step 1: Find $r = \max(|x_1|, \ldots, |x_n|)$.

Step 2: Compute $y_i = x_i/r$, $i = 1, \ldots, n$.

Step 3: Compute $s = \|x\|_2 = r\sqrt{y_1^2 + \cdots + y_n^2}$.

Output Data: $s$.

Pseudocodes

    r = max(|x_1|, ..., |x_n|)
    s = 0
    For i = 1 to n do
        y_i = x_i / r
        s = s + y_i^2
    s = r * s^(1/2)

G. W. Stewart, a former student of the celebrated numerical analyst Alston Householder, is a professor of computer science at the University of Maryland. He is well known for his many outstanding contributions in numerical linear algebra and statistical computations. He is the author of the book Introduction to Matrix Computations.

An Algorithmic Note

In order to avoid overflow, each entry of $x$ has been normalized by $r = \max_i |x_i|$ before using the formula

$$\|x\|_2 = \sqrt{x_1^2 + \cdots + x_n^2}.$$
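A minimal Python sketch of this scaled computation (the function name is ours):

    import math

    def norm2(x):
        """Euclidean norm with scaling, so that squaring the entries cannot
        overflow even when some |x_i| is near the overflow threshold."""
        r = max(abs(xi) for xi in x)
        if r == 0.0:
            return 0.0
        return r * math.sqrt(sum((xi / r) ** 2 for xi in x))

    # The naive sqrt of the sum of squares would overflow here;
    # the scaled form is fine.
    print(norm2([3e200, 4e200]))     # 5e+200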

3.1.2 Computing the Inner Product of Two Vectors

Given $x$ and $y$, two $n$-vectors, compute the inner product

$$x^Ty = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$

Input Data: $n, x_1, \ldots, x_n; y_1, \ldots, y_n$.

Step 1: Compute the partial products $s_i = x_iy_i$, $i = 1, \ldots, n$.

Step 2: Add the partial products: $\text{Sum} = \sum_{i=1}^{n} s_i$.

Pseudocodes

    Sum = 0
    For i = 1, ..., n do
        Sum = Sum + x_i * y_i

3.1.3 Solution of an Upper Triangular System

Consider the system

$$Ty = b,$$

where $T = (t_{ij})$ is a nonsingular upper triangular matrix and $y = (y_1, y_2, \ldots, y_n)^T$. Specifically,

$$\begin{aligned}
t_{11}y_1 + t_{12}y_2 + \cdots + t_{1n}y_n &= b_1 \\
t_{22}y_2 + \cdots + t_{2n}y_n &= b_2 \\
t_{33}y_3 + \cdots + t_{3n}y_n &= b_3 \\
&\ \,\vdots \\
t_{n-1,n-1}y_{n-1} + t_{n-1,n}y_n &= b_{n-1} \\
t_{nn}y_n &= b_n
\end{aligned}$$

where each $t_{ii} \ne 0$ for $i = 1, 2, \ldots, n$.

The last equation is solved first to obtain $y_n$; this value is then inserted into the next-to-last equation to obtain $y_{n-1}$, and so on. This process is known as back substitution. The algorithm can easily be written down.

Algorithm 3.1.3 Back Substitution

Input Data: $T = (t_{ij})$, an $n \times n$ upper triangular matrix, and $b$, an $n$-vector.

Step 1: Compute $y_n = \dfrac{b_n}{t_{nn}}$.

Step 2: Compute $y_{n-1}$ through $y_1$ successively:

$$y_i = \frac{1}{t_{ii}}\Bigl(b_i - \sum_{j=i+1}^{n} t_{ij}y_j\Bigr), \qquad i = n-1, \ldots, 2, 1.$$

Pseudocodes

    For i = n, n-1, ..., 3, 2, 1 do
        y_i = (1 / t_ii) * (b_i - sum_{j=i+1..n} t_ij * y_j)
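A minimal Python sketch of Algorithm 3.1.3 (the function name is ours):

    def back_substitution(T, b):
        """Solve Ty = b for a nonsingular upper triangular T,
        given as a list of rows, by back substitution."""
        n = len(b)
        y = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(T[i][j] * y[j] for j in range(i + 1, n))
            y[i] = (b[i] - s) / T[i][i]
        return y

    # The triangular system that arises at the end of Example 3.1.2 below:
    print(back_substitution([[5.0, 1.0, 1.0], [0.0, 0.8, 0.8], [0.0, 0.0, 2.0]],
                            [7.0, 1.6, 2.0]))      # [1.0, 1.0, 1.0]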

3.1.4 Computing the Inverse of an Upper Triangular Matrix

Finding the inverse of an $n \times n$ matrix $A$ is equivalent to finding a matrix $X$ such that

$$AX = I.$$

Let $X = (x_1, \ldots, x_n)$ and $I = (e_1, \ldots, e_n)$, where $x_i$ is the $i$th column of $X$ and $e_i$ is the $i$th column of $I$. Then the matrix equation $AX = I$ amounts to solving $n$ linear systems:

$$Ax_i = e_i, \qquad i = 1, \ldots, n.$$

The job is particularly simple when $A$ is a triangular matrix. Let $T$ be an upper triangular matrix. Then finding its inverse $S = (s_1, \ldots, s_n)$ amounts to solving $n$ upper triangular linear systems:

$$Ts_i = e_i, \qquad i = 1, \ldots, n.$$

Let $s_i = (s_{1i}, s_{2i}, \ldots, s_{ni})^T$.

For $i = 1$: $Ts_1 = e_1$ gives

$$s_{11} = \frac{1}{t_{11}}.$$

For $i = 2$: $Ts_2 = e_2$ gives

$$s_{22} = \frac{1}{t_{22}}, \qquad s_{12} = -\frac{1}{t_{11}}(t_{12}s_{22}).$$

For $i = k$: $Ts_k = e_k$ gives

$$s_{kk} = \frac{1}{t_{kk}}, \qquad s_{ik} = -\frac{1}{t_{ii}}(t_{i,i+1}s_{i+1,k} + \cdots + t_{ik}s_{kk}), \quad i = k-1, k-2, \ldots, 1.$$

(The other entries of the column $s_k$ are all zero.)

The pseudocodes of the algorithm can now easily be written down.

A Convention: From now onwards, we shall use the following format for algorithm descriptions.

Let $T$ be an $n \times n$ nonsingular upper triangular matrix. The following algorithm computes $S$, the inverse of $T$.

For $k = n, n-1, \ldots, 1$ do

(1) $s_{kk} = \dfrac{1}{t_{kk}}$

(2) $s_{ik} = -t_{ii}^{-1}\displaystyle\sum_{j=i+1}^{k} t_{ij}s_{jk}$  $(i = k-1, k-2, \ldots, 1)$.

Example 3.1.1

$$T = \begin{pmatrix} 5 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 4 \end{pmatrix}$$

$k = 3$:

$$s_{33} = \frac{1}{4}$$

$$s_{23} = -\frac{1}{t_{22}}(t_{23}s_{33}) = -\frac{1}{2}\Bigl(1 \cdot \frac{1}{4}\Bigr) = -\frac{1}{8}$$

$$s_{13} = -\frac{1}{t_{11}}(t_{12}s_{23} + t_{13}s_{33}) = -\frac{1}{5}\Bigl(2 \cdot \Bigl(-\frac{1}{8}\Bigr) + 3 \cdot \frac{1}{4}\Bigr) = -\frac{1}{10}$$

$k = 2$:

$$s_{22} = \frac{1}{t_{22}} = \frac{1}{2}$$

$$s_{12} = -\frac{1}{t_{11}}(t_{12}s_{22}) = -\frac{1}{5}\Bigl(2 \cdot \frac{1}{2}\Bigr) = -\frac{1}{5}$$

$k = 1$:

$$s_{11} = \frac{1}{t_{11}} = \frac{1}{5}$$

$$T^{-1} = S = \begin{pmatrix} \frac{1}{5} & -\frac{1}{5} & -\frac{1}{10} \\[2pt] 0 & \frac{1}{2} & -\frac{1}{8} \\[2pt] 0 & 0 & \frac{1}{4} \end{pmatrix}$$
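The column-by-column algorithm translates directly into Python; a minimal sketch (the function name is ours) that reproduces Example 3.1.1:

    def upper_triangular_inverse(T):
        """Compute S = T^{-1} for a nonsingular upper triangular T,
        one column at a time, as in the algorithm above."""
        n = len(T)
        S = [[0.0] * n for _ in range(n)]
        for k in range(n - 1, -1, -1):
            S[k][k] = 1.0 / T[k][k]
            for i in range(k - 1, -1, -1):
                S[i][k] = -sum(T[i][j] * S[j][k]
                               for j in range(i + 1, k + 1)) / T[i][i]
        return S

    for row in upper_triangular_inverse([[5.0, 2.0, 3.0],
                                         [0.0, 2.0, 1.0],
                                         [0.0, 0.0, 4.0]]):
        print(row)
    # [0.2, -0.2, -0.1]
    # [0.0, 0.5, -0.125]
    # [0.0, 0.0, 0.25]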

3.1.5 Gaussian Elimination for Solving Ax = b

Consider the problem of solving the linear system of $n$ equations in $n$ unknowns:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \,\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}$$

or, in matrix notation,

$$Ax = b,$$

where $A = (a_{ij})$ and $b = (b_1, \ldots, b_n)^T$.

A well-known approach for solving the problem is the classical elimination scheme known as Gaussian elimination. A detailed description and the mechanism of development of this historical algorithm and its important practical variations will appear in Chapters 5 and 6. However, for a better understanding of some of the material presented in this chapter, we give here a brief description of the basic Gaussian elimination scheme.

Basic idea. The basic idea is to reduce the system to an equivalent upper triangular system, so that the reduced upper triangular system can be solved easily using the back substitution algorithm (Algorithm 3.1.3).

Step 1: At step 1, the unknown $x_1$ is eliminated from the second through the $n$th equations. This is done by multiplying the first equation successively by $-\dfrac{a_{21}}{a_{11}}, -\dfrac{a_{31}}{a_{11}}, \ldots, -\dfrac{a_{n1}}{a_{11}}$ and adding it, respectively, to the 2nd through $n$th equations. The quantities

$$m_{i1} = -\frac{a_{i1}}{a_{11}}, \qquad i = 2, \ldots, n,$$

are called multipliers. At the end of step 1, the system $Ax = b$ becomes $A^{(1)}x = b^{(1)}$, where the entries of $A^{(1)} = (a_{ij}^{(1)})$ and those of $b^{(1)}$ are related to the entries of $A$ and $b$ as follows:

$$a_{ij}^{(1)} = a_{ij} + m_{i1}a_{1j} \qquad (i = 2, \ldots, n;\ j = 2, \ldots, n)$$

$$b_i^{(1)} = b_i + m_{i1}b_1 \qquad (i = 2, \ldots, n).$$

(The entries $a_{21}^{(1)}, a_{31}^{(1)}, \ldots, a_{n1}^{(1)}$ are all zero.)

Step 2: At step 2, $x_2$ is eliminated from the 3rd through the $n$th equations of $A^{(1)}x = b^{(1)}$ by multiplying the second equation by the multipliers

$$m_{i2} = -\frac{a_{i2}^{(1)}}{a_{22}^{(1)}}, \qquad i = 3, \ldots, n,$$

and adding it, respectively, to the 3rd through $n$th equations. The system now becomes $A^{(2)}x = b^{(2)}$, whose entries are given as follows:

$$a_{ij}^{(2)} = a_{ij}^{(1)} + m_{i2}a_{2j}^{(1)} \qquad (i = 3, \ldots, n;\ j = 3, \ldots, n)$$

$$b_i^{(2)} = b_i^{(1)} + m_{i2}b_2^{(1)} \qquad (i = 3, \ldots, n),$$

and so on.

Step k: At step $k$, the $(n - k)$ multipliers

$$m_{ik} = -\frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}, \qquad i = k+1, \ldots, n,$$

are formed and, using them, $x_k$ is eliminated from the $(k+1)$th through the $n$th equations of $A^{(k-1)}x = b^{(k-1)}$. The entries of $A^{(k)}$ and those of $b^{(k)}$ are given by

$$a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik}a_{kj}^{(k-1)} \qquad (i = k+1, \ldots, n;\ j = k+1, \ldots, n)$$

$$b_i^{(k)} = b_i^{(k-1)} + m_{ik}b_k^{(k-1)} \qquad (i = k+1, \ldots, n).$$

Step n-1: At the end of the $(n-1)$th step, the reduced matrix $A^{(n-1)}$ is upper triangular, and the reduced system $A^{(n-1)}x = b^{(n-1)}$ can be solved by back substitution.

We are now ready to write down the pseudocodes of the Gaussian elimination scheme. Note that:

1. There are $(n - 1)$ steps $(k = 1, 2, \ldots, n-1)$.

2. For each value of $k$, there are $(n - k)$ multipliers: $m_{ik}$ $(i = k+1, \ldots, n)$.

3. For each value of $k$, only $(n - k)^2$ entries of $A^{(k)}$ are modified $(i = k+1, \ldots, n;\ j = k+1, \ldots, n)$. The $(n - k)$ entries below the $(k, k)$th entry of the $k$th column become zeros, and the remaining entries that are not modified stay the same as the corresponding entries of $A^{(k-1)}$.

Pseudocodes

    For k = 1, 2, ..., n-1 do
        For i = k+1, ..., n do
            m_ik = -a_ik^(k-1) / a_kk^(k-1)
            For j = k+1, ..., n do
                a_ij^(k) = a_ij^(k-1) + m_ik * a_kj^(k-1)
            b_i^(k) = b_i^(k-1) + m_ik * b_k^(k-1)

(Here $A^{(0)} = A$ and $b^{(0)} = b$.)
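A minimal Python sketch of this basic (no pivoting) scheme (the function name is ours); combined with the back substitution sketch of Section 3.1.3 it solves $Ax = b$:

    def gaussian_elimination(A, b):
        """Reduce Ax = b to an upper triangular system by the basic
        scheme above (no pivoting); A and b are overwritten.
        Assumes no zero pivot is encountered."""
        n = len(b)
        for k in range(n - 1):
            for i in range(k + 1, n):
                m = -A[i][k] / A[k][k]          # the multiplier m_ik
                A[i][k] = 0.0
                for j in range(k + 1, n):
                    A[i][j] += m * A[k][j]
                b[i] += m * b[k]
        return A, b

    A = [[5.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]
    b = [7.0, 3.0, 6.0]
    gaussian_elimination(A, b)   # yields A^(2) and b^(2) of Example 3.1.2 below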

Remarks:

1. The above basic Gaussian elimination algorithm is commonly known as the Gaussian elimination algorithm without row interchanges or the Gaussian elimination algorithm without pivoting. The reason for this name will be clear from the discussion of this algorithm again in Chapter 5.

2. The basic Gaussian algorithm as presented above is not commonly used in practice. Two practical variations of this algorithm, known as Gaussian elimination with partial and complete pivoting, will be described in Chapters 5 and 6.

3. We have assumed that the quantities $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are different from zero. If any of them is computationally zero, the algorithm will stop.

Example 3.1.2

$$\begin{aligned}
5x_1 + x_2 + x_3 &= 7 \\
x_1 + x_2 + x_3 &= 3 \\
2x_1 + x_2 + 3x_3 &= 6
\end{aligned}$$

or

$$\begin{pmatrix} 5 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 7 \\ 3 \\ 6 \end{pmatrix}, \qquad Ax = b.$$

Step 1: $k = 1$, $i = 2, 3$:

$$m_{21} = -\frac{a_{21}}{a_{11}} = -\frac{1}{5}, \qquad m_{31} = -\frac{a_{31}}{a_{11}} = -\frac{2}{5}.$$

$j = 2, 3$:

$$\begin{aligned}
i = 2,\ j = 2:&\quad a_{22}^{(1)} = a_{22} + m_{21}a_{12} = \tfrac{4}{5} \\
i = 2,\ j = 3:&\quad a_{23}^{(1)} = a_{23} + m_{21}a_{13} = \tfrac{4}{5} \\
i = 3,\ j = 2:&\quad a_{32}^{(1)} = a_{32} + m_{31}a_{12} = \tfrac{3}{5} \\
i = 3,\ j = 3:&\quad a_{33}^{(1)} = a_{33} + m_{31}a_{13} = \tfrac{13}{5} \\
&\quad b_2^{(1)} = b_2 + m_{21}b_1 = \tfrac{8}{5} \\
&\quad b_3^{(1)} = b_3 + m_{31}b_1 = \tfrac{16}{5}
\end{aligned}$$

(Note: $b_1^{(1)} = b_1$; $a_{21}^{(1)} = a_{31}^{(1)} = 0$; $a_{11}^{(1)} = a_{11}$, $a_{12}^{(1)} = a_{12}$, $a_{13}^{(1)} = a_{13}$.)

$$\begin{pmatrix} 5 & 1 & 1 \\[2pt] 0 & \frac{4}{5} & \frac{4}{5} \\[2pt] 0 & \frac{3}{5} & \frac{13}{5} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 7 \\[2pt] \frac{8}{5} \\[2pt] \frac{16}{5} \end{pmatrix}, \qquad A^{(1)}x = b^{(1)}.$$

Step 2: $k = 2$, $i = 3$:

$$m_{32} = -\frac{a_{32}^{(1)}}{a_{22}^{(1)}} = -\frac{3}{4}.$$

$$i = 3,\ j = 3:\quad a_{33}^{(2)} = a_{33}^{(1)} + m_{32}a_{23}^{(1)} = 2, \qquad b_3^{(2)} = b_3^{(1)} + m_{32}b_2^{(1)} = 2.$$

$$\begin{pmatrix} 5 & 1 & 1 \\[2pt] 0 & \frac{4}{5} & \frac{4}{5} \\[2pt] 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 7 \\[2pt] \frac{8}{5} \\[2pt] 2 \end{pmatrix}, \qquad A^{(2)}x = b^{(2)}.$$

Back Substitution: The above triangular system is easily solved using back substitution:

$$2x_3 = 2 \Rightarrow x_3 = 1$$
$$\tfrac{4}{5}x_2 + \tfrac{4}{5}x_3 = \tfrac{8}{5} \Rightarrow x_2 = 1$$
$$5x_1 + x_2 + x_3 = 7 \Rightarrow x_1 = 1$$

3.2 Definitions and Concepts of Stability

The examples on catastrophic cancellation and recursive computations in the last chapter had one thing in common: the inaccuracy of the computed result in each case was due entirely to the algorithm used, because as soon as the algorithm was changed or rearranged and applied to the problem with the same data, the computed result became very satisfactory. Thus, we are talking about two different types of algorithms for a given problem. The algorithms of the first type (giving inaccurate results) are examples of unstable algorithms, while those of the second type (giving satisfactory results) are stable algorithms.

The study of stability is very important. It is carried out by means of round-off error analysis, of which there are two types: backward error analysis and forward error analysis.

In forward analysis, an attempt is made to see how the computed solution obtained by the algorithm differs from the exact solution based on the same data.

Definition 3.2.1 An algorithm is called forward stable if the computed solution $\hat{x}$ is close to the exact solution $x$ in some sense.

The round-off error bounds obtained in Chapter 2 for various matrix operations are the result of forward error analyses.

On the other hand, backward analysis relates the error to the data of the problem rather than to the problem's solution. Thus we define backward stability as follows:

Definition 3.2.2 An algorithm is called backward stable if it produces an exact solution to a nearby problem.

Backward error analysis, introduced in the literature by J. H. Wilkinson, is nowadays widely used in matrix computations, and using this analysis, the stability (or instability) of many algorithms in numerical linear algebra has been established in recent years. In this book, by "stability" we will mean "backward stability", unless otherwise stated.

James H. Wilkinson, a British mathematician, is well known for his pioneering work on backward error analysis for matrix computations. He was affiliated with the National Physical Laboratory in Britain, and held visiting appointments at Argonne National Laboratory, Stanford University, etc. Wilkinson died an untimely death in 1986. A fellowship in his name has since been established at Argonne National Laboratory. Wilkinson's book The Algebraic Eigenvalue Problem is an extremely important and very useful book for any numerical analyst.

As a simple example of backward stability, consider the case of computing the sum of two floating point numbers $x$ and $y$. We have seen before that

$$fl(x + y) = (x + y)(1 + \delta) = x(1 + \delta) + y(1 + \delta) = x' + y'.$$

Thus, the computed sum of two floating point numbers $x$ and $y$ is the exact sum of two other floating point numbers $x'$ and $y'$. Since

$$|\delta| \le \mu,$$

both $x'$ and $y'$ are close to $x$ and $y$, respectively. Thus we conclude that the operation of adding two floating point numbers is backward stable. Similar statements, of course, hold for the other floating point arithmetic operations.

For yet another type of example, consider the problem of solving the linear system $Ax = b$:

Definition 3.2.3 An algorithm for solving $Ax = b$ will be called stable if the computed solution $\hat{x}$ is such that

$$(A + E)\hat{x} = b + \delta b$$

with $E$ and $\delta b$ small.

How Do We Measure Smallness?

The "smallness" of a matrix or a vector is measured either by looking at its entries or by computing its norm.

While measuring errors in computations using norms is traditional in matrix computations, component-wise measurement of errors is becoming increasingly important. It really does make more sense. An $n \times n$ matrix $A$ has $n^2$ entries, but the norm of $A$ is a single number. Thus the smallness or largeness of the norm of an error matrix $E$ does not truly reflect the smallness or largeness of the individual entries of $E$. For example, if $E = (10, .00001, 1)^T$, then $\|E\|_2 = 10.0499$; the small entry .00001 is not reflected in the norm measure.

Examples of Stable and Unstable Algorithms by Backward Error Analysis

Example 3.2.1 A Stable Algorithm: Solution of an Upper Triangular System by Back Substitution

Consider Algorithm 3.1.3 (the back substitution method). Suppose the algorithm is implemented using accumulation of inner products in double precision. Then it can be shown (see Chapter 11) that the computed solution $\hat{x}$ satisfies

$$(T + E)\hat{x} = b,$$

where the entries of the error matrix $E$ are quite small. In fact, if $E = (e_{ij})$ and $T = (t_{ij})$, then

$$|e_{ij}| \le |t_{ij}|\,10^{-t}, \qquad i, j = 1, \ldots, n,$$

showing that the error can be even smaller than the error made in rounding the entries of $T$. Thus, the back substitution process for solving an upper triangular system is stable.

Example 3.2.2 An Unstable Algorithm: Gaussian Elimination Without Pivoting

Consider the problem of solving the nonsingular linear system $Ax = b$ using Gaussian elimination (Algorithm 3.1.5).

It has been shown by Wilkinson (see Chapter 11 of this book) that, when the process does not break down, the computed solution $\hat{x}$ satisfies

$$(A + E)\hat{x} = b,$$

with

$$\|E\|_\infty \le cn^3\rho\,\|A\|_\infty\,\mu + O(\mu^2),$$

where the

$$A^{(k)} = (a_{ij}^{(k)})$$

are the reduced matrices in the elimination process and $\rho$, known as the growth factor, is given by

$$\rho = \frac{\max_k \max_{i,j} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|}.$$

If we set $\alpha = \max_{i,j}|a_{ij}|$ and $\alpha_k = \max_{i,j}|a_{ij}^{(k)}|$, then the growth factor is given by

$$\rho = \frac{\max(\alpha, \alpha_1, \ldots, \alpha_{n-1})}{\alpha}.$$

Now for an arbitrary matrix $A$, $\rho$ can be quite large, because the entries of the reduced matrices $A^{(k)}$ can grow arbitrarily. To see this, consider the simple matrix

$$A = \begin{pmatrix} 10^{-10} & 1 \\ 1 & 2 \end{pmatrix}.$$

One step of Gaussian elimination using 9 decimal digit floating point arithmetic will yield the reduced matrix

$$A^{(1)} = \begin{pmatrix} 10^{-10} & 1 \\ 0 & 2 - 10^{10} \end{pmatrix} = \begin{pmatrix} 10^{-10} & 1 \\ 0 & -10^{10} \end{pmatrix}.$$

The growth factor for this problem is then

$$\rho = \frac{\max(\alpha, \alpha_1)}{\alpha} = \frac{\max(2, 10^{10})}{2} = \frac{10^{10}}{2},$$

which is quite large. Thus, if we now proceed to solve a linear system with this reduced upper triangular matrix, we cannot expect a small error matrix $E$. Indeed, if we wish to solve

$$\begin{aligned}
10^{-10}x_1 + x_2 &= 1 \\
x_1 + 2x_2 &= 3
\end{aligned}$$

using the above $A^{(1)}$, then the computed solution will be $x_1 = 0$, $x_2 = 1$, whereas the exact solution is $x_1 = x_2 = 1$. This shows that Gaussian elimination is unstable for an arbitrary linear system.

Remark: Gaussian elimination without pivoting, however, is not unstable for all matrices. There are certain classes of matrices, such as symmetric positive definite matrices, for which Gaussian elimination is stable. We shall discuss this special case in Chapter 6 in some detail.
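The damage is easy to observe even in IEEE double precision, where it is milder than in the 9-digit arithmetic above but still far exceeds the rounding unit. A small Python sketch (the helper function is ours), solving the $2 \times 2$ system with and without a row interchange:

    def solve2(a11, a12, a21, a22, b1, b2):
        """One step of Gaussian elimination on a 2x2 system, then back
        substitution; no pivoting is performed."""
        m = -a21 / a11                 # the multiplier; huge when a11 is tiny
        a22 += m * a12
        b2 += m * b1
        x2 = b2 / a22
        x1 = (b1 - a12 * x2) / a11
        return x1, x2

    print(solve2(1e-10, 1.0, 1.0, 2.0, 1.0, 3.0))  # x1 wrong in the 8th digit
    print(solve2(1.0, 2.0, 1e-10, 1.0, 3.0, 1.0))  # rows interchanged: accurate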

If an algorithm is stable for a given matrix $A$, then one would like to see that the algorithm is stable for every matrix $A$ in a given class. Thus, we may give a formal definition of stability as follows:

Definition 3.2.4 An algorithm is stable for a class of matrices $\mathcal{C}$ if for every matrix $A$ in $\mathcal{C}$, the computed solution produced by the algorithm is the exact solution of a nearby problem.

Thus, for the linear system problem

$$Ax = b,$$

an algorithm is stable for a class of matrices $\mathcal{C}$ if for every $A \in \mathcal{C}$ and for each $b$, it produces a computed solution $\hat{x}$ that satisfies

$$(A + E)\hat{x} = b + \delta b$$

for some $E$ and $\delta b$, where $A + E$ is close to $A$ and $b + \delta b$ is close to $b$.

From the preceding discussion we should not form the opinion that if a stable algorithm is used to solve a problem, then the computed solution will be accurate. A property of the problem called conditioning also contributes to the accuracy or inaccuracy of the computed result.

3.3 Conditioning of the Problem and Perturbation Analysis

The conditioning of a problem is a property of the problem itself. It concerns how the solution of the problem will change if the input data contains some impurities. This concern arises from the fact that in practical applications the data very often come from experimental observations where the measurements can be subject to disturbances (or "noise"). There are other sources of error also, for example, round-off errors (discussed in Chapter 11), discretization errors, etc. Thus, when a numerical analyst has a problem in hand to solve, he or she must frequently solve the problem not with the original data, but with data that has been perturbed. The question naturally arises: what effects do these perturbations have on the solution?

A theoretical study done by numerical analysts to investigate these effects, which is independent of the particular algorithm used to solve the problem, is called perturbation analysis. This study helps one detect whether a given problem is "bad" or "good" in the sense of whether small perturbations in the data will create a large or small change in the solution. Specifically, we define:

Definition 3.3.1 A problem (with respect to a given set of data) is called an ill-conditioned or badly conditioned problem if a small relative error in the data causes a large relative error in the computed solution, regardless of the method of solution. Otherwise, it is called well-conditioned.

Suppose a problem $P$ is to be solved with an input $c$. Let $P(c)$ denote the computed value of the problem with the input $c$, and let $\delta c$ denote the perturbation in $c$. Then $P$ will be said to be ill-conditioned for the input data $c$ if the relative error in the answer,

$$\frac{|P(c + \delta c) - P(c)|}{|P(c)|},$$

is much larger than the relative error in the data,

$$\frac{|\delta c|}{|c|}.$$

Note: The definition of conditioning is data-dependent. Thus, a problem which is ill-conditioned for one set of data could be well-conditioned for another set.

3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution

As stated in the previous section, the conditioning of a problem is a property of the problem itself, and has nothing to do with the algorithm used to solve the problem. To a user, of course, the accuracy of the computed solution is of primary importance. However, the accuracy of a solution computed by a given algorithm is directly connected with both the stability of the algorithm and the conditioning of the problem. If the problem is ill-conditioned, no matter how stable the algorithm is, the accuracy of the computed solution cannot be guaranteed.

Note that the definition of backward stability does not say that the computed solution $\hat{x}$ produced by a backward stable algorithm will be close to the exact solution of the original problem. However, when a stable algorithm is applied to a well-conditioned problem, the computed solution should be near the exact solution.

The ill-conditioning of a problem contaminates the computed solution, even when a stable algorithm is used, and can therefore yield an unacceptable solution. When a computed solution is unsatisfactory, some users (who are not usually concerned with conditioning) tend to put the blame on the algorithm for the inaccuracy. To be fair, we should test an algorithm for stability only on well-conditioned matrices. If the algorithm passes the test of stability on well-conditioned matrices, then it should be declared a stable algorithm. However, if a "stable" algorithm is applied to an ill-conditioned problem, it should not introduce more error than the data warrants.

From the previous discussion, it is now quite clear that investigating the conditioning of a problem is very important.

The Condition Number of a Problem

Numerical analysts usually try to associate a number, called the condition number, with a problem. The condition number indicates whether the problem is ill- or well-conditioned. More specifically, the condition number gives a bound for the relative error in the solution when a small perturbation is applied to the input data.

In numerical linear algebra, condition numbers for many (but not all) problems have been identified. Unfortunately, computing the condition number is often more involved and time consuming than solving the problem itself. For example (as we shall see in Chapter 6), for the linear system problem $Ax = b$, the condition number is

$$\operatorname{Cond}(A) = \|A\|\,\|A^{-1}\|.$$

Thus, computing the condition number in this case involves computing the inverse of $A$, and it is more expensive to compute the inverse than to solve the system $Ax = b$. In Chapter 6 we shall discuss methods for estimating $\operatorname{Cond}(A)$ without explicitly computing $A^{-1}$.
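For experimentation, the condition number is easy to obtain; a short sketch using NumPy (the test matrix is our own, chosen nearly singular):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 3.999]])               # nearly singular: det = -0.001

    cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
    print(cond)                  # about 2.5e4
    print(np.linalg.cond(A))     # the same number, computed by the library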

We shall discuss the conditioning of each problem in detail in the relevant chapter. Before closing this section, however, let's mention several well-known examples of ill-conditioned problems.

An Ill-Conditioned Subtraction

Consider the subtraction $c = a - b$ with

$$a = 12354101, \qquad b = 12345678, \qquad c = a - b = 8423.$$

Now perturb $a$ in the sixth place:

$$\hat{a} = 12354001, \qquad \hat{c} = c + \delta c = \hat{a} - b = 8323.$$

Thus, a perturbation in the sixth digit of the input value caused a change in the second digit of the answer. Note that the relative error in the data is

$$\frac{a - \hat{a}}{a} = .000008,$$

while the relative error in the computed result is

$$\frac{c - \hat{c}}{c} = .0118722.$$

An Ill-Conditioned Root-Finding Problem

Consider solving the simple quadratic equation

$$f(x) = x^2 - 2x + 1 = 0.$$

The roots are $x = 1, 1$. Now perturb the coefficient 2 by 0.00001. The computed roots of the perturbed polynomial $\hat{f}(x) = x^2 - 2.00001x + 1$ are $x_1 = 1.0032$ and $x_2 = .9968$. The relative errors in $x_1$ and $x_2$ are .0032; the relative error in the data is $5 \times 10^{-6}$.

3.5 The Wilkinson Polynomial

The above example involved multiple roots. Multiple roots, or roots close to each other, invariably make the root-finding problem ill-conditioned; however, the problem can be ill-conditioned even when the roots are very well separated. Consider the following well-known example by Wilkinson (see also Forsythe, Malcolm and Moler CMMC, pp. 18-19):

$$P(x) = (x - 1)(x - 2)\cdots(x - 20) = x^{20} - 210x^{19} + \cdots$$

The zeros of $P(x)$ are $1, 2, \ldots, 20$ and are distinct. Now perturb the coefficient of $x^{19}$ from $-210$ to $-210 + 2^{-23}$, leaving the other coefficients unchanged. Wilkinson used a binary computer with $t = 30$; this change therefore signified a change far out in the significant base 2 digits. The roots of the perturbed polynomial, carefully computed by Wilkinson, were found to be (reproduced from CMMC, p. 18):

$$\begin{array}{ll}
1.00000\ 0000 & 10.09526\ 6145 \pm 0.64350\ 0904i \\
2.00000\ 0000 & 11.79363\ 3881 \pm 1.65232\ 9728i \\
3.00000\ 0000 & 13.99235\ 8137 \pm 2.51883\ 0070i \\
4.00000\ 0000 & 16.73073\ 7466 \pm 2.81262\ 4894i \\
4.99999\ 9928 & 19.50243\ 9400 \pm 1.94033\ 0347i \\
6.00000\ 6944 & \\
6.99969\ 7234 & \\
8.00726\ 7603 & \\
8.91725\ 0249 & \\
20.84690\ 8101 &
\end{array}$$

The table shows that certain zeros are more sensitive to the perturbation than others. The following analysis, due to Wilkinson (see also Forsythe, Malcolm and Moler CMMC, p. 19), attempts to explain this phenomenon.

Let the perturbed polynomial be

$$P(x, \alpha) = x^{20} - (210 + \alpha)x^{19} + \cdots$$

Then the quantity $\dfrac{\partial x}{\partial \alpha}$ evaluated at $x = i$ measures the sensitivity of the root $x = i$, $i = 1, 2, \ldots, 20$. To compute this number, differentiate the equation $P(x, \alpha) = 0$ with respect to $\alpha$:

$$\frac{\partial x}{\partial \alpha} = \frac{-\partial P/\partial \alpha}{\partial P/\partial x} = \frac{x^{19}}{\displaystyle\prod_{\substack{j = 1 \\ j \ne i}}^{20}(x - j)} \qquad \text{at } x = i.$$

The values of $\partial x/\partial \alpha$ at $x = i$ are listed below (see also CMMC, p. 19):

$$\begin{array}{ll|ll}
i & \partial x/\partial \alpha & i & \partial x/\partial \alpha \\
\hline
1 & -8.2 \times 10^{-18} & 11 & -4.6 \times 10^{7} \\
2 & 8.2 \times 10^{-11} & 12 & 2.0 \times 10^{8} \\
3 & -1.6 \times 10^{-6} & 13 & -6.1 \times 10^{8} \\
4 & 2.2 \times 10^{-3} & 14 & 1.3 \times 10^{9} \\
5 & -6.1 \times 10^{-1} & 15 & -2.1 \times 10^{9} \\
6 & 5.8 \times 10^{1} & 16 & 2.4 \times 10^{9} \\
7 & -2.5 \times 10^{3} & 17 & -1.9 \times 10^{9} \\
8 & 6.0 \times 10^{4} & 18 & 1.0 \times 10^{9} \\
9 & -8.3 \times 10^{5} & 19 & -3.1 \times 10^{8} \\
10 & 7.6 \times 10^{6} & 20 & 4.3 \times 10^{7}
\end{array}$$
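These sensitivities are trivial to reproduce; a short Python sketch of the formula above:

    def sensitivity(i):
        """dx/dalpha at the root x = i:  i^19 / prod_{j != i} (i - j)."""
        prod = 1.0
        for j in range(1, 21):
            if j != i:
                prod *= (i - j)
        return i ** 19 / prod

    for i in (1, 5, 10, 20):
        print(i, sensitivity(i))
    # 1 -8.2e-18,  5 -0.61,  10 7.6e+06,  20 4.3e+07  (approximately)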


Root-finding and Eigenvalue Computation

The above examples teach us a very useful lesson: it is not a good idea to compute the eigenvalues of a matrix by explicitly finding the coefficients of the characteristic polynomial and evaluating its zeros, since the round-off errors in the computations will invariably put some small perturbations into the computed coefficients of the characteristic polynomial, and these small perturbations in the coefficients may cause large changes in the zeros. The eigenvalues will then be computed inaccurately.

3.6 An Ill-conditioned Linear System Problem

The matrix

$$H = \begin{pmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n} \\[2pt]
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots & \frac{1}{n+1} \\[2pt]
\vdots & & \ddots & & \vdots \\[2pt]
\frac{1}{n} & \frac{1}{n+1} & \cdots & & \frac{1}{2n-1}
\end{pmatrix}$$

is called the Hilbert matrix after the celebrated mathematician David Hilbert. The linear system problem, even with a Hilbert matrix of moderate order, is extremely ill-conditioned. For example, take $n = 5$ and consider solving

$$Hx = b,$$

where $b \approx (2.2833, 1.4500, 1.0929, 0.8845, 0.7456)^T$. The exact solution is $x = (1, 1, 1, 1, 1)^T$. Now perturb the $(5, 1)$th element of $H$ in the fifth place to obtain .20001. The computed solution with this very slightly perturbed matrix is $(0.9937, 1.2857, -0.2855, 2.9997, 0.0001)^T$. Note that $\operatorname{Cond}(H) = O(10^5)$.

For more examples of ill-conditioned linear system problems, see Chapter 6.
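The experiment is easy to repeat with NumPy; a small sketch (we build $b$ as the row sums of $H$, so that the exact solution is the vector of ones):

    import numpy as np

    n = 5
    H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    b = H.sum(axis=1)                  # exact solution of Hx = b is (1,...,1)^T

    print(np.linalg.cond(H))           # about 4.8e5
    print(np.linalg.solve(H, b))       # very close to ones

    Hp = H.copy()
    Hp[4, 0] = 0.20001                 # perturb the (5,1) entry in the fifth place
    print(np.linalg.solve(Hp, b))      # a drastically different solution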

3.7 Examples of Ill-conditioned Eigenvalue Problems

Example 3.7.1

Consider the $10 \times 10$ matrix:

$$A = \begin{pmatrix}
1 & 1 & & & \\
& 1 & 1 & & 0 \\
& & \ddots & \ddots & \\
& 0 & & \ddots & 1 \\
& & & & 1
\end{pmatrix}.$$

The eigenvalues of $A$ are all 1. Now perturb the $(10, 1)$ entry of $A$ by the small quantity $\epsilon = 10^{-10}$. The eigenvalues of the perturbed matrix, computed using the software MATLAB to be described in the next chapter (which uses a numerically effective eigenvalue-computation algorithm), were found to be:

$$\begin{array}{l}
0.8991 \\
1.0184 \pm 0.0980i \\
0.9506 \pm 0.0876i \\
1.0764 \pm 0.0632i \\
0.9051 \pm 0.0350i \\
1.0999
\end{array}$$

(Note the change in the eigenvalues.)

Example 3.7.2 The Wilkinson-Bidiagonal Matrix

Again, it should not be thought that an eigenvalue problem can be ill-conditioned only when the eigenvalues are multiple or close to each other. An eigenvalue problem with well-separated eigenvalues can be very ill-conditioned too. Consider the $20 \times 20$ triangular matrix (known as the Wilkinson-bidiagonal matrix):

$$A = \begin{pmatrix}
20 & 20 & & & \\
& 19 & 20 & & 0 \\
& & \ddots & \ddots & \\
& 0 & & \ddots & 20 \\
& & & & 1
\end{pmatrix}.$$

The eigenvalues of $A$ are $1, 2, \ldots, 20$. Now perturb the $(20, 1)$ entry of $A$ by $\epsilon = 10^{-10}$. If the eigenvalues of this slightly perturbed matrix are computed using a stable algorithm (such as the QR iteration method to be described in Chapter 8), it will be seen that some of them change drastically; some even become complex.

In this case also, certain eigenvalues are more ill-conditioned than others. To explain this, Wilkinson computed the condition number of each of the eigenvalues. The condition number of the eigenvalue $\lambda_i$ of $A$ is defined to be (see Chapter 8):

$$\operatorname{Cond}(\lambda_i) = \frac{1}{|y_i^Tx_i|},$$

where $y_i$ and $x_i$ are, respectively, the normalized left and right eigenvectors of $A$ corresponding to the eigenvalue $\lambda_i$. (Recall that $x$ is a right eigenvector of $A$ associated with an eigenvalue $\lambda$ if $Ax = \lambda x$, $x \ne 0$; similarly, $y$ is a left eigenvector associated with $\lambda$ if $y^TA = \lambda y^T$.)

In our case, the right eigenvector $x_r$ corresponding to $\lambda_r = r$ has components (see Wilkinson AEP, p. 91)

$$1,\ \frac{20 - r}{-20},\ \frac{(20 - r)(19 - r)}{(-20)^2},\ \ldots,\ \frac{(20 - r)!}{(-20)^{20-r}},\ 0, \ldots, 0,$$

while the components of $y_r$ are

$$0, 0, \ldots, 0,\ \frac{(r - 1)!}{20^{r-1}},\ \ldots,\ \frac{(r - 1)(r - 2)}{20^2},\ \frac{r - 1}{20},\ 1.$$

These vectors are not quite normalized, but still, the reciprocal of their inner products gives us an estimate of the condition numbers. In fact, $K_r$, the condition number of the eigenvalue $\lambda = r$, is

$$K_r = \frac{1}{|y_r^Tx_r|} = \frac{20^{19}}{(20 - r)!\,(r - 1)!}.$$

The number $K_r$ is large for all values of $r$. The smallest $K_r$ for the Wilkinson matrix are $K_1 = K_{20} \approx 4.31 \times 10^7$, and the largest ones are $K_{10} = K_{11} \approx 3.98 \times 10^{12}$.
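The formula is easy to tabulate; a one-function Python sketch:

    import math

    def K(r):
        """Condition number of the eigenvalue lambda = r of the 20 x 20
        Wilkinson bidiagonal matrix: 20^19 / ((20-r)! (r-1)!)."""
        return 20.0 ** 19 / (math.factorial(20 - r) * math.factorial(r - 1))

    print(K(1), K(20))    # both about 4.31e7
    print(K(10), K(11))   # both about 3.98e12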

Example 3.7.3 (Wilkinson AEP, p. 92)

$$A = \begin{pmatrix}
n & n-1 & n-2 & \cdots & 3 & 2 & 1 \\
n-1 & n-1 & n-2 & \cdots & 3 & 2 & 1 \\
0 & n-2 & n-2 & \ddots & & \vdots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 2 & \vdots \\
\vdots & & & \ddots & 2 & 2 & 1 \\
0 & \cdots & & & 0 & 1 & 1
\end{pmatrix}.$$

As $n$ increases, the smallest eigenvalues become progressively ill-conditioned. For example, when $n = 12$, the condition numbers of the first few eigenvalues are of order unity, while those of the last three are of order $10^7$.

3.8 Strong, Weak and Mild Stability

While establishing the stability of an algorithm by backward error analysis, we sometimes get much more than the above definition of stability calls for. For example, it can be shown that when Gaussian elimination with partial or complete pivoting (for discussions of partial and complete pivoting, see Chapter 5) is applied to solve a nonsingular system $Ax = b$, the computed solution $\hat{x}$ not only satisfies

$$(A + E)\hat{x} = b$$

with a small error matrix $E$, but $A + E$ is also nonsingular. The standard definition of stability, of course, does not require that.

On the other hand, if Gaussian elimination without pivoting is applied to a symmetric positive definite system, the computed solution $\hat{x}$ satisfies $(A + E)\hat{x} = b$ with a small error matrix $E$, but the standard backward error analysis does not show that $A + E$ is symmetric positive definite.

Thus, we may talk about two types of stable algorithms: one type giving not only a small error matrix but also a perturbed matrix $A + E$ belonging to the same class as the matrix $A$ itself, and the other type just giving a small error matrix without any restriction on $A + E$.

To distinguish between these two types of stability, Bunch (1987) has introduced in the literature the concept of strong stability. Following Bunch we define:

Definition 3.8.1 An algorithm for solving the linear system problem $Ax = b$ is strongly stable for a class of matrices $\mathcal{C}$ if, for each $A$ in $\mathcal{C}$, the computed solution is the exact solution of a nearby problem, and the matrix $A + E$ also belongs to $\mathcal{C}$.

Examples (Bunch).

1. Gaussian elimination with pivoting is strongly stable on the class of nonsingular matrices.

2. The Cholesky algorithm for computing the factorization of a symmetric positive definite matrix in the form $A = HH^T$ is strongly stable on the class of symmetric positive definite matrices. (See Chapter 6 for a description of the Cholesky algorithm.)

3. Gaussian elimination without pivoting is strongly stable on the class of nonsingular diagonally dominant matrices.

James R. Bunch is a professor of mathematics at the University of California at San Diego. He is well known for his work on efficient factorization of symmetric matrices (popularly known as the Bunch-Kaufman and the Bunch-Parlett factorization procedures), and for his work on stability and conditioning.

By analogy with this definition of strong stability, Bunch also introduced the concept of weak stability, which depends upon the conditioning of the problem.

Definition 3.8.2 An algorithm is weakly stable for a class of matrices $\mathcal{C}$ if for each well-conditioned matrix in $\mathcal{C}$ the algorithm produces an acceptably accurate solution.

Thus, an algorithm for solving the linear system $Ax = b$ is weakly stable for a class of matrices $\mathcal{C}$ if for each well-conditioned matrix $A$ in $\mathcal{C}$ and for each $b$, the computed solution $\hat{x}$ of $Ax = b$ is such that

$$\frac{\|x - \hat{x}\|}{\|x\|} \text{ is small.}$$

Bunch was motivated to introduce this definition to point out that the well-known (and frequently used by engineers) Levinson algorithm for solving linear systems involving Toeplitz matrices (a matrix $T = (t_{ij})$ is Toeplitz if the entries along each diagonal are the same) is weakly stable on the class of symmetric positive definite Toeplitz matrices. This very important and remarkable result was proved by Cybenko (1980). The result was important because the signal processing community had been using the Levinson algorithm routinely for years, without fully investigating the stability behavior of this important algorithm.

1. If an algorithm is strongly stable, it is necessarily stable.

2. Note that stability implies weak stability. Weak stability is good enough for most users.

3. If a numerical analyst can prove that a certain algorithm is not weakly stable, then it follows that the algorithm is not stable, because "not weakly stable" implies "not stable".

Mild Stability: We have defined an algorithm to be backward stable if the algorithm produces a solution that is an exact solution of a nearby problem. But it might very well happen that an algorithm produces a solution that is only close to the exact solution of a nearby problem.

George Cybenko is a professor of electrical engineering and computer science at Dartmouth College. He has made substantial contributions in numerical linear algebra and signal processing.

How should we then describe such an algorithm?

Van Dooren, following de Jong (1977), has called such an algorithm a mixed stable algorithm, and Stewart (IMC, 1973) has defined such an algorithm as simply a stable algorithm, under the additional restriction that the data of the nearby problem and the original data belong to the same class.

We believe that it is more appropriate to call such stability mild stability. After all, such an algorithm is stable in a mild sense. We thus define:

Definition 3.8.3 An algorithm is mildly stable if it produces a solution that is close to the exact solution of a nearby problem.

Example 3.8.1

1. The QR algorithm for the rank-deficient least squares problem is mildly stable (see Lawson and Hanson SLP, p. 95, and Chapter 7 of this book).

2. The QR algorithm for the full-rank underdetermined least squares problem is mildly stable (see Lawson and Hanson SLP, p. 93, and Chapter 7 of this book).

3.9 Review and Summary

In this chapter we have introduced two of the most important concepts in numerical linear algebra, namely, the conditioning of the problem and the stability of the algorithm, and have discussed how they affect the accuracy of the solution.

1. Conditioning of the Problem: The conditioning of the problem is a property of the problem. A problem is said to be ill-conditioned if a small change in the data causes a large change in the solution; otherwise it is well-conditioned.

The conditioning of a problem is data dependent. A problem can be ill-conditioned with respect to one set of data while being quite well-conditioned with respect to another set.

Ill-conditioning or well-conditioning of a matrix problem is generally measured by means of a number called the condition number. The condition number for the linear system problem $Ax = b$ is $\|A\|\,\|A^{-1}\|$.

Well-known examples of ill-conditioned problems are: the Wilkinson polynomial for the root-finding problem, the Wilkinson bidiagonal matrix for the eigenvalue problem, the Hilbert matrix for the linear system problem, etc.

Paul Van Dooren is a professor of electrical engineering at the University of Illinois at Urbana-Champaign. He has received several prestigious awards and fellowships, including the Householder award and the Wilkinson fellowship, for his important contributions to numerical linear algebra, which have turned out to be extremely valuable for solving computational problems arising in control and systems theory and signal processing.

2. Stability of an Algorithm: An algorithm is said to be a backward stable algorithm if it computes the exact solution of a nearby problem. Some examples of stable algorithms are: back substitution and forward elimination for triangular systems, Gaussian elimination with pivoting for linear systems, QR factorization using Householder and Givens transformations, the QR iteration algorithm for eigenvalue computations, etc.

The Gaussian elimination algorithm without row interchanges is unstable for arbitrary matrices.

3. Effects of conditioning and stability on the accuracy of the solution: The conditioning of the problem and the stability of the algorithm both affect the accuracy of the solution computed by the algorithm.

If a stable algorithm is applied to a well-conditioned problem, it should compute an accurate solution. On the other hand, if a stable algorithm is applied to an ill-conditioned problem, there is no guarantee that the computed solution will be accurate; the definition of backward stability does not imply that. However, a stable algorithm applied to an ill-conditioned problem should not introduce more error than the data warrants.

3.10 Suggestions for Further Reading

For computer codes of standard matrix computations, see the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988. The concepts of stability and conditioning are very thoroughly discussed in the book An Introduction to Matrix Computations by G. W. Stewart, Academic Press, New York, 1973 (Chapter 2). Note that Stewart's definition of backward stability is slightly different from the usual definition of backward stability introduced by Wilkinson. We also strongly suggest that the reader consult an illuminating paper in this area by James R. Bunch: "The Weak and Strong Stability of Algorithms in Numerical Linear Algebra," Lin. Alg. Appl., vol. 88-89, 1987, pp. 49-66.

Wilkinson's AEP is a rich source of knowledge for results on the backward stability of matrix algorithms.

An important paper discussing notions and concepts of different types of stability in general is the paper by L. S. de Jong (1977).

Exercises on Chapter 3

Note: Use MATLAB (see Chapter 4 and the Appendix) whenever appropriate.

1. (a) Show that the floating point computations of the sum, product and division of two numbers are backward stable.

(b) Are the floating point computations of the inner and outer products of two vectors backward stable? Give reasons for your answer.

2. Find the growth factor of Gaussian elimination without pivoting for the following matrices:

$$\begin{pmatrix} .00001 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 1 \\ .00001 & 1 \end{pmatrix}, \qquad
\begin{pmatrix}
1 & -1 & \cdots & -1 \\
& 1 & \ddots & \vdots \\
& & \ddots & -1 \\
0 & & & 1
\end{pmatrix}, \qquad
\begin{pmatrix} 1 & 1 \\ 1 & 10^{10} \end{pmatrix}.$$

3. Determine which of the following matrices are ill-conditioned:

$$\begin{pmatrix} 1 & 1 \\ 1 & .9 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 1 \\ 1 & .99 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 1 \\ 1 & .999 \end{pmatrix}, \qquad
\begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix},$$

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & .9 & .81 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\[2pt] \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\[2pt] \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.$$

4. Show that $\operatorname{Cond}(cA) = \operatorname{Cond}(A)$ for all nonzero scalars $c$. Show that if $\|I\| \ge 1$, then $\operatorname{Cond}(A) \ge 1$.

5. Prove that

$$\frac{\|fl(AB) - AB\|_F}{\|AB\|_F} \le n\mu\operatorname{Cond}_F(B) + O(\mu^2),$$

where $A$ and $B$ are matrices and $B$ is nonsingular. $\bigl(\operatorname{Cond}_F(B) = \|B\|_F\,\|B^{-1}\|_F.\bigr)$

6. Are the following floating point computations backward stable? Give reasons for your answer in each case.

(a) $fl(x(y + z))$

(b) $fl(x_1 + x_2 + \cdots + x_n)$

(c) $fl(x_1x_2\cdots x_n)$

(d) $fl(x^Ty/c)$, where $x$ and $y$ are vectors and $c$ is a scalar

(e) $fl\bigl(\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}\bigr)$

7. Find the growth factor of Gaussian elimination for each of the following matrices, and hence conclude that Gaussian elimination for linear systems with these matrices is backward stable.

(a) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$

(b) $\begin{pmatrix} 4 & 0 & 2 \\ 0 & 4 & 0 \\ 2 & 0 & 5 \end{pmatrix}$

(c) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 15 & 5 \\ 1 & 5 & 14 \end{pmatrix}$

8. Show that Gaussian elimination without pivoting for the matrix

$$\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$$

is strongly stable.

9. Let $H$ be an unreduced upper Hessenberg matrix. Find a diagonal matrix $D$ such that $D^{-1}HD$ is a normalized upper Hessenberg matrix (that is, all subdiagonal entries are 1). Show that the transforming matrix $D$ must be ill-conditioned if one or several subdiagonal entries of $H$ are very small. Do a numerical example of order 5 to verify this.

10. Show that the roots of the following polynomials are ill-conditioned:

(a) $x^3 - 3x^2 + 3x - 1$

(b) $(x - 1)^3(x - 2)$

(c) $(x - 1)(x - .99)(x - 2)$

11. Using the result of problem #5, show that matrix-vector multiplication with an ill-conditioned matrix may give rise to a large relative error in the computed result. Construct your own $2 \times 2$ example to see this.

12. Write the following small SUBROUTINES for future use:

(1) MPRINT(A,n) to print a square matrix A of order n.

(2) TRANS(A,TRANS,n) to compute the transpose of a matrix.

(3) TRANS(A,n) to compute the transpose of a matrix where the transpose overwrites A.

(4) MMULT(A,B,C,m,n,p) to multiply $C = A_{m \times n}B_{n \times p}$.

(5) SDOT(n,x,y,answer) to compute the inner product of two n-vectors x and y in single precision.

(6) SAXPY(n,a,x,y) to compute $y \leftarrow ax + y$ in single precision, where a is a scalar and x and y are vectors. (The symbol $y \leftarrow ax + y$ means that the computed result of a times x plus y will be stored in y.)

(7) IMAX(n,x,MAX) to find $|x_i| = \max\{|x_j| : j = 1, \ldots, n\}$.

(8) SWAP(n,x,y) to swap two vectors x and y.

(9) COPY(n,x,y) to copy a vector x to y.

(10) NORM2(n,x,norm) to find the Euclidean length of a vector x.

(11) SASUM(n,x,sum) to find $\text{sum} = \sum_{i=1}^{n} |x_i|$.

(12) NRMI(x,n) to compute the infinity norm of an n-vector x.

(13) Rewrite the above routines in double precision.

(14) SNORM(m,n,A,LDA) to compute the 1-norm of a matrix $A_{m \times n}$. LDA is the leading dimension of the array A.

(15) Write subroutines to compute the infinity and Frobenius norms of a matrix.

(16) Write a subroutine to find the largest element in magnitude in a column vector.

(17) Write a subroutine to find the largest element in magnitude in a matrix.

(Note: Some of these subroutines are a part of the BLAS (LINPACK). See also the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988.)


4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE

4.1 Definitions and Examples
4.2 Flop-Count and Storage Considerations for Some Basic Algorithms
4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems
  4.3.1 LINPACK
  4.3.2 EISPACK
  4.3.3 LAPACK
  4.3.4 NETLIB
  4.3.5 NAG
  4.3.6 IMSL
  4.3.7 MATLAB
  4.3.8 MATLAB Codes and MATLAB Toolkit
  4.3.9 The ACM Library
  4.3.10 ITPACK (Iterative Software Package)
4.4 Review and Summary
4.5 Suggestions for Further Reading

CHAPTER 4

NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE

4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE

4.1 Definitions and Examples
Solving a problem on a computer involves the following major steps, performed in sequence:

1. Making a mathematical model of the problem, that is, translating the problem into the language of mathematics. For example, the mathematical models of many engineering problems are sets of ordinary and partial differential equations.

2. Finding or developing constructive methods (theoretical numerical algorithms) for solving the mathematical model. This step usually consists of a literature search to find what methods are available for the problem.

3. Identifying the best method from a numerical point of view (the best one may be a combination of several others). We call it the numerically effective method.

4. Finally, implementing on the computer the numerically effective method identified in step 3. This amounts to writing and executing a reliable and efficient computer program based on the identified numerically effective method, and may also require exploiting the architecture of the target computer.
The purpose of creating mathematical software is to provide a scientist or engineer with a piece of computer program that he or she can use with confidence to solve the problem for which the software was designed. Thus, mathematical software should be of high quality.

Let's be specific about what we mean by high quality mathematical software. High quality mathematical software should have the following features. It should be

1. Powerful and flexible: it can be used to solve several different variations of the original problem and the closely associated problems. For example, closely associated with the linear system problem Ax = b are:

(a) Computing the inverse of A, i.e., finding an X such that AX = I. Though finding the inverse of A and solving Ax = b are equivalent problems, solving a linear system by means of the inverse of the system matrix is not advisable. Computing the inverse explicitly should be avoided unless a specific application really calls for it.

(b) Finding the determinant and rank of A.

(c) Solving AX = B, where B is a matrix, etc.

Also, a matrix problem may have some special structure. The matrix may be positive definite, banded, Toeplitz, dense, sparse, etc. The software should state clearly what variations of the problem it can handle and whether it is oriented toward a special structure.
2. Easy to read and modify: the software should be well documented. The documentation should be clear and easy to read, even for a non-technical user, so that if some modifications are needed, they can be made easily. To quote from the cover page of Forsythe, Malcolm and Moler (CMMC):

"... it is an order of magnitude easier to write two good subroutines than to decide which one is best. In choosing among the various subroutines available for a particular problem, we placed considerable emphasis on the clarity and style of programming. If several subroutines have comparable accuracy, reliability, and efficiency, we have chosen the one that is the least difficult to read and use."
3. Portable: it should be able to run on different computers with few or no changes.

4. Robust: it should be able to deal with unexpected situations during execution.

5. Based on a numerically effective algorithm: it should be based on an algorithm that has attractive numerical properties.
We have used the expression "numerically effective" several times without qualification. This is the most important component of high quality mathematical software. We shall call a matrix algorithm numerically effective if it is:

(a) General purpose: the algorithm should work for a wide class of matrices.

(b) Reliable: the algorithm should give a warning whenever it is on the verge of breakdown, due to excessive round-off errors or to not being able to meet some specified criterion of convergence. There are algorithms which produce completely wrong answers without giving any warning at all. Gaussian elimination without pivoting (for the linear system or an equivalent problem) is one such algorithm. It is not reliable.

(c) Stable: the total rounding errors of the algorithm should not exceed the errors that are inherent in the original problem (see the earlier section on stability).

(d) Efficient: the efficiency of an algorithm is measured by the amount of computer time consumed in its implementation. Theoretically, the number of floating-point operations needed to implement the algorithm indicates its efficiency.

Definition 4.1.1 A floating-point operation, or flop, is the amount of computer time required to execute the Fortran statement

A(I,J) = A(I,J) + T*A(K,J)

This statement involves one floating-point multiplication and one floating-point addition, together with a little indexing and storage manipulation. Similarly, one division coupled with an addition or subtraction will be counted as one flop. This definition of a flop has been used in the popular software package LINPACK (this package is briefly described in Section 4.3.1).

With the advent of supercomputing technology, there is a tendency to count an addition or subtraction as a flop as well. This definition of a flop has been adopted in the second edition of the book by Golub and Van Loan (MC 1989). However, we have decided to stick to the original LINPACK definition. Note that if an addition (or subtraction) is counted as a flop, then the "new flop" has twice the value of the "old flop".
Definition 4.1.2 A matrix algorithm involving computation with matrices of order n will be called an efficient algorithm if it takes no more than O(n^3) flops. (The historical Cramer's rule for solving a linear system is therefore not efficient, since O(n!) flops are required for its execution. See Chapter 6.)
One point is well worth mentioning here. An algorithm may be efficient, but still unstable. For example, Gaussian elimination without pivoting requires about n^3/3 flops for an n × n matrix. Therefore, while it is efficient, it is unreliable and unstable for an arbitrary matrix.
(e) Economic in the use of storage: usually, about n^2 storage locations are required to store a dense matrix of order n. Therefore, if an algorithm requires the storage of several matrices during its execution, a large number of storage locations will be needed even when n is moderate. Thus, it is important to give special attention to economy of storage while designing an algorithm.

By carefully rearranging an algorithm, one can greatly reduce its storage requirement (examples of this will be presented later). In general, if a matrix generated during execution of the algorithm is not needed for future use, it should be overwritten by another computed element.

We will use the notation

a ← b

to denote that "b overwrites a". Similarly, if two computed quantities a and b are interchanged, this will be written symbolically as

a ↔ b.

Numerical linear algebra often deals with triangular matrices, which can be stored using only n(n+1)/2 locations rather than n^2 locations. This useful fact should be kept in mind while designing an algorithm in numerical linear algebra, so that the extra available space can be used for something else.
Sparse matrices, in general, have many zero entries. A convenient scheme for storing a sparse matrix is one in which only the nonzero entries are stored.

4.2 Flop-Count and Storage Considerations for Some Basic Algorithms

In the following, we illustrate the flop-count and storage schemes for some simple basic matrix computations with dense matrices. The numbers of multiplications and additions in a matrix algorithm are roughly the same. Thus, a count of only the number of multiplications in a matrix algorithm gives us an idea of the total flop-count for that algorithm. Also, the counts involving zero elements can be omitted.
Example 4.2.1 Inner Product Computation

Let x = (x_1, x_2, ..., x_n)^T and y = (y_1, y_2, ..., y_n)^T be two n-vectors. Then the inner product

z = x^T y = \sum_{i=1}^{n} x_i y_i

can be computed as (Algorithm 3.1.2):

For i = 1, 2, ..., n do
    z = z + x_i y_i

Just one flop is needed for each i. Thus a total of n flops is needed to execute the algorithm.
Example 4.2.2 Outer Product Computation

The outer product xy^T is an n × n matrix Z, as shown below. The (i,j)th component of the matrix is x_i y_j. Since there are n^2 components and each component requires one multiplication, the outer-product computation requires n^2 flops and n^2 storage locations. However, very often one does not require the matrix from the outer product explicitly.

Z = xy^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} (y_1 \; y_2 \; \cdots \; y_n) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & & & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_n \end{pmatrix}.
Example 4.2.3 Matrix-Vector Product

Let A = (a_{ij}) be an n × n matrix and let b = (b_1, b_2, ..., b_n)^T be an n-vector. Then

Ab = \begin{pmatrix} a_{11} b_1 + a_{12} b_2 + \cdots + a_{1n} b_n \\ a_{21} b_1 + a_{22} b_2 + \cdots + a_{2n} b_n \\ \vdots \\ a_{n1} b_1 + a_{n2} b_2 + \cdots + a_{nn} b_n \end{pmatrix}.

Flop-count. Each component of the vector Ab requires n multiplications and additions. Since there are n components, the computation of Ab requires n^2 flops.

Example 4.2.4 Product of Two Upper Triangular Matrices

Let U = (u_{ij}) and V = (v_{ij}) be two upper triangular matrices of order n. Then the following algorithm computes the product C = UV. The algorithm overwrites V with the product UV.

For i = 1, 2, ..., n do
    For j = i, i+1, ..., n do
        v_{ij} ← c_{ij} = \sum_{k=i}^{j} u_{ik} v_{kj}

An Explanation of the Above Pseudocode

Note that in the above pseudocode j represents the inner-loop and i represents the outer-loop.

For each value of i from 1 to n, j takes the values i through n.

Flop-Count.

1. Computing c_{ij} requires (j - i + 1) multiplications.

2. Since j runs from i to n and i runs from 1 to n, the total number of multiplications is

\sum_{i=1}^{n} \sum_{j=i}^{n} (j - i + 1) = \sum_{i=1}^{n} (1 + 2 + \cdots + (n - i + 1)) = \sum_{i=1}^{n} \frac{(n-i+1)(n-i+2)}{2} \approx \frac{n^3}{6} (for large n).

(Recall that 1 + 2 + \cdots + r = r(r+1)/2.)

For large n, the product of two n × n upper triangular matrices requires about n^3/6 flops.
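A minimal MATLAB sketch of this in-place triangular product follows; it is our own illustration, and the built-in product is kept only to check the result.

n = 5;
U = triu(rand(n));  V = triu(rand(n));
W = U*V;                         % reference result, for checking only
for i = 1:n
    for j = i:n
        s = 0;
        for k = i:j              % only the nonzero terms contribute
            s = s + U(i,k)*V(k,j);
        end
        V(i,j) = s;              % safe: old V(i,j) is not needed again
    end
end
norm(V - W)                      % should be of the order of roundoff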

Example 4.2.5 Matrix-Matrix Product

Let A be an m × n matrix and B an n × p matrix. Then the following algorithm computes the product C = AB.

For i = 1, 2, ..., m do
    For j = 1, 2, ..., p do
        c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

Flop-count. There are n multiplications in computing each c_{ij}. Since j runs from 1 to p and i runs from 1 to m, the total number of multiplications is mnp. Thus, for two square matrices A and B, each of order n, this count is n^3.

Let A be m × n and B be n × p. Then computing C = AB requires mnp flops. In particular, it takes n^3 flops to compute the product of two n × n square matrices.
The algorithm above for matrix-matrix multiplication of two n × n matrices will obviously require n^2 storage locations for each matrix. However, it can be rewritten in such a way that there will be a substantial savings in storage, as illustrated below. The following algorithm overwrites B with the product AB, assuming that an additional column has been annexed to B (alternatively, one can use a work vector to hold values temporarily).

Example 4.2.6 Matrix-Matrix Product with Economy in Storage

For j = 1, 2, ..., n do

1. h_i = \sum_{k=1}^{n} a_{ik} b_{kj} (i = 1, 2, ..., n)

2. b_{ij} ← h_i (i = 1, 2, ..., n)

(h is a temporary work vector.)
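In MATLAB the scheme looks as follows; this is a sketch of ours, and the reference product C is kept only for checking.

n = 4;
A = rand(n);  B = rand(n);
C = A*B;                         % reference result, for checking only
h = zeros(n,1);                  % temporary work vector
for j = 1:n
    for i = 1:n                  % step 1: h = A times the j-th column of B
        h(i) = A(i,:)*B(:,j);
    end
    B(:,j) = h;                  % step 2: overwrite the j-th column of B
end
norm(B - C)                      % should be of the order of roundoff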

Example 4.2.7 Computation of (I - 2uu^T/u^Tu)A

Consider the computation of

\left( I - \frac{2uu^T}{u^Tu} \right) A,

where I is an m × m identity matrix, u is an m-vector, and A is m × n. The matrix I - 2uu^T/u^Tu is called a Householder matrix (see Chapter 5). Naively, one would form the matrix I - 2uu^T/u^Tu from the vector u, then form the matrix product with A explicitly. This will require O(n^3) flops. We show below that this matrix product can be performed implicitly with O(n^2) flops. The key observation here is that we do not need to form the matrix I - 2uu^T/u^Tu explicitly.
The following algorithm computes the product. The algorithm overwrites A with the product. Let

\beta = \frac{2}{u^T u}.

Then

\left( I - \frac{2uu^T}{u^Tu} \right) A becomes A - \beta uu^T A.

Let u = (u_1, u_2, ..., u_m)^T. Then the (i,j)th entry of (A - \beta uu^T A) is equal to

a_{ij} - \beta (u_1 a_{1j} + u_2 a_{2j} + \cdots + u_m a_{mj}) u_i.

Thus, we have the following algorithm.

Algorithm 4.2.1 Computing (I - 2uu^T/u^Tu)A

1. Compute \beta = 2/(u^T u).

2. For j = 1, 2, ..., n do

    \alpha = \beta (u_1 a_{1j} + u_2 a_{2j} + \cdots + u_m a_{mj})

    For i = 1, 2, ..., m do

        a_{ij} ← a_{ij} - \alpha u_i
Flop-Count.

1. It takes (m+1) flops to compute \beta (m flops to compute the inner product and 1 flop to divide 2 by the inner product).

2. There are n \alpha's, and each costs (m+1) flops. Thus, we need n(m+1) flops to compute the \alpha's.

3. There are mn entries a_{ij} to compute. Each a_{ij} costs just one flop, once the \alpha's are computed.

We now summarize the above very important result (which will be used repeatedly in this book) in the following:

Flop-count for the Product (I - 2uu^T/u^Tu)A

The product can be computed with only (m+1) + 2mn + n flops. In particular, if m = n, then it takes roughly 2n^2 flops, compared to n^3 flops if done naively.
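A MATLAB sketch of Algorithm 4.2.1 is given below; it is our own illustration, and the explicit Householder matrix H is formed only to check the implicit computation.

m = 6; n = 3;
u = rand(m,1);  A = rand(m,n);
H = eye(m) - 2*(u*u')/(u'*u);    % explicit matrix, for checking only
HA = H*A;
beta = 2/(u'*u);
for j = 1:n
    alpha = beta*(u'*A(:,j));    % one inner product per column
    A(:,j) = A(:,j) - alpha*u;   % update the j-th column
end
norm(A - HA)                     % should be of the order of roundoff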

A Numerical Example

Let

u = (1, 1, 1)^T, A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 0 & 0 \end{pmatrix}.

Then

\beta = \frac{2}{3}.

j = 1: (Compute the first column of the product.)

\alpha = \beta (u_1 a_{11} + u_2 a_{21} + u_3 a_{31}) = \frac{2}{3}(1 + 2 + 0) = 2

a_{11} ← a_{11} - \alpha u_1 = 1 - 2 = -1
a_{21} ← a_{21} - \alpha u_2 = 2 - 2 = 0
a_{31} ← a_{31} - \alpha u_3 = 0 - 2 = -2

j = 2: (Compute the second column of the product.)

\alpha = \beta (u_1 a_{12} + u_2 a_{22} + u_3 a_{32}) = \frac{2}{3}(1 + 1 + 0) = \frac{4}{3}

a_{12} ← a_{12} - \alpha u_1 = 1 - \frac{4}{3} = -\frac{1}{3}
a_{22} ← a_{22} - \alpha u_2 = 1 - \frac{4}{3} = -\frac{1}{3}
a_{32} ← a_{32} - \alpha u_3 = 0 - \frac{4}{3} = -\frac{4}{3}

Thus,

A ← \left( I - \frac{2uu^T}{u^Tu} \right) A = \begin{pmatrix} -1 & -\frac{1}{3} \\ 0 & -\frac{1}{3} \\ -2 & -\frac{4}{3} \end{pmatrix}.
Example 4.2.8 Flop-count for Algorithm 3.1.3 (Back Substitution Process)

From the pseudocode of this algorithm, we see that it takes one flop to compute y_n, two flops to compute y_{n-1}, and so on. Thus to compute y_1 through y_n, we need

1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \approx \frac{n^2}{2} flops.

It requires roughly n^2/2 flops to solve an upper triangular system using back substitution.
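For reference, a back-substitution sketch in MATLAB (our own illustration; the diagonal shift merely keeps the test system well conditioned):

n = 5;
T = triu(rand(n)) + n*eye(n);    % upper triangular test matrix
c = rand(n,1);
y = zeros(n,1);
for i = n:-1:1                   % about n^2/2 flops in total
    y(i) = (c(i) - T(i,i+1:n)*y(i+1:n)) / T(i,i);
end
norm(T*y - c)                    % should be small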

Example 4.2.9 Flop-count and Storage Considerations for Algorithm 3.1.4 (The Inverse of an Upper Triangular Matrix)

Let's state the algorithm once more here.

For k = n, n-1, ..., 2, 1 do

    s_{kk} = \frac{1}{t_{kk}}

    For i = k-1, k-2, ..., 1 do

        s_{ik} = -\frac{1}{t_{ii}} \sum_{j=i+1}^{k} t_{ij} s_{jk}

Flop-count.

k = 1: 1 flop
k = 2: 3 flops
k = 3: 6 flops
...
k = n: n(n+1)/2 flops

Total flops = 1 + 3 + 6 + \cdots + \frac{n(n+1)}{2} = \sum_{r=1}^{n} \frac{r(r+1)}{2} = \sum_{r=1}^{n} \frac{r^2}{2} + \sum_{r=1}^{n} \frac{r}{2} \approx \frac{n^3}{6}.

Flop-count for Computing the Inverse of an Upper Triangular Matrix

It requires about n^3/6 flops to compute the inverse of an upper triangular matrix.

Storage Considerations. Since the inverse of an upper triangular matrix is an upper triangular matrix, and since it is clear from the algorithm that we can overwrite t_{ik} by s_{ik}, the algorithm can be rewritten so that it overwrites T with the inverse S. Thus we can rewrite the algorithm as follows.

Computing the Inverse of an Upper Triangular Matrix with Economy in Storage

For k = n, n-1, ..., 2, 1 do

    t_{kk} ← s_{kk} = \frac{1}{t_{kk}}

    For i = k-1, k-2, ..., 1 do

        t_{ik} ← s_{ik} = -\frac{1}{t_{ii}} \sum_{j=i+1}^{k} t_{ij} s_{jk}
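The in-place version is easily expressed in MATLAB; the sketch below is ours, and the saved copy is used only to verify the computed inverse.

n = 5;
T = triu(rand(n)) + n*eye(n);
Tsave = T;                       % copy kept only for the final check
for k = n:-1:1
    T(k,k) = 1/T(k,k);
    for i = k-1:-1:1             % fill column k above the diagonal
        T(i,k) = -(T(i,i+1:k)*T(i+1:k,k)) / T(i,i);
    end
end
norm(Tsave*T - eye(n))           % should be of the order of roundoff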

Example 4.2.10 Flop-count and Storage Considerations for Algorithm 3.1.5 (Gaussian Elimination)

It will be shown in Chapter 5 that the Gaussian elimination algorithm takes about n^3/3 flops. The algorithm can overwrite A with the upper triangular matrix A^{(n-1)}; in fact, it can overwrite A with each A^{(k)}. The multipliers m_{ik} can be stored in the lower half of A as they are computed. Each b^{(k)} can overwrite the vector b.

Gaussian Elimination Algorithm with Economy in Storage

For k = 1, 2, ..., n-1 do

    For i = k+1, ..., n do

        a_{ik} ← m_{ik} = -\frac{a_{ik}}{a_{kk}}

        For j = k+1, ..., n do

            a_{ij} ← a_{ij} + m_{ik} a_{kj}

        b_i ← b_i + m_{ik} b_k
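In MATLAB the storage-economic elimination, followed by back substitution, reads as follows; this is a sketch of ours, and the diagonally dominant test matrix guarantees nonzero pivots.

n = 4;
A = rand(n) + n*eye(n);          % safe pivots, so no pivoting is needed
b = rand(n,1);
Asave = A; bsave = b;            % copies kept only for the final check
for k = 1:n-1
    for i = k+1:n
        A(i,k) = -A(i,k)/A(k,k);                  % multiplier m(i,k)
        A(i,k+1:n) = A(i,k+1:n) + A(i,k)*A(k,k+1:n);
        b(i) = b(i) + A(i,k)*b(k);
    end
end
x = zeros(n,1);                  % back substitution on the upper part
for i = n:-1:1
    x(i) = (b(i) - A(i,i+1:n)*x(i+1:n)) / A(i,i);
end
norm(Asave*x - bsave)            % should be small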

4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems
Several high quality mathematical software packages for various types of matrix computations are in existence. These include LINPACK, EISPACK, MATLAB, NETLIB, IMSL, NAG, and the most recently released LAPACK.
4.3.1 LINPACK

LINPACK is "a collection of Fortran subroutines which analyze and solve various systems of simultaneous linear algebraic equations. The subroutines are designed to be completely machine independent, fully portable, and to run at near optimum efficiency in most operating environments." (Quotation from the LINPACK Users' Guide.)

Though primarily intended for linear systems, the package also contains routines for the singular value decomposition (SVD) and for problems associated with linear systems, such as computing the inverse, the determinant, and the linear least squares problem. Most of the routines are for square matrices, but some handle the rectangular coefficient matrices associated with overdetermined or underdetermined problems.

The routines are meant for small and dense problems of order less than a few hundred, and for band matrices of order less than several thousand. There are no routines for iterative methods.
4.3.2 EISPACK

EISPACK is an eigensystem package. The package is primarily designed to compute the eigenvalues and eigenvectors of a matrix; however, it also contains routines for the generalized eigenvalue problem of the form Ax = \lambda Bx and for the singular value decomposition.

The eigenvalues of an arbitrary matrix A are computed in several sequential phases. First, the matrix A is balanced. If it is nonsymmetric, the balanced matrix is then reduced to an upper Hessenberg matrix by similarity transformations (if it is symmetric, it is reduced to a symmetric tridiagonal matrix). Finally, the eigenvalues of the transformed upper Hessenberg or symmetric tridiagonal matrix are computed using the implicit QR iterations or the Sturm-sequence method. There are EISPACK routines to perform all these tasks.
4.3.3 LAPACK

The building blocks of numerical linear algebra algorithms are the three levels of BLAS (Basic Linear Algebra Subprograms):

Level 1 BLAS: These are for vector-vector operations. A typical Level 1 BLAS operation is of the form y ← \alpha x + y, where x and y are vectors and \alpha is a scalar.

Level 2 BLAS: These are for matrix-vector operations. A typical Level 2 BLAS operation is of the form y ← Ax + y.

Level 3 BLAS: These are for matrix-matrix operations. A typical Level 3 BLAS operation is of the form C ← AB + C.

Level 1 BLAS is used in LINPACK. Unfortunately, algorithms composed of Level 1 BLAS are not suitable for achieving high efficiency on most supercomputers of today, such as the CRAY computers.

While Level 2 BLAS can give good speed (sometimes almost peak speed) on many vector computers such as the CRAY X-MP or CRAY Y-MP, they are not suitable for high efficiency on other modern supercomputers (e.g., the CRAY 2).

The Level 3 BLAS are ideal for most of today's supercomputers: they perform O(n^3) floating-point operations on O(n^2) data.

Therefore, during the last several years an intensive attempt was made by numerical linear algebraists to restructure the traditional BLAS-1 based algorithms into algorithms rich in BLAS-2 and BLAS-3 operations. As a result, there now exist algorithms of these types, called blocked algorithms, for most matrix computations. These algorithms have been implemented in a software package called LAPACK.
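To see why blocking helps, note that a blocked matrix multiply does its innermost work as small matrix-matrix products, so each block, once in fast memory, is reused many times. A minimal MATLAB sketch of ours (the block size b is assumed to divide n):

n = 8; b = 4;                    % block size b is assumed to divide n
A = rand(n); B = rand(n); C = zeros(n);
for i = 1:b:n
    for j = 1:b:n
        for k = 1:b:n
            I = i:i+b-1; J = j:j+b-1; K = k:k+b-1;
            C(I,J) = C(I,J) + A(I,K)*B(K,J);   % Level 3 BLAS block update
        end
    end
end
norm(C - A*B)                    % should be of the order of roundoff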

"LAPACK is a transportable library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. It has been designed to be efficient on a wide range of modern high-performance computers.

LAPACK is designed to supersede LINPACK and EISPACK, principally by restructuring the software to achieve much greater efficiency on vector processors, high-performance "super-scalar" workstations, and shared memory multiprocessors. LAPACK also adds extra functionality, uses some new or improved algorithms, and integrates the two sets of algorithms into a unified package.

The LAPACK Users' Guide gives an informal introduction to the design of the algorithms and software, summarizes the contents of the package, describes conventions used in the software and its documentation, and includes complete specifications for calling the routines." (Quotations from the cover page of the LAPACK Users' Guide.)
4.3.4 NETLIB

Netlib stands for network library. LINPACK, EISPACK, and LAPACK subroutines are available electronically from this library, along with many other types of software for matrix computations. A user can obtain software from these packages by sending electronic mail to

netlib@netlib.ornl.gov

To receive the master index of the library, send the message:

send index

Information on the subroutines available in a given package can be obtained by sending the message

send index from {library}

as in, for example,

send index from LINPACK

and a routine itself can be requested with

send {routines} from {library}

Thus, to get a subroutine called SGECO from LINPACK, send the message:

send SGECO from LINPACK

(This message will send you SGECO and the other routines that SGECO calls.) Xnetlib, which uses an X Window interface for direct downloading, is also available (and more convenient) once installed. Further information about Netlib can be obtained by anonymous FTP to either of the following sites:

netlib2.cs.utk.edu
research.att.com
4.3.5 NAG

NAG stands for the Numerical Algorithms Group. This group has developed a large software library (also called NAG) containing routines for most computational problems, including numerical linear algebra problems, numerical differential equations problems (both ordinary and partial), optimization problems, integral equations problems, statistical problems, etc.

4.3.6 IMSL

IMSL stands for International Mathematical and Statistical Libraries. As the title suggests, this library contains routines for almost all mathematical and statistical computations.
4.3.7 MATLAB

The name MATLAB stands for MATrix LABoratory. It is an interactive computing system designed for easy computation of various matrix-based scientific and engineering problems. MATLAB provides easy access to matrix software developed by the LINPACK and EISPACK software projects.

MATLAB can be used to solve a linear system and associated problems (such as inverting a matrix or computing the rank and determinant of a matrix), to compute the eigenvalues and eigenvectors of a matrix, to find the singular value decomposition of a matrix, to compute the zeros of a polynomial, to compute generalized eigenvalues and eigenvectors, etc. MATLAB is an extremely useful and valuable package for testing algorithms on small problems and for use in the classroom. It has indeed become an indispensable tool for teaching applied and numerical linear algebra in the classroom. A remarkable feature of MATLAB is its graphics capability (see more about MATLAB in the appendix).

There is a student edition of MATLAB, published by Prentice Hall, 1992. It is designed for easy use in the classroom.

4.3.8 MATLAB Codes and MATLAB Toolkit

MATLAB codes for selected algorithms described in this book are provided for beginning students in the APPENDIX.

Furthermore, an interactive MATLAB Toolkit called MATCOM, implementing all the major algorithms (to be taught in the first course), will be provided along with the book, so that students can compare different algorithms for the same problem with respect to numerical efficiency, stability, accuracy, etc.
4.3.9 The ACM Library

The library provided by the Association for Computing Machinery contains routines for basic matrix-vector operations, linear systems and associated problems, nonlinear systems, zeros of polynomials, etc. The journal TOMS (ACM Transactions on Mathematical Software) publishes these algorithms.

4.3.10 ITPACK (Iterative Software Package)

The package contains routines for solving, by iterative methods, mainly linear systems problems in the large and sparse cases.
4.4 Review and Summary

The purpose of this chapter has been to introduce the concepts of numerically effective algorithms and the associated high quality mathematical software.
1. Writing code for a given algorithm is a rather trivial task. However, not all software is high quality software.

We have defined high quality mathematical software as software which is (1) powerful and flexible, (2) easy to read and modify, (3) portable, (4) robust, and, more importantly, (5) based on a numerically effective algorithm.

2. As with software, there may exist many algorithms for a given problem. However, not all algorithms are numerically effective. We have defined a numerically effective algorithm as one which is (1) general purpose, (2) reliable, (3) stable, and (4) efficient in terms of both time and storage.
3. The efficiency of a matrix algorithm is measured by the computer time consumed by the algorithm. A theoretical measure is the number of flops required to execute the algorithm. Roughly, one multiplication (or division) together with one addition (or subtraction) has been defined to be one flop.

A matrix algorithm involving matrices of order n and requiring no more than O(n^3) flops has been defined to be an efficient algorithm; stability of an algorithm was defined in Chapter 2.
An important point has been made: an algorithm may be efficient without being stable. Thus, an efficient algorithm is not necessarily a numerically effective algorithm. There are algorithms which are fast but not stable.

4. Several examples (Section 4.2) have been provided to show how certain basic matrix operations can be reorganized rather easily to make them both storage and time efficient, rather than implementing them naively. Example 4.2.7 is the most important one in this context. There we have shown how to organize the computation of the product of an n × n matrix A with the matrix H = I - 2uu^T/u^Tu, known as the Householder matrix, so that the product can be computed with O(n^2) flops rather than the O(n^3) flops that would be needed if it were computed naively, ignoring the structure of H. This computation forms a basis for many other matrix computations described later in the book.
5. A statement about each of several high quality mathematical software packages, such as LINPACK, EISPACK, LAPACK, MATLAB, IMSL, NAG, etc., has been provided in the final section.
4.5 Suggestions for Further Reading

A clear and nice description of how to develop high quality mathematical software packages is included in the book Matrix Computations and Mathematical Software by John Rice, McGraw-Hill Book Company, 1981. This book contains a chapter (Chapter 12) on software projects. Students (and instructors) interested in comparative studies of various software packages may find it interesting. An excellent paper by Demmel (1984) on the reliability of numerical software is worth reading.

Another recent book in the area is the Handbook of Matrix Computations by T. Coleman and C. Van Loan, SIAM, 1988. These two books are a must for readers interested in software development for matrix problems.

The books by Forsythe, Malcolm and Moler (CMMC) and by Hager (ANAL) contain some useful subroutines for matrix computations. See also the books by Johnston, and by Kahaner, Moler and Nash, referenced in Chapter 2.

Stewart's book (IMC) is a valuable source for learning how to organize and develop algorithms for basic matrix operations in time-efficient and storage-economic ways.

Each software package has its own users' manual that describes in detail the functions of the subroutines, how to use them, etc. We now list the most important ones.
1. The LINPACK Users' Guide by J. Dongarra, J. Bunch, C. Moler and G. W. Stewart, SIAM, Philadelphia, PA, 1979.

2. The EISPACK Users' Guide can be obtained from either NESC or IMSL. Matrix Eigensystem Routines: EISPACK Guide by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler has been published by Springer-Verlag, Berlin, 1976, as volume 6 of Lecture Notes in Computer Science. A later version, prepared by Garbow, Boyle, Dongarra and Moler in 1977, is also available as a Springer-Verlag publication.
3. The LAPACK Users' Guide, prepared by Chris Bischof, James Demmel, Jack Dongarra, Jerry D. Croz, Anne Greenbaum, Sven Hammarling, and Danny Sorensen, is available from SIAM. SIAM's address and telephone number are: SIAM (Society for Industrial and Applied Mathematics), 3600 University City Science Center, Philadelphia, PA 19114-2688; Tel: (215) 382-9800.
4. The NAG Library and the associated users' manual can be obtained from:

Numerical Algorithms Group, Inc.

1400 Opus Place, Suite 200

Downers Grove, IL 60151-5702

5. IMSL: The IMSL software library and documentation are available from:

IMSL, Houston, Texas

6. MATLAB: The MATLAB software and Users' Guide are available from:

The MathWorks, Inc.

Cochituate Place

24 Prime Park Way

Natick, MA 01760-1520

TEL: (508) 653-1415; FAX: (508) 653-2997

e-mail: info@mathworks.com

The student edition of MATLAB has been published by Prentice Hall, Englewood Cliffs, NJ 07632. A 5 1/4-inch disk is included with the book for MS-DOS personal computers.

For more information on accessing mathematical software electronically, the paper "Distribution of Mathematical Software via Electronic Mail" by J. Dongarra and E. Grosse, Communications of the ACM, 30(5) (1987), 403-407, is worth reading.

Finally, a nice survey of blocked algorithms has been given by James Demmel (1993).

Exercises on Chapter 4

1. Develop an algorithm to compute the product C = AB in each of the following cases. Your algorithm should take advantage of the special structure of the matrices in each case. Give the flop-count and storage requirement in each case.

(a) A and B are both lower triangular matrices.
(b) A is arbitrary and B is lower triangular.
(c) A and B are both tridiagonal.
(d) A is arbitrary and B is upper Hessenberg.
(e) A is upper Hessenberg and B is tridiagonal.
(f) A is upper Hessenberg and B is upper triangular.

2. A square matrix A = (a_{ij}) is said to be a band matrix of bandwidth 2k+1 if

a_{ij} = 0 whenever |i - j| > k.

Develop an algorithm to compute the product C = AB, where A is arbitrary and B is a band matrix of bandwidth 2, taking advantage of the structure of the matrix B. Overwrite A with AB and give the flop-count.
3. Using the ideas in Algorithm 4.2.1, develop an algorithm to compute the product A(I + xy^T), where A is an n × n matrix and x and y are n-vectors. Your algorithm should require roughly 2n^2 flops.

4. Rewrite the algorithm of problem #3 in the special cases when the matrix I + xy^T is

(a) an elementary matrix: I + me_k^T, where m = (0, 0, ..., 0, m_{k+1,k}, ..., m_{n,k})^T and e_k^T is the kth row of I;

(b) a Householder matrix: I - 2uu^T/u^Tu, where u is an n-vector.

5. Let A and B be two symmetric matrices of the same order. Develop an algorithm to compute C = A + B, taking advantage of symmetry for each matrix. Your algorithm should overwrite B with C. What is the flop-count?
6. Let A = (a_{ij}) be an unreduced lower Hessenberg matrix of order n. Then, given the first row r_1, it can be shown (Datta and Datta [1976]) that the successive rows r_2 through r_n of A^k (k a positive integer) can be computed recursively as follows:

r_{i+1} = \frac{1}{a_{i,i+1}} \left( r_i B_i - \sum_{j=1}^{i-1} a_{ij} r_j \right), i = 1, 2, ..., n-1, where B_i = A - a_{ii} I.

Develop an algorithm to implement this. Give the flop-count for the algorithm.
7. Let a_r denote the rth column of the matrix A and b_r^T the rth row of the matrix B. Then develop an algorithm to compute the product AB from the formula

AB = \sum_{i=1}^{n} a_i b_i^T.

Give the flop-count and storage requirement of the algorithm.
8. Consider the 12 × 12 matrix

A = \begin{pmatrix}
12 & 11 & 10 & \cdots & 3 & 2 & 1 \\
11 & 11 & 10 & \cdots & 3 & 2 & 1 \\
0 & 10 & 10 & \ddots & & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 2 & \vdots \\
\vdots & & & \ddots & 2 & 2 & 1 \\
0 & \cdots & & & 0 & 1 & 1
\end{pmatrix}.

Find the eigenvalues of this matrix. Use MATLAB.

Now perturb the (1,12) element to 10^{-9} and compute the eigenvalues of this perturbed matrix. What conclusion do you draw about the conditioning of the eigenvalues?

MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 4

1. Using the MATLAB function `rand', create a 5 × 5 random matrix A and then print out the following outputs:

A(2,:), A(:,1), A(:,5), A(1, 1:2:5), A([1, 5]), A(4:-1:1, 5:-1:1).

2. Using the function `for', write a MATLAB program to find the inner product and outer product of two n-vectors u and v:

[s] = inpro(u,v)
[A] = outpro(u,v)

Test your program by creating two different vectors u and v using rand(4,1).
3. Learn how to use the following MATLAB commands to create special matrices:

compan    companion matrix
diag      diagonal matrix
ones      constant matrix
zeros     zero matrix
rand      random matrix
tril      lower triangular part
hankel    Hankel matrix
toeplitz  Toeplitz matrix
hilb      Hilbert matrix
triu      upper triangular part
vander    Vandermonde matrix
4. Write MATLAB programs to create the following well-known matrices:

(a) [A] = wilk(n) to create the Wilkinson bidiagonal matrix A = (a_{ij}) of order n:

a_{ii} = n - i + 1, i = 1, 2, ..., n
a_{i-1,i} = n, i = 2, 3, ..., n
a_{ij} = 0 otherwise.

(b) [A] = pie(n) to create the Pie matrix A = (a_{ij}) of order n:

a_{ii} = \alpha, where \alpha is a parameter near 1 or -1;
a_{ij} = 1 for i ≠ j.
5. Using "help" commands for "flops", "clock", "etime", etc., learn how to measure the flop-count and timing for an algorithm.

6. Using the MATLAB functions `for', `size', `zeros', write a MATLAB program to find the product of two upper triangular matrices A and B of orders m × n and n × p, respectively. Test your program using

A = triu(rand(4,3)),
B = triu(rand(3,3)).

7. Run the MATLAB program housmul(A,u) from MATCOM, creating a random matrix A of order 6 × 3 and a random vector u with six elements. Print the output and the number of flops and elapsed time.

8. Modify the MATLAB program housmul to housxy(A,x,y) to compute the product (I + xy^T)A.

Test your program by creating a 15 × 15 random matrix A and vectors x and y of appropriate dimensions. Print the product and the number of flops and elapsed time.

5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS

5.1 A Computational Methodology in Numerical Linear Algebra
5.2 Elementary Matrices and LU Factorization
5.2.1 Gaussian Elimination without Pivoting
5.2.2 Gaussian Elimination with Partial Pivoting
5.2.3 Gaussian Elimination with Complete Pivoting
5.3 Stability of Gaussian Elimination
5.4 Householder Transformations
5.4.1 Householder Matrices and QR Factorization
5.4.2 Householder QR Factorization of a Non-Square Matrix
5.4.3 Householder Matrices and Reduction to Hessenberg Form
5.5 Givens Matrices
5.5.1 Givens Rotations and QR Factorization
5.5.2 Uniqueness in QR Factorization
5.5.3 Givens Rotations and Reduction to Hessenberg Form
5.5.4 Uniqueness in Hessenberg Reduction
5.6 Orthonormal Bases and Orthogonal Projections
5.7 QR Factorization with Column Pivoting
5.8 Modifying a QR Factorization
5.9 Summary and Table of Comparisons
5.10 Suggestions for Further Reading
CHAPTER 5

SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS

5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS
Objectives

The major objective of this chapter is to introduce fundamental tools such as elementary, Householder, and Givens matrices and their applications. Here are some of the highlights of the chapter.

Various LU-type matrix factorizations: LU factorization using Gaussian elimination without pivoting (Section 5.2.1), MA = U factorization using Gaussian elimination with partial pivoting (Section 5.2.2), and MAQ = U factorization using Gaussian elimination with complete pivoting (Section 5.2.3).

QR factorization using Householder and Givens matrices (Sections 5.4.1 and 5.5.1).

Reduction to Hessenberg form by orthogonal similarity using Householder and Givens matrices (Sections 5.4.3 and 5.5.3).

Computation of orthonormal bases and orthogonal projections using QR factorizations (Section 5.6).
Background Material Needed for this Chapter

The following background material and tools developed in earlier chapters will be needed for

comprehension of this chapter.

1. Subspace and basis (Section 1.2.1)

2. Rank properties (Section 1.3.2)

3. Orthogonality and projections: Orthonormal basis and orthogonal projections (Sections

1.3.5 and 1.3.6)

4. Special matrices: Triangular, permutation, Hessenberg, orthogonal (Section 1.4)

5. Basic Gaussian elimination (Algorithm 3.1.5)

6. Stability concepts of algorithms (Section 3.2)


5.1 A Computational Methodology in Numerical Linear Algebra

Most computational algorithms to be presented in this book have a common basic structure that can be described as follows:

1. The problem is first transformed into a "reduced" problem.

2. The reduced problem is then solved by exploiting the special structure exhibited by the problem.

3. Finally, the solution of the original problem is recovered from the solution of the reduced problem.

The reduced problem typically involves a "condensed" form of the matrix A, such as a triangular, Hessenberg (almost triangular), tridiagonal, Real Schur (quasi-triangular), or bidiagonal form. It is the structure of these condensed forms that is exploited in the solution of the reduced problem. For example, the solution of the linear system Ax = b is usually obtained by first triangularizing the matrix A and then solving an equivalent triangular system. In eigenvalue computations, the matrix A is transformed into a Hessenberg form before the QR iterations are applied. To compute the singular value decomposition, the matrix A is first transformed into a bidiagonal matrix, and then the singular values of the bidiagonal matrix are computed. These condensed forms are normally achieved through a series of transformations known as elementary, Householder, or Givens transformations. We will study these transformations here and show how they can be applied to achieve various condensed forms.

5.2 Elementary Matrices and LU Factorization

In this section we show how to triangularize a matrix A using the classical elimination scheme known as Gaussian elimination. The tools of Gaussian elimination are elementary matrices.
Definition 5.2.1 An elementary lower triangular matrix of order n is a matrix of the form

E = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & m_{k+1,k} & 1 & & \\
& & \vdots & & \ddots & \\
& & m_{n,k} & & & 1
\end{pmatrix}.

Thus, it is an identity matrix except possibly for a few nonzero elements below the diagonal of a single column. If the nonzero elements lie in the kth column, then E has the form

E = I + m e_k^T,

where I is the identity matrix of order n, m = (0, 0, ..., 0, m_{k+1,k}, ..., m_{n,k})^T, and e_k is the kth unit vector.
Elementary matrices can be used very conveniently to create zeros in a vector, as shown in the following lemma.

Lemma 5.2.1 Given a = (a_1, a_2, ..., a_n)^T with a_1 ≠ 0, there is an elementary matrix E such that Ea is a multiple of e_1.

Proof. Define

E = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\frac{a_2}{a_1} & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
-\frac{a_n}{a_1} & 0 & \cdots & 1
\end{pmatrix}.

Then E is an elementary lower triangular matrix and is such that

Ea = (a_1, 0, 0, ..., 0)^T.
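The construction in the proof is immediate to program. A small MATLAB sketch of ours:

n = 5;
a = rand(n,1);  a(1) = a(1) + 1;     % make sure a(1) is nonzero
E = eye(n);
E(2:n,1) = -a(2:n)/a(1);             % multipliers go in the first column
E*a                                   % equals (a(1), 0, ..., 0)'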

We described the basic Gaussian elimination scheme (Gaussian elimination without pivoting) in Chapter 3 (Algorithm 3.1.5). We will see in this section that this process yields an LU factorization of A, A = LU, whenever the process can be carried to completion. The key observation is that the matrix A^{(k)} is the result of premultiplying A^{(k-1)} by a suitable elementary lower triangular matrix.

Set A = A^{(0)}.

Step 1. Find an elementary matrix E_1 such that A^{(1)} = E_1 A has zeros below the (1,1) entry in the first column:

A^{(1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}.

By Lemma 5.2.1, it is sufficient to construct E_1 such that

E_1 (a_{11}, a_{21}, ..., a_{n1})^T = (a_{11}, 0, ..., 0)^T.

Then A^{(1)} = E_1 A will have the above form and is the same as the matrix A^{(1)} obtained at the end of step 1 of Algorithm 3.1.5.

Record the multipliers: m_{21}, m_{31}, ..., m_{n1}; m_{i1} = -\frac{a_{i1}}{a_{11}}, i = 2, ..., n.
Step 2. Find an elementary matrix E_2 such that A^{(2)} = E_2 A^{(1)} has zeros below the (2,2) entry in the second column.

First, find an elementary matrix \hat{E}_2 of order (n-1) such that

\hat{E}_2 (a_{22}^{(1)}, a_{32}^{(1)}, ..., a_{n2}^{(1)})^T = (a_{22}^{(1)}, 0, ..., 0)^T.

Record the multipliers: m_{32}, ..., m_{n2}; m_{i2} = -\frac{a_{i2}^{(1)}}{a_{22}^{(1)}}, i = 3, ..., n. Then define

E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix}.

A^{(2)} = E_2 A^{(1)} will then have zeros below the (2,2) entry in the second column:

A^{(2)} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)}
\end{pmatrix}.

Note that premultiplication of A^{(1)} by E_2 does not destroy the zeros already created in A^{(1)}. This matrix A^{(2)} is the same as the matrix A^{(2)} of Algorithm 3.1.5.
Step k. In general, at the kth step, an elementary matrix E_k is found such that A^{(k)} = E_k A^{(k-1)} has zeros below the (k,k) entry in the kth column. E_k is computed in two successive stages: first, an elementary matrix \hat{E}_k of order (n-k+1) is found such that

\hat{E}_k (a_{kk}^{(k-1)}, a_{k+1,k}^{(k-1)}, ..., a_{nk}^{(k-1)})^T = (a_{kk}^{(k-1)}, 0, ..., 0)^T,

and then E_k is defined as

E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix}.

Here I_{k-1} is the matrix of the first (k-1) rows and columns of the n × n identity matrix I. The matrix A^{(k)} = E_k A^{(k-1)} is the same as the matrix A^{(k)} of Algorithm 3.1.5.

Record the multipliers:

m_{k+1,k}, ..., m_{nk}; m_{ik} = -\frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}, i = k+1, ..., n.
Step n-1. At the end of the (n-1)th step, the matrix A^{(n-1)} is upper triangular:

A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & a_{nn}^{(n-1)}
\end{pmatrix}.
Obtaining L and U.

A^{(n-1)} = E_{n-1} A^{(n-2)} = E_{n-1} E_{n-2} A^{(n-3)} = \cdots = E_{n-1} E_{n-2} \cdots E_2 E_1 A.

Set

U = A^{(n-1)}, L_1 = E_{n-1} E_{n-2} \cdots E_2 E_1.

Then from the above we have

U = L_1 A.

Since each E_k is a unit lower triangular matrix (a lower triangular matrix having 1's along the diagonal), so is the matrix L_1, and therefore L_1^{-1} exists. (Note that the product of two triangular matrices of one type is a triangular matrix of the same type.)

Set L = L_1^{-1}. Then the equation U = L_1 A becomes

A = LU.

This factorization of A is known as the LU factorization.
Definition 5.2.3 The entries a_{11}, a_{22}^{(1)}, ..., a_{nn}^{(n-1)} are called pivots, and the above process of obtaining the factorization A = LU is commonly known as Gaussian elimination without pivoting.
An Explicit Expression for L

Since L = L_1^{-1} = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1} and E_i^{-1} = I - m e_i^T, where m = (0, 0, ..., 0, m_{i+1,i}, ..., m_{n,i})^T, we have

L = \begin{pmatrix}
1 & & & & \\
-m_{21} & 1 & & & \\
-m_{31} & -m_{32} & 1 & & \\
\vdots & \vdots & & \ddots & \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{pmatrix}.
Existence and Uniqueness of LU Factorization

It is very important to note that for an LU factorization to exist, the pivots must be different from zero. Thus, an LU factorization may fail to exist even for a very simple matrix: take

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

It can be shown (Exercise #3) that if A_r is the submatrix of A consisting of the first r rows and columns, then

det A_r = a_{11} a_{22}^{(1)} \cdots a_{rr}^{(r-1)}.

(Karl Friedrich Gauss (1777-1855) was a German mathematician and astronomer, noted for his development of many classical mathematical theories and for his calculation of the orbits of the asteroids Ceres and Pallas. Gauss is still regarded as one of the greatest mathematicians the world has ever produced.)

Thus, if det A_r, r = 1, 2, ..., n, is nonzero, then an LU factorization always exists. Indeed, the LU factorization in this case is unique. For if

A = L_1 U_1 = L_2 U_2,

then from

det A = det(L_1 U_1) = det L_1 det U_1 and det A = det(L_2 U_2) = det L_2 det U_2,

it follows that U_1 and U_2 are also nonsingular. Hence

L_2^{-1} L_1 = U_2 U_1^{-1},

where L_2^{-1} L_1 is the product of two unit lower triangular matrices and is therefore unit lower triangular, and U_2 U_1^{-1} is the product of two upper triangular matrices and is therefore upper triangular. Since a unit lower triangular matrix can be equal to an upper triangular matrix only if both are the identity, we have

L_1 = L_2, U_1 = U_2.
Theorem 5.2.1 (LU Theorem) Let A be an n × n matrix with all nonzero leading principal minors. Then A can be decomposed uniquely in the form

A = LU,

where L is unit lower triangular and U is upper triangular.

Remark: Note that in the above theorem, if the diagonal entries of L are not specified, then the factorization is not unique.
Algorithm 5.2.1 LU Factorization Using Elementary Matrices

Given an n × n matrix A, the following algorithm computes, whenever possible, elementary matrices E_1, E_2, ..., E_{n-1} and an upper triangular matrix U such that, with

L = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1},

we have A = LU.

For k = 1, 2, ..., n-1 do

1. If a_{kk} = 0, stop.

2. Find an elementary matrix \hat{E}_k of order (n-k+1), determined by the multipliers m_{k+1,k}, ..., m_{n,k}, such that

\hat{E}_k (a_{kk}, ..., a_{nk})^T = (a_{kk}, 0, ..., 0)^T.

3. Define

E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix},

where I_{k-1} is the matrix of the first (k-1) rows and columns of the n × n identity matrix I.

4. Save the multipliers m_{k+1,k}, ..., m_{n,k}.

5. Compute A^{(k)} = E_k A.

6. Overwrite A with A^{(k)}.
Example 5.2.1

Find an LU factorization of

A = \begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix}.

Step 1. Compute E_1. The multipliers are m_{21} = -2, m_{31} = -\frac{1}{2}:

E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix}

A ← A^{(1)} = E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix}.

Step 2. Compute E_2. The multiplier is m_{32} = -1:

\hat{E}_2 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}

E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix}

A ← A^{(2)} = E_2 A^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}.

Thus

U = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}.

Compute L:

L_1 = E_2 E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ \frac{3}{2} & -1 & 1 \end{pmatrix}

L = L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ \frac{1}{2} & 1 & 1 \end{pmatrix}.

Some remarks on the computation are in order.

1. Each elementary matrix E_k is uniquely determined by the (n-k) multipliers m_{k+1,k}, ..., m_{nk}.

2. The premultiplication of A^{(k-1)} by E_k affects only rows (k+1) through n. The first k rows of the product remain the same as those of A^{(k-1)}; only the last (n-k) rows are modified. Thus, if E_k A^{(k-1)} = A^{(k)} = (a_{ij}^{(k)}), then

(a) a_{ij}^{(k)} = a_{ij}^{(k-1)} (i = 1, 2, ..., k; j = 1, 2, ..., n) (the first k rows);

(b) a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik} a_{kj}^{(k-1)} (i = k+1, ..., n; j = k+1, ..., n) (the last (n-k) rows).

(Note that this is how the entries of A^{(k)} were obtained from those of A^{(k-1)} in Algorithm 3.1.5.)

For example, let n = 3, and write E_1 A = A^{(1)} = (a_{ij}^{(1)}), with

E_1 = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}.

Then

E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & a_{32}^{(1)} & a_{33}^{(1)} \end{pmatrix},

where

a_{22}^{(1)} = a_{22} + m_{21} a_{12}, a_{23}^{(1)} = a_{23} + m_{21} a_{13},

and so on.
3. As soon as A^{(k)} is formed, it can overwrite A.

4. The vector (m_{k+1,k}, ..., m_{nk}) has (n-k) elements, and at the kth step exactly (n-k) zeros are produced; an obvious storage scheme, then, is to store these (n-k) elements in the positions (k+1,k), (k+2,k), ..., (n,k) of A below the diagonal.

5. The entries of the upper triangular matrix U can then be stored in the upper half of A, including the diagonal. With this storage scheme, the matrix A^{(n-1)} at the end of the (n-1)th step will be

A ← A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
-m_{21} & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \ddots & \ddots & \vdots \\
-m_{n1} & \cdots & -m_{n,n-1} & a_{nn}^{(n-1)}
\end{pmatrix}.

Thus a typical step of Gaussian elimination for LU factorization consists of

(1) forming the multipliers and storing them in the appropriate places below the diagonal, and

(2) updating the entries of rows (k+1) through n and saving them in the upper half of A.

Based on the above discussion, we now present the following algorithm.
Algorithm 5.2.2 Triangularization Using Gaussian Elimination without Pivoting

Let A be an n × n matrix. The following algorithm computes the triangularization of A, whenever it exists. The algorithm overwrites the upper triangular part of A, including the diagonal, with U; the entries of A below the diagonal are overwritten with the multipliers needed to compute L.

For k = 1, 2, ..., n-1 do

1. (Form the multipliers)

    a_{ik} ← m_{ik} = -\frac{a_{ik}}{a_{kk}} (i = k+1, k+2, ..., n)

2. (Update the entries)

    a_{ij} ← a_{ij} + m_{ik} a_{kj} (i = k+1, ..., n; j = k+1, ..., n)

Remark: The algorithm does not give the matrix L explicitly; however, L can be formed from the multipliers saved at each step, as shown earlier (see the explicit expression for L).
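A compact MATLAB sketch of Algorithm 5.2.2 follows; it is our own illustration, not a MATCOM routine, and the diagonally dominant test matrix keeps the pivots away from zero.

n = 4;
A = rand(n) + n*eye(n);
Asave = A;                       % copy kept only for the final check
for k = 1:n-1
    A(k+1:n,k) = -A(k+1:n,k)/A(k,k);              % multipliers m(i,k)
    A(k+1:n,k+1:n) = A(k+1:n,k+1:n) + A(k+1:n,k)*A(k,k+1:n);
end
U = triu(A);
L = eye(n) - tril(A,-1);         % entries of L are -m(i,k)
norm(L*U - Asave)                % should be of the order of roundoff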

Flop-Count. The algorithm requires roughly n^3/3 flops. This can be seen as follows. Forming each multiplier requires one flop, and updating each entry also requires one flop. Thus, step 1 requires (n-1)^2 + (n-1) flops. Step 2, similarly, requires (n-2)^2 + (n-2) flops, and so on. In general, step k requires (n-k)^2 + (n-k) flops. Since there are (n-1) steps, we have

Total flops = \sum_{k=1}^{n-1} (n-k)^2 + \sum_{k=1}^{n-1} (n-k) = \frac{n(n-1)(2n-1)}{6} + \frac{n(n-1)}{2} \approx \frac{n^3}{3} + O(n^2).

Recall:

1. 1^2 + 2^2 + \cdots + r^2 = \frac{r(r+1)(2r+1)}{6}

2. 1 + 2 + \cdots + r = \frac{r(r+1)}{2}
The Gaussian elimination process described above for an n × n matrix A can easily be extended to an m × n matrix to compute its LU factorization, when it exists. The process is identical; only the number of steps changes, to k = min{m-1, n}. The following is an illustrative example. Let

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}, m = 3, n = 2.

The number of steps is k = min(2, 2) = 2.

Step 1. The multipliers are m_{21} = -3 and m_{31} = -5:

a_{22} ← a_{22}^{(1)} = a_{22} + m_{21} a_{12} = -2
a_{32} ← a_{32}^{(1)} = a_{32} + m_{31} a_{12} = -4

A ← A^{(1)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & -4 \end{pmatrix}.

Step 2. The multiplier is m_{32} = -2; a_{32} ← a_{32}^{(2)} = 0:

A ← A^{(2)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}

U = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}.

Note that U in this case is upper trapezoidal rather than an upper triangular matrix.

L = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix}.

Verify that

LU = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} = A.
Difficulties with Gaussian Elimination without Pivoting

As we have seen before, Gaussian elimination without pivoting fails if any of the pivots is zero. It is worse yet if any pivot becomes close to zero: in this case the method can be carried to completion, but the computed results may be totally wrong.

Consider the following celebrated example from Forsythe and Moler (CSLAS, p. 34). Let Gaussian elimination without pivoting be applied to

A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix},

using three-digit arithmetic. There is only one step. The multiplier is m_{21} = -1/0.0001 = -10^4, and we obtain

U = A^{(1)} = \begin{pmatrix} 0.0001 & 1 \\ 0 & -10^4 \end{pmatrix}, L = \begin{pmatrix} 1 & 0 \\ 10^4 & 1 \end{pmatrix}.

The product of the computed L and U gives

LU = \begin{pmatrix} 0.0001 & 1 \\ 1 & 0 \end{pmatrix},

which is different from A. Who is to blame?

Note that the pivot a_{11} = 0.0001 is very close to zero (in three-digit arithmetic). This small pivot gave a large multiplier. The large multiplier, when used to update the entries, wiped out the small entries (e.g., 1 - 10^4 was computed as -10^4). Fortunately, we can avoid this small pivot just by a row interchange. Consider the matrix with the first row written second and the second row written first:

A' = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix}.

Gaussian elimination now gives

U = A^{(1)} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, L = \begin{pmatrix} 1 & 0 \\ 0.0001 & 1 \end{pmatrix}.

Note that the pivot in this case is a_{11} = 1. The product

LU = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1.0001 \end{pmatrix} = A'.
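The same phenomenon is easy to reproduce in IEEE double precision by pushing the small pivot below the rounding level of the (2,2) update. A sketch of ours:

eps0 = 1e-20;                    % a pivot far below the rounding level
A = [eps0 1; 1 1];
m21 = -A(2,1)/A(1,1);            % huge multiplier, -1e20
U = [A(1,1), A(1,2); 0, A(2,2) + m21*A(1,2)];   % 1 - 1e20 rounds to -1e20
L = [1 0; -m21 1];
L*U - A                          % the (2,2) entry is off by 1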


5.2.2 Gaussian Elimination with Partial Pivoting

In the example above, we found a factorization of the matrix A', which is a permuted version of A in the sense that the rows have been swapped. A primary purpose of factoring a matrix A into LU is to use this factorization to solve a linear system. It is easy to see that the solution of the system Ax = b is the same as that of the system A'x = b', where b' has been obtained from b by the same row interchanges used to generate A'. Thus, if row interchanges can help avoid a small pivot, it is certainly desirable to perform them.

As the above example suggests, disaster in Gaussian elimination without pivoting can perhaps be avoided by identifying a "good pivot" (a pivot as large as possible) at each step, before the process of elimination is applied. The good pivot may be located among the entries in a column or among all the entries in a submatrix of the current matrix. In the former case, since the search is only partial, the method is called partial pivoting; in the latter case, the method is called complete pivoting. It is important to note that the purpose of pivoting is to prevent large growth in the reduced matrices, which can wipe out the original data. One way to do this is to keep the multipliers less than one in magnitude, and this is exactly what is accomplished by pivoting. However, large multipliers do not necessarily mean instability (see our discussion of Gaussian elimination without pivoting for symmetric positive definite matrices in Chapter 6). We first describe Gaussian elimination with partial pivoting.

The process consists of (n-1) steps.
Let it be ar1 1 .

;

Form a permutation matrix P by interchanging the rows 1 and r of the identity matrix and

1 1

Form P A by interchanging the rows r and 1 of A.

1 1

Find an elementary lower triangular matrix M such that A = M P A has zeros below the

1

(1)

1 1

It is sucient to construct M1 such that

0a 1 01

BB ... 11

CC BB 0 CC

MBBB .. CC = BB . CC :

CA B@ .. CA

@ .

1

an 1 0


Note that

M_1 = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
m_{21} & 1 & \cdots & 0 \\
m_{31} & 0 & 1 & \cdots \\
\vdots & \vdots & & \ddots \\
m_{n1} & 0 & \cdots & 1
\end{pmatrix},

where m_{21} = -\frac{a_{21}}{a_{11}}, m_{31} = -\frac{a_{31}}{a_{11}}, ..., m_{n1} = -\frac{a_{n1}}{a_{11}}. Note that a_{ij} here refers to the (i,j)th entry of the permuted matrix P_1 A. Save the multipliers m_{i1}, i = 2, ..., n, and record the row interchange. At the end of Step 1,

A^{(1)} = \begin{pmatrix}
* & * & \cdots & * \\
0 & * & \cdots & * \\
\vdots & \vdots & & \vdots \\
0 & * & \cdots & *
\end{pmatrix}.
Step 2. Scan the second column of A^{(1)} below the first row to identify the largest element in magnitude in that column. Let the element be a_{r_2,2}^{(1)}. Form the permutation matrix P_2 by interchanging rows 2 and r_2 of the identity matrix, leaving the other rows unchanged. Form P_2 A^{(1)}.

Next, find an elementary lower triangular matrix M_2 such that A^{(2)} = M_2 P_2 A^{(1)} has zeros below the (2,2) entry. M_2 is constructed as follows. First, construct an elementary matrix \hat{M}_2 of order (n-1) such that

\hat{M}_2 (a_{22}, a_{32}, ..., a_{n2})^T = (a_{22}, 0, ..., 0)^T;

then define

M_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{M}_2 \end{pmatrix}.

Note that a_{ij} here refers to the (i,j)th entry of the current matrix P_2 A^{(1)}. At the end of Step 2, we will have

A^{(2)} = M_2 P_2 A^{(1)} = \begin{pmatrix}
* & * & * & \cdots & * \\
0 & * & * & \cdots & * \\
0 & 0 & * & \cdots & * \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & * & \cdots & *
\end{pmatrix},

where

M_2 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & m_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & m_{n2} & 0 & \cdots & 1
\end{pmatrix}

and m_{i2} = -\frac{a_{i2}}{a_{22}}, i = 3, 4, ..., n.

Save the multipliers m_{i2} and record the row interchange.
Step k. In general, at the kth step, scan the entries of the kth column of the matrix A^{(k-1)} below row (k-1) to identify the pivot a_{r_k,k}, form the permutation matrix P_k, and find an elementary lower triangular matrix M_k such that A^{(k)} = M_k P_k A^{(k-1)} has zeros below the (k,k) entry. M_k is constructed first by constructing \hat{M}_k of order (n-k+1) such that

\hat{M}_k (a_{kk}, ..., a_{nk})^T = (a_{kk}, 0, ..., 0)^T,

and then defining

M_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{M}_k \end{pmatrix},

where `0' is a matrix of zeros. The elements a_{ik} refer to the (i,k)th entries of the matrix P_k A^{(k-1)}.
Step n-1. At the end of the (n ; 1)th step, the matrix A n; will be an upper triangular( 1)

matrix.

Form U : Set

A n; = U:

( 1)

(5.2.1)

149

Then

U = A n; = Mn; Pn; A n;

( 1)

1 1

( 2)

1 1 2 2

( 3)

1 1 2 2 2 2 1 1

Set

Mn; Pn; Mn; Pn; M P M P = M

1 1 2 2 2 2 1 1 (5.2.2)

Then we have from above the following factorization of A :

U = MA

nonsingular matrix A, Gaussian elimination with partial pivoting gives an upper

triangular matrix U and a \permuted" lower triangular matrix M such that

MA = U;

where M and U are given by (5.2.2) and (5.2.1), respectively.

From (5.2.2) it is easy to see that there exists a permutation matrix P such that PA = LU .

Dene

P = Pn; P P 1 2 1 (5.2.3)

L = P (Mn; Pn; M P ); : 1 1 1 1

1

(5.2.4)

Then PA = LU .

n n nonsingular matrix A, Gaussian elimination with partial pivoting yields LU

factorization of a permuted version of A:

PA = LU;

where P is a permutation matrix given by (5.2.3), L is a unit lower triangular matrix

given by (5.2.4), and U is an upper triangular matrix given by (5.2.1).

150

Example 5.2.2

0:0001 1

!

A= :

1 1

Only one step. The pivot entry is 1, r1 = 2

0 1

!

P = ;

1

1 0

1 1

!

PA =

1

0:0001 1

m = 1 = ;10;4

; 0:0001

21

1 0

! 1 0

!

M = =

1

m21 1 ;10;4 1

1 0

! 1 1! 1 1!

MPA = = =U

1 1

;10;4 1 0:0001 1 0 1

1

!

0 0 1

! 0 1 !

M = M1P1 = =

;10;4 1 1 0 1 ;10;4

0 1

! 10;4 1 ! 1 1 !

MA = = :

1 ;10;4 1 1 0 1

Example 5.2.3

Triangularize 0 0 1 11

B 1 2 3 CC

A=B

@ A

1 1 1

using partial pivoting. Express A = MU . Find also P and L such that PA = LU .

21 1

00 1 01

B C

P = B @ 1 0 0 CA

1

0 0 1

01 2 31

B C

PA = B

1 @ 0 1 1 CA ;

1 1 1

151

0 1 0 01

B C

M = B

@ 0 1 0 CA

1

;1 0 1

01 2 3 1

A = M P A=B

B 0 1 1 CC

(1)

@ A 1 1

0 ;1 ;2

Step 2. The pivot entry is a = 1 22

01 2 3 1

PA = B

B 0 1 1 CC ;

2 @

(1)

A

0 ;1 ;2

P = I (no interchange is necessary)

!

2 3

c = 1 0

M ;

2

1 1

01 0 01

!

M =

I 0 1

=

BB 0 1 0 CC

2

c

0 M @ A

0 1 1 2

01 2 3 1

B C

U = A(2) = M2 P2 A(1) = B

@ 0 1 1 CA

0 0 ;1

00 1 01

M = M2 P2 M1P1 = B

B 1 0 0 CC :

@ A

1 ;1 1

It is easily veried that A = MU .

Form L and P :

00 1 01

B 1 0 0 CC

P = P P =B

@ 2 A 1

0 0 1

01 0 01

B C

L = P (M P M P ); = B

@ 0 1 0 CA :

2 2 1 1

1

1 ;1 1

It is easy to verify that PA = LU .

152

Forming the Matrix M and Other Computational Details

1. Each permutation matrix Pk can be formed just by recording the index rk ; since Pk is the

permuted identity matrix in which rows k and rk have been interchanged. However, neither

the permutation matrix Pk nor the product Pk A(k;1) needs to be formed explicitly. This is

because the matrix Pk A(k;1) is just the permuted version of A(k;1) in which the

rows rk and k have been interchanged.

2. Each elementary matrix Mk can be formed just by saving the (n ; k + 1) multipliers. The

matrices MkPk A(k;1) = Mk B also do not need to be computed explicitly. Note that

the elements in the rst k rows of the matrix Mk B are the same as the elements of the rst

k rows of the matrix B, and the elements in the remaining (n ; k) rows are given by:

bij + mik bkj (i = k + 1; : : :; n; j = k + 1; : : :; n):

3. The multipliers can be stored in the appropriate places of lower triangular part of A (below

the diagonal) as they are computed.

4. The nal upper triangular matrix U = A(n;1) is stored in the upper triangular part.

5. The pivot indices rk are stored in a separate single subscripted integer array.

6. A can be overwritten with each A(k) as soon as the latter is formed.

Again, the major programming requirement is a subroutine that computes an elemen-

tary matrix M such that, given a vector a, Ma is a multiple of the rst column of the

identity matrix.

In view of our above discussion, we can now formulate the following practical algorithm for

LU factorization with partial pivoting.

Algorithm 5.2.3 Triangularization Using Gaussian Elimination with Partial Pivoting

Let A be an n n nonsingular matrix. Then the following algorithm computes the triangu-

larization of A with rows permuted, using Gaussian elimination with partial pivoting. The upper

triangular matrix U is stored in the upper triangular part of A, including the diagonal. The multi-

pliers needed to compute the permuted triangular matrix M such that MA = U are stored in the

lower triangular part of A. The permutation indices rk are stored in a separate array.

For k = 1; 2; : : :; n ; 1 do

153

1. Find rk so that jar ;k j = kmax

k

ja j. Save rk.

in ik

If ar ;k = 0, then stop. Otherwise, continue.

k

3. (Form the multipliers) aik mik = ; aaik (i = k + 1; : : :; n).

kk

4. (Update the entries) aij aij + mik akj = aij + aik akj (i = k + 1; : : :; n; j = k + 1; : : :; n).

3

2

(Note that the search for the pivot at step k requires (n ; k) comparisons.)

Note: Algorithm 5.2.3 does not give the matrices M and P explicitly. However, they

can be constructed easily as explained above, from the multipliers and the permutation

indices, respectively.

Remark: The above algorithm accesses the rows of A in the innermost loop and that is why

it is known as the row-oriented Gaussian elimination (with partial pivoting) algorithm.

It is also known as the kij algorithm; note that i and j appear in the inner loops. The column-

oriented algorithm can be similarly developed. Such a column-oriented algorithm has been used in

LINPACK (LINPACK routine SGEFA).

Example 5.2.4 01 2 41

B C

A=B

@ 4 5 6 CA :

7 8 9

Step 1. k = 1.

1. The pivot entry is 7: r1 = 3.

2. Interchange rows 3 and 1: 07 8 91

B C

AB

@ 4 5 6 CA :

1 2 4

3. Form the multipliers:

a m = ; 74 ; a m = ; 17 :

21 21 31 31

154

4. Update: 07 8 9 1

B CC

AB

@0 A: 3

7

6

7

0 6

7

19

7

Step 2. k = 2.

1. The pivot entry is 67 .

2. Interchange rows 2 and 3:

07 8 9 1

B CC

AB

@0 A: 6

7

19

7

0 3

7

6

7

m = ; 12 :

32

4. Update: 07 8 9 1

B0

A = B

CC :

@ A6

7

19

7

0 0 ; 1

2

Form M . 00 0 11

M =B

B 1 0 ; CC :

@ A 1

7

; 1 ;

1

2

1

2

In Gaussian elimination with complete pivoting, at the kth step, the search for the pivots is made

among all the entries of the submatrix below the rst (k ; 1) rows. Thus, if the pivot is ars , to

bring this pivot to the (k; k) position, the interchange of the rows r and k has to be followed by

the interchange of the columns k and s. This is equivalent to premultiplying the matrix A(k;1) by

a permutation matrix Pk obtained by interchanging rows k and r and post-multiplying Pk A(k;1)

by another permutation matrix Qk obtained by interchanging the columns k and s of the identity

matrix I . The ordinary Gaussian elimination is then applied to the matrix Pk A(k;1)Qk ; that is, an

elementary lower triangular matrix Mk is sought such that the matrix

A k = Mk Pk A k; Qk

( ) ( 1)

has zeros on the kth column below the (k; k) entry. The matrix Mk can of course be computed in

two smaller steps as before.

155

At the end of the (n ; 1)th step, the matrix A(n;1) is an upper triangular matrix. Set

A n; = U:

( 1)

(5.2.5)

Then

U = A n; = Mn; Pn; A n; Qn;

( 1)

1 1

( 2)

1

1 1 2 2

( 3)

2 1

1 1 2 2 1 1 1 2 1

Set

Mn; Pn; M P = M;

1 1 1 1 (5.2.6)

Q Qn; = Q:

1 1 (5.2.7)

Then we have

U = MAQ:

matrix A, Gaussian elimination with complete pivoting yields an upper triangular

matrix U , a permuted lower triangular matrix M and a permutation matrix Q such

that

MAQ = U;

where U; M , and Q are given by (5.2.5){(5.2.7).

As in the case of partial pivoting, it is easy to see from (5.2.4) and (5.2.7) that the factorization

MAQ = U can be expressed in the form:

PAQ = MU:

156

Corollary 5.2.2 (Complete Pivoting LU Factorization Theorem). Gaussian

elimination with complete pivoting yields the factorization PAQ = LU , where P

and Q are permutation matrices given by

P = Pn; P ; 1 1

Q = Q Qn; ;1 1

L = P (Mn; Pn; M P ); :

1 1 1 1

1

Example 5.2.5

Triangularize 00 1 1 1

B1 2 3

A=B

CC

@ A

1 1 1

using complete pivoting.

23

00 1 01

B C

P =B

1 @ 1 0 0 CA ;

0 0 1

00 0 11 03 2 11

B C B C

Q =B

1 @ 0 1 0 CA ; P AQ = B

1 @ 1 1 0 CA ;

1

1 0 0 1 1 1

0 1 0 01

M =B

B; C

@

1 1 0C

A; 1

3

; 0 1 1

03 2 1 3

1

B CC

A = M P AQ = B

(1)

1 1@0 ;1

1

3

1

3 A:

0 1

3

2

3

157

Step 2. The pivot entry is a = 23 .

(1)

33

01 0 01 01 0 01

B C B C

P =B @ 0 0 1 CA ;

2 Q =B

2 @ 0 0 1 CA :

0 1 0 0 1 0

03 2 11

!

B C c2 = 1 1 0

@ 0 23 13 CA ; M

P2A(1) Q2 = B :

1

0 ;3 31 1 2

01 0 01

M2 = B

B 0 1 0 CC ;

@ A

0 21 1

03 2 11

B 2 1 CC :

U = A(2) = M2P2A(1)Q2 = M2P2(M1P1AQ1 )Q2 = B @0 3 3 A

0 0 12

(Using Corollary 5.2.2, nd for yourself P , Q, L, and U such that PAQ = LU .)

Forming the Matrix M and other Computational Details

Remarks similar to those as in the case of partial pivoting hold. The matrices Pk ; Qk Pk A(k;1)Qk ,

Mk and Mk PkA(k;1)Qk do not have to be formed explicitly wasting storage unnecessarily. It is

enough to save the indices and the multipliers.

In view of our discussion on forming the matrices Mk and the permutation matrix Pk , we now

present a practical Gaussian elimination algorithm with complete pivoting, which does not show

the explicit formation of the matrices Pk ; Qk ; Mk ; MkA and Pk AQk . Note that partial pivoting

is just a special case of complete pivoting.

Algorithm 5.2.4 Triangularization Using Gaussian Elimination with Complete Pivot-

ing

Given an n n matrix, the following algorithm computes triangularization of A with rows and

columns permuted, using Gaussian elimination with complete pivoting. The algorithm overwrites

A with U . U is stored in the upper triangular part of A (including the diagonal) and the multipliers

mik are stored in the lower triangular part. The permutation indices rk and sk are saved separately.

For k = 1; 2; : : :; n ; 1 do

1. Find rk and sk such that jar ;s j = max fjaij j : i; j kg ; and save rk and sk .

k k

k k

158

2. (Interchange the rows rk and k) akj $ ar ;j (j = k; k + 1; : : :; n).

k

k

akk

5. (Update the entries of A) aij aij + mik akj = aij + aik akj (i = k + 1; : : :; n; j = k + 1; : : :; n).

Note: Algorithm 5.2.4 does not give the matrices M; P , and Q explicitly; they have

to be formed, respectively, from the multipliers mik and the permutation indices rk

and sk , as explained above.

Flop-count: The algorithm requires n3
ops and O(n ) comparisons.

3

3

Example 5.2.6

1 2

!

A= :

3 4

Just one step is needed.

r = 2

1

s = 2:

1

First, the second and rst rows are switched and this is then followed by the switch of the second

and rst column to obtain the pivot entry 4 in the (1,1) position:

3 4

!

A

1 2

(After the interchange of the rst and second rows).

4 3

!

A

2 1

(After the interchange of the rst and second columns).

Multiplier is: a21 m21 = ; aa21 = ; 42 = ; 12

11

4 3

!

A

0 ; 12

159

(After updating the entries of A).

0 1

! 0 1

!

P1 = ; Q = Q1 = ;

1 0 1 0

!

1 0 0 1

! 0 1

!

M = M1P1 = 1 = :

;2 1 1 0 1 ; 12

5.3 Stability of Gaussian Elimination

The stability of Gaussian elimination algorithms is better understood by measuring the growth of

the elements in the reduced matrices A(k). (Note that although pivoting keeps the multipli-

ers bounded by unity, the elements in the reduced matrices still can grow arbitrarily).

We remind the readers of the denition of the growth factor in this context, given in Chapter 3.

Denition 5.3.1 The growth factor is the ratio of the largest element (in magnitude) of

A; A ; : : :; A n; to the largest element (in magnitude) of A:

(1) ( 1)

= max(; ; ; : : :; n; ) ;

1 2 1

i;j ij

ja k j.

i;j ij

( )

Example 5.3.1

0:0001 1

!

A=

1 1

1. Gaussian elimination without pivoting gives

0:0001 1

!

A =U =

(1)

0 ;104

max ja(1)

ij j = 10

4

max jaij j = 1

= the growth factor = 10 4

1 1

!

A =U =

(1)

0 1

max ja(1)

ij j = 1

max jaij j = 1

= the growth factor = 1

160

The question naturally arises: how large can the growth factor be for an arbitrary

matrix? We answer this question in the following.

1.

Pivoting

For Gaussian elimination with complete pivoting,

fn 2 3 12 4 13 n

1 ;1 1 g1=2:

n

This is a slowly growing function of n. Furthermore, in practice this bound is never attained.

Indeed, there was an unproven conjecture by Wilkinson (AEP, p. 213) that the growth

factor for complete pivoting was bounded by n for real n n matrices. Unfortunately,

this conjecture has recently been settled by Gould (1991) negatively. Gould (1991)

exhibited a 13 13 matrix for which Gaussian elimination with complete pivoting gave the growth

factor = 13:0205. In spite of Gould's recent result, Gaussian elimination with complete

pivoting is a stable algorithm.

2.

Pivoting

For Gaussian elimination with partial pivoting, 2n;1, that is,

can be as big as 2n; . 1

Unfortunately, one can construct matrices for which this bound is attained.

161

Consider the following example:

0 1

BB 1 0 0 0 1C

BB ;1 1 0 0 1C C

BB ... ... ... .. CC

. C

A=B BB .. C

.. C

BB . ... ... . C

C

BB ... ... . . . ... C CC

@ A

;1 ;1 1

That is, 8

> for j = i; n;

< 1

>

aij = > ;1 for j < i;

>

: 0 otherwise.

Wilkinson (AEP, pp. 212) has shown that the growth factor for this matrix with partial pivoting

is 2n;1. To see this, take the special case with n = 4.

0 1 0 0 11

BB ;1 1 0 1 CC

A = B BB CC

@ ;1 ;1 1 1 CA

;1 ;1 ;1 1

01 0 0 11

BB 0 1 0 2 CC

A = B

(1)

BB CC

CA

@ 0 ; 1 1 2

0 ;1 ;1 2

01 0 0 11

BB CC

B 0 1 0 2 C

A(2) = B B@ 0 0 1 4 CCA

0 0 ;1 4

01 0 0 11

BB 0 1 0 2 CC

A = B

(3)

BB CC

@ 0 0 1 4 CA

0 0 0 8

Thus the growth factor

= 81 = 23 = 24;1:

162

Remarks: Note that this is not the only matrix for which = 2n; . Higham and Higham (1987)

1

0 0:7248 0:7510 0:5241 0:7510 1

BB 0:7317 0:1889 0:0227 ;0:7510 CC

B=B BB CC

@ 0:7298 ;0:3756 0:1150 0:7511 CA

;0:6993 ;0:7444 0:6647 ;0:7500

is such a matrix. See Higham (1993).

Examples of the above type are rare. Indeed, in many practical examples, the elements of the

matrices A(k) very often continue to decrease in size. Thus, Gaussian elimination with partial

pivoting is not unconditionally stable in theory, but in practice it can be considered a

stable algorithm.

3.

without Pivoting

For Gaussian elimination without pivoting, can be arbitrarily

large, except for a few special cases, as we shall see later, such as

symmetric positive denite matrices. Thus Gaussian elimi-

nation without pivoting is, in general, a completely unstable

algorithm.

Denition 5.4.1 A matrix of the form

H = I ; 2uuu

T

Tu ;

where u is a nonzero vector, is called a Householder matrix after the celebrated numerical analyst

Alston Householder.

Alston Householder, an American mathematician, was the former Director of the Mathematics and Computer

Science Division of Oak Ridge National Laboratory at Oak Ridge, Tennessee and a former Professor of Mathematics

at the University of Tennessee, Knoxville. A research conference on Linear and Numerical Linear Algebra

dedicated to Dr. Householder, called \HOUSEHOLDER SYMPOSIUM" is held every three years around the world.

Householder died in 1993 at the age of 89. A complete biography of Householder appears in the SIAM Newsletter,

October 1993.

163

A Householder matrix is also known as an Elementary Re
ector or a Householder transformation.

We now give a geometric interpretation of a Householder transformation.

u(uT x) x

;2u(uT x)

Hx = ( I ; 2uu)T x

= x ; 2u(uT x)

With this geometric interpretation the following results become clear:

kHxk = kxk for every x 2 Rn.

2 2

H is an orthogonal matrix.

kHxk = kxk for every x implies that H is orthogonal.

2 2

H = I.

2

2

Hy = y for every y 2 P .

Vectors in P cannot be re
ected away.

H has a simple eigenvalue ;1 and (n ; 1)-fold eigenvalue 1.

P = fv 2 Rn : v> u = 0g has n ; 1 linearly independent vectors y ; : : :; yn; and

1 1

u to ;u, i.e., Hu = ;u. Thus ;1 is an eigenvalue of H which must be a simple

eigenvalue because H can have only n eigenvalues.

164

det(H ) = ;1

det(H ) = (;1) 1 1 = ;1.

Also from Figure 5.1, for given x; y 2 Rn with kxk2 = ky k2, if we choose u to be a unit vector

parallel to x ; y , then H = I ; 2uu> re ects x to y .

The importance of Householder matrices lies in the fact that they can also be used to create

zeros in a vector.

1

Proof. Dene

H = I ; 2 uu

T

uT u

with u = x + sign(x1 )kxk2e1 , then it is easy to see that Hx is a multiple of e1 :

Note: If x is zero, its sign can be chosen either + or ;. Any possibility of over
ow or under
ow

1

in the computation of kxk can be avoided by scaling the vector x. Thus the vector u should be

determined from the vector max xfjx jg rather than from the vector x itself.

2

i i

Algorithm 5.4.1 Creating zeros in a vector with a Householder matrix

Given an n-vector x, the following algorithm replaces x by Hx = (; 0; : : :; 0)T , where H is a

Householder matrix.

1. Scale the vector x max xfjx jg .

i i

2. Compute u = x + sign(x1 )kxk2e1 .

3. Form Hx where H = I ; 2uu .

T

uT u

Remark on Step 3: Hx in step 3 should be formed by exploiting the structure of H as shown

in Example 7 in Chapter 4.

165

Example 5.4.1

001

B CC

x=B

@4A

1

001

x maxxfjx jg

B CC

=B

@1A

i 1

001 011 0 p 4

1

p

B CC 17 BB CC BB

17

4

CC

u=B

@1A+ @0A = @ 1

4 A

1

0 1

4

0 0 ; 0: 9701 ; 0 : 2425

14

T B CC

H = I ; 2uuu B

T u = @ ;0:9701 0:0588 ;0:2353 A

;0:2425 ;0:2353 0:9412

and 0 ;4:1231 1

B 0 CC :

Hx = B

@ A

0

Flop-Count and Round-o Property. Creating zeros in a vector by a Householder matrix

is a cheap and numerically stable procedure.

It takes only 2(n-2)
ops to create zeros in the positions 2 through n in a vector, and it can

be shown (Wilkinson AEP, pp. 152-162) that if Hb is the computed Householder matrix, then

kH ; Hb k 10:

Moreover,

b ) = H (x + e);

(Hx

where

jej cn kxk ;

2

2

166

5.4.1 Householder Matrices and QR Factorization

matrix A, there exists an orthogonal matrix Q and an upper triangular matrix R

such that

A = QR:

The matrix Q can be written as Q = H1H2 Hn;1, where each Hi is a Householder

matrix.

As we will see later, the QR factorization plays a very signicant role in numerical solutions

of linear systems, least-squares problems, eigenvalue and singular value computations.

We now show how the QR factorization of A can be obtained using Householder matrices, which

will provide a constructive proof of Theorem 5.4.1.

As in the process of LU factorization, this can be achieved in (n ; 1) steps; however, unlike

the Gaussian elimination process, the Householder process can always be carried out to

completion.

Step 1. Construct a Householder matrix H such that H A has zeros below the (1,1) entry

1 1

BB 0 CC

BB CC

H A = B0 C

B :

BB .. .. .. CCC

1

@. . .A

0

Note that it is sucient to construct H = I ; 2unuTn =(uTn un ) such that

1

0a 1 01

BB a CC BB 0 CC

11

21

@ . A @.A

1

an 1 0

for then H1 A will have the above form.

Overwrite A with A = H1A for use in the next step.

(1)

167

Since A overwrites A(1) , A(1) can be written as:

0a a a n1

BB 0 a 11 12 1

C

a nC

AA =BBB .. .. . . .

(1)

22 2 C

.. C :

@ . . . C

A

0 an 2 ann

Step 2. Construct a Householder matrix H such that H A has zeros below the (2,2) entry

2 2

(1)

in the 2nd column and the zeros already created in the rst column of A(1) in step 1 are not

destroyed: 0 1

BB 0 C CC

BB C

A =H A =B

(2)

BB 0. (1)

0 C C

. . . ... C

2

B@ .. .. CA

.

0 0

H can be constructed as follows:

2

that 0 1 0 1 a

BB a 22

CC BB 0 CC

B CC BB CC

bH BBB ...

32

CC = BB 0 CC ;

2

BB .. CC BB .. CC

@ . A @.A

an 2 0

and then dene 01 0 01

BB CC

B 0 CC :

H =B

2

B@ ... bH CA 2

0

A(2) = H2A(1) will then have the form above.

Overwrite A with A . (2)

Note: Since H also has zeros below the diagonal on the lst column, premultiplication of A by

2

(1)

2

Hb k = In;k ; (2uuTn;k uun;k )

T +1 +1

+1

n;k n;k +1 +1

168

of order n ; k + 1 such that 0a 1 01

BB kk... CC BB 0 CC

Hb k BBB .. CCC = BBB .. CCC ;

@ . A @.A

ank 0

and then, dening !

Ik; 0

Hk = 1

;

0 Hb k

compute A(k) = Hk A(k;1).

Overwrite A with A k :( )

The matrix A(k) will have zeros on the kth column below the (k; k)th entry and the zeros already

created in previous columns will not be destroyed. At the end of the (n ; 1)th step, the resulting

matrix A(n;1) will be an upper triangular matrix R.

Now, since

A(k) = Hk A(k;1); k = n ; 1; : : :; 2;

we have

R = A n; = Hn; A n; = Hn; Hn; A n;

( 1)

1

( 2)

1 2

( 3)

(5.4.1)

= = Hn; Hn; H H A:

1 2 2 1

Set

QT = Hn; Hn; H H :

1 2 2 (5.4.2)

1

R = QT A or A = QR: (5.4.3)

(Note that Q = H1T H2T HnT;1 is also orthogonal.)

Forming the Matrix Q and Other Computational Details

1. Since each Householder matrix Hk is uniquely determined by the vector un;k+1; to construct

Hk it is sucient just to save the vector un;k+1:

2. A(k) = Hk A can be constructed using the technique described in Chapter 4 (Example 4.7)

which shows that A(k) can be constructed without forming the product explicitly.

3. The vector un;k+1 has (n ; k +1) elements, whereas only (n ; k) zeros are produced at the kth

step. Thus, one possible scheme for storage will be to store the elements (un;k+1;1; un;k+1;2; : : :;

un;k+1;n;k) in positions (k + 1; k); : : :; (n ; k + 1; k) of A: The last element un;k+1;n;k+1 has

to be stored separately.

169

4. The matrix Q, if needed, can be constructed from the Householder matrices H1 through Hn;1.

The major programming requirement is a subroutine for computing a Householder

matrix H such that, for a given vector x, Hx is a multiple of e . 1

Given an n n matrix A, the following algorithm computes Householder matrices H1 through

Hn;1 and an upper triangular matrix R such that with Q = H1 Hn;1; A = QR: The algorithm

overwrites A with R.

For k = 1 to n ; 1 do

1. Find a Householder matrix Hb k = In;k+1 ; 2un;k+1uTn;k+1=uTn;k+1un;k+1 of order n ; k + 1

such that 0a 1 0r 1

BB kk... CC BB 0kk CC

Hb k BBB .. CCC = BBB .. CCC :

@ . A @ . A

ank 0

2. Dene

Ik;1 0

!

Hk = :

0 Hb k

3. Save the vector un;k+1.

4. Compute A(k) = Hk A.

5. Overwrite A with A(k).

Example 5.4.2

Let 00 1 11

B 1 2 3 CC :

A=B

@ A

1 1 1

Step 1.

k = 1:

170

Construct H :

001 01

1

B CC BB CC

HB

@1A = @0A

1

1 0

001 0 1 1 0 p2 1

B 1 CC + p BB CC BB CC

u =B

3 @ A 2@0A = @ 1 A

1 0 1

01 0 01 0 1 p12 p12

1

H = I ; 2uuT uu

T B 0 1 0 CC ; BB p

= B

CC

@ A @ A

3 3 1 1 1

1 3 2 2 2

3

0 0 1

3

p12 12 1

0 0 ; p1 ; p1 1 2

B p1 2 2

C

= B

@; 2 1

2

; C

A 1

2

;p 1

2

; 1

2

1

2

Form A :

(1)

0 ;p2 ;3p2 p 1

2 2

B

A =H A=B

p

2

1; 2 ;p C

CA

(1)

@ 0 1 2

p

2

2

p

2

0 ; (1+ 2)

2

; (2+ 2)

2

B 0 ;0:2071 0:2929 CC

AA B

@ (1)

A

0 ;1:2071 ;1:7071

Step 2.

k=2

c:

Construct H

! !

2

bH ;0:2071 =

;1:2071 02

;0:2071 ! 1

! ;1:4318 !

u = ; 1:2247 =

2

;1:2071 0 ;1:2071

;0:1691 ;0:9856 !

Hb =

;0:9856 0:1691

2

Construct H :2 01 1

0 0

H =B

B C

@ 0 ;0:1691 ;0:9856 CA

2

0 ;0:9856 0:1691

171

Form A : (2)

B C

H A =H A =H H A=B

(2)

2

(1)

@ 0 2 1 1:2247 1:6330 CA=R

0 0 ;0:5774

Form Q: 0 0 1

0:8165 0:5774

Q = H H = BB@ ;0:7071 0:4082 ;0:5774 CCA

1 2

Flop-Count. The algorithm requires approximately n ops just to compute the triangular

2

3

3

The construction of Hb k (and therefore of Hk ) requires about 2(n ; k) ops, while that of A(k)

from A(k) = Hk A (taking advantage of the special structure of Hk ) requires roughly 2(n ; k)2 ops.

Thus

nX

;1

Total number of ops = 2 [(n ; k)2 + (n ; k)]

k=1

= 2[(n ; 1)2 + (n ; 2)2 + + 12] + 2[(n ; 1) + (n ; 2) + + 1]

= 2 n(n ; 1)(2n ; 1) + 2 n(n ; 1)

6 2

23n :

3

Note: The above count does not take into account the explicit construction of Q. Q is available

only in factored form. It should be noted that in a majority of practical applications, it is sucient

to have Q in this factored form and, in many applications, Q is not needed at all. If Q is needed

explicitly, another 32 n3
ops will be required. (Exercise #22)

Round-o Property. In the presence of round-o errors the algorithm computes QR de-

composition of a slightly perturbed matrix. Specically, it can be shown (Wilkinson AEP p. 236)

that if Rb denotes the computed R, then there exists an orthogonal Qb such that

A + E = Qb R:

b

The error matrix E satises

kE kF (n)kAkF ;

where (n) is a slowly growing function of n and is the machine precision. If the inner

products are accumulated in double precision, then it can be shown (Golub and Wilkinson (1966))

that (n) = 12:5n. The algorithm is thus stable.

172

5.4.2 Householder QR Factorization of a Non-Square Matrix

In many applications (such as in least squares problems, etc.), one requires the QR factorization

of an m n matrix A. The above Householder method can be applied to obtain QR factorization

of such an A as well. The process consists of s = minfn; m ; 1g steps; the Householder matrices

H ; H ; : : :; Hs are constructed successively so that

1 2

8 R!

>

< ; if m n;

HsHs; H H A = Q A = > 0

1 2

T

1

: (R; S ); if m n.

Flop-Count and Round-o Property

The Householder method in this case requires

1. n2 (m ; n3 )
ops if m n.

2. m2(n ; m3 )
ops if m n.

The round-o property is the same as in the previous case. The QR factorization of a rectan-

gular matrix using Householder transformations is stable.

Example 5.4.3

0 1 1

1

B C

A = B

@ 0:0001 0 C A

0 0:0001

s = min(2; 3) = 2:

Step 1. Form H 1

0 1 1 011 0 2 1

B 0:0001 CC + p1 + (0:0001)

u =B

BB 0 CC = BB CC

@

2 A 2

@ A @ 0 :0001 A

0 0 0

0 ;1 ;0:0001 0 1

uu B ;0:0001

=B 0C

C

H =I;

1

2 2 T

2

uT2 u2 @ 1 A

0 0 1

0 ;1 ;1 1

B 0 ;0:0001 CC

A =H A=B

(1)

@ 1A

0 0:0001

173

Step 2: Form H 2

;0:0001 ! q 1

!

u = ; (;0:0001) + (0:0001) 2 2

1

0:0001 0

;2:4141 !

= 10; 4

:1000

1 0

! ;0:7071 0:7071 !

Hb = u u T

; 2 uT u =

1 1

2

0 1 1 0:7071 0:7071

1

01 0 0

1

BB C

H = 2 @ 0 ;0:7071 0:7071 CA

0 0:7071 0:7071

Form R 0 ;1 ;1 1

B C R

!

H A =B @ 0 0:0001 CA =

(1)

:

2

0

0 0

Form

0 ;1 0:0001 ;0:0001

1

B C

Q = H H =B

@ ;0:0001 ;0:7071 0:7071 CA

1 2

0 0:7071 0:7071

;1 ;1 !

R =

0 0:0001

5.4.3 Householder Matrices and Reduction to Hessenberg Form

trix can always be transformed to an upper Hessenberg matrix Hu by orthogonal

similarity:

PAP T = Hu:

The matrix A is routinely transformed to a Hessenberg matrix before the process of eigenvalue

computations (known as the QR iterations) starts. Hessenberg forms are also useful tools in

many other applications such as in control theory, signal processing, etc.

174

The idea of orthogonal factorization using Householder matrices described in the previous sec-

tion can be easily extended to obtain P and Hu .

The matrix P is constructed as the product of (n ; 2) Householder matrices P1 through Pn;2:

P1 is constructed to create zeros in the rst column of A below the entry (2,1), P2 is determined

to create zeros below the entry (3,2) of the second column of the matrix P1 AP1T , and so on.

The process consists of (n ; 2) steps. (Note that an n n Hessenberg matrix contains at

least (n;2)(2 n;1) zeros.)

1

0a 1 01

BB a CC BB 0 CC

21

31

@ . A @.A

1

an 1 0

Dene

I 0

!

P = 1

1

0 Pb1

and compute

A = P AP T :

(1)

1 1

BB C

C

BB CC

AA =B BB 0.

(1)

C CC

B@ .. .. CA

.

0

Step 2. Find a Householder matrix Pb of order (n ; 2) such that

2

0a 1 01

BB ... CC BB 0 CC

32

@ . A @.A

2

an 2 0

Dene

I 0

!

P = 2

2

0 Pb2

and compute A(2) = P2 A(1)P2T :

175

Overwrite A with A : Then

(2)

0 1

BB CCC

BB C

B0 C

AA =B BB (2) CC :

BB 0. 0 C CC

B@ .... .. CA

. .

0 0

The general Step k can now easily be written down.

At the end of (n ; 2) steps, the matrix A(n;2) is an upper Hessenberg matrix Hu:

Now,

Hu = A n; = Pn; A n; PnT;

( 2)

2

( 3)

2

= Pn;2(Pn;3A(n;4)PnT;3 )PnT;2

..

.

= (Pn;2Pn;3 P1 )A(P1T P2T PnT;3PnT;2 ): (5.4.4)

Set

P = Pn; Pn; P :

2 3 1 (5.4.5)

We then have Hu = PAP T : Since each Householder matrix Pi is orthogonal, the matrix P which

is the product of (n ; 2) Householder matrices is also orthogonal.

Ik 0

!

;

0 PbK

post-multiplication of Pk A by PkT does not destroy the zeros already created in Pk A: For example,

let n = 4 and k = 1: Then

01 0 0 01

BB 0 CC

P1 = B BB CC

CA

@ 0

0

0 1

BB CC

PA = BBB CC

@ 0 CA

1

0

176

and 0 101 0 0 0

1 0 1

B

B C

C BB 0 C

C BB C

C

B

P AP = B

T CC BB CC = BB CC :

1 1

B

@ 0 CA B@ 0 C A B@ 0 C A

0 0 0

1. Each Householder matrix Pk is uniquely determined by the vector un;k : It is therefore su-

cient to save the vector un;k to recover Pk later. If the matrix P is needed explicitly, it can

be computed from the Householder matrices P1 through Pn;2:

2. The vector un;k has (n ; k) elements, whereas the number of zeros produced at the kth step is

(n ; k ; 1). Thus, the (n ; k) elements of un;k can be stored in the appropriate lower triangular

part of A below the diagonal if the subdiagonal entry at that step is stored separately. This is

indeed a good arrangement, since subdiagonal entries in a Hessenberg matrix are very special

and play a special role in many applications. Thus, all the information needed to compute P

can be stored in the lower triangular part of A below the diagonal, storing the subdiagonal

entries separately in a linear array of (n ; 1) elements.

Other arrangements of storage are also possible.

Algorithm 5.4.3 Householder Hessenberg Reduction

Given an n n matrix A, the following algorithm computes Householder matrices P1 through

Pn;2 such that, with P T = P1 Pn;2, PAP T is an upper Hessenberg matrix Hu: The algorithm

overwrites A with Hu .

For k = 1; 2; : : :; n ; 2 do

1. Determine a Householder matrix Pbk = In;k ; 2un;k uTn;k =uTn;kun;k , of order n ; k, such that

0a 1 01

BB k ... ;k CC BB 0 CC

+1

@ . A @.A

ank 0

2. Save the vector un;k :

Ik 0

!

3. Dene Pk =

0 Pbk

177

4. Compute A(k) = Pk APkT

5. Overwrite A with A(k)

Flop-Count. The algorithm requires n
ops to compute Hu. This count does not include

5

3

3

another 32 n3 ops will be required. However, when n is large, the storage required to form P is

prohibitive.

351) that the computed Hu is orthogonally similar to a nearby matrix A + E , where

kE kF cn kAkF :

2

If the inner products are accumulated in double precision at the appropriate places in the

algorithm, then the term n2 in the above bound can be replaced by n, so in this case

kE kF cnkAkF ;

which is very desirable.

Example 5.4.4

Let 00 1 21

B C

A=B

@ 1 2 3 CA :

1 1 1

Since n = 3; we have just one step to perform.

Form Pb1:

1

! !

Pb1 =

1 0

! ! p

1 p 1 p 1! 1 + 2!

u2 = + 2e1 = + 2 =

1 1 0 1

! ! !

bP1 = I2 ; 2uu2uu22 1 0 ; :2929 5:8284 2:4142 = ;0:7071 ;0:7071 :

T

T

2 0 1 2:4142 1 ;0:7071 0:7071

178

Form P : 1

01 1 01

0 0 0 0

1

B0

P =B

CC = BB 0 ;0:7071 ;0:7071 CC

1 @ A @ A

0 Pb 1 0 ;0:7071 0:7071

0 0 ;2:1213 0:7071 1

A A = P AP T = B

B ;1:4142 3:5000 ;0:5000 CC = H :

(1)

@

1 1 A u

0 1:5000 ;0:5000

All computations are done using 4-digit arithmetic.

Tridiagonal Reduction

If the matrix A is symmetric, then from

PAP T = Hu

it follows immediately that the upper Hessenberg matrix Hu is also symmetric and, therefore, is

tridiagonal. Thus, if the algorithm is applied to a symmetric matrix A, the resulting matrix Hu

will be a symmetric tridiagonal matrix T . Furthermore, one obviously can take advantage of the

symmetry of A to modify the algorithm. For example, a signicant savings can be made in storage

by taking advantage of the symmetry of each A(k):

The symmetric algorithm requires only 23 n3
ops to compute T compared to 53 n3
ops needed to

compute Hu: The round-o property is essentially the same as the nonsymmetric algorithm. The

algorithm is stable.

Example 5.4.5

Let 00 1 11

B 1 2 1 CC :

A=B

@ A

1 1 1

Since n = 3; we have just one step to perform.

Form Pb1:

1

! !

Pb1 =

1 0

1

! p 1

! p 1 ! 1 + p2 !

u2 = + 2e1 = + 2 =

1 1 0 1

! ! !

bP1 = I2 ; 2uu2uu22 1 0 ; :2929 5:8284 2:4142 = ;0:7071 ;0:7071 :

T

T

2 0 1 2:4142 1:0000 ;0:7071 0:7071

179

Form P : 1 01 1 01 1

0 0 0 0

B0

P =B

CC = BB 0 ;0:7071 ;0:7071 CC :

1@ A @ A

0 Pb 1 0 ;0:7071 0:7071

Thus 0 0 ;1:4142 0 1

Hu = P AP T = B

B ;1:4142 2:5000 0:5000 CC :

1 @ 1 A

0 0:5000 0:5000

(Note that Hu is symmetric tridiagonal.)

Denition 5.5.1 A matrix of the form j thcolumns

ith

0 # # 1

BB 1 0 0

0 C

BB 0 1 0 0 C

C

BB ... ... ... .. C

BB . C CC

B0 0 0 c s 0 C

C

ith

J (i; j; c; s) = B BB .. .. .. .. C

C

BB . . . . C CC rows

BB 0 0 0 ;s c 0 C C j th

BB . . . .. C

B@ .. .. .. . C CA

0 0 0 0 1

where c2 + s2 = 1, is called a Givens matrix, after the numerical analyst Wallace Givens.

Since one can choose c = cos and s = sin for some , a Givens matrix as above can be

conveniently denoted by J (i; j; ). Geometrically, the matrix J (i; j; ) rotates a pair of coordinate

axes (ith unit vector as its x-axis and the j th unit vector as its y -axis) through the given angle

in the (i; j ) plane. That is why the Givens matrix J (i; j; ) is commonly known as a Givens

Rotation or Plane Rotation in the (i; j ) plane. This is illustrated in the following gure.

W. Givens was director of the Applied Mathematics Division at Argonne National Laboratory. His pioneering

work done in 1950 on computing the eigenvalues of a symmetric matrix by reducing it to a symmetric tridiagonal

form in a numerically stable way forms the basis of many numerically backward stable algorithms developed later.

Givens held appointments at many prestigious institutes and research institutions (for a complete biography, see the

July 1993 SIAM Newsletter). He died in March, 1993.

180

e2 + )

cos(

! cos ; sin ! cos

!

v= =

sin( + ) sin cos sin

cos

!

u=

sin

e1

Thus, when an n-vector 0x 1

BB x 1

CC

x=BBB .. 2 CC

CA

@.

xn

is premultiplied by the Givens rotation J (i; j; ), only the ith and j th components of x are aected;

the other components remain unchanged.

Note that since c2 + s2 = 1, J (i; j; ) J (i; j; )T = I , thus the rotation J (i; j; ) is orthogonal.

x

!

If x = 1

is a 2-vector, then it is a matter of simple verication that, with

x 2

c= p x ; s= p x 1 2

x +x 2

1

2

2 x +x 2

1

2

2

c s

! !

the Givens rotation J (1; 2; ) = is such that J (1; 2; )x = :

;s c 0

The above formula for computing c and s might cause some under
ow or over
ow. However,

the following simple rearrangement of the formula might prevent that possibility.

1+t

x

2

1

Otherwise, t = x , c = p 2 , s = ct.

2

1+t

(Note that computations of s and t do not involve .)

1

Example 5.5.1

181

1

!

x= 1

2

1+ 4 5 5

! 0 p2 p1 1 ! p5 !

c s 1

x = @ 15 25 A 1 = 2 :

;s c ;p p 2 0

5 5

Givens rotations are especially useful in creating zeros in a specied position in a vector. Thus,

if 0x 1

BB x CC

1

BB . CC 2

BB .. CC

BB x CC

x=B BB ..i CCC

BB . CC

BB xk CC

BB .. CC

@.A

xn

and if we desire to zero xk only, we can construct the rotation J (i; k; ) (i < k) such that J (i; k; )x

will have zero in the kth position. !

c s

To construct J (i; k; ), rst construct a 2 2 Givens rotation such that

;s c

! ! !

c s xi

=

;s c xk 0

and then form the matrix J (i; k; ) by inserting c in the positions (i; i) and (k; k), s and ;s

respectively in the positions (i; k) and (k; i), and lling the rest of the matrix with entries of the

identity matrix.

Example 5.5.2

011

B CC

x=B

@ ;1 A :

3

Suppose we want to create a zero in the third position, that is, k = 3. Choose i = 2.

182

1. Form a 2 2 rotation such that

c s ! ;1 ! !

= ; c = p;1 ; s = p3 :

;s c 3 0 10 10

01 0 0

1

B p; CC

2. Form J (2; 3; ) = B

@0 1

10

p310 A:

0 p;103 p;10

1

Then 01 0 10 1 1 0 1 1

0

B p; CC BB CC BB p CC

J (2; 3; )x = B

@0 1

10

p310 A @ ;1 A = @ 10 A :

0 p;103 p;10

1

3 0

Given an n-vector x, if we desire to zero all the entries of x except possibly the rst one, we

can construct J (1; 2; ), x(1) = J (1; 2; )x, x(2) = J (1; 3; )x(1), x(3) = J (1; 4; )x(2), etc., so that

with

P = J (1; n; ) J (1; 3; )J (1; 2; );

we will have Px a multiple of e1 . Since each rotation is orthogonal, so is P .

Example 5.5.3

011

B CC

x=B

@ ;1 A

2

0p 1

0 ;p1 1

Bp

J (1; 2; ) = B

2

0C

C p2

2

@ 1

2 A 1

0 0 1

0 p2 1

B CC

x(1) = J (1; 2; )x = B

@0A

2

0 q 2 0 p2 1

B 06 1 06 CC :

J (1; 3; ) = B

@ q2 A

;2 0

p 6 6

183

Then

0 p6 1

B CC

J (1; 3; )x = B

(1)

@0A

0

0 p16 ;

p16 p26

1

B

P = J (1; 3; )J (1; 2; ) = B

CC

q@ p12 p12

q q0 A

; 2 2 2

0 p6 1

6 6 6

B 0 CC :

Px = B

@ A

0

Flop-Count and Round-o Property. Creating zeros in a vector using Givens rotations

is about twice as expensive as using Householder matrices. To be precise, the process requires

only 1 21 times
ops as Householder's, but it requires O( n22 ) square roots, whereas the Householder

method requires O(n) square roots.

The process is as stable as the Householder method.

The idea of creating zeros in specied positions of a vector can be trivially extended to create

zeros in specied positions of a matrix as well. Thus, if we wish to create a zero in the (j; i)[(j > i)]

position of a matrix A, one way to do this is to construct the rotation J (i; j; ) aecting the ith

and j th rows only, such that J (i; j; )A will have a zero in the (j; i) position. The procedure then

is as follows.

Algorithm 5.5.1 Creating Zeros in a Specied Position of a Matrix Using Givens Ro-

tations

Given an n n matrix A, the following algorithm overwrites A by J (i; j; )A such that the latter

has a zero in the (j; i) position.

1. Find c = cos(); s = sin() such that

c s ! aii ! !

=

;s c aji 0

2. Form J (i; j; ):

184

Remark: Note that there are other ways to do this as well. For example, we can form J (k; j; )

aecting the kth and j th rows, such that J (k; j; )A will have a zero in the (j; i) position.

Example 5.5.4

Let 01 2 31

B C

A=B

@ 3 3 4 CA :

4 5 6

Create a zero in the (2,1) position of A using J (1; 2; ):

1. Find c and s such that

c s! 1

! x !

=

;s c 3 0

c= p ; 1

10

s = p310

0p 1 p310 0

1

B p; 10

C

J (1; 2; ) = B

@ 3

10

p110 0CA

0 0 1

0p p 10 11 p1510

1

B 0 ;p

J (1; 2; )A = B

10 10

5 C

p;10

C:

@ 3

10 A

4 5 6

Example 5.5.5

Let 01 2 31

B C

A=B

@ 2 3 4 CA :

4 5 6

Create a zero in the (3,1) position using J (2; 3; ):

1. Find c and s such that

c s 2

! ! !

=

;s c 4 0

c= p ; s= p

2

20

4

20

185

2. Form

01 0 0

1

B0 p

J (2; 3; ) = B p420

CC

@ 2

20 A

0 p;204 p220

01 0 0

101 2 31 0 1 2 3 1

B0 p

A = J (2; 3; )A = B

C BB 2 3 4 CC = BB p20 p26 p32 CC

p420 C

(1)

@ 2

20 A@ A @ 20 20 A

5.5.1 Givens Rotations and QR Factorization

It is clear from the foregoing discussion that, like Householder matrices, Givens rotations can also

be applied to nd the QR factorization of a matrix. The Givens method, however, is almost twice

as expensive as the Householder method. In spite of this, QR factorization using Givens rotations

seems to be particularly useful in QR iterations for eigenvalue computations and in the solution of

linear systems with structured matrices, such as Toeplitz, etc. Givens rotations are also emerging

as important tools in parallel computations of many important linear algebra problems.

Here we present the algorithm for QR factorization of an m n matrix A, m n, using Givens'

method.

The basic idea is just like Householder's: compute orthogonal matrices Q1; Q2; : : :; Qk , using

Givens rotations such that A(1) = Q1 A has zeros below the (1,1) entry in the rst column, A(2) =

Q2 A(1) has zeros below the (2,2) entry in the second column, and so on. Each Qi is generated as a

product of Givens rotations. One way to form fQi g is:

Q = J (1; m; )J (1; m ; 1; ) J (1; 2; );

1

2

and so on.

Let s = min(m; n ; 1): Then

R = A s; = Qs; A s; = Qs; Qs; A s; = = Qs; Qs; Q Q A = QT A:

( 1)

1

( 2)

1 2

( 3)

1 2 2 1

186

Theorem 5.5.1 (Givens QR Factorization Theorem) Given an m

n matrix A, m n, there exist s = min(m; n ; 1) orthogo-

nal matrices Q ; Q ; : : :; Qs; dened by Qi = J (i; m; )J (i; m ; 1; )

1 2 1

Q = QT QT ; ; QTs; ;

1 2 1

we have

A = QR;

where

R

!

R= 1

0

and R1 is upper triangular.

In practice there is no need to form the n n matrices J (k; `; ) and J (k; `; )A explicitly. Note

that J (k; `; ) can be determined by knowing c and s only, and J (k; `; )A replaces the kth and `th

rows of A by their linear combinations. Specically, the kth row of J (k; `; )A is c times the kth

row of A plus s times the `th row. Similarly, the `th row of J (k; `; ) is ;s times the kth row of A

plus c times the `th row. If the orthogonal matrix Q is needed explicitly, then it can be computed

from the product:

Q = QT1 QT2 QTs;1;

where each Qi is the product of (m ; i) Givens rotations:

Qi = J (i; m; )J (i; m ; 1; ) J (i; i + 1; ):

Given an m n matrix A, the following algorithm computes an orthogonal matrix Q, using

Givens rotations such that A = QR. The algorithm overwrites A with R.

For k = 1; 2; : : :; minfn; m ; 1g do

For ` = k + 1; : : :; m do

187

1. Find c and s such that

c s

! a ! !

kk

= :

;s c a`k 0

2. Save the indices k and ` and the numbers c and s.

3. Form J (k; `; ):

4. Overwrite A with J (k; `; )A:

Flop-Count. The algorithm requires 2n ;m ; n
ops. This count, of course, does not

2

3

include computation of Q. Thus, the algorithm is almost twice as expensive as the House-

holder algorithm for QR factorization.

Round-o Property. The algorithm is quite stable. It can be shown that the computed Qb

and Rb satisfy

Rb = Qb T (A + E );

where

kE kF ckAkF ; c is a constant of order unity (Wilkinson AEP, p. 240).

We have seen that QR factorization of a matrix A always exists and that this factorization may be

obtained in dierent ways. One therefore wonders if this factorization is unique. In the following we

will see that if A is nonsingular then the factorization is essentially unique. Moreover, if

the diagonal entries of the upper triangular matrices of the factorizations obtained by two dierent

methods are positive, then these two QR factorizations of A are exactly the same.

To see this, let

A = Q1 R1 = Q2 R2: (5.5.1)

Since A is nonsingular, then R1 and R2 are also nonsingular. Note that

det A = det(Q1R1) = det Q1 det R1

and

det A = det(Q2R2) = det Q2 det R2:

Since Q1 and Q2 are orthogonal, their determinants are 1. Thus det R1 and det R2 are dierent

from zero. From (5.5.1), we have

QT2 Q1 = R2R;1 1 = V;

188

so that V T V = QT1 Q2 QT2 Q1 = I . Now V is upper triangular (since R;1 1 and R2 are both upper

triangular), so equating elements on both sides, we see that V must be a diagonal matrix with 1

as diagonal elements. Thus,

R R; = V = diag(d ; d ; : : :; dn); with di = 1; i = 1; 2; : : :; n.

2 1

1

1 2

This means that R2 and R1 are the same except for the signs of their rows. Similarly, from

QT Q = V

2 1

we see that Q1 and Q2 are the same except for the signs of their columns. (For a proof, see Stewart

IMC, p. 214.)

If the diagonal entries of R1 and R2 are positive, then V is the identity matrix so that

R =R 2 1

and

Q =Q :

2 1

The above result can be easily generalized to the case where A is m n and has linearly

independent columns.

columns. Then there exist a unique matrix Q with orthonormal columns and a

unique upper triangular matrix R with positive diagonal entries such that

A = QR:

Example 5.5.6

We nd the QR factorization of 00 1 11

B C

A=B

@ 1 2 3 CA

1 1 1

using Givens Rotations and verify the uniqueness of this factorization.

189

Step 1. Find c and s such that

c s a ! ! !

= 11

;s c a 0 21

a = 0; a = 1

11 21

c = 0; s=1

0 0 1 01

B ;1 0 0 CC

J (1; 2; ) = B

@ A

0 0 1

0 0 1 0100 1 11 01 2 3 1

B CB C B C

A J (1; 2; )A = B

@ ;1 0 0 CA B@ 1 2 3 CA = B@ 0 ;1 ;1 CA

0 0 1 1 1 1 1 1 1

Find c and s such that

c s a ! ! !

= 11

;s c a 0 31

a = 1; c = p ; s = p

11

1 1

0p 0 p 1

2 2

1 1

B C 2 2

J (1; 3; ) = B

@ 0 1 0 CA

0 ;1

p p12

0 p1 0 p1 1 0 1 2 1 0p p 1

2

3 2 p32 2 2

B 2

A J (1; 3; )A = @ 0 1 0 C

B

2

C BB 0 ;1 C = BB 0 ;1 ;1 CC

;1 C

A@ A @ p A

p;12 0 p12 1 1 1 0 p;12 ; 2

Step 2. Find c and s such that

c s! a ! !

=

22

;s c a 0 32

p

a = ;1;

22 a = ;p ; c = ;p ; s = ;p

32

1

2

2

3

1

3

01 0

10p p 0

p 1

2 3

2 2

B ;pp2 C BB C 2

J (2; 3; )A = B

@0 3

; pp C

A @ 0 ;1 ;p1 CA

1

3

0 p13 ;p 2

3

0 ;p ; 2 1

2

0 p2 p3

p 1 0

2 2 1:4142 2:1213 2:8284

1

B p23 p C B C

= B

@0 p2 p C2

A = B@ 0 1:2247 1:6330 CA = R

2

3

0 0 p13 0 0 0:5774

(using four digit computations).

190

Remark: Note that the upper triangular matrix obtained here is essentially the same as the

one given by the Householder method earlier (Example 5.4.2), diering from it only in the signs of

the rst and third rows.

The Givens matrices can also be employed to transform an arbitrary n n matrix A to an up-

per Hessenberg matrix Hu by orthogonal similarity: PAP T = Hu : However, to do this, Givens

rotations must be constructed in a certain special manner. For example, in the rst step,

Givens rotations J (2; 3; ); J (2; 4; ); : : :; J (2; n; ) are successively computed so that with P1 =

J (2; n; ) J (2; 4; )J (2; 3; ),

0 1

BB C CC

BB C

P AP T = A = B BB 0. C

.. C

(1)

1 1

B@ .. .. C

. .C A

0

In the second step, Givens rotations J (3; 4; ), J (3; 5; ) : : :; J (3; n; ) are successively computed so

that with P = J (3; n; )J (3; n ; 1; ) J (3; 4; ),

2

0 1

BB CC

BB CC

B 0 C

P A PT = A = B

(1) (2)

BB .. C C

2

0

BB . . .0 .C

.. C

2

B@ .. .. .. C

.C A

0 0

and so on. At the end of (n ; 2)th step, the matrix A n; is the upper Hessenberg matrix Hu. The

( 2)

P = Pn; Pn; P P ;

2 3 2 1

Algorithm 5.5.3 Givens Hessenberg Reduction

Given an n n matrix A, the following algorithm overwrites A with PAP T = Hu, where Hu is

an upper Hessenberg matrix.

191

For p = 1; 2; : : :; n ; 2 do

For q = p + 2; : : :; n do

1. Find c = cos() and s = sin() such that

c s ap ;p

! ! !

=+1

;s c aq;p 0

2. Save c and s and the indices p and q:

3. Overwrite A with J (p + 1; q; )AJ (p + 1; q; )T :

Forming the Matrix P and Other Computational Details

There is no need to form J (p + 1; q; ) and J (p + 1; q; )A explicitly, since they are completely

determined by p; q and c and s (see the section at the end of the QR factorization algorithm using

Givens rotations). If P is needed explicitly, it can be formed from

P = Pn; P P

2 2 1

10

3

3 5

3

3

quired by the Householder method. Thus, the Givens reduction to Hessenberg form is

about twice as expensive as the Householder reduction. If the matrix A is symmetric, then

the algorithm requires about 34 n3
ops to transform A to a symmetric tridiagonal matrix T ; again

this is twice as much as required by the Householder method to do the same job.

Round-o Property. The round-o property is essentially the same as the Householder

method. The method is numerically stable.

Example 5.5.7

00 1 11

B 1 2 3 CC

A=B

@ A

1 1 1

192

Step 1. Find c and s such that

c s! a ! !

=

21

;s c a 0 31

a = a = 1; c = p ; s = p

21 31

1

2

1

2

01 0 0

1

B0

J (2; 3; ) = B p12 p12

CC

@ A

0 ; p12 p12

0 0 2:1213 0:7171

1

B C

A J (2; 3; )AJ (2; 3; )T = B

@ 1:4142 3:5000 0:5000 CA

0 ;1:5000 ;0:5000

= Upper Hessenberg.

Observation: Note that the Upper Hessenberg matrix obtained here is essentially the same as

that obtained by Householder's method (Example 5.4.4) the subdiagonal entries dier only in sign.

The above example and the observation made therein brings up the question of uniqueness in

Hessenberg reduction. To this end, we state a simplied version of what is known as the Implicit

Q Theorem. For a complete statement and proof, see Golub and Van Loan (MC 1983, pp.

223-234).

such that P T AP = H and QT AQ = H are two unreduced upper Hessenberg

1 2

matrices. Suppose that P and Q have the same rst columns. Then H1 and H2 are

essentially the same in the sense that H2 = D;1 H1D, where

D = diag(1; : : :; 1):

193

Example 5.5.8

Consider the matrix 00 1 21

B C

A=B

@ 1 2 3 CA

1 1 1

once more. The Householder method (Example 5.4.4) gave

0 0 ;2:1213 0:7171 1

B C

H1 = P1AP1T = B @ ;1:4142 3:5000 ;0:5000 CA

0 1:5000 ;0:5000

The Givens method (Example 5.5.7) gave

0 0 2:1213 0:7171

1

B C

H2 = J (2; 3; )AJ (2; 3; )T = B

@ 1:4142 3:5000 0:5000 CA = H2:

0 ;1:5000 ;0:5000

In the notation of Theorem 5.5.3 we have

P =P ; 1 QT = J (2; 3; ):

Both P and Q have the same rst columns, namely, the rst column of the identity. We verify that

H = D; H D

2

1

1

and

D = diag(1; ;1; 1):

Let A be m n, where m n.

Consider the QR factorization of A:

R!

QT A = :

0

Assume that A has full rank n. Partition Q = (Q1; Q2), where Q1 has n columns. Then the

columns of Q1 form an orthonormal basis for R(A). Similarly, the columns of Q2 form

an orthonormal basis for the orthogonal complement of R(A). Thus, the matrix

PA = Q QT 1 1

is the orthogonal projection onto R(A) and the matrix PA? = Q2QT2 is the projection onto

the orthogonal complement of R(A). Since the orthogonal complement of R(A) is denoted by

R(A)? = N (AT ), we shall denote PA? by PN .

194

Example 5.6.1

0 1 1

1

B 0:0001

A=B 0 C

C

@ A

0 0:0001

Using the results of Example 5.4.3 we have

0 ;1 0:0001 ;0:0001

1

Q = B

B ;0:0001 ;0:7071 0:7071 CC

@ A

0 0:7071 0:7071

0 ;1 0:0001 1 0 ;0:0001 1

B C B C

Q1 = B @ 0:0001 ;0:7071 CA ; Q2 = B@ 0:7071 CA

0 0:7071 0:7071

0 1:000 0:0003 0:0007 1

B C

PA = Q1QT1 = B @ 0:0003 0:5000 ;0:5000 CA

0 0;:00007 ;0:5000 0:5000

:0001 1

PN = PA? = Q2QT2 = B

B 0:7071 CC ( ;0:0001 0:7071 0:7071)

@ A

0:7071

0 0:00000001 ;0:0001 ;0:0001 1

B C

= B @ ;0:0001 0:5000 0:5000 CA

;0:0001 0:5000 0:5000

Projection of a vector

Given an m-vector b, the vector bR , the projection of b onto R(A), is given by

bR = PA b:

Similarly b? , the projection of b onto the orthogonal complement of R(A), is given by

b? = PA? b = PN b;

where we denote PA? by PN .

Note that b = bR + b?. Again, since the orthogonal complement of R(A) = N (AT ), the null

space of AT , we denote b? by bN for notational convenience.

195

Note: Since bN = b ; bR, it is tempting to compute bN just by subtracting bR from b once

bR has been computed. This is not advisable, since in the computation of bN from bN = b ; bR,

cancellation can take place when bR b.

Example 5.6.2

01 21 011

B CC B CC

A = B

@0 1A; b=B

@1A;

1 0 1

0 ;0:7071 ;0:5774 ;0:4082 1

B C

Q = B@ 0 ;0:5774 0:8165 CA

;0:7071 0:5774 0:4082

0 ;0:7071 ;0:5744 1

B C

Q1 = B

@ 0 ;0:5774 C

A

;0:7071 0:5774

0 0:8334 0:3334 0:1666 1

PA = Q QT = B

B 0:3334 0:3334 ;0:3334 CC

1 1 @ A

0:1666 ;0:3334 0:8334

0 1:3340 1

BB C

bR = PA b = @ 0:3334 CA

0:6666

0 0:1667 ;0:3333 ;0:1667 1

BB C

PN = @ ;0:3333 0:6667 0:3333 CA

;0:1667 0:3333 0:1667

0 ;0:3334 1

BB C

bN = PN b = @ 0:6666 CA

;0:3334

Example 5.6.3

0 1 1

1 0 2 1

B 0:0001

A = B 0 C

C; B 0:0001 CC

b=B

@ A @ A

0 0:0001 0:0001

196

0 ;1 ;0:0001 ;0:0081 1

B C

Q = B

@ ;0:0001 ;0:7071 0:7071 CA

0:1 0:7071 0:7071

0 1 0:0001 0:0001

1

PA = Q1QT1 = B

B 0:0001 0:5000 ;0:5000 CC

@ A

0:0001 ;5:0000 0:5000

0 2 1

B 0:0001 CC

bR = PA b = B

@ A

0:0001

001

bN = B

B 0 CC :

@ A

0

Projection of a Matrix Onto the Range of Another Matrix

Let B = (b1; : : :; bn) be an m n matrix. We can then think of projecting each column of B

onto R(A) and onto its orthogonal complement. Thus, the matrix

BR = PA(b ; : : :; bn) = PA B

1

BN = PN B

is the projection of B onto the orthogonal complement of R(A).

Example 5.6.4

01 21 01 2 31

B CC B C

A=B

@2 3A; @ 2 3 4 CA

B=B

4 5 3 4 5

A = QR gives

0 ;0:2182 ;0:8165 ;0:5345 1

B C

Q=B

@ ;0:4364 ;0:4082 0:8018 CA

;0:8729 0:4082 ;0:2673

197

Orthonormal Basis for R(A):

0 ;0:2182 ;0:8165 1

QB

B ;0:4364 ;0:4082 CC

@ 1 A

;0:8729 0:4082

0 0:7143 0:4286 ;0:1429 1

B C

PA = Q QT = B

1@ 0:4286 0:3571 0:2143 CA

1

Orthogonal Projection of B onto R(A):

0 1:1429 2:1429 3:1429 1

B C

PA B = B

@ 1:7857 2:7857 3:7857 CA :

3:0714 4:0714 5:0714

5.7 QR Factorization with Column Pivoting

If an m n (m n) matrix A has rank r < n, then the matrix R is singular. In this case the QR

factorization cannot be employed to produce an orthonormal basis of R(A).

To see this, just consider the following simple 2 2 example from Bjorck (1992, p. 31):

0 0 c ;s

! ! 0 s!

A= = = QR:

0 1 s c 0 c

If c and s are chosen such that c2 + s2 = 1, rank(A) = 1 < 2, and the columns of Q do not form an

orthonormal basis of R(A) nor for its complement.

Fortunately, however, the process of QR factorization (for example, the Householder method)

can be modied to yield an orthonormal basis. The idea here is to generate a permutation matrix

P such that

AP = QR;

where

R R

!

R= 11 12

:

0 0

Here R11 is r r upper triangular and r is the rank of A, and Q is orthogonal. The rst r columns

of Q will then form an orthonormal basis of R(A). The following theorem guarantees the existence

of such a factorization:

198

Theorem 5.7.1 (QR Column Pivoting Theorem) Let A be an m n matrix

with rank(A) = r min(m; n): Then there exist an n n permutation matrix P

and an m m orthogonal matrix A such that

R R

!

QT AP = 11 12

;

0 0

where R11 is an r r upper triangular matrix with positive diagonal entries.

AP = (A ; A ); 1 2

where A1 is = m r and has linearly independent columns. Consider now the QR factorization

of A1: !

T R11

Q A1 = ;

0

where by the uniqueness theorem (Theorem 5.5.2), Q and R11 are uniquely determined and R11

has positive diagonal entries. Then

R R

!

QT AP = (QT A ; QT A )= 11 12

:

1

0

2

22 R

Since rank(Q AP ) = rank(A) = r, and rank(R11) = r; we must have R22 = 0:

T

There are several ways one can think of creating this permutation matrix P . We present here

one such way, which is now known as QR factorization with column pivoting.

The permutation matrix P is formed as the product of r permutation matrices P1 through Pr .

These permutation matrices are applied to A one by one, before forming Householder matrices to

create zeros in appropriate columns. More specically, the following is done:

Step 1. Find the column of A having the maximum norm. Permute now the columns of A

so that the column of maximum norm becomes the rst column.

This is equivalent to forming a permutation matrix P1 such that the matrix AP1 has the rst

column having the maximum norm. Form now a Householder matrix H1 so that

A = H AP

1 1 1

199

Step 2. Find the column with the maximum norm of the submatrix A^ obtained from A 1 1

by deleting the rst row and the rst column. Permute the columns of this submatrix so that

the column of maximum norm becomes the rst column. This is equivalent to constructing a

permutation matrix from P^2 such that the second column of A^1 P^2 has the maximum norm. Form

P2 from P^2 in the usual way, that is:

01 0 01

BB 0 CC

P2 = BB CC :

B@ ... P^2 C A

0

Now construct a Householder matrix H2 so that

A = H A P = H H AP P

2 2 1 2 2 1 1 2

has zeros in the second column of A2 below the (2,2) entry. As before, H2 can be constructed in

two steps as in Section 5.3.1:

The kth step can now easily be written down.

The process is continued until the entries below the diagonal of the current matrix all become

zero.

Suppose r steps are needed. Then at the end of the rth step, we have

A A r = Hr H AP Pr

( )

1 1

!

R R

= QT AP = R = 11 12

:

0 0

Flop-Count and Storage Consideration. The above method requires 2mnr ; r (m + n)+ 2

r

2 3

3

ops. The matrix Q, as in the Householder factorization, is stored in factored form. The

matrix Q can be stored in factored form in the subdiagonal part of A and A can overwrite R.

Example 5.7.1

00 01

B C

@ 1 CA

A=B 1

2

= (a1; a2):

1

2

1 32

200

Step 1. a has the largest norm. Then

2

0 1

!

P =

1

1 0

00 01

B 1 CC

AP = B

1 @ A 1

2

1 1

2

00 ;1

p ;p1 1

B 2

;1 C

2

H = B

1 @ p; 1

2

1

2

C

2 A

;1

p ;1 1

1 0 ;p2 p; 1

2 2

00 ;p1 1 0 0

2

;1

p 0 1

B p;

A = H AP = B

2

;1 C

2

C BB 1 CC BB 0 0 CC = R

2

(1)

@

1 1

1

2

1

2 2 A@ 1

2 A=@ A

;1

p ;1 1

1 1

0 0

2 2 2

!

2

R R

R = 1 2

:

0 0

Thus, for this example

Q = HT = H ; 1

0 1

! 1

P = :

1

1 0

0 0 1

p B ;q CC forms an

The matrix A has rank 1, since R = 2 is 1 1. The column vector B

1 @ 1

2 A

;p1

orthonormal basis of R(A).

2

It is easy to see that the submatrix (R11; R12) can further be reduced by using orthogonal

transformations, yielding !

T 0

:

0 0

201

Theorem 5.7.2 (Complete Orthogonalization Theorem) Given Amn with

rank(A) = r, there exist orthogonal matrices Qmm and Wnn such that

T 0

!

QT AW = ;

0 0

where T is an r r upper triangular matrix with positive diagonal entries.

The above decomposition of A is called the complete orthogonal decomposition.

Rank-Revealing QR

The above process, known as the QR factorization with column pivoting, was developed

by Golub (1965). The factorization is known as the rank-revealing QR factorization, since in

exact arithmetic it reveals the rank of the matrix A which is the order of the nonsingular upper

triangular matrix R11. However, in the presence of rounding errors, we will actually have

R R

!

R= 11 12

;

0 R 22

and if R22 is \small" in some measure (say, kR22k is of O(), where is the machine precision),

then the reduction will be terminated. Thus, from the above discussion, we note that, given an

m n matrix A (m n), if there exists a permutation matrix P such that

R R

!

QT AP = R = 11 12

;

0 22 R

where R11 is r r, and R22 is small in some measure, then we will say that A has numerical rank r.

(For more on numerical rank, see Chapter 10, Section 10.5.5.)

Unfortunately, the converse is not true.

A celebrated counterexample due to Kahan (1966) shows that a matrix can be nearly rank-

decient without having kR22k small at all.

Gene H. Golub, an American mathematician and computer scientist, is well known for his outstanding contribu-

tion in numerical linear algebra, especially in the area of the singular value decomposition (SVD), least squares, and

their applications in statistical computations. Golub is a professor of computer science at Stanford University and

is the co-author of the celebrated numerical linear algebra book \Matrix Computations". Golub is a member of the

National Academy of Sciences and a past president of SIAM (Society for Industrial and Applied Mathematics).

202

Consider 01

;c ;c ;c 1

BB 01 ;c ;c C

C

BB .

... ... .. C

C=R

A = diag(1; s; : : :; s ) B

n;

BB ... 1

. C

C

B@ ... . . . . . ;c CCA

0 0 1

with c + s = 1; c; s > 0. For n = 100; c = 0:2, it can be shown that A is nearly singular (the

2 2

smallest singular value is O(10; )). On the other hand, rnn = sn; = :133, which is not small, so

8 1

The question whether at any stage R becomes really small for any matrix has been investigated

22

Suppose the QR factorization of an m k matrix A = (a1; : : :; ak ); (m k) has been obtained. A

vector ak+1 is now appended to obtain a new matrix:

A0 = (a ; : : :; ak; ak ):

1 +1

It is natural to wonder how the QR factorization of A0 can be obtained from the given QR factor-

ization of A, without nding it from scratch.

The problem is called the updating QR factorization problem. The downdating QR fac-

torization is similarly dened. The updating and downdating QR factorization arise in a variety of

practical applications, such as signal and image processing.

We present below a simple algorithm using Householder matrices to solve the updating problem.

Algorithm 5.8.1 Updating QR Factorization Using Householder Matrices

The following algorithm computes the QR factorization of A0 = (a1; : : :; ak ; ak+1) given the

Householder QR factorization of A = (a1; : : :; ak).

Step 1. Compute bk = Hk H ak , where H through Hk are Householder matrices such that

+1 1 +1 1

R

!

QT A = Hk H A = : 1

0

Step 2. Compute a Householder matrix Hk so that Hk bk = rk is zero in entries k +

+1 +1 +1 +1

2; : : :; m.

" R! #

Step 3. Form R0 = ; rk .

0 +1

203

Step 4. Form Q0 = Hk H .

+1 1

Theorem 5.8.1 R0 and Q0 dened above are such that (Q0)T A0 = R0.

Example 5.8.1

011

B CC

A = B

@2A

3

0 ;0:2673 ;0:5345 ;0:8018 1

B C

H H = QT = B

2 1 @ ;0:5345 0:7745 ;0:3382 CA

1

0 ;3:7417 1

B 0 CC

R = B

@ A

0

01 11

B CC

A0 = B

@2 4A

3 5

Step 1.

0 ;0:2673 ;0:5345 ;0:8018 1

B C

H = B

1 @ ;0:5345 0:7745 ;0:3382 CA

;0:8018 ;0:3382 0:4927

0 ;3:7417 1

B CC

R = B

@ 0 A

0

0 ;6:4143 1

B C

b =H a = B

2 @ 0:8227 CA

1 2

0:3091

Step 2. 01 1 0 ;6:4143 1

0 0

B

H =B

C B C

@ 0 ;0:9426 ;0:3339 CA ;

2 r =B

2 @ ;0:9258 CA

0 ;0:3339 0:9426 0

204

Step 3.

0 ;3:7417 ;6:4143 1

B C

R0 = (R; r ) = B

@ 0

2 ;0:9258 C

A

0 0

0 ;0:2673 ;0:5345 ;0:8018 1

Q0 = H H =B

B 0:7715 ;0:6172 0:1543 CC

2 @1 A

;0:5773 ;0:5774 0:5774

01 1

1

Verication: (Q0)T R0 = BB@ 2 4C

C = A0.

A

3 5

5.9 Summary and Table of Comparisons

For easy reference we now review the most important aspects of this chapter.

Elementary Lower Triangular matrix: An n n matrix E of the form E = I + meTk ,

where m = (0; 0; : : :; 0; mk+1;k; : : :; mn;k)T ; is called an elementary lower triangular matrix.

If E is as given above, then E ;1 = I ; meTk .

Householder matrix: An n n matrix H = I ; 2uuu

T

T u , where u is an n-vector is called a

Householder matrix.

A Householder matrix is symmetric and orthogonal.

Givens matrix: A Givens matrix J (i; j; c; s) is an identity matrix except for (i; i); (i; j); (j; i)

and (j; j ) entries, which are, respectively, c; s; ;s; and c, where c + s = 1.

2 2

LU factorization: A factorization of A in the form A = LU , where L is unit lower triangular

and U is upper triangular, is called an LU factorization of A. An LU factorization of matrix

A does not always exist. If the leading principal minors of A are all dierent from zero, then

the LU factorization of A exists and is unique (Theorem 5.2.1).

The LU factorization of a matrix A, when it exists, is achieved using elementary lower trian-

gular matrices. The process is called Gaussian elimination without row interchanges.

205

The process is ecient, requiring only n33 ops, but is unstable for arbitrary matrices. Its use

is not recommended in practice unless A is symmetric positive denite or column diagonally

dominant. For decomposition of A into LU in a stable way, row interchanges (Gaussian

elimination with partial pivoting) or both row and column interchanges (Gaussian elimination

with complete pivoting) to identify an appropriate pivot will be needed. Gaussian elimination

with partial and complete pivoting yield factorizations MA = U and MAQ = U , respectively.

QR Factorization. Every matrix A can always be written in the form A = QR, where Q is

orthogonal and R is upper triangular. This is called QR factorization of A.

The QR factorization of a matrix A is unique if A has linearly independent columns (Theo-

rem 5.4.2).

The QR factorization can be achieved using either Householder or Givens matrices. Both

methods have guaranteed numerical stability. The Householder method is more ecient

than the Givens method ( 2n3 3 ops versus 4n3 3 ops (approximately)), but the Givens matri-

ces are emerging as useful tools in parallel matrix computations and for computations with

structured matrices. The Gram-Schmidt and modied Gram-Schmidt processes for QR

factorization are described in Chapter 7.

3. Hessenberg Reduction.

The Hessenberg form of a matrix is a very useful condensed form. We will see its use throughout

the whole book.

An arbitrary matrix A can always be transformed to an upper Hessenberg matrix by orthogonal

similarity: Given an n n matrix A, there always exists an orthogonal matrix P such that PAP T =

Hu, an upper Hessenberg matrix (Theorem 5.4.2).

This reduction can be achieved using elementary, Householder or Givens matrices. We have

described here methods based on Householder and Givens matrices (Algorithms 5.4.3 and 5.5.3).

Both the methods have guaranteed stability, but again, the Householder method is more ecient

than the Givens method.

For the aspect of uniqueness in Hessenberg reduction, see the statement of the Implicit Q

Theorem (Theorem 5.5.3). This theorem basically says that if a matrix A is transformed by

orthogonal similarity to two dierent unreduced upper Hessenberg matrices H1 and H2 using two

transforming matrices P and Q, then H1 and H2 are essentially the same, provided that P and Q

have the same rst columns.

206

4. Orthogonal Bases and Orthogonal Projections.

R

!

If Q A =

T is the QR factorization of an m n matrix (m n), then the columns of Q form

0 1

an orthonormal basis for R(A) and the columns of Q2 form an orthonormal basis for the orthogonal

complement of R(A), where Q = (Q1; Q2) and Q1 has n columns.

If we let B = (b1; : : :; bn) be an m n matrix, then the matrix BR = PA (b1; : : :; bn) = PA B ,

where PA = Q1 QT1 , is the orthogonal projection of B onto R(A).

Similarly, the matrix BN = PN B is the projection of B onto the orthogonal complement of

R(A), where PN = PA? = Q2QT2 .

Factorization.

If A is rank-decient, then the ordinary QR factorization cannot produce an orthonormal basis of

R(A).

In such a case, the QR factorization process needs to be modied. A modication, called the

QR factorization with column pivoting, has been discussed in this chapter. Such a factorization

always exists (Theorem 5.7.1).

A process to achieve such a factorization using Householder matrices due to Golub has been

described brie y in Section 5.6.

The QR factorization with column pivoting, in exact arithmetic, reveals the rank of A. That

is why it is called rank-revealing QR factorization. However, in the presence of rounding errors,

such a rank-determination procedure is complicated and not reliable. Finally, we have presented a

simple algorithm to modify the QR factorization of a matrix (updating QR).

6. Table of Comparisons.

We now summarize in the following table eciency and stability properties of some of these major

computations. We assume that A is m n (m n).

207

TABLE 5.1

TABLE OF COMPARISONS

PROBLEM METHOD FLOP-COUNT STABILITY

(APPROXIMATE)

2

;n 3

6

Unstable

without row interchange

2

;n Stable in

3

6

2

;n 3

6

Stable

with complete pivoting (+O(n3) comparisons)

QR Factorization Householder n (m ; n )

2

3

Stable

5 3

Stable

3

an n n matrix

10 3

Stable

3

an n n matrix

Column Pivoting ;2r2(m + n) + 23r3 ,

r = rank(A)

Concluding Remarks: Gaussian elimination without pivoting is unstable; Gaussian

elimination with partial pivoting is stable in practice; Gaussian elimination with com-

plete pivoting is stable. >From the above table, we see that the process of QR factorization

and that of reduction to a Hessenberg matrix using Householder transformations are most ecient.

The methods are numerically stable. However, as remarked earlier, Givens transformations are

useful in matrix computations with structured matrices, and they are emerging as important tools

for parallel matrix algorithms. Also, it is worth noting that Gaussian elimination can be

208

used to transform an arbitrary matrix to an upper Hessenberg matrix by similarity.

For details, see Wilkinson AEP, pp. 353-355.

5.10 Suggestions for Further Reading

The topics covered in this chapter are standard and can be found in any numerical linear algebra

text. The books by Golub and Van Loan (MC) and that by G. W. Stewart (IMC) are rich sources

of further knowledge in this area.

The book MC in particular contains a thorough discussion on QR factorization with column

pivoting using Householder transformations. (Golub and Van Loan MC, 1984, pp. 162-167.)

The book SLP by Lawson and Hanson contains in-depth discussion of triangularization using

Householder and Givens transformations, and QR factorization with column pivoting (Chapters 10

and 15).

The details of error analysis of the Householder and the Givens methods for QR factorization

and reduction to Hessenberg forms are contained in AEP by Wilkinson.

For error analyses of QR factorization using Givens transformations and variants of Givens

transformations, see Gentleman (1975).

A nice discussion on orthogonal projection is given in the book Numerical Linear Algebra

and Optimization by Philip E. Gill, Walter Murray, and Margaret H. Wright, Addison Wesley,

1991.

209

Exercises on Chapter 5

(Use MATLAB, whenever appropriate and necessary)

PROBLEMS ON SECTIONS 5.2 and 5.3

1. (a) Show that an elementary lower triangular matrix has the form

E = I + meTk ;

where m = (0; 0; : : :; 0; mk+1;k; : : :; mn;k)T .

(b) Show that the inverse of E in (a) is given by

E ; = I ; meTk :

1

2. (a) Given !

0:00001

a= ;

1

using 3-digit arithmetic, nd an elementary matrix E such that Ea is a multiple of e1 .

(b) Using your computations in (a), nd the LU factorization of

0:00001 1

!

A=

1 2

(c) Let L^ and U^ be the computed L and U in part (b). Find

kA ; L^ U^ kF ;

kAkF

where k kF is the Frobenius norm.

(n;1)

3. Show that the pivots a11; a11

22; : : :; ann are nonzero i the leading principal minors of A are

nonzero.

Hint: Show that

(r ;1)

det Ar = a11 a(1)

22 : : :; arr :

4. Let A be a symmetric positive denite matrix. At the end of the rst step of the LU factor-

ization of A, we have 0 1

a a a n

BB 0

11 12 1

CC

BB CC

A =B

(1)

BB 0 CC

B@ .. CC

. A0 A

0

210

Prove that A0 is also symmetric and positive denite. Hence show that LU factorization of a

symmetric positive denite matrix using Gaussian elimination without pivoting always exists

and is unique.

5. (a) Repeat the exercise #4 when A is a diagonally dominant matrix; that is, show that LU

factorization of a diagonally dominant matrix always exists and is unique.

(b) Using (a), show that a diagonally dominant matrix is nonsingular.

6. Assuming that LU factorization of A exists, prove that

(a) A can be written in the form

A = LDU ; 1

where D is diagonal and L and U1 are unit lower and upper triangular matrices, respec-

tively.

(b) If A is symmetric, then

A = LDLT :

(c) If A is symmetric and positive denite, then

A = HH T ;

where H is a lower triangular matrix with positive diagonal entries. (This is known

as the Cholesky decomposition.)

7. Assuming that LU factorization of A exists, develop an algorithm to compute U by rows and

L by columns directly from the equation:

A = LU:

This is known as Doolittle reduction.

8. Develop an algorithm to compute the factorization

A = LU;

where U is unit upper triangular and L is lower triangular. This is known as Crout reduc-

tion.

Hint: Derive the algorithm from the equation A = LU .

211

9. Compare the Doolittle and Crout reductions with Gaussian elimination with respect to
op-

count, storage requirements and possibility of accumulating inner products in double preci-

sion.

10. A matrix G of the form

G = I ; geTk ;

is called a Gauss-Jordan matrix. Show that, given a vector x with the property that

eTk x 6= 0; there exists a Gauss-Jordan matrix G such that

Gx is a multiple of ek.

Develop an algorithm to construct Gauss-Jordan matrices G1; G2; : : :; Gn successively such

that

Derive conditions under which Gauss-Jordan reduction can be carried to completion.

Give op-count for the algorithm and compare it with those of Gaussian elimination, Crout

reduction and Doolittle reductions.

11. Given 01 2 31

B 2 5 4 CC ;

A=B

@ A

3 4 5

nd

(a) LU factorization of A using Gaussian elimination and Doolittle reduction.

(b) LU factorization of A using Crout reduction (note that U is unit upper triangular and

L is lower triangular).

12. Apply the Gauss-Jordan reduction to A of the problem #11.

13. (a) Let A be m n and let r = minfm ; 1; ng. Develop an algorithm to construct elementary

matrices E1; : : :; Er such that

Er Er; E E A

1 2 1

212

(b) Show that the algorithm requires about r3
ops.

3

01 21

B CC

A=B

@4 5A

6 7

14. Given a tridiagonal matrix A with nonzero o-diagonal entries, write down a set of simple

conditions on the entries of A that guarantees that Gaussian elimination can be carried to

completion.

15. Assuming that LU decomposition exists, from

A = LU

show that, when A is tridiagonal, L and U are both bidiagonals. Develop a scheme for

computing L and and U in this case and apply your scheme to nd LU factorization of

04 1 0 01

BB 1 4 1 0 CC

A=B BB CC

@ 0 1 4 1 CA

0 0 1 4

16. Prove that the matrix L in each of the factorizations PA = LU and PAQ = LU , obtained

by using Gaussian elimination with partial and complete pivoting, respectively, is unit lower

triangular.

00 1 0 01

BB 0 0 1 0 C C

17. Given A = B BB C

C

C ;

@ 0 0 0 1 A

2 3 4 5

Find a permutation matrix P , a unit lower triangular matrix L, and an upper triangular

matrix U such that PA = LU .

18. For each of the following matrices nd

(a) permutation matrices P1 and P2 and elementary matrices M1 and M2 such that

MA = M2P2M1P1A is an upper triangular matrix.

(b) permutation matrices P1 ; P2 ; Q1 ; Q2 and elementary matrices M1 and M2 such that

MAQ = M1 (P2(M1P1AQ1))Q2) is an upper triangular matrix.

213

01 1 1

1

B 2 3

C

i. A = B

@ 1

2

1

3

1

4

C

A;

1 1 1

0 100 99 98 1

3 4 5

B 98 55 11 CC ;

ii. A = B

@ A

0 10 01 111

B ;1 1 1 CC ;

iii. A = B

@ A

; 1 ; 1 1

0 0:00003 1:566 1:234 1

B C

iv. A = B@ 1:5660 2:000 1:018 CA ;

1 ;3:000

0 11:2340;1 10:018

B C

v. A = B@ ;1 2 0 C A:

0 ;1 2

(c) Express each factorization in the form PAQ = LU (note that for Gaussian elimination

without and with partial pivoting, Q = L).

(d) Compute the growth factor in each case.

19. Let x be an n-vector. Give an algorithm to compute a Householder matrix H = I ; 2 uuuT u

T

How many ops will be required to implement this algorithm?

Given x = (1; 2; 3)T , apply your algorithm to construct H such that Hx has a zero in the 3rd

position.

Develop an algorithm to triangularize H using

(a) Gaussian elimination,

(b) Householder transformations,

(c) Givens rotations.

Compute the op-count in each case and compare.

214

21. Let 0 10 1 1 1 1

BB 2 10 1 1 CC

H=BBB CC :

@ 0 1 10 1 CA

0 0 1 10

Triangularize H using

(a) Gaussian elimination,

(b) the Householder method,

(c) the Givens method.

22. Let H^ k = I ; 2uu

T

u uT ; where u is a (n ; k + 1) vector. Dene

Ik; 0

!

Hk = 1

:

0 H^ k

How many
ops will be required to compute Hk A, where A is arbitrary and n n? Your

count should take into account the special structure of the matrix H^ k .

Using this result, show that the Householder method requires about 2 n3
ops to obtain R

3

2

and another 3 n
ops to obtain Q in the QR factorization of A.

3

matrix A (m n) using Householder's method.

Given 01 21

A=B

B 3 4 CC ;

@ A

5 6

(a) nd Householder matrices H1 and H2 such that

R

!

H H A= ;

2 1

0

where R is 2 2 upper triangular.

(b) nd orthonormal bases for R(A) and for the orthogonal complement of R(A).

(c) Find orthogonal projections of A onto R(A) and onto its orthogonal complement; that

is, nd PA and PA? .

215

24. Given 011

B 2 CC

b=B

@ A

3

and A as in Problem #21, nd bR and bN .

25. Given 01 31

B CC

B=B

@2 4A

3 5

and A as in Problem #21, nd BR and BN .

26. Let H be an n n upper Hessenberg matrix and let

H = QR;

where Q is orthogonal and R is upper triangular obtained by Givens rotations.

Prove that Q is also upper Hessenberg.

27. Develop an algorithm to compute AH where A is m n arbitrary and H is a Householder

matrix. How many
ops will the algorithm require? Your algorithm should exploit the

structure of H .

28. Develop algorithms to compute AJ and JA, where A is m n and J is a Givens rotation.

(Your algorithms should exploit the structure of the matrix J ).

How many
ops are required in each case?

29. Show that the
op-count to compute R in the QR factorization of an m n matrix A (m n)

using Givens' rotations is about 2n2(m ; n3 ).

30. Give an algorithm to compute

Q = H H Hn

1 2

A; m n. Show that the algorithm can be implemented with 2(m2n ; mn2 + n3=3) ops.

31. Let A be m n. Show that the orthogonal matrix

Q = QT QT QTs; ;

1 2 1

where each Qi is the product of (m ; i) Givens rotations, can be computed with 2n2 ( m ; n )

3

ops.

216

32. Let 01 2 31

B 4 5 6 CC :

A=B

@ A

7 8 9

Find QR factorization of A using

(a) the Householder method

(b) the Givens method

Compare the results.

33. Apply both the Householder and the Givens methods of reduction to the matrix

00 1 0 0 01

BB 0 0 1 0 0 CC

BB CC

A = B0 0 0 1 0C

B CC

BB

@ 0 0 0 0 1 CA

1 2 3 4 5

to reduce it to a Hessenberg matrix by similarity. Compare the results.

34.

(a) Show that it requires 5 n3
ops to compute the upper Hessenberg matrix Hu using the

3 nX;2 nX;2

Householder method of reduction. (Hint: 2(n ; k)2 + 2n(n ; k) 2 n3 + n3 = 5n .)

3

k=1 k=1 3 3

(b) Show that if the transforming matrix P is required explicitly, another 2 n3
ops will be

3

needed.

(c) Work out the corresponding
op-count for reduction to Hessenberg form using Givens

rotations.

(d) If A is symmetric, then show that the corresponding count in (a) is 2n .

3

3

35. Given an unreduced upper Hessenberg matrix H , show that the matrix X dened by X =

(e1; He1; : : :; H n;1e1 ) is nonsingular and is such that X ;1HX is a companion matrix in upper

Hessenberg form.

(a) What are the possible numerical diculties with the above computations?

217

(b) Transform 0 1 1

2 3 4

BB 2 10; 4 4 4C

C

H=BB 5

CC

B@ 0 1 10 1 2 C

;

A3

0 0 1 1

to a companion form.

36.

(a) Given the pair (A; b) where A is n n and b is a column vector, develop an algorithm

to compute an orthogonal matrix P such that PAP T = upper Hessenberg Hu and Pb

is a multiple of the vector e1 .

(b) Show that Hu is unreduced and b is a nonzero multiple of e1 i rank (b; Ab; : : :; An;1b) = n:

(c) Apply your algorithm in (a) to

01 1 1 11 011

BB C BB CC

B 1 2 3 4C C BB 2 CC :

A=B C

B@ 2 1 1 1 CA ; b = B@ 3 CA

1 1 1 1 4

37. Given 01 2 31

BB 2 3 4 CC

A=BBB CC ;

CA

@ 4 5 6

7 8 9

nd a permutation matrix P , an orthogonal matrix Q, and an upper triangular matrix R

using the Householder method such that

AP = QR:

The permutation matrix P is to be chosen according to the criteria given in the book.

38. Give a proof of the complete orthogonalization theorem (Theorem 5.7.2) starting from the

QR factorization theorem (Theorem 5.4.1).

39. Work out an algorithm to modify the QR factorization of a matrix A from which a column

has been removed.

218

MATLAB AND MATCOM PROGRAMS AND

PROBLEMS ON CHAPTER 5

You will need the programs housmul, compiv, givqr and givhs from MATCOM.

1.

(a) Write a MATLAB function called elm(v ) that creates an elementary lower triangular

matrix E so that Ev is a multiple of e1 , where v is an n-vector.

(b) Write a MATLAB function elmul(A; E ) that computes the product EA, where E is an

elementary lower triangular matrix and A is an arbitrary matrix.

Your program should be able to take advantage of the structure of the matrix

E.

(c) Using elm and elmul, write a MATLAB program elmlu that nds the LU factorization

of a matrix, when it exists:

[L; U ] = elmlu(A):

(This program should implement the algorithm 5.2.1 of the book).

0 1 0 1

0:00001 1A 1 1A

A = @ ; A=@ ;

1 1 0:00001 1

0 1

BB 10 1 1 C

A = B@ 1 10 1 CCA ; A = diag(1; 2; 3):

1 1 10

A = 5 5 Hilbert matrix

Now compute (i) the product LU and (ii) jjA ; LU jjF in each case and print your results.

219

2. Modify your program elmlu to incorporate partial pivoting:

[M; U ] = elmlupp(A):

Test your program with each of the matrices of problem #1.

Compare your results with those obtained by MATLAB built-in function:

[L; U ] = lu(A):

3. Write a MATLAB program, called parpiv to compute M and U such that MA = U; using

partial pivoting:

[M; U ] = parpiv(A):

(This program should implement algoirthm 5.2.3 of the book).

Print M; U; jjMA ; U jjF and jjMA ; U jj2 for each of the matrices A of problem #1.

4. Using the program compiv from MATCOM, print M; Q; and U; and

jjMAQ ; U jj ; and jjMAQ ; U jjF

2

5.

(a) Write a MATLAB program called housmat that creates a Householder matrix H such

that Ha is a multiple of e1 , where a is an n;vector.

(b) Using housmat and housmul (from the Appendix of the book), write a MATLAB

program housqr that nds the QR factorization of A:

[Q; R] = housqr(A):

(This program should implement the algorithm 5.4.2 of the book).

(c) Using your program housqr, nd Q and R such that

A = QR

for each of the matrices in problem #1.

Now compute

220

i. jjI ; QT QjjF ,

ii. jjA ; QRjjF ,

and compare the results with those obtained using the MATLAB built-in function

[Q; R] = qr(A):

(d) Repeat (c) with the program givqr from MATCOM in place of housqr, that computes

QR factorization using Givens rotations.

6. Run the program givqr(A) from MATCOM with each of the matrices in problem #1. Then

using [Q; R] = qr(A) from MATLAB on those matrices again, verify the uniqueness of QR

factorization for each A.

7. Using givhs(A) from MATCOM and the MATLAB function hess(A) on each of the matrices

from problem #1, verify the implicit QR theorems: Theorem 5.5.3 (Uniqueness of

Hessenberg reduction).

8. Using the results of problems #5, nd an orthonormal basis for R(A), an orthonormal basis for

the orthogonal complement of R(A), the orthogonal projection onto R(A), and the projection

onto the orthogonal complement of R(A) for each of the matrices of problem #1.

9. Incorporate \maximum norm column pivoting" in housqr to write a MATLAB program

housqrp that computes the QR factorization with column pivoting of a matrix A. Test your

program with each of the matrices of problem #1:

[Q; R; P ] = housqrp(A):

Compare your results with those obtained by using the MATLAB function

[Q; R; P ] = qr(A):

Note : Some of the programs you have been asked to write such as parpiv, housmat, housqr,

etc. are in MATCOM or in the Appendix. But it is a good idea to write your own programs.

221

6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS

6.1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 223

6.2 Basic Results on Existence and Uniqueness : : : : : : : : : : : : : : : : : : : : : : : 224

6.3 Some Applications Giving Rise to Linear Systems Problems : : : : : : : : : : : : : : 226

6.3.1 An Electric Circuit Problem : : : : : : : : : : : : : : : : : : : : : : : : : : : : 226

6.3.2 Analysis of a Processing Plant Consisting of Interconnected Reactors : : : : : 228

6.3.3 Linear Systems Arising from Ordinary Dierential Equations (Finite Dier-

ence Scheme) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 231

6.3.4 Linear Systems Arising from Partial Dierential Equations: A Case Study

on Temperature Distribution : : : : : : : : : : : : : : : : : : : : : : : : : : : 233

6.3.5 Special Linear Systems Arising in Applications : : : : : : : : : : : : : : : : : 238

6.3.6 Linear System Arising From Finite Element Methods : : : : : : : : : : : : : 243

6.3.7 Approximation of a Function by a Polynomial: Hilbert System : : : : : : : : 247

6.4 Direct Methods : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 248

6.4.1 Solution of a Lower Triangular System : : : : : : : : : : : : : : : : : : : : : : 249

6.4.2 Solution of the System Ax = b Using Gaussian Elimination without Pivoting 249

6.4.3 Solution of Ax = b Using Pivoting Triangularization : : : : : : : : : : : : : : 250

6.4.4 Solution of Ax = b without Explicit Factorization : : : : : : : : : : : : : : : : 256

6.4.5 Solution of Ax = b Using QR Factorization : : : : : : : : : : : : : : : : : : : 258

6.4.6 Solving Linear System with Right Multiple Hand Sides : : : : : : : : : : : : 260

6.4.7 Special Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 262

6.4.8 Scaling : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 284

6.4.9 LU Versus QR and Table of Comparisons : : : : : : : : : : : : : : : : : : : : 286

6.5 Inverses, Determinant and Leading Principal Minors : : : : : : : : : : : : : : : : : : 288

6.5.1 Avoiding Explicit Computation of the Inverses : : : : : : : : : : : : : : : : : 288

6.5.2 The Sherman-Morrison and Woodbury Formulas : : : : : : : : : : : : : : : : 290

6.5.3 Computing the Inverse of a Matrix : : : : : : : : : : : : : : : : : : : : : : : : 292

6.5.4 Computing the Determinant of a Matrix : : : : : : : : : : : : : : : : : : : : : 295

6.5.5 Computing The Leading Principal Minors of a Matrix : : : : : : : : : : : : : 297

6.6 Perturbation Analysis of the Linear System Problem : : : : : : : : : : : : : : : : : : 299

6.6.1 Eect of Perturbation in the Right-Hand Side Vector b : : : : : : : : : : : : 300

6.6.2 Eect of Perturbation in the matrix A : : : : : : : : : : : : : : : : : : : : : 304

6.6.3 Eect of Perturbations in both the matrix A and the vector b : : : : : : : : : 306

6.7 The Condition Number and Accuracy of Solution : : : : : : : : : : : : : : : : : : : : 308

6.7.1 Some Well-known Ill-conditioned Matrices : : : : : : : : : : : : : : : : : : : : 309

6.7.2 Eect of The Condition Number on Accuracy of the Computed Solution : : : 310

6.7.3 How Large Must the Condition Number be for Ill-Conditioning? : : : : : : : 311

6.7.4 The Condition Number and Nearness to Singularity : : : : : : : : : : : : : : 312

6.7.5 Conditioning and Pivoting : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 313

6.7.6 Conditioning and the Eigenvalue Problem : : : : : : : : : : : : : : : : : : : : 313

6.7.7 Conditioning and Scaling : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 314

6.7.8 Computing and Estimating the Condition Number : : : : : : : : : : : : : : : 315

6.8 Component-wise Perturbations and the Errors : : : : : : : : : : : : : : : : : : : : : : 320

6.9 Iterative Renement : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 321

6.10 Iterative Methods : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 326

6.10.1 The Jacobi Method : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 328

6.7.2 The Gauss-Seidel Method : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 332

6.10.3 Convergence of Iterative Methods : : : : : : : : : : : : : : : : : : : : : : : : : 334

6.10.4 The Successive Overrelaxation (SOR) Method : : : : : : : : : : : : : : : : : 342

6.10.5 The Conjugate Gradient Method : : : : : : : : : : : : : : : : : : : : : : : : : 349

6.10.6 The Arnoldi Process and GMRES : : : : : : : : : : : : : : : : : : : : : : : : 356

6.11 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 359

6.12 Some Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : 366

CHAPTER 6

NUMERICAL SOLUTIONS

OF LINEAR SYSTEMS

6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS

Objectives

The major objectives of this chapter are to study numerical methods for solving linear systems

and associated problems. Some of the highlights of this chapter are:

Theoretical results on existence and uniqueness of the solution (Section 6.2).

Some important engineering applications giving rise to linear systems problems (Section

6.3).

Direct methods (Gaussian elimination with and without pivoting) for solving linear systems

(Section 6.4).

Special systems: Positive denite, Hessenberg, diagonally dominant, tridiagonal and block

tridiagonal (Section 6.4.7).

Methods for computing the determinant and the inverse of a matrix (Section 6.5).

Sensitivity analysis of linear systems problems (Section 6.6).

Iterative renement procedure (Section 6.9).

Iterative methods (Jacobi, Gauss-Seidel, Successive Overrelaxation, Conjugate Gradient)

for linear systems (Section 6.10).

Required Background

The following major tools and concepts developed in earlier chapters will be needed for smooth

learning of material of this chapter.

1. Special Matrices (Section 1.4), and concepts and results on matrix and vector norms

(Section 1.7). Convergence of a matrix sequence and convergent matrices (Section

1.7.3)

2. LU factorization using Gaussian elimination without pivoting (Section 5.2.1, Algo-

rithms 5.5.1 and 5.5.2).

3. MA = U factorization with partial pivoting (Section 5.2.2, Algorithm 5.2.3).

4. MAQ = U factorization with complete pivoting (Section 5.2.3, Algorithm 5.2.4).

5. The concept of the growth factor (Section 5.3).

222

6. QR factorization of a matrix (Section 5.4.1, Section 5.5.1).

7. Concepts of conditioning and stability (Section 3.3 and Section 3.4).

8. Basic knowledge of dierential equations.

6.1 Introduction

In this chapter we will discuss methods for numerically solving the linear system

Ax = b;

where A is an n n matrix and x and b are n-vectors. A and b are given and x is unknown.

The problem arises in a very wide variety of applications. As a matter of fact, it might be

said that numerical solutions of almost all practical engineering and applied science

problems routinely require solution of a linear system problem. (See Section 6.3.)

We shall discuss methods for nonsingular linear systems only in this chapter. The case where

the matrix A is not square or the system has more than one solution will be treated in Chapter 7.

A method called Cramer's Rule, taught in an elementary undergraduate linear algebra course,

is of high signicance from a theoretical point of view.

CRAMER'S RULE

Let A be a nonsingular matrix of order n and b be an n-vector. The solution x to

the system Ax = b is given by

xi = det Ai ; i = 1; : : :; n;

det A

where Ai is a matrix obtained by replacing the ith column of A by the vector b and

x = (x1; x2; : : :; xn)T .

Remarks on Cramer's Rule: Cramer's Rule is, however, not at all practical from

a computational viewpoint. For example, solving a linear system with 20 equations and 20

unknowns by Cramer's rule, using the usual denition of determinant, would require more than a

million years even on a fast computer (Forsythe, Malcom and Moler CMMC, p. 30). For an n n

system, it will require about O(n!)
ops.

Two types of methods are normally used for numerical computations:

223

(1) Direct methods

(2) Iterative methods

The direct methods consist of a nite number of steps and one needs to perform all the steps in a

given method before the solution is obtained. On the other hand, iterative methods are based on

computing a sequence of approximations to the solution x and a user can stop whenever a certain

desired accuracy is obtained or a certain number of iterations are completed. The iterative

methods are used primarily for large and sparse systems.

The organization of this chapter is as follows:

In Section 6.2 we state the basic theoretical results (without proofs) on the existence and

uniqueness of solutions for linear systems.

In Section 6.3 we discuss several engineering applications giving rise to linear systems problems.

In Section 6.4 we describe direct methods for solving linear systems.

In Section 6.5 we show how the LU and QR factorization methods can be used to compute the

determinant, the inverse and the leading principal minors of a matrix.

In Section 6.6, we study the sensitivity issues of the linear systems problems and their eects

on the solutions.

In Section 6.9 we brie
y describe an iterative renement procedure for improving the accuracy

of a computed solution.

In Section 6.10 we discuss iterative methods: the Jacobi, Gauss-Seidel, SOR, and conjugate

gradient and GMRES methods.

Consider the system of m equations in n unknowns:

a11x1 + a12x2 + + a1nxn = b1

a12x1 + a22x2 + + a2nxn = b2

..

.

am1x1 + am2 x2 + + amnxn = bm :

In matrix form, the system is written as

Ax = b;

224

where 0 a a a 1 0x 1 0b 1

11 12 1n 1

BB a a a CC BB x CC BB b1 CC

A=BBB ..21 22 2n C

CC ; x = BBB ..2 CCC ; b = BBB ..2 CCC :

@ . A @.A @.A

am1 am2 amn xn bm

Given an m n matrix A and an m-vector b, if there exists a vector x satisfying Ax = b, then we

say that the system is consistent. Otherwise, it is inconsistent. It is natural to ask if a given

system Ax = b is consistent and, if it is consistent, how many solutions are there? when is the

solution unique? etc. To this end, we state the following theorem.

neous System)

(i) The system Ax = b is consistent i b 2 R(A); in other words, rank(A) =

rank(A; b).

(ii) If the system is consistent and the columns of A are linearly independent,

then the solution is unique.

(iii) If the system is consistent and the columns of A are linearly dependent,

then the system has an innite number of solutions.

Homogeneous Systems

If the vector b = 0, then the system Ax = 0 is called a homogeneous system. A homogeneous

system always has a solution, namely x = 0. This is the trivial solution.

(i) The system Ax = 0 has a nontrivial solution i the columns of A are linearly

dependent. If Ax = 0 has a nontrivial solution, it has innitely many solutions.

(ii) If m = n, then Ax = 0 has a nontrivial solution i A is singular.

225

Theorem 6.2.3 (Solution Invariance Theorem) A solution of a consistent sys-

tem

Ax = b

remains unchanged under any of the following operations:

(i) Any two equations are interchanged.

(ii) An equation is multiplied by a nonzero constant.

(iii) A nonzero multiple of one equation is added to another equation.

Two systems obtained from one another by applying any of the above operations are called

equivalent systems. Theorem 6.2.3 then says that two equivalent systems have the same

solution.

6.3 Some Applications Giving Rise to Linear Systems Problems

It is probably not an overstatement that linear systems problems arise in almost all practical

applications. We will give examples here from electrical, mechanical, chemical and civil engineering.

We start with a simple problem|an electric circuit.

Consider the following diagram of an electrical circuit:

226

A1 I1 A2 I2 A3

R12 = 1

R23 = 2

R25 = R34

V1 = 100

10

= 3

V6 = 0

I2

I4

R56 = 5

R45 = 4

A6 I3 A5 I2 A4

Figure 6-1

We would like to determine the amount of current between the nodes A1; A2 ; A3; A4 ; A5,

and A6 . The famous Kirchho's Current Law tells us that the algebraic sum of all currents

entering a node must be zero. Applying this law at node A2, we have

I1 ; I2 + I4 = 0 (6.3.1)

At node A5,

I2 ; I3 ; I4 = 0 (6.3.2)

At node A3,

I2 ; I2 = 0 (6.3.3)

At node A4,

I2 ; I2 = 0 (6.3.4)

Now consider the voltage drop around each closed loop of the circuit, A1 A2 A3 A4A5 A6 A1,

A1 A2A5 A6 A1 , A2A3A4A5A2. The Kirchho's Voltage Law tells us that the net voltage drop

around each closed loop is zero. Thus at the loop A1A2A3A4A5A6A1, substituting the values

of resistances and voltages, we have

I1 + 9I2 + 5I3 = 100 (6.3.5)

Similarly, at A1 A2 A5 A6A1 and A2 A3A4 A5 A2 we have, respectively

I1 ; 10I4 + 5I3 = 100 (6.3.6)

9I2 + 10I4 = 0 (6.3.7)

227

Note that (6.3.6) + (6.3.7) = (6.3.5). Thus we have four equations in four unknowns:

I1 ; I2 + I4 = 0 (6.3.8)

I2 ; I3 ; I4 = 0 (6.3.9)

I1 ; 10I4 + 5I3 = 100 (6.3.10)

9I2 + 10I4 = 0 (6.3.11)

The equations (6.3.8){(6.3.11) can be written as

0 1 ;1 0 1 1 0 I 1 0 0 1

1

B

B C

CC BB CCC BBB CCC

B

B

B C BB I CC BB 0 CC

B

B 0 1 ;1 ;1 CCC BB 2 CC BB CC

B

B CC BB CC = BB CC ;

B

B C BB CC BB CC

B

B 1 0 5 ;10 C CC BB I3 CC BB 100 CC

B

B CA B@ CA B@ CA

@

0 9 0 10 I4 0

the solution of which yields the current between the nodes.

Many mathematical models are based on conservation laws such as conservation of mass, conser-

vation of momentum, and conservation of energy. In mathematical terms, these conservation laws

lead to conservation or balance or continuity equations, which relate the behavior of a system or

response of the quantity being modeled to the properties of the system and the external forcing

functions or stimuli acting on the system.

As an example, consider a chemical processing plant consisting of six interconnected chemical

reactors (Figure 6-2), with dierent mass ow rates of a component of a mixture into and out of

the reactors. We are interested in knowing the concentration of the mixture at dierent reactors.

The example here is similar to that given in Chapra and Canale (1988), pp. 295-298.

228

Q55 = 2

Q15 = 3

C5

Q54 = 2

Q25 = 1

Q01 = 6 Q44 = 11

C1 C2 C4

C01 = 12 Q12 = 3 Q24 = 1

Q23 = 1 Q34 = 8

Q31 = 1 Q66 = 2

C3 C6

Q03 = 8 Q36 = 10

C03 = 20

Figure 6-2

Application of conservation of mass to all these reactors results in a linear system of equations as

shown below, consisting of ve equations in ve unknowns. The solution of the system will tell us

the concentration of the mixture at each of these reactors.

Steady-state, completely mixed reactor. Consider rst a reactor with two
ows coming

in and one
ow going out, as shown in Figure 6-3.

m1 ; Q1; C1-

m2 ; Q2; C2- C3 -

m3 ; Q3; C3

Figure 6-3

Application of the steady state conservation of mass to the above reactor gives

m_ 1 + m_ 2 = m_ 3 : (6.3.12)

Noting that

m_ i = Qi Ci

229

where

mi = mass
ow rate of the mixture at the inlet and outlet sections i; i = 1; 2; 3

Qi = volumetric
ow rate at the section i; i = 1; 2; 3

Ci = density or concentration at the section i; i = 1; 2; 3

we get from (6.3.12)

Q1C1 + Q2C2 = Q3C3 (6.3.13)

For given inlet
ow rates and concentrations, the outlet concentration C3 can be found from equa-

tion (6.3.13). Under steady state operation, this outlet concentration also represents the spatially

uniform or homogeneous concentration inside the reactor. Such information is necessary for design-

ing the reactor to yield mixtures of a specied concentration. For details, see Chapra and Canale

(1988).

Referring now to Figure 6-2, where we consider the plant consisting of six reactors, we have

the following equations (a derivation of each of these equations is similar to that of (6.3.13)). The

derivation of each of these equations is based on the fact that the net mass
ow rate into the reactor

is equal to the net mass
ow out of the reactor.

For reactor 1:

6C1 ; C3 = 72 (6.3.14)

(Note that for this reactor,
ow at the inlet is 72 + C3 and
ow at the outlet is 6C1.)

For reactor 2:

3C1 ; 3C2 = 0 (6.3.15)

For reactor 3:

; C2 + 11C3 = 200 (6.3.16)

For reactor 4:

C2 ; 11C4 + 2C5 + 8C6 = 0 (6.3.17)

For reactor 5:

3C1 + C2 ; 4C5 = 0 (6.3.18)

For reactor 6:

10C3 ; 10C6 = 0 (6.3.19)

230

Equations (6.3.14){(6.3.19) can be rewritten in matrix form as

0 6 0 ;1 0 0 0 1 0 C 1 0 72 1

BB 3 ;3 0 0 0 0 CC BB C1 CC BB 0 CC

BB CB 2C B C

BB 0 ;1 9 0 0 0 CCC BBB C3 CCC BBB 200 CCC

BB CB C = B C (6.3.20)

BB 0 1 8 ;11 2 8 CCC BBB C4 CCC BBB 0 CCC

B@ 3 1 0 0 ;4 0 CA B@ C5 CA B@ 0 CA

0 0 10 0 0 ;10 C6 0

or

AC = D:

The ith coordinate of the unknown vector C represents the mixture concentration at reactor i of

the plant.

6.3.3 Linear Systems Arising from Ordinary Dierential Equations (Finite Dierence

Scheme)

A Case Study on a Spring-Mass Problem

Consider a system of three masses suspended vertically by a series of springs, as shown below,

where k1 ; k2 , and k3 are the spring constants, and x1; x2 , and x3 are the displacements of each

spring from its equilibrium position.

k1

m1

x1

k2

m2

x2

k3

m3

x3

231

k3(x3 ; x2)

6k1x1 6k2(x2 ; x1) 6

m1 m2 m3

? m?1g m2 g

? ? m3 g

?

k2(x2 ; x1) k3(x3 ; x2)

Referring to the above diagram, the equations of motion, by Newton's second law, are:

m1 ddtx21 = k2(x2 ; x1) + m1g ; k1 x1

2

2

2

Suppose we are interested in knowing the displacement of these springs when the system eventually

returns to the steady state, that is, when the system comes to rest. Then, by setting the second-

order derivatives to zero, we obtain

k1x1 + k2(x1 ; x2) = m1 g

k2(x2 ; x1) + k3(x2 ; x3) = m2 g

k3(x3 ; x2) = m3 g

This system of equations in three unknowns, x1 ; x2, and x3 , can be rewritten in matrix form as

0 k + k ;k 0

10x 1 0m g1

1 2 2 1 1

B

B CC BB CC BB CC

B

B CC BB CC BB CC

B

B ;k2 k2 + k3 ;k3 C CC BBB x2 CCC = BBB m2g CCC

B

B CA B@ CA B@ CA

@

0 ;k3 k3 x3 m3 g

or

Kx = w:

The matrix 0 k + k ;k 0

1

1 2 2

BB CC

BB CC

K=B BB 2 2 3 3 CCC

; k k + k ; k

B@ CA

0 ;k3 k3

232

is called the stiness matrix.

6.3.4 Linear Systems Arising from Partial Dierential Equations: A Case Study on

Temperature Distribution

Many engineering problems are modeled by partial dierential equations. Numerical approaches

to these equations typically require discretization by means of dierence equations, that is, partial

derivatives in the equations are replaced by approximate dierences. This process of discretization

in turn gives rise to linear systems of many interesting types. We shall illustrate this with a problem

in heat transfer theory.

A major objective in a heat transfer problem is to determine the temperature distribution

T (x; y; z; t) in a medium resulting from imposed boundary conditions on the surface of the medium.

Once this temperature distribution is known, the heat transfer rate at any point in the medium or

on its surface may be computed from Fourier's law, which is expressed as

qx = ;K @T @x

qy = ;K @y @T

qz = ;K @T @z

where qx is the heat transfer rate in the x direction, @T@x is the temperature gradient in the x

direction, and the positive constant K is called the thermal conductivity of the material. Similarly

for the y and z directions.

Consider a homogeneous medium in which temperature gradients exist and the temperature

distribution T (x; y; z; t) is expressed in Cartesian coordinates. The heat diusion equation which

governs this temperature distribution is obtained by applying conservation of energy over an in-

nitesimally small dierential element, from which we obtain the relation

@ (K @T ) + @ (K @T ) + @ (K @T ) + q_ = C @T ; (6.3.21)

@x @x @y @y @z @z p @z

where is the density, Cp is the specic heat, and q_ is the energy generated per unit volume.

This equation, usually known as the heat equation, provides the basic tool for solving heat

conduction problems.

It is often possible to work with a simplied form of equation (6.3.21). For example, if the

thermal conduction is a constant, the heat equation is

@ 2T + @ 2T + @ 2T + q_ = 1 @T (6.3.22)

@x2 @y 2 @z2 K @t

233

where = K=(Cp) is a thermophysical property known as the thermal diusivity.

Under steady state conditions, there can be no changes of energy storage, i.e., the unsteady

state term @T

@t can be dropped, and equation (6.3.21) reduces to the 3-D Poisson's Equation

@ 2T + @ 2T + @ 2 T + q_ = 0 (6.3.23)

@x2 @y2 @z2 K

If the heat transfer is two-dimensional (e.g., in the x and y directions) and there is no energy

generation, then the heat equation reduces to the famous Laplace's equation

@ 2 T + @ 2T = 0 (6.3.24)

@x2 @y2

If the heat transfer is unsteady and one-dimensional without energy generation, then the heat

equation reduces to

@ 2T = 1 @T (6.3.25)

@x2 @t

Analytical solutions to the heat equation can be obtained for simple geometry and boundary con-

ditions. Very often there are practical situations where the geometry or boundary conditions are

such that an analytical solution has not been obtained or if it is obtained, it involves complex series

solutions that require tedious numerical evaluation. In such cases, the best alternatives are nite

dierence or nite element methods which are well suited for computers.

Finite Dierence Scheme

A well-known scheme for solving a partial dierential equation is to use nite dierences. The

idea is to discretize the partial dierential equation by replacing the partial derivatives with their

approximations, i.e., nite dierences. We will illustrate the scheme with Laplace's equation in the

following.

Let us divide a two-dimensional region into small regions with increments in the x and y

directions given as x and y , as shown in the gure below.

234

Nodal Points

Dy

Dx

Each nodal point is designated by a numbering scheme i and j , where i indicates x increment and

j indicates y increment:

(i,j + 1)

(i - 1,j) (i,j)

(i + 1, j)

(i,j - 1)

The temperature distribution in the medium is assumed to be represented by the nodal points

temperature. The temperature at each nodal point (xi ; yj ) (which is symbolically denoted by (i,j)

as in the diagram above) is the average temperature of the surrounding hatched region. As the

number of nodal points increases, greater accuracy in representation of the temperature distribution

is obtained.

A nite dierence equation suitable for the interior nodes of a steady two-dimensional system

can be obtained by considering Laplace's equation at the nodal point i; j as

@ 2T + @ 2 T = 0 (6.3.26)

@x2 i;j @y 2 i;j

235

The second derivatives at the nodal point (i; j ) can be expressed as

@T ; @T

@ 2T

@x i+ 2 ;jx@x i; 2 ;j

1 1

@x2 i;j (6.3.27)

@T ; @T

@ 2T @y i;j+ 2x@y i;j; 2

1 1

(6.3.28)

@y 2 i;j

As shown in the gure, the temperature gradients can be approximated (as derived from the

Taylor series) as a linear function of the nodal temperatures as

@T Ti+1;j;x Ti;j (6.3.29)

@x i+ 12 ;j

@T Ti;j ;xTi;1;j (6.3.30)

@x i; 12 ;j

@T Ti;j+1;y Ti;j (6.3.31)

@y i;j+ 12

@T Ti;j ;Ty i;j;1 (6.3.32)

@y i;j ; 12

where, Ti;j = T (xi ; yj ). Substituting (6.3.29){(6.3.32) into (6.3.27){(6.3.28), we get

@ 2T

=

Ti+1;j ; 2Ti;j + Ti;1;j (6.3.33)

@x i;j

2 (x)2

@ 2T

=

Ti;j+1 ; 2Ti;j + Ti;j ;1 (6.3.34)

@y i;j

2 (y )2

The equation (6.3.26) then gives

Ti+1;j ; 2Ti;j + Ti;1;j + Ti;j +1 ; 2Ti;j + Ti;j ;1 = 0

(x)2 (y )2

Assume x = y . Then the nite dierence approximation of Laplace's equation for

interior regions can be expressed as

Ti;j+1 + Ti;j ;1 + Ti+1;j + Ti;1;j ; 4Ti;j = 0 (6.3.35)

More accurate higher order approximations for interior nodes and boundary nodes are also obtained

in a similar manner.

Example 6.3.1

A two-dimensional rectangular plate (0 x 1; 0 y 1) is subjected to the uniform

temperature boundary conditions (with top surface maintained at 1000C and all other surfaces at

236

00C ) shown in the gure below, that is T (0; y ) = 0, T (1; y ) = 0, T (x; 0) = 0, and T (x; 1) = 1000C ,

Suppose we are interested only in the values of the temperature at the nine interior nodal points

(xi; yj ); where xi = ix and yj = j y , i; j = 1::3 with x = y = 14 .

o

100 C

(0,1) (1,1) (2,1) (3,1) (4,1)

o

O oC (0,2) (1,2) (2,2) (3,2) (4,2) O C

o

O C

However, we assume symmetry for simplifying the problem. That is, we assume that T33 = T13,

T32 = T12, and T31 = T11. We thus have only six unknowns: (T11; T12; T13) and (T21; T22; T23).

4T1;1 ; 0 ; 100 ; T2;1 ; T1;2 = 0

4T1;2 ; 0 ; T1;1 ; T2;2 ; T1;3 = 0

4T1;3 ; 0 ; T1;2 ; T2;3 ; 0 = 0

4T2;1 ; T1;1 ; 100 ; T1;1 ; T2;2 = 0

4T2;2 ; T1;2 ; T2;1 ; T1;2 ; T2;3 = 0

4T2;3 ; T1;3 ; T2;2 ; T1;3 ; 0 = 0

237

After suitable rearrangement, these equations can be written in the following form:

0 4 ;1 ;1 0 0 0 1 0 T1;1 1 0 100 1

BB CC BB CC BB CC

BB CC BB CC BB CC

BB ;2 4 0 ;1 0 0C CC BBB T2;1 CCC BBB 100 CCC

BB CC BB CC BB CC

BB C BB T CC BB 0 CC

BB 1 0 4 ;1 ;1 0C CC BB 1;2 CC BB CC

BB CC BB CC = BB CC

BB C BB CC BB CC

BB 0 ;1 ;2 4 0 ;1 C CC BB T2;2 CC BB 0 CC

BB CC BB CC BB CC

BB C BB CC BB CC

BB 0 0 ;1 0 4 ;1 C CC BB T1;3 CC BB 0 CC

BB CA B@ CA B@ CA

@

0 0 0 ;1 ;2 4 T2;3 0

The solution of this system will give us temperatures at the nodal points.

Many practical applications give rise to linear systems having special properties and structures, such

as tridiagonal, diagonally dominant, and positive denite systems and block tridiagonal.

The solution methods for solving these special systems are described in Section 6.4.6. We rst state

a situation which gives rise to a tridiagonal system.

Tridiagonal Systems

Consider one-dimensional steady conduction of heat such as heat conduction through a wire.

In such a case, the temperature remains constant with respect to time. The equation here is:

@ 2T = 0

@x2

The dierence analog of this equation is:

T (x + x) ; 2 T (x) + T (x ; x) = 0

where x is the increment in x, as shown below.

T0 = 1 2 3 T4 =

x

238

Using a similar numbering scheme as before, the temperature at any point is given by

Ti+1 ; 2 Ti + Ti;1 = 0;

that is, the temperature at any point is just the average of the temperatures of the two nearest

neighboring points.

Suppose the domain of the problem is 0 x 1. Divide now the domain into four segments of

equal length, say x. Thus x = :25. Then T at x = ix will be denoted by Ti . Suppose that

we know the temperature at the end points x = 0 and x = 1, that is,

T0 =

T4 =

These are then the boundary conditions of the problem.

>From the equation, the temperature at each node, x = 0; x = x, x = 2x; x = 3x; x = 1

is calculated as follows:

At x = 0, T0 = (given)

At x = x, T0 ; 2T1 + T2 = 0

At x = 2x, T1 ; 2T2 + T3 = 0

At x = 3x, T2 ; 2T3 + T4 = 0

At x = 1, T4 = (given)

In matrix form these equations can be written as:

01 0 0 0 010T 1 01

0

BB CC BB CC BB CC

BB CB C B C

BB 1 ;2 1 0 0 CCC BBB T1 CCC BBB 0 CCC

BB CC BB CC BB CC

BB CB C B C

BB 0 1 ;2 1 0 CCC BBB T2 CCC = BBB 0 CCC

BB CC BB CC BB CC

BB CB C B C

BB 0 0 1 ;2 1 CCC BBB T3 CCC BBB 0 CCC

BB CC BB CC BB CC

@ A@ A @ A

0 0 0 0 1 T4

The matrix of this system is tridiagonal.

239

Symmetric Tridiagonal and Diagonally Dominant Systems

In order to see how such systems arise, consider now the unsteady conduction of heat. This

condition implies that the temperature T varies with the time t. The heat equation in this case is

1 @T = @ 2 T :

@t @x2

Let us divide the grid in the (x; t) plane with spacing x in the x-direction and t in the t-direction.

ti+1 ; ti = t xi+1 ; xi = x

t2

t1

0 x1 x2 x3 xn 1

Let the temperature at the nodal point xi = ix and tj = j t, as before, be denoted by Tij .

Approximating @T

@t and @ 2T by the nite dierences

@x2

@T 1 (T

@t t i;j +1 ; Ti;j )

@ 2T 1 (T

@x2 (x)2 i+1;j +1 ; 2Ti;j +1 + Ti;1;j +1);

we obtain the following dierence analog of the heat equation:

(1 + 2C )Ti;j +1 ; C (Ti+1;j +1 + Ti;1;j +1) = Ti;j

i = 1; 2; : : :; n

where C = (xt)2 .

These equations enable us to determine the temperature at a time step j = k + 1, knowing the

temperature at the previous time step j = k.

For i = 1; j = k: (1 + 2C )T1;k+1 ; CT2;k+1 = CT0;k+1 + T1;k

For i = 2; j = k: (1 + 2C )T2;k+1 ; CT3;k+1 ; CT1;k+1 = T2;k

...

240

For i = n; j = k: (1 + 2C )Tn;k+1 ; CTn;1;k+1 = Tn;k + Tn+1;k+1

Suppose now the temperatures at the two vertical sides are known, that is,

T0;t = TW1

Tn+1;t = TW2

Then the above equations can be written in matrix notation as

0 (1 + 2C ) ;C 0 0 1 0 T1;k+1 1 0 T1;k + CTW1 1

BB CC BB CC BB CC

BB CC BB CC BB CC

BB ;C (1 + 2C ) ;C 0 0 C B T2;k+1 C

C B CC BBB T2;k CC

BB CC BB C B CC

BB . ... ... ... .. C CC BBB .. CCC BBB .. CC

BB .. . CB . C = B . CC

BB CC BB C B CC

BB . ... ...

CC BB . CCC BBB .. CC

BB .. ;C C CC BBB .. CCC BBB . CC

BB CA B@ CA B@ CC

@ A

0 ;C (1 + 2C ) Tn;k+1 Tn;k + CTW2

The matrix of the above system is clearly symmetric, tridiagonal and diagonally dominant

(note that C > 0).

For example, when C = 1, and we have

0 3 ;1 0 0 1 0 T 1 0 T + T 1

1;k+1 1;k W1

B

B ;1 3 ;1 0 C C B

BB T2;k+1 CC BB T2;k CCC

C B

B

B .. . . . . . . . . . .. CC BB ... CC = BB ... CC

B

B . . C

C BB . CC BB . CC

B

B .. . . . . . . ;1 C

C

@ . A B@ .. CA B@ .. CA

0 ;1 3 Tn;k+1 Tn;k + TW2

or

Ax = b:

The matrix A is symmetric, tridiagonal, diagonally dominant and positive denite.

Block Tridiagonal Systems

To see how block tridiagonal systems arise in applications, consider the two-dimensional

Poisson's equation:

@ 2T + @ 2T = f (x; y ); 0 x 1; 0 y 1:

@x2 @y 2

241

A discrete analog to this equation, similar to Laplace's equation derived earlier, is

Ti+1;j + Ti;1;j + Ti;j +1 + Ti;j;1 ; 4Tij = (x)2 fij ;

i = 1; 2; : : :; n j = 1; 2; : : :; n

This will give rise to a linear system of (n + 2)2 variables.

Assume now the values of T at the four sides of the unit square are known and we are interested

in the values of T at the interior grid points, that is,

T0;j ; Tn+1;j and Ti;0 ; Ti;n+1

(j = 0; 1; : : :; n + 1; i = 0; 1; : : :; n + 1)

are given and T11; : : :; Tn1; T12; : : :; Tn2; T1n; : : :; Tnn are to be found. Then we have a (n2 n2 )

system with n2 unknowns which can be written after suitable rearrangement as

04 ;1 0 0 ;1 0 1

BB ;1 4 ;1 0 0 ;1 0 CC

BB CC

BB CC

BB CC

BB CC

BB CC

BB CC

BB 0 0 ;1 4 ;1 0 0 ;1 CC

BB 0 0 ;1 4 0 0 0 ;1 0 CC

BB C

BB ;1 0 0 0 4 ;1 0 0 ;1 0 C CC

B@ 0 ;1 0 ;1 4 ;1 0 0 ;1 0 C A

..

.

242

0 T01 + T10 ; (x)2f11 1

0 T11 1 BB T20 ; (x)2f21 CC

B C B

B CC

B

B C

T21 C B B CC

B

B . C

.. C B B .

.. CC

B

B C

C B

B CC

B

B C

Tn1 C B B CC

B C B .. CC

B

B T12 CC =B

B . CC

B

B .. C

C BB CC

B . C B

B

B C B

BB Tn;1;0 ; (x)2fn;1;1 CCC

B Tn2 CC

B

B . C B C

@ .. CA BBB Tn+1;1 + Tn;0 ; (x)2fn;1 CCC

Tnn B@ T02 ; (x)2f12 CA

..

.

or in matrix form,

0 An ;In 1 0 T11 1

0 T + T ; (x)2f 1

BB CC BB T CC 01 10 11

BB C B 21 C B CC

BB ;In . . . . . . CC BB ... CC BB T20 ; (x) f21

2

CC

BB CC BB CC BB ..

. CC

BB CC BB Tn1 CC BB . CC

BB ... ... ... CC BB CC BB .. CC

C B T12 C = B (6.3.36)

BB CC BB .. CC B Tn;1;0 ; (x) fn;1;1 CC

B 2

BB CC BB . CC BB Tn+1;1 + Tn;0 ; (x)2fn;1 CC

BB . .

. . . . ;In C C BB Tn2 CC BB CC

BB CC BB .. CC @ B T02 ; (x) f12

2 CA

@ A@ . A ..

.

;In An Tnn

where

04;1 0 1

BB. . . . . . ... C

CC

;1

An = BBB ..

. . . . . . ;1 C CA (6.3.37)

@ .

0 ;1 4

The system matrix above is block tridiagonal and each block diagonal of matrix An is symmetric,

tridiagonal, and positive denite. For details, see Ortega and Poole (INMDE, pp. 268{272).

We have seen in the last few sections how discretization of dierential equations using nite dier-

ences gives rise to various types of linear systems problems. Finite element technique is another

popular way to discretize dierential equations and this results also in linear systems problems.

243

Just to give a taste to the readers, we illustrate this below by means of a simple dierential equa-

tion. The interested readers are referred to some of the well-known books on the subject: Ciarlet

(1981), Strang and Fix (1973), Becker, Carey and Oden (1981),Reddy(1993).

1. Variational formulation of a two-point boundary value problem.

Let us consider the following two-point boundary value problem

;u00 + u = f (x) 0<x<1 (6.3.38)

u(0) = u(1) 0 (6.3.39)

du and f is a continuous function on [0,1]. We further assume that f is such that

where u0 = dx

Problem (6.3.38)-(6.3.39) has a unique solution.

We introduce the space

V = fv : v is a continuous function on [0,1] - and v 0 is piece-wise continuous and -

bounded on [0,1], and v (0) = v (1) = 0g

Now, if we multiply the equation ;u00 + u = f (x) by an arbitrary function v 2 V (v is called a

test function) and integrate the left hand side by parts, we get

Z1 Z1

(;u00(x) + u(x))v (x)dx = f (x)v (x)dx

0 0

that is,

Z1 Z1

(u0 v 0 + uv )dx = f (x)v (x)dx (6.3.40)

0 0

Since v 2 V; and v (0) = v (1) = 0. We write (6.3.40) as:

a(u; v) = (f; v) for every u 2 V

where Z1

a(u; v ) = (u0 v 0 + uv )dx

0

and Z1

(f; v ) = f (x)v (x)dx

0

(Notice that the form a(; ) is symmetric (i.e. a(u; v ) = a(v; u)) and bilinear.) These two prop-

erties, will be used later. It can be shown that u is a solution of (6.3.40) if and only

if u is a solution to (6.3.38)-(6.3.39).

2. The Discrete Problem

244

We now discretize problem (6.3.40). We start by constructing a nite dimensional subspace Vn

of the space V .

Here, we will only consider the simple case where Vn consists of continuous piecewise linear

functions. For this purpose, we let 0 = x0 < x1 < x2 ::: < xn < xn+1 = 1 be a partition of the

interval [0,1] into subintervals Ij = [xj ;1; xj ] of length hj = xj ; xj ;1; j = 1; 2; :::; n + 1. With

this partition, we associate the set Vn of all functions v (x) that are continuous on the interval [0,1],

linear in each subinterval Ij , j = 1; :::; n + 1, and satisfy the boundary conditions v (0) = v (1) = 0.

We now introduce the basis functions f1; 2; :::; ng, of Vn.

We dene j8(x) by

< 1 if i = j

(i) j (xi ) = :

0 if i 6= j

(ii) j (x) is a continuous piecewise linear function.

j (x) can be computed explicitly to yield:

ϕ ( λ)

1 j

0

xj 1

8 x ; xj;1

>

< hj ; when xj;1 x xj

j (x) = > xj+1 ; x

: hj+1 ; when xj x xj+1:

Since 1; :::; n are the basis functions, any function v 2 Vn can be written uniquely as:

X

n

v (x) = vii(x); where vi = v (xi ):

i=1

We easily see that Vn V .

The discrete analogous of Problem (6.3.40) then reads:

nd un 2 Vn such that

a(un; v ) = (f; v) 8v 2 Vn (6.3.41)

X

n

Now, if we let un = ui i(x) and notice that equation (6.3.41) is particularly true for every

i=1

245

function j (x); j = 1; :::; n, we get n equations, namely.

X

a( ui i; j ) = (f; j ) 8j = 1; 2; :::; n

Now using the linearity of a (; j ) leads to n linear equations in n unknowns:

X

n

uia(i; j ) = (f; j ) 8j = 1; 2; :::; n:

i=1

which can be written in the matrix form as

Aun = (fn)i (6.3.42)

where (fn )i = (f; i) and A = (aij ) is a symmetric matrix given by

aij = aji = a(i ; j ); and un = (u1; :::; un)T :

The entries of the matrix A can be computed explicitly: We rst notice that

aij = aji = a(i ; j) = 0 if ji ; j j 2

(This is due to the local support of the function i(x)). A direct computation now leads to

Z x " 1 (x ; xj;1)2 # Z x +1 " 1 (xj+1 ; x) #2

aj;j = a(j ; j ) = dx + dx

j j

xj;1h2 + h2 j j h2 + h2 xj j +1 j +1

1 1 1

= h +h + 3 [hj + hj +1] :

j j +1

Z x " 1 (xj ; x) (x ; xj;1 # ;1 + hj

aj;j;1 = ; h2 + h dx

j

h = h 6

xj ;1 j j j j

Hence, the system (6.3.42) can be written as:

2 3 2 3 2 3

66 a1 b1 77 66 u1 77 66 (fn )1 77

66 b1 a2 0 77 66 u2 77 66 (fn )2 77

66 77 66 ..

.

77 66 ..

.

77

66 77 66 77 = 66 77

66 ... ... ... 77 66 .. 77 66 .. 77

66 77 66 . 77 66 . 77

64 ... ... bn;1 75 64 .. 75 64 .. 75

. .

0 bn;1 an un (fn )n

246

where aj = h1 + h 1 + 13 [hj + hj +1] and bj = ; h1 + h6j . In the special case of uniform grid

j j +1 j

1

hj = h = n + 1 , the matrix A then takes the form

2 3

66 2 ;1 0 7 2 3

66 ;1 2 ... ...

7 77 66 4 1

... ... 7

07

7

1

A = h 66

6 ... ... ... 77 + h 66 1 7

66 77 6 66 ... . . . 1 777 (6.3.43)

6

64 ... ... ;1 775 4 0 1 4

5

0 ;1 2

6.3.7 Approximation of a Function by a Polynomial: Hilbert System

In Chapter 3 (Section 3.6) we cited an ill-conditioned linear system with the Hilbert matrix. In

this section we show how such a system arises. The discussion here has been taken from Forsythe

and Moler (CSLAS, pp. 80{81).

Suppose a continuous function f (x) dened on the interval 0 x 1 is to be approximated by

a polynomial of degree n ; 1: n X

pi xi;1;

i=1

such that the error Z 1 "X #2

n

E= pixi;1 ; f (x) dx

0 i=1

is minimized. The coecients pi of the polynomial are easily determined by setting

E = 0; i = 1; : : :; n:

pi

(Note that the error is a dierentiable function of the unknowns pi and that a minimum occurs

when all the partial derivatives are zero.) Thus we have

2 3

E = 2 Z 1 4X

n

pj xj;1 ; f (x)5 xi;1 dx = 0; i = 1; : : :; n

pi 0 j =1

or n Z 1 Z1

X

xi+j ;2 dx pj = f (x)xi;1 dx; i = 1; : : :; n:

j =1 0 0

(To obtain the latter form we have interchanged the summation and integration.)

Letting Z1

hij = xi+j;2 dx

0

247

and Z1

bi = f (x)xi;1 dx (i = 1; 2; : : :; n);

0

we have

X

n

hij pj = bi; i = 1; : : :; n:

j =1

That is, we obtain the linear system

Hp = b;

0b 1

BB b1 C C

where H = (hij ); b = BBB ..2 C

C. The matrix H is easily identied as the Hilbert matrix, (see

@.C A

bn

Chapter 3, Section 3.6), since

Z1

hij = xi+j ;2 dx = i + j1 ; 1 :

0

In this section we will study direct methods for solving the problem Ax = b. These methods include

Gaussian elimination without pivoting, based on LU factorization of A (Section 6.4.2).

Gaussian elimination with partial pivoting, based on MA = U factorization of A

(Section 6.4.3).

Gaussian elimination with complete pivoting, based on MAQ = U factorization of A

(Section 6.4.3).

The method based on the QR factorization of A (Section 6.4.5).

The method based on the Cholesky decomposition of a symmetric positive denite

matrix (Section 6.4.7).

Gaussian elimination for special systems: Hessenberg, positive denite, tridiagonal,

diagonally dominant. (Section 6.4.7).

For a comparison of these methods and their relative merits and demerits, see Section 6.4.9 and

the accompanying Tables of Comparison. These methods are primarily used to solve small

and dense linear system problems of order up to 200 or so.

The basic idea behind all the direct methods is to rst reduce the linear system Ax = b to

equivalent triangular system(s) by nding triangular factors of the matrix A, and then solving the

248

triangular system(s) which is (are) much easier to solve than the original problem. In Chapter 5

we described various methods for computing triangular factors of A. In Chapter 3 (Section 3.1) we

described the back substitution method for solving an upper triangular system. We now describe

in the following an analogous method, called forward elimination, for solving a lower triangular

system.

The solution of the nonsingular lower triangular system

Ly = b

can be obtained analogously as in the upper triangular system. Here y1 is obtained from the rst

equation and then, inserting its value in the second equation, y2 is obtained and so on. This process

is called forward elimination.

Algorithm 6.4.1 Forward Elimination

For i = 10; 2; : : :; n do 1

X

i;1

yi = 1 @bi ; lij yj A

lii j =1

Flop-count and stability. The algorithm requires about n2 ops (1 op to compute y1, 2

2

ops to compute y2, 3
ops to compute y3 , and so on). The algorithm is as stable as the back

substitution process (Algorithm 3.1.3).

The Gaussian elimination method for the linear system Ax = b is based on LU factorization of the

matrix A. Recall from Chapter 5 (Section 5.2.1) that triangularization using Gaussian elimination

without pivoting, when carried out to completion, yields an LU factorization of A. Once we have

this factorization of A, the system Ax = b becomes equivalent to two triangular systems:

Ly = b;

Ux = y:

249

Solving Ax = b using Gaussian Elimination Without Pivoting

The solution of the system Ax = b using Gaussian elimination (without pivoting)

can be achieved in two stages:

First, nd a LU factorization of A.

Second, solve two triangular systems: the lower triangular system Ly = b rst,

followed by the upper triangular system Ux = y .

Flop-count. Since a LU factorization requires about n33
ops and the solution of each trian-

gular system needs only n22
ops, the total
ops count for solving the system Ax = b using Gaussian

elimination is about n33 + n2 .

A. Solution with Partial Pivoting

If Gaussian elimination with partial pivoting is used to triangularize A, then as we have seen

in Chapter 5 (Section 5.2.2), this process yields a factorization:

MA = U:

In this case, the system Ax = b is equivalent to the triangular system

Ux = Mb = b0:

To solve Ax = b using Gaussian elimination with partial pivoting:

1. Step 1. Find the factorization MA = U by the triangularization algorithm

using partial pivoting (Algorithm 5.2.3).

2. Step 2. Solve the triangular system by back substitution (Algorithm 3.1.3):

Ux = Mb = b0:

250

Implementation of Step 2

The vector

b0 = Mb = Mn;1Pn;1Mn;2 Pn;2 M1 P1b

can be computed as follows:

(1) s1 = b

(2) For k = 1; 2; : : :; n ; 1 do

sk+1 = Mk Pk sk

sn = b0 .

251

Computational Remarks

The practical Gaussian elimination (with partial pivoting) algorithm does not give Mk and Pk

explicitly. But, we really do not need them. The vector sk+1 can be computed immediately from

sk once the index rk of row interchange and the multipliers mik have been saved at the kth step.

This is illustrated with the 3 3 example as follows:

Example 6.4.1

Let 01 0 01 0 1 0 01

B C B C

n = 3; P1 = B

@ 0 0 1 CA ; M1 = B

@ m21 1 0 CA

0 1 0 m31 0 1

and let 0 s(2) 1

BB 1(2) CC

s2 = M1 P1s1 = @ s2 A :

s(2)

3

Then we have 0s 1

BB 1 CC

P1s1 = @ s3 A

s4

and the entries of s2 are then given by

s(2)

1 = s1 ;

s(2)

2 = m21s1 + s3 ;

s(2)

3 = m31s1 + s2 :

Gaussian elimination with complete pivoting (Section 5.2.3) gives

MAQ = U:

Using this factorization, the system Ax = b can be written as

Uy = Mb = b0

where

y = QT x:

Thus, we have the following.

252

Solving Ax = b Using Gaussian Elimination With Complete Pivoting

To solve Ax = b using complete pivoting:

Step 1. Find the factorization MAQ = U by the factorization algorithm using

complete pivoting (Algorithm 5.2.4).

Step 2. Solve the triangular system (for y) (Algorithm 3.1.3):

Uy = b0;

computing b0 as shown above.

Step 3. Finally, recover x from y : x = Qy .

Implementation of Step 3

Since x = Qy = Q1 Q2 Qn;1y , the following scheme can be adopted to compute x from y in

Step 3.

Set wn = y .

For k = n ; 1; : : : 2; 1 do

wk = Qk wk+1

Then x = w1.

Note: Since Qk is a permutation matrix, the entries of wk are simply those of wk+1

reordered according to the permutation index.

Example 6.4.2

Solve

Ax = b

with 00 1 11 021

B CC B CC

A=B

@ 1 2 3 A; b=B

@6A

1 1 1 3

(a) using partial pivoting,

253

(b) using complete pivoting.

(a) Partial Pivoting

With the results obtained earlier (Section 5.2.2, Example 5.2.3), we compute

061 00 1 01

B 2 CC ; P = BB 1 0 0 CC

P1 b = B

@ A 1 @ A

3 0 0 1

061 0 1 0 01

M1P1b = B

B 2 CC ; M = BB 0 1 0 CC

@ A 1 @ A

;3 ;1 0 1

061 01 0 01

B CC B C

P2M1P1b = B @ 2 A ; P2 = B@ 0 1 0 CA

;3 0 0 1

061 01 0 01

B CC B C

b0 = M2P2M1P1b = B @ 2 A ; M2 = B@ 0 1 0 CA :

;1 0 1 1

The solution of the system

Ux = b0

01 2 3 10x 1 0 6 1

BB CC BB 1 CC BB CC

@ 0 1 1 A @ x2 A = @ 2 A

0 0 ;1 x3 ;1

is x1 = x2 = x3 = 1.

Using the results obtained in the example in Section 5.2.4, we have

061 00 1 01

B CC

P1 b = B

B C

@ 2 A ; P1 = B@ 1 0 0 CA

3 0 0 1

061 061

B CC B C

M1P1b = B @ 0 A ; P2M1P1b = B@ 1 CA

1 0

061

B CC

b0 = M2 P2 M1P1 b = B

@1A:

1

2

254

The solution of the system

Uy = b0

03 2 110x 1 061

BB 0 2 1 CC BB 1 CC BB CC

@ 3 3 A @ x2 A = @ 1 A

0 0 12 x3 1

2

is y1 = y2 = y3 = 1. Since fxk g; k = 1; 2; 3 is simply the rearrangement of fyk g, we have

x1 = x2 = x3 = 1.

Some Details of Implementation

Note that it is not necessary to store the vectors si and wi separately, because all we need is

the vector sn for partial pivoting and w1 for complete pivoting. So starting with x = b, each new

vector can be stored in x as it is computed.

Also note that if we use the practical algorithms, the matrices Pk ; Qk and Mk are not available

explicitly; they have to be formed respectively out of indices rk ; sk and the multipliers mik . In this

case, the statements for computing the sk 's and wk 's are to be modied accordingly.

Flop-count. We have seen in Chapter 5 (Section 5.2.1) that the triangularization process

using elementary matrices requires n33
ops. The triangular system Ux = b0 or Uy = b0 can be

solved using back substitution with n
ops and, the vector b0 can be computed with n2
ops,

2

2

taking into account the special structures of the matrices Mk and Pk . Recovering x from y in Step

3 of the Complete Pivoting process does not need any
ops. Note that x is obtained from y just by

reshuing the entries of y . Thus, the solution of the linear3 system Ax = b using Gaussian

elimination with complete or partial pivoting requires n3 +O(n2)
ops. However, Gaussian

elimination with complete pivoting requires about n3 comparisons to identify (n ; 1) pivots,

3

Round-o Property

In Chapter 5 (Section 5.3) we discussed the round-o property of Gaussian elimination for

triangularization of A. We have seen that the growth factor determines the stability of the

triangularization procedure. The next question is how does the growth factor aect the

solution procedure of Ax = b using such a triangularization. The answer is given in the

following:

255

Round-o Error Result for Linear System Problem with Gaussian

Elimination

It can be shown that the computed solution x^ of the linear system Ax = b, using

Gaussian elimination, satises

(A + E )^x = b

where kE k1 c(n3 + 3n2 ) kAk1 , and c is a small constant. For a proof see

Chapter 11, Section 11.4.

Remark: The size of the above bound is really determined by , since when n is not too large,

n3 can be considerably small compared to and can therefore be neglected. Thus the growth

factor is again the deciding factor.

6.4.4 Solution of Ax = b without Explicit Factorization

As we have just seen, the Gaussian elimination method for solving Ax = b comes in two stages.

First, the matrix A is explicitly factorized:

A = LU (without pivoting)

MA = U (with partial pivoting)

MAQ = U (with complete pivoting).

Second, the factorization of A is used to solve Ax = b. However, it is easy to see that these two

stages can be combined so that the solution can be obtained by solving an upper triangular

system by processing the matrix A and the vector b simultaneously. In this case, the

augmented matrix (A; b) is triangularized and the solution is then obtained by back substitution.

We illustrate this implicit process for Gaussian elimination with partial pivoting.

Algorithm 6.4.2 Solving Ax = b With Partial Pivoting Without Explicit Factorization

Given an n n matrix A and a vector b, the following algorithm computes the triangular

factorization of the augmented matrix (A; b) using Gaussian elimination with partial pivoting. A

is overwriten by the transformed triangular matrix and b is overwritten by the transformed vector.

The multipliers are stored in the lower-half part of A.

256

For k = 1; 2; : : :; n ; 1 do

(1) Choose the largest element in magnitude in the column k below the (k; k) entry; call it ar :k;k

ar = max fjaik j : i kg ;

k;k

If ar = 0, Stop.

k;k

(2) Otherwise, interchange the rows k and rk of A and the kth and rkth entries of b:

ar $ akj ; (j = k; k + 1; : : :; n)

k;j

br $ bk :

k

aik mik = ; aaik (i = k + 1; : : :; n)

kk

(4) Update the entries of A:

aij aij + mikakj (i = k + 1; : : :; n; j = k + 1; : : :; n)

bi bi + mik bk (i = k + 1; : : :; n):

Example 6.4.3

00 1 11 021

B 2 2 3 CC ;

A=B

B 6 CC :

b=B

@ A @ A

4 1 1 3

Step 1. The pivot entry is a31 = 4, r1 = 3

Interchange the rows 3 and 1 of A and the 3rd and 1st entries of b.

04 1 11 031

B C B C

AB @ 2 2 3 CA ; b B@ 6 CA

0 1 1 2

m21 = ; aa21 = ; 12

11

04 1 11 031

A A(1) = B

B 0 3 5 CC ; b b(1) = BB 9 CC

@ 2 2A @2A

0 1 1 2

257

Step 2. The pivot entry is a22 = 32

m32 = ; aa3222 = ; 23

04 1 1 1 031

B 3 5 CC B 9 CC

A A(2) = B

@0 2 2 A; b b(2) = B

@2A

0 0 ; 23 ;1

The reduced triangular system A(2) x = b(2) is:

4x1 + x2 + x3 = 3

3x + 5x = 9

2 2 2 3 2

2

; 3 x3 = ;1

The solution is:

x3 = 32 ; x2 = 12 ; x1 = 14

If we have

A = QR;

the system Ax = b then becomes

QRx = b

or

Rx = QT b = b0:

Thus, once we have the QR factorization of A, the solution of the system Ax = b can be obtained

just by solving the equivalent upper triangular system:

Rx = b0 = QT b:

Solving Ax = b using QR

To solve Ax = b using QR factorization,

1. Find the QR factorization of A : QT A = R (Sections 5.4 and 5.5).

2. >From b0 = QT b.

3. Solve Rx = b0.

258

Forming b0

To compute b0 we do not need Q explicitly. It can be computed from the factored form of Q.

For example, if the QR factorization is obtained using the Householder method (Chapter 5, Section

5.4.1), then

QT = Hn;1Hn;2 H2H1;

and b0 = QT b can be computed as

(1) y1 = b

(2) For k = 1; 2; : : :; n ; 1 do

yk+1 = Hkyk

(3) yn = b0:

Example 6.4.4

Consider 00 1 11 021

B 1 2 3 CC ;

A=B

B 6 CC :

b=B

@ A @ A

1 1 1 3

>From the example of Section 5.4.1, we know that the Householder method gives us

0 ;1:4142 ;2:1213 ;2:8284 1

B C

R = B @ 0 1:2247 1:6330 C A;

0 0 ;0:5774

0 0 ; p1 ; p1 1 0 0 ;0:7071 ;0:7071 1

B p1 1 2 2 C B C

H1 = B @ ; 2 2 ; 12 CA = B@ ;0:7071 0:5000 ;0:5000 CA ;

; p12 ; 12 12 ;0:7071 ;0:5000 0:5000

01 0 0

1

BB C

H2 = @ 0 ;0:1691 ;0:9856 C A

0 ;0:9856 0:1691

Compute b0:

021

B CC

y1 = b = B

@6A

3

0 0 ; p1 ; p1 1 0 2 1 0 ; p9 1 0 ;6:3640 1

B 2 2 CB C B

y2 = H1y1 = @ ; p2 p2 ; p2 A @ 6 A = @ p2 C

B C B C B

2C B C

1 1 1 1

A = B@ 0:0858 CA

; p12 ; p12 p12 3 ; p52 ;2:9142

259

0 ;6:3640 1

B C

b0 = y3 = H2y2 = B

@ 2:8560 CA

;0:5770

(Note that b0 above has been computed without explicitly forming the matrix Q.)

Solve:

Rx = b0

011

B 1 CC :

x = B

@ A

1

Flop-Count. If the Householder method is used to factor A into QR, the solution of

Ax = b requires 23 n3 + O(n2)
ops (Chapter 5, Section 5.4.1); on the other hand, the Givens

rotations technique requires about twice as many (Chapter 5, Section 5.5.1).

Round-o Property

We know from Chapter 5 that both the Householder and the Givens methods for QR factor-

ization of A are numerically stable. The back-substitution process for solving an upper triangular

system is also numerically stable (Section 3.1.3). Thus, the method for solving Ax = b using

QR factorization is likely to be stable. Indeed, this is so.

Round-o Error Result for Solving Ax = b using QR

It can be shown (Lawson and Hanson (SLP p. 92)) that the computed solution x^ is

the exact solution of

(A + E )^x = b + b;

where kE kF (3n2 + 41n)kAkF + O(2); and kbk (3n2 + 40n)kbk + O(2).

6.4.6 Solving Linear System with Right Multiple Hand Sides

Consider the problem

AX = B

260

where B = (b1; :::; bm) is an n m matrix (m n) and X = (x1; x2; :::; xm). Here bi , and

xi ; i = 1; :::; m are n-vectors.

The problem of this type arises in many practical applications (one such application has been

considered in Section 6.4.7: Computing the Frequency Response Matrix).

Once the matrix A has been factorized using any of the methods described in Chapter 5, the

factorization can be used to solve the m linear systems above. We state the procedure only with

partial pivoting.

Step 1. Factorize: MA = U , using Gaussian elimination with partial pivoting.

Step 2. Solve the m upper triangular systems:

Uxi = b0i = Mbi ; i = 1; :::; m:

Example: Solve AX = B where

01 2 41 01 21

B C B CC

A=B @ 4 5 6 CA ; B=B

@3 4A;

7 8 9 5 6

07 8 9 1 00 0 11

B 0 6 19 CC ;

U =B

B 1 0 ; 1 CC

M =B

@ 7 7A @ 7A

0 0 ;21 ;2 1 ;2

1 1

Solve: 0 5 1

B 0:2857 CC

Ux1 = Mb1 = B

@ A

0

0 0:3333 1

B C

x1 = B

@ 0:3333 CA

0

261

Solve: 0 6 1 0 ;0:6667 1

B 1:1429 CC ;

Ux2 = Mb2 = B

B 1:3333 CC :

x2 = B

@ A @ A

0 0

6.4.7 Special Systems

In this subsection we will study some special systems. They are

(1) Symmetric positive denite systems.

(2) Hessenberg systems.

(3) Diagonally Dominant Systems.

(4) Tridiagonal and block tridiagonal systems.

We have seen in Section 6.3 that these systems occur very often in practical applications such

as in numerical solution of partial dierential equations, etc. Indeed, it is very often said by

practicing engineers that there are hardly any systems in practical applications which

are not one of the above types. These systems therefore deserve a special treatment.

Symmetric Positive Denite Systems

First, we show that for a symmetric positive denite matrix A there exists a unique

factorization

A = HH T

where H is a lower triangular matrix with positive diagonal entries. The factorization is

called the Cholesky Factorization, after the French engineer Cholesky.

The existence of the Cholesky factorization for a symmetric positive denite matrix A can be

seen either via LU factorization of A or by computing the matrix H directly from the above relation.

To see this via LU factorization, we note that A being positive denite and, therefore having

positive leading principal minors, the factorization

A = LU

is unique. The upper triangular matrix U can be written as

U = DU1

Andre-Louis Cholesky (1875-1918) served as an ocer in the French military. His work there involved geodesy

and surveying.

262

where

D = diag(u11; u22; : : :unn)

= diag(a11; a(1) (n;1)

22 : : :ann );

A = LDU1:

Since A is symmetric, from above, we have

U1T DLT = LDU1

or

D = (U1T );1LDU1 (LT );1 :

The matrix (U1T );1L is a unit lower triangular matrix and the matrix U1(LT );1 is unit upper

triangular. It therefore follows from above that

(U1T );1L = U1 (LT );1 = I

that is,

U1 = LT ;

so A can be written as

A = LDLT ;

where L is unit lower triangular. Since the leading principal minors of A are a11; a11a(1) (1)

22 ; : : :; a11a22

: : :a(nnn;1); A is positive denite i the pivots a11 ; a(1) (n;1)

22 ; : : :; ann are positive. This means that when

A is positive denite the diagonal entries of D are positive and, therefore, we can write

D = D1=2 D1=2

where p q q

(n;1)

D1=2 = diag a11; a(1)

22 ; : : :; ann :

So,

A = LDLT = LD1=2D1=2LT = H H T

Note that the diagonal entries of H = LD1=2 are positive.

The above discussion can be summarized in the following theorem:

263

Theorem 6.4.1 (The Cholesky Factorization Theorem) Let A be a symmetric

positive denite matrix. Then A can be written uniquely in the form

A = HH T ;

where H is a lower triangular matrix with positive diagonal entries. An explicit

expression for H is given by

H = LD1=2;

where L is the unit lower triangular matrix in the LU factorization of A obtained

by Gaussian elimination without pivoting and

D1=2 = diag u111=2; : : :; u1nn=2 :

The above constructive procedure suggests the following algorithm to compute the Cholesky

factorization of a symmetric positive denite matrix A:

Algorithm 6.4.3 Gaussian Elimination for the Cholesky Factorization

Step 1. Compute the LU factorization of A using Gaussian elimination without pivoting.

Step 2. Form the diagonal matrix D from the diagonal entries of U :

D = diag(u11; u22; : : :; unn):

Example 6.4.5

Find the Cholesky factorization of A using Gaussian elimination without pivoting.

2 3

!

A =

3 5

1 0

!

;

L = M1 = 3

1

;2 1

!

1 0 2 3

! 2 3!

A = M1A = 3

(1) =

;2 1 3 5 0 12

264

2 3

!

U = 1 ; D = diag(2; 12 )

0 2

1 0

!

L = 3

2 1

1 0

! p2 0 ! p2 0 !

H = 3 1 1 = 3 1

2 0 p 2

p2 p2

Verify p ! p2 ! !

2 0 p32 2 3

HH T = p32 p12 p12

= = A:

0 3 5

Stability of Gaussian Elimination for the Cholesky Factorization

We now show that Gaussian elimination without pivoting is stable for symmetric positive def-

inite matrices by exhibiting some remarkable invariant properties of symmetric positive denite

matrices. The following example illustrates that even when there is a small pivot, Gaussian

elimination without pivoting does not give rise to the growth in the entries of the

matrices A(k). Let !

0:00003 0:00500

A= :

0:00500 1:0000

There is only one step. The pivot entry is 0.00003. It is small. The multiplier m21 is large:

m21 = ; aa21 = ; 00::00003

00500 = ; 500 :

3

11

But

0 : 00003 0 :00500

!

A(1) = :

0 0:166667

The entries of A(1) did not grow. In fact, max(a(1)ij ) = 0:166667 < max(aij ) = 1. This interesting

phenomenon of Gaussian elimination without pivoting applied to the 2 2 simple example (positive

denite) above can be explained through the following result.

let A(k) = (aij(k) ) be the reduced matrices obtained by applying Gaussian elimination

without pivoting to A. Then

1. each A(k); k = 1; : : :; n ; 1, is symmetric positive denite,

2. max ja(ijk)j max ja(ijk;1)j; k = 1; 2; : : :; n ; 1.

265

Proof. We prove the results just for the rst step of elimination, because the results for the other

steps then follow inductively.

After the rst step of elimination, we have

0a a a1n 1 0 a11 a12 a1n 1

BB 011 a(1)

12

C BB 0 CC

2n C

a(1)

A(1) = BBB .. 22.. . . . ... C

C B

CA = BB@ ...

CC :

CA

@ . . B

0 a(1)

n2 a(1)

nn 0

To prove that A(1) is positive denite, consider the quadratic form:

X

n X

n X n

n X

xT Bx = a(1)

ij xi xj = aij ; aia1aj 1 xixj

i=2 j =2 i=2 j =2 11

Xn X n X

n !2

= aij xi xj ; a11 x1 + aai1 xi

i=1 j =1 i=2 11

If A(1) is not positive denite, then there will exist x2 ; : : :; xn such that

X

n X

n

ij xi xj 0:

a(1)

i=1 j =1

With these values of x2 through xn, if we dene

X

n a

i1

x1 = ; xi ;

i=2 a11

then the quadratic form

X

n X

n

aij xixj 0

i=1 j =1

which contradicts the fact that A is positive denite. Thus A(1) is positive denite.

Also, we note

ii aii ; i = 1; 2; : : :; n;

a(1)

for

0 a(1) ai1 a (because a > 0).

2

ii = aii ; ii

a11 11

is less than or equal to the corresponding diagonal entry of

A. Since the largest element (in magnitude) of a symmetric positive denite matrix lies on the

diagonal, we have max ja(1)

ij j max jaij j.

From Theorem 6.4.2, we immediately conclude that if jaij j 1; then ja(ijk)j 1. This

means that the growth factor in this case is 1.

266

The Growth Factor and Stability of Guassian Elimination for a Positive

Denite Matrix

The growth factor of Gaussian elimination without pivoting for a sym-

metric positive denite matrix is 1. Thus, Gaussian elimination without

pivoting for a symmetric positive denite matrix is stable.

Example 6.4.6

05 1 11

B C

A = B@ 1 1 1 CA

0 15 11 51 1

B 4 4 CC

A(1) = B

@0 5 5 A:

0 4 24

5 5

The leading principal minors of A(1) are 5, 4, 16. Thus A(1) is symmetric positive denite. Also,

max(a(1) 24

ij ) = 5 < max(aij ) = 5:

05 1 11

B 4 4 CC

A(2) = B@0 5 5 A

0 0 4

The leading principal minors of A(2) are 5, 4, 16. Thus A(2) is also positive denite. Furthermore,

max(a(2) (1) 5

ij ) = 5 = max(aij ). The growth factor = 5 = 1.

We have seen in the beginning of this section that a symmetric matrix A having nonzero leading

principal minors can be written uniquely in the form

A = LDLT ;

where L is unit lower triangular and D has positive diagonal entries. Furthermore, this decomposi-

tion can be obtained in a numerically stable way using Gaussian elimination without pivoting.

267

In several circumstances one prefers to solve the symmetric positive denite system Ax = b directly

from the factorization A = LDLT , without computing the Cholesky factorization. The advantage

is that the process will not require computations of any square roots. The process then

is as follows:

Gaussian Elimination for the Symmetric Positive Denite System Ax = b

Step 1. Compute the LDLT factorization of A:

A = LDLT :

Step 2. Solve

Lz = b

Step 3. Solve

Dy = z

Step 4. Solve

LT x = y

Example 6.4.7

2 3

! 5

!

A= ; b= :

3 5 8

Step 1. ! !

1 0 2 0

L= 3 ; D= :

2 1 0 21

Step 2. Solve Lz = b

1 0

! z! 5

!

1

= :

3 1

2 z2 8

z1 = 5; z2 = 1 21

Step 3. Solve Dy = z

2 0

! y! 5

!

1

=

0 21 y2 13

2

268

y1 = 52

y2 = 1

Step 4. Solve LT x = y

1 32

! x! 5 !

1 2

=

0 1 x2 1

x2 = 1

x2 = 1

We now show how the Cholesky factorization can be computed directly from A = HH T . >From

A = HH T

0h 0 0 10h h h 1

0a a a1n 1 B 11

C B 11 21 n1

C

BB 11 12 CC B h21 h22 0 0 C B 0 h22 hn2 CC

B C B

BB a.21 a22 . a2n C BB .. . . CC BB .. . . CC

B@ .. .. CC = B . . CC BB . . CC

A B B ..

. ... CA B@ ..

. ... CA

an1 an2 ann @

hn1 hnn 0 0 hnn

we have

h11 = pa11; hi1 = hai1 ; i = 2; : : :; n

11

X

i X

j

h2ik = aii; aij = hik hjk; j < i:

k=1 k=1

This leads to the following algorithm, known as the Cholesky Algorithm.

Algorithm 6.4.4 Cholesky Algorithm

Given an n n symmetric positive denite matrix A, the following algorithm computes the

Cholesky factor H . The matrix H is computed row by row and is stored in the lower triangular

part of A.

For k = 1; 2; : : :; n do

*This algorithm in some applications (such as in statistics) is known as the square-root algorithm. A square-

root free algorithm, however, can be developed.

269

For i = 1; 2; : : :; k ; 1 do

aki hki = h1 aki ; Pij;=11 hij hkj

q ii

;1 h2

kj

Remark:

X

0

1. In the above pseudocode, ( ) 0. Also when k = 1, the inner loop is skipped.

j =1

2. The Cholesky factor H is computed row by row.

3. The positive deniteness of A will make the quantities under the square-root sign positive.

Round-o property. Let the computed Cholesky factor be denoted by H^ . Then, it can be

shown (Demmel 1989) that

A + E = H^ (H^ )T ;

where jeij j (n + 1) (aii ajj )1=2, and

1 ; (n + 1)

E = (eij ):

Thus, the Cholesky Factorization Algorithm is Stable

Having the Cholesky factorization A = HH T at hand, the positive denite linear system Ax = b

can now be solved by solving the lower triangular system Hy = b rst, followed by the upper

triangular system H T x = y .

Algorithm 6.4.5 The Cholesky Algorithm for the Positive Denite System Ax = b

Step 1. Find the Cholesky factorization of

A = HH T ;

using Algorithm 6.4.4.

Step 2. Solve the lower triangular system for y:

Hy = b:

270

Step 3. Solve the upper triangular system for x:

H T x = y:

Example 6.4.8

Let 01 1 1 1 031

B 1 5 5 CC ;

A=B

B 11 CC :

b=B

@ A @ A

1 5 14 20

A. The Cholesky Factorization

1st row: (k = 1)

h11 = 1:

2nd row: (k=2)

h21 = ha21 = 1

q11 p

h22 = a22 ; h221 = 5 ; 1 = 2

(Since the diagonal entries of H have to be positive, we take the + sign.)

3rd row: (k=3)

h31 = ah31 = 1

n

h32 = h (a32 ; h21h31) = 12 (5 ; 1) = 2

1

q22 p p

h33 = a33 ; (h231 + h232) = 14 ; 5 = 9

(we take the + sign)

h033 = +3 1

1 0 0

B C

H=B

@ 1 2 0 CA

1 2 3

271

B. Solution of the Linear System Ax = b

(1) Solution of Hy = b

01 0 01 0y 1 0 3 1

B

B 1 2 0

CC BB y1 CC = BB 11 CC

@ A @ 2A @ A

1 2 3 y3 20

y1 = 3; y2 = 4; y3 = 3

(2) Solution of H T x = y

01 1 11 0x 1 031

BB 0 2 2 CC BB x1 CC = BB 4 CC

@ A @ 2A @ A

0 0 3 x3 3

x3 = 1; x2 = 1; x3 = 1;

Flop-Count. The Cholesky algorithm requires n6
ops to compute H; one half of the number

3

of
ops required to do the same job using LU factorization. Note that the process will also 2require

n square roots. The solution of each triangular system Hy = b and H T x = y requires n2
ops.

Thus the solution of the positive denite system Ax = b using the Cholesky algorithm

n 3

requires 6 + n
ops and n square roots.

2

Round-o property. If x^ is the computed solution of the system Ax = b using the Cholesky

algorithm, then it can be shown that x^ satises

(A + E )^x = b

where kE k2 ckAk2; and c is a small constant depending upon n. Thus the Cholesky algo-

rithm for solving a symmetric positive denite system is quite stable.

Relative Error in the Solution by the Cholesky Algorithm

Let x^ be the computed solution the symmetric positive denite system of Ax = b

using the Cholesky algorithm followed by triangular systems solutions as described

above, then it can be shown that

kx ; x^k2 Cond(A):

kx^k2

(Recall that Cond(A) = kAk kA;1k.)

272

Remark: Demmel (1989) has shown that the above bound can be replaced by O()Cond(A~),

where A~ = D;1AD;1 ; D = diag(pa11; : : :; pann). The latter may be much better than the previous

one, since Cond(A~) may be much smaller than Cond(A). (See the discussions on conditioning and

scaling in Section 6.5.)

Hessenberg System

Consider the linear system

Ax = b;

where A is an upper Hessenberg matrix of order n. If Gaussian elimination with partial pivoting

is used to triangularize A; and if jaij j 1; then ja(ijk)j k + 1; (Wilkinson AEP p. 218). Thus we

can state the following:

System

The growth factor for a Hessenberg matrix using Gaussian elimination with partial

pivoting is bounded by n. Thus a Hessenberg system can be safely solved

using partial pivoting.

Flop-count: It requires only n2
ops to solve a Hessenberg system, signicantly less than

n3

3
ops required to solve a system with an arbitrary matrix. This is because at each step of

elimination during triangularization process, only one element needs to be eliminated and since

n 2

there are (n ; 1) steps, the triangularization process requires about
ops. Once we have the

2

factorization

MA = U;

the upper triangular system

Ux = Mb = b0

can be solved in n
ops. Thus a Hessenberg system can be solved with only n2
ops in

2

2

a stable way using Gaussian elimination with partial pivoting.

Example 6.4.9

273

Triangularize 0 1 2 31

B C

A=B

@ 2 3 4 CA

0 5 6

using partial pivoting.

Step 1.

00 1 01

B C

P1 = B

@ 1 0 0 CA

0 0 1

02 3 41

B 1 2 3 CC

P1 A = B

@ A

0 5 6

1 0

!

c1 =

M

; 21 1

0 1 0 01

B 1 C

M1 = B

@ ; 2 1 0 CA

0 0 1

0 1 0 0102 3 41 02 3 41

B 1 CB C B C

@ ; 2 1 0 CA B@ 1 2 3 CA = B@ 0 21 1 CA :

M1P1A = A(1) = B

0 0 1 0 5 6 0 5 6

Step 2.

01 0 01

B C

P2 = B

@ 0 0 1 CA

0 1 0

02 3 41

P2A(1)

B0 5 6C

= B C

@ A

0 1 1

2

1 0

!

c2 =

M

; 101 1

01 0 0

1

B0

M2 = B 1 0C

C

@ A

0 ; 101 1

274

01 0 0

10 2 3 4

1 0 2 3 4

1

B CC BB CC BB C

U = A(2) = M2P2A(1) = B

@0 1 0A@0 5 6A = @0 5 6C A

0 ; 101 1 1

0 2 1 2

0 0 5

00 1 0

1

B0

M = M2P2M1P1 = B 0 1 C

C

@ A

1 ; 21 ; 101

= max(66; 6; 6) = 1.

In control theory it is often required to compute the matrix

G(j! ) = C (j!I ; A);1B

for many dierent values of ! , to study the response of a control system. The matrices A; B; C here

are matrices of a control system and are of order n n; n m, and r n, respectively (m n).

The matrix G(j! ) is called the frequency response matrix.

Since computing (j!I ; A);1B is equivalent to solving m linear systems (see Section 6.5.1):

(j!I ; A)xi = bi; i = 1; : : :; m

3

where bi is the ith column of B ; for each ! , we need about n3 + mn2 + rnm ops to compute

G(j! ) (using Gaussian elimination with partial pivoting). Typically, G(j! ) needs to be computed

for a very large number of ! , and thus, such computation will be formidable.

On the other hand, if A is transformed initially to a Hessenberg matrix:

A = PHP T ;

then

G(j! ) = C (j!I ; H );1P T B:

The computation of (j!I ; H );1 P T B will now require solutions of m Hessenberg systems, each

of which will require only n2 ops. Then, for computing G(j! ) = C (j!I ; H );1P T B for each ! ,

there will be a saving of order O(n) for each ! . This count does not include reduction to Hessenberg

275

form. Note that the matrix A is transformed to a Hessenberg matrix once and the same

Hessenberg matrix is used to compute G(j!) for each !.

Thus, the computation that uses an initial reduction of A to a Hessenberg matrix

is much more ecient. Moreover, as we have seen before, reduction to Hessenberg form and

solutions of Hessenberg systems using partial pivoting are both stable computations. This approach

was suggested by Laub (1981).

in control theory. Numerically viable algorithms for important control problems such as controlla-

bility and observability, matrix equations (Lyapunov, Sylvester, Observer-Sylvester, Riccati, etc.)

routinely transform the matrix A to a Hessenberg matrix before actual computations start. Readers

familiar and interested in these problems might want to look into the book \Numerical Methods

in Control Theory" by B. N. Datta for an account of these methods. For references on the

individual papers, see Chapter 8. The recent reprint book \Numerical Linear Algebra Tech-

niques for Systems and Control," edited by R. V. Patel, Alan Laub and Paul van Dooren,

IEEE Press, 1994, also contains all the relevant papers.

Diagonally Dominant Systems

Recall that a matrix A = (aij ) is column diagonally dominant if

ja11j > ja21j + ja31j + + jan1j

ja22j > ja12j + ja32j + + jan2j

..

.

jannj > ja1nj + ja2nj + + jan;1;nj

A column diagonally dominant matrix, like a symmetric positive denite matrix, pos-

sesses the attractive property that no row-interchanges are necessary at any step

during the triangularization procedure for Gaussian elimination with partial pivoting.

The pivot element is already there in the right place. Thus, to begin with, at the rst step, a11 being

the largest in magnitude of all the elements in the rst column, no row-interchange is necessary.

At the end of the rst step, we then have

0a a a1n 1 0a a a 1

1n

BB 011 a(1)

12

C BB 011 12 CC

2n C

a(1)

A(1) = BBB .. 22.. C

.. C =BB .. 0C

C

@ . . . C

A @ B . A CA

0 a(1)

n2 a(1)

nn 0

276

and, it can be shown [exercise] that A0 is again column diagonally dominant and therefore a(1)22

is the pivot for the second step. This process can obviously be continued, showing that pivoting

is not needed for column diagonally dominant matrices. Furthermore, the following can be easily

proved.

Dominant Systems

The growth factor for a column diagonally dominant matrix with partial pivoting

is bounded by 2 (exercise #16): 2.

Thus, for column diagonally dominant systems, Gaussian elimination with

partial pivoting is stable.

Example 6.4.10

0 5 1 11

B 1 5 1 CC :

A=B

@ A

1 1 5

Step 1. 05 1 1 1

B 0 24 4 CC

A(1) = B

@ 5 5A

0 4 25

5 5

Step 2. 05 1 1 1

B 24 4 CC

A(2) = B

@0 5 5 A:

0 0 14

3

The growth factor = 1.

(Note that for this example, the matrix A is column diagonally dominant and positive denite;

thus = 1.)

The next example shows that the growth factor of Gaussian elimination for a col-

umn diagonally dominant matrix can be greater than 1, but is always less than 2.

Example 6.4.11

277

5 ;8

!

A= :

1 10

5 ;8

!

A =

(1) :

0 11:6

The growth factor = max(10; 11:6) = 11:6 = 1:16.

10 10

Tridiagonal Systems

The LU factorization of a tridiagonal matrix T , when it exists, may yield L and U having

very special simple structures: both bidiagonal, L having 1's along the main diagonal and the

superdiagonal entries of U the same as those of T . Specically, if we write

0a b1 0

1

BB c1 ... ... .. C

. C

T =BBB ..2 ... ... b C

C

@. n;1 C

A

0 cn an

01 10u b1 1

B

B `2 ... 0 CC BB 1 ... ... 0 CC

B

B ... ...

CC BB ... ...

CC

B

=B CC BB CC :

B CC BB C

B

@ 0 ... ... A@ 0 ... bn;1 C A

`n 1 un

By equating the corresponding elements of the matrices on both sides, we see that

a1 = u1

ci = `i ui;1; i = 2; : : :; n

ai = ui + `i bi;1; i = 2; : : :; n;

from which f`ig and fui g can be easily computed:

Computing LU Factorization of a Tridiagonal Matrix

u1 = a1

For i = 2; : : :; n do

`i = ci

ui;1

ui = ai ; `ibi;1.

278

Flop-count: The above procedure only takes (2n-2)
ops.

Solving a Tridiagonal System

Once we have the above simple factorization of T; the solution of the tridiagonal

system Tx = b can be found by solving the two special bidiagonal systems:

Ly = b

and

Ux = y

.

Flop-count: The solutions of these two bidiagonal systems also require (2n-2)
ops. Thus,

a tridiagonal system can be solved by the above procedure in only 4n-4
ops, a very

cheap procedure indeed.

Stability of the Process: Unfortunately, the above factorization procedures breaks down if

any ui is zero. Even if all ui are theoretically nonzero, the stability of the process in general

cannot be guaranteed. However in many practical situations, such as in discretizing Poisson's

equation etc., the tridiagonal matrices are symmetric positive denite, in which cases, as we

have seen before, the above procedure is quite stable.

In fact, in the symmetric positive denite case, this procedure should be preferred

over the Cholesky-factorization technique, as it does not involve computations of any square

roots. It is true that the Cholesky factorization of a symmetric positive denite tridiagonal matrix

can also be computed in O(n)
ops however, an additional n square roots have to be computed

(see, Golub and Van Loan MC 1984, p. 97).

In the general case, to maintain the stability, Gaussian elimination with partial

pivoting should be used.

If jaij; jbij, jcij 1 (i ; 1; : : :; n), then it can be shown (Wilkinson AEP p. 219) that the entires

of A(k) at each step will be bounded by 2.

279

Growth Factor and Stability of Gaussian Elimination for a Tridiagonal

System

The growth factor for Gaussian elimination with partial pivoting for a tridiagonal

matrix is bounded by 2:

2:

Thus, Gaussian elimination with partial pivoting for a tridiagonal system

is very stable.

The
op-count in this case is little higher; it takes about 7n
ops to solve the system Tx = b (3n

ops for decomposition and 4n for solving two triangular systems), but still an O(n) procedure.

If T is symmetric, one naturally wants to take advantage of the symmetry; however, Gaussian

elimination with partial pivoting does not preserve symmetry. Bunch (1971,1974) and

Bunch and Kaufman (1977) have proposed symmetry-preserving algorithms. These algorithms can

be arranged to have
op-count comparable to that of Gaussian elimination with partial pivoting

and require less storage than the latter. For details see the papers by Bunch and Bunch and

Kaufman.

Example 6.4.12 Triangularize

0 0:9 0:1 0 1

B C

A=B

@ 0:8 0:5 0:1 CA ;

0 0:1 0:5

using (i) the formula A = LU and (ii) Gaussian elimination.

i. From A = LU

u1 = 0:9

i=2:

`2 = uc2 = 00::89 = 98 = 0:8889

1

u2 = a2 ; `2b1 = 0:5 ; 89 0:1 = 0:4111

280

i=3:

`3 = uc3 = 00::41

1 = 0:2432

2

u3 = a03 ; `3b2 = 0:5 ; 0:24 10:1 = 0:4757

1 0 0

B C

L = B

@ 0:8889 1 0C A;

0 0:90 0:01:2432 00:1 1

B C

U = B

@ 0 0:4111 0:1 CA

0 0 0:4757

0 0:9 0:1 0 1

B C

A(1) = B

@ 0 0:4111 0:1 CA

0 0:1 0:5

0 0:9 0:1 0

1

BB CC

A = @ 0 0:4111 0:1

(2)

A=U

0 0 1 0 0 0:4757

0

1 0 1 0 0

1

B C B C

L = B

@ ;m21 1 0 CA = B@ 0:8889 1 0 CA :

0 ;m32 1 0 0:2432 1

In this section we consider solving the block tridiagonal system:

Tx = b;

where T is a block tridiagonal matrix and b = (b1; b2; : : :; bn)T is a block vector. The number of

components of the block vector bi is the same as the dimension of the ith diagonal block matrix in

T.

281

A. Block LU Factorization

The factorization procedure given in the beginning of this section may be easily extended to

the case of the Block Tridiagonal Matrix

0A B1 1

1

B

B C2 ... ... 0 CC

B

B ... ... ...

CC

T =B

B CC :

B C

B

@ 0 ... ... BN ;1 C A

CN AN

Thus if T has the block LU factorization:

0I 10U B1 01

BB L ... 0

CC BB 1 ... ... CC

..

.

BB 2 ... ...

CC BB ... ...

CC

..

T =B BB CC BB .CC = LU;

CC BB .. C

B@ 0 ... ... A@ .

... BN ;1 C A

LN I 0 UN

then the matrices Li ; i = 2; : : :; N and Ui ; i = 1; N can be computed as follows:

Algorithm 6.4.6 Block LU Factorization

Set

U1 = A1

For i = 2; : : :; N do

(1) Solve for Li :

UiT;1 Li = Ci

(2) Compute Ui:

Ui = Ai ; LiBi;1 :

Once we have the above factorization, we can nd the solution x of the system

Tx = b

by solving Ly = b and Ux = y successively. The solution of Ly = b can be achieved by Block

Forward Elimination, and that Ux = y can be achieved by Block Back Substitution.

282

Algorithm 6.4.7 Block Forward Elimination

Set L1y0 = 0.

For i = 1; : : :; n do

yi = bi ; Liyi;1.

Set BN xN +1 = 0.

For i = N; : : :; 1 do

Ui xi = yi ; Bixi+1

Example 6.4.13

0 4 ;1 1 0 1 041

BB C BB CC

B ;1 4 0 1 C C BB 4 CC

A=B C

B@ 1 0 2 ;1 CA ; b = B@ 2 CA

0 1 ;1 2 2

4 ;1

! 2 ;1

! 1 0

!

A1 = ; A2 = ; B1 =

;1 4 ;1 2 0 1

4

! 2

!

b1 = ; b2 =

4 2

Block LU Factorization

!

4 ;1

Set U1 = A1 = :

;1 4

i=2:

(1) Solve for L2:

1 0

!

U1L2 = I2 =

0 1

0:2667 0:0667

!

L2 = U1;1 =

0:667 0:2667

283

(2) Compute U2 from

U2 = A2 ; L2 B1! ! !

2 ;1 0:2667 0:0667 1 0

= ;

;1 2 0:0667

! 0:2667 0 1

1:7333 ;1:0667

=

;1:0667 1:7333

Block Forward Elimination

4

!

y1 = b1 ; L1 y0 = b1 =

4

0:6667

!

y2 = b2 ; L2 y1 =

0:6667

Block Back Substitution

0:6667

!

U2x2 = y2 ; B2x3 = y2 = (B2x3 = 0)

0:6667

0:6667

! 0:9286 0:5714 ! 0:6666 ! 1 !

x2 = U2; 1 = =

0:6667 0:5714 0:9286 0:6667 1

4

! 1

! 3

!

U1x1 = y1 ; B1x2 = ; =

4 1 3

3

! !

0:2667 0:0667 3

! 1!

x1 = U1;1 = = :

3 0:0667 0:2667 3 1

Frequently in practice, the block tridiagonal matrix of a system may possess some special

properties that can be exploited to reduce the system to a single lower-order system by using a

technique called Block Cyclic Reduction. For details see the book by Golub and Van Loan (MC

1983 pp. 110{117) and the references therein.

6.4.8 Scaling

If the entires of the matrix A vary widely, then there is a possibility that a very small number

needs to be added to a very large number during the process of elimination. This can in uence the

284

accuracy greatly, because, \the big one can kill the small one." To circumvent this diculty,

often it is suggested that the rows of A are properly scaled before the elimination process begins.

The following simple example illustrates this.

Consider the system ! ! 106 !

10 106 x1

= :

1 1 x2 2

Apply now Gaussian elimination with pivoting. Since 10 is the largest entry in the rst column,

no interchange is needed. We have, after the rst step of elimination,

10 106

! x! 106

!

1

=

0 ;105 x2 ;105

1

!

which gives x2 = 1 x1 = 0. The exact solution, however, is . Note that the above system is

1

exactly equal to the system in Section 6.3.4, with the rst equation multiplied by 106. Therefore,

even choosing the false pivot 10 did not help us. However, if we scale the entries of the rst row

of the matrix by dividing it by 106 and then solve the system (after modifying the 1st entry of

b appropriately) using partial pivoting, we will then have the accurate solution, as we have seen

before.

Scaling of the rows of a matrix A is equivalent to nding an invertible diagonal matrix D1 so

that the largest element (in magnitude) in each row of D1;1A is about the same size. Once such D1

is found, the solution of the system

Ax = b

is found by solving the scaled system

~ = ~b

Ax

where

A~ = D1;1 A; ~b = D1;1b:

The process can be easily extended to scale both the rows and columns of A. Mathematically, this

is equivalent to nding diagonal matrices D1 and D2 such that the largest (in magnitude) element

in each row and column of D1;1AD2 lies between two xed numbers, say, 1 and 1, where is the

base of the number system. Once such D1 and D2 are found, the solution of the system

Ax = b

is obtained by solving the equivalent system

~ = ~b;

Ay

285

and then computing

x = D2y;

where

A~ = D1;1AD2

~b = D1;1 b:

The above process is known as equilibration (Forsythe and Moler, CSLAS, pp. 44{45).

In conclusion, we note that scaling or equilibration is recommended in general, and

should be used only on an ad-hoc basis depending upon the data of the problem. \The

round-o error analysis for Gaussian elimination gives the most eective results when a matrix is

equilibrated." (Forsythe and Moler CSLAS, p. )

We have just seen that the method for solving Ax = b using the QR factorization is about twice as

expensive as the Gaussian elimination with partial pivoting method if Householder method is used

to factor A and, about four times as expensive as this method if Givens rotations are used. On

the other hand, the QR factorization technique is unconditionally stable, whereas, from theoretical

point of view, with Gaussian elimination with partial or complete pivoting, there is always some

risk involved. Thus, if stability is the main concern and the cost is not a major factor, one can

certainly use the QR factorization technique. However, considering both eciency and stability

from a practical point of view, it is currently agreed that Gaussian elimination with partial

pivoting is the most practical approach for solution of Ax = b. If one really insists on

using an orthogonalization technique, certainly the Householder method is to be preferred over the

Givens method. Furthermore, Gaussian elimination without pivoting should not be used

unless the matrix A is symmetric positive denite or diagonally dominant.

We summarize the above discussion in the following two tables.

286

TABLE 6.1

(COMPARISON OF DIFFERENT METHODS

FOR LINEAR SYSTEM PROBLEM WITH ARBITRARY MATRICES)

(APPROX.)

Gaussian Elimination

Without Pivoting n3 Arbitrary Unstable

3

Gaussian Elimination

With Partial n3 (+O(n2 ) 2n;1 Stable in practice

3

Pivoting comparisons)

Gaussian Elimination

With Complete n3 (+O(n3 ) fn 21 3 21 n(1=n;1)g1=2 Stable

3

Pivoting comparisons)

QR Factorization

Using Householder 2n3 + (n None Stable

3

Transformations square roots)

QR Factorization

Using Givens 4n3 + ( n2 None Stable

3 2

Rotations square roots)

287

TABLE 6.2

(COMPARISON OF DIFFERENT METHODS FOR LINEAR

SYSTEM PROBLEM WITH SPECIAL MATRICES)

FLOP-COUNT GROWTH

MATRIX TYPE METHOD (APPROX.) FACTOR STABILITY

3

Symmetric Positive 1) Gaussian Elimination 1) n3 1) = 1

Denite without Pivoting Stable

3

1) Cholesky 1) n6 + (n 1) None

square roots)

Diagonally Gaussian Elimination

Dominant with partial n3 2 Stable

3

pivoting

Hessenberg Gaussian Elimination

with Partial n2 n Stable

Pivoting

Tridiagonal Gaussian Elimination O(n) 2 Stable

with Partial Pivoting

Associated with the problem of solving the linear system Ax = b are the problems of nding the

determinant, the inverse and the leading principal minors of the matrix A. In this section

we will see how these problems can be solved using the methods of various factorizations developed

earlier.

The inverse of a matrix A very seldom needs to be computed explicitly. Most computa-

tional problems involving inverses can be reformulated in terms of solution of linear systems. For

example, consider

1. A;1b (inverse times a vector)

2. A;1B (inverse times a matrix)

3. bT A;1 c (vector times inverse times a vector).

288

The rst problem, the computation of A;1 b; is equivalent to solving the linear system:

Ax = b:

Similarly, the second problem can be formulated in terms of solving sets of linear equations. Thus,

if A is of order n n and B is of order n m, then writing C = A;1 B = (c1; c2; : : :; cm). We see

that the columns c1 through cm of C can be found by solving the systems

Aci = bi; i = 1; : : :; m;

where bi ; i = 1; : : :; m are the successive columns of the matrix B .

The computation of bT A;1c can be done in two steps:

1. Find A;1c; that is, solve the linear system: Ax = c

2. Compute bT x.

As we will see later in this section, computing A;1 is three times as expensive as solving

the linear system Ax = b. Thus, all such problems mentioned above can be solved much more

eciently by formulating them in terms of linear systems rather than naively solving them using

matrix inversion.

The explicit computation of the inverse should be avoided whenever pos-

sible. A linear system should never be solved by explicit computation of

the inverse of the system matrix.

Having said that the most computational problems involving inverses can be reformulated in

terms of linear systems, let us remark that there are, however, certain practical applications where

the inverse of a matrix needs to be computed explicitly; and, in fact, the entries of the inverse

matrices in these applications have some physical signicance.

Example 6.5.1

Consider once again the spring-mass problem discussed in Section 6.3.3. The (i; j )th entry of

the inverse of the stiness matrix K tells us what the displacement of the mass i will be if a unit

external force is imposed on mass j . Thus the entries of K will tell us how the systems components

will respond to external forces.

289

Let's take a specic instance when the spring constants are all equal:

k1 = k2 = k3 = 1

Then 0 2 ;1 0 1 01 1 11

B ;1 2 ;1 CC ;

K=B

B 1 2 2 CC :

K ;1 = B

@ A @ A

0 ;1 1 1 2 3

Since the entries of the rst column of K ;1 are all 1's, it means that a downward unit load to the

rst mass will make a displacement of all the masses by 1 inch downward. Similar interpretations

can be given for the elements of the other columns of K ;1 .

Some Easily Computed Inverses

Before we discuss the computation of A;1 for an arbitrary matrix A, we note that the inverses

of some well-known matrices can be trivially computed.

(1) The inverse of the elementary lower triangular matrix M = I ; meTk is given by M ;1 = I + meTk

(2) The inverse of an orthogonal matrix Q is its transpose QT (note that a Householder matrix

and a permutation matrix are orthogonal).

(3) The inverse of triangular matrix T of one type is again a triangular matrix of the same type,

the diagonal entries being the reciprocals of the diagonal entries of the matrix T .

In many applications once the inverse of a matrix A is computed, it is required to nd the inverse

of another matrix B which diers from A only by a rank-one perturbation. The question naturally

arises if the inverse of B can be computed without starting all over again. That is, if the inverse of

B can be found using the inverse of A which has already been computed. The Sherman-Morrison

formula shows us how to do this.

290

The Sherman-Morrison Formula

If u and v are two n-vectors and A is a nonsingular matrix, then

(A ; uv T );1 = A;1 + (A;1uv T A;1 )

where

= (1 ; v T1A;1u) ; if v T A;1u 6= 1.

b) The Sherman-Morrison formula shows how to compute the inverse of the matrix

obtained from a matrix A by rank-one change, once the inverse of A has been computed,

without explicitly computing the inverse of the new matrix.

The Sherman-Morrison formula can be extended to the case where U and V are matrices. This

generalization is known as the Woodbury formula:

(5) (A ; UV T );1 = A;1 + A;1 U (I ; V T A;1 U );1V T A;1, if I ; V T A;1 U is nonsingular.

Example 6.5.2

Given

01 1 11

B C

A = B

@ 2 4 5 CA ;

6 7 8

0 ;3 ;1 1 1

B C

A;1 = B

@ 14 2 ;3 CA

;10 ;1 2

nd (A ; uv T );1, where u = v = (1; 0; 0)T .

= 1 ; vT1A;1 u = 41

0 ;3 ;1 1 1

; ; ; BB 74 43 41 CC

A + A uv A = @ 2 ; 2 2 A :

1 1 T 1

; 52 32 ; 12

291

Thus 0 ;3 1 1 1

B 47 ;43 14 CC

## Viel mehr als nur Dokumente.

Entdecken, was Scribd alles zu bieten hat, inklusive Bücher und Hörbücher von großen Verlagen.

Jederzeit kündbar.