
Algebra

M. Anthony, M. Harvey
MT1173

2013

Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a 100 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk

This guide was prepared for the University of London International Programmes by:
Martin Anthony, Professor of Mathematics, and Michele Harvey, Course Leader, Department of
Mathematics, London School of Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the authors are unable to enter into any correspondence relating to, or arising from, the guide. If you have any comments on this subject guide, favourable or unfavourable, please use the form at the back of this guide.

University of London International Programmes


Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
www.londoninternational.ac.uk
Published by: University of London
© University of London 2013
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher. We make every effort to
respect copyright. If you think we have inadvertently used your copyright material, please let
us know.

Contents

Preface

1 Introduction
  1.1 This subject
    1.1.1 Aims of the course
    1.1.2 Learning outcomes
    1.1.3 Topics covered
  1.2 Reading
    1.2.1 Recommended reading
    1.2.2 Further reading
  1.3 Online study resources
    1.3.1 The VLE
    1.3.2 Making use of the Online Library
  1.4 Using the guide
  1.5 Examination
  1.6 The use of calculators
  1.7 Background
    1.7.1 Sets
    1.7.2 Numbers
    1.7.3 Basic notations
    1.7.4 Simple algebra
    1.7.5 Powers
    1.7.6 Quadratic equations
    1.7.7 Polynomial equations
    1.7.8 Comments on selected activities

2 Matrices and vectors
  Suggested reading
  Aims of the chapter
  2.1 What is a matrix?
  2.2 Matrix addition and scalar multiplication
  2.3 Matrix multiplication
  2.4 Matrix algebra
  2.5 Matrix inverses
    2.5.1 Properties of the inverse
  2.6 Powers of a matrix
    2.6.1 Properties of powers
  2.7 Transpose
    2.7.1 Symmetric matrices
  2.8 Vectors in Rⁿ
    2.8.1 The inner product of two vectors
  2.9 Developing geometric insight: vectors in R² and R³
    2.9.1 Vectors in R²
    2.9.2 Inner product
    2.9.3 Vectors in R³
  2.10 Lines
    2.10.1 Lines in R²
    2.10.2 Lines in R³
  2.11 Planes in R³
  2.12 Lines and hyperplanes in Rⁿ
    2.12.1 Hyperplanes
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

3 Systems of linear equations
  Suggested reading
  Aims of the chapter
  3.1 Systems of linear equations
  3.2 Row operations
  3.3 Gaussian elimination
    3.3.1 The algorithm: reduced row echelon form
    3.3.2 Consistent and inconsistent systems
    3.3.3 Linear systems with free variables
    3.3.4 Solution sets
  3.4 Homogeneous systems and null space
    3.4.1 Homogeneous systems
    3.4.2 Null space
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

4 Matrix inversion and determinants
  Suggested reading
  Aims of the chapter
  4.1 Matrix inverse using row operations
    4.1.1 Elementary matrices
    4.1.2 Row equivalence
    4.1.3 The main theorem
    4.1.4 Using row operations to find the inverse matrix
    4.1.5 Result on inverses
  4.2 Determinants
    4.2.1 Determinant using cofactors
    4.2.2 Determinant using row operations
    4.2.3 The determinant of a product
  4.3 Matrix inverse using cofactors
    4.3.1 Cramer's rule
  4.4 Application: Leontief input-output analysis
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

5 Rank, range and linear equations
  Suggested reading
  Aims of the chapter
  5.1 The rank of a matrix
  5.2 Rank and systems of linear equations
  5.3 General solution of a linear system in vector notation
  5.4 Range
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

6 Sequences, series and difference equations
  Suggested reading
  Aims of the chapter
  6.1 Sequences
    6.1.1 Sequences in general
    6.1.2 Arithmetic progressions
    6.1.3 Geometric progressions
    6.1.4 Compound interest
    6.1.5 Frequent compounding
  6.2 Series
    6.2.1 Arithmetic series
    6.2.2 Geometric series
  6.3 Finding a formula for a sequence
  6.4 Limiting behaviour
  6.5 Financial applications
  6.6 First-order difference equations
  6.7 Solving first-order difference equations
  6.8 Long-term behaviour of solutions
  6.9 The cobweb model
  6.10 Financial applications of first-order difference equations
  6.11 Homogeneous second-order difference equations
  6.12 Non-homogeneous second-order equations
  6.13 Behaviour of solutions
  6.14 Economic applications of second-order difference equations
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

7 Vector spaces
  Suggested reading
  Aims of the chapter
  7.1 Vector spaces
    7.1.1 Definition of a vector space
    7.1.2 Examples
  7.2 Subspaces
    7.2.1 An alternative characterisation of a subspace
  7.3 Subspaces connected with matrices
    7.3.1 Null space
    7.3.2 Range
  7.4 Linear span
    7.4.1 Lines and planes in R³
    7.4.2 Row space and column space
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

8 Linear independence, bases and dimension
  Suggested reading
  Aims of the chapter
  8.1 Linear independence
  8.2 Testing for linear independence in Rⁿ
  8.3 Basis
    8.3.1 Coordinates
    8.3.2 Dimension
    8.3.3 Dimension and bases of subspaces
  8.4 Finding a basis for a linear span in Rⁿ
  8.5 Basis and dimension of range and null space
  8.6 Learning outcomes
  8.7 Exercises
  8.8 Comments on selected activities
  8.9 Comments on exercises

9 Linear transformations, change of basis
  Suggested reading
  Aims of the chapter
  9.1 Linear transformations
  9.2 Examples
  9.3 Linear transformations and matrices
    9.3.1 Rotation in R²
    9.3.2 Identity and zero linear transformations
    9.3.3 Composition and combinations of linear transformations
    9.3.4 Inverse linear transformations
    9.3.5 Linear transformations from V to W
  9.4 Range and null space
  9.5 Rank and nullity
  9.6 Coordinate change
  9.7 Change of basis and similarity
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

10 Diagonalisation
  Suggested reading
  Aims of the chapter
  10.1 Eigenvalues and eigenvectors
    10.1.1 Definitions
    10.1.2 Finding eigenvalues and eigenvectors
    10.1.3 Eigenspaces
  10.2 Diagonalisation of a square matrix
    10.2.1 Diagonalisation: when can it fail?
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

11 Applications of diagonalisation
  Suggested reading
  Aims of the chapter
  11.1 Powers of matrices
  11.2 Systems of difference equations
    11.2.1 Systems of difference equations
    11.2.2 Solving by change of variable
    11.2.3 Solving using matrix powers
    11.2.4 Markov chains
  Learning outcomes
  Exercises
  Comments on selected activities
  Comments on exercises

A Sample examination paper

B Commentary on the Sample examination paper

Preface
This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. Where coverage in the main texts is weak, it provides some
additional background material. Further reading is essential.
We are very grateful to James Ward and Keith Martin for their careful readings of a
draft of this guide and for their many helpful comments.


Chapter 1
Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.

1.1 This subject


Algebra as studied in this course is primarily what is usually called linear algebra:
the study of matrices, systems of linear equations, eigenvalues and eigenvectors,
diagonalisation of matrices, and related topics. However, we also, in this course, study
sequences, series and difference equations.
Our approach here is not just to help you acquire proficiency in techniques and
methods, but also to understand some of the theoretical ideas behind these. For
example, after completing this course, you will hopefully understand why the number of
free parameters in the set of solutions of a system of linear equations is linked with the
idea of the rank of the matrix that describes the system of equations. In addition to
this, we try to indicate the uses of some of the methods in applications to economics,
finance and related disciplines.

1.1.1 Aims of the course

The broad aims of this course are as follows:


to enable students to acquire skills in the methods of algebra, as required for their
use in further mathematics subjects and economics-based subjects
to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the skills that you should aim to acquire. The examination
will test not simply your ability to perform routine calculations, but will probe your
knowledge and understanding of the fundamental principles underlying the area.

1.1.2 Learning outcomes

We now state the broad learning outcomes of this course, as a whole. More specific
learning outcomes can be found at the end of each chapter.
At the end of this course and having completed the reading and activities you should
have:


used the concepts, terminology, methods and conventions covered in the course to
solve mathematical problems in this subject
the ability to solve unseen mathematical problems involving understanding of these
concepts and application of these methods
seen how algebra can be used to solve problems in economics and related subjects
the ability to demonstrate knowledge and understanding of the underlying
principles.
There are a couple of things we should stress at this point. First, note the intention that
you will be able to solve unseen problems. This means simply that you will be
expected to be able to use your knowledge and understanding of the material to solve
problems that are not completely standard. This is not something you should worry
unduly about: all mathematics topics expect this, and you will never be expected to do
anything that cannot be done using the material of this course. Second, we expect you
to be able to demonstrate knowledge and understanding and you might well wonder
how you would demonstrate this in the examination. Well, it is precisely by being able
to grapple successfully with unseen, non-routine, questions that you will indicate that
you have a proper understanding of the topic.

1.1.3 Topics covered

Descriptions of topics to be covered appear in the relevant chapters. However, it is


useful to give a brief overview at this stage.
We start by introducing the fundamental objects of study in linear algebra: these are
matrices and vectors. We examine the geometrical interpretation of vectors and use this
to understand how to find the equations of planes, for instance. Then we show how
certain very important types of equations, namely systems of simultaneous linear equations,
can be looked at from the perspective of matrices, and we develop systematic
techniques (using, especially, row operations) for solving such systems (or detecting
that no solutions exist). Continuing further our study of matrices, we look at the idea of
the inverse of a matrix and show that inverses can be determined in two distinct ways,
one of which uses row operations and another the important notion of the determinant
of a matrix. We then depart from linear algebra to look at sequences, series and
difference equations. This is an important topic in its own right, but also has many
applications to finance and economics. We will see later how linear algebra techniques
can be used to solve certain problems in difference equations, so it is not unrelated to
linear algebra. Next, we discuss basic vector space concepts such as subspaces, linear
independence, bases and dimension, and linear transformations. Finally, we study the
diagonalisation of matrices and some of its many applications.
Throughout, the emphasis is on the theory as much as on the methods. That is to say,
the aim in this course is not only to provide you with useful techniques and methods of
algebra, but to enable you to understand why these techniques work.


1.2 Reading
There are many books that would be useful for this subject. We recommend two in
particular, and another for additional, further reading. We want to emphasise, however,
that there are very many books suitable for this course. Almost any text on linear
algebra will cover the material (with the possible exception of the chapter on sequences,
series and difference equations).

1.2.1 Recommended reading

Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. (Cambridge


University Press, 2012) [ISBN 9780521279482].
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. (Cambridge: Cambridge University Press, 1996) [ISBN
9780521551137 (hardback); 9780521559138 (paperback)].

1.2.2 Further reading

Please note that as long as you read the Recommended reading you are then free to
read around the subject area in any text, paper or online resource. You will need to
support your learning by reading as widely as possible. To help you read extensively,
you have free access to the virtual learning environment (VLE) and University of
London Online Library (see below).
Another useful text for this course is the following (though, as mentioned above, there
are many suitable texts):

Anton, H. and C. Rorres. Elementary Linear Algebra (International Student


Version). (John Wiley & Sons (Asia) Plc Ltd, 2010) tenth edition. [ISBN 9780470561577].¹

Textbooks will provide more in-depth explanations than you will find in this guide, and
they will also provide many more examples to study and exercises to work through. The
books listed are the ones we have referred to in this guide.

1.3 Online study resources


In addition to the subject guide and the Essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
VLE and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at:
http://my.londoninternational.ac.uk
¹ There are many editions and variants of this book, such as the Applications version. Any one is equally useful and you will not need more than one of them. You can find the relevant sections cited in this guide in any edition by using the index.


You should receive your login details in your study pack. If you have forgotten these
login details, please click on the Forgotten your password link on the login page.

1.3.1 The VLE

The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:
Self-testing activities: Doing these allows you to test your own understanding of
subject material.
Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.
Past examination papers and Examiners' commentaries: These provide advice on
how each examination question might best be answered.
A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.
Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.
Recorded lectures: For some courses, where appropriate, the sessions from previous
years' Study Weekends have been recorded and made available.
Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.
Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.

1.3.2 Making use of the Online Library

The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login:
http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.


If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php

1.4 Using the guide


It is highly recommended that you read textbooks in conjunction with the guide and
that you try problems from the textbooks. The exercises at the end of the chapters of
this guide are a very useful resource. You should try them once you think you have
mastered a particular chapter. Really try them: don't just simply read the solutions
provided. Make a serious attempt before consulting the solutions. Note that the
solutions are often just sketch solutions, to indicate to you how to answer the exercises,
but in the examination, you must show all your calculations. It is vital that you
develop and enhance your problem-solving skills and the only way to do this is to try
lots of examples.
The exercises at the end of the chapters are part of the learning process. As such, they
are not all supposed to be sample examination questions. However, a Sample
examination paper is given as the final chapter to this guide. Some chapter exercises
will be easy or routine. Others will be more challenging, and others more challenging
still. These are there to help you learn and understand.

1.5 Examination
Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.
You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions. Remember, it is important to check the VLE for:
up-to-date information on examination and assessment arrangements for this course
where available, past examination papers and Examiners' commentaries for the
course which give advice on how each question might best be answered.
A Sample examination paper is given as the final chapter to this guide. There are no
optional topics in this subject: you should do them all. This is reflected in the structure
of the examination paper. There are five questions (each worth 20 marks) and all
questions are compulsory.
Please do not think that the questions in a real examination will necessarily be very
similar to those in the Sample examination paper. An examination is designed (by
definition) to test you. You will get examination questions unlike questions in this
guide. The whole point of examining is to see whether you can apply knowledge in
familiar and unfamiliar settings. The Examiners (nice people though they are) have an


obligation to surprise you! For this reason, it is important that you try as many
examples as possible, from the guide and from the textbooks. This is not so that you
can cover any possible type of question the Examiners can think of! It's so that you get
used to confronting unfamiliar questions, grappling with them, and finally coming up
with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.

1.6 The use of calculators


You will not be permitted to use calculators of any type in the examination. This is not
something that you should panic about: the examiners are interested in assessing that
you understand the key concepts, ideas, methods and techniques, and will set questions
which do not require the use of a calculator.

1.7 Background
This section discusses some very basic aspects of mathematics on which the rest of the
subject builds. It is essential to have a firm knowledge of these topics before the more
advanced topics can be understood.
It is assumed that you have studied most (if not all) of these topics in previous
mathematics courses and that nearly all of the material is revision, but don't worry if a
topic is new to you. We will mention the main results which you will need to know. If
you are unfamiliar with a topic, or if you find any of the topics difficult, then you should
look up that topic in any basic mathematics text. It is essential for you to consolidate
your background in basic algebra before attempting this course.

1.7.1 Sets

A set may be thought of as a collection of objects.² A set is usually described by listing or describing its members inside curly brackets. For example, when we write A = {1, 2, 3}, we mean that the objects belonging to the set A are the numbers 1, 2, 3 (or, equivalently, the set A consists of the numbers 1, 2 and 3). Equally (and this is what we mean by describing its members), this set could have been written as

A = {n | n is a whole number and 1 ≤ n ≤ 3}.

Here, the symbol | stands for 'such that'. Often, the symbol : is used instead, so that we might write

A = {n : n is a whole number and 1 ≤ n ≤ 3}.

As another example, the set

B = {x | x is a reader of this guide}

² See Anthony and Biggs, Section 2.1.


has as its members all of you (and nothing else). When x is an object in a set A, we write x ∈ A and say 'x belongs to A' or 'x is a member of A'.

The set which has no members is called the empty set and is denoted by ∅. The empty set may seem like a strange concept, but it has its uses.

We say that the set S is a subset of the set T, and we write S ⊆ T, or S ⊂ T, if every member of S is a member of T. For example, {1, 2, 5} ⊆ {1, 2, 4, 5, 6, 40}. The difference between the two symbols is that S ⊂ T literally means that S is a proper subset of T, meaning not all of T, and S ⊆ T means that S is a subset of T and possibly (but not necessarily) all of T. So in the example just given we could have also written {1, 2, 5} ⊂ {1, 2, 4, 5, 6, 40}.

Given two sets A and B, the union A ∪ B is the set whose members belong to A or B (or both A and B): that is,

A ∪ B = {x | x ∈ A or x ∈ B}.

Example 1.1 If A = {1, 2, 3, 5} and B = {2, 4, 5, 7}, then A ∪ B = {1, 2, 3, 4, 5, 7}.

Similarly, we define the intersection A ∩ B to be the set whose members belong to both A and B:³

A ∩ B = {x | x ∈ A and x ∈ B}.

Activity 1.1 Suppose A = {1, 2, 3, 5} and B = {2, 4, 5, 7}. Find A ∩ B.

³ See Anthony and Biggs for examples of union and intersection.

1.7.2 Numbers

There are some standard notations for important sets of numbers.⁴ The set R of real numbers may be thought of as the points on a line. Each such number can be described by a decimal representation.

The set of real numbers R includes the following subsets.

N, the set of natural numbers: N = {1, 2, 3, . . .}, also referred to as the positive integers.

Z, the set of integers: {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.

Q, the set of rational numbers: p/q with p, q ∈ Z, q ≠ 0; for example, 2/5, 9/2, 4/1 = 4.

The set of irrational numbers, that is, real numbers which are not rational; for example, √2, π.

These sets are related by: N ⊂ Z ⊂ Q ⊂ R.

Given two real numbers a and b, we define intervals such as

\[(a, b) = \{x \mid a < x < b\} \qquad [a, b] = \{x \mid a \leq x \leq b\}\]

⁴ See Anthony and Biggs, Section 2.1.


and combinations of these. For example, [a, b) = {x | a ≤ x < b}. The numbers a and b are called the endpoints of the interval. You should notice that when a square bracket, [ or ], is used to denote an interval, the number beside the bracket is included in the interval, whereas if a round bracket, ( or ), is used, the adjacent number is not in the interval. For example, [2, 3] contains the number 2, but (2, 3] does not. We can also indicate unbounded intervals, such as

\[(-\infty, b) = \{x \mid x < b\} \qquad [a, \infty) = \{x \mid a \leq x\}.\]

The symbol ∞ means 'infinity', but it is not a real number, merely a notational convenience.
The absolute value of a real number a is defined by

\[|a| = \begin{cases} a & \text{if } a \geq 0 \\ -a & \text{if } a < 0. \end{cases}\]

So the absolute value of a equals a if a is non-negative (that is, if a ≥ 0), and equals −a otherwise. For instance, |6| = 6 and |−2.5| = 2.5. (This is sometimes called the modulus of a.) Roughly speaking, the absolute value of a number is obtained just by ignoring any minus sign the number has. Note that

\[\sqrt{a^2} = |a|,\]

since by √x we always mean the positive square root, to avoid ambiguity. So the two solutions of the equation x² = 4 are x = ±2 (meaning x = 2 or x = −2), but √4 = 2.

The absolute value of real numbers satisfies the following inequality:

\[|a + b| \leq |a| + |b|, \qquad a, b \in \mathbb{R}.\]

This is called the triangle inequality.
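For instance, taking a = 3 and b = −5 gives

\[|a + b| = |-2| = 2, \qquad |a| + |b| = 3 + 5 = 8,\]

and indeed 2 ≤ 8. Equality holds whenever a and b have the same sign (or one of them is 0).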


Having defined R, we can define the set R² of ordered pairs (x, y) of real numbers. Thus R² is the set usually depicted as the set of points in a plane, x and y being the coordinates of a point with respect to a pair of axes. For instance, (−1, 3/2) is an element of R² lying to the left of and above (0, 0), which is known as the origin.

1.7.3 Basic notations

Although there is a high degree of standardisation of notation within mathematical texts, some differences do occur. The notation given here is indicative of what is used in the rest of this guide and in most of the texts.⁵ You should endeavour to familiarise yourself with as many of the common notations as possible. As an example, multiplication is sometimes denoted by a dot, as in a · b, rather than a × b. Beware of confusing multiplication and the use of a dot to indicate a decimal point. Even more commonly, one simply uses ab to denote the multiplication of a and b. Also, you should be aware of implied multiplications, as in 2(3) = 6.
Some other useful notations are those for sums and factorials. We denote the sum

\[x_1 + x_2 + \cdots + x_n\]

of the numbers x₁, x₂, . . . , xₙ by

\[\sum_{i=1}^{n} x_i.\]

The Σ indicates that numbers are being summed, and the i = 1 and n below and above the Σ show that it is the numbers xᵢ, as i runs from 1 to n, that are being summed together. Sometimes we will be interested in adding up only some of the numbers. For example,

\[\sum_{i=2}^{n-1} x_i\]

would denote the sum x₂ + x₃ + ⋯ + x_{n−1}, which is the sum of all the numbers except the first and last.

⁵ You may consult any of a large number of basic maths texts for further information on basic notations.
the first and last.
Example 1.2 Suppose that x₁ = 1, x₂ = 3, x₃ = −1, x₄ = 5. Then

\[\sum_{i=1}^{4} x_i = 1 + 3 + (-1) + 5 = 8, \qquad \sum_{i=2}^{4} x_i = 3 + (-1) + 5 = 7.\]

For a positive whole number n, n! (read as 'n factorial') is the product of all the integers from 1 up to n. For example, 4! = 1 × 2 × 3 × 4 = 24. By convention, 0! is taken to be 1.

Finally, we often use the symbol □ to denote the end of a proof, where we have finished explaining why a particular result is true. This is just to make it clear where the proof ends and the following text begins.

1.7.4 Simple algebra

You should try to become confident and capable in handling simple algebraic expressions and equations.

You should be proficient in:

collecting up terms: e.g. 2a + 3b − a + 5b = a + 8b.

multiplication of variables: e.g. (−a)(b) + (a)(−b) − 3(a)(b) + (2a)(4b) = −ab − ab − 3ab + 8ab = 3ab.

expansion of bracketed terms: e.g. (2x − 3y)(x + 4y) = 2x² − 3xy + 8xy − 12y² = 2x² + 5xy − 12y².

Activity 1.2 Expand (x − 1)(x + 1). Then use this to expand (x − 1)(x + 1)(x + 2).


1.7.5 Powers

When n is a positive integer, the nth power⁶ of the number a, denoted aⁿ, is simply the product of n copies of a; that is,

\[a^n = \underbrace{a \times a \times \cdots \times a}_{n \text{ times}}.\]

The number n is called the power, exponent, or index. We have the power rules (or rules of exponents)

\[a^r a^s = a^{r+s}, \qquad (a^r)^s = a^{rs},\]

whenever r and s are positive integers.

Activity 1.3 Prove the power rules above using the definition of aⁿ for n ∈ N.

The power a⁰ is defined to be 1.

The definition is extended to negative integers as follows. When n is a positive integer, a⁻ⁿ means 1/aⁿ. For example, 3⁻² is 1/3² = 1/9. The power rules hold when r and s are any integers, positive, negative or zero.

When n is a positive integer, a^{1/n} is the positive nth root of a; this is the positive number x such that xⁿ = a. For example, a^{1/2} is usually denoted by √a, and is the positive square root of a, so that 4^{1/2} = 2.

When m and n are integers and n is positive, a^{m/n} is (a^{1/n})^m. This extends the definition of powers to the rational numbers. The definition is extended to real numbers by filling in the gaps between the rational numbers, and it can be shown that the rules of exponents still apply.

⁶ See Anthony and Biggs, Section 7.1.
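To illustrate the rational-power rule, consider a = 8 with m = 2 and n = 3:

\[8^{2/3} = \left(8^{1/3}\right)^2 = 2^2 = 4, \qquad 8^{-2/3} = \frac{1}{8^{2/3}} = \frac{1}{4}.\]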
Activity 1.4 Simplify the expression:

\[\frac{49x^2}{35y} \cdot \frac{4xy^2}{(2xy)^3}.\]

Simplify, then solve for a:

\[6ab - \frac{a}{b}(b^2 - 4bc) = 1.\]

1.7.6 Quadratic equations

It is straightforward to find the solution of a linear equation, one of the form ax + b = 0 where a, b ∈ R. By a solution, we mean a real number x for which the equation is true (here, x = −b/a, provided a ≠ 0). A common problem is to find the set of solutions of a quadratic equation⁷

\[ax^2 + bx + c = 0,\]

where we may as well assume that a ≠ 0, because if a = 0 the equation reduces to a linear one. In some cases the quadratic expression can be factorised, which means that it can be written as the product of two linear terms. For example,

\[x^2 - 6x + 5 = (x - 1)(x - 5),\]

so the equation x² − 6x + 5 = 0 becomes (x − 1)(x − 5) = 0. Now the only way that two numbers can multiply to give 0 is if at least one of the numbers is 0, so we can conclude that x − 1 = 0 or x − 5 = 0; that is, the equation has two solutions, 1 and 5.

Activity 1.5 Use factorisation to find the solutions of each of these equations:

(a) x² − 4 = 0,  (b) x² + 2x − 8 = 0,  (c) 2x² − 7x + 3 = 0.

⁷ See Anthony and Biggs, Section 2.4.

Although factorisation may be difficult, there is a general method for determining the solutions to a quadratic equation using the quadratic formula,⁸ as follows. Suppose we have the quadratic equation ax² + bx + c = 0, where a ≠ 0. Then the solutions of this equation are:

\[x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.\]

The term b² − 4ac is called the discriminant.

If b² − 4ac > 0, the equation has two real solutions as given above.

If b² − 4ac = 0, the equation has exactly one solution, x = −b/(2a). (In this case we say that this is a solution of multiplicity two.)

If b² − 4ac < 0, the equation has no real solutions.

For example, consider the equation 2x² − 7x + 3 = 0. Using the quadratic formula, we have

\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{7 \pm \sqrt{49 - 4(2)(3)}}{2(2)} = \frac{7 \pm 5}{4}.\]

So the solutions are x = 3 and x = 1/2.

⁸ See Anthony and Biggs, Section 2.4.
The equation x² + 6x + 9 = 0 has one solution of multiplicity two; its discriminant is b² − 4ac = 36 − 9(4) = 0. This equation is most easily solved by recognising that x² + 6x + 9 = (x + 3)², so the solution is x = −3.

On the other hand, consider the quadratic equation x² − 2x + 3 = 0; here we have a = 1, b = −2, c = 3. The quantity b² − 4ac < 0, so this equation has no real solution. (It does have solutions in complex numbers, but this is outside the scope of this subject.) This is less mysterious than it may seem. We can write the equation as (x − 1)² + 2 = 0. Rewriting the left-hand side of the equation in this form is known as completing the square. Now, the square of a number is always greater than or equal to 0, so the quantity on the left of this equation is always at least 2 and is therefore never equal to 0. The quadratic formula for the solutions to a quadratic equation is obtained using the technique of completing the square.⁹ Quadratic polynomials which cannot be written as a product of linear terms (so ones for which the discriminant is negative) are said to be irreducible.

Activity 1.6 Use the method of completing the square on the equation ax² + bx + c = 0 to obtain the quadratic formula.

⁹ See Anthony and Biggs, Section 2.4, if you haven't already.

1.7.7 Polynomial equations

A polynomial of degree n in x is an expression of the form

\[P_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,\]

where the aᵢ are real constants, aₙ ≠ 0, and x is a real variable. For example, a quadratic expression such as those discussed above is a polynomial of degree 2.

In general, a polynomial equation of degree n has at most n solutions. For example, since

\[x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3),\]

the equation x³ − 7x + 6 = 0 has three solutions; namely, 1, 2, −3. The solutions of the equation Pₙ(x) = 0 are called the roots or zeros of the polynomial. Unfortunately, there is no general straightforward formula (as there is for quadratics) for the solutions to Pₙ(x) = 0 for polynomials Pₙ of degree larger than 2.

To find the solutions to P(x) = 0 where P is a polynomial of degree n, we use the fact that if α is such that P(α) = 0, then (x − α) must be a factor of P(x). We find such an α by trial and error and then write P(x) in the form (x − α)Q(x), where Q(x) is a polynomial of degree n − 1.
As an example, we'll use this method to factorise the cubic polynomial x³ − 7x + 6. Note that if this polynomial can be expressed as a product of linear factors, then it will be of the form

\[x^3 - 7x + 6 = (x - r_1)(x - r_2)(x - r_3),\]

where its constant term is the product of the roots: 6 = −r₁r₂r₃. (To see this, just substitute x = 0 into both sides of the above equation.) So if there is an integer root, it will be a factor of 6. We will try x = 1. Substituting this value for x, we do indeed get 1 − 7 + 6 = 0, so (x − 1) is a factor. Then we can deduce that

\[x^3 - 7x + 6 = (x - 1)(x^2 + \alpha x - 6)\]

for some number α, as the coefficient of x² must be 1 for the product to give x³, and the constant term must be −6 so that (−1)(−6) = 6, the constant term in the cubic. It only remains to find α. This is accomplished by comparing the coefficients of either x² or x in the cubic polynomial and the product. The coefficient of x² in the cubic is 0, and in the product the coefficient of x² is obtained from the terms (−1)(x²) + (αx)(x), so that we must have α − 1 = 0, or α = 1. Then

\[x^3 - 7x + 6 = (x - 1)(x^2 + x - 6),\]

and the quadratic term is easily factored into (x − 2)(x + 3); that is,

\[x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3).\]

Activity 1.7 Given that the polynomial P(x) = x³ + 3x² + 4x + 4 has an integer root, find it and hence show that the polynomial can be expressed as a product P(x) = (x − r)Q(x), where Q(x) is an irreducible quadratic polynomial.

1.7.8 Comments on selected activities

Feedback to activity 1.1
A ∩ B is the set of objects in both sets, and so A ∩ B = {2, 5}.

Feedback to activity 1.2
x² − 1; x³ + 2x² − x − 2.
Feedback to activity 1.3
We will show the first, and leave the second to you.

\[a^r a^s = \underbrace{(a \times a \times \cdots \times a)}_{r \text{ times}} \times \underbrace{(a \times a \times \cdots \times a)}_{s \text{ times}}.\]

Removing the brackets, we have the product of a times itself a total of r + s times; that is,

\[a^r a^s = \underbrace{a \times a \times \cdots \times a}_{r+s \text{ times}} = a^{r+s}.\]

Feedback to activity 1.4

\[\frac{49x^2}{35y} \cdot \frac{4xy^2}{(2xy)^3} = \frac{7x^2}{5y} \cdot \frac{4xy^2}{8x^3y^3} = \frac{7x^2}{5y} \cdot \frac{1}{2x^2y} = \frac{7}{10y^2}.\]

For the second part,

\[6ab - \frac{a}{b}(b^2 - 4bc) = 6ab - ab + 4ac = 5ab + 4ac = a(5b + 4c),\]

so the equation becomes a(5b + 4c) = 1, and solving for a:

\[a = \frac{1}{5b + 4c}, \qquad \text{provided } 5b + 4c \neq 0.\]

Note that it is an important part of the solution to declare that it is only valid if 5b + 4c ≠ 0, otherwise there is no solution (because a · 0 = 0 ≠ 1).


Feedback to activity 1.5

(a) x² − 4 = (x − 2)(x + 2) = 0, with solutions x = ±2.

(b) x² + 2x − 8 = (x − 2)(x + 4) = 0, so x = 2 or x = −4.

(c) 2x² − 7x + 3 = (2x − 1)(x − 3) = 0, so x = 1/2 or x = 3.

Feedback to activity 1.6

We assume that we do have a quadratic equation, so that a ≠ 0. Then, dividing by a, the equation becomes

\[x^2 + \frac{b}{a}x + \frac{c}{a} = \left(x + \frac{b}{2a}\right)^2 + \frac{c}{a} - \frac{b^2}{4a^2} = 0.\]

Then

\[\left(x + \frac{b}{2a}\right)^2 = \frac{b^2}{4a^2} - \frac{c}{a} = \frac{b^2 - 4ac}{4a^2}.\]

Taking square roots, simplifying and solving for x, you will obtain the quadratic formula.
Feedback to activity 1.7

Because all the terms are separated by + signs, the integer root must be a negative number, so try x = −1. Substitution into the polynomial yields −1 + 3 − 4 + 4 = 2 ≠ 0, so −1 is not a root. Next try x = −2. This time it works: −8 + 3(4) + 4(−2) + 4 = 0, so

\[x^3 + 3x^2 + 4x + 4 = (x + 2)(x^2 + \alpha x + 2).\]

Comparing the coefficients of either the x² or x terms, you should obtain α = 1. The quadratic polynomial x² + x + 2 cannot be factored over the real numbers, since its discriminant is negative. Therefore

\[P(x) = x^3 + 3x^2 + 4x + 4 = (x + 2)(x^2 + x + 2) = (x + 2)Q(x),\]

where Q(x) is an irreducible quadratic polynomial.


Chapter 2
Matrices and vectors

Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 1.
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 14, 15.
Anton, H. and C. Rorres. Elementary Linear Algebra. Sections 1.3, 1.4, Chapter 3.

Aims of the chapter


Matrices will be the main tool in our study of linear algebra, and vectors will be the
main building blocks, so we begin by learning what these are and how to use them. We
first investigate the definitions and properties associated with matrices and learn how to
manipulate them. This will lead us to a study of vectors, which can be viewed as a certain type of matrix, but which have far-reaching applications when viewed as elements of a Euclidean space, Rⁿ. To understand this, we develop our geometric intuition by looking at vectors in R² and R³ and use them to obtain equations of familiar geometric objects: lines and planes.

2.1 What is a matrix?


Definition 2.1 (Matrix) A matrix is a rectangular array of numbers or symbols. It can be written as

\[A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.\]

We denote this array by the single letter A or by (aᵢⱼ), and we say that A has m rows and n columns, or that it is an m × n matrix. We also say that A is a matrix of size m × n.

The number aᵢⱼ in the ith row and jth column is called the (i, j)-entry. Note that the first subscript on aᵢⱼ always refers to the row and the second subscript to the column.


Example 2.1 The matrix

\[A = \begin{pmatrix} 2 & 1 & 7 & 8 \\ 0 & 2 & 5 & 1 \\ 4 & 9 & 3 & 0 \end{pmatrix}\]

is a 3 × 4 matrix whose entries are integers. For this matrix, a₂₃ = 5, since this is the entry in the second row and third column.

Activity 2.1 In the example above, what is a₃₂?
A square matrix is an n × n matrix; that is, a matrix with the same number of rows as columns. The diagonal of a square matrix is the list of entries a₁₁, a₂₂, . . . , aₙₙ.

A diagonal matrix is a square matrix with all the entries which are not on the diagonal equal to zero. So A is diagonal if it is n × n and aᵢⱼ = 0 if i ≠ j:

\[\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}\]

Activity 2.2 Which of these matrices are diagonal?

\[\begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\]

Two matrices are equal if they are the same size and if corresponding entries are equal. That is, if A = (aᵢⱼ) and B = (bᵢⱼ) are both m × n matrices, then

\[A = B \iff a_{ij} = b_{ij} \quad \text{for } 1 \leq i \leq m,\ 1 \leq j \leq n\]

(where we use the symbol ⟺ to mean 'if and only if').
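For instance, the matrices

\[\begin{pmatrix} 1 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 \\ 2 \end{pmatrix}\]

are not equal, even though their entries agree, because the first is 1 × 2 and the second is 2 × 1.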

2.2 Matrix addition and scalar multiplication


If A and B are two matrices, then provided they are the same size we can add them
together to form a new matrix A + B. We define A + B to be the matrix whose entries
are the sums of the corresponding entries in A and B.
Definition 2.2 (Addition) If A = (aᵢⱼ) and B = (bᵢⱼ) are both m × n matrices, then

\[A + B = (a_{ij} + b_{ij}), \qquad 1 \leq i \leq m,\ 1 \leq j \leq n.\]

We can also multiply any matrix by a real number, called a scalar in this context. If λ is a scalar and A is a matrix, then λA is the matrix whose entries are λ times each of the entries of A.

Definition 2.3 (Scalar multiplication) If A = (aᵢⱼ) is an m × n matrix and λ ∈ R, then

\[\lambda A = (\lambda a_{ij}), \qquad 1 \leq i \leq m,\ 1 \leq j \leq n.\]
Example 2.2

\[A + B = \begin{pmatrix} 3 & 1 & 2 \\ 0 & 5 & 2 \end{pmatrix} + \begin{pmatrix} -1 & 1 & 4 \\ 2 & -3 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 6 \\ 2 & 2 & 1 \end{pmatrix}\]

\[2A = 2\begin{pmatrix} 3 & 1 & 2 \\ 0 & 5 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 4 \\ 0 & 10 & 4 \end{pmatrix}\]

2.3 Matrix multiplication


Is there a way to multiply two matrices together? The answer is sometimes, depending
on the sizes of the matrices. If A and B are matrices such that the number of columns
of A is equal to the number of rows of B, then we can define a matrix C which is the
product of A and B. We do this by saying what the entry cij of the product matrix AB
should be.
Definition 2.4 (Matrix multiplication) If A is an m × n matrix and B is an n × p matrix, then the product is the matrix AB = C = (cᵢⱼ) with

\[c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.\]
Although this formula looks daunting, it is quite easy to use in practice. What it says is
that the element in row i and column j of the product is obtained by taking each entry
of row i of A and multiplying it by the corresponding entry of column j of B, then
adding these n products together.

Schematically, the entry cᵢⱼ comes from row i of A and column j of B:

\[\text{row } i \text{ of } A: \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix}, \qquad \text{column } j \text{ of } B: \begin{pmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{pmatrix}.\]

What size is C = AB? The matrix C must be m × p, since it will have one entry for each of the m rows of A and each of the p columns of B.
Example 2.3 In the following product, the element in row 2 and column 1 of the product matrix (indicated in bold type) is found, as described above, by using the row and column printed in bold type.

\[AB = \begin{pmatrix} -1 & 1 & 1 \\ \mathbf{2} & \mathbf{0} & \mathbf{-1} \\ -1 & 2 & 4 \\ 2 & -2 & 1 \end{pmatrix} \begin{pmatrix} \mathbf{3} & 0 \\ \mathbf{-1} & 1 \\ \mathbf{1} & 3 \end{pmatrix} = \begin{pmatrix} -3 & 4 \\ \mathbf{5} & -3 \\ -1 & 14 \\ 9 & 1 \end{pmatrix}\]

This entry is 5 because

\[(2)(3) + (0)(-1) + (-1)(1) = 5.\]

Notice the sizes of the three matrices. A is 4 × 3, B is 3 × 2, and the product AB is 4 × 2.
We shall see in later chapters that this definition of matrix multiplication is exactly
what is needed for applying matrices in our study of linear algebra.
It is an important consequence of this definition that AB ≠ BA in general. That is, matrix multiplication is not commutative.

To see just how non-commutative matrix multiplication is, let's look at some examples, starting with the two matrices A and B in the example above. The product AB is defined, but the product BA is not even defined. Since A is 4 × 3 and B is 3 × 2, it is not possible to multiply the matrices in the order BA.
Now consider the matrices

\[A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix} 3 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}.\]

Both products AB and BA are defined, but they are different sizes, so they cannot be
equal. What sizes are they?
Activity 2.3 Answer the question just posed concerning the sizes of AB and BA.
Multiply the matrices to find the two product matrices, AB and BA.
Even if both products are defined and are the same size, it is still generally true that AB ≠ BA.
Activity 2.4 Try this for any two 2 × 2 matrices. Write down two different matrices A and B and find the products AB and BA. For example, you could use

\[A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.\]

2.4 Matrix algebra

Matrices are useful because they provide a compact notation and we can do algebra
with them.
For example, given a matrix equation such as
3A + 2B = 2(B − A + C),

we can solve this for the matrix C using the rules of algebra. You must always bear in mind that to perform the operations, they must be defined. In this equation it is understood that all the matrices A, B and C are the same size, say m × n.

We list the rules of algebra satisfied by the operations of addition, scalar multiplication
and matrix multiplication. The sizes of the matrices are dictated by the operations
being defined.
A + B = B + A. Matrix addition is commutative.
This is easily shown to be true. We will carry out the proof as an example. The matrices A and B must be of the same size, say m × n, for the operation to be defined, so both A + B and B + A are also m × n matrices. They also have the same entries. The (i, j) entry of A + B is aᵢⱼ + bᵢⱼ and the (i, j) entry of B + A is bᵢⱼ + aᵢⱼ, but aᵢⱼ + bᵢⱼ = bᵢⱼ + aᵢⱼ by the properties of real numbers. So the matrices A + B and B + A are equal.

On the other hand, as we have seen, matrix multiplication is not commutative: AB ≠ BA in general.
We have the following associative laws:

(A + B) + C = A + (B + C),
λ(AB) = (λA)B = A(λB),
(AB)C = A(BC).
These rules allow us to remove brackets. For example the last rule says that we will get
the same result if we first multiply AB and then multiply by C on the right, as we will
if we first multiply BC and then multiply by A on the left, so the choice is ours.
All these rules follow from the definitions of the operations in the same way as we
showed the commutativity of addition. We need to know that the matrices on the left
and on the right of the equals sign have the same size and that corresponding entries
are equal. Only the associativity of multiplication presents any complications; it is
tedious, but it can be done.
Activity 2.5 Think about these rules. What sizes are each of the matrices?
Write down the (i, j) entry for each of the matrices $\lambda(AB)$ and $(\lambda A)(B)$ and prove
that the matrices are equal.


Similarly, we have three distributive laws:
$$A(B + C) = AB + AC,$$
$$(B + C)A = BA + CA,$$
$$\lambda(A + B) = \lambda A + \lambda B.$$
Why do we need both of the first two rules (which state that matrix multiplication
distributes through addition)? Since matrix multiplication is not commutative, we
cannot conclude the second distributive rule from the first; we have to prove it is true
separately. All these statements can be proved from the definitions using the same
technique as used earlier, but we will not take the time to do this here.
What do we get if we compute A − A? We obtain an m × n matrix all of whose entries are 0. This is an additive
identity: that is, it plays the same role for matrices as the number 0 does for numbers,
in the sense that A + 0 = 0 + A = A. There is a zero matrix of any size m × n.

Definition 2.5 (Zero matrix) A zero matrix, denoted 0, is an m × n matrix with all
entries zero:
$$0 = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}.$$
Then
$$A + 0 = A, \qquad A - A = 0, \qquad 0A = 0, \qquad A0 = 0,$$
where the sizes of the zero matrices above must be compatible with the size of the
matrix A.
We also have a multiplicative identity, which acts like the number 1 does for
multiplication of numbers.
Definition 2.6 (Identity matrix) The n × n identity matrix, denoted $I_n$ or simply I, is
the diagonal matrix with $a_{ii} = 1$:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
If A is any m × n matrix, then
$$AI = A \quad\text{and}\quad IA = A,$$
where it is understood that the identity matrix is the appropriate size for the product
to be defined.


Activity 2.6 What size is the identity matrix if A is m × n and IA = A?

Example 2.4 We can apply these rules to solve for C the equation
$$3A + 2B = 2(B - A + C).$$
We will pedantically apply each rule so that you can see how it is being used. In
practice, you don't need to put in all these steps, just implicitly use the rules of
algebra. We begin by removing the brackets using the distributive rule.
$$\begin{aligned}
3A + 2B &= 2B - 2A + 2C &&\text{(distributive rule)}\\
3A + 2B - 2B &= 2B - 2A + 2C - 2B &&\text{(add $-2B$ to both sides)}\\
3A + 0 &= -2A + 2C + 0 &&\text{(commutativity and associativity of addition, additive identity)}\\
3A &= -2A + 2C &&\text{(additive identity)}\\
3A + 2A &= -2A + 2C + 2A &&\text{(add $2A$ to both sides)}\\
5A &= 2C &&\text{(commutativity and associativity of addition, additive identity)}\\
C &= \tfrac{5}{2}A &&\text{(scalar multiplication)}
\end{aligned}$$
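An identity like this can be tested numerically on arbitrary matrices: if both sides of the original equation agree for random A and B, that is good evidence the manipulation was sound. A sketch (Python with numpy assumed, which the course does not require):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))
B = rng.random((3, 3))
C = (5 / 2) * A                 # the solution C = (5/2)A derived above

# Check that 3A + 2B equals 2(B - A + C) entry by entry.
print(np.allclose(3 * A + 2 * B, 2 * (B - A + C)))   # True
```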

2.5 Matrix inverses


If AB = AC, can we conclude that B = C? The answer is no, as the following example
shows.
Example 2.5 If
$$A = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}
\qquad
B = \begin{pmatrix} 1 & -1 \\ 3 & 5 \end{pmatrix}
\qquad
C = \begin{pmatrix} 8 & 0 \\ -4 & 4 \end{pmatrix}$$
then the matrices B and C are not equal, but
$$AB = AC = \begin{pmatrix} 0 & 0 \\ 4 & 4 \end{pmatrix}.$$

Activity 2.7 Check this by multiplying out the matrices.


On the other hand, If A + 5B = A + 5C, then we can conclude that B = C because the
operations of addition and scalar multiplication have inverses. If we have a matrix A,

23

2. Matrices and vectors

then the matrix A = (1)A is an additive inverse because it satisfies A + (A) = 0. If


we multiply a matrix A by a non-zero scalar c we can undo this by multiplying cA by
1/c.
What about matrix multiplication: is there a multiplicative inverse? The answer is
sometimes.
Definition 2.7 (Inverse matrix) The n × n matrix A is invertible if there is a matrix
B such that
$$AB = BA = I,$$
where I is the n × n identity matrix. The matrix B is called the inverse of A and is
denoted by $A^{-1}$.
Notice that the matrix A must be square, and that both I and $B = A^{-1}$ must also be
square n × n matrices for the products to be defined.
Example 2.6 Let
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$
Then with
$$B = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}$$
we have AB = BA = I, so $B = A^{-1}$.
Activity 2.8 Check this. Multiply the matrices to show that AB = I and BA = I,
where I is the 2 × 2 identity matrix.
You might have noticed that we have said that B is the inverse of A. This is because an
invertible matrix has only one inverse. We will prove this.
Theorem 2.1 If A is an n × n invertible matrix, then the matrix $A^{-1}$ is unique.
Proof
Assume the matrix A has two inverses, B and C, so that AB = BA = I and
AC = CA = I. We will show that B and C must actually be the same matrix, that is,
they are equal. Consider CAB. Since matrix multiplication is associative and AB = I,
we have
CAB = C(AB) = CI = C.
On the other hand, again by associativity,
CAB = (CA)B = IB = B,
since CA = I. We conclude that C = B, so there is only one inverse matrix of A.
Not all square matrices will have an inverse. We say that A is invertible or non-singular
if it has an inverse. We say that A is non-invertible or singular if it has no inverse.

For example, the matrix
$$\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}$$
(used in Example 2.5) is not invertible. It is not possible for a matrix to satisfy
$$\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
since the (1,1)-entry of the product is 0, and 0 ≠ 1.


On the other hand, if
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\quad\text{where}\quad ad - bc \ne 0,$$
then A has the inverse
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

Activity 2.9 Check that this is indeed the inverse of A, by showing that if you
multiply A on the left or on the right by this matrix, then you obtain the identity
matrix I.
This tells us how to find the inverse of any 2 × 2 invertible matrix. If
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
the scalar ad − bc is called the determinant of the matrix A, denoted |A|. We shall see
more about the determinant in Chapter 4. So if |A| = ad − bc ≠ 0, then to construct
$A^{-1}$ we take the matrix A, switch the main diagonal entries and put minus signs in
front of the other two entries, then multiply by the scalar 1/|A|.
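As a sketch of how you might check this formula numerically (numpy assumed; np.linalg.inv is numpy's general-purpose inverse routine, not part of the course material):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
a, b, c, d = A.ravel()

det = a * d - b * c                       # the determinant |A| = ad - bc
A_inv = (1 / det) * np.array([[d, -b],
                              [-c, a]])   # the 2x2 inverse formula

print(A_inv)                                  # [[-2. 1.], [1.5 -0.5]]
print(np.allclose(A_inv, np.linalg.inv(A)))   # True: agrees with numpy
print(np.allclose(A @ A_inv, np.eye(2)))      # True: A A^{-1} = I
```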
Activity 2.10 Use this to find the inverse of the matrix
$$B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
and check your answer by looking at Example 2.6 on page 24.


If AB = AC, and A is invertible, can we conclude that B = C? This time the answer is
yes, because we can multiply each side of the equation on the left by $A^{-1}$:
$$A^{-1}AB = A^{-1}AC \;\Longrightarrow\; IB = IC \;\Longrightarrow\; B = C.$$
But be careful: if AB = CA, then we cannot conclude that B = C, only that
$$B = A^{-1}CA.$$
It is not possible to divide by a matrix. We can only multiply on the right or left by
the inverse matrix.

2.5.1 Properties of the inverse

If A is an invertible matrix, then by definition $A^{-1}$ exists and $AA^{-1} = A^{-1}A = I$. This
statement also says that the matrix A is the inverse of $A^{-1}$; that is,
$$(A^{-1})^{-1} = A.$$
It is important to understand the definition of an inverse matrix and be able to use it.
Basically, if we can find a matrix that works in the definition, then that matrix is the
inverse, and the matrices are invertible. For example, if A is an invertible n × n matrix
and λ is a non-zero scalar, then
$$(\lambda A)^{-1} = \frac{1}{\lambda}A^{-1}.$$
This statement says that the matrix λA is invertible, and its inverse is given by the
matrix $C = (1/\lambda)A^{-1}$. To prove this is true, we just need to show that the matrix C
satisfies (λA)C = C(λA) = I. This is straightforward using matrix algebra:
$$(\lambda A)\left(\frac{1}{\lambda}A^{-1}\right) = \lambda\frac{1}{\lambda}AA^{-1} = I
\quad\text{and}\quad
\left(\frac{1}{\lambda}A^{-1}\right)(\lambda A) = \frac{1}{\lambda}\lambda A^{-1}A = I.$$

If A and B are invertible n × n matrices, then using the definition of the inverse, you
can show that
$$(AB)^{-1} = B^{-1}A^{-1}.$$
This last statement says that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order. The proof of this statement is left as an exercise at the end of this chapter.
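A numerical sanity check of this property on random invertible matrices might look like this (a sketch; numpy assumed, and the diagonal shift is just a convenient way to guarantee invertibility):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 3)) + 3 * np.eye(3)   # diagonally dominant, hence invertible
B = rng.random((3, 3)) + 3 * np.eye(3)

lhs = np.linalg.inv(A @ B)                 # (AB)^{-1}
rhs = np.linalg.inv(B) @ np.linalg.inv(A)  # B^{-1} A^{-1}
print(np.allclose(lhs, rhs))               # True

# The order matters: A^{-1} B^{-1} is generally a different matrix.
print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))  # False
```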

2.6 Powers of a matrix


If A is a square matrix, what do we mean by $A^2$? We naturally mean the product of A
with itself, $A^2 = AA$. In the same way, if A is an n × n matrix and $r \in \mathbb{N}$, then
$$A^r = \underbrace{AA\cdots A}_{r \text{ times}}.$$

2.6.1 Properties of powers

If A is an invertible n × n matrix and $r \in \mathbb{N}$, then
$$(A^r)^{-1} = (A^{-1})^r.$$
This follows immediately from the definition of inverse matrix and the associativity of
matrix multiplication. Think about what it says: that the inverse of the product of A
times itself r times is the product of $A^{-1}$ times itself r times.


The usual rules of exponents hold: for positive integers r, s,
$$A^r A^s = A^{r+s}, \qquad (A^r)^s = A^{rs}.$$
As r and s are positive integers and matrix multiplication is associative, these
properties are easily verified in the same way as they are with real numbers.
Activity 2.11 Verify the above three properties.
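For instance, one might spot-check all three properties on a particular matrix (a sketch; numpy assumed, where np.linalg.matrix_power computes integer powers of a square matrix):

```python
import numpy as np
from numpy.linalg import matrix_power, inv

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # invertible: determinant is 1
r, s = 3, 2

print(np.allclose(matrix_power(A, r) @ matrix_power(A, s),
                  matrix_power(A, r + s)))        # A^r A^s = A^{r+s}
print(np.allclose(matrix_power(matrix_power(A, r), s),
                  matrix_power(A, r * s)))        # (A^r)^s = A^{rs}
print(np.allclose(inv(matrix_power(A, r)),
                  matrix_power(inv(A), r)))       # (A^r)^{-1} = (A^{-1})^r
```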

2.7 Transpose
Definition 2.8 (Transpose) The transpose of an m × n matrix
$$A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
is the n × m matrix
$$A^T = (a_{ji}) = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix},$$
which is obtained from A by interchanging rows and columns.
That is, row i of A becomes column i of $A^T$.


Example 2.7 If
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 & 5 & 3 \end{pmatrix},$$
then
$$A^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}
\qquad
B^T = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}.$$

Notice that the diagonal entries of a square matrix do not move under the operation of
taking the transpose, as $a_{ii}$ remains $a_{ii}$. So if D is a diagonal matrix, then $D^T = D$.
Properties of transpose
If we take the transpose of a matrix A by switching the rows and columns, and then do
it again, we get back to the original matrix A. This is summarised in the following
equation:
$$(A^T)^T = A.$$


Two further properties relate to scalar multiplication and addition:
$$(\lambda A)^T = \lambda A^T \quad\text{and}\quad (A + B)^T = A^T + B^T.$$
These follow immediately from the definition. In particular, the (i, j) entry of $(\lambda A)^T$ is
$\lambda a_{ji}$, which is also the (i, j) entry of $\lambda A^T$.
The next property tells you what happens when you take the transpose of a product of
matrices:
$$(AB)^T = B^T A^T.$$
This can be stated as: the transpose of the product of two matrices is the product of the
transposes in the reverse order.
Showing that this is true is slightly more complicated since it involves matrix
multiplication. It is more important to understand why the product of the transposes
must be in the reverse order.
Activity 2.12 If A is an m × n matrix and B is n × p, look at the sizes of the
matrices $A^T$, $B^T$, $(AB)^T$ and show that only the product $B^T A^T$ is always defined.
Show also that its size is equal to the size of $(AB)^T$.
If A is an m × n matrix and B is n × p, from the above activity you know that $(AB)^T$
and $B^T A^T$ are the same size. To prove that $(AB)^T = B^T A^T$ you need to show that the
(i, j)-entries are equal. You can try this as follows.
Activity 2.13 The (i, j) entry of $(AB)^T$ is the (j, i) entry of AB, which is obtained
by taking row j of A and multiplying each term by the corresponding entry of
column i of B. We can write this as
$$\left((AB)^T\right)_{ij} = a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jn}b_{ni}.$$
Do the same for the (i, j) entry of $B^T A^T$ and show that you obtain the same
number.
The final property in this section states that the inverse of the transpose of an invertible
matrix is the transpose of the inverse; that is, if A is invertible, then
$$(A^T)^{-1} = (A^{-1})^T.$$
This follows from the previous property and the definition of inverse. We have
$A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I$, and in the same way $(A^{-1})^T A^T = I$. Therefore, by
the definition of the inverse of a matrix, $(A^{-1})^T$ must be the inverse of $A^T$.
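Again, a short numerical sketch (numpy assumed) can confirm both transpose properties on arbitrary matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((2, 3))                    # A is 2x3
B = rng.random((3, 4))                    # B is 3x4

print(np.allclose((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T

M = rng.random((3, 3)) + 3 * np.eye(3)    # an invertible square matrix
print(np.allclose(np.linalg.inv(M.T),
                  np.linalg.inv(M).T))    # (M^T)^{-1} = (M^{-1})^T
```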

2.7.1 Symmetric matrices

Definition 2.9 (Symmetric matrix) A matrix A is symmetric if it is equal to its
transpose, $A = A^T$.


Only square matrices can be symmetric. If A is symmetric, then aij = aji . That is,
entries diagonally opposite to each other must be equal: the matrix is symmetric about
its diagonal.
Activity 2.14 Fill in the missing entries if the matrix A is symmetric:
$$A = \begin{pmatrix} 1 & 4 & \ast \\ \ast & 2 & 7 \\ -5 & \ast & 3 \end{pmatrix} = A^T.$$

If D is a diagonal matrix, then $d_{ij} = 0 = d_{ji}$ for all i ≠ j. So, as we saw above, $D^T = D$;
that is, all diagonal matrices are symmetric.

2.8 Vectors in R^n
An n × 1 matrix is a column vector, or simply a vector,
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad v_i \in \mathbb{R}.$$
We can also define a row vector to be a 1 × n matrix. However, in this text, by the term
vector we shall always mean a column vector.
The numbers $v_1, v_2, \ldots, v_n$ are known as the components (or entries) of the vector v.

In order to distinguish vectors from scalars, and to emphasise that they are vectors and
not general matrices, in this text vectors are written in lowercase boldface type. (When
writing by hand, vectors should be underlined to avoid confusion with scalars.)
Addition and scalar multiplication are defined for vectors as for n × 1 matrices:
$$\mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix},
\qquad
\lambda\mathbf{v} = \begin{pmatrix} \lambda v_1 \\ \lambda v_2 \\ \vdots \\ \lambda v_n \end{pmatrix}.$$
For a fixed positive integer n, the set of vectors together with the operations of addition
and scalar multiplication form R^n, usually called Euclidean n-space.
We will often write a column vector in the text as the transpose of a row vector.
Although
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}^T,$$
we will usually write $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, with commas separating the entries. A
matrix does not have commas; however, we will use the commas in order to clearly
distinguish the separate components of the vector.


For vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in R^n and scalars $\alpha_1, \alpha_2, \ldots, \alpha_k$ in R, the vector
$$\mathbf{v} = \alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k \in \mathbb{R}^n$$
is known as a linear combination of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$.


A zero vector, denoted 0, is a vector with all of its entries equal to 0. There is one zero
vector in each space R^n. As with matrices, this vector is an additive identity. For any
vector v ∈ R^n, 0 + v = v + 0 = v, and multiplying v by the scalar zero results in the
zero vector, 0v = 0.
Although the matrix product of two vectors v and w in R^n cannot be calculated, it is
possible to form the matrix products $\mathbf{v}^T\mathbf{w}$ and $\mathbf{v}\mathbf{w}^T$. The first is a 1 × 1 matrix, and the
latter is an n × n matrix.
Activity 2.15 Calculate $\mathbf{a}^T\mathbf{b}$ and $\mathbf{a}\mathbf{b}^T$ for
$$\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}.$$

The 1 × 1 matrix $\mathbf{v}^T\mathbf{w}$ can be identified with the real number, or scalar, which is its
unique entry. This turns out to be particularly useful, and is known as the inner product
or scalar product or dot product of v and w.

2.8.1 The inner product of two vectors

Definition 2.10 (Inner product) Given two vectors
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},
\qquad
\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix},$$
the inner product, denoted $\langle \mathbf{v}, \mathbf{w} \rangle$, is the real number given by
$$\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n.$$

The inner product $\langle \mathbf{v}, \mathbf{w} \rangle$ is also known as the scalar product of v and w, or as the dot
product. In the latter case it is denoted by v · w.
The inner product of v and w is precisely the scalar quantity given by
$$\mathbf{v}^T\mathbf{w} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
= v_1 w_1 + v_2 w_2 + \cdots + v_n w_n,$$
so that we can write
$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T\mathbf{w}.$$


Example 2.8 If $\mathbf{x} = (1, 2, 3)^T$ and $\mathbf{y} = (2, -1, 1)^T$, then
$$\langle \mathbf{x}, \mathbf{y} \rangle = 1(2) + 2(-1) + 3(1) = 3.$$

It is important to realise that the inner product is just a number, a scalar, not another
vector or a matrix.
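In code the identification of $\langle \mathbf{v}, \mathbf{w} \rangle$ with $\mathbf{v}^T\mathbf{w}$ is direct; a minimal sketch (numpy assumed) using the vectors of Example 2.8:

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([2, -1, 1])

# Three equivalent ways to compute the scalar <v, w>:
print(np.dot(v, w))                           # 3
print(v @ w)                                  # 3
print(sum(vi * wi for vi, wi in zip(v, w)))   # 3, the definition itself
```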
The inner product on R^n satisfies certain basic properties, as shown in the next theorem.
Theorem 2.2 The inner product
$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n,
\qquad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n,$$
satisfies the following properties for all x, y, z ∈ R^n and for all α ∈ R:
(i) $\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle$
(ii) $\langle \alpha\mathbf{x}, \mathbf{y} \rangle = \alpha\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, \alpha\mathbf{y} \rangle$
(iii) $\langle \mathbf{x} + \mathbf{y}, \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{z} \rangle + \langle \mathbf{y}, \mathbf{z} \rangle$
(iv) $\langle \mathbf{x}, \mathbf{x} \rangle \ge 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

Proof
We have
$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = y_1 x_1 + y_2 x_2 + \cdots + y_n x_n = \langle \mathbf{y}, \mathbf{x} \rangle,$$
which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For (iv), note that
$$\langle \mathbf{x}, \mathbf{x} \rangle = x_1^2 + x_2^2 + \cdots + x_n^2$$
is a sum of squares, so $\langle \mathbf{x}, \mathbf{x} \rangle \ge 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ if and only if each term $x_i^2$ is equal to
zero, that is, if and only if each $x_i = 0$, so x is the zero vector, x = 0.
Activity 2.16 Prove properties (ii) and (iii). Show, also, that these two properties
are equivalent to the single property
$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle.$$

From the definitions, it is clear that it is not possible to combine vectors in different
Euclidean spaces, either by addition or by taking the inner product. If v ∈ R^n and
w ∈ R^m, with m ≠ n, then these vectors live in different worlds, or more precisely, in
different vector spaces.
Vectors have a broader use beyond that of being special types of matrices. It is likely
that you have some previous knowledge of vectors; for example, in describing the
displacement of an object from one point to another in R^2 or in R^3. Before we continue
our study of linear algebra it is important to consolidate this background, for it
provides valuable geometric insight into the definitions and uses of vectors in higher
dimensions. Parts of the next section may be review for many of you.


2.9 Developing geometric insight: vectors in R^2 and R^3

2.9.1 Vectors in R^2

The set R can be represented as points along a horizontal line, called a real-number line.
In order to represent pairs of real numbers, (a1 , a2 ), we use a Cartesian plane, a plane
with both a horizontal axis and a vertical axis, each axis being a copy of the
real-number line, and we mark A = (a1 , a2 ) as a point in this plane. We associate this
point with the vector a = (a1 , a2 )T , as representing a displacement from the origin (the
point (0, 0)) to the point A. In this context, a is the position vector of the point A. This
displacement is illustrated by an arrow, or directed line segment, with initial point at
the origin and terminal point at A.

[Figure: the position vector a, drawn as an arrow from the origin (0, 0) to the point $(a_1, a_2)$.]
Even if a displacement does not begin at the origin, two displacements of the same
length and the same direction are considered to be equal. So, for example, the two
arrows below represent the same vector, v = (1, 2)T .

[Figure: two parallel arrows of equal length and direction, each representing the displacement vector $\mathbf{v} = (1, 2)^T$.]
If an object is displaced from a point, say O, the origin, to a point P by the
displacement p, and then displaced from P to Q, by the displacement v, then the total
displacement is given by the vector from O to Q, which is the position vector q. So we
would expect vectors to satisfy q = p + v, both geometrically (in the sense of a
displacement) and algebraically (by the definition of vector addition). This is certainly
true in general, as illustrated below.

[Figure: the parallelogram picture of q = p + v, with p from the origin to P and v from P to Q.]

If $\mathbf{v} = (v_1, v_2)^T$, then $q_1 = p_1 + v_1$ and $q_2 = p_2 + v_2$.


The order of displacements does not matter (nor does the order of vector addition), so
also q = v + p. For this reason the addition of vectors is said to follow the parallelogram
law.

[Figure: the two routes around a parallelogram illustrating p + v = v + p.]
From q = p + v, we have v = q − p. This is the displacement from P to Q. To help you
determine in which direction the vector v points, think of v = q − p as the vector which
is added to the vector p in order to obtain the vector q.
If v represents a displacement, then 2v must represent a displacement in the same
direction, but twice as far, and −v represents an equal displacement in the opposite
direction. This interpretation is compatible with the definition of scalar multiplication.
Activity 2.17 Sketch the vector $\mathbf{v} = (1, 2)^T$ in a coordinate system. Then sketch 2v
and −v. Looking at the coordinates on your sketch, what are the components of 2v
and −v?
We have stated that a vector has both a length and a direction. Given a vector
$\mathbf{a} = (a_1, a_2)^T$, its length, denoted by $\|\mathbf{a}\|$, can be calculated using Pythagoras' theorem
applied to the right triangle shown below:

[Figure: the right triangle with horizontal side $a_1$, vertical side $a_2$ and hypotenuse from (0, 0) to $(a_1, a_2)$.]

So the length of a is the scalar quantity
$$\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2}.$$
The length of a vector can be expressed in terms of the inner product,
$$\|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle},$$
simply because $\langle \mathbf{a}, \mathbf{a} \rangle = a_1^2 + a_2^2$. A unit vector is a vector of length 1.


Example 2.9 If $\mathbf{v} = (1, 2)^T$, then $\|\mathbf{v}\| = \sqrt{1^2 + 2^2} = \sqrt{5}$.
The vector
$$\mathbf{u} = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)^T$$
is a unit vector in the same direction as v.

Activity 2.18 Check this. Calculate the length of u.

The direction of a vector is essentially given by the components of the vector. If we have
two vectors a and b which are (non-zero) scalar multiples, say
$$\mathbf{a} = \lambda\mathbf{b}, \qquad \lambda \in \mathbb{R},\ \lambda \ne 0,$$
then a and b are parallel. If λ > 0, then a and b have the same direction. If λ < 0, then
we say that a and b have opposite directions.
The zero vector, 0, has length 0 and has no direction. For any other vector, v ≠ 0, there
is one unit vector in the same direction as v, namely
$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}.$$

Activity 2.19 Write down a unit vector, u, which is parallel to the vector
$\mathbf{a} = (4, 3)^T$. Then write down a vector, w, of length 2 which is in the opposite
direction to a.

2.9.2 Inner product

The inner product in R^2 is closely linked with the geometric concepts of length and
angle. If $\mathbf{a} = (a_1, a_2)^T$, we have already seen that
$$\|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a} \rangle = a_1^2 + a_2^2.$$


Let a, b be two vectors in R^2, and let θ denote the angle between them.¹ By this we
shall always mean the angle θ such that 0 ≤ θ ≤ π. If θ < π, the vectors a, b and
c = b − a form a triangle, where c is the side opposite the angle θ, as, for example, in
the figure below.

[Figure: a triangle with sides a, b and c = b − a, where θ is the angle between a and b.]

The law of cosines (which you may or may not know; don't worry if you don't)
applied to this triangle gives us the important relationship stated in the following
theorem.
Theorem 2.3 Let a, b ∈ R^2 and let θ denote the angle between them. Then
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta.$$

Proof
The law of cosines states that $c^2 = a^2 + b^2 - 2ab\cos\theta$, where $c = \|\mathbf{b} - \mathbf{a}\|$, $a = \|\mathbf{a}\|$,
$b = \|\mathbf{b}\|$. That is,
$$\|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta. \tag{1}$$
Expanding the inner product and using its properties, we have
$$\|\mathbf{b} - \mathbf{a}\|^2 = \langle \mathbf{b} - \mathbf{a}, \mathbf{b} - \mathbf{a} \rangle = \langle \mathbf{b}, \mathbf{b} \rangle + \langle \mathbf{a}, \mathbf{a} \rangle - 2\langle \mathbf{a}, \mathbf{b} \rangle;$$
that is,
$$\|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\langle \mathbf{a}, \mathbf{b} \rangle. \tag{2}$$
Comparing equations (1) and (2) above, we conclude that
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta.$$

This theorem has many geometrical consequences. For example, we can use it to find
the angle between two vectors by using
$$\cos\theta = \frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\|\,\|\mathbf{b}\|}.$$

¹ Angles are always measured in radians, not degrees, here. So, for example, 45 degrees is π/4 radians. See the texts, or the subject guide for MT1174 Calculus.


Example 2.10 Let
$$\mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
\quad\text{and}\quad
\mathbf{w} = \begin{pmatrix} 3 \\ 1 \end{pmatrix},$$
and let θ be the angle between them. Then
$$\cos\theta = \frac{5}{\sqrt{5}\sqrt{10}} = \frac{1}{\sqrt{2}},$$
so that θ = π/4.
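The same computation in Python (numpy assumed), using cos θ = ⟨v, w⟩ / (‖v‖‖w‖), might look like this sketch:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)

print(theta, np.pi / 4)   # both approximately 0.78539816..., i.e. pi/4
```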

Since
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta,$$
and −1 ≤ cos θ ≤ 1 for any real number θ, the maximum value of the inner product is
$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|$. This occurs precisely when cos θ = 1, that is, when θ = 0. In this case
the vectors a and b are parallel and in the same direction. If they point in opposite
directions, then θ = π and we have $\langle \mathbf{a}, \mathbf{b} \rangle = -\|\mathbf{a}\|\,\|\mathbf{b}\|$. The inner product will be
positive if and only if the angle between the vectors is acute, meaning that 0 ≤ θ < π/2. It
will be negative if the angle is obtuse, meaning that π/2 < θ ≤ π.
The non-zero vectors a and b are orthogonal (or perpendicular or, sometimes, normal)
when the angle between them is θ = π/2. Since cos(π/2) = 0, this is precisely when their
inner product is zero. We restate this important fact:
The vectors a and b are orthogonal if and only if $\langle \mathbf{a}, \mathbf{b} \rangle = 0$.

2.9.3 Vectors in R^3

Everything we have said so far about the inner product and its geometric interpretation
in R^2 extends to R^3. If
$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix},
\quad\text{then}\quad
\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$
Activity 2.20 Show this. Sketch a position vector $\mathbf{a} = (a_1, a_2, a_3)^T$ in R^3. Drop a
perpendicular to the xy-plane as in the figure below, and apply Pythagoras' theorem
twice to obtain the result.
[Figure: the position vector of $(a_1, a_2, a_3)$ in R^3, with a perpendicular dropped to the point $(a_1, a_2, 0)$ in the xy-plane.]


The vectors a, b and c = b − a in R^3 lie in a plane and the law of cosines can still be
applied to establish the result that
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta.$$

Activity 2.21 Calculate the angles of the triangle with sides a, b, c, and show it is
an isosceles right triangle, where
$$\mathbf{a} = \begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix}
\qquad
\mathbf{b} = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}
\qquad
\mathbf{c} = \mathbf{b} - \mathbf{a}.$$

2.10 Lines
2.10.1 Lines in R^2

In R^2, a line is given by a single Cartesian equation, such as y = ax + b, and as such, we
can draw a graph of the line in the xy-plane. This line can also be expressed as a single
vector equation with one parameter. To see this, look at the following examples.
Example 2.11 Consider the line y = 2x. Any point (x, y) on this line must satisfy
this equation, and all points that satisfy the equation are on this line.
[Figure: the line y = 2x in the xy-plane; the vector shown along it is $\mathbf{v} = (1, 2)^T$.]


Another way to describe the points on the line is by giving their position vectors. We
can let x = t where t is any real number. Then y is determined by y = 2x = 2t. So if
$\mathbf{x} = (x, y)^T$ is the position vector of a point on the line, then
$$\mathbf{x} = \begin{pmatrix} t \\ 2t \end{pmatrix} = t\begin{pmatrix} 1 \\ 2 \end{pmatrix} = t\mathbf{v}, \qquad t \in \mathbb{R}.$$
For example, if t = 2, we get the position vector of the point (2, 4) on the line, and if
t = −1 we obtain the point (−1, −2). As the parameter t runs through all real
numbers, this vector equation gives the position vectors of all the points on the line.
Starting with the vector equation
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = t\mathbf{v} = t\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad t \in \mathbb{R},$$
we can retrieve the Cartesian equation using the fact that the two vectors are equal
if and only if their components are equal. This gives us the two equations x = t and
y = 2t. Eliminating the parameter t between these two equations yields y = 2x.
The line in the above example is a line through the origin. What about a line which
does not contain (0, 0)?
Example 2.12 Consider the line y = 2x + 1. Proceeding as above, we set x = t,
t ∈ R. Then y = 2x + 1 = 2t + 1, so the position vector of a point on this line is
given by
$$\mathbf{x} = \begin{pmatrix} t \\ 2t + 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} t \\ 2t \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix},
\qquad t \in \mathbb{R}.$$

[Figure: the line y = 2x + 1; the vector shown along it is $\mathbf{v} = (1, 2)^T$.]

We can interpret this as follows. To locate any point on the line, first locate one
particular point which is on the line, for example the y-intercept, (0, 1). Then the
position vector of any point on the line is a sum of two displacements: first going to
the point (0, 1) and then going along the line, in a direction parallel to the vector
$\mathbf{v} = (1, 2)^T$. It is important to notice that in this case the actual position vector of a
point on the line does not lie along the line. Only if the line goes through the origin
will that happen.
Activity 2.22 Sketch the line y = 2x + 1 and the position vector q of the point
(3, 7) which is on this line. Then express q as the sum of two vectors, q = p + tv
where p = (0, 1)T and v = (1, 2)T for some t R and add these vectors to your
sketch.
In the vector equation, any point on the line can be used to locate the line, and any
vector parallel to the direction vector, v, can be used to give the direction. So, for
example,
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s\begin{pmatrix} -2 \\ -4 \end{pmatrix}, \qquad s \in \mathbb{R},$$
is also a vector equation of this line.


Activity 2.23 If $\mathbf{q} = (3, 7)^T$, what is s in this expression of the line?

As before, we can retrieve the Cartesian equation of the line by equating components of
the vector and eliminating the parameter.

Activity 2.24 Do this for each of the vector equations given above for the line
y = 2x + 1.

In general, any line in R2 is given by a vector equation with one parameter of the form
x = p + tv
where x is the position vector of a point on the line, p is any particular point on the
line and v is the direction of the line.
Activity 2.25 Write down a vector equation of the line through the points
P = (−1, 1) and Q = (3, 2). What is the direction of this line? Find a value for c
such that the point (7, c) is on the line.
In R^2, two lines are either parallel or intersect in a unique point.
Example 2.13 The lines $\ell_1$ and $\ell_2$, given by
$$\ell_1: \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix},
\qquad
\ell_2: \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \end{pmatrix},
\qquad t \in \mathbb{R},$$

are not parallel, since their direction vectors are not scalar multiples of one another.
Therefore they intersect in a unique point. We can find this point either by finding
the Cartesian equation of each line and solving the equations simultaneously, or
using the vector equations. We will do the latter. We are looking for a point (x, y) on
both lines, so its position vector will satisfy
$$\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 5 \\ 6 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \end{pmatrix}$$
for some t ∈ R and for some s ∈ R. We need to use different symbols (s and t) in the
equations because they are unlikely to be the same number for each line. We are
looking for values of s and t which will give us the same point. Equating components
of the position vectors of points on the lines, we have
$$\begin{aligned} 1 + t &= 5 - 2s \\ 3 + 2t &= 6 + s \end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned} 2s + t &= 4 \\ -s + 2t &= 3 \end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned} 2s + t &= 4 \\ -2s + 4t &= 6. \end{aligned}$$
Adding these last two equations, we obtain 5t = 10, so t = 2, and therefore s = 1. Therefore the
point of intersection is (3, 7):
$$\begin{pmatrix} 1 \\ 3 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 3 \\ 7 \end{pmatrix}
= \begin{pmatrix} 5 \\ 6 \end{pmatrix} + 1\begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$
What is the angle of intersection of these two lines? Since
$$\left\langle \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\rangle = 0,$$
the lines are perpendicular.
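Finding the intersection amounts to solving the small linear system 2s + t = 4, −s + 2t = 3; a sketch of how a machine would do it (numpy assumed):

```python
import numpy as np

# Coefficient matrix and right-hand side of the system in s and t:
#    2s +  t = 4
#    -s + 2t = 3
M = np.array([[2.0, 1.0],
              [-1.0, 2.0]])
rhs = np.array([4.0, 3.0])

s, t = np.linalg.solve(M, rhs)
print(s, t)                                       # 1.0 2.0
print(np.array([1, 3]) + t * np.array([1, 2]))    # [3. 7.], the intersection
```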


2.10.2 Lines in R^3

How can you describe a line in R3 ? Think about this. How do you describe the set of
points (x, y, z) which are on a given line?
Because there are three variables involved, the natural way is to use a vector equation.
To describe a line you locate one point on the line by its position vector, and then
travel along from that point in a given direction, or in the opposite direction.

[Figure: a line in R^3, located by the position vector of one of its points and its direction vector.]
Therefore, a line in R^3 is given by a vector equation with one parameter,
$$\mathbf{x} = \mathbf{p} + t\mathbf{v},$$
where x is the position vector of any point on the line, p is the position vector of one
particular point on the line and v is the direction of the line:
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + t\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},
\qquad t \in \mathbb{R}. \tag{2.10.2}$$
The equation x = tv represents a parallel line through the origin.
Example 2.14 The equations
$$\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix},
\qquad s, t \in \mathbb{R},$$
describe the same line. This is not obvious, so how do we show it?
The lines represented by these equations are parallel, since their direction vectors are
parallel:
$$\begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix},$$
so they either have no points in common and are parallel, or they have all points in
common, and are really the same line. Since
$$\begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix},$$
the point (3, 7, 2) is on both lines, so they must have all points in common. We say
that the lines are collinear.
On the other hand, the lines represented by the equations
$$\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} + t\begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix},
\qquad t \in \mathbb{R},$$
are parallel, with no points in common, since there is no value of t for which
$$\begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$

Activity 2.26 Verify this last statement.

Now try the following.


Activity 2.27 Write down a vector equation of the line through the points
P = (1, 1, 2) and Q = (−3, 2, 1). What is the direction of this line?
Is the point (7, 1, 3) on this line? Suppose you want a point on this line of the form
(c, d, 3). Find one such point. How many choices do you actually have for the values
of c and d?
We can also describe a line in R^3 by Cartesian equations, but this time we need two
such equations because there are three variables. Equating components in the vector
equation (2.10.2) above, we have
$$x = p_1 + tv_1, \qquad y = p_2 + tv_2, \qquad z = p_3 + tv_3.$$
Solving each of these equations for the parameter t and equating the results, we have
the two equations
$$\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2} = \frac{z - p_3}{v_3},
\qquad\text{provided } v_i \ne 0,\ i = 1, 2, 3.$$
Example 2.15 To find Cartesian equations of the line
$$\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix},
\qquad t \in \mathbb{R},$$
we equate components,
$$x = 1 - t, \qquad y = 2, \qquad z = 3 + 5t,$$
and then solve for t in the first and third equations. The Cartesian equations are
$$1 - x = \frac{z - 3}{5} \quad\text{and}\quad y = 2.$$
This is a line parallel to the xz-plane in R^3. The direction vector has a 0 in the
second component, so there is no change in the y direction; the y-coordinate has the
constant value y = 2.
In R^2, two lines are either parallel or intersect in a unique point. In R^3 more can
happen. Two lines in R^3 either intersect in a unique point, are parallel, or are skew,
which means that they lie in parallel planes and are not parallel.
Try to imagine what skew lines look like. If you are in a room with a ceiling parallel to
the floor, imagine a line drawn in the ceiling. It is possible for you to draw a parallel
line in the floor, but instead it is easier to draw a line in the floor which is not parallel
to the one in the ceiling. These lines will be skew. They lie in parallel planes (the ceiling
and the floor). If you could move the skew line in the floor onto the ceiling, then the
lines would intersect in a unique point.
Two lines are said to be coplanar if they lie in the same plane, in which case they are
either parallel or intersecting.
Example 2.16 Are the lines L1 and L2 intersecting, parallel or skew?
$$L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},
\qquad
L_2: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},
\qquad t \in \mathbb{R}.$$

Activity 2.28 Clearly the lines are not parallel. Why?

Example 2.16 (continued)
The lines intersect if there exist values of the parameters s, t such that
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}.$$
Equating components, we need to solve the three simultaneous equations in two
unknowns,
$$\begin{aligned} 1 + t &= 5 - 2s \\ 3 + 2t &= 6 + s \\ 4 - t &= 1 + 7s \end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned} 2s + t &= 4 \\ -s + 2t &= 3 \\ 7s + t &= 3. \end{aligned}$$
We have already seen in Example 2.13 on page 39 that the first two equations have
the unique solution s = 1, t = 2. Substituting these values into the third equation,
$$7s + t = 7(1) + 2 = 9 \ne 3,$$
we see that the system has no solution. Therefore the lines do not intersect and must
be skew.


Example 2.17 On the other hand, if we take a new line L3, which is parallel to L2
but which passes through the point (5, 6, −5), then the lines
$$L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},
\qquad
L_3: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ -5 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},
\qquad t \in \mathbb{R},$$
do intersect, in the unique point (3, 7, 2).

Activity 2.29 Check this. Find the point of intersection of the two lines L1 and L3.
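To decide by machine whether two such lines intersect, one can ask least squares for the best (s, t) and then test whether that choice actually satisfies all three equations. A sketch under the usual assumption that numpy is available (meet is a hypothetical helper name, not from the guide):

```python
import numpy as np

def meet(p1, v1, p2, v2):
    """Return the intersection of the lines p1 + t v1 and p2 + s v2
    in R^3, or None if the three equations are inconsistent (skew)."""
    # Solve t*v1 - s*v2 = p2 - p1 in the least-squares sense.
    M = np.column_stack((v1, -v2))
    t, s = np.linalg.lstsq(M, p2 - p1, rcond=None)[0]
    point = p1 + t * v1
    return point if np.allclose(point, p2 + s * v2) else None

p1, v1 = np.array([1., 3., 4.]), np.array([1., 2., -1.])
print(meet(p1, v1, np.array([5., 6., 1.]), np.array([-2., 1., 7.])))   # None: skew
print(meet(p1, v1, np.array([5., 6., -5.]), np.array([-2., 1., 7.])))  # [3. 7. 2.]
```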

2.11 Planes in R^3
It is important to look at some sketches of lines and planes in R^3 in the texts as you read
this and the previous section. If these are not handy, think about the floor in your room
(which is a plane in 3-space). In how many directions can you move along the floor?
On a line, there is essentially one direction in which a point can move, given as all
possible scalar multiples of a given direction, but on a plane there are more possibilities.
A point can move in two different directions, and in any linear combination of these two
directions. So how do we describe a plane in R^3?
The vector parametric equation
$$\mathbf{x} = \mathbf{p} + s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R},$$
describes the position vectors of points on a plane in R^3, provided that the vectors v and
w are non-zero and are not parallel. The vector p is the position vector of any
particular point on the plane, and the vectors v and w are displacement vectors which
lie in the plane. By taking all possible linear combinations x = p + sv + tw, for s, t ∈ R,
we obtain all the points on the plane.
The equation
$$\mathbf{x} = s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R},$$
describes a plane through the origin. In this case the position vector, x, of any point on
the plane lies in the plane.
Activity 2.30 If v and w are parallel, what does the equation
x = p + sv + tw, s, t ∈ R, actually represent?
Example 2.18 You have shown that the lines L1 and L3 given in Example 2.17
intersect in the point (3, 7, 2). (See Activity 2.29 on page 43.) Two intersecting lines
determine a plane. A vector equation of the plane containing the two lines is given by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},
\qquad s, t \in \mathbb{R}.$$


Why? We know that (3, 7, 2) is a point on the plane, and the directions of each of
the lines must lie in the plane. As s and t run through all real numbers, this
equation gives the position vectors of all points on the plane. Since the point (3, 7, 2)
is on both lines, if t = 0 we have the equation of L1, and if s = 0 we get L3.
Any point which is on the plane can take the place of the vector $(3, 7, 2)^T$, and any
non-parallel vectors which are linear combinations of v and w can replace these in
the equation. So, for example,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + s\begin{pmatrix} -3 \\ -1 \\ 8 \end{pmatrix},
\qquad s, t \in \mathbb{R},$$
is also an equation of this plane.

Activity 2.31 Verify this. Show that (1, 3, 4) is a point on the plane given by each
equation, and show that $(-3, -1, 8)^T$ is a linear combination of $(1, 2, -1)^T$ and
$(-2, 1, 7)^T$.
There is another way to describe a plane in R^3 geometrically which is often easier to
use. We begin with planes through the origin. Let n be a given vector in R^3 and
consider all position vectors x which are orthogonal to n. Geometrically, the set of all
such vectors describes a plane through the origin in R^3.
Again you should look at sketches in the texts, or try to imagine this by placing a
pencil perpendicular to a table top. The pencil represents a normal vector, the table top
a plane, and the point where the pencil is touching the table is the origin of your
coordinate system. Then any vector which you can draw on the table top is orthogonal
to the pencil, and conversely any point on the table top can be reached by a directed
line segment (from the point of the pencil) which is orthogonal to the pencil.
A vector, x, is orthogonal to n if and only if
$$\langle \mathbf{n}, \mathbf{x} \rangle = 0,$$
so this equation gives the position vectors, x, of points on the plane. If $\mathbf{n} = (a, b, c)^T$
and $\mathbf{x} = (x, y, z)^T$, then this equation can be written as
$$\langle \mathbf{n}, \mathbf{x} \rangle = \left\langle \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right\rangle = 0,$$
or
$$ax + by + cz = 0.$$
This is a Cartesian equation of a plane through the origin in R^3. The vector n is called
a normal vector to the plane. Any vector which is parallel to n will also be a normal
vector and will lead to the same Cartesian equation.
On the other hand, given a Cartesian equation
$$ax + by + cz = 0,$$
this equation represents a plane through the origin in R^3 with normal vector
$\mathbf{n} = (a, b, c)^T$.


To describe a plane which does not go through the origin, we choose a normal vector n
and one point P on the plane with position vector p. We then consider all displacement
vectors which lie in the plane with initial point at P. If x is the position vector of any
point on the plane, then the displacement vector x − p lies in the plane, and x − p is
orthogonal to n. Conversely, if the position vector x of a point satisfies $\langle \mathbf{x} - \mathbf{p}, \mathbf{n} \rangle = 0$,
then the vector x − p lies in the plane, so the point (with position vector x) is on the
plane.
(Again, think about the pencil perpendicular to the table top, only this time the point
where the pencil is touching the table is a point, P, on the plane, and the origin of your
coordinate system is somewhere else, say, in the corner on the floor.)
The orthogonality condition means that the position vector of any point on the plane is
given by the equation
$$\langle \mathbf{n}, \mathbf{x} - \mathbf{p} \rangle = 0.$$
Using properties of the inner product, we can rewrite this as
$$\langle \mathbf{n}, \mathbf{x} \rangle = \langle \mathbf{n}, \mathbf{p} \rangle,$$
where $\langle \mathbf{n}, \mathbf{p} \rangle = d$ is a constant.
If $\mathbf{n} = (a, b, c)^T$ and $\mathbf{x} = (x, y, z)^T$, then
$$ax + by + cz = d$$
is a Cartesian equation of a plane in R^3. The plane goes through the origin if and only if
d = 0.
Example 2.19 The equation
$$2x - 3y - 5z = 2$$
represents a plane which does not go through the origin, since (x, y, z) = (0, 0, 0)
does not satisfy the equation. To find a point on the plane we can choose any two of
the coordinates, say y = 0 and z = 0, and then the equation tells us that x = 1. So
the point (1, 0, 0) is on this plane. The components of a normal to the plane can be
read from this equation as the coefficients of x, y, z: $\mathbf{n} = (2, -3, -5)^T$.
How does the Cartesian equation of a plane relate to the vector parametric equation of
a plane? A Cartesian equation can be obtained from the vector equation algebraically,
by eliminating the parameters in the vector equation, and vice versa, as the following
example shows.
Example 2.20 Consider the plane
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}
= s\mathbf{v} + t\mathbf{w},
\qquad s, t \in \mathbb{R},$$
which is a plane through the origin parallel to the plane in Example 2.18 on page 43.
The direction vectors $\mathbf{v} = (1, 2, -1)^T$ and $\mathbf{w} = (-2, 1, 7)^T$ lie in the plane.


To obtain a Cartesian equation in x, y and z, we equate the components in this
vector equation:
$$x = s - 2t, \qquad y = 2s + t, \qquad z = -s + 7t,$$
and eliminate the parameters s and t. We begin by solving the first equation for s,
and then substitute this into the second equation to solve for t in terms of x and y:
$$s = x + 2t \;\Longrightarrow\; y = 2(x + 2t) + t = 2x + 5t \;\Longrightarrow\; 5t = y - 2x \;\Longrightarrow\; t = \frac{y - 2x}{5}.$$
Then substitute back into the first equation to obtain s in terms of x and y:
$$s = x + 2\left(\frac{y - 2x}{5}\right) \;\Longrightarrow\; 5s = 5x + 2y - 4x \;\Longrightarrow\; s = \frac{x + 2y}{5}.$$
Finally, we substitute for s and t in the third equation, z = −s + 7t, and simplify to
obtain a Cartesian equation of the plane,
$$3x - y + z = 0.$$

Activity 2.32 Carry out this last step to obtain the Cartesian equation of the
plane.

This Cartesian equation can be expressed as
$$\langle \mathbf{n}, \mathbf{x} \rangle = 0,
\quad\text{where}\quad
\mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix},
\qquad
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

The vector n is a normal vector to the plane. We can check that n is, indeed,
orthogonal to the plane by taking the inner product with the vectors v and w, which lie
in the plane.
Activity 2.33 Do this. Calculate $\langle \mathbf{n}, \mathbf{v} \rangle$ and $\langle \mathbf{n}, \mathbf{w} \rangle$, and verify that both inner
products are equal to zero.
Since n is orthogonal to both v and w, it is orthogonal to all linear combinations of
these vectors, and hence to any vector in the plane. So this plane can equally be
described as the set of all position vectors which are orthogonal to n.
Activity 2.34 Using the properties of the inner product, show that this last statement
is true. That is, if $\langle \mathbf{n}, \mathbf{v} \rangle = 0$ and $\langle \mathbf{n}, \mathbf{w} \rangle = 0$, then $\langle \mathbf{n}, s\mathbf{v} + t\mathbf{w} \rangle = 0$ for any
s, t ∈ R.
Can we do the same for a plane which does not pass through the origin? Consider the
following example.


Example 2.21 The plane we just considered in Example 2.20 is parallel to the
plane with vector equation
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}
= \mathbf{p} + s\mathbf{v} + t\mathbf{w},
\qquad s, t \in \mathbb{R},$$
which passes through the point (3, 7, 2). Since the planes are parallel, they will have
the same normal vectors. So the Cartesian equation of this plane is of the form
$$3x - y + z = d.$$
Since (3, 7, 2) is a point on the plane, it must satisfy the equation of the plane.
Substituting into the equation we find d = 3(3) − (7) + (2) = 4 (which is equivalent
to finding d by using $d = \langle \mathbf{n}, \mathbf{p} \rangle$). So the Cartesian equation we obtain is
$$3x - y + z = 4.$$
Conversely, starting with a Cartesian equation of a plane, we can obtain a vector
equation. We are looking for the position vectors of points on the plane whose
components satisfy 3x − y + z = 4, or equivalently, z = 4 − 3x + y. (We can solve for
any one of the variables x, y or z, but we chose z for simplicity.) So we are looking
for all vectors x such that
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x \\ y \\ 4 - 3x + y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + x\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
for any x, y ∈ R. Therefore
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},
\qquad s, t \in \mathbb{R},$$
is a vector equation of the same plane as that given by the original vector equation,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},
\qquad s, t \in \mathbb{R},$$
although it is difficult to spot this at a glance.

There are many ways to show that these two vector equations do represent the same
plane, but we can use what we know about planes to find the easiest. The planes
represented by the two vector equations have the same normal vector n, since the
vectors $(1, 0, -3)^T$ and $(0, 1, 1)^T$ are also orthogonal to n. So we know that the two
vector equations represent parallel planes. They are the same plane if they have a
point in common. It is far easier to find values of s and t for which $\mathbf{p} = (3, 7, 2)^T$
satisfies the new vector equation,
$$\begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},
\qquad s, t \in \mathbb{R},$$
than the other way around (which is by showing that (0, 0, 4) satisfies the original
equation), because of the positions of the 0s and 1s in the direction vectors.


Activity 2.35 Do this. You should be able to immediately spot the values of s and
t which work.

Using the examples we have just done, you should now be able to tackle the following
question.
Activity 2.36 The two lines, L1 and L2,
$$L_1: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},
\qquad
L_2: \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},
\qquad t \in \mathbb{R},$$
in Example 2.16 on page 42 are skew, and therefore are contained in parallel planes.
Find vector equations and Cartesian equations for these two planes.

Two planes in R^3 are either parallel or intersect in a line. For such questions it
is usually easier to use the Cartesian equations of the planes. If the planes are parallel,
then this will be obvious from looking at their normal vectors. If they are not parallel,
then the line of intersection can be found by solving the two Cartesian equations
simultaneously.
Example 2.22 The planes
$$x + 2y - 3z = 0 \quad\text{and}\quad -2x - 4y + 6z = 4$$
are parallel, since their normal vectors are related by $(-2, -4, 6)^T = -2(1, 2, -3)^T$.
The equations do not represent the same plane, since they have no points in
common; that is, there are no values of x, y, z which can satisfy both equations. The
first plane goes through the origin and the second plane does not.
On the other hand, the planes
$$x + 2y - 3z = 0 \quad\text{and}\quad x - 2y + 5z = 4$$
intersect in a line. The points of intersection are the points (x, y, z) which satisfy
both equations, so we solve the equations simultaneously. We begin by eliminating
the variable x from the second equation, by subtracting the first equation from the
second. This will naturally lead us to a vector equation of the line of intersection:
$$\begin{aligned} x + 2y - 3z &= 0 \\ x - 2y + 5z &= 4 \end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned} x + 2y - 3z &= 0 \\ -4y + 8z &= 4. \end{aligned}$$
This last equation tells us that if z = t is any real number, then y = −1 + 2t.
Substituting these expressions into the first equation, we find x = 2 − t. Then a
vector equation of the line of intersection is
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 2 - t \\ -1 + 2t \\ t \end{pmatrix}
= \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}.$$
This can be verified by showing that the point (2, −1, 0) satisfies both Cartesian
equations and that the vector $\mathbf{v} = (-1, 2, 1)^T$ is orthogonal to the normal vectors of
each of the planes (and therefore lies in both planes).
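The verification just described reduces to four inner products, which a short sketch (numpy assumed) can carry out:

```python
import numpy as np

n1 = np.array([1, 2, -3])   # normal of x + 2y - 3z = 0
n2 = np.array([1, -2, 5])   # normal of x - 2y + 5z = 4
p = np.array([2, -1, 0])    # claimed point on the line of intersection
v = np.array([-1, 2, 1])    # claimed direction of the line

print(np.dot(n1, p), np.dot(n2, p))   # 0 and 4: p satisfies both equations
print(np.dot(n1, v), np.dot(n2, v))   # 0 and 0: v lies in both planes
```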


Activity 2.37 Carry out the calculations in the above example and verify that the
line is in both planes.

2.12 Lines and hyperplanes in R^n


We can apply similar geometric language to vectors in Rn . Using the inner product in
Rn (defined in section 2.8.1), we define the length of a vector x = (x1 , x2 , . . . , xn )T by
x =

x21 + x22 + + x2n

or

= x, x .

We say that two vectors v, w ∈ R^n are orthogonal if and only if
$$\langle \mathbf{v}, \mathbf{w} \rangle = 0.$$
A line in R^n is the set of all points $(x_1, x_2, \ldots, x_n)$ whose position vectors x satisfy a
vector equation of the form
$$\mathbf{x} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R},$$
where p is the position vector of one particular point on the line and v is the direction
of the line. If we can write x = tv, t ∈ R, then the line goes through the origin.

2.12.1 Hyperplanes

The set of all points $(x_1, x_2, \ldots, x_n)$ which satisfy one Cartesian equation,
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = d,$$
is called a hyperplane in R^n.
In R^2, a hyperplane is a line; in R^3 it is a plane. For n > 3, we use the term hyperplane.
The vector
$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
is a normal vector to the hyperplane. Writing the Cartesian equation in vector form, a
hyperplane is the set of all vectors x ∈ R^n such that
$$\langle \mathbf{n}, \mathbf{x} - \mathbf{p} \rangle = 0,$$
where the normal vector n and the position vector p of a point on the hyperplane are
given.
Activity 2.38 How many Cartesian equations would you need to describe a line in
R^n? How many parameters would there be in a vector equation of a hyperplane?


Learning outcomes

At the end of this chapter and the relevant reading you should be able to:
explain what is meant by a matrix
use matrix addition, scalar multiplication and matrix multiplication appropriately
(know when and how these operations are defined)
manipulate matrices algebraically
state what is meant by the inverse of a square matrix, a power of a square matrix,
and the transpose of a matrix, and know the properties of these in order to
manipulate them
explain what is meant by a vector and by Euclidean n-space.
state what is meant by the inner product of two vectors and what properties it
satisfies
state what is meant by the length and direction of a vector, what is meant by a
unit vector
state the relationship between the inner product and the length and angle between
two vectors
explain what is meant by two vectors being orthogonal and how to determine this
find the equations, vector and Cartesian, of lines in R^2, and of lines and planes in R^3, and
work problems involving lines and planes
state what is meant by a line and by a hyperplane in R^n.

Exercises
Exercise 2.1
Given the matrices
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ 0 & 3 \end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix},
\quad
C = \begin{pmatrix} 1 & 2 & 1 \\ 3 & 0 & -1 \\ 4 & 1 & 1 \end{pmatrix},
\quad
D = \begin{pmatrix} 0 & 1 \\ 2 & 5 \\ 6 & 3 \end{pmatrix},$$
which of the following matrix expressions are defined? Compute those which are defined.
(a) Ab  (b) CA  (c) A + Cb  (d) A + D  (e) $\mathbf{b}^T D$  (f) $DA^T + C$  (g) $\mathbf{b}^T\mathbf{b}$  (h) $\mathbf{b}\mathbf{b}^T$  (i) Cb

Exercise 2.2
If a, b ∈ R^n, show that $\mathbf{a}^T\mathbf{b} = \mathbf{b}^T\mathbf{a}$.
Exercise 2.3
If A and B are invertible n × n matrices, then using the definition of the inverse, prove
that
$$(AB)^{-1} = B^{-1}A^{-1}.$$


Exercise 2.4
Solve for the matrix A:
$$3A + \begin{pmatrix} 1 & 0 \\ 2 & 5 \end{pmatrix}^T = 4A.$$

Exercise 2.5
If A and AB are invertible matrices and B is a matrix, solve the equation
$$(AB)^{-1} = 2A^{-1}$$
for B, and deduce that B is also invertible.


Exercise 2.6
If B is an m × k matrix, show that the matrix $B^T B$ is a k × k symmetric matrix.
Exercise 2.7
Are the lines with equations
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} + s\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 7 \\ 12 \\ 16 \end{pmatrix} + t\begin{pmatrix} 6 \\ 10 \\ 14 \end{pmatrix}$$
parallel or coincident (that is, the same line)?

Exercise 2.8
Which of the following sets of points in R^3 are collinear (that is, lie on a line)?
(a) (2, 1, 4), (4, 4, 1), (6, 7, 6)
(b) (1, 2, 3), (4, 2, 1), (1, 1, 2).
If the points are not collinear, find a vector equation for the plane determined by the
three points.
Exercise 2.9
Let $\ell_1$ be the line with equation
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$
let $\ell_2$ be the line through (5, 7, 4) and (8, 13, 3), and let $\ell_3$ be the line through (1, 17, 6)
parallel to the vector
$$\begin{pmatrix} 7 \\ 4 \\ 3 \end{pmatrix}.$$

Show that:
(a) $\ell_1$ and $\ell_3$ are skew,
(b) $\ell_1$ and $\ell_2$ intersect,
(c) $\ell_2$ and $\ell_3$ intersect.
Find the points of intersection. Determine whether each pair of intersecting lines is
orthogonal. If not, find the angle between them.
(d) Find the Cartesian equation of the plane containing $\ell_1$ and $\ell_2$.
(e) Find the Cartesian equations of the parallel planes containing $\ell_1$ and $\ell_3$.
Exercise 2.10
Find any points of intersection of the plane
$$\left\langle \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\rangle = 5$$
and each of the lines
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t\begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix},
\qquad s, t \in \mathbb{R}.$$

Exercise 2.11
Write down the general Cartesian equation of a vertical plane (parallel to the z-axis), a
non-vertical plane and a horizontal plane (parallel to the xy-plane), together with their
normal vectors.
Find the Cartesian equations of the line of intersection of the plane
$$3x - y + 2z = 4$$
and a general horizontal plane.
Exercise 2.12
Show that the line
$$\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix} + t\begin{pmatrix} -1 \\ 4 \\ 2 \end{pmatrix}$$
does not intersect the plane 2x + z = 8.
Find the equation of the line through the point (1, 3, 1) which is parallel to the normal
vector of the plane, and determine at what point it intersects the plane. Hence, or
otherwise, find the distance of the line to the plane.

Comments on selected activities


Feedback to activity 2.1
For this matrix, $a_{32} = 9$.

Feedback to activity 2.2
Only the second matrix is diagonal.


Feedback to activity 2.3
AB is 2 × 2 and BA is 3 × 3:
$$AB = \begin{pmatrix} 10 & 5 \\ 6 & 2 \end{pmatrix},
\qquad
BA = \begin{pmatrix} 7 & 5 & 10 \\ 2 & 1 & 3 \\ 3 & 3 & 4 \end{pmatrix}.$$

Feedback to activity 2.4
$$AB = \begin{pmatrix} 1 & 3 \\ 3 & 7 \end{pmatrix},
\qquad
BA = \begin{pmatrix} 4 & 6 \\ 3 & 4 \end{pmatrix}.$$

Feedback to activity 2.5
If A is m × n and B is n × p, then AB is an m × p matrix. The size of a matrix is not
changed by scalar multiplication, so both λ(AB) and (λA)B are m × p. Looking at the
(i, j) entries of each,
$$(\lambda(AB))_{ij} = \lambda(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj})
= \lambda a_{i1}b_{1j} + \lambda a_{i2}b_{2j} + \cdots + \lambda a_{in}b_{nj}
= ((\lambda A)B)_{ij},$$
so these two matrices are equal.

Feedback to activity 2.6
In this case I is m × m.
Feedback to activity 2.8
$$AB = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
and
$$BA = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Therefore
$$A^{-1} = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}.$$
Feedback to activity 2.9
We will show one way; you should show that $A^{-1}A = I$.
$$AA^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
= \frac{1}{ad - bc}\begin{pmatrix} ad - bc & -ab + ba \\ cd - dc & -bc + ad \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$


Feedback to activity 2.11
We will do the first, and leave the others to you. The inverse of $A^r$ is a matrix B such
that $A^r B = BA^r = I$. So show that the matrix $B = (A^{-1})^r$ works:
$$A^r(A^{-1})^r = \underbrace{(AA\cdots A)}_{r \text{ times}}\underbrace{(A^{-1}A^{-1}\cdots A^{-1})}_{r \text{ times}}.$$
Removing the brackets (matrix multiplication is associative) and repeatedly replacing each central
$AA^{-1}$ by I, the result will eventually be $AIA^{-1} = AA^{-1} = I$. To complete the proof,
show also that $(A^{-1})^r A^r = I$. Therefore $(A^r)^{-1} = (A^{-1})^r$.
Feedback to activity 2.12
Given the sizes of A and B, the matrix AB is m × p, so $(AB)^T$ is p × m. Also, $A^T$ is
n × m and $B^T$ is p × n, so the only way these matrices can be multiplied is as $B^T A^T$
(unless m = p).

Feedback to activity 2.13
The (i, j) entry of $B^T A^T$ is obtained by taking row i of $B^T$, which is column i of B, and
multiplying each term by the corresponding entry of column j of $A^T$, which is row j of
A, and then summing the products:
$$\left(B^T A^T\right)_{ij} = b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ni}a_{jn}.$$
This produces the same scalar as the (i, j) entry of $(AB)^T$.


Feedback to activity 2.14
The matrix is
$$A = \begin{pmatrix} 1 & 4 & -5 \\ 4 & 2 & 7 \\ -5 & 7 & 3 \end{pmatrix} = A^T.$$

Feedback to activity 2.15
$$\mathbf{a}^T\mathbf{b} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} = (3)$$
$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\begin{pmatrix} 4 & -2 & 1 \end{pmatrix}
= \begin{pmatrix} 4 & -2 & 1 \\ 8 & -4 & 2 \\ 12 & -6 & 3 \end{pmatrix}.$$

Feedback to activity 2.16
To prove properties (ii) and (iii), apply the definition to the LHS (left-hand side) of the
equation and rearrange the terms to obtain the RHS. For example, for x, y ∈ R^n, using
the properties of real numbers,
$$\alpha\langle \mathbf{x}, \mathbf{y} \rangle = \alpha(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)
= \alpha x_1 y_1 + \alpha x_2 y_2 + \cdots + \alpha x_n y_n
= (\alpha x_1)y_1 + (\alpha x_2)y_2 + \cdots + (\alpha x_n)y_n
= \langle \alpha\mathbf{x}, \mathbf{y} \rangle.$$
Do the same for property (iii).
The single property $\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle$ implies property (ii) by letting
β = 0 for the first equality and then letting α = 0 for the second, and property (iii) by
letting α = β = 1. On the other hand, if properties (ii) and (iii) hold, then
$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle
= \langle \alpha\mathbf{x}, \mathbf{z} \rangle + \langle \beta\mathbf{y}, \mathbf{z} \rangle \qquad\text{(by property (iii))}$$
$$= \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle \qquad\text{(by property (ii))}.$$

Feedback to activity 2.19
‖a‖ = 5, so
$$\mathbf{u} = \frac{1}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix}
\quad\text{and}\quad
\mathbf{w} = -\frac{2}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix}.$$

Feedback to activity 2.20
In the figure, the line from the origin to the point $(a_1, a_2, 0)$ lies in the xy-plane, and by Pythagoras'
theorem it has length $\sqrt{a_1^2 + a_2^2}$. Applying Pythagoras' theorem again to the right
triangle shown, we have
$$\|\mathbf{a}\| = \sqrt{\left(\sqrt{a_1^2 + a_2^2}\right)^2 + a_3^2} = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$

Feedback to activity 2.21
We have
$$\mathbf{a} = \begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix}
\qquad
\mathbf{b} = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}
\qquad
\mathbf{c} = \mathbf{b} - \mathbf{a} = \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}.$$
The cosines of the three angles are given by
$$\frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{-1 + 2 + 8}{\sqrt{9}\sqrt{18}} = \frac{1}{\sqrt{2}};
\qquad
\frac{\langle \mathbf{a}, \mathbf{c} \rangle}{\|\mathbf{a}\|\,\|\mathbf{c}\|} = \frac{-2 - 2 + 4}{\sqrt{9}\sqrt{9}} = 0;
\qquad
\frac{\langle \mathbf{b}, \mathbf{c} \rangle}{\|\mathbf{b}\|\,\|\mathbf{c}\|} = \frac{2 - 1 + 8}{\sqrt{18}\sqrt{9}} = \frac{1}{\sqrt{2}}.$$
Thus the triangle has a right angle, and two angles of π/4.
Alternatively, as the vectors a and c are orthogonal and have the same length, it
follows immediately that the triangle is right-angled and isosceles.
Feedback to activity 2.22
If t = 3, then q = (3, 7)T . You are asked to sketch the position vector q as this sum to
illustrate that the vector q does locate a point on the line, but the vector q does not lie
on the line.
Feedback to activity 2.23
Here s = −1.
Feedback to activity 2.24
We will work through this for the second equation and leave the first for you. We have,
for s ∈ R,
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s\begin{pmatrix} -2 \\ -4 \end{pmatrix}
\;\Longrightarrow\;
x = 1 - 2s,\ \ y = 3 - 4s
\;\Longrightarrow\;
\frac{1 - x}{2} = s = \frac{3 - y}{4},$$
which yields 2(1 − x) = 3 − y, or y = 2x + 1.


Feedback to activity 2.25
A vector equation of the line is
$$\mathbf{x} = \begin{pmatrix} -1 \\ 1 \end{pmatrix} + t\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \mathbf{p} + t\mathbf{v},
\qquad t \in \mathbb{R},$$
where we have used p to locate a point on the line, and the direction vector v = q − p.
The point (7, 3) is on the line (t = 2), and this is the only point of this form on the line,
since once 7 is chosen for the x-coordinate, the y-coordinate is determined.
Feedback to activity 2.26
Once given, for example, that the x-coordinate is x = 3, the parameter t of the vector
equation is determined, and therefore so are the other two coordinates. We saw in Example
2.14 that t = 2 satisfies the first two equations, and it certainly does not satisfy the third
equation, 1 = 0 + t.
Feedback to activity 2.27
This is similar to the earlier activity in R^2. A vector equation of the line is
$$\mathbf{x} = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} + t\begin{pmatrix} -4 \\ 1 \\ -1 \end{pmatrix} = \mathbf{p} + t\mathbf{v},
\qquad t \in \mathbb{R}.$$
The point (7, 1, 3) is not on this line, but the point (5, 0, 3) is on the line. The value
t = −1 will then satisfy all three component equations. There is, of course, only one
possible choice for the values of c and d.
Feedback to activity 2.28
The lines are not parallel because their direction vectors are not parallel.


Feedback to activity 2.30
If v and w are parallel, then this equation represents a line in the direction of v. If
w = λv, then this line can be written as
$$\mathbf{x} = \mathbf{p} + (s + \lambda t)\mathbf{v}, \quad\text{where } r = s + \lambda t \in \mathbb{R}.$$
Feedback to activity 2.34
Using the properties of the inner product, we have, for any s, t ∈ R,
$$\langle \mathbf{n}, s\mathbf{v} + t\mathbf{w} \rangle = s\langle \mathbf{n}, \mathbf{v} \rangle + t\langle \mathbf{n}, \mathbf{w} \rangle = s \cdot 0 + t \cdot 0 = 0.$$
Feedback to activity 2.35
Equating components in the vector equation, we have 3 = s and 7 = t from the first two
equations, and these values do satisfy the third equation, 2 = 4 − 3s + t.
Feedback to activity 2.36
The parallel planes must each contain the direction vectors of each of the lines as
displacement vectors, so the vector equations of the planes are, respectively,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}$$
and
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},$$
where s, t ∈ R.
The parallel planes have the same normal vector, which we need for the Cartesian
equations. Recall that in Example 2.21 on page 47 we found a Cartesian equation and a
normal vector of the first plane, the plane which contains L1:
$$3x - y + z = 4 \quad\text{with}\quad \mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}.$$
Note that the point (1, 3, 4) is on this plane because it satisfies the equation, but the
point (5, 6, 1) does not. Substituting (5, 6, 1) into the equation 3x − y + z = d, we find
the Cartesian equation of the parallel plane which contains L2 is
$$3x - y + z = 10.$$
Feedback to activity 2.37
As stated, to verify that the line is in both planes, show that its direction vector is
perpendicular to the normal vector of each plane, and that the point (2, 1, 0) satisfies
both equations.
Feedback to activity 2.38
To describe a line in Rn you need n 1 Cartesian equations. A vector parametric
equation of a hyperplane in Rn would require n 1 parameters.

57

2. Matrices and vectors

Comments on exercises

Solution to exercise 2.1


(a) Ab is not defined.
(b) CA is (3 3)(3 2) giving a

1
CA = 3
4

3 2 matrix,

2 1
2 1
4 6
0 1 1 1 = 6 0
1 1
0 3
9 8

(c) A + Cb is not defined since Cb is 3 1 and A is 3 2.

2 2
(d) A + D = 3 6
6 6

0 1
(e) bT D = ( 1 1 1 ) 2 5 = ( 4 3 )
6 3

(f) The product DAT is a 3 3 matrix, as is C, so this expression is defined:

0 1
1 1 3
2 1 0
DAT = 2 5
= 9 7 15
1 1 3
6 3
15 9 9

1 1 3
1 2 1
2 3 4
DAT + C = 9 7 15 + 3 0 1 = 12 7 14
15 9 9
4 1 1
19 10 10

1
(g) bT b = ( 1 1 1 ) 1 = (3)
1

1
1
1 1
(h) bbT = 1 ( 1 1 1 ) = 1
1 1
1
1 1 1


1 2 1
1
2

(i) Cb = 3 0 1
1
= 4
4 1 1
1
4
Solution to exercise 2.2

aT b is the product of a 1 n matrix and an n 1 matrix, so it is a 1 1 matrix, which


is identified with the scalar it represents (that is, the scalar given by a, b ). Since the
product is a 1 1 matrix, it is symmetric, so that
aT b = (aT b)T = bT a

58

2.12. Comments on exercises

Solution to exercise 2.3


By definition, to prove that the matrix AB is invertible you have to show that there
exists a matrix, C, such that
(AB)C = C(AB) = I.
You are given that C = B 1 A1 . Since both A and B are invertible matrices, you
know that both A1 and B 1 exist and both are n n, so the matrix product B 1 A1
is defined. So all you need to do is to show that if you multiply AB on the left or on the
right by the matrix B 1 A1 then you will obtain the identity matrix, I.
(AB)(B 1 A1 )= A(BB 1 )A1
= AIA1
= AA1
=I

matrix multiplication is associative


by the definition of B 1
since AI = A for any matrix A
by the definition of A1

In the same way,


(B 1 A1 )(AB) = (B 1 )(A1 A)(B) = B 1 IB = B 1 B = I
Hence B 1 A1 is the inverse of the matrix AB.
Solution to exercise 2.4
A=

1 2
0 5

Solution to exercise 2.5


Begin by taking the inverse of both sides of this equation and then use A1 . A is square,
since it is invertible. If A is n n, then B must also be n n for AB to be defined and
invertible. Simplifying the equation, you can deduce that B = 12 I, where I is the n n
identity matrix, so that B is invertible.
Solution to exercise 2.6
Since B T is a k m matrix, B T B is k k. (B T B)T = B T (B T )T = B T B which shows
that it is symmetric.
Solution to exercise 2.7
The direction of the second line is the same as the first line, since


6
3
10 = 2 5
14
7

So if the two lines have at least one point in common, they must be the same line. To
show the two lines are coincident you only need to show that one point on the second
line is also on the first line. For example, the two lines are coincident if and only if
(7, 12, 16) lies on the first line and this is true if and only if there is a t R such that

3
7
1
7 = 1 + 3t
12 = 2 + t 5
12 = 2 + 5t

2
7
16
16 = 2 + 7t

59

2. Matrices and vectors

The three component equations have the consistent solution t = 2. Hence the lines are
the same (coincident).

Solution to exercise 2.8


Three points are collinear if the third point lies on the line determined by the first two.
(a) An equation for the line which goes through the two points (2, 1, 4) and (4, 4, 1)
is



x
2
2
2
4
2
y = 1 + t 3 , where v = 3 = 4 1
z
4
5
5
1
4
is the direction of the line.

The point (6, 7, 6) is on this line there is a t R such that

6
2
2
7 = 1 + t 3
6
4
5

6 = 2 + 2t
7 = 1 + 3t

6 = 4 5t

This system of equations has a solution, namely t = 2, so the points are collinear. (b)
In the same way, the three given points lie on the same line if and only if there is a
t R such that

1
1
5
1 = 1 + 5t
1 = 2 + t0
1=2

2
3
2
2 = 3 + 2t
This system of equations has no solution, so the points are not collinear.

In this case the three points determine a plane in R3 . A vector equation is given by



x
1
5
0
y = 2 + t 0 + s 1 = p + tv + sw,
z
3
2
1

s, t R

where w = (0, 1, 1)T is the vector from the point (1, 1, 2) to the point (1, 2, 3).
Remember that any point on the plane can be used to locate it, and any non-parallel
vectors in the plane can be used for the directions.
Solution to exercise 2.9
The equation of line 1 is given. The equations of the other two lines are:

5
3
2 : x = 7 + s 6 , s R
4
1

60

1
7
3 : x = 17 + q 4 , q R.
6
3

2.12. Comments on exercises

(a) The lines 1 and 3 are not parallel since their direction vectors are not parallel (not
multiples of one another). The two lines intersect there is a point (x, y, z) which
lies on both lines there are scalars t, q R for which

1
1
1
7
x = 1 + t 2 = 17 + q 4 .
2
3
6
3

The component equations are a system of three equations in two unknowns:

1 + t = 1 + 7q

t 7q = 0
1 + 2t = 17 4q
2t + 4q = 18

2 + 3t = 6 3q
3t + 3q = 4

The top two equations can be solved simultaneously by substituting t = 7q (from the
first equation) into the second equation
2t + 4q = 18 2(7q) + 4q = 18 18q = 18 q = 1.
Then the first equation yields t = 7q = 7(1) = 7. But although this solution satisfies the
top two equations, it does not satisfy the third equation,
3t + 3q = 4

3(7) + 3(1) = 4.

Hence these equations have no solution, so no such scalars exist. Since the lines are not
parallel and do not intersect, they are skew.
(b) In the same way, the lines 1 and 2 intersect if there are scalars t, s R such that

1
1
5
3
1 + t = 5 + 3s
1 + t 2 = 7 + s 6
1 + 2t = 7 + 6s

2
3
4
1
2 + 3t = 4 s

These equations have the unique solution t = 1, s = 1. (You should check that these
satisfy all three equations.) The point of intersection can be found from either line.
Using 1 and t = 1,


1
1
2

x = 1 + 1 2 = 1 .
2
3
5
Check using 2 and s = 1. The angle of intersection of these lines is obtained from the
scalar product of their direction vectors.

1
3
2 , 6 = 3 + 12 3 = 12.
3
1

Since this is not zero, the lines are not orthogonal. The angle between them is given by
cos =

12
12

22

32

32

62

(1)2

12
6
= = .
14 46
7 23

Leave the answer in this form as you are not expected to use a calculator.

61

2. Matrices and vectors

(c) The lines 2 and 3 intersect there are scalars s, q R such that

5
3
1
7
5 + 3s = 1 + 7q
7 + s 6 = 17 + q 4
7 + 6s = 17 4q

4
1
6
3
4 s = 6 3q

These equations have the solution s = 1, q = 1. So the lines do intersect. The point of
intersection is (8, 13, 3). The scalar product of the direction vectors is

3
7
6 , 4 = 21 24 + 3 = 0.
1
3
Since this is zero, the lines are orthogonal.

Now for the Cartesian equations of the planes. We will illustrate a slightly different
method than the one already shown in this chapter.
(d) We need a normal vector to the plane; that is, a vector, n which is orthogonal to the
direction vectors of both lines. Let n = (a, b, c)T , so that

a
1
a
3
b , 2 = 0 and
b , 6 = 0.
c
3
c
1
This leads to the simultaneous equations

a + 2b + 3c = 0
3a + 6b c = 0

Multiplying the first equation by 3 and then subtracting the two equations, we find the
solution c = 0 and a = 2b, where b is any real number. Taking b = 1, a normal vector
is n = (2, 1, 0)T , and the Cartesian equation of the plane is of the form 2x + y = d.
Substituting in any point on the plane, such as (1, 1, 2), we find d = 3. So the
Cartesian equation of the plane containing 1 and 2 is
2x y = 3.
This is a plane which is parallel to the z-axis. You should check that the point (5, 7, 4)
(on the line 2 ) and the point of intersection, (2, 1, 5), also satisfy this equation.
(e) Using the same technique, a normal vector to both planes is n = (1, 4, 3)T . The
Cartesian equation of the plane containing 1 is x + 4y 3z = 9, and the parallel
plane containing 3 is x + 4y 3z = 51.
Solution to exercise 2.10
Solve the equations simultaneously to find any points of intersection. The value of t for
the point of intersection of the plane and the first line is given by

1
1
1
2 + t 0 , 1 = 5
3
5
1

62

2.12. Comments on exercises

Multiply out the scalar product and solve for t to obtain t = 12 . Substituting this back
).
into the equation of the line, the point is ( 23 , 2, 11
2
The value of t for the point of intersection of the plane and the second line is given by

1
1
1
2 + t 2 , 1 = 5
3
1
1

Multiplying out the scalar product and simplifying yields the false statement that 2 = 5,
which is impossible, hence the line does not intersect the plane. In fact, it is parallel to
the plane. The direction vector of the line is orthogonal to the normal to the plane:

1
1
2 , 1 = 0.
1
1
What would have happened if the line had been in the plane? Try this. Write down a
line in the plane parallel to the one given and see what happens when you try to solve
for t.
Solution to exercise 2.11
This time notice that the normal to the plane is orthogonal to the direction of the line,
as

2
1
0 , 4 = 0.
1
2

Since the point (1, 3, 1) on the line does not satisfy the equation of the plane, the line is
parallel to the plane. Therefore it makes sense to ask for the distance of the line from
the plane. This can be found by dropping a perpendicular from the line to the plane
and measuring its length. A method for doing this is given in the question.
The line through (1, 3, 1) and parallel to the normal to the plane is perpendicular to the
plane. A vector equation of the line is


x
1
2
y = 3 + t 0 , t R.
z
1
1
Equating components, we have x = 1 + 2t, y = 3, and z = 1 + t. At the point of
intersection of the line with the plane, these components will satisfy the equation of the
plane, so that
2x + z = 8 2(1 + 2t) + (1 + t) = 8 3 + 5t = 8,
or t = 1. Then putting this value for t in the line, we find the point of intersection is
(3, 3, 2).
The distance between the line and the plane is the distance between this point and the
point (1, 3, 1), which is given by the length of the vector

3
1
2

v = 3 3 = 0 , so the distance is 5.
2
1
1

63

2. Matrices and vectors

64

Chapter 3
Systems of linear equations
3
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 2.
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 16, 17.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 1.

Aims of the chapter


Being able to solve systems of many linear equations in many unknowns is a vital part
of linear algebra. This is where we begin to use matrices and vectors as essential
elements of obtaining and expressing the solutions.
We begin by expressing a system in matrix form and defining elementary row
operations on an augmented matrix. These operations mimic standard operations on
systems of equations. We then learn a precise algorithm to apply these operations in
order to put the matrix in a form called reduced echelon form, from which the general
solution to the system is readily obtained. The method of manipulating matrices in this
way to obtain the solution is known as Gaussian elimination.
We then examine the forms of solutions to systems of linear equations and look at their
properties, defining what is meant by a homogeneous system and the null space of a
matrix.

3.1 Systems of linear equations


A system of m linear equations in n unknowns x1 , x2 , . . . , xn is a set of m equations of
the form
a11 x1 + a12 x2 + + a1n xn = b1
a21 x1 + a22 x2 + + a2n xn = b2
..
.

..
.

am1 x1 + am2 x2 + + amn xn = bm .

65

3. Systems of linear equations

The numbers aij are known as the coefficients of the system.


We say that s1 , s2 , . . . , sn is a solution of the system if all m equations hold true when
x1 = s1 , x2 = s2 , . . . , xn = sn .

Sometimes a system of linear equations is known as a set of simultaneous equations;


such terminology emphasises that a solution is an assignment of values to each of the n
unknowns such that each and every equation holds with this assignment. It is also
referred to simply as a linear system.
In order to deal with large systems of linear equations we write them in matrix form.
Definition 3.1 (Coefficient matrix) The matrix A = (aij ), whose (i, j)-entry is the
coefficient aij of the system of linear equations is called the coefficient matrix.

a11 a12 ... a1n


a21 a22 ... a2n
A=
..
..
..
...
.
.
.
am1 am2 ... amn

Let x = (x1 , x2 , . . . , xn )T be the vector of unknowns. Then the product Ax of the m n


coefficient matrix A and the n 1 column vector x is an m 1 matrix,

a11 a12 ... a1n


x1
a11 x1 + a12 x2 + + a1n xn
a21 a22 ... a2n x2 a21 x1 + a22 x2 + + a2n xn
.
. =
,
..
..
..
..
..
..

.
.
. ..
.
.
am1 am2 ... amn
xn
am1 x1 + an2 x2 + + amn xn
whose entries are the left-hand sides of our system of linear equations.

If we define another column vector b, whose m components are the right-hand sides bi ,
the system is equivalent to the matrix equation
Ax = b.
Example 3.1 Consider the following system of three linear equations in the three
unknowns, x1 , x2 , x3 :
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 x2 + 2x3 = 5
This system can be written in

1 1
A = 2 1
1 1
The entries of the matrix
multiplication of Ax,

1
2
1

66

matrix notation as Ax = b with



1
x1
3
1,
x = x2 ,
b = 4.
2
x3
5

A are the coefficients of the xi . If we perform the matrix


1 1
x1
x1 + x2 + x3
1 1 x2 = 2x1 + x2 + x3
1 2
x3
x1 x2 + 2x3

3.2. Row operations

the matrix product is a 3 1 matrix, a column vector. If Ax = b, then


x1 + x2 + x3
3
2x1 + x2 + x3 = 4
x1 x2 + 2x3
5

and these two 3 1 matrices are equal if and only if their components are equal.
This gives precisely the three linear equations.

3.2 Row operations


Our purpose is to find an efficient means of finding the solutions of systems of linear
equations. To do this, we begin by looking at a simple example.
An elementary way of solving a system of linear equations such as
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 x2 + 2x3 = 5
is to begin by eliminating one of the variables from two of the equations. For example,
we can eliminate x1 from the second equation by multiplying the first equation by 2 and
then subtracting it from the second equation. Lets do this. Twice the first equation
gives the equation 2x1 + 2x2 + 2x3 = 6. Subtracting this from the second equation,
2x1 + x2 + x3 = 4 yields the equation x2 x3 = 2. We can now replace the second
equation in the original system by this new equation,
x1 + x2 + x3 = 3
x2 x3 = 2
x1 x2 + 2x3 = 5
and the new system will have the same set of solutions as the original system.
We can continue in this manner to obtain a simpler set of equations with the same
solution set as the original system. So what operations can we perform on the equations
of a linear system without altering the set of solutions? We can:
O1 multiply both sides of an equation by a non-zero constant
O2 interchange two equations.
O3 add a multiple of one equation to another.
These operations do not alter the set of solutions since the restrictions on the variables
x1 , x2 , . . . , xn given by the new equations imply the restrictions given by the old ones
(that is, we can undo the manipulations made on the old system).
At the same time, we observe that these operations really only involve the coefficients of
the variables and the right sides of the equations.

67

3. Systems of linear equations

For example, using the same system as above expressed in matrix form, Ax = b, then
the matrix

1 1 1 3
(A|b) = 2 1 1 4
1 1 2 5

which is the coefficient matrix A together with the constants b as the last column,
contains all the information we need to use, and instead of manipulating the equations,
we can instead manipulate the rows of this matrix. For example, subtracting twice
equation 1 from equation 2 is executed by taking twice row 1 from row 2.
These observations form the motivation behind a method to solve systems of linear
equations, known as Gaussian elimination. To solve a linear system Ax = b we first
form the augmented matrix, denoted (A|b) which is A with column b tagged on.

Definition 3.2 (Augmented matrix)

a11 a12
a21 a22
A=
..
..
...
.
.
am1 am2
Then the matrix

If Ax = b is a system of linear equations,



a1n
x1
b1

a2n
x
b
.2 b = .2
x
=
..
..
..
.
amn
xn
bm

a11
a21
(A|b) =
...
am1

a12
a22
..
.
am2

a1n
a2n
..
..
.
.
amn

is called the augmented matrix of the linear system.

b1
b2
..
.
bm

From the operations listed above for manipulating the equations of the linear system,
we define corresponding operations on the rows of the augmented matrix.
Definition 3.3 (Elementary row operations) These are:
RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.

3.3 Gaussian elimination


We will describe a systematic method for solving systems of linear equations by an
algorithm which uses row operations to put the augmented matrix into a form from
which the solution of the linear system can be easily read. To illustrate the algorithm,
we will use two examples: the augmented matrix (A|b) of the example in the previous
section and the augmented matrix (B|b) of a second system of linear equations.

1 1 1 3
0 0 2 3
( A|b ) = 2 1 1 4 ,
( B|b ) = 0 2 3 4 .
1 1 2 5
0 0 1 5

68

3.3. Gaussian elimination

3.3.1

The algorithm reduced row echelon form

Using the above two examples, we will carry out the algorithm in detail.
(1) Find the leftmost column that

1 1 1
2 1 1
1 1 2

is not all zeros.

3
0 0 2 3
0 2 3 4
4
5
0 0 1 5

(This is column 1 of (A|b) and column 2 of (B|b).)

(2) Get a non-zero entry at the top of this column.


The matrix on the left already has a non-zero entry at the top.
right, we interchange row 1 and row 2.

1 1 1 3
0 2 3
2 1 1 4
0 0 2
1 1 2 5
0 0 1

For the matrix on the

4
3
5

(3) Make this entry 1; multiply the first row by a suitable number or interchange two
rows. This is called a leading one.
The left-hand matrix already
multiply row 1 by 12 .

1 1
2 1
1 1

had a 1 in this position. For the second matrix, we

1 3
1 4
2 5

0 1
0 0
0 0

3
2

2
1

2
3
5

(4) Add suitable multiples of the top row to rows below to make all entries below the
leading one become zero.
For the matrix on the left, we add 2 times row 1 to row 2, then we add 1 times row 1
to row 3. The first operation is the same as the one we performed earlier on the example
using the equations. The matrix on the right already has zeros under the leading one.

1 1
1
3
0 1 32 2
0 1 1 2
0 0 2 3
0 2 1
2
0 0 1 5

At any stage we can read the modified system of equations from the new augmented
matrix, remembering that column 1 gives the coefficients of x1 , column 2 the coefficients
of x2 and so on, and that the last column represents the right-hand side of the equations.
For example the matrix on the left is now the augmented matrix of the system
x1 + x2 + x3 = 3
x2 x3 = 2
2x2 + x3 = 2
The next step in the algorithm is
(5) Cover up the top row and apply steps 1 to 4 again.

69

3. Systems of linear equations

This time we will work on one matrix at a time. After the first four steps, we have
altered the augmented matrix (A|b) to:

1 1
1
3
(A|b) 0 1 1 2
0 2 1
2

We now ignore the top row. Then the leftmost column which is not all zeros is column
2. This column already has a non-zero entry at the top. We make it into a leading one
by multiplying row 2 by 1:

1 1 1 3
0 1 1 2
0 2 1 2
This is now a leading one, and we use it to
row 3:

0
0

obtain zeros below. We add 2 times row 2 to

1 1 3
1 1 2
0 3 6

Now we cover up the top two rows and start again with steps 1 to 4. The leftmost
column which is not all zeros is column 3. We multiply row 3 by 31 to obtain the final
leading one:

1 1 1 3
0 1 1 2 .
0 0 1 2

This last matrix is in row echelon form, or simply, echelon form.

Definition 3.4 (Row echelon form) A matrix is said to be in echelon form (or row
echelon form) if it has the following three properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
Activity 3.1
Check that the above matrix satisfies these three properties.
The term echelon form takes its name from the form of the equations at this stage.
Reading from the matrix, these equations are
x1 + x2 + x3 = 3
x2 + x3 = 2
x3 = 2
We could now use a method called back substitution to find the solution of the system.
The last equation tells us that x3 = 2. We can then substitute this into the second
equation to obtain x2 , and then use these two values to obtain x1 . This is an acceptable
approach, but we can effectively do the same calculations by continuing with row
operations. So we continue with one final step of our algorithm.

70

3.3. Gaussian elimination

(6) Begin with the last row and add suitable multiples to each row above to get zeros
above the leading ones.
Continuing from the row echelon form and using row 3, we replace row 2 with row
2row 3, and at the same time we replace row 1 with row 1row 3.

1 1 1 3
1 1 0 1
(A|b) 0 1 1 2 0 1 0 0
0 0 1 2
0 0 1 2

We now have zeros above the leading one in column 3. There is only one more step to
do, and that is to get a zero above the leading one in column 2. So the final step is row
1row 2,

1 0 0 1
0 1 0 0 .
0 0 1 2
This final matrix is now in reduced (row) echelon form. It has the additional property
that every column with a leading one has zeros elsewhere.
Definition 3.5 (Reduced row echelon form) A matrix is said to be in reduced
echelon form (or reduced row echelon form) if it has the following four properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
(4) Every column with a leading one has zeros elsewhere.
If R is the reduced row echelon form of a matrix M, we will sometimes write
R = RREF (M).
The solution can now be read from the matrix. The top row says x1 = 1, the second row
says x2 = 0, and the third row says x3 = 2. The original system has been reduced to the
matrix equation


1 0 0
x1
1
0 1 0 x2 = 0
0 0 1
x3
2
giving the solution


1
x1
x2 = 0 .
2
x3

This system of equations has a unique solution.

We can check that this solution is the correct solution of the original system by
substituting it into the equations, or equivalently, by multiplying out the matrices Ax
to show that Ax = b.
Activity 3.2 Do this: check that


1 1 1
1
3
2 1 10 = 4.
1 1 2
2
5

71

3. Systems of linear equations

We now return to the example (B|b) which we left after the first round of steps 1 to 4,
and apply step 5. We cover up the top row and apply steps 1 to 4 again. We need to
have a leading one in the second row, which we achieve by switching row 2 and row 3:

0 1 32 2
0 1 23 2
(B|b) 0 0 2 3 0 0 1 5
0 0 1 5
0 0 2 3
We obtain a zero under this leading one by replacing row 3 with row 3 + (2) times
row 2,

0 1 32 2
0 0 1 5
0 0 0 7

and then finally multiply row 3 by 17

0 1

0 0
0 0

3
2

1
0

2
5
1

This matrix is now in row echelon form, but we shall see that there is no point in going
on to reduced row echelon form. This last matrix is equivalent to the system

x1
2
0 1 23
0 0 1 x2 = 5
0 0 0
x3
1

What is the bottom equation of this system? Row 3 says 0x1 + 0x2 + 0x3 = 1, that is
0 = 1 which is impossible! This system has no solution.

3.3.2

Consistent and inconsistent systems

Definition 3.6 (Consistent) A system of linear equations is said to be consistent if it


has at least one solution. It is inconsistent if there are no solutions.

If the row echelon form (REF) of the augmented matrix ( A|b ) contains a row
(0 0 0 1) then it is inconsistent.
It is instructive to lookat the original
systems represented by
matrices,
these augmented

1 1 1 3
0 0 2 3
( A|b ) = 2 1 1 4
( B|b ) = 0 2 3 4
1 1 2 5
0 0 1 5

= 3
x1 + x2 + x3 = 3
2x3
2x1 + x2 + x3 = 4
2x2 + 3x3 = 4 .

x1 x2 + 2x3 = 5
x3
= 5
We see immediately that the system Bx = b is inconsistent since it is not possible for
both the top and the bottom equation to hold.

72

3.3. Gaussian elimination

Since these are systems of three equations in three variables, we can interpret these
results geometrically. Each of the equations above represents a plane in R3 . The system
Ax = b represents three planes which intersect in the point (1, 0, 2). This is the only
point which lies on all three planes. The system Bx = b represents three planes, two of
which are parallel (the horizontal planes 2x3 = 3 and x3 = 5), so there is no point which
lies on all three planes.
This method of reducing the augmented matrix to reduced row echelon form is known
as Gaussian elimination or Gauss-Jordan elimination.
We have been very careful in illustrating this method to explain what the row
operations were for each step of the algorithm, but in solving a system with this method
it is not necessary to include all this detail. The aim is to use row operations to put the
augmented matrix into reduced row echelon form, and then read off the solutions from
this form. Where it is useful to indicate the operations, you can do so by writing, for
example, R2 2R1 , where we always write down the row we are replacing first, so that
R2 2R1 indicates replace row 2 (R2 ) with row 2 plus 2 times row 1 (R2 2R1 ).
Otherwise, you can just write down the sequence of matrices linked by arrows. It is
important to realise that once you have performed a row operation on a matrix, the
new matrix obtained is not equal to the previous one, this is why you must use arrows
between the steps and not equal signs.
Example 3.2 We repeat the reduction of (A|b) to illustrate this for the system
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 x2 + 2x3 = 5
Begin by writing down the augmented matrix, then apply the row operations to
carry out the algorithm. Here we will indicate the row operations.

1 1 1 3
(A|b) = 2 1 1 4
1 1 2 5

1 1
1
3
R2 2R1 0 1 1 2
R3 R1
0 2 1
2

1 1 1 3

(1)R2 0 1 1 2
0 2 1 2

1 1 1 3
0 1 1 2
R3 + 2R2 0 0 3 6

1 1 1 3
0 1 1 2.
1
( 3 )R3 0 0 1 2

73

3. Systems of linear equations

The matrix is now in row echelon form, continue to reduced row echelon form,

R1 R3 1 1 0 1
R2 R3 0 1 0 0
0 0 1 2

R1 R2 1 0 0 1
0 1 0 0.
0 0 1 2
The augmented matrix is now in reduced row echelon form.

Activity 3.4 Use Gaussian elimination to solve the following system of equations,
x1 + x2 + x3 = 6
2x1 + 4x2 + x3 = 5
2x1 + 3x2 + x3 = 6.
Be sure to follow the algorithm to put the augmented matrix into reduced row
echelon form using row operations.

3.3.3

Linear systems with free variables

Gaussian elimination can be used to solve systems of linear equations with any number
of equations and unknowns. We will now look at an example of a linear system with
four equations in five unknowns,
x1 + x2 + x3 + x4 + x5
2x1 + x2 + x3 + x4 + 2x5
x1 x2 x3 + x4 + x5
x1 + x4 + x5

=
=
=
=

3
4
5
4.

1
2
1
1

3
4
.
5
4

The augmented matrix is

1 1
1 1
2 1
1 1
(A|b) =
1 1 1 1
1 0
0 1

Check that your augmented matrix is correct before you proceed, or you could be
solving the wrong system! A good method is to first write down the coefficients by rows,
so reading across the equations, and then to check the columns do correspond to the
coefficients of that variable. Now follow the algorithm to put (A|b) into reduced row
echelon form.

1 1
1
1 1 3

R2 2R1
0 1 1 1 0 2

0 2 2 0 0 2
R3 R1
R4 R1
0 1 1 0 0 1

74

3.3. Gaussian elimination

1 1
1

(1)R2 0 1
1

0 2 2
0 1 1

0
R3 + 2R2 0
R4 + R2
0

0
( 12 )R3 0
0

1
1
0
0
1
1
0
0

0
0
R4 R3 0

This matrix is in row echelon form.


with the third row,
R1 R3
R2 R3

1
1
0
0

1
1
0
0
1
1
0
0

1
1
0
0

1
1
2
1
1
1
1
1

1
1
0
0

1
0
0
0

3
2

6
3

3
2

3
3

1
0
0
0
1
1
1
0

3
2

2
1

1
0
0
0

1
0
0
0

3
2

3
0

We continue to reduced row echelon form, starting

1
0

0
0

1
R1 R2
0
0

1
1
0
0

1
1
0
0

0
0
1
0

0
1
0
0

0
1
0
0

0
0
1
0

1 0
0 1

0 3
0 0

1 1
0 1
.
0 3
0 0

There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. Since the last row gives no information, but merely states
that 0 = 0, the matrix is equivalent to the system of equations
x1 + 0 + 0 + 0 + x5 = 1
x2 + x3 + 0 + 0 = 1
x4 + 0 = 3.
The form of these equations tells us that we can assign any values to x3 and x5 , and
then the values of x1 , x2 and x4 will be determined.
Definition 3.7 (Leading variables) The variables corresponding to the columns with
leading ones in the reduced row echelon form of an augmented matrix are called leading
variables. The other variables are called non-leading variables.
In this example the variables x1 , x2 and x4 are leading variables, x3 and x5 are
non-leading variables. We assign x3 , x5 the arbitrary values s, t, where s, t represent any

75

3. Systems of linear equations

real numbers, and then solve for the leading variables in terms of these. We get
x4 = 3

x2 = 1 s

x1 = 1 t.

Then we express this solution in vector form:



x1
1t
1
0
1
x2 1 s 1
1
0

x=
x3 = s = 0 + s 1 + t 0 .
x4 3 3
0
0
x5
t
0
0
1

Observe that there are infinitely many solutions, because any values of s R and t R
will give a solution.
The solution given above is called a general solution of the system, because it gives a
solution for any values of s and t. For any particular assignment of values to s and t,
such as s = 0, t = 1, we obtain a particular solution of the system.
Activity 3.5 Let s = 0 and t = 0 and show (by substituting it into the equation)
that x0 = (1, 1, 0, 3, 0)T is a solution of Ax = b. Then let s = 1 and t = 2 and show
that the new vector x1 you obtain is also a solution.
With practice, you will be able to read the general solution directly from the reduced
row echelon form of the augmented matrix. We have

1 0 0 0 1 1
0 1 1 0 0 1

(A|b)
0 0 0 1 0 3 .
0 0 0 0 0 0

Locate the leading ones, and note which are the leading variables. Then locate the
non-leading variables and assign each an arbitrary parameter. So, as above, we note
that the leading ones correspond to x1 , x2 and x4 and we assign arbitrary parameters to
the non-leading variables; that is, values such as x3 = s and x5 = t where s and t
represent any real number. Then write down the vector x = (x1 , x2 , x3 , x4 , x5 )T (as a
column) and fill in the values starting with x5 and working up. We have x5 = t. Then
the third row tells us that x4 = 3. We have x3 = s. Now look at the second row, which
says x2 + x3 = 1, or x2 = 1 s. Then the top row tells us that x1 = 1 t. In this
way we obtain the solution in vector form.
Activity 3.6 Write down the system of three linear equations in three unknowns
represented by the matrix equation Ax = b, where



1 2 1
x
3

A= 2 2 0 ,
x= y ,
b = 2.
3 4 1
z
5

Use Gaussian elimination to solve the system. Express your solution in vector form.
If each equation represents the Cartesian equation of a plane in R3 , describe the
intersection of these three planes.

76

3.4. Homogeneous systems and null space

3.3.4

Solution sets

We have seen systems of linear equations which have a unique solution, no solution and
infinitely many solutions. It turns out that these are the only possibilities.
For suppose we have a linear system Ax = b which has two distinct solutions, p and q.
Thinking of these vector solutions as determining points in Rn , then we will show that
every point on the line through p and q is also a solution. Therefore, as soon as there is
more than one solution, there must be infinitely many.
To prove this claim, let p and q be vectors such that Ap = b and Aq = b, p = q. The
equation of the line is
v = p + t(q p)
t R.
Then for any vector v on the line we have Av = A(p + t(q p)). Using the
distributive laws,
Av = Ap + tA(q p) = Ap + t(Aq Ap) = b + t(b b) = b
Therefore v is also a solution for any t R, so there are infinitely many of them.

Notice that in this proof, the vector w = q p satisfies the equation Ax = 0. This
leads us to our next topic.

3.4 Homogeneous systems and null space


3.4.1

Homogeneous systems

Definition 3.8 A homogeneous system of linear equations is a linear system of the


form Ax = 0.
A homogeneous system Ax = 0 is always consistent.
Why? Because A0 = 0, so it always has the solution x = 0. For this reason, x = 0 is
called the trivial solution.
Note that if Ax = 0 has a unique solution, then it must be the trivial solution,
x = 0.
If we form the augmented matrix, (A | 0), of a homogeneous system, then the last
column will consist entirely of zeros. This column will remain a column of zeros
throughout the entire row reduction, so there is no point in writing it. Instead, we use
Gaussian elimination on the coefficient matrix A, remembering that we are solving
Ax = 0.
Example 3.3 Find the solution of the homogeneous linear system,
x + y + 3z + w = 0
xy+z+w = 0
y + 2z + 2w = 0

77

3. Systems of linear equations

We reduce the coefficient matrix A to reduced

1 1
1 1 3 1

A =
1 1 1 1

0 2
0 1 2 2
0 1

1 1 3 1
1 1 0

0 1 1 0

0 1 0
0 0 1 2
0 0 1

row echelon form,

3 1
1

2 0

0
2 2
0

5
1 0

0 1
2
0 0

1 3 1
1 1 0
1 2 2

0 3
0 2
1 2

Activity 3.7 Work through the above calculation and state what row operation is
being done at each stage. For example, the first operation is R2 R1 .
Then write down the solution from the reduced row echelon form of the matrix.
The solution is

x
3
y
2

x=
z = t 2 ,
w
1

tR

which is a line through the origin, x = tv, with v = (3, 2, 2, 1)T . There are infinitely
many solutions, one for every t R.

This example illustrates the following fact.

Theorem 3.1 If A is an m n matrix with m < n then Ax = 0 has infinitely many


solutions.
Proof
The system is always consistent and the solutions are found by reducing the coefficient
matrix A. If A is m n, then the reduced row echelon form of A contains at most m
leading ones, so there are at most m leading variables. Therefore, there must be n m
non-leading variables. Since m < n, n m > 0, which means n m 1. This says that
there is at least one non-leading variable, so at least one arbitrary parameter in the
solution which can take on any real value. Hence there are infinitely many solutions.
What about a linear system Ax = b? If A is m n with m < n, does Ax = b have
infinitely many solutions? The answer is, that provided the system is consistent, then
there are infinitely many solutions, as the following examples show.
Example 3.4 The linear system
x+y+z = 6
x+y+z = 1
is inconsistent, since there are no values of x, y, z which can satisfy both equations.
These equations represent parallel planes in R3 .

78

3.4. Homogeneous systems and null space

Example 3.5 On the other hand,


x + y + 3z + w = 2
xy+z+w = 4
y + 2z + 2w = 0

is consistent and will have infinitely many solutions. Notice that the coefficient
matrix of this linear system is the same matrix A as that used in the previous
example of a homogeneous system.
The augmented matrix is

1 1 3 1 2
(A|b) = 1 1 1 1 4 .
0 1 2 2 0
Activity 3.8 Show that the reduced

1 0
0 1
0 0
Then write down the solution.

row echelon form of the augmented matrix is

0 3 1
0 2 2 .
1 2
1

Example 3.5 (continued)


The general solution of this system,

x
1
3
y 2
2

x=
z = 1 + t 2 = p + tv
w
0
1

tR

is a line which does not go through the origin. It is parallel to the line of solutions of
the homogeneous system, Ax = 0, and goes through the point determined by p.
This should come as no surprise, since the coefficient matrix forms the first four
columns of the augmented matrix. Compare the solution sets:
Ax = 0 :

RREF (A)

Ax = b :

1 0 0 3
0 1 0 2
0 0 1 2

3
2

x = t
2
1

RREF (A|b)

1 0 0 3 1
0 1 0 2 2
0 0 1 2
1

3
1

+ t 2
x=
2
1
0
1

79

3. Systems of linear equations

The reduced row echelon form of the augmented matrix of a system Ax = b will always
contain the information for the solution of Ax = 0, since the matrix A is the first part
of (A|b). We therefore have the following definition.

Definition 3.9 (Associated homogeneous system) Given a system of linear


equations, Ax = b, the linear system Ax = 0 is called the associated homogeneous
system.
The solutions of the associated homogeneous system form an important part of the
solution of the system Ax = b, as we shall see in the next section.
Activity 3.9 Look at the reduced row echelon form of A in Example 3.5,

1 0 0 3
0 1 0 2 .
0 0 1 2

Explain why you can tell from this matrix that for all b R3 , the linear system
Ax = b is consistent with infinitely many solutions.

Activity 3.10

Find the solution of the system of equations Ax = b given by


x1 + 2x2 + x3 = 1
2x1 + 2x2 = 2
3x1 + 4x2 + x3 = 2.

Find also the general solution of the associated homogeneous system, Ax = 0.


Describe the configuration of intersecting planes for each system of equations
(Ax = b and Ax = 0).

3.4.2

Null space

It is clear from what we have just seen that the general solution to a consistent linear
system Ax = b involves solutions to the system Ax = 0. This set of solutions is given a
special name: the null space or kernel of a matrix A. This null space, denoted N(A), is
the set of all solutions x to Ax = 0, where 0 is the zero vector. That is,
Definition 3.10 (Null space) For an m n matrix A, the null space of A is the
subset of Rn given by,
N(A) = {x Rn | Ax = 0},
where 0 = (0, 0, . . . , 0)T is the zero vector of Rm .
We now formalise the connection between the solution set of a consistent linear system,
and the null space of the coefficient matrix of the system.
Theorem 3.2 Suppose that A is an m n matrix, that b Rm , and that the system
Ax = b is consistent. Suppose that p is any solution of Ax = b. Then the set of all

80

3.4. Homogeneous systems and null space

solutions of Ax = b consists precisely of the vectors p + z for z N(A); that is,


{x | Ax = b} = {p + z | z N(A)}.
Proof
To show the two sets are equal, we show that each is a subset of the other. This means
showing that p + z is a solution for any z in the null space of A, and that all solutions x
of Ax = b are of the form p + z for some z N(A).
If z N(A) then

A(p + z) = Ap + Az = b + 0 = b,

so p + z is a solution of Ax = b, that is, p + z {x | Ax = b}. This shows that


{p + z | z N(A)} {x | Ax = b}.
Conversely, suppose that x is any solution of Ax = b. Because p is also a solution, we
have
A(x p) = Ax Ap = b b = 0,
so the vector z = x p is a solution of the system Az = 0; in other words, z N(A).
But then x = p + (x p) = p + z, where z N(A). This shows that all solutions are of
the form, p + z for some z N(A) that is
{x | Ax = b} {p + z | z N(A)}.
So the two sets are equal, as required.
The above result is the Principle of Linearity. It says that the general solution of a
consistent linear system Ax = b is equal to any one particular solution p, where
Ap = b, plus the general solution of the associated homogeneous system.
{solutions of Ax = b} = p + {solutions of Ax = 0}.

In light of this result, lets have another look at some of the examples we worked earlier.
In the previous section (page 79) we observed that the solutions of
x + y + 3z + w = 2
xy+z+w = 4
y + 2z + 2w = 0.
are of the form

x
1
3
y 2

=
+ t 2 = p + tv,
x=
z 1
2
w
0
1

t R,

where x = tv is the general solution we had found of the associated homogeneous


system (page 77). It is clear that p is a particular solution of the linear system (take
t = 0), so this solution is of the form described in the theorem.

81

3. Systems of linear equations

Now refer back to the first two examples Ax = b and Bx = b which


section 3.3.1.

=
x+y+z = 3
2z
2x + y + z = 4
2y + 3z =

x y + 2z = 5
z
=

The echelon forms of the augmented matrices we found were

1 0 0 1
0 1

(A|b) 0 1 0 0 ,
(B|b) 0 0
0 0 1 2
0 0

3
2

1
0

we worked in
3
4 .
5

2
5.
1

The first system, Ax = b, has a unique solution, p = (1, 0, 2)T , and the second system,
Bx = b, is inconsistent.
The reduced row echelon form of the matrix A is the identity matrix (these are the first
three columns of the augmented matrix). Therefore the homogeneous system Ax = 0
will only have the trivial solution. The unique solution of Ax = b is of the form
x = p + 0, which conforms with the Principle of Linearity.
This principle does not apply to the inconsistent system Bx = b; however, the
associated homogeneous system is consistent. Notice that the homogeneous system is

= 0
2z
2y + 3z = 0

z
= 0

which represents the intersection of two planes, since the equations 2z = 0 and z = 0
each represent the xy-plane. To find the solution, we continue to reduce the matrix B to
reduced row echelon form.

0 1 0
0 1 23
B 0 0 1 0 0 1 .
0 0 0
0 0 0

The non-leading variable is x, so we set x = t, and the solution is




t
1

x = 0 = t 0, t R
0
0

which is a line through the origin, namely the x-axis. So the plane 2y + 3z = 0
intersects the xy-plane along the x-axis.
We summarise what we have noticed so far:
If Ax = b is consistent, the solutions are of the form x = p + z where p is any one
particular solution and z N(A), the null space of A.
(1) If Ax = b has a unique solution then Ax = 0 has only the trivial solution.
(2) If Ax = b has infinitely many solutions then Ax = 0 has infinitely many
solutions.
Ax = b may be inconsistent, but Ax = 0 is always consistent.

82

3.4. Learning outcomes

Activity 3.11
Look at the example we solved in section 3.3.3 on page 74.
x1 + x2 + x3 + x4 + x5
2x1 + x2 + x3 + x4 + 2x5
x1 x2 x3 + x4 + x5
x1 + x4 + x5

=
=
=
=

3
4
5
4.

Show that the solution we found is of the form x = p + sv + tw, s, t R, where p is


a particular solution of Ax = b and sv + tw is a general solution of the associated
homogeneous system Ax = 0.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
express a system of linear equations in matrix form as Ax = b and know what is
meant by the coefficient matrix and the augmented matrix
put a matrix into reduced row echelon form using row operations and following the
algorithm
recognise consistent and inconsistent systems of equations
solve a system of m linear equations in n unknowns using Gaussian elimination
express the solution in vector form
interpret systems with three unknowns as intersections of planes in R3
say what is meant by a homogeneous system of equations and what is meant by the
associated homogeneous system of any linear system of equations
state and explain the Principle of Linearity
say what is meant by the null space of a matrix.

Exercises
Exercise 3.1
Write down the augmented matrix for each of the following systems of equations, and
use it to solve the system by reducing the augmented matrix to reduced row echelon
form.

x+y+z =2
x + y + 2z = 2
(i)

2y + z = 0

(ii)

x + y z = 4

x + y + 2z = 2
2y + z = 0
(iii)

x + y z = 2

(iv)

2y + z = 0

x + y z = 0

3x y + z = 0

2x + 3y + 2z = 0
x + 2y + 3z = 0

83

3. Systems of linear equations

Interpret the solutions to each of the above systems as intersections of planes,


describing them geometrically.
Exercise 3.2
Find the vector equation of the line of intersection of the planes
3x1 + x2 + x3 = 3

and

x1 x2 x3 = 1.

What is the intersection of these two planes and the plane


x1 + 2x2 + 2x3 = 1 ?
Exercise 3.3
Write down the augmented matrix for the following system of equations, and use it to
solve the system by reducing the augmented matrix to reduced row echelon form.
Express your answer in vector form.
x1 + x2 + x3 + x5 = 1
3x1 + 3x2 + 6x3 + 3x4 + 9x5 = 6
2x1 + 2x2 + 4x3 + x4 + 6x5 = 5.
Exercise 3.4
Find the reduced row echelon form of the

C= 2
6

matrix

2 1 3
1 1 0.
2 4 6

(a) If C is the augmented matrix of a system of equations Ax = b, C = (A|b), what


can you say about the solutions? (What are they? What Euclidean space are they in?)
(b) If C is the coefficient matrix of a homogeneous system of equations, Cx = 0, what
can you say about the solutions? (What are they? What Euclidean space are they in?)

Comments on selected activities


Feedback to activity 3.3
Put the augmented matrix into reduced row echelon form. It should take

1 1 1 6
1 0 0
2 4 1 5 (1) (2) (3) (4) 0 1 0
2 3 1 6
0 0 1

five steps:

2
1 ,
5

from which you can read the solution, x = (2, 1, 5)T . We will state the row operations
at each stage. To obtain (1), do R2 2R1 and R3 2R1 ; for (2) switch R2 and R3 ; For
(3) do R3 2R2 . The augmented matrix is now in row echelon form, so starting with
the bottom row, for (4) do R2 + R3 and R1 R3 . The final operation, R1 R2 will yield
the matrix in reduced row echelon form.

84

3.4. Comments on selected activities

Feedback to activity 3.4


Multiply the matrices below as instructed to obtain b:

1 1
1 1
2 1
1 1
Ax0 =
1 1 1 1
1 0
0 1

1
1

1
2
0

1
3
1
0

1 1
1 1
2 1
1 1
and Ax1 =
1 1 1 1
1 0
0 1

1
1

2
2
1 .

1
3
1
2

Feedback to activity 3.5


The equations are:
x1 + 2x2 + x3 = 3
2x1 + 2x2 = 2
3x1 + 4x2 + x3 = 5.
Put the augmented matrix

1
2
3

into reduced row echelon form

2 1 3
1 2
1
3
2 0 2 0 2 2 4
4 1 5
0 2 2 4

1 2 1 3
1 0 1 1
. . . 0 1 1 2 0 1 1
2
0 0 0 0
0 0 0
0

x
1 + t
1
1
with solution y = 2 t = 2 + t 1 ,
z
t
0
1

t R. This is the equation of a line in R3 . So the three planes intersect in a line.


Feedback to activity 3.8
This is the reduced row echelon form of the coefficient matrix, A. The reduced row
echelon form of any augmented matrix. (A|b), will have the same four columns as the
first four of its five columns. As there is a leading one in every row, it is impossible to
have a row of the form (0 0 . . . 0 1), so the system will be consistent. There will be one
free (non-leading) variable, (fourth column, say x4 = t), so there will be infinitely many
solutions.
Feedback to activity 3.9
Using row operations to reduce the augmented matrix to echelon form, we obtain

1 2 1 1
1 2
1
1
1 2
1
1
1 2 1 1
2 2 0 2 0 2 2 0 0 1
1
0 0 1 1 0 .
3 4 1 2
0 2 2 1
0 2 2 1
0 0 0 1
There is no reason to reduce the matrix further, we conclude that the original system of
equations is inconsistent, there is no solution. For the homogeneous system, Ax = 0, the
row echelon form of A consists of the first three columns of the echelon form of the

85

3. Systems of linear equations

augmented matrix. So starting from these and


we obtain

1 2 1
1

A = 2 2 0 ... 0
3 4 1
0

continuing to reduced row echelon form,

2 1
1 0 1
1 1 0 1 1 .
0 0
0 0 0

Setting the non-leading variable x3 = t, we find that the null space of A consists of all
vectors, x,

1
x = t 1 , t R.
1

The system of equations Ax = 0 has infinitely many solutions.

Geometrically, the associated homogeneous system represents the equations of three


planes, all of which pass through the origin. These planes intersect in a line through the
origin. The equation of this line is given by the solution we found.
The original system represents three planes with no common points of intersection. No
two of the planes in either system are parallel. Why? Look at the normals to the planes,
no two of these are parallel, so no two planes are parallel. These planes intersect to form
a kind of triangular prism; any two planes intersect in a line, and the three lines of
intersection are parallel, but there are no points which lie on all three planes. (If you
have trouble visualising this, take three cards, place one flat on the table, and then get
the other two to balance on top, forming a triangle when viewed from the side.)

Comments on exercises
Solution to exercise 3.1
The first set is presented

1
(i) (A|b) = 0
1

0
0

1
R1 R3
0
0
1
R
2 3

in detail,

1 1
2
1 1 1 2
R3 +R1
0 2 1 0
2 1
0
1 1 4
0 2 0 2

1 1 2
2 1 0
1 0 1

1 0 0
1 0 1
0 1 2

1
R3 R2

0
0

1
R1 R2
0
0

1 1 2
1 0 1
2 1 0

0 0 1
1 0 1
0 1 2

1 1 2
R3 2R2

1 0 1
0 1 2

x
1
= y = 1
z
2
1
0
0

This system has a unique solution. The three planes intersect in one point.

(ii)

86

1 1 2 2
1 1 2 2
1 1 2 2
R3 +R1
3 R2
0 2 1 0 R
0 2 1 0
(A|b) = 0 2 1 0
1 1 1 0
0 2 1 2
0 0 0 2

3.4. Comments on exercises

This system is inconsistent; there is no solution. The equations represent three planes
whose lines of intersection are parallel, so have no points in common, but no two of the
planes are parallel.

1 1 2
2
R3 +R1
(iii) (A|b) = 0 2 1
0
1 1 1 2

1 1 2 2
1 1 2
1
R
R3 R2
2 2

0 2 1 0

0 1 12
0 0 0 0
0 0 0

1
0
0

2
0
0

1 2 2
2 1 0
2 1 0

1 0
R1 R2

0 1
0 0

3
2
1
2

2
0
0

Set z = t and then solve for x and y in terms of t. There are infinitely many solutions.


3
x
2 32 t
2
2
= y = 21 t = 0 + t 21 , t R .
z
t
0
1
The three planes intersect in a line. If you set z = 2s, then this line of solutions can be
written as


x
2
3
y = 0 + s 1 , s R.
z
0
2
(iv) This is a homogeneous system of equations, so you can just reduce the coefficient
matrix A. If you reduce the augmented matrix, you will notice that the last column
remains all zeros throughout the row reduction.

To reduce calculations, the first step in reducing this matrix should be to switch row 1
and row 3, so that you have a leading one without introducing fractions. The reduced
row echelon form is


3 1 1
1 0 0
x
0

A = 2 3 2 . . . 0 1 0

y = 0.
1
2 3
0 0 1
z
0
The unique solution is x = (0, 0, 0)T . These equations represent three planes, all of
which pass through the origin, and have one point of intersection, which is the origin.
Solution to exercise 3.2
Solve the first two equations simultaneously using Gaussian elimination. The general
solution takes the form x = p + sw, s R, where p = (1, 0, 0)T and w = (0, 1, 1)T ,
which is the equation of the line of intersection of the two planes.
The third plane intersects the first two in the same line. You can determine this by
solving the linear system of three equations using Gaussian elimination. Alternatively,
you can notice that the line of intersection of the first two planes is in the third plane
(since its direction is perpendicular to the normal, and the point v satisfies the
Cartesian equation of the plane), so this must be the intersection of all three planes.

87

3. Systems of linear equations

Solution to exercise 3.3

The reduced echelon form of the augmented matrix is

1 1 0 0 1 1
0 0 1 0 2
2 .
0 0 0 1 0 1

x2 and x5 are non-leading variables. Set x2 = r and x5 = s. Using the equations,


solve for the other variables in terms of these.

x1
1 r + s
1
1
1
x2
0
1
0
r

x3 = 2 2s = 2 + r 0 + s 2 ,
r, s R.

x4
1
0
0
1
x5
s
0
0
1
Solution to exercise 3.4
To obtain the

3
2
6

reduced echelon form of the matrix

2 1 3
1 1 0 3
1 1 0
3
R1 R2
2R1
2 1 1 0 R2
0 1 1 6
1 1 0
R3 6R1
2 4 6
6 2 4 6

0 4 4 12

1 1
0
3
R
2 0 1 1
6
0 4 4 12

R3 +4R2

and so on to obtain

1
1
0
3
1 1 0
3
1
R
12 3
0 1 1 6
0 1 1 6
0 0 0 1
0 0 0 12

1 1 0 0
1 0 1 0
0 1 1 0 0 1 1 0 .
0 0 0 1
0 0 0 1
(a) If this is the reduced row echelon form of an augmented matrix, C = (A|b), the
original system of three equations in three unknowns, Ax = b, is inconsistent.
(Solutions would have been in R3 .)
(b) If this is the reduced row echelon form of the coefficient matrix of a homogeneous
system, Cx = 0, with variables x, y, z, w, the solutions are the vectors,

x
t
1
y t

= = t 1 , t R
x=
z t
1
w
0
0
which is a line through the origin in R4 .

88

Chapter 4
Matrix inversion and determinants
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3.
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 18, 19, 20.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapters 1, 2.

Aims of the chapter


In this chapter all matrices will be square, n n matrices, unless explicitly stated
otherwise. Only a square matrix can have an inverse, and the determinant is only
defined for a square matrix.
We begin by looking more closely at row operations and use them to understand and
answer two questions: (1) When is a matrix A invertible? and (2) How can we find the
inverse matrix?
These same two questions can also be answered using the determinant. You have
already met the determinant of a 2 2 matrix in section 2.5 (page 25); we will define
the determinant of an n n matrix, learn how to evaluate it, and see why it can be
used to determine whether or not a matrix is invertible. Then we will establish another,
often more practical, way to find the inverse of a matrix. An application of this leads to
Cramers rule, a method to solve a system of n equations in n unknowns, provided it
has a unique solution.
Finally we will see a well-known application of linear algebra to economics, a method of
using systems of linear equations to analyse an economy, and for which its proponent
was awarded the Nobel prize in Economics in 1973.

4.1 Matrix inverse using row operations


4.1.1

Elementary matrices

Recall the three elementary row operations:


RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.

89

4. Matrix inversion and determinants

These operations change a matrix into a new matrix. We want to examine this process
more closely. Let A be an n n matrix and let Ai denote the ith row of A. Then we can
write A as a column of n rows,

a11 a12 a1n


A1
a21 a22 a2n
A2

=
A=
..
..
..
...

...
.
.
.
An
an1 an2 ann

We use this row notation to indicate


are indicated below?

A1
3A2
.
..
An

row operations. For example, what row operations

A2
A1
.
..
An

A1
A2 + 4A1

..

.
An

The first is multiply row 2 by 3, the second is interchange row 1 and row 2, and the
third is add 4 times row 1 to row 2. Each of these represent new matrices after the row
operation has been executed.
Now look at a product of two n n matrices A and B. The (1, 1) entry in the product
is the inner product of row 1 of A and column 1 of B. The (1, 2) entry is the inner
product of row 1 of A and column 2 of B, and so on. In fact, row 1 of the product
matrix AB is obtained by taking the product of the row matrix A1 with the matrix B,
that is, A1 B. This is true of each row of the product; that is, each row i of the product
AB is obtained by taking Ai B. So we can express the product AB as,


a11 a12 a1n
b11 b12 b1n
A1 B
a21 a22 a2n b21 b22 b2n A2 B
.
.
= .
..
..
..
..
..
..
..
.
.
.
. ..
.
. ..
An B
an1 an2 ann
bn1 bn2 bnn
Now consider the effect of a row operation on a product AB. The first matrix below is
the product AB after the row operation add 4 times row 1 of AB to row 2 of AB.


A1 B
A1 B
A1
A2 B + 4A1 B (A2 + 4A1 )B A2 + 4A1

=
=
B
..
..
..

.
.
.
An B

An B

An

In the second matrix we have used the distributive rule to write


A2 B + 4A1 B = (A2 + 4A1 )B. But comparing this matrix to the row form of a product
of two matrices given above, this matrix is just the product of the matrix obtained from
A after the same row operation, multiplying the matrix B.
We have shown that the matrix obtained by a row operation on the product AB is
equal to the product of the matrix obtained by the row operation on A with the matrix
B; that is,
(matrix obtained by a row operation on AB) =
(matrix obtained by a row operation on A)B

90

4.1. Matrix inverse using row operations

This is true for any n n matrices A and B.

Now take A = I, the identity matrix. Since IB = B, the previous statement now says
that:
The matrix obtained by a row operation on B is equal to the following product:
(the matrix obtained by a row operation on I)B.
Definition 4.1 (Elementary matrix) An elementary matrix, E, is an n n matrix
obtained by doing exactly one row operation on the n n identity matrix, I.
For example,

1 0 0
0 1 0
1 0 0
0 3 0
1 0 0
4 1 0
0 0 1
0 0 1
0 0 1
are elementary matrices. The first has had row 2 multiplied by 3, the second had row 1
and row 2 interchanged, and the last matrix had 4 times row 1 added to row 2.
Activity 4.1 Which

2
0
0

of the matrices below

1 0
0 1

1 0
1 0
0 1
1 0

are elementary matrices?

0
1 0 0
0 1 0
0
1
1 0 1

Write the first matrix as the product of two elementary matrices.

Elementary matrices provide a useful tool to relate a matrix to its reduced row echelon
form. We have shown above that the matrix obtained from a matrix B after performing
one row operation is equal to a product EB, where E is the elementary matrix obtained
from I by that same row operation.
Example 4.1 Suppose we want to put the

1
B= 1
1
into reduced row echelon form.

B=
1
1

matrix

2 4
3 6
0 1

Our first step is

2 4
R2 R1
3 6

0 1

1 2 4
0 1 2
1 0 1

We perform the same operation on the identity matrix


matrix, which we will denote by E1 .

1 0 0
1
R2 R1

I= 0 1 0

1
0 0 1
0

Then the matrix E1 B is

to obtain an elementary

0 0
1 0 = E1
0 1

1 0 0
1 2 4
1 2 4
E1 B = 1 1 0 1 3 6 = 0 1 2 ,
0 0 1
1 0 1
1 0 1

91

4. Matrix inversion and determinants

which is the matrix obtained from B after the row operation.


We now want to look at the invertibility of elementary matrices and row operations.
First, note that any elementary row operation can be undone by an elementary row
operation.
RO1 is multiply a row by a non-zero constant.
To undo RO1, multiply the row by 1/(constant).
RO2 is interchange two rows.

To undo RO2 interchange the rows again.


RO3 is add a multiple of one row to another.
To undo RO3 subtract the multiple of one row from the other.
If we obtain an elementary matrix by performing one row operation on the identity, and
another elementary matrix from the row operation which undoes it, then multiplying
these matrices together will return the identity matrix. That is, they are inverses of one
another. This argument establishes the following theorem.
Theorem 4.1 Any elementary matrix is invertible, and the inverse is also an
elementary matrix.
Activity 4.2 Let

1 0 0
E = 4 1 0 .
0 0 1

Write down E 1 . Then show that EE 1 = I and E 1 E = I.


We saw earlier in Example 4.1

E1 B = 1
0

that multiplying E1 B we obtain

0 0
1 2 4
1 2 4
1 0 1 3 6 = 0 1 2
0 1
1 0 1
1 0 1

We can undo this row operation and return


E11 ,

1 0 0
1 2
1 1 0 0 1
0 0 1
1 0

4.1.2

the matrix B by multiplying on the left be


4
1 2 4
2 = 1 3 6.
1
1 0 1

Row equivalence

Definition 4.2 If A and B are m n matrices, we say that A is row equivalent to B if


and only if there is a sequence of elementary row operations to transform A into B.
This is an example of what is known as an equivalence relation. This means it satisfies
three important conditions: it is

92

4.1. Matrix inverse using row operations

reflexive
symmetric
transitive

AA
AB BA
A B and B C A C

Activity 4.3 Argue why this is true: that is, explain why row equivalence as defined
above satisfies these three conditions?
The algorithm for putting a matrix A into reduced row echelon form by a sequence of
row operations means that every matrix is row equivalent to a matrix in reduced row
echelon form. This fact is stated in the following theorem.
Theorem 4.2 Every matrix is row equivalent to a matrix in reduced row echelon form.

4.1.3

The main theorem

We are now ready to answer the first question, When is a matrix invertible? We collect
our results in the following theorem.
Theorem 4.3 If A is an n n matrix, then the following statements are equivalent
(meaning if any one of these statements is true for A, then all the statements are true).
(1) A1 exists.
(2) Ax = b has a unique solution for any b Rn .
(3) Ax = 0 only has the trivial solution, x = 0.
(4) The reduced row echelon form of A is I.
Proof
If we show that (1) (2) (3) (4) (1), then any one statement will imply all
the others, so the statements are equivalent.
(1) = (2). We assume that A1 exists, and consider the system of linear equations
Ax = b where x is the vector of unknowns and b is any vector in Rn . We use the
matrix A1 to solve for x by multiplying the equation on the left by A1 ,
A1 Ax = A1 b = Ix = A1 b = x = A1 b.
This shows that x = A1 b is a solution, and it is the only possible solution. So Ax = b
has a unique solution for any b Rn .
(2) = (3). If Ax = b has a unique solution for all b Rn , then this is true for b = 0.
The unique solution of Ax = 0 must be the trivial solution, x = 0.
(3) = (4). If the only solution of Ax = 0 is x = 0, then there are no free (non-leading)
variables and the reduced row echelon form of A must have a leading one in every
column. Since the matrix is square and a leading one in a lower row is further to the
right, A must have a leading one in every row. Since every column with a leading one
has zeros elsewhere, this can only be the n n identity matrix.

93

4. Matrix inversion and determinants

(4) = (1). We now make use of elementary matrices. If A is row equivalent to I, then
there is a sequence or row operations which reduce A to I, so there must exist
elementary matrices E1 , . . . , Er such that
Er Er1 E1 A = I.
Each elementary matrix has an inverse. We use these to solve the above equation for A,
1
by first multiplying the equation on the left by Er1 , then by Er1
, and so on, to obtain
1
A = E11 Er1
Er1 I

This says that A is a product of invertible matrices, hence invertible. (Recall from
Chapter 1 that if A and B are invertible matrices of the same size, then the product
AB is invertible and its inverse is the product of the inverses in the reverse order,
(AB)1 = B 1 A1 .)
This proves the theorem.

4.1.4

Using row operations to find the inverse matrix

From the proof of the theorem we have


A = E11 Er1
where the matrices Ei are the elementary matrices corresponding to the row operations
used to reduce A to the identity matrix, I. Then taking the inverse of both sides,
A1 = (E11 Er1 )1 = Er E1 = Er E1 I.
This tells us that if we apply the same row operations to the matrix I that we use to
reduce A to I, then we will obtain the matrix A1 . That is,
Er Er1 E1 A = I ,

A1 = Er E1 .I

This gives us a method to find the inverse of a matrix A. We start with the matrix A
and we form a new, larger matrix by placing the identity matrix to the right of A,
obtaining the matrix denoted (A|I). We then use row operations to reduce this to
(I|B). If this is not possible (which will become apparent) then the matrix is not
invertible. If it can be done, then A is invertible and B = A1 .
Example 4.2 We use this method to find

1
A= 1
1

the inverse of the matrix

2 4
3 6.
0 1

In order to determine if the matrix is invertible


we form the matrix

1 2 4
(A | I) = 1 3 6
1 0 1

94

and, if so, to determine the inverse,

1 0 0
0 1 0.
0 0 1

4.1. Matrix inverse using row operations

(We have separated A from I by a vertical line just to emphasise how this matrix is
formed. It is also helpful in the calculations.) Then we carry out elementary row
operations.

1 2 4
1 0 0
R2 R1
0 1 2 1 1 0
R3 + R1
0 2 5
1 0 1

1 2 4
1
0 0
0 1 2 1 1 0
R3 2R2
0 0 1
3 2 1

1 2 0 11 8 4
R1 4R3
0 1 0 7
5 2
R2 2R3
0 0 1
3
2 1

1 0 0
3 2 0
R1 2R2
0 1 0 7 5 2 .
0 0 1
3 2 1

This is now in the form (I|B) so we deduce that A is invertible and that

3 2 0
A1 = 7 5 2 .
3 2 1

It is very easy to make mistakes when row reducing a matrix, so the next thing you
should do is check that AA1 = I.
Activity 4.4 Do this. Check that when you multiply AA1 , you get the identity
matrix I.
(In order to establish that this is the inverse matrix, you should also show
A1 A = I, but we will forgo that here.)

If the matrix A is not invertible, what will happen? By the theorem, if A is not
invertible, then the reduced row echelon form of A cannot be I, so there will be a row of
zeros in the row echelon form of A.
Activity 4.5 Find the inverse, if it

2 1

A=
0 1
1
2

4.1.5

exists, of each of the following matrices

3
2 1 3
1
B = 0 1 1 .
0
1 2 0

Result on inverses

At this stage, in order to show that a square matrix B is the inverse of the n n matrix
A, it seems we have to show that both statements, AB = I and BA = I are true. After
we have proved the following theorem, which follows from Theorem 4.3, we will be able

95

4. Matrix inversion and determinants

to deduce from the one statement AB = I that A and B must be inverses of one
another.
Theorem 4.4 If A and B are n n matrices and AB = I, then A and B are each
invertible matrices, and A = B 1 and B = A1 .
Proof
If we show that the homogeneous system of equations Bx = 0 has only the trivial
solution, x = 0, then by Theorem 4.3 this will prove that B is invertible. So we consider
the matrix equation Bx = 0 and multiply both sides of this equation on the left by the
matrix A. We have

Bx = 0 A(Bx) = A0 (AB)x = 0.
But we are given that AB = I, so that
(AB)x = 0 Ix = 0 x = 0,
which shows that the only solution of Bx = 0 is the trivial solution. We therefore
conclude that B is invertible, so the matrix B 1 exists.
We now multiply both sides of the equation AB = I on the right by the matrix B 1 .
We have,
AB = I (AB)B 1 = IB 1 A(BB 1 ) = B 1 A = B 1 .
So A is the inverse of B, and therefore A is also an invertible matrix. Then taking
inverses of both sides of the last equation, we conclude that A1 = (B 1 )1 = B.

4.2 Determinants
4.2.1

Determinant using cofactors

The determinant of a square matrix A is a particular number associated with A, written


|A| or det A.
This number will provide a quick way to determine whether or not a matrix A is
invertible. In view of this, suppose A is a 2 2 matrix, and that we wish to determine
A1 using row operations. Then we form the matrix (A | I) and attempt to row reduce
A to I. We assume a = 0, otherwise we would begin by switching rows,
(A | I) =
R2 cR1

a b
c d

1
b/a
0 d cb/a

1 0
0 1
1/a 0
c/a 1

(1/a)R1

aR

1 b/a
c d

1
b/a
0 (ad bc)

which shows that A1 exists if and only if ad bc = 0.

For a 2 2 matrix, the determinant is given by the formula


a b
c d

96

a b
c d

1/a 0
0 1

= ad bc.

1/a 0
c a

4.2. Determinants

For example,
1 2
= (1)(4) (2)(3) = 2.
3 4
To extend this definition to n n matrices, we define the determinant of an n n
matrix recursively, in terms of (n 1) (n 1) determinants. So the determinant of a
3 3 matrix is given in terms of 2 2 matrices, and so on. To do this we will need the
following two definitions.
Definition 4.3 Suppose A is an n n matrix. The (i, j) minor of A, denoted by Mij ,
is the determinant of the (n 1) (n 1) matrix obtained by removing the ith row and
jth column of A.
Definition 4.4 The (i, j) cofactor of a matrix A is
Cij = (1)i+j Mij .
So the cofactor is equal to the minor if i + j is even, and it is equal to 1 times the
minor if i + j is odd.
Example 4.3 Let

1 2 3
A = 4 1 1.
1 3 0

Then the minor M23 and the cofactor C23 are


M23 =

1 2
= 5,
1 3

C23 = (1)(2+3) M23 = 5.

There is a simple way to associate the cofactor Cij with the entry aij of the matrix.
Locate the entry aij and cross out the row and the column containing aij . Then
evaluate the determinant of the (n 1) (n 1) matrix which remains. This is the
minor, Mij . Then give it a + or sign according to the position of aij on the
following pattern:

+ +
+ +

+ + .
.. .. .. .. . .
.
. . . .
Activity 4.6 Write down the cofactor C13 for the matrix A above using this
method.
If A is an n n matrix, the determinant of A, is given by
a11
a21
|A| = ..
.
an1

a12
a22
..
.
an2

. . . a1n
. . . a2n
..
..
.
.
. . . ann

= a11 C11 + a12 C12 + + a1n C1n .

97

4. Matrix inversion and determinants

This is called the cofactor expansion of |A| by row one. It is a recursive definition,
meaning that the determinant of an n n matrix is given in terms of (n 1) (n 1)
determinants.
Example 4.4 We calculate the determinant of the matrix A above:
|A| = 1C11 + 2C12 + 3C13
1 1
4 1
4 1
= 1
2
+3
3 0
1 0
1 3
= 1(3) 2(1) + 3(13) = 34.

Activity 4.7 Calculate the determinant of

M=
0
1

the matrix

2 1
2 3.
1 4

You might ask, Why is the cofactor expansion given by row 1, rather than any other
row? In fact it turns out that using a cofactor expansion by any row or column of A
will give the same number |A|, as the following theorem states.

Theorem 4.5 If A is an n n matrix, then the determinant of A can be computed by


multiplying the entries of any row (or column) by their cofactors and summing the
resulting products:
|A| = ai1 Ci1 + ai2 Ci2 + . . . + ain Cin
(cofactor expansion by row i)
|A| = a1j C1j + a2j C2j + . . . + anj Cnj
(cofactor expansion by column j).
We will look into the proof of this result later, but first note that this allows you to
choose any row or any column of a matrix to find its determinant using a cofactor
expansion. So we should choose a row or column which gives the simplest calculations.
Obtaining the correct value for |A| is important, so it is a good idea to check your result
by calculating the determinant by another row or column.
Example 4.5 In the example we have been using (see page 97), instead of using
the cofactor expansion by row 1 as shown above, we can choose to evaluate the
determinant of the matrix A by row 3 or column 3, which will involve fewer
calculations since a33 = 0. To check the result |A| = 34, we will evaluate the
determinant again using column three. Remember the correct cofactor signs.
|A| =

98

1 2 3
4 1
1 2
4 1 1 =3
1
+ 0 = 3(13) (5) = 34.
1 3
1 3
1 3 0

4.2. Determinants

Activity 4.8 Check your calculation of the

M=
0
1

determinant of the matrix

2 1
2 3.
1 4

in the previous activity by expanding by a different row or column. Choose one with
fewer calculations.

Determinant as a sum of elementary signed products [optional]

We will give an informal proof of Theorem 4.5, because it is useful to understand how
the definition of determinant works. This section can be safely omitted (you can accept
the theorem without proof), but you might find it worth your while to read through it.
For a 2 2 matrix the cofactor expansion by row 1 is equivalent to the definition given
on page 97 as ad bc,
a11 a12
= a11 a22 a12 a21 .
a21 a22
Notice that each term of the sum is a product of entries, one from each row and one
from each column. Indeed a11 is the entry from row 1 and column 1, and a22 is not in
either, it comes from row 2 and column 2. Similarly for the second term, a12 a21 , which is
the only different way of taking one entry from each row and each column of the matrix.
For a 3 3 matrix, the cofactor expansion by row 1 yields,
a11
a21
a31

a12
a22
a32

a13
a
a23 = a11 22
a32
a33

a23
a
a12 21
a33
a31

a23
a
+ a13 21
a33
a31

a22
a32

= a11 (a22 a33a23 a32 )a12 (a21 a33a23 a31 )+a13 (a21 a32a22 a31 ).

Then |A| is the sum of the products:


()

a11 a22 a33


a11 a23 a32

+a12 a23 a31


a12 a21 a33

+a13 a21 a32


a13 a22 a31

The row indices of each product are in ascending order, 123


and the column indices are:
123
231
312
132
213
321
These are the six permutations of the numbers 1,2,3.

Definition 4.5 A permutation of a set of integers {1, 2, 3, . . . , n} is an arrangement of


these integers in some order with no omissions and no repetitions.
To find all permutations of a set of numbers, we can use a permutation tree:

1
2
3
3 choices

/\ /\ /\

23 13 12
2 choices
3 2 1 = 3!

|| || ||

32 31 21
1 choice

99

4. Matrix inversion and determinants

In the above expansion of |A|, because each term has the row indices arranged in
ascending order and the column indices form a different permutation of the numbers
1,2,3, we know that each term of the sum is a different product of entries, one from each
row and one from each column of A, and the set of six products contains all ways in
which this can happen.
But what about the minus signs? An inversion is said to occur in a permutation
whenever a larger integer precedes a smaller one. For example,
1 2 3 no inversions
1 3 2 one inversion

A permutation is said to be even if the total number of inversions is even. It is odd if the
total number of inversions is odd.
To find the total number of inversions of a permutation, start at the left and find the
total number of integers to the right of the first integer which are smaller than the first
integer. Then go to the next integer to the right and do the same. Continue until the
end, and then add up all these numbers.
Example 4.6 Applying this to the permutation 5 2 3 4 1, the total number of
inversions is 4 + 1 + 1 + 1 = 7, so this permutation is odd. The total number of
inversions gives the minimum number of steps that it takes to put these numbers
into ascending order, where in each step you are only allowed to switch the positions
of two adjacent integers. For the permutation 5 2 3 4 1, this can be done in seven
steps by
5 2 3 4 1 2 5 3 4 1 2 3 5 4 1 2 3 4 5 1
2 3 4 1 5 2 3 1 4 5 2 1 3 4 5 1 2 3 4 5.
If we look again at the list of products (), we find that the permutations of the column
indices corresponding to the products with a plus sign are all even, and those
corresponding to the products with a minus sign are all odd.
Definition 4.6 An elementary product from an n n matrix A is a product of n
entries, no two of which come from the same row or column. A signed elementary
product has the row indices arranged in ascending order, multiplied by 1 if the column
indices are an odd permutation of the numbers 1 to n.
We are now ready to give an intrinsic (but completely impractical) definition of
determinant.
Definition 4.7 (Determinant) The determinant of an n n matrix A is the sum of all
signed elementary products of A.
A cofactor expansion is a clever way to obtain this sum of signed elementary products.
You choose the entries from one row, say, and then cross out that row and the column
containing the entry to obtain the cofactor, and each stage of calculating the cofactor
repeats the process. The proof of Theorem 4.5 is that each cofactor expansion is the
sum of all signed elementary products.
Activity 4.9 Expand the determinant
a11
|A| = a21
a31

100

a12
a22
a32

a13
a23
a33

4.2. Determinants

using the cofactor expansion by column 2, and show that you get the same list of
signed elementary products as we obtained in ().
For very large matrices, using a cofactor expansion is impractical. For example,
1 4 3
2 7 5
|A| =
1
2
6
2 10 14

2
1
= 1C11 + (4)C12 + 3C13 + 2C14
0
4

would require calculating four 3 3 determinants. Fortunately, there is a better way. To


simplify the calculations we will turn once again to row operations. But first we need to
establish some useful results on determinants which follow directly from Theorem 4.5.
Results on determinants
Theorem 4.6 If A is an n n matrix then
|AT | = |A|.
Proof
This theorem follows immediately from Theorem 4.5. The cofactor expansion by row i
of |AT | is precisely the same, number for number, as the cofactor expansion by column i
of |A|.
Each of the following three statements follows from Theorem 4.5. As a result of Theorem
4.6, it follows that each is true if the word row is replaced by column. We will need
these results in the next section. In all of them we assume that A is an n n matrix.
Corollary 1 If a row of A consists entirely of zeros, then |A| = 0.

Proof
If we evaluate the determinant by the cofactor expansion using the row of zeros, then
each cofactor is multiplied by 0 and the sum will be zero. To visualise this, expand the
determinant below using row 1,
0
a21
..
.
an1

0
a22
..
.
an2

... 0
. . . a2n
.. = 0C11 + 0C12 + + 0C1n = 0.
..
.
.
. . . ann

Corollary 2 If A contains two rows which are equal, then |A| = 0.

Proof
To prove this we will use a recursive (or inductive) argument. If A is a 2 2 matrix
with two equal rows, then
|A| =

a b
= ab ab = 0
a b

101

4. Matrix inversion and determinants

Now consider a 3 3 matrix with two equal rows. If we expand the determinant by the
other row, then each cofactor is a 2 2 determinant with two equal rows, therefore each
is zero and so is their sum. For example,
a b c
b c
a c
a b
|A| = d e f = d
+e
f
=0+0+0=0
b c
a c
a b
a b c
In this way, the result for (n 1) (n 1) matrices implies the result for n n
matrices, so the statement is true.

Corollary 3 If the cofactors of one row are multiplied by the entries of a different row,
then the result is 0.
Proof
Let

a11 a12 . . . a1n


a21 a22 . . . a2n
A=
..
..
..
...
.
.
.
an1 an2 . . . ann

The cofactor expansion of |A| by row i is

|A| = ai1 Ci1 + ai2 Ci2 + + ain Cin


Look at the expression
aj1 Ci1 + aj2 Ci2 + + ajn Cin

for i = j

This expression is not equal to |A|, so what is it? It is equal to |B| for some matrix B,
but what does the matrix B look like?
In the expression |B| = aj1 Ci1 + aj2 Ci2 + + ajn Cin , each cofactor Cik , k = 1, . . . , n is
made up of entries of the matrix A, omitting the entries from row i. For example, if
i = 1, then Ci1 is obtained from the matrix resulting in removing row i and column 1
from A. So the matrix B will have the same entries as the matrix A except in row i. In
the cofactor expansion of a determinant by row i, the entries of row i are the numbers
multiplying the cofactors. Therefore the entries of row i of the matrix B must be
aj1 , . . . , ajn . Then B has two equal rows, since row i has the same entries as row j, and
|B| = 0.

4.2.2

Determinant using row operations

Definition 4.8 An n n matrix A is upper triangular if all entries below the main
diagonal are zero. It is lower triangular if all entries above the main diagonal are zero.

upper
triangular
matrix

102

a11
0
.
..
0

a12
a22
..
.
0

. . . a1n
. . . a2n
..
..
.
.
. . . ann

4.2. Determinants

a11 0
a21 a22
.
..
..
.
an1 an2

a11 0
0 a22
.
..
..
.
0
0

lower
triangular
matrix

diagonal
matrix

... 0
... 0
..
..
.
.
. . . ann

... 0
... 0
..
..
.
.

. . . ann

Suppose we wish to evaluate the determinant of an upper triangular matrix, such as


a11
0
..
.

a12
a22
..
.

. . . a1n
. . . a2n
..
..
.
.
. . . ann

Which row or column should we use for the cofactor expansion? Clearly the calculations
are simplest if we expand by column 1 or row n. Expansion by column 1 gives us
|A| = a11

a22
..
.
0

. . . a2n
..
..
.
. .
. . . ann

where the (n 1) (n 1) matrix on the right is again upper triangular. Continuing in


this way we see that |A| is just the product of the diagonal entries. The same argument
holds true for a matrix which is diagonal or lower triangular, so we have established one
more corollary of Theorem 4.5:
Corollary 4 If A is upper triangular, lower triangular, or diagonal then
|A| = a11 a22 ann .
A square matrix in row echelon form is upper triangular. If we know how a determinant
is affected by a row operation, then this observation will give us an easier way to
calculate large determinants. We can use row operations to put the matrix into row
echelon form, keep track of any changes, and then easily calculate the determinant of
the reduced matrix. So how does each row operation affect the value of the determinant?
RO1 multiply a row by a non-zero constant
Suppose the matrix B is obtained from a matrix A by multiplying row i by a non-zero
constant . For example,
a11
a21
|A| = ..
.

a12
a22
..
.

an1

an2

. . . a1n
. . . a2n
..
..
.
.
. . . ann

a11
a21
|B| =
..
.

a12
a22
..
.

an1

an2

. . . a1n
. . . a2n
..
..
.
.
. . . ann

If we evaluate |B| by the cofactor expansion by row i, we obtain


|B| = ai1 Ci1 + ai2 Ci2 + + ain Cin = (ai1 Ci1 + ai2 Ci2 + + ain Cin ) = |A|

103

4. Matrix inversion and determinants

The effect of multiplying a row of A by is to multiply |A| by , |B| = |A|.


When we actually need this, we will use it to factor out a constant from the
determinant, as
a11
a21
..
.
an1

a12
a22
..
.
an2

. . . a1n
a11
. . . a2n
a21
= ..
..
..
.
.
.
. . . ann
an1

a12
a22
..
.
an2

. . . a1n
. . . a2n
.. .
..
.
.
. . . ann

RO2 interchange two rows


This time we will use a recursive proof on the cofactor expansion. If A is a 2 2 matrix
and B is the matrix obtained from A by interchanging the two rows, then
|A| =

a b
= ad bc
c d

|B| =

c d
= bc ad
a b

so |B| = |A|.
Now let A be a 3 3 matrix and let B be a matrix obtained from A by interchanging
two rows. Then if we expand |B| using a different row, each cofactor contains the
determinant of a 2 2 matrix which is a cofactor of A with two rows interchanged, so
each will be multiplied by 1, and |B| = |A|. To visualise this, consider for example
g h i
|B| = d e f
a b c

a b c
|A| = d e f ,
g h i
Expanding |A| and |B| by row 2, we have
|A| = d
|B| = d

a b
a c
b c
f
+e
g h
g i
h i

g h
g i
h i
+e
f
= |A|
a c
a b
b c

since all the 2 2 determinants change sign. In the same way, if this holds for
(n 1) (n 1) matrices, then it hold for n n matrices.
The effect of interchanging two rows of a matrix is to multiply the determinant by
1, |B| = |A|.
RO3 add a multiple of one row to another.
Suppose the matrix B is obtained from the matrix A by replacing row j of A by row j
plus k times row i of A, j = i. For example, consider the case in which B is obtained
from A by adding 4 times row 1 of A to row 2. Then
a11
a21
|A| = ..
.
an1

104

a12
a22
..
.
an2

. . . a1n
. . . a2n
.. ,
..
.
.
. . . ann

4.2. Determinants

a11
a12
a21 + 4a11 a22 + 4a12
|B| =
..
..
.
.
an1
an2
In general, in a situation like this, we can expand

...
a1n
. . . a2n + 4a1n
.
..
..
.
.
...
ann
|B| by row j:

|B| = (aj1 + kai1 )Cj1 + (aj2 + kai2 )Cj2 + + (ajn + kain )Cjn
= aj1 Cj1 + aj2 Cj2 + + ajn Cjn + k(ai1 Cj1 + ai2 Cj2 + + ain Cjn )
= |A| + 0
The last expression in brackets is 0 because it consists of the cofactors of one row
multiplied by the entries of another row. So this row operation does not change the
value of |A|.
There is no change in the value of the determinant if a multiple of one row is added
to another.
We collect these results in the following theorem.
Theorem 4.7 (Effect of a row (column) operation on |A|) All statements are true
if row is replaced by column.
(RO1) If a row is multiplied by a constant
|A| changes to |A|.
(RO2) If two rows are interchanged
|A| changes to

|A|.

(RO3) If a multiple of one row is added to another


NO change in |A|.

Example 4.7 We can now use row operations to evaluate


1
1
|A| =
2
1

2 1 4
3 0 2
1 1 2
4 1 3

by reducing A to an upper triangular matrix. First we obtain zeros below the


leading one by adding multiples of row 1 to the rows below. The new matrix will
have the same determinant as A,
1 2 1 4
0 5 1 6
= 3
|A| =
0 3 3 6
0 2
2 1

1
0
0
0

2 1 4
5 1 6
1 1 2
2 2 1

In the second step we factored 3 from the third row. We would need to multiply
the resulting determinant on the right by 3 in order to put the 3 back into the
third row, and get back a matrix with the same determinant as A. Next we switch
row 2 and row 3, with the effect of changing the sign of the determinant.
1
0
|A| = 3
0
0

2 1 4
1 1 2
=3
5 1 6
2 2 1

1
0
0
0

2 1 4
1 1 2
0 4 4
0 4 5

105

4. Matrix inversion and determinants

The final steps all use RO3, so there is no change in the value of the determinant.
Finally we evaluate the determinant of the upper triangular matrix
1
0
=3
0
0

2 1 4
1 1 2
= 12.
0 4 4
0 0 1

A word of caution with row operations! What is the change in the value of |A|
(1) if R2 is replaced by R2 3R3 or

(2) if R2 is replaced by 3R1 R2 ?

For (1) there is no change, but for (2) the determinant will change sign. Why? 3R1 R2
is actually two elementary row operations, first multiply row 2 by 1 and then add
three times row 1 to it. When performing row operation RO3, you should always add a
multiple of another row to the row you are replacing.
Activity 4.10 You can shorten the writing in the above example by expanding the
4 4 determinant using the first column as soon as you have obtained the
determinant with zeros under the leading one. You will then be left with a 3 3
determinant to evaluate. Do this. Without looking at the example above, work
through the calculations in this way to evaluate
1
1
|A| =
2
1

2 1 4
3 0 2
1 1 2
4 1 3

4.2.3

The determinant of a product

One very important result concerning determinants can be stated as: The determinant
of the product of two square matrices is the product of their determinants. This is the
content of the following theorem.
Theorem 4.8 If A and B are n n matrices then
|AB| = |A||B|.
Proof (optional)
We will outline the proof of this theorem without filling in all the details, but you can
safely omit it and carry on to the next section. We first prove the theorem in the case
when the matrix A is an elementary matrix. We use again the fact established in section
4.1.1 (page 91) that the matrix obtained by a row operation on the matrix B is equal to
the product of the elementary matrix of that row operation times the matrix B.
Let E1 be an elementary matrix that multiplies a row by a non-zero constant k. Then
E1 B is the matrix B obtained by performing that row operation on B, and by Theorem

106

4.3. Matrix inverse using cofactors

4.7, |E1 B| = k |B|. For the same reason, |E1 | = |E1 I| = k|I| = k. Therefore
|E1 B| = |E1 | |B|.
The argument for the other two types of elementary matrices is the same. So we assume
that the theorem holds when A is any elementary matrix.
Now recall that every matrix is row equivalent to a matrix in reduced row echelon form,
so if R denotes the reduced row echelon form of the matrix A, then we can write
A = Er Er1 . . . E1 R.
Since A is a square matrix, R is either the identity matrix or a matrix with a row of
zeros.
Applying the result for an elementary matrix repeatedly,
|A| = |Er Er1 . . . E1 R| = |Er ||Er1 | . . . |E1 ||R|.
where |R| is either 1 or 0. Note that |R| = 0 if and only if |A| = 0.

If R = I, then by repeated application of the result for elementary matrices, this time
with the matrix B,
|AB| = |Er Er1 . . . E1 B| = |Er ||Er1| . . . |E1 ||B| = |Er Er1 . . . E1 ||B| = |A| |B|.
If R = I, then
|AB| = |Er Er1 . . . E1 R B| = |Er ||Er1 | . . . |E1 ||RB|.
Since the product matrix RB must also have a row of zeros |RB| = 0. Therefore
|AB| = 0 = 0|B| and the theorem is proved.

4.3 Matrix inverse using cofactors


Theorem 4.9 If A is an n n matrix, then A is invertible if and only if |A| = 0.
We will give two proofs of this theorem, the first which follows easily from Theorem 4.3,
and the second because it is gives us another method to calculate the inverse of a
matrix.
Proof 1: We have already established this theorem indirectly by our arguments in the
previous section; we will repeat and collect them here.
By Theorem 4.3 on page 93, A is invertible if and only if the reduced row echelon form
of A is the identity matrix. Let R be the reduced row echelon form of A. Since A is a
square matrix, R is either the identity matrix, or a matrix with a row of zeros. (Indeed,
if R has a leading one in every row, then it must also have a leading one in every
column, and since it is n n it must be the identity matrix. Otherwise, there is a row of
R without a leading one, and this must, therefore, be a row of zeros.)
So either R = I, which is the case if and only if A is invertible, with |R| = 1 = 0; or
|R| = 0 because it has a row of zeros, which is the case if and only if A is not invertible.

107

4. Matrix inversion and determinants

As we have seen, row operations cannot alter the fact that a determinant is zero or
non-zero. By performing a row operation we might be multiplying the determinant by a
non-zero constant, or by 1, or not changing the determinant at all. Therefore we can
conclude that |A| = 0 if and only if the determinant of its reduced row echelon form,
|R| = 0 which is if and only if A is not invertible. Or, put the other way, |A| = 0 if and
only if |R| = 1, if and only if the matrix A is invertible.
Proof 2: We will now prove this theorem directly. Since it is an if and only if
statement, we must prove both implications.

First we show that if A is invertible, then |A| = 0. We assume A1 exists, so that


AA1 = I. Then taking the determinant of both sides of this equation,
|AA1 | = |I| = 1. Applying Theorem 4.8 to the product,
|AA1 | = |A| |A1 | = 1.
If the product of two real numbers is non-zero, then neither number can be zero, which
proves that |A| = 0.
As a consequence of this argument we have the bonus result that
|A1 | =

1
.
|A|

We now show the other implication, that if |A| = 0 then A is invertible. To do this we
will construct A1 , and to do this we need some definitions.
Definition 4.9 If A is an n n matrix, the matrix of cofactors of A is the matrix
whose (i, j) entry is Cij , the (i, j) cofactor of A. The adjoint (adjugate in some
textbooks) of the matrix A is the transpose of the matrix of cofactors. That is, the
adjoint of A, adj(A), is the matrix

C11 C21 . . . Cn1


C12 C22 . . . Cn2
adj(A) =
.
..
..
..
...
.
.
.
C1n C2n . . . Cnn
Notice that column 1 of this matrix consists of the cofactors of row 1 of A (and row 1
consists of the cofactors of column 1 of A), and similarly for each column and row.
We now multiply the matrix A

a11
a21
A adj(A) =
...
an1

with its adjoint matrix,

a12 . . . a1n
C11
a22 . . . a2n C12
.
..
..
..
.
.
. ..
an2 . . . ann
C1n

Look carefully at what each entry of the product will be.

C21
C22
..
.
C2n

. . . Cn1
. . . Cn2
..
..
.
.
. . . Cnn

The (1, 1) entry is: a11 C11 + a12 C12 + + a1n C1n . This is the cofactor expansion of |A|
by row 1.
The (1,2) entry is: a11 C21 + a12 C22 + + a1n C2n . This consists of the cofactors of
row 2 of A multiplied by the entries of row 1, so this is equal to 0 by Corollary 3 in
section 4.2.1.

108

4.3. Matrix inverse using cofactors

Continuing in this way, we see that the entries on the main diagonal of the product are
all equal to |A|, and all entries off the main diagonal are equal to 0. That is,

|A| 0 0
0 |A| 0
A adj(A) =
= |A| I,
..
..
..
...
.
.
.
0

|A|

since |A| is just a real number, a scalar.

We know |A| = 0, so we can divide both sides of the equation by |A| to obtain,
A
This implies that

1
adj(A)
|A|

A1 =

=I

1
adj(A)
|A|

This gives us a method to calculate the inverse of a matrix using cofactors.


Example 4.8 Find A1 for the matrix

1 2 3
A = 1 2 1 .
4 1 1

First calculate |A| to see if A is invertible. Using the cofactor expansion by row 1,
|A| = 1(2 1) 2(1 4) + 3(1 8) = 16 = 0.
We then calculate the minors, for example
M11 =
and fill in the chart below
M11 = 1
M21 = 1
M31 = 4

2 1
= 1,
1 1

M12 = 5
M22 = 11
M32 = 4

M13 = 9
M23 = 7
M33 = 4

Change the minors into cofactors, by multiplying by 1 those minors with i + j equal
to an odd number. Finally transpose the result to form the adjoint matrix, so that

1
1
4
1
1
A1 =
adj(A) = 5 11 4 .
|A|
16
9
7
4

As with all calculations, it is easy to make a mistake. Therefore, having found A1 ,


the next thing you should do is check your result by showing that AA1 = I,

1 2 3
1
1
4
16
0
0
1
1
1 2 1 5 11 4 = 0
16
0 = I.
16
16
4 1 1
9
7
4
0
0
16

109

4. Matrix inversion and determinants

Activity 4.11

Use this method to find the inverse of the matrix

1 2 3
A = 0 4 0
5 6 7

Check your result.

Remember: the adjoint matrix only contains the cofactors of A; the (i, j) entry is the
cofactor Cji of A. The entries only multiply the cofactors when calculating the
determinant of A, |A|.

4.3.1

Cramers rule

If A is a square matrix with |A| = 0, then Cramers rule gives us an alternative method
of solving a system of linear equations Ax = b.
Theorem 4.10 (Cramers rule) If A is n n, |A| = 0, and b Rn , then the solution
x = (x1 , x2 , . . . , xn )T of the linear system Ax = b is given by
xi =

|Ai |
|A|

where Ai is the matrix obtained from A by replacing the ith column with the vector b
Before we prove this theorem, lets see how it works.
Example 4.9 Use Cramers rule to find the solution of the linear system
x + 2y + 3z = 7
x + 2y + z = 3
4x + y + z = 5
In matrix form Ax = b this system

1 2
1 2
4 1

is,

3
x
7
1 y = 3
1
z
5

We first check that |A| = 0. This is the same matrix A we used Example 4.8 to find
the inverse of a matrix on page 109; |A| = 16. Then applying Cramers rule, we
find x by evaluating the determinant of the matrix obtained from A by replacing
column 1 with b,
7 2 3
3 2 1
5 1 1
16
x=
=
=1
|A|
16
and in the same way we obtain y and z.
1
7 3
1 3 1
4
5 1
48
y=
=
= 3
|A|
16

110

1 2 7
1 2 3
4 1 5
64
z=
=
=4
|A|
16

4.4. Application: Leontief input-output analysis

which can be easily checked by substitution into the original equations (or
multiplying Ax).
We now prove the theorem.
Proof
Since |A| = 0, A1 exists, and we can solve for x by multiplying Ax = b on the left by
A1 . Then
x = A1 b


x1
C11 C21 . . . Cn1
b1
x2
C
C22 . . . Cn2 b2
1

12
.
x=
..
..
..
... = |A| ...
.
.
. ..
xn
C1n C2n . . . Cnn
bn
The entry xi of the solution is equal to the ith row of this product.
xi =

1
(b1 C1i + b2 C2i + + bn Cni )
|A|

Stare at this expression a moment. The cofactors all come from row i of the adjoint
matrix, and they are the cofactors of column i of A, so this looks like a cofactor
expansion by column i of a matrix which is identical to A except in column i, where the
entries are the components of the vector b. That is, the term in brackets is the cofactor
expansion by column i of the matrix A with column i replaced by the vector b; this is
|Ai |.
Summary of Cramers rule. To find xi ,
(1) replace column i of A by b,
(2) evaluate the determinant of the resulting matrix,
(3) divide by |A|.
Activity 4.12 Can you think of another method to obtain the solution to Example
4.9? One way is to use the inverse matrix. Do this. We found A1 in Example 4.8 on
page 109. Now use it to find the solution x of

1 2 3
x
7
1 2 1 y = 3
4 1 1
z
5
by calculating x = A1 b.

4.4 Application: Leontief input-output analysis


In 1973 Wassily Leontief was awarded the Nobel prize in Economics for work he did
analysing an economy with many interdependent industries using a system of linear
equations. We present a brief outline of his method here.

111

4. Matrix inversion and determinants

Suppose an economy has n interdependent production processes; the outputs of the n


industries are used to run the industries and to satisfy an outside demand. We will
assume that prices are fixed so that they can be used to measure the output. The
problem we wish to solve is to determine the level of output of each industry which will
satisfy all demands exactly; that is, both the demands of the other industries and the
outside demand. The problem can be described as a system of linear equations, as we
shall see by considering the following simple example.
Example 4.10

Suppose there are two industries: water and electricity. Let


x1 = total output of water

($ value)

x2 = total output of electricity

($ value)

We can express this as a vector,


x=

x1
x2

called a production vector.

Suppose we know that


$0.01 water
$0.15 electricity
$0.21 water
electric uses
$0.05 electricity

water uses

to produce $1.00 water output


to produce $1.00 electricity.

What is the total water used by the industries? Water is using $0.01 for each unit
output, so a total of 0.01x1 , and electricity is using $0.21 water for each unit of its
output, so a total of 0.21x2 . The total amount of water used by the industries is
therefore 0.01x1 + 0.21x2 . In the same way, the total amount of electricity used by
the industries is 0.15x1 + 0.05x2 . The totals can be expressed as
water
electricity

0.01 0.21
0.15 0.05

x1
x2

= Cx.

The matrix C is known as a consumption matrix or a technology matrix.


After the industries have used water and electricity to produce their outputs, how
much water and electricity are left to satisfy the outside demand?
Activity 4.13 Think about this before continuing. Write down an expression for
the total amount of water which is left after the industries have each used what they
need to produce their output. Do the same for electricity.
Example 4.10 (continued)
Let d1 denote the outside demand for water, and d2 the demand for electricity. Then
in order for the output of these industries to supply the industries and satisfy the
outside demand exactly, the following equations must be satisfied:
=

112

x1 0.01x1 0.21x2 = d1

x2 0.15x1 0.05x2 = d2

(water)
(electricity)

4.4. Application: Leontief input-output analysis

In matrix notation,
x1
x2

0.01 0.21
0.15 0.05

x1
x2

d1
d2

or, x Cx = d, where
d=

d1
d2

is the outside demand vector.

If we use the fact that Ix = x, where I is the 2 2 identity matrix, then we can
rewrite this system in matrix form as
Ix Cx = d

(I C)x = d.

or

This is now in the usual matrix form for a system of linear equations. A solution, x,
to this system of equations will determine the output levels of each industry required
to satisfy all demands exactly.
Now lets look at the general case. Suppose we have an economy with n interdependent
industries. If cij denotes the amount of industry i used by industry j to produce $1.00
of industry j, then the consumption or technology matrix is C = (cij ):

c11 c12 c1n


c21 c22 c2n
,
C=
..
..
..
...
.
.
.
cn1

cn2

cnn

where

Row i lists the amounts of industry i used by each industry.


Column j lists the amounts of each industry used by industry j.
If, as before, we denote by d the n 1 outside demand vector, then in matrix form the
problem we wish to solve is to find the production vector x such that
(I C)x = d,
a system of n linear equations in n unknowns.
Activity 4.14 Return to Example 4.10 and assume that the public demand for
water is $627 and for electricity is $4,955. Find the levels of output which satisfy all
demands exactly. (You should find that x1 = 1, 800 and x2 = 5, 500.)

113

4. Matrix inversion and determinants

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

say what is meant by an elementary matrix, and understand how they are used for
row operations
find the inverse of a matrix using row operations
find the determinant of a square matrix and use it to determine if a matrix is
invertible
find the inverse of a matrix using cofactors
solve a system of linear equations using Cramers rule
say what is meant by the Leontief input-output model.
In addition you should know that:
There are three methods to solve Ax = b if A is n n and |A| = 0:
(1) Gaussian elimination.

(2) Find A1 , then x = A1 b.


(3) Cramers rule.
There is one method to solve Ax = b if A is m n and m = n, or if |A| = 0:
(1) Gaussian elimination.

There are two methods to find A1 :


(1) Using cofactors for the adjoint matrix.
(2) By row reduction of (A | I) to (I | A1 ).
If A is an n n matrix, then the following statements are equivalent (Theorems 4.3
and 4.9):
(1) A is invertible.
(2) Ax = b has a unique solution for any b Rn .

(3) Ax = 0 has only the trivial solution, x = 0.


(4) The reduced row echelon form of A is I.
(5) |A| = 0.

Exercises
Exercise 4.1
Use elementary row operations to find any inverses of the following matrices.

1 0 4 0
1 2 3
1 2 3
0 1 0 0

C=
A= 2 3 0
B = 2 3 0
0 0 0 1
0 1 6
0 1 2
0 0 1 0

114

4.4. Exercises

Is C an elementary matrix? If the answer is yes, what operation does it perform? If


the answer is no, write it as the product of elementary matrices.
Exercise 4.2
Given a system of equations Ax = b for several different values of b, it is often more
practical to find A1 , if it exists, and to find the solutions using x = A1 b.
Use this method to solve Ax = br for the matrix A given in the previous question, and
for the following vectors br , r = 1, 2, 3:



1
1
0

(a) b1 = 0
(b) b2 = 1
(c) b3 = 1 .
3
1
0
Be certain your solution for A1 is correct (check that AA1 = I first).
Exercise 4.3
Evaluate the following determinants using the cofactor expansion along an appropriate
row or column.
7 5
2 0
(b)
11 2
23 57

2 5 1
(a) 1 0 2
7 1 1

0
1
(d)
0
0

1
0
0
0

0
0
1
0

2 3
0 0
0 0
1 1

0
0
0
(e)
0
0
1

0
0
0
1

0
0
0
0
6
3

1
3
0
0

(c)

0
0
0
1
9
4

0
0
2
0
8
2

0
3
9
7
7
9

2
2
1
1

1
1
6
1

0
0
5
1

1
2
3
4
5
6

Exercise 4.4
Let

3 t 2
B = 1 5 3 .
2 1 1

For what values of t is the determinant of B equal to zero?


Exercise 4.5
Evaluate the following determinants (use row operations to simplify the calculation).
1 4 3
2 7 5
(a)
1
2
6
2 10 14

2
1
0
4

1
1
(b) 2
2
1

4 1 3 0
7 4 3 8
8 2 6 0
0 5 5 7
9 0 9 2

(c)

3 3a 3a2
2 2b 2b2
1 c
c2

115

4. Matrix inversion and determinants

Exercise 4.6
For which values of is the matrix
A=

2
3
2
1

not invertible?
Exercise 4.7

Suppose A is a 3 3 matrix with |A| = 5. Find |3A|, |A2 |, |2A1|, |(2A)1 |.


Exercise 4.8
Use the method of the adjoint matrix to find
matrices, if it exists.

2 0 3
1
B = 2
A= 0 3 1
1 4 2
0

the inverse of each of the following

0 2
1 3
1 1

1 2 0
C = 0 1 1
2 1 1

Exercise 4.9
Use Cramers rule to find the value of x, y, z for system (a) and to find the value of z
for system (b) where a, b are constants, a = b, a = 2b.

x
+
y
+
z
=
8

ax ay + bz = a + b
2x + y z = 3
bx by + az = 0
(a)
(b)

x + 2y + z = 3
ax + 2by + 3z = a b
Exercise 4.10
Consider an economy with three industries,
i1 : water

i2 : electricity

i3 : gas

interlinked so that the corresponding consumption matrix is

0.2 0.3 0.2


C = 0.4 0.1 0.2
0
0 0.1
Each week the external demands for water, electricity and gas are respectively,
d1 = $40, 000

d2 = $100, 000

d3 = $72, 000

(a) How much water, electricity and gas is needed to produce $1 worth of electricity?
(b) What should be the weekly production of each industry in order to satisfy all
demands exactly?

116

4.4. Comments on selected activities

Comments on selected activities


Feedback to activity 4.1
Only the last matrix is an elementary matrix, representing the operation R3 R1 on I.
The others each represent two row operations. For example,

2 1 0
1 1 0
2 0 0
0 1 0 = 0 1 0 0 1 0 = E2 E1 ,
0 0 1
0 0 1
0 0 1

where E1 represents 2R1 and E2 represents R1 + R2 . You should multiply the matrices in
the opposite order, E1 E2 , and notice the effect, thinking about the row operations on I.
Feedback to activity 4.2
The matrix E is the identity matrix after the
matrix is the identity matrix after R2 + 4R1 ,

1
1

E = 4
0

row operation R2 4R1 , so the inverse

0 0
1 0.
0 1

Multiply out EE 1 and E 1 E as instructed.


Feedback to activity 4.5
For the matrix A,

2 1 3

(A|I) =
0 1 1
1
2 0

So

1
R3 +2R1
0
0

1
R3 5R2

0
0

1
R2 +R3
0
0

2
1
0

1 0 0
1
2 0
R1 R3

0 1 0

0 1 1
0 0 1
2 1 3

0 0 1
1
(1)R2
0 1 0 0
1 0 2
0

0
0 0 1 1R
1
8 3

1 0 1 0 0
0
8
1 5 2

0 0 0 1
1
R 2R
1
3
1
0 8 8 4 1 2 0
5
1
1 18
0
8
4

2 0
1 1
5 3

2 0
1 1
5 3

2
1
0

2 0
1 1
0 1

A1
Now check that AA1 = I.

0 0
1 0
0 1

2 6 4
1
= 1 3 2
8
1
5 2

0 0 1
0 1 0
1 0 0

0 0 1
0 1 0
1 0 2

0
0
1
8

28
1
8
1
8

0
1

1
0

6
8

1
2
1
4
1
4

5
8

38
5
8

1
4

When you carry out the row reduction, it is not necessary to always indicate the
separation of the two matrices by a line as we have done so far. You just need to keep
track of what you are doing.

117

4. Matrix inversion and determinants

In the calculation for the inverse of B, we have omitted the line but added a bit of
space to make it easier for you to read.

2 1 3
1 0 0
1 2 0
0 0 1
R1 R3
0 1 1
(B|I) = 0 1 1
0 1 0
0 1 0
1 2 0
0 0 1
2 1 3
1 0 0

0 0 1
1 2
0
(1)R2
0 1 0 0 1 1
1 0 2
0 3 3

1 2 0
0 0
1
R3 +3R2
0 1 1
0 1 0
0 0 0
1 3 2

1 2 0
R3 2R1
0 1 1
0 3 3

0 0
1
0 1 0
1 0 2

which indicates that the matrix B is not invertible; it is not row equivalent to the
identity matrix.
Feedback to activity 4.6
C13 = 13.
Feedback to activity 4.7
|M| = 1(8 3) 2(0 3) + 1(0 2) = 1
Feedback to activity 4.8
You should either expand by column 1 or row 2. For example, using column 1:
|M| = 1(8 3) + 1(6 2) = 1.
Feedback to activity 4.10
1 2 1 4
5 1 6
0 5 1 6
= 3 3 6
|A| =
0 3 3 6
2
2 1
0 2
2 1

At this stage you can expand the 3 3 matrix using a cofactor expansion, or continue a
bit more with row operations:
1 1 2
1 1 2
4 4
|A| = 3 5 1 6 = 3 0 4 4 = 3
= 3(4) = 12
4 5
2 2 1
0 4 5
Feedback to activity 4.11

A1

|A| = 32 = 0

28
4 12
7 1 3
1
1
1
adj(A) = 0
8
0 = 0
2
0 .
=
|A|
32
8
20 4
4
5 1 1

Feedback to activity 4.12


Carry out the matrix multiplication as instructed.

118

4.4. Comments on exercises

Feedback to activity 4.13


The total water output remaining is x1 0.01x1 0.21x2 , and the total electricity
output left is x2 0.15x1 0.05x2 .
Feedback to activity 4.14
Solve (I C)x = d by Gaussian elimination, where
C=

0.01 0.21
0.15 0.05

x=

x1
x2

d=

627
4955

Reducing the augmented matrix,


((I C)|d) =

0.99 0.21 627


0.15 0.95 4955

1 7/33 1900/3
3
19
99100
1 7/33 1900/3
0
1
5500

33 7 20900
3 19 99100

1 7/33 1900/3
0 202/11 101000

1 0 1800
0 1 5500

x=

1800
5500

Comments on exercises
Solution to exercise 4.1
For the matrix A:

1 2 3
1 0 0
(A|I) = 2 3 0
0 1 0
0 1 2
0 0 1

1 2
3
1 0 0
R2 2R1 0 1 6
2 1 0
0 1
2
0 0 1

1 2 3
1 0 0
1R2 0 1 6
2 1 0
0 1 2
0 0 1

1 2 3
1
0 0
R3 R2 0 1 6
2 1 0
0 0 4
2 1 1

1 2 3
1 0
0
14 R3 0 1 6
2 1 0
1
0 0 1
14 41
2

3
3
1 2 0
12
4
4
R2 6R3
1
3
0 1 0
1
2
2
R1 3R3
1
1
1
0 0 1

2
4
4

3
1
9
1 0 0
4 4
2
3
R1 2R2 0 1 0
1 21
2
1
1
1
0 0 1

2
4
4

119

4. Matrix inversion and determinants

=
You should check that AA1 = I.
For the matrix B:

3
2

41

94

1
2

41

14

A1 = 1

1
(B|I) = 2
0

R2 2R1
0
0

1
2

2 3
3 0
1 6
2
3
1 6
1
6

3
2

1 0 0
0 1 0
0 0 1

1 0 0
2 1 0
0 0 1

The next step will yield a row of zeros, therefore the matrix B is not invertible.
For C

1
0
(C|I) =
0
0

R3 R4

R1 4R3

1
0

0
0

1
0

0
0

You should check that CC 1 = I.

0
1
0
0

4
0
0
1
0
1
0
0

0
1
0
0

C 1

0
0
1
0
4
0
1
0

0
0
1
0

1
0
0
0
0
0
0
1

1
0
0
0

0
0
0
1

1
0
=
0
0

0
1
0
0

1
0
0
0
0
1
0
0

0
0
1
0
0
1
0
0

0
1
0
0

0
0
0
1
0
0
0
1

0
0

0
1

0
0

1
0

4
0

1
0

0 4
0 0
.
0 1
1 0

C is not an elementary matrix. Since two row operations were required to reduce it to
the identity matrix, C is the product of two elementary matrices which correspond to
(the inverses of) those two operations:

1 0 4 0
1 0 0 0
0 1 0 0

, E2 = 0 1 0 0 C = E2 E1
E1 =
0 0 1 0
0 0 0 1
0 0 0 1
0 0 1 0
Note that the order is important. C = E1 E2

120

4.4. Comments on exercises

Solution to exercise 4.2


Solving Ax = b using x = A1 b:
(a) for b1 the solution is,
3
2

14

1
2

14

x1 = 1

1
2

(b) for b2 and b3 the solutions are

1
x2 = A1 b2 = 1
0

21
1
4
3
7
0
=
2
2
14
3
14
94

and

x3 = A1 b3 =

You can check these answers using the matrix A.

14
1
2

14

Solution to exercise 4.3


(a) Cofactor expansion by row 2:
2 5 1
5 1
2 5
1 0 2 = 1
2
= 1(4) 2(33) = 62.
1 1
7 1
7 1 1
(b) Cofactor expansion by row 2:
7 5
2 0
|A| =
11 2
23 57

2 3
5 2 3
2 3
0 0
= 20.
= 2 2 0 0 = 2(2)
1 1
0 0
57 1 1
1 1

(c) Cofactor expansion by column 1:

(iii)

1
3
0
0

2
2
1
1

1
1
6
1

0
2 1 0
2 1 0
2 1 0
0
= 1 1 6 5 3 1 6 5 = 2 1 6 5
5
1 1 1
1 1 1
1 1 1
1
= 2 2

6 5
1 5
1
1 1
1 1

= 2(2(1) 1(4)) = 12

(d) You can either use a cofactor expansion, say by row 2,


0
1
0
0

1
0
0
0

0
0
1
0

0
1 0 0
0
= 1 0 1 0 = 1,
0
0 0 1
1

or recognise that this is an elementary matrix, which is the identity matrix with row 1
and row 2 switched, so the determinant is equal to 1.

121

4. Matrix inversion and determinants

(e) Expand each determinant by row 1:


0
0
0
0
0
1

0
0
0
0
6
3

0
0
0
1
9
4

0
0
2
0
8
2

0
3
9
7
7
9

1
2
3
= 1
4
5
6

0
0
0
0
1

0
0
0
6
3

0
0
1
9
4

0
2
0
8
2

3
9
7 = (1)(3)
7
9

0
0
0
1

0
0
6
3

0
1
9
4

2
0
8
2

0 0 1
0 6
= (1)(3)(2) 0 6 9 = (1)(3)(2)(1)
= 36
1 3
1 3 4

Solution to exercise 4.4


You will make fewer errors when there is an unknown constant present if you expand by
a row or column containing the unknown. So, for example, expanding by row 1:
3 t 2
|B| = 1 5 3 = 3(5 3) t(1 6) 2(1 10) = 6 + 7t + 22 = 7t + 28.
2 1 1
So |B| = 0 t = 4.
Solution to exercise 4.5
1 4 3
2
1 4 3 2
1 1 3
1 1 3
0 1 1 3
2 7 5 1
16
3 2 = 0 9
= 6
=
(a)
0 6
3 2
1
2
6 0
0 6 6
2 8
0
0 2 8
0
2 10 14 4
=

9 16
9 16
= 6(25) = 150.
=6
1 1
6 6

4 1 3 0
7 4 3 8
8 2 6 0 = 0 since R3 = 2R1 .
0 5 5 7
9 0 9 2

(b)

1
1
2
2
1

(c)

3 3a 3a2
1 a a2
1
a
a2
b a b2 a2
2 2b 2b2 = 6 1 b b2 = 6 0 b a b2 a2 = 6
c a c2 a2
1 c
c2
1 c c2
0 c a c2 a2
= 6(b a)(c a)

1 b+a
= 6(b a)(c a)(c b).
1 c+a

Solution to exercise 4.6


To answer this question, you need to find the determinant of A,
|A| =

122

2
3
= (2 )(1 ) 6 = 2 3 4 = ( 4)( + 1).
2
1

4.4. Comments on exercises

Therefore, A1 does not exist |A| = 0 = 4 or = 1.


Solution to exercise 4.7
Given that A is 3 3 and |A| = 5,
|3A| = 33 |A| = 135 ,
|2A1 | = 23 |A1 | =

|A2 | = |A| |A| = 25,

23
8
= ,
|A|
5

|(2A)1 | =

1
1
1
= 3
= .
|2A|
2 |A|
40

Solution to exercise 4.8


The first thing
invertible.
2 0
|A| = 0 3
1 4

you should do for each matrix is evaluate its determinant to see if it is


3
1 = 2(6 4) 3(3) = 4 9 = 5, so A1 exists.
2

Next find the cofactors,


C11 =

3 1
=2
4 2

C21 =
C31 =

C12 =

0 3
= (12)
4 2

C22 =

0 3
=9
3 1

Then

A1

0 1
= (1)
1 2

2 3
=1
1 2

C32 =

2 3
= (2)
0 1

2 12 9
1
= 1
1
2 .
5
3
8
6

C13 =

0 3
=3
1 4

C23 =
C33 =

2 0
= (8)
1 4

2 0
=6
0 3

Check that AA1 = I.

1 0 2
|B| = 2 1 3 = 1(1 + 3) + 2(2) = 0, so B is not invertible.
0 1 1
1 2 0
|C| = 0 1 1 = 1(1 1) 2(0 2) = 2, so C is invertible.
2 1 1
Find the cofactors,

C11 = 2

C12 = (1)(2)

C31 = 2

C32 = (1)(1)

C22 = 1

C21 = (1)(2)

Then

C 1

2 2
2
1
= 2 1 1 .
2
2 3
1

C13 = 2

C23 = (1)(3)
C33 = 1

Check your answer.

123

4. Matrix inversion and determinants

Solution to exercise 4.9


(a) Using Cramers

1 1
A= 2 1
1 2

rule for this system, we have

1
1
|A| = 1(3) 1(1) + 1(5) = 7 = 0
1

8 1 1
1
8(3) 1(6) + 1(3)
21
x=
3 1 1 =
=
= 3,
|A|
|A|
7
3 2 1

1
y=
|A|

1 8 1
1(6) 8(1) + 1(9)
7
2 3 1 =
= = 1,
|A|
7
1 3 1

1
z=
|A|

1 1 8
1(3) 1(9) + 8(5)
28
2 1 3 =
=
= 4.
|A|
7
1 2 3


8
b = 3,
3

(b) For this part, it is very useful to use the fact that Theorem 4.7 concerning the
effects of row operations on a determinant is also true for the effects of the same
operations on the columns of a matrix. Indeed, if you think about the fact that
|A| = |AT |, this has to be so. We want z in the solution of Ax = b, where

a a b
x
a+b
A = b b a , x = y , b = 0 .
a 2b 3
z
ab

To evaluate |A|, begin by adding column 1 to column 2, replacing column 2, exactly as


you would do for a row operation. This does not change the value of the determinant,
a
0
b
a b
|A| = b
0
a = (a + 2b)
= (a 2b)(a2 b2 ).
b a
a a + 2b 3
Since a = 2b and a = b, you know that |A| = 0, so Cramers rule can be used.

To evaluate z, you need to replace the last column of A with the vector b and then
evaluate the new determinant. So you can still use the same column operation,
(replacing C2 with C2 + C1 ) to simplify the determinant,
z=

1
|A|

1
|A|

a a a + b
b b
0
a 2b a b

a
0
a+b
b
0
0
a a + 2b a b

(a + 2b) a a + b
b
0
|A|
(a 2b)b(a + b)
=
,
(a 2b)(a2 b2 )

124

4.4. Comments on exercises

so that
z=

b
.
ab

Solution to exercise 4.10


(a) Electricity is industry i2 , so column 2 gives the amounts of each industry needed
to produce $1 of electicity: c12 $0.30 water, c22 $0.10 electricity, c32 0 gas.
(b) To solve (I C)x = d, reduce the augmented matrix to reduced row echelon form:

4
3
10
15
40, 000
5
9
15
100, 000
( I C|d ) = 25
10
9
0
0
72, 000
10

5
R
50, 000
1 38 41
4 1
9
2

51
100, 000
5
10
10
0
0
1
80, 000
R3
9

1 38 14
50, 000
3
R2 + 25 R1 0 34 10
120, 000

0 0
1
80, 000

1 38 14
50, 000
4
0 1 2
R
160, 000
3 2
5

0 0
1
80, 000

R1 + 41 R3
70, 000
1 38 0
0 1 0

192, 000
2
R2 + 5 R3
0 0 1
80, 000

R1 + 38 R2
1 0 0
142, 000
0 1 0

192, 000
0 0 1
80, 000

x1
142, 000
$142, 000 water
$192, 000 electricity
= x = x2 = 192, 000
80, 000
$80, 000 gas
x3

125

4. Matrix inversion and determinants

126

Chapter 5
Rank, range and linear equations
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 4.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 4, Sections 4.74.8.

Aims of the chapter


In this short chapter we aim to extend and consolidate what we have learned so far
about systems of equations and matrices, and tie together many of the results of the
previous chapters. We will intersperse an overview of the previous two chapters with
two new concepts, the rank of a matrix and the range of matrix.
This chapter will serve as a synthesis of what we have learned so far in anticipation of a
return to these topics later in the guide.

5.1 The rank of a matrix


Any matrix A can be reduced to a matrix in reduced row echelon form by elementary
row operations. You just have to follow the algorithm and you will obtain first a
row-equivalent matrix which is in row echelon form, and then continuing with the
algorithm, a row-equivalent matrix in reduced row echelon form (see section 4.1.2).
Another way to say this is
Any matrix A is row-equivalent to a matrix in reduced row echelon form.
There are several ways of defining the rank of a matrix, and we shall meet some other
(more sophisticated) ways later. All are equivalent. We begin with the following
definition.
Definition 5.1 (Rank of a matrix) The rank, rank(A), of a matrix A is the number of
non-zero rows in a row echelon matrix obtained from A by elementary row operations.
Notice that the definition only requires that the matrix A be put into row echelon form,
because by then the number of non-zero rows is determined. By a non-zero row, we
simply mean one that contains entries other than 0. Since every non-zero row of a

127

5. Rank, range and linear equations

matrix in row echelon form begins with a leading one, this is equivalent to the following
definition.
Definition 5.2 The rank, rank(A), of a matrix A is the number of leading ones in a
row echelon matrix obtained from A by elementary row operations.
Generally, if A is an m n matrix, then the number of non-zero rows (the number of
leading ones) in a row echelon form of A can certainly be no more than the total
number of rows, m. Furthermore, since the leading ones must be in different columns,
the number of leading ones in the echelon form can be no more than the total number,
n, of columns. Thus we have:
Theorem 5.1 For an m n matrix A, rank(A) min{m, n}, where min{m, n}
denotes the smaller of the two integers m and n.

Example 5.1 Consider the matrix

Reducing this to

1
2
3

1 2 1 1
M = 2 3 0 5.
3 5 1 6

row echelon form using elementary row operations, we have:

2 1 1
1 2
1 1
1 2 1 1
3 0 5 0 1 2 3 0 1 2 3
5 1 6
0 1 2 3
0 0 0 0

This last matrix is in row echelon form and has two non-zero rows (and two leading
ones), so the matrix M has rank 2.

Activity 5.2 Prove that the matrix

has rank 3.

1 2 1 1
B = 2 3 0 5
3 5 1 4

If a square matrix A of size n × n has rank n, then its reduced row echelon form has a
leading one in every row and (since the leading ones are in different columns) a leading
one in every column. Since every column with a leading one has zeros elsewhere, it
follows that the reduced echelon form of A must be I, the n × n identity matrix.
Conversely, if the reduced row echelon form of A is I, then by the definition of rank, A
has rank n. We therefore have one more equivalent statement to add to our theorem:
Theorem 5.2 If A is an n × n matrix, then the following statements are equivalent.
A−1 exists.
Ax = b has a unique solution for any b ∈ Rn.


Ax = 0 has only the trivial solution, x = 0.
The reduced echelon form of A is I.
|A| ≠ 0.
The rank of A is n.

5.2 Rank and systems of linear equations


Recall that to solve a system of linear equations, one forms the augmented matrix and
reduces it to echelon form by using elementary row operations.

Example 5.2 Consider the system of equations
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 4.
The augmented matrix is the matrix B in the previous activity. When you reduced
B to find the rank, after two steps you found
$$\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & -2 & 3 \\ 0 & -1 & -2 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & -2 \end{pmatrix}.$$

Thus the original system of equations is equivalent to the system
x1 + 2x2 + x3 = 1
x2 + 2x3 = −3
0x1 + 0x2 + 0x3 = −2.
But this system has no solutions, since there are no values of x1, x2, x3 that satisfy
the last equation. It reduces to the false statement 0 = −2, whatever values we give
the unknowns. We deduce, therefore, that the original system has no solutions, and
we say that it is inconsistent. Notice that in this case there is no reason to reduce
the matrix further.
If, as in Example 5.2, the row echelon form of an augmented matrix has a row of the
kind (0 0 . . . 0 a), with a ≠ 0, then the original system is equivalent to one in which
there is an equation
0x1 + 0x2 + · · · + 0xn = a (a ≠ 0).
Clearly this equation cannot be satisfied by any values of the xi, and the system is
inconsistent.
Continuing with our example:
Continuing with our example:


Example 5.2 (continued)
Note that the coefficient matrix A consists of the first three columns of the
augmented matrix, and the row echelon form of A consists of the first three columns
of the row echelon form of the augmented matrix:
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 3 & 5 & 1 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix},$$
$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{array}\right) \to \cdots \to \left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{array}\right).$$
The rank of the coefficient matrix A is 2, but the rank of the augmented matrix
(A|b) is 3.

If a linear system is consistent then there can be no leading one in the last column of
the reduced augmented matrix, for that would mean there was a row of the form
(0 0 . . . 0 1). Thus, a system Ax = b is consistent if and only if the rank of the
augmented matrix is precisely the same as the rank of the matrix A.
Example 5.3 In contrast, consider the system of equations
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 6.
This system has the same coefficient matrix A as Example 5.2, and the rank of A is
2. The augmented matrix for the system is the matrix M in Example 5.1, which also
has rank 2, so this system is consistent. Since the rank is 2 and there
are 3 columns in A, there is a free variable and therefore infinitely many solutions.
Activity 5.3 Write down a general solution for this system to verify these remarks.
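One way to check your answer to Activity 5.3 is with sympy's linsolve, which returns the whole solution set; the sketch below assumes sympy is installed, and sympy uses the symbol x3 as the free parameter.

    # All solutions of the consistent system of Example 5.3 (a checking sketch).
    from sympy import Matrix, linsolve, symbols

    x1, x2, x3 = symbols('x1 x2 x3')
    A = Matrix([[1, 2, 1],
                [2, 3, 0],
                [3, 5, 1]])
    b = Matrix([1, 5, 6])

    print(linsolve((A, b), x1, x2, x3))
    # {(3*x3 + 7, -2*x3 - 3, x3)}: one free variable, infinitely many solutions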
If an m × n matrix A has rank m, then there will be a leading one in every row of an
echelon form of A, and in this case a system of equations Ax = b will never be
inconsistent; it will be consistent for all b ∈ Rm. Why? There are two ways to see this.
In the first place, if there is a leading one in every row of A, the augmented matrix
(A|b) can never have a row of the form (0 0 . . . 0 1). Second, since the augmented matrix
also has m rows, its size being m × (n + 1), the rank of (A|b) can never be more than m.
Example 5.4 Consider again the matrix B from Activity 5.2, which
we interpreted as the augmented matrix B = (A|b) in Example 5.2, and its row
echelon form:
$$B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$


This time interpret B as representing the coefficient matrix of a system of three
equations in four unknowns, Bx = d, with d ∈ R3. The coefficient matrix B is 3 × 4
and has rank 3, so as we argued above, this system of equations is always consistent.
But let's look at this more closely. Any augmented matrix (B|d) will be row
equivalent to a matrix in echelon form for which the first four columns are the same
as the echelon form of B; that is,
$$(B|\mathbf{d}) \to \cdots \to \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & p_1 \\ 0 & 1 & 2 & -3 & p_2 \\ 0 & 0 & 0 & 1 & p_3 \end{array}\right)$$
for some constants pi, which could be zero. This system will have infinitely many
solutions for any d ∈ R3, because the number of columns is greater than the rank of
B. There is one column without a leading one, so there is one non-leading variable.
Activity 5.4 If p1 = 1, p2 = −2 and p3 = 0, and x = (x1, x2, x3, x4)T, write down
the solution to a given system Bx = d in vector form. Use this to determine the
vector d in this case.
Suppose we have a consistent system, and suppose that the rank r is strictly less than
n, the number of unknowns. Then, as we have just seen in Example 5.4, the system in
reduced row echelon form (and hence the original one) does not provide enough
information to specify the values of x1 , x2 , . . . , xn uniquely. Lets consider this in more
detail.
Example 5.5 Suppose we are given a system for which the augmented matrix
reduces to the row echelon form
$$\left(\begin{array}{cccccc|c} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & -3 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).$$
Here the rank (number of non-zero rows) is r = 3, which is strictly less than the
number of unknowns, n = 6.
Continuing to reduced row echelon form, we obtain the matrix
$$\left(\begin{array}{cccccc|c} 1 & 3 & 0 & 4 & 2 & 0 & 28 \\ 0 & 0 & 1 & 2 & 0 & 0 & 14 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).$$

Activity 5.5 Verify this. What are the additional two row operations which need to
be carried out?


Example 5.5 (continued)
The corresponding system is
x1 + 3x2 + 4x4 + 2x5 = 28
x3 + 2x4 = 14
x6 = 5.
The variables x1, x3 and x6 correspond to the columns with the leading ones and are
the leading variables. The other variables are the non-leading variables.
The form of these equations tells us that we can assign any values to x2, x4 and x5,
and then the leading variables will be determined. Explicitly, if we give x2, x4, x5 the
arbitrary values s, t, u, where s, t, u represent any real numbers, the solution is given
by
x1 = 28 − 3s − 4t − 2u, x2 = s, x3 = 14 − 2t, x4 = t, x5 = u, x6 = 5.
There are infinitely many solutions because the so-called free variables x2, x4, x5
can take any values s, t, u ∈ R.
Generally, we can describe what happens when the row echelon form has r < n non-zero
rows. If a leading one is in the kth column, it is the coefficient of the variable xk. So if
the rank is r and the leading ones occur in columns c1, c2, . . . , cr,
then the general solution to the system can be expressed in a form where the unknowns
xc1, xc2, . . . , xcr (the leading variables) are given in terms of the other n − r unknowns
(the non-leading variables), and those n − r unknowns are free to take any values. In
Example 5.5, we have n = 6 and r = 3, and the 3 variables x1, x3, x6 can be expressed in
terms of the 6 − 3 = 3 free variables x2, x4, x5.
In the case r = n, where the number of leading ones r in the echelon form is equal to
the number of unknowns n, there is only one solution to the system, for there is a
leading one in every column (since the leading ones move to the right as we go down the
rows). In this case there is a unique solution obtained from the reduced echelon form. In
fact, this can be thought of as a special case of the more general one discussed above:
since r = n there are n − r = 0 free variables, and the solution is therefore unique.

We can now summarise our conclusions thus far concerning a general linear system of m
equations in n variables, written as Ax = b, where the coefficient matrix A is an m × n
matrix of rank r.
If the echelon form of the augmented matrix has a row (0 0 . . . 0 a), with a ≠ 0,
the original system is inconsistent; it has no solutions. In this case
rank(A) = r < m and rank(A|b) = r + 1.
If the echelon form of the augmented matrix has no rows of the above type, the
system is consistent, and the general solution involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many solutions,
but when r = n there are no free variables and so there is a unique solution.
A homogeneous system of m equations in n unknowns is always consistent. In this case
the last statement still applies, as shown below.


The general solution of a homogeneous system involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many
solutions, but when r = n there are no free variables and so there is a unique
solution, namely the trivial solution, x = 0.
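This classification translates directly into a small computational test: compare the rank of A with the rank of (A|b), then compare the rank with n. The sketch below assumes sympy, and the function name classify is ours.

    # Classifying a system by comparing rank(A) with rank(A|b): a sketch.
    from sympy import Matrix

    def classify(A, b):
        aug = A.row_join(b)                  # form the augmented matrix (A|b)
        rA, rAug, n = A.rank(), aug.rank(), A.cols
        if rA < rAug:
            return 'inconsistent'
        return 'unique solution' if rA == n else 'infinitely many solutions'

    A = Matrix([[1, 2, 1], [2, 3, 0], [3, 5, 1]])
    print(classify(A, Matrix([1, 5, 4])))    # inconsistent (Example 5.2)
    print(classify(A, Matrix([1, 5, 6])))    # infinitely many solutions (Example 5.3)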

5.3 General solution of a linear system in vector notation
Continuing with Example 5.5, we found the general solution of the linear system above
in terms of the three free variables, or parameters, s, t, u. Expressing the solution, x, as
a column vector, we have
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} 28 - 3s - 4t - 2u \\ s \\ 14 - 2t \\ t \\ u \\ 5 \end{pmatrix}$$
or
$$\mathbf{x} = \begin{pmatrix} 28 \\ 0 \\ 14 \\ 0 \\ 0 \\ 5 \end{pmatrix} + \begin{pmatrix} -3s \\ s \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -4t \\ 0 \\ -2t \\ t \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -2u \\ 0 \\ 0 \\ 0 \\ u \\ 0 \end{pmatrix}.$$
That is, the general solution is
$$\mathbf{x} = \mathbf{p} + s\mathbf{v}_1 + t\mathbf{v}_2 + u\mathbf{v}_3, \qquad s, t, u \in \mathbb{R},$$
where
$$\mathbf{p} = \begin{pmatrix} 28 \\ 0 \\ 14 \\ 0 \\ 0 \\ 5 \end{pmatrix}, \quad \mathbf{v}_1 = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$

Applying the same method generally to a consistent system of rank r with n unknowns,
we can express the general solution of a consistent system Ax = b in the form
x = p + a1v1 + a2v2 + · · · + an−rvn−r.
Note that, if we put all the ai equal to 0, we get a solution x = p, which means that
Ap = b, so p is a particular solution of the system. Putting a1 = 1 and the remaining
ai equal to zero, we get a solution x = p + v1, which means that A(p + v1) = b. Thus
b = A(p + v1) = Ap + Av1 = b + Av1.


Comparing the first and last expressions, we see that Av1 = 0. Clearly, the same
equation holds for v2, . . . , vn−r. So we have proved the following.
If A is an m × n matrix of rank r, the general solution of Ax = b is the sum of:
a particular solution p of the system Ax = b and
a linear combination a1v1 + a2v2 + · · · + an−rvn−r of solutions v1, v2, . . . , vn−r of
the homogeneous system Ax = 0.
If A has rank n, then Ax = 0 only has the solution x = 0, and so Ax = b has a unique
solution: p + 0 = p.

This is a more precise form of the result of Theorem 3.2, which states that all solutions
of a consistent system Ax = b are of the form x = p + z, where p is any solution of
Ax = b and z ∈ N(A), the null space of A (the set of all solutions of Ax = 0).
Activity 5.6 Solve the following system of equations Ax = b by reducing the
augmented matrix to reduced row echelon form:
x1 − x2 + x3 + x4 + 2x5 = 4
−x1 + x2 + x4 − x5 = −3
x1 − x2 + 2x3 + 3x4 + 4x5 = 7.
Show that your solution can be written in the form p + su1 + tu2 where Ap = b,
Au1 = 0 and Au2 = 0.
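Once you have found p, u1 and u2 (the feedback at the end of the chapter gives one possible answer, which is what the sketch below uses), a quick machine check that Ap = b and Au1 = Au2 = 0 looks like this, again assuming sympy.

    # Verifying the structure p + s*u1 + t*u2 for Activity 5.6 (a sketch).
    from sympy import Matrix

    A = Matrix([[ 1, -1, 1, 1,  2],
                [-1,  1, 0, 1, -1],
                [ 1, -1, 2, 3,  4]])
    b  = Matrix([4, -3, 7])
    p  = Matrix([1, 0, -1, 0, 2])     # a particular solution
    u1 = Matrix([1, 1, 0, 0, 0])      # a solution of Ax = 0
    u2 = Matrix([1, 0, -2, 1, 0])     # another solution of Ax = 0

    print(A * p == b)                 # True
    print(A * u1, A * u2)             # both print the zero vector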

5.4 Range
The range of a matrix A is defined as follows.
Definition 5.3 (Range of a matrix) Suppose that A is an m × n matrix. Then the
range of A, denoted by R(A), is the subset
R(A) = {Ax | x ∈ Rn}
of Rm. That is, the range is the set of all vectors y ∈ Rm of the form y = Ax for some
x ∈ Rn.
What is the connection between the range of a matrix A and a system of linear
equations Ax = b? If A is m × n, then x ∈ Rn and b ∈ Rm. If the system Ax = b is
consistent, then this means that there is a vector x ∈ Rn such that Ax = b, so b is in
the range of A. Conversely, if b is in the range of A, then the system Ax = b must have
a solution. Therefore, we have shown that for an m × n matrix A:
The range of A, R(A), consists of all vectors b ∈ Rm for which the system of
equations Ax = b is consistent.


Let's look at R(A) from a different point of view. Suppose that the columns of A are
c1, c2, . . . , cn. Then we may write A = (c1 c2 . . . cn). If x = (α1, α2, . . . , αn)T ∈ Rn,
then the product Ax is equal to
α1c1 + α2c2 + · · · + αncn.
Activity 5.8 Convince yourself of this last statement. Write out each side using
ci = (c1i, c2i, . . . , cmi)T to show that
Ax = α1c1 + α2c2 + · · · + αncn.
Try to do this yourself before looking at the solution to this activity. This is a very
important result which will be used many times in this course, so make sure you
understand how it works.
So R(A), the set of all matrix products Ax, is also the set of all linear combinations of
the columns of A. For this reason R(A) is also called the column space of A. (More on
this in Chapter 7.)
If A = (c1 c2 . . . cn), where ci denotes column i of A, then we can write
R(A) = {a1c1 + a2c2 + · · · + ancn | a1, a2, . . . , an ∈ R}.
Example 5.6 Suppose that
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}.$$
Then for x = (α1, α2)T,
$$A\mathbf{x} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} = \alpha_1\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix},$$
so
$$R(A) = \left\{\begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} \,\middle|\, \alpha_1, \alpha_2 \in \mathbb{R}\right\}$$
or
$$R(A) = \{\alpha_1\mathbf{c}_1 + \alpha_2\mathbf{c}_2 \mid \alpha_1, \alpha_2 \in \mathbb{R}\},$$
where c1 = (1, −1, 2)T and c2 = (2, 3, 1)T are the columns of A.

Again, thinking of the connection with the system of equations Ax = b, we have


already shown that Ax = b is consistent if and only if b is in the range of A, and we
have now shown that R(A) is equal to the set of all linear combinations of the columns
of A. Therefore we can now assert that
The system of equations Ax = b is consistent if and only if b is a linear combination
of the columns of A.


Example 5.7 Consider the following systems of three equations in two unknowns.
x + 2y = 0          x + 2y = 1
−x + 3y = −5        −x + 3y = 5
2x + y = 3          2x + y = 2
Solving these by Gaussian elimination (or any other method) you will find that the
first system is consistent and the second system has no solution. The first system has
the unique solution (x, y)T = (2, −1)T.
Activity 5.10 Do this. Solve each of the above systems.

The coefficient matrix of each of the systems is the same, and is equal to the matrix A
in Example 5.6. For the first system,
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix}.$$
Checking this solution, you will find that
$$A\mathbf{x} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 0 \\ -5 \\ 3 \end{pmatrix} = 2\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = 2\mathbf{c}_1 - \mathbf{c}_2.$$

On the other hand, it is not possible to express the vector (1, 5, 2)T as a linear
combination of the column vectors of A. Trying to do so would lead to precisely the
same set of inconsistent equations.

Notice, also, that the homogeneous system Ax = 0 has only the trivial solution, and
that the only way to express 0 as a linear combination of the columns of A is by
0c1 + 0c2 = 0.
Activity 5.11 Verify all of the above statements.
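Consistency checks like those in Example 5.7 are easy to automate. In the sketch below (again assuming sympy), an empty solution set signals that b is not a linear combination of the columns of A.

    # b is in R(A) exactly when Ax = b is consistent (checking Example 5.7).
    from sympy import Matrix, linsolve, symbols

    x, y = symbols('x y')
    A = Matrix([[1, 2], [-1, 3], [2, 1]])

    print(linsolve((A, Matrix([0, -5, 3])), x, y))   # {(2, -1)}: consistent
    print(linsolve((A, Matrix([1, 5, 2])), x, y))    # EmptySet: inconsistent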

Activity 5.12 Look at your solution to Activity 5.6, and express the
vector b = (4, −3, 7)T as a linear combination of the columns of the coefficient matrix
$$A = \begin{pmatrix} 1 & -1 & 1 & 1 & 2 \\ -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & 2 & 3 & 4 \end{pmatrix}.$$
Do the same for the vector 0.


Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
find a general solution to a linear system, Ax = b, expressed in vector notation as
the sum of a particular solution plus a general solution to the associated
homogeneous system Ax = 0
explain why a general solution x to Ax = b, where A is an m × n matrix of rank r,
is of the form x = p + a1v1 + a2v2 + · · · + an−rvn−r, ai ∈ R; specifically why there
are n − r arbitrary constants
explain what is meant by the rank of a matrix and by the range of a matrix, and be
able to find the rank of a matrix
show that if A = (c1 c2 . . . cn), and if x = (α1, α2, . . . , αn)T ∈ Rn, then
Ax = α1c1 + α2c2 + · · · + αncn
write b as a linear combination of the columns of A if Ax = b is consistent
write 0 as a linear combination of the columns of A, and explain when it is possible
to do this in some way other than using the trivial solution, x = 0, with all the
coefficients in the linear combination equal to zero.

Exercises
Exercise 5.1
Find the rank of the matrix
$$A = \begin{pmatrix} 1 & 0 & 1 & 0 & 2 \\ 2 & 1 & 1 & 1 & 3 \\ 1 & 3 & -1 & 2 & 2 \\ 0 & 3 & -2 & 2 & 0 \end{pmatrix}.$$
Determine N(A), the null space of A, and R(A), the range of A.


Exercise 5.2
Consider the system of linear equations Ax = b given below, where α and β are
constants:
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 5 & 1 & \alpha \\ -1 & 1 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 2 \\ 7 \\ \beta \end{pmatrix}.$$
Compute the determinant of A, |A|.
Determine for which values of α and β this system has:
(a) a unique solution
(b) no solutions
(c) infinitely many solutions.
In case (a), use Cramer's rule to find the value of z in terms of α and β. In case
(c), solve the system using row operations and express the solution in vector form,
x = p + tv.


Exercise 5.3
A system of linear equations Bx = d is known to have the following general solution:
$$\mathbf{x} = \begin{pmatrix} 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + s\begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{R}.$$
Let c1 = (1, 1, 2)T be the first column of B. If d = (3, 5, −2)T, find the matrix B.
Exercise 5.4
Exercise 5.4

Consider the matrix
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 3 & 5 & 1 \end{pmatrix}.$$

Find a condition the components of the vector b = (a, b, c)T must satisfy in order for
Ax = b to be consistent. Hence, or otherwise, show that R(A) is a plane in R3 , and
write down a Cartesian equation of this plane.
Show that d = (1, 5, 6)T is in R(A). Express d as a linear combination of the columns of
A. Is it possible to do this in two different ways? If the answer is yes, then do so;
otherwise justify why this is not possible.
Exercise 5.5
Consider the matrices
$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 2 & -1 & -8 \\ 3 & 1 & -7 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 3 & -2 & 5 \\ 3 & 6 & -9 & 6 \\ 2 & 9 & -1 & 9 \\ 5 & 6 & -9 & 4 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 4 \\ 1 \\ a \\ b \end{pmatrix}.$$
(a) Find the rank of the matrix A. Find a general solution of Ax = 0.
Either write down a non-trivial linear combination of the column vectors of A which is
equal to the zero vector, 0, or justify why this is not possible.
Find all real numbers a and b such that b ∈ R(A), where b is the vector given above.
Write down a general solution of Ax = b.
(b) Using row operations, or otherwise, find |B|, where B is the matrix given above.
What is the rank of B?
Either write down a non-trivial linear combination of the column vectors of B which is
equal to the zero vector, 0, or justify why this is not possible.
Find all real numbers a and b such that b ∈ R(B), the range of B, where b is the vector
given above.


Comments on selected activities
Feedback to activity 5.3
One more row operation on the row echelon form will obtain a matrix in reduced row
echelon form which is row equivalent to the matrix M, from which the solution is found
to be
$$\mathbf{x} = \begin{pmatrix} 7 \\ -3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.$$
Feedback to activity 5.4
Substitute for p1, p2, p3 in the row echelon form of the augmented matrix and then
continue to reduce it to reduced row echelon form. The non-leading variable is x3.
Letting x3 = t, the general solution is
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 5 \\ -2 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} 3 \\ -2 \\ 1 \\ 0 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}.$$
Since Bp = d, multiplying Bp you will find that d = (1, 4, 5)T. (You can check all this
by row reducing (B|d).)
Feedback to activity 5.6
Put the augmented matrix into reduced row echelon form:
$$(A|\mathbf{b}) = \left(\begin{array}{ccccc|c} 1 & -1 & 1 & 1 & 2 & 4 \\ -1 & 1 & 0 & 1 & -1 & -3 \\ 1 & -1 & 2 & 3 & 4 & 7 \end{array}\right) \xrightarrow[R_3 - R_1]{R_2 + R_1} \left(\begin{array}{ccccc|c} 1 & -1 & 1 & 1 & 2 & 4 \\ 0 & 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 2 & 3 \end{array}\right)$$
$$\xrightarrow{R_3 - R_2} \left(\begin{array}{ccccc|c} 1 & -1 & 1 & 1 & 2 & 4 \\ 0 & 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{array}\right) \xrightarrow[R_2 - R_3]{R_1 - 2R_3} \left(\begin{array}{ccccc|c} 1 & -1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{array}\right)$$
$$\xrightarrow{R_1 - R_2} \left(\begin{array}{ccccc|c} 1 & -1 & 0 & -1 & 0 & 1 \\ 0 & 0 & 1 & 2 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{array}\right).$$
Set the non-leading variables to arbitrary constants: x2 = s, x4 = t, and solve for the
leading variables in terms of these parameters, starting with the bottom row. For
s, t ∈ R,
x5 = 2, x4 = t, x3 = −1 − 2t, x2 = s, x1 = 1 + s + t,
so
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 + s + t \\ s \\ -1 - 2t \\ t \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{pmatrix} = \mathbf{p} + s\mathbf{u}_1 + t\mathbf{u}_2.$$

Verify:
$$A\mathbf{p} = \begin{pmatrix} 1 & -1 & 1 & 1 & 2 \\ -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & 2 & 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ -3 \\ 7 \end{pmatrix} = \mathbf{b},$$
$$A\mathbf{u}_1 = \begin{pmatrix} 1 & -1 & 1 & 1 & 2 \\ -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & 2 & 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \qquad A\mathbf{u}_2 = \begin{pmatrix} 1 & -1 & 1 & 1 & 2 \\ -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & 2 & 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Feedback to activity 5.8
First write out the matrix product of A = (cij) and x:
$$A\mathbf{x} = \begin{pmatrix} c_{11} & c_{12} & \ldots & c_{1n} \\ c_{21} & c_{22} & \ldots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \ldots & c_{mn} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$
The product is m × 1; that is,
$$A\mathbf{x} = \begin{pmatrix} c_{11}\alpha_1 + c_{12}\alpha_2 + \cdots + c_{1n}\alpha_n \\ c_{21}\alpha_1 + c_{22}\alpha_2 + \cdots + c_{2n}\alpha_n \\ \vdots \\ c_{m1}\alpha_1 + c_{m2}\alpha_2 + \cdots + c_{mn}\alpha_n \end{pmatrix}$$
and can be written as a sum of n vectors, each m × 1, so
$$A\mathbf{x} = \begin{pmatrix} c_{11}\alpha_1 \\ c_{21}\alpha_1 \\ \vdots \\ c_{m1}\alpha_1 \end{pmatrix} + \begin{pmatrix} c_{12}\alpha_2 \\ c_{22}\alpha_2 \\ \vdots \\ c_{m2}\alpha_2 \end{pmatrix} + \cdots + \begin{pmatrix} c_{1n}\alpha_n \\ c_{2n}\alpha_n \\ \vdots \\ c_{mn}\alpha_n \end{pmatrix} = \alpha_1\begin{pmatrix} c_{11} \\ c_{21} \\ \vdots \\ c_{m1} \end{pmatrix} + \alpha_2\begin{pmatrix} c_{12} \\ c_{22} \\ \vdots \\ c_{m2} \end{pmatrix} + \cdots + \alpha_n\begin{pmatrix} c_{1n} \\ c_{2n} \\ \vdots \\ c_{mn} \end{pmatrix};$$
that is, Ax = α1c1 + α2c2 + · · · + αncn.
All these steps are reversible, so the expression
α1c1 + α2c2 + · · · + αncn
can be written in matrix form as Ax where A = (cij) and x = (α1, α2, . . . , αn)T.


Feedback to activity 5.12
You can use any solution x (so any values of s, t ∈ R) to write b as a linear combination
of the columns of A, so this can be done in infinitely many ways. In particular, taking
x = p, and letting ci indicate column i of the coefficient matrix A,
Ap = c1 − c3 + 2c5 = b.
You should write this out in detail and check that the sum of the vectors does add to
the vector b. Notice that this combination uses only the columns corresponding to the
leading variables:
$$\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + 2\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} = \begin{pmatrix} 4 \\ -3 \\ 7 \end{pmatrix}.$$
Similarly, since Au1 = 0 and Au2 = 0, any linear combination of these two vectors will
give a vector v = su1 + tu2 for which Av = 0, and you can rewrite Av as a linear
combination of the columns of A. For example, taking u1,
$$\mathbf{c}_1 + \mathbf{c}_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 1 \\ -1 \end{pmatrix} = \mathbf{0}.$$
Comments on exercises
Solution to exercise 5.1
Using row operations, the matrix A reduces to the row echelon form
$$A = \begin{pmatrix} 1 & 0 & 1 & 0 & 2 \\ 2 & 1 & 1 & 1 & 3 \\ 1 & 3 & -1 & 2 & 2 \\ 0 & 3 & -2 & 2 & 0 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 0 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are three non-zero rows (three leading ones), so rank(A) = 3.
To find N(A) we need to solve Ax = 0, which is a system of four equations in five
unknowns, x1, x2, . . . , x5. Continuing to reduced echelon form,
$$A \to \cdots \to \begin{pmatrix} 1 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
The leading variables are x1, x2 and x3. Set the non-leading variables x4 = s and
x5 = t. Then the solution is
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -s + t \\ -2t \\ s - 3t \\ s \\ t \end{pmatrix} = s\begin{pmatrix} -1 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ -2 \\ -3 \\ 0 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{R}.$$
So the null space consists of all vectors of the form x = sv1 + tv2, where v1 and v2 are
the vectors displayed above. It is a subset of R5.
The range of A can be described as the set of all linear combinations of the columns of
A,
R(A) = {a1c1 + a2c2 + a3c3 + a4c4 + a5c5 | ai ∈ R}
where
c1 = (1, 2, 1, 0)T, c2 = (0, 1, 3, 3)T, c3 = (1, 1, −1, −2)T, c4 = (0, 1, 2, 2)T, c5 = (2, 3, 2, 0)T.
This is a subset of R4. We will find a better way to describe this set when we look at
the column space of a matrix in later chapters.

Solution to exercise 5.2
|A| = −3α − 9.
(a) If |A| ≠ 0, that is, if α ≠ −3, then the system will have a unique solution. In this
case, using Cramer's rule, z = (3β + 3)/(α + 3).
To answer (b) and (c), reduce the augmented matrix to echelon form with α = −3:
$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 5 & 1 & -3 & 7 \\ -1 & 1 & 1 & \beta \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & -9 & -3 & -3 \\ 0 & 3 & 1 & \beta + 2 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 3 & 1 & 1 \\ 0 & 0 & 0 & \beta + 1 \end{array}\right).$$
So if α = −3, this system will be inconsistent if β ≠ −1, which answers (b).
If α = −3 and β = −1 we have (c) infinitely many solutions. Setting β = −1 and continuing
to reduced echelon form,
$$(A|\mathbf{b}) \to \cdots \to \left(\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 1 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & -\frac{2}{3} & \frac{4}{3} \\ 0 & 1 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 \end{array}\right).$$
The solution can now be read from the matrix. Setting the non-leading variable z = t,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \frac{4}{3} + \frac{2}{3}t \\[2pt] \frac{1}{3} - \frac{1}{3}t \\ t \end{pmatrix} = \begin{pmatrix} \frac{4}{3} \\[2pt] \frac{1}{3} \\ 0 \end{pmatrix} + t\begin{pmatrix} \frac{2}{3} \\[2pt] -\frac{1}{3} \\ 1 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}.$$
Solution to exercise 5.3
The matrix B must be 3 × 4 since the solutions are in R4 and the vector c1 ∈ R3. Let
B = (c1 c2 c3 c4), with c1 = (1, 1, 2)T.
The solution is of the form x = p + sv1 + tv2 where v1, v2 are in N(B), therefore you
know that Bp = d, Bv1 = 0 and Bv2 = 0. Regarding the matrix products as linear
combinations of the column vectors of B, we obtain
Bp = c1 + 2c3 = d, Bv1 = 3c1 + c2 = 0, Bv2 = c1 − c3 + c4 = 0.
Knowing c1, you just need to solve these for the other three columns:
$$2\mathbf{c}_3 = \mathbf{d} - \mathbf{c}_1 = \begin{pmatrix} 3 \\ 5 \\ -2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ -4 \end{pmatrix}, \quad\text{so}\quad \mathbf{c}_3 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}; \qquad \mathbf{c}_2 = -3\mathbf{c}_1 = \begin{pmatrix} -3 \\ -3 \\ -6 \end{pmatrix};$$
$$\mathbf{c}_4 = \mathbf{c}_3 - \mathbf{c}_1 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ -4 \end{pmatrix}.$$
The matrix B is
$$B = \begin{pmatrix} 1 & -3 & 1 & 0 \\ 1 & -3 & 2 & 1 \\ 2 & -6 & -2 & -4 \end{pmatrix}.$$
You can check your answer by row reducing the augmented matrix (B|d) to obtain the
solution of Bx = d, and matching it to the solution given.
Solution to exercise 5.4
You might have noticed that this is the same coefficient matrix A as we used in the
examples of Section 5.2. You can easily tackle this question by forming the augmented
matrix and reducing it using row operations,
$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c} 1 & 2 & 1 & a \\ 2 & 3 & 0 & b \\ 3 & 5 & 1 & c \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 2 & 1 & a \\ 0 & -1 & -2 & b - 2a \\ 0 & -1 & -2 & c - 3a \end{array}\right).$$
After this first step it is clear that the system will be consistent if and only if
b − 2a = c − 3a, or a + b − c = 0.
Therefore, the vector y = (x, y, z)T is in R(A) if and only if x + y − z = 0. This is the
Cartesian equation of a plane in R3.
The vector d = (1, 5, 6)T is in R(A), since its components satisfy the equation. This was
also shown in Example 5.3. Looking back at these examples you will notice
that the vector (1, 5, 4)T, for which the system is inconsistent, is not in the plane R(A).
For Activity 5.3 you found a general solution of the system of equations
Ax = d to be
$$\mathbf{x} = \begin{pmatrix} 7 \\ -3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.$$
Any solution x will enable you to write d as a linear combination of the columns of A.
For example, taking first t = 0 and then t = −1, d = 7c1 − 3c2 or d = 4c1 − c2 − c3;
that is,
$$\begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix} = 7\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - 3\begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix} = 4\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

Note that any two columns of A, or any two non-parallel vectors in R(A), can be used
to obtain a vector equation of the plane R(A); that is, for example,
$$\mathbf{x} = s\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t\begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}, \qquad s, t \in \mathbb{R}$$
is a vector equation for the plane R(A).


Solution to exercise 5.5
You need to put the matrix into row echelon form to answer the first question, and into
reduced row echelon form for the second:
$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 2 & -1 & -8 \\ 3 & 1 & -7 \end{pmatrix} \to \cdots \to \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
(a) The rank of A is 2. There is one non-leading variable. If you write x = (x, y, z)T,
then setting z = t, you will obtain the solution
$$\mathbf{x} = t\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.$$
Since there are non-trivial solutions of Ax = 0, it is possible to express 0 as a linear
combination of the columns of A with non-zero coefficients. A non-trivial linear
combination of the column vectors which is equal to the zero vector is given by any
non-zero vector in the null space. For example, using t = 1, the product Ax yields
$$3\mathbf{c}_1 - 2\mathbf{c}_2 + \mathbf{c}_3 = 3\begin{pmatrix} 1 \\ 0 \\ 2 \\ 3 \end{pmatrix} - 2\begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -8 \\ -7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \mathbf{0}.$$
The vector b ∈ R(A) if b is a linear combination of the column vectors of A, which is
exactly when Ax = b is consistent. Notice that the matrix A has rank 2, so the
augmented matrix must also have rank 2. Reducing (A|b) using row operations,
$$(A|\mathbf{b}) = \left(\begin{array}{ccc|c} 1 & 1 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 2 & -1 & -8 & a \\ 3 & 1 & -7 & b \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & -3 & -6 & a - 8 \\ 0 & -2 & -4 & b - 12 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & a - 5 \\ 0 & 0 & 0 & b - 10 \end{array}\right).$$
Therefore, Ax = b is consistent if and only if a = 5 and b = 10. In that case, continuing
to reduced echelon form,
$$\left(\begin{array}{ccc|c} 1 & 1 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & -3 & 3 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right),$$
so a general solution is
$$\mathbf{x} = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}.$$

(b) Using row operations,
$$|B| = \begin{vmatrix} 2 & 3 & -2 & 5 \\ 3 & 6 & -9 & 6 \\ 2 & 9 & -1 & 9 \\ 5 & 6 & -9 & 4 \end{vmatrix} = (-3)\begin{vmatrix} 1 & 2 & -3 & 2 \\ 2 & 3 & -2 & 5 \\ 2 & 9 & -1 & 9 \\ 5 & 6 & -9 & 4 \end{vmatrix} = (-3)\begin{vmatrix} 1 & 2 & -3 & 2 \\ 0 & -1 & 4 & 1 \\ 0 & 5 & 5 & 5 \\ 0 & -4 & 6 & -6 \end{vmatrix}$$
$$= (-30)\begin{vmatrix} 1 & 2 & -3 & 2 \\ 0 & -1 & 4 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & -2 & 3 & -3 \end{vmatrix} = (-30)\begin{vmatrix} 1 & 2 & -3 & 2 \\ 0 & -1 & 4 & 1 \\ 0 & 0 & 5 & 2 \\ 0 & 0 & -5 & -5 \end{vmatrix} = (-30)\begin{vmatrix} 1 & 2 & -3 & 2 \\ 0 & -1 & 4 & 1 \\ 0 & 0 & 5 & 2 \\ 0 & 0 & 0 & -3 \end{vmatrix} = -450.$$
Since det(B) ≠ 0, the rank of B is 4. Therefore, the main theorem tells us that Bx = 0
has only the trivial solution, and so there is no way to write 0 as a linear combination
of the column vectors of B except the trivial way, with all coefficients equal to 0.
Also, using this theorem, Bx = b has a unique solution for all b ∈ R4. Therefore
R(B) = R4. That is, a and b can be any real numbers; the system Bx = b is always
consistent.

Chapter 6
Sequences, series and difference equations
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 3, 4, 5, 23 and 24.

Aims of the chapter
In this chapter we make a slight detour into the topic of sequences, series and difference
equations (also known as recurrence equations). Many problems in economics and
finance involve difference equations, particularly those involving quantities which
change with time, but not continuously (such as the balance of a deposit account where
interest is paid once a year, at the end of the year). This chapter is independent of the
other chapters so far, but the material is important in its own right and, moreover, we
will see later that matrices and linear algebra can be used to solve systems of difference
equations.

6.1 Sequences
6.1.1 Sequences in general

A sequence of numbers y0, y1, y2, . . . is an infinite and ordered list of numbers with one
term, yt, corresponding to each non-negative integer, t. We call yt−1 the tth term of the
sequence. Notice that, in our notation, the first term is y0 and yt is actually the (t + 1)st
term of the sequence. (Be careful not to be confused by this, as some texts differ. It's
quite legitimate to, instead, denote the first term by y1.) For example, yt could
represent the price of a commodity t years from now, or the balance in a bank account t
years from now. Often, a sequence is defined explicitly by a formula. For instance, the
formula yt = t^2 generates the sequence
y0 = 0, y1 = 1, y2 = 4, y3 = 9, y4 = 16, . . .
(See Anthony and Biggs, Section 3.1.)


and the sequence 3, 5, 7, 9, . . . may be described by the formula
yt = 2t + 3 (t ≥ 0).

6.1.2 Arithmetic progressions
The arithmetic progression with first term a and common difference d has its terms
given by the formula yt = a + dt. For example, the arithmetic progression with first
term 5 and common difference 3 is 5, 8, 11, 14, . . .. Note that yt is obtained from yt−1 by
adding the common difference d. In symbols, yt = yt−1 + d.

6.1.3 Geometric progressions
Another very important type of sequence is the geometric progression. The geometric
progression with first term a and common ratio x is given by the formula yt = ax^t.
Notice that successive terms are related through the relationship yt = xyt−1. For
example, the geometric progression with first term 3 and common ratio 1/2 is given by
yt = 3(1/2)^t; that is, the sequence is 3, 3/2, 3/4, 3/8, . . ..

6.1.4 Compound interest
Perhaps the simplest occurrence of geometric progressions in economics is in the study
of compound interest (see Anthony and Biggs, Sections 4.3 and 7.3). Suppose that we
have a savings account for which the annual percentage interest rate is constant at 8%.
What this means is that if we have $P in the account at the beginning of a year then,
at the end of that year, the account balance is increased by 8% of $P. In other words,
the balance increases to $(P + 0.08P).
Generally, if the annual percentage rate of interest is R%, then the interest rate is
r = R/100 and in the course of one year, a balance of $P becomes $P + rP =
$(1 + r)P. One year after that, the balance in dollars becomes (1 + r)((1 + r)P), which
is (1 + r)^2 P. Continuing in this way, we can see that if P dollars are deposited in an
account where interest is paid annually at rate r, and if no money is taken from or
added to the account, then after t years we have a balance of P(1 + r)^t dollars. This
process is known as compounding (or compound interest), because interest is paid on
interest previously added to the account.
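Compound interest is easy to experiment with on a computer. Here is a minimal Python sketch (the function name compound_balance is ours, and the figures are purely illustrative):

    # Balance after t years of annual compounding at rate r: P*(1 + r)**t.
    def compound_balance(P, r, t):
        return P * (1 + r) ** t

    # For instance, $500 at 6% for three years:
    print(round(compound_balance(500, 0.06, 3), 2))   # about 595.51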
Activity 6.1 Suppose that $1, 000 is invested in an account that pays interest at a
fixed rate of 7%, paid annually. How much is there in the account after four years?

6.1.5 Frequent compounding
What happens if interest is added more frequently than once a year? Suppose, for
example, that instead of 8% interest paid at the end of the year, we have 4% interest
added twice-yearly, once at the middle of the year and once at the end. If $100 is
invested, the amount after one year will be
100(1 + 0.04)^2 = 108.16

dollars, which is slightly more than the $108 which results from the single annual
addition. If the interest is added quarterly (so that 2% is added four times a year), the
amount after one year will be
100(1 + 0.02)^4 = 108.24
dollars (approximately). In general, when the year is divided into m equal periods, the
rate is r/m over each period, and the balance after one year is
$$P\left(1 + \frac{r}{m}\right)^m,$$
where P is the initial deposit.
Taking m larger and larger, formally letting m tend to infinity, we find ourselves in
the situation of continuous compounding. Now, it is a standard fact (that we won't
verify here) that, as m gets larger and larger, tending to infinity,
$$\left(1 + \frac{r}{m}\right)^m$$
approaches e^r, where e is the base of the natural logarithm. (See the subject guide for
MT1174 Calculus.) Formally,
$$\lim_{m \to \infty}\left(1 + \frac{r}{m}\right)^m = e^r.$$
So the balance after one year should be Pe^r. If invested for a further year, we would
have Pe^r · e^r = P(e^r)^2 = Pe^{2r}. After t years of continuous compounding, the
balance of the account would be Pe^{rt}.
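The limit can be seen numerically. The following sketch tabulates (1 + r/m)^m for increasingly frequent compounding and compares it with e^r:

    # How (1 + r/m)**m approaches e**r as the compounding frequency m grows.
    import math

    r = 0.10
    for m in (1, 4, 12, 365):
        print(m, (1 + r / m) ** m)    # 1.1, then about 1.1038, 1.1047, 1.10516
    print('limit', math.exp(r))       # e**r is approximately 1.10517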

6.2 Series
Let us continue with the story of our investor. It is natural to investigate how the
balance varies if the investor adds a certain amount to the account each year. Suppose
that they add $P to the account at the beginning of each year, so that at the beginning
of the first year the balance is $P. At the beginning of the second year the balance in
dollars will be P(1 + r) + P; this represents the money from the first year with interest
added, and the new, further, deposit of $P. Convince yourself that, continuing in this
way, the balance at the beginning of year t is, in dollars,
P + P(1 + r) + · · · + P(1 + r)^{t−2} + P(1 + r)^{t−1}.
How can we calculate this expression? Note that it is the sum of the first t terms (that
is, term 0 to term t − 1) of the geometric progression with first term P and common
ratio 1 + r. Before coming back to this, we shall discuss such things in a more general
setting.
Given a sequence y0, y1, y2, y3, . . ., a finite series is a sum of the form
y0 + y1 + · · · + yt−1,
the first t terms added together, for some number t. There are two important results
about series, concerning the cases where the corresponding sequence is an arithmetic
progression (in which case the series is called an arithmetic series) and where it is a
geometric progression (in which case the series is called a geometric series).


6.2.1 Arithmetic series
The main result here is that if yt = a + dt describes an arithmetic progression and St is
the series
St = y0 + y1 + y2 + · · · + yt−1,
then
$$S_t = \frac{t\,(2a + (t-1)d)}{2}.$$
There is a useful way of remembering this result. Notice that St may be rewritten as
$$S_t = t\,\frac{(y_0 + y_{t-1})}{2} = t\,\frac{(a + (a + (t-1)d))}{2},$$
so that we have the following easily remembered result: an arithmetic series has value
equal to the number of terms, t, times the average of the first and last
terms, (y0 + yt−1)/2. Equivalently, the average value St/t of the t terms is the average,
(y0 + yt−1)/2, of the first and last terms.

Activity 6.2 Find the sum of the first n terms of an arithmetic series whose first
term is 1 and whose common difference is 5.

6.2.2 Geometric series
We now look at geometric series. It is easily checked (by multiplying out the expression)
that, for any x,
(1 − x)(1 + x + x^2 + · · · + x^{t−1}) = 1 − x^t.
So, if x ≠ 1 and yt = ax^t, then the geometric series
St = y0 + y1 + · · · + yt−1 = a + ax + ax^2 + · · · + ax^{t−1}
is therefore given by
$$S_t = \frac{a(1 - x^t)}{1 - x}.$$

Example 6.1 In our earlier discussion on savings accounts, we came across the
expression
P + P(1 + r) + · · · + P(1 + r)^{t−2} + P(1 + r)^{t−1}.
We now see that this is a geometric series with t terms, first term P and common
ratio 1 + r. Therefore it equals
$$P\,\frac{1 - (1+r)^t}{1 - (1+r)} = \frac{P}{r}\left((1+r)^t - 1\right).$$
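It is worth verifying the geometric series formula numerically at least once. A small sketch (geometric_sum is our own illustrative name; the values of a, x and t are arbitrary):

    # A numerical check of the geometric series formula.
    def geometric_sum(a, x, t):
        return sum(a * x**k for k in range(t))   # a + a*x + ... + a*x**(t-1)

    a, x, t = 100, 1.05, 10
    print(geometric_sum(a, x, t))                # term-by-term sum
    print(a * (1 - x**t) / (1 - x))              # the closed formula agrees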

Activity 6.3 Find an expression for
2 + 2(3) + 2(3^2) + 2(3^3) + · · · + 2(3^n).


6.3 Finding a formula for a sequence
Often we can use results on series to determine an exact formula for the members of a
sequence of numbers. The following example illustrates this. We shall see a different
method later (and both are equally valid).
Example 6.2 Suppose a sequence of numbers is constructed as follows. The first
number, y0, is 1, and each other number in the sequence is obtained from the
previous number by multiplying by 2 and adding 1 (so that yt = 2yt−1 + 1, for
t ≥ 1). What's the general expression for yt in terms of t?
We can see that
y1 = 2y0 + 1 = 2(1) + 1 = 2 + 1
y2 = 2y1 + 1 = 2(2 + 1) + 1 = 2^2 + 2 + 1
y3 = 2y2 + 1 = 2(2^2 + 2 + 1) + 1 = 2^3 + 2^2 + 2 + 1
y4 = 2y3 + 1 = 2(2^3 + 2^2 + 2 + 1) + 1 = 2^4 + 2^3 + 2^2 + 2 + 1.
In general, it would appear that
yt = 2^t + 2^{t−1} + · · · + 2^2 + 2 + 1.
But this is just a geometric series: perhaps this is clearer if we write it as
yt = 1 + 2 + 2^2 + · · · + 2^{t−1} + 2^t,
from which it is clear that this is the sum of the first t + 1 terms of the geometric
progression with first term 1 and common ratio 2. By the formula for the sum of a
geometric series, we have
$$y_t = \frac{1 - 2^{t+1}}{1 - 2} = 2^{t+1} - 1.$$
6.4 Limiting behaviour
When x is greater than 1, as t increases, x^t will eventually become greater than any
given number, and we say that x^t tends to infinity as t tends to infinity (see Anthony
and Biggs, Section 3.3). We write this in symbols as
x^t → ∞ as t → ∞, or lim_{t→∞} x^t = ∞.
On the other hand, when x < 1 and x > −1, we have
x^t → 0 as t → ∞, or lim_{t→∞} x^t = 0.
We notice that, while x^t gets closer and closer to 0 for all values of x in the range
−1 < x < 1, its behaviour depends to some extent on whether x is positive or negative.


When x is negative, the terms are alternately positive and negative, and we say that the
approach to zero is oscillatory. For example, when x = −0.2, the sequence x^t is
−0.2, 0.04, −0.008, 0.0016, −0.00032, 0.000064, −0.0000128, 0.00000256, . . .
When x is less than −1, the sequence is again oscillatory, but it does not approach any
limit, the terms being alternately large-positive and large-negative. In this case, we say
that x^t oscillates increasingly.
As an application of this, let us consider again the geometric series
St = a + ax + ax^2 + · · · + ax^{t−1}.
We have
$$S_t = \frac{a(1 - x^t)}{1 - x}.$$
If −1 < x < 1 then x^t → 0 as t → ∞. This means that St approaches the number
a(1 − 0)/(1 − x) = a/(1 − x) as t increases. In other words,
St → a/(1 − x) as t → ∞.
We call this limit the sum to infinity of the sequence given by yt = ax^t. Note that a
geometric sequence has a finite sum to infinity only if the common ratio is strictly
between −1 and 1.
Example 6.3 Consider the sequence with yi = 1/2^i for i ≥ 0. The sum of the first t
terms of this sequence is
St = 1 + 1/2 + 1/2^2 + · · · + 1/2^{t−1}.
By the formula for the sum of a geometric series,
$$S_t = 2\left(1 - \frac{1}{2^t}\right)$$
and we see that St → 2 as t → ∞.


Activity 6.4 Find an expression for
$$S_t = \frac{2}{3} + \left(\frac{2}{3}\right)^2 + \left(\frac{2}{3}\right)^3 + \cdots + \left(\frac{2}{3}\right)^t$$
and determine the limit of St as t tends to infinity.

6.5 Financial applications


A number of problems in financial mathematics can be solved just using what we know
about arithmetic and geometric series. Here is an example.


Example 6.4 John has opened a savings account with a bank, and they pay a
fixed interest rate of 5% per annum, with the interest paid once a year, at the end of
the year. He opened the savings account with a payment of $100 on 1 January 2003,
and will be making deposits of $200 yearly, on the same date. What will his savings
be after he has made N of these additional deposits? (Your answer will be an
expression involving N.)
If yN is the required amount, then we have
y1 = (1.05)100 + 200,
y2 = (1.05)y1 + 200 = 100(1.05)^2 + 200(1.05) + 200,
and, in general, we can spot the pattern and observe that
yN = 100(1.05)^N + 200(1.05)^{N−1} + 200(1.05)^{N−2} + · · · + 200(1.05) + 200
= 100(1.05)^N + 200(1 + (1.05) + (1.05)^2 + · · · + (1.05)^{N−2} + (1.05)^{N−1})
= 100(1.05)^N + 200 · (1 − (1.05)^N)/(1 − (1.05))
= 100(1.05)^N + 4000((1.05)^N − 1),
where we have used the formula for a geometric series.
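A direct simulation confirms the closed formula; this is a sketch, and N = 10 is an arbitrary choice.

    # Checking Example 6.4 by simulating the yearly deposits and interest.
    def savings(N):
        balance = 100.0
        for _ in range(N):
            balance = 1.05 * balance + 200     # a year's interest, then a deposit
        return balance

    N = 10
    formula = 100 * 1.05**N + 4000 * (1.05**N - 1)
    print(savings(N), formula)                  # the two values agree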

6.6 First-order difference equations
We have noted that if
y0, y1, y2, y3, . . .
is an arithmetic progression with common difference d then yt may be related to yt−1
through the equation yt = yt−1 + d. In fact, this equation, together with the value of y0,
tells us precisely what yt is. For example, suppose we have yt = yt−1 + 3 and y0 = 5.
Then we can calculate y1 directly as y1 = y0 + 3 = 5 + 3 = 8, and, similarly,
y2 = y1 + 3 = 8 + 3 = 11, and so on. In general, yt will be yt = 5 + 3t. The relationship
yt = yt−1 + d is an example of a (first-order) difference equation (or first-order recurrence
equation; see Anthony and Biggs, Section 3.2), and the value of y0 is known as an initial
condition. The difference equation yt = yt−1 + 3, with y0 = 5, has, as we have seen, the
solution yt = 5 + 3t. The question considered here is how to determine such explicit
solutions to difference equations.
The general first-order difference equation is of the form
yt = ayt−1 + b
where a and b are numbers. It is easy to see (and you should convince yourself of this)
that if a = 1 then the sequence of numbers yt is an arithmetic progression with common
difference b and first term y0. Therefore we shall discuss the solution to such a general
difference equation when a ≠ 1.

Important point: It should, of course, be understood that the difference equation
yt = ayt−1 + b (t ≥ 1)
is entirely equivalent to the difference equation
yt+1 = ayt + b (t ≥ 0).
They say precisely the same thing about the sequence.
If b = 0, then yt = ayt−1 and so y1 = ay0, y2 = ay1 = a^2 y0, and, in general, yt = a^t y0. In
this case, then, finding the solution is easy; the sequence of numbers yt is a geometric
progression with first term y0 and common ratio a. (In a specific application, y0 would
be given also.)
Difference equations in which a ≠ 1 and b ≠ 0 occur often in economics. Suppose, for
example, that yt represents the balance in a savings account at the end of t years and
that the interest rate is r. Suppose that, from this account, each year, the investor
withdraws an amount I. Then we have the difference equation
yt = (1 + r)yt−1 − I.
This is of the form yt = ayt−1 + b where a = 1 + r and b = −I.
Let's work out a few terms of a difference equation.

Example 6.5 Suppose that y0 = 1 and for t ≥ 1, yt = 2yt−1 + 1. The sequence of yt
is not an arithmetic or a geometric progression. (We do not get from one term to the
next simply by adding a fixed constant as we would were it an arithmetic
progression; nor by simply multiplying by a fixed constant as we would were it a
geometric progression. Instead, we both multiply by a fixed constant and then add a
fixed constant.) Working out successive terms, we have
y1 = 2y0 + 1 = 2(1) + 1 = 3
y2 = 2y1 + 1 = 2(3) + 1 = 7
y3 = 2y2 + 1 = 2(7) + 1 = 15,
and so on.
Now, suppose you wanted to know the value of y312. Do you really want to have to
carry out 312 calculations of the type we have just seen? Certainly not! Which is
why we want a formula for yt. What we mean by a formula is some expression for yt
involving only t (and not yt−1), and we call this formula the solution of the difference
equation.
For example, we know that the difference equation
yt = yt−1 + 2, y0 = 1
has solution yt = 1 + 2t (because we recognise it as an arithmetic progression). To
find any particular yt we simply substitute into the formula the desired value of t.
For instance, y312 = 1 + 2(312) = 625.


6.7 Solving first-order difference equations
We consider the equation yt = 2yt−1 + 1, with y0 = 1.
Suppose we want to find the general solution of the equation yt = ayt−1 + b (with
a ≠ 1). First, note that the constant
y* = b/(1 − a)
satisfies y* = ay* + b. This means that if y0 happened to be y*, then we'd have y1 = y*,
y2 = y* and, generally, yt = y* for all t. For this reason, the number y* is known as the
constant solution or time-independent solution. The general solution to the equation
yt = ayt−1 + b is
yt = y* + (y0 − y*)a^t,
where y* = b/(1 − a).
We state this as a theorem, just to make it clear how important it is.
Theorem 6.1 The general solution of the equation yt = ayt−1 + b (with a ≠ 1) is
yt = y* + (y0 − y*)a^t
where y* = b/(1 − a).
Example 6.6 Consider again yt = 2yt−1 + 1 with y0 = 1. We have a = 2, b = 1 and
y* = b/(1 − a) = 1/(1 − 2) = −1,
so the solution is
yt = y* + (y0 − y*)a^t = −1 + (1 − (−1))2^t = −1 + 2^{t+1},
exactly as we found above.
Example 6.7 We find the solution of the equation
yt = 5yt−1 + 6,
given that y0 = 5/2. If we take a = 5 and b = 6 in the standard form yt = ayt−1 + b,
we have exactly the equation given. The first thing to do is to find the constant
solution. By the formula, this is y* = b/(1 − a) = 6/(1 − 5) = −3/2. We can now
write down the general solution and insert the given value of y0:
yt = y* + (y0 − y*)a^t = −3/2 + 4(5^t).
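Theorem 6.1 is easily checked against direct iteration. The sketch below (solve_first_order is our own name; a ≠ 1 is assumed) runs Example 6.7 both ways:

    # First-order difference equations: iteration versus the closed formula.
    def solve_first_order(a, b, y0, t):
        y_star = b / (1 - a)                  # the time-independent solution
        return y_star + (y0 - y_star) * a**t

    a, b, y0 = 5, 6, 5 / 2                    # Example 6.7
    y = y0
    for t in range(1, 6):
        y = a * y + b                         # direct iteration of the equation
        print(t, y, solve_first_order(a, b, y0, t))   # the columns agree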


Activity 6.5 Suppose that yt = (2/3)yt−1 + 5 and that y0 = 2. Find yt.

6.8 Long-term behaviour
The behaviour of the general solution (or time path) for yt, yt = y* + (y0 − y*)a^t,
depends simply on the behaviour of a^t (see Anthony and Biggs, Section 3.3). For
example, if a^t → 0, then the formula tells us that yt → y*. We can tabulate the results
as follows. (For this table, we assume y0 ≠ y*, because if y0 = y* it is clear that the
solution is constant, and equal to y*.)

Value of a      Behaviour of a^t             Behaviour of yt
a > 1           a^t → ∞                      yt → ∞ or yt → −∞
0 ≤ a < 1       a^t → 0 (decreasing)         yt → y*
−1 < a < 0      a^t → 0 (oscillating)        yt → y*
a < −1          oscillates increasingly      oscillates increasingly

In the first of these cases (a > 1), whether yt → ∞ or yt → −∞ will, of course, depend
on the sign of y0 − y*.
Activity 6.6 The case a = −1 is not covered in the table just given. How does the
solution yt behave in this case?

6.9 The cobweb model
Consider an agricultural product for which there is a yearly crop. If there are no
disturbances, the equilibrium price p* and quantity q* will prevail. Suppose that one
year, however, for some external reason such as drought, there is a shortage, so that the
quantity falls and the price rises to p0. During the winter the farmers plan their
production for the next year on the basis of this higher price, and so an increased
quantity appears on the market in the next year: specifically q1 = qS(p0). Because the
quantity is greater, the price consumers pay falls, to the value p1 = pD(q1). (If you are
unfamiliar with supply and demand functions, please see the subject guide for MT1174
Calculus or Chapters 1 and 5 of Anthony and Biggs.) Overall, the effect of the
disturbance on the price is that it goes from p0, which is greater than p*, to p1, which is
less than p*. The process is repeated again in the following year: this time the lower
price p1 leads to a decrease in production q2 and that in turn means a higher price p2.
The next year a similar process takes place, and so on. When the sequences
p0, p1, p2, . . . and q1, q2, . . . are plotted on the supply and demand diagram, we get a
figure like a cobweb (see Anthony and Biggs, Chapter 5). This is the reason for the
name cobweb model.


Example 6.8 Generalising the argument above, we see that pt−1 determines qt,
which in turn determines pt, according to the rules
qt = qS(pt−1), pt = pD(qt),
where qS is the supply function and pD the inverse demand function. Suppose that
the demand and supply equations are, respectively, as follows:
q + p = 24, 2q + 18 = p.
Then the equilibrium quantity and price are q* = 2, p* = 22, and
qS(p) = 0.5p − 9, pD(q) = 24 − q.
The equations linking pt−1, qt and pt are thus
qt = 0.5pt−1 − 9, pt = 24 − qt.
Eliminating qt we obtain a first-order difference equation for pt:
pt = 33 − 0.5pt−1.
This is in the standard form yt = ayt−1 + b, with pt replacing yt and
a = −0.5, b = 33. The time-independent solution is b/(1 − a) = 33/(3/2) = 22, and
the explicit solution in terms of p0 is
pt = 22 + (p0 − 22)(−0.5)^t.
Note that the time-independent solution is the equilibrium price p* = 22, and that in
this case the sequence approaches p* in an oscillatory way. We say that we have a
stable cobweb. However, it is possible, for other supply and demand curves, that the
price oscillates about the equilibrium price with ever-increasing magnitude. In such
cases, the price does not approach p* and we say we have an unstable or exploding
cobweb.
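You can watch the stable cobweb converge by iterating the price equation of Example 6.8 (a sketch; the starting price of 28 is arbitrary):

    # Iterating the cobweb price equation p_t = 33 - 0.5*p_{t-1}.
    p = 28.0
    for t in range(1, 8):
        p = 33 - 0.5 * p
        print(t, p)       # 19.0, 23.5, 21.25, ...: oscillates and settles towards 22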

6.10 Financial applications of first-order difference equations
First-order difference equations are very useful in the mathematics of finance (see
Anthony and Biggs, Chapter 4).
We consider how capital accrues under compound interest. In particular, we consider
the situation in which there is a fixed annual interest rate r available to investors, and
interest is compounded annually. In this case, if we invest P then after t years we have
an amount P(1 + r)^t. This same result can be derived very simply via difference
equations. If we let yt be the capital at the end of the tth year, we have y0 = P and the
difference equation
yt = (1 + r)yt−1, (t = 1, 2, 3, . . .).

This is in the standard form, with a = (1 + r) and b = 0. The solution is fairly obvious
(since this is just a geometric progression).
It might seem unnecessary to use difference equations for such a simple investment
scenario, when it is very easy to determine by elementary means the amount of capital
after t years. However, suppose that we withdraw an amount I at the end of each year
for N years. Then what is the balance of the account after t years? This is less obvious,
but difference equations provide an easy means of determining the answer. As we noted
above, in this case, the difference equation is
yt = (1 + r)yt−1 − I, where y0 = P.
This is another case of the first-order linear difference equation, in standard form with
a = 1 + r and b = −I. The time-independent solution is therefore y* = I/r. The general
solution is yt = y* + (y0 − y*)a^t, and since y0 = P we obtain
$$y_t = \frac{I}{r} + \left(P - \frac{I}{r}\right)(1 + r)^t.$$

This formula enables us to answer a number of questions. First, we might want to know
how large the withdrawals I can be given an initial investment of P, if we want to be
able to withdraw I annually for N years. The condition that nothing is left after N
years is yN = 0. This is
$$\frac{I}{r} + \left(P - \frac{I}{r}\right)(1 + r)^N = 0,$$
and rearranging, we get
$$\frac{I}{r}\left((1 + r)^N - 1\right) = P(1 + r)^N,$$
so that
$$I(P) = \frac{r(1 + r)^N}{(1 + r)^N - 1}\,P.$$
An inverse question is: what principal P is required to provide an annual income I for
the next N years? Rearranging the equation gives the result
$$P(I) = \frac{I}{r}\left(1 - \frac{1}{(1 + r)^N}\right).$$
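Both formulas can be exercised numerically. The following sketch (max_withdrawal is our own name; the figures are illustrative) computes the sustainable withdrawal and then confirms by simulation that the account is exhausted after N years:

    # The largest constant withdrawal I that exhausts a principal P in N years.
    def max_withdrawal(P, r, N):
        return r * (1 + r)**N / ((1 + r)**N - 1) * P

    P, r, N = 100_000, 0.05, 20
    I = max_withdrawal(P, r, N)
    print(round(I, 2))              # roughly 8024.26

    y = P
    for _ in range(N):
        y = (1 + r) * y - I         # a year's interest, then the withdrawal
    print(round(y, 6))              # essentially 0: the account is exhausted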

6.11 Homogeneous second-order difference equations
An equation of the form
yt + a1yt−1 + a2yt−2 = 0,
in which a1 and a2 are constants, is a homogeneous linear second-order difference (or
recurrence) equation, with constant coefficients (see Anthony and Biggs, Chapter 23).
Solving these equations is a lot like solving second-order differential equations. (See the
subject guide for MT1174 Calculus.)
The general solution to the equation depends on whether the auxiliary equation
z^2 + a1z + a2 = 0
has two distinct solutions, or just one solution, or no (real) solutions. Thus, the form of
the general solution depends on the value of the discriminant, a1^2 − 4a2.
We consider each case in turn.
When the auxiliary equation has two distinct solutions, α and β, the general
solution is
yt = Aα^t + Bβ^t (A, B constants).
In any specific case, A and B are determined by the initial values y0 and y1, as in
Example 6.9.
When the auxiliary equation has just one solution, α, the general solution is
yt = Ctα^t + Dα^t = (Ct + D)α^t.
As in the previous case, the values of the constants C and D can be determined by
using the initial values y0 and y1.
The auxiliary equation has no solutions when the quantity a1^2 − 4a2 is negative. In
that case, 4a2 − a1^2 is positive, and hence so is a2. Thus there is a positive square
root of a2; that is, we can define r = √a2. In order to write down the general
solution in this case we define the angle θ by
cos θ = −a1/(2r) = −a1/(2√a2).
Then the general solution in this case is
yt = Er^t cos θt + F r^t sin θt,
where E and F are constants.
Example 6.9 We find the general solution of the difference equation
yt − 6yt−1 + 5yt−2 = 0.
The auxiliary equation is z^2 − 6z + 5 = 0, that is, (z − 5)(z − 1) = 0, with solutions 1
and 5. The general solution is therefore
yt = A(1^t) + B(5^t) = A + B5^t,
for arbitrary constants A and B.
Example 6.10 Let us find yt if
yt − 2yt−1 + 4yt−2 = 0,
and y0 = 1, y1 = 1 − √3. Here, the auxiliary equation, z^2 − 2z + 4 = 0, has no
solutions, so we are in the third case. In the notation used above, we have
r = √4 = 2. It follows that
cos θ = −(−2)/(2r) = 2/4 = 1/2,
so θ = π/3. The general solution is therefore
yt = 2^t (E cos(πt/3) + F sin(πt/3)).
Putting t = 0, and using the given initial condition y0 = 1, we have E = 1. Similarly,
y1 = 1 − √3 implies that
2(E cos(π/3) + F sin(π/3)) = 2(E · (1/2) + F · (√3/2)) = 1 + √3 F = 1 − √3.
Therefore F = −1 and the required solution is
yt = 2^t (cos(πt/3) − sin(πt/3)).
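As a check on Example 6.10, the recurrence can be iterated directly and compared with the closed form (a sketch):

    # Iterating y_t = 2*y_{t-1} - 4*y_{t-2} versus the closed-form solution.
    import math

    def closed_form(t):
        return 2**t * (math.cos(math.pi * t / 3) - math.sin(math.pi * t / 3))

    y_prev, y = 1, 1 - math.sqrt(3)          # y0 and y1
    for t in range(2, 8):
        y_prev, y = y, 2 * y - 4 * y_prev    # the recurrence, rearranged for y_t
        print(t, y, closed_form(t))          # the two columns agree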

6.12 Non-homogeneous second-order equations
We now consider how to solve an equation of the form
yt + a1yt−1 + a2yt−2 = f(t).
Suppose first that f(t) = k, where k is a constant. By analogy with the first-order case,
we start by looking for a constant solution yt = y* for all t (see Anthony and Biggs,
Section 23.4). For this we require
y* + a1y* + a2y* = k, or y* = k/(1 + a1 + a2).
We call y* a particular solution. Then,
General solution of the non-homogeneous equation =
Particular solution + General solution of the homogeneous equation.
This principle holds when f(t) is any function of t, where a particular solution is found
by substitution (rather as in solving second-order differential equations; see the
subject guide for MT1174 Calculus). For example, if f(t) were a^t, it would be
appropriate to try to find a particular solution of the form ca^t.
Activity 6.7 Suppose that yt − 5yt−1 − 14yt−2 = 18, and that y0 = 1, y1 = 8. Find
yt.


6.13 Behaviour of solutions
You should know how to discuss the behaviour of the solution, or time path (see, for
instance, Anthony and Biggs, Chapter 24). For example, suppose that the solution to a
second-order difference equation is
yt = 3^t − 2^t.
How does this behave as t → ∞? As t → ∞, we know that both 3^t and 2^t tend to
infinity, but what happens to their difference? It appears that 3^t grows much faster than
2^t and thus we might expect that yt → ∞. This is indeed the case, as can be seen by
writing
3^t − 2^t = 3^t (1 − (2/3)^t).
Since (2/3)^t tends to zero, it follows that 1 − (2/3)^t tends to 1. The other factor 3^t tends
to infinity, so the product tends to infinity.

6.14 Economic applications of second-order difference equations
Second-order difference equations occur quite naturally in macro-economic modelling.
We shall consider a closed national economy, with no government (for the sake of
simplicity). Three important quantities tell us something about the state of the
economy:
Investment, I
Income, Y
Consumption, C
Suppose we can measure each of the quantities in successive time periods of equal
length (for example, each year). Denote by It, Yt, Ct the values of the key quantities in
time-period t. Then we have a sequence of values I0, I1, I2, . . ., and similarly for the other
quantities. We shall assume that the equilibrium condition Yt = Ct + It holds for each t.
In the multiplier-accelerator model, we assume that the following equations link the key
quantities:
Ct = c + bYt−1, where c and b are positive constants
It = i + v(Yt−1 − Yt−2), where i and v are positive constants.
Using the equilibrium condition Yt = Ct + It, we can obtain a difference equation
involving only the Y's:
Yt = Ct + It = (c + bYt−1) + (i + v(Yt−1 − Yt−2)) = (c + i) + (b + v)Yt−1 − vYt−2.

In other words, we have the second-order difference equation
Yt − (b + v)Yt−1 + vYt−2 = c + i.
Example 6.11 Suppose
Ct = (3/8)Yt−1, It = 40 + (1/8)(Yt−1 − Yt−2),
and let's assume the equilibrium condition Yt = Ct + It holds. Let's suppose that
Y0 = 65 and Y1 = 64.5, and try to determine an expression for Yt.
Arguing as above, we have
Yt = Ct + It = (3/8)Yt−1 + 40 + (1/8)(Yt−1 − Yt−2) = 40 + (1/2)Yt−1 − (1/8)Yt−2,
so
Yt − (1/2)Yt−1 + (1/8)Yt−2 = 40.
The auxiliary equation is
z^2 − (1/2)z + 1/8 = 0,
which has discriminant (1/2)^2 − 4(1/8) = −1/4. This is negative, so there are no
solutions. We are therefore in the third case of a second-order difference equation. To
proceed, we use the method given above. We have r = √(1/8) = 1/(2√2), and
cos θ = −(−1/2)/(2r) = (1/2)·√2 = √2/2,
so θ = π/4. Thus, the general solution to the homogeneous equation in this case is
$$\left(\frac{1}{2\sqrt{2}}\right)^{t}\left(E\cos(\pi t/4) + F\sin(\pi t/4)\right).$$
We need a particular solution of
Yt − (1/2)Yt−1 + (1/8)Yt−2 = 40.
Trying Yt = k, a constant, we see that k − (1/2)k + (1/8)k = 40, so k = 64. It follows
that for some constants E and F,
$$Y_t = 64 + \left(\frac{1}{2\sqrt{2}}\right)^{t}\left(E\cos(\pi t/4) + F\sin(\pi t/4)\right).$$
To find E and F we use the initial conditions, Y0 = 65 and Y1 = 64.5. Now,
Y0 = 64 + E cos(0) + F sin(0) = 64 + E,
so E = 1. Also,
$$Y_1 = 64 + \frac{1}{2\sqrt{2}}\left(E\cos(\pi/4) + F\sin(\pi/4)\right) = 64 + \frac{1}{2\sqrt{2}}\left(\frac{E}{\sqrt{2}} + \frac{F}{\sqrt{2}}\right) = 64 + \frac{E}{4} + \frac{F}{4},$$
and since this is 64.5, we have E + F = 2 and hence F = 1. The final answer is
therefore
$$Y_t = 64 + \left(\frac{1}{2\sqrt{2}}\right)^{t}\left(\cos\frac{\pi t}{4} + \sin\frac{\pi t}{4}\right).$$

Learning outcomes

At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by arithmetic and geometric progressions, and calculate the
sum of finite arithmetic and geometric series
explain compound interest and calculate balances under compound interest
apply sequences and series in management and finance
analyse the long-term behaviour of series and sequences
solve problems involving first-order difference equations
solve second-order difference equations
analyse the behaviour of solutions to difference equations
solve problems involving the application of difference equations.

Exercises
Exercise 6.1
A geometric progression has a sum to infinity of 3 and has second term y1 equal to 2/3.
Show that there are two possible values of the common ratio x and find the
corresponding values of the first term a.
Exercise 6.2
Suppose we have an initial amount, A_0, to invest and we add an additional investment F at the end of each subsequent year. All investments earn interest of i% per annum, paid at the end of each year.

(a) Use the formula for the sum of a geometric series to derive a formula for the value of the investment, A_n, after n years.

(b) An investor puts $10,000 into an investment account that yields interest of 10% per annum. The investor adds an additional $5,000 at the end of each year. How much will there be in the account at the end of five years? Show that if the investor has to wait N years until the balance is at least $80,000, then

N ≥ ln(13/6) / ln(1.1).

Exercise 6.3
An amount of $1,000 is invested and attracts interest at a rate equivalent to 10% per
annum. Find expressions for the total after one year if the interest is compounded:
(a) annually
(b) quarterly
(c) monthly
(d) daily. (Assume the year is not a leap year.)

What would be the total after one year if the interest is 10% compounded continuously?
Exercise 6.4
Suppose y_i = 1/2^{2i}. Find the limit, as t → ∞, of

S_t = y_0 + y_1 + · · · + y_{t-1}.
Exercise 6.5
Planners believe that, as a result of a recent government grant scheme, the number of
new high technology businesses starting up each year will be N. There are already 3,000
such businesses in existence in the country, but it is expected that each year 5% of all
those in existence at the beginning of the year will fail (shut down). Let yt denote the
number of businesses at the end of year t. Explain why
y_t = 0.95y_{t-1} + N.
Solve this difference equation for general N. Find a condition on N which will ensure
that the number of businesses will increase from year to year.
Exercise 6.6
The supply and demand functions for a good are

q^S(p) = 0.05p - 4,    q^D(p) = 20 - 0.15p.

Find the equilibrium price. What is the inverse demand function p^D(q)? Suppose that the sequence of prices p_t is determined by p_t = p^D(q^S(p_{t-1})) (as in the cobweb model). Find an expression for p_t.


Exercise 6.7
A market for a commodity is modelled by taking the demand and supply functions as follows:

D(p) = 1 - p,    S(p) = p,

so that when the price p prevails the amount of commodity demanded by the market is D(p) and the amount which producers will supply is S(p). Price adjusts over time t in response to the excess of the demand over the supply according to the equation:

p_{t+1} - p_t = a(D(p_t) - S(p_t)),

where a is a positive constant. Initially the price is p_0 = 3/4. Solve this equation and show that over time the price adjusts towards the clearing value (i.e. the price at which supply and demand are equal) if and only if

0 < a < 1.

Under what circumstances does the price tend towards the equilibrium price in an oscillatory fashion? What happens to the price if a = 1/2?

Exercise 6.8
Find the general solution of the difference equation
y_t - y_{t-1} - 6y_{t-2} = 0.
Exercise 6.9
(a) Suppose that consumption this year is the average of this year's income and last year's consumption; that is,

C_t = (1/2)(Y_t + C_{t-1}).

Suppose also that the relationship between next year's income and current investment is Y_{t+1} = kI_t, for some positive constant k. Show that, if the equilibrium condition Y_t = C_t + I_t holds, then

Y_t - ((k+1)/2) Y_{t-1} + (k/2) Y_{t-2} = 0.

(b) In the model set up in part (a), suppose that k = 3 and that the initial value Y_0 is positive. Show that Y_t oscillates with increasing magnitude.

(c) Find the values of k for which the model set up in part (a) leads to an oscillating Y_t, and determine whether or not the oscillations increase in magnitude. (Remember we are given that k > 0.)


Comments on selected activities


Feedback to activity 6.1
The required amount is 1000(1 + 0.07)^4 = 1310.80 dollars.
Feedback to activity 6.2
We have

S_n = (n/2)(2(1) + (n - 1)5) = (n/2)(5n - 3) = (5/2)n^2 - (3/2)n.
Feedback to activity 6.3
Noting that there are n + 1 terms in the sum (not n: be careful!), and that it is the sum of a geometric progression with first term 2 and common ratio 3, the expression is

2(1 - 3^{n+1}) / (1 - 3) = 3^{n+1} - 1.

Feedback to activity 6.4


S_t is the sum of the first t terms of a geometric progression with first term 2/3 and common ratio 2/3, so

S_t = (2/3)(1 - (2/3)^t) / (1 - (2/3)) = 2(1 - (2/3)^t).

As t → ∞, (2/3)^t → 0 and so S_t → 2.
Feedback to activity 6.5
For y_t = (2/3)y_{t-1} + 5, we have a = 2/3 and b = 5. Then,

y* = b/(1 - a) = 5/(1 - (2/3)) = 15,

and the solution is

y_t = y* + (y_0 - y*)a^t = 15 + (2 - 15)(2/3)^t = 15 - 13(2/3)^t.

Feedback to activity 6.6
When a = -1, we have

y_t = y* + (y_0 - y*)(-1)^t

and y_t alternately takes two values: it flips between the value y* + (y_0 - y*) = y_0 (when t is even) and the value y* - (y_0 - y*) = 2y* - y_0 (when t is odd).
Feedback to activity 6.7
The auxiliary equation is

z^2 - 5z - 14 = (z + 2)(z - 7) = 0,

with solutions -2 and 7. The homogeneous equation

y_t - 5y_{t-1} - 14y_{t-2} = 0

therefore has general solution y_t = A(-2)^t + B(7^t). A particular solution of the non-homogeneous equation is the constant solution y* = -18/(1 - 5 - 14) = 1, so this equation has general solution

y_t = 1 + A(-2)^t + B(7^t).

To find the values of A and B we use the given values of y_0 and y_1. Since y_0 = 1, we must have 1 + A + B = 1, and since y_1 = 10, 1 - 2A + 7B = 10. Solving these, we obtain A = -1 and B = 1, and therefore

y_t = 1 - (-2)^t + 7^t.

Comments on exercises
Solution to exercise 6.1
We know that the sum to infinity is given by the formula a/(1 - x) and that y_1 = ax. Therefore, the given information is

a/(1 - x) = 3,    ax = 2/3.

From the first equation, a = 3(1 - x) and the second equation then gives 3(1 - x)x = 2/3, from which we obtain the quadratic equation 9x^2 - 9x + 2 = 0. This has the two solutions x = 2/3 and x = 1/3. The corresponding values of the first term a (given by a = 3(1 - x)) are 1 and 2, respectively. So, as suggested by the question, there are two geometric progressions that have the required sum to infinity and second term.
Solution to exercise 6.2
(a) After 1 year, at the beginning of the second, the amount A_1 in the account is A_0(1 + i/100) + F, because the initial amount A_0 has attracted interest at rate i/100 and F has been added. Similar considerations show that

A_2 = (1 + i/100)A_1 + F
    = (1 + i/100)^2 A_0 + (1 + i/100)F + F,

A_3 = (1 + i/100)A_2 + F
    = (1 + i/100)^3 A_0 + (1 + i/100)^2 F + (1 + i/100)F + F.

In general, if we continued, we could see that

A_n = A_0 (1 + i/100)^n + F(1 + i/100)^{n-1} + F(1 + i/100)^{n-2} + · · · + F(1 + i/100) + F.

Now,

F(1 + i/100)^{n-1} + · · · + F(1 + i/100) + F = F(1 - (1 + i/100)^n) / (1 - (1 + i/100))
                                             = (100F/i)((1 + i/100)^n - 1),

where we have used the formula for the sum of a geometric progression. Therefore

A_n = A_0 (1 + i/100)^n + (100F/i)((1 + i/100)^n - 1).

For (b), we use the formula just obtained, with A_0 = 10000, i = 10, F = 5000 and n = 5, and we see that

A_5 = 10000 (1.1)^5 + 50000 ((1.1)^5 - 1) = 46630.60.

Now, for the balance to be at least $80,000 after N years, we need A_N ≥ 80000, which means

10000 (1.1)^N + 50000 ((1.1)^N - 1) ≥ 80000.

This is equivalent, after a little manipulation, to

60000 (1.1)^N ≥ 130000,

or (1.1)^N ≥ 13/6. To solve this, we can take logarithms and see that we need

N ln(1.1) ≥ ln(13/6),

so

N ≥ ln(13/6) / ln(1.1),

as required.
Solution to exercise 6.3
We use the fact that if the interest is paid in m equally spaced instalments, then the total after one year is 1000(1 + r/m)^m, where r = 0.1 and m = 1, 4, 12, 365 in the four cases. Therefore the answers to the first four parts of the problem are as follows:

(a) 1000(1 + 0.1) = 1100.

(b) 1000(1 + 0.1/4)^4 = 1000(1.025)^4.

(c) 1000(1 + 0.1/12)^{12}.

(d) 1000(1 + 0.1/365)^{365}.

For the last part, we use the fact that under continuous compounding at rate r, an amount P grows to P e^r after one year, so the answer here is 1000 e^{0.1}.
Solution to exercise 6.4
Note that 1/22i = 1/4i = (1/4)i , so this is a geometric series where the common ratio is
1/4. The first term is 1, and there are t terms, so
4
1 (1/4)t
St =
=
1 (1/4)
3

1
4

As t , (1/4)t 0 and so St 4/3.


Solution to exercise 6.5
Since 5% of the y_{t-1} businesses in operation at the start of year t fail during that year, it follows that 95% of these survive. Additionally, N new businesses are created, so

y_t = 0.95y_{t-1} + N.

This is a first-order difference equation with, in the standard notation, a = 0.95 and b = N. Also, from the given information, y_0 = 3000. The time-independent solution is

y* = b/(1 - a) = N/(1 - 0.95) = N/0.05 = 20N

and the solution is

y_t = y* + (y_0 - y*)(0.95)^t = 20N + (3000 - 20N)(0.95)^t.

There are several ways to solve the last part of the question. Perhaps the easiest way is to notice that since (0.95)^t decreases with t, y_t will increase with t if and only if the number 3000 - 20N multiplying (0.95)^t is negative. So we need 3000 - 20N < 0, or N > 150.
Solution to exercise 6.6
The equilibrium price is given by 0.05p - 4 = 20 - 0.15p, so p = 120. The inverse demand function is obtained by solving the equation q = 20 - 0.15p for p, so

p^D(q) = p = 400/3 - (20/3)q.

Now,

p_t = p^D(q^S(p_{t-1})) = p^D(0.05p_{t-1} - 4) = 400/3 - (20/3)(0.05p_{t-1} - 4) = 160 - (1/3)p_{t-1}.

This has time-independent solution

p* = 160/(1 - (-1/3)) = 120,

which is the equilibrium price. The solution for p_t is

p_t = 120 + (p_0 - 120)(-1/3)^t.
Solution to exercise 6.7


We have

p_{t+1} - p_t = a((1 - p_t) - p_t) = a - 2ap_t,

so p_{t+1} = (1 - 2a)p_t + a, which is entirely equivalent to the equation

p_t = (1 - 2a)p_{t-1} + a.

Now, the time-independent solution is

p* = a/(1 - (1 - 2a)) = 1/2,

and so

p_t = p* + (p_0 - p*)(1 - 2a)^t = 1/2 + (3/4 - 1/2)(1 - 2a)^t = 1/2 + (1/4)(1 - 2a)^t.

The equilibrium price is given by 1 - p = p, and so is 1/2. From our expression for p_t, we see that p_t → 1/2 as t → ∞ if and only if (1 - 2a)^t → 0. For this to be true, we need -1 < 1 - 2a < 1, which is equivalent to 0 < a < 1. The price will oscillate towards 1/2 when, additionally, 1 - 2a is negative. So this happens when 1/2 < a < 1. When a = 1/2, 1 - 2a = 0 and the price p_t equals 1/2 for all t ≥ 1.
Solution to exercise 6.8
The auxiliary equation is

z^2 - z - 6 = (z - 3)(z + 2) = 0,

so for some constants A and B,

y_t = A(3^t) + B(-2)^t.


Solution to exercise 6.9


(a) There are a number of ways of deriving the difference equation. We note first that substituting I_t = Y_{t+1}/k in the equation Y_t = C_t + I_t gives

Y_t = C_t + (1/k)Y_{t+1}.

Thus,

C_t = Y_t - Y_{t+1}/k,

and so (replacing t by t - 1),

C_{t-1} = Y_{t-1} - Y_t/k.

Substituting in the equation C_t = (Y_t + C_{t-1})/2, we get

Y_t - Y_{t+1}/k = (1/2)(Y_t + Y_{t-1} - Y_t/k).

Rearranging and replacing t by t - 1, we obtain the second-order difference equation

Y_t - ((k+1)/2) Y_{t-1} + (k/2) Y_{t-2} = 0.

(b) When k = 3, the difference equation is

Y_t - 2Y_{t-1} + (3/2)Y_{t-2} = 0.

The auxiliary equation z^2 - 2z + (3/2) = 0 has no solutions, so the solution for Y_t is

Y_t = r^t (E cos θt + F sin θt),

where r = √(3/2) and

cos θ = 2/(2√(3/2)) = √(2/3).

Since E = Y_0, and Y_0 is positive, we have E > 0. Also r > 1, so Y_t oscillates with increasing magnitude.

(c) In general, the auxiliary equation for the difference equation is

z^2 - ((k+1)/2) z + k/2 = 0.

This has no solutions if

((k+1)/2)^2 < 4(k/2),

that is,

(k + 1)^2 < 8k.

In this case the general solution is of the form

Y_t = (√(k/2))^t (E cos θt + F sin θt).

This solution is oscillatory.

Suppose that (k + 1)^2 > 8k. Then the auxiliary equation has the roots

α, β = ((k+1)/2 ± √((k+1)^2/4 - 2k)) / 2,

and both of these are positive. Then the solution is of the form

Y_t = Aα^t + Bβ^t,

and since α and β are positive, in this case there can be no oscillatory behaviour. The same holds true when (k + 1)^2 = 8k.

We have shown that oscillations occur when (k + 1)^2 < 8k, in other words when k lies strictly between the roots of the equation (k + 1)^2 = 8k. Rewriting this as the quadratic equation k^2 - 6k + 1 = 0, we find that the roots are

3 - 2√2   and   3 + 2√2.

So the model predicts that, when k is between these two numbers, the national income Y_t will oscillate. (In economics language, it will exhibit business cycles.)

Whether the oscillations increase or decrease in magnitude depends on k. Since the solution involves the factor (√(k/2))^t, the oscillations decrease if √(k/2) < 1, that is, if k < 2, and increase if k > 2.

Chapter 7
Vector spaces
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 5.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 4, Sections 4.1-4.2.

Aims of the chapter


In this chapter we study the important theoretical concept of a vector space. This, and
the related concepts to be explored in the subsequent chapters, will help us understand
much more deeply and comprehensively what we've already learned about matrices and
linear equations. There is, necessarily, a bit of a step upwards in the level of
abstraction, but it is worth the effort in order to help our fundamental understanding.

7.1 Vector spaces


7.1.1 Definition of a vector space

We know that vectors of Rn can be added together and that they can be scaled by real
numbers. That is, for every x, y ∈ Rn and every α ∈ R, it makes sense to talk about x + y and αx. Furthermore, these operations of addition and multiplication by a scalar (that is, multiplication by a real number) behave and interact sensibly, in that, for example,

α(x + y) = αx + αy,    α(βx) = (αβ)x,    x + y = y + x,
and so on.
But it is not only vectors in Rn that can be added and multiplied by scalars. There are
other sets of objects for which this is possible. Consider the set V of all functions from
R to R. Then any two of these functions can be added: given f, g V we simply define
the function f + g by
(f + g)(x) = f (x) + g(x).


Also, for any α ∈ R, the function αf is given by

(αf)(x) = αf(x).
These operations of addition and scalar multiplication are sometimes said to be
pointwise addition and pointwise scalar multiplication. This might seem a bit abstract,
but think about what the functions x + x^2 and 2x represent: the former is the function x plus the function x^2, and the latter is the function x multiplied by the scalar 2. So this
is just a different way of looking at something you are already familiar with. It turns
out that V and its rules for addition and multiplication by a scalar satisfy the same key
properties as the set of vectors in Rn with its addition and scalar multiplication. We
refer to a set with an addition and scalar multiplication which behave appropriately as
a vector space. We now give the formal definition of a vector space.
Definition 7.1 (Vector space)   A (real) vector space V is a non-empty set equipped with an addition operation and a scalar multiplication operation such that for all α, β ∈ R and all u, v, w ∈ V:

1. u + v ∈ V   (closure under addition)
2. u + v = v + u   (the commutative law for addition)
3. u + (v + w) = (u + v) + w   (the associative law for addition)
4. there is a single member 0 of V, called the zero vector, such that for all v ∈ V, v + 0 = v
5. for every v ∈ V there is an element w ∈ V (usually written as -v), called the negative of v, such that v + w = 0
6. αv ∈ V   (closure under scalar multiplication)
7. α(u + v) = αu + αv   (distributive law)
8. (α + β)v = αv + βv   (distributive law)
9. α(βv) = (αβ)v   (associative law for scalar multiplication)
10. 1v = v.
Other properties follow from those listed in the definition. For instance, we can see that 0x = 0 for all x, as follows:

0x = (0 + 0)x = 0x + 0x,

so, adding the negative -0x of 0x to each side,

0 = 0x + (-0x) = (0x + 0x) + (-0x) = 0x + (0x + (-0x)) = 0x + 0 = 0x.

(A bit sneaky, but just remember the result: 0x = 0.)


Activity 7.1   Prove that (-1)x = -x, the negative of the vector x, using a similar argument with 0 = 1 + (-1).
(Note that this definition says nothing at all about multiplying together two vectors:
the only operations with which the definition is concerned are addition and scalar
multiplication.)
A vector space as we have defined it is called a real vector space, to emphasise that the
scalars α, β and so on are real numbers rather than (say) complex numbers. There is a
notion of complex vector space, where the scalars are complex numbers, which we shall
not cover. In this guide all scalars will be real numbers.

7.1.2 Examples

Example 7.1 The set Rn is a vector space with the usual way of adding and scalar
multiplying vectors.
Example 7.2 The set V = {0} consisting only of the zero vector is a vector space,
with addition defined by 0 + 0 = 0, and scalar multiplication defined by α0 = 0 for all α ∈ R.
Example 7.3 The set V of functions from R to R with pointwise addition and
scalar multiplication (described earlier in this section) is a vector space. Note that
the zero vector in this space is the function that maps every real number to 0, that is, the identically-zero function.
Activity 7.2   Show that all 10 properties of a vector space are satisfied. In particular, if the function f is a vector in this space, what is the vector -f?
Example 7.4   The set of m × n matrices with real entries is a vector space, with the usual addition and scalar multiplication of matrices. The zero vector in this vector space is the zero m × n matrix, which has all entries equal to 0.
Example 7.5   Let V be the set of all vectors in R3 with third entry equal to 0, that is,

V = {(x, y, 0)^T : x, y ∈ R}.

Then V is a vector space with the usual addition and scalar multiplication. To verify this, we need only check that V is closed under addition and scalar multiplication. The associative, commutative and distributive laws (properties 2, 3, 7, 8, 9, 10) will hold for vectors in V because they hold for all vectors in R3 (and all linear combinations of vectors in V are in V). Furthermore, if we can show that V is closed under scalar multiplication, then for any particular v ∈ V, 0v = 0 ∈ V and (-1)v = -v ∈ V. So we simply need to check that V ≠ ∅ (V is non-empty), that if


u, v ∈ V then u + v ∈ V, and if α ∈ R and v ∈ V then αv ∈ V. Each of these is easy to check.
Activity 7.3   Verify that V ≠ ∅, and that for u, v ∈ V and α ∈ R, u + v ∈ V and αv ∈ V.

7.2 Subspaces
The last example above is informative. Arguing as we did there, if V is a vector space and W ⊆ V is non-empty and closed under scalar multiplication and addition, then W too is a vector space (and we do not need to verify that all the other properties hold). The formal definition of a subspace is as follows.

Definition 7.2 (Subspace)   A subspace W of a vector space V is a non-empty subset of V that is itself a vector space (under the same operations of addition and scalar multiplication as V).

The discussion given justifies the following important result.

Theorem 7.1   Suppose V is a vector space. Then a non-empty subset W of V is a subspace if and only if:

for all u, v ∈ W, u + v ∈ W (W is closed under addition), and
for all v ∈ W and α ∈ R, αv ∈ W (W is closed under scalar multiplication).
Example 7.6   In R2, the lines y = 2x and y = 2x + 1 can be defined as the sets of vectors

S = {(x, y)^T : y = 2x, x ∈ R},    U = {(x, y)^T : y = 2x + 1, x ∈ R}.

Each vector in one of the sets is the position vector of a point on that line. We will show that the set S is a subspace of R2, and that the set U is not a subspace of R2.

If v = (1, 2)^T and p = (0, 1)^T, these sets can equally well be expressed as

S = {x : x = tv, t ∈ R},    U = {x : x = p + tv, t ∈ R}.

Activity 7.4 Show that the two descriptions of S describe the same set of vectors.
Example 7.6 (continued)
To show S is a subspace, we need to show that it is non-empty, and we need to show that it is closed under addition and closed under scalar multiplication using any vectors in S and any scalar in R. We'll use the second set of definitions, so our line is the set of vectors

S = {x : x = tv, t ∈ R},    v = (1, 2)^T.

The set S is non-empty, since 0 = 0v ∈ S.

Let u, w be any vectors in S and let α ∈ R. Then

u = s(1, 2)^T,    w = t(1, 2)^T

for some s, t ∈ R.

Closure under addition:

u + w = s(1, 2)^T + t(1, 2)^T = (s + t)(1, 2)^T ∈ S   (since s + t ∈ R).

Closure under scalar multiplication:

αu = α(s(1, 2)^T) = (αs)(1, 2)^T ∈ S   (since αs ∈ R).

This shows that S is a subspace of R2.


To show U is not a subspace, any one of the three following statements (counterexamples) will suffice.

1. 0 ∉ U.

2. U is not closed under addition:

(0, 1)^T ∈ U and (1, 3)^T ∈ U, but (0, 1)^T + (1, 3)^T = (1, 4)^T ∉ U, since 4 ≠ 2(1) + 1.

3. U is not closed under scalar multiplication:

(0, 1)^T ∈ U and 2 ∈ R, but 2(0, 1)^T = (0, 2)^T ∉ U.

Activity 7.5   Show that 0 ∉ U. Explain why this suffices to show that U is not a subspace.
Example 7.6 (continued)
The line y = 2x + 1 is an example of an affine set, a translation of a subspace.
It is useful to visualise what is happening here by looking at the graphs of the lines
y = 2x and y = 2x + 1. Sketch y = 2x and sketch the position vector of any point on
the line. You will find that the vector lies along the line, so any scalar multiple of
that position vector will also lie along the line, as will the sum of any two such
position vectors. These position vectors are all still in the set S. Now sketch the line
y = 2x + 1. First notice that it does not contain the origin. Now sketch the position


vector of any point on the line. You will find that the position vector does not lie
along the line, but goes from the origin up to the point on the line. If you scalar
multiply this vector by any constant α ≠ 1, it will be the position vector of a point
which is not on the line, so the resulting vector will not be in U. The same is true if
you add together the position vectors of two points on the line. So U is not a
subspace.
Activity 7.6   Let v be any non-zero vector in a vector space V. Show that the set

S = {αv : α ∈ R}

is a subspace of V. The set S defines a line through the origin in V.
If V is a vector space, the sets V and {0} are subspaces of V. The set {0} is not empty: it contains one vector, namely the zero vector. It is a subspace because 0 + 0 = 0 and α0 = 0 for any α ∈ R.

Given any subset S of a vector space V , how do you decide if it is a subspace? First
check that 0 ∈ S. Then using some vectors in the subset, see if adding them and scalar
multiplying them will give you another vector in S. To prove that S is a subspace, you
will need to verify that it is closed under addition and closed under scalar multiplication
for any vectors in S, so you will need to use letters to represent general vectors, or
components of general vectors, in the set. That is, using letters show that the sum u + v
and the scalar product αu of vectors in S also satisfy the definition of a vector in S.

To prove a set S is not a subspace you only need to find one counterexample, one or two
particular vectors (use numbers) for which the sum or the scalar product does not
satisfy the definition of S. Note that if 0 is not in the set, it cannot be a subspace.
Activity 7.7   Write down a general vector (using letters) and a particular vector (using numbers) for each of the following subsets. Show that one of the sets is a subspace of R3 and the other is not.

S1 = {(x, x^2, 0)^T : x ∈ R},    S2 = {(x, 2x, 0)^T : x ∈ R}.

7.2.1 An alternative characterisation of a subspace

We have seen that a subspace is a non-empty subset W of a vector space that is closed under addition and scalar multiplication, meaning that if u, v ∈ W and α ∈ R, then both u + v and αv are in W. Now, it is fairly easy to see that the following equivalent property characterises when W will be a subspace:

Theorem 7.2   A non-empty subset W of a vector space is a subspace if and only if for all u, v ∈ W and all α, β ∈ R, we have αu + βv ∈ W.

That is, W is a subspace if it is non-empty and closed under linear combination.


7.3 Subspaces connected with matrices

7.3.1 Null space

Suppose that A is an m × n matrix. Then the null space N(A), the set of solutions to the homogeneous linear system Ax = 0, is a subspace of Rn.

Theorem 7.3   For any m × n matrix A, N(A) is a subspace of Rn.
Proof
To prove this we have to verify that N(A) ≠ ∅, and that if u, v ∈ N(A) and α ∈ R, then u + v ∈ N(A) and αu ∈ N(A). Since A0 = 0, 0 ∈ N(A) and hence N(A) ≠ ∅. Suppose u, v ∈ N(A). Then to show u + v ∈ N(A) and αu ∈ N(A), we must show that u + v and αu are solutions of Ax = 0. We have

A(u + v) = Au + Av = 0 + 0 = 0

and

A(αu) = α(Au) = α0 = 0,

so we have shown what we needed.
The null space is the set of solutions to the homogeneous linear system. If we instead consider the set of solutions S to a general system Ax = b, S is not a subspace of Rn if b ≠ 0 (that is, if the system is not homogeneous). This is because 0 does not belong to S. However, as we saw in Chapter 3 (Theorem 3.2), there is a relationship between S and N(A): if x_0 is any solution of Ax = b then S = {x_0 + z : z ∈ N(A)}, which we may write as x_0 + N(A). S is an affine set, a translation of the subspace N(A).

Generally, if W is a subspace of a vector space V and x ∈ V then the set x + W defined by

x + W = {x + w : w ∈ W}

is called an affine subset of V. An affine subset is not generally a subspace (although every subspace is an affine subset, as we can see by taking x = 0).

7.3.2 Range

Recall that the range of an m × n matrix A is

R(A) = {Ax : x ∈ Rn}.

Theorem 7.4   For any m × n matrix A, R(A) is a subspace of Rm.

Proof
The set R(A) is non-empty as A0 = 0 ∈ R(A). We need to show that if u, v ∈ R(A) then u + v ∈ R(A) and, for any α ∈ R, αv ∈ R(A). So suppose u, v ∈ R(A). Then for some y_1, y_2 ∈ Rn, u = Ay_1, v = Ay_2. We need to show that u + v = Ay for some y. Well,

u + v = Ay_1 + Ay_2 = A(y_1 + y_2),

so we may take y = y_1 + y_2 to see that, indeed, u + v ∈ R(A). Next,

αv = α(Ay_2) = A(αy_2),

so αv = Ay for some y (namely y = αy_2) and hence αv ∈ R(A).

7.4 Linear span

Recall that by a linear combination of vectors v_1, v_2, . . . , v_k we mean a vector of the form

v = α_1 v_1 + α_2 v_2 + · · · + α_k v_k.

If we add together two vectors of this form, we get another linear combination of the vectors v_1, v_2, . . . , v_k. The same is true of any scalar multiple of v.

Activity 7.8   Show this; show that if v = α_1 v_1 + α_2 v_2 + · · · + α_k v_k and w = β_1 v_1 + β_2 v_2 + · · · + β_k v_k, then v + w and sv, s ∈ R, are also linear combinations of the vectors v_1, v_2, . . . , v_k.

The set of all linear combinations of a given set of vectors of a vector space V forms a subspace, and we give it a special name.

Definition 7.3 (Linear span)   Suppose that V is a vector space and that v_1, v_2, . . . , v_k ∈ V. The linear span of X = {v_1, . . . , v_k} is the set of all linear combinations of the vectors v_1, . . . , v_k, denoted by Lin{v_1, v_2, . . . , v_k} or Lin(X). That is,

Lin{v_1, v_2, . . . , v_k} = {α_1 v_1 + · · · + α_k v_k : α_1, α_2, . . . , α_k ∈ R}.

Theorem 7.5   If X = {v_1, . . . , v_k} is a set of vectors of a vector space V, then Lin(X) is a subspace of V. It is the smallest subspace containing the vectors v_1, v_2, . . . , v_k.

Proof
The set Lin(X) is non-empty, since 0 = 0v_1 + · · · + 0v_k ∈ Lin(X). If you have carefully carried out the activity above, then you have shown that Lin(X) is closed under addition and scalar multiplication. Therefore, it is a subspace of V. Furthermore, any vector space which contains the vectors v_1, v_2, . . . , v_k must also contain all linear combinations of these vectors, so it must contain Lin(X). That is, Lin(X) is the smallest subspace of V containing v_1, v_2, . . . , v_k.

The subspace Lin(X) is also known as the subspace spanned by the set X = {v_1, . . . , v_k}, or, simply, as the span of {v_1, v_2, . . . , v_k}.

Different texts may use different notations for the linear span of a set of vectors. Notation is important, but it is nothing to get anxious about: just always make it clear what you mean by your notation: use words as well as symbols!


7.4.1 Lines and planes in R3

What is the set Lin{v} of a single non-zero vector v ∈ Rn? We have already seen that this defines a line through the origin, as we have

Lin{v} = {αv : α ∈ R},

and in Activity 7.6 on page 178, you proved that this is a subspace of any vector space V.

In Chapter 2 (section 2.11) we saw that a plane in R3 can be defined as the set of all vectors x = (x, y, z)^T whose components satisfy a single Cartesian equation, ax + by + cz = d, or as the set of all vectors x which satisfy a vector equation with two parameters, x = p + sv + tw, s, t ∈ R, where v and w are non-parallel vectors and p is the position vector of a point on the plane. These definitions are equivalent as it is possible to go from one representation of a given plane to the other.

If d = 0, the plane contains the origin, so, taking p = 0, the plane is the set of vectors

{x : x = sv + tw, s, t ∈ R}.

Since this is the linear span, Lin{v, w}, of two vectors in R3, a plane through the origin is a subspace of R3.

Let's look at a specific example.
Example 7.7   Let S be the set given by

S = {(x, y, z)^T : 3x - 2y + z = 0}.

Then for x ∈ S,

x = (x, y, z)^T = (x, y, -3x + 2y)^T = (x, 0, -3x)^T + (0, y, 2y)^T = x(1, 0, -3)^T + y(0, 1, 2)^T.

That is, x = x v_1 + y v_2, where x, y can be any real numbers and v_1, v_2 are the vectors given above. Since S is the linear span of two vectors, it is a subspace of R3. Of course, you can show directly that S is a subspace by showing it is non-empty, and closed under addition and scalar multiplication.

If d ≠ 0 then the plane is not a subspace. It is an affine set, a translation of a linear space.

Activity 7.9   Show this in general. Show that the set

S = {(x, y, z)^T : ax + by + cz = d}

is a subspace if d = 0 and it is not a subspace if d ≠ 0. Do this by showing that S is non-empty, and that the set S as defined above is closed under addition and scalar multiplication if d = 0, but not if d ≠ 0.


Conversely, if v_1, v_2 ∈ R3 are two non-parallel vectors, then the set K = Lin{v_1, v_2} is a plane, and we can obtain its Cartesian equation. Let us return to Example 7.7.

Example 7.7 (continued)
The plane is the set of all linear combinations of v_1 and v_2, that is, all vectors x such that

x = (x, y, z)^T = s(1, 0, -3)^T + t(0, 1, 2)^T,    s, t ∈ R.

This yields three equations in the two unknowns, s and t. Eliminating s and t from these equations yields a single Cartesian equation between the variables x, y, z:

x = s,   y = t,   z = -3s + 2t   ⟹   z = -3x + 2y,  or  3x - 2y + z = 0.
In the same way as for planes in R3 , any hyperplane in Rn which contains the origin is a
subspace of Rn . You can show this directly, exactly as in the activity above, or you can
show it is the linear span of n - 1 vectors in Rn.

7.4.2 Row space and column space

In the previous chapter we observed that the range R(A) of an m × n matrix A is equal to the set of all linear combinations of its columns. (See section 5.4.) In other words, R(A) is the span of the columns of A and is often called the column space of A, denoted by CS(A).

It is also possible to consider the row space RS(A) of a matrix: this is the span of the rows of A. If A is an m × n matrix the row space is a subspace of Rn and the column space is a subspace of Rm.

Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by a vector space and a subspace
prove that a given set is a vector space, or a subspace of a given vector space
explain what is meant by the linear span of a set of vectors.


Exercises
Exercise 7.1
Let

S = {(x_1, x_2)^T : x_2 = 3x_1}.

Prove that S is a subspace of R2 (where the operations of addition and scalar multiplication are the usual ones).
Exercise 7.2
Which of the following are subspaces of R3?

S1 = {(x, y, z)^T : x + y + z = 0},    S2 = {(x, y, z)^T : x^2 + y^2 + z^2 = 1},

S3 = {(x, y, z)^T : x = 0},    S4 = {(x, y, z)^T : xy = 0},

S5 = {(x, y, z)^T : x = 0 and y = 0} = {(0, 0, z)^T : z ∈ R}.

Provide proofs or counterexamples to justify your answers. Describe the sets geometrically.
Exercise 7.3
Let

u = (2, 1, -1)^T   and   v = (1, 1, 3)^T.

Determine which of the vectors below are in Lin{u, v}, and for each such vector, express it as a linear combination of u and v.

a = (3, 2, 4)^T,    b = (0, 0, 0)^T,    c = (7, 5, 7)^T.
Exercise 7.4
Let V be the vector space of all functions from R to R with pointwise addition and
scalar multiplication. Let n be a fixed positive integer and let W be the set of all real
polynomial functions of degree at most n; that is, W consists of all functions of the form
f(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n, where a_0, a_1, . . . , a_n ∈ R.
Prove that W is a subspace of V , under the usual pointwise addition and scalar
multiplication for real functions. Find a finite set of functions which spans W .


Exercise 7.5
Show that the set S1 spans R3, but any vector v ∈ R3 can be written as a linear combination of the vectors in S1 in infinitely many ways. Show that S2 and S3 do not span R3.

S1 = {(1, 2, 3)^T, (1, 0, 1)^T, (0, 1, 1)^T, (1, 1, 0)^T},

S2 = {(1, 0, -1)^T, (2, 1, 3)^T, (1, 2, 9)^T},    S3 = {(2, 1, 1)^T, (1, 0, 1)^T}.
Exercise 7.6

(a) Solve the equation

α(1, 2)^T + β(1, -1)^T = (2, -5)^T

to find the coefficients α and β by finding A^{-1}; that is, write the equation as

(2, -5)^T = Ax,   where   A = [ 1   1 ]
                              [ 2  -1 ]   and   x = (α, β)^T.

(b) Show that Lin{w_1, w_2} = Lin{(1, 2)^T, (1, -1)^T} = R2. That is, show any vector b ∈ R2 can be expressed as a linear combination of w_1 = (1, 2)^T and w_2 = (1, -1)^T by solving b = Ax for x:

(b_1, b_2)^T = Ax.

(c) Show, in general, that if v and w are non-zero vectors in R2, with v = (a, c)^T and w = (b, d)^T, then

Lin{v, w} = R2   ⟺   v ≠ tw for any t ∈ R   ⟺   | a  b |
                                                | c  d |  ≠ 0.

Comments on selected activities


Feedback to activity 7.1
For any x,

0 = 0x = (1 + (-1))x = 1x + (-1)x = x + (-1)x,

so adding the negative -x of x to each side, and using properties 3 and 4 of the definition of a vector space,

-x = -x + 0 = -x + x + (-1)x = (-1)x,

which proves that -x = (-1)x.
Feedback to activity 7.2
The properties are not hard to check: we omit the details here. The negative of a function f is the function -f given by (-f)(x) = -(f(x)) for all x.


Feedback to activity 7.3
Clearly V ≠ ∅. Suppose

u = (x, y, 0)^T,   v = (x', y', 0)^T ∈ V,

and that α ∈ R. Then

u + v = (x + x', y + y', 0)^T ∈ V   and   αv = (αx', αy', 0)^T ∈ V.

Feedback to activity 7.5
The vector 0 is not in the set U, as

(0, 0)^T ≠ (0, 1)^T + t(1, 2)^T   for any t ∈ R,

so property 4 of the definition of a vector space is not satisfied (Definition 7.1).


Feedback to activity 7.6
Note first that S is non-empty because 0 S. Suppose that x, y S. (Why are we not
using the usual symbols u and v? Can you see? Its because v is already used in the
definition of the set S.) Suppose also that R. Now, because x and y belong to S,
there are , R such that x = v and y = v. Then,
x + y = v + v = ( + )v,

which is in S since it is a scalar multiple of v. Also,


x = (v) = ()v S
and it follows that S is a subspace.

Feedback to activity 7.7
A general vector in S1 is of the form (x, x^2, 0)^T, x ∈ R, and one particular vector, taking x = 1, is (1, 1, 0)^T. To show S1 is not a subspace you need to find one counterexample, one or two particular vectors in S1 which do not satisfy the closure properties, such as

(1, 1, 0)^T ∈ S1, but (1, 1, 0)^T + (1, 1, 0)^T = (2, 2, 0)^T ∉ S1.

A general vector in S2 is of the form (x, 2x, 0)^T, x ∈ R, and one particular vector, taking x = 1, is (1, 2, 0)^T. S2 is a subspace. We show it is closed under addition and scalar multiplication using general vectors: if u = (a, 2a, 0)^T, v = (b, 2b, 0)^T ∈ S2, a, b ∈ R, then

u + v = (a + b, 2a + 2b, 0)^T = (a + b, 2(a + b), 0)^T ∈ S2,    a + b ∈ R,

and, if α ∈ R,

αu = (αa, 2(αa), 0)^T ∈ S2,    αa ∈ R.

Feedback to activity 7.8
Any two such vectors will be of the form

v = α_1 v_1 + α_2 v_2 + · · · + α_k v_k   and   w = β_1 v_1 + β_2 v_2 + · · · + β_k v_k,

and we will have

v + w = (α_1 + β_1)v_1 + (α_2 + β_2)v_2 + · · · + (α_k + β_k)v_k,

which is a linear combination of the vectors v_1, v_2, . . . , v_k. Also,

sv = s(α_1 v_1 + α_2 v_2 + · · · + α_k v_k) = (sα_1)v_1 + (sα_2)v_2 + · · · + (sα_k)v_k

is a linear combination of the vectors v_1, v_2, . . . , v_k.
Feedback to activity 7.9
It's easy to see that S ≠ ∅. Suppose d = 0. Let u, v ∈ S and α ∈ R. Then

u = (x, y, z)^T,   v = (x', y', z')^T,

where ax + by + cz = 0 and ax' + by' + cz' = 0. Consider u + v. This equals

(X, Y, Z)^T = (x + x', y + y', z + z')^T,

and we want to show this belongs to S. Now, this is the case, because

aX + bY + cZ = a(x + x') + b(y + y') + c(z + z')
             = (ax + by + cz) + (ax' + by' + cz') = 0 + 0 = 0,

and similarly it can be shown that, for any α ∈ R, αv ∈ S. So, in this case, S is a subspace. You can see why this argument fails when d is not 0; for, then aX + bY + cZ will equal 2d, which will not be the same as d. So we will not have u + v ∈ S. (Similarly, we could see that αv will not be in S if α ≠ 1.) Also, if d ≠ 0, the simple statement that 0 does not satisfy the equation means that in this case S is not a subspace.


Comments on exercises
Solution to exercise 7.1
Since 0 ∈ S, the set S ≠ ∅. You need to show that for any α ∈ R and u, v ∈ S, αu ∈ S and u + v ∈ S. Both are reasonably straightforward, and the details are omitted.
Solution to exercise 7.2

The set S1 = {(x, y, z)^T : x + y + z = 0} is a subspace of R3 because it is a plane through the origin. You should also be able to prove this directly using the definition of S1, in the same way as you did for Activity 7.9 on page 181.

The set S2 = {(x, y, z)^T : x^2 + y^2 + z^2 = 1} is NOT a subspace. To prove this you only need one counterexample. For example, 0 ∉ S2. This set is the unit sphere (the surface of the unit ball) centred on the origin.

The set S3 = {(x, y, z)^T : x = 0} is a subspace.

To prove this let u, v ∈ S3, α ∈ R. S3 is non-empty since 0 ∈ S3. With

u = (0, u_2, u_3)^T,   v = (0, v_2, v_3)^T,

we have

u + v = (0, u_2 + v_2, u_3 + v_3)^T ∈ S3,    αu = (0, αu_2, αu_3)^T ∈ S3.

Hence the set is non-empty and closed under addition and closed under scalar multiplication, so it is a subspace of R3. This subspace is the yz-plane (a plane through the origin).

The next set, S4 = {(x, y, z)^T : xy = 0}, is non-empty, since 0 ∈ S4, and it is closed under scalar multiplication, but it is NOT a subspace.

To prove this, take for example, u = (1, 0, 2)^T ∈ S4 and v = (0, 1, 1)^T ∈ S4. These vectors are in S4 since they satisfy the condition which defines S4. But

u + v = (1, 0, 2)^T + (0, 1, 1)^T = (1, 1, 3)^T ∉ S4.

The set is not closed under addition, so it is not a subspace. This set is the union of the xz-plane and the yz-plane.

The set S5 = {(x, y, z)^T : x = 0 and y = 0} = {(0, 0, z)^T : z ∈ R} is a subspace. This set is the intersection of the xz-plane and the yz-plane, which is the z-axis. This is a line through the origin, and therefore, a subspace. You should also be able to prove this directly from the definition of S5.
Solution to exercise 7.3
You need to determine if each of the given vectors can be expressed as a linear combination of u and v. Each vector equation is equivalent to a system of three equations in two unknowns. You can solve them using Gaussian elimination, or, in these simple cases, just solve the equations directly.

For the vector a, set

αu + βv = α(2, 1, -1)^T + β(1, 1, 3)^T = (3, 2, 4)^T.

Then

(A|a) = [ 2  1 | 3 ]           [ 1  1 | 2 ]
        [ 1  1 | 2 ]  →  · · · →  [ 0  1 | 1 ]
        [-1  3 | 4 ]           [ 0  0 | 2 ].

The system is inconsistent; there is no solution, so a ∉ Lin{u, v}.

Clearly 0 ∈ Lin{u, v}, and 0 = 0u + 0v.

Solving the system

α(2, 1, -1)^T + β(1, 1, 3)^T = (7, 5, 7)^T,

you should obtain the unique solution α = 2, β = 3. Then c = 2u + 3v ∈ Lin{u, v}.


Solution to exercise 7.4
Clearly, W ≠ ∅. We need to show that for any α ∈ R and f, g ∈ W, αf ∈ W and f + g ∈ W. Suppose that

f(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n,   g(x) = b_0 + b_1 x + b_2 x^2 + · · · + b_n x^n.

Then

(f + g)(x) = f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + · · · + (a_n + b_n)x^n,

so f + g is also a polynomial of degree at most n and therefore f + g ∈ W. Similarly,

(αf)(x) = αf(x) = (αa_0) + (αa_1)x + · · · + (αa_n)x^n,

so αf ∈ W also. It can be seen that the set of functions {1, x, x^2, . . . , x^n} spans W, where 1 denotes the function that is identically equal to 1 (that is, the function f with f(x) = 1 for all x).
Solution to exercise 7.5
To show that

α_1 (1, 2, 3)^T + α_2 (1, 0, 1)^T + α_3 (0, 1, 1)^T + α_4 (1, 1, 0)^T = (x, y, z)^T

has infinitely many solutions for any vector b ∈ R3, and therefore that S1 spans R3, you need to look at the solutions of Ax = b:

A = [ 1  1  0  1 ]
    [ 2  0  1  1 ],    b = (x, y, z)^T.
    [ 3  1  1  0 ]

If the row echelon form of the coefficient matrix A has three leading ones, then there will always be infinitely many solutions: a solution will exist since a leading one in each row of A means an augmented system cannot be inconsistent, and there will be infinitely many solutions since there is one free (non-leading) variable.

Row reducing A (not all steps are shown),

A = [ 1  1  0  1 ]      [ 1  1   0    1  ]
    [ 2  0  1  1 ]  →  [ 0  1  -1/2  1/2 ]
    [ 3  1  1  0 ]      [ 0  0   0    1  ].

The row echelon form of the matrix A has three leading ones. Therefore, the set S1 spans R3, and any vector b ∈ R3 can be written as a linear combination of the vectors in S1 in infinitely many ways.

S2 does not span R3. Since

|B| = | 1  2  1 |
      | 0  1  2 |  = 1(9 - 6) - 1(4 - 1) = 0,
      |-1  3  9 |

the reduced echelon form of B will contain a row of zeros. There will be some b ∈ R3 for which the system is inconsistent.

For example, the vector b = (0, 0, 1)^T cannot be expressed as a linear combination of the vectors in S2, as the system Bx = b has no solution, since

[ 1  2  1 | 0 ]      [ 1  2   1 | 0 ]      [ 1  2  1 | 0 ]
[ 0  1  2 | 0 ]  →  [ 0  1   2 | 0 ]  →  [ 0  1  2 | 0 ].
[-1  3  9 | 1 ]      [ 0  5  10 | 1 ]      [ 0  0  0 | 1 ]

S3 does not span R3. At least three vectors are required to span R3.
Solution to exercise 7.6
(a) We have

A = [ 1   1 ]
    [ 2  -1 ],

with |A| = -3 ≠ 0, so A^{-1} exists, and

A^{-1} = -(1/3) [ -1  -1 ]
                [ -2   1 ].

Then

x = A^{-1} (2, -5)^T = (-1, 3)^T,

so α = -1 and β = 3.

(b) Since A^{-1} exists, the equation b = Ax has solution x = A^{-1}b:

x = -(1/3) [ -1  -1 ] (b_1, b_2)^T = ( (1/3)b_1 + (1/3)b_2, (2/3)b_1 - (1/3)b_2 )^T
           [ -2   1 ]

for any b ∈ R2.

(c) Lin{v, w} = R2  ⟺  αv + βw = b has a solution for any b ∈ R2  ⟺  |A| ≠ 0.

So it remains to show that |A| ≠ 0 ⟺ v ≠ tw for any t ∈ R or, equivalently, that |A| = 0 ⟺ v = tw for some t ∈ R. If none of a, b, c, d equals zero,

| a  b |
| c  d | = 0   ⟺   ad = bc   ⟺   a/b = c/d = s   ⟺   a = sb, c = sd   ⟺   (a, c)^T = s(b, d)^T.

If any of the coefficients equals zero, a similar argument works.

Chapter 8
Linear independence, bases and dimension
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 6.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 4, Sections 4.3-4.5.

Aims of the chapter


In this chapter we look more deeply into the structure of a vector space, developing the
concept of a basis, which will enable us to know precisely what we mean by the
dimension of a vector space. You should then have a clear understanding of the
statement that Rn is an n-dimensional vector space.

8.1 Linear independence
Linear independence is a central idea in the theory of vector spaces. If {v_1, v_2, . . . , v_k} is a set of vectors in a vector space V, then the vector equation

α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0

always has the trivial solution, α_1 = α_2 = · · · = α_k = 0.

We say that vectors x_1, x_2, . . . , x_k in Rn are linearly dependent (LD) if there are numbers α_1, α_2, . . . , α_k, not all zero, such that

α_1 x_1 + α_2 x_2 + · · · + α_k x_k = 0.

In this case the left-hand side is termed a non-trivial linear combination. The vectors are linearly independent (LI) if they are not linearly dependent; that is, if no non-trivial linear combination of them is the zero vector or, equivalently, whenever

α_1 x_1 + α_2 x_2 + · · · + α_k x_k = 0,

then, necessarily, α_1 = α_2 = · · · = α_k = 0. We have been talking about Rn, but the same definitions can be used for any vector space V. We state them formally now.


Definition 8.1 (Linear independence)   Let V be a vector space and v_1, . . . , v_k ∈ V. Then v_1, v_2, . . . , v_k form a linearly independent set or are linearly independent if and only if

α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0   ⟹   α_1 = α_2 = · · · = α_k = 0;

that is, if and only if no non-trivial linear combination of v_1, v_2, . . . , v_k equals the zero vector.

Definition 8.2 (Linear dependence)   Let V be a vector space and v_1, v_2, . . . , v_k ∈ V. Then v_1, v_2, . . . , v_k form a linearly dependent set or are linearly dependent if and only if there are real numbers α_1, α_2, . . . , α_k, not all zero, such that

α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0;

that is, if and only if some non-trivial linear combination of the vectors is the zero vector.
Example 8.1   In R3, the following vectors are linearly dependent:

v_1 = (1, 2, 3)^T,   v_2 = (2, 1, 5)^T,   v_3 = (4, 5, 11)^T.

This is because

2v_1 + v_2 - v_3 = 0.

Note that this can also be written as v_3 = 2v_1 + v_2.

This example illustrates the following general result. Try to prove it yourself before looking at the proof.

Theorem 8.1   The set {v_1, v_2, . . . , v_k} ⊆ V is linearly dependent if and only if some vector v_i is a linear combination of the other vectors.

Proof
Since this is an 'if and only if' statement, we must prove it both ways. If {v_1, v_2, . . . , v_k} is linearly dependent, the equation α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0 has a solution with some α_i ≠ 0. Then we can solve for the vector v_i:

v_i = -(α_1/α_i) v_1 - (α_2/α_i) v_2 - · · · - (α_{i-1}/α_i) v_{i-1} - (α_{i+1}/α_i) v_{i+1} - · · · - (α_k/α_i) v_k,

which expresses v_i as a linear combination of the other vectors in the set.

If v_i is a linear combination of the other vectors, say, v_i = β_1 v_1 + · · · + β_{i-1} v_{i-1} + β_{i+1} v_{i+1} + · · · + β_k v_k, then

β_1 v_1 + · · · + β_{i-1} v_{i-1} - v_i + β_{i+1} v_{i+1} + · · · + β_k v_k = 0

is a non-trivial linear combination of the vectors that is equal to the zero vector, since the coefficient of v_i is -1 ≠ 0. Therefore, the vectors are linearly dependent.

It follows from this theorem that a set of two vectors is linearly dependent if and only if one vector is a scalar multiple of the other.


Example 8.2   The vectors

v_1 = (1, 2, 3)^T,   v_2 = (2, 1, 5)^T

in Example 8.1 are linearly independent, since one is not a scalar multiple of the other.

Activity 8.1   Show that, for any vector v in a vector space V, the set of vectors {v, 0} is linearly dependent.

8.2 Testing for linear independence in Rn

Given k vectors v_1, . . . , v_k ∈ Rn, the vector equation

α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0

is a homogeneous system of n linear equations in k unknowns. This can be written in matrix form as Ax = 0, where A is the n × k matrix A = (v_1 v_2 · · · v_k) with the vectors v_1, v_2, . . . , v_k as its columns, and x is the vector (or k × 1 matrix)

x = (α_1, α_2, . . . , α_k)^T.

Recall (Activity 5.8, page 135) that the matrix product Ax is exactly the linear combination α_1 v_1 + α_2 v_2 + · · · + α_k v_k.

Then the question of whether or not a set of vectors in Rn is linearly independent can be answered by looking at the solutions of the homogeneous system Ax = 0.

Theorem 8.2   The vectors v_1, v_2, . . . , v_k are linearly dependent if and only if the linear system Ax = 0 has a solution other than x = 0, where A is the matrix A = (v_1 v_2 · · · v_k). Equivalently, the vectors are linearly independent precisely when the only solution to the system is x = 0.

If the vectors are linearly dependent, then any solution x ≠ 0 of the system Ax = 0 will directly give a non-trivial linear combination of the vectors that equals the zero vector.

Activity 8.2   Show that the vectors

v_1 = (1, 2)^T,   v_2 = (1, 1)^T,   v_3 = (2, 5)^T

are linearly dependent by solving Ax = 0. Use your solution to give a non-trivial linear combination of the vectors that equals the zero vector.


Now, we know from our experience of solving linear systems with row operations that the system Ax = 0 will have precisely the one solution x = 0 if and only if we obtain from the n × k matrix A an echelon matrix in which there are k leading ones. That is, if and only if rank(A) = k. (Think about this!) Thus, we have the following result.

Theorem 8.3   Suppose that v_1, . . . , v_k ∈ Rn. Then the set {v_1, . . . , v_k} is linearly independent if and only if the n × k matrix (v_1 v_2 · · · v_k) has rank k.

But the rank is always at most the number of rows, so we certainly need to have k ≤ n. Also, there is a set of n linearly independent vectors in Rn. In fact, there are infinitely many such sets, but an obvious one is

{e_1, e_2, . . . , e_n},

where e_i is the vector with every entry equal to 0 except for the ith entry, which is 1.

Activity 8.3   Show that the set of vectors {e_1, e_2, . . . , e_n} in Rn is linearly independent.

Thus, we have the following result.

Theorem 8.4   The maximum size of a linearly independent set of vectors in Rn is n.

So any set of more than n vectors in Rn is linearly dependent. On the other hand, it should not be imagined that any set of n or fewer is linearly independent: that isn't true.
Example 8.3   In R4, which of the following sets of vectors are linearly independent?

L1 = {(1, 0, -1, 0)^T, (1, 2, 9, 2)^T, (2, 1, 3, 1)^T, (0, 0, 1, 0)^T, (2, 5, 9, 1)^T},

L2 = {(1, 0, -1, 0)^T, (1, 2, 9, 2)^T},

L3 = {(1, 0, -1, 0)^T, (1, 2, 9, 2)^T, (2, 1, 3, 1)^T},

L4 = {(1, 0, -1, 0)^T, (1, 2, 9, 2)^T, (2, 1, 3, 1)^T, (0, 0, 1, 0)^T}.

Try this yourself before reading the answers.

The set L1 is linearly dependent because it consists of five vectors in R4. The set L2 is linearly independent because neither vector is a scalar multiple of the other. To see that the set L3 is linearly dependent, write the vectors as the columns of a matrix A and reduce A to echelon form to find that the rank of A is 2. This means that there is a non-trivial linear combination of the vectors which is equal to 0, or equivalently, that one of the vectors is a linear combination of the other two. The last set, L4, contains the set L3 and is therefore also linearly dependent, since it is still true that one of the vectors is a linear combination of the others.
Activity 8.4 For the set L3 above, find the solution of the corresponding
homogeneous system Ax = 0 where A is the matrix whose columns are the vectors
of L3 . Use the solution to write down a non-trivial linear combination of the vectors
that is equal to the zero vector. Express one of the vectors as a linear combination of
the other two.
There is an important property of linearly independent sets of vectors which holds for any vector space V.

Theorem 8.5   If x_1, x_2, . . . , x_m are linearly independent in V and

c_1 x_1 + c_2 x_2 + · · · + c_m x_m = c'_1 x_1 + c'_2 x_2 + · · · + c'_m x_m,

then

c_1 = c'_1,   c_2 = c'_2,   . . . ,   c_m = c'_m.

Activity 8.5   Prove this. Use the fact that

c_1 x_1 + c_2 x_2 + · · · + c_m x_m = c'_1 x_1 + c'_2 x_2 + · · · + c'_m x_m

if and only if

(c_1 - c'_1)x_1 + (c_2 - c'_2)x_2 + · · · + (c_m - c'_m)x_m = 0.

What does this theorem say about x = c_1 x_1 + c_2 x_2 + · · · + c_m x_m? (Think about this before you continue reading.)

It says that if a vector x can be expressed as a linear combination of linearly independent vectors, then this can be done in only one way. The linear combination is unique.

8.3 Basis

The following result about Rn is very important in the theory of vector spaces. It says that a linearly independent set of n vectors in Rn spans Rn.

Theorem 8.6   If v_1, v_2, . . . , v_n are linearly independent vectors in Rn, then for any x in Rn, x can be written as a unique linear combination of v_1, . . . , v_n.

Proof
Because v_1, . . . , v_n are linearly independent, the n × n matrix

A = (v_1 v_2 . . . v_n)

has rank(A) = n. (In other words, A reduces to the n × n identity matrix.) By Theorem 5.2 the system Az = x has a unique solution for any x ∈ Rn. But let's spell it out. Since there is a leading one in every row of the reduced echelon form of A, we can find a solution to Az = x, so any vector x can be expressed in the form

x = Az = (v_1 v_2 . . . v_n)(α_1, α_2, . . . , α_n)^T,

where we have written z as (α_1, α_2, . . . , α_n)^T. Expanding this matrix product, we have that any x ∈ Rn can be expressed as a linear combination

x = α_1 v_1 + α_2 v_2 + · · · + α_n v_n,

as required. This linear combination is unique since the vectors are linearly independent (or, because there is a leading one in every column of the echelon matrix, so there are no free variables).

It follows from this theorem that if we have a set of n linearly independent vectors in Rn, then the set of vectors also spans Rn, so any vector in Rn can be expressed in exactly one way as a linear combination of the n vectors. We say that the n vectors form a basis of Rn. The formal definition of a (finite) basis for a vector space is as follows.

Definition 8.3 ((Finite) Basis)   Let V be a vector space. Then the subset B = {v_1, v_2, . . . , v_n} of V is said to be a basis for (or of) V if:

B is a linearly independent set of vectors, and
V = Lin(B).

An alternative characterisation of a basis can be given: B is a basis of V if every vector in V can be expressed in exactly one way as a linear combination of the vectors in B. The set B spans V if and only if a linear combination exists, and B is linearly independent if and only if any linear combination is unique. We have therefore shown:

Theorem 8.7   B = {v_1, v_2, . . . , v_n} is a basis of V if and only if any v ∈ V is a unique linear combination of v_1, v_2, . . . , v_n.

Example 8.4   The vector space Rn has the basis {e_1, e_2, . . . , e_n} where e_i is (as earlier) the vector with every entry equal to 0 except for the ith entry, which is 1. It's clear that the vectors are linearly independent (as you showed in Activity 8.3 on page 194), and there are n of them, so we know straight away that they form a basis. In fact, it's easy to see that they span the whole of Rn, since for any x = (x_1, x_2, . . . , x_n)^T ∈ Rn,

x = x_1 e_1 + x_2 e_2 + · · · + x_n e_n.
196

8.3. Basis

The basis {e1 , e2 , . . . , en } is called the standard basis of Rn .


Example 8.5   We will find a basis of the subspace of R3 given by

W = {(x, y, z)^T : x + y - 3z = 0}.

If x = (x, y, z)^T is any vector in W, then its components must satisfy y = -x + 3z, and we can express x as

x = (x, y, z)^T = (x, -x + 3z, z)^T = x(1, -1, 0)^T + z(0, 3, 1)^T = xv + zw,   x, z ∈ R.

This shows that the set {v, w} spans W. The set is linearly independent. Why? Because of the positions of the zeros and ones, if αv + βw = 0 then necessarily α = 0 and β = 0.
Example 8.6   The set

S = {(1, 2)^T, (1, -1)^T}

is a basis of R2. To show this we have to show it spans R2 and is linearly independent, or equivalently, that any vector b ∈ R2 is a unique linear combination of these two vectors. Writing the vectors as the columns of a matrix A, we find that |A| ≠ 0, so this is true by Theorem 5.2 (page 128).
As in the above example, we can show that n vectors in Rn are a basis of Rn by writing them as the columns of a matrix A and invoking Theorem 5.2. Turning this around, we can see that if A = (v_1 v_2 . . . v_n) is an n × n matrix with rank(A) = n, then the columns of A are a basis of Rn. Indeed, by Theorem 5.2, the system Az = x will have a unique solution for any x ∈ Rn, so any vector x ∈ Rn can be written as a unique linear combination of the column vectors. We therefore have two more equivalent statements to add to the theorem.

Theorem 8.8   If A is an n × n matrix, then the following statements are equivalent.

A^{-1} exists.
Ax = b has a unique solution for any b ∈ Rn.
Ax = 0 has only the trivial solution, x = 0.
The reduced echelon form of A is I.
|A| ≠ 0.
The rank of A is n.
The column vectors of A are a basis of Rn.


The row vectors of A are a basis of Rn.

The last statement can be seen from the facts that |A^T| = |A|, and the rows of A are the columns of A^T. This theorem provides an easy way to determine if a set of n vectors is a basis of Rn. We simply write the n vectors as the columns of a matrix, and evaluate its determinant.
Activity 8.6   Which of these sets is a basis of R3?

U = {(1, 0, 1)^T, (1, 2, 3)^T, (1, 2, 5)^T},    W = {(1, 0, 1)^T, (-1, 2, 3)^T, (1, 2, 5)^T}.

Show that one of these sets is a basis of R3 and that the other one spans a plane in R3. Find a basis for this plane. Then find a Cartesian equation for the plane.

8.3.1 Coordinates

What is the importance of a basis? If S = {v_1, v_2, . . . , v_n} is a basis of Rn, then any vector v ∈ Rn can be expressed uniquely as v = α_1 v_1 + α_2 v_2 + · · · + α_n v_n. The real numbers α_1, α_2, . . . , α_n are the coordinates of v with respect to the basis S.

Definition 8.4 (Coordinates)   If S = {v_1, v_2, . . . , v_n} is a basis of a vector space V and v = α_1 v_1 + α_2 v_2 + · · · + α_n v_n, then the real numbers α_1, α_2, . . . , α_n are the coordinates of v with respect to the basis S. We use the notation

[v]_S = (α_1, α_2, . . . , α_n)^T_S

to denote the coordinate vector of v in the basis S.

Example 8.7  The sets B = {e1, e2} and S = {v1, v2}, where

B = { (1, 0)^T , (0, 1)^T }   and   S = { (-1, 2)^T , (1, 1)^T },

are each a basis of R2. The coordinates of the vector v = (2, 5)^T in each basis are
given by the coordinate vectors

[v]B = (2, 5)^T   and   [v]S = (1, 3)^T_S.

In the standard basis, the coordinates of v are precisely the components of the
vector v. In the basis S, the components of [v]S arise from the observation that

v = 1 (-1, 2)^T + 3 (1, 1)^T = (2, 5)^T.

Activity 8.7  For the example above, sketch the vector v on graph paper and show
it as the sum of the vectors given by each of the linear combinations: v = 2e1 + 5e2
and v = 1v1 + 3v2.
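Finding [v]S amounts to solving the linear system PS a = v, where PS has the basis vectors as its columns. A minimal sketch in Python, using the basis S of Example 8.7 (NumPy's linear solver does the work):

    import numpy as np

    # Columns of P are the basis vectors of S = {(-1, 2)^T, (1, 1)^T}.
    P = np.array([[-1.0, 1.0],
                  [ 2.0, 1.0]])
    v = np.array([2.0, 5.0])

    coords = np.linalg.solve(P, v)  # solves P a = v for the coordinate vector a
    print(coords)                   # [1. 3.], i.e. v = 1*v1 + 3*v2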

8.3.2  Dimension

A fundamental result is that if a vector space V has a finite basis, then all bases of V
are of the same size.¹
Theorem 8.9 Suppose that the vector space V has a finite basis consisting of d
vectors. Then any basis of V consists of exactly d vectors.
This enables us, finally, to define exactly what we mean by the dimension of a vector
space V .
Definition 8.5 (Dimension) The number d of vectors in a finite basis of a vector
space V is the dimension of V , and is denoted dim(V ). The vector space V = {0} is
defined to have dimension 0.
A vector space which has a finite basis is said to be finite-dimensional. Not all vector
spaces are finite-dimensional. (For example, the vector space of real functions with
pointwise addition and scalar multiplication has no finite basis. Such a vector space is
said to be infinite-dimensional.)
Example 8.8 We already know Rn has a basis of size n. (For example, the
standard basis consists of n vectors.) So Rn has dimension n (which is reassuring,
since it is often referred to as n-dimensional Euclidean space).
If we know the dimension of a vector space V, then we know how many vectors we need
for a basis. If we have the correct number of vectors for a basis, and we know either that
the vectors span V or that they are linearly independent, then we can conclude that
both must be true and that they form a basis, as shown in the following theorem. That is,
we do not need to show both.²
Theorem 8.10  Let V be a finite-dimensional vector space of dimension d. Then:

  d is the largest size of a linearly independent set of vectors in V. Furthermore, any
  set of d linearly independent vectors in V is necessarily a basis of V.
  d is the smallest size of a spanning set of vectors for V. Furthermore, any finite set
  of d vectors that spans V is necessarily a basis.

Thus, d = dim(V) is the largest possible size of a linearly independent set of vectors in
V, and the smallest possible size of a spanning set of vectors (a set of vectors whose
linear span is V).
¹ For a proof, see Anthony and Harvey, Section 6.4.
² For a proof, see Anthony and Harvey, Section 6.4.


Example 8.9  We know that the plane W in R3,

W = { (x, y, z)^T : x + y - 3z = 0 },

has dimension 2, because we found a basis for it consisting of two vectors. If we
choose any set of 2 linearly independent vectors in W, then that set will be a basis
of W. For example, the vectors v1 = (1, 2, 1)^T and v2 = (3, 0, 1)^T are linearly
independent (why?), so by the theorem, S = {v1, v2} is a basis of W.

8.3.3  Dimension and bases of subspaces

Suppose that W is a subspace of the finite-dimensional vector space V. Any set of
linearly independent vectors in W is also a linearly independent set in V.
Activity 8.8 Prove this last statement.
Now, the dimension of W is the largest size of a linearly independent set of vectors in
W, so there is a set of dim(W) linearly independent vectors in V. But then this means
that dim(W) ≤ dim(V), since the largest possible size of a linearly independent set in V
is dim(V). There is another important relationship between bases of W and V: any
basis of W can be extended to a basis of V. The following result states this precisely.

Theorem 8.11  Suppose that V is a finite-dimensional vector space and that W is a
subspace of V. Then dim(W) ≤ dim(V). Furthermore, if {w1, w2, . . . , wr} is a basis of
W, then there are s = dim(V) - dim(W) vectors v1, v2, . . . , vs ∈ V such that
{w1, w2, . . . , wr, v1, v2, . . . , vs} is a basis of V. (In the case W = V, the basis of W is
already a basis of V.) That is, we can obtain a basis of the whole space V by adding
certain vectors of V to any basis of W.
Proof
If {w1, w2, . . . , wr} is a basis of W, then it is a linearly independent set of vectors in V.
If the set spans V, then it is a basis of V. If not, there is a vector v1 ∈ V which cannot
be expressed as a linear combination of the vectors w1, w2, . . . , wr. Then the set of
vectors {w1, w2, . . . , wr, v1} is a linearly independent set of vectors in V. To see this,
look at a linear combination of the vectors which is equal to the zero vector,

a1 w1 + · · · + ar wr + b1 v1 = 0.

Since v1 is not a linear combination of the other vectors, we must have b1 = 0. The
remaining linear combination then involves only the vectors of the linearly independent
basis of W, so the only solution is the trivial one. Therefore all ai = 0 and b1 = 0, and
the set {w1, w2, . . . , wr, v1} is linearly independent. Continuing in this way, we can find
vectors v2, . . . , vs ∈ V until the linearly independent set
{w1, w2, . . . , wr, v1, v2, . . . , vs} spans V and is therefore a basis. This must occur when
r + s = dim(V), so dim(W) ≤ dim(V).


Example 8.10  The plane W in R3,

W = { (x, y, z)^T : x + y - 3z = 0 },

has a basis consisting of the vectors v1 = (1, 2, 1)^T and v2 = (3, 0, 1)^T. If v3 is any
vector which is not in this plane, for example v3 = (1, 0, 0)^T, then the set
S = {v1, v2, v3} is a basis of R3.

8.4 Finding a basis for a linear span in Rn


Suppose we are given k vectors x1 , x2 . . . , xk in Rn , and we want to find a basis for the
linear span Lin{x1 , . . . , xk }. The point is that the k vectors themselves might not form a
linearly independent set (and hence they are not a basis).
A useful technique is to form a matrix with the xTi as rows, and to perform row
operations until the resulting matrix is in echelon form. Then a basis of the linear span
is given by the transposed non-zero rows of the echelon matrix (which, it should be
noted, will not generally be among the initial given vectors). The reason this works is
that: (i) row operations are such that at any stage in the resulting procedure, the row
space of the matrix is equal to the row space of the original matrix, which is precisely
the linear span of the original set of vectors, and (ii) the non-zero rows of an echelon
matrix are linearly independent (which is clear, since each has a one in a position where
the vectors below it all have zero).
Example 8.11  We find a basis for the subspace of R5 spanned by the vectors

x1 = (1, 1, 2, 1, 1)^T,  x2 = (2, 1, 2, 2, 2)^T,  x3 = (1, 2, 4, 1, 1)^T,  x4 = (3, 0, 0, 3, 3)^T.

Write the vectors as the columns of a matrix A and then take A^T (effectively writing
each column vector as a row):

A^T = [ 1  1  2  1  1 ]
      [ 2  1  2  2  2 ]
      [ 1  2  4  1  1 ]
      [ 3  0  0  3  3 ]

Reducing this to echelon form by elementary row operations,

[ 1  1  2  1  1 ]      [ 1  1  2  1  1 ]      [ 1  1  2  1  1 ]
[ 2  1  2  2  2 ] -->  [ 0 -1 -2  0  0 ] -->  [ 0  1  2  0  0 ]
[ 1  2  4  1  1 ]      [ 0  1  2  0  0 ]      [ 0  0  0  0  0 ]
[ 3  0  0  3  3 ]      [ 0 -3 -6  0  0 ]      [ 0  0  0  0  0 ]

The echelon matrix at the end of this tells us that a basis for Lin{x1, x2, x3, x4} is
formed from the first two rows, transposed, of the echelon matrix; that is,

{ (1, 1, 2, 1, 1)^T , (0, 1, 2, 0, 0)^T }.
If we want to find a basis that consists of a subset of the original vectors, then we need
to take those vectors that correspond to the final non-zero rows in the echelon matrix.
By this, we mean the rows of the original matrix that have ended up as non-zero rows
in the echelon matrix. For instance, in Example 8.11, the first and second rows of the
original matrix correspond to the non-zero rows of the echelon matrix, so a basis of the
span is {x1, x2}. On the other hand, if we interchange rows, the correspondence won't
be so obvious.

A better method to obtain such a basis is given in the next section, using the matrix A
whose columns are the vectors x1, x2, . . . , xk. Then, as we have seen,
Lin{x1, . . . , xk} = R(A). That is, Lin{x1, . . . , xk} is the range or column space of the
matrix A.
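The row-reduction technique of Example 8.11 is easy to automate. The following sketch uses Python with SymPy (exact rational arithmetic, so the echelon form is not affected by rounding); the vectors are those of Example 8.11:

    from sympy import Matrix

    # Rows of M are the transposed vectors x1, ..., x4 of Example 8.11.
    M = Matrix([[1, 1, 2, 1, 1],
                [2, 1, 2, 2, 2],
                [1, 2, 4, 1, 1],
                [3, 0, 0, 3, 3]])

    E, pivots = M.rref()          # reduced row echelon form and pivot columns
    # The non-zero rows of E, transposed, form a basis of the linear span.
    for i in range(len(pivots)):
        print(E.row(i))           # [1, 0, 0, 1, 1] and [0, 1, 2, 0, 0]

Because rref carries the reduction further than Example 8.11 does, the basis printed here differs from, but spans the same subspace as, the one found above.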

8.5 Basis and dimension of range and null space

We have shown that the range and null space of an m × n matrix are subspaces of Rm
and Rn respectively (Section 7.3). Their dimensions are so important that they are given
special names.
Definition 8.6 (Rank and nullity) The rank of a matrix A is
rank(A) = dim(R(A))
and the nullity is
nullity(A) = dim(N(A)).
We have, of course, already used the word rank, so it had better be the case that the
usage just given coincides with the earlier one. Fortunately it does. In fact, we have the
following connection.
Theorem 8.12  Suppose that A is an m × n matrix with columns c1, c2, . . . , cn, and
that an echelon form obtained from A has leading ones in columns i1, i2, . . . , ir. Then a
basis for R(A) is

B = { c_{i1}, c_{i2}, . . . , c_{ir} }.
Note that the basis is formed from columns of A, not columns of the echelon matrix: the
basis consists of those columns of A corresponding to the leading ones in the echelon
matrix.


We will outline a proof of this theorem, so you can see how it works.³ We have already
seen that a solution x = (α1, α2, . . . , αn)^T of Ax = 0 gives a linear combination of the
columns of A which is equal to the zero vector,

0 = α1 c1 + α2 c2 + · · · + αn cn.

If E denotes the reduced echelon form of A, and if c'1, c'2, . . . , c'n denote the columns of
E, then exactly the same relationship holds:

0 = α1 c'1 + α2 c'2 + · · · + αn c'n.

In fact, we use E to obtain the solution x = (α1, α2, . . . , αn)^T. So the linear dependence
relations are the same for the columns of both matrices, which means that the linearly
independent columns of A correspond precisely to the linearly independent columns of
E. Which columns of E are linearly independent? The columns which contain the
leading ones.
We have already seen that a matrix A and its reduced row echelon form have the same
row space, and that the non-zero rows form a basis of this row space. So the dimension
of the row space of A, RS(A), and the dimension of the column space of A,
CS(A) = R(A), are each equal to the number of leading ones in an echelon form of A;
that is, both are equal to rank(A). We restate this important fact:

dim(RS(A)) = dim(R(A)) = rank(A).
Example 8.12  Let A be the matrix

A = [ 1   1  2  1 ]
    [ 2   0  1  1 ]
    [ 9  -1  3  4 ]

The reduced echelon form of the matrix is (verify this!)

E = [ 1  0  1/2  1/2 ]
    [ 0  1  3/2  1/2 ]
    [ 0  0   0    0  ]

The leading ones in this echelon matrix are in the first and second columns, so a
basis for R(A) can be obtained by taking the first and second columns of A. (Note:
columns of A, not of the echelon matrix!) Therefore a basis for R(A) is

{ (1, 2, 9)^T , (1, 0, -1)^T }.

A basis of the row space of A consists of the two non-zero rows of the reduced
matrix, or the first two rows of the original matrix:

{ (1, 0, 1/2, 1/2)^T , (0, 1, 3/2, 1/2)^T }   or   { (1, 1, 2, 1)^T , (2, 0, 1, 1)^T }.

³ See, also, Anthony and Harvey, Section 6.5.2.


Note that the column space is a two-dimensional subspace of R3 (a plane) and the
row space is a two-dimensional subspace of R4. The columns of A and E satisfy the
same linear dependence relations, which can easily be read from the reduced echelon
form of the matrix:

c3 = (1/2) c1 + (3/2) c2,    c4 = (1/2) c1 + (1/2) c2.

Activity 8.9 Check that the columns of A satisfy these same linear dependence
relations.
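A sketch of this computation with SymPy; the matrix is the A of Example 8.12, and rref, columnspace and col are existing SymPy methods:

    from sympy import Matrix, Rational

    A = Matrix([[1, 1, 2, 1],
                [2, 0, 1, 1],
                [9, -1, 3, 4]])

    E, pivots = A.rref()
    print(pivots)             # (0, 1): leading ones in the first and second columns
    print(A.columnspace())    # basis for R(A): the pivot columns of A itself

    # Check a dependence relation read off from E: c3 = (1/2)c1 + (3/2)c2.
    c1, c2, c3 = A.col(0), A.col(1), A.col(2)
    assert c3 == Rational(1, 2) * c1 + Rational(3, 2) * c2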
There is a very important relationship between the rank and nullity of a matrix. We
have already seen some indication of it in our considerations of linear systems. Recall
that if an m × n matrix A has rank r, then the general solution to the (consistent)
system Ax = 0 involves n - r free parameters. Specifically (noting that 0 is a
particular solution, and using a characterisation obtained earlier in Chapter 5), the
general solution takes the form

x = s1 u1 + s2 u2 + · · · + s_{n-r} u_{n-r},

where u1, u2, . . . , u_{n-r} are themselves solutions of the system Ax = 0. But the set of
solutions of Ax = 0 is precisely the null space N(A). Thus, the null space is spanned by
the n - r vectors u1, . . . , u_{n-r}, and so its dimension is at most n - r. In fact, it turns
out that its dimension is precisely n - r. That is,

nullity(A) = n - rank(A).

To see this, we need to show that the vectors u1, . . . , u_{n-r} are linearly independent.
Because of the way in which these vectors arise (look at Example 8.12), it will be the
case that for each of them, there is some position where that vector has an entry
equal to 1 and the entry in that same position of all the other vectors is 0. From
this we can see that no non-trivial linear combination of them can be the zero vector, so
they are linearly independent. We have therefore proved the following central result.

Theorem 8.13 (Rank-nullity theorem)  For an m × n matrix A,

rank(A) + nullity(A) = n.
Activity 8.10  Find a basis of the null space of the matrix A from Example 8.12,

A = [ 1   1  2  1 ]
    [ 2   0  1  1 ]
    [ 9  -1  3  4 ]

Verify the rank-nullity theorem for this matrix.
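A minimal machine check of the rank-nullity theorem with SymPy (same matrix A; nullspace and rank are existing methods):

    from sympy import Matrix

    A = Matrix([[1, 1, 2, 1],
                [2, 0, 1, 1],
                [9, -1, 3, 4]])

    null_basis = A.nullspace()     # basis vectors of N(A)
    print(len(null_basis))         # nullity(A): 2
    print(A.rank())                # rank(A): 2
    assert A.rank() + len(null_basis) == A.cols   # rank + nullity = n = 4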


8.6 Learning outcomes


At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by linear independence and linear dependence
determine whether a given set of vectors is linearly independent or linearly
dependent, and in the latter case, find a non-trivial linear combination of the
vectors which equals the zero vector
explain what is meant by a basis, and by the dimension of a finite-dimensional
vector space
find a basis for a linear span
find a basis for the null space, range and row space of a matrix from its reduced
row echelon form
explain how rank and nullity are defined, and the relationship between them (the
rank-nullity theorem).

8.7 Exercises
Exercise 8.1
Show that the following three vectors are linearly independent:

(-2, 1, 1)^T,   (-3, 4, 6)^T,   (2, 3, 2)^T.

Express the vector v = (5, 7, 2)^T as a linear combination of these three vectors.
Exercise 8.2
Let

x1 = (2, 3, 5)^T,   x2 = (1, 1, 2)^T,   v = (a, b, c)^T.

Find a vector x3 such that {x1, x2, x3} is a linearly independent set of vectors.

Find a condition that a, b, c must satisfy for the set of vectors {x1, x2, v} to be linearly
dependent.
Exercise 8.3
Show that any non-empty subset of a linearly independent set of vectors is linearly
independent.
Exercise 8.4
Show that the following vectors are linearly dependent by finding a non-trivial linear
combination of the vectors that equals the zero vector:

(1, 2, 1, 2)^T,   (0, -1, 3, 4)^T,   (-4, 11, -5, 1)^T,   (9, 2, 1, -3)^T.
Exercise 8.5
Let A be any matrix. Let v1 and v2 be two non-zero vectors and suppose that
Av1 = 2v1 and Av2 = 5v2. Prove that {v1, v2} is linearly independent.

(Hint: Assume α1 v1 + α2 v2 = 0; multiply this equation through by A to get a second
equation for v1 and v2; then solve the two equations simultaneously.)

Can you generalise this result?
Exercise 8.6
For each of the sets Si of vectors given below, find a basis of the vector space Lin(Si)
and state its dimension.

S1 = { (1, 2)^T , (2, 3)^T }

S2 = { (1, 1)^T , (0, 0)^T , (2, 2)^T , (3, 3)^T }

S3 = { (1, 0, -1)^T , (2, 1, 3)^T , (1, 2, 9)^T }

S4 = { (1, 1, 1, 2)^T , (4, 4, 1, 8)^T , (2, 0, 1, 2)^T , (1, 2, 1, 3)^T }

Exercise 8.7
Which of the following sets are a basis for R3? (State reasons for your answers.)

S1 = { (2, 2, 3)^T , (4, 1, 0)^T , (7, 1, 0)^T , (1, 2, 1)^T }

S2 = { (1, 0, 1)^T , (1, 1, 1)^T }

S3 = { (-1, 1, 1)^T , (3, 2, -1)^T , (2, 3, 0)^T }

S4 = { (1, 1, 1)^T , (-1, 1, 0)^T , (2, -3, -3)^T }

For any set which is a basis of R3, find the coordinates of the vector w = (1, 2, -1)^T in
that basis.
Exercise 8.8
Write down a basis for the yz-plane in R3 .
Exercise 8.9
Find a basis for the null space of the matrix

[ 1  1  1  0 ]
[ 2  1  0  1 ]


Exercise 8.10
Let

A = [ 1   2  1   1  2 ]
    [ 1   3  0  -2  2 ]
    [ 0   1 -1  -3  4 ]
    [ 1  -2  5  13  5 ]

Find a basis for the column space of A.


Exercise 8.11
Find a basis of the row space, a basis of the range (column space), and a basis of the
null space of the matrix

B = [ 1  2  1  3   0 ]
    [ 0  1  1  1  -1 ]
    [ 1  3  2  0   1 ]

Find the rank of B and verify the rank-nullity theorem.

Let b = c1 + c5, the sum of the first and last columns of the matrix B. Without solving
the system, use the information you have obtained to write down a general solution of
the system of equations Bx = b.
Exercise 8.12
A matrix A is said to have full column rank if and only if the columns of A are linearly
independent. If A is an m × k matrix with full column rank, show that:
(1) A^T A is a symmetric k × k matrix
(2) A^T A is invertible.

Then verify the above results for the matrix

M = [ 1  0 ]
    [ 0  2 ]
    [ 1  1 ]

8.8 Comments on selected activities

Feedback to activity 8.1
The linear combination 0v + 1·0 = 0 is a non-trivial linear combination of the vectors
which is equal to the zero vector. Any set of vectors containing the zero vector, 0, is
linearly dependent.
Feedback to activity 8.2
Let

A = [ 1  1  2 ]
    [ 2  1  5 ]

Then, using row operations, it can be seen that the general solution x to Ax = 0 is
x = (-3r, r, r)^T for r ∈ R. In particular, taking r = 1 and multiplying out the equation
Ax = 0, we have that

-3 (1, 2)^T + (1, 1)^T + (2, 5)^T = (0, 0)^T.

Feedback to activity 8.3
You can argue this directly. Looking at the components of the vector equation

a1 e1 + a2 e2 + · · · + an en = 0

you can see that the positions of the ones and zeros in the vectors lead to the equations
a1 = 0 from the first component, a2 = 0 from the second component, and so on, so that
ai = 0 (1 ≤ i ≤ n) is the only possible solution. Alternatively, the matrix
A = (e1, e2, . . . , en) is the n × n identity matrix, so the only solution to Az = 0 is the
trivial solution, proving that the vectors are linearly independent.
Feedback to activity 8.4
The general solution to the system is

x = (x, y, z)^T = t (3/2, 1/2, 1)^T,   t ∈ R.

Taking t = 1 and multiplying out the equation Ax = 0 gives a non-trivial linear
combination of the three vectors which is equal to the zero vector; rearranging it then
expresses one of the vectors as a linear combination of the other two (with coefficients
3/2 and 1/2).

Feedback to activity 8.5
As noted,

c1 x1 + c2 x2 + · · · + cm xm = c'1 x1 + c'2 x2 + · · · + c'm xm

if and only if

(c1 - c'1) x1 + (c2 - c'2) x2 + · · · + (cm - c'm) xm = 0.

But since the vectors are linearly independent, this can be true only if c1 - c'1 = 0,
c2 - c'2 = 0, and so on. That is, for each i, we must have ci = c'i.
Feedback to activity 8.6
Write each set of vectors as the columns of a matrix:

B = [ 1  -1  1 ]        A = [ 1  1  1 ]
    [ 0   2  2 ]            [ 0  2  2 ]
    [ 1   3  5 ]            [ 1  3  5 ]

|A| ≠ 0, so W is a basis of R3. |B| = 0, so U is not. Because the set U is linearly
dependent, one of the vectors is a linear combination of the other two. Also, any two
vectors of U are linearly independent, since no one of them is a scalar multiple of any
other. Therefore any two vectors from the set U are a basis of Lin(U). Since Lin(U) is a
two-dimensional subspace of R3, it is a plane.


Using the first two vectors in U for the basis, if x ∈ Lin(U),

x = (x, y, z)^T = s (1, 0, 1)^T + t (-1, 2, 3)^T,   s, t ∈ R.

Equating components, you obtain three equations in the two unknowns s and t.
Eliminating s and t between the three equations, you will obtain a single equation
relating x, y and z. Explicitly, we have

x = s - t,   y = 2t,   z = s + 3t,

so

t = y/2,   s = x + t = x + y/2,

and

z = s + 3t = x + y/2 + 3y/2 = x + 2y,

so we have x + 2y - z = 0. This is the Cartesian equation of the plane.

Note that a Cartesian equation could equally well have been obtained by writing the
two basis vectors and the vector x as the columns of a matrix M and using the fact
that |M| = 0 if and only if the columns of M are linearly dependent. That is,

| 1  -1  x |
| 0   2  y |  =  -2x - 4y + 2z  =  0.
| 1   3  z |
Feedback to activity 8.8
If S = {w1, w2, . . . , wr} is a linearly independent set of vectors in W, then we can state
that the only linear combination

a1 w1 + a2 w2 + · · · + ar wr = 0

is the trivial one, with all ai = 0. But all the vectors in W are also in V, and this
statement still holds true, so S is a linearly independent set of vectors in V.
Feedback to activity 8.10
A general solution of the system of equations Ax = 0 is

x = s1 (-1/2, -3/2, 1, 0)^T + s2 (-1/2, -1/2, 0, 1)^T = s1 u1 + s2 u2.

The set {u1, u2} is a basis of the null space of A, so dim(N(A)) = 2. From Example
8.12, rank(A) = 2. The matrix A has n = 4 columns, and

rank(A) + nullity(A) = 2 + 2 = 4 = n.

Note that the basis vectors of the null space give precisely the same linear dependence
relations between the column vectors as those given in the example. Since Au1 = 0 and
Au2 = 0,

Au1 = -(1/2) c1 - (3/2) c2 + c3 = 0   and   Au2 = -(1/2) c1 - (1/2) c2 + c4 = 0.


8.9 Comments on exercises

Solution to exercise 8.1
Call the vectors x1, x2, x3. To show linear independence, you can show that the matrix
A = (x1 x2 x3) has rank 3 using row operations, or show |A| ≠ 0. Since the question
also asks you to express the vector v as a linear combination of the first three, you will
need to solve Ax = v. So you can answer the entire question by reducing the augmented
matrix (A|v) to reduced echelon form. Then the first three columns will be the reduced
row echelon form of A, which will be the identity matrix, showing that the vectors are
linearly independent, and you should obtain the unique solution x = (2, -1, 3)^T. So

(5, 7, 2)^T = 2 (-2, 1, 1)^T - (-3, 4, 6)^T + 3 (2, 3, 2)^T.
Solution to exercise 8.2
There are many ways of solving this problem, and there are infinitely many possible
vectors x3. We can solve it by trying to find a vector x3 such that the matrix (x1 x2 x3)
has rank 3. Another approach is to answer the second part of the question first:
determine which vectors v form a linearly dependent set with x1 and x2 (find the
condition on a, b and c as asked) and then write down any vector whose components do
not satisfy the condition.

Write the three vectors as the columns of a matrix and row reduce it:

A = [ 1  2  a ]
    [ 1  3  b ]
    [ 2  5  c ]

Notice that you can choose to order x1 and x2 so that the row reduction will be easier,
since it makes no difference in this question. The vectors {x1, x2, v} will be linearly
dependent if the row echelon form of A has a row of zeros.

[ 1  2  a ]  R2-R1   [ 1  2  a    ]  R3-R2   [ 1  2  a     ]
[ 1  3  b ]  ----->  [ 0  1  b-a  ]  ----->  [ 0  1  b-a   ]
[ 2  5  c ]  R3-2R1  [ 0  1  c-2a ]          [ 0  0  c-a-b ]

The vectors will be linearly dependent if and only if the components of v satisfy
a + b - c = 0. So choose any vector for x3 which does not satisfy this equation, such as
x3 = (1, 0, 0)^T.

Note that this condition is the equation of a plane in R3 determined by the vectors x1
and x2. The set {x1, x2, v} is linearly dependent if and only if v is the position vector of
a point in this plane.


Solution to exercise 8.3
Suppose that S is a linearly independent set of vectors. So the only linear combination
of vectors in S that can equal the zero vector is the trivial linear combination (in which
all the coefficients are 0). Now, suppose R = {x1, x2, . . . , xr} is some subset of S and
suppose that

α1 x1 + α2 x2 + · · · + αr xr = 0.

The xi for i = 1, 2, . . . , r are some of the vectors in S. So S will contain these vectors
and some others (let's say k others): that is, for some vectors x_{r+1}, . . . , x_{r+k}, S will
equal the set {x1, . . . , xr, x_{r+1}, . . . , x_{r+k}}. So we can in fact consider the left-hand side
of the equation to be a linear combination of all the vectors in S, and we have

α1 x1 + α2 x2 + · · · + αr xr + 0 x_{r+1} + · · · + 0 x_{r+k} = 0.

By linear independence of S, it follows that all the coefficients are 0 and, in particular,
α1 = α2 = · · · = αr = 0. It follows that R is a linearly independent set of vectors.
Solution to exercise 8.4
Let

A = [ 1   0  -4   9 ]
    [ 2  -1  11   2 ]
    [ 1   3  -5   1 ]
    [ 2   4   1  -3 ]

the matrix with columns equal to the given vectors. If we only needed to show that the
vectors were linearly dependent, it would suffice to show, using row operations, that
rank(A) < 4. But we're asked for more: we have to find an explicit non-trivial linear
combination that equals the zero vector. So we need to find a non-trivial solution of
Ax = 0. One solution is x = (-5, 3, 1, 1)^T. (You should use row operations to find
this. The details are omitted here.) This means that

-5 (1, 2, 1, 2)^T + 3 (0, -1, 3, 4)^T + (-4, 11, -5, 1)^T + (9, 2, 1, -3)^T = 0.
Solution to exercise 8.5
To prove that {v1, v2} is linearly independent, assume that α1 and α2 are scalars
such that

α1 v1 + α2 v2 = 0.    (*)

Then

A(α1 v1 + α2 v2) = 0
α1 Av1 + α2 Av2 = 0
α1 (2v1) + α2 (5v2) = 0
2α1 v1 + 5α2 v2 = 0.

Add this last equation to -2 times equation (*) to obtain 3α2 v2 = 0. Since v2 ≠ 0, we
must have α2 = 0. Substituting back into either equation gives α1 v1 = 0, so that α1 = 0
since v1 ≠ 0. This shows that v1, v2 are linearly independent.

Generalisation 1. The same proof works for any constants: Av1 = λv1, Av2 = μv2,
provided λ ≠ μ.

Generalisation 2. It also extends to three (or more) non-zero vectors: say, Av1 = λv1,
Av2 = μv2, Av3 = νv3 with λ, μ, ν distinct constants (that is, no two are equal).
Solution to exercise 8.6
S1 spans R2 since

| 1  2 |
| 2  3 |  ≠ 0.

A basis of Lin(S1) = R2 is S1 itself, or {e1, e2}. R2 has dimension 2.

A basis of Lin(S2) is {(1, 1)^T} (since all the other vectors are scalar multiples of it).
The set spans the line x = t(1, 1)^T, with Cartesian equation y = x. This is a
one-dimensional subspace of R2.

To find a basis of Lin(S3) or Lin(S4), write the vectors as the columns of a matrix A.
You can either row reduce A^T, and the non-zero rows will be a basis, or you can row
reduce A, and the columns of A corresponding to the columns with leading ones in the
echelon form will be a basis. Using the first method for S3 (row reduce A^T),

[ 1  0  -1 ]      [ 1  0  -1 ]      [ 1  0  -1 ]
[ 2  1   3 ] -->  [ 0  1   5 ] -->  [ 0  1   5 ]
[ 1  2   9 ]      [ 0  2  10 ]      [ 0  0   0 ]

Then a basis is given by the top two rows, transposed. The set S3 spans a plane
(through the origin) in R3; Lin(S3) has dimension 2.

Using the second method for S4, write the vectors as the columns of a matrix A and
reduce A to echelon form. Columns 1, 2 and 3 have leading ones, so the first, second
and third vectors form a basis of Lin(S4), which is a three-dimensional subspace of R4.
Solution to exercise 8.7
S1 is not a basis. No set of four vectors in R3 can be linearly independent.

S2 is not a basis. Two vectors cannot span R3: there must be at least three.

S3 is not a basis. Either notice that the third vector is the sum of the first two, or
reduce A^T to echelon form and show it has a row of zeros, or show |A| = 0. The set is
not linearly independent.

S4 is a basis. You can reduce A^T to echelon form, or you can compute |A| = -5 ≠ 0,
which shows that Ax = b has a unique solution for all b ∈ R3.

To find the coordinates of w = (1, 2, -1)^T in the basis S4 you need to find the unique
solution of

α (1, 1, 1)^T + β (-1, 1, 0)^T + γ (2, -3, -3)^T = (1, 2, -1)^T.

Reduce the augmented matrix to reduced echelon form to find α = 2, β = 3, γ = 1,
so that

[w]_{S4} = (2, 3, 1)^T_{S4}.
Solution to exercise 8.8
The yz-plane is the set of all vectors of the form (0, y, z)^T. So the set of vectors {e2, e3}
is a basis of the yz-plane.

Solution to exercise 8.9
We omit the details. A basis for the null space is {(1, -2, 1, 0)^T, (-1, 1, 0, 1)^T}. There
are many other possible answers.
Solution to exercise 8.10
A basis for the column space is

{ (1, 1, 0, 1)^T , (2, 3, 1, -2)^T , (2, 2, 4, 5)^T }.

Details of the calculations are omitted.
Solution to exercise 8.11
Reduce the matrix B

B = [ 1  2  1  3   0 ]
    [ 0  1  1  1  -1 ]
    [ 1  3  2  0   1 ]

to reduced echelon form:

[ 1  2  1  3   0 ]      [ 1  2  1   3   0 ]      [ 1  2  1  3     0  ]
[ 0  1  1  1  -1 ] -->  [ 0  1  1   1  -1 ] -->  [ 0  1  1  1    -1  ]
[ 1  3  2  0   1 ]      [ 0  1  1  -3   1 ]      [ 0  0  0  1  -1/2  ]

                        [ 1  2  1  0   3/2 ]      [ 1  0  -1  0   5/2 ]
                   -->  [ 0  1  1  0  -1/2 ] -->  [ 0  1   1  0  -1/2 ]
                        [ 0  0  0  1  -1/2 ]      [ 0  0   0  1  -1/2 ]

The leading ones are in the first, second and fourth columns.

A basis of the row space is:

{ (1, 0, -1, 0, 5/2)^T , (0, 1, 1, 0, -1/2)^T , (0, 0, 0, 1, -1/2)^T } ⊂ R5.

A basis of the range of B is:

{ (1, 0, 1)^T , (2, 1, 3)^T , (3, 1, 0)^T } ⊂ R3.

So the range of B is all of R3.

From the reduced echelon form of B, we can find the solution of Bx = 0. Setting the
non-leading variables x3 = s and x5 = t, we have

x = (x1, x2, x3, x4, x5)^T = (s - (5/2)t, -s + (1/2)t, s, (1/2)t, t)^T
  = s (1, -1, 1, 0, 0)^T + t (-5/2, 1/2, 0, 1/2, 1)^T = s u1 + t u2.

The set {u1, u2} ⊂ R5 is a basis of the null space.

Therefore rank(B) = 3 and nullity(B) = 2. The number of columns is n = 5, so that

rank(B) + nullity(B) = 3 + 2 = 5 = n.

If b = c1 + c5, then p = (1, 0, 0, 0, 1)^T is a solution of Bx = b. (Why?) Combining
this with the solution of the system Bx = 0, a general solution of Bx = b is

x = (1, 0, 0, 0, 1)^T + s (1, -1, 1, 0, 0)^T + t (-5/2, 1/2, 0, 1/2, 1)^T = p + s u1 + t u2.
Solution to exercise 8.12
You have already shown that A^T A is symmetric as an exercise in Chapter 2. To show
it is invertible, you could show that (A^T A)v = 0 has only the trivial solution, v = 0,
which implies that A^T A is invertible by Theorem 5.2.

Let us multiply A^T Av = 0 on the left by v^T to get v^T A^T Av = 0. Now,
v^T A^T Av = (Av)^T (Av). For any vector w, w^T w = ||w||², so we have ||Av||² = 0,
which implies Av is the zero vector.

But the columns of A are linearly independent (A has full column rank), so Av = 0 has
only the trivial solution v = 0. Hence A^T A is invertible.

The columns of the 3 × 2 matrix M are linearly independent since they are not scalar
multiples of one another. The 2 × 2 matrix M^T M is

M^T M = [ 1  0  1 ] [ 1  0 ]     [ 2  1 ]
        [ 0  2  1 ] [ 0  2 ]  =  [ 1  5 ]
                    [ 1  1 ]

which is symmetric, and invertible since |M^T M| = 9 ≠ 0.
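A quick numerical check of this exercise, sketched with NumPy (the matrix M is the one given above; the allclose comparison is a practical choice for floating-point symmetry testing):

    import numpy as np

    M = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])

    G = M.T @ M                  # the 2x2 matrix M^T M
    print(np.allclose(G, G.T))   # True: M^T M is symmetric
    print(np.linalg.det(G))      # 9.0, non-zero, so M^T M is invertible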


Chapter 9
Linear transformations, change of basis
Suggested Reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 7.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 8 and Chapter 4,
Section 4.6.

Aims of the chapter


In this chapter we consider linear functions between vector spaces, known as linear
transformations. We will look at the matrix representations of linear transformations
between Euclidean vector spaces, and discuss the concept of similarity of matrices.
These ideas will then be put to use for a change of basis of a vector space, and hence, a
change of coordinates. This provides the fundamental theoretical underpinning for the
technique of diagonalisation, which has many applications (to be discussed later).

9.1 Linear transformations

We now turn attention to linear mappings (or linear transformations, as they are usually
called) between vector spaces.

Definition 9.1 (Linear transformation)  Suppose that V and W are (real) vector
spaces. A function T : V → W is linear if for all u, v ∈ V and all λ ∈ R,

1. T(u + v) = T(u) + T(v) and
2. T(λu) = λT(u).

T is said to be a linear transformation (or linear mapping or linear function).
Equivalently, T is linear if for all u, v ∈ V and α, β ∈ R,

T(αu + βv) = αT(u) + βT(v).

(This single condition implies the two in the definition, and is implied by them.)


Activity 9.1 Prove that this single condition is equivalent to the two of the
definition.
Sometimes you will see T (u) written simply as T u.

9.2 Examples

Example 9.1  Let V = Rn and W = Rm and suppose that A is an m × n matrix.
Let TA be the function given by TA(x) = Ax for x ∈ Rn. That is, TA is simply
multiplication by A. Then TA is a linear transformation. This is easily checked, as
follows: first,

TA(u + v) = A(u + v) = Au + Av = TA(u) + TA(v).

Next,

TA(λu) = A(λu) = λAu = λTA(u).

So the two linearity conditions are satisfied. We call TA the linear transformation
corresponding to A.
Example 9.2 (More complicated)  Let us take V = Rn and take W to be the vector
space of all functions f : R → R (with pointwise addition and scalar multiplication).
Define a function T : Rn → W as follows:

T(u) = T((u1, u2, . . . , un)^T) = p_{u1,u2,...,un} = p_u,

where p_u = p_{u1,u2,...,un} is the polynomial function given by

p_{u1,u2,...,un}(x) = u1 x + u2 x² + u3 x³ + · · · + un x^n.

Then T is a linear transformation. To check this we need to verify that

T(u + v) = T(u) + T(v)   and   T(λu) = λT(u).

Now, T(u + v) = p_{u+v}, T(u) = p_u, and T(v) = p_v, so we need to check that
p_{u+v} = p_u + p_v. This is in fact true, since, for all x,

p_{u+v}(x) = p_{u1+v1,...,un+vn}(x)
           = (u1 + v1)x + (u2 + v2)x² + · · · + (un + vn)x^n
           = (u1 x + u2 x² + · · · + un x^n) + (v1 x + v2 x² + · · · + vn x^n)
           = p_u(x) + p_v(x)
           = (p_u + p_v)(x).

The fact that for all x, p_{u+v}(x) = (p_u + p_v)(x) means that the functions p_{u+v} and
p_u + p_v are identical. The fact that T(λu) = λT(u) is similarly proved, and you
should try it!


Activity 9.2  Prove that T(λu) = λT(u).

9.3 Linear transformations and matrices

In what follows, we consider only linear transformations from Rn to Rm (for some m
and n). But much of what we say can be extended to linear transformations mapping
from any finite-dimensional vector space to any other finite-dimensional vector space.
We have seen that any m × n matrix A gives a linear transformation TA : Rn → Rm (the
linear transformation corresponding to A), given by TA(u) = Au. There is a reverse
connection: for every linear transformation T : Rn → Rm there is a matrix A such that
T = TA.

Theorem 9.1  Suppose that T : Rn → Rm is a linear transformation and let
{e1, e2, . . . , en} denote the standard basis of Rn. If A = AT is the matrix whose columns
are the vectors T(e1), T(e2), . . . , T(en), that is,

A = (T(e1) T(e2) . . . T(en)),

then T = TA: that is, for every u ∈ Rn, T(u) = Au.
Proof
Let u = (u1, u2, . . . , un)^T be any vector in Rn. Then

AT u = (T(e1) T(e2) . . . T(en)) (u1, u2, . . . , un)^T
     = u1 T(e1) + u2 T(e2) + · · · + un T(en)
     = T(u1 e1) + T(u2 e2) + · · · + T(un en)
     = T(u1 e1 + u2 e2 + · · · + un en).

But

u1 e1 + u2 e2 + · · · + un en = (u1, u2, . . . , un)^T = u,

so we have (exactly as we wanted) AT u = T(u).

Thus, to each matrix A there corresponds a linear transformation TA, and to each linear
transformation T there corresponds a matrix AT. Note that the matrix AT we found
was determined by using the standard basis in both vector spaces: later in this chapter
we will generalise this to use other bases.

Example 9.3  Let T : R3 → R3 be the linear transformation given by

T((x, y, z)^T) = (x + y + z, x - y, x + 2y - 3z)^T.

In particular, if u = (1, 2, 3)^T, then T(u) = (6, -1, -4)^T.

To find the matrix of this linear transformation we need the images of the standard
basis vectors. We have that

T(e1) = (1, 1, 1)^T,   T(e2) = (1, -1, 2)^T,   T(e3) = (1, 0, -3)^T.

The matrix representing T is AT = (T(e1) T(e2) T(e3)), which is

AT = [ 1   1   1 ]
     [ 1  -1   0 ]
     [ 1   2  -3 ]

Notice that the entries of the matrix AT are just the coefficients of x, y, z in the
definition of T.
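This construction is entirely mechanical: apply T to each standard basis vector and use the results as columns. A minimal sketch in Python for the T of Example 9.3:

    import numpy as np

    def T(v):
        """The linear transformation of Example 9.3."""
        x, y, z = v
        return np.array([x + y + z, x - y, x + 2*y - 3*z])

    # Columns of A_T are the images of the standard basis vectors.
    e = np.eye(3)
    A_T = np.column_stack([T(e[:, i]) for i in range(3)])
    print(A_T)                    # [[1, 1, 1], [1, -1, 0], [1, 2, -3]]

    u = np.array([1.0, 2.0, 3.0])
    print(T(u), A_T @ u)          # both give [6, -1, -4]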

9.3.1  Rotation in R2

We find the matrix A that represents the linear transformation T : R2 → R2 which is
rotation anticlockwise by an angle θ about the origin. Let the images of the standard
basis vectors e1 and e2 be the vectors

T(e1) = (a, c)^T,   T(e2) = (b, d)^T,

so that

AT = [ a  b ]
     [ c  d ]

We need to determine the coordinates a, c, b, d. It is helpful to draw a diagram of R2
showing the images T(e1) and T(e2) after rotation anticlockwise by an angle θ,
0 < θ < π/2.

[Figure: the unit vectors e1 and e2 rotated anticlockwise by θ, with T(e1) = (a, c) and T(e2) = (b, d).]

The vectors T(e1) = (a, c)^T and T(e2) = (b, d)^T are orthogonal and each has length
one, since they are the rotated standard basis vectors. Drop a perpendicular from the
point (a, c) to the x-axis, forming a right triangle with angle θ at the origin. Since the
x-coordinate of the rotated vector is a and the y-coordinate is c, the side opposite the
angle θ has length c and the side adjacent to the angle θ has length a. The hypotenuse
of this triangle (which is the rotated unit vector e1) has length equal to one. We
therefore have a = cos θ and c = sin θ. Similarly, drop the perpendicular from the point
(b, d) to the x-axis and observe that the angle opposite the x-axis is equal to θ. Again,
basic trigonometry tells us that the x-coordinate is b = -sin θ (it has length sin θ and is
in the negative x-direction), and the height is d = cos θ. Therefore,

A = [ a  b ]  =  [ cos θ  -sin θ ]
    [ c  d ]     [ sin θ   cos θ ]

is the matrix of rotation anticlockwise by an angle θ. Although we have shown this
using an angle 0 < θ < π/2, the argument can be extended to any angle θ.

In particular, if θ = π/4, then rotation anticlockwise by π/4 radians is given by the
matrix

B = [ cos π/4  -sin π/4 ]  =  [ 1/√2  -1/√2 ]
    [ sin π/4   cos π/4 ]     [ 1/√2   1/√2 ]

Activity 9.3  Confirm this by sketching the vectors e1 and e2 and the image vectors

T(e1) = (1/√2, 1/√2)^T   and   T(e2) = (-1/√2, 1/√2)^T.

What is the matrix of the linear transformation which is a rotation anticlockwise by
π radians? What is the matrix of the linear transformation which is a reflection in
the y-axis? Think about what each of these two transformations does to the
standard basis vectors e1 and e2.
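A short numerical sketch of the rotation matrix (NumPy; the angle π/4 matches the example above):

    import numpy as np

    def rotation(theta):
        """Matrix of anticlockwise rotation by theta about the origin."""
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    B = rotation(np.pi / 4)
    print(B @ np.array([1.0, 0.0]))   # T(e1) = (1/sqrt(2), 1/sqrt(2))
    print(B @ np.array([0.0, 1.0]))   # T(e2) = (-1/sqrt(2), 1/sqrt(2))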

9.3.2  Identity and zero linear transformations

If V is a vector space, we can define a linear transformation T : V → V by T(v) = v,
called the identity linear transformation. If V = Rn, the matrix of this linear
transformation is I, the n × n identity matrix.

219

9. Linear transformations, change of basis

There is also a linear transformation T : V → W defined by T(v) = 0. If V = Rn and
W = Rm, the matrix of this linear transformation is an m × n matrix consisting entirely
of zeros.

9.3.3  Composition and combinations of linear transformations

The composition of linear transformations is again a linear transformation. If
T : V → W and S : W → U, then ST is the linear transformation given by

ST(v) = S(T(v)) = S(w) = u,

where w = T(v). Note that ST means do T and then do S: V → W → U. (For ST,
work from the inside out.)

In terms of matrices,

ST(v) = S(T(v)) = S(AT v) = AS AT v.

That is, A_{ST} = AS AT. The matrix of the composition is obtained by matrix
multiplication of the matrices of the linear transformations. The order is important:
composition of linear transformations, like multiplication of matrices, is not
commutative.

A linear combination of linear transformations is again a linear transformation. If
S, T : V → W are linear transformations between the same vector spaces, then S + T
and λS, λ ∈ R, are linear transformations, and therefore so is αS + βT for any
α, β ∈ R.

9.3.4  Inverse linear transformations

If V and W are finite-dimensional vector spaces of the same dimension, then the inverse
of a linear transformation T : V → W is the linear transformation T^{-1} : W → V such
that

T^{-1}(T(v)) = v.

If T^{-1} exists, then its matrix satisfies T^{-1}(T(v)) = A_{T^{-1}} AT v = Iv. That is, T^{-1} exists
if and only if (AT)^{-1} exists, and (AT)^{-1} = A_{T^{-1}}.
Example 9.4  In R2, the inverse of rotation anticlockwise by an angle θ is rotation
clockwise by the same angle. Thinking of clockwise rotation by θ as anticlockwise
rotation by an angle -θ, the matrix of rotation clockwise by θ is given by

A_{T^{-1}} = [ cos(-θ)  -sin(-θ) ]  =  [  cos θ  sin θ ]
             [ sin(-θ)   cos(-θ) ]     [ -sin θ  cos θ ]

This is easily checked:

A_{T^{-1}} AT = [  cos θ  sin θ ] [ cos θ  -sin θ ]  =  [ 1  0 ]
                [ -sin θ  cos θ ] [ sin θ   cos θ ]     [ 0  1 ]

Activity 9.4  Check this by multiplying the matrices. (You should note that
sin²θ + cos²θ = 1: see the subject guide for MT1174 Calculus.)
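A one-line numerical confirmation of the same fact (NumPy; any angle will do, 0.7 is an arbitrary choice):

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    R_inv = np.array([[ np.cos(theta), np.sin(theta)],
                      [-np.sin(theta), np.cos(theta)]])

    print(np.allclose(R_inv @ R, np.eye(2)))   # True: clockwise undoes anticlockwise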

Example 9.5  Is there an inverse to the first example we considered,

T((x, y, z)^T) = (x + y + z, x - y, x + 2y - 3z)^T ?

We found

AT = [ 1   1   1 ]
     [ 1  -1   0 ]
     [ 1   2  -3 ]

Since |AT| = 9, the matrix is invertible, and

AT^{-1} = (1/9) [ 3   5   1 ]
                [ 3  -4   1 ]
                [ 3  -1  -2 ]

That is, T^{-1} is given by the matrix AT^{-1}:

T^{-1}((u, v, w)^T) = ( (1/3)u + (5/9)v + (1/9)w,
                        (1/3)u - (4/9)v + (1/9)w,
                        (1/3)u - (1/9)v - (2/9)w )^T.
Activity 9.5  Check that T^{-1}T = I.

9.3.5  Linear transformations from V to W

Theorem 9.2  Let V be a finite-dimensional vector space and let T be a linear
transformation from V to a vector space W. Then T is completely determined by what
it does to a basis of V.

Proof
Let dim(V) = n, and let B = {v1, v2, . . . , vn} be a basis of V. Then any v ∈ V can be
uniquely expressed as a linear combination of these basis vectors,

v = a1 v1 + a2 v2 + · · · + an vn.

Then

T(v) = T(a1 v1 + a2 v2 + · · · + an vn) = a1 T(v1) + a2 T(v2) + · · · + an T(vn).

That is, the image T(v) is the same linear combination of the images of the basis
vectors.


9.4 Range and null space

Just as we have the range and null space of a matrix, so we have the range and null
space of a linear transformation, defined as follows.

Definition 9.2 (Range and null space of a linear transformation)  Suppose that T
is a linear transformation from a vector space V to a vector space W. Then the range,
R(T), of T is

R(T) = { T(v) : v ∈ V },

and the null space, N(T), of T is

N(T) = { v ∈ V : T(v) = 0 },

where 0 denotes the zero vector of W.

The null space is also called the kernel, and may be denoted ker(T) in some texts.
Of course, for any matrix A, R(TA) = R(A) and N(TA) = N(A).
Activity 9.6 Prove this last statement.
The range and null space of a linear transformation T : V → W are subspaces of W and
V, respectively.
Activity 9.7 Prove this!

Example 9.6  We find the null space and range of the linear transformation
S : R2 → R4 given by

S((x, y)^T) = (x + y, x, x - y, y)^T.

The matrix of the linear transformation is

AS = [ 1   1 ]
     [ 1   0 ]
     [ 1  -1 ]
     [ 0   1 ]

Observe that this matrix has rank 2 (it has two linearly independent columns, or
you could alternatively see this by putting it into row echelon form), so that
N(S) = {0}, the subspace of R2 consisting of only the zero vector. This can also be
seen directly from the fact that

(x + y, x, x - y, y)^T = (0, 0, 0, 0)^T  ⟹  x = 0, y = 0.

The range, R(S), is the two-dimensional subspace of R4 with basis given by the
column vectors of AS.


9.5 Rank and nullity

If V and W are both finite-dimensional, then so are R(T) and N(T). We define the rank
of T, rank(T), to be dim(R(T)) and the nullity of T, nullity(T), to be dim(N(T)). As for
matrices, there is a strong link between these two dimensions.

Theorem 9.3 (Dimension theorem, or rank-nullity theorem for linear transformations)
Suppose that T is a linear transformation from the finite-dimensional vector space V to
the vector space W. Then

rank(T) + nullity(T) = dim(V).

(Note that this result holds even if W is not finite-dimensional.)
Proof
Assume that dim(V) = n and nullity(T) = k. We need to show that rank(T) = n - k. Let
{v1, v2, . . . , vk} be a basis of the null space, N(T). As N(T) is a subspace of V, we can
extend this basis to a basis of V, {v1, v2, . . . , vk, v_{k+1}, . . . , vn} (by Theorem 8.11). For
any v ∈ V we have v = a1 v1 + a2 v2 + · · · + an vn. Then

T(v) = a1 T(v1) + · · · + ak T(vk) + a_{k+1} T(v_{k+1}) + · · · + an T(vn)
     = a_{k+1} T(v_{k+1}) + · · · + an T(vn),

since T(vi) = 0 for i = 1, . . . , k. Hence the vectors T(v_{k+1}), . . . , T(vn) span the range,
R(T). If they are a basis of R(T), then rank(T) = n - k. So it only remains to show that
they are linearly independent.

If there is a linear combination of the vectors equal to the zero vector,

b_{k+1} T(v_{k+1}) + · · · + bn T(vn) = T(b_{k+1} v_{k+1} + · · · + bn vn) = 0,

then the vector b_{k+1} v_{k+1} + · · · + bn vn is in the null space of T, and can be written as
a linear combination of the basis vectors of N(T),

b_{k+1} v_{k+1} + · · · + bn vn = b1 v1 + · · · + bk vk.

Rearranging, we have

b1 v1 + · · · + bk vk - b_{k+1} v_{k+1} - · · · - bn vn = 0.

But {v1, v2, . . . , vk, v_{k+1}, . . . , vn} is a basis of V, hence all coefficients bi = 0. This
shows that T(v_{k+1}), . . . , T(vn) are linearly independent, and the theorem is proved.

For an m × n matrix A, if T = TA, then T is a linear transformation from V = Rn to
W = Rm, and rank(T) = rank(A), nullity(T) = nullity(A), so this theorem restates the
earlier result that

rank(A) + nullity(A) = n.


Example 9.7  Is it possible to construct a linear transformation T : R3 → R3 with

N(T) = { t (1, 2, 3)^T : t ∈ R },   R(T) = the xy-plane?

A linear transformation T : R3 → R3 must satisfy the dimension theorem with n = 3:

nullity(T) + rank(T) = 3.

Since the dimension of the null space of T is 1 and the dimension of R(T) is 2, the
rank-nullity theorem is satisfied, so at this stage we certainly can't rule out the
possibility that such a linear transformation exists. (Of course, if it was not satisfied,
we'd know straight away that we couldn't have a linear transformation of the type
suggested.)

To find a linear transformation T with N(T) and R(T) as above, we construct a
matrix AT, which must be 3 × 3 since T : R3 → R3. Note that if R(AT) = R(T) is
the xy-plane, then the column vectors of AT must be linearly dependent and
include a basis for this plane. You can take any two linearly independent vectors in
the xy-plane to be the first two columns of the matrix, and the third column must
be a linear combination of the first two. The linear dependency condition they must
satisfy is given by the basis of the null space.

For example, we take the first two column vectors to be the standard basis vectors,
c1 = e1 and c2 = e2. Then using the null space basis vector v = (1, 2, 3)^T, AT v = 0:

AT v = (c1 c2 c3) (1, 2, 3)^T = 1 c1 + 2 c2 + 3 c3 = 0.

Therefore we must have c3 = -(1/3) c1 - (2/3) c2, so that one possible linear
transformation satisfying these conditions is given by the matrix

AT = [ 1  0  -1/3 ]
     [ 0  1  -2/3 ]
     [ 0  0    0  ]
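A quick check of this construction with SymPy (the matrix AT is the one just built):

    from sympy import Matrix, Rational

    A_T = Matrix([[1, 0, Rational(-1, 3)],
                  [0, 1, Rational(-2, 3)],
                  [0, 0, 0]])

    print(A_T * Matrix([1, 2, 3]))   # the zero vector: (1, 2, 3)^T is in N(T)
    print(A_T.rank())                # 2: the range is a plane (the xy-plane)
    print(A_T.nullspace())           # one basis vector, a multiple of (1, 2, 3)^T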

9.6 Coordinate change

Suppose that the vectors v1, v2, . . . , vn form a basis B for Rn. Then, as we have seen,
any x ∈ Rn can be written in exactly one way as a linear combination

x = α1 v1 + α2 v2 + · · · + αn vn

of the vectors in the basis. The vector

[x]B = (α1, α2, . . . , αn)^T

is called the coordinate vector of x with respect to the basis B = {v1, v2, . . . , vn}.

One very straightforward observation is that the coordinate vector of any x ∈ Rn with
respect to the standard basis is just x itself. This is because if x = (x1, x2, . . . , xn)^T,
then

x = x1 e1 + x2 e2 + · · · + xn en.

What is less immediately obvious is how to find the coordinates of a vector x with
respect to a basis other than the standard one.
Example 9.8  Suppose that we let B be the following basis of R3:

B = { (1, 2, 3)^T , (2, -1, 3)^T , (3, 2, 1)^T }.

If x is the vector (5, 7, 2)^T, then the coordinate vector of x with respect to B is

[x]B = (1, -1, 2)^T_B,

because

x = 1 (1, 2, 3)^T + (-1) (2, -1, 3)^T + 2 (3, 2, 1)^T.

To find the coordinates of a vector with respect to a basis {v1, v2, . . . , vn}, we need to
solve the system of linear equations

a1 v1 + a2 v2 + · · · + an vn = x,

which in matrix form is

(v1 v2 . . . vn) a = x

with a = (a1, a2, . . . , an)^T. In other words, let PB be the matrix whose columns are
the basis vectors (in order),

PB = (v1 v2 . . . vn).

Then for any x ∈ Rn,

x = PB [x]B.

The matrix PB is invertible (because its columns are linearly independent, and hence its
rank is n). So we can also write

[x]B = PB^{-1} x.


Definition 9.3 (Transition matrix)  If B = {v1, v2, . . . , vn} is a basis of Rn, the matrix

PB = (v1 v2 . . . vn)

whose columns are the B basis vectors is called the transition matrix from B
coordinates to standard coordinates. The matrix PB^{-1} is then the transition matrix
from standard coordinates to coordinates in the basis B.

Note that, considered as the matrix of a linear transformation P(x) = PB x, the
transition matrix from B coordinates to standard coordinates actually maps the
standard basis vectors ei to the new basis vectors vi. That is, P(ei) = vi.
Example 9.9  Suppose we wish to change basis in R2 by a rotation of the axes π/4
radians anticlockwise. What are the coordinates of a vector with respect to this new
basis, B = {v1, v2}?

The matrix of the linear transformation which performs this rotation is given by

AT = [ cos π/4  -sin π/4 ]  =  [ 1/√2  -1/√2 ]  =  PB,
     [ sin π/4   cos π/4 ]     [ 1/√2   1/√2 ]

where the column vectors of the matrix are the new basis vectors v1, v2, so the
matrix is also the transition matrix from B coordinates to standard coordinates;
that is, we have v = PB [v]B. Then the coordinates of a vector with respect to the
new basis are given by [v]B = PB^{-1} v. The inverse of rotation anticlockwise is
rotation clockwise, so we have

PB^{-1} = [ cos(-π/4)  -sin(-π/4) ]  =  [  cos π/4  sin π/4 ]  =  [  1/√2  1/√2 ]
          [ sin(-π/4)   cos(-π/4) ]     [ -sin π/4  cos π/4 ]     [ -1/√2  1/√2 ]

From a different viewpoint, consider the vector

x = (1, 1)^T = √2 v1.

What are its coordinates in the new basis B? We can find these directly, since we
have x = √2 v1 + 0 v2, and in B coordinates

[v1]B = (1, 0)^T_B   and   [v2]B = (0, 1)^T_B,

so that

[x]B = (√2, 0)^T_B.

Note that x = PB [x]B = (1, 1)^T, as expected.

Given a basis B of Rn with transition matrix PB, and another basis B' with transition
matrix PB', how do we change from coordinates in the basis B to coordinates in the
basis B'?
basis B ?
The answer is quite simple. First we change from B coordinates to standard coordinates
using v = PB [v]B, and then change from standard coordinates to B' coordinates using
[v]B' = PB'^{-1} v. That is,

[v]B' = PB'^{-1} PB [v]B.

The matrix M = PB'^{-1} PB is the transition matrix from B coordinates to B' coordinates.
In practice, the easiest way to obtain the matrix M is as the product of the two
transition matrices, M = PB'^{-1} PB. But let's look more closely at the matrix M. If the
basis B is the set of vectors B = {v1, v2, . . . , vn}, then these are the columns of the
transition matrix, PB = (v1 v2 . . . vn). Looking closely at the columns of the product
matrix,

M = PB'^{-1} PB = PB'^{-1} (v1 v2 . . . vn) = (PB'^{-1} v1  PB'^{-1} v2  . . .  PB'^{-1} vn);

that is, each column of the matrix M is obtained by multiplying the matrix PB'^{-1} by
the corresponding column of PB. But PB'^{-1} vi is just the B' coordinates of the vector
vi, so the matrix M is given by

M = ([v1]B' [v2]B' . . . [vn]B').

We have, therefore, established the following result.

Theorem 9.4  If B and B' are two bases of Rn, with B = {v1, v2, . . . , vn}, then the
transition matrix from B coordinates to B' coordinates is given by

M = ([v1]B' [v2]B' . . . [vn]B').
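A sketch of a change of basis in code (NumPy; the two bases here are small invented examples, not taken from the text):

    import numpy as np

    # Transition matrices: columns are the basis vectors of B and B'.
    P_B  = np.array([[1.0, 1.0],
                     [0.0, 1.0]])
    P_Bp = np.array([[2.0, 0.0],
                     [0.0, 1.0]])

    # Transition matrix from B coordinates to B' coordinates: M = P_B'^{-1} P_B.
    M = np.linalg.solve(P_Bp, P_B)   # equivalent to inv(P_Bp) @ P_B

    x_B = np.array([3.0, -1.0])      # coordinates of some x in basis B
    x_Bp = M @ x_B                   # the same vector in B' coordinates
    print(np.allclose(P_B @ x_B, P_Bp @ x_Bp))   # True: same vector either way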

9.7 Change of basis and similarity


We have already seen that if T is a linear transformation from Rn to Rm, then there is a
corresponding matrix AT such that T(x) = AT x for all x. The matrix AT is given by

AT = (T(e1) T(e2) . . . T(en)).

This matrix is obtained using the standard basis in both Rn and Rm.

Now suppose that B is a basis of Rn and B' a basis of Rm, and suppose we want to
know the coordinates [T(x)]B' of T(x) with respect to B', given the coordinates [x]B of
x with respect to B. Is there a matrix M such that

[T(x)]B' = M [x]B

for all x? Indeed there is, as the following result shows.

Theorem 9.5  Suppose that B = {v1, . . . , vn} and B' = {v'1, . . . , v'm} are (ordered)
bases of Rn and Rm, and that T : Rn → Rm is a linear transformation. Let
M = AT[B, B'] be the m × n matrix with ith column equal to [T(vi)]B', the coordinate
vector of T(vi) with respect to the basis B'. Then for all x, [T(x)]B' = M [x]B.


The matrix AT[B, B'] is called the matrix representing T with respect to the bases B
and B'. A direct proof of this theorem can be found in the texts,¹ but instead let's take
the indirect approach: let's look at the stages of transition which occur in changing basis
from B to standard, performing the linear transformation in standard coordinates, and
then changing to the basis B'.

It is instructive (and more useful) to find the matrix AT[B, B'] in terms of the matrix
AT (representing T in standard coordinates) and the transition matrices for the bases B
and B'. Let PB and PB' be, respectively, the transition matrix from B coordinates to
standard coordinates in Rn and the transition matrix from B' coordinates to standard
coordinates in Rm. (So PB is an n × n matrix having the basis vectors of B as columns
and PB' is an m × m matrix having the basis vectors of B' as columns.) Then we know
that for any v ∈ Rn, v = PB [v]B. Similarly, for any u ∈ Rm, u = PB' [u]B', so
[u]B' = PB'^{-1} u. We therefore have (taking u = T(x))

[T(x)]B' = PB'^{-1} T(x).

Now, for any x ∈ Rn (in standard coordinates), T(x) = AT x, where

AT = (T(e1) T(e2) . . . T(en))

is the matrix corresponding to T. So we have (taking v = x)

[T(x)]B' = PB'^{-1} T(x) = PB'^{-1} AT x = PB'^{-1} AT PB [x]B = (PB'^{-1} AT PB) [x]B.

Since this is true for all x, we have therefore obtained the following result.

Theorem 9.6  Suppose that T : Rn → Rm is a linear transformation, that B is a basis
of Rn and B' is a basis of Rm. Let PB and PB' be the matrices whose columns are,
respectively, the vectors of B and B'. Then the matrix representing T with respect to B
and B' is given by

AT[B, B'] = PB'^{-1} AT PB,

where

AT = (T(e1) T(e2) . . . T(en)).

So, for all x,

[T(x)]B' = PB'^{-1} AT PB [x]B.

If you analyse the columns of the matrix product AT[B, B'] = PB'^{-1} AT PB in the same
way as we did for Theorem 9.4, you can obtain a proof of Theorem 9.5.
Activity 9.8 Try this sometime, but not necessarily now.
Thus, if we change the bases from the standard bases of Rn and Rm, the matrix
representation of the linear transformation changes.

A particular case of this theorem is so important that it is worth stating separately. It
corresponds to the case in which m = n and B' = B.

¹ See Anthony and Harvey, Section 7.4.


Theorem 9.7  Suppose that T : Rn → Rn is a linear transformation and that
B = {x1, x2, . . . , xn} is some basis of Rn. Let

P = (x1 x2 . . . xn)

be the matrix whose columns are the vectors of B. Then for all x ∈ Rn,

[T(x)]B = P^{-1} AT P [x]B,

where AT is the matrix corresponding to T,

AT = (T(e1) T(e2) . . . T(en)).

In other words,

AT[B, B] = P^{-1} AT P.

The relationship between the matrices AT[B, B] and AT is a central one in the theory of
linear algebra. The matrix AT[B, B] performs the same linear transformation as the
matrix AT, only AT[B, B] describes it in terms of the basis B rather than in standard
coordinates. This likeness of effect inspires the following definition.

Definition 9.4 (Similarity)  We say that two square matrices A and M are similar if
there is an invertible (non-singular) matrix P such that M = P^{-1} A P.

Note that similar has a very precise meaning here: it doesn't mean that the matrices
somehow look like each other (as normal use of the word similar would suggest), but
that they represent the same linear transformation in different bases.

As we shall see in the remaining chapters, this relationship can be used to great
advantage if the new basis B is chosen carefully.
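A numerical sketch of similarity (NumPy; here AT is the rotation-by-π/2 matrix and B an arbitrary invented basis, chosen only to illustrate the relation):

    import numpy as np

    A_T = np.array([[0.0, -1.0],
                    [1.0,  0.0]])    # rotation by pi/2 in standard coordinates
    P = np.array([[1.0, 1.0],
                  [0.0, 2.0]])       # columns: an arbitrary basis B of R^2

    M = np.linalg.inv(P) @ A_T @ P   # A_T[B, B] = P^{-1} A_T P, similar to A_T

    x_B = np.array([2.0, 1.0])       # coordinates of a vector x in basis B
    lhs = M @ x_B                    # [T(x)]_B computed in B coordinates
    rhs = np.linalg.solve(P, A_T @ (P @ x_B))   # [T(x)]_B via standard coordinates
    print(np.allclose(lhs, rhs))     # True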

Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by a linear transformation and be able to prove a given
mapping is linear
explain what is meant by the range and null space, and rank and nullity of a linear
transformation
know the dimension theorem (the rank-nullity theorem) for linear transformations
and be able to apply it
comprehend the two-way relationship between matrices and linear transformations
find the matrix representation of a transformation with respect to two given bases
know how to change between different bases of a vector space
know what it means to say that two square matrices are similar.


Exercises

Exercise 9.1
Find bases for the null space and range of the linear transformation T : R3 → R3 given
by

T((x1, x2, x3)^T) = (x1 + x2 + 2x3, x1 + x3, 2x1 + x2 + 3x3)^T.

Verify the dimension theorem. Is T invertible?
Exercise 9.2
Let T and S be linear transformations of R2 → R2 with respective matrices

AT = [ 1/√2  -1/√2 ]        AS = [ 1   0 ]
     [ 1/√2   1/√2 ]             [ 0  -1 ]

Sketch the vectors e1 and e2 in the xy-plane, and sketch the unit square. Describe the
effect of T in words, and illustrate it using the unit square by adding the images T(e1)
and T(e2) to your sketch (and filling in the image of the unit square). Do the same for
the linear transformation S.

Now consider the composed linear transformations ST and TS. Illustrate the effect of
ST and TS using the unit square (by first performing one linear transformation and
then the other). Then calculate their matrices to check that ST ≠ TS.
Exercise 9.3
Suppose that T : R2 → R3 is the linear transformation given by

T((x1, x2)^T) = (x2, 5x1 + 13x2, 7x1 + 16x2)^T.

Find the matrix AT[B, B'] of T with respect to the bases B = {(3, 1)^T, (5, 2)^T} and
B' = {(1, 0, 1)^T, (1, 2, 2)^T, (0, 1, 2)^T}.
Exercise 9.4
Consider the vectors

v1 = (1, 0, 1)^T,   v2 = (1, 1, 2)^T,   v3 = (1, 1, 5)^T   and   u = (1, 2, 3)^T.

Show that B = {v1, v2, v3} is a basis of R3. Find the B coordinates of u and hence
express u as a linear combination of v1, v2, v3.

A linear transformation S : R3 → R3 is known to have the following effect:

S(v1) = e1,   S(v2) = e2,   S(v3) = e3,

where e1, e2, e3 are the standard basis vectors in R3. Using properties of linear
transformations, find S(u).
Find, if possible, the null space of S and the range of S. Write down the corresponding
matrix AS .
Exercise 9.5
Let T be a linear transformation, T : R³ → R², whose null space, N(T), consists of
vectors x = (x, y, z)ᵀ ∈ R³ with x = y = z, and whose range, R(T), is R².
Find a basis of N(T). State the dimension theorem for linear transformations and use
it to verify that such a linear transformation T can exist.
If T maps e1, e2 ∈ R³ to the standard basis vectors in R², find a matrix A_T such that
T(x) = A_T x. Write down an expression for T(x) as a vector in R² in terms of x, y, z.
Exercise 9.6
If S and T are the linear transformations given in the previous two questions, decide
which composed linear transformation, ST or TS, is defined and find its corresponding
matrix.
Exercise 9.7
Show that each of the sets B and B′ is a basis of R³:

  B = { (1, 0, 1)ᵀ, (1, 1, 3)ᵀ, (0, 0, 1)ᵀ }   and   B′ = { (1, 1, 1)ᵀ, (1, 1, 0)ᵀ, (0, 1, 1)ᵀ }.

Write down the transition matrix P from B coordinates to standard coordinates. Write
down the transition matrix Q from B′ coordinates to standard coordinates.
Find the transition matrix from B′ coordinates to B coordinates.
If [x]_B′ = (2, −1, 3)ᵀ_B′, find [x]_B.

Comments on selected activities


Feedback to activity 9.1
To show that the condition is equivalent to the other two, we need to prove two things:
first, that the two conditions imply this one and, second, that this single condition
implies the other two. So suppose the two conditions of the definition hold and suppose
that u, v ∈ V and α, β ∈ R. Then we have T(αu) = αT(u) and T(βv) = βT(v) (by
property 2) and, by property 1, we then have
  T(αu + βv) = T(αu) + T(βv) = αT(u) + βT(v),
as required. On the other hand, suppose that for all u, v ∈ V and α, β ∈ R, we have
T(αu + βv) = αT(u) + βT(v). Then property 1 follows on taking α = β = 1 and
property 2 follows on taking β = 0.


Feedback to activity 9.2
T(αu) = p_{αu} and αT(u) = αp_u, so we need to check that p_{αu} = αp_u. Now, for all x,

  p_{αu}(x) = p_{αu1, αu2, ..., αun}(x)
            = (αu1)x + (αu2)x² + · · · + (αun)xⁿ
            = α(u1x + u2x² + · · · + unxⁿ)
            = αp_u(x),

as required.
Feedback to activity 9.3
Rotation by π radians is given by the matrix A, whereas reflection in the y-axis is given
by the matrix B:

  A = [ −1   0 ]        B = [ −1  0 ]
      [  0  −1 ],           [  0  1 ].
Feedback to activity 9.6
This is just definition-chasing. By definition, T_A is the mapping given by T_A(x) = Ax,
and
  R(T_A) = {T_A(x) : x ∈ Rⁿ} = {Ax : x ∈ Rⁿ} = R(A),
  N(T_A) = {x ∈ Rⁿ : T_A(x) = 0} = {x ∈ Rⁿ : Ax = 0} = N(A).
Feedback to activity 9.7
This is very similar to the proofs in the previous chapter that, for a matrix A, R(A) and
N(A) are subspaces.
First, we show R(T) is a subspace of W. Note that it is non-empty since T(0) = 0 and
hence it contains 0. (The fact that T(0) = 0 can be seen in a number of ways. For
instance, take any x ∈ V. Then T(0) = T(0x) = 0T(x) = 0.) We need to show that if
u, v ∈ R(T) then u + v ∈ R(T) and, for any α ∈ R, αv ∈ R(T). Suppose u, v ∈ R(T).
Then for some y1, y2 ∈ V, u = T(y1), v = T(y2). Now,
  u + v = T(y1) + T(y2) = T(y1 + y2),
and so u + v ∈ R(T). Next,
  αv = α(T(y2)) = T(αy2),
so αv ∈ R(T).
Now consider N(T). It is non-empty because the fact that T(0) = 0 shows 0 ∈ N(T).
Suppose u, v ∈ N(T) and α ∈ R. Then to show u + v ∈ N(T) and αu ∈ N(T), we must
show that T(u + v) = 0 and T(αu) = 0. We have
  T(u + v) = T(u) + T(v) = 0 + 0 = 0
and
  T(αu) = α(T(u)) = α0 = 0,
so we have shown what we needed.


Comments on exercises
Solution to exercise 9.1
The matrix A_T and its reduced row echelon form are

  A_T = [ 1 1 2 ]        [ 1 0 1 ]
        [ 1 0 1 ]   →    [ 0 1 1 ]
        [ 2 1 3 ]        [ 0 0 0 ].

A basis for the null space is {(−1, −1, 1)ᵀ}, and a basis for the range is

  { (1, 1, 2)ᵀ, (1, 0, 1)ᵀ }.

There are other possible answers. To verify the dimension theorem,
  rank(T) + nullity(T) = 2 + 1 = 3 = dim(R³).
This linear transformation is not invertible, as A_T⁻¹ does not exist.
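If you want to check such computations by machine, the following sketch (assuming SymPy is available; not part of the original solution) reproduces each step:

    # Sketch: null space, range and rank of the matrix A_T of exercise 9.1.
    import sympy as sp

    A = sp.Matrix([[1, 1, 2],
                   [1, 0, 1],
                   [2, 1, 3]])

    print(A.rref())         # reduced row echelon form and pivot columns
    print(A.nullspace())    # basis of N(T): multiples of (-1, -1, 1)^T
    print(A.columnspace())  # basis of R(T): the two pivot columns of A
    print(A.rank())         # rank 2, so rank + nullity = 2 + 1 = 3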
Solution to exercise 9.2
To sketch the effect of T on the unit square, mark off a unit square on a set of axes.
Mark the unit vector in the x direction, e1 , in one colour, and the unit vector in the y
direction, e2 in another colour (or differentiate between them by single and double
arrowheads). Now draw the vector images of these, T (e1 ) and T (e2 ), in the same
colours, and complete the image of the unit square with these vectors as its two
corresponding sides.
The linear transformation T is rotation anticlockwise by π/4 radians.
Sketch the unit square and its image under S in the same way. The linear
transformation S is reflection in the y-axis.
The linear transformation ST is 'first do T and then do S', since ST(v) = S(T(v)).
Think about what this means geometrically. First rotate the unit square by π/4 and then
reflect this in the y-axis. The linear transformation TS is very different: first reflect in
the y-axis, and then rotate by π/4.
For the illustrations of ST and T S using the unit square, continue as above. Draw a
unit square marking e1 and e2 distinctively on it. For ST , first find the images under T ,
T (e1 ) and T (e2 ) and mark these, then find the images of these vectors under S. Now
complete the unit square. When you have completed both sketches, you will find the
images under ST and T S in very different positions.
Calculating their matrices will verify that ST ≠ TS:

  A_ST = A_S A_T = [ −1  0 ] [ 1/√2  −1/√2 ]   [ −1/√2   1/√2 ]
                   [  0  1 ] [ 1/√2   1/√2 ] = [  1/√2   1/√2 ],

  A_TS = A_T A_S = [ 1/√2  −1/√2 ] [ −1  0 ]   [ −1/√2  −1/√2 ]
                   [ 1/√2   1/√2 ] [  0  1 ] = [ −1/√2   1/√2 ].

The columns of A_ST should be ST(e1) and ST(e2). Check that this matches your
sketch. Do the same for TS.
Solution to exercise 9.3
The answer, using the notation in the final section of the chapter, is
A_T[B, B′] = P_B′⁻¹ A_T P_B. Now,

  A_T = [ 0   1  ]          P_B = [ 3 5 ]
        [ 5 −13  ],               [ 1 2 ],
        [ 7 −16  ]

and

  P_B′ = [ 1 1 0 ]        P_B′⁻¹ = (1/3) [  2 −2  1 ]
         [ 0 2 1 ],                      [  1  2 −1 ],
         [ 1 2 2 ]                       [ −2 −1  2 ]

where we have omitted the details of the calculation of this inverse. It then follows that

  A_T[B, B′] = P_B′⁻¹ A_T P_B = [ 1  3 ]
                                [ 0 −1 ]
                                [ 2  1 ].
Solution to exercise 9.4
Write the vectors vi as the columns of a matrix,

  P_B = [ 1 1 2 ]
        [ 0 1 1 ]
        [ 1 2 5 ].

Since |P_B| = 2 ≠ 0, the columns are linearly independent and hence a basis of R³. P_B is
the transition matrix from B coordinates to standard coordinates, v = P_B[v]_B. Finding
P_B⁻¹ by the cofactor method, or otherwise, the B coordinates of u are

  [u]_B = P_B⁻¹ u = (1/2) [  3 −1 −1 ] [ 5 ]   [  5 ]
                          [  1  3 −1 ] [ 2 ] = [  4 ]
                          [ −1 −1  1 ] [ 3 ]   [ −2 ]_B.

Hence u = 5v1 + 4v2 − 2v3.
Using properties of linear transformations,

  S(u) = S(5v1 + 4v2 − 2v3) = 5S(v1) + 4S(v2) − 2S(v3) = 5e1 + 4e2 − 2e3 = (5, 4, −2)ᵀ.

Since R(S) is spanned by {e1, e2, e3}, R(S) = R³ and N(S) = {0}. The linear
transformation S is the inverse of the linear transformation T with T(e1) = v1,
T(e2) = v2, T(e3) = v3, which has matrix P_B, so the matrix A_S = P_B⁻¹.


Solution to exercise 9.5
For the given linear transformation, T : R³ → R²,

  N(T) = { (x, y, z)ᵀ : x = y = z } = { s(1, 1, 1)ᵀ : s ∈ R },

so that a basis of N(T) is the vector (1, 1, 1)ᵀ. The dimension theorem states that the
dimension of the range plus the dimension of the null space is equal to the dimension of
the domain, which is R³. We have nullity(T) = 1 and rank(T) = 2 since R(T) = R².
That is, rank(T) + nullity(T) = 2 + 1 = 3, which confirms that such a T can exist. Given
that T(e1) and T(e2) are the standard basis vectors of R², we have

  A_T = [ 1 0 −1 ]
        [ 0 1 −1 ],

where the last column is obtained from the basis of the null space, since
c1 + c2 + c3 = 0. Then

  T(x) = A_T (x, y, z)ᵀ = (x − z, y − z)ᵀ.
Solution to exercise 9.6
Only the linear transformation TS is defined, with A_TS = A_T A_S:

  A_TS = [ 1 0 −1 ] · (1/2) [  3 −1 −1 ]   (1/2) [ 4 0 −2 ]   [ 2 0 −1 ]
         [ 0 1 −1 ]         [  1  3 −1 ] =       [ 2 4 −2 ] = [ 1 2 −1 ].
                            [ −1 −1  1 ]

Solution to exercise 9.7
That B and B′ are bases can be checked in one of the usual ways. The required
transition matrices are

  P = [ 1 1 0 ]        Q = [ 1 1 0 ]
      [ 0 1 0 ],           [ 1 1 1 ]
      [ 1 3 1 ]            [ 1 0 1 ].

The transition matrix from B′ coordinates to B coordinates is P⁻¹Q, where P is the
transition matrix from B coordinates to standard, and Q is the one for B′. So you first
need to write down P and Q and calculate P⁻¹ (steps not shown). Then

  P⁻¹Q = [  1 −1 0 ] [ 1 1 0 ]   [  0  0 −1 ]
         [  0  1 0 ] [ 1 1 1 ] = [  1  1  1 ]
         [ −1 −2 1 ] [ 1 0 1 ]   [ −2 −3 −1 ].

Computing [x]_B using this matrix,

  [x]_B = [  0  0 −1 ] [  2 ]      [ −3 ]
          [  1  1  1 ] [ −1 ]    = [  4 ]
          [ −2 −3 −1 ] [  3 ]_B′   [ −4 ]_B.

You could check your solution by finding x in standard coordinates from both bases.


Chapter 10
Diagonalisation
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 8.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 5, Sections 5.1–5.2.

Aims of the chapter


One of the most useful techniques in applications of matrices and linear algebra is
diagonalisation. This relies on the topic of eigenvalues and eigenvectors, and is related
to change of basis. We will learn how to find eigenvalues and eigenvectors of an n × n
matrix, how to diagonalise a matrix when it is possible to do so, and also how to
recognise when it is not possible.
All matrices in this chapter are square n × n matrices with real entries, so all vectors
will be in Rⁿ for some n.

10.1 Eigenvalues and eigenvectors

10.1.1 Definitions

Definition 10.1 Suppose that A is a square matrix. The number λ is said to be an
eigenvalue of A if for some non-zero vector x, Ax = λx. Any non-zero vector x for
which this equation holds is called an eigenvector for eigenvalue λ or an eigenvector of
A corresponding to eigenvalue λ.

10.1.2 Finding eigenvalues and eigenvectors

To determine whether λ is an eigenvalue of A, we need to determine whether there are
any non-zero solutions x to the matrix equation Ax = λx. Note that the matrix
equation Ax = λx is not of the standard form, since the right-hand side is not a fixed
vector b, but depends explicitly on x. However, we can rewrite it in standard form.
Note that λx = λIx, where I is, as usual, the identity matrix. So the equation is
equivalent to Ax = λIx, or Ax − λIx = 0, which is equivalent to (A − λI)x = 0.
Now, a square linear system Bx = 0 has solutions other than x = 0 precisely when
|B| = 0. Therefore, taking B = A − λI, λ is an eigenvalue if and only if the determinant
of the matrix A − λI is zero. This determinant, p(λ) = |A − λI|, is a polynomial of
degree n in the variable λ.
Definition 10.2 (Characteristic polynomial and equation) The polynomial
|A − λI| is known as the characteristic polynomial of A, and the equation |A − λI| = 0
is called the characteristic equation of A.
To find the eigenvalues, we solve the characteristic equation |A − λI| = 0. Let us
illustrate with a 2 × 2 example.
Example 10.1 Let

  A = [ 7 −15 ]
      [ 2  −4 ].

Then

  A − λI = [ 7 − λ   −15   ]
           [   2    −4 − λ ],

and the characteristic polynomial is

  |A − λI| = (7 − λ)(−4 − λ) + 30
           = λ² − 3λ − 28 + 30
           = λ² − 3λ + 2.

So the eigenvalues are the solutions of λ² − 3λ + 2 = 0. To solve this for λ, one could
use either the formula for the solutions to a quadratic equation, or simply observe
that the characteristic polynomial factorises. We have (λ − 1)(λ − 2) = 0, with
solutions λ = 1 and λ = 2. Hence the eigenvalues of A are 1 and 2, and these are the
only eigenvalues of A.


To find an eigenvector for each eigenvalue λ, we have to find a non-trivial solution to
(A − λI)x = 0, meaning a solution other than the zero vector. (We stress the fact that
eigenvectors cannot be the zero vector because this is a mistake many students make.)
This is easy, since for a particular value of λ, all we need to do is solve a simple linear
system. We illustrate by finding the eigenvectors for the matrix of the example just
given.
Example 10.2 We find the eigenvectors of

  A = [ 7 −15 ]
      [ 2  −4 ].

We have seen that the eigenvalues are 1 and 2. To find the eigenvectors for
eigenvalue 1 we solve the system (A − I)x = 0. We do this by putting the coefficient
matrix A − I into reduced echelon form:

  (A − I) = [ 6 −15 ]  →  [ 1 −5/2 ]
            [ 2  −5 ]     [ 0   0  ].


This system has solutions

  v = t (5, 2)ᵀ   for any t ∈ R.

There are infinitely many eigenvectors for 1: for each t ≠ 0, v is an eigenvector of A
corresponding to λ = 1. But be careful not to think that you can choose t = 0; for
then v becomes the zero vector, and this is never an eigenvector, simply by
definition. To find the eigenvectors for 2, we solve (A − 2I)x = 0 by reducing the
coefficient matrix:

  (A − 2I) = [ 5 −15 ]  →  [ 1 −3 ]
             [ 2  −6 ]     [ 0  0 ].

Setting the non-leading variable equal to t, we obtain the solutions

  v = t (3, 1)ᵀ,   t ∈ R.

Any non-zero scalar multiple of the vector (3, 1)ᵀ is an eigenvector of A for
eigenvalue 2.
Note that each system of equations is simple enough to be solved directly. For
example, if x = (x1, x2)ᵀ, the system (A − 2I)x = 0 consists of the equations

  5x1 − 15x2 = 0,   2x1 − 6x2 = 0.

Clearly both equations are equivalent to x1 = 3x2. If we set x2 = t for any real
number t, then we obtain the eigenvectors for λ = 2 as before.
So why do we prefer row operations? There are two reasons. The first reason is that the
system of equations may not be as simple as the one just given, particularly for an
n × n matrix where n > 2. The second reason is that putting the matrix A − λI into
echelon form provides a useful check on the eigenvalue. If |A − λI| = 0, the echelon form
of A − λI must have a row of zeros, and the system (A − λI)x = 0 will have a
non-trivial solution. If we have reduced the matrix (A − λ₀I) for some supposed
eigenvalue λ₀ and do not obtain a zero row, we know immediately that there is an error,
either in the row reduction or in the choice of λ₀, and we can go back and correct it.
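A further, machine-based check is possible. The following sketch (assuming NumPy; the guide itself expects hand calculation) confirms the eigenvalues and eigenvectors of Example 10.1:

    # Sketch: numerical check of the eigenvalues 1 and 2 of Example 10.1.
    import numpy as np

    A = np.array([[7.0, -15.0],
                  [2.0,  -4.0]])

    evals, evecs = np.linalg.eig(A)
    print(evals)                      # approximately [1., 2.]

    # Each column of evecs is a (normalised) eigenvector; check Av = lambda v.
    for lam, v in zip(evals, evecs.T):
        assert np.allclose(A @ v, lam * v)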
Examples in R³
We now give two examples with 3 × 3 matrices.
Example 10.3 Suppose that

  A = [ 4 0 4 ]
      [ 0 4 4 ]
      [ 4 4 8 ].

Find the eigenvalues of A and find the corresponding eigenvectors for each
eigenvalue.


To find the eigenvalues we solve |A − λI| = 0. Now,

  |A − λI| = | 4−λ   0    4  |
             |  0   4−λ   4  |
             |  4    4   8−λ |

           = (4 − λ)[(4 − λ)(8 − λ) − 16] + 4[0 − 4(4 − λ)]
           = (4 − λ)[(4 − λ)(8 − λ) − 16] − 16(4 − λ).

We notice that each of the two terms in this expression has 4 − λ as a factor, so
instead of expanding everything, we take 4 − λ out as a common factor, obtaining

  |A − λI| = (4 − λ)[(4 − λ)(8 − λ) − 16 − 16]
           = (4 − λ)(32 − 12λ + λ² − 32)
           = (4 − λ)(λ² − 12λ)
           = (4 − λ)λ(λ − 12).

It follows that the eigenvalues are 4, 0, 12. (The characteristic polynomial will not
always factorise so easily. Here it was simple because of the common factor 4 − λ.
The next example is more difficult.)
To find an eigenvector for 4, we have to solve the equation (A − 4I)x = 0 for
x = (x1, x2, x3)ᵀ. Using row operations, we have

  [ 0 0 4 ]        [ 1 1 0 ]
  [ 0 0 4 ]  · · · [ 0 0 1 ]
  [ 4 4 4 ]        [ 0 0 0 ].

Thus x3 = 0 and, setting the free variable x2 = t, the solutions are

  x = t (−1, 1, 0)ᵀ,   t ∈ R.

So the eigenvectors for λ = 4 are the non-zero multiples of

  v1 = (−1, 1, 0)ᵀ.

Activity 10.1 Determine the eigenvectors for 0 and 12. Check your answers: verify
that Av = λv for each eigenvalue and one corresponding eigenvector.
Example 10.4 Let

  A = [ −3 −1 −2 ]
      [  1 −1  1 ]
      [  1  1  0 ].

Given that −1 is an eigenvalue of A, find all the eigenvalues of A.
We calculate the characteristic polynomial of A:

  |A − λI| = | −3−λ   −1    −2 |
             |   1   −1−λ    1 |
             |   1     1    −λ |

           = (−3 − λ) | −1−λ  1 | − (−1) | 1  1 | + (−2) | 1  −1−λ |
                      |   1  −λ |        | 1 −λ |        | 1    1  |

           = (−3 − λ)(λ² + λ − 1) + (−λ − 1) − 2(2 + λ)
           = −λ³ − 4λ² − 5λ − 2
           = −(λ³ + 4λ² + 5λ + 2).

Now, the fact that −1 is an eigenvalue means that −1 is a solution of the equation
|A − λI| = 0, which means that (λ − (−1)), that is, λ + 1, is a factor of the
characteristic polynomial |A − λI|. So this characteristic polynomial can be written
in the form

  −(λ + 1)(aλ² + bλ + c).

Clearly we must have a = 1 and c = 2 to obtain the correct λ³ term and the correct
constant. Using this, and comparing the coefficients of either λ² or λ with the cubic
polynomial, we find b = 3. In other words, the characteristic polynomial is

  −(λ³ + 4λ² + 5λ + 2) = −(λ + 1)(λ² + 3λ + 2) = −(λ + 1)(λ + 2)(λ + 1).

Activity 10.2 Perform the calculations to check that b = 3 and that the
characteristic polynomial factorises as stated.

We have |A − λI| = −(λ + 1)²(λ + 2). The eigenvalues are the solutions to |A − λI| = 0,
so they are λ = −1 and λ = −2.
Note that in this case, there are only two distinct eigenvalues. We say that the
eigenvalue −1 has occurred twice, or that λ = −1 is an eigenvalue of multiplicity 2. We
will find the eigenvectors when we look at this example again in section 10.2.1.

10.1.3 Eigenspaces

If A is an n × n matrix and λ is an eigenvalue of A, then the set of eigenvectors
corresponding to the eigenvalue λ, together with the zero vector 0, is a subspace of Rⁿ.
Why?
We have already seen that the null space of any m × n matrix is a subspace of Rⁿ. The
null space of the n × n matrix A − λI consists of all solutions to the matrix equation
(A − λI)x = 0, which is precisely the set of all eigenvectors corresponding to λ, together
with the vector 0.
Definition 10.3 (Eigenspace) If A is an n × n matrix and λ is an eigenvalue of A,
then the eigenspace of the eigenvalue λ is the subspace N(A − λI) of Rⁿ.
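Since the eigenspace is just a null space, it can also be computed numerically. A minimal sketch, assuming SciPy is available (scipy.linalg.null_space is an assumption, not something the guide uses), using the matrix of Example 10.3:

    # Sketch: the eigenspace for lambda is N(A - lambda I).
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[4.0, 0.0, 4.0],
                  [0.0, 4.0, 4.0],
                  [4.0, 4.0, 8.0]])
    E4 = null_space(A - 4 * np.eye(3))   # one column: a multiple of (-1, 1, 0)^T
    print(E4)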


10.2 Diagonalisation of a square matrix


Recall that square matrices A and M are similar if there is an invertible matrix P such
that P⁻¹AP = M. We met this idea earlier when we looked at how a matrix
representing a linear transformation changes when the basis is changed. We now begin
to explore why this is such an important and useful concept.
Definition 10.4 The matrix A is diagonalisable if it is similar to a diagonal matrix; in
other words, if there is a diagonal matrix D and an invertible matrix P such that
P⁻¹AP = D.
Suppose that the n × n matrix A is diagonalisable, and that P⁻¹AP = D, where D is a
diagonal matrix

  D = diag(λ1, λ2, . . . , λn) = [ λ1  0  · · ·  0 ]
                                 [  0  λ2 · · ·  0 ]
                                 [  ⋮        ⋱   ⋮ ]
                                 [  0  0  · · · λn ].

(Note the useful notation for describing the diagonal matrix D.) Then we have
AP = PD. If the columns of P are the vectors v1, v2, . . . , vn, then

  AP = A(v1 . . . vn) = (Av1 . . . Avn),

and

  PD = (v1 . . . vn) diag(λ1, . . . , λn) = (λ1v1 . . . λnvn).

So this means that

  Av1 = λ1v1,   Av2 = λ2v2,   . . . ,   Avn = λnvn.

The fact that P⁻¹ exists means that none of the vectors vi is the zero vector. So this
means that (for i = 1, 2, . . . , n) λi is an eigenvalue of A and vi is a corresponding
eigenvector. Since P has an inverse, these eigenvectors are linearly independent.
Therefore, A has n linearly independent eigenvectors. Conversely, if A has n linearly
independent eigenvectors, then the matrix P whose columns are these eigenvectors will
be invertible, and we will have P⁻¹AP = D where D is a diagonal matrix with entries
equal to the eigenvalues of A. We have therefore established the following result.
Theorem 10.1 An n × n matrix A is diagonalisable if and only if it has n linearly
independent eigenvectors.
Since n linearly independent vectors in Rⁿ form a basis of Rⁿ, another way to state this
theorem is:
Theorem 10.2 An n × n matrix A is diagonalisable if and only if there is a basis of
Rⁿ consisting of eigenvectors of A.


Suppose that this is the case, and let v1, . . . , vn be n linearly independent eigenvectors,
where vi is an eigenvector for eigenvalue λi. Then the vectors form a basis of Rⁿ, and
the matrix P = (v1 . . . vn) is such that P⁻¹ exists, and P⁻¹AP = D where
D = diag(λ1, . . . , λn).
This gives us a more sophisticated way to think about diagonalisation in terms of
change of basis and matrix representations of linear transformations. Suppose that
T = T_A is the linear transformation corresponding to A, so that T(x) = Ax for all x.
Suppose that A has a set of n linearly independent eigenvectors B = {v1, v2, . . . , vn},
corresponding (respectively) to the eigenvalues λ1, . . . , λn. Then B is a basis of Rⁿ.
By Theorem 9.7 the matrix representing the linear transformation T with respect to the
basis B is A_T[B, B] = P⁻¹A_T P, where the columns of P are the basis vectors,

  P = (v1 . . . vn).

P is the matrix whose columns are the basis of eigenvectors of A, and A_T is the matrix
representing T, which in this case is simply A itself, so that

  P⁻¹AP = A_T[B, B] = D.

In other words, the matrices A and D are similar. They represent the same linear
transformation, but A does so with respect to the standard basis, and D represents T in
the basis of eigenvectors of A.
What does this tell us about the linear transformation T = T_A? If x ∈ Rⁿ is any vector,
then its image under the linear transformation T is particularly easy to calculate in B
coordinates, where B is the basis of eigenvectors of A. That is, suppose the B
coordinates of x are [x]_B = [b1, b2, . . . , bn]_B; then, since [T(x)]_B = A_T[B, B][x]_B = D[x]_B,
we have

  [T(x)]_B = diag(λ1, λ2, . . . , λn) (b1, b2, . . . , bn)ᵀ = (λ1b1, λ2b2, . . . , λnbn)ᵀ.

You simply multiply each coordinate by the corresponding eigenvalue.

Geometrically, we can describe the linear transformation A as a stretch in the direction
of the eigenvector vi by a factor λi (in the same direction if λi > 0 and in the opposite
direction if λi < 0). Indeed this can be seen directly. Since Avi = λivi, each vector on
the line tvi, t ∈ R, is mapped into the scalar multiple λitvi by the linear transformation
A. If λi = 0, the line tvi is mapped to 0.
Example 10.5 Consider the first 3 × 3 matrix for which we found the eigenvalues
and eigenvectors in section 10.1.2; we will diagonalise the matrix

  A = [ 4 0 4 ]
      [ 0 4 4 ]
      [ 4 4 8 ].

We have seen that it has three distinct eigenvalues 0, 4, 12. From the eigenvectors
we found we take one eigenvector corresponding to each of the eigenvalues
λ1 = 4, λ2 = 0, λ3 = 12, in that order,

  v1 = (−1, 1, 0)ᵀ,   v2 = (−1, −1, 1)ᵀ,   v3 = (1, 1, 2)ᵀ.

We now form the matrix P whose columns are these eigenvectors:

  P = [ −1 −1 1 ]
      [  1 −1 1 ]
      [  0  1 2 ].

Then we know that D will be the matrix

  D = [ 4 0  0 ]
      [ 0 0  0 ]
      [ 0 0 12 ].

You can choose any order for listing the eigenvectors as the columns of the matrix
P, as long as you write the corresponding eigenvalues in the corresponding columns
of D, that is, as long as the column orders in P and D match. (If, for example, we
had chosen P = (v2 v1 v3), then D = diag(0, 4, 12).)
As soon as you have written down the matrices P and D, you should check that
your eigenvectors are correct. That is, check that

  AP = (Av1 Av2 Av3) = (λ1v1 λ2v2 λ3v3) = PD.

Activity 10.3 Carry out this calculation to check that the eigenvectors are correct,
that is, check that the columns of P are eigenvectors of A corresponding to the
eigenvalues 4, 0, 12.


Then, according to the theory, if P has an inverse, that is, if the eigenvectors are
linearly independent, then P⁻¹AP = D = diag(4, 0, 12).

Activity 10.4 Check that P is invertible. Then find P⁻¹ (the inverse may be
calculated using either elementary row operations or the cofactor method) and verify
that P⁻¹AP = D.
Note how important it is to have checked P first. Calculating the inverse of an incorrect
matrix P would have been a huge wasted effort.
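The same checks are easy to automate. A minimal sketch (assuming NumPy; the matrices are those of Example 10.5):

    # Sketch: first AP = PD (cheap, catches wrong eigenvectors),
    # then P^{-1} A P = D.
    import numpy as np

    A = np.array([[4.0, 0, 4], [0, 4, 4], [4, 4, 8]])
    P = np.array([[-1.0, -1, 1], [1, -1, 1], [0, 1, 2]])
    D = np.diag([4.0, 0.0, 12.0])

    assert np.allclose(A @ P, P @ D)            # columns are eigenvectors
    assert abs(np.linalg.det(P)) > 1e-12        # P is invertible
    assert np.allclose(np.linalg.inv(P) @ A @ P, D)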

Activity 10.5 Geometrically, how would you describe the linear transformation
TA (x) = Ax for this example?


10.2.1 Diagonalisation: when can it fail?

Not all n × n matrices have n linearly independent eigenvectors, as the following
example shows.
Example 10.6 The 2 × 2 matrix

  A = [  4 1 ]
      [ −1 2 ]

has characteristic polynomial λ² − 6λ + 9 = (λ − 3)², so there is only one eigenvalue,
λ = 3. The eigenvectors are the non-zero solutions to (A − 3I)x = 0: that is,

  [  1  1 ] [ x1 ]   [ 0 ]
  [ −1 −1 ] [ x2 ] = [ 0 ].

This is equivalent to the single equation x1 + x2 = 0, with general solution x1 = −x2.
Setting x2 = r, we see that the solution set of the system consists of all vectors of
the form (−r, r)ᵀ as r runs through all real numbers. So the eigenvectors are precisely
the non-zero scalar multiples of the fixed vector (−1, 1)ᵀ. Any two eigenvectors are
therefore scalar multiples of each other and hence form a linearly dependent set. In
other words, there are not two linearly independent eigenvectors, and the matrix A
is not diagonalisable.
The following result shows that if a matrix has n different eigenvalues then it is
diagonalisable, because the matrix will have n linearly independent eigenvectors. The
proof we give here is a proof by contradiction. If you have worked Question 8.5 then you
have already proved a particular case of this theorem.
Theorem 10.3 Eigenvectors corresponding to different eigenvalues are linearly
independent.
Proof
Suppose the result is false. We take any set of eigenvectors corresponding to distinct
eigenvalues (of an n × n matrix A) consisting of one eigenvector for each eigenvalue,
and choose a smallest possible subset which is linearly dependent. (This subset could have
at least two and at most n members.) By renumbering if necessary, we call the vectors
in this subset v1, v2, . . . , vk. Then there are non-zero numbers c1, c2, . . . , ck such that

  c1v1 + c2v2 + . . . + ckvk = 0.

First, (1) by applying the matrix A to this equation, and second, (2) by multiplying it
by λ1, we get the two equations

  A(c1v1 + c2v2 + . . . + ckvk) = c1Av1 + c2Av2 + . . . + ckAvk
                                = λ1c1v1 + λ2c2v2 + . . . + λkckvk = 0     (1)

and

  λ1c1v1 + λ1c2v2 + . . . + λ1ckvk = 0.     (2)

Subtracting the second equation from the first one, we get

  (λ2 − λ1)c2v2 + . . . + (λk − λ1)ckvk = 0.

Since the λi are distinct and the ci are non-zero, this says that the vectors v2, . . . , vk
are linearly dependent, which contradicts the original assumption that we have chosen
the smallest possible set. So we conclude that the eigenvectors are linearly independent.

Using these results we have the important conclusion:
If an n × n matrix has n different eigenvalues, then it has a set of n linearly
independent eigenvectors and is therefore diagonalisable.
It is not, however, necessary for the eigenvalues to be distinct. What is needed for
diagonalisation is a set of n linearly independent eigenvectors, and this can happen even
when there is a repeated eigenvalue (that is, when there are fewer than n different
eigenvalues). The following example illustrates this.
Example 10.7 Consider the matrix

  A = [  3  1 −1 ]
      [  0  2  0 ]
      [ −1 −1  3 ].

The eigenvalues are given by the solutions of the characteristic equation
|A − λI| = 0. Expanding the determinant by the second row,

  |A − λI| = | 3−λ   1   −1  |
             |  0   2−λ   0  |
             | −1   −1   3−λ |

           = (2 − λ) | 3−λ  −1  |
                     | −1   3−λ |

           = (2 − λ)(λ² − 6λ + 9 − 1)
           = (2 − λ)(λ² − 6λ + 8)
           = (2 − λ)(λ − 4)(λ − 2)
           = −(λ − 2)²(λ − 4).

The matrix A has only two eigenvalues: λ = 4 and λ = 2, an eigenvalue of
multiplicity 2. If we want to diagonalise it, we need to find three linearly
independent eigenvectors. There will be one (linearly independent) eigenvector
corresponding to λ = 4, so we will need two linearly independent eigenvectors
corresponding to the eigenvalue of multiplicity 2. Therefore we look for these first.
We row reduce the matrix (A − 2I):

  (A − 2I) = [  1  1 −1 ]      [ 1 1 −1 ]
             [  0  0  0 ]  →   [ 0 0  0 ]
             [ −1 −1  1 ]      [ 0 0  0 ].

We see immediately that this matrix has rank 1, so its null space (the eigenspace for
λ = 2) will have dimension 2, and we can find a basis of this space consisting of two
linearly independent eigenvectors. Setting the non-leading variables equal to
arbitrary parameters s and t, we find that the solutions of (A − 2I)x = 0 are

  x = s(−1, 1, 0)ᵀ + t(1, 0, 1)ᵀ = sv1 + tv2,   s, t ∈ R,

where v1 and v2 are two linearly independent eigenvectors for λ = 2.

Activity 10.6 How do you know that v1 and v2 are linearly independent?

Now, knowing that we will be able to diagonalise A, we find the eigenvector for λ = 4
by reducing (A − 4I):

  (A − 4I) = [ −1  1 −1 ]        [ 1 0 1 ]
             [  0 −2  0 ]  · · · [ 0 1 0 ]
             [ −1 −1 −1 ]        [ 0 0 0 ],

with solutions

  x = t(−1, 0, 1)ᵀ,   t ∈ R.

Let v3 = (−1, 0, 1)ᵀ.

The eigenvectors corresponding to distinct eigenvalues are linearly independent, so the
vectors v1, v2, v3 form a linearly independent set. Then we may take

  P = [ −1 −1 1 ]                          [ 4 0 0 ]
      [  0  1 0 ]   and   P⁻¹AP = D =      [ 0 2 0 ]
      [  1  0 1 ]                          [ 0 0 2 ].

Activity 10.7 Check this! Check that AP = PD. Once you have checked that the
columns of P are the eigenvectors corresponding to the eigenvalues in the
corresponding columns of D, the theory will tell you that P⁻¹AP = D. Why?
Example 10.8 Consider again the last 3 × 3 example in section 10.1.2. We found
that the matrix

  A = [ −3 −1 −2 ]
      [  1 −1  1 ]
      [  1  1  0 ]

has an eigenvalue λ1 = −1 of multiplicity 2, and a second eigenvalue, λ2 = −2. We
can find one (linearly independent) eigenvector corresponding to λ2 = −2. In order
to diagonalise this matrix we need two linearly independent eigenvectors for λ = −1.
To see if this is possible, we row reduce the matrix (A + I):

  (A + I) = [ −2 −1 −2 ]        [ 1 0 1 ]
            [  1  0  1 ]  · · · [ 0 1 0 ]
            [  1  1  1 ]        [ 0 0 0 ].

This matrix has rank 2 and the null space (the eigenspace for λ = −1) therefore (by
the rank-nullity theorem) has dimension 1. We can only find one linearly
independent eigenvector for λ = −1. All solutions of (A + I)x = 0 are of the form

  x = t(−1, 0, 1)ᵀ,   t ∈ R.

We conclude that this matrix cannot be diagonalised, as it is not possible to find 3
linearly independent eigenvectors to form the matrix P.
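This failure can also be detected numerically, by comparing the dimension of the eigenspace with the multiplicity of the eigenvalue. A sketch, assuming SciPy is available (scipy.linalg.null_space is an assumption), using the matrix of Example 10.8:

    # Sketch: the eigenspace for lambda = -1 has dimension 1, but the
    # eigenvalue has multiplicity 2, so A cannot be diagonalised.
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[-3.0, -1, -2],
                  [ 1.0, -1,  1],
                  [ 1.0,  1,  0]])
    E = null_space(A + np.eye(3))   # eigenspace for lambda = -1
    print(E.shape[1])               # prints 1: only one independent eigenvector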
There is another reason why a matrix A may not be diagonalisable over the real
numbers. Consider the following example.
Example 10.9 If A is the matrix

  A = [ 0 −1 ]
      [ 1  0 ],

then the characteristic equation,

  |A − λI| = | −λ  −1 | = λ² + 1 = 0,
             |  1  −λ |

has no real solutions.
This matrix A can be diagonalised over the complex numbers, but not over the real
numbers.

Learning outcomes
This chapter has discussed eigenvalues and eigenvectors and the very important
technique of diagonalisation. We shall see in the next chapter how useful a technique
diagonalisation is. At the end of this chapter and the relevant reading, you should be
able to:
state what is meant by the characteristic equation of a matrix
state carefully what is meant by eigenvectors and eigenvalues, and by
diagonalisation
find eigenvalues and corresponding eigenvectors for a square matrix
diagonalise a diagonalisable matrix
determine whether or not a matrix can be diagonalised


recognise what diagonalisation says in terms of change of basis and matrix


representation of linear transformations
use diagonalisation to describe the geometric effect of a linear transformation.

Exercises
Exercise 10.1
Diagonalise the matrix

  A = [ 7 2 ]
      [ 1 8 ],

that is, find an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D.
Check your answer.
Exercise 10.2
Find the eigenvalues of the matrix

  A = [   0  2   1 ]
      [  16  4  −6 ]
      [ −16  4  10 ]

and find an eigenvector for each eigenvalue. Hence find an invertible matrix P and a
diagonal matrix D such that P⁻¹AP = D.
Exercise 10.3
Determine if either of the following matrices can be diagonalised:
A=

1 1
0 1

B=

1 1
1 1

Exercise 10.4
Let A be any (real) n × n matrix and suppose λ is an eigenvalue of A. Show that
{x : Ax = λx}, the set of eigenvectors for eigenvalue λ together with the zero vector 0,
is a subspace of Rⁿ.
Exercise 10.5
Let

  A = [ −1 1 2 ]
      [ −6 2 6 ],        x = (1, 0, 1)ᵀ.
      [  0 1 1 ]

Show that the vector x is an eigenvector of A. What is the corresponding eigenvalue?


Find the other eigenvalues of A, and an eigenvector for each of them. Find an invertible
matrix P and a diagonal matrix D such that P⁻¹AP = D. Check that AP = PD.


Exercise 10.6
Diagonalise the matrix A:

  A = [  0 0  2 ]
      [  1 2 −1 ]
      [ −1 0  3 ].

Exercise 10.7
Suppose that A is a real diagonalisable matrix and that all the eigenvalues of A are
non-negative. Prove that there is a matrix B such that B² = A.

Comments on selected activities


Feedback to activity 10.1
The eigenvectors for λ = 0 are the non-zero solutions of Ax = 0. To find these, row
reduce the coefficient matrix A:

  [ 4 0 4 ]       [ 1 0 1 ]
  [ 0 4 4 ]  →    [ 0 1 1 ]
  [ 4 4 8 ]       [ 0 0 0 ].

The solutions are

  x = t(−1, −1, 1)ᵀ,   t ∈ R,

so that the eigenvectors are non-zero multiples of v2 = (−1, −1, 1)ᵀ. The eigenspace of
λ = 0 is the null space of the matrix A. Note that Av2 = 0v2 = 0.
Similarly, you should find that for λ = 12 the eigenvectors are non-zero multiples of

  v3 = (1, 1, 2)ᵀ.
Feedback to activity 10.3
Perform the matrix multiplication to show that

  AP = (4v1  0v2  12v3) = PD.

Feedback to activity 10.4
Since |P| = 6 ≠ 0, P is invertible. Using the adjoint method (or row reduction), obtain

  P⁻¹ = (1/6) [ −3  3 0 ]
              [ −2 −2 2 ]
              [  1  1 2 ].

Check that PP⁻¹ = I. You have calculated AP in the previous activity, so now just
multiply P⁻¹(AP) to obtain D.


Feedback to activity 10.5
T_A is a stretch by a factor 4 in the direction of the vector v1 = (−1, 1, 0)ᵀ, a stretch by
a factor of 12 in the direction of v3 = (1, 1, 2)ᵀ, and it maps the line x = tv2 to 0.
Feedback to activity 10.6
The method of solution ensures this. See the discussion at the end of section 8.5.
Feedback to activity 10.7
The vectors v1 and v2 are linearly independent. Since v3 corresponds to a different
eigenvalue, the set of three eigenvectors is linearly independent. Therefore
P = (v3 v1 v2) is invertible, and multiplying the equation AP = PD on the left by P⁻¹
gives the result.

Comments on exercises
Solution to exercise 10.1
First find the eigenvalues:

  |A − λI| = | 7−λ   2  |
             |  1   8−λ |
           = (7 − λ)(8 − λ) − 2
           = λ² − 15λ + 54
           = (λ − 6)(λ − 9)
           = 0.
So λ = 6 and λ = 9 are the eigenvalues. Then find an eigenvector for each eigenvalue:

  λ1 = 6 :   A − λ1I = [ 1 2 ]  →  [ 1 2 ]     take  v1 = (−2, 1)ᵀ;
                       [ 1 2 ]     [ 0 0 ],

  λ2 = 9 :   A − λ2I = [ −2  2 ]  →  [ 1 −1 ]  take  v2 = (1, 1)ᵀ.
                       [  1 −1 ]     [ 0  0 ],

A basis of eigenvectors is {(−2, 1)ᵀ, (1, 1)ᵀ}. You can choose

  P = [ −2 1 ]        and   D = [ 6 0 ]
      [  1 1 ]                  [ 0 9 ],

then P⁻¹AP = D.

First check the eigenvectors:

  AP = [ 7 2 ] [ −2 1 ]   [ −12 9 ]
       [ 1 8 ] [  1 1 ] = [   6 9 ] = (6v1  9v2),

then

  P⁻¹ = (1/3) [ −1 1 ]
              [  1 2 ]

and

  P⁻¹AP = (1/3) [ −1 1 ] [ −12 9 ]   [ 6 0 ]
                [  1 2 ] [   6 9 ] = [ 0 9 ] = D.


Solution to exercise 10.2
The characteristic polynomial is −λ³ + 14λ² − 48λ, which is easily factorised as
−λ(λ − 6)(λ − 8). So the eigenvalues are 0, 6, 8. Corresponding eigenvectors,
respectively, are calculated to be non-zero scalar multiples of

  (1, −1, 2)ᵀ,   (1, 2, 2)ᵀ,   (1, 4, 0)ᵀ.

We may therefore take

  P = [  1 1 1 ]
      [ −1 2 4 ],        D = diag(0, 6, 8).
      [  2 2 0 ]

Solution to exercise 10.3
The matrix A has only one eigenvalue, λ = 1. The corresponding eigenvectors are all
the non-zero scalar multiples of (1, 0)ᵀ, so there cannot be two linearly independent
eigenvectors, and hence the matrix is not diagonalisable.
The eigenvalues of the matrix B are 0 and 2. Since this matrix has distinct eigenvalues,
it can be diagonalised.
Solution to exercise 10.4
Denote the set described by W. First, 0 ∈ W, so W is non-empty. Suppose now that
x, y are in W and that α ∈ R. We need to show that x + y and αx are also in W. We
know that Ax = λx and Ay = λy, so

  A(x + y) = Ax + Ay = λx + λy = λ(x + y)

and

  A(αx) = α(Ax) = α(λx) = λ(αx),

so x + y and αx are indeed in W.


Solution to exercise 10.5
We have

  Ax = [ −1 1 2 ] [ 1 ]   [ 1 ]
       [ −6 2 6 ] [ 0 ] = [ 0 ] = 1x,
       [  0 1 1 ] [ 1 ]   [ 1 ]

so x is an eigenvector with corresponding eigenvalue λ = 1. The characteristic
polynomial of A is

  p(λ) = −λ³ + 2λ² + λ − 2.

Since λ = 1 is a root, we know that (λ − 1) is a factor. Factorising, we obtain

  p(λ) = (λ − 1)(−λ² + λ + 2) = −(λ − 1)(λ − 2)(λ + 1),

so the other eigenvalues are λ = 2, −1. Corresponding eigenvectors are, respectively,
(1, 1, 1)ᵀ and (0, −2, 1)ᵀ. We may therefore take

  P = [ 1 1  0 ]
      [ 0 1 −2 ],        D = diag(1, 2, −1).
      [ 1 1  1 ]

Check that AP = (1v1  2v2  −1v3) = PD.


Solution to exercise 10.6
This matrix A does not have 3 distinct eigenvalues. The eigenvalues turn out to be 1
and 2, with 2 occurring twice (an eigenvalue of multiplicity two). So we first check that
we can find two linearly independent eigenvectors for λ = 2.
You should find that the eigenspace for λ = 2 is two-dimensional and has a basis
consisting of (1, 0, 1)ᵀ and (0, 1, 0)ᵀ. An eigenvector for 1 is (2, −1, 1)ᵀ. These three
vectors form a linearly independent set. Therefore we may take

  P = [  2 1 0 ]
      [ −1 0 1 ],        D = diag(1, 2, 2).
      [  1 1 0 ]

You should check your result by calculating AP = PD.
Solution to exercise 10.7
Since A can be diagonalised, we have P⁻¹AP = D for some P, where
D = diag(λ1, . . . , λn), these entries being the eigenvalues of A. It is given that all λi ≥ 0.
We have A = PDP⁻¹.
If B² = A we must have

  D = P⁻¹AP = P⁻¹B²P = P⁻¹BPP⁻¹BP = (P⁻¹BP)².

Therefore, let

  B = P diag(√λ1, √λ2, . . . , √λn) P⁻¹.

Then, reversing the above steps,

  B² = P diag(√λ1, . . . , √λn) P⁻¹ P diag(√λ1, . . . , √λn) P⁻¹
     = P diag(λ1, . . . , λn) P⁻¹
     = PDP⁻¹ = A,

and we are done.


Chapter 11
Applications of diagonalisation
Suggested reading

(For full publication details, see Chapter 1.)


Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 9.
Anton, H. and C. Rorres. Elementary Linear Algebra. Chapter 10, Section 10.5.

Aims of the chapter


We will now look at some applications of diagonalisation. We apply diagonalisation to
find powers of diagonalisable matrices and to solve systems of simultaneous linear
difference equations. In particular, we look at the important topic of Markov chains.
You should try to understand why the diagonalisation process makes the solution
possible, by essentially changing basis to one in which the problem is readily solvable,
namely the basis of Rⁿ consisting of eigenvectors of the matrix.

11.1 Powers of matrices


For a positive integer n, the nth power of a matrix A is simply

  Aⁿ = A A · · · A   (n times).

It is often useful, as we shall see in this chapter, to determine Aⁿ for a general integer n.
Diagonalisation helps here. If we can write P⁻¹AP = D, then A = PDP⁻¹, and so

  Aⁿ = (PDP⁻¹)(PDP⁻¹) · · · (PDP⁻¹)   (n times)
     = PD(P⁻¹P)D(P⁻¹P)D(P⁻¹P) · · · D(P⁻¹P)DP⁻¹
     = PDIDIDI · · · IDP⁻¹
     = PDD · · · DP⁻¹
     = PDⁿP⁻¹.

The product PDⁿP⁻¹ is easy to compute, since Dⁿ is simply the diagonal matrix with
entries equal to the nth power of those of D.
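The computation is easy to test by machine. A minimal sketch (assuming NumPy), checking P Dⁿ P⁻¹ against repeated multiplication:

    # Sketch: A^n via diagonalisation, compared with direct matrix powers.
    import numpy as np

    A = np.array([[1.0, 4.0],
                  [0.5, 0.0]])
    evals, P = np.linalg.eig(A)
    n = 8
    An = P @ np.diag(evals**n) @ np.linalg.inv(P)   # P D^n P^{-1}
    assert np.allclose(An, np.linalg.matrix_power(A, n))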


Activity 11.1 Convince yourself that if

  D = diag(λ1, λ2, . . . , λk),

then

  Dⁿ = diag(λ1ⁿ, λ2ⁿ, . . . , λkⁿ).

We give an illustrative example using a 2 × 2 matrix, but you should be able to carry
out the procedure for 3 × 3 matrices as well.
Example 11.1 Suppose that we want a matrix expression for the nth power of the
matrix

  A = [  1  4 ]
      [ 1/2 0 ].

The characteristic polynomial |A − λI| is (check this!) λ² − λ − 2 = (λ − 2)(λ + 1).
So the eigenvalues are −1 and 2. An eigenvector for −1 is a solution of
(A + I)v = 0, found by

  A + I = [  2  4 ]  →  [ 1 2 ]
          [ 1/2 1 ]     [ 0 0 ],

so we may take (2, −1)ᵀ. Eigenvectors for 2 are given by

  A − 2I = [ −1   4 ]  →  [ 1 −4 ]
           [ 1/2 −2 ]     [ 0  0 ],

so we may take (4, 1)ᵀ. Let P be the matrix whose columns are these eigenvectors.
Then

  P = [  2 4 ]
      [ −1 1 ].

The inverse is

  P⁻¹ = (1/6) [ 1 −4 ]
              [ 1  2 ].

Activity 11.2 Check all of the statements just made.

We have P⁻¹AP = D = diag(−1, 2). The nth power of the matrix A is given by

  Aⁿ = PDⁿP⁻¹
     = (1/6) [  2 4 ] [ (−1)ⁿ  0  ] [ 1 −4 ]
             [ −1 1 ] [   0    2ⁿ ] [ 1  2 ]

     = (1/6) [ 2(−1)ⁿ + 4(2ⁿ)    −8(−1)ⁿ + 8(2ⁿ) ]
             [ −(−1)ⁿ + 2ⁿ        4(−1)ⁿ + 2(2ⁿ) ].

11.2 Systems of difference equations

Recall from Chapter 6 that a difference equation is an equation linking the terms of a
sequence to previous terms. For example, xt+1 = 5xt − 1 is a first-order difference
equation for the sequence xt. (It is said to be first-order because the relationship
expressing xt+1 involves only the previous term.) Difference equations are also often
referred to as recurrence equations. Here t is always a non-negative integer: t ∈ Z, t ≥ 0.
One very simple result we will need is that the solution to the difference equation

  xt+1 = axt

is simply xt = aᵗx0, where x0 is the first term of the sequence. (We assume that the
sequence is x0, x1, x2, . . . rather than x1, x2, . . ..)

11.2.1

Systems of difference equations

Suppose three sequences xt , yt and zt satisfy x0 = 12, y0 = 6, z0 = 6 and are related, for
t 0, as follows:
xt+1 = 5xt + 4zt
yt+1 = 5yt + 4zt
zt+1 = 4xt + 4yt + 9zt ,

(11.1)
(11.2)
(11.3)

We cannot directly solve equation (11.1) for xt since we would need to know zt . On the
other hand we cant work out zt directly from equation (11.2) or equation (11.3)
because to do so we would need to know yt ! It seems impossible, perhaps, but there are
ways to proceed.
Note that this (coupled) system of difference


xt+1
5
yt+1 = 0
zt+1
4
That is,

equations can be expressed as



0 4
xt
5 4 yt .
4 9
zt

11

xt+1 = Axt ,
where

5 0 4
xt
xt = yt , A = 0 5 4 .
4 4 9
zt

The general system we shall consider will take the form xt+1 = Axt where A is an n n
square matrix. We shall concentrate on 3 3 and 2 2 systems, though the method is
applicable to larger values of n.
We shall describe two techniques: one involving a change of variable, and the other
powers of matrices.


11.2.2 Solving by change of variable

We can use diagonalisation as the key to a general method for solving systems of
difference equations. Given a system xt+1 = Axt, in which A is diagonalisable, we
perform a change of variable or change of coordinates, as follows. Suppose that
P⁻¹AP = D (where D is diagonal) and let

  xt = Pzt

or, equivalently, the new variable vector zt is zt = P⁻¹xt, so that the vector xt is in
standard coordinates and zt is in coordinates in the basis of eigenvectors. Then the
equation xt+1 = Axt becomes

  Pzt+1 = APzt,

which means that

  zt+1 = P⁻¹APzt = Dzt,

which, since D is diagonal, is very easy to solve for zt. To find xt we then use the fact
that xt = Pzt.
Example 11.2 We find the sequences xt, yt, zt such that

  xt+1 = 5xt + 4zt
  yt+1 = 5yt + 4zt
  zt+1 = 4xt + 4yt + 9zt

and x0 = 12, y0 = 6, z0 = 6. This is the problem described above. In matrix form, as
we have seen, this system is xt+1 = Axt where

  A = [ 5 0 4 ]
      [ 0 5 4 ]
      [ 4 4 9 ].

To use the technique, we need to diagonalise A. You should work through this
diagonalisation yourself. We'll omit the workings here, but if

  P = [ −1  1 1 ]
      [ −1 −1 1 ],
      [  1  0 2 ]

then

  P⁻¹AP = D = diag(1, 5, 13).

Now let zt = (ut, vt, wt)ᵀ be given by xt = Pzt. Then the equation xt+1 = Axt gives
rise (as explained above) to zt+1 = Dzt. That is,

  [ ut+1 ]   [ 1 0  0 ] [ ut ]
  [ vt+1 ] = [ 0 5  0 ] [ vt ],
  [ wt+1 ]   [ 0 0 13 ] [ wt ]

so we have the following system for the new sequences ut, vt, wt:

  ut+1 = ut
  vt+1 = 5vt
  wt+1 = 13wt.

This is very easy to solve: each equation involves only one sequence, so we have
uncoupled the equations. We have, for all t,

  ut = u0,   vt = 5ᵗv0,   wt = 13ᵗw0.

We have not yet solved the original problem, however, since we need to find xt, yt, zt.
We have

  xt = (xt, yt, zt)ᵀ = Pzt = [ −1  1 1 ] [   u0  ]
                             [ −1 −1 1 ] [  5ᵗv0 ].
                             [  1  0 2 ] [ 13ᵗw0 ]

But we have also to find out what u0, v0, w0 are. These are not given in the problem,
but x0, y0, z0 are, and we know that

  (x0, y0, z0)ᵀ = P(u0, v0, w0)ᵀ.

To find u0, v0, w0 we can either solve the linear system

  P(u0, v0, w0)ᵀ = (12, 6, 6)ᵀ

using row operations, or we can (though it involves more work) find out what P⁻¹ is
and use the fact that

  (u0, v0, w0)ᵀ = P⁻¹(12, 6, 6)ᵀ.

Either way (and the working is again omitted), we find

  (u0, v0, w0)ᵀ = (−4, 3, 5)ᵀ.

Returning then to the general solution to the system, we obtain

  [ xt ]   [ −1  1 1 ] [  −4    ]   [ 4 + 3(5ᵗ) + 5(13ᵗ) ]
  [ yt ] = [ −1 −1 1 ] [ 3(5ᵗ)  ] = [ 4 − 3(5ᵗ) + 5(13ᵗ) ].
  [ zt ]   [  1  0 2 ] [ 5(13ᵗ) ]   [ −4 + 10(13ᵗ)       ]

So the final answer is that the sequences are:

  xt = 4 + 3(5ᵗ) + 5(13ᵗ)
  yt = 4 − 3(5ᵗ) + 5(13ᵗ)
  zt = −4 + 10(13ᵗ).

Activity 11.3 Perform the omitted diagonalisation calculations required for the
example just given.
As this example demonstrates, solving a system of difference equations involves a lot of
work, but the good news is that it is just a matter of going through a definite (if
time-consuming) procedure.
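It is also easy to check such a closed-form answer by machine. A minimal sketch (assuming NumPy), comparing the formula above against direct iteration of the system:

    # Sketch: iterate x_{t+1} = A x_t and compare with the closed form.
    import numpy as np

    A = np.array([[5.0, 0, 4], [0, 5, 4], [4, 4, 9]])
    x = np.array([12.0, 6.0, 6.0])
    for t in range(1, 6):
        x = A @ x
        closed = np.array([ 4 + 3*5.0**t + 5*13.0**t,
                            4 - 3*5.0**t + 5*13.0**t,
                           -4 + 10*13.0**t])
        assert np.allclose(x, closed)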

11.2.3 Solving using matrix powers

Another way of looking at this problem is to notice that if xt+1 = Axt, then

  xt = Aᵗx0.

Activity 11.4 Show this.

This solution can be determined explicitly if we can find the tth power Aᵗ of the matrix
A. As described in section 11.1, this can be done using diagonalisation of A.
Example 11.3 We solve the system of the above example using matrix powers.
The system is xt+1 = Axt where

  A = [ 5 0 4 ]
      [ 0 5 4 ]
      [ 4 4 9 ]

and where x0 = 12, y0 = 6, z0 = 6. So the solution is xt = Aᵗx0 = Aᵗ(12, 6, 6)ᵀ. We
have seen how A can be diagonalised: with

  P = [ −1  1 1 ]
      [ −1 −1 1 ],
      [  1  0 2 ]

we have

  P⁻¹AP = D = diag(1, 5, 13).

So A = PDP⁻¹ and Aᵗ = PDᵗP⁻¹. Now, as you can calculate (the details are
omitted here),

  P⁻¹ = [ −1/3 −1/3 1/3 ]
        [  1/2 −1/2  0  ],
        [  1/6  1/6 1/3 ]

so

  xt = Aᵗx0 = PDᵗP⁻¹x0.

Doing the multiplication (again, details omitted),

  xt = P [ 1  0   0  ] P⁻¹ [ 12 ]   [ 4 + 3(5ᵗ) + 5(13ᵗ) ]
         [ 0  5ᵗ  0  ]     [  6 ] = [ 4 − 3(5ᵗ) + 5(13ᵗ) ],
         [ 0  0  13ᵗ ]     [  6 ]   [ −4 + 10(13ᵗ)       ]

which is, of course, precisely the same answer as we obtained using the previous
method.

Activity 11.5 Check the calculations omitted in this example.

Note that although this technique is presented as being different from the one using a
change of variable, they are essentially the same. Here, as before, the vector P⁻¹x0
represents the coordinates of the vector x0 (the initial conditions) in the basis of
eigenvectors of A (the columns of P). In both cases, diagonalisation enables us to solve
the system by a change of basis from the standard basis in Rⁿ to a basis consisting of
eigenvectors of the matrix A.

11.2.4 Markov chains

We begin with an example.

Example 11.4 Suppose two supermarkets compete for customers in a region with
20,000 shoppers. Assume that no shopper goes to both supermarkets in any week,
and that the table below gives the probabilities that a shopper will change from one
supermarket (or none) to another (or none) during the week.

             From A    From B    From none
  To A        0.70      0.15       0.30
  To B        0.20      0.80       0.20
  To none     0.10      0.05       0.50

For example, an interpretation of the second column is that during any given week
supermarket B will keep 80% of its customers while losing 15% to supermarket A
and 5% to no supermarket. Suppose that at the end of a certain week (call it week
zero) it is known that the total population of T = 20,000 shoppers was distributed
as follows: 10,000 (0.5T) went to supermarket A; 8,000 (0.4T) went to
supermarket B; and 2,000 (0.1T) did not go to a supermarket.
Let xt denote the percentage of total shoppers going to supermarket A in week t, yt
the percentage going to supermarket B, and zt the percentage who do not go to any
supermarket. The number of shoppers in week t can be predicted by this model from
the numbers in the previous week, that is,

  xt = Axt−1,   where   A = [ 0.70 0.15 0.30 ]
                            [ 0.20 0.80 0.20 ],   xt = (xt, yt, zt)ᵀ,
                            [ 0.10 0.05 0.50 ]

with x0 = 0.5, y0 = 0.4, z0 = 0.1. The questions we wish to answer are: Can we
predict from this information the number of shoppers at each supermarket in any
future week t? And can we predict a long-term distribution of shoppers?
This is an example of a Markov chain.
In general, a Markov chain or a Markov process is a closed system consisting of a
population which is distributed into n different states and which changes with time
from one distribution to another. The system is observed at scheduled times. It is
assumed that the probability that a given member will change from one state into
another, depending on the state it occupied at the previous observation, is known. The
system is then observed at a certain time, and the information is used to predict the
distribution of the system into its different states at a future time t.
The probabilities are listed in an n × n matrix A = (aij), where the entry aij is the
probability that a member of the population will change from state j into state i. Such
a matrix, called a transition matrix, has the following two properties:
(1) The entries of A are all non-negative.
(2) The sum of the entries in each column of A is equal to 1:

  a1j + a2j + · · · + anj = 1.

Property (2) follows from the assumption that all members of the population must be
in one of the n states at any given time.

The distribution vector (or state vector) for the time period t is the vector xt , whose ith
entry is the percentage of the population in state i at time t. The entries of xt sum to 1,
for the reason just given, that all members of the population are in one of the states at
any time. Our first goal is to find the state vector for any t, and to do this we need to
solve the difference equation
xt = Axt−1.
A solution of the difference equation is an expression for the distribution vector xt in
terms of the original information A and x0 , and so, as we have seen in the previous
section, the solution is xt = Aᵗx0.
Now assume that A can be diagonalised. If A has eigenvalues λ1, λ2, . . . , λn with
corresponding eigenvectors v1, v2, . . . , vn, then P⁻¹AP = D, where P is the matrix of
eigenvectors of A and D is the corresponding diagonal matrix of eigenvalues.
The solution of the difference equation is

  xt = Aᵗx0 = (PDᵗP⁻¹)x0.

If we set x = Pz, so that z0 = P⁻¹x0 = (b1, b2, . . . , bn)ᵀ represents the coordinates of x0
in the basis of eigenvectors, then this solution can be written in vector form as

  xt = PDᵗ(P⁻¹x0)
     = (v1 v2 · · · vn) diag(λ1ᵗ, λ2ᵗ, . . . , λnᵗ) (b1, b2, . . . , bn)ᵀ
     = b1λ1ᵗv1 + b2λ2ᵗv2 + · · · + bnλnᵗvn.

11.2. Systems of difference equations

Activity 11.6

Verify the final statement above.

We now return to our example.


Example 11.5 We find the number of shoppers using each of the supermarkets at
the end of week t, and see if we can use this to predict the long-term distribution of
shoppers.
First diagonalise the matrix A. The characteristic equation of A is

  |A − λI| = | 0.70−λ   0.15     0.30   |
             |  0.20   0.80−λ    0.20   |  = −λ³ + 2λ² − 1.24λ + 0.24 = 0.
             |  0.10    0.05    0.50−λ  |

This equation is satisfied by λ = 1, hence 1 is an eigenvalue. Using the fact that
(λ − 1) is a factor of the polynomial, we find

  −(λ − 1)(λ² − λ + 0.24) = −(λ − 1)(λ − 0.6)(λ − 0.4) = 0,

so the eigenvalues are λ1 = 1, λ2 = 0.6, and λ3 = 0.4. The corresponding
eigenvectors vi are found by solving the homogeneous systems (A − λiI)v = 0 (we
omit the calculations). Writing them as the columns of a matrix P, we obtain

  P = [ 3  3  1 ]        D = [ 1  0   0  ]
      [ 4 −4  0 ],           [ 0 0.6  0  ],      P⁻¹AP = D.
      [ 1  1 −1 ]            [ 0  0  0.4 ]

Activity 11.7 Carry out the omitted calculations for the diagonalisation above.

The distribution vector xt at any time t is then given by

  xt = b1(1ᵗ)v1 + b2(0.6)ᵗv2 + b3(0.4)ᵗv3,

where it only remains to find the coordinates b1, b2, b3 of x0 in the basis of eigenvectors.
To find the long-term distribution of shoppers we need to consider what happens to xt
for very large values of t, that is, as t → ∞. Note, and this is very important, that
1ᵗ = 1, and that as t → ∞, (0.6)ᵗ → 0 and (0.4)ᵗ → 0, so that the limit of xt is a
multiple of the eigenvector whose eigenvalue is 1.
The coordinates of x0 in the basis of eigenvectors are given by

  P⁻¹x0 = (1/8) [ 1  1  1 ] [ 0.5 ]   [ 0.125 ]   [ b1 ]
                [ 1 −1  1 ] [ 0.4 ] = [ 0.025 ] = [ b2 ].
                [ 2  0 −6 ] [ 0.1 ]   [ 0.05  ]   [ b3 ]

Hence,

  xt = 0.125 (3, 4, 1)ᵀ + 0.025(0.6)ᵗ (3, −4, 1)ᵀ + 0.05(0.4)ᵗ (1, 0, −1)ᵀ

and

  q = lim_{t→∞} xt = (0.375, 0.500, 0.125)ᵀ.

As the total number of shoppers is 20,000, the long-term distribution is predicted to be
20,000q: 7,500 to supermarket A; 10,000 to B; and 2,500 to no supermarket.
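The long-term distribution can also be found numerically as the eigenvector for λ = 1, rescaled so its entries sum to 1. A minimal sketch (assuming NumPy):

    # Sketch: stationary distribution of the supermarket Markov chain.
    import numpy as np

    A = np.array([[0.70, 0.15, 0.30],
                  [0.20, 0.80, 0.20],
                  [0.10, 0.05, 0.50]])
    evals, evecs = np.linalg.eig(A)
    k = np.argmin(abs(evals - 1))     # index of the eigenvalue closest to 1
    q = evecs[:, k].real
    q = q / q.sum()                   # q = (0.375, 0.5, 0.125)
    print(20000 * q)                  # 7500, 10000, 2500 shoppers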
Activity 11.8 Verify that P⁻¹ is as stated.

You will have noticed that an essential part of the solution of predicting a long-term
distribution for this example is the fact that the transition matrix A has an eigenvalue
λ = 1 (of multiplicity one), and that the other eigenvalues satisfy |λi| < 1. In this case,
as t increases, the distribution vector xt will approach the unique eigenvector q for
λ = 1 which is also a distribution vector, so that Aq = q. (The fact that the entries sum
to 1 makes q unique in this one-dimensional eigenspace.)
We would like to be able to know that this is the case for any Markov chain, but there
are some exceptions to this rule. A Markov chain is said to be regular if some integer
power of the transition matrix A has strictly positive entries, aij > 0 (so no zero
entries). In this case, there will be a long-term distribution, as the following theorem
implies.
Theorem 11.1 If A is the transition matrix of a regular Markov chain, then λ = 1 is
an eigenvalue of multiplicity one and all other eigenvalues satisfy |λi| < 1.
We will not prove this theorem here; a proof can be found in texts on Markov chains.
However, we will prove a similar, but less strong, result, which makes it clear that the
only thing that can go wrong is for the eigenvalue λ = 1 to have multiplicity greater
than 1. First we need a definition.
Definition 11.1 A matrix C is called a stochastic matrix if it has the following two
properties:
(1) The entries of C are all non-negative.
(2) The sum of the entries in each row of C is equal to 1: ci1 + ci2 + · · · + cin = 1.

Note that if A is a transition matrix for a Markov process, then Aᵀ is a stochastic
matrix (all of its entries are non-negative and the sum of the entries of each row of Aᵀ
is equal to 1).
Matrices A and Aᵀ have the same eigenvalues because, by properties of transpose and
determinant, they have the same characteristic polynomials,

  |A − λI| = |(A − λI)ᵀ| = |Aᵀ − λI|,

and therefore they have the same characteristic equations and the same solutions to
these.
We will prove the following theorem for stochastic matrices, and then apply it to
transition matrices.


Theorem 11.2 If C is a stochastic matrix, then:
  v = (1, 1, . . . , 1)ᵀ is an eigenvector of C with eigenvalue λ = 1;
  if λ is an eigenvalue of C then |λ| ≤ 1.
Proof
Let C = (cij). That Cv = v follows immediately from property (2) of the definition of
a stochastic matrix, since the ith entry of Cv is ci1(1) + ci2(1) + · · · + cin(1) = 1.
To prove the second statement, let λ be an eigenvalue of C, let u ≠ 0 be any vector
satisfying Cu = λu, and let ui denote the largest component (in absolute value) of u.
To show that |λ| ≤ 1, set

  w = (1/ui)u.   Then Cw = λw, wi = 1, and |wk| ≤ 1 for 1 ≤ k ≤ n.

Consider the ith row of λw = Cw. It follows that

  λwi = ci1w1 + ci2w2 + · · · + cinwn,

and hence

  |λ| = |ci1w1 + ci2w2 + · · · + cinwn|        (since wi = 1)
      ≤ ci1|w1| + ci2|w2| + · · · + cin|wn|
      ≤ ci1 + ci2 + · · · + cin = 1            (because |wk| ≤ 1).

So we've shown that λ = 1 is an eigenvalue and that all eigenvalues λi satisfy |λi| ≤ 1.
What does this theorem imply about Markov chains? We saw earlier that if A is the
transition matrix of a Markov chain, then Aᵀ is a stochastic matrix and A and Aᵀ have
the same eigenvalues. Therefore you can deduce from Theorem 11.2 that:
  λ = 1 is an eigenvalue of A, and
  if λi is an eigenvalue of A then |λi| ≤ 1.
The theorem tells us that λ = 1 is an eigenvalue, but it might have multiplicity greater
than one, in which case either there would be more than one (linearly independent)
eigenvector corresponding to λ = 1, or the matrix might not be diagonalisable.
In order to obtain a long-term distribution we need to know that there is only one
(linearly independent) eigenvector for the eigenvalue λ = 1. So if the eigenvalue λ = 1 of
a transition matrix A of a Markov chain does have multiplicity 1, then Theorem 11.2
implies all the others will have |λi| < 1. There will be one corresponding eigenvector
which is also a distribution vector and, provided A can be diagonalised, we will know
that there is a long-term distribution. This is all we will need in practice.
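These two conclusions are easy to observe numerically. A minimal sketch (assuming NumPy), using the transition matrix of Example 11.4:

    # Sketch: column sums 1, eigenvalue 1, other eigenvalues of modulus < 1.
    import numpy as np

    A = np.array([[0.70, 0.15, 0.30],
                  [0.20, 0.80, 0.20],
                  [0.10, 0.05, 0.50]])
    assert np.allclose(A.sum(axis=0), 1.0)     # column sums are 1
    evals = np.linalg.eigvals(A)
    print(sorted(abs(evals), reverse=True))    # 1.0, then 0.6 and 0.4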


Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
calculate the general nth power of a diagonalisable matrix using diagonalisation
solve systems of difference equations in which the underlying matrix is
diagonalisable, by using both the matrix powers method and the change of variable
method
know what is meant by a Markov chain and its properties, and be able to find the
long-term distribution.

Exercises
Exercise 11.1
Solve the following system of difference equations.
xt+1 = xt + 4yt
1
yt+1 =
xt ,
2
given that x0 = y0 = 1000.
Exercise 11.2
Verify that the vector v is an eigenvector for the matrix M, where v and M are

  M = [ 2 0 1 ]
      [ 0 5 0 ],        v = (1, 0, 1)ᵀ,
      [ 1 0 2 ]

and find the corresponding eigenvalue. Find the other eigenvalues of M. Hence find an
invertible matrix P and a diagonal matrix D such that P⁻¹MP = D.
Sequences xt, yt, zt are defined by x0 = 6, y0 = 1, z0 = 4 and

  xt+1 = 2xt + zt
  yt+1 = 5yt
  zt+1 = xt + 2zt.

Using the preceding calculations to change variables, find formulae for xt, yt, and zt.
Exercise 11.3
Given that

  (1, −1, 1)ᵀ,   (−3, 0, 1)ᵀ,   (1, −1, 0)ᵀ

are eigenvectors of the matrix

  A = [  1 −2 −6 ]
      [  2  5  6 ],
      [ −2 −2 −3 ]

find an invertible matrix P such that P⁻¹AP is diagonal. Hence find sequences xt, yt, zt
satisfying the equations

  xt+1 = xt − 2yt − 6zt
  yt+1 = 2xt + 5yt + 6zt
  zt+1 = −2xt − 2yt − 3zt,

and with the property that x0 = y0 = 1 and z0 = 0.
Exercise 11.4
At any time t, the total population of 210 of Desert Island is divided into those living
by the sea (xt) and those living in the oasis (yt). Initially half the population is living by
the sea, and half in the oasis. Yearly population movements are given by

  xt = Axt−1,   where   A = [ 0.6 0.2 ]        xt = ( xt )
                            [ 0.4 0.8 ],            ( yt ).

Show this is a Markov process and interpret the yearly population movements from the
matrix A.
Find expressions for xt and yt at any future time t.
Determine the long-term population distribution; that is, find what happens to xt as
t → ∞.
Exercise 11.5
Consider the matrices

  A = [ 0.7 0.2 0.2 ]        B = [ 7 2 2 ]
      [  0  0.2 0.4 ],           [ 0 2 4 ],        xt = (xt, yt, zt)ᵀ.
      [ 0.3 0.6 0.4 ]            [ 3 6 4 ]

(i) What is the relationship between the matrices $A$ and $B$?

State precisely what is meant by:
$\lambda$ is an eigenvalue of a matrix $M$ with corresponding eigenvector $v$.
Show that $A$ and $B$ have the same eigenvectors. What is the relationship between the corresponding eigenvalues?
(ii) Show that the system $\mathbf{x}_t = A\mathbf{x}_{t-1}$ is a Markov chain by showing that the matrix $A$ satisfies the two conditions to be the transition matrix of a Markov chain.
Deduce that $\lambda = 10$ is an eigenvalue of $B$.
(iii) Find an eigenvector of $B$ corresponding to the eigenvalue $\lambda = 10$.
Diagonalise the matrix $B$: find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}BP = D$. Check that $BP = PD$.
(iv) Write down the eigenvalues and corresponding eigenvectors of A.
Suppose a monitored total population of 100,000 food shoppers in a certain urban
area is initially divided as follows: 30,000 go to the A-B supermarket, 30,000 go
to the J-K supermarket, and 40,000 shop locally (at small local shops).


Let xt denote the number of shoppers going to the A-B supermarket at the end of
week t, yt the number going to the J-K supermarket, and zt the number who shop
locally. Assume that each member of the population goes shopping once a week.
Suppose that the probabilities a shopper will change from shopping in any of the
three places from one week to the next are given by the matrix A above, so that
$\mathbf{x}_t = A\mathbf{x}_{t-1}$.
Find the long-term population distribution of this system. State clearly the
eventual number of shoppers going to each of the three shopping areas.

Comments on selected activities


Feedback to activity 11.1
Take any $2 \times 2$ diagonal matrix $D$. Calculate $D^2$ and $D^3$, and observe what happens.
Feedback to activity 11.4
We have
\[
\mathbf{x}_t = A\mathbf{x}_{t-1} = AA\mathbf{x}_{t-2} = A^2 \mathbf{x}_{t-2} = A^2 A\mathbf{x}_{t-3} = A^3 \mathbf{x}_{t-3} = \cdots = A^t \mathbf{x}_0.
\]
Feedback to activity 11.6
First multiply the two matrices on the right to obtain
\[
D^t (P^{-1}\mathbf{x}_0) =
\begin{pmatrix}
\lambda_1^t & 0 & \cdots & 0 \\
0 & \lambda_2^t & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n^t
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
=
\begin{pmatrix} b_1 \lambda_1^t \\ b_2 \lambda_2^t \\ \vdots \\ b_n \lambda_n^t \end{pmatrix}.
\]
Then express the product $P(D^t(P^{-1}\mathbf{x}_0))$ as a linear combination of the columns of $P$ (see page 135),
\[
P(D^t(P^{-1}\mathbf{x}_0)) =
\begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{pmatrix}
\begin{pmatrix} b_1 \lambda_1^t \\ b_2 \lambda_2^t \\ \vdots \\ b_n \lambda_n^t \end{pmatrix}
= b_1 \lambda_1^t v_1 + b_2 \lambda_2^t v_2 + \cdots + b_n \lambda_n^t v_n.
\]
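If you would like to see this computation carried out numerically, here is a sketch in Python with NumPy, using the eigenvalues and eigenvectors of the matrix that appears in Exercise 11.1 (the variable names are ours):

```python
import numpy as np

P = np.array([[4.0, -2.0],
              [1.0,  1.0]])      # columns are eigenvectors v1, v2
lam = np.array([2.0, -1.0])      # corresponding eigenvalues
x0 = np.array([1000.0, 1000.0])

b = np.linalg.solve(P, x0)       # b = P^{-1} x0
t = 5
# x_t as the linear combination b1*lam1^t*v1 + b2*lam2^t*v2:
xt = sum(b[i] * lam[i]**t * P[:, i] for i in range(2))
# The same thing computed as P D^t (P^{-1} x0):
print(xt, P @ np.diag(lam**t) @ b)   # the two agree
```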

Comments on exercises
Solution to exercise 11.1
We solve this using matrix powers. We could, of course, use a change of variable
instead. Notice that the system can be written as
\[
\mathbf{x}_{t+1} = \begin{pmatrix} 1 & 4 \\ 1/2 & 0 \end{pmatrix} \mathbf{x}_t,
\quad \text{where } \mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}.
\]
This is $\mathbf{x}_{t+1} = A\mathbf{x}_t$, where $A$ is the matrix whose $n$th power we calculated in the example given earlier in this chapter. The solution (using the $n$th power result obtained
earlier) is
\[
\mathbf{x}_t = A^t \mathbf{x}_0
= \frac{1}{6}\begin{pmatrix} 2(-1)^t + 4(2^t) & -8(-1)^t + 8(2^t) \\ -(-1)^t + 2^t & 4(-1)^t + 2(2^t) \end{pmatrix}
\begin{pmatrix} 1000 \\ 1000 \end{pmatrix}
= \begin{pmatrix} -1000(-1)^t + 2000(2^t) \\ 500(-1)^t + 500(2^t) \end{pmatrix}.
\]
That is,
\[
x_t = -1000(-1)^t + 2000(2^t), \qquad y_t = 500(-1)^t + 500(2^t).
\]
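Nothing like this is expected in the examination, but a few lines of Python confirm the closed form by iterating the recurrence directly:

```python
x, y = 1000, 1000
for t in range(1, 11):
    x, y = x + 4*y, x/2                        # the recurrence itself
    assert x == -1000*(-1)**t + 2000*2**t      # closed form for x_t
    assert y == 500*(-1)**t + 500*2**t         # closed form for y_t
print("closed form agrees with the recurrence for t = 1, ..., 10")
```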
Solution to exercise 11.2
$Mv = 3v$ so the eigenvalue corresponding to $v$ is 3. The characteristic polynomial turns out to be $(5 - \lambda)(3 - \lambda)(1 - \lambda)$ after factorisation, so the eigenvalues of $M$ are 1, 3, 5. Corresponding eigenvectors are (respectively) $(1, 0, -1)^T$, $v$, $(0, 1, 0)^T$. Then $P^{-1}MP = D$ where
\[
P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}, \qquad D = \mathrm{diag}(1, 3, 5).
\]
We use the notation used earlier. The system of difference equations is $\mathbf{x}_{t+1} = M\mathbf{x}_t$. Setting $\mathbf{x}_t = P\mathbf{z}_t$, it becomes $\mathbf{z}_{t+1} = D\mathbf{z}_t$, so
\[
u_{t+1} = u_t, \qquad v_{t+1} = 3v_t, \qquad w_{t+1} = 5w_t,
\]
therefore,
\[
u_t = u_0, \qquad v_t = 3^t v_0, \qquad w_t = 5^t w_0.
\]
Now,
\[
\mathbf{x}_0 = \begin{pmatrix} 6 \\ 1 \\ 4 \end{pmatrix} = P\mathbf{z}_0
= \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \begin{pmatrix} u_0 + v_0 \\ w_0 \\ -u_0 + v_0 \end{pmatrix},
\]
so $u_0 = 1$, $v_0 = 5$ and $w_0 = 1$. We therefore obtain
\[
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix} = \mathbf{x}_t = P\mathbf{z}_t
= \begin{pmatrix} 1 + 5(3^t) \\ 5^t \\ -1 + 5(3^t) \end{pmatrix}.
\]

Note: the question explicitly says to use the diagonalisation to change variables in order
to solve the system of difference equations.
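If you have NumPy to hand, a quick machine check of both the diagonalisation and the final formulas (a sketch only) is:

```python
import numpy as np

M = np.array([[2, 0, 1], [0, 5, 0], [1, 0, 2]], dtype=float)
P = np.array([[1, 1, 0], [0, 0, 1], [-1, 1, 0]], dtype=float)
print(np.round(np.linalg.inv(P) @ M @ P, 10))   # diag(1, 3, 5)

x = np.array([6.0, 1.0, 4.0])                   # (x_0, y_0, z_0)
for t in range(1, 6):
    x = M @ x
    assert np.allclose(x, [1 + 5*3**t, 5**t, -1 + 5*3**t])
```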


Solution to exercise 11.3


Eigenvectors are given, so there is no need to determine the characteristic polynomial to find the eigenvalues. Simply multiply $A$ times the given eigenvectors in turn. For example,
\[
A(1, -1, 1)^T = (-3, 3, -3)^T = -3(1, -1, 1)^T,
\]
so $-3$ is an eigenvalue and this vector is a corresponding eigenvector. The other two are eigenvectors for eigenvalue 3. So if
\[
P = \begin{pmatrix} 1 & -3 & -1 \\ -1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
\]
then
\[
P^{-1}AP = \mathrm{diag}(-3, 3, 3) = D.
\]
The system of difference equations is $\mathbf{x}_{t+1} = A\mathbf{x}_t$. Let $\mathbf{z}_t = (u_t, v_t, w_t)^T$ be given by $\mathbf{z}_t = P^{-1}\mathbf{x}_t$. Then the system is equivalent to $\mathbf{z}_{t+1} = D\mathbf{z}_t$, which is
\[
u_{t+1} = -3u_t, \qquad v_{t+1} = 3v_t, \qquad w_{t+1} = 3w_t.
\]
This has solutions
\[
u_t = (-3)^t u_0, \qquad v_t = 3^t v_0, \qquad w_t = 3^t w_0.
\]
We have to find $u_0$, $v_0$, $w_0$. Now $\mathbf{z}_0 = (u_0, v_0, w_0)^T = P^{-1}\mathbf{x}_0$, and (as can be determined by the usual methods),
\[
P^{-1} = \begin{pmatrix} 1/3 & 1/3 & 1 \\ -1/3 & -1/3 & 0 \\ 1/3 & 4/3 & 1 \end{pmatrix},
\]
so
\[
\mathbf{z}_0 = P^{-1}\mathbf{x}_0
= \begin{pmatrix} 1/3 & 1/3 & 1 \\ -1/3 & -1/3 & 0 \\ 1/3 & 4/3 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 2/3 \\ -2/3 \\ 5/3 \end{pmatrix}.
\]
The solution $\mathbf{x}_t$ is therefore
\[
\mathbf{x}_t = P\mathbf{z}_t
= \begin{pmatrix} 1 & -3 & -1 \\ -1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} (2/3)(-3)^t \\ (-2/3)3^t \\ (5/3)3^t \end{pmatrix}
= \begin{pmatrix} (2/3)(-3)^t + (1/3)3^t \\ -(2/3)(-3)^t + (5/3)3^t \\ (2/3)(-3)^t - (2/3)3^t \end{pmatrix}.
\]
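The change-of-variables recipe used above can be packaged as a short reusable function. The sketch below assumes NumPy, whose `numpy.linalg.eig` returns the eigenvalues together with a matrix of corresponding eigenvectors; it is checked against the formulas just obtained.

```python
import numpy as np

def solve_difference_system(A, x0, t):
    """Return x_t for x_{t+1} = A x_t, assuming A is diagonalisable."""
    lam, P = np.linalg.eig(A)        # eigenvalues, eigenvector matrix
    z0 = np.linalg.solve(P, x0)      # z_0 = P^{-1} x_0
    return (P @ (lam**t * z0)).real  # x_t = P D^t z_0

A = np.array([[1, -2, -6], [2, 5, 6], [-2, -2, -3]], dtype=float)
x0 = np.array([1.0, 1.0, 0.0])
for t in range(5):
    formula = [(2/3)*(-3)**t + (1/3)*3**t,
               -(2/3)*(-3)**t + (5/3)*3**t,
               (2/3)*(-3)**t - (2/3)*3**t]
    assert np.allclose(solve_difference_system(A, x0, t), formula)
```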
Solution to exercise 11.4
This is a Markov process as it consists of a total population distributed into two states, and the matrix $A$ satisfies the criteria to be a transition matrix: (1) the entries are non-negative and (2) the sum of the entries in each column is 1.
Interpreting the system, each year 40% of those living by the sea move to the oasis (60% remain) and 20% of those living in the oasis move to the sea.
To solve the system we need to diagonalise the matrix $A$. First find the eigenvalues.
\[
|A - \lambda I| = \begin{vmatrix} 0.6 - \lambda & 0.2 \\ 0.4 & 0.8 - \lambda \end{vmatrix}
= 0.48 - 1.4\lambda + \lambda^2 - 0.08 = \lambda^2 - 1.4\lambda + 0.4 = (\lambda - 1)(\lambda - 0.4) = 0,
\]
so $\lambda = 1$ and $\lambda = 0.4$ are the eigenvalues.
Find corresponding eigenvectors by solving $(A - \lambda I)v = 0$:
\[
\lambda_1 = 1: \quad A - I = \begin{pmatrix} -0.4 & 0.2 \\ 0.4 & -0.2 \end{pmatrix}
\to \begin{pmatrix} 1 & -\tfrac{1}{2} \\ 0 & 0 \end{pmatrix}
\implies v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix},
\]
\[
\lambda_2 = 0.4: \quad A - 0.4I = \begin{pmatrix} 0.2 & 0.2 \\ 0.4 & 0.4 \end{pmatrix}
\to \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
\implies v_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
Then $\mathbf{x}_t = PD^tP^{-1}\mathbf{x}_0$. The initial distribution is $\mathbf{x}_0 = (0.5, 0.5)^T$. So
\[
\begin{pmatrix} x_t \\ y_t \end{pmatrix}
= \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 1^t & 0 \\ 0 & (0.4)^t \end{pmatrix}
\cdot \frac{1}{3}\begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}
\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \frac{1}{6}(0.4)^t\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
The expressions for $x_t$ and $y_t$ are
\[
x_t = \frac{1}{3} + \frac{1}{6}(0.4)^t, \qquad y_t = \frac{2}{3} - \frac{1}{6}(0.4)^t.
\]

As $t \to \infty$, $\mathbf{x}_t \to (1/3, 2/3)^T$. In terms of the original total population of 210 inhabitants, we multiply $\mathbf{x}_t$ by 210, so the long-term population distribution is 70 inhabitants living by the sea and 140 inhabitants living in the oasis.
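As a sanity check (a sketch assuming NumPy, and certainly not required working), iterating the chain for a few decades shows the population settling at exactly these values:

```python
import numpy as np

A = np.array([[0.6, 0.2],
              [0.4, 0.8]])
x = 210 * np.array([0.5, 0.5])   # initial split: half by the sea, half in the oasis
for _ in range(50):
    x = A @ x
print(np.round(x, 6))            # approximately (70, 140)
```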
Solution to exercise 11.5
(i) The matrix $B$ is a scalar multiple of $A$: $B = 10A$. You should be able to state a precise definition as asked:
If $M$ is an $n \times n$ matrix and $\lambda$ is a real number such that
\[
Mv = \lambda v \quad \text{for some vector } v \ne 0,
\]
then $\lambda$ is an eigenvalue of $M$ and $v$ is a corresponding eigenvector.
You are being asked for the definition, not to describe the method of finding eigenvalues and eigenvectors. Note that $v \ne 0$ is an important part of this definition.
Let $\lambda$ be an eigenvalue of $B$ with corresponding eigenvector $v$, so $Bv = \lambda v$. Then substituting $10A$ for $B$, we have
\[
10Av = \lambda v \implies Av = \frac{\lambda}{10}\, v,
\]
so $A$ and $B$ have the same eigenvectors, $v$, and $\lambda/10$ is the corresponding eigenvalue of $A$.
(ii) $A$ is the transition matrix of a Markov chain because:
1. All the entries are non-negative ($a_{ij} \ge 0$).
2. The sum of the entries in each column is 1.
Since $\lambda = 1$ is an eigenvalue of the transition matrix of a Markov chain, we can deduce that $10 \times 1 = 10$ is an eigenvalue of $B$.
(iii) To find an eigenvector for $\lambda = 10$, we solve $(B - 10I)\mathbf{x} = 0$.
\[
(B - 10I) = \begin{pmatrix} -3 & 2 & 2 \\ 0 & -8 & 4 \\ 3 & 6 & -6 \end{pmatrix}
\to \cdots \to
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -\tfrac{1}{2} \\ 0 & 0 & 0 \end{pmatrix}.
\]
So an eigenvector for $\lambda = 10$ is $v_1 = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}$.

To find the other eigenvalues, we find the characteristic equation. Expanding the determinant by the first column,
\[
|B - \lambda I| = \begin{vmatrix} 7-\lambda & 2 & 2 \\ 0 & 2-\lambda & 4 \\ 3 & 6 & 4-\lambda \end{vmatrix}
= (7-\lambda)(\lambda^2 - 6\lambda - 16) + 3(2\lambda + 4) = 0.
\]
Factoring the quadratic, there is a common factor of $\lambda + 2$ in the two terms, which can be factored out, avoiding a cubic equation. We have
\[
|B - \lambda I| = (\lambda+2)\big[(7-\lambda)(\lambda-8) + 6\big] = -(\lambda+2)(\lambda^2 - 15\lambda + 50) = -(\lambda+2)(\lambda-10)(\lambda-5).
\]
So the eigenvalues are $\lambda = 10, 5, -2$.
We then find the corresponding eigenvectors. Solving $(B - 5I)v = 0$,
\[
(B - 5I) = \begin{pmatrix} 2 & 2 & 2 \\ 0 & -3 & 4 \\ 3 & 6 & -1 \end{pmatrix}
\to \begin{pmatrix} 1 & 1 & 1 \\ 0 & -3 & 4 \\ 0 & 0 & 0 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & \tfrac{7}{3} \\ 0 & 1 & -\tfrac{4}{3} \\ 0 & 0 & 0 \end{pmatrix}.
\]
So an eigenvector for $\lambda = 5$ is $v_2 = \begin{pmatrix} -7 \\ 4 \\ 3 \end{pmatrix}$.
For $\lambda = -2$, we have
\[
(B + 2I) = \begin{pmatrix} 9 & 2 & 2 \\ 0 & 4 & 4 \\ 3 & 6 & 6 \end{pmatrix}
\to \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]
So an eigenvector for $\lambda = -2$ is $v_3 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$.
If
\[
P = \begin{pmatrix} 2 & -7 & 0 \\ 1 & 4 & 1 \\ 2 & 3 & -1 \end{pmatrix}
\quad \text{and} \quad
D = \begin{pmatrix} 10 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -2 \end{pmatrix},
\]
then $P^{-1}BP = D$.

The eigenvectors and eigenvalues must be listed in corresponding columns.


To check,
\[
BP = \begin{pmatrix} 7 & 2 & 2 \\ 0 & 2 & 4 \\ 3 & 6 & 4 \end{pmatrix}
\begin{pmatrix} 2 & -7 & 0 \\ 1 & 4 & 1 \\ 2 & 3 & -1 \end{pmatrix}
= \begin{pmatrix} 20 & -35 & 0 \\ 10 & 20 & -2 \\ 20 & 15 & 2 \end{pmatrix} = PD.
\]


Why are you being asked to check? So that you know you do have the correct
eigenvalues and eigenvectors. This gives you an opportunity to look for and correct any
minor mistakes you may have made.
(iv) You are now being asked to apply the diagonalisation to the given Markov chain. The eigenvalues and corresponding eigenvectors of $A$ are $\lambda = 1$ with eigenvector $v_1$, $\lambda = 0.5$ with corresponding eigenvector $v_2$ and $\lambda = -0.2$ with corresponding eigenvector $v_3$.

The long-term distribution of a Markov chain is given by the eigenvector for $\lambda = 1$. Therefore the distribution is proportional to the entries of the vector $v_1$:
\[
\frac{1}{5}\begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} \times 100{,}000 = \begin{pmatrix} 40{,}000 \\ 20{,}000 \\ 40{,}000 \end{pmatrix}.
\]
That is, 40,000 shoppers will go to the A-B supermarket, 20,000 will go to the J-K supermarket, and 40,000 will shop locally.

Notice that you did not need to find the solution to $\mathbf{x}_t = A\mathbf{x}_{t-1}$. This would have been a perfectly acceptable method, but one which would take much more time. You only needed to know that since $(0.5)^t \to 0$ and $(-0.2)^t \to 0$ as $t \to \infty$, the eigenvector corresponding to $\lambda = 1$ will give the long-term distribution. It must be a distribution vector, that is, the components of the column vector must sum to 1, so that when multiplied by the total population of 100,000 it gives the distribution of shoppers.
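For those who like to double-check by machine, here is a sketch (NumPy assumed; in the examination the check $BP = PD$ is done by hand as above):

```python
import numpy as np

B = np.array([[7, 2, 2], [0, 2, 4], [3, 6, 4]], dtype=float)
P = np.array([[2, -7, 0], [1, 4, 1], [2, 3, -1]], dtype=float)
D = np.diag([10.0, 5.0, -2.0])
print(np.allclose(B @ P, P @ D))   # True: the diagonalisation is correct

v1 = P[:, 0]                       # eigenvector of A = B/10 for lambda = 1
print(100_000 * v1 / v1.sum())     # (40000, 20000, 40000)
```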


Appendix A
Sample examination paper
Important note: This Sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2011/2012. The
intended format and structure of the examination may have changed since the
publication of this subject guide. You can find the most recent examination papers on
the VLE where all changes to the format of the examination are posted.

Time allowed: THREE HOURS


Candidates should answer all FIVE questions. All questions carry equal marks (20
marks each).
Calculators may not be used for this paper.
1(a) Consider the following system of equations, for some constants $a$ and $b$:
\[
\begin{aligned}
x - y + 2z &= 4\\
3x - y - z &= 0\\
x + y + az &= b.
\end{aligned}
\]
Use matrix methods to determine what values $a$ and $b$ must take if this system is consistent and has infinitely many solutions.
What must the value of $a$ not be if the system has precisely one solution?
What can be said about $a$ and $b$ if the system has no solutions?
(b) If a = 4 and b = 1, find the solution of the above system using any matrix method
(Gaussian elimination, inverse matrix, Cramer's rule).
(c) What does it mean to say that a set $\{x_1, x_2, \ldots, x_k\}$ of vectors in $\mathbb{R}^n$ is linearly dependent?
Show that the set $\{x_1, x_2, x_3, x_4\}$ of vectors in $\mathbb{R}^4$ is linearly dependent, where
\[
x_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}, \quad
x_2 = \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}, \quad
x_3 = \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}, \quad
x_4 = \begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix}.
\]
Express $x_4$ as a linear combination of the other three vectors.
2(a) A system of linear equations $A\mathbf{x} = \mathbf{d}$ is known to have the following solution:
\[
\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}
+ s\begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t\begin{pmatrix} 1 \\ -1 \\ 0 \\ -1 \\ 1 \end{pmatrix}.
\]


Assume that $A$ is an $m \times n$ matrix. Let $c_1, c_2, \ldots, c_n$ denote the columns of $A$.


Answer the following questions, or, if there is insufficient information to answer the
question, say so.
(1) What number is $n$?
(2) What number is $m$?
(3) What (number) is the rank of $A$?
(4) Write down a basis of the null space of $A$, $N(A)$.
(5) Which columns $c_i$ form a basis of the range, $R(A)$?
(6) Write down an expression for $\mathbf{d}$ as a linear combination of the columns $c_i$.
(7) Write down a non-trivial linear combination of the columns $c_i$ which is equal to the zero vector.
(b) A sequence $x_t$ satisfies
\[
x_{t+1} = \sqrt{a}\, x_t - \frac{a}{4}\, x_{t-1},
\]
for all $t \ge 1$, where $a > 0$ is a fixed number. If $x_0 = -1$ and $x_1 = \sqrt{a}$, find a formula (in terms of $t$ and $a$) for $x_t$.
(c) An investor saves money in a bank account paying interest at a fixed rate of 5%, where the interest is paid once per year, at the end of the year. She makes an initial deposit of \$20,000 and, then, at the end of each of the next $N$ years, just after the interest has been paid, she withdraws an amount of \$500. Find an expression, in terms of $N$, for the amount of money in the account at the end of $N$ years, just after the $N$th withdrawal has been made.
3(a) Consider the set
\[
H = \left\{ \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix} : t \in \mathbb{R} \right\}.
\]
Prove that the set $H$ is closed under addition and scalar multiplication. Hence, or otherwise, prove that it is a subspace of $\mathbb{R}^3$.
Show that every vector $w \in H$ is a unique linear combination of the vectors
\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix}.
\]
Answer the following questions, justifying any answers.
(1) Is $\{v_1, v_2\}$ a basis of the subspace $H$? If yes, state why. If no, write down a basis of $H$. State the dimension of $H$.
(2) Find a Cartesian equation for the subspace $G = \mathrm{Lin}\{v_1, v_2\}$. Is $\{v_1, v_2\}$ a basis of $G$? Why or why not?
(b) State the dimension (rank-nullity) theorem for a linear transformation, $T: V \to W$, from a finite dimensional vector space $V$ to a vector space $W$, carefully defining each term.

Let $\{e_1, e_2, e_3, e_4\}$ be the standard basis of $\mathbb{R}^4$, and let $v_1$, $v_2$, $v_3$, $\mathbf{x}$ be the following vectors in $\mathbb{R}^3$ (where $x$, $y$, $z$ are constants):
\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} -1 \\ 5 \\ 7 \end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Let $T$ be a linear transformation, $T: \mathbb{R}^4 \to \mathbb{R}^3$, given by
\[
T(e_1) = v_1, \quad T(e_2) = v_2, \quad T(e_3) = v_3, \quad T(e_4) = \mathbf{x}.
\]

(i) Suppose the vector $\mathbf{x}$ is such that the linear transformation $T$ has
\[
\dim R(T) = \dim N(T).
\]
Write down a condition that the components of $\mathbf{x}$ must satisfy for this to happen. Find a basis of $R(T)$ in this case.
(ii) Suppose the vector $\mathbf{x}$ is such that the linear transformation $T$ has
\[
\dim N(T) = 1.
\]
Write down a condition that the components of $\mathbf{x}$ must satisfy for this to happen. Find a basis of $N(T)$ in this case.
4 Suppose
\[
A = \begin{pmatrix} -1 & -2 & -1 \\ 4 & -4 & -8 \\ -13 & -2 & 11 \end{pmatrix}.
\]
Find a basis of the null space of $A$.
Deduce that $\lambda = 0$ is an eigenvalue of $A$ and write down the corresponding eigenvector. (Justify your answer using the definition of eigenvalue and eigenvector.)
Diagonalise the matrix $A$: find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
Using your answer, or otherwise, determine the sequences $(x_n)$, $(y_n)$, $(z_n)$ which have the following properties
\[
\begin{aligned}
x_{n+1} &= -x_n - 2y_n - z_n\\
y_{n+1} &= 4x_n - 4y_n - 8z_n\\
z_{n+1} &= -13x_n - 2y_n + 11z_n
\end{aligned}
\]
and which satisfy the initial conditions $x_0 = y_0 = 1$ and $z_0 = 0$.
5(a) Let
\[
A = \begin{pmatrix} 1 & 4 & 5 & 3 & 2 \\ 0 & 2 & 4 & 2 & 2 \\ -1 & 1 & 5 & 0 & 1 \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} 11 \\ 2 \\ 6 \end{pmatrix}.
\]
Solve the system of equations $A\mathbf{x} = \mathbf{b}$ using Gaussian elimination. (Put the augmented matrix into reduced row echelon form.) Express your solution in vector form (as $\mathbf{x} = \mathbf{p} + a_1 v_1 + \cdots + a_k v_k$, where $k$ is a positive integer).


Let $c_1, c_2, \ldots, c_5$ denote the columns of $A$.

Can you express $c_3$ as a linear combination of $c_1$ and $c_2$? Justify your answer. Write down a linear combination if one exists.
Explain how to deduce from the reduced row echelon form of $A$ that the set of vectors $B = \{c_1, c_2, c_4\}$ is linearly independent.
Why can you conclude that $B$ is a basis of $\mathbb{R}^3$?
Write down the coordinates of the vector $\mathbf{b}$ in this basis; that is, write down $[\mathbf{b}]_B$.
(b) Show that the set $S = \{c_1, c_3, c_4\}$ is also a basis of $\mathbb{R}^3$.
Find the transition matrix $P$ from coordinates in the basis $B$ to coordinates in the basis $S$.
Hence, or otherwise, find $[\mathbf{b}]_S$, the coordinates of the vector $\mathbf{b}$ in the basis $S$.


Appendix B
Commentary on the Sample examination paper

General remarks
We start by emphasising that candidates should always include their working. This
means two things. First, you should not simply write down the answer in the
examination script, but should explain the method by which it is obtained. Second, you
should include rough working. The Examiners want you to get the right answers, of
course, but it is more important that you demonstrate that you know what you are
doing: that is what is really being examined.
We also stress that if a candidate has not completely solved a problem, they may still
be awarded marks for a partial, incomplete, or slightly wrong, solution; but, if they have
written down a wrong answer and nothing else, no marks can be awarded.

Solutions to questions
Question 1(a) Since you are asked to use matrix methods, begin by thinking of the system of equations in matrix form, as $A\mathbf{x} = \mathbf{b}$ with
\[
A = \begin{pmatrix} 1 & -1 & 2 \\ 3 & -1 & -1 \\ 1 & 1 & a \end{pmatrix}, \qquad
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} 4 \\ 0 \\ b \end{pmatrix}.
\]
Read through the question to know all that is being asked. There are different
approaches you can take to start.

The most efficient method is to write down the augmented matrix and begin to row reduce it,
\[
(A|\mathbf{b}) = \left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 3 & -1 & -1 & 0 \\ 1 & 1 & a & b \end{array}\right)
\begin{array}{l} \\ R_2 - 3R_1 \\ R_3 - R_1 \end{array}
\to
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 2 & a-2 & b-4 \end{array}\right)
\begin{array}{l} \\ \\ R_3 - R_2 \end{array}
\to
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & a+5 & b+8 \end{array}\right).
\]
You are now in a position to answer the questions asked in the order in which they were asked.
The system will be consistent with infinitely many solutions if and only if the last row of the row echelon form is a row of zeros, so $a = -5$ and $b = -8$. It will have a unique solution if and only if $a + 5 \ne 0$, so $a \ne -5$. It will be inconsistent (no solution) if and only if $a + 5 = 0$ and $b + 8 \ne 0$, that is $a = -5$ and $b \ne -8$.

Alternatively, you can begin by evaluating the determinant of $A$, for example, using the cofactor expansion by row 3,
\[
\begin{vmatrix} 1 & -1 & 2 \\ 3 & -1 & -1 \\ 1 & 1 & a \end{vmatrix}
= 1(1 + 2) - 1(-1 - 6) + a(-1 + 3) = 10 + 2a.
\]
The system will have a unique solution if and only if $|A| \ne 0$, so $a \ne -5$. If $a = -5$ there will either be infinitely many solutions or no solution, depending on the value of $b$. To answer the remaining questions, you still need to row reduce the augmented matrix, but this time you can do it with $a = -5$,
\[
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 3 & -1 & -1 & 0 \\ 1 & 1 & -5 & b \end{array}\right)
\to
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 2 & -7 & b-4 \end{array}\right).
\]
Comparing the last two rows, you can see that the system will be inconsistent if $b - 4 \ne -12$, that is if $b \ne -8$ and $a = -5$, and that there will be infinitely many solutions if $b = -8$ and $a = -5$.

(b) If you have successfully solved part (a) of this question, then the easiest way to solve the system with $a = 4$ and $b = 1$ is to substitute these values into the row echelon form of the augmented matrix and continue reducing:
\[
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & a+5 & b+8 \end{array}\right)
= \left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & 9 & 9 \end{array}\right)
\to \left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & 1 & 1 \end{array}\right)
\]
\[
\to \left(\begin{array}{ccc|c} 1 & -1 & 0 & 2 \\ 0 & 2 & 0 & -5 \\ 0 & 0 & 1 & 1 \end{array}\right)
\to \left(\begin{array}{ccc|c} 1 & -1 & 0 & 2 \\ 0 & 1 & 0 & -\tfrac{5}{2} \\ 0 & 0 & 1 & 1 \end{array}\right)
\to \left(\begin{array}{ccc|c} 1 & 0 & 0 & -\tfrac{1}{2} \\ 0 & 1 & 0 & -\tfrac{5}{2} \\ 0 & 0 & 1 & 1 \end{array}\right).
\]
The unique solution is $\mathbf{x} = (x, y, z)^T = (-\tfrac{1}{2}, -\tfrac{5}{2}, 1)^T$. (It is easy for you to check that this is correct by substituting the values into the equations.)
You could also solve this system using the inverse matrix or Cramer's rule. These are covered in Chapter 4 of the subject guide. It is a good idea for you to practise these methods by solving this system to obtain the same answer.
(c) Linear independence is covered in Chapter 8 of the subject guide.
A set $\{x_1, x_2, \ldots, x_k\}$ of vectors in $\mathbb{R}^n$ is linearly dependent if there are real numbers $a_1, a_2, \ldots, a_k$, not all zero, such that
\[
a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0.
\]
Equivalently, the set $\{x_1, x_2, \ldots, x_k\}$ of vectors is linearly dependent if one of the vectors can be expressed as a linear combination of the others. (Either statement is acceptable.)

To show that the set $\{x_1, x_2, x_3, x_4\}$ of vectors in $\mathbb{R}^4$ is linearly dependent, where
\[
x_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}, \quad
x_2 = \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}, \quad
x_3 = \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}, \quad
x_4 = \begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix},
\]
you can write the vectors as the columns of a matrix $A$ and row reduce it, thereby solving the system of equations
\[
A\mathbf{x} = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 = \mathbf{0}.
\]
The steps are not shown here, but you should show all steps in the examination. The reduced row echelon form you should obtain is:
\[
A = \begin{pmatrix} 1 & 2 & 2 & 2 \\ 2 & 0 & 1 & 5 \\ 1 & 3 & 7 & 6 \\ 4 & 5 & 3 & 6 \end{pmatrix}
\to \cdots \to
\begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]
from which you can deduce that there are infinitely many solutions, since there is one non-leading variable, and therefore the vectors are linearly dependent.
To find the linear combination, you can spot the linear dependence relations between the columns of the reduced row echelon form, and the columns of $A$ will have the same relationship, namely,
\[
x_4 = 2x_1 - x_2 + x_3.
\]

Or you can find the solution $v = (-2, 1, -1, 1)^T$ of $A\mathbf{x} = \mathbf{0}$ and use it to write down the relationship between the columns of $A$, since
\[
Av = -2x_1 + x_2 - x_3 + x_4 = \mathbf{0},
\]
and then solve for $x_4$. Either way, it is easy to check (and you should do this) that your answer is correct by using the vectors,
\[
\begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix}
= 2\begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}
- \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}
+ \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}.
\]
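A short machine check (a sketch assuming NumPy) of both the dependency and the rank:

```python
import numpy as np

x1 = np.array([1, 2, 1, 4])
x2 = np.array([2, 0, 3, 5])
x3 = np.array([2, 1, 7, 3])
x4 = np.array([2, 5, 6, 6])

print(np.array_equal(2*x1 - x2 + x3, x4))   # True
A = np.column_stack([x1, x2, x3, x4])
print(np.linalg.matrix_rank(A))             # 3 < 4, so the set is dependent
```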
Question 2(a) This question is a good test of your understanding of the material in Chapter 3 of the subject guide. If $A$ is an $m \times n$ matrix with columns $c_1, c_2, \ldots, c_n$ such that the system of linear equations $A\mathbf{x} = \mathbf{d}$ has solution:
\[
\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}
+ s\begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t\begin{pmatrix} 1 \\ -1 \\ 0 \\ -1 \\ 1 \end{pmatrix}
= \mathbf{p} + s v_1 + t v_2,
\]
then you should be able to deduce certain properties of the matrix $A$ just by looking at the solution.


(1) The number of columns, $n = 5$. Why? Because the solutions, $\mathbf{x}$, are $5 \times 1$ vectors, and the multiplication $A\mathbf{x}$ is only defined if $A$ has the same number of columns as $\mathbf{x}$ has rows.
(2) The number $m$ cannot be determined. (But $m \ge 3$ from part (3).)
(3) The rank of $A$ is 3. Essentially, this is deduced from the rank-nullity theorem, which says that $\mathrm{rank}(A) + \mathrm{nullity}(A) = n$, where $n$ is the number of columns of $A$. So the rank $r$ is $r = n - \dim(N(A))$. You have also seen that the general solution of $A\mathbf{x} = \mathbf{b}$ is of the form
\[
\mathbf{x} = \mathbf{p} + a_1 v_1 + \cdots + a_{n-r} v_{n-r}
\]
and the given solution is of the form $\mathbf{x} = \mathbf{p} + s v_1 + t v_2$, so $\dim(N(A)) = 2$ and $r = 5 - 2 = 3$.
(4) The two vectors $v_1$ and $v_2$ form a basis of the null space of $A$, $N(A)$. So $\{v_1, v_2\}$ is a basis, where
\[
v_1 = (-2, 1, 1, 0, 0)^T \quad \text{and} \quad v_2 = (1, -1, 0, -1, 1)^T.
\]
(5) To answer this you need a good understanding of how the general solution is obtained using Gaussian elimination. By looking at the solution, you can tell the positions of the leading variables and the non-leading variables in the reduced row echelon form of $A$. The non-leading variables must be in the third and fifth columns because of the positions of 0 and 1 in the solution vectors, and the leading ones must be in the first, second and fourth columns. So a basis of the range, $R(A)$, is the set of vectors $\{c_1, c_2, c_4\}$.
(6) From $A\mathbf{p} = \mathbf{d}$, you can deduce that $\mathbf{d} = c_1 + 2c_2 - c_4$. Any solution $\mathbf{x}$, so any value of $s$ and $t$, will also give you a vector such that $A\mathbf{x} = \mathbf{d}$, and so a different linear combination, but $\mathbf{p}$ is the simplest one to use.
(7) In the same way, using $Av_1 = \mathbf{0}$, or $Av_2 = \mathbf{0}$, you obtain the linear combinations
\[
-2c_1 + c_2 + c_3 = \mathbf{0} \quad \text{or} \quad c_1 - c_2 - c_4 + c_5 = \mathbf{0}.
\]
Again, any linear combination of $v_1$ and $v_2$ can be used.
(b) This is a second-order difference equation, as covered in Chapter 6 of the subject guide. In standard form, we have
\[
x_{t+1} - \sqrt{a}\, x_t + \frac{a}{4}\, x_{t-1} = 0,
\]
so the auxiliary equation is $z^2 - \sqrt{a}\, z + \frac{a}{4} = 0$, which is $\left(z - \frac{\sqrt{a}}{2}\right)^2 = 0$, so there is just one solution, $\frac{\sqrt{a}}{2}$. Therefore, for some constants $A$ and $B$,
\[
x_t = (At + B)\left(\frac{\sqrt{a}}{2}\right)^t.
\]
The facts that $x_0 = -1$ and $x_1 = \sqrt{a}$ show that $A = 3$ and $B = -1$, so
\[
x_t = (3t - 1)\left(\frac{\sqrt{a}}{2}\right)^t.
\]
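A quick numerical check of this closed form (a sketch in plain Python; pick any $a > 0$):

```python
import math

a = 2.0
x = lambda t: (3*t - 1) * (math.sqrt(a) / 2)**t

assert abs(x(0) + 1) < 1e-12             # x_0 = -1
assert abs(x(1) - math.sqrt(a)) < 1e-12  # x_1 = sqrt(a)
for t in range(1, 10):                   # the recurrence itself
    assert abs(x(t+1) - (math.sqrt(a)*x(t) - (a/4)*x(t-1))) < 1e-9
print("closed form verified")
```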
(c) Let $y_n$ be the amount of money after the $n$th withdrawal. Then:
\[
\begin{aligned}
y_1 &= 20000(1.05) - 500,\\
y_2 &= (1.05)y_1 - 500 = 20000(1.05)^2 - 500(1.05) - 500,\\
y_3 &= (1.05)y_2 - 500 = 20000(1.05)^3 - 500(1.05)^2 - 500(1.05) - 500.
\end{aligned}
\]
Spotting the pattern,
\[
\begin{aligned}
y_N &= 20000(1.05)^N - 500(1.05)^{N-1} - 500(1.05)^{N-2} - \cdots - 500(1.05) - 500\\
&= 20000(1.05)^N - 500\,\frac{(1.05)^N - 1}{(1.05) - 1}\\
&= 20000(1.05)^N - 10000\big((1.05)^N - 1\big)\\
&= 10000(1.05)^N + 10000.
\end{aligned}
\]
The question can also be solved using difference equations.
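A year-by-year simulation (a sketch in plain Python) confirms the formula:

```python
y = 20000.0
for N in range(1, 31):
    y = 1.05 * y - 500                  # interest paid, then withdrawal
    assert abs(y - (10000 * 1.05**N + 10000)) < 1e-6
print("y_N = 10000(1.05)^N + 10000 confirmed for N = 1, ..., 30")
```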


Question 3(a) This question is covered in Chapter 7 of the subject guide. It also relies on understanding lines and planes in $\mathbb{R}^3$ as covered in Chapter 2.
To show
\[
H = \left\{ \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix} : t \in \mathbb{R} \right\}
\]
is closed under addition, let $u, v \in H$. Then
\[
u = \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix}, \quad v = \begin{pmatrix} 2s \\ s \\ 3s \end{pmatrix}, \quad \text{for some } s, t \in \mathbb{R}.
\]
Then
\[
u + v = \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix} + \begin{pmatrix} 2s \\ s \\ 3s \end{pmatrix}
= \begin{pmatrix} 2t + 2s \\ t + s \\ 3t + 3s \end{pmatrix}
= \begin{pmatrix} 2(t+s) \\ t+s \\ 3(t+s) \end{pmatrix} \in H
\]
since $(t + s) \in \mathbb{R}$. Therefore, $H$ is closed under addition.

To show $H$ is closed under scalar multiplication, let $u \in H$, $\alpha \in \mathbb{R}$. Then
\[
\alpha u = \alpha \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix}
= \begin{pmatrix} \alpha(2t) \\ \alpha t \\ \alpha(3t) \end{pmatrix}
= \begin{pmatrix} 2(\alpha t) \\ \alpha t \\ 3(\alpha t) \end{pmatrix} \in H
\]
since $(\alpha t) \in \mathbb{R}$. Therefore, $H$ is closed under scalar multiplication.
The set $H$ is non-empty, since the vector $\mathbf{0} \in H$, as well as the vector $v = (2, 1, 3)^T$. Since $H$ is also closed under addition and scalar multiplication, it is a subspace.

For the next part, let $w \in H$, $w = \begin{pmatrix} 2s \\ s \\ 3s \end{pmatrix}$ for some constant $s \in \mathbb{R}$. You need to find constants $a$, $b$ such that
\[
\begin{pmatrix} 2s \\ s \\ 3s \end{pmatrix}
= a\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + b\begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix},
\]
which is equivalent to the simultaneous equations: $2s = a$, $s = b$, $3s = -a + 5b$. Substituting the values of $a$ and $b$ obtained from the first two equations into the third equation, we find that these values also satisfy the third equation. Therefore the system has the unique solution $a = 2s$, $b = s$, and $w = (2s)v_1 + (s)v_2$.

To answer the remaining questions, it helps for you to see what is going on.
(1) The set $\{v_1, v_2\}$ is NOT a basis of the subspace $H$ since $v_1 \notin H$ (and also, $v_2 \notin H$). A basis of $H$ is $\{v\}$ where $v = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$; and $\dim(H) = 1$.
(2) A Cartesian equation for the subspace $G = \mathrm{Lin}\{v_1, v_2\}$ is given by
\[
\begin{vmatrix} 1 & 0 & x \\ 0 & 1 & y \\ -1 & 5 & z \end{vmatrix} = x - 5y + z = 0.
\]
(This can be easily checked by substituting in the components of $v_1$ and $v_2$, and you should do this.) The set $\{v_1, v_2\}$ is a basis of $G$. It spans as $G$ is, by definition, the set of all linear combinations of $v_1$ and $v_2$. It is linearly independent as neither vector is a scalar multiple of the other.
(b) Linear transformations are covered in Chapter 9 of the subject guide.
If $T: V \to W$ is a linear transformation, and $\dim(V) = n$, then the dimension theorem states that
\[
\dim R(T) + \dim N(T) = n, \quad \text{or} \quad \mathrm{rank}(T) + \mathrm{nullity}(T) = n,
\]
where $\mathrm{nullity}(T)$ is the dimension of $N(T)$, the kernel, or null space, of $T$ and $\mathrm{rank}(T)$ is the dimension of $R(T)$, the range of $T$. Note that you must specifically say what $n$ represents; that is, $n = \dim V$.
Let $\{e_1, e_2, e_3, e_4\}$ be the standard basis of $\mathbb{R}^4$, and $v_1$, $v_2$, $v_3$, $\mathbf{x}$ the vectors
\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} -1 \\ 5 \\ 7 \end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix},
\]
and let $T$ be the linear transformation, $T: \mathbb{R}^4 \to \mathbb{R}^3$, given by
\[
T(e_1) = v_1, \quad T(e_2) = v_2, \quad T(e_3) = v_3, \quad T(e_4) = \mathbf{x}.
\]
Then $T$ is given by $T(\mathbf{x}) = A\mathbf{x}$ where $A$ is a $3 \times 4$ matrix. The simplest way to answer the questions is to construct this matrix, whose columns are the images of the standard basis vectors, $T(e_i)$:
\[
A = \begin{pmatrix} 1 & 2 & -1 & x \\ 0 & 3 & 5 & y \\ -2 & -1 & 7 & z \end{pmatrix}.
\]
In order to consider the two possibilities in parts (i) and (ii), row reduce this matrix, beginning with $R_3 + 2R_1$,
\[
A \to \begin{pmatrix} 1 & 2 & -1 & x \\ 0 & 3 & 5 & y \\ 0 & 3 & 5 & z + 2x \end{pmatrix}
\to \begin{pmatrix} 1 & 2 & -1 & x \\ 0 & 3 & 5 & y \\ 0 & 0 & 0 & z + 2x - y \end{pmatrix}.
\]
(i) By the dimension theorem, since $T: \mathbb{R}^4 \to \mathbb{R}^3$, $n = 4$, so for the dimensions of $R(T)$ and $N(T)$ to be equal, the subspaces must both have dimension 2. Looking at the reduced form of the matrix, we see that this will happen if
\[
2x - y + z = 0.
\]
If the vector $\mathbf{x}$ satisfies this condition, then a basis of $R(T)$ is given by the columns of $A$ corresponding to the leading ones in the row echelon form, which will be the first two columns. So a basis of $R(T)$ is $\{v_1, v_2\}$.
You could also approach this question by first deducing from the dimension theorem that $\dim R(T) = 2$ as above, so $R(T)$ is a plane in $\mathbb{R}^3$. Therefore $\{v_1, v_2\}$ is a basis, since these two vectors are linearly independent (because they are not scalar multiples) and they span a plane whose Cartesian equation is given by
\[
\begin{vmatrix} 1 & 2 & x \\ 0 & 3 & y \\ -2 & -1 & z \end{vmatrix} = 6x - 3y + 3z = 0,
\]
or $2x - y + z = 0$. The components of the vector $v_3$ satisfy this equation, and this is the condition that the components of $\mathbf{x}$ must satisfy.
(ii) If the linear transformation has $\dim N(T) = 1$, then by the dimension theorem, you know that $\dim R(T) = 3$ (therefore $R(T) = \mathbb{R}^3$), so the echelon form of the matrix $A$ needs to have 3 leading ones. Therefore the condition that the components of $\mathbf{x}$ must satisfy is
\[
2x - y + z \ne 0.
\]
Now continue row reducing the matrix $A$ to obtain a basis for $N(T)$. The row echelon form of $A$ will have a leading one in the last column (first multiply the last row by $1/(2x - y + z)$ to get this leading one, then continue to reduced echelon form):
\[
A \to \cdots \to
\begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 3 & 5 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\to \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & \tfrac{5}{3} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & -\tfrac{13}{3} & 0 \\ 0 & 1 & \tfrac{5}{3} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\]
so a basis of $N(T)$ is given by the vector
\[
w = \begin{pmatrix} \tfrac{13}{3} \\ -\tfrac{5}{3} \\ 1 \\ 0 \end{pmatrix}
\]
or any non-zero scalar multiple of this, such as $(13, -5, 3, 0)^T$.

Question 4 To find a basis of the null space of the matrix $A$, put it into reduced row echelon form using the algorithm. The steps are not shown, but you should be able to carry them out efficiently and accurately, and you should show all the steps in the examination.
\[
A = \begin{pmatrix} -1 & -2 & -1 \\ 4 & -4 & -8 \\ -13 & -2 & 11 \end{pmatrix}
\to \cdots \to
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]
You can read the solution of the homogeneous system $A\mathbf{x} = \mathbf{0}$ from the reduced echelon form of the matrix, setting $z = t$, $t \in \mathbb{R}$, to obtain the general solution
\[
\mathbf{x} = t\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = t v_1, \quad t \in \mathbb{R}.
\]
The vector $v_1 = (1, -1, 1)^T$ is a basis of the null space.

Since $Av_1 = \mathbf{0} = 0v_1$, the vector $v_1$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1 = 0$. (This statement invokes the definition of eigenvalue and eigenvector, namely that $Av = \lambda v$ for some $v \ne 0$.)
To diagonalise the matrix $A$, you first need to find the remaining eigenvalues by solving $|A - \lambda I| = 0$. The characteristic equation is
\[
|A - \lambda I| = -\lambda^3 + 6\lambda^2 + 72\lambda = -\lambda(\lambda + 6)(\lambda - 12) = 0.
\]
Again, the steps are not shown here, but you should show them all in an examination. You need to expand the determinant slowly and carefully to avoid errors. The eigenvalues are $\lambda_1 = 0$, $\lambda_2 = -6$, $\lambda_3 = 12$.

Next solve $(A - \lambda I)v = \mathbf{0}$ for each of the other two eigenvalues. In each case the reduced echelon form of the matrix $(A - \lambda I)$ should contain a row of zeros, so that there is a non-trivial solution giving the corresponding eigenvector. This checks that the eigenvalues are correct. If the reduced echelon form of $(A - \lambda I)$ does not contain a row of zeros, then you need to find your error. This may be in the row reduction, or it may be in your characteristic equation or factorising. One quick way to check whether your eigenvalue is correct is to substitute it into $|A - \lambda I|$ and see if you do get zero when you evaluate the determinant.
Having solved $(A - \lambda I)v = \mathbf{0}$ for each of $\lambda_2$ and $\lambda_3$, you should find that the corresponding eigenvectors are multiples of
\[
v_2 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \quad \text{and} \quad v_3 = \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix},
\]
respectively. Again, all work should be shown.

At this stage, you should check that the eigenvectors are correct. Form a matrix $P$ whose columns are the eigenvectors and the diagonal matrix $D$ with the corresponding eigenvalues on the diagonal,
\[
P = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 2 & 1 \\ 1 & 1 & -2 \end{pmatrix}, \qquad
D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -6 & 0 \\ 0 & 0 & 12 \end{pmatrix}.
\]
Now check that $AP = PD$ by multiplying out the matrices $AP$ and $PD$.
You know that $P$ is invertible since eigenvectors corresponding to distinct eigenvalues are linearly independent. Therefore, you can conclude that $P^{-1}AP = D$. Having checked the eigenvalues and eigenvectors, you do not need to compute $P^{-1}AP$ explicitly to determine $D$; you can simply state the result because of the underlying theory.


Use this diagonalisation to determine the sequences $(x_n)$, $(y_n)$, $(z_n)$ which have the following properties:
\[
\begin{aligned}
x_{n+1} &= -x_n - 2y_n - z_n\\
y_{n+1} &= 4x_n - 4y_n - 8z_n\\
z_{n+1} &= -13x_n - 2y_n + 11z_n
\end{aligned}
\]
and which satisfy the initial conditions $x_0 = y_0 = 1$ and $z_0 = 0$.
Denoting by $\mathbf{x}_n$ the vector $(x_n, y_n, z_n)^T$, this can be expressed as $\mathbf{x}_{n+1} = A\mathbf{x}_n$, for which the solution is given by
\[
\mathbf{x}_n = A^n \mathbf{x}_0 = PD^nP^{-1}\mathbf{x}_0.
\]
Using the adjoint method (cofactors), or any other method, find $P^{-1}$, and immediately check that the inverse is correct by showing $PP^{-1} = I$:
\[
P^{-1} = \frac{1}{6}\begin{pmatrix} 5 & -2 & -1 \\ 1 & 2 & 1 \\ 3 & 0 & -3 \end{pmatrix}.
\]
Then using the initial conditions,
\[
P^{-1}\mathbf{x}_0 = \frac{1}{6}\begin{pmatrix} 5 & -2 & -1 \\ 1 & 2 & 1 \\ 3 & 0 & -3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \end{pmatrix},
\]
so that
\[
\mathbf{x}_n = \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0 \\ -1 & 2 & 1 \\ 1 & 1 & -2 \end{pmatrix}
\begin{pmatrix} 0^n & 0 & 0 \\ 0 & (-6)^n & 0 \\ 0 & 0 & 12^n \end{pmatrix}
\begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \end{pmatrix}.
\]
The solution, for $n \ge 1$, is
\[
x_n = \frac{1}{2}(-6)^n, \qquad
y_n = (-6)^n + \frac{1}{2}(12)^n, \qquad
z_n = \frac{1}{2}(-6)^n - 12^n.
\]
(The answer can be checked by finding $\mathbf{x}_1$ both from the original equations and from the solution. If you have time, you might want to do this.)
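Doing exactly that check by machine, for the first few terms (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[-1, -2, -1],
              [4, -4, -8],
              [-13, -2, 11]], dtype=float)
x = np.array([1.0, 1.0, 0.0])            # (x_0, y_0, z_0)
for n in range(1, 6):
    x = A @ x
    formula = [0.5*(-6)**n, (-6)**n + 0.5*12**n, 0.5*(-6)**n - 12**n]
    assert np.allclose(x, formula)
print("solution confirmed for n = 1, ..., 5")
```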
Question 5(a) Solving a linear system of equations by putting the augmented matrix
into reduced echelon form is an application of the basic material in Chapter 3 of the
subject guide.
You should begin this question as instructed, by writing down the augmented matrix
and putting it into reduced row echelon form. Do this carefully to avoid errors,

\[
(A|\mathbf{b}) = \left(\begin{array}{ccccc|c} 1 & 4 & 5 & 3 & 2 & 11 \\ 0 & 2 & 4 & 2 & 2 & 2 \\ -1 & 1 & 5 & 0 & 1 & 6 \end{array}\right)
\to \left(\begin{array}{ccccc|c} 1 & 4 & 5 & 3 & 2 & 11 \\ 0 & 1 & 2 & 1 & 1 & 1 \\ 0 & 5 & 10 & 3 & 3 & 17 \end{array}\right)
\to \left(\begin{array}{ccccc|c} 1 & 4 & 5 & 3 & 2 & 11 \\ 0 & 1 & 2 & 1 & 1 & 1 \\ 0 & 0 & 0 & -2 & -2 & 12 \end{array}\right)
\]
\[
\to \left(\begin{array}{ccccc|c} 1 & 4 & 5 & 3 & 2 & 11 \\ 0 & 1 & 2 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & -6 \end{array}\right)
\to \left(\begin{array}{ccccc|c} 1 & 4 & 5 & 0 & -1 & 29 \\ 0 & 1 & 2 & 0 & 0 & 7 \\ 0 & 0 & 0 & 1 & 1 & -6 \end{array}\right)
\to \left(\begin{array}{ccccc|c} 1 & 0 & -3 & 0 & -1 & 1 \\ 0 & 1 & 2 & 0 & 0 & 7 \\ 0 & 0 & 0 & 1 & 1 & -6 \end{array}\right).
\]
Check that you do have the reduced row echelon form; find the columns with leading ones and make sure they have zeros elsewhere (above and below). As the question specifically asks you to put the matrix into reduced row echelon form, if you stop at row echelon form and use back substitution, you will not earn full marks, and you are also less likely to obtain the correct answer.
You can read the solution from the reduced echelon form. Assign parameters, say $s$ and $t$, to the non-leading variables $x_3$ and $x_5$, and write down the other variables in terms of these using the equations deduced from the matrix. The general solution is
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 1 + 3s + t \\ 7 - 2s \\ s \\ -6 - t \\ t \end{pmatrix}
= \begin{pmatrix} 1 \\ 7 \\ 0 \\ -6 \\ 0 \end{pmatrix}
+ s\begin{pmatrix} 3 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t\begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \\ 1 \end{pmatrix},
\]
that is, $\mathbf{x} = \mathbf{p} + s v_1 + t v_2$, $s, t \in \mathbb{R}$.

To answer the questions concerning the column vectors, you need to understand the material in Chapter 8 of the subject guide.
The columns of the reduced row echelon form of a matrix satisfy the same dependency relations as the columns of the matrix. From the reduced row echelon form of $A$, you can see that
\[
c_3 = -3c_1 + 2c_2.
\]
Indeed, this also follows from $Av_1 = \mathbf{0}$, and you can, and should, check that it is correct:
\[
\begin{pmatrix} 5 \\ 4 \\ 5 \end{pmatrix}
= -3\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}
+ 2\begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix}.
\]

To answer the next part, it is enough to say that in the reduced echelon form of $A$, the columns with the leading ones correspond to the vectors $c_1$, $c_2$ and $c_4$. Therefore these vectors are linearly independent. (The reduced row echelon form of a matrix $C$ consisting of these three column vectors would have a leading one in every column, so $C\mathbf{x} = \mathbf{0}$ has only the trivial solution.)
To conclude that $B$ is a basis of $\mathbb{R}^3$, you can state that $B$ is a set of three linearly independent vectors in a three-dimensional vector space, $\mathbb{R}^3$, therefore $B$ is a basis of $\mathbb{R}^3$. It is not sufficient to merely say that the vectors are linearly independent and span; you would need to give a reason why they span $\mathbb{R}^3$. (For example, by stating that there is a leading one in every row, so $A\mathbf{x} = \mathbf{b}$ has a solution for all $\mathbf{b} \in \mathbb{R}^3$.)
From the solution, $A\mathbf{p} = \mathbf{b}$, you have $\mathbf{b} = c_1 + 7c_2 - 6c_4$. You should recognise that this expresses $\mathbf{b}$ as a linear combination of the basis vectors, and the coefficients are the coordinates of $\mathbf{b}$ in this basis, $B$. That is,
\[
[\mathbf{b}]_B = \begin{pmatrix} 1 \\ 7 \\ -6 \end{pmatrix}_B.
\]


(b) This part of the question continues with the material on the basis of a vector space contained in Chapter 8. The material on changing basis is in Chapter 9.
To show that $S = \{c_1, c_3, c_4\}$ is also a basis of $\mathbb{R}^3$, you can calculate the determinant of the matrix with these vectors as columns:
\[
\begin{vmatrix} 1 & 5 & 3 \\ 0 & 4 & 2 \\ -1 & 5 & 0 \end{vmatrix}
= 1(-10) - 1(10 - 12) = -8 \ne 0.
\]
Since the determinant is non-zero, this implies that $S$ is a basis of $\mathbb{R}^3$.
(This statement answers the question, and is all that is required here, but you should understand why it is true. The relationship of these concepts is covered in Chapter 3 and Chapter 4 of the subject guide. If $Q$ denotes the matrix with column vectors $c_1$, $c_3$, $c_4$, then $|Q| \ne 0$ implies that $Q^{-1}$ exists, so that a system of equations $Q\mathbf{x} = \mathbf{b}$ has a unique solution for all $\mathbf{b} \in \mathbb{R}^3$. This implies both that the column vectors are linearly independent and that they span $\mathbb{R}^3$. The same argument follows by reducing the matrix $Q$ to echelon form and showing that there are three leading ones.)
You can find $P$ by using the transition matrix $M$ from $B$ coordinates to standard and the transition matrix $Q$ from $S$ coordinates to standard:
\[
M = \begin{pmatrix} 1 & 4 & 3 \\ 0 & 2 & 2 \\ -1 & 1 & 0 \end{pmatrix}, \qquad
Q = \begin{pmatrix} 1 & 5 & 3 \\ 0 & 4 & 2 \\ -1 & 5 & 0 \end{pmatrix}.
\]
If you recall that
\[
\mathbf{v} = M[\mathbf{v}]_B \quad \text{and} \quad \mathbf{w} = Q[\mathbf{w}]_S,
\]
then to change from coordinates in the basis $B$ to coordinates in the basis $S$, you need
\[
[\mathbf{v}]_S = Q^{-1}M[\mathbf{v}]_B.
\]
So $Q^{-1}M$ is the transition matrix from $B$ coordinates to $S$ coordinates. The easiest way to find $Q^{-1}$ is using the cofactor method. Then
\[
Q^{-1}M = -\frac{1}{8}\begin{pmatrix} -10 & 15 & -2 \\ -2 & 3 & -2 \\ 4 & -10 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 4 & 3 \\ 0 & 2 & 2 \\ -1 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & \tfrac{3}{2} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} = P.
\]
You can find the $S$ coordinates of $\mathbf{b}$ using this matrix and $[\mathbf{b}]_B$ from part (a),
\[
[\mathbf{b}]_S = \begin{pmatrix} 1 & \tfrac{3}{2} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 7 \\ -6 \end{pmatrix}_B
= \begin{pmatrix} \tfrac{23}{2} \\ \tfrac{7}{2} \\ -6 \end{pmatrix}_S,
\]
which you can easily check. Or, you can find the $S$ coordinates directly from the basis $S$ by solving $\mathbf{b} = ac_1 + bc_3 + cc_4$ for $a$, $b$, $c$ using Gaussian elimination or by using the inverse matrix, $Q^{-1}$, which you found above.
You can also do this using the results of part (a). You know that $\mathbf{b} = c_1 + 7c_2 - 6c_4$ and $c_3 = -3c_1 + 2c_2$. If you solve the latter equation for $c_2$ and substitute into the


equation for $\mathbf{b}$, you will obtain the vector $\mathbf{b}$ as a linear combination of $c_1$, $c_3$, $c_4$, and hence the coordinates of $\mathbf{b}$ in this basis.
This idea can be used to gain a better understanding of the matrix $P$. Notice the simple form of the transition matrix $P$ from $B$ coordinates to $S$ coordinates. If you have a vector expressed as a linear combination of the basis vectors of $B$ and as a linear combination of the basis vectors of $S$, then the coefficients of the first and last vectors will be the same in either basis since the first and last basis vectors are the same. Only the middle vector is different. Therefore, $P$ will be of the form
\[
P = \begin{pmatrix} 1 & a & 0 \\ 0 & b & 0 \\ 0 & c & 1 \end{pmatrix}.
\]
To change from a linear combination of $c_1$, $c_2$, $c_4$ to a linear combination of $c_1$, $c_3$, $c_4$, you just need to know how to express $c_2$ as a linear combination of the $S$ basis vectors; that is, you need the coordinates of the vector $c_2$ in the basis $S$. Using the result of part (a), that $c_3 = -3c_1 + 2c_2$, as you did above, and solving for $c_2$, you will obtain $c_2 = \tfrac{3}{2}c_1 + \tfrac{1}{2}c_3$. Therefore,
\[
P = \begin{pmatrix} 1 & \tfrac{3}{2} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
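As a final machine check (a sketch assuming NumPy), computing $Q^{-1}M$ and $[\mathbf{b}]_S$ directly:

```python
import numpy as np

M = np.array([[1, 4, 3], [0, 2, 2], [-1, 1, 0]], dtype=float)  # columns c1, c2, c4
Q = np.array([[1, 5, 3], [0, 4, 2], [-1, 5, 0]], dtype=float)  # columns c1, c3, c4

P = np.linalg.solve(Q, M)   # Q^{-1} M, the transition matrix from B to S
print(np.round(P, 6))       # [[1, 1.5, 0], [0, 0.5, 0], [0, 0, 1]]

b_B = np.array([1.0, 7.0, -6.0])
print(P @ b_B)              # (11.5, 3.5, -6), i.e. (23/2, 7/2, -6)
```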
