M. Anthony, M. Harvey
MT1173
2013
Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a 100 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
Martin Anthony, Professor of Mathematics, and Michele Harvey, Course Leader, Department of
Mathematics, London School of Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the authors are unable to enter into any correspondence relating to, or arising from, the guide. If you have any comments on this subject guide, favourable or unfavourable, please use the form at the back of this guide.
Contents

Preface

1 Introduction
   1.1 This subject
      1.1.2 Learning outcomes
      1.1.3 Topics covered
   1.2 Reading
      1.2.1 Recommended reading
      1.2.2 Further reading
      1.3.1 The VLE
   1.5 Examination
   1.7 Background
      1.7.1 Sets
      1.7.2 Numbers
      1.7.3 Basic notations
      1.7.4 Simple algebra
      1.7.5 Powers
      1.7.6 Quadratic equations
      1.7.7 Polynomial equations
   Suggested reading

2 Matrices and vectors
   2.1 What is a matrix?
   2.3 Matrix multiplication
   2.4 Matrix algebra
   2.5 Matrix inverses
   2.6 Powers of a matrix
      2.6.1 Properties of powers
   2.7 Transpose
      2.7.1 Symmetric matrices
   2.8 Vectors in R^n
      2.9.1 Vectors in R^2
      2.9.2 Inner product
      2.9.3 Vectors in R^3
   2.10 Lines
      2.10.1 Lines in R^2
      2.10.2 Lines in R^3
   2.11 Planes in R^3
      2.12.1 Hyperplanes
   Learning outcomes
   Exercises
   Comments on exercises

3 Systems of linear equations
   3.2 Row operations
   3.3 Gaussian elimination
      3.3.4 Solution sets
      3.4.1 Homogeneous systems
      3.4.2 Null space
   Learning outcomes
   Exercises
   Comments on exercises
   Suggested reading

4 Matrix inversion and determinants
      4.1.1 Elementary matrices
      4.1.2 Row equivalence
      4.1.5 Result on inverses
   4.2 Determinants
      4.3.1 Cramer's rule
   Learning outcomes
   Exercises
   Comments on exercises
   Suggested reading

5 Rank, range and linear equations
   5.4 Range
   Learning outcomes
   Exercises
   Comments on exercises

6 Sequences, series and difference equations
   6.1 Sequences
      6.1.1 Sequences in general
      6.1.2 Arithmetic progressions
      6.1.3 Geometric progressions
      6.1.4 Compound interest
      6.1.5 Frequent compounding
   6.2 Series
      6.2.1 Arithmetic series
      6.2.2 Geometric series
   6.4 Limiting behaviour
   6.5 Financial applications
   Learning outcomes
   Exercises
   Comments on exercises

7 Vector spaces
   Suggested reading
   7.1 Vector spaces
      7.1.2 Examples
   7.2 Subspaces
      7.3.1 Null space
      7.3.2 Range
   7.4 Linear span
   Learning outcomes
   Exercises
   Comments on exercises

8 Linear independence, bases and dimension
   Suggested reading
   8.1 Linear independence
   8.3 Basis
      8.3.1 Coordinates
      8.3.2 Dimension
   8.6 Learning outcomes
   8.7 Exercises
   8.9 Comments on exercises

9 Linear transformations and change of basis
   Suggested reading
   9.1 Linear transformations
   9.2 Examples
      9.3.1 Rotation in R^2
   9.6 Coordinate change
   Learning outcomes
   Exercises
   Comments on exercises

10 Diagonalisation
   Suggested reading
      10.1.1 Definitions
      10.1.3 Eigenspaces
   Learning outcomes
   Exercises
   Comments on exercises

11 Applications of diagonalisation
   Suggested reading
   Learning outcomes
   Exercises
   Comments on exercises

A Sample examination paper
Preface
This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. Where coverage in the main texts is weak, it provides some
additional background material. Further reading is essential.
We are very grateful to James Ward and Keith Martin for their careful readings of a
draft of this guide and for their many helpful comments.
Chapter 1
Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.
1.1.2 Learning outcomes
We now state the broad learning outcomes of this course, as a whole. More specific
learning outcomes can be found at the end of each chapter.
At the end of this course and having completed the reading and activities you should
have:
• used the concepts, terminology, methods and conventions covered in the course to solve mathematical problems in this subject
• the ability to solve unseen mathematical problems involving understanding of these concepts and application of these methods
• seen how algebra can be used to solve problems in economics and related subjects
• the ability to demonstrate knowledge and understanding of the underlying principles.
There are a couple of things we should stress at this point. First, note the intention that
you will be able to solve unseen problems. This means simply that you will be
expected to be able to use your knowledge and understanding of the material to solve
problems that are not completely standard. This is not something you should worry
unduly about: all mathematics topics expect this, and you will never be expected to do
anything that cannot be done using the material of this course. Second, we expect you
to be able to demonstrate knowledge and understanding and you might well wonder
how you would demonstrate this in the examination. Well, it is precisely by being able
to grapple successfully with unseen, non-routine, questions that you will indicate that
you have a proper understanding of the topic.
1.1.3 Topics covered
1.2 Reading
There are many books that would be useful for this subject. We recommend two in
particular, and another for additional, further reading. We want to emphasise, however,
that there are very many books suitable for this course. Almost any text on linear
algebra will cover the material (with the possible exception of the chapter on sequences,
series and difference equations).
1.2.1 Recommended reading
1.2.2 Further reading
Please note that as long as you read the Recommended reading you are then free to
read around the subject area in any text, paper or online resource. You will need to
support your learning by reading as widely as possible. To help you read extensively,
you have free access to the virtual learning environment (VLE) and University of
London Online Library (see below).
Another useful text for this course is the following (though, as mentioned above, there
are many suitable texts):
Textbooks will provide more in-depth explanations than you will find in this guide, and
they will also provide many more examples to study and exercises to work through. The
books listed are the ones we have referred to in this guide.
There are many editions and variants of this book, such as the Applications version. Any one is
equally useful and you will not need more than one of them. You can find the relevant sections cited in
this guide in any edition by using the index.
You should receive your login details in your study pack. If you have forgotten these login details, please click on the 'Forgotten your password' link on the login page.
1.3.1 The VLE
The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:

• Self-testing activities: Doing these allows you to test your own understanding of subject material.
• Electronic study materials: The printed materials that you receive from the University of London are available to download, including updated reading lists and references.
• Past examination papers and Examiners' commentaries: These provide advice on how each examination question might best be answered.
• A student discussion forum: This is an open space for you to discuss interests and experiences, seek support from your peers, work collaboratively to solve problems and discuss subject material.
• Videos: There are recorded academic introductions to the subject, interviews and debates and, for some courses, audio-visual tutorials and conclusions.
• Recorded lectures: For some courses, where appropriate, the sessions from previous years' Study Weekends have been recorded and made available.
• Study skills: Expert advice on preparing for examinations and developing your digital literacy skills.
• Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.
1.3.2 Making use of the Online Library
The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login:
http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php
1.5 Examination
Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.
You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions. Remember, it is important to check the VLE for:
• up-to-date information on examination and assessment arrangements for this course
• where available, past examination papers and Examiners' commentaries for the course, which give advice on how each question might best be answered.
A Sample examination paper is given as the final chapter to this guide. There are no
optional topics in this subject: you should do them all. This is reflected in the structure
of the examination paper. There are five questions (each worth 20 marks) and all
questions are compulsory.
Please do not think that the questions in a real examination will necessarily be very
similar to those in the Sample examination paper. An examination is designed (by
definition) to test you. You will get examination questions unlike questions in this
guide. The whole point of examining is to see whether you can apply knowledge in
familiar and unfamiliar settings. The Examiners (nice people though they are) have an
obligation to surprise you! For this reason, it is important that you try as many
examples as possible, from the guide and from the textbooks. This is not so that you
can cover any possible type of question the Examiners can think of! It's so that you get
used to confronting unfamiliar questions, grappling with them, and finally coming up
with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.
1.7 Background
This section discusses some very basic aspects of mathematics on which the rest of the
subject builds. It is essential to have a firm knowledge of these topics before the more
advanced topics can be understood.
It is assumed that you have studied most (if not all) of these topics in previous
mathematics courses and that nearly all of the material is revision, but don't worry if a
topic is new to you. We will mention the main results which you will need to know. If
you are unfamiliar with a topic, or if you find any of the topics difficult, then you should
look up that topic in any basic mathematics text. It is essential for you to consolidate
your background in basic algebra before attempting this course.
1.7.1 Sets
has as its members all of you (and nothing else). When x is an object in a set A, we write x ∈ A and say 'x belongs to A' or 'x is a member of A'.
The set which has no members is called the empty set and is denoted by ∅. The empty set may seem like a strange concept, but it has its uses.
We say that the set S is a subset of the set T, and we write S ⊂ T, or S ⊆ T, if every member of S is a member of T. For example, {1, 2, 5} ⊂ {1, 2, 4, 5, 6, 40}. The difference between the two symbols is that S ⊂ T literally means that S is a proper subset of T, meaning not all of T, and S ⊆ T means that S is a subset of T and possibly (but not necessarily) all of T. So in the example just given we could have also written
{1, 2, 5} ⊆ {1, 2, 4, 5, 6, 40}.
Given two sets A and B, the union A ∪ B is the set whose members belong to A or B (or both A and B): that is,
A ∪ B = {x | x ∈ A or x ∈ B}.
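If you have met a little programming, these set ideas can be illustrated with Python's built-in set type. This is only a side illustration (Python is not part of this course): membership is the `in` operator, proper subset is `<`, subset is `<=`, and union is `|`.

```python
# Membership, subsets and unions, matching the notation in the text.
S = {1, 2, 5}
T = {1, 2, 4, 5, 6, 40}

print(2 in S)      # membership: True
print(S < T)       # S is a proper subset of T: True
print(S <= S)      # every set is a subset of itself: True

A = {1, 2, 3}
B = {2, 4}
print(A | B)       # union: {1, 2, 3, 4}
```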
1.7.2 Numbers
There are some standard notations for important sets of numbers. The set R of real numbers may be thought of as the points on a line. Each such number can be described by a decimal representation.

The set of real numbers R includes the following subsets.

• N, the set of natural numbers: N = {1, 2, 3, . . .}, also referred to as the positive integers.
• Z, the set of integers: Z = {. . . , -3, -2, -1, 0, 1, 2, 3, . . .}.
• Q, the set of rational numbers: numbers of the form p/q with p, q ∈ Z, q ≠ 0; for example, 2/5, 9/2, 4/1 = 4.
• The set of irrational numbers, that is, real numbers which are not rational; for example, √2 and π.

These sets are related by: N ⊂ Z ⊂ Q ⊂ R.
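As an aside, the rational numbers p/q can be modelled exactly in Python with the standard-library Fraction type; in particular 4/1 equals the integer 4, illustrating that Z sits inside Q. (Python is not part of the syllabus; this is only an illustration.)

```python
# Exact rational arithmetic with the standard-library Fraction type.
from fractions import Fraction

print(Fraction(9, 2))                   # 9/2
print(Fraction(4, 1) == 4)              # True: 4/1 = 4, so Z is inside Q
print(Fraction(2, 5) + Fraction(1, 5))  # 3/5, still a rational number
```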
Given two real numbers a and b, we define intervals such as,
(a, b) = {x | a < x < b}    and    [a, b] = {x | a ≤ x ≤ b},
and combinations of these. For example, [a, b) = {x | a ≤ x < b}. The numbers a and b are called the endpoints of the interval. You should notice that when a square bracket, [ or ], is used to denote an interval, the number beside the bracket is included in the interval, whereas if a round bracket, ( or ), is used, the adjacent number is not in the interval. For example, [2, 3] contains the number 2, but (2, 3] does not. We can also indicate unbounded intervals, such as
(-∞, b) = {x | x < b}.
The symbol ∞ means 'infinity', but it is not a real number, merely a notational convenience.
The absolute value of a real number a is defined by:
|a| = a if a ≥ 0,  and  |a| = -a if a < 0.
So the absolute value of a equals a if a is non-negative (that is, if a ≥ 0), and equals -a otherwise. For instance, |6| = 6 and |-2.5| = 2.5. (This is sometimes called the modulus of a.) Roughly speaking, the absolute value of a number is obtained just by ignoring any minus sign the number has. Note that
√(a^2) = |a|,
since by √x we always mean the positive square root, to avoid ambiguity. So the two solutions of the equation x^2 = 4 are x = ±2 (meaning x = 2 or x = -2), but √4 = 2. The absolute value of real numbers satisfies the following inequality,
|a + b| ≤ |a| + |b|,   for all a, b ∈ R.
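These absolute-value facts are easy to experiment with on a computer. The short Python sketch below (an illustration only) checks that |a| discards the sign, that √(a^2) gives |a| rather than a, and that the inequality |a + b| ≤ |a| + |b| holds on a few sample pairs.

```python
# Absolute values: |a| ignores the sign, sqrt(a^2) = |a|, and
# |a + b| <= |a| + |b| on some sample pairs of real numbers.
import math

print(abs(6), abs(-2.5))       # 6 2.5
print(math.sqrt((-3) ** 2))    # 3.0, which is |-3|, not -3

samples = [(2, 5), (-2, 5), (2, -5), (-2, -5)]
assert all(abs(a + b) <= abs(a) + abs(b) for a, b in samples)
print("inequality |a + b| <= |a| + |b| holds on these samples")
```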
1.7.3 Basic notations
You may consult any of a large number of basic maths texts for further information on basic notations.
We denote the sum of the numbers x1, x2, . . . , xn by
∑_{i=1}^{n} xi.
The ∑ indicates that numbers are being summed, and the i = 1 and n below and above the ∑ show that it is the numbers xi, as i runs from 1 to n, that are being summed together. Sometimes we will be interested in adding up only some of the numbers. For example,
∑_{i=2}^{n-1} xi
would denote the sum x2 + x3 + · · · + x_{n-1}, which is the sum of all the numbers except the first and last.

Example 1.2 Suppose that x1 = 1, x2 = 3, x3 = -1, x4 = 5. Then
∑_{i=1}^{4} xi = 1 + 3 + (-1) + 5 = 8,    and    ∑_{i=2}^{4} xi = 3 + (-1) + 5 = 7.
For a positive whole number n, n! (read as 'n factorial') is the product of all the integers from 1 up to n. For example, 4! = 1 × 2 × 3 × 4 = 24. By convention, 0! is taken to be 1.
Finally, we often use the symbol □ to denote the end of a proof, where we have finished explaining why a particular result is true. This is just to make it clear where the proof ends and the following text begins.
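The sums and factorials above can be computed directly in Python (again, just an illustration): `sum` plays the role of ∑, and the standard library provides factorials.

```python
# Sigma notation and factorials from this section, computed directly.
import math

x = [1, 3, -1, 5]            # x1, x2, x3, x4 from Example 1.2
print(sum(x))                # 8, the sum with i running from 1 to 4
print(sum(x[1:]))            # 7, the sum with i running from 2 to 4
print(math.factorial(4))     # 24, since 4! = 1 * 2 * 3 * 4
print(math.factorial(0))     # 1, by convention
```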
1.7.4 Simple algebra
You should try to become confident and capable in handling simple algebraic
expressions and equations.
You should be proficient in:

• collecting up terms: e.g. 2a + 3b - a + 5b = a + 8b.
• multiplication of variables: e.g. (-a)(b) + (a)(-b) - 3(a)(b) + (2a)(4b) = -ab - ab - 3ab + 8ab = 3ab.
• expansion of bracketed terms: e.g. (2x - 3y)(x + 4y) = 2x^2 - 3xy + 8xy - 12y^2 = 2x^2 + 5xy - 12y^2.
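Identities like the expansion above can be spot-checked numerically: the two sides must agree at every point. The loop below (a sketch, not a substitute for doing the algebra) tests the expansion on a grid of integer values.

```python
# Numerical spot-check that (2x - 3y)(x + 4y) = 2x^2 + 5xy - 12y^2
# for every pair of values tested.
for x in range(-5, 6):
    for y in range(-5, 6):
        assert (2*x - 3*y) * (x + 4*y) == 2*x**2 + 5*x*y - 12*y**2
print("expansion verified on a grid of integer points")
```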
Activity 1.2 Expand (x 1)(x + 1). Then use this to expand (x 1)(x + 1)(x + 2).
11
1. Introduction
1.7.5 Powers
When n is a positive integer, the nth power of the number a, denoted a^n, is simply the product of n copies of a; that is,
a^n = a × a × · · · × a   (n times).
The number n is called the power, exponent, or index. We have the power rules (or rules of exponents),
a^r a^s = a^(r+s),    (a^r)^s = a^(rs),
whenever r and s are positive integers.
Activity 1.3 Prove the power rules above using the definition of a^n for n ∈ N.
The power a^0 is defined to be 1.

The definition is extended to negative integers as follows. When n is a positive integer, a^(-n) means 1/a^n. For example, 3^(-2) is 1/3^2 = 1/9. The power rules hold when r and s are any integers, positive, negative or zero.

When n is a positive integer, a^(1/n) is the positive nth root of a; this is the positive number x such that x^n = a. For example, a^(1/2) is usually denoted by √a, and is the positive square root of a, so that 4^(1/2) = 2.
When m and n are integers and n is positive, a^(m/n) is (a^(1/n))^m. This extends the definition of powers to the rational numbers. The definition is extended to real numbers by filling in the gaps between the rational numbers, and it can be shown that the rules of exponents still apply.
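The power rules can be checked exactly on a computer using rational arithmetic. The sketch below (an illustration only, with exponent values of our own choosing) verifies a^r a^s = a^(r+s) and (a^r)^s = a^(rs) for a mix of positive, zero and negative integer exponents.

```python
# Exact check of the power rules, including negative exponents, using
# the standard-library Fraction type so no rounding occurs.
from fractions import Fraction

a = Fraction(3)
for r in (-2, 0, 1, 4):
    for s in (-1, 2, 3):
        assert a**r * a**s == a**(r + s)   # a^r a^s = a^(r+s)
        assert (a**r)**s == a**(r * s)     # (a^r)^s = a^(rs)

print(a**-2)     # 1/9, since 3^(-2) = 1/3^2
print(4**0.5)    # 2.0, i.e. 4^(1/2), the positive square root
```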
Activity 1.4 Simplify the expression:
(49x^2 / 35y) · (4xy^2 / (2xy)^3).
Simplify, then solve for a:
6ab - (a/b)(b^2 - 4bc) = 1.

1.7.6 Quadratic equations
(b) x^2 + 2x - 8 = 0,
(c) 2x^2 - 7x + 3 = 0.

Although factorisation may be difficult, there is a general method for determining the solutions to a quadratic equation using the quadratic formula, as follows. Suppose we have the quadratic equation ax^2 + bx + c = 0, where a ≠ 0. Then the solutions of this equation are:
x1 = (-b + √(b^2 - 4ac)) / (2a),    x2 = (-b - √(b^2 - 4ac)) / (2a).
The term b^2 - 4ac is called the discriminant.

• If b^2 - 4ac > 0, the equation has two real solutions as given above.
• If b^2 - 4ac = 0, the equation has exactly one solution, x = -b/(2a). (In this case we say that this is a solution of multiplicity two.)
• If b^2 - 4ac < 0, the equation has no real solutions.
For example, consider the equation 2x^2 - 7x + 3 = 0. Using the quadratic formula, we have
x = (-b ± √(b^2 - 4ac)) / (2a) = (7 ± √(49 - 4(2)(3))) / (2(2)) = (7 ± 5) / 4.
So the solutions are x = 3 and x = 1/2.

The equation x^2 + 6x + 9 = 0 has one solution of multiplicity two; its discriminant is b^2 - 4ac = 36 - 9(4) = 0. This equation is most easily solved by recognising that x^2 + 6x + 9 = (x + 3)^2, so the solution is x = -3.
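The three discriminant cases above translate directly into a small function. This is only a sketch (the function name and return conventions are our own), but it reproduces the worked examples.

```python
# The quadratic formula as a function: real solutions of
# ax^2 + bx + c = 0, branching on the discriminant b^2 - 4ac.
import math

def solve_quadratic(a, b, c):
    assert a != 0, "not a quadratic equation if a = 0"
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        return ((-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        return (-b / (2 * a),)   # one solution of multiplicity two
    return ()                    # negative discriminant: no real solutions

print(solve_quadratic(2, -7, 3))   # (3.0, 0.5), as in the example above
print(solve_quadratic(1, 6, 9))    # (-3.0,)
print(solve_quadratic(1, 0, 1))    # ()
```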
equal to 0. The quadratic formula for the solutions to a quadratic equation is obtained using the technique of completing the square. Quadratic polynomials which cannot be written as a product of linear terms (so ones for which the discriminant is negative) are said to be irreducible.
Activity 1.6
Use the method of completing the square on the equation ax2 + bx + c = 0 to obtain
the quadratic formula.
1.7.7 Polynomial equations
product, the coefficient of x^2 is obtained from the terms (-1)(x^2) + (x)(αx), so that we must have α - 1 = 0, or α = 1. Then
x^3 - 7x + 6 = (x - 1)(x^2 + x - 6),
and the quadratic term is easily factored into (x - 2)(x + 3), that is,
x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3).
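Searches for roots like this can be automated. The sketch below uses a standard observation (not stated in this guide) that an integer root of a polynomial with integer coefficients must divide the constant term, so for x^3 - 7x + 6 only the divisors of 6 need testing.

```python
# Find the integer roots of p(x) = x^3 - 7x + 6 by testing the divisors
# of the constant term 6 (positive and negative).
def p(x):
    return x**3 - 7*x + 6

candidates = [d for d in range(-6, 7) if d != 0 and 6 % abs(d) == 0]
roots = [d for d in candidates if p(d) == 0]
print(roots)   # [-3, 1, 2], matching (x - 1)(x - 2)(x + 3)
```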
Activity 1.7
Given that the polynomial P (x) = x3 + 3x2 + 4x + 4 has an integer root, find it and
hence show that the polynomial can be expressed as a product P (x) = (x r)Q(x)
where Q(x) is an irreducible quadratic polynomial.
1.7.8 Feedback to activities

Feedback to activity 1.3
Write a^r a^s = (a × a × · · · × a)(a × a × · · · × a), where there are r copies of a in the first bracket and s copies in the second. Removing the brackets, we have the product of a times itself a total of r + s times; that is,
a^r a^s = a × a × · · · × a = a^(r+s)   (r + s times).
Feedback to activity 1.4
(49x^2 / 35y) · (4xy^2 / (2xy)^3) = (49x^2 · 4xy^2) / (35y · 8x^3y^3) = (7 · 1)/(5 · 2) · (x^3y^2)/(x^3y^4) = 7 / (10y^2).
(a) 6ab - (a/b)(b^2 - 4bc) = 6ab - ab + 4ac = 5ab + 4ac = a(5b + 4c), so the equation becomes a(5b + 4c) = 1, and solving for a:
a = 1 / (5b + 4c),   provided 5b + 4c ≠ 0.
Note that it is an important part of the solution to declare that it is only valid if 5b + 4c ≠ 0, otherwise there is no solution (because a × 0 = 0 ≠ 1).
x = 1/2 or x = 3.
Feedback to activity 1.6
Dividing through by a and completing the square, you will obtain
(x + b/(2a))^2 = (b^2 - 4ac) / (4a^2).
Taking square roots, simplifying and solving for x, you will obtain the quadratic formula.
Feedback to activity 1.7
Because all the terms are separated by + signs, the integer root must be a negative number, so try x = -1. Substitution into the polynomial yields -1 + 3 - 4 + 4 = 2 ≠ 0, so -1 is not a root. Next try x = -2. This time it works: -8 + 3(4) + 4(-2) + 4 = 0, so
x^3 + 3x^2 + 4x + 4 = (x + 2)(x^2 + αx + 2).
Comparing the coefficients of either the x^2 or x terms, you should obtain α = 1. The quadratic polynomial x^2 + x + 2 cannot be factored over the real numbers, since its discriminant is negative. Therefore
P(x) = x^3 + 3x^2 + 4x + 4 = (x + 2)(x^2 + x + 2) = (x + 2)Q(x),
where Q(x) is an irreducible quadratic polynomial.
Chapter 2
Matrices and vectors
Suggested reading
The number aij in the ith row and jth column is called the (i, j)-entry. Note that the
first subscript on aij always refers to the row and the second subscript to the column.
A = ( 2 1 7 8 )
    ( 0 2 5 1 )
    ( 4 9 3 0 )
is a 3 × 4 matrix whose entries are integers. For this matrix, a23 = 5, since this is the entry in the second row and third column.
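For readers who like to experiment, the matrix above can be stored in Python as a list of rows (an illustration only). One caution: Python indexes from 0, while the guide's subscripts start from 1, so the (i, j)-entry is A[i-1][j-1].

```python
# The 3 x 4 matrix above as a list of rows; note the 0-based indexing.
A = [[2, 1, 7, 8],
     [0, 2, 5, 1],
     [4, 9, 3, 0]]

print(len(A), len(A[0]))   # 3 4: three rows, four columns
print(A[2 - 1][3 - 1])     # 5, the (2,3)-entry a23
```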
Activity 2.1 In the example above, what is a32 ?
A square matrix is an n × n matrix; that is, a matrix with the same number of rows as columns. The diagonal of a square matrix is the list of entries a11, a22, . . . , ann.
A diagonal matrix is a square matrix with all the entries which are not on the diagonal equal to zero. So A is diagonal if it is n × n and aij = 0 if i ≠ j.
A = ( a11  0   · · ·  0  )
    (  0  a22  · · ·  0  )
    (  .   .    .     .  )
    (  0   0   · · ·  ann )

Activity 2.2 Which of the following matrices are diagonal?

( 3 0 )    ( 0 0 0 )    ( 2 1 )    ( 2 0 0 )
( 0 0 ),   ( 0 1 0 ),   ( 0 1 ),   ( 0 1 0 )
           ( 0 0 2 )
Two matrices are equal if they are the same size and if corresponding entries are equal. That is, if A = (aij) and B = (bij) are both m × n matrices, then
A = B  ⟺  aij = bij for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
We can also multiply any matrix by a real number, called a scalar in this context. If λ is a scalar and A is a matrix, then λA is the matrix whose entries are λ times each of the entries of A.

Definition 2.3 (Scalar multiplication) If A = (aij) is an m × n matrix and λ ∈ R, then
λA = (λaij),  for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
Example 2.2
A + B = ( 3 1 2 ) + ( -1  1  4 ) = ( 2 2 6 )
        ( 0 5 2 )   (  2 -3 -1 )   ( 2 2 1 )

2A = 2 ( 3 1 2 ) = ( 6  2 4 )
       ( 0 5 2 )   ( 0 10 4 )
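Because addition and scalar multiplication act on each entry separately, Example 2.2 can be checked with two one-line helper functions (a Python sketch; the helper names are our own).

```python
# Entrywise matrix addition and scalar multiplication, applied to the
# matrices of Example 2.2.
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    return [[k * a for a in row] for row in A]

A = [[3, 1, 2], [0, 5, 2]]
B = [[-1, 1, 4], [2, -3, -1]]

print(add(A, B))    # [[2, 2, 6], [2, 2, 1]]
print(scale(2, A))  # [[6, 2, 4], [0, 10, 4]]
```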
The (i, j)-entry of the product is found by taking row i of A,
( ai1 ai2 · · · ain ),
and column j of B, with entries b1j, b2j, . . . , bnj, multiplying corresponding entries and summing:
cij = ai1 b1j + ai2 b2j + · · · + ain bnj.

What size is C = AB? The matrix C must be m × p since it will have one entry for each of the m rows of A and each of the p columns of B.
Example 2.3 In the following product, the element in row 2 and column 1 of the product matrix (shown in bold type in the original) is found, as described above, by using the second row of A and the first column of B:

AB = ( 1 1 1 ) (  3 0 )   ( 3  4 )
     ( 2 0 1 ) (  1 3 ) = ( 5  1 )
     ( 1 2 4 ) ( -1 1 )   ( 1 10 )

This entry is 5 because (2)(3) + (0)(1) + (1)(-1) = 5.

Now consider the matrices
A = ( 2 1 3 )        B = ( 3 1 )
    ( 1 2 1 )  and       ( 1 0 ).
                         ( 1 1 )
Both products AB and BA are defined, but they are different sizes, so they cannot be
equal. What sizes are they?
Activity 2.3 Answer the question just posed concerning the sizes of AB and BA.
Multiply the matrices to find the two product matrices, AB and BA.
Even if both products are defined and the same size, it is still generally true that AB ≠ BA.
Activity 2.4 Try this for any two 2 × 2 matrices. Write down two different matrices A and B and find the products AB and BA. For example, you could use
A = ( 1 2 )    B = ( 1 1 )
    ( 3 4 ),       ( 0 1 ).
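The row-times-column rule can also be written as a short program. The sketch below (an illustration only, using a pair of matrices different from the activity's, so you can still try that by hand) computes both products and shows they differ.

```python
# A minimal matrix product: the (i,j)-entry of AB is
# a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj.
def matmul(A, B):
    n = len(B)
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]] -- not equal, so AB != BA here
```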
Matrices are useful because they provide a compact notation and we can do algebra
with them.
For example, given a matrix equation such as
3A + 2B = 2(B A + C),
we can solve this for the matrix C using the rules of algebra. You must always bear in
mind that to perform the operations, they must be defined. In this equation it is
understood that all the matrices A, B and C are the same size, say m n.
We list the rules of algebra satisfied by the operations of addition, scalar multiplication
and matrix multiplication. The sizes of the matrices are dictated by the operations
being defined.
A + B = B + A. Matrix addition is commutative.
This is easily shown to be true. We will carry out the proof as an example. The matrices
A and B must be of the same size, say m n for the operation to be defined, so both
A + B and B + A are also m n matrices. They also have the same entries. The (i, j)
entry of A + B is aij + bij and the (i, j) entry of B + A is bij + aij , but aij + bij = bij + aij
by the properties of real numbers. So the matrices A + B and B + A are equal.
On the other hand, as we have seen, matrix multiplication is not commutative:
AB ≠ BA in general.
We have the following associative laws:

    (A + B) + C = A + (B + C),
    λ(AB) = (λA)B = A(λB),
    (AB)C = A(BC).
These rules allow us to remove brackets. For example the last rule says that we will get
the same result if we first multiply AB and then multiply by C on the right, as we will
if we first multiply BC and then multiply by A on the left, so the choice is ours.
All these rules follow from the definitions of the operations in the same way as we
showed the commutativity of addition. We need to know that the matrices on the left
and on the right of the equals sign have the same size and that corresponding entries
are equal. Only the associativity of multiplication presents any complications; it is
tedious, but it can be done.
Activity 2.5 Think about these rules. What sizes are each of the matrices?
Write down the (i, j) entry for each of the matrices λ(AB) and (λA)B and prove
that the matrices are equal.
We also have the following distributive laws:

    A(B + C) = AB + AC,
    (B + C)A = BA + CA,
    λ(A + B) = λA + λB.
Why do we need both of the first two rules (which state that matrix multiplication
distributes through addition)? Since matrix multiplication is not commutative, we
cannot conclude the second distributive rule from the first; we have to prove it is true
separately. All these statements can be proved from the definitions using the same
technique as used earlier, but we will not take the time to do this here.
What happens if we calculate A − A? We obtain an m × n matrix all of whose entries are 0. This is an additive
identity: that is, it plays the same role for matrices as the number 0 does for numbers,
in the sense that A + 0 = 0 + A = A. There is a zero matrix of any size m × n.
Definition 2.5 (Zero matrix) A zero matrix, denoted 0, is an m × n matrix with all
entries zero:

    0 = [ 0 0 ⋯ 0 ; 0 0 ⋯ 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 ⋯ 0 ].
Then

    A + 0 = A,    A − A = 0,    0A = 0,    A0 = 0,
where the sizes of the zero matrices above must be compatible with the size of the
matrix A.
We also have a multiplicative identity, which acts like the number 1 does for
multiplication of numbers.
Definition 2.6 (Identity matrix) The n × n identity matrix, denoted I, is
the diagonal matrix with aii = 1:

    I = [ 1 0 ⋯ 0 ; 0 1 ⋯ 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 ⋯ 1 ].
Example 2.4 We can apply these rules to solve for C the equation

    3A + 2B = 2(B − A + C).

We will pedantically apply each rule so that you can see how it is being used. In
practice, you don't need to put in all these steps, just implicitly use the rules of
algebra. We begin by removing the brackets using the distributive rule:

    3A + 2B = 2B − 2A + 2C                (distributive rule)
    3A + 2B − 2B = 2B − 2A + 2C − 2B      (adding −2B to both sides)
    3A + 0 = −2A + 2C + 0                 (commutativity of addition, additive inverse)
    3A = −2A + 2C                         (additive identity)
    3A + 2A = −2A + 2C + 2A               (adding 2A to both sides)
    5A = 2C
    C = (5/2)A.
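The algebra above can be spot-checked numerically. In this sketch the helper names and the particular entries of A and B are our own; any two matrices of the same size would do:

```python
# entrywise helpers for matrices stored as lists of rows
def add(X, Y):  return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]
def scal(c, X): return [[c * x for x in r] for r in X]

A = [[1, 2], [3, 4]]      # illustrative entries only
B = [[0, 5], [6, 7]]
C = scal(5/2, A)          # the solution C = (5/2)A found above

lhs = add(scal(3, A), scal(2, B))              # 3A + 2B
rhs = scal(2, add(add(B, scal(-1, A)), C))     # 2(B - A + C)
assert lhs == rhs                              # both sides agree
```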
Activity 2.6 Check this solution using, for example, the matrices

    A = [ 0 0 ; 1 1 ]  and  B = [ 1 1 ; 3 5 ]:

with C = (5/2)A, verify that 3A + 2B = 2(B − A + C).
2.5 Matrix inverses

An n × n matrix A is invertible if there is a matrix B such that AB = BA = I, where I
is the n × n identity matrix; such a B is called the inverse of A and is denoted A⁻¹.
For example, if

    A = [ 1 2 ; 3 4 ]  and  B = [ −2 1 ; 3/2 −1/2 ],

we have AB = BA = I, so B = A⁻¹.
Activity 2.8 Check this. Multiply the matrices to show that AB = I and BA = I,
where I is the 2 × 2 identity matrix.
You might have noticed that we have said that B is the inverse of A. This is because an
invertible matrix has only one inverse. We will prove this.
Theorem 2.1 If A is an n × n invertible matrix, then the inverse matrix A⁻¹ is unique.
Proof
Assume the matrix A has two inverses, B and C, so that AB = BA = I and
AC = CA = I. We will show that B and C must actually be the same matrix, that is,
they are equal. Consider CAB. Since matrix multiplication is associative and AB = I,
we have
CAB = C(AB) = CI = C.
On the other hand, again by associativity,
CAB = (CA)B = IB = B,
since CA = I. We conclude that C = B, so there is only one inverse matrix of A.
Not all square matrices will have an inverse. We say that A is invertible or non-singular
if it has an inverse. We say that A is non-invertible or singular if it has no inverse.
For example, the matrix A = [ 0 0 ; 1 1 ] has no inverse: it is not
possible for a matrix to satisfy

    [ 0 0 ; 1 1 ] [ a b ; c d ] = [ 1 0 ; 0 1 ],

since every entry in the first row of the product on the left is 0.

In fact, a 2 × 2 matrix A = [ a b ; c d ] is invertible if and only if

    ad − bc ≠ 0,

in which case the inverse is

    A⁻¹ = 1/(ad − bc) [ d −b ; −c a ].
Activity 2.9 Check that this is indeed the inverse of A, by showing that if you
multiply A on the left or on the right by this matrix, then you obtain the identity
matrix I.
This tells us how to find the inverse of any 2 × 2 invertible matrix. If

    A = [ a b ; c d ],

the scalar ad − bc is called the determinant of the matrix A, denoted |A|. We shall see
more about the determinant in Chapter 4. So if |A| = ad − bc ≠ 0, then to construct
A⁻¹ we take the matrix A, switch the main diagonal entries and put minus signs in
front of the other two entries, then multiply by the scalar 1/|A|.
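The construction just described is easy to sketch in code (the function name `inv2` is our own):

```python
def inv2(A):
    # inverse of a 2x2 matrix A = [[a, b], [c, d]] via the 1/(ad - bc) formula
    (a, b), (c, d) = A
    det = a * d - b * c            # the determinant |A|
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is singular")
    # switch the diagonal entries, negate the other two, scale by 1/|A|
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]               # |A| = 1*4 - 2*3 = -2
print(inv2(A))                     # [[-2.0, 1.0], [1.5, -0.5]]
```

This agrees with the inverse exhibited in the worked example earlier in the section.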
Activity 2.10 Use this rule to find the inverse of the matrix A = [ 1 2 ; 3 4 ].

Note also that if A is invertible and AB = AC, then multiplying both sides on the left
by A⁻¹ gives IB = IC, so that B = C.
2.5.1 Properties of the inverse

If A is an invertible n × n matrix and λ ≠ 0 is a scalar, then

    (λA)⁻¹ = (1/λ) A⁻¹.

This statement says that the matrix λA is invertible, and its inverse is given by the
matrix C = (1/λ)A⁻¹. To prove this is true, we just need to show that the matrix C
satisfies (λA)C = C(λA) = I. This is straightforward using matrix algebra:

    (λA)((1/λ)A⁻¹) = λ(1/λ) AA⁻¹ = I

and

    ((1/λ)A⁻¹)(λA) = (1/λ)λ A⁻¹A = I.
If A and B are invertible n n matrices, then using the definition of the inverse, you
can show that
(AB)⁻¹ = B⁻¹A⁻¹.
This last statement says that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order. The proof of this statement is left as an exercise at the end of this chapter.
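A quick numerical sketch of this reverse-order property (the helper names and the particular entries are our own illustration, not the exercise's solution):

```python
def mult2(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return [[a*e + b*g, a*f + b*h], [c*e + d*g, c*f + d*h]]

def inv2(X):
    (a, b), (c, d) = X
    det = a*d - b*c
    return [[d/det, -b/det], [-c/det, a/det]]

A = [[1, 2], [3, 4]]
B = [[2, 0], [1, 1]]
left  = inv2(mult2(A, B))           # (AB)^-1
right = mult2(inv2(B), inv2(A))     # B^-1 A^-1  (note the reversed order)
# the two results agree entry by entry
```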
2.6.1 Properties of powers

    A^r A^s = A^(r+s),    (A^r)^s = A^(rs).

As r and s are positive integers and matrix multiplication is associative, these
properties are easily verified in the same way as they are with real numbers.
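Both exponent laws can be checked mechanically; in this sketch the helpers `mult2` and `power` and the sample matrix are our own:

```python
def mult2(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return [[a*e + b*g, a*f + b*h], [c*e + d*g, c*f + d*h]]

def power(A, r):
    # A^r for a positive integer r, by repeated multiplication
    P = A
    for _ in range(r - 1):
        P = mult2(P, A)
    return P

A = [[1, 1], [0, 2]]
assert mult2(power(A, 2), power(A, 3)) == power(A, 5)   # A^2 A^3 = A^(2+3)
assert power(power(A, 2), 3) == power(A, 6)             # (A^2)^3 = A^(2*3)
```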
Activity 2.11
2.7 Transpose
Definition 2.8 (Transpose) The transpose of an m × n matrix

    A = (aij) = [ a11 a12 ⋯ a1n ; a21 a22 ⋯ a2n ; ⋮ ⋮ ⋱ ⋮ ; am1 am2 ⋯ amn ]

is the n × m matrix A^T whose rows are the columns of A; that is, the (i, j) entry of
A^T is the (j, i) entry of A.

For example, if

    A = [ 1 2 ; 3 4 ]  and  B = ( 1 5 3 ),

then

    A^T = [ 1 3 ; 2 4 ],    B^T = [ 1 ; 5 ; 3 ].
Notice that the diagonal entries of a square matrix do not move under the operation of
taking the transpose, as aii remains aii. So if D is a diagonal matrix, then D^T = D.
Properties of transpose
If we take the transpose of a matrix A by switching the rows and columns, and then do
it again, we get back to the original matrix A. This is summarised in the following
equation:
(AT )T = A.
There are two further properties:

    (λA)^T = λA^T,    (A + B)^T = A^T + B^T.

These follow immediately from the definition. In particular, the (i, j) entry of (λA)^T is
λaji, which is also the (i, j) entry of λA^T.
The next property tells you what happens when you take the transpose of a product of
matrices:
    (AB)^T = B^T A^T.
This can be stated as: The transpose of the product of two matrices is the product of the
transposes in the reverse order.
Showing that this is true is slightly more complicated since it involves matrix
multiplication. It is more important to understand why the product of the transposes
must be in the reverse order.
Activity 2.12 If A is an m × n matrix and B is n × p, look at the sizes of the
matrices A^T, B^T, (AB)^T and show that only the product B^T A^T is always defined.
Show also that its size is equal to the size of (AB)^T.
If A is an m × n matrix and B is n × p, from the above activity you know that (AB)^T
and B^T A^T are the same size. To prove that (AB)^T = B^T A^T you need to show that the
(i, j) entries are equal. You can try this as follows.

Activity 2.13 The (i, j) entry of (AB)^T is the (j, i) entry of AB, which is obtained
by taking row j of A and multiplying each term by the corresponding entry of
column i of B. We can write this as

    ((AB)^T)ij = aj1 b1i + aj2 b2i + ⋯ + ajn bni.

Do the same for the (i, j) entry of B^T A^T and show that you obtain the same
number.
The final property in this section states that the inverse of the transpose of an invertible
matrix is the transpose of the inverse; that is, if A is invertible, then

    (A^T)⁻¹ = (A⁻¹)^T.

This follows from the previous property and the definition of inverse. We have
A^T (A⁻¹)^T = (A⁻¹A)^T = I^T = I, and in the same way (A⁻¹)^T A^T = I. Therefore, by
the definition of the inverse of a matrix, (A⁻¹)^T must be the inverse of A^T.
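A short sketch checking (AB)^T = B^T A^T numerically; the helper names and sample entries are our own:

```python
def transpose(A):
    # the rows of A become the columns of A^T
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0, 2], [3, 1, 4]]     # 2 x 3
B = [[2, 1], [0, 5], [1, 1]]   # 3 x 2
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
assert transpose(transpose(A)) == A      # (A^T)^T = A as well
```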
2.7.1 Symmetric matrices

A matrix A is symmetric if A^T = A. Only square matrices can be symmetric. If A is symmetric, then aij = aji. That is,
entries diagonally opposite to each other must be equal: the matrix is symmetric about
its diagonal.
Activity 2.14 Verify that the matrix

    A = [ 1 4 5 ; 4 2 7 ; 5 7 3 ]

is symmetric; that is, show that A = A^T.
2.8 Vectors in Rⁿ

An n × 1 matrix is a column vector, or simply a vector,

    v = [ v1 ; v2 ; ⋮ ; vn ],    vi ∈ R.

We can also define a row vector to be a 1 × n matrix. However, in this text, by the term
vector we shall always mean a column vector.
The numbers v1, v2, . . . , vn are known as the components (or entries) of the vector, v.
In order to distinguish vectors from scalars, and to emphasise that they are vectors and
not general matrices, in this text vectors are written in lowercase boldface type. (When
writing by hand, vectors should be underlined to avoid confusion with scalars.)
Addition and scalar multiplication are defined for vectors as for n × 1 matrices:

    v + w = [ v1 + w1 ; v2 + w2 ; ⋮ ; vn + wn ],    λv = [ λv1 ; λv2 ; ⋮ ; λvn ].
For a fixed positive integer n, the set of vectors together with the operations of addition
and scalar multiplication form Rn , usually called Euclidean n-space.
We will often write a column vector in the text as the transpose of a row vector:

    x = [ x1 ; x2 ; ⋮ ; xn ] = ( x1 x2 ⋯ xn )^T.
A vector v ∈ Rⁿ is a linear combination of the vectors v1, v2, . . . , vk if

    v = α1 v1 + α2 v2 + ⋯ + αk vk

for some scalars α1, . . . , αk. For example, if

    a = [ 1 ; 2 ; 3 ]  and  b = [ 4 ; 2 ; 1 ],

then 2a + b = ( 6, 6, 7 )^T is a linear combination of a and b.
The 1 1 matrix vT w can be identified with the real number, or scalar, which is its
unique entry. This turns out to be particularly useful, and is known as the inner product
or scalar product or dot product of v and w.
2.8.1 The inner product

For vectors

    v = [ v1 ; v2 ; ⋮ ; vn ],    w = [ w1 ; w2 ; ⋮ ; wn ]

in Rⁿ, the inner product is

    ⟨v, w⟩ = v1 w1 + v2 w2 + ⋯ + vn wn.
The inner product, v, w is also known as the scalar product of v and w, or as the dot
product. In the latter case it is denoted by v w.
The inner product of v and w is precisely the scalar quantity given by

    v^T w = ( v1 v2 ⋯ vn ) [ w1 ; w2 ; ⋮ ; wn ] = v1 w1 + v2 w2 + ⋯ + vn wn,

so that

    ⟨v, w⟩ = v^T w.

It is important to realise that the inner product is just a number, a scalar, not another
vector or a matrix.
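The definition translates directly into code. This minimal sketch uses our own helper name `inner` and sample vectors:

```python
def inner(v, w):
    # <v, w> = v1*w1 + ... + vn*wn : the result is a scalar, not a vector
    assert len(v) == len(w), "both vectors must lie in the same R^n"
    return sum(vi * wi for vi, wi in zip(v, w))

print(inner([1, 2, 3], [4, 2, 1]))   # 1*4 + 2*2 + 3*1 = 11
```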
The inner product on Rⁿ satisfies certain basic properties, as shown in the next theorem.

Theorem 2.2 The inner product

    ⟨x, y⟩ = x1 y1 + x2 y2 + ⋯ + xn yn,    x, y ∈ Rⁿ,

satisfies the following properties for all x, y, z ∈ Rⁿ and all α ∈ R:

(i) ⟨x, y⟩ = ⟨y, x⟩
(ii) ⟨αx, y⟩ = α⟨x, y⟩ = ⟨x, αy⟩
(iii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
(iv) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

Proof
We have

    ⟨x, y⟩ = x1 y1 + x2 y2 + ⋯ + xn yn = y1 x1 + y2 x2 + ⋯ + yn xn = ⟨y, x⟩,

which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For (iv), note that

    ⟨x, x⟩ = x1² + x2² + ⋯ + xn²

is a sum of squares, so ⟨x, x⟩ ≥ 0, and it is equal to 0 if and only if every component
xi is 0; that is, if and only if x = 0.
From the definitions, it is clear that it is not possible to combine vectors in different
Euclidean spaces, either by addition or by taking the inner product. If v ∈ Rⁿ and
w ∈ Rᵐ, with m ≠ n, then these vectors live in different worlds, or more precisely, in
different vector spaces.
Vectors have a broader use beyond that of being special types of matrices. It is likely
that you have some previous knowledge of vectors; for example, in describing the
displacement of an object from one point to another in R2 or in R3 . Before we continue
our study of linear algebra it is important to consolidate this background, for it
provides valuable geometric insight into the definitions and uses of vectors in higher
dimensions. Parts of the next section may be review for many of you.
Vectors in R2
The set R can be represented as points along a horizontal line, called a real-number line.
In order to represent pairs of real numbers, (a1 , a2 ), we use a Cartesian plane, a plane
with both a horizontal axis and a vertical axis, each axis being a copy of the
real-number line, and we mark A = (a1 , a2 ) as a point in this plane. We associate this
point with the vector a = (a1 , a2 )T , as representing a displacement from the origin (the
point (0, 0)) to the point A. In this context, a is the position vector of the point A. This
displacement is illustrated by an arrow, or directed line segment, with initial point at
the origin and terminal point at A.
[Figure: the position vector a of the point (a1, a2), drawn as an arrow from the origin (0, 0) to the point.]
Even if a displacement does not begin at the origin, two displacements of the same
length and the same direction are considered to be equal. So, for example, the two
arrows below represent the same vector, v = (1, 2)T .
[Figure: two parallel arrows of equal length, both representing the displacement vector v = (1, 2)^T.]
If an object is displaced from a point, say O, the origin, to a point P by the
displacement p, and then displaced from P to Q, by the displacement v, then the total
displacement is given by the vector from O to Q, which is the position vector q. So we
would expect vectors to satisfy q = p + v, both geometrically (in the sense of a
displacement) and algebraically (by the definition of vector addition). This is certainly
true in general, as illustrated below.
[Figure: the displacement p from O to P followed by the displacement v from P to Q; the total displacement is q = p + v. The parallelogram also illustrates that p + v = v + p.]
From q = p + v, we have v = q p. This is the displacement from P to Q. To help you
determine in which direction the vector v points, think of v = q p as the vector which
is added to the vector p in order to obtain the vector q.
If v represents a displacement, then 2v must represent a displacement in the same
direction, but twice as far, and −v represents an equal displacement in the opposite
direction. This interpretation is compatible with the definition of scalar multiplication.

Activity 2.17 Sketch the vector v = (1, 2)^T in a coordinate system. Then sketch 2v
and −v. Looking at the coordinates on your sketch, what are the components of 2v
and −v?
We have stated that a vector has both a length and a direction. Given a vector
a = (a1, a2)^T, its length, denoted by ‖a‖, can be calculated using Pythagoras' theorem
applied to the right triangle with sides a1 and a2:

    ‖a‖ = √(a1² + a2²).

For example, if a = (1, 2)^T, then

    ‖a‖ = √(1² + 2²) = √5.
The direction of a vector is essentially given by the components of the vector. If we have
two vectors a and b which are (non-zero) scalar multiples, say

    a = λb,    λ ∈ R (λ ≠ 0),

then a and b are parallel. If λ > 0, then a and b have the same direction. If λ < 0, then
we say that a and b have opposite directions.
The zero vector, 0, has length 0 and has no direction. For any other vector, v ≠ 0, there
is exactly one unit vector (that is, a vector of length 1) in the same direction as v, namely

    u = (1/‖v‖) v.
Activity 2.19 Write down a unit vector, u, which is parallel to the vector
a = (4, 3)^T. Then write down a vector, w, of length 2 which is in the opposite
direction to a.
2.9.2 Inner product

The inner product in R² is closely linked with the geometric concepts of length and
angle. If a = (a1, a2)^T, we have already seen that

    ‖a‖² = ⟨a, a⟩ = a1² + a2².
Let a, b be two vectors in R², and let θ denote the angle between them.¹ By this we
shall always mean the angle, θ, such that 0 ≤ θ ≤ π. If θ < π, the vectors a, b, and
c = b − a form a triangle, where c is the side opposite the angle θ, as, for example, in
the figure below.

[Figure: a triangle with sides a, b and c = b − a, with the angle θ between a and b.]
The law of cosines (which you may or may not know; don't worry if you don't)
applied to this triangle gives us the important relationship stated in the following
theorem.
Theorem 2.3 Let a, b ∈ R² and let θ denote the angle between them. Then

    ⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.

Proof
The law of cosines states that c² = a² + b² − 2ab cos θ, where c = ‖b − a‖, a = ‖a‖,
b = ‖b‖. That is,

    ‖b − a‖² = ‖a‖² + ‖b‖² − 2‖a‖‖b‖ cos θ.    (1)
Expanding the inner product and using its properties, we have

    ‖b − a‖² = ⟨b − a, b − a⟩ = ⟨b, b⟩ + ⟨a, a⟩ − 2⟨a, b⟩.

That is,

    ‖b − a‖² = ‖a‖² + ‖b‖² − 2⟨a, b⟩.    (2)

Comparing equations (1) and (2), we conclude that ⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.

For non-zero vectors a and b, the angle between them can therefore be found from

    cos θ = ⟨a, b⟩ / (‖a‖ ‖b‖).
¹ Angles are always measured in radians, not degrees, here. So, for example, 45 degrees is π/4 radians.
See the texts, or the subject guide for MT1174 Calculus.
Example 2.10 Let

    v = [ 1 ; 2 ]  and  w = [ 3 ; 1 ].

Then

    cos θ = ⟨v, w⟩ / (‖v‖ ‖w‖) = 5 / (√5 √10) = 1/√2,

so that θ = π/4.
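The angle formula cos θ = ⟨v, w⟩/(‖v‖ ‖w‖) can be checked for this example in code; the helper names below are our own:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle(v, w):
    # the angle theta in [0, pi] with <v, w> = ||v|| ||w|| cos(theta)
    dot = sum(a * b for a, b in zip(v, w))
    return math.acos(dot / (norm(v) * norm(w)))

theta = angle([1, 2], [3, 1])
print(theta, math.pi / 4)   # both approximately 0.7853981633974483
```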
Since

    ⟨a, b⟩ = ‖a‖ ‖b‖ cos θ,

and −1 ≤ cos θ ≤ 1 for any real number θ, the maximum value of the inner product is
⟨a, b⟩ = ‖a‖ ‖b‖. This occurs precisely when cos θ = 1, that is, when θ = 0. In this case
the vectors a and b are parallel and in the same direction. If they point in opposite
directions, then θ = π and we have ⟨a, b⟩ = −‖a‖ ‖b‖. The inner product will be
positive if and only if the angle between the vectors is acute, meaning that 0 ≤ θ < π/2. It
will be negative if the angle is obtuse, meaning that π/2 < θ ≤ π.
The non-zero vectors a and b are orthogonal (or perpendicular or, sometimes, normal)
when the angle between them is θ = π/2. Since cos(π/2) = 0, this is precisely when their
inner product is zero. We restate this important fact:

The vectors a and b are orthogonal if and only if ⟨a, b⟩ = 0.
2.9.3 Vectors in R³

Everything we have said so far about the inner product and its geometric interpretation
in R² extends to R³. If

    a = [ a1 ; a2 ; a3 ],

then

    ‖a‖ = √(a1² + a2² + a3²).
Activity 2.20 Show this. Sketch a position vector a = (a1, a2, a3)^T in R³. Drop a
perpendicular to the xy-plane as in the figure below, and apply Pythagoras' theorem
twice to obtain the result.

[Figure: the point (a1, a2, a3) in R³, with the foot of the perpendicular at (a1, a2, 0) in the xy-plane.]
The vectors a, b and c = b − a in R³ lie in a plane and the law of cosines can still be
applied to establish the result that

    ⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.
Activity 2.21 Calculate the angles of the triangle with sides a, b, c and show it is
an isosceles right triangle, where

    a = [ 1 ; 2 ; 2 ],    b = [ −1 ; 1 ; 4 ],    c = b − a.
2.10 Lines

2.10.1 Lines in R²

Example 2.11 Consider the line y = 2x through the origin (0, 0). The position
vector of any point on the line has the form

    x = [ t ; 2t ] = t [ 1 ; 2 ] = tv,    t ∈ R.
For example, if t = 2, we get the position vector of the point (2, 4) on the line, and if
t = −1 we obtain the point (−1, −2). As the parameter t runs through all real
numbers, this vector equation gives the position vectors of all the points on the line.
Starting with the vector equation,

    x = [ x ; y ] = tv = t [ 1 ; 2 ],    t ∈ R,
we can retrieve the Cartesian equation using the fact that the two vectors are equal
if and only if their components are equal. This gives us the two equations x = t and
y = 2t. Eliminating the parameter t between these two equations yields y = 2x.
The line in the above example is a line through the origin. What about a line which
does not contain (0, 0)?
Example 2.12 Consider the line y = 2x + 1. Proceeding as above, we set x = t,
t ∈ R. Then y = 2x + 1 = 2t + 1, so the position vector of a point on this line is
given by

    x = [ t ; 2t + 1 ] = [ 0 ; 1 ] + [ t ; 2t ] = [ 0 ; 1 ] + t [ 1 ; 2 ],    t ∈ R.
Activity 2.23

As before, we can retrieve the Cartesian equation of the line by equating components of
the vector and eliminating the parameter.

Activity 2.24 Do this for each of the vector equations given above for the line
y = 2x + 1.
In general, any line in R2 is given by a vector equation with one parameter of the form
x = p + tv
where x is the position vector of a point on the line, p is any particular point on the
line and v is the direction of the line.
Activity 2.25 Write down a vector equation of the line through the points
P = (1, 1) and Q = (3, 2). What is the direction of this line? Find a value for c
such that the point (7, c) is on the line.
In R2 , two lines are either parallel or intersect in a unique point.
Example 2.13 The lines

    ℓ1: [ x ; y ] = [ 1 ; 3 ] + t [ 1 ; 2 ],    ℓ2: [ x ; y ] = [ 5 ; 6 ] + t [ −2 ; 1 ],    t ∈ R,
are not parallel, since their direction vectors are not scalar multiples of one another.
Therefore they intersect in a unique point. We can find this point either by finding
the Cartesian equation of each line and solving the equations simultaneously, or
using the vector equations. We will do the latter. We are looking for a point (x, y) on
both lines, so its position vector will satisfy
    [ x ; y ] = [ 1 ; 3 ] + t [ 1 ; 2 ] = [ 5 ; 6 ] + s [ −2 ; 1 ]
for some t R and for some s R. We need to use different symbols (s and t) in the
equations because they are unlikely to be the same number for each line. We are
looking for values of s and t which will give us the same point. Equating components
of the position vectors of points on the lines, we have
    1 + t = 5 − 2s    ⟹    2s + t = 4
    3 + 2t = 6 + s    ⟹    −s + 2t = 3,

and, multiplying the second equation by 2,

    2s + t = 4
    −2s + 4t = 6.
Adding these last two equations, we obtain t = 2, and therefore s = 1. Therefore the
point of intersection is (3, 7):
    [ 1 ; 3 ] + 2 [ 1 ; 2 ] = [ 3 ; 7 ] = [ 5 ; 6 ] + 1 [ −2 ; 1 ].
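The elimination above can be mirrored in a few lines of code (a sketch; the variable names are ours):

```python
# The simultaneous equations from the example:  2s + t = 4  and  -s + 2t = 3.
# Doubling the second equation and adding eliminates s:  5t = 10.
t = 10 / 5                        # t = 2
s = (4 - t) / 2                   # back-substitute: s = 1
on_l1 = (1 + t * 1, 3 + t * 2)    # substitute t into the equation of the first line
on_l2 = (5 - 2 * s, 6 + s)        # substitute s into the equation of the second line
print(on_l1, on_l2)               # (3.0, 7.0) (3.0, 7.0): the lines meet at (3, 7)
```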
2.10.2
Lines in R3
How can you describe a line in R3 ? Think about this. How do you describe the set of
points (x, y, z) which are on a given line?
Because there are three variables involved, the natural way is to use a vector equation.
To describe a line you locate one point on the line by its position vector, and then
travel along from that point in a given direction, or in the opposite direction.
[Figure: a line in R³, located by the position vector of one of its points, drawn with the origin (0, 0, 0).]
Therefore, a line in R3 is given by a vector equation with one parameter,
x = p + tv,
where x is the position vector of any point on the line, p is the position vector of one
particular point on the line and v is the direction of the line,
    x = [ x ; y ; z ] = [ p1 ; p2 ; p3 ] + t [ v1 ; v2 ; v3 ],    t ∈ R.
The equation x = tv represents a parallel line through the origin.
Example 2.14 The equations

    x = [ 1 ; 3 ; 0 ] + t [ 1 ; 2 ; 1 ]  and  x = [ 3 ; 7 ; 2 ] + s [ 3 ; 6 ; 3 ],    s, t ∈ R,
describe the same line. This is not obvious, so how do we show it?
The lines represented by these equations are parallel since their direction vectors are
parallel
    [ 3 ; 6 ; 3 ] = 3 [ 1 ; 2 ; 1 ],
so they either have no points in common and are parallel, or they have all points in
common, and are really the same line. Since
    [ 3 ; 7 ; 2 ] = [ 1 ; 3 ; 0 ] + 2 [ 1 ; 2 ; 1 ],
the point (3, 7, 2) is on both lines, so they must have all points in common. We say
that the lines are collinear.
On the other hand, the lines represented by the equations

    x = [ 1 ; 3 ; 0 ] + t [ 1 ; 2 ; 1 ]  and  x = [ 3 ; 7 ; 1 ] + t [ 3 ; 6 ; 3 ],    t ∈ R,
are parallel, with no points in common, since there is no value of t for which

    [ 3 ; 7 ; 1 ] = [ 1 ; 3 ; 0 ] + t [ 1 ; 2 ; 1 ].
Activity 2.26

Equating components in the vector equation of a line in R³, we have

    x = p1 + t v1,    y = p2 + t v2,    z = p3 + t v3.

Solving each of these equations for the parameter t and equating the results, we have
the two equations

    (x − p1)/v1 = (y − p2)/v2 = (z − p3)/v3,    provided vi ≠ 0, i = 1, 2, 3.
Example 2.15 To find Cartesian equations for the line

    x = [ 1 ; 2 ; 3 ] + t [ −1 ; 0 ; 5 ],    t ∈ R,

we equate components,

    x = 1 − t,    y = 2,    z = 3 + 5t,

and then solve for t in the first and third equation. The Cartesian equations are

    1 − x = (z − 3)/5  and  y = 2.
This is a line parallel to the xz-plane in R³. The direction vector has a 0 in the
second component, so there is no change in the y direction: the y coordinate has the
constant value y = 2.
In R2 , two lines are either parallel or intersect in a unique point. In R3 more can
happen. Two lines in R3 either intersect in a unique point, are parallel, or are skew,
which means that they lie in parallel planes and are not parallel.
Try to imagine what skew lines look like. If you are in a room with a ceiling parallel to
the floor, imagine a line drawn in the ceiling. It is possible for you to draw a parallel
line in the floor, but instead it is easier to draw a line in the floor which is not parallel
to the one in the ceiling. These lines will be skew. They lie in parallel planes (the ceiling
and the floor). If you could move the skew line in the floor onto the ceiling, then the
lines would intersect in a unique point.
Two lines are said to be coplanar if they lie in the same plane, in which case they are
either parallel or intersecting.
Example 2.16 Are the lines L1 and L2 intersecting, parallel or skew?

    L1: [ x ; y ; z ] = [ 1 ; 3 ; 4 ] + t [ 1 ; 2 ; −1 ],    L2: [ x ; y ; z ] = [ 5 ; 6 ; 1 ] + t [ −2 ; 1 ; 7 ],    t ∈ R.
Activity 2.28

If the lines intersect, there are values of s and t for which

    [ x ; y ; z ] = [ 1 ; 3 ; 4 ] + t [ 1 ; 2 ; −1 ] = [ 5 ; 6 ; 1 ] + s [ −2 ; 1 ; 7 ].

Equating components, we obtain the equations
    1 + t = 5 − 2s    ⟹    2s + t = 4
    3 + 2t = 6 + s    ⟹    −s + 2t = 3
    4 − t = 1 + 7s    ⟹    7s + t = 3.
We have already seen in Example 2.13 on page 39 that the first two equations have
the unique solution s = 1, t = 2. Substituting these values into the third equation,

    7s + t = 7(1) + 2 = 9 ≠ 3,

we see that the system has no solution. Therefore the lines do not intersect and must
be skew.
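A quick sketch confirming the computation (the tuple names below are our own):

```python
# Lines from Example 2.16, written as p + t*v with tuples for p and v.
p1, v1 = (1, 3, 4), (1, 2, -1)
p2, v2 = (5, 6, 1), (-2, 1, 7)

# s = 1, t = 2 solves the first two component equations, as found in the text.
s, t = 1, 2
q1 = tuple(p + t * v for p, v in zip(p1, v1))   # point reached on L1
q2 = tuple(p + s * v for p, v in zip(p2, v2))   # point reached on L2
print(q1, q2)   # (3, 7, 2) (3, 7, 8): the z components disagree, so no intersection
```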
Example 2.17 On the other hand, if we take a new line L3, which is parallel to L2
but which passes through the point (5, 6, −5), then the lines

    L1: [ x ; y ; z ] = [ 1 ; 3 ; 4 ] + t [ 1 ; 2 ; −1 ],    L3: [ x ; y ; z ] = [ 5 ; 6 ; −5 ] + t [ −2 ; 1 ; 7 ],    t ∈ R

do intersect in the unique point (3, 7, 2).
Activity 2.29 Check this. Find the point of intersection of the two lines L1 and L3 .
2.11 Planes in R3
It is important to look at some sketches of lines and planes in R3 in the texts as you read
this and the previous section. If these are not handy, think about the floor in your room
(which is a plane in 3-space). In how many directions can you move along the floor?
On a line, there is essentially one direction in which a point can move given as all
possible scalar multiples of a given direction, but on a plane there are more possibilities.
A point can move in two different directions, and in any linear combination of these two
directions. So how do we describe a plane in R3 ?
The vector parametric equation

    x = p + sv + tw,    s, t ∈ R,
describes the position vectors of points on a plane in R3 provided that the vectors v and
w are non-zero and are not parallel. The vector p is the position vector of any
particular point on the plane and the vectors v and w are displacement vectors which
lie in the plane. By taking all possible linear combinations x = p + sv + tw, for s, t R,
we obtain all the points on the plane.
The equation

    x = sv + tw,    s, t ∈ R,
describes a plane through the origin. In this case the position vector, x, of any point on
the plane lies in the plane.
Activity 2.30 If v and w are parallel, what does the equation
x = p + sv + tw, s, t ∈ R, actually represent?
Example 2.18 You have shown that the lines L1 and L3 given in Example 2.17
intersect in the point (3, 7, 2). (See Activity 2.29 on page 43.) Two intersecting lines
determine a plane. A vector equation of the plane containing the two lines is given by
    [ x ; y ; z ] = [ 3 ; 7 ; 2 ] + s [ 1 ; 2 ; −1 ] + t [ −2 ; 1 ; 7 ],    s, t ∈ R.
Why? We know that (3, 7, 2) is a point on the plane, and the directions of each of
the lines must lie in the plane. As s and t run through all real numbers, this
equation gives the position vector of all points on the plane. Since the point (3, 7, 2)
is on both lines, if t = 0 we have the equation of L1 , and if s = 0 we get L3 .
Any point which is on the plane can take the place of the vector (3, 7, 2)T , and any
non-parallel vectors which are linear combinations of v and w can replace these in
the equation. So, for example
    [ x ; y ; z ] = [ 1 ; 3 ; 4 ] + t [ 1 ; 2 ; −1 ] + s [ −3 ; −1 ; 8 ],    s, t ∈ R
is also an equation of this plane.
Activity 2.31 Verify this. Show that (1, 3, 4) is a point on the plane given by each
equation, and show that (−3, −1, 8)^T is a linear combination of (1, 2, −1)^T and
(−2, 1, 7)^T.
There is another way to describe a plane in R3 geometrically which is often easier to
use. We begin with planes through the origin. Let n be a given vector in R3 and
consider all position vectors x which are orthogonal to n. Geometrically, the set of all
such vectors describes a plane through the origin in R3 .
Again you should look at sketches in the texts, or try to imagine this by placing a
pencil perpendicular to a table top. The pencil represents a normal vector, the table top
a plane, and the point where the pencil is touching the table is the origin of your
coordinate system. Then any vector which you can draw on the table top is orthogonal
to the pencil, and conversely any point on the table top can be reached by a directed
line segment (from the point of the pencil) which is orthogonal to the pencil.
A vector, x, is orthogonal to n if and only if

    ⟨n, x⟩ = 0,

so this equation gives the position vectors, x, of points on the plane. If n = (a, b, c)^T
and x = (x, y, z)^T, then this equation can be written as

    ⟨n, x⟩ = ⟨ [ a ; b ; c ], [ x ; y ; z ] ⟩ = 0,

or

    ax + by + cz = 0.
This is a Cartesian equation of a plane through the origin in R3 . The vector n is called
a normal vector to the plane. Any vector which is parallel to n will also be a normal
vector and will lead to the same Cartesian equation.
On the other hand, given a Cartesian equation,
ax + by + cz = 0
then this equation represents a plane through the origin in R3 with normal vector
n = (a, b, c)T .
To describe a plane which does not go through the origin, we choose a normal vector n
and one point P on the plane with position vector p. We then consider all displacement
vectors which lie in the plane with initial point at P. If x is the position vector of any
point on the plane, then the displacement vector x − p lies in the plane, and x − p is
orthogonal to n. Conversely, if the position vector x of a point satisfies ⟨x − p, n⟩ = 0,
then the vector x − p lies in the plane, so the point (with position vector x) is on the
plane.
(Again, think about the pencil perpendicular to the table top, only this time the point
where the pencil is touching the table is a point, P , on the plane, and the origin of your
coordinate system is somewhere else, say, in the corner on the floor.)
The orthogonality condition means that the position vector of any point on the plane is
given by the equation

    ⟨n, x − p⟩ = 0.

Using properties of the inner product, we can rewrite this as

    ⟨n, x⟩ = ⟨n, p⟩,

where ⟨n, p⟩ = d is a constant.
If n = (a, b, c)T and x = (x, y, z)T , then
ax + by + cz = d
is a Cartesian equation of a plane in R3 . The plane goes through the origin if and only if
d = 0.
Example 2.19 The equation

    2x − 3y − 5z = 2

represents a plane which does not go through the origin, since (x, y, z) = (0, 0, 0)
does not satisfy the equation. To find a point on the plane we can choose any two of
the coordinates, say y = 0 and z = 0, and then the equation tells us that x = 1. So
the point (1, 0, 0) is on this plane. The components of a normal to the plane can be
read from this equation as the coefficients of x, y, z: n = (2, −3, −5)^T.
How does the Cartesian equation of a plane relate to the vector parametric equation of
a plane? A Cartesian equation can be obtained from the vector equation algebraically,
by eliminating the parameters in the vector equation, and vice versa, as the following
example shows.
Example 2.20 Consider the vector equation

    [ x ; y ; z ] = s [ 1 ; 2 ; −1 ] + t [ −2 ; 1 ; 7 ] = sv + tw,    s, t ∈ R,

which is a plane through the origin parallel to the plane in Example 2.18 on page 43.
The direction vectors v = (1, 2, −1)^T and w = (−2, 1, 7)^T lie in the plane.
To find a Cartesian equation in x, y and z, we equate components,

    x = s − 2t,    y = 2s + t,    z = −s + 7t,

and eliminate the parameters s and t. We begin by solving the first equation for s,
and then substitute this into the second equation to solve for t in terms of x and y:

    s = x + 2t,    y = 2(x + 2t) + t = 2x + 5t,    so 5t = y − 2x and t = (y − 2x)/5.

Then substitute back into the first equation to obtain s in terms of x and y:

    s = x + 2(y − 2x)/5,    so 5s = 5x + 2y − 4x and s = (x + 2y)/5.

Finally, we substitute for s and t in the third equation, z = −s + 7t, and simplify to
obtain a Cartesian equation of the plane,

    3x − y + z = 0.
Activity 2.32 Carry out this last step to obtain the Cartesian equation of the
plane.

The equation 3x − y + z = 0 can be written as ⟨n, x⟩ = 0, where

    n = [ 3 ; −1 ; 1 ],    x = [ x ; y ; z ].
The vector n is a normal vector to the plane. We can check that n is, indeed,
orthogonal to the plane by taking the inner product with the vectors v and w, which lie
in the plane.
Activity 2.33 Do this. Calculate ⟨n, v⟩ and ⟨n, w⟩, and verify that both inner
products are equal to zero.
Since n is orthogonal to both v and w, it is orthogonal to all linear combinations of
these vectors, and hence to any vector in the plane. So this plane can equally be
described as the set of all position vectors which are orthogonal to n.
Activity 2.34 Using the properties of inner product, show that this last statement
is true. That is, if ⟨n, v⟩ = 0 and ⟨n, w⟩ = 0, then ⟨n, sv + tw⟩ = 0, for any
s, t ∈ R.
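A numerical sketch of these two checks; it tests instances only, and is not a substitute for the proof the activity asks for. The scalars s and t below are our own arbitrary choices:

```python
def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

n = (3, -1, 1)     # normal vector of the plane 3x - y + z = 0
v = (1, 2, -1)     # the two direction vectors lying in the plane
w = (-2, 1, 7)

assert inner(n, v) == 0 and inner(n, w) == 0
# n is then orthogonal to a linear combination s*v + t*w:
s, t = 4, -3       # arbitrary illustrative scalars
x = tuple(s * vi + t * wi for vi, wi in zip(v, w))
assert inner(n, x) == 0
```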
Can we do the same for a plane which does not pass through the origin? Consider the
following example.
Example 2.21 The plane we just considered in Example 2.20 is parallel to the
plane with vector equation

    [ x ; y ; z ] = [ 3 ; 7 ; 2 ] + s [ 1 ; 2 ; −1 ] + t [ −2 ; 1 ; 7 ] = p + sv + tw,    s, t ∈ R,
which passes through the point (3, 7, 2). Since the planes are parallel, they will have
the same normal vectors. So the Cartesian equation of this plane is of the form
3x y + z = d.
Since (3, 7, 2) is a point on the plane, it must satisfy the equation for the plane.
Substituting into the equation we find d = 3(3) (7) + (2) = 4 (which is equivalent
to finding d by using d = n, p ). So the Cartesian equation we obtain is
3x y + z = 4.
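The shortcut d = ⟨n, p⟩ is easy to verify numerically. A small Python sketch using the data of this example:

```python
def dot(u, v):
    """Inner product of two vectors given as lists of components."""
    return sum(a * b for a, b in zip(u, v))

n = [3, -1, 1]   # common normal of the two parallel planes
p = [3, 7, 2]    # a point on the second plane

d = dot(n, p)    # right-hand side of the Cartesian equation
print(d)  # 4
```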
Conversely, starting with a Cartesian equation of a plane, we can obtain a vector
equation. We are looking for the position vector of a point on the plane whose
components satisfy 3x − y + z = 4, or equivalently, z = 4 − 3x + y. (We can solve for
any one of the variables x, y or z, but we chose z for simplicity.) So we are looking
for all vectors x such that

    ( x )   (     x      )   ( 0 )     (  1 )     ( 0 )
    ( y ) = (     y      ) = ( 0 ) + x (  0 ) + y ( 1 )
    ( z )   ( 4 − 3x + y )   ( 4 )     ( −3 )     ( 1 )

for any x, y ∈ R. Therefore

    ( x )   ( 0 )     (  1 )     ( 0 )
    ( y ) = ( 0 ) + s (  0 ) + t ( 1 ) ,    s, t ∈ R,
    ( z )   ( 4 )     ( −3 )     ( 1 )

is a vector equation of the same plane as that given by the original vector equation,

    ( x )   ( 3 )     (  1 )     ( −2 )
    ( y ) = ( 7 ) + s (  2 ) + t (  1 ) ,    s, t ∈ R,
    ( z )   ( 2 )     ( −1 )     (  7 )

although it is difficult to spot this at a glance.
There are many ways to show that these two vector equations do represent the same
plane, but we can use what we know about planes to find the easiest. The planes
represented by the two vector equations have the same normal vector n, since the
vectors (1, 0, −3)^T and (0, 1, 1)^T are also orthogonal to n. So we know that the two
vector equations represent parallel planes. They are the same plane if they have a
point in common. It is far easier to find values of s and t for which p = (3, 7, 2)^T
satisfies the new vector equation,

    ( 3 )   ( 0 )     (  1 )     ( 0 )
    ( 7 ) = ( 0 ) + s (  0 ) + t ( 1 ) ,    s, t ∈ R,
    ( 2 )   ( 4 )     ( −3 )     ( 1 )

than the other way around (which would be showing that (0, 0, 4) satisfies the original
equation), because of the positions of the 0s and 1s in the direction vectors.
Activity 2.35  Find the values of s and t which work.
Using the examples we have just done, you should now be able to tackle the following
question.
Activity 2.36  The two lines, L1 and L2,

    L1:  ( x )   ( 1 )     (  1 )         L2:  ( x )   ( 5 )     ( −2 )
         ( y ) = ( 3 ) + t (  2 ) ,            ( y ) = ( 6 ) + t (  1 ) ,    t ∈ R,
         ( z )   ( 4 )     ( −1 )              ( z )   ( 1 )     (  7 )

in Example 2.16 on page 42 are skew, and therefore are contained in parallel planes.
Find vector equations and Cartesian equations for these two planes.
Two planes in R3 are either parallel or intersect in a line. When considering such
questions, it is usually easier to use the Cartesian equations of the planes. If the planes are parallel,
then this will be obvious from looking at their normal vectors. If they are not parallel,
then the line of intersection can be found by solving the two Cartesian equations
simultaneously.
Example 2.22  The planes

    x + 2y − 3z = 0    and    −2x − 4y + 6z = 4

are parallel, since their normal vectors are related by (−2, −4, 6)^T = −2(1, 2, −3)^T.
The equations do not represent the same plane, since they have no points in
common; that is, there are no values of x, y, z which can satisfy both equations. The
first plane goes through the origin and the second plane does not.

On the other hand, the planes

    x + 2y − 3z = 0    and    x − 2y + 5z = 4

intersect in a line. The points of intersection are the points (x, y, z) which satisfy
both equations, so we solve the equations simultaneously. We begin by eliminating
the variable x from the second equation, by subtracting the first equation from the
second. This will naturally lead us to a vector equation of the line of intersection:

    x + 2y − 3z = 0            x + 2y − 3z = 0
    x − 2y + 5z = 4    →          −4y + 8z = 4.

This last equation tells us that if z = t is any real number, then y = −1 + 2t.
Substituting these expressions into the first equation, we find x = 2 − t. Then a
vector equation of the line of intersection is

    ( x )   (  2 − t  )   (  2 )     ( −1 )
    ( y ) = ( −1 + 2t ) = ( −1 ) + t (  2 ) .
    ( z )   (    t    )   (  0 )     (  1 )

This can be verified by showing that the point (2, −1, 0) satisfies both Cartesian
equations and that the vector v = (−1, 2, 1)^T is orthogonal to the normal vectors of
each of the planes (and therefore lies in both planes).
Activity 2.37 Carry out the calculations in the above example and verify that the
line is in both planes.
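The elimination in Example 2.22 can be mirrored in a short Python sketch: parametrise the line of intersection by z = t and confirm that every point produced satisfies both Cartesian equations.

```python
def point_on_line(t):
    """A point on the intersection of x + 2y - 3z = 0 and x - 2y + 5z = 4."""
    z = t
    y = -1 + 2 * t          # from -4y + 8z = 4
    x = -2 * y + 3 * z      # from x + 2y - 3z = 0
    return (x, y, z)

for t in (-2, 0, 1, 5):
    x, y, z = point_on_line(t)
    assert x + 2 * y - 3 * z == 0
    assert x - 2 * y + 5 * z == 4

print(point_on_line(0))  # (2, -1, 0)
```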
2.12.1  Hyperplanes
The set of all points (x1, x2, . . . , xn) which satisfy one Cartesian equation,

    a1 x1 + a2 x2 + · · · + an xn = d,

is called a hyperplane in Rn.
In R2, a hyperplane is a line; in R3 it is a plane. For n > 3, we use the term hyperplane.
The vector

    a = (a1, a2, . . . , an)^T

is a normal vector to the hyperplane. Writing the Cartesian equation in vector form, a
hyperplane is the set of all vectors x ∈ Rn such that

    ⟨n, x − p⟩ = 0,

where the normal vector n and the position vector p of a point on the hyperplane are
given.
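Membership of a hyperplane reduces to a single inner-product test. The sketch below uses a hypothetical hyperplane in R4 (the equation, n and p here are made-up illustrations, not from the guide) to check ⟨n, x − p⟩ = 0:

```python
def dot(u, v):
    """Inner product of two vectors given as lists of components."""
    return sum(a * b for a, b in zip(u, v))

def on_hyperplane(n, p, x):
    """True when x satisfies <n, x - p> = 0."""
    return dot(n, [xi - pi for xi, pi in zip(x, p)]) == 0

# Hypothetical hyperplane in R^4: x1 + 2x2 - x3 + x4 = 3,
# with normal n and a point p chosen so that <n, p> = 3.
n = [1, 2, -1, 1]
p = [3, 0, 0, 0]

print(on_hyperplane(n, p, [1, 1, 1, 1]))  # True:  1 + 2 - 1 + 1 = 3
print(on_hyperplane(n, p, [0, 0, 0, 0]))  # False: 0 is not 3
```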
Activity 2.38  How many Cartesian equations are needed to describe a line in Rn?
How many parameters would a vector parametric equation of a hyperplane require in
Rn?
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
explain what is meant by a matrix
use matrix addition, scalar multiplication and matrix multiplication appropriately
(know when and how these operations are defined)
manipulate matrices algebraically
state what is meant by the inverse of a square matrix, a power of a square matrix,
and the transpose of a matrix, and know the properties of these in order to
manipulate them
explain what is meant by a vector and by Euclidean n-space.
state what is meant by the inner product of two vectors and what properties it
satisfies
state what is meant by the length and direction of a vector, what is meant by a
unit vector
state the relationship between the inner product and the length and angle between
two vectors
explain what is meant by two vectors being orthogonal and how to determine this
find the equations, vector and Cartesian, of lines in R2, lines and planes in R3, and
solve problems involving lines and planes
state what is meant by a line and by a hyperplane in Rn .
Exercises
Exercise 2.1
Given the matrices:

    A = ( 2 1 )         (  1 )         ( 1 2  1 )         ( 0 1 )
        ( 1 1 ) ,   b = (  1 ) ,   C = ( 3 0 −1 ) ,   D = ( 2 5 ) ,
        ( 0 3 )         ( −1 )         ( 4 1  1 )         ( 6 3 )

which of the following matrix expressions are defined? Compute those which are defined.
(a) Ab   (b) CA   (c) A + Cb   (d) A + D   (e) b^T D   (f) DA^T + C   (g) b^T b   (h) bb^T   (i) Cb
Exercise 2.2
If a, b ∈ Rn, show that a^T b = b^T a.
Exercise 2.3
If A and B are invertible n × n matrices, then using the definition of the inverse, prove
that

    (AB)^(-1) = B^(-1) A^(-1).
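The identity is easy to sanity-check numerically for a pair of invertible 2 × 2 matrices. This Python sketch (the matrices A and B are arbitrary invertible examples, and a numerical check does not replace the proof the exercise asks for):

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2 x 2 matrix via the 1/(ad - bc) formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]
B = [[2, 0], [1, 1]]

lhs = inv2(matmul(A, B))        # (AB)^(-1)
rhs = matmul(inv2(B), inv2(A))  # B^(-1) A^(-1)
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```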
Exercise 2.4
Solve for the matrix A:

    ( 3A + ( 1 0 ) )^T = 4A.
    (      ( 2 5 ) )
Exercise 2.5
If A and AB are invertible matrices and B is a matrix, solve the equation

    (AB)^(-1) = 2A^(-1)

for B.
and

    ( x )   (  7 )     (  6 )
    ( y ) = ( 12 ) + t ( 10 ) ,    t ∈ R.
    ( z )   ( 16 )     ( 14 )
Exercise 2.8
Which of the following sets of points in R3 are collinear (that is, lie on a line)?
(a) (2, 1, 4), (4, 4, −1), (6, 7, −6)
(b) (1, 2, 3), (−4, 2, 1), (1, 1, 2).
If the points are not collinear, find a vector equation for the plane determined by the
three points.
Exercise 2.9
Let ℓ1 be the line with equation

    x = (  1 )     ( 1 )
        ( −1 ) + t ( 2 ) ,    t ∈ R,
        (  2 )     ( 3 )

let ℓ2 be the line through the point (5, 7, 4) with direction vector (3, 6, −1)^T, and let
ℓ3 be the line through (1, 17, 6) parallel to the vector (7, −4, −3)^T.
Show that:
(a) ℓ1 and ℓ3 are skew
(b) ℓ1 and ℓ2 intersect; find the point and the angle of intersection
(c) ℓ2 and ℓ3 intersect at right angles.
Then find:
(d) a Cartesian equation of the plane containing ℓ1 and ℓ2
(e) Cartesian equations of the parallel planes containing ℓ1 and ℓ3.
Exercise 2.10
Find the points of intersection, if any, of the plane

    ⟨ ( x )   (  1 ) ⟩
    ⟨ ( y ) , ( −1 ) ⟩ = 5
    ⟨ ( z )   (  1 ) ⟩

with each of the lines x = (1, 2, 3)^T + t(1, 0, 5)^T and x = (1, 2, 3)^T + s(1, 2, 1)^T,
s, t ∈ R.
Exercise 2.11
Write down the general Cartesian equation of a vertical plane (parallel to the z-axis), a
non-vertical plane and a horizontal plane (parallel to the x, y-plane) together with their
normal vectors.
Find the Cartesian equations of the line of intersection of the plane
3x − y + 2z = 4
and a general horizontal plane.
Exercise 2.12
Show that the line x = (1, 3, 1)^T + t(1, 4, −2)^T does not intersect the plane
2x + z = 8.
Find the equation of the line through the point (1, 3, 1) which is parallel to the normal
vector of the plane, and determine at what point it intersects the plane. Hence, or
otherwise, find the distance of the line to the plane.
    BA = ( 7 5 10 )                 AB = ( 10 5 )
         ( 2 1  3 )    and               (  6 2 ) .
         ( 3 3  4 )
Feedback to activity 2.4

    AB = ( 1 3 )          BA = ( 4 6 )
         ( 3 7 )    and        ( 3 4 ) .
and

    ( −2     1  ) ( 1 2 )   ( 1 0 )
    ( 3/2 −1/2 ) ( 3 4 ) = ( 0 1 ) .

Therefore

    A^(-1) = ( −2     1  )
             ( 3/2 −1/2 ) .

In general, provided ad − bc ≠ 0,

    ( a b ) ( d −b )   ( ad − bc   −ab + ba )              ( 1 0 )
    ( c d ) ( −c  a ) = ( cd − dc   −bc + ad ) = (ad − bc) ( 0 1 ) ,

so

    ( a b )^(-1)                 (  d  −b )
    ( c d )      = 1/(ad − bc) · ( −c   a ) .
A^r (A^(-1))^r = (AA · · · A)(A^(-1) A^(-1) · · · A^(-1)), with each factor appearing r times.
Removing the brackets (matrix multiplication is associative) and replacing each central
AA^(-1) by I, the result will eventually be AIA^(-1) = AA^(-1) = I. To complete the proof,
show also that (A^(-1))^r A^r = I. Therefore (A^r)^(-1) = (A^(-1))^r.
Feedback to activity 2.12
Given the sizes of A and B, the matrix AB is m × p, so (AB)^T is p × m. Also, A^T is
n × m and B^T is p × n, so the only way these matrices can be multiplied is as B^T A^T
(unless m = p).
Feedback to activity 2.13
The (i, j) entry of B^T A^T is obtained by taking row i of B^T, which is column i of B, and
multiplying each term by the corresponding entry of column j of A^T, which is row j of
A, and then summing the products:

    (B^T A^T)_ij = Σ_k (B)_ki (A)_jk = (AB)_ji = ((AB)^T)_ij.
    A = ( 1 4 5 )
        ( 4 2 7 ) = A^T
        ( 5 7 3 )

    a^T b = ( 1 2 3 ) (  4 )
                      ( −2 ) = (3)
                      (  1 )

    ab^T = ( 1 )                (  4 −2 1 )
           ( 2 ) ( 4 −2 1 )  =  (  8 −4 2 ) .
           ( 3 )                ( 12 −6 3 )
    α⟨x, y⟩ = α(x1 y1 + x2 y2 + · · · + xn yn)
            = αx1 y1 + αx2 y2 + · · · + αxn yn
            = (αx1)y1 + (αx2)y2 + · · · + (αxn)yn = ⟨αx, y⟩

and

    ⟨sx + ty, z⟩ = ⟨sx, z⟩ + ⟨ty, z⟩        by property (iii)
                 = s⟨x, z⟩ + t⟨y, z⟩        by property (ii).
    v = (1/5) ( 4 )        and        w = (2/5) ( 4 )
              ( 3 )                             ( 3 )
[Figure: the point (a1, a2, a3) in R3 and its projection (a1, a2, 0) onto the xy-plane.]

the line from the origin to the point (a1, a2, 0) lies in the xy-plane, and by Pythagoras'
theorem it has length √(a1² + a2²). Applying Pythagoras' theorem again to the right
triangle shown, we have

    ‖a‖ = √( (√(a1² + a2²))² + a3² ) = √(a1² + a2² + a3²).

Feedback to activity 2.21
We have

    a = ( −1 )        b = ( 1 )        c = b − a = (  2 )
        (  2 )            ( 1 )                    ( −1 )
        (  2 )            ( 4 )                    (  2 )

and

    ⟨a, b⟩ / (‖a‖ ‖b‖) = (−1 + 2 + 8) / (√9 √18) = 1/√2;

    ⟨a, c⟩ / (‖a‖ ‖c‖) = (−2 − 2 + 4) / (√9 √9) = 0;

    ⟨b, c⟩ / (‖b‖ ‖c‖) = (2 − 1 + 8) / (√18 √9) = 1/√2.
Alternatively, as the vectors a and c are orthogonal, and have the same length, it
follows immediately that the triangle is right-angled and isosceles.
Feedback to activity 2.22
If t = 3, then q = (3, 7)T . You are asked to sketch the position vector q as this sum to
illustrate that the vector q does locate a point on the line, but the vector q does not lie
on the line.
Feedback to activity 2.23
Here s = 1.
Feedback to activity 2.24
We will work through this for the second equation and leave the first for you. We have,
for s ∈ R,

    ( x )   ( 1 )     ( −2 )               x = 1 − 2s
    ( y ) = ( 3 ) + s ( −4 )    so that    y = 3 − 4s,

and eliminating s,

    (1 − x)/2 = s = (3 − y)/4.
Feedback to activity 2.25

    ( x )   ( −1 )     ( 4 )
    ( y ) = (  1 ) + t ( 1 ) = p + tv,    t ∈ R,

where we have used p to locate a point on the line, and the direction vector v = q − p.
The point (7, 3) is on the line (t = 2), and this is the only point of this form on the line,
since once 7 is chosen for the x coordinate, the y coordinate is determined.
Feedback to activity 2.26
Once given, for example, that the x coordinate is x = 3, the parameter t of the vector
equation is determined, therefore, so are the other two coordinates. We saw in Example
2.14 that t = 2 satisfies the first two equations and it certainly does not satisfy the third
equation, 1 = 0 t.
Feedback to activity 2.27
This is similar to the earlier activity in R2. A vector equation of the line is

    x = (  1 )     ( 4 )
        ( −1 ) + t ( 1 ) = p + tv,    t ∈ R.
        (  2 )     ( 1 )
The point (7, 1, 3) is not on this line, but the point (5, 0, 3) is on the line. The value
t = 1 will then satisfy all three component equations. There is, of course, only one
possible choice for the values of c and d.
Feedback to activity 2.28
The lines are not parallel because their direction vectors are not parallel.
x = p + (s + t)v, where r = s + t ∈ R.
Feedback to activity 2.34
Using the properties of the inner product, we have for any s, t R,
⟨n, sv + tw⟩ = s⟨n, v⟩ + t⟨n, w⟩ = s · 0 + t · 0 = 0.
Feedback to activity 2.35
Equating components in the vector equation, we have 3 = s and 7 = t from the first two
equations, and these values do satisfy the third equation, 2 = 4 − 3s + t.
Feedback to activity 2.36
The parallel planes must each contain the direction vectors of both lines as
displacement vectors, so the vector equations of the planes are, respectively,

    ( x )   ( 1 )     (  1 )     ( −2 )
    ( y ) = ( 3 ) + s (  2 ) + t (  1 )
    ( z )   ( 4 )     ( −1 )     (  7 )

and

    ( x )   ( 5 )     (  1 )     ( −2 )
    ( y ) = ( 6 ) + s (  2 ) + t (  1 ) ,
    ( z )   ( 1 )     ( −1 )     (  7 )

where s, t ∈ R.
The parallel planes have the same normal vector, which we need for the Cartesian
equations. Recall that in Example 2.21 on page 47 we found a Cartesian equation and a
normal vector of the first plane, the plane which contains L1:

    3x − y + z = 4    with    n = (  3 )
                                  ( −1 ) .
                                  (  1 )

Note that the point (1, 3, 4) is on this plane because it satisfies the equation, but the
point (5, 6, 1) does not. Substituting (5, 6, 1) into the equation 3x − y + z = d, we find
the Cartesian equation of the parallel plane which contains L2 is

    3x − y + z = 10.
Feedback to activity 2.37
As stated, to verify that the line is in both planes, show that its direction vector is
perpendicular to the normal vector of each plane, and that the point (2, −1, 0) satisfies
both equations.
Feedback to activity 2.38
To describe a line in Rn you need n − 1 Cartesian equations. A vector parametric
equation of a hyperplane in Rn would require n − 1 parameters.
Comments on exercises

Solution to exercise 2.1
(a) Ab is not defined: A is 3 × 2 and b is 3 × 1.
(b) CA is a 3 × 2 matrix,

    CA = ( 1 2  1 ) ( 2 1 )   ( 4 6 )
         ( 3 0 −1 ) ( 1 1 ) = ( 6 0 ) .
         ( 4 1  1 ) ( 0 3 )   ( 9 8 )

(c) A + Cb is not defined: A is 3 × 2 and Cb is 3 × 1.
(d) A + D = ( 2 2 )
            ( 3 6 )
            ( 6 6 )

(e) b^T D = ( 1 1 −1 ) ( 0 1 )
                       ( 2 5 ) = ( −4 3 )
                       ( 6 3 )

(f) DA^T = ( 0 1 ) ( 2 1 0 )   (  1 1  3 )
           ( 2 5 ) ( 1 1 3 ) = (  9 7 15 )
           ( 6 3 )             ( 15 9  9 )

    DA^T + C = (  1 1  3 )   ( 1 2  1 )   (  2  3  4 )
               (  9 7 15 ) + ( 3 0 −1 ) = ( 12  7 14 )
               ( 15 9  9 )   ( 4 1  1 )   ( 19 10 10 )

(g) b^T b = ( 1 1 −1 ) (  1 )
                       (  1 ) = (3)
                       ( −1 )

(h) bb^T = (  1 )                (  1  1 −1 )
           (  1 ) ( 1 1 −1 )  =  (  1  1 −1 )
           ( −1 )                ( −1 −1  1 )

(i) Cb = ( 1 2  1 ) (  1 )   ( 2 )
         ( 3 0 −1 ) (  1 ) = ( 4 )
         ( 4 1  1 ) ( −1 )   ( 4 )
Solution to exercise 2.2
The product a^T b is a 1 × 1 matrix, so it is equal to its own transpose:

    a^T b = (a^T b)^T = b^T (a^T)^T = b^T a.
So if the two lines have at least one point in common, they must be the same line. To
show the two lines are coincident you only need to show that one point on the second
line is also on the first line. For example, the two lines are coincident if and only if
(7, 12, 16) lies on the first line, and this is true if and only if there is a t ∈ R such that

    (  7 )   ( 1 )     ( 3 )            7 = 1 + 3t
    ( 12 ) = ( 2 ) + t ( 5 ) ,  i.e.   12 = 2 + 5t
    ( 16 )   ( 2 )     ( 7 )           16 = 2 + 7t.
The three component equations have the consistent solution t = 2. Hence the lines are
the same (coincident).
Solution to exercise 2.8
(a) A vector equation of the line through the first two points is

    ( x )   ( 2 )     (  2 )                (  2 )   (  4 )   ( 2 )
    ( y ) = ( 1 ) + t (  3 ) ,   where v =  (  3 ) = (  4 ) − ( 1 )
    ( z )   ( 4 )     ( −5 )                ( −5 )   ( −1 )   ( 4 )

is the direction of the line. The third point lies on this line if and only if there is a
t ∈ R such that

    (  6 )   ( 2 )     (  2 )             6 = 2 + 2t
    (  7 ) = ( 1 ) + t (  3 ) ,  i.e.     7 = 1 + 3t
    ( −6 )   ( 4 )     ( −5 )            −6 = 4 − 5t.
This system of equations has a consistent solution, namely t = 2, so the points are collinear.

(b) In the same way, the three given points lie on the same line if and only if there is a
t ∈ R such that
    ( 1 )   ( 1 )     ( −5 )            1 = 1 − 5t
    ( 1 ) = ( 2 ) + t (  0 ) ,  i.e.    1 = 2 + t · 0  (impossible, as 1 ≠ 2)
    ( 2 )   ( 3 )     ( −2 )            2 = 3 − 2t.
This system of equations has no solution, so the points are not collinear.
In this case the three points determine a plane in R3 . A vector equation is given by
    ( x )   ( 1 )     ( −5 )     ( 0 )
    ( y ) = ( 2 ) + t (  0 ) + s ( 1 ) = p + tv + sw,    s, t ∈ R,
    ( z )   ( 3 )     ( −2 )     ( 1 )
where w = (0, 1, 1)T is the vector from the point (1, 1, 2) to the point (1, 2, 3).
Remember that any point on the plane can be used to locate it, and any non-parallel
vectors in the plane can be used for the directions.
Solution to exercise 2.9
The equation of line ℓ1 is given. The equations of the other two lines are:

    ℓ2:  x = ( 5 )     (  3 )               ℓ3:  x = (  1 )     (  7 )
             ( 7 ) + s (  6 ) ,  s ∈ R;              ( 17 ) + q ( −4 ) ,  q ∈ R.
             ( 4 )     ( −1 )                        (  6 )     ( −3 )
(a) The lines ℓ1 and ℓ3 are not parallel since their direction vectors are not parallel (not
multiples of one another). The two lines intersect if and only if there is a point (x, y, z)
which lies on both lines, that is, if and only if there are scalars t, q ∈ R for which

    x = (  1 )     ( 1 )   (  1 )     (  7 )
        ( −1 ) + t ( 2 ) = ( 17 ) + q ( −4 ) ,
        (  2 )     ( 3 )   (  6 )     ( −3 )

giving the equations

     1 + t = 1 + 7q              t − 7q = 0
    −1 + 2t = 17 − 4q    or     2t + 4q = 18
     2 + 3t = 6 − 3q            3t + 3q = 4.

The top two equations can be solved simultaneously by substituting t = 7q (from the
first equation) into the second equation:

    2t + 4q = 18,  so  2(7q) + 4q = 18,  18q = 18,  q = 1.

Then the first equation yields t = 7q = 7(1) = 7. But although this solution satisfies the
top two equations, it does not satisfy the third equation:

    3t + 3q = 4,  whereas  3(7) + 3(1) = 24 ≠ 4.

Hence these equations have no solution, so no such scalars exist. Since the lines are not
parallel and do not intersect, they are skew.
(b) In the same way, the lines ℓ1 and ℓ2 intersect if there are scalars t, s ∈ R such that

    (  1 )     ( 1 )   ( 5 )     (  3 )             1 + t = 5 + 3s
    ( −1 ) + t ( 2 ) = ( 7 ) + s (  6 ) ,  i.e.    −1 + 2t = 7 + 6s
    (  2 )     ( 3 )   ( 4 )     ( −1 )             2 + 3t = 4 − s.

These equations have the unique solution t = 1, s = −1. (You should check that these
satisfy all three equations.) The point of intersection can be found from either line.
Using ℓ1 and t = 1,

    x = (  1 )     ( 1 )   ( 2 )
        ( −1 ) + 1 ( 2 ) = ( 1 ) .
        (  2 )     ( 3 )   ( 5 )

Check using ℓ2 and s = −1. The angle of intersection of these lines is obtained from the
scalar product of their direction vectors:
    ⟨ ( 1 )   (  3 ) ⟩
    ⟨ ( 2 ) , (  6 ) ⟩ = 3 + 12 − 3 = 12.
    ⟨ ( 3 )   ( −1 ) ⟩

Since this is not zero, the lines are not orthogonal. The angle between them is given by

    cos θ = 12 / ( √(1² + 2² + 3²) √(3² + 6² + (−1)²) ) = 12 / (√14 √46) = 6 / (√7 √23).

Leave the answer in this form as you are not expected to use a calculator.
(c) The lines ℓ2 and ℓ3 intersect if and only if there are scalars s, q ∈ R such that

    ( 5 )     (  3 )   (  1 )     (  7 )             5 + 3s = 1 + 7q
    ( 7 ) + s (  6 ) = ( 17 ) + q ( −4 ) ,  i.e.    7 + 6s = 17 − 4q
    ( 4 )     ( −1 )   (  6 )     ( −3 )             4 − s = 6 − 3q.

These equations have the solution s = 1, q = 1. So the lines do intersect. The point of
intersection is (8, 13, 3). The scalar product of the direction vectors is

    ⟨ (  3 )   (  7 ) ⟩
    ⟨ (  6 ) , ( −4 ) ⟩ = 21 − 24 + 3 = 0.
    ⟨ ( −1 )   ( −3 ) ⟩

Since this is zero, the lines are orthogonal.
Now for the Cartesian equations of the planes. We will illustrate a slightly different
method from the one already shown in this chapter.

(d) We need a normal vector to the plane; that is, a vector n which is orthogonal to the
direction vectors of both lines. Let n = (a, b, c)^T, so that

    ⟨ ( a )   ( 1 ) ⟩                ⟨ ( a )   (  3 ) ⟩
    ⟨ ( b ) , ( 2 ) ⟩ = 0    and    ⟨ ( b ) , (  6 ) ⟩ = 0.
    ⟨ ( c )   ( 3 ) ⟩                ⟨ ( c )   ( −1 ) ⟩

This leads to the simultaneous equations

    a + 2b + 3c = 0
    3a + 6b − c = 0.

Multiplying the first equation by 3 and then subtracting the two equations, we find the
solution c = 0 and a = −2b, where b is any real number. Taking b = 1, a normal vector
is n = (−2, 1, 0)^T, and the Cartesian equation of the plane is of the form −2x + y = d.
Substituting in any point on the plane, such as (1, −1, 2), we find d = −3. So the
Cartesian equation of the plane containing ℓ1 and ℓ2 is

    2x − y = 3.

This is a plane which is parallel to the z-axis. You should check that the point (5, 7, 4)
(on the line ℓ2) and the point of intersection, (2, 1, 5), also satisfy this equation.
(e) Using the same technique, a normal vector to both planes is n = (1, 4, −3)^T. The
Cartesian equation of the plane containing ℓ1 is x + 4y − 3z = −9, and the parallel
plane containing ℓ3 is x + 4y − 3z = 51.
Solution to exercise 2.10
Solve the equations simultaneously to find any points of intersection. The value of t for
the point of intersection of the plane and the first line is given by

    ⟨ ( 1 )     ( 1 )   (  1 ) ⟩
    ⟨ ( 2 ) + t ( 0 ) , ( −1 ) ⟩ = 5.
    ⟨ ( 3 )     ( 5 )   (  1 ) ⟩

Multiply out the scalar product and solve for t to obtain t = 1/2. Substituting this back
into the equation of the line, the point is (3/2, 2, 11/2).
The value of t for the point of intersection of the plane and the second line is given by

    ⟨ ( 1 )     ( 1 )   (  1 ) ⟩
    ⟨ ( 2 ) + t ( 2 ) , ( −1 ) ⟩ = 5.
    ⟨ ( 3 )     ( 1 )   (  1 ) ⟩

Multiplying out the scalar product and simplifying yields the false statement 2 = 5,
which is impossible, hence the line does not intersect the plane. In fact, it is parallel to
the plane: the direction vector of the line is orthogonal to the normal to the plane,

    ⟨ ( 1 )   (  1 ) ⟩
    ⟨ ( 2 ) , ( −1 ) ⟩ = 0.
    ⟨ ( 1 )   (  1 ) ⟩

What would have happened if the line had been in the plane? Try this. Write down a
line in the plane parallel to the one given and see what happens when you try to solve
for t.
Solution to exercise 2.12
This time notice that the normal to the plane is orthogonal to the direction of the line,
as

    ⟨ ( 2 )   (  1 ) ⟩
    ⟨ ( 0 ) , (  4 ) ⟩ = 0.
    ⟨ ( 1 )   ( −2 ) ⟩

Since the point (1, 3, 1) on the line does not satisfy the equation of the plane, the line is
parallel to the plane. Therefore it makes sense to ask for the distance of the line from
the plane. This can be found by dropping a perpendicular from the line to the plane
and measuring its length. A method for doing this is given in the question.
The line through (1, 3, 1) and parallel to the normal vector of the plane is perpendicular
to the plane. A vector equation of this line is

    ( x )   ( 1 )     ( 2 )
    ( y ) = ( 3 ) + t ( 0 ) ,    t ∈ R.
    ( z )   ( 1 )     ( 1 )

Equating components, we have x = 1 + 2t, y = 3, and z = 1 + t. At the point of
intersection of the line with the plane, these components will satisfy the equation of the
plane, so that

    2x + z = 8,   2(1 + 2t) + (1 + t) = 8,   3 + 5t = 8,

or t = 1. Then putting this value for t in the equation of the line, we find the point of
intersection is (3, 3, 2).
The distance between the line and the plane is the distance between this point and the
point (1, 3, 1), which is given by the length of the vector

    v = ( 3 )   ( 1 )   ( 2 )
        ( 3 ) − ( 3 ) = ( 0 ) ,    so the distance is √5.
        ( 2 )   ( 1 )   ( 1 )
Chapter 3
Systems of linear equations
Suggested reading
If A is the m × n coefficient matrix and x is the column vector of the unknowns, then

    Ax = ( a11  a12  . . .  a1n ) ( x1 )   ( a11 x1 + a12 x2 + · · · + a1n xn )
         (  :    :          :   ) (  : ) = (                :                 )
         ( am1  am2  . . .  amn ) ( xn )   ( am1 x1 + am2 x2 + · · · + amn xn )

is an m × 1 matrix
whose entries are the left-hand sides of our system of linear equations.
If we define another column vector b, whose m components are the right-hand sides bi ,
the system is equivalent to the matrix equation
Ax = b.
Example 3.1  Consider the following system of three linear equations in the three
unknowns x1, x2, x3:

    x1 + x2 + x3 = 3
    2x1 + x2 + x3 = 4
    x1 − x2 + 2x3 = 5.

This system can be written in matrix form as Ax = b with

    A = ( 1  1  1 )         ( x1 )         ( 3 )
        ( 2  1  1 ) ,   x = ( x2 ) ,   b = ( 4 ) .
        ( 1 −1  2 )         ( x3 )         ( 5 )

The entries of the matrix A are the coefficients of the unknowns. Carrying out the
multiplication Ax,

    ( 1  1  1 ) ( x1 )   ( x1 + x2 + x3  )
    ( 2  1  1 ) ( x2 ) = ( 2x1 + x2 + x3 )
    ( 1 −1  2 ) ( x3 )   ( x1 − x2 + 2x3 )

so the matrix equation Ax = b states that

    ( x1 + x2 + x3  )   ( 3 )
    ( 2x1 + x2 + x3 ) = ( 4 )
    ( x1 − x2 + 2x3 )   ( 5 )

and these two 3 × 1 matrices are equal if and only if their components are equal.
This gives precisely the three linear equations.
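The matrix form of Example 3.1 can be checked by carrying out the multiplication Ax in a few lines of Python. This is an illustrative sketch; x here is the solution this chapter finds later by Gaussian elimination:

```python
A = [[1, 1, 1],
     [2, 1, 1],
     [1, -1, 2]]
b = [3, 4, 5]
x = [1, 0, 2]   # the solution obtained below by Gaussian elimination

Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(Ax)       # [3, 4, 5]
print(Ax == b)  # True
```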
For example, with the same system as above expressed in matrix form, Ax = b, the
matrix
    (A|b) = ( 1  1  1 | 3 )
            ( 2  1  1 | 4 )
            ( 1 −1  2 | 5 )
which is the coefficient matrix A together with the constants b as the last column,
contains all the information we need to use, and instead of manipulating the equations,
we can instead manipulate the rows of this matrix. For example, subtracting twice
equation 1 from equation 2 is executed by taking twice row 1 from row 2.
These observations form the motivation behind a method to solve systems of linear
equations, known as Gaussian elimination. To solve a linear system Ax = b we first
form the augmented matrix, denoted (A|b) which is A with column b tagged on.
If

    A = ( a11  a12  · · ·  a1n )         ( x1 )         ( b1 )
        ( a21  a22  · · ·  a2n )     x = ( x2 )     b = ( b2 )
        (  :    :           :  )         (  :  )        (  :  )
        ( am1  am2  · · ·  amn )         ( xn )         ( bm )

then the matrix

    (A|b) = ( a11  a12  · · ·  a1n | b1 )
            ( a21  a22  · · ·  a2n | b2 )
            (  :    :           :  |  : )
            ( am1  am2  · · ·  amn | bm )
From the operations listed above for manipulating the equations of the linear system,
we define corresponding operations on the rows of the augmented matrix.
Definition 3.3 (Elementary row operations) These are:
RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.
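The three row operations translate directly into code. This sketch (an illustration only) represents an augmented matrix as a list of rows and applies the operation R2 − 2R1 from the running example:

```python
def scale(M, i, c):
    """RO1: multiply row i by a non-zero constant c."""
    assert c != 0
    M[i] = [c * a for a in M[i]]

def interchange(M, i, j):
    """RO2: interchange rows i and j."""
    M[i], M[j] = M[j], M[i]

def add_multiple(M, i, j, c):
    """RO3: replace row i by row i + c * (row j)."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

M = [[1, 1, 1, 3],
     [2, 1, 1, 4],
     [1, -1, 2, 5]]

add_multiple(M, 1, 0, -2)   # R2 - 2R1, as in the text
print(M[1])  # [0, -1, -1, -2]
```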
    (A|b) = ( 1  1  1 | 3 )            (B|b) = ( 0  0  2 | 3 )
            ( 2  1  1 | 4 ) ,                  ( 0  2  3 | 4 ) .
            ( 1 −1  2 | 5 )                    ( 0  0  1 | 5 )
3.3.1
Using the above two examples, we will carry out the algorithm in detail.
(1) Find the leftmost column that is not all zeros.

    ( 1  1  1 | 3 )            ( 0  0  2 | 3 )
    ( 2  1  1 | 4 )            ( 0  2  3 | 4 )
    ( 1 −1  2 | 5 )            ( 0  0  1 | 5 )

For the matrix on the left this is column 1; for the matrix on the right it is column 2.

(2) Get a non-zero entry at the top of this column, interchanging two rows if necessary.

The matrix on the left already has a non-zero entry at the top of column 1. For the
matrix on the right, we interchange rows 1 and 2:

    ( 1  1  1 | 3 )            ( 0  2  3 | 4 )
    ( 2  1  1 | 4 )            ( 0  0  2 | 3 )
    ( 1 −1  2 | 5 )            ( 0  0  1 | 5 )

(3) Make this entry 1; multiply the first row by a suitable number or interchange two
rows. This is called a leading one.

The left-hand matrix already has a leading one. For the matrix on the right, we
multiply row 1 by 1/2:

    ( 1  1  1 | 3 )            ( 0  1  3/2 | 2 )
    ( 2  1  1 | 4 )            ( 0  0   2  | 3 )
    ( 1 −1  2 | 5 )            ( 0  0   1  | 5 )

(4) Add suitable multiples of the top row to the rows below to make all entries below the
leading one become zero.

For the matrix on the left, we add −2 times row 1 to row 2, then we add −1 times row 1
to row 3. The first operation is the same as the one we performed earlier on the example
using the equations. The matrix on the right already has zeros under the leading one.

    ( 1  1  1 |  3 )           ( 0  1  3/2 | 2 )
    ( 0 −1 −1 | −2 )           ( 0  0   2  | 3 )
    ( 0 −2  1 |  2 )           ( 0  0   1  | 5 )

At any stage we can read the modified system of equations from the new augmented
matrix, remembering that column 1 gives the coefficients of x1, column 2 the coefficients
of x2 and so on, and that the last column represents the right-hand sides of the equations.
For example, the matrix on the left is now the augmented matrix of the system

    x1 + x2 + x3 = 3
        −x2 − x3 = −2
       −2x2 + x3 = 2.
The next step in the algorithm is
(5) Cover up the top row and apply steps 1 to 4 again.
This time we will work on one matrix at a time. After the first four steps, we have
altered the augmented matrix (A|b) to:

    (A|b) → ( 1  1  1 |  3 )
            ( 0 −1 −1 | −2 )
            ( 0 −2  1 |  2 )

We now ignore the top row. Then the leftmost column which is not all zeros is column
2. This column already has a non-zero entry at the top. We make it into a leading one
by multiplying row 2 by −1:

    ( 1  1  1 | 3 )
    ( 0  1  1 | 2 )
    ( 0 −2  1 | 2 )

This is now a leading one, and we use it to obtain a zero below it, adding 2 times row 2 to
row 3:

    ( 1  1  1 | 3 )
    ( 0  1  1 | 2 )
    ( 0  0  3 | 6 )

Now we cover up the top two rows and start again with steps 1 to 4. The leftmost
column which is not all zeros is column 3. We multiply row 3 by 1/3 to obtain the final
leading one:

    ( 1  1  1 | 3 )
    ( 0  1  1 | 2 ) .
    ( 0  0  1 | 2 )
Definition 3.4 (Row echelon form) A matrix is said to be in echelon form (or row
echelon form) if it has the following three properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
Activity 3.1  Check that the above matrix satisfies these three properties.
The term echelon form takes its name from the form of the equations at this stage.
Reading from the matrix, these equations are
x1 + x2 + x3 = 3
x2 + x3 = 2
x3 = 2
We could now use a method called back substitution to find the solution of the system.
The last equation tells us that x3 = 2. We can then substitute this into the second
equation to obtain x2 , and then use these two values to obtain x1 . This is an acceptable
approach, but we can effectively do the same calculations by continuing with row
operations. So we continue with one final step of our algorithm.
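The back substitution just described takes only a few lines; a quick Python sketch of the method:

```python
# Equations read from the row echelon form:
#   x1 + x2 + x3 = 3,   x2 + x3 = 2,   x3 = 2.
x3 = 2
x2 = 2 - x3          # substitute x3 into the second equation
x1 = 3 - x2 - x3     # substitute x2 and x3 into the first equation
print((x1, x2, x3))  # (1, 0, 2)
```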
(6) Begin with the last row and add suitable multiples of it to each row above to get zeros
above the leading ones.

Continuing from the row echelon form and using row 3, we replace row 2 with row
2 − row 3, and at the same time we replace row 1 with row 1 − row 3:

    (A|b) → ( 1  1  1 | 3 ) → ( 1  1  0 | 1 )
            ( 0  1  1 | 2 )   ( 0  1  0 | 0 )
            ( 0  0  1 | 2 )   ( 0  0  1 | 2 )

We now have zeros above the leading one in column 3. There is only one more step to
do, and that is to get a zero above the leading one in column 2. So the final step is row
1 − row 2:

    ( 1  0  0 | 1 )
    ( 0  1  0 | 0 ) .
    ( 0  0  1 | 2 )
This final matrix is now in reduced (row) echelon form. It has the additional property
that every column with a leading one has zeros elsewhere.
Definition 3.5 (Reduced row echelon form) A matrix is said to be in reduced
echelon form (or reduced row echelon form) if it has the following four properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
(4) Every column with a leading one has zeros elsewhere.
If R is the reduced row echelon form of a matrix M, we will sometimes write
R = RREF (M).
The solution can now be read from the matrix. The top row says x1 = 1, the second row
says x2 = 0, and the third row says x3 = 2. The original system has been reduced to the
matrix equation

    ( 1 0 0 ) ( x1 )   ( 1 )
    ( 0 1 0 ) ( x2 ) = ( 0 )
    ( 0 0 1 ) ( x3 )   ( 2 )

giving the solution

    ( x1 )   ( 1 )
    ( x2 ) = ( 0 ) .
    ( x3 )   ( 2 )

We can check that this solution is the correct solution of the original system by
substituting it into the equations, or equivalently, by multiplying out the matrices Ax
to show that Ax = b.
Activity 3.2  Do this: check that

    ( 1  1  1 ) ( 1 )   ( 3 )
    ( 2  1  1 ) ( 0 ) = ( 4 ) .
    ( 1 −1  2 ) ( 2 )   ( 5 )
We now return to the example (B|b), which we left after the first round of steps 1 to 4,
and apply step 5. We cover up the top row and apply steps 1 to 4 again. We need to
have a leading one in the second row, which we achieve by switching row 2 and row 3:

    (B|b) → ( 0  1  3/2 | 2 ) → ( 0  1  3/2 | 2 )
            ( 0  0   2  | 3 )   ( 0  0   1  | 5 )
            ( 0  0   1  | 5 )   ( 0  0   2  | 3 )

We obtain a zero under this leading one by replacing row 3 with row 3 + (−2) times
row 2, and then multiply the new row 3 by −1/7:

    ( 0  1  3/2 |  2 )        ( 0  1  3/2 | 2 )
    ( 0  0   1  |  5 )   →    ( 0  0   1  | 5 )
    ( 0  0   0  | −7 )        ( 0  0   0  | 1 )

This matrix is now in row echelon form, but we shall see that there is no point in going
on to reduced row echelon form. This last matrix is equivalent to the system

    ( 0  1  3/2 ) ( x1 )   ( 2 )
    ( 0  0   1  ) ( x2 ) = ( 5 )
    ( 0  0   0  ) ( x3 )   ( 1 )

What is the bottom equation of this system? Row 3 says 0x1 + 0x2 + 0x3 = 1, that is,
0 = 1, which is impossible! This system has no solution.
3.3.2
If the row echelon form (REF) of the augmented matrix (A|b) contains a row
(0 0 · · · 0 | 1), then the system is inconsistent.
It is instructive to look at the original systems represented by these augmented
matrices,

    (A|b) = ( 1  1  1 | 3 )            (B|b) = ( 0  0  2 | 3 )
            ( 2  1  1 | 4 )                    ( 0  2  3 | 4 )
            ( 1 −1  2 | 5 )                    ( 0  0  1 | 5 )

that is, the systems

    x1 + x2 + x3 = 3                 2x3 = 3
    2x1 + x2 + x3 = 4          2x2 + 3x3 = 4
    x1 − x2 + 2x3 = 5                  x3 = 5.

We see immediately that the system Bx = b is inconsistent, since it is not possible for
both the top and the bottom equation to hold.
Since these are systems of three equations in three variables, we can interpret these
results geometrically. Each of the equations above represents a plane in R3 . The system
Ax = b represents three planes which intersect in the point (1, 0, 2). This is the only
point which lies on all three planes. The system Bx = b represents three planes, two of
which are parallel (the horizontal planes 2x3 = 3 and x3 = 5), so there is no point which
lies on all three planes.
This method of reducing the augmented matrix to reduced row echelon form is known
as Gaussian elimination or Gauss-Jordan elimination.
We have been very careful in illustrating this method to explain what the row
operations were for each step of the algorithm, but in solving a system with this method
it is not necessary to include all this detail. The aim is to use row operations to put the
augmented matrix into reduced row echelon form, and then read off the solutions from
this form. Where it is useful to indicate the operations, you can do so by writing, for
example, R2 − 2R1, where we always write down the row we are replacing first, so that
R2 − 2R1 indicates: replace row 2 (R2) with row 2 minus 2 times row 1 (R2 − 2R1).
Otherwise, you can just write down the sequence of matrices linked by arrows. It is
important to realise that once you have performed a row operation on a matrix, the
new matrix obtained is not equal to the previous one, this is why you must use arrows
between the steps and not equal signs.
Example 3.2  We repeat the reduction of (A|b) to illustrate this for the system

    x1 + x2 + x3 = 3
    2x1 + x2 + x3 = 4
    x1 − x2 + 2x3 = 5.

Begin by writing down the augmented matrix, then apply the row operations to
carry out the algorithm. Here we will indicate the row operations.

    (A|b) = ( 1  1  1 | 3 )
            ( 2  1  1 | 4 )
            ( 1 −1  2 | 5 )

    R2 − 2R1    ( 1  1  1 |  3 )
    R3 − R1     ( 0 −1 −1 | −2 )
                ( 0 −2  1 |  2 )

    (−1)R2      ( 1  1  1 | 3 )
                ( 0  1  1 | 2 )
                ( 0 −2  1 | 2 )

    R3 + 2R2    ( 1  1  1 | 3 )
                ( 0  1  1 | 2 )
                ( 0  0  3 | 6 )

    (1/3)R3     ( 1  1  1 | 3 )
                ( 0  1  1 | 2 ) .
                ( 0  0  1 | 2 )
The matrix is now in row echelon form; continue to reduced row echelon form:

    R1 − R3     ( 1  1  0 | 1 )
    R2 − R3     ( 0  1  0 | 0 )
                ( 0  0  1 | 2 )

    R1 − R2     ( 1  0  0 | 1 )
                ( 0  1  0 | 0 ) .
                ( 0  0  1 | 2 )

The augmented matrix is now in reduced row echelon form.
Activity 3.4 Use Gaussian elimination to solve the following system of equations,
x1 + x2 + x3 = 6
2x1 + 4x2 + x3 = 5
2x1 + 3x2 + x3 = 6.
Be sure to follow the algorithm to put the augmented matrix into reduced row
echelon form using row operations.
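The whole algorithm, steps (1) to (6), can be collected into a single routine. The following is a generic Python sketch (one possible implementation, not the guide's own) using exact fractions so that no rounding occurs; applied to the system of Activity 3.4, it should reproduce the reduced row echelon form you obtain by hand:

```python
from fractions import Fraction

def rref(M):
    """Reduce a matrix, given as a list of rows, to reduced row echelon form."""
    M = [[Fraction(a) for a in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:                          # column has no usable entry
            continue
        M[r], M[pivot] = M[pivot], M[r]            # RO2: interchange rows
        M[r] = [a / M[r][c] for a in M[r]]         # RO1: create the leading one
        for i in range(rows):                      # RO3: zeros above and below
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

result = rref([[1, 1, 1, 6],
               [2, 4, 1, 5],
               [2, 3, 1, 6]])
print(result == [[1, 0, 0, 2], [0, 1, 0, -1], [0, 0, 1, 5]])  # True
```

The routine clears each pivot column above and below as it goes, which arrives at the same reduced form as doing the forward pass first and step (6) afterwards.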
3.3.3
Gaussian elimination can be used to solve systems of linear equations with any number
of equations and unknowns. We will now look at an example of a linear system with
four equations in five unknowns:

    x1 + x2 + x3 + x4 + x5 = 3
    2x1 + x2 + x3 + x4 + 2x5 = 4
    x1 − x2 − x3 + x4 + x5 = 5
    x1 + x4 + x5 = 4.

The augmented matrix is

    (A|b) = ( 1  1  1  1  1 | 3 )
            ( 2  1  1  1  2 | 4 )
            ( 1 −1 −1  1  1 | 5 )
            ( 1  0  0  1  1 | 4 )

Check that your augmented matrix is correct before you proceed, or you could be
solving the wrong system! A good method is to first write down the coefficients by rows,
so reading across the equations, and then to check the columns do correspond to the
coefficients of that variable. Now follow the algorithm to put (A|b) into reduced row
echelon form.

    R2 − 2R1    ( 1  1  1  1  1 |  3 )
    R3 − R1     ( 0 −1 −1 −1  0 | −2 )
    R4 − R1     ( 0 −2 −2  0  0 |  2 )
                ( 0 −1 −1  0  0 |  1 )
    (−1)R2      ( 1  1  1  1  1 |  3 )
                ( 0  1  1  1  0 |  2 )
                ( 0 −2 −2  0  0 |  2 )
                ( 0 −1 −1  0  0 |  1 )

    R3 + 2R2    ( 1  1  1  1  1 | 3 )
    R4 + R2     ( 0  1  1  1  0 | 2 )
                ( 0  0  0  2  0 | 6 )
                ( 0  0  0  1  0 | 3 )

    (1/2)R3     ( 1  1  1  1  1 | 3 )
                ( 0  1  1  1  0 | 2 )
                ( 0  0  0  1  0 | 3 )
                ( 0  0  0  1  0 | 3 )

    R4 − R3     ( 1  1  1  1  1 | 3 )
                ( 0  1  1  1  0 | 2 )
                ( 0  0  0  1  0 | 3 )
                ( 0  0  0  0  0 | 0 )

The matrix is now in row echelon form. Continuing to reduced row echelon form,

    R1 − R3     ( 1  1  1  0  1 |  0 )
    R2 − R3     ( 0  1  1  0  0 | −1 )
                ( 0  0  0  1  0 |  3 )
                ( 0  0  0  0  0 |  0 )

    R1 − R2     ( 1  0  0  0  1 |  1 )
                ( 0  1  1  0  0 | −1 )
                ( 0  0  0  1  0 |  3 ) .
                ( 0  0  0  0  0 |  0 )
There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. Since the last row gives no information, but merely states
that 0 = 0, the matrix is equivalent to the system of equations
    x1 + 0 + 0 + 0 + x5 = 1
         x2 + x3 + 0 + 0 = −1
                  x4 + 0 = 3.
The form of these equations tells us that we can assign any values to x3 and x5 , and
then the values of x1 , x2 and x4 will be determined.
Definition 3.7 (Leading variables) The variables corresponding to the columns with
leading ones in the reduced row echelon form of an augmented matrix are called leading
variables. The other variables are called non-leading variables.
In this example the variables x1 , x2 and x4 are leading variables, x3 and x5 are
non-leading variables. We assign x3 , x5 the arbitrary values s, t, where s, t represent any
real numbers, and then solve for the leading variables in terms of these. We get

    x4 = 3,    x2 = −1 − s,    x1 = 1 − t,

so that

    x = ( x1 )   ( 1 − t  )   (  1 )     (  0 )     ( −1 )
        ( x2 )   ( −1 − s )   ( −1 )     ( −1 )     (  0 )
        ( x3 ) = (    s   ) = (  0 ) + s (  1 ) + t (  0 ) .
        ( x4 )   (    3   )   (  3 )     (  0 )     (  0 )
        ( x5 )   (    t   )   (  0 )     (  0 )     (  1 )
Observe that there are infinitely many solutions, because any values of s R and t R
will give a solution.
The solution given above is called a general solution of the system, because it gives a
solution for any values of s and t. For any particular assignment of values to s and t,
such as s = 0, t = 1, we obtain a particular solution of the system.
Activity 3.5  Let s = 0 and t = 0 and show (by substituting it into the equation)
that x0 = (1, −1, 0, 3, 0)^T is a solution of Ax = b. Then let s = 1 and t = 2 and show
that the new vector x1 you obtain is also a solution.
With practice, you will be able to read the general solution directly from the reduced
row echelon form of the augmented matrix. We have
\[
(A|b) \longrightarrow
\left(\begin{array}{ccccc|c} 1&0&0&0&1&1\\ 0&1&1&0&0&-1\\ 0&0&0&1&0&3\\ 0&0&0&0&0&0 \end{array}\right).
\]
Locate the leading ones, and note which are the leading variables. Then locate the
non-leading variables and assign each an arbitrary parameter. So, as above, we note
that the leading ones correspond to x1 , x2 and x4 and we assign arbitrary parameters to
the non-leading variables; that is, values such as x3 = s and x5 = t where s and t
represent any real number. Then write down the vector x = (x1 , x2 , x3 , x4 , x5 )T (as a
column) and fill in the values starting with x5 and working up. We have x5 = t. Then
the third row tells us that x4 = 3. We have x3 = s. Now look at the second row, which
says x2 + x3 = −1, or x2 = −1 − s. Then the top row tells us that x1 + x5 = 1, so
x1 = 1 − t. In this way we obtain the solution in vector form.
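If you have access to Python, you can confirm a reduced row echelon form with the sympy library. This is only a checking aid (sympy is not part of the guide's method); note that sympy numbers columns from 0, so the pivot columns (0, 1, 3) correspond to x1, x2 and x4.

```python
# Check the reduced row echelon form of (A|b) with sympy.
from sympy import Matrix

A = Matrix([[1, 1, 1, 1, 1],
            [2, 1, 1, 1, 2],
            [1, -1, -1, 1, 1],
            [1, 0, 0, 1, 1]])
b = Matrix([3, 4, 5, 4])

rref, pivots = Matrix.hstack(A, b).rref()
print(pivots)   # columns of the leading ones
print(rref)     # the reduced row echelon form of the augmented matrix
```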
Activity 3.6 Write down the system of three linear equations in three unknowns
represented by the matrix equation Ax = b, where
\[
A = \begin{pmatrix} 1&2&1\\ 2&2&0\\ 3&4&1 \end{pmatrix},\qquad
x = \begin{pmatrix} x\\ y\\ z \end{pmatrix},\qquad
b = \begin{pmatrix} 3\\ 2\\ 5 \end{pmatrix}.
\]
Use Gaussian elimination to solve the system. Express your solution in vector form.
If each equation represents the Cartesian equation of a plane in R3 , describe the
intersection of these three planes.
3.3.4
Solution sets
We have seen systems of linear equations which have a unique solution, no solution and
infinitely many solutions. It turns out that these are the only possibilities.
For suppose we have a linear system Ax = b which has two distinct solutions, p and q.
Thinking of these vector solutions as determining points in Rn , then we will show that
every point on the line through p and q is also a solution. Therefore, as soon as there is
more than one solution, there must be infinitely many.
To prove this claim, let p and q be vectors such that Ap = b and Aq = b, p ≠ q. The
equation of the line through p and q is
\[
v = p + t(q - p),\qquad t \in \mathbb{R}.
\]
Then for any vector v on the line we have Av = A(p + t(q − p)). Using the
distributive laws,
\[
Av = Ap + tA(q - p) = Ap + t(Aq - Ap) = b + t(b - b) = b.
\]
Therefore v is also a solution for any t ∈ ℝ, so there are infinitely many of them.
Notice that in this proof, the vector w = q − p satisfies the equation Ax = 0. This
leads us to our next topic.
3.4.1 Homogeneous systems
A system of linear equations Ax = b is homogeneous if b = 0, that is, if it is of the
form Ax = 0. A homogeneous system is always consistent, since x = 0 is always a
solution (the trivial solution).
Consider, for example, the homogeneous system Ax = 0 with coefficient matrix
\[
A = \begin{pmatrix} 1&1&3&1\\ 1&-1&1&1\\ 0&1&2&2 \end{pmatrix}.
\]
Reducing A,
\[
\begin{pmatrix} 1&1&3&1\\ 1&-1&1&1\\ 0&1&2&2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1&1&3&1\\ 0&-2&-2&0\\ 0&1&2&2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1&1&3&1\\ 0&1&1&0\\ 0&1&2&2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1&1&3&1\\ 0&1&1&0\\ 0&0&1&2 \end{pmatrix}
\]
\[
\longrightarrow
\begin{pmatrix} 1&1&0&-5\\ 0&1&0&-2\\ 0&0&1&2 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1&0&0&-3\\ 0&1&0&-2\\ 0&0&1&2 \end{pmatrix}.
\]
Activity 3.7 Work through the above calculation and state what row operation is
being done at each stage. For example, the first operation is R2 − R1.
Then write down the solution from the reduced row echelon form of the matrix.
The solution is
\[
x = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix},\qquad t\in\mathbb{R},
\]
which is a line through the origin, x = tv, with v = (3, 2, −2, 1)^T. There are infinitely
many solutions, one for every t ∈ ℝ.
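The set of solutions of a homogeneous system can also be checked by computer. The sketch below uses sympy (an optional aid, not part of the syllabus), whose `nullspace` method returns a basis for the solutions of Ax = 0:

```python
# The solutions of Ax = 0; sympy returns a basis for this set.
from sympy import Matrix

A = Matrix([[1, 1, 3, 1],
            [1, -1, 1, 1],
            [0, 1, 2, 2]])

basis = A.nullspace()
print(basis[0].T)   # one basis vector; every solution is a multiple of it
```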
Consider now the system of equations
x + y + 3z + w = 2
x − y + z + w = 4
y + 2z + 2w = 0,
which is consistent and will have infinitely many solutions. Notice that the coefficient
matrix of this linear system is the same matrix A as that used in the previous
example of a homogeneous system.
The augmented matrix is
\[
(A|b) = \left(\begin{array}{cccc|c} 1&1&3&1&2\\ 1&-1&1&1&4\\ 0&1&2&2&0 \end{array}\right).
\]
Activity 3.8 Show that the reduced row echelon form of this matrix is
\[
\left(\begin{array}{cccc|c} 1&0&0&-3&1\\ 0&1&0&-2&-2\\ 0&0&1&2&1 \end{array}\right).
\]
Then write down the solution.
The solution
\[
x = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix} = p + tv,\qquad t\in\mathbb{R},
\]
is a line which does not go through the origin. It is parallel to the line of solutions of
the homogeneous system, Ax = 0, and goes through the point determined by p.
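The relationship between the two solution sets can be checked numerically. This sketch (using sympy as an optional aid) verifies that p + tv solves Ax = b for several values of t, while tv alone solves Ax = 0:

```python
# p + t*v solves Ax = b for every t; t*v solves the homogeneous system Ax = 0.
from sympy import Matrix

A = Matrix([[1, 1, 3, 1],
            [1, -1, 1, 1],
            [0, 1, 2, 2]])
b = Matrix([2, 4, 0])

p = Matrix([1, -2, 1, 0])   # a particular solution
v = Matrix([3, 2, -2, 1])   # direction of the solution line

for t in range(-2, 3):
    assert A * (p + t * v) == b                   # solves Ax = b
    assert A * (t * v) == Matrix([0, 0, 0])       # solves Ax = 0
print("checked for t = -2, ..., 2")
```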
This should come as no surprise, since the coefficient matrix forms the first four
columns of the augmented matrix. Compare the solution sets:
Ax = 0:
\[
RREF(A) = \begin{pmatrix} 1&0&0&-3\\ 0&1&0&-2\\ 0&0&1&2 \end{pmatrix},\qquad
x = t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix};
\]
Ax = b:
\[
RREF(A|b) = \left(\begin{array}{cccc|c} 1&0&0&-3&1\\ 0&1&0&-2&-2\\ 0&0&1&2&1 \end{array}\right),\qquad
x = \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix}.
\]
The reduced row echelon form of the augmented matrix of a system Ax = b will always
contain the information for the solution of Ax = 0, since the matrix A is the first part
of (A|b). We therefore have the following definition.
Definition (Associated homogeneous system) For any linear system Ax = b, the
homogeneous system Ax = 0 with the same coefficient matrix A is called the
associated homogeneous system.
Activity 3.10 Suppose that the reduced row echelon form of the coefficient matrix A
of a linear system Ax = b is
\[
\begin{pmatrix} 1&0&0&-3\\ 0&1&0&-2\\ 0&0&1&2 \end{pmatrix}.
\]
Explain why you can tell from this matrix that for all b ∈ ℝ³, the linear system
Ax = b is consistent with infinitely many solutions.
3.4.2
Null space
It is clear from what we have just seen that the general solution to a consistent linear
system Ax = b involves solutions to the system Ax = 0. This set of solutions is given a
special name: the null space or kernel of a matrix A. This null space, denoted N(A), is
the set of all solutions x to Ax = 0, where 0 is the zero vector. That is,
Definition 3.10 (Null space) For an m × n matrix A, the null space of A is the
subset of ℝⁿ given by
\[
N(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \},
\]
where 0 = (0, 0, . . . , 0)^T is the zero vector of ℝᵐ.
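A null space can be computed directly. The sketch below (again using sympy as an optional checking aid) computes N(A) for the coefficient matrix of Activity 3.6 and confirms that the basis vector really does satisfy Ax = 0:

```python
# N(A) = {x : Ax = 0}, computed for the coefficient matrix of Activity 3.6.
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 2, 0],
            [3, 4, 1]])

basis = A.nullspace()
v = basis[0]
print(v.T)        # a basis vector of N(A)
print(A * v)      # check: Av = 0
```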
We now formalise the connection between the solution set of a consistent linear system,
and the null space of the coefficient matrix of the system.
Theorem 3.2 Suppose that A is an m × n matrix, that b ∈ ℝᵐ, and that the system
Ax = b is consistent. Suppose that p is any solution of Ax = b. Then the set of all
solutions of Ax = b consists of the vectors p + z, where z is any vector in N(A).
Proof
If z ∈ N(A), then
\[
A(p + z) = Ap + Az = b + 0 = b,
\]
so p + z is a solution of Ax = b. Conversely, if q is any solution, then
A(q − p) = Aq − Ap = b − b = 0, so z = q − p is in N(A) and q = p + z.
In light of this result, let's have another look at some of the examples we worked earlier.
In the previous section (page 79) we observed that the solutions of
x + y + 3z + w = 2
x − y + z + w = 4
y + 2z + 2w = 0
are of the form
\[
x = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= \begin{pmatrix} 1\\ -2\\ 1\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 3\\ 2\\ -2\\ 1 \end{pmatrix} = p + tv,\qquad t\in\mathbb{R},
\]
where p = (1, −2, 1, 0)^T is a particular solution and the vectors tv are exactly the
solutions of the associated homogeneous system Ax = 0.
Also recall the two examples we worked earlier,
x + y + z = 3           2z = 3
2x + y + z = 4          2y + 3z = 4
x − y + 2z = 5          z = 5
for which
\[
(A|b) \longrightarrow \left(\begin{array}{ccc|c} 1&0&0&1\\ 0&1&0&0\\ 0&0&1&2 \end{array}\right),\qquad
(B|b) \longrightarrow \left(\begin{array}{ccc|c} 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{array}\right).
\]
The first system, Ax = b, has a unique solution, p = (1, 0, 2)T , and the second system,
Bx = b, is inconsistent.
The reduced row echelon form of the matrix A is the identity matrix (these are the first
three columns of the augmented matrix). Therefore the homogeneous system Ax = 0
will only have the trivial solution. The unique solution of Ax = b is of the form
x = p + 0, which conforms with the Principle of Linearity.
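The pairing of a unique solution with a trivial null space can be seen in a short computation. The sketch below (sympy again, as an optional aid) solves the first system and shows that its null space is empty, so the only solution of Ax = 0 is the trivial one:

```python
# A unique solution of Ax = b goes with a trivial null space N(A) = {0}.
from sympy import Matrix

A = Matrix([[1, 1, 1],
            [2, 1, 1],
            [1, -1, 2]])
b = Matrix([3, 4, 5])

p = A.LUsolve(b)       # the unique solution
print(p.T)             # (1, 0, 2)
print(A.nullspace())   # empty list: Ax = 0 has only the trivial solution
```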
This principle does not apply to the inconsistent system Bx = b; however, the
associated homogeneous system is consistent. Notice that the homogeneous system is
2z = 0
2y + 3z = 0
z = 0,
which represents the intersection of two planes, since the equations 2z = 0 and z = 0
each represent the xy-plane. To find the solution, we continue to reduce the matrix B to
reduced row echelon form.
\[
B \longrightarrow \begin{pmatrix} 0&1&\tfrac32\\ 0&0&1\\ 0&0&0 \end{pmatrix}
\longrightarrow \begin{pmatrix} 0&1&0\\ 0&0&1\\ 0&0&0 \end{pmatrix}.
\]
Setting x = t, the solutions are
\[
x = \begin{pmatrix} t\\ 0\\ 0 \end{pmatrix}
= t\begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix},\qquad t\in\mathbb{R},
\]
which is a line through the origin, namely the x-axis. So the plane 2y + 3z = 0
intersects the xy-plane along the x-axis.
We summarise what we have noticed so far:
If Ax = b is consistent, the solutions are of the form x = p + z where p is any one
particular solution and z ∈ N(A), the null space of A.
(1) If Ax = b has a unique solution then Ax = 0 has only the trivial solution.
(2) If Ax = b has infinitely many solutions then Ax = 0 has infinitely many
solutions.
Ax = b may be inconsistent, but Ax = 0 is always consistent.
Activity 3.11
Look at the example we solved in section 3.3.3 on page 74:
x1 + x2 + x3 + x4 + x5 = 3
2x1 + x2 + x3 + x4 + 2x5 = 4
x1 − x2 − x3 + x4 + x5 = 5
x1 + x4 + x5 = 4.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
express a system of linear equations in matrix form as Ax = b and know what is
meant by the coefficient matrix and the augmented matrix
put a matrix into reduced row echelon form using row operations and following the
algorithm
recognise consistent and inconsistent systems of equations
solve a system of m linear equations in n unknowns using Gaussian elimination
express the solution in vector form
interpret systems with three unknowns as intersections of planes in R3
say what is meant by a homogeneous system of equations and what is meant by the
associated homogeneous system of any linear system of equations
state and explain the Principle of Linearity
say what is meant by the null space of a matrix.
Exercises
Exercise 3.1
Write down the augmented matrix for each of the following systems of equations, and
use it to solve the system by reducing the augmented matrix to reduced row echelon
form.
(i) x + y + z = 2
    2y + z = 0
    −x + y − z = −4
(ii) x + y + 2z = 2
    2y + z = 0
    −x + y − z = 0
(iii) x + y + 2z = 2
    2y + z = 0
    −x + y − z = −2
(iv) 3x − y + z = 0
    2x + 3y + 2z = 0
    x + 2y + 3z = 0
and x1 − x2 − x3 = 1.
Exercise 3.4
Find the reduced row echelon form of the matrix
\[
C = \begin{pmatrix} 2&-1&1&0\\ 1&-1&0&3\\ -6&2&-4&-6 \end{pmatrix}.
\]
For example, the augmented matrix below can be put into reduced row echelon form in
five steps:
\[
\left(\begin{array}{ccc|c} 1&1&1&6\\ 2&4&1&5\\ 2&3&1&6 \end{array}\right)
\;\stackrel{(1)}{\longrightarrow}\;\stackrel{(2)}{\longrightarrow}\;\stackrel{(3)}{\longrightarrow}\;\stackrel{(4)}{\longrightarrow}\;
\left(\begin{array}{ccc|c} 1&0&0&2\\ 0&1&0&-1\\ 0&0&1&5 \end{array}\right),
\]
from which you can read the solution, x = (2, −1, 5)^T. We will state the row operations
at each stage. To obtain (1), do R2 − 2R1 and R3 − 2R1; for (2) switch R2 and R3; for
(3) do R3 − 2R2. The augmented matrix is now in row echelon form, so starting with
the bottom row, for (4) do R2 + R3 and R1 − R3. The final operation, R1 − R2, will yield
the matrix in reduced row echelon form.
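The end result of this five-step elimination can be confirmed with sympy (an optional aid, not part of the guide's method); the last column of the reduced form holds the solution:

```python
# Verify the five-step reduction of the 3x3 system.
from sympy import Matrix

aug = Matrix([[1, 1, 1, 6],
              [2, 4, 1, 5],
              [2, 3, 1, 6]])

rref, pivots = aug.rref()
print(rref)   # last column holds the solution (2, -1, 5)
```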
We have
\[
Ax_0 = \begin{pmatrix} 1&1&1&1&1\\ 2&1&1&1&2\\ 1&-1&-1&1&1\\ 1&0&0&1&1 \end{pmatrix}
\begin{pmatrix} 1\\ -1\\ 0\\ 3\\ 0 \end{pmatrix}
= \begin{pmatrix} 3\\ 4\\ 5\\ 4 \end{pmatrix} = b
\quad\text{and}\quad
Ax_1 = \begin{pmatrix} 1&1&1&1&1\\ 2&1&1&1&2\\ 1&-1&-1&1&1\\ 1&0&0&1&1 \end{pmatrix}
\begin{pmatrix} -1\\ -2\\ 1\\ 3\\ 2 \end{pmatrix}
= \begin{pmatrix} 3\\ 4\\ 5\\ 4 \end{pmatrix} = b.
\]
For Activity 3.6, reducing the augmented matrix,
\[
\left(\begin{array}{ccc|c} 1&2&1&3\\ 2&2&0&2\\ 3&4&1&5 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1&2&1&3\\ 0&-2&-2&-4\\ 0&-2&-2&-4 \end{array}\right)
\longrightarrow\cdots\longrightarrow
\left(\begin{array}{ccc|c} 1&2&1&3\\ 0&1&1&2\\ 0&0&0&0 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1&0&-1&-1\\ 0&1&1&2\\ 0&0&0&0 \end{array}\right),
\]
with solution
\[
x = \begin{pmatrix} x\\ y\\ z \end{pmatrix}
= \begin{pmatrix} -1+t\\ 2-t\\ t \end{pmatrix}
= \begin{pmatrix} -1\\ 2\\ 0 \end{pmatrix}
+ t\begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix},\qquad t\in\mathbb{R}.
\]
The three planes intersect in this line.
Now consider the same coefficient matrix with a different right-hand side:
\[
(A|b) = \left(\begin{array}{ccc|c} 1&2&1&1\\ 2&2&0&2\\ 3&4&1&2 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1&2&1&1\\ 0&-2&-2&0\\ 0&-2&-2&-1 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1&2&1&1\\ 0&1&1&0\\ 0&0&0&-1 \end{array}\right).
\]
There is no reason to reduce the matrix further: we conclude that the original system of
equations is inconsistent; there is no solution. For the homogeneous system, Ax = 0, the
row echelon form of A consists of the first three columns of the echelon form of the
augmented matrix,
\[
A = \begin{pmatrix} 1&2&1\\ 2&2&0\\ 3&4&1 \end{pmatrix}
\longrightarrow\cdots\longrightarrow
\begin{pmatrix} 1&2&1\\ 0&1&1\\ 0&0&0 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1&0&-1\\ 0&1&1\\ 0&0&0 \end{pmatrix}.
\]
Setting the non-leading variable x3 = t, we find that the null space of A consists of all
vectors
\[
x = t\begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix},\qquad t\in\mathbb{R}.
\]
Comments on exercises
Solution to exercise 3.1
The first set is presented in detail:
\[
(A|b) = \left(\begin{array}{ccc|c} 1&1&1&2\\ 0&2&1&0\\ -1&1&-1&-4 \end{array}\right)
\stackrel{R_3+R_1}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&1&2\\ 0&2&1&0\\ 0&2&0&-2 \end{array}\right)
\stackrel{R_2\leftrightarrow R_3}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&1&2\\ 0&2&0&-2\\ 0&2&1&0 \end{array}\right)
\]
\[
\stackrel{\frac12 R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&1&2\\ 0&1&0&-1\\ 0&2&1&0 \end{array}\right)
\stackrel{R_3-2R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&1&2\\ 0&1&0&-1\\ 0&0&1&2 \end{array}\right)
\stackrel{R_1-R_3}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&0&0\\ 0&1&0&-1\\ 0&0&1&2 \end{array}\right)
\stackrel{R_1-R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&0&0&1\\ 0&1&0&-1\\ 0&0&1&2 \end{array}\right),
\]
so
\[
x = \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 1\\ -1\\ 2 \end{pmatrix}.
\]
This system has a unique solution. The three planes intersect in one point.
(ii)
\[
(A|b) = \left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ -1&1&-1&0 \end{array}\right)
\stackrel{R_3+R_1}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ 0&2&1&2 \end{array}\right)
\stackrel{R_3-R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ 0&0&0&2 \end{array}\right).
\]
This system is inconsistent; there is no solution. The equations represent three planes
whose lines of intersection are parallel, so have no points in common, but no two of the
planes are parallel.
(iii)
\[
(A|b) = \left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ -1&1&-1&-2 \end{array}\right)
\stackrel{R_3+R_1}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ 0&2&1&0 \end{array}\right)
\stackrel{R_3-R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&2&2\\ 0&2&1&0\\ 0&0&0&0 \end{array}\right)
\]
\[
\stackrel{\frac12 R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&1&2&2\\ 0&1&\tfrac12&0\\ 0&0&0&0 \end{array}\right)
\stackrel{R_1-R_2}{\longrightarrow}
\left(\begin{array}{ccc|c} 1&0&\tfrac32&2\\ 0&1&\tfrac12&0\\ 0&0&0&0 \end{array}\right).
\]
Set z = t and then solve for x and y in terms of t. There are infinitely many solutions.
\[
x = \begin{pmatrix} x\\ y\\ z \end{pmatrix}
= \begin{pmatrix} 2-\tfrac32 t\\ -\tfrac12 t\\ t \end{pmatrix}
= \begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}
+ t\begin{pmatrix} -\tfrac32\\ -\tfrac12\\ 1 \end{pmatrix},\qquad t\in\mathbb{R}.
\]
The three planes intersect in a line. If you set z = 2s, then this line of solutions can be
written as
\[
\begin{pmatrix} x\\ y\\ z \end{pmatrix}
= \begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}
+ s\begin{pmatrix} -3\\ -1\\ 2 \end{pmatrix},\qquad s\in\mathbb{R}.
\]
(iv) This is a homogeneous system of equations, so you can just reduce the coefficient
matrix A. If you reduce the augmented matrix, you will notice that the last column
remains all zeros throughout the row reduction.
To reduce calculations, the first step in reducing this matrix should be to switch row 1
and row 3, so that you have a leading one without introducing fractions. The reduced
row echelon form is
\[
A = \begin{pmatrix} 3&-1&1\\ 2&3&2\\ 1&2&3 \end{pmatrix}
\longrightarrow\cdots\longrightarrow
\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix},
\qquad
x = \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}.
\]
The unique solution is x = (0, 0, 0)T . These equations represent three planes, all of
which pass through the origin, and have one point of intersection, which is the origin.
Solution to exercise 3.2
Solve the first two equations simultaneously using Gaussian elimination. The general
solution takes the form x = p + sw, s R, where p = (1, 0, 0)T and w = (0, 1, 1)T ,
which is the equation of the line of intersection of the two planes.
The third plane intersects the first two in the same line. You can determine this by
solving the linear system of three equations using Gaussian elimination. Alternatively,
you can notice that the line of intersection of the first two planes is in the third plane
(since its direction is perpendicular to the normal, and the point v satisfies the
Cartesian equation of the plane), so this must be the intersection of all three planes.
Solution to exercise 3.3
The reduced row echelon form of the augmented matrix is
\[
\left(\begin{array}{ccccc|c} 1&1&0&0&-1&1\\ 0&0&1&0&2&-2\\ 0&0&0&1&0&1 \end{array}\right),
\]
so, setting the non-leading variables x2 = r and x5 = s,
\[
x = \begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix}
= \begin{pmatrix} 1-r+s\\ r\\ -2-2s\\ 1\\ s \end{pmatrix}
= \begin{pmatrix} 1\\ 0\\ -2\\ 1\\ 0 \end{pmatrix}
+ r\begin{pmatrix} -1\\ 1\\ 0\\ 0\\ 0 \end{pmatrix}
+ s\begin{pmatrix} 1\\ 0\\ -2\\ 0\\ 1 \end{pmatrix},\qquad r, s\in\mathbb{R}.
\]
Solution to exercise 3.4
To obtain the reduced row echelon form, first interchange R1 and R2, so that there is a
leading one in the top left-hand corner, and then clear the first column:
\[
C \stackrel{R_1\leftrightarrow R_2}{\longrightarrow}
\begin{pmatrix} 1&-1&0&3\\ 2&-1&1&0\\ -6&2&-4&-6 \end{pmatrix}
\stackrel{\substack{R_2-2R_1\\ R_3+6R_1}}{\longrightarrow}
\begin{pmatrix} 1&-1&0&3\\ 0&1&1&-6\\ 0&-4&-4&12 \end{pmatrix}
\stackrel{R_3+4R_2}{\longrightarrow}
\begin{pmatrix} 1&-1&0&3\\ 0&1&1&-6\\ 0&0&0&-12 \end{pmatrix}
\]
and so on to obtain
\[
\stackrel{-\frac{1}{12}R_3}{\longrightarrow}
\begin{pmatrix} 1&-1&0&3\\ 0&1&1&-6\\ 0&0&0&1 \end{pmatrix}
\stackrel{\substack{R_1-3R_3\\ R_2+6R_3}}{\longrightarrow}
\begin{pmatrix} 1&-1&0&0\\ 0&1&1&0\\ 0&0&0&1 \end{pmatrix}
\stackrel{R_1+R_2}{\longrightarrow}
\begin{pmatrix} 1&0&1&0\\ 0&1&1&0\\ 0&0&0&1 \end{pmatrix}.
\]
(a) If this is the reduced row echelon form of an augmented matrix, C = (A|b), the
original system of three equations in three unknowns, Ax = b, is inconsistent.
(Solutions would have been in R3 .)
(b) If this is the reduced row echelon form of the coefficient matrix of a homogeneous
system, Cx = 0, with variables x, y, z, w, the solutions are the vectors,
\[
x = \begin{pmatrix} x\\ y\\ z\\ w \end{pmatrix}
= \begin{pmatrix} -t\\ -t\\ t\\ 0 \end{pmatrix}
= t\begin{pmatrix} -1\\ -1\\ 1\\ 0 \end{pmatrix},\qquad t\in\mathbb{R},
\]
which is a line through the origin in R4.
Chapter 4
Matrix inversion and determinants
Suggested reading
4.1.1 Elementary matrices
These operations change a matrix into a new matrix. We want to examine this process
more closely. Let A be an n n matrix and let Ai denote the ith row of A. Then we can
write A as a column of n rows,
\[
A = \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{pmatrix}
= \begin{pmatrix} A_1\\ A_2\\ \vdots\\ A_n \end{pmatrix}.
\]
In this notation, examples of the three types of row operation produce the matrices
\[
\begin{pmatrix} A_1\\ 3A_2\\ \vdots\\ A_n \end{pmatrix},\qquad
\begin{pmatrix} A_2\\ A_1\\ \vdots\\ A_n \end{pmatrix},\qquad
\begin{pmatrix} A_1\\ A_2+4A_1\\ \vdots\\ A_n \end{pmatrix}.
\]
The first is 'multiply row 2 by 3', the second is 'interchange row 1 and row 2', and the
third is 'add 4 times row 1 to row 2'. Each of these represents the new matrix after the
row operation has been executed.
Now look at a product of two n n matrices A and B. The (1, 1) entry in the product
is the inner product of row 1 of A and column 1 of B. The (1, 2) entry is the inner
product of row 1 of A and column 2 of B, and so on. In fact, row 1 of the product
matrix AB is obtained by taking the product of the row matrix A1 with the matrix B,
that is, A1 B. This is true of each row of the product; that is, each row i of the product
AB is obtained by taking Ai B. So we can express the product AB as,
\[
AB = \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{pmatrix}
\begin{pmatrix} b_{11}&b_{12}&\cdots&b_{1n}\\ b_{21}&b_{22}&\cdots&b_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ b_{n1}&b_{n2}&\cdots&b_{nn} \end{pmatrix}
= \begin{pmatrix} A_1B\\ A_2B\\ \vdots\\ A_nB \end{pmatrix}.
\]
Now consider the effect of a row operation on a product AB. The first matrix below is
the product AB after the row operation add 4 times row 1 of AB to row 2 of AB.
\[
\begin{pmatrix} A_1B\\ A_2B+4A_1B\\ \vdots\\ A_nB \end{pmatrix}
= \begin{pmatrix} A_1B\\ (A_2+4A_1)B\\ \vdots\\ A_nB \end{pmatrix}
= \begin{pmatrix} A_1\\ A_2+4A_1\\ \vdots\\ A_n \end{pmatrix} B.
\]
Now take A = I, the identity matrix. Since IB = B, the previous statement now says
that:
The matrix obtained by a row operation on B is equal to the following product:
(the matrix obtained by a row operation on I)B.
Definition 4.1 (Elementary matrix) An elementary matrix, E, is an n n matrix
obtained by doing exactly one row operation on the n n identity matrix, I.
For example,
\[
\begin{pmatrix} 1&0&0\\ 0&3&0\\ 0&0&1 \end{pmatrix},\qquad
\begin{pmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{pmatrix},\qquad
\begin{pmatrix} 1&0&0\\ 4&1&0\\ 0&0&1 \end{pmatrix}
\]
are elementary matrices. The first has had row 2 multiplied by 3, the second had row 1
and row 2 interchanged, and the last matrix had 4 times row 1 added to row 2.
Activity 4.1 Which of the following are elementary matrices?
\[
\begin{pmatrix} 2&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix},\qquad
\begin{pmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{pmatrix},\qquad
\begin{pmatrix} 1&0&0\\ 0&1&0\\ 1&0&1 \end{pmatrix}
\]
Elementary matrices provide a useful tool to relate a matrix to its reduced row echelon
form. We have shown above that the matrix obtained from a matrix B after performing
one row operation is equal to a product EB, where E is the elementary matrix obtained
from I by that same row operation.
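This fact is easy to test by computation. The sketch below (sympy, used purely as a check) performs R2 − R1 once on the identity to build E1 and once directly on a matrix B, and confirms the two routes agree:

```python
# A row operation on B equals E*B, where E is that operation applied to I.
from sympy import Matrix, eye

B = Matrix([[1, 2, 4],
            [1, 3, 6],
            [-1, 0, 1]])

E1 = eye(3)
E1[1, 0] = -1                           # I after the row operation R2 - R1

B_direct = B.copy()
B_direct[1, :] = B.row(1) - B.row(0)    # R2 - R1 done directly on B

print(E1 * B == B_direct)   # True
```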
Example 4.1 Suppose we want to put the matrix
\[
B = \begin{pmatrix} 1&2&4\\ 1&3&6\\ -1&0&1 \end{pmatrix}
\]
into reduced row echelon form. The first step is R2 − R1:
\[
B = \begin{pmatrix} 1&2&4\\ 1&3&6\\ -1&0&1 \end{pmatrix}
\stackrel{R_2-R_1}{\longrightarrow}
\begin{pmatrix} 1&2&4\\ 0&1&2\\ -1&0&1 \end{pmatrix}.
\]
Performing the same row operation on the identity matrix,
\[
I = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}
\stackrel{R_2-R_1}{\longrightarrow}
\begin{pmatrix} 1&0&0\\ -1&1&0\\ 0&0&1 \end{pmatrix} = E_1,
\]
we obtain an elementary matrix E1, and
\[
E_1B = \begin{pmatrix} 1&0&0\\ -1&1&0\\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1&2&4\\ 1&3&6\\ -1&0&1 \end{pmatrix}
= \begin{pmatrix} 1&2&4\\ 0&1&2\\ -1&0&1 \end{pmatrix},
\]
which is exactly the matrix obtained from B by the row operation R2 − R1.
Each elementary matrix is invertible, and its inverse is the elementary matrix of the
row operation which undoes the original one. For example, the operation R2 + 4R1 is
undone by R2 − 4R1, so the inverse of
\[
E = \begin{pmatrix} 1&0&0\\ 4&1&0\\ 0&0&1 \end{pmatrix}
\quad\text{is}\quad
E^{-1} = \begin{pmatrix} 1&0&0\\ -4&1&0\\ 0&0&1 \end{pmatrix}.
\]
In the example above, the operation R2 − R1 is undone by R2 + R1, and
\[
E_1^{-1}(E_1B) = \begin{pmatrix} 1&0&0\\ 1&1&0\\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1&2&4\\ 0&1&2\\ -1&0&1 \end{pmatrix}
= \begin{pmatrix} 1&2&4\\ 1&3&6\\ -1&0&1 \end{pmatrix} = B.
\]
4.1.2 Row equivalence
Two matrices are row equivalent if one can be obtained from the other by a sequence of
row operations. Writing A ∼ B for 'A is row equivalent to B', row equivalence is an
equivalence relation; that is, it is
reflexive: A ∼ A;
symmetric: if A ∼ B then B ∼ A;
transitive: if A ∼ B and B ∼ C, then A ∼ C.
Activity 4.3 Argue why this is true: that is, explain why row equivalence as defined
above satisfies these three conditions.
The algorithm for putting a matrix A into reduced row echelon form by a sequence of
row operations means that every matrix is row equivalent to a matrix in reduced row
echelon form. This fact is stated in the following theorem.
Theorem 4.2 Every matrix is row equivalent to a matrix in reduced row echelon form.
4.1.3
We are now ready to answer the first question: 'When is a matrix invertible?' We collect
our results in the following theorem.
Theorem 4.3 If A is an n n matrix, then the following statements are equivalent
(meaning if any one of these statements is true for A, then all the statements are true).
(1) A⁻¹ exists.
(2) Ax = b has a unique solution for any b ∈ ℝⁿ.
(3) Ax = 0 only has the trivial solution, x = 0.
(4) The reduced row echelon form of A is I.
Proof
If we show that (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (1), then any one statement will imply all
the others, so the statements are equivalent.
(1) ⇒ (2). We assume that A⁻¹ exists, and consider the system of linear equations
Ax = b where x is the vector of unknowns and b is any vector in ℝⁿ. We use the
matrix A⁻¹ to solve for x by multiplying the equation on the left by A⁻¹:
\[
A^{-1}Ax = A^{-1}b \;\Longrightarrow\; Ix = A^{-1}b \;\Longrightarrow\; x = A^{-1}b.
\]
This shows that x = A⁻¹b is a solution, and it is the only possible solution. So Ax = b
has a unique solution for any b ∈ ℝⁿ.
(2) ⇒ (3). If Ax = b has a unique solution for all b ∈ ℝⁿ, then this is true for b = 0.
The unique solution of Ax = 0 must be the trivial solution, x = 0.
(3) ⇒ (4). If the only solution of Ax = 0 is x = 0, then there are no free (non-leading)
variables and the reduced row echelon form of A must have a leading one in every
column. Since the matrix is square and a leading one in a lower row is further to the
right, A must have a leading one in every row. Since every column with a leading one
has zeros elsewhere, this can only be the n n identity matrix.
(4) ⇒ (1). We now make use of elementary matrices. If A is row equivalent to I, then
there is a sequence of row operations which reduces A to I, so there must exist
elementary matrices E1, . . . , Er such that
\[
E_r E_{r-1}\cdots E_1 A = I.
\]
Each elementary matrix has an inverse. We use these to solve the above equation for A,
by first multiplying the equation on the left by E_r⁻¹, then by E_{r−1}⁻¹, and so on, to obtain
\[
A = E_1^{-1}\cdots E_{r-1}^{-1}E_r^{-1}\, I.
\]
This says that A is a product of invertible matrices, hence invertible. (Recall from
Chapter 1 that if A and B are invertible matrices of the same size, then the product
AB is invertible and its inverse is the product of the inverses in the reverse order,
(AB)⁻¹ = B⁻¹A⁻¹.)
This proves the theorem.
4.1.4
Multiplying both sides of the equation Er · · · E1 A = I on the right by A⁻¹ shows,
moreover, that
\[
A^{-1} = E_r\cdots E_1\, I = E_r\cdots E_1.
\]
This gives us a method to find the inverse of a matrix A. We start with the matrix A
and we form a new, larger matrix by placing the identity matrix to the right of A,
obtaining the matrix denoted (A|I). We then use row operations to reduce this to
(I|B). If this is not possible (which will become apparent) then the matrix is not
invertible. If it can be done, then A is invertible and B = A⁻¹.
Example 4.2 We use this method to find the inverse of
\[
A = \begin{pmatrix} 1&2&4\\ 1&3&6\\ -1&0&1 \end{pmatrix}.
\]
We form the matrix
\[
(A\,|\,I) = \left(\begin{array}{ccc|ccc} 1&2&4&1&0&0\\ 1&3&6&0&1&0\\ -1&0&1&0&0&1 \end{array}\right).
\]
(We have separated A from I by a vertical line just to emphasise how this matrix is
formed. It is also helpful in the calculations.) Then we carry out elementary row
operations.
\[
\stackrel{\substack{R_2-R_1\\ R_3+R_1}}{\longrightarrow}
\left(\begin{array}{ccc|ccc} 1&2&4&1&0&0\\ 0&1&2&-1&1&0\\ 0&2&5&1&0&1 \end{array}\right)
\stackrel{R_3-2R_2}{\longrightarrow}
\left(\begin{array}{ccc|ccc} 1&2&4&1&0&0\\ 0&1&2&-1&1&0\\ 0&0&1&3&-2&1 \end{array}\right)
\]
\[
\stackrel{\substack{R_1-4R_3\\ R_2-2R_3}}{\longrightarrow}
\left(\begin{array}{ccc|ccc} 1&2&0&-11&8&-4\\ 0&1&0&-7&5&-2\\ 0&0&1&3&-2&1 \end{array}\right)
\stackrel{R_1-2R_2}{\longrightarrow}
\left(\begin{array}{ccc|ccc} 1&0&0&3&-2&0\\ 0&1&0&-7&5&-2\\ 0&0&1&3&-2&1 \end{array}\right).
\]
This is now in the form (I|B) so we deduce that A is invertible and that
\[
A^{-1} = \begin{pmatrix} 3&-2&0\\ -7&5&-2\\ 3&-2&1 \end{pmatrix}.
\]
It is very easy to make mistakes when row reducing a matrix, so the next thing you
should do is check that AA⁻¹ = I.
Activity 4.4 Do this. Check that when you multiply AA⁻¹, you get the identity
matrix I.
(In order to establish that this is the inverse matrix, you should also show
A⁻¹A = I, but we will forgo that here.)
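The inverse found by row reducing (A|I) can be double-checked with sympy (an optional aid; `inv` uses its own elimination internally):

```python
# Check the inverse computed in Example 4.2.
from sympy import Matrix, eye

A = Matrix([[1, 2, 4],
            [1, 3, 6],
            [-1, 0, 1]])

Ainv = A.inv()
print(Ainv)
print(A * Ainv == eye(3), Ainv * A == eye(3))
```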
If the matrix A is not invertible, what will happen? By the theorem, if A is not
invertible, then the reduced row echelon form of A cannot be I, so there will be a row of
zeros in the row echelon form of A.
Activity 4.5 Find the inverse, if it exists, of each of the following matrices:
\[
A = \begin{pmatrix} 2&1\\ 0&1 \end{pmatrix},\qquad
B = \begin{pmatrix} 2&1&3\\ 0&1&1\\ 1&2&0 \end{pmatrix}.
\]
4.1.5 Result on inverses
At this stage, in order to show that a square matrix B is the inverse of the n n matrix
A, it seems we have to show that both statements, AB = I and BA = I are true. After
we have proved the following theorem, which follows from Theorem 4.3, we will be able
to deduce from the one statement AB = I that A and B must be inverses of one
another.
Theorem 4.4 If A and B are n × n matrices and AB = I, then A and B are each
invertible matrices, and A = B⁻¹ and B = A⁻¹.
Proof
If we show that the homogeneous system of equations Bx = 0 has only the trivial
solution, x = 0, then by Theorem 4.3 this will prove that B is invertible. So we consider
the matrix equation Bx = 0 and multiply both sides of this equation on the left by the
matrix A. We have
Bx = 0 ⟹ A(Bx) = A0 ⟹ (AB)x = 0.
But we are given that AB = I, so that
(AB)x = 0 ⟹ Ix = 0 ⟹ x = 0,
which shows that the only solution of Bx = 0 is the trivial solution. We therefore
conclude that B is invertible, so the matrix B⁻¹ exists.
We now multiply both sides of the equation AB = I on the right by the matrix B⁻¹.
We have,
\[
AB = I \;\Longrightarrow\; (AB)B^{-1} = IB^{-1} \;\Longrightarrow\; A(BB^{-1}) = B^{-1} \;\Longrightarrow\; A = B^{-1}.
\]
So A is the inverse of B, and therefore A is also an invertible matrix. Then taking
inverses of both sides of the last equation, we conclude that A⁻¹ = (B⁻¹)⁻¹ = B.
4.2 Determinants
4.2.1 The determinant of a 2 × 2 matrix
To see where the determinant comes from, consider a 2 × 2 matrix
\[
A = \begin{pmatrix} a&b\\ c&d \end{pmatrix}
\]
and suppose a ≠ 0. Row reducing,
\[
\begin{pmatrix} a&b\\ c&d \end{pmatrix}
\stackrel{\frac1a R_1}{\longrightarrow}
\begin{pmatrix} 1&b/a\\ c&d \end{pmatrix}
\stackrel{R_2-cR_1}{\longrightarrow}
\begin{pmatrix} 1&b/a\\ 0&d-cb/a \end{pmatrix},
\]
and d − cb/a = (ad − bc)/a, so A can be reduced to the identity matrix, and is
therefore invertible, if and only if ad − bc ≠ 0. This quantity is given a name.
Definition 4.2 (Determinant of a 2 × 2 matrix) The determinant of the matrix
A = (a b; c d) is the number
\[
|A| = \begin{vmatrix} a&b\\ c&d \end{vmatrix} = ad - bc.
\]
For example,
\[
\begin{vmatrix} 1&2\\ 3&4 \end{vmatrix} = (1)(4) - (2)(3) = -2.
\]
To extend this definition to n × n matrices, we define the determinant of an n × n
matrix recursively, in terms of (n − 1) × (n − 1) determinants. So the determinant of a
3 × 3 matrix is given in terms of 2 × 2 determinants, and so on. To do this we will need the
following two definitions.
Definition 4.3 Suppose A is an n × n matrix. The (i, j) minor of A, denoted by Mij,
is the determinant of the (n − 1) × (n − 1) matrix obtained by removing the ith row and
jth column of A.
Definition 4.4 The (i, j) cofactor of a matrix A is
\[
C_{ij} = (-1)^{i+j} M_{ij}.
\]
So the cofactor is equal to the minor if i + j is even, and it is equal to −1 times the
minor if i + j is odd.
Example 4.3 Let
\[
A = \begin{pmatrix} 1&2&3\\ 4&1&1\\ -1&3&0 \end{pmatrix}.
\]
Then, for example, the minor
\[
M_{23} = \begin{vmatrix} 1&2\\ -1&3 \end{vmatrix} = 5,
\]
so the cofactor is C23 = (−1)^{2+3} M23 = −5.
There is a simple way to associate the cofactor Cij with the entry aij of the matrix.
Locate the entry aij and cross out the row and the column containing aij. Then
evaluate the determinant of the (n − 1) × (n − 1) matrix which remains. This is the
minor, Mij. Then give it a + or − sign according to the position of aij on the
following pattern:
\[
\begin{pmatrix} +&-&+&\cdots\\ -&+&-&\cdots\\ +&-&+&\cdots\\ \vdots&\vdots&\vdots&\ddots \end{pmatrix}.
\]
Activity 4.6 Write down the cofactor C13 for the matrix A above using this
method.
If A is an n × n matrix, the determinant of A is given by
\[
|A| = \begin{vmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{vmatrix}
= a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}.
\]
This is called the cofactor expansion of |A| by row one. It is a recursive definition,
meaning that the determinant of an n n matrix is given in terms of (n 1) (n 1)
determinants.
Example 4.4 We calculate the determinant of the matrix A above:
\[
|A| = 1C_{11} + 2C_{12} + 3C_{13}
= 1\begin{vmatrix} 1&1\\ 3&0 \end{vmatrix}
- 2\begin{vmatrix} 4&1\\ -1&0 \end{vmatrix}
+ 3\begin{vmatrix} 4&1\\ -1&3 \end{vmatrix}
= 1(-3) - 2(1) + 3(13) = 34.
\]
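You can check a determinant, and an individual cofactor, with sympy (an optional aid). Below, the cofactor C13 is computed from first principles by deleting row 1 and column 3 and taking the determinant of what remains:

```python
# The determinant of Example 4.4, and one cofactor computed from scratch.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 1, 1],
            [-1, 3, 0]])

print(A.det())   # 34

# cofactor C13: delete row 1 and column 3, take the determinant, sign (-1)**(1+3)
sub = A.copy()
sub.row_del(0)
sub.col_del(2)
C13 = (-1) ** (1 + 3) * sub.det()
print(C13)       # 13
```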
Activity 4.7 Use the cofactor expansion by row one to find the determinant of the
matrix
\[
M = \begin{pmatrix} 0&2&1\\ 1&2&3\\ -1&1&4 \end{pmatrix}.
\]
You might ask, 'Why is the cofactor expansion given by row 1, rather than any other
row?' In fact it turns out that using a cofactor expansion by any row or column of A
will give the same number |A|, as the following theorem states.
Theorem 4.5 The cofactor expansion of the determinant of an n × n matrix A by any
row or column of A gives the same number, |A|.
For example, expanding the determinant of the matrix A of Example 4.3 by column
three,
\[
\begin{vmatrix} 1&2&3\\ 4&1&1\\ -1&3&0 \end{vmatrix}
= 3\begin{vmatrix} 4&1\\ -1&3 \end{vmatrix}
- 1\begin{vmatrix} 1&2\\ -1&3 \end{vmatrix}
+ 0 = 3(13) - (5) = 34.
\]
Activity 4.8 Check your result for the determinant of the matrix
\[
M = \begin{pmatrix} 0&2&1\\ 1&2&3\\ -1&1&4 \end{pmatrix}
\]
in the previous activity by expanding by a different row or column. Choose one with
fewer calculations.
We will give an informal proof of Theorem 4.5, because it is useful to understand how
the definition of determinant works. This section can be safely omitted (you can accept
the theorem without proof), but you might find it worth your while to read through it.
For a 2 × 2 matrix the cofactor expansion by row 1 is equivalent to the definition given
on page 97 as ad − bc:
\[
\begin{vmatrix} a_{11}&a_{12}\\ a_{21}&a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.
\]
Notice that each term of the sum is a product of entries, one from each row and one
from each column. Indeed, a11 is the entry from row 1 and column 1, and a22 is in
neither: it comes from row 2 and column 2. Similarly for the second term, a12a21, which is
the only different way of taking one entry from each row and each column of the matrix.
For a 3 × 3 matrix, the cofactor expansion by row 1 yields
\[
\begin{vmatrix} a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22}&a_{23}\\ a_{32}&a_{33} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21}&a_{23}\\ a_{31}&a_{33} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21}&a_{22}\\ a_{31}&a_{32} \end{vmatrix}
\]
\[
= a_{11}(a_{22}a_{33}-a_{23}a_{32}) - a_{12}(a_{21}a_{33}-a_{23}a_{31}) + a_{13}(a_{21}a_{32}-a_{22}a_{31}).
\]
In forming such a product there are 3 choices of column index for the entry from row 1,
then 2 remaining choices for the entry from row 2, and then only 1 choice for the entry
from row 3, giving
\[
3 \times 2 \times 1 = 3!
\]
products in total.
In the above expansion of |A|, because each term has the row indices arranged in
ascending order and the column indices form a different permutation of the numbers
1,2,3, we know that each term of the sum is a different product of entries, one from each
row and one from each column of A, and the set of six products contains all ways in
which this can happen.
But what about the minus signs? An inversion is said to occur in a permutation
whenever a larger integer precedes a smaller one. For example,
1 2 3 no inversions
1 3 2 one inversion
A permutation is said to be even if the total number of inversions is even. It is odd if the
total number of inversions is odd.
To find the total number of inversions of a permutation, start at the left and find the
total number of integers to the right of the first integer which are smaller than the first
integer. Then go to the next integer to the right and do the same. Continue until the
end, and then add up all these numbers.
Example 4.6 Applying this to the permutation 5 2 3 4 1, the total number of
inversions is 4 + 1 + 1 + 1 = 7, so this permutation is odd. The total number of
inversions gives the minimum number of steps that it takes to put these numbers
into ascending order, where in each step you are only allowed to switch the positions
of two adjacent integers. For the permutation 5 2 3 4 1, this can be done in seven
steps by
5 2 3 4 1 → 2 5 3 4 1 → 2 3 5 4 1 → 2 3 4 5 1 → 2 3 4 1 5
→ 2 3 1 4 5 → 2 1 3 4 5 → 1 2 3 4 5.
If we look again at the list of products in the expansion above, we find that the
permutations of the column indices corresponding to the products with a plus sign are
all even, and those
corresponding to the products with a minus sign are all odd.
Definition 4.6 An elementary product from an n × n matrix A is a product of n
entries, no two of which come from the same row or column. A signed elementary
product has the row indices arranged in ascending order, multiplied by −1 if the column
indices are an odd permutation of the numbers 1 to n.
We are now ready to give an intrinsic (but completely impractical) definition of
determinant.
Definition 4.7 (Determinant) The determinant of an n × n matrix A is the sum of all
signed elementary products of A.
A cofactor expansion is a clever way to obtain this sum of signed elementary products.
You choose the entries from one row, say, and then cross out that row and the column
containing the entry to obtain the cofactor, and each stage of calculating the cofactor
repeats the process. The proof of Theorem 4.5 is that each cofactor expansion is the
sum of all signed elementary products.
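Definition 4.7, impractical as it is by hand, takes only a few lines on a computer. The sketch below (plain Python plus sympy as a cross-check) forms every elementary product, signs it by counting inversions of the column indices, and recovers the determinant of the matrix from Example 4.4:

```python
# Determinant as the sum of all signed elementary products (Definition 4.7).
from itertools import permutations
from sympy import Matrix

def inversions(p):
    """Number of pairs (i, j), i < j, with p[i] > p[j]."""
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p)) if p[i] > p[j])

def det_by_products(A):
    n = A.rows
    total = 0
    for cols in permutations(range(n)):     # column indices, rows in order
        term = 1
        for row, col in enumerate(cols):
            term *= A[row, col]
        total += (-1) ** inversions(cols) * term
    return total

A = Matrix([[1, 2, 3],
            [4, 1, 1],
            [-1, 3, 0]])
print(inversions((4, 1, 2, 3, 0)))    # the permutation 5 2 3 4 1 has 7 inversions
print(det_by_products(A), A.det())    # both give 34
```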
Activity 4.9 Expand the determinant
\[
|A| = \begin{vmatrix} a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33} \end{vmatrix}
\]
using the cofactor expansion by column 2, and show that you get the same list of
signed elementary products as we obtained above.
For very large matrices, using a cofactor expansion is impractical: each expansion of an
n × n determinant produces n cofactors, each of which is an (n − 1) × (n − 1)
determinant, so the amount of work grows like n!. For example, expanding a 4 × 4
determinant by row one, say
\[
|A| = 1C_{11} + (-4)C_{12} + 3C_{13} + 2C_{14},
\]
already requires the evaluation of four 3 × 3 determinants.
One special case is easy, however.
Theorem 4.6 If a matrix A has a row (or column) of zeros, then |A| = 0.
Proof
If we evaluate the determinant by the cofactor expansion using the row of zeros, then
each cofactor is multiplied by 0 and the sum will be zero. To visualise this, expand the
determinant below using row 1:
\[
\begin{vmatrix} 0&0&\cdots&0\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{vmatrix}
= 0C_{11} + 0C_{12} + \cdots + 0C_{1n} = 0.
\]
Similarly, if two rows (or two columns) of a matrix A are equal, then |A| = 0.
Proof
To prove this we will use a recursive (or inductive) argument. If A is a 2 × 2 matrix
with two equal rows, then
\[
|A| = \begin{vmatrix} a&b\\ a&b \end{vmatrix} = ab - ab = 0.
\]
Now consider a 3 × 3 matrix with two equal rows. If we expand the determinant by the
other row, then each cofactor is a 2 × 2 determinant with two equal rows, therefore each
is zero and so is their sum. For example,
\[
|A| = \begin{vmatrix} a&b&c\\ d&e&f\\ a&b&c \end{vmatrix}
= -d\begin{vmatrix} b&c\\ b&c \end{vmatrix}
+ e\begin{vmatrix} a&c\\ a&c \end{vmatrix}
- f\begin{vmatrix} a&b\\ a&b \end{vmatrix}
= 0 + 0 + 0 = 0.
\]
In this way, the result for (n − 1) × (n − 1) matrices implies the result for n × n
matrices, so the statement is true.
Corollary 3 If the cofactors of one row are multiplied by the entries of a different row,
then the result is 0.
Proof
Let A be an n × n matrix with cofactors Cij, and consider the expression
\[
a_{j1}C_{i1} + a_{j2}C_{i2} + \cdots + a_{jn}C_{in}\qquad\text{for } i \ne j.
\]
This expression is not equal to |A|, so what is it? It is equal to |B| for some matrix B,
but what does the matrix B look like?
In the expression |B| = aj1 Ci1 + aj2 Ci2 + + ajn Cin , each cofactor Cik , k = 1, . . . , n is
made up of entries of the matrix A, omitting the entries from row i. For example, if
i = 1, then Ci1 is obtained from the matrix which results from removing row i and column 1
from A. So the matrix B will have the same entries as the matrix A except in row i. In
the cofactor expansion of a determinant by row i, the entries of row i are the numbers
multiplying the cofactors. Therefore the entries of row i of the matrix B must be
aj1 , . . . , ajn . Then B has two equal rows, since row i has the same entries as row j, and
|B| = 0.
4.2.2
Definition 4.8 An n n matrix A is upper triangular if all entries below the main
diagonal are zero. It is lower triangular if all entries above the main diagonal are zero.
\[
\begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ 0&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&a_{nn} \end{pmatrix}
\;\text{(upper triangular)},\qquad
\begin{pmatrix} a_{11}&0&\cdots&0\\ a_{21}&a_{22}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{pmatrix}
\;\text{(lower triangular)}.
\]
A matrix which is both upper and lower triangular is a diagonal matrix,
\[
\begin{pmatrix} a_{11}&0&\cdots&0\\ 0&a_{22}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&a_{nn} \end{pmatrix}.
\]
Which row or column should we use for the cofactor expansion? Clearly the calculations
are simplest if we expand an upper triangular matrix by column 1 (or row n).
Expansion by column 1 gives us
\[
|A| = a_{11}\begin{vmatrix} a_{22}&\cdots&a_{2n}\\ \vdots&\ddots&\vdots\\ 0&\cdots&a_{nn} \end{vmatrix},
\]
and expanding each successive determinant by its first column in the same way, we
obtain
\[
|A| = a_{11}a_{22}\cdots a_{nn}.
\]
The determinant of a triangular (or diagonal) matrix is the product of its diagonal
entries.
We now look at the effect of each row operation on the determinant of a matrix.
RO1: multiply a row by a constant.
Suppose B is obtained from A by multiplying row i by a constant α. Expanding |B| by
row i, the entries of row i are each multiplied by α while their cofactors are unchanged,
so
\[
|B| = \alpha a_{i1}C_{i1} + \alpha a_{i2}C_{i2} + \cdots + \alpha a_{in}C_{in} = \alpha|A|.
\]
The effect of multiplying a row of a matrix by α is to multiply the determinant by α:
|B| = α|A|.
RO2: interchange two rows.
First let A be a 2 × 2 matrix,
\[
|A| = \begin{vmatrix} a&b\\ c&d \end{vmatrix} = ad - bc.
\]
If B is obtained from A by interchanging the two rows, then
\[
|B| = \begin{vmatrix} c&d\\ a&b \end{vmatrix} = cb - ad,
\]
so |B| = −|A|.
Now let A be a 3 3 matrix and let B be a matrix obtained from A by interchanging
two rows. Then if we expand |B| using a different row, each cofactor contains the
determinant of a 2 × 2 matrix which is a cofactor of A with two rows interchanged, so
each will be multiplied by −1, and |B| = −|A|. To visualise this, consider for example
\[
|A| = \begin{vmatrix} a&b&c\\ d&e&f\\ g&h&i \end{vmatrix},\qquad
|B| = \begin{vmatrix} g&h&i\\ d&e&f\\ a&b&c \end{vmatrix}.
\]
Expanding |A| and |B| by row 2, we have
\[
|A| = -d\begin{vmatrix} b&c\\ h&i \end{vmatrix}
+ e\begin{vmatrix} a&c\\ g&i \end{vmatrix}
- f\begin{vmatrix} a&b\\ g&h \end{vmatrix},\qquad
|B| = -d\begin{vmatrix} h&i\\ b&c \end{vmatrix}
+ e\begin{vmatrix} g&i\\ a&c \end{vmatrix}
- f\begin{vmatrix} g&h\\ a&b \end{vmatrix} = -|A|,
\]
since all the 2 × 2 determinants change sign. In the same way, if this holds for
(n − 1) × (n − 1) matrices, then it holds for n × n matrices.
The effect of interchanging two rows of a matrix is to multiply the determinant by
−1: |B| = −|A|.
RO3: add a multiple of one row to another.
Suppose the matrix B is obtained from the matrix A by replacing row j of A by row j
plus k times row i of A, j ≠ i. For example, consider the case in which B is obtained
from A by adding 4 times row 1 of A to row 2. Then
\[
|A| = \begin{vmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{vmatrix},\qquad
|B| = \begin{vmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}+4a_{11}&a_{22}+4a_{12}&\cdots&a_{2n}+4a_{1n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{vmatrix}.
\]
In general, in a situation like this, we can expand |B| by row j:
\[
|B| = (a_{j1}+ka_{i1})C_{j1} + (a_{j2}+ka_{i2})C_{j2} + \cdots + (a_{jn}+ka_{in})C_{jn}
\]
\[
= a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn} + k(a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}) = |A| + 0.
\]
The last expression in brackets is 0 because it consists of the cofactors of one row
multiplied by the entries of another row. So this row operation does not change the
value of |A|.
There is no change in the value of the determinant if a multiple of one row is added
to another.
We collect these results in the following theorem.
Theorem 4.7 (Effect of a row (column) operation on |A|) All statements are true if 'row' is replaced by 'column'.
(RO1) If a row is multiplied by a non-zero constant k, then |A| changes to k|A|.
(RO2) If two rows are interchanged, then |A| changes to −|A|.
(RO3) If a multiple of one row is added to another row, then |A| is unchanged.
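The three effects listed in Theorem 4.7 are easy to check numerically. Below is a small Python sketch (the 3 × 3 matrix A and the constants 5 and 4 are illustrative choices, not taken from the text); the function det implements the cofactor expansion along the first row.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 2, 3], [2, 3, 0], [0, 1, 2]]
# RO1: multiplying a row by k multiplies the determinant by k
B1 = [[5 * a for a in A[0]], A[1], A[2]]
# RO2: interchanging two rows changes the sign of the determinant
B2 = [A[1], A[0], A[2]]
# RO3: adding a multiple of one row to another leaves the determinant unchanged
B3 = [A[0], [a + 4 * b for a, b in zip(A[1], A[0])], A[2]]
```

Evaluating det on B1, B2 and B3 and comparing with det(A) confirms each rule.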
For example, we can use row operations to evaluate
\[
|A| = \begin{vmatrix} 1 & 2 & 1 & 4 \\ -1 & 3 & 0 & 2 \\ 2 & 1 & -1 & 2 \\ 1 & 4 & -1 & 3 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 5 & 1 & 6 \\ 0 & -3 & -3 & -6 \\ 0 & 2 & -2 & -1 \end{vmatrix}
= -3\begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 5 & 1 & 6 \\ 0 & 1 & 1 & 2 \\ 0 & 2 & -2 & -1 \end{vmatrix}.
\]
In the first step we used RO3 three times (R2 + R1, R3 − 2R1 and R4 − R1), so there is no change in the value of the determinant.
In the second step we factored −3 from the third row. We would need to multiply the resulting determinant on the right by −3 in order to put the −3 back into the third row, and get back a matrix with the same determinant as A. Next we switch row 2 and row 3, with the effect of changing the sign of the determinant.
\[
|A| = 3\begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 1 & 1 & 2 \\ 0 & 5 & 1 & 6 \\ 0 & 2 & -2 & -1 \end{vmatrix}
= 3\begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & -4 & -4 \\ 0 & 0 & -4 & -5 \end{vmatrix}.
\]
The final steps all use RO3, so there is no change in the value of the determinant.
Finally we evaluate the determinant of the upper triangular matrix:
\[
|A| = 3\begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & -4 & -4 \\ 0 & 0 & 0 & -1 \end{vmatrix} = 3(1)(1)(-4)(-1) = 12.
\]
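As a quick numerical cross-check of this calculation, here is a sketch using NumPy (np.linalg.det returns a floating-point value, so we round it):

```python
import numpy as np

# The 4x4 matrix from the worked example above
A = np.array([[1, 2, 1, 4],
              [-1, 3, 0, 2],
              [2, 1, -1, 2],
              [1, 4, -1, 3]], dtype=float)

print(round(float(np.linalg.det(A))))  # 12
```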
A word of caution with row operations! What is the change in the value of |A|
(1) if R2 is replaced by R2 − 3R3, or
(2) if R2 is replaced by 3R1 − R2?
For (1) there is no change, but for (2) the determinant will change sign. Why? 3R1 − R2 is actually two elementary row operations: first multiply row 2 by −1, and then add three times row 1 to it. When performing row operation RO3, you should always add a multiple of another row to the row you are replacing.
Activity 4.10 You can shorten the writing in the above example by expanding the 4 × 4 determinant using the first column as soon as you have obtained the determinant with zeros under the leading one. You will then be left with a 3 × 3 determinant to evaluate. Do this. Without looking at the example above, work through the calculations in this way to evaluate
\[
|A| = \begin{vmatrix} 1 & 2 & 1 & 4 \\ -1 & 3 & 0 & 2 \\ 2 & 1 & -1 & 2 \\ 1 & 4 & -1 & 3 \end{vmatrix}.
\]
4.2.3 The determinant of a product
One very important result concerning determinants can be stated as: The determinant
of the product of two square matrices is the product of their determinants. This is the
content of the following theorem.
Theorem 4.8 If A and B are n × n matrices, then
|AB| = |A| |B|.
Proof (optional)
We will outline the proof of this theorem without filling in all the details, but you can
safely omit it and carry on to the next section. We first prove the theorem in the case
when the matrix A is an elementary matrix. We use again the fact established in section
4.1.1 (page 91) that the matrix obtained by a row operation on the matrix B is equal to
the product of the elementary matrix of that row operation times the matrix B.
Let E1 be an elementary matrix that multiplies a row by a non-zero constant k. Then E1B is the matrix obtained by performing that row operation on B, and by Theorem 4.7, |E1B| = k|B|. For the same reason, |E1| = |E1 I| = k|I| = k. Therefore
|E1B| = |E1| |B|.
The argument for the other two types of elementary matrices is the same. So we assume
that the theorem holds when A is any elementary matrix.
Now recall that every matrix is row equivalent to a matrix in reduced row echelon form,
so if R denotes the reduced row echelon form of the matrix A, then we can write
A = Er Er−1 . . . E1 R.
Since A is a square matrix, R is either the identity matrix or a matrix with a row of
zeros.
Applying the result for an elementary matrix repeatedly,
|A| = |Er Er−1 . . . E1 R| = |Er| |Er−1| . . . |E1| |R|,
where |R| is either 1 or 0. Note that |R| = 0 if and only if |A| = 0.
If R = I, then by repeated application of the result for elementary matrices, this time with the matrix B,
|AB| = |Er Er−1 . . . E1 B| = |Er| |Er−1| . . . |E1| |B| = |Er Er−1 . . . E1| |B| = |A| |B|.
If R ≠ I, then
|AB| = |Er Er−1 . . . E1 RB| = |Er| |Er−1| . . . |E1| |RB|.
Since R has a row of zeros, the product matrix RB must also have a row of zeros, so |RB| = 0. Therefore
|AB| = 0 = 0 |B| = |A| |B|, and the theorem is proved.
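Theorem 4.8 can be checked numerically on any pair of square matrices. A sketch (the random integer matrices here are an arbitrary illustration; the comparison is made up to floating-point tolerance):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

# |AB| should equal |A| |B|
lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
```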
As we have seen, row operations cannot alter the fact that a determinant is zero or non-zero. By performing a row operation we might be multiplying the determinant by a non-zero constant, or by −1, or not changing the determinant at all. Therefore we can conclude that |A| = 0 if and only if the determinant of its reduced row echelon form satisfies |R| = 0, which is if and only if A is not invertible. Or, put the other way, |A| ≠ 0 if and only if |R| = 1, if and only if the matrix A is invertible.
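This connection between invertibility and the determinant can be seen numerically. A sketch (the matrix below is the singular matrix B from Exercise 4.1 later in this chapter, whose third row equals 2R1 − R2):

```python
import numpy as np

B = np.array([[1., 2., 3.],
              [2., 3., 0.],
              [0., 1., 6.]])  # rows are linearly dependent: R3 = 2*R1 - R2

d = np.linalg.det(B)          # essentially zero
r = np.linalg.matrix_rank(B)  # rank < 3, so B is not invertible
```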
Proof 2: We will now prove directly that A is invertible if and only if |A| ≠ 0. Since it is an 'if and only if' statement, we must prove both implications.
If A is invertible, then AA−1 = I, so by Theorem 4.8, |A| |A−1| = |I| = 1. Therefore |A| ≠ 0, and also
\[
|A^{-1}| = \frac{1}{|A|}.
\]
We now show the other implication, that if |A| ≠ 0 then A is invertible. To do this we will construct A−1, and to do this we need some definitions.
Definition 4.9 If A is an n × n matrix, the matrix of cofactors of A is the matrix whose (i, j) entry is Cij, the (i, j) cofactor of A. The adjoint (adjugate in some textbooks) of the matrix A is the transpose of the matrix of cofactors. That is, the adjoint of A, adj(A), is the matrix
\[
\operatorname{adj}(A) = \begin{pmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{pmatrix}.
\]
Now consider the product
\[
A \operatorname{adj}(A) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} \begin{pmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{pmatrix}.
\]
The (1, 1) entry is: a11C11 + a12C12 + · · · + a1nC1n. This is the cofactor expansion of |A| by row 1.
The (1, 2) entry is: a11C21 + a12C22 + · · · + a1nC2n. This consists of the cofactors of row 2 of A multiplied by the entries of row 1, so this is equal to 0 by Corollary 3 in section 4.2.1.
Continuing in this way, we see that the entries on the main diagonal of the product are all equal to |A|, and all entries off the main diagonal are equal to 0. That is,
\[
A \operatorname{adj}(A) = \begin{pmatrix} |A| & 0 & \dots & 0 \\ 0 & |A| & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & |A| \end{pmatrix} = |A|\, I.
\]
We know |A| ≠ 0, so we can divide both sides of the equation by |A| to obtain
\[
A\left(\frac{1}{|A|}\operatorname{adj}(A)\right) = I.
\]
This implies that
\[
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A).
\]
Example 4.8 Find the inverse of the matrix
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}.
\]
First calculate |A| to see if A is invertible. Using the cofactor expansion by row 1,
\[
|A| = 1(2 - 1) - 2(-1 - 4) + 3(-1 - 8) = -16 \neq 0.
\]
We then calculate the minors, for example
\[
M_{11} = \begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} = 1,
\]
and fill in the chart below:
M11 = 1, M12 = −5, M13 = −9
M21 = −1, M22 = −11, M23 = −7
M31 = −4, M32 = 4, M33 = 4
Change the minors into cofactors by multiplying by −1 those minors with i + j equal to an odd number. Finally, transpose the result to form the adjoint matrix, so that
\[
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = -\frac{1}{16}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.
\]
To check the calculation:
\[
-\frac{1}{16}\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix} = -\frac{1}{16}\begin{pmatrix} -16 & 0 & 0 \\ 0 & -16 & 0 \\ 0 & 0 & -16 \end{pmatrix} = I.
\]
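The adjoint method translates directly into code. Below is a sketch in Python (the helper name adjugate is our own; for hand-sized matrices a symbolic package would avoid the floating-point determinants used here):

```python
import numpy as np

def adjugate(A):
    """Transpose of the matrix of cofactors of the square matrix A."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Minor M_ij: delete row i and column j, then apply the sign (-1)^(i+j)
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

# The matrix from Example 4.8
A = np.array([[1., 2., 3.],
              [-1., 2., 1.],
              [4., 1., 1.]])
A_inv = adjugate(A) / np.linalg.det(A)
```

Multiplying A by A_inv recovers the identity matrix, confirming the worked example.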
Activity 4.11 Use the adjoint matrix to find the inverse of
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 0 \\ 5 & 6 & 7 \end{pmatrix}.
\]
Remember: the adjoint matrix only contains the cofactors of A; the (i, j) entry is the
cofactor Cji of A. The entries only multiply the cofactors when calculating the
determinant of A, |A|.
4.3.1 Cramer's rule
If A is a square matrix with |A| ≠ 0, then Cramer's rule gives us an alternative method of solving a system of linear equations Ax = b.
Theorem 4.10 (Cramer's rule) If A is n × n, |A| ≠ 0, and b ∈ Rn, then the solution x = (x1, x2, . . . , xn)T of the linear system Ax = b is given by
\[
x_i = \frac{|A_i|}{|A|},
\]
where Ai is the matrix obtained from A by replacing the ith column with the vector b.
Before we prove this theorem, let's see how it works.
Example 4.9 Use Cramer's rule to find the solution of the linear system
x + 2y + 3z = 7
−x + 2y + z = −3
4x + y + z = 5.
In matrix form Ax = b, this system is
\[
\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \\ 5 \end{pmatrix}.
\]
We first check that |A| ≠ 0. This is the same matrix A for which we found the inverse in Example 4.8; |A| = −16. Then applying Cramer's rule, we find x by evaluating the determinant of the matrix obtained from A by replacing column 1 with b,
\[
x = \frac{1}{|A|}\begin{vmatrix} 7 & 2 & 3 \\ -3 & 2 & 1 \\ 5 & 1 & 1 \end{vmatrix} = \frac{-16}{-16} = 1,
\]
and in the same way we obtain y and z:
\[
y = \frac{1}{|A|}\begin{vmatrix} 1 & 7 & 3 \\ -1 & -3 & 1 \\ 4 & 5 & 1 \end{vmatrix} = \frac{48}{-16} = -3,
\qquad
z = \frac{1}{|A|}\begin{vmatrix} 1 & 2 & 7 \\ -1 & 2 & -3 \\ 4 & 1 & 5 \end{vmatrix} = \frac{-64}{-16} = 4,
\]
which can be easily checked by substitution into the original equations (or
multiplying Ax).
We now prove the theorem.
Proof
Since |A| ≠ 0, A−1 exists, and we can solve for x by multiplying Ax = b on the left by A−1. Then x = A−1b:
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{|A|}\begin{pmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
\]
The entry xi of the solution is equal to the ith row of this product:
\[
x_i = \frac{1}{|A|}(b_1 C_{1i} + b_2 C_{2i} + \dots + b_n C_{ni}).
\]
Stare at this expression a moment. The cofactors all come from row i of the adjoint
matrix, and they are the cofactors of column i of A, so this looks like a cofactor
expansion by column i of a matrix which is identical to A except in column i, where the
entries are the components of the vector b. That is, the term in brackets is the cofactor
expansion by column i of the matrix A with column i replaced by the vector b; this is
|Ai |.
Summary of Cramer's rule. To find xi,
(1) replace column i of A by b,
(2) evaluate the determinant of the resulting matrix,
(3) divide by |A|.
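These three steps can be sketched directly in Python; the function name cramer is our own, and the system solved is the one from Example 4.9:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b for square A with det(A) != 0, by Cramer's rule."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                       # (1) replace column i of A by b
        x[i] = np.linalg.det(Ai) / detA    # (2) evaluate |Ai|, (3) divide by |A|
    return x

A = np.array([[1., 2., 3.],
              [-1., 2., 1.],
              [4., 1., 1.]])
b = np.array([7., -3., 5.])
```

Calling cramer(A, b) reproduces the solution (1, −3, 4) found in the example.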
Activity 4.12 Can you think of another method to obtain the solution to Example 4.9? One way is to use the inverse matrix. Do this. We found A−1 in Example 4.8. Now use it to find the solution x of
\[
\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \\ 5 \end{pmatrix}
\]
by calculating x = A−1b.
Let x1 denote the total output of the water industry and x2 the total output of the electricity industry, both measured in $ value.
What is the total water used by the industries? The water industry uses $0.01 of water for each unit of its output, so a total of 0.01x1, and the electricity industry uses $0.21 of water for each unit of its output, so a total of 0.21x2. The total amount of water used by the industries is therefore 0.01x1 + 0.21x2. In the same way, the total amount of electricity used by the industries is 0.15x1 + 0.05x2. The totals can be expressed as
the industries is 0.15x1 + 0.05x2 . The totals can be expressed as
water
electricity
0.01 0.21
0.15 0.05
x1
x2
= Cx.
If the outside demands for water and electricity are d1 and d2, then each industry must produce enough to cover the amounts used by the industries plus the outside demand:
x1 − 0.01x1 − 0.21x2 = d1 (water)
x2 − 0.15x1 − 0.05x2 = d2 (electricity)
In matrix notation,
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} 0.01 & 0.21 \\ 0.15 & 0.05 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},
\]
or x − Cx = d, where
\[
d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}.
\]
If we use the fact that Ix = x, where I is the 2 × 2 identity matrix, then we can rewrite this system in matrix form as
Ix − Cx = d, or (I − C)x = d.
This is now in the usual matrix form for a system of linear equations. A solution, x,
to this system of equations will determine the output levels of each industry required
to satisfy all demands exactly.
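The system (I − C)x = d is an ordinary linear system, so it can be solved directly. A sketch in Python, using the consumption matrix above together with the demand vector d = (627, 4955)ᵀ from the worked activity later in this chapter:

```python
import numpy as np

C = np.array([[0.01, 0.21],     # water used per $1 of water / electricity output
              [0.15, 0.05]])    # electricity used per $1 of water / electricity output
d = np.array([627.0, 4955.0])   # outside demands for water and electricity

# Solve (I - C) x = d for the required output levels
x = np.linalg.solve(np.eye(2) - C, d)
```

The solution x = (1800, 5500) matches the answer obtained by row reduction in the activity feedback.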
Now let's look at the general case. Suppose we have an economy with n interdependent industries. If cij denotes the amount of industry i used by industry j to produce $1.00 of industry j, then the consumption or technology matrix is C = (cij):
\[
C = \begin{pmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{pmatrix}.
\]
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
say what is meant by an elementary matrix, and understand how they are used for
row operations
find the inverse of a matrix using row operations
find the determinant of a square matrix and use it to determine if a matrix is
invertible
find the inverse of a matrix using cofactors
solve a system of linear equations using Cramer's rule
say what is meant by the Leontief input-output model.
In addition you should know that:
There are three methods to solve Ax = b if A is n × n and |A| ≠ 0:
(1) Gaussian elimination,
(2) computing x = A−1b,
(3) Cramer's rule.
Exercises
Exercise 4.1
Use elementary row operations to find any inverses of the following matrices.
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 6 \end{pmatrix}, \qquad
C = \begin{pmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
Exercise 4.2
Using the inverse of the matrix A found in Exercise 4.1, solve Ax = b for each of the following vectors:
(a) b1 = 0
(b) b2 = 1
(c) b3 = 1 .
3
1
0
Be certain your solution for A−1 is correct (check that AA−1 = I first).
Exercise 4.3
Evaluate the following determinants using the cofactor expansion along an appropriate
row or column.
7 5
2 0
(b)
11 2
23 57
2 5 1
(a) 1 0 2
7 1 1
0
1
(d)
0
0
1
0
0
0
0
0
1
0
2 3
0 0
0 0
1 1
0
0
0
(e)
0
0
1
0
0
0
1
0
0
0
0
6
3
1
3
0
0
(c)
0
0
0
1
9
4
0
0
2
0
8
2
0
3
9
7
7
9
2
2
1
1
1
1
6
1
0
0
5
1
1
2
3
4
5
6
Exercise 4.4
Let
\[
B = \begin{pmatrix} 3 & t & 2 \\ 1 & 5 & 3 \\ 2 & 1 & 1 \end{pmatrix}.
\]
2
1
0
4
1
1
(b) 2
2
1
4 1 3 0
7 4 3 8
8 2 6 0
0 5 5 7
9 0 9 2
(c)
\[
\begin{vmatrix} 3 & 3a & 3a^2 \\ 2 & 2b & 2b^2 \\ 1 & c & c^2 \end{vmatrix}
\]
Exercise 4.6
For which values of λ is the matrix
\[
A = \begin{pmatrix} 2-\lambda & 3 \\ 2 & 1-\lambda \end{pmatrix}
\]
not invertible?
Exercise 4.7
2 0 3
1
B = 2
A= 0 3 1
1 4 2
0
0 2
1 3
1 1
1 2 0
C = 0 1 1
2 1 1
Exercise 4.9
Use Cramer's rule to find the values of x, y, z for system (a), and to find the value of z for system (b), where a, b are constants, a ≠ b, a ≠ −2b.
(a) x + y + z = 8, 2x + y − z = 3, −x + 2y + z = 3
(b) ax − ay + bz = a + b, bx − by + az = 0, ax + 2by + 3z = a − b
Exercise 4.10
Consider an economy with three industries,
i1 : water
i2 : electricity
i3 : gas
d1 = $40,000
d2 = $100,000
d3 = $72,000
(a) How much water, electricity and gas is needed to produce $1 worth of electricity?
(b) What should be the weekly production of each industry in order to satisfy all
demands exactly?
\[
\begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = E_2 E_1,
\]
where E1 represents 2R1 and E2 represents R1 + R2 . You should multiply the matrices in
the opposite order, E1 E2 , and notice the effect, thinking about the row operations on I.
Feedback to activity 4.2
The matrix E is the identity matrix after the
matrix is the identity matrix after R2 + 4R1 ,
1
1
E = 4
0
0 0
1 0.
0 1
2 1 3
(A|I) =
0 1 1
1
2 0
So
1
R3 +2R1
0
0
1
R3 5R2
0
0
1
R2 +R3
0
0
2
1
0
1 0 0
1
2 0
R1 R3
0 1 0
0 1 1
0 0 1
2 1 3
0 0 1
1
(1)R2
0 1 0 0
1 0 2
0
0
0 0 1 1R
1
8 3
1 0 1 0 0
0
8
1 5 2
0 0 0 1
1
R 2R
1
3
1
0 8 8 4 1 2 0
5
1
1 18
0
8
4
2 0
1 1
5 3
2 0
1 1
5 3
2
1
0
2 0
1 1
0 1
A1
Now check that AA1 = I.
0 0
1 0
0 1
2 6 4
1
= 1 3 2
8
1
5 2
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
1 0 2
0
0
1
8
28
1
8
1
8
0
1
1
0
6
8
1
2
1
4
1
4
5
8
38
5
8
1
4
When you carry out the row reduction, it is not necessary to always indicate the
separation of the two matrices by a line as we have done so far. You just need to keep
track of what you are doing.
In the calculation for the inverse of B, we have omitted the line but added a bit of
space to make it easier for you to read.
2 1 3
1 0 0
1 2 0
0 0 1
R1 R3
0 1 1
(B|I) = 0 1 1
0 1 0
0 1 0
1 2 0
0 0 1
2 1 3
1 0 0
0 0 1
1 2
0
(1)R2
0 1 0 0 1 1
1 0 2
0 3 3
1 2 0
0 0
1
R3 +3R2
0 1 1
0 1 0
0 0 0
1 3 2
1 2 0
R3 2R1
0 1 1
0 3 3
0 0
1
0 1 0
1 0 2
which indicates that the matrix B is not invertible; it is not row equivalent to the
identity matrix.
Feedback to activity 4.6
C13 = 13.
Feedback to activity 4.7
|M| = 1(8 3) 2(0 3) + 1(0 2) = 1
Feedback to activity 4.8
You should either expand by column 1 or row 2. For example, using column 1:
|M| = 1(8 − 3) − 1(6 − 2) = 1.
Feedback to activity 4.10
\[
|A| = \begin{vmatrix} 1 & 2 & 1 & 4 \\ 0 & 5 & 1 & 6 \\ 0 & -3 & -3 & -6 \\ 0 & 2 & -2 & -1 \end{vmatrix}
= \begin{vmatrix} 5 & 1 & 6 \\ -3 & -3 & -6 \\ 2 & -2 & -1 \end{vmatrix}.
\]
At this stage you can expand the 3 × 3 matrix using a cofactor expansion, or continue a bit more with row operations:
\[
|A| = 3\begin{vmatrix} 1 & 1 & 2 \\ 5 & 1 & 6 \\ 2 & -2 & -1 \end{vmatrix}
= 3\begin{vmatrix} 1 & 1 & 2 \\ 0 & -4 & -4 \\ 0 & -4 & -5 \end{vmatrix}
= 3\begin{vmatrix} -4 & -4 \\ -4 & -5 \end{vmatrix} = 3(4) = 12.
\]
Feedback to activity 4.11
|A| = −32 ≠ 0, and
\[
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = -\frac{1}{32}\begin{pmatrix} 28 & 4 & -12 \\ 0 & -8 & 0 \\ -20 & 4 & 4 \end{pmatrix} = \frac{1}{8}\begin{pmatrix} -7 & -1 & 3 \\ 0 & 2 & 0 \\ 5 & -1 & -1 \end{pmatrix}.
\]
For the system x − Cx = d with
\[
C = \begin{pmatrix} 0.01 & 0.21 \\ 0.15 & 0.05 \end{pmatrix}, \qquad d = \begin{pmatrix} 627 \\ 4955 \end{pmatrix},
\]
row reducing (I − C | d) gives
\[
\left(\begin{array}{cc|c} 1 & 0 & 1800 \\ 0 & 1 & 5500 \end{array}\right), \qquad \text{so} \quad x = \begin{pmatrix} 1800 \\ 5500 \end{pmatrix}.
\]
Comments on exercises
Solution to exercise 4.1
For the matrix A:
\[
(A|I) = \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 3 & 0 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right)
\stackrel{R_2-2R_1}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -1 & -6 & -2 & 1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right)
\stackrel{(-1)R_2}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 6 & 2 & -1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right)
\]
\[
\stackrel{R_3-R_2}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 6 & 2 & -1 & 0 \\ 0 & 0 & -4 & -2 & 1 & 1 \end{array}\right)
\stackrel{-\frac14 R_3}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 6 & 2 & -1 & 0 \\ 0 & 0 & 1 & \frac12 & -\frac14 & -\frac14 \end{array}\right)
\]
\[
\stackrel{\substack{R_2-6R_3 \\ R_1-3R_3}}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & -\frac12 & \frac34 & \frac34 \\ 0 & 1 & 0 & -1 & \frac12 & \frac32 \\ 0 & 0 & 1 & \frac12 & -\frac14 & -\frac14 \end{array}\right)
\stackrel{R_1-2R_2}{\to} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac32 & -\frac14 & -\frac94 \\ 0 & 1 & 0 & -1 & \frac12 & \frac32 \\ 0 & 0 & 1 & \frac12 & -\frac14 & -\frac14 \end{array}\right),
\]
so
\[
A^{-1} = \begin{pmatrix} \frac32 & -\frac14 & -\frac94 \\ -1 & \frac12 & \frac32 \\ \frac12 & -\frac14 & -\frac14 \end{pmatrix}.
\]
You should check that AA−1 = I.
For the matrix B:
\[
(B|I) = \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 3 & 0 & 0 & 1 & 0 \\ 0 & 1 & 6 & 0 & 0 & 1 \end{array}\right)
\stackrel{R_2-2R_1}{\to} \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -1 & -6 & -2 & 1 & 0 \\ 0 & 1 & 6 & 0 & 0 & 1 \end{array}\right).
\]
The next step will yield a row of zeros, therefore the matrix B is not invertible.
For C:
\[
(C|I) = \left(\begin{array}{cccc|cccc} 1 & 0 & 4 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \end{array}\right)
\stackrel{R_3 \leftrightarrow R_4}{\to} \left(\begin{array}{cccc|cccc} 1 & 0 & 4 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{array}\right)
\]
\[
\stackrel{R_1-4R_3}{\to} \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & -4 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{array}\right),
\]
so
\[
C^{-1} = \begin{pmatrix} 1 & 0 & 0 & -4 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
C is not an elementary matrix. Since two row operations were required to reduce it to the identity matrix, C is the product of two elementary matrices which correspond to (the inverses of) those two operations:
\[
E_1 = \begin{pmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
E_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad
C = E_2 E_1.
\]
Note that the order is important: C ≠ E1E2.
14
1
2
14
x1 = 1
1
2
1
x2 = A1 b2 = 1
0
21
1
4
3
7
0
=
2
2
14
3
14
94
and
x3 = A1 b3 =
14
1
2
14
2 3
5 2 3
2 3
0 0
= 20.
= 2 2 0 0 = 2(2)
1 1
0 0
57 1 1
1 1
(iii)
1
3
0
0
2
2
1
1
1
1
6
1
0
2 1 0
2 1 0
2 1 0
0
= 1 1 6 5 3 1 6 5 = 2 1 6 5
5
1 1 1
1 1 1
1 1 1
1
= 2 2
6 5
1 5
1
1 1
1 1
= 2(2(1) 1(4)) = 12
1
0
0
0
0
0
1
0
0
1 0 0
0
= 1 0 1 0 = 1,
0
0 0 1
1
or recognise that this is an elementary matrix, which is the identity matrix with row 1 and row 2 switched, so the determinant is equal to −1.
0
0
0
0
6
3
0
0
0
1
9
4
0
0
2
0
8
2
0
3
9
7
7
9
1
2
3
= 1
4
5
6
0
0
0
0
1
0
0
0
6
3
0
0
1
9
4
0
2
0
8
2
3
9
7 = (1)(3)
7
9
0
0
0
1
0
0
6
3
0
1
9
4
2
0
8
2
0 0 1
0 6
= (1)(3)(2) 0 6 9 = (1)(3)(2)(1)
= 36
1 3
1 3 4
9 16
9 16
= 6(25) = 150.
=6
1 1
6 6
4 1 3 0
7 4 3 8
8 2 6 0 = 0 since R3 = 2R1 .
0 5 5 7
9 0 9 2
(b)
1
1
2
2
1
(c)
\[
\begin{vmatrix} 3 & 3a & 3a^2 \\ 2 & 2b & 2b^2 \\ 1 & c & c^2 \end{vmatrix}
= 6\begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix}
= 6\begin{vmatrix} 1 & a & a^2 \\ 0 & b-a & b^2-a^2 \\ 0 & c-a & c^2-a^2 \end{vmatrix}
= 6\begin{vmatrix} b-a & b^2-a^2 \\ c-a & c^2-a^2 \end{vmatrix}
= 6(b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix}
= 6(b-a)(c-a)(c-b).
\]
\[
|A| = \begin{vmatrix} 2-\lambda & 3 \\ 2 & 1-\lambda \end{vmatrix} = (2-\lambda)(1-\lambda) - 6 = \lambda^2 - 3\lambda - 4 = (\lambda-4)(\lambda+1),
\]
so A is not invertible when λ = 4 or λ = −1.
Since A is a 3 × 3 matrix with |A| = 5,
\[
|2A| = 2^3 |A| = 8 \cdot 5 = 40 \qquad \text{and} \qquad |(2A)^{-1}| = \frac{1}{|2A|} = \frac{1}{2^3 |A|} = \frac{1}{40}.
\]
3 1
=2
4 2
C21 =
C31 =
C12 =
0 3
= (12)
4 2
C22 =
0 3
=9
3 1
Then
A1
0 1
= (1)
1 2
2 3
=1
1 2
C32 =
2 3
= (2)
0 1
2 12 9
1
= 1
1
2 .
5
3
8
6
C13 =
0 3
=3
1 4
C23 =
C33 =
2 0
= (8)
1 4
2 0
=6
0 3
1 0 2
|B| = 2 1 3 = 1(1 + 3) + 2(2) = 0, so B is not invertible.
0 1 1
1 2 0
|C| = 0 1 1 = 1(1 1) 2(0 2) = 2, so C is invertible.
2 1 1
Find the cofactors,
C11 = 2
C12 = (1)(2)
C31 = 2
C32 = (1)(1)
C22 = 1
C21 = (1)(2)
Then
C 1
2 2
2
1
= 2 1 1 .
2
2 3
1
C13 = 2
C23 = (1)(3)
C33 = 1
(a)
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 2 & 1 \end{pmatrix}, \qquad |A| = 1(3) - 1(1) + 1(5) = 7 \neq 0,
\]
with b = (8, 3, 3)T. Then
\[
x = \frac{1}{|A|}\begin{vmatrix} 8 & 1 & 1 \\ 3 & 1 & -1 \\ 3 & 2 & 1 \end{vmatrix} = \frac{8(3) - 1(6) + 1(3)}{7} = \frac{21}{7} = 3,
\]
\[
y = \frac{1}{|A|}\begin{vmatrix} 1 & 8 & 1 \\ 2 & 3 & -1 \\ -1 & 3 & 1 \end{vmatrix} = \frac{1(6) - 8(1) + 1(9)}{7} = \frac{7}{7} = 1,
\]
\[
z = \frac{1}{|A|}\begin{vmatrix} 1 & 1 & 8 \\ 2 & 1 & 3 \\ -1 & 2 & 3 \end{vmatrix} = \frac{1(-3) - 1(9) + 8(5)}{7} = \frac{28}{7} = 4.
\]
(b) For this part, it is very useful to use the fact that Theorem 4.7 concerning the
effects of row operations on a determinant is also true for the effects of the same
operations on the columns of a matrix. Indeed, if you think about the fact that
|A| = |AT |, this has to be so. We want z in the solution of Ax = b, where
\[
A = \begin{pmatrix} a & -a & b \\ b & -b & a \\ a & 2b & 3 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad b = \begin{pmatrix} a+b \\ 0 \\ a-b \end{pmatrix}.
\]
To evaluate z, you need to replace the last column of A with the vector b and then
evaluate the new determinant. So you can still use the same column operation,
(replacing C2 with C2 + C1 ) to simplify the determinant,
\[
z = \frac{1}{|A|}\begin{vmatrix} a & -a & a+b \\ b & -b & 0 \\ a & 2b & a-b \end{vmatrix}
= \frac{1}{|A|}\begin{vmatrix} a & 0 & a+b \\ b & 0 & 0 \\ a & a+2b & a-b \end{vmatrix}
= \frac{-(a+2b)\begin{vmatrix} a & a+b \\ b & 0 \end{vmatrix}}{|A|}
= \frac{(a+2b)\,b(a+b)}{-(a+2b)(a^2-b^2)},
\]
so that
\[
z = \frac{-b}{a-b}.
\]
Row reducing the augmented matrix (I − C | d), with d = (40,000, 100,000, 72,000)T, leads to
\[
(I - C \,|\, d) \to \dots \to \left(\begin{array}{ccc|c} 1 & 0 & 0 & 142{,}000 \\ 0 & 1 & 0 & 192{,}000 \\ 0 & 0 & 1 & 80{,}000 \end{array}\right),
\]
so
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 142{,}000 \\ 192{,}000 \\ 80{,}000 \end{pmatrix}:
\]
$142,000 water, $192,000 electricity and $80,000 gas.
Chapter 5
Rank, range and linear equations
Suggested reading
matrix in row echelon form begins with a leading one, this is equivalent to the following
definition.
Definition 5.2 The rank, rank(A), of a matrix A is the number of leading ones in a
row echelon matrix obtained from A by elementary row operations.
Generally, if A is an m × n matrix, then the number of non-zero rows (the number of leading ones) in a row echelon form of A can certainly be no more than the total number of rows, m. Furthermore, since the leading ones must be in different columns, the number of leading ones in the echelon form can be no more than the total number, n, of columns. Thus we have:
Theorem 5.1 For an m × n matrix A, rank(A) ≤ min{m, n}, where min{m, n} denotes the smaller of the two integers m and n.
Example 5.1 Consider the matrix
\[
M = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix}.
\]
Reducing this to row echelon form,
\[
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 1 & 2 & -3 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]
This last matrix is in row echelon form and has two non-zero rows (and two leading
ones), so the matrix M has rank 2.
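The rank can also be computed numerically. A sketch (matrix_rank counts the number of linearly independent rows, which equals the number of leading ones in an echelon form):

```python
import numpy as np

# The matrix M from Example 5.1
M = np.array([[1., 2., 1., 1.],
              [2., 3., 0., 5.],
              [3., 5., 1., 6.]])

print(np.linalg.matrix_rank(M))  # 2
```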
Activity 5.2 Show that the matrix
\[
B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix}
\]
has rank 3.
If a square matrix A of size n × n has rank n, then its reduced row echelon form has a leading one in every row and (since the leading ones are in different columns) a leading one in every column. Since every column with a leading one has zeros elsewhere, it follows that the reduced echelon form of A must be I, the n × n identity matrix. Conversely, if the reduced row echelon form of A is I, then by the definition of rank, A has rank n. We therefore have one more equivalent statement to add to our theorem:
Theorem 5.2 If A is an n × n matrix, then the following statements are equivalent.
A−1 exists.
Ax = b has a unique solution for any b ∈ Rn.
The reduced row echelon form of A is I.
rank(A) = n.
Example 5.2 Consider the system of equations
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 4.
The coefficient matrix A and the augmented matrix (A|b) reduce to row echelon form as follows:
\[
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 3 & 5 & 1 \end{pmatrix} \to \dots \to \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix},
\]
\[
(A|b) = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \dots \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
The rank of the coefficient matrix A is 2, but the rank of the augmented matrix (A|b) is 3, so this system is inconsistent.
If a linear system is consistent then there can be no leading one in the last column of
the reduced augmented matrix, for that would mean there was a row of the form
(0 0 . . . 0 1). Thus, a system Ax = b is consistent if and only if the rank of the
augmented matrix is precisely the same as the rank of the matrix A.
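This rank criterion is easy to apply numerically. A sketch (the helper name is_consistent is our own; the two right-hand sides are those of Examples 5.2 and 5.3):

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [2., 3., 0.],
              [3., 5., 1.]])

def is_consistent(A, b):
    """Ax = b is consistent iff rank(A|b) == rank(A)."""
    aug = np.column_stack([A, b])
    return np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)

b_bad = np.array([1., 5., 4.])   # right-hand side of Example 5.2 (no solution)
b_good = np.array([1., 5., 6.])  # right-hand side of Example 5.3 (consistent)
```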
Example 5.3 In contrast, consider the system of equations
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 6.
This system has the same coefficient matrix A as Example 5.2, and the rank of A is
2. The augmented matrix for the system is the matrix M in Example 5.1 on page
128, which also has rank 2, so this system is consistent. Since the rank is 2 and there
are 3 columns in A, there is a free variable and therefore infinitely many solutions.
Activity 5.3 Write down a general solution for this system to verify these remarks.
If an m × n matrix A has rank m, then there will be a leading one in every row of an echelon form of A, and in this case a system of equations Ax = b will never be inconsistent; it will be consistent for all b ∈ Rm. Why? There are two ways to see this. In the first place, if there is a leading one in every row of A, the augmented matrix (A|b) can never have a row of the form (0 0 . . . 0 1). Second, the augmented matrix also has m rows, so its size is m × (n + 1). So the rank of (A|b) can never be more than m.
Example 5.4 Consider again the matrix B from Activity 5.2 on page 128, which
we interpreted as the augmented matrix B = (A|b) in Example 5.2, and its row
echelon form:
\[
B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \dots \to \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
\[
(B|d) \to \dots \to \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & p_1 \\ 0 & 1 & 2 & -3 & p_2 \\ 0 & 0 & 0 & 1 & p_3 \end{array}\right)
\]
for some constants pi, which could be zero. This system will have infinitely many solutions for any d ∈ R3, because the number of columns is greater than the rank of B. There is one column without a leading one, so there is one non-leading variable.
Activity 5.4 If p1 = 1, p2 = −2 and p3 = 0, and x = (x1, x2, x3, x4)T, write down the solution to a given system Bx = d in vector form. Use this to determine the vector d in this case.
Suppose we have a consistent system, and suppose that the rank r is strictly less than
n, the number of unknowns. Then, as we have just seen in Example 5.4, the system in
reduced row echelon form (and hence the original one) does not provide enough
information to specify the values of x1, x2, . . . , xn uniquely. Let's consider this in more detail.
Example 5.5 Suppose we are given a system for which the augmented matrix
reduces to the row echelon form
\[
\left(\begin{array}{cccccc|c} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & -3 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).
\]
Here the rank (number of non-zero rows) is r = 3 which is strictly less than the
number of unknowns, n = 6.
Continuing to reduced row echelon form, we obtain
\[
\left(\begin{array}{cccccc|c} 1 & 3 & 0 & 4 & 2 & 0 & 28 \\ 0 & 0 & 1 & 2 & 0 & 0 & 14 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).
\]
Activity 5.5 Verify this. What are the additional two row operations which need to
be carried out?
The form of these equations tells us that we can assign any values to x2 , x4 and x5 ,
and then the leading variables will be determined. Explicitly, if we give x2 , x4 , x5 the
arbitrary values s, t, u, where s, t, u represent any real numbers, the solution is given
by
x1 = 28 − 3s − 4t − 2u, x2 = s, x3 = 14 − 2t, x4 = t, x5 = u, x6 = 5.
There are infinitely many solutions because the so-called free variables x2 , x4 , x5
can take any values s, t, u R.
Generally, we can describe what happens when the row echelon form has r < n non-zero
rows (0 0 . . . 0 1 . . . ). If the leading one is in the kth column, it is the coefficient
of the variable xk . So if the rank is r and the leading ones occur in columns c1 , c2 , . . . , cr
then the general solution to the system can be expressed in a form where the unknowns
xc1, xc2, . . . , xcr (the leading variables) are given in terms of the other n − r unknowns (the non-leading variables), and those n − r unknowns are free to take any values. In Example 5.5, we have n = 6 and r = 3, and the 3 variables x1, x3, x6 can be expressed in terms of the 6 − 3 = 3 free variables x2, x4, x5.
In the case r = n, where the number of leading ones r in the echelon form is equal to
the number of unknowns n, there is only one solution to the system for there is a
leading one in every column since the leading ones move to the right as we go down the
rows. In this case there is a unique solution obtained from the reduced echelon form. In
fact, this can be thought of as a special case of the more general one discussed above:
since r = n there are n r = 0 free variables, and the solution is therefore unique.
We can now summarise our conclusions thus far concerning a general linear system of m equations in n variables, written as Ax = b, where the coefficient matrix A is an m × n matrix of rank r.
If the echelon form of the augmented matrix has a row (0 0 . . . 0 a), with a ≠ 0, the original system is inconsistent; it has no solutions. In this case rank(A) = r < m and rank(A|b) = r + 1.
If the echelon form of the augmented matrix has no rows of the above type the
system is consistent, and the general solution involves n r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many solutions,
but when r = n there are no free variables and so there is a unique solution.
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} 28 - 3s - 4t - 2u \\ s \\ 14 - 2t \\ t \\ u \\ 5 \end{pmatrix}
\]
or
\[
x = \begin{pmatrix} 28 \\ 0 \\ 14 \\ 0 \\ 0 \\ 5 \end{pmatrix} + s\begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + u\begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad s, t, u \in \mathbb{R},
\]
where
\[
p = \begin{pmatrix} 28 \\ 0 \\ 14 \\ 0 \\ 0 \\ 5 \end{pmatrix}, \quad
v_1 = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}.
\]
Applying the same method generally to a consistent system of rank r with n unknowns, we can express the general solution of a consistent system Ax = b in the form
x = p + a1v1 + a2v2 + · · · + an−r vn−r.
Note that, if we put all the ai s equal to 0, we get a solution x = p, which means that
Ap = b, so p is a particular solution of the system. Putting a1 = 1 and the remaining
ai s equal to zero, we get a solution x = p + v1 , which means that A(p + v1 ) = b. Thus
b = A(p + v1 ) = Ap + Av1 = b + Av1 .
Comparing the first and last expressions, we see that Av1 = 0. Clearly, the same equation holds for v2, . . . , vn−r. So we have proved the following.
If A is an m × n matrix of rank r, the general solution of Ax = b is the sum of:
a particular solution p of the system Ax = b and
a linear combination a1v1 + a2v2 + · · · + an−r vn−r of solutions v1, v2, . . . , vn−r of the homogeneous system Ax = 0.
If A has rank n, then Ax = 0 only has the solution x = 0, and so Ax = b has a unique
solution: p + 0 = p.
This is a more precise form of the result of Theorem 3.2, which states that all solutions
of a consistent system Ax = b are of the form x = p + z where p is any solution of
Ax = b and z N(A), the null space of A (the set of all solutions of Ax = 0).
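The decomposition "particular solution plus null-space solution" can be checked numerically. A sketch, using the system of Example 5.3 with the particular solution and homogeneous solution found in the activity feedback:

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [2., 3., 0.],
              [3., 5., 1.]])
b = np.array([1., 5., 6.])

p = np.array([7., -3., 0.])   # a particular solution: Ap = b
v = np.array([3., -2., 1.])   # a homogeneous solution: Av = 0

# Every p + t*v also solves Ax = b
x = p + 2.0 * v
```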
Activity 5.6 Solve the following system of equations Ax = b by reducing the augmented matrix to reduced row echelon form:
x1 − x2 + x3 + x4 + 2x5 = 4
−x1 + x2 + x4 − x5 = 3
x1 − x2 + 2x3 + 3x4 + 4x5 = 7.
Show that your solution can be written in the form p + su1 + tu2 where Ap = b, Au1 = 0 and Au2 = 0.
5.4 Range
The range of a matrix A is defined as follows.
Definition 5.3 (Range of a matrix) Suppose that A is an m × n matrix. Then the range of A, denoted by R(A), is the subset
R(A) = {Ax | x ∈ Rn}
of Rm. That is, the range is the set of all vectors y ∈ Rm of the form y = Ax for some x ∈ Rn.
What is the connection between the range of a matrix A and a system of linear equations Ax = b? If A is m × n, then x ∈ Rn and b ∈ Rm. If the system Ax = b is consistent, then this means that there is a vector x ∈ Rn such that Ax = b, so b is in the range of A. Conversely, if b is in the range of A, then the system Ax = b must have a solution. Therefore, we have shown that for an m × n matrix A:
The range of A, R(A), consists of all vectors b ∈ Rm for which the system of equations Ax = b is consistent.
Let's look at R(A) from a different point of view. Suppose that the columns of A are c1, c2, . . . , cn. Then we may write A = (c1 c2 . . . cn). If x = (α1, α2, . . . , αn)T ∈ Rn, then the product Ax is equal to
α1c1 + α2c2 + · · · + αncn.
Activity 5.8 Convince yourself of this last statement. Write out each side using ci = (c1i, c2i, . . . , cmi)T to show that
Ax = α1c1 + α2c2 + · · · + αncn.
Try to do this yourself before looking at the solution to this activity. This is a very
important result which will be used many times in this course, so make sure you
understand how it works.
So, R(A), the set of all matrix products Ax, is also the set of all linear combinations of
the columns of A. For this reason R(A) is also called the column space of A. (More on
this in Chapter 7.)
If A = (c1 c2 . . . cn), where ci denotes column i of A, then we can write
R(A) = {a1c1 + a2c2 + · · · + ancn | a1, a2, . . . , an ∈ R}.
Example 5.6 Suppose that
\[
A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}.
\]
Then for x = (α1, α2)T,
\[
Ax = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
= \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix}
= \alpha_1\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix},
\]
so
\[
R(A) = \left\{ \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} : \alpha_1, \alpha_2 \in \mathbb{R} \right\}
\quad\text{or}\quad
R(A) = \{\alpha_1 c_1 + \alpha_2 c_2 \mid \alpha_1, \alpha_2 \in \mathbb{R}\}.
\]
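The identity Ax = α1c1 + α2c2 underlying this example is easy to verify numerically. A sketch (the vector x below is an arbitrary illustrative choice):

```python
import numpy as np

# The matrix A from Example 5.6
A = np.array([[1., 2.],
              [-1., 3.],
              [2., 1.]])
x = np.array([5., -4.])

# Ax equals the linear combination x1*c1 + x2*c2 of the columns of A
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
```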
Example 5.7 Consider the following systems of three equations in two unknowns.
x + 2y = 0        x + 2y = 1
−x + 3y = 5       −x + 3y = 5
2x + y = −3       2x + y = 2
Solving these by Gaussian elimination (or any other method) you will find that the first system is consistent and the second system has no solution. The first system has the unique solution (x, y)T = (−2, 1)T.
Activity 5.10 Write the first system in matrix notation as Ax = b, where
\[
A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ 5 \\ -3 \end{pmatrix}.
\]
Checking this,
\[
Ax = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \\ -3 \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} 0 \\ 5 \\ -3 \end{pmatrix} = -2\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = -2c_1 + c_2.
\]
On the other hand, it is not possible to express the vector (1, 5, 2)T as a linear
combination of the column vectors of A. Trying to do so would lead to precisely the
same set of inconsistent equations.
Notice, also, that the homogeneous system Ax = 0 has only the trivial solution, and
that the only way to express 0 as a linear combination of the columns of A is by
0c1 + 0c2 = 0.
Activity 5.11
Activity 5.12 Look at your solution to Activity 5.6 on page 134, and express the
vector b = (4, 3, 7)T as a linear combination of the columns of the coefficient matrix
\[
A = \begin{pmatrix} 1 & -1 & 1 & 1 & 2 \\ -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & 2 & 3 & 4 \end{pmatrix}.
\]
Do the same for the vector 0.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
find a general solution to a linear system, Ax = b, expressed in vector notation as
the sum of a particular solution plus a general solution to the associated
homogeneous system Ax = 0
explain why a general solution x to Ax = b, where A is an m × n matrix of rank r, is of the form x = p + a1v1 + a2v2 + · · · + an−r vn−r, ai ∈ R; specifically why there are n − r arbitrary constants
explain what is meant by the rank of a matrix and by the range of a matrix, be
able to find the rank of a matrix
show that if A = (c1 c2 . . . cn), and if x = (α1, α2, . . . , αn)T ∈ Rn, then Ax = α1c1 + α2c2 + · · · + αncn
write b as a linear combination of the columns of A if Ax = b is consistent
write 0 as a linear combination of the columns of A, and explain when it is possible
to do this in some way other than using the trivial solution, x = 0, with all the
coefficients in the linear combination equal to zero.
Exercises
Exercise 5.1
Find the rank of the matrix
A = ( 1 0 1 0 2 )
    ( 2 1 1 1 3 )
    ( 1 3 1 2 2 ).
    ( 0 3 2 2 0 )
Exercise 5.2

Consider the system of equations Ax = b, where

A = ( 1 2 0 )        ( x )        ( 2 )
    ( 5 1 λ ),  x =  ( y ),  b =  ( 7 ).
    ( 1 1 1 )        ( z )        ( μ )

In case (a), use Cramer's rule to find the value of z in terms of λ and μ. In case
(c), solve the system using row operations and express the solution in vector form,
x = p + tv.
Exercise 5.3
A system of linear equations Bx = d is known to have the following general solution:
    ( 1 )     ( 3 )     (  1 )
x = ( 0 ) + s ( 1 ) + t (  0 ),   s, t ∈ R.
    ( 2 )     ( 0 )     ( -1 )
    ( 0 )     ( 0 )     (  1 )

Let c1 = (1, 1, 2)T be the first column of B. If d = (3, 5, -2)T, find the matrix B.
Exercise 5.4
A = ( 1 2 1 )
    ( 2 3 0 ).
    ( 3 5 1 )
Find a condition the components of the vector b = (a, b, c)T must satisfy in order for
Ax = b to be consistent. Hence, or otherwise, show that R(A) is a plane in R3 , and
write down a Cartesian equation of this plane.
Show that d = (1, 5, 6)T is in R(A). Express d as a linear combination of the columns of
A. Is it possible to do this in two different ways? If the answer is yes, then do so;
otherwise justify why this is not possible.
Exercise 5.5
Consider the matrices
    ( 1 -1  1 )        ( 2 3 2 5 )        (  4 )
A = ( 0  1  2 ),   B = ( 3 6 9 6 ),   b = ( -1 ).
    ( 2  1  8 )        ( 2 9 1 9 )        (  a )
    ( 3 -1  7 )        ( 5 6 9 4 )        (  b )
    (  7 )     ( -3 )
x = ( -3 ) + t (  2 ),   t ∈ R.
    (  0 )     ( -1 )
Feedback to activity 5.3
Substitute for p1 , p2 , p3 in the row echelon form of the augmented matrix and then
continue to reduce it to reduced row echelon form. The non-leading variable is x3 .
Letting x3 = t, the general solution is of the form

x = p + tv,   t ∈ R,

with the particular solution p and the vector v read from the reduced matrix.
Since Bp = d, multiplying out Bp you will find that d = (1, 4, 5)T. (You can check all
this by row reducing (B|d).)
Feedback to activity 5.5
Put the augmented matrix (A|b) into reduced row echelon form. The indicated row
operations lead to a matrix in which the leading variables are x1, x3 and x5, and the
non-leading variables are x2 and x4.
Set the non-leading variables to arbitrary constants: x2 = s, x4 = t, and solve for the
leading variables in terms of these parameters, starting with the bottom row. For
s, t ∈ R,

x5 = 2,   x4 = t,   x3 = 1 - 2t,   x2 = s,   x1 = 1 + s + t

so

    ( x1 )   ( 1 + s + t )   ( 1 )     ( 1 )     (  1 )
    ( x2 )   (     s     )   ( 0 )     ( 1 )     (  0 )
x = ( x3 ) = (  1 - 2t   ) = ( 1 ) + s ( 0 ) + t ( -2 ) = p + su1 + tu2.
    ( x4 )   (     t     )   ( 0 )     ( 0 )     (  1 )
    ( x5 )   (     2     )   ( 2 )     ( 0 )     (  0 )
Verify: multiplying out the products, you can check that Ap = (4, 3, 7)T = b,
Au1 = 0 and Au2 = 0.
For A = (cij) an m × n matrix and x = (α1, α2, . . . , αn)T, we have

     ( c11 c12 ... c1n ) ( α1 )   ( c11α1 + c12α2 + ... + c1nαn )
Ax = ( c21 c22 ... c2n ) ( α2 ) = (             ...             )
     (  .   .       .  ) ( .. )   (                             )
     ( cm1 cm2 ... cmn ) ( αn )   ( cm1α1 + cm2α2 + ... + cmnαn )

   = ( c11α1 )   ( c12α2 )         ( c1nαn )
     ( c21α1 ) + ( c22α2 ) + ... + ( c2nαn )
     (   .   )   (   .   )         (   .   )
     ( cm1α1 )   ( cm2α2 )         ( cmnαn )

        ( c11 )      ( c12 )           ( c1n )
   = α1 ( c21 ) + α2 ( c22 ) + ... + αn ( c2n )
        (  .  )      (  .  )           (  .  )
        ( cm1 )      ( cm2 )           ( cmn )

   = α1c1 + α2c2 + ... + αncn.
All these steps are reversible, so the expression
α1c1 + α2c2 + ... + αncn
For the case in hand, this gives

( 1 )   ( 1 )     ( 1 )   ( 4 )
( 1 ) + ( 0 ) + 2 ( 1 ) = ( 3 ).
( 1 )   ( 2 )     ( 2 )   ( 7 )
Similarly, since Au1 = 0 and Au2 = 0, any linear combination of these two vectors will
give a vector v = su1 + tu2 for which Av = 0, and you can rewrite Av as a linear
combination of the columns of A. For example, taking u1 ,
          ( 1 )   ( -1 )   ( 0 )
c1 + c2 = ( 1 ) + ( -1 ) = ( 0 ) = 0.
          ( 1 )   ( -1 )   ( 0 )
Comments on exercises
Solution to exercise 5.1
Using row operations, the matrix A reduces to
echelon form, and then to the reduced row echelon form

A → . . . → ( 1 0 0 1  1 )
            ( 0 1 0 0 -2 )
            ( 0 0 1 1  3 ),
            ( 0 0 0 0  0 )

so the rank of A is 3.
The leading variables are x1 , x2 , and x3 . Set the non-leading variables x4 = s and
x5 = t. Then the solution is
    ( x1 )   ( -s - t  )     ( -1 )     ( -1 )
    ( x2 )   (    2t   )     (  0 )     (  2 )
    ( x3 ) = ( -s - 3t ) = s ( -1 ) + t ( -3 ),   s, t ∈ R.
    ( x4 )   (    s    )     (  1 )     (  0 )
    ( x5 )   (    t    )     (  0 )     (  1 )
So the null space consists of all vectors of the form x = sv1 + tv2 , where v1 and v2 are
the vectors displayed above. It is a subset of R5 .
The range of A can be described as the set of all linear combinations of the columns of
A,
R(A) = {α1c1 + α2c2 + α3c3 + α4c4 + α5c5 | αi ∈ R}
where
     ( 1 )       ( 0 )       ( 1 )       ( 0 )       ( 2 )
c1 = ( 2 ),  c2 = ( 1 ),  c3 = ( 1 ),  c4 = ( 1 ),  c5 = ( 3 ).
     ( 1 )       ( 3 )       ( 1 )       ( 2 )       ( 2 )
     ( 0 )       ( 3 )       ( 2 )       ( 2 )       ( 0 )
This is a subset of R4 . We will find a better way to describe this set when we look at
the column space of a matrix in later chapters.
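Rank computations like the one in this solution can be automated. The following is a rough sketch (not part of the guide) of Gaussian elimination that reduces a matrix to echelon form and counts the non-zero rows:

```python
def rank(A, tol=1e-9):
    """Rank of a matrix: reduce to echelon form by row operations
    and count the rows that contain a pivot."""
    A = [row[:] for row in A]            # work on a copy
    m, n = len(A), len(A[0])
    r = 0                                # index of the next pivot row
    for col in range(n):
        # Find a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, m) if abs(A[i][col]) > tol), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]  # swap the pivot row into place
        for i in range(r + 1, m):        # eliminate entries below the pivot
            factor = A[i][col] / A[r][col]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# The identity matrix has full rank; a matrix with a repeated row does not.
print(rank([[1, 0], [0, 1]]))            # 2
print(rank([[1, 2, 3], [2, 4, 6]]))      # 1
```

The count of pivot rows equals the number of leading variables, which is exactly the rank used throughout this chapter.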
Solution to exercise 5.2

In case (a), apply Cramer's rule: z is a ratio of two 3 × 3 determinants. In case (c),
row reduce the augmented matrix (A|b) to reduced row echelon form; there is one
non-leading variable.
The solution can now be read from the matrix. Setting the non-leading variable z = t,
x = (x, y, z)T = p + tv,   t ∈ R,

with the particular solution p and the vector v read from the reduced matrix.
Solution to exercise 5.3
The matrix B must be 3 × 4 since the solutions are in R4 and the vector c1 ∈ R3. Write

B = ( c1 c2 c3 c4 ),   where c1 = (1, 1, 2)T.
The solution is of the form x = p + sv1 + tv2 where v1 , v2 are in N(B), therefore you
know that Bp = d, Bv1 = 0 and Bv2 = 0. Regarding the matrix products as linear
combinations of the column vectors of B, we obtain
Bp = c1 + 2c3 = d
Bv1 = 3c1 + c2 = 0
Bv2 = c1 - c3 + c4 = 0.
Knowing c1 , you just need to solve these for the other three columns.
               (  3 )   ( 1 )   (  2 )              (  1 )
2c3 = d - c1 = (  5 ) - ( 1 ) = (  4 ),   so   c3 = (  2 ),
               ( -2 )   ( 2 )   ( -4 )              ( -2 )

               ( 1 )   ( -3 )
c2 = -3c1 = -3 ( 1 ) = ( -3 ),
               ( 2 )   ( -6 )

               (  1 )   ( 1 )   (  0 )
c4 = c3 - c1 = (  2 ) - ( 1 ) = (  1 ).
               ( -2 )   ( 2 )   ( -4 )
The matrix B is
B = ( 1 -3  1  0 )
    ( 1 -3  2  1 ).
    ( 2 -6 -2 -4 )
You can check your answer by row reducing the augmented matrix (B|d) to obtain the
solution of Bx = d, and matching it to the solution given.
Solution to exercise 5.4
You might have noticed that this is the same coefficient matrix A as we used in the
examples on page 130. You can easily tackle this question by forming the augmented
matrix and reducing it using row operations,
(A|b) = ( 1 2 1 | a )    ( 1  2  1 | a      )
        ( 2 3 0 | b ) →  ( 0 -1 -2 | b - 2a ).
        ( 3 5 1 | c )    ( 0 -1 -2 | c - 3a )
After this first step it is clear that the system will be consistent if and only if
b - 2a = c - 3a, or a + b - c = 0.
Therefore, the vector y = (x, y, z)T is in R(A) if and only if x + y - z = 0. This is the
Cartesian equation of a plane in R3 .
The vector d = (1, 5, 6)T is in R(A), since its components satisfy the equation. This was
also shown in Example 5.3 on page 130. Looking back at these examples you will notice
that the vector (1, 5, 4)T, for which the system is inconsistent, is not in the plane R(A).
For Activity 5.3 on page 130 you found a general solution of the system of equations
Ax = d to be
    (  7 )     ( -3 )
x = ( -3 ) + t (  2 ),   t ∈ R.
    (  0 )     ( -1 )
Any solution x will enable you to write d as a linear combination of the columns of A.
For example, taking first t = 0 and then t = 1, d = 7c1 - 3c2 or d = 4c1 - c2 - c3;
that is,

( 1 )     ( 1 )     ( 2 )        ( 1 )     ( 1 )   ( 2 )   ( 1 )
( 5 ) = 7 ( 2 ) - 3 ( 3 )   or   ( 5 ) = 4 ( 2 ) - ( 3 ) - ( 0 ).
( 6 )     ( 3 )     ( 5 )        ( 6 )     ( 3 )   ( 5 )   ( 1 )
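Both representations of d can be verified directly. This quick sketch uses the columns of the matrix A from this exercise:

```python
# d = (1, 5, 6) should equal both 7c1 - 3c2 and 4c1 - c2 - c3,
# where c1, c2, c3 are the columns of A = [[1,2,1],[2,3,0],[3,5,1]].
c1, c2, c3 = [1, 2, 3], [2, 3, 5], [1, 0, 1]

first  = [7 * a - 3 * b for a, b in zip(c1, c2)]
second = [4 * a - b - c for a, b, c in zip(c1, c2, c3)]

print(first)    # [1, 5, 6]
print(second)   # [1, 5, 6]

# Both also satisfy the plane equation x + y - z = 0 for R(A).
x, y, z = first
print(x + y - z)   # 0
```

Two different coefficient vectors give the same d precisely because the system Ax = d has infinitely many solutions.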
Note that any two columns of A, or any two non-parallel vectors in R(A), can be used
to obtain a vector equation of the plane R(A); that is, for example,
      ( 1 )     ( 2 )
x = s ( 2 ) + t ( 3 ),   s, t ∈ R.
      ( 3 )     ( 5 )
Solution to exercise 5.5

You need to put the matrix into row echelon form to answer the first question, and into
reduced row echelon form for the second:

A = ( 1 -1  1 )    ( 1 -1 1 )    ( 1 0 3 )
    ( 0  1  2 ) →  ( 0  1 2 ) →  ( 0 1 2 ).
    ( 2  1  8 )    ( 0  0 0 )    ( 0 0 0 )
    ( 3 -1  7 )    ( 0  0 0 )    ( 0 0 0 )
(a) The rank of A is 2. There is one non-leading variable. If you write x = (x, y, z)T ,
then setting z = t, you will obtain the solution
      ( -3 )
x = t ( -2 ),   t ∈ R.
      (  1 )
Since there are non-trivial solutions of Ax = 0, it is possible to express 0 as a linear
combination of the columns of A with non-zero coefficients. A non-trivial linear
combination of the column vectors which is equal to the zero vector is given by any
non-zero vector in the null space. For example, using t = 1, the product Ax yields,
                      ( 1 )     ( -1 )   ( 1 )   ( 0 )
-3c1 - 2c2 + c3 = -3 ( 0 ) - 2 (  1 ) + ( 2 ) = ( 0 ) = 0.
                      ( 2 )     (  1 )   ( 8 )   ( 0 )
                      ( 3 )     ( -1 )   ( 7 )   ( 0 )
(b) Reducing the augmented matrix,

( 1 -1  1 |  4 )    ( 1 -1  1 | 4      )    ( 1 -1  1 | 4      )
( 0  1  2 | -1 ) →  ( 0  1  2 | -1     ) →  ( 0  1  2 | -1     ).
( 2  1  8 |  a )    ( 0  3  6 | a - 8  )    ( 0  0  0 | a - 5  )
( 3 -1  7 |  b )    ( 0  2  4 | b - 12 )    ( 0  0  0 | b - 10 )

Therefore, Ax = b is consistent if and only if a = 5 and b = 10. In that case,
continuing to reduced echelon form,

( 1 0 3 |  3 )
( 0 1 2 | -1 ),
( 0 0 0 |  0 )
( 0 0 0 |  0 )

so, setting z = t, the general solution is

    (  3 )     ( -3 )
x = ( -1 ) + t ( -2 ),   t ∈ R.
    (  0 )     (  1 )
For the matrix B, row operations reduce B to an upper triangular matrix, and you will
find that

det(B) = 450.
Since det(B) ≠ 0, the rank of B is 4. Therefore, the main theorem tells us that Bx = 0
has only the trivial solution, so there is no way to write 0 as a linear combination
of the column vectors of B except the trivial way, with all coefficients equal to 0.
Also, using this theorem, Bx = b has a unique solution for all b ∈ R4. Therefore
R(B) = R4. That is, a and b can be any real numbers: the system Bx = b is always
consistent.
Chapter 6
Sequences, series and difference
equations
Suggested reading
Aims of the chapter
In this chapter we make a slight detour into the topic of sequences, series and difference
equations (also known as recurrence equations). Many problems in economics and
finance involve difference equations, particularly those involving quantities which
change with time, but not continuously (such as the balance of a deposit account where
interest is paid once a year, at the end of the year). This chapter is independent of the
other chapters so far, but the material is important in its own right and, moreover, we
will see later that matrices and linear algebra can be used to solve systems of difference
equations.
6.1 Sequences
6.1.1 Sequences in general

6.1.2 Arithmetic progressions
The arithmetic progression with first term a and common difference d has its terms
given by the formula yt = a + dt. For example, the arithmetic progression with first
term 5 and common difference 3 is 5, 8, 11, 14, . . .. Note that yt is obtained from yt-1 by
adding the common difference d. In symbols, yt = yt-1 + d.
6.1.3 Geometric progressions
Another very important type of sequence is the geometric progression. The geometric
progression with first term a and common ratio x is given by the formula yt = ax^t.
Notice that successive terms are related through the relationship yt = x yt-1. For
example, the geometric progression with first term 3 and common ratio 1/2 is given by
yt = 3(1/2)^t; that is, the sequence is 3, 3/2, 3/4, 3/8, . . ..
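The two kinds of progression are easy to generate in code; a short illustrative sketch (the helper names are my own, not from the guide):

```python
def arithmetic(a, d, n):
    """First n terms of the arithmetic progression y_t = a + d*t."""
    return [a + d * t for t in range(n)]

def geometric(a, x, n):
    """First n terms of the geometric progression y_t = a * x**t."""
    return [a * x ** t for t in range(n)]

# The two progressions used as examples in the text:
print(arithmetic(5, 3, 4))    # [5, 8, 11, 14]
print(geometric(3, 0.5, 4))   # [3.0, 1.5, 0.75, 0.375]
```

Each term of the first list exceeds its predecessor by the common difference 3; each term of the second is half its predecessor.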
6.1.4 Compound interest

6.1.5 Frequent compounding
What happens if interest is added more frequently than once a year? Suppose, for
example, that instead of 8% interest paid at the end of the year, we have 4% interest
added twice-yearly, once at the middle of the year and once at the end. If $100 is
invested, the amount after one year will be
100(1 + 0.04)^2 = 108.16
dollars, which is slightly more than the $108 which results from the single annual
addition. If the interest is added quarterly (so that 2% is added four times a year), the
amount after one year will be
100(1 + 0.02)^4 = 108.24
dollars (approximately). In general, when the year is divided into m equal periods, the
rate is r/m over each period, and the balance after one year is
P(1 + r/m)^m,

where P is the initial deposit.
Taking m larger and larger (formally, letting m tend to infinity) we find ourselves in
the situation of continuous compounding. Now, it is a standard fact (that we won't
verify here) that, as m gets larger and larger, tending to infinity,

(1 + r/m)^m

approaches e^r, where e is the base of the natural logarithm. (See the subject guide for
MT1174 Calculus.) Formally,

lim (1 + r/m)^m = e^r.
m→∞
So the balance after one year should be Pe^r. If invested for a further year, we would
have Pe^r · e^r = P(e^r)^2 = Pe^(2r). After t years of continuous compounding, the balance
of the account would be Pe^(rt).
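The effect of compounding more and more frequently, and the continuous limit Pe^r, can be seen numerically. A sketch using the figures from the worked example above:

```python
import math

def balance_after_one_year(P, r, m):
    """Balance when interest at annual rate r is compounded m times a year."""
    return P * (1 + r / m) ** m

P, r = 100, 0.08
for m in (1, 2, 4, 12, 365):
    print(m, round(balance_after_one_year(P, r, m), 2))

# As m grows, the balance approaches the continuous-compounding limit P*e^r.
print(round(P * math.exp(r), 2))
```

The m = 1 and m = 2 lines reproduce the $108 and $108.16 computed in the text, and every finite m gives slightly less than the continuous limit.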
6.2 Series
Let us continue with the story of our investor. It is natural to investigate how the
balance varies if the investor adds a certain amount to the account each year. Suppose
that they add $P to the account at the beginning of each year, so that at the beginning
of the first year the balance is $P . At the beginning of the second year the balance in
dollars will be $P (1 + r) + P ; this represents the money from the first year with interest
added, and the new, further, deposit of $P . Convince yourself that, continuing in this
way, the balance at the beginning of year t is, in dollars,
P + P(1 + r) + ... + P(1 + r)^(t-2) + P(1 + r)^(t-1).
How can we calculate this expression? Note that it is the sum of the first t terms (that
is, term 0 to term t - 1) of the geometric progression with first term P and common
ratio 1 + r. Before coming back to this, we shall discuss such things in a more general
setting.
Given a sequence y0, y1, y2, y3, . . ., a finite series is a sum of the form

y0 + y1 + ... + yt-1,
the first t terms added together, for some number t. There are two important results
about series, concerning the cases where the corresponding sequence is an arithmetic
progression (in which case the series is called an arithmetic series) and where it is a
geometric progression (in which case the series is called a geometric series).
6.2.1 Arithmetic series
The sum St of the first t terms of the arithmetic progression with first term a and
common difference d is

St = t(2a + (t - 1)d)/2.

There is a useful way of remembering this result. Notice that St may be rewritten as

St = t (y0 + yt-1)/2 = t (a + (a + (t - 1)d))/2,

so that we have the following easily remembered result: an arithmetic series has value
equal to the number of terms, t, times the value of the average of the first and last
terms, (y0 + yt-1)/2. Equivalently, the average value St/t of the t terms is the average,
(y0 + yt-1)/2, of the first and last terms.
Activity 6.2 Find the sum of the first n terms of an arithmetic series whose first
term is 1 and whose common difference is 5.
6.2.2 Geometric series
We now look at geometric series. It is easily checked (by multiplying out the expression)
that, for any x,

(1 - x)(1 + x + x^2 + ... + x^(t-1)) = 1 - x^t.

So, if x ≠ 1 and yt = ax^t, then the geometric series y0 + y1 + ... + yt-1 has sum

a(1 - x^t)/(1 - x).
Example 6.1 In our earlier discussion on savings accounts, we came across the
expression

P + P(1 + r) + ... + P(1 + r)^(t-2) + P(1 + r)^(t-1).

We now see that this is a geometric series with t terms, first term P and common
ratio 1 + r. Therefore it equals

P(1 - (1 + r)^t)/(1 - (1 + r)) = (P/r)((1 + r)^t - 1).
y1 = 2y0 + 1 = 2(1) + 1 = 2 + 1
y2 = 2y1 + 1 = 2(2 + 1) + 1 = 2^2 + 2 + 1
y3 = 2y2 + 1 = 2(2^2 + 2 + 1) + 1 = 2^3 + 2^2 + 2 + 1
y4 = 2y3 + 1 = 2(2^3 + 2^2 + 2 + 1) + 1 = 2^4 + 2^3 + 2^2 + 2 + 1.
x^t → 0  as t → ∞,   or   lim x^t = 0.
                          t→∞

We notice that, while x^t gets closer and closer to 0 for all values of x in the range
-1 < x < 1, its behaviour depends to some extent on whether x is positive or negative.
When x is negative, the terms are alternately positive and negative, and we say that the
approach to zero is oscillatory. For example, when x = -0.2, the sequence x^t is

-0.2, 0.04, -0.008, 0.0016, -0.00032, 0.000064, -0.0000128, 0.00000256, . . .

When x is less than -1, the sequence is again oscillatory, but it does not approach any
limit, the terms being alternately large-positive and large-negative. In this case, we say
that it oscillates increasingly.
As an application of this, let us consider again the geometric series

St = a + ax + ax^2 + ... + ax^(t-1).

We have

St = a(1 - x^t)/(1 - x).

If -1 < x < 1 then x^t → 0 as t → ∞. This means that St approaches the number
a(1 - 0)/(1 - x) = a/(1 - x), as t increases. In other words,

St → a/(1 - x)   as t → ∞.

We call this limit the sum to infinity of the sequence given by yt = ax^t. Note that a
geometric sequence has a finite sum to infinity only if the common ratio is strictly
between -1 and 1.
Example 6.3 Consider the sequence with yi = 1/2^i for i ≥ 0. The sum of the first t
terms of this sequence is

St = 1 + 1/2 + 1/2^2 + ... + 1/2^(t-1).

By the formula, this is (1 - (1/2)^t)/(1 - 1/2) = 2(1 - (1/2)^t), so St → 2 as t → ∞.
Activity 6.4 Find the sum of the first t terms of the geometric series

2/3 + (2/3)^2 + (2/3)^3 + ... + (2/3)^t,

and determine its behaviour as t → ∞.
Example 6.4 John has opened a savings account with a bank, and they pay a
fixed interest rate of 5% per annum, with the interest paid once a year, at the end of
the year. He opened the savings account with a payment of $100 on 1 January 2003,
and will be making deposits of $200 yearly, on the same date. What will his savings
be after he has made N of these additional deposits? (Your answer will be an
expression involving N.)
If yN is the required amount, then we have
y1 = (1.05)100 + 200,
y2 = (1.05)y1 + 200 = 100(1.05)^2 + 200(1.05) + 200,

and, in general, we can spot the pattern and observe that

yN = 100(1.05)^N + 200(1.05)^(N-1) + 200(1.05)^(N-2) + ... + 200(1.05) + 200
   = 100(1.05)^N + 200(1 + (1.05) + (1.05)^2 + ... + (1.05)^(N-1))
   = 100(1.05)^N + 200 (1 - (1.05)^N)/(1 - 1.05)
   = 100(1.05)^N + 4000((1.05)^N - 1),

where we have used the formula for a geometric series.
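The pattern spotted in Example 6.4 can be confirmed by simulating the account year by year; an illustrative sketch:

```python
def savings(N):
    """Balance after the N-th additional deposit of 200, starting from 100,
    at 5% per annum (interest credited before each new deposit)."""
    y = 100
    for _ in range(N):
        y = 1.05 * y + 200
    return y

def savings_formula(N):
    # The closed form derived in the text.
    return 100 * 1.05 ** N + 4000 * (1.05 ** N - 1)

for N in (1, 2, 10):
    print(N, round(savings(N), 2), round(savings_formula(N), 2))
```

For every N the two columns agree, which is exactly the content of the geometric-series manipulation above.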
yt = (1 + r)yt-1 - I.

This is of the form yt = ayt-1 + b where a = 1 + r and b = -I.

Let's work out a few terms of a difference equation.
For the equation yt = 2yt-1 + 1 with y0 = 1, the constant solution is

y* = b/(1 - a) = 1/(1 - 2) = -1,

so the solution is

yt = y* + (y0 - y*)a^t = -1 + (1 - (-1))2^t = -1 + 2^(t+1),

exactly as we found above.
Example 6.7 We find the solution of the equation

yt = 5yt-1 + 6,

given that y0 = 5/2. If we take a = 5 and b = 6 in the standard form yt = ayt-1 + b,
we have exactly the equation given. The first thing to do is to find the constant
solution. By the formula, this is y* = b/(1 - a) = 6/(1 - 5) = -3/2. We can now
write down the general solution and insert the given value of y0:

yt = y* + (y0 - y*)a^t = -3/2 + 4(5^t).
The behaviour of yt = y* + (y0 - y*)a^t for large t is governed by the behaviour of a^t:

a             Behaviour of a^t            Behaviour of yt
a > 1         a^t → ∞ (increasing)        yt → ∞ or yt → -∞
0 < a < 1     a^t → 0 (decreasing)        yt → y*
-1 < a < 0    a^t → 0 (oscillating)       yt → y* (oscillating)
a < -1        oscillates increasingly     oscillates increasingly

In the first of these cases (a > 1), whether yt → ∞ or yt → -∞ will, of course, depend
on the sign of y0 - y*.
Activity 6.6 The case a = 1 is not covered in the table just given. How does the
solution yt behave in this case?
Example 6.8 Generalising the argument above, we see that pt1 determines qt ,
which in turn determines pt , according to the rules
qt = qS(pt-1),    pt = pD(qt),
where q S is the supply function and pD the inverse demand function. Suppose that
the demand and supply equations are, respectively, as follows:
q + p = 24, 2q + 18 = p.
Then the equilibrium quantity and price are q = 2, p = 22, and
qS(p) = 0.5p - 9,    pD(q) = 24 - q.
Then pt = 24 - qt = 24 - qS(pt-1), that is,

pt = 33 - 0.5pt-1.
This is in the standard form yt = ayt-1 + b, with pt replacing yt and
a = -0.5, b = 33. The time-independent solution is b/(1 - a) = 33/(3/2) = 22, and
the explicit solution in terms of p0 is

pt = 22 + (p0 - 22)(-0.5)^t.
Note that the time-independent solution is the equilibrium price p = 22, and that in
this case the sequence approaches p in an oscillatory way. We say that we have a
stable cobweb. However, it is possible, for other supply and demand curves, that the
price oscillates about the equilibrium price with ever-increasing magnitude. In such
cases, the price does not approach p and we say we have an unstable or exploding
cobweb.
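The stable cobweb of Example 6.8 can be simulated directly; a sketch using the price equation pt = 33 - 0.5pt-1 derived above (the starting price is illustrative):

```python
def cobweb_prices(p0, steps):
    """Iterate p_t = 33 - 0.5 * p_{t-1}, the stable cobweb of Example 6.8."""
    prices = [p0]
    for _ in range(steps):
        prices.append(33 - 0.5 * prices[-1])
    return prices

prices = cobweb_prices(p0=30, steps=20)
print(prices[:4])          # [30, 18.0, 24.0, 21.0]
print(round(prices[-1], 4))
```

The prices land alternately above and below the equilibrium price 22, with the deviation halving each period, so the sequence converges: a stable cobweb.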
difference equation

yt = (1 + r)yt-1,   (t = 1, 2, 3, . . .).
This is in the standard form, with a = (1 + r) and b = 0. The solution is fairly obvious
(since this is just a geometric progression).
It might seem unnecessary to use difference equations for such a simple investment
scenario, when it is very easy to determine by elementary means the amount of capital
after t years. However, suppose that we withdraw an amount I at the end of each year
for N years. Then what is the balance of the account after t years? This is less obvious,
but difference equations provide an easy means of determining the answer. As we noted
above, in this case, the difference equation is
yt = (1 + r)yt-1 - I,   where y0 = P.
This is another case of the first-order linear difference equation, in standard form with
a = 1 + r and b = -I. The time-independent solution is therefore y* = I/r. The general
solution is yt = y* + (y0 - y*)a^t, and since y0 = P we obtain

yt = I/r + (P - I/r)(1 + r)^t.
This formula enables us to answer a number of questions. First, we might want to know
how large the withdrawals I can be given an initial investment of P , if we want to be
able to withdraw I annually for N years. The condition that nothing is left after N
years is, yN = 0. This is
I/r + (P - I/r)(1 + r)^N = 0,

and rearranging, we get

(I/r)((1 + r)^N - 1) = P(1 + r)^N,

so that

I(P) = r(1 + r)^N P / ((1 + r)^N - 1).
An inverse question is: what principal P is required to provide an annual income I for
the next N years? Rearranging the equation gives the result
P(I) = (I/r)(1 - (1 + r)^(-N)).
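Both annuity formulas can be implemented and checked against each other, since P(I) and I(P) are inverse to one another; a sketch with illustrative figures:

```python
def annual_withdrawal(P, r, N):
    """Largest I such that principal P funds N annual withdrawals of I: I(P)."""
    return r * (1 + r) ** N * P / ((1 + r) ** N - 1)

def required_principal(I, r, N):
    """Principal needed to fund N annual withdrawals of I: P(I)."""
    return (I / r) * (1 - (1 + r) ** (-N))

P, r, N = 100000, 0.05, 20
I = annual_withdrawal(P, r, N)
print(round(I, 2))
# The two formulas are inverses of one another:
print(round(required_principal(I, r, N), 2))   # 100000.0
```

Feeding I(P) back into P(I) recovers the original principal, confirming the rearrangement in the text.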
solving second-order differential equations. (See the subject guide for MT1174
Calculus.)
The general solution to the equation depends on whether the auxiliary equation
z^2 + a1z + a2 = 0 has two distinct solutions, or just one solution, or no (real) solutions.
Thus, the form of general solution depends on the value of the discriminant, a1^2 - 4a2.
We consider each case in turn.
When the auxiliary equation has two distinct solutions, α and β, the general
solution is

yt = Aα^t + Bβ^t   (A, B constants).
In any specific case, A and B are determined by the initial values y0 and y1 , as in
Example 6.9.
When the auxiliary equation has just one solution, α, the general solution is

yt = Ctα^t + Dα^t = (Ct + D)α^t.
As in the previous case, the values of the constants C and D can be determined by
using the initial values y0 and y1 .
The auxiliary equation has no solutions when the quantity a1^2 - 4a2 is negative. In
that case, 4a2 - a1^2 is positive, and hence so is a2. Thus there is a positive square
root r of a2; that is, we can define r = √a2. In order to write down the general
solution in this case we define the angle θ by

cos θ = -a1/(2r) = -a1/(2√a2).
Let us find yt if

yt - 2yt-1 + 4yt-2 = 0,

given that y0 = 1 and y1 = 1 + √3. Here the auxiliary equation z^2 - 2z + 4 = 0 has
negative discriminant, and r = √4 = 2. It follows that

cos θ = -(-2)/(2r) = 2/4 = 1/2,

so θ = π/3. The general solution is therefore

yt = 2^t (E cos(tπ/3) + F sin(tπ/3)).
Putting t = 0, and using the given initial condition y0 = 1, we have E = 1. Similarly,
y1 = 1 + √3 implies that

2(E cos(π/3) + F sin(π/3)) = 1 + √3,

that is,

2(1/2 + (√3/2)F) = 1 + √3,

so that 1 + √3 F = 1 + √3, and hence F = 1.
Trying a constant solution yt = y* in yt + a1yt-1 + a2yt-2 = k gives
y* + a1y* + a2y* = k, or

y* = k/(1 + a1 + a2).
Suppose

Ct = (3/8)Yt-1,    It = 40 + (1/8)(Yt-1 - Yt-2)

and let's assume the equilibrium condition Yt = Ct + It holds. Let's suppose that
Y0 = 65 and Y1 = 64.5, and try to determine an expression for Yt.
Arguing as above, we have

Yt = Ct + It = (3/8)Yt-1 + 40 + (1/8)(Yt-1 - Yt-2) = 40 + (1/2)Yt-1 - (1/8)Yt-2,

so

Yt - (1/2)Yt-1 + (1/8)Yt-2 = 40.
8
1
1
z 2 z + = 0,
2
8
2
which has discriminant (1/2) 4(1/8) = 1/4. This is negative, so there are no
solutions. We are therefore in the third case of a second-order difference
equation. To
proceed, we use the method given above. We have r = 1/8 = 1/(2 2), and
2 2
1
(1/2)
=
= ,
cos =
2r
4
2
so = /4. Thus, the general solution to the homogeneous equation in this case is
1
(E cos(t/4) + F sin(t/4)) .
2 2
We need a particular solution of

Yt - (1/2)Yt-1 + (1/8)Yt-2 = 40.

Trying Yt = k, a constant, we see that k - (1/2)k + (1/8)k = 40, so k = 64. It follows
that for some constants E and F,

Yt = 64 + (1/(2√2))^t (E cos(tπ/4) + F sin(tπ/4)).
Putting t = 0 and using Y0 = 65, we have 65 = 64 + E, so E = 1. Also,

Y1 = 64 + (1/(2√2))(E cos(π/4) + F sin(π/4)) = 64 + (1/(2√2))(E/√2 + F/√2)
   = 64 + E/4 + F/4,

and since this is 64.5, we have E + F = 2 and hence F = 1. The final answer is
therefore

Yt = 64 + (1/(2√2))^t (cos(tπ/4) + sin(tπ/4)).
Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by arithmetic and geometric progressions, and calculate the
sum of finite arithmetic and geometric series
explain compound interest and calculate balances under compound interest
apply sequences and series in management and finance
analyse the long-term behaviour of series and sequences
solve problems involving first-order difference equations
solve second-order difference equations
analyse the behaviour of solutions to difference equations
solve problems involving the application of difference equations.
Exercises
Exercise 6.1
A geometric progression has a sum to infinity of 3 and has second term y1 equal to 2/3.
Show that there are two possible values of the common ratio x and find the
corresponding values of the first term a.
Exercise 6.2
Suppose we have an initial amount, A0 , to invest and we add an additional investment
F at the end of each subsequent year. All investments earn an interest of i% per annum,
paid at the end of each year.
(a) Use the formula for the sum of a geometric series to derive a formula for the value of
the investment, An , after n years.
(b) An investor puts $10,000 into an investment account that yields interest of 10% per
annum. The investor adds an additional $5,000 at the end of each year. How much will
there be in the account at the end of five years? Show that if the investor has to wait N
years until the balance is at least $80,000, then

N ≥ ln(13/6)/ln(1.1).
Exercise 6.3
An amount of $1,000 is invested and attracts interest at a rate equivalent to 10% per
annum. Find expressions for the total after one year if the interest is compounded:
(a) annually
(b) quarterly
(c) monthly
(d) daily. (Assume the year is not a leap year.)
What would be the total after one year if the interest is 10% compounded continuously?
Exercise 6.4
Suppose yi = 1/2^(2i). Find the limit, as t → ∞, of

St = y0 + y1 + ... + yt-1.
Exercise 6.5
Planners believe that, as a result of a recent government grant scheme, the number of
new high technology businesses starting up each year will be N. There are already 3,000
such businesses in existence in the country, but it is expected that each year 5% of all
those in existence at the beginning of the year will fail (shut down). Let yt denote the
number of businesses at the end of year t. Explain why
yt = 0.95yt-1 + N.
Solve this difference equation for general N. Find a condition on N which will ensure
that the number of businesses will increase from year to year.
Exercise 6.6
The supply and demand functions for a good are
qS(p) = 0.05p - 4,    qD(p) = 20 - 0.15p.
Find the equilibrium price. What is the inverse demand function pD (q)? Suppose that
the sequence of prices pt is determined by pt = pD (q S (pt1 )) (as in the cobweb model).
Find an expression for pt .
Exercise 6.7
A market for a commodity is modelled by taking the demand and supply functions as
follows:
D(p) = 1 - p,    S(p) = p,
so that when the price p prevails the amount of commodity demanded by the market is
D(p) and the amount which producers will supply is S(p). Price adjusts over time t in
response to the excess of the demand over the supply according to the equation:
pt+1 - pt = a(D(pt) - S(pt)),

where a is a positive constant. Initially the price p is p0 = 3/4. Solve this equation and
show that over time the price adjusts towards the clearing value (i.e. the price at which
supply and demand are equal) if and only if
0 < a < 1.
Under what circumstances does the price tend towards the equilibrium price in an
oscillatory fashion? What happens to the price if a = 1/2?
Exercise 6.8
Find the general solution of the difference equation
yt - yt-1 - 6yt-2 = 0.
Exercise 6.9
(a) Suppose that consumption this year is the average of this years income and last
years consumption; that is,
Ct = (1/2)(Yt + Ct-1).
Suppose also that the relationship between next years income and current investment is
Yt+1 = kIt , for some positive constant k. Show that, if the equilibrium condition
Yt = Ct + It holds, then
Yt - ((k + 1)/2)Yt-1 + (k/2)Yt-2 = 0.
(b) In the model set up in part (a), suppose that k = 3 and that the initial value Y0 is
positive. Show that Yt oscillates with increasing magnitude.
(c) Find the values of k for which the model set up in part (a) leads to an oscillating Yt ,
and determine whether or not the oscillations increase in magnitude. (Remember we are
given that k > 0.)
St = (2/3)(1 - (2/3)^t)/(1 - 2/3) = 2(1 - (2/3)^t).

As t → ∞, (2/3)^t → 0 and so St → 2.
Feedback to activity 6.5
For yt = (2/3)yt-1 + 5, we have a = 2/3 and b = 5. Then

y* = b/(1 - a) = 5/(1 - 2/3) = 15,

so

yt = y* + (y0 - y*)a^t = 15 + (2 - 15)(2/3)^t = 15 - 13(2/3)^t.
Feedback to activity 6.6
When a = -1, we have

yt = y* + (y0 - y*)(-1)^t

and yt alternately takes two values: it flips between the value y* + (y0 - y*) = y0
(when t is even) and the value y* - (y0 - y*) = 2y* - y0 (when t is odd).
Feedback to activity 6.7
The auxiliary equation is

z^2 - 5z - 14 = (z + 2)(z - 7) = 0,

with solutions -2 and 7. The homogeneous equation

yt - 5yt-1 - 14yt-2 = 0
Comments on exercises
Solution to exercise 6.1
We know that the sum to infinity is given by the formula a/(1 x) and that y1 = ax.
Therefore, the given information is

a/(1 - x) = 3,    ax = 2/3.
3
From the first equation, a = 3(1 x) and the second equation then gives
3(1 x)x = 2/3, from which we obtain the quadratic equation 9x2 9x + 2 = 0. This
has the two solutions x = 2/3 and x = 1/3. The corresponding values of the first term a
(given by a = 3(1 x)) are 1 and 2, respectively. So, as suggested by the question, there
are two geometric progressions that have the required sum to infinity and second term.
Solution to exercise 6.2
(a) After 1 year, at the beginning of the second, the amount A1 in the account is
A0 (1 + i/100) + F , because the initial amount A0 has attracted interest at rate i/100
and F has been added. Similar considerations show that
A2 = (1 + i/100)A1 + F
   = (1 + i/100)(A0(1 + i/100) + F) + F
   = A0(1 + i/100)^2 + F(1 + i/100) + F,

and

A3 = (1 + i/100)A2 + F
   = A0(1 + i/100)^3 + F(1 + i/100)^2 + F(1 + i/100) + F.
In general,

An = A0(1 + i/100)^n + F(1 + i/100)^(n-1) + F(1 + i/100)^(n-2) + ... + F(1 + i/100) + F.
Now,

F(1 + i/100)^(n-1) + ... + F(1 + i/100) + F = F(1 - (1 + i/100)^n)/(1 - (1 + i/100))
                                            = (100F/i)((1 + i/100)^n - 1),
where we have used the formula for the sum of a geometric progression. Therefore
An = A0(1 + i/100)^n + (100F/i)((1 + i/100)^n - 1).
For (b), we use the formula just obtained, with A0 = 10000, i = 10, F = 5000 and
n = 5, and we see that

A5 = 10000(1.1)^5 + 50000((1.1)^5 - 1) ≈ 46630.60.

For the balance to be at least $80,000 after N years we need

10000(1.1)^N + 50000((1.1)^N - 1) ≥ 80000,

that is, 60000(1.1)^N ≥ 130000, or (1.1)^N ≥ 13/6. Taking logarithms,

N ≥ ln(13/6)/ln(1.1),

as required.
Solution to exercise 6.3
We use the fact that if the interest is paid in m equally spaced instalments, then the
total after one year is 1000(1 + r/m)^m, where r = 0.1 and m = 1, 4, 12, 365 in the four
cases. Therefore the answers to the first four parts of the problem are as follows:
(a) 1000(1.1)

(b) 1000(1 + 0.1/4)^4 = 1000(1.025)^4

(c) 1000(1 + 0.1/12)^12

(d) 1000(1 + 0.1/365)^365.
For the last part, we use the fact that under continuous compounding at rate r, an
amount P grows to Pe^r after one year, so the answer here is 1000e^0.1.
Solution to exercise 6.4
Note that 1/2^(2i) = 1/4^i = (1/4)^i, so this is a geometric series where the common ratio is
1/4. The first term is 1, and there are t terms, so

St = (1 - (1/4)^t)/(1 - 1/4) = (4/3)(1 - (1/4)^t).

As t → ∞, (1/4)^t → 0, so St → 4/3.
Solution to exercise 6.5

Each year 95% of the businesses in existence at the beginning of the year survive, and
N new ones start up, so yt = 0.95yt-1 + N. The constant solution is

y* = b/(1 - a) = N/(1 - 0.95) = N/0.05 = 20N,

so, with y0 = 3000,

yt = 20N + (3000 - 20N)(0.95)^t.

The number of businesses increases from year to year if 3000 - 20N < 0, that is, if
N > 150.

Solution to exercise 6.6

The equilibrium price is given by 0.05p - 4 = 20 - 0.15p, that is, 0.2p = 24, so the
equilibrium price is 120. The inverse demand function is

pD(q) = (400 - 20q)/3.
Now,

pt = pD(qS(pt-1)) = pD(0.05pt-1 - 4) = (400 - 20(0.05pt-1 - 4))/3 = 160 - (1/3)pt-1.

This is in standard form with a = -1/3 and b = 160, so the time-independent solution is

160/(1 - (-1/3)) = 160/(4/3) = 120,

and

pt = 120 + (p0 - 120)(-1/3)^t.

Solution to exercise 6.7

The price adjustment equation gives

pt+1 = pt + a(D(pt) - S(pt)) = pt + a(1 - 2pt) = (1 - 2a)pt + a.

The time-independent solution is

p* = a/(1 - (1 - 2a)) = a/(2a) = 1/2,

and so

pt = p* + (p0 - p*)(1 - 2a)^t = 1/2 + (3/4 - 1/2)(1 - 2a)^t = 1/2 + (1/4)(1 - 2a)^t.
The equilibrium price is given by 1 - p = p, and so is 1/2. From our expression for pt,
we see that pt → 1/2 as t → ∞ if and only if (1 - 2a)^t → 0. For this to be true, we need
-1 < 1 - 2a < 1, which is equivalent to 0 < a < 1. The price will oscillate towards 1/2
when, additionally, 1 - 2a is negative. So this happens when 1/2 < a < 1. When
a = 1/2, 1 - 2a = 0 and the price pt equals 1/2 for all t.
Solution to exercise 6.8
The auxiliary equation is

z^2 - z - 6 = (z - 3)(z + 2) = 0,

so for some constants A and B,

yt = A(3^t) + B(-2)^t.
Since Yt+1 = kIt, we have It = Yt+1/k, and, from the equilibrium condition,

Ct = Yt - It = Yt - Yt+1/k,    and similarly    Ct-1 = Yt-1 - Yt/k.

Substituting in the equation Ct = (Yt + Ct-1)/2, we get

Yt - Yt+1/k = (1/2)(Yt + Yt-1 - Yt/k),

and rearranging (and shifting the index down by one) gives

Yt - ((k + 1)/2)Yt-1 + (k/2)Yt-2 = 0.
(b) When k = 3, the equation is Yt - 2Yt-1 + (3/2)Yt-2 = 0, with auxiliary equation
z^2 - 2z + 3/2 = 0. The discriminant is negative, so r = √(3/2) and

cos θ = 2/(2√(3/2)) = √(2/3).

Since r = √(3/2) > 1, the factor r^t grows with t, and Yt oscillates with increasing
magnitude.

(c) The auxiliary equation

z^2 - ((k + 1)/2)z + (k/2) = 0

has no solutions if

((k + 1)/2)^2 < 4(k/2),    that is,    (k + 1)^2 < 8k.

In this case the general solution is of the form

Yt = (√(k/2))^t (E cos θt + F sin θt).
Otherwise, the solutions of the auxiliary equation are

((k + 1)/2 ± √((k + 1)^2/4 - 2k)) / 2,

and both of these are positive. Then the solution is of the form

Yt = Aα^t + Bβ^t,

and since α and β are positive, in this case there can be no oscillatory behaviour. The
same holds true when (k + 1)^2 = 8k.
We have shown that oscillations occur when (k + 1)^2 < 8k, in other words when k lies
strictly between the roots of the equation (k + 1)^2 = 8k. Rewriting this as the quadratic
equation k^2 - 6k + 1 = 0, we find that the roots are

3 - 2√2   and   3 + 2√2.
So the model predicts that, when k is between these two numbers, the national income
Yt will oscillate. (In economics language, it will exhibit business cycles.)
Whether the oscillations increase or decrease in magnitude depends on k. Since the
solution involves the factor (√(k/2))^t, the oscillations decrease if √(k/2) < 1, that is,
if k < 2, and increase if k > 2.
Chapter 7
Vector spaces
Suggested reading
We know that vectors of Rn can be added together and that they can be scaled by real
numbers. That is, for every x, y ∈ Rn and every α ∈ R, it makes sense to talk about
x + y and αx. Furthermore, these operations of addition and multiplication by a scalar
(that is, multiplication by a real number) behave and interact sensibly, in that, for
example,

α(x + y) = αx + αy,    α(βx) = (αβ)x,    x + y = y + x,

and so on.
But it is not only vectors in Rn that can be added and multiplied by scalars. There are
other sets of objects for which this is possible. Consider the set V of all functions from
R to R. Then any two of these functions can be added: given f, g V we simply define
the function f + g by
(f + g)(x) = f (x) + g(x).
2. u + v = v + u
3. u + (v + w) = (u + v) + w
4. there is a single member 0 of V , called the zero vector, such that for all v V ,
v+0=v
5. for every v V there is an element w V (usually written as v), called the
negative of v, such that v + w = 0
6. αv ∈ V
7. α(u + v) = αu + αv
(distributive law)
8. (α + β)v = αv + βv
(distributive law)
9. α(βv) = (αβ)v
10. 1v = v.
Other properties follow from those listed in the definition. For instance, we can see that
0x = 0 for all x, as follows:
0x = (0 + 0)x = 0x + 0x,
so, adding the negative −(0x) of 0x to each side,

    0 = 0x + (−(0x)) = (0x + 0x) + (−(0x)) = 0x + (0x + (−(0x))) = 0x + 0 = 0x.
(A bit sneaky, but just remember the result: 0x = 0.)
Activity 7.1 Prove that (−1)x = −x, the negative of the vector x, using a similar
argument with 0 = 1 + (−1).
(Note that this definition says nothing at all about multiplying together two vectors:
the only operations with which the definition is concerned are addition and scalar
multiplication.)
A vector space as we have defined it is called a real vector space, to emphasise that the
scalars α, β and so on are real numbers rather than (say) complex numbers. There is a
notion of complex vector space, where the scalars are complex numbers, which we shall
not cover. In this guide all scalars will be real numbers.
7.1.2
Examples
Example 7.1 The set Rn is a vector space with the usual way of adding and scalar
multiplying vectors.
Example 7.2 The set V = {0} consisting only of the zero vector is a vector space,
with addition defined by 0 + 0 = 0, and scalar multiplication defined by α0 = 0 for
all α ∈ R.
Example 7.3 The set V of functions from R to R with pointwise addition and
scalar multiplication (described earlier in this section) is a vector space. Note that
the zero vector in this space is the function that maps every real number to 0,
that is, the identically-zero function.
Activity 7.2 Show that all 10 properties of a vector space are satisfied. In
particular, if the function f is a vector in this space, what is the vector −f ?
Example 7.4 The set of m × n matrices with real entries is a vector space, with
the usual addition and scalar multiplication of matrices. The zero vector in this
vector space is the zero m × n matrix which has all entries equal to 0.
Example 7.5 Let V be the set of all vectors in R^3 with third entry equal to 0, that
is,

    V = { (x, y, 0)^T : x, y ∈ R }.

Then V is a vector space with the usual addition and scalar multiplication. To verify
this, we need only check that V is closed under addition and scalar multiplication.
The associative, commutative and distributive laws (properties 2, 3, 7, 8, 9, 10) will
hold for vectors in V because they hold for all vectors in R^3 (and all linear
combinations of vectors in V are in V ). Furthermore, if we can show that V is
closed under scalar multiplication, then for any particular v ∈ V , 0v = 0 ∈ V and
(−1)v = −v ∈ V . So we simply need to check that V ≠ ∅ (V is non-empty), that if
u, v ∈ V then u + v ∈ V , and that if v ∈ V and α ∈ R then αv ∈ V .
7.2 Subspaces
The last example above is informative. Arguing as we did there, if V is a vector space
and W ⊆ V is non-empty and closed under scalar multiplication and addition, then W
too is a vector space (and we do not need to verify that all the other properties hold).
The formal definition of a subspace is as follows.
Definition 7.2 (Subspace) A subspace W of a vector space V is a non-empty subset
of V that is itself a vector space (under the same operations of addition and scalar
multiplication as V ).
The discussion given justifies the following important result.
Theorem 7.1 If V is a vector space and W ⊆ V is non-empty and closed under
addition and scalar multiplication, then W is a subspace of V .

Example 7.6 Consider the two lines in R^2 given by

    S = { (x, y)^T : y = 2x, x ∈ R },    U = { (x, y)^T : y = 2x + 1, x ∈ R }.

Each vector in one of the sets is the position vector of a point on that line. We will
show that the set S is a subspace of R^2, and that the set U is not a subspace of R^2.
If v = (1, 2)^T and p = (0, 1)^T, these sets can equally well be expressed as

    S = { x : x = tv, t ∈ R },    U = { x : x = p + tv, t ∈ R }.
Activity 7.4 Show that the two descriptions of S describe the same set of vectors.
Example 7.6 (continued)
To show S is a subspace, we need to show that it is non-empty, and we need to show
that it is closed under addition and closed under scalar multiplication using any
vectors in S and any scalar in R. We'll use the second set of definitions, so our line is
the set of vectors

    S = { x : x = tv, t ∈ R },    where v = (1, 2)^T.

The set S is non-empty, since 0 = 0v ∈ S.
Let u, w be any vectors in S and let α ∈ R. Then

    u = s(1, 2)^T and w = t(1, 2)^T for some s, t ∈ R.

Then

    u + w = s(1, 2)^T + t(1, 2)^T = (s + t)(1, 2)^T ∈ S    (since s + t ∈ R),

and

    αu = α( s(1, 2)^T ) = (αs)(1, 2)^T ∈ S    (since αs ∈ R).

So S is closed under addition and scalar multiplication, and is therefore a subspace of R^2.
The set U is not a subspace:
1. 0 ∉ U, since 0 ≠ 2(0) + 1.
2. U is not closed under addition: (0, 1)^T ∈ U and (1, 3)^T ∈ U, but

    (0, 1)^T + (1, 3)^T = (1, 4)^T ∉ U,    since 4 ≠ 2(1) + 1.

3. U is not closed under scalar multiplication: (0, 1)^T ∈ U and 2 ∈ R, but

    2(0, 1)^T = (0, 2)^T ∉ U,    since 2 ≠ 2(0) + 1.
Sketch the line y = 2x + 1 and the position vector of any point on the line. You will
find that the position vector does not lie along the line, but goes from the origin up to
the point on the line. If you scalar multiply this vector by any constant α ≠ 1, it will be
the position vector of a point which is not on the line, so the resulting vector will not be
in U. The same is true if you add together the position vectors of two points on the
line. So U is not a subspace.
Activity 7.6 Let v be any non-zero vector in a vector space V . Show that the set

    S = { αv : α ∈ R }

is a subspace of V . The set S defines a line through the origin in V .
If V is a vector space, the sets V and {0} are subspaces of V . The set {0} is not empty,
it contains one vector, namely the zero vector. It is a subspace because 0 + 0 = 0 and
α0 = 0 for any α ∈ R.
Given any subset S of a vector space V , how do you decide if it is a subspace? First
check that 0 S. Then using some vectors in the subset, see if adding them and scalar
multiplying them will give you another vector in S. To prove that S is a subspace, you
will need to verify that it is closed under addition and closed under scalar multiplication
for any vectors in S, so you will need to use letters to represent general vectors, or
components of general vectors, in the set. That is, using letters show that the sum u + v
and the scalar product αu of vectors in S also satisfy the definition of a vector in S.
To prove a set S is not a subspace you only need to find one counterexample, one or two
particular vectors (use numbers) for which the sum or the scalar product does not
satisfy the definition of S. Note that if 0 is not in the set, it cannot be a subspace.
Activity 7.7 Write down a general vector (using letters) and a particular vector
(using numbers) for each of the following subsets. Show that one of the sets is a
subspace of R3 and the other is not.
    S1 = { (x, …, 0)^T : x ∈ R },    S2 = { (x, 2x, 0)^T : x ∈ R }.
7.2.1
We have seen that a subspace is a non-empty subset W of a vector space that is closed
under addition and scalar multiplication, meaning that if u, v ∈ W and α ∈ R, then
both u + v and αv are in W . Now, it is fairly easy to see that the following equivalent
property characterises when W will be a subspace:
Theorem 7.2 A non-empty subset W of a vector space is a subspace if and only if for
all u, v ∈ W and all α, β ∈ R, we have αu + βv ∈ W .
That is, W is a subspace if it is non-empty and closed under linear combination.
Null space
Suppose that A is an m × n matrix. Then the null space N(A), the set of solutions to
the homogeneous linear system Ax = 0, is a subspace of R^n.
Theorem 7.3 For any m × n matrix A, N(A) is a subspace of R^n.
Proof
To prove this we have to verify that N(A) ≠ ∅, and that if u, v ∈ N(A) and α ∈ R,
then u + v ∈ N(A) and αu ∈ N(A). Since A0 = 0, 0 ∈ N(A) and hence N(A) ≠ ∅.
Suppose u, v ∈ N(A). Then to show u + v ∈ N(A) and αu ∈ N(A), we must show that
u + v and αu are solutions of Ax = 0. We have

    A(u + v) = Au + Av = 0 + 0 = 0

and

    A(αu) = α(Au) = α0 = 0,

so we have shown what we needed.
The null space is the set of solutions to the homogeneous linear system. If we instead
consider the set of solutions S to a general system Ax = b, S is not a subspace of R^n if
b ≠ 0 (that is, if the system is not homogeneous). This is because 0 does not belong to
S. However, as we saw in Chapter 3 (Theorem 3.2), there is a relationship between S
and N(A): if x0 is any solution of Ax = b then S = {x0 + z : z N(A)}, which we may
write as x0 + N(A). S is an affine set, a translation of the subspace N(A).
Generally, if W is a subspace of a vector space V and x V then the set x + W defined
by
x + W = {x + w : w W }
is called an affine subset of V . An affine subset is not generally a subspace (although
every subspace is an affine subset, as we can see by taking x = 0).
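The closure argument in the proof above can be mirrored numerically. In the sketch below (plain Python; the matrix A and the two null-space vectors are an illustrative choice of ours, not from the text), u and v both solve Ax = 0, and so does every linear combination αu + βv:

```python
# A is 2x3 with rank 1; u and v below solve Ax = 0 (each row dots to zero).
A = [[1, 2, 3],
     [2, 4, 6]]
u = [-2, 1, 0]
v = [-3, 0, 1]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def comb(a, x, b, y):
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# Any linear combination of null-space vectors is again in the null space.
w = comb(5, u, -7, v)
print(matvec(A, u), matvec(A, v), matvec(A, w))
```

By contrast, for a non-homogeneous system Ax = b the sum of two solutions is not a solution (it satisfies Ax = 2b), which is exactly why S above is an affine set rather than a subspace.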
7.3.2
Range
The set of all linear combinations of a given set of vectors of a vector space V forms a
subspace, and we give it a special name.
Definition 7.3 (Linear span) Suppose that V is a vector space and that
v1, v2, . . . , vk ∈ V . The linear span of X = {v1, . . . , vk} is the set of all linear
combinations of the vectors v1, . . . , vk, denoted by Lin{v1, v2, . . . , vk} or Lin(X).
That is,

    Lin{v1, v2, . . . , vk} = { α1v1 + · · · + αkvk : α1, α2, . . . , αk ∈ R }.
Theorem 7.5 If X = {v1 , . . . , vk } is a set of vectors of a vector space V , then Lin(X)
is a subspace of V . It is the smallest subspace containing the vectors v1 , v2 , . . . , vk .
Proof
The set Lin(X) is non-empty, since 0 = 0v1 + · · · + 0vk ∈ Lin(X). If you have carefully
carried out the activity above, then you have shown that Lin(X) is closed under
addition and scalar multiplication. Therefore, it is a subspace of V . Furthermore, any
vector space which contains the vectors v1 , v2 , . . . , vk must also contain all linear
combinations of these vectors, so it must contain Lin(X). That is, Lin(X) is the smallest
subspace of V containing v1 , v2 , . . . , vk .
The subspace Lin(X) is also known as the subspace spanned by the set
X = {v1 , . . . , vk }, or, simply, as the span of {v1 , v2 , . . . , vk }.
Different texts may use different notations for the linear span of a set of vectors.
Notation is important, but it is nothing to get anxious about: just always make it clear
what you mean by your notation: use words as well as symbols!
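Deciding whether a given vector lies in Lin{v1, . . . , vk} amounts to solving a linear system, as the exercises later in this chapter do by hand. A small Gaussian-elimination sketch (plain Python over exact Fraction arithmetic; the helper is our own, and the sample vectors happen to be those of Exercise 7.3, so you can compare with the worked solution):

```python
from fractions import Fraction

def in_span(vectors, b):
    """Is b a linear combination of the given vectors (lists of equal length)?
    Test used: rank of (v1 ... vk) equals rank of the augmented matrix."""
    cols = [[Fraction(x) for x in v] for v in vectors]

    def rank(rows):
        rows = [r[:] for r in rows]
        r = 0
        for c in range(len(rows[0])):
            piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
            if piv is None:
                continue
            rows[r], rows[piv] = rows[piv], rows[r]
            for i in range(len(rows)):
                if i != r and rows[i][c] != 0:
                    f = rows[i][c] / rows[r][c]
                    rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
            r += 1
        return r

    m = len(cols[0])
    A = [[col[i] for col in cols] for i in range(m)]            # vectors as columns
    Ab = [row + [Fraction(b[i])] for i, row in enumerate(A)]    # augmented matrix
    return rank(A) == rank(Ab)

print(in_span([[2, 1, 1], [1, 1, 3]], [3, 2, 4]))   # consistent system
print(in_span([[2, 1, 1], [1, 1, 3]], [7, 5, 7]))   # inconsistent system
```

Using Fractions keeps the elimination exact, so the rank comparison is never spoiled by floating-point round-off.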
7.4.1
What is the set Lin{v} of a single non-zero vector v ∈ R^n? We have already seen that
this defines a line through the origin in any vector space, as we have

    Lin{v} = { αv : α ∈ R },

and in Activity 7.6 on page 178, you proved that this is a subspace of any vector
space V .
In Chapter 2 (section 2.11) we saw that a plane in R3 can be defined as the set of all
vectors x = (x, y, z)T whose components satisfy a single Cartesian equation,
ax + by + cz = d, or as the set of all vectors x which satisfy a vector equation with two
parameters, x = p + sv + tw, s, t ∈ R, where v and w are non-parallel vectors and p is
the position vector of a point on the plane. These definitions are equivalent as it is
possible to go from one representation of a given plane to the other.
If d = 0, the plane contains the origin, so, taking p = 0, the plane is the set of vectors
    { x : x = sv + tw, s, t ∈ R }.
Since this is the linear span, Lin{v, w}, of two vectors in R3 , a plane through the origin
is a subspace of R3 .
Let's look at a specific example.
Example 7.7 Let S be the set given by

    S = { (x, y, z)^T : 3x − 2y + z = 0 }.

Then for x ∈ S, z = 2y − 3x, so

    x = (x, y, 2y − 3x)^T = (x, 0, −3x)^T + (0, y, 2y)^T = x(1, 0, −3)^T + y(0, 1, 2)^T.

That is, x = xv1 + yv2 where x, y can be any real numbers and v1 = (1, 0, −3)^T,
v2 = (0, 1, 2)^T. Since S is the linear span of two vectors, it is a subspace of R^3.
Of course, you can show directly that S is a subspace by showing it is non-empty,
and closed under addition and scalar multiplication.
Activity 7.9 Show that the set

    S = { (x, y, z)^T : ax + by + cz = d }

(where a, b, c are not all zero) is a subspace of R^3 if and only if d = 0.
Conversely, suppose we start from the vector equation of this plane,

    x = (x, y, z)^T = s(1, 0, −3)^T + t(0, 1, 2)^T,    s, t ∈ R.

This yields three equations in the two unknowns, s and t. Eliminating s and t from
these equations yields a single Cartesian equation between the variables x, y, z:

    x = s,    y = t,    z = −3s + 2t    ⟹    z = −3x + 2y,  or  3x − 2y + z = 0.
In the same way as for planes in R^3, any hyperplane in R^n which contains the origin is a
subspace of R^n. You can show this directly, exactly as in the activity above, or you can
show it is the linear span of n − 1 vectors in R^n.
7.4.2
In the previous chapter we observed that the range R(A) of an m × n matrix A is equal
to the set of all linear combinations of its columns. (See section 5.4.) In other words,
R(A) is the span of the columns of A and is often called the column space of A and
denoted by CS(A).
It is also possible to consider the row space RS(A) of a matrix: this is the span of the
rows of A. If A is an m × n matrix the row space is a subspace of R^n and the column
space is a subspace of R^m.
Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by a vector space and a subspace
prove that a given set is a vector space, or a subspace of a given vector space
explain what is meant by the linear span of a set of vectors.
Exercises
Exercise 7.1
Let

    S = { (x1, x2)^T : x2 = 3x1 }.

Prove that S is a subspace of R^2.
Exercise 7.2
Determine which of the following subsets of R^3 are subspaces:

    S1 = { (x, y, z)^T : x + y + z = 0 },    S2 = { (x, y, z)^T : x^2 + y^2 + z^2 = 1 },
    S3 = { (x, y, z)^T : x = 0 },    S4 = { (x, y, z)^T : xy = 0 },
    S5 = { (x, y, z)^T : x = 0 and y = 0 } = { (0, 0, z)^T : z ∈ R }.
Exercise 7.3
Let

    u = (2, 1, 1)^T and v = (1, 1, 3)^T.

Determine which of the vectors below are in Lin{u, v}, and for each such vector, express
it as a linear combination of u and v.

    a = (3, 2, 4)^T,    b = (0, 0, 0)^T,    c = (7, 5, 7)^T.
Exercise 7.4
Let V be the vector space of all functions from R to R with pointwise addition and
scalar multiplication. Let n be a fixed positive integer and let W be the set of all real
polynomial functions of degree at most n; that is, W consists of all functions of the form

    f(x) = a0 + a1x + a2x^2 + · · · + anx^n,    where a0, a1, . . . , an ∈ R.
Prove that W is a subspace of V , under the usual pointwise addition and scalar
multiplication for real functions. Find a finite set of functions which spans W .
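The closure argument behind this exercise becomes concrete if a polynomial a0 + a1x + · · · + anx^n is modelled by its coefficient list [a0, . . . , an] (a sketch under that assumption; the helper names are ours):

```python
# A polynomial of degree at most n, stored as its coefficient list [a0, ..., an].
def poly_add(p, q):
    return [a + b for a, b in zip(p, q)]

def poly_scale(alpha, p):
    return [alpha * a for a in p]

def evaluate(p, x):
    return sum(a * x ** i for i, a in enumerate(p))

p = [1, 0, 2]        # 1 + 2x^2
q = [0, 3, -1]       # 3x - x^2
s = poly_add(p, q)   # coefficient lists stay length n+1: degree cannot go up
print(s, evaluate(s, 2))
```

Addition and scaling never lengthen the coefficient list, which is the closure property; the spanning set the exercise asks for corresponds to the "one-hot" lists for 1, x, . . . , x^n.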
Exercise 7.5
Show that the set S1 spans R^3, but any vector v ∈ R^3 can be written as a linear
combination of the vectors in S1 in infinitely many ways. Show that S2 and S3 do not
span R^3.

    S1 = { (1, 2, 3)^T, (1, 0, 1)^T, (0, 1, 1)^T, (1, 1, 0)^T },
    S2 = { (1, 0, −1)^T, (2, 1, 3)^T, (1, 2, 9)^T },    S3 = { (2, 1, 1)^T, (1, 0, 1)^T }.
Exercise 7.6
(a) Let w1 = (1, 2)^T and w2 = (1, −1)^T, and let A = (w1 w2) be the matrix with these
vectors as its columns. Show that Lin{w1, w2} = R^2. That is, show any vector
b ∈ R^2 can be expressed as a linear combination of w1 and w2 by solving b = Ax for x:

    (b1, b2)^T = x1(1, 2)^T + x2(1, −1)^T = Ax.

(c) Show, in general, that if v and w are non-zero vectors in R^2, with v = (a, c)^T and
w = (b, d)^T, then

    Lin{v, w} = R^2  ⟺  v ≠ tw for any t ∈ R  ⟺  det( a b ; c d ) ≠ 0.
Comments on activities
For the set V of Example 7.5: if

    u = (x, y, 0)^T and v = (x′, y′, 0)^T

are any two vectors of V , then

    u + v = (x + x′, y + y′, 0)^T ∈ V

and, for any α ∈ R,

    αv = (αx, αy, 0)^T ∈ V.
1
+t
1
2
for any t R,
For Activity 7.7: if u = (a, 2a, 0)^T and v = (b, 2b, 0)^T are any two vectors of S2, then

    u + v = (a + b, 2a + 2b, 0)^T = (a + b, 2(a + b), 0)^T ∈ S2,    since a + b ∈ R,

and, if α ∈ R,

    αu = (αa, 2(αa), 0)^T ∈ S2,    since αa ∈ R.
If v = α1v1 + α2v2 + · · · + αkvk and w = β1v1 + β2v2 + · · · + βkvk, then we will have

    v + w = (α1 + β1)v1 + (α2 + β2)v2 + · · · + (αk + βk)vk.
For Activity 7.9: let u = (x, y, z)^T and v = (x′, y′, z′)^T be any two vectors in S, and
write

    (X, Y, Z)^T = u + v = (x + x′, y + y′, z + z′)^T;

we want to show this belongs to S. Now, if d = 0 this is the case, because

    aX + bY + cZ = a(x + x′) + b(y + y′) + c(z + z′)
                 = (ax + by + cz) + (ax′ + by′ + cz′) = 0 + 0 = 0,
and similarly it can be shown that, for any α ∈ R, αv ∈ S. So, in this case, S is a
subspace. You can see why this argument fails when d is not 0; for, then aX + bY + cZ
will equal 2d, which will not be the same as d. So we will not have u + v ∈ S. (Similarly,
we could see that αv will not be in S if α ≠ 1.) Also, if d ≠ 0, the simple statement that
0 does not satisfy the equation means that in this case S is not a subspace.
Comments on exercises
Solution to exercise 7.1
Since 0 ∈ S, the set S ≠ ∅. You need to show that for any α ∈ R and u, v ∈ S, αu ∈ S
and u + v ∈ S. Both are reasonably straightforward, and the details are omitted.
Solution to exercise 7.2
The set S1 = { (x, y, z)^T : x + y + z = 0 } is a subspace of R^3 because it is a plane
through the origin. You should also be able to prove this directly using the definition of
S1, in the same way as you did for Activity 7.9 on page 181.
The set S2 = { (x, y, z)^T : x^2 + y^2 + z^2 = 1 } is NOT a subspace. To prove this you
only need one counterexample. For example, 0 ∉ S2. This set is the unit sphere (the
surface of the unit ball) centred on the origin.
The set S3 = { (x, y, z)^T : x = 0 } is a subspace.
To prove this let u, v ∈ S3 and α ∈ R. S3 is non-empty since 0 ∈ S3. Write

    u = (0, u2, u3)^T,    v = (0, v2, v3)^T.

Then

    u + v = (0, u2 + v2, u3 + v3)^T ∈ S3    and    αu = (0, αu2, αu3)^T ∈ S3.

Hence the set is non-empty and closed under addition and closed under scalar
multiplication, so it is a subspace of R^3. This subspace is the yz-plane (a plane through
the origin).
The set S4 = { (x, y, z)^T : xy = 0 } is closed under scalar multiplication, but it is NOT
a subspace.
To prove this, take for example, u = (1, 0, 2)^T ∈ S4 and v = (0, 1, 1)^T ∈ S4. These
vectors are in S4 since they satisfy the condition which defines S4. But

    u + v = (1, 0, 2)^T + (0, 1, 1)^T = (1, 1, 3)^T ∉ S4.

The set is not closed under addition, so it is not a subspace. This set is the union of the
xz-plane and the yz-plane.
The set S5 = { (x, y, z)^T : x = 0 and y = 0 } = { (0, 0, z)^T : z ∈ R }
is the intersection of the xz-plane and the yz-plane, which is the z-axis. This is a line
through the origin, and therefore, a subspace. You should also be able to prove this
directly from the definition of S5 .
Solution to exercise 7.3
You need to determine if each of the given vectors can be expressed as a linear
combination of u and v. Each vector equation is equivalent to a system of three
equations in two unknowns. You can solve them using Gaussian elimination, or, in these
simple cases, just solve the equations directly.
For the vector a, set

    αu + βv = α(2, 1, 1)^T + β(1, 1, 3)^T = (3, 2, 4)^T.

Then, row reducing the augmented matrix,

    (A|a) = ( 2 1 | 3 ; 1 1 | 2 ; 1 3 | 4 )  → · · · →  ( 1 1 | 2 ; 0 1 | 1 ; 0 0 | 0 ),

so β = 1, α = 1, and indeed

    u + v = (2, 1, 1)^T + (1, 1, 3)^T = (3, 2, 4)^T = a.
The vector b = 0 is in Lin{u, v}, since b = 0u + 0v. For the vector c, set

    αu + βv = α(2, 1, 1)^T + β(1, 1, 3)^T = (7, 5, 7)^T.

The first two equations force α = 2 and β = 3, but then α + 3β = 11 ≠ 7, so the system
is inconsistent and c is not in Lin{u, v}.
Solution to exercise 7.5
To show that the vector equation

    α1(1, 2, 3)^T + α2(1, 0, 1)^T + α3(0, 1, 1)^T + α4(1, 1, 0)^T = (x, y, z)^T

has infinitely many solutions for any vector b ∈ R^3, and therefore that S1 spans R^3, you
need to look at the solutions of Ax = b:

    A = ( 1 1 0 1 ; 2 0 1 1 ; 3 1 1 0 ),    x = (α1, α2, α3, α4)^T,    b = (x, y, z)^T.
If the row echelon form of the coefficient matrix A has three leading ones, then there
will always be infinitely many solutions: a solution will exist since a leading one in each
row of A means an augmented system cannot be inconsistent and there will be infinitely
many solutions since there is one free (non-leading) variable.
Row reducing A (not the augmented matrix),

    A = ( 1 1 0 1 ; 2 0 1 1 ; 3 1 1 0 )
      → ( 1 1 0 1 ; 0 −2 1 −1 ; 0 −2 1 −3 )
      → ( 1 1 0 1 ; 0 1 −1/2 1/2 ; 0 0 0 −2 )
      → ( 1 1 0 1 ; 0 1 −1/2 1/2 ; 0 0 0 1 ).
The row echelon form of the matrix A has three leading ones. Therefore, the set S1
spans R3 , and any vector b R3 can be written as a linear combination of the vectors
in S1 in infinitely many ways.
S2 does not span R^3. Since

    |B| = det( 1 2 1 ; 0 1 2 ; −1 3 9 ) = 1(9 − 6) − 1(4 − 1) = 0,

the reduced echelon form of B will contain a row of zeros. There will be some b ∈ R^3
for which the system is inconsistent.
For example, the vector b = (0, 0, 1)^T cannot be expressed as a linear combination
of the vectors in S2, as the system Bx = b has no solution, since

    ( 1 2 1 | 0 ; 0 1 2 | 0 ; −1 3 9 | 1 ) → ( 1 2 1 | 0 ; 0 1 2 | 0 ; 0 5 10 | 1 )
                                           → ( 1 2 1 | 0 ; 0 1 2 | 0 ; 0 0 0 | 1 ).
S3 does not span R3 . At least three vectors are required to span R3 .
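The leading-ones test used in this solution can be automated: a set of vectors spans R^3 exactly when the matrix with those vectors as columns has rank 3. A sketch in plain Python with exact Fraction arithmetic (the rank routine is our own):

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[1, 1, 0, 1], [2, 0, 1, 1], [3, 1, 1, 0]]   # columns: the vectors of S1
B = [[1, 2, 1], [0, 1, 2], [-1, 3, 9]]           # columns: the vectors of S2
print(rank(A), rank(B))
```

Rank 3 for A confirms S1 spans R^3 (with a free variable, hence infinitely many representations); rank 2 for B confirms S2 does not.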
Solution to exercise 7.6
(a) A = ( 1 1 ; 2 −1 ), and |A| = −3 ≠ 0, so A^{−1} exists, with

    A^{−1} = (1/(−3)) ( −1 −1 ; −2 1 ) = (1/3) ( 1 1 ; 2 −1 ).

Then the system

    ( 1 1 ; 2 −1 ) x = ( b1 ; b2 )

has solution x = A^{−1}b,

    x = (1/3) ( 1 1 ; 2 −1 ) ( b1 ; b2 ) = ( (1/3)b1 + (1/3)b2 ; (2/3)b1 − (1/3)b2 ),

for any b ∈ R^2.
(c) If all of a, b, c, d are non-zero, then

    det( a b ; c d ) = 0  ⟺  ad = bc  ⟺  a/b = c/d = s  ⟺  a = sb, c = sd
                          ⟺  (a, c)^T = s(b, d)^T,

that is, the determinant vanishes exactly when v is a scalar multiple of w, in which case
Lin{v, w} is a line rather than all of R^2.
If any of the coefficients equals zero, a similar argument works.
Chapter 8
Linear independence, bases and
dimension
Suggested reading
8.1 Linear independence
Linear independence is a central idea in the theory of vector spaces. If {v1, v2, . . . , vk}
is a set of vectors in a vector space V , then the vector equation

    α1v1 + α2v2 + · · · + αkvk = 0

always has the trivial solution, α1 = α2 = · · · = αk = 0.
Definition 8.1 (Linear independence) The vectors v1, v2, . . . , vk ∈ V are linearly
independent if the only solution of the vector equation above is the trivial one;
otherwise (that is, if some non-trivial linear combination of them equals the zero
vector) they are linearly dependent.

Example 8.1 The vectors

    v1 = (2, 3)^T,    v2 = (1, 5)^T,    v3 = (5, 11)^T

are linearly dependent. This is because

    2v1 + v2 − v3 = 0.

Note that this can also be written as v3 = 2v1 + v2.
This example illustrates the following general result. Try to prove it yourself before
looking at the proof.
Theorem 8.1 The set {v1 , v2 , . . . , vk } V is linearly dependent if and only if some
vector vi is a linear combination of the other vectors.
Proof
Since this is an 'if and only if' statement, we must prove it both ways. If
{v1, v2, . . . , vk} is linearly dependent, the equation α1v1 + α2v2 + · · · + αkvk = 0 has a
solution with some αi ≠ 0. Then we can solve for the vector vi:

    vi = −(α1/αi)v1 − (α2/αi)v2 − · · · − (αi−1/αi)vi−1 − (αi+1/αi)vi+1 − · · · − (αk/αi)vk,

which expresses vi as a linear combination of the other vectors in the set.
If vi is a linear combination of the other vectors, say,

    vi = β1v1 + · · · + βi−1vi−1 + βi+1vi+1 + · · · + βkvk,

then

    β1v1 + · · · + βi−1vi−1 − vi + βi+1vi+1 + · · · + βkvk = 0

is a non-trivial linear combination of the vectors that is equal to the zero vector, since
the coefficient of vi is −1 ≠ 0. Therefore, the vectors are linearly dependent.
It follows from this theorem that a set of two vectors is linearly dependent if and only if
one vector is a scalar multiple of the other.
The vectors

    v1 = (2, 3)^T,    v2 = (1, 5)^T

in Example 8.1 are linearly independent, since one is not a scalar multiple of the
other.
Activity 8.1 Show that, for any vector v in a vector space V , the set of vectors
{v, 0} is linearly dependent.
Given vectors v1, v2, . . . , vk ∈ R^n, let A = (v1 v2 · · · vk) be the n × k matrix with
these vectors as its columns, and let

    x = (α1, α2, . . . , αk)^T.

Recall (Activity 5.8, page 135) that the matrix product Ax is exactly the linear
combination α1v1 + α2v2 + · · · + αkvk.
Then the question of whether or not a set of vectors in Rn is linearly independent can
be answered by looking at the solutions of the homogeneous system Ax = 0.
Theorem 8.2 The vectors v1, v2, . . . , vk are linearly dependent if and only if the
linear system Ax = 0 has a solution other than x = 0, where A is the matrix
A = (v1 v2 · · · vk). Equivalently, the vectors are linearly independent precisely when
the only solution to the system is x = 0.
If the vectors are linearly dependent, then any solution x ≠ 0 of the system Ax = 0 will
directly give a non-trivial linear combination of the vectors that equals the zero vector.
Activity 8.2 Show that the vectors

    v1 = (1, 2)^T,    v2 = (1, 1)^T,    v3 = (2, 5)^T

are linearly dependent.
Now, we know from our experience of solving linear systems with row operations that
the system Ax = 0 will have precisely the one solution x = 0 if and only if we obtain
from the n × k matrix A an echelon matrix in which there are k leading ones. That is, if
and only if rank(A) = k. (Think about this!) Thus, we have the following result.
Theorem 8.3 Suppose that v1, . . . , vk ∈ R^n. Then the set {v1, . . . , vk} is linearly
independent if and only if the n × k matrix (v1 v2 · · · vk) has rank k.
But the rank is always at most the number of rows, so we certainly need to have k n.
Also, there is a set of n linearly independent vectors in Rn . In fact, there are infinitely
many such sets, but an obvious one is
{e1 , e2 , . . . , en } ,
where ei is the vector with every entry equal to 0 except for the ith entry, which is 1.
Activity 8.3 Show that the set of vectors
{e1 , e2 , . . . , en } ,
in Rn is linearly independent.
Thus, we have the following result.
Theorem 8.4 The maximum size of a linearly independent set of vectors in Rn is n.
So any set of more than n vectors in R^n is linearly dependent. On the other hand, it
should not be imagined that any set of n or fewer is linearly independent: that isn't true.
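The rank test of Theorem 8.3 is mechanical to apply. A sketch in plain Python over exact Fractions (the elimination helper is our own; the sample vectors are the ones from Example 8.1):

```python
from fractions import Fraction

def rank(rows):
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    # Put the vectors as columns of a matrix; independent iff rank equals k.
    m, k = len(vectors[0]), len(vectors)
    A = [[vectors[j][i] for j in range(k)] for i in range(m)]
    return rank(A) == k

print(independent([[2, 3], [1, 5]]))            # neither is a multiple of the other
print(independent([[2, 3], [1, 5], [5, 11]]))   # three vectors in R^2: dependent
```

The second call also illustrates Theorem 8.4: more than n vectors in R^n can never be independent, since the rank is at most n.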
Example 8.3 In R4 , which of the following sets of vectors are linearly independent?
1
1
2
0
2
2 1 0 5
0
L1 =
,
,
,
,
,
1 9 3 1 9
0
2
1
0
1
1
1
0
2
L2 =
1 , 9 ,
0
2
1
1
2
0 2 1
L3 =
,
,
,
1 9 3
0
2
1
1
1
2
0
2 1 0
0
L4 =
,
,
,
.
1 9 3 1
0
2
1
0
Try this yourself before reading the answers.
The set L1 is linearly dependent because it consists of five vectors in R4 . The set L2
is linearly independent because neither vector is a scalar multiple of the other. To
see that the set L3 is linearly dependent, write the vectors as the columns of a
matrix A and reduce A to echelon form to find that the rank of A is 2. This means
that there is a non-trivial linear combination of the vectors which is equal to 0, or
equivalently, that one of the vectors is a linear combination of the other two. The
last set, L4, contains the set L3 and is therefore also linearly dependent, since it is
still true that one of the vectors is a linear combination of the others.
Activity 8.4 For the set L3 above, find the solution of the corresponding
homogeneous system Ax = 0 where A is the matrix whose columns are the vectors
of L3 . Use the solution to write down a non-trivial linear combination of the vectors
that is equal to the zero vector. Express one of the vectors as a linear combination of
the other two.
There is an important property of linearly independent sets of vectors which holds for
any vector space V .
Theorem 8.5 If x1, x2, . . . , xm are linearly independent in V and

    c1x1 + c2x2 + · · · + cmxm = c′1x1 + c′2x2 + · · · + c′mxm,

then

    c1 = c′1,    c2 = c′2,    . . . ,    cm = c′m.

Proof
This is because

    c1x1 + c2x2 + · · · + cmxm = c′1x1 + c′2x2 + · · · + c′mxm

if and only if

    (c1 − c′1)x1 + (c2 − c′2)x2 + · · · + (cm − c′m)xm = 0,

and, by linear independence, this holds only when every coefficient ci − c′i is zero.
8.3 Basis
The following result about Rn is very important in the theory of vector spaces. It says
that a linearly independent set of n vectors in Rn spans Rn .
Theorem 8.6 If v1 , v2 , . . . , vn are linearly independent vectors in Rn , then for any x
in Rn , x can be written as a unique linear combination of v1 , . . . , vn .
Proof
Because v1, . . . , vn are linearly independent, the n × n matrix

    A = (v1 v2 . . . vn)

has rank(A) = n. (In other words, A reduces to the n × n identity matrix.) By
Theorem 5.2 the system Az = x has a unique solution for any x ∈ R^n. But let's spell it
out. Since there is a leading one in every row of the reduced echelon form of A, we can
find a solution to Az = x, so any vector x can be expressed in the form

    x = Az = (v1 v2 . . . vn)(α1, α2, . . . , αn)^T = α1v1 + α2v2 + · · · + αnvn,

as required. This linear combination is unique since the vectors are linearly independent
(or, because there is a leading one in every column of the echelon matrix, so there are
no free variables).
It follows from this theorem that if we have a set of n linearly independent vectors in
Rn , then the set of vectors also spans Rn , so any vector in Rn can be expressed in
exactly one way as a linear combination of the n vectors. We say that the n vectors form
a basis of Rn . The formal definition of a [finite] basis for a vector space is as follows.
Definition 8.3 ((Finite) Basis) Let V be a vector space. Then the subset
B = {v1 , v2 , . . . , vn } of V is said to be a basis for (or of) V if:
B is a linearly independent set of vectors, and
V = Lin(B).
An alternative characterisation of a basis can be given: B is a basis of V if every vector
in V can be expressed in exactly one way as a linear combination of the vectors in B.
The set B spans V if and only if a linear combination exists, and B is linearly
independent if and only if any linear combination is unique. We have therefore shown
Theorem 8.7 B = {v1 , v2 , . . . , vn } is a basis of V if and only if any v V is a unique
linear combination of v1 , v2 , . . . , vn .
Example 8.4 The vector space Rn has the basis {e1 , e2 , . . . , en } where ei is (as
earlier) the vector with every entry equal to 0 except for the ith entry, which is 1.
It's clear that the vectors are linearly independent (as you showed in Activity 8.3 on
page 194), and there are n of them, so we know straight away that they form a basis.
In fact, it's easy to see that they span the whole of R^n, since for any
x = (x1 , x2 , . . . , xn )T Rn ,
x = x1 e1 + x2 e2 + + xn en .
Example 8.5 Consider the plane

    W = { (x, y, z)^T : x + y − 3z = 0 }.
If x = (x, y, z)^T is any vector in W , then its components must satisfy y = −x + 3z,
and we can express x as

    x = (x, −x + 3z, z)^T = x(1, −1, 0)^T + z(0, 3, 1)^T = xv + zw,    x, z ∈ R.

This shows that the set {v, w} spans W . The set is linearly independent. Why?
Because of the positions of the zeros and ones, if αv + βw = 0 then necessarily
α = 0 and β = 0.
Example 8.6 The set

    S = { (1, 2)^T, (1, −1)^T }

is a basis of R^2.
1
1
1
1
1
1
U = 0 ,2, 2 ,
W = 0 ,2,2 .
1
3
5
1
3
5
Show that one of these sets is a basis of R3 and that the other one spans a plane in
R3 . Find a basis for this plane. Then find a Cartesian equation for the plane.
8.3.1 Coordinates
If S = {v1, . . . , vn} is a basis of a vector space V , then any v ∈ V can be written in
exactly one way as v = α1v1 + · · · + αnvn, and we use

    [v]S = (α1, . . . , αn)^T_S

to denote the coordinate vector of v in the basis S.
The sets

    B = { (1, 0)^T, (0, 1)^T }    and    S = { (1, 2)^T, (1, −1)^T }

are each a basis of R^2. The coordinates of the vector v = (2, −5)^T in each basis are
given by the coordinate vectors,
    [v]B = (2, −5)^T    and    [v]S = (−1, 3)^T_S.

In the standard basis, the coordinates of v are precisely the components of the
vector v. In the basis S, the components of v arise from the observation that

    v = −1(1, 2)^T + 3(1, −1)^T = (2, −5)^T.
Activity 8.7 For the example above, sketch the vector v on graph paper and show
it as the sum of the vectors given by each of the linear combinations: v = 2e1 − 5e2
and v = −1v1 + 3v2.
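Finding coordinates in a basis of R^2 is just solving a 2 × 2 linear system; a sketch via Cramer's rule (plain Python; the function is our own, and the numbers are an illustrative choice):

```python
# Coordinates of v in the basis {b1, b2} of R^2, via Cramer's rule.
def coords_2d(b1, b2, v):
    det = b1[0] * b2[1] - b2[0] * b1[1]
    assert det != 0, "b1, b2 must be linearly independent"
    alpha = (v[0] * b2[1] - b2[0] * v[1]) / det
    beta = (b1[0] * v[1] - v[0] * b1[1]) / det
    return alpha, beta

# v = (2, -5) in the basis {(1, 2), (1, -1)}:
print(coords_2d((1, 2), (1, -1), (2, -5)))
```

By Theorem 8.5 the pair returned is the only possible one, which is exactly why coordinate vectors are well defined.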
8.3.2
Dimension
A fundamental result is that if a vector space V has a finite basis, then all bases of V
are of the same size.
Theorem 8.9 Suppose that the vector space V has a finite basis consisting of d
vectors. Then any basis of V consists of exactly d vectors.
This enables us, finally, to define exactly what we mean by the dimension of a vector
space V .
Definition 8.5 (Dimension) The number d of vectors in a finite basis of a vector
space V is the dimension of V , and is denoted dim(V ). The vector space V = {0} is
defined to have dimension 0.
A vector space which has a finite basis is said to be finite-dimensional. Not all vector
spaces are finite-dimensional. (For example, the vector space of real functions with
pointwise addition and scalar multiplication has no finite basis. Such a vector space is
said to be infinite-dimensional.)
Example 8.8 We already know Rn has a basis of size n. (For example, the
standard basis consists of n vectors.) So Rn has dimension n (which is reassuring,
since it is often referred to as n-dimensional Euclidean space).
If we know the dimension of a vector space V , then we know how many vectors we need
for a basis. If we have the correct number of vectors for a basis and we know either that
the vectors span V , or that they are linearly independent, then we can conclude that
both must be true and they form a basis, as is shown in the following theorem. That is,
we do not need to show both.
Theorem 8.10 Let V be a finite-dimensional vector space of dimension d. Then:
d is the largest size of a linearly independent set of vectors in V . Furthermore, any
set of d linearly independent vectors is necessarily a basis of V
d is the smallest size of a spanning set of vectors for V . Furthermore, any finite set
of d vectors that spans V is necessarily a basis.
Thus, d = dim(V ) is the largest possible size of a linearly independent set of vectors in
V , and the smallest possible size of a spanning set of vectors (a set of vectors whose
linear span is V ).
    W = { (x, y, z)^T : x + y − 3z = 0 }
8.3.3
Example 8.10
The plane W in R3 ,
W = {x : x + y 3z = 0}
has a basis consisting of the vectors v1 = (1, 2, 1)T and v2 = (3, 0, 1)T . If v3 is any
vector which is not in this plane, for example, v3 = (1, 0, 0)T , then the set
S = {v1 , v2 , v3 } is a basis of R3 .
1
2
1
3
1
1
2
0
, x2 = 2 , x3 = 4 , x4 = 0 .
x1 =
2
1
2
1
3
1
2
1
3
Example 8.11
2
AT =
1
3
Reducing this to echelon
1 1
2
1
1 2
3
0
1 1
0 1
0 1
0 1
1 2 1 1
1 2 2 2
.
2 4 1 1
0
0 3 3
2 1 1
1 1 2 1 1
2 2 2
0 3 6 0 0
0 1 2 0 0
4 1 1
0 3 3
0 3 6 0 0
1 1 2 1 1
2 1 1
2 0 0
0 1 2 0 0 .
0 0
0
0 0
2 0 0
0 0
0
0 0
2 0 0
The echelon matrix at the end of this tells us that a basis for Lin{x1 , x2 , x3 , x4 } is
formed from the first two rows, transposed, of the echelon matrix, that is,
1
0
2 , 2 .
1 0
1
0
If we want to find a basis that consists of a subset of the original vectors, then we need
to take those vectors that correspond to the final non-zero rows in the echelon matrix.
By this, we mean the rows of the original matrix that have ended up as non-zero rows
in the echelon matrix. For instance, in Example 8.11, the first and second rows of the
original matrix correspond to the non-zero rows of the echelon matrix, so a basis of the
span is {x1, x2}. On the other hand, if we interchange rows, the correspondence won't
be so obvious.
A better method to obtain such a basis is given in the next section, using the matrix A
whose columns are the vectors x1, x2, . . . , xk. Then, as we have seen,
Lin{x1, . . . , xk} = R(A). That is, Lin{x1, . . . , xk} is the range or column space of the
matrix A.
We have shown that the range and null space of an m × n matrix are subspaces of R^m
and R^n respectively (section 7.3). Their dimensions are so important that they are given
special names.
Definition 8.6 (Rank and nullity) The rank of a matrix A is
rank(A) = dim(R(A))
and the nullity is
nullity(A) = dim(N(A)).
We have, of course, already used the word rank, so it had better be the case that the
usage just given coincides with the earlier one. Fortunately it does. In fact, we have the
following connection.
Theorem 8.12 Suppose that A is an m × n matrix with columns c1, c2, . . . , cn, and
that an echelon form obtained from A has leading ones in columns i1, i2, . . . , ir. Then a
basis for R(A) is
B = {ci1 , ci2 , . . . , cir }.
Note that the basis is formed from columns of A, not columns of the echelon matrix: the
basis consists of those columns of A corresponding to the leading ones in the echelon
matrix.
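Theorem 8.12 can be sketched computationally. The following is a minimal illustration, not part of the guide: a small row-reduction routine records which columns receive leading ones, and the basis of R(A) is then read off from the corresponding columns of A itself. The matrix used is a hypothetical example, not one from the text.

```python
import numpy as np

def rref_pivots(A, tol=1e-12):
    """Reduce a copy of A to reduced row echelon form.
    Returns (E, pivot_columns)."""
    E = A.astype(float).copy()
    m, n = E.shape
    pivots = []
    r = 0
    for c in range(n):
        if r == m:
            break
        # find a row with a non-zero entry in column c, at or below row r
        pivot_row = next((i for i in range(r, m) if abs(E[i, c]) > tol), None)
        if pivot_row is None:
            continue
        E[[r, pivot_row]] = E[[pivot_row, r]]   # swap the pivot row into place
        E[r] = E[r] / E[r, c]                   # make the leading entry 1
        for i in range(m):                      # clear the rest of the column
            if i != r:
                E[i] -= E[i, c] * E[r]
        pivots.append(c)
        r += 1
    return E, pivots

# Hypothetical matrix: its second column is twice its first,
# so only columns 0 and 2 carry leading ones.
A = np.array([[1, 2, 0],
              [2, 4, 1],
              [3, 6, 1]])
E, pivots = rref_pivots(A)
basis = [A[:, j] for j in pivots]   # columns of A itself, per Theorem 8.12
print(pivots)                       # [0, 2]
```

Note that `basis` is built from the columns of `A`, not of `E`, exactly as the theorem requires.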
We will outline a proof of this theorem, so you can see how it works. We have already
seen that a solution x = (α1, α2, . . . , αn)T of Ax = 0 gives a linear combination of the
columns of A which is equal to the zero vector,

0 = α1c1 + α2c2 + · · · + αncn.

If E denotes the reduced echelon form of A, and if c1', c2', . . . , cn' denote the columns of
E, then exactly the same relationship holds:

0 = α1c1' + α2c2' + · · · + αncn'.

In fact, we use E to obtain the solution x = (α1, α2, . . . , αn)T. So the linear dependence
relations are the same for the columns of both matrices, which means that the linearly
independent columns of A correspond precisely to the linearly independent columns of
E. Which columns of E are linearly independent? The columns which contain the
leading ones.
We have already seen that a matrix A and its reduced row echelon form have the same
row space, and that the non-zero rows form a basis of this row space. So the dimension
of the row space of A, RS(A), and the dimension of the column space of A,
CS(A) = R(A), are each equal to the number of leading ones in an echelon form of A;
that is, both are equal to rank(A). We restate this important fact:
dim(RS(A)) = dim(R(A)) = rank(A).
Example 8.12

Let

    [ 1  1 2 1 ]
A = [ 2  0 1 1 ].
    [ 9 -1 3 4 ]

The reduced row echelon form of A is (verify this!)

    [ 1 0 1/2 1/2 ]
E = [ 0 1 3/2 1/2 ].
    [ 0 0  0   0  ]

The leading ones in this echelon matrix are in the first and second columns, so a
basis for R(A) can be obtained by taking the first and second columns of A. (Note:
columns of A, not of the echelon matrix!) Therefore a basis for R(A) is

{ (1, 2, 9)T , (1, 0, -1)T }.

A basis of the row space of A consists of the two non-zero rows of the reduced
matrix, or the first two rows of the original matrix,

{ (1, 0, 1/2, 1/2)T , (0, 1, 3/2, 1/2)T }   or   { (1, 1, 2, 1)T , (2, 0, 1, 1)T }.
Note that the column space is a two-dimensional subspace of R3 (a plane) and the
row space is a two-dimensional subspace of R4. The columns of A and E satisfy the
same linear dependence relations, which can easily be read from the reduced echelon
form of the matrix,

c3 = (1/2) c1 + (3/2) c2 ,    c4 = (1/2) c1 + (1/2) c2 .
Activity 8.9 Check that the columns of A satisfy these same linear dependence
relations.
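Activity 8.9 can also be checked numerically. In this sketch the entries of A, including the sign of the -1, are as reconstructed in Example 8.12 above; that reconstruction is an assumption.

```python
import numpy as np

# The matrix of Example 8.12 (entries as reconstructed here): A = (c1 c2 c3 c4).
A = np.array([[1.0,  1.0, 2.0, 1.0],
              [2.0,  0.0, 1.0, 1.0],
              [9.0, -1.0, 3.0, 4.0]])
c1, c2, c3, c4 = A.T

# The dependence relations read off from the reduced echelon form:
assert np.allclose(c3, 0.5 * c1 + 1.5 * c2)
assert np.allclose(c4, 0.5 * c1 + 0.5 * c2)
print("relations hold")
```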
There is a very important relationship between the rank and nullity of a matrix. We
have already seen some indication of it in our considerations of linear systems. Recall
that if an m × n matrix A has rank r then the general solution to the (consistent)
system Ax = 0 involves n - r free parameters. Specifically (noting that 0 is a
particular solution, and using a characterisation obtained earlier in Chapter 5), the
general solution takes the form

x = s1 u1 + s2 u2 + · · · + s(n-r) u(n-r),

where u1, u2, . . . , u(n-r) are themselves solutions of the system Ax = 0. But the set of
solutions of Ax = 0 is precisely the null space N(A). Thus, the null space is spanned by
the n - r vectors u1, . . . , u(n-r), and so its dimension is at most n - r. In fact, it turns
out that its dimension is precisely n - r. That is,

nullity(A) = n - rank(A).

To see this, we need to show that the vectors u1, . . . , u(n-r) are linearly independent.
Because of the way in which these vectors arise (look at Example 8.12), it will be the
case that for each of them, there is some position where that vector will have an entry
equal to 1 and the entry in that same position of all the other vectors will be 0. From
this we can see that no non-trivial linear combination of them can be the zero vector, so
they are linearly independent. We have therefore proved the following central result.
Theorem 8.13 (Rank-nullity theorem) For an m × n matrix A,

rank(A) + nullity(A) = n.
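The rank-nullity theorem can be checked numerically. In this sketch (an illustration, not part of the guide) the rank is counted from the singular values, and a basis of the null space is taken from the trailing right singular vectors; the matrix is the one from Example 8.12 as reconstructed above, which is an assumption.

```python
import numpy as np

# rank(A) + nullity(A) = n, checked for the Example 8.12 matrix.
A = np.array([[1.0,  1.0, 2.0, 1.0],
              [2.0,  0.0, 1.0, 1.0],
              [9.0, -1.0, 3.0, 4.0]])
m, n = A.shape

# Full SVD: the rows of Vt beyond the rank span the null space N(A).
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))
null_basis = Vt[rank:]            # orthonormal basis of N(A)
assert all(np.allclose(A @ v, 0) for v in null_basis)

print(rank, len(null_basis))      # 2 2
assert rank + len(null_basis) == n
```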
Activity 8.10
Find a basis of the null space of the matrix A from Example 8.12,

    [ 1  1 2 1 ]
A = [ 2  0 1 1 ].
    [ 9 -1 3 4 ]
8.7 Exercises
Exercise 8.1
Show that the following three vectors are linearly independent:

(2, 1, 1)T ,   (3, 4, 6)T ,   (2, 3, 2)T .
Exercise 8.2
Let

x1 = (2, 3, 5)T ,   x2 = (1, 1, 2)T ,   v = (a, b, c)T .

Find a vector x3 such that {x1, x2, x3} is a basis of R3.
Find a condition that a, b, c must satisfy for the set of vectors {x1, x2, v} to be linearly
dependent.
Exercise 8.3
Show that any non-empty subset of a linearly independent set of vectors is linearly
independent.
Exercise 8.4
Show that the following vectors are linearly dependent by finding a non-trivial linear
combination of the vectors that equals the zero vector:

(1, 2, 1, 2)T ,   (0, 1, 3, 4)T ,   (4, 11, 5, 1)T ,   (9, 2, 1, 3)T .
Exercise 8.5
Let A be any matrix. Let v1 and v2 be two non-zero vectors and suppose that
Av1 = 2v1 and Av2 = 5v2 . Prove that {v1 , v2 } is linearly independent.
1
2
2
3
S2 =
2
1
1
S3 = { (1, 0, -1)T , (2, 1, 3)T , (1, 2, 9)T }
Exercise 8.7
1
0
2
3
,
,
,
1
0
2
3
1
4
2
1
1
4
0
2
S4 =
1 , 1 , 1 , 1
2
8
2
3
Which of the following sets are a basis for R3 ? (State reasons for your answers.)
2
4
7
1
1
1
S1 = 2 , 1 , 1 , 2
S2 = 0 , 1
3
0
0
1
1
1
1
3
1
2
2
1
S3 = 1 , 2 , 3
S4 = 1 , 1 , 3
1
1
0
1
0
3
For any set which is a basis of R3 , find the coordinates of the vector w = (1, 2, 1) in
that basis.
Exercise 8.8
Write down a basis for the yz-plane in R3 .
Exercise 8.9
Find a basis for the null space of the matrix

[ 1 1 1 0 ]
[ 2 1 0 1 ].
Exercise 8.10
Let

    [ 1 2 1  1 2 ]
A = [ 1 3 0  2 2 ].
    [ 0 1 1  3 4 ]
    [ 1 2 5 13 5 ]

Find a basis for the column space of A.
Exercise 8.11
Let

    [ 1 2 1 3  0 ]
B = [ 0 1 1 1  1 ].
    [ 1 3 2 0 -1 ]

Find a basis for the row space of B, a basis for the column space of B, and a basis for
the null space of B.
Let b = c1 + c5 , the sum of the first and last column of the matrix B. Without solving
the system, use the information you have obtained to write down a general solution of
the system of equations Bx = b.
Exercise 8.12
A matrix A is said to have full column rank if and only if the columns of A are linearly
independent. If A is an m × k matrix with full column rank, show that:

(1) ATA is a symmetric k × k matrix

(2) ATA is invertible.

Then verify the above results for the matrix

    [ 1 0 ]
M = [ 0 2 ].
    [ 1 1 ]
Write the vectors as the columns of a matrix,

A = [ 1 1 2 ].
    [ 2 1 5 ]

Then, using row operations, it can be seen that the general solution x to Ax = 0 is
x = (-3r, r, r)T for r ∈ R. In particular, taking r = 1, and multiplying out the equation
Ax = 0, we have that

-3 [ 1 ]  +  [ 1 ]  +  [ 2 ]  =  [ 0 ].
   [ 2 ]     [ 1 ]     [ 5 ]     [ 0 ]
    [ x ]       [ 3/2 ]
x = [ y ]  =  t [ 1/2 ] ,    t ∈ R.
    [ z ]       [  1  ]
  [ -1 ]       [ 2 ]     [ 1 ]     [ 0 ]
3 [  0 ]  +  2 [ 1 ]  -  [ 2 ]  =  [ 0 ],
  [  1 ]       [ 3 ]     [ 9 ]     [ 0 ]

and hence

[ 1 ]       [ -1 ]       [ 2 ]
[ 2 ]  =  3 [  0 ]  +  2 [ 1 ].
[ 9 ]       [  1 ]       [ 3 ]
But since the vectors are linearly independent, this can be true only if c1 - c1' = 0,
c2 - c2' = 0, and so on. That is, for each i, we must have ci = ci'.
Feedback to activity 8.6
Write each set of vectors as the columns of a matrix:

    [ 1 1 ]          [ 1 1 1 ]
B = [ 2 2 ] ,    A = [ 0 2 2 ].
    [ 3 5 ]          [ 1 3 5 ]
Feedback to activity 8.7
A general solution is

    [ x ]       [  1 ]       [ 1 ]
x = [ y ]  =  s [  0 ]  +  t [ 2 ] ,    s, t ∈ R.
    [ z ]       [ -1 ]       [ 3 ]

Equating components, you obtain three equations in the two unknowns s and t.
Eliminating s and t between the three equations you will obtain a single equation
relating x, y and z. Explicitly, we have

x = s + t,   y = 2t,   z = -s + 3t,

so

t = y/2 ,   s = x - t = x - y/2

and

z = -s + 3t = -x + y/2 + 3y/2 = -x + 2y,

so we have x - 2y + z = 0. This is the Cartesian equation of the plane.

Note that a Cartesian equation could equally well have been obtained by writing the
two basis vectors and the vector x as the columns of a matrix M and using the fact
that |M| = 0 if and only if the columns of M are linearly dependent. That is,

|  1 1 x |
|  0 2 y |  =  2x - 4y + 2z  =  0.
| -1 3 z |
Feedback to activity 8.8
If S = {w1 , w2 , . . . , wr } is a linearly independent set of vectors in W , then we can state
that the only linear combination
a1 w1 + a2 w2 + . . . + ar wr = 0
is the trivial one, with all ai = 0. But all the vectors in W are also in V , and this
statement still holds true, so S is a linearly independent set of vectors in V .
Feedback to activity 8.10
A general solution of the system of equations Ax = 0 is

       [ -1/2 ]        [ -1/2 ]
x = s1 [ -3/2 ]  +  s2 [ -1/2 ]  =  s1 u1 + s2 u2.
       [   1  ]        [   0  ]
       [   0  ]        [   1  ]

The set {u1, u2} is a basis of the null space of A, so dim(N(A)) = 2. From Example
8.12, rank(A) = 2. The matrix A has n = 4 columns, and

rank(A) + nullity(A) = 2 + 2 = 4 = n.

Note that the basis vectors of the null space give precisely the same linear dependence
relations between the column vectors as those given in the example. Since Au1 = 0 and
Au2 = 0,

Au1 = -(1/2) c1 - (3/2) c2 + c3 = 0    and    Au2 = -(1/2) c1 - (1/2) c2 + c4 = 0.
5
2
3
2
7 = 2 1 4 + 3 3 .
2
1
6
2
Solution to exercise 8.2
There are many ways of solving this problem, and there are infinitely many possible x3 s.
We can solve it by trying to find a vector x3 such that the matrix (x1 x2 x3 ) has rank 3.
Another approach is to answer the second part of the question first, determine what
vectors v form a linearly dependent set with x1 and x2 (find the condition on a, b and c
as asked) and then write down any vector whose components do not satisfy the
condition.
Write the vectors as the columns of a matrix,

    [ 1 2 a ]
A = [ 1 3 b ].
    [ 2 5 c ]
Notice that you can choose to order x1 and x2 so that the row reduction will be easier
since it makes no difference in this question. The vectors {x1 , x2 , v} will be linearly
dependent if the row echelon form of A has a row of zeros.
    [ 1 2 a ]  R2 - R1   [ 1 2  a   ]  R3 - R2   [ 1 2    a    ]
A = [ 1 3 b ]    →       [ 0 1 b-a  ]    →       [ 0 1   b-a   ].
    [ 2 5 c ]  R3 - 2R1  [ 0 1 c-2a ]            [ 0 0 c-a-b ]
The vectors will be linearly dependent if and only if the components of v satisfy
a + b - c = 0.
So, choose any vector for x3 which does not satisfy this equation, such as
x3 = ( 1 0 0 )T .
Note that this condition is the equation of a plane in R3 determined by the vectors x1
and x2 . The set {x1 , x2 , v} is linearly dependent if and only if v is the position vector of
a point in this plane.
1 0
4
9
2 1 11 2
,
A=
1 3
5
1
2 4
1 3
the matrix with columns equal to the given vectors. If we only needed to show that the
vectors were linearly dependent, it would suffice to show, using row operations, that
rank(A) < 4. But we're asked for more: we have to find an explicit non-trivial linear
combination that equals the zero vector. So we need to find a non-trivial solution of
Ax = 0. One solution is x = (5, 3, 1, 1)T . (You should use row operations to find
this. The details are omitted here.) This means that
0
9
4
0
1
1 11 2 0
2
5
1 3 3 + 5 1 = 0.
0
3
1
4
2
Solution to exercise 8.5
To prove that {v1, v2} is linearly independent, assume that α1 and α2 are scalars
such that

(∗)    α1 v1 + α2 v2 = 0.

Then

A(α1 v1 + α2 v2) = 0
α1 Av1 + α2 Av2 = 0
α1 (2v1) + α2 (5v2) = 0
2α1 v1 + 5α2 v2 = 0
211
(since all other vectors are scalar multiples of it). The set
with Cartesian equation y = x. This is a one-dimensional
To find a basis of S3 or S4 , write the vectors as the columns of a matrix, A. You can
either row reduce AT , and the non-zero rows will be a basis, or you can row reduce A
and the columns of A corresponding to the columns with leading ones in the echelon
form will be a basis. Using the first method for S3 (row reduce AT ),
[ 1 0 -1 ]    [ 1 0 -1 ]    [ 1 0 -1 ]
[ 2 1  3 ] →  [ 0 1  5 ] →  [ 0 1  5 ].
[ 1 2  9 ]    [ 0 2 10 ]    [ 0 0  0 ]
Then a basis is given by the top two vectors. The set S3 spans a plane (through the
origin) in R3; Lin(S3) has dimension 2.
Using the second method for S4 , write the vectors as the columns of a matrix A and
reduce A to echelon form. Columns 1, 2 and 4 have leading ones, so the first, second
and fourth vector form a basis of Lin(S4 ), which is a three-dimensional subspace of R4 .
Solution to exercise 8.7
S1 is not a basis. No set of four vectors in R3 can be linearly independent.
S2 is not a basis. Two vectors cannot span R3 : there must be at least three.
S3 is not a basis. Either notice that the third vector is the sum of the first two, or
reduce AT to echelon form and show it has a row of zeros, or show |A| = 0. The set is
not linearly independent.
S4 is a basis. You can reduce AT to echelon form or you can compute |A| = 5 ≠ 0,
which shows that Ax = b has a unique solution for all b ∈ R3.
1
To find the coordinates of w = 2 in the basis S4 you need to find the unique
1
solution of
1
1
2
1
1 + 1 + 3 = 2 .
1
0
3
1
2
[w]S4 = 3 .
1 S4
Solution to exercise 8.8
The yz-plane is the set of all vectors of the form (0, y, z)T . So the set of vectors {e2 , e3 }
is a basis of the yz-plane.
Solution to exercise 8.9
We omit the details. A basis for the null space is { (1, -2, 1, 0)T , (-1, 1, 0, 1)T }. There
are many other possible answers.
Solution to exercise 8.10
A basis for the column space is

{ (1, 1, 0, 1)T , (2, 3, 1, 2)T , (2, 2, 4, 5)T }.

Details of the calculations are omitted.
Solution to exercise 8.11
Reduce the matrix B to reduced echelon form:

    [ 1 2 1 3  0 ]    [ 1 2 1  3  0 ]    [ 1 2 1  3  0 ]
B = [ 0 1 1 1  1 ] →  [ 0 1 1  1  1 ] →  [ 0 1 1  1  1 ]
    [ 1 3 2 0 -1 ]    [ 0 1 1 -3 -1 ]    [ 0 0 0 -4 -2 ]

    [ 1 2 1 3  0  ]    [ 1 2 1 0 -3/2 ]    [ 1 0 -1 0 -5/2 ]
 →  [ 0 1 1 1  1  ] →  [ 0 1 1 0  1/2 ] →  [ 0 1  1 0  1/2 ].
    [ 0 0 0 1 1/2 ]    [ 0 0 0 1  1/2 ]    [ 0 0  0 1  1/2 ]
The leading ones are in the first, second and fourth columns, so a basis of the column
space is given by the corresponding columns of B,

{ (1, 0, 1)T , (2, 1, 3)T , (3, 1, 0)T }.

A basis of the row space is:

{ (1, 0, -1, 0, -5/2)T , (0, 1, 1, 0, 1/2)T , (0, 0, 0, 1, 1/2)T } ⊆ R5.

So the range of B is all of R3.
From the reduced echelon form of B, we can find the solution of Bx = 0. Setting the
non-leading variables x3 = s and x5 = t, we have

[ x1 ]   [  s + (5/2)t ]     [  1 ]     [  5/2 ]
[ x2 ]   [ -s - (1/2)t ]     [ -1 ]     [ -1/2 ]
[ x3 ] = [      s      ] = s [  1 ] + t [   0  ] = s x1 + t x2.
[ x4 ]   [   -(1/2)t   ]     [  0 ]     [ -1/2 ]
[ x5 ]   [      t      ]     [  0 ]     [   1  ]

The set {x1, x2} ⊆ R5 is a basis of the null space.
Since b = c1 + c5, a particular solution of Bx = b is p = (1, 0, 0, 0, 1)T, and so a
general solution of the system is

x = (1, 0, 0, 0, 1)T + s x1 + t x2 = p + s x1 + t x2 ,    s, t ∈ R.
Solution to exercise 8.12
                        [ 1 0 ]
MTM = [ 1 0 1 ]         [ 0 2 ]   =   [ 2 1 ],
      [ 0 2 1 ]         [ 1 1 ]       [ 1 5 ]

which is a symmetric 2 × 2 matrix with |MTM| = 9 ≠ 0, so MTM is invertible.
Chapter 9
Linear transformations, change of
basis
Suggested Reading
Activity 9.1 Prove that this single condition is equivalent to the two of the
definition.
Sometimes you will see T (u) written simply as T u.
9.2 Examples
Example 9.1 Let V = Rn and W = Rm and suppose that A is an m × n matrix.
Let TA be the function given by TA(x) = Ax for x ∈ Rn. That is, TA is simply
multiplication by A. Then TA is a linear transformation. This is easily checked, as
follows: first,

TA(u + v) = A(u + v) = Au + Av = TA(u) + TA(v).

Next,

TA(αu) = A(αu) = αAu = αTA(u).

So the two linearity conditions are satisfied. We call TA the linear transformation
corresponding to A.
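The two linearity conditions can be spot-checked numerically. This is an illustrative sketch, not part of the guide; the matrix A and the vectors are arbitrary hypothetical choices.

```python
import numpy as np

# T_A(x) = Ax: a quick numerical check of the two linearity conditions.
rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])
T = lambda x: A @ x

u, v = rng.standard_normal(3), rng.standard_normal(3)
alpha = 2.5
assert np.allclose(T(u + v), T(u) + T(v))       # additivity
assert np.allclose(T(alpha * u), alpha * T(u))  # homogeneity
print("T_A is linear")
```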
Example 9.2 (More complicated) Let us take V = Rn and take W to be the vector
space of all functions f : R → R (with pointwise addition and scalar multiplication).
Define a function T : Rn → W as follows:

T(u) = T( (u1, u2, . . . , un)T ) = pu1,u2,...,un = pu ,

where pu is the polynomial function pu(x) = u1 + u2x + · · · + unx^(n-1). Then T is a
linear transformation:

T(u + v) = T(u) + T(v)   and   T(αu) = αT(u).

The fact that for all x, pu+v(x) = (pu + pv)(x) means that the functions pu+v and
pu + pv are identical. The fact that T(αu) = αT(u) is similarly proved, and you
should try it!
If AT = (T(e1) T(e2) . . . T(en)) is the matrix whose columns are the images of the
standard basis vectors, then

AT u = u1 T(e1) + u2 T(e2) + · · · + un T(en) = T(u1 e1 + u2 e2 + · · · + un en).

But

                                  [ 1 ]      [ 0 ]            [ 0 ]   [ u1 ]
                                  [ 0 ]      [ 1 ]            [ 0 ]   [ u2 ]
u1 e1 + u2 e2 + · · · + un en = u1[ . ] + u2 [ . ] + · · · + un[ . ] = [ .. ] = u,
                                  [ 0 ]      [ 0 ]            [ 1 ]   [ un ]

so AT u = T(u).
Thus, to each matrix A there corresponds a linear transformation TA , and to each linear
transformation T there corresponds a matrix AT . Note that the matrix AT we found
was determined by using the standard basis in both vector spaces: later in this chapter
we will generalise this to use other bases.
Example 9.3 Consider the linear transformation T : R3 → R3 given by

    [ x ]     [  x + y + z  ]
T   [ y ]  =  [    x - y    ].
    [ z ]     [ x + 2y - 3z ]

In particular, if u = (1, 2, 3)T, then T(u) = (6, -1, -4)T.

To find the matrix of this linear transformation we need the images of the standard
basis vectors. We have that

T(e1) = (1, 1, 1)T ,   T(e2) = (1, -1, 2)T ,   T(e3) = (1, 0, -3)T.

The matrix representing T is AT = (T(e1) T(e2) T(e3)), which is

     [ 1  1  1 ]
AT = [ 1 -1  0 ].
     [ 1  2 -3 ]

Notice that the entries of the matrix AT are just the coefficients of x, y, z in the
definition of T.
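The recipe "columns of AT are the images of the standard basis vectors" is easy to mechanise. The transformation below uses the signs as reconstructed in the example above, which is an assumption.

```python
import numpy as np

# Build A_T column by column from the images of the standard basis, for
# T(x, y, z) = (x + y + z, x - y, x + 2y - 3z) (signs as reconstructed here).
def T(v):
    x, y, z = v
    return np.array([x + y + z, x - y, x + 2*y - 3*z])

e = np.eye(3)
A_T = np.column_stack([T(e[:, i]) for i in range(3)])
print(A_T)

u = np.array([1.0, 2.0, 3.0])
assert np.allclose(A_T @ u, T(u))   # A_T u agrees with T(u) = (6, -1, -4)
```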
9.3.1
Rotation in R2

Suppose that T is the linear transformation which rotates the plane anticlockwise by an
angle θ, and suppose that

T(e1) = (a, c)T   and   T(e2) = (b, d)T,

so that

AT = [ a b ].
     [ c d ]

[Figure: the unit vectors e1 and e2 and their images T(e1) = (a, c) and T(e2) = (b, d)
after rotation about the origin.]

The vectors T(e1) = (a, c)T and T(e2) = (b, d)T are orthogonal and each has length one,
since they are the rotated standard basis vectors. Drop a perpendicular from the point
(a, c) to the x-axis, forming a right triangle with angle θ at the origin. Since the
x-coordinate of the rotated vector is a and the y-coordinate is c, the side opposite the
angle θ has length c and the side adjacent to the angle θ has length a. The hypotenuse
of this triangle (which is the rotated unit vector e1) has length equal to one. We
therefore have a = cos θ and c = sin θ. Similarly, drop the perpendicular from the point
(b, d) to the x-axis and observe that the angle opposite the x-axis is equal to θ. Again,
basic trigonometry tells us that the x-coordinate is b = -sin θ (it has length sin θ and is
in the negative x-direction), and the height is d = cos θ. Therefore,

A = [ a b ] = [ cos θ  -sin θ ].
    [ c d ]   [ sin θ   cos θ ]

For example, rotation anticlockwise by π/4 radians is given by the matrix

[ cos π/4  -sin π/4 ]   [ 1/√2  -1/√2 ]
[ sin π/4   cos π/4 ] = [ 1/√2   1/√2 ].

Activity 9.3 Confirm this by sketching the vectors e1 and e2 and the image vectors
T(e1) = (1/√2, 1/√2)T and T(e2) = (-1/√2, 1/√2)T.

9.3.2
9.3.2
9.3.3
9.3.4
If V and W are finite dimensional vector spaces of the same dimension, then the inverse
of a linear transformation T : V → W is the linear transformation T^(-1) : W → V such
that

T^(-1)(T(v)) = v.

If T^(-1) exists, then its matrix satisfies T^(-1)(T(v)) = A_T^(-1) AT v = Iv. That is, T^(-1) exists
if and only if (AT)^(-1) exists, and (AT)^(-1) is the matrix of T^(-1).
Example 9.4 In R2, the inverse of rotation anticlockwise by an angle θ is rotation
clockwise by the same angle. Thinking of clockwise rotation by θ as anticlockwise
rotation by an angle -θ, the matrix of rotation clockwise by θ is given by

[ cos(-θ)  -sin(-θ) ]   [  cos θ  sin θ ]
[ sin(-θ)   cos(-θ) ] = [ -sin θ  cos θ ].

Then

[ cos θ  -sin θ ] [  cos θ  sin θ ]   [ 1 0 ]
[ sin θ   cos θ ] [ -sin θ  cos θ ] = [ 0 1 ].
Activity 9.4 Check this by multiplying the matrices. (You should note that
sin2 + cos2 = 1: see the subject guide for MT1174 Calculus.)
What is the inverse of the linear transformation

    [ x ]     [  x + y + z  ]
T   [ y ]  =  [    x - y    ]  ?
    [ z ]     [ x + 2y - 3z ]

We found

     [ 1  1  1 ]
AT = [ 1 -1  0 ].
     [ 1  2 -3 ]

Then

              [ 3  5  1 ]
AT^(-1) = (1/9) [ 3 -4  1 ].
              [ 3 -1 -2 ]

That is,

        [ u ]     [ (1/3)u + (5/9)v + (1/9)w ]
T^(-1)  [ v ]  =  [ (1/3)u - (4/9)v + (1/9)w ].
        [ w ]     [ (1/3)u - (1/9)v - (2/9)w ]
9.3.5
Theorem 9.2 Let V be a finite dimensional vector space and let T be a linear
transformation from V to a vector space W . Then T is completely determined by what
it does to a basis of V .
Proof
Let dim(V ) = n, and let B = {v1 , v2 , . . . , vn } be a basis of V . Then any v V , can be
uniquely expressed as a linear combination of these basis vectors,
v = a1 v1 + a2 v2 + + an vn .
Then
Example 9.6 We find the null space and range of the linear transformation
S : R2 → R4,

    [ x ]     [ x + y ]
S   [ y ]  =  [   x   ].
              [ x - y ]
              [   y   ]

The matrix of the linear transformation is

     [ 1  1 ]
AS = [ 1  0 ].
     [ 1 -1 ]
     [ 0  1 ]

Observe that this matrix has rank 2 (by having two linearly independent columns, or
you could alternatively see this by putting it into row echelon form), so that
N(S) = {0}, the subspace of R2 consisting of only the zero vector. This can also be
seen directly from the fact that

( x + y, x, x - y, y )T = ( 0, 0, 0, 0 )T   if and only if   x = 0, y = 0.

The range, R(S), is the two-dimensional subspace of R4 with basis given by the
column vectors of AS.
since T(vi) = 0 for i = 1, . . . , k. Hence the vectors {T(vk+1), . . . , T(vn)} span the range,
R(T). If they are a basis of R(T), then rank(T) = n - k. So it only remains to show that
they are linearly independent.

If there is a linear combination of the vectors equal to the zero vector,

bk+1 T(vk+1) + · · · + bn T(vn) = T(bk+1 vk+1 + · · · + bn vn) = 0,

then the vector bk+1 vk+1 + · · · + bn vn is in the null space of T, and can be written as a
linear combination of the basis vectors of N(T),

bk+1 vk+1 + · · · + bn vn = b1 v1 + · · · + bk vk.

Rearranging, we have

b1 v1 + · · · + bk vk - bk+1 vk+1 - · · · - bn vn = 0.

But {v1, v2, . . . , vk, vk+1, . . . , vn} is a basis of V, hence all coefficients bi = 0. This
shows that {T(vk+1), . . . , T(vn)} are linearly independent and the theorem is proved.
For an m × n matrix A, if T = TA, then T is a linear transformation from V = Rn to
W = Rm, and rank(T) = rank(A), nullity(T) = nullity(A), so this theorem states the
earlier result that
rank(A) + nullity(A) = n.
N(T ) = t 2
: tR ,
R(T ) = xy-plane?
AT v = ( c1 c2 c3 ) 2 = 1 c1 + 2 c2 + 3 c3 = 0 .
3
1 0 13
AT = 0 1 23 .
0 0 0
[x]B = ( α1, α2, . . . , αn )T.
One very straightforward observation is that the coordinate vector of any x Rn with
respect to the standard basis is just x itself. This is because if x = (x1 , x2 , . . . , xn )T ,
x = x1 e1 + x2 e2 + + xn en .
What is less immediately obvious is how to find the coordinates of a vector x with
respect to a basis other than the standard one.
Example 9.8 Suppose that we let B be the following basis of R3:

B = { (1, 2, 3)T , (2, -1, 3)T , (3, 2, 1)T }.

If x is the vector (5, 7, 2)T, then the coordinate vector of x with respect to B is

[x]B = (1, -1, 2)T_B ,

because

    [ 1 ]          [  2 ]       [ 3 ]
x = [ 2 ]  + (-1)  [ -1 ]  +  2 [ 2 ].
    [ 3 ]          [  3 ]       [ 1 ]
x = PB [x]B.

The matrix PB is invertible (because its columns are linearly independent, and hence its
rank is n). So we can also write

[x]B = PB^(-1) x.
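Finding coordinates with respect to a basis amounts to solving PB [x]B = x. The sketch below uses the basis reconstructed in Example 9.8 above (the signs are an assumption) and solves the system rather than forming the inverse explicitly.

```python
import numpy as np

# Coordinates with respect to a basis B: solve P_B [x]_B = x.
P_B = np.array([[1.0,  2.0, 3.0],
                [2.0, -1.0, 2.0],
                [3.0,  3.0, 1.0]])   # columns are the basis vectors
x = np.array([5.0, 7.0, 2.0])

coords = np.linalg.solve(P_B, x)     # [x]_B
print(coords)                        # approximately [ 1. -1.  2.]
assert np.allclose(coords, [1.0, -1.0, 2.0])
assert np.allclose(P_B @ coords, x)  # x = P_B [x]_B
```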
The matrix of the linear transformation which performs this rotation is given by

     [ cos π/4  -sin π/4 ]   [ 1/√2  -1/√2 ]
AT = [ sin π/4   cos π/4 ] = [ 1/√2   1/√2 ]  =  PB ,

where the column vectors of the matrix are the new basis vectors, v1, v2, so the
matrix is also the transition matrix from B coordinates to standard coordinates;
that is, we have v = PB [v]B. Then the coordinates of a vector with respect to the
new basis are given by [v]B = PB^(-1) v. The inverse of rotation anticlockwise is
rotation clockwise, so we have

          [ cos(-π/4)  -sin(-π/4) ]   [  cos π/4  sin π/4 ]   [  1/√2  1/√2 ]
PB^(-1) = [ sin(-π/4)   cos(-π/4) ] = [ -sin π/4  cos π/4 ] = [ -1/√2  1/√2 ].

Consider the vector x = (1, 1)T = √2 v1. What are its coordinates in the new basis B?
We can find these directly, since we have x = √2 v1 + 0 v2, and in B coordinates

[v1]B = (1, 0)T_B   and   [v2]B = (0, 1)T_B ,

so that

[x]B = (√2, 0)T_B .

Note that

x = PB [x]B = [ 1/√2  -1/√2 ] [ √2 ]     [ 1 ]
              [ 1/√2   1/√2 ] [  0 ]_B = [ 1 ].
Given a basis B of Rn with transition matrix PB, and another basis B' with transition
The matrix AT[B, B'] is called the matrix representing T with respect to bases B and
B'. A direct proof of this theorem can be found in the texts, but instead let's take the
indirect approach. Let's look at the stages of transition which occur from changing basis
from B to standard, performing the linear transformation in standard coordinates, and
then changing to the basis B'.

It is instructive (and more useful) to find the matrix AT[B, B'] in terms of the matrix
AT (representing T in standard coordinates) and the transition matrices for the bases B
and B'. Let PB and PB' be, respectively, the transition matrix from B coordinates to
standard coordinates in Rn and the transition matrix from B' coordinates to standard
coordinates in Rm. (So PB is an n × n matrix having the basis vectors of B as columns
and PB' is an m × m matrix having the basis vectors of B' as columns.) Then we know
that for any v ∈ Rn, v = PB [v]B. Similarly, for any u ∈ Rm, u = PB' [u]B', so
[u]B' = PB'^(-1) u. We therefore have (taking u = T(x))

[T(x)]B' = PB'^(-1) T(x).

Now, for any x ∈ Rn (in standard coordinates), T(x) = AT x where

AT = (T(e1) T(e2) . . . T(en))

is the matrix corresponding to T. So we have (taking v = x)

[T(x)]B' = PB'^(-1) T(x) = PB'^(-1) AT x = PB'^(-1) AT PB [x]B = (PB'^(-1) AT PB) [x]B.

Since this is true for all x, we have therefore obtained the following result.
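The formula AT[B, B'] = PB'^(-1) AT PB can be sketched numerically. The transformation and the two bases below are hypothetical choices (not the ones from the text); the check confirms that for any x, [T(x)]B' = AT[B, B'] [x]B.

```python
import numpy as np

# Change of basis for a linear transformation T : R^2 -> R^3.
A_T = np.array([[1.0, 0.0],
                [0.0, 2.0],
                [1.0, 1.0]])              # T in standard coordinates
P_B = np.array([[1.0, 1.0],
                [0.0, 1.0]])              # basis B of R^2 as columns
P_Bp = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])        # basis B' of R^3 as columns

M = np.linalg.solve(P_Bp, A_T @ P_B)      # M = P_{B'}^(-1) A_T P_B

# Consistency check: for any x, [T(x)]_{B'} = M [x]_B.
x = np.array([3.0, -2.0])
x_B = np.linalg.solve(P_B, x)
Tx_Bp = np.linalg.solve(P_Bp, A_T @ x)
assert np.allclose(M @ x_B, Tx_Bp)
print("change of basis consistent")
```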
Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by a linear transformation and be able to prove a given
mapping is linear
explain what is meant by the range and null space, and rank and nullity of a linear
transformation
know the dimension theorem (the rank-nullity theorem) for linear transformations
and be able to apply it
comprehend the two-way relationship between matrices and linear transformations
find the matrix representation of a transformation with respect to two given bases
know how to change between different bases of a vector space
know what it means to say that two square matrices are similar.
Exercises
Exercise 9.1
Find bases for the null space and range of the linear transformation T : R3 → R3 given
by

    [ x1 ]     [ x1 + x2 + 2x3 ]
T   [ x2 ]  =  [    x1 + x3    ].
    [ x3 ]     [ 2x1 + x2 + 3x3 ]
Verify the dimension theorem. Is T invertible?
Exercise 9.2
Let T and S be linear transformations of R2 → R2 with respective matrices:

     [ 1/√2  -1/√2 ]          [ -1 0 ]
AT = [ 1/√2   1/√2 ] ,   AS = [  0 1 ].
Sketch the vectors e1 and e2 in the xy-plane, and sketch the unit square. Describe the
effect of T in words, and illustrate it using the unit square, by adding the images T (e1 )
and T (e2 ) to your sketch (and filling in the image of the unit square). Do the same for
the linear transformation S.
Now consider the composed linear transformations ST and T S. Illustrate the effect of
ST and T S using the unit square (by first performing one linear transformation and
then the other). Then calculate their matrices to check that ST ≠ TS.
Exercise 9.3
Let T : R2 → R3 be the linear transformation given by

    [ x1 ]     [      x2      ]
T   [ x2 ]  =  [ 5x1 + 13x2 ].
               [ 7x1 + 16x2 ]

Find the matrix AT[B, B'] of T with respect to the bases B = {(3, 1)T, (5, 2)T} and
B' = {(1, 0, 1)T, (1, 2, 2)T, (0, 1, 2)T}.
Exercise 9.4
Consider the vectors

v1 = (1, 0, -1)T ,   v2 = (1, 1, 2)T ,   v3 = (0, 1, 5)T   and   u = (-1, 2, 3)T.

Show that B = {v1, v2, v3} is a basis of R3 and find the coordinates of u with respect
to this basis.
Let S : R3 → R3 be a linear transformation such that

S(v1) = e1 ,   S(v2) = e2 ,   S(v3) = e3.

Find S(u).
Find a basis of N(T ). State the dimension theorem for linear transformations and use
it to verify that such a linear transformation T can exist.
If T maps e1 , e2 R3 to the standard basis vectors in R2 , find a matrix AT such that
T (x) = AT x. Write down an expression for T (x) as a vector in R2 in terms of x, y, z.
Exercise 9.6
If S and T are the linear transformations given in the previous two questions, decide
which composed linear transformation, ST or T S is defined and find its corresponding
matrix.
Exercise 9.7
Show that each of the sets B and B' is a basis of R3:
1
0
1
0
1
1
B = 010
and
B = 1 1 1
1
3
1
1
0
1
Write down the transition matrix P from B coordinates to standard coordinates. Write
down the transition matrix Q from B' coordinates to standard coordinates.

Find the transition matrix from B' coordinates to B coordinates.

If [x]B' = (2, 1, 3)T_B' , find [x]B.
as required.
Feedback to activity 9.3
Rotation by π radians is given by the matrix A, whereas reflection in the y-axis is given
by the matrix B:

    [ -1  0 ]         [ -1 0 ]
A = [  0 -1 ] ,   B = [  0 1 ].
Feedback to activity 9.6
This is just definition-chasing. By definition, TA is the mapping given by TA (x) = Ax
and
R(TA ) = {TA (x) : x V } = {Ax : x V } = R(A),
N(TA ) = {x V : TA (x) = 0} = {x V : Ax = 0} = N(A).
Feedback to activity 9.7
This is very similar to the proofs in the previous chapter that, for a matrix A, R(A) and
N(A) are subspaces.
First, we show R(T ) is a subspace of W . Note that it is non-empty since T (0) = 0 and
hence it contains 0. (The fact that T (0) = 0 can be seen in a number of ways. For
instance, take any x ∈ V. Then T(0) = T(0x) = 0T(x) = 0.) We need to show that if
u, v ∈ R(T) then u + v ∈ R(T) and, for any α ∈ R, αv ∈ R(T). Suppose u, v ∈ R(T).
Then for some y1, y2 ∈ V, u = T(y1), v = T(y2). Now,

u + v = T(y1) + T(y2) = T(y1 + y2),

and so u + v ∈ R(T). Next,

αv = α(T(y1)) = T(αy1),

so αv ∈ R(T).

Now consider N(T). It is non-empty because the fact that T(0) = 0 shows 0 ∈ N(T).
Suppose u, v ∈ N(T) and α ∈ R. Then to show u + v ∈ N(T) and αu ∈ N(T), we must
show that T(u + v) = 0 and T(αu) = 0. We have

T(u + v) = T(u) + T(v) = 0 + 0 = 0

and

T(αu) = α(T(u)) = α0 = 0,

so we have shown what we needed.
Comments on exercises
Solution to exercise 9.1
The matrix AT and its reduced row echelon form are

     [ 1 1 2 ]      [ 1 0 1 ]
AT = [ 1 0 1 ]  →   [ 0 1 1 ].
     [ 2 1 3 ]      [ 0 0 0 ]

A basis for the null space is {(-1, -1, 1)T}, and a basis for the range is

{ (1, 1, 2)T , (1, 0, 1)T }.

There are other possible answers. To verify the dimension theorem,

rank(T) + nullity(T) = 2 + 1 = 3 = dim(R3).

This linear transformation is not invertible, as AT^(-1) does not exist.
Solution to exercise 9.2
To sketch the effect of T on the unit square, mark off a unit square on a set of axes.
Mark the unit vector in the x direction, e1 , in one colour, and the unit vector in the y
direction, e2 in another colour (or differentiate between them by single and double
arrowheads). Now draw the vector images of these, T (e1 ) and T (e2 ), in the same
colours, and complete the image of the unit square with these vectors as its two
corresponding sides.
The linear transformation T is rotation anticlockwise by π/4 radians.
Sketch the unit square and its image under S in the same way. The linear
transformation S is reflection in the y-axis.
The linear transformation ST is first do T and then do S, since ST (v) = S(T (v)).
Think about what this means geometrically. First rotate the unit square by π/4 and then
reflect this in the y-axis. The linear transformation TS is very different: first reflect in
the y-axis, and then rotate by π/4.
For the illustrations of ST and T S using the unit square, continue as above. Draw a
unit square marking e1 and e2 distinctively on it. For ST , first find the images under T ,
T (e1 ) and T (e2 ) and mark these, then find the images of these vectors under S. Now
complete the unit square. When you have completed both sketches, you will find the
images under ST and T S in very different positions.
Calculating their matrices will verify that ST ≠ TS:

AST = AS AT = [ -1 0 ] [ 1/√2  -1/√2 ]   [ -1/√2  1/√2 ]
              [  0 1 ] [ 1/√2   1/√2 ] = [  1/√2  1/√2 ]

ATS = AT AS = [ 1/√2  -1/√2 ] [ -1 0 ]   [ -1/√2  -1/√2 ]
              [ 1/√2   1/√2 ] [  0 1 ] = [ -1/√2   1/√2 ].
The columns of AST should be ST (e1 ) and ST (e2 ). Check that this matches your
sketch. Do the same for T S.
Solution to exercise 9.3
The answer, using the notation in the final section of the chapter, is
AT[B, B'] = PB'^(-1) AT PB. Now,
0
1
3 5
AT = 5 13 ,
PB =
1 2
7 16
and
PB1
1 2
1 1 0
3
= 0
2 1 = 31
2
1 2 2
3
2
3
2
3
13
13
13
2
3
where we have omitted the details of the calculation of this inverse. It then follows that
1
3
AT [B, B ] = PB1 AT PB = 0
1 .
2 1
Solution to exercise 9.4
Write the vectors vi as the columns of a matrix,

     [  1 1 0 ]
PB = [  0 1 1 ].
     [ -1 2 5 ]

Since |PB| = 2 ≠ 0, the columns are linearly independent and hence a basis of R3. PB is
the transition matrix from B coordinates to standard coordinates, v = PB [v]B. Finding
PB^(-1) by the cofactor method, or otherwise, the B coordinates of u are

                          [  3 -5  1 ] [ -1 ]     [ -5 ]
[u]B = PB^(-1) u = (1/2)  [ -1  5 -1 ] [  2 ]  =  [  4 ]  .
                          [  1 -3  1 ] [  3 ]     [ -2 ]_B

Then

S(u) = S(-5v1 + 4v2 - 2v3) = -5S(v1) + 4S(v2) - 2S(v3) = -5e1 + 4e2 - 2e3 = (-5, 4, -2)T.

Since R(S) is spanned by {e1, e2, e3}, R(S) = R3 and N(S) = {0}. The linear
transformation S is the inverse of the linear transformation T with T(e1) = v1,
T(e2) = v2, T(e3) = v3, which has matrix PB, so the matrix AS = PB^(-1).
Solution to exercise 9.5
We have

N(T) = { (x, y, z)T : x = y = z } = { s (1, 1, 1)T : s ∈ R } ,

so that a basis of N(T) is the vector (1, 1, 1)T. The dimension theorem states that the
dimension of the range plus the dimension of the null space is equal to the dimension of
the domain, which is R3. We have nullity(T) = 1 and rank(T) = 2 since R(T) = R2.
That is, rank(T) + nullity(T) = 2 + 1 = 3, which confirms that T can exist. Given that
T(e1) and T(e2) are the standard basis vectors of R2, we have

AT = [ 1 0 -1 ]
     [ 0 1 -1 ],

where the last column is obtained from the basis of the null space, since
c1 + c2 + c3 = 0. Then

                       [ x ]
T(x) = [ 1 0 -1 ]      [ y ]   =   [ x - z ].
       [ 0 1 -1 ]      [ z ]       [ y - z ]
Solution to exercise 9.6
Only the linear transformation TS is defined, with ATS = AT AS:

                          [  3 -5  1 ]
ATS = [ 1 0 -1 ] (1/2)    [ -1  5 -1 ]   =   [  1 -1  0 ].
      [ 0 1 -1 ]          [  1 -3  1 ]       [ -1  4 -1 ]
P = 0
1
1 0
1 1
0
1 0 , Q = 1 1 1 .
3 1
1 0 1
1 1 0
1 1
0
0
2 1
P 1 Q = 0
1 0 1 1 1 = 1 1 1
1 2 1
1 0 1
2 1 3
2
1
0
2 1
[x]B = 1 1 1 1 = 4
3 B
12 B
2 1 3
You could check your solution by finding x in standard coordinates from both bases.
Chapter 10
Diagonalisation
Suggested reading
Definitions
10.1.2
Now, a square linear system Bx = 0 has solutions other than x = 0 precisely when
|B| = 0. Therefore, taking B = A - λI, λ is an eigenvalue of A if and only if the
determinant |A - λI| is equal to zero.
Example 10.1 Let

A = [ 7 -15 ].
    [ 2  -4 ]

Then

A - λI = [ 7 -15 ]  -  λ [ 1 0 ]  =  [ 7-λ   -15  ]
         [ 2  -4 ]       [ 0 1 ]     [  2   -4-λ ]

and

|A - λI| = (7 - λ)(-4 - λ) + 30 = λ² - 3λ - 28 + 30 = λ² - 3λ + 2.

So the eigenvalues are the solutions of λ² - 3λ + 2 = 0. To solve this for λ, one could
use either the formula for the solutions to a quadratic equation, or simply observe
that the characteristic polynomial factorises. We have (λ - 1)(λ - 2) = 0 with
solutions λ = 1 and λ = 2. Hence the eigenvalues of A are 1 and 2, and these are the
only eigenvalues of A.
Example 10.2 Let

A = [ 7 -15 ].
    [ 2  -4 ]

We have seen that the eigenvalues are 1 and 2. To find the eigenvectors for
eigenvalue 1 we solve the system (A - I)x = 0. We do this by putting the coefficient
matrix A - I into reduced echelon form:

(A - I) = [ 6 -15 ]  →  [ 1 -5/2 ].
          [ 2  -5 ]     [ 0   0  ]

So the eigenvectors for eigenvalue 1 are

x = t [ 5 ]   for any t ∈ R.
      [ 2 ]

Similarly, for eigenvalue 2,

(A - 2I) = [ 5 -15 ]  →  [ 1 -3 ],
           [ 2  -6 ]     [ 0  0 ]

so the eigenvectors are

x = t [ 3 ] ,   t ∈ R.
      [ 1 ]

Any non-zero scalar multiple of the vector (3, 1)T is an eigenvector of A for
eigenvalue 2.

Note that each system of equations is simple enough to be solved directly. For
example, if x = (x1, x2)T, the system (A - 2I)x = 0 consists of the equations

5x1 - 15x2 = 0 ,   2x1 - 6x2 = 0.

Clearly both equations are equivalent to x1 = 3x2. If we set x2 = t for any real
number t, then we obtain the eigenvectors for λ = 2 as before.
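The eigenvalue and eigenvector computations of Examples 10.1 and 10.2 can be verified numerically. The matrix entries and signs below are as reconstructed here, which is an assumption.

```python
import numpy as np

# Verify the eigenvalues and eigenvectors of Example 10.2.
A = np.array([[7.0, -15.0],
              [2.0,  -4.0]])

v1 = np.array([5.0, 2.0])   # eigenvector for eigenvalue 1
v2 = np.array([3.0, 1.0])   # eigenvector for eigenvalue 2
assert np.allclose(A @ v1, 1 * v1)
assert np.allclose(A @ v2, 2 * v2)

# np.linalg.eigvals returns the same eigenvalues (in some order).
assert np.allclose(sorted(np.linalg.eigvals(A).real), [1.0, 2.0])
print("eigen checks pass")
```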
So why do we prefer row operations? There are two reasons. The first reason is that the
system of equations may not be as simple as the one just given, particularly for an
n × n matrix where n > 2. The second reason is that putting the matrix A − λI into
echelon form provides a useful check on the eigenvalue. If |A − λI| = 0, the echelon form
of A − λI must have a row of zeros, and the system (A − λI)x = 0 will have a
non-trivial solution. If we have reduced the matrix (A − λ₀I) for some supposed
eigenvalue λ₀ and do not obtain a zero row, we know immediately that there is an error,
either in the row reduction or in the choice of λ₀, and we can go back and correct it.
Examples in R3
We now give two examples with 3 × 3 matrices.
Example 10.3
Suppose that
A = [ 4 0 4
      0 4 4
      4 4 8 ].
Find the eigenvalues of A and find the corresponding eigenvectors for each
eigenvalue.
|A − λI| = | 4−λ   0    4  |
           |  0   4−λ   4  |
           |  4    4   8−λ |
         = (4 − λ) | 4−λ   4  |  + 4 | 0  4−λ |
                   |  4   8−λ |      | 4   4  |
         = (4 − λ)((4 − λ)(8 − λ) − 16) + 4(−4(4 − λ))
         = (4 − λ)((4 − λ)(8 − λ) − 16) − 16(4 − λ).
We notice that each of the two terms in this expression has 4 − λ as a factor, so
instead of expanding everything, we take 4 − λ out as a common factor, obtaining
|A − λI| = (4 − λ)((4 − λ)(8 − λ) − 16 − 16)
         = (4 − λ)(λ² − 12λ + 32 − 32)
         = (4 − λ)(λ² − 12λ)
         = (4 − λ)λ(λ − 12).
It follows that the eigenvalues are 4, 0, 12. (The characteristic polynomial will not
always factorise so easily. Here it was simple because of the common factor 4 − λ.
The next example is more difficult.)
To find an eigenvector for λ = 4, we have to solve the equation (A − 4I)x = 0 for
x = (x1, x2, x3)ᵀ. Using row operations, we have
(A − 4I) = [ 0 0 4        [ 1 1 0
             0 0 4   → ⋯    0 0 1 .
             4 4 4 ]        0 0 0 ]
The solutions are
x = t [  1
        −1 ,   t ∈ R.
         0 ]
So the eigenvectors for λ = 4 are the non-zero multiples of
v1 = [  1
       −1 .
        0 ]
Activity 10.1  Determine the eigenvectors for λ = 0 and λ = 12. Check your answers: verify
that Av = λv for each eigenvalue and one corresponding eigenvector.
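The check Av = λv asked for in the activity is mechanical; a minimal sketch in plain Python, using one eigenvector per eigenvalue (the eigenvectors are the ones found in this section and its activity feedback):

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[4, 0, 4],
     [0, 4, 4],
     [4, 4, 8]]

# one eigenvector for each of the eigenvalues 4, 0, 12
pairs = [(4, [1, -1, 0]), (0, [1, 1, -1]), (12, [1, 1, 2])]

for lam, v in pairs:
    assert matvec(A, v) == [lam * x for x in v]
print("all three eigenpairs verified")
```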
Example 10.4
Let
A = [ −3 −1  2
       1 −1 −1
      −1 −1  0 ].
Then
|A − λI| = | −3−λ   −1     2 |
           |   1   −1−λ   −1 |
           |  −1    −1    −λ |
         = (−3 − λ) | −1−λ  −1 |  − (−1) |  1  −1 |  + 2 |  1  −1−λ |
                    |  −1   −λ |         | −1  −λ |      | −1   −1  |
         = (−3 − λ)(λ² + λ − 1) + (−λ − 1) − 2(2 + λ)
         = −λ³ − 4λ² − 5λ − 2
         = −(λ³ + 4λ² + 5λ + 2).
Now, the fact that −1 is an eigenvalue means that −1 is a solution of the equation
|A − λI| = 0, which means that λ − (−1), that is, λ + 1, is a factor of the
characteristic polynomial |A − λI|. So this characteristic polynomial can be written
in the form
−(λ + 1)(aλ² + bλ + c).
Clearly we must have a = 1 and c = 2 to obtain the correct λ³ term and the correct
constant. Using this, and comparing the coefficients of either λ² or λ with the cubic
polynomial, we find b = 3. In other words, the characteristic polynomial is
−(λ³ + 4λ² + 5λ + 2) = −(λ + 1)(λ² + 3λ + 2) = −(λ + 1)(λ + 2)(λ + 1).
Activity 10.2  Perform the calculations to check that b = 3 and that the
characteristic polynomial factorises as stated.
We have |A − λI| = −(λ + 1)²(λ + 2). The eigenvalues are the solutions to |A − λI| = 0,
so they are λ = −1 and λ = −2.
Note that in this case, there are only two distinct eigenvalues. We say that the
eigenvalue −1 has occurred twice, or that λ = −1 is an eigenvalue of multiplicity 2. We
will find the eigenvectors when we look at this example again in section 10.2.1.
10.1.3
Eigenspaces
D = diag(λ1, λ2, . . . , λn) = [ λ1  0  ⋯  0
                                 0   λ2 ⋯  0
                                 ⋮        ⋱ ⋮
                                 0   0  ⋯  λn ].
(Note the useful notation for describing the diagonal matrix D.) Then we have
AP = P D. If the columns of P are the vectors v1 , v2 , . . . , vn , then
AP = A(v1 . . . vn ) = (Av1 . . . Avn ),
and
P D = (v1 . . . vn) [ λ1  0  ⋯  0
                      0   λ2 ⋯  0
                      ⋮        ⋱ ⋮
                      0   0  ⋯  λn ]  = (λ1v1 . . . λnvn).
So AP = P D means that
Av1 = λ1v1,   Av2 = λ2v2,   . . . ,   Avn = λnvn.
The fact that P⁻¹ exists means that none of the vectors vi is the zero vector. So this
means that (for i = 1, 2, . . . , n) λi is an eigenvalue of A and vi is a corresponding
eigenvector. Since P has an inverse, these eigenvectors are linearly independent.
Therefore, A has n linearly independent eigenvectors. Conversely, if A has n linearly
independent eigenvectors, then the matrix P whose columns are these eigenvectors will
be invertible, and we will have P⁻¹AP = D where D is a diagonal matrix with entries
equal to the eigenvalues of A. We have therefore established the following result.
Theorem 10.1  An n × n matrix A is diagonalisable if and only if it has n linearly
independent eigenvectors.
Since n linearly independent vectors in Rⁿ form a basis of Rⁿ, another way to state this
theorem is:
Theorem 10.2  An n × n matrix A is diagonalisable if and only if there is a basis of
Rⁿ consisting of eigenvectors of A.
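Theorem 10.1 also explains when diagonalisation fails: with a repeated eigenvalue, the eigenspace may be too small. A minimal sketch (the matrix here is chosen for illustration and is not from the text) that counts independent eigenvectors via the nullity of A − λI:

```python
def rank2(M):
    """Rank of a 2x2 matrix with exact (integer/rational) entries."""
    a, b = M[0]
    c, d = M[1]
    if a * d - b * c != 0:
        return 2
    if any(x != 0 for x in (a, b, c, d)):
        return 1
    return 0

A = [[2, 1],
     [0, 2]]        # characteristic polynomial (2 - lam)^2: eigenvalue 2, multiplicity 2
lam = 2
M = [[A[0][0] - lam, A[0][1]],
     [A[1][0], A[1][1] - lam]]   # A - 2I = [[0, 1], [0, 0]]

nullity = 2 - rank2(M)   # dimension of the eigenspace for lam = 2
print(nullity)           # 1: only one independent eigenvector, so this A is
                         # not diagonalisable, by Theorem 10.1
```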
Suppose that this is the case, and let v1, . . . , vn be n linearly independent eigenvectors,
where vi is an eigenvector for eigenvalue λi. Then the vectors form a basis of Rⁿ, and
the matrix P = (v1 . . . vn) is such that P⁻¹ exists, and P⁻¹AP = D where
D = diag(λ1, . . . , λn).
This gives us a more sophisticated way to think about diagonalisation in terms of
change of basis and matrix representations of linear transformations. Suppose that
T = TA is the linear transformation corresponding to A, so that T (x) = Ax for all x.
Suppose that A has a set of n linearly independent eigenvectors B = {v1, v2, . . . , vn},
corresponding (respectively) to the eigenvalues λ1, . . . , λn. Then B is a basis of Rⁿ.
By Theorem 9.7 the matrix representing the linear transformation T with respect to the
basis B is AT [B, B] = P⁻¹AT P, where the columns of P are the basis vectors,
P = (v1 . . . vn).
P is the matrix whose columns are the basis of eigenvectors of A and AT is the matrix
representing T, which in this case is simply A itself, so that
P⁻¹AP = AT [B, B] = D.
In other words, the matrices A and D are similar. They represent the same linear
transformation, but A does so with respect to the standard basis and D represents T in
the basis of eigenvectors of A.
What does this tell us about the linear transformation T = TA? If x ∈ Rⁿ is any vector,
then its image under the linear transformation T is particularly easy to calculate in B
coordinates, where B is the basis of eigenvectors of A. That is, suppose the B
coordinates of x are [x]B = [b1, b2, . . . , bn]B. Then, since [T (x)]B = AT [B, B][x]B = D[x]B,
we have
[T (x)]B = [ λ1  0  ⋯  0    [ b1       [ λ1b1
             0   λ2 ⋯  0      b2    =    λ2b2
             ⋮        ⋱ ⋮     ⋮          ⋮
             0   0  ⋯  λn ]   bn ]       λnbn ].
Consider again
A = [ 4 0 4
      0 4 4 .
      4 4 8 ]
We have seen that it has three distinct eigenvalues 0, 4, 12. From the eigenvectors
we found we take one eigenvector corresponding to each of the eigenvalues:
v1 = [  1       v2 = [  1       v3 = [ 1
       −1  ,            1  ,           1 .
        0 ]            −1 ]            2 ]
We now form the matrix P whose columns are these eigenvectors, and the diagonal
matrix D with the corresponding eigenvalues:
P = [  1  1  1        D = [ 4 0  0
      −1  1  1  ,           0 0  0 .
       0 −1  2 ]            0 0 12 ]
You can choose any order for listing the eigenvectors as the columns of the matrix
P , as long as you write the corresponding eigenvalues in the corresponding columns
of D, that is, as long as the column orders in P and D match. (If, for example, we
had chosen P = (v2 v1 v3 ) then D = diag(0, 4, 12).)
As soon as you have written down the matrices P and D, you should check that
your eigenvectors are correct. That is, check that
AP = (Av1  Av2  Av3) = (λ1v1  λ2v2  λ3v3) = P D.
Activity 10.3 Carry out this calculation to check that the eigenvectors are correct,
that is, check that the columns of P are eigenvectors of A corresponding to the
eigenvalues 4, 0, 12.
Then, according to the theory, if P has an inverse, that is, if the eigenvectors are
linearly independent, then P⁻¹AP = D = diag(4, 0, 12).
Activity 10.4  Check that P is invertible. Then find P⁻¹ (the inverse may be
calculated using either elementary row operations or the cofactor method) and verify
that P⁻¹AP = D.
Note how important it is to have checked P first. Calculating the inverse of an incorrect
matrix P would have been a huge wasted effort.
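Activities 10.3 and 10.4 amount to the two checks below, sketched in plain Python: AP = P D confirms the eigenpairs, and |P| ≠ 0 confirms that P is invertible, so P⁻¹AP = D then follows from the theory without computing P⁻¹ entry by entry.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    a, b, c = M[0]
    return (a * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - b * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + c * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
P = [[1, 1, 1], [-1, 1, 1], [0, -1, 2]]   # columns v1, v2, v3
D = [[4, 0, 0], [0, 0, 0], [0, 0, 12]]

assert matmul(A, P) == matmul(P, D)   # the eigenpairs are correct
assert det3(P) != 0                   # P is invertible, hence P^(-1) A P = D
```

Only after both checks pass is it worth computing P⁻¹ explicitly, which is the point made above about wasted effort.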
Activity 10.5 Geometrically, how would you describe the linear transformation
TA (x) = Ax for this example?
10.2.1
The 2 × 2 matrix
A = [ 4  1
     −1  2 ]
has characteristic polynomial (4 − λ)(2 − λ) + 1 = (λ − 3)², so its only eigenvalue is
λ = 3, and solving (A − 3I)(x1, x2)ᵀ = (0, 0)ᵀ yields only one linearly independent
eigenvector.
λ1c1v1 + λ1c2v2 + · · · + λ1ckvk = 0.
Let
A = [ 3 1 1
      0 2 0 .
      1 1 3 ]
Expanding the determinant by the second row,
|A − λI| = | 3−λ   1    1  |
           |  0   2−λ   0  |
           |  1    1   3−λ |
         = (2 − λ) | 3−λ   1  |
                   |  1   3−λ |
         = (2 − λ)(λ² − 6λ + 9 − 1)
         = (2 − λ)(λ² − 6λ + 8)
         = (2 − λ)(λ − 4)(λ − 2)
         = −(λ − 2)²(λ − 4).
The eigenvalues are λ = 2, of multiplicity 2, and λ = 4. For λ = 2,
(A − 2I) = [ 1 1 1      [ 1 1 1
             0 0 0   →    0 0 0 .
             1 1 1 ]      0 0 0 ]
We see immediately that this matrix has rank 1, so its null space (the eigenspace for
= 2) will have dimension 2, and we can find a basis of this space consisting of two
linearly independent eigenvectors. Setting the non-leading variables equal to
arbitrary parameters s and t, we find that the solutions of (A 2I)x = 0 are
x = s [ −1      + t [ −1
         1             0   = sv1 + tv2,   s, t ∈ R,
         0 ]           1 ]
where v1 and v2 are two linearly independent eigenvectors for = 2.
Activity 10.6
Now, knowing that we will be able to diagonalise A, we find the eigenvector for λ = 4
by reducing (A − 4I).
(A − 4I) = [ −1  1  1        [ 1 0 −1
              0 −2  0   → ⋯    0 1  0
              1  1 −1 ]        0 0  0 ]
with solutions
x = t [ 1
        0 ,   t ∈ R.
        1 ]
Let
v3 = [ 1
       0 .
       1 ]
Then
P = [ 1 −1 −1        and  P⁻¹AP = D = [ 4 0 0
      0  1  0                           0 2 0 .
      1  0  1 ]                         0 0 2 ]
Activity 10.7  Check this! Check that AP = P D. Once you have checked that the
columns of P are the eigenvectors corresponding to the eigenvalues in the
corresponding columns of D, the theory will tell you that P⁻¹AP = D. Why?
Example 10.8  Consider again the last 3 × 3 example in section 10.1.2. We found
that the matrix
A = [ −3 −1  2
       1 −1 −1
      −1 −1  0 ]
has an eigenvalue λ1 = −1 of multiplicity 2, and a second eigenvalue, λ2 = −2. We
can find one (linearly independent) eigenvector corresponding to λ2 = −2. In order
to diagonalise A we would also need two linearly independent eigenvectors for
λ1 = −1, so we reduce (A + I):
(A + I) = [ −2 −1  2        [ 1 0 −1
             1  0 −1   → ⋯    0 1  0 .
            −1 −1  1 ]        0 0  0 ]
This matrix has rank 2 and the null space (the eigenspace for λ = −1) therefore (by
the rank-nullity theorem) has dimension 1. We can only find one linearly
independent eigenvector for λ = −1. All solutions of (A + I)x = 0 are of the form
x = t [ 1
        0 ,   t ∈ R.
        1 ]
So A has only two linearly independent eigenvectors in total, and A cannot be
diagonalised.
If A is the matrix
A = [ 0 −1
      1  0 ],
then
|A − λI| = | −λ  −1 |  = λ² + 1 = 0
           |  1  −λ |
has no real solutions. So A has no real eigenvalues, and it cannot be diagonalised
(as a real matrix).
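For any 2 × 2 matrix the characteristic polynomial is λ² − (tr A)λ + |A|, so the absence of real eigenvalues can be read off from the sign of the quadratic's discriminant. A minimal sketch for the matrix above:

```python
# A = [[0, -1], [1, 0]]: trace 0 and determinant 1, so |A - lam*I| = lam^2 + 1.
A = [[0, -1], [1, 0]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# characteristic polynomial: lam^2 - trace*lam + det; its discriminant:
disc = trace ** 2 - 4 * det
print(disc)   # -4: negative, so there are no real eigenvalues and A cannot
              # be diagonalised as a real matrix
```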
Learning outcomes
This chapter has discussed eigenvalues and eigenvectors and the very important
technique of diagonalisation. We shall see in the next chapter how useful a technique
diagonalisation is. At the end of this chapter and the relevant reading, you should be
able to:
state what is meant by the characteristic equation of a matrix
state carefully what is meant by eigenvectors and eigenvalues, and by
diagonalisation
find eigenvalues and corresponding eigenvectors for a square matrix
diagonalise a diagonalisable matrix
determine whether or not a matrix can be diagonalised
10.2. Exercises
Exercises
Exercise 10.1
Diagonalise the matrix
A = [ 7 2
      1 8 ],
that is, find an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D.
Check your answer.
Exercise 10.2
Find the eigenvalues of the matrix
0 2
A=
16 4
16 4
1
6
10
and find an eigenvector for each eigenvalue. Hence find an invertible matrix P and a
diagonal matrix D such that P⁻¹AP = D.
Exercise 10.3
Determine if either of the following matrices can be diagonalised:
A=
1 1
0 1
B=
1 1
1 1
Exercise 10.4
Let A be any (real) n × n matrix and suppose λ is an eigenvalue of A. Show that
{x : Ax = λx}, the set of eigenvectors for eigenvalue λ together with the zero vector 0,
is a subspace of Rⁿ.
Exercise 10.5
Let
1 1 2
A = 6 2 6
0 1 1
x = 0.
1
Exercise 10.6
Diagonalise the matrix A:
A = [ 0 0 −2
      1 2  1 .
      1 0  3 ]
Exercise 10.7
Suppose that A is a real diagonalisable matrix and that all the eigenvalues of A are
non-negative. Prove that there is a matrix B such that B² = A.
Feedback to activity 10.1
For λ = 0 we reduce the matrix A itself:
[ 4 0 4        [ 1 0 1
  0 4 4   → ⋯    0 1 1 .
  4 4 8 ]        0 0 0 ]
The solutions are
x = t [ −1
        −1 ,   t ∈ R,
         1 ]
so that the eigenvectors are non-zero multiples of v2 = (1, 1, −1)ᵀ. The eigenspace of
λ = 0 is the null space of the matrix A. Note that Av2 = 0v2 = 0.
Similarly, you should find that for λ = 12 the eigenvectors are non-zero multiples of
v3 = [ 1
       1 .
       2 ]
Feedback to activity 10.3
Perform the matrix multiplication to show that AP = P D.
Feedback to activity 10.4
You should find that
P⁻¹ = (1/6) [ 3 −3  0
              2  2 −2 .
              1  1  2 ]
Check that P P⁻¹ = I. You have calculated AP in the previous activity, so now just
multiply P⁻¹(AP) to obtain D.
Comments on exercises
Solution to exercise 10.1
First find the eigenvalues:
|A − λI| = | 7−λ   2  |
           |  1   8−λ |
         = (7 − λ)(8 − λ) − 2
         = λ² − 15λ + 54
         = (λ − 6)(λ − 9)
         = 0.
So λ = 6 and λ = 9 are the eigenvalues. Then find an eigenvector for each eigenvalue:
λ1 = 6:   A − λ1I = [ 1 2    → [ 1 2        so  v1 = [ −2
                      1 2 ]      0 0 ],                 1 ].
λ2 = 9:   A − λ2I = [ −2  2    → [ 1 −1        so  v2 = [ 1
                       1 −1 ]      0  0 ],                1 ].
A basis of eigenvectors is {v1, v2}. If
P = [ −2 1        and  D = [ 6 0
       1 1 ]                 0 9 ],
then P⁻¹AP = D. To check this, we have
P⁻¹ = (1/3) [ −1 1
               1 2 ]
and
P⁻¹AP = (1/3) [ −1 1   [ 7 2   [ −2 1
                 1 2 ]   1 8 ]    1 1 ]
      = (1/3) [ −1 1   [ −12 9         (since AP = (6v1  9v2))
                 1 2 ]     6 9 ]
      = [ 6 0
          0 9 ]  = D.
1
1 ,
2
1
2,
2
1
4.
0
1 1 1
P = 1 2 4 ,
2 2 0
D = diag(0, 6, 8).
1 1 2
1
1
Ax = 6 2 6
0 = 0 = 1x
0 1 1
1
1
1 1
0
P = 0 1 2 ,
D = diag(1, 2, 1).
1 1
1
This matrix A does not have 3 distinct eigenvalues. The eigenvalues turn out to be 1
and 2, with 2 occurring twice (an eigenvalue of multiplicity two). So we first check that
we can find two linearly independent eigenvectors for λ = 2.
You should find that the eigenspace for λ = 2 is two-dimensional and has a basis
consisting of (−1, 0, 1)ᵀ and (0, 1, 0)ᵀ. An eigenvector for 1 is (−2, 1, 1)ᵀ. These three
vectors form a linearly independent set. Therefore we may take
P = [ −2 −1 0
       1  0 1 ,   D = diag(1, 2, 2).
       1  1 0 ]
You should check your result by calculating AP and verifying that AP = P D.
Solution to exercise 10.7
Since A can be diagonalised, we have P⁻¹AP = D for some P, where
D = diag(λ1, . . . , λn), these entries being the eigenvalues of A. It is given that all λi ≥ 0.
We have A = P DP⁻¹.
If B² = A we must have
D = P⁻¹AP = P⁻¹B²P = P⁻¹BP P⁻¹BP = (P⁻¹BP)².
Therefore, let
B = P diag(√λ1, √λ2, . . . , √λn) P⁻¹.
Then
B² = P diag(√λ1, √λ2, . . . , √λn) P⁻¹ P diag(√λ1, √λ2, . . . , √λn) P⁻¹
   = P diag(λ1, λ2, . . . , λn) P⁻¹
   = P DP⁻¹ = A,
as required.
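The construction in this solution can be carried out numerically. A sketch using the diagonalisation found in Exercise 10.1 (A = (7 2; 1 8), eigenvalues 6 and 9), checking B² = A up to floating-point rounding:

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[7.0, 2.0], [1.0, 8.0]]        # eigenvalues 6 and 9
P = [[-2.0, 1.0], [1.0, 1.0]]       # columns: eigenvectors for 6 and 9
Pinv = [[-1/3, 1/3], [1/3, 2/3]]    # P^(-1)

S = [[math.sqrt(6.0), 0.0], [0.0, 3.0]]   # diag(sqrt(6), sqrt(9))
B = matmul(matmul(P, S), Pinv)            # B = P diag(sqrt(lam_i)) P^(-1)

B2 = matmul(B, B)
assert all(abs(B2[i][j] - A[i][j]) < 1e-9 for i in range(2) for j in range(2))
```

The square root of a matrix produced this way is not unique (each √λi could be replaced by −√λi), which is consistent with the proof only asserting that *some* such B exists.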
Chapter 11
Applications of diagonalisation
Suggested reading
It is often useful, as we shall see in this chapter, to determine Aⁿ for a general integer n.
Diagonalisation helps here. If we can write P⁻¹AP = D, then A = P DP⁻¹ and so
Aⁿ = A · A · ⋯ · A                          (n times)
   = (P DP⁻¹)(P DP⁻¹) ⋯ (P DP⁻¹)            (n times)
   = P Dⁿ P⁻¹,
since all the intermediate products P⁻¹P cancel.
The product P DⁿP⁻¹ is easy to compute since Dⁿ is simply the diagonal matrix with
entries equal to the nth power of those of D.
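As a concrete check of Aⁿ = P DⁿP⁻¹, the sketch below uses exact rational arithmetic on the 2 × 2 matrix of Example 11.1 below (A = (1 4; 1/2 0), eigenvalues −1 and 2) and compares the diagonalisation formula against direct repeated multiplication:

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[F(1), F(4)], [F(1, 2), F(0)]]
P = [[F(2), F(4)], [F(-1), F(1)]]                 # columns: eigenvectors for -1 and 2
Pinv = [[F(1, 6), F(-4, 6)], [F(1, 6), F(2, 6)]]  # P^(-1) = (1/6)[[1,-4],[1,2]]

n = 5
Dn = [[F((-1) ** n), F(0)], [F(0), F(2 ** n)]]    # D^n = diag((-1)^n, 2^n)
An = matmul(matmul(P, Dn), Pinv)

# direct computation of A^n for comparison
direct = [[F(1), F(0)], [F(0), F(1)]]
for _ in range(n):
    direct = matmul(direct, A)

assert An == direct
```

Using `Fraction` keeps every entry exact, so the equality test is a genuine check rather than a floating-point approximation.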
Activity 11.1  Show that if
D = [ λ1  0  ⋯  0
      0   λ2 ⋯  0
      ⋮        ⋱ ⋮
      0   0  ⋯  λk ],
then
Dⁿ = [ λ1ⁿ  0   ⋯  0
       0    λ2ⁿ ⋯  0
       ⋮          ⋱ ⋮
       0    0   ⋯  λkⁿ ].
We give an illustrative example using a 2 × 2 matrix, but you should be able to carry
out the procedure for 3 × 3 matrices as well.
Example 11.1  Suppose that we want a matrix expression for the nth power of the
matrix
A = [ 1    4
      1/2  0 ].
The characteristic polynomial is λ² − λ − 2 = (λ + 1)(λ − 2), so the eigenvalues are
−1 and 2. For λ = −1,
A + I = [ 2    4    → [ 1 2
          1/2  1 ]      0 0 ],
so we may take the eigenvector (2, −1)ᵀ. For λ = 2,
A − 2I = [ −1    4    → [ 1 −4
            1/2 −2 ]      0  0 ],
so we may take (4, 1)ᵀ. Let P be the matrix whose columns are these eigenvectors.
Then
P = [ 2  4
     −1  1 ].
The inverse is
P⁻¹ = (1/6) [ 1 −4
              1  2 ].
Activity 11.2
We have P⁻¹AP = D = diag(−1, 2). The nth power of the matrix A is given by
Aⁿ = P Dⁿ P⁻¹ = (1/6) [  2 4   [ (−1)ⁿ  0    [ 1 −4
                        −1 1 ]    0     2ⁿ ]   1  2 ].
11.2.1
Suppose three sequences xt, yt and zt satisfy x0 = 12, y0 = 6, z0 = 6 and are related, for
t ≥ 0, as follows:
xt+1 = 5xt + 4zt          (11.1)
yt+1 = 5yt + 4zt          (11.2)
zt+1 = 4xt + 4yt + 9zt.   (11.3)
We cannot directly solve equation (11.1) for xt since we would need to know zt. On the
other hand, we can't work out zt directly from equation (11.2) or equation (11.3),
because to do so we would need to know yt! It seems impossible, perhaps, but there are
ways to proceed.
Note that this (coupled) system of difference equations can be written in matrix form as
[ xt+1      [ 5 0 4   [ xt
  yt+1   =    0 5 4     yt .
  zt+1 ]      4 4 9 ]   zt ]
That is,
xt+1 = Axt,
where
xt = [ xt           A = [ 5 0 4
       yt  ,              0 5 4 .
       zt ]               4 4 9 ]
The general system we shall consider will take the form xt+1 = Axt where A is an n × n
square matrix. We shall concentrate on 3 × 3 and 2 × 2 systems, though the method is
applicable to larger values of n.
We shall describe two techniques: one involving a change of variable, and the other
powers of matrices.
11.2.2
We can use diagonalisation as the key to a general method for solving systems of
difference equations. Given a system xt+1 = Axt, in which A is diagonalisable, we
perform a change of variable or change of coordinates, as follows. Suppose that
P⁻¹AP = D (where D is diagonal) and let
xt = P zt
or, equivalently, the new variable vector zt is zt = P⁻¹xt, so that the vector xt is in
standard coordinates and zt is in coordinates in the basis of eigenvectors. Then the
equation xt+1 = Axt becomes
P zt+1 = AP zt,
which means that
zt+1 = P⁻¹AP zt = Dzt,
which, since D is diagonal, is very easy to solve for zt. To find xt we then use the fact
that xt = P zt.
Example 11.2  Consider the system above, with
A = [ 5 0 4
      0 5 4 .
      4 4 9 ]
To use the technique, we need to diagonalise A. You should work through this
diagonalisation yourself. We'll omit the workings here, but if
P = [ 1  1  1
      1 −1  1 ,
     −1  0  2 ]
then
P⁻¹AP = D = diag(1, 5, 13).
Now let
zt = [ ut
       vt .
       wt ]
Then zt+1 = Dzt is
[ ut+1      [ 1 0  0    [ ut
  vt+1   =    0 5  0      vt ,
  wt+1 ]      0 0 13 ]    wt ]
so ut = u0, vt = 5ᵗv0, wt = 13ᵗw0. We need to find xt, yt, zt, and
xt = [ xt             [ 1  1  1   [ u0
       yt  = P zt =     1 −1  1     5ᵗv0  .
       zt ]            −1  0  2 ]   13ᵗw0 ]
But we have also to find out what u0, v0, w0 are. These are not given in the problem,
but x0, y0, z0 are, and we know that
[ x0        [ u0      [ 1  1  1   [ u0
  y0   = P    v0   =    1 −1  1     v0 .
  z0 ]        w0 ]     −1  0  2 ]   w0 ]
To find u0, v0, w0 we can either solve the linear system
P [ u0      [ x0      [ 12
    v0   =    y0   =    6
    w0 ]      z0 ]      6 ]
using row operations, or we can (though it involves more work) find out what P⁻¹ is
and use the fact that
[ u0            [ x0          [ 12
  v0   = P⁻¹      y0   = P⁻¹    6 .
  w0 ]            z0 ]          6 ]
Either way (and the working is again omitted), we find
[ u0      [ 4
  v0   =    3 .
  w0 ]      5 ]
Then
[ xt      [ 1  1  1   [ 4            [ 4 + 3(5ᵗ) + 5(13ᵗ)
  yt   =    1 −1  1     3(5ᵗ)    =     4 − 3(5ᵗ) + 5(13ᵗ) .
  zt ]     −1  0  2 ]   5(13ᵗ) ]      −4 + 10(13ᵗ) ]
Activity 11.3 Perform the omitted diagonalisation calculations required for the
example just given.
As this example demonstrates, solving a system of difference equations involves a lot of
work, but the good news is that it is just a matter of going through a definite (if
time-consuming) procedure.
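One way to gain confidence in the closed form of Example 11.2 is to iterate the original recurrences directly and compare; a short sketch:

```python
A = [[5, 0, 4],
     [0, 5, 4],
     [4, 4, 9]]

x = [12, 6, 6]   # (x0, y0, z0)
for t in range(1, 8):
    # one step of x_{t+1} = A x_t
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    closed = [4 + 3 * 5**t + 5 * 13**t,
              4 - 3 * 5**t + 5 * 13**t,
              -4 + 10 * 13**t]
    assert x == closed
print("closed form agrees with direct iteration for t = 1, ..., 7")
```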
11.2.3
Another way of looking at this problem is to notice that if xt+1 = Axt , then
xt = At x0 .
Activity 11.4
Show this.
This solution can be determined explicitly if we can find the tth power At of the matrix
A. As described in section 11.1, this can be done using diagonalisation of A.
Example 11.3 We solve the system of the above example using matrix powers.
The system is xt+1 = Axt where
A = [ 5 0 4
      0 5 4 .
      4 4 9 ]
Taking, as before,
P = [ 1  1  1
      1 −1  1 ,
     −1  0  2 ]
we have
P⁻¹AP = D = diag(1, 5, 13).
So A = P DP⁻¹ and Aᵗ = P DᵗP⁻¹. Now, as you can calculate (the details are
omitted here),
so
xt = Aᵗx0 = P DᵗP⁻¹x0.
Doing the multiplication (again, details omitted),
xt = P [ 1 0  0         [ 12      [ 4 + 3(5ᵗ) + 5(13ᵗ)
         0 5ᵗ 0    P⁻¹    6    =    4 − 3(5ᵗ) + 5(13ᵗ) ,
         0 0  13ᵗ ]       6 ]      −4 + 10(13ᵗ) ]
which is, of course, precisely the same answer as we obtained using the previous
method.
Activity 11.5
Note that although this technique is presented as being different from the one using a
change of variable, they are essentially the same. Here, as before, the matrix P⁻¹x0
represents the coordinates of the vector x0 (the initial conditions) in the basis of
eigenvectors of A (the columns of P). In both cases, diagonalisation enables us to solve
the system by a change of basis from the standard basis in Rⁿ to a basis consisting of
eigenvectors of the matrix A.
11.2.4
Markov chains
           From A   From B   From none
To A        0.70     0.15      0.30
To B        0.20     0.80      0.20
To none     0.10     0.05      0.50
For example, an interpretation of the second column is that during any given week
supermarket B will keep 80% of its customers while losing 15% to supermarket A
and 5% to no supermarket. Suppose that at the end of a certain week (call it week
zero) it is known that the total population of T = 20,000 shoppers was distributed
as follows: 10,000 (0.5T) went to supermarket A; 8,000 (0.4T) went to
supermarket B; and 2,000 (0.1T) did not go to a supermarket.
Let xt denote the percentage of total shoppers going to supermarket A in week t, yt
the percentage going to supermarket B, and zt the percentage who do not go to any
supermarket. The number of shoppers in week t can be predicted by this model from
the numbers in the previous week, that is,
xt = Axt−1,   where  A = [ 0.70 0.15 0.30        xt = [ xt
                           0.20 0.80 0.20  ,            yt
                           0.10 0.05 0.50 ]             zt ]
with x0 = 0.5, y0 = 0.4, z0 = 0.1. The questions we wish to answer are: Can we
predict from this information the number of shoppers at each supermarket in any
future week t?, and Can we predict a long-term distribution of shoppers?
This is an example of a Markov chain.
In general, a Markov chain or a Markov process is a closed system consisting of a
population which is distributed into n different states and which changes with time
from one distribution to another. The system is observed at scheduled times. It is
assumed that the probability that a given member will change from one state into
another, depending on the state it occupied at the previous observation, is known. The
system is then observed at a certain time, and the information is used to predict the
distribution of the system into its different states at a future time t.
The probabilities are listed in an n × n matrix A = (aij) where the entry aij is the
probability that a member of the population will change from state j into state i. Such
a matrix, called a transition matrix, has the following two properties:
(1) The entries of A are all non-negative.
(2) The sum of the entries in each column of A is equal to 1: a1j + a2j + · · · + anj = 1.
Property (2) follows from the assumption that all members of the population must be
in one of the n states at any given time.
The distribution vector (or state vector) for the time period t is the vector xt , whose ith
entry is the percentage of the population in state i at time t. The entries of xt sum to 1,
for the reason just given, that all members of the population are in one of the states at
any time. Our first goal is to find the state vector for any t, and to do this we need to
solve the difference equation
xt = Axt−1.
A solution of the difference equation is an expression for the distribution vector xt in
terms of the original information A and x0 , and so, as we have seen in the previous
section, the solution is xt = Aᵗx0.
Now assume that A can be diagonalised. If A has eigenvalues λ1, λ2, . . . , λn with
corresponding eigenvectors v1, v2, . . . , vn, then P⁻¹AP = D where P is the matrix of
eigenvectors of A and D is the corresponding diagonal matrix of eigenvalues.
The solution of the difference equation is
xt = Aᵗx0 = (P DᵗP⁻¹)x0.
If P⁻¹x0 = (b1, b2, . . . , bn)ᵀ, then
xt = P Dᵗ(P⁻¹x0) = [ |  |      |     [ λ1ᵗ 0   ⋯  0     [ b1
                     v1 v2  ⋯  vn      0   λ2ᵗ ⋯  0       b2
                     |  |      |  ]    ⋮          ⋱       ⋮
                                       0   0   ⋯  λnᵗ ]   bn ]
                  = b1λ1ᵗv1 + b2λ2ᵗv2 + · · · + bnλnᵗvn.
Activity 11.6  Check that if
P = [ 3 −3 −1        D = [ 1  0   0
      4  4  0  ,           0 0.6  0  ,
      1 −1  1 ]            0  0  0.4 ]
then P⁻¹AP = D.
Activity 11.7  Check that
P⁻¹x0 = P⁻¹ [ 0.5      [  0.125       [ b1
              0.4   =    −0.025   =     b2 .
              0.1 ]      −0.05  ]       b3 ]
Hence,
xt = 0.125 [ 3     − 0.025(0.6)ᵗ [ −3     − 0.05(0.4)ᵗ [ −1
             4                      4                     0
             1 ]                   −1 ]                   1 ]
and
q = lim (t→∞) xt = [ 0.375
                     0.500 .
                     0.125 ]
As the total number of shoppers is 20,000, the long-term distribution is predicted to be
20,000q: 7,500 to supermarket A; 10,000 to B; and 2,500 to no supermarket.
Activity 11.8
You will have noticed that an essential part of the solution of predicting a long-term
distribution for this example is the fact that the transition matrix A has an eigenvalue
λ = 1 (of multiplicity one), and that the other eigenvalues satisfy |λi| < 1. In this case,
as t increases, the distribution vector xt will approach the unique eigenvector q for
λ = 1 which is also a distribution vector, so that Aq = q. (The fact that the entries sum
to 1 makes q unique in this one-dimensional eigenspace.)
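This convergence can be observed numerically for the supermarket example: iterating xt = Axt−1 from x0 = (0.5, 0.4, 0.1)ᵀ approaches q = (0.375, 0.5, 0.125)ᵀ. A sketch:

```python
A = [[0.70, 0.15, 0.30],
     [0.20, 0.80, 0.20],
     [0.10, 0.05, 0.50]]

x = [0.5, 0.4, 0.1]
for _ in range(60):
    # one step of x_t = A x_{t-1}
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

q = [0.375, 0.500, 0.125]
assert all(abs(xi - qi) < 1e-9 for xi, qi in zip(x, q))

# q really is a fixed point: Aq = q
Aq = [sum(A[i][j] * q[j] for j in range(3)) for i in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(Aq, q))
```

Because the other eigenvalues are 0.6 and 0.4, the error shrinks by a factor of at least 0.6 per step, so 60 iterations are more than enough for the tolerance used.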
We would like to be able to know that this is the case for any Markov chain, but there
are some exceptions to this rule. A Markov chain is said to be regular if some integer
power of the transition matrix A has strictly positive entries, aij > 0 (so no zero
entries). In this case, there will be a long-term distribution as the following theorem
implies.
Theorem 11.1  If A is the transition matrix of a regular Markov chain, then λ = 1 is
an eigenvalue of multiplicity one and all other eigenvalues satisfy |λi| < 1.
We will not prove this theorem here; a proof can be found in texts on Markov chains.
However, we will prove a similar, but less strong result, which makes it clear that the
only thing that can go wrong is for the eigenvalue λ = 1 to have multiplicity greater
than 1. First we need a definition.
Definition 11.1 A matrix C is called a stochastic matrix if it has the following two
properties
(1) The entries of C are all non-negative.
(2) The sum of the entries in each row of C is equal to 1: ci1 + ci2 + · · · + cin = 1.
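Properties like these are easy to test programmatically. A sketch checking that the supermarket matrix is a transition matrix (non-negative entries, unit column sums — i.e. its transpose is stochastic in the sense of Definition 11.1):

```python
def is_transition_matrix(A, tol=1e-12):
    """Check non-negative entries and columns summing to 1."""
    n = len(A)
    if any(A[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    return all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
               for j in range(n))

A = [[0.70, 0.15, 0.30],
     [0.20, 0.80, 0.20],
     [0.10, 0.05, 0.50]]
assert is_transition_matrix(A)

# a matrix whose first column sums to 1.1 is rejected
assert not is_transition_matrix([[0.5, 0.2], [0.6, 0.8]])
```

A small tolerance is used for the column sums because the entries are floating-point numbers.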
w = (1/ui)u. Then Cw = λw, wi = 1, and |wk| ≤ 1 for 1 ≤ k ≤ n.
The theorem tells us that λ = 1 is an eigenvalue, but it might have multiplicity greater
than one, in which case either there would be more than one (linearly independent)
eigenvector corresponding to λ = 1, or the matrix might not be diagonalisable.
In order to obtain a long-term distribution we need to know that there is only one
(linearly independent) eigenvector for the eigenvalue λ = 1. So if the eigenvalue λ = 1 of
a transition matrix A of a Markov chain does have multiplicity 1, then Theorem 11.2
implies all the others will have |λi| < 1. There will be one corresponding eigenvector
which is also a distribution vector and, provided A can be diagonalised, we will know
that there is a long-term distribution. This is all we will need in practice.
Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
calculate the general nth power of a diagonalisable matrix using diagonalisation
solve systems of difference equations in which the underlying matrix is
diagonalisable, by using both the matrix powers method and the change of variable
method
know what is meant by a Markov chain and its properties, and be able to find the
long-term distribution.
Exercises
Exercise 11.1
Solve the following system of difference equations.
xt+1 = xt + 4yt
yt+1 = (1/2)xt,
given that x0 = y0 = 1000.
Exercise 11.2
Verify that the vector v is an eigenvector for the matrix M, where v and M are
M = [ 2 0 1        v = [ 1
      0 5 0 ,            0 ,
      1 0 2 ]            1 ]
and find the corresponding eigenvalue. Find the other eigenvalues of M. Hence find an
invertible matrix P and a diagonal matrix D such that P⁻¹MP = D.
Sequences xt , yt, zt are defined by x0 = 6, y0 = 1, z0 = 4 and
xt+1 = 2xt + zt
yt+1 = 5yt
zt+1 = xt + 2zt .
Using the preceding calculations to change variables, find formulae for xt , yt , and zt .
Exercise 11.3
Given that
1
3
1
1 , 0 , 1
1
1
0
266
1 2 6
A= 2
5
6 ,
2 2 3
xt = Axt−1,   where  A = [ 0.6 0.2        xt = [ xt
                           0.4 0.8 ] ,           yt ] .
Show this is a Markov process and interpret the yearly population movements from the
matrix A.
Find expressions for xt and yt at any future time t.
Determine the long-term population distribution; that is, find what happens to xt as
t .
Exercise 11.5
Consider the matrices
B = [ 7 2 2        xt = [ xt
      0 2 4 ,             yt .
      3 6 4 ]             zt ]
Let xt denote the number of shoppers going to the A-B supermarket at the end of
week t, yt the number going to the J-K supermarket, and zt the number who shop
locally. Assume that each member of the population goes shopping once a week.
Suppose that the probabilities a shopper will change from shopping in any of the
three places from one week to the next are given by the matrix A above, so that
xt = Axt1 .
Find the long-term population distribution of this system. State clearly the
eventual number of shoppers going to each of the three shopping areas.
Dᵗ(P⁻¹x0) = [ λ1ᵗ 0   ⋯  0     [ b1      [ b1λ1ᵗ
              0   λ2ᵗ ⋯  0       b2   =    b2λ2ᵗ
              ⋮          ⋱       ⋮         ⋮
              0   0   ⋯  λnᵗ ]   bn ]      bnλnᵗ ]
and
P (Dᵗ(P⁻¹x0)) = [ |  |      |     [ b1λ1ᵗ
                  v1 v2  ⋯  vn      b2λ2ᵗ    = b1λ1ᵗv1 + b2λ2ᵗv2 + · · · + bnλnᵗvn.
                  |  |      |  ]    ⋮
                                    bnλnᵗ ]
Comments on exercises
Solution to exercise 11.1
We solve this using matrix powers. We could, of course, use a change of variable
instead. Notice that the system can be written as
xt+1 = [ 1    4    xt ,   where  xt = [ xt
         1/2  0 ]                       yt ] .
This is xt+1 = Axt , where A is the matrix whose nth power we calculated in the
example given earlier in this chapter. The solution (using the nth power result obtained
earlier) is
xt = Aᵗx0 = (1/6) [ 2(−1)ᵗ + 4(2ᵗ)   −8(−1)ᵗ + 8(2ᵗ)    [ 1000
                    −(−1)ᵗ + 2ᵗ       4(−1)ᵗ + 2(2ᵗ) ]    1000 ]
           = [ −1000(−1)ᵗ + 2000(2ᵗ)
                500(−1)ᵗ + 500(2ᵗ)  ] .
That is,
xt = −1000(−1)ᵗ + 2000(2ᵗ),   yt = 500(−1)ᵗ + 500(2ᵗ).
Solution to exercise 11.2
Mv = 3v so the eigenvalue corresponding to v is 3. The characteristic polynomial turns
out to be (5 )(3 )(1 ) after factorisation, so the eigenvalues of M are 1, 3, 5.
Corresponding eigenvectors are (respectively) (1, 0, 1)T , v, (0, 1, 0)T . Then
P 1 MP = D where
1 1 0
P = 0 0 1 , D = diag(1, 3, 5).
1 1 0
We use the notation used earlier. The system of difference equations is xt+1 = Mxt .
Setting xt = P zt , it becomes zt+1 = Dzt , so
ut+1 = ut , vt+1 = 3vt , wt+1 = 5wt ,
therefore,
ut = u0 , vt = 3t v0 , wt = 5t w0 .
Now,
x0 = [ 6             [ 1 1 0   [ u0      [ u0 + v0
       1   = P z0 =    0 0 1     v0   =     w0      ,
       4 ]            −1 1 0 ]   w0 ]    −u0 + v0 ]
so u0 = 1, v0 = 5, w0 = 1. Hence
[ xt                 [ 1 + 5(3ᵗ)
  yt   = xt = P zt =   5ᵗ        .
  zt ]                −1 + 5(3ᵗ) ]
Note: the question explicitly says to use the diagonalisation to change variables in order
to solve the system of difference equations.
Solution to exercise 11.3
If
P = [ 1  3 −1
     −1  0  1 ,
      1 −1  0 ]
then
P⁻¹AP = diag(−3, 3, 3) = D.
The system of difference equations is xt+1 = Axt. Let zt = (ut, vt, wt)ᵀ be given by
zt = P⁻¹xt. Then the system is equivalent to zt+1 = Dzt, which is
ut+1 = −3ut,   vt+1 = 3vt,   wt+1 = 3wt,
so
ut = (−3)ᵗu0,   vt = 3ᵗv0,   wt = 3ᵗw0.
Now,
P⁻¹ = [ 1/3 1/3 1
        1/3 1/3 0 ,
        1/3 4/3 1 ]
so
z0 = P⁻¹x0 = [ 1/3 1/3 1   [ 1      [ 2/3
               1/3 1/3 0     1   =    2/3 .
               1/3 4/3 1 ]   0 ]      5/3 ]
Therefore
xt = P zt = [ 1  3 −1   [ (2/3)(−3)ᵗ      [  (2/3)(−3)ᵗ + (1/3)3ᵗ
             −1  0  1     (2/3)3ᵗ     =     −(2/3)(−3)ᵗ + (5/3)3ᵗ .
              1 −1  0 ]   (5/3)3ᵗ   ]        (2/3)(−3)ᵗ − (2/3)3ᵗ ]
Solution to exercise 11.4
This is a Markov process as it consists of a total population distributed into two states,
and the matrix A satisfies the criteria to be a transition matrix: (1) the entries are
positive and (2) the sum of the entries in each column is 1.
Interpreting the system, each year 40% of those living by the sea move to the oasis
(60% remain) and 20% of those living in the oasis move to the sea.
To solve the system we need to diagonalise the matrix A. First find the eigenvalues.
|A − λI| = | 0.6−λ   0.2  |  = 0.48 − 1.4λ + λ² − 0.08 = λ² − 1.4λ + 0.4 = (λ − 1)(λ − 0.4) = 0
           |  0.4   0.8−λ |
λ1 = 1:  Solving (A − I)v = 0,
A − I = [ −0.4  0.2    → [ 1 −1/2        so  v1 = [ 1
           0.4 −0.2 ]      0   0  ],               2 ].
λ2 = 0.4:
A − 0.4I = [ 0.2 0.2    → [ 1 1        so  v2 = [ −1
             0.4 0.4 ]      0 0 ],                 1 ].
With
P = [ 1 −1        we have  P⁻¹ = (1/3) [  1 1
      2  1 ]                            −2 1 ],
and, with x0 = (0.5, 0.5)ᵀ,
xt = P [ 1    0       P⁻¹ x0 ,
         0  (0.4)ᵗ ]
which gives
xt = 1/3 + (1/6)(0.4)ᵗ,   yt = 2/3 − (1/6)(0.4)ᵗ.
Av = (λ/10)v,
so A and B have the same eigenvectors, v, and λ/10 is the corresponding eigenvalue of
A.
(ii) A is the transition matrix of a Markov chain because:
1. All the entries are non-negative (aij ≥ 0).
2. The sum of the entries in each column is 1.
Since λ = 1 is an eigenvalue of a Markov chain, we can deduce that 10λ = 10 is an
eigenvalue of B.
(iii) To find an eigenvector for λ = 10, we solve (B − 10I)x = 0.
(B − 10I) = [ −3  2  2        [ 1 0 −1
               0 −8  4   → ⋯    0 1 −1/2 .
               3  6 −6 ]        0 0  0   ]
So an eigenvector for λ = 10 is v1 = [ 2
                                       1 .
                                       2 ]
To find the other eigenvalues, we find the characteristic equation. Expanding the
determinant by the first column,
|B − λI| = | 7−λ   2    2  |
           |  0   2−λ   4  |  = (7 − λ)(λ² − 6λ − 16) + 3(2λ + 4) = 0.
           |  3    6   4−λ |
Factoring the quadratics, there is a common factor of λ + 2 in the two terms, which can
be factored out, avoiding a cubic equation. We have
|B − λI| = (λ + 2)[(7 − λ)(λ − 8) + 6] = −(λ + 2)(λ² − 15λ + 50) = −(λ + 2)(λ − 10)(λ − 5).
So the eigenvalues are λ = 10, 5, −2.
We then find the corresponding eigenvectors. Solving (B − 5I)v = 0,
(B − 5I) = [ 2  2  2        [ 1  1  1        [ 1 0  7/3
             0 −3  4   → ⋯    0 −3  4   → ⋯    0 1 −4/3 .
             3  6 −1 ]        0  0  0 ]        0 0  0   ]
So an eigenvector for λ = 5 is v2 = [ −7
                                       4 .
                                       3 ]
For = 2, we have
9 2 2
1 2 2
1 0 0
(B + 2I) = 0 4 4 . . . 0 1 1 0 1 1
3 6 6
0 0 0
0 0 0
0
So an eigenvector for = 2 is v3 = 1 .
1
If
P = [ 2 −7  0              D = [ 10 0  0
      1  4 −1   and              0  5  0 ,
      2  3  1 ]                  0  0 −2 ]
then P⁻¹BP = D.
To check:
BP = [ 7 2 2   [ 2 −7  0      [ 20 −35  0
       0 2 4     1  4 −1   =    10  20  2   = P D.
       3 6 4 ]   2  3  1 ]      20  15 −2 ]
Why are you being asked to check? So that you know you do have the correct
eigenvalues and eigenvectors. This gives you an opportunity to look for and correct any
minor mistakes you may have made.
(iv) You are now being asked to apply the diagonalisation to the given Markov chain.
The eigenvalues and corresponding eigenvectors of A are = 1 with eigenvector v1 ,
= 0.5 with corresponding eigenvector v2 and = 0.2 with corresponding eigenvector
v3 .
The eigenvector for λ = 1 whose entries sum to 1 is (1/5)(2, 1, 2)ᵀ, so the long-term
distribution is
(1/5) [ 2                  [ 40,000
        1   × 100,000   =    20,000 .
        2 ]                  40,000 ]
That is, 40,000 shoppers will go to the A-B supermarket, 20,000 will go to the J-K
supermarket, and 40,000 will shop locally.
Notice that you did not need to find the solution to xt = Axt-1. This would have been a
perfectly acceptable method, but one which would take much more time. You only
needed to know that since (0.5)^t → 0 and (0.2)^t → 0 as t → ∞, the eigenvector
corresponding to λ = 1 will give the long-term distribution. It must be a distribution
vector, that is, the components of the column vector must sum to 1, so that when
multiplied by the total population of 100,000 it gives the distribution of shoppers.
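The normalisation step can be sketched in a few lines of Python (an optional check, using exact fractions so no rounding occurs): scale the λ = 1 eigenvector so its components sum to 1, then multiply by the population.

```python
from fractions import Fraction

v1 = [2, 1, 2]                               # eigenvector for eigenvalue 1
total = 100_000                              # total number of shoppers
dist = [Fraction(x, sum(v1)) for x in v1]    # distribution vector (2/5, 1/5, 2/5)
shoppers = [int(total * p) for p in dist]
print(shoppers)  # [40000, 20000, 40000]
```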
Appendix A
Sample examination paper
Important note: This Sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2011/2012. The
intended format and structure of the examination may have changed since the
publication of this subject guide. You can find the most recent examination papers on
the VLE where all changes to the format of the examination are posted.
  x1 = (1, 2, 1, 4)^T ,   x2 = (2, 0, 3, 5)^T ,   x3 = (2, 1, 7, 3)^T ,   x4 = (2, 5, 6, 6)^T .

Express x4 as a linear combination of the other three vectors.
2(a) A system of linear equations Ax = d is known to have the following solution:

      [  1 ]     [ -2 ]     [  1 ]
      [  2 ]     [  1 ]     [ -1 ]
  x = [  0 ] + s [  1 ] + t [  0 ] .
      [ -1 ]     [  0 ]     [ -1 ]
      [  0 ]     [  0 ]     [  1 ]
(b) Given the difference equation

  xt+1 = axt - (a/4)xt-1 ,

where a is a constant, find a …
(c) An investor saves money in a bank account paying interest at a fixed rate of 5%,
where the interest is paid once per year, at the end of the year. She makes an
initial deposit of $20,000 and then, at the end of each of the next N years, just
after the interest has been paid, she withdraws an amount of $500. Find an
expression, in terms of N, for the amount of money in the account at the end of N
years, just after the Nth withdrawal has been made.
3(a) Consider the set

  H = { (2t, t, 3t)^T : t ∈ R } .
Prove that the set H is closed under addition and scalar multiplication. Hence, or
otherwise, prove that it is a subspace of R3 .
Show that every vector w ∈ H is a unique linear combination of the vectors

  v1 = (1, 0, -1)^T   and   v2 = (0, 1, 5)^T .
Answer the following questions, justifying your answers.
(1) Is {v1 , v2 } a basis of the subspace H? If yes, state why. If no, write down a
basis of H. State the dimension of H.
(2) Find a Cartesian equation for the subspace G = Lin{v1 , v2 }. Is {v1 , v2 } a
basis of G? Why or why not?
(b) State the dimension (rank-nullity) theorem for a linear transformation,
T : V W , from a finite dimensional vector space V to a vector space W ,
carefully defining each term.
Let {e1, e2, e3, e4} be the standard basis of R4, and let v1, v2, v3, x be the
following vectors in R3 (where x, y, z are constants):

  v1 = (1, 0, -2)^T ,   v2 = (2, 3, -1)^T ,   v3 = (1, 5, 3)^T ,   x = (x, y, z)^T .
Let T be a linear transformation, T : R4 → R3, given by
T (e1 ) = v1 ,
T (e2 ) = v2 ,
T (e3 ) = v3 ,
T (e4 ) = x.
(i) Suppose the vector x is such that the linear transformation T has
dimR(T ) = dimN(T ).
Write down a condition that the components of x must satisfy for this to happen.
Find a basis of R(T ) in this case.
(ii) Suppose the vector x is such that the linear transformation T has
dimN(T ) = 1.
Write down a condition that the components of x must satisfy for this to happen.
Find a basis of N(T ) in this case.
4 Suppose

  A = [  -1  -2  -1 ]
      [   4  -4  -8 ] .
      [ -13  -2  11 ]

…

5(a) Let

  A = [  1  4  5  3  2 ]
      [  0  2  4  2  2 ]    and   b = (11, 2, 6)^T .
      [ -1  1  5  0  1 ]
Write down the coordinates of the vector b in this basis; that is, write down [b]B .
(b) Show that the set S = {c1 , c3 , c4 } is also a basis of R3 .
Find the transition matrix P from coordinates in the basis B to coordinates in the
basis S.
Hence, or otherwise, find [b]S , the coordinates of the vector b in the basis S.
Appendix B
Commentary on the Sample
examination paper
General remarks
We start by emphasising that candidates should always include their working. This
means two things. First, you should not simply write down the answer in the
examination script, but should explain the method by which it is obtained. Second, you
should include rough working. The Examiners want you to get the right answers, of
course, but it is more important that you demonstrate that you know what you are
doing: that is what is really being examined.
We also stress that if a candidate has not completely solved a problem, they may still
be awarded marks for a partial, incomplete or slightly wrong solution; but if they have
written down a wrong answer and nothing else, no marks can be awarded.
Solutions to questions
Question 1(a) Since you are asked to use matrix methods, begin by thinking of the
system of equations in matrix form, as Ax = b with
      [ 1   1   2 ]
  A = [ 3   1  -1 ] ,    x = (x, y, z)^T ,    b = (4, 0, b)^T .
      [ 1  -1   a ]
Read through the question to know all that is being asked. There are different
approaches you can take to start.
The most efficient method is to write down the augmented matrix and begin to row
reduce it,
          [ 1   1   2 |  4 ]  R2 - 3R1    [ 1   1    2  |   4  ]  R3 - R2    [ 1   1    2  |   4  ]
  (A|b) = [ 3   1  -1 |  0 ]  R3 - R1  →  [ 0  -2   -7  |  -12 ]     →      [ 0  -2   -7  |  -12 ] .
          [ 1  -1   a |  b ]              [ 0  -2  a-2  |  b-4 ]            [ 0   0  a+5  |  b+8 ]
You are now in a position to answer the questions asked in the order in which they were
asked.
The system will be consistent with infinitely many solutions if and only if the last row
of the row echelon form is a row of zeros, so a = -5 and b = -8. It will have a unique
solution if and only if a + 5 ≠ 0, so a ≠ -5. It will be inconsistent (no solution) if and
only if a + 5 = 0 and b + 8 ≠ 0, that is a = -5 and b ≠ -8.
Alternatively, you can begin by evaluating the determinant of A, for example, using the
cofactor expansion by row 3,
  | 1   1   2 |
  | 3   1  -1 |  = 1(-1 - 2) - (-1)(-1 - 6) + a(1 - 3) = -10 - 2a.
  | 1  -1   a |

The system will have a unique solution if and only if |A| ≠ 0, so a ≠ -5. If a = -5
there will either be infinitely many solutions or no solution, depending on the value of b.
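The determinant formula |A| = -10 - 2a can be spot-checked numerically (an optional sketch, not examination working), using a small cofactor-expansion helper:

```python
# 3x3 determinant by cofactor expansion along the first row.
def det3(M):
    return (M[0][0] * (M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1] * (M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2] * (M[1][0]*M[2][1] - M[1][1]*M[2][0]))

for a in (-5, 0, 4):
    A = [[1, 1, 2], [3, 1, -1], [1, -1, a]]
    print(a, det3(A), -10 - 2*a)   # the last two numbers agree
```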
To answer the remaining questions, you still need to row reduce the augmented matrix,
but this time you can do it with a = -5,

  [ 1   1   2 |  4 ]  R2 - 3R1    [ 1   1   2 |   4  ]
  [ 3   1  -1 |  0 ]  R3 - R1  →  [ 0  -2  -7 |  -12 ] .
  [ 1  -1  -5 |  b ]              [ 0  -2  -7 |  b-4 ]

Comparing the last two rows, you can see that the system will be inconsistent if
b - 4 ≠ -12, that is if b ≠ -8 (and a = -5), and that there will be infinitely many
solutions if b = -8 and a = -5.
(b) If you have successfully solved part (a) of this question, then the easiest way to
solve the system with a = 4 and b = 1 is to substitute these values into the row echelon
form of the augmented matrix and continue reducing:
  [ 1   1    2  |   4  ]   [ 1   1   2 |   4 ]   [ 1   1   2 |   4 ]
  [ 0  -2   -7  |  -12 ] = [ 0  -2  -7 | -12 ] → [ 0  -2  -7 | -12 ]
  [ 0   0  a+5  |  b+8 ]   [ 0   0   9 |   9 ]   [ 0   0   1 |   1 ]

    [ 1   1   0 |  2 ]   [ 1   1   0 |  2  ]   [ 1   0   0 | -1/2 ]
  → [ 0  -2   0 | -5 ] → [ 0   1   0 | 5/2 ] → [ 0   1   0 |  5/2 ] .
    [ 0   0   1 |  1 ]   [ 0   0   1 |  1  ]   [ 0   0   1 |   1  ]
The unique solution is x = (x, y, z)^T = (-1/2, 5/2, 1)^T. (It is easy for you to check that
this is correct by substituting the values into the equations.)
You could also solve this system using the inverse matrix or Cramer's rule. These are
covered in Chapter 4 of the subject guide. It is a good idea for you to practise these
methods by solving this system to obtain the same answer.
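Substituting the solution back into the original equations, with a = 4 and b = 1, takes only a moment. Here is an optional sketch using exact fractions:

```python
from fractions import Fraction as F

x, y, z = F(-1, 2), F(5, 2), F(1)
print(x + y + 2*z == 4,      # first equation
      3*x + y - z == 0,      # second equation
      x - y + 4*z == 1)      # third equation, with a = 4, b = 1
```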
(c) Linear independence is covered in Chapter 8 of the subject guide.
A set {x1 , x2 , . . . , xk } of vectors in Rn is linearly dependent if there are real numbers
a1 , a2 , . . . , ak , not all zero, such that
  a1x1 + a2x2 + · · · + akxk = 0.
Equivalently, the set {x1 , x2 , . . . , xk } of vectors is linearly dependent if one of the vectors
can be expressed as a linear combination of the others. (Either statement is acceptable).
For the given vectors

  x1 = (1, 2, 1, 4)^T ,   x2 = (2, 0, 3, 5)^T ,   x3 = (2, 1, 7, 3)^T ,   x4 = (2, 5, 6, 6)^T ,
you can write the vectors as the columns of a matrix A and row reduce it, thereby
solving the system of equations
Ax = a1 x1 + a2 x2 + a3 x3 + a4 x4 = 0.
The steps are not shown here, but you should show all of them when putting A into
reduced row echelon form. You should obtain the result:

      [ 1  2  2  2 ]         [ 1  0  0   2 ]
  A = [ 2  0  1  5 ]  → … →  [ 0  1  0  -1 ] ,
      [ 1  3  7  6 ]         [ 0  0  1   1 ]
      [ 4  5  3  6 ]         [ 0  0  0   0 ]
from which you can deduce that there are infinitely many solutions, since there is one
non-leading variable, and therefore the vectors are linearly dependent.
To find the linear combination, you can spot the linear dependence relations between
the columns of the reduced row echelon form, and the columns of A will have the same
relationship, namely,
  x4 = 2x1 - x2 + x3 .
Or you can find the solution v = (-2, 1, -1, 1)^T of Ax = 0 and use it to write down the
relationship between the columns of A, since

  Av = -2x1 + x2 - x3 + x4 = 0,
and then solve for x4 . Either way, it is easy to check (and you should do this) that your
answer is correct by using the vectors,
  [ 2 ]     [ 1 ]   [ 2 ]   [ 2 ]
  [ 5 ] = 2 [ 2 ] - [ 0 ] + [ 1 ] .
  [ 6 ]     [ 1 ]   [ 3 ]   [ 7 ]
  [ 6 ]     [ 4 ]   [ 5 ]   [ 3 ]
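This arithmetic check is easily mechanised (optional sketch, plain Python): form 2x1 - x2 + x3 componentwise and compare it with x4.

```python
x1 = [1, 2, 1, 4]
x2 = [2, 0, 3, 5]
x3 = [2, 1, 7, 3]
x4 = [2, 5, 6, 6]
combo = [2*a - b + c for a, b, c in zip(x1, x2, x3)]
print(combo == x4)  # True
```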
Question 2(a) This question is a good test of your understanding of the material in
Chapter 3 of the subject guide. If A is an m n matrix with columns c1 , c2 , . . . , cn
such that the system of linear equations Ax = d has solution:
      [  1 ]     [ -2 ]     [  1 ]
      [  2 ]     [  1 ]     [ -1 ]
  x = [  0 ] + s [  1 ] + t [  0 ]  = p + sv1 + tv2 ,
      [ -1 ]     [  0 ]     [ -1 ]
      [  0 ]     [  0 ]     [  1 ]
then you should be able to deduce certain properties of the matrix A just by looking at
the solution.
(1) The number of columns, n = 5. Why? Because the solutions, x, are 5 1 vectors,
and the multiplication Ax is only defined if A has the same number of columns as x has
rows.
(2) The number m cannot be determined. (But m ≥ 3, from part (3).)
(3) The rank of A is 3. Essentially, this is deduced from the rank-nullity theorem
which says that rank(A)+nullity(A)=n, where n is the number of columns of A. So the
rank, r, is r = n - dim(N(A)). You have also seen that the general solution of Ax = b is
of the form

  x = p + a1v1 + · · · + an-rvn-r

and the given solution is of the form x = p + sv1 + tv2, so dim(N(A)) = 2 and
r = 5 - 2 = 3.
(4) The two vectors v1 and v2 form a basis of the null space of A, N(A). So {v1, v2} is
a basis, where

  v1 = (-2, 1, 1, 0, 0)^T   and   v2 = (1, -1, 0, -1, 1)^T .
(5) To answer this you need a good understanding of how the general solution is
obtained using Gaussian elimination. By looking at the solution, you can tell the
positions of the leading variables and the non-leading variables in the reduced row
echelon form of A. The non-leading variables must be in the third and fifth column
because of the positions of 0 and 1 in the solution vectors, and the leading ones must be
in the first, second and fourth columns. So a basis of the range, R(A), is the set of
vectors {c1 , c2 , c4 }.
(6) From Ap = d, you can deduce that d = c1 + 2c2 - c4. Any solution x, so any value
of s and t, will also give you a vector such that Ax = d, and so a different linear
combination, but p is the simplest one to use.
(7) In the same way, using Av1 = 0, or Av2 = 0, you obtain the linear combinations
  -2c1 + c2 + c3 = 0   or   c1 - c2 - c4 + c5 = 0.
Again, any linear combination of v1 and v2 can be used.
(b) This is a second-order difference equation, as covered in Chapter 6 of the subject
guide. In standard form, we have xt+1 - axt + (a/4)xt-1 = 0, so the auxiliary equation
is z^2 - az + a/4 = 0.
(c) At the end of N years, just after the Nth withdrawal, the balance is

  yN = 20000(1.05)^N - 500(1.05)^(N-1) - 500(1.05)^(N-2) - · · · - 500

     = 20000(1.05)^N - 500 [ (1.05)^N - 1 ] / [ (1.05) - 1 ]

     = 20000(1.05)^N - 10000((1.05)^N - 1)

     = 10000(1.05)^N + 10000.
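A year-by-year simulation is a good way to convince yourself of the closed form (an optional sketch; the simulation applies interest first and then the withdrawal, exactly as the question describes):

```python
# Simulate the account: 5% interest paid, then a 500 withdrawal, each year.
def balance(N):
    y = 20000.0
    for _ in range(N):
        y = 1.05 * y - 500
    return y

for N in (1, 5, 10):
    closed = 10000 * 1.05**N + 10000
    print(N, round(balance(N), 2), round(closed, 2))   # the two columns agree
```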
To show that H = { (2t, t, 3t)^T : t ∈ R } is closed under addition, let u, v ∈ H. Then
u = (2t, t, 3t)^T and v = (2s, s, 3s)^T for some s, t ∈ R. Then

  u + v = (2t + 2s, t + s, 3t + 3s)^T = (2(t + s), t + s, 3(t + s))^T ∈ H

since t + s ∈ R. Therefore, H is closed under addition.
For any α ∈ R,

  αu = α(2t, t, 3t)^T = (2(αt), αt, 3(αt))^T ∈ H

since αt ∈ R. Therefore, H is closed under scalar multiplication.
The set H is non-empty, since the vector 0 H, as well as the vector v = (2, 1, 3)T .
Since H is also closed under addition and scalar multiplication, it is a subspace.
To show that every w = (2s, s, 3s)^T in H is a unique linear combination of v1 and v2,
solve

  (2s, s, 3s)^T = a(1, 0, -1)^T + b(0, 1, 5)^T .

The first two equations give a = 2s and b = s. Substituting these values into the third
equation, we find that they satisfy it too. Therefore the system
has the unique solution a = 2s, b = s, and w = (2s)v1 + (s)v2 .
To answer the remaining questions, it helps for you to see what is going on.
(1) The set {v1, v2} is NOT a basis of the subspace H since v1 ∉ H (and also,
v2 ∉ H). A basis of H is {(2, 1, 3)^T}, and so the dimension of H is 1.
(This can be easily checked by substituting in the components of v1 and v2 , and you
should do this.) The set {v1 , v2 } is a basis of G. It spans as G is, by definition, the set
of all linear combinations of v1 and v2 . It is linearly independent as neither vector is a
scalar multiple of the other.
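The membership checks can be illustrated with a tiny Python helper (a hypothetical function, written only to mirror the definition of H):

```python
def in_H(v):
    # v belongs to H exactly when v = (2t, t, 3t); the middle entry fixes t.
    t = v[1]
    return v == [2*t, t, 3*t]

print(in_H([1, 0, -1]), in_H([0, 1, 5]), in_H([2, 1, 3]))  # False False True
```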
(b) Linear transformations are covered in Chapter 9 of the subject guide.
If T : V W is a linear transformation, and dim(V ) = n, then the dimension theorem
states that
dimR(T ) + dimN(T ) = n
or
rank(T ) + nullity(T ) = n
where nullity(T ) is the dimension of N(T ), the kernel, or null space, of T and rank(T )
is the dimension of R(T ), the range of T . Note that you must specifically say what n
represents; that is, n = dimV .
Let {e1, e2, e3, e4} be the standard basis of R4, and v1, v2, v3, x the vectors

  v1 = (1, 0, -2)^T ,   v2 = (2, 3, -1)^T ,   v3 = (1, 5, 3)^T ,   x = (x, y, z)^T ,
and let T be the linear transformation, T : R4 → R3, given by
T (e1 ) = v1
T (e2 ) = v2
T (e3 ) = v3
T (e4 ) = x.
Form the matrix A whose columns are these four vectors,

      [  1   2  1  x ]
  A = [  0   3  5  y ] .
      [ -2  -1  3  z ]
In order to consider the two possibilities in parts (i) and (ii), row reduce this matrix,
beginning with R3 + 2R1 ,
      [ 1  2  1    x   ]    [ 1  2  1     x    ]
  A → [ 0  3  5    y   ] →  [ 0  3  5     y    ] .
      [ 0  3  5  z+2x  ]    [ 0  0  0  z+2x-y  ]
(i) By the dimension theorem, dimR(T) + dimN(T) = 4, so dimR(T) = dimN(T) requires
both to equal 2. This happens if and only if the last row above is a row of zeros, that is,
if and only if the components of x satisfy z + 2x - y = 0. In this case the leading ones
are in the first two columns, so a basis of R(T) is {v1, v2}.

(ii) If dimN(T) = 1, then dimR(T) = 3, so the condition is z + 2x - y ≠ 0. Continuing
the reduction (the fourth column is now a leading column),
      [ 1  2  1  0 ]          [ 1  2   1   0 ]          [ 1  0  -7/3  0 ]
  A → [ 0  3  5  0 ]  → … →   [ 0  1  5/3  0 ]  → … →   [ 0  1   5/3  0 ] ,
      [ 0  0  0  1 ]          [ 0  0   0   1 ]          [ 0  0    0   1 ]

so a basis of N(T) is given by the vector w = (7/3, -5/3, 1, 0)^T or any non-zero scalar
multiple of this, such as (7, -5, 3, 0)^T.
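You can confirm that this w really is in the null space: T(w) = 7v1 - 5v2 + 3v3 + 0·x, which should be the zero vector. A quick optional sketch:

```python
v1 = [1, 0, -2]
v2 = [2, 3, -1]
v3 = [1, 5, 3]
image = [7*a - 5*b + 3*c for a, b, c in zip(v1, v2, v3)]
print(image)  # [0, 0, 0]
```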
Question 4 To find a basis of the null space of the matrix A, put it into reduced row
echelon form using the algorithm. The steps are not shown, but you should be able to
carry them out efficiently and accurately, and you should show all the steps in the
examination.
      [  -1  -2  -1 ]          [ 1  0  -1 ]
  A = [   4  -4  -8 ]  → … →   [ 0  1   1 ] .
      [ -13  -2  11 ]          [ 0  0   0 ]
You can read the solution of the homogeneous system Ax = 0 from the reduced echelon
form of the matrix, setting z = t, t R, to obtain the general solution
  x = t (1, -1, 1)^T = tv1 ,   t ∈ R.
Next solve (A - λI)v = 0 for each of the other two eigenvalues. In each case the
reduced echelon form of the matrix (A - λI) should contain a row of zeros, so that
there is a non-trivial solution giving the corresponding eigenvector. This checks that the
eigenvalues are correct. If the reduced echelon form of (A - λI) does not contain a row
of zeros, then you need to find your error. This may be in the row reduction, or it may
be in your characteristic equation or factorising. One quick way to check whether your
eigenvalue is correct is to substitute it into |A - λI| and see if you do get zero when you
evaluate the determinant.

Having solved (A - λI)v = 0 for each of λ2 = -6 and λ3 = 12, you should have that the
corresponding eigenvectors are multiples of:
  v2 = (1, 2, 1)^T   and   v3 = (0, 1, -2)^T ,
respectively. Again, all work should be shown.
At this stage, you should check that the eigenvectors are correct. Form a matrix P
whose columns are the eigenvectors and the diagonal matrix D with the corresponding
eigenvalues on the diagonal,
      [  1  1   0 ]             [ 0   0   0 ]
  P = [ -1  2   1 ]         D = [ 0  -6   0 ] .
      [  1  1  -2 ]             [ 0   0  12 ]
Now check that AP = P D by multiplying out the matrices AP and P D.
Use this diagonalisation to determine the sequences (xn ), (yn ), (zn ) which have the
following properties:
  xn+1 = -xn - 2yn - zn
  yn+1 = 4xn - 4yn - 8zn
  zn+1 = -13xn - 2yn + 11zn
and which satisfy the initial conditions x0 = y0 = 1 and z0 = 0.
Denoting by xn the vector (xn , yn , zn )T , this can be expressed as xn+1 = Axn , for which
the solution is given by
  xn = A^n x0 = P D^n P⁻¹ x0 .
Using the adjoint method (cofactors), or any other method, find P 1 , and immediately
check that the inverse is correct by showing P P 1 = I.
Then using the initial conditions,
  P⁻¹x0 = (1/6) [ 5  -2  -1 ] [ 1 ]   [ 1/2 ]
                [ 1   2   1 ] [ 1 ] = [ 1/2 ] .
                [ 3   0  -3 ] [ 0 ]   [ 1/2 ]

So that

       [ xn ]   [  1  1   0 ] [ 0     0       0   ] [ 1/2 ]
  xn = [ yn ] = [ -1  2   1 ] [ 0  (-6)^n     0   ] [ 1/2 ] ,
       [ zn ]   [  1  1  -2 ] [ 0     0     12^n  ] [ 1/2 ]

which gives, for n ≥ 1,

  xn = (1/2)(-6)^n ,
  yn = (-6)^n + (1/2)12^n ,
  zn = (1/2)(-6)^n - 12^n .
(The answer can be checked by finding x1 both from the original equations and from
the solution. If you have time, you might want to do this.)
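That check (and a few more steps) can be automated. The sketch below iterates x_{n+1} = Ax_n directly and compares the result with the closed form for n = 1, 2, 3; it is purely an optional verification aid:

```python
A = [[-1, -2, -1], [4, -4, -8], [-13, -2, 11]]
x = [1, 1, 0]                                  # x0 = y0 = 1, z0 = 0
for n in range(1, 4):
    # one step of the recurrence x_{n+1} = A x_n
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    closed = [(-6)**n // 2,
              (-6)**n + 12**n // 2,
              (-6)**n // 2 - 12**n]
    print(n, x, closed)                        # the two triples agree
```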
Question 5(a) Solving a linear system of equations by putting the augmented matrix
into reduced echelon form is an application of the basic material in Chapter 3 of the
subject guide.
You should begin this question as instructed, by writing down the augmented matrix
and putting it into reduced row echelon form. Do this carefully to avoid errors,
          [  1  4  5  3  2 | 11 ]    [ 1  4   5   3   2 | 11 ]
  (A|b) = [  0  2  4  2  2 |  2 ]  → [ 0  1   2   1   1 |  1 ]
          [ -1  1  5  0  1 |  6 ]    [ 0  5  10   3   3 | 17 ]

    [ 1  4  5   3   2 | 11 ]    [ 1  4  5  3  2 | 11 ]
  → [ 0  1  2   1   1 |  1 ]  → [ 0  1  2  1  1 |  1 ]
    [ 0  0  0  -2  -2 | 12 ]    [ 0  0  0  1  1 | -6 ]

    [ 1  4  5  0  -1 | 29 ]    [ 1  0  -3  0  -1 |  1 ]
  → [ 0  1  2  0   0 |  7 ]  → [ 0  1   2  0   0 |  7 ] .
    [ 0  0  0  1   1 | -6 ]    [ 0  0   0  1   1 | -6 ]
Check that you do have the reduced row echelon form; find the columns with leading
ones and make sure they have zeros elsewhere (above and below). As the question
specifically asks you to put the matrix into reduced row echelon form, if you stop at row
echelon form and use back substitution, you will not earn full marks, and you are also
less likely to obtain the correct answer.
You can read the solution from the reduced echelon form. Assign parameters, say s
and t, to the non-leading variables x3 and x5 , and write down the other variables in
terms of these using the equations deduced from the matrix. The general solution is
      [ x1 ]   [ 1 + 3s + t ]   [  1 ]     [  3 ]     [  1 ]
      [ x2 ]   [   7 - 2s   ]   [  7 ]     [ -2 ]     [  0 ]
  x = [ x3 ] = [     s      ] = [  0 ] + s [  1 ] + t [  0 ] .
      [ x4 ]   [   -6 - t   ]   [ -6 ]     [  0 ]     [ -1 ]
      [ x5 ]   [     t      ]   [  0 ]     [  0 ]     [  1 ]
x = p + sv1 + tv2 ,
s, t ∈ R.
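A quick machine check of p, v1 and v2 (optional sketch): Ap should equal b, and Av1 and Av2 should both be the zero vector.

```python
A = [[1, 4, 5, 3, 2],
     [0, 2, 4, 2, 2],
     [-1, 1, 5, 0, 1]]
b = [11, 2, 6]
p = [1, 7, 0, -6, 0]
v1 = [3, -2, 1, 0, 0]
v2 = [1, 0, 0, -1, 1]

def mv(M, v):
    """Matrix-vector product for a 3x5 matrix and a 5-vector."""
    return [sum(row[j] * v[j] for j in range(5)) for row in M]

print(mv(A, p) == b, mv(A, v1) == [0, 0, 0], mv(A, v2) == [0, 0, 0])
```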
To answer the questions concerning the column vectors, you need to understand the
material in Chapter 8 of the subject guide.
The columns of the reduced row echelon form of a matrix satisfy the same dependency
relations as the columns of the matrix. From the reduced row echelon form of A, you
can see that
  c3 = -3c1 + 2c2 .
Indeed, this also follows from Av1 = 0, and you can, and should, check that it is correct:
  [ 5 ]      [  1 ]     [ 4 ]
  [ 4 ] = -3 [  0 ] + 2 [ 2 ] .
  [ 5 ]      [ -1 ]     [ 1 ]
To answer the next part, it is enough to say that in the reduced echelon form of A, the
columns with the leading ones correspond to the vectors c1 , c2 and c4 . Therefore these
vectors are linearly independent. (The reduced row echelon form of a matrix C
consisting of these three column vectors would have a leading one in every column, so
Cx = 0 has only the trivial solution.)
To conclude that B is a basis of R3 , you can state that B is a set of three linearly
independent vectors in a three-dimensional vector space, R3 , therefore B is a basis of
R3 . It is not sufficient to merely say that the vectors are linearly independent and span;
you would need to give a reason why they span R3 . (For example, by stating that there
is a leading one in every row, so Ax = b has a solution for all b R3 .)
From the solution, Ap = b, you have b = c1 + 7c2 - 6c4. You should recognise that this
expresses b as a linear combination of the basis vectors, and the coefficients are the
coordinates of b in this basis, B. That is,
  [b]B = (1, 7, -6)^T .
(b) This part of the question continues with the material on the basis of a vector space
contained in Chapter 8. The material on changing basis is in Chapter 9.
To show that S = {c1 , c3 , c4 } is also a basis of R3 , you can calculate the determinant of
the matrix with these vectors as columns.
  |  1  5  3 |
  |  0  4  2 |  = 1(-10) - 1(10 - 12) = -8 ≠ 0.
  | -1  5  0 |
Since the determinant is non-zero, this implies that S is a basis of R3 .
(This statement answers the question, and is all that is required here, but you should
understand why it is true. The relationship of these concepts is covered in Chapter 3
and Chapter 4 of the subject guide. If Q denotes the matrix with column vectors,
c1 , c3 , c4 , then |Q| ≠ 0 implies that Q⁻¹ exists, so that a system of equations Qx = b
has a unique solution for all b R3 . This implies both that the column vectors are
linearly independent and that they span R3 . The same argument follows by reducing
the matrix Q to echelon form and showing that there are three leading ones.)
You can find P by using the transition matrix M from B coordinates to standard
coordinates and the transition matrix Q from S coordinates to standard coordinates:

      [  1  4  3 ]         [  1  5  3 ]
  M = [  0  2  2 ] ,   Q = [  0  4  2 ] .
      [ -1  1  0 ]         [ -1  5  0 ]

If you recall that

  v = M[v]B   and   w = Q[w]S ,
then to change from coordinates in the basis B to coordinates in the basis S, you need
  [v]S = Q⁻¹M[v]B .

So Q⁻¹M is the transition matrix from B coordinates to S coordinates. The easiest way
to find Q⁻¹ is using the cofactor method. Then
  Q⁻¹M = -(1/8) [ -10   15  -2 ] [  1  4  3 ]   [ 1  3/2  0 ]
                [  -2    3  -2 ] [  0  2  2 ] = [ 0  1/2  0 ] = P.
                [   4  -10   4 ] [ -1  1  0 ]   [ 0   0   1 ]
You can find the S coordinates of b using this matrix and [b]B from part (a),

  [b]S = [ 1  3/2  0 ] [  1 ]   [ 23/2 ]
         [ 0  1/2  0 ] [  7 ] = [  7/2 ] ,
         [ 0   0   1 ] [ -6 ]   [  -6  ]
which you can easily check. Or, you can find the S coordinates directly from the basis S
by solving b = ac1 + bc3 + cc4 for a, b, c using Gaussian elimination or by using the
inverse matrix, Q⁻¹, which you found above.
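One easy check of [b]S (an optional sketch with exact fractions): the combination (23/2)c1 + (7/2)c3 - 6c4 should reproduce b.

```python
from fractions import Fraction as F

c1 = [1, 0, -1]
c3 = [5, 4, 5]
c4 = [3, 2, 0]
b = [F(23, 2)*p + F(7, 2)*q - 6*r for p, q, r in zip(c1, c3, c4)]
print(b == [11, 2, 6])  # True
```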
You can also do this using the results of part (a). You know that b = c1 + 7c2 - 6c4
and c3 = -3c1 + 2c2. If you solve the latter equation for c2 and substitute into the
equation for b, you will obtain the vector b as a linear combination of c1 , c3 , c4 , and
hence the coordinates of b in this basis.
This idea gives a better understanding of the matrix P. Notice the simple
form of the transition matrix P from B coordinates to S coordinates. If you have a
vector expressed as a linear combination of the basis vectors of B and as a linear
combination of the basis vectors of S, then the coefficients of the first and last vectors
will be the same in either basis since the first and last basis vectors are the same. Only
the middle vector is different. Therefore, P will be of the form
  P = [ 1  a  0 ]
      [ 0  b  0 ] ,
      [ 0  c  1 ]

and indeed here

  P = [ 1  3/2  0 ]
      [ 0  1/2  0 ] .
      [ 0   0   1 ]