
University of Oxford

Department of Engineering Science


A1 Mathematics
Linear Algebra
4 lectures, Michaelmas Term 2010
Prof. Steve Roberts
sjrob@robots.ox.ac.uk

Recommended Texts
These are some books that I found useful while preparing these notes.
Allenby, R.B.J.T. (1995). Linear algebra. Elsevier.
Anton, H. (1984). Elementary linear algebra (4th edn). Wiley.
Anton, H. & Rorres, C. (2000). Elementary linear algebra applications version (8th edn). Wiley.
Kreyszig, E. (2006). Advanced engineering mathematics (9th edn). Wiley.
Larson, R. & Falvo, D.C. (2010). Elementary linear algebra (6th edn). Brooks/Cole.
Strang, G. (2006). Linear algebra and its applications (4th edn). Thomson Brooks/Cole

Topic 1
Systems of linear equations
1.1 Introduction
Let us begin by discussing the simple equation

ax = b    (1.1)

This is a linear equation in the unknown x. It is linear because x appears on the left-hand side in a very simple
form: raised to the power one, and multiplied by a constant. There are no terms such as x², log(x) or √(x + 2) on
the left-hand side. The only other term in the equation, the b on the right-hand side, is a constant.
We can think of the left-hand side of eqn (1.1) as the result of a function that maps x to ax:

f(x) = ax
The adjective linear arises from the fact that the graph of this function is a straight line (and in particular, one
that goes through the origin). More formally, we say that f is a linear function if it has the properties

(i)  f(x + y) = f(x) + f(y)
(ii) f(αx) = αf(x)    [where α is an arbitrary scalar]

The function f(x) = ax satisfies both requirements, whereas functions like x², log(x) and √(x + 2) clearly do not:
they are non-linear.


More generally, a linear equation in n unknowns x1, x2, ..., xn is defined as one that can be expressed in the form

a1x1 + a2x2 + ... + anxn = b

where the coefficients a1, a2, ..., an and the right-hand side b are all constants. We will assume that the constants
and the unknowns are numbers in the real domain R, but it should be pointed out that some applications (e.g.
quantum mechanics) give rise to linear equations involving numbers in the complex domain C.

For much of this course, we will be studying sets of simultaneous linear equations that we aim to solve for the
unknowns. We will describe a set of m linear equations in n unknowns as an m × n linear system. As you know
from P1, such a system can be expressed in the compact form

Ax = b    (1.2)

where A is an m × n matrix, and b and x are column vectors with m and n components respectively. Note the
similarity with eqn (1.1).
We can think of the left-hand side of eqn (1.2) as the result of a function that maps x (a vector in Rn) to Ax (a
vector in Rm):

f(x) = Ax

This function is a linear transformation from Rn to Rm because

(i)  f(x + y) = A(x + y) = Ax + Ay = f(x) + f(y)
(ii) f(αx) = A(αx) = α(Ax) = αf(x)

Essentially, the origin is mapped to the origin (i.e. there is no translation), and straight lines in Rn are mapped to
straight lines in Rm.
[Figure: the transformation f maps points P and Q in Rn to their images in Rm; the origin O maps to the origin.]
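As a quick numerical illustration (a minimal NumPy sketch of my own; the particular matrix and vectors are arbitrary choices, not taken from the notes), we can check the two linearity properties directly:

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [3.0, -1.0, 4.0]])   # a 2 x 3 matrix, so f maps R^3 to R^2
    x = np.array([1.0, 0.5, -2.0])
    y = np.array([0.0, 3.0, 1.0])
    alpha = 2.5                        # an arbitrary scalar

    # Property (i): f(x + y) = f(x) + f(y)
    print(np.allclose(A @ (x + y), A @ x + A @ y))        # True
    # Property (ii): f(alpha x) = alpha f(x)
    print(np.allclose(A @ (alpha * x), alpha * (A @ x)))  # True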

1.2 Two ways of thinking about Ax = b


Consider the 2 × 2 system

[ 1   1 ] [ x1 ]   [ 3 ]
[ 4  -1 ] [ x2 ] = [ 2 ]

We need to find values for x1 and x2 such that the matrix-vector product on the left-hand side is equal to the
vector on the right-hand side.
The familiar row-oriented view interprets the system as

x1 + x2 = 3
4x1 - x2 = 2

Our task is to find a point in R2 with coordinates (x1, x2) such that both equations are satisfied simultaneously.
Geometrically, we can visualize plotting two lines:

[Figure: the lines x1 + x2 = 3 and 4x1 - x2 = 2 plotted in the (x1, x2) plane; they cross at the point (1, 2).]

The lines intersect at the point x1 = 1, x2 = 2. This is the (unique) solution of the original system.
The alternative column-oriented view interprets the system as

x1 [ 1 ]  +  x2 [  1 ]  =  [ 3 ]
   [ 4 ]        [ -1 ]     [ 2 ]

This time we have two vectors in R2, namely i + 4j and i - j, and our task is to combine them (by choosing
suitable scalar multipliers x1 and x2) in such a way that the resultant vector is 3i + 2j. Geometrically, we can
visualize constructing a parallelogram:

i + 4j

3
2

3i + 2j

1
-4

-3

-2

-1

1
-1
-2

ij
2(i j)

-3
-4

The right linear combination of the column vectors is 1(i + 4j) + 2(i - j). Hence x1 = 1, x2 = 2 is the (unique)
solution of the original system.
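For completeness, here is a short NumPy sketch (my own illustration, not part of the original notes) that solves this 2 × 2 system directly and confirms the answer reached by both pictures:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [4.0, -1.0]])
    b = np.array([3.0, 2.0])

    x = np.linalg.solve(A, b)   # row view: intersect the two lines
    print(x)                    # [1. 2.]

    # Column view: the same numbers are the weights in x1*col1 + x2*col2 = b
    print(np.allclose(x[0] * A[:, 0] + x[1] * A[:, 1], b))   # True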
The row- and column-oriented pictures look very different, but they lead, of course, to the same solution. They
are two sides of the same coin. The column-oriented view may seem less natural, but we will see that it can be
very helpful in understanding and solving general m × n linear systems, whether homogeneous (Ax = 0) or
non-homogeneous (Ax = b). In particular, the column-oriented view provides us with an alternative way of posing
(and then answering!) the two fundamental questions that arise in relation to any linear system:
Is there any solution at all?
If there is a solution, is it unique, or are there infinitely many?

Solution existence
  Row-oriented view: do the m hyperplanes in n-dimensional space have any point in common?
  Column-oriented view: can b be expressed as a linear combination of the columns of A?

Solution uniqueness
  Row-oriented view: do the hyperplanes intersect at just one point, or infinitely many points?
  Column-oriented view: can the linear combination be formed in just one way, or infinitely many ways?

1.3 Echelon form


Regardless of which geometric interpretation we have in mind, we need a reliable way of answering the basic
questions of solution existence and solution uniqueness for a general linear system Ax = b. In P1 you learnt how
to apply Gaussian elimination to the augmented coefficient matrix

[A | b]

Having reduced this matrix (and thus the system of equations) to a simpler form, you were generally able to
solve for the unknowns by a systematic process of back-substitution, leading to a unique solution x. Sometimes,
however, elimination revealed that there was no solution, implying that the original equations were inconsistent.
In other cases, elimination revealed that there were infinitely many solutions: one or more of the unknowns
could be assigned an arbitrary value.
All of the systems you studied in P1 involved n equations in n unknowns. We now want to relax that restriction,
and see what can happen when the coefficient matrix A is rectangular (m ≠ n). There are three scenarios:

m = n: Square. Same number of equations as unknowns. Example: a 3 × 3 system. Usual expectation: a unique solution.

m < n: Underdetermined. Fewer equations than unknowns. Example: a 2 × 3 system. Usual expectation: infinitely many solutions.

m > n: Overdetermined. More equations than unknowns. Example: a 4 × 3 system. Usual expectation: no solution.

The good news is that Gaussian elimination, with a few tweaks, continues to play a central role. The key is to
reduce the (augmented) coefficient matrix to row-echelon form, or just echelon form for short. The word is
borrowed directly from French (échelon = a rung on a ladder).

To illustrate the meaning, the following matrices are in echelon form:

[ 2  2  1  4 ]      [ 3  1  2  0 ]      [ 1  3  2  4 ]
[ 0  1  0  3 ]      [ 0  0  2  3 ]      [ 0  3  2  5 ]
[ 0  0  3  2 ]      [ 0  0  0  4 ]      [ 0  0  2  3 ]
                    [ 0  0  0  0 ]      [ 0  0  0  0 ]

In any row that does not consist entirely of zeros, the leftmost nonzero entry is given a special name: the leading
entry. We can now proceed to a formal definition.
A matrix is in row-echelon form (or just echelon form) if
in every pair of successive nonzero rows, the leading entry in the
lower row lies strictly to the right of the leading entry in the upper row;
any zero rows are grouped together at the bottom of the matrix.

Some authors add a third condition, that each of the leading entries must be 1. This complicates the elimination
process slightly (each row has to be divided through by its leading entry) but it simplifies back-substitution. We
will not insist on this; the main thing is to get the pattern right.

Here are some examples of matrices that are not in echelon form:

[ 2  2  1  4 ]      [ 3  1  2  0 ]      [ 1  3  2  4 ]
[ 0  1  0  3 ]      [ 0  0  2  3 ]      [ 0  3  2  5 ]
[ 3  0  3  2 ]      [ 0  2  0  4 ]      [ 0  0  0  0 ]
                    [ 0  0  0  0 ]      [ 0  0  2  3 ]

Any matrix can be reduced to echelon form by performing a sequence of elementary row operations. These
operations allow the corresponding system of equations to be simplified, without altering the solution.
The three available elementary row operations are:
ERO1 add a multiple of one row to another row.
ERO2 swap two rows.
ERO3 multiply a row by a nonzero constant.

Note that if a square matrix is reduced to echelon form, we get a bonus because the determinant is simply the
product of the entries on the diagonal. Be careful, however. Although ERO1 (remarkably) does not affect the det,
ERO2 and ERO3 both do. Swapping two rows negates the det, while scaling a row scales the det by the same
factor.

Example
Reduce the following matrix to echelon form.

[  1   2   3   2   1 ]
[  2   4   6   7   8 ]
[ -1   0  -4   2  -1 ]
[  4   8  12   2  -8 ]

Solution

[ 1  2   3   2    1 ]
[ 0  0   0   3    6 ]   R2 = R2 - 2R1
[ 0  2  -1   4    0 ]   R3 = R3 + R1
[ 0  0   0  -6  -12 ]   R4 = R4 - 4R1

[ 1  2   3   2    1 ]
[ 0  2  -1   4    0 ]   R2 <-> R3
[ 0  0   0   3    6 ]
[ 0  0   0  -6  -12 ]

[ 1  2   3   2    1 ]
[ 0  2  -1   4    0 ]
[ 0  0   0   3    6 ]
[ 0  0   0   0    0 ]   R4 = R4 + 2R3

Optional extra #1: if we wanted all the leading entries to be 1, we could have scaled the rows en route, but we
can also scale them now:

[ 1  2   3    2   1 ]
[ 0  1  -1/2  2   0 ]   R2 = R2 × 1/2
[ 0  0   0    1   2 ]   R3 = R3 × 1/3
[ 0  0   0    0   0 ]

Optional extra #2: having done this, we could carry on and clear out the columns vertically above each leading 1,
working from right to left. This eventually leads to the reduced row-echelon form:

[ 1  2   3    0  -3 ]   R1 = R1 - 2R3
[ 0  1  -1/2  0  -4 ]   R2 = R2 - 2R3
[ 0  0   0    1   2 ]
[ 0  0   0    0   0 ]

[ 1  0   4    0   5 ]   R1 = R1 - 2R2
[ 0  1  -1/2  0  -4 ]
[ 0  0   0    1   2 ]
[ 0  0   0    0   0 ]


Reduced row-echelon form is the simplest form that a matrix (or really, the underlying system of equations) can
take. It makes back-substitution a breeze, and furthermore it is unique, whereas an ordinary row-echelon form
(even with leading 1s) is not. Then again, we need to work harder to obtain the reduced row-echelon form in the
first place.
For Matlab enthusiasts, there is a function called rref for computing the reduced row echelon form of a matrix.
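Similarly in Python, SymPy's Matrix.rref() computes the reduced row-echelon form; the snippet below is a small sketch (my own, not part of the original notes) applying it to the example matrix above. The function also returns the indices of the pivot columns.

    from sympy import Matrix

    A = Matrix([[ 1, 2,  3, 2,  1],
                [ 2, 4,  6, 7,  8],
                [-1, 0, -4, 2, -1],
                [ 4, 8, 12, 2, -8]])

    R, pivots = A.rref()   # R is the reduced row-echelon form, pivots the pivot column indices
    print(R)               # Matrix([[1, 0, 4, 0, 5], [0, 1, -1/2, 0, -4], [0, 0, 0, 1, 2], [0, 0, 0, 0, 0]])
    print(pivots)          # (0, 1, 3)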


1.4 Rank and nullity


Let us first refresh the notion of linear independence.
A set of vectors is linearly independent if no member of the set can be
expressed as a linear combination of the other members. Otherwise, the
set of vectors is linearly dependent.

When discussing matrices, the terms row rank ( = number of linearly independent rows) and column rank (=
number of linearly independent columns) are commonly used. Somewhat surprisingly, it turns out that the row
rank and the column rank of any matrix are the same, so there is no ambiguity in talking about the rank of a
matrix. In summary:
The rank of a matrix A can be defined as either
the number of linearly independent rows of A, or
the number of linearly independent columns of A.
The two definitions are equivalent.


Sometimes the rank of a matrix is obvious from inspection, e.g.

[ 1  0 ]      [ 1  0  0 ]      [ 1  0  2  0 ]
[ 0  2 ]      [ 0  2  0 ]      [ 0  3  0  4 ]
[ 3  4 ]      [ 3  4  5 ]      [ 2  3  4  4 ]

have ranks of 2, 3 and 2. The failsafe technique, however, is to reduce the matrix to echelon form, then count the
number of nonzero rows (or equally well, count the number of leading entries on the echelon; both counts will
be the same because of the strict definition of echelon form).
We now need to introduce the idea of the kernel (or null space) of a matrix. In Section 1.1 we discussed how an
m × n matrix A defines a linear transformation from Rn to Rm. Essentially, the kernel of A is the set of all vectors
in Rn that are mapped to the zero vector in Rm. It is denoted ker(A). The dimension of this set, i.e. the number of
linearly independent vectors it contains, is called the nullity of A. More formally,
The kernel of a matrix A is the set of all solutions of the homogeneous
system Ax = 0.
The nullity of a matrix A is the number of linearly independent solutions
of the homogeneous system Ax = 0.

It is clear that ker(A) always contains the zero vector, 0. If this is the only vector in ker(A), then by convention,
nullity(A) = 0. If ker(A) contains vectors other than 0, then nullity(A) > 0. One way to determine nullity(A) is to
solve the homogeneous system Ax = 0, then count the number of linearly independent solutions. We will do this
in the next section. There is, however, a neat shortcut based on the rank-nullity theorem:
Let A be a matrix with n columns. Then
rank(A) + nullity(A) = n
This means that we can determine the nullity of a matrix as soon as it has been reduced to echelon form.


Example
Find the rank and nullity of the matrix

    [ 1  2  3  5 ]
A = [ 2  3  4  7 ]
    [ 3  4  5  9 ]

Solution

[ 1   2   3   5 ]
[ 0  -1  -2  -3 ]   R2 = R2 - 2R1
[ 0  -2  -4  -6 ]   R3 = R3 - 3R1

[ 1   2   3   5 ]
[ 0  -1  -2  -3 ]
[ 0   0   0   0 ]   R3 = R3 - 2R2

There are two nonzero rows, so rank(A) = 2. Since the matrix has n = 4 columns, nullity(A) = n - rank(A) = 4 - 2
= 2.
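As a quick cross-check (a NumPy sketch, not part of the original notes), numpy.linalg.matrix_rank computes the rank numerically, and the rank-nullity theorem then gives the nullity:

    import numpy as np

    A = np.array([[1, 2, 3, 5],
                  [2, 3, 4, 7],
                  [3, 4, 5, 9]])

    rank = np.linalg.matrix_rank(A)
    nullity = A.shape[1] - rank      # rank-nullity theorem: rank + nullity = n
    print(rank, nullity)             # 2 2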


1.5 Solving Ax = 0
A linear system in which the right-hand side vector is filled with zeros is said to be homogeneous. Suppose we
are given a matrix A and asked to find the general solution of Ax = 0. This is exactly the same as being asked to
find ker(A). To start with, we can quickly determine nullity(A) as outlined above. If it turns out to be zero, ker(A)
contains only the zero vector, and we can declare immediately that Ax = 0 has only the trivial solution x = 0. If
nullity(A) is not zero, then ker(A) contains vectors other than 0, and Ax = 0 has infinitely many solutions. To find
them, we use reduction to echelon form and a special back-substitution process.
Example
Find the general solution of Ax = 0 where

    [  2   3   2  -1    4 ]
A = [  4   6   1   1   -1 ]
    [  8  12  -1   5  -11 ]
    [ -4  -6   5  -7   19 ]

Solution

[ 2  3   2  -1    4 ]
[ 0  0  -3   3   -9 ]   R2 = R2 - 2R1
[ 0  0  -9   9  -27 ]   R3 = R3 - 4R1
[ 0  0   9  -9   27 ]   R4 = R4 + 2R1

[ 2  3   2  -1    4 ]
[ 0  0  -3   3   -9 ]
[ 0  0   0   0    0 ]   R3 = R3 - 3R2
[ 0  0   0   0    0 ]   R4 = R4 + 3R2

We see that rank(A) = 2 and nullity(A) = 5 - 2 = 3, so the system Ax = 0 has infinitely many solutions. In fact, we
could see at the outset that the system was underdetermined (4 × 5), so we knew that nullity(A) had to be at
least 5 - 4 = 1. In general, an underdetermined homogeneous system always has infinitely many solutions.

Writing the echelon form of Ax = 0 out in full, we have:

[ 2  3   2  -1   4 ] [ x1 ]   [ 0 ]
[ 0  0  -3   3  -9 ] [ x2 ]   [ 0 ]
[ 0  0   0   0   0 ] [ x3 ] = [ 0 ]
[ 0  0   0   0   0 ] [ x4 ]   [ 0 ]
                     [ x5 ]

The variables are now split into two groups. Columns C1 and C3 contain the leading entries on the echelon
(circled). The corresponding variables x1 and x3 are variously referred to as the leading variables, key variables,
bound variables or pivot variables. The remaining columns (C2, C4, C5) are associated with the variables x2, x4
and x5. These are called the free variables (or sometimes, the non-leading variables). To find the general
solution, we first assign arbitrary parameters to the free variables, say

x2 = α, x4 = β, x5 = γ

Next we solve for the leading variables in terms of these parameters, working up through the nonzero rows, and
from right to left along the echelon. (Note how each of the zero rows automatically has a zero on the right-hand
side: it is impossible for a homogeneous system to be inconsistent.) We first solve row R2 for x3,

-3x3 + 3x4 - 9x5 = 0
-3x3 + 3β - 9γ = 0
x3 = β - 3γ

then solve row R1 for x1:

2x1 + 3x2 + 2x3 - x4 + 4x5 = 0
2x1 + 3α + 2(β - 3γ) - β + 4γ = 0
x1 = -(3/2)α - (1/2)β + γ

The general solution can now be assembled, then (optionally) expressed as a linear combination of vectors, one
for each arbitrary parameter:

[ x1 ]   [ -(3/2)α - (1/2)β + γ ]       [ -3/2 ]       [ -1/2 ]       [  1 ]
[ x2 ]   [           α          ]       [   1  ]       [   0  ]       [  0 ]
[ x3 ] = [        β - 3γ        ]  =  α [   0  ]  +  β [   1  ]  +  γ [ -3 ]    (1.3)
[ x4 ]   [           β          ]       [   0  ]       [   1  ]       [  0 ]
[ x5 ]   [           γ          ]       [   0  ]       [   0  ]       [  1 ]

This is the desired general solution of Ax = 0; it is also ker(A). We can easily read off a set of linearly
independent basis vectors for the kernel:

[ -3/2 ]     [ -1/2 ]     [  1 ]
[   1  ]     [   0  ]     [  0 ]
[   0  ] ,   [   1  ] ,   [ -3 ]
[   0  ]     [   1  ]     [  0 ]
[   0  ]     [   0  ]     [  1 ]

In this instance, ker(A) is seen to be a 3-dimensional subspace of R5. Hence nullity(A), which is just the
dimension of ker(A), is equal to 3. This agrees with the initial prediction of nullity(A) that was made using the
rank-nullity theorem. It is good practice to check that each of the kernel basis vectors does actually satisfy Ax =
0, and this can readily be verified. It follows that any linear combination of the basis vectors, as in eqn (1.3), must
also satisfy Ax = 0.
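A numerical cross-check is possible in Python (a sketch, not part of the original notes): scipy.linalg.null_space returns an orthonormal basis for ker(A), so its column count should equal the nullity found above. The basis vectors it returns will differ from ours by an invertible change of basis.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[ 2,  3,  2, -1,   4],
                  [ 4,  6,  1,  1,  -1],
                  [ 8, 12, -1,  5, -11],
                  [-4, -6,  5, -7,  19]], dtype=float)

    N = null_space(A)             # columns form an orthonormal basis of ker(A)
    print(N.shape[1])             # 3, i.e. nullity(A)
    print(np.allclose(A @ N, 0))  # True: every basis vector satisfies Ax = 0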

1.6 Solving Ax = b
With a homogeneous linear system Ax = 0, the answer to the question of solution existence is always affirmative,
since (as discussed above) there will always be at least the trivial solution x = 0. With a non-homogeneous system
Ax = b, the answer to this question is no longer automatic: there is now a possibility that the system will have no
solution.
As usual, reduction to echelon form (this time performing elementary row operations on the augmented
coefficient matrix [A | b]) provides clarity. Suppose we have a 4 × 5 system and end up with something like

[ *  *  *  *  *  |  * ]
[ 0  *  *  *  *  |  * ]
[ 0  0  0  0  0  |  * ]   <- circled: a nonzero entry
[ 0  0  0  0  0  |  0 ]

We can immediately infer that the original system Ax = b is inconsistent, i.e. it has no solution. We can interpret
the circled entry in two different ways, using the terminology introduced in Section 1.2. Taking the row-oriented
view, the emergence of a contradictory equation indicates that the 4 hyperplanes in R5 have no point in common.
Taking the column-oriented view, the fact that [A | b] and A have different column ranks (3 and 2 respectively)
indicates that the vector b in R4 cannot be expressed as a linear combination of the 5 columns of A.

On the other hand, suppose we reduce a general m × n system Ax = b to echelon form, and find that the trailing
rows (if any) all look like

[ 0  0  ...  0  |  0 ]

We can then infer that the original system Ax = b is consistent, i.e. it has at least one solution. Taking the
row-oriented view, the absence of a contradictory equation indicates that the m hyperplanes in Rn have at least one
point in common. Taking the column-oriented view, the fact that [A | b] and A have the same column rank
indicates that the vector b in Rm can be expressed in at least one way as a linear combination of the n
columns of A.

If the above analysis shows that there is at least one solution, we turn to the question of uniqueness: is there just
one solution, or a whole family of solutions?
The answer for Ax = b, just as for Ax = 0, is determined by the nullity of A. If nullity(A) = 0, the solution is unique,
and it can be found by standard back-substitution (leading variables only, no free variables). If nullity(A) > 0, the
solution involves a corresponding number of arbitrary parameters, and its general form has to be worked out by
special back-substitution (separate treatment of leading and free variables).
Our last example will show how to solve a non-homogeneous system Ax = b that has infinitely many solutions.
To illustrate some important points, we will retain the same A matrix that was used to demonstrate the solution of
Ax = 0 in Section 1.5.


Example
Find the general solution of Ax = b where

    [  2   3   2  -1    4 ]          [  7 ]
A = [  4   6   1   1   -1 ]  ,   b = [  2 ]
    [  8  12  -1   5  -11 ]          [ -8 ]
    [ -4  -6   5  -7   19 ]          [ 22 ]

Solution

[  2   3   2  -1    4  |   7 ]
[  4   6   1   1   -1  |   2 ]
[  8  12  -1   5  -11  |  -8 ]
[ -4  -6   5  -7   19  |  22 ]

[  2   3   2  -1    4  |   7 ]
[  0   0  -3   3   -9  | -12 ]   R2 = R2 - 2R1
[  0   0  -9   9  -27  | -36 ]   R3 = R3 - 4R1
[  0   0   9  -9   27  |  36 ]   R4 = R4 + 2R1

[  2   3   2  -1    4  |   7 ]
[  0   0  -3   3   -9  | -12 ]
[  0   0   0   0    0  |   0 ]   R3 = R3 - 3R2
[  0   0   0   0    0  |   0 ]   R4 = R4 + 3R2

Since rank([A | b]) = rank(A) = 2, the system is consistent: there is at least one solution. Furthermore, since
nullity(A) = 5 - 2 = 3, we know that the general solution will involve 3 arbitrary parameters, so in fact the system
has infinitely many solutions.

Proceeding just as in Section 1.5, we first assign arbitrary parameters to the free variables, say

x2 = α, x4 = β, x5 = γ

Next we solve for the leading variables, remembering to incorporate the right-hand side entries (this time they
are not zero!). Solving row R2 for x3 gives

-3x3 + 3x4 - 9x5 = -12
-3x3 + 3β - 9γ = -12
x3 = 4 + β - 3γ

and solving row R1 for x1 gives

2x1 + 3x2 + 2x3 - x4 + 4x5 = 7
2x1 + 3α + 2(4 + β - 3γ) - β + 4γ = 7
x1 = -1/2 - (3/2)α - (1/2)β + γ

The general solution can now be assembled, then unpacked as before:

[ x1 ]   [ -1/2 - (3/2)α - (1/2)β + γ ]   [ -1/2 ]       [ -3/2 ]       [ -1/2 ]       [  1 ]
[ x2 ]   [              α             ]   [   0  ]       [   1  ]       [   0  ]       [  0 ]
[ x3 ] = [         4 + β - 3γ         ] = [   4  ]  +  α [   0  ]  +  β [   1  ]  +  γ [ -3 ]    (1.4)
[ x4 ]   [              β             ]   [   0  ]       [   0  ]       [   1  ]       [  0 ]
[ x5 ]   [              γ             ]   [   0  ]       [   0  ]       [   0  ]       [  1 ]

(the first vector on the right is a particular solution xp; the three parameterized terms together form ker(A))

We recognize the second part of the solution as ker(A): it is identical to the solution of the homogeneous system
Ax = 0 that we obtained in eqn (1.3). But we now have an additional term: a particular solution xp that (we hope!)
satisfies Ax = b. It is easy to check that it does:

[  2   3   2  -1    4 ] [ -1/2 ]   [  7 ]
[  4   6   1   1   -1 ] [   0  ]   [  2 ]
[  8  12  -1   5  -11 ] [   4  ] = [ -8 ]
[ -4  -6   5  -7   19 ] [   0  ]   [ 22 ]
                        [   0  ]

Of course this is not the only possible choice for xp. The general solution contains three arbitrary parameters,
and we can pick any values we like for those.


For example, setting α = 1, β = 0 and γ = 1 in eqn (1.4), we get another solution

x = [ -1  1  1  0  1 ]T

that also satisfies Ax = b,

[  2   3   2  -1    4 ] [ -1 ]   [  7 ]
[  4   6   1   1   -1 ] [  1 ]   [  2 ]
[  8  12  -1   5  -11 ] [  1 ] = [ -8 ]
[ -4  -6   5  -7   19 ] [  0 ]   [ 22 ]
                        [  1 ]

and could therefore serve just as well as the particular solution xp. An infinite number of other (equally valid)
choices could be made. Note, however, that any expression for the general solution of Ax = b must always
include the whole of the kernel:

x = xp + ker(A)
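In Python, one way to illustrate this split (a sketch of my own, not part of the original notes) is to obtain some particular solution with numpy.linalg.lstsq and a kernel basis with scipy.linalg.null_space; any solution of Ax = b is then that particular solution plus a vector from the kernel:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[ 2,  3,  2, -1,   4],
                  [ 4,  6,  1,  1,  -1],
                  [ 8, 12, -1,  5, -11],
                  [-4, -6,  5, -7,  19]], dtype=float)
    b = np.array([7.0, 2.0, -8.0, 22.0])

    xp, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution (the minimum-norm one)
    N = null_space(A)                            # basis for ker(A), one column per free parameter

    print(np.allclose(A @ xp, b))                # True: xp solves Ax = b
    # Adding any kernel combination leaves the right-hand side unchanged:
    x = xp + N @ np.array([1.0, -2.0, 0.5])      # arbitrary parameter values
    print(np.allclose(A @ x, b))                 # True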


In closing it is worth recalling the first form of the general solution in eqn (1.4), and reflecting on the significance
of the rank and nullity of the coefficient matrix A.

[ x1 ]   [ -1/2 - (3/2)α - (1/2)β + γ ]
[ x2 ]   [              α             ]
[ x3 ] = [         4 + β - 3γ         ]
[ x4 ]   [              β             ]
[ x5 ]   [              γ             ]

In this example nullity(A) is 3, which corresponds to the number of free variables (x2, x4, x5) that can be assigned
arbitrary values. Also, rank(A) is 2, which corresponds to the number of leading variables (x1, x3) that are bound
to the free variables in certain prescribed ratios.
In general, the significance of the nullity of A is that it tells us the number of degrees of freedom in the general
solution of a linear system Ax = b (or indeed Ax = 0). The rank of A is a complementary measure: it tells us the
number of constraints imposed on the general solution. If the nullity is zero, the solution has no degrees of
freedom, and therefore the solution is unique. If the nullity is greater than zero, the solution has a corresponding
number of degrees of freedom, and consequently the system has infinitely many solutions.


Flowchart for Ax = b (matrix A is m × n)

[Figure: flowchart summarizing the solution existence and uniqueness decisions for Ax = b.]

Example (A1 2004 Q7)


A set of linear equations is given by Ax = b where

    [ 1  1  2 ]         [ x1 ]           [ 3 ]
A = [ 1  2  3 ]  ,  x = [ x2 ]  and  b = [ 4 ]
    [ 2  3  5 ]         [ x3 ]           [ 7 ]
    [ 1  3  4 ]                          [ 5 ]

(a) Define the rank, nullity and kernel of a matrix.


(b) Reduce the augmented matrix [A | b] to echelon form and hence find the rank, nullity and kernel of the
matrix A.
(c) Confirm that solutions to Ax = b exist, and find the general solution for x.


Topic 2
Norms and conditioning
2.1 Vector norms
When we talk about the size or magnitude of a real number, we mean its absolute value. Although the number
-6.2 is algebraically smaller than the number 2.8, most of us would agree that -6.2 is "bigger" than 2.8, in the
sense that it lies further away from zero. The idea can be expressed more formally: the length of the vector
running from the origin to -6.2 is greater than that of the vector running from the origin to 2.8.
What about higher dimensions? We know how to calculate the length of vectors in R2 and R3 using Pythagoras's
theorem, and there is a very natural extension to Rn:

length of [ x1  x2 ]T = √(x1² + x2²)
length of [ x1  x2  x3 ]T = √(x1² + x2² + x3²)
...
length of [ x1  x2  ...  xn ]T = √(x1² + x2² + ... + xn²)

The general definition also works perfectly well when n = 1, since

length of [ x1 ] = √(x1²) = |x1|

Going back to the example in the opening paragraph, we get the desired outcome: the length of the 1-component
vector [ -6.2 ] is greater than that of [ 2.8 ].

We are very used to calculating the length of a vector in this way, but in fact a whole family of length measures
can be defined as follows:

length of [ x1  x2 ]T = (|x1|^p + |x2|^p)^(1/p)
length of [ x1  x2  x3 ]T = (|x1|^p + |x2|^p + |x3|^p)^(1/p)
...
length of [ x1  x2  ...  xn ]T = (|x1|^p + |x2|^p + ... + |xn|^p)^(1/p)

This generalized measure of length is called the p-norm (or Lp norm). To denote the p-norm of a vector x, we use
the special notation ‖x‖p. We never write |x|p, though the similarity is of course intentional; taking the norm of a
vector gives us a scalar measure of its magnitude, and is thus analogous to taking the absolute value of a real
(or complex) number.

The p-norm can be defined compactly as follows.

Let x be a vector and p (≥ 1) be a scalar. The p-norm of x is

‖x‖p = ( Σi |xi|^p )^(1/p)

It is not essential that p be an integer, but in practice it invariably is. In fact, there are only a few norms in
common use, namely the ones that are easy to calculate: those for p = 1, p = 2 and p = ∞.

The Pythagorean measure of length that we are so accustomed to is just the p-norm with p = 2. It is commonly
referred to as the Euclidean norm, or the Euclidean metric (note that in this context, metric means measure of
distance, nothing to do with SI units). The 2-norm can also be expressed in a compact vectorial form, since it is
the square root of the dot product of x with itself:

‖x‖2 = √(x · x) = √(xT x)

It is often simpler (e.g. when minimizing the length of a vector, see Q7 of the tutorial sheet) to work with the square
of the 2-norm:

(‖x‖2)² = x · x = xT x

If we set p = 1 in the general definition, we get

‖x‖1 = Σi |xi|

i.e. the sum of the absolute values of the components. The 1-norm is sometimes referred to as the taxicab or
Manhattan norm, since a hypothetical taxi driving from [ 0  0 ]T to a destination with coordinates [ x1  x2 ]T has to
cover a distance of at least |x1| + |x2| if it is constrained to drive on a rectangular grid of streets in R2.

Finally, if we let p approach infinity in the general definition, we get

‖x‖∞ = maxi |xi|

i.e. the supremum of the absolute values of the components. You are asked to justify this in Q3 of the tutorial
sheet.

It is also possible to come up with vector norms other than the p-norm, but any proposed norm must possess
certain common-sense properties in order to qualify as a meaningful measure of length. These are similar to the
properties of the absolute value function.

Specifically,

(i)   ‖x‖ = 0 if and only if x = 0
(ii)  ‖αx‖ = |α| ‖x‖
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖      (the triangle inequality)

It can be shown that these basic properties imply another common-sense property that we would expect any
norm to exhibit:

‖x‖ ≥ 0 for all x

For example, the 1-norm Σi |xi| is a valid vector norm, but Σi xi is not, since the sum of the terms in a vector
could be negative.
A good way to understand vector norms is to study how the unit circle in R2 looks under the three common
p-norms:

[Figure: the unit "circles" ‖x‖1 = 1 (a diamond), ‖x‖2 = 1 (a circle) and ‖x‖∞ = 1 (a square).]

Example
Find the 1-, 2-, 5-, and ∞-norms of the vector x = [ 3  -5  4  -1 ]T.

Solution

‖x‖1 = |3| + |-5| + |4| + |-1| = 13

‖x‖2 = ( 3² + (-5)² + 4² + (-1)² )^(1/2) = √51 = 7.14

‖x‖5 = ( |3|^5 + |-5|^5 + |4|^5 + |-1|^5 )^(1/5) = 4393^(1/5) = 5.35

‖x‖∞ = max( |3|, |-5|, |4|, |-1| ) = 5
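The same numbers can be checked in NumPy (a sketch, not part of the original notes); numpy.linalg.norm accepts ord = 1, 2, np.inf, and any other value of p:

    import numpy as np

    x = np.array([3.0, -5.0, 4.0, -1.0])

    print(np.linalg.norm(x, 1))       # 13.0
    print(np.linalg.norm(x, 2))       # 7.1414...  (= sqrt(51))
    print(np.linalg.norm(x, 5))       # 5.353...   (= 4393**(1/5))
    print(np.linalg.norm(x, np.inf))  # 5.0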


2.2 Matrix norms


Can we extend the notion of a norm from vectors to matrices? Intuition suggests that a matrix such as

[ 100  200 ]
[  75  400 ]

must somehow be "bigger" than a matrix such as

[ 1  3 ]
[ 1  2 ]

One idea comes quickly to mind. We could place all the entries of the matrix into a single vector, then apply the
vector p-norm introduced above. If we took p = 2, this would be equivalent to squaring each entry of the matrix,
summing and taking the square root. This is indeed a commonly used matrix norm known as the Frobenius
norm:

‖A‖Fro = ( Σi Σj Aij² )^(1/2)

Another approach, and one that turns out to be much more useful, is to think about the action of a matrix on an
arbitrary vector. As we noted in the first lecture, an m × n matrix A can be viewed as a linear operator that maps
a vector x in Rn to a vector Ax in Rm. If we take the norm of the output vector Ax and divide it by the norm of the
input vector x, we should get some clue as to the "magnifying power" of the matrix A (and thus its inherent size
or magnitude). We are therefore motivated to calculate the ratio

‖Ax‖p / ‖x‖p

assuming of course that x is not the zero vector. The trouble is that, unless A happens to be the identity matrix
(or some multiple thereof), this ratio will depend on the direction of the input vector x. To use a very simple
example, consider the diagonal matrix

A = [ 5  0 ]
    [ 0  2 ]

The vector x = [ 1  0 ]T is mapped to Ax = [ 5  0 ]T (magnification = 5), but the vector [ 0  1 ]T is mapped to
[ 0  2 ]T (magnification = 2). Vectors in other directions undergo intermediate levels of magnification. This can be
clarified by looking at the image of the unit circle under the action of A. It is mapped to an ellipse with semi-axes
of length 5 and 2:

[Figure: the unit circle and its image under A, an ellipse with semi-axes 5 (horizontal) and 2 (vertical).]

Given that a general matrix A is likely to exert widely varying magnifying powers on different input vectors x, we
had better be conservative and define the matrix p-norm of A as

‖A‖p = max over x ≠ 0 of ( ‖Ax‖p / ‖x‖p )

With this definition, the matrix p-norm provides an upper bound on the magnifying power of A (as measured by
applying the vector p-norm to the input vector x and output vector Ax). It means that we are assured of satisfying
the following important inequality for any vector x, regardless of its direction.

Let A be a matrix with n columns, and let x be a vector in Rn. If the inequality

‖Ax‖ ≤ ‖A‖ ‖x‖

is satisfied for all vectors x in Rn, then the matrix norm ‖A‖ is said to be
compatible with, or induced from, the vector norm used as a metric for x
and Ax.

There is no general formula for calculating the p-norm of a matrix A in terms of p and the entries Aij. For selected
values of p, however, it is possible to work out formulae for ‖A‖p that guarantee the satisfaction of the above
compatibility inequality for an arbitrary vector x.


For p = 1, the compatible matrix norm is

‖A‖1 = maxj Σi |Aij|

which is the maximum absolute column sum of A (remember that i counts down the rows, and j counts across
the columns). We could just take this on trust, but it would be nice to verify that the inequality

‖Ax‖1 ≤ ‖A‖1 ‖x‖1

is actually satisfied for any vector x. The proof requires some messy fiddling with summations and absolute
values. The indices i and j are understood to run from 1 to m and 1 to n respectively.

‖Ax‖1 = Σi | Σj Aij xj |
      ≤ Σi Σj |Aij xj|                    (*)
      = Σi Σj |Aij| |xj|
      = Σj |xj| Σi |Aij|
      ≤ ( Σj |xj| ) ( maxj Σi |Aij| )
      = ‖x‖1 ‖A‖1
      = ‖A‖1 ‖x‖1

The justification for the step marked (*) is the scalar version of the triangle inequality mentioned above:
|α + β| ≤ |α| + |β|.

For p = ∞, the compatible matrix norm is

‖A‖∞ = maxi Σj |Aij|

which is the maximum absolute row sum of A. To verify the validity of this definition, we need to show that the
inequality

‖Ax‖∞ ≤ ‖A‖∞ ‖x‖∞

is certain to be satisfied for any vector x. This time the proof is a little shorter, but still requires careful study to
follow the reasoning. You might find it helpful to make up a few small but concrete test cases (e.g. with A = 2 × 3
and x = 3 × 1) and trace what happens at each step. Make sure you include a few negative terms in both A and
x!

‖Ax‖∞ = maxi | Σj Aij xj |
      ≤ maxi Σj |Aij xj|                  (*)
      = maxi Σj |Aij| |xj|
      ≤ ( maxi Σj |Aij| ) ( maxj |xj| )
      = ‖A‖∞ ‖x‖∞

Again, note the use of the triangle inequality to reach line (*).


The matrix 2-norm is widely used, but can be difficult to calculate. If A happens to be square and symmetric, the
2-norm is the supremum of the absolute values of the eigenvalues:

‖A‖2 = maxi |λi|

(You might like to think about the unit circle to ellipse mapping discussed above.)

This is compatible with the 2-norm for vectors, i.e. it ensures

‖Ax‖2 ≤ ‖A‖2 ‖x‖2

for any vector x. More generally, the 2-norm of A turns out to be its largest singular value (this is beyond the
scope of the present course).
Example
Find the 1-norm, ∞-norm and Frobenius norm of the matrix

    [  3  -2   6 ]
A = [  5   8   4 ]
    [  4   1  -3 ]

Solution
The 1-norm is the maximum absolute column sum:

‖A‖1 = max( 12, 11, 13 ) = 13

The ∞-norm is the maximum absolute row sum:

‖A‖∞ = max( 11, 17, 8 ) = 17

The Frobenius norm is the square root of the sum of squares:

‖A‖Fro = ( 3² + (-2)² + ... + (-3)² )^(1/2) = √180 = 13.42
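These values can be reproduced with numpy.linalg.norm (a sketch, not part of the original notes); for matrices, ord = 1 gives the maximum absolute column sum, ord = np.inf the maximum absolute row sum, and ord = 'fro' the Frobenius norm:

    import numpy as np

    A = np.array([[3.0, -2.0,  6.0],
                  [5.0,  8.0,  4.0],
                  [4.0,  1.0, -3.0]])

    print(np.linalg.norm(A, 1))       # 13.0  (max absolute column sum)
    print(np.linalg.norm(A, np.inf))  # 17.0  (max absolute row sum)
    print(np.linalg.norm(A, 'fro'))   # 13.416...  (= sqrt(180))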


Example
Find the 2-norm of the matrix

A = [ 1   2 ]
    [ 2  -3 ]

Solution
The matrix is square and symmetric, so its 2-norm can be calculated from the eigenvalues.

det [ 1-λ    2  ] = 0
    [  2   -3-λ ]

(1 - λ)(-3 - λ) - 4 = 0
λ² + 2λ - 7 = 0

Solving the quadratic gives λ = -1 ± √8 = 1.83, -3.83. Hence

‖A‖2 = max( |1.83|, |-3.83| ) = 3.83
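A quick numerical check (a sketch, not part of the original notes): for this symmetric matrix the eigenvalue route and the general 2-norm (largest singular value) agree:

    import numpy as np

    A = np.array([[1.0,  2.0],
                  [2.0, -3.0]])

    print(np.max(np.abs(np.linalg.eigvalsh(A))))  # 3.828...  (largest |eigenvalue|)
    print(np.linalg.norm(A, 2))                   # 3.828...  (largest singular value)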


Example
Given the matrix and vector

A = [ 1.8  -1.4 ]  ,   x = [  0.3 ]
    [ 0.1   0.6 ]          [ -0.2 ]

investigate the satisfaction of the compatibility inequality

‖Ax‖p ≤ ‖A‖q ‖x‖p

when p and q take values 1 and ∞ in all combinations.

Solution

Ax = [ 1.8  -1.4 ] [  0.3 ]  =  [  0.82 ]
     [ 0.1   0.6 ] [ -0.2 ]     [ -0.09 ]

We therefore have

‖x‖1 = 0.5,    ‖x‖∞ = 0.3       (vector norms of x)
‖A‖1 = 2.0,    ‖A‖∞ = 3.2       (matrix norms of A)
‖Ax‖1 = 0.91,  ‖Ax‖∞ = 0.82     (vector norms of Ax)

Combination      ‖Ax‖p ≤ ‖A‖q ‖x‖p      Inequality satisfied?

p = 1, q = 1      0.91 ≤ 2.0 × 0.5       YES (expected)
p = ∞, q = ∞      0.82 ≤ 3.2 × 0.3       YES (expected)
p = 1, q = ∞      0.91 ≤ 3.2 × 0.5       YES (lucky)
p = ∞, q = 1      0.82 ≤ 2.0 × 0.3       NO

Moral: don't mix and match incompatible matrix and vector norms!


2.3 Conditioning of linear systems


Consider the 2 × 2 system

[ 1   1 ] [ x1 ]   [ 2 ]
[ 1  -1 ] [ x2 ] = [ 0 ]

which has the solution

[ x1 ]   [ 1 ]
[ x2 ] = [ 1 ]

If we perturb one of the terms in the matrix by 1%,

[ 1   1.01 ] [ x1 ]   [ 2 ]
[ 1  -1    ] [ x2 ] = [ 0 ]

the solution changes to

[ x1 ]   [ 0.995 ]
[ x2 ] = [ 0.995 ]

This seems reasonable: both components of x have changed by 0.5%, which is of the same order of magnitude
as the perturbation.

Similarly, if we perturb the first term of the right-hand side vector by 1%,

[ 1   1 ] [ x1 ]   [ 2.02 ]
[ 1  -1 ] [ x2 ] = [ 0    ]

the solution becomes

[ x1 ]   [ 1.01 ]
[ x2 ] = [ 1.01 ]

which is again in line with our intuitive expectation.

This type of nicely behaved linear system is said to be well-conditioned.
Now let's look at another 2 × 2 system

[ 1     1    ] [ x1 ]   [ 2 ]
[ 0.99  1.01 ] [ x2 ] = [ 2 ]

which also happens to have the solution

[ x1 ]   [ 1 ]
[ x2 ] = [ 1 ]

When we perturb one of the terms in the matrix by 1%,

[ 1     1.01 ] [ x1 ]   [ 2 ]
[ 0.99  1.01 ] [ x2 ] = [ 2 ]

we now get a drastic change in the solution:

[ x1 ]   [ 0    ]
[ x2 ] = [ 1.98 ]

When we perturb the first term of the right-hand side vector by 1%,

[ 1     1    ] [ x1 ]   [ 2.02 ]
[ 0.99  1.01 ] [ x2 ] = [ 2    ]

the solution is wildly different again:

[ x1 ]   [ 2.01 ]
[ x2 ] = [ 0.01 ]

This behaviour is surprising, and clearly undesirable: the solution x is highly sensitive to small perturbations in
the problem data A and b (such as those which might arise from noisy data, imperfect measurements, or
numerical roundoff errors during a computation). This type of badly behaved linear system is said to be
ill-conditioned. We can gain some insight into what is happening if we plot the two base case situations
(well-conditioned on the left, ill-conditioned on the right):

[Figure: left, the lines of the well-conditioned system crossing at (1, 1) at a wide angle; right, the lines of the
ill-conditioned system, which also cross at (1, 1) but are nearly parallel.]

The reason for the ill-conditioned behaviour is now clear. The lines are so close to parallel that even small
changes in slope and/or intercept cause the intersection point to move far away from the base case solution
point (1, 1).

If the lines were parallel, the determinant of the coefficient matrix would be zero, which might prompt the idea of
using det(A) as an indicator of possible conditioning trouble. The ill-conditioned system above has

A = [ 1     1    ]
    [ 0.99  1.01 ]

such that det(A) = 0.02, which seems close to zero. Unfortunately, the determinant is useless for predicting
ill-conditioning because a simple rescaling of the problem leads to a change in the determinant. If we multiply the
equations of the ill-conditioned system by 10, we get

[ 10   10   ] [ x1 ]   [ 20 ]
[ 9.9  10.1 ] [ x2 ] = [ 20 ]

such that det(A) = 2. The matrix appears to be much less singular, but the solution remains just as sensitive to
1% (or other small) perturbations of the entries in A and/or b.

Fortunately, there is a better approach for quantifying the sensitivity of a linear system, thus allowing
ill-conditioned behaviour to be predicted in advance (i.e. without resorting to trial and error perturbations of the sort
we imposed above). The perturbations to A and b are characterized by the relative error ratios

‖δA‖ / ‖A‖   and   ‖δb‖ / ‖b‖

We would like to be able to predict the effect on the solution, via the corresponding relative error ratio

‖δx‖ / ‖x‖

Any convenient norm can be used, but it is essential that the matrix norm and vector norm be compatible in the
sense defined previously.
Suppose we have a linear system defined by a matrix A, a right-hand side vector b, and an exact solution x,
such that

Ax = b

The matrix A is assumed to be square and invertible.

If the matrix is perturbed from A to A + δA, the solution will be perturbed from x to x + δx, such that

(A + δA)(x + δx) = b

Subtracting Ax from the left and b from the right,

A δx + δA (x + δx) = 0

Premultiplying by A⁻¹ and rearranging,

δx + A⁻¹ δA (x + δx) = 0
δx = -A⁻¹ δA (x + δx)

Taking norms and applying the compatibility inequality twice,

‖δx‖ = ‖A⁻¹ δA (x + δx)‖
     ≤ ‖A⁻¹‖ ‖δA (x + δx)‖
     ≤ ‖A⁻¹‖ ‖δA‖ ‖x + δx‖

Hence

‖δx‖ ≤ ‖A⁻¹‖ ‖δA‖ ‖x + δx‖

or equivalently

‖δx‖ / ‖x + δx‖ ≤ ( ‖A‖ ‖A⁻¹‖ ) ( ‖δA‖ / ‖A‖ )    (2.1)

Similarly, if the right-hand side vector is perturbed from b to b + δb, the solution will be perturbed from x to
x + δx, such that

A (x + δx) = b + δb

Subtracting Ax from the left and b from the right,

A δx = δb

Premultiplying by A⁻¹,

δx = A⁻¹ δb

Taking norms and applying the compatibility inequality,

‖δx‖ = ‖A⁻¹ δb‖ ≤ ‖A⁻¹‖ ‖δb‖

Multiplying by ‖b‖ on the left and ‖Ax‖ on the right,

‖δx‖ ‖b‖ ≤ ‖A⁻¹‖ ‖δb‖ ‖Ax‖

Applying the compatibility inequality once more,

‖δx‖ ‖b‖ ≤ ‖A⁻¹‖ ‖δb‖ ‖A‖ ‖x‖

Rearranging,

‖δx‖ / ‖x‖ ≤ ( ‖A‖ ‖A⁻¹‖ ) ( ‖δb‖ / ‖b‖ )    (2.2)

Remarkably, the bracketed term ‖A‖ ‖A⁻¹‖ appears in both eqn (2.1) and eqn (2.2). It is called the condition
number, and is usually denoted κ.

Let A be a square, invertible matrix. The condition number of A is

κ(A) = ‖A‖ ‖A⁻¹‖

For a linear system Ax = b, the condition number provides an upper
bound on the relative error in x caused by relative errors in A and b:

‖δx‖ / ‖x + δx‖ ≤ κ(A) ‖δA‖ / ‖A‖
‖δx‖ / ‖x‖ ≤ κ(A) ‖δb‖ / ‖b‖
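In NumPy the condition number is available directly (a sketch, not part of the original notes); numpy.linalg.cond uses the 2-norm by default, and also accepts 1, np.inf or 'fro':

    import numpy as np

    A_good = np.array([[1.0,  1.0],
                       [1.0, -1.0]])
    A_bad  = np.array([[1.0,  1.0],
                       [0.99, 1.01]])

    print(np.linalg.cond(A_good, np.inf))  # 2.0   (well-conditioned)
    print(np.linalg.cond(A_bad,  np.inf))  # 201.0 (ill-conditioned)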


Example
Find the condition numbers for the two systems considered at the start of this section. Also, for the ill-conditioned
system, verify that the actual relative errors caused by the perturbations in A and b are within the bounds
predicted by the condition number. Use the infinity norm.
Solution
For the well-conditioned system,

A = [ 1   1 ]      and      A⁻¹ = [ 0.5   0.5 ]
    [ 1  -1 ]                     [ 0.5  -0.5 ]

hence

κ(A) = ‖A‖∞ ‖A⁻¹‖∞ = 2 × 1 = 2      (which is low)

For the ill-conditioned system,

A = [ 1     1    ]      and      A⁻¹ = [  50.5  -50 ]
    [ 0.99  1.01 ]                     [ -49.5   50 ]

hence

κ(A) = ‖A‖∞ ‖A⁻¹‖∞ = 2 × 100.5 = 201      (which is high)

The matrix was changed from

A = [ 1     1    ]      to      A + δA = [ 1     1.01 ]
    [ 0.99  1.01 ]                       [ 0.99  1.01 ]

such that

δA = [ 0  0.01 ]
     [ 0  0    ]

This caused the solution to change from

x = [ 1 ]      to      x + δx = [ 0    ]
    [ 1 ]                       [ 1.98 ]

such that

δx = [ -1   ]
     [ 0.98 ]

We therefore expect

‖δx‖∞ / ‖x + δx‖∞ ≤ κ(A) ‖δA‖∞ / ‖A‖∞

1 / 1.98 ≤ 201 × 0.01 / 2
0.505 ≤ 1.005

which checks out.


The right-hand side was changed from

b = [ 2 ]      to      b + δb = [ 2.02 ]
    [ 2 ]                       [ 2    ]

such that

δb = [ 0.02 ]
     [ 0    ]

This caused the solution to change from

x = [ 1 ]      to      x + δx = [ 2.01 ]
    [ 1 ]                       [ 0.01 ]

such that

δx = [  1.01 ]
     [ -0.99 ]

We therefore expect

‖δx‖∞ / ‖x‖∞ ≤ κ(A) ‖δb‖∞ / ‖b‖∞

1.01 / 1 ≤ 201 × 0.02 / 2
1.01 ≤ 2.01

which again checks out.
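To round off, here is a short NumPy sketch (my own illustration, not part of the original notes) that reproduces the ill-conditioned experiment and compares the actual relative error against the bound κ(A) ‖δb‖/‖b‖:

    import numpy as np

    A = np.array([[1.0,  1.0],
                  [0.99, 1.01]])
    b = np.array([2.0, 2.0])
    db = np.array([0.02, 0.0])           # 1% perturbation of the first entry of b

    x  = np.linalg.solve(A, b)           # [1, 1]
    dx = np.linalg.solve(A, b + db) - x  # actual change in the solution

    inf = np.inf
    actual = np.linalg.norm(dx, inf) / np.linalg.norm(x, inf)
    bound  = np.linalg.cond(A, inf) * np.linalg.norm(db, inf) / np.linalg.norm(b, inf)
    print(actual, bound)                 # approx 1.01 and 2.01: the bound holds, as predicted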
