
620-156 (MAST10007) Linear Algebra

Topic 1: Linear equations 2


Topic 2: Matrices and determinants 40
Topic 3: Euclidean vector spaces 105
Topic 4: General vector spaces 193
Topic 5: Linear transformations 225
Topic 6: Inner product spaces 263
Topic 7: Eigenvalues and eigenvectors 296

1
Topic 1: Linear equations [AR 1.1 and 1.2]

One of the major topics studied in linear algebra is systems of linear


equations and their solutions. The following topics will be addressed:

1.1. Systems of equations. Coefficient arrays. Row operations.


1.2. Reduction of systems to row-echelon form and reduced row-echelon
form.
1.3. Consistent and inconsistent systems. Infinite solution sets.

2
We will begin with a couple of examples to illustrate how linear
equations can arise.

Example (This is Example 2 of [AR 11.3])


An investor has $10,000 to invest. She considers two bonds, one at
10% and one at 7%. She decides to invest at most $6,000 in the first
bond and at least $2,000 in the second. She also decides to invest at
least as much in the first as the second. How should she invest?

Suppose she invests x1 in the first and x2 in the second.

Then the yield is z = 0.1x1 + 0.07x2 .

The constraints are

x1 ≤ 6000          x2 ≥ 2000
x1 + x2 ≤ 10000    x1 ≥ x2
x1 ≥ 0             x2 ≥ 0
3
We could sketch these constraints:

[Sketch: the feasible region in the (x1, x2)-plane, bounded by the lines
x1 = 6000, x2 = 2000, x1 + x2 = 10000 and x1 = x2, together with a
'line of constant profit'.]

We can find the maximum profit by moving the ‘line of constant profit’
parallel to itself until it meets the shaded region.
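For reference (using the standard linear-programming fact, not proved here, that the maximum of z occurs at a vertex of the feasible region): z = 340 at (2000, 2000), z = 740 at (6000, 2000), z = 880 at (6000, 4000) and z = 850 at (5000, 5000). So she should invest $6,000 at 10% and $4,000 at 7%, for a yield of $880.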

4
Example (This is Example 1 of [AR 11.2])

[Circuit diagram: a 30V source (Loop 1) and a 50V source (Loop 2), with
resistors of 7Ω, 3Ω and 11Ω; the currents I1 , I2 , I3 flow as indicated,
meeting at the point A.]
Because the current going into point A is the same as the current going
out, we have
I1 = I2 + I3
Measuring voltage around Loop 1 and Loop 2 we get, respectively,

7I1 + 3I3 = 30
11I2 − 3I3 = 50

Solving these gives values for the current.
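For reference, a worked check: substituting I1 = I2 + I3 into the loop equations gives 7I2 + 10I3 = 30 and 11I2 − 3I3 = 50, whence I3 = −20/131 ≈ −0.15, I2 = 590/131 ≈ 4.50 and I1 = 570/131 ≈ 4.35 (amperes).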

5
1.1 Systems of equations, coefficient arrays, row operations

Linear equations

In our example, the unknowns were I1 , I2 , I3 , and a typical equation


obtained was
7I1 + 3I3 = 30
We call this a linear equation in the variables I1 and I3 because the
coefficients are constants, and the variables are raised to the first power
only.

Now that we know what makes the equation linear, it is clear to see
that the number of variables is unimportant, so we have in the general
case the following definition.

6
Definition (Linear equation and linear system)
A linear equation in n variables, x1 , x2 , . . . , xn , is an equation of the form

a1 x1 + a2 x2 + · · · + an xn = b

where a1 , a2 , . . . , an are real constants (not all zero) and b is also a real
constant.

Furthermore, a finite set of linear equations in the variables


x1 , x2 , . . . , xn is called a system of linear equations or a linear system.

7
Examples

x + 2y = 7
(3/8)x − 21y = 0

x1 + 5x2 + 6x3 = 100


x2 − x3 = −1
−x1 + x3 = 11

8
Definition (Solution of a system of linear equations)
A solution to a system of linear equations in the variables x1 , . . . , xn is a
set of values of these variables which satisfy every equation in the
system.
How would you solve the following linear system?

2x − y = 3
x +y =0

Graphically
◮ Need accurate sketch!
◮ Not practical for three or more variables.
Elimination
◮ Will always give a solution, but is too ad hoc, particularly in higher
dimensions (meaning three or more variables).

9
However, the good news is that with the introduction of some clever
notation, we can formalise the procedure involved in solving
simultaneous equations.

Before we do this, let’s remind ourselves how to solve a simple linear


system using elimination.

Example
Find all solutions for the following system of linear equations

2x − y = 3
x +y =0

Adding the two equations gives 3x = 3, from which we get x = 1 and then y = −x = −1.

10
Example Find all solutions for the linear system

3x + 2y = −3
2x + y = −1

Subtracting twice the second from the first gives −x = −1, so x = 1 and then y = −1 − 2x = −3.

11
Coefficient arrays

There is a systematic way to write a linear system. For example,


consider the previously encountered linear system

I1 − I2 − I3 = 0
7I1 + (0×)I2 + 3I3 = 30
(0×)I1 + 11I2 − 3I3 = 50

The coefficients of the unknowns I1 , I2 , I3 form a 3 × 3 array of numbers:


 
1 −1 −1
7 0 3
0 11 −3

12
Definition (Matrix)
A matrix is a rectangular array of numbers.
The numbers in the array are called the entries of the matrix.

(Matrices are discussed in further detail in a few lectures’ time.)

Definition (Augmented matrix of a linear system)


The augmented matrix for a linear system is the matrix formed from the
coefficients in the equations and the constant terms.

Example
The augmented matrix for the previous set of equations is:

1 −1 −1   0
7  0  3  30
0 11 −3  50

13
Example
Write the following system of linear equations as an augmented matrix

2x − y = 3
x +y =0

Note
The number of rows is equal to the number of equations.
Each column, except the last, corresponds to a variable.
The last column contains the constant term from each equation.

14
Row Operations
Our aim is to use matrices to assist us in finding a solution to a system
of equations.
First we need to decide what sort of operations we can perform on the
augmented matrix. An essential condition is that whichever operations
we perform, we must be able to recover the solution to the original
system from the new matrix we obtain.
Let’s start by considering operations on a matrix that mimic those
operations used in the elimination method.

Definition (Elementary row operations)


The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another.
15
Example
Back to our simple system:

2x − y = 3
x +y =0

Let’s apply some elementary row operations to the corresponding


augmented matrix:
 
2 −1 3
1  1 0

Note
The matrices are not equal, but are equivalent in that the solution set is
the same for each system represented by each augmented matrix.

16
1.2 Reduction of systems to reduced row-echelon form

Gaussian elimination
Using a sequence of elementary row operations, we can always get to a
matrix that allows us to determine the solution set of a linear system.
However, the degree of complication increases as the number of
variables and equations increases, so it is a good idea to formalise the
process.

The leftmost non-zero element in each row is called the leading entry.

17
Definition (Row echelon form)
A matrix is in row-echelon form if:
1. For any row with a leading entry, all elements below that entry and
in the same column as it, are zero.
2. For any two rows, the leading entry of the lower row is further to
the right than the leading entry in the higher row.
3. Any row that consists solely of zeros is lower than any row with
non-zero entries.

Examples

1 −2 3 4 5      and     1 0 0 3
                        0 1 1 2
                        0 0 0 3      are in r.e. form

0 0 0  2 4
0 0 3  1 6
0 0 0  0 0      is not in r.e. form
2 −3 6 −4 9
18
Gaussian elimination is a systematic (or algorithmic) approach to the
reduction of a matrix to row-echelon form.

Gaussian elimination
1. Make the top left element (row 1, column 1) a leading entry; that
is, reorder rows so that the entry in the top left position is non-zero.
2. Add multiples of the first row to the other rows to make all other
entries (from row 2 down) in the first column zero.
3. Reorder rows 2 . . . n so that the next leading entry is in row 2.
4. Add multiples of the second row to rows 3 . . . n, making all other
entries (from row 3 down) in the column containing the second
leading entry zero.
5. Repeat until you run out of rows.

The matrix is now in row-echelon form.
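The procedure above translates directly into code. Here is a minimal Python sketch (our own illustration, not part of the original slides; the function name row_echelon is our choice), using exact rational arithmetic so that no rounding occurs:

    from fractions import Fraction

    def row_echelon(rows):
        # Reduce a matrix (given as a list of rows) to row-echelon form.
        M = [[Fraction(x) for x in row] for row in rows]
        m, n = len(M), len(M[0])
        r = 0                                # row that will receive the next leading entry
        for c in range(n):
            pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
            if pivot is None:
                continue                     # no leading entry available in this column
            M[r], M[pivot] = M[pivot], M[r]  # steps 1 and 3: reorder rows
            for i in range(r + 1, m):        # steps 2 and 4: clear entries below the leading entry
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
            r += 1
            if r == m:
                break                        # step 5: ran out of rows
        return M

For example, row_echelon([[2, -1, 3], [1, 1, 0]]) reduces the augmented matrix of slide 16.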

19
Example
Use Gaussian elimination to reduce the augmented matrix which
represents the linear system

3x + 2y − z = −15
x + y − 4z = −30
3x + y + 3z = 11
3x + 3y − 5z = −41

to row-echelon form.

20
By reducing a matrix to row-echelon form, Gaussian elimination allows
us to easily solve a system of linear equations. To do this we need to
read off the final equations from the row-echelon matrix and then
manipulate them to find the solution. The final manipulation is
sometimes called back substitution. We illustrate with an example.

Example
From the row-echelon matrix of the previous example we can calculate
the solutions to the original system.
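For reference: one possible row-echelon form has rows corresponding to x + y − 4z = −30, −y + 11z = 75 and 7z = 49 (plus a zero row), so back substitution gives z = 7, then y = 2, then x = −4.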

Note
This procedure relies on the fact that the new row-echelon matrix gives
a linear system with exactly the same set of solutions as the original
linear system.
21
Is there any way to solve a system without having to perform the final
manipulation?

Definition (Reduced row-echelon form)


A matrix is in reduced row-echelon form if the following three conditions
are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.

22
Examples

1 −2 3 −4 5      is in r.r.e form

1 0 0 3
0 1 1 2          is not in r.r.e form
0 0 0 3

1 0 0  2 4
0 1 3  1 6
0 0 0  0 0       is not in r.r.e form
0 0 1 −4 9

23
1.2.2 Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to
reduced row-echelon form using row operations.

Gauss-Jordan elimination
1. Use Gaussian elimination to reduce matrix to row-echelon form.
2. Use row operations to create zeros above the leading entries.
3. Multiply rows by appropriate numbers to create the leading 1’s.

The order of operations is not unique, however the reduced row-echelon


form of a matrix is!
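The whole process is also easy to check by machine. A short Python sketch using SymPy (assuming SymPy is available; Matrix.rref is its built-in reduced row-echelon routine), applied to the system on the next slide:

    from sympy import Matrix

    A = Matrix([[3, 2, -1, -15],
                [1, 1, -4, -30],
                [3, 1,  3,  11],
                [3, 3, -5, -41]])    # the augmented matrix
    R, pivot_columns = A.rref()
    # R has rows (1,0,0,-4), (0,1,0,2), (0,0,1,7), (0,0,0,0),
    # so the solution is x = -4, y = 2, z = 7; pivot_columns is (0, 1, 2).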

24
Example
Use Gauss-Jordan elimination to find a solution to the linear system

3x + 2y − z = −15
x + y − 4z = −30
3x + y + 3z = 11
3x + 3y − 5z = −41

25
1.3 Consistent and inconsistent systems

Recall that a solution of a system of linear equations is a set of values
of the variables which satisfies every equation in the system.

Example
Find the solution of the system

2x + 4y − z = −3
x − 3y + 2z = 11
4x − 2y + 5z = 21

This system has a unique solution.
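For reference: reducing the augmented matrix gives the unique solution (x, y , z) = (3, −2, 1).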

26
Another example
Find the solution of the system

x− y+ z =3
x − 7y + 3z = −11
2x + y + z = 16

What’s going on here?
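For reference: eliminating x leads to the pair −3y + z = −7 and −3y + z = −10, which cannot both hold, so this system has no solution.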

27
Yet another example
Find the solution of the system

x+ y+ z =4
2x + y + 2z = 9
3x + 2y + 3z = 13

Is this the same behaviour as the previous example or not?
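For reference: unlike the previous example, this system is consistent. Eliminating x gives y = −1 twice, leaving x + z = 5, so there are infinitely many solutions (one free parameter).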

28
Types of solution sets
It is clear from the above examples that there are different types of
solutions for systems of equations.

Definition (Consistency)
◮ A system of linear equations is said to be consistent if the system
has at least one solution.
◮ A system of linear equations is said to be inconsistent if the system
has no solution.

For any system of linear equations, one of three types of solution set is
possible.
◮ no solution (inconsistent)
◮ one solution (consistent and unique)
◮ infinitely many solutions (consistent but not unique)

29
Inconsistent systems

We can determine the type of solution set a system has by reducing its
augmented matrix to row-echelon form.

The system is inconsistent if there is at least one row in the row-echelon


matrix having all values equal to zero on the left and a non-zero entry
on the right. For example,
 
0 0 ··· 0 5
Why is this inconsistent?

If we try to recover the equation represented by this row, it says:

0 × x1 + 0 × x2 + · · · + 0 × xn = 5
and of course this is not satisfied for any values of x1 , . . . , xn .

30
Example  
1 2 0 4
 2 1 1 2 
4 2 2 3

If a system of equations is inconsistent then the row-echelon form of its
augmented matrix will have a row of the form

0 0 · · · 0 a      with a ≠ 0.
31
Example (inconsistent)
Geometrically, an inconsistent system is one for which there is no
common point of intersection for the system.

32
Consistent systems
Recall that a consistent system has either a unique solution or infinitely
many solutions.

Unique solution:
For a system of equations with n variables, a unique solution exists
precisely when the row reduced augmented matrix has n non-zero rows.
In this case we can read off the solution straight from the reduced
matrix.
Example
 
1 1 3 5        1 0 0 1
1 2 1 4   ∼    0 1 0 1
2 1 1 4        0 0 1 1

and the solution is

x1 = 1,   x2 = 1,   x3 = 1

33
Infinitely many solutions:
Example
What is the solution to the equations with rre form;
 
1 2 0 0 5 1
 0 0 1 0 6 2 
 
 0 0 0 1 7 3 
0 0 0 0 0 0
The corresponding (non-zero) equations are
x1 + 2x2 + 5x5 = 1, x3 + 6x5 = 2, x4 + 7x5 = 3.

We can choose x2 and x5 as we wish. Say x2 = s, x5 = t. Then the


other variables must be given by
x1 = 1 − 2s − 5t, x3 = 2 − 6t, x4 = 3 − 7t.

In this way, we can describe every possible solution.

34
In general
If the system is consistent and the row reduced augmented matrix has
fewer than n non-zero rows, then it has infinitely many solutions.

If r is the number of non-zero rows, then n − r parameters are needed
to specify the solution set.

More precisely, in the rre form of the matrix, there will be n − r columns
which contain no leading entry.

We can choose the variable corresponding to such a column arbitrarily.

The values for the remaining variables will then follow.

35
Example
Suppose that the rre matrix for a system of equations is
 
1 2 0 1 1
 0 0 1 2 2 
 
 0 0 0 0 0 
0 0 0 0 0

We can represent the values of x2 and x4 by parameters s and t,


respectively.
Then the values of the other variables are

x1 = 1 − 2s − t,   x3 = 2 − 2t

36
Example
 
1 1 3 5         1 0 −2 −1
1 2 8 11   ∼    0 1  5  6
2 1 1 4         0 0  0  0

So the solution of the equations can be given by:

x1 = −1 + 2t,   x2 = 6 − 5t,   x3 = t      (t ∈ R)

37
Example
Solve the linear system:

v − 2w + z = 1
2u − v − z = 0
4u + v − 6w = 3

38
Example
Find the values of k for which the system

u + 3v + 4w = 6
4u + 9v − w = 4
6u + 9v + kw = 8

has
(i) no solution
(ii) a unique solution
(iii) an infinite number of solutions
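For reference: elimination reduces the third equation to (k + 27)w = 32, so there is (i) no solution when k = −27, (ii) a unique solution when k ≠ −27, and (iii) no value of k giving infinitely many solutions.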

39
Topic 2: Matrices and Determinants [AR 1.3 – 1.6]

In studying linear systems we introduced the idea of a matrix. Next we


will discover that matrices are not only useful tools for solving systems
of equations but also that they have their own algebraic structure and
have many interesting properties.

2.1 Properties of matrices


2.2 Matrix algebra
2.3 Matrix inverses
2.4 Rank of a matrix
2.5 Solutions of non-homogeneous linear equations
2.6 Determinants

40
2.1 Properties of matrices

Notation
Sometimes it is convenient to refer to the entries of a matrix rather
than the entire matrix.
If A is a matrix, we denote its entries as Aij , where i specifies the row of
the entry and j specifies the column. Using this notation, we write
 
      A11 A12 . . . A1n
      A21 A22 . . . A2n
A =    .   .         .        or  A = [Aij ]
       .   .         .
      Am1 Am2 . . . Amn

41
We say that a matrix has size m × n when it has m rows and n columns.

Example
The matrix A =  1 2 3
                π e 27.1
has size 2 × 3.

We have A12 = 2 and A21 = π.

Note
A12 and A21 are not equal (in this example).

42
Some special matrices
Some matrices with special features are given names suggestive of those
features.
Definition (Special matrices)
◮ A matrix having the same number of rows as columns is called a
square matrix
◮ A matrix with only one row is called a row matrix
(They have size 1 × n)
◮ A matrix with only one column is called a column matrix (They
have size n × 1)
◮ A matrix with all elements equal to zero is referred to as a zero
matrix
◮ A square matrix with Aij = 0 for i ≠ j is called a diagonal matrix
◮ A square matrix A satisfying Aij = 1 if i = j and Aij = 0 if i ≠ j
  is called an identity matrix.
43
Examples

The matrices   1 2     and   1 2 2
               3 4           3 4 5
                             6 7 8      are both square.

4 3 5 −2      is a row matrix.

1
2      is a column matrix.
3

The matrices   0 0 0     and   0 0
               0 0 0           0 0
                               0 0      are both zero matrices.

1 0     and   1 0 0
0 1           0 1 0
              0 0 1      are both identity matrices.
44
2.2 Matrix algebra

We all know the algebra of real numbers: addition, subtraction,


multiplication and division. But what do these operations mean when
we try to apply them to things other than real numbers? Matrices have
their own algebra for which some of these things make sense and some
don’t.

Definition (Equality of matrices)


We say that two matrices are equal if
◮ they have the same size; and
◮ every corresponding element is equal.
That is,

A = B   ⇐⇒   A and B have the same size and Aij = Bij for all i and j

45
Example
Given A =   1 y 0      and   B =  1 3 0
           −7 2 3                 x 2 3
determine the values of x and y for which A = B.

Scalar Multiplication

Definition (Scalar multiple)


Let A = [Aij ] be a matrix and c ∈ R. The matrix cA having the same
size as A and entries given by

(cA)ij = c × Aij

is called a scalar multiple of A.


In this context the real number c is called a scalar.

46
Example
         0  1
Let A =  1 −1
         2 −7
What are −2A and αA for α ∈ R?

Properties of Scalar Multiplication

If α and β are any scalars and C and D are any matrices of the same
size, then
1. (α + β)C = αC + βC
2. α(D + C ) = αD + αC
3. α(βC ) = (αβ)C

47
Addition of Matrices

Definition (Addition of matrices)


Let A = [Aij ] and B = [Bij ] be matrices of the same size.
We define a matrix C , called the sum of A and B, by
◮ C has the same size as A and B
◮ Cij = Aij + Bij
We write C = A + B.

Be Careful: Matrix addition is only defined for matrices of the same size.

Notation: We write A − B in place of A + (−1)B.

48
Example
Let
A =  2  0 −3       B =  −1 −1 1       C =  −1 1
     1 −1  3             0  1 2             2 0

Calculate (where possible) A + B and A + B + C

49
Properties of Matrix Addition

For matrices A, B and C , all of the same size, the following statements
hold:
1. A + B = B + A (commutativity)
2. A + (B + C ) = (A + B) + C (associativity)
3. A − A = 0
4. A + 0 = A

Here 0 denotes the zero matrix of the same size as A, B and C .

All these properties follow from the corresponding properties of the


scalars.

50
Matrix multiplication
Sometimes we can multiply two matrices together.

Definition (Matrix multiplication)


Let A be an m × n matrix and B be a n × q matrix.
The matrix product of A and B is a matrix of size m × q, denoted AB.

The entry in position ij of the matrix product is obtained by taking row


i of A, and column j of B, then multiplying together the entries in order
and forming the sum.
Using summation notation this rule can be expressed as
n
X
(AB)ij = Aik Bkj
k=1

Note: The matrix product AB is only defined if the number of columns


of A is equal to the number of rows of B.
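The summation rule translates directly into code. A minimal Python sketch (our own helper, not from the notes), applied to the matrices of the example on the next slide:

    def mat_mul(A, B):
        # (AB)_ij = sum over k of A_ik * B_kj; needs cols(A) == rows(B).
        m, n, q = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "cols(A) must equal rows(B)"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
                for i in range(m)]

    # A (3 x 2) matrix times a (2 x 2) matrix gives a (3 x 2) matrix.
    print(mat_mul([[1, -1], [3, 0], [0, 1]], [[4, 0], [-7, 1]]))
    # [[11, -1], [12, 0], [-7, 1]]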
51
Example
         1 −1
Let A =  3  0      and   B =   4 0
         0  1                 −7 1

Calculate AB and BA (if they exist).

52
Example
Let A =  1 0      and   B =  4 3
         2 3                 2 1

Calculate AB and BA.

Notice: In this example AB ≠ BA, even though both are defined.

Matrix multiplication is not commutative (in general).

53
Definition (Commuting matrices)
The matrices A and B are said to commute if AB = BA.

For both AB and BA to be defined and equal we must have that A and
B are square and have the same size.

Note
If I is an n × n identity matrix and A is any n × n square matrix, then

AI = IA = A

54
Properties of matrix multiplication
The following properties hold whenever the matrix products and sums
are defined:
1. A(B + C ) = AB + AC (left distributivity)
2. (A + B)C = AC + BC (right distributivity)
3. A(BC ) = (AB)C (associativity)
4. A(αB) = α(AB)
5. AI = IA = A
6. A0 = 0 and 0A = 0

Here α is a scalar and, as always, I and 0 denote the identity matrix


and zero matrix of the appropriate size.

55
Matrix multiplication and linear systems

Using the rule for matrix multiplication, a linear system can be written
as a matrix equation.

Example

The linear system

2x + 0.1y − 0.02z = 0.9


−0.8x + 7y = 0

is equivalent to the matrix equation


 
 2    0.1  −0.02      x        0.9
−0.8   7     0        y    =   0
                      z

56
Another non-property of matrix multiplication

If we take the product of two real numbers a and b and ab = 0 we can


conclude that at least one of a and b is equal to 0.

This is not always true for matrices.

Example
Let A =   1  1      and   B =   1 −1
         −1 −1                 −1  1

Then AB = 0, even though A ≠ 0 and B ≠ 0.

What is going on here? The problem is that the matrices A and B in


the above example do not have inverses.

57
2.3 Matrix inverses

Definition (Matrix inverse)


A matrix A is called invertible if there exists a matrix B such that

AB = BA = I

Such a matrix B is called the inverse of A and is denoted A−1 .

If A is not invertible, we say that A is singular.

58
Note

It follows from the definition that:


◮ For A to be invertible it must be square
◮ (If it exists) A−1 has the same size as A
◮ (If it exists) A−1 is unique
◮ If A is invertible, then A−1 is invertible and (A−1 )−1 = A
◮ I −1 = I , 0 has no inverse.

59
Inverse of a 2 × 2 matrix
 
In general, for a 2 × 2 matrix   A =  a b
                                      c d

1. A is invertible iff ad − bc ≠ 0

2. If ad − bc ≠ 0, then   A−1 = 1/(ad − bc) ·   d −b
                                               −c  a

Example
Find the inverse of A =  2 −1
                         1  1

60
Example
If A is a square matrix satisfying A3 = 0, show that
(I − A)−1 = I + A + A2
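For reference, the key computation: (I − A)(I + A + A2 ) = I + A + A2 − A − A2 − A3 = I − A3 = I, and similarly (I + A + A2 )(I − A) = I, so I + A + A2 is the inverse of I − A.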

61
Finding the inverse of a square matrix
Calculating the inverse of a matrix
Given an n × n matrix A, we can find A−1 as follows:

1. Construct the (“grand augmented”) matrix [A | I ],


where I is the n × n identity matrix.

2. Apply row operations to [A | I ] to get the block corresponding to


A into reduced row-echelon form. This gives

[A | I ] ∼ [R | B]

where R is in reduced row-echelon form.

3. If R = I , then A is invertible and A−1 = B

   If R ≠ I , then A is singular (i.e., there is no A−1 )
We will see shortly why this method works.
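A machine check of this procedure, sketched in Python with SymPy (assuming SymPy is available):

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 1],
                [-1, -1, 1],
                [0, 1, 3]])
    grand = A.row_join(eye(3))   # the "grand augmented" matrix [A | I]
    R, _ = grand.rref()          # row reduce
    B = R[:, 3:]                 # right-hand block
    print(R[:, :3] == eye(3))    # True, so A is invertible
    print(B == A.inv())          # True: the right block is the inverse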
62
Example
                          1  2 1
Find the inverse of A =  −1 −1 1
                          0  1 3
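For reference, the method gives

A−1 =  −4 −5  3
        3  3 −2
       −1 −1  1

(and one can check directly that AA−1 = I ).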

63
Another example
                      1  2 1
Find the inverse of  −1 −1 1
                     −1  0 3

64
Properties of the matrix inverse
If A and B are invertible matrices of the same size, and α a non-zero
scalar, then
1. (αA)−1 = (1/α) A−1
2. (AB)−1 = B −1 A−1
3. (An )−1 = (A−1 )n   (for all n ∈ N)

Exercise
Prove these!

65
Row operations by matrix multiplication

The effect of a row operation can be achieved by multiplication on the


left by a suitable matrix.

Definition (Elementary matrix)


An n × n matrix is an elementary matrix if it can be obtained from In
by performing a single elementary row operation.

Example  
1 0 0
Consider I3 =  0 1 0 
0 0 1

Let F be obtained from I3 by the row operation: R2 ↔ R3
Let G be obtained from I3 by the row operation: R1 → 2R1
Let H be obtained from I3 by the row operation: R3 → R3 + 3R1
66
     
F =  1 0 0      G =  2 0 0      H =  1 0 0
     0 0 1           0 1 0           0 1 0
     0 1 0           0 0 1           3 0 1

 
Now consider the matrix   A =  a b c
                               d e f
                               g h i
Calculation gives

       a b c            2a 2b 2c             a    b    c
FA =   g h i     GA =    d  e  f     HA =    d    e    f
       d e f             g  h  i           g+3a h+3b i+3c

67
This works in general!

Let E be obtained by applying a row operation p to I .

If A is a matrix such that the product EA is defined,


then EA is the result of performing p on A.

We can represent a sequence of elementary row operations by a


corresponding sequence of elementary matrices.
(Be careful about the order!)

68
Example
1 2   ∼   1  2   ∼   1 2
3 4       0 −2       0 1

and, in terms of elementary matrices,

1   0       1 0     1 2       1 2
0 −1/2     −3 1     3 4   =   0 1

69
If A ∼ I , then there is a sequence of elementary matrices E1 , E2 , . . . , En
such that En En−1 · · · E2 E1 A = I

This can be used to prove the following

Theorem (AR 1.5.3)


1. Let A be an n × n matrix. Then

A is invertible ⇐⇒ A ∼ In

2. If A and B are n × n matrices such that AB = In ,


then B is invertible and B −1 = A.
3. Every invertible matrix can be written as a product of elementary
matrices.

This theorem justifies why our procedure for finding inverses using row
operations actually works.... Can you see why?
70
Matrix transpose

Definition (Transpose of a matrix)


Let A be an m × n matrix. The transpose of A, denoted by AT , is
defined to be the n × m matrix whose entries are given by

(AT )ij = Aji

That is, AT is obtained by interchanging the rows and columns of A.

Example
Let A =  1 2 3      Then AT =  1 4
         4 5 6                 2 5
                               3 6

71
Properties of the transpose

1. (AT )T = A
2. (A + B)T = AT + B T (whenever A + B is defined)
3. (αA)T = αAT (where α is a scalar)
4. (AB)T = B T AT (whenever AB is defined)
5. (AT )−1 = (A−1 )T (whenever A−1 is defined)

Parts 1, 2 and 3 follow easily from the definition.

To prove part 4, we also use the definition of matrix multiplication given


previously.

72
Example
Using part 4 above, prove part 5: (AT )−1 = (A−1 )T .
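For reference, the one-line argument: AT (A−1 )T = (A−1 A)T = I T = I , and similarly (A−1 )T AT = (AA−1 )T = I , so (A−1 )T is the inverse of AT .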

73
Linear systems revisited

Consider the linear system

x + 2y + z = − 3
−x − y + z = 11
y + 3z = 21

We could represent the system by an augmented matrix:


 
1 2 1 −3
 −1 −1 1 11 
0 1 3 21
or, we could write it as a matrix equation:
    
1 2 1 x −3
 −1 −1 1   y  =  11 
0 1 3 z 21

74
For convenience, label
(i) the matrix of coefficients as A,
(ii) the column matrix of variables as x,
(iii) the right hand side column matrix as b.
Using this notation, the linear system can be written as

Ax = b

Solving linear systems using matrix inverses

Theorem
Suppose A is an invertible matrix.
Then the linear system Ax = b has exactly one solution, and it is given
by x = A−1 b

Proof: Ax = b =⇒ A−1 Ax = A−1 b =⇒ x = A−1 b


75
Example

Use a matrix inverse to solve the linear system

x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
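For reference: the coefficient matrix is the matrix A whose inverse was found on slide 63, so x = A−1 b gives (x, y , z) = (20, −18, 13).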

76
Another example

Solve the following system of linear equations:

x + 2y + z = a
−x − y + z = b
y + 3z = c

77
2.4 Rank of a matrix

Definition (Rank)
The rank of a matrix A is the number of non-zero rows in the reduced
row-echelon form of A.

Note
◮ This is the same as the number of non-zero rows
in a row-echelon form of A.
◮ If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n

Example
Find the rank of each of the following matrices:
   
 1  2 1            1 −1 2  1
−1 −1 1            0  1 1 −2
 0  1 3            1 −3 0  5
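For reference: the first matrix has rank 3 (it reduces to I3 ); the second has rank 2, since its third row equals the first row minus twice the second.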

78
Theorem
The linear system Ax = b, where A is an m × n matrix, has:
1. No solution if rank(A) < rank([A | b])

2. A unique solution if rank(A) = rank([A | b]) and rank(A) = n

3. Infinitely many solutions if rank(A) = rank([A | b]) and rank(A) < n

Proof: This is just a restatement of the results in section 1.3


Note
It is always the case that rank(A) ≤ rank([A | b]) and rank(A) ≤ n

79
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible
2. Ax = b has a unique solution for any b
3. The rank of A is n
4. The reduced row-echelon form of A is In
Proof:
1 ⇒ 2 We’ve seen before (slide 75).
2 ⇒ 3 Follows from the previous theorem (or what we already knew about
linear systems).
3 ⇒ 4 Immediate from the definition of rank, and that fact that A is
square.
4 ⇒ 1 Let R be the RREF of A. Then R = EA, where E = Ek Ek−1 . . . E1
is a product of elementary matrices. So we have I = EA. We have
already noted that this implies that A is invertible (and
A−1 = E ).
80
2.5 Solutions of non-homogeneous linear equations

Suppose that A is an m × n matrix (of coefficients), that


x = [x1 , . . . , xn ]T is an n × 1 matrix (of unknowns) and that b is an
m × 1 matrix (of constants).

The general solution to the system of equations

Ax = b

is given by
x = xh + x0
where x0 is any one solution to Ax = b and xh varies through all
solutions to the homogeneous equations Ax = 0.

81
Example
The equations
 
  x1  
1 0 −1 1   1
0 1 −1 −1 x2  = 2 (Ax = b)
 x3 
0 0 0 0 0
x4

have a solution x1 = 1, x2 = 2, x3 = 0, x4 = 0.

The general solution to the homogeneous equations Ax = 0 is given by

xh = s(1, 1, 1, 0) + t(−1, 1, 0, 1),   s, t ∈ R

So the general solution to the original equations is

x = (1, 2, 0, 0) + s(1, 1, 1, 0) + t(−1, 1, 0, 1),   s, t ∈ R
82
2.6 Determinants [AR 2.1–2.3]


When we calculate the inverse of the 2 × 2 matrix   A =  a b
                                                         c d
we find that we need to invert the number ad − bc.

If ad − bc ≠ 0 then we can find the inverse of A; if not, then A is not


invertible.

So this number plays an important role when we study A. We call it the


determinant of A.

The determinant is a function that associates a real number to any


square matrix.

83
Defining the determinant
 
The determinant of a 2 × 2 matrix   A =  a b
                                         c d
is given by

det(A) = ad − bc

Furthermore, the matrix is invertible iff det(A) ≠ 0

Definition (Determinant)
Let A be an n × n matrix. The determinant of A, denoted det(A) or |A|,
can be defined as the signed sum of all the ways to multiply together n
entries of the matrix, with all chosen from different rows and columns.

84
To determine the sign of the products, imagine all but the elements in
the product in question are set to zero in the matrix. Now swap
columns until a diagonal matrix results. If the number of swaps required
is even, then the product has a + sign, while if it is odd, it is to be
given a − sign.

Determinants of 2 × 2 matrices
 
For A =  a b      the possible products are ad and bc.
         c d

When we set all entries other than a and d to zero, then the matrix is
diagonal; so ad has a + sign. When we set all entries other than b and
c to zero then we need to interchange the two columns to obtain a
diagonal matrix; so we have a minus sign.

85
Determinant of a 3 × 3 matrix
Suppose  
a11 a12 a13
A = a21 a22 a23 
a31 a32 a33
We can form a table of all the products of 3 entries taken from different
rows and columns, together with the signs:

product sign
a11 a22 a33 +
a11 a23 a32 −
a12 a21 a33 −
a12 a23 a31 +
a13 a21 a32 +
a13 a22 a31 −
Hence
det(A) = a11 a22 a33 − a11 a23 a32 − a12 a21 a33
+a12 a23 a31 + a13 a21 a32 − a13 a22 a31
86
The formula for 3 × 3 matrices is complicated and it becomes quickly
worse as the size of the matrix increases. But there are better ways to
calculate determinants.
Submatrices and cofactors
Here is one way for practical calculation of (small) determinants.

Definition (Submatrix)
Let A be an m × n matrix. The (i , j)-submatrix of A, denoted by
A(i , j), is the (m − 1) × (n − 1) matrix obtained by deleting the i th row
and jth column from A.

Example
If A =   1  2 1      then A(2, 3) =  1 2
        −1 −1 1                      0 1
         0  1 3

87
Definition (cofactor)
Let A be a square matrix. The (i , j)-cofactor of A, denoted by Cij , is
defined by
Cij = (−1)i +j det (A(i , j))

Example (continued)

A(2, 3) =  1 2      so det (A(2, 3)) = 1 and C23 = (−1)2+3 det (A(2, 3)) = −1
           0 1

88
Cofactor Expansion

We can write the determinant of a 3 × 3 matrix, A, in terms of


cofactors.
det(A) = a11 C11 + a12 C12 + a13 C13
This is called the cofactor expansion along the first row of A.

Example
 
                1  2 1
Calculate det  −1 −1 1
                0  1 3
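For reference: C11 = det(A(1, 1)) = −4, C12 = − det(A(1, 2)) = 3 and C13 = det(A(1, 3)) = −1, so det(A) = 1 × (−4) + 2 × 3 + 1 × (−1) = 1.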

89
Theorem
The determinant of an n × n matrix A can be computed by multiplying
the entries in any row (or column) by their cofactors and adding the
resulting products.

That is, for each 1 ≤ i ≤ n and 1 ≤ j ≤ n,

det(A) = ai 1 Ci 1 + ai 2 Ci 2 + · · · + ain Cin


(this is called cofactor expansion along the ith row)
and

det(A) = a1j C1j + a2j C2j + · · · + anj Cnj


(this is called cofactor expansion along the jth column)

Proof: We can prove this in the n = 3 case using the formula on slide
86. The proof for the general case is essentially the same, but gets more
technical...
90
Example

Calculate the determinant of


 
1 2 1
A =  −1 −1 1 
0 1 3

(i) using the cofactor expansion along the 2nd column,


(ii) using the cofactor expansion along the 3rd row.

Which one was easier?

91
How do you remember the sign of the cofactor?

The (1, 1)-cofactor always has sign +. Starting from there, imagine
walking to the square you want using either horizontal or vertical steps.
The appropriate sign will change at each step.

We can visualize this arrangement with the following matrix:


 
+ − + − ...
 − + − + ... 
 
 + − + − ... 
 
 − + − + ... 
 
 ⋮    ⋮    ⋮    ⋮

So, for example, C13 is assigned + but C32 is assigned −

92
Example


Calculate

| 1 −2 0 1 |
| 3  1 2 0 |
| 1  0 1 0 |
| 2 −2 1 2 |
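For reference: expanding along the fourth column (which has only two non-zero entries) gives det = 1 × C14 + 2 × C44 = 1 × (−3) + 2 × 3 = 3.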

93
Some properties of determinants:

Suppose A and B are square matrices. Then


1. det(AT ) = det(A)
2. If A has a row or column of zeros, then det(A) = 0
3. det(AB) = det(A) det(B)
4. If A is invertible, then det(A) ≠ 0 and det(A−1 ) = 1/ det(A)
5. If A is singular, then det(A) = 0
6. If A =  C ∗
           0 D
   with C and D both square, then det(A) = det(C ) det(D)

Idea of proof: 2: cofactor expansion. 4: follows from 3. 5: follows from
3, 4 and the theorem on slide 80. 1, 3 and 6: need to use the
definition...
94
Row operations and determinants
Calculating determinants via cofactors becomes a very large calculation
as the size of the matrix increases.
We need a better way to calculate larger determinants.
For some types of matrix it is easy to write down their determinant.

Definition (Triangular Matrix)


A matrix is said to be upper triangular (respectively lower triangular) if
all the elements below (respectively above) the main diagonal are zero.

Examples
     
2 −1 9      2  0 0      2 0 0
0  3 2      1  3 0      0 3 0
0  0 2      2 −3 2      0 0 2

95
Theorem
If A is an n × n triangular matrix, then
det(A) is the product of the entries on the main diagonal of A.

Idea of proof: (Repeated) cofactor expansion along the first column.

Example

Let A =  2 −10  92 −117
         0   3  28  −31
         0   0  −1   27
         0   0   0    2

What is det(A)?

96
We can use row operations to manipulate a matrix into triangular form
in order to make the determinant calculation easier.

Recall that an n × n matrix is an elementary matrix if it can be


obtained from In by performing a single elementary row operation.
Multiplying on the left by an elementary matrix is equivalent to
applying the corresponding elementary row operation.

Using elementary matrices we can check the effects of row operations


on the determinant of a matrix.

97
Example
 
Let A =  a b c
         d e f
         g h i

Row swap:
   
E =  1 0 0      EA =  a b c
     0 0 1            g h i
     0 1 0            d e f

det(EA) = det(E ) det(A) = − det(A)

98
Multiply a row by a scalar:
   
F =  2 0 0      FA =  2a 2b 2c
     0 1 0             d  e  f
     0 0 1             g  h  i

det(FA) = det(F ) det(A) = 2 det(A)

Add a constant multiple of one row to another row:

   
G =  1 0 0      GA =    a      b      c
     0 1 0              d      e      f
     3 0 1            g + 3a h + 3b i + 3c

det(GA) = det(G ) det(A) = det(A)


99
The effect of each of the three types of elementary row operations are
given by the following.

Theorem
Let A be a square matrix.
1. If B is obtained from A by swapping two rows (or two columns) of
A, then det(B) = − det(A)
2. If B is obtained from A by multiplying a row (or column) of A by
the scalar α, then det(B) = α det(A)
3. If B is obtained from A by replacing a row (or column) of A by
itself plus a multiple of another row (column), then
det(B) = det(A)

Proof: The corresponding elementary matrices have determinants −1, α


and 1 (respectively). Then use det(B) = det(E ) det(A).

100
Example
 
1 2 1
Calculate det  −1 −1 1 
0 1 3

101
Example

Calculate

| 1 −2 0 1 |
| 3  1 2 0 |
| 1  0 1 0 |
| 2 −2 1 2 |

102
We collect here some properties of the determinant function, most of
which we’ve already noted.

Theorem
Let A be an n × n matrix. Then,

1. det AT = det(A)
2. det(AB) = det(A) det(B)
3. det(αA) = αn det(A)
4. If A is a triangular matrix, then its determinant is the product of
the elements on the main diagonal
5. If A has a row (or column) of zeros, then det(A) = 0
6. If A has a row (or column) which is a scalar multiple of another
row (or column) then det(A) = 0
7. A is singular iff det(A) = 0 (and A is invertible iff det(A) 6= 0)

103
Example (showing how to prove property 3 above)

Let A be a 3 × 3 matrix. Show that det(αA) = α3 det(A).
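For reference: αA is obtained from A by multiplying each of its three rows by α, and each of these row operations multiplies the determinant by α, so det(αA) = α³ det(A).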

104
Topic 3: Euclidean Vector Spaces

There are quite a few things to cover:

3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Geometric applications
3.5 Linear combinations
3.6 Subspaces of Rn
3.7 Bases and dimension
3.8 Rank-nullity theorem
3.9 Coordinates relative to a basis

105
3.1 Vectors in Rn [AR 3.1]

Geometrically, a pair of real numbers (a, b) can be thought of as


representing a directed line segment from the origin in the plane.

These can be added together, and multiplied by a real number.

Algebraically, these operations are given by:

vector addition: (a, b) + (c, d) = (a + c, b + d)


scalar multiplication: α(a, b) = (αa, αb)

Notation:

R2 = {(a, b) | a, b ∈ R}
= the set of all ordered pairs of real numbers

106
The algebraic approach to vector addition and scalar multiplication
extends to 3 dimensions or more.

Rn = {(x1 , x2 , . . . , xn ) | xi ∈ R for i = 1, 2, . . . , n}
= the set of all n-tuples of real numbers

We will refer to elements of Rn as vectors.

We can add two vectors together and multiply a vector by a scalar.

vector addition : (x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn )


scalar multiplication : α(x1 , . . . , xn ) = (αx1 , . . . , αxn )

107
Notation

We often denote by i, j, and k the vectors in R3 given by:

i = (1, 0, 0) j = (0, 1, 0) k = (0, 0, 1)

Any vector in R3 can be written in terms of i, j and k:

u = (u1 , u2 , u3 ) = u1 i + u2 j + u3 k

108
Definition (magnitude)
The length (or magnitude or norm) of a vector
u = (u1 , u2 , . . . , un ) ∈ Rn is given by
‖u‖ = √(u1² + u2² + · · · + un²)

It follows from Pythagoras’ theorem that this corresponds to our


geometric idea of length for vectors in R2 and R3 .
Example
Let u = 2i − j + 2k. Then ‖u‖ = √(2² + (−1)² + 2²) = 3.
A vector having length equal to 1 is called a unit vector.
Example
Find a unit vector parallel to u.

109
Definition
The distance between two vectors u, v ∈ Rn is given by

d(u, v) = ‖v − u‖

Geometrically, if we consider u and v to be the position vectors of


points P and Q, then d(u, v) is the distance between P and Q.

Example
Find the distance between the points P(1, 3, −1) and Q(2, 1, −1).

110
3.2 Dot product [AR 3.3]
Let
u = (u1 , u2 , . . . , un ) ∈ Rn
and
v = (v1 , v2 , . . . , vn ) ∈ Rn
be two vectors in Rn .

Definition (Dot product)


We define the dot product (or scalar product or Euclidean inner
product) by
u · v = u1 v1 + u2 v2 + · · · + un vn

Examples
(3, −1) · (1, 2) = 3 × 1 + (−1) × 2 = 1
(i + j + k) · (−j + k) = 0 − 1 + 1 = 0

111
The angle between two vectors can be defined in terms of the dot
product.

Definition (Angle)
The angle θ between two vectors u, v ∈ Rn is given by the expression

u · v = ‖u‖ ‖v‖ cos θ

The angle defined in this way is exactly the usual angle between two
vectors in R2 or R3 .
That our definition of angle makes sense relies on the following

Theorem (Cauchy-Schwarz Inequality for Rn )


Let u, v be vectors. Then
|u · v| ≤ ‖u‖ ‖v‖
with equality holding precisely when u is a multiple of v.
Proof: Consider the quadratic polynomial given by
(u + tv) · (u + tv)...
112
Example
Find the angle between the vectors u = 2i + j − k and v = i + 2j + k.

113
Properties of the dot product
1. u · v is a scalar
2. u · v = v · u
3. u · (v + w) = u · v + u · w
4. u · u = ‖u‖²
5. (αu) · v = α(u · v)

Note
Suppose u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) are two vectors in Rn .
If we write each as a row matrix U = [u1 · · · un ] V = [v1 · · · vn ], then

u · v = UV T

114
3.3 Cross product of vectors in R3 [AR 3.4]

Let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) be two vectors in R3 .

Definition (Cross product)


The cross product (or vector product) of u and v is the vector given by

u × v = (u2 v3 − u3 v2 )i + (u3 v1 − u1 v3 ) j + (u1 v2 − u2 v1 )k

A convenient way to remember this is as a “determinant”

        | i  j  k  |     | u2 u3 |       | u1 u3 |       | u1 u2 |
u × v = | u1 u2 u3 |  =  | v2 v3 | i  −  | v1 v3 | j  +  | v1 v2 | k
        | v1 v2 v3 |

using cofactor expansion along the first row.

115
Geometry of the cross product

u × v = ‖u‖ ‖v‖ sin(θ) n̂

[Diagram: vectors u and v with angle θ between them, and the unit normal n̂]

◮ n̂ is a unit vector (i.e., ‖n̂‖ = 1)


◮ n̂ is perpendicular to both u and v (i.e., n̂ · u = 0 and n̂ · v = 0)
◮ n̂ points in the direction given by the right-hand rule
◮ θ ∈ [0, π] is the angle between u and v
◮ If θ = 0 or θ = π, then u × v = 0

Example
Find a vector perpendicular to both (2, 3, 1) and (1, 1, 1).

116
Properties of the cross product

1. v × u = −(u × v)
2. u × (v + w) = (u × v) + (u × w)
3. (αu) × v = α(u × v)
4. u × 0 = 0
5. u × u = 0
6. u · (u × v) = 0

Note
The cross product is defined only for R3 . Unlike dot product and many
of the other properties we are considering, it does not extend to Rn in
general.

117
3.4 Geometric applications [AR 3.5]

Basic applications

Suppose u, v ∈ R3 are two vectors.


1. u and v are perpendicular precisely when u · v = 0
2. The area of the parallelogram defined by u and v is equal to ‖u × v‖

Note
If u and v are elements of R2 with u = (u1 , u2 ) and v = (v1 , v2 ), then

area of parallelogram = absolute value of  | u1 u2 |
                                           | v1 v2 |

118
Example
Find the area of the triangle with vertices (2, −5, 4), (3, −4, 5) and
(3, −6, 2).
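For reference: two sides of the triangle are u = (1, 1, 1) and v = (1, −1, −2); u × v = (−1, 3, −2), so the area is ‖u × v‖/2 = √14/2.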

119
3. Assuming u ≠ 0, the projection of v onto u is given by

   proju v = ((u · v)/‖u‖²) u

[Diagram: v decomposed into proju v along u and the perpendicular part v − proju v]

Notice that:
◮ If we set û = u/‖u‖, then proju v = (û · v) û
◮ u · (v − proju v) = u · v − (u · v) = 0
120
Example
Let w = (2, −1, −2) and v = (2, 1, 3).
Find vectors v1 and v2 such that
◮ v = v1 + v2 ,
◮ v1 is parallel to w, and
◮ v2 is perpendicular to w.
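For reference: w · v = −3 and ‖w‖² = 9, so v1 = projw v = −(1/3)w = (−2/3, 1/3, 2/3) and v2 = v − v1 = (8/3, 2/3, 7/3); one checks v2 · w = 0.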

121
Scalar triple product

Notice that
                | u1 u2 u3 |
  u · (u × v) = | u1 u2 u3 |  =  0
                | v1 v2 v3 |

Similarly

u · (v × u) = v · (u × v) = v · (v × u) = 0

In general, u · (v × w) is called the scalar triple product and is given by

                | u1 u2 u3 |
  u · (v × w) = | v1 v2 v3 |
                | w1 w2 w3 |

122
Suppose u, v, w ∈ R3 are three vectors.

[Diagram: the parallelepiped with edges u, v and w]

The parallelepiped defined by u, v and w has volume equal to the
absolute value of the scalar triple product of u, v and w:

                                                              | u1 u2 u3 |
volume of parallelepiped = |u · (v × w)| = absolute value of  | v1 v2 v3 |
                                                              | w1 w2 w3 |

123
Example
Find the volume of the parallelepiped with adjacent edges given by the
vectors PQ, PR and PS, where the points are: P(2, −1, 1), Q(4, 6, 7),
R(5, 9, 7) and S(8, 8, 8).
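For reference: PQ = (2, 7, 6), PR = (3, 10, 6) and PS = (6, 9, 7); their scalar triple product is −61, so the volume is 61.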

124
Lines

Vector equation of a line


The vector equation of a line through a point P0 in the direction
determined by a vector v is

r = r0 + tv t∈R

where r0 = OP0

[Diagram: the line through P0 in direction v; r0 is the position vector of
P0 , and moving tv along the line gives the point r = r0 + tv]

Each point on the line is given by a unique value of t.
125
Letting r = (x, y , z), r0 = (x0 , y0 , z0 ) and v = (a, b, c) the equation
becomes
(x, y , z) = (x0 , y0 , z0 ) + t(a, b, c)

Parametric equations for a line


Equating components gives the parametric equations for the line:

x = x0 + ta
y = y0 + tb t∈R
z = z0 + tc

Example
What is the parametric form of the line passing through the points
P(−1, 2, 3) and Q(4, −2, 5)?
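For reference: a direction vector is PQ = (5, −4, 2), so x = −1 + 5t, y = 2 − 4t, z = 3 + 2t, for t ∈ R.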

126
Cartesian equations for a line
If a 6= 0, b 6= 0 and c 6= 0, we can solve the parametric equations for t
and equate. This gives the cartesian form of the straight line:
(x − x0 )/a = (y − y0 )/b = (z − z0 )/c

Example
What is the cartesian form of the line passing through the points
P(−1, 2, 3) and Q(4, −2, 5)?

127
Example
Find the vector equation of the line whose cartesian form is
(x + 1)/5 = (y − 3)/(−1) = (z − 4)/2

128
Definition
Two lines are said to:
◮ intersect if there is a point lying in both
◮ be parallel if their direction vectors are parallel
The angle between two lines is the angle between their direction vectors

Example
Find the vector equation of the line through the point P(0, 0, 1) that is
parallel to the line given by
(x − 1)/1 = (y + 2)/2 = (z − 6)/2

129
Planes

A plane is determined by a point P0 on it and any two non-parallel


vectors u and v lying in the plane.

Vector equation of a plane



Letting r0 = OP0 , the vector equation of the plane is

r = r0 + su + tv s, t ∈ R
[Diagram: the plane through P0 containing directions u and v; a typical
point has position vector r = r0 + su + tv]

Particular values of s and t determine a point on the plane, and


conversely every point on the plane has position vector given by some s
and t.
130
The vector

n̂ = (u × v) / ‖u × v‖

is a unit vector that is perpendicular to the plane.

Such a vector is called a unit normal vector to the plane.

The angle between two planes is given by the angle between their (unit)
normal vectors.

If P is a point with position vector r, then:

P lies on the plane ⇐⇒ r − r0 is parallel to plane


⇐⇒ r − r0 is perpendicular to n̂

131
It follows that the equation of the plane can also be written as

(r − r0 ) · n̂ = 0

[Diagram: r − r0 lies in the plane and is perpendicular to n̂]

Writing r = (x, y , z) and n = (a, b, c) for a (not necessarily unit) normal
vector, we obtain the Cartesian equation of the plane

ax + by + cz = d      where d = r0 · n

132
Examples
1. The plane perpendicular to the direction (1, 2, 3) and through the
point (4, 5, 6) is given by x + 2y + 3z = d where
d = 1 × 4 + 2 × 5 + 3 × 6. That is

x + 2y + 3z = 32

2. What is the equation of the plane perpendicular to (1, 0, −2) and


containing the point (1, −1, −3)?

133
3. The plane through (1, 1, 1) containing vectors parallel to (1, 0, 1)
and (0, 1, 2) is the set of all vectors of the form
(1, 1, 1) + s(1, 0, 1) + t(0, 1, 2) s, t ∈ R

4. Find the Cartesian equation of the plane with vector form


(x, y , z) = (1, 1, 1) + s(1, 0, 1) + t(0, 1, 2).

5. Find the vector and cartesian equations of the plane containing the
three points P(2, 1, −1), Q(3, 0, 1), and R(−1, 1, −1).
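For reference (4): a normal is (1, 0, 1) × (0, 1, 2) = (−1, −2, 1), so the Cartesian equation is x + 2y − z = 2. For (5): PQ = (1, −1, 2) and PR = (−3, 0, 0) lie in the plane, PQ × PR = (0, −6, −3), and the plane is 2y + z = 1.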

134
Intersection of a line and a plane

Example
Where does the line
(x − 1)/1 = (y − 2)/2 = (z − 3)/3
meet the plane 3x + 2y + z = 20?

A typical point of the line is (1 + t, 2 + 2t, 3 + 3t).

Putting this into the equation for the plane, we get
3(1 + t) + 2(2 + 2t) + (3 + 3t) = 20, i.e. 10t + 10 = 20, so t = 1.

This gives the point of intersection as (2, 4, 6)

135
Intersection of two planes
Example What is the cartesian equation of the line of intersection of
the two planes x + 3y + 2z = 6 and 3x + 2y + z = 11?

The direction of the line is perpendicular to both normals and so is
given by

(1, 3, 2) × (3, 2, 1) = (−1, 5, −7)

A point on the line is given by solving the two equations. For example,
setting z = 0 gives x = 3, y = 1, so (3, 1, 0) lies on the line.

Thus the equation of the line is

(x − 3)/(−1) = (y − 1)/5 = z/(−7)
Or you could solve the two equations (for the planes) simultaneously.
136
Distance from a point to a line

Example Find the distance from the point P(2, 1, 1) to the line with
cartesian equation
(x − 2)/1 = (y − 1)/1 = z/2
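For reference: the line passes through A(2, 1, 0) with direction v = (1, 1, 2), and AP = (0, 0, 1); subtracting the projection, AP − projv AP = (−1/3, −1/3, 1/3), whose length is 1/√3. So the distance is 1/√3.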

137
3.5 Linear Combinations [AR 5.2]

We have seen that a plane through the origin in R3 can be built up by


starting with two vectors and taking all scalar multiples and sums.

In this way, we can build up lines, planes and their higher dimensional
versions.

Definition (Linear combination)


A linear combination of vectors v1 , v2 , . . . , vk ∈ Rn is a vector of the
form
w = α1 v1 + α2 v2 + · · · + αk vk
where α1 , α2 , . . . , αk are scalars (i.e., real numbers).

138
Examples
1. w = (2, 3) is a linear combination of e1 = (1, 0) and e2 = (0, 1)

2. w = (2, 3) is a linear combination of v1 = (1, 2) and v2 = (3, 1)

3. w = (1, 2, 3) is not a linear combination of v1 = (1, 1, 1) and


v2 = (0, 0, 1)

139
Linear Dependence [AR 5.3]
By taking all linear combinations of a given set of vectors, we can build
up lines, planes, etc.

We can make this more efficient by removing any redundant vectors


from the set.

Definition (Linear dependence)


A collection of vectors S is linearly dependent if there are vectors
v1 , . . . , vk ∈ S and scalars α1 , . . . , αk such that

α1 v1 + · · · + αk vk = 0

and at least one of the αi is non-zero.

Remember
The zero vector 0 = (0, . . . , 0) is not the same as the number zero.

140
So, the set of vectors is linearly dependent if and only if some
non-trivial linear combination gives the zero vector.

Theorem (AR Thm 5.3.1)


Vectors v1 , . . . , vk (where k ≥ 2) are linearly dependent iff one vector is
a linear combination of the others.

Proof:

Two vectors are linearly dependent iff one is a multiple of the other.
Three vectors in R3 are linearly dependent iff they lie in a plane.
141
Definition (Linear independence)
A set of vectors is called linearly independent if it is not linearly
dependent.

Examples
1. The vectors (2, −1) and (−6, 3) are linearly dependent.

2. (2, −1) and (4, 1) are linearly independent.

142
Examples
1. The vectors (1, 0, 0), (1, 1, 0) and (1, 1, 1) are linearly independent.

2. (1, 1, 2), (1, −1, 2), (3, 1, 6) are linearly dependent.

3. (1, 2, 3), (1, 0, 0), (0, 1, 0), (0, 1, 1) are linearly dependent.

143
To decide if vectors v1 , . . . , vk ∈ Rn are linearly independent

1. Form the matrix A having the vectors as columns


(we’ll denote it by A = [v1 · · · vk ])

2. Reduce to row-echelon form R

3. Count the number of non-zero rows in R (this is rank(A)):

v1 , . . . , vk are linearly independent ⇐⇒ rank(A) = k

Why does this method work?


It follows from what we know about the number of solutions of a linear
system (slide 79)
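This test is again easy to run by machine, e.g. in Python with SymPy (a sketch, assuming SymPy is available):

    from sympy import Matrix

    # Columns are the vectors (1, 1, 2), (1, -1, 2), (3, 1, 6) from a previous example.
    A = Matrix([[1, 1, 3],
                [1, -1, 1],
                [2, 2, 6]])
    print(A.rank())   # 2, less than the 3 columns: the vectors are linearly dependent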

144
Examples
1. (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0) ∈ R4 are linearly independent

2. (1, 1), (2, 3), (1, 0) ∈ R2 are linearly dependent

145
An important observation

If A and B are row-equivalent matrices, then the columns of A satisfy


the same linear relations as the columns of B. This is the same as saying
that the two systems of linear equations have the same set of solutions.

This implies that relations between the (original) vectors v1 , . . . , vk can


be read from the reduced row-echelon form of the matrix [v1 · · · vk ].

Example
If  
1 3 0 2 0
[v1 · · · v5 ] ∼ 0 0 1 1 0
0 0 0 0 1
then v2 = 3v1 and v4 = 2v1 + v3 .

146
Let’s illustrate this with an
Example
The vectors
v1 = (1, 2, 1, 3), v2 = (2, 4, 2, 6), v3 = (0, −1, 3, −1), v4 = (1, 3, −2, 4)
satisfy v2 = 2v1 and v4 = v1 − v3

147
Useful Facts:

From the method above for deciding whether vectors are linearly
dependent we can derive the following.

Theorem (AR Thm 5.3.3)


If k > n, then any k vectors in Rn are linearly dependent.

Idea of proof: Because then the rre form of the matrix [v1 · · · vk ] must
contain columns which do not have a leading entry. So these columns
will be dependent on other columns which do have a leading entry.

148
Theorem
Vectors v1 , . . . , vn ∈ Rn are
linearly independent iff the matrix A = [v1 · · · vn ] has det(A) 6= 0.
Idea of proof:

linearly indep ⇐⇒ rank(A) = n ⇐⇒ det(A) 6= 0

149
Example
Decide whether the following vectors in R3 are linearly dependent or
independent:

(1, 2, −1), (0, 3, 4), (2, 1, −6), (0, 0, 2)

If they are dependent, write one as a linear combination of the others.
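For reference: four vectors in R3 must be dependent, and row reduction shows (2, 1, −6) = 2(1, 2, −1) − (0, 3, 4), while (0, 0, 2) is independent of the others.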

150
3.6 Subspaces of Rn [AR 5.2]

Certain subsets S of Rn have the nice property that any linear


combination of vectors from S still lies in S.

Definition (Subspace)
A subspace (of Rn ) is a subset S (of Rn ) that satisfies:

0. S is non-empty
1. u, w ∈ S =⇒ u + w ∈ S (closed under vector addition)
2. u ∈ S, α ∈ R =⇒ αu ∈ S (closed under scalar multiplication)

151
Examples
1. The xy -plane S = {(x, y , z) ∈ R3 | z = 0} is a subspace of R3

2. The plane in R3 through the origin and parallel to the vectors


u = (1, 1, 1) and v = (1, 2, 3) is a subspace of R3

3. The set H = {(x, y ) ∈ R2 | x ≥ 0, y ≥ 0} is not a subspace of R2

4. Any line or plane in R3 that contains the origin is a subspace.

152
Another example
Show that the points on the line y = 2x form a subspace of R2

Note
Every subspace of Rn contains the zero vector.

So the line y = 2x + 1 is not a subspace of R2 .

In fact, a line or plane in R3 is a subspace iff it contains the origin.

153
An important example of a subspace is given by the following.

Example
The solution to the homogeneous linear system

x +y +z =0
x −y −z =0

is a subspace of R3 .

Remember that a linear system is called homogeneous if all the


constant terms are equal to zero.

154
In general, we have the following:

Proposition (AR Thm 5.2.2)


The set of solutions of a system of m homogeneous linear equations in
n variables is a subspace of Rn .

Proof:
We check the 3 conditions in the definition of subspace
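For reference, the check: 0 is a solution of Ax = 0; if Au = 0 and Aw = 0 then A(u + w) = Au + Aw = 0; and A(αu) = αAu = 0. So the solution set is non-empty and closed under addition and scalar multiplication.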

155
Generating a subspace
Let v1 , . . . , vk be vectors in Rn .

Definition (Span)
The subspace spanned (or subspace generated) by these vectors is the
set of all linear combinations of the given vectors:

Span{v1 , . . . , vk } = {α1 v1 + · · · + αk vk | α1 , . . . , αk ∈ R}

It is sometimes also denoted by ⟨v1 , . . . , vk ⟩.

Examples
1. In R2 , Span{(3, 2)} is the line through the origin in the direction
   given by (3, 2), i.e., the line y = (2/3)x

2. In R3 , Span{(1, 1, 1), (3, 2, 1)} is the plane x − 2y + z = 0

3. In R3 , Span{(1, 1, 1), (3, 3, 3)} is the line x = y = z


156
Lemma (AR Thm 5.2.3)
1. Span{v1 , . . . , vk } is a subspace of Rn .
2. Any subspace of Rn that contains the vectors v1 , . . . , vk contains
Span{v1 , . . . , vk }.

Proof:

Remark
The subspace spanned by a set of vectors is the ‘smallest’ subspace that
contains those vectors.
157
More examples
In R3 :
1. Span{(1, 1, 0)} is the line through the origin containing the point
(1, 1, 0).

2. Span{(1, 0, 0), (1, 1, 0)} is the xy -plane.

3. Span{(1, 0, 0), (−3, 7, 0), (1, 1, 0)} is the xy -plane.

4. Span{(1, 0, 0), (2, 3, −4), (1, 1, 0)} is the whole of R3 .

158
Spanning sets

Definition (Spanning set)


Let V be a subspace of Rn .
Vectors v1 , . . . , vk ∈ Rn span V if Span{v1 , . . . , vk } = V .
Such a set of vectors is called a spanning set for V .

Equivalently, V contains all the vectors v1 , . . . , vk and all vectors in V


are linear combinations of the vectors v1 , . . . , vk .
Examples
1. (1, 0) and (1, 1) span R2
2. (1, 1, 2) and (1, 0, 1) do not span R3
3. (1, 1, 2), (1, 0, 1) and (2, 1, 3) do not span R3

159
Example
Show that the following vectors span R4 :

{(1, 0, −1, 0), (1, 1, 1, 1), (3, 0, 0, 0), (4, 1, −3, −1)}

We need to show that for any vector (a, b, c, d) ∈ R4 , the equation

x(1, 0, −1, 0) + y (1, 1, 1, 1) + z(3, 0, 0, 0) + w (4, 1, −3, −1) = (a, b, c, d)

has a solution.

160
Writing this linear system in matrix form gives:

 1 1 3  4      x        a
 0 1 0  1      y    =   b
−1 1 0 −3      z        c
 0 1 0 −1      w        d

(call the 4 × 4 coefficient matrix on the left A)

Which has augmented matrix:

 1 1 3  4   a
 0 1 0  1   b
−1 1 0 −3   c
 0 1 0 −1   d

We know that this linear system is consistent for all possible values of
a, b, c, d if and only if rank(A) = 4
(Which is the case in this example)

161
To decide if v1 , . . . , vk ∈ Rn span Rn
1. Form the matrix A = [v1 · · · vk ]
having columns given by the vectors v1 , . . . , vk
2. Calculate rank(A) as before:
a. Reduce to row-echelon form
b. Count the number of non-zero rows

Then, v1 , . . . , vk span Rn ⇐⇒ rank(A) = n

Why does this method work?

162
Since A = [v1 · · · vk ] has k columns, we know that rank(A) ≤ k.

It follows that:

Proposition
If k < n, then v1 , . . . , vk ∈ Rn can’t span Rn .

163
3.7 Bases and dimension [AR 5.4]

In general, spanning sets can be (too) big.

For example, (1, 0), (0, 1), (2, 3), (−7, 4) span R2 , but the last two are
not needed since they can be expressed as linear combinations of the
first two. The vectors are not linearly independent.

Bases

Definition (Basis)
A basis for a subspace V ⊆ Rn is a set of vectors from V which
1. spans V
2. is linearly independent

164
Examples
1. {(1, 0), (0, 1)} is a basis for R2
2. {(2, 0), (−1, 1)} is a basis for R2
3. {(2, −1, −1), (1, 2, −3)} is a basis for the plane x + y + z = 0 in R3
4. {(2, 3, 7)} is a basis for the line in R3 given by x/2 = y/3 = z/7

Note
A subspace of Rn can have many bases. For example, any two vectors
in R2 which are not collinear will form a basis of R2 .

165
Notation/example:
The vectors

e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1)

form a basis of Rn , called the standard basis.

166
The following is an important and very useful theorem about bases:

Theorem (AR Thms 5.4.2, 5.4.3)


Let V be a subspace of Rn and let {v1 , . . . , vk } be a basis for V .

1. If a subset of V has more than k vectors, then it is linearly


dependent.
2. If a subset of V has fewer than k vectors, then it does not span V .
3. Any two bases of V have the same number of elements.

Proof:
The proof is based on what we already know about (homogeneous)
linear systems:

167
So, in particular, every basis of Rn has exactly n elements.

Dimension

The above theorem tells us that although there can be many different
bases for the same space, they will all have the same number of vectors.

Definition (Dimension)
The dimension of a subspace V is the number of vectors in a basis for
V . This is denoted dim(V ).

168
Examples
1. The dimension of R2 is
2. Rn has dimension
3. The line {(α, α, α) | α ∈ R} ⊆ R3 has dimension
4. The plane {(x, y , z) | x + y + z = 0} ⊆ R3 has dimension

In the special case where V = {0} it is convenient to say dim(V ) = 0.

169
Calculating bases I

How can we find a basis for a given subspace?

The method will depend on how the subspace is described.


Often we know (or can easily calculate) a spanning set.

Example
Find a basis for the subspace

S = {(a + 2b, b, −a + b, a + 3b) | a, b ∈ R} ⊆ R4

What is the dimension of S?
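For reference: (a + 2b, b, −a + b, a + 3b) = a(1, 0, −1, 1) + b(2, 1, 1, 3), and these two vectors are linearly independent, so they form a basis for S and dim(S) = 2.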


170
Another example
Find a basis for the subspace of R4 spanned by the vectors
(1, −1, 2, 1), (−2, 2, −4, −2), (1, 0, 3, 0).

171
To find a basis for the span of a set of vectors

To find a basis for the subspace V spanned by vectors v1 , . . . , vk ∈ Rn :


1. Form the matrix A = [v1 · · · vk ]
2. Reduce to row-echelon form B
3. The columns of A corresponding to the steps (leading entries) in B
give a basis for V

Why does this method work?


Recall that if A ∼ B, then the columns of A and B satisfy the same
linear relationships. The question then reduces to one about matrices in
reduced row-echelon form.
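As a machine check of the column method (a sketch in Python with SymPy, assuming it is available):

    from sympy import Matrix

    # Columns: (1,-1,2,1), (-2,2,-4,-2), (1,0,3,0) from the example two slides back.
    A = Matrix([[ 1, -2, 1],
                [-1,  2, 0],
                [ 2, -4, 3],
                [ 1, -2, 0]])
    R, pivots = A.rref()
    print(pivots)   # (0, 2): columns 1 and 3 of A form a basis for the span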

172
Remarks
◮ Don’t forget that it is the columns of A (not its row-echelon form)
that we use for the basis.
◮ This method gives a basis that is a subset of the original set of
vectors. Later we will give a second method which gives a basis of
different, but usually simpler, vectors

Example
Let S = {(1, −1, 2, 1), (0, 1, 1, −2), (1, −3, 0, 5)}
Find a subset of S that is a basis for ⟨S⟩
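For reference: (1, −3, 0, 5) = (1, −1, 2, 1) − 2(0, 1, 1, −2), so {(1, −1, 2, 1), (0, 1, 1, −2)} is such a basis.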

173
Column space of a matrix

Definition (Column space)


Let A be a m × n matrix. The subspace of Rm spanned by the columns
of A is called the column space of A.

Suppose A ∼ B. In general, the column space of B is not equal to the


column space of A, although they do have the same dimension.
Example    
       1  0  1         1 0  1
A =   −1  1 −3    ∼    0 1 −2    = B
       2  1  0         0 0  0

The column space of A is Span{(1, −1, 2), (0, 1, 1)}

The column space of B is Span{(1, 0, 0), (0, 1, 0)}
Are they equal? No: every vector in the column space of B has third
component 0, but (1, −1, 2) does not.
What is the dimension of each subspace? Both have dimension 2.
174
Suppose that S = {v1 , . . . , vk } ⊆ Rn is a set of vectors.

Remember that we saw a method for obtaining a basis for Span(S) that
started with the matrix [v1 · · · vk ] having the vectors of S as columns.

This method gave us a basis of Span(S) that is a subset of S.

So, if V is a subspace of Rn :

Every spanning set for V contains a basis for V .

It also follows from the above method that:

Every linearly independent subset of V extends to a basis for V .

175
Let m = dim(V ). From the above observation, since any basis must
have m elements, we conclude the following:

Theorem
Let V be a subspace of Rn , and suppose that dim(V ) = m.
1. If a spanning set of V has exactly m elements, then it is a basis.
2. If a linearly independent subset of V has exactly m elements, then
it is a basis.

Example
Given that
   {(1, −π, √2), (23, 1, 100), (3/2, 7, 1)}
is a spanning set for R3 , it is a basis for R3 .

176
Calculating bases II

Another method for finding a basis: The ‘row method’

We first illustrate by repeating a previous example, but this time using a


matrix that has the vectors as rows (not columns).

Example
Let S = {(1, −1, 2, 1), (0, 1, 1, −2), (1, −3, 0, 5)}
Find a subset of S that is a basis for ⟨S⟩.

   
   1 −1 2  1               1 −1 2  1
   0  1 1 −2    ∼ ··· ∼    0  1 1 −2
   1 −3 0  5               0  0 0  0

It follows that a basis for ⟨S⟩ is {(1, −1, 2, 1), (0, 1, 1, −2)}

Can you see why?


177
To find a basis for the span of a set of vectors: the ‘row method’

To find a basis for the subspace V spanned by vectors v1 , . . . , vk ∈ Rn .

1. Form the matrix A whose rows are the given vectors.


2. Reduce to row-echelon form B
3. The non-zero rows of B are a basis for V .

To explain why this works, let’s introduce some more terminology.

178
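A matching sketch of the row method (again an editor's addition assuming
Python/sympy; echelon_form is sympy's row-echelon routine):

import sympy as sp

# The vectors of S placed as the rows of A.
A = sp.Matrix([[1, -1, 2,  1],
               [0,  1, 1, -2],
               [1, -3, 0,  5]])
R = A.echelon_form()
basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]  # non-zero rows
print(basis)   # two non-zero rows, so the span has dimension 2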
Row space of a matrix

Definition
Let A be a m × n matrix. The subspace of Rn spanned by the rows of A
is called the row space of A.

Suppose A ∼ B. The above method relies on the fact that (unlike the
situation with the column space):

The row space of B is always equal to the row space of A.

Why? Can you prove it?


179
Example    
       1  0  1         1 0  1
A =   −1  1 −3    ∼    0 1 −2    = B
       2  1  0         0 0  0

The row space of A is Span{(1, 0, 1), (−1, 1, −3), (2, 1, 0)}

The row space of B is Span{(1, 0, 1), (0, 1, −2)}
and these two subspaces are equal.

Note
For any matrix A:

rank(A) = dim(row space of A) = dim(column space of A)

This is because the number of non-zero rows in the row-echelon form is


equal to the number of leading entries.

180
Example
Let  
1 −1 2 −2
2 0 1 0
A= 
5 −3 7 −6
1 1 −1 3

Find a basis and the dimension for:


1. the column space of A;
2. the row space of A.
   
      1 −1  2 −2               1 −1  2 −2
      2  0  1  0               0  2 −3  4
A =                 ∼ ··· ∼
      5 −3  7 −6               0  0  0  1
      1  1 −1  3               0  0  0  0

So the leading entries lie in columns 1, 2 and 4:
1. {(1, 2, 5, 1), (−1, 0, −3, 1), (−2, 0, −6, 3)} is a basis for the column
   space, which therefore has dimension 3;
2. {(1, −1, 2, −2), (0, 2, −3, 4), (0, 0, 0, 1)} is a basis for the row
   space, which also has dimension 3.

181
Bases for solution spaces

We have seen how to find a basis for a subspace given a spanning set
for the subspace.

Another way in which subspaces arise is as the set of solutions to a


homogenous linear system.

We saw previously that such a solution set is always a subspace of Rn


where n is the number of variables.

How can we find a basis for it?

We first illustrate with an example.

182
Example
Find a basis for the subspace of R4 defined by the equations
x1 + x3 + x4 = 0
3x1 + 2x2 + 5x3 + x4 = 0
x2 + x3 − x4 = 0

183
Finding a basis for the solution space:

To find a basis for the solution space of a system of homogeneous linear


equations.
1. Write the system in matrix form Ax = 0
2. Reduce A to row-echelon form B
3. Solve the system Bx = 0 as usual, and write the solution in the
form
x = t1 v1 + · · · + tk vk
where t1 , . . . , tk ∈ R are parameters and v1 , . . . , vk ∈ Rn

Then {v1 , · · · , vk } is a basis for the solution space.

184
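For instance (an editor's sketch assuming Python/sympy), nullspace()
returns exactly the vectors v1 , . . . , vk of step 3 for the example above:

import sympy as sp

A = sp.Matrix([[1, 0, 1,  1],
               [3, 2, 5,  1],
               [0, 1, 1, -1]])
for v in A.nullspace():   # one basis vector per free variable
    print(v.T)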
Why does this work?

The set S = {v1 , · · · , vk } is a spanning set for the solution space, since
every solution can be written as a linear combination of the vectors in S.

The fact that the vectors are linearly independent results from the way
in which they were defined.

Suppose we chose the parameters to be the variables whose


corresponding column in B had no leading entry. Say ti = xmi .

Then the mj -th coordinate of vi is 1 if i = j and 0 if i ≠ j.

It follows that the vi are linearly independent.

185
Another example
Find a basis for the subspace of R4 given by

V = {(x1 , x2 , x3 , x4 ) | x1 + 2x2 + x3 + x4 = 0, 3x1 + 6x2 + 4x3 + x4 = 0}

The matrix A of coefficients is


   
      1 2 1 1               1 2 0  3
A =              ∼ ··· ∼
      3 6 4 1               0 0 1 −2

The solutions are given by

(x1 , x2 , x3 , x4 ) = t1 (−2, 1, 0, 0) + t2 (−3, 0, 2, 1)    t1 , t2 ∈ R

So a basis for V is: {(−2, 1, 0, 0), (−3, 0, 2, 1)}


186
3.8 Rank-nullity theorem

Given a m × n matrix A there are three associated subspaces:


1. The row space of A is the subspace of Rn spanned by the rows.
Its dimension is equal to rank(A)

2. The column space of A is the subspace of Rm spanned by the


columns. Its dimension is equal to rank(A)

3. The solution space of A is the subspace of Rn given by


   
{(x1 , . . . , xn ) ∈ Rn | Ax = 0}

where x denotes the column matrix with entries x1 , . . . , xn .

The solution space is also called the nullspace of A.


Its dimension is called the nullity of A, nullity(A).

187
We have seen techniques to find bases for the row space, column space
and solution space of a matrix.

As a result of these techniques we observe the following:

Theorem (cf. AR Thm 8.2.3)


Suppose A is an m × n matrix. Then

rank(A) + nullity(A) = n

Given what we know about finding the rank and the solution space of a
matrix, this is simply the statement that every column in the
row-echelon form either contains a leading entry or doesn’t contain a
leading entry.

188
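A one-line check of the theorem on the example of slide 186 (an editor's
sketch assuming Python/sympy):

import sympy as sp

A = sp.Matrix([[1, 2, 1, 1],
               [3, 6, 4, 1]])
assert A.rank() + len(A.nullspace()) == A.cols   # 2 + 2 = 4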
Note
If you are asked to find the solution space, the column space and the
row space of A, you only need to find the reduced row-echelon form of
A once.

Remember

If A and B are row equivalent matrices, then:


◮ the row space of A is equal to the row space of B

◮ the solution space of A is equal to the solution space of B

◮ the column space of A is not necessarily equal to the column space


of B

◮ the columns of A satisfy the same linear relations as the columns of


B

◮ the dimension of the column space of A is equal to the dimension of


the column space of B
189
3.9 Coordinates relative to a basis

Definition (Coordinates)
Suppose B = {v1 , . . . , vn } is a basis for Rn . For v ∈ Rn write

v = α1 v1 + · · · + αn vn for some scalars α1 , . . . , αn

The scalars α1 , . . . , αn are called the coordinates of v relative to B.


The vector (α1 , . . . , αn ) ∈ Rn is called the coordinate vector of v
relative to B.
The column matrix  
α1
[v ]B =  ... 
 

αn
is called the coordinate matrix of v with respect to B.

190
Examples
1. If we consider R2 with the standard basis B = {i, j},
                                                            1
   the vector v = (1, 5) has coordinate matrix [v]B =
                                                            5

2. Now consider R2 with basis B ′ = {a, b}, where a = (2, 1) and
   b = (−1, 1). Since v = (1, 5) = 2a + 3b, we get

               2
   [v]B′ =
               3

   [Figure: v drawn as the sum of 2a and 3b.]

191
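Finding coordinates relative to a basis is just solving a linear system:
if the basis vectors form the columns of a matrix M, then [v]B solves
Mx = v. A sketch of example 2 above (editor's addition, assuming
Python/sympy):

import sympy as sp

M = sp.Matrix([[2, -1],
               [1,  1]])     # columns are a = (2, 1) and b = (-1, 1)
v = sp.Matrix([1, 5])
print(M.solve(v).T)          # [2, 3], i.e. v = 2a + 3b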
Having fixed a basis, there is a one-to-one correspondence between
vectors and their coordinate matrices.

If u, v are vectors in Rn , and α is a scalar, then

[u + v]B = [u]B + [v]B


[αv]B = α[v]B

A consequence of this is that vectors are linearly independent if and


only if their coordinate matrices are (using any basis).
(Exercise: Prove this!)

192
Topic 4: General Vector Spaces [AR chapt 5]

4.1 The vector space axioms


4.2 Examples of vector spaces
4.3 Complex vector spaces
4.4 Subspaces of general vector spaces
4.5 Spanning sets, linear independence and bases
4.6 Coordinate matrices

193
Vectors in Rn have some basic properties shared by many, many other
mathematical systems. For example,

u+v =v+u for all u, v

is true for other systems.

Key Idea:

Write down these basic properties and look for other systems which
share these properties. Any system that does share these properties will
be called a vector space.

194
4.1 The vector space axioms [AR 5.1]

Let’s start trying to write down the basic properties that we want
‘vectors’ to satisfy.

A vector space is a set V with two operations defined:


1. addition
2. scalar multiplication
We want these two operations to satisfy the kind of algebraic properties
that we are used to from vectors in Rn .
For example, we want our vector operations to satisfy

u+v =v+u and α(u + v) = αu + αv

This leads to a list of ten properties (or axioms) that we will then take
as our definition.
195
The scalars are members of a number system F called a field in which
we have addition, subtraction, multiplication and division.

Usually the field will be F = R or C.

In this subject we will mainly concentrate on the case F = R.

196
Definition (Vector Space)
A vector space is a non-empty set V with two operations: addition and
scalar multiplication.
These operations are required to satisfy the following rules.
For any u, v, w ∈ V :
Addition behaves well:
A1 u + v ∈ V (closure of vector addition)
A2 (u + v) + w = u + (v + w) (associativity)
A3 u+v =v+u (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0 (additive inverses)

197
Definition (Vector Space ctd)
For all u, v ∈ V and α, β ∈ F:

Scalar multiplication behaves well:


M1 αv ∈ V (closure of scalar multiplication)
M2 α(βv) = (αβ)v (associativity of scalar multip)
M3 1v = v (multiplication by unit scalar)
Addition and scalar multiplication combine well :
D1 α(u + v) = αu + αv (distributivity 1)
D2 (α + β)v = αv + βv (distributivity 2)

198
Remark
It follows from the axioms that for all v ∈ V :
1. 0v = 0
2. (−1)v = −v

Can you prove these?

We are going to list some systems that obey these rules.

We are not going to show that the axioms hold for these systems.
If, however, you would like to get a feel for how this is done, read
AR5.1, Example 2.

199
4.2 Examples of vector spaces

1. Rn is a vector space with scalars R

After all, this was what we based our definition on! Vector spaces with
R as the scalars are called real vector spaces

200
2. Vector space of matrices
Denote by Mmn (or Mmn (R), Mm,n (R) or Mm×n (R)) the set of all m × n
matrices with real entries.
Mmn is a real vector space with the following familiar operations:

     
   a11 a12      b11 b12      a11 + b11  a12 + b12
            +             =
   a21 a22      b21 b22      a21 + b21  a22 + b22

      a11 a12      αa11 αa12
   α            =
      a21 a22      αa21 αa22

What is the zero vector in this vector space?

Note: Matrix multiplication is not a part of this vector space structure.


201
3. Vector space of polynomials

For a fixed n, denote by Pn (or Pn (R)) the set of all polynomials


with degree at most n:

Pn = {a0 + a1 x + a2 x 2 + · · · + an x n | a0 , a1 , . . . , an ∈ R}

If we define vector addition and scalar multiplication by:

(a0 + · · · + an x n ) + (b0 + · · · + bn x n ) = (a0 + b0 ) + · · · + (an + bn )x n

α(a0 + a1 x + · · · + an x n ) = (αa0 ) + (αa1 )x + · · · + (αan )x n

Then Pn is a vector space.

202
4. Vector space of functions

Let S be a set.

Denote by F(S, R) the set of all functions from S to R.

Given f , g ∈ F(S, R) and α ∈ R,


let f + g and αf be the functions defined by the following:
(f + g )(x) = f (x) + g (x)

(αf )(x) = α × f (x)

Equipped with these operations F(S, R) is a vector space.

Remark
The ‘+’ on the left of the first equation is not the same as the ‘+’ on
the right!

Why not?
203
Example
Let f , g ∈ F(R, R) be defined by

f : R → R, f (x) = sin x and g : R → R, g (x) = x 2

What do f + g and 3f mean?

204
What is the zero vector in F(R, R)?

0 : R → R is the function defined by

0(x) = 0

That is, 0 is the function that maps all numbers to zero.

205
4.3 Complex vector spaces

A vector space that has C as the scalars is called a complex vector


space.

Example

C2 = {(a1 , a2 ) | a1 , a2 ∈ C}
with the operations :

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 )
α(a1 , a2 ) = (αa1 , αa2 )

(where a1 , a2 , b1 , b2 , α ∈ C)

is a complex vector space.

Remark
All of the above examples of real vector spaces: Rn , Pn (R), F(S, R)
have complex analogues: Cn , Pn (C), F(S, C)
206
Important observation
All the concepts we looked at for Rn
(such as subspaces, linear independence, spanning sets, bases)
carry over directly to general vector spaces.

Why consider general vector spaces?

207
4.4 Subspaces of general vector spaces

Definition (Subspace)
A subspace of a vector space V is a subset S ⊆ V that is itself a vector
space (using the operations from V ).

This looks slightly different to the definition we had for subspaces of Rn .

The following theorem shows that, in fact, we get the same thing.

Theorem (Subspace Theorem, AR Thm 5.2.1)


Let V be a vector space.
A subset W ⊆ V is a subspace of V if and only if
0. W is non-empty
1. W is closed under vector addition
2. W is closed under scalar multiplication
208
Note
It follows that a subspace W of V must necessarily contain the zero
vector 0 ∈ V

Example
Let V = M2,2 the vector space of real 2 × 2 matrices and H ⊆ V be
matrices with trace equal to 0, where ‘trace’ is the sum of the diagonal
entries.
In other words
           a b
   H = {         | a + d = 0}
           c d

Show that H is a subspace of V .

209
Another example
Let
V = P2 = {a0 + a1 x + a2 x 2 | a0 , a1 , a2 ∈ R}
and
W = {a0 + a1 x + a2 x 2 | a1 a2 > 0} ⊆ V
Is W a subspace of V ?

210
More examples
1. {0} is always a subspace of V

2. V is always a subspace of V
 
                                     a 0 0
3. The set of diagonal matrices  {   0 b 0   | a, b, c ∈ R}
                                     0 0 c
   is a subspace of M3,3

4. The subset of continuous functions {f : R → R | f is continuous}


is a subspace of F(R, R)

5. S = {2 × 2 matrices with determinant equal to 0}
           a b
   = {           | ad − bc = 0}   is not a subspace of M2,2
           c d

6. {f : [0, 1] → R | f (0) = 2} is not a subspace of F([0, 1], R)

211
4.5 Spanning sets, linear independence and bases
These concepts, which we have seen for subspaces of Rn ,
apply equally well in a general vector space.
Let V be a vector space with scalars F and S ⊆ V a subset.
Definition
A linear combination of vectors v1 , v2 , . . . , vk ∈ S is a sum

α1 v1 + · · · + αk vk

where each αi is a scalar.

Definition
The set S is linearly dependent if there are vectors v1 , . . . , vk ∈ S and
scalars α1 , . . . , αk at least one of which is non-zero, such that

α1 v1 + · · · + αk vk = 0

A set which is not linearly dependent is called linearly independent.


212
Definition
The span of the set S is the set of all linear combinations of vectors
from S

Span(S) = {α1 v1 + · · · + αk vk | v1 , . . . , vk ∈ S and α1 , . . . , αk ∈ F}

S is called a spanning set for V if Span(S) = V

This is the same as saying that S ⊆ V and every vector in V can be


written as a linear combination of vectors from S.

Definition
A basis for V is a set which is both linearly independent and a spanning
set for V .

213
Example
Are the following elements of M2,2 linearly independent?
     
   1 3       −2  1       1 3
          ,           ,
   0 1        0 −1       0 4

214
Another example

In P2 the element p(x) = 2 + 2x + 5x 2 is a linear combination of


p1 (x) = 1 + x + x 2 and p2 (x) = x 2 , but q(x) = 1 + 2x + 3x 2 is not.

So {p1 , p2 } is not a spanning set for P2 , though it is linearly


independent.

215
As with Rn we have the following important results:

Theorem
Let V be a vector space.
1. Every spanning set for V contains a basis for V
2. Every linearly independent set in V can be extended to a basis of V
3. Any two bases of V have the same cardinality
(i.e., ‘same number of elements’)

The basic idea behind the proof is exactly as we saw with Rn . But there
are some very interesting technical differences! These mostly concern
the possibility that a basis might have infinitely many elements.

216
Definition
The dimension of V , denoted dim(V ), is the number of elements in a
basis of V . We call V finite dimensional if it admits a finite basis, and
infinite dimensional otherwise.
Examples
1. {1, x, x 2 , . . . , x n } is a basis for Pn . So dim(Pn (R)) = n + 1
       
        1 0     0 1     0 0     0 0
2. {         ,       ,       ,       }   is a basis for M2×2 ,
        0 0     0 0     1 0     0 1

   so dim(M2×2 ) = 4.

        1  0     0 1     0 0
3. {          ,       ,       }   is a basis for the vector space of 2 × 2
        0 −1     0 0     1 0
   matrices with trace equal to zero.
   This is therefore a 3 dimensional subspace of M2×2 .

217
An infinite dimensional example

Let P be the set of all polynomials:

P = {a0 + a1 x + · · · + an−1 x n−1 + an x n | n ∈ N, a0 , a1 , . . . , an ∈ R}

Then P is a vector space and the set B = {1, x, x 2 , . . .} is a basis for P.

So P is an infinite dimensional vector space.

Can you see why B is a basis?

218
In the case of a finite dimensional vector space, we have the following:

Theorem
Suppose V has dimension n, and S is a subset of V .
1. If |S| < n, then S does not span V
2. If |S| > n, then S is linearly dependent

The proof is exactly the same as in the case of Rn .

219
Examples
1. The polynomials

{2 + x + x 2 , 1 + x, − 1 − 7x 2 , x − x 2 }

are linearly dependent, since dim(P2 ) = 3.

2. The matrices
         2 1      −1 1      6 7
   {           ,        ,        }
         3 4       0 1      4 5

   do not span M2×2 since dim(M2×2 ) = 4

220
Standard Bases

It is useful to fix names for certain bases.

◮ The standard basis for Rn is

{(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, 0, . . . , 0, 1)}

The dimension of Rn is n.

◮ The standard basis for Mm,n is

      1 0 ... 0      0 1 ... 0             0 0 ... 0
      0 0 ... 0      0 0 ... 0             0 0 ... 0
   {  . . .      ,   . . .      ,  ... ,   . . .      }
      . . .          . . .                 . . .
      0 0 ... 0      0 0 ... 0             0 0 ... 1

   (the matrices with a single entry equal to 1 and all other entries 0)

   The dimension of Mm,n is m × n.


221
◮ The standard basis for Pn is

1, x, x 2 , . . . , x n

The dimension of Pn is n + 1.

222
4.6 Coordinate matrices

The notion of coordinates relative to a basis carries over from the case
of vectors in Rn .

Definition
Suppose that B = {v1 , . . . , vn } is a basis for a vector space V . For any
v ∈ V we have

v = α1 v1 + · · · + αn vn for some scalars α1 , . . . , αn

The scalars α1 , . . . , αn are called the coordinates of v relative to B.


The coordinate vector of v relative to B and coordinate matrix of v
with respect to B are also defined as in the case that V = Rn .

223
Examples
1. In P2 with basis B = {1, x, x 2 } the polynomial
                                                        2
   p = 2 + 7x − 9x 2 has coordinates [p]B =             7
                                                       −9
                               1 0     0 1     0 0     0 0
2. M2×2 with basis B = {            ,       ,       ,       }
                               0 0     0 0     1 0     0 1
                                                             1
                      1 2                                    2
   The matrix A =           has coordinates [A]B =
                      3 4                                    3
                                                             4

224
Topic 5: Linear Transformations [AR 4.2, 8.1]

5.1 Linear transformations from R2 to R2


5.2 Linear transformations from Rn to Rm
5.3 Matrix representations in general
5.4 Image, kernel, rank and nullity
5.5 Change of basis

225
We now turn to thinking about maps from one vector space to another.

A key feature of a vector space V is that given a basis B = {v1 , . . . vn },


each element v ∈ V can be uniquely written as a linear combination of
the vectors in the basis:

v = α1 v1 + α2 v2 + · · · + αn vn

In keeping with this, we will undertake a study of mappings


T : V → W between vector spaces which have the property that

T (v) = α1 T (v1 ) + α2 T (v2 ) + · · · + αn T (vn )

226
Definition (Linear transformation)
Let V and W be vector spaces (over the same field of scalars).
A linear transformation from V to W is a map T : V → W such that
for each u, v ∈ V and for each scalar α:

1. T (u + v) = T (u) + T (v) (T preserves addition)

2. T (αu) = αT (u) (T preserves scalar multiplication)

Loosely speaking, linear transformations are those maps between vector


spaces that ‘preserve the vector space structure.’

227
5.1 Linear transformations from R2 to R2

We will start by looking at some geometric transformations of R2 .


A vector in R2 is an ordered pair (x, y ), with x, y ∈ R.
To describe the effect of a transformation we will use coordinate
matrices. With respect to the standard basis B = {e1 , e2 }, the vector
                                  x
(x, y ) has coordinate matrix        .
                                  y
Example
Reflection across the y -axis maps (x, y ) to (−x, y ):
        x        −x
   T        =
        y         y

[Figure: (x, y ) and its mirror image (−x, y ) in the y -axis.]

228
A common feature of all linear transformations is that they can be
represented by a matrix. In the example above
       x       −1 0    x       −x
   T       =                =
       y        0 1    y        y

The matrix  
−1 0
AT =
0 1
is called the standard matrix representation of the transformation T

229
Examples of (geometric) linear transformations from R2 to R2
 
                                               1  0
1. Reflection across the x-axis has matrix
                                               0 −1

                                                 −12/13   5/13
2. Reflection in the line y = 5x has matrix
                                                   5/13  12/13

                                                               0 −1
3. Rotation around the origin by an angle of π/2 has matrix
                                                               1  0

4. Rotation around the origin by an angle of θ has matrix

   We need to work out the coordinates of the point Q obtained by
   rotating P.
   [Figure: P rotated through the angle θ to Q.]

230
Examples continued
 
                                                         c 0
5. Compression/expansion along the x-axis has matrix
                                                         0 1

                                          1 c
6. Shear along the x-axis has matrix
                                          0 1
These are best thought of as mappings on a rectangle.
For example, a shear along the x-axis corresponds to the mapping

231
Successive Transformations

Example
Find the image of (x, y ) after a shear along the x-axis with c = 1
followed by a compression along the y -axis with c = 1/2 .

Solution:
Let R : R2 → R2 be the compression and denote its standard matrix
representation by AR . Similarly let S : R2 → R2 be the shear and
denote its standard matrix representation by AS . Then the coordinate
matrix of R(S(x, y )) is given by
 
x
AR AS
y

It remains to recall AR and AS , and to compute the matrix products.

232
Note
1. The matrix for the linear transformation S followed by the linear
transformation R is the matrix product AR AS .
(In other words ARS = AR AS )
2. Notice that (reading right to left) the two matrices are in the
opposite order to the order in which the transformations are
applied.
3. The composition of two linear transformations T (v) = R(S(v)) is
also written T (v) = R ◦ S(v)

233
5.2 Linear transformations from Rn to Rm

An example of a linear transformation from R3 to R2 is

T (x1 , x2 , x3 ) = (x2 − 2x3 , 3x1 + x3 )

To prove that this is a linear transformation, we must show that for any

u, v ∈ R3 and α ∈ R

we have that

T (u + v) = T (u) + T (v) and T (αu) = αT (u)

234
Proof Let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ). First we note that

u + v = (u1 + v1 , u2 + v2 , u3 + v3 )

Applying T to this gives

T (u + v) = ((u2 + v2 ) − 2(u3 + v3 ), 3(u1 + v1 ) + (u3 + v3 ))

Re-arranging the right-hand side gives

((u2 + v2 ) − 2(u3 + v3 ), 3(u1 + v1 ) + (u3 + v3 ))


= (u2 − 2u3 , 3u1 + u3 ) + ( v2 − 2v3 , 3v1 + v3 )
= T (u) + T (v)

For the second part,

T (αu) = T ((αu1 , αu2 , αu3 )) = (αu2 − 2αu3 , 3αu1 + αu3 )


= α(u2 − 2u3 , 3u1 + u3 ) = αT (u)

235
In fact this example is typical of all linear transformations from Rn to
Rm — the mapping rule gives a vector whose components are linear
combinations of the components of the input vector.

The simplest way to see this is to prove the following theorem.

Theorem
All linear transformations T : Rn → Rm have a standard matrix
representation AT specified by

AT = [ [T (e1 )] [T (e2 )] · · · [T (en )] ]

where each [T (ei )] denotes the coordinate matrix of T (ei )

Note
◮ The matrix AT has size m × n.
◮ Alternative notations for AT include: [T ] or [T ]S or [T ]S,S
236
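A computational sketch of the theorem (an editor's addition assuming
Python with numpy): apply T to each standard basis vector and stack the
results as columns.

import numpy as np

def T(x):                     # the example transformation from slide 234
    return np.array([x[1] - 2*x[2], 3*x[0] + x[2]])

E = np.eye(3)
A_T = np.column_stack([T(E[:, i]) for i in range(3)])
print(A_T)                    # [[0, 1, -2], [3, 0, 1]]
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A_T @ v, T(v))   # [T(v)] = A_T [v]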
Proof
Let v ∈ Rn and write
v = α1 e1 + α2 e2 + · · · + αn en
The coordinate matrix of v with respect to the standard basis is then
 
α1
α2 
 
[v] =  . 
 .. 
αn

We seek an m × n matrix AT such that


[T (v)] = AT [v] (independently of v ) (†)

In words, this equation says that AT times the column matrix [v] is
equal to the coordinate matrix of T (v).
Since T is linear, we have that
T (v) = α1 T (e1 ) + α2 T (e2 ) + · · · + αn T (en )
237
We read off from this that

[T (v)] = α1 [T (e1 )] + α2 [T (e2 )] + · · · + αn [T (en )]   (property of coordinate matrices)

                                                   α1
        = [ [T (e1 )] [T (e2 )] · · · [T (en )] ]  ...          (just matrix multiplication)
                                                   αn

Comparing with equation (†) above, we see that we can take


AT = [ [T (e1 )] [T (e2 )] · · · [T (en )] ]

Summarizing: Given a linear transformation T from Rn to Rm there is a


corresponding m × n matrix AT satisfying

[T (v)] = AT [v]

That is, the image of any vector v ∈ Rn can be calculated using the
matrix AT .
238
Note
1. All linear transformations map the zero vector to the zero vector.
2. Linear transformations map lines through the origin to other lines
through the origin (or to just the origin).
3. The above theory generalizes to linear transformations between any
n-dimensional vector space V and m-dimensional vector space W .

239
Examples
1. Define T : R3 → R4 by T (x1 , x2 , x3 ) = (x1 , x3 , x2 , x1 + x3 ).
Calculate AT .

2. Give a reason why the mapping T : R2 → R2 specified by


T (x1 , x2 ) = (x1 − x2 , x1 + 1) is not a linear transformation.

3. Find AT for the linear transformation T : P2 → P1 given by


T (a0 + a1 x + a2 x 2 ) = (a0 + a2 ) + a0 x

240
5.3 Matrix representations in general [AR 8.4]

We’ve talked about the standard matrix representation. This is when


the vectors are expressed as coordinates with respect to the standard
basis for the spaces involved.
We can also give a matrix representation when the vectors are expressed
using some other basis.

Let U and V be finite dimensional vector spaces. Suppose that

T : U → V is a linear transformation,
B = {b1 , . . . , bn } is a basis for U and
C = {c1 , . . . , cm } is a basis for V .

241
We want a matrix that can be used to calculate the effect of T .
Specifically, if we denote the matrix by AC,B , we want that

[T (u)]C = AC,B [u]B for all u ∈ U. (∗)

Note
For this matrix equation to make sense the size of AC,B must be m × n.

Theorem
There exists a unique matrix satisfying the above condition (∗). It is
given by h i
AC,B = [T (b1 )]C [T (b2 )]C · · · [T (bn )]C

The proof is the same as for the case of the formula for the standard
matrix AT .

242
A note on notation
The matrix AC,B is also denoted by [T ]C,B . In the special case in which
U = V and B = C, we often write [T ]B in place of [T ]B,B .
Example
                                                  5 1  0
A linear transformation T : R3 → R2 has matrix             with
                                                  1 5 −2
respect to the standard bases of R3 and R2 . What is its matrix with
respect to the basis B = {(1, 1, 0), (1, −1, 0), (1, −1, −2)} of R3 and
the basis C = {(1, 1), (1, −1)} of R2 ?

Solution
We apply T to the elements of B to get:
   T (1, 1, 0) = (6, 6), T (1, −1, 0) = (4, −4), T (1, −1, −2) = (4, 0)
Then we write the result in terms of C:
   (6, 6) = 6c1 , (4, −4) = 4c2 , (4, 0) = 2c1 + 2c2
                                6 0 2
We obtain the matrix AC,B =
                                0 4 2

243
Example
Consider the linear transformation T : V → V where V is the vector
space of real valued 2 × 2 matrices and T is defined by

T (Q) = Q T = the transpose of Q.

Find the matrix representation of T with respect to bases:

       
        1 1     1 0     1 0     0 1
B = {        ,       ,       ,       }
        0 0     0 1     1 0     1 0
and
        1 0     0 1     0 0     0 0
C = {        ,       ,       ,       }
        0 0     0 0     1 0     0 1

244
Solution
The task is to work out the coordinates with respect to C of the image
of each element in B.

245
5.4 Image, kernel, rank and nullity [AR 8.2]
Let T : U → V be a linear transformation.

Definition (Kernel and Image)


The kernel of T is defined to be

ker(T ) = {u ∈ U | T (u) = 0}

The image of T is defined to be

Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}

The kernel is also called the nullspace. It is a subspace of U. Its


dimension is denoted nullity(T ).

The image is also called the range. It is a subspace of V . Its dimension


is denoted rank(T ).
246
Example
Consider the linear transformation T : R3 → R2 specified by

T (x, y , z) = (x − y + z, 2x − 2y + 2z)

Find bases for ker(T ) and Im(T ).

247
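A sketch of the same calculation via the standard matrix AT (editor's
addition assuming Python/sympy): the kernel is the nullspace of AT and
the image is its column space, as the note on slide 249 explains.

import sympy as sp

AT = sp.Matrix([[1, -1, 1],
                [2, -2, 2]])
print(AT.nullspace())    # two vectors: a basis for ker(T)
print(AT.columnspace())  # one vector (1, 2): a basis for Im(T)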
Definition
A linear transformation T : U → V is called injective if ker(T ) = {0}.
It is called surjective if Im(T ) = V .

Example
Is the linear transformation of the preceding example injective ?

Is it surjective ?

248
Note
When we calculate ker(T ) we find that we are solving equations of the
form AT X = 0. So ker(T ) is the same as the solution space for AT .

To calculate Im(T ) we can use the fact that, because B spans U,
T (B) must span T (U), the image of T . But the elements of T (B) are
given by the columns of AT . Thus the image of T can be identified
with the column space of AT . It follows that rank(T ) = rank(AT ).

We can now use the result of slide 188 to see that for a linear
transformation T : U → V with dim(U) = n

nullity(T ) + rank(T ) = n

249
5.5 Change of basis

Transition Matrices

We have seen that a matrix representation of a linear transformation


T : U → V depends on both the choices of bases for U and V .

In fact different matrix representations are related by a matrix which


depends on the bases but not on T . To understand this, we undertake a
study of converting coordinates with respect to one basis to coordinates
using another basis.

Let B = {b1 , . . . , bn } and C = {c1 , . . . , cn } be bases for the same


vector space V and let v ∈ V .

How are [v]B and [v]C related?

By multiplication by a matrix!
250
Theorem
There exists a unique matrix P such that for any vector v ∈ V ,

[v]C = P[v]B

The matrix P is given by

P = [ [b1 ]C · · · [bn ]C ]

and is called the transition matrix from B to C.

In words, the columns of P are the coordinate matrices, with respect to


C, of the elements of B.

We will sometimes denote this transition matrix by PC,B

It is also sometimes denoted by PB→C

251
Proof:
We want to find a matrix P such that for all vectors v in V ,

[v]C = P[v]B

Recall that if T : V → V is any linear transformation, then

[T (v)]C = [T ]C,B [v]B (∗)

where  
[T ]C,B = [T (b1 )]C [T (b2 )]C · · · [T (bn )]C
is the matrix representation of T .

252
Applying this to the special case where T (v) = v for all v
(i.e., T is the identity linear transformation)
gives  
[T ]C,B = [b1 ]C [b2 ]C . . . [bn ]C
and (∗) becomes
[v]C = [T ]C,B [v]B
So we can take P = [T ]C,B

Exercise
Finish the proof by showing that P is unique. That is, if Q is a matrix
satisfying [v]C = Q[v]B for all v, then Q = P.

253
A simple case
The transition matrix is easy to calculate when one of B or C is the
standard basis.

Example
In R2 , write down the transition matrix from B to S, where

B = {(1, 1), (1, −1)} and S = {(1, 0), (0, 1)} .


 
                                                  1
Use this to compute [v]S , given that [v]B =        .
                                                  1

Solution
                                1  1
PS,B = [ [b1 ]S [b2 ]S ] =
                                1 −1
                         2
[v]S = PS,B [v]B =
                         0

254
Going in the other direction

A useful fact is that transition matrices are always invertible.


(Why is this true?)

Starting with the equation

[v]C = PC,B [v]B

and rearranging gives

[v]B = (PC,B )−1 [v]C

But we know that


[v]B = PB,C [v]C
and so, by the uniqueness part of the above theorem, it must be the
case that
PB,C = (PC,B )−1

255
Example
For B and S as in the previous example, compute PB,S , the transition
matrix from S to B.
                                           2
Use it to compute [v]B , given [v]S =        .
                                           0

Solution
We saw that in this case
            1  1
PS,B =
            1 −1

It follows that
                          1/2  1/2
PB,S = (PS,B )−1 =
                          1/2 −1/2
                      1
[v]B = PB,S [v]S =
                      1

256
Calculating a general transition matrix

Keeping notation as before, we have

[v]S = PS,B [v]B and [v]C = PC,S [v]S

Combining these,

[v]C = PC,S [v]S = PC,S PS,B [v]B

Using the uniqueness of the transition matrix, we get

PC,B = PC,S PS,B = (PS,C )−1 PS,B

Since it is usually easy to calculate a transition matrix with the first


basis the standard basis, this makes it straightforward to calculate any
transition matrix.

257
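A sketch of this recipe (an editor's addition assuming Python/sympy),
using the bases B = {(1, 2), (1, 1)} and C = {(−3, 4), (1, −1)} of the
next example:

import sympy as sp

PSB = sp.Matrix([[1, 1], [2, 1]])    # B as columns
PSC = sp.Matrix([[-3, 1], [4, -1]])  # C as columns
PCB = PSC.inv() * PSB                # P_{C,B} = (P_{S,C})^{-1} P_{S,B}
print(PCB)                           # [[3, 2], [10, 7]]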
Example
With U = V = R2 and B = {(1, 2), (1, 1)} and C = {(−3, 4), (1, −1)},
find PC,B .

           1 1                −3  1
PS,B =            PS,C =
           2 1                 4 −1

So
           1 1                        3  2
PC,S =            and   PC,B =
           4 3                       10  7

258
Relationship Between Different Matrix Representations

Example

Calculate the standard matrix representation of T : R2 → R2 where

T (x, y ) = (3x − y , −x + 3y )

Solution
Since T (1, 0) = (3, −1) and T (0, 1) = (−1, 3),
           3 −1
[T ]S =
          −1  3

259
Example continued
Now find the matrix of T with respect to the basis
B = {(1, 1), (1, −1)}

Solution
T (1, 1) = (2, 2) = 2(1, 1) and T (1, −1) = (4, −4) = 4(1, −1), so
           2 0
[T ]B =
           0 4

Notice that [T ]B is diagonal. This makes it very convenient to use the


basis B in order to understand the effect of T .

260
How are [T ]S and [T ]B related?

Theorem
The matrix representations of T : V → V with respect to two bases C
and B are related by the following equation:

[T ]B = PB,C [T ]C PC,B
Proof:
We need to show that for all v ∈ V
[T (v)]B = PB,C [T ]C PC,B [v]B
Starting with the right-hand side we obtain:
PB,C [T ]C PC,B [v]B = PB,C [T ]C [v]C (property of PC,B )
= PB,C [T (v)]C (property of [T ]C )
= [T (v)]B (property of PB,C )

261
Example

For the above linear transformation T : R2 → R2 given by


T (x, y ) = (3x − y , −x + 3y ) we saw that

           3 −1                2 0
[T ]C =              [T ]B =
          −1  3                0 4

where C = {(1, 0), (0, 1)} is the standard basis and B = {(1, 1), (1, −1)}

Since C is the standard basis, it is easy to write down

           1  1
PC,B =
           1 −1
                                         1/2  1/2
From which we calculate PB,C =
                                         1/2 −1/2

Calculation verifies that in this case we do indeed have:

[T ]B = PB,C [T ]C PC,B
262
Topic 6: Inner Product Spaces [AR Ch 6]

6.1 Definition of inner products


6.2 Geometry from inner products
6.3 Cauchy-Schwarz inequality
6.4 Orthogonality and projections
6.5 Gram-Schmidt orthogonalization procedure
6.6 Application: curve fitting

263
6.1 Definition of inner products
The Euclidean length of a vector v ∈ Rn is defined as

   ‖v‖ = √(v1² + v2² + · · · + vn²) = √(v · v)

Similarly, the distance between vectors u and v is given by ‖u − v‖.

The angle θ between two vectors u and v was defined by

   cos θ = (v · u)/(‖v‖‖u‖)    with 0 ≤ θ ≤ π

The projection of v onto u was given in terms of the dot product by

   proju v = ((u · v)/‖u‖²) u

264
We want to address two issues:
1. How to generalise the notion of a dot product for vector spaces
other than Rn .
2. To relate ideas associated with the dot product and its
generalisations to bases and linear equations.

The first can be done by looking carefully at some of the key properties
and generalising them.

265
Definition (Inner Product)
Let V be a vector space over the real numbers. An inner product on V
is a function that associates with every pair of vectors u, v ∈ V a real
number, denoted ⟨u, v⟩, satisfying the following properties.

For all u, v, w ∈ V and α ∈ R:

1. ⟨u, v⟩ = ⟨v, u⟩
2. α⟨u, v⟩ = ⟨αu, v⟩
3. ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
4. a. ⟨u, u⟩ ≥ 0
   b. ⟨u, u⟩ = 0 ⇒ u = 0
A vector space V together with an inner product is called an inner
product space.

Note
A vector space can admit many different inner products. We’ll see some
examples shortly.
266
In words these axioms say that we require the inner product to:
1. be symmetric;
2. be linear with respect to scalar multiplication;
3. be linear with respect to addition;
4. a. have positive squared lengths;
b. be such that only the zero vector has length 0.

With V = Rn , we of course have that ⟨u, v⟩ = u · v defines an inner
product (the Euclidean inner product).

But this is not the only inner product in the vector space Rn .

267
Example
Show that, in R2 , if u = (u1 , u2 ) and v = (v1 , v2 ) then

⟨u, v⟩ = u1 v1 + 2u2 v2

defines an inner product.

268
More examples
1. Show that in R3 , ⟨u, v⟩ = u1 v1 − u2 v2 + u3 v3 does not define an
   inner product by showing that axiom 4a does not hold.

2. Show that in R2 ,
                     2 −1                       2 −1    v1
   ⟨u, v⟩ = uT              v = [u1 u2 ]
                    −1  1                      −1  1    v2

   defines an inner product.

269
Another example

The last example raises the question of what we need to know about a
                      a b
2 × 2 matrix A =             to ensure that ⟨u, v⟩ = uT A v defines an
                      c d
inner product.

The check that this satisfies axioms 2 and 3 follows exactly the same
lines as the previous example.

In order to satisfy axiom 1 we will need to make b = c.

The hard work occurs in satisfying axiom(s) 4. A quick check with the
standard basis vectors shows that we must have a > 0 and d > 0.

A harder calculation, which involves completing a square, shows that


axiom(s) 4 will be satisfied exactly when det(A) > 0 (and a > 0).

So we can tell exactly when a 2 × 2 matrix A defines an inner product


in this way.
270
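A small helper sketch of this criterion (an editor's addition assuming
Python/numpy): a symmetric 2 × 2 matrix A defines an inner product
exactly when a > 0 and det(A) > 0.

import numpy as np

def defines_inner_product(A):
    # symmetric, top-left entry positive, determinant positive
    return bool(np.allclose(A, A.T) and A[0, 0] > 0 and np.linalg.det(A) > 0)

print(defines_inner_product(np.array([[2., -1.], [-1., 1.]])))  # True
print(defines_inner_product(np.array([[1.,  0.], [0., -1.]])))  # False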
Another example

In the case of the vector space Pn of polynomials of degree less than or


equal to n, a possible choice of inner product is
   ⟨p, q⟩ = ∫₀¹ p(x)q(x) dx.

To verify this, we would need to check, for any polynomials


p(x), q(x), r (x) and any scalar α that
1. ∫₀¹ p(x)q(x) dx = ∫₀¹ q(x)p(x) dx
2. α ∫₀¹ p(x)q(x) dx = ∫₀¹ (αp(x))q(x) dx
3. ∫₀¹ p(x) (q(x) + r (x)) dx = ∫₀¹ p(x)q(x) dx + ∫₀¹ p(x)r (x) dx
4. ∫₀¹ p(x)² dx ≥ 0
5. ∫₀¹ p(x)² dx = 0 only when p(x) is the zero polynomial
271
6.2 Geometry from inner products
In a general vector space, how can we measure angles and find lengths?
If we fix an inner product on the vector space, we can then define
length and angle using the same equations that we saw above for Rn .
We simply replace the dot product by the chosen inner product.

Definition (Length, Distance, Angle)


For a real vector space with an inner product ⟨· , ·⟩ we define:

◮ the length (or norm) of a vector v by ‖v‖ = √⟨v, v⟩

◮ the distance between two vectors v and u by d(v, u) = ‖v − u‖

◮ and the angle θ between v and u using

   cos θ = ⟨v, u⟩/(‖v‖‖u‖)    with 0 ≤ θ ≤ π
272
Note
In order for this definition of angle to make sense we need

   −1 ≤ ⟨v, u⟩/(‖v‖‖u‖) ≤ 1

We will see shortly that this is always the case
(the Cauchy-Schwarz inequality)

Definition (Orthogonal vectors)

Two vectors v and u are said to be orthogonal if ⟨v, u⟩ = 0

273
Example

⟨(u1 , u2 ), (v1 , v2 )⟩ = u1 v1 + 2u2 v2 defines an inner product on R2

If u = (3, 1) and v = (−2, 3), then

   d(u, v) = ‖u − v‖ = ‖(5, −2)‖ = √⟨(5, −2), (5, −2)⟩ = √(25 + 8) = √33

and
   ⟨u, v⟩ = ⟨(3, 1), (−2, 3)⟩ = 3 × (−2) + 2 × 1 × 3 = 0
so u and v are orthogonal (using this inner product)

274
Example (an inner product for functions)
The set of all continuous functions

C [a, b] = {f : [a, b] → R | f is continuous}

is a vector space.
It’s a subspace of F([a, b], R), the vector space of all functions.

For f , g ∈ C [a, b] define

   ⟨f , g ⟩ = ∫ₐᵇ f (x)g (x)dx

This defines an inner product.

275
Example
Consider C [0, 2π] with the inner product ⟨f , g ⟩ = ∫₀^{2π} f (x)g (x)dx

The norms of the functions s(x) = sin(x) and c(x) = cos(x) are:

   ‖s‖² = ⟨s, s⟩ = ∫₀^{2π} sin²(x)dx = ∫₀^{2π} ½(1 − cos(2x))dx
        = [x/2 − ¼ sin(2x)]₀^{2π} = π

So ‖s‖ = √π and (similarly) ‖c‖ = √π

   ⟨s, c⟩ = ∫₀^{2π} sin(x) cos(x)dx = ∫₀^{2π} ½ sin(2x)dx = [−¼ cos(2x)]₀^{2π} = 0

So sin(x) and cos(x) are orthogonal

This is used in the study of periodic functions (using ‘Fourier series’) in


(for example) signal analysis, speech recognition, music recording etc.
276
6.3 Cauchy-Schwarz inequality
Theorem
Let V be a real inner product space.
Then for all u, v ∈ V
   |⟨u, v⟩| ≤ ‖u‖‖v‖
Proof:
The same as the proof we saw for the case V = Rn on slide 112.

Application
The definition of angle is okay!
We defined the angle between two vectors using
   cos θ = ⟨u, v⟩/(‖u‖‖v‖)
From the Cauchy-Schwarz inequality we know that
   −1 ≤ ⟨u, v⟩/(‖u‖‖v‖) ≤ 1
277
Example
For two continuous functions f , g : [a, b] → R, it follows directly from
the Cauchy-Schwarz inequality that
   ( ∫ₐᵇ f (x)g (x)dx )² ≤ ( ∫ₐᵇ f ²(x)dx ) ( ∫ₐᵇ g ²(x)dx )

And we don't need to do any (more) calculus to prove it!

Example
Set f (x) = √x and g (x) = 1/√x; also a = 1, b = t > 1.

We obtain

   ( ∫₁ᵗ 1 dx )² ≤ ( ∫₁ᵗ x dx ) ( ∫₁ᵗ (1/x) dx )

which becomes

   (t − 1)² ≤ ((t² − 1)/2) log t    or    log t ≥ 2(t − 1)/(t + 1)

278
6.4 Orthogonality and projections [AR 6.2]

Orthogonal sets
Recall that u and v are orthogonal if ⟨u, v⟩ = 0

Definition (Orthogonal set of vectors)

A set of vectors {v1 , . . . , vk } is called orthogonal if ⟨vi , vj ⟩ = 0
whenever i ≠ j.

Examples
1. {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthogonal in R3 with the dot
product.
2. So is {(1, 1, 1), (1, −1, 0), (1, 1, −2)}.
3. {sin(x), cos(x)} is orthogonal in C [0, 2π] equipped with the inner
product defined on slide 276
279
Proposition
Every orthogonal set of nonzero vectors is linearly independent
Proof:

280
Orthonormal sets

Definition
A set of vectors {v1 , . . . , vk } is called orthonormal if it is orthogonal
and each vector has length one. That is

                                                            0  if i ≠ j
   {v1 , . . . , vk } is orthonormal ⇐⇒ ⟨vi , vj ⟩ =
                                                            1  if i = j

Note
Any orthogonal set of non-zero vectors can be made orthonormal by
dividing each vector by its length.
Examples
1. In R3 with the dot product:
{(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthonormal
{(1, 1, 1), (1, −1, 0), (1, 1, −2)} is not (though it is orthogonal)
   {(1/√3)(1, 1, 1), (1/√2)(1, −1, 0), (1/√6)(1, 1, −2)} is orthonormal
281
2. In C [0, 2π] with the inner product

      ⟨f , g ⟩ = ∫₀^{2π} f (x)g (x)dx

   The set {sin(x), cos(x)} is orthogonal but not orthonormal.
   The set {(1/√π) sin(x), (1/√π) cos(x)} is orthonormal
   The (infinite) set

   {1/√(2π), (1/√π) sin(x), (1/√π) cos(x), (1/√π) sin(2x), (1/√π) cos(2x), . . . }

is orthonormal.

282
Orthonormal bases

Bases that are orthonormal are particularly convenient to work with.


For example, we have the following

Lemma (AR Thm 6.3.1)


If {v1 , . . . , vn } is an orthonormal basis for V and x ∈ V , then

   x = ⟨x, v1 ⟩v1 + · · · + ⟨x, vn ⟩vn

Proof: Exercise!

283
Orthogonal projection [AR 6.3]
Let V be a real vector space, with inner product ⟨·, ·⟩
Let u ∈ V be a unit vector (i.e., ‖u‖ = 1)

Definition
The orthogonal projection of v onto u is

   p = ⟨v, u⟩u

Note
◮ p = (‖v‖ cos θ) u
◮ v − p is orthogonal to u

Example
The orthogonal projection of (2, 3) onto (1/√2)(1, 1) is:

   ⟨(2, 3), (1/√2)(1, 1)⟩ (1/√2)(1, 1) = (5/2)(1, 1) = (5/2, 5/2)
284
More generally, we can project onto a subspace W of V as follows.

Let {u1 , . . . , uk } be an orthonormal basis for W .

Definition
The orthogonal projection of v ∈ V onto W is:

   projW (v) = ⟨v, u1 ⟩u1 + · · · + ⟨v, uk ⟩uk

Properties of p = projW (v)

◮ p∈W
◮ if w ∈ W , then projW (w) = w (by the above lemma)
◮ v − p is orthogonal to W (i.e., orthogonal to every vector in W )
◮ p is the vector in W that is closest to v
  That is, for all w ∈ W , ‖v − w‖ ≥ ‖v − p‖
◮ p does not depend on the choice of orthonormal basis for W
285
Example

Let W = {(x, y , z) | x + y + z = 0} in V = R3 with the dot product.


The set
   {u1 = (1/√2)(1, −1, 0), u2 = (1/√6)(1, 1, −2)}
is an orthonormal basis for W .
For v = (1, 2, 3) we have

   p = ⟨v, u1 ⟩u1 + ⟨v, u2 ⟩u2
     = (−1/2)(1, −1, 0) + (−1/2)(1, 1, −2)
     = (−1, 0, 1)

Note that v − p = (2, 2, 2) is orthogonal to W .

286
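The same projection, computed numerically (an editor's sketch assuming
Python/numpy):

import numpy as np

u1 = np.array([1., -1., 0.]) / np.sqrt(2)
u2 = np.array([1., 1., -2.]) / np.sqrt(6)
v = np.array([1., 2., 3.])
p = np.dot(v, u1) * u1 + np.dot(v, u2) * u2
print(p)                                      # [-1, 0, 1]
print(np.dot(v - p, u1), np.dot(v - p, u2))   # both 0: v - p is orthogonal to W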
6.5 Gram-Schmidt orthogonalization procedure
The following can be used to make any basis orthonormal:
Gram-Schmidt procedure [AR Thm 6.3.6]
Suppose {v1 , . . . , vk } is a basis for V .

1.  u1 = v1 /‖v1 ‖

2a. w2 = v2 − ⟨v2 , u1 ⟩u1
2b. u2 = w2 /‖w2 ‖

3a. w3 = v3 − ⟨v3 , u1 ⟩u1 − ⟨v3 , u2 ⟩u2
3b. u3 = w3 /‖w3 ‖
     ..
      .
ka. wk = vk − ⟨vk , u1 ⟩u1 − · · · − ⟨vk , uk−1 ⟩uk−1
kb. uk = wk /‖wk ‖

Then {u1 , . . . , uk } is an orthonormal basis for V .


287
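A minimal Gram-Schmidt sketch for Rn with the dot product (an editor's
addition assuming Python/numpy); applied to the spanning set
{(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, −1, 3)} it reproduces the answer of
the example below.

import numpy as np

def gram_schmidt(vectors):
    ortho = []
    for v in vectors:
        # subtract the projections onto the vectors found so far
        w = v - sum(np.dot(v, u) * u for u in ortho)
        ortho.append(w / np.linalg.norm(w))   # normalise
    return ortho

vs = [np.array([1., 1., 1., 1.]),
      np.array([2., 4., 2., 4.]),
      np.array([1., 5., -1., 3.])]
for u in gram_schmidt(vs):
    print(u)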
Example
Find an orthonormal basis for the subspace W of R4 (with dot product)
spanned by
{(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, −1, 3)}
Answer
   { (1/2)(1, 1, 1, 1), (1/2)(−1, 1, −1, 1), (1/2)(1, 1, −1, −1) }

Example
Find the point in W closest to v = (2, 2, 1, 3)

Answer
   (1/2)(3, 5, 3, 5)

288
6.6 Application: curve fitting [AR 6.4]

Given a set of data points (x1 , y1 ), (x2 , y2 ),. . . , (xn , yn ) we want to find
the straight line y = a + bx that best approximates the data.

A common approach is to minimise the least squares error E :

   E² = sum of the squares of the vertical errors
      = δ1² + · · · + δn²
      = Σi (yi − (a + bxi ))²

[Figure: the data points and the line y = a + bx, with the vertical
errors δi marked.]

289
So given (x1 , y1 ),. . . , (xn , yn ) we want to find a, b ∈ R which minimise

   Σi (yi − (a + bxi ))²

This can be written as

   ‖y − Au‖²

where
          y1            1 x1
          y2            1 x2            a
   y =    .       A =   .  .      u =
          ..            .  .            b
          yn            1 xn

To minimise ‖y − Au‖ we want Au to be as close as possible to y

We can use projection to find the closest point.

290
We seek the vector in

W = {Av | v ∈ R2 } (= the column space of A)

that is closest to y
The closest vector is precisely projW y

So to find u we could project y to W to get Au = projW y


and from this calculate u

291
However, we can calculate u directly (without finding an orthonormal
basis for W ) by noting that
   ⟨w, y − projW y⟩ = 0 for all w ∈ W

=⇒ ⟨Av, y − Au⟩ = 0 for all v ∈ R2

=⇒ (Av)T (y − Au) = 0 for all v ∈ R2

=⇒ vT AT (y − Au) = 0 for all v ∈ R2

=⇒ AT (y − Au) = 0

=⇒ AT y − AT Au = 0

=⇒ AT Au = AT y
From this we can calculate u, given that we know A and y.
292
Summary
Given data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), to find the straight line
                                        a
y = a + bx of best fit, we find u =        such that
                                        b

   AT Au = AT y    (∗)

               y1                 1 x1
               y2                 1 x2
where y =      .       and A =    .  .
               ..                 .  .
               yn                 1 xn

If AT A is invertible (and it usually is), the solution to (∗) is given by

u = (AT A)−1 AT y

293
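A sketch of the normal-equations recipe (an editor's addition assuming
Python/numpy), using the data of the example below:

import numpy as np

xs = np.array([-1., 1., 2.])
ys = np.array([1., 1., 3.])
A = np.column_stack([np.ones_like(xs), xs])
u = np.linalg.solve(A.T @ A, A.T @ ys)   # solves A^T A u = A^T y
print(u)                                 # [9/7, 4/7]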
Example
Find the straight line which best fits the data points
(−1, 1), (1, 1), (2, 3)

Answer
Line of best fit is: y = 9/7 + (4/7)x

294
Extension
The same method works for finding quadratic fitting curves.
To find the quadratic y = a + bx + cx 2 which best fits data (x1 , y1 ),
(x2 , y2 ),. . . , (xn , yn ) we take
 
          1 x1 x1²                y1
          1 x2 x2²                y2
   A =    .  .  .           y =   .
          .  .  .                 ..
          1 xn xn²                yn

and solve
AT Au = AT y
for  
a
u = b 
c

295
Topic 7: Eigenvalues and Eigenvectors [AR Chapt 7]

7.1 Definition of eigenvalues and eigenvectors


7.2 Finding eigenvalues
7.3 The Cayley-Hamilton theorem
7.4 Finding eigenvectors
7.5 Diagonalization

296
7.1 Definition of eigenvalues and eigenvectors
The topic of eigenvalues and eigenvectors is fundamental to many
applications of linear algebra. These include quantum mechanics in
physics, image compression and reconstruction in computing and
engineering, and the analysis of high dimensional data in statistics.

The key idea is to identify those subspaces which map to themselves
under a given linear transformation T : V → V .

For any vector v ∈ V with v ≠ 0, Span{v} is a subspace of V of
dimension 1.

Suppose that
T (v) = λv
for some scalar λ. It follows that T maps the subspace Span{v} to
itself.
Note that λ is a scale factor which stretches the subspace and possibly
changes its sense (if λ is negative).
297
Definition
Let T : V → V be a linear transformation.
A scalar λ is an eigenvalue of T if there is a non-zero vector v ∈ V such
that
T (v) = λv (∗)
The vector v is called an eigenvector of T (with eigenvalue λ).
If λ is an eigenvalue of T , then the set of all v satisfying (∗) is a
subspace of V (exercise!) and is called the eigenspace of λ.

We will restrict attention to the case that V is finite dimensional. Then


T can be represented by a matrix [T ], which we know depends on the
choice of bases.

298
In fact the idea of eigenvalues and eigenvectors can be applied directly
to square matrices.

Definition
Let A be an n × n matrix and let λ be a scalar. Then a non-zero n × 1
column matrix v with the property that

Av = λv

is called an eigenvector, while λ is called the eigenvalue.


To develop some geometric intuition, it is handy to think of A as being
the standard matrix of a linear transformation.
Example
                       1 4
Consider the matrix          as the standard matrix of a linear
                       1 1
transformation. What is the effect of the transformation on the vectors
(2, 1) and (2, −1)?
299
7.2 Finding eigenvalues

With I denoting the n × n identity matrix, the defining equation for


eigenvalues and eigenvectors can be rewritten

(A − λI )v = 0

The values of λ for which this equation has non-zero solutions are
precisely the eigenvalues.
Theorem
The homogeneous linear system (A − λI )v = 0 has a non-zero solution
if and only if det(A − λI ) = 0. Consequently, the eigenvalues of A are
the values of λ for which

det(A − λI ) = 0

300
Notation
The equation det(A − λI ) = 0 is referred to as the characteristic
equation. From our study of determinants, we know that det(A − λI ) is
a polynomial of degree n in λ. It is called the characteristic polynomial.

Example
                             1 4
Find the eigenvalues of           .
                             1 1

301
Example
                              0 1
Find the eigenvalues of            .
                             −1 0

If we use the standard basis, how can the corresponding linear
transformation be described geometrically? How does this tell you that
the matrix does not have any one-dimensional invariant subspaces?

302
Example
Find the eigenvalues of the matrix
          1 0 0  0
          0 1 0  0
   A =
          0 0 1 −1
          0 0 1  1

303
7.3 The Cayley-Hamilton theorem

We note (without proof) the following

Theorem (Cayley-Hamilton Theorem)


A square matrix satisfies its characteristic equation.

This means (among other things) that


◮ Every (positive) power of A can be expressed as a linear
combination of I , A,. . . , An−1 .

◮ If A is invertible, then A−1 can be expressed as a linear


combination of I , A,. . . , An−1 .

304
Examples
 
          3 2
Let A =
          1 4

1. Calculate the characteristic equation of A.

2. Verify that A satisfies its characteristic equation.

3. Express A−1 as a linear combination of A and I and hence


calculate it.

4. Express A3 as a linear combination of A and I and hence calculate


it.

305
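A numerical check of these claims for the matrix above (an editor's
sketch assuming Python/numpy):

import numpy as np

A = np.array([[3., 2.],
              [1., 4.]])
I = np.eye(2)
# Characteristic equation: t^2 - 7t + 10 = 0
assert np.allclose(A @ A - 7*A + 10*I, 0)
A_inv = (7*I - A) / 10            # from A^2 - 7A + 10I = 0
assert np.allclose(A @ A_inv, I)
A_cubed = 39*A - 70*I             # A^3 = 7A^2 - 10A = 39A - 70I
assert np.allclose(A_cubed, np.linalg.matrix_power(A, 3))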
7.4 Finding eigenvectors

To find the eigenvectors of a matrix


◮ For each eigenvalue λ, solve the homogeneous linear system
(A − λI )v = 0.
◮ Use row reduction as usual.
◮ Note that rank(A − λI ) < n, so you always obtain at least one row
of zeros.
Example
                                                     1 4
For each of the eigenvalues 3, −1 of the matrix           , find a
                                                     1 1
corresponding eigenvector.

306
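The same computation can be checked numerically — a sketch (editor's
addition) assuming Python/numpy; numpy normalises its eigenvectors to
unit length, so expect scalar multiples of (2, 1) and (2, −1):

import numpy as np

A = np.array([[1., 4.],
              [1., 1.]])
vals, vecs = np.linalg.eig(A)
print(vals)   # [3, -1]
print(vecs)   # columns are the corresponding eigenvectors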
Example
For each of the eigenvalues λ = −1, 8 (the eigenvalue −1 is repeated
                            3 2 4
twice) of the matrix A =    2 0 2  , find a basis for the corresponding
                            4 2 3
eigenspace.

307
7.5 Diagonalization [AR 7.2]

We now take up the question of when the eigenvectors of an n × n
matrix A form a basis for Rn . Such bases are extremely important in
the applications of eigenvalues and eigenvectors mentioned earlier.

Definition
A square matrix A is said to be diagonalizable if there is an invertible
matrix P such that P −1 AP is a diagonal matrix. The matrix P is said
to diagonalize A.

308
To test if A is diagonalizable, the following theorem can be used.

Theorem
An n × n matrix A is diagonalizable if and only if there is a basis for Rn
all of whose elements are eigenvectors of A.

Idea of proof:
If we can form a basis in which all the basis vectors are eigenvectors of
T , then the new matrix for T will be diagonal.

309
A simple case in which A is diagonalizable is when A has n distinct
eigenvalues. This follows from the theorem above and the following

Lemma
Eigenvectors corresponding to distinct eigenvalues are linearly
independent.

Idea of proof:

310
Example
                                    1 2
Give reasons why the matrix A =          cannot be diagonalized.
                                    0 1

311
How to diagonalize a matrix

Suppose A is diagonalizable. How can we find an invertible matrix P,


and a diagonal matrix D with D = P −1 AP ?

Theorem
Let A be a diagonalizable n × n matrix. Thus there exists a basis
{v1 , . . . , vn } for Rn whose
 elements are eigenvectors of A.
Let P = [v1 ] · · · [vn ] and let λi be the eigenvalue of the eigenvector vi .
Then
P −1 AP = diag [λ1 , λ2 , . . . , λn ]

So, if we have found a basis consisting of eigenvectors, then we can


write down matrices P and D without further calculation.

312
Example
                 2       −1       −1
Check that       1  ,     2  ,     0     are eigenvectors of the matrix
                 2        0        1

          3 2 4
   A =    2 0 2
          4 2 3

and read off the corresponding eigenvalues.

Write down an invertible matrix P such that D = P −1 AP is diagonal,


and write down the diagonal matrix D.

Check your answer by evaluating both sides of the equation PD = AP.

313
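A check of this example (an editor's sketch assuming Python/numpy):

import numpy as np

A = np.array([[3., 2., 4.],
              [2., 0., 2.],
              [4., 2., 3.]])
P = np.array([[2., -1., -1.],
              [1.,  2.,  0.],
              [2.,  0.,  1.]])
D = np.diag([8., -1., -1.])
assert np.allclose(P @ D, A @ P)                 # PD = AP
assert np.allclose(np.linalg.inv(P) @ A @ P, D)  # P^{-1} A P = D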
Orthogonal matrices
Because a matrix often represents a physical system it can be important
that the change-of-basis transformation does not affect shape.
This will happen when the change-of-basis matrix is orthogonal:
Definition
An n × n matrix P is orthogonal if the columns of P form an
orthonormal basis of Rn .
Examples
   −1/√2   1/√6   1/√3
       0  −2/√6   1/√3             cos θ  − sin θ
                             and                     are orthogonal
    1/√2   1/√6   1/√3             sin θ    cos θ

       1 1           1 −1
but          and            are not.
       0 1           1  1

314
Orthogonal matrices have some good properties.

If P is an orthogonal n × n matrix and u, v ∈ Rn then

◮ P −1 = P T
◮ ‖Pu‖ = ‖u‖
◮ ⟨Pu, Pv⟩ = ⟨u, v⟩

We can summarise by saying that orthogonal matrices ‘preserve’ length


and angle.

315
Real symmetric matrices
Definition
A matrix A is symmetric if AT = A

Examples
   1 2           3 −1 4
         and    −1  1 5    are symmetric,
   2 3           4  5 9

   2 1           3 −1 4
but      and    −1  1 5    are not
   3 2           4  6 9

Symmetric matrices arise (for example) in:


◮ quadratic functions
◮ inner products
◮ maxima and minima of function of more than 1 variable
316
Theorem
Let A be an n × n real symmetric matrix. Then:
1. all roots of the characteristic polynomial of A are real;
2. eigenvectors from distinct eigenvalues are orthogonal;
3. A is diagonalizable;
(In fact, there is an orthonormal basis of eigenvectors.)
4. we can write A = QDQ −1 where D is diagonal and Q is an
orthogonal matrix.

Note that, in this case the diagonalization formula can be written


A = QDQ T .
Idea of proof of 2:

317
Example
                                             2 −1
Find matrices D and Q as above for A =
                                            −1  2

318
Powers of a matrix

In applications, one often comes across the need to apply a


transformation many times. In the case that the transformation can be
represented by a diagonalizable matrix A, it is easy to compute Ak and
thus the action of the k-th application of the transformation.

The first point to appreciate is that computing powers of a diagonal


matrix D is easy.
Example
With D = diag [1, −3, 2], write down D 2 and D 3 .

In fact, we have
Lemma
Suppose A is diagonalizable, so that D = P −1 AP is diagonal. Then

Ak = PD k P −1

319
Example
For the matrix of slide 301 we can write

                                3  0           2 −2
   A = PDP −1 with D =                , P =
                                0 −1           1  1

Thus
                                         1     1 2
   Ak = PD k P −1 with P −1 =
                                         4    −1 2

Explicitly,
           2 −2      3ⁿ     0       1     1 2
   Aⁿ =                           ×
           1  1      0   (−1)ⁿ      4    −1 2

         1    2(3ⁿ + (−1)ⁿ )    4(3ⁿ − (−1)ⁿ )
       =
         4    3ⁿ − (−1)ⁿ        2(3ⁿ + (−1)ⁿ )
320
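Checking the power formula numerically (an editor's sketch assuming
Python/numpy):

import numpy as np

A = np.array([[1., 4.],
              [1., 1.]])
P = np.array([[2., -2.],
              [1.,  1.]])
D5 = np.diag([3.**5, (-1.)**5])          # D^5 for D = diag[3, -1]
A5 = P @ D5 @ np.linalg.inv(P)
assert np.allclose(A5, np.linalg.matrix_power(A, 5))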
Conic Sections [AR 9.6]

Consider equations in x and y of the form

ax 2 + bxy + cy 2 + dx + ey + f = 0

where a, b, c, d, e, f ∈ R are constants.

The graphs of such equations are called conic sections or conics.


Example
9x 2 − 4xy + 6y 2 − 10x − 20y − 5 = 0

We will see how the shape of this graph can be calculated using
diagonalization.

[Figure: the graph of this curve, a tilted ellipse.]
321
We shall assume that d and e are zero.
See [AR 9.6] for a discussion of how to reduce to this case.

If the equation is simple enough, we can identify the conic by inspection.

Standard Conics: (see figure 9.6.1 in AR)

ellipse       x²/α² + y²/β² = 1

hyperbola     x²/α² − y²/β² = 1

parabola      y = αx²

322
The equation in matrix form
Consider the curve defined by the equation

ax 2 + bxy + cy 2 = 1

The equation can be written in matrix form as


  
               a    b/2     x
   [x y ]                       = 1
              b/2    c      y

That is,
   xT Ax = 1
             x
where x =       , and A is a real symmetric matrix.
             y
We can diagonalize in order to simplify the equation so that we can
identify the curve.
Let’s demonstrate with an example.
323
Identify and sketch the conic defined by x 2 + 4xy + y 2 = 1
                                                   1 2
This can be written as xT Ax = 1 where A =
                                                   2 1
                                             3  0                 1 −1
Diagonalizing gives A = QDQ T with D =             , Q = (1/√2)
                                             0 −1                 1  1
             x′
Let x′ =         be the co-ordinates of (x, y ) relative to the orthonormal
             y′
basis of eigenvectors: B = {(1/√2, 1/√2), (−1/√2, 1/√2)}
Then x = Qx′ , (Q is precisely the transition matrix PS,B )
and the equation of the conic can be rewritten:

xT Ax = 1 ⇐⇒ (Qx′ )T QDQ T Qx′ = 1


⇐⇒ (x′ )T Q T QDQ T Qx′ = 1
⇐⇒ (x′ )T Dx′ = 1
⇐⇒ 3(x ′ )2 − (y ′ )2 = 1

So the curve is a hyperbola


324
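The diagonalization step can be checked numerically — a sketch (editor's
addition) assuming Python/numpy; eigh is numpy's routine for symmetric
matrices and returns an orthogonal Q:

import numpy as np

A = np.array([[1., 2.],
              [2., 1.]])
vals, Q = np.linalg.eigh(A)
print(vals)                        # [-1, 3]: one of each sign, so a hyperbola
assert np.allclose(Q.T @ A @ Q, np.diag(vals))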
The x ′ -axis and the y ′ -axis are called the principal axes of the conic

The directions of the principal axes are given by the eigenvectors.

In this example the directions of the principal axes are:

   (1/√2, 1/√2) and (−1/√2, 1/√2).

[Figures: the hyperbola 3(x ′ )2 − (y ′ )2 = 1 drawn in (x ′ , y ′ )
coordinates, and the same curve x 2 + 4xy + y 2 = 1 drawn in (x, y )
coordinates with the principal axes x ′ and y ′ shown.]
325
Summary

◮ We can represent the equation of a conic (centered at the origin)


by a matrix equation x T Ax = 1 with A symmetric.

◮ The eigenvectors of A will be parallel to the principal axes of the


conic.

◮ So if Q represents the change of basis matrix then Q will be


orthogonal and Q T AQ = D will be diagonal.

◮ If x = Qx′ then x′ represents the coordinates with respect to the
  new basis and the equation of the conic with respect to this basis
  is (x′ )T Dx′ = 1.

◮ The conic can now be identified.

326
Example (a quadric surface)
The equation

−x 2 + 2y 2 + 2z 2 + 4xy + 4xz − 2yz = 1

represents a ‘quadric surface’ in R3 .


In matrix form it can be represented as xT Ax = 1 with

          x                −1  2  2
   x =    y      and A =    2  2 −1
          z                 2 −1  2

The eigenvalues of A are 3, 3, −3. So the equation of the surface with


respect to an orthonormal basis of eigenvectors is

3X 2 + 3Y 2 − 3Z 2 = 1

327
The surface is a ‘hyperboloid of one sheet’; see the sketch below.
(You are not expected to identify quadric surfaces in three dimensions.)

328
