
M2N1

Numerical Analysis
Mathematics
Imperial College London
Contents
1 Applied Linear Algebra
1.1 Orthogonality
1.2 Gram-Schmidt
1.3 QR Factorization
1.4 Cauchy-Schwarz inequality
1.5 Gradients and Hessians
1.6 Generalized inner product
1.7 Cholesky Factorization
1.8 Least Square Problems
1.8.1 General Least Squares Case
1.9 A more abstract approach
1.10 Orthogonal Polynomials
2 Polynomial interpolation
2.1 Divided difference
2.2 Finding the error
2.3 Best Approximation
2.4 Piecewise Polynomial Interpolation
3 Quadrature (Numerical Integration)
Chapter 1
Applied Linear Algebra
1.1 Orthogonality
Definition. Let $a, b \in \mathbb{R}^n$. We define the inner product of $a, b$ to be
\[ \langle a, b \rangle = a^T b = \sum_{i=1}^n a_i b_i. \]
Also define the outer product of $a$ and $b$ to be
\[ ab^T = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & \ddots & \vdots \\ a_n b_1 & \cdots & a_n b_n \end{pmatrix}. \]
Note. Note that:
1. The inner product is symmetric:
\[ \langle a, b \rangle = \sum_{i=1}^n a_i b_i = \sum_{i=1}^n b_i a_i = \langle b, a \rangle \]
for all $a, b \in \mathbb{R}^n$.
2. The inner product is linear with respect to the second argument:
\[ \langle a, \alpha b + \beta c \rangle = \sum_{i=1}^n a_i (\alpha b_i + \beta c_i) = \alpha \sum_{i=1}^n a_i b_i + \beta \sum_{i=1}^n a_i c_i = \alpha \langle a, b \rangle + \beta \langle a, c \rangle \]
for all $a, b, c \in \mathbb{R}^n$, $\alpha, \beta \in \mathbb{R}$.
3. From 1. and 2. we get that the inner product is also linear with respect to the first argument.
4. Observe that
\[ \langle a, a \rangle = \sum_{i=1}^n a_i^2 \geq 0. \]
Definition. Let
\[ \|a\| = \big[\langle a, a \rangle\big]^{1/2} \]
be the length (or norm) of $a$.

Definition. We say that $a, b \in \mathbb{R}^n$, $a, b \neq 0$, are orthogonal if $\langle a, b \rangle = 0$.
Example.
Claim. If $a, b \in \mathbb{R}^n$ are orthogonal, then $\|a + b\|^2 = \|a\|^2 + \|b\|^2$.
Proof.
\[ \|a + b\|^2 \stackrel{\text{def}}{=} \langle a + b, a + b \rangle = \langle a + b, a \rangle + \langle a + b, b \rangle = \|a\|^2 + \|b\|^2 + 2\langle a, b \rangle = \|a\|^2 + \|b\|^2. \qquad \square \]
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^n$, $q_k \in \mathbb{R}^n$, $q_k \neq 0$ for $k = 1, \dots, n$, is orthogonal if $\langle q_j, q_k \rangle = 0$ for $j, k = 1, \dots, n$, $j \neq k$.

As a useful shorthand, introduce the Kronecker delta notation
\[ \delta_{jk} = \begin{cases} 1 & j = k, \\ 0 & j \neq k. \end{cases} \]

Example. For the $n \times n$ identity matrix $I$ we have $I_{jk} = \delta_{jk}$.
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^n$, $q_k \in \mathbb{R}^n$, $q_k \neq 0$ for $k = 1, \dots, n$, is orthonormal if
\[ \langle q_j, q_k \rangle = \delta_{jk} \quad \text{for } j, k = 1, \dots, n. \]

Note. A set of vectors is orthonormal if it is orthogonal and each vector has unit length.
Definition. A set of vectors $\{a_k\}_{k=1}^n$, $a_k \in \mathbb{R}^m$ for $k = 1, \dots, n$, is linearly independent if
\[ \sum_{k=1}^n c_k a_k = 0 \]
implies $c_k = 0$ for $k = 1, \dots, n$. The set $\{a_k\}_{k=1}^n$ is linearly dependent if there exist coefficients $c_k \in \mathbb{R}$, $k = 1, \dots, n$, not all zero, such that
\[ \sum_{k=1}^n c_k a_k = 0. \]
Note. Recall that for $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, $a_k \in \mathbb{R}^m$ for $k = 1, \dots, n$:

(1) If the only solution to $Ac = 0$ is $c = 0$ then $\{a_k\}_{k=1}^n$ are linearly independent.

(2) If there exists $c \neq 0$ such that $Ac = 0$ then $\{a_k\}_{k=1}^n$ are linearly dependent.

(3) Restrict to $m = n$ (so that $A$ is square). If $A^{-1}$ exists then the rows (columns) of $A$ are linearly independent. If $\{a_k\}_{k=1}^n$ are linearly independent they form a basis for $\mathbb{R}^n$ and each vector $x \in \mathbb{R}^n$ can be uniquely expressed as a combination of the $a_i$'s.
Lemma 1.1. Let $\{a_k\}_{k=1}^n$, $a_k \in \mathbb{R}^m$, $k = 1, \dots, n$, be orthogonal. Then $\{a_k\}_{k=1}^n$ is linearly independent.

Proof. If $\sum_{k=1}^n c_k a_k = 0$ then for $1 \leq j \leq n$
\[ \Big\langle \sum_{k=1}^n c_k a_k, a_j \Big\rangle = \langle 0, a_j \rangle = 0 \;\Longrightarrow\; \sum_{k=1}^n c_k \langle a_k, a_j \rangle = c_j \langle a_j, a_j \rangle = c_j \|a_j\|^2 = 0. \]
Since the $a_j$'s are non-trivial, $c_j = 0$. Repeat for all $j = 1, \dots, n$. $\square$
Remark 1.1. Linear independence does not imply orthogonality. For example take $n = m = 2$ and $a_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$ and $a_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, which are clearly linearly independent but not orthogonal.
1.2 Gram-Schmidt

Algorithm 1 Classical Gram-Schmidt Algorithm (CGS)
1: $v_1 = a_1$
2: $q_1 = v_1 / \|v_1\|$
3: for $k = 2$ to $n$ do
4: $\quad v_k = a_k - \sum_{j=1}^{k-1} \langle a_k, q_j \rangle q_j$
5: $\quad q_k = v_k / \|v_k\|$
6: end for
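The steps above can be sketched in Python with NumPy (a sketch under the assumption that the input columns are linearly independent; the function name `cgs` and the columns-as-vectors layout are our choices, not from the notes):

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: orthonormalise the columns of A."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        # v_k = a_k minus its components along the previous q_j
        v = A[:, k].copy()
        for j in range(k):
            v -= (A[:, k] @ Q[:, j]) * Q[:, j]
        Q[:, k] = v / np.linalg.norm(v)
    return Q
```

The returned columns satisfy $Q^T Q = I^{(n)}$, which is exactly the orthonormality claim proved next.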
Claim. Given $\{a_i\}_{i=1}^n$, $a_i \in \mathbb{R}^m$, $i = 1, \dots, n$, linearly independent (so $n \leq m$), CGS finds $\{q_i\}_{i=1}^n$, $q_i \in \mathbb{R}^m$, $i = 1, \dots, n$, orthonormal, i.e. $\langle q_i, q_j \rangle = \delta_{ij}$ for $i, j = 1, \dots, n$, with $\operatorname{Span}\{a_i\}_{i=1}^n = \operatorname{Span}\{q_i\}_{i=1}^n$.
Proof. Since $\{a_i\}_{i=1}^n$ are linearly independent, $a_i \neq 0$ for $i = 1, \dots, n$. For $k = 1$, we get
\[ q_1 = \frac{v_1}{\|v_1\|}, \qquad \|q_1\| = \big[\langle q_1, q_1 \rangle\big]^{1/2} = \Big[\frac{1}{\|v_1\|^2} \langle v_1, v_1 \rangle\Big]^{1/2} = 1. \]
For $k = 2$, from the algorithm we have
\[ v_2 = a_2 - \langle a_2, q_1 \rangle q_1. \tag{$*$} \]
Check that $v_2$ is orthogonal to $q_1$:
\[ \langle v_2, q_1 \rangle = \langle a_2, q_1 \rangle - \langle a_2, q_1 \rangle \underbrace{\langle q_1, q_1 \rangle}_{= 1} = 0. \]
Need to check that $\|v_2\| \neq 0$. If $v_2 = 0$, then by ($*$), $a_2$ equals $\langle a_2, q_1 \rangle q_1$, which is a multiple of $a_1$; contradiction to linear independence of $\{a_i\}_{i=1}^n$. Therefore $v_2 \neq 0$, and $q_2$ has unit length and is a multiple of $v_2$, and hence $\{q_i\}_{i=1}^2$ is orthonormal. Clearly $\operatorname{Span}\{a_i\}_{i=1}^2 = \operatorname{Span}\{q_i\}_{i=1}^2$.

Assume the statement is true for $k - 1$, i.e. that $\{q_i\}_{i=1}^{k-1}$ is orthonormal and
\[ q_j = \text{linear combination of } \{a_i\}_{i=1}^j, \qquad a_j = \text{linear combination of } \{q_i\}_{i=1}^j \qquad \text{for } j = 1, \dots, k-1. \tag{$**$} \]
Set
\[ v_k = a_k - \sum_{i=1}^{k-1} \langle a_k, q_i \rangle q_i. \]
Then $v_k$ is orthogonal to all $q_j$, $j = 1, \dots, k-1$:
\[ \langle v_k, q_j \rangle = \langle a_k, q_j \rangle - \sum_{i=1}^{k-1} \langle a_k, q_i \rangle \underbrace{\langle q_i, q_j \rangle}_{= \delta_{ij}} = \langle a_k, q_j \rangle - \langle a_k, q_j \rangle = 0. \]
If $v_k = 0$ then
\[ a_k = \sum_{i=1}^{k-1} \langle a_k, q_i \rangle q_i = \text{linear combination of } \{q_i\}_{i=1}^{k-1} = \text{linear combination of } \{a_i\}_{i=1}^{k-1} \]
by ($**$); contradiction to $\{a_i\}_{i=1}^n$ being linearly independent.

Hence $v_k \neq 0$. From $q_k = \frac{v_k}{\|v_k\|}$ we get $\{q_i\}_{i=1}^k$ orthonormal. Since $q_k$ is a linear combination of $\{q_i\}_{i=1}^{k-1}$ and $a_k$, it is a linear combination of $\{a_i\}_{i=1}^k$ by ($**$). Similarly, by ($**$), $a_k$ is a linear combination of $\{q_i\}_{i=1}^k$.

Hence the result follows by induction. $\square$
1.3 QR Factorization

Look at CGS from a different viewpoint. For $\{a_i\}_{i=1}^n$, CGS gives $\{q_i\}_{i=1}^n$ orthonormal. Let
\[ A = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix} \in \mathbb{R}^{m \times n}, \qquad \widehat{Q} = \begin{pmatrix} q_1 & \dots & q_n \end{pmatrix} \in \mathbb{R}^{m \times n}. \]
Let $\widehat{R} \in \mathbb{R}^{n \times n}$ be an upper triangular matrix,
\[ \widehat{R}_{lk} = \begin{cases} r_{lk} & l \leq k, \\ 0 & l > k, \end{cases} \]
and define $e_k^{(n)} \in \mathbb{R}^n$ by $(e_k^{(n)})_j = \delta_{kj}$ for $j = 1, \dots, n$. Then clearly for any $B \in \mathbb{R}^{m \times n}$, $B e_k^{(n)} = $ $k$-th column of $B$. From CGS we have $a_1 = \|v_1\| q_1$; let $r_{11} = \|a_1\|$. Also for $k = 1, \dots, n$
\[ A e_k^{(n)} = a_k = v_k + \sum_{i=1}^{k-1} \langle a_k, q_i \rangle q_i = \|v_k\| q_k + \sum_{i=1}^{k-1} \langle a_k, q_i \rangle q_i = \sum_{i=1}^{k} r_{ik} q_i = \widehat{Q} \widehat{R} e_k^{(n)}, \]
where $r_{kk} = \|v_k\| > 0$ and $r_{ik} = \langle a_k, q_i \rangle$. Hence $A = \widehat{Q} \widehat{R}$.

Expressing $A \in \mathbb{R}^{m \times n}$ as a product of $\widehat{Q} \in \mathbb{R}^{m \times n}$ with orthonormal columns and $\widehat{R} \in \mathbb{R}^{n \times n}$ upper triangular with positive diagonal entries is called the reduced QR factorisation of $A$.

Now take $Q \in \mathbb{R}^{m \times m}$,
\[ Q = \begin{pmatrix} \widehat{Q} & q_{n+1} & \dots & q_m \end{pmatrix}, \]
with $q_{n+1}, \dots, q_m$ chosen so that the columns of $Q$ are orthonormal, and $R \in \mathbb{R}^{m \times n}$,
\[ R = \begin{pmatrix} \widehat{R} \\ 0 \end{pmatrix}. \]
Clearly, $R$ is an upper triangular matrix (as $\widehat{R}$ is). Call $A$ expressed as a product of $Q$ and $R$ the QR factorisation of $A$.

Observe the product of $Q^T$ with $Q$:
\[ \big[Q^T Q\big]_{jk} = q_j^T q_k = \langle q_j, q_k \rangle = \delta_{jk}, \]
so $Q^T Q = I^{(m)}$ and also $Q^T = Q^{-1}$.
Definition. A matrix $Q \in \mathbb{R}^{m \times m}$ is called orthogonal if $Q^T Q = I^{(m)}$.
Proposition 1.2. Orthogonal matrices preserve length and angle, i.e. if $Q \in \mathbb{R}^{m \times m}$ and $Q^T Q = I^{(m)}$ then for all $v, w \in \mathbb{R}^m$

(1) $\langle Qv, Qw \rangle = \langle v, w \rangle$ (angle preserved),

(2) $\|Qv\| = \|v\|$ (length preserved).

Proof. For $v, w \in \mathbb{R}^m$
\[ \langle Qv, Qw \rangle = (Qv)^T Qw = (v^T Q^T) Qw = v^T I^{(m)} w = \langle v, w \rangle. \]
Also
\[ \|Qv\| = \big[\langle Qv, Qv \rangle\big]^{1/2} \stackrel{(1)}{=} \big[\langle v, v \rangle\big]^{1/2} = \|v\|. \qquad \square \]
Proposition 1.3. If $Q_1, Q_2 \in \mathbb{R}^{m \times m}$ are orthogonal, then $Q_1 Q_2$ is orthogonal.

Proof. $(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I^{(m)}$. $\square$
Example. For $m = 2$ and
\[ Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \]
clearly $Q$ is orthogonal and rotates a vector in $\mathbb{R}^2$ by an angle $\theta$ around the origin.
Definition. Define the Givens Rotation Matrix $G_{pq}(\theta) \in \mathbb{R}^{m \times m}$, $p < q \leq m$, as
\[ G_{pq}(\theta) = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & \cos\theta & \cdots & \sin\theta & \\ & & \vdots & \ddots & \vdots & \\ & & -\sin\theta & \cdots & \cos\theta & \\ & & & & & \ddots \\ & & & & & & 1 \end{pmatrix}, \]
with the $\cos\theta$ entries at positions $(p, p)$ and $(q, q)$, the $\sin\theta$ at $(p, q)$ and the $-\sin\theta$ at $(q, p)$. That is, the $j$-th column of $G_{pq}(\theta)$ is
\[ e_j^{(m)} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \ \text{(with 1 in the $j$-th row)} \quad \text{if } j \neq p \text{ and } j \neq q, \]
or
\[ e_p^{(m)} \cos\theta - e_q^{(m)} \sin\theta \quad \text{if } j = p, \]
or
\[ e_p^{(m)} \sin\theta + e_q^{(m)} \cos\theta \quad \text{if } j = q. \]

Note. The length of every column of $G_{pq}(\theta)$ is 1 and the columns of $G_{pq}(\theta)$ are orthogonal; $G_{pq}(\theta)$ is an orthogonal matrix.
For $A, B \in \mathbb{R}^{m \times n}$ consider
\[ G_{pq}(\theta) A = B. \]
All rows of $B$ are the same as those of $A$, except for rows $p$ and $q$. The aim is to obtain a QR factorisation of $A$ using a sequence of Givens rotations.
Example. For $m = 3$, $n = 2$,
\[ A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}. \]
Take a sequence of Givens rotations so that $A$ is transformed into an upper triangular $R$. Choose $G_{12}(\theta)$ so that
\[ A^{(1)} = G_{12}(\theta) A = \begin{pmatrix} * & * \\ 0 & * \\ 12 & 13 \end{pmatrix}. \]
Choose $\theta$ such that (since $G_{12}$ preserves length)
\[ \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}. \]
Get
\[ G_{12}(\theta) = \begin{pmatrix} \frac{3}{5} & \frac{4}{5} & 0 \\ -\frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and so} \quad A^{(1)} = \begin{pmatrix} 5 & 39 \\ 0 & -52 \\ 12 & 13 \end{pmatrix}. \]
Choose $G_{13}(\theta)$ as the next rotation since it does not affect row 2, so $A^{(2)}_{21}$ stays 0 ($G_{23}(\theta)$ would not work). We want the first column of $A^{(2)}$ to be a multiple of $e_1^{(3)}$. Since $G_{13}(\theta)$ preserves length, we know the new $(1,1)$ entry is $(5^2 + 12^2)^{1/2} = 13$. So
\[ G_{13}(\theta) = \begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix} \quad \text{and so} \quad A^{(2)} = G_{13}(\theta) A^{(1)} = \begin{pmatrix} 13 & 27 \\ 0 & -52 \\ 0 & -31 \end{pmatrix}. \]
Now choose $G_{23}(\theta)$,
\[ G_{23}(\theta) = \frac{1}{\sqrt{3665}} \begin{pmatrix} \sqrt{3665} & 0 & 0 \\ 0 & -52 & -31 \\ 0 & 31 & -52 \end{pmatrix}, \]
to get
\[ R = A^{(3)} = G_{23}(\theta) A^{(2)} = \begin{pmatrix} 13 & 27 \\ 0 & \sqrt{3665} \\ 0 & 0 \end{pmatrix}. \]
So
\[ R = \underbrace{G_{23}(\theta) G_{13}(\theta) G_{12}(\theta)}_{G} A \]
with $G$ being orthogonal since Givens rotations are orthogonal. Then
\[ R = GA \;\Longrightarrow\; G^T G A = G^T R \;\Longrightarrow\; A = QR \]
with $Q = G^T$.
In general, we want to solve $Ax = b$ for $A \in \mathbb{R}^{m \times n}$. We apply a sequence of Givens rotations $G$ to take $A$ to upper triangular $R$, to get an equivalent system
\[ GAx = Rx = Gb = c. \]
If $m > n$ and $c_i \neq 0$ for some $i = n+1, \dots, m$ then there is no solution $x$ to $Rx = c$ and the system is said to be inconsistent. Otherwise there exists a unique solution $x$ which can be found by backward substitution.
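This procedure can be sketched in Python (an illustrative sketch, assuming full column rank; the name `givens_qr` is ours). On the $3 \times 2$ example above it reproduces the $R$ with diagonal entries $13$ and $\sqrt{3665}$:

```python
import numpy as np

def givens_qr(A):
    """Reduce A to upper triangular R by Givens rotations; returns (G, R) with GA = R."""
    R = A.astype(float).copy()
    m, n = R.shape
    G = np.eye(m)
    for k in range(n):                    # zero the subdiagonal of column k
        for i in range(k + 1, m):
            a, b = R[k, k], R[i, k]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            Gki = np.eye(m)
            Gki[[k, i], [k, i]] = c       # rows p=k, q=i: [[c, s], [-s, c]]
            Gki[k, i], Gki[i, k] = s, -s
            R = Gki @ R                   # new R[k, k] = (a^2 + b^2)/r = r > 0
            G = Gki @ G
    return G, R
```

Since $c = a/r$, $s = b/r$ with $r = \sqrt{a^2 + b^2}$, each step puts $r > 0$ on the diagonal, so the resulting $R$ has positive diagonal entries.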
1.4 Cauchy-Schwarz inequality

For vectors $a, b \in \mathbb{R}^3$,
\[ a \cdot b = |a| |b| \cos\theta. \]
Generalize this to $\mathbb{R}^n$.

Theorem 1.4 (Cauchy-Schwarz inequality). For $a, b \in \mathbb{R}^n$
\[ |\langle a, b \rangle| \leq \|a\| \|b\| \]
with equality iff $a$ and $b$ are linearly dependent.

Proof. If $a = 0$ then $\langle a, b \rangle = 0$ for all $b \in \mathbb{R}^n$ and so the inequality is trivially true. If $a \neq 0$ then let $q = \frac{a}{\|a\|}$ and $c = b - \langle b, q \rangle q$, so that
\[ \langle c, q \rangle = \langle b, q \rangle - \langle b, q \rangle \langle q, q \rangle = 0. \]
We have
\[ 0 \leq \|c\|^2 = \langle c, c \rangle = \langle c, b - \langle b, q \rangle q \rangle = \langle c, b \rangle - \langle b, q \rangle \langle c, q \rangle = \langle c, b \rangle = \langle b - \langle b, q \rangle q, b \rangle = \|b\|^2 - \big[\langle b, q \rangle\big]^2 = \|b\|^2 - \big[\langle b, a \rangle\big]^2 / \|a\|^2, \]
hence $|\langle a, b \rangle| \leq \|a\| \|b\|$, with equality iff $c = 0$, i.e.
\[ b = \langle b, q \rangle q = \langle b, a \rangle a / \|a\|^2, \]
i.e. $a, b$ are linearly dependent. $\square$
1.5 Gradients and Hessians

For a function of one variable $f : \mathbb{R} \to \mathbb{R}$ we have a Taylor series
\[ f(a + h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + O(h^3). \]
Now consider functions of $n$ variables, i.e. $f : \mathbb{R}^n \to \mathbb{R}$. Write $f(x)$ where $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$. We define the partial derivative of $f$ with respect to $x_i$, written $\frac{\partial f}{\partial x_i}$, to be the derivative of $f$ when taking all $x_j$, $j \neq i$, as constants.

Example. For $n = 2$, $x = (x_1, x_2)^T$, $f(x) = f(x_1, x_2) = \sin x_1 \sin x_2$. Then the first derivatives are
\[ \frac{\partial f}{\partial x_1}(x) = \cos x_1 \sin x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \sin x_1 \cos x_2. \]
Generally, the second derivatives are
\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \Big( \frac{\partial f}{\partial x_j}(x) \Big) = \frac{\partial}{\partial x_j} \Big( \frac{\partial f}{\partial x_i}(x) \Big) \]
for $i, j = 1, \dots, n$ and $f$ sufficiently smooth.
Example. $f(x) = \sin x_1 \cos x_2$. Then
\[ \frac{\partial^2 f}{\partial x_1^2}(x) = \frac{\partial}{\partial x_1} \Big( \frac{\partial f}{\partial x_1}(x) \Big) = -\sin x_1 \cos x_2, \]
\[ \frac{\partial^2 f}{\partial x_2^2}(x) = \frac{\partial}{\partial x_2} \Big( \frac{\partial f}{\partial x_2}(x) \Big) = -\sin x_1 \cos x_2, \]
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \frac{\partial}{\partial x_2} \Big( \frac{\partial f}{\partial x_1}(x) \Big) = -\cos x_1 \sin x_2. \]
Chain Rule

For $f : \mathbb{R} \to \mathbb{R}$, $f(x)$, we can change the variable $x$ so that $x = x(t)$ or $t = t(x)$ and define $w(t) = f(x(t))$. Then
\[ \frac{dw}{dt}(t) = \frac{df}{dx}(x(t)) \frac{dx}{dt}(t). \]
Generalize to $n$ variables: if $w(t) = f(x(t))$ then
\[ \frac{dw}{dt} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x(t)) \frac{dx_i}{dt}(t). \]
Example. For $n = 2$, $f(x) = \sin x_1 \sin x_2$, $x_1(t) = t^2$, $x_2(t) = \cos t$, and hence $w(t) = \sin t^2 \sin(\cos t)$. We have
\[ \frac{dw}{dt} = 2t \cos t^2 \sin(\cos t) + \sin t^2 \cos(\cos t)(-\sin t) = \frac{\partial f}{\partial x_1}(x(t)) \frac{dx_1}{dt}(t) + \frac{\partial f}{\partial x_2}(x(t)) \frac{dx_2}{dt}(t). \]
For general $w(t) = f(a + th)$,
\[ \frac{d^m w}{dt^m} = \Big( \sum_{i=1}^n h_i \frac{\partial}{\partial x_i} \Big)^m f(a + th). \]
Now we can generalize the Taylor series to get
\[ f(a + h) = f(a) + \sum_{i=1}^n h_i \frac{\partial}{\partial x_i} f(a) + \frac{1}{2} \Big( \sum_{i=1}^n h_i \frac{\partial}{\partial x_i} \Big) \Big( \sum_{j=1}^n h_j \frac{\partial}{\partial x_j} \Big) f(a) + O(\|h\|^3). \]
Definition. For a function $f : \mathbb{R}^n \to \mathbb{R}$, call the vector $\nabla f(x) \in \mathbb{R}^n$,
\[ \nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{pmatrix}, \]
the gradient of $f$ at $x$.
Definition. For a function $f : \mathbb{R}^n \to \mathbb{R}$, call the matrix $D^2 f(x) \in \mathbb{R}^{n \times n}$,
\[ \big[D^2 f(x)\big]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x), \]
the Hessian of $f$ at $x$.
We can now rewrite the Taylor series as
\[ f(a + h) = f(a) + h^T \nabla f(a) + \frac{1}{2} h^T D^2 f(a) h + O(\|h\|^3). \]
Example. Let $f(x) = x^T A x$ for all $x \in \mathbb{R}^n$, where $A \in \mathbb{R}^{n \times n}$ is a given symmetric matrix. Find $\nabla f(x)$ and $D^2 f(x)$.

We get
\[ f(x) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} x_i x_j \]
and so
\[ \frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} \frac{\partial}{\partial x_p}(x_i x_j) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} \Big[ x_j \Big( \frac{\partial}{\partial x_p} x_i \Big) + x_i \Big( \frac{\partial}{\partial x_p} x_j \Big) \Big]. \]
Also
\[ \frac{\partial}{\partial x_p} x_i = \begin{cases} 1 & \text{if } i = p, \\ 0 & \text{if } i \neq p \end{cases} \;=\; \delta_{ip}. \]
Therefore
\[ \frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} \big( \delta_{ip} x_j + x_i \delta_{jp} \big) = \sum_{j=1}^n A_{pj} x_j + \sum_{i=1}^n A_{ip} x_i, \]
and therefore
\[ [\nabla f(x)]_p = [Ax]_p + [A^T x]_p, \qquad \nabla f(x) = Ax + A^T x = 2Ax \ \text{if $A$ is symmetric}. \]
For the Hessian we get
\[ \frac{\partial^2 f}{\partial x_q \partial x_p}(x) = \frac{\partial}{\partial x_q} \Big( \frac{\partial f}{\partial x_p}(x) \Big) = \frac{\partial}{\partial x_q} \Big( \sum_{j=1}^n A_{pj} x_j + \sum_{i=1}^n A_{ip} x_i \Big) = A_{pq} + (A^T)_{pq}, \]
and so for $A$ symmetric, $D^2 f(x) = 2A$. Note the analogy with derivatives of functions of one variable:
\[ f(x) = a x^2, \quad f'(x) = 2ax, \quad f''(x) = 2a; \qquad f(x) = x^T A x, \quad \nabla f(x) = 2Ax, \quad D^2 f(x) = 2A. \]
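The formulas $\nabla f(x) = 2Ax$ and $D^2 f(x) = 2A$ can be checked numerically with central finite differences (a sketch; the particular test matrix and point are our own choices):

```python
import numpy as np

def quad_form(A, x):
    """f(x) = x^T A x."""
    return x @ A @ x

# A symmetric test matrix and an evaluation point (illustrative choices)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.5, -1.0])

grad = 2 * A @ x        # the gradient formula derived above
hess = 2 * A            # the Hessian formula derived above

# Central finite differences as an independent check of the gradient
eps = 1e-6
fd_grad = np.array([
    (quad_form(A, x + eps * e) - quad_form(A, x - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
```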
Definition. A function $f : \mathbb{R}^n \to \mathbb{R}$ has a local maximum [minimum] at $a$ if for all $u \in \mathbb{R}^n$, $\|u\| = 1$, there exists $\delta > 0$ such that
\[ f(a + hu) \leq [\geq]\; f(a) \]
for all $h \in [0, \delta]$.

For $n = 1$, $f'(a) = 0$ and $f''(a) > [<]\; 0$ are sufficient conditions for $f$ to have a local minimum [maximum] at $x = a$, as
\[ f(a \pm h) = f(a) \pm h f'(a) + \frac{1}{2} h^2 f''(a) + O(h^3) = f(a) + \frac{1}{2} h^2 f''(a) + O(h^3) \;\geq [\leq]\; f(a) \quad \text{for small } h. \]
Proposition 1.5. For $f : \mathbb{R}^n \to \mathbb{R}$, if $\nabla f(a) \neq 0$ then $f(x)$ does not have a local minimum or maximum at $x = a$, i.e. $\nabla f(a) = 0$ is a necessary condition for $f(x)$ to have a local minimum or maximum at $x = a$.

Proof. We show that $f$ does not have a maximum at $a$ (analogous for minimum). Let $h \downarrow 0$ and consider
\[ f(a + hu) = f(a) + h u^T \nabla f(a) + O(h^2). \]
Let
\[ u = \frac{\nabla f(a)}{\|\nabla f(a)\|} \]
so that $\|u\| = 1$. Then
\[ f(a + hu) = f(a) + h \frac{\|\nabla f(a)\|^2}{\|\nabla f(a)\|} + O(h^2) = f(a) + \underbrace{h \|\nabla f(a)\|}_{> 0} + O(h^2) > f(a). \qquad \square \]

Points $a$ where $\nabla f(a) = 0$ are called stationary points of $f(x)$.
Proposition 1.6. If $\nabla f(a) = 0$ and $w^T D^2 f(a) w > [<]\; 0$ for all $w \in \mathbb{R}^n$, $w \neq 0$, then $f(x)$ has a local minimum [maximum] at $x = a$.

Proof. Take $u$ such that $\|u\| = 1$ (and so $u \neq 0$). Then
\[ f(a + hu) = f(a) + h \underbrace{u^T \nabla f(a)}_{= 0} + \frac{1}{2} \underbrace{h^2}_{\geq 0} \underbrace{u^T D^2 f(a) u}_{> [<]\, 0} + O(h^3) \;\geq [\leq]\; f(a). \qquad \square \]
Example. For $n = 2$, $f(x) = x_1^2 - 2x_1 + x_2^2 - 2x_2 + 1$, we have
\[ \nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \end{pmatrix} = \begin{pmatrix} 2(x_1 - 1) \\ 2(x_2 - 1) \end{pmatrix}. \]
Look for stationary points, i.e. where $\nabla f(a) = 0$; we get $a = (1, 1)^T$. Compute the Hessian
\[ D^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2 I^{(2)}. \]
Check that for all $w \in \mathbb{R}^2$, $w \neq 0$,
\[ w^T D^2 f(a) w = 2 w^T w = 2 \|w\|^2 > 0. \]
So $f$ has a local minimum at $(1, 1)$.
Definition. Call a matrix $A \in \mathbb{R}^{n \times n}$

positive definite if $x^T A x > 0$,
negative definite if $x^T A x < 0$,
non-negative definite if $x^T A x \geq 0$,
non-positive definite if $x^T A x \leq 0$

for all $x \in \mathbb{R}^n$, $x \neq 0$.

Note. Clearly, a positive (negative) definite matrix $A \in \mathbb{R}^{n \times n}$ is invertible, since there is no $x \in \mathbb{R}^n$, $x \neq 0$, such that $Ax = 0$; if there was, then $x^T A x = x^T 0 = 0$, a contradiction.
Example. For $n = 2$, $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, $x = (x_1, x_2)^T$,
\[ x^T A x = x_1^2 + x_2^2 - 2 x_1 x_2 = (x_1 - x_2)^2 \geq 0, \]
so $A$ is non-negative definite but not positive definite.
Using this definition, we can restate Proposition 1.6:

Proposition 1.7. If $\nabla f(a) = 0$ and $D^2 f(a)$ is positive [negative] definite then $a$ is a local minimum [maximum] of $f$.
1.6 Generalized inner product

Definition. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix. Define the inner product $\langle \cdot, \cdot \rangle_A$ by
\[ \langle v, u \rangle_A = u^T A v \]
for all $v, u \in \mathbb{R}^n$.

Note. We previously worked with $\langle v, u \rangle_I = u^T v$.

Check that the required properties of an inner product still hold:

symmetry:
\[ \langle u, v \rangle_A = v^T A u = (v^T A u)^T = u^T A^T v = u^T A v = \langle v, u \rangle_A, \]

linearity:
\[ \langle u, \alpha v + \beta w \rangle_A = \alpha \langle u, v \rangle_A + \beta \langle u, w \rangle_A, \qquad \langle \alpha u + \beta v, w \rangle_A = \alpha \langle u, w \rangle_A + \beta \langle v, w \rangle_A \]
for all $u, v, w \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$.
Definition. For a positive definite matrix $A \in \mathbb{R}^{n \times n}$, define the length $\| \cdot \|_A$ of a vector $u \in \mathbb{R}^n$ as
\[ \|u\|_A = \big( \langle u, u \rangle_A \big)^{1/2}. \]
Theorem 1.8 (Generalised Cauchy-Schwarz inequality). If $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite then
\[ |\langle a, b \rangle_A| \leq \|a\|_A \|b\|_A \]
for all $a, b \in \mathbb{R}^n$, with equality iff $a, b$ are linearly dependent.

Proof. Replace $\langle \cdot, \cdot \rangle$ by $\langle \cdot, \cdot \rangle_A$ and $\| \cdot \|$ by $\| \cdot \|_A$ in the proof of the Cauchy-Schwarz inequality. $\square$
1.7 Cholesky Factorization

An easy method of generating symmetric and positive definite matrices:

Proposition 1.9. If $P \in \mathbb{R}^{n \times n}$ is invertible, then $A = P^T P$ is symmetric and positive definite.

Proof. The matrix $A$ is symmetric since
\[ A^T = (P^T P)^T = P^T P = A. \]
It is positive definite since
\[ x^T A x = x^T (P^T P) x = (Px)^T (Px) = \|Px\|^2 \geq 0 \]
for all $x \in \mathbb{R}^n$. Also, if $x^T A x = 0$ then $\|Px\| = 0$, so $Px = 0$ and hence $x = 0$ since $P$ is invertible. $\square$
We now prove the reverse direction.

Cholesky Factorisation

Theorem 1.10. Let $A \in \mathbb{R}^{n \times n}$ be any symmetric positive definite matrix. Then there exists an invertible $P \in \mathbb{R}^{n \times n}$ such that $A = P^T P$. Furthermore, we can choose $P$ to be upper triangular with $P_{ii} > 0$, $i = 1, \dots, n$, in which case we say that $A = P^T P$ is a Cholesky Factorisation (Decomposition) of $A$.
Algorithm 2 Apply CGS with $\langle \cdot, \cdot \rangle_A$ to $\{v_i\}_{i=1}^n$
1: $w_1 = v_1$
2: $u_1 = w_1 / \|w_1\|_A$
3: for $k = 2$ to $n$ do
4: $\quad w_k = v_k - \sum_{j=1}^{k-1} \langle v_k, u_j \rangle_A u_j$
5: $\quad u_k = w_k / \|w_k\|_A$
6: end for
Proof. Let $\{v_i\}_{i=1}^n$ be any $n$ linearly independent vectors in $\mathbb{R}^n$. Using the inner product induced by $A$, we apply Gram-Schmidt (with this inner product) to $\{v_i\}_{i=1}^n$ to get $\{u_i\}_{i=1}^n$. Let $U = \begin{pmatrix} u_1 & \dots & u_n \end{pmatrix} \in \mathbb{R}^{n \times n}$. Then (this is the proof of Lemma 1.1 generalized)
\[ \big[U^T (AU)\big]_{ij} = u_i^T A u_j = \langle u_i, u_j \rangle_A = \delta_{ij} \]
for $i, j = 1, \dots, n$. So $U^T A U = I^{(n)}$.

Does $U^{-1}$ exist? This requires $\{u_i\}_{i=1}^n$ to be linearly independent. Suppose there exists $c \in \mathbb{R}^n$ such that $\sum_{i=1}^n c_i u_i = 0$. Then
\[ \sum_{i=1}^n c_i A u_i = A0 = 0 \;\Longrightarrow\; u_j^T \sum_{i=1}^n c_i A u_i = 0 \;\Longrightarrow\; \sum_{i=1}^n c_i \langle u_i, u_j \rangle_A = 0 \;\Longrightarrow\; c_j = 0 \]
for $j = 1, \dots, n$, and so $c = 0$ and $\{u_i\}_{i=1}^n$ are linearly independent.

So $U^{-1}$ exists and
\[ U^{-1} U = I^{(n)} = [I^{(n)}]^T = [U^{-1} U]^T = U^T (U^{-1})^T, \]
and therefore $(U^T)^{-1} = (U^{-1})^T$. We let $P = U^{-1}$ (so $P$ is invertible). Observe that $P^T = (U^{-1})^T = (U^T)^{-1}$. Therefore
\[ P^T P = P^T I^{(n)} P = P^T U^T A U P = A. \]

To find $P$ upper triangular with $P_{ii} > 0$, we need to choose $\{v_i\}_{i=1}^n$ to be a particular basis for $\mathbb{R}^n$: for $i = 1, \dots, n$ let $v_i = e_i^{(n)}$ (where $(e_i^{(n)})_j = \delta_{ij}$ for $i, j = 1, \dots, n$). Clearly, the matrix $U$ from CGS is upper triangular, since each $u_i$ is a linear combination of $e_1^{(n)}, \dots, e_i^{(n)}$. To show that $U_{ii} > 0$, observe that $U_{ii} = (u_i)_i = (w_i / \|w_i\|_A)_i$ and that
\[ w_i = e_i^{(n)} - \sum_{j=1}^{i-1} \langle e_i^{(n)}, u_j \rangle_A u_j. \]
Since $(u_j)_k = 0$ for $k > j$, we have that $(w_i)_i = (e_i^{(n)})_i = 1$. Hence $U$ is upper triangular with $U_{ii} > 0$.

Now choose $P$ to be $U^{-1}$. Then
\[ UP = I^{(n)} \;\Longleftrightarrow\; U \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix} = \begin{pmatrix} e_1^{(n)} & \dots & e_n^{(n)} \end{pmatrix}. \]
For each $i = 1, \dots, n$ solve $U p_i = e_i^{(n)}$: clearly $(p_i)_j = 0$ for $j = i+1, \dots, n$ and $(p_i)_i = 1/U_{ii} > 0$, so $P$ is upper triangular with $P_{ii} > 0$ for $i = 1, \dots, n$. $\square$
Proposition 1.11. Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite. Then $A_{kk} > 0$ for $k = 1, \dots, n$ and $|A_{jk}| < (A_{jj})^{1/2} (A_{kk})^{1/2}$ for $j, k = 1, \dots, n$, $j \neq k$.

Proof. Since $A$ is symmetric positive definite, by the previous theorem there exists an invertible $P$ such that $A = P^T P$. Let
\[ P = \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix}. \]
Then
\[ A_{jk} = p_j^T p_k = \langle p_j, p_k \rangle \]
for $j, k = 1, \dots, n$. So $A_{kk} = \|p_k\|^2 > 0$, as $p_k \neq 0$ ($P$ is invertible and so $\{p_i\}_{i=1}^n$ are linearly independent).

Also
\[ |A_{jk}| = |\langle p_j, p_k \rangle| < \|p_j\| \|p_k\| = (A_{jj})^{1/2} (A_{kk})^{1/2} \]
by Cauchy-Schwarz (strict inequality as $p_j$ and $p_k$ are linearly independent). $\square$
Computing the Cholesky Decomposition

Given $A$ symmetric positive definite, we can find $L = P^T$ lower triangular with $L_{ii} > 0$ such that $A = LL^T$ by applying CGS with $\langle \cdot, \cdot \rangle_A$ to $\{e_i\}_{i=1}^n$ to get $\{u_i\}_{i=1}^n$ and putting $P = U^{-1} = [u_1, \dots, u_n]^{-1}$.

There is an easier way. Let $L = [l_1, \dots, l_n] \in \mathbb{R}^{n \times n}$ and $A = LL^T$. Then
\[ A_{ij} = \sum_{k=1}^n L_{ik} (L^T)_{kj} = \sum_{k=1}^n (l_k)_i (l_k)_j. \tag{$*$} \]
Also
\[ (l_k l_k^T)_{ij} = (l_k)_i (l_k^T)_j = (l_k)_i (l_k)_j. \]
So from ($*$) we get
\[ A_{ij} = \sum_{k=1}^n (l_k l_k^T)_{ij} \;\Longleftrightarrow\; A = \sum_{k=1}^n l_k l_k^T. \]
Example. For $n = 3$, find a Cholesky Decomposition of
\[ A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{5}{2} & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix}, \]
i.e. find lower triangular $L$, $L_{ii} > 0$, $i = 1, \dots, n$, such that $A = LL^T$.

We need to check that $A$ is symmetric (clear) and positive definite (it is also good to verify the conditions from Proposition 1.11). Take arbitrary $x \in \mathbb{R}^3$, $x \neq 0$. Firstly, using $2|x_1 x_2| \leq x_1^2 + x_2^2$,
\[ x^T A x = \sum_{i=1}^3 \sum_{j=1}^3 A_{ij} x_i x_j = 2x_1^2 + \tfrac{5}{2} x_2^2 + \tfrac{5}{2} x_3^2 - 2 x_1 x_2 - 2 x_2 x_3 \geq 2x_1^2 + \tfrac{5}{2} x_2^2 + \tfrac{5}{2} x_3^2 - (x_1^2 + x_2^2) - (x_2^2 + x_3^2) = x_1^2 + \tfrac{1}{2} x_2^2 + \tfrac{3}{2} x_3^2 > 0. \]

So let $L = [l_1, l_2, l_3]$ be lower triangular. Then
\[ A = LL^T = \sum_{k=1}^3 l_k l_k^T = l_1 l_1^T + l_2 l_2^T + l_3 l_3^T. \]
Since $L$ is lower triangular,
\[ l_1 l_1^T = \begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}, \qquad l_2 l_2^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & a^2 & ab \\ 0 & ab & b^2 \end{pmatrix}, \qquad l_3 l_3^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & * \end{pmatrix}. \]
Therefore the first column of $A$ is generated by $l_1$ alone, i.e.
\[ l_1 = \frac{A e_1}{\sqrt{A_{11}}} = \frac{A e_1}{\|e_1\|_A}, \]
since $(l_1)_1 (l_1)_1 = 2$, $(l_1)_1 (l_1)_2 = -1$, $(l_1)_1 (l_1)_3 = 0$. Thus $l_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}$, a multiple of the first column of $A$.

Define $A^{(1)}$ so that $A^{(1)} = l_2 l_2^T + l_3 l_3^T$:
\[ A^{(1)} = A - l_1 l_1^T = A - \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix}. \]
By the same reasoning $l_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}$, a multiple of the second column of $A^{(1)}$.

Define $A^{(2)}$ so that $A^{(2)} = l_3 l_3^T$:
\[ A^{(2)} = A^{(1)} - l_2 l_2^T = A^{(1)} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{1}{2} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \]
and so $l_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}$, a multiple of the third column of $A^{(2)}$.

Putting these together gives
\[ L = \frac{1}{\sqrt{2}} \begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}. \]
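The column-by-column eliminations above are exactly a rank-one (outer product) Cholesky algorithm. A Python sketch (the name `cholesky_outer` is ours; it assumes the input is symmetric positive definite):

```python
import numpy as np

def cholesky_outer(A):
    """Cholesky by repeated rank-one elimination: A^(k) = A^(k-1) - l_k l_k^T."""
    A = A.astype(float)           # astype copies, so the caller's A is untouched
    n = A.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        l = A[:, k] / np.sqrt(A[k, k])   # l_k = (k-th column) / sqrt(diagonal entry)
        L[:, k] = l
        A -= np.outer(l, l)              # subtract the outer product l_k l_k^T
    return L
```

Running it on the matrix of the example reproduces $L = \frac{1}{\sqrt{2}}\begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}$.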
Now consider the above constructive algorithm in the general case, i.e. $A \in \mathbb{R}^{n \times n}$ symmetric positive definite. Since $A_{11} > 0$, we can start the algorithm by defining
\[ l_1 = \frac{A e_1}{\sqrt{A_{11}}}. \]
Then $A^{(1)} = A - l_1 l_1^T$ is symmetric (since $A$ and $l_1 l_1^T$ are symmetric) and has the form
\[ A^{(1)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix} \]
with $B$ symmetric. To continue, we need to show that $B$ is positive definite and so $B_{kk} > 0$.
Theorem 1.12. The matrix $B \in \mathbb{R}^{(n-1) \times (n-1)}$ defined above is positive definite.

Proof. We need to show that $u^T B u > 0$ for all $u \in \mathbb{R}^{n-1}$, $u \neq 0$. Take $u \in \mathbb{R}^{n-1}$, $u \neq 0$. Construct $v = \begin{pmatrix} 0 \\ u \end{pmatrix} \in \mathbb{R}^n$ (hence $v \neq 0$); $e_1^T v = 0$ means that $e_1$ and $v$ are linearly independent. Then
\[ A^{(1)} = A - \frac{(A e_1)(A e_1)^T}{\|e_1\|_A^2}, \qquad v^T A^{(1)} v = u^T B u. \]
So
\[ u^T B u = v^T A v - \frac{(e_1^T A v)^2}{\|e_1\|_A^2} = \frac{\|v\|_A^2 \|e_1\|_A^2 - \big[\langle e_1, v \rangle_A\big]^2}{\|e_1\|_A^2}. \]
By Cauchy-Schwarz, $|\langle e_1, v \rangle_A| < \|e_1\|_A \|v\|_A$. Hence $u^T B u > 0$. $\square$

Also $B_{11} > 0$ and so $A^{(1)}_{22} > 0$; the procedure can continue.
Application of the Cholesky Decomposition

Given $A \in \mathbb{R}^{n \times n}$ symmetric positive definite, we can find $L$ lower triangular with $L_{ii} > 0$ such that $A = LL^T$. To solve $Ax = b$ for given $b \in \mathbb{R}^n$: we get
\[ LL^T x = b, \]
and let $z = L^T x$. Solve $Lz = b$ by forward substitution,
\[ z_1 = b_1 / L_{11}, \qquad z_k = \Big( b_k - \sum_{j=1}^{k-1} L_{kj} z_j \Big) \Big/ L_{kk} \]
for $k = 2, \dots, n$. Having $z$, solve $L^T x = z$ by backward substitution.
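Both substitutions can be sketched in Python (using NumPy's built-in `np.linalg.cholesky` to supply $L$; the name `solve_cholesky` is ours):

```python
import numpy as np

def solve_cholesky(L, b):
    """Solve L L^T x = b by forward then backward substitution."""
    n = len(b)
    z = np.zeros(n)
    for k in range(n):                       # forward: L z = b
        z[k] = (b[k] - L[k, :k] @ z[:k]) / L[k, k]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):           # backward: L^T x = z
        x[k] = (z[k] - L[k + 1:, k] @ x[k + 1:]) / L[k, k]
    return x
```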
1.8 Least Square Problems

Example. Take a pendulum with length $l$, measure the period $T$ and estimate $g$ (the acceleration due to gravity). Since $T = \frac{2\pi}{\sqrt{g}} \sqrt{l}$, we set
\[ L = \sqrt{l}, \qquad C = \frac{2\pi}{\sqrt{g}}, \qquad CL = T. \]
Do $m$ experiments to get
\[ LC = T \]
with $L, T \in \mathbb{R}^m$. Plot the data ($T_i$ against $L_i$) and fit a straight line through the data. Choose $C$ to minimize the sum of squares of the errors, i.e. such that
\[ S = \sum_{i=1}^m (T_i - C L_i)^2 = \|T - CL\|^2 = \langle T - CL, T - CL \rangle = \|T\|^2 - 2C \langle L, T \rangle + C^2 \|L\|^2 \]
is minimal. The derivative
\[ \frac{dS}{dC} = -2 \langle L, T \rangle + 2C \|L\|^2 \]
equals 0 iff $C = \frac{\langle L, T \rangle}{\|L\|^2}$. Check the second derivative:
\[ \frac{d^2 S}{dC^2} = 2 \|L\|^2 > 0. \]
Take $C^* = \frac{\langle L, T \rangle}{\|L\|^2}$. Then
\[ \langle T - C^* L, L \rangle = \langle T, L \rangle - C^* \|L\|^2 = 0, \]
i.e. the choice of $C^*$ makes $T - C^* L$ perpendicular to $L$.
1.8.1 General Least Squares Case

Given $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) and $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ such that $Ax = b$. For $m > n$ there is in general no solution, as we have an overdetermined system. We are concerned with finding the $x^* \in \mathbb{R}^n$ which minimizes $\|Ax - b\|$ over $x$. Let
\[ Q(x) = \|Ax - b\|^2 = \langle Ax - b, Ax - b \rangle = (Ax - b)^T (Ax - b) = (x^T A^T - b^T)(Ax - b) = x^T A^T A x - b^T A x - x^T A^T b + b^T b = x^T A^T A x - 2 b^T A x + \|b\|^2 = x^T G x - 2 \varphi^T x + \|b\|^2, \]
where
\[ G = A^T A \in \mathbb{R}^{n \times n}, \qquad \varphi = A^T b \in \mathbb{R}^n. \]
Note that $G$ is symmetric.

Take derivatives of $Q$ to get
\[ \nabla Q(x) = 2(Gx - \varphi), \qquad D^2 Q(x) = 2G. \]
Theorem 1.13. Let $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) with linearly independent columns and $b \in \mathbb{R}^m$. Then $A^T A \in \mathbb{R}^{n \times n}$ is symmetric positive definite. Moreover, the $x^* \in \mathbb{R}^n$ solving $A^T A x^* = A^T b$ is the unique minimum of $Q(x) = \|Ax - b\|^2$ over $x \in \mathbb{R}^n$.

Note. The equations $A^T A x^* = A^T b$ are called the normal equations and $x^*$ is called the least squares solution of $Ax = b$.
Proof. The matrix $A^T A$ is clearly symmetric, as shown above. Let $A = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}$, $a_i \in \mathbb{R}^m$, with $\{a_i\}_{i=1}^n$ linearly independent. Then for any $c \in \mathbb{R}^n$
\[ c^T A^T A c = (Ac)^T Ac = \|Ac\|^2 \geq 0, \]
with equality iff $Ac = 0$, i.e. when $c = 0$ since $\{a_i\}_{i=1}^n$ is linearly independent. Hence $A^T A$ is positive definite.

To find the minimum of $Q(x)$, find $x^*$ such that $\nabla Q(x^*) = 0$ and $D^2 Q(x^*)$ is positive definite. We get
\[ \nabla Q(x) = 2(Gx - \varphi) = 2(A^T A x - A^T b), \qquad D^2 Q(x) = 2G = 2 A^T A. \]
Therefore $x^*$ has to solve $A^T A x = A^T b$. As $A^T A$ is positive definite, $(A^T A)^{-1}$ exists. Hence there exists a unique $x^*$ solving $A^T A x = A^T b$. As $D^2 Q(x^*)$ is positive definite, $x^*$ is the unique global minimum of $Q(x) = \|Ax - b\|^2$. $\square$
Example. For $m = 3$, $n = 2$,
\[ A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]
It is obvious that no $x \in \mathbb{R}^2$ solves $Ax = b$. Find the least squares solution $x^* \in \mathbb{R}^2$: solve the normal equations
\[ A^T A x^* = A^T b \]
to get
\[ x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix}. \]
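This solution can be reproduced with a few lines of NumPy (a sketch; at this tiny size forming the normal equations is harmless):

```python
import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.ones(3)

# Solve the normal equations A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```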
In practice, it is not a good idea to solve the normal equations, since the matrix $A^T A$ is generally badly conditioned. A matrix $B \in \mathbb{R}^{n \times n}$ is ill-conditioned if small changes to $b$ lead to large changes in the solution of $Bx = b$, i.e. if in
\[ B(x + \delta x) = b + \delta b \]
$\delta x$ is large for small $\delta b$.

We now find $x^*$ using the QR approach. Using a sequence of Givens rotations, we can find $G$ orthogonal such that $GA = R$ is upper triangular with $R_{ii} > 0$. Then $A = G^T R$ and
\[ Rx = Gb \]
with
\[ Gb = \begin{pmatrix} (Gb)_1 \\ \vdots \\ (Gb)_n \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ (Gb)_{n+1} \\ \vdots \\ (Gb)_m \end{pmatrix} = \beta + \gamma. \]
If $\gamma = 0$, then there exists a unique solution to $Rx = \beta = Gb$, so there exists a unique solution $x$ to $Ax = b$.

If $\gamma \neq 0$ then $Rx = Gb$ is an inconsistent system and has no solution $x$, and neither does $Ax = b$. However, we can solve $Rx^* = \beta$. We claim that this $x^* \in \mathbb{R}^n$ is the least squares solution of $Ax = b$. Also $\|\gamma\| = \|Ax^* - b\|$.
1.9 A more abstract approach

A more abstract definition of the inner product:

Definition. Let $V$ be a real vector space. An inner product on $V \times V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ such that, for all $u, v, w \in V$, $\alpha, \beta \in \mathbb{R}$,

(1) $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$,

(2) $\langle u, v \rangle = \langle v, u \rangle$,

(3) $\langle u, u \rangle \geq 0$ with equality iff $u = 0$.

An inner product induces a norm $\|u\| = (\langle u, u \rangle)^{1/2}$ for all $u \in V$. This implies $\|u\| = 0$ iff $u = 0$.
Example. Let $V = C[a, b]$ be the continuous functions over $[a, b]$. Let $w \in C[a, b]$ with $w(x) > 0$ for all $x \in [a, b]$. Define $\langle f, g \rangle = \int_a^b w(x) f(x) g(x) \, dx$. Clearly (1) and (2) hold. Also
\[ \langle f, f \rangle = \int_a^b w(x) \big(f(x)\big)^2 \, dx \geq 0 \]
and $\langle f, f \rangle = 0$ implies $f = 0$.
Let $V$ be a real vector space with inner product $\langle \cdot, \cdot \rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\varphi_i\}_{i=1}^n$. Given $v \in V$, find $u^* \in U$ such that $\|v - u^*\| \leq \|v - u\|$ for all $u \in U$.

Example. Let $V = C[a, b]$ and $\langle f, g \rangle = \int_a^b f(x) g(x) \, dx$ (i.e. $w(x) = 1$). Let $U$ be the polynomials of degree $\leq n - 1$ with basis $\varphi_i = x^{i-1}$.
We have: $u \in U$ implies $u = \sum_{i=1}^n \alpha_i \varphi_i$ with $\alpha_i \in \mathbb{R}$; also $u^* \in U$ implies $u^* = \sum_{i=1}^n \alpha_i^* \varphi_i$ with $\alpha_i^* \in \mathbb{R}$. Therefore
\[ \|v - u^*\|^2 \leq \|v - u\|^2 \;\Longleftrightarrow\; \Big\| v - \sum_{i=1}^n \alpha_i^* \varphi_i \Big\|^2 \leq \Big\| v - \sum_{j=1}^n \alpha_j \varphi_j \Big\|^2. \]
Let $E(\alpha) = \big\| v - \sum_{i=1}^n \alpha_i \varphi_i \big\|^2$. Now we have to find $\alpha^* \in \mathbb{R}^n$ such that $E(\alpha^*) \leq E(\alpha)$ for all $\alpha \in \mathbb{R}^n$. We have
\[ E(\alpha) = \Big\langle v - \sum_{j=1}^n \alpha_j \varphi_j, \; v - \sum_{i=1}^n \alpha_i \varphi_i \Big\rangle = \|v\|^2 - 2 \sum_{i=1}^n \alpha_i \langle v, \varphi_i \rangle + \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \langle \varphi_i, \varphi_j \rangle. \]
Let $b \in \mathbb{R}^n$ where $b_i = \langle v, \varphi_i \rangle$. Let $G \in \mathbb{R}^{n \times n}$ where $G_{ij} = \langle \varphi_i, \varphi_j \rangle$. Now we have
\[ E(\alpha) = \|v\|^2 - 2 \alpha^T b + \alpha^T G \alpha, \qquad \nabla E(\alpha) = -2b + 2G\alpha, \qquad D^2 E(\alpha) = 2G. \]
So $\alpha^*$ minimises $E(\alpha)$ if $\nabla E(\alpha^*) = 0$. This is equivalent to $G \alpha^* = b$. The matrix $G$ is called the Gram matrix and depends on the basis for $U$. It is sometimes written as $G(\varphi_1, \dots, \varphi_n)$.
Lemma 1.14. Let $\{\varphi_i\}_{i=1}^n$ be a basis of $U$. Let $G \in \mathbb{R}^{n \times n}$ be such that $G_{ij} = \langle \varphi_i, \varphi_j \rangle$. Then $G$ is positive definite.

Proof. Check that for any $\alpha \in \mathbb{R}^n$
\[ \alpha^T G \alpha = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \langle \varphi_i, \varphi_j \rangle = \Big\langle \sum_{i=1}^n \alpha_i \varphi_i, \; \sum_{j=1}^n \alpha_j \varphi_j \Big\rangle = \Big\| \sum_{i=1}^n \alpha_i \varphi_i \Big\|^2 \geq 0. \]
This only equals zero if $\sum_{i=1}^n \alpha_i \varphi_i = 0$. As the $\varphi_i$'s are linearly independent, this implies $\alpha = 0$. Therefore $\alpha^T G \alpha > 0$ for all $\alpha \neq 0$. $\square$
As $G$ is positive definite, we can deduce that $G^{-1}$ exists, and therefore there is a unique $\alpha^* \in \mathbb{R}^n$ solving $G\alpha^* = b$, i.e. $\nabla E(\alpha^*) = 0$, and therefore $\alpha^*$ is a global minimum of $E(\alpha)$.
Theorem 1.15 (Orthogonality Property). Finding the $\alpha^* \in \mathbb{R}^n$ which minimises $E(\alpha)$ is equivalent to finding $u^* = \sum_{i=1}^n \alpha_i^* \varphi_i \in U$ such that $\langle v - u^*, u \rangle = 0$ for all $u \in U$.

Proof. $G\alpha^* = b$ implies that $\alpha^T G \alpha^* = \alpha^T b$ for all $\alpha \in \mathbb{R}^n$. Conversely, if $\alpha^T G \alpha^* = \alpha^T b$ for all $\alpha \in \mathbb{R}^n$, then taking $\alpha = e_i$ gives $(G\alpha^*)_i = b_i$; repeating for $i = 1, \dots, n$ we get $G\alpha^* = b$. So $G\alpha^* = b$ is equivalent to $\alpha^T G \alpha^* = \alpha^T b$ for all $\alpha \in \mathbb{R}^n$. Now
\[ \alpha^T G \alpha^* = \sum_{i=1}^n \sum_{j=1}^n \alpha_i G_{ij} \alpha_j^* = \Big\langle \sum_{i=1}^n \alpha_i \varphi_i, \; \sum_{j=1}^n \alpha_j^* \varphi_j \Big\rangle = \langle u, u^* \rangle, \qquad \alpha^T b = \sum_{i=1}^n \alpha_i \langle \varphi_i, v \rangle = \langle u, v \rangle, \]
where $u = \sum_{i=1}^n \alpha_i \varphi_i$. So $G\alpha^* = b$ is equivalent to
\[ \langle u, u^* \rangle = \langle u, v \rangle \;\Longleftrightarrow\; \langle v - u^*, u \rangle = 0 \]
for all $u \in U$. $\square$
Example. Let $V = C[0, 1]$ and $\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$, and let $U = P_{n-1}$. Take $\varphi_i = x^{i-1}$. Given $v \in V$, find $u^* = \sum_{i=1}^n \alpha_i^* x^{i-1}$ such that
\[ \|v - u^*\| \leq \|v - u\| \;\Longleftrightarrow\; \|v - u^*\|^2 \leq \|v - u\|^2 \;\Longleftrightarrow\; \int_0^1 (v - u^*)^2 \, dx \leq \int_0^1 (v - u)^2 \, dx \]
for all $u \in U$. We now have to solve the normal equations $G\alpha^* = b$, where
\[ b_i = \langle v, \varphi_i \rangle = \int_0^1 v(x) x^{i-1} \, dx, \qquad G_{ij} = \langle \varphi_i, \varphi_j \rangle = \int_0^1 x^{i-1} x^{j-1} \, dx = \int_0^1 x^{i+j-2} \, dx = \frac{1}{i + j - 1}. \]
This gives the Hilbert matrix
\[ G = \begin{pmatrix} 1 & \frac{1}{2} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \cdots & \frac{1}{2n-1} \end{pmatrix}, \]
which is very badly conditioned, as the columns become nearly linearly dependent as $n \to \infty$.
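The growth of the condition number can be seen directly (a sketch; `hilbert` builds the Gram matrix $G_{ij} = 1/(i+j-1)$ derived above, and the sizes tested are our own choices):

```python
import numpy as np

def hilbert(n):
    """Gram matrix of the monomial basis on [0, 1]: G_ij = 1/(i+j-1)."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)

# 2-norm condition numbers for a few sizes
conds = {n: np.linalg.cond(hilbert(n)) for n in (3, 6, 9)}
```

Already for modest $n$ the condition number grows by many orders of magnitude, which is why a change of basis is needed.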
We need to change basis; we have two options:

1. We can use the Gram-Schmidt algorithm to change the basis to an orthonormal basis $\{\varphi_i\}_{i=1}^n$ where $\langle \varphi_i, \varphi_j \rangle = \delta_{ij}$. This implies that $G = I$.

2. We can also create an orthogonal basis $\{\varphi_i\}_{i=1}^n$ where $\langle \varphi_i, \varphi_j \rangle = 0$ for $i \neq j$. Now $G$ is diagonal and $G_{ii} = \|\varphi_i\|^2 > 0$. We have
\[ \alpha_i^* = \frac{b_i}{\|\varphi_i\|^2} \quad \text{and therefore} \quad u^* = \sum_{i=1}^n \frac{\langle v, \varphi_i \rangle}{\|\varphi_i\|^2} \varphi_i. \]
Example. Let $V = \mathbb{R}^m$ and let $\langle a, b \rangle = a^T b$. Let $U = \operatorname{Span}\{a_i\}_{i=1}^n$ with $n \leq m$, so that $\{a_i\}_{i=1}^n$ is a basis for $U$. Given $v \in \mathbb{R}^m$, we want to find $u^* = \sum_{i=1}^n \alpha_i^* a_i$ such that $\|v - u^*\| \leq \|v - u\|$ for all $u \in U$. We need to solve the normal equations $G\alpha^* = b$, where
\[ b_i = \langle v, a_i \rangle = a_i^T v, \qquad G_{ij} = \langle a_i, a_j \rangle = a_i^T a_j. \]
Let $A = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}$, so that $A^T A = G$ and $b = A^T v$. The normal equations become
\[ A^T A \alpha^* = A^T v, \]
i.e. $\alpha^*$ is the least squares solution of $A\alpha = v$. But $A^T A$ is ill-conditioned, so we shouldn't solve these normal equations; we use the QR approach instead.
1.10 Orthogonal Polynomials

Let $V = C[a, b]$ and $\langle f, g \rangle = \int_a^b w(x) f(x) g(x) \, dx$, where $w$ is the weight function, $w \in C(a, b)$ with $w \geq 0$ with possibly a finite number of zeros. This is required for the inner product to be well-defined:
\[ |\langle f, g \rangle| = \Big| \int_a^b w(x) f(x) g(x) \, dx \Big| \leq \int_a^b |w(x) f(x) g(x)| \, dx = \int_a^b w(x) |f(x) g(x)| \, dx \leq \int_a^b w(x) \, dx \; \max_{a \leq x \leq b} |f(x)| \, \max_{a \leq x \leq b} |g(x)|. \]
Therefore $\langle \cdot, \cdot \rangle$ is well-defined if $\int_a^b w(x) \, dx < \infty$.
Let U = P
n
be the polynomials of degree n. The natural basis
_
x
i
_
n
i=0
leads to an
ill-conditioned Gram matrix. We will construct a new basis for P
n
,
i

n
i=0
where

j
(x) is a monic polynomial of degree j, i.e.
j
(x) = x
j
+

j1
i=o
a
ij
x
i
. monic polynomial
Theorem 1.16. Monic orthogonal polynomials $\varphi_j \in P_j$ satisfy the three-term recurrence relation
\[ \varphi_{j+1}(x) = (x - a_j)\varphi_j(x) - b_j\varphi_{j-1}(x) \quad\text{for } j \ge 1, \]
where
\[ a_j = \frac{\langle x\varphi_j, \varphi_j\rangle}{\|\varphi_j\|^2} \quad\text{and}\quad b_j = \frac{\|\varphi_j\|^2}{\|\varphi_{j-1}\|^2}. \]
Proof. Let $\varphi_j \in P_j$ be monic. This implies that
\[ \varphi_{j+1}(x) - x\varphi_j(x) \in P_j, \qquad\text{so}\qquad \varphi_{j+1}(x) - x\varphi_j(x) = \sum_{k=0}^{j} b_k x^k = \sum_{k=0}^{j} c_k\varphi_k(x). \]
Now we need to find the $c_k$. We have
\[ \Big\langle \sum_{k=0}^{j} c_k\varphi_k(x), \varphi_i(x) \Big\rangle = \langle \varphi_{j+1}(x) - x\varphi_j(x), \varphi_i(x)\rangle. \]
But $\varphi_j$ is orthogonal to $\varphi_k$ for $k = 0,\dots,j-1$, and therefore $\varphi_j$ is orthogonal to any $p \in P_{j-1}$, as $\{\varphi_k\}_{k=0}^{j-1}$ is a basis for $P_{j-1}$. Then for $i = 0,\dots,j$,
\[ c_i\|\varphi_i\|^2 = \langle \varphi_{j+1}, \varphi_i\rangle - \langle x\varphi_j, \varphi_i\rangle = -\langle \varphi_j, x\varphi_i\rangle. \]
We have $x\varphi_i \in P_{i+1}$ and hence $\langle \varphi_j, x\varphi_i\rangle = 0$ if $i \le j-2$. Since $c_i\|\varphi_i\|^2 = -\langle \varphi_j, x\varphi_i\rangle$, we have $c_i = 0$ for $i = 0,\dots,j-2$. Hence
\[ \varphi_{j+1}(x) - x\varphi_j(x) = c_{j-1}\varphi_{j-1}(x) + c_j\varphi_j(x), \]
which implies
\[ \varphi_{j+1}(x) = (x + c_j)\varphi_j(x) + c_{j-1}\varphi_{j-1}(x). \]
We have
\[ c_{j-1} = -\frac{\langle \varphi_j, x\varphi_{j-1}\rangle}{\|\varphi_{j-1}\|^2}, \qquad c_j = -\frac{\langle \varphi_j, x\varphi_j\rangle}{\|\varphi_j\|^2}. \]
Now note that
\[ \langle \varphi_j, x\varphi_{j-1}\rangle = \underbrace{\langle \varphi_j, x\varphi_{j-1} - \varphi_j\rangle}_{=0} + \langle \varphi_j, \varphi_j\rangle, \]
where the first term vanishes because $x\varphi_{j-1} - \varphi_j \in P_{j-1}$ (both are monic of degree $j$). Therefore $c_{j-1} = -\|\varphi_j\|^2/\|\varphi_{j-1}\|^2$. Set $b_j = -c_{j-1}$ and $a_j = -c_j$. □
To apply this theorem we need $\varphi_0(x) = 1$ and $\varphi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ must be chosen such that $\langle \varphi_1, \varphi_0\rangle = 0$, i.e.
\[ \langle x - a_0, 1\rangle = 0 \iff a_0\langle 1, 1\rangle = \langle x, 1\rangle \iff a_0 = \frac{\langle x, 1\rangle}{\|1\|^2} = \frac{\langle x\varphi_0, \varphi_0\rangle}{\|\varphi_0\|^2}. \]
We can use the theorem for all $j \ge 0$ by setting $\varphi_{-1}(x) = 0$. Thus
\[ \varphi_{j+1}(x) = (x - a_j)\varphi_j(x) - b_j\varphi_{j-1}(x) \]
for $j \ge 0$, where
\[ a_j = \frac{\langle x\varphi_j, \varphi_j\rangle}{\|\varphi_j\|^2}, \qquad b_j = \frac{\|\varphi_j\|^2}{\|\varphi_{j-1}\|^2}, \qquad \varphi_0(x) = 1, \qquad \varphi_{-1}(x) = 0. \]
Remark 1.2. Recall that g(x) is even iff g(x) = g(x) or
_
2
2
g(x)dx = 2
_
2
0
g(x)dx.
and g(x) is odd iff g(x) = g(x) or
_
2
2
g(x)dx = 0. odd/even
function
Example. Let f, g) =
_
1
1
f(x)g(x)dx be our inner product (i.e. w(x) = 1 ). We
shall apply our method with j = 0 to this case. We have
0
(x) = 0 and
1
(x) = xa
0
which implies
1
(x) = x. Also
a
1
=
x
0
,
0
)
|
0
|
2
=
_
1
1
xdx
_
1
1
1dx
= 0
1.10. ORTHOGONAL POLYNOMIALS 29
(since x is an odd function). Using the method with j = 1 we deduce
2
(x) =
(x a
1
)
1
(x) b
1

0
(x) = x
2
a
1
x b
1
. Then
a
1
=
x
1
,
1
)
|
1
|
2
=
_
1
1
x
3
dx
|
1
|
2
= 0,
b
1
=
|
1
|
2
|
0
|
2
=
_
1
1
x
2
dx
_
1
1
1dx
=
1
3
.
So
2
(x) = x
2

1
3
and we can continue in this matter.
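The recurrence above can be run mechanically. A sketch (not part of the notes) using exact rational arithmetic, with polynomials stored as coefficient lists in increasing degree; it reproduces $\varphi_2 = x^2 - \frac{1}{3}$ and the next polynomial $\varphi_3 = x^3 - \frac{3}{5}x$:

```python
from fractions import Fraction

def inner(p, q):
    # <p, q> = integral over [-1, 1] of p(x) q(x); uses
    # int_{-1}^{1} x^k dx = 2/(k+1) for even k, 0 for odd k
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(r) if k % 2 == 0)

def shift(p):
    # multiply a polynomial by x
    return [Fraction(0)] + list(p)

def monic_orthogonal(nmax):
    phis = [[Fraction(1)]]          # phi_0 = 1
    prev = [Fraction(0)]            # phi_{-1} = 0
    for _ in range(nmax):
        phi = phis[-1]
        a = inner(shift(phi), phi) / inner(phi, phi)
        b = inner(phi, phi) / inner(prev, prev) if any(prev) else Fraction(0)
        # phi_{j+1} = (x - a_j) phi_j - b_j phi_{j-1}
        new = shift(phi)
        for k, c in enumerate(phi):
            new[k] -= a * c
        for k, c in enumerate(prev):
            new[k] -= b * c
        prev = phi
        phis.append(new)
    return phis

phis = monic_orthogonal(3)
```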
Recall now our original problem. Given $f \in C[a, b]$, we wish to find $p_n^* \in P_n$ such that $\|f - p_n^*\| \le \|f - p_n\|$ for all $p_n \in P_n$.

We wish to find an orthogonal basis $\{\varphi_j\}_{j=0}^{n}$ for $P_n$. Then $p_n^* = \sum_{j=0}^{n} \alpha_j^*\varphi_j(x)$. We solve the normal equations $G\alpha^* = \beta$ with $G \in \mathbb{R}^{(n+1)\times(n+1)}$, where for $i, j = 0,\dots,n$
\[ G_{ij} = \langle \varphi_i, \varphi_j\rangle = \begin{cases} 0 & \text{if } i \ne j, \\ \|\varphi_i\|^2 & \text{if } i = j, \end{cases} \qquad \beta_i = \langle f, \varphi_i\rangle, \qquad \alpha_i^* = \frac{\beta_i}{G_{ii}} = \frac{\beta_i}{\|\varphi_i\|^2}. \]
This implies that
\[ p_n^*(x) = \sum_{j=0}^{n} \frac{\langle f, \varphi_j\rangle}{\|\varphi_j\|^2}\,\varphi_j(x) \]
is the best approximation to $f$.
Example. Show that the polynomials $T_k(x) = \cos(k\cos^{-1}(x))$ for $-1 \le x \le 1$ are orthogonal with respect to the inner product $\langle f, g\rangle = \int_{-1}^{1}(1 - x^2)^{-1/2}f(x)g(x)\,dx$. Does $T_k(x)$ belong to $P_k$?

We have
\[ T_0(x) = \cos 0 = 1, \qquad T_1(x) = \cos(\cos^{-1}x) = x. \]
Let us use the change of variable $\theta = \cos^{-1}x$, so $x = \cos\theta$. Now we can write $T_k(x) = \cos k\theta$. Using $\cos((k+1)\theta) + \cos((k-1)\theta) = 2\cos k\theta\cos\theta$, we can deduce
\[ T_{k+1}(x) + T_{k-1}(x) = 2xT_k(x) \iff T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x). \]
We have
\[ T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1, \qquad T_3(x) = 2xT_2(x) - T_1(x) = 2^2x^3 - 3x. \]
By induction, $T_k(x) \in P_k$, and the coefficient of $x^k$ is $2^{k-1}$. Using $x = \cos\theta$,
\[ \langle T_k, T_j\rangle = \int_{-1}^{1}(1 - x^2)^{-1/2}\,T_k(x)T_j(x)\,dx = \int_{\pi}^{0}(\sin\theta)^{-1}\cos(k\theta)\cos(j\theta)\,(-\sin\theta)\,d\theta = \int_{0}^{\pi}\cos(k\theta)\cos(j\theta)\,d\theta \]
\[ = \frac{1}{2}\int_{0}^{\pi}\cos((j+k)\theta) + \cos((j-k)\theta)\,d\theta = \begin{cases} 0 & \text{if } j \ne k, \\ \frac{\pi}{2} & \text{if } j = k \ne 0, \\ \pi & \text{if } j = k = 0. \end{cases} \]
We call the $T_k(x)$ the Chebyshev polynomials.
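The substitution $x = \cos\theta$ also gives a quick numerical check of these orthogonality relations (an illustrative sketch, not part of the notes):

```python
import math

def T(k, x):
    # T_k(x) = cos(k arccos x) on [-1, 1]
    return math.cos(k * math.acos(x))

def cheb_inner(k, j, m=20000):
    # <T_k, T_j> with weight (1 - x^2)^{-1/2}; after x = cos(theta)
    # it becomes the plain integral of cos(k t) cos(j t) over [0, pi],
    # computed here with the midpoint rule
    h = math.pi / m
    s = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        s += math.cos(k * t) * math.cos(j * t)
    return s * h

vals = {(k, j): cheb_inner(k, j) for k in range(3) for j in range(3)}
```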
Chapter 2

Polynomial interpolation

Given $(z_j, f_j)_{j=0}^{n}$, $z_j, f_j \in \mathbb{C}$, with the $z_j$ distinct, we want to find a polynomial $p_n(z) \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0,\dots,n$. Call such a $p_n$ the interpolating polynomial. To prove that this polynomial exists:

Lemma 2.1 (Lagrange Basis Function). Let
\[ l_j(z) = \prod_{k=0,\,k\ne j}^{n}\frac{z - z_k}{z_j - z_k} \]
for $j = 0,\dots,n$. Then $l_j(z) \in P_n$ and $l_j(z_r) = \delta_{jr}$ for $j, r = 0,\dots,n$.

Proof. For $j = 0,\dots,n$, $l_j(z)$ is a product of $n$ factors of the form $\frac{z - z_k}{z_j - z_k}$ and therefore $l_j(z) \in P_n$. We have
\[ l_j(z_r) = \prod_{k=0,\,k\ne j}^{n}\frac{z_r - z_k}{z_j - z_k} \]
for $r = 0,\dots,n$. If $r = j$, then clearly $l_j(z_r) = 1$. Otherwise the factor with $k = r$ gives $\frac{z_r - z_k}{z_j - z_k} = 0$, and so $l_j(z_r) = 0$. Hence $l_j(z_r) = \delta_{rj}$. □

Lemma 2.2. The interpolating polynomial $p_n(z) \in P_n$ for data $(z_j, f_j)_{j=0}^{n}$ with $z_j$ distinct is
\[ p_n(z) = \sum_{j=0}^{n} f_j\,l_j(z). \]

Note. Call $p_n$ in this form the Lagrange form of the interpolating polynomial.

Proof. We have $p_n(z) \in P_n$ since each $l_j(z) \in P_n$. Also, by the previous lemma, for $r = 0,\dots,n$,
\[ p_n(z_r) = \sum_{j=0}^{n} f_j\,l_j(z_r) = \sum_{j=0}^{n} f_j\,\delta_{jr} = f_r. \qquad\Box \]
To prove the uniqueness of the interpolating polynomial we use:

Theorem 2.3 (Fundamental Theorem of Algebra). Let $p_n(z) = a_0 + a_1 z + \dots + a_n z^n \in P_n$ where $a_i \in \mathbb{C}$. Then $p_n(z)$ has at most $n$ distinct roots in $\mathbb{C}$, unless $a_i = 0$ for $i = 0,\dots,n$.

Lemma 2.4. Given $(z_j, f_j)_{j=0}^{n}$ with $z_j$ distinct, there exists a unique interpolating polynomial $p_n(z) \in P_n$.

Proof. Assume the contrary, i.e. that there exists $q_n \in P_n$, $q_n \ne p_n$, such that $p_n(z_j) = q_n(z_j) = f_j$ for $j = 0,\dots,n$. Consider the polynomial $(p_n - q_n) \in P_n$. Then
\[ (p_n - q_n)(z_j) = p_n(z_j) - q_n(z_j) = 0 \]
for $j = 0,\dots,n$. Hence $(p_n - q_n)$ has $n+1$ roots and therefore, by the Fundamental Theorem of Algebra, is identically $0$, i.e. $p_n = q_n$. Hence $p_n$ is unique. □
Example (of interpolating polynomial). For $n = 2$: find $p_2 \in P_2$ such that $p_2(0) = a$, $p_2(1) = b$ and $p_2(4) = c$. We get
\[ l_0(z) = \frac{(z - z_1)(z - z_2)}{(z_0 - z_1)(z_0 - z_2)} = \frac{(z - 1)(z - 4)}{(0 - 1)(0 - 4)} = \frac{1}{4}(z^2 - 5z + 4), \]
\[ l_1(z) = \frac{(z - 0)(z - 4)}{(1 - 0)(1 - 4)} = -\frac{1}{3}(z^2 - 4z), \qquad l_2(z) = \frac{(z - 0)(z - 1)}{(4 - 0)(4 - 1)} = \frac{1}{12}(z^2 - z). \]
Hence
\[ p_2(z) = a\,l_0(z) + b\,l_1(z) + c\,l_2(z) = \Big(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\Big)z^2 - \Big(\frac{5a}{4} - \frac{4b}{3} + \frac{c}{12}\Big)z + a \]
in Lagrange form.
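The Lagrange form translates directly into code. A sketch (not part of the notes; the values $a, b, c = 2, 5, -1$ are arbitrary test data):

```python
def lagrange_interpolate(zs, fs):
    # returns p_n(z) = sum_j f_j l_j(z), with l_j the Lagrange basis
    def p(z):
        total = 0.0
        for j, (zj, fj) in enumerate(zip(zs, fs)):
            lj = 1.0
            for k, zk in enumerate(zs):
                if k != j:
                    lj *= (z - zk) / (zj - zk)
            total += fj * lj
        return total
    return p

# the worked example: p_2(0) = a, p_2(1) = b, p_2(4) = c
a, b, c = 2.0, 5.0, -1.0
p2 = lagrange_interpolate([0.0, 1.0, 4.0], [a, b, c])
```

Evaluating $p_2$ at the nodes returns the data exactly, and at any other point it matches the expanded formula above.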
We are interested in finding the coefficients of the interpolating polynomial in the canonical form
\[ p_n(z) = \sum_{k=0}^{n} a_k z^k. \]
Consider the equations
\[ p_n(z_j) = \sum_{k=0}^{n} a_k z_j^k = f_j \]
for $j = 0,\dots,n$. We get the system of equations
\[ \underbrace{\begin{pmatrix} 1 & z_0 & z_0^2 & \dots & z_0^n \\ 1 & z_1 & z_1^2 & \dots & z_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & z_n & z_n^2 & \dots & z_n^n \end{pmatrix}}_{V} \begin{pmatrix} a_0 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} f_0 \\ \vdots \\ f_n \end{pmatrix}. \]
Call $V$ the Vandermonde matrix. So we need to solve $Va = f$. In general, $V$ is ill-conditioned. Instead of the canonical basis $\{z^k\}_{k=0}^{n}$, we can use the Lagrange basis $\{l_k(z)\}_{k=0}^{n}$, in which the system becomes $Ia = f$. However, the Lagrange basis has to be constructed.

Assume we have found $p_{n-1} \in P_{n-1}$ interpolating $(z_j, f_j)_{j=0}^{n-1}$ and are given a new data point $(z_n, f_n)$. One cannot reuse $p_{n-1}$ to compute $p_n$, since it is necessary to compute a new Lagrange basis for $P_n$.
We now look for an alternative construction. If $p_{n-1} \in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$ for $j = 0,\dots,n-1$, let $p_n \in P_n$ be such that $p_n(z_j) = f_j$ for $j = 0,\dots,n$ and
\[ p_n(z) = p_{n-1}(z) + c\prod_{k=0}^{n-1}(z - z_k). \]
Clearly $p_n(z_j) = p_{n-1}(z_j) = f_j$ for $j = 0,\dots,n-1$. Choose $c \in \mathbb{C}$ such that
\[ p_n(z_n) = p_{n-1}(z_n) + c\prod_{k=0}^{n-1}(z_n - z_k) = f_n, \]
that is,
\[ c = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1}(z_n - z_k)}. \]
Therefore $c$ depends on $(z_j, f_j)_{j=0}^{n}$. We will use the notation $c = f[z_0, z_1, \dots, z_n]$, so that
\[ p_n(z) = p_{n-1}(z) + f[z_0, z_1, \dots, z_n]\prod_{k=0}^{n-1}(z - z_k). \]
That is, the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \dots, z_n]$.

Note that since the $p_n$ such that $p_n(z_j) = f_j$, $j = 0,\dots,n$, is unique,
\[ f[z_{\sigma(0)}, \dots, z_{\sigma(n)}] = f[z_0, \dots, z_n] \]
for any permutation $\sigma$ of $\{0, 1, \dots, n\}$.
Lemma 2.5. For $(z_j, f_j)_{j=0}^{n}$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct,
\[ f[z_0, z_1, \dots, z_n] = \sum_{j=0}^{n}\frac{f_j}{\prod_{k=0,\,k\ne j}^{n}(z_j - z_k)}. \]
Furthermore, if $f_j = f(z_j)$, $j = 0,\dots,n$, for some function $f(z)$, then $f[z_0, \dots, z_n] = 0$ if $f \in P_{n-1}$.

Proof. Compare the coefficient of $z^n$ in the Lagrange form of $p_n$ with
\[ p_n(z) = p_{n-1}(z) + f[z_0, \dots, z_n]\prod_{k=0}^{n-1}(z - z_k) = \sum_{j=0}^{n} f_j\Bigg(\prod_{k=0,\,k\ne j}^{n}\frac{z - z_k}{z_j - z_k}\Bigg) = \sum_{j=0}^{n} f_j\,\frac{z^n + \dots}{\prod_{k=0,\,k\ne j}^{n}(z_j - z_k)}. \]
Clearly the coefficient of $z^n$ in the Lagrange form is
\[ \sum_{j=0}^{n}\frac{f_j}{\prod_{k=0,\,k\ne j}^{n}(z_j - z_k)} = f[z_0, \dots, z_n]. \]
If $f_j = f(z_j)$ for some $f \in P_{n-1}$, then $p_n = f \in P_{n-1}$, as the interpolating polynomial is unique. Therefore the leading coefficient of $p_n$, namely $f[z_0, \dots, z_n]$, is $0$. □
Note that
\[ p_n(z) = p_{n-1}(z) + f[z_0, \dots, z_n]\prod_{k=0}^{n-1}(z - z_k), \]
\[ p_{n-1}(z) = p_{n-2}(z) + f[z_0, \dots, z_{n-1}]\prod_{k=0}^{n-2}(z - z_k), \]
\[ \vdots \]
\[ p_1(z) = p_0(z) + f[z_0, z_1](z - z_0), \qquad p_0(z) = f_0 = f[z_0], \]
and so we can write
\[ p_n(z) = f[z_0] + \sum_{j=1}^{n} f[z_0, \dots, z_j]\prod_{k=0}^{j-1}(z - z_k). \]
Call this the Newton form of the interpolating polynomial.
2.1 Divided difference

Call $f[z_0, \dots, z_n]$ the divided difference.

Theorem 2.6. For any distinct complex numbers $z_0, z_1, \dots, z_{n+1}$, the divided difference satisfies the recurrence
\[ f[z_0, z_1, \dots, z_{n+1}] = \frac{f[z_0, \dots, z_n] - f[z_1, \dots, z_{n+1}]}{z_0 - z_{n+1}}. \]
Proof. Given $(z_j, f_j)_{j=0}^{n+1}$, we construct $p_n, q_n \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0,\dots,n$ and $q_n(z_j) = f_j$ for $j = 1,\dots,n+1$. Observe that $f[z_0, \dots, z_n]$ is the coefficient of $z^n$ in $p_n(z)$ and that $f[z_1, \dots, z_{n+1}]$ is the coefficient of $z^n$ in $q_n(z)$. Then
\[ r_{n+1}(z) = \frac{(z - z_{n+1})\,p_n(z) - (z - z_0)\,q_n(z)}{z_0 - z_{n+1}} \in P_{n+1}, \]
and hence
\[ r_{n+1}(z_0) = p_n(z_0) = f_0, \qquad r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1}, \]
\[ r_{n+1}(z_j) = \frac{(z_j - z_{n+1})f_j - (z_j - z_0)f_j}{z_0 - z_{n+1}} = f_j \]
for $j = 1,\dots,n$. Therefore $r_{n+1}(z)$ is the interpolating polynomial of $(z_j, f_j)_{j=0}^{n+1}$. Since $f[z_0, \dots, z_{n+1}]$ is the coefficient of $z^{n+1}$ in $r_{n+1}(z)$,
\[ f[z_0, \dots, z_{n+1}] = \frac{f[z_0, \dots, z_n] - f[z_1, \dots, z_{n+1}]}{z_0 - z_{n+1}}. \qquad\Box \]
2.2 Finding the error

Given $(z_j, f_j)_{j=0}^{n}$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct, there exists an interpolating polynomial $p_n \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0,\dots,n$. The Newton form of $p_n(z)$ is
\[ p_n(z) = f[z_0] + \sum_{j=1}^{n} f[z_0, \dots, z_j]\prod_{k=0}^{j-1}(z - z_k). \]
Theorem 2.6 gives a recurrence relation
\[ f[z_0, \dots, z_{j+1}] = \frac{f[z_0, \dots, z_j] - f[z_1, \dots, z_{j+1}]}{z_0 - z_{j+1}}. \]
We can construct a divided difference table:
\[ \begin{array}{llll} z_0, & f[z_0], & & \\ z_1, & f[z_1], & f[z_0, z_1], & \\ z_2, & f[z_2], & f[z_1, z_2], & f[z_0, z_1, z_2], \\ \vdots & \vdots & \vdots & \\ z_n, & f[z_n], & f[z_{n-1}, z_n], & \dots,\; f[z_0, \dots, z_n]. \end{array} \]
Note that the diagonal entries appear in the Newton form of $p_n(z)$.
Example. For $n = 2$ and
\[ (z_j, f_j)_{j=0}^{2} = \{(0, a), (1, b), (4, c)\}, \]
we have
\[ f[z_0] = f_0 = a, \]
\[ f[z_1] = f_1 = b, \qquad f[z_0, z_1] = \frac{f[z_0] - f[z_1]}{z_0 - z_1} = \frac{a - b}{-1} = b - a, \]
\[ f[z_2] = f_2 = c, \qquad f[z_1, z_2] = \frac{f[z_1] - f[z_2]}{z_1 - z_2} = \frac{b - c}{-3} = \frac{c - b}{3}. \]
Therefore
\[ f[z_0, z_1, z_2] = \frac{(b - a) - \frac{c - b}{3}}{-4} = \frac{a}{4} - \frac{b}{3} + \frac{c}{12}, \]
and so the Newton form of $p_2(z)$ is
\[ p_2(z) = a + (b - a)(z - z_0) + \Big(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\Big)(z - z_0)(z - z_1). \]
Theorem 2.7. Let $p_n(z)$ interpolate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^{n}$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is
\[ e(z) = f[z_0, \dots, z_n, z]\prod_{k=0}^{n}(z - z_k) \]
for $z \ne z_j$, and $e(z_j) = 0$ for $j = 0,\dots,n$.

Proof. The polynomial $p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n}$. Add a new distinct point $z$. The Newton form of $p_{n+1}(z)$ is
\[ p_{n+1}(z) = p_n(z) + f[z_0, \dots, z_n, z]\prod_{k=0}^{n}(z - z_k), \]
and $p_{n+1}$ agrees with $f$ at the new point $z$. Therefore
\[ e(z) = f(z) - p_n(z) = f[z_0, \dots, z_n, z]\prod_{k=0}^{n}(z - z_k). \qquad\Box \]
Theorem 2.8. Let $f \in C^n[x_0, x_n]$ ($f$ and its first $n$ derivatives continuous over $[x_0, x_n]$), with the $x_i$ ordered: $x_0 < x_1 < \dots < x_n$. Then there exists $\xi \in [x_0, x_n]$ such that
\[ f[x_0, x_1, \dots, x_n] = \frac{1}{n!}\,f^{(n)}(\xi). \]

Proof. Let $p_n(x)$ interpolate $f$ at the $x_i$, $i = 0,\dots,n$. Let $e(x) = f(x) - p_n(x)$, so that $e(x_i) = 0$, $i = 0,\dots,n$; therefore $e(x)$ has at least $n+1$ zeros in $[x_0, x_n]$. By Rolle's Theorem,
\[ e'(x) \text{ has at least } n \text{ zeros in } [x_0, x_n], \]
\[ e''(x) \text{ has at least } n-1 \text{ zeros in } [x_0, x_n], \]
\[ \vdots \]
\[ e^{(n)}(x) \text{ has at least } 1 \text{ zero in } [x_0, x_n]; \text{ call it } \xi. \]
Since $e(x) = f(x) - p_n(x)$, we have $e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x)$. The Newton form of $p_n(x)$ is
\[ p_n(x) = p_{n-1}(x) + f[x_0, \dots, x_n]\prod_{i=0}^{n-1}(x - x_i) = f[x_0, \dots, x_n]\,x^n + \dots. \]
Therefore
\[ p_n^{(n)}(x) = n!\,f[x_0, \dots, x_n], \]
and hence, as $e^{(n)}(\xi) = 0$,
\[ f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!\,f[x_0, \dots, x_n]. \qquad\Box \]
Theorem 2.9. Let $f \in C^{n+1}[a, b]$ and let $\{x_i\}_{i=0}^{n}$ be distinct points in $[a, b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^{n}$, then $e(x) = f(x) - p_n(x)$ satisfies
\[ |e(x)| \le \frac{1}{(n+1)!}\,\Big|\prod_{i=0}^{n}(x - x_i)\Big|\,\max_{a\le y\le b}\big|f^{(n+1)}(y)\big| \]
for all $x \in [a, b]$.

Proof. The result is trivially true at $x = x_i$, since $e(x_i) = 0$ for $i = 0,\dots,n$. From Theorem 2.7,
\[ e(x) = f[x_0, \dots, x_n, x]\prod_{k=0}^{n}(x - x_k). \]
From Theorem 2.8, there exists $\xi_x \in [a, b]$ such that
\[ e(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}\prod_{k=0}^{n}(x - x_k). \]
Therefore
\[ |e(x)| = \frac{1}{(n+1)!}\,\Big|\prod_{k=0}^{n}(x - x_k)\Big|\,\big|f^{(n+1)}(\xi_x)\big| \le \frac{1}{(n+1)!}\,\Big|\prod_{k=0}^{n}(x - x_k)\Big|\,\max_{a\le y\le b}\big|f^{(n+1)}(y)\big|. \qquad\Box \]
Definition. The infinity norm of $g \in C[a, b]$ is
\[ \|g\|_\infty = \max_{a\le x\le b}|g(x)|. \]

Note. Beware that $\|f - p_n\|_\infty \not\to 0$ as $n \to \infty$ in all cases.
Example.

1. Let $[a, b] = [-\frac{1}{2}, \frac{1}{2}]$, $f(x) = e^x$, $x_i \in [a, b]$, $i = 0,\dots,n$. Then $|x - x_i| \le 1$ and so $\|\prod_{i=0}^{n}(x - x_i)\|_\infty \le 1$. Also $\|f^{(n+1)}\|_\infty = \|e^x\|_\infty = e^{1/2}$. Therefore
\[ \|f - p_n\|_\infty \le \frac{e^{1/2}}{(n+1)!} \to 0 \quad\text{as } n \to \infty. \]

2. For any $[a, b]$ and $f(x) = \cos x$, $\|f^{(n+1)}\|_\infty \le 1$. Also
\[ \|\cos - p_n\|_\infty \le \frac{(b - a)^{n+1}}{(n+1)!} \to 0 \]
as $n \to \infty$. Therefore $p_n(x) \to \cos x$ for all $x$.

3. Let $[a, b] = [0, 1]$, $f(x) = (1 + x)^{-1}$. Then
\[ f'(x) = (-1)(1 + x)^{-2}, \qquad\dots,\qquad f^{(n+1)}(x) = (-1)^{n+1}(1 + x)^{-(n+2)}(n+1)!, \]
and therefore $\|f^{(n+1)}\|_\infty \le (n+1)!$. Hence
\[ \|f - p_n\|_\infty \le \frac{1}{(n+1)!}\,(n+1)! = 1 \not\to 0 \]
as $n \to \infty$, so in this case the bound does not guarantee convergence.
2.3 Best Approximation

Given $[a, b]$ and $f \in C[a, b]$, we want to choose the interpolation points $\{x_k\}_{k=0}^{n}$ in $[a, b]$ to minimize $\|\prod_{k=0}^{n}(x - x_k)\|_\infty$, i.e. to find
\[ \min_{\{x_k\}_{k=0}^{n}}\,\Big\|\prod_{k=0}^{n}(x - x_k)\Big\|_\infty, \]
i.e.
\[ \min_{q_n \in P_n}\,\|x^{n+1} - q_n(x)\|_\infty. \]
Consider the more general problem: to find
\[ \min_{q_n \in P_n}\,\|g - q_n\|_\infty, \]
that is, to find $q_n^*$ such that
\[ \|g - q_n^*\|_\infty \le \|g - q_n\|_\infty \]
for all $q_n \in P_n$. Call such a $q_n^* \in P_n$ the best approximation.
Theorem 2.10 (Equioscillation Property). Let $g \in C[a, b]$ and $n \ge 0$. Suppose there exist $q_n^* \in P_n$ and $n+2$ distinct points $\{x_j^*\}_{j=0}^{n+1}$,
\[ a \le x_0^* < x_1^* < \dots < x_{n+1}^* \le b, \]
such that
\[ g(x_j^*) - q_n^*(x_j^*) = \sigma(-1)^j\,\|g - q_n^*\|_\infty \]
for $j = 0,\dots,n+1$, where $\sigma = \pm 1$. Then $q_n^*$ is the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$, that is,
\[ \|g - q_n^*\|_\infty \le \|g - q_n\|_\infty \]
for all $q_n \in P_n$.

Note. Call $\{x_k^*\}_{k=0}^{n+1}$ the equioscillation points.
Proof. Let $E = \|g - q_n^*\|_\infty$. If $E = 0$, then $q_n^* = g$ is the best approximation. If $E > 0$, suppose that there exists $q_n \in P_n$ such that $\|g - q_n\|_\infty < E$. Consider $q_n^* - q_n \in P_n$ at the $n+2$ points $\{x_j^*\}_{j=0}^{n+1}$:
\[ q_n^*(x_j^*) - q_n(x_j^*) = \big(q_n^*(x_j^*) - g(x_j^*)\big) + \big(g(x_j^*) - q_n(x_j^*)\big) = \sigma(-1)^{j+1}E + \eta_j, \]
with $\eta_j \in \mathbb{R}$ and $|\eta_j| < E$. Thus
\[ \mathrm{sgn}\big((q_n^* - q_n)(x_j^*)\big) = \mathrm{sgn}\big(\sigma(-1)^{j+1}E\big), \]
and therefore $q_n^* - q_n \in P_n$ changes sign $n+1$ times and hence has $n+1$ roots. Then, by the Fundamental Theorem of Algebra, $q_n^* \equiv q_n$, contradicting $\|g - q_n\|_\infty < E$. So such a $q_n$ does not exist and $q_n^*$ is the best approximation. □
Theorem 2.11 (Chebyshev Equioscillation Theorem). Let $g \in C[a, b]$ and $n \ge 0$. Then there exists a unique $q_n^* \in P_n$ satisfying the equioscillation property, and hence
\[ \|g - q_n^*\|_\infty \le \|g - q_n\|_\infty \]
for all $q_n \in P_n$.

Note. The construction of $q_n^*$ is difficult in general, which is why one often uses best approximation in the least squares sense instead. But if $g(x) = x^{n+1}$, the construction of $q_n^*$ is easy.
Lemma 2.12. If $g(x) = x^{n+1}$ on $[-1, 1]$, then the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$ is
\[ q_n^*(x) = x^{n+1} - 2^{-n}T_{n+1}(x), \]
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n+1$, i.e. $T_{n+1}(x) = \cos((n+1)\cos^{-1}x)$.

Proof. We first need to show that $q_n^*$ is really in $P_n$: recall that
\[ T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), \]
and so
\[ T_{n+1}(x) = 2^n x^{n+1} + \dots. \]
Therefore $q_n^* \in P_n$.

The error is $x^{n+1} - q_n^*(x) = 2^{-n}T_{n+1}(x)$ for $x \in [-1, 1]$. Change the variable: $x = \cos\theta$, so $\theta = \cos^{-1}x \in [0, \pi]$. Then $T_{n+1}(x) = \cos((n+1)\theta)$. Hence
\[ \max_{-1\le x\le 1}\big|\cos((n+1)\cos^{-1}x)\big| = 1, \qquad\text{so}\qquad \|x^{n+1} - q_n^*\|_\infty = 2^{-n}. \]
Choose
\[ \theta_j^* = \frac{j\pi}{n+1} \]
for $j = 0,\dots,n+1$, and so $x_j^* = \cos\theta_j^* = \cos\frac{j\pi}{n+1}$. Then
\[ T_{n+1}(x_j^*) = \cos((n+1)\theta_j^*) = \cos(j\pi) = (-1)^j. \]
Hence $x^{n+1} - 2^{-n}T_{n+1}(x)$ satisfies the equioscillation property and is thus the best approximation to $x^{n+1}$ in $P_n$. □
Note. The points are equally spaced in terms of $\theta$, but clustered around the end points $\pm 1$ in terms of $x$.

Example. The interpolation points are the zeros of the error. Therefore
\[ \prod_{j=0}^{n}(x - x_j) = x^{n+1} - q_n^* = 2^{-n}T_{n+1}(x). \]
Choose
\[ \theta_j = \frac{(2j+1)\pi}{2(n+1)} \qquad\text{and so}\qquad x_j = \cos\Big(\frac{(2j+1)\pi}{2(n+1)}\Big). \]
Then
\[ T_{n+1}(x_j) = \cos((n+1)\theta_j) = \cos\Big(\frac{(2j+1)\pi}{2}\Big) = 0. \]
Therefore
\[ \Big\{\cos\frac{(2j+1)\pi}{2(n+1)}\Big\}_{j=0}^{n} \]
are the optimal Chebyshev interpolation points for $p_n \in P_n$ on $[-1, 1]$.
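The effect of the Chebyshev points can be seen on Runge's function $f(x) = 1/(1 + 25x^2)$, a standard test case (this sketch is not part of the notes):

```python
import math

def interp_max_error(f, xs, m=2000):
    # max |f - p_n| on [-1, 1], with p_n through the nodes xs
    # evaluated in Lagrange form on a fine grid
    def p(x):
        total = 0.0
        for j, xj in enumerate(xs):
            lj = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    lj *= (x - xk) / (xj - xk)
            total += f(xj) * lj
        return total
    grid = [-1.0 + 2.0 * i / m for i in range(m + 1)]
    return max(abs(f(x) - p(x)) for x in grid)

runge = lambda x: 1.0 / (1.0 + 25.0 * x * x)
n = 10
equi = [-1.0 + 2.0 * j / n for j in range(n + 1)]
cheb = [math.cos((2 * j + 1) * math.pi / (2 * (n + 1))) for j in range(n + 1)]
err_equi = interp_max_error(runge, equi)
err_cheb = interp_max_error(runge, cheb)
```

With equispaced points the error oscillates wildly near $\pm 1$ (Runge's phenomenon); with the Chebyshev points it is an order of magnitude smaller at the same degree.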
Generalize this for an interval $[a, b]$. For $x \in [a, b]$, introduce $t = \frac{2x - (a+b)}{b - a} \in [-1, 1]$, so $x = \frac{1}{2}[(b-a)t + (a+b)]$. Then the optimal interpolation points for $[a, b]$ are
\[ x_j = \frac{1}{2}\Big[(b - a)\cos\Big(\frac{(2j+1)\pi}{2(n+1)}\Big) + (a + b)\Big] \]
for $j = 0,\dots,n$.

Proof. We need to find
\[ \min_{\{x_j\}_{j=0}^{n} \subset [a,b]}\,\Big\|\prod_{j=0}^{n}(x - x_j)\Big\|_\infty. \]
That is, to find
\[ \min_{q_n \in P_n}\,\bigg\|\Big(\frac{b - a}{2}\Big)^{n+1}\Big(\frac{2x - (a+b)}{b - a}\Big)^{n+1} - q_n(x)\bigg\|_\infty \]
on $[a, b]$, which is the same as finding
\[ \min_{\tilde q_n \in P_n}\,\Big(\frac{b - a}{2}\Big)^{n+1}\,\|t^{n+1} - \tilde q_n(t)\|_\infty \]
on $[-1, 1]$, with
\[ q_n(x) = \Big(\frac{b - a}{2}\Big)^{n+1}\tilde q_n(t). \]
Therefore
\[ q_n^*(x) = \Big(\frac{b - a}{2}\Big)^{n+1}\bigg[\Big(\frac{2x - (a+b)}{b - a}\Big)^{n+1} - 2^{-n}T_{n+1}\Big(\frac{2x - (a+b)}{b - a}\Big)\bigg]. \]
Using the Equioscillation Property, we get
\[ t_j^* = \cos\frac{j\pi}{n+1} \qquad\text{and so}\qquad x_j^* = \frac{(b - a)\cos\frac{j\pi}{n+1} + a + b}{2}. \qquad\Box \]
2.4 Piecewise Polynomial Interpolation

We can try to decrease the error of polynomial interpolation either by increasing the order of the interpolating polynomial or by decreasing the spacing between individual interpolation points (i.e. increasing their number).

We can also consider piecewise linears. For given ordered, equally spaced interpolation points $\{x_i\}_{i=0}^{n}$ with $x_0 = a$, $x_n = b$, $x_j - x_{j-1} = h$, we can use linear interpolation on each subinterval $[x_{j-1}, x_j]$ for $j = 1,\dots,n$. Define, for $x \in [x_{j-1}, x_j]$, $j = 1,\dots,n$,
\[ P_L(x) = f(x_{j-1}) + \frac{x - x_{j-1}}{h}\,\big(f(x_j) - f(x_{j-1})\big), \]
and so $P_L(x_{j-1}) = f(x_{j-1})$, $P_L(x_j) = f(x_j)$. The error is
\[ \|f - P_L\|_\infty = \max_{a\le x\le b}|f(x) - P_L(x)| = \max_{j=1,\dots,n}\Big(\max_{x_{j-1}\le x\le x_j}|f(x) - P_L(x)|\Big) = \max_{j=1,\dots,n}\Big(\max_{x_{j-1}\le x\le x_j}\frac{|(x - x_{j-1})(x - x_j)|}{2!}\,|f''(z_j)|\Big), \]
where $z_j \in (x_{j-1}, x_j)$. Since the maximum of $|(x - x_{j-1})(x - x_j)|$ occurs at $x = (x_{j-1} + x_j)/2$,
\[ \|f - P_L\|_\infty \le \max_{j=1,\dots,n}\Big(\frac{h^2}{8}\,|f''(z_j)|\Big) \le \frac{h^2}{8}\,\|f''\|_\infty. \]
Then, for $h \to 0$, $P_L \to f$ provided $f \in C^2[a, b]$. We can generalize this method to piecewise quadratics, cubics, etc.
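The $h^2/8$ bound can be checked numerically; a sketch (not part of the notes) using $f = \sin$ on $[0, \pi]$, where $\|f''\|_\infty = 1$, and halving $h$ to see the error drop by a factor of about four:

```python
import math

def piecewise_linear_error(f, a, b, n, m=4000):
    # max |f - P_L| for the piecewise linear interpolant P_L on n
    # equal subintervals of [a, b], estimated on a fine grid
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    def PL(x):
        j = min(int((x - a) / h), n - 1)
        t = (x - xs[j]) / h
        return (1 - t) * f(xs[j]) + t * f(xs[j + 1])
    grid = [a + (b - a) * i / m for i in range(m + 1)]
    return max(abs(f(x) - PL(x)) for x in grid)

e10 = piecewise_linear_error(math.sin, 0.0, math.pi, 10)
e20 = piecewise_linear_error(math.sin, 0.0, math.pi, 20)
bound10 = (math.pi / 10) ** 2 / 8   # h^2/8 * max|f''|
```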
Chapter 3

Quadrature (Numerical Integration)

We are given an interval $[a, b]$ and a weight function $w(x) \in C(a, b)$ such that $w(x) > 0$ except for a finite number of zeros, and $\int_a^b w(x)\,dx < \infty$. Now, given a function $f(x)$, we want to approximate
\[ I(f) = \int_a^b w(x)f(x)\,dx \]
by approximating $f(x)$ by an interpolating polynomial $p_n(x)$, that is, we approximate $I(f)$ by
\[ I_n(f) = I(p_n) = \int_a^b w(x)p_n(x)\,dx. \]
The Lagrange form of $p_n(x)$ is
\[ p_n(x) = \sum_{k=0}^{n} f(x_k)\,l_k(x), \qquad l_k(x) = \prod_{j=0,\,j\ne k}^{n}\frac{x - x_j}{x_k - x_j}. \]
Hence
\[ I(p_n) = \int_a^b w(x)\Big(\sum_{k=0}^{n} f(x_k)\,l_k(x)\Big)dx = \sum_{k=0}^{n} f(x_k)\int_a^b w(x)\,l_k(x)\,dx = \sum_{k=0}^{n} w_k f(x_k), \]
where $w_k = \int_a^b w(x)\,l_k(x)\,dx$ for $k = 0,\dots,n$.
Example. Let $[a, b] = [0, 1]$ and $w(x) = x^{-1/2}$, with $n = 1$, $x_0 = 0$, $x_1 = 1$. Approximate $I(f) = \int_0^1 x^{-1/2}f(x)\,dx$ by
\[ I_1(f) = \sum_{k=0}^{1} w_k f(x_k), \]
where
\[ w_0 = \int_0^1 x^{-1/2}(1 - x)\,dx = \bigg[\frac{x^{1/2}}{1/2} - \frac{x^{3/2}}{3/2}\bigg]_0^1 = \frac{4}{3}, \qquad w_1 = \int_0^1 x^{-1/2}\,x\,dx = \frac{2}{3}. \]
Hence
\[ I_1(f) = \frac{4}{3}f(x_0) + \frac{2}{3}f(x_1). \]
If instead $w(x) \equiv 1$, we get
\[ I_1(f) = \frac{1}{2}\big[f(x_0) + f(x_1)\big], \]
the trapezium rule.
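The weights can also be computed numerically; a sketch (not part of the notes). The substitution $x = u^2$, $dx = 2u\,du$ removes the singularity of the weight, since $x^{-1/2}\,dx = 2\,du$:

```python
def midpoint(g, a, b, m=20000):
    # composite midpoint rule for a smooth integrand
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

# w_k = int_0^1 x^{-1/2} l_k(x) dx = 2 int_0^1 l_k(u^2) du
w0 = midpoint(lambda u: 2.0 * (1.0 - u * u), 0.0, 1.0)  # l_0(x) = 1 - x
w1 = midpoint(lambda u: 2.0 * (u * u), 0.0, 1.0)        # l_1(x) = x
```

Both computed weights match $4/3$ and $2/3$ to high accuracy.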
In general, the error of the approximation is
\[ |I(f) - I_n(f)| = \Big|\int_a^b w(x)\big[f(x) - p_n(x)\big]dx\Big| \le \int_a^b w(x)\,dx\;\|f - p_n\|_\infty. \]
The error is zero if $f \in P_n$, regardless of the interpolation (sampling) points $\{x_k\}_{k=0}^{n}$. Otherwise, we can choose $\{x_k\}_{k=0}^{n}$ in a smart way so that $I_n(f) = I(f)$ for all $f \in P_m$, where $m > n$ is as large as possible.
Lemma 3.1. The orthogonal polynomial $\varphi_n$ has $n$ distinct roots in $[a, b]$.

Proof. Let $\ell$ denote the number of sign changes of $\varphi_n$ in $[a, b]$. If $\ell < n$, let $x_1, \dots, x_\ell$ denote the ordered points in $[a, b]$ where $\varphi_n$ changes sign, and consider $q_\ell(x) = (x - x_1)\cdots(x - x_\ell)$. The product $\varphi_n q_\ell$ then has one sign on $[a, b]$, and there are two possibilities. If it is positive, then
\[ \langle \varphi_n, q_\ell\rangle = \int_a^b w(x)\underbrace{\varphi_n(x)q_\ell(x)}_{>0 \text{ except at the } x_i}dx > 0; \]
if it is negative, then
\[ \langle \varphi_n, q_\ell\rangle = \int_a^b w(x)\underbrace{\varphi_n(x)q_\ell(x)}_{<0 \text{ except at the } x_i}dx < 0. \]
Either way $\langle \varphi_n, q_\ell\rangle \ne 0$: a contradiction, as $\varphi_n$ is the orthogonal polynomial of degree $n$, i.e. orthogonal to all polynomials in $P_{n-1}$, and $q_\ell \in P_{n-1}$. Therefore $\ell \ge n$, and since $\varphi_n$ has degree $n$, it has exactly $n$ distinct roots in $[a, b]$. □
Theorem 3.2. Let $w \in C(a, b)$ with $w > 0$ except at a finite number of points, and $\int_a^b w(x)\,dx < \infty$. Let $\varphi_{n+1}$ be the orthogonal polynomial of degree $n+1$ associated with the inner product
\[ \langle g_1, g_2\rangle = \int_a^b w(x)g_1(x)g_2(x)\,dx. \]
Let $\{x_i^*\}_{i=0}^{n}$, $x_i^* \in [a, b]$, be the $n+1$ distinct zeros of $\varphi_{n+1}$ (see the above lemma). If we approximate
\[ I(f) = \int_a^b w(x)f(x)\,dx \]
by $I_n(f) = I(p_n)$, where $p_n \in P_n$ is such that $p_n(x_i^*) = f(x_i^*)$ for $i = 0,\dots,n$, then
\[ I_n(f) = \sum_{i=0}^{n} w_i^* f(x_i^*), \qquad w_i^* = \int_a^b w(x)\,l_i(x)\,dx, \qquad l_i(x) = \prod_{j=0,\,j\ne i}^{n}\frac{x - x_j^*}{x_i^* - x_j^*} \]
for $i = 0,\dots,n$. Moreover, $I_n(f) = I(f)$ for all $f \in P_{2n+1}$.

Proof. Let $f \in P_{2n+1}$. Then $f - p_n \in P_{2n+1}$ has roots at $\{x_i^*\}_{i=0}^{n}$, and therefore $f - p_n = q_n\varphi_{n+1}$ for some $q_n \in P_n$. Then
\[ I(f) - I_n(f) = I(f) - I(p_n) = \int_a^b w(x)\big[f(x) - p_n(x)\big]dx = \int_a^b w(x)q_n(x)\varphi_{n+1}(x)\,dx = \langle q_n, \varphi_{n+1}\rangle = 0, \]
as $\varphi_{n+1}$ is the orthogonal polynomial of degree $n+1$. Hence $I_n(f) = I(f)$ for all $f \in P_{2n+1}$. □

With $n+1$ sampling points, it is not possible to choose $x_i, w_i$, $i = 0,\dots,n$, such that $I_n(f) = I(f)$ for all $f \in P_{2n+2}$: consider $f(x) = \prod_{i=0}^{n}(x - x_i)^2 \in P_{2n+2}$. Clearly $I(f) > 0$, but $I_n(f) = 0$.

Choosing the sampling points as the roots of $\varphi_{n+1}$ is called Gaussian Quadrature.
Example. Let $[a, b] = [-1, 1]$, $w \equiv 1$ and
\[ \langle g_1, g_2\rangle = \int_{-1}^{1} g_1(x)g_2(x)\,dx. \]
For $n = 1$,
\[ I_1(f) = w_0^* f(x_0^*) + w_1^* f(x_1^*). \]
Recall that $\varphi_2(x) = x^2 - 1/3$, and so $x_0^* = -1/\sqrt{3}$, $x_1^* = 1/\sqrt{3}$. Now we determine $w_0^*, w_1^*$. Observe that
\[ I_1(1) = w_0^* + w_1^* = I(1) = \int_{-1}^{1} 1\,dx = 2, \qquad I_1(x) = \frac{1}{\sqrt{3}}(-w_0^* + w_1^*) = I(x) = \int_{-1}^{1} x\,dx = 0, \]
and hence $w_0^* = w_1^* = 1$. Therefore $I_1(f) = f(-1/\sqrt{3}) + f(1/\sqrt{3})$. Also $I_1(x^2) = 2/3 = I(x^2)$ and $I_1(x^3) = 0 = I(x^3)$.

For $n = 2$, $\varphi_3(x) = x^3 - (3/5)x$, and so $x_0^* = -\sqrt{3/5}$, $x_1^* = 0$, $x_2^* = \sqrt{3/5}$. Therefore
\[ I_2(f) = w_0^* f(-\sqrt{3/5}) + w_1^* f(0) + w_2^* f(\sqrt{3/5}). \]
This is exact for cubics (indeed for all $f \in P_5$), and so
\[ I_2(1) = w_0^* + w_1^* + w_2^* = 2, \qquad I_2(x) = -\sqrt{3/5}\,w_0^* + \sqrt{3/5}\,w_2^* = 0, \qquad I_2(x^2) = \frac{3}{5}w_0^* + \frac{3}{5}w_2^* = \frac{2}{3}. \]
Hence $w_0^* = w_2^* = 5/9$, $w_1^* = 8/9$, and
\[ I_2(f) = \frac{1}{9}\Big[5f(-\sqrt{3/5}) + 8f(0) + 5f(\sqrt{3/5})\Big]. \]
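The two Gauss rules above can be checked against monomials (a sketch, not part of the notes): the two-point rule reproduces $\int_{-1}^{1} x^k\,dx$ exactly for $k \le 3$ but not for $k = 4$, and the three-point rule for $k \le 5$:

```python
import math

def gauss2(f):
    # two-point Gauss rule on [-1, 1]: nodes +-1/sqrt(3), weights 1
    r = 1.0 / math.sqrt(3.0)
    return f(-r) + f(r)

def gauss3(f):
    # three-point rule: nodes +-sqrt(3/5) and 0, weights 5/9, 8/9, 5/9
    r = math.sqrt(3.0 / 5.0)
    return (5.0 * f(-r) + 8.0 * f(0.0) + 5.0 * f(r)) / 9.0

# exact integrals of x^k over [-1, 1]: 2/(k+1) for even k, 0 for odd k
exact = lambda k: 2.0 / (k + 1) if k % 2 == 0 else 0.0
errs2 = [abs(gauss2(lambda x: x ** k) - exact(k)) for k in range(4)]
errs3 = [abs(gauss3(lambda x: x ** k) - exact(k)) for k in range(6)]
```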