J.S.B. Gajjar
September 2004
1 Introduction

In this course we will be studying computational fluid dynamics, or CFD. CFD is concerned with the study of fluid flow problems using computational techniques, as opposed to analytical or experimental methods. The modelling of fluid flow leads to the solution of unsteady, nonlinear, partial differential equations, and there are only a handful of known exact analytical solutions. Even in the simplest of geometries the solution of these equations is quite challenging. In this CFD course we will aim to study the different techniques which are used to solve the different types of equations which arise. As we will see, to be good at CFD one has to be fluent in many different subject areas, such as numerical analysis, programming, and fluid mechanics. This introductory course will concentrate mainly on finite-difference methods. In CFD II you will meet the popular finite-volume method. In the finite-difference method the solution is sought on a grid of points, and we deal with function values at these nodal points. In a finite-element environment, on the other hand, the region is split into various subregions, and the solution in these subregions is approximated by basis functions.
1.1 Errors

To start with we need to discuss errors. Typically numerical computation of any sort will generate errors, and it is one of the tasks of the CFD practitioner to be able to assess and be aware of the errors arising from numerical computation. The different types of errors can be categorised into the following main types:

- Roundoff errors.
- Errors in modelling.
- Programming errors.
- Truncation and discretization errors.

Roundoff errors.
These arise when a computer is used for doing numerical calculations. Some typical examples include the inexact representation of numbers such as π or sqrt(2).

Roundoff and chopping errors arise from the way numbers are stored on the computer, and in the way arithmetic operations are carried out. Whereas most of the time the way numbers are stored on a computer is not under our control, the way certain expressions are computed definitely is. A simple example will illustrate the point. Consider the computation of the roots of a quadratic equation ax^2 + bx + c = 0 with the expressions

  x_1 = (-b + sqrt(b^2 - 4ac))/(2a),   x_2 = (-b - sqrt(b^2 - 4ac))/(2a).
Let us take a = c = 1, b = -28. Then

  x_1 = 14 + sqrt(195),   x_2 = 14 - sqrt(195).

Now to 5 significant figures we have sqrt(195) = 13.964, giving

  x̃_1 = 27.964,   x̃_2 = 0.036.
Relative error
Another measure is the relative error and this is dened by 
/ if
= 0.
For our example above we see that x
1
x
1
 = 2.4 10
4
and x
2
x
2
 =
2.4 10
4
which look small. On the other hand the relative errors are
x
1
x
1

x
1

=
8.6 10
6
and
x
2
x
2

x
2

= 6.7 10
3
. Thus the accuracy in computing x
2
is far less
than in computing x
1
. On the other hand if we compute x
2
via
x
2
= 14
195 =
1
14 +
195
we obtain x
2
= 0.03576 with a much smaller absolute error of 3.4 10
7
and a
relative error of 9.6 10
6
.
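The effect above is the classic cancellation of nearly equal numbers. A minimal sketch in plain Python (the function names are our own; in double precision the damage is milder than in the 5-significant-figure arithmetic of the example, but the rewritten root is still the more accurate one):

```python
import math

def roots_naive(a, b, c):
    # Textbook quadratic formula: subtracting nearly equal numbers
    # loses accuracy when b*b >> 4*a*c.
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def roots_stable(a, b, c):
    # Compute the larger-magnitude root first, then use x1*x2 = c/a
    # to recover the small root without cancellation.
    d = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - math.copysign(d, b)) / (2 * a)
    return x1, c / (a * x1)

x1n, x2n = roots_naive(1.0, -28.0, 1.0)
x1s, x2s = roots_stable(1.0, -28.0, 1.0)
```

The stable version satisfies the exact root relations x_1 x_2 = c/a and x_1 + x_2 = -b/a to machine precision.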
Errors in numerical modelling.

These arise for example when the equations being solved are not the proper equations for the flow problem in question. An example is using the Euler equations for calculating the solution of flows where viscosity effects are important. No matter how accurately the solution has been computed, it may not be close to the real physical solution because viscous effects have been neglected throughout the computation.

Programming errors, and bugs.

These are errors entirely under the control of the programmer. To eliminate these requires careful testing of the code and logic, as well as comparison with previous work. Even then, for your problem, for which there may not be previous work to compare with, one has to do numerous self-consistency checks with further analysis as necessary.
Truncation and discretization errors.

These errors arise when we take the continuum model and replace it with a discrete approximation. For example suppose we wish to solve

  d^2 U/dx^2 = f(x).

Using Taylor series we can approximate the second derivative term by

  (u(x_{i+1}) - 2u(x_i) + u(x_{i-1}))/h^2,

where we have taken a uniform grid with spacing h, say, and node points x_i. As far as the approximation of the equation is concerned we will have a truncation error given by

  τ(x_i) = (U(x_{i+1}) - 2U(x_i) + U(x_{i-1}))/h^2 - f(x_i) = (h^2/12) d^4 U(x_i)/dx^4 + ... .

Even though the discrete equations may be solved to high accuracy, there will still be an error of O(h^2) arising from the discretization of the equations. Of course with more points, we would expect the error to diminish.
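The O(h^2) behaviour is easy to observe numerically. A sketch (our own test function; halving h should cut the error by about a factor of 4):

```python
import math

def second_diff(u, x, h):
    # Central approximation to u''(x), truncation error O(h^2).
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)

u = math.sin            # test function: u'' = -sin, u'''' = sin
x0 = 1.0
errs = [abs(second_diff(u, x0, h) - (-math.sin(x0)))
        for h in (0.1, 0.05, 0.025)]
# consecutive error ratios should approach 4 for a second-order formula
ratios = [errs[k] / errs[k + 1] for k in range(2)]
```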
2 Initial value problems

Here we will look at the solution of ordinary differential equations of the type, say

  dy/dx = f(x, y),   a ≤ x ≤ b,   (1)

subject to an initial condition

  y(a) = α.   (2)

The methods that we will use generalise readily to systems of differential equations of the type

  dY/dx = F(x, Y),   a ≤ x ≤ b,   (3)

where

  Y = (y_1(x), y_2(x), ..., y_N(x))^T,
  F = (f_1(x, Y), f_2(x, Y), ..., f_N(x, Y))^T,

with initial data

  Y(a) = α,   (4)

say, where α = (α_1, α_2, ..., α_N)^T.
One question which arises immediately is, what about second and third order differential equations? Consider for example

  g''' + g g'' + (1/2)(1 - g'^2) = 0,   g(0) = g'(0) = 0,   g'(∞) - 1 = 0,   (5)

which arises in fluid flow applications. The answer to this is that if we take any differential equation, we can always write it as a system of first order equations.

Example

In (5) let

  Y_1 = g(x),   Y_2 = g'(x),   Y_3 = g''(x),

then (5) becomes

  dY/dx = ( Y_2,  Y_3,  -Y_1 Y_3 - (1/2)(1 - Y_2^2) )^T = F.   (6)

The boundary conditions in (5) are not of the form as in (4) because we have conditions given at two points, x = 0 and x = ∞. We will see how to deal with this later.
Consider equation (1). There are various mathematical results that we should be aware of, although we will not go too deeply into the implications or theory behind some of these.

Suppose we define D to be the domain

  D = {(x, y) : a ≤ x ≤ b, -∞ < y < ∞}

and f(x, y) is continuous on D. If f(x, y) satisfies a Lipschitz condition on D then (1) has a unique solution for a ≤ x ≤ b. Recall that f(x, y) satisfying a Lipschitz condition on D means that there exists a constant L > 0 (called the Lipschitz constant) such that

  |f(x, y_1) - f(x, y_2)| ≤ L |y_1 - y_2|

whenever (x, y_1), (x, y_2) belong to D.
2.1 Euler's Method

This is the simplest of techniques for the numerical solution of (1). It is good for proving various results, but it is rarely used in practice because of far superior methods.

For simplicity define an equally spaced mesh

  x_j = a + jh,   j = 0, ..., N,

where h = (b - a)/N is called the step size.

[Figure: a uniform mesh a = x_0, x_1, x_2, ..., x_i, x_{i+1}, ..., x_N = b with spacing h.]

We can derive Euler's method as follows. Suppose y(x) is the unique solution to (1), and twice differentiable. Then by Taylor's theorem we have

  y(x_{i+1}) = y(x_i + h) = y(x_i) + y'(x_i) h + (h^2/2) y''(ξ),   (7)

where x_i ≤ ξ ≤ x_{i+1}. But from the differential equation y'(x_i) = f(x_i, y_i), where y_i = y(x_i). This suggests the scheme

  w_0 = α,   w_{i+1} = w_i + h f(x_i, w_i),   i = 0, 1, ..., N - 1,   (8)

for calculating the w_i, which will be our approximate solution to the equation. This is called Euler's method.
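The scheme (8) can be sketched in a few lines of plain Python (the test problem dy/dx = y, y(0) = 1, with exact solution e^x, is our own choice):

```python
import math

def euler(f, a, b, alpha, N):
    # w_0 = alpha, w_{i+1} = w_i + h f(x_i, w_i), i = 0..N-1
    h = (b - a) / N
    x, w = a, alpha
    ws = [w]
    for _ in range(N):
        w = w + h * f(x, w)
        x = x + h
        ws.append(w)
    return ws

ws = euler(lambda x, y: y, 0.0, 1.0, 1.0, 1000)
err = abs(ws[-1] - math.e)   # first-order: err ~ h
```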
2.1.1 Truncation error for Euler's method

Suppose that y_i = y(x_i) is the exact solution of (1) at x = x_i. Then the truncation error is defined by

  τ_{i+1}(h) = (y_{i+1} - (y_i + h f(x_i, y_i)))/h = (y_{i+1} - y_i)/h - f(x_i, y_i),   (9)

for i = 0, 1, ..., N - 1. From (7) earlier we find that

  τ_{i+1}(h) = (h/2) y''(ξ_i)

for some ξ_i in (x_i, x_{i+1}).

For a method of order m, if w^(1)_i denotes the approximation computed with step size h/2 and w^(2)_i that computed with step size h, then

  w_i = (2^m w^(1)_i - w^(2)_i)/(2^m - 1)

is a more accurate approximation to the solution than w^(1)_i or w^(2)_i.

For a 4th order Runge-Kutta method the above gives

  w_i = (16 w^(1)_i - w^(2)_i)/15.
3 Stability

We have introduced a number of different methods. How do we select which one to use? In practice most methods should work reasonably well on standard problems. However certain types of problems (stiff problems) can cause difficulty, and care needs to be exercised in the choice of the method. Typically boundary layer type problems (say with a small parameter multiplying the highest derivative) are examples of stiff problems. This is where one needs to be aware of the stability properties of a particular method.

3.1 Consistency

A method is said to be consistent if the local truncation error tends to zero as the step size h → 0, i.e.

  lim_{h→0} max_i |τ_i(h)| = 0.

3.2 Convergence

A method is said to be convergent with respect to the equation it approximates if

  lim_{h→0} max_i |w_i - y(x_i)| = 0,

where y(x) is the exact solution and w_i an approximation produced by the method.
3.3 Stability

If we consider an m-step method

  w_0 = α_0,  w_1 = α_1,  ...,  w_{m-1} = α_{m-1},
  w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + ... + a_0 w_{i+1-m} + h [F(x_i, w_{i+1}, w_i, ..., w_{i+1-m})],   (16)

then, ignoring the F term, the homogeneous part is just a difference equation. The stability is thus connected with the stability of this difference equation and hence the roots of the characteristic polynomial

  λ^m - a_{m-1} λ^{m-1} - ... - a_1 λ - a_0 = 0.
Why? Well, consider (1) with f(x, y) = 0. This has the solution y(x) = α. Thus the difference equation has to produce the same solution, ie w_n = α. Now consider

  w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + ... + a_0 w_{i+1-m}.   (17)

If we look for solutions of the form w_n = λ^n then this gives the characteristic polynomial equation

  λ^m - a_{m-1} λ^{m-1} - ... - a_1 λ - a_0 = 0.   (18)

Suppose λ_1, ..., λ_m are distinct roots of (18). Then we can write

  w_n = Σ_{i=1}^m c_i λ_i^n.   (19)

Since w_n = α is a solution, the difference equation (17) gives

  α - a_{m-1} α - ... - a_0 α = 0,

or

  α (1 - a_{m-1} - ... - a_0) = 0.

This shows that λ = 1 is a root, and we may take c_1 = α in (19). Thus (19) may be written as

  w_n = α + Σ_{i=2}^m c_i λ_i^n.   (20)

In the absence of roundoff error all the c_i in (20) would be zero. If |λ_i| ≤ 1 then the error due to roundoff will not grow. The argument used above generalises to multiple roots λ_i of the characteristic polynomial. Thus if the roots λ_i, (i = 1, ..., m) of the characteristic polynomial satisfy |λ_i| ≤ 1, then the method is stable.
Example

Consider the 4th order Adams-Bashforth method

  w_{i+1} = w_i + (h/24) [55 f(x_i, w_i) - 59 f(x_{i-1}, w_{i-1}) + 37 f(x_{i-2}, w_{i-2}) - 9 f(x_{i-3}, w_{i-3})].

This has the characteristic equation

  λ^4 - λ^3 = 0,

giving the roots as λ = 1, 0, 0, 0. Thus this is stable according to our definition of stability above.

It can be proven that if the difference method is consistent with the differential equation, then the method is stable if and only if the method is convergent.

Is it enough just to have stability as defined above? Consider the solution of

  dy/dx = -30y,   y(0) = 1/3.

The RK(4) method, although stable, has difficulty in computing the accurate solution of this problem. This means that we need something more than just the idea of stability defined above.
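The difficulty is easy to reproduce. A sketch with our own RK4 implementation (the step sizes are chosen for illustration): with h = 0.1 the numerical solution of dy/dx = -30y grows instead of decaying, while a smaller step recovers the rapidly decaying exact solution (1/3)e^{-30x}:

```python
def rk4(f, a, b, alpha, N):
    # classical 4th order Runge-Kutta
    h = (b - a) / N
    x, w = a, alpha
    for _ in range(N):
        k1 = h * f(x, w)
        k2 = h * f(x + h / 2, w + k1 / 2)
        k3 = h * f(x + h / 2, w + k2 / 2)
        k4 = h * f(x + h, w + k3)
        w = w + (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x = x + h
    return w

f = lambda x, y: -30.0 * y              # the stiff test problem, y(0) = 1/3
w_big = rk4(f, 0.0, 1.5, 1.0 / 3.0, 15)     # h = 0.1 : unstable, |w| grows
w_small = rk4(f, 0.0, 1.5, 1.0 / 3.0, 150)  # h = 0.01 : decays correctly
```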
3.4 Absolute stability

Consider

  dy/dx = ky,   y(0) = α,   k < 0.   (21)

The exact solution of this is y(x) = α e^{kx}. If we take our one-step method and apply it to this equation we obtain

  w_{i+1} = Q(hk) w_i.

Similarly a multistep method of the type used earlier (11), when applied to the test equation (21), gives rise to the difference equation

  w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + ... + a_0 w_{i+1-m}
            + h [b_m k w_{i+1} + b_{m-1} k w_i + ... + b_0 k w_{i+1-m}].   (22)

Thus if we seek solutions of the form w_i = z^i this will give rise to the characteristic polynomial equation

  Q(z, hk) = 0,

where

  Q(z, hk) = (1 - hk b_m) z^m - (a_{m-1} + hk b_{m-1}) z^{m-1} - ... - (a_0 + hk b_0).

The region R of absolute stability for a one-step method is defined as the region in the complex plane R = {hk ∈ C : |Q(hk)| < 1}, and for a multistep method R = {hk ∈ C : |β_j| < 1}, where β_j is a root of Q(z, hk) = 0.

A numerical method is A-stable if R contains the entire left half plane.

In practice the above conditions place a limit on the size of the step which we can use.
Consider the modified Euler method

  w_0 = α,
  w_{i+1} = w_i + (h/2) [f(x_i, w_i) + f(x_{i+1}, w_{i+1})],   i = 0, 1, ..., N - 1.

This is an A-stable method.
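A-stability can be checked on the stiff example above. For the linear test equation f = ky the implicit step can be solved exactly for w_{i+1}, giving the growth factor (1 + hk/2)/(1 - hk/2), whose modulus is below 1 whenever k < 0 — so even a very large step stays bounded (a sketch; the step size h = 0.5 is deliberately huge):

```python
def trapezoid_linear(k, alpha, h, nsteps):
    # For dy/dx = k y the step w_{i+1} = w_i + (h/2)(k w_i + k w_{i+1})
    # rearranges to w_{i+1} = g * w_i with the growth factor
    g = (1.0 + 0.5 * h * k) / (1.0 - 0.5 * h * k)  # |g| < 1 for k < 0
    w = alpha
    for _ in range(nsteps):
        w = g * w
    return w

# y' = -30 y, y(0) = 1/3, with h = 0.5: hk = -15, far outside the
# explicit RK4 stability region, yet the solution decays.
w = trapezoid_linear(-30.0, 1.0 / 3.0, 0.5, 20)
```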
4 Boundary Value Problems

4.1 Shooting Methods

Consider the differential equation

  d^2 y/dx^2 + k dy/dx + xy = 0,   y(0) = 0,   y(1) = 1.   (23)

This is an example of a boundary value problem because conditions have to be satisfied at both ends. If we write this as a system of first order equations we have, with Y_1 = y and Y_2 = dy/dx,

  dY_1/dx = Y_2,   (24)
  dY_2/dx = -k Y_2 - x Y_1.

The boundary conditions give

  Y_1(0) = 0,   Y_1(1) = 1.

We do not know the value of Y_2(0). If we knew this value then we could use our standard integrator to obtain a solution. So how do we handle this situation? Suppose we guess the value Y_2(0) = g, say. Then we can integrate (24) with the initial condition

  Y(0) = (Y_1(0), Y_2(0))^T = (0, g)^T,

using any integrator, eg RK(4). This will give us

  Y(1) = (Y_1(1), Y_2(1))^T = (β_1, β_2)^T,

where, because we guessed the value g = Y_2(0), β_1 will not necessarily satisfy the required condition Y_1(1) = 1 in (23). So now we need to go through some kind of iterative process to try and get the correct value of g such that the required condition at x = 1 is satisfied.

To do this define

  φ(g) = Y_1(1; g) - 1,   (25)

where the semicolon notation is used to indicate an additional parametric dependence on g. We want to find the value of g such that φ(g) = 0. This gives rise to the idea of a shooting method. We adjust the value of g to force φ(g) = 0. We can also think of this as finding the root of the equation φ(g) = 0.
4.2 How do we adjust the value of g?

To solve φ(g) = 0 we can use any of the root finding methods we know, eg the secant method, Newton's method, bisection, etc.

4.3 Newton's Method, Secant Method

Suppose that we have a guess ḡ and we seek a correction dg such that φ(ḡ + dg) = 0. By Taylor expansion we have

  φ(ḡ + dg) = φ(ḡ) + φ'(ḡ) dg + O(dg^2).

This suggests that we take

  dg = -φ(ḡ)/φ'(ḡ),

and hence a new value for g is ḡ + dg. Thus this gives rise to a sequence of iterations for n = 0, 1, ...,

  g_{n+1} = g_n - φ(g_n)/φ'(g_n),   (26)

where we start the iteration with a suitable guess g_0. Provided that g_0 is close to the true value, Newton's method converges very fast, ie quadratically. Difficulties with convergence arise when φ'(g_n) is small. Note also that to use (26) we need the derivative φ'(g_n). How can we compute this? One way is to estimate φ'(g_n) by

  φ'(g_n) ≈ (φ(g_n) - φ(g_{n-1}))/(g_n - g_{n-1}).

This gives

  g_{n+1} = g_n - φ(g_n)(g_n - g_{n-1})/(φ(g_n) - φ(g_{n-1})),   (27)

which is known as the secant method. We need two starting values for g and then we can use (27) to generate the remaining values.
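The whole shooting procedure for (23) with the secant update (27) can be sketched as follows (plain Python with our own RK4 integrator; the parameter value k = 1, the grid size, and the starting guesses are our own choices for illustration):

```python
def integrate(g, k=1.0, N=200):
    # RK4 on Y1' = Y2, Y2' = -k Y2 - x Y1 over [0, 1],
    # with Y1(0) = 0, Y2(0) = g; returns Y1(1).
    h = 1.0 / N
    def F(x, y):
        return (y[1], -k * y[1] - x * y[0])
    x, y = 0.0, (0.0, g)
    for _ in range(N):
        k1 = F(x, y)
        k2 = F(x + h/2, (y[0] + h/2*k1[0], y[1] + h/2*k1[1]))
        k3 = F(x + h/2, (y[0] + h/2*k2[0], y[1] + h/2*k2[1]))
        k4 = F(x + h, (y[0] + h*k3[0], y[1] + h*k3[1]))
        y = (y[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             y[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
        x += h
    return y[0]

def phi(g):
    return integrate(g) - 1.0   # we want phi(g) = 0

g0, g1 = 0.0, 1.0               # two starting guesses for the secant method
for _ in range(20):
    f0, f1 = phi(g0), phi(g1)
    if abs(f1) < 1e-12 or abs(f1 - f0) < 1e-14:
        break
    g0, g1 = g1, g1 - f1 * (g1 - g0) / (f1 - f0)   # secant update (27)
```

Since (23) is linear, φ(g) is linear in g and the secant iteration converges essentially in one step.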
Consider (24) again

  dY_1/dx = Y_2,
  dY_2/dx = -k Y_2 - x Y_1,

with Y(0) = (0, g)^T. Now from (25)

  φ'(g) = ∂Y_1/∂g (1; g),

so this suggests differentiating the original system of equations and boundary conditions with respect to g to get

  d/dx (∂Y_1/∂g) = ∂Y_2/∂g,
  d/dx (∂Y_2/∂g) = -k ∂Y_2/∂g - x ∂Y_1/∂g,   (28)

with

  (∂Y_1/∂g)(x = 0) = 0,   (∂Y_2/∂g)(x = 0) = 1.

The system (28) defines another initial value problem with given initial conditions.

Thus the procedure is as follows. First integrate (24) with initial conditions Y(0) = (0, g)^T and with a suitable initial guess g. Then, either simultaneously at each step, or after a step, solve (28). After integrating over all the steps we will have at x = 1 the values Y(x = 1) from (24) and ∂Y/∂g (x = 1) from (28). Note that from (25)

  φ'(g) = ∂Y_1/∂g (1; g).

From the solution of (28) we can extract ∂Y_1/∂g (x = 1) and hence substitute into (26) to update g. This forms the basis of Newton's method combined with shooting, to solve boundary value problems.
It is clear how one can solve (28) after solving (24) at each step. This can also be done simultaneously by working with an augmented system where we define

  Y_1 = y,   Y_2 = dy/dx,   Y_3 = ∂y/∂g = ∂Y_1/∂g,   Y_4 = ∂Y_2/∂g,

and then

  d/dx (Y_1, Y_2, Y_3, Y_4)^T = (Y_2, -k Y_2 - x Y_1, Y_4, -k Y_4 - x Y_3)^T,

  Y(0) = (Y_1(0), Y_2(0), Y_3(0), Y_4(0))^T = (0, g, 0, 1)^T.   (29)
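Newton shooting with the augmented system (29) can be sketched as follows (our own RK4 integrator again, k = 1 and the grid size chosen for illustration); the fourth component carries ∂Y_2/∂g and the third delivers φ'(g) at x = 1:

```python
def F(x, Y, k=1.0):
    # right-hand side of the augmented system (29)
    return [Y[1],
            -k * Y[1] - x * Y[0],
            Y[3],
            -k * Y[3] - x * Y[2]]

def shoot(g, k=1.0, N=200):
    # RK4 from x = 0 to 1 with Y(0) = (0, g, 0, 1)
    h = 1.0 / N
    x, Y = 0.0, [0.0, g, 0.0, 1.0]
    for _ in range(N):
        k1 = F(x, Y, k)
        k2 = F(x + h/2, [Y[i] + h/2*k1[i] for i in range(4)], k)
        k3 = F(x + h/2, [Y[i] + h/2*k2[i] for i in range(4)], k)
        k4 = F(x + h, [Y[i] + h*k3[i] for i in range(4)], k)
        Y = [Y[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(4)]
        x += h
    return Y

g = 0.0                      # initial guess for Y_2(0)
for _ in range(10):
    Y = shoot(g)
    phi, dphi = Y[0] - 1.0, Y[2]   # phi(g) and phi'(g) = dY1/dg at x = 1
    if abs(phi) < 1e-12:
        break
    g = g - phi / dphi             # Newton update (26)
```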
4.4 Multiple end conditions

What we have discussed so far works fine if we just have one condition to satisfy. The same procedure works also for cases where we have two or more conditions to satisfy. For example with

  d^4 y/dx^4 = y^3 (dy/dx)^2,   y(0) = 1,  dy/dx(0) = 0,  y(1) = 2,  dy/dx(1) = 1,

we would need two starting values at x = 0 and then we will have two conditions to satisfy at x = 1. The corrections can be calculated in a similar way. Thus in the example above, suppose we define

  Y_1 = y,   Y_2 = y',   Y_3 = y'',   Y_4 = y''',

then we will need to use guesses for Y_3(0) = e, say, and Y_4(0) = g. The conditions we need to satisfy can be written as

  φ_1(e, g) = φ_2(e, g) = 0,

where

  φ_1(e, g) = Y_1(1; e; g) - 2,   φ_2(e, g) = Y_2(1; e; g) - 1.

To obtain the corrections to the guessed values ē, ḡ we have

  φ_1(ē + de, ḡ + dg) = 0 = φ_1(ē, ḡ) + de ∂φ_1/∂e (ē, ḡ) + dg ∂φ_1/∂g (ē, ḡ) + O(de^2, dg^2),
  φ_2(ē + de, ḡ + dg) = 0 = φ_2(ē, ḡ) + de ∂φ_2/∂e (ē, ḡ) + dg ∂φ_2/∂g (ē, ḡ) + O(de^2, dg^2).

This can be solved to obtain the corrections (de, dg).

If we generalise to a multidimensional case where we have a vector of guesses g, we can find the corrections dg as

  dg = -J^{-1}(ḡ) φ(ḡ),

where J is the Jacobian with entries ∂φ_i/∂g_k and φ is the vector of conditions.

There are of course many variations on the above theme and different strategies when dealing with linear equations. For stiff systems one can also integrate from the two ends and match in the middle. Many of these techniques are discussed extensively in the book by Keller (1976).
4.5 Solution of boundary value problems using finite-differences

Boundary value problems can also be tackled directly using finite-differences or some other technique such as spectral approximation. We will look at one specific example to illustrate the finite-difference technique.

Consider

  d^2 y/dx^2 = (1/8)(32 + 2x^3 - y dy/dx),   1 ≤ x ≤ 3,
  y(1) = 17,   y(3) = 43/3.   (30)

The exact solution of (30) is y(x) = x^2 + 16/x.

Let us first define a uniform grid (x_0, x_1, ..., x_N) with N + 1 points and grid spacing h = (x_N - x_0)/N, so that x_j = x_0 + jh, for (j = 0, 1, ..., N).
[Figure: a uniform mesh a = x_0, x_1, ..., x_i, x_{i+1}, ..., x_N = b with spacing h.]
Let us approximate y by w_i at each of the nodes x = x_i. The derivatives of y in (30) are approximated in finite-difference form as

  (dy/dx)_{x=x_i} = (w_{i+1} - w_{i-1})/(2h) + O(h^2),   (31)

  (d^2 y/dx^2)_{x=x_i} = (w_{i+1} - 2w_i + w_{i-1})/h^2 + O(h^2).   (32)

These can be derived by making use of a Taylor expansion about the point x = x_i.
Thus for example

  y(x_{i+1}) = y(x_i) + h dy/dx(x_i) + (h^2/2) d^2y/dx^2(x_i) + (h^3/6) d^3y/dx^3(x_i) + (h^4/24) d^4y/dx^4(x_i) + O(h^5),   (33)

  y(x_{i-1}) = y(x_i) - h dy/dx(x_i) + (h^2/2) d^2y/dx^2(x_i) - (h^3/6) d^3y/dx^3(x_i) + (h^4/24) d^4y/dx^4(x_i) + O(h^5).   (34)

By adding and subtracting (33),(34) and replacing y(x_i) by w_i we obtain (31),(32).
Next replace y and its derivatives in (30) by the above approximations to get

  (w_{i+1} - 2w_i + w_{i-1})/h^2 = 4 + x_i^3/4 - w_i (w_{i+1} - w_{i-1})/(16h),
      for (i = 1, 2, ..., N-1),   (35)

and

  w_0 = 17,   w_N = 43/3.   (36)
The system of equations (35),(36) is a set of nonlinear difference equations. We have N + 1 equations for the N + 1 unknowns w_0, ..., w_N. At this stage one has recourse to a number of different techniques for handling the nonlinearity. One is to set up an iteration scheme, replacing w_i by w^(k)_i say, and starting with a suitable guess for w^(0)_i. The nonlinear term in (35) can now be tackled in many different ways. Thus for example we can replace it by

  w^(k-1)_i (w^(k)_{i+1} - w^(k)_{i-1})/(16h),

or

  w^(k)_i (w^(k-1)_{i+1} - w^(k-1)_{i-1})/(16h),

and we are then left with a linear system of equations to find the w^(k)_i, and the iterations can be continued until a suitable convergence condition is satisfied. What is a suitable convergence criterion? One condition is that the nonlinear difference equations (35) need to be satisfied to sufficient accuracy.
4.5.1 Newton linearization

Where practicable, Newton's method offers the best means for handling nonlinear equations because of its superior convergence properties. To use this technique for nonlinear difference equations of the form (35), suppose that we have a guess for the solutions w_i = W_i. We seek corrections δw_i such that w_i = W_i + δw_i satisfies the system (35). Substituting w_i = W_i + δw_i into (35) and linearizing, ie ignoring terms of O(δw_i^2), gives

  (δw_{i+1} - 2δw_i + δw_{i-1})/h^2 = F_i - δw_i (W_{i+1} - W_{i-1})/(16h) - W_i (δw_{i+1} - δw_{i-1})/(16h),
      for (i = 1, 2, ..., N-1),   (37)

and

  δw_0 = F_0,   δw_N = F_N,   (38)

where

  F_i = 4 + x_i^3/4 - W_i (W_{i+1} - W_{i-1})/(16h) - (W_{i+1} - 2W_i + W_{i-1})/h^2,   for i = 1, ..., N-1,

and

  F_0 = 17 - W_0,   F_N = 43/3 - W_N.
The system (37),(38) is now linear and can be solved to obtain the corrections δw_i, and hence we can update the W_i by W_i + δw_i. This is continued until the δw_i are small enough and the difference equations are satisfied to sufficient accuracy.

Instead of solving for the corrections δw_i one can also solve for the total quantities w_i, by replacing δw_i in (37),(38) by δw_i = w_i - W_i, and solving for the w_i.
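The complete Newton-linearized solve of (35),(36) fits in a few dozen lines. A sketch in plain Python (the grid size, linear initial guess, and tolerances are our own choices; the tridiagonal elimination inside is plain Gaussian elimination):

```python
def solve_bvp(N=40, sweeps=20):
    # Newton linearization (37)-(38) for y'' = (32 + 2x^3 - y y')/8,
    # y(1) = 17, y(3) = 43/3; exact solution y = x^2 + 16/x.
    h = 2.0 / N
    x = [1.0 + i * h for i in range(N + 1)]
    # linear initial guess joining the boundary values
    W = [17.0 + (43.0/3.0 - 17.0) * i / N for i in range(N + 1)]
    for _ in range(sweeps):
        # tridiagonal system  a_i dw_{i-1} + b_i dw_i + c_i dw_{i+1} = r_i
        a = [0.0] * (N + 1); b = [1.0] * (N + 1); c = [0.0] * (N + 1)
        r = [0.0] * (N + 1)
        r[0] = 17.0 - W[0]; r[N] = 43.0/3.0 - W[N]
        for i in range(1, N):
            a[i] = 1.0/h**2 - W[i] / (16.0 * h)
            b[i] = -2.0/h**2 + (W[i+1] - W[i-1]) / (16.0 * h)
            c[i] = 1.0/h**2 + W[i] / (16.0 * h)
            r[i] = (4.0 + x[i]**3 / 4.0
                    - W[i] * (W[i+1] - W[i-1]) / (16.0 * h)
                    - (W[i+1] - 2.0*W[i] + W[i-1]) / h**2)
        # forward elimination, then back substitution
        for j in range(1, N + 1):
            m = a[j] / b[j-1]
            b[j] -= m * c[j-1]
            r[j] -= m * r[j-1]
        dw = [0.0] * (N + 1)
        dw[N] = r[N] / b[N]
        for j in range(N - 1, -1, -1):
            dw[j] = (r[j] - c[j] * dw[j+1]) / b[j]
        W = [W[i] + dw[i] for i in range(N + 1)]
        if max(abs(d) for d in dw) < 1e-10:
            break
    return x, W

x, W = solve_bvp()
err = max(abs(W[i] - (x[i]**2 + 16.0/x[i])) for i in range(len(x)))
```

The remaining error is the O(h^2) discretization error, not the Newton iteration error.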
The choice of the iteration scheme is dependent on a number of factors including ease of implementation, convergence properties, and so on. The techniques described above lead to the solution of a tridiagonal system of linear equations of the form

  α_i δw_{i-1} + β_i δw_i + γ_i δw_{i+1} = r_i,   i = 0, 1, ..., N,   (39)

where the α_i, β_i, γ_i, r_i are coefficients obtainable from the linear system, and α_0 = γ_N = 0. For instance from (37) we have

  α_i = 1/h^2 - W_i/(16h),
  β_i = -2/h^2 + (W_{i+1} - W_{i-1})/(16h),
  γ_i = 1/h^2 + W_i/(16h),
  r_i = F_i,   i = 1, 2, ..., N-1,

and

  β_0 = 1,  γ_0 = 0,  r_0 = F_0,   α_N = 0,  β_N = 1,  r_N = F_N.

This can be solved using any standard library routine, or Thomas's algorithm.
4.6 Thomas's tridiagonal algorithm

This version of a tridiagonal solver is based on Gaussian elimination. First we create zeros below the diagonal, and then once we have a triangular matrix we solve for the w_i using back substitution. Thus, for the system (39), the algorithm takes the form

  β_j = β_j - (α_j/β_{j-1}) γ_{j-1},   j = 1, 2, ..., N,
  r_j = r_j - (α_j/β_{j-1}) r_{j-1},   j = 1, 2, ..., N,

followed by

  w_N = r_N/β_N,   w_j = (r_j - γ_j w_{j+1})/β_j,   j = N-1, ..., 0.
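The algorithm above can be sketched as a standalone routine (our own variable names; the small test system is constructed to have the known solution w = (1, 2, 3)):

```python
def thomas(a, b, c, d):
    # Solve the tridiagonal system a[i] w[i-1] + b[i] w[i] + c[i] w[i+1] = d[i],
    # i = 0..N, with a[0] = c[N] = 0.
    n = len(b)
    beta, r = b[:], d[:]
    for j in range(1, n):
        m = a[j] / beta[j - 1]          # eliminate the sub-diagonal entry
        beta[j] = b[j] - m * c[j - 1]
        r[j] = d[j] - m * r[j - 1]
    w = [0.0] * n
    w[-1] = r[-1] / beta[-1]
    for j in range(n - 2, -1, -1):      # back substitution
        w[j] = (r[j] - c[j] * w[j + 1]) / beta[j]
    return w

# small check: this system has solution w = (1, 2, 3)
w = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0],
           [4.0, 8.0, 8.0])
```

The elimination costs O(N) operations, which is why tridiagonal systems are so cheap compared with a general dense solve.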
5 Numerical solution of pdes

In this part of the course we will be concerned with the numerical solution of partial differential equations (pdes) using finite-difference methods. This involves covering the domain with a mesh of points and calculating the approximate solution of the pde at these mesh points. The techniques that we will use will depend on the type of pde that we are dealing with, ie whether it is elliptic, parabolic, or hyperbolic, although there are common features. We will need to learn how to construct approximations to the given equation, how to solve the equation using iterative methods, investigate stability, and so on.
5.1 Classification

Partial differential equations can be classified as being of type elliptic, parabolic or hyperbolic. In some cases equations can be of mixed type. There are systematic methods for classifying an equation, but for our purposes it is enough to summarize the main results. Consider the second order pde

  A ∂^2φ/∂x^2 + B ∂^2φ/∂x∂y + C ∂^2φ/∂y^2 + D ∂φ/∂x + E ∂φ/∂y + Fφ + G = 0,   (40)

where, in general, A, B, C, D, E, F, and G are functions of the independent variables x and y and of the dependent variable φ. The equation (40) is said to be

- elliptic if B^2 - 4AC < 0,
- parabolic if B^2 - 4AC = 0, or
- hyperbolic if B^2 - 4AC > 0.

An example of an elliptic equation is Poisson's equation

  ∂^2φ/∂x^2 + ∂^2φ/∂y^2 = f(x, y).

The heat equation

  ∂φ/∂t = k ∂^2φ/∂x^2

is of parabolic type, and the wave equation

  ∂^2φ/∂x^2 - ∂^2φ/∂y^2 = 0

is a hyperbolic pde. An example of a mixed type equation is the transonic small disturbance equation given by

  (K - ∂φ/∂x) ∂^2φ/∂x^2 + ∂^2φ/∂y^2 = 0.
A more formal approach to classification is to consider a system of first order partial differential equations for the unknowns U = (u_1, u_2, ..., u_n)^T and independent variables x = (x_1, x_2, ..., x_m)^T. Suppose that the equations can be written in quasi-linear form

  Σ_{k=1}^m A_k ∂U/∂x_k = Q,   (41)

where the A_k are (n × n) matrices and Q is an (n × 1) column vector, and both can depend on x_k and U but not on the derivatives of U. If we seek plane wave solutions of the homogeneous part of (41) in the form

  U = U_0 e^{i x·s},

where s = (s_1, s_2, ..., s_m)^T, then (41) leads to the system of equations

  i (Σ_{k=1}^m A_k s_k) U = 0.   (42)

This will have a non-trivial solution only if the characteristic equation

  det |Σ_{k=1}^m A_k s_k| = 0

holds. This will have at most n solutions and n vectors s, which can be regarded as normals to the characteristic surfaces (which are the surfaces of constant phase of the wave).

The system is hyperbolic if n real characteristics exist. If all the characteristics are complex, the system is elliptic. If some are real and some complex, the system is of mixed type. If the system (42) is of rank less than n, then we have a parabolic system.
Example

Consider the steady two-dimensional Euler equations for incompressible flow

  ∂u/∂x + ∂v/∂y = 0,
  u ∂u/∂x + v ∂u/∂y = -∂p/∂x,
  u ∂v/∂x + v ∂v/∂y = -∂p/∂y.

Introducing the vector U = (u, v, p)^T and x = (x, y)^T we can express this in matrix-vector form (41) with Q = 0 and

  A_1 = [ 1  0  0 ]        A_2 = [ 0  1  0 ]
        [ u  0  1 ],             [ v  0  0 ].
        [ 0  u  0 ]              [ 0  v  1 ]

Introducing λ = s_1/s_2, the characteristic equation gives

  det [   λ       1      0 ]
      [ uλ + v    0      λ ]  =  0,
      [   0     uλ + v   1 ]

which leads to λ = -v/u or λ = ±i. Thus the Euler equations are hybrid.
5.2 Consistency, convergence and the Lax equivalence theorem

These ideas were introduced earlier for ordinary differential equations but apply equally to pdes.

Consistency A discrete approximation to a partial differential equation is said to be consistent if, in the limit of the step size(s) going to zero, the original pde system is recovered, ie the truncation error approaches zero.

Stability If we define the error to be the difference between the computed solution and the exact solution of the discrete approximation, then the scheme is stable if the error remains uniformly bounded for successive iterations.

Convergence A scheme is convergent if the solution of the discrete equations approaches the solution of the pde in the limit that the step sizes approach zero.

Lax's Equivalence Theorem For a well-posed initial-value problem and a consistent discretization, stability is the necessary and sufficient condition for convergence.
5.3 Difference formulae

Let us first consider the various approximations to derivatives that we can construct. The usual tool for this is the Taylor expansion. Suppose that we have a grid of points with equal mesh spacing Δx in the x direction and equal spacing Δy in the y direction. Thus we can define points (x_i, y_j) by

  x_i = x_0 + i Δx,   y_j = y_0 + j Δy.

Suppose that we are trying to approximate a derivative of a function φ(x, y) at the point (x_i, y_j). Denote the approximate value of φ(x, y) at the point (x_i, y_j) by w_{i,j}, say.
5.3.1 Central Differences

The first and second derivatives in x or y may be approximated as before by

  (∂^2φ/∂x^2)_{ij} = (w_{i+1,j} - 2w_{i,j} + w_{i-1,j})/(Δx)^2 + O((Δx)^2),   (43)

  (∂^2φ/∂y^2)_{ij} = (w_{i,j+1} - 2w_{i,j} + w_{i,j-1})/(Δy)^2 + O((Δy)^2),   (44)

  (∂φ/∂x)_{ij} = (w_{i+1,j} - w_{i-1,j})/(2Δx) + O((Δx)^2),   (45)

  (∂φ/∂y)_{ij} = (w_{i,j+1} - w_{i,j-1})/(2Δy) + O((Δy)^2).   (46)

Note that replacing the derivative (∂^2φ/∂x^2)_{ij} by the approximation in (43) gives rise to a truncation error O((Δx)^2). The approximations listed above are centered at the point (x_i, y_j), and are called central-difference approximations; the stencil for (43) involves the nodes (i-1, j), (i, j), (i+1, j).

Consider the approximation

  (∂φ/∂x)_{ij} = (w_{i+1,j} - w_{i,j})/Δx.

By Taylor expansion we see that this gives rise to a truncation error of O(Δx). In addition this approximation is centered at the point (x_{i+1/2}, y_j), involving the nodes (i, j) and (i+1, j).
5.3.2 One-sided approximations

We can also construct one-sided approximations to derivatives. Thus for example a second-order forward approximation to ∂φ/∂x at the point (x_i, y_j) is given by

  (∂φ/∂x)_{ij} = (-3w_{i,j} + 4w_{i+1,j} - w_{i+2,j})/(2Δx).

Tables 1 and 2 list some of the more commonly used central and one-sided approximations. In these tables the numbers denote the weights at the nodes, and the whole expression is multiplied by 1/(Δx)^j, where j denotes the order of the derivative, ie j = 1 for a first derivative and j = 2 for a second derivative.
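The tabulated weights are easy to verify numerically. A sketch testing the second-order forward formula above (test function and step sizes are our own; the error ratio should approach 4 as the step is halved):

```python
import math

def forward_first(u, x, h):
    # one-sided weights (-3/2, 2, -1/2), multiplied by 1/dx
    return (-1.5 * u(x) + 2.0 * u(x + h) - 0.5 * u(x + 2 * h)) / h

# u = exp, so u'(0) = 1 exactly
errs = [abs(forward_first(math.exp, 0.0, h) - 1.0)
        for h in (0.1, 0.05, 0.025)]
ratios = [errs[k] / errs[k + 1] for k in range(2)]   # ~4 for O(h^2)
```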
Table 1: Weights for central differences

                        Node points
  Order of accuracy    i-2      i-1     i       i+1     i+2
  1st derivative
  (Δx)^2                        -1/2    0       1/2
  (Δx)^4               1/12     -2/3    0       2/3     -1/12
  2nd derivative
  (Δx)^2                        1       -2      1
  (Δx)^4               -1/12    4/3     -5/2    4/3     -1/12

Table 2: Weights for one-sided differences

                        Node points
  Order of accuracy    i        i+1     i+2     i+3     i+4
  1st derivative
  (Δx)                 -1       1
  (Δx)^2               -3/2     2       -1/2
  (Δx)^3               -11/6    3       -3/2    1/3
  (Δx)^4               -25/12   4       -3      4/3     -1/4
  2nd derivative
  (Δx)                 1        -2      1
  (Δx)^2               2        -5      4       -1
  (Δx)^3               35/12    -26/3   19/2    -14/3   11/12

5.3.3 Mixed derivatives

The techniques for finding suitable discrete approximations for mixed derivatives are exactly the same, except this time we make use of a multidimensional
Taylor expansion. Thus for example second order approximations to ∂^2φ/∂x∂y at the point (i, j) are given by

  (∂^2φ/∂x∂y)_{ij} = (w_{i+1,j+1} - w_{i-1,j+1} + w_{i-1,j-1} - w_{i+1,j-1})/(4 Δx Δy) + O((Δx)^2, (Δy)^2),

or

  (∂^2φ/∂x∂y)_{ij} = (w_{i+1,j+1} - w_{i+1,j} - w_{i,j+1} + w_{i-1,j-1} - w_{i-1,j} - w_{i,j-1} + 2w_{i,j})/(2 Δx Δy) + O((Δx)^2, (Δy)^2).
5.4 Solution of elliptic pdes

A prototype elliptic pde is Poisson's equation given by

  ∂^2φ/∂x^2 + ∂^2φ/∂y^2 = f(x, y),   (47)

where f(x, y) is a known/given function. The equation (47) has to be solved in a domain D, say, with boundary conditions given on the boundary ∂D of D. These can be of three types:

- Dirichlet: φ = g(x, y) on ∂D.
- Neumann: ∂φ/∂n = g(x, y) on ∂D.
- Robin/Mixed: B(φ, ∂φ/∂n) = 0 on ∂D. Robin boundary conditions involve a linear combination of φ and its normal derivative on the boundary. Mixed boundary conditions involve one type of condition on one part of the boundary, and another type on other parts of the boundary.

In the above g(x, y) is a given function, and derivatives with respect to n denote the normal derivative, ie the derivative in the direction n which is normal to the boundary.

Let us consider a model problem with

  ∂^2φ/∂x^2 + ∂^2φ/∂y^2 = f(x, y),   0 < x, y < 1,   (48)
  φ = 0 on ∂D.   (49)

Here the domain D is the square region 0 < x < 1 and 0 < y < 1. Construct a finite difference mesh with points (x_i, y_j), say, where

  x_i = i Δx,  i = 0, 1, ..., N,   y_j = j Δy,  j = 0, 1, ..., M,

where Δx = 1/N and Δy = 1/M are the step sizes in the x and y directions.
Next replace the derivatives in (48) by the approximations from (43),(44) to get

  (w_{i+1,j} - 2w_{i,j} + w_{i-1,j})/(Δx)^2 + (w_{i,j+1} - 2w_{i,j} + w_{i,j-1})/(Δy)^2 = f_{i,j},
      1 ≤ i ≤ N-1,   1 ≤ j ≤ M-1,   (50)

and

  w_{i,j} = 0  if i = 0 or i = N,  0 ≤ j ≤ M,   (51)
  w_{i,j} = 0  if j = 0 or j = M,  0 ≤ i ≤ N.   (52)

Thus we have (N-1)(M-1) unknown values w_{i,j} to find at the interior points of the domain. If we write

  w_i = (w_{i,1}, w_{i,2}, ..., w_{i,M-1})^T,   f_i = (f_{i,1}, f_{i,2}, ..., f_{i,M-1})^T,
we can write the above system of equations (50) in matrix form as

  [ B  I            ] [ w_1     ]            [ f_1     ]
  [ I  B  I         ] [ w_2     ]            [ f_2     ]
  [    I  B  I      ] [ w_3     ]  = (Δx)^2  [ f_3     ]
  [       .  .  .   ] [ ...     ]            [ ...     ]
  [           I  B  ] [ w_{N-1} ]            [ f_{N-1} ].

In the above I is the (M-1) × (M-1) identity matrix and B is a similarly sized tridiagonal matrix given by

  B = [ b  c            ]
      [ c  b  c         ]
      [    c  b  c      ]   with b = -2(1 + λ^2),  c = λ^2,  λ = Δx/Δy
      [       .  .  .   ]
      [           c  b  ]

(so that c = 1 when Δx = Δy). What we observe is that the matrix is very sparse, and can be very large even with a modest number of grid points. For instance with N = M = 101 the linear system is of size 10^4 × 10^4 and we have 10^4 unknowns to find.

To solve the system of equations arising from the above discretization, one has recourse to a number of methods. Direct methods are not feasible because of the size of the matrices involved, and in addition they are expensive and require a lot of storage, unless one exploits the sparsity pattern of the matrices. Iterative methods offer an alternative and are easy to use but, depending on the method, convergence can be slow with an increasing number of mesh points.
5.5 Iterative methods for the solution of linear systems
5.5.1 Jacobi method
Consider (50) which we can rewrite as
w
i,j
=
1
2(1 +
2
)
_
w
i+1,j
+w
i1,j
+
2
(w
i,j+1
+w
i,j1
) (
x
)
2
f
i,j
_
, (53)
with =
x
/
y
. This suggests the iterative scheme
w
new
i,j
=
1
2(1 +
2
)
_
w
old
i+1,j
+w
old
i1,j
+
2
(w
old
i,j+1
+w
old
i,j1
_
(
x
)
2
f
i,j
). (54)
We start with a guess for w
old
i,j
and compute a new value from (54) and continue
iterating until suitable convergence criteria are satised. What are suitable con
vergence criteria? Ideally we wish to stop when the corrections are small, or when
the equations (53) are satised to sucient accuracy.
Suppose we write the linear system as

\[ A v = f, \]

where v is the exact solution of the linear system. Then if w is an approximate solution, the error e is defined by e = v - w. But this is not much use unless we know the exact solution. On the other hand we can write

\[ A e = A(v - w) = f - A w. \]

The residual is then defined by r = f - Aw, which can be computed. For the Jacobi scheme the residual is given by

\[
r_{i,j} = (\Delta x)^2 f_{i,j} + 2(1+\beta^2) w_{i,j} - \left[ w_{i+1,j} + w_{i-1,j} + \beta^2 \left( w_{i,j+1} + w_{i,j-1} \right) \right].
\]

Therefore a suitable stopping condition might be

\[
\max_{i,j} |r_{i,j}| < \epsilon_1, \quad \text{or} \quad \sum_{i,j} r_{i,j}^2 < \epsilon_2.
\]
5.5.2 Gauss-Seidel iteration
The Jacobi scheme involves two levels of storage, w^{new}_{i,j} and w^{old}_{i,j}, which can be wasteful. It is more economical to overwrite values as and when they are computed, and use the new values when ready. This gives rise to the Gauss-Seidel scheme which, if we sweep with i increasing followed by j, is given by

\[
w^{new}_{i,j} = \frac{1}{2(1+\beta^2)} \left[ w^{old}_{i+1,j} + w^{new}_{i-1,j} + \beta^2 \left( w^{old}_{i,j+1} + w^{new}_{i,j-1} \right) - (\Delta x)^2 f_{i,j} \right], \tag{55}
\]

where the new values overwrite existing values.
5.5.3 Relaxation and the SOR method
Instead of updating the new values as indicated above, it is better to use relaxation. Here we compute

\[
w_{i,j} = (1-\omega)\, w^{old}_{i,j} + \omega\, w^{*}_{i,j},
\]

where \omega is called the relaxation factor, and w^{*}_{i,j} denotes the value as computed by the Jacobi or Gauss-Seidel scheme. \omega = 1 reduces to the Jacobi or Gauss-Seidel scheme. The Gauss-Seidel scheme with \omega > 1 is called the method of successive over-relaxation, or SOR scheme.
5.5.4 Line relaxation
The Jacobi, Gauss-Seidel and SOR schemes are called point relaxation methods. On the other hand we may compute a whole line of new values at a time, leading to the line-relaxation methods. For instance, suppose we write the equations as

\[
w^{new}_{i+1,j} - 2(1+\beta^2)\, w^{new}_{i,j} + w^{new}_{i-1,j} = -\beta^2 \left( w^{old}_{i,j+1} + w^{new}_{i,j-1} \right) + (\Delta x)^2 f_{i,j}; \tag{56}
\]

then starting from j = 1 we may compute the values w_{i,j}, for i = 1, ..., N-1, in one go. The left hand side of (56) can be seen to lead to a tridiagonal system of equations, so we may use a tridiagonal solver to compute the solution, incrementing j until all the points have been computed. The above scheme can be combined with relaxation to give us a line-SOR method. In general, the more implicit the scheme, the better the convergence properties. A variation on the above scheme is to alternate the direction of the sweep, doing the i lines first and on the next iteration doing the j lines first. This leads to the ADI, or alternating direction implicit, scheme.
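The tridiagonal systems produced by (56) can be solved in O(N) operations by forward elimination and back substitution (the Thomas algorithm). The sketch below is a generic solver, assuming diagonal dominance (which (56) satisfies) so that no pivoting is needed; the function name is mine.

```python
def thomas(a, b, c, d):
    """Solve the tridiagonal system a[i] w[i-1] + b[i] w[i] + c[i] w[i+1] = d[i].
    a[0] and c[-1] are ignored. No pivoting: assumes diagonal dominance."""
    n = len(b)
    cp = [0.0] * n          # modified superdiagonal
    dp = [0.0] * n          # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):   # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    w = [0.0] * n
    w[-1] = dp[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        w[i] = dp[i] - cp[i] * w[i + 1]
    return w

# check against a known 3x3 system whose solution is (1, 2, 3)
w = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
```

In a line-relaxation code this solver is called once per grid line per iteration.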
5.6 Convergence properties of basic iteration schemes
Consider the linear system

\[ A x = b, \tag{57} \]

where A = (a_{i,j}) is an n \times n matrix, and x, b are n \times 1 column vectors. Suppose we write the matrix A in the form A = D - L - U, where D, L, U are the diagonal matrix, and the (negatives of the) strictly lower and upper triangular parts of A, ie

\[
D = \begin{pmatrix}
a_{1,1} & & & \\
 & a_{2,2} & & \\
 & & \ddots & \\
 & & & a_{n,n}
\end{pmatrix}, \quad
L = -\begin{pmatrix}
0 & & & \\
a_{2,1} & 0 & & \\
\vdots & \ddots & \ddots & \\
a_{n,1} & \cdots & a_{n,n-1} & 0
\end{pmatrix},
\]
\[
U = -\begin{pmatrix}
0 & a_{1,2} & \cdots & a_{1,n} \\
 & 0 & \ddots & \vdots \\
 & & \ddots & a_{n-1,n} \\
 & & & 0
\end{pmatrix}.
\]
Then (57) can be written as

\[ D x = (L + U) x + b. \]

The Jacobi iteration is then defined as

\[ x^{(k+1)} = D^{-1}(L+U)\, x^{(k)} + D^{-1} b. \tag{58} \]

The Gauss-Seidel iteration is defined by

\[ (D - L)\, x^{(k+1)} = U x^{(k)} + b, \tag{59} \]

or

\[ x^{(k+1)} = (D-L)^{-1} U x^{(k)} + (D-L)^{-1} b. \tag{60} \]

In general an iteration scheme for (57) may be written as

\[ x^{(k+1)} = P x^{(k)} + c, \tag{61} \]

where P is called the iteration matrix and c is a known vector. For the Jacobi scheme we have

\[ P = P_J = D^{-1}(L+U), \]

and for the Gauss-Seidel scheme

\[ P = P_G = (D-L)^{-1} U. \]
The exact solution to the linear system (57) satisfies

\[ x = P x + c. \tag{62} \]

Hence if we define the error at the kth iteration by e^{(k)} = x - x^{(k)}, then by subtracting (61) from (62) we see that the error satisfies the equation

\[ e^{(k+1)} = P e^{(k)} = P^2 e^{(k-1)} = \dots = P^{k+1} e^{(0)}. \tag{63} \]

In order that the error diminishes as k \to \infty we must have

\[ \|P^k\| \to 0 \quad \text{as } k \to \infty. \tag{64} \]

Since \|P^k\| \le \|P\|^k, we see that (64) is true provided \|P\| < 1. From linear algebra it can be proven that (64) holds if and only if

\[ \rho(P) = \max_i |\lambda_i| < 1, \]

where the \lambda_i are the eigenvalues of the matrix P. Note that \rho(P) is called the spectral radius of P. Thus for the iteration to converge we require the spectral radius of the iteration matrix to be less than unity. One further result which is useful is that for large k

\[ \|e^{(k+1)}\| \approx \rho\, \|e^{(k)}\|. \]

Suppose we want to estimate how many iterations it takes to reduce the initial error by a factor \epsilon. Then we see that we need q iterations, where q is the smallest value for which

\[ \rho^q < \epsilon, \quad \text{giving} \quad q \ge q_d = \frac{\ln \epsilon}{\ln \rho}. \]

The quantity -\ln \rho is called the asymptotic rate of convergence of the iteration. It gives a measure of how much the error decreases at each iteration. Thus iteration matrices whose spectral radius is close to 1 will converge slowly.
For the model problem it can be shown that for Jacobi iteration

\[ \rho = \rho(P_J) = \frac{1}{2} \left( \cos\frac{\pi}{N} + \cos\frac{\pi}{M} \right), \]

and for Gauss-Seidel

\[ \rho = \rho(P_G) = \left[ \rho(P_J) \right]^2. \]

If we take N = M and N \gg 1, then for Jacobi iteration we have

\[ q_d = \frac{\ln \epsilon}{\ln\left(1 - \frac{\pi^2}{2N^2}\right)} \approx -\frac{2N^2}{\pi^2}\, \ln \epsilon. \]

Gauss-Seidel converges twice as fast as Jacobi, and requires less storage.
For point SOR the spectral radius depends on the relaxation factor \omega, but for the model problem with the optimum value of \omega and N = M it can be shown that

\[ \rho = \frac{1 - \sin\frac{\pi}{N}}{1 + \sin\frac{\pi}{N}}, \quad \text{giving} \quad q_d \approx -\frac{N}{2\pi}\, \ln \epsilon. \]

Thus SOR with the optimum \omega requires an order of magnitude fewer iterations to converge as compared to Gauss-Seidel. Finally, similar results can also be derived for line SOR, and it can be shown that using optimum values

\[ \rho = \left( \frac{1 - \sin\frac{\pi}{\sqrt{2}N}}{1 + \sin\frac{\pi}{\sqrt{2}N}} \right)^2, \quad q_d \approx -\frac{N}{2\sqrt{2}\,\pi}\, \ln \epsilon. \]
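To get a feel for these estimates, the formulas above can be evaluated directly. The sketch below (an illustration, not part of the notes) compares the predicted iteration counts q_d for Jacobi, Gauss-Seidel and optimal point SOR on an N = M = 100 mesh with ε = 10⁻⁶.

```python
import math

N = 100        # mesh parameter, assuming a square mesh N = M
eps = 1e-6     # required reduction factor of the initial error

rho_jacobi = math.cos(math.pi / N)          # rho(P_J) for N = M
rho_gauss_seidel = rho_jacobi ** 2          # rho(P_G) = rho(P_J)**2
s = math.sin(math.pi / N)
rho_sor = (1.0 - s) / (1.0 + s)             # point SOR at the optimal omega

def q_d(rho):
    # number of iterations with rho**q < eps
    return math.log(eps) / math.log(rho)

q_jacobi = q_d(rho_jacobi)
q_gs = q_d(rho_gauss_seidel)
q_sor = q_d(rho_sor)
```

For N = 100 this gives roughly 28000 Jacobi iterations, half that for Gauss-Seidel, and only a few hundred for optimal SOR, reproducing the O(N²) versus O(N) behaviour quoted above.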
6 Solution of differential equations using Chebychev collocation
In this section we will briefly look at the solution of equations using Chebychev collocation. Spectral methods offer a completely different means for obtaining the solution of a differential equation (ode or pde), and they have a number of major advantages over finite-difference and other similar methods. One is the concept of spectral accuracy. With a second-order finite-difference method, doubling the number of points reduces the truncation error by a factor of 4. With spectral methods, on the other hand, the accuracy of the approximation (for smooth functions) increases exponentially with the number of points. This is referred to as spectral accuracy. There are many different types of spectral methods (for example the tau method, the Galerkin method, Fourier approximation), and a good place to start studying the basic ideas and theory is the book by Canuto et al. (1988). Here we will briefly look at one, namely Chebychev collocation.

With any spectral method the idea is to represent the function as an expansion in terms of some basis functions, usually orthogonal polynomials or, in the case of Fourier approximation, trigonometric functions or complex exponentials. With Chebychev collocation a function is represented in terms of expansions using Chebychev polynomials.
6.1 Basic properties of Chebychev polynomials
A Chebychev polynomial of degree n is defined by

\[ T_n(x) = \cos(n \cos^{-1} x). \tag{65} \]

The Chebychev polynomials are eigenfunctions of the Sturm-Liouville equation

\[ \frac{d}{dx}\left( \sqrt{1-x^2}\, \frac{dT_k(x)}{dx} \right) + \frac{k^2}{\sqrt{1-x^2}}\, T_k(x) = 0. \]

The following properties are easy to prove:

\[ T_{k+1}(x) = 2x\, T_k(x) - T_{k-1}(x), \]

with T_0(x) = 1 and T_1(x) = x. Also

\[ |T_k(x)| \le 1, \quad -1 \le x \le 1, \]
\[ T_k(\pm 1) = (\pm 1)^k, \quad T_k'(\pm 1) = (\pm 1)^{k+1} k^2, \]
\[ 2 T_k(x) = \frac{1}{k+1} \frac{dT_{k+1}(x)}{dx} - \frac{1}{k-1} \frac{dT_{k-1}(x)}{dx}, \quad k > 1. \]
Suppose we expand a (smooth) function u(x) in terms of Chebychev polynomials; then

\[ u(x) = \sum_{k=0}^{\infty} \hat{u}_k T_k(x), \qquad \hat{u}_k = \frac{2}{\pi c_k} \int_{-1}^{1} \frac{u(x)\, T_k(x)}{\sqrt{1-x^2}}\, dx, \]

where

\[ c_k = \begin{cases} 2, & k = 0, \\ 1, & k \ge 1. \end{cases} \]

If we define \bar{u}(\theta) = u(\cos\theta) then

\[ \bar{u}(\theta) = \sum_{k=0}^{\infty} \hat{u}_k \cos(k\theta), \]

so that the Chebychev series for u corresponds to a cosine series for \bar{u}. This implies that if u is infinitely differentiable then the Chebychev coefficients of the expansion will decay faster than algebraically.
In a collocation method we may represent a smooth function in terms of its values at a set of discrete points. Derivatives of the function are approximated by analytic derivatives of the interpolating polynomial. It is common to work with the Gauss-Lobatto points defined by

\[ x_j = \cos\left(\frac{\pi j}{N}\right), \quad j = 0, 1, ..., N. \]

Suppose the given function is approximated at these points and we represent the approximate values of the function u(x) at the points x = x_j by u_j. Here we are assuming that we are working with N+1 points in the interval -1 \le x \le 1. In a given differential equation we need to approximate the derivatives at these node points. The theory says that we can write

\[ \left(\frac{du}{dx}\right)_j = \sum_{k=0}^{N} D_{j,k}\, u_k, \]

where the D_{j,k} are the elements of the Chebychev collocation differentiation matrix D, say. The elements of the matrix D are given by

\[
D_{p,j} = \begin{cases}
\dfrac{c_p}{c_j}\, \dfrac{(-1)^{p+j}}{x_p - x_j}, & p \ne j, \\[2mm]
-\dfrac{x_j}{2(1-x_j^2)}, & 1 \le p = j \le N-1, \\[2mm]
\dfrac{2N^2+1}{6}, & p = j = 0, \\[2mm]
-\dfrac{2N^2+1}{6}, & p = j = N,
\end{cases}
\]

where c_0 = c_N = 2 and c_p = 1 otherwise.
Thus in matrix form

\[
\begin{pmatrix} \left(\frac{dw}{dx}\right)_0 \\ \left(\frac{dw}{dx}\right)_1 \\ \vdots \\ \left(\frac{dw}{dx}\right)_N \end{pmatrix}
= D \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_N \end{pmatrix},
\quad \text{and} \quad
\begin{pmatrix} \left(\frac{d^2w}{dx^2}\right)_0 \\ \left(\frac{d^2w}{dx^2}\right)_1 \\ \vdots \\ \left(\frac{d^2w}{dx^2}\right)_N \end{pmatrix}
= D^2 \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_N \end{pmatrix}.
\]
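The entries of D translate directly into code. The pure-Python sketch below (function names are mine, not the notes') builds D on the Gauss-Lobatto points and checks it by differentiating u = x³, which collocation differentiation reproduces exactly (up to rounding) since x³ is a polynomial of degree at most N.

```python
import math

def cheb_D(N):
    """Chebychev collocation differentiation matrix on the Gauss-Lobatto
    points x_j = cos(pi j / N), using the entries given in the text."""
    x = [math.cos(math.pi * j / N) for j in range(N + 1)]
    c = [2.0 if j in (0, N) else 1.0 for j in range(N + 1)]
    D = [[0.0] * (N + 1) for _ in range(N + 1)]
    for p in range(N + 1):
        for j in range(N + 1):
            if p != j:
                D[p][j] = (c[p] / c[j]) * (-1) ** (p + j) / (x[p] - x[j])
    # diagonal: interior formula plus the two corner values
    for p in range(1, N):
        D[p][p] = -x[p] / (2.0 * (1.0 - x[p] ** 2))
    D[0][0] = (2.0 * N ** 2 + 1.0) / 6.0
    D[N][N] = -(2.0 * N ** 2 + 1.0) / 6.0
    return x, D

# differentiate u = x**3: the result should be 3 x**2 at every node
N = 8
x, D = cheb_D(N)
u = [xi ** 3 for xi in x]
du = [sum(D[p][k] * u[k] for k in range(N + 1)) for p in range(N + 1)]
```

In a differential equation solver, D and D² simply replace the finite-difference matrices in the discrete system.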
7 Parabolic Equations
In this section we will look at the solution of parabolic partial differential equations. The techniques introduced earlier apply equally to parabolic pdes.

One of the simplest parabolic pdes is the diffusion equation, which in one space dimension is

\[ \frac{\partial u}{\partial t} = \kappa\, \frac{\partial^2 u}{\partial x^2}. \tag{66} \]

For two or more space dimensions we have

\[ \frac{\partial u}{\partial t} = \kappa\, \nabla^2 u. \]

In the above \kappa is some given constant.

Another familiar set of parabolic pdes is the boundary layer equations

\[ u_x + v_y = 0, \qquad u_t + u u_x + v u_y = -p_x + u_{yy}, \qquad 0 = -p_y. \]

With a parabolic pde we expect, in addition to boundary conditions, an initial condition at, say, t = 0.

Let us consider (66) in the region a \le x \le b. We will take a uniform mesh in x with x_j = a + j\Delta x, j = 0, 1, ..., N, and \Delta x = (b-a)/N. For the differencing in time we assume a constant step size \Delta t, so that t = t_k = k\Delta t.
7.1 First order central difference approximation
We may approximate (66) by

\[ \frac{w^{k+1}_j - w^k_j}{\Delta t} = \kappa \left( \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right). \tag{67} \]

Here w^k_j denotes an approximation to the exact solution u(x,t) of the pde at x = x_j, t = t_k. The scheme given in (67) is first order in time, O(\Delta t), and second order in space, O(\Delta x^2). Let us assume that we are given a suitable initial condition, and boundary conditions of the form

\[ u(a,t) = f(t), \quad u(b,t) = g(t). \]

The scheme (67) is explicit, because the unknowns at level k+1 can be computed directly. Notice that there is a time lag before the effect of the boundary data is felt on the solution. As we will see later, this scheme is conditionally stable, requiring \lambda \le 1/2, where \lambda = \kappa \Delta t / \Delta x^2. Note that \lambda is sometimes called the Peclet or diffusion number.
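One explicit time step of (67) is a single loop. The sketch below is illustrative (names and the test problem are assumptions): it damps the Fourier mode sin(πx) on [0, 1] with λ = 0.4 ≤ 1/2 and compares against the exact decay factor exp(-κπ²t).

```python
import math

def ftcs_step(w, lam, f_left, g_right):
    """One step of the explicit scheme (67) for u_t = kappa u_xx,
    with lam = kappa dt / dx**2 (stable only for lam <= 1/2)."""
    new = w[:]
    for j in range(1, len(w) - 1):
        new[j] = w[j] + lam * (w[j + 1] - 2.0 * w[j] + w[j - 1])
    new[0], new[-1] = f_left, g_right        # boundary conditions (70)-style
    return new

N = 20
dx = 1.0 / N
kappa = 1.0
dt = 0.4 * dx * dx / kappa                   # lam = 0.4, inside the limit
w = [math.sin(math.pi * j * dx) for j in range(N + 1)]
for _ in range(50):
    w = ftcs_step(w, kappa * dt / dx ** 2, 0.0, 0.0)
t = 50 * dt
exact = math.exp(-kappa * math.pi ** 2 * t) * math.sin(math.pi * 10 * dx)
```

Repeating the run with λ > 1/2 produces the grid-scale oscillations characteristic of the instability analysed below.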
7.2 Fully implicit, first order
A better approximation is one which makes use of an implicit scheme. Then instead of (67) we have

\[ \frac{w^{k+1}_j - w^k_j}{\Delta t} = \kappa \left( \frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta x^2} \right). \tag{68} \]

The unknowns at level k+1 are coupled together and we have a set of implicit equations to solve. If we rearrange (68) then

\[ \lambda w^{k+1}_{j-1} - (1 + 2\lambda)\, w^{k+1}_j + \lambda w^{k+1}_{j+1} = -w^k_j, \quad 1 \le j \le N-1. \tag{69} \]

In addition, approximation of the boundary conditions gives

\[ w^{k+1}_0 = f(t_{k+1}), \quad w^{k+1}_N = g(t_{k+1}). \tag{70} \]

The discrete equations (69), (70) are of tridiagonal form and thus easily solved. The scheme is unconditionally stable. The accuracy of the above fully implicit scheme is only first order in time. We can try and improve on this with a second order scheme.
7.3 Richardson method
Consider

\[ \frac{w^{k+1}_j - w^{k-1}_j}{2\Delta t} = \kappa \left( \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right). \tag{71} \]

This uses three time levels and has accuracy O(\Delta t^2, \Delta x^2). The scheme was devised by a meteorologist and is unconditionally unstable!
7.4 DuFort-Frankel
This uses the approximation

\[ \frac{w^{k+1}_j - w^{k-1}_j}{2\Delta t} = \kappa \left( \frac{w^k_{j+1} - w^{k+1}_j - w^{k-1}_j + w^k_{j-1}}{\Delta x^2} \right). \tag{72} \]

This has truncation error O(\Delta t^2, \Delta x^2, (\Delta t/\Delta x)^2), and is an explicit scheme. The scheme is unconditionally stable, but is inconsistent if \Delta t \to 0, \Delta x \to 0 with \Delta t / \Delta x remaining fixed.
7.5 Crank-Nicolson
A popular scheme is the Crank-Nicolson scheme, given by

\[ \frac{w^{k+1}_j - w^k_j}{\Delta t} = \frac{\kappa}{2} \left( \frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta x^2} + \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right). \tag{73} \]

This is second order accurate, O(\Delta t^2, \Delta x^2), and is unconditionally stable. (Taking very large time steps can however cause problems.) As can be seen, it is also an implicit scheme.
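Each Crank-Nicolson step is one tridiagonal solve, as in the rearranged form (83) used later. The sketch below (an illustration, with my own names and a built-in tridiagonal solver so it is self-contained) takes λ = 2, well beyond the explicit limit, and still tracks the exact decay of sin(πx).

```python
import math

def solve_tridiag(a, b, c, d):
    """Thomas algorithm for a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i]."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def crank_nicolson_step(w, lam):
    """One step of (73), zero Dirichlet boundaries:
    -lam w_{j-1} + (2+2 lam) w_j - lam w_{j+1} at the new level equals
     lam w_{j-1} + (2-2 lam) w_j + lam w_{j+1} at the old level."""
    n = len(w) - 2                      # interior unknowns
    a = [-lam] * n
    b = [2.0 + 2.0 * lam] * n
    c = [-lam] * n
    d = [lam * w[j - 1] + (2.0 - 2.0 * lam) * w[j] + lam * w[j + 1]
         for j in range(1, n + 1)]
    return [0.0] + solve_tridiag(a, b, c, d) + [0.0]

N = 20
dx = 1.0 / N
lam = 2.0                               # kappa dt / dx**2, unstable for (67)
dt = lam * dx * dx                      # taking kappa = 1
w = [math.sin(math.pi * j * dx) for j in range(N + 1)]
for _ in range(20):
    w = crank_nicolson_step(w, lam)
exact = math.exp(-math.pi ** 2 * 20 * dt) * math.sin(math.pi * 10 * dx)
```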
7.6 Multi-space dimensions
The schemes outlined above are easily extended to multi-dimensions. Thus in two space dimensions a first order explicit approximation to

\[ \frac{\partial u}{\partial t} = \kappa\, \nabla^2 u \]

is

\[ \frac{w^{k+1}_{i,j} - w^k_{i,j}}{\Delta t} = \kappa \left( \frac{w^k_{i+1,j} - 2w^k_{i,j} + w^k_{i-1,j}}{\Delta x^2} + \frac{w^k_{i,j+1} - 2w^k_{i,j} + w^k_{i,j-1}}{\Delta y^2} \right). \tag{74} \]

This is first order in \Delta t and second order in space. It is conditionally stable, requiring

\[ \kappa \Delta t \left( \frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} \right) \le \frac{1}{2}. \]
If we use a fully implicit scheme we would obtain

\[ \frac{w^{k+1}_{i,j} - w^k_{i,j}}{\Delta t} = \kappa \left( \frac{w^{k+1}_{i+1,j} - 2w^{k+1}_{i,j} + w^{k+1}_{i-1,j}}{\Delta x^2} + \frac{w^{k+1}_{i,j+1} - 2w^{k+1}_{i,j} + w^{k+1}_{i,j-1}}{\Delta y^2} \right). \tag{75} \]

This leads to an implicit system of equations of the form

\[ \lambda \left( w^{k+1}_{i+1,j} + w^{k+1}_{i-1,j} \right) - (2\lambda + 2\mu + 1)\, w^{k+1}_{i,j} + \mu \left( w^{k+1}_{i,j-1} + w^{k+1}_{i,j+1} \right) = -w^k_{i,j}, \]

where \lambda = \kappa \Delta t / \Delta x^2, \mu = \kappa \Delta t / \Delta y^2. The form of the discrete equations is very much like the system of equations arising in elliptic pdes.
From the computational point of view a better scheme is

\[ \frac{w^{k+1/2}_{i,j} - w^k_{i,j}}{\Delta t / 2} = \kappa \left( \frac{w^{k+1/2}_{i+1,j} - 2w^{k+1/2}_{i,j} + w^{k+1/2}_{i-1,j}}{\Delta x^2} + \frac{w^k_{i,j+1} - 2w^k_{i,j} + w^k_{i,j-1}}{\Delta y^2} \right), \]

\[ \frac{w^{k+1}_{i,j} - w^{k+1/2}_{i,j}}{\Delta t / 2} = \kappa \left( \frac{w^{k+1/2}_{i+1,j} - 2w^{k+1/2}_{i,j} + w^{k+1/2}_{i-1,j}}{\Delta x^2} + \frac{w^{k+1}_{i,j+1} - 2w^{k+1}_{i,j} + w^{k+1}_{i,j-1}}{\Delta y^2} \right), \tag{76} \]

which leads to tridiagonal systems of equations, similar to the ADI scheme. The above scheme is second order in time and space and also unconditionally stable.
7.7 Consistency revisited
Let us consider the truncation error for the first order central (explicit) scheme (67), and also the DuFort-Frankel scheme (72).

If u(x,t) is the exact solution then we may write u^k_j = u(x_j, t_k), and thus from a Taylor series expansion

\[ u^{k+1}_j = u(x_j, t_k + \Delta t) = u^k_j + \Delta t \left(\frac{\partial u}{\partial t}\right)_{j,k} + \frac{\Delta t^2}{2} \left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + O(\Delta t^3), \tag{77} \]

and

\[ u^k_{j+1} = u(x_j + \Delta x, t_k) = u^k_j + \Delta x \left(\frac{\partial u}{\partial x}\right)_{j,k} + \frac{\Delta x^2}{2} \left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} + \frac{\Delta x^3}{6} \left(\frac{\partial^3 u}{\partial x^3}\right)_{j,k} + \frac{\Delta x^4}{24} \left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^5), \tag{78} \]

and

\[ u^k_{j-1} = u(x_j - \Delta x, t_k) = u^k_j - \Delta x \left(\frac{\partial u}{\partial x}\right)_{j,k} + \frac{\Delta x^2}{2} \left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} - \frac{\Delta x^3}{6} \left(\frac{\partial^3 u}{\partial x^3}\right)_{j,k} + \frac{\Delta x^4}{24} \left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^5). \tag{79} \]

Using (77), (78), (79) and substituting into (67) gives

\[ \left(\frac{\partial u}{\partial t}\right)_{j,k} + \frac{\Delta t}{2} \left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + O(\Delta t^2) = \kappa \left[ \left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} + \frac{\Delta x^2}{12} \left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^4) \right], \]

from which we obtain

\[ \left( \frac{\partial u}{\partial t} - \kappa \frac{\partial^2 u}{\partial x^2} \right)_{j,k} = -\frac{\Delta t}{2} \left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + \frac{\kappa \Delta x^2}{12} \left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k}. \tag{80} \]

Thus (80) shows that as \Delta t \to 0 and \Delta x \to 0 the original pde is satisfied, and the right hand side of (80) implies a truncation error O(\Delta t, \Delta x^2).
If we do the same for the DuFort-Frankel scheme we find that

\[ \frac{u^{k+1}_j - u^{k-1}_j}{2\Delta t} - \kappa \left( \frac{u^k_{j+1} - u^{k+1}_j - u^{k-1}_j + u^k_{j-1}}{\Delta x^2} \right) = \left[ \frac{\partial u}{\partial t} - \kappa \frac{\partial^2 u}{\partial x^2} + \kappa \left(\frac{\Delta t}{\Delta x}\right)^2 \frac{\partial^2 u}{\partial t^2} \right]_{j,k} + O(\Delta t^2, \Delta x^2). \tag{81} \]

This shows that the DuFort-Frankel scheme is only consistent if the step sizes approach zero with \Delta t / \Delta x \to 0 simultaneously. Otherwise, if we take step sizes such that \Delta t / \Delta x remains constant as both step sizes approach zero, then (81) shows that we are solving the wrong equation.
7.8 Stability
Consider the first order explicit scheme, which can be written as

\[ w^{k+1}_j = \lambda w^k_{j-1} + (1 - 2\lambda)\, w^k_j + \lambda w^k_{j+1}, \quad 1 \le j \le N-1, \]

with w^k_0, w^k_N given. We can write the above in matrix form as

\[
\mathbf{w}^{k+1} = \begin{pmatrix}
(1-2\lambda) & \lambda & & \\
\lambda & (1-2\lambda) & \lambda & \\
 & \ddots & \ddots & \ddots \\
 & & \lambda & (1-2\lambda)
\end{pmatrix} \mathbf{w}^k, \tag{82}
\]

where \mathbf{w}^k = (w^k_1, w^k_2, ..., w^k_{N-1})^T. Thus (82) can be written as

\[ \mathbf{w}^{k+1} = A \mathbf{w}^k. \]

If we recall the results from the discussion of convergence of iterative methods, the above scheme is stable if \|A\| \le 1. Now the infinity norm \|A\|_\infty is defined by

\[ \|A\|_\infty = \max_i \sum_{j=1}^{N} |a_{i,j}| \]

for an N \times N matrix A. Thus from (82), if (1-2\lambda) \ge 0 we have

\[ \|A\|_\infty = 2\lambda + 1 - 2\lambda = 1. \]

If (1-2\lambda) < 0 then

\[ \|A\|_\infty = 2\lambda + 2\lambda - 1 = 4\lambda - 1 > 1. \]

Thus we have proved that the explicit scheme is unstable if \lambda > 1/2.
7.8.1 Stability of Crank-Nicolson scheme
The Crank-Nicolson scheme (73) may be written as

\[ -\lambda w^{k+1}_{j-1} + (2 + 2\lambda)\, w^{k+1}_j - \lambda w^{k+1}_{j+1} = \lambda w^k_{j-1} + (2 - 2\lambda)\, w^k_j + \lambda w^k_{j+1}, \quad j = 1, 2, ..., N-1, \tag{83} \]

where \lambda = \kappa \Delta t / \Delta x^2. In matrix form this can be written as

\[
\begin{pmatrix}
(2+2\lambda) & -\lambda & & \\
-\lambda & (2+2\lambda) & -\lambda & \\
 & \ddots & \ddots & \ddots \\
 & & -\lambda & (2+2\lambda)
\end{pmatrix} \mathbf{w}^{k+1}
=
\begin{pmatrix}
(2-2\lambda) & \lambda & & \\
\lambda & (2-2\lambda) & \lambda & \\
 & \ddots & \ddots & \ddots \\
 & & \lambda & (2-2\lambda)
\end{pmatrix} \mathbf{w}^k, \tag{84}
\]

where \mathbf{w}^k = (w^k_1, w^k_2, ..., w^k_{N-1})^T. Thus (84) is of the form

\[ B \mathbf{w}^{k+1} = A \mathbf{w}^k, \]

where B = 2I_{N-1} - \lambda S_{N-1} and A = 2I_{N-1} + \lambda S_{N-1}, I_N is the N \times N identity matrix, and S_{N-1} is the (N-1) \times (N-1) matrix

\[
S_{N-1} = \begin{pmatrix}
-2 & 1 & & \\
1 & -2 & 1 & \\
 & \ddots & \ddots & \ddots \\
 & & 1 & -2
\end{pmatrix}.
\]

Thus the Crank-Nicolson scheme will be stable if the spectral radius of the matrix B^{-1}A is less than unity, ie \rho(B^{-1}A) < 1. We therefore need the eigenvalues of the matrix B^{-1}A. To find these we need to invoke some theorems from linear algebra. Recall that \mu is an eigenvalue of the matrix S, and x a corresponding eigenvector, if

\[ S x = \mu x. \]
Thus for any integer p

\[ S^p x = S^{p-1} S x = \mu S^{p-1} x = \dots = \mu^p x. \]

Hence the eigenvalues of S^p are \mu^p, with eigenvector x. Extending this result, if P(S) is the matrix polynomial

\[ P(S) = a_0 S^n + a_1 S^{n-1} + \dots + a_n I, \]

then

\[ P(S)\, x = P(\mu)\, x, \quad \text{and} \quad P^{-1}(S)\, x = \frac{1}{P(\mu)}\, x. \]

Finally, if Q(S) is any other polynomial in S then we see that

\[ P^{-1}(S) Q(S)\, x = \frac{Q(\mu)}{P(\mu)}\, x. \]

If we let P = B(S_{N-1}) = 2I_{N-1} - \lambda S_{N-1} and Q = A(S_{N-1}) = 2I_{N-1} + \lambda S_{N-1}, then the eigenvalues \nu of the matrix B^{-1}A are given by

\[ \nu = \frac{2 + \lambda\mu}{2 - \lambda\mu}, \]

where \mu is an eigenvalue of the matrix S_{N-1}.
Now the eigenvalues of the N \times N matrix

\[
T = \begin{pmatrix}
a & b & & \\
c & a & b & \\
 & \ddots & \ddots & \ddots \\
 & & c & a
\end{pmatrix}
\]

can be shown to be given by

\[ \lambda_n = a + 2\sqrt{bc}\, \cos\frac{n\pi}{N+1}, \quad n = 1, 2, ..., N. \]

Hence the eigenvalues of S_{N-1} are

\[ \mu_n = -2 + 2\cos\frac{n\pi}{N} = -4\sin^2\frac{n\pi}{2N}, \quad n = 1, 2, ..., N-1, \]

and so the eigenvalues of B^{-1}A are

\[ \nu_n = \frac{2 - 4\lambda \sin^2\frac{n\pi}{2N}}{2 + 4\lambda \sin^2\frac{n\pi}{2N}}, \quad n = 1, 2, ..., N-1. \]

Clearly

\[ \rho(B^{-1}A) = \max_n |\nu_n| < 1 \quad \text{for all } \lambda > 0. \]

This proves that the Crank-Nicolson scheme is unconditionally stable.
7.9 Stability condition allowing exponential growth
In the above discussion of stability we have said that the solution of

\[ \mathbf{w}^{k+1} = A \mathbf{w}^k \]

is stable if \|A\| \le 1. This condition does not make allowance for solutions of the pde which may be growing exponentially in time. A necessary and sufficient condition for stability when the solution of the pde is increasing exponentially in time is that

\[ \|A\| \le 1 + M \Delta t = 1 + O(\Delta t), \]

where M is a constant independent of \Delta x and \Delta t.
7.10 von Neumann stability analysis
The matrix method for analysing the stability of a scheme can become difficult for more complicated schemes and in multi-dimensional situations. A very versatile tool in this regard is the Fourier method developed by von Neumann. Here initial values at mesh points are expressed in terms of a finite Fourier series, and we consider the growth of individual Fourier components. A finite sine or cosine series expansion in the interval a \le x \le b takes the form

\[ \sum_n a_n \sin\left(\frac{n\pi x}{L}\right), \quad \text{or} \quad \sum_n b_n \cos\left(\frac{n\pi x}{L}\right), \]

where L = b - a. Now consider an individual component written in complex exponential form at a mesh point x = x_p = a + p\Delta x:

\[ A_n e^{in\pi x / L} = A_n e^{in\pi a / L}\, e^{in\pi p \Delta x / L} = \tilde{A}_n e^{i\sigma_n p \Delta x}, \]

where \sigma_n = n\pi/L. Given initial data we can express the initial values as

\[ w^0_p = \sum_{n=0}^{N} \tilde{A}_n e^{i\sigma_n p \Delta x}, \quad p = 0, 1, ..., N, \]

and we have N+1 equations to determine the N+1 unknowns \tilde{A}_n. To find how each Fourier mode develops in time, assume a simple separable solution of the form

\[ w^k_p = e^{i\sigma_n p \Delta x}\, e^{\alpha t_k} = e^{i\sigma_n p \Delta x}\, e^{\alpha k \Delta t} = e^{i\sigma_n p \Delta x}\, \xi^k, \]

where \xi = e^{\alpha \Delta t}. Here \xi is called the amplification factor. For stability we thus require |\xi| \le 1. If the exact solution of the pde grows exponentially, then the difference scheme will allow such solutions if

\[ |\xi| \le 1 + M \Delta t, \]

where M does not depend on \Delta x or \Delta t.
7.10.1 Fully implicit scheme
Consider the fully implicit scheme

\[ \frac{w^{k+1}_j - w^k_j}{\Delta t} = \kappa \left( \frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta x^2} \right). \tag{85} \]

Let

\[ w^k_j = \xi^k e^{i\sigma_n j \Delta x}. \]

Then substituting into (85) gives

\[ \xi^k (\xi - 1)\, e^{i\sigma_n j \Delta x} = \lambda\, \xi^{k+1} \left( e^{i\sigma_n \Delta x} - 2 + e^{-i\sigma_n \Delta x} \right) e^{i\sigma_n j \Delta x}. \]

Thus with \lambda = \kappa \Delta t / \Delta x^2,

\[ \xi - 1 = \lambda \xi \left( 2\cos(\sigma_n \Delta x) - 2 \right) = -4\lambda \xi \sin^2\left(\frac{\sigma_n \Delta x}{2}\right). \]

This gives

\[ \xi = \frac{1}{1 + 4\lambda \sin^2\left(\frac{\sigma_n \Delta x}{2}\right)}, \]

and clearly 0 < \xi \le 1 for all \lambda > 0 and for all \sigma_n. Thus the fully implicit scheme is unconditionally stable.
7.10.2 Richardson's scheme
The Richardson scheme is given by

\[ \frac{w^{k+1}_j - w^{k-1}_j}{2\Delta t} = \kappa \left( \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right). \tag{86} \]

Using a von Neumann analysis and writing

\[ w^k_p = \xi^k e^{i\sigma_n p \Delta x} \]

gives, after substitution into (86),

\[ e^{i\sigma_n p \Delta x}\, \xi^{k-1} (\xi^2 - 1) = \nu\, \xi^k \left( e^{i\sigma_n \Delta x} - 2 + e^{-i\sigma_n \Delta x} \right) e^{i\sigma_n p \Delta x}. \]

This gives

\[ \xi^2 - 1 = -4\nu \xi \sin^2\left(\frac{\sigma_n \Delta x}{2}\right), \]

where \nu = 2\kappa \Delta t / \Delta x^2. Thus

\[ \xi^2 + 4\nu \sin^2\left(\frac{\sigma_n \Delta x}{2}\right) \xi - 1 = 0. \]

This quadratic has two roots \xi_1, \xi_2. The sum and product of the roots are given by

\[ \xi_1 + \xi_2 = -4\nu \sin^2\left(\frac{\sigma_n \Delta x}{2}\right), \quad \xi_1 \xi_2 = -1. \tag{87} \]

For stability we require |\xi_1| \le 1 and |\xi_2| \le 1, and (87) shows that if |\xi_1| < 1 then |\xi_2| > 1, and vice-versa. Also, if \xi_1 = 1 and \xi_2 = -1 then again from (87) we must have \nu = 0. Thus the Richardson scheme is unconditionally unstable.
7.10.3 von Neumann analysis in multi-dimensions
For two or more dimensions, the Fourier method extends easily. For instance, for two space dimensions we would seek solutions of the form

\[ w^k_{p,q} = e^{i\sigma_n p \Delta x}\, e^{i\tau_m q \Delta y}\, \xi^k. \]
7.11 Advanced iterative methods
We have looked at a number of simple iterative techniques such as point SOR, line SOR and ADI. In this section we will consider more advanced techniques such as the conjugate gradient method, and minimal residual methods.

7.11.1 Conjugate Gradient Method
This method was first put forward by Hestenes & Stiefel (1952) and was reinvented and popularised in the 1970s. The method has one important property: in the absence of round-off error it converges to the exact solution of the linear system in a finite number of iterations. Used on its own the method has a number of drawbacks, but combined with a good preconditioner it is an effective tool for solving linear equations iteratively.
Consider

\[ A x = b, \tag{88} \]

where A is an n \times n symmetric and positive definite matrix. Recall A is symmetric if A^T = A and positive definite if x^T A x > 0 for any non-zero vector x. Consider the quadratic form

\[ \phi(x) = \frac{1}{2} x^T A x - x^T b. \tag{89} \]

We see that

\[ \phi(x) = \frac{1}{2} x^T (A x - b) - \frac{1}{2} x^T b, \]

and from the positive definiteness of A the quadratic form has a unique minimum -\frac{1}{2} b^T A^{-1} b when x = A^{-1} b.

We will try and find a solution to (88) iteratively. Let x_k be the kth iterate. Now we know that \phi decreases most rapidly in the direction -\nabla\phi, and from (89) we have

\[ -\nabla \phi(x_k) = b - A x_k = r_k. \]

If the residual r_k \ne 0 then we can find an \alpha such that \phi(x_k + \alpha r_k) < \phi(x_k). In the method of steepest descent we choose \alpha to minimize

\[ \phi(x_k + \alpha r_k) = \frac{1}{2} (x_k^T + \alpha r_k^T) A (x_k + \alpha r_k) - (x_k^T + \alpha r_k^T) b \]
\[ = \frac{1}{2} x_k^T A x_k + \alpha\, r_k^T A x_k + \frac{1}{2} \alpha^2 r_k^T A r_k - x_k^T b - \alpha\, r_k^T b, \tag{90} \]

where we have used x_k^T A r_k = (x_k^T A r_k)^T = r_k^T A x_k. Differentiating (90) with respect to \alpha and equating to zero to find the minimum gives

\[ 0 = r_k^T A x_k + \alpha\, r_k^T A r_k - r_k^T b, \]

which gives

\[ \alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}. \tag{91} \]
The problem with the steepest descent algorithm as constructed above is that convergence can be slow, especially if A is ill-conditioned. The condition number of a matrix A is defined by

\[ \kappa(A) = \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}, \]

where the \lambda_i are the eigenvalues of the matrix A. The matrix is said to be ill-conditioned if \kappa is large. The steepest descent algorithm works well if \kappa is small, but for the system of equations arising from a second-order finite difference approximation to Poisson's equation, \kappa is very large. In a sense the steepest descent method is slow because the search directions are too similar. We can try and search for the new iterate in another direction, say p_k. To minimize \phi(x_{k-1} + \alpha p_k) we must choose, as in (91),

\[ \alpha_k = \frac{p_k^T r_{k-1}}{p_k^T A p_k}, \]

where now the search direction is different from the direction of the gradient. Then

\[ \phi(x_{k-1} + \alpha_k p_k) = \frac{1}{2} (x_{k-1}^T + \alpha_k p_k^T) A (x_{k-1} + \alpha_k p_k) - (x_{k-1}^T + \alpha_k p_k^T) b \]
\[ = \phi(x_{k-1}) - \frac{1}{2} \frac{(p_k^T r_{k-1})^2}{p_k^T A p_k}, \]

and \phi is reduced provided that p_k is not orthogonal to r_{k-1}. Thus the algorithm loops over

\[ r_{k-1} = b - A x_{k-1}, \qquad \alpha_k = \frac{p_k^T r_{k-1}}{p_k^T A p_k}, \qquad x_k = x_{k-1} + \alpha_k p_k. \tag{92} \]
Thus x_k is a linear combination of the vectors p_1, p_2, ..., p_k. How can we choose the p_i? It turns out that we can choose them so that each x_k solves the problem

\[ \min \phi(x), \quad x \in \text{span}(p_1, p_2, ..., p_k), \]

when we choose \alpha to solve

\[ \min_\alpha \phi(x_{k-1} + \alpha p_k), \]

provided that we make the p_i A-conjugate.

Define an n \times k matrix P_k by P_k = [p_1, p_2, ..., p_k]. Then if x \in \text{span}(p_1, p_2, ..., p_k) there exists a (k-1)-vector y such that

\[ x = P_{k-1} y + \alpha p_k, \]

and

\[ \phi(x) = \phi(P_{k-1} y) + \alpha\, y^T P_{k-1}^T A p_k + \frac{\alpha^2}{2} p_k^T A p_k - \alpha\, p_k^T b. \]

Choosing the \{p_i\} so that

\[ P_{k-1}^T A p_k = 0 \]

decouples the global part of the minimization from the part in the p_k direction. Then the required global minimization of \phi(x) is obtained by choosing

\[ P_{k-1} y = x_{k-1}, \quad \text{and} \quad \alpha_k = \frac{p_k^T b}{p_k^T A p_k} = \frac{p_k^T r_{k-1}}{p_k^T A p_k}, \]

as before.

From the conjugacy condition,

\[ p_i^T A p_k = 0, \quad i = 1, 2, ..., k-1, \]

ie p_k is A-conjugate to p_1, p_2, ..., p_{k-1}. Thus the \{p_i\} are linearly independent, and hence x_n, which is a linear combination of p_1, p_2, ..., p_n, is the exact solution of

\[ A x = b. \]

The conjugacy condition does not determine p_k uniquely. To do this we choose p_k to be closest to r_{k-1}, because r_{k-1} gives the most rapid reduction of \phi(x). It can be shown (eg, see Golub & van Loan) that

the residuals r_1, r_2, ..., r_{k-1} are orthogonal,
\[ p_i^T r_j = 0, \quad i = 1, 2, ..., j, \]
\[ r_j = r_{j-1} - \alpha_j A p_j, \]
\[ p_k = r_{k-1} + \beta_k p_{k-1}, \quad \text{where} \quad \beta_k = -\frac{p_{k-1}^T A r_{k-1}}{p_{k-1}^T A p_{k-1}}. \]
7.11.2 Conjugate Gradient Algorithm
Using these results we obtain the following algorithm:

r_0 = b - A x_0
p_0 = r_0
FOR i = 0, ..., n DO
BEGIN
    \alpha_i = (r_i, r_i) / (p_i, A p_i)
    x_{i+1} = x_i + \alpha_i p_i
    r_{i+1} = r_i - \alpha_i A p_i
    \beta_i = (r_{i+1}, r_{i+1}) / (r_i, r_i)
    p_{i+1} = r_{i+1} + \beta_i p_i
END
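The loop above maps line by line into code. The sketch below is a minimal dense-matrix version for illustration (the names and the small test matrix, the 1-D discrete Laplacian tridiag(-1, 2, -1), are assumptions, not part of the notes); in practice A would be applied matrix-free, as discussed next.

```python
def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Conjugate gradient for a symmetric positive definite A
    (list of lists), following the algorithm in the text."""
    n = len(b)
    if max_iter is None:
        max_iter = n
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    p = r[:]
    for _ in range(max_iter):
        rr = dot(r, r)
        if rr < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r, r) / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x

# SPD test problem: tridiag(-1, 2, -1) x = (1, ..., 1)^T
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = conjugate_gradient(A, b, [0.0] * n)
```

For this system the exact solution is x_i = i(n+1-i)/2 (1-based i), and the iteration terminates within n steps, illustrating the finite-termination property.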
In practice x_n \ne x because of rounding errors, and the residuals are not exactly orthogonal. It can be shown that

\[ \|x - x_k\|_A \le 2\, \|x - x_0\|_A \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k, \]

so that convergence can be slow for very large \kappa. Preconditioning of A is required for the CG method to be useful.
7.11.3 Implementation issues
Note that the algorithm requires the matrix-vector product Ap to be computed. The matrix A is usually very large and sparse. As far as implementation of the algorithm is concerned, we do not explicitly construct the matrix A, only the effect of A on a vector. Consider, for example, a second-order discretization applied to, say, Poisson's equation \nabla^2 \phi = f(x,y) with Dirichlet boundary conditions. The discretization gives the set of equations

\[ \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta x^2} + \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta y^2} = f_{i,j}, \quad 2 \le i \le N-1, \; 2 \le j \le M-1, \]

and

\[ w_{1,j} = fl(j), \quad w_{N,j} = fr(j), \quad 1 \le j \le M, \]
\[ w_{i,1} = gb(i), \quad w_{i,M} = gt(i), \quad 2 \le i \le N-1, \]

where fl(y), fr(y), gb(x), gt(x) are known functions given on the boundaries. Here we may define x to be the NM \times 1 vector of unknowns

\[ x = (w_{1,1}, w_{2,1}, ..., w_{N,1}, w_{1,2}, ..., w_{i,j}, ..., w_{N,M})^T. \]

In fact we may identify the qth component x_q of x by

\[ x_q = w_{i,j} \quad \text{where} \quad q = i + N(j-1). \]

Then the matrix-vector product Ax will have elements d_q, with q = i + N(j-1), given by

\[ d_q = \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta x^2} + \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta y^2} \]

if i, j identifies an interior point of the domain, and

\[ d_{1+N(j-1)} = w_{1,j}, \quad d_{N+N(j-1)} = w_{N,j}, \quad 1 \le j \le M, \]
\[ d_i = w_{i,1}, \quad d_{i+(M-1)N} = w_{i,M}, \quad 2 \le i \le N-1, \]

for points on the boundary.
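A matrix-free product following the indexing above can be sketched as below (the function name is mine; boundary rows are taken as identity rows, matching the d_q definitions):

```python
def apply_A(x, N, M, dx, dy):
    """Matrix-free product d = A x for the discretization above.
    x[q-1] holds w_{i,j} with q = i + N(j-1) (1-based q, 0-based storage)."""
    d = [0.0] * (N * M)
    w = lambda i, j: x[i + N * (j - 1) - 1]
    for j in range(1, M + 1):
        for i in range(1, N + 1):
            q = i + N * (j - 1)
            if i in (1, N) or j in (1, M):
                d[q - 1] = w(i, j)               # boundary rows: d_q = w_{i,j}
            else:
                d[q - 1] = ((w(i + 1, j) - 2.0 * w(i, j) + w(i - 1, j)) / dx ** 2
                            + (w(i, j + 1) - 2.0 * w(i, j) + w(i, j - 1)) / dy ** 2)
    return d

# sanity check: for w = (x-coordinate)^2 the discrete Laplacian is exactly 2
N = M = 5
x = []
for j in range(1, M + 1):
    for i in range(1, N + 1):
        x.append(float((i - 1) ** 2))
d = apply_A(x, N, M, 1.0, 1.0)
```

A function like this is all the CG loop needs in place of the stored matrix A.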
One other point to re-emphasize is that in the above analysis and discussion the matrix A is assumed to be symmetric and positive definite. Usually the systems of equations that we come across are not symmetric. Here one needs to consider alternative algorithms, such as the GMRES method discussed later, or one could try and use the conjugate gradient algorithm with the system

\[ A^T A x = A^T b. \]

The latter is not recommended, as the condition number of the matrix A^T A (the square of that of A) deteriorates if the original matrix A is ill-conditioned to start with.
7.11.4 CG, alternative derivation
Let A be a symmetric positive definite matrix, ie

\[ A = A^T, \quad x^T A x > 0 \]

for any non-zero vector x. Also define an inner product

\[ (x, y) = x^T y. \]

We saw that the functional

\[ F(x) = \frac{1}{2}(x, Ax) - (b, x) \]

is minimized when x is a solution of

\[ A x = b. \]

The steepest descent method is defined by looking at iterations of the form

\[ x^{(k+1)} = x^{(k)} + \alpha_k r^{(k)}, \]

where

\[ r^{(k)} = b - A x^{(k)} \]

is the residual, in the direction of the negative gradient of F(x). The \alpha's are chosen to minimize F(x^{(k+1)}), and come out to be

\[ \alpha_k = \frac{(r^{(k)}, r^{(k)})}{(r^{(k)}, A r^{(k)})}. \]

The convergence of this technique can be slow, especially if the matrix is ill-conditioned. By choosing the directions differently we obtain the conjugate gradient algorithm.

This works as follows. Let

\[ x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \]

where p^{(k)} is a direction vector. For the conjugate gradient method we let

\[ p^{(k)} = r^{(k)} + \beta_k p^{(k-1)} \]

for k \ge 1, and \beta_k is chosen so that p^{(k)} is A-conjugate to p^{(k-1)}, ie

\[ (p^{(k)}, A p^{(k-1)}) = 0. \]

This gives

\[ \beta_k = -\frac{(r^{(k)}, A p^{(k-1)})}{(p^{(k-1)}, A p^{(k-1)})}. \]

As before, \alpha_k is chosen to minimize F(x^{(k+1)}), and comes out to be

\[ \alpha_k = \frac{(p^{(k)}, r^{(k)})}{(p^{(k)}, A p^{(k)})}. \]
Collecting all the results together gives the CG algorithm

\[ x^{(0)} \; \text{arbitrary}, \]
\[ x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \quad k = 0, 1, ... \]
\[ p^{(k)} = \begin{cases} r^{(k)}, & k = 0, \\ r^{(k)} + \beta_k p^{(k-1)}, & k = 1, 2, ... \end{cases} \]
\[ \beta_k = -\frac{(r^{(k)}, A p^{(k-1)})}{(p^{(k-1)}, A p^{(k-1)})}, \quad k = 1, 2, ... \]
\[ r^{(k)} = b - A x^{(k)}, \quad k = 0, 1, ... \]
\[ \alpha_k = \frac{(p^{(k)}, r^{(k)})}{(p^{(k)}, A p^{(k)})}, \quad k = 0, 1, ... \]

It can be shown that the \alpha_k, \beta_k and r^{(k)} are equivalent to

\[ \beta_k = \frac{(r^{(k)}, r^{(k)})}{(r^{(k-1)}, r^{(k-1)})}, \quad k = 1, 2, ... \]
\[ \alpha_k = \frac{(r^{(k)}, r^{(k)})}{(p^{(k)}, A p^{(k)})}, \quad k = 0, 1, ... \]
\[ r^{(k)} = r^{(k-1)} - \alpha_{k-1} A p^{(k-1)}, \quad k = 1, 2, ... \]
Also the following relations hold:

\[ (r^{(i)}, r^{(j)}) = 0, \quad i \ne j, \]
\[ (p^{(i)}, A p^{(j)}) = 0, \quad i \ne j, \]
\[ (r^{(i)}, A p^{(j)}) = 0, \quad i \ne j \text{ and } i \ne j+1. \]

The residual vectors r^{(k)} are mutually orthogonal, and the direction vectors p^{(k)} are mutually A-conjugate: hence the name conjugate gradient.
7.11.5 Preconditioning
The conjugate gradient method combined with preconditioning is a very effective tool for solving linear equations iteratively. Preconditioning helps to redistribute the eigenvalues of the iteration matrix, and this can significantly enhance the convergence properties. Suppose that we are dealing with the linear system

\[ A x = b. \tag{93} \]

A preconditioning matrix M is one that is close to A in some sense and easily invertible. For instance, if we were dealing with the matrix A arising from a discretization of the Laplace operator, then one could take M to be any one of the matrices stemming from the classical iteration schemes such as Jacobi, SOR, or ADI, say. The system (93) can be replaced by

\[ M^{-1} A x = M^{-1} b. \tag{94} \]

The conjugate gradient algorithm is then applied to (94), leading to the algorithm
x_0 = initial guess, r_0 = b - A x_0
solve M z_0 = r_0 and put p_0 = z_0
FOR k = 1, ..., n
    \alpha_k = (z_{k-1}, r_{k-1}) / (p_{k-1}, A p_{k-1})
    x_k = x_{k-1} + \alpha_k p_{k-1}
    r_k = r_{k-1} - \alpha_k A p_{k-1}
    solve M z_k = r_k
    \beta_k = (z_k, r_k) / (z_{k-1}, r_{k-1})
    p_k = z_k + \beta_k p_{k-1}
END
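The preconditioned loop can be sketched directly from the steps above. This is an illustrative pure-Python version (names are mine): the preconditioner is passed in as a solve with M, here simply M = diag(A) (Jacobi preconditioning) for the same small SPD test matrix used earlier.

```python
def pcg(A, b, x0, precond_solve, tol=1e-12, max_iter=None):
    """Preconditioned conjugate gradient following the loop in the text;
    precond_solve(r) returns z with M z = r."""
    n = len(b)
    if max_iter is None:
        max_iter = n
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda v: [dot(row, v) for row in A]
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(x))]   # r_0 = b - A x_0
    z = precond_solve(r)
    p = z[:]
    rz = dot(z, r)
    for _ in range(max_iter):
        if dot(r, r) < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond_solve(r)
        rz_new = dot(z, r)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

# Jacobi preconditioner M = diag(A) = 2I for tridiag(-1, 2, -1)
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = pcg(A, b, [0.0] * n, lambda r: [ri / 2.0 for ri in r])
```

With a good preconditioner (incomplete factorizations, or one of the classical iteration matrices mentioned above) the same loop converges in far fewer iterations on ill-conditioned systems.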