
Module 4M12: Partial Differential Equations and
Variational Methods
Index Notation and Variational Methods∗
• Index Notation (2 h)
• Variational calculus vs differential calculus (5h)
• Example Paper (1 h)
• No data sheet
∗ Full information at “www2.eng.cam.ac.uk/~jl305/4M12/4M12.html” in due course. There is no data sheet for this course. The lecture notes are kindly provided by Dr. Garth Wells.
1 Index notation¹
1.1 The summation convention
Suppose x and y are vectors, and A and B are matrices. Write a few common combinations in terms of their components:
• Dot product
$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n=3} x_i y_i$$
• Matrix–vector multiplication
$$[A\mathbf{x}]_i = \sum_{j=1}^{n} A_{ij} x_j$$
• Matrix–matrix multiplication
$$[AB]_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
Notice the curious thing:
Every sum goes with an index which is repeated twice.

Non-repeated indices are not summed.
We can use a simplified notation by adopting the summation convention (due to Einstein):
Do not write the summation symbol $\sum$. A repeated index implies summation.
(An index may not appear more than twice in any single term.)
¹ Index notation is also known as ‘suffix notation’.
4M12 – PAD/JL(jl305) 2
Using the summation convention,
• $\mathbf{x} \cdot \mathbf{y} = x_i y_i$
• $[A\mathbf{x}]_i = A_{ij} x_j$
• $[AB]_{ij} = A_{ik} B_{kj}$
Summary
If an index occurs once, it must occur once in every term of the equation, and the
equation is true for each separate value of this index. If an index appears twice it is
summed over all values. It does not matter what this is called: it is a ‘dummy index’
whose name can be changed at will. If an index appears three or more times in any given
term in an equation, it is wrong!
This may seem a very peculiar trick, with no obvious benefit. However, it will turn out
to be surprisingly powerful, and make many calculations involving vector identities and
vector differential identities much simpler.
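As an illustration (not part of the original notes), the summation convention maps directly onto NumPy's `numpy.einsum`, whose subscript string names the index of every operand. The arrays below are arbitrary example values; this is a sketch, not a prescribed method.

```python
import numpy as np

# Arbitrary example data (hypothetical values, n = 3)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])
A = np.arange(9.0).reshape(3, 3)
B = np.eye(3) + 1.0

# Indices that do not appear after '->' are summed over. Here every summed
# index is a repeated one, exactly as in the summation convention.
dot = np.einsum("i,i->", x, y)       # x_i y_i
Ax = np.einsum("ij,j->i", A, x)      # [Ax]_i = A_ij x_j
AB = np.einsum("ik,kj->ij", A, B)    # [AB]_ij = A_ik B_kj

print(dot, Ax, AB, sep="\n")
```

The free indices (those after `->`) play exactly the role of the non-repeated indices in the summary above.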
1.2 The Kronecker delta δij
Two additional pieces of notation are needed. The first is a way to write the identity matrix $I$,
$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
We define the ‘Kronecker delta’ as
$$\delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}$$
We know that
$$I\mathbf{y} = \mathbf{y}$$
and, in index notation,
$$\delta_{ij} y_j = y_i.$$
In other words, ‘if one index of $\delta_{ij}$ is summed, the effect is to replace the summed index by the other index’.
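A small numerical sketch (not from the notes) of the substitution property: with $\delta_{ij}$ stored as the $3 \times 3$ identity array, contracting it with an arbitrary example vector returns the vector unchanged, and $\delta_{ii} = 3$.

```python
import numpy as np

delta = np.eye(3)                      # delta_ij as a 3x3 array
y = np.array([2.0, -1.0, 5.0])         # hypothetical example vector

# delta_ij y_j: summing over j substitutes i for j, so the result is y_i
print(np.einsum("ij,j->i", delta, y))  # [ 2. -1.  5.]

# delta_ii = 3 (trace of the identity in three dimensions)
print(np.einsum("ii->", delta))        # 3.0
```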
1.3 The permutation symbol εijk
Another necessary ingredient is a way to write the cross product of two vectors in index notation,
$$\mathbf{x} \times \mathbf{y} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = \left[x_2 y_3 - x_3 y_2,\; x_3 y_1 - x_1 y_3,\; x_1 y_2 - x_2 y_1\right],$$
where $\mathbf{e}_i$ are the basis vectors. We have assumed that $\mathbf{e}_i$ are the unit vectors for Cartesian coordinates (you may have seen the basis vectors written as i, j and k). To express the cross product in index notation, we will use the permutation symbol $\varepsilon_{ijk}$.
The permutation symbol $\varepsilon_{ijk}$ is defined as
$$\varepsilon_{ijk} = \begin{cases} 1 & \text{if } (ijk) \text{ is an even permutation of } (1, 2, 3), \\ -1 & \text{if } (ijk) \text{ is an odd permutation of } (1, 2, 3), \\ 0 & \text{otherwise.} \end{cases}$$
For example,
$$\varepsilon_{112} = 0 \quad \text{(any repeated index)}$$
$$\varepsilon_{312} = 1 \quad \text{(even number of inversions of pairs)}$$
$$\varepsilon_{132} = -1 \quad \text{(odd number of inversions of pairs)}$$
The permutation symbol is also known as the ‘alternating symbol’ or the ‘Levi-Civita
symbol’.
Using the permutation symbol, we can write the cross product of two vectors as:
$$[\mathbf{x} \times \mathbf{y}]_i = \varepsilon_{ijk} x_j y_k$$
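To prove this, for each $i$ sum over $j$ and $k$. For example, for $i = 1$ the only non-zero terms are $\varepsilon_{123} x_2 y_3 + \varepsilon_{132} x_3 y_2 = x_2 y_3 - x_3 y_2$. The permutation symbol possesses a number of ‘symmetries’,
$$\varepsilon_{ijk} = \varepsilon_{kij} = \varepsilon_{jki} \quad \text{(cyclic permutation)}$$
$$= -\varepsilon_{jik} = -\varepsilon_{ikj} = -\varepsilon_{kji} \quad \text{(switch pair } ij\text{, switch pair } jk\text{, switch pair } ki\text{)}$$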
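The definition and the cross-product formula can be checked numerically. In this sketch (not from the notes) the indices are 0-based, as is usual in Python, so `eps[2, 0, 1]` corresponds to $\varepsilon_{312}$; the vectors are arbitrary example values.

```python
import itertools
import numpy as np

def levi_civita():
    """Return eps[i, j, k] for n = 3 using 0-based indices."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in itertools.permutations(range(3)):
        # Parity of the permutation = parity of its number of inversions
        inversions = (i > j) + (i > k) + (j > k)
        eps[i, j, k] = -1.0 if inversions % 2 else 1.0
    return eps  # entries with a repeated index stay 0

eps = levi_civita()
x = np.array([1.0, 2.0, 3.0])
y = np.array([-2.0, 0.5, 4.0])
cross = np.einsum("ijk,j,k->i", eps, x, y)   # [x × y]_i = eps_ijk x_j y_k
print(cross, np.cross(x, y))
```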
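1.4 The εijk – δij identity
There is an important identity (the ‘contracted epsilon identity’) relating $\varepsilon_{ijk}$ and $\delta_{ij}$:
$$\underbrace{\varepsilon_{ijk}\varepsilon_{klm}}_{\text{sum over } k} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$$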
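The proof is simple (but somewhat tedious!): just check every case. We sum over $k$, which leaves four free indices, and each index runs from 1 to 3; therefore there are $3^4 = 81$ cases. Here are two examples:
• $i = 1$, $j = 2$, $l = 1$, $m = 3$:
$$\varepsilon_{ijk}\varepsilon_{klm} = \varepsilon_{121}\varepsilon_{113} + \varepsilon_{122}\varepsilon_{213} + \varepsilon_{123}\varepsilon_{313} = 0 + 0 + 0 = 0$$
$$\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl} = \delta_{11}\delta_{23} - \delta_{13}\delta_{21} = 0 - 0 = 0$$
• $i = 1$, $j = 2$, $l = 1$, $m = 2$:
$$\varepsilon_{ijk}\varepsilon_{klm} = \varepsilon_{121}\varepsilon_{112} + \varepsilon_{122}\varepsilon_{212} + \varepsilon_{123}\varepsilon_{312} = 0 + 0 + 1 = 1$$
$$\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl} = \delta_{11}\delta_{22} - \delta_{12}\delta_{21} = 1 - 0 = 1$$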
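Checking all 81 cases by hand is tedious, but a short loop can do it. This sketch (not from the notes) uses the closed form $\varepsilon_{ijk} = (i - j)(j - k)(k - i)/2$, which holds for indices in $\{1, 2, 3\}$, and compares both sides of the identity for every choice of the free indices $i, j, l, m$.

```python
import itertools

def eps(i, j, k):
    """Permutation symbol for i, j, k in {1, 2, 3} (closed form for n = 3)."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    return 1 if i == j else 0

failures = []
for i, j, l, m in itertools.product(range(1, 4), repeat=4):
    lhs = sum(eps(i, j, k) * eps(k, l, m) for k in range(1, 4))   # sum over k
    rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
    if lhs != rhs:
        failures.append((i, j, l, m))

print(f"checked {3 ** 4} cases, failures: {failures}")  # checked 81 cases, failures: []
```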
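Example: A vector identity
$$[\mathbf{a} \times (\mathbf{b} \times \mathbf{c})]_i = \varepsilon_{ijk} a_j (\mathbf{b} \times \mathbf{c})_k$$
$$= \varepsilon_{ijk} a_j \varepsilon_{klm} b_l c_m$$
$$= (\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}) a_j b_l c_m$$
$$= b_i a_m c_m - c_i a_j b_j$$
$$= [(\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}]_i$$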
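A quick numerical spot-check of this identity (a sketch, not part of the notes): for randomly chosen vectors, $\mathbf{a} \times (\mathbf{b} \times \mathbf{c})$ and $(\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}$ should agree to rounding error. The helpers `cross` and `dot` are written out here for illustration.

```python
import random

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
a, b, c = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(3))

lhs = cross(a, cross(b, c))
rhs = [dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c)]
err = max(abs(p - q) for p, q in zip(lhs, rhs))
print(err)
```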
1.5 A trick: symmetry and anti-symmetry
We expect the a × a = .
In index notation,
[a × a]i = i jk aj ak
The term aj ak is symmetric in j, k. That is, aj ak = ak aj .

The term i jk is anti-symmetric in j, k. That is i jk = −i kj .
1
i jk aj ak =(i jk aj ak + i jk aj ak )
2
1
= (i jk aj ak + i kj ak aj )
2
1
= (i jk aj ak − i jk aj ak )
2
= 0,
as expected.
Symmetry and anti-symmetry are often useful to simplify expressions. The permutation symbol is anti-symmetric in any pair of its indices. Some other symmetric expressions include
• $a_i a_j$ for any vector $\mathbf{a}$
• The Kronecker delta $\delta_{ij}$
• The matrix $B$ when $[B]_{ij} = [A]_{ij} + [A]_{ji}$
• The second partial derivative of a (smooth) scalar function $\phi$, $\dfrac{\partial^2 \phi}{\partial x_i \partial x_j}$
1.6 Vector derivatives
The real power of index notation is revealed when we look at vector differential identities.
The vector derivatives known as the gradient, the divergence and the curl can all be
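written in terms of the operator $\nabla$,
$$\nabla = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \frac{\partial}{\partial x_3}\right),$$
where $[x_1, x_2, x_3]$ are the components of the position vector $\mathbf{x}$.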

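For a scalar function $\phi$ and a vector function $\mathbf{u}$, both of which are functions of position $\mathbf{x}$, we can write
Gradient
$$\operatorname{grad}\phi: \quad [\nabla\phi]_i = \frac{\partial \phi}{\partial x_i} = \left(\frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \frac{\partial \phi}{\partial x_3}\right)_i$$
Divergence
$$\operatorname{div}\mathbf{u}: \quad \nabla \cdot \mathbf{u} = \frac{\partial u_i}{\partial x_i} = \frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2} + \frac{\partial u_3}{\partial x_3}$$
Laplacian
$$\nabla^2 \phi = \nabla \cdot (\nabla \phi) = \frac{\partial^2 \phi}{\partial x_i \partial x_i} = \frac{\partial^2 \phi}{\partial x_1^2} + \frac{\partial^2 \phi}{\partial x_2^2} + \frac{\partial^2 \phi}{\partial x_3^2}$$
Curl (recall that $[\mathbf{a} \times \mathbf{b}]_i = \varepsilon_{ijk} a_j b_k$)
$$\operatorname{curl}\mathbf{u}: \quad [\nabla \times \mathbf{u}]_i = \varepsilon_{ijk} \frac{\partial u_k}{\partial x_j}$$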
The machinery we have developed for index notation can be used directly to manipulate quantities without constantly having to think about the complicated physical meanings of, for example, div or curl.
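Example: Product rule for div and curl
Divergence
$$\nabla \cdot (\phi\mathbf{u}) = \frac{\partial}{\partial x_i}(\phi u_i)$$
$$= \frac{\partial \phi}{\partial x_i} u_i + \phi \frac{\partial u_i}{\partial x_i} \quad \text{(product rule for differentiation)}$$
$$= \mathbf{u} \cdot \nabla\phi + \phi \nabla \cdot \mathbf{u}$$
Curl
$$[\nabla \times (\phi\mathbf{u})]_i = \varepsilon_{ijk} \frac{\partial}{\partial x_j}(\phi u_k)$$
$$= \varepsilon_{ijk}\left(\frac{\partial \phi}{\partial x_j} u_k + \phi \frac{\partial u_k}{\partial x_j}\right)$$
$$= [\nabla\phi \times \mathbf{u} + \phi \nabla \times \mathbf{u}]_i$$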
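The two product rules can be spot-checked numerically. This is a hedged sketch, not part of the notes: the fields $\phi$ and $\mathbf{u}$ and the point `x0` are arbitrary choices, and every derivative is approximated by a central difference, so the two sides agree only up to a small truncation and rounding error.

```python
import math

def phi(x):
    # Arbitrary smooth scalar field (illustrative choice)
    return x[0] * x[1] + math.sin(x[2])

def u(x):
    # Arbitrary smooth vector field (illustrative choice)
    return [x[1] ** 2, x[0] * x[2], math.cos(x[0])]

def partial(f, x, j, h=1e-5):
    """Central-difference approximation of df/dx_j for a scalar function f."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp) - f(xm)) / (2 * h)

def grad(f, x):
    return [partial(f, x, j) for j in range(3)]

def div(v, x):
    # d v_i / d x_i, summed over i
    return sum(partial(lambda y, i=i: v(y)[i], x, i) for i in range(3))

def curl(v, x):
    # [curl v]_i = eps_ijk d v_k / d x_j, written out component by component
    def d(k, j):
        return partial(lambda y: v(y)[k], x, j)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]

def phi_u(x):
    return [phi(x) * c for c in u(x)]

x0 = [0.3, -1.2, 0.7]
g, ux, ph = grad(phi, x0), u(x0), phi(x0)

# div(phi u) = u . grad(phi) + phi div(u)
lhs_div = div(phi_u, x0)
rhs_div = sum(a * b for a, b in zip(ux, g)) + ph * div(u, x0)

# curl(phi u) = grad(phi) x u + phi curl(u)
cu = curl(u, x0)
lhs_curl = curl(phi_u, x0)
rhs_curl = [g[1] * ux[2] - g[2] * ux[1] + ph * cu[0],
            g[2] * ux[0] - g[0] * ux[2] + ph * cu[1],
            g[0] * ux[1] - g[1] * ux[0] + ph * cu[2]]
print(lhs_div - rhs_div, [p - q for p, q in zip(lhs_curl, rhs_curl)])
```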
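Example: curl (grad φ)
$$[\nabla \times (\nabla\phi)]_i = \varepsilon_{ijk} \frac{\partial}{\partial x_j}\left(\frac{\partial \phi}{\partial x_k}\right)$$
$$= \varepsilon_{ijk} \frac{\partial^2 \phi}{\partial x_j \partial x_k}$$
$$= 0 \quad \text{(by symmetry and anti-symmetry in } j, k\text{)}$$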
1.7 Derivatives of the position vector
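If the position vector $\mathbf{x}$ itself appears in a vector expression which is being manipulated in index notation, then it is useful to notice that
$$\frac{\partial x_i}{\partial x_j} = \delta_{ij},$$
which follows directly from the definition of a partial derivative:
$$\frac{\partial x_1}{\partial x_1} = 1, \quad \frac{\partial x_1}{\partial x_2} = 0, \quad \text{etc.}$$
So, for example:
• $\nabla \cdot \mathbf{x} = \dfrac{\partial x_i}{\partial x_i} = \delta_{ii} = 3$
• $[\nabla \times \mathbf{x}]_i = \varepsilon_{ijk} \dfrac{\partial x_k}{\partial x_j} = \varepsilon_{ijk}\delta_{jk} = 0$ (by symmetry/anti-symmetry in $j, k$)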
1.8 Magnitude of a vector
Sometimes the magnitude (or modulus) of a vector appears in an expression which you
wish to differentiate in some way. The trick here is to force it into the form of a scalar
product, even though this may make the expression look more complicated than when
you started. Index notation allows scalar products to be handled easily, and the normal
rules of calculus will allow the calculation to be done without difficulty.
Example: What is $\nabla|\mathbf{x}|$?
$$[\nabla|\mathbf{x}|]_i = \frac{\partial}{\partial x_i}(x_j x_j)^{1/2}$$
$$= \frac{1}{2}(x_k x_k)^{-1/2}\, 2x_j \frac{\partial x_j}{\partial x_i}$$
There is a danger of $j$ appearing four times in the same expression, which is not permitted and would indicate an error. Recall that $j$ was a dummy index, summed over all values. So in one of the two expressions, $j$ has been replaced by $k$ (or any other index you like, apart from $i$ or $j$). Continuing:
$$\frac{1}{2}(x_k x_k)^{-1/2}\, 2x_j \frac{\partial x_j}{\partial x_i} = (x_k x_k)^{-1/2} x_j \delta_{ij}$$
$$= (x_k x_k)^{-1/2} x_i$$
or in vector notation,
$$\nabla|\mathbf{x}| = \frac{\mathbf{x}}{|\mathbf{x}|},$$
which is just the unit vector in the direction of $\mathbf{x}$.
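The result can be confirmed with a finite-difference gradient. In this sketch (not from the notes) the point `x0` is an arbitrary choice with $|\mathbf{x}_0| = 3$, and `grad_fd` is a hypothetical helper that uses central differences.

```python
import math

def norm(x):
    # |x| = (x_k x_k)^(1/2)
    return math.sqrt(sum(xi * xi for xi in x))

def grad_fd(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x0 = [1.0, -2.0, 2.0]                  # hypothetical point, |x0| = 3
numeric = grad_fd(norm, x0)
exact = [xi / norm(x0) for xi in x0]   # x / |x|
print(numeric, exact)
```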
1.9 Integral theorems
Index notation allows the divergence theorem and Stokes’s theorem to be written in a way which makes them look familiar. Recall this result for ordinary integration and differentiation of a function $f(x)$ on the interval $(a, b)$:
$$\int_a^b \frac{df}{dx}\,dx = f\big|_a^b = f(b) - f(a).$$
In other words, the integral of the derivative of a function can be evaluated from the values of the function on the boundaries of the integration interval. This general description will turn out to apply just as well to volume integrals (via the divergence theorem) and to surface integrals (via Stokes’s theorem).
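Two-dimensional integration.
Let $\mathbf{u} = (u(x, y), v(x, y))$ be a two-dimensional vector field in a 2D region $\Omega$ with boundary $\Gamma$. If the region lies between $y = c$ and $y = d$, and for each $y$ extends from $x = x_l(y)$ to $x = x_r(y)$, then (with $\Gamma$ traversed anticlockwise)
$$\int_\Omega \frac{\partial u}{\partial x}\,d\Omega = \int_c^d \int_{x_l}^{x_r} \frac{\partial u}{\partial x}\,dx\,dy = \int_c^d \left[u(x_r, y) - u(x_l, y)\right] dy = \oint_\Gamma u\,dy = \oint_\Gamma u\,n_x\,d\Gamma.$$
Similarly,
$$\int_\Omega \frac{\partial v}{\partial y}\,d\Omega = -\oint_\Gamma v\,dx = \oint_\Gamma v\,n_y\,d\Gamma.$$
Hence the divergence theorem in 2D:
$$\int_\Omega \left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\right) d\Omega = \oint_\Gamma (u\,dy - v\,dx) = \oint_\Gamma \mathbf{u} \cdot \mathbf{n}\,d\Gamma$$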
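As a numerical sketch of the 2D divergence theorem (not from the notes), take the unit square and the arbitrarily chosen field $u = x^2 y$, $v = \sin x + y^3$. Both sides are evaluated with the midpoint rule; for this field both equal $3/2$ exactly, so the printed values should agree to about five decimal places.

```python
import math

# Hypothetical example field on the unit square: u = x^2 y, v = sin(x) + y^3
u = lambda x, y: x * x * y
v = lambda x, y: math.sin(x) + y ** 3
du_dx = lambda x, y: 2 * x * y
dv_dy = lambda x, y: 3 * y * y

n = 200
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]   # midpoint-rule nodes

# Left-hand side: area integral of du/dx + dv/dy
area = sum(du_dx(x, y) + dv_dy(x, y) for x in mid for y in mid) * h * h

# Right-hand side: boundary integral of u n_x + v n_y over the four edges
right = sum(u(1.0, y) for y in mid) * h      # n = (+1, 0)
left = -sum(u(0.0, y) for y in mid) * h      # n = (-1, 0)
top = sum(v(x, 1.0) for x in mid) * h        # n = (0, +1)
bottom = -sum(v(x, 0.0) for x in mid) * h    # n = (0, -1)
boundary = right + left + top + bottom

print(area, boundary)   # both close to the exact value 1.5
```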
1.9.1 Divergence theorem
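The divergence theorem states:
$$\int_V \nabla \cdot \mathbf{f}\,dV = \int_S \mathbf{f} \cdot \mathbf{n}\,dS,$$
where $\mathbf{f}$ is a vector function of position, $V$ is a volume enclosed by the surface $S$ ($S = \partial V$) and $\mathbf{n}$ is the unit outward normal vector on $S$ (it points outwards from $V$).
[Figure: volume V bounded by surface S, with unit outward normal n]
We can also write $d\mathbf{A} = \mathbf{n}\,dS$, or using indices $dA_i = n_i\,dS$.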

From the divergence theorem, we can derive a number of other results (corollaries). For example, what can we say about the gradient of a scalar function, integrated over the volume $V$? To examine this, consider the divergence theorem using index notation,
$$\int_V \frac{\partial f_i}{\partial x_i}\,dV = \int_S f_i n_i\,dS.$$
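Now suppose that we have a scalar function $\phi$. We can create a vector function $\mathbf{f}$ by multiplying it by a vector $\mathbf{a}$,
$$\mathbf{f} = \phi\mathbf{a},$$
where $\mathbf{a}$ is any fixed vector (it does not depend on $\mathbf{x}$). Substituting this into the divergence theorem,
$$\int_V \frac{\partial (\phi a_i)}{\partial x_i}\,dV = \int_S \phi a_i n_i\,dS.$$
Since $\mathbf{a}$ is fixed, it can be taken out of the integrals,
$$a_i\left(\int_V \frac{\partial \phi}{\partial x_i}\,dV - \int_S \phi n_i\,dS\right) = 0.$$
The vector $\mathbf{a}$ is arbitrary, as long as it is fixed. Therefore the above expression must hold for all fixed vectors; choosing $\mathbf{a} = \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ in turn shows that each component of the bracket vanishes. This implies that
$$\int_V \frac{\partial \phi}{\partial x_i}\,dV - \int_S \phi n_i\,dS = 0$$
and therefore
$$\int_V \frac{\partial \phi}{\partial x_i}\,dV = \int_S \phi n_i\,dS.$$
In vector notation, we have
$$\int_V \nabla\phi\,dV = \int_S \phi\mathbf{n}\,dS.$$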
A more general result which can be proved using the ‘fixed vector’ type approach is that
$$\int_V \frac{\partial}{\partial x_i}(\star)\,dV = \int_S (\star)\, n_i\,dS,$$
where $(\star)$ is any index notation expression. We could, for example, insert $\varepsilon_{jik} f_k$ and deduce a ‘curl theorem’ (check it). We now have a nice generalisation for relating volume integrals of vector derivatives to surface integrals: the volume integral of a derivative of any ‘regular’ (sufficiently smooth) function can be transformed into a surface integral. When in doubt, one can always return to index notation to check and develop the necessary relationships.
Transforming surface integrals into volume integrals is the key to deriving many equations used in physics and mechanics which come from conservation laws. We can start by posing a balance in terms of what is happening on the boundary, and then transform the expression of conservation into a differential equation.
1.9.2 Stokes theorem
Stokes theorem is stated as:
$$\int_S (\nabla \times \mathbf{f}) \cdot d\mathbf{A} = \oint_C \mathbf{f} \cdot d\mathbf{l},$$
where $S$ is a surface (possibly curved), $d\mathbf{A} = \mathbf{n}\,dS$ is an area vector element and $\mathbf{n}$ is the unit normal vector to the surface $S$, $C = \partial S$ is the curve that bounds the surface, $\mathbf{m}$ is the outward unit normal vector to $C$ within the surface $S$, and $d\mathbf{l} = \mathbf{s}\,dC$ is the vector element running along the curve, with $\mathbf{s}$ tangential to the boundary. $\mathbf{n}$, $\mathbf{m}$ and $\mathbf{s}$ form a right-handed set. The direction of $\mathbf{s}$ is important!
[Figure: surface S with boundary curve C, showing the unit normal n, the tangent s and the in-surface outward normal m]
As for the divergence theorem, we can write Stokes theorem using indices,
$$\int_S \varepsilon_{ijk} n_i \frac{\partial f_k}{\partial x_j}\,dS = \oint_C f_k s_k\,dC$$
and we can use the trick with the fixed vector to get
$$\int_S \varepsilon_{ijk} n_i \frac{\partial (\star)}{\partial x_j}\,dS = \oint_C (\star)\, s_k\,dC.$$
The expression can be manipulated further using another expression which involves another $\varepsilon$ symbol, and then using the $\varepsilon$–$\delta$ identity to move the $\varepsilon$ to the other side of the equation. However, it is not possible to eliminate the $\varepsilon$ entirely; therefore Stokes’s theorem always appears to be more complicated than the divergence theorem.
1.9.3 Integration by parts
An important tool, particularly for variational methods, is integration by parts. Starting in one dimension,
$$\int_a^b \frac{d(uv)}{dx}\,dx = \int_a^b v\frac{du}{dx}\,dx + \int_a^b u\frac{dv}{dx}\,dx,$$
and then rearranging,
$$\int_a^b v\frac{du}{dx}\,dx = -\int_a^b u\frac{dv}{dx}\,dx + uv\big|_a^b.$$
We want now to generalise this to two and three dimensions. To do this, consider the divergence of a term $\phi\mathbf{u}$,
$$\int_V \nabla \cdot (\phi\mathbf{u})\,dV = \int_V \nabla\phi \cdot \mathbf{u}\,dV + \int_V \phi \nabla \cdot \mathbf{u}\,dV,$$
where the product rule for differentiation has been applied. Applying the divergence theorem to the term on the left-hand side,
$$\int_S \phi\mathbf{u} \cdot \mathbf{n}\,dS = \int_V \nabla\phi \cdot \mathbf{u}\,dV + \int_V \phi \nabla \cdot \mathbf{u}\,dV.$$
The operation can also be performed using index notation:
Product rule for differentiation
$$\int_V \frac{\partial (\phi u_i)}{\partial x_i}\,dV = \int_V \frac{\partial \phi}{\partial x_i} u_i\,dV + \int_V \phi \frac{\partial u_i}{\partial x_i}\,dV$$
Apply divergence theorem to LHS
$$\int_S \phi u_i n_i\,dS = \int_V \frac{\partial \phi}{\partial x_i} u_i\,dV + \int_V \phi \frac{\partial u_i}{\partial x_i}\,dV$$
1.10 Directional derivative
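We will make frequent use of the directional derivative when working with variational problems. The directional derivative of a function $f(u)$ is defined as
$$Df(u)[v] = \left.\frac{d f(u + \epsilon v)}{d\epsilon}\right|_{\epsilon = 0}.$$
It means ‘the derivative of $f$ with respect to $u$ in the direction of $v$’. In computing the directional derivative, we first compute the derivative with respect to $\epsilon$, and then set $\epsilon = 0$. For example, for $f = \mathbf{u} \cdot \mathbf{u}$, where $\mathbf{u}$ is some vector function, the directional derivative is
$$Df(\mathbf{u})[\mathbf{v}] = \left.\frac{d}{d\epsilon}\big((\mathbf{u} + \epsilon\mathbf{v}) \cdot (\mathbf{u} + \epsilon\mathbf{v})\big)\right|_{\epsilon = 0}$$
$$= 2\big((\mathbf{u} + \epsilon\mathbf{v}) \cdot \mathbf{v}\big)\big|_{\epsilon = 0}$$
$$= 2\mathbf{u} \cdot \mathbf{v}$$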
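The definition can also be read as a recipe for a numerical approximation: perturb $u$ by a small $\epsilon$ in the direction $v$ and difference. This sketch (not from the notes) uses a central difference in $\epsilon$ and random example vectors; because $f = \mathbf{u} \cdot \mathbf{u}$ is quadratic, the central difference reproduces $2\mathbf{u} \cdot \mathbf{v}$ up to rounding.

```python
import random

def f(u):
    return sum(ui * ui for ui in u)          # f(u) = u . u

def directional_derivative(f, u, v, eps=1e-6):
    """Central-difference estimate of d/d(eps) f(u + eps v) at eps = 0."""
    up = [ui + eps * vi for ui, vi in zip(u, v)]
    um = [ui - eps * vi for ui, vi in zip(u, v)]
    return (f(up) - f(um)) / (2 * eps)

random.seed(1)
u = [random.uniform(-1, 1) for _ in range(3)]
v = [random.uniform(-1, 1) for _ in range(3)]

numeric = directional_derivative(f, u, v)
exact = 2 * sum(ui * vi for ui, vi in zip(u, v))   # 2 u . v
print(numeric, exact)
```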
1.11 Tensors
The manipulation of objects using indices has been presented on the premise of a few basic rules. However, much of what has been presented carries over to the richer field of tensors. Some basic tensor operations will be considered, only scratching the surface, so that we can tackle more problems of practical relevance².
We are already familiar with the idea of vectors. However, you might find it hard to give a formal definition of a vector, although you have already been exposed to aspects of the formal definition. For example, vectors are often expressed as
$$\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k},$$
² Basic consideration of tensors is new in 2011.
or using modern notation as
$$\mathbf{a} = a_i\mathbf{e}_i = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3.$$
The mathematical definition is in two parts. First, a vector $\mathbf{a}$ is a quantity which, with respect to a particular Cartesian coordinate system, has three components $(a_1, a_2, a_3)$, or equivalently $a_i$, $i = 1, 2, 3$. But that is not the whole story. We also need to express the fact that if we choose to rotate our coordinate axes, the vector (the ‘arrow’ which you might draw in a diagram) remains the same. This means that a vector also has an associated basis. In the above example, the basis is $\mathbf{e}_i$. We may change the components and the basis of a vector such that the vector remains the same (it still points in the same direction and still has the same length).
A second-order (or second-rank) tensor is like a matrix but has an associated basis, and can be expressed using the notation
$$A = A_{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j.$$
Luckily, when working with a fixed orthonormal basis, we can work with $A_{ij}$, usually manipulating it just as we would a matrix.
You have in fact considered second-order tensors in Part IA without being told (and most likely without the lecturer realising). Recall that a ‘matrix’ can be rotated into a new coordinate system via
$$A'_{ij} = R_{ik} A_{kl} R_{jl},$$
where it turns out that
$$R_{ij} = \mathbf{e}'_i \cdot \mathbf{e}_j.$$
Common second-order tensors include stress and strain.
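The rotation rule can be tried numerically. The sketch below is not from the notes: it assumes a rotation by 30° about $\mathbf{e}_3$ and an arbitrary symmetric example matrix, builds $R_{ij} = \mathbf{e}'_i \cdot \mathbf{e}_j$, evaluates $A'_{ij} = R_{ik} A_{kl} R_{jl}$ with `numpy.einsum`, and compares it with the matrix product $RAR^T$. Quantities such as the trace and the eigenvalues should not change under the rotation.

```python
import math
import numpy as np

theta = math.radians(30.0)   # hypothetical rotation about e_3
c, s = math.cos(theta), math.sin(theta)

# Rotated basis vectors e'_i stored as rows; original basis e_j as rows of I
e_new = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
e_old = np.eye(3)
R = np.einsum("ik,jk->ij", e_new, e_old)     # R_ij = e'_i . e_j

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])             # example components A_kl
A_new = np.einsum("ik,kl,jl->ij", R, A, R)   # A'_ij = R_ik A_kl R_jl

print(A_new)
print(np.trace(A), np.trace(A_new))
```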
Derivatives and second order tensors
The two operations that we wish to consider are the gradient of a vector and the divergence of a second-order tensor. The gradient of a vector produces a second-order tensor:
$$\nabla\mathbf{a} = \nabla \otimes \mathbf{a} = \frac{\partial a_i}{\partial x_j}\,\mathbf{e}_i \otimes \mathbf{e}_j.$$
For our purposes, it suffices just to recall $[\nabla\mathbf{a}]_{ij} = \partial a_i/\partial x_j$. Just to make things more confusing, in fluid mechanics the gradient is often defined as the transpose of what is defined here.
Another important operation is the divergence of a second-order tensor. It leads to a vector:
$$\nabla \cdot A = \frac{\partial A_{ij}}{\partial x_j}\,\mathbf{e}_i.$$
If we combine the gradient and the divergence operators, we obtain the Laplace operator for a vector field $\mathbf{a}$:
$$\nabla^2\mathbf{a} = \nabla \cdot (\nabla\mathbf{a}) = \left(\nabla^2 a_1, \nabla^2 a_2, \nabla^2 a_3\right).$$
Similar to the gradient, it suffices for our purposes to recall $[\nabla \cdot A]_i = \partial A_{ij}/\partial x_j$. Just as for a vector, the divergence theorem can be applied to a second-order tensor:
$$\int_V \nabla \cdot A\,dV = \int_{\partial V} A\mathbf{n}\,dS.$$
Tensors, in combination with balance laws, can be used to systematically derive many
differential equations of importance in physics and engineering.
Example: Equilibrium equation of a body with mass density $\rho$: If the force per unit area acting on a surface is denoted by the vector $\mathbf{t}$ (the traction vector), equilibrium (forces sum to zero) requires that
$$\int_{\partial V} \mathbf{t}\,dS + \int_V \rho\mathbf{g}\,dV = \mathbf{0}.$$
The stress $\sigma$ (a second-order tensor) at a point satisfies, by definition, $\mathbf{t} = \sigma\mathbf{n}$, where $\mathbf{n}$ is the outward unit normal vector to a surface. Using indices, $t_i = \sigma_{ij} n_j$. Inserting this expression for $\mathbf{t}$,
$$\int_{\partial V} \sigma\mathbf{n}\,dS + \int_V \rho\mathbf{g}\,dV = \mathbf{0}.$$
Applying the divergence theorem,
$$\int_V \nabla \cdot \sigma\,dV + \int_V \rho\mathbf{g}\,dV = \mathbf{0}.$$
Since equilibrium must apply to any sub-body of $V$, the equation can be localised,
$$\nabla \cdot \sigma + \rho\mathbf{g} = \mathbf{0}.$$
This is balance of linear momentum. It can be shown that angular momentum balance is satisfied if $\sigma = \sigma^T$.
2 Variational methods
The variational form of a differential equation is an alternative way of expressing the same
problem. The variational view, and the associated machinery of variational methods and
functional analysis are at the heart of the modern study of partial differential equations
and provide the basis for a variety of numerical solution procedures, like the finite element
method.
We will see that classical variational methods involve the minimisation of a functional, although many of the concepts of variational methods extend beyond this classical perspective.
Along with integration by parts, the fundamental lemma of the calculus of variations is essential to our development of variational methods. The lemma says the following:
Fundamental Lemma of the Calculus of Variations If $g(x)$ is a continuous function in $x_0 \le x \le x_1$, and if
$$\int_{x_0}^{x_1} g(x) h(x)\,dx = 0,$$
where $h(x)$ is an arbitrary differentiable function in the same interval with $h(x_0) = h(x_1) = 0$, then $g(x) = 0$ at every point in the interval $x_0 \le x \le x_1$.
Proof The proof by contradiction is straightforward and hinges on the fact that the function $h(x)$ is arbitrary. Let us suppose that the function $g(x)$ is nonzero and positive in some subinterval $a \le x \le b$ within the interval $x_0 \le x \le x_1$ (by continuity, if $g$ is nonzero at any point it keeps one sign on some subinterval around that point; the negative case is handled in the same way). Because $h(x)$ is arbitrary, we can set it to be any function of our choice, such as
$$h(x) = \begin{cases} 0, & x < a, \\ (x - a)^2 (b - x)^2, & a \le x \le b, \\ 0, & x > b. \end{cases}$$
[Figure: sketch of h(x), equal to (x − a)²(b − x)² on a ≤ x ≤ b and zero elsewhere]
Observe that $h(x)$ is differentiable and satisfies the required conditions at the end points. Noting that $h(x) > 0$ for $a < x < b$, the integral
$$\int_{x_0}^{x_1} g(x) h(x)\,dx = \int_a^b g(x)(x - a)^2 (b - x)^2\,dx > 0,$$
which is positive because the integrand is positive throughout the subinterval except at $x = a$ and $x = b$, where it is zero. Thus, the lemma is proven by contradiction, as the only way for the integral over the entire interval to be zero is if the function $g(x) = 0$ everywhere in the interval. Note that, from the construction, $h'(x_0) = h'(x_1) = 0$.
2.1 Abstract problem
Say we have a functional $J$ that depends on the function $u(x)$. We will usually want to find the function $u$ that minimises $J$ (sometimes we will be satisfied with stationary points). The problem is stated as:
$$\min_u J(u).$$
The solution $u$ is sometimes referred to as a minimiser of $J$. In general, some constraints will be applied to $u$.
To find $u$ that minimises $J$, we take the directional derivative of $J$ and set it equal to zero for every direction $v$,
$$DJ(u)[v] = \left.\frac{dJ(u + \epsilon v)}{d\epsilon}\right|_{\epsilon = 0} = 0.$$
Recall that the directional derivative is the ‘change’ in $J$ if we move a small distance from $u$ in the direction of $v$ (hence the name ‘variational methods’).
For simple problems, we can apply (partial) differentiation directly without going
through the formalities of the directional derivative.
The precise definition of J depends on the problem considered. We will address a
number of different examples in the following.
2.2 Principle of minimum potential energy
An important source of variational problems is the principle of minimum potential energy. It is an accepted law of physics that the potential energy of a system will approach a minimum at equilibrium. If we can define the potential energy contributions in a system, we can minimise a functional to find the equilibrium solution. It also turns out that we can deduce differential equations from the minimisation of the energy functional.
2.3 Introductory examples
2.3.1 Refraction
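Refraction describes the bending of the path of a wave (light, for example) as it passes from one medium to another when the two media have different wave speeds.
[Figure: a ray travels from P1 in medium 1 (speed c1) to P2 in medium 2 (speed c2), with c1 > c2; both points are a distance a from the interface and are separated by b along it; the ray crosses the interface at x, with path lengths L1, L2 and angles θ1, θ2 measured from the normal to the interface]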
Snell’s law describes this process,
$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{c_1}{c_2}.$$
This can be demonstrated by using the wave approach, but it also follows from Fermat’s principle: the path of a wave between two given points is the one that can be traversed in minimum time.
Take two points $P_1$, $P_2$ equidistant from the interface. Suppose the ray crosses at $x$. Then
$$L_1 = \sqrt{x^2 + a^2}, \qquad L_2 = \sqrt{(b - x)^2 + a^2}.$$
The time $T$ required to travel from $P_1$ to $P_2$ is therefore
$$T = \frac{L_1}{c_1} + \frac{L_2}{c_2} \qquad (2.1)$$
where $c_1$ and $c_2$ are the wave speeds in the two media. The time $T$ is a minimum when
$$\frac{dT}{dx} = 0.$$
That is
$$\frac{1}{c_1}\underbrace{\frac{x}{\sqrt{x^2 + a^2}}}_{\sin\theta_1} - \frac{1}{c_2}\underbrace{\frac{b - x}{\sqrt{(b - x)^2 + a^2}}}_{\sin\theta_2} = 0,$$
which leads to
$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{c_1}{c_2}.$$
We have assumed that the path of the wave is straight in each medium. A more general
problem would also involve the path being unknown.
This is an example of a variational method. We have formulated a ‘functional’ $T$ (equation (2.1)) and minimised it.
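The same result can be reached by minimising $T$ directly, without differentiating. This sketch (not from the notes) uses a golden-section search, a standard technique for minimising a unimodal function of one variable; $T(x)$ is convex here, so the search applies. The geometry and the speeds are hypothetical values with $c_1 > c_2$.

```python
import math

def travel_time(x, a, b, c1, c2):
    """T(x) = L1/c1 + L2/c2, as in equation (2.1)."""
    return math.hypot(x, a) / c1 + math.hypot(b - x, a) / c2

def golden_section_min(f, lo, hi, tol=1e-12):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - r * (hi - lo)
            f1 = f(x1)
        else:                            # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + r * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)

a, b, c1, c2 = 1.0, 2.0, 1.0, 0.75   # hypothetical geometry and speeds
x = golden_section_min(lambda s: travel_time(s, a, b, c1, c2), 0.0, b)
sin1 = x / math.hypot(x, a)
sin2 = (b - x) / math.hypot(b - x, a)
print(sin1 / sin2, c1 / c2)          # the two ratios agree (Snell's law)
```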
2.3.2 Static equilibrium of bodies
Suppose a body is acted on by a set of forces $\mathbf{f}_i$ that can all be described via potential functions $V_i$ (i.e. $\mathbf{f}_i = -\nabla V_i$); examples include springs, gravity and electrostatics. At equilibrium, the forces must sum to zero:
$$\sum_i \mathbf{f}_i = -\sum_i \nabla V_i = -\nabla \sum_i V_i = \mathbf{0}.$$
This result says that for equilibrium, the total potential energy $V = \sum_i V_i$ must be stationary (minimum, maximum or inflection). We can explore this for two examples.
1. Bead on a frictionless wire
[Figure: bead of mass m on a frictionless wire of shape f(x), with the equilibrium points marked with crosses]
The height is $h = f(x)$ and the gravitational potential is $V = mgh$. The equilibrium points, marked with crosses, are stationary points of the potential energy ($dV/dx = 0$). Intuitively, only the minima are stable points.
2. Mass on a spring system
[Figure: mass m hanging from a spring of stiffness k]
Potential energies ($x$: movement of mass downwards):
• Potential energy in the spring $V_s = \tfrac{1}{2}kx^2$
• Potential energy due to gravity $V_g = -mgx$
• Total potential energy $V = \tfrac{1}{2}kx^2 - mgx$
The potential energy is a minimum when
$$\frac{dV}{dx} = kx - mg = 0,$$
which is the usual balance of forces.
2.3.3 Equilibrium of a string
We consider now a continuous system and show the link to the principle of virtual work. Consider a stretched elastic string (with tension $P$) between two end points ($x = 0$, $x = a$), carrying a distributed mass $m(x)$ per unit length. We seek the displaced shape $y(x)$ of the string at equilibrium.
[Figure: string under tension P between x = 0 and x = a, with deflection y(x) under the distributed load g m(x)]
The first step is to find the potential energy stored in the displaced string. Consider a small element between $x$ and $x + \delta x$, whose ends have deflections $y(x)$ and $y(x + \delta x) \approx y(x) + \delta x\, y'(x)$, and whose length is $\delta l$:
$$\delta l^2 \approx \delta x^2 + (\delta x\, y')^2.$$
Therefore
$$\frac{\delta l - \delta x}{\delta x} \approx \sqrt{1 + y'^2} - 1 \approx \frac{1}{2}y'^2$$
(binomial approximation, $(1 + z)^\alpha \approx 1 + \alpha z$ for small $z$). Hence the potential energy stored (elastic stored energy) in the string is
$$V_s = \frac{P}{2}\int_0^a y'^2\,dx.$$
This is positive because the string is stretched against the tension $P$. The potential energy due to gravity, which is minus the work done by gravity in displacing the string downwards ($y$ measured downwards), is
$$V_g = -g\int_0^a m y\,dx.$$
(Reduction in height → reduction in potential energy.) The total potential energy is equal to
$$V(y) = \int_0^a \left(\frac{P}{2}(y')^2 - gmy\right) dx.$$
For equilibrium, the total potential energy must be a minimum, and the total potential energy is stationary when its directional derivative is equal to zero. Taking the directional derivative,
$$DV(y)[z] = \left.\frac{dV(y + \epsilon z)}{d\epsilon}\right|_{\epsilon=0} = \left.\frac{d}{d\epsilon}\int_0^a \left(\frac{P}{2}(y' + \epsilon z')^2 - gm(y + \epsilon z)\right) dx\right|_{\epsilon=0}$$
$$= \int_0^a \left(P y' z' - gmz\right) dx \qquad (2.2)$$
To minimise the potential energy, we need to find a $y$ for which $DV(y)[z] = 0$ for all $z$. In the variational methods literature (and in this course in the past), the directional derivative of $V$ is sometimes expressed as $\delta V$, and $z$ (which is arbitrary) is replaced by $\delta y$,
$$\delta V = \int_0^a \left(P y' \delta y' - gm\,\delta y\right) dx.$$
This is the virtual work equation.
We can manipulate equation (2.2) into another form using integration by parts,
$$DV(y)[z] = \int_0^a \left(P y' z' - gmz\right) dx = -\int_0^a \left(P y'' + gm\right) z\,dx + P y' z\big|_0^a. \qquad (2.3)$$
Since $z$ is arbitrary, we can choose it such that $z(0) = z(a) = 0$; then
$$-\int_0^a \left(P y'' + gm\right) z\,dx = 0,$$
which can only be satisfied (from the fundamental lemma of the calculus of variations) if
$$P y'' + gm = 0, \quad \text{for } 0 < x < a.$$
This is a key point in the ‘calculus of variations’. Minimisation of the potential energy functional has implied the solution of a differential equation. We could have derived the differential equation directly by considering the force balance in the string,
[Figure: force balance on an element dx: tension P acting along the string, of slope dy/dx, and load mg dx]
There is more information to extract from equation (2.3). Since $P y'' + gm = 0$, we are left with $P y' z\big|_0^a$, which must vanish. There are two possibilities:
1. $z = \delta y = 0$ at $x = 0$, $x = a$.
That is, we do not permit perturbations of $y$ at the ends, which is equivalent to saying that we fix the end points of the wire.
2. $y' = 0$ at $x = 0$, $x = a$.
That is, the ends of the wire are free: there is no vertical reaction force, only a horizontal reaction.
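The variational route also suggests a numerical method: discretise $V(y)$ and make it stationary with respect to the nodal deflections. In this sketch (not from the notes) the string is split into $n$ equal elements, $y'$ is taken as constant on each element, and the load integral is approximated by nodal values $g\,m(x_i)\,h$; both are assumptions of the sketch. Setting $\partial V_h/\partial y_i = 0$ gives the tridiagonal system $(P/h)(2y_i - y_{i-1} - y_{i+1}) = g\,m(x_i)\,h$ with fixed ends, solved here with the Thomas algorithm. For uniform $m$ the nodal values match the exact solution $y = gm\,x(a - x)/(2P)$ of $Py'' + gm = 0$.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def string_deflection(P, g, m, a, n):
    """Stationary point of the discretised V(y) with y(0) = y(a) = 0.

    m is a callable m(x) (mass per unit length); n is the number of elements.
    """
    h = a / n
    xs = [i * h for i in range(1, n)]          # interior nodes
    k = n - 1
    y = solve_tridiagonal([-P / h] * k, [2 * P / h] * k, [-P / h] * k,
                          [g * m(x) * h for x in xs])
    return [0.0] + xs + [a], [0.0] + y + [0.0]

# Hypothetical example: uniform mass per unit length m = 2
P, g, a, n = 100.0, 9.81, 1.0, 20
xs, ys = string_deflection(P, g, lambda x: 2.0, a, n)
exact = [g * 2.0 * x * (a - x) / (2 * P) for x in xs]
print(max(abs(p - q) for p, q in zip(ys, exact)))
```

The design choice here is deliberate: the linear system is never derived from the differential equation, only from stationarity of the discrete energy, mirroring the argument in the text.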
Summary of key points
1. The variational method automatically yields the governing differential equation and
suitable boundary conditions.
2. Provided you can write down the energy functional (the energy expression), the
rest is a ‘handle cranking’ exercise.
3. Integration by parts plays a central role. For a two- or three-dimensional body, the
divergence theorem or Stokes’s theorem is used.
2.4 General principles in one dimension
We present now the methodology for a class of problems in an abstract framework for problems with only one independent variable. Suppose we wish to minimise
$$I = \int_0^L F(y, y', x)\,dx,$$
where $y = y(x)$ and $y' = dy/dx$; then we need to compute the directional derivative and set it equal to zero. (Note that $y$ is ultimately dependent on $x$, hence it is a one-dimensional problem.) The directional derivative is
$$DI(y, y', x)[z] = \left.\frac{d}{d\epsilon}\int_0^L F(y + \epsilon z, y' + \epsilon z', x)\,dx\right|_{\epsilon=0}$$
$$DI(y, y', x)[z] = \int_0^L \left(\frac{\partial F(y, y', x)}{\partial y} z + \frac{\partial F(y, y', x)}{\partial y'} z'\right) dx$$
(Hint: Set $a = y + \epsilon z$ and $a' = y' + \epsilon z'$ and apply the chain rule for differentiation.)
Using integration by parts,
$$DI(y, y', x)[z] = \int_0^L \left(\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}\right) z\,dx + \left.\frac{\partial F}{\partial y'} z\right|_0^L$$
Following the same arguments used in the previous section (using a convenient function for $z$), at stationary points of $I$ it is required that
$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0, \quad \text{for } 0 < x < L, \qquad (2.4)$$
which is known as the Euler–Lagrange equation.
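As before, we can also extract some boundary condition information since the boundary term must vanish, so we have
1. $z = \delta y = 0$: fixed boundary, e.g. $y = 0$; or
2. $\dfrac{\partial F}{\partial y'} = 0$: free boundary.
It can be helpful to manipulate (2.4) into an alternative form using the Beltrami identity. Taking the total derivative of $F$ with respect to $x$,
$$\frac{dF}{dx} = \frac{\partial F}{\partial x} + y'\frac{\partial F}{\partial y} + y''\frac{\partial F}{\partial y'}.$$
Re-arranging,
$$-\frac{\partial F}{\partial x} + \frac{dF}{dx} - y'\frac{\partial F}{\partial y} - y''\frac{\partial F}{\partial y'} = 0. \qquad (2.5)$$
Multiplying (2.4) by $y'$ and then adding (2.5),
$$-\frac{\partial F}{\partial x} + \frac{d}{dx}\left(F - y'\frac{\partial F}{\partial y'}\right) = 0. \qquad (2.6)$$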
Special cases
There are some special cases that are interesting to examine.
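1. $F$ depends on $y'$ but not on $y$.
Then $\partial F/\partial y = 0$ and the Euler–Lagrange equation is
$$\frac{d}{dx}\frac{\partial F}{\partial y'} = 0,$$
therefore
$$\frac{\partial F}{\partial y'} = k, \quad \text{where } k \text{ is constant.} \qquad (2.7)$$
2. $F$ does not depend explicitly on $x$ (there may still be a dependency through $y$ and $y'$).
In this case, $\partial F/\partial x = 0$. Therefore, from (2.6),
$$\frac{d}{dx}\left(F - y'\frac{\partial F}{\partial y'}\right) = 0,$$
hence
$$F - y'\frac{\partial F}{\partial y'} = k, \quad \text{where } k \text{ is constant.} \qquad (2.8)$$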
In this course we will emphasise a first-principles approach rather than the special cases presented in this section, since we want to address problems in different dimensions.
2.5 Examples in one dimension
Examples in one dimension can often make direct use of equation (2.4), which is what we will do in this section.
2.5.1 Shortest path between two points
Find the shortest line joining the points $(0, 0)$ and $(a, b)$:
[Figure: a curve joining (0, 0) to (a, b), with line element ds]
For an element $ds$ of the line,
$$ds = \sqrt{dx^2 + dy^2} = dx\sqrt{1 + y'^2}.$$
The length of the line is
$$L = \int ds = \int_0^a \left(1 + y'^2\right)^{1/2} dx,$$
with $y(0) = 0$ and $y(a) = b$. We wish to find the minimum value of $L$.
The function corresponding to $F$ in our general treatment is $(1 + y'^2)^{1/2}$, in which neither $x$ nor $y$ appear, and either of the special cases in the previous section can be applied. Using the result in equation (2.7),
$$\frac{\partial F}{\partial y'} = \frac{1}{2}\left(1 + y'^2\right)^{-1/2} 2y' = k \qquad (2.9)$$
where $k$ is constant. Therefore
$$y'^2 = k^2\left(1 + y'^2\right), \qquad (2.10)$$
which can be rearranged such that
$$y' = \frac{k}{\sqrt{1 - k^2}}, \quad \text{i.e., } y \text{ has a constant slope.} \qquad (2.11)$$
As expected, the shortest path is a straight line. The determination of the precise
equation (including the value of k) follows from requiring that the line pass through the
end points.
Arclength in cylindrical and spherical coordinate systems The two most important coordinate systems (besides Cartesian) in three dimensions are the spherical and the cylindrical coordinate systems. The following figure and equations show the geometrical meaning of the variables and the appearance of the volume elements.
[Figure: Cylindrical and Spherical Coordinates.]
In cylindrical and spherical coordinates, the arc length element $ds$ is a space diagonal of the volume element. In cylindrical coordinates, the sides of the volume element are $dr$, $r\,d\theta$, $dz$, so the arc length element is given by
$$ds = \sqrt{dr^2 + r^2 d\theta^2 + dz^2}.$$
In spherical coordinates, the sides of the volume element are $dr$, $r\,d\theta$, $r\sin\theta\,d\phi$, so the arc length element is given by
$$ds = \sqrt{dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2}.$$
2.5.2 Soap film spanning across two rings
The shape assumed by a soap film minimises the total surface energy. Since surface
energy is given by the surface tension multiplied by the area, the shape that minimises
the surface energy is simply the one that minimises the area. The result is not as
immediately obvious as the previous example.
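[Figure: soap film spanning two coaxial rings of radius a at z = ±b; the film has radius r(z) at axial position z]
The area of the soap film between $z$ and $z + dz$ is
$$dA = 2\pi r(z)\sqrt{dz^2 + dr^2} = 2\pi r\left(1 + r'^2\right)^{1/2} dz.$$
Therefore
$$A = 2\pi\int_{-b}^{b} r\left(1 + r'^2\right)^{1/2} dz.$$
The expression inside the integral does not depend explicitly on $z$, so we can use the special case in equation (2.8) directly,
$$2\pi r\left(1 + r'^2\right)^{1/2} - r'\, 2\pi r\left(1 + r'^2\right)^{-1/2} r' = 2\pi k.$$
We want now to manipulate this into a convenient form. Multiplying through by $(1 + r'^2)^{1/2}/(2\pi)$,
$$r\left(1 + r'^2\right) - r r'^2 = k\left(1 + r'^2\right)^{1/2}$$
$$r^2 = k^2\left(1 + r'^2\right)$$
$$r' = \frac{\sqrt{r^2 - k^2}}{k}$$
Recalling that $r' = dr/dz$, we can move all $r$-terms to one side of the equality and all $z$-terms to the other and integrate,
$$\int \frac{dr}{\sqrt{r^2 - k^2}} = \int \frac{dz}{k} + C.$$
Therefore, after integrating both sides,
$$\frac{z}{k} = \cosh^{-1}\left(\frac{r}{k}\right) - K \quad \text{(from the maths data book)}$$
The shape of the film is symmetric in $z$, therefore $K = 0$, and the shape is
$$\frac{r}{k} = \cosh\left(\frac{z}{k}\right).$$
The surface must pass through $r = a$ at $z = \pm b$, so $k$ satisfies
$$\frac{a}{k} = \cosh\left(\frac{b}{k}\right).$$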
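The condition $a/k = \cosh(b/k)$ has no closed-form solution, so $k$ must be found numerically. Writing $t = b/k$, it becomes $a/b = \cosh t / t$, and $\cosh t / t$ is smallest where $t\tanh t = 1$; depending on $a/b$ there can therefore be two roots, one, or none (rings too far apart). The sketch below (not from the notes; the helper names and the example values $a = 1$, $b = 0.5$ are hypothetical) uses bisection to locate that minimum and then brackets the larger-$k$ root.

```python
import math

def bisect(fun, lo, hi, tol=1e-13, max_iter=200):
    """Simple bisection; assumes fun(lo) and fun(hi) differ in sign."""
    f_lo = fun(lo)
    if f_lo * fun(hi) > 0:
        raise ValueError("root not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def catenoid_k(a, b):
    """Largest k with a/k = cosh(b/k), or None if no such k exists."""
    # With t = b/k the condition reads a/b = cosh(t)/t, which is smallest
    # where t*tanh(t) = 1; call that point t_star.
    t_star = bisect(lambda t: t * math.tanh(t) - 1.0, 0.1, 5.0)
    k_star = b / t_star
    g = lambda k: k * math.cosh(b / k) - a   # g(k) = 0  <=>  a/k = cosh(b/k)
    if g(k_star) > 0:
        return None                          # rings too far apart
    return bisect(g, k_star, a)              # g increases on [k_star, a]

k = catenoid_k(1.0, 0.5)
print(k, 1.0 / k - math.cosh(0.5 / k))       # residual should be ~0
print(catenoid_k(1.0, 1.0))                  # None: no catenoid for b = a
```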
2.6 Canonical multi-dimensional problem: Poisson equation
The same procedures can be extended to the minimisation of a functional over a surface
or a volume. Consider the functional
$$I = \frac{P}{2}\int_V \nabla w\cdot\nabla w \, dV - \int_V f w \, dV,$$
where $P$ is the tension in a membrane, $w(x, y)$ is the deflection and $f(x, y)$ is the applied
force per unit area. We denote the surface by $V$ and its boundary
by $S$. The process generalises directly to three dimensions.
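We wish to minimise the functional $I$. Following our standard process, we take the
directional derivative of $I$,
$$\left.\frac{dI(w + \epsilon v)}{d\epsilon}\right|_{\epsilon=0}
= \left.\frac{d}{d\epsilon}\left[\frac{P}{2}\int_V \nabla(w + \epsilon v)\cdot\nabla(w + \epsilon v)\, dV - \int_V f(w + \epsilon v)\, dV\right]\right|_{\epsilon=0}
= P\int_V \nabla w\cdot\nabla v \, dV - \int_V f v \, dV.$$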
Now, when $I$ is a minimum,
$$P\int_V \nabla w\cdot\nabla v \, dV - \int_V f v \, dV = 0 \quad \text{for all } v.$$
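This expression can be manipulated into a more familiar form by applying integration by
parts to remove derivatives of $v$,
$$P\int_V \nabla w\cdot\nabla v \, dV - \int_V f v \, dV
= -P\int_V \left(\nabla^2 w\right) v \, dV - \int_V f v \, dV + P\int_S \left(\nabla w\cdot\boldsymbol{n}\right) v \, dS.$$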
Following the usual arguments (choosing $v$ to vanish on $S$ and applying the fundamental lemma), we can deduce that the deflection satisfies the differential
equation
$$P\nabla^2 w + f = 0,$$
which is Poisson’s equation.

On the boundary, we have either $v = 0$ (a fixed boundary where $w$ is prescribed) or a
free boundary, where the boundary integral must vanish for arbitrary $v$, so that the condition is
$$\nabla w\cdot\boldsymbol{n} = 0.$$
While this problem was introduced as a membrane problem, we have worked through it
in a fashion which is independent of the dimension. The final equations are therefore
equally valid in one, two and three dimensions. This functional is very common in
engineering and physics: it describes the deflection of a membrane, heat conduction and
groundwater flow, amongst other things.
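As a hedged illustration (not from the notes), the sketch below discretises the one-dimensional version, $P\,w'' + f = 0$ on $0 < x < 1$ with both ends fixed ($w = 0$), using central finite differences, and solves the resulting tridiagonal system with the Thomas algorithm. The unit interval, the boundary conditions and the uniform load in the usage example are assumptions chosen so that the result can be compared with the exact deflection $w = f x(1 - x)/(2P)$.

```python
def solve_membrane_1d(P, f, n):
    """Sketch: central differences for P w'' + f = 0 on (0, 1), w(0) = w(1) = 0.

    P is the tension, f a callable load, n the number of interior nodes.
    Returns the interior node positions and the deflections there.
    """
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # P (w[i-1] - 2 w[i] + w[i+1]) / h^2 = -f(x[i]) becomes the tridiagonal system
    # -w[i-1] + 2 w[i] - w[i+1] = f(x[i]) h^2 / P   (the end values are zero).
    rhs = [f(xi) * h * h / P for xi in x]
    # Thomas algorithm: forward sweep (sub- and super-diagonals -1, diagonal 2) ...
    c = [0.0] * n
    d = [0.0] * n
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    # ... followed by back substitution.
    w = [0.0] * n
    w[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        w[i] = d[i] - c[i] * w[i + 1]
    return x, w

# Assumed example: uniform load; the exact deflection is w = f x (1 - x) / (2 P).
P, f0 = 2.0, 1.0
x, w = solve_membrane_1d(P, lambda s: f0, 9)
exact = [f0 * s * (1 - s) / (2 * P) for s in x]
print(max(abs(a - b) for a, b in zip(w, exact)))
```

Because the exact solution here is quadratic, the central-difference formula reproduces it at the nodes up to round-off; for a general load the nodal error would shrink like $h^2$.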
2.7 Constrained problems and Lagrange multipliers
Variational problems often come with constraints: we seek stationary points of a particular functional subject to one or more constraint conditions.
2.7.1 Discrete problem
Suppose that we wish to minimise the function
$$F(x, y) = x^2 + y^2 \tag{2.12}$$
subject to the constraint
$$G(x, y) = y + 2x - 3 = 0. \tag{2.13}$$
[Figure: the line y = −2x + 3 in the (x, y) plane, with the expected answer marked on the line.]
In other words, we wish to find the shortest distance from the origin to a point on the
line $G = 0$.
We require $F$ to be stationary as we move along the line $G = 0$. In other words, at
the stationary point the rate of change of $F$ along the line $G = 0$ must be zero.
Recalling that the vector $\nabla G$ is perpendicular to the line $G = 0$, this is the same as requiring the component of $\nabla F$ perpendicular to $\nabla G$ to vanish,
or
$$\nabla F \parallel \nabla G, \quad \text{i.e.} \quad \nabla F = -\lambda\nabla G \text{ for some value } \lambda,$$
which we can express as
$$\nabla(F + \lambda G) = 0.$$
The key now is to reformulate the constrained problem as an unconstrained problem,
which we can do by adding a multiple of the constraint equation to the original function
to be minimised. The quantity $\lambda$ is known as a Lagrange multiplier, and the original
constrained problem can be solved by finding stationary points of the function $F + \lambda G$,
where $\lambda$ is an unknown.
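Applying this approach to the problem defined in equations (2.12) and (2.13), we find stationary points of
$$H = F + \lambda G = x^2 + y^2 + \lambda(y + 2x - 3):$$
$$\frac{\partial H}{\partial x} = 0 \;\rightarrow\; 2x + 2\lambda = 0,$$
$$\frac{\partial H}{\partial y} = 0 \;\rightarrow\; 2y + \lambda = 0,$$
$$\frac{\partial H}{\partial \lambda} = 0 \;\rightarrow\; y + 2x - 3 = 0.$$
Subtracting twice the second equation from the first eliminates $\lambda$,
$$2x - 4y = 0 \;\rightarrow\; x = 2y.$$
From $\partial H/\partial\lambda = 0$ (the constraint), $y + 4y = 3$ and
$$y = \frac{3}{5}, \qquad x = 2y = \frac{6}{5}.$$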
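Because $H$ is quadratic, its stationarity conditions form a linear system, which can be checked mechanically. The sketch below (an illustration, not from the notes) solves the three equations for $(x, y, \lambda)$ by Gauss–Jordan elimination in exact rational arithmetic; the helper name `solve_linear` is hypothetical.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Gauss-Jordan elimination with exact fractions; A is n x n, b has length n."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap a row with a non-zero pivot into place (assumes A is non-singular).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

# Stationarity of H = x^2 + y^2 + lam (y + 2x - 3); unknowns ordered (x, y, lam)
A = [[2, 0, 2],   # dH/dx   = 2x + 2 lam = 0
     [0, 2, 1],   # dH/dy   = 2y + lam   = 0
     [2, 1, 0]]   # dH/dlam = y + 2x     = 3
b = [0, 0, 3]
x, y, lam = solve_linear(A, b)
print(x, y, lam)  # 6/5 3/5 -6/5
```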
The same approach can be applied to problems that involve integration. Note that taking
the directional derivative with respect to λ simply recovers the constraint equation.
2.7.2 Shape of a hanging chain
We now consider a continuous problem. We wish to find the shape that a flexible chain
of uniform mass density takes when hanging from supports at $x = -a$ and $x = a$.
The chain has a length of $2L$, where $a < L$.
[Figure: chain hanging between supports at (−a, 0) and (a, 0), with shape y(x).]
The potential energy is
$$I = -\rho g\int_{-a}^{a} y \, ds = -\rho g\int_{-a}^{a} y\left(1 + y'^2\right)^{1/2} dx,$$
which we wish to minimise subject to
$$J = \int_{-a}^{a} ds = \int_{-a}^{a} \left(1 + y'^2\right)^{1/2} dx = 2L.$$
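So we introduce a Lagrange multiplier and try to find the function $y$ that makes
$$K = \int_{-a}^{a} \left[(y + \lambda)\left(1 + y'^2\right)^{1/2} - \frac{2L\lambda}{2a}\right] dx$$
stationary (the constant factor $-\rho g$ has been divided out, since it does not change the stationary points).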
Notice that when computing directional derivatives with respect to $y$ or $y'$, the $2L$ term
vanishes. Therefore we will get the same answer if we seek stationary points of
$$K = \int_{-a}^{a} (y + \lambda)\left(1 + y'^2\right)^{1/2} dx.$$
In other words, for these directional derivatives it makes no difference whether the constraint is written as $J = 0$ or $J = \text{constant}$; what matters are changes (variations) in $J$. The $2L$ term does, of course, play a role when computing the
directional derivative with respect to $\lambda$.
To solve the problem, notice that $x$ does not appear explicitly in the integrand, therefore
we can use the special case of the Euler–Lagrange equation in equation (2.8) to deduce
$$(y + \lambda)\left(1 + y'^2\right)^{1/2} - y'(y + \lambda)\frac{1}{2}\left(1 + y'^2\right)^{-1/2} 2y' = k,$$
which, after multiplying by $(1 + y'^2)^{1/2}/(y + \lambda)$, reduces to
$$\frac{k}{y + \lambda}\left(1 + y'^2\right)^{1/2} = 1 + y'^2 - y'^2 = 1,$$
and therefore
$$\left(\frac{y + \lambda}{k}\right)^2 = 1 + y'^2.$$
This can be integrated using the substitution $(y + \lambda)/k = \cosh z$ to give
$$k\cosh^{-1}\frac{y + \lambda}{k} = x + c,$$
where $c$ is a constant. The three unknowns $k$, $\lambda$ and $c$ must now be determined using
the two end conditions together with the constraint equation. The end conditions $y(\pm a) = 0$ give
$$\cosh\frac{a + c}{k} = \frac{\lambda}{k} = \cosh\frac{-a + c}{k},$$
and since $a \neq 0$, these imply that $c = 0$ and
$$\frac{\lambda}{k} = \cosh\frac{a}{k}.$$
The constraint (which is recovered by setting the directional derivative of $K$ with respect
to $\lambda$ equal to zero), after inserting $y' = \sinh(x/k)$, takes the form
$$2L = \int_{-a}^{a} \left[1 + \sinh^2\frac{x}{k}\right]^{1/2} dx = 2k\sinh\frac{a}{k}.$$
Thus the shape of the hanging chain takes the form
$$y = k\cosh\frac{x}{k} - k\cosh\frac{a}{k},$$
where $k$ is the solution of $\sinh(a/k) = L/k$. This is the equation of a catenary, as
expected.
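Like the soap film, the equation for $k$ must be solved numerically. In terms of $t = a/k$ it reads $\sinh t/t = L/a$, and since $\sinh t/t$ increases from 1 as $t$ increases, there is exactly one positive root whenever $L > a$. The sketch below (an illustration, not from the notes; the function names are hypothetical) brackets that root by doubling, refines it by bisection, and evaluates the catenary.

```python
import math

def catenary_k(a, L, tol=1e-13):
    """Solve sinh(a/k) = L/k for k > 0, given half-span a and half-length L > a."""
    if L <= a:
        raise ValueError("need L > a (the chain must be longer than the span)")
    # With t = a/k the condition is sinh(t)/t = L/a; sinh(t)/t rises from 1.
    g = lambda t: math.sinh(t) - (L / a) * t
    lo, hi = 1e-8, 1.0
    while g(hi) < 0:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:      # bisection: g < 0 below the root, g > 0 above it
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return a / (0.5 * (lo + hi))

def chain_shape(x, a, k):
    """y(x) = k cosh(x/k) - k cosh(a/k): zero at x = +/- a, negative in between."""
    return k * math.cosh(x / k) - k * math.cosh(a / k)

# Hypothetical usage: supports 2 apart, chain of length 3
a, L = 1.0, 1.5
k = catenary_k(a, L)
print(k, chain_shape(0.0, a, k))   # k and the (negative) mid-span value of y
```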
2.7.3 Summary of constrained problems
The approach for solving constrained problems via Lagrange multipliers is systematic. It
involves:
1. Multiply the constraints $G_i = 0$ by Lagrange multipliers $\lambda_i$ and add these to the functional
$J$ that is to be minimised subject to the constraints, to form the functional $I = J + \sum_i \lambda_i G_i$.
2. Take directional derivatives of $I$ with respect to the Lagrange multipliers and all variables
in the problem.
3. Set each of the directional derivatives to zero and solve to find the unknown functions.
2.8 Optimal control/constrained optimisation
Variational methods provide the basis for what is known as optimal control, which uses
the constrained framework from the previous section. The aim is to minimise a goal
functional $J$ subject to constraints. A typical constraint is a differential equation that
a physical system must obey, and a typical goal functional involves the control variable
and the system response.
2.8.1 Control of a time dependent system
Suppose we have a system governed by the differential equation
$$\dot{x} + x = u \tag{2.14}$$
($x(t)$ is the system response, and $u(t)$ is the control input.)
A typical control problem would be to design the control input $u(t)$ so as to make the
system perform some required motion in the ‘best’ way in some sense. At the same time,
we might also want to minimise the required control input, since larger control effort may
require bigger actuators, use more fuel, etc. As an example of optimal control, we might seek
to find the control input $u(t)$ that takes the system from $x = x_0$ at $t = 0$ to $x = 0$ at
$t = T$, and that minimises the combined ‘cost function’
$$J = \int_0^T \left(x^2 + u^2\right) dt.$$
This functional combines the notion of ‘getting $x$ near zero as soon as possible’ with that
of putting in the least control effort. The minimisation must be carried out subject to the
governing equation (2.14), which will appear as a form of constraint. We can therefore
extend the previous treatment of constrained problems to cover this case: introduce a
Lagrange multiplier $\lambda$, and try to find functions $x(t)$ and $u(t)$ that make the functional
$$I = \int_0^T \left[x^2 + u^2 + \lambda\left(\dot{x} + x - u\right)\right] dt$$
stationary. The Lagrange multiplier must depend on $t$ since equation (2.14) must hold
at every time $t$.
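The integrand of $I$ is a function of $x$, $\dot{x}$ and $u$, so we can write down an Euler–Lagrange
equation (see equation (2.4)) for $u$ and $x$ separately. Collecting the terms involving $u$,
$$F_u = u^2 - \lambda u \;\rightarrow\; \frac{\partial F_u}{\partial u} = 2u - \lambda = 0,$$
and the terms involving $x$ and $\dot{x}$,
$$F_x = x^2 + \lambda\dot{x} + \lambda x \;\rightarrow\; \frac{\partial F_x}{\partial x} - \frac{d}{dt}\frac{\partial F_x}{\partial\dot{x}} = 2x + \lambda - \frac{d\lambda}{dt} = 0.$$
Therefore
$$\frac{d\lambda}{dt} = 2x + \lambda.$$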
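Combining (with $\lambda = 2u$),
$$2\dot{u} = 2u + 2x.$$
Using the governing equation (2.14), $u = \dot{x} + x$ and $\dot{u} = \ddot{x} + \dot{x}$, which leads to
$$\ddot{x} = 2x.$$
The general solution of this ordinary differential equation may be written as
$$x = A\cosh\sqrt{2}\,t + B\sinh\sqrt{2}\,t.$$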
Using the condition $x(0) = x_0$ gives $A = x_0$, and $x(T) = 0$ gives
$$0 = x_0\cosh\sqrt{2}\,T + B\sinh\sqrt{2}\,T.$$
It then follows (using identities for hyperbolic functions) that
$$x = \frac{x_0\sinh\sqrt{2}(T - t)}{\sinh\sqrt{2}\,T}.$$
From the governing equation (2.14), it follows that the control signal is
$$u = x_0\,\frac{\sinh\sqrt{2}(T - t) - \sqrt{2}\cosh\sqrt{2}(T - t)}{\sinh\sqrt{2}\,T}.$$
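The sketch below (an illustration, not from the notes) checks this result numerically. Any trajectory with $x(0) = x_0$ and $x(T) = 0$ defines an admissible control through $u = \dot{x} + x$, so the cost of the optimal pair should not exceed that of, say, a straight-line trajectory. The values $x_0 = 1$, $T = 2$ and the straight-line comparison are assumptions chosen for the example; the integrals use the composite Simpson rule.

```python
import math

R2 = math.sqrt(2.0)

def x_opt(t, x0, T):
    """Optimal response x(t) = x0 sinh(sqrt2 (T - t)) / sinh(sqrt2 T)."""
    return x0 * math.sinh(R2 * (T - t)) / math.sinh(R2 * T)

def u_opt(t, x0, T):
    """Optimal control u = xdot + x from equation (2.14)."""
    s = T - t
    return x0 * (math.sinh(R2 * s) - R2 * math.cosh(R2 * s)) / math.sinh(R2 * T)

def cost(x, u, T, n=2000):
    """J = integral_0^T (x^2 + u^2) dt by the composite Simpson rule (n even)."""
    h = T / n
    total = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        t = i * h
        total += w * (x(t) ** 2 + u(t) ** 2)
    return total * h / 3.0

x0, T = 1.0, 2.0
J_opt = cost(lambda t: x_opt(t, x0, T), lambda t: u_opt(t, x0, T), T)

# Comparison trajectory (an assumption): x = x0 (1 - t/T) also meets both end
# conditions; the control it needs follows from u = xdot + x.
J_lin = cost(lambda t: x0 * (1 - t / T), lambda t: -x0 / T + x0 * (1 - t / T), T)
print(J_opt, J_lin, J_opt < J_lin)
```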
2.8.2 Control of Poisson equation
We now consider control of the Poisson equation on a domain $V$. By controlling the
source term $f$ of the Poisson equation, we wish the response $u$ to be close to a prescribed
response $z$. Formally, we wish to minimise the functional
$$J = \int_V \left[\frac{1}{2}(u - z)^2 + \frac{\alpha}{2} f^2\right] dV,$$
where $\alpha > 0$ is a parameter, subject to the constraints
$$-\nabla^2 u = f \text{ in } V, \qquad u = 0 \text{ on } \partial V.$$
Note that $J$ depends on $f$. This is required to prevent wild variations in $f$ (and an ill-posed
problem).
Introducing the Lagrange multiplier $\lambda$, the functional $I$ whose stationary points we need to find reads
$$I = \int_V \left[\lambda\left(\nabla^2 u + f\right) + \frac{1}{2}(u - z)^2 + \frac{\alpha}{2} f^2\right] dV.$$
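Computing directional derivatives (variations) with respect to $\lambda$, $u$ and $f$:
$$D_\lambda I[\bar{\lambda}] = \int_V \bar{\lambda}\left(\nabla^2 u + f\right) dV,$$
$$D_u I[\bar{u}] = \int_V \left[\lambda\nabla^2\bar{u} + (u - z)\bar{u}\right] dV,$$
$$D_f I[\bar{f}] = \int_V (\lambda + \alpha f)\bar{f} \, dV,$$
where the over-bar indicates a ‘variation’.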

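Stationary points are found by setting $D_\lambda I[\bar{\lambda}] = D_u I[\bar{u}] = D_f I[\bar{f}] = 0$ for all $\bar{\lambda}$, $\bar{u}$
and $\bar{f}$. Following the usual process (for $D_u I$, integrating by parts twice to move the Laplacian from $\bar{u}$ onto $\lambda$), this leads to the coupled equations
$$\nabla^2 u + f = 0, \qquad \nabla^2\lambda + u = z, \qquad \lambda + \alpha f = 0,$$
with $u = \lambda = 0$ on $\partial V$. This control problem involves two coupled equations ($\lambda = -\alpha f$
is trivial). We have little hope of solving this system analytically, but it can be solved
approximately using a computer.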
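To illustrate the final remark, here is a hedged sketch (not from the notes) of such a computer solution in one dimension, on $0 < x < 1$. It replaces $\nabla^2$ by the standard three-point finite-difference second derivative, eliminates $f = -\lambda/\alpha$, and solves the remaining coupled linear system for $u$ and $\lambda$ with NumPy. The grid size, the target $z$ and the value of $\alpha$ in the usage line are assumptions for illustration.

```python
import numpy as np

def poisson_control_1d(z, alpha, n=99):
    """Finite-difference sketch of the 1D optimality system on (0, 1):

        u'' + f = 0,   lam'' + u = z,   lam + alpha * f = 0,   u = lam = 0 at x = 0, 1.

    z is a callable target response; n is the number of interior nodes.
    Returns the interior nodes x and the discrete u, f and lam there.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Three-point second-difference matrix; the zero boundary values are built in.
    D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1)) / h**2
    I = np.eye(n)
    # Eliminate f = -lam / alpha, leaving the unknowns [u, lam]:
    #   D2 u - lam / alpha = 0
    #   u + D2 lam         = z
    A = np.block([[D2, -I / alpha],
                  [I, D2]])
    rhs = np.concatenate([np.zeros(n), z(x)])
    sol = np.linalg.solve(A, rhs)
    u, lam = sol[:n], sol[n:]
    return x, u, -lam / alpha, lam

# Hypothetical usage: a smooth target and a moderate penalty on the control.
x, u, f, lam = poisson_control_1d(lambda s: np.sin(np.pi * s), alpha=1e-3)
```

For the smooth target $z = \sin\pi x$ the computed response is a scaled copy of $z$ with slightly smaller amplitude; decreasing $\alpha$ makes control cheaper and pulls $u$ closer to $z$.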
3 Weak formulations of differential equations³
3.1 Introduction
The modern study of partial differential equations is often based on their weak form,
as are various numerical techniques for finding approximate solutions. The
weak form of a partial differential equation is well suited to mathematical analysis, as
tools from functional analysis can be brought to bear. Weak formulations are often referred
to as ‘variational formulations’, but they can also be formulated for problems that
cannot be phrased as a minimisation problem. Classical transport equations are a typical
example of a case that cannot be posed as a minimisation problem.
The derivation of the weak form of a differential equation follows a standard process:
1. Multiply the differential equation by an arbitrary weight function and integrate over the domain.
2. Apply integration by parts, if possible, and insert Neumann boundary conditions.
The weak form of an equation does not generally make an equation easier to solve
analytically (it may make it harder), but is usually a more suitable form for mathematical
analysis (allowing us to say things about the properties of the equation without knowing
the solution) and for numerical solution methods.
3.2 Examples
3.2.1 Poisson equation
We have already seen Poisson’s equation,
$$-\nabla^2 u = f,$$
in numerous guises.
³ This topic was new to the course in 2009.
The complete boundary value problem also requires boundary
conditions,
$$u = 0 \text{ on } S_g, \qquad \nabla u\cdot\boldsymbol{n} = h \text{ on } S_h,$$
where $S_g$ and $S_h$ cover the entire boundary but do not overlap. To derive the weak form,
we first multiply both sides by a weight function $v$ and integrate over the volume $V$,
$$-\int_V v\,\nabla^2 u \, dV = \int_V v f \, dV.$$
We require that $v = 0$ on the parts of the boundary where $u$ is prescribed ($S_g$). Applying integration by parts,
$$\int_V \nabla v\cdot\nabla u \, dV = \int_V v f \, dV + \int_{S_h} v\,\nabla u\cdot\boldsymbol{n} \, dS,$$
and we can insert the Neumann boundary condition,
$$\int_V \nabla v\cdot\nabla u \, dV = \int_V v f \, dV + \int_{S_h} v h \, dS.$$
Solving Poisson’s equation now involves finding $u$ that satisfies the Dirichlet boundary
conditions such that the above equation holds for all functions $v$ that vanish on $S_g$.
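To make the two steps concrete, here is the same derivation written out in one dimension, for an assumed model problem $-u'' = f$ on $0 < x < \ell$ (not taken from the notes) with $u(0) = 0$ as the Dirichlet condition and $u'(\ell) = h$ as the Neumann condition, so that $S_g = \{0\}$, $S_h = \{\ell\}$ and the outward normal is $+1$ at $x = \ell$:

```latex
\begin{aligned}
-\int_0^\ell v\,u''\,dx &= \int_0^\ell v f\,dx, \qquad v(0) = 0,\\
-\bigl[v\,u'\bigr]_0^\ell + \int_0^\ell v' u'\,dx &= \int_0^\ell v f\,dx,\\
\int_0^\ell v' u'\,dx &= \int_0^\ell v f\,dx + v(\ell)\,u'(\ell)
                       = \int_0^\ell v f\,dx + v(\ell)\,h .
\end{aligned}
```

The boundary term at $x = 0$ drops out because $v(0) = 0$, and the Neumann data enters only through the term $v(\ell)\,h$.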
An important observation with regard to the weak form is the order of derivatives ap-
pearing. For the Poisson equation, the weak form involves first-order derivatives whereas
the strong form involves second-order derivatives. In this sense, the weak form is more
general as functions that are not sufficiently smooth to be classical solutions to the strong
form may be solutions of the weak form.
Another important observation is that the term on the left-hand side is symmetric.
That is, if we swap $v$ and $u$ the expression remains the same,
$$\int_V \nabla v\cdot\nabla u \, dV = \int_V \nabla u\cdot\nabla v \, dV.$$
It turns out that when the weak form is symmetric, the problem is equivalent to the
minimisation of a functional and is therefore a variational problem in the classical sense.
Solving the weak form is equivalent to minimising
$$I = \frac{1}{2}\int_V \nabla u\cdot\nabla u \, dV - \int_V f u \, dV - \int_{S_h} h u \, dS,$$
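which we can see by taking the directional derivative of $I$,
$$DI(u)[v] = \left.\frac{d}{d\epsilon}\left[\frac{1}{2}\int_V \nabla(u + \epsilon v)\cdot\nabla(u + \epsilon v)\, dV - \int_V f(u + \epsilon v)\, dV - \int_{S_h} h(u + \epsilon v)\, dS\right]\right|_{\epsilon=0}
= \int_V \nabla u\cdot\nabla v \, dV - \int_V f v \, dV - \int_{S_h} h v \, dS,$$
which, set equal to zero, is the weak form.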
3.2.2 Steady advection-diffusion
The advection-diffusion equation is important for modelling transport processes, espe-

cially in fluids. A quantity φ, which could for example be the concentration of a pollutant
or the temperature, can be transported by a combination of diffusion and advection.
Advective transport is when φ is carried along by, for example, a moving liquid.
Steady advection-diffusion is described by
$$\boldsymbol{a}\cdot\nabla\phi - \nabla^2\phi = f,$$
where $\boldsymbol{a}$ is the known velocity field (which is incompressible, $\nabla\cdot\boldsymbol{a} = 0$) and $f$ is a source term. You
can see that it is an extension of Poisson’s equation, with the term $\boldsymbol{a}\cdot\nabla\phi$ added to take
advective transport into account.
Following the usual process, we first multiply the equation by a weight function and
integrate,
$$\int_V v\,\boldsymbol{a}\cdot\nabla\phi \, dV - \int_V v\,\nabla^2\phi \, dV = \int_V v f \, dV,$$
and then apply integration by parts and insert a boundary condition $h$ for the diffusive
flux,
$$\int_V v\,\boldsymbol{a}\cdot\nabla\phi \, dV + \int_V \nabla v\cdot\nabla\phi \, dV = \int_V v f \, dV + \int_{S_h} v h \, dS.$$
We have not used integration by parts on the advective term. It is possible to apply it,
which would lead to an extra boundary integral. Whether or not to apply integration by
parts to this term is often a matter of convenience depending on the form of the bound-
ary conditions, which can be a complicated matter for the advection-diffusion equation
(related to characteristics).
Unlike the weak form of Poisson’s equation, the weak form of the advection-diffusion
equation is not symmetric, since when we switch $\phi$ and $v$,
$$\int_V v\,\boldsymbol{a}\cdot\nabla\phi \, dV + \int_V \nabla v\cdot\nabla\phi \, dV \neq \int_V \phi\,\boldsymbol{a}\cdot\nabla v \, dV + \int_V \nabla\phi\cdot\nabla v \, dV$$
in general when $v \neq \phi$. Therefore, this problem cannot be phrased as a minimisation problem,
although in modern terminology it is still referred to as a variational problem.
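The lack of symmetry can be seen in a one-dimensional example. The sketch below (an illustration, not from the notes) takes $V = (0, 1)$, a constant velocity $a = 1$, and the two polynomials $v = x(1 - x)$ and $\phi = x^2(1 - x)$, both chosen here for illustration and vanishing at the ends; it evaluates the bilinear form exactly with rational arithmetic in both orders.

```python
from fractions import Fraction

def deriv(p):
    """Derivative of a polynomial given by coefficients [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))]

def mul(p, q):
    """Product of two coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def integrate01(p):
    """Exact integral of the polynomial over [0, 1]."""
    return sum((Fraction(c) / (k + 1) for k, c in enumerate(p)), Fraction(0))

def bilinear(v, phi, a=1):
    """1D advection-diffusion form: a(v, phi) = int v a phi' dx + int v' phi' dx."""
    adv = integrate01(mul(v, [a * c for c in deriv(phi)]))
    diff = integrate01(mul(deriv(v), deriv(phi)))
    return adv + diff

v = [0, 1, -1]          # v   = x - x^2
phi = [0, 0, 1, -1]     # phi = x^2 - x^3
print(bilinear(v, phi), bilinear(phi, v))   # 11/60 3/20
```

The diffusive parts agree in both orders; the whole difference, $1/30$ here, comes from the advective term.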
3.3 Key features
An important feature of the weak form is that it generally permits functions with a
lesser degree of continuity than the strong form, owing to the reduction in the
order of the derivatives. There are also abstract results for proving existence, stability
and uniqueness of solutions which can be applied to a broad range of equations. These
conditions are usually on the ‘bilinear form’ $a(v, u)$. For the Poisson equation,
$$a(v, u) = \int_V \nabla v\cdot\nabla u \, dV,$$
and for the advection-diffusion equation
$$a(v, \phi) = \int_V v\,\boldsymbol{a}\cdot\nabla\phi \, dV + \int_V \nabla v\cdot\nabla\phi \, dV.$$
The terminology ‘bilinear form’ is used because $a(v, u)$ is linear in $v$ and in $u$, e.g. $a(5v, u) = a(v, 5u) = 5a(v, u)$. Important conclusions about the properties of an equation can be
drawn by studying abstract properties of the bilinear form, most notably from the Lax–Milgram theorem.
Importantly, problems for which the bilinear form is symmetric can be posed as minimisation
problems.
Question: Consider the differential equation
$$x^2\frac{d^2u}{dx^2} - 2x\frac{du}{dx} + 2u = 0, \qquad u(-1) = -2, \quad u(1) = 0.$$
1. Deduce a weak form of the above equation.
2. Explain why it is not possible to deduce a variational form.
3. Multiply the equation by $x^{-4}$, and deduce the equivalent variational form of the
converted differential equation.
4 Rayleigh-Ritz, Galerkin and Finite-Element Methods
The reality of partial differential equations is that in most cases it is not possible to find
an analytical solution. This is particularly so for equations on complicated geometries (as
is common in engineering), nonlinear equations and equations with complicated source
terms and boundary conditions.
If a differential equation cannot be solved in closed form, we may obtain approximate
solutions using the following techniques:
1. Solve the differential equation using a numerical method, such as finite-difference
methods, spectral methods, or finite element methods, where the latter two are
based on the Galerkin (or another weighted residual) approach.
2. Solve the integral variational form approximately using the Rayleigh-Ritz method
or finite element methods based on the Rayleigh-Ritz method.
In this section, a brief introduction is supplied for the Rayleigh-Ritz and Galerkin approx-
imate methods, as they provide the basis for finite-element methods. The Rayleigh-Ritz
method is applied directly to the variational form of the equations, while the Galerkin
method begins with the weak form of the equations.
4.1 Rayleigh-Ritz method
The process involves:
1. Define the functional I for which you wish to find stationary points.
2. Choose a combination of linearly independent functions that will be used to approx-

imate the solution. These will be called ‘basis functions’. The amplitudes of these
functions will be the unknowns that you will determine. The basis functions must
satisfy the Dirichlet (‘fixed’) boundary conditions.
3. Insert the approximate solution into the functional that is now denoted by Ih .
4. Take the directional derivative of Ih with respect to the unknown amplitudes of the
basis functions.
5. Determine the amplitudes of the basis functions which yield a stationary point of Ih .
For a small number of unknowns this process can be done by hand. For a clever choice
of basis functions, the solution can be quite accurate (possibly even exact). For larger
problems, the process can be implemented in a computer.
4.1.1 Example: elastic rod with a distributed force
Consider an elastic rod with a distributed force:
[Figure: elastic rod of length L, fixed at x = 0, loaded by a distributed axial force f(x).]
The process is:
• Define the functional that we wish to minimise⁴
$$I = \int_0^L \left[\frac{1}{2} EA\left(\frac{du}{dx}\right)^2 - f u\right] dx \tag{4.15}$$
• Choose an approximate displacement field $u_h$ that satisfies the displacement boundary condition
$$u_h = \phi_1(x)\,a_1 + \phi_2(x)\,a_2 = \frac{x}{L}a_1 + \frac{x^2}{L^2}a_2,$$
$$\frac{du_h}{dx} = \frac{1}{L}a_1 + \frac{2x}{L^2}a_2 \quad \text{(strain field)}.$$
We have chosen $\phi_1 = x/L$ and $\phi_2 = x^2/L^2$ as a basis. The unknowns are $a_1$ and
$a_2$.
• Insert the approximate solution $u_h$ into the energy functional $I$ in place of $u$. Call
this functional $I_h$:
$$I_h = \int_0^L \left[\frac{1}{2} EA\left(\frac{1}{L}a_1 + \frac{2x}{L^2}a_2\right)^2 - f\left(\frac{x}{L}a_1 + \frac{x^2}{L^2}a_2\right)\right] dx.$$
⁴ Note that the Euler–Lagrange equation is $EA\,d^2u/dx^2 = -f$, of which $I$ is the equivalent variational form.
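• Compute the directional derivatives of $I_h$ with respect to $a_1$ and $a_2$. To do this we
insert $a_1 + \epsilon a_1^*$ and $a_2 + \epsilon a_2^*$,
$$DI_h(\boldsymbol{a})[\boldsymbol{a}^*] = \left.\frac{d}{d\epsilon}\left\{\int_0^L \frac{1}{2}EA\left[\frac{1}{L}(a_1 + \epsilon a_1^*) + \frac{2x}{L^2}(a_2 + \epsilon a_2^*)\right]^2 dx - \int_0^L f\left[\frac{x}{L}(a_1 + \epsilon a_1^*) + \frac{x^2}{L^2}(a_2 + \epsilon a_2^*)\right] dx\right\}\right|_{\epsilon=0}$$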
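which gives
$$DI_h = \int_0^L EA\left(\frac{1}{L}a_1 + \frac{2x}{L^2}a_2\right)\left(\frac{a_1^*}{L} + \frac{2x\,a_2^*}{L^2}\right) dx - \int_0^L f\left(\frac{x}{L}a_1^* + \frac{x^2}{L^2}a_2^*\right) dx.$$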
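To minimise $I_h$, we want the above to be zero. Since $a_1^*$ and $a_2^*$ are arbitrary, first
set $a_1^* = 1$ and $a_2^* = 0$, and then $a_1^* = 0$ and $a_2^* = 1$,
$$\int_0^L \left[EA\left(\frac{1}{L^2}a_1 + \frac{2x}{L^3}a_2\right) - f\frac{x}{L}\right] dx = 0 \qquad (a_1^* = 1,\ a_2^* = 0),$$
$$\int_0^L \left[EA\left(\frac{2x}{L^3}a_1 + \frac{4x^2}{L^4}a_2\right) - f\frac{x^2}{L^2}\right] dx = 0 \qquad (a_1^* = 0,\ a_2^* = 1).$$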
Integrating,
$$\frac{EA}{L}(a_1 + a_2) - \int_0^L f\frac{x}{L} \, dx = 0 \qquad (a_1^* = 1,\ a_2^* = 0),$$
$$\frac{EA}{L}\left(a_1 + \frac{4}{3}a_2\right) - \int_0^L f\frac{x^2}{L^2} \, dx = 0 \qquad (a_1^* = 0,\ a_2^* = 1).$$
• Represent the problem as a $2 \times 2$ system of equations and solve for $a_1$ and $a_2$:
$$\frac{EA}{L}\begin{bmatrix} 1 & 1 \\ 1 & \frac{4}{3} \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \int_0^L f\frac{x}{L} \, dx \\ \int_0^L f\frac{x^2}{L^2} \, dx \end{bmatrix}.$$
We now have an approximate solution $u_h$ along the rod. If the function $f$ is difficult to
integrate, it could be integrated approximately (numerical integration).
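For a given load, the two-term system can be assembled and solved in a few lines. The sketch below is an illustration under stated assumptions: it hard-codes the stiffness matrix derived above, evaluates the load integrals with the composite Simpson rule (the numerical integration just mentioned), and solves by Cramer’s rule. The function names are hypothetical.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with n (even) sub-intervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3.0

def rayleigh_ritz_rod(EA, L, f):
    """Two-term Rayleigh-Ritz for the rod with basis phi1 = x/L, phi2 = x^2/L^2."""
    # Stiffness matrix (EA/L) [[1, 1], [1, 4/3]] from the integrals above
    K = [[EA / L, EA / L], [EA / L, 4.0 * EA / (3.0 * L)]]
    # Load vector: b_i = integral of f * phi_i over the rod, done numerically
    b = [simpson(lambda x: f(x) * x / L, 0.0, L),
         simpson(lambda x: f(x) * x**2 / L**2, 0.0, L)]
    # Solve the 2x2 system by Cramer's rule
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    a1 = (b[0] * K[1][1] - K[0][1] * b[1]) / det
    a2 = (K[0][0] * b[1] - K[1][0] * b[0]) / det
    return a1, a2

# Hypothetical usage: unit stiffness, unit length, uniform unit load
print(rayleigh_ritz_rod(1.0, 1.0, lambda x: 1.0))
```

For a uniform load $f$ the exact displacement $u = (f/EA)(Lx - x^2/2)$ lies in the span of $x/L$ and $x^2/L^2$, so in that case the sketch should return the exact coefficients $a_1 = fL^2/EA$ and $a_2 = -fL^2/(2EA)$, an instance of the ‘possibly even exact’ remark above.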
4.1.2 Error analysis
The Rayleigh-Ritz method leads us to a solution uh , but how do we know that this bears
any relation to the exact solution u? We can draw some firm conclusions based on some
simple arguments. We assume that we have a problem that has a unique solution and
that it is stable (the stationary point is a minimum).
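Firstly, we need to cast the method in a slightly more abstract format. For the bar
the problem is: find $u_h$ such that
$$DI_h(u_h)[v_h] = \left.\frac{d}{d\epsilon}\int_0^L \left[\frac{1}{2}EA\left(\frac{d(u_h + \epsilon v_h)}{dx}\right)^2 - f(u_h + \epsilon v_h)\right] dx\right|_{\epsilon=0}
= \int_0^L \left(EA\frac{du_h}{dx}\frac{dv_h}{dx} - f v_h\right) dx = 0.$$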
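This is just the weak form of the equation for our approximate problem. Note that if $u$
minimises $I$ and $u_h$ minimises $I_h$, then we have
$$\int_0^L \left(EA\frac{du}{dx}\frac{dv_h}{dx} - f v_h\right) dx = 0,$$
$$\int_0^L \left(EA\frac{du_h}{dx}\frac{dv_h}{dx} - f v_h\right) dx = 0,$$
where the first holds because the weak form for $u$ is satisfied for every admissible weight function, and in particular for every $v_h$.
We can conclude from this that
$$\int_0^L EA\frac{du}{dx}\frac{dv_h}{dx} \, dx = \int_0^L EA\frac{du_h}{dx}\frac{dv_h}{dx} \, dx, \tag{4.16}$$
or equivalently
$$\int_0^L EA\left(\frac{du}{dx} - \frac{du_h}{dx}\right)\frac{dv_h}{dx} \, dx = 0 \tag{4.17}$$
for all $v_h$.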
What we have done is to select, from all the possible approximate solutions $v_h = \phi_1(x)\,b_1 + \phi_2(x)\,b_2$ that we have allowed (there are an infinite number of possibilities just
by changing $b_1$ and $b_2$), the one, $u_h = \phi_1(x)\,a_1 + \phi_2(x)\,a_2$, that minimises our
expression for the potential energy $I_h$. We would now like to know how our computed displacement field $u_h$ is related to the exact displacement field $u$. Inserting $y_h = u - u_h + v_h$,
where $v_h$ is any function which can be represented using the basis which we have chosen,
into the first (strain energy) term of equation (4.15) in place of $u$, and expanding,
$$\frac{1}{2}\int_0^L EA\left(\frac{d(u - u_h + v_h)}{dx}\right)^2 dx = \frac{1}{2}\int_0^L EA\left(\frac{d(u - u_h)}{dx}\right)^2 dx
+ \underbrace{\int_0^L EA\frac{dv_h}{dx}\frac{d(u - u_h)}{dx} \, dx}_{=0,\ \text{due to eqn (4.17)}}
+ \underbrace{\frac{1}{2}\int_0^L EA\left(\frac{dv_h}{dx}\right)^2 dx}_{\geq 0}.$$
Therefore, if we set $w_h = u_h - v_h$ and move some terms to the opposite side of the equation,
$$\frac{1}{2}\int_0^L EA\left(\frac{d(u - u_h)}{dx}\right)^2 dx \leq \frac{1}{2}\int_0^L EA\left(\frac{d(u - w_h)}{dx}\right)^2 dx,$$
where $w_h$ is an arbitrary member of the space spanned by our basis. The term on the left-hand side is the error in the strain energy. This important result
proves that the Rayleigh-Ritz method finds the solution $u_h$, from all the possible solutions
which we allow, that minimises the error in the strain energy. We have proved this without
knowing the analytical solution $u$!
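The best-approximation property can be checked numerically on a case where the exact solution is not in the trial space. The sketch assumes $EA = 1$, $L = 1$ and $f(x) = x$ (a choice made here, not in the notes), for which $u = x/2 - x^3/6$. The Rayleigh-Ritz coefficients follow from the $2 \times 2$ system of subsection 4.1.1 with right-hand side $(1/3, 1/4)$; the sketch then perturbs them and confirms that no perturbed member of the trial space has a smaller strain-energy error.

```python
def simpson01(g, n=400):
    """Composite Simpson rule on [0, 1] with n (even) sub-intervals."""
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3.0

def du_exact(x):
    # Exact strain for the assumed case EA = 1, L = 1, f(x) = x: u = x/2 - x^3/6.
    return 0.5 - 0.5 * x**2

def energy_error(c1, c2):
    """(1/2) * integral of (u' - w')^2 for the trial field w = c1 x + c2 x^2."""
    return 0.5 * simpson01(lambda x: (du_exact(x) - c1 - 2.0 * c2 * x) ** 2)

# Rayleigh-Ritz: [[1, 1], [1, 4/3]] a = (1/3, 1/4), solved by Cramer's rule
k11, k12, k22, b1, b2 = 1.0, 1.0, 4.0 / 3.0, 1.0 / 3.0, 0.25
det = k11 * k22 - k12 * k12
a1 = (b1 * k22 - k12 * b2) / det      # 7/12
a2 = (k11 * b2 - k12 * b1) / det      # -1/4

best = energy_error(a1, a2)
steps = (-0.1, -0.01, 0.0, 0.01, 0.1)
gap = min(energy_error(a1 + d1, a2 + d2) - best for d1 in steps for d2 in steps)
print(a1, a2, best, gap)   # the smallest gap occurs for the unperturbed pair
```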
4.2 Galerkin Method
The Galerkin method is based on the method of weighted residuals. Its primary advantage over the Rayleigh-Ritz method is that it is not necessary to write, or even be
able to write, the equation in variational form before applying it, as is required
for the Rayleigh-Ritz method. This allows the Galerkin method to be
applied to a much wider class of problems.
We are seeking an approximation to the solution of the differential equation in its strong form,
$$Lu = f,$$
where $L$ is a differential operator. As in the Rayleigh-Ritz method, this approximation takes the form of a trial function
composed of a linear combination of basis functions,
$$\bar{u} = c_i\phi_i$$
(using the summation convention).
Given this approximate solution $\bar{u}$, we can define the residual
$$R(x) = L\bar{u} - f,$$
which is a measure of the error of the approximation $\bar{u}$. If $\bar{u} = u(x)$, then the residual
vanishes. In the method of weighted residuals, we multiply the residual by a set of weight
functions $w_i(x)$, $i = 1, \ldots, N$, and integrate over the domain,
$$\int_{x_0}^{x_1} (L\bar{u} - f)\,w_i \, dx = 0.$$
This is equivalent to setting the inner product of the residual with each weight function
to zero. In the general method of weighted residuals, we can choose different functions
for the weight functions $w_i(x)$ and the basis functions $\phi_i(x)$ within the trial function $\bar{u}$.
In the Galerkin method, however, we use the same functions for the weight functions and
the basis functions in the trial function, such that $w_i(x) = \phi_i(x)$.
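As a hedged illustration of a case where Rayleigh-Ritz does not apply but Galerkin does, the sketch below applies the Galerkin method to the one-dimensional steady advection-diffusion equation of subsection 3.2.2 with $a = 1$ and $f = 1$, i.e. $u' - u'' = 1$ on $0 < x < 1$ with $u(0) = u(1) = 0$. This test problem and the two basis functions $x(1 - x)$ and $x^2(1 - x)$ (which vanish at both ends) are assumptions made here. Following the comment below, it works with the weak form; the integrals are done exactly with rational arithmetic, and the resulting $2 \times 2$ matrix is not symmetric, reflecting the non-symmetric bilinear form.

```python
from fractions import Fraction
import math

def deriv(p):
    """Derivative of a polynomial given by coefficients [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))]

def mul(p, q):
    """Product of two coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def integrate01(p):
    """Exact integral of the polynomial over [0, 1]."""
    return sum((Fraction(c) / (k + 1) for k, c in enumerate(p)), Fraction(0))

def a_form(w, u):
    """Weak form of u' - u'' = f: a(w, u) = int w u' dx + int w' u' dx."""
    return integrate01(mul(w, deriv(u))) + integrate01(mul(deriv(w), deriv(u)))

# Basis functions, also used as weights: x - x^2 and x^2 - x^3
basis = [[0, 1, -1], [0, 0, 1, -1]]
K = [[a_form(wi, uj) for uj in basis] for wi in basis]   # K_ij = a(phi_i, phi_j)
b = [integrate01(wi) for wi in basis]                     # f = 1: b_i = int phi_i dx
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
c1 = (b[0] * K[1][1] - K[0][1] * b[1]) / det
c2 = (K[0][0] * b[1] - K[1][0] * b[0]) / det
print(c1, c2)   # 25/61 10/61

uh = lambda x: float(c1) * (x - x**2) + float(c2) * (x**2 - x**3)
u_exact = lambda x: x - (math.exp(x) - 1.0) / (math.e - 1.0)
print(abs(uh(0.5) - u_exact(0.5)))
```

The exact solution of this test problem is $u = x - (e^x - 1)/(e - 1)$, and the second print shows its difference from the two-term Galerkin approximation at $x = 1/2$.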
Question: Use the Galerkin method to find an approximate solution of the problem in subsection 4.1.1
with the two basis functions $x/L$ and $x^2/L^2$.
Comments: (1) In practice, the Galerkin method starts from the weak form of the PDE. (2) When an equivalent variational form exists, the Rayleigh-Ritz method and the Galerkin method give the same
results.
4.3 The finite element method
The finite element method is a powerful technique for finding approximate solutions to partial
differential equations. The finite element method described here is based on the Galerkin
method. Ultimately, finding an approximate solution to a linear differential equation requires
solving a system of linear equations (usually very large and therefore done on a computer).
Rather than considering a functional, the finite element method addresses the weak
form of the equation directly and is therefore applicable to a wider range of problems.
The weak form for the elastic rod considered in the previous section is
$$\int_0^L EA\frac{dv}{dx}\frac{du}{dx} \, dx = \int_0^L v f \, dx. \tag{4.18}$$
The finite element method represents an approximate solution $u_h$ using low-order polynomials on simple shapes as a basis. The simplest basis in one dimension consists of
‘hat-like’ piecewise linear functions,
[Figure: piecewise linear hat functions on the interval [0, L] with nodes numbered 0 to 7; the hat function φ2 is highlighted.]
The approximate solution is expressed as a linear combination of these simple basis
functions,
$$u_h = \sum_{i=1}^{n} \phi_i(x)\,a_i,$$
where $a_i$ is the approximate solution at the point $x_i$. Using the same basis to represent $v_h$,
$$v_h = \sum_{i=1}^{n} \phi_i(x)\,a_i^*,$$
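we can insert the expressions for $u_h$ and $v_h$ into the weak form,
$$\int_0^L \left(\sum_{i=1}^{n} \frac{d\phi_i}{dx}a_i^*\right) EA\left(\sum_{j=1}^{n} \frac{d\phi_j}{dx}a_j\right) dx = \int_0^L \left(\sum_{i=1}^{n} \phi_i a_i^*\right) f \, dx.$$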
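Since the $a_i^*$ are arbitrary, we can set $a_k^* = 1$ and $a_i^* = 0$ for $i \neq k$, so for each $k$ we have:
$$k = 1: \quad \int_0^L \frac{d\phi_1}{dx} EA\left(\sum_{j=1}^{n} \frac{d\phi_j}{dx}a_j\right) dx = \int_0^L \phi_1 f \, dx,$$
$$k = 2: \quad \int_0^L \frac{d\phi_2}{dx} EA\left(\sum_{j=1}^{n} \frac{d\phi_j}{dx}a_j\right) dx = \int_0^L \phi_2 f \, dx,$$
$$\vdots$$
$$k = n: \quad \int_0^L \frac{d\phi_n}{dx} EA\left(\sum_{j=1}^{n} \frac{d\phi_j}{dx}a_j\right) dx = \int_0^L \phi_n f \, dx.$$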
This can be expressed as a matrix problem,
$$K\boldsymbol{a} = \boldsymbol{b},$$
where
$$K_{ij} = \int_0^L EA\frac{d\phi_i}{dx}\frac{d\phi_j}{dx} \, dx, \qquad b_i = \int_0^L \phi_i f \, dx.$$
The matrix problem can be solved using, for example, LU decomposition.
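The assembly of $K$ and $\boldsymbol{b}$ is normally done element by element. The sketch below is a minimal illustration for the rod of subsection 4.1.1 (fixed at $x = 0$, free at $x = L$), using uniform elements, a two-point Gauss rule for the load integrals and NumPy’s dense solver in place of a hand-written LU decomposition. The element count and the function name `fem_rod` are assumptions made for illustration.

```python
import numpy as np

def fem_rod(EA, L, f, n_el):
    """Linear finite elements for the rod: EA u'' + f = 0, u(0) = 0, u'(L) = 0.

    Returns the node positions and the nodal values (node 0 is fixed).
    """
    h = L / n_el
    nodes = np.linspace(0.0, L, n_el + 1)
    K = np.zeros((n_el + 1, n_el + 1))
    b = np.zeros(n_el + 1)
    # Two-point Gauss rule on the reference interval [-1, 1] (both weights are 1)
    gauss_pts = np.array([-1.0, 1.0]) / np.sqrt(3.0)
    for e in range(n_el):
        dofs = [e, e + 1]
        slopes = np.array([-1.0, 1.0]) / h        # d(phi)/dx of the two local hats
        # K_ij += integral over the element of EA * phi_i' * phi_j'
        K[np.ix_(dofs, dofs)] += EA * np.outer(slopes, slopes) * h
        # b_i += integral over the element of phi_i * f, by Gauss quadrature
        xm = 0.5 * (nodes[e] + nodes[e + 1])
        for xi in gauss_pts:
            phi = np.array([0.5 * (1.0 - xi), 0.5 * (1.0 + xi)])
            b[dofs] += 0.5 * h * f(xm + 0.5 * h * xi) * phi
    # Impose u(0) = 0 by removing the first row and column, then solve
    a = np.linalg.solve(K[1:, 1:], b[1:])
    return nodes, np.concatenate([[0.0], a])

# Hypothetical usage: uniform load on a unit-length rod with eight elements
nodes, u = fem_rod(EA=2.0, L=1.0, f=lambda x: 3.0, n_el=8)
```

With a uniform load the nodal values coincide, up to round-off, with the exact solution $u = (f/EA)(Lx - x^2/2)$, a known property of linear elements for this one-dimensional problem; between nodes the approximation is only piecewise linear.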
The finite element method can be applied to a very wide range of differential equations, in any spatial dimension and for complicated geometries. For two- and three-dimensional problems, basis
functions are usually defined on triangles, quadrilaterals, tetrahedra or hexahedra. For
more accurate results, more ‘elements’ can be used or the polynomial order of the basis
functions can be increased. The error analysis presented for the Rayleigh-Ritz method
can be generalised for the finite element method. Practical aspects of the finite element
method are covered in module 3D7.
Question: Suppose an interval $[0, L]$ is partitioned into uniform cells $x_0 = 0 < x_1 < \ldots < x_i < \ldots < x_n = L$ such that $x_{i+1} - x_i = h$ is constant. The piecewise linear ‘hat-like’ function $\phi_i$ is defined by $\phi_i(x_j) = \delta_{ij}$. Calculate
$$K_{ij} = \int_0^L \frac{d\phi_i}{dx}\frac{d\phi_j}{dx} \, dx.$$
Key Ideas (Points) in Variational Methods
1. Permutation symbol:
$$[\boldsymbol{x}\times\boldsymbol{y}]_i = \varepsilon_{ijk}x_j y_k$$
2. Contracted epsilon identity:
$$\varepsilon_{ijk}\varepsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$$
3. Divergence theorem and Stokes theorem.
4. Stationary points of a functional $J$ that depends on the function $u(x)$ (infinite dimensional). $J$ is usually defined by an integral, but it does not have to be.
5. Given functions $u(x)$ and $v(x)$, define a function of one independent variable, $\epsilon \rightarrow J(u + \epsilon v)$. Directional derivative:
$$DJ(u)[v] = \left.\frac{dJ(u + \epsilon v)}{d\epsilon}\right|_{\epsilon=0}$$
(stationary points satisfy $DJ(u)[v] = 0$ for all $v$).
6. Integration by parts (divergence theorem in higher dimensions).
7. Fundamental Lemma of the Calculus of Variations: If $g(x)$ is a continuous function in $x_0 \leq x \leq x_1$, and if
$$\int_{x_0}^{x_1} g(x)h(x) \, dx = 0,$$
where $h(x)$ is an arbitrary function in the same interval with $h(x_0) = h(x_1) = 0$, then $g(x) = 0$ at every point in the interval $x_0 \leq x \leq x_1$. $\Rightarrow$ Euler–Lagrange equation.
Note that the result still holds if $h(x)$ is also required to be differentiable with $h'(x_0) = h'(x_1) = 0$.
8. Boundary conditions (fixed or free) are worked out separately at the two ends.
9. Extensions:
a) Functional depends on more than one function, for example, on $u(x)$ and $v(x)$.
b) Function(s) depend(s) on more than one independent variable, e.g. $u(x, y)$.
10. Convert constrained problems to unconstrained problems using a Lagrange multiplier $\lambda$.
a) Integral (global) constraint: $\lambda$ is an unknown number.
b) Differential (local) constraint: $\lambda$ is an unknown function.
11. Strong and weak forms of PDEs.
Variational form ⇒ Weak form ⇒ Strong form (Always!)
Strong form ⇒ Weak form ⇒? Variational form (Not always!)
12. Only possible if the bilinear form in the weak form is symmetric, for example,
$\int \nabla u\cdot\nabla v \, dV \Rightarrow \frac{1}{2}\int \nabla u\cdot\nabla u \, dV$, $\quad \int u\cdot v \, dV \Rightarrow \frac{1}{2}\int u\cdot u \, dV$.
13. Numerical methods: look for an approximate solution in the form of a trial function
(finite dimensional)
$$\bar{u} = \sum_{i=1}^{n} c_i\phi_i,$$
where the $\phi_i$ are pre-chosen basis functions and the $c_i$ are $n$ unknowns to be determined.
$\Rightarrow$ an algebraic system: number of unknowns = number of equations.
a) Galerkin method: works on the weak form of the PDE restricted to a
space of finite dimension. It is a method of weighted residuals where weight
function = basis function.
b) Rayleigh-Ritz method: works on the variational form restricted to a space
of finite dimension (only possible for a PDE if its equivalent variational form
exists, and in this case, the Rayleigh-Ritz method and the Galerkin method in
general give the same results).
14. Compare approximate solutions with the exact solution when possible.
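Items 1 and 2 of the list above can be checked by brute force over all index values. The sketch below (an illustration, not from the notes) uses indices 0, 1, 2 rather than 1, 2, 3, a closed-form expression for $\varepsilon_{ijk}$ that is valid for those values, and explicit sums over each repeated index.

```python
import itertools

def eps(i, j, k):
    """Permutation symbol for i, j, k in {0, 1, 2}: +1 cyclic, -1 anticyclic, else 0."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

R = range(3)

# Item 2: eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl (sum over k)
for i, j, l, m in itertools.product(R, R, R, R):
    lhs = sum(eps(i, j, k) * eps(k, l, m) for k in R)
    assert lhs == delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)

# Item 1: [x cross y]_i = eps_ijk x_j y_k (sum over j and k)
def cross(x, y):
    return [sum(eps(i, j, k) * x[j] * y[k] for j in R for k in R) for i in R]

print(cross([1, 0, 0], [0, 1, 0]))   # [0, 0, 1]
```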