Variational Methods
Index Notation and Variational Methods∗
• Index Notation (2 h)
• Example Paper (1 h)
• No data sheet
∗ Full information at “www2.eng.cam.ac.uk/~jl305/4M12/4M12.html” in due course. There is no data sheet for this course. The lecture notes are kindly provided by Dr. Garth Wells.
1 Index notation¹
Suppose x and y are vectors, and A and B are matrices. Write a few common combinations in terms of their components:
• Dot product
\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n=3} x_i y_i \]
• Matrix–vector multiplication
\[ [\mathbf{A}\mathbf{x}]_i = \sum_{j=1}^{n} A_{ij} x_j \]
• Matrix–matrix multiplication
\[ [\mathbf{A}\mathbf{B}]_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \]
We can use a simplified notation by adopting the summation convention (due to Einstein):
Do not write the summation symbol $\sum$. A repeated index implies summation. (An index may not appear more than twice on one side of an equality.)
¹ Index notation is also known as ‘suffix notation’.
4M12 – PAD/JL(jl305) 2
Using the summation convention,
• $\mathbf{x} \cdot \mathbf{y} = x_i y_i$
• $[\mathbf{A}\mathbf{x}]_i = A_{ij} x_j$
• $[\mathbf{A}\mathbf{B}]_{ij} = A_{ik} B_{kj}$
Summary
If an index occurs once, it must occur once in every term of the equation, and the
equation is true for each separate value of this index. If an index appears twice it is
summed over all values. It does not matter what this is called: it is a ‘dummy index’
whose name can be changed at will. If an index appears three or more times in any given
term in an equation, it is wrong!
This may seem a very peculiar trick, with no obvious benefit. However, it will turn out
to be surprisingly powerful, and make many calculations involving vector identities and
vector differential identities much simpler.
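As a rough illustration of the summation convention, the three products above can be written as explicit loops in which every repeated index is summed and every free index labels a component of the result. This is only a sketch in plain Python; the numerical values of x, y, A and B are made up for the example.

```python
# Sketch of the summation convention: a repeated index is summed,
# a free index labels a component of the result.
# All numerical values below are illustrative assumptions.
n = 3
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]
B = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0]]


def dot(x, y):
    # x_i y_i : i appears twice, so it is summed
    return sum(x[i] * y[i] for i in range(n))


def matvec(A, x):
    # [Ax]_i = A_ij x_j : j is summed, i is free
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def matmul(A, B):
    # [AB]_ij = A_ik B_kj : k is summed, i and j are free
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


print(dot(x, y))
print(matvec(A, x))
print(matmul(A, B))
```

Renaming the summed index inside any of these functions (for example j to p in matvec) changes nothing, which is the sense in which it is a ‘dummy index’.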
Two additional pieces of notation are needed. The first is a way to write the identity
matrix I,
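\[ \mathbf{I} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
We define the ‘Kronecker delta’ as
\[ \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \]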
We know that
\[ \mathbf{I}\mathbf{y} = \mathbf{y} \]
and
\[ \delta_{ij} y_j = y_i \]
In other words ‘if one index of δij is summed, the effect is to swap this to the other
index’.
Another necessary ingredient is a way to write the cross product of two vectors in index notation,
\[ \mathbf{x} \times \mathbf{y} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = \left[ x_2 y_3 - x_3 y_2,\; x_3 y_1 - x_1 y_3,\; x_1 y_2 - x_2 y_1 \right] \]
where $\mathbf{e}_i$ are the basis for the vectors. We have assumed that $\mathbf{e}_i$ are the unit vectors for Cartesian coordinates (you may have seen the basis vectors written as i, j and k). To express the cross product in index notation, we will use the permutation symbol $\epsilon_{ijk}$.
The permutation symbol $\epsilon_{ijk}$ is defined as
\[ \epsilon_{ijk} = \begin{cases} 1 & \text{if } (ijk) \text{ is an even permutation of } (1, 2, 3) \\ -1 & \text{if } (ijk) \text{ is an odd permutation of } (1, 2, 3) \\ 0 & \text{otherwise} \end{cases} \]
For example,
\[ \epsilon_{112} = 0 \quad \text{(any repeated index)} \]
The permutation symbol is also known as the ‘alternating symbol’ or the ‘Levi-Civita
symbol’.
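Using the permutation symbol, we can write the cross product of two vectors as:
\[ [\mathbf{x} \times \mathbf{y}]_i = \epsilon_{ijk} x_j y_k \]
To prove this, for each i sum over j and k. The permutation symbol possesses a number of ‘symmetries’,
\[ \epsilon_{ijk} = -\epsilon_{jik} = -\epsilon_{ikj} = -\epsilon_{kji} \quad \text{(switch pair } ij \text{, switch pair } jk \text{, switch pair } ki \text{)} \]
There is an important identity (the ‘contracted epsilon identity’) relating $\epsilon_{ijk}$ and $\delta_{ij}$:
\[ \epsilon_{ijk} \epsilon_{klm} = \delta_{il} \delta_{jm} - \delta_{im} \delta_{jl} \]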
The proof is simple (but somewhat tedious!): just check every case. We sum over k, which leaves four free indices, and each index runs 1 → 3, therefore there are $3^4 = 81$ cases. Here are two examples:
• i = 1, j = 2, l = 1, m = 3
• i = 1, j = 2, l = 1, m = 2
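The case-by-case check is easy to hand to a computer. The sketch below assumes the contracted epsilon identity in its standard form, $\epsilon_{ijk}\epsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$, and loops over all 81 combinations of the free indices. The helper names (eps, delta, check_identity) and the closed-form expression used for the permutation symbol are choices made for this example, not part of the notes.

```python
from itertools import product


def delta(i, j):
    # Kronecker delta
    return 1 if i == j else 0


def eps(i, j, k):
    # Permutation symbol for indices in {1, 2, 3}.
    # The product (i-j)(j-k)(k-i) is +2, -2 or 0, so halving gives +1, -1 or 0.
    return (i - j) * (j - k) * (k - i) // 2


def check_identity():
    """Check eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl for every case."""
    cases = 0
    for i, j, l, m in product((1, 2, 3), repeat=4):
        lhs = sum(eps(i, j, k) * eps(k, l, m) for k in (1, 2, 3))  # sum over k
        rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
        assert lhs == rhs, (i, j, l, m)
        cases += 1
    return cases


print(check_identity(), "cases checked")
```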
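\begin{align*}
[\mathbf{a} \times (\mathbf{b} \times \mathbf{c})]_i &= \epsilon_{ijk} a_j (\mathbf{b} \times \mathbf{c})_k \\
&= \epsilon_{ijk} a_j \epsilon_{klm} b_l c_m \\
&= (\delta_{il} \delta_{jm} - \delta_{im} \delta_{jl}) a_j b_l c_m \\
&= b_i a_m c_m - c_i a_j b_j \\
&= [(\mathbf{a} \cdot \mathbf{c})\, \mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\, \mathbf{c}]_i
\end{align*}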
1.5 A trick: symmetry and anti-symmetry
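We expect that $\mathbf{a} \times \mathbf{a} = \mathbf{0}$.
In index notation,
\[ [\mathbf{a} \times \mathbf{a}]_i = \epsilon_{ijk} a_j a_k \]
\begin{align*}
\epsilon_{ijk} a_j a_k &= \frac{1}{2} (\epsilon_{ijk} a_j a_k + \epsilon_{ijk} a_j a_k) \\
&= \frac{1}{2} (\epsilon_{ijk} a_j a_k + \epsilon_{ikj} a_k a_j) \\
&= \frac{1}{2} (\epsilon_{ijk} a_j a_k - \epsilon_{ijk} a_j a_k) \\
&= 0,
\end{align*}
as expected.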
Symmetry and anti-symmetry are often useful to simplify expressions. The permutation symbol is anti-symmetric in any two of its indices. Some other symmetric expressions include
• The partial second derivative of a scalar function φ, $\dfrac{\partial^2 \phi}{\partial x_i \partial x_j}$
1.6 Vector derivatives
The real power of index notation is revealed when we look at vector differential identities.
The vector derivatives known as the gradient, the divergence and the curl can all be
written in terms of the operator ∇,
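\[ \nabla = \left( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \frac{\partial}{\partial x_3} \right) \]
Gradient
\[ \text{grad } \phi: \quad [\nabla \phi]_i = \frac{\partial \phi}{\partial x_i} = \left( \frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \frac{\partial \phi}{\partial x_3} \right)_i \]
Divergence
\[ \text{div } \mathbf{u}: \quad \nabla \cdot \mathbf{u} = \frac{\partial u_i}{\partial x_i} \]
Laplace
\[ \nabla^2 \phi = \nabla \cdot (\nabla \phi) = \frac{\partial^2 \phi}{\partial x_i \partial x_i} = \frac{\partial^2 \phi}{\partial x_1^2} + \frac{\partial^2 \phi}{\partial x_2^2} + \frac{\partial^2 \phi}{\partial x_3^2} \]
Curl
\[ \text{curl } \mathbf{u}: \quad [\nabla \times \mathbf{u}]_i = \epsilon_{ijk} \frac{\partial u_k}{\partial x_j} \]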
The machinery we have developed for index notation can be used directly to manipulate quantities without having to be constantly thinking about the complicated physical meanings of div or curl, for example.
Example: Product rule for div and curl
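Divergence
\begin{align*}
\nabla \cdot (\phi \mathbf{u}) &= \frac{\partial}{\partial x_i} (\phi u_i) \\
&= \frac{\partial \phi}{\partial x_i} u_i + \phi \frac{\partial u_i}{\partial x_i} \quad \text{(product rule for differentiation)} \\
&= \mathbf{u} \cdot \nabla \phi + \phi \nabla \cdot \mathbf{u}
\end{align*}
Curl
\begin{align*}
[\nabla \times (\phi \mathbf{u})]_i &= \epsilon_{ijk} \frac{\partial}{\partial x_j} (\phi u_k) \\
&= \epsilon_{ijk} \left( \frac{\partial \phi}{\partial x_j} u_k + \phi \frac{\partial u_k}{\partial x_j} \right) \\
&= [\nabla \phi \times \mathbf{u} + \phi \nabla \times \mathbf{u}]_i
\end{align*}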
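\begin{align*}
[\nabla \times (\nabla \phi)]_i &= \epsilon_{ijk} \frac{\partial}{\partial x_j} \frac{\partial \phi}{\partial x_k} \\
&= \epsilon_{ijk} \frac{\partial^2 \phi}{\partial x_j \partial x_k} \\
&= 0 \quad \text{(by symmetry and anti-symmetry in } j, k \text{)}
\end{align*}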
If the position vector x itself appears in a vector expression which is being manipulated
in index notation, then it is useful to notice that
\[ \frac{\partial x_i}{\partial x_j} = \delta_{ij}, \]
which follows directly from the definition of a partial derivative:
\[ \frac{\partial x_1}{\partial x_1} = 1, \quad \frac{\partial x_1}{\partial x_2} = 0, \quad \text{etc.} \]
So for example:
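• $\nabla \cdot \mathbf{x} = \dfrac{\partial x_i}{\partial x_i} = \delta_{ii} = 3$
• $[\nabla \times \mathbf{x}]_i = \epsilon_{ijk} \dfrac{\partial x_k}{\partial x_j} = \epsilon_{ijk} \delta_{jk} = 0$ (by symmetry/anti-symmetry in j, k)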
Sometimes the magnitude (or modulus) of a vector appears in an expression which you
wish to differentiate in some way. The trick here is to force it into the form of a scalar
product, even though this may make the expression look more complicated than when
you started. Index notation allows scalar products to be handled easily, and the normal
rules of calculus will allow the calculation to be done without difficulty.
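\begin{align*}
[\nabla |\mathbf{x}|]_i &= \frac{\partial}{\partial x_i} (x_j x_j)^{1/2} \\
&= \frac{1}{2} (x_k x_k)^{-1/2} \, 2 x_j \frac{\partial x_j}{\partial x_i}
\end{align*}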
There is a danger of j appearing four times in the same expression which is not permitted
and would indicate an error. Recall that j was a dummy index, summed over all values.
So in one of the two expressions, j has been replaced by k (or any other index you like,
apart from i or j). Continuing:
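\begin{align*}
\frac{1}{2} (x_k x_k)^{-1/2} \, 2 x_j \frac{\partial x_j}{\partial x_i} &= (x_k x_k)^{-1/2} x_j \delta_{ij} \\
&= (x_k x_k)^{-1/2} x_i
\end{align*}
or in vector notation,
\[ \nabla |\mathbf{x}| = \frac{\mathbf{x}}{|\mathbf{x}|}, \]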
which is just the unit vector in the direction of x.
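A quick numerical sanity check of this result is to compare a central finite-difference approximation of each component of ∇|x| against x/|x|. The sketch below does this in plain Python; the test point and the step size h are arbitrary choices made for the illustration.

```python
import math


def norm(x):
    # |x| = (x_j x_j)^(1/2)
    return math.sqrt(sum(c * c for c in x))


def grad_norm_fd(x, h=1e-6):
    """Central-difference approximation to the gradient of |x|."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((norm(x_plus) - norm(x_minus)) / (2.0 * h))
    return grad


x = [1.0, -2.0, 2.0]                 # illustrative point, |x| = 3
exact = [c / norm(x) for c in x]     # x / |x|
approx = grad_norm_fd(x)
print(exact)
print(approx)
```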
Index notation allows the divergence theorem and Stokes’s theorem to be written in
a way which makes them look familiar. Recall this result for ordinary integration and
differentiation of a function f (x) on the interval (a, b):
\[ \int_a^b \frac{df}{dx} \, dx = f \big|_a^b = f(b) - f(a) \]
In other words, the integral of the derivative of a function is equal to the function, and
can be evaluated from the value of the function on the boundaries of the integration
interval. This general description will turn out to apply just as well to volume integrals
(via the Divergence Theorem) and to surface integrals (via Stokes’s theorem).
Two-dimensional integration.
Hence Divergence theorem in 2D
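\[ \int_\Omega \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right) d\Omega = \oint_\Gamma (u \, dy - v \, dx) = \oint_\Gamma \mathbf{u} \cdot \mathbf{n} \, d\Gamma \]
In index notation, for a volume V with bounding surface S,
\[ \int_V \frac{\partial f_i}{\partial x_i} \, dV = \int_S f_i n_i \, dS. \]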
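Now suppose that we have a scalar function φ. We can create a vector function f by multiplying it by a vector a,
\[ \mathbf{f} = \phi \mathbf{a}, \]
where a is any fixed vector (it does not depend on x). Substituting this into the divergence theorem,
\[ \int_V \frac{\partial (\phi a_i)}{\partial x_i} \, dV = \int_S \phi a_i n_i \, dS \]
and therefore, since a is fixed and arbitrary,
\[ \int_V \frac{\partial \phi}{\partial x_i} \, dV = \int_S \phi n_i \, dS. \]
More generally,
\[ \int_V \frac{\partial (\star)}{\partial x_i} \, dV = \int_S (\star) \, n_i \, dS, \]
where $(\star)$ is any index notation expression. We could, for example, insert
\[ \epsilon_{jik} f_k \]
and deduce a ‘curl theorem’ (check it). We now have a nice generalisation for relating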
volume integrals of vector derivatives to surface integrals. The integral of any ‘regular’
function over a volume can be transformed into a surface integral. When in doubt, one
can always return to index notation to check and develop the necessary relationships.
Transforming surface integrals into volume integrals is the key to deriving many equations used in physics and mechanics which come from conservation laws. We can start by posing a balance in terms of what is happening on the boundary, and then transform the expression of conservation into a differential equation.
[Figure: surface element dS with unit normal n, bounded by a curve C with tangent s]
As for the divergence theorem, we can write Stokes’s theorem using indices,
\[ \int_S \epsilon_{ijk} \frac{\partial f_k}{\partial x_j} n_i \, dS = \oint_C f_k s_k \, dC \]
and we can use the trick with the fixed vector to get
\[ \int_S \epsilon_{ijk} \frac{\partial (\star)}{\partial x_j} n_i \, dS = \oint_C (\star) \, s_k \, dC. \]
The expression can be manipulated further using another expression which involves another $\epsilon$ symbol, and then using the $\epsilon$-$\delta$ identity to push the $\epsilon$ to the other side of the equation. However, it is not possible to eliminate the $\epsilon$ entirely, therefore Stokes’s theorem always appears to be more complicated than the divergence theorem.
1.10 Directional derivative
We will make frequent use of the directional derivative when working with variational
problems. The directional derivative of a function f (u) is defined as
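\[ Df(\mathbf{u})[\mathbf{v}] = \left. \frac{d f(\mathbf{u} + \epsilon \mathbf{v})}{d\epsilon} \right|_{\epsilon = 0} \]
For example, for $f(\mathbf{u}) = \mathbf{u} \cdot \mathbf{u}$,
\begin{align*}
Df(\mathbf{u})[\mathbf{v}] &= \left. \frac{d}{d\epsilon} \big[ (\mathbf{u} + \epsilon \mathbf{v}) \cdot (\mathbf{u} + \epsilon \mathbf{v}) \big] \right|_{\epsilon = 0} \\
&= 2 \left( (\mathbf{u} + \epsilon \mathbf{v}) \cdot \mathbf{v} \right) \big|_{\epsilon = 0} \\
&= 2 \mathbf{u} \cdot \mathbf{v}
\end{align*}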
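The definition can be checked numerically by perturbing u a small distance along v and differencing. The sketch below assumes the function f(u) = u · u, which is consistent with the result 2u · v, and approximates the derivative with respect to the perturbation size by a central difference; the vectors u and v are arbitrary illustrative values.

```python
def f(u):
    # f(u) = u . u (assumed example function)
    return sum(c * c for c in u)


def directional_derivative(func, u, v, eps=1e-6):
    """Central-difference approximation of d/d(eps) func(u + eps*v) at eps = 0."""
    u_plus = [a + eps * b for a, b in zip(u, v)]
    u_minus = [a - eps * b for a, b in zip(u, v)]
    return (func(u_plus) - func(u_minus)) / (2.0 * eps)


u = [1.0, 2.0, 3.0]    # illustrative values
v = [0.5, -1.0, 2.0]
numerical = directional_derivative(f, u, v)
exact = 2.0 * sum(a * b for a, b in zip(u, v))   # 2 u . v
print(numerical, exact)
```

The same helper works unchanged for any other scalar function of a list of numbers, which makes it a convenient way to check hand-derived directional derivatives.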
1.11 Tensors
The manipulation of objects using indices has been presented on the premise of a few
basic rules. However, much of what has been presented carries over to the richer field of
tensors. Some basic tensor operations will be considered, but only scratching the surface
so that we can tackle more problems of practical relevance².
We are already familiar with the idea of vectors. However, you might find it hard to give a formal definition of a vector, although you have been exposed to aspects of the formal definition. For example, vectors are often expressed as
\[ \mathbf{a} = a_1 \mathbf{i} + a_2 \mathbf{j} + a_3 \mathbf{k}, \]
or using modern notation as
\[ \mathbf{a} = a_i \mathbf{e}_i = a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2 + a_3 \mathbf{e}_3. \]
² Basic consideration of tensors is new in 2011.
The mathematical definition is in two parts. First, a vector a is a quantity which, with
respect to a particular Cartesian coordinate system, has three components (a1 , a2 , a3 ),
or equivalently ai , i = 1, 2, 3. But that is not the whole story. We also need to express
the fact that if we choose to rotate our coordinate axes, the vector (the ‘arrow’ which
you might draw in a diagram) remains the same. This means that a vector also has an
associated basis. In the above example, the basis is ei . We may change the components
and the basis of a vector such that the vector remains the same (it still points in the
same direction and still has the same length).
A second-order (or second-rank) tensor is like a matrix but has a basis associated, and
can be expressed using the notation
A = Aij ei ⊗ ej .
Luckily, when working with a fixed orthonormal basis, we can work with Aij , usually
manipulating it just as we would a matrix.
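You have in fact considered second-order tensors in Part IA without being told (and most likely without the lecturer realising). Recall that a ‘matrix’ can be rotated into a new coordinate system via
\[ A'_{ij} = R_{ik} A_{kl} R_{jl} \]
The gradient of a vector field a is a second-order tensor,
\[ \nabla \mathbf{a} = \nabla \otimes \mathbf{a} = \frac{\partial a_i}{\partial x_j} \, \mathbf{e}_i \otimes \mathbf{e}_j. \]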
For our purposes, it suffices to just recall [∇a]ij = ∂ai /∂xj . Just to make things more
confusing, in fluid mechanics, the gradient is often defined as the transpose of what is
defined here.
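Another important operation is the divergence of a second-order tensor. It leads to a vector:
\[ \nabla \cdot \mathbf{A} = \frac{\partial A_{ij}}{\partial x_j} \, \mathbf{e}_i \]
If we combine the gradient and the divergence operators, we obtain the Laplace operator for a vector field a:
\[ \nabla^2 \mathbf{a} = \nabla \cdot (\nabla \mathbf{a}) = \left( \nabla^2 a_1, \nabla^2 a_2, \nabla^2 a_3 \right) \]
Similar to the gradient, it suffices for our purposes to recall $[\nabla \cdot \mathbf{A}]_i = \partial A_{ij} / \partial x_j$. Just as for a vector, the divergence theorem can be applied to a second order tensor:
\[ \int_V \nabla \cdot \mathbf{A} \, dV = \int_{\partial V} \mathbf{A} \mathbf{n} \, dS \]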
Tensors, in combination with balance laws, can be used to systematically derive many
differential equations of importance in physics and engineering.
Example: Equilibrium equation of a body with mass density ρ: If the force per unit area acting on a surface is denoted by the vector t (the traction vector), equilibrium (forces sum to zero) requires that
\[ \int_{\partial V} \mathbf{t} \, dS + \int_V \rho \mathbf{g} \, dV = \mathbf{0}. \]
The stress (a second order tensor) at a point satisfies by definition t = σn, where n is the outward unit normal vector to a surface. Using indices, $t_i = \sigma_{ij} n_j$. Inserting this expression for the traction,
\[ \int_{\partial V} \boldsymbol{\sigma} \mathbf{n} \, dS + \int_V \rho \mathbf{g} \, dV = \mathbf{0}. \]
Applying the divergence theorem,
\[ \int_V \nabla \cdot \boldsymbol{\sigma} \, dV + \int_V \rho \mathbf{g} \, dV = \mathbf{0}. \]
Since equilibrium must apply to any sub-body of V, the equation can be localised,
\[ \nabla \cdot \boldsymbol{\sigma} + \rho \mathbf{g} = \mathbf{0}. \]
This is balance of linear momentum. It can be shown that angular momentum balance is satisfied if $\boldsymbol{\sigma} = \boldsymbol{\sigma}^T$.
2 Variational methods
The variational form of a differential equation is an alternative way of expressing the same
problem. The variational view, and the associated machinery of variational methods and
functional analysis are at the heart of the modern study of partial differential equations
and provide the basis for a variety of numerical solution procedures, like the finite element
method.
We will see that classical variational methods involve the minimisation of a functional,
although many of the concepts of variational methods extend beyond this classical per-
spective.
Along with integration by parts, the fundamental lemma of the calculus of variations is essential to our development of variational methods. The lemma says the following: if g(x) is a continuous function in the interval $x_0 \le x \le x_1$ and
\[ \int_{x_0}^{x_1} g(x) h(x) \, dx = 0, \]
where h(x) is an arbitrary differentiable function in the same interval with $h(x_0) = h(x_1) = 0$, then g(x) = 0 at every point in the interval $x_0 \le x \le x_1$.
Proof The proof by contradiction is straightforward and hinges on the fact that the function h(x) is arbitrary. Let us suppose that the function g(x) is nonzero and positive in some subinterval $a \le x \le b$ within the interval $x_0 \le x \le x_1$. Because h(x) is arbitrary, we can set it to be any function of our choice, such as
\[ h(x) = \begin{cases} 0, & x < a \\ (x - a)^2 (b - x)^2, & a \le x \le b \\ 0, & x > b \end{cases} \]
[Figure: plot of $h(x) = (x - a)^2 (b - x)^2$ on the subinterval $a \le x \le b$]
Observe that h(x) is differentiable and satisfies the required conditions at the end points. Noting that h(x) > 0 for a < x < b, the integral
\[ \int_{x_0}^{x_1} g(x) h(x) \, dx = \int_a^b g(x) (x - a)^2 (b - x)^2 \, dx > 0 \]
is positive because the integrand is positive throughout the subinterval except at x = a and x = b, where it is zero. Thus, the lemma is proven by contradiction, as the only way for the integral over the entire interval to be zero is if the function g(x) = 0 everywhere in the interval. Note that, from the construction, $h'(x_0) = h'(x_1) = 0$.
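Say we have a functional J that depends on the function u(x). We will usually want to find the function u that minimises J (sometimes we will be satisfied with stationary points). The problem is stated as:
\[ \min_u J(u). \]
At a minimum (or any other stationary point), the directional derivative of J vanishes for every direction v:
\[ DJ(u)[v] = \left. \frac{dJ(u + \epsilon v)}{d\epsilon} \right|_{\epsilon = 0} = 0 \]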
Recall that the directional derivative is the ‘change’ in J if we move a small distance from
u in the direction of v (hence the name ‘variational methods’).
For simple problems, we can apply (partial) differentiation directly without going
through the formalities of the directional derivative.
The precise definition of J depends on the problem considered. We will address a
number of different examples in the following.
2.3.1 Refraction
Refraction describes the path of a wave or light as it passes from one medium to another
when the two mediums have different wave speeds.
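[Figure: a ray travels from P1 in medium 1 (wave speed c1) to P2 in medium 2 (wave speed c2), with c1 > c2, crossing the interface at position x; the path makes angles θ1 and θ2 in the two media, and L1, L2, a and b are the dimensions marked on the sketch]
which leads to
\[ \frac{\sin \theta_1}{\sin \theta_2} = \frac{c_1}{c_2} \]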
We have assumed that the path of the wave is straight in each medium. A more general
problem would also involve the path being unknown.
This is an example of a variational method. We have formulated a ‘functional’ T (equation (2.1)) and minimised it.
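The refraction result can be reproduced numerically by minimising a travel time directly. Since equation (2.1) is not reproduced here, the sketch below assumes a simple geometry of its own: P1 lies a perpendicular distance L1 from the interface, P2 a distance L2 on the other side, the two points are a distance d apart measured along the interface, and the ray crosses the interface at x. The function names, the ternary search and all numerical values are assumptions made for the illustration.

```python
import math


def travel_time(x, L1, L2, d, c1, c2):
    # Straight path in each medium: time = length / speed (assumed geometry)
    return math.hypot(L1, x) / c1 + math.hypot(L2, d - x) / c2


def minimise(func, lo, hi, iters=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Illustrative values only
L1, L2, d, c1, c2 = 1.0, 1.0, 2.0, 2.0, 1.0
x_min = minimise(lambda x: travel_time(x, L1, L2, d, c1, c2), 0.0, d)

# Angles measured from the normal to the interface
sin_theta1 = x_min / math.hypot(L1, x_min)
sin_theta2 = (d - x_min) / math.hypot(L2, d - x_min)
print(sin_theta1 / sin_theta2, c1 / c2)
```

With these values the ratio of sines at the minimising x should agree with c1/c2 to within the accuracy of the search.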
Suppose a body is acted on by a set of forces $\mathbf{f}_i$ that can all be described via potential functions $V_i$ (i.e. $\mathbf{f}_i = \nabla V_i$; examples include springs, gravity and electrostatics). At equilibrium, the forces must sum to zero
\[ \sum_i \mathbf{f}_i = \sum_i \nabla V_i = \nabla \sum_i V_i = \mathbf{0} \]
This result says that for equilibrium, the total potential energy $V = \sum_i V_i$ must be stationary (minimum, maximum or inflection). We can explore this for two examples.
[Figure: a mass m resting on a curve of height f(x)]
The height h = f(x) and the gravitational potential V = mgh. The equilibrium points, marked with a cross, are stationary points of the potential energy (dV/dx = 0). Intuitively, only the minima are stable points.
[Figure: a mass m attached to a spring of stiffness k]
Potential energies (x: movement of mass downwards):
• Potential energy in the spring $V_s = \frac{1}{2} k x^2$
We consider now a continuous system and show the link to the principle of virtual work. Consider a stretched elastic string (with tension P) between two end points (x = 0, x = a) to which a distributed mass m(x) per unit length is applied. We seek the displaced shape y(x) of the string at equilibrium.
[Figure: string stretched between x = 0 and x = a under tension P, with deflection y(x) and distributed load g m(x)]
The first step is to find the potential energy stored in the displaced string. Consider a
small element,
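[Figure: a small element of the string between x and x + dx, of stretched length δl, with end deflections y(x) and y(x + dx) ≈ y(x) + δx y′(x)]
\[ \delta l^2 \approx \delta x^2 + (\delta x \, y')^2 \]
Therefore
\[ \frac{\delta l - \delta x}{\delta x} \approx \sqrt{1 + y'^2} - 1 \approx \frac{1}{2} y'^2 \]
(binomial approximation, $(1 + z)^\alpha \approx 1 + \alpha z$ for small z). Hence the potential energy stored (elastic stored energy) in the string is
\[ V_s = \frac{P}{2} \int_0^a y'^2 \, dx. \]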
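This is positive because the strain acts against P. The work done by gravity in displacing the string downwards is given by
\[ V_g = -g \int_0^a m y \, dx. \]
Taking the directional derivative of the total potential energy $V = V_s + V_g$,
\begin{align*}
DV(y)[z] = \left. \frac{dV(y + \epsilon z)}{d\epsilon} \right|_{\epsilon = 0} &= \left. \frac{d}{d\epsilon} \int_0^a \left[ \frac{P}{2} (y' + \epsilon z')^2 - g m (y + \epsilon z) \right] dx \right|_{\epsilon = 0} \\
&= \int_0^a (P y' z' - g m z) \, dx \tag{2.2}
\end{align*}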
To minimise the potential energy, we need to find a y for which DV(y)[z] = 0 for all z. In the variational methods literature (and in this course in the past), the directional derivative of V is sometimes expressed as δV, and z (which is arbitrary) is replaced by δy,
\[ \delta V = \int_0^a (P y' \, \delta y' - g m \, \delta y) \, dx. \]
This is the virtual work equation.
We can manipulate equation (2.2) into another form using integration by parts,
\begin{align*}
DV(y)[z] &= \int_0^a (P y' z' - g m z) \, dx \\
&= -\int_0^a (P y'' + g m) \, z \, dx + P y' z \big|_0^a. \tag{2.3}
\end{align*}
Since z is arbitrary, we can choose it such that z(0) = z(a) = 0, then
\[ -\int_0^a (P y'' + g m) \, z \, dx = 0, \]
which can only be satisfied (from the fundamental lemma of the calculus of variations) if
\[ P y'' + g m = 0. \]
This is a key point in the ‘calculus of variations’. Minimisation of the potential energy
functional has implied the solution of a differential equation. We could have derived the
differential equation directly by considering the force balance in the string,
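[Figure: force balance on a string element of length dx, showing the tension P, the slope dy/dx and the load mg dx]
The boundary term $P y' z \big|_0^a$ in equation (2.3) must also vanish, which requires one of the following at each end:
1. z = δy = 0 at x = 0, x = a
2. y′ = 0 at x = 0, x = a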
Summary of key points
1. The variational method automatically yields the governing differential equation and
suitable boundary conditions.
2. Provided you can write down the energy functional (the energy expression), the
rest is a ‘handle cranking’ exercise.
3. Integration by parts plays a central role. For a two- or three-dimensional body, the
divergence theorem or Stokes’s theorem is used.
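To see the end product of the string example in numbers, the differential equation P y″ + g m = 0 with fixed ends y(0) = y(a) = 0 can be solved by finite differences and compared with the parabola y = g m x (a − x) / (2P), which satisfies the same equation and end conditions when m is uniform. The sketch below assumes a uniform m; the grid size, parameter values and function names are choices made for this example.

```python
def solve_string(P, g, m, a, n):
    """Finite-difference solution of P*y'' + g*m = 0 with y(0) = y(a) = 0.

    Uses n equal intervals and assumes a uniform mass per unit length m.
    Returns the grid points and the deflection at each grid point.
    """
    h = a / n
    size = n - 1                          # number of interior unknowns
    # Interior equations: -y[i-1] + 2*y[i] - y[i+1] = h^2 * g * m / P
    diag = [2.0] * size
    rhs = [h * h * g * m / P] * size
    # Thomas algorithm (the off-diagonal entries are all -1)
    for i in range(1, size):
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]
    y = [0.0] * size
    y[-1] = rhs[-1] / diag[-1]
    for i in range(size - 2, -1, -1):
        y[i] = (rhs[i] + y[i + 1]) / diag[i]
    xs = [i * h for i in range(n + 1)]
    return xs, [0.0] + y + [0.0]


def exact_string(x, P, g, m, a):
    # Parabola satisfying P*y'' + g*m = 0 and y(0) = y(a) = 0
    return g * m * x * (a - x) / (2.0 * P)


P, g, m, a = 2.0, 9.81, 0.1, 1.0          # illustrative values
xs, ys = solve_string(P, g, m, a, n=10)
max_error = max(abs(y - exact_string(x, P, g, m, a)) for x, y in zip(xs, ys))
print(max_error)
```

Because the exact solution is a quadratic, the central-difference scheme reproduces it up to rounding error even on a coarse grid.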
We present now the methodology for a class of problems in an abstract framework for problems with only one independent variable. Suppose we wish to minimise
\[ I = \int_0^L F(y, y', x) \, dx, \]
where y = y(x) and y′ = dy/dx, then we need to compute the directional derivative and set it to zero. (Note that y is ultimately dependent on x, hence it is a one-dimensional problem.) The directional derivative is
\[ DI(y, y', x)[z] = \left. \frac{d}{d\epsilon} \int_0^L F(y + \epsilon z, y' + \epsilon z', x) \, dx \right|_{\epsilon = 0} \]
\[ DI(y, y', x)[z] = \int_0^L \left( \frac{\partial F(y, y', x)}{\partial y} z + \frac{\partial F(y, y', x)}{\partial y'} z' \right) dx \]
(Hint: Set $a = y + \epsilon z$ and $a' = y' + \epsilon z'$ and apply the chain rule for differentiation.)
Using integration by parts,
\[ DI(y, y', x)[z] = \int_0^L \left( \frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} \right) z \, dx + \left. \frac{\partial F}{\partial y'} z \right|_0^L \]
Following the same arguments used in the previous section (using a convenient function
for z), at stationary points of I it is required that
\[ \frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} = 0, \quad \text{for } 0 < x < L, \tag{2.4} \]
which is known as the Euler–Lagrange equation.
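As a short worked check, equation (2.4) can be applied to the stretched string considered earlier. Reading the integrand off the string's potential energy (with L = a), and assuming P is constant, the Euler–Lagrange equation reproduces the string equation obtained from the directional derivative:

```latex
\begin{align*}
F(y, y', x) &= \frac{P}{2} y'^2 - g m y \\
\frac{\partial F}{\partial y} &= -g m, \qquad \frac{\partial F}{\partial y'} = P y' \\
\frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} &= -g m - P y'' = 0
\quad \Longrightarrow \quad P y'' + g m = 0
\end{align*}
```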
As before, we can also extract some boundary condition information since the boundary term must vanish, so we have
\[ \frac{\partial F}{\partial y'} z = 0 \quad \text{at } x = 0 \text{ and } x = L. \]
Special cases
There are some special cases that are interesting to examine.
In this case, ∂F/∂x = 0. Therefore, from (2.6)
\[ \frac{d}{dx} \left( F - y' \frac{\partial F}{\partial y'} \right) = 0, \]
hence
\[ F - y' \frac{\partial F}{\partial y'} = k, \quad \text{where } k \text{ is constant.} \tag{2.8} \]
In this course we will emphasise a first principles approach rather than the limited case presented in this section since we want to address problems in different dimensions. Examples in one dimension can often make direct use of equation (2.4), which is what we will do in this section.
Find the shortest line joining the points (0, 0) and (a, b):
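[Figure: a curve joining (0, 0) and (a, b), with arc-length element ds]
\[ \frac{\partial F}{\partial y'} = \frac{1}{2} \left( 1 + y'^2 \right)^{-1/2} 2 y' = k \tag{2.9} \]
\[ y'^2 = k^2 \left( 1 + y'^2 \right), \tag{2.10} \]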
Arclength in cylindrical and spherical coordinate systems The two most important coordinate systems (besides Cartesian) in three dimensions are the spherical and the cylindrical coordinate systems. The following figure and equations show the geometrical meaning of the variables and the appearance of the volume elements.
In cylindrical and spherical coordinates, the arc length element ds is a space diagonal of the volume element. In cylindrical coordinates, the sides of the volume element are dr, r dθ, dz, so the arc length element is given by
\[ ds = \sqrt{dr^2 + r^2 d\theta^2 + dz^2}, \]
In spherical coordinates, the sides of the volume element are dr, r dθ, r sin θ dφ, so the arc length element is given by
\[ ds = \sqrt{dr^2 + r^2 d\theta^2 + r^2 \sin^2 \theta \, d\phi^2}. \]
The shape assumed by a soap film minimises the total surface energy. Since surface
energy is given by the surface tension multiplied by the area, the shape that minimises
the surface energy is simply the one that minimises the area. The result is not as
immediately obvious as the previous example.
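[Figure: soap film of radius r(z) about the z-axis, spanning two rings of radius a]
Area of the soap ring from z to z + dz is:
\[ dA = 2\pi r(z) \sqrt{dz^2 + dr^2} = 2\pi r \left( 1 + r'^2 \right)^{1/2} dz \]
Therefore
\[ A = 2\pi \int_{-b}^{b} r \left( 1 + r'^2 \right)^{1/2} dz \]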
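The expression inside the integral does not depend on z, so we can use the special case in equation (2.8) directly,
\[ 2\pi r \left( 1 + r'^2 \right)^{1/2} - r' \, 2\pi r \left( 1 + r'^2 \right)^{-1/2} r' = 2\pi k \]
\[ r^2 = k^2 \left( 1 + r'^2 \right) \]
\[ r' = \frac{\sqrt{r^2 - k^2}}{k} \]
Recall that r′ = dr/dz, so we can move all r-terms to one side of the equality and all z-terms to the other and integrate,
\[ \int \frac{dr}{\sqrt{r^2 - k^2}} = \int \frac{dz}{k} + C. \]
Therefore, after integrating both sides,
\[ \frac{z}{k} = \cosh^{-1} \left( \frac{r}{k} \right) - K \quad \text{(from the maths data book)} \]
By symmetry about z = 0 we have K = 0, so that r = k cosh(z/k), and applying r = a at z = b gives
\[ \frac{a}{k} = \cosh \left( \frac{b}{k} \right). \]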
The same procedures can be extended to the minimisation of a functional over a surface
or a volume. Consider the functional
\[ I = \frac{P}{2} \int_V \nabla w \cdot \nabla w \, dV - \int_V f w \, dV, \]
where P is the tension in a membrane, w (x, y ) is the deflection and f (x, y ) is the applied
force per unit area. We have denoted the surface by V , and will denote its boundary
by S. The process will generalise to three dimensions.
We wish to minimise the functional I. Following our standard process, we take the
directional derivative of I,
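\begin{align*}
\left. \frac{dI(w + \epsilon v)}{d\epsilon} \right|_{\epsilon = 0} &= \left. \frac{d}{d\epsilon} \left[ \frac{P}{2} \int_V \nabla (w + \epsilon v) \cdot \nabla (w + \epsilon v) \, dV - \int_V f (w + \epsilon v) \, dV \right] \right|_{\epsilon = 0} \\
&= P \int_V \nabla w \cdot \nabla v \, dV - \int_V f v \, dV
\end{align*}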
Now, when I is a minimum,
\[ P \int_V \nabla w \cdot \nabla v \, dV - \int_V f v \, dV = 0 \quad \text{for all } v \]
This expression can be manipulated into a more familiar form by applying integration by
parts to remove derivatives of v ,
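\[ P \int_V \nabla w \cdot \nabla v \, dV - \int_V f v \, dV = -P \int_V \nabla^2 w \, v \, dV - \int_V f v \, dV + P \int_S \nabla w \cdot \mathbf{n} \, v \, dS \]
Following the usual arguments, we can deduce that the deflection satisfies the differential equation
\[ P \nabla^2 w + f = 0 \]
together with the boundary condition (where w is not prescribed)
\[ \nabla w \cdot \mathbf{n} = 0. \]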
While this problem was introduced as a membrane problem, we have worked through it in a fashion which is independent of the dimension. The final equations are therefore equally valid in one, two, and three dimensions. This functional is very common in engineering and physics. It describes deflection of a membrane, heat conduction and ground water flow, amongst other things.
\[ F(x, y) = x^2 + y^2 \tag{2.12} \]
[Figure: the line y = −2x + 3 in the (x, y) plane, with the expected answer marked at the point on the line closest to the origin]
In other words, we wish to find the shortest distance from the origin to a point on the line given by G = 0.
We require F to be stationary as we move along the line G = 0. In other words, at the stationary point the rate of change of F in the direction of G = 0 must be zero. Recalling that the vector ∇G is perpendicular to the line G = 0, this is the same as
\[ \nabla F = 0 \quad \text{in the direction} \perp \nabla G \]
or
\[ \nabla F \parallel \nabla G \quad \text{i.e. } \nabla F = -\lambda \nabla G \text{ for some value } \lambda, \]
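Find stationary points of $H = F + \lambda G = x^2 + y^2 + \lambda (y + 2x - 3)$
\[ \frac{\partial H}{\partial x} = 0 \;\rightarrow\; 2x + 2\lambda = 0 \]
\[ \frac{\partial H}{\partial y} = 0 \;\rightarrow\; 2y + \lambda = 0 \]
\[ \frac{\partial H}{\partial \lambda} = 0 \;\rightarrow\; y + 2x - 3 = 0 \]
\[ 2x - 4y = 0 \;\rightarrow\; x = 2y \]
\[ y = \frac{3}{5}, \quad x = 2y = \frac{6}{5}. \]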
The same approach can be applied to problems that involve integration. Note that taking
the directional derivative with respect to λ simply recovers the constraint equation.
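The three stationarity conditions of the line example form a small linear system in x, y and λ, so the hand calculation can be checked by elimination. The sketch below uses exact rational arithmetic from the standard library; the solver is a generic Gauss-Jordan routine written for this example rather than anything prescribed by the notes.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from every other row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Unknowns ordered as (x, y, lambda):
#   2x      + 2*lambda = 0
#        2y +   lambda = 0
#   2x +  y            = 3
matrix = [[2, 0, 2],
          [0, 2, 1],
          [2, 1, 0]]
rhs = [0, 0, 3]
x, y, lam = solve_linear(matrix, rhs)
print(x, y, lam)
```

The multiplier comes out as λ = −6/5, which is consistent with ∇F = −λ∇G at the point (6/5, 3/5).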
We consider now a continuous problem. We wish to find the shape that a flexible chain with uniform mass density will make when hanging from supports at x = −a and x = a. The chain has a length of 2L, where a < L.
[Figure: chain y(x) hanging between supports at (−a, 0) and (a, 0)]
The potential energy is
\[ I = -\rho g \int y \, ds = -\rho g \int_{-a}^{a} y \left( 1 + y'^2 \right)^{1/2} dx, \]
So we introduce a Lagrange multiplier and try to find the function y that makes
\[ K = \int_{-a}^{a} \left[ (y + \lambda) \left( 1 + y'^2 \right)^{1/2} - \frac{2 L \lambda}{2a} \right] dx \]
stationary.
Notice that when computing directional derivatives with respect to y or y 0 , the 2L will
vanish. Therefore we will get the same answer if we seek to minimise
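\[ K = \int_{-a}^{a} (y + \lambda) \left( 1 + y'^2 \right)^{1/2} dx. \]
The integrand does not depend explicitly on x, so the special case in equation (2.8) applies:
\[ (y + \lambda) \left( 1 + y'^2 \right)^{1/2} - y' (y + \lambda) \frac{1}{2} \left( 1 + y'^2 \right)^{-1/2} 2 y' = k \]
\[ \frac{k}{y + \lambda} \left( 1 + y'^2 \right)^{1/2} = 1 + y'^2 - y'^2 = 1 \]
and therefore
\[ \left( \frac{y + \lambda}{k} \right)^2 = 1 + y'^2. \]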
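This has the solution
\[ y + \lambda = k \cosh \left( \frac{x + c}{k} \right), \]
where c is a constant. The three unknowns k, λ and c must now be determined using the two end conditions together with the constraint equation. The end conditions give
\[ \cosh \left( \frac{a + c}{k} \right) = \frac{\lambda}{k} = \cosh \left( \frac{-a + c}{k} \right) \]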
The approach for solving constrained problems via Lagrange multipliers is systematic. It involves:
1. Multiply constraints $G_i = 0$ by Lagrange multipliers and add these to the functional J that is to be minimised subject to the constraints, to form the functional $I = J + \sum \lambda_i G_i$.
2. Take directional derivatives of I with respect to Lagrange multipliers and all variables
in the problem.
3. Set each of the directional derivatives to zero and solve to find the unknown func-
tions.
Variational methods provide the basis for what is known as optimal control. It uses the constrained framework from the previous section. The aim is to minimise a goal functional J subject to constraints. A typical constraint is a differential equation that a physical system must obey, and a typical goal functional will involve the control variable and the system response.
ẋ + x = u (2.14)
A typical control problem would be to design the controller input u(t) so as to make the system perform some required motion in the ‘best’ way in some sense. At the same time, we might also want to minimise the required control input since larger control effort may require bigger actuators, use fuel, etc. As an example of optimal control, we might seek to find the control input u(t) that takes the system from $x = x_0$ at t = 0 to x = 0 at t = T, and that minimises the combined ‘cost function’
\[ J = \int_0^T \left( x^2 + u^2 \right) dt. \]
This function combines the notion of ‘getting x near zero as soon as possible’ with that
of putting in the least control effort. This minimisation must be carried out subject to the
governing equation (2.14) which will appear as a form of constraint. We can therefore
extend the previous treatment of constrained problems to cover this case: introduce a
Lagrange multiplier λ, and try to find functions x (t) and u (t) that make the functional
\[ I = \int_0^T \left[ x^2 + u^2 + \lambda (\dot{x} + x - u) \right] dt \]
stationary. The Lagrange multiplier must depend on t since equation (2.14) must hold
at every time t.
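The integrand of I is a function of x and u, so we can write down an Euler–Lagrange equation (see equation (2.4)) for them separately:
\[ F_u = u^2 - \lambda u \;\rightarrow\; \frac{\partial F_u}{\partial u} = 2u - \lambda = 0 \]
and
\[ F_x = x^2 + \lambda \dot{x} + \lambda x \;\rightarrow\; \frac{\partial F_x}{\partial x} - \frac{d}{dt} \frac{\partial F_x}{\partial \dot{x}} = 2x + \lambda - \frac{d\lambda}{dt} = 0 \]
Therefore
\[ \frac{d\lambda}{dt} = 2x + \lambda. \]
Combining (with λ = 2u),
\[ 2\dot{u} = 2u + 2x. \]
Substituting $u = \dot{x} + x$ from equation (2.14),
\[ \ddot{x} = 2x. \]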
The general solution for this ordinary differential equation may be written as
\[ x = A \cosh \sqrt{2} t + B \sinh \sqrt{2} t. \]
Using the condition $x(0) = x_0$ gives $A = x_0$, and x(T) = 0 gives
\[ 0 = x_0 \cosh \sqrt{2} T + B \sinh \sqrt{2} T. \]
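Solving the last condition for B gives B = −x0 cosh(√2 T) / sinh(√2 T), after which the control follows from the governing equation as u = ẋ + x. The sketch below builds x(t) and u(t) this way and checks the end conditions and ẍ = 2x by finite differences. The values of x0 and T, the sample time and the helper names are assumptions made for the illustration.

```python
import math

ROOT2 = math.sqrt(2.0)


def make_solution(x0, T):
    """Return x(t) and u(t) for the optimal control example.

    x(t) = x0*cosh(sqrt(2) t) + B*sinh(sqrt(2) t), with B chosen so that x(T) = 0,
    and u = dx/dt + x from the governing equation.
    """
    B = -x0 * math.cosh(ROOT2 * T) / math.sinh(ROOT2 * T)

    def x(t):
        return x0 * math.cosh(ROOT2 * t) + B * math.sinh(ROOT2 * t)

    def x_dot(t):
        return ROOT2 * (x0 * math.sinh(ROOT2 * t) + B * math.cosh(ROOT2 * t))

    def u(t):
        return x_dot(t) + x(t)

    return x, u


x0, T = 1.0, 2.0                     # illustrative values
x, u = make_solution(x0, T)

# Finite-difference check of x'' = 2x at an interior time
t, h = 0.7, 1e-4
x_ddot = (x(t + h) - 2.0 * x(t) + x(t - h)) / (h * h)
print(x(0.0), x(T), x_ddot - 2.0 * x(t))
```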
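\[ J = \int_V \left[ \frac{1}{2} (u - z)^2 + \frac{\alpha}{2} f^2 \right] dV, \]
\[ -\nabla^2 u = f \quad \text{in } V \]
\[ u = 0 \quad \text{on } \partial V \]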
Note that J depends on f . This is required to prevent wild variations in f (and an ill-posed
problem).
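Introducing the Lagrange multiplier λ, the functional I whose stationary points we need to find reads
\[ I = \int_V \left[ \lambda \left( \nabla^2 u + f \right) + \frac{1}{2} (u - z)^2 + \frac{\alpha}{2} f^2 \right] dV. \]
The directional derivatives of I are
\begin{align*}
D_\lambda I [\bar{\lambda}] &= \int_V \bar{\lambda} \left( \nabla^2 u + f \right) dV, \\
D_u I [\bar{u}] &= \int_V \left( \lambda \nabla^2 \bar{u} + (u - z) \bar{u} \right) dV, \\
D_f I [\bar{f}] &= \int_V (\lambda + \alpha f) \bar{f} \, dV,
\end{align*}
which must vanish for all $\bar{\lambda}$, $\bar{u}$ and $\bar{f}$. Following the usual process, this leads to the coupled equations:
\begin{align*}
\nabla^2 u + f &= 0, \\
\nabla^2 \lambda + u &= z, \\
\lambda + \alpha f &= 0,
\end{align*}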
3 Weak formulations of differential equations³
3.1 Introduction
The modern study of partial differential equations is often based on the weak form of the equation, as too are various numerical solution techniques for finding approximate solutions. The weak form of a partial differential equation is empowering for mathematical analysis as tools from functional analysis can be leveraged. Weak formulations are often referred to as ‘variational formulations’, but they can still be formulated for problems that cannot be phrased as a minimisation problem. Classical transport equations are a typical example of a case that cannot be posed as a minimisation problem.
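The derivation of the weak form of a differential equation follows a standard process:
1. Multiply the equation by an arbitrary weight function v.
2. Integrate over the domain.
3. Apply integration by parts to reduce the order of the derivatives.
4. Insert the boundary conditions.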
The weak form of an equation does not generally make an equation easier to solve
analytically (it may make it harder), but is usually a more suitable form for mathematical
analysis (allowing us to say things about the properties of the equation without knowing
the solution) and for numerical solution methods.
3.2 Examples
³ This topic was new to the course in 2009.
Consider the Poisson equation,
\[ -\nabla^2 u = f, \]
which we have seen already in numerous guises. The complete boundary value problem also requires boundary conditions,
\[ u = 0 \quad \text{on } S_g, \]
\[ \nabla u \cdot \mathbf{n} = h \quad \text{on } S_h, \]
where Sg and Sh cover the entire boundary but do not overlap. To derive the weak form,
we first multiply both sides by a weight function v and integrate over the volume V
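\[ -\int_V v \nabla^2 u \, dV = \int_V v f \, dV. \]
Applying integration by parts, and requiring v = 0 on $S_g$,
\[ \int_V \nabla v \cdot \nabla u \, dV = \int_V v f \, dV + \int_{S_h} v \nabla u \cdot \mathbf{n} \, dS, \]
and inserting the boundary condition on $S_h$,
\[ \int_V \nabla v \cdot \nabla u \, dV = \int_V v f \, dV + \int_{S_h} v h \, dS. \]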
Solving Poisson’s equation now involves finding u that satisfies the Dirichlet boundary
conditions such that the above equation holds for all functions v .
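A one-dimensional instance makes the weak form concrete. The sketch below assumes the domain 0 < x < 1 with u = 0 at both ends (so there is no S_h contribution), the solution u = sin(πx) with matching source f = π² sin(πx), and one particular weight function v = x(1 − x) that vanishes on the boundary. Both sides of the weak form are evaluated with Simpson's rule; every one of these choices is an assumption made for the illustration.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson rule with n (even) intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# Assumed 1D Poisson problem: -u'' = f on (0, 1), u(0) = u(1) = 0
def u(x):
    return math.sin(math.pi * x)


def du(x):
    return math.pi * math.cos(math.pi * x)


def f(x):
    return math.pi ** 2 * math.sin(math.pi * x)


# One admissible weight function (zero on the Dirichlet boundary)
def v(x):
    return x * (1.0 - x)


def dv(x):
    return 1.0 - 2.0 * x


lhs = simpson(lambda x: dv(x) * du(x), 0.0, 1.0)   # integral of v' u'
rhs = simpson(lambda x: v(x) * f(x), 0.0, 1.0)     # integral of v f
print(lhs, rhs)
```

Checking a single weight function does not prove that u solves the weak problem, since the statement must hold for all v, but it is a cheap way to catch sign and boundary-term mistakes.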
An important observation with regard to the weak form is the order of derivatives appearing. For the Poisson equation, the weak form involves first-order derivatives whereas the strong form involves second-order derivatives. In this sense, the weak form is more general as functions that are not sufficiently smooth to be classical solutions to the strong form may be solutions of the weak form.
Another important observation is that the term on the left-hand side is symmetric.
That is, if we swap v and u the expression remains the same,
\[ \int_V \nabla v \cdot \nabla u \, dV = \int_V \nabla u \cdot \nabla v \, dV. \]
It turns out that when the weak form is symmetric, the problem is equivalent to the
minimisation of a functional and is therefore a variational problem in the classical sense.
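Solving the weak form is equivalent to minimising
\[ I = \frac{1}{2} \int_V \nabla u \cdot \nabla u \, dV - \int_V f u \, dV - \int_{S_h} h u \, dS. \]
Consider next the steady advection-diffusion equation,
\[ \mathbf{a} \cdot \nabla \phi - \nabla^2 \phi = f \]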
where a is the known velocity field (which is incompressible) and f is a source term. You
can see that it is an extension of Poisson’s equation, with the term a · ∇φ added to take
into account advective transport.
Following the usual process, we first multiply the equation by a weight function and
integrate,
$$ \int_V v \, a \cdot \nabla\phi \, dV - \int_V v \, \nabla^2 \phi \, dV = \int_V v f \, dV, $$
and then apply integration by parts and insert a boundary condition h for the diffusive
flux,
$$ \int_V v \, a \cdot \nabla\phi \, dV + \int_V \nabla v \cdot \nabla\phi \, dV = \int_V v f \, dV + \int_{S_h} v h \, dS. $$
We have not used integration by parts on the advective term. It is possible to apply it,
which would lead to an extra boundary integral. Whether or not to apply integration by
parts to this term is often a matter of convenience depending on the form of the bound-
ary conditions, which can be a complicated matter for the advection-diffusion equation
(related to characteristics).
Unlike the weak form of Poisson’s equation, the weak form of the advection-diffusion
equation is not symmetric since when we switch φ and v ,
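$$ \int_V v \, a \cdot \nabla\phi \, dV + \int_V \nabla v \cdot \nabla\phi \, dV \neq \int_V \phi \, a \cdot \nabla v \, dV + \int_V \nabla\phi \cdot \nabla v \, dV. $$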
An important feature of the weak form is that it generally permits functions with a
lesser degree of continuity than the strong form, which is owing to the reduction in the
order of the derivatives. Also there are abstract results for proving existence, stability
and uniqueness of solutions which can be applied to a broad range of equations. These
conditions are usually on the ‘bilinear form’ a (v , u). For the Poisson equation,
$$ a(v, u) = \int_V \nabla v \cdot \nabla u \, dV, $$
and for the advection-diffusion equation
$$ a(v, \phi) = \int_V v \, a \cdot \nabla\phi \, dV + \int_V \nabla v \cdot \nabla\phi \, dV. $$
The terminology ‘bilinear form’ is used because a (v , u) is linear in v and in u, e.g. a (5v , u) =
a (v , 5u) = 5a (v , u). Important conclusions as to the properties of an equation can be
drawn by studying abstract properties of the bilinear form, most notably from the Lax–
Milgram Theorem.
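The linearity and symmetry properties of these bilinear forms can be checked numerically in one dimension. The sketch below is an illustration only and is not taken from the notes: it assumes the domain [0, 1], a constant unit velocity, polynomial arguments so that the integrals are exact, and the helper names `integrate01`, `a_poisson` and `a_advdiff` are hypothetical.

```python
from numpy.polynomial import Polynomial


def integrate01(p):
    """Exact integral of a polynomial over the assumed domain [0, 1]."""
    P = p.integ()
    return P(1.0) - P(0.0)


def a_poisson(v, u):
    """1D analogue of a(v, u) = int grad(v) . grad(u) dV."""
    return integrate01(v.deriv() * u.deriv())


def a_advdiff(v, phi, vel=1.0):
    """1D analogue of a(v, phi) = int v a.grad(phi) dV + int grad(v).grad(phi) dV."""
    return integrate01(vel * v * phi.deriv()) + a_poisson(v, phi)


# Two arbitrary polynomials (illustrative choices).
u = Polynomial([0.0, 1.0, -1.0])      # x - x^2
v = Polynomial([0.0, 0.0, 1.0, 2.0])  # x^2 + 2x^3

print(a_poisson(5 * v, u), 5 * a_poisson(v, u))  # linear in the first argument
print(a_poisson(v, 5 * u), 5 * a_poisson(v, u))  # linear in the second argument
print(a_poisson(v, u), a_poisson(u, v))          # symmetric
print(a_advdiff(v, u), a_advdiff(u, v))          # differs when the arguments are swapped
```

For this particular pair the advective contribution changes sign when the arguments are swapped, which is why the last two printed values differ.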
Importantly, problems for which the bilinear form is symmetric can be posed as minimisation
problems.
$$ x^2 \frac{d^2 u}{dx^2} - 2x \frac{du}{dx} + 2u = 0, \qquad u(-1) = -2, \quad u(1) = 0. $$
1. Deduce a weak form of the above equation.
3. Multiply the equation by $x^{-4}$, and deduce the equivalent variational form of the
converted differential equation.
4 Rayleigh-Ritz, Galerkin and Finite-Element Methods
The reality of partial differential equations is that in most cases it is not possible to find
an analytical solution. This is particularly so for equations on complicated geometries (as
is common in engineering), nonlinear equations and equations with complicated source
terms and boundary conditions.
If a differential equation cannot be solved in closed form, we may obtain approximate
solutions using the following techniques:
2. Solve the integral variational form approximately using the Rayleigh-Ritz method
or finite-element methods based on the Rayleigh-Ritz method.
In this section, a brief introduction is supplied for the Rayleigh-Ritz and Galerkin approx-
imate methods, as they provide the basis for finite-element methods. The Rayleigh-Ritz
method is applied directly to the variational form of the equations, while the Galerkin
method begins with the weak form of the equations.
1. Define the functional I for which you wish to find stationary points.
3. Insert the approximate solution into the functional that is now denoted by Ih .
4. Take the directional derivative of Ih with respect to the unknown amplitudes of the
basis functions.
5. Determine the amplitudes of the basis functions which yield a stationary point of Ih .
For a small number of unknowns this process can be done by hand. With a clever choice
of basis functions, the solution can be quite accurate (possibly even exact). For larger
problems, the process can be implemented on a computer.
[Figure: rod of length L along the x-axis, from x = 0 to x = L, loaded by f(x).]
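$$ u_h = \phi_1(x) \, a_1 + \phi_2(x) \, a_2 = \frac{x}{L} a_1 + \frac{x^2}{L^2} a_2 $$
$$ \frac{du_h}{dx} = \frac{1}{L} a_1 + \frac{2x}{L^2} a_2 \quad \text{(strain field)} $$
We have chosen $\phi_1 = x/L$ and $\phi_2 = x^2/L^2$ as a basis. The unknowns are $a_1$ and
$a_2$.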
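• Insert the approximate solution $u_h$ into the energy functional I in place of u. Call
this functional $I_h$.
$$ I_h = \int_0^L \left[ \frac{1}{2} EA \left( \frac{1}{L} a_1 + \frac{2x}{L^2} a_2 \right)^2 - f \left( \frac{x}{L} a_1 + \frac{x^2}{L^2} a_2 \right) \right] dx $$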
Footnote 4: Note that the Euler–Lagrange equation is $EA \, d^2u/dx^2 = -f$, of which I is the equivalent variational form.
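• Compute the directional derivatives of $I_h$ with respect to $a_1$ and $a_2$. To do this we
insert $a_1 + \epsilon a_1^\star$ and $a_2 + \epsilon a_2^\star$,
$$ DI_h(a)[a^\star] = \frac{d}{d\epsilon} \left[ \int_0^L \frac{1}{2} EA \left( \frac{1}{L} (a_1 + \epsilon a_1^\star) + \frac{2x}{L^2} (a_2 + \epsilon a_2^\star) \right)^2 dx - \int_0^L f \left( \frac{x}{L} (a_1 + \epsilon a_1^\star) + \frac{x^2}{L^2} (a_2 + \epsilon a_2^\star) \right) dx \right]_{\epsilon = 0} $$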
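which gives
$$ DI_h = \int_0^L EA \left( \frac{1}{L} a_1 + \frac{2x}{L^2} a_2 \right) \left( \frac{a_1^\star}{L} + \frac{2x a_2^\star}{L^2} \right) dx - \int_0^L f \left( \frac{x}{L} a_1^\star + \frac{x^2}{L^2} a_2^\star \right) dx. $$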
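To minimise $I_h$, we want the above to be zero. Since $a_1^\star$ and $a_2^\star$ are arbitrary, first
set $a_1^\star = 1$ and $a_2^\star = 0$, and then $a_1^\star = 0$ and $a_2^\star = 1$,
$$ \int_0^L \left[ EA \left( \frac{1}{L^2} a_1 + \frac{2x}{L^3} a_2 \right) - f \frac{x}{L} \right] dx = 0 \qquad (a_1^\star = 1, \; a_2^\star = 0) $$
$$ \int_0^L \left[ EA \left( \frac{2x}{L^3} a_1 + \frac{4x^2}{L^4} a_2 \right) - f \frac{x^2}{L^2} \right] dx = 0 \qquad (a_1^\star = 0, \; a_2^\star = 1) $$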
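Integrate,
$$ \frac{EA}{L} (a_1 + a_2) - \int_0^L f \frac{x}{L} \, dx = 0 \qquad (a_1^\star = 1, \; a_2^\star = 0) $$
$$ \frac{EA}{L} \left( a_1 + \frac{4}{3} a_2 \right) - \int_0^L f \frac{x^2}{L^2} \, dx = 0 \qquad (a_1^\star = 0, \; a_2^\star = 1) $$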
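• Represent the problem as a 2 × 2 system of equations and solve for $a_1$ and $a_2$.
$$ \frac{EA}{L} \begin{bmatrix} 1 & 1 \\ 1 & \frac{4}{3} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \int_0^L f \frac{x}{L} \, dx \\ \int_0^L f \frac{x^2}{L^2} \, dx \end{bmatrix}. $$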
We now have an approximate solution uh along the rod. If the function f is difficult to
integrate, it could be integrated approximately (numerical integration).
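The 2 × 2 system above can be solved numerically as follows. This is a minimal sketch under stated assumptions, not the notes' own implementation: the rod is taken as fixed at x = 0 (both basis functions vanish there) and free at x = L, the load integrals are evaluated with Gauss-Legendre quadrature, and the function name `rayleigh_ritz_rod` and the parameter values are hypothetical.

```python
import numpy as np


def rayleigh_ritz_rod(f, EA=1.0, L=1.0, n_quad=8):
    """Solve the 2 x 2 Rayleigh-Ritz system for u_h = a1*x/L + a2*x^2/L^2."""
    # Gauss-Legendre rule on [-1, 1], mapped to [0, L].
    xi, w = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * L * (xi + 1.0)
    w = 0.5 * L * w
    fx = np.array([f(xq) for xq in x])

    # Matrix and right-hand side of the 2 x 2 system.
    K = (EA / L) * np.array([[1.0, 1.0], [1.0, 4.0 / 3.0]])
    b = np.array([np.sum(w * fx * x / L), np.sum(w * fx * (x / L) ** 2)])
    return np.linalg.solve(K, b)


# Constant load f0 (illustrative values).
EA, L, f0 = 3.0, 1.5, 2.0
a1, a2 = rayleigh_ritz_rod(lambda x: f0, EA=EA, L=L)
print(a1, a2)
```

For a constant load $f_0$ and the assumed boundary conditions, the exact solution of $EA \, d^2u/dx^2 = -f_0$ is $u = (f_0/EA)(Lx - x^2/2)$, which lies in the span of the chosen basis. The sketch should therefore return $a_1 = f_0 L^2/EA$ and $a_2 = -f_0 L^2/(2EA)$ up to rounding, an instance of the approximate solution being exact.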
The Rayleigh-Ritz method leads us to a solution uh , but how do we know that this bears
any relation to the exact solution u? We can draw some firm conclusions based on some
simple arguments. We assume that we have a problem that has a unique solution and
that it is stable (the stationary point is a minimum).
Firstly, we need to cast the method in a slightly more abstract format. For the bar
the problem is: find uh such that
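$$ DI_h(u_h)[v_h] = \frac{d}{d\epsilon} \int_0^L \left[ \frac{1}{2} EA \left( \frac{du_h}{dx} + \epsilon \frac{dv_h}{dx} \right)^2 - f (u_h + \epsilon v_h) \right] dx \, \bigg|_{\epsilon = 0} = \int_0^L \left( EA \frac{du_h}{dx} \frac{dv_h}{dx} - f v_h \right) dx = 0. $$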
This is just the weak form of the equation for our approximate problem. Note that if u
minimises I and uh minimises Ih , then we have
$$ \int_0^L \left( EA \frac{du}{dx} \frac{dv_h}{dx} - f v_h \right) dx = 0, $$
$$ \int_0^L \left( EA \frac{du_h}{dx} \frac{dv_h}{dx} - f v_h \right) dx = 0. $$
We can conclude from this that
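$$ \int_0^L EA \frac{du}{dx} \frac{dv_h}{dx} \, dx = \int_0^L EA \frac{du_h}{dx} \frac{dv_h}{dx} \, dx, \qquad (4.16) $$
or equivalently
$$ \int_0^L EA \left( \frac{du}{dx} - \frac{du_h}{dx} \right) \frac{dv_h}{dx} \, dx = 0 \qquad (4.17) $$
for all $v_h$.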
So far, from all the possible approximate solutions $v_h = \phi_1(x) \, b_1 + \phi_2(x) \, b_2$
that we have allowed (there is an infinite number of possibilities, obtained just
by changing $b_1$ and $b_2$), we have 'computed' the $u_h = \phi_1(x) \, a_1 + \phi_2(x) \, a_2$ that minimises our
expression for the potential energy $I_h$. We would now like to know how our computed
displacement field $u_h$ is related to the exact displacement field u. Inserting $y_h = u - u_h + v_h$,
where $v_h$ is any function which can be represented using the basis which we have chosen,
into the term on the left-hand side in equation (4.15) (in place of u) and expanding,
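$$ \frac{1}{2} \int_0^L EA \left( \frac{d(u - u_h + v_h)}{dx} \right)^2 dx = \frac{1}{2} \int_0^L EA \left( \frac{d(u - u_h)}{dx} \right)^2 dx + \underbrace{\int_0^L EA \frac{dv_h}{dx} \frac{d(u - u_h)}{dx} \, dx}_{= 0, \text{ due to eqn (4.17)}} + \underbrace{\frac{1}{2} \int_0^L EA \left( \frac{dv_h}{dx} \right)^2 dx}_{\geq 0}. $$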
Therefore, if we set $w_h = u_h - v_h$ and use the fact that the last term is non-negative,
$$ \frac{1}{2} \int_0^L EA \left( \frac{d(u - u_h)}{dx} \right)^2 dx \leq \frac{1}{2} \int_0^L EA \left( \frac{d(u - w_h)}{dx} \right)^2 dx. $$
2 0 dx 2 0 dx
The term on the left-hand side is the error in the strain energy. This important result
proves that the Rayleigh-Ritz method finds the solution uh , from all the possible solutions
which we allow, that minimises the error in the strain energy. We have proved this without
knowing the analytical solution u!
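This minimisation property can be observed numerically. The sketch below is an illustration under assumptions that are not in the notes: EA = L = 1, a linear load f(x) = x, and a rod fixed at x = 0 and free at x = 1, for which the exact strain is $du/dx = (1 - x^2)/2$. It computes the Rayleigh-Ritz coefficients for the basis x and x², then compares the error in the strain energy with that of randomly perturbed coefficients. The helper names are hypothetical.

```python
import numpy as np

# Gauss-Legendre rule mapped to [0, 1]; exact for the polynomials used here.
xi, w = np.polynomial.legendre.leggauss(8)
x = 0.5 * (xi + 1.0)
w = 0.5 * w


def strain_exact(x):
    """Exact strain for EA = L = 1, f(x) = x, u(0) = 0, u'(1) = 0 (assumed setup)."""
    return 0.5 * (1.0 - x**2)


def strain_trial(c, x):
    """Strain of the trial function c[0]*x + c[1]*x^2."""
    return c[0] + 2.0 * c[1] * x


def energy_error(c):
    """(1/2) int EA (u' - w_h')^2 dx for trial coefficients c."""
    return 0.5 * np.sum(w * (strain_exact(x) - strain_trial(c, x)) ** 2)


# Rayleigh-Ritz coefficients from the 2 x 2 system with f(x) = x.
K = np.array([[1.0, 1.0], [1.0, 4.0 / 3.0]])
b = np.array([np.sum(w * x * x), np.sum(w * x * x**2)])
a = np.linalg.solve(K, b)

# Perturb the coefficients: the error in the strain energy should not decrease.
rng = np.random.default_rng(0)
perturbed = [energy_error(a + rng.normal(size=2)) for _ in range(5)]
print(energy_error(a), min(perturbed))
```

Every perturbed set of coefficients corresponds to some $w_h = u_h - v_h$ in the inequality above, so each should give an error at least as large as that of the Rayleigh-Ritz solution.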
4.2 Galerkin Method
The Galerkin method is based on the method of weighted residuals. Its primary advantage
over the Rayleigh-Ritz method is that the equation need not be written, or even be
expressible, in variational form before the method is applied. This allows the Galerkin
method to be applied to a much wider class of problems.
We are seeking an approximation to the solution of the differential equation, that is,
the strong form,
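$$ \mathcal{L} u = f, $$
where $\mathcal{L}$ is a differential operator. We approximate the solution by a trial function
(summation over i implied),
$$ \bar{u} = c_i \phi_i, $$
and define the residual
$$ R(x) = \mathcal{L} \bar{u} - f, $$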
which is a measure of the error of the approximation ū. If ū = u(x), then the residual
vanishes. In the method of weighted residuals, we multiply the residual by a set of weight
functions wi (x), i = 1, ..., N and integrate over the domain
$$ \int_{x_0}^{x_1} (\mathcal{L} \bar{u} - f) \, w_i \, dx = 0. $$
This is equivalent to setting the inner product of the residual with each weight function
to zero. In the general method of weighted residuals, we can choose different functions
for the weight functions wi (x) and the basis functions φi (x) within the trial function ū.
In the Galerkin method, however, we use the same functions for the weight functions and
the basis function in the trial function such that wi (x) = φi (x).
Question: Use the Galerkin method to find an approximate solution of the problem in
subsection 4.1.1 with the two basis functions $x/L$ and $x^2/L^2$.
Comments: (1) The Galerkin method starts with the weak form of the PDE. (2) When an
equivalent variational form exists, the Rayleigh-Ritz method and the Galerkin method give
the same results.
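Comment (2) can be illustrated for the rod problem. The sketch below is not the notes' own solution: it assumes the weak form of the rod equation with weight functions equal to the basis functions, so that $K_{ij} = \int_0^L EA \, \phi_i' \phi_j' \, dx$ and $b_i = \int_0^L \phi_i f \, dx$, and it represents the basis and the load as polynomials so that the integrals are exact. The function name `galerkin_system` and the parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def galerkin_system(basis, f, EA=1.0, L=1.0):
    """Assemble K_ij = int EA phi_i' phi_j' dx and b_i = int phi_i f dx on [0, L]."""
    def integral(p):
        P = p.integ()
        return P(L) - P(0.0)

    n = len(basis)
    K = np.empty((n, n))
    b = np.empty(n)
    for i, phi_i in enumerate(basis):
        b[i] = integral(phi_i * f)
        for j, phi_j in enumerate(basis):
            K[i, j] = EA * integral(phi_i.deriv() * phi_j.deriv())
    return K, b


EA, L = 1.0, 2.0
basis = [Polynomial([0.0, 1.0 / L]), Polynomial([0.0, 0.0, 1.0 / L**2])]  # x/L, x^2/L^2
f = Polynomial([1.0])  # constant unit load, an illustrative choice
K, b = galerkin_system(basis, f, EA=EA, L=L)
c = np.linalg.solve(K, b)
print(K)  # compare with (EA/L) * [[1, 1], [1, 4/3]]
print(c)
```

The assembled matrix should coincide with the Rayleigh-Ritz matrix $(EA/L)[[1, 1], [1, 4/3]]$, so both methods lead to the same coefficients for this problem.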
4.3 The finite element method
The finite element method is a powerful technique for finding approximate solutions to partial
differential equations. The finite element method described here is based on the Galerkin
method. Ultimately, finding an approximate solution to a differential equation requires
solving a system of linear equations (usually very large and therefore done on a computer).
Rather than considering a functional, the finite element method addresses the weak
form of the equation directly and is therefore applicable to a wider range of problems.
The weak form for the elastic rod considered in the previous section is
$$ \int_0^L \frac{dv}{dx} EA \frac{du}{dx} \, dx = \int_0^L v f \, dx. \qquad (4.18) $$
The finite element method represents an approximate solution uh using low-order
polynomials on simple shapes as a basis. The simplest basis in one dimension consists of
‘hat-like’ piecewise linear functions,
[Figure: ‘hat-like’ piecewise linear basis functions on nodes 0 to 7 along the length L, with φ2 labelled.]
The approximate solution is expressed as a linear combination of these simple basis
functions,
$$ u_h = \sum_{i=1}^{n} \phi_i(x) \, a_i $$
where ai is the approximate solution at the point xi . Using the same basis to represent vh ,
$$ v_h = \sum_{i=1}^{n} \phi_i(x) \, a_i^\star, $$
we can insert the expression for uh and vh into the weak form,
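$$ \int_0^L \left( \sum_{i=1}^{n} \frac{d\phi_i}{dx} a_i^\star \right) EA \left( \sum_{j=1}^{n} \frac{d\phi_j}{dx} a_j \right) dx = \int_0^L \left( \sum_{i=1}^{n} \phi_i a_i^\star \right) f \, dx. $$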
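Since the $a_i^\star$ are arbitrary, we can set $a_{i=k}^\star = 1$ and $a_{i \neq k}^\star = 0$, so for each i we have:
$$ i = 1: \quad \int_0^L \frac{d\phi_1}{dx} EA \left( \sum_{j=1}^{n} \frac{d\phi_j}{dx} a_j \right) dx = \int_0^L \phi_1 f \, dx, $$
$$ i = 2: \quad \int_0^L \frac{d\phi_2}{dx} EA \left( \sum_{j=1}^{n} \frac{d\phi_j}{dx} a_j \right) dx = \int_0^L \phi_2 f \, dx, $$
$$ \vdots $$
$$ i = n: \quad \int_0^L \frac{d\phi_n}{dx} EA \left( \sum_{j=1}^{n} \frac{d\phi_j}{dx} a_j \right) dx = \int_0^L \phi_n f \, dx. $$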
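These n equations can be written as the matrix problem
$$ K a = b, $$
where
$$ K_{ij} = \int_{x=0}^{x=L} EA \frac{d\phi_i}{dx} \frac{d\phi_j}{dx} \, dx, \qquad b_i = \int_{x=0}^{x=L} \phi_i f \, dx. $$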
The matrix problem can be solved using, for example, LU decomposition.
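The assembly and solution of Ka = b with hat functions might be sketched as follows. This is an illustrative sketch rather than a definitive implementation: it assumes a uniform mesh, a rod fixed at x = 0 and free at x = L, a two-point Gauss rule for the load integrals, and dense storage for clarity. The function name `fe_rod` and its parameters are hypothetical.

```python
import numpy as np


def fe_rod(f, n_el=8, EA=1.0, L=1.0):
    """Piecewise-linear finite element solution with u(0) = 0 and a free end at x = L."""
    h = L / n_el
    nodes = np.linspace(0.0, L, n_el + 1)
    K = np.zeros((n_el + 1, n_el + 1))
    b = np.zeros(n_el + 1)

    # Element stiffness for the two hat functions on one cell of length h.
    k_el = (EA / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    # Two-point Gauss rule on [-1, 1] (unit weights) for the element load integrals.
    gauss_points = np.array([-1.0, 1.0]) / np.sqrt(3.0)

    for e in range(n_el):
        dofs = [e, e + 1]
        K[np.ix_(dofs, dofs)] += k_el
        for g in gauss_points:
            xq = nodes[e] + 0.5 * h * (g + 1.0)
            shape = np.array([0.5 * (1.0 - g), 0.5 * (1.0 + g)])  # hat functions at xq
            b[dofs] += 0.5 * h * shape * f(xq)

    # Impose u(0) = 0 by removing the first row and column, then solve.
    a = np.zeros(n_el + 1)
    a[1:] = np.linalg.solve(K[1:, 1:], b[1:])
    return nodes, a


nodes, a = fe_rod(lambda x: 1.0)
print(a[-1])  # tip displacement
```

`np.linalg.solve` relies on an LU factorisation with partial pivoting. For large meshes a sparse or banded solver would normally be preferred, since K is tridiagonal for this basis. With a constant unit load and EA = L = 1, the nodal values should match the exact solution $u = x - x^2/2$ under the assumed boundary conditions.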
The finite element method can be applied to any differential equation, in any spa-
tial dimension and for any geometry. For two- and three-dimensional problems, basis
functions are usually defined on triangles, quadrilaterals, tetrahedra or hexahedra. For
more accurate results, more ‘elements’ can be used or the polynomial order of the basis
functions can be increased. The error analysis presented for the Rayleigh-Ritz method
can be generalised for the finite element method. Practical aspects of the finite element
method are covered in module 3D7.
Question: Suppose an interval [0, L] is partitioned into uniform cells $x_0 = 0 < x_1 < \dots <
x_i < \dots < x_n = L$ such that $x_{i+1} - x_i = h$ is constant. The piecewise linear ‘hat-like’
function $\phi_i$ is defined by $\phi_i(x_j) = \delta_{ij}$. Calculate
$$ K_{ij} = \int_0^L \frac{d\phi_i}{dx} \frac{d\phi_j}{dx} \, dx. $$
1. Permutation symbol:
$$ [x \times y]_i = \epsilon_{ijk} x_j y_k $$
5. Given functions u(x) and v(x), define a function of one independent variable $\epsilon \to
J(u + \epsilon v)$. Directional derivative:
$$ DJ(u)[v] = \frac{dJ(u + \epsilon v)}{d\epsilon} \bigg|_{\epsilon = 0} = 0 $$
where h(x) is an arbitrary function in the same interval with $h(x_0) = h(x_1) = 0$, then
g(x) = 0 at every point in the interval $x_0 \leq x \leq x_1$. ⇒ Euler–Lagrange equation.
Note that we can make h(x) differentiable such that $h'(x_0) = h'(x_1) = 0$.
8. Boundary Conditions (fixed or free) worked out separately at the two ends.
9. Extensions:
a) Functional depends on more than one function, for example, on u(x) and v (x).
12. Only possible if the bilinear form in the weak form is symmetric, for example,
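$\int \nabla u \cdot \nabla v \, dV \Rightarrow \frac{1}{2} \int \nabla u \cdot \nabla u \, dV$, $\int u \cdot v \, dV \Rightarrow \frac{1}{2} \int u \cdot u \, dV$.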
13. Numerical methods: look for an approximate solution in the form of a trial function
(finite dimension)
$$ \bar{u} = \sum_{i=1}^{n} c_i \phi_i, $$
a) Galerkin method: works on the weak form of PDE which is restricted to the
space of finite dimension. It is a method of weighted residuals where weight
function = basis function.
14. Compare approximate solutions with the exact solution when possible.
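As a numerical check of item 1 of this summary, the permutation symbol and the summation convention can be written out with `numpy.einsum`. This is a small illustrative sketch; the vectors are arbitrary choices and the variable names are hypothetical.

```python
import numpy as np

# Permutation symbol eps_ijk as a 3 x 3 x 3 array (indices run from 0 to 2 here).
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0   # even permutations
    eps[i, k, j] = -1.0  # odd permutations

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 4.0])

# [x cross y]_i = eps_ijk x_j y_k, with summation over the repeated indices j and k.
cross = np.einsum("ijk,j,k->i", eps, x, y)
print(cross, np.cross(x, y))

# Kronecker delta: delta_ij y_j = y_i.
delta = np.eye(3)
print(np.einsum("ij,j->i", delta, y))
```

The index string passed to `einsum` mirrors the suffix notation directly: repeated letters are summed, and the letter after the arrow is the free index.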