CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 64
EDITORIAL BOARD
D.J.H. GARLING, W. FULTON, K. RIBET, T. TOM DIECK,
P. WALTERS
CALCULUS OF VARIATIONS
Already published
1 W.M.L. Holcombe Algebraic automata theory
2 K. Petersen Ergodic theory
3 P.T. Johnstone Stone spaces
4 W.H. Schikhof Ultrametric calculus
5 J.-P. Kahane Some random series of functions, 2nd edition
6 H. Cohn Introduction to the construction of class fields
7 J. Lambek & P.J. Scott Introduction to higher-order categorical logic
8 H. Matsumura Commutative ring theory
9 C.B. Thomas Characteristic classes and the cohomology of finite groups
10 M. Aschbacher Finite group theory
11 J.L. Alperin Local representation theory
12 P. Koosis The logarithmic integral I
13 A. Pietsch Eigenvalues and S-numbers
14 S.J. Patterson An introduction to the theory of the Riemann
zeta-function
15 H.J. Baues Algebraic homotopy
16 V.S. Varadarajan Introduction to harmonic analysis on semisimple
Lie groups
17 W. Dicks & M. Dunwoody Groups acting on graphs
18 L.J. Corwin & F.P. Greenleaf Representations of nilpotent Lie groups
and their applications
19 R. Fritsch & R. Piccinini Cellular structures in topology
20 H. Klingen Introductory lectures on Siegel modular forms
21 P. Koosis The logarithmic integral II
22 M.J. Collins Representations and characters of finite groups
24 H. Kunita Stochastic flows and stochastic differential equations
25 P. Wojtaszczyk Banach spaces for analysts
26 J.E. Gilbert & M.A.M. Murray Clifford algebras and Dirac operators
in harmonic analysis
27 A. Fröhlich & M.J. Taylor Algebraic number theory
28 K. Goebel & W.A. Kirk Topics in metric fixed point theory
29 J.F. Humphreys Reflection groups and Coxeter groups
30 D.J. Benson Representations and cohomology I
31 D.J. Benson Representations and cohomology II
32 C. Allday & V. Puppe Cohomological methods in transformation groups
33 C. Soulé et al. Lectures on Arakelov geometry
34 A. Ambrosetti & G. Prodi A primer of nonlinear analysis
35 J. Palis & F. Takens Hyperbolicity and sensitive chaotic dynamics at
homoclinic bifurcations
36 M. Auslander, I. Reiten & S. Smalø Representation theory of Artin algebras
37 Y. Meyer Wavelets and operators
38 C. Weibel An introduction to homological algebra
39 W. Bruns & J. Herzog Cohen-Macaulay rings
40 V. Snaith Explicit Brauer induction
41 G. Laumon Cohomology of Drinfeld modular varieties I
42 E.B. Davies Spectral theory and differential operators
43 J. Diestel, H. Jarchow & A. Tonge Absolutely summing operators
44 P. Mattila Geometry of sets and measures in Euclidean spaces
45 R. Pinsky Positive harmonic functions and diffusion
46 G. Tenenbaum Introduction to analytic and probabilistic number theory
47 C. Peskine An algebraic introduction to complex projective geometry I
48 Y. Meyer & R. Coifman Wavelets and operators II
49 R. Stanley Enumerative combinatorics
50 I. Porteous Clifford algebras and the classical groups
51 M. Audin Spinning tops
52 V. Jurdjevic Geometric control theory
53 H. Voelklein Groups as Galois groups
54 J. Le Potier Lectures on vector bundles
55 D. Bump Automorphic forms
56 G. Laumon Cohomology of Drinfeld modular varieties II
59 P. Taylor Practical foundations of mathematics
60 M. Brodmann & R. Sharp Local cohomology
64 J. Jost & X. Li-Jost Calculus of variations
Calculus of Variations
Jürgen Jost and Xianqing Li-Jost
Max-Planck-Institute for Mathematics in the Sciences,
Leipzig
CAMBRIDGE
UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge CB2 1RP, United Kingdom
CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK http://www.cup.ac.uk
40 West 20th Street, New York, NY 10011-4211, USA http://www.cup.org
10 Stamford Road, Oakleigh, Melbourne 3166, Australia
Cambridge University Press 1998
This book is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First published 1998
Typeset in Computer Modern by the authors using LaTeX 2e
A catalogue record of this book is available from the British Library
Library of Congress Cataloguing in Publication data
Jost, Jürgen, 1956-
Calculus of variations / Jürgen Jost and Xianqing Li-Jost.
p. cm.
Includes index.
ISBN 0 521 64203 5 (hc.)
1. Calculus of variations. I. Li-Jost, Xianqing, 1956-
II. Title.
QA315.J67 1999
515' .64-dc21 98-38618 CIP
ISBN 0 521 64203 5 hardback
Transferred to digital printing 2003
Dedicated to Stefan Hildebrandt
Contents
Preface and summary
Remarks on notation
Part one: One-dimensional variational problems
1 The classical theory
1.1 The Euler-Lagrange equations. Examples
1.2 The idea of the direct methods and some regularity results
1.3 The second variation. Jacobi fields
1.4 Free boundary conditions
1.5 Symmetries and the theorem of E. Noether
2 A geometric example: geodesic curves
2.1 The length and energy of curves
2.2 Fields of geodesic curves
2.3 The existence of geodesics
3 Saddle point constructions
3.1 A finite dimensional example
3.2 The construction of Lyusternik-Schnirelman
4 The theory of Hamilton and Jacobi
4.1 The canonical equations
4.2 The Hamilton-Jacobi equation
4.3 Geodesics
4.4 Fields of extremals
4.5 Hilbert's invariant integral and Jacobi's theorem
4.6 Canonical transformations
5 Dynamic optimization
5.1 Discrete control problems
5.2 Continuous control problems
5.3 The Pontryagin maximum principle
Part two: Multiple integrals in the calculus of variations
1 Lebesgue measure and integration theory
1.1 The Lebesgue measure and the Lebesgue integral
1.2 Convergence theorems
2 Banach spaces
2.1 Definition and basic properties of Banach and Hilbert spaces
2.2 Dual spaces and weak convergence
2.3 Linear operators between Banach spaces
2.4 Calculus in Banach spaces
3 $L^p$ and Sobolev spaces
3.1 $L^p$ spaces
3.2 Approximation of $L^p$ functions by smooth functions (mollification)
3.3 Sobolev spaces
3.4 Rellich's theorem and the Poincaré and Sobolev inequalities
4 The direct methods in the calculus of variations
4.1 Description of the problem and its solution
4.2 Lower semicontinuity
4.3 The existence of minimizers for convex variational problems
4.4 Convex functionals on Hilbert spaces and Moreau-Yosida approximation
4.5 The Euler-Lagrange equations and regularity questions
5 Nonconvex functionals. Relaxation
5.1 Nonlower semicontinuous functionals and relaxation
5.2 Representation of relaxed functionals via convex envelopes
6 $\Gamma$-convergence
6.1 The definition of $\Gamma$-convergence
6.2 Homogenization
6.3 Thin insulating layers
7 BV-functionals and $\Gamma$-convergence: the example of Modica and Mortola
7.1 The space $BV(\Omega)$
7.2 The example of Modica-Mortola
Appendix A The coarea formula
Appendix B The distance function from smooth hypersurfaces
8 Bifurcation theory
8.1 Bifurcation problems in the calculus of variations
8.2 The functional analytic approach to bifurcation theory
8.3 The existence of catenoids as an example of a bifurcation process
9 The Palais-Smale condition and unstable critical points of variational problems
9.1 The Palais-Smale condition
9.2 The mountain pass theorem
9.3 Topological indices and critical points
Index
Preface and summary
The calculus of variations is concerned with the construction of optimal
shapes, states, or processes where the optimality criterion is given in
the form of an integral involving an unknown function. The task of the
calculus of variations then is to demonstrate the existence and to deduce
the properties of some function that realizes the optimal value for this
integral. Such variational problems occur in many-fold applications, in
particular in physics, engineering, and economics, and the variational
integral may represent some action, energy, or cost functional. The cal-
culus of variations also has deep and important connections with other
fields of mathematics. For instance, in geometrically defined classes of
objects, a variational principle often permits the selection of a unique
optimal representative, and the properties of this representative can fre-
quently be used to much advantage to deduce additional information
about its class. For these reasons, the calculus of variations is a rich and
ample mathematical subject, and a good impression of this diversity
can be obtained by reading the beautiful book by S. Hildebrandt and
A. Tromba, The Parsimonious Universe, Springer, 1996.
In this textbook, we have attempted to present some of the many faces
of the calculus of variations, and a brief summary may be useful before
putting the contents into a broader perspective. At the same time, we
shall also describe the logical connections between the various chapters,
in order to facilitate reading for readers with a specific aim. The book
is divided into two parts. The first part treats variational problems for
functions of one independent variable; the second, problems for functions
of several variables. The distinction between these two parts, however, is
also that the first treats the more elementary and more classical aspects
of the subject, while the second is concerned with some more difficult
topics and uses somewhat more abstract reasoning. In this second part,
also some examples are presented in detail that occurred in recent ap-
plications of the calculus of variations. This second part leads the reader
to some topics and questions of current research in the calculus of vari-
ations.
The first chapter of Part I is of a somewhat introductory nature and
attempts to develop some intuition for the properties of solutions of vari-
ational problems. In the basic Section 1.1, we derive the Euler-Lagrange
equations that any smooth solution of a variational problem has to sat-
isfy. The topics of the other sections of that chapter contain some reg-
ularity questions and an outline of the so-called direct methods of the
calculus of variations (a subject that will be taken up in much more de-
tail in Chapter 4 of Part II), Jacobi's theory of the second variation and
stability of solutions, and Noether's theorem that deduces conservation
laws from invariance properties of variational integrals. All those results
will not be directly applied in subsequent chapters, but should rather
serve as a motivation. In any case, basically all the chapters of Part I can
be read independently, after the reader has gone through Section 1.1.
In Chapter 2, we treat one of the most important variational problems,
namely that of geodesics, i.e. of finding (locally) shortest curves
under smooth geometric constraints. Geodesics are of fundamental importance
in Riemannian geometry and several physical applications. We
shall make use of the geometric nature of this problem and develop some
elementary geometric constructions, to deduce the existence not only of
length-minimizing curves, but also of curves that furnish unstable criti-
cal points of the length functional. In Chapter 3, we present some more
abstract aspects of such so-called saddle point constructions. At this
point, however, we can only treat problems that allow the reduction to
a finite dimensional situation. A deeper treatment needs additional tools
and therefore has to wait until Chapter 9 of Part II. Geodesics will only
occur once more in the remainder, namely as an example in Section 4.3.
Chapter 4 is concerned with one of the classical highlights of the cal-
culus of variations, the theory of Hamilton and Jacobi. This theory is
of particular importance in mechanics. Presently, its global aspects are
resurging in connection with symplectic geometry, one of the most active
fields of present mathematical research.
Chapter 5 is a brief introduction to dynamic optimization and control
theory. The canonical equations of Hamilton and Jacobi of Section 4.1
briefly reoccur as an example of the Pontryagin maximum principle at
the end of Section 5.3.
As mentioned, Part II is of a less elementary nature. We therefore need
to develop some general theory first. In Chapter 1 of that part, Lebesgue
integration theory is summarized (without proofs) for the convenience
of the reader. While in Part I, the Riemann integral entirely suffices
(with the exception of some places in Section 1.2), the function spaces
that are basic for Part II, namely the $L^p$ and Sobolev spaces, are essentially
based on Lebesgue's notion of the integral. In Chapter 2, we
develop some results from functional analysis about Banach and Hilbert
spaces that will be applied in Chapter 3 for deriving the fundamental
properties of the $L^p$ and Sobolev spaces. (In fact, as the tools from
functional analysis needed in subsequent chapters are of a quite varied
nature, Chapter 2 can also serve as a brief introduction into the field of
functional analysis itself.) These chapters serve the purpose of making
the book self-contained, and for most readers the best strategy might
be to start with Chapter 4, or at most with Chapter 3, and look up the
results of the previous chapters only when they are applied. Chapter 4
is fundamental. It is concerned with the existence of minimizers of vari-
ational integrals under appropriate convexity and lower semicontinuity
assumptions. We treat both the standard method based on weak com-
pactness and a more abstract method for minimizing convex functionals
that does not need the concept of weak convergence. Chapters 5-7 essen-
tially discuss situations where those assumptions are no longer satisfied.
Chapter 5 deals with the method of relaxation, while Chapters 6 and
7 present the important concept of $\Gamma$-convergence for minimizing functionals
that can be represented only in an indirect manner as limits of
other functionals. Such problems occur in many applications, including
homogenization and phase transitions, and several such examples are
treated in detail. Chapter 8 discusses bifurcation theory. We first dis-
cuss the variational aspects (Jacobi fields), taking up the constructions
of Sections 1.1 and 1.3 of Part I, then develop a general functional an-
alytic framework for analyzing bifurcation phenomena and then treat
the example of minimal surfaces of revolution (catenoids) in the light
of that framework. Chapter 8 is independent of Chapters 4-7, and of a
more elementary nature than those. The key tool is the implicit function
theorem in Banach spaces, proved in Section 2.4. The last chapter, Chapter 9,
returns to the topic of the existence of non-minimizing, unstable critical
points of variational integrals. While such solutions usually cannot be
observed in physical applications because of their unstable nature, they
are of considerable mathematical interest, for example in the context of
Riemannian geometry. Chapter 9 is independent of Chapters 4-8.
The present book is self-contained, with very few exceptions. Prere-
quisites are only the calculus of one and several variables.
Although, as indicated, there are important connections between the
calculus of variations and geometry, the present book is of an analytic
nature and does not explore those connections. One such connection con-
cerns the global aspects of the space of solutions of one-dimensional vari-
ational problems and their trajectories that started with the qualitative
investigations of Poincaré and is for example represented in V.I. Arnold,
Mathematical Methods of Classical Mechanics, GTM 60, Springer, New
York, 2nd edition, 1987. Here, geometric methods are used to study
variational problems. In the opposite direction, variational methods can
often be used to solve geometric problems. This is the topic of geometric
analysis; we refer the interested reader to J. Jost, Riemannian Geome-
try and Geometric Analysis, Springer, Berlin, 2nd edition, 1998, and the
references contained therein.
There is one important omission in this textbook. Namely, the reg-
ularity theory for solutions of variational problems is not treated, with
the exception of the one-dimensional case in Section 1.2 of Part I, and
the simplest example of the multi-dimensional theory, namely harmonic
functions (plus an easy generalization) in Section 4.5 of Part II. There-
fore, the solutions of the variational problems that are discussed usually
only are obtained in some Sobolev space. We think that a detailed treat-
ment of regularity theory more properly belongs to the realm of partial
differential equations, and therefore we have to refer the reader to text-
books and monographs on partial differential equations, for example
D. Gilbarg and N. Trudinger, Elliptic Partial Differential Equations of
Second Order, Springer, Berlin, 2nd edition, 1983, or J. Jost, Partielle
Differentialgleichungen, Springer, Berlin, 1998.
In any case, the present textbook cannot cover all the many diverse
aspects of the calculus of variations. For readers who are interested in a
more extensive treatment, we strongly recommend M. Giaquinta and St.
Hildebrandt, Calculus of Variations, several volumes, Springer, Berlin,
1996 ff., as well as E. Zeidler, Nonlinear Functional Analysis and its
Applications, Vols. III and IV, Springer, New York, 1984 ff. (a second
edition of Vol. IV appeared in 1995). Additional references are given
in the course of the text. Since the present book, however, is neither a
research monograph nor an account of the historical development of the
calculus of variations, references to individual contributions are usually
not given. We just list our sources, and refer the interested readers as
well as the contributing mathematicians to those for references to the
original contributions.
The authors thank Felicia Bernatzki, Ralf Muno, Xiao-Wei Peng, Mar-
ianna Rolf, and Wilderich Tuschmann for their help in proofreading and
checking the contents and various corrections, and Michael Knebel and
Micaela Krieger for their competent typing.
The present authors owe much of their education in the calculus of
variations to their teacher, Stefan Hildebrandt. In particular, the pre-
sentation of the material of Chapters 1 and 4 in Part I is influenced
by his lectures that the authors attended as students. For example, the
regularity arguments in Section 1.2 are taken directly from his lectures.
For these reasons, and for his generous support of the authors over many
years, and for his profound contributions to the subject, in particular to
geometric variational problems, the authors dedicate this book to him.
Remarks on notation
A dot '$\cdot$' always denotes the Euclidean scalar product in $\mathbb{R}^d$, i.e. if
\[ x = (x^1, \ldots, x^d), \quad y = (y^1, \ldots, y^d) \in \mathbb{R}^d, \]
then
\[ x \cdot y = \sum_{i=1}^{d} x^i y^i = x^i y^i \quad \text{(Einstein summation convention)}, \]
and
\[ |x|^2 = x \cdot x. \]
For a function $u(t)$, we write
\[ \dot u(t) := \frac{du}{dt}(t). \]
In Part I, the independent variable is usually called $t$, because in many physical applications, it is interpreted as the time parameter. Here, the dependent variables are mostly called $u(t)$ or $x(t)$. In Part II, the independent variables are denoted by $x = (x^1, \ldots, x^d)$, conforming to established conventions.
We use the standard notation
\[ C^k(\Omega) \]
for the space of $k$-times continuously differentiable functions on some open set $\Omega \subset \mathbb{R}^d$, for $k = 0$ (continuous functions), $1, 2, \ldots, \infty$ (infinitely often differentiable functions). For vector valued functions, with values in $\mathbb{R}^d$, we write
\[ C^k(\Omega, \mathbb{R}^d) \]
for the corresponding spaces.
\[ C_0^\infty(\Omega) \]
denotes the space of functions of class $C^\infty$ on $\Omega$ that vanish identically outside some compact subset $K \subset \Omega$ (where $K$ may depend on the function, of course). Occasionally, we also use the notation
\[ C_0^k(\Omega) \]
for $C^k$ functions on $\Omega$ that again vanish outside some compact subset $K \subset \Omega$.
Finally, we use the notation
\[ := \]
to indicate that the expression on the left of this symbol is defined by the expression on the right of it.
Part one
One-dimensional variational problems
1 The classical theory
1.1 The Euler-Lagrange equations. Examples
The classical calculus of variations consists in minimizing expressions of
the form
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt, \]
where $F : [a,b] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is given. One seeks a function $u : [a,b] \to \mathbb{R}^d$ minimizing $I$. More generally, one is also interested in other critical points of $I$. Usually, $u$ has to satisfy some constraints, the most common one being a Dirichlet boundary condition
\[ u(a) = u_1, \qquad u(b) = u_2. \]
Also, one needs to specify a class of admissible functions among which one seeks a minimizing $u$. For example, one might want to take the class of continuously differentiable or piecewise continuously differentiable functions. Let us consider some examples of such variational problems:
(1) We want to minimize the arc-length of the graph of a function $u : [a,b] \to \mathbb{R}$, i.e. the length of the curve $(t, u(t)) \subset \mathbb{R}^2$ among all graphs with prescribed boundary values $u(a), u(b)$. This leads to the variational problem
\[ \int_a^b \sqrt{1 + \dot u(t)^2}\,dt \to \min. \]
Of course, one knows and easily proves that the solution is the straight line between $u(a)$ and $u(b)$, i.e. satisfies $\ddot u(t) = 0$.
(2) Historically, the calculus of variations started with the so-called brachystochrone problem that was posed by Johann Bernoulli. Here, one wants to connect two points $(t_0, y_0)$ and $(t_1, y_1)$ in $\mathbb{R}^2$ by such a curve that a particle obeying Newton's law of gravitation and moving without friction travels the distance between those points in the fastest possible way. After falling the height $y$, the particle has speed $(2gy)^{1/2}$ where $g$ is the gravitational acceleration. The time the particle needs to traverse the path $y = u(t)$ then is
\[ I(u) = \int_{t_0}^{t_1} \sqrt{\frac{1 + \dot u(t)^2}{2 g\, u(t)}}\,dt. \]
(3) A generalization of (1) and (2) is
\[ I(u) = \int_a^b \frac{\sqrt{1 + \dot u(t)^2}}{\gamma(t, u(t))}\,dt, \]
where $\gamma : [a,b] \times \mathbb{R} \to \mathbb{R}$ is a given positive function. This variational problem also arises from Fermat's principle. That principle says that a light ray chooses the path that needs the shortest time to be traversed among all possible paths. If the speed of light in a given medium is $\gamma(t, u(t))$, we obtain the preceding variational problem.
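As a rough numerical illustration of example (1), the following sketch approximates the arc-length functional by the length of an inscribed polygon and compares the straight line with a perturbed competitor that has the same boundary values. The helper `arc_length`, the interval $[0,1]$, and the two sample curves are our own illustrative choices and do not come from the text.

```python
import math

def arc_length(u, a, b, n=2000):
    """Approximate int_a^b sqrt(1 + u'(t)^2) dt by summing chord lengths of the graph."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t0, t1 = a + k * h, a + (k + 1) * h
        total += math.hypot(t1 - t0, u(t1) - u(t0))
    return total

a, b = 0.0, 1.0
line = lambda t: 2.0 * t                                  # straight line from (0, 0) to (1, 2)
bumped = lambda t: 2.0 * t + 0.3 * math.sin(math.pi * t)  # same boundary values, not straight

length_line = arc_length(line, a, b)
length_bumped = arc_length(bumped, a, b)
print(length_line, length_bumped)
```

The straight line should give a value close to $\sqrt{5}$, the Euclidean distance between the endpoints, and the perturbed graph a strictly larger one.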
If one seeks a minimum of a smooth function
\[ f : \Omega \to \mathbb{R} \quad (\Omega \text{ open in } \mathbb{R}^d), \]
one knows that at a minimizing point $z_0 \in \Omega$, one necessarily has
\[ Df(z_0) = 0, \]
where $Df$ is the derivative of $f$. The first variation of $f$ actually has to vanish at any stationary point, not only at minimizers. In order to distinguish a minimizer from other critical points, one has the additional necessary condition that the Hessian $D^2 f(z_0)$ is positive semidefinite and (at least for a local minimizer) the sufficient condition that it is positive definite.
In the present case, however, we do not have a function $f$ of finitely many independent real variables, but a functional $I$ on a class of functions. Nevertheless, we expect that a first derivative of $I$ (something still to be defined) needs to vanish at a minimizer, and moreover that a suitably defined second derivative is positive (semi)definite.
In order to investigate this more closely, we assume that $F$ is of class $C^1$ and that we have a minimizer or, more generally, a critical point of $I$ that also is $C^1$. We also assume prescribed Dirichlet boundary conditions $u(a) = u_1$, $u(b) = u_2$. In other words, we assume that $u$ minimizes $I$ in the class of all functions of class $C^1$ satisfying the prescribed boundary condition. We then have for any $\eta \in C_0^1([a,b], \mathbb{R}^d)$† and any $s \in \mathbb{R}$
\[ I(u + s\eta) \ge I(u). \]
Now
\[ I(u + s\eta) = \int_a^b F(t, u(t) + s\eta(t), \dot u(t) + s\dot\eta(t))\,dt. \]
Since $F$, $u$, and $\eta$ are assumed to be of class $C^1$, we may differentiate the preceding expression w.r.t. $s$ and obtain at $s = 0$
\[ \frac{d}{ds} I(u + s\eta)\Big|_{s=0} = \int_a^b \big\{ F_u(t, u(t), \dot u(t)) \cdot \eta(t) + F_p(t, u(t), \dot u(t)) \cdot \dot\eta(t) \big\}\,dt, \tag{1.1.1} \]
where $F_u$ is the vector of partial derivatives of $F$ w.r.t. the components of $u$, and $F_p$ the one w.r.t. the components of $\dot u(t)$.
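Formula (1.1.1) can be checked numerically on a concrete case. The sketch below compares a central difference quotient of $s \mapsto I(u + s\eta)$ at $s = 0$ with the integral on the right-hand side of (1.1.1). The integrand $F(t,u,p) = \sqrt{1+p^2} + u^2$, the functions $u(t) = t^2$ and $\eta(t) = \sin(\pi t)$ on $[0,1]$, and the midpoint rule are illustrative assumptions of ours, not choices made in the text.

```python
import math

def F(t, u, p):
    return math.sqrt(1.0 + p * p) + u * u

def F_u(t, u, p):
    return 2.0 * u

def F_p(t, u, p):
    return p / math.sqrt(1.0 + p * p)

def integrate(f, a, b, n=4000):
    """Midpoint rule for int_a^b f(t) dt."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 1.0
u = lambda t: t * t
du = lambda t: 2.0 * t
eta = lambda t: math.sin(math.pi * t)          # vanishes at both endpoints
deta = lambda t: math.pi * math.cos(math.pi * t)

def I(s):
    """I(u + s*eta) for the sample data above."""
    return integrate(lambda t: F(t, u(t) + s * eta(t), du(t) + s * deta(t)), a, b)

s = 1e-4
finite_difference = (I(s) - I(-s)) / (2.0 * s)
first_variation = integrate(
    lambda t: F_u(t, u(t), du(t)) * eta(t) + F_p(t, u(t), du(t)) * deta(t), a, b
)
print(finite_difference, first_variation)
```

The two printed numbers should agree up to the discretization error of the difference quotient. Since $u(t) = t^2$ is not a critical point of this functional, the common value need not be zero.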
We now keep $\eta$ fixed and let $s$ vary. We are thus just in the situation of a real valued $f(s)$, $s \in \mathbb{R}$, ($f(s) = I(u + s\eta)$), and the condition $f'(0) = 0$ translates into
\[ 0 = \int_a^b \big\{ F_u(t, u(t), \dot u(t)) \cdot \eta(t) + F_p(t, u(t), \dot u(t)) \cdot \dot\eta(t) \big\}\,dt, \tag{1.1.2} \]
and this actually then has to hold for all $\eta \in C_0^1$. We now assume that $F$ and $u$ are even of class $C^2$. Equation (1.1.2) may then be integrated by parts. Noting that we do not get a boundary term since $\eta(a) = 0 = \eta(b)$, we thus obtain
\[ 0 = \int_a^b \Big\{ \Big( F_u(t, u(t), \dot u(t)) - \frac{d}{dt}\big( F_p(t, u(t), \dot u(t)) \big) \Big) \cdot \eta(t) \Big\}\,dt \tag{1.1.3} \]
for all $\eta \in C_0^1([a,b], \mathbb{R}^d)$. In order to proceed, we need the so-called fundamental lemma of the calculus of variations:
† This means that $\eta$ is continuously differentiable as a function on $[a,b]$ with values in $\mathbb{R}^d$ and that there exist $a < a_1 < b_1 < b$ with $\eta(x) = 0$ if $x$ is not contained in $[a_1, b_1]$.
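Lemma 1.1.1. If $h \in C^0((a,b), \mathbb{R}^d)$ satisfies
\[ \int_a^b h(t) \cdot \varphi(t)\,dt = 0 \quad \text{for all } \varphi \in C_0^\infty((a,b), \mathbb{R}^d), \]
then $h = 0$ on $(a,b)$.
Proof. Otherwise, there exists some $t_0 \in (a,b)$ with
\[ h(t_0) \ne 0. \]
Thus, $h^{i_0}(t_0) \ne 0$ for some index $i_0 \in \{1, \ldots, d\}$. Since $h$ is continuous, there exists some $\delta > 0$ with
\[ a < t_0 - \delta < t_0 + \delta < b \]
and
\[ |h^{i_0}(t)| > \tfrac{1}{2} |h^{i_0}(t_0)| \quad \text{whenever } |t_0 - t| < \delta. \]
We then choose $\varphi \in C_0^\infty((a,b), \mathbb{R}^d)$ with
\[ \begin{aligned} \varphi(t) &= 0 && \text{if } |t_0 - t| \ge \delta, \\ \varphi^{i_0}(t) &> 0 && \text{if } |t_0 - t| < \delta, \\ \varphi^{i}(t) &= 0 && \text{for } i \ne i_0,\ i \in \{1, \ldots, d\}. \end{aligned} \]
For this choice of $\varphi$, however,
\[ \int_a^b h(t) \cdot \varphi(t)\,dt = \int_{t_0 - \delta}^{t_0 + \delta} h^{i_0}(t)\, \varphi^{i_0}(t)\,dt \ne 0, \]
contradicting our assumption. Thus, necessarily $h(t_0) = 0$ for all $t_0 \in (a,b)$.
q.e.d.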
Lemma 1.1.1 and (1.1.3) imply that a minimizer of $I$ of class $C^2$ has to satisfy the so-called Euler-Lagrange equations, namely:
Theorem 1.1.1. Let $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, and let $u \in C^2([a,b], \mathbb{R}^d)$ be a minimizer of
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \]
among all functions with prescribed boundary values $u(a)$ and $u(b)$. Then $u$ is a solution of the following system of second order ordinary differential equations, the Euler-Lagrange equations
\[ \frac{d}{dt}\big( F_p(t, u(t), \dot u(t)) \big) - F_u(t, u(t), \dot u(t)) = 0. \tag{1.1.4} \]
Written out, the Euler-Lagrange equations are
\[ F_{pp}(t, u(t), \dot u(t))\,\ddot u(t) + F_{pu}(t, u(t), \dot u(t))\,\dot u(t) + F_{pt}(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)) = 0, \tag{1.1.5} \]
i.e. a system of $d$ ordinary differential equations of second order that are linear in the second derivatives of the unknown function $u$.
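Let us compute the Euler-Lagrange equations for our preceding three examples:
(1) Here $F_u = 0$, $F_p = \dfrac{\dot u(t)}{\sqrt{1 + \dot u(t)^2}}$, and we get
\[ 0 = \frac{d}{dt} \frac{\dot u(t)}{\sqrt{1 + \dot u(t)^2}} = \frac{\ddot u(t)}{\sqrt{1 + \dot u(t)^2}} - \frac{\dot u(t)^2\, \ddot u(t)}{\big(1 + \dot u(t)^2\big)^{3/2}} = \frac{\ddot u(t)}{\big(1 + \dot u(t)^2\big)^{3/2}}, \]
i.e.
\[ \ddot u(t) = 0, \]
meaning that $u$ has to be a straight line, a fact that we know of course.
(3) For the general example (3), we obtain as Euler-Lagrange equations
\[ 0 = \frac{d}{dt} \frac{\dot u(t)}{\gamma \sqrt{1 + \dot u(t)^2}} + \frac{\gamma_u}{\gamma^2} \sqrt{1 + \dot u(t)^2} = \frac{\ddot u(t)}{\gamma \big(1 + \dot u(t)^2\big)^{3/2}} - \frac{\gamma_t\, \dot u(t)}{\gamma^2 \sqrt{1 + \dot u(t)^2}} - \frac{\gamma_u\, \dot u(t)^2}{\gamma^2 \sqrt{1 + \dot u(t)^2}} + \frac{\gamma_u}{\gamma^2} \sqrt{1 + \dot u(t)^2}, \]
hence
\[ 0 = \ddot u(t) - \frac{\gamma_t}{\gamma}\, \dot u(t) \big(1 + \dot u(t)^2\big) + \frac{\gamma_u}{\gamma} \big(1 + \dot u(t)^2\big). \tag{1.1.6} \]
(2) We just need to insert $\gamma = \sqrt{2 g\, u(t)}$ into (1.1.6) to obtain
\[ 0 = \ddot u(t) + \big(1 + \dot u(t)^2\big) \frac{1}{2 u(t)}. \]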
Actually, (2) is an example of an integrand $F(t, u, \dot u)$ that does not depend explicitly on $t$, i.e. $F_t = 0$. In this case
\[ \frac{d}{dt}\big( F - \dot u F_p \big) = \dot u F_u + \ddot u F_p - \ddot u F_p - \dot u \frac{d}{dt} F_p = \dot u \Big( F_u - \frac{d}{dt} F_p \Big) = 0 \quad \text{by (1.1.4)}, \]
and hence every solution of the Euler-Lagrange equation (1.1.4) satisfies
\[ F(t, u(t), \dot u(t)) - \dot u(t)\, F_p(t, u(t), \dot u(t)) = \text{constant}. \tag{1.1.7} \]
Conversely, every solution of (1.1.7), with the exception of $\dot u = 0$, i.e. $u = $ constant, also satisfies (1.1.4).
In the case of example (2), we have $F = \sqrt{\dfrac{1 + \dot u^2}{2 g u}}$, and (1.1.7) becomes
\[ \frac{1}{2 g \lambda^2} = u \big(1 + \dot u^2\big), \]
if we denote the constant in (1.1.7) by $\lambda$.
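For example (2), relation (1.1.7) says that $u(1 + \dot u^2)$ is constant along a solution. A quick numerical sanity check, using the classical fact (not derived in this section) that the cycloid $t = r(\theta - \sin\theta)$, $u = r(1 - \cos\theta)$ solves the brachystochrone problem: the sketch below evaluates $u(1 + \dot u^2)$ at several parameter values. The function name `cycloid_invariant`, the radius $r$, and the sample values of $\theta$ are illustrative assumptions.

```python
import math

def cycloid_invariant(theta, r):
    """Return u * (1 + u'^2) on the cycloid t = r(theta - sin theta), u = r(1 - cos theta)."""
    u = r * (1.0 - math.cos(theta))
    du_dt = math.sin(theta) / (1.0 - math.cos(theta))  # (du/dtheta) / (dt/dtheta)
    return u * (1.0 + du_dt ** 2)

r = 0.75
values = [cycloid_invariant(theta, r) for theta in (0.3, 1.0, 2.0, 3.0, 5.0)]
print(values)
```

All values should coincide with $2r$ up to rounding, in agreement with the conservation law.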
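In all examples (1)-(3), we actually had $d = 1$. If one modifies e.g. (1) and seeks a curve $g(t) = (g_1(t), \ldots, g_d(t)) \subset \mathbb{R}^d$ connecting two given points $g(a)$ and $g(b)$, our variational problem becomes
\[ I(g) = \int_a^b |\dot g(t)|\,dt = \int_a^b \Big( \sum_{i=1}^{d} \Big( \frac{d}{dt} g_i(t) \Big)^2 \Big)^{1/2} dt. \]
The Euler-Lagrange equations in this case are
\[ 0 = \frac{d}{dt} \frac{\dot g_i}{\big( \sum_{j=1}^{d} \dot g_j^2 \big)^{1/2}} = \frac{\ddot g_i \sum_{j=1}^{d} (\dot g_j)^2 - \dot g_i \sum_{j=1}^{d} \dot g_j \ddot g_j}{\big( \sum_{j=1}^{d} \dot g_j^2 \big)^{3/2}} \tag{1.1.8} \]
for $i = 1, \ldots, d$.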
We now recall that any smooth curve $g(t) \subset \mathbb{R}^d$ may be parameterized by arc-length, i.e.
\[ \Big| \frac{d}{dt} g(t) \Big| = 1. \tag{1.1.9} \]
We also know that a reparameterization of a curve $g(t)$ does not change its arc-length $I(g)$. Consequently, we may assume (1.1.9) in (1.1.8). The latter then becomes
\[ \frac{d}{dt} \Big( \frac{d}{dt} g_i(t) \Big) = 0 \quad \text{for } i = 1, \ldots, d, \]
so that we see again that a length minimizing curve in $\mathbb{R}^d$ is a straight line.
Often, one also meets the task of minimizing
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \]
subject to some constraint, for example
\[ S(u) = \int_a^b G(t, u(t), \dot u(t))\,dt = c_0 \quad \text{(a given constant)}. \tag{1.1.10} \]
As in the case of finite dimensional minimization problems, one then finds a Lagrange multiplier $\lambda$ with
\[ 0 = \frac{d}{ds} \big( I(u + s\eta) + \lambda S(u + s\eta) \big)\Big|_{s=0} \tag{1.1.11} \]
for all $\eta \in C_0^1([a,b], \mathbb{R}^d)$. This leads to the Euler-Lagrange equations
\[ \frac{d}{dt} \big( F_p(t, u(t), \dot u(t)) + \lambda G_p(t, u(t), \dot u(t)) \big) - \big( F_u(t, u(t), \dot u(t)) + \lambda G_u(t, u(t), \dot u(t)) \big) = 0. \tag{1.1.12} \]
Example. We wish to minimize
\[ I(u) = \int_a^b \dot u(t)^2\,dt \]
under the constraint
\[ S(u) = \int_a^b u(t)^2\,dt = 1, \tag{1.1.13} \]
with $u(a) = 0 = u(b)$. (1.1.12) becomes
\[ \ddot u(t) - \lambda u(t) = 0. \tag{1.1.14} \]
Thus, $\lambda$ is an eigenvalue for the differential operator $d^2/dt^2$ under the Dirichlet boundary conditions $u(a) = 0 = u(b)$. Of course, this example can easily be generalized.
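To make the eigenvalue interpretation concrete, one may take the interval $[a,b] = [0,\pi]$ (our choice, for illustration only). The functions $u_k(t) = \sqrt{2/\pi}\,\sin(kt)$ then satisfy the boundary conditions and the constraint (1.1.13), and solve (1.1.14) with $\lambda = -k^2$. The sketch below checks this numerically with a midpoint rule and a finite-difference second derivative. All names in it are hypothetical helpers.

```python
import math

A, B = 0.0, math.pi  # illustrative choice of the interval [a, b]

def u(t, k):
    return math.sqrt(2.0 / math.pi) * math.sin(k * t)

def du(t, k):
    return math.sqrt(2.0 / math.pi) * k * math.cos(k * t)

def integrate(f, a, b, n=4000):
    """Midpoint rule for int_a^b f(t) dt."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

def check_mode(k, t=1.0, h=1e-3):
    """Return (S(u_k), I(u_k), residual of u'' - lambda*u at t) with lambda = -k^2."""
    S = integrate(lambda s: u(s, k) ** 2, A, B)
    I = integrate(lambda s: du(s, k) ** 2, A, B)
    lam = -float(k * k)
    second_derivative = (u(t + h, k) - 2.0 * u(t, k) + u(t - h, k)) / (h * h)
    return S, I, second_derivative - lam * u(t, k)

modes = {k: check_mode(k) for k in (1, 2, 3)}
for k, (S, I, residual) in modes.items():
    print(k, S, I, residual)
```

In this sketch $S(u_k)$ should be close to 1, $I(u_k)$ close to $k^2 = -\lambda$, and the residual close to 0, so among these candidates the smallest value of $I$ is attained for $k = 1$.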
Summary. We seek solutions of the variational problem
\[ I(u) \to \min, \]
with
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \]
for given $F$ and unknown $u : [a,b] \to \mathbb{R}^d$. If $F$ and $u$ are differentiable, one may consider some kind of partial derivative, namely
\[ \delta I(u, \eta) := \frac{d}{ds} I(u + s\eta)\Big|_{s=0} \]
for $\eta \in C_0^1([a,b], \mathbb{R}^d)$. For a minimizer $u$ then
\[ \delta I(u, \eta) = 0 \quad \text{for all such } \eta. \]
If $F$ and $u$ are of class $C^2$, this leads to the Euler-Lagrange equations
\[ \frac{d}{dt} F_p(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)) = 0. \]
The classical strategy for solving the problem
\[ I(u) \to \min \]
consists in solving the Euler-Lagrange equations and then investigating whether a solution of the equations is a minimum of $I$ or not.
1.2 The idea of the direct methods and some regularity results
So far, our formulation of the variational problem
\[ I(u) \to \min \]
has been rather vague, because we did not specify in which class of functions $u$ we are trying to minimize $I$. The only things we did prescribe were boundary conditions of Dirichlet type, i.e. we prescribed the values $u(a)$ and $u(b)$ for our functions $u : [a,b] \to \mathbb{R}^d$.
Because of our derivation of the Euler-Lagrange equations in the preceding section, it would be desirable to have a solution $u$ of class $C^2$. So one might want to specify in advance that one minimizes $I$ only among functions of class $C^2$. This, however, directly leads to the question whether $I$ achieves its infimum among functions of class $C^2$ (with prescribed Dirichlet boundary conditions, as always) or not, and if it does, whether the infimum of $I$ in some larger class of functions, say $C^1$, could be strictly smaller than the one in $C^2$. In the light of this question, it might be preferable to minimize $I$ in the class of all functions $u$ for which
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \]
is meaningful. Here, we assume that $F(t, u, p)$ is continuous in $u$ and $p$ and measurable in $t$. For this purpose, one needs the class of functions for which the derivative $\dot u(t)$ exists almost everywhere and is finite. This is the class
\[ AC([a,b]) \]
of absolutely continuous functions. A function $u \in AC([a,b])$ satisfies for $t_1, t_2 \in [a,b]$
\[ u(t_2) - u(t_1) = \int_{t_1}^{t_2} \dot u(t)\,dt. \]
Note that $F(t, u(t), \dot u(t))$ is a measurable function of $t$ for $u \in AC$ by our assumptions on $F$ and the fact that the composition of a measurable and a continuous function is measurable†. The idea of the direct methods in the calculus of variations, as opposed to the classical methods described in the preceding section, then consists in minimizing $I$ in a class of functions like $AC([a,b])$ and then trying to show that a solution $u$, because of its minimizing character, actually enjoys better regularity properties, for example to be of class $C^2$, provided $F$ satisfies suitable assumptions.
This minimizing procedure will be treated later J, since we want to
return to the classical theory for a while. Nevertheless, even for the
classical theory, one occasionally needs certain regularity results, and
therefore we now briefly address the regularity theory. To simplify our
notation, we put / := [a,6]. A class of functions intermediate between
C
1
and AC is
D
1
( / , E
d
) := {u : / M
d
, u continuous and piecewise
continuously differentiable, i.e. there exist
a = to < t\ < ... < t
m
= b with u G
C\[t
j
,t
j
+
1
],R
d
)tovj = 0,...,m-l}.
u G D
1
then has left and right derivatives u~(tj) and u+(tj) even at the
points where the derivative is discontinuous, and
t tjU t j i "
t Lebesgue integration theory is summarized in Chapter 1 of Part II. The required
composition property is stated there as Theorem 1.1.2. Here, this point will not
be pursued or used any further.
t See Chapter 4 of Part II.
We shall use the same letter J to denote t he functional to be minimized and t he
domain of definition of t he functions, inserted into this functional. This conforms
to standard notations. The reader should be aware of this and not be confused.
12 The classical theory
Examples
Example 1.2.1. $[a,b] = [-1,1]$, $d = 1$,
\[ I(u) = \int_{-1}^{1} \left(1 - \dot u(t)^2\right)^2 dt, \]
\[ u(-1) = 1 = u(1). \]
A minimizer is
\[ u(t) = |t| \in D^1(I,\mathbb{R}), \]
which is not of class $C^1$. The minimizer of $I$ is not unique (exercise:
determine all minimizers), but none of them is of class $C^1$.
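As a numerical illustration of Example 1.2.1, the following minimal sketch evaluates $I$ with a midpoint rule on three functions satisfying $u(-1) = 1 = u(1)$. The helper names `integral` and `I` and the two comparison functions $u \equiv 1$ and $u(t) = t^2$ are assumptions made for this illustration only; they do not appear in the text.

```python
def integral(f, a, b, n=2000):
    """Midpoint rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def I(du):
    """I(u) = int_{-1}^{1} (1 - u'(t)^2)^2 dt, given the derivative du of u."""
    return integral(lambda t: (1.0 - du(t) ** 2) ** 2, -1.0, 1.0)


# u(t) = |t|: the derivative is -1 for t < 0 and +1 for t > 0.
I_abs = I(lambda t: -1.0 if t < 0 else 1.0)

# u(t) = 1: also satisfies u(-1) = 1 = u(1); the derivative vanishes.
I_const = I(lambda t: 0.0)

# u(t) = t^2: a smooth competitor with the same boundary values.
I_smooth = I(lambda t: 2.0 * t)

print(I_abs, I_const, I_smooth)
```

Since the integrand is nonnegative, the value $0$ attained by $|t|$ is the infimum, while both smooth competitors give strictly larger values.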
Example 1.2.2. $[a,b] = [-1,1]$, $d = 1$,
\[ I(u) = \int_{-1}^{1} (1 - \dot u(t))^2\, u(t)^2\, dt, \]
\[ u(-1) = 0, \quad u(1) = 1. \]
Here, the unique minimizer is
\[ u(t) = \begin{cases} 0 & \text{for } -1 \le t \le 0 \\ t & \text{for } 0 < t \le 1, \end{cases} \]
which again is of class $D^1$, but not $C^1$.
Example 1.2.3. $[a,b] = [-1,1]$, $d = 1$,
\[ I(u) = \int_{-1}^{1} (2t - \dot u(t))^2\, u(t)^2\, dt, \]
\[ u(-1) = 0, \quad u(1) = 1. \]
The unique minimizer is
\[ u(t) = \begin{cases} 0 & \text{for } -1 \le t \le 0 \\ t^2 & \text{for } 0 < t \le 1, \end{cases} \]
which is of class $C^1$, but not of class $C^2$.
Theorem 1.2.1. Let $F(t,u,p)$ be of class $C^1$ w.r.t. $u$ and $p$ and
continuous w.r.t. $t$ ($F : I \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$), and let $u \in AC(I,\mathbb{R}^d)$ be a
solution of
\[ \delta I(u,\eta) = \int_a^b \{F_u(t,u,\dot u)\cdot\eta + F_p(t,u,\dot u)\cdot\dot\eta\}\,dt = 0 \tag{1.2.1} \]
for all $\eta \in AC_0(I,\mathbb{R}^d)$ (i.e. $\eta \in AC(I,\mathbb{R}^d)$, and we require that if $I = [a,b]$,
there exist $a < a_1 < b_1 < b$ with $\eta(x) = 0$ if $x$ is not contained in
$[a_1,b_1]$, as in the definition of $C_0^\infty([a,b],\mathbb{R}^d)$). Then for almost all points
in $I$
\[ \frac{d}{dt} F_p(t,u(t),\dot u(t)) = F_u(t,u(t),\dot u(t)) \tag{1.2.2} \]
(note, however, that the derivative on the left hand side cannot be
computed by the chain rule). If $u \in C^1(I,\mathbb{R}^d)$, (1.2.2) holds for all $t \in I$,
and if $u \in D^1(I,\mathbb{R}^d)$, at those points $t_j$ where $\dot u(t_j)$ is discontinuous
\[ \left(\frac{d}{dt}\right)^{-} F_p(t_j,u(t_j),\dot u^-(t_j)) = F_u(t_j,u(t_j),\dot u^-(t_j)), \]
and analogously for the right derivative.
Remark. It actually suffices to assume (1.2.1) for all $\eta \in C_0^\infty(I,\mathbb{R}^d)$,
because functions in $AC_0$ may be approximated by $C_0^\infty$ functions. If
$u \in C^1$ or $D^1$, the proof anyway only requires (1.2.1) for $\eta \in C_0^1$ or $D_0^1$,
respectively (where $D_0^1$ is defined analogously to $C_0^1$).
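Proof. We have, omitting the obvious arguments of $F$, $F_u$, etc.,
\[ \int_a^b F_u\cdot\eta\,dt = \int_a^b \frac{d}{dt}\left(\int_a^t F_u\,dy\right)\cdot\eta\,dt = -\int_a^b \left(\int_a^t F_u\,dy\right)\cdot\dot\eta\,dt. \]
(1.2.1) then implies
\[ 0 = \int_a^b \left[F_p - \int_a^t F_u\,dy\right]\cdot\dot\eta\,dt. \]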
We now make use of:
Lemma 1.2.1. Let $h \in L^1(I,\mathbb{R})$ satisfy
\[ \int_a^b h(t)\dot\varphi(t)\,dt = 0 \quad\text{for all } \varphi \in AC_0(I,\mathbb{R}). \tag{1.2.3} \]
Then there exists a constant $c \in \mathbb{R}$ with
\[ h(t) = c \quad\text{for almost all } t \in I. \]
Remark. It actually suffices to assume (1.2.3) for all $\varphi \in C_0^\infty(I,\mathbb{R})$. If
$h \in C^1$, one directly sees from the proof that $\varphi \in C_0^1$ suffices.
Proof. We put
\[ c := \frac{1}{b-a}\int_a^b h(t)\,dt \]
and
\[ \varphi(t) := \int_a^t (h(y) - c)\,dy. \]
Then
\[ \varphi(a) = 0, \quad\text{and}\quad \varphi(b) = \int_a^b \dot\varphi(t)\,dt = 0. \tag{1.2.4} \]
Equation (1.2.3) implies
\[ 0 = \int_a^b (h(y) - c)h(y)\,dy = \int_a^b (h(y) - c)^2\,dy \]
because of (1.2.4). This implies the claim.
q.e.d.
We now may complete the proof of Theorem 1.2.1:
By Lemma 1.2.1 there exists $c \in \mathbb{R}^d$ with
\[ F_p(t,u(t),\dot u(t)) = \int_a^t F_u(y,u(y),\dot u(y))\,dy + c \tag{1.2.5} \]
for almost all $t \in I$. Therefore, $F_p$ is of class $AC$, and differentiating
(1.2.5) gives (1.2.2). The claims for $u \in C^1$ or $D^1$ are obvious from the
proof.
q.e.d.
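Theorem 1.2.2. Let $F : I \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be of class $C^1$, and let $F_p$ be also
of class $C^1$. Suppose
\[ \det\left(F_{p^ip^j}(t,u(t),\dot u(t))\right)_{i,j=1,\dots,d} \ne 0 \quad\text{for all } t \in I \]
for a solution $u \in C^1(I,\mathbb{R}^d)$ of
\[ \delta I(u,\eta) = 0 \quad\text{for all } \eta \in C_0^1(I,\mathbb{R}^d). \]
Then $u$ is of class $C^2$.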
Proof. We define
\[ \phi : I \times \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d \]
via
\[ \phi(t,u,p,q) := F_p(t,u,p) - q. \]
Our assumption $\det F_{pp} \ne 0$ makes it possible to apply the implicit
function theorem to conclude that
\[ \phi(t,u,p,q) = 0 \]
may be uniquely solved w.r.t. $p$ near $u_0 = u(t_0)$, $p_0 = \dot u(t_0)$,
$q_0 = F_p(t_0,u_0,p_0)$ for any $t_0 \in I$. Thus, there exists a neighbourhood $U$ of
$(t_0,u_0,q_0)$ such that for each $(t,u,q) \in U$, $\phi = 0$ has a unique solution
$p = \varphi(t,u,q)$ and that $\varphi : U \to \mathbb{R}^d$ is of class $C^1$. Since we already know
a solution of $\phi = 0$, namely $(t,u(t),\dot u(t),F_p(t,u(t),\dot u(t)))$, the uniqueness
of the solution $\varphi$ implies
\[ \dot u(t) = \varphi(t,u(t),F_p(t,u(t),\dot u(t))) \quad\text{for } t \text{ near } t_0. \]
Since $\varphi$ is of class $C^1$, and since $t \mapsto F_p(t,u(t),\dot u(t))$ is of class $C^1$
by (1.2.5), so then is $\dot u(t)$, hence $u \in C^2$. Since $t_0 \in I$ was
arbitrary, $u \in C^2(I,\mathbb{R}^d)$.
q.e.d.
Theorem 1.2.3. Let $F$ satisfy the assumptions of Theorem 1.2.2, and
in addition assume that $F_{pp}$ is (positive or negative) definite on $\Omega \times \mathbb{R}^d$,
where $\Omega \subset \mathbb{R}^{d+1}$ contains $\{(t,u(t)) : t \in I\}$. Let $u \in AC(I,\mathbb{R}^d)$ satisfy
\[ \delta I(u,\eta) = 0 \quad\text{for all } \eta \in AC_0(I,\mathbb{R}^d) \]
(assume that $F_u(t,u(t),\dot u(t))$ and $F_p(t,u(t),\dot u(t))$ are integrable). Then
\[ u \in C^2(I,\mathbb{R}^d). \]
Proof. Since the uniqueness result of the implicit function theorem is
only local, it cannot be applied anymore because $\dot u(t)$ might be
discontinuous. We thus need a global argument. Thus, assume that for given
$(t,u,q) \in \Omega \times \mathbb{R}^d$, there are two solutions $p_1, p_2 \in \mathbb{R}^d$ of $\phi(t,u,p,q) = 0$,
i.e.
\[ q = F_p(t,u,p_1) \quad\text{and}\quad q = F_p(t,u,p_2). \]
Thus
\[ \int_0^1 F_{pp}(t,u,p_1 + s(p_2 - p_1))\,ds\,(p_2 - p_1) = 0. \tag{1.2.6} \]
By our assumption on $F_{pp}$, the matrix $\int_0^1 F_{pp}\,ds$ in (1.2.6) is definite
and hence invertible, hence $p_2 = p_1$, hence uniqueness.
Using this global uniqueness together with the existence result of
the implicit function theorem, we now see that for any $(t,u,q)$ in a
sufficiently small neighbourhood of $(t_0,u_0,q_0)$ ($t_0 \in I$, $u_0 = u(t_0)$,
$q_0 = F_p(t_0,u_0,p_0)$, $p_0 = \dot u(t_0)$), there is a unique solution $\varphi(t,u,q)$
of
\[ F_p(t,u,p) - q = 0 \]
and $\varphi$ is of class $C^1$. Thus, as in the proof of Theorem 1.2.2,
\[ \dot u(t) = \varphi(t,u(t),F_p(t,u(t),\dot u(t))) \]
for almost all $t$ in a neighbourhood of $t_0$. Since $u(t)$ and $F_p(t,u(t),\dot u(t))$
are absolutely continuous w.r.t. $t$ (the latter by Theorem 1.2.1), $\dot u(t)$
coincides for almost all $t$ near $t_0$ with an absolutely continuous function
$v(t)$. We put
\[ w(t) := u(t_0) + \int_{t_0}^t v(y)\,dy. \]
$w$ then is of class $C^1$. Since $u$ is absolutely continuous, by a theorem of
Lebesgue
\[ u(t) = u(t_0) + \int_{t_0}^t \dot u(y)\,dy. \]
Since $v = \dot u$ almost everywhere, we conclude $u = w$, hence $u \in C^1$ near
$t_0$, which was arbitrary in $I$. Theorem 1.2.2 then gives $u \in C^2$.
q.e.d.
Corollary 1.2.1. Under the assumptions of Theorem 1.2.3, any $AC$-solution
of $\delta I(u,\eta) = 0$ for all $\eta \in AC_0(I,\mathbb{R}^d)$ is a solution of the
Euler-Lagrange equations
\[ \frac{d}{dt} F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0 \tag{1.2.7} \]
or equivalently of
\[ F_{pp}(t,u(t),\dot u(t))\ddot u(t) + F_{pu}(t,u(t),\dot u(t))\dot u(t) + F_{pt}(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0. \tag{1.2.8} \]
The same holds under the assumptions of Theorem 1.2.2 for a $C^1$-solution
of $\delta I(u,\eta) = 0$ for all $\eta \in C_0^1(I,\mathbb{R}^d)$.
q.e.d.
Theorem 1.2.4. Let $F : I \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be of class $C^k$, and let $F_p$
also be of class $C^k$, $k \in \{2,3,\dots,\infty\}$. Suppose $u$ is of class $C^1$ and a
solution of $\delta I(u,\eta) = 0$ for all $\eta \in C_0^1(I,\mathbb{R}^d)$, and suppose
\[ \det\left(F_{p^ip^j}(t,u(t),\dot u(t))\right)_{i,j=1,\dots,d} \ne 0 \quad\text{for all } t \in I. \tag{1.2.9} \]
Then $u \in C^{k+1}(I,\mathbb{R}^d)$. (The same result holds if we assume that $u \in C^1$
is a solution of the Euler-Lagrange equations (1.2.8).)
Proof. By Theorem 1.2.2, $u$ is of class $C^2$, and by Corollary 1.2.1, it
solves (1.2.8). Because of (1.2.9), $F_{pp}(t,u(t),\dot u(t))$ is an invertible matrix,
hence
\[ \ddot u(t) = F_{pp}^{-1}(t,u(t),\dot u(t)) \left\{-F_{pu}(t,u(t),\dot u(t))\dot u(t) - F_{pt}(t,u(t),\dot u(t)) + F_u(t,u(t),\dot u(t))\right\}. \tag{1.2.10} \]
Let now $j \le k$, and suppose inductively $u \in C^j$. The right hand side of
(1.2.10) then is of class $C^{j-1}$. Therefore, $\ddot u$ is of class $C^{j-1}$, hence $u$ is
of class $C^{j+1}$.
q.e.d.
The preceding proof most clearly shows the importance of the assumption
$\det\left(F_{p^ip^j}(t,u(t),\dot u(t))\right) \ne 0$ that already occurred in the proof
of Theorem 1.2.2. Namely, it implies that the Euler-Lagrange equations
(1.2.8) can be solved for $\ddot u$ in terms of $u$ and $\dot u$.
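To make the role of the invertibility condition concrete, the following minimal sketch evaluates the right hand side of (1.2.10) in the scalar case $d = 1$, where $F_{pp}^{-1}$ is simply division by $F_{pp}$. The function name `u_ddot` and the two sample integrands are assumptions chosen for illustration; the partial derivatives are supplied by hand.

```python
def u_ddot(F_pp, F_pu, F_pt, F_u, t, u, p):
    """Right hand side of (1.2.10) for d = 1 at (t, u, p), where p = u'(t)."""
    a = F_pp(t, u, p)
    if a == 0.0:
        # Without the invertibility condition, (1.2.8) cannot be solved for u''.
        raise ValueError("F_pp vanishes at this point")
    return (-F_pu(t, u, p) * p - F_pt(t, u, p) + F_u(t, u, p)) / a


# F = (p^2 - u^2) / 2: F_pp = 1, F_pu = F_pt = 0, F_u = -u, hence u'' = -u.
osc = u_ddot(lambda t, u, p: 1.0, lambda t, u, p: 0.0,
             lambda t, u, p: 0.0, lambda t, u, p: -u, 0.0, 0.7, 0.2)

# F = (1 + u^2) p^2 / 2: F_pp = 1 + u^2, F_pu = 2 u p, F_pt = 0, F_u = u p^2,
# hence u'' = -u p^2 / (1 + u^2).
var = u_ddot(lambda t, u, p: 1.0 + u * u, lambda t, u, p: 2.0 * u * p,
             lambda t, u, p: 0.0, lambda t, u, p: u * p * p, 0.0, 2.0, 3.0)

print(osc, var)
```

In both cases the second derivative is determined by the values of $u$ and $\dot u$ alone, which is what drives the induction in the proof of Theorem 1.2.4.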
Corollary 1.2.2. If under the assumption of Theorem 1.2.3, $F$ and $F_p$
are of class $C^k$, then a solution $u$ of $\delta I(u,\eta) = 0$ for all $\eta \in AC_0$ is of
class $C^{k+1}$.
q.e.d.
Summary. If one wants to solve
\[ I(u) \to \min \]
by a direct minimization procedure, it is preferable to admit a class of
comparison functions $u$ that is as large as possible. $AC(I,\mathbb{R}^d)$ seems to
be a good choice, because this is the largest class for which
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \]
is well defined, assuming $F(t,u,p)$ to be continuous in $u$ and $p$ and
measurable in $t$. However, if one then finds a minimizer $u$, it might not
be a solution of the Euler-Lagrange equations, because it is not regular
enough. If the invertibility condition $\det F_{pp} \ne 0$ is satisfied, however,
one may show that a minimizer $u$ is as regular as $F$ allows. Namely, if
$F$ and $F_p$ are of class $C^k$, $k \in \{1,2,\dots,\infty\}$, then $u$ is of class $C^{k+1}$.
Examples show that without such an invertibility condition, regularity
need not hold. This invertibility condition $\det F_{pp} \ne 0$ implies that the
Euler-Lagrange equations allow the expression of $\ddot u(t)$ in terms of $u(t)$
and $\dot u(t)$.
1.3 The second variation. Jacobi fields
We assume that $u \in D^1(I,\mathbb{R}^d)$ is a critical point of
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt, \]
i.e.
\[ \delta I(u,\eta) = 0 \quad\text{for all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.1} \]
We recall that
\[ \delta I(u,\eta) := \frac{d}{ds} I(u + s\eta)\Big|_{s=0}, \]
and $\delta I(u,\eta) = 0$ is equivalent to $s = 0$ being a critical point of the
function
\[ f(s) = I(u + s\eta). \]
If we want to decide if a given solution $u$ minimizes $I$ instead of just
being a critical point, we immediately see that a necessary condition
would be
\[ f''(0) \ge 0 \tag{1.3.2} \]
for the above function $f$ and all $\eta \in D_0^1(I,\mathbb{R}^d)$. Namely, by Taylor's
theorem, since $f'(0) = 0$,
\[ f(s) - f(0) = \tfrac12 s^2 f''(0) + o(s^2) \quad\text{for } s \to 0. \]
More precisely, (1.3.2) is needed for $u$ to minimize $I$ when compared with
$u + s\eta$ for sufficiently small $s$. In other words, we want $u$ to minimize $I$
in a $D^1$-neighbourhood of itself, i.e. among functions
\[ v \in D^1(I,\mathbb{R}^d) \]
with
\[ u(a) = v(a), \quad u(b) = v(b) \quad\text{and} \tag{1.3.3} \]
\[ \sup_{t \in I}\left(|u(t) - v(t)| + |\dot u^-(t) - \dot v^-(t)| + |\dot u^+(t) - \dot v^+(t)|\right) < \epsilon \tag{1.3.4} \]
for some $\epsilon > 0$. (Note: It is not clear that $\epsilon$ may be chosen independently
of $v$.) We define the second variation of $I$ at $u$ in the direction $\eta \in D_0^1$
as
\[ \delta^2 I(u,\eta) := \frac{d^2}{ds^2} I(u + s\eta)\Big|_{s=0}. \]
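In order that this variation exists, we require for the rest of the section
that $F$ is of class $C^2$. We then compute
\[ \begin{aligned} \delta^2 I(u,\eta) &= \frac{d^2}{ds^2}\int_a^b F(t,u(t) + s\eta(t),\dot u(t) + s\dot\eta(t))\,dt\Big|_{s=0} \\ &= \int_a^b \big\{F_{p^ip^j}(t,u(t),\dot u(t))\dot\eta_i(t)\dot\eta_j(t) + 2F_{p^iu^j}(t,u(t),\dot u(t))\dot\eta_i(t)\eta_j(t) \\ &\qquad\quad + F_{u^iu^j}(t,u(t),\dot u(t))\eta_i(t)\eta_j(t)\big\}\,dt. \end{aligned} \tag{1.3.5} \]
Here, and in the sequel, we employ the standard summation conventions,
e.g.
\[ F_{p^ip^j}\dot\eta_i\dot\eta_j = \sum_{i,j=1}^d F_{p^ip^j}\dot\eta_i\dot\eta_j. \]
We abbreviate (1.3.5) as
\[ \delta^2 I(u,\eta) = \int_a^b \{F_{pp}\dot\eta\dot\eta + 2F_{pu}\dot\eta\eta + F_{uu}\eta\eta\}\,dt. \tag{1.3.6} \]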
Our preceding considerations imply:
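Theorem 1.3.1. Suppose $F \in C^2(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$ and let $u \in D^1(I,\mathbb{R}^d)$
satisfy $I(u) \le I(v)$ for all $v$ with (1.3.3), (1.3.4). Then
\[ \delta^2 I(u,\eta) \ge 0 \quad\text{for all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.7} \]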
We now put, for given $u$,
\[ \phi(t,\eta,\pi) := F_{p^ip^j}(t,u(t),\dot u(t))\pi_i\pi_j + 2F_{p^iu^j}(t,u(t),\dot u(t))\pi_i\eta_j + F_{u^iu^j}(t,u(t),\dot u(t))\eta_i\eta_j, \]
and we define the accessory variational problem for $I(u) \to \min$ as
\[ Q(\eta) := \int_a^b \phi(t,\eta(t),\dot\eta(t))\,dt \to \min \quad\text{among all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.8} \]
If $u$ satisfies the assumptions of Theorem 1.3.1, then
\[ Q(\eta) \ge 0 \quad\text{for all } \eta \in D_0^1, \tag{1.3.9} \]
and hence $\eta \equiv 0$ is a trivial solution of (1.3.8). We are interested in the
question whether there are others. The Euler-Lagrange equations for
(1.3.8) are
\[ \frac{d}{dt}\phi_\pi(t,\eta(t),\dot\eta(t)) = \phi_\eta(t,\eta(t),\dot\eta(t)), \tag{1.3.10} \]
i.e.
\[ \frac{d}{dt}\left(F_{pp}(t,u(t),\dot u(t))\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\eta(t)\right) = F_{up}(t,u(t),\dot u(t))\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\eta(t). \tag{1.3.11} \]
Since $u$ is considered as given, our first observation is that (1.3.11) is a
linear homogeneous system of second order equations for the unknown
$\eta$. These equations are called Jacobi equations.
Definition 1.3.1. A solution $\eta \in C^2(I,\mathbb{R}^d)$ of the Jacobi equations
(1.3.11) is called a Jacobi field along $u(t)$.
Lemma 1.3.1. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, $\det F_{pp}(t,u(t),\dot u(t)) \ne 0$
for all $t \in I$, $u \in C^2(I,\mathbb{R}^d)$. Then any solution $\eta \in AC_0(I,\mathbb{R}^d)$ of
$\delta Q(\eta,\varphi) = 0$ for all $\varphi \in AC_0(I,\mathbb{R}^d)$ is of class $C^2$ and hence a Jacobi
field.
Proof. We apply Theorem 1.2.3. For that purpose, we note that
\[ \phi_{\pi\pi}(t,\eta(t),\dot\eta(t)) = 2F_{pp}(t,u(t),\dot u(t)) \quad\text{for all } t \text{ and } \eta, \]
and so the assumption $\det F_{pp}(t,u(t),\dot u(t)) \ne 0$, that is seemingly weaker
than the one of Theorem 1.2.3, indeed suffices to apply that Theorem.
q.e.d.
We now derive the so-called necessary Legendre condition:
Theorem 1.3.2. Under the assumption of Theorem 1.3.1, i.e. $u \in D^1(I,\mathbb{R}^d)$
minimizes $I$ in the sense described there, we have that
\[ F_{pp}(t,u(t),\dot u(t)) \text{ is positive semidefinite for all } t \in I, \]
i.e.
\[ F_{p^ip^j}(t,u(t),\dot u(t))\xi^i\xi^j \ge 0 \quad\text{for all } \xi = (\xi^1,\dots,\xi^d) \in \mathbb{R}^d. \]
(At points where $\dot u(t)$ is discontinuous, this holds for the left and right
derivatives.)
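Proof. We may assume that $t_0$ lies in the interior of $I$ and that $\dot u$ is
continuous at $t_0$. The result at the points where $\dot u$ jumps then follows by
taking appropriate limits, and likewise at $t_0 = a, b$. We then consider
$0 < \epsilon < \min(t_0 - a, b - t_0)$ and define $\eta \in D_0^1(I,\mathbb{R}^d)$ by
\[ \eta(t) := \begin{cases} 0 & \text{for } a \le t \le t_0 - \epsilon \text{ and } t_0 + \epsilon \le t \le b \\ \epsilon\xi & \text{for } t = t_0 \\ \text{linear} & \text{for } t_0 - \epsilon \le t \le t_0 \text{ and for } t_0 \le t \le t_0 + \epsilon \end{cases} \]
for given $\xi \in \mathbb{R}^d$. Then
\[ \dot\eta(t) = \begin{cases} 0 & \text{for } a \le t < t_0 - \epsilon \text{ or } t_0 + \epsilon < t \le b \\ \xi & \text{for } t_0 - \epsilon < t < t_0 \\ -\xi & \text{for } t_0 < t < t_0 + \epsilon. \end{cases} \]
We apply Theorem 1.3.1 to obtain
\[ 0 \le \delta^2 I(u,\eta) = \int_{t_0-\epsilon}^{t_0+\epsilon} F_{p^ip^j}(t,u(t),\dot u(t))\xi^i\xi^j\,dt + O(\epsilon^2) \quad\text{for } \epsilon \to 0, \]
since all other terms contain a factor $\epsilon$, and we integrate over an interval
of length $2\epsilon$. Hence
\[ F_{p^ip^j}(t_0,u(t_0),\dot u(t_0))\xi^i\xi^j = \lim_{\epsilon \to 0}\frac{1}{2\epsilon}\int_{t_0-\epsilon}^{t_0+\epsilon} F_{p^ip^j}(t,u(t),\dot u(t))\xi^i\xi^j\,dt \ge 0. \]
q.e.d.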
The Jacobi equations and the notion of Jacobi fields are meaningful
for arbitrary solutions of the Euler-Lagrange equations, not only for
minimizing ones. In fact, Jacobi fields are solutions of the linearized
Euler-Lagrange equations. Namely:
Theorem 1.3.3. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, and let $u_s(t)$ be a family
of $C^2$-solutions of the Euler-Lagrange equations
\[ \frac{d}{dt}F_p(t,u_s(t),\dot u_s(t)) - F_u(t,u_s(t),\dot u_s(t)) = 0, \tag{1.3.12} \]
with $u_s$ depending differentiably on a parameter $s \in (-\epsilon,\epsilon)$. Then
\[ \eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0} \]
is a Jacobi field along $u = u_0$.
Proof. We differentiate (1.3.12) w.r.t. $s$ at $s = 0$ to obtain
\[ \frac{d}{dt}\left(F_{pp}(t,u(t),\dot u(t))\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\eta(t)\right) - F_{up}(t,u(t),\dot u(t))\dot\eta(t) - F_{uu}(t,u(t),\dot u(t))\eta(t) = 0, \]
i.e. the Jacobi equation (1.3.11). q.e.d.
Lemma 1.3.2. Let $a \le a_1 < a_2 \le b$, and let $F$ and $F_p$ be of class $C^2$
in $[a_1,a_2]$, and suppose $\eta \in C^1([a_1,a_2],\mathbb{R}^d)$ is a Jacobi field on $[a_1,a_2]$
with $\eta(a_1) = 0 = \eta(a_2)$. Then
\[ \int_{a_1}^{a_2} \phi(t,\eta(t),\dot\eta(t))\,dt = 0. \tag{1.3.13} \]
Proof. Since $\phi$ is homogeneous of second order in $(\eta,\pi)$, we have
\[ 2\phi(t,\eta,\pi) = \phi_\eta(t,\eta,\pi)\cdot\eta + \phi_\pi(t,\eta,\pi)\cdot\pi. \]
Therefore
\[ 2\int_{a_1}^{a_2} \phi(t,\eta,\dot\eta)\,dt = \int_{a_1}^{a_2} \{\phi_\eta(t,\eta,\dot\eta)\cdot\eta + \phi_\pi(t,\eta,\dot\eta)\cdot\dot\eta\}\,dt. \tag{1.3.14} \]
Comparing (1.3.10) and (1.3.11), we see that $\phi_\pi$ is of class $C^1$ as a
function of $t$. We may hence integrate the last term in (1.3.14) by parts.
Since $\eta(a_1) = 0 = \eta(a_2)$, we obtain
\[ 2\int_{a_1}^{a_2} \phi(t,\eta,\dot\eta)\,dt = \int_{a_1}^{a_2} \left(\phi_\eta(t,\eta,\dot\eta) - \frac{d}{dt}\phi_\pi(t,\eta,\dot\eta)\right)\cdot\eta\,dt = 0, \]
since $\eta$ is a Jacobi field. q.e.d.
As before, let $F$ be of class $C^3$, and let $u(t)$ be a solution of class $C^2$
on $[a,b]$ of the Euler-Lagrange equations
\[ \frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0. \]
Definition 1.3.2. Let $a \le a_1 < a_2 \le b$. We call the parameter value
$a_2$ conjugate to $a_1$ and the point $(a_2,u(a_2))$ conjugate to $(a_1,u(a_1))$ if
there exists a not identically vanishing Jacobi field $\eta$ on $[a_1,a_2]$ with
$\eta(a_1) = 0 = \eta(a_2)$.
We may derive the important result of Jacobi:
Theorem 1.3.4. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$ and suppose $u \in C^2(I,\mathbb{R}^d)$.
Suppose that $F_{pp}(t,u(t),\dot u(t))$ is positive definite on $I$. If there exists $a^*$
with $a < a^* < b$ that is conjugate to $a$, then $u$ cannot be a local minimum
of $I$. More precisely, for any $\epsilon > 0$, there exists $v \in D^1(I,\mathbb{R}^d)$ with
$v(a) = u(a)$, $v(b) = u(b)$,
\[ \sup_{t \in I}\left(|u(t) - v(t)| + |\dot u(t) - \dot v^{\pm}(t)|\right) < \epsilon \]
and
\[ I(v) < I(u). \]
Proof. Let $\eta(t)$ be a nontrivial Jacobi field on $[a,a^*]$ with
$\eta(a) = 0 = \eta(a^*)$. We put
\[ \eta^*(t) := \begin{cases} \eta(t) & \text{for } a \le t \le a^* \\ 0 & \text{for } a^* \le t \le b. \end{cases} \]
Then $\eta^* \in D_0^1(I,\mathbb{R}^d)$, and by Lemma 1.3.2
\[ Q(\eta^*) = \int_a^b \phi(t,\eta^*,\dot\eta^*)\,dt = 0. \]
If $u$ were a local minimum, then by Theorem 1.3.1
\[ 0 \le \delta^2 I(u,\eta) = Q(\eta) \quad\text{for all } \eta \in D_0^1(I,\mathbb{R}^d). \]
Hence $\eta^*$ would be a minimizer of $Q$, hence by Lemma 1.3.1
$\eta^* \in C^2(I,\mathbb{R}^d)$. Since the right derivative of $\eta^*$ at $a^*$ vanishes
($\eta^*$ is identically zero on $[a^*,b]$), then
\[ \dot\eta^*(a^*) = 0. \]
Since also $\eta^*(a^*) = 0$, and since $\eta^*$ solves the Jacobi equation, a (linear)
second order ordinary differential equation, the uniqueness theorem for
solutions of such equations implies
\[ \eta^* \equiv 0, \]
a contradiction, because by assumption $\eta$ does not vanish identically.
Hence $u$ cannot be a local minimizer.
q.e.d.
In words, Theorem 1.3.4 says that a solution of the Euler-Lagrange
equations cannot be minimizing beyond the first conjugate point. Turned
the other way round, Theorem 1.3.4 says that if u is a local minimizer,
then there cannot be any parameter value a* with a < a* < b that is
conjugate to a. It may happen, however, that b is conjugate to a. An
example will be given in the next chapter.
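The statement can be checked numerically on a simple model problem. The sketch below assumes the integrand $F(t,u,p) = \frac12(p^2 - u^2)$ on $[0,b]$ with $d = 1$ and the solution $u \equiv 0$ of the Euler-Lagrange equations; neither is taken from the text. For this choice the Jacobi equation (1.3.11) reads $\ddot\eta = -\eta$, the Jacobi field $\sin t$ vanishes at $0$ and $\pi$, and so $\pi$ is conjugate to $0$. The comparison functions $v(t) = \epsilon\sin(\pi t/b)$ and the helper names are likewise illustrative assumptions.

```python
import math


def integral(f, a, b, n=4000):
    """Midpoint rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def I(u, du, b):
    """I(u) = int_0^b (u'(t)^2 - u(t)^2) / 2 dt."""
    return integral(lambda t: 0.5 * (du(t) ** 2 - u(t) ** 2), 0.0, b)


def I_of_bump(eps, b):
    """I evaluated on v(t) = eps * sin(pi t / b), which vanishes at 0 and b."""
    k = math.pi / b
    return I(lambda t: eps * math.sin(k * t),
             lambda t: eps * k * math.cos(k * t), b)


short = I_of_bump(0.1, 3.0)  # b < pi: no conjugate value in (0, b)
long_ = I_of_bump(0.1, 4.0)  # b > pi: pi lies in (0, b) and is conjugate to 0

print(short, long_)
```

For $b = 4 > \pi$ the value is negative, so $I(v) < I(0) = 0$ for arbitrarily small $\epsilon$, in accordance with Theorem 1.3.4. For $b = 3 < \pi$ this particular variation increases $I$; that alone does not prove minimality, it is merely consistent with the absence of a conjugate point.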
Summary. In order to obtain necessary conditions for a solution of the
Euler-Lagrange equations
\[ \frac{d}{dt}F_p(t,u(t),\dot u(t)) = F_u(t,u(t),\dot u(t)) \]
to minimize
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt, \]
one needs to study the second variation
\[ Q(\eta) := \delta^2 I(u,\eta) = \frac{d^2}{ds^2}I(u + s\eta)\Big|_{s=0} \quad\text{for } \eta \in D_0^1. \]
If, for fixed $u$, we consider the variational problem $Q(\eta) \to \min$, we are led
to the Jacobi equations
\[ \frac{d}{dt}\left(F_{pp}(t,u(t),\dot u(t))\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\eta(t)\right) = F_{up}(t,u(t),\dot u(t))\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\eta(t) \]
for $\eta$.
Solutions $\eta$ with $\eta(a) = \eta(b) = 0$ are called Jacobi fields, $a^* \in (a,b)$ for
which there exists a nontrivial Jacobi field on $[a,a^*]$ is called conjugate
to $a$, and if there exists such $a^*$, $u$ cannot be locally minimizing on $[a,b]$.
In other words, a solution of the Euler-Lagrange equations cannot be
minimizing beyond the first conjugate point.
1.4 Free boundary conditions
We recall the definition of an $n$-dimensional embedded differentiable
submanifold $M$ of $\mathbb{R}^d$: For every $p \in M$, there have to exist a neighbourhood
$V = V(p) \subset \mathbb{R}^d$, an open set $U \subset \mathbb{R}^n$ and an injective differentiable map
$f : U \to V$ of everywhere maximal rank $n$ (i.e. for every $z \in U$, the
derivative $Df(z)$, a linear map from $\mathbb{R}^n$ to $\mathbb{R}^d$, has rank $n$) with
\[ M \cap V = f(U). \]
An example is the sphere $S^n$ described in detail in Section 2.1
(Example 2.1.1). The tangent space $T_pM$ of $M$ at $p = f(z)$ then is the vector space
$Df(z)(\mathbb{R}^n)$. It can be considered as a subspace of the vector space $T_p\mathbb{R}^d$,
the tangent space of $\mathbb{R}^d$ at $p$.
As in 1.1, we now consider the variational problem
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \to \min \]
with $F$ of class $C^2$. This time, however, we do not impose the Dirichlet
boundary condition that the values of $u(a)$ and $u(b)$ are prescribed,
but the more general condition that for given submanifolds $M_1$, $M_2$
(differentiable, embedded) of $\mathbb{R}^d$, we require that
\[ u(a) \in M_1, \quad u(b) \in M_2. \]
(Dirichlet boundary conditions constitute the special case where $M_1$ and
$M_2$ are points.)
In this section, we do not consider regularity questions. As an exercise,
the reader should supply the necessary regularity assumptions on $F$, $u$,
etc. at each step.
Let $u$ be a solution. Then, as before, $u$ has to satisfy the Euler-Lagrange
equations, because if $u(a) \in M_1$, $\eta(a) = 0$, then also
$u(a) + s\eta(a) \in M_1$ for any $s$, and likewise at $b$, and so we may again consider
variations of the form $u + s\eta$, $\eta \in D_0^1$. This time, however, also more
general variations are admissible. Namely, let $u_s(t)$ be a family of maps
from $I$ into $\mathbb{R}^d$ depending differentiably on $s \in (-\epsilon,\epsilon)$, with $u(t) = u_0(t)$
and
\[ u_s(a) \in M_1, \quad u_s(b) \in M_2 \quad\text{for all } s. \]
Let
\[ \eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0}. \]
Then again
\[ \begin{aligned} 0 = \frac{d}{ds}I(u_s)\Big|_{s=0} &= \frac{d}{ds}\int_a^b F(t,u_s(t),\dot u_s(t))\,dt\Big|_{s=0} \\ &= \int_a^b \{F_p(t,u(t),\dot u(t))\cdot\dot\eta(t) + F_u(t,u(t),\dot u(t))\cdot\eta(t)\}\,dt \\ &= \int_a^b \left\{-\frac{d}{dt}F_p + F_u\right\}\cdot\eta\,dt + F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b} \\ &= F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b}, \end{aligned} \]
since $u$ solves the Euler-Lagrange equations.
We now observe that $\eta(a) \in T_{u(a)}M_1$ (and likewise at $b$), since we may
find a 'local chart' $f$ as above with $M_1 \cap V(u(a)) = f(U)$ for a neighbourhood
$V$ of $u(a)$ and some open set $U \subset \mathbb{R}^{n_1}$ ($n_1 = \dim M_1$). By choosing $\epsilon$
smaller if necessary, we may assume $u_s(a) \in M_1 \cap V = f(U)$ for
$s \in (-\epsilon,\epsilon)$. Since $f$ is injective, there then has to exist a curve $\gamma(s) \subset U$ with
$u_s(a) = f \circ \gamma(s)$ for all $s$. Hence
\[ \eta(a) = \frac{\partial}{\partial s}u_s(a)\Big|_{s=0} = Df(f^{-1}(u(a)))\,\gamma'(0) \]
is indeed tangent to $M_1$ at $u(a)$. Moreover, any tangent vector to $M_1$ at
$u(a)$ can be realized in this manner. Therefore, since we may choose the
values of $\eta$ at $a$ and $b$ independently of each other, we conclude
\[ F_p(a,u(a),\dot u(a))\cdot V = 0 \quad\text{for all } V \in T_{u(a)}M_1 \]
and likewise
\[ F_p(b,u(b),\dot u(b))\cdot W = 0 \quad\text{for all } W \in T_{u(b)}M_2. \]
We have thus shown:
Theorem 1.4.1. Let $u$ be a critical point of $I$ among curves with $u(a) \in M_1$,
$u(b) \in M_2$ ($M_1$, $M_2$ given differentiable embedded submanifolds
of $\mathbb{R}^d$), i.e. $\frac{d}{ds}I(u_s)\big|_{s=0} = 0$ for all variations $u_s(t)$ differentiable in
$s$ with $u_s(a) \in M_1$, $u_s(b) \in M_2$ for all $s \in (-\epsilon,\epsilon)$ ($\epsilon > 0$). Then
$u$ is a solution of the Euler-Lagrange equations for $I$, and in addition,
$F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are orthogonal to $T_{u(a)}M_1$
and $T_{u(b)}M_2$, respectively. In particular, if for example $M_1 = \mathbb{R}^d$, then
$F_p(a,u(a),\dot u(a)) = 0$.
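A small numerical experiment illustrates the free boundary condition. The sketch assumes $d = 1$, $I = [0,1]$, the integrand $F(t,u,p) = \frac12 p^2 + u$, $M_1 = \{0\}$ and $M_2 = \mathbb{R}$; all of these choices, as well as the helper names, are assumptions made for illustration. The Euler-Lagrange equation is $\ddot u = 1$, and Theorem 1.4.1 requires $F_p = \dot u(1) = 0$ at the free end point, which gives $u(t) = \frac12 t^2 - t$.

```python
def integral(f, a, b, n=2000):
    """Midpoint rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def I(u, du):
    """I(u) = int_0^1 (u'(t)^2 / 2 + u(t)) dt."""
    return integral(lambda t: 0.5 * du(t) ** 2 + u(t), 0.0, 1.0)


# Candidate from u'' = 1, u(0) = 0 and the free boundary condition u'(1) = 0.
I_free = I(lambda t: 0.5 * t * t - t, lambda t: t - 1.0)

# Variations u + s*t keep u(0) = 0 but move the free end point u(1).
I_varied = [I(lambda t, s=s: 0.5 * t * t - t + s * t,
              lambda t, s=s: t - 1.0 + s)
            for s in (-0.5, -0.1, 0.1, 0.5)]

# Another solution of u'' = 1 with u(0) = 0, but with u'(1) = 1 instead of 0.
I_other = I(lambda t: 0.5 * t * t, lambda t: t)

print(I_free, I_varied, I_other)
```

Moving the free end point away from the candidate increases $I$ by $s^2/2$, and the solution of the Euler-Lagrange equation that violates $\dot u(1) = 0$ has a larger value of $I$.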
Summary. If instead of a Dirichlet boundary condition, we more generally
impose a free boundary condition that $u(a)$ and $u(b)$ are only
required to be contained in given differentiable submanifolds $M_1$ and
$M_2$, respectively, of $\mathbb{R}^d$, then $F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are
orthogonal to these submanifolds for a critical point of $I$ under those
boundary conditions.
1.5 Symmetries and the theorem of E. Noether
In the variational problems of classical mechanics, one often encounters
conserved quantities, like energy, momentum, or angular momentum. It
was realized by E. Noether that all those conservation laws result from
a general theorem stating that invariance properties of the variational
integral $I$ lead to corresponding conserved quantities. We first treat a
special case.
Theorem 1.5.1. We consider the variational integral
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt, \]
with $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$. We suppose that there exists a smooth
one-parameter family of differentiable maps
\[ h_s : \mathbb{R}^d \to \mathbb{R}^d \]
(the precise smoothness requirement is that
\[ h(s,z) := h_s(z) \]
is of class $C^2((-\epsilon_0,\epsilon_0) \times \mathbb{R}^d, \mathbb{R}^d)$ for some $\epsilon_0 > 0$),
with
\[ h_0(z) = z \quad\text{for all } z \in \mathbb{R}^d \]
and satisfying
\[ \int_a^b F\left(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\right)dt = \int_a^b F\left(t,u(t),\frac{d}{dt}u(t)\right)dt \tag{1.5.1} \]
for all $s \in (-\epsilon_0,\epsilon_0)$ and all $u \in C^2([a,b],\mathbb{R}^d)$.
Then, for any solution $u(t)$ of the Euler-Lagrange equations (1.1.4)
for $I$,
\[ F_p(t,u(t),\dot u(t)) \cdot \frac{d}{ds}h_s(u(t))\Big|_{s=0} \tag{1.5.2} \]
is constant in $t \in [a,b]$.
Definition 1.5.1. A quantity $C(t,u(t),\dot u(t))$ that is constant in $t$ for
each solution of the Euler-Lagrange equations of a variational integral
$I(u)$ is called a (first) integral of motion.
Proof of Theorem 1.5.1: Equation (1.5.1) yields for any $t_0 \in [a,b]$, using
$h_0(z) = z$,
\[ \begin{aligned} 0 &= \frac{d}{ds}\int_a^{t_0} F\left(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\right)dt\Big|_{s=0} \\ &= \int_a^{t_0}\left\{F_u(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\cdot\frac{d}{dt}\frac{d}{ds}h_s(u(t))\right\}dt\Big|_{s=0}. \end{aligned} \tag{1.5.3} \]
We recall the Euler-Lagrange equations (1.1.4) for $u$:
\[ 0 = \frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)). \tag{1.5.4} \]
Using (1.5.4) in (1.5.3) to replace $F_u$, we obtain
\[ \begin{aligned} 0 &= \int_a^{t_0}\left\{\frac{d}{dt}F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\cdot\frac{d}{dt}\frac{d}{ds}h_s(u(t))\right\}dt\Big|_{s=0} \\ &= \int_a^{t_0}\frac{d}{dt}\left(F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t))\Big|_{s=0}\right)dt. \end{aligned} \tag{1.5.5} \]
Therefore
\[ F_p(t_0,u(t_0),\dot u(t_0))\cdot\frac{d}{ds}h_s(u(t_0))\Big|_{s=0} = F_p(a,u(a),\dot u(a))\cdot\frac{d}{ds}h_s(u(a))\Big|_{s=0} \tag{1.5.6} \]
for any $t_0 \in [a,b]$. This means that (1.5.2) is constant on $[a,b]$.
q.e.d.
Examples
Example 1.5.1. We consider for $u : \mathbb{R} \to \mathbb{R}^{3n}$, $u = (u_1,\dots,u_n)$ with
$u_i \in \mathbb{R}^3$,
\[ F(t,u,\dot u) = \sum_{i=1}^n \frac{m_i}{2}|\dot u_i|^2 - V(u), \]
i.e. a mechanical system in $\mathbb{R}^3$ with point masses $m_i$, and a potential
$V(u)$ that is independent of the third coordinates of the $u_i$. Then
\[ h_s(z) = z + se_3, \]
where $e_3$ is the third unit vector in $\mathbb{R}^3$, leaves $F$ invariant in the sense
of Theorem 1.5.1. Since
\[ \frac{d}{ds}h_s\Big|_{s=0} = e_3, \]
we conclude that
\[ \sum_{i=1}^n m_i\dot u_i\cdot e_3 \]
is constant in $t$, i.e. the third component of the momentum vector of the
system is conserved.
Example 1.5.2. Similarly, if a system as in Example 1.5.1 is invariant
under rotations about the $e_3$-axis, and if $h_s$ now denotes such rotations,
then (up to a constant factor)
\[ \frac{d}{ds}h_s\Big|_{s=0}u_i = e_3 \wedge u_i. \]
Hence, the conserved quantity is the angular momentum w.r.t. the
$e_3$-axis,
\[ \sum_{i=1}^n F_{p_i}\cdot(e_3 \wedge u_i) = \sum_{i=1}^n (m_i\dot u_i)\cdot(e_3 \wedge u_i) = \sum_{i=1}^n (u_i \wedge m_i\dot u_i)\cdot e_3. \]
We now come to the general form of E. Noether's theorem
Theorem 1.5.2 (Theorem of E. Noether). We consider the variational
integral
\[ I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \]
with $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$. We suppose that there exists a smooth
one-parameter family of differentiable maps
\[ h_s = (h_s^0,\bar h_s) : [a,b] \times \mathbb{R}^d \to \mathbb{R} \times \mathbb{R}^d \]
($s \in (-\epsilon_0,\epsilon_0)$ as before) with
\[ h_0(t,z) = (t,z) \quad\text{for all } (t,z) \in [a,b] \times \mathbb{R}^d \]
and satisfying
\[ \int_{h_s^0(a)}^{h_s^0(b)} F\left(t_s,\bar h_s(u(t_s)),\frac{d}{dt_s}\bar h_s(u(t_s))\right)dt_s = \int_a^b F(t,u(t),\dot u(t))\,dt \tag{1.5.7} \]
for $t_s = h_s^0(t)$, all $s \in (-\epsilon_0,\epsilon_0)$ and all $u \in C^2([a,b],\mathbb{R}^d)$. Then, for any
solution $u(t)$ of the Euler-Lagrange equations (1.1.4) for $I$,
\[ F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}\bar h_s(u(t))\Big|_{s=0} + \left(F(t,u(t),\dot u(t)) - F_p(t,u(t),\dot u(t))\cdot\dot u(t)\right)\frac{d}{ds}h_s^0(t)\Big|_{s=0} \tag{1.5.8} \]
is constant in $t \in [a,b]$.
Proof. We reduce the statement to the one of Theorem 1.5.1 by artificially
considering $t$ as a dependent variable on the same footing with $u$.
Thus, we consider the integrand
\[ \bar F\left(t(\tau),u(t(\tau)),\frac{dt}{d\tau},\frac{d}{d\tau}u(t(\tau))\right) := F\left(t,u(t),\frac{\frac{d}{d\tau}u(t(\tau))}{\frac{dt}{d\tau}}\right)\frac{dt}{d\tau} = F(t,u(t),\dot u(t))\frac{dt}{d\tau}. \tag{1.5.9} \]
Then
\[ \bar I(t,u) := \int_{\tau_0}^{\tau_1}\bar F\,d\tau = \int_a^b F(t,u(t),\dot u(t))\,dt, \quad\text{if } t(\tau_0) = a,\ t(\tau_1) = b. \tag{1.5.10} \]
By our assumption, $\bar F$ remains invariant under replacing $(t,u)$ by
$h_s(t,u)$. Consequently, Theorem 1.5.1 applied to $\bar I$ yields that
\[ \bar F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}\bar h_s(u(t))\Big|_{s=0} + \bar F_{p^0}(t,u(t),\dot u(t))\frac{d}{ds}h_s^0(t)\Big|_{s=0}, \]
with $p^0$ standing for the place of the argument $\frac{dt}{d\tau}$ of $\bar F$ (while $p$ stands
as before for the arguments $\dot u$), is invariant. Since, by (1.5.9),
\[ \bar F_p = F_p, \qquad \bar F_{p^0} = F - F_p\cdot\dot u \]
at $s = 0$ (note $\frac{dt}{d\tau} = 1$ for $s = 0$ since $h_0^0(t) = t$), this implies the
invariance of (1.5.8).
q.e.d.
Example 1.5.3. Suppose $F = F(u,\dot u)$, i.e. $F$ does not depend explicitly
on $t$. Then
\[ h_s(t,z) = (t + s,z) \]
leaves $I$ invariant as required in Theorem 1.5.2. Therefore, the 'energy'
\[ F(t,u(t),\dot u(t)) - F_p(t,u(t),\dot u(t))\cdot\dot u(t) \]
is conserved. We shall see another proof of this fact in Section 4.1.
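The conserved quantities of Examples 1.5.2 and 1.5.3 can be checked on an explicit solution. The sketch below assumes a single particle of mass $m = 1$ in $\mathbb{R}^3$ with potential $V(u) = \frac12|u|^2$, which is rotationally invariant and independent of $t$; the curve $u(t) = (A\cos t, B\sin t, 0)$ then solves the Euler-Lagrange equations $\ddot u = -u$. The potential, the amplitudes `A`, `B` and all function names are assumptions made for this illustration.

```python
import math

A, B = 2.0, 0.5  # illustrative amplitudes (assumed values)
E3 = (0.0, 0.0, 1.0)


def u(t):
    return (A * math.cos(t), B * math.sin(t), 0.0)


def du(t):
    return (-A * math.sin(t), B * math.cos(t), 0.0)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


def angular_momentum(t):
    """F_p . (e_3 ^ u) with F_p = m * u' and m = 1, as in Example 1.5.2."""
    return dot(du(t), cross(E3, u(t)))


def noether_energy(t):
    """F - F_p . u' with F = |u'|^2 / 2 - |u|^2 / 2, as in Example 1.5.3."""
    p = du(t)
    F = 0.5 * dot(p, p) - 0.5 * dot(u(t), u(t))
    return F - dot(p, p)


times = [0.1 * k for k in range(100)]
ang = [angular_momentum(t) for t in times]
en = [noether_energy(t) for t in times]

print(min(ang), max(ang), min(en), max(en))
```

Along this solution the angular momentum equals $AB$ and the quantity $F - F_p\cdot\dot u$ equals $-\frac12(A^2 + B^2)$ for every $t$, i.e. the negative of the usual mechanical energy.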
Summary. The theorem of E. Noether identifies a quantity that is pre-
served along any solution u(t) of the Euler-Lagrange equations of a
variational integral, a so-called first integral of motion, with any differ-
entiable symmetry of the integrand. For example, in classical mechan-
ics, conservation of momentum and angular momentum correspond to
translational and rotational invariance of the integral, respectively, while
time invariance leads to the conservation of energy.
Exercises
1.1 For mappings $u : [a,b] \to \mathbb{R}^d$, consider
\[ E(u) := \frac12\int_a^b |\dot u(t)|^2\,dt \]
($|\cdot|$ is the Euclidean norm of $\mathbb{R}^d$, i.e. for $z = (z^1,\dots,z^d)$,
$|z|^2 = \sum_{i=1}^d (z^i)^2$). Compute the Euler-Lagrange equations and
the second variation. Also, let
\[ L(u) := \int_a^b |\dot u(t)|\,dt. \]
Show that
\[ L(u)^2 \le 2(b-a)E(u) \]
with equality iff $|\dot u(t)| \equiv$ constant almost everywhere. (What
is an appropriate regularity class for the mappings $u$ that are
considered here?)
1.2 Determine all minimizers of the variational integral
\[ I(u) = \int_{-1}^{1} (1 - \dot u(t))^2\,dt \]
with $u(-1) = 0 = u(1)$.
1.3 Develop a theory of Jacobi fields for variational problems with
free boundary conditions. In particular, you should obtain an
analogue of Jacobi's theorem.
1.4 For mappings $u : [a,b] \to \mathbb{R}^d$, consider
Compute the first and second variation of $I$ and the Jacobi
equation. Can you find Jacobi fields?
2
A geometric example: geodesic curves
2.1 The length and energy of curves
We let $M$ be an $n$-dimensional embedded submanifold of $\mathbb{R}^d$. In this
section, we assume that $M$ is of class $C^3$, i.e. that all local charts are
thrice differentiable. We let $c \in AC([0,T],M)$ be a curve on $M$. This
means that $c$ is an absolutely continuous map from the interval $[0,T]$ into
$\mathbb{R}^d$ with the property that $c(t) \in M$ for every $t \in [0,T]$. The derivative
of $c$ w.r.t. $t$ will be denoted by a dot,
\[ \dot c(t) := \frac{d}{dt}c(t). \]
The length of $c$ is given by
\[ L(c) := \int_0^T |\dot c(t)|\,dt = \int_0^T \Big(\sum_{\alpha=1}^d (\dot c^\alpha)^2\Big)^{1/2} dt, \tag{2.1.1} \]
where $(c^1,\dots,c^d)$ are the coordinates of $c = c(t)$. We also define the
energy of $c$ as
\[ E(c) := \frac12\int_0^T |\dot c(t)|^2\,dt = \frac12\int_0^T \sum_{\alpha=1}^d (\dot c^\alpha)^2\,dt. \tag{2.1.2} \]
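We let now
\[ f : U \to V, \quad f(U) = M \cap V \]
be a local chart for $M$ as defined in Section 1.4. We assume for a moment
that $c([0,T])$ is contained in $f(U)$. Since $f$ maps $U$ bijectively onto $f(U)$,
there exists a curve
\[ \gamma(t) \subset U \]
with
\[ c(t) = f(\gamma(t)). \tag{2.1.3} \]
Since the derivative $Df(z)$ has maximal rank everywhere (by definition
of a chart, cf. 1.4), $\gamma$ is absolutely continuous, since $c$ is, and we have
the chain rule
\[ \dot c(t) = (Df)(\gamma(t)) \circ \dot\gamma(t), \]
or
\[ \dot c^\alpha(t) = \frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t), \]
where the index $i$ is summed from 1 to $n$. Thus
\[ L(c) = \int_0^T \Big(\frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\dot\gamma^i(t)\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\dot\gamma^j(t)\Big)^{1/2} dt \]
and
\[ E(c) = \frac12\int_0^T \frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\dot\gamma^i(t)\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\dot\gamma^j(t)\,dt. \]
In these formulae, and in the sequel, the index $\alpha$ is summed from 1 to $d$. For
$z \in U$, we put
\[ g_{ij}(z) := \frac{\partial f^\alpha}{\partial z^i}(z)\frac{\partial f^\alpha}{\partial z^j}(z). \tag{2.1.4} \]
With this notation, the preceding formulae become
\[ L(c) = \int_0^T \left(g_{ij}(\gamma(t))\dot\gamma^i(t)\dot\gamma^j(t)\right)^{1/2} dt \tag{2.1.5} \]
and
\[ E(c) = \frac12\int_0^T g_{ij}(\gamma(t))\dot\gamma^i(t)\dot\gamma^j(t)\,dt. \tag{2.1.6} \]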
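Definition 2.1.1. $(g_{ij}(z))_{i,j=1,\dots,n}$
is called the metric tensor of $M$ w.r.t. the chart $f : U \to V$.
We note that $(g_{ij}(z))_{i,j=1,\dots,n}$ is symmetric, i.e.
\[ g_{ij}(z) = g_{ji}(z) \quad\text{for all } i,j \]
and positive definite, i.e.
\[ g_{ij}(z)\eta^i\eta^j > 0 \quad\text{whenever } \eta = (\eta^1,\dots,\eta^n) \ne 0 \in \mathbb{R}^n. \]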
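Formula (2.1.4) is easy to evaluate numerically. The following minimal sketch approximates the partial derivatives of a chart by central differences and assembles $g_{ij}$. The chart used here, spherical coordinates $(\theta,\varphi) \mapsto (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ on the unit sphere in $\mathbb{R}^3$, as well as the function names and the step size `h`, are assumptions chosen for illustration; for this chart one expects $g_{11} = 1$, $g_{12} = g_{21} = 0$, $g_{22} = \sin^2\theta$.

```python
import math


def f(z):
    """Assumed chart: spherical coordinates on the unit sphere in R^3."""
    theta, phi = z
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def jacobian(f, z, h=1e-6):
    """Rows J[i][alpha] ~ d f^alpha / d z^i, by central differences."""
    rows = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        fp, fm = f(zp), f(zm)
        rows.append([(a - b) / (2.0 * h) for a, b in zip(fp, fm)])
    return rows


def metric(f, z):
    """g_ij(z) = sum over alpha of (d f^alpha / d z^i)(d f^alpha / d z^j)."""
    J = jacobian(f, z)
    n = len(z)
    return [[sum(a * b for a, b in zip(J[i], J[j])) for j in range(n)]
            for i in range(n)]


g = metric(f, (1.0, 0.3))
print(g)
```

The computed matrix is symmetric by construction, and away from the poles $\theta = 0, \pi$, where this chart degenerates, it is positive definite.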
Remark 2.1.1. The use of local charts for M seems to have the obvious
disadvantage that the expressions for length and energy of curves
become more complicated. The advantage of this approach, namely not to
consider curves on $M$ as curves in $\mathbb{R}^d$ satisfying a constraint, is that this
constraint now is automatically fulfilled. All curves represented in local
charts lie on M. This more than compensates for the complication in
the formulae for L and E.
Our aim will be to find curves of shortest length or of smallest energy
on $M$, i.e. to minimize the functionals $L$ and $E$ among curves on $M$. For
this purpose it will be useful to observe certain invariance properties of
$L$ and $E$. First of all, whenever $i : \mathbb{R}^d \to \mathbb{R}^d$ is a Euclidean isometry, i.e.
$i(y) = Ay + b$ with $A \in O(d)$, the orthogonal group, and $b \in \mathbb{R}^d$, then
\[ L(i(c)) = L(c) \tag{2.1.7} \]
\[ E(i(c)) = E(c) \tag{2.1.8} \]
for any curve $c : [0,T] \to \mathbb{R}^d$.
Secondly, $L$ is parameterization invariant in the sense that whenever
\[ \tau : [0,S] \to [0,T] \]
is a diffeomorphism (i.e. $\tau$ is bijective, and both $\tau$ and its inverse $\tau^{-1}$
are everywhere differentiable), then
\[ L(c) = L(c \circ \tau) \quad\text{for any curve } c : [0,T] \to \mathbb{R}^d. \tag{2.1.9} \]
Namely
\[ L(c \circ \tau) = \int_0^S \left|\frac{d}{ds}(c \circ \tau)(s)\right| ds = \int_0^S |\dot c(\tau(s))|\left|\frac{d\tau}{ds}\right| ds = \int_0^T |\dot c(t)|\,dt. \]
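$E$, however, is not parameterization invariant. By the Schwarz inequality,
we have instead
\[ L(c) = \int_0^T |\dot c(t)|\,dt \le \left(\int_0^T dt\right)^{1/2}\left(\int_0^T |\dot c(t)|^2\,dt\right)^{1/2} = \sqrt{2T}\sqrt{E(c)}, \tag{2.1.10} \]
with equality iff
\[ |\dot c(t)| \equiv \text{constant for almost all } t. \tag{2.1.11} \]
We have shown: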
Lemma 2.1.1. For every $c \in AC([0,T],\mathbb{R}^d)$,
$$L(c) \le \sqrt{2T}\,\sqrt{E(c)},$$
with strict inequality, unless
$$|\dot c(t)| \equiv \text{constant almost everywhere}.$$
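The statements (2.1.9) and (2.1.10) can be checked numerically. The following is a minimal sketch, not part of the original text: the test curve $c(t) = (t, t^2)$, the reparameterization $\tau(s) = (s+s^2)/2$ and the helper name `length_and_energy` are assumptions chosen for illustration, and the integrals are approximated with a simple midpoint rule.

```python
import math

def length_and_energy(velocity, T, steps=20000):
    """Midpoint-rule approximations of L = int |c'| dt and E = 1/2 int |c'|^2 dt on [0, T]."""
    h = T / steps
    L = 0.0
    E = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        speed = math.hypot(*velocity(t))
        L += speed * h
        E += 0.5 * speed * speed * h
    return L, E

# Hypothetical test curve c(t) = (t, t^2) on [0, 1]; only its velocity is needed.
def c_dot(t):
    return (1.0, 2.0 * t)

# Velocity of c o tau for the reparameterization tau(s) = (s + s^2)/2 of [0, 1].
def c_tau_dot(s):
    tau = 0.5 * (s + s * s)
    dtau = 0.5 + s                      # tau'(s) > 0, so tau is a diffeomorphism
    vx, vy = c_dot(tau)
    return (vx * dtau, vy * dtau)

# Unit-speed quarter circle c(t) = (cos t, sin t) on [0, pi/2].
def circle_dot(t):
    return (-math.sin(t), math.cos(t))

L1, E1 = length_and_energy(c_dot, 1.0)
L2, E2 = length_and_energy(c_tau_dot, 1.0)
L3, E3 = length_and_energy(circle_dot, math.pi / 2)

print("L(c), L(c o tau):", L1, L2)      # agree up to quadrature error
print("E(c), E(c o tau):", E1, E2)      # differ: E is not invariant
print("sqrt(2 T E(c)):", math.sqrt(2.0 * E1))
```

For the constant-speed circle arc the two sides of (2.1.10) coincide up to rounding, while for the parabola the inequality is strict.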
If
$$|\dot c(t)| \equiv \text{constant almost everywhere},$$
we say that the curve $c$ is parameterized proportionally to arc-length,
and if
$$|\dot c(t)| \equiv 1,$$
we say that it is parameterized by arc-length. We recall that a Jordan
curve, i.e. an injective curve $c : [0,T] \to \mathbb{R}^d$, is rectifiable if it is absolutely
continuous (which we always assume), and this implies that it may be
parameterized by arc-length, i.e. there exists a diffeomorphism
$$\tau : [0, L(c)] \to [0,T]$$
with
$$\left|\frac{d}{ds}(c\circ\tau)(s)\right| = 1 \quad\text{for almost all } s,$$
i.e. the reparameterized curve
$$\tilde c := c\circ\tau$$
is parameterized by arc-length. From Lemma 2.1.1, we obtain:
Corollary 2.1.1. Let $c : [0, L(c)] \to \mathbb{R}^d$ be a curve parameterized on
$[0, L(c)]$. Among all reparameterizations
$$\tau : [0, L(c)] \to [0, L(c)]$$
(i.e. we keep the interval of definition fixed, namely $[0, L(c)]$), the
parameterization by arc-length leads to the smallest energy. Namely, if
$c : [0, L(c)] \to \mathbb{R}^d$ is parameterized by arc-length,
$$L(c) = 2E(c), \tag{2.1.12}$$
whereas for any other parameterization of $c$ on the same interval,
$$L(c) < 2E(c). \tag{2.1.13}$$
We now return to those curves c that are confined to lie on M, in order
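to discover a third invariance. Namely, we compare the two expressions
(2.1.1) and (2.1.5) for the length of $c$, and similarly (2.1.2) and (2.1.6)
for its energy. (2.1.1) is obviously independent of the chart $f : U \to V$
and its metric tensor, and therefore (2.1.5) has to be independent of
them, too. In order to study this more closely, let
$$\tilde f : \tilde U \to \tilde V$$
be another chart with
$$c([0,T]) \subset \tilde f(\tilde U).$$
Then there exists a curve $\tilde\gamma$ in $\tilde U$ with $c(t) = \tilde f(\tilde\gamma(t))$ for all $t$. Putting
$$\tilde g_{ij}(z) := \frac{\partial \tilde f^a}{\partial z^i}(z)\,\frac{\partial \tilde f^a}{\partial z^j}(z) \quad\text{for } z \in \tilde U,$$
we then also have
$$L(c) = \int_0^T \left( \tilde g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t) \right)^{1/2} dt. \tag{2.1.14}$$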
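In order to study this invariance property more closely, we define
$$\varphi := \tilde f^{-1}\circ f : f^{-1}\big(f(U)\cap\tilde f(\tilde U)\big) \to \tilde f^{-1}\big(f(U)\cap\tilde f(\tilde U)\big)$$
(see Figure 2.1).
$\varphi$ is called a coordinate transformation. $\varphi$ is a diffeomorphism, i.e. a
bijective map between open subsets of $\mathbb{R}^n$ whose derivative $D\varphi(z)$ has
maximal rank ($= n$) at every $z$. Then from
$$\tilde f\circ\tilde\gamma(t) = c(t) = f\circ\gamma(t),$$
$$\tilde\gamma(t) = \varphi(\gamma(t)), \quad\text{hence}\quad \dot{\tilde\gamma}^i(t) = \frac{\partial\varphi^i}{\partial z^k}(\gamma(t))\,\dot\gamma^k(t) \tag{2.1.15}$$
and from
$$\tilde f(\varphi(z)) = f(z)$$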
Figure 2.1.
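we get
$$g_{ij}(z) = \tilde g_{kl}(\varphi(z))\,\frac{\partial\varphi^k}{\partial z^i}(z)\,\frac{\partial\varphi^l}{\partial z^j}(z). \tag{2.1.16}$$
From (2.1.15) and (2.1.16), we see
$$g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t) = \tilde g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t), \tag{2.1.17}$$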
and this shows again the equivalence of (2.1.5) and (2.1.1), and likewise
for the corresponding expressions of the energy. The important transfor-
mation formula (2.1.16) shows how the metric tensor transforms under
coordinate transformations. This invariance property of L and E makes
it possible to express the length and energy of an arbitrary curve c on
M that is not necessarily contained in the image of a single chart as
follows:
One finds a subdivision
$$t_0 = 0 < t_1 < \dots < t_{m-1} < t_m = T$$
of $[0,T]$ with the property that
$$c([t_{\nu-1}, t_\nu])$$
is contained in the image of a single chart $f_\nu$
for each $\nu = 1,\dots,m$. Let $(g^\nu_{ij}(z))_{i,j=1,\dots,n}$ be the metric tensor of $M$
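w.r.t. the chart $f_\nu$. Then
$$L(c) = \sum_{\nu=1}^m \int_{t_{\nu-1}}^{t_\nu} \left( g^\nu_{ij}(\gamma_\nu(t))\,\dot\gamma_\nu^i(t)\,\dot\gamma_\nu^j(t) \right)^{1/2} dt$$
where $c(t) = f_\nu(\gamma_\nu(t))$ for $t \in [t_{\nu-1}, t_\nu]$. By the preceding considerations,
this does not depend on the choice of charts $f_\nu$. For this reason, one
usually just says that for a curve $c$ on $M$
$$L(c) = \int_0^T \left( g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t) \right)^{1/2} dt, \tag{2.1.18}$$
where $\gamma$ is the representation for $c$ w.r.t. a local chart, and $(g_{ij})_{i,j=1,\dots,n}$
is the metric tensor of $M$ w.r.t. this chart. Similarly
$$E(c) = \frac{1}{2}\int_0^T g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\,dt. \tag{2.1.19}$$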
We now assume that the charts for $M$ are twice differentiable and return
to the question of finding shortest curves on $M$, for example between two
given points. By Corollary 2.1.1, it is preferable to minimize $E$ instead
of L, because a minimizer for E contains more information than one
for L; namely, minimizers for E are precisely those minimizers for L
that are parameterized proportionally to arc-length. Thus, minimizing
E not only selects shortest curves but also convenient parameterizations
of such curves.
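We now compute the Euler-Lagrange equations for $E$ as given by
(2.1.19), applied to the integrand $g_{jk}(\gamma)\dot\gamma^j\dot\gamma^k$ (the overall factor $\tfrac{1}{2}$ plays no role and is dropped):
$$0 = \frac{d}{dt}E_{\dot\gamma^i} - E_{\gamma^i} \quad\text{for } i = 1,\dots,n$$
$$\Leftrightarrow\quad 0 = \frac{d}{dt}\Big(2 g_{ij}(\gamma(t))\,\dot\gamma^j(t)\Big) - \Big(\frac{\partial}{\partial z^i} g_{kj}\Big)(\gamma(t))\,\dot\gamma^k(t)\,\dot\gamma^j(t)$$
(the factor 2 in the first term results from the symmetry $g_{ij} = g_{ji}$)
$$\Leftrightarrow\quad 0 = 2 g_{ij}\,\ddot\gamma^j + 2\,\frac{\partial}{\partial z^k} g_{ij}\,\dot\gamma^k\dot\gamma^j - \frac{\partial}{\partial z^i} g_{kj}\,\dot\gamma^k\dot\gamma^j. \tag{2.1.20}$$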
We now introduce some further notation:
$$(g^{ij})_{i,j=1,\dots,n}$$
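is the matrix inverse to $(g_{ij})_{i,j=1,\dots,n}$, i.e.
$$g^{ij}g_{jk} = \delta^i_k = \begin{cases} 1 & \text{for } i = k \\ 0 & \text{for } i \neq k, \end{cases}$$
$$g_{ij,k} := \frac{\partial}{\partial z^k}\, g_{ij},$$
and finally the Christoffel symbols
$$\Gamma^i_{jk} := \frac{1}{2}\, g^{il}\left( g_{jl,k} + g_{kl,j} - g_{jk,l} \right).$$
Equation (2.1.20) then becomes
$$0 = \ddot\gamma^i + \frac{1}{2}\, g^{il}\left( 2 g_{lj,k}\,\dot\gamma^k\dot\gamma^j - g_{kj,l}\,\dot\gamma^k\dot\gamma^j \right) = \ddot\gamma^i + \frac{1}{2}\, g^{il}\left( g_{lj,k} + g_{lk,j} - g_{jk,l} \right)\dot\gamma^j\dot\gamma^k$$
by using symmetries. Thus: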
Lemma 2.1.2. The Euler-Lagrange equations for the energy $E$ for
curves on $M$ are
$$0 = \ddot\gamma^i(t) + \Gamma^i_{jk}(\gamma(t))\,\dot\gamma^j(t)\,\dot\gamma^k(t) \quad\text{for } i = 1,\dots,n. \tag{2.1.21}$$
The theorem of Picard-Lindelöf about solutions of ordinary differential
equations implies:
Lemma 2.1.3. For any $z \in U$, $v \in \mathbb{R}^n$, the system (2.1.21) has a
unique solution $\gamma(t)$ with
$$\gamma(0) = z, \quad \dot\gamma(0) = v \quad\text{for } t \in [-\epsilon, \epsilon] \text{ and some } \epsilon > 0.$$
Moreover, $\gamma(t)$ depends differentiably on the initial values $z$, $v$.
Definition 2.1.2. The solutions of (2.1.21) are called geodesics on $M$.
Examples
Example 2.1.1. The sphere
$$S^n := \left\{ (x^1,\dots,x^{n+1}) \in \mathbb{R}^{n+1} : \sum_{i=1}^{n+1} (x^i)^2 = 1 \right\} \subset \mathbb{R}^{n+1}$$
is a differentiable manifold of dimension $n$. In order to construct local
charts, we put
$$\Omega_1 := S^n \setminus \{(0,0,\dots,0,1)\}, \qquad \Omega_2 := S^n \setminus \{(0,0,\dots,0,-1)\}$$
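and define
$$g_1 : \Omega_1 \to \mathbb{R}^n, \qquad g_2 : \Omega_2 \to \mathbb{R}^n$$
as
$$g_1(x^1,\dots,x^{n+1}) := \left( \frac{x^1}{1-x^{n+1}}, \dots, \frac{x^n}{1-x^{n+1}} \right),$$
$$g_2(x^1,\dots,x^{n+1}) := \left( \frac{x^1}{1+x^{n+1}}, \dots, \frac{x^n}{1+x^{n+1}} \right)$$
($g_1$ and $g_2$ are the stereographic projections from the north and south
pole, respectively, since $g_1$ omits the north pole $(0,\dots,0,1)$ and $g_2$ omits the south pole). We then obtain charts
$$f_1 := g_1^{-1} : \mathbb{R}^n \to S^n \setminus \{(0,\dots,0,1)\},$$
$$f_2 := g_2^{-1} : \mathbb{R}^n \to S^n \setminus \{(0,\dots,0,-1)\}.$$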
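More explicitly, $f_1$ can be computed as follows:
With
$$(z^1,\dots,z^n) = \left( \frac{x^1}{1-x^{n+1}}, \dots, \frac{x^n}{1-x^{n+1}} \right),$$
$$1 = x^a x^a = z^i z^i\,(1 - x^{n+1})^2 + x^{n+1}x^{n+1},$$
hence
$$x^{n+1} = \frac{z^i z^i - 1}{z^i z^i + 1},$$
and then
$$x^j = \frac{2 z^j}{1 + z^i z^i} \quad (j = 1,\dots,n).$$
Thus
$$f_1(z^1,\dots,z^n) = \left( \frac{2z^1}{1 + z^i z^i}, \dots, \frac{2z^n}{1 + z^i z^i}, \frac{z^i z^i - 1}{z^i z^i + 1} \right).$$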
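For the metric tensor, we compute
$$\frac{\partial f_1^j}{\partial z^k} = \frac{2\delta_{jk}}{1 + z^i z^i} - \frac{4 z^j z^k}{(1 + z^i z^i)^2} \quad (j,k = 1,\dots,n),$$
$$\frac{\partial f_1^{n+1}}{\partial z^k} = \frac{4 z^k}{(1 + z^i z^i)^2}.$$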
Hence
$$g_{ij}(z) = \frac{\partial f_1^a}{\partial z^i}\,\frac{\partial f_1^a}{\partial z^j} = \frac{4}{(1 + |z|^2)^2}\,\delta_{ij}. \tag{2.1.22}$$
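Formula (2.1.22) can be cross-checked numerically from the definition (2.1.4). The sketch below is illustrative only: the function names `f1` and `metric_tensor`, the sample point and the finite-difference step are assumptions, and derivatives are approximated by central differences rather than computed exactly.

```python
def f1(z):
    """Inverse stereographic projection R^n -> S^n minus the north pole."""
    s = sum(zi * zi for zi in z)
    return [2.0 * zi / (1.0 + s) for zi in z] + [(s - 1.0) / (s + 1.0)]

def metric_tensor(f, z, h=1e-6):
    """g_ij(z) = sum_a (df^a/dz^i)(df^a/dz^j), with central-difference derivatives."""
    n = len(z)
    columns = []
    for i in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += h
        z_minus[i] -= h
        f_plus, f_minus = f(z_plus), f(z_minus)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus)])
    return [[sum(x * y for x, y in zip(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]

z = [0.3, -0.7, 1.1]                      # arbitrary sample point in a chart of S^3
g = metric_tensor(f1, z)
conformal = 4.0 / (1.0 + sum(zi * zi for zi in z)) ** 2

for row in g:
    print(["%.8f" % entry for entry in row])
print("4 / (1 + |z|^2)^2 =", conformal)
```

Up to discretization error the printed matrix is diagonal with the conformal factor of (2.1.22) on the diagonal, and `f1(z)` has Euclidean norm 1.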
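Actually, the metric tensor w.r.t. the chart $f_2$ is given by the same
formula. In order to compute the expression for geodesics, we also need
to compute the Christoffel symbols. It turns out that adding a little
generality will actually facilitate the computations. We consider a metric
of the form
$$g_{ij}(z) = \frac{1}{\phi^2(z)}\,\delta_{ij}, \tag{2.1.23}$$
where $\phi : \mathbb{R}^n \to \mathbb{R}^+$ is positive and differentiable. Then
$$g^{ij} = \phi^2\,\delta_{ij}. \tag{2.1.24}$$
We also put
$$\varphi := \log\phi.$$
Then
$$g_{ij,k} = -\delta_{ij}\,\frac{2}{\phi^3}\,\frac{\partial\phi}{\partial z^k} = -\delta_{ij}\,\frac{2}{\phi^2}\,\frac{\partial\varphi}{\partial z^k}.$$
Next
$$\Gamma^k_{ij} = \frac{1}{2}\, g^{kl}\left( g_{il,j} + g_{jl,i} - g_{ij,l} \right) = -\delta_{ik}\,\frac{\partial\varphi}{\partial z^j} - \delta_{jk}\,\frac{\partial\varphi}{\partial z^i} + \delta_{ij}\,\frac{\partial\varphi}{\partial z^k}. \tag{2.1.25}$$
Thus, $\Gamma^k_{ij}$ vanishes if all three indices $i, j, k$ are distinct, and for all $i, j$
$$\Gamma^i_{ij} = \Gamma^i_{ji} = -\frac{\partial\varphi}{\partial z^j}, \quad\text{and}\quad \Gamma^j_{ii} = \frac{\partial\varphi}{\partial z^j} \ \text{ for } i \neq j. \tag{2.1.26}$$
In the present case,
$$\varphi = \log(1 + |z|^2) - \log 2,$$
hence
$$\frac{\partial\varphi}{\partial z^j} = \frac{2 z^j}{1 + |z|^2}.$$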
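As a sanity check on (2.1.25), the Christoffel symbols can be computed directly from their definition with finite differences and compared with the closed formula. This is a hedged sketch for the sphere metric $\phi = (1+|z|^2)/2$; all function names and the sample point are illustrative assumptions.

```python
def phi(z):
    """Conformal factor of the sphere chart: g_ij = delta_ij / phi^2, phi = (1 + |z|^2)/2."""
    return 0.5 * (1.0 + sum(zi * zi for zi in z))

def metric(z):
    n = len(z)
    p2 = phi(z) ** 2
    return [[(1.0 / p2 if i == j else 0.0) for j in range(n)] for i in range(n)]

def d_metric(z, k, h=1e-5):
    """Matrix of g_{ij,k}: central difference of the metric in the direction z^k."""
    n = len(z)
    z_plus = list(z)
    z_minus = list(z)
    z_plus[k] += h
    z_minus[k] -= h
    gp, gm = metric(z_plus), metric(z_minus)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(n)] for i in range(n)]

def christoffel_numeric(z):
    """Gamma[k][i][j] = 1/2 g^{kl} (g_{il,j} + g_{jl,i} - g_{ij,l}), using g^{kl} = phi^2 delta_kl."""
    n = len(z)
    dg = [d_metric(z, k) for k in range(n)]      # dg[k][i][j] = g_{ij,k}
    inv = phi(z) ** 2
    return [[[0.5 * inv * (dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
              for j in range(n)] for i in range(n)] for k in range(n)]

def christoffel_formula(z):
    """Closed formula (2.1.25) with d(log phi)/dz^j = 2 z^j / (1 + |z|^2)."""
    n = len(z)
    s = 1.0 + sum(zi * zi for zi in z)
    dlog = [2.0 * zi / s for zi in z]
    delta = lambda a, b: 1.0 if a == b else 0.0
    return [[[-delta(i, k) * dlog[j] - delta(j, k) * dlog[i] + delta(i, j) * dlog[k]
              for j in range(n)] for i in range(n)] for k in range(n)]

z = [0.4, -0.9, 0.2]
numeric = christoffel_numeric(z)
exact = christoffel_formula(z)
max_diff = max(abs(numeric[k][i][j] - exact[k][i][j])
               for k in range(3) for i in range(3) for j in range(3))
print("largest deviation between definition and (2.1.25):", max_diff)
```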
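Therefore, the equations for geodesics become
$$0 = \ddot\gamma^i + 2\sum_{j=1}^n \Gamma^i_{ij}(\gamma)\,\dot\gamma^i\dot\gamma^j - \Gamma^i_{ii}(\gamma)\,\dot\gamma^i\dot\gamma^i + \sum_{j\neq i} \Gamma^i_{jj}(\gamma)\,\dot\gamma^j\dot\gamma^j$$
(using the symmetry $\Gamma^i_{jk} = \Gamma^i_{kj}$; the index $i$ is fixed here and not summed)
$$= \ddot\gamma^i - \frac{4}{1 + |\gamma|^2}\sum_{j=1}^n \gamma^j\dot\gamma^j\,\dot\gamma^i + \frac{2}{1 + |\gamma|^2}\sum_{j=1}^n \gamma^i\,\dot\gamma^j\dot\gamma^j. \tag{2.1.27}$$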
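We now claim that the geodesic $\gamma(t)$ through the origin, i.e. $\gamma(0) = 0$,
with $\dot\gamma(0) = a \in \mathbb{R}^n$ is given by
$$\gamma(t) = a\,\alpha(t), \tag{2.1.28}$$
where $\alpha : \mathbb{R} \to \mathbb{R}$ then satisfies $\alpha(0) = 0$, $\dot\alpha(0) = 1$. Making the ansatz
(2.1.28) in (2.1.27) leads to
$$0 = a^i\ddot\alpha - \sum_{j=1}^n \frac{4\,a^j\alpha}{1 + \alpha^2|a|^2}\,a^i a^j\,\dot\alpha^2 + \sum_{j=1}^n \frac{2\,a^i\alpha}{1 + \alpha^2|a|^2}\,a^j a^j\,\dot\alpha^2 = a^i\left( \ddot\alpha - \frac{2|a|^2\alpha}{1 + |a|^2\alpha^2}\,\dot\alpha^2 \right), \quad i = 1,\dots,n.$$
Since we may assume $a \neq 0$ (otherwise the solution with $\dot\gamma(0) = a$ is
a point curve, hence uninteresting), this equation holds, if $\alpha(t)$ satisfies
the ordinary differential equation (ODE)
$$0 = \ddot\alpha - \frac{2|a|^2\alpha}{1 + |a|^2\alpha^2}\,\dot\alpha^2. \tag{2.1.29}$$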
The theorem of Picard-Lindelöf implies that (2.1.29) has a unique solution
in a neighbourhood of $t = 0$. We then have found a solution $\gamma(t)$ of
(2.1.27) of the desired form (2.1.28). The image of $\gamma(t)$ is a straight line
through 0. By Lemma 2.1.3, we have thus found all solutions through 0.
The images of the straight lines under the chart $f_1$ are the great circles
on $S^n$ through the south pole. We can now use a symmetry argument
to conclude that all the geodesic lines on $S^n$ are given by the great circles
on $S^n$. Namely, the south pole does not play any distinguished role,
and we could have constructed a local chart by stereographic projection
from any other point on $S^n$ as well, and the metric tensor would have
assumed the same form (2.1.22). More generally, one may also argue as
follows: We want to find the geodesic arc $\gamma(t)$ on $S^n$ with $\gamma(0) = p_0$,
$\dot\gamma(0) = V_0$ for some $p_0 \in S^n$, $V_0 \in T_{p_0}S^n$. Let $c_0(t)$ be the great circle on
$S^n$ parameterized such that $c_0(0) = p_0$, $\dot c_0(0) = V_0$. $c_0$ is contained in a
unique two-dimensional plane through the origin in $\mathbb{R}^{n+1}$. Let $i$ denote
the reflection across this plane. This is an isometry of $\mathbb{R}^{n+1}$ mapping $S^n$
onto itself. It therefore maps geodesics on $S^n$ onto geodesics, because
we have observed that the length and energy functionals are invariant
under isometries, and so isometries have to map critical points to critical
points. Now $i$ maps $p_0$ and $V_0$ to themselves. If $\gamma$ were not invariant
under $i$, $i\circ\gamma$ would be another geodesic with initial values $p_0$, $V_0$,
contradicting the uniqueness result of Lemma 2.1.3. Therefore, $i\circ\gamma = \gamma$,
and therefore $\gamma = c_0$.
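The claim that geodesics on the sphere are great circles can be illustrated numerically for $n = 2$. The sketch below integrates (2.1.27) with a classical Runge-Kutta scheme and checks that the image under $f_1$ stays in a plane through the origin. It also uses the fact, which one verifies by direct substitution, that $\alpha(t) = \tan(|a|t)/|a|$ solves (2.1.29) with $\alpha(0)=0$, $\dot\alpha(0)=1$. Function names, step counts and initial data are illustrative assumptions.

```python
import math

def rhs(state):
    """First-order form of (2.1.27) for n = 2; state = (gamma^1, gamma^2, v^1, v^2)."""
    g1, g2, v1, v2 = state
    s = 1.0 + g1 * g1 + g2 * g2
    gv = g1 * v1 + g2 * v2              # gamma . gamma'
    vv = v1 * v1 + v2 * v2              # |gamma'|^2
    return (v1, v2,
            (4.0 * gv * v1 - 2.0 * g1 * vv) / s,
            (4.0 * gv * v2 - 2.0 * g2 * vv) / s)

def integrate(z, v, t_end=1.0, steps=2000):
    """Classical RK4 integration; returns the list of chart points gamma(t_k)."""
    state = (z[0], z[1], v[0], v[1])
    h = t_end / steps
    path = [state[:2]]
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)))
        k4 = rhs(tuple(x + h * k for x, k in zip(state, k3)))
        state = tuple(x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
        path.append(state[:2])
    return path

def f1(z):
    """Chart f_1: R^2 -> S^2 minus the north pole."""
    s = z[0] * z[0] + z[1] * z[1]
    return (2.0 * z[0] / (1.0 + s), 2.0 * z[1] / (1.0 + s), (s - 1.0) / (s + 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def max_plane_deviation(path):
    """Largest distance of the spherical points from the plane spanned by two of them."""
    pts = [f1(z) for z in path]
    n = cross(pts[0], pts[len(pts) // 2])
    norm = math.sqrt(sum(x * x for x in n))
    return max(abs(sum(a * b for a, b in zip(n, p))) / norm for p in pts)

a = (0.6, 0.8)                                    # |a| = 1
through_origin = integrate((0.0, 0.0), a)
generic = integrate((0.5, -0.2), (0.3, 1.0))      # a geodesic not through the origin

print("gamma(1):", through_origin[-1], "expected:", (a[0] * math.tan(1.0), a[1] * math.tan(1.0)))
print("plane deviation (through origin):", max_plane_deviation(through_origin))
print("plane deviation (generic):", max_plane_deviation(generic))
```

Both deviations are at the level of the integration error, consistent with the image curves being arcs of great circles.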
We draw some conclusions:
The geodesic arc through two given points need not be unique. Namely,
let $p, q$ be antipodal points on $S^n$, e.g. north and south pole. Then there
exist infinitely many great circles that pass through both $p$ and $q$.
We shall later on see that the first conjugate point of a point $p \in S^n$
along a great circle is the antipodal point $q$ of $p$. One also sees by explicit
comparison that a geodesic arc on $S^n$ ceases to be minimizing beyond
the first conjugate point, in accordance with Theorem 1.3.4.
2.2 Fields of geodesic curves
Let $M$ be an embedded, differentiable submanifold of $\mathbb{R}^d$, or, more
generally, a Riemannian manifold of dimension $n$†, again of class $C^3$. Let
$M_0$ be a submanifold of $M$; this means that $M_0$ itself is a differentiable
submanifold of $\mathbb{R}^d$, respectively a Riemannian manifold, and that the
inclusion $i : M_0 \hookrightarrow M$ is a differentiable embedding. We assume that
$M_0$ has dimension $n - 1$, and that it is also of class $C^3$.
Theorem 2.2.1. For any $x_0$ in $M_0$, there exist a neighbourhood $V$ of
$x_0$ in $M$, and a chart $f : U \to V$ with the following properties:
(i) $U$ contains the origin of $\mathbb{R}^n$, $f(0) = x_0$.
(ii) $M_0 \cap V = f(U \cap \{x^n = 0\})$
(iii) The curves $x^i \equiv c_i$, $c_i$ = constant, $i = 1,\dots,n-1$, are geodesics
parameterized by arc-length. The arcs $\tau_1 \le x^n \le \tau_2$ on any such
† We do not introduce the concept of an abstract Riemannian manifold here, but
some readers may know that concept already, and in fact it provides the natural
setting for the theory of geodesics. On the other hand, the embedding theorem
of J. Nash says that any Riemannian manifold can be isometrically embedded into
some Euclidean space $\mathbb{R}^d$, hence considered as a submanifold of $\mathbb{R}^d$. Therefore, from
that point of view, no generality is gained by considering Riemannian manifolds
instead of submanifolds of $\mathbb{R}^d$.
curve between the hypersurfaces $x^n \equiv \tau_1$ and $x^n \equiv \tau_2$ are all of
the same length $\tau_2 - \tau_1$.
(iv) The metric tensor on $U$ satisfies
$$g_{nn} \equiv 1, \quad g_{in} \equiv 0 \quad\text{for all } i = 1,\dots,n-1. \tag{2.2.1}$$
(The second relation means that the curves $x^i \equiv c_i$, $i = 1,\dots,n-1$,
intersect the hypersurfaces $x^n \equiv$ constant orthogonally.)
Proof. Since $M_0$ is a hypersurface, for every $p \in M_0$, there exist two
unit normal vectors $\pm n(p)$ to $M_0$ at $p$, i.e.
$$n(p) \in T_pM, \quad \|n(p)\| = 1,$$
$$\langle n(p), v\rangle = 0 \quad\text{for all } v \in T_pM_0 \subset T_pM.$$
In a sufficiently small neighbourhood $V$ of $x_0$, we may assume that such
a normal vector $n(p)$ may be chosen so that it depends smoothly on
$p \in M_0 \cap V =: V_0$. We assume that there is a local chart $\varphi : U_0 \to V_0$
for $M_0$ ($U_0 \subset \mathbb{R}^{n-1}$), possibly choosing $V$ smaller, if necessary. For every
$p \in M_0 \cap V$, we then consider the geodesic arc $\gamma_p(t)$ with
$$\gamma_p(0) = p, \quad \dot\gamma_p(0) = n(p). \tag{2.2.2}$$
This geodesic exists for $|t| \le \epsilon = \epsilon(p)$ by Lemma 2.1.3. By choosing
$V$ smaller if necessary, we may assume that $\epsilon > 0$ is independent of $p$.
Instead of $\gamma_p(t)$, we write $\gamma(p,t)$. Since the solution of (2.2.2) depends
differentiably on its initial values (see Lemma 2.1.3), hence on $p$, the
map
$$f : U_0 \times (-\epsilon, \epsilon) \to M, \qquad (x,t) \mapsto \gamma(\varphi(x), t)$$
is likewise differentiable, where $\varphi : U_0 \to V_0$ is a local chart for $M_0$. We
may assume
$$x_0 = \varphi(0),$$
by composing $\varphi$ with a diffeomorphism if necessary. At $(0,0) \in U_0 \times (-\epsilon,\epsilon)$,
the Jacobian of $f$ is spanned by the linearly independent vectors
$\frac{\partial\varphi}{\partial x^1}, \dots, \frac{\partial\varphi}{\partial x^{n-1}}, n(\varphi(x))$ (note that $\gamma(\varphi(x), 0) = \varphi(x)$ and that $n(\varphi(x))$
is orthogonal to all the vectors $\frac{\partial\varphi}{\partial x^j} \in T_{\varphi(x)}M_0$, $j = 1,\dots,n-1$). Therefore,
by the inverse function theorem, $f$ yields a chart in some neighbourhood
$U$ of $(0,0) \in U_0 \times (-\epsilon, \epsilon)$. $f$ obviously satisfies (i), (ii) (after
redefining $V$). (iii) also holds by construction (putting $x^n = t$). Next,
$g_{nn} = 1$, since the curves $x^i \equiv c_i$, namely $f(c_1,\dots,c_{n-1}, t)$, $t \in (-\epsilon, \epsilon)$,
are geodesics parameterized by arc-length, hence $g_{nn} = \langle \frac{\partial f}{\partial t}, \frac{\partial f}{\partial t}\rangle = 1$.
Finally, the system of equations for these curves to be geodesic is
$$\frac{d^2 x^k}{(dx^n)^2} + \Gamma^k_{ij}\,\frac{dx^i}{dx^n}\,\frac{dx^j}{dx^n} = 0 \quad (x^n = t) \quad\text{for } k = 1,\dots,n.$$
Hence in particular
$$\Gamma^k_{nn} = 0 \quad\text{for } k = 1,\dots,n.$$
Now
$$\Gamma^k_{nn} = \frac{1}{2}\, g^{kl}\left( 2 g_{nl,n} - g_{nn,l} \right) = g^{kl} g_{nl,n},$$
since $g_{nn} \equiv 1$. Therefore
$$g_{nk,n} = 0 \quad\text{for all } k = 1,\dots,n.$$
Since furthermore $g_{nk}(x^1,\dots,x^{n-1},0) = 0$ for $k = 1,\dots,n-1$, because the geodesic arc
$x^n = t$, $x^i = c_i$ = constant, is orthogonal to the surface $\varphi(x^1,\dots,x^{n-1}) = f(x^1,\dots,x^{n-1},0)$, we obtain
$$g_{nk} \equiv 0 \quad\text{for } k = 1,\dots,n-1.$$
q.e.d.
Definition 2.2.1. The coordinates whose existence is affirmed by Theorem
2.2.1 are called geodesic parallel coordinates based on the hypersurface $M_0$.
Theorem 2.2.2. Let $f : U \to V$ be a chart with the properties described
in Theorem 2.2.1. In particular, the curves $x^i \equiv c_i$, $c_i$ = constant, for
$i = 1,\dots,n-1$ are geodesic arcs. Then any such curve is the shortest
connection of its endpoints when compared with all curves contained
entirely in $U$ and having the same endpoints.
Proof. We consider the geodesic
$$\gamma(t) = \{x^i = c_i,\ x^n = t,\ -\epsilon \le t \le \epsilon\},$$
where $U = U_0 \times (-\epsilon, \epsilon)$. Let $\tilde\gamma(t)$, $t_1 \le t \le t_2$, be another curve in $U$ with
$\tilde\gamma(t_1) = \gamma(-\epsilon)$, $\tilde\gamma(t_2) = \gamma(\epsilon)$. We have to prove
$$L(\tilde\gamma) \ge L(\gamma), \tag{2.2.3}$$
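with strict inequality, unless $\tilde\gamma$ is a reparameterization of $\gamma$. Now
$$L(\tilde\gamma) = \int_{t_1}^{t_2} \left( \sum_{i,j=1}^{n-1} g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t) + \big(\dot{\tilde\gamma}^n(t)\big)^2 \right)^{1/2} dt, \tag{2.2.4}$$
since $g_{nn} = 1$, $g_{in} = 0$ for $i = 1,\dots,n-1$ by Theorem 2.2.1(iv),
$$\ge \int_{t_1}^{t_2} |\dot{\tilde\gamma}^n(t)|\,dt \ \ge\ \tilde\gamma^n(t_2) - \tilde\gamma^n(t_1) = \gamma^n(\epsilon) - \gamma^n(-\epsilon) = L(\gamma).$$
The first inequality is strict, unless $\tilde\gamma^i$ is constant for $i = 1,\dots,n-1$,
and the second one is strict, unless $\tilde\gamma^n(t)$ is monotonic.
q.e.d.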
Following Weierstraß, we say that the geodesics
$$\gamma(t) = \{x^i = c_i,\ x^n = t,\ -\epsilon < t < \epsilon\}$$
constitute a field of geodesics. Theorem 2.2.2 essentially says that any
geodesic arc in this field is shorter than any other curve with the same
endpoints in the region covered by the field. Both properties are essential.
Namely geodesic arcs on $S^n$ that are longer than a great semicircle show
that geodesics not embedded in a field need not minimize the length
between their endpoints. And geodesic arcs on a cylinder, contained in
meridians, but longer than a semicircle show that there may be shorter
curves not contained in the field.
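We observe that if $\gamma(t)$ solves (2.1.21), so does $\gamma(\lambda t)$ for $\lambda$ = constant.
We fix $z_0 \in U$ and denote the geodesic arc $\gamma$ of Lemma 2.1.3 with
$$\gamma(0) = z_0, \quad \dot\gamma(0) = v$$
by $\gamma_v$. Then by the above observation
$$\gamma_v(t) = \gamma_{\lambda v}\!\left(\frac{t}{\lambda}\right) \quad\text{for } \lambda \neq 0. \tag{2.2.5}$$
Thus $\gamma_{\lambda v}$ is defined on $[-\frac{\epsilon}{\lambda}, \frac{\epsilon}{\lambda}]$ (for $\lambda > 0$), if $\gamma_v$ is defined on $[-\epsilon, \epsilon]$. Since $\gamma_v$ depends
differentiably on $v$, and since $\{v \in \mathbb{R}^n : |v| = 1\}$ is compact, there exists
$\epsilon_0 > 0$ with the property that for all $v$ with $|v| = 1$, $\gamma_v$ is defined
on $[-\epsilon_0, \epsilon_0]$. From (2.2.5), we then conclude that for any $w \in \mathbb{R}^n$ with
$|w| \le \epsilon_0$, $\gamma_w$ is defined on $[-1,1]$. For later purposes, we also note that
by Lemma 2.1.3, $\epsilon_0$ may be chosen to depend continuously on $z_0$.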
We now define a map
$$e = e_{z_0} : \{w \in \mathbb{R}^n : |w| \le \epsilon_0\} \to U, \qquad w \mapsto \gamma_w(1).$$
Then $e(0) = z_0$. We compute the derivative of $e$ at 0 as
$$De(0)(v) = \frac{d}{dt}\gamma_{tv}(1)\Big|_{t=0} = \frac{d}{dt}\gamma_v(t)\Big|_{t=0} \ \text{ by (2.2.5)} \ = \dot\gamma_v(0) = v.$$
Hence, the derivative of $e$ at $0 \in \mathbb{R}^n$ is the identity, and the inverse
mapping theorem implies:
Theorem 2.2.3. $e$ maps a neighbourhood of $0 \in \mathbb{R}^n$ diffeomorphically
(i.e. $e$ is bijective, and both $e$ and $e^{-1}$ are differentiable) onto a
neighbourhood of $z_0 \in U$. q.e.d.
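The computation $De(0) = \mathrm{id}$ can be observed numerically. The following sketch evaluates $e_{z_0}(w) = \gamma_w(1)$ for the sphere metric (2.1.22) in the case $n = 2$ by integrating (2.1.27), and approximates $De(0)$ by central differences. The base point, step sizes and function names are assumptions made for illustration.

```python
def rhs(state):
    """First-order form of the geodesic equation (2.1.27) for n = 2."""
    g1, g2, v1, v2 = state
    s = 1.0 + g1 * g1 + g2 * g2
    gv = g1 * v1 + g2 * v2
    vv = v1 * v1 + v2 * v2
    return (v1, v2,
            (4.0 * gv * v1 - 2.0 * g1 * vv) / s,
            (4.0 * gv * v2 - 2.0 * g2 * vv) / s)

def exp_map(z0, w, steps=200):
    """e_{z0}(w) = gamma_w(1): geodesic with gamma(0) = z0, gamma'(0) = w, evaluated at t = 1."""
    state = (z0[0], z0[1], w[0], w[1])
    h = 1.0 / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)))
        k4 = rhs(tuple(x + h * k for x, k in zip(state, k3)))
        state = tuple(x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return (state[0], state[1])

def derivative_at_zero(z0, h=1e-4):
    """Central-difference approximation of the 2x2 matrix De(0)."""
    columns = []
    for j in range(2):
        w = [0.0, 0.0]
        w[j] = h
        plus = exp_map(z0, w)
        minus = exp_map(z0, [-x for x in w])
        columns.append([(p - m) / (2.0 * h) for p, m in zip(plus, minus)])
    # columns[j][i] approximates d e^i / d w^j; transpose into row-major form
    return [[columns[j][i] for j in range(2)] for i in range(2)]

z0 = (0.5, -0.2)
D = derivative_at_zero(z0)
print("e(0) =", exp_map(z0, (0.0, 0.0)))
print("De(0) ~", D)
```

The printed matrix agrees with the identity up to the finite-difference error, although the Christoffel symbols do not vanish at $z_0$ in this chart.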
We want to normalize our chart $f : U \to V$ for $M$. First of all, we may
assume
$$z_0 = 0 \tag{2.2.6}$$
for the point $z_0 \in U$ under consideration. Secondly, the transformation
formula (2.1.16) implies that we may perform a linear change of coordinates
(i.e. replace $f$ by $f\circ A$, where $A \in GL(n,\mathbb{R})$) in order to achieve
$$g_{ij}(0) = \delta_{ij}. \tag{2.2.7}$$
We assume that $f : U \to V$ satisfies these normalizations. We then
replace $f$ by $f\circ e$ defined on $\{w \in \mathbb{R}^n : |w| \le \epsilon_0\}$.
Theorem 2.2.4. In this new chart, the metric tensor satisfies
$$g_{ij}(0) = \delta_{ij} \tag{2.2.8}$$
$$\Gamma^i_{jk}(0) = 0 = g_{ij,k}(0) \quad\text{for all } i,j,k. \tag{2.2.9}$$
Proof. By (2.1.16), $g_{ij}(0) = \delta_{ij}$ holds, since the metric tensor w.r.t. the
chart $f$ satisfies this property and $De(0)$ is the identity by the proof
of Theorem 2.2.3. In order to verify (2.2.9), we observe that in our new
chart, the straight lines $tv$ ($v \in \mathbb{R}^n$, $|t|\,|v| \le \epsilon_0$) are geodesics. Namely, $tv$ is
mapped to $\gamma_{tv}(1) = \gamma_v(t)$ (see (2.2.5)), where $\gamma_v(t)$ is the geodesic with
initial direction $v$. We thus insert $\gamma(t) = tv$ into the geodesic equation
(2.1.21). Then $\ddot\gamma = 0$, hence
$$\Gamma^i_{jk}(tv)\,v^j v^k = 0 \quad\text{for } i = 1,\dots,n.$$
In particular, inserting $t = 0$, we get
$$\Gamma^i_{jk}(0)\,v^j v^k = 0 \quad\text{for all } v \in \mathbb{R}^n,\ i = 1,\dots,n.$$
We use $v = e_l$, where $(e_l)_{l=1,\dots,n}$ is an orthonormal basis of $\mathbb{R}^n$. Then
$$\Gamma^i_{ll}(0) = 0 \quad\text{for all } i \text{ and } l.$$
We next insert $v = \frac{1}{2}(e_l + e_m)$, $l \neq m$. The symmetry $\Gamma^i_{jk} = \Gamma^i_{kj}$ (which
directly follows from the definition of $\Gamma^i_{jk}$ and the symmetry $g_{jk} = g_{kj}$)
then yields
$$\Gamma^i_{lm}(0) = 0 \quad\text{for all } i, l, m.$$
The vanishing of $g_{ij,k}$ for all $i,j,k$ then is an easy exercise in linear
algebra. q.e.d.
Definition 2.2.2. The local coordinates $x^1,\dots,x^n$ constructed before
Theorem 2.2.4 are called Riemannian normal coordinates.
We let $x^1,\dots,x^n$ be Riemannian normal coordinates. We transform
them into polar coordinates $r, \varphi^1,\dots,\varphi^{n-1}$ in the standard manner (e.g.
if $n = 2$, $x^1 = r\cos\varphi^1$, $x^2 = r\sin\varphi^1$). This coordinate transformation
is of course singular at 0. We now express the metric tensor w.r.t. these
polar coordinates. We write $g_{rr}$ instead of $g_{11}$, and we write $g_{r\varphi}$ instead
of $g_{1l}$, $l = 2,\dots,n$, and $g_{\varphi\varphi}$ instead of $(g_{kl})_{k,l=2,\dots,n}$. In particular, by
Theorem 2.2.4 and the transformation rule (2.1.16)
$$g_{rr}(0) = 1, \quad g_{r\varphi}(0) = 0. \tag{2.2.10}$$
The lines through the origin are geodesics by the construction of Riemannian
normal coordinates, and in polar coordinates, they now become
the curves $\varphi = (\varphi^1,\dots,\varphi^{n-1}) \equiv$ constant; thus they can be written as
$$\gamma(t) = (t, \varphi_0) \quad\text{with fixed } \varphi_0.$$
Therefore, the geodesic equation (2.1.21) gives
$$\Gamma^i_{rr} = 0 \quad\text{for all } i$$
(where of course $\Gamma^i_{rr}$ stands for $\Gamma^i_{11}$), i.e.
$$\frac{1}{2}\, g^{il}\left( 2 g_{rl,r} - g_{rr,l} \right) = 0 \quad\text{for all } i,$$
hence
$$2 g_{rl,r} - g_{rr,l} = 0 \quad\text{for all } l. \tag{2.2.11}$$
Putting $l = r$ gives
$$g_{rr,r} = 0,$$
and with (2.2.10) then
$$g_{rr} \equiv 1. \tag{2.2.12}$$
Using this in (2.2.11) gives
$$g_{r\varphi,r} = 0,$$
hence with (2.2.10) again
$$g_{r\varphi} \equiv 0. \tag{2.2.13}$$
We have thus shown:
Theorem 2.2.5. In the preceding coordinates, so called Riemannian
polar coordinates, that are obtained by transforming Riemannian normal
coordinates into polar coordinates, the metric tensor has the form
$$\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & g_{\varphi\varphi}(r,\varphi) & \\ 0 & & & \end{pmatrix},$$
where $g_{\varphi\varphi}$ stands for the $(n-1)\times(n-1)$-matrix of the components of
the metric tensor w.r.t. the angular variables $\varphi^1,\dots,\varphi^{n-1}$.
Note that this generalizes the situation for Euclidean polar coordinates.
The Euclidean metric on $\mathbb{R}^2$, written in polar coordinates, e.g.
takes the form
$$\begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}.$$
Note that Theorem 2.2.5, in contrast to Theorem 2.2.4, is valid on the
whole chart, not only at the origin.
Corollary 2.2.1. Riemannian polar coordinates are geodesic parallel
coordinates based on the hypersurfaces $r$ = constant ($r \neq 0$, since $r = 0$
corresponds to a single point, and not a hypersurface).
Proof. By Theorem 2.2.5, all properties stated in Theorem 2.2.1 hold.
q.e.d.
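The Euclidean example can be reproduced from the transformation rule (2.1.16): with the Euclidean metric $\delta_{kl}$ and the map $(r,\varphi) \mapsto (r\cos\varphi, r\sin\varphi)$, the transformed metric is $J^T J$ for the Jacobian $J$. The sketch below approximates $J$ by central differences; names and the sample point are illustrative assumptions.

```python
import math

def polar_to_cartesian(r, phi):
    return (r * math.cos(phi), r * math.sin(phi))

def polar_metric(r, phi, h=1e-6):
    """g_ij = delta_kl (dx^k/du^i)(dx^l/du^j) for u = (r, phi), cf. (2.1.16)."""
    d_r = [(a - b) / (2.0 * h)
           for a, b in zip(polar_to_cartesian(r + h, phi), polar_to_cartesian(r - h, phi))]
    d_phi = [(a - b) / (2.0 * h)
             for a, b in zip(polar_to_cartesian(r, phi + h), polar_to_cartesian(r, phi - h))]
    columns = [d_r, d_phi]
    return [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(2)]
            for i in range(2)]

r, phi = 1.7, 0.4
g = polar_metric(r, phi)
print(g)            # close to [[1, 0], [0, r**2]]
```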
By Corollary 2.2.1 and Theorem 2.2.2, the curves $\varphi \equiv$ constant, $r_1 \le r \le r_2$,
are shortest connections between their end points among all
curves lying in the chart. We are now going to observe that this holds
even globally, i.e. also in comparison with curves that may leave the
chart:
Theorem 2.2.6. For each $p \in M$, there exists $\epsilon_0 > 0$ with the property
that Riemannian polar coordinates centered at $p$ may be introduced with
domain
$$\{(r,\varphi) : 0 < r \le \epsilon_0\}.$$
$\epsilon_0$ may be chosen to depend continuously on $p$. We denote the subset of
$M$ corresponding to this coordinate domain by $B(p,\epsilon_0)$. For any $\epsilon$ with
$0 < \epsilon \le \epsilon_0$ and any $q \in \partial B(p,\epsilon)$, there exists precisely one geodesic of
shortest length ($= \epsilon$) from $p$ to $q$. Namely, if $q$ has coordinates $(\epsilon, \varphi_0)$,
this geodesic arc is given by $\gamma(t) = (t, \varphi_0)$, $0 \le t \le \epsilon$.
Proof. The first claim follows from Theorem 2.2.3, since Riemannian
polar coordinates are based on the diffeomorphism $e$ (see the constructions
before Theorems 2.2.4 and 2.2.5). As already noted before Theorem
2.2.3, Lemma 2.1.3 implies that we may choose $\epsilon_0$ as a continuous
function of $p$. In order to verify the second claim, let $c(t)$ be a curve from
$p$ to $q$, with $c(0) = p$. Let
$$t_0 := \sup\{t \ge 0 : c(\tau) \in B(p,\epsilon) \text{ for } 0 \le \tau \le t\}.$$
Since w.l.o.g. $\epsilon > 0$ and $c$ is continuous, $t_0$ is positive. We are going to
show that
$$L\big(c|_{[0,t_0]}\big) \ge \epsilon. \tag{2.2.14}$$
Since the curve $(t,\varphi_0)$, $0 \le t \le \epsilon$, has length $\epsilon$ as easily follows from
Theorem 2.2.5, this will imply the claim. In order to verify (2.2.14), we
proceed as follows:
$$L\big(c|_{[0,t_0]}\big) = \int_0^{t_0} \left( g_{ij}(c(t))\,\dot c^i(t)\,\dot c^j(t) \right)^{1/2} dt$$
(identifying $c$ with its coordinate representation)
$$\ge \int_0^{t_0} \left( g_{rr}\,\dot r\,\dot r \right)^{1/2} dt$$
by Theorem 2.2.5 and since $g_{\varphi\varphi}$ is positive definite (writing $c(t) = (r(t),\varphi(t))$)
$$= \int_0^{t_0} |\dot r|\,dt, \quad\text{again by Theorem 2.2.5}$$
$$\ge \int_0^{t_0} \dot r\,dt = r(t_0) = \epsilon.$$
Here, equality only holds if $g_{\varphi\varphi}\,\dot\varphi\dot\varphi \equiv 0$, i.e. $\varphi(t)$ = constant, $\dot r \ge 0$, i.e. if
$c|_{[0,t_0]}$ is a straight line through the origin. The second claim now easily
follows. q.e.d.
Corollary 2.2.2. If $M$ is compact, there exists $\epsilon_0 > 0$ with the property
that for every $p \in M$, there exist Riemannian polar coordinates with
domain
$$\{(r,\varphi) : 0 < r \le \epsilon_0\}.$$
Proof. This follows from Theorem 2.2.6, since the constructions employed
for polar coordinates depend continuously on $p$ (see essentially
the construction of the diffeomorphism $e$). q.e.d.
2.3 The existence of geodesics
Definition 2.3.1. Let $M$ be a connected differentiable submanifold of
Euclidean space $\mathbb{R}^d$, or, more generally†, a connected Riemannian manifold.
The distance between $p, q \in M$ is
$$d(p,q) := \inf\{L(c) \mid c : [a,b] \to M \text{ rectifiable curve with } c(a) = p,\ c(b) = q\}.$$
Theorem 2.3.1. Let $M$ (as in Definition 2.3.1) be compact. There
exists $\epsilon_0 > 0$ with the property that any two points $p, q \in M$ with
$$d(p,q) \le \epsilon_0$$
can be connected by a unique shortest geodesic arc (i.e. of length $d(p,q)$).
This geodesic arc depends continuously on $p$ and $q$.
Proof. We take $\epsilon_0$ as described in Corollary 2.2.2. This gives a unique
shortest geodesic arc from $p$ to $q$ which furthermore depends continuously
on $q$. Exchanging the roles of $p$ and $q$ then yields continuous
dependence on $p$, too. q.e.d.
† See the footnote at the beginning of Section 2.2.
We now proceed to establish a global result:
Theorem 2.3.2. Let $M$ be a compact connected differentiable submanifold
of $\mathbb{R}^d$, or, more generally, a compact connected Riemannian manifold.
Then any two points $p, q \in M$ can be connected by a shortest
geodesic arc (i.e. of length $d(p,q)$).
Proof. Let $(c_n)_{n\in\mathbb{N}}$ be a minimizing sequence. We may assume w.l.o.g.
that all $c_n$ are parameterized on the interval $[0,1]$ and proportionally to
arc-length. Thus
$$c_n(0) = p, \quad c_n(1) = q,$$
$$L(c_n) \to d(p,q) \quad\text{for } n \to \infty.$$
For each $n$, we may find
$$t_{0,n} = 0 < t_{1,n} < \dots < t_{m,n} = 1$$
with
$$L\big(c_n|_{[t_{j-1,n},\,t_{j,n}]}\big) \le \epsilon_0,$$
with $\epsilon_0$ given by Theorem 2.3.1. By Theorem 2.3.1, there exists a unique
shortest geodesic arc between $c_n(t_{j-1,n}) =: p_{j-1,n}$ and $c_n(t_{j,n}) =: p_{j,n}$.
We replace $c_n|_{[t_{j-1,n},\,t_{j,n}]}$ by this shortest geodesic arc and obtain a
new minimizing sequence, again denoted by $c_n$, that now is piecewise
geodesic. Since the lengths of the $c_n$ are bounded because of the minimizing
property, we may actually assume that $m$ is independent of $n$.
Since $M$ is compact, after selecting a subsequence of $c_n$, the points $p_{j,n}$
converge to limit points $p_j$ ($j = 0,\dots,m$) as $n \to \infty$. $c_n|_{[t_{j-1,n},\,t_{j,n}]}$, the
unique shortest geodesic arc between $p_{j-1,n}$ and $p_{j,n}$, then converges to
the unique shortest geodesic arc between $p_{j-1}$ and $p_j$ (for this point, one
verifies that limits of geodesic arcs are again geodesic arcs, that limits
of shortest arcs are again shortest arcs, that $d(p_{j-1},p_j) \le \epsilon_0$, and one
uses Theorem 2.3.1). We thus obtain a piecewise geodesic limit curve $c$,
with $c(0) = p$, $c(1) = q$, and
$$L(c) = \lim_{n\to\infty} L(c_n),$$
since we have for the geodesic pieces
$$L\big(c|_{[t_{j-1},\,t_j]}\big) = \lim_{n\to\infty} L\big(c_n|_{[t_{j-1,n},\,t_{j,n}]}\big)$$
for all $j$ ($t_j = \lim_{n\to\infty} t_{j,n}$). Since the $c_n$ constitute a minimizing sequence,
$$L(c) = d(p,q),$$
and $c$ thus is of shortest possible length. This implies that $c$ is geodesic.
Namely, otherwise we could find $0 \le s_1 < s_2 \le 1$ with $L\big(c|_{[s_1,s_2]}\big) \le \epsilon_0$,
but with $c|_{[s_1,s_2]}$ not being geodesic. Replacing $c|_{[s_1,s_2]}$ by the shortest
geodesic arc between $c(s_1)$ and $c(s_2)$ would yield a shorter curve (cf.
Theorem 2.2.6), contradicting the minimizing property of $c$.
q.e.d.
Thus, any two points on a compact $M$ may be connected by a shortest
geodesic. We now pose the question whether they can be connected by
more than one geodesic, not necessarily the shortest. On $S^n$, for example,
this is clearly the case. Actually, the answer is that it is the case on any
compact $M$. That result needs a topological result that is not available
to us here, however. Therefore, we will restrict ourselves to a special
case which, however, already displays the crucial geometric idea of the
construction for the general case, too.
Theorem 2.3.3. Let $M$ be a differentiable submanifold of Euclidean
space $\mathbb{R}^d$ (or more generally†, a Riemannian manifold), diffeomorphic
to the sphere $S^2$. The latter condition means that there exists a bijective
map
$$h : S^2 \to M$$
that is differentiable in both directions. Then any two points $p, q \in M$
can be connected by at least two geodesics.
Proof. $M$ is compact and connected since it is diffeomorphic to $S^2$, which is
compact and connected. Let us assume $p \neq q$. We leave it to the reader
to modify our constructions in order that they also apply to the case
$p = q$. (In that case, Theorem 2.3.3 asserts the existence of a nonconstant
geodesic $c : [0,1] \to M$ with $c(0) = p = c(1)$.) One may then construct
a diffeomorphism
$$h_0 : S^2 \to M$$
with the following properties:
Let $S^2 = \{(x^1,x^2,x^3) \in \mathbb{R}^3 : |x| = 1\}$. Then
$$p = h_0(0,0,1), \quad q = h_0(0,0,-1)$$
and a shortest geodesic arc $c : [0,1] \to M$ with $c(0) = p$, $c(1) = q$ is
given by
$$c(t) = h_0(0, \sin\pi t, \cos\pi t).$$
† See the footnote at the beginning of Section 2.2.
Let us point out that these normalizations are not at all essential, but
only convenient for our constructions. We look at the family of curves
$$\gamma(t,s) := h_0(\sin 2\pi s\,\sin\pi t,\ \cos 2\pi s\,\sin\pi t,\ \cos\pi t), \quad 0 \le s,t \le 1. \tag{2.3.1}$$
Then
$$\gamma(t,0) = \gamma(t,1) = c(t) \quad\text{for all } t$$
and
$$\gamma(0,s) = c(0), \quad \gamma(1,s) = c(1) \quad\text{for all } s.$$
We find some number $\kappa$ with
$$L(\gamma(\cdot,s)) \le \kappa \quad\text{for all } s. \tag{2.3.2}$$
Redefining the parameter $t$, we may also assume that all curves $\gamma(\cdot,s)$
are parameterized proportionally to arc-length. By Theorem 2.3.1, there
exists $\epsilon_0 > 0$ such that the shortest geodesic between any $p, q \in M$ with
$d(p,q) \le \epsilon_0$ is unique. Let
$$0 = t_0 < t_1 < \dots < t_m = 1$$
be a partition of $[0,1]$ with
$$t_j - t_{j-1} < \frac{\epsilon_0}{\kappa} \quad\text{for } j = 1,\dots,m. \tag{2.3.3}$$
Let another partition $(\tau_1,\dots,\tau_m)$ satisfy
$$\tau_0 := t_0 < \tau_1 < t_1 < \tau_2 < \dots < \tau_m < t_m =: \tau_{m+1}$$
and
$$\tau_j - \tau_{j-1} < \frac{\epsilon_0}{\kappa} \quad\text{for } j = 1,\dots,m+1. \tag{2.3.4}$$
If $\gamma : [0,1] \to M$ is any curve parameterized proportionally to arc-length
with
$$L(\gamma) \le \kappa,$$
we then have for $j = 1,\dots,m$
$$d(\gamma(t_{j-1}),\gamma(t_j)) \le L\big(\gamma|_{[t_{j-1},t_j]}\big) < \kappa\,\frac{\epsilon_0}{\kappa} = \epsilon_0.$$
Therefore, by Theorem 2.3.1, the shortest geodesic from $\gamma(t_{j-1})$ to $\gamma(t_j)$
is unique. We then define $r_1(\gamma)$ to be that piecewise geodesic curve for
which $r_1(\gamma)|_{[t_{j-1},t_j]}$ coincides with the shortest geodesic from $\gamma(t_{j-1})$
to $\gamma(t_j)$, $j = 1,\dots,m$. Likewise, we let $r_2(\gamma)$ be that piecewise geodesic
curve for which $r_2(\gamma)|_{[\tau_{j-1},\tau_j]}$ coincides with the again unique shortest
geodesic from $\gamma(\tau_{j-1})$ to $\gamma(\tau_j)$, $j = 1,\dots,m+1$. We now observe:
Lemma 2.3.1. Suppose $d(\gamma(t_j),\gamma(t_{j-1})) < \epsilon_0$ and $d(\gamma(\tau_j),\gamma(\tau_{j-1})) < \epsilon_0$
for all $j$. Then, for
$$r(\gamma) := r_2\circ r_1(\gamma),$$
$$L(r(\gamma)) \le L(\gamma) \tag{2.3.5}$$
with equality iff $\gamma$ is geodesic.
Proof. By uniqueness of the shortest geodesic between $\gamma(t_{j-1})$ and $\gamma(t_j)$,
we have
$$L(r_1(\gamma)) \le L(\gamma)$$
with equality only in case
$$r_1(\gamma) = \gamma.$$
Likewise, for every curve $\gamma'$ with $L\big(\gamma'|_{[\tau_{j-1},\tau_j]}\big) \le \epsilon_0$ for all $j$,
$$L(r_2(\gamma')) \le L(\gamma')$$
with equality only in case
$$r_2(\gamma') = \gamma'.$$
Therefore
$$L(r(\gamma)) \le L(\gamma)$$
with equality only if
$$r(\gamma) = \gamma.$$
If $r(\gamma) = \gamma$, however, $\gamma|_{[t_{j-1},t_j]}$ and $\gamma|_{[\tau_{j-1},\tau_j]}$ are geodesic for every $j$,
and hence $\gamma$ is geodesic itself. (If $r_1(\gamma) = \gamma$, then $\gamma$ is piecewise geodesic
with corners at most at the $t_j$, and if $r_2(r_1(\gamma)) = r_1(\gamma)$, then $r_1(\gamma)$ is
geodesic with corners at most at the $\tau_j$. Thus, if $r(\gamma) = \gamma$, $\gamma$ cannot have
any corners at all.)
q.e.d.
Lemma 2.3.2. Let $\gamma : [0,1] \to M$ be a curve parameterized proportionally
to arc-length and with $L(\gamma) \le \kappa$. Then a subsequence of $r^n(\gamma)$
($= r\circ\dots\circ r(\gamma)$) converges uniformly to a geodesic with the same endpoints
as $\gamma$.
Proof. Each curve $r^n(\gamma)$, $n \in \mathbb{N}$, is a piecewise geodesic with corners
$r^n\gamma(\tau_1),\dots,r^n\gamma(\tau_m)$ and endpoints $r^n\gamma(\tau_0) = \gamma(0)$, $r^n\gamma(\tau_{m+1}) = \gamma(1)$.
The individual segments are the unique shortest connections between
these points. Therefore, each such curve is uniquely determined by the
$m$-tuple
$$A^n := (r^n\gamma(\tau_1),\dots,r^n\gamma(\tau_m)) \in \underbrace{M\times\dots\times M}_{m \text{ times}}.$$
Since $M$ is compact, a subsequence of $A^n$ converges to some limit
$$(p_1,\dots,p_m) \in M\times\dots\times M.$$
$r^n(\gamma)$ then converges uniformly towards the piecewise geodesic $\gamma_0$ with
endpoints $\gamma_0(0) = \gamma(0)$, $\gamma_0(1) = \gamma(1)$ and nodes $\gamma_0(\tau_i) = p_i$ ($i = 1,\dots,m$)
with segments $\gamma_0|_{[\tau_{j-1},\tau_j]}$ being the shortest geodesic arcs between
their endpoints. This follows from the continuous dependence of
the occurring geodesic arcs on their endpoints (Theorem 2.3.1). We denote
the convergent subsequence of $(r^n(\gamma))_{n\in\mathbb{N}}$ by $(\gamma_\nu)_{\nu\in\mathbb{N}}$. For all $\nu \in \mathbb{N}$
then
$$\gamma_{\nu+1} = r^{n(\nu)}\gamma_\nu \quad\text{with } n(\nu) \in \mathbb{N}.$$
By the minimizing property of the subsegments of the $\gamma_\nu$,
$$L\big(\gamma_\nu|_{[\tau_{j-1},\tau_j]}\big) = d(\gamma_\nu(\tau_{j-1}),\gamma_\nu(\tau_j)),$$
hence
$$L(\gamma_\nu) = \sum_{j=1}^{m+1} d(\gamma_\nu(\tau_{j-1}),\gamma_\nu(\tau_j)).$$
Since $\gamma_\nu(\tau_j)$ converges to $p_j = \gamma_0(\tau_j)$, $L(\gamma_\nu)$ converges to
$$L(\gamma_0) = \sum_{j=1}^{m+1} d(\gamma_0(\tau_{j-1}),\gamma_0(\tau_j))$$
for $\nu \to \infty$. Then also
$$L(\gamma_0) = \lim_{\nu\to\infty} L(\gamma_{\nu+1}) = \lim_{\nu\to\infty} L\big(r^{n(\nu)}\gamma_\nu\big) \le \lim_{\nu\to\infty} L(\gamma_\nu) \ \text{ by Lemma 2.3.1} \ = L(\gamma_0),$$
and equality has to hold throughout. Moreover, $r(\gamma_\nu)$ converges to $r(\gamma_0)$,
and
$$L(r(\gamma_0)) = \lim_{\nu\to\infty} L(r(\gamma_\nu)) \ge \lim_{\nu\to\infty} L\big(r^{n(\nu)}\gamma_\nu\big) \ \text{ by Lemma 2.3.1 again} \ = L(\gamma_0).$$
Lemma 2.3.1 then implies that $\gamma_0$ is geodesic.
q.e.d.
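The curve shortening process of Lemmas 2.3.1 and 2.3.2 can be imitated on the unit sphere $S^2 \subset \mathbb{R}^3$, where the shortest geodesic between two nearby points is a great-circle arc. The sketch below is a simplified discrete variant and not the construction of the text itself: a piecewise geodesic curve is stored through its nodes, and the parameter values $t_j$, $\tau_j$ are replaced by midpoints of the current segments (an assumption made to keep the code short). The endpoints, the zigzag start curve and all function names are likewise illustrative.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def dist(a, b):
    """Great-circle distance on the unit sphere."""
    d = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, d)))

def midpoint(a, b):
    """Midpoint of the shortest great-circle arc between a and b (not antipodal)."""
    return normalize(tuple(x + y for x, y in zip(a, b)))

def length(nodes):
    """Length of the piecewise geodesic curve through the given nodes."""
    return sum(dist(a, b) for a, b in zip(nodes, nodes[1:]))

def shorten(nodes):
    """One discrete step r = r2 o r1 with fixed endpoints."""
    # r1: keep the endpoints and sample the curve at interior points of its segments
    q = ([nodes[0]]
         + [midpoint(a, b) for a, b in zip(nodes[1:-1], nodes[2:-1])]
         + [nodes[-1]])
    # r2: sample the new curve between the nodes chosen by r1
    return [q[0]] + [midpoint(a, b) for a, b in zip(q, q[1:])] + [q[-1]]

m = 9                                         # number of interior nodes
p, q_end = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
nodes = ([p]
         + [normalize((math.cos(k * math.pi / (2 * (m + 1))),
                       math.sin(k * math.pi / (2 * (m + 1))),
                       0.3 * (-1) ** k)) for k in range(1, m + 1)]
         + [q_end])

lengths = [length(nodes)]
for _ in range(2000):
    nodes = shorten(nodes)
    lengths.append(length(nodes))

print("initial length:", lengths[0])
print("final length:  ", lengths[-1])
print("d(p, q):       ", dist(p, q_end))
```

Each step replaces the curve by an inscribed piecewise geodesic, so the length cannot increase, in analogy with (2.3.5). In this example the lengths decrease towards $d(p,q) = \pi/2$.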
We now return to the proof of Theorem 2.3.3:
We apply the preceding curve shortening process to all curves $\gamma(\cdot, s)$,
$s \in [0,1]$, simultaneously. For each s, a subsequence of $r^n\gamma(\cdot, s)$ then
converges to a geodesic from p to q. We want to exclude the situation
that all those limit geodesics coincide with c. Let
$$\kappa_0 := L(c),$$
and
$$\kappa_1 := \sup_{0 \le s \le 1} \lim_{n \to \infty} L(r^n\gamma(\cdot, s)).$$
Since $\gamma(\cdot, 0) = c(\cdot)$ is geodesic, $r^n\gamma(\cdot, 0) = \gamma(\cdot, 0)$ for all n, hence $\kappa_1 \ge \kappa_0$.
We distinguish two cases:
(1) $\kappa_1 > \kappa_0$
Since $\gamma(\cdot, s)$ is continuous in s, so is $r^n\gamma(\cdot, s)$ for every $n \in \mathbb{N}$. We
now claim:
Whenever
$$\sup_s L(r^n\gamma(\cdot, s)) \le \kappa_1 + \varepsilon \tag{2.3.6}$$
there exists $s_n \in [0,1]$ with
$$L(r^n\gamma(\cdot, s_n)) - L(r^{n+1}\gamma(\cdot, s_n)) \le 2\varepsilon \tag{2.3.7}$$
and
$$L(r^n\gamma(\cdot, s_n)) \ge \kappa_1 - \varepsilon. \tag{2.3.8}$$
Indeed, otherwise
$$\sup_s L(r^{n+1}\gamma(\cdot, s)) < \kappa_1 - \varepsilon,$$
contradicting the definition of $\kappa_1$ (note that $\sup_s L(r^n\gamma(\cdot, s))$
is monotonically decreasing in n by Lemma 2.3.1). By definition
of $\kappa_1$, there exists a sequence $(\varepsilon_n)_{n \in \mathbb{N}} \to 0$ with
$$\sup_s L(r^n\gamma(\cdot, s)) \le \kappa_1 + \varepsilon_n.$$
A subsequence of $(r^n\gamma(\cdot, s_n))_{n \in \mathbb{N}}$ has to converge to some limit
curve $\tilde c$ as above, and because of (2.3.7) with $\varepsilon = \varepsilon_n$, we conclude
as in the proof of Lemma 2.3.2 that
$$L(r(\tilde c)) = L(\tilde c),$$
and $\tilde c$ is hence geodesic by Lemma 2.3.1. Because of (2.3.8) and
continuity of L in the limit as in the proof of Lemma 2.3.2, we
get
$$L(\tilde c) = \kappa_1.$$
Since c and $\tilde c$ are both defined on [0,1] and have different lengths,
they have to be different curves. Thus, $\tilde c$ is the desired second
geodesic.
(2) $\kappa_1 = \kappa_0$
We are going to show that in this case, there even exist infinitely
many geodesics from p to q. For that purpose, we consider the
curve
$$\tilde\gamma(s) := \gamma(\tfrac{1}{2}, s).$$
This is a closed curve with $\tilde\gamma(0) = \tilde\gamma(1) = c(\tfrac{1}{2})$ (see Figure 2.2).
Since $h_0$ is a diffeomorphism and $r^n\gamma(t, s)$ is obtained through a process
that can easily be made continuous from
$$\gamma(t, s) = h_0(\sin 2\pi s \, \sin \pi t, \; \cos 2\pi s \, \sin \pi t, \; \cos \pi t),$$
$r^n\gamma(t, s)$ has to map $[0,1] \times [0,1]$ surjectively onto M. Therefore, for
every $n \in \mathbb{N}$ and every $s \in [0,1]$, there exists $\sigma_n(s)$ with
$$\tilde\gamma(s) \in r^n\gamma(\cdot, \sigma_n(s)) =: \gamma_{n,s}(\cdot)$$
(in other words, $r^n\gamma(\cdot, \sigma_n(s))$ is a curve passing through $\tilde\gamma(s)$). $\gamma_{n,s}(\cdot)$
then is a curve with
$$\gamma_{n,s}(0) = c(0) = p, \quad \gamma_{n,s}(1) = c(1) = q,$$
and because of $\kappa_1 = \kappa_0$, we obtain
$$\lim_{n \to \infty} L(\gamma_{n,s}(\cdot)) \le \sup_s \lim_{n \to \infty} L(r^n\gamma(\cdot, s)) = \kappa_0. \tag{2.3.9}$$
Figure 2.2.
After selection of a subsequence, $(\gamma_{n,s}(\cdot))_{n \in \mathbb{N}}$ again converges to some
limit curve $c_s(\cdot)$ with
$$c_s(0) = p, \quad c_s(1) = q$$
and
$$\tilde\gamma(s) \in c_s(\cdot).$$
By (2.3.5),
$$L(c_s(\cdot)) \le \kappa_0,$$
and since $\kappa_0$ is the infimum of the lengths of all curves from p to q
($\kappa_0 = L(c)$, and c is minimizing), $c_s(\cdot)$ is a minimizing curve itself,
hence geodesic.
Therefore, we have shown that for every s, there exists a geodesic
from p to q that passes through $\tilde\gamma(s)$. Hence there exist infinitely many
geodesics from p to q, as claimed.
q.e.d.
Remarks:
(1) Lemmas 2.3.1 and 2.3.2 do not need that M is diffeomorphic to
$S^2$. Compactness suffices.
(2) We may construct the curves $\gamma_{n,s}(\cdot)$ at the end of the proof also
in case $\kappa_1 > \kappa_0$. In that case, however, limits of such curves need
not be geodesic anymore.
(3) See Section 3.1 for an abstract version of the argument at the end
of the preceding proof.
Exercises
2.1 For curves $\gamma(t) = (\gamma^1, \gamma^2) : \mathbb{R} \to \{(x^1, x^2) \in \mathbb{R}^2 \mid x^2 > 0\}$, consider

Compute the Euler-Lagrange equations and determine all solutions.
2.2 For curves
$\gamma(t) = (\gamma^1, \dots, \gamma^d) : \mathbb{R} \to \{(x^1, \dots, x^d) \in \mathbb{R}^d \mid \sum_{i=1}^{d} (x^i)^2 < 1\}$,
consider

Compute the Euler-Lagrange equations and determine all solutions.
2.3 Determine all geodesics between two given points on a cylinder
$\{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = 1\}$.
2.4 Let $\Sigma$ be a surface of revolution in $\mathbb{R}^3$, i.e.
$\Sigma = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = f(z)\}$
for a smooth, positive $f : \mathbb{R} \to \mathbb{R}$. What can you say about
geodesics on $\Sigma$? For example, are the curves (x, y) = constant
geodesics? When are the curves z = constant geodesics?
2.5 Determine Riemannian polar coordinates on the sphere $S^n$ with
a domain of definition that is as large as possible.
2.6 Let p be the center of Riemannian polar coordinates on M, with
domain of definition $\{v \in \mathbb{R}^d : \|v\| < \rho\}$. Let $c : [0, \varepsilon] \to M$ be
a geodesic with c(0) = p that is parameterized by arc-length,
$0 < \varepsilon < \rho$. Show that $c([0, \varepsilon])$ does not contain a point that is
conjugate to p.
2.7 Let M be a differentiable submanifold of $\mathbb{R}^d$ that is diffeomorphic
to $S^2$. Show that for any $p \in M$, there exists a nonconstant
geodesic $c : [0,1] \to M$ with c(0) = c(1) = p.
2.8 Try to find other topological classes of manifolds with the property
that there always exists more than one geodesic connection
between any two points.
3
Saddle point constructions
3.1 A finite dimensional example
Let $F : \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^1$ which is bounded from below
and which is 'proper' in the following sense:
$$F(x) \to \infty \quad \text{for } |x| \to \infty. \tag{3.1.1}$$
Since F is bounded from below, (3.1.1) is equivalent to: For every $s \in \mathbb{R}$,
$$\{x \in \mathbb{R}^d : F(x) \le s\} \text{ is compact.} \tag{3.1.2}$$
Therefore, F assumes its infimum. Namely, we take any
$$s_0 > \inf_{x \in \mathbb{R}^d} F(x).$$
Then
$$\{x \in \mathbb{R}^d : F(x) \le s_0\}$$
is compact and nonempty, and since F is continuous, it has to assume
its infimum on that set. We now assume that F even has two relative
minima, $x_1, x_2$ in $\mathbb{R}^d$, and that they are strict in the following sense: For
$x = x_1, x_2$, we have
$$\exists \delta_0 > 0 \;\; \forall y \text{ with } 0 < |y - x| \le \delta_0 : \; F(y) > F(x). \tag{3.1.3}$$
Theorem 3.1.1. Under the above assumptions, F has a third critical
point $x_3$ (i.e. $\nabla F(x_3) = 0$) with
$$F(x_3) > \max(F(x_1), F(x_2)) =: \kappa_0.$$
Proof. We consider curves $\gamma : [0,1] \to \mathbb{R}^d$ with
$$\gamma(0) = x_1, \quad \gamma(1) = x_2. \tag{3.1.4}$$
We first observe that there exists $\alpha > 0$ with the property that for any
such curve, there exists $t_0 \in (0,1)$ with
$$F(\gamma(t_0)) \ge \kappa_0 + \alpha. \tag{3.1.5}$$
In order to verify this, we may assume w.l.o.g.
$$F(x_1) \le F(x_2).$$
We then choose $\delta$ with
$$0 < \delta < \min(\delta_0, \tfrac{1}{2}|x_1 - x_2|). \tag{3.1.6}$$
For every y with $|y - x_2| = \delta$ then by (3.1.3)
$$F(y) > F(x_2),$$
and since $\{|y - x_2| = \delta\}$ is compact, F assumes its minimum on this set,
hence for some $\alpha > 0$
$$\min_{|y - x_2| = \delta} F(y) \ge F(x_2) + \alpha = \kappa_0 + \alpha. \tag{3.1.7}$$
Since for every curve $\gamma$ with (3.1.4) we have
$$|\gamma(1) - x_2| = 0, \quad |\gamma(0) - x_2| = |x_1 - x_2|,$$
there has to exist some $t_0 \in [0,1]$ with
$$|\gamma(t_0) - x_2| = \delta \quad \text{(recall (3.1.6))}.$$
By (3.1.7) then
$$F(\gamma(t_0)) \ge \kappa_0 + \alpha,$$
and (3.1.5) follows indeed.
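We now define
$$\kappa_1 := \inf_{\gamma} \sup_{t \in [0,1]} F(\gamma(t)),$$
where $\gamma$ again is a curve in $\mathbb{R}^d$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$. By (3.1.5)
$$\kappa_1 > \kappa_0. \tag{3.1.8}$$
Our intention now is to find a critical point $x_3$ of F with
$$F(x_3) = \kappa_1.$$
Since
$$F(x_1), F(x_2) \le \kappa_0,$$
$x_3$ will then necessarily be different from $x_1$ and $x_2$. As a step towards
the existence of such a point $x_3$, we claim
$\forall \varepsilon > 0 \; \exists \delta > 0 \; \forall$ curves $\gamma$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$ with
$$\sup_{t \in [0,1]} F(\gamma(t)) \le \kappa_1 + \delta \tag{3.1.9}$$
$\exists t_0 \in [0,1]$ with:
$$F(\gamma(t_0)) \ge \kappa_1 - \varepsilon \tag{3.1.10}$$
$$|(\nabla F)(\gamma(t_0))| \le \varepsilon. \tag{3.1.11}$$
Suppose this is not the case. Then
$\exists \varepsilon_0 > 0 \; \forall n \in \mathbb{N} \; \exists$ curve $\gamma_n$ between $x_1$ and $x_2$ with
$$\sup_{t} F(\gamma_n(t)) \le \kappa_1 + \frac{1}{n} \tag{3.1.12}$$
$\forall t_0$ with
$$F(\gamma_n(t_0)) \ge \kappa_1 - \varepsilon_0 \tag{3.1.13}$$
$$|(\nabla F)(\gamma_n(t_0))| > \varepsilon_0. \tag{3.1.14}$$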
For $s \ge 0$, we define a new curve $\gamma_{n,s}$ by
$$\gamma_{n,s}(t) := \gamma_n(t) - s(\nabla F)(\gamma_n(t)).$$
Since $x_1$ and $x_2$ are minima, $\nabla F(x_1) = 0 = \nabla F(x_2)$, and so
$$\gamma_{n,s}(0) = x_1, \quad \gamma_{n,s}(1) = x_2,$$
so that the curves $\gamma_{n,s}$ are valid comparison curves. By our properness
assumption (3.1.2) and (3.1.12), $\gamma_n(t)$ stays in a bounded subset of $\mathbb{R}^d$,
and $\nabla F$ will then be bounded on that bounded set, and hence for any
$s_0 > 0$ and all $0 \le s \le s_0$, the curves $\gamma_{n,s}(t)$ stay in some bounded
set, too. This set is independent of n (as long as $0 \le s \le s_0$, for fixed
$s_0 > 0$). By Taylor's formula
$$F(\gamma_{n,s}(t)) = F(\gamma_n(t)) - s \nabla F(\gamma_n(t)) \cdot \nabla F(\gamma_n(t)) + o(s).$$
Since F is continuously differentiable and $\gamma_{n,s}(t)$ is contained in a bounded
set, o(s) can be estimated independently of n and t (as long as $0 \le s \le s_0$).
In particular, after possibly choosing $s_0 > 0$ smaller,
$$F(\gamma_{n,s}(t)) \le F(\gamma_n(t)) - \frac{s}{2} |\nabla F(\gamma_n(t))|^2 \tag{3.1.15}$$
for all n, s with $0 \le s \le s_0$, and t with
$$|\nabla F(\gamma_n(t))| \ge \varepsilon_0. \tag{3.1.16}$$
Thus, in particular,
$$F(\gamma_{n,s_0}(t)) \le F(\gamma_n(t)) - \frac{s_0}{2} \varepsilon_0^2 \tag{3.1.17}$$
for all such t and all n. We now simply choose n so large that
$$\frac{1}{n} < \frac{s_0}{2} \varepsilon_0^2. \tag{3.1.18}$$
Then by our assumption, all $t_0$ with $F(\gamma_n(t_0)) \ge \kappa_1 - \varepsilon_0$ satisfy (3.1.14),
and hence for all such $t_0$
$$F(\gamma_{n,s_0}(t_0)) \le F(\gamma_n(t_0)) - \frac{s_0}{2} \varepsilon_0^2$$
$$\le \kappa_1 + \frac{1}{n} - \frac{s_0}{2} \varepsilon_0^2 \quad \text{by (3.1.12)} \tag{3.1.19}$$
$$< \kappa_1 \quad \text{by (3.1.18)}.$$
Having proved (3.1.19), there are now various ways to construct a path
$\gamma$ from $x_1$ to $x_2$ with
$$F(\gamma(t)) < \kappa_1 \quad \text{for all } t \in [0,1]. \tag{3.1.20}$$
One way is to refine the above construction by letting s depend on t as
follows: we choose a smooth function
$$\sigma(t) : [0,1] \to [0, s_0]$$
with
$$\sigma(t) = 0 \quad \text{whenever } F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0$$
and
$$\sigma(t) = s_0 \quad \text{whenever } F(\gamma_n(t)) \ge \kappa_1 - \frac{\varepsilon_0}{2}.$$
We then look at the path $\gamma(t) = \gamma_{n,\sigma(t)}(t)$. Then for t with $F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0$
$$F(\gamma(t)) = F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0,$$
for t with $\kappa_1 - \varepsilon_0 < F(\gamma_n(t)) < \kappa_1 - \frac{\varepsilon_0}{2}$
$$F(\gamma(t)) \le F(\gamma_n(t)) - \frac{\sigma(t)}{2} \varepsilon_0^2 < \kappa_1 - \frac{\varepsilon_0}{2} - \frac{\sigma(t)}{2} \varepsilon_0^2$$
(cf. (3.1.15), (3.1.16), (3.1.14)), and finally for all t with $F(\gamma_n(t)) \ge \kappa_1 - \frac{\varepsilon_0}{2}$
$$F(\gamma(t)) = F(\gamma_{n,s_0}(t)) < \kappa_1 \quad \text{(cf. (3.1.19))}.$$
Thus, (3.1.20) holds indeed. This, however, contradicts the definition of
$\kappa_1$. Therefore, the assumption that our claim was not correct led to a
contradiction, and the claim holds. It is now simple to prove the theorem.
Namely, we let $\varepsilon_n \to 0$ for $n \to \infty$, and for $\varepsilon = \varepsilon_n$, we find $\delta = \delta_n$ as in
the claim. We then choose a curve $\gamma_n$ from $x_1$ to $x_2$ with
$$\sup_{t \in [0,1]} F(\gamma_n(t)) \le \kappa_1 + \min(\varepsilon_n, \delta_n). \tag{3.1.21}$$
According to the claim, there exists $t_n \in [0,1]$ with
$$F(\gamma_n(t_n)) \ge \kappa_1 - \varepsilon_n \tag{3.1.22}$$
$$|(\nabla F)(\gamma_n(t_n))| \le \varepsilon_n. \tag{3.1.23}$$
After selection of a subsequence, $(\gamma_n(t_n))_{n \in \mathbb{N}}$ then converges to some
point $x_3$, because of (3.1.2) and (3.1.21). $x_3$ then satisfies by continuity
of F and $\nabla F$
$$F(x_3) = \kappa_1 \tag{3.1.24}$$
$$\nabla F(x_3) = 0. \tag{3.1.25}$$
Thus, $x_3$ is the desired critical point.
q.e.d.
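A small numerical illustration of the minimax value $\kappa_1$ may help. The sketch below is not part of the text: it assumes the sample function $F(x, y) = (x^2 - 1)^2 + y^2$ on $\mathbb{R}^2$, which has strict minima at $(\pm 1, 0)$ with $\kappa_0 = 0$, and it only searches a one-parameter family of paths (hypothetical name `sup_along_path`), so in general it yields an upper bound for $\kappa_1$. For this particular F every path from $(-1, 0)$ to $(1, 0)$ has to cross the line $x = 0$, where $F \ge 1$, so the value 1 found here is in fact $\kappa_1$, attained at the critical point $(0, 0)$.

```python
import math

def F(x, y):
    # Sample double-well function (an assumption, not from the text):
    # strict minima at (-1, 0) and (1, 0), saddle point at (0, 0).
    return (x * x - 1.0) ** 2 + y * y

def grad_F(x, y):
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def sup_along_path(a, samples=201):
    # Path gamma_a(t) = (2t - 1, a*sin(pi*t)) joins (-1, 0) to (1, 0);
    # return the largest value of F at the sample points of the path.
    return max(F(2.0 * k / (samples - 1) - 1.0,
                 a * math.sin(math.pi * k / (samples - 1)))
               for k in range(samples))

amplitudes = [k / 10.0 for k in range(-20, 21)]
kappa_0 = max(F(-1.0, 0.0), F(1.0, 0.0))
kappa_1 = min(sup_along_path(a) for a in amplitudes)
saddle_gradient = grad_F(0.0, 0.0)
```

In this example `kappa_1` exceeds `kappa_0`, and the gradient vanishes at the point where the best path attains its maximum, in line with (3.1.24) and (3.1.25).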
Theorem 3.1.1 may be refined as follows:
Theorem 3.1.2. Let F as above again have two relative minima, not
necessarily strict anymore. Then either F has a critical point $x_3$ with
$$F(x_3) > \max(F(x_1), F(x_2)) = \kappa_0,$$
or it has infinitely many critical points.
Proof. For the argument of the proof of Theorem 3.1.1, we only need
$$\inf_{\gamma} \sup_{t \in [0,1]} F(\gamma(t)) > \kappa_0, \tag{3.1.26}$$
where the infimum again is taken over curves $\gamma : [0,1] \to \mathbb{R}^d$ with $\gamma(0) = x_1$,
$\gamma(1) = x_2$. So, suppose that (3.1.26) does not hold. We then want to
show the existence of infinitely many critical points. As in the proof of
Theorem 3.1.1, we may assume
$$F(x_1) \le F(x_2).$$
The argument at the beginning of the proof of Theorem 3.1.1 then shows
that (3.1.26) holds if $x_2$ is a strict relative minimum. If $x_2$ is a relative
minimum, which is not strict, for all sufficiently small $\delta > 0$, say $\delta \le \delta_0$,
we have
$$F(x_2) \le F(x) \quad \text{for all } x \text{ with } |x - x_2| \le \delta_0 \tag{3.1.27}$$
and there always exists some $x_\delta$ with $0 < |x_\delta - x_2| < \delta$ and
$$F(x_\delta) = F(x_2). \tag{3.1.28}$$
We then put $\delta_1 = \delta_0/2$. Then $x_{\delta_1}$ is a relative minimum of F by (3.1.27),
(3.1.28), hence a critical point. Having found a critical point $x_{\delta_n}$ with
$0 < |x_{\delta_n} - x_2| < |x_{\delta_{n-1}} - x_2|$, we put
$$\delta_{n+1} = \tfrac{1}{2} |x_{\delta_n} - x_2|$$
and find a critical point $x_{\delta_{n+1}}$ with
$$0 < |x_{\delta_{n+1}} - x_2| < \delta_{n+1}.$$
Thus, $x_{\delta_{n+1}}$ is a critical point of F different from all preceding ones.
q.e.d.
Remark. It is not very hard to sharpen the statement of Theorem 3.1.2
from 'infinitely many' to 'uncountably many'.
3.2 The construction of Lyusternik-Schnirelman
In this section, we want to prove the following theorem, in order to exhibit
some important global construction in the calculus of variations,
introduced by Lyusternik-Schnirelman. The result presented is much more
elementary than the theorem of Lyusternik-Schnirelman, which says
that on any surface with a Riemannian metric, e.g. a surface embedded
in some Euclidean space, diffeomorphic to the two-dimensional sphere,
there exist at least three closed geodesics without self-intersections. The
more elementary character of our setting allows us to bypass essential
geometric difficulties encountered in a detailed proof of the Lyusternik-Schnirelman
Theorem.
Figure 3.1.
Theorem 3.2.1. Let $\gamma$ be a closed convex Jordan curve† of class $C^1$ in
the plane $\mathbb{R}^2$. ($\gamma$ then divides the plane into a bounded region A, and an
unbounded one, by the Jordan curve theorem. That $\gamma$ is convex means
that the straight line between any two points of $\gamma$ is contained in the
closure $\bar A$ of A.) Then there exist at least two such straight lines between
points on $\gamma$ meeting $\gamma$ orthogonally at both end points (see Figure 3.1).
Proof. We start by finding one such line. Let $\mathcal{C}$ be the set of all straight
lines l in $\bar A$ with $\partial l \subset \gamma$. We say that a sequence $(l_n)_{n \in \mathbb{N}} \subset \mathcal{C}$ converges
to $l \in \mathcal{C}$, if the end points of the $l_n$ converge to those of l. In order to
have a closed space, we allow lines to be trivial, i.e. to consist of a single
point on $\gamma$ only. We denote the space of these point curves on $\gamma$ by $\mathcal{C}_0$.
We let I := [0,1] be the unit interval. We consider continuous maps
$$v : I \to \mathcal{C}$$
with the following two properties:
(i) v(0) = v(1).
(ii) To any such family, we may assign two subregions $A_1(t)$ and $A_2(t)$
of A in a certain manner. Namely, we let $A_1(t)$ and $A_2(t)$ be the
two regions into which v(t) divides A. Having chosen $A_1(0)$ and
$A_2(0)$, $A_1(t)$ and $A_2(t)$ then are determined by the continuity
requirement. We then require
$$A_1(1) = A_2(0).$$
We let $V_1$ be the class of all such families v.
† A closed Jordan curve is a curve $\gamma : [0, T] \to \mathbb{R}^d$ with $\gamma(0) = \gamma(T)$ that is injective
on [0, T). Cf. the definition of a Jordan curve on p. 35.
Figure 3.2.
The construction is visualized in Figure 3.2. (0 corresponds to $0 \in I$,
/ t o , / / t o , / / / t o , l t o l )
Actually, in order to simplify the visualization, if v(0) is a point curve
(on $\gamma$), (i) may be relaxed to just requiring that v(1) also is a point curve
(on $\gamma$), not necessarily coinciding with v(0) (see Figure 3.3). Namely, any
point curves can be connected through point curves, i.e. with vanishing
length.
We denote by L(l) the length of $l \in \mathcal{C}$ and define
$$\kappa_1 := \inf_{v \in V_1} \sup_{t \in I} L(v(t)).$$
Figure 3.3.
We want to show that
$$\kappa_1 > 0.$$
For this purpose, let $\rho > 0$ be the inner radius of $\gamma$, i.e. the largest $\rho$ for
which there exists a disc
$$B(x_0, \rho) \subset A$$
for some $x_0 \in A$ ($B(x_0, \rho) := \{x \in \mathbb{R}^2 : |x - x_0| < \rho\}$). Then
$$\kappa_1 \ge \kappa_1' := \inf_{v \in V_1} \sup_{t \in I} L(v(t) \cap B(x_0, \rho)).$$
We let $A_i'(t) := A_i(t) \cap B(x_0, \rho)$, i = 1, 2. Because of (ii) and the
continuous dependence of $A_i(t)$ and hence also of $A_i'(t)$ on t, there exists
some $t_0 \in I$ with
$$\mathrm{Area}(A_1'(t_0)) = \mathrm{Area}(A_2'(t_0)).$$
Thus $v(t_0)$ divides $B(x_0, \rho)$ into two subregions of equal area. $v(t_0)$ then
has to be a diameter of $B(x_0, \rho)$, i.e.
$$L(v(t_0) \cap B(x_0, \rho)) = 2\rho.$$
Therefore
$$\kappa_1 \ge \kappa_1' = 2\rho > 0$$
and $\kappa_1$ is positive indeed. We are now going to show by a line of reasoning
already familiar from Sections 2.3 and 3.1 that $\kappa_1$ is realized by a critical
point l of L among all lines with end points in $\gamma$, i.e. by l meeting $\gamma$
orthogonally (see Theorem 1.4.1). For that purpose we shall assume for
the moment that $\gamma$ is of class $C^3$. Later on, we shall reduce the case
where $\gamma$ is only $C^1$ to the present one by an approximation argument.
We now claim
$\forall \varepsilon > 0 \; \exists \delta > 0 : \; \forall v \in V_1$ with
$$\sup_{t \in I} L(v(t)) \le \kappa_1 + \delta$$
$\exists t_0 \in I$ with $L(v(t_0)) \ge \kappa_1 - \varepsilon$
and $|\cos(\alpha_1(v(t_0)))|, \; |\cos(\alpha_2(v(t_0)))| \le \varepsilon$,
where $\alpha_1(l)$ and $\alpha_2(l)$ are the angles of l at its endpoints with $\gamma$.
Otherwise
$\exists \varepsilon_0 > 0 : \; \forall n \in \mathbb{N} \; \exists v_n \in V_1$ with
$$\sup_{t \in I} L(v_n(t)) \le \kappa_1 + \frac{1}{n}$$
$\forall t_0$ with $L(v_n(t_0)) \ge \kappa_1 - \varepsilon_0$:
$$|\cos \alpha_1(v_n(t_0))| > \varepsilon_0 \quad \text{or} \quad |\cos \alpha_2(v_n(t_0))| > \varepsilon_0.$$
The idea to reach a contradiction from that assumption is simple, once
the following Lemma is proved:
Lemma 3.2.1. For every planar closed Jordan curve $\gamma$ of class $C^3$,
there exists $\beta > 0$ with the following property: Whenever $x \in \mathbb{R}^2$ satisfies
$$\mathrm{dist}(x, \gamma) := \inf_{y \in \gamma} |x - y| \le \beta$$
there exists a unique $y \in \gamma$ with $\mathrm{dist}(x, \gamma) = |x - y|$.
Proof. We consider $\gamma$ as an embedded submanifold of the Euclidean
plane $\mathbb{R}^2$. $\gamma$ is then covered by the images of charts $f : U \to V$ of the
type constructed in Theorem 2.2.1. Here, U and V are open in $\mathbb{R}^2$, and
$$\gamma \cap V = f(U \cap \{x^2 = 0\}).$$
Furthermore, the curves $x^1$ = constant in U correspond to geodesics, i.e.
straight lines in V perpendicular to $\gamma$, and they form shortest connections
to $\gamma \cap V$. By shrinking U, if necessary, we may assume that it is of
the form $(-\xi, \xi) \times (-\eta, \eta)$, with $\xi > 0$, $\eta > 0$. Since $\gamma$ is compact, it can
be covered by finitely many such charts
$$f_i : (-\xi_i, \xi_i) \times (-\eta_i, \eta_i) \to V_i, \quad i = 1, \dots, m.$$
If we then restrict $f_i$ to $(-\xi_i, \xi_i) \times (-\frac{\eta_i}{2}, \frac{\eta_i}{2})$, the lines $x^1$ = constant,
$-\frac{\eta_i}{2} < x^2 < \frac{\eta_i}{2}$, then correspond to shortest geodesics to $\gamma$, since the part
of $\gamma$ not contained in $V_i$ is not contained in the image of $f_i$, and hence has
distance at least $\frac{\eta_i}{2}$ from the image of the smaller set $(-\xi_i, \xi_i) \times (-\frac{\eta_i}{2}, \frac{\eta_i}{2})$.
This is indicated in Figure 3.4 where the broken lines correspond to
$x^2 = \pm\frac{\eta_i}{2}$, and this is depicted for two different indices i.
Therefore, $\beta := \min_{i = 1, \dots, m} \left(\frac{\eta_i}{2}\right)$ satisfies the claim.
q.e.d.
Figure 3.4.
Figure 3.5.
We now return to the proof of Theorem 3.2.1:
Without loss of generality eo < f3 < ^ . Assume e.\
$$\cos \alpha_1(v_n(t_0)) > \varepsilon_0.$$
The following construction is depicted in Figure 3.5. Choose $s_1(t_0) \in v_n(t_0)$ with
$$|s_1(t_0) - p_1(t_0)| = \beta,$$
where $p_1(t_0)$ is the endpoint of $v_n(t_0)$ where it forms the angle $\alpha_1(t_0)$
with $\gamma$. We replace the subarc $v_n^1(t_0)$ of $v_n(t_0)$ between $p_1(t_0)$ and $s_1(t_0)$
by the shortest line segment $v_n'(t_0)$ from $s_1(t_0)$ to $\gamma$. By the theorem of
Pythagoras and the convexity of $\gamma$
$$L(v_n'(t_0)) \le L(v_n^1(t_0)) \sin \alpha_1(v_n(t_0))$$
$$\le L(v_n^1(t_0)) \sqrt{1 - \varepsilon_0^2}.$$
We then let
$$v_n^*(t_0)$$
be the straight line from the second endpoint $p_2(t_0)$ of $v_n(t_0)$ to the
point where $v_n'(t_0)$ meets $\gamma$. Then, letting $v_n^2(t_0)$ denote the segment of
$v_n(t_0)$ between $s_1(t_0)$ and $p_2(t_0)$, by the triangle inequality
$$L(v_n^*(t_0)) \le L(v_n'(t_0)) + L(v_n^2(t_0))$$
$$\le L(v_n^1(t_0)) \sqrt{1 - \varepsilon_0^2} + L(v_n^2(t_0))$$
$$= \beta \sqrt{1 - \varepsilon_0^2} + L(v_n^2(t_0)).$$
Since $L(v_n^1(t_0)) = \beta$, we have
$$L(v_n^2(t_0)) \le \kappa_1 - \beta + \frac{1}{n}.$$
We then choose n so large that
$$\beta \sqrt{1 - \varepsilon_0^2} + \kappa_1 - \beta + \frac{1}{n} < \kappa_1 - \eta$$
for some $\eta > 0$. Hence
$$L(v_n^*(t_0)) < \kappa_1 - \eta.$$
We now continuously select points $s_1(t)$, $s_2(t)$ on $v_n(t)$ for every $t \in I$
with
$$|p_i(t) - s_i(t)| = \beta, \quad \text{whenever } L(v_n(t)) \ge \kappa_1 - \beta$$
and
$$p_i(t) = s_i(t), \quad \text{whenever } L(v_n(t)) \le \kappa_1 - 2\beta \quad (i = 1, 2)$$
and
$$|p_i(t) - s_i(t)| \le \beta \quad \text{for all } t.$$
We then choose again the shortest lines from $s_i(t)$ to $\gamma$ and replace $v_n(t)$
by the straight line $v_n^*(t)$ between those points, where these two shortest
lines meet $\gamma$. By our geometric argument above
$$L(v_n^*(t)) < \kappa_1 - \eta \quad \text{for some } \eta > 0,$$
whenever
$$L(v_n(t)) \ge \kappa_1 - \varepsilon_0.$$
Since also always
$$L(v_n^*(t)) \le L(v_n(t)),$$
we may then construct a family $v_n^* \in V_1$ with
$$\sup_{t \in I} L(v_n^*(t)) \le \kappa_1 - \min(\eta, \beta),$$
contradicting the definition of $\kappa_1$. Consequently, our claim is correct. We
then find a sequence $(t_n)_{n \in \mathbb{N}} \subset I$ and $(v_n)_{n \in \mathbb{N}} \subset V_1$ with
$$\sup_{t \in I} L(v_n(t)) \le \kappa_1 + \frac{1}{n}$$
$$L(v_n(t_n)) \ge \kappa_1 - \frac{1}{n}$$
$$|\cos(\alpha_1(v_n(t_n)))|, \; |\cos(\alpha_2(v_n(t_n)))| \le \frac{1}{n}.$$
A subsequence of $(v_n(t_n))_{n \in \mathbb{N}}$ then converges to a straight line $l_1$ in $\bar A$
of length $\kappa_1$ meeting $\gamma$ orthogonally at its endpoints.
In order to construct a second line $l_2$ meeting $\gamma$ orthogonally at its
endpoints, we proceed as follows:
We denote by $V_2$ the class of all continuous maps
$$v : I \times I \to \mathcal{C}$$
with
$$v(\{0\} \times I) \text{ and } v(\{1\} \times I) \subset \mathcal{C}_0 \tag{3.2.1}$$
and with the following property:
For all continuous maps
$$\tau : I \to I \times I, \quad \tau(s) = (t_1(s), t_2(s))$$
with
$$t_1(1) = 1 - t_1(0), \quad t_2(0) = 0, \quad t_2(1) = 1, \tag{3.2.2}$$
we have
$$v \circ \tau \in V_1.$$
Figure 3.6. (Panels: $t_2 = 0$, $t_2 = 1/4$, $t_2 = 1/2$, $t_2 = 3/4$, $t_2 = 1$.)
Let us exhibit an example of such a $v \in V_2$ (see Figure 3.6). We consider
the $v_1 \in V_1$ of Figure 3.5 where $v_1(0)$ and $v_1(1)$ were point curves on
$\gamma$, and we rotate $v_1$ via the parameter $t_2$ so that at $t_2 = 1$ we have
the same picture as at $t_2 = 0$, but with $t_1$ interchanged with $1 - t_1$.
Equation (3.2.2) then holds.
We note that $I \times I$ becomes a Möbius strip, when we identify the
parameter $t_1$ on the line $t_2 = 1$ with the parameter $1 - t_1$ on the line
$t_2 = 0$.
K
2
:= inf sup L(v(t)).
vV2 tei
2
Then
K
2
> tti,
and ^2 again is realized by some straight line l
2
in A meeting 7 orthog-
onally at its endpoints. We consider two cases:
(1) K
2
> *i .
Then L(l
2
) = K
2
> K,\ = L( / I ) , and l
2
hence is different from l\.
( 2) K
2
= Ki.
We claim that in this case, we even get infinitely many solutions
of our problem, i.e. lines in A meeting 7 orthogonally. Namely,
we let VQ V
2
be any critical family, i.e. satisfying
tel
2
76 Saddle point constructions
(It is not hard to see that in the present case such a VQ V
2
indeed exists.)
We then have for any r : J -+ I
2
with (3.2.2)
sup L (vo (r (s))) <K
2
. (3.2.3)
sei
On the other hand, since v$o T V\,
*i <supL{v
0
{T{s))), (3.2.4)
and since K\ = K
2
, we have equality in (3.2.3) and (3.2.4). This means
that VOOT is a critical family for i, and it then has to contain a solution
l
r
of our problem.
Let $S \subset \{(s, t) \in I \times I \mid L(v_0(s, t)) = \kappa_2\}$ denote the set in $I \times I$
corresponding to all solutions induced by $v_0$. After carrying out the
identification prescribed by (3.2.2), which makes $I \times I$ into a Möbius
strip, we see that the complement of S in this Möbius strip is not path
connected. Namely, otherwise we could find $\tau$ satisfying (3.2.2) for which
$\tau(I)$ avoids S, and for such a $\tau$, $v_0 \circ \tau$ would then not contain a solution,
as S is the set of all solutions in the family $v_0$. This, however, contradicts
what has just been said (see Figure 3.7). In fact, S has to carry
a one dimensional cycle† on the Möbius strip. Otherwise, S would be
contractible (in the Möbius strip) and one could reparameterize $v_0$ on
$I^2$ so that the set of solutions corresponds to a finite number of points.
But this is incompatible with $\kappa_2 = \kappa_1$ as we have just seen. Since for
each path $\tau$ as in (3.2.2) with $\tau(I) \subset S$, $v_0 \circ \tau \in V_1$ is nonconstant by
(3.2.1) and (3.2.2), we obtain an uncountable number of solutions.
Figure 3.7.
† We have to employ here some constructions from algebraic topology. A reference is
any good book on that subject, e.g. M. Greenberg, Lectures on Algebraic Topology,
Benjamin, Reading, Mass., 1967, pp. 33-45, 186. While this is somewhat technical
we strongly urge the reader to try to understand the essential geometric idea of
the preceding construction.
We thus have shown our result if $\gamma$ is of class $C^3$. If $\gamma$ is only of class
$C^1$, we choose a sequence of curves $\gamma_n$ of class $C^3$ approximating $\gamma$. This
means that there are parameterizations $\gamma_n(\tau)$, $\gamma(\tau)$ by arc-length with
$$\lim_{n \to \infty} \sup_{\tau} \left( |\gamma_n(\tau) - \gamma(\tau)| + \left| \frac{d}{d\tau}\gamma_n(\tau) - \frac{d}{d\tau}\gamma(\tau) \right| \right) = 0.$$
We then let $l_{1,n}$ and $l_{2,n}$ be the corresponding solutions for $\gamma_n$. After
selection of subsequences, $l_{1,n}$ and $l_{2,n}$ then converge to solutions $l_1$, $l_2$ for
$\gamma$, and those $l_1$ and $l_2$ realize the critical values $\kappa_1$ and $\kappa_2$, respectively.
Since the argument to produce infinitely many solutions in case $\kappa_1 = \kappa_2$
did not depend on a higher differentiability assumption on $\gamma$, it is still
applicable here, and we thus can complete the proof as before.
q.e.d.
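For a concrete curve the statement of Theorem 3.2.1 is easy to check by hand or numerically. The following sketch assumes an ellipse with semi-axes 2 and 1 as the convex curve (this choice, and the helper names `point`, `tangent`, `cos_angle`, are illustrative and not from the text). It verifies that the two axes of the ellipse are chords whose angle with the curve has vanishing cosine at both endpoints, i.e. that they meet the curve orthogonally.

```python
import math

A_AXIS, B_AXIS = 2.0, 1.0  # semi-axes of a sample ellipse (an assumption)

def point(theta):
    return (A_AXIS * math.cos(theta), B_AXIS * math.sin(theta))

def tangent(theta):
    return (-A_AXIS * math.sin(theta), B_AXIS * math.cos(theta))

def chord_length(theta1, theta2):
    p, q = point(theta1), point(theta2)
    return math.hypot(q[0] - p[0], q[1] - p[1])

def cos_angle(theta_from, theta_to):
    # Cosine of the angle between the chord and the tangent of the
    # ellipse at the chord's starting point.
    p, q, t = point(theta_from), point(theta_to), tangent(theta_from)
    chord = (q[0] - p[0], q[1] - p[1])
    norm = math.hypot(*chord) * math.hypot(*t)
    return (chord[0] * t[0] + chord[1] * t[1]) / norm

major = (0.0, math.pi)                     # endpoints of the major axis
minor = (0.5 * math.pi, 1.5 * math.pi)     # endpoints of the minor axis
cosines = [cos_angle(major[0], major[1]), cos_angle(major[1], major[0]),
           cos_angle(minor[0], minor[1]), cos_angle(minor[1], minor[0])]
generic = cos_angle(0.3, 2.0)              # a generic chord, for comparison
```

All entries of `cosines` vanish up to rounding, while a generic chord such as the one from parameter 0.3 to 2.0 does not meet the curve orthogonally.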
The variational content of Theorem 3.2.1 is that we produce two
geodesics in $\mathbb{R}^2$ that meet a given convex Jordan curve orthogonally.
In fact, this statement generalizes to any closed convex Jordan curve on
some surface, enclosing a domain homeomorphic to the unit disk.
In Sections 2.3, 3.2, we could only treat variational problems that
could be reduced to finite dimensional problems, because we did not
yet develop tools to show the existence of critical points of functionals
defined on infinite dimensional spaces. We shall develop such tools in
Part II, and consequently in Chapter 9 of Part II, we shall be able to
present general results about the existence of unstable critical points in
the spirit of the preceding results. The crucial notion will be the Palais-
Smale condition that guarantees that the type of reasoning presented in
Section 3.1 extends to certain functionals defined on infinite dimensional
spaces. Also, the reasoning employed in Section 3.2 that infinitely many
critical points can be found if two suitable critical values coincide will
be given an axiomatic treatment in Section 9.3 of Part II.
Exercises
3.1 Let $F \in C^1(M, \mathbb{R})$ (M an embedded, connected, differentiable
submanifold of $\mathbb{R}^d$) be bounded from below and proper (i.e. for
all $s \in \mathbb{R}$, $\{x \in M : F(x) \le s\}$ is compact), and suppose F has
two relative minima $x_1, x_2$. Let
$$\kappa_0 := \max(F(x_1), F(x_2)).$$
Show that F either possesses a critical point $x_3$ with $F(x_3) > \kappa_0$,
or that it has uncountably many critical points.
3.2 Let $F \in C^1(\mathbb{R}^d, \mathbb{R})$ be bounded from below and proper, and
suppose it has three strict relative minima $x_1, x_2, x_3$. Try to
identify conditions under which F then has to possess more
than two additional critical points, e.g. three or four.
3.3 Let A be a compact convex subset of the unit sphere $S^2 \subset \mathbb{R}^3$,
and suppose $\partial A$ is a smooth curve $\gamma$; the convexity condition
here means that for any two points in A, one can find precisely
one geodesic arc inside A that connects them. Show the existence
of at least two geodesic arcs in A that meet $\gamma$ orthogonally at
both endpoints.
4
The theory of Hamilton and Jacobi
4.1 The canonical equations
We let t be a real parameter varying between $t_1$ and $t_2$. We consider the
variational integral
$$I = \int_{t_1}^{t_2} L(t, x^1(t), \dots, x^n(t), \dot x^1(t), \dots, \dot x^n(t)) \, dt \tag{4.1.1}$$
for the unknown functions $x(t) = (x^1(t), \dots, x^n(t))$ with fixed endpoints
$x(t_1)$ and $x(t_2)$. Here,
$$\dot x^i = \frac{dx^i}{dt}.$$
We assume that L is of class $C^2$. The Euler-Lagrange equations for I
are
$$\frac{d}{dt} L_{\dot x^i} - L_{x^i} = 0 \quad (i = 1, \dots, n). \tag{4.1.2}$$
We assume the invertibility condition
$$\det L_{\dot x^i \dot x^j} \ne 0. \tag{4.1.3}$$
As shown in 1.2, this implies that solutions of (4.1.2) are of class $C^2$.
(4.1.3) also implies that we may perform a Legendre transformation.
Namely, by the implicit function theorem, we may then locally solve
$$p_i = L_{\dot x^i} \tag{4.1.4}$$
w.r.t. $\dot x^i$, i.e.
$$\dot x^i = \dot x^i(t, x, p) \quad (p = (p_1, \dots, p_n)). \tag{4.1.5}$$
The expressions $p_i$ are called momenta. The Hamiltonian H is defined
as
$$H(t, x, p) := \dot x^i p_i - L(t, x, \dot x). \tag{4.1.6}$$
We obtain
$$H_{x^i} = p_j \frac{\partial \dot x^j}{\partial x^i} - L_{\dot x^j} \frac{\partial \dot x^j}{\partial x^i} - L_{x^i},$$
and with (4.1.4) then
$$H_{x^i} = -L_{x^i},$$
and with (4.1.2) and (4.1.4) then
$$H_{x^i} = -\dot p_i. \tag{4.1.7}$$
Also
$$H_{p_i} = \dot x^i + p_j \frac{\partial \dot x^j}{\partial p_i} - L_{\dot x^j} \frac{\partial \dot x^j}{\partial p_i}$$
and thus again with (4.1.4)
$$H_{p_i} = \dot x^i. \tag{4.1.8}$$
(4.1.7) and (4.1.8) constitute a so-called canonical system. We are going
to see that (4.1.7) and (4.1.8) also arise as Euler-Lagrange equations of
the variational problem obtained by expressing L in (4.1.1) through H
via (4.1.6). Namely,
$$I = \int_{t_1}^{t_2} (\dot x^j p_j - H(t, x, p)) \, dt, \tag{4.1.9}$$
where the unknown functions are x(t) and p(t), has Euler-Lagrange
equations (4.1.7) and (4.1.8), and so does
$$\tilde I = -\int_{t_1}^{t_2} (x^j \dot p_j + H(t, x, p)) \, dt. \tag{4.1.10}$$
Before proceeding, we observe that if H does not depend explicitly on
t, i.e. H = H(x, p), then H is a constant of motion, i.e. constant along
any solution x(t) of the equations. Namely,
$$\frac{d}{dt} H(x(t), p(t)) = H_{x^i} \dot x^i + H_{p_i} \dot p_i = 0 \tag{4.1.11}$$
by (4.1.7) and (4.1.8).
Example. For $L = \frac{1}{2}|\dot x|^2 - V(x)$, we have
$$H = \frac{1}{2}|p|^2 + V(x),$$
and the canonical equations become
$$\dot x = p$$
$$\dot p = -V_x.$$
This example, which describes the Newtonian motion of a particle of
unit mass subject to a potential V, is helpful for remembering the signs
in the canonical equations.
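The conservation law (4.1.11) can be observed numerically for this example. The sketch below makes several assumptions that are not in the text: one space dimension, the sample potential $V(x) = 1 - \cos x$, and a leapfrog time stepping (hypothetical function name `leapfrog`) as one possible discretization of $\dot x = H_p$, $\dot p = -H_x$. The discrete scheme does not conserve H exactly, so the code only checks that H stays nearly constant along the computed trajectory.

```python
import math

def V(x):
    return 1.0 - math.cos(x)      # sample potential (an assumption)

def dV(x):
    return math.sin(x)            # V_x

def H(x, p):
    # Hamiltonian of the example: H = |p|^2 / 2 + V(x).
    return 0.5 * p * p + V(x)

def leapfrog(x, p, dt, steps):
    # Integrates x' = H_p = p and p' = -H_x = -V_x(x),
    # recording the value of H after every step.
    energies = [H(x, p)]
    for _ in range(steps):
        p -= 0.5 * dt * dV(x)     # half step in the momentum
        x += dt * p               # full step in the position
        p -= 0.5 * dt * dV(x)     # second half step in the momentum
        energies.append(H(x, p))
    return x, p, energies

x_end, p_end, energies = leapfrog(1.0, 0.0, 0.01, 2000)
drift = max(abs(e - energies[0]) for e in energies)
```

With the step size used here, `drift` is small compared with the initial value of H, which reflects that H is a constant of motion for the exact flow.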
4.2 The Hamilton-Jacobi equation
Assumption. There is given a set $\Omega \subset \mathbb{R}^{n+1} = \{(t, x^1, \dots, x^n)\}$ with
the property that for any points $A, B \in \Omega$, $A = (\sigma, \kappa^1, \dots, \kappa^n)$, $B = (s, q^1, \dots, q^n)$,
there is a unique solution $x(t) = (x^1(t), \dots, x^n(t))$ of
(4.1.2) contained in $\Omega$ with $(\sigma, x(\sigma)) = A$, $(s, x(s)) = B$.
Thus, $\Omega$ is covered by solutions of (4.1.2), and those can be considered
as functions of their endpoints. Thus
$$x^i = f^i(t; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) \tag{4.2.1}$$
and also
$$p_i = g_i(t; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) = L_{\dot x^i}. \tag{4.2.2}$$
In particular,
$$\kappa^i = f^i(\sigma; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) \tag{4.2.3}$$
$$q^i = f^i(s; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n).$$
We also define
$$\varphi_i := g_i(\sigma; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) = L_{\dot x^i}(\sigma, \kappa, \dot\kappa) \tag{4.2.4}$$
$$v_i := g_i(s; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) = L_{\dot x^i}(s, q, \dot q).$$
In the sequel, $\dot f^i$ etc. will mean a derivative w.r.t. the first independent
variable, $f^{i\prime}$ etc. a derivative w.r.t. the second one. Inserting (4.2.1),
(4.2.2) into I, we obtain
$$I = I(s, q, \sigma, \kappa) \tag{4.2.5}$$
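and call this expression the geodesic distance between A and B. In this
connection, I is called eiconal. Recalling (4.1.9), we may write
$$I = \int_{\sigma}^{s} (p_i \dot x^i - H(t, x, p)) \, dt. \tag{4.2.6}$$
We want to compute the derivatives of $I(s, q, \sigma, \kappa)$.
$$I_s = v_i \dot q^i - H(s, q, v) + \int_{\sigma}^{s} \left( g_i' \dot f^i + g_i \dot f^{i\prime} - H_{x^i} f^{i\prime} - H_{p_i} g_i' \right) dt.$$
Equations (4.1.7) and (4.1.8) yield $H_{x^i} = -\dot g_i$, $H_{p_i} = \dot f^i$, and thus
$$I_s = v_i \dot q^i - H(s, q, v) + \int_{\sigma}^{s} \frac{d}{dt}\left( g_i f^{i\prime} \right) dt$$
$$= v_i \dot q^i - H(s, q, v) + g_i f^{i\prime} \Big|_{t=\sigma}^{t=s}.$$
Equation (4.2.3) yields
$$\dot f^i + f^{i\prime} = 0 \quad \text{for } t = s$$
$$f^{i\prime} = 0 \quad \text{for } t = \sigma,$$
and thus altogether
$$I_s = v_i \dot q^i - H(s, q, v) - v_i \dot q^i$$
$$= -H(s, q, v)$$
$$= L(s, q, \dot q) - \dot q^i L_{\dot q^i}. \tag{4.2.7}$$
Next
$$I_{q^j} = \int_{\sigma}^{s} \left( \frac{\partial g_i}{\partial q^j} \dot f^i + g_i \frac{\partial \dot f^i}{\partial q^j} - H_{x^i} \frac{\partial f^i}{\partial q^j} - H_{p_i} \frac{\partial g_i}{\partial q^j} \right) dt$$
$$= g_i \frac{\partial f^i}{\partial q^j} \Big|_{t=\sigma}^{t=s} \quad \text{again by (4.1.7), (4.1.8)}$$
$$= g_j(s; s, q^1, \dots, q^n; \sigma, \kappa^1, \dots, \kappa^n) \quad \text{by (4.2.3)},$$
since (4.2.3) gives $\frac{\partial f^i}{\partial q^j} = 0$ for $t = \sigma$ and $\frac{\partial f^i}{\partial q^j} = \delta^i_j$ for $t = s$.
Thus
$$I_{q^j} = v_j = L_{\dot q^j}(s, q, \dot q). \tag{4.2.8}$$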
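Analogously,
$$I_\sigma = H(\sigma, \kappa, \varphi) = -L(\sigma, \kappa, \dot\kappa) + \dot\kappa^i L_{\dot\kappa^i} \tag{4.2.9}$$
$$I_{\kappa^j} = -\varphi_j = -L_{\dot\kappa^j}(\sigma, \kappa, \dot\kappa). \tag{4.2.10}$$
Inserting (4.2.8) into (4.2.7), we obtain
$$I_s + H(s, q, I_q) = 0. \tag{4.2.11}$$
Thus, the geodesic distance as function of the endpoint satisfies (4.2.11),
a Hamilton-Jacobi equation. In the present context that equation then
is also called eiconal equation. We observed at the end of Section 4.1
that H is constant along solutions if it does not depend on t explicitly.
In that case, (4.2.11) implies that I then depends linearly on s. It may
be useful for understanding the preceding formulae if we derive them
without the use of the Legendre transformation. Thus
$$I = \int_{\sigma}^{s} L(t, x(t), \dot x(t)) \, dt = \int_{\sigma}^{s} L(t, f, \dot f) \, dt$$
and
$$I_s = L(s, q, \dot q) + \int_{\sigma}^{s} \left( L_{x^i} f^{i\prime} + L_{\dot x^i} \dot f^{i\prime} \right) dt.$$
The Euler-Lagrange equations give
$$L_{x^i} = \frac{d}{dt} L_{\dot x^i}$$
and so
$$I_s = L(s, q, \dot q) + \int_{\sigma}^{s} \frac{d}{dt}\left( L_{\dot x^i} f^{i\prime} \right) dt$$
$$= L(s, q, \dot q) + L_{\dot x^i} f^{i\prime} \Big|_{t=\sigma}^{t=s}.$$
As before, we obtain from (4.2.3)
$$f^{i\prime} = -\dot f^i \quad \text{for } t = s$$
$$f^{i\prime} = 0 \quad \text{for } t = \sigma,$$
hence
$$I_s = L(s, q, \dot q) - L_{\dot q^i} \dot q^i,$$
i.e. (4.2.7). Likewise,
$$I_{q^j} = \int_{\sigma}^{s} \left( L_{x^i} \frac{\partial f^i}{\partial q^j} + L_{\dot x^i} \frac{\partial \dot f^i}{\partial q^j} \right) dt = L_{\dot x^i} \frac{\partial f^i}{\partial q^j} \Big|_{t=\sigma}^{t=s} = L_{\dot q^j}(s, q, \dot q),$$
i.e. (4.2.8). Thus, the Hamilton-Jacobi equation (4.2.11) is
$$I_s - L(s, q, \dot q) + I_{q^i} \dot q^i = 0. \tag{4.2.12}$$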
We have seen in the preceding how solutions of the canonical equa-
tions yield solutions of the Hamilton-Jacobi equations. We now want to
establish a converse result.
Let $\varphi(t, x^1, \dots, x^n)$ be a solution of the Hamilton-Jacobi equation
which we now write as
$$p_0 + H(t, x^1, \dots, x^n, p_1, \dots, p_n) = 0 \tag{4.2.13}$$
with
$$p_0 = \varphi_t, \quad p_i = \varphi_{x^i}.$$
Definition 4.2.1. If
$$\varphi = G(t, x^1, \dots, x^n, \lambda_1, \dots, \lambda_n) \quad \text{with } G \in C^2 \tag{4.2.14}$$
and
$$\det(G_{x^i \lambda_j})_{i,j = 1, \dots, n} \ne 0 \tag{4.2.15}$$
is a family of solutions of (4.2.13) depending on n parameters $\lambda_1, \dots, \lambda_n$,
we call
$$\varphi = G(t, x^1, \dots, x^n, \lambda_1, \dots, \lambda_n) + \lambda \tag{4.2.16}$$
(where $\lambda$ is a free real parameter) a complete integral of (4.2.13).
We have the following theorem of Jacobi:
Theorem 4.2.1. Let $\varphi = G(t, x^1, \dots, x^n, \lambda_1, \dots, \lambda_n) + \lambda$ be a complete
integral of (4.2.13). Then one may obtain a family of solutions of the
canonical equations
$$H_{p_i} = \dot x^i \tag{4.2.17}$$
$$H_{x^i} = -\dot p_i \tag{4.2.18}$$
depending on 2n parameters $\lambda_1, \dots, \lambda_n, \mu^1, \dots, \mu^n$ by solving
$$G_{\lambda_i} = \mu^i \tag{4.2.19}$$
$$G_{x^i} = p_i. \tag{4.2.20}$$
Proof. Because of (4.2.15), (4.2.19) may be solved w.r.t. $x^i$,
$$x^i = x^i(t, \lambda_1, \dots, \lambda_n, \mu^1, \dots, \mu^n).$$
Inserting this into (4.2.20) then yields
$$p_i = p_i(t, \lambda_1, \dots, \lambda_n, \mu^1, \dots, \mu^n).$$
We have to show that $x^i$ and $p_i$ satisfy the canonical equations. For this
purpose, we differentiate (4.2.13) w.r.t. $x^i$ and obtain:
$$G_{t x^i} + H_{p_k} G_{x^k x^i} + H_{x^i} = 0. \tag{4.2.21}$$
Differentiating (4.2.13) w.r.t. $\lambda_i$, we obtain
$$G_{t \lambda_i} + H_{p_k} G_{x^k \lambda_i} = 0, \tag{4.2.22}$$
since the terms containing $\frac{\partial x^k}{\partial \lambda_i}$ cancel by (4.2.21). Differentiating (4.2.19)
w.r.t. t, we obtain
$$G_{\lambda_i t} + G_{\lambda_i x^k} \frac{dx^k}{dt} = 0. \tag{4.2.23}$$
Comparing (4.2.22) and (4.2.23) and recalling (4.2.15) yields (4.2.17).
Differentiating (4.2.20) w.r.t. t, we obtain
$$\frac{dp_i}{dt} = G_{x^i t} + G_{x^i x^k} \frac{dx^k}{dt}. \tag{4.2.24}$$
Comparing (4.2.24) and (4.2.21) and using the relation (4.2.17) just
derived, we then obtain (4.2.18).
q.e.d.
The canonical equations are a system of ODE whereas the Hamilton-Jacobi
equation is a first-order partial differential equation (PDE). The
preceding considerations show the equivalence of these equations. While
in general, one may consider a PDE as being more difficult than a system
of ODE, in applications, one may often find a solution of the canonical
equations by solving the Hamilton-Jacobi equation. Here, it is typically
of great help that the Hamilton-Jacobi equation does not depend on the
unknown function itself, but only on its derivatives.
Let us consider the following example of geometric optics:
$$ I = \int \varphi(t, x) \sqrt{1 + \dot{x}^2}\,dt \qquad (\varphi(t, x) > 0), $$
already explained in Example (3) of Section 1.1 in a slightly different
notation. The physical meaning is that x(t) is considered as the graph
of a light ray travelling in a medium with light velocity $c/\varphi(t, x)$, where c
is the velocity of light in vacuum. In this example, putting
$$ L(t, x, \dot{x}) = \varphi(t, x) \sqrt{1 + \dot{x}^2}, \qquad (4.2.25) $$
we have
$$ p = L_{\dot{x}} = \frac{\varphi \dot{x}}{\sqrt{1 + \dot{x}^2}}, \qquad
   H = p\dot{x} - L = -\sqrt{\varphi^2 - p^2}. \qquad (4.2.26) $$
$I(s, q, \sigma, \kappa)$ here is the time that a light ray needs to travel from
$A = (\sigma, \kappa)$ to $B = (s, q)$. The Hamilton-Jacobi equation $I_s + H(s, q, I_q) = 0$
becomes the eiconal equation
$$ I_s^2 + I_q^2 = \varphi^2. \qquad (4.2.27) $$
The surfaces $I(s, q) = $ constant are called wave fronts.
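As a quick numerical sanity check of the eiconal equation (4.2.27), the following sketch assumes the simplest case of a constant $\varphi$. Under that assumption the extremals are straight lines and $I$ is $\varphi$ times the Euclidean distance from $A$ to $B$. The function names, the chosen value of $\varphi$, and the finite-difference step are illustrative choices, not taken from the text.

```python
import math

def travel_time(s, q, sigma=0.0, kappa=0.0, phi=1.5):
    """I(s, q, sigma, kappa) for a constant phi (an assumption of this sketch):
    phi times the Euclidean distance from A = (sigma, kappa) to B = (s, q)."""
    return phi * math.hypot(s - sigma, q - kappa)

def eikonal_residual(s, q, phi=1.5, h=1e-6):
    """Central-difference estimate of I_s^2 + I_q^2 - phi^2 at (s, q)."""
    i_s = (travel_time(s + h, q, phi=phi) - travel_time(s - h, q, phi=phi)) / (2 * h)
    i_q = (travel_time(s, q + h, phi=phi) - travel_time(s, q - h, phi=phi)) / (2 * h)
    return i_s ** 2 + i_q ** 2 - phi ** 2

if __name__ == "__main__":
    # The residual should vanish up to finite-difference error.
    print(abs(eikonal_residual(2.0, 1.0)) < 1e-6)
```

The level sets of `travel_time` are circles around $A$, which are the wave fronts of this constant-$\varphi$ case.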
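Another simple example comes from a quadratic
$$ L(t, x, \dot{x}) = \frac{1}{2}(\dot{x}^2 + \alpha x^2) \qquad (\alpha = \text{constant}). \qquad (4.2.28) $$
Then
$$ p = L_{\dot{x}} = \dot{x}, \qquad H = p\dot{x} - L = \frac{1}{2}(p^2 - \alpha x^2), \qquad (4.2.29) $$
and the Hamilton-Jacobi equation becomes
$$ I_t + \frac{1}{2}(I_x^2 - \alpha x^2) = 0. \qquad (4.2.30) $$
If we substitute $I = \rho(t) x^2$, we are led to the Riccati equation
$$ \dot{\rho} + 2\rho^2 - \frac{\alpha}{2} = 0. \qquad (4.2.31) $$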
If we substitute $I = -\lambda t + \psi(x)$ with a parameter $\lambda$, we obtain from
(4.2.30)
$$ -\lambda + \frac{1}{2}\left(\psi'(x)^2 - \alpha x^2\right) = 0, $$
i.e.
$$ \psi'(x) = \sqrt{\alpha x^2 + 2\lambda} $$
and a solution
$$ I = -\lambda t + \int_0^x \sqrt{\alpha \xi^2 + 2\lambda}\,d\xi. \qquad (4.2.32) $$
The equation
$$ I_\lambda(t, x, \lambda) = \mu $$
means
$$ -t + \int_0^x \frac{d\xi}{\sqrt{\alpha \xi^2 + 2\lambda}} = \mu. $$
This can be solved for x; let us assume for example $\alpha < 0$; then the
solution is
$$ x = \sqrt{-\frac{2\lambda}{\alpha}}\, \sin\left(\sqrt{-\alpha}\,(t + \mu)\right). $$
x of course solves the Euler-Lagrange equation for (4.2.28)
$$ \ddot{x} = \alpha x. $$
A physical realization is the harmonic oscillator, where x(t) is the
displacement of an oscillating spring, with $\alpha = -\frac{k}{m}$ (m = mass, k = spring
constant). Since
$$ p = I_x, \qquad I_t + H(x, I_x) = 0, $$
we obtain from (4.2.32)
$$ \lambda = H(x, p), $$
i.e. $\lambda$ is the energy of the spring.
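The oscillator solution obtained from the complete integral can be checked numerically. The sketch below assumes $\alpha < 0$ and uses the closed form $x = \sqrt{-2\lambda/\alpha}\,\sin(\sqrt{-\alpha}(t+\mu))$ with $p = \dot{x}$; the function names and the sample parameter values are illustrative only.

```python
import math

def oscillator(t, lam, mu, alpha):
    """Return (x(t), p(t)) with p = dx/dt, for alpha < 0."""
    omega = math.sqrt(-alpha)
    amplitude = math.sqrt(-2.0 * lam / alpha)
    x = amplitude * math.sin(omega * (t + mu))
    p = amplitude * omega * math.cos(omega * (t + mu))
    return x, p

def hamiltonian(x, p, alpha):
    """H = (p^2 - alpha x^2) / 2, cf. (4.2.29)."""
    return 0.5 * (p * p - alpha * x * x)

def euler_lagrange_residual(t, lam, mu, alpha, h=1e-4):
    """Second-difference estimate of x'' - alpha x, which should be close to 0."""
    x_minus = oscillator(t - h, lam, mu, alpha)[0]
    x_mid = oscillator(t, lam, mu, alpha)[0]
    x_plus = oscillator(t + h, lam, mu, alpha)[0]
    return (x_plus - 2.0 * x_mid + x_minus) / (h * h) - alpha * x_mid

if __name__ == "__main__":
    lam, mu, alpha = 0.7, 0.3, -4.0
    for t in (0.0, 0.5, 1.3):
        x, p = oscillator(t, lam, mu, alpha)
        # The energy H(x, p) should equal the parameter lam at every time.
        print(t, hamiltonian(x, p, alpha), euler_lagrange_residual(t, lam, mu, alpha))
```

The parameter $\lambda$ plays the role of the conserved energy and $\mu$ that of a phase shift, matching the two constants of Jacobi's theorem for n = 1.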
4.3 Geodesics
We consider the case where L is homogeneous of degree 1, i.e.
$$ L = L_{\dot{x}^i} \dot{x}^i. \qquad (4.3.1) $$
Then
$$ \det L_{\dot{x}^i \dot{x}^j} = 0, \qquad (4.3.2) $$
and we cannot perform a Legendre transformation as in Section 4.1. We
have
$$ H = -L + \dot{x}^i L_{\dot{x}^i} = 0, \qquad (4.3.3) $$
and the computations of Section 4.2 yield (writing $L_{\dot{q}^i}$ instead of $p_i$ etc.)
$$ I_s = L(s, q, \dot{q}) - \dot{q}^i L_{\dot{q}^i} = 0. \qquad (4.3.4) $$
An example is given by the geodesic lines considered in Chapter 2. Here,
$$ L = \sqrt{Q} $$
with
$$ Q = g_{ij}(x^1, \ldots, x^n)\,\dot{x}^i \dot{x}^j. \qquad (4.3.5) $$
The Euler-Lagrange equations are
$$ \frac{d}{dt}\left( \frac{1}{\sqrt{Q}}\, Q_{\dot{x}^i} \right) - \frac{1}{\sqrt{Q}}\, Q_{x^i} = 0. \qquad (4.3.6) $$
Since t does not occur explicitly in (4.3.5) and since I is invariant under
transformations of t, we may choose t such that
$$ Q \equiv 1, \qquad (4.3.7) $$
i.e. that solutions are parameterized by arc-length. Equation (4.3.6) then
becomes
$$ \frac{d}{dt} Q_{\dot{x}^i} - Q_{x^i} = 0. \qquad (4.3.8) $$
Conversely, along a solution of (4.3.8), we have $Q \equiv$ constant, justifying
our choice of t. Namely, Q is homogeneous of degree 2 w.r.t. the variables
$\dot{x}^i$, hence
$$ Q_{\dot{x}^i} \dot{x}^i = 2Q. \qquad (4.3.9) $$
Differentiating (4.3.9) w.r.t. t along a solution,
$$ \left( \frac{d}{dt} Q_{\dot{x}^i} \right) \dot{x}^i + Q_{\dot{x}^i} \ddot{x}^i = 2 \frac{d}{dt} Q = 2 Q_{x^i} \dot{x}^i + 2 Q_{\dot{x}^i} \ddot{x}^i, $$
and (4.3.8) indeed yields
$$ \frac{d}{dt} Q = 0 \quad \text{along a solution.} $$
As already demonstrated in 2.1, (4.3.8) are the Euler-Lagrange equations
for
$$ E = \frac{1}{2} \int Q(x(t), \dot{x}(t))\,dt = \frac{1}{2} \int g_{ij}(x(t))\,\dot{x}^i(t)\,\dot{x}^j(t)\,dt. \qquad (4.3.10) $$
We recall (Lemma 2.1.1) that the Schwarz inequality implies
$$ \int_\sigma^s \sqrt{Q}\,dt \le (s - \sigma)^{1/2} \left( \int_\sigma^s Q\,dt \right)^{1/2} $$
with equality precisely if $Q \equiv$ constant, and the extremals of E are
precisely those extremals of I parameterized proportionally to arc-length.
In contrast to I, E is no longer invariant under transformations of t.
Therefore, for solutions of the Euler-Lagrange equations corresponding
to E, the parameterization is determined up to a constant factor. The
Hamiltonian for E is
$$ H = Q_{\dot{x}^i} \dot{x}^i - Q = Q \quad \text{because of (4.3.9).} \qquad (4.3.11) $$
Moreover,
$$ p_i = Q_{\dot{x}^i} = 2 g_{ij} \dot{x}^j. \qquad (4.3.12) $$
Thus
$$ H = \frac{1}{4} g^{ij} p_i p_j \quad (\text{with } (g^{ij}) = (g_{ij})^{-1}). \qquad (4.3.13) $$
The Hamilton-Jacobi equation becomes
$$ E_t + \frac{1}{4} g^{ij} E_{x^i} E_{x^j} = 0 \quad \text{cf. (4.3.13), (4.2.11), (4.3.10)} \qquad (4.3.14) $$
and the canonical equations are
$$ \dot{x}^i = \frac{1}{2} g^{ij} p_j \quad \text{cf. (4.1.8), (4.3.13)} \qquad (4.3.15) $$
$$ \dot{p}_i = -\frac{1}{4} \frac{\partial g^{kj}}{\partial x^i}\, p_k p_j \quad \text{cf. (4.1.7), (4.3.11), (4.3.5).} $$
As observed at the end of Section 4.2, E depends linearly on t.
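To see the canonical equations (4.3.15) and the equation for $\dot{p}_i$ at work, the following sketch integrates them for one concrete metric that is an assumption of this example: the Euclidean plane in polar coordinates, $g = \mathrm{diag}(1, r^2)$, so that $H = \frac{1}{4}(p_r^2 + p_\theta^2/r^2)$. The geodesics must then be straight lines traversed at constant speed. The Runge-Kutta integrator, the step count, and all names are illustrative choices.

```python
import math

def rhs(state):
    """Canonical equations for H = (p_r^2 + p_theta^2 / r^2) / 4."""
    r, theta, p_r, p_theta = state
    return (0.5 * p_r,                    # r'       = (1/2) g^{rr} p_r
            0.5 * p_theta / r ** 2,       # theta'   = (1/2) g^{theta theta} p_theta
            0.5 * p_theta ** 2 / r ** 3,  # p_r'     = -(1/4) d(g^{theta theta})/dr p_theta^2
            0.0)                          # p_theta' = 0, the metric does not depend on theta

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5 * dt))
    k3 = rhs(shifted(k2, 0.5 * dt))
    k4 = rhs(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def flow(state, t_end, steps=1000):
    """Integrate the canonical equations from time 0 to t_end."""
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

def hamiltonian(state):
    r, _, p_r, p_theta = state
    return 0.25 * (p_r ** 2 + p_theta ** 2 / r ** 2)

if __name__ == "__main__":
    start = (1.0, 0.0, 0.0, 2.0)   # Cartesian point (1, 0) with velocity (0, 1)
    r, theta, _, _ = end = flow(start, 1.0)
    # Expected straight line: Cartesian point (1, 1) at t = 1, with H unchanged.
    print(r * math.cos(theta), r * math.sin(theta), hamiltonian(end))
```

With these initial data $Q = H = 1$ along the curve, so the parameter t is arc-length, in line with (4.3.7).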
4.4 Fields of extremals
Let $\Omega \subset \mathbb{R}^{n+1}$ satisfy the assumptions of 4.2, $T \in C^1(\Omega, \mathbb{R})$. The equation
$$ T(\sigma, \kappa^1, \ldots, \kappa^n) = 0 \qquad (4.4.1) $$
then defines a possibly degenerate hypersurface $\Sigma$ (assume $\Sigma \neq \emptyset$). Given
$B = (s, q^1, \ldots, q^n) \in \Omega$, we seek $A = (\sigma, \kappa^1, \ldots, \kappa^n) \in \Sigma$ that minimizes
$$ I(s, q^1, \ldots, q^n, \sigma, \kappa^1, \ldots, \kappa^n) $$
as a function of $(\sigma, \kappa)$ satisfying (4.4.1). At such a minimizing
A, we have with some Lagrange multiplier $\lambda$
$$ I_\sigma + \lambda T_\sigma = 0 \qquad (4.4.2) $$
$$ I_{\kappa^j} + \lambda T_{\kappa^j} = 0 \quad (j = 1, \ldots, n). $$
Unless the situation is degenerate ($\lambda = 0$ or $T_\sigma = T_{\kappa^i} = 0$ for all i), this
means that the vector $(I_\sigma, I_{\kappa^1}, \ldots, I_{\kappa^n})$ is proportional to the gradient
of T, hence orthogonal to $\Sigma$. From (4.2.9), (4.2.10), we then obtain
$$ -H(\sigma, \kappa, \varphi) = L(\sigma, \kappa, \dot{\kappa}) - \dot{\kappa}^i L_{\dot{\kappa}^i} = \lambda T_\sigma. \qquad (4.4.3) $$
These are equations for the tangent vector $(\dot{\kappa}^1, \ldots, \dot{\kappa}^n)$ of the solution
from A to B. A solution satisfying (4.4.3) is called orthogonal to $\Sigma$. We
want to use the following:
Assumption. Through each point of $\Omega$, there is precisely one solution
orthogonal to $\Sigma$.
For each $B = (s, q^1, \ldots, q^n)$, we thus find a unique
$A = (\sigma(s, q), \kappa(s, q)) \in \Sigma$ minimizing $I(s, q, \sigma, \kappa)$. We call
$$ J(s, q) := I(s, q, \sigma(s, q), \kappa(s, q)) $$
the geodesic distance from the hypersurface $\Sigma$.
Theorem 4.4.1. Given such a field of solutions orthogonal to $\Sigma$, the
geodesic distance satisfies
$$ J_s = -H(s, q, L_{\dot{q}}) \qquad (4.4.4) $$
and
$$ J_{q^j} = L_{\dot{q}^j}, \qquad (4.4.5) $$
hence also the eiconal equation
$$ J_s + H(s, q, J_q) = 0. \qquad (4.4.6) $$
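Proof.
$$ J_s = I_s + I_\sigma \sigma_s + I_{\kappa^i} \kappa^i_s, \qquad (4.4.7) $$
and analogously for $J_{q^j}$. $T(\sigma(s, q), \kappa(s, q)) = 0$ implies
$$ \sigma_s T_\sigma + \kappa^i_s T_{\kappa^i} = 0 $$
and likewise
$$ \sigma_{q^j} T_\sigma + \kappa^i_{q^j} T_{\kappa^i} = 0. $$
If we then use (4.4.2), we obtain in (4.4.7)
$$ J_s = I_s, \qquad J_{q^j} = I_{q^j}, $$
and the result follows from (4.2.7), (4.2.8), (4.2.11).
q.e.d.
Conversely: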
Theorem 4.4.2. If J(s, q) is a solution of (4.4.6) of class $C^2$, there
exists a field of solutions orthogonal to the hypersurfaces J(s, q) = constant,
and J is the geodesic distance from the hypersurface J = 0.
Proof. Let J satisfy (4.4.6). We put
$$ p_i := J_{q^i}(s, q). \qquad (4.4.8) $$
The following system of ODE
$$ \dot{q}^i = H_{p_i}(s, q, J_q) \qquad (4.4.9) $$
then defines an n-parameter family of curves. By (4.4.8), we have along
any such curve
$$ \dot{p}_i = J_{q^i s} + J_{q^i q^j} \dot{q}^j, $$
and (4.4.6) gives
$$ J_{s q^i} + H_{q^i} + H_{p_j} J_{q^j q^i} = 0. $$
Recalling (4.4.9), we obtain
$$ \dot{p}_i = -H_{q^i}. \qquad (4.4.10) $$
Equations (4.4.9) and (4.4.10) state that the curves q(s) constitute a
field of solutions. (4.4.6) and (4.4.8) yield
$$ -H = J_s. $$
This means that (4.4.3) is satisfied for T = J with $\lambda = 1$, and the
solutions are orthogonal to the hypersurfaces $J \equiv$ constant.
q.e.d.
Theorem 4.4.1 gives solutions of the Hamilton-Jacobi equation (4.4.6)
depending on an arbitrarily given function $T \in C^1(\mathbb{R}^{n+1})$ (namely, we
obtain those solutions that start on T = 0), whereas Theorem 4.4.2
implies that all solutions are obtained in that way. The surfaces $J \equiv$
constant are called parallel surfaces of the field. In the special case where
the hypersurface T = 0 degenerates into a point, we recover the
considerations of Section 4.2.
4.5 Hilbert's invariant integral and Jacobi's theorem
For a solution $J(t, x^1, \ldots, x^n)$ of the Hamilton-Jacobi equation, we put
again
$$ p_i := J_{x^i}(t, x). $$
If $A = (\sigma, \kappa^1, \ldots, \kappa^n)$ and $B = (s, q^1, \ldots, q^n)$ are connected by an
arbitrary differentiable path $x^i(\tau)$, the integral
$$ J(B) - J(A) = \int_\sigma^s \frac{d}{d\tau} J(\tau, x(\tau))\,d\tau
               = \int_\sigma^s \left( J_{x^i} \frac{dx^i}{d\tau} + J_\tau \right) d\tau $$
does not depend on this particular path, but only on the end points A
and B. We rewrite this integral as
$$ \int_\sigma^s \left( p_i \frac{dx^i}{d\tau} - H(\tau, x(\tau), p(\tau)) \right) d\tau \qquad (4.5.1) $$
and call it Hilbert's invariant integral. Conversely, now let functions
$p_i(\tau, x^1, \ldots, x^n)$ be given in a region $\Omega \subset \mathbb{R}^{n+1}$ for which the integral
(4.5.1) does not depend on the path $x(\tau)$ connecting $A = (\sigma, x(\sigma))$ and
$B = (s, x(s))$. Thus, we may define $J : \Omega \to \mathbb{R}$ by
$$ J(B) - J(A) = \int_\sigma^s \left( p_i \frac{dx^i}{d\tau} - H(\tau, x(\tau), p(\tau)) \right) d\tau. \qquad (4.5.2) $$
Since this integral does not depend on the path connecting A and B, we
must have
$$ J_{x^i} = p_i \qquad (4.5.3) $$
$$ J_t = -H(t, x, p). $$
J then solves the Hamilton-Jacobi equation. By Theorem 4.4.2, any
solution of the Hamilton-Jacobi equation is the geodesic distance function
for a field of solutions of the canonical equations. Thus, any invariant
integral of the form (4.5.1) yields a field of solutions.
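The path independence of Hilbert's invariant integral can be illustrated numerically. The sketch below assumes a specific example that is not taken from the text: n = 1, the free-particle Hamiltonian $H = p^2/2$, and the solution $J(t, x) = x^2/(2t)$ of $J_t + \frac{1}{2} J_x^2 = 0$, so that $p = J_x = x/t$. The two test paths, the midpoint quadrature rule, and all names are illustrative.

```python
import math

def J(t, x):
    """A solution of J_t + J_x^2 / 2 = 0 for t > 0 (assumed example)."""
    return x * x / (2.0 * t)

def hilbert_integral(path, velocity, sigma, s, steps=20000):
    """Midpoint-rule approximation of the integral (4.5.1) along x = path(tau)."""
    h = (s - sigma) / steps
    total = 0.0
    for k in range(steps):
        tau = sigma + (k + 0.5) * h
        x = path(tau)
        p = x / tau                      # p = J_x(tau, x)
        total += (p * velocity(tau) - 0.5 * p * p) * h
    return total

# Two different paths from A = (1, 0.5) to B = (2, 3.0).
def straight(tau):
    return 0.5 + 2.5 * (tau - 1.0)

def straight_velocity(tau):
    return 2.5

def bent(tau):
    return straight(tau) + math.sin(math.pi * (tau - 1.0))

def bent_velocity(tau):
    return 2.5 + math.pi * math.cos(math.pi * (tau - 1.0))

if __name__ == "__main__":
    print(J(2.0, 3.0) - J(1.0, 0.5))
    print(hilbert_integral(straight, straight_velocity, 1.0, 2.0))
    print(hilbert_integral(bent, bent_velocity, 1.0, 2.0))
```

Both quadratures should agree with $J(B) - J(A)$ up to discretization error, although only curves of the form x = const · t belong to the field defined by this J.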
Let us now reconsider Jacobi's Theorem 4.2.1. Let
$$ \varphi = G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) + \lambda \qquad (4.5.4) $$
be a complete integral of
$$ p + H(t, x^1, \ldots, x^n, p_1, \ldots, p_n) = 0 \qquad (4.5.5) $$
(with $p = \varphi_t$, $p_i = \varphi_{x^i}$); in particular
$$ \det\left(G_{x^i \lambda_j}\right) \neq 0. \qquad (4.5.6) $$
Jacobi's theorem says that we obtain a 2n-parameter family of solutions
of the canonical equations by solving
$$ G_{\lambda_i} = \mu^i, \qquad G_{x^i} = p_i, $$
where the parameters are $\lambda_1, \ldots, \lambda_n, \mu^1, \ldots, \mu^n$. For fixed values of
$\lambda_1, \ldots, \lambda_n, \lambda$, G determines a field of solutions of the canonical equations, and
by the preceding consideration, it is given by the corresponding invariant
integral
$$ G(B) - G(A) = \int_\sigma^s \left( p_i \frac{dx^i}{d\tau} - H \right) d\tau \qquad (4.5.7) $$
$$ = \int_\sigma^s \left\{ L(\tau, x^i(\tau), \dot{x}^i(\tau)) + \left( \frac{dx^i}{d\tau} - \dot{x}^i(\tau) \right) L_{\dot{x}^i} \right\} d\tau, $$
where $\dot{x}^i(\tau)$ now denotes the derivative in the direction of the solution
and not in the direction of the arbitrary curve $x^i(\tau)$ connecting A and B.
We now vary $\lambda_1, \ldots, \lambda_n$, but keep the curve $x^i(\tau)$ fixed. Then the field
of solutions varies, and so then does $\dot{x}^i(\tau)$. We also determine $\lambda$ so that
G(A) = 0. Differentiating (4.5.7) then yields
$$ G_{\lambda_i} = \int_\sigma^s \left( \frac{dx^j}{d\tau} - \dot{x}^j \right) L_{\dot{x}^j \lambda_i}\,d\tau. \qquad (4.5.8) $$
In the same way as G(B), this expression only depends on B (A is kept
fixed for the moment) but not on the particular $x^j(\tau)$. For each B, we
find $B_0$ on the surface
$$ G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) = 0 $$
that can be connected with B by a solution of the canonical equations.
Along such a solution, we have
$$ \frac{dx^j}{d\tau} = \dot{x}^j, $$
and the integrand in (4.5.8) thus vanishes along this curve. Instead of
integrating from A to B, it therefore suffices to integrate from A to $B_0$,
and we obtain
$$ G_{\lambda_i} = \mu^i \qquad (4.5.9) $$
with $\mu^i$ being the value of the integral from A to $B_0$. Thus, $\mu^i$ can be
considered as a constant for the solution passing through $B_0$.
If, conversely, (4.5.9) defines a family of curves $x^i(t, \lambda_j, \mu^j)$ (the family
is locally unique because of (4.5.6)), then, since $G_{\lambda_j}$ is constant, the
integrand in (4.5.8) has to vanish along any curve of the family. Thus
$$ \left( \frac{dx^j}{d\tau} - \dot{x}^j \right) L_{\dot{x}^j \lambda_i} = 0 \quad (i = 1, \ldots, n). \qquad (4.5.10) $$
In our field we have (cf. (4.2.8))
$$ L_{\dot{x}^j} = G_{x^j}, $$
hence by assumption (4.5.6)
$$ \det L_{\dot{x}^j \lambda_i} = \det G_{x^j \lambda_i} \neq 0. $$
Equation (4.5.10) then implies
$$ \frac{dx^j}{d\tau} = \dot{x}^j; $$
this means that the curves defined by (4.5.9) are solutions of the canonical
equations contained in the field defined by $G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n)$.
We also observe that the parameter $\lambda$ is only used for specifying the
surface G = 0 and has no geometric meaning beside that.
4.6 Canonical transformations
We want to find transformations, i.e. diffeomorphisms†
$$ (x, p) \mapsto (\xi, \pi), $$
that preserve the canonical equations
$$ \dot{x} = H_p, \qquad \dot{p} = -H_x. \qquad (4.6.1) $$
This means that $\xi = \xi(x, p)$, $\pi = \pi(x, p)$ satisfy
$$ \dot{\xi} = H^*_\pi, \qquad \dot{\pi} = -H^*_\xi \qquad (4.6.2) $$
with $H^*(t, \xi(x, p), \pi(x, p)) = H(t, x, p)$.
Equation (4.6.1) constitutes a system of ODE and if the assumptions
of the Picard-Lindelöf theorem are satisfied, a solution exists for
given initial values $x(t_0) = x_0$, $p(t_0) = p_0$ on some interval $[t_0, t_1]$.
For any $t \in [t_0, t_1]$, we then obtain such a transformation by letting
$\xi(x, p) = x(t)$, $\pi(x, p) = p(t)$ where $(x(\cdot), p(\cdot))$ is the solution of (4.6.1)
with $x(t_0) = x$, $p(t_0) = p$. Thus, the evolution of (4.6.1) in time t, the
so-called Hamiltonian flow, yields 'canonical transformations'. However,
the concept of canonical transformations is more general as we now shall
see.
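† A diffeomorphism is a bijective map that together with its inverse is everywhere
differentiable.
Since
$$ \dot{\xi}^j = \frac{\partial \xi^j}{\partial x^i} \dot{x}^i + \frac{\partial \xi^j}{\partial p_i} \dot{p}_i
             = \frac{\partial \xi^j}{\partial x^i} H_{p_i} - \frac{\partial \xi^j}{\partial p_i} H_{x^i} $$
and
$$ H^*_{\pi_j} = H_{x^i} \frac{\partial x^i}{\partial \pi_j} + H_{p_i} \frac{\partial p_i}{\partial \pi_j} $$
(and similarly for $\dot{\pi}_j$ and $H^*_{\xi^j}$), we obtain the conditions
$$ \frac{\partial p_i}{\partial \pi_j} = \frac{\partial \xi^j}{\partial x^i}, \qquad
   \frac{\partial x^i}{\partial \pi_j} = -\frac{\partial \xi^j}{\partial p_i}, \qquad
   \frac{\partial p_i}{\partial \xi^j} = -\frac{\partial \pi_j}{\partial x^i}, \qquad
   \frac{\partial x^i}{\partial \xi^j} = \frac{\partial \pi_j}{\partial p_i} \qquad (4.6.3) $$
or in matrix notation
$$ \begin{pmatrix} \frac{\partial \xi}{\partial x} & \frac{\partial \xi}{\partial p} \\ \frac{\partial \pi}{\partial x} & \frac{\partial \pi}{\partial p} \end{pmatrix}^{-1}
 = \begin{pmatrix} \left(\frac{\partial \pi}{\partial p}\right)^T & -\left(\frac{\partial \xi}{\partial p}\right)^T \\ -\left(\frac{\partial \pi}{\partial x}\right)^T & \left(\frac{\partial \xi}{\partial x}\right)^T \end{pmatrix} \qquad (4.6.4) $$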
where $A^T$ denotes the transpose of a matrix A. Obviously, this is a
condition that no longer depends on the particular Hamiltonian H.
Definition 4.6.1. A diffeomorphism $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, $(x, p) \mapsto (\xi, \pi)$,
satisfying (4.6.3) (or, equivalently, (4.6.4)) is called a canonical
transformation.
Canonical transformations can often be used to simplify the canonical
equations. Before we return to that topic, however, we interrupt the
discussion of the Hamilton-Jacobi theory in order to describe some basic
points of symplectic geometry (for more information on that subject, we
refer to D. McDuff, D. Salamon, Introduction to Symplectic Topology,
Oxford University Press, Oxford, 1995). We denote the (n x n) unit
matrix by $I_n$ and put
$$ J := \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}. $$
Then obviously
$$ J^2 = -I_{2n}. \qquad (4.6.5) $$
Equation (4.6.4) may then be written as
$$ (D\psi)^{-1} = -J (D\psi)^T J, \qquad (4.6.6) $$
or equivalently
$$ (D\psi)^T J D\psi = J. \qquad (4.6.7) $$
In this connection, a $\psi$ satisfying (4.6.7), i.e. a canonical transformation,
is also called a symplectomorphism. From these relations, one also easily
sees that $\psi$ is a canonical transformation iff $\psi^{-1}$ is.
In terms of J, the canonical equations (4.6.1) can also be written as
$$ \dot{z} = -J \nabla H(t, z) \qquad (4.6.8) $$
where $z = (x, p)$, $\nabla H(t, z) = (H_x, H_p)$.
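For a reader who knows the calculus of exterior differential forms, the
following explanation should be useful. We consider the two-form
$$ \omega = dx^i \wedge dp_i \quad \text{on } \mathbb{R}^{2n} $$
(here, as always, we use a summation convention: $dx^i \wedge dp_i$ means
$\sum_{i=1}^n dx^i \wedge dp_i$). According to the transformation rules for exterior
differential forms (i.e. $d\xi^j = \frac{\partial \xi^j}{\partial x^i} dx^i + \frac{\partial \xi^j}{\partial p_i} dp_i$ etc.), we have, for $\xi = \xi(x, p)$, $\pi = \pi(x, p)$,
$$ d\xi^j \wedge d\pi_j = \left( \frac{\partial \xi^j}{\partial x^i} \frac{\partial \pi_j}{\partial p_k} - \frac{\partial \xi^j}{\partial p_k} \frac{\partial \pi_j}{\partial x^i} \right) dx^i \wedge dp_k
 + \frac{\partial \xi^j}{\partial x^i} \frac{\partial \pi_j}{\partial x^k}\, dx^i \wedge dx^k
 + \frac{\partial \xi^j}{\partial p_i} \frac{\partial \pi_j}{\partial p_k}\, dp_i \wedge dp_k. $$
Thus, $\omega$ remains invariant under the transformation $\psi$, i.e.
$$ d\xi^j \wedge d\pi_j = dx^i \wedge dp_i \qquad (4.6.9) $$
precisely if $\psi$ is a canonical transformation. In fact, this is often used as
the definition of a canonical transformation. If $\omega$ is left invariant under
$\psi$, so is
$$ \omega^n := \underbrace{\omega \wedge \cdots \wedge \omega}_{n \text{ times}} = n!\,(-1)^{n(n-1)/2}\, dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n. \qquad (4.6.10) $$
Since
$$ d\xi^1 \wedge \cdots \wedge d\xi^n \wedge d\pi_1 \wedge \cdots \wedge d\pi_n = (\det D\psi)\, dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n, $$
we conclude Liouville's:
Theorem 4.6.1. Every canonical transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ satisfies
$$ \det D\psi = 1. \qquad (4.6.11) $$
q.e.d.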
One also expresses this result by saying that a canonical transformation
is volume preserving in phase space as $dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n$
can be interpreted as the volume form of $\mathbb{R}^{2n}$. By what was observed in
the beginning of this section, this applies in particular to the Hamiltonian
flow, which constitutes Liouville's original statement.
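The symplectic condition (4.6.7) and Liouville's theorem can be checked on a small example. The sketch below assumes n = 1 and the Hamiltonian $H = \frac{1}{2}(p^2 + x^2)$, whose flow is the rotation $(x, p) \mapsto (x \cos t + p \sin t,\ -x \sin t + p \cos t)$; this example and all function names are illustrative choices, not taken from the text. For comparison, it also evaluates the map $\xi = 2x$, $\pi = p$, which fails (4.6.7).

```python
import math

# The matrix J of (4.6.5) for n = 1.
J = [[0.0, -1.0], [1.0, 0.0]]

def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def symplectic_defect(d_psi):
    """Largest entry of |Dpsi^T J Dpsi - J|; zero iff (4.6.7) holds."""
    m = matmul(matmul(transpose(d_psi), J), d_psi)
    return max(abs(m[i][j] - J[i][j]) for i in range(2) for j in range(2))

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def oscillator_flow_jacobian(t):
    """Jacobian of the time-t flow of H = (p^2 + x^2) / 2 (a rotation)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]

if __name__ == "__main__":
    d_flow = oscillator_flow_jacobian(0.7)
    print(symplectic_defect(d_flow), det2(d_flow))        # defect ~ 0, determinant ~ 1
    d_scale = [[2.0, 0.0], [0.0, 1.0]]                     # xi = 2x, pi = p
    print(symplectic_defect(d_scale), det2(d_scale))      # defect 1, determinant 2
```

The scaling map multiplies the two-form by 2, so it is not a canonical transformation in the sense of Definition 4.6.1 and does not preserve phase-space volume.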
After this excursion and interruption, we return to our canonical
equations (4.6.1) and try to simplify them by suitable canonical
transformations. Canonical transformations may be easily obtained from the
variational integral
$$ I = \int_{t_1}^{t_2} L(t, x, \dot{x})\,dt $$
with
$$ L(t, x, \dot{x}) = \dot{x} \cdot p - H(t, x, p) \quad (p = L_{\dot{x}}). $$
If W is any differentiable function, then
$$ I^* = \int_{t_1}^{t_2} \left( L + \frac{dW}{dt} \right) dt $$
has the same critical points as I, because
$$ I^* = I + W(t_2) - W(t_1), $$
so that $I^*$ and I differ only by a constant independent of the particular
path x(t). Thus, we may for example take any function $W(t, x, \xi)$ and
require that for all choices of $x, \xi, \dot{x}, \dot{\xi}$
$$ \dot{x} \cdot p - H(t, x, p) = \dot{\xi} \cdot \pi - H^*(t, \xi, \pi) + \frac{dW}{dt}. \qquad (4.6.12) $$
Then, with $\Lambda(t, \xi, \dot{\xi}) := \dot{\xi} \cdot \pi - H^*(t, \xi, \pi)$,
$$ I^* = \int_{t_1}^{t_2} \Lambda(t, \xi, \dot{\xi})\,dt $$
differs from I only by a constant. Thus, if x(t) is a critical path for I,
$(\xi(t), \pi(t))$ then becomes a critical path for $I^*$. Since
$$ \frac{dW}{dt} = W_t + W_x \cdot \dot{x} + W_\xi \cdot \dot{\xi}, $$
(4.6.12) becomes
$$ \dot{x} \cdot (p - W_x) - \dot{\xi} \cdot (\pi + W_\xi) - H + H^* - W_t = 0. \qquad (4.6.13) $$
Since (4.6.13) is required to hold for all choices of $x, \xi, \dot{x}, \dot{\xi}$, we obtain:
Theorem 4.6.2. Given an arbitrary (differentiable) function $W(t, x, \xi)$,
a canonical transformation (transforming (4.6.1) into (4.6.2)) is
obtained through the equations
$$ p = W_x, \qquad \pi = -W_\xi, \qquad (4.6.14) $$
$$ H^* = H, \quad \text{i.e. } W_t = 0. $$
$W_t = 0$ of course means that $W = W(x, \xi)$.
In the same manner, we may also take a function $W(t, p, \xi)$, $W(t, x, \pi)$
or $W(t, p, \pi)$. In the first case, we obtain for example the equations
$$ x = W_p, \qquad H^* = H, \quad \text{i.e. } W_t = 0. $$
Here and above, of course $H^* = H^*(t, \xi, \pi)$.
We may now easily explain Jacobi's method for solving the canonical
equations. We try to find $W(x, \xi)$ satisfying
$$ H(t, x, W_x(x, \xi)) = H^*(\xi), \qquad (4.6.15) $$
i.e. reduce the Hamiltonian to a function of the variable $\xi$ alone. We
have to require that
$$ \det W_{x^i \xi^j} \neq 0. \qquad (4.6.16) $$
This ensures that the equation $\pi = -W_\xi$ determines x, and p then is
determined from $p = W_x$. If (4.6.15) holds, (4.6.2) becomes
$$ \dot{\xi} = 0, \qquad \dot{\pi} = -H^*_\xi. \qquad (4.6.17) $$
This implies that $\xi^1, \ldots, \xi^n$ are constants of motion (i.e. independent
of t), or so-called integrals of the Hamiltonian flow. A system for which
n independent integrals can be found is called completely integrable.
Thus, if we can find a so-called generating function $W(x, \xi)$ of the above
type reducing the Hamiltonian to a function of $\xi$ alone, the canonical
system is completely integrable. Clearly, since in this case $\xi^1, \ldots, \xi^n$ are
constant in t, the relation $\dot{\pi} = -H^*_\xi(\xi)$ can then be used to determine
$\pi_1, \ldots, \pi_n$. In other words, a completely integrable canonical system may
be solved explicitly through quadratures. Actually, one may show in this
case that the sets $T_c = \{\xi^1 = c^1, \ldots, \xi^n = c^n\}$ for a constant vector
$c = (c^1, \ldots, c^n)$ are n-dimensional tori, if compact and connected. Thus,
the so-called phase space $\{(x, p) \in \mathbb{R}^{2n}\}$ is foliated by tori that are
invariant under the motion, and on each such torus, the motion is given
by straight lines.
It should be pointed out, however, that completely integrable dynamical
systems are quite rare, in the sense that the complete integrability
usually depends on particular symmetries, and their dynamical behaviour
is quite exceptional in the class of all Hamiltonian systems.
The invariant tori may disappear under arbitrarily small perturbations.
By way of contrast, the Kolmogorov-Arnold-Moser theory asserts that
these invariant tori persist under sufficiently small and smooth
perturbations if the coordinates of $H^*_\xi$ are rationally independent and satisfy
certain Diophantine inequalities, and if the matrix $H^*_{\xi\xi}$ of second
derivatives is invertible.
In the older literature, the notion of 'canonical transformation' is usually
applied to any transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ that preserves the
form of the canonical equations, i.e. (4.6.1) is transformed into (4.6.2),
but without requiring that
$$ H^*(t, \xi, \pi) = H(t, x, p). $$
An example of a canonical transformation in this wider sense is
$$ \xi = 2x, \qquad \pi = p $$
with $H^* = 2H$.
If we now take a generating function $W(t, x, \xi)$ as above, the Hamiltonian
is transformed into
$$ H^* = H + W_t \qquad (4.6.18) $$
while the first two relations of (4.6.14), i.e.
$$ p = W_x, \qquad \pi = -W_\xi \qquad (4.6.19) $$
still hold. This may be used to explain Jacobi's theorem once more, as
we now shall see.
Let $I(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n)$ be a solution of the Hamilton-Jacobi
equation
$$ I_t + H(t, x, I_x) = 0, \qquad (4.6.20) $$
depending on parameters $\lambda_1, \ldots, \lambda_n$ and satisfying as usual
$$ \det I_{x^i \lambda_j} \neq 0. \qquad (4.6.21) $$
We now choose
$$ W(t, x^1, \ldots, x^n, \xi_1, \ldots, \xi_n) = I(t, x^1, \ldots, x^n, \xi_1, \ldots, \xi_n). $$
The corresponding transformation then is
$$ p = I_x, \qquad \pi = -I_\xi, \qquad (4.6.22) $$
$$ H^*(t, \xi, \pi) = H(t, x, p) + I_t. $$
Because of (4.6.20),
$$ H^* \equiv 0. $$
Thus, the new canonical equations are just
$$ \dot{\xi} = 0, \qquad \dot{\pi} = 0. $$
Solutions are of course
$$ \xi = \lambda = \text{constant}, \qquad \pi = -I_\lambda = -\mu = \text{constant}. $$
We have thus obtained the statement of Jacobi's Theorem 4.2.1, namely
that from a solution of (4.6.20) with (4.6.21), we may obtain solutions
of the canonical equations by solving
$$ I_\lambda = \mu, \qquad I_x = p $$
with parameters $\lambda = (\lambda_1, \ldots, \lambda_n)$, $\mu = (\mu^1, \ldots, \mu^n)$.
Classical references for this chapter include:
C.G.J. Jacobi, Vorlesungen über Analytische Mechanik (ed. H. Pulte),
Vieweg, Braunschweig, Wiesbaden, 1996,
C. Carathéodory, Variationsrechnung und partielle Differentialgleichungen
erster Ordnung, Teubner, Leipzig, 1935,
R. Courant, D. Hilbert, Methoden der Mathematischen Physik II,
Springer, Berlin, 2nd edition, 1968.
The global aspects are developed in
V.I. Arnold, Mathematical Methods of Classical Mechanics, GTM 60,
Springer, New York, 1978.
A recent advanced monograph is
H. Hofer, E. Zehnder, Symplectic Invariants and Hamiltonian Dynamics,
Birkhäuser, Basel, 1994.
That text will give readers a good perspective on the present research
directions in the field.
Exercises
4.1 Discuss the relation between the canonical equations for the
energy functional E and the equations for geodesics derived in
Chapter 2.
4.2 (Kepler problem) Consider the Lagrangian
$$ L(x, \dot{x}) = \frac{1}{2} |\dot{x}|^2 + \frac{1}{|x|} \quad \text{for } x \in \mathbb{R}^3. $$
Compute the corresponding Hamiltonian and write down the
canonical equations. Show that the three components of the
angular momentum $x \wedge \dot{x}$ are integrals of the Hamiltonian flow.
4.3 For smooth functions $F, G : \mathbb{R}^{2n} \to \mathbb{R}$, define their Poisson
bracket as
$$ \{F, G\} := \sum_{i=1}^n \left( F_{x^i} G_{p_i} - F_{p_i} G_{x^i} \right), $$
where $z = (x, p) = (x^1, \ldots, x^n, p_1, \ldots, p_n)$ are Euclidean coordinates
of $\mathbb{R}^{2n}$. Let $z(t) = (x(t), p(t))$ be a solution of a canonical
system
$$ \dot{x} = H_p, \qquad \dot{p} = -H_x $$
for some Hamiltonian H(x, p) that is independent of t. Show
that for any (smooth) $F : \mathbb{R}^{2n} \to \mathbb{R}$
$$ \frac{d}{dt} F(z(t)) = \{F, H\}. $$
Show that the Poisson bracket is antisymmetric, i.e.
$$ \{F, G\} = -\{G, F\} $$
and satisfies the Jacobi identity
$$ \{\{F, G\}, L\} + \{\{G, L\}, F\} + \{\{L, F\}, G\} = 0 $$
for all smooth F, G, L.
4.4 Show that a diffeomorphism $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ is a canonical
transformation if
$$ \{F \circ \psi, G \circ \psi\} = \{F, G\} \circ \psi $$
for all smooth F, G.
5
Dynamic optimization
Optimal control theory is concerned with time dependent processes that
can be influenced or controlled via the tuning of certain parameters.
The aim is to choose these parameters in such a manner that a desired
result is achieved and the cost resulting from the intermediate states
of the process and from the application or change of the parameters
is minimized. In some problems, the control parameter can be applied
only at discrete time steps, while other problems can be continuously
controlled. As we shall see, however, the discrete and the continuous
case can be treated by the same principles. Since the end result may be
prescribed, and the value of a parameter at some given time influences
the state of the system at subsequent times and therefore typically will
also contribute through this influence to the cost of the process at those
later times, the determination of the optimal control parameters is best
performed in a backward manner. This means in the discrete case that
one first selects the best value of the control parameter at the last stage,
whatever state the system is in at that time, then the value at the
second-to-last stage, so that at this step the contribution of the value
of the control parameter at the last stage to the total cost function is
already determined and one only needs to optimize the cost function
w.r.t. the second-to-last parameter value, and so on.
5.1 Discrete control problems
We consider a process with n states $x_1, \ldots, x_n \in \mathbb{R}^d$. At each state $x_i$,
we may choose a control parameter
$$ \lambda_i \in \Lambda_i, \qquad (5.1.1) $$
where $\Lambda_i$ is a given control restriction ($\Lambda_i \subset \mathbb{R}^c$), to determine
$$ x_{i+1} = \varphi_i(x_i, \lambda_i) \qquad (5.1.2) $$
with cost
$$ k_i(x_i, \lambda_i). $$
The total cost of the process starting at the initial state $x_\nu$ is
$$ K_\nu(x_\nu, \lambda_\nu, \ldots, \lambda_n) := \sum_{i=\nu}^n k_i(x_i, \lambda_i), \quad \text{with } x_{i+1} = \varphi_i(x_i, \lambda_i). \qquad (5.1.3) $$
We wish to minimize the total cost of the process and define the Bellman
function
$$ I_\nu(x_\nu) := \inf_{\lambda_i \in \Lambda_i,\ i = \nu, \ldots, n} K_\nu(x_\nu, \lambda_\nu, \ldots, \lambda_n) \quad (\nu = 1, \ldots, n). \qquad (5.1.4) $$
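Theorem 5.1.1. The Bellman function satisfies the Bellman equation
$$ I_\nu(x_\nu) = \inf_{\lambda_\nu \in \Lambda_\nu} \left( k_\nu(x_\nu, \lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu, \lambda_\nu)) \right) \quad \text{for } \nu = 1, \ldots, n \qquad (5.1.5) $$
(here, we put $I_{n+1} \equiv 0$). Furthermore, $(\lambda_\nu, \ldots, \lambda_n) \in \Lambda_\nu \times \cdots \times \Lambda_n$,
$(x_\nu, \ldots, x_n)$ with (5.1.2) are solutions of (5.1.4) iff
$$ I_j(x_j) = k_j(x_j, \lambda_j) + I_{j+1}(x_{j+1}) \quad \text{for } j = \nu, \ldots, n. \qquad (5.1.6) $$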
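Proof. Since
$$ K_\nu(x_\nu; \lambda_\nu, \ldots, \lambda_n) = k_\nu(x_\nu, \lambda_\nu) + K_{\nu+1}(\varphi_\nu(x_\nu, \lambda_\nu); \lambda_{\nu+1}, \ldots, \lambda_n), $$
we get
$$ \begin{aligned}
I_\nu(x_\nu) &= \inf_{\lambda_i \in \Lambda_i,\ i = \nu, \ldots, n} K_\nu(x_\nu; \lambda_\nu, \ldots, \lambda_n) \\
 &= \inf_{\lambda_\nu \in \Lambda_\nu}\ \inf_{\lambda_j \in \Lambda_j,\ j = \nu+1, \ldots, n} K_\nu(x_\nu; \lambda_\nu, \ldots, \lambda_n) \\
 &= \inf_{\lambda_\nu \in \Lambda_\nu} \Big( k_\nu(x_\nu, \lambda_\nu) + \inf_{\lambda_j \in \Lambda_j,\ j = \nu+1, \ldots, n} K_{\nu+1}(\varphi_\nu(x_\nu, \lambda_\nu); \lambda_{\nu+1}, \ldots, \lambda_n) \Big) \\
 &= \inf_{\lambda_\nu \in \Lambda_\nu} \left( k_\nu(x_\nu, \lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu, \lambda_\nu)) \right),
\end{aligned} $$
which is (5.1.5). For $(\lambda_\nu, \ldots, \lambda_n) \in \Lambda_\nu \times \cdots \times \Lambda_n$, $x_{j+1} = \varphi_j(x_j, \lambda_j)$ for
$j = \nu, \ldots, n$,
$$ I_\nu(x_\nu) \le k_\nu(x_\nu, \lambda_\nu) + \cdots + k_n(x_n, \lambda_n) = K_\nu(x_\nu; \lambda_\nu, \ldots, \lambda_n). $$
If the infimum w.r.t. $\lambda_j \in \Lambda_j$ ($j = \nu, \ldots, n$) is realized, we must have
equality, and (5.1.6) follows.
q.e.d.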
Corollary 5.1.1. $(\lambda_1, \ldots, \lambda_n) \in \Lambda_1 \times \cdots \times \Lambda_n$, $(x_1, \ldots, x_n)$ with (5.1.2)
is a solution of (5.1.4), iff for all $\nu = 1, \ldots, n$, $(\lambda_\nu, \ldots, \lambda_n) \in \Lambda_\nu \times \cdots \times \Lambda_n$,
$(x_\nu, \ldots, x_n)$ with (5.1.2) is a solution of (5.1.4).
Corollary 5.1.2. (Bellman's method) An optimal solution of the process
can be calculated as follows:
For any value of $x_n$, compute $\lambda_n(x_n)$ minimizing (5.1.5) for
$\nu = n$. Having computed $\lambda_j(x_j)$ for $j = \nu + 1, \ldots, n$, compute
$\lambda_\nu(x_\nu)$ for any value of $x_\nu$ so as to minimize (5.1.5) and put
$x_{\nu+1} = \varphi_\nu(x_\nu, \lambda_\nu(x_\nu))$. For an arbitrary initial value $x_1$, an
optimal process thus is given by:
$$ \lambda_1 := \lambda_1(x_1), \quad x_2 := \varphi_1(x_1, \lambda_1), \quad \lambda_2 := \lambda_2(x_2), \ \ldots $$
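Bellman's method translates directly into a backward induction over the stages. The following is a minimal sketch under the assumption that the state and control sets are finite, so that each infimum in (5.1.5) is a minimum over a list; the toy transition `toy_phi`, the toy cost `toy_cost`, and the brute-force comparison are hypothetical illustrations and not part of the text.

```python
import itertools

def bellman(n, states, controls, phi, cost):
    """Backward induction for (5.1.5) on finite state and control sets.

    phi(i, x, lam) gives the next state, cost(i, x, lam) the stage cost k_i.
    Returns the tables value[i][x] = I_i(x) and policy[i][x] = lambda_i(x).
    """
    value = {n + 1: {x: 0.0 for x in states}}   # I_{n+1} = 0
    policy = {}
    for i in range(n, 0, -1):                   # stages n, n-1, ..., 1
        value[i], policy[i] = {}, {}
        for x in states:
            def total(lam):
                return cost(i, x, lam) + value[i + 1][phi(i, x, lam)]
            best = min(controls, key=total)
            policy[i][x] = best
            value[i][x] = total(best)
    return value, policy

def rollout(x1, n, policy, phi, cost):
    """Follow the computed policy forward from x1; return controls and total cost."""
    x, total, lams = x1, 0.0, []
    for i in range(1, n + 1):
        lam = policy[i][x]
        total += cost(i, x, lam)
        lams.append(lam)
        x = phi(i, x, lam)
    return lams, total

def brute_force(x1, n, controls, phi, cost):
    """Minimal total cost by enumerating all control sequences (for checking)."""
    best = float("inf")
    for lams in itertools.product(controls, repeat=n):
        x, total = x1, 0.0
        for i, lam in enumerate(lams, start=1):
            total += cost(i, x, lam)
            x = phi(i, x, lam)
        best = min(best, total)
    return best

# Hypothetical toy problem: integer states in [-3, 3], controls shift the state.
STATES = range(-3, 4)
CONTROLS = (-1, 0, 1)

def toy_phi(i, x, lam):
    return max(-3, min(3, x + lam))

def toy_cost(i, x, lam):
    return float(x * x + lam * lam)

if __name__ == "__main__":
    value, policy = bellman(3, STATES, CONTROLS, toy_phi, toy_cost)
    print(rollout(2, 3, policy, toy_phi, toy_cost), value[1][2])
```

The backward pass evaluates each stage once per state and control, whereas the brute-force enumeration grows exponentially in n; this is the practical content of Corollary 5.1.2.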
5.2 Continuous control problems
We want to minimize
$$ K(t_1, x(t_1)) \quad \text{for a path } x : [t_0, t_1] \to \mathbb{R}^d $$
under the following conditions: We have the initial condition
$$ x(t_0) = x_0 $$
and the final condition
$$ x(t_1) \in B_1 $$
with a given set $B_1 \subset \mathbb{R}^d$. We have the control equation
$$ \dot{x}(t) = f(t, x(t), \lambda(t)) \quad \text{for almost all } t \in (t_0, t_1) $$
for a piecewise continuous control function $\lambda(t)$ satisfying
$$ \lambda(t) \in \Lambda $$
for some given $\Lambda \subset \mathbb{R}^c$. Pairs $(\lambda(t), x(t))$ satisfying all these restrictions
are called admissible, and the set of admissible pairs is called $P(t_0, x_0)$.
We put
$$ I(t_0, x_0) := \inf_{(\lambda(t), x(t)) \in P(t_0, x_0)} K(t_1, x(t_1)) $$
(Bellman function).
Lemma 5.2.1.
(i) $I(t_1, x_1) = K(t_1, x_1)$ for all $x_1 \in B_1$.
(ii) For any path $(\lambda(t), x(t)) \in P(t_0, x_0)$, $I(t, x(t))$ is a monotonically
increasing function of $t \in [t_0, t_1]$.
Proof. (i) is obvious. For (ii), if $t_0 \le \tau_1 \le \tau_2 \le t_1$, the set of all admissible
paths from $(\tau_2, x(\tau_2))$ to $(t_1, B_1)$ can be considered as a subset of
those from $(\tau_1, x(\tau_1))$ to $(t_1, B_1)$. Namely, if we have any path
from $(\tau_2, x(\tau_2))$ to $(t_1, x_1)$ for some $x_1 \in B_1$, we may compose it with
$x(t)|_{[\tau_1, \tau_2]}$ to obtain a path from $(\tau_1, x(\tau_1))$ to $(t_1, x_1)$. Thus, every
endpoint in $B_1$ that can be reached from $(\tau_2, x(\tau_2))$ by an admissible path
can also be reached from $(\tau_1, x(\tau_1))$ by an admissible path. This implies
monotonicity.
q.e.d.
Theorem 5.2.1. $(\lambda(t), x(t))$ is a solution of the problem iff $I(t, x(t))$
is constant in t. Moreover, if there exist a function J(t, x) that satisfies
$J(t_1, x_1) = K(t_1, x_1)$ for all $x_1 \in B_1$ and is monotonically increasing
along any admissible path, and an admissible path $(\bar{\lambda}(t), \bar{x}(t))$ along
which J is constant, then that path is a solution of the problem.
Proof. For a solution,
$$ I(t_0, x_0) = K(t_1, x(t_1)) = I(t_1, x(t_1)) \quad (x_0 = x(t_0)); \qquad (5.2.1) $$
$I(t, x(t))$ then is constant by Lemma 5.2.1 (ii). If $I(t, x(t))$ is constant,
then (5.2.1) holds, and by Lemma 5.2.1, we have a solution. Given J as
described, by the monotonicity of J, for any admissible path
$$ J(t_0, x_0) \le K(t_1, x(t_1)), $$
and for the path $(\bar{\lambda}(t), \bar{x}(t))$,
$$ J(t_0, x_0) = J(t_1, \bar{x}(t_1)) = K(t_1, \bar{x}(t_1)), $$
and optimality follows.
q.e.d.
Lemma 5.2.1 implies that for those t for which $I(t, x(t))$ is differentiable
($(\lambda(t), x(t)) \in P(t_0, x_0)$)
$$ I_t(t, x(t)) + I_x(t, x(t))\, f(t, x(t), \lambda(t)) \ge 0. $$
For an optimal $(\lambda(t), x(t))$, we have by Theorem 5.2.1 then
$$ I_t(t, x(t)) + I_x(t, x(t))\, f(t, x(t), \lambda(t)) = 0. $$
Corollary 5.2.1. (Bellman equation) Let $\tau \in [t_0, t_1]$, $\xi \in \mathbb{R}^d$. Assume
that for every $\lambda \in \Lambda$, there exists an admissible pair $(\lambda(t), x(t))$ with
$\lambda(\tau) = \lambda$, $x(\tau) = \xi$. Then
$$ \inf_{\lambda \in \Lambda} \left( I_t(\tau, \xi) + I_x(\tau, \xi)\, f(\tau, \xi, \lambda) \right) = 0. $$
Proof. This follows from the proof of Lemma 5.2.1. Namely, the assumption
implies that we may select $\lambda$ such that the path is optimal at the
point $(\tau, \xi)$ under consideration.
q.e.d.
Example. We want to minimize the integral
$$ \int_{t_0}^{t_1} \left( u^2(t) + \lambda^2(t) \right) dt $$
with the initial condition
$$ u(t_0) = u_0 $$
and the control equation
$$ \dot{u}(t) = \alpha u(t) + \beta \lambda(t) \quad \text{with given } \alpha, \beta \in \mathbb{R}. \qquad (5.2.2) $$
In order to express this problem as a control problem, we introduce a
new dependent variable v(t) as solution of the equation
$$ \dot{v}(t) = u^2(t) + \lambda^2(t), \qquad v(t_0) = 0. \qquad (5.2.3) $$
We then want to minimize
$$ v(t_1). $$
Given $\rho : [t_0, t_1] \to \mathbb{R}$ with
$$ \rho(t_1) = 0 $$
and satisfying the Riccati equation
$$ \dot{\rho}(t) = -2\alpha \rho(t) + \beta^2 \rho^2(t) - 1, \qquad (5.2.4) $$
we put
$$ J(t, u, v) = \rho(t) u^2(t) + v(t). $$
Then
$$ J(t_1, u(t_1), v(t_1)) = v(t_1) $$
and from (5.2.2), (5.2.3), (5.2.4)
$$ \frac{d}{dt} J(t, u(t), v(t)) = \beta^2 \rho^2 u^2 + 2\rho\beta u \lambda + \lambda^2 = (\beta \rho u + \lambda)^2 \ge 0, $$
and this expression vanishes precisely if
$$ \lambda(t) = -\beta \rho(t) u(t). \qquad (5.2.5) $$
By Theorem 5.2.1, $x(t) = (u(t), v(t))$ and $\lambda(t) = -\beta \rho(t) u(t)$ yield an
optimal solution.
If we substitute $\lambda(t)$ through the control equation (5.2.2) in the
variational integral, we obtain the integral
$$ \int_{t_0}^{t_1} \left( \frac{1}{\beta^2} \dot{u}(t)^2 + \left( 1 + \frac{\alpha^2}{\beta^2} \right) u(t)^2 - \frac{2\alpha}{\beta^2} u(t) \dot{u}(t) \right) dt, $$
which is essentially the same as the one considered at the end of 4.2 with
integrand given by (4.2.28). We recall that the latter one had also been
reduced to a Riccati equation.
Equation (5.2.5) expresses the control parameter as a function of the
state of the system. We just have a feedback control: knowing the state
at a given time determines the control needed to reach an optimal state
at the next time.
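The feedback rule (5.2.5) can be tried out numerically. The sketch below integrates the Riccati equation (5.2.4) backwards from $\rho(t_1) = 0$ with a Runge-Kutta scheme and then simulates the closed loop $\dot{u} = (\alpha - \beta^2 \rho) u$; since J is constant along the optimal path and $v(t_0) = 0$, the resulting cost should equal $\rho(t_0) u_0^2$. The discretization, the step count, and the function names are assumptions of this illustration. For $\alpha = 0$, $\beta = 1$ the exact solution is $\rho(t) = \tanh(t_1 - t)$, which gives an independent check.

```python
import math

def riccati_backward(alpha, beta, t0, t1, steps=2000):
    """RK4 for rho' = -2 alpha rho + beta^2 rho^2 - 1, from rho(t1) = 0 back to t0.

    Returns rho on the uniform grid t0, t0 + dt, ..., t1.
    """
    def f(rho):
        return -2.0 * alpha * rho + beta ** 2 * rho ** 2 - 1.0
    dt = (t1 - t0) / steps
    rho = [0.0] * (steps + 1)
    for k in range(steps, 0, -1):          # step from t_k down to t_{k-1}
        r = rho[k]
        k1 = f(r)
        k2 = f(r - 0.5 * dt * k1)
        k3 = f(r - 0.5 * dt * k2)
        k4 = f(r - dt * k3)
        rho[k - 1] = r - dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

def closed_loop_cost(alpha, beta, u0, t0, t1, rho):
    """Cost of the feedback control lambda = -beta rho u (Heun steps, trapezoid rule)."""
    steps = len(rho) - 1
    dt = (t1 - t0) / steps
    u, cost = u0, 0.0
    for k in range(steps):
        a0 = alpha - beta ** 2 * rho[k]
        a1 = alpha - beta ** 2 * rho[k + 1]
        u_pred = u + dt * a0 * u
        u_next = u + 0.5 * dt * (a0 * u + a1 * u_pred)
        g0 = (1.0 + (beta * rho[k]) ** 2) * u ** 2             # u^2 + lambda^2 at t_k
        g1 = (1.0 + (beta * rho[k + 1]) ** 2) * u_next ** 2    # and at t_{k+1}
        cost += 0.5 * dt * (g0 + g1)
        u = u_next
    return cost

if __name__ == "__main__":
    rho = riccati_backward(0.0, 1.0, 0.0, 1.0)
    print(rho[0], math.tanh(1.0))                        # Riccati solution at t0
    print(closed_loop_cost(0.0, 1.0, 1.0, 0.0, 1.0, rho))  # should be close to rho[0]
```

For comparison, the zero control $\lambda \equiv 0$ with $\alpha = 0$, $u_0 = 1$ on $[0, 1]$ has cost 1, which exceeds $\tanh(1)$.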
5.3 The Pontryagin maximum principle
We consider the control problem
$$ \int_{t_0}^{t_1} F(t, x(t), \lambda(t))\,dt \to \min \qquad (5.3.1) $$
with the control conditions
$$ x(t_0) = x_0, \qquad \dot{x}(t) = f(t, x(t), \lambda(t)) \qquad (5.3.2) $$
with controls
$$ \lambda(t) \in \Lambda \subset \mathbb{R}^c $$
and the end condition
$$ g(t_1, x(t_1)) = 0. \qquad (5.3.3) $$
Here, $\lambda(t)$ is required to be piecewise continuous, and x(t) to be continuous.
(Equation (5.3.2) then has to be interpreted as an integral equation
$x(t) = x_0 + \int_{t_0}^{t} f(\tau, x(\tau), \lambda(\tau))\,d\tau$.) F, f, and g are required to be of class
$C^1$. Also, $t_0$ is fixed, whereas $t_1 > t_0$ is variable subject to the restriction
(5.3.3). We define the Pontryagin function
$$ \mathcal{H}(x, \lambda, p, t, \mu_0) := p \cdot f(t, x, \lambda) - \mu_0 F(t, x, \lambda). $$
We now state the Pontryagin maximum principle.
Theorem 5.3.1. If (x(t),\(t)) is a solution of the control problem,
there exist Ao > 0, a = ( a i , . . . , a^) M
d
(a ^ 0 if HQ = 0) and a
continuous p = ( pi , . . . ,Pd) on [o, h] such that at all points where X(t)
is continuous, we have
H(x(t), \(t),p(t)
9
1, fio) = max H(x(t)
9
A,p(*), *, W>) (5.3.4)
and
p = -H
x
,x = H
p
(5.3.5)
and at the end point t\, we have the transversality condition
da
j
p(t
1
) = ~--(t
1
,x(t
1
))-a
j
. (5.3.6)
There also exists a continuous function rj : [to,t\] > K
s
^ch that at all
points where X(t) is continuous
v
(t)=H(x(t),\(t),p(t),t,iio) (5.3.7)
and
f){t) = H
t
(5.3.8)
V(ti)^^(tux(ti))^ (5.3.9)
Also, one may always achieve fiQ = 0 or I.
Remarks:
(1) The equation $\dot x = H_p$ is just the control equation
\[ \dot x = f(t,x(t),\lambda(t)). \]
(2) If $\Lambda = \mathbb{R}^c$, then (5.3.4) becomes
\[ H_\lambda(x(t),\lambda(t),p(t),t,\mu_0) = 0. \]
(3) If we want to guarantee a fixed end time $t_1$, we simply introduce an additional variable
\[ x^{d+1} = t \]
with control conditions
\[ \dot x^{d+1} = 1, \qquad x^{d+1}(t_0) = t_0 \]
and end condition
\[ x^{d+1}(t_1) = t_1. \]
We now want to exhibit the Hamilton-Jacobi theory as a special case of optimal control theory. Concretely, we want to derive the Euler-Lagrange equations, which are equivalent to the canonical equations of Chapter 4, from the Pontryagin maximum principle. We thus consider the variational problem
\[ \int_{t_0}^{t_1} L(t,x(t),\dot x(t))\,dt \to \min \]
with $x(t_0) = x_0$, $x(t_1) = x_1$, $x : [t_0,t_1] \to \mathbb{R}^d$, where $x(t)$ is required to have piecewise continuous first derivatives. We introduce the control variable through the control equation
\[ \lambda(t) = \dot x(t) \]
with $\Lambda = \mathbb{R}^d$, i.e. no constraint imposed. We have $g(t_1,x(t_1)) = x_1 - x(t_1)$. The Pontryagin function of this problem is
\[ H(x,\lambda,p,t,\mu_0) = p\cdot\lambda - \mu_0 L(t,x,\lambda). \]
By Theorem 5.3.1 there exist $\mu_0 = 0$ or $1$, $a \in \mathbb{R}^d$ ($a \ne 0$ for $\mu_0 = 0$) and $p \in C([t_0,t_1],\mathbb{R}^d)$ with
\[ \dot p = -H_x, \qquad p(t_1) = a, \]
\[ H(x(t),\lambda(t),p(t),t,\mu_0) = \max_{\lambda} H(x(t),\lambda,p(t),t,\mu_0) \]
and $\eta \in C([t_0,t_1],\mathbb{R})$ with
\[ \eta(t) = H(x(t),\lambda(t),p(t),t,\mu_0), \qquad \dot\eta(t) = H_t, \qquad \eta(t_1) = 0. \]
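We now want to exclude that $\mu_0 = 0$. In that case, we would have
\[ \dot\eta = H_t = 0, \text{ hence } \eta \equiv 0 \text{ since } \eta(t_1) = 0 \]
and
\[ \dot p = -H_x = 0, \text{ hence } p \equiv a \text{ since } p(t_1) = a. \]
Thus
\[ H = a\cdot\lambda, \]
and since $H_t = 0$, $H(x(t),\lambda(t),p(t),t,0) = 0$. By the maximum condition, $\max_{\lambda\in\mathbb{R}^d} a\cdot\lambda$ then has to equal this finite value, which is possible only if
\[ a = 0, \]
contradicting the statement of the theorem that $a \ne 0$ in case $\mu_0 = 0$. We may thus assume $\mu_0 = 1$.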
The Pontryagin maximum principle then gives the Weierstraß condition
\[ L(t,x(t),\lambda) - L(t,x(t),\dot x(t)) \ge p\cdot(\lambda - \dot x(t)) \quad\text{for all } \lambda \in \mathbb{R}^d \tag{5.3.10} \]
and
\[ H_\lambda(x(t),\lambda(t),p(t),t,1) = 0 \tag{5.3.11} \]
and the Legendre condition
\[ L_{\dot x\dot x}(t,x(t),\dot x(t)) \text{ is positive semidefinite.} \tag{5.3.12} \]
Equation (5.3.11) implies
\[ p = L_{\dot x}, \]
and together with
\[ \dot p = -H_x = L_x \]
we obtain the Euler-Lagrange equations
\[ \frac{d}{dt} L_{\dot x} = L_x. \tag{5.3.13} \]
A basic reference for the variational aspects of optimization and control theory, which also contains a detailed proof of the Pontryagin maximum principle together with many applications, is
E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer, New York, 1984, pp. 93-6, 422-40.
Part two
Multiple integrals in the calculus of
variations
1
Lebesgue measure and integration theory
1.1 The Lebesgue measure and the Lebesgue integral
In this section, we recall the basic notions and results about the Lebesgue
measure and the Lebesgue integral that will be used in the sequel. Most
proofs are omitted as they can be readily found in standard textbooks,
e.g. J. Jost, Postmodern Analysis, Springer, Berlin, 1998, pp. 151-97 and
209-15.
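Definition 1.1.1. A collection $\Sigma$ of subsets of $\mathbb{R}^d$ is called a $\sigma$-algebra (on $\mathbb{R}^d$) if
(i) $\mathbb{R}^d \in \Sigma$
(ii) If $A \in \Sigma$, then also $\mathbb{R}^d \setminus A \in \Sigma$
(iii) If $A_n \in \Sigma$, $n = 1, 2, 3, \dots$, then also $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.
The Borel $\sigma$-algebra is the smallest $\sigma$-algebra containing all open subsets of $\mathbb{R}^d$. The elements of the Borel $\sigma$-algebra are called Borel sets.
Easy consequences of (i)-(iii) are
(iv) $\emptyset \in \Sigma$
(v) If $A_n \in \Sigma$, $n = 1, 2, 3, \dots$, then also $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.
(vi) If $A, B \in \Sigma$, then also $A - B := A \setminus (A \cap B) \in \Sigma$.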
Definition 1.1.2. Let $\Sigma$ be a $\sigma$-algebra. A measure $\mu$ on $\Sigma$ is a countably additive function
\[ \mu : \Sigma \to \mathbb{R}^+ \cup \{\infty\}. \]
'Countably additive' here means that
\[ \mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n) \]
for any collection of mutually disjoint ($A_m \cap A_n = \emptyset$ for $m \ne n$) elements of $\Sigma$. A measure defined on the Borel $\sigma$-algebra is called a Borel measure. A Borel measure $\mu$ is called a Radon measure if $\mu(K) < \infty$ for every compact $K \subset \mathbb{R}^d$ and $\mu(B) = \sup\{\mu(K) \mid K \subset B,\ K \text{ compact}\}$ for every Borel set $B$.
A measure $\mu$ on $\Sigma$ enjoys the following properties:
(vii) $\mu(\emptyset) = 0$
(viii) If $A, B \in \Sigma$, $A \subset B$, then $\mu(A) \le \mu(B)$
(ix) If $A_n \in \Sigma$, $n = 1, 2, 3, \dots$ and $A_n \subset A_{n+1}$ for all $n$, then
\[ \mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \lim_{n\to\infty} \mu(A_n). \]
Theorem 1.1.1. There exist a (unique) $\sigma$-algebra $\Sigma$ on $\mathbb{R}^d$ and a (unique) measure $\mu$ on $\Sigma$ satisfying
(x) Any open subset of $\mathbb{R}^d$ is contained in $\Sigma$ (i.e. $\Sigma$ contains the Borel $\sigma$-algebra)
(xi) For
\[ Q := \{x = (x^1,\dots,x^d) \in \mathbb{R}^d \mid a_j \le x^j \le b_j,\ j = 1,\dots,d\}, \]
for numbers $a_1,\dots,a_d$, $b_1,\dots,b_d$, we have
\[ \mu(Q) = \prod_{j=1}^{d} (b_j - a_j) \]
(xii) (translation invariance) For $x \in \mathbb{R}^d$, $A \in \Sigma$ we have
\[ x + A := \{x + y \mid y \in A\} \in \Sigma \quad\text{and}\quad \mu(x + A) = \mu(A) \]
(xiii) If $A \subset B$, $B \in \Sigma$, $\mu(B) = 0$, then $A \in \Sigma$ (and, consequently, $\mu(A) = 0$).
This $\mu$ is called Lebesgue measure, and the elements of $\Sigma$ are called (Lebesgue) measurable.
In later chapters, we shall however write meas in place of $\mu$ for Lebesgue measure.
One should note that the $\sigma$-algebra of (Lebesgue) measurable sets is larger than the Borel $\sigma$-algebra.
We say that a property holds almost everywhere in $A \subset \mathbb{R}^d$ if it holds on $A \setminus B$ for some $B \subset A$ with $\mu(B) = 0$. We say that two functions $f, g : A \to \mathbb{R} \cup \{\pm\infty\}$ are equivalent if $f(x) = g(x)$ for almost all $x \in A$.
A set contained in a set of measure 0 is called a null set.
We usually write meas $A$ instead of $\mu(A)$ for a measurable set $A$.
Definition 1.1.3. Let $A \subset \mathbb{R}^d$ be measurable. A function
\[ f : A \to \mathbb{R} \cup \{\pm\infty\} \]
is called measurable if
\[ \{x \in A \mid f(x) < \lambda\} \]
is measurable for every $\lambda \in \mathbb{R}$.
If $f_n$, $n \in \mathbb{N}$, are measurable, $c \in \mathbb{R}$, then $f_1 + f_2$, $c f_1$, $f_1 f_2$, $\max(f_1,f_2)$, $\min(f_1,f_2)$, $\limsup_{n\to\infty} f_n$, $\liminf_{n\to\infty} f_n$ are likewise measurable. Any continuous function $f$ is measurable, because in that case $\{f(x) < \lambda\}$ is open in its domain of definition. We have the following important composition property:
Theorem 1.1.2. Let $g : A \to \mathbb{R}^c$ be measurable (i.e. $g = (g^1,\dots,g^c)$, and each component $g^i$ is measurable), $\gamma : \mathbb{R}^c \to \mathbb{R}$ continuous. Then $\gamma \circ g$ is measurable.
The characteristic function $\chi_A$ of $A \subset \mathbb{R}^d$ is defined as
\[ \chi_A(x) := \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise.} \end{cases} \]
Thus, $A$ is measurable if and only if its characteristic function $\chi_A$ is measurable.
More generally, $s : A \to \mathbb{R}$ is called a simple function or a step function if it assumes only finitely many values, say $s(A) = \{\lambda_1,\dots,\lambda_k\}$, and if all the sets $\{s(x) = \lambda_i\}$ are measurable. Thus
\[ s = \sum_{i=1}^{k} \lambda_i\, \chi_{\{s(x) = \lambda_i\}}. \]
Theorem 1.1.3. $f : A \to \mathbb{R}$ is measurable if and only if it is the pointwise limit of a sequence of simple functions. If $f : A \to \mathbb{R}$ is measurable and bounded, then it is the uniform limit of a sequence of simple functions.
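The uniform approximation in the second statement can be made concrete. The following sketch uses one standard construction, $s_n(x) := \lfloor n f(x)\rfloor / n$, which takes only finitely many values when $f$ is bounded and satisfies $0 \le f - s_n < 1/n$. The choice $f = \sin$ and the sampling grid are illustrative assumptions, and the helper name `simple_approximation` is ours.

```python
import math

def simple_approximation(f, n):
    """Return s_n with s_n(x) = floor(n * f(x)) / n, a function with values in (1/n)Z."""
    def s(x):
        return math.floor(n * f(x)) / n
    return s

f = math.sin  # a bounded continuous, hence measurable, function
grid = [k / 1000 for k in range(0, 6284)]  # sample points in [0, 2*pi)

# Largest observed deviation |f - s_n| on the grid for several n
errors = {
    n: max(abs(f(x) - simple_approximation(f, n)(x)) for x in grid)
    for n in (1, 10, 100)
}

# s_10 can only take the values k/10 with k = -10, ..., 10
values_of_s10 = {simple_approximation(f, 10)(x) for x in grid}
print(errors, len(values_of_s10))
```

The observed error stays below $1/n$, independently of the point $x$, which is exactly what uniform convergence requires.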
Definition 1.1.4.
(1) Let $A \subset \mathbb{R}^d$ be measurable with $\mu(A) < \infty$,
\[ s = \sum_{i=1}^{k} \lambda_i\, \chi_{\{s(x) = \lambda_i\}} \]
a simple function on $A$. The Lebesgue integral of $s$ is
\[ \int_A s(x)\,dx := \sum_{i=1}^{k} \lambda_i\, \mu(\{s(x) = \lambda_i\}). \]
(2) Let $A$ be as in (1), $f : A \to \mathbb{R}$ measurable and bounded. Let $s_n : A \to \mathbb{R}$ be a sequence of simple functions converging uniformly to $f$ according to Theorem 1.1.3. The Lebesgue integral of $f$ then is
\[ \int_A f(x)\,dx := \lim_{n\to\infty} \int_A s_n(x)\,dx \]
(this integral is independent of the choice of the sequence $(s_n)_{n\in\mathbb{N}}$).
(3) $A$ as in (1), $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. Put
\[ f_{m,n}(x) := \begin{cases} m & \text{if } f(x) < m \\ n & \text{if } f(x) > n \\ f(x) & \text{if } m \le f(x) \le n. \end{cases} \]
We say that $f$ is integrable if
\[ \lim_{m\to-\infty,\ n\to\infty} \int_A f_{m,n}(x)\,dx \]
exists. That limit then is called the Lebesgue integral $\int_A f(x)\,dx$ of $f$.
(4) $A \subset \mathbb{R}^d$ measurable, $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. $f$ is called integrable if for any increasing sequence $A_1 \subset A_2 \subset \dots \subset A$ of measurable subsets of $A$ with $\mu(A_n) < \infty$ for all $n$, $f\chi_{A_n}$ is integrable on $A_n$ and
\[ \lim_{n\to\infty} \int_{A_n} f(x)\chi_{A_n}(x)\,dx \]
exists. That limit then is independent of the choice of $(A_n)$ and called the Lebesgue integral $\int_A f(x)\,dx$ of $f$.
Theorem 1.1.4. The Lebesgue integral is a linear nonnegative functional on $\mathcal{L}^1(A)$, the vector space of Lebesgue integrable functions on a measurable set $A$, and it satisfies:
(1) If $f \in \mathcal{L}^1(A)$, and if $f = g$ almost everywhere on $A$, i.e.
\[ \mu\{x \in A \mid f(x) \ne g(x)\} = 0, \]
then $g \in \mathcal{L}^1(A)$, and
\[ \int_A f(x)\,dx = \int_A g(x)\,dx. \]
In particular,
\[ \int_A f(x)\,dx = 0 \quad\text{if } \mu(A) = 0. \]
(2) If $f \in \mathcal{L}^1(A)$, then $|f| \in \mathcal{L}^1(A)$, and
\[ \Big|\int_A f(x)\,dx\Big| \le \int_A |f(x)|\,dx. \]
(3) If $f \in \mathcal{L}^1(A)$, $h : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable with $|h| \le f$, then $h \in \mathcal{L}^1(A)$ and
\[ \Big|\int_A h(x)\,dx\Big| \le \int_A f(x)\,dx. \]
(4) If $\mu(A) < \infty$, $f : A \to \mathbb{R}$ measurable with $m \le f \le M$, then $f \in \mathcal{L}^1(A)$, and
\[ m\,\mu(A) \le \int_A f(x)\,dx \le M\,\mu(A). \]
(5) If $(A_n)_{n\in\mathbb{N}}$ is a sequence of mutually disjoint ($A_m \cap A_n = \emptyset$ for $m \ne n$) measurable sets, $A := \bigcup_{n=1}^{\infty} A_n$, $f \in \mathcal{L}^1(A_n)$ for every $n$, and if
\[ \sum_{n=1}^{\infty} \int_{A_n} |f(x)|\,dx < \infty, \]
then $f \in \mathcal{L}^1(A)$, and
\[ \int_A f(x)\,dx = \sum_{n=1}^{\infty} \int_{A_n} f(x)\,dx. \]
Conversely, if $f \in \mathcal{L}^1(A)$, then this equation holds for any such sequence $(A_n)_{n\in\mathbb{N}}$.
(6) If $f \in \mathcal{L}^1(A)$, then for every $\epsilon > 0$, there exists
\[ \varphi \in C_0(\mathbb{R}^d) := \{g : \mathbb{R}^d \to \mathbb{R} \text{ continuous} \mid \{x \mid g(x) \ne 0\} \text{ bounded}\} \]
with
\[ \int_A |f(x) - \varphi(x)|\,dx < \epsilon. \]
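Theorem 1.1.5 (Fubini). Let $A \subset \mathbb{R}^c$, $B \subset \mathbb{R}^d$ be measurable, and write $x = (\xi,\eta) \in A \times B$. If $f : A \times B \to \mathbb{R} \cup \{\pm\infty\}$ is integrable, then
\[ \int_{A\times B} f(x)\,dx = \int_A \Big( \int_B f(\xi,\eta)\,d\eta \Big) d\xi. \]
(Here, for example, $\int_B f(\xi,\eta)\,d\eta$ exists for almost all $\xi \in A$.)
For $f \in \mathcal{L}^1(A)$, we put
\[ \fint_A f(x)\,dx := \frac{1}{\operatorname{meas} A} \int_A f(x)\,dx. \]
We then have Jensen's inequality:
Theorem 1.1.6. Let $A \subset \mathbb{R}^d$ be bounded and measurable, $f$ a convex function. Then for all $\varphi \in \mathcal{L}^1(A)$
\[ f\Big( \fint_A \varphi \Big) \le \fint_A f \circ \varphi. \]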
1.2 Convergence theorems
In this section, again no proofs are given, and the reader is referred to
J. Jost, loc. cit., pp. 199-208.
Theorem 1.2.1 (B. Levi). Let $A \subset \mathbb{R}^d$ be measurable, and let $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ be a monotonically increasing sequence (i.e. $f_n(x) \le f_{n+1}(x)$ for all $x \in A$, $n \in \mathbb{N}$) of integrable functions. If
\[ \lim_{n\to\infty} \int_A f_n(x)\,dx < \infty, \]
then $f := \lim_{n\to\infty} f_n$ (pointwise limit) is integrable, and
\[ \int_A f(x)\,dx = \lim_{n\to\infty} \int_A f_n(x)\,dx. \]
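Corollary 1.2.1. Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R}^+ \cup \{\infty\}$ (nonnegative and) integrable. If
\[ \sum_{n=1}^{\infty} \int_A f_n(x)\,dx < \infty, \]
then $\sum_{n=1}^{\infty} f_n$ is integrable, and
\[ \int_A \sum_{n=1}^{\infty} f_n(x)\,dx = \sum_{n=1}^{\infty} \int_A f_n(x)\,dx. \]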
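Theorem 1.2.2 (Fatou). Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ integrable for $n \in \mathbb{N}$. Assume that there exists some integrable $F : A \to \mathbb{R} \cup \{\pm\infty\}$ with
\[ f_n \ge F \quad\text{for all } n \in \mathbb{N}, \]
\[ \int_A f_n(x)\,dx \le K < \infty \quad\text{for some } K \text{ independent of } n. \]
Then $\liminf_{n\to\infty} f_n$ is integrable, and
\[ \int_A \liminf_{n\to\infty} f_n(x)\,dx \le \liminf_{n\to\infty} \int_A f_n(x)\,dx. \]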
Theorem 1.2.3 (Lebesgue). Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ a sequence of integrable functions converging pointwise almost everywhere on $A$ to some function $f : A \to \mathbb{R} \cup \{\pm\infty\}$. Suppose there exists some integrable $F : A \to \mathbb{R} \cup \{\pm\infty\}$ with
\[ |f_n| \le F \quad\text{for all } n. \]
Then $f$ is integrable, and
\[ \int_A f(x)\,dx = \lim_{n\to\infty} \int_A f_n(x)\,dx. \]
Theorem 1.2.3 is called the theorem on dominated convergence.
Let us consider an example that shows the necessity of the hypotheses in the previous results:
$f_n : [0,1] \to \mathbb{R}$ is defined as
\[ f_n(x) := \begin{cases} n & \text{for } 1/n \le x \le 2/n, \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ \lim_{n\to\infty} f_n = 0, \]
and
\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1 \ne 0 = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx. \]
The $f_n$ do not form a monotonically increasing sequence, so that B. Levi's theorem does not apply, and they are not bounded by some integrable function that is independent of $n$, so that Lebesgue's theorem does not apply either. Considering $-f_n$ instead of $f_n$, we finally obtain a sequence for which Fatou's theorem does not hold.
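The two limits in this example can be checked with exact rational arithmetic. The sketch below evaluates $f_n$ at fixed points and computes its integral directly as "height times length of the interval", which is legitimate because $f_n$ is a simple function; we restrict to $n \ge 2$ so that $[1/n, 2/n] \subset [0,1]$. The function names are ours.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n for 1/n <= x <= 2/n and 0 otherwise, for x in [0, 1]."""
    return n if Fraction(1, n) <= x <= Fraction(2, n) else 0

def integral(n):
    """Integral of f_n over [0, 1] for n >= 2: value n times interval length 1/n."""
    return n * (Fraction(2, n) - Fraction(1, n))

x = Fraction(1, 10)
# At a fixed x > 0 the values f_n(x) vanish as soon as 2/n < x, i.e. n > 2/x = 20
tail_values = [f(n, x) for n in range(21, 200)]
# The integrals, however, all equal 1
integrals = [integral(n) for n in range(2, 50)]
print(f(15, x), set(tail_values), set(integrals))
```

So the pointwise limit is $0$ at every point, while the integrals stay equal to $1$, in line with the displayed formula.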
As a corollary of Theorem 1.2.3 one has (approximate the derivative by difference quotients):
Corollary 1.2.2 (Differentiation under the integral). Let $I \subset \mathbb{R}$ be an open interval, $A \subset \mathbb{R}^d$ measurable, and suppose $f : A \times I \to \mathbb{R} \cup \{\pm\infty\}$ satisfies
(i) for any $t \in I$, $f(\cdot,t)$ is integrable on $A$
(ii) for almost all $x \in A$, $f(x,\cdot)$ is differentiable on $I$
(iii) there exists an integrable $\phi : A \to \mathbb{R} \cup \{\pm\infty\}$ with the property that for all $t \in I$ and almost all $x \in A$
\[ \Big| \frac{\partial}{\partial t} f(x,t) \Big| \le \phi(x). \]
Then
\[ \psi(t) := \int_A f(x,t)\,dx \]
is a differentiable function of $t \in I$, with
\[ \frac{d}{dt}\psi(t) = \int_A \frac{\partial}{\partial t} f(x,t)\,dx. \]
q.e.d.
2
Banach spaces
In this chapter, we present some results from functional analysis that will be needed in the sequel, in particular in the next chapter. All proofs are supplied. As a reference, one may use any good book on functional analysis, e.g. K. Yosida, Functional Analysis, Springer, Berlin, 5th edition, 1978, pp. 52-5, 81-3, 90-92, 102-28, 139-45 or F. Hirzebruch, W. Scharlau, Einführung in die Funktionalanalysis, Bibliograph. Inst., Mannheim, 1971, pp. 60-88, 107-12. (These were also our main sources when compiling this chapter.)
2.1 Definition and basic properties of Banach and Hilbert
spaces
Definition 2.1.1. A vector space $V$ over $\mathbb{R}$ is called a normed space if there exists a map
\[ \|\cdot\| : V \to \mathbb{R}, \text{ called norm,} \]
satisfying
(i) $\|v\| > 0$ for all $v \in V$, $v \ne 0$
(ii) $\|\lambda v\| = |\lambda|\,\|v\|$ for all $\lambda \in \mathbb{R}$, $v \in V$
(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$ (triangle inequality)
A sequence $(v_n)_{n\in\mathbb{N}} \subset V$ is said to converge to $v \in V$ if
\[ \lim_{n\to\infty} \|v_n - v\| = 0. \]
(In order to distinguish the notion of convergence just defined from the notion of weak convergence to be defined in the next section, we sometimes call it norm convergence or strong convergence.)
A sequence $(v_n)_{n\in\mathbb{N}} \subset V$ is called a Cauchy sequence if for every $\epsilon > 0$ we may find $N \in \mathbb{N}$ such that for all $n, m \ge N$
\[ \|v_n - v_m\| < \epsilon. \]
A normed space $(V, \|\cdot\|)$ is called a Banach space if it is complete w.r.t. the notion of convergence just defined, i.e. if every Cauchy sequence converges to some $v \in V$.
Examples
(1) Every finite dimensional normed vector space is a Banach space, for example $\mathbb{R}^d$ with its Euclidean norm $|\cdot|$.
(2) Let $K \subset \mathbb{R}^d$ be compact. $C(K) := \{f : K \to \mathbb{R} \text{ continuous}\}$, $\|f\|_\infty := \sup_{x\in K} |f(x)|$ for $f \in C(K)$, defines a Banach space. If we equip $C^m(K) := \{f : K \to \mathbb{R}\ m\text{-times continuously differentiable}\}$, $m \in \mathbb{N}$, with the norm $\|\cdot\|_\infty$, it is not a Banach space, because it is not complete. Namely, the convergence w.r.t. $\|\cdot\|_\infty$ is uniform convergence, and while the uniform limit of continuous functions is continuous, in general the uniform limit of differentiable functions is not necessarily differentiable.
(3) Let $(V, \|\cdot\|)$ be a Banach space, $W \subset V$ a linear subspace that is closed w.r.t. $\|\cdot\|$, i.e. if $(w_n)_{n\in\mathbb{N}} \subset W$ converges to $v \in V$ ($\lim_{n\to\infty} \|w_n - v\| = 0$), then $v \in W$. Then $(W, \|\cdot\|)$ is a Banach space itself.
Definition 2.1.2. A Hilbert space is a vector space $H$ over $\mathbb{R}$ equipped with a scalar product, i.e. a map $(\cdot,\cdot) : H \times H \to \mathbb{R}$ satisfying
(i) $(v,w) = (w,v)$ for all $v, w \in H$
(ii) $(\lambda_1 v_1 + \lambda_2 v_2, w) = \lambda_1 (v_1,w) + \lambda_2 (v_2,w)$ for all $\lambda_1, \lambda_2 \in \mathbb{R}$, $v_1, v_2, w \in H$
(iii) $(v,v) > 0$ for all $v \in H \setminus \{0\}$.
In addition, we require
(iv) $H$ is complete w.r.t. the norm $\|v\| := (v,v)^{1/2}$, i.e. a Banach space.
In order to justify the preceding definition, we need to verify that $\|v\| = (v,v)^{1/2}$ defines indeed a norm in the sense of Definition 2.1.1. Since the properties (i), (ii) of Definition 2.1.1 are clearly satisfied, we only need to check the triangle inequality:
Lemma 2.1.1. Let $(\cdot,\cdot) : H \times H \to \mathbb{R}$ satisfy (i)-(iii) of Definition 2.1.2. Then we have the Schwarz inequality: $|(v,w)| \le \|v\|\,\|w\|$ for all $v, w \in H$, with equality if and only if $v$ and $w$ are linearly dependent.
Proof. We have for $v, w \in H$, $\lambda \in \mathbb{R}$
\[ (v + \lambda w, v + \lambda w) \ge 0 \quad\text{by (iii).} \]
Inserting $\lambda = -\frac{(v,w)}{\|w\|^2}$ (for $w \ne 0$) and expanding with the help of (i), (ii) yields the Schwarz inequality.
Since $\|v + w\|^2 = (v + w, v + w) = \|v\|^2 + \|w\|^2 + 2(v,w)$, the Schwarz inequality in turn implies the triangle inequality.
q.e.d.
Definition 2.1.3. Let $V$ be a vector space (over $\mathbb{R}$, as always). $M \subset V$ is called convex if whenever $x, y \in M$, then also
\[ tx + (1-t)y \in M \quad\text{for all } 0 \le t \le 1. \]
Example 2.1.1. Let $(V, \|\cdot\|)$ be a normed space. Then for every $\mu > 0$, $B_\mu := \{x \in V \mid \|x\| \le \mu\}$ is convex. Namely, if $x, y \in B_\mu$, i.e. $\|x\| \le \mu$, $\|y\| \le \mu$, then for $0 \le t \le 1$
\[ \|tx + (1-t)y\| \le t\|x\| + (1-t)\|y\| \le \mu, \]
hence $tx + (1-t)y \in B_\mu$.
The following definition contains a sharpening of the convexity of the balls $B_\mu$. It will be formulated only for $\mu = 1$, but by homogeneity ((ii) of Definition 2.1.1), it implies an analogous condition for any $\mu > 0$.
Definition 2.1.4. A normed space $(V, \|\cdot\|)$ is called uniformly convex if for all $\epsilon > 0$ there exists $\delta > 0$ with the property that for all $x, y \in V$ with $\|x\| = \|y\| = 1$, we have
\[ \Big\| \tfrac12 (x + y) \Big\| > 1 - \delta \implies \|x - y\| < \epsilon. \tag{2.1.1} \]
Remark 2.1.1. An equivalent form of the implication (2.1.1) is
\[ \|x - y\| \ge \epsilon \implies \Big\| \tfrac12 (x + y) \Big\| \le 1 - \delta \tag{2.1.2} \]
(again for $\|x\| = \|y\| = 1$).
Example 2.1.2. In a Hilbert space $(H, (\cdot,\cdot))$, we have the parallelogram identity
\[ \Big\| \tfrac12 (x + y) \Big\|^2 = \tfrac12 \|x\|^2 + \tfrac12 \|y\|^2 - \tfrac14 \|x - y\|^2 \tag{2.1.3} \]
which follows by expanding the norms in terms of the scalar product. Therefore, any Hilbert space is uniformly convex.
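For unit vectors, (2.1.3) reads $\|\tfrac12(x+y)\|^2 = 1 - \tfrac14\|x-y\|^2$, so $\|\tfrac12(x+y)\| > 1 - \delta$ gives $\|x-y\|^2 < 4\delta(2-\delta) \le 8\delta$, and $\delta = \epsilon^2/8$ works in (2.1.1). The following sketch checks both the identity and this choice of $\delta$ on random unit vectors in Euclidean $\mathbb{R}^5$; the dimension, the sampling scheme and all function names are illustrative assumptions of ours.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def sub(v, w):
    return [a - b for a, b in zip(v, w)]

def scale(t, v):
    return [t * c for c in v]

def normalize(v):
    n = norm(v)
    return [c / n for c in v]

def parallelogram_gap(x, y):
    """Absolute difference between the two sides of (2.1.3)."""
    lhs = norm(scale(0.5, add(x, y))) ** 2
    rhs = 0.5 * norm(x) ** 2 + 0.5 * norm(y) ** 2 - 0.25 * norm(sub(x, y)) ** 2
    return abs(lhs - rhs)

def check_uniform_convexity(eps, trials=2000, dim=5, seed=0):
    """Test implication (2.1.1) with delta = eps^2 / 8 on random pairs of unit vectors."""
    rng = random.Random(seed)
    delta = eps ** 2 / 8
    for _ in range(trials):
        x = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        # y is a normalized perturbation of x, so the premise holds reasonably often
        y = normalize([c + rng.gauss(0.0, eps) for c in x])
        midpoint_norm = norm(scale(0.5, add(x, y)))
        if midpoint_norm > 1 - delta and not norm(sub(x, y)) < eps:
            return False
    return True

print(check_uniform_convexity(0.5), check_uniform_convexity(0.1))
```

The check is of course no proof, but it illustrates how the modulus $\delta$ depends quadratically on $\epsilon$ in the Hilbert space case.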
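Lemma 2.1.2. In Definition 2.1.4, the condition $\|x\| = \|y\| = 1$ may be replaced by
\[ \|x\| \le 1, \quad \|y\| \le 1. \]
Proof. In the situation of Definition 2.1.4, for $\epsilon_0 > 0$, we may find $\delta_0 > 0$ such that for all $z, w$ with $\|z\| = \|w\| = 1$, we have
\[ \Big\| \tfrac12 (z + w) \Big\| > 1 - \delta_0 \implies \|z - w\| < \epsilon_0. \tag{2.1.4} \]
Let now $\epsilon > 0$, $\|x\| \le 1$, $\|y\| \le 1$. If for $\delta < \tfrac12$
\[ 1 - \delta < \Big\| \tfrac12 (x + y) \Big\|, \]
then
\[ \|x\| > 1 - 2\delta, \quad \|y\| > 1 - 2\delta. \]
In particular, $x, y \ne 0$, and by the triangle inequality,
\[ \Big\| \tfrac12 \Big( \frac{x}{\|x\|} + \frac{y}{\|y\|} \Big) \Big\| \ge \Big\| \tfrac12 (x + y) \Big\| - \tfrac12 \Big\| \frac{x}{\|x\|} - x \Big\| - \tfrac12 \Big\| \frac{y}{\|y\|} - y \Big\| > 1 - 3\delta. \]
We apply (2.1.4) with $z = \frac{x}{\|x\|}$, $w = \frac{y}{\|y\|}$, $\epsilon_0 = \frac{\epsilon}{2}$. If $3\delta \le \delta_0$, we then get
\[ \Big\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \Big\| < \frac{\epsilon}{2}. \]
Now
\[ \|x - y\| \le \Big\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \Big\| + \Big\| x - \frac{x}{\|x\|} \Big\| + \Big\| y - \frac{y}{\|y\|} \Big\| \quad\text{by the triangle inequality} \]
\[ < 4\delta + \frac{\epsilon}{2}. \]
Choosing $\delta = \min(\tfrac13\delta_0, \epsilon/8)$ (where we may assume $\epsilon < 4$, so that $\delta < \tfrac12$), we have shown the implication
\[ \Big\| \tfrac12 (x + y) \Big\| > 1 - \delta \implies \|x - y\| < \epsilon \]
for $\|x\| \le 1$, $\|y\| \le 1$.
q.e.d.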
Lemma 2.1.3. Let $(V, \|\cdot\|)$ be a uniformly convex Banach space. Let $(x_n)_{n\in\mathbb{N}} \subset V$ be a sequence with
\[ \limsup_{n\to\infty} \|x_n\| \le 1 \tag{2.1.5} \]
and
\[ \lim_{n,m\to\infty} \Big\| \tfrac12 (x_n + x_m) \Big\| = 1. \tag{2.1.6} \]
Then $(x_n)$ converges to some $x \in V$ with $\|x\| = 1$.
Proof. Let $\epsilon > 0$. (2.1.5) and (2.1.6) imply $\lim \|x_n\| = 1$. Therefore, by replacing $x_n$ by $\frac{x_n}{\|x_n\|}$, we may assume w.l.o.g. $\|x_n\| = 1$. Because of (2.1.5), we may apply Lemma 2.1.2. By (2.1.6), we may find $N \in \mathbb{N}$ such that for $n, m \ge N$
\[ \Big\| \tfrac12 (x_n + x_m) \Big\| > 1 - \delta, \]
with $\delta$ determined by Lemma 2.1.2. We obtain
\[ \|x_n - x_m\| < \epsilon, \]
i.e. $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. Since $(V, \|\cdot\|)$ is a Banach space, it has a limit $x$, and
\[ \|x\| = \lim_{n\to\infty} \|x_n\| = 1. \]
q.e.d.
In order to formulate the Hahn-Banach theorem, a fundamental extension result for linear functionals from a linear subspace to the whole space, we need:
Definition 2.1.5. Let $V$ be a (real) vector space.
\[ p : V \to \mathbb{R}^+ \quad (\mathbb{R}^+ := \{t \in \mathbb{R} \mid t \ge 0\}) \]
is called convex if
(i) $p(x + y) \le p(x) + p(y)$ for all $x, y \in V$
(ii) $p(\lambda x) = \lambda p(x)$ for all $x \in V$, $\lambda \ge 0$
Example 2.1.3. The norm on a normed vector space.
Let $V_0$ be a linear subspace of the vector space $V$, $f_0 : V_0 \to \mathbb{R}$ linear. A linear $f : V \to \mathbb{R}$ is called an extension of $f_0$ if
\[ f|_{V_0} = f_0. \]
Theorem 2.1.1 (Hahn-Banach). Let $V_0$ be a linear subspace of the vector space $V$, $p : V \to \mathbb{R}^+$ convex. Suppose that $f_0 : V_0 \to \mathbb{R}$ is linear and satisfies
\[ f_0(x) \le p(x) \quad\text{for all } x \in V_0. \tag{2.1.7} \]
Then there exists an extension $f : V \to \mathbb{R}$ of $f_0$ with
\[ f(x) \le p(x) \quad\text{for all } x \in V. \tag{2.1.8} \]
Remark 2.1.2. We shall need the Hahn-Banach theorem only in the case where $V$ possesses a countable basis, i.e. is separable (see p. 130).
Proof. We may assume $V_0 \ne V$. Let $v \in V \setminus V_0$, and let $V_1$ be the linear subspace of $V$ spanned by $V_0$ and $v$, i.e.
\[ V_1 := \{x + tv \mid x \in V_0,\ t \in \mathbb{R}\}. \]
We shall now investigate how $f_0$ can be extended to $f_1 : V_1 \to \mathbb{R}$ with
\[ f_1(x) \le p(x) \quad\text{for all } x \in V_1. \tag{2.1.9} \]
We put $f_1(v) =: \alpha$. Then as an extension of $f_0$, $f_1$ satisfies
\[ f_1(x + tv) = f_0(x) + t\alpha. \]
Equation (2.1.9) requires
\[ f_0(x) + t\alpha \le p(x + tv). \tag{2.1.10} \]
For $t > 0$, this is equivalent to
\[ \alpha \le p\Big(\frac{x}{t} + v\Big) - f_0\Big(\frac{x}{t}\Big), \tag{2.1.11} \]
and for $t < 0$ to
\[ \alpha \ge -p\Big(-\frac{x}{t} - v\Big) - f_0\Big(\frac{x}{t}\Big). \tag{2.1.12} \]
For $x_1, x_2 \in V_0$, we have
\[ f_0(x_2) - f_0(x_1) \le p(x_2 - x_1) = p((x_2 + v) - (x_1 + v)) \le p(x_2 + v) + p(-x_1 - v), \]
hence
\[ -f_0(x_2) + p(x_2 + v) \ge -f_0(x_1) - p(-x_1 - v). \tag{2.1.13} \]
Thus
\[ \alpha_2 := \inf_{x_2 \in V_0} \big( -f_0(x_2) + p(x_2 + v) \big) \ge \alpha_1 := \sup_{x_1 \in V_0} \big( -f_0(x_1) - p(-x_1 - v) \big). \]
Therefore, any $\alpha$ with
\[ \alpha_1 \le \alpha \le \alpha_2 \]
satisfies (2.1.11) and (2.1.12), hence (2.1.10). Thus, the desired extension $f_1$ exists. If $V$ possesses a countable basis, we may use the preceding construction to extend $f_0$ inductively to all of $V$.
If $V$ does not possess a countable basis, we need to use Zorn's lemma to complete the proof. For that purpose, let
\[ \Phi := \{\varphi : W \to \mathbb{R} \text{ extension of } f_0 \text{ to some linear subspace } W,\ V_0 \subset W \subset V, \text{ satisfying } \varphi(x) \le p(x) \text{ for all } x \in W\}. \]
On $\Phi$, we have an obvious ordering relation (namely, for $\varphi_i : W_i \to \mathbb{R}$, $i = 1, 2$, we have $\varphi_1 \le \varphi_2$ if $W_1 \subset W_2$ and $\varphi_2|_{W_1} = \varphi_1$), and every totally ordered subset $\Phi_0$ of $\Phi$ possesses an upper bound, namely $\varphi_0$ defined on the union of the domains of all $\varphi \in \Phi_0$ and coinciding with each such $\varphi$ on its domain of definition. By Zorn's lemma, $\Phi$ then contains a maximal element $f$. Let $W$ be the domain of definition of $f$. $f$ then extends $f_0$ to $W$. If $W$ were not the whole space $V$, we could use the preceding construction to extend $f$ to a larger subspace of $V$, contradicting the maximality of $f$. Therefore, $f$ furnishes the desired extension of $f_0$.
q.e.d.
Corollary 2.1.1. Let $V_0$ be a linear subspace of the normed vector space $(V, \|\cdot\|)$, $\lambda > 0$, $f_0 : V_0 \to \mathbb{R}$ linear with
\[ |f_0(x)| \le \lambda \|x\| \quad\text{for all } x \in V_0. \]
Then there exists an extension $f : V \to \mathbb{R}$ of $f_0$ with
\[ |f(x)| \le \lambda \|x\| \quad\text{for all } x \in V. \]
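Theorem 2.1.2 (Helly). Let $(V, \|\cdot\|)$ be a Banach space, $f_1, \dots, f_n$ linear functionals $V \to \mathbb{R}$ that are continuous w.r.t. the norm convergence, $\mu, \alpha_1, \dots, \alpha_n \in \mathbb{R}$. Suppose that for any $\lambda_1, \dots, \lambda_n \in \mathbb{R}$
\[ \Big| \sum_{i=1}^{n} \lambda_i \alpha_i \Big| \le \mu \Big\| \sum_{i=1}^{n} \lambda_i f_i \Big\|_* \tag{2.1.14} \]
(with the norm of linear functionals as in (2.2.1)). Then for each $\epsilon > 0$, there exists $x_\epsilon \in V$ with
\[ f_i(x_\epsilon) = \alpha_i \quad\text{for } i = 1, 2, \dots, n \tag{2.1.15} \]
and
\[ \|x_\epsilon\| \le \mu + \epsilon. \]
Proof. Let $m \le n$ be the maximal number of linearly independent $f_i$, $i = 1, \dots, n$. It suffices to consider $m$ linearly independent $f_i$, w.l.o.g. $f_1, \dots, f_m$, since the remaining ones are easily seen to be taken care of by (2.1.14). $F(x) := (f_1(x), \dots, f_m(x))$ may then be considered as a linear map onto $\mathbb{R}^m$. We equip $\mathbb{R}^m$ with its Euclidean structure. Let
\[ B_{\mu+\epsilon} := \{x \in V \mid \|x\| \le \mu + \epsilon\}. \]
Then $F(B_{\mu+\epsilon})$ is a convex set containing 0 as an interior point. Also, $F(B_{\mu+\epsilon})$ is balanced in the sense that with $p \in \mathbb{R}^m$ it also contains $-p$.
We now assume that $(\alpha_1, \dots, \alpha_m)$ is not contained in $F(B_{\mu+\epsilon})$. Because of the properties of $F(B_{\mu+\epsilon})$ just noted, we may then find $\lambda = (\lambda_1, \dots, \lambda_m)$ with
\[ \sum_{i=1}^{m} \lambda_i \alpha_i \ge \sup_{\|x\| \le \mu+\epsilon} \Big| \sum_{i=1}^{m} \lambda_i f_i(x) \Big| = (\mu + \epsilon) \Big\| \sum_{i=1}^{m} \lambda_i f_i \Big\|_*, \]
contradicting (2.1.14). Thus $(\alpha_1, \dots, \alpha_m) \in F(B_{\mu+\epsilon})$, implying the claim. q.e.d.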
2.2 Dual spaces and weak convergence
Let $V$ be a vector space. The linear functionals
\[ f : V \to \mathbb{R} \]
then also form a vector space. If $(V, \|\cdot\|)$ is a normed vector space, we define the norm of a linear functional $f : V \to \mathbb{R}$ as
\[ \|f\|_* := \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} \in \mathbb{R}^+ \cup \{\infty\}. \tag{2.2.1} \]
Lemma 2.2.1. A linear functional $f : V \to \mathbb{R}$ is continuous if and only if $\|f\|_* < \infty$.
The easy proof is left to the reader. (See also Lemma 2.3.1 below.)
q.e.d.
Definition 2.2.1. $V^* := \{f : V \to \mathbb{R} \text{ linear with } \|f\|_* < \infty\}$ equipped with the norm (2.2.1) is called the dual space of $(V, \|\cdot\|)$. (It is easy to verify that (2.2.1) defines a norm on $V^*$ in the sense of Definition 2.1.1.)
Lemma 2.2.2. $(V^*, \|\cdot\|_*)$ is a Banach space.
Proof. Let $(f_n)_{n\in\mathbb{N}} \subset V^*$ be a Cauchy sequence. For every $\epsilon > 0$ we may then find $N \in \mathbb{N}$ such that for $n, m \ge N$
\[ \|f_n - f_m\|_* < \epsilon. \]
By (2.2.1), this implies that for every $x \in V$
\[ |f_n(x) - f_m(x)| \le \epsilon \|x\|. \]
Therefore, since $\mathbb{R}$ is complete, $(f_n(x))_{n\in\mathbb{N}}$ converges for every $x \in V$. We denote the limit by $f(x)$. $f : V \to \mathbb{R}$ then is a linear functional. It is an easy consequence of the triangle inequality that $\|f\|_* < \infty$ and that $\lim_{n\to\infty} \|f_n - f\|_* = 0$. This implies that $(f_n)_{n\in\mathbb{N}}$ converges to $f \in V^*$, and $(V^*, \|\cdot\|_*)$ therefore is complete, hence a Banach space.
q.e.d.
Remark 2.2.1. We did not assume that $V$ itself is a Banach space.
We now consider
\[ (V^*)^* =: V^{**}, \]
the dual space of $V^*$, with norm denoted by $\|\cdot\|_{**}$. Any $x \in V$ defines a linear functional
\[ i(x) : V^* \to \mathbb{R}, \qquad i(x)(f) := (f,x) := f(x). \]
Lemma 2.2.3. $\|i(x)\|_{**} = \|x\|$. Thus, the linear functional $i(x) : V^* \to \mathbb{R}$ is contained in $V^{**}$, i.e. we have a linear isometric map $i : V \to V^{**}$.
Proof. We have
\[ |(f,x)| \le \|f\|_* \|x\|, \]
and therefore
\[ \|x\| \ge \sup_{f \in V^*,\ f \ne 0} \frac{|(f,x)|}{\|f\|_*} = \|i(x)\|_{**}. \tag{2.2.2} \]
Conversely, let $x \in V$. Let
\[ f_0(tx) := t\|x\| \quad\text{for } t \in \mathbb{R}. \]
By the Hahn-Banach theorem (Corollary 2.1.1), we may extend $f_0$ from $\{tx \mid t \in \mathbb{R}\}$ to $V$ as a linear functional $f$ with
\[ \|f\|_* = 1 \]
and
\[ |(f,x)| = \|x\|. \]
Therefore
\[ \|i(x)\|_{**} = \sup_{f \in V^*,\ f \ne 0} \frac{|(f,x)|}{\|f\|_*} \ge \|x\|. \tag{2.2.3} \]
Equations (2.2.2) and (2.2.3) imply the result.
q.e.d.
Definition 2.2.2. A normed linear space $(V, \|\cdot\|)$ is called reflexive if
\[ i : V \to V^{**} \]
is a bijective isometry (i.e. $\|x\| = \|i(x)\|_{**}$ for all $x \in V$).
Remark 2.2.2.
(1) Since $(V^{**}, \|\cdot\|_{**})$ is a Banach space by Lemma 2.2.2, any reflexive space is complete, i.e. a Banach space.
(2) By the remark before Definition 2.2.2, the crucial condition in that definition is the surjectivity of $i$.
Definition 2.2.3.
(i) Let $(V, \|\cdot\|)$ be a normed linear space. $(x_n)_{n\in\mathbb{N}} \subset V$ is said to be weakly convergent to $x \in V$ if $f(x_n)$ converges to $f(x)$ for all $f \in V^*$, in symbols:
\[ x_n \rightharpoonup x. \]
(ii) Let $(V^*, \|\cdot\|_*)$ be the dual of a normed linear space. $(f_n)_{n\in\mathbb{N}} \subset V^*$ is said to be weak* convergent to $f \in V^*$ if $f_n(x)$ converges to $f(x)$ for all $x \in V$.
Theorem 2.2.1. Let $V$ be a separable† normed linear space. Let $(f_n)_{n\in\mathbb{N}} \subset V^*$ be bounded, i.e. $\|f_n\|_* \le$ constant (independent of $n$). Then $(f_n)$ contains a weak* convergent subsequence.
Proof. Let $(y_\nu)_{\nu\in\mathbb{N}}$ be a dense subset of $V$. Since $(f_n(y_1))_{n\in\mathbb{N}}$ is bounded, a subsequence $(f_n^1(y_1))$ of $(f_n(y_1))$ converges. Having iteratively found a subsequence $(f_n^m)$ of $(f_n)$ for which $(f_n^m(y_\nu))_{n\in\mathbb{N}}$ converges for $1 \le \nu \le m$, we may find a subsequence $(f_n^{m+1})$ of $(f_n^m)$ for which also $(f_n^{m+1}(y_{m+1}))_{n\in\mathbb{N}}$ converges. The diagonal sequence $(f_n^n)_{n\in\mathbb{N}}$ then converges at every $y_\nu$, $\nu \in \mathbb{N}$, and since $(y_\nu)_{\nu\in\mathbb{N}}$ is dense in $V$ and the $\|f_n\|_*$ are uniformly bounded, $(f_n^n(x))_{n\in\mathbb{N}}$ has to converge for every $x \in V$. Thus, we have found a weak* convergent subsequence of $(f_n)_{n\in\mathbb{N}}$.
q.e.d.
Remark 2.2.3.
(1) The argument employed in the preceding proof is called Cantor diagonalization.
(2) Theorem 2.2.1 remains true without the assumption that $V$ is separable, and so does the following:
Corollary 2.2.1. Let $(V, \|\cdot\|)$ be a separable reflexive Banach space. Then every bounded sequence $(x_n)_{n\in\mathbb{N}}$ contains a weakly convergent subsequence.
Proof. By (2.2.2) or reflexivity, $(i(x_n))_{n\in\mathbb{N}}$ is a bounded sequence in $V^{**}$ and therefore contains a weak* convergent subsequence. Since $V$ is reflexive, the limit is of the form $i(x)$ for some $x \in V$. Thus, along this subsequence,
\[ f(x_n) = (f,x_n) \to (f,x) = f(x) \quad\text{for every } f \in V^*, \]
so that the subsequence converges weakly to $x$.
q.e.d.
† Separable means that $V$ contains a countable subset $(y_\nu)_{\nu\in\mathbb{N}}$ that is dense w.r.t. $\|\cdot\|$, i.e. for every $y \in V$, $\epsilon > 0$ there exists $y_\nu$ with $\|y - y_\nu\| < \epsilon$.
Theorem 2.2.2. Any weakly convergent sequence $(x_n)_{n\in\mathbb{N}}$ in a Banach space is bounded.
Proof. We shall show that $i(x_n)$ is uniformly bounded on $\{f \in V^* \mid \|f\|_* \le 1\}$. Then also
\[ \|x_n\| = \|i(x_n)\|_{**} = \sup_{f \ne 0} \frac{|(f,x_n)|}{\|f\|_*} \tag{2.2.4} \]
is bounded (see Lemma 2.2.3 for the first equality). Since $i(x_n)$ is linear, it suffices to show uniform boundedness on some ball in $V^*$. Otherwise, we find a sequence $B_j$ of closed balls,
\[ B_j = \{f \in V^* \mid \|f - f_j\|_* \le \rho_j\} \quad\text{for some } f_j \in V^*,\ \rho_j > 0 \]
with
\[ B_{j+1} \subset B_j \quad\text{and}\quad \lim_{j\to\infty} \rho_j = 0 \]
and a subsequence $(x'_n)$ of $(x_n)$ with
\[ |(f, x'_j)| > j \quad\text{for all } f \in B_j. \tag{2.2.5} \]
By construction, $(f_j)_{j\in\mathbb{N}}$ forms a Cauchy sequence and therefore converges to some $f_0 \in V^*$, with
\[ f_0 \in \bigcap_{j=1}^{\infty} B_j. \]
Because of (2.2.5), we have
\[ |(f_0, x'_j)| > j \quad\text{for all } j \in \mathbb{N}. \]
This is impossible since $(f_0, x'_n)$ converges because $(x'_n)_{n\in\mathbb{N}}$ converges weakly. q.e.d.
Example 2.2.1.
(1) In a finite dimensional normed vector space (which automatically is complete, i.e. a Banach space), weak convergence is just componentwise convergence and therefore equivalent to the usual convergence w.r.t. the norm.
(2) In an infinite dimensional reflexive Banach space $(V, \|\cdot\|)$, this is no longer so, because one may always find a sequence $(e_n)_{n\in\mathbb{N}} \subset V$ with $\|e_i\| \le 1$ for all $i$ and $\|e_i - e_j\| \ge 1$ for $i \ne j$. Such a sequence cannot converge w.r.t. $\|\cdot\|$, because it is not a Cauchy sequence, but it always contains a weakly convergent subsequence according to Corollary 2.2.1 (we have shown Corollary 2.2.1 only under the assumption of separability, but it holds true in general).
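A standard instance of (2) is the sequence of unit vectors $e_n$ in the Hilbert space of square-summable sequences. There, every continuous linear functional is the scalar product with a fixed square-summable $y$, so weak convergence $e_n \rightharpoonup 0$ means $(y, e_n) = y_n \to 0$, while $\|e_n - e_m\| = \sqrt2$ for $n \ne m$. The sketch below illustrates this on a finite truncation $\mathbb{R}^N$; the truncation level `N`, the test vector `y` and the function names are assumptions of ours, and a finite truncation can only illustrate, not reproduce, the infinite dimensional statement.

```python
import math

N = 2000  # truncation dimension of this finite sketch (an assumption)

def e(n):
    """n-th unit vector (1-based index) in the truncated sequence space R^N."""
    return [1.0 if k == n else 0.0 for k in range(1, N + 1)]

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(inner(v, v))

# A fixed square-summable test vector y = (1, 1/2, 1/3, ...)
y = [1.0 / k for k in range(1, N + 1)]

# The pairings (y, e_n) = 1/n tend to 0 ...
pairings = [inner(y, e(n)) for n in (1, 10, 100, 1000)]
# ... while distinct unit vectors keep the distance sqrt(2)
distance = norm([a - b for a, b in zip(e(3), e(7))])
print(pairings, distance)
```

The pairings decay like $1/n$ although no subsequence of $(e_n)$ can be a Cauchy sequence in norm.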
Lemma 2.2.4. Let $(V, \|\cdot\|)$ be a separable normed space. Then $V^*$ satisfies the first axiom of countability w.r.t. the weak* topology, i.e. for each $f \in V^*$, there exists a sequence $(U_\nu)_{\nu\in\mathbb{N}}$ of subsets of $V^*$ that are open in the weak* topology such that every $U$ that is open in this topology and contains $f$ contains some $U_\nu$. Consequently, if $(V, \|\cdot\|)$ is also reflexive, then $V^*$ satisfies the first axiom of countability w.r.t. the weak topology.
Proof. Let $f \in V^*$. Every neighbourhood of $f$ w.r.t. the weak* topology contains a neighbourhood of the form
\[ U_{\epsilon; v_1, \dots, v_k}(f) := \{g \in V^* \mid |g(v_i) - f(v_i)| < \epsilon \text{ for } i = 1, \dots, k\}. \]
Since $V$ is separable, there exists a sequence $(w_n)_{n\in\mathbb{N}} \subset V$ that is dense w.r.t. the $\|\cdot\|$ topology. We claim that the neighbourhoods of the form
U
tWilt
...
tWik
(f)
form a basis of the neighbourhood system of / of the required type, i.e.
every U
;Vli
.
iVk
(f) contains some such U.
w
.
fW
, (/ ). For that pur-
pose, we choose n with | < e and Wi
x
,..., Wi
k
with \VJ - Wi
3
\ < ^ for
j = 1, . . . , k. For g e [A (/ ), we then have
\9(Vj) ~ f(vj)\ < k K ) - / K ) | + \(9 - f)(vj - ti;
y
)l < \ + I < ,
i.e. g e Ve-
Vl
,...,v
k
(f) as required.
Finally, if V is reflexive, then the weak* and the weak topology of V*
coincide.
q.e.d.
We now present some further applications of the Hahn-Banach theorem that will be used in Chapter 3.
Lemma 2.2.5. Let $(V, \|\cdot\|)$ be a normed space, $V_0$ a closed linear subspace. Then $V_0$ is also closed w.r.t. weak convergence.
Proof. By the Hahn-Banach theorem (Corollary 2.1.1), for every $x_0 \in V \setminus V_0$, we may find a continuous linear functional $f_0 : V \to \mathbb{R}$ with
\[ f_0(x_0) = 1, \qquad f_0|_{V_0} = 0. \]
Thus, $x_0$ cannot be a weak limit of a sequence in $V_0$.
q.e.d.
Lemma 2.2.6. Let $(V, \|\cdot\|)$ be a reflexive Banach space, $V_0$ a closed linear subspace. Then $V_0$ is reflexive.
Proof. We may identify $V_0^{**}$ with a subspace of $V^{**}$, by putting $v(f) = v(f|_{V_0})$ for $f \in V^*$, $v \in V_0^{**}$. Let $v \in V_0^{**}$. Since $V$ is reflexive, there exists $x \in V$ with
\[ v(f) = f(x) \quad\text{for all } f \in V^*. \]
We claim $x \in V_0$. Otherwise, by the Hahn-Banach theorem (Corollary 2.1.1), there exists $f \in V^*$ with
\[ f(x) \ne 0, \qquad f|_{V_0} = 0. \]
Since $f(x) = v(f|_{V_0})$ by the above, this is impossible. Since every $f_0 \in V_0^*$ can be extended to $f \in V^*$, again by Hahn-Banach, we conclude
\[ v(f_0) = f_0(x) \quad\text{for all } f_0 \in V_0^*. \]
Thus, $v = i(x)$. This implies $V_0^{**} = i(V_0)$, i.e. reflexivity of $V_0$.
q.e.d.
Corollary 2.2.2. A Banach space $(V, \|\cdot\|)$ is reflexive if and only if its dual $(V^*, \|\cdot\|_*)$ is reflexive.
Proof. If $V = V^{**}$, then also $V^* = V^{***}$. Thus, if $V$ is reflexive, so is $V^*$. Consequently, if conversely $V^*$ is reflexive, so then is $V^{**}$. Since $V$ can be identified with a closed subspace of $V^{**}$ by Lemma 2.2.2, Lemma 2.2.6 then yields reflexivity of $V$.
q.e.d.
Lemma 2.2.7. Let $(V, \|\cdot\|)$ be a normed space, and suppose that $(x_n)_{n\in\mathbb{N}} \subset V$ converges weakly to $x \in V$. Then
\[ \|x\| \le \liminf_{n\to\infty} \|x_n\|. \]
Proof. After selection of a subsequence, we may assume that $\|x_n\|$ converges (see Theorem 2.2.2). Assume
\[ \|x\| > \lim_{n\to\infty} \|x_n\|. \]
As in the proof of Lemma 2.2.3, we may find $f \in V^*$ with
\[ \|f\|_* = 1, \qquad |f(x)| = \|x\|. \]
But then
\[ |f(x)| > \lim_{n\to\infty} \|x_n\| \ge \limsup_{n\to\infty} |f(x_n)|, \]
while the weak convergence of $(x_n)_{n\in\mathbb{N}}$ to $x$ implies
\[ f(x) = \lim_{n\to\infty} f(x_n). \]
This contradiction establishes the claim.
q.e.d.
Theorem 2.2.3 (Milman). Any uniformly convex Banach space is reflexive.
Proof (Kakutani). Let $(V, \|\cdot\|)$ be a uniformly convex Banach space, and let $x_0^{**} \in V^{**}$. We need to show that there exists some $x_0 \in V$ with
$$i(x_0) = x_0^{**} \tag{2.2.6}$$
(see Remark 2 after Definition 2.2.2). We may assume w.l.o.g. that
$$\|x_0^{**}\| = 1. \tag{2.2.7}$$
For every $n \in \mathbb{N}$, we may then find $f_n \in V^*$ with $\|f_n\| = 1$ and
$$1 - \frac{1}{n} \le x_0^{**}(f_n) \le 1. \tag{2.2.8}$$
We now claim that for every $n \in \mathbb{N}$, we may find $x_n \in V$ with
$$f_i(x_n) = x_0^{**}(f_i) \quad\text{for } i = 1, \dots, n \tag{2.2.9}$$
and
$$\|x_n\| \le \|x_0^{**}\| + \frac{1}{n} = 1 + \frac{1}{n}. \tag{2.2.10}$$
For any $\lambda_1, \dots, \lambda_n \in \mathbb{R}$, we have
$$\Bigl|\sum_{i=1}^{n} \lambda_i x_0^{**}(f_i)\Bigr| \le \|x_0^{**}\| \, \Bigl\|\sum_{i=1}^{n} \lambda_i f_i\Bigr\|,$$
and so the claim follows from Helly's Theorem 2.1.2. Since in addition to (2.2.10) also
$$\|x_n\| = \|f_n\| \, \|x_n\| \ge f_n(x_n) = x_0^{**}(f_n) \ge 1 - \frac{1}{n},$$
we must have
$$\lim_{n\to\infty} \|x_n\| = 1.$$
For $m \ge n$, we have
$$2 - \frac{2}{n} \le f_n(x_n) + f_n(x_m) \le \|x_n + x_m\| \le \|x_n\| + \|x_m\| \le 2 + \frac{2}{n}.$$
By Lemma 2.1.3, $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence and converges to some $x_0 \in V$, satisfying
$$\|x_0\| = 1 \tag{2.2.11}$$
and
$$f_i(x_0) = x_0^{**}(f_i) \quad\text{for } i = 1, 2, 3, \dots \tag{2.2.12}$$
The solution $x_0$ of (2.2.11), (2.2.12) is unique. Namely, if there were another solution $x_0'$, on one hand, we would have
$$\|x_0 + x_0'\| < 2 \tag{2.2.13}$$
by uniform convexity. On the other hand
$$f_i(x_0 + x_0') = 2 x_0^{**}(f_i) \quad\text{for all } i,$$
hence
$$2 - \frac{2}{i} \le 2 x_0^{**}(f_i) = f_i(x_0 + x_0') \le \|x_0 + x_0'\|,$$
hence
$$\|x_0 + x_0'\| \ge 2.$$
This contradicts (2.2.13), and so $x_0$ is unique. We now claim that
$$f_0(x_0) = x_0^{**}(f_0) \quad\text{for any } f_0 \in V^*, \tag{2.2.14}$$
so that $x_0^{**} = i(x_0)$, proving the theorem. Let this $f_0 \in V^*$ be given. In the above reasoning, we replace the sequence $f_1, f_2, f_3, \dots$ by $f_0, f_1, f_2, f_3, \dots$. We then obtain $x_0' \in V$ with
$$\|x_0'\| = 1$$
and
$$f_i(x_0') = x_0^{**}(f_i) \quad\text{for } i = 0, 1, 2, 3, \dots \tag{2.2.15}$$
Since the solution $x_0$ of (2.2.11), (2.2.12) was shown to be unique, however, we must have $x_0' = x_0$. Equation (2.2.15) for $i = 0$ then is (2.2.14).
q.e.d.
Corollary 2.2.3 (Riesz). Any Hilbert space $(H, (\cdot,\cdot))$ can be identified with its dual $H^*$.
Proof. Since a Hilbert space is uniformly convex, Theorem 2.2.3 implies $H = H^{**}$. On the other hand, any $x \in H$ induces an $f_x \in H^*$ by
$$f_x(y) := (x, y) \quad\text{for } y \in H.$$
We have
$$\|f_x\| = \sup_{\|y\| = 1} (x, y) \le \|x\|$$
and $f_x(x) = (x, x) = \|x\|^2$, hence
$$\|f_x\| = \|x\|.$$
Thus, $H$ is isometrically embedded into $H^*$. For the same reason, $H^*$ is isometrically embedded into $H^{**}$, and since $H = H^{**}$, one readily verifies that these embeddings must be surjective, hence $H = H^* = H^{**}$.
q.e.d.
Let $M$ be a linear subspace of a Hilbert space $H$. The orthogonal complement $M^\perp$ of $M$ is defined as
$$M^\perp := \{x \in H : (x, y) = 0 \text{ for all } y \in M\}.$$
It is clear that $M^\perp$ is a closed linear subspace of $H$. $M$ need not be closed here, but the orthogonal complement of $M$ is the same as the one of its closure $\bar{M}$ in $H$.
Corollary 2.2.4. Let $M$ be a closed linear subspace of the Hilbert space $H$. Then every $x \in H$ can be uniquely decomposed as
$$x = x_1 + x_2 \quad\text{with } x_1 \in M,\ x_2 \in M^\perp.$$
Proof. By the proof of Corollary 2.2.3, $x \in H$ corresponds to $f_x \in H^*$ with
$$f_x(y) = (x, y) \quad\text{for all } y \in H.$$
We let $f_x^M$ be the restriction of $f_x$ to $M$. $M$, since closed, is a Hilbert space itself, and $f_x^M$ is an element of the dual $M^*$. By Corollary 2.2.3, it corresponds to some $x_1 \in M$, i.e.
$$f_x^M(y) = (x_1, y) \quad\text{for all } y \in M.$$
We put $x_2 := x - x_1$. Then for all $y \in M$,
$$(x - x_1, y) = f_x(y) - f_x^M(y) = 0 \quad\text{since } f_x = f_x^M \text{ on } M.$$
Therefore, $x_2 \in M^\perp$. Thus, we have constructed the required decomposition. Concerning uniqueness, if
$$x = x_1 + x_2 = x_1' + x_2' \quad\text{with } x_1, x_1' \in M,\ x_2, x_2' \in M^\perp,$$
then for all $y \in M$
$$(x, y) = (x_1, y) = (x_1', y),$$
and by Corollary 2.2.3 applied to $M$, $x_1 = x_1'$, and therefore also $x_2 = x_2'$.
q.e.d.
Of course, the reader knows the preceding result in the case where $H$ is finite dimensional, i.e. a Euclidean space. $x_1$ is interpreted as the orthogonal projection of $x$ onto the subspace $M$, and therefore Corollary 2.2.4 is called the projection theorem.
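In the Euclidean case the decomposition $x = x_1 + x_2$ can be computed explicitly. The following is only a minimal sketch in $\mathbb{R}^3$, assuming the subspace $M$ is given by a spanning list of vectors; the helper names `dot` and `project` and the sample data are illustrative and not part of the text.

```python
def dot(u, v):
    """Euclidean scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def project(x, spanning_vectors):
    """Orthogonal projection x1 of x onto M = span(spanning_vectors).

    The spanning vectors are first orthogonalised by Gram-Schmidt,
    then x is expanded in the resulting orthogonal basis of M.
    """
    ortho = []
    for b in spanning_vectors:
        w = list(b)
        for e in ortho:
            c = dot(w, e) / dot(e, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        if dot(w, w) > 1e-12:          # skip linearly dependent vectors
            ortho.append(w)
    x1 = [0.0] * len(x)
    for e in ortho:
        c = dot(x, e) / dot(e, e)
        x1 = [p + c * ei for p, ei in zip(x1, e)]
    return x1

x = [1.0, 2.0, 3.0]
M = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # spans the plane {x_3 = 0}
x1 = project(x, M)                        # component in M
x2 = [a - b for a, b in zip(x, x1)]       # component in the orthogonal complement
```

Here `x2` is orthogonal to every spanning vector of `M`, which is exactly the defining property of $M^\perp$.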
The next result will be needed for Sections 4.2 and 4.3 when we estab-
lish the existence of minimizers for lower semicontinuous, convex func-
tionals.
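Theorem 2.2.4 (Mazur). Suppose $(x_n)_{n\in\mathbb{N}}$ converges weakly to $x$ in some Banach space $V$. For every $\varepsilon > 0$, we may then find a convex combination
$$\sum_{n=1}^{N} \lambda_n x_n \qquad \Bigl(\lambda_n \ge 0,\ \sum_{n=1}^{N} \lambda_n = 1\Bigr)$$
with
$$\Bigl\| x - \sum_{n=1}^{N} \lambda_n x_n \Bigr\| < \varepsilon. \tag{2.2.16}$$
Proof. We consider the set $C_0$ of all convex combinations of the $x_n$, i.e.
$$C_0 := \Bigl\{ \sum_{n=1}^{N} \lambda_n x_n \ \text{ with } \lambda_n \ge 0,\ \sum_{n=1}^{N} \lambda_n = 1 \Bigr\}.$$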
Replacing all $x_n$ by $x_n - x_1$ and $x$ by $x - x_1$, we may assume $0 \in C_0$. If (2.2.16) is not true, then there exists $\varepsilon > 0$ with
$$\|x - y\| \ge \varepsilon \quad\text{for all } y \in C_0. \tag{2.2.17}$$
$$C_1 := \Bigl\{ z \in V : \|z - y\| < \frac{\varepsilon}{2} \text{ for some } y \in C_0 \Bigr\}$$
is convex and contains the ball with radius $\frac{\varepsilon}{2}$ and center 0. We consider the Minkowski functional $p$ of $C_1$ defined by
$$p(z) := \inf\{\lambda > 0 : \lambda^{-1} z \in C_1\}.$$
$p$ is convex in the sense of Definition 2.1.5 since $C_1$ is convex, and continuous since $C_1$ contains the ball of radius $\frac{\varepsilon}{2} > 0$ about 0. Since, because of (2.2.17),
$$\|x - z\| > \frac{\varepsilon}{2} \quad\text{for every } z \in C_1,$$
we have
$$p(x) > 1.$$
More precisely, there exists $y_0$ with
$$x = \lambda^{-1} y_0, \qquad 0 < \lambda < 1, \qquad p(y_0) = 1.$$
We consider the linear subspace
$$V_0 := \{\mu y_0 : \mu \in \mathbb{R}\} \subset V$$
and the linear functional
$$f_0(\mu y_0) := \mu \quad\text{on } V_0.$$
Then
$$f_0 \le p \quad\text{on } V_0,$$
and by the Hahn-Banach Theorem 2.1.1, there exists an extension $f$ of $f_0$ to all of $V$ with
$$f \le p.$$
Since $p$ is continuous, $f$ is also continuous (see Lemma 2.2.1). We have
$$\sup_{y \in C_0} f(y) \le \sup_{y \in C_1} f(y) \le \sup_{y \in C_1} p(y) = 1 < \lambda^{-1} = f(\lambda^{-1} y_0) = f(x).$$
This, however, contradicts the fact that $(x_n)_{n\in\mathbb{N}} \subset C_0$ converges weakly to $x$. Thus, (2.2.17) cannot hold, and (2.2.16) is established.
q.e.d.
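A standard illustration (not taken from the text) is the sequence of unit vectors $e_n$ in $\ell^2$: it converges weakly to 0, yet $\|e_n\| = 1$ for all $n$, so Lemma 2.2.7 holds with strict inequality, while the convex combinations $\frac{1}{N}\sum_{n=1}^N e_n$ have norm $N^{-1/2}$ and therefore converge to 0 in norm, as Mazur's theorem predicts. The sketch below only carries out this norm computation in a finite truncation of $\ell^2$; the truncation dimension `dim` and the helper names are assumptions made for the illustration.

```python
import math

def norm(v):
    """Euclidean norm, standing in for the l^2 norm on a finite truncation."""
    return math.sqrt(sum(t * t for t in v))

def e(n, dim):
    """n-th unit vector (1-indexed) in R^dim."""
    return [1.0 if k == n - 1 else 0.0 for k in range(dim)]

dim = 400  # truncation dimension, chosen only for the illustration

def average(N):
    """Convex combination (1/N) * (e_1 + ... + e_N), all weights equal to 1/N."""
    vecs = [e(n, dim) for n in range(1, N + 1)]
    return [sum(v[k] for v in vecs) / N for k in range(dim)]

# Every e_n has norm 1, but the averages shrink like 1/sqrt(N).
norms = {N: norm(average(N)) for N in (1, 4, 100, 400)}
```

The dictionary `norms` lets one compare the values with $N^{-1/2}$ directly.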
2.3 Linear operators between Banach spaces
The results of this section will be used in Chapter 8. In Section 2.2, we considered linear functionals
$$f : V \to \mathbb{R};$$
in the beginning, $V$ was a normed linear space, with norm denoted by $\|\cdot\|$, and later, we also assumed that $V$ was complete, i.e. a Banach space. In the present section, we replace the target $\mathbb{R}$ by a general Banach space $W$, with norm also denoted by $\|\cdot\|$. We thus consider linear operators
$$T : V \to W,$$
and we put
$$\|T\| := \sup_{x \in V \setminus \{0\}} \frac{\|Tx\|}{\|x\|} \in \mathbb{R}^+ \cup \{\infty\}. \tag{2.3.1}$$
Lemma 2.3.1. The linear operator $T : V \to W$ is continuous if and only if $\|T\| < \infty$.
Proof. If $\|T\| < \infty$, then the inequality
$$\|Tx\| \le \|T\| \, \|x\| \tag{2.3.2}$$
implies that $T$ is continuous. (Of course, this uses the linearity of $T$.) Conversely, if $T$ is continuous, we recall the usual $\varepsilon$-$\delta$ criterion for continuity, and so for $\varepsilon = 1$, we find some $\delta > 0$ with the property that
$$\|Ty\| \le 1 \quad\text{if } \|y\| \le \delta.$$
For $x \in V \setminus \{0\}$, we then have with $y = \delta \frac{x}{\|x\|}$ (so that $\|y\| \le \delta$)
$$\|Tx\| = \frac{\|x\|}{\delta} \|Ty\| \le \frac{\|x\|}{\delta}.$$
Thus
$$\|T\| \le \frac{1}{\delta} < \infty.$$
q.e.d.
The space of continuous linear operators $T : V \to W$ between the normed spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$ is denoted by $L(V, W)$. It becomes a normed space with norm $\|T\|$.
Lemma 2.3.2. If $(W, \|\cdot\|)$ is a Banach space, then so is $(L(V, W), \|\cdot\|)$.
The proof is the same as the one of Lemma 2.2.2, simply replacing $(\mathbb{R}, |\cdot|)$ by $(W, \|\cdot\|)$.
Remark 2.3.1. Again, $(V, \|\cdot\|)$ need not be a Banach space here.
Lemma 2.3.3. Let $T \in L(V, W)$. Then
$$\ker T := \{x \in V : Tx = 0\}$$
is a closed linear subspace of $V$.
Proof. $\ker T = T^{-1}(0)$ is the pre-image of a closed set under a continuous map, hence closed.
q.e.d.
In the sequel, we shall encounter bijective continuous linear operators
$$T : V \to W$$
between Banach spaces. It is a general theorem in functional analysis, the inverse operator theorem, that the inverse of $T$, denoted by $T^{-1}$, is then continuous as well. Here, however, we do not want to prove that result, and we shall therefore frequently assume that $T^{-1}$ is continuous although that assumption is automatically fulfilled in the light of that theorem.
Lemma 2.3.4. Let
$$T : V \to W$$
be a bijective continuous linear map between Banach spaces, with a continuous inverse $T^{-1}$. If $S \in L(V, W)$ satisfies
$$\|T - S\| < \frac{1}{\|T^{-1}\|}, \tag{2.3.3}$$
then $S$ is bijective, and $S^{-1}$ is continuous, too.
Proof. We have
$$S = T\bigl(\mathrm{Id} - T^{-1}(T - S)\bigr).$$
As with the geometric series, the inverse of $S$ then is given by
$$\Bigl(\sum_{\nu=0}^{\infty} \bigl(T^{-1}(T - S)\bigr)^{\nu}\Bigr) T^{-1}, \tag{2.3.4}$$
provided that series converges. However,
$$\Bigl\| \sum_{\nu=m}^{n} \bigl(T^{-1}(T - S)\bigr)^{\nu} \Bigr\| \le \sum_{\nu=m}^{n} \bigl\| \bigl(T^{-1}(T - S)\bigr)^{\nu} \bigr\| \le \sum_{\nu=m}^{n} \|T^{-1}\|^{\nu} \, \|T - S\|^{\nu},$$
and since $\|T^{-1}\| \, \|T - S\| < 1$ by assumption, the series satisfies the Cauchy property and hence converges to a linear operator with finite norm.
q.e.d.
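The series (2.3.4) can be tried out in finite dimensions. The following is a minimal sketch for $2 \times 2$ matrices, assuming the max-norm on $\mathbb{R}^2$ (whose induced operator norm is the largest absolute row sum); the matrices `T` and `S`, the number of terms and all function names are illustrative choices, not part of the text.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def op_norm(A):
    """Operator norm induced by the max-norm on R^2: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def neumann_inverse(T_inv, T, S, terms=80):
    """Partial sum of (2.3.4): sum_{nu < terms} (T^{-1}(T - S))^nu T^{-1}."""
    D = [[T[i][j] - S[i][j] for j in range(2)] for i in range(2)]
    q = op_norm(T_inv) * op_norm(D)
    if q >= 1.0:
        raise ValueError("hypothesis (2.3.3) is not satisfied in this norm")
    B = matmul(T_inv, D)
    power, total = IDENTITY, [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        total = matadd(total, power)   # add (T^{-1}(T - S))^nu
        power = matmul(power, B)
    return matmul(total, T_inv)

T = [[2.0, 0.0], [0.0, 4.0]]
T_inv = [[0.5, 0.0], [0.0, 0.25]]
S = [[1.5, -0.3], [0.2, 3.6]]          # a perturbation of T
S_inv = neumann_inverse(T_inv, T, S)
check = matmul(S, S_inv)               # should be close to the identity
```

In this example $\|T^{-1}\| \, \|T - S\|$ is about 0.4, so the partial sums converge geometrically, in line with the Cauchy estimate in the proof.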
If $V$ is a vector space, we say that $V$ is the direct sum of the subspaces $V_1$, $V_2$,
$$V = V_1 \oplus V_2,$$
if for every $x \in V$, we can find unique elements $x_1 \in V_1$, $x_2 \in V_2$ with
$$x = x_1 + x_2.$$
We then also call $V_1$ and $V_2$ complementary subspaces of $V$. Easy linear algebra also shows that if $V_1$ possesses a complementary subspace of finite dimension, then the dimension of that space is uniquely determined, i.e. if $V_1 \oplus V_2 = V_1 \oplus V_2'$, then $\dim V_2 = \dim V_2'$.
We now consider a normed vector space $(V, \|\cdot\|)$. Then every finite dimensional subspace $V_0$ is complete, hence closed. We also have:
Lemma 2.3.5. Let $V_0 \subset V$ be a finite dimensional subspace of the normed vector space $(V, \|\cdot\|)$. Then $V_0$ possesses a closed complementary subspace $V_1$, i.e. $V = V_0 \oplus V_1$.
Proof. Let $e_1, \dots, e_n$ be a basis of $V_0$, and let $f_0^{\mu} : V_0 \to \mathbb{R}$ be the linear functionals with
$$f_0^{\mu}(e_\nu) = \delta_{\mu\nu} \qquad (\mu, \nu = 1, \dots, n).$$
By Corollary 2.1.1, we may find extensions $f^{\mu} : V \to \mathbb{R}$ with $f^{\mu}|_{V_0} = f_0^{\mu}$. We define $\pi : V \to V$ as
$$\pi(x) := \sum_{\mu=1}^{n} f^{\mu}(x) e_\mu.$$
$\pi$ is continuous, with $\pi(V) = V_0$.
$$V_1 := \ker \pi$$
then is closed as the kernel of a continuous linear operator (Lemma 2.3.3), and every $x \in V$ admits the unique decomposition
$$x = \pi(x) + (x - \pi(x))$$
with $\pi(x) \in V_0$, $x - \pi(x) \in V_1$, because $\pi \circ \pi = \pi$.
q.e.d.
Definition 2.3.1. Let $T : V \to W$ be a continuous linear operator between Banach spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$. $T$ is called a Fredholm operator if the following conditions hold:
(i) $V_0 := \ker T$ is finite dimensional. Consequently, according to Lemma 2.3.5, there exists a closed subspace $V_1$ of $V$ with
$$V = V_0 \oplus V_1. \tag{2.3.5}$$
(ii) There exists a finite dimensional subspace $W_0$ of $W$, called the cokernel of $T$ ($\operatorname{coker} T$), giving rise to a decomposition of $W$ into closed subspaces
$$W = W_0 \oplus W_1 \tag{2.3.6}$$
with
$$W_1 = T(V) =: R(T) \quad\text{(range of } T\text{)}.$$
Thus, $T$ yields a bijective continuous linear operator $T_1 : V_1 \to W_1$. We finally require
(iii) $T_1^{-1} : W_1 \to V_1$ is continuous.
For a Fredholm operator $T$, we call
$$\operatorname{ind} T := \dim V_0 - \dim W_0 \quad (= \dim \ker T - \dim \operatorname{coker} T)$$
the index of $T$. The set of all Fredholm operators $T : V \to W$ is denoted by $F(V, W)$.
Remark 2.3.2. Question to the reader: Why is $F(V, W)$ not a vector space?
Remark 2.3.3. As mentioned, condition (iii) is automatically satisfied as a consequence of the inverse operator theorem.
Remark 2.3.4. In our conventions, the cokernel of $T$ is only determined up to isomorphism, i.e. any $W_0$ satisfying (2.3.6) with $W_1 = T(V)$ is a cokernel. Usually, one defines the cokernel as the quotient space $W/W_1$, but here we do not want to introduce quotient spaces of Banach spaces.
Theorem 2.3.1. Let $V, W$ be Banach spaces. $F(V, W)$ is open in $L(V, W)$, and
$$\operatorname{ind} : F(V, W) \to \mathbb{Z}$$
is continuous, hence constant on each connected component of $F(V, W)$.
Proof. Let $T : V \to W$ be a Fredholm operator. We use the decompositions
$$V = V_0 \oplus V_1 \quad\text{with } V_0 = \ker T,$$
$$W = W_0 \oplus W_1 \quad\text{with } W_0 = \operatorname{coker} T$$
of Definition 2.3.1. For $S \in L(V, W)$, we define a continuous linear operator
$$S' : V_1 \times W_0 \to W, \qquad (x, z) \mapsto Sx + z,$$
and we obtain a continuous linear operator
$$L(V, W) \to L(V_1 \times W_0, W), \qquad S \mapsto S'.$$
Since $T_1 : V_1 \to W_1$ is bijective with a continuous inverse, $T'$ is also bijective with a continuous inverse, and by Lemma 2.3.4 this then also holds for all $S$ in some neighbourhood of $T$. For such $S$, $S'(V_1)$ is closed as $V_1$ is closed and $(S')^{-1}$ is continuous, and we have the decomposition
$$W = S'(V_1) \oplus S'(W_0),$$
and since $S'(V_1) = S(V_1)$ also
$$W = S(V_1) \oplus S'(W_0), \tag{2.3.7}$$
and since $W_0$ is finite dimensional, so is $S'(W_0)$. Then $S(V) \supset S(V_1)$ is also closed since $S(V_1)$ is closed and possesses a complementary subspace of finite dimension.
Finally, the dimension of the kernel of $S$ is upper semicontinuous.
Namely, if $S$ is in our above neighbourhood of $T$, then since $S'$ is bijective, $S$ is injective on $V_1$, and hence the kernel of $S$ is contained in some complementary subspace of $V_1$, and as observed above, the dimension of such a subspace equals the one of $V_0$. Thus
$$\dim \ker S \le \dim \ker T \tag{2.3.8}$$
if $S$ is in a suitable neighbourhood of $T$ in $L(V, W)$.
Altogether, we have verified that $S$ is a Fredholm operator if it is sufficiently close to $T$.
From the preceding, we see that there exist finite dimensional subspaces $V_0' = \ker S$ and $V_0''$ of $V$ with
$$V = V_0' \oplus V_0'' \oplus V_1$$
and thus
$$\dim V_0' + \dim V_0'' = \dim V_0 \qquad (V_0 = \ker T). \tag{2.3.9}$$
$S$ thus is injective on $V_0'' \oplus V_1$, and since $S$ coincides with $S'$ on $V_1$, we get a decomposition
$$W = S(V_1) \oplus S(V_0'') \oplus W_0',$$
with $W_0' = \operatorname{coker} S$, and from (2.3.7)
$$\dim S(V_0'') + \dim W_0' = \dim S'(W_0) = \dim W_0 \tag{2.3.10}$$
since $S'$ is bijective.
Consequently
$$\begin{aligned}
\operatorname{ind} S &= \dim \ker S - \dim \operatorname{coker} S \\
&= \dim V_0' - \dim W_0' \\
&= (\dim V_0 - \dim V_0'') - (\dim W_0 - \dim S(V_0'')) && \text{by (2.3.9), (2.3.10)} \\
&= \dim V_0 - \dim W_0 && \text{since } S \text{ is injective on } V_0'' \\
&= \operatorname{ind} T
\end{aligned}$$
for $S$ in some neighbourhood of $T$.
q.e.d.
The following result motivates the definition of a Fredholm operator:
Theorem 2.3.2 (Fredholm alternative). Let $V$ be a Banach space, $T : V \to V$ a Fredholm operator of index 0. We consider the equation
$$Tx = y. \tag{2.3.11}$$
Either
(i) $Tx = y$ is solvable for all $y$, and thus $T$ is surjective, hence also injective as $\operatorname{ind} T = 0$, and so the solution $x$ is uniquely determined by $y$,
or
(ii) $Tx = y$ is only solvable if $y$ is contained in some proper subspace of $V$ (with a finite dimensional complementary subspace), and for each such $y$, the solutions $x$ constitute a finite dimensional affine subspace.
Proof. A direct consequence of the definition.
q.e.d.
2.4 Calculus in Banach spaces
In this section, we collect some material that will only be used in Chap-
ters 8 and 9.
Definition 2.4.1. Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$ be Banach spaces, $F : V \to W$ a map. $F$ is called differentiable (in the sense of Fréchet) at $u \in V$ if there exists a bounded linear map
$$DF(u) : V \to W$$
with
$$\lim_{\substack{v \to 0 \\ v \neq 0}} \frac{\|F(u + v) - F(u) - DF(u)(v)\|_W}{\|v\|_V} = 0.$$
$F$ is called differentiable in $U \subset V$ if it is differentiable at every $u \in U$. $F$ is said to be of class $C^1$ if $DF(u)$ depends continuously on $u$. $F$ is said to be of class $C^2$ if $DF(u)$ is differentiable in $u$ and the derivative $D^2F(u) := D(DF)(u)$ depends continuously on $u$.
It is easy to show that a differentiable map is continuous.
We now wish to derive the implicit and inverse function theorems in
Banach spaces that will be used in Chapter 8. We shall need a technical
tool, the Banach fixed point theorem:
Lemma 2.4.1. Let $A$ be a closed subset of some Banach space $(V, \|\cdot\|)$. Let $0 < q < 1$, and suppose $G : A \to A$ satisfies
$$\|Gy_1 - Gy_2\| \le q \|y_1 - y_2\| \quad\text{for all } y_1, y_2 \in A. \tag{2.4.2}$$
Then there exists a unique $y \in A$ with
$$Gy = y. \tag{2.4.3}$$
If we have a continuous family $G(x)$ where all the $G(x)$ satisfy (2.4.2) (with $q$ not depending on $x$), then the solution $y = y(x)$ of (2.4.3) depends continuously on $x$.
Proof. We choose $y_0 \in A$ and put iteratively
$$y_n := G y_{n-1}.$$
We have
$$y_n = \sum_{i=1}^{n} (y_i - y_{i-1}) + y_0 = \sum_{i=1}^{n} \bigl(G^{i-1} y_1 - G^{i-1} y_0\bigr) + y_0. \tag{2.4.4}$$
We obtain from (2.4.2)
$$\sum_{i=1}^{n} \|G^{i-1} y_1 - G^{i-1} y_0\| \le \sum_{i=1}^{n} q^{i-1} \|y_1 - y_0\| \le \frac{1}{1 - q} \|y_1 - y_0\|.$$
Consequently, the series $y_n$ in (2.4.4) converges absolutely and uniformly to some $y \in A$, noting that $A$ is assumed to be closed, and the limit function $y = y(x)$ is continuous. We have
$$y = \lim_{n\to\infty} G y_n = G\Bigl(\lim_{n\to\infty} y_n\Bigr) = Gy,$$
hence (2.4.3). The uniqueness of a solution of (2.4.3) follows from (2.4.2), since $q < 1$.
q.e.d.
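The iteration $y_n := G y_{n-1}$ from the proof is directly implementable. Below is a minimal sketch for the real-valued example $G = \cos$ on the closed set $A = [0, 1]$, where $q = \sin 1 < 1$ is a contraction constant by the mean value theorem; this example, the stopping rule and the function name `fixed_point` are illustrative assumptions, not part of the text. The stopping rule uses the standard a posteriori bound $\|y_n - y\| \le \frac{q}{1-q} \|y_n - y_{n-1}\|$, which follows from (2.4.2) in the same way as the estimate above.

```python
import math

def fixed_point(G, y0, q, tol=1e-12, max_iter=10_000):
    """Iterate y_n = G(y_{n-1}) until the a posteriori bound
    q / (1 - q) * |y_n - y_{n-1}| drops below tol.

    Returns the approximate fixed point and the number of steps used.
    """
    y = y0
    for n in range(1, max_iter + 1):
        y_next = G(y)
        if q / (1.0 - q) * abs(y_next - y) <= tol:
            return y_next, n
        y = y_next
    raise RuntimeError("no convergence within max_iter steps")

# G = cos maps [0, 1] into itself, and |G'| <= sin(1) < 1 there.
y_star, steps = fixed_point(math.cos, 0.5, math.sin(1.0))
```

Starting from a different $y_0 \in [0, 1]$ yields the same limit, reflecting the uniqueness statement of the lemma.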
Theorem 2.4.1 (Implicit Function Theorem). Let $V_1, V_2, W$ be Banach spaces with all norms denoted by $\|\cdot\|$, $U \subset V_1 \times V_2$ open, $(x_0, y_0) \in U$, $F \in C^1(U, W)$, i.e. $F$ is continuously differentiable. For purposes of normalization solely, we assume
$$F(x_0, y_0) = 0. \tag{2.4.5}$$
We also suppose that
$$D_2F(x_0, y_0) : V_2 \to W,$$
the derivative of $F(x_0, \cdot) : V_2 \to W$ at $y = y_0$, is invertible. By our differentiability assumption, $D_2F(x_0, y_0)$ is continuous, and we assume that its inverse is likewise continuous. Then there exist open neighbourhoods $U_1$ of $x_0$, $U_2$ of $y_0$ with $U_1 \times U_2 \subset U$, and a differentiable map
$$\varphi : U_1 \to U_2$$
with
$$F(x, \varphi(x)) = 0 \tag{2.4.6}$$
and
$$D\varphi(x) = -\bigl(D_2F(x, \varphi(x))\bigr)^{-1} \circ D_1F(x, \varphi(x)) \quad\text{for all } x \in U_1 \tag{2.4.7}$$
($D_1F(\cdot, y) : V_1 \to W$ is the derivative of $F(\cdot, y) : V_1 \to W$). In fact, for every $x \in U_1$, $\varphi(x)$ is the only solution of (2.4.6) in $U_2$.
The content of the implicit function theorem is that the equation
$$F(x, y) = 0$$
can be solved locally uniquely for $y$ as a function of $x$, if the derivative of $F$ w.r.t. $y$ is continuously invertible.
Proof. The idea is to transform the problem into a fixed point problem for which the Banach fixed point theorem is applicable. We put
$$l := D_2F(x_0, y_0).$$
With this notation, our fixed point equation is
$$\Phi(x, y) := y - l^{-1}F(x, y) = y, \tag{2.4.8}$$
which clearly is equivalent to our original equation $F(x, y) = 0$. For every $x$, we thus want to find a fixed point of
$$y \mapsto \Phi(x, y).$$
Using $l^{-1} \circ l = \mathrm{id}$ (note that $l$ is invertible by assumption), we get
$$\Phi(x, y_1) - \Phi(x, y_2) = l^{-1}\bigl(D_2F(x_0, y_0)(y_1 - y_2) - (F(x, y_1) - F(x, y_2))\bigr).$$
In Lemma 2.4.1, we take $q = \frac{1}{2}$, and by the differentiability of $F$ at $(x_0, y_0)$ and the continuity of $l^{-1}$, we may find $\delta' > 0$, $\varepsilon > 0$ with the property that for
$$\|x - x_0\| \le \delta'$$
and
$$\|y_1 - y_0\| \le \varepsilon, \quad \|y_2 - y_0\| \le \varepsilon \quad (\text{hence also } \|y_1 - y_2\| \le 2\varepsilon),$$
we have
$$\|\Phi(x, y_1) - \Phi(x, y_2)\| \le \frac{1}{2} \|y_1 - y_2\|.$$
Furthermore, we may find $\delta'' > 0$ with the property that for
$$\|x - x_0\| \le \delta'',$$
we have
$$\|\Phi(x, y_0) - \Phi(x_0, y_0)\| < \frac{\varepsilon}{2}.$$
Since $\Phi(x_0, y_0) = y_0$ by assumption, we then have for $\|y - y_0\| \le \varepsilon$
$$\|\Phi(x, y) - y_0\| \le \|\Phi(x, y) - \Phi(x, y_0)\| + \|\Phi(x, y_0) - \Phi(x_0, y_0)\| < \frac{1}{2}\|y - y_0\| + \frac{\varepsilon}{2} \le \varepsilon$$
whenever $\|x - x_0\| \le \delta := \min(\delta', \delta'')$. This means that if $\|x - x_0\| \le \delta$, $\Phi(x, \cdot)$ maps the closed ball
$$A := \{y \in V_2 : \|y - y_0\| \le \varepsilon\}$$
into itself. By Lemma 2.4.1, for every $x$ with $\|x - x_0\| \le \delta$, there exists a unique $y =: \varphi(x)$ with $\|y - y_0\| \le \varepsilon$ and $y = \Phi(x, y)$, i.e. $F(x, y) = 0$. Moreover, $y$ depends continuously on $x$. We consider the open balls
$$U_1 := \{x : \|x - x_0\| < \delta\}, \qquad U_2 := \{y : \|y - y_0\| < \varepsilon\}.$$
($\Phi(x, \cdot)$ also maps the open ball $U_2$ into itself.) By choosing $\delta, \varepsilon > 0$ smaller, if necessary, we may assume
$$U_1 \times U_2 \subset U.$$
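It remains to show that $\varphi(x)$ is differentiable and that its derivative is given by (2.4.7). We consider
$$(x_1, \varphi(x_1)) \in U_1 \times U_2,$$
and abbreviate $y_1 := \varphi(x_1)$. We put
$$l_1 := D_1F(x_1, y_1), \qquad l_2 := D_2F(x_1, y_1).$$
(Since $D_2F$ is continuous, Lemma 2.3.4 shows that $l_2$ is invertible with a continuous inverse, after shrinking $U_1$ and $U_2$ if necessary.) Since $F$ is differentiable and $F(x_1, y_1) = 0$, we may write
$$F(x, y) = l_1(x - x_1) + l_2(y - y_1) + r(x, y)$$
where the remainder term satisfies
$$\lim_{\substack{x \to x_1 \\ y \to y_1}} \frac{\|r(x, y)\|}{\|x - x_1\| + \|y - y_1\|} = 0. \tag{2.4.9}$$
Since $F(x, \varphi(x)) = 0$ for $x \in U_1$ by construction of $\varphi$, we obtain
$$\varphi(x) = -l_2^{-1} l_1 (x - x_1) + y_1 - l_2^{-1} r(x, \varphi(x)). \tag{2.4.10}$$
By (2.4.9), we may find $\eta, \rho > 0$ such that for
$$\|x - x_1\| \le \eta, \qquad \|y - y_1\| \le \rho,$$
$$\|r(x, y)\| \le \frac{1}{2\|l_2^{-1}\|} \bigl(\|x - x_1\| + \|y - y_1\|\bigr).$$
Thus
$$\|r(x, \varphi(x))\| \le \frac{1}{2\|l_2^{-1}\|} \bigl(\|x - x_1\| + \|\varphi(x) - \varphi(x_1)\|\bigr). \tag{2.4.11}$$
By (2.4.10), (2.4.11),
$$\|\varphi(x) - \varphi(x_1)\| \le \|l_2^{-1} l_1\| \, \|x - x_1\| + \frac{1}{2} \|x - x_1\| + \frac{1}{2} \|\varphi(x) - \varphi(x_1)\|,$$
hence
$$\|\varphi(x) - \varphi(x_1)\| \le c \|x - x_1\| \quad\text{for a constant } c.$$
We abbreviate $r_0(x) := -l_2^{-1} r(x, \varphi(x))$ and rewrite (2.4.10) as
$$\varphi(x) - \varphi(x_1) = -l_2^{-1} l_1 (x - x_1) + r_0(x), \tag{2.4.12}$$
with
$$\lim_{x \to x_1} \frac{\|r_0(x)\|}{\|x - x_1\|} = 0 \quad\text{from (2.4.9)}. \tag{2.4.13}$$
(2.4.12) and (2.4.13) yield the differentiability of $\varphi$ and (2.4.7).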
q.e.d.
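The fixed point map $\Phi(x, y) = y - l^{-1}F(x, y)$ of the proof can be iterated numerically. The following is a minimal sketch for the scalar example $F(x, y) = x^2 + y^2 - 1$ near $(x_0, y_0) = (0, 1)$, where $l = D_2F(x_0, y_0) = 2$; the example, the fixed iteration count and the names `F`, `phi` are assumptions chosen for illustration, and the exact solution $\varphi(x) = \sqrt{1 - x^2}$ is only used for comparison.

```python
import math

def F(x, y):
    """Sample map F(x, y) = x^2 + y^2 - 1, with F(0, 1) = 0."""
    return x * x + y * y - 1.0

X0, Y0 = 0.0, 1.0
L_INV = 1.0 / (2.0 * Y0)   # inverse of l = D_2 F(x0, y0) = 2 * y0

def phi(x, iterations=100):
    """Solve F(x, y) = 0 for y near Y0 by iterating y -> y - l^{-1} F(x, y)."""
    y = Y0
    for _ in range(iterations):
        y = y - L_INV * F(x, y)
    return y

x = 0.3
y = phi(x)

# Formula (2.4.7) in this example: D phi(x) = -(2 y)^{-1} * (2 x) = -x / phi(x).
derivative_formula = -x / y
h = 1e-6
derivative_numeric = (phi(x + h) - phi(x - h)) / (2.0 * h)
```

Note that the iteration uses the fixed operator $l^{-1}$ at $(x_0, y_0)$ throughout, as in the proof, rather than re-inverting the derivative at every step as Newton's method would.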
Corollary 2.4.1 (Inverse Function Theorem). Let $V, W$ be Banach spaces, $U \subset V$ open, $y_0 \in U$. Let $f : U \to W$ be continuously differentiable, and assume that the derivative $Df(y_0)$ is invertible with a continuous inverse. Then there exist open neighbourhoods $U_2 \subset U$ of $y_0$, $U_1$ of $f(y_0) =: x_0$ so that $f$ maps $U_2$ bijectively onto $U_1$, and the inverse $\varphi := f^{-1} : U_1 \to U_2$ is differentiable with
$$D\varphi(x_0) = (Df(y_0))^{-1}. \tag{2.4.14}$$
Proof. We shall apply Theorem 2.4.1 to $F(x, y) := f(y) - x$, and find an open neighbourhood $U_1$ of $x_0$ and a differentiable function
$$\varphi : U_1 \to V$$
with $\varphi(U_1) \subset U_2$ for a neighbourhood $U_2$ of $y_0$, with $\varphi(x_0) = y_0$ and
$$F(x, \varphi(x)) = 0, \quad\text{i.e. } f(\varphi(x)) = x \text{ for } x \in U_1.$$
As $\varphi(U_1) = f^{-1}(U_1)$ is open, we may redefine $U_2$ as $\varphi(U_1)$, and $\varphi$ then yields a bijection between $U_1$ and $U_2$. As $f(\varphi(x)) = x$, the chain rule implies
$$Df(\varphi(x_0)) \circ D\varphi(x_0) = \mathrm{id}, \quad\text{i.e. (2.4.14)}.$$
q.e.d.
The next topic concerns ordinary differential equations in Banach spaces. In Chapter 9, we shall use the Picard-Lindelöf theorem in a Banach space that we shall now derive.
We need the integral of a continuous function
$$x : I \to V$$
from some interval $I = [a, b] \subset \mathbb{R}$ into some Banach space $V$,
$$\int_a^b x(t) \, dt.$$
This can be defined as a Riemann integral as in the case of real-valued functions through approximation by step functions.
Given a continuous
$$\Phi : I \times V \to V,$$
we say that $x(t)$ solves the ODE (ordinary differential equation) on $I$,
$$\frac{d}{dt} x(t) = \dot{x}(t) = \Phi(t, x(t)) \quad\text{with } x(a) = x_0 \tag{2.4.15}$$
if for all $t \in I$
$$x(t) = x_0 + \int_a^t \Phi(\tau, x(\tau)) \, d\tau. \tag{2.4.16}$$
Theorem 2.4.2 (Picard-Lindelöf). Suppose that $\Phi$ is uniformly Lipschitz continuous, i.e. suppose there exists some $L < \infty$ with
$$\|\Phi(t_1, x_1) - \Phi(t_2, x_2)\| \le L \bigl(|t_1 - t_2| + \|x_1 - x_2\|\bigr) \quad\text{for all } t_1, t_2 \in I,\ x_1, x_2 \in V. \tag{2.4.17}$$
Then for any $x_0 \in V$, there exists a unique solution of (2.4.15).
Proof. We shall solve (2.4.16) with the help of Lemma 2.4.1. For a continuous $y : I \to V$, we define $Gy \in C(I, V)$,
$$(Gy)(t) := x_0 + \int_a^t \Phi(\tau, y(\tau)) \, d\tau.$$
We note that $C(I, V)$, the space of continuous functions from $I$ with values in $V$, is a Banach space w.r.t. the norm
$$\|y\|_{C^0} := \sup_{t \in I} \|y(t)\|.$$
(To verify this, one just needs to observe that any sequence $(y_n)_{n\in\mathbb{N}} \subset C(I, V)$ with
$$\lim_{n,m\to\infty} \|y_n - y_m\|_{C^0} \Bigl( = \lim_{n,m\to\infty} \sup_{t \in I} \|y_n(t) - y_m(t)\| \Bigr) = 0$$
converges uniformly to some continuous function $y : I \to V$.)
We have
$$\|Gy_1 - Gy_2\|_{C^0} = \sup_{t \in I} \Bigl\| \int_a^t \bigl(\Phi(\tau, y_1(\tau)) - \Phi(\tau, y_2(\tau))\bigr) \, d\tau \Bigr\| \le \sup_{t \in I} |t - a| \, L \, \|y_1 - y_2\|_{C^0}$$
because of (2.4.17).
We choose $\varepsilon > 0$ so small that
$$\varepsilon L \le \frac{1}{2}.$$
Lemma 2.4.1 with $V$ replaced by $C([a, a + \varepsilon], V)$ and with $q = \frac{1}{2}$ then implies that there exists a unique $y \in C([a, a + \varepsilon], V)$ with
$$y(t) = Gy(t) = x_0 + \int_a^t \Phi(\tau, y(\tau)) \, d\tau \quad\text{for } a \le t \le a + \varepsilon.$$
Repeating the construction with $a + \varepsilon$ in place of $a$ and $y(a + \varepsilon)$ in place of $x_0$ yields the solution on $[a, a + 2\varepsilon]$, and so on.
q.e.d.
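The operator $G$ of the proof can be iterated explicitly in simple cases. The sketch below treats the scalar example $\dot{x} = x$, $x(0) = 1$ (so $\Phi(t, x) = x$, $L = 1$, $a = 0$, and $\varepsilon = \frac{1}{2}$ is admissible); polynomials are stored as coefficient lists so that the integral in $G$ is computed exactly. The example and the names `picard_step`, `picard`, `evaluate` are illustrative assumptions and not part of the text.

```python
import math
from fractions import Fraction

def picard_step(coeffs, x0):
    """Apply (Gy)(t) = x0 + integral_0^t y(tau) d tau to a polynomial y.

    A polynomial sum_k c_k t^k is represented as the list [c_0, c_1, ...].
    """
    integrated = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integrated[0] = Fraction(x0)      # constant term is the initial value
    return integrated

def picard(x0, steps):
    """Start from the constant function y_0 = x0 and apply G `steps` times."""
    y = [Fraction(x0)]
    for _ in range(steps):
        y = picard_step(y, x0)
    return y

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * Fraction(t) ** k for k, c in enumerate(coeffs))

iterate_5 = picard(1, 5)                                   # coefficients of y_5
value = float(evaluate(picard(1, 15), Fraction(1, 2)))     # y_15 at t = 1/2
```

In this example the Picard iterates are the Taylor partial sums of $e^t$, so `value` can be compared with `math.exp(0.5)`.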
Remark 2.4.1. If $I$ is an infinite or semi-infinite interval, e.g. $I = [a, \infty)$, and if (2.4.17) holds on $I$, we obtain a solution of (2.4.15) on $I$, since Theorem 2.4.2 yields a solution on every interval $[a, b]$ with $b < \infty$.
Corollary 2.4.2. Let the assumptions of Theorem 2.4.2 be satisfied on the interval $I = [0, \infty)$, and suppose that $\Phi$ does not depend explicitly on $t$, i.e. $\Phi : V \to V$, $\Phi = \Phi(x)$. For $x_0 \in V$ we thus consider the ODE
$$\dot{x}(t) = \Phi(x(t)), \qquad x(0) = x_0. \tag{2.4.18}$$
($x(0)$, the value at 'time' 0, is called the initial value.) We denote the solution by $x(x_0, t)$. Then for $s, t \ge 0$,
$$x(x_0, t + s) = x(x(x_0, t), s) \quad\text{(semigroup property)}.$$
Thus, the solution with initial value $x_0$ at 'time' $t + s$ is the same as the solution with initial value $x(x_0, t)$ computed at 'time' $s$.
Proof. This follows from the uniqueness statement in Theorem 2.4.2, as both sides, considered as functions of $s$, solve the ODE in (2.4.18) with initial value $x(x_0, t)$.
q.e.d.
Exercises
2.1 Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$ be normed linear spaces. For a linear map
$$f : V \to W,$$
put
$$\|f\| := \sup_{x \in V \setminus \{0\}} \frac{\|f(x)\|_W}{\|x\|_V}.$$
Show that $f$ is continuous iff $\|f\| < \infty$. Let $L(V, W) := \{f : V \to W \text{ linear with } \|f\| < \infty\}$. Show that if $(W, \|\cdot\|_W)$ is a Banach space then so is $(L(V, W), \|\cdot\|)$.
2.2 Show that a normed space $(V, \|\cdot\|)$ is uniformly convex if the following condition holds:
Whenever $(x_n)_{n\in\mathbb{N}}, (y_n)_{n\in\mathbb{N}} \subset V$ satisfy
$$\limsup_{n\to\infty} \|x_n\| \le 1, \qquad \limsup_{n\to\infty} \|y_n\| \le 1$$
and
$$\lim_{n\to\infty} \|x_n + y_n\| = 2,$$
then
$$\lim_{n\to\infty} (x_n - y_n) = 0.$$
2.3 A normed space $(V, \|\cdot\|)$ is called strictly normed if the following condition holds: Whenever $x, y \in V$, $x, y \neq 0$ satisfy
$$\|x + y\| = \|x\| + \|y\|,$$
then there exists $\alpha > 0$ with
$$x = \alpha y.$$
Show that any uniformly convex normed space is strictly normed.
2.4 Does the Banach fixed point theorem (Lemma 2.4.1) continue to hold if we replace (2.4.2) by the condition
$$\|Gy_1 - Gy_2\| < \|y_1 - y_2\| \quad\text{for all } y_1, y_2 \in A,\ y_1 \neq y_2?$$
3 $L^p$ and Sobolev spaces
3.1 $L^p$ spaces
In the sequel, instead of functions $f : A \to \mathbb{R} \cup \{\pm\infty\}$ ($A$ measurable), we shall consider equivalence classes of functions, where $f$ and $g$ are equivalent if $f(x) = g(x)$ for almost all $x \in A$. We shall be lax with the notation, however, not distinguishing between a function and its equivalence class. The equivalence class of the zero function is called the null class, and a function in that class is called a null function.
Definition 3.1.1. Let $A \subset \mathbb{R}^d$ be measurable, $p \in \mathbb{R} \setminus \{0\}$.
$$L^p(A) := \{(\text{equivalence classes of}) \text{ measurable functions } f : A \to \mathbb{R} \cup \{\pm\infty\} \text{ with } |f(x)|^p \in L^1(A)\}.$$
For $f \in L^p(A)$, we put
$$\|f\|_p := \|f\|_{L^p(A)} := \Bigl(\int_A |f(x)|^p \, dx\Bigr)^{\frac{1}{p}}. \tag{3.1.1}$$
The notation suggests that $\|\cdot\|_p$ is a norm, and we now proceed to verify this for $p \ge 1$. First of all,
$$\|f\|_p = 0 \iff f \text{ is a null function}. \tag{3.1.2}$$
Thus, $\|\cdot\|_p$ is positive definite (on the set of equivalence classes). Next, for $c \in \mathbb{R}$,
$$\|cf\|_p = |c| \, \|f\|_p. \tag{3.1.3}$$
It remains to verify the triangle inequality. This is obvious for $p = 1$:
$$\|f_1 + f_2\|_{L^1(A)} \le \|f_1\|_{L^1(A)} + \|f_2\|_{L^1(A)}. \tag{3.1.4}$$
For $p > 1$, we need
Lemma 3.1.1 (Hölder's inequality). Let $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, $f_1 \in L^p(A)$, $f_2 \in L^q(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_p \, \|f_2\|_q. \tag{3.1.5}$$
Proof. By homogeneity, we may assume w.l.o.g.
$$\|f_1\|_p = 1, \qquad \|f_2\|_q = 1. \tag{3.1.6}$$
Recalling Young's inequality, namely
$$ab \le \frac{a^p}{p} + \frac{b^q}{q} \quad\text{for } a, b \ge 0,\ p, q > 1,\ \frac{1}{p} + \frac{1}{q} = 1, \tag{3.1.7}$$
we have for $x \in A$
$$|f_1(x) f_2(x)| \le \frac{|f_1(x)|^p}{p} + \frac{|f_2(x)|^q}{q},$$
hence by our normalization (3.1.6)
$$\int_A |f_1(x) f_2(x)| \, dx \le \frac{1}{p} + \frac{1}{q} = 1 = \|f_1\|_p \, \|f_2\|_q.$$
We now obtain the triangle inequality:
Lemma 3.1.2 ( Mink owsk i' s inequality) . Let / 1 , f
2

P
0A), p > 1.
T/ien
ll/i + / 2 | |
p
< | | / i | |
p
+ | | / 2| |
p
. (3.1.8)
Proof The case p = 1 is given by (3.1.4). We now consider p > 1 and
put q := ^ (so that \ + = 1). For </>(*) := l/i(*) + /
2
( z ) r \ we
have
^ = i/i+/
2
r,
i.e. V L
q
{A). Since
l/iCx) + /
2
( x) |
p
< \h(x)*{x)\ + \MxMx)\,
we get
| | / i + /
2
| |
P
' <| | / i V<| |
1
+ ll/2</'||
1
<H/l||plMI, + ll/2|lplMI,
by Holder's inequality
= ( ll/l||
p
+ l l /
2
| l
P
) l l / l +/ 2| | lp
3.1 LP spaces 161
Since p - 2 = 1, (3.1.8) follows.
q.e.d.
We have thus verified that $\|\cdot\|_p$ is a norm on $L^p(A)$. In fact, we have:
Theorem 3.1.1 (Riesz-Fischer). Let $A$ be measurable, $p \ge 1$. Then $L^p(A)$ is a Banach space.
Proof. Let $(f_n)_{n\in\mathbb{N}} \subset L^p(A)$ be a Cauchy sequence. For every $\nu \in \mathbb{N}$, we may then find $n_\nu \in \mathbb{N}$ with
$$\|f_n - f_{n_\nu}\|_p < 2^{-\nu} \quad\text{for all } n \ge n_\nu.$$
This implies that the series
$$\|f_{n_1}\|_p + \sum_{\nu=1}^{\infty} \|f_{n_{\nu+1}} - f_{n_\nu}\|_p \tag{3.1.9}$$
converges. We claim that the series
$$f_{n_1} + \sum_{\nu=1}^{\infty} \bigl(f_{n_{\nu+1}} - f_{n_\nu}\bigr) \tag{3.1.10}$$
then converges in $L^p(A)$. To see this, we consider the partial sums
$$g_m := \sum_{\nu=1}^{m} |f_{n_{\nu+1}} - f_{n_\nu}|.$$
Since all elements of this series are nonnegative, $(g_m)_{m\in\mathbb{N}}$ converges to some $g : A \to \mathbb{R}^+ \cup \{\infty\}$ pointwise in $A$, and Corollary 1.2.1 implies that $(g_m)$ also converges to $g$ in $L^p(A)$. In particular, $g(x) < \infty$ for almost all $x \in A$. Thus, our original series (3.1.10) is absolutely convergent for almost all $x \in A$, towards some $f$ with $|f| \le g + |f_{n_1}|$; in particular $f \in L^p(A)$. We interrupt the proof to record:
Lemma 3.1.3. Let $(f_n)_{n\in\mathbb{N}}$ converge to $f$ in $L^p(A)$. Then some subsequence converges pointwise almost everywhere to $f$.
In order to complete the proofs of Lemma 3.1.3 and Theorem 3.1.1, it remains to show that the series (3.1.10) converges to $f$ in $L^p(A)$. (Then a subsequence of $(f_n)$ converges to $f$ in $L^p(A)$. Since $(f_n)$ was assumed to be a Cauchy sequence in $L^p(A)$, the whole sequence has to converge in $L^p(A)$. It is in general not true, however, that the whole sequence also converges pointwise almost everywhere to $f$.)
This is easy: as $m \to \infty$,
$$f_{n_1}(x) + \sum_{\nu=1}^{m} \bigl(f_{n_{\nu+1}}(x) - f_{n_\nu}(x)\bigr) - f(x)$$
converges to 0 almost everywhere in $A$, and since
$$\Bigl| f_{n_1}(x) + \sum_{\nu=1}^{m} \bigl(f_{n_{\nu+1}}(x) - f_{n_\nu}(x)\bigr) - f(x) \Bigr| \le 2|g(x)| + 2|f_{n_1}(x)|,$$
we may apply Lebesgue's Theorem 1.2.3 on dominated convergence to conclude that we get convergence also w.r.t. $\|\cdot\|_p$.
q.e.d.
Corollary 3.1.1. $L^2(A)$ is a Hilbert space with scalar product
$$(f_1, f_2) := \int_A f_1(x) f_2(x) \, dx.$$
Proof. It follows from Hölder's inequality (Lemma 3.1.1) that
$$|(f_1, f_2)| \le \|f_1\|_2 \, \|f_2\|_2.$$
Thus $(\cdot, \cdot)$ is finite on $L^2(A) \times L^2(A)$. All the other properties are obvious or follow from Theorem 3.1.1.
q.e.d.
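Definition 3.1.2. Let $A \subset \mathbb{R}^d$ be measurable, $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable.
$$\operatorname*{ess\,sup}_{x \in A} f(x) := \inf\{\lambda \in \mathbb{R} \mid f(x) \le \lambda \text{ for almost all } x \in A\}$$
(essential supremum), and
$$L^\infty(A) := \{(\text{equivalence classes of}) \text{ measurable functions } f : A \to \mathbb{R} \cup \{\pm\infty\} \text{ with } \|f\|_\infty := \|f\|_{L^\infty(A)} := \operatorname*{ess\,sup}_{x \in A} |f(x)| < \infty\}.$$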
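Theorem 3.1.2. $L^\infty(A)$ is a Banach space.
Proof. It is straightforward to verify that $\|\cdot\|_\infty$ is a norm. It remains to show completeness. Thus, let $(f_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $L^\infty(A)$. For $\nu \in \mathbb{N}$, we find $n_\nu \in \mathbb{N}$ such that for $m, n \ge n_\nu$
$$\|f_n - f_m\|_\infty < \frac{1}{\nu}.$$
Thus
$$\Bigl\{ x \in A \ \Big|\ |f_n(x) - f_m(x)| \ge \frac{1}{\nu} \Bigr\}$$
is a null set for $m, n \ge n_\nu$, and so then is
$$N := \bigcup_{\nu \in \mathbb{N}} \ \bigcup_{m, n \ge n_\nu} \Bigl\{ x \in A \ \Big|\ |f_n(x) - f_m(x)| \ge \frac{1}{\nu} \Bigr\}$$
as the countable union of null sets. Since
$$|f_n(x) - f_m(x)| < \frac{1}{\nu}$$
for $m, n \ge n_\nu$ and $x \in A \setminus N$, $f_n$ converges uniformly on $A \setminus N$ towards some $f$. We simply put $f(x) = 0$ for $x \in N$. Then for $n \ge n_\nu$
$$\operatorname*{ess\,sup}_{x \in A} |f_n(x) - f(x)| = \operatorname*{ess\,sup}_{x \in A \setminus N} |f_n(x) - f(x)| \le \frac{1}{\nu},$$
since the essential supremum is not affected by null sets, and $f_n$ converges to $f$ in $L^\infty(A)$.
q.e.d.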
We also note that Hölder's inequality admits the following extension to the case $p = 1$, $q = \infty$:
Lemma 3.1.4. Let $f_1 \in L^1(A)$, $f_2 \in L^\infty(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_1 \, \|f_2\|_\infty. \tag{3.1.11}$$
Proof.
$$\int_A |f_1(x) f_2(x)| \, dx \le \operatorname*{ess\,sup}_{x \in A} |f_2(x)| \int_A |f_1(x)| \, dx = \|f_2\|_\infty \, \|f_1\|_1.$$
q.e.d.
Theorem 3.1.3. Let $A \subset \mathbb{R}^d$ be measurable. Let $1 < p < \infty$, $q = \frac{p}{p-1}$, i.e. $\frac{1}{p} + \frac{1}{q} = 1$. Then $L^q(A)$ is the dual space of $L^p(A)$. In particular, $L^p(A)$ is reflexive.
Remark 3.1.1. The dual space of $L^1(A)$ is given by $L^\infty(A)$ while the dual space of $L^\infty(A)$ is larger than $L^1(A)$. Therefore, neither $L^1(A)$ nor $L^\infty(A)$ is reflexive.
In order to prepare the proof of Theorem 3.1.3, we first derive:
Theorem 3.1.4 (Clarkson). Let $A \subset \mathbb{R}^d$ be measurable, $2 \le q < \infty$. Then $L^q(A)$ is uniformly convex.
Remark 3.1.2. Clarkson's theorem holds more generally for $1 < q < \infty$. The proof for $1 < q < 2$ is a little more complicated than the one for $2 \le q < \infty$.
The proof of Theorem 3.1.4 is based on:
Lemma 3.1.5. Let $2 \le q < \infty$, $f, g \in L^q(A)$. Then
$$\|f + g\|_q^q + \|f - g\|_q^q \le 2^{q-1} \bigl(\|f\|_q^q + \|g\|_q^q\bigr). \tag{3.1.12}$$
Proof. For $x, y \ge 0$, we have
$$(x^q + y^q)^{\frac{1}{q}} \le (x^2 + y^2)^{\frac{1}{2}} \le 2^{\frac{q-2}{2q}} (x^q + y^q)^{\frac{1}{q}}. \tag{3.1.13}$$
(In order to verify the left inequality in (3.1.13), we may assume w.l.o.g. $x^2 + y^2 = 1$. Then $x^q \le x^2$, $y^q \le y^2$ since $q \ge 2$, and the desired inequality easily follows. The right inequality follows for example from Hölder's inequality (Lemma 3.1.1) applied to the following functions $f_1, f_2 : (-1, 1) \to \mathbb{R}$:
$$f_1 \equiv 1, \qquad f_2(t) = \begin{cases} x^2 & \text{for } -1 < t \le 0 \\ y^2 & \text{for } 0 < t < 1. \end{cases})$$
The left hand side of (3.1.13) implies
$$\bigl(|a + b|^q + |a - b|^q\bigr)^{\frac{1}{q}} \le \bigl(|a + b|^2 + |a - b|^2\bigr)^{\frac{1}{2}} = \sqrt{2} \, (a^2 + b^2)^{\frac{1}{2}} \tag{3.1.14}$$
for $a, b \in \mathbb{R}$, and by the right hand side of (3.1.13), we have
$$\sqrt{2} \, (a^2 + b^2)^{\frac{1}{2}} \le 2^{\frac{q-1}{q}} \bigl(|a|^q + |b|^q\bigr)^{\frac{1}{q}}. \tag{3.1.15}$$
Equations (3.1.14) and (3.1.15) imply
$$|f(x) + g(x)|^q + |f(x) - g(x)|^q \le 2^{q-1} \bigl(|f(x)|^q + |g(x)|^q\bigr), \tag{3.1.16}$$
and (3.1.12) follows by integrating (3.1.16).
q.e.d.
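The pointwise inequality (3.1.16) can be checked numerically on a grid of real numbers $a, b$. The sketch below does this for a few exponents $q \ge 2$ and also evaluates one pair with $q = 1.5$, where the same expression becomes negative, which is consistent with the restriction $q \ge 2$ in the lemma. The grid, the exponents and the name `clarkson_gap` are illustrative assumptions.

```python
def clarkson_gap(a, b, q):
    """Right-hand side minus left-hand side of (3.1.16) for real a, b:
    2^(q-1) (|a|^q + |b|^q) - (|a + b|^q + |a - b|^q)."""
    lhs = abs(a + b) ** q + abs(a - b) ** q
    rhs = 2.0 ** (q - 1.0) * (abs(a) ** q + abs(b) ** q)
    return rhs - lhs

grid = [k / 4.0 for k in range(-12, 13)]           # values from -3 to 3
smallest_gap = min(clarkson_gap(a, b, q)
                   for q in (2.0, 3.0, 5.0)
                   for a in grid for b in grid)

# For q < 2 the inequality can fail, e.g. a = 1, b = 0 and q = 1.5.
gap_below_two = clarkson_gap(1.0, 0.0, 1.5)
```

For $q = 2$ the gap vanishes identically (parallelogram identity), so `smallest_gap` is zero up to rounding.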
Proof (Theorem 3.1.4). Let $f, g \in L^q(A)$ with
$$\|f\|_q = \|g\|_q = 1.$$
By (3.1.12),
$$\|f + g\|_q^q + \|f - g\|_q^q \le 2^q.$$
Therefore, for $\varepsilon > 0$, we may find $\delta > 0$ such that
$$\|f - g\|_q < \varepsilon$$
whenever $\bigl\|\frac{1}{2}(f + g)\bigr\|_q > 1 - \delta$. This shows uniform convexity.
q.e.d.
Proof (Theorem 3.1.3). We consider the map
\[
i : L^p(A) \to L^q(A)^*
\]
with
\[
i(f)(g) := \int_A f(x) g(x)\,dx.
\]
By Hölder's inequality (Lemma 3.1.1)
\[
\|i(f)\| = \sup_{g \in L^q(A),\ \|g\|_q \le 1} |i(f)(g)| \le \|f\|_p. \tag{3.1.17}
\]
Thus $i(f)$ is indeed an element of $L^q(A)^*$. We claim that we have equality in (3.1.17). This means that there exists some $g \in L^q(A)$ with
\[
\left|\int_A f(x) g(x)\,dx\right| = \|f\|_p \|g\|_q. \tag{3.1.18}
\]
We put $g(x) := \operatorname{sign} f(x)\,|f(x)|^{p-1}$. Then $|g|^q = |f|^p$, hence $g \in L^q(A)$, and
\[
\left|\int_A f(x) g(x)\,dx\right| = \int_A |f(x) g(x)|\,dx = \int_A |f(x)|^p\,dx
= \left(\int_A |f(x)|^p\,dx\right)^{1/p}\left(\int_A |f(x)|^p\,dx\right)^{1/q} = \|f\|_p \|g\|_q.
\]
This verifies (3.1.18), hence equality in (3.1.17). Equality in (3.1.17) implies that $i$ is an isometry, in particular injective. In order to complete the proof we need to show that $i$ is surjective. Suppose on the contrary that
\[
L^q(A)^* \setminus i(L^p(A)) \ne \emptyset.
\]
Since $L^p(A)$ is complete and $i$ is an isometry, $i(L^p(A))$ is complete, hence closed. By the Hahn-Banach theorem (Corollary 2.1.1), there then exists $v \in L^q(A)^{**}$, $v \ne 0$, with
\[
v|_{i(L^p(A))} = 0.
\]
We now suppose for a moment that $1 < p \le 2$. Then $2 \le q < \infty$, and $L^q(A)$ is reflexive by Theorems 3.1.4 and 2.2.3. We may therefore find a $g$ in $L^q(A)$ with
\[
F(g) = v(F) \quad \text{for all } F \in L^q(A)^*.
\]
We then have for any $\varphi \in L^p(A)$
\[
0 = v(i(\varphi)) = i(\varphi)(g) = \int_A \varphi(x) g(x)\,dx,
\]
hence $g = 0$ (by a reasoning as in the derivation of (3.1.18)), hence also $v = 0$, a contradiction. We have shown that $i$ furnishes an isomorphism between $L^p(A)$ and $L^q(A)^*$. Since $L^q(A)$ is reflexive, so is $L^q(A)^*$ by Corollary 2.2.2, hence $L^p(A)$. In conclusion, $L^p(A)$ has to be reflexive for any $1 < p < \infty$, and its dual space is given by $L^q(A)$.
q.e.d.
3.2 Approximation of $L^p$ functions by smooth functions (mollification)
In this section, we shall smooth out $L^p$ functions by integrating them against smooth kernels. As these kernels approach the Dirac distribution, these regularizations will tend towards the original function. For that purpose, we need some $\varrho \in C_0^\infty(\mathbb{R}^d)$† with
\[
\varrho(x) \ge 0 \quad \text{for all } x \in \mathbb{R}^d, \tag{3.2.1}
\]
\[
\varrho(x) = 0 \quad \text{for } |x| \ge 1, \tag{3.2.2}
\]
\[
\int_{\mathbb{R}^d} \varrho(x)\,dx \left(= \int_{B(0,1)} \varrho(x)\,dx\right) = 1. \tag{3.2.3}
\]
† For $\Omega \subset \mathbb{R}^d$ open, $C_0^\infty(\Omega)$ is the space of all $C^\infty$ functions $\varphi$ on $\Omega$ for which the closure of $\{x \in \Omega \mid \varphi(x) \ne 0\}$, the support of $\varphi$ ($\operatorname{supp} \varphi$), is a compact subset of $\Omega$. Elements of $C_0^\infty(\Omega)$ are often called test functions.
Such a $\varrho$ is called a Friedrichs mollifier. In this section, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. Let $f \in L^1(\Omega)$. We extend $f$ to all of $\mathbb{R}^d$ by putting $f(x) = 0$ for $x \in \mathbb{R}^d \setminus \Omega$. Let $h > 0$.
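\[
f_h(x) := \frac{1}{h^d} \int_{\mathbb{R}^d} \varrho\!\left(\frac{x-y}{h}\right) f(y)\,dy. \tag{3.2.4}
\]
$f_h$ is called the mollification of $f$ with parameter $h$. In order to appreciate this definition, we first observe
\[
\operatorname{supp} \varrho\!\left(\frac{x-y}{h}\right) \subset B(y,h) := \{z \in \mathbb{R}^d \mid |z-y| \le h\}, \tag{3.2.5}
\]
where $\varrho\!\left(\frac{x-y}{h}\right)$ is considered as a function of $x$, and
\[
\frac{1}{h^d} \int_{\mathbb{R}^d} \varrho\!\left(\frac{x-y}{h}\right) dx = 1. \tag{3.2.6}
\]
For these reasons, one expects that $f_h$ tends towards $f$ as $h$ tends to 0. It remains to clarify the type of convergence, however. The advantage of approximating $f$ by $f_h$ comes from: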
Lemma 3.2.1. Let $\Omega' \subset\subset \Omega$†, $h < \operatorname{dist}(\Omega', \partial\Omega)$. Then
\[
f_h \in C^\infty(\Omega').
\]
Proof. By Corollary 1.2.2, we may differentiate w.r.t. $x$ under the integral sign in (3.2.4), and since $\varrho \in C^\infty$, so then is $f_h$.
q.e.d.
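We now start investigating the convergence of $f_h$ towards $f$.
Lemma 3.2.2. If $f \in C^0(\Omega)$, then for each $\Omega' \subset\subset \Omega$, $f_h$ converges uniformly to $f$ on $\Omega'$ as $h \to 0$. In symbols: $f_h \rightrightarrows f$ on $\Omega'$ as $h \to 0$.
Proof. We have
\[
f(x) = \int_{|w| \le 1} \varrho(w) f(x)\,dw \quad \text{by (3.2.3)} \tag{3.2.7}
\]
and
\[
f_h(x) = \int_{|w| \le 1} \varrho(w) f(x - hw)\,dw \tag{3.2.8}
\]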
by using the substitution $w = \frac{x-y}{h}$ in (3.2.4). For $\Omega' \subset\subset \Omega$ and $h < \frac{1}{2}\operatorname{dist}(\Omega', \partial\Omega)$, we then have
\[
\sup_{x \in \Omega'} |f(x) - f_h(x)| \le \sup_{x \in \Omega'} \int_{|w| \le 1} \varrho(w)\,|f(x) - f(x-hw)|\,dw
\le \sup_{x \in \Omega'} \sup_{|w| \le 1} |f(x) - f(x-hw)|
\]
using (3.2.3) once more.
† $\Omega' \subset\subset \Omega$ means that the closure of $\Omega'$ is compact and contained in $\Omega$. We say that $\Omega'$ is relatively compact in $\Omega$.
Since $\Omega'$ is bounded, $\{x \in \Omega \mid \operatorname{dist}(x, \Omega') \le h\}$ is compact (recall the choice of $h$). Therefore, $f$ is uniformly continuous on that set, and we conclude that
\[
\sup_{x \in \Omega'} |f(x) - f_h(x)| \to 0 \quad \text{as } h \to 0,
\]
i.e. uniform convergence.
q.e.d.
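Formula (3.2.8) and the estimate in the preceding proof can be tried out in one dimension. The sketch below is only an illustration under our own assumptions: the kernel is the standard bump $\exp(-1/(1-w^2))$, normalised numerically so that the quadrature weights sum to 1; the integral is replaced by a midpoint rule; and the names `bump`, `kernel_nodes`, `mollify`, `sup_error` are hypothetical helpers, not notation from the text.

```python
import math


def bump(w: float) -> float:
    """Unnormalised smooth kernel, positive on (-1, 1) and zero outside."""
    return math.exp(-1.0 / (1.0 - w * w)) if abs(w) < 1.0 else 0.0


def kernel_nodes(m: int = 400):
    """Midpoint nodes w_j in (-1, 1) and weights ~ rho(w_j) dw, rescaled to sum to 1."""
    dw = 2.0 / m
    nodes = [-1.0 + (j + 0.5) * dw for j in range(m)]
    raw = [bump(w) for w in nodes]
    total = sum(raw)
    return nodes, [r / total for r in raw]


def mollify(f, x: float, h: float, nodes, weights) -> float:
    """Quadrature version of (3.2.8): f_h(x) = int rho(w) f(x - h w) dw."""
    return sum(c * f(x - h * w) for w, c in zip(nodes, weights))


def sup_error(f, h: float, points: int = 201) -> float:
    """Largest |f(x) - f_h(x)| over a grid on [-1/2, 1/2]."""
    nodes, weights = kernel_nodes()
    xs = [-0.5 + i / (points - 1) for i in range(points)]
    return max(abs(f(x) - mollify(f, x, h, nodes, weights)) for x in xs)


if __name__ == "__main__":
    for h in (0.4, 0.2, 0.1, 0.05):
        print(h, sup_error(abs, h))  # f(x) = |x| is continuous but not C^1
```

For $f(x) = |x|$ one has $|f(x) - f(x - hw)| \le h$ for $|w| \le 1$, so the estimate in the proof predicts a sup-error of at most $h$; the discrete version inherits this bound because the weights are nonnegative and sum to 1.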
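Theorem 3.2.1. Let $f \in L^p(\Omega)$, $1 \le p < \infty$. Then $f_h$ converges to $f$ in $L^p(\Omega)$ as $h \to 0$.
Proof. We have for $g \in L^p(\Omega)$ (extended by 0 outside $\Omega$)
\[
\begin{aligned}
\int_\Omega |g_h(x)|^p\,dx
&= \int_\Omega \left| \int_{|w| \le 1} \varrho(w)\, g(x-hw)\,dw \right|^p dx \\
&\le \int_\Omega \left( \int_{|w| \le 1} \varrho(w)\,dw \right)^{p-1} \left( \int_{|w| \le 1} \varrho(w)\,|g(x-hw)|^p\,dw \right) dx
&& \text{by Hölder's inequality} \\
&= \int_{|w| \le 1} \varrho(w) \int_\Omega |g(x-hw)|^p\,dx\,dw
&& \text{using (3.2.3) and Fubini's theorem} \\
&\le \int_{|w| \le 1} \varrho(w) \int_{\mathbb{R}^d} |g(y)|^p\,dy\,dw \\
&= \int_\Omega |g(y)|^p\,dy
&& \text{using (3.2.3) again.}
\end{aligned}
\]
(Hölder's inequality is applied to the factorization $\varrho = \varrho^{1-1/p}\,\varrho^{1/p}$.) Thus
\[
\|g_h\|_{L^p(\Omega)} \le \|g\|_{L^p(\Omega)}. \tag{3.2.9}
\]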
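Let $\varepsilon > 0$. By Theorem 1.1.4 (6), we may find $\varphi \in C_0^0(\mathbb{R}^d)$ with
\[
\|f - \varphi\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.10}
\]
Since $\varphi$ has compact support, we may apply Lemma 3.2.2 to conclude that for sufficiently small $h > 0$,
\[
\|\varphi - \varphi_h\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.11}
\]
Applying (3.2.9) to $f - \varphi$, we obtain
\[
\|f_h - \varphi_h\|_{L^p(\mathbb{R}^d)} \le \|f - \varphi\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.12}
\]
(3.2.10)-(3.2.12) yield
\[
\|f - f_h\|_{L^p(\Omega)} \le \|f - f_h\|_{L^p(\mathbb{R}^d)} < \varepsilon. \tag{3.2.13}
\]
q.e.d.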
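Corollary 3.2.1. For $1 \le p < \infty$, $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$.
Proof. Let $f \in L^p(\Omega)$, $\varepsilon > 0$. We may then find $\Omega' \subset\subset \Omega$ with
\[
\|f\|_{L^p(\Omega \setminus \Omega')} < \frac{\varepsilon}{2}.
\]
We put $f' := f \chi_{\Omega'}$. Then
\[
\|f - f'\|_{L^p(\Omega)} < \frac{\varepsilon}{2}. \tag{3.2.14}
\]
By Theorem 3.2.1, for sufficiently small $h$,
\[
\|f' - f'_h\|_{L^p(\Omega)} < \frac{\varepsilon}{2}. \tag{3.2.15}
\]
By (3.2.14), (3.2.15)
\[
\|f - f'_h\|_{L^p(\Omega)} < \varepsilon.
\]
Since $f'_h \in C_0^\infty(\Omega)$ for $h < \operatorname{dist}(\Omega', \partial\Omega)$, the claim follows.
q.e.d.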
Corollary 3.2.2. $L^p(\Omega)$ is separable for $1 \le p < \infty$. Every $f \in L^p(\Omega)$ can be approximated by piecewise constant functions.
Proof. By Corollary 3.2.1, it suffices to find a countable subset $B_0$ of $L^p(\Omega)$ with the property that for every $\varphi \in C_0^0(\Omega)$ and every $\varepsilon > 0$, there exists some $a \in B_0$ with
\[
\|\varphi - a\|_{L^p(\Omega)} < \varepsilon. \tag{3.2.16}
\]
Let $B$ be the set of all functions $a$ on $\mathbb{R}^d$ of the following form: There exist some $k, N \in \mathbb{N}$ and rational numbers $\alpha_1, \dots, \alpha_k$ and cubes $Q_1, \dots, Q_k \subset \mathbb{R}^d$ with corners having all their coordinates in $\frac{1}{N}\mathbb{Z}$ and of edge length $\frac{1}{N}$ such that
\[
a(x) = \begin{cases} \alpha_i & \text{for } x \in Q_i, \\ 0 & \text{otherwise.} \end{cases}
\]
Clearly, $B$ is countable. Since a continuous function $\varphi$ with compact support is uniformly continuous, we may easily find some $a \in B$ with
\[
\|a - \varphi\|_{L^p(\Omega)} \le \|a - \varphi\|_{L^p(\mathbb{R}^d)} < \varepsilon. \tag{3.2.17}
\]
We put $B_0 := \{a \chi_\Omega \mid a \in B\}$. $B_0$ is likewise countable, and from (3.2.16), (3.2.17), we conclude that $B_0$ is dense in $L^p(\Omega)$.
q.e.d.
Remark 3.2.1. The separability of $L^p(\Omega)$ can also be seen by using Corollary 3.2.1 and the Weierstrass approximation theorem that allows the approximation of continuous functions with compact support by polynomials with rational coefficients.
The preceding results do not hold for $L^\infty(\Omega)$. Namely, if a sequence of continuous functions converges w.r.t. $\|\cdot\|_{L^\infty(\Omega)}$, then it converges uniformly, and therefore, the limit is again continuous. Therefore, noncontinuous elements of $L^\infty(\Omega)$ cannot be approximated by continuous functions in the $L^\infty$-norm. Also, $L^\infty(\Omega)$ is not separable. To see this, let $(a_n)_{n \in \mathbb{N}}$ be any sequence in $\{0,1\}$, i.e. $a_n \in \{0,1\}$ for all $n$. To $(a_n)$, we associate the function $f_{(a_n)}$ on $(0,1)$ defined by
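\[
f_{(a_n)}(x) = \begin{cases} 1 & \text{for } \frac{1}{k+1} < x \le \frac{1}{k} \text{ if } a_k = 1, \\[2pt] 0 & \text{for } \frac{1}{k+1} < x \le \frac{1}{k} \text{ if } a_k = 0, \end{cases}
\qquad \text{for } k \in \mathbb{N}.
\]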
Then for any two different sequences $(a_n)$, $(b_n)$,
\[
\|f_{(a_n)} - f_{(b_n)}\|_{L^\infty((0,1))} = 1.
\]
Since the set of sequences in $\{0,1\}$ is uncountable, this implies that $L^\infty((0,1))$ is not separable. Of course, a similar construction is possible for $L^\infty(\Omega)$, $\Omega$ any open subset of $\mathbb{R}^d$.
We finally note:
Lemma 3.2.3. Let $f \in L^2(\Omega)$, and suppose that for all $\varphi \in C_0^\infty(\Omega)$
\[
\int_\Omega f(x) \varphi(x)\,dx = 0.
\]
Then $f = 0$.
Proof. Since $C_0^\infty(\Omega)$ is dense in $L^2(\Omega)$, and since
\[
g \mapsto \int_\Omega f(x) g(x)\,dx
\]
is a continuous linear functional on $L^2(\Omega)$, we obtain that
\[
\int_\Omega f(x) g(x)\,dx = 0 \quad \text{for all } g \in L^2(\Omega).
\]
Putting $g = f$ yields the result.
q.e.d.
3.3 Sobolev spaces
In this section, we wish to introduce certain extensions of the $L^p$ spaces, the so-called Sobolev spaces. They will play a fundamental role in subsequent chapters because they constitute function spaces that are complete w.r.t. norms naturally occurring in variational problems. In this section, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. We shall use the following notation: For a $d$-tuple $\alpha := (\alpha_1, \dots, \alpha_d)$ of nonnegative integers,
\[
|\alpha| := \sum_{i=1}^d \alpha_i, \qquad
D_\alpha := \left(\frac{\partial}{\partial x^1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial x^d}\right)^{\alpha_d}.
\]
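Definition 3.3.1. Let $u, v \in L^1(\Omega)$. Then $v$ is said to be the $\alpha$-th weak derivative of $u$, $v := D_\alpha u$, if
\[
\int_\Omega \varphi\, v\,dx = (-1)^{|\alpha|} \int_\Omega u\, D_\alpha \varphi\,dx \tag{3.3.1}
\]
for every $\varphi \in C_0^{|\alpha|}(\Omega)$.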
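We can now define, for $k \in \mathbb{N}$ and $1 \le p < \infty$, the Sobolev space
\[
W^{k,p}(\Omega) := \{u \in L^p(\Omega) \mid D_\alpha u \text{ exists and lies in } L^p(\Omega) \text{ for all } |\alpha| \le k\},
\]
\[
\|u\|_{W^{k,p}(\Omega)} := \left( \sum_{|\alpha| \le k} \int_\Omega |D_\alpha u|^p \right)^{1/p}.
\]
Finally, let $H^{k,p}(\Omega)$ and $H_0^{k,p}(\Omega)$ be the closures of $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ and $C_0^\infty(\Omega) \cap W^{k,p}(\Omega)$, respectively, in $W^{k,p}(\Omega)$.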
We shall use the following abbreviations for $u \in W^{1,1}(\Omega)$, $1 \le i \le d$: $D_i u$ is the weak derivative for the multi-index $(0, \dots, 0, 1, 0, \dots, 0)$, 1 at the $i$-th position, and $Du$ is the vector $(D_1 u, \dots, D_d u)$ of all first weak derivatives.
The following result is obvious.
Lemma 3.3.1. Let $u \in C^k(\Omega)$, and suppose all derivatives of $u$ of order $\le k$ are in $L^p(\Omega)$. Then $u \in W^{k,p}(\Omega)$, and the weak derivatives are given by the ordinary derivatives.
q.e.d.
Thus, the $W^{k,p}$ spaces constitute a generalization of the spaces of $k$ times differentiable functions. The $W^{k,p}$ norm is considerably weaker than the $C^k$-norm, and so the $W^{k,p}$ spaces are larger than the $C^k$ spaces.
Before investigating the properties of these spaces, it should be useful to consider an example: Let $\Omega = (-1,1) \subset \mathbb{R}$, $u(x) := |x|$. We claim that $u \in W^{1,p}(\Omega)$ for $1 \le p < \infty$. In order to see this it suffices to show that the first weak derivative of $u$ is given by
\[
v(x) = \begin{cases} 1 & \text{for } 0 \le x < 1, \\ -1 & \text{for } -1 < x < 0. \end{cases}
\]
Indeed, we have for $\varphi \in C_0^1((-1,1))$
\[
\int_{-1}^{1} \varphi(x) v(x)\,dx = -\int_{-1}^{1} \varphi'(x)\,|x|\,dx.
\]
We claim, however, that $u$ is not contained in $W^{2,p}(\Omega)$. Namely, if $w(x)$ were the second weak derivative of $u$, it would have to be the first weak derivative of $v$, and consequently, we would have $w(x) = 0$ for $x \ne 0$. The rule for integration by parts (3.3.1) would then require that for all $\varphi \in C_0^1((-1,1))$
\[
0 = \int_{-1}^{1} \varphi(x) w(x)\,dx
= -\int_{-1}^{1} \varphi'(x) v(x)\,dx
= \int_{-1}^{0} \varphi'(x)\,dx - \int_{0}^{1} \varphi'(x)\,dx
= 2\varphi(0),
\]
which is not the case. Thus, $v$ does not have a first weak derivative.
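The two integral identities in this example can be checked numerically for one concrete test function. The following sketch is illustrative only: the test function $\varphi(x) = \exp\!\big(-1/(1-(x/A)^2)\big)(1+x)$ with support radius $A = 0.9$, the midpoint rule, and all identifiers are our own assumptions.

```python
import math

A = 0.9  # support radius of the test function (arbitrary choice with A < 1)


def phi(x: float) -> float:
    """Smooth, non-symmetric test function supported in [-A, A]."""
    s = x / A
    return math.exp(-1.0 / (1.0 - s * s)) * (1.0 + x) if abs(s) < 1.0 else 0.0


def dphi(x: float) -> float:
    """Classical derivative of phi (product and chain rule)."""
    s = x / A
    if abs(s) >= 1.0:
        return 0.0
    b = math.exp(-1.0 / (1.0 - s * s))
    db = b * (-2.0 * s / (1.0 - s * s) ** 2) / A
    return db * (1.0 + x) + b


def v(x: float) -> float:
    """Candidate weak derivative of u(x) = |x|."""
    return 1.0 if x >= 0 else -1.0


def integrate(g, n: int = 20_000) -> float:
    """Midpoint rule on (-1, 1); n is even, so x = 0 is a cell boundary."""
    dx = 2.0 / n
    return sum(g(-1.0 + (i + 0.5) * dx) for i in range(n)) * dx


lhs = integrate(lambda x: phi(x) * v(x))        # int phi v dx
rhs = -integrate(lambda x: dphi(x) * abs(x))    # -int phi' |x| dx
second = -integrate(lambda x: dphi(x) * v(x))   # -int phi' v dx

if __name__ == "__main__":
    print(lhs, rhs)               # these two should agree
    print(second, 2 * phi(0.0))   # these two should agree, and are nonzero
```

The first pair of numbers agreeing illustrates that $v$ is the weak derivative of $u$; the second pair shows that $-\int \varphi' v$ picks up the point value $2\varphi(0)$, which no $L^1$ function $w$ vanishing away from $0$ can reproduce.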
Remark 3.3.1. Some readers may have encountered the notion of a distributional derivative. It is important to distinguish between weak and distributional derivatives. Any $L^1(\Omega)$ function possesses distributional derivatives of any order, but as the preceding example shows, not necessarily weak derivatives. In the example, of course, the second distributional derivative of $u$ is $2\delta_0$, where $\delta_0$ is the Dirac delta distribution at 0. $u$ does not possess a second weak derivative because the delta distribution cannot be represented by an $L^1$ function.
Theorem 3.3.1. The Sobolev spaces $W^{k,p}(\Omega)$ are separable Banach spaces w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$.
Proof. That $\|\cdot\|_{W^{k,p}(\Omega)}$ is a norm follows from the fact that $\|\cdot\|_{L^p(\Omega)}$ is a norm (see Section 3.1). Similarly, we shall now derive completeness of $W^{k,p}(\Omega)$ from the completeness of the $L^p(\Omega)$ spaces (Theorem 3.1.1). Thus, let $(u_n)_{n \in \mathbb{N}} \subset W^{k,p}(\Omega)$ be a Cauchy sequence w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$. This implies that $(D_\alpha u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence w.r.t. $\|\cdot\|_{L^p(\Omega)}$ for all $|\alpha| \le k$. By Theorem 3.1.1, $(D_\alpha u_n)$ therefore converges in $L^p(\Omega)$ towards some $v_\alpha$. For $\varphi \in C_0^{|\alpha|}(\Omega)$, we have
\[
\int_\Omega D_\alpha u_n\, \varphi = (-1)^{|\alpha|} \int_\Omega u_n\, D_\alpha \varphi. \tag{3.3.2}
\]
Passing to the limit $n \to \infty$ on both sides, we see that $v_\alpha$ is the $\alpha$-th weak derivative of $v_0$, the $L^p$-limit of $(u_n)_{n \in \mathbb{N}}$, and consequently $v_0 \in W^{k,p}(\Omega)$. The separability again follows from the corresponding property for $L^p(\Omega)$ (Corollary 3.2.2).
q.e.d.
Theorem 3.3.2. $W^{k,p}(\Omega) = H^{k,p}(\Omega)$.
This result says that elements of $W^{k,p}(\Omega)$ can be approximated by $C^\infty(\Omega)$ functions w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$. In general, however, for $k \ge 1$ one has $H_0^{k,p}(\Omega) \ne W^{k,p}(\Omega)$ so that $W^{k,p}(\Omega)$ functions cannot be approximated by $C_0^\infty(\Omega)$ functions, in contrast to $L^p(\Omega)$ functions where this is possible (Corollary 3.2.1). This is seen from the following simple example:
$\Omega = (-1,1) \subset \mathbb{R}$, $u(x) = 1$. If $(\varphi_n)_{n \in \mathbb{N}} \subset C_0^\infty(\Omega)$ converges to $u$ in $L^1(\Omega)$, then after selection of a subsequence, it converges pointwise almost everywhere (Lemma 3.1.3), and therefore, for sufficiently large $n$, there exists $x_n \in (-1,1)$ with $\varphi_n(x_n) > \frac{1}{2}$. Since $\varphi_n(-1) = 0 = \varphi_n(1)$, this implies that
\[
\int_{-1}^{1} |\varphi_n'(x)|\,dx > 1.
\]
Therefore, $\varphi_n'$ cannot converge to $u' = 0$ in $L^p((-1,1))$, and therefore $\varphi_n$ cannot converge to $u$ in $W^{1,p}((-1,1))$.
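Proof (Theorem 3.3.2). We have to show that any $u \in W^{k,p}(\Omega)$ can be approximated by $C^\infty(\Omega)$ functions. As in Section 3.2, we extend $u$ to be 0 outside $\Omega$ and consider the mollifications $u_h \in C^\infty(\Omega)$. We compute
\[
\begin{aligned}
D_\alpha(u_h(x)) &= \frac{1}{h^d} \int D_{\alpha,x}\, \varrho\!\left(\frac{x-y}{h}\right) u(y)\,dy
&& \text{(using Corollary 1.2.2), where } D_{\alpha,x} \text{ is the derivative w.r.t. } x, \\
&= (-1)^{|\alpha|} \frac{1}{h^d} \int D_{\alpha,y}\, \varrho\!\left(\frac{x-y}{h}\right) u(y)\,dy \\
&= \frac{1}{h^d} \int \varrho\!\left(\frac{x-y}{h}\right) D_\alpha u(y)\,dy
&& \text{by definition of } D_\alpha u \\
&= (D_\alpha u)_h(x).
\end{aligned} \tag{3.3.3}
\]
Thus, the derivative of the mollification is the mollification of the derivative. Since $D_\alpha u \in L^p(\Omega)$, by Theorem 3.2.1, $(D_\alpha u)_h$ converges to $D_\alpha u$ in $L^p(\Omega)$ for $h \to 0$. By (3.3.3), we conclude that $D_\alpha(u_h)$ converges to $D_\alpha u$ in $L^p(\Omega)$, for all $|\alpha| \le k$, and this means that $u_h$ converges to $u$ in $W^{k,p}(\Omega)$.
q.e.d.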
Theorem 3.3.3. $W^{k,p}(\Omega)$ is reflexive for $k \in \mathbb{N}$, $1 < p < \infty$.
Proof. It follows from Theorem 3.1.3 that the dual space of $W^{k,p}(\Omega)$ is given by $W^{k,q}(\Omega)$, with $\frac{1}{p} + \frac{1}{q} = 1$. This implies reflexivity.
q.e.d.
Theorem 3.3.4. $H_0^{k,p}(\Omega)$ is closed under weak convergence in $W^{k,p}(\Omega)$.
Proof. This follows from Lemma 2.2.5, since $H_0^{k,p}(\Omega)$ by its definition is a closed subspace (w.r.t. strong convergence) of $W^{k,p}(\Omega)$.
q.e.d.
Theorem 3.3.5. For $1 < p < \infty$, $k \in \mathbb{N}$, any sequence in $W^{k,p}(\Omega)$ that is bounded w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$ contains a weakly convergent subsequence.
Proof. By Theorems 3.3.1 and 3.3.3, $W^{k,p}(\Omega)$ is separable and reflexive. Therefore, the result follows from Corollary 2.2.1.
q.e.d.
3.4 Rellich's theorem and the Poincaré and Sobolev inequalities
The compactness theorem of Rellich is:
Theorem 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded, i.e. $\|u_n\|_{W^{1,p}(\Omega)} \le c$ (independent of $n$). Then a subsequence of $(u_n)_{n \in \mathbb{N}}$ converges in $L^p(\Omega)$.
Remark 3.4.1. Rellich originally proved the theorem for $p = 2$. Kondrachev proved the stronger result that some subsequence converges in $L^q(\Omega)$ for $1 \le q < \frac{dp}{d-p}$ if $p < d$ and for $1 \le q < \infty$ if $p \ge d$. Of course, these exponents come from the Sobolev Embedding Theorem (see (3.4.12)). See Corollary 3.4.1 below.
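Proof. Since $u_n \in H_0^{1,p}(\Omega)$, for every $n \in \mathbb{N}$ and $\varepsilon > 0$, there exists some $v_n \in C_0^\infty(\Omega)$ with
\[
\|u_n - v_n\|_{W^{1,p}(\Omega)} < \frac{\varepsilon}{3}. \tag{3.4.1}
\]
Therefore
\[
\|v_n\|_{W^{1,p}(\Omega)} \le c' \quad \left(= c + \frac{\varepsilon}{3}\right). \tag{3.4.2}
\]
We consider the mollification
\[
v_{n,h}(x) = \frac{1}{h^d} \int \varrho\!\left(\frac{x-y}{h}\right) v_n(y)\,dy
\]
of $v_n$ and estimate, by (3.2.7), (3.2.8),
\[
\begin{aligned}
|v_n(x) - v_{n,h}(x)| &= \left| \int_{|w| \le 1} \varrho(w)\,\big(v_n(x) - v_n(x - hw)\big)\,dw \right| \\
&\le \int_{|w| \le 1} \varrho(w) \int_0^{h|w|} \left| \frac{\partial}{\partial r} v_n(x - r\vartheta) \right| dr\,dw
\quad \text{with } \vartheta = \frac{w}{|w|}.
\end{aligned} \tag{3.4.3}
\]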
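This implies
\[
\begin{aligned}
\int_\Omega |v_n(x) - v_{n,h}(x)|^p\,dx
&\le \int_\Omega \left( \int_{|w| \le 1} \varrho(w) \int_0^{h|w|} \left| \frac{\partial}{\partial r} v_n(x - r\vartheta) \right| dr\,dw \right)^p dx \\
&= \int_\Omega \left( \int_{|w| \le 1} \varrho(w)^{1 - \frac{1}{p}}\, \varrho(w)^{\frac{1}{p}} \int_0^{h|w|} \left| \frac{\partial}{\partial r} v_n(x - r\vartheta) \right| dr\,dw \right)^p dx \\
&\le \left( \int_{|w| \le 1} \varrho(w)\,dw \right)^{p-1} \left( \int_{|w| \le 1} \varrho(w)\, h^p |w|^p \int_\Omega |Dv_n(x)|^p\,dx\,dw \right),
\end{aligned}
\]
using Hölder's inequality, Fubini's theorem and the notation
\[
Dv_n = \left( \frac{\partial}{\partial x^1} v_n, \dots, \frac{\partial}{\partial x^d} v_n \right).
\]
Since $\int_{|w| \le 1} \varrho(w)\,dw = 1$ (by (3.2.3)), we obtain
\[
\|v_n - v_{n,h}\|_{L^p(\Omega)} \le h\,\|Dv_n\|_{L^p(\Omega)} \le h c' \ \text{ by (3.4.2)} \ < \frac{\varepsilon}{3} \quad \text{if } h \text{ is sufficiently small.} \tag{3.4.4}
\]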
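Next,
\[
\begin{aligned}
|v_{n,h}(x)| &\le \frac{1}{h^d}\, c_0\, \|v_n\|_{L^1(\Omega)}
&& \text{with } c_0 := \sup_z \varrho(z), \text{ by definition of } v_{n,h}, \\
&\le \frac{1}{h^d}\, c_0\, (\operatorname{meas} \Omega)^{1 - \frac{1}{p}}\, \|v_n\|_{L^p(\Omega)}
&& \text{by Hölder's inequality,}
\end{aligned} \tag{3.4.5}
\]
and similarly
\[
|Dv_{n,h}(x)| \le \frac{1}{h^{d+1}}\, c_1\, (\operatorname{meas} \Omega)^{1 - \frac{1}{p}}\, \|v_n\|_{L^p(\Omega)} \tag{3.4.6}
\]
with $c_1 := \sup_z |D\varrho(z)|$. From (3.4.2), (3.4.5), (3.4.6), we see that for fixed $h > 0$,
\[
\|v_{n,h}\|_{C^1(\Omega)} \le \text{constant} \tag{3.4.7}
\]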
(where the constant depends on $h$). Therefore, $(v_{n,h})_{n \in \mathbb{N}}$ contains a uniformly convergent subsequence by the Arzelà-Ascoli theorem. Since uniform convergence implies $L^p$-convergence (e.g. by Theorem 1.2.3), the closure of $(v_{n,h})_{n \in \mathbb{N}}$ is compact in $L^p(\Omega)$. Since a compact subset of a metric space (e.g. a Banach space) is totally bounded, there exist finitely many $w_1, \dots, w_N \in L^p(\Omega)$ such that for every $n \in \mathbb{N}$ there exists $1 \le j \le N$ with
\[
\|v_{n,h} - w_j\|_{L^p(\Omega)} < \frac{\varepsilon}{3}. \tag{3.4.8}
\]
By (3.4.1), (3.4.4), (3.4.8), for every $n \in \mathbb{N}$ we find $1 \le j \le N$ with
\[
\|u_n - w_j\|_{L^p(\Omega)} < \varepsilon.
\]
Thus, $(u_n)_{n \in \mathbb{N}}$ is totally bounded in $L^p(\Omega)$. Therefore, the closure of $(u_n)_{n \in \mathbb{N}}$ in $L^p(\Omega)$ is compact (again, a general result for metric spaces), and it thus contains a convergent subsequence in $L^p(\Omega)$.
q.e.d.
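We now come to the Poincaré inequality:
Theorem 3.4.2. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. For any $u \in H_0^{1,p}(\Omega)$
\[
\|u\|_{L^p(\Omega)} \le \left( \frac{\operatorname{meas} \Omega}{\omega_d} \right)^{\frac{1}{d}} \|Du\|_{L^p(\Omega)}, \tag{3.4.9}
\]
where $\omega_d$ is the Lebesgue measure of the unit ball in $\mathbb{R}^d$.
Proof. Since $C_0^1(\Omega)$ is dense in $H_0^{1,p}(\Omega)$, we may assume $u \in C_0^1(\Omega)$. We put $u(x) = 0$ for all $x \in \mathbb{R}^d \setminus \Omega$. For $\vartheta \in \mathbb{R}^d$ with $|\vartheta| = 1$, we have
\[
u(x) = -\int_0^\infty \frac{\partial}{\partial r} u(x + r\vartheta)\,dr.
\]
Integration w.r.t. $\vartheta$ yields
\[
|u(x)| = \frac{1}{d\omega_d} \left| \int_0^\infty \int_{|\vartheta| = 1} \frac{\partial}{\partial r} u(x + r\vartheta)\,d\vartheta\,dr \right|
\le \frac{1}{d\omega_d} \int_\Omega \frac{1}{|x-y|^{d-1}}\, |Du(y)|\,dy.
\]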
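Therefore
\[
\begin{aligned}
\int_\Omega |u(x)|^p\,dx
&\le \frac{1}{(d\omega_d)^p} \int_\Omega \left( \int_\Omega \frac{1}{|x-y|^{d-1}}\, |Du(y)|\,dy \right)^p dx \\
&\le \frac{1}{(d\omega_d)^p} \int_\Omega \left( \int_\Omega \frac{dy}{|x-y|^{d-1}} \right)^{p-1} \left( \int_\Omega \frac{|Du(y)|^p}{|x-y|^{d-1}}\,dy \right) dx
&& \text{by Hölder's inequality} \\
&\le \frac{1}{(d\omega_d)^p} \left( \sup_{x \in \Omega} \int_\Omega \frac{dy}{|x-y|^{d-1}} \right)^{p-1} \left( \sup_{y \in \Omega} \int_\Omega \frac{dx}{|x-y|^{d-1}} \right) \int_\Omega |Du(y)|^p\,dy,
\end{aligned} \tag{3.4.10}
\]
using Fubini's theorem to exchange the order of integration in the last step.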
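In order to control
\[
\int_\Omega \frac{1}{|x-y|^{d-1}}\,dx
\]
(the same bound holds with the roles of $x$ and $y$ interchanged), we choose $R$ with
\[
\operatorname{meas} \Omega = \operatorname{meas} B(y,R) = \omega_d R^d
\qquad (B(y,R) := \{z \in \mathbb{R}^d \mid |z - y| \le R\}).
\]
Since
\[
\frac{1}{|x-y|^{d-1}} \le \frac{1}{R^{d-1}} \ \text{ for } |x-y| \ge R, \qquad
\frac{1}{|x-y|^{d-1}} \ge \frac{1}{R^{d-1}} \ \text{ for } |x-y| \le R,
\]
we have
\[
\int_\Omega \frac{1}{|x-y|^{d-1}}\,dx \le \int_{B(y,R)} \frac{1}{|x-y|^{d-1}}\,dx = d\omega_d R
= d\,\omega_d^{1 - \frac{1}{d}}\, (\operatorname{meas} \Omega)^{\frac{1}{d}}. \tag{3.4.11}
\]
Inequalities (3.4.10) and (3.4.11) yield (3.4.9).
q.e.d.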
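As a sanity check of (3.4.9), consider the simplest case $d = 1$, $\Omega = (0,1)$, where $\omega_1 = 2$ and the constant is $\tfrac{1}{2}$. The sketch below is an illustration under our own assumptions: the sample function $u(x) = \sin(\pi x)$ (which is smooth and vanishes on $\partial\Omega$), the midpoint rule, and the helper names are not taken from the text.

```python
import math


def lp_norm(g, p: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the L^p norm of g on (0, 1)."""
    dx = 1.0 / n
    return (sum(abs(g((i + 0.5) * dx)) ** p for i in range(n)) * dx) ** (1.0 / p)


def poincare_constant(measure: float, d: int) -> float:
    """(meas(Omega) / omega_d)^(1/d), omega_d being the volume of the unit ball in R^d."""
    omega_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return (measure / omega_d) ** (1.0 / d)


def u(x: float) -> float:
    """Sample function vanishing at both endpoints of (0, 1)."""
    return math.sin(math.pi * x)


def du(x: float) -> float:
    """Classical derivative of u."""
    return math.pi * math.cos(math.pi * x)


def check(p: float):
    """Return (||u||_p, C * ||u'||_p) for Omega = (0, 1), d = 1."""
    return lp_norm(u, p), poincare_constant(1.0, 1) * lp_norm(du, p)


if __name__ == "__main__":
    for p in (1.0, 2.0, 4.0):
        print(p, check(p))  # first entry should not exceed the second
```

For this particular $u$ the inequality holds with room to spare, which is consistent with the fact that the constant in (3.4.9) is not claimed to be optimal.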
We now come to somewhat stronger results that will however only be needed in Chapter 9. Namely, we have the Sobolev inequalities.
Theorem 3.4.3. Let $u \in H_0^{1,p}(\Omega)$.
(i) If $p < d$, then $u \in L^{\frac{dp}{d-p}}(\Omega)$, and
\[
\|u\|_{L^{\frac{dp}{d-p}}(\Omega)} \le c\,\|Du\|_{L^p(\Omega)}. \tag{3.4.12}
\]
(ii) If $p > d$, then $u \in C^0(\bar\Omega)$, and
\[
\sup_\Omega |u| \le c\,(\operatorname{meas} \Omega)^{\frac{1}{d} - \frac{1}{p}}\, \|Du\|_{L^p(\Omega)} \tag{3.4.13}
\]
with constants $c$ depending only on $p$ and $d$. (Actually, by a theorem of Morrey, for $p > d$, $u \in H_0^{1,p}(\Omega)$ is even Hölder continuous with exponent $1 - \frac{d}{p}$.)
We only prove (i) as (ii) will not be used in the present book:
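Proof. We first assume $u \in C_0^1(\Omega)$. Since $u$ has compact support, we have
\[
|u(x)| \le \int_{-\infty}^{\infty} |D_i u(x^1, \dots, x^{i-1}, y^i, x^{i+1}, \dots, x^d)|\,dy^i \quad \text{for } i = 1, 2, \dots, d.
\]
Multiplying these inequalities for $i = 1, \dots, d$ yields
\[
|u(x)|^{\frac{d}{d-1}} \le \left( \prod_{i=1}^d \int_{-\infty}^{\infty} |D_i u|\,dy^i \right)^{\frac{1}{d-1}}.
\]
Using Hölder's inequality†, we compute
\[
\begin{aligned}
\int_{-\infty}^{\infty} |u(x)|^{\frac{d}{d-1}}\,dx^1
&\le \left( \int_{-\infty}^{\infty} |D_1 u|\,dy^1 \right)^{\frac{1}{d-1}} \int_{-\infty}^{\infty} \prod_{i=2}^d \left( \int_{-\infty}^{\infty} |D_i u|\,dy^i \right)^{\frac{1}{d-1}} dx^1 \\
&\le \left( \int_{-\infty}^{\infty} |D_1 u|\,dy^1 \right)^{\frac{1}{d-1}} \left( \prod_{i=2}^d \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |D_i u|\,dy^i\,dx^1 \right)^{\frac{1}{d-1}}.
\end{aligned}
\]
† More precisely, one uses Exercise 3.2 below with $p_1 = \dots = p_{d-1} = d-1$.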
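Iteratively also integrating w.r.t. $x^2, \dots, x^d$ finally yields
\[
\|u\|_{L^{\frac{d}{d-1}}(\Omega)} \le \left( \prod_{i=1}^d \int_\Omega |D_i u|\,dx \right)^{\frac{1}{d}}
\le \frac{1}{d} \int_\Omega \sum_{i=1}^d |D_i u(x)|\,dx. \tag{3.4.14}
\]
This is (3.4.12) for $p = 1$. The case of general $p$ may now be obtained by applying (3.4.14) to $|u|^\mu$ for suitable $\mu > 1$ and using Hölder's inequality. Namely, from (3.4.14) for $|u|^\mu$ in place of $u$, and using $\sum_{i=1}^d |D_i u| \le \sqrt{d}\,|Du|$,
\[
\big\| |u|^\mu \big\|_{L^{\frac{d}{d-1}}(\Omega)}
\le \frac{\mu}{d} \int_\Omega |u|^{\mu-1} \sum_{i=1}^d |D_i u|\,dx
\le \frac{\mu}{\sqrt{d}}\, \big\| |u|^{\mu-1} \big\|_{L^q(\Omega)}\, \|Du\|_{L^p(\Omega)}
\quad \text{for } \frac{1}{p} + \frac{1}{q} = 1
\]
by Hölder's inequality. For $p < d$, we may take $\mu = \frac{(d-1)p}{d-p}$ and obtain
\[
\|u\|^{\mu}_{L^{\frac{\mu d}{d-1}}(\Omega)} \le \frac{\mu}{\sqrt{d}}\, \|u\|^{\mu-1}_{L^{(\mu-1)q}(\Omega)}\, \|Du\|_{L^p(\Omega)},
\]
which yields (3.4.12), since $\frac{\mu d}{d-1} = (\mu - 1) q = \frac{dp}{d-p}$.
q.e.d.
As a consequence, we obtain the theorem of Kondrachev: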
Corollary 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded for some $1 \le p < d$. Then a subsequence converges in $L^q(\Omega)$ for any $1 \le q < \frac{dp}{d-p}$.
Proof. From Theorem 3.4.1 we know already that a subsequence converges in $L^p(\Omega)$. We may assume $q > p$ as otherwise the result is an easy consequence of Hölder's inequality since $\Omega$ is bounded. We denote this converging subsequence again by $(u_n)$.
From Hölder's inequality, we obtain
\[
\begin{aligned}
\|u_n - u_m\|_{L^q(\Omega)}
&\le \|u_n - u_m\|^{\mu}_{L^1(\Omega)}\, \|u_n - u_m\|^{1-\mu}_{L^{\frac{dp}{d-p}}(\Omega)}
&& \text{if } \mu \text{ satisfies } \frac{1}{q} = \mu + (1-\mu)\left( \frac{1}{p} - \frac{1}{d} \right) \\
&\le c\, \|u_n - u_m\|^{\mu}_{L^1(\Omega)}\, \|D(u_n - u_m)\|^{1-\mu}_{L^p(\Omega)}
\end{aligned} \tag{3.4.15}
\]
by Theorem 3.4.3 (i).
Since $Du_n$ is bounded in $L^p(\Omega)$ by assumption, and $(u_n)$ is a Cauchy sequence in $L^p(\Omega)$, hence also in $L^1(\Omega)$, (3.4.15) then implies the Cauchy property in $L^q(\Omega)$.
q.e.d.
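Exercises
3.1 Let
\[
A_1 := \{x \in \mathbb{R}^d : \|x\| \ge 1\}, \qquad A_2 := \{x \in \mathbb{R}^d : \|x\| \le 1\},
\]
and consider
\[
f(x) = \|x\|^\lambda \quad \text{for } \lambda \in \mathbb{R}.
\]
For which values of $d, p, \lambda$ is $f \in L^p(A_1)$, or $f \in L^p(A_2)$?
3.2 Let $A \subset \mathbb{R}^d$ be measurable. Let $p_1, \dots, p_k \ge 1$, $\sum_{i=1}^k \frac{1}{p_i} = 1$, $f_i \in L^{p_i}(A)$ for $i = 1, \dots, k$. Show $f_1 \cdots f_k \in L^1(A)$, with
\[
\|f_1 \cdots f_k\|_{L^1(A)} \le \prod_{i=1}^k \|f_i\|_{L^{p_i}(A)}.
\]
3.3 Let $A \subset \mathbb{R}^d$ be measurable, $\operatorname{meas} A < \infty$, $1 \le p < q \le \infty$. Then $L^q(A) \subset L^p(A)$, and for $f \in L^q(A)$
\[
\frac{1}{(\operatorname{meas} A)^{1/p}}\, \|f\|_{L^p(A)} \le \frac{1}{(\operatorname{meas} A)^{1/q}}\, \|f\|_{L^q(A)}.
\]
(Hint: Apply Hölder's inequality with $f_1 = 1$, $f_2 = f$.)
3.4 Let $A \subset \mathbb{R}^d$ be measurable, $1 \le p \le q \le r$, $\frac{1}{q} = \frac{\alpha}{p} + \frac{1-\alpha}{r}$, $f \in L^p(A) \cap L^r(A)$. Then $f \in L^q(A)$, and
\[
\|f\|_{L^q(A)} \le \|f\|^{\alpha}_{L^p(A)}\, \|f\|^{1-\alpha}_{L^r(A)}.
\]
3.5 Let $A \subset \mathbb{R}^d$ be measurable, $\operatorname{meas} A < \infty$, $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. Then
\[
\lim_{p \to \infty} \frac{1}{(\operatorname{meas} A)^{1/p}}\, \|f\|_{L^p(A)} = \|f\|_{L^\infty(A)}
\]
(where we allow these quantities to be infinite).
3.6 Let $A \subset \mathbb{R}^d$ be measurable, $(f_n)_{n \in \mathbb{N}} \subset L^p(A)$ with
\[
\|f_n\|_p \le \text{constant}.
\]
Suppose $f_n$ converges pointwise almost everywhere on $A$ to some $f$. Is $f \in L^p(A)$, and do we necessarily get
\[
\|f_n - f\|_p \to 0 \quad \text{as } n \to \infty?
\]
3.7 Let $A_1, A_2, f$ be as in Exercise 3.1. For which $d, k, p, \lambda$ is $f$ in $W^{k,p}(A_1)$ or in $W^{k,p}(A_2)$?
3.8 Consider the sequence $(\sin(nx))_{n \in \mathbb{N}}$ in $L^2((0,1))$. Does it converge in the $L^2$-norm? Does it converge weakly? If so, what is the limit?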
4 The direct methods in the calculus of variations
4.1 Description of the problem and its solution
The typical problem of the calculus of variations is to minimize an integral of the form
\[
F(u) := \int_\Omega f(x, u(x), Du(x))\,dx,
\]
where $\Omega$ is some open subset of $\mathbb{R}^d$ (in most cases, $\Omega$ is bounded), among functions
\[
u : \Omega \to \mathbb{R}
\]
belonging to some suitable class of functions and satisfying a boundary condition, for example a Dirichlet boundary condition
\[
u(y) = g(y) \quad \text{for } y \in \partial\Omega
\]
for some given $g : \partial\Omega \to \mathbb{R}$. Thus, the problem is
\[
F(u) \to \min \quad \text{for } u \in \mathcal{C},
\]
where $\mathcal{C}$ is some space of functions. The strategy of the direct method is very simple: Take a minimizing sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{C}$, i.e.
\[
\lim_{n \to \infty} F(u_n) = \inf_{u \in \mathcal{C}} F(u),
\]
and show that some subsequence of $(u_n)$ converges to a minimizer $u \in \mathcal{C}$.
For this strategy to succeed, several conditions should be met:
(1) Some compactness condition has to hold so that a minimizing sequence contains a convergent subsequence. This requires the careful selection of a suitable topology on $\mathcal{C}$.
(2) The limit $u$ of such a subsequence should be contained in $\mathcal{C}$. This is a closedness condition on $\mathcal{C}$. In particular, for (1) and (2) to hold, $\mathcal{C}$ should not be too restrictive. In other words, one should not specify too many properties for a solution $u$ in advance.
(3) Some lower semicontinuity condition of the form
\[
F(u) \le \liminf_{n \to \infty} F(u_n) \quad \text{if } u_n \text{ converges to } u
\]
has to hold, in order to ensure that the limit of a minimizing sequence is indeed a minimizer for $F$.
The lower semicontinuity condition becomes easier to satisfy if the topology of $\mathcal{C}$ is more restrictive, because the stronger the convergence of $u_n$ to $u$ is, the easier that condition is satisfied. That is at variance, however, with the requirement of (1) since for too strong a topology, sequences do not always contain convergent subsequences. Therefore, we expect that the topology for $\mathcal{C}$ has to be carefully chosen so as to balance these various requirements. In order to gain some insights into this aspect, it is useful to approach the problem from an abstract point of view. Thus, we shall return to the concrete integral variational problem raised in the beginning only later.
4.2 Lower semicontinuity
We say that a topological space $X$ satisfies the first axiom of countability, if the neighbourhood system of each point $x \in X$ has a countable base, i.e. there exists a sequence $(U_\nu)_{\nu \in \mathbb{N}}$ of open subsets of $X$ with $x \in U_\nu$ with the property that for every open set $U \subset X$ with $x \in U$ there exists $n \in \mathbb{N}$ with
\[
U_n \subset U.
\]
$X$ satisfies the second axiom of countability if its topology has a countable base, i.e. there exists a family $(U_\nu)_{\nu \in \mathbb{N}}$ of open subsets of $X$ with the property that for every open subset $V$ of $X$ and every $x \in V$, there exists $n \in \mathbb{N}$ with
\[
x \in U_n \subset V.
\]
We note that separable metric spaces $X$ satisfy the second axiom of countability. In fact, let $(x_\nu)_{\nu \in \mathbb{N}}$ be a dense subset of $X$, and let $(r_\mu)_{\mu \in \mathbb{N}}$ be dense in $\mathbb{R}^+$. Then
\[
U(x_\nu, r_\mu) := \{x \in X : d(x, x_\nu) < r_\mu\}
\]
($d(\cdot,\cdot)$ the distance function of $X$) forms a countable base for the topology.
If the first countability axiom is satisfied, topological notions usually admit sequential characterizations. For example, if $(x_n)_{n \in \mathbb{N}} \subset X$ is a sequence in a topological space $X$ satisfying the first axiom of countability, then any accumulation point of $(x_n)$ (i.e. any $x \in X$ with the property that for every neighbourhood $U$ of $x$ and any $m \in \mathbb{N}$, there exists $n \ge m$ with $x_n \in U$) can be obtained as the limit of some subsequence of $(x_n)$. Although we shall often employ weak topologies which typically do not satisfy the first axiom of countability, for our purposes it will usually be sufficient to use sequential versions of topological properties. For that reason, we shall define our topological notions in sequential terms, without adding the word 'sequentially'.
Definition 4.2.1. Let $X$ be a topological space. A function $F : X \to \bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ is called lower semicontinuous (lsc) at $x$ if
\[
F(x) \le \liminf_{n \to \infty} F(x_n)
\]
for any sequence $(x_n)_{n \in \mathbb{N}} \subset X$ converging to $x$. $F$ is called lower semicontinuous if it is lsc at every $x \in X$.
The following properties are immediate:
Lemma 4.2.1.
(i) If $F : X \to \bar{\mathbb{R}}$ is lsc, $\lambda \ge 0$, then $\lambda F$ is lsc.
(ii) If $F, G : X \to \bar{\mathbb{R}}$ are lsc, and if their sum $F + G$ is well defined (i.e. there is no $x \in X$ for which one of the values $F(x), G(x)$ is $+\infty$ and the other one is $-\infty$), then $F + G$ is also lsc.
(iii) For $F, G : X \to \bar{\mathbb{R}}$ lsc, $\inf(F, G)$ is also lsc.
(iv) If $(F_i)_{i \in I}$ is a family of lsc functions, then $\sup_{i \in I} F_i$ is also lsc.
Examples.
(1) Any continuous function is lower semicontinuous.
(2) If $X$ satisfies the first axiom of countability, then $A \subset X$ is open if and only if its characteristic function $\chi_A$ is lsc.
Definition 4.2.2.
(i) Let $X$ be a normed space, with norm $\|\cdot\|$. $F : X \to \bar{\mathbb{R}}$ is weakly proper, if for every sequence $(x_n)_{n \in \mathbb{N}} \subset X$ with $\|x_n\| \to \infty$ we have $F(x_n) \to \infty$ for $n \to \infty$.
(ii) Let $X$ be a topological space. $F : X \to \bar{\mathbb{R}}$ is coercive if every sequence $(x_n) \subset X$ with $F(x_n) \le$ constant (independent of $n$) has an accumulation point.
We now formulate the following general existence theorem for minimizers:
Theorem 4.2.1. Let $X$ be a separable reflexive Banach space, $F : X \to \mathbb{R}$ weakly proper and lower semicontinuous w.r.t. weak convergence. Then there exists a minimizer $x_0$ for $F$, i.e.
\[
F(x_0) = \inf_{x \in X} F(x) \quad (> -\infty).
\]
Proof. Let $(x_n)_{n \in \mathbb{N}}$ be a minimizing sequence for $F$, i.e.
\[
\lim_{n \to \infty} F(x_n) = \inf_{x \in X} F(x).
\]
Since $F$ is weakly proper, $\|x_n\|$ is bounded. Since $X$ is reflexive, after selection of a subsequence, $x_n$ converges weakly to some $x_0 \in X$ by Corollary 2.2.1. By lower semicontinuity of $F$,
\[
F(x_0) \le \lim_{n \to \infty} F(x_n) = \inf_{x \in X} F(x),
\]
and since $x_0 \in X$, we must have in fact equality. Also, since $F$ assumes only finite values by assumption, this implies that
\[
\inf_{x \in X} F(x) > -\infty.
\]
q.e.d.
Remark 4.2.1. The argument of the preceding proof also shows that in a separable reflexive Banach space, a weakly proper functional is coercive w.r.t. the weak topology.
Lower semicontinuity w.r.t. weak convergence is a rather strong property, in fact much stronger than lower semicontinuity w.r.t. the Banach space topology of $X$. Fortunately, there exists a general class of functionals, namely the convex ones, for which the latter property implies the former.
Definition 4.2.3. Let $V$ be a convex subset of a vector space; $F : V \to \bar{\mathbb{R}}$ is called convex if for any $x, y \in V$, $0 \le t \le 1$,
\[
F(tx + (1-t)y) \le tF(x) + (1-t)F(y)
\]
(convexity of $V$ means that $tx + (1-t)y \in V$ whenever $x, y \in V$, $0 \le t \le 1$).
Lemma 4.2.2. Let $V$ be a convex subset of a separable reflexive Banach space, $F : V \to \bar{\mathbb{R}}$ convex and lower semicontinuous. Then $F$ is also lower semicontinuous w.r.t. weak convergence.
Proof. Let $(x_n)_{n \in \mathbb{N}} \subset V$ converge weakly to $x \in V$. We may assume that $F(x_n)$ converges to some $\kappa \in \mathbb{R}$. By Theorem 2.2.4, for every $m \in \mathbb{N}$ and every $\varepsilon > 0$, we may find a convex combination
\[
y_m := \sum_{n=m}^{N} \lambda_n x_n \qquad \Big(\lambda_n \ge 0, \ \sum_{n=m}^{N} \lambda_n = 1\Big)
\]
with
\[
\|y_m - x\| < \varepsilon.
\]
Since $F$ is convex,
\[
F(y_m) \le \sum_{n=m}^{N} \lambda_n F(x_n). \tag{4.2.1}
\]
Given $\varepsilon > 0$, we choose $m = m(\varepsilon) \in \mathbb{N}$ so large that for all $n \ge m$,
\[
F(x_n) < \kappa + \varepsilon.
\]
Letting $\varepsilon$ tend to 0, we get from (4.2.1)
\[
\limsup_{m \to \infty} F(y_m) \le \kappa.
\]
Since $F$ is lower semicontinuous,
\[
F(x) \le \liminf_{m \to \infty} F(y_m) \le \limsup_{m \to \infty} F(y_m) \le \kappa = \lim_{n \to \infty} F(x_n).
\]
This shows weak lower semicontinuity of $F$.
q.e.d.
4.3 The existence of minimizers for convex variational problems
We return to the concrete variational problem discussed in Section 4.1 and begin with:
Lemma 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$, with $f(\cdot, v)$ measurable for all $v \in \mathbb{R}^d$, $f(x, \cdot)$ continuous for all $x \in \Omega$, and
\[
f(x, v) \ge -a(x) + b|v|^p
\]
for almost all $x \in \Omega$, and all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b \in \mathbb{R}$, $p \ge 1$. Then
\[
\Phi(v) := \int_\Omega f(x, v(x))\,dx
\]
is a lower semicontinuous functional on $L^p(\Omega)$, $\Phi : L^p(\Omega) \to \mathbb{R} \cup \{\infty\}$.
Proof. Since $f$ is continuous in $v$, $f(x, v(x))$ is a measurable function, and so $\Phi$ is well-defined on $L^p(\Omega)$, by Theorem 1.1.2. Suppose $(v_n)_{n \in \mathbb{N}}$ converges to $v$ in $L^p(\Omega)$. Then a subsequence converges pointwise almost everywhere to $v$ by Lemma 3.1.3. We shall denote this subsequence again by $(v_n)$, noting that the subsequent arguments may also be applied to any remaining subsequence. Since $f$ is continuous in $v$ (actually, it would suffice to have $f$ lower semicontinuous in $v$), we have
\[
f(x, v(x)) - b|v(x)|^p \le \liminf_{n \to \infty} \big( f(x, v_n(x)) - b|v_n(x)|^p \big).
\]
Because of the lower bound
\[
f(x, v_n(x)) - b|v_n(x)|^p \ge -a(x)
\]
with $a \in L^1(\Omega)$, we may apply the Theorem 1.2.2 of Fatou to conclude
\[
\int_\Omega \big( f(x, v(x)) - b|v(x)|^p \big)\,dx \le \liminf_{n \to \infty} \int_\Omega \big( f(x, v_n(x)) - b|v_n(x)|^p \big)\,dx.
\]
Since $v_n$ converges to $v$ in $L^p(\Omega)$,
\[
\int_\Omega b|v(x)|^p\,dx = \lim_{n \to \infty} \int_\Omega b|v_n(x)|^p\,dx,
\]
and we conclude lower semicontinuity, namely
\[
\int_\Omega f(x, v(x))\,dx \le \liminf_{n \to \infty} \int_\Omega f(x, v_n(x))\,dx.
\]
q.e.d.
Lemma 4.3.2. Under the assumptions of Lemma 4.3.1, assume that $f(x, \cdot)$ is a convex function on $\mathbb{R}^d$ for every $x \in \Omega$. Then $\Phi(v) := \int_\Omega f(x, v(x))\,dx$ defines a convex functional on $L^p(\Omega)$.
Proof. Let $v, w \in L^p(\Omega)$, $0 \le t \le 1$. Then
\[
\begin{aligned}
\Phi(tv + (1-t)w) &= \int_\Omega f(x, tv(x) + (1-t)w(x))\,dx \\
&\le \int_\Omega \{ t f(x, v(x)) + (1-t) f(x, w(x)) \}\,dx && \text{by the convexity of } f \\
&= t\Phi(v) + (1-t)\Phi(w).
\end{aligned}
\]
q.e.d.
We may now obtain a general existence result for the minimizer of a convex variational problem.
Theorem 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open, and suppose $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ satisfies
(i) $f(\cdot, v)$ is measurable for all $v \in \mathbb{R}^d$.
(ii) $f(x, \cdot)$ is convex for all $x \in \Omega$.
(iii) $f(x, v) \ge -a(x) + b|v|^p$ for almost all $x \in \Omega$, all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b > 0$, $p > 1$.
Let $g \in H^{1,p}(\Omega)$, and let $A := g + H_0^{1,p}(\Omega)$. Then
\[
F(u) := \int_\Omega f(x, Du(x))\,dx
\]
assumes its infimum on $A$, i.e. there exists $u_0 \in A$ with
\[
F(u_0) = \inf_{u \in A} F(u).
\]
Proof. By Lemma 4.3.1, $F$ is lower semicontinuous w.r.t. $H^{1,p}(\Omega)$ convergence†, and by Lemma 4.2.2, $F$ then is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence, since $H^{1,p}(\Omega)$ is separable and reflexive for $p > 1$ (see Theorems 3.3.1 and 3.3.3). Let $(u_n)_{n \in \mathbb{N}}$ be a minimizing sequence in $A$, i.e.
\[
\lim_{n \to \infty} F(u_n) = \inf_{u \in A} F(u).
\]
Since
\[
\int_\Omega |Du_n|^p \le \frac{1}{b} F(u_n) + \frac{1}{b} \int_\Omega a(x)\,dx,
\]
$(Du_n)_{n \in \mathbb{N}}$ is bounded in $L^p(\Omega)$, hence $(u_n)_{n \in \mathbb{N}} \subset g + H_0^{1,p}(\Omega)$ is bounded in $H^{1,p}(\Omega)$ by the Poincaré inequality (see Theorem 3.4.2). Since $H^{1,p}(\Omega)$ is a separable reflexive Banach space, by Theorem 3.3.5, after selection of a subsequence, $(u_n)_{n \in \mathbb{N}}$ converges weakly to some $u_0 \in A$ ($A$ is closed under weak convergence, Theorem 3.3.4). Since $F$ is convex by Lemma 4.3.2 and lower semicontinuous by Lemma 4.3.1, it is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence by Lemma 4.2.2. Therefore
\[
F(u_0) \le \lim_{n \to \infty} F(u_n) = \inf_{u \in A} F(u),
\]
and since $u_0 \in A$, we must have equality.
q.e.d.
† Note that convex functions on $\mathbb{R}^d$ are continuous.
Remark 4.3.1. The condition $u \in g + H_0^{1,p}(\Omega)$, i.e. $u - g \in H_0^{1,p}(\Omega)$, is a
(generalized) Dirichlet boundary condition. It means that $u = g$ on $\partial\Omega$
in the sense of Sobolev spaces.
4.4 Convex functionals on Hilbert spaces and Moreau-Yosida
approximation
In this section, we develop a more abstract method for showing the ex-
istence of minimizers of variational problems. It has the advantages that
it does not need the concept of weak convergence and that it provides a
constructive approach for finding the minimizer. In order to concentrate
on the essential aspects, we shall only treat a special situation.
Definition 4.4.1. Let $X$ be a metric space with metric $d(\cdot,\cdot)$, and let
$F : X \to \mathbb{R} \cup \{\infty\}$ be a functional. For $\lambda > 0$, we define the
Moreau-Yosida approximation $F^\lambda$ of $F$ as
$$F^\lambda(x) := \inf_{y\in X}\bigl(\lambda F(y) + d^2(x,y)\bigr) \qquad (4.4.1)$$
for $x \in X$.
Remark 4.4.1. This is different from the definition in Section 5.1 where
we shall take $d(x,y)$ instead of $d^2(x,y)$. Here, one might take $d^\alpha(x,y)$
for any exponent $\alpha > 1$. For our present purposes, it is most convenient
to work with $\alpha = 2$.
We now let $H$ be a Hilbert space with scalar product $(\cdot,\cdot)$ and norm
$\|\cdot\|$ and induced metric $d(x,y) = \|x - y\|$. Let $D(F) \subset H$, and let
$F : D(F) \to \mathbb{R}$ be a functional. We say that $F$ is densely defined if $D(F)$
is dense in $H$. For $x \notin D(F)$, we put $F(x) = \infty$. We say that $F$ is convex
if whenever $\gamma : [0,1] \to H$ is a straight line segment, then for $0 \le t \le 1$
$$F(\gamma(t)) \le (1-t)F(\gamma(0)) + tF(\gamma(1)). \qquad (4.4.2)$$
In particular, if $\gamma(0), \gamma(1) \in D(F)$, then also $\gamma(t) \in D(F)$ for $0 \le t \le 1$.
Lemma 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below,
and lower semicontinuous. Then for every $x \in H$ and $\lambda > 0$, there exists
a unique
$$y^\lambda =: J^\lambda(x)$$
with
$$F^\lambda(x) = \lambda F(y^\lambda) + d^2(x,y^\lambda). \qquad (4.4.3)$$
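Proof. We have to show that the infimum in (4.4.1) is realized by a
unique $y^\lambda$.
Uniqueness: Let $y_1^\lambda, y_2^\lambda$ be solutions of (4.4.3), and let
$$y_0^\lambda = \tfrac12(y_1^\lambda + y_2^\lambda)$$
be their mean value. By convexity of $F$
$$F(y_0^\lambda) \le \tfrac12\bigl(F(y_1^\lambda) + F(y_2^\lambda)\bigr), \qquad (4.4.4)$$
and by Euclidean geometry, if $y_1^\lambda \ne y_2^\lambda$, we have
$$\|x - y_0^\lambda\|^2 < \tfrac12\bigl(\|x - y_1^\lambda\|^2 + \|x - y_2^\lambda\|^2\bigr), \qquad (4.4.5)$$
hence
$$\lambda F(y_0^\lambda) + \|x - y_0^\lambda\|^2 < \lambda F(y_1^\lambda) + \|x - y_1^\lambda\|^2 = \lambda F(y_2^\lambda) + \|x - y_2^\lambda\|^2,$$
contradicting the minimizing property of $y_1^\lambda$ and $y_2^\lambda$. Thus, we must have
$y_1^\lambda = y_2^\lambda$, proving uniqueness.
Existence: (4.4.5) may be refined as follows: For $y_1, y_2 \in H$ and
$$y_0 := \tfrac12(y_1 + y_2)$$
we have for any $x \in H$
$$\|x - y_0\|^2 = \tfrac12\bigl(\|x - y_1\|^2 + \|x - y_2\|^2\bigr) - \tfrac14\|y_1 - y_2\|^2. \qquad (4.4.6)$$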
We now let $(y_n)_{n\in\mathbb{N}}$ be a minimizing sequence, i.e.
$$\lambda F(y_n) + \|x - y_n\|^2 \to \inf_{y\in H}\bigl(\lambda F(y) + \|x - y\|^2\bigr) =: \kappa_\lambda. \qquad (4.4.7)$$
We claim that $(y_n)$ is a Cauchy sequence. For $k, l \in \mathbb{N}$, we put
$$y_{k,l} := \tfrac12(y_k + y_l).$$
Using the convexity of $F$ as in (4.4.4) and (4.4.6), we obtain
$$\lambda F(y_{k,l}) + \|x - y_{k,l}\|^2 \le \tfrac12\bigl(\lambda F(y_k) + \|x - y_k\|^2\bigr) + \tfrac12\bigl(\lambda F(y_l) + \|x - y_l\|^2\bigr) - \tfrac14\|y_k - y_l\|^2. \qquad (4.4.8)$$
By definition of $\kappa_\lambda$ (see (4.4.7)), the left hand side of (4.4.8) cannot be
smaller than $\kappa_\lambda$, and so we conclude that
$$\|y_k - y_l\|^2 \to 0$$
as $k, l \to \infty$, establishing the Cauchy property. Since the norm is
continuous and $F$ is assumed to be lower semicontinuous, the limit $y^\lambda$ of
$(y_n)_{n\in\mathbb{N}}$ then solves (4.4.3). q.e.d.
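To get a concrete feel for the map $J^\lambda$, the following sketch (not part of the original text) takes $H = \mathbb{R}$ and $F(y) = |y|$. Minimizing $\lambda|y| + (x-y)^2$ by hand then gives soft thresholding, $J^\lambda(x) = \operatorname{sign}(x)\max(|x| - \lambda/2, 0)$. The function names and the brute-force grid search used as a cross-check are illustrative assumptions.

```python
import math

def resolvent_abs(x, lam):
    """Closed-form minimizer of lam*|y| + (x - y)**2 over real y (assumes F(y) = |y|)."""
    return math.copysign(max(abs(x) - lam / 2.0, 0.0), x)

def resolvent_grid(x, lam, lo=-5.0, hi=5.0, steps=100001):
    """Brute-force minimizer of the same expression on a uniform grid (cross-check only)."""
    best_y, best_val = lo, float("inf")
    for k in range(steps):
        y = lo + (hi - lo) * k / (steps - 1)
        val = lam * abs(y) + (x - y) ** 2
        if val < best_val:
            best_y, best_val = y, val
    return best_y

if __name__ == "__main__":
    for lam in (0.5, 1.0, 3.0, 10.0):
        print(lam, resolvent_abs(2.0, lam), resolvent_grid(2.0, lam))
```

For small $\lambda$ the value stays close to $x$, and for large $\lambda$ it reaches the minimizer $0$ of $F$, in line with the two limits studied in this section.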
Lemma 4.4.2. Let $F$ and $y^\lambda = J^\lambda(x)$ be as in Lemma 4.4.1. Let $x$ be
in the closure of $D(F)$. Then
$$x = \lim_{\lambda\to 0} J^\lambda(x). \qquad (4.4.9)$$
Proof. Since $x$ is in the closure of $D(F)$, for every $\delta > 0$, we may find
$$x_\delta \in B(x,\delta) := \{y \in H : \|x - y\| < \delta\}$$
with
$$F(x_\delta) < \infty.$$
Then
$$\lim_{\lambda\to 0}\bigl(\lambda F(x_\delta) + \|x - x_\delta\|^2\bigr) \le \delta^2$$
and therefore
$$\limsup_{\lambda\to 0} \kappa_\lambda \le 0 \qquad (4.4.10)$$
(see (4.4.7) for the definition of $\kappa_\lambda$).
Let us now assume that there exists a sequence $\lambda_n \to 0$ for $n \to \infty$ with
$$\|x - y^{\lambda_n}\|^2 \ge \alpha > 0 \quad\text{for all } n. \qquad (4.4.11)$$
Then from (4.4.10)
$$\limsup_{n\to\infty}\bigl(\lambda_n F(y^{\lambda_n}) + \|x - y^{\lambda_n}\|^2\bigr) \le 0, \qquad (4.4.12)$$
hence
$$F(y^{\lambda_n}) \to -\infty \quad\text{as } n \to \infty. \qquad (4.4.13)$$
(4.4.12) and (4.4.13) imply
$$F(y^1) + \|x - y^1\|^2 \le F(y^{\lambda_n}) + \|x - y^{\lambda_n}\|^2 \to -\infty \quad\text{as } n \to \infty,$$
which is impossible. Thus, (4.4.11) cannot hold, and (4.4.9) follows.
q.e.d.
Theorem 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below,
and lower semicontinuous, and $F \not\equiv \infty$. For $x \in H$, we let $y^\lambda = J^\lambda(x)$
as in Lemma 4.4.1. If $(y^{\lambda_n})_{n\in\mathbb{N}}$ is bounded for some sequence $\lambda_n \to \infty$,
then $(y^\lambda)_{\lambda>0}$ converges to a minimizer of $F$ as $\lambda \to \infty$.
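Proof. Since $(y^{\lambda_n})_{n\in\mathbb{N}}$ is bounded and since $y^{\lambda_n}$ minimizes
$$F(y) + \frac{1}{\lambda_n}\|x - y\|^2,$$
we obtain
$$F(y^{\lambda_n}) \to \inf_{y\in H} F(y),$$
so that $(y^{\lambda_n})_{n\in\mathbb{N}}$ is a minimizing sequence for $F$. We now claim that
$$\|x - y^\lambda\|^2$$
is monotonically increasing in $\lambda$. Indeed, let $0 < \mu_1 < \mu_2$. Then by
definition of $y^{\mu_1}$
$$F(y^{\mu_2}) + \frac{1}{\mu_1}\|x - y^{\mu_2}\|^2 \ge F(y^{\mu_1}) + \frac{1}{\mu_1}\|x - y^{\mu_1}\|^2,$$
hence
$$F(y^{\mu_2}) + \frac{1}{\mu_2}\|x - y^{\mu_2}\|^2 \ge F(y^{\mu_1}) + \frac{1}{\mu_2}\|x - y^{\mu_1}\|^2 + \Bigl(\frac{1}{\mu_1} - \frac{1}{\mu_2}\Bigr)\bigl(\|x - y^{\mu_1}\|^2 - \|x - y^{\mu_2}\|^2\bigr).$$
This is compatible with the minimizing property of $y^{\mu_2}$ only if
$$\|x - y^{\mu_2}\|^2 \ge \|x - y^{\mu_1}\|^2,$$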
and monotonicity follows. This monotonicity then implies that
$$\|x - y^\lambda\|^2$$
is bounded independently of $\lambda$ since it is assumed to be bounded for the
sequence $\lambda_n \to \infty$. We next claim that
$$F(y^\lambda)$$
monotonically decreases towards
$$\inf_{y\in H} F(y).$$
Indeed, from the definition of $y^\lambda$,
$$F(y^\lambda) = \inf_{\{y \,:\, \|x-y\| \le \|x-y^\lambda\|\}} F(y),$$
and therefore $F(y^\lambda)$ has to decrease since $\|x - y^\lambda\|$ increases. The limit has
to be $\inf_{y\in H} F(y)$ since this is so for the subsequence $(y^{\lambda_n})_{n\in\mathbb{N}}$. We now
claim that $(y^\lambda)_{\lambda>0}$ satisfies the Cauchy property, i.e. for every $\varepsilon > 0$,
there exists $\lambda_0 > 0$ such that for all $\lambda, \mu \ge \lambda_0$
$$\|y^\lambda - y^\mu\|^2 < \varepsilon.$$
For that purpose, we choose $\lambda_0$ so large that for $\lambda, \mu \ge \lambda_0$
$$\bigl|\,\|x - y^\lambda\|^2 - \|x - y^\mu\|^2\,\bigr| < \frac{\varepsilon}{2}, \qquad (4.4.14)$$
which is possible by the preceding monotonicity and boundedness
results. We may also assume
$$F(y^\lambda) \ge F(y^\mu). \qquad (4.4.15)$$
We let
$$y^{\lambda,\mu} := \tfrac12(y^\lambda + y^\mu).$$
Then from the convexity of $F$, (4.4.15), and (4.4.6),
$$F(y^{\lambda,\mu}) + \frac{1}{\lambda}\|x - y^{\lambda,\mu}\|^2 < F(y^\lambda) + \frac{1}{\lambda}\Bigl(\|x - y^\lambda\|^2 + \frac{\varepsilon}{4} - \frac14\|y^\lambda - y^\mu\|^2\Bigr)$$
by (4.4.14).
This, however, is compatible with the minimizing property of $y^\lambda$ only if
$$\|y^\lambda - y^\mu\|^2 < \varepsilon.$$
Thus $(y^\lambda)_{\lambda>0}$ satisfies the Cauchy property for $\lambda \to \infty$, and it therefore
converges to some $y \in H$. $y$ then minimizes $F$, because $F(y^\lambda)$ decreases
towards $\inf_{y\in H} F(y)$ for $\lambda \to \infty$, and $F$ is lower semicontinuous.
q.e.d.
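The monotonicity statements in the preceding proof can be observed on a one-dimensional model problem. The sketch below is an illustration under assumptions that are not in the text: $H = \mathbb{R}$ and $F(y) = (y-c)^2$ with a hypothetical constant $c = 3$. Setting the derivative of $\lambda(y-c)^2 + (x-y)^2$ to zero gives $y^\lambda = (x + \lambda c)/(1+\lambda)$.

```python
def resolvent_quadratic(x, lam, c=3.0):
    """Minimizer of lam*(y - c)**2 + (x - y)**2 over real y (assumes F(y) = (y - c)**2)."""
    return (x + lam * c) / (1.0 + lam)

def trace(x, lams, c=3.0):
    """Return tuples (lam, y_lam, |x - y_lam|, F(y_lam)) for each lam in lams."""
    rows = []
    for lam in lams:
        y = resolvent_quadratic(x, lam, c)
        rows.append((lam, y, abs(x - y), (y - c) ** 2))
    return rows

if __name__ == "__main__":
    for row in trace(0.0, [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]):
        print(row)
```

In this example $|x - y^\lambda|$ increases with $\lambda$ while $F(y^\lambda)$ decreases, and $y^\lambda$ tends to the minimizer $c$ as $\lambda \to \infty$ and to $x$ as $\lambda \to 0$.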
The preceding reasoning is adapted from J. Jost, Convex functionals
and generalized harmonic maps between metric spaces. Comment. Math.
Helv. 70 (1995), 659-673.
For a more general construction, see J. Jost, Nonpositive Curvature:
Geometric and Analytic Aspects, Birkhäuser, Basel, 1997, pp. 61-4. In
particular, the method also works in uniformly convex Banach spaces.
General references for Moreau-Yosida approximation are the books of
Attouch and Dal Maso quoted in Chapter 6.
Theorem 4.4.1 yields an alternative proof of Theorem 4.3.1 in case
$p = 2$. Namely, Lemma 4.3.1 implies the lower semicontinuity, Lemma
4.3.2 the convexity of the functional, and the Poincaré inequality the
boundedness of any minimizing sequence, as described in the proof of
Theorem 4.3.1. The present proof, however, does not need the concept
of weak convergence. As mentioned, the method extends to uniformly
convex Banach spaces, and thus can also handle arbitrary values of $p > 1$
(see Remark 3.1.2).
4.5 The Euler-Lagrange equations and regularity questions
In this section, we return to the variational problems considered in
Sections 4.1 and 4.3; we consider variational integrals of the form
$$\Phi(u) := \int_\Omega f(x,u(x),Du(x))\,dx, \quad\text{for } u \in H^{1,p}(\Omega)$$
on a bounded, open subset $\Omega$ of $\mathbb{R}^d$, and we make the following
assumptions on $f : \Omega \times \mathbb{R} \times \mathbb{R}^d \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$:
(i) $f(\cdot,u,v)$ is measurable for all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
(ii) $f(x,\cdot,\cdot)$ is differentiable for almost all $x \in \Omega$.
(iii) $|f(x,u,v)| \le c_0 + c_1|u|^p + c_2|v|^p$, $c_0, c_1, c_2$ constants, for almost
all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
Condition (iii) implies that $\Phi(u)$ is finite for $u \in H^{1,p}(\Omega)$, since $\Omega$ is
bounded. (If $\Omega$ is unbounded, this still holds provided $c_0 = 0$.) In the
preceding section, we have obtained some results on the existence of a
minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$, for given $g \in H^{1,p}(\Omega)$. In the
present section, we wish to characterize such minimizers by necessary
conditions. These conditions will assume the form of differential
equations. In fact, these differential equations will hold for arbitrary critical
points of $\Phi$ (as specified in the assumptions of our subsequent results),
and not only for minimizers.
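Theorem 4.5.1. Let $f$ satisfy (in addition to (i)-(iii))
(iv) $$\Bigl|\frac{\partial f}{\partial u}(x,u,v)\Bigr| + \Bigl|\frac{\partial f}{\partial v}(x,u,v)\Bigr| \le c_3 + c_4|u|^p + c_5|v|^p,$$
$c_3, c_4, c_5$ constants, for almost all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
Let $u$ be a minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ ($g \in H^{1,p}(\Omega)$ given).
We then have for all $\varphi \in C_0^\infty(\Omega)$
$$\int_\Omega \Bigl\{\frac{\partial f}{\partial u}(x,u(x),Du(x))\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v_i}(x,u(x),Du(x))\,\frac{\partial\varphi(x)}{\partial x^i}\Bigr\}\,dx = 0. \qquad (4.5.1)$$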
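Proof. Since $u$ is a minimizer for $\Phi$ in $g + H_0^{1,p}(\Omega)$,
$$\Phi(u) \le \Phi(u + t\varphi) \quad\text{for } t \in \mathbb{R},\ \varphi \in C_0^\infty(\Omega). \qquad (4.5.2)$$
We have
$$\Phi(u + t\varphi) = \int_\Omega f(x,u(x) + t\varphi(x), Du(x) + tD\varphi(x))\,dx.$$
By (ii), (iii), (iv), we may apply Corollary 1.2.2 to conclude that $\Phi(u + t\varphi)$
is differentiable w.r.t. $t$, and
$$\frac{d}{dt}\Phi(u + t\varphi) = \int_\Omega \Bigl\{\frac{\partial f}{\partial u}(x,u(x)+t\varphi(x),Du(x)+tD\varphi(x))\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v_i}(x,u(x)+t\varphi(x),Du(x)+tD\varphi(x))\,\frac{\partial\varphi(x)}{\partial x^i}\Bigr\}\,dx. \qquad (4.5.3)$$
Furthermore, (4.5.2) implies
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0. \qquad (4.5.4)$$
Equations (4.5.3) and (4.5.4) imply (4.5.1).
q.e.d.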
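The vanishing of the first variation can be checked on a discretized model problem. The sketch below is only an illustration under assumptions that are not in the text: it uses $d = 1$, $\Omega = (0,1)$, the integrand $f(x,u,v) = v^2$, a uniform grid, and the hypothetical test function $\varphi(x) = x(1-x)$. The affine function with the prescribed boundary values minimizes the discrete energy, so its first variation should vanish, while a non-minimizing competitor gives a nonzero value.

```python
def energy(u, h):
    """Discrete version of the integral of |u'|^2: sum of ((u[i+1]-u[i])/h)**2 * h."""
    return sum(((u[i + 1] - u[i]) / h) ** 2 * h for i in range(len(u) - 1))

def first_variation(u, phi, h, t=1e-3):
    """Central difference approximation of d/dt energy(u + t*phi) at t = 0."""
    plus = [a + t * b for a, b in zip(u, phi)]
    minus = [a - t * b for a, b in zip(u, phi)]
    return (energy(plus, h) - energy(minus, h)) / (2.0 * t)

n = 50
h = 1.0 / n
xs = [i / n for i in range(n + 1)]
phi = [x * (1.0 - x) for x in xs]           # test function vanishing at both end points
u_affine = [2.0 + 3.0 * x for x in xs]      # discrete minimizer for boundary values 2 and 5
u_other = [2.0 + 3.0 * x * x for x in xs]   # same boundary values, not a minimizer

if __name__ == "__main__":
    print(first_variation(u_affine, phi, h))
    print(first_variation(u_other, phi, h))
```

Because the discrete energy is quadratic in $t$, the central difference is exact up to rounding, which is why a fairly large step `t` is acceptable here.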
Remark 4.5.1. From the preceding proof, it is clear that we do not need
to assume that $u$ is a minimizer for $\Phi$. It suffices that $u$ is a critical point
for $\Phi$ in the sense that
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0 \quad\text{for all } \varphi \in C_0^\infty(\Omega). \qquad (4.5.5)$$
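Corollary 4.5.1. Suppose that $f$ satisfies (i)-(iv), and in addition, $f \in C^2$.
If $u \in C^2(\Omega)$ minimizes $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ (or, more
generally, satisfies (4.5.5)), then
$$\sum_{i,j=1}^d \frac{\partial^2 f}{\partial v_i\partial v_j}(x,u(x),Du(x))\,\frac{\partial^2 u}{\partial x^i\partial x^j} + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i\partial u}(x,u(x),Du(x))\,\frac{\partial u}{\partial x^i} + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i\partial x^i}(x,u(x),Du(x)) - \frac{\partial f}{\partial u}(x,u(x),Du(x)) = 0. \qquad (4.5.6)$$
Definition 4.5.1. Equation (4.5.6) is called the Euler-Lagrange
equation for $\Phi$.
Proof (Corollary 4.5.1). By the differentiability assumptions made, we
may integrate (4.5.1) by parts to obtain
$$\int_\Omega \Bigl\{\frac{\partial f}{\partial u}(x,u(x),Du(x)) - \sum_{i=1}^d \frac{d}{dx^i}\Bigl(\frac{\partial f}{\partial v_i}(x,u(x),Du(x))\Bigr)\Bigr\}\,\varphi(x)\,dx = 0.$$
From Lemma 3.2.3 (applied to $\operatorname{supp}\varphi \subset\subset \Omega$, so that the term in $\{\ \}$
is in $L^2$), we then obtain (4.5.6).
q.e.d.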
Equation (4.5.6) constitutes a quasilinear partial differential
equation of second order for $u$. Many such partial differential equations arise
as Euler-Lagrange equations of variational problems. Therefore, if one
wants to solve such an equation, one might try to find a minimizer of
the associated variational problem. However, the existence theory for
minimizers as described in Section 4.3 naturally yields an element $u$
of the Sobolev space $H^{1,p}(\Omega)$, whereas in Corollary 4.5.1 it is required
that $u$ be of class $C^2(\Omega)$. Thus, there exists a gap, since in general
elements of $H^{1,p}(\Omega)$ are not of class $C^2$. It is the task of regularity
theory to bridge this gap, i.e. to show that under suitable assumptions
on $f$, any minimizer of $\Phi$ is smooth, and specifically here of class $C^2$.
The theory of partial differential equations indicates that such a result
does not hold without additional assumptions on $f$, like an ellipticity
assumption, meaning that the matrix $(a^{ij}(x))_{i,j=1,\dots,d}$ with coefficients
$a^{ij}(x) = \frac{\partial^2 f}{\partial v_i\partial v_j}(x,u(x),Du(x))$ is positive definite. Indeed, examples
show that without such an assumption, in general one does not get
smoothness of minimizers. On the positive side, however, we do have de
Giorgi's and Nash's:
Theorem 4.5.2. Let $\Omega$ be open and bounded in $\mathbb{R}^d$, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ be
of class $C^\infty$, with
0)
AH
2
</ ( x, t > ) <A( l + H
2
)
and
for all x G fi, u E l , t>, G M
rf
, with constants A > 0, A < oc,
oil
&<* >
< M( l + \v\) for a constant M < oo.
Let $u \in g + H_0^{1,p}(\Omega)$ be a bounded minimizer of
$F(u) := \int_\Omega f(x,Du(x))\,dx$ ($g \in H^{1,p}(\Omega)$ given). Then $u$ is smooth in $\Omega$
($u \in C^\infty(\Omega)$).
The proof of the theorem of de Giorgi and Nash is too long to be
presented here. We refer to M. Giaquinta, Introduction to Regularity Theory
for Nonlinear Elliptic Systems, Birkhäuser, Basel, 1993, pp. 76-99 and
J. Jost, Partielle Differentialgleichungen, Springer, Berlin, 1998 where a
detailed proof is given. Of course, there also exist extensions of this result
to more general integrands of the form $f(x,u,v)$. We refer the interested
reader to O. Ladyzhenskaya, N. Ural'tseva, Linear and Quasilinear
Elliptic Equations, Academic Press, New York, 1968 (translated from the
Russian), Chapters IV-VI.
One remark is in order here: Since Sobolev functions are only
equivalence classes of functions (in the sense specified at the beginning of
Section 3.1), a more precise version of Theorem 4.5.2 is: Under the stated
assumptions, the equivalence class of $u$ contains a function of class $C^\infty$.
This point, however, usually is assumed to be implicitly understood in
statements of regularity theorems.
In order to display at least one regularity result, however, we consider
a particular example:
For a bounded, open $\Omega \subset \mathbb{R}^d$, $g \in H^{1,2}(\Omega)$, we wish to minimize
Dirichlet's integral
$$D(u) := \int_\Omega |Du(x)|^2\,dx \qquad (4.5.8)$$
in the class $g + H_0^{1,2}(\Omega)$. By Theorem 4.3.1, a minimizer $u$ exists, and
by Theorem 4.5.1, it satisfies
$$\int_\Omega Du(x)\cdot D\varphi(x)\,dx = 0 \quad\text{for all } \varphi \in C_0^\infty(\Omega) \qquad (4.5.9)$$
(here $Du(x)\cdot D\varphi(x) := \sum_{i=1}^d D_iu(x)D_i\varphi(x)$). If $u$ can be shown to be
of class $C^2$, it would satisfy
$$\Delta u(x) = 0 \text{ in } \Omega \qquad \Bigl(\Delta := \sum_{i=1}^d \frac{\partial^2}{(\partial x^i)^2}\Bigr)$$
by Corollary 4.5.1, i.e. it is harmonic. ($\Delta$ is called the Laplace operator.)
This is the famous Dirichlet principle: obtain a harmonic function $u$
in $\Omega$ with boundary values $g$ by minimizing the Dirichlet integral among
all functions with those boundary values.
In order to justify Dirichlet's principle it thus remains to show that any
solution of (4.5.9) is of class $C^2$. Actually, one can show more, namely,
$u \in C^\infty(\Omega)$ (in fact, $u$ is even real analytic in $\Omega$ but this will not be
demonstrated here), and at the same time weaken the assumption. Namely, we
have:
Theorem 4.5.3 (Weyl's lemma). Let $u \in L^1(\Omega)$ satisfy
$$\int_\Omega u(x)\Delta\varphi(x)\,dx = 0 \quad\text{for all } \varphi \in C_0^\infty(\Omega). \qquad (4.5.10)$$
Then $u \in C^\infty(\Omega)$.
Remark 4.5.2.
(1) Clearly, (4.5.9) implies (4.5.10) by definition of $Du$.
(2) The remark made after Theorem 4.5.2 again applies.
Proof (Theorem 4.5.3). We consider the mollifications with a rotationally
symmetric $\varrho$ (and we express this by writing $\varrho$ as a function of $|x|$)
$$u_h(x) = \frac{1}{h^d}\int_\Omega \varrho\Bigl(\frac{|x-y|}{h}\Bigr)u(y)\,dy$$
as in Section 3.2. Given $\varphi \in C_0^\infty(\Omega)$, we restrict $h$ to be smaller than
$\operatorname{dist}(\operatorname{supp}\varphi, \partial\Omega)$. We obtain
$$\int_\Omega u_h(x)\Delta\varphi(x)\,dx = \int_\Omega \frac{1}{h^d}\int_\Omega \varrho\Bigl(\frac{|x-y|}{h}\Bigr)u(y)\,dy\,\Delta\varphi(x)\,dx = \int_\Omega u(x)\Delta\varphi_h(x)\,dx, \qquad (4.5.11)$$
using Fubini's theorem.
Remark 4.5.3. We have also used the fact that $\Delta$ commutes with
mollification, i.e.
$$(\Delta\varphi)_h = \Delta(\varphi_h). \qquad (4.5.12)$$
For this, one needs that $\varrho$ is a function of $|x|$ only, i.e. rotationally
symmetric. Also, this point needs the rotational invariance of the Laplace
operator $\Delta$. Therefore, the present proof does not generalize to other
variational problems.
After this interruption, we return to (4.5.11) and conclude that
$$\int_\Omega u_h(x)\Delta\varphi(x)\,dx = 0 \qquad (4.5.13)$$
by applying (4.5.10) to $\varphi_h \in C_0^\infty(\Omega)$ (by our choice of $h$). Since $u_h$ is
smooth, we obtain e.g. from Corollary 4.5.1
$$\Delta u_h = 0$$
in $\Omega_h := \{x \in \Omega \mid \operatorname{dist}(x,\partial\Omega) > h\}$. Also
$$\int_{\Omega_h} |u_h(y)|\,dy \le \int_\Omega |u(x)|\,dx \qquad (4.5.14)$$
by Fubini's theorem, using $\frac{1}{h^d}\int \varrho\bigl(\frac{|x-y|}{h}\bigr)\,dy = 1$ by (3.2.3), and the right
hand side is $< \infty$ since $u \in L^1(\Omega)$.
Therefore, the functions $u_h$ are uniformly bounded in $L^1$. We now need
Lemma 4.5.1. Let $f \in C^2(\Omega)$ be harmonic, i.e.
$$\Delta f(x) = 0 \text{ in } \Omega.$$
Then $f$ satisfies the mean value property, i.e. for every ball $B(x_0,r) \subset \Omega$,
$$f(x_0) = \frac{1}{\omega_d r^d}\int_{B(x_0,r)} f(x)\,dx = \frac{1}{d\,\omega_d r^{d-1}}\int_{\partial B(x_0,r)} f(x)\,d\sigma(x) \qquad (4.5.15)$$
where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.
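Proof. For $0 < \varrho \le r$,
$$\begin{aligned}
0 &= \int_{B(x_0,\varrho)} \Delta f(x)\,dx \\
&= \int_{\partial B(x_0,\varrho)} \frac{\partial f}{\partial\nu}(x)\,d\sigma(x), \quad\text{where $\nu$ denotes the exterior normal of } B(x_0,\varrho) \\
&= \int_{\partial B(0,1)} \frac{\partial f}{\partial\varrho}(x_0 + \varrho\omega)\,\varrho^{d-1}\,d\omega \quad\text{in polar coordinates } \omega = \frac{x - x_0}{\varrho} \\
&= \varrho^{d-1}\frac{\partial}{\partial\varrho}\int_{\partial B(0,1)} f(x_0 + \varrho\omega)\,d\omega \\
&= \varrho^{d-1}\frac{\partial}{\partial\varrho}\Bigl(\varrho^{1-d}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)\Bigr) \\
&= d\,\omega_d\,\varrho^{d-1}\frac{\partial}{\partial\varrho}\Bigl(\frac{1}{d\,\omega_d\,\varrho^{d-1}}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)\Bigr).
\end{aligned}$$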
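Thus,
$$\frac{1}{d\,\omega_d\,\varrho^{d-1}}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)$$
is constant in $\varrho$, and since its limit for $\varrho \to 0$ is $f(x_0)$ as $f$ is continuous,
it has to coincide with $f(x_0)$ for all $0 < \varrho \le r$. Since
$$\frac{1}{\omega_d r^d}\int_{B(x_0,r)} f(x)\,dx = \frac{d}{r^d}\int_0^r \Bigl(\frac{1}{d\,\omega_d\,\varrho^{d-1}}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)\Bigr)\varrho^{d-1}\,d\varrho,$$
the first equality in (4.5.15) also follows.
q.e.d.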
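The mean value property is easy to test numerically in the plane. The sketch below is an illustration with assumed sample functions that are not in the text: it averages the harmonic polynomial $x^2 - y^2$ and the non-harmonic polynomial $x^2 + y^2$ over a circle, using equally spaced sample points as a stand-in for the boundary integral in (4.5.15) with $d = 2$.

```python
import math

def circle_average(func, x0, y0, r, samples=360):
    """Average of func over equally spaced points on the circle of radius r around (x0, y0)."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        total += func(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / samples

def harmonic(x, y):
    """x^2 - y^2 has vanishing Laplacian."""
    return x * x - y * y

def not_harmonic(x, y):
    """x^2 + y^2 has Laplacian 4, so the mean value property fails."""
    return x * x + y * y

if __name__ == "__main__":
    print(circle_average(harmonic, 0.3, -0.2, 0.5), harmonic(0.3, -0.2))
    print(circle_average(not_harmonic, 0.3, -0.2, 0.5), not_harmonic(0.3, -0.2))
```

For the harmonic polynomial the circle average reproduces the value at the center, while for $x^2 + y^2$ it exceeds the center value by $r^2$.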
We return to the proof of Theorem 4.5.3: Since $u_h$ is harmonic, it
satisfies the mean value properties of Lemma 4.5.1. Since the family $u_h$
is bounded in $L^1$,
$$u_h(x_0) = \frac{1}{\omega_d r^d}\int_{B(x_0,r)} u_h(x)\,dx$$
is bounded for fixed $r$ with $B(x_0,r) \subset \Omega_h$. Therefore, the $u_h$ are
uniformly bounded in $\Omega_{h_0}$ for $0 < h \le \frac{h_0}{2}$. Furthermore, from (4.5.15)
$$|u_h(x_1) - u_h(x_2)| \le \frac{1}{\omega_d r^d}\int_{(B(x_1,r)\setminus B(x_2,r))\,\cup\,(B(x_2,r)\setminus B(x_1,r))} |u_h(x)|\,dx \le c(r)\,|x_1 - x_2| \qquad (4.5.16)$$
for some constant depending on $r$, if $B(x_1,r), B(x_2,r) \subset \Omega_{h_0}$. Therefore,
the gradient of $u_h$ is also uniformly bounded on $\Omega_{h_0}$. Likewise,
derivatives of $u_h$ of all orders can be uniformly bounded on $\Omega_{h_0}$ ($0 < h \le \frac{h_0}{2}$),
either by repeating the same procedure, or by observing that together
with $u_h$, also all derivatives of $u_h$ are harmonic so that (4.5.16) can
be iteratively applied to all derivatives in order to convert a bound on
some derivative into a bound for a higher one. Therefore, a subsequence
of $u_h$ converges towards some smooth function $v$, together with all its
derivatives, as $h \to 0$. Since all the $u_h$ satisfy $\Delta u_h = 0$, so then does $v$:
$$\Delta v = 0 \text{ in } \Omega.$$
Since on the other hand $u_h$ converges to $u$ in $L^1(\Omega)$ by Theorem 3.2.1,
the two limits have to coincide (e.g. by Lemma 3.1.3). Therefore $u = v$,
and consequently $u$ is smooth and harmonic.
q.e.d.
As an application, we consider the following
Example 4.5.1. Let $a : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with
$$0 < \lambda \le a(y) \le \Lambda < \infty \quad\text{for all } y \in \mathbb{R}.$$
Let $\Omega \subset \mathbb{R}^d$ be open. We want to minimize
$$F(u) := \int_\Omega \sum_{i=1}^d a(u(x))\,D_iu(x)\,D_iu(x)\,dx \qquad (4.5.17)$$
in the class $A := g + H_0^{1,2}(\Omega)$, with given $g \in H^{1,2}(\Omega)$. By the
Picard-Lindelöf theorem, the ordinary differential equation
$$\frac{du}{dv} = \frac{1}{\sqrt{a(u)}} \qquad (4.5.18)$$
admits a solution $u(v)$ of class $C^{1,1}$. We then have
$$a(u)\,\frac{du}{dv}\,\frac{du}{dv} = 1. \qquad (4.5.19)$$
Since $\frac{du}{dv} \ge \Lambda^{-1/2} > 0$, the inverse function $v(u)$ exists and is of class $C^{1,1}$
as well, and we have by (4.5.19) and a chain rule for Sobolev functions
that easily follows from the chain rule for differentiable functions by an
approximation argument that
$$\sum_{i=1}^d a(u)\,D_iu\,D_iu = \sum_{i=1}^d D_iv\,D_iv.$$
Therefore, (4.5.17) is transformed into Dirichlet's integral
$$F(u) = D(v).$$
Since the latter admits a smooth minimizer, the original problem (4.5.17)
then admits a minimizer that is of class $C^{1,1}$ in $\Omega$.
Exercises
4.1 Weaken the growth assumption required in (iv) of Theorem 4.5.1.
Hint: Use the Sobolev Embedding Theorem.
4.2 Compute the Euler-Lagrange equations for the variational
integral
$$A(u) := \int_\Omega \sqrt{1 + |Du(x)|^2}\,dx.$$
($A(u)$ represents the volume of the graph of $u$ over $\Omega$. Critical
points are minimal hypersurfaces that can be represented as
graphs over $\Omega$.)
4.3 Compute the Euler-Lagrange equations for
$$E(u) := \int_\Omega \sum_{i,j=1}^d g^{ij}(x)\,D_iu(x)\,D_ju(x)\,(\det g_{ij}(x))^{1/2}\,dx,$$
where $(g^{ij}(x))_{i,j=1,\dots,d}$ is the inverse matrix of $(g_{ij}(x))_{i,j=1,\dots,d}$.
Assume that $(g_{ij}(x))_{i,j=1,\dots,d}$ is positive definite for all $x \in \Omega$.
Show that for given $g \in H^{1,2}(\Omega)$, there exists a unique minimizer
of $E$ among all $u \in H^{1,2}(\Omega)$ with $u - g \in H_0^{1,2}(\Omega)$. (Minimizers
for $E$ are harmonic functions w.r.t. the metric $g_{ij}(x)$.)
5
Nonconvex functionals. Relaxation
5.1 Nonlower semicontinuous functionals and relaxation
From Section 4.3, we recall the following
Theorem 5.1.1. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$
measurable and suppose:
(i) For almost all $x \in \Omega$, $f(x,\cdot)$ is convex on $\mathbb{R}^d$.
(ii) There exist $a \in L^1(\Omega)$, $b \in \mathbb{R}$, $b > 0$, with
$$f(x,v) \ge -a(x) + b|v|^p$$
for almost all $x \in \Omega$ and all $v \in \mathbb{R}^d$.
Then
$$F(u) := \int_\Omega f(x,Du(x))\,dx$$
is lsc and convex on $H^{1,p}(\Omega)$ equipped with its weak topology and assumes
its infimum in the class of all $u \in H^{1,p}(\Omega)$ with $u - g \in H_0^{1,p}(\Omega)$ for
some given $g \in H^{1,p}(\Omega)$.
Here, (ii) is just a coercivity condition ensuring that a minimizing
sequence stays bounded w.r.t. the $H^{1,p}$-norm (w.l.o.g. $F \not\equiv \infty$). (i)
implies that $F$ is lsc w.r.t. the norm topology of $H^{1,p}$, and the convexity
then implies that $F$ is also lsc w.r.t. the weak $H^{1,p}$ topology. Since
bounded sequences in $H^{1,p}$ have weakly convergent subsequences, any
minimizing sequence has a weakly convergent subsequence, and a limit of
such a subsequence then minimizes $F$ by lower semicontinuity.
Not all functionals that one wishes to consider in the calculus of
variations are convex, however. As a motivation for what follows, we consider
the following example of Bolza:
$$\Omega = (0,1) \subset \mathbb{R}, \quad u : (0,1) \to \mathbb{R}, \quad u(0) = 0 = u(1),$$
$$F(u) = \int_0^1 \bigl(u^2(x) + (u'(x)^2 - 1)^2\bigr)\,dx.$$
We claim that
$$\inf\{F(u) : u \in H_0^{1,4}((0,1))\} = 0. \qquad (5.1.1)$$
For the proof, we consider 'sawtooth' functions: Let $n \in \mathbb{N}$,
$$u_n(x) := \begin{cases} x - \dfrac{i}{n} & \text{for } \dfrac{2i}{2n} \le x \le \dfrac{2i+1}{2n}, \\[2mm] -x + \dfrac{i+1}{n} & \text{for } \dfrac{2i+1}{2n} \le x \le \dfrac{2i+2}{2n} \end{cases} \qquad (i = 0, 1, \dots, n-1).$$
$u_n$ is contained in $H_0^{1,\infty}((0,1)) \subset H_0^{1,4}((0,1))$ and satisfies:
$$\text{For all } x \in (0,1): \quad 0 \le u_n(x) \le \frac{1}{2n}, \qquad (5.1.2)$$
$$u_n(0) = 0 = u_n(1), \qquad (5.1.3)$$
$$\text{for almost all } x \in (0,1): \quad |u_n'(x)| = 1. \qquad (5.1.4)$$
Consequently
$$\lim_{n\to\infty} F(u_n) = 0.$$
Since $F(u)$ is nonnegative for every $u$, (5.1.1) follows. The infimum of $F$
therefore cannot be realized by any $H_0^{1,4}$ function, because if we had
$$F(u) = 0,$$
then $u(x) = 0$ for almost all $x \in (0,1)$ and $|u'(x)| = 1$ for almost all
$x \in (0,1)$, and these two conditions are not compatible. (In fact, since
$d = 1$ here, any $u \in H_0^{1,4}((0,1))$ is absolutely continuous, and so $u \equiv 0$ if
$u(x) = 0$ a.e., hence $u$ is differentiable and $u' \equiv 0$. More generally, any
Sobolev function that is constant on some set $A$ has a representative $u$
whose derivative $Du$ vanishes on $A$.)
We have thus shown that the problem
$$F(u) \to \min \quad\text{in } H_0^{1,4}(\Omega)$$
does not have a solution.
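The behaviour of $F$ along the sawtooth sequence can be reproduced numerically. The following sketch is illustrative only: the helper names and the midpoint quadrature with a fixed number of cells are assumptions, not part of the text. Since $|u_n'| = 1$ almost everywhere, only the term $\int u_n^2$ contributes, and a direct computation gives $F(u_n) = 1/(12n^2)$, whereas $F(0) = 1$.

```python
def sawtooth(n, x):
    """u_n(x): distance from x to the nearest multiple of 1/n (teeth of height 1/(2n))."""
    t = (x * n) % 1.0
    return min(t, 1.0 - t) / n

def sawtooth_slope(n, x):
    """Derivative of u_n away from the kinks: +1 on rising pieces, -1 on falling pieces."""
    t = (x * n) % 1.0
    return 1.0 if t < 0.5 else -1.0

def bolza_energy(u, du, cells=20000):
    """Midpoint rule for the integral over (0,1) of u^2 + (u'^2 - 1)^2."""
    h = 1.0 / cells
    total = 0.0
    for k in range(cells):
        x = (k + 0.5) * h
        total += (u(x) ** 2 + (du(x) ** 2 - 1.0) ** 2) * h
    return total

def sawtooth_energy(n):
    """F(u_n) for the sawtooth function with n teeth."""
    return bolza_energy(lambda x: sawtooth(n, x), lambda x: sawtooth_slope(n, x))

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, sawtooth_energy(n), 1.0 / (12.0 * n * n))
    print("F(0) =", bolza_energy(lambda x: 0.0, lambda x: 0.0))
```

The energies tend to zero although the functions $u_n$ themselves tend uniformly to $0$, where the energy equals $1$.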
We observe that our minimizing sequence $(u_n)$ converges to zero
weakly in $H_0^{1,4}$, by (5.1.2) and
$$\int_0^1 u_n'(x)\varphi(x)\,dx = -\int_0^1 u_n(x)\varphi'(x)\,dx \to 0 \quad\text{for all } \varphi \in C_0^\infty((0,1)).$$
However,
$$F(0) = 1 > 0 = \lim_{n\to\infty} F(u_n).$$
Therefore, $F$ is not lsc w.r.t. weak $H^{1,4}$-convergence although the
integrand is continuous in $u'$. As we shall see, this results from the lack of
convexity of the integrand. We also observe that any sequence of
sawtooth functions $u_n$, i.e. satisfying
$$|u_n'| = 1 \text{ a.e.},$$
that converges to $0$ in $L^2$ is a minimizing sequence for $F$.
Remark 5.1.1. Functionals of the type of our example often arise in
optimal control theory as described in Section 5.2 of Part I. For example,
one considers problems of the following type
$$\int_0^T f(t,u(t),\sigma(t))\,dt \to \min \qquad (5.1.5)$$
under the side conditions
$$u(0) = u_0, \quad u(T) = u_T, \qquad (5.1.6)$$
$$u'(t) = g(t,u(t),\sigma(t)) \qquad (5.1.7)$$
with given functions $f$ and $g$. $u$ is called a state variable, $\sigma$ a control
variable. This means that one assumes that $u$ describes the state of
some system evolving in time $t$ whose derivative or rate of change can
be controlled through a parameter $\sigma$. The aim then is to choose $\sigma$ in
such a manner that the functional, often considered as 'cost function',
is minimized.
Thus, one needs to find some equation
$$\sigma(t) = \varphi(t,u(t))$$
for an optimal control $\sigma$ at time $t$ assuming a given state $u(t)$ of the
system. If one knows the optimal control, one can reconstruct the evolution
$u(t)$ of the state of the system from (5.1.6) and (5.1.7) under appropriate
assumptions. The simplest control equation (5.1.7) is
$$u'(t) = \sigma(t),$$
and this leads to minimizing functionals of the type
$$\int_0^T f(t,u(t),u'(t))\,dt.$$
Expressions of the type $(u'(t)^2 - 1)^2$ can occur in many technical
examples, like boats sailing against the wind.
Faced with a problem that one cannot solve, one may contemplate
several options:
One could try to modify the problem, or one might generalize the
concept of a solution, or both.
We shall discuss several such strategies. We first modify the problem
via relaxation. This is an important method in the calculus of variations,
and we therefore discuss it in some generality.
Definition 5.1.1. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$. We define
the lower semicontinuous envelope or relaxed function $sc^-F$ of $F$ as
follows:
$$(sc^-F)(x) := \sup\{\Phi(x) : \Phi : X \to \overline{\mathbb{R}} \text{ is lower semicontinuous with } \Phi(y) \le F(y) \text{ for all } y \in X\}.$$
Lemma 5.1.1. $sc^-F$ is the largest lsc function on $X$ that is $\le F$
everywhere.
In particular, $F$ is lower semicontinuous if and only if $F = sc^-F$.
Proof. $sc^-F$ is lsc as a supremum of lsc functions, see Lemma 4.2.1 (iv).
Obviously, $sc^-F \le F$, and for all lsc $\Phi$ with $\Phi \le F$, we have
$$\Phi \le sc^-F$$
by definition of $sc^-F$.
q.e.d.
Theorem 5.1.2. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$ a function.
Then every accumulation point of a minimizing sequence for $F$ is a
minimum point for $sc^-F$. Consequently, if $F$ is coercive, then $sc^-F$
assumes its minimum, and
$$\min_X sc^-F = \inf_X F.$$
Proof. Let $(x_n)_{n\in\mathbb{N}} \subset X$ be a minimizing sequence for $F$ with
accumulation point $x_0$. Then
$$\begin{aligned}
(sc^-F)(x_0) &\le \liminf_{n\to\infty}\,(sc^-F)(x_n) && \text{by lower semicontinuity of } sc^-F \text{ (see Lemma 5.1.1)} \\
&\le \liminf_{n\to\infty} F(x_n) && \text{since } sc^-F \le F \\
&= \inf_{y\in X} F(y) && \text{since } (x_n) \text{ is a minimizing sequence for } F. \qquad (5.1.8)
\end{aligned}$$
On the other hand, the constant function
$$\Phi(x) \equiv \inf_{y\in X} F(y)$$
is lsc and $\le F$, hence by Lemma 5.1.1 for every $x \in X$
$$\inf_{y\in X} F(y) \le (sc^-F)(x). \qquad (5.1.9)$$
From (5.1.8) and (5.1.9) we conclude
$$(sc^-F)(x_0) = \inf_{y\in X} F(y) = \min_{x\in X}\,(sc^-F)(x). \qquad (5.1.10)$$
This implies the first claim. If $F$ is coercive, then every minimizing
sequence has an accumulation point, and the second claim also follows.
q.e.d.
What does Theorem 5.1.2 tell us for our example?
It simply says that if we cannot minimize our original functional F due
to its lack of lower semicontinuity, we then minimize another functional
instead, one that is lower semicontinuous and as close as possible to F.
Theorem 5.1.2 then says that limits (or more generally, accumulation
points) of minimizing sequences for F do not minimize F, but the re-
laxed functional sc~F. Since sc~F is the largest lsc functional < F by
Lemma 5.1.1 that is the best one can hope for.
It then remains the task to determine the relaxed functional of some
given F. Before proceeding to do so for our example, let us relax ourselves
a little and derive some easy consequences of the definition of the relaxed
functional and consider some easier examples first.
Lemma 5.1.2. Let $X$ satisfy the first axiom of countability. Then $sc^-F$
is the relaxed function for $F : X \to \overline{\mathbb{R}}$ iff the following two conditions
are satisfied:
(i) whenever $x_n \to x$
$$(sc^-F)(x) \le \liminf_{n\to\infty} F(x_n),$$
(ii) for every $x \in X$, there exists a sequence $x_n \to x$ with
$$(sc^-F)(x) \ge \lim_{n\to\infty} F(x_n).$$
Proof. We claim that, since $X$ satisfies the first axiom of countability,
$$(sc^-F)(x) = \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x \text{ in } X\}. \qquad (5.1.11)$$
We denote the right hand side of (5.1.11) by $F^-(x)$. Then $F^-$ is lsc. In
order to verify this, we have to check
$$\liminf_{\nu\to\infty}\bigl(\inf\{\liminf_{n\to\infty} F(y_{\nu,n}) : y_{\nu,n} \to y_\nu\}\bigr) \ge \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x\} \qquad (5.1.12)$$
whenever $y_\nu \to x$. Indeed, otherwise, for some $\delta > 0$, we would find some
diagonal sequence $y_{\nu,n_\nu} \to x$ as $\nu \to \infty$ with
$$F(y_{\nu,n_\nu}) \le \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x\} - \delta,$$
which is impossible. Thus, $F^-$ is sequentially lsc, hence lsc, because $X$
is assumed to satisfy the first axiom of countability. Also, $F^- \le F$, and
for every lsc $\Phi \le F$, we have for $x_n \to x$
$$\Phi(x) \le \liminf_{n\to\infty} \Phi(x_n) \le \liminf_{n\to\infty} F(x_n),$$
and hence
$$\Phi(x) \le F^-(x).$$
Thus, $F^-$ is the largest lsc functional $\le F$, and (5.1.11) follows from
Lemma 5.1.1. It is then easy to see (and left as an exercise) that $F^-(x)$
satisfies and is characterized by the properties (i) and (ii).
q.e.d.
Example 5.1.1. Let $X$ be a topological space, $A \subset X$ a subset. The
indicator function $i_A$ is defined by
$$i_A(x) := \begin{cases} 0 & \text{if } x \in A, \\ \infty & \text{if } x \notin A. \end{cases}$$
We then have
$$sc^-\,i_A = i_{\bar A},$$
where $\bar A$ is the closure of $A$ in $X$.
The characteristic function $\chi_A$ is defined by
$$\chi_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$
Then
$$sc^-\,\chi_A = \chi_{\mathring A},$$
where $\mathring A$ is the complement of $\overline{X \setminus A}$, i.e. the interior of $A$.
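Example 5.1.2. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$,
$$I : L^p(\Omega) \to \overline{\mathbb{R}}$$
defined by
$$I(u) := \begin{cases} \int_\Omega |Du|^p\,dx + \int_\Omega |u|^p\,dx & \text{if } u \in C^1(\Omega), \\ \infty & \text{otherwise.} \end{cases}$$
(Note that $I(u)$ may also be infinite for some $u \in C^1(\Omega)$.) We claim
$$(sc^-I)(u) = \begin{cases} \int_\Omega |Du|^p\,dx + \int_\Omega |u|^p\,dx & \text{if } u \in H^{1,p}(\Omega), \\ \infty & \text{otherwise.} \end{cases}$$
In order to show this, we shall verify the conditions of Lemma 5.1.2: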
(i) $sc^-I$ is lower semicontinuous on $L^p$, which yields condition (i).
The lower semicontinuity is seen as follows:
Suppose $u_n \to u$ in $L^p(\Omega)$. For the purpose of lower
semicontinuity, we may select a subsequence $(w_\nu)_{\nu\in\mathbb{N}} \subset (u_n)_{n\in\mathbb{N}}$ with
$$\lim_{\nu\to\infty}\,(sc^-I)(w_\nu) = \liminf_{n\to\infty}\,(sc^-I)(u_n),$$
and we may also assume that this limit is finite. $(w_\nu)_{\nu\in\mathbb{N}}$ then
is bounded in $H^{1,p}(\Omega)$. A subsequence of $(w_\nu)$ then converges
weakly in $H^{1,p}(\Omega)$ (Theorem 3.3.5), and by the Rellich-Kondrachev
compactness Theorem 3.4.1, it also converges strongly in
$L^p(\Omega)$. The limit has to be $u$, because the original sequence $(u_n)$
was assumed to converge to this limit. Since the $H^{1,p}$-norm is lsc
w.r.t. weak $H^{1,p}$ convergence (Lemma 2.2.7), we have
$$(sc^-I)(u) \le \lim_{\nu\to\infty}\,(sc^-I)(w_\nu) = \liminf_{n\to\infty}\,(sc^-I)(u_n).$$
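(ii) Let $u \in H^{1,p}(\Omega)$. Since $C^1(\Omega) \cap H^{1,p}(\Omega)$ is dense in $H^{1,p}(\Omega)$, we
may find a sequence $(u_n)_{n\in\mathbb{N}} \subset C^1(\Omega) \cap H^{1,p}(\Omega)$ with
$$\lim_{n\to\infty}\Bigl(\int_\Omega |Du_n|^p + \int_\Omega |u_n|^p\Bigr) = \int_\Omega |Du|^p + \int_\Omega |u|^p,$$
i.e.
$$\lim_{n\to\infty} I(u_n) = (sc^-I)(u).$$
If $u \notin H^{1,p}(\Omega)$, then
$$I(u) = (sc^-I)(u) = \infty.$$
This verifies condition (ii).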
Example 5.1.3. Similarly, for

W
' \ o o if u G L
p
(fi) \ CQ
1
(fi),
the relaxed functional is
I cx) otherwise.
Remark 5.1.2. We may also define the above functionals $I$, $I_0$ on $L^p_{loc}(\Omega)$
instead of $L^p(\Omega)$. The relaxed functionals will be given by the same
formulae.
Remark 5.1.3. For $p = 1$, the relaxations of $I$ and $I_0$ are not given
anymore by the $H^{1,1}$-norm, but by the BV-norm which is defined in
Chapter 7.
In metric spaces, there is an alternative useful characterization of the
relaxation of a given functional which we now want to describe.
Definition 5.1.2. Let $X$ be a metric space with distance function $d(\cdot,\cdot)$,
$F : X \to \mathbb{R} \cup \{\infty\}$ be bounded from below, $F \not\equiv \infty$. For $\lambda > 0$, we define
the Moreau-Yosida transform of $F$ as
$$F_\lambda(x) := \inf_{y\in X}\bigl(F(y) + \lambda d(x,y)\bigr). \qquad (5.1.13)$$
Theorem 5.1.3. The functionals $F_\lambda$ satisfy
$$|F_\lambda(x_1) - F_\lambda(x_2)| \le \lambda d(x_1,x_2) \qquad (5.1.14)$$
for every $\lambda > 0$, $x_1, x_2 \in X$. In particular, they are Lipschitz continuous.
For any $x \in X$
$$(sc^-F)(x) = \lim_{\lambda\to\infty} F_\lambda(x). \qquad (5.1.15)$$
Proof. For $x_1, x_2, y \in X$, $\lambda > 0$, we obtain from the triangle inequality
$$F(y) + \lambda d(x_1,y) \le F(y) + \lambda d(x_2,y) + \lambda d(x_1,x_2).$$
The definition of $F_\lambda(x_2)$ implies then
$$\inf_{y\in X}\bigl(F(y) + \lambda d(x_1,y)\bigr) \le F_\lambda(x_2) + \lambda d(x_1,x_2),$$
hence
$$F_\lambda(x_1) \le F_\lambda(x_2) + \lambda d(x_1,x_2).$$
Interchanging the roles of $x_1$ and $x_2$, we conclude
$$|F_\lambda(x_1) - F_\lambda(x_2)| \le \lambda d(x_1,x_2).$$
Since we have now shown that $F_\lambda$ is Lipschitz continuous, and since
$F_\lambda \le F$, we obtain
$$F_\lambda \le sc^-F,$$
hence for all $x \in X$
$$\sup_{\lambda>0} F_\lambda(x) \le (sc^-F)(x). \qquad (5.1.16)$$
For any $\lambda > 0$, we find $x_\lambda \in X$ with
$$F(x_\lambda) + \lambda d(x,x_\lambda) < F_\lambda(x) + \frac{1}{\lambda}.$$
Therefore
$$\lim_{\lambda\to\infty} x_\lambda = x$$
and
$$(sc^-F)(x) \le \liminf_{\lambda\to\infty} F(x_\lambda) \le \liminf_{\lambda\to\infty} F_\lambda(x). \qquad (5.1.17)$$
Equations (5.1.16) and (5.1.17) imply (5.1.15).
q.e.d.
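The statements of Theorem 5.1.3 can be tried out on the real line. The sketch below is an illustration under assumptions not made in the text: $X = \mathbb{R}$ with $d(x,y) = |x-y|$, and the step function $F(y) = 1$ for $y \le 0$, $F(y) = 0$ for $y > 0$, which fails to be lower semicontinuous at $0$. A direct computation gives $F_\lambda(x) = \min(1, \lambda\max(-x,0))$; the grid minimum is only an approximation of the infimum, accurate up to $\lambda$ times the grid spacing.

```python
def step(y):
    """F(y) = 1 for y <= 0 and 0 for y > 0; not lower semicontinuous at y = 0."""
    return 1.0 if y <= 0.0 else 0.0

def transform_grid(F, x, lam, lo=-2.0, hi=2.0, steps=40001):
    """Approximate F_lam(x) = inf_y (F(y) + lam*|x - y|) by a minimum over a uniform grid."""
    best = float("inf")
    for k in range(steps):
        y = lo + (hi - lo) * k / (steps - 1)
        best = min(best, F(y) + lam * abs(x - y))
    return best

def transform_exact(x, lam):
    """Closed form for the step function above: min(1, lam * max(-x, 0))."""
    return min(1.0, lam * max(-x, 0.0))

if __name__ == "__main__":
    for lam in (1.0, 10.0, 100.0):
        print(lam, [transform_exact(x, lam) for x in (-0.5, -0.01, 0.0, 0.3)])
```

As $\lambda \to \infty$ the values tend to $1$ for $x < 0$ and stay $0$ for $x \ge 0$, which is the lower semicontinuous envelope of the step function; in particular the value at $0$ drops from $F(0) = 1$ to $0$.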
5.2 Representation of relaxed functionals via convex
envelopes
Theorem 5.2.1. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$, $f : \mathbb{R}^d \to \mathbb{R}$ continuous with
\[ c_0 |v|^p \le f(v) \le c_1 |v|^p + c_2 \quad \text{for some constants } c_0, c_1, c_2. \]
Let $F : \{u \in H^{1,p}(\Omega) : u - u_0 \in H_0^{1,p}(\Omega)\} \to \mathbb{R}$ be given by
\[ F(u) := \int_\Omega f(Du(x))\,dx \quad (u_0 \in H^{1,p}(\Omega) \text{ given}). \]
Then the relaxed function of $F$ w.r.t. the weak $H^{1,p}$ topology is given by
\[ (sc^- F)(u) = \int_\Omega (cvx^- f)(Du(x))\,dx \]
where
\[ (cvx^- f)(v) := \sup\{ g(v) : g \le f,\ g \text{ convex} \} \]
is the largest convex function $\le f$.
For the proof, we shall need the following:
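Lemma 5.2.1. Let $W = \prod_{i=1}^d (\alpha_i, \beta_i) \subset \mathbb{R}^d$ be an open rectangle, $1 < p < \infty$. We let $f \in L^p(W)$ and extend $f$ periodically to $\mathbb{R}^d$, i.e.
\[ f(x^1 + m_1(\beta_1 - \alpha_1), \ldots, x^d + m_d(\beta_d - \alpha_d)) = f(x^1, \ldots, x^d) \]
for $m_1, \ldots, m_d \in \mathbb{Z}$, $x = (x^1, \ldots, x^d) \in W$, and put
\[ f_n(x) := f(nx) \quad \text{for } n \in \mathbb{N}. \]
Then we have the weak convergence
\[ f_n \rightharpoonup \bar f := \frac{1}{\operatorname{meas} W} \int_W f(x)\,dx \quad \text{in } L^p(W) \text{ for } n \to \infty. \tag{5.2.1} \]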
Proof. First
\[ \int_W |f_n(x)|^p\,dx = \int_W |f(nx)|^p\,dx = \frac{1}{n^d} \int_{nW} |f(y)|^p\,dy = \int_W |f(x)|^p\,dx \]
by the periodicity of $f$. Thus
\[ \|f_n\|_{L^p(W)} = \|f\|_{L^p(W)}. \tag{5.2.2} \]
In the same manner,
\[ \int_W f_n(x)\,dx = \int_W f(x)\,dx = \int_W \bar f\,dx. \tag{5.2.3} \]
Let now $W_0$ be a subrectangle of $W$, written in the form
\[ W_0 = \prod_{i=1}^d (a_i + b_i \alpha_i,\ a_i + b_i \beta_i) \]
or more compactly
\[ W_0 = a + bW \quad (a = (a_1, \ldots, a_d),\ b = (b_1, \ldots, b_d)). \]
Then
\[ \int_{W_0} (f_n(x) - \bar f)\,dx = \int_{a+bW} (f(nx) - \bar f)\,dx = \frac{1}{n^d} \int_{na + nbW} (f(y) - \bar f)\,dy \]
\[ = \frac{1}{n^d} \int_{na + [nb]W} (f(y) - \bar f)\,dy + \frac{1}{n^d} \int_{na + (nb - [nb])W} (f(y) - \bar f)\,dy \]
\[ = \frac{\prod_{i=1}^d [nb_i]}{n^d} \int_W (f(y) - \bar f)\,dy + \frac{1}{n^d} \int_{na + (nb - [nb])W} (f(y) - \bar f)\,dy \]
by periodicity of $f$.
The first term on the right-hand side vanishes by (5.2.3), and thus, again using the periodicity of $f$,
\[ \Big| \int_{W_0} (f_n(x) - \bar f)\,dx \Big| \le \frac{1}{n^d} \int_W |f(y) - \bar f|\,dy. \]
Letting $n \to \infty$, we obtain for every subrectangle $W_0$ of $W$
\[ \lim_{n \to \infty} \int_{W_0} (f_n(x) - \bar f)\,dx = 0. \tag{5.2.4} \]
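Let now $g \in L^q(W)$, with $\frac{1}{p} + \frac{1}{q} = 1$. We have to show
\[ \lim_{n \to \infty} \int_W f_n(x) g(x)\,dx = \int_W \bar f g(x)\,dx. \tag{5.2.5} \]
Given $\varepsilon > 0$, we then find subrectangles $W_1, \ldots, W_k$ ($k = k(\varepsilon)$) and $\lambda_i \in \mathbb{R}$ ($i = 1, \ldots, k$) with
\[ \Big\| g - \sum_{i=1}^k \lambda_i \chi_{W_i} \Big\|_{L^q(W)} < \varepsilon. \tag{5.2.6} \]
(The possibility of approximating $L^q(\Omega)$ functions $g$ ($\Omega$ open in $\mathbb{R}^d$) in such a manner by step functions can easily be seen as follows: Since $C_0^0(\Omega)$ is dense in $L^q(\Omega)$, there exists $\varphi_\varepsilon \in C_0^0(\Omega)$ with $\|g - \varphi_\varepsilon\|_{L^q(\Omega)} < \frac{\varepsilon}{2}$. It is then easy to construct a step function $\sum \lambda_i \chi_{W_i}$ ($\lambda_i \in \mathbb{R}$, $W_i$ disjoint rectangles contained in $\operatorname{supp} \varphi_\varepsilon$) with
\[ \sup_{\operatorname{supp} \varphi_\varepsilon} \Big| \varphi_\varepsilon - \sum \lambda_i \chi_{W_i} \Big| < \frac{\varepsilon}{2 (\operatorname{meas} \operatorname{supp} \varphi_\varepsilon)^{1/q}}. \]
Then indeed
\[ \Big\| g - \sum \lambda_i \chi_{W_i} \Big\|_{L^q(\Omega)} < \varepsilon.) \]
Then
\[ \Big| \int_W (f_n(x) - \bar f) g(x)\,dx \Big| \le \Big| \int_W (f_n(x) - \bar f) \sum_{i=1}^k \lambda_i \chi_{W_i}(x)\,dx \Big| + \Big| \int_W (f_n(x) - \bar f) \Big( g(x) - \sum_{i=1}^k \lambda_i \chi_{W_i}(x) \Big)\,dx \Big| \]
\[ \le \sum_{i=1}^k |\lambda_i| \Big| \int_{W_i} (f_n(x) - \bar f)\,dx \Big| + \varepsilon \|f_n - \bar f\|_{L^p(W)} \]
by (5.2.6) and Hölder's inequality (Lemma 3.1.1).
The first term tends to zero as $n \to \infty$ by (5.2.4), whereas the second one is bounded by $2\varepsilon \|f\|_{L^p(W)}$ by (5.2.2) and can hence be made arbitrarily small. Therefore, (5.2.5) holds.
q.e.d.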
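The statement of Lemma 5.2.1 can be illustrated numerically in one dimension. The sketch below assumes $W = (0,1)$, the 1-periodic profile $f(y) = \sin^2(2\pi y)$ (mean value $\frac12$) and the test function $g(x) = x^2$; these choices, and the helper names `pairing` and `l2_distance_to_mean`, are illustrative assumptions and not part of the text. The pairing $\int_W f_n g$ approaches $\bar f \int_W g = \frac16$, while the $L^2$ distance of $f_n$ to its mean stays constant, so the convergence is weak but not strong.

```python
import numpy as np

def f(y):
    # 1-periodic profile with mean value 1/2 over one period
    return np.sin(2 * np.pi * y) ** 2

def pairing(n, g, num=200_000):
    # midpoint rule for the integral of f(n x) g(x) over W = (0, 1)
    x = (np.arange(num) + 0.5) / num
    return float(np.mean(f(n * x) * g(x)))

def l2_distance_to_mean(n, num=200_000):
    # L^2(W) norm of f_n minus its mean value 1/2
    x = (np.arange(num) + 0.5) / num
    return float(np.sqrt(np.mean((f(n * x) - 0.5) ** 2)))

g = lambda x: x ** 2
weak_limit = 0.5 * (1.0 / 3.0)   # mean of f times the integral of g

for n in (1, 5, 50):
    print(n, pairing(n, g), weak_limit, l2_distance_to_mean(n))
```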
The proof of Theorem 5.2.1 will be broken up into several steps:
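(1) We put
\[ (q^- f)(v) := \inf \Big\{ \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx : \varphi \in H_0^{1,p}(U),\ U \text{ bounded domain in } \mathbb{R}^d \Big\}, \tag{5.2.7} \]
and we claim:
Lemma 5.2.2.
\[ (sc^- F)(u) \le \int_\Omega (q^- f)(Du(x))\,dx. \]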
Proof. Replacing $F(u)$ by $G(v) := F(v + u_0)$ for $v = u - u_0$, we may assume $u_0 = 0$, i.e. $u \in H_0^{1,p}(\Omega)$. Since the piecewise affine functions, i.e. those $u$ for which $Du$ is constant on disjoint rectangles $W_i \subset \Omega$, with $\Omega \setminus \bigcup W_i$ arbitrarily small, are dense in $H^{1,p}$ (for the same reason that the functions that are piecewise constant on disjoint rectangles $W_i$ are dense in $L^p$), and since $F$ is continuous under strong $H^{1,p}$-convergence, it suffices to treat the case where
\[ Du = v_0 = \text{constant} \]
on some rectangle $W$. We next observe that for a given constant vector $v$, $(q^- f)(v)$ is independent of the choice of $U$ in (5.2.7). First, the value of the inf on the right-hand side of (5.2.7) does not change under translations or homotheties of $U$. The general case of $U_1$ and $U_2$ then is handled by approximating $U_1$ by disjoint homothetical translations of $U_2$ and vice versa. We may therefore take $U = W$ in (5.2.7). We now choose a sequence $(\varphi_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(W)$ with
\[ (q^- f)(v_0) + \frac{1}{n} \ge \frac{1}{\operatorname{meas} W} \int_W f(v_0 + D\varphi_n(x))\,dx \ge (q^- f)(v_0). \tag{5.2.8} \]
We extend $\varphi_n$ periodically from $W$ to $\mathbb{R}^d$ and put
\[ u(x) := v_0 \cdot x \quad (\text{then } Du = v_0) \]
and
\[ u_n(x) := u(x) + \frac{1}{n} \varphi_n(nx). \]
By Lemma 5.2.1, $u_n$ converges to $u$ weakly in $H^{1,p}$. Then $u_n = u$ on $\partial W$ by periodicity of $\varphi_n$ and $\varphi_n|_{\partial W} = 0$. We have
\[ \int_W f(Du_n(x))\,dx = \int_W f(v_0 + D\varphi_n(nx))\,dx = \frac{1}{n^d} \int_{nW} f(v_0 + D\varphi_n(y))\,dy = \int_W f(v_0 + D\varphi_n(y))\,dy \tag{5.2.9} \]
since $\varphi_n$ is periodic.
Equations (5.2.8) and (5.2.9) imply
\[ \lim_{n \to \infty} F(u_n) = \lim_{n \to \infty} \int_W f(Du_n(x))\,dx = \int_W (q^- f)(v_0) = (q^- f)(v_0) \operatorname{meas} W. \]
The claim then follows from the characterization of $(sc^- F)$, see e.g. Lemma 5.1.2(i).
q.e.d.
(2) We observe
\[ (q^- f)(v) \le f(v) \quad (\text{put } \varphi = 0 \text{ in (5.2.7)}). \tag{5.2.10} \]
With
\[ (q^- F)(u) := \int_\Omega (q^- f)(Du(x))\,dx, \]
we obtain from Lemma 5.2.2 and (5.2.10)
\[ sc^- F = sc^- (q^- F), \tag{5.2.11} \]
and upon iteration
\[ sc^- F = sc^- ((q^-)^n F), \tag{5.2.12} \]
where $(q^-)^n$ means performing the construction $q^-$ iteratively $n$ times. From the growth conditions on $f$ assumed in Theorem 5.2.1, we conclude that
\[ ((q^-)^n f)(v) \]
is monotonically decreasing and bounded from below in $n$, hence converges to some limit
\[ (Qf)(v). \]
From B. Levi's Theorem 1.2.1, we conclude
\[ \lim_{n \to \infty} ((q^-)^n F)(u) = \lim_{n \to \infty} \int_\Omega ((q^-)^n f)(Du(x))\,dx = \int_\Omega (Qf)(Du(x))\,dx =: (QF)(u). \tag{5.2.13} \]
Since by (5.2.12)
\[ (sc^- F)(u) \le ((q^-)^n F)(u) \quad \text{for all } n, \]
we conclude from (5.2.13)
\[ (sc^- F)(u) \le (QF)(u). \]
From the definition of $Qf$, we also conclude
\[ Qf(v) = \inf \Big\{ \frac{1}{\operatorname{meas} U} \int_U (Qf)(v + D\varphi(x))\,dx :\ \varphi \in H_0^{1,p}(U),\ U \subset \mathbb{R}^d \text{ open, bounded} \Big\}. \tag{5.2.14} \]
As before, this expression is independent of the choice of $U$.
Definition 5.2.1. $g : \mathbb{R}^d \to \mathbb{R}$ is called quasiconvex if for all $v \in \mathbb{R}^d$, $\varphi \in H_0^{1,p}(U)$, $U \subset \mathbb{R}^d$ bounded and open
\[ g(v) \le \frac{1}{\operatorname{meas} U} \int_U g(v + D\varphi(x))\,dx. \tag{5.2.15} \]
Equation (5.2.14) then implies that $Qf$ is quasiconvex.
(3) Lemma 5.2.3. $f : \mathbb{R}^d \to \mathbb{R}$ is convex if and only if it is quasiconvex.
Proof. '$\Rightarrow$':
Jensen's inequality says that if $f$ is convex, for every $\psi \in L^1(U, \mathbb{R}^d)$
\[ f\Big( \frac{1}{\operatorname{meas} U} \int_U \psi(x)\,dx \Big) \le \frac{1}{\operatorname{meas} U} \int_U f(\psi(x))\,dx \tag{5.2.16} \]
(see Theorem 1.1.6). Since, as observed above, in Definition 5.2.1 it suffices to consider one fixed domain $U$, we may assume
\[ \operatorname{meas} U = 1 \]
and put
\[ \psi(x) = v + D\varphi(x). \]
Since $\varphi \in H_0^{1,p}(U)$, $\int_U \psi = v \operatorname{meas} U = v$, and (5.2.16) therefore implies that $f$ is quasiconvex.
'$\Leftarrow$':
We assume that $f$ is quasiconvex, i.e.
\[ f(v_0) = \frac{1}{\operatorname{meas} U} \int_U f(v_0)\,dx \le \frac{1}{\operatorname{meas} U} \int_U f(v_0 + D\varphi(x))\,dx \tag{5.2.17} \]
for all $\varphi \in H_0^{1,p}(U)$.
We have to show that for all $v_1, v_2 \in \mathbb{R}^d$, $0 \le t \le 1$
\[ f(tv_1 + (1-t)v_2) \le t f(v_1) + (1-t) f(v_2). \tag{5.2.18} \]
Equation (5.2.17) implies
\[ f(tv_1 + (1-t)v_2) \le \frac{1}{\operatorname{meas} U} \int_U f(tv_1 + (1-t)v_2 + D\varphi(y))\,dy \tag{5.2.19} \]
for all $U$ and all $\varphi \in H_0^{1,p}(U)$. After a rotation, we may assume that $v_1 - v_2$ is a positive multiple of the first basis vector of our standard basis of $\mathbb{R}^d$, i.e. $v_1 - v_2$ points in the $x^1$-direction. We shall take a cube $W := (a,b)^d \subset \mathbb{R}^d$ as our set $U$ and construct a family of functions
\[ (\varphi_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(W) \]
with
\[ D\varphi_n(x) = \begin{cases} (1-t)(v_1 - v_2) & \text{on a set } W_n^1 \subset W \text{ with } \operatorname{meas} W_n^1 = t(b-a)\big(b-a-\tfrac{2}{n}\big)^{d-1} \\ -t(v_1 - v_2) & \text{on a set } W_n^2 \subset W \text{ with } \operatorname{meas} W_n^2 = (1-t)(b-a)\big(b-a-\tfrac{2}{n}\big)^{d-1} \end{cases} \]
and $\|D\varphi_n\|_{L^\infty(W)} \le c_0$ for some fixed constant $c_0$ that does not depend on $n$.
Using these $\varphi_n$ in (5.2.19) yields
\[ f(tv_1 + (1-t)v_2) \le t f(v_1) + (1-t) f(v_2) + \rho_n \]
with $\rho_n \to 0$ as $n \to \infty$, hence (5.2.18).
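It remains to construct $\varphi_n$. We divide the interval $(a,b)$ into $2^{n+1}$ subintervals as follows:
\[ I_1 = \Big( a,\ a + \frac{t}{2^n}(b-a) \Big) \]
\[ I_2 = \Big( a + \frac{t}{2^n}(b-a),\ a + \frac{1}{2^n}(b-a) \Big) \]
\[ I_3 = \Big( a + \frac{1}{2^n}(b-a),\ a + \frac{1}{2^n}(b-a) + \frac{t}{2^n}(b-a) \Big) \]
i.e. the intervals $I_{2\nu-1}$ have length $\frac{t}{2^n}(b-a)$, and they alternate with the intervals $I_{2\nu}$ of length $\frac{1-t}{2^n}(b-a)$. We then put
\[ W_n^1 := \Big( \bigcup_{\nu=1}^{2^n} I_{2\nu-1} \Big) \times \Big( a + \frac{1}{n},\ b - \frac{1}{n} \Big)^{d-1} \]
\[ W_n^2 := \Big( \bigcup_{\nu=1}^{2^n} I_{2\nu} \Big) \times \Big( a + \frac{1}{n},\ b - \frac{1}{n} \Big)^{d-1}. \]
We then put $\varphi_n(a, x^2, \ldots, x^d) = 0$,
\[ \frac{\partial \varphi_n}{\partial x^1}(x) = \begin{cases} (1-t)|v_1 - v_2| & \text{for } x \in W_n^1 \\ -t|v_1 - v_2| & \text{for } x \in W_n^2, \end{cases} \]
\[ \frac{\partial \varphi_n}{\partial x^i}(x) = 0 \quad \text{for } i = 2, \ldots, d. \]
(Remember that we assume that $v_1 - v_2$ points in the positive $x^1$-direction.)
We then have $\varphi_n(b, x^2, \ldots, x^d) = 0$. We also put
\[ \varphi_n = 0 \quad \text{on } \partial W, \]
and on $W \setminus (W_n^1 \cup W_n^2)$ we choose an interpolation that is affine linear in $x^2, \ldots, x^d$. Since
\[ \sup_{x \in W_n^1 \cup W_n^2} |\varphi_n(x)| \le \frac{1}{2^n}(b-a)|v_1 - v_2|, \]
we get
\[ \sup_{x \in W \setminus (W_n^1 \cup W_n^2),\ i = 2, \ldots, d} \Big| \frac{\partial \varphi_n}{\partial x^i}(x) \Big| \le \frac{n}{2^n}(b-a)|v_1 - v_2|. \]
Thus, for large enough $n$,
\[ \sup |D\varphi_n(x)| = \sup \Big| \frac{\partial \varphi_n}{\partial x^1}(x) \Big| \le |v_1 - v_2| =: c_0. \]
This completes the construction of $\varphi_n$ and the proof of Lemma 5.2.3.
q.e.d.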
(4) We may now complete the proof of Theorem 5.2.1. From (2), we know
\[ (sc^- F)(u) \le QF(u) = \int_\Omega Qf(Du(x))\,dx. \]
By Lemma 5.2.3, $Qf$ is convex. By Lemma 4.3.1, $QF$ therefore is lsc w.r.t. weak $H^{1,p}$ convergence. Since $QF \le F$ (see (5.2.10) and the definition of $QF$), we must also have from the definition of $sc^- F$ that
\[ QF(u) \le (sc^- F)(u). \]
Hence equality. Thus
\[ (sc^- F)(u) = \int_\Omega (Qf)(Du(x))\,dx. \]
Moreover, for every convex function $g \le f$,
\[ G(u) := \int_\Omega g(Du(x))\,dx \]
is a weakly $H^{1,p}$ lsc functional $\le F$. Therefore, from the definition of $sc^- F$, the convex function $Qf$ must in fact be the largest convex function $\le f$. This completes the proof.
q.e.d.
Corollary 5.2.1. $F$ as in Theorem 5.2.1 is weakly lower semicontinuous in $H^{1,p}$ if and only if $f$ is convex.
Proof. Lemma 4.3.1 says that convex functionals are weakly lower semicontinuous. If $f$ is not convex, then by Theorem 5.2.1 $sc^- F \ne F$, hence $F$ is not weakly lsc by Lemma 5.1.1.
q.e.d.
Remark 5.2.1. One may also consider variational problems for vector-valued functions $u : \Omega \subset \mathbb{R}^d \to \mathbb{R}^n$,
\[ F(u) := \int_\Omega f(Du(x))\,dx. \]
Again, $f$ is called quasiconvex if for all open and bounded $U \subset \mathbb{R}^d$ and all $\varphi \in H_0^{1,p}(U; \mathbb{R}^n)$, $v \in \mathbb{R}^{nd}$
\[ f(v) \le \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx. \]
In this case, however, while convex functions are still quasiconvex, the converse is no longer true. Theorem 5.2.1 continues to hold but with convexity replaced by quasiconvexity. Also, one may consider more general problems of the form
\[ F(u) = \int_\Omega f(x, u(x), Du(x))\,dx \]
with similar results and conceptually similar, but technically more involved proofs.
Remark 5.2.2. The notion of quasiconvexity and many of the basic corresponding lower semicontinuity results are due to C. Morrey. In fact, the quasiconvex functionals are precisely the weakly lower semicontinuous ones. For detailed references to the work of Morrey and other researchers, see the book of Dacorogna quoted at the end of this chapter.
Remark 5.2.3. Theorem 5.2.1 can be considered as a representation theorem for relaxed functionals. In particular, it says that a functional on $H^{1,p}$ obtained by integrating an integrand $f(Du(x))$ (with certain technical assumptions on $f$) has a relaxed functional of the same type, i.e. again representable by integration w.r.t. some integrand $g(Du(x))$ of the same type. Furthermore, $g$ may be computed explicitly from $f$.
We now return to our initial example
\[ F(u) = \int_0^1 \{ u^2(x) + (u'(x)^2 - 1)^2 \}\,dx \]
for $u \in H_0^{1,4}((0,1))$. $F(u)$ is the sum of a functional which is continuous w.r.t. strong $L^2$-convergence, hence also w.r.t. weak $H^{1,4}$ convergence, and another one to which Theorem 5.2.1 applies. We conclude that
\[ (sc^- F)(u) = \int_0^1 \{ u^2(x) + Q(u'(x)) \}\,dx, \]
with
\[ Q(v) = \begin{cases} (v^2 - 1)^2 & \text{if } |v| \ge 1 \\ 0 & \text{otherwise,} \end{cases} \]
the largest convex function $\le (v^2 - 1)^2$.
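The convex envelope in this example can be checked numerically. The sketch below is an illustration under assumptions, not a method from the text: it samples $f(v) = (v^2-1)^2$ on a grid over $[-2,2]$, computes the lower convex hull of the sampled graph with a monotone-chain scan, and compares the result with the formula for $Q$. The helper name `convex_envelope` and the grid are hypothetical choices.

```python
import numpy as np

def convex_envelope(xs, fs):
    """Lower convex hull of the points (xs[i], fs[i]), evaluated on xs (xs increasing)."""
    hull = []  # indices of the current hull vertices
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # cross <= 0: the middle point lies on or above the chord from p0 to the new point
            cross = (xs[i1] - xs[i0]) * (fs[i] - fs[i0]) - (fs[i1] - fs[i0]) * (xs[i] - xs[i0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(xs, xs[hull], fs[hull])

vs = np.linspace(-2.0, 2.0, 401)
f_vals = (vs ** 2 - 1.0) ** 2
envelope = convex_envelope(vs, f_vals)

# Formula from the text: Q(v) = (v^2 - 1)^2 for |v| >= 1, and 0 otherwise.
Q = np.where(np.abs(vs) >= 1.0, f_vals, 0.0)
print(float(np.max(np.abs(envelope - Q))))
```

The hull flattens the double well between $v = -1$ and $v = 1$ and leaves $f$ unchanged where it is already convex, which is the behaviour the formula for $Q$ describes.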
References
For the definition of relaxation and its general properties:
G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston 1993,
pp. 28-37.
G. Buttazzo, Semicontinuity, Relaxation and Integral Representation in the
Calculus of Variations, Pitman Research Notes in Math. 207, Longman
Scientific, Harlow, Essex, 1989, pp. 7-28.
For Theorem 5.2.1 and generalizations thereof:
B. Dacorogna, Direct Methods in the Calculus of Variations, Springer,
Berlin, 1989, pp. 197-249.
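Exercises
Determine $sc^- F$ and discuss the relaxation for
\[ F(u) = \int (1 - u'(x))^2 u(x)^2\,dx \quad \text{for } u \in H^{1,4} \text{ with } u(-1) = 0,\ u(1) = 1, \]
\[ F(u) = \int (2x - u'(x))^2 u(x)^2\,dx \quad \text{for } u \in H^{1,4} \text{ with } u(-1) = 0,\ u(1) = 1, \]
and
\[ F(u) = \int \big( (u(x)^2 - a)^2 + (u'(x)^2 - 1)^2 \big)\,dx \quad \text{for } u \in H^{1,4} \text{ with } a \in \mathbb{R}. \]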
Determine $sc^- J$ for $J : L^p(\Omega) \to \mathbb{R} \cup \{\infty\}$ ($\Omega \subset \mathbb{R}^d$ open and bounded),
J(tt):=
f/
n
|Dtirdx + /
n
HA(b ifuecHn)
I oo otherwise.
Why does the proof of Lemma 5.2.3 not work for vector-valued mappings $\mathbb{R}^d \to \mathbb{R}^n$ with $n > 1$, i.e. $f : \mathbb{R}^{dn} \to \mathbb{R}$, $v \in \mathbb{R}^{dn}$, $\varphi \in H_0^{1,p}(U, \mathbb{R}^n)$ as in Remark 5.2.1?
6
Γ-convergence
6.1 The definition of Γ-convergence
In this chapter, we treat the important concept of Γ-convergence, introduced and developed by de Giorgi and his school.
Definition 6.1.1. Let $X$ be a topological space satisfying the first axiom of countability, $F_n : X \to \overline{\mathbb{R}}$ functions ($n \in \mathbb{N}$). We say that $F_n$ Γ-converges to $F$,
\[ F = \Gamma\text{-}\lim_{n \to \infty} F_n \]
if
(i) for every sequence $(x_n)_{n \in \mathbb{N}}$ converging to some $x \in X$,
\[ F(x) \le \liminf_{n \to \infty} F_n(x_n) \]
and
(ii) for every $x \in X$, there exists a sequence $x_n$ converging to $x$ with
\[ F(x) = \lim_{n \to \infty} F_n(x_n). \]
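Example 6.1.1. $F_n : \mathbb{R} \to \mathbb{R}$
\[ F_n(x) := \begin{cases} 1 & \text{for } x \ge \frac{1}{n} \\ nx & \text{for } -\frac{1}{n} \le x \le \frac{1}{n} \\ -1 & \text{for } x \le -\frac{1}{n}. \end{cases} \]
Then
\[ (\Gamma\text{-}\lim F_n)(x) = \begin{cases} 1 & \text{for } x > 0 \\ -1 & \text{for } x \le 0 \end{cases} \]
while the pointwise limit is $0$ for $x = 0$, $1$ for $x > 0$ and $-1$ for $x < 0$.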
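Example 6.1.2. $F_n : \mathbb{R} \to \mathbb{R}$
\[ F_n(x) := \begin{cases} nx & \text{for } 0 \le x \le \frac{1}{n} \\ 2 - nx & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ (\Gamma\text{-}\lim F_n)(x) = 0 \]
which is the same as the pointwise limit.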
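Example 6.1.3. $F_n : \mathbb{R} \to \mathbb{R}$
\[ F_n(x) := \begin{cases} -nx & \text{for } 0 \le x \le \frac{1}{n} \\ nx - 2 & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ (\Gamma\text{-}\lim F_n)(x) = \begin{cases} -1 & \text{for } x = 0 \\ 0 & \text{otherwise,} \end{cases} \]
whereas the pointwise limit is again identically 0. Note that the $F_n$ of 6.1.3 is the negative of the $F_n$ of 6.1.2. Thus, in general
\[ \Gamma\text{-}\lim(-F_n) \ne -(\Gamma\text{-}\lim F_n). \]
Example 6.1.4. $F_n : \mathbb{R} \to \mathbb{R}$
\[ F_n(x) := \begin{cases} \begin{cases} -nx & \text{for } 0 \le x \le \frac{1}{n} \\ nx - 2 & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise} \end{cases} & \text{for odd } n \\ 0 & \text{for even } n. \end{cases} \]
$F_n$ then converges pointwise to 0, but does not Γ-converge at $x = 0$.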
Example 6.1.5. $F_n : \mathbb{R} \to \mathbb{R}$
\[ F_n(x) = \sin nx. \]
Then
\[ (\Gamma\text{-}\lim F_n)(x) = -1, \]
whereas $F_n$ does not converge pointwise.
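The values of the Γ-limits in Examples 6.1.3 and 6.1.5 can be made plausible numerically. Condition (i) forces the Γ-limit at $x$ to lie below the infimum of $F_n$ over small neighbourhoods of $x$ for large $n$, and condition (ii) says this bound is attained along a suitable sequence. The sketch below only samples $\inf_{|y-x|<\delta} F_n(y)$ for one fixed small $\delta$ and one large $n$; the helper names `local_inf`, `sine` and `neg_tent` and all parameter values are hypothetical choices, and the computation is a heuristic check rather than a proof.

```python
import numpy as np

def local_inf(F, n, x, delta, samples=20001):
    """Sampled infimum of F(n, .) over the interval [x - delta, x + delta]."""
    ys = np.linspace(x - delta, x + delta, samples)
    return float(np.min(F(n, ys)))

def sine(n, y):
    # Example 6.1.5
    return np.sin(n * y)

def neg_tent(n, y):
    # Example 6.1.3: -n y on [0, 1/n], n y - 2 on [1/n, 2/n], 0 otherwise
    return np.where((y >= 0) & (y <= 1 / n), -n * y,
                    np.where((y > 1 / n) & (y <= 2 / n), n * y - 2, 0.0))

n, delta = 1000, 0.01
print(local_inf(sine, n, 0.3, delta))      # close to -1 at an arbitrary point
print(local_inf(neg_tent, n, 0.0, delta))  # close to -1 at x = 0
print(local_inf(neg_tent, n, 0.5, delta))  # 0 away from the origin
```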
From Examples 6.1.4 and 6.1.5, we see that among the two notions of pointwise convergence and Γ-convergence, neither one implies the other.
Example 6.1.6. $F_n : X \to \overline{\mathbb{R}}$ converges continuously to $F : X \to \overline{\mathbb{R}}$ if for every $x \in X$ and every neighbourhood $V$ of $F(x)$ in $\overline{\mathbb{R}}$ (i.e. $V = \{ y \in \mathbb{R} : |F(x) - y| < \varepsilon \}$ for some $\varepsilon > 0$ in case $F(x) \in \mathbb{R}$, $V = \{ y \in \mathbb{R} : y > K \} \cup \{\infty\}$ for some $K \in \mathbb{R}$ in case $F(x) = \infty$, and analogously for $F(x) = -\infty$), there exist $n_0 \in \mathbb{N}$ and a neighbourhood $U$ of $x$ with
\[ F_n(y) \in V \]
for all $n \ge n_0$, $y \in U$.
$F_n$ converges continuously if and only if both $F_n$ and $-F_n$ Γ-converge to $F$ and $-F$, respectively. Continuous convergence implies pointwise convergence, and we conclude from Examples 6.1.2 and 6.1.3 that Γ-convergence is weaker than continuous convergence.
Example 6.1.7. Let $X$ satisfy the first axiom of countability,
\[ F_n = F : X \to \overline{\mathbb{R}} \]
a constant sequence. Then
\[ \Gamma\text{-}\lim F_n = (sc^- F) \]
is the relaxed function of $F$. Thus, we have the remarkable phenomenon that a constant sequence may converge to a limit different from the constant sequence element.
Remark 6.1.1. Without changing the content of the definition of Γ-convergence, condition (ii) may be replaced by the following condition which is weaker and therefore easier to verify:
(ii') for every $x \in X$, there exists a sequence $x_n$ converging to $x$ with
\[ \limsup_{n \to \infty} F_n(x_n) \le F(x). \]
The following result is useful in approximation arguments:
Lemma 6.1.1. Let $X$ satisfy the first axiom of countability. Suppose $(x_m)_{m \in \mathbb{N}}$ converges to $x$ in $X$, and
\[ \limsup_{m \to \infty} F(x_m) \le F(x). \]
Suppose that (ii') is satisfied for every $x_m$ (i.e. for every $m$, there exists a sequence $(x_{m,n})_{n \in \mathbb{N}}$ converging to $x_m$ with
\[ \limsup_{n \to \infty} F_n(x_{m,n}) \le F(x_m)). \]
Then (ii') also holds for $x$.
Proof. Since $X$ satisfies the first axiom of countability, we may take a neighbourhood system $(U_\nu)_{\nu \in \mathbb{N}}$ of $x$ and renumber it and take intersections so that
\[ x_m \in U_m \quad \text{for all } m \in \mathbb{N}, \]
and that every sequence $(y_\nu)_{\nu \in \mathbb{N}}$ with $y_\nu \in U_{\mu(\nu)}$ for all $\nu$ and some sequence $\mu(\nu) \to \infty$ as $\nu \to \infty$ converges to $x$. For $n \in \mathbb{N}$, we let
\[ m_n := \max \Big\{ m \in \mathbb{N} : x_{m,n} \in U_m,\ F_n(x_{m,n}) \le \frac{1}{m} + F(x_m) \Big\}. \]
Then
\[ \lim_{n \to \infty} m_n = \infty. \]
Namely, otherwise, we would find $k_0 \in \mathbb{N}$ with
\[ F_{n_\nu}(x_{k,n_\nu}) > \frac{1}{k} + F(x_k) \quad \text{or} \quad x_{k,n_\nu} \notin U_k \]
for all $k \ge k_0$ and some sequence $n_\nu \to \infty$ for $\nu \to \infty$.
To see that this is impossible we simply observe that since $x_{k_0} \in U_{k_0}$ and since $x_{k_0,n}$ converges to $x_{k_0}$ as $n \to \infty$ we have
\[ x_{k_0,n} \in U_{k_0} \quad \text{for all sufficiently large } n, \]
and likewise since we assume
\[ \limsup_{n \to \infty} F_n(x_{k_0,n}) \le F(x_{k_0}), \]
we have
\[ F_n(x_{k_0,n}) \le F(x_{k_0}) + \frac{1}{k_0} \quad \text{for all sufficiently large } n. \]
We then have
\[ F_n(x_{m_n,n}) \le F(x_{m_n}) + \frac{1}{m_n}. \]
Therefore $y_n := x_{m_n,n}$ converges to $x$ as $n \to \infty$, and
\[ \limsup_{n \to \infty} F_n(y_n) \le \limsup_{n \to \infty} \Big( F(x_{m_n}) + \frac{1}{m_n} \Big) \le F(x) \]
by assumption and since $m_n \to \infty$ as $n \to \infty$. Thus, $(y_n)_{n \in \mathbb{N}}$ is the desired sequence.
q.e.d.
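Let $F : X \to \mathbb{R} \cup \{\infty\}$ satisfy
\[ \inf_{y \in X} F(y) > -\infty. \]
Given $\varepsilon > 0$, we say that $x \in X$ is an $\varepsilon$-minimizer of $F$ if
\[ F(x) \le \inf_{y \in X} F(y) + \varepsilon. \]
Note that $x$ is a minimizer of $F$ if it is an $\varepsilon$-minimizer for every $\varepsilon > 0$.
In contrast to minimizers, $\varepsilon$-minimizers always exist for any $\varepsilon > 0$.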
The following result is a trivial consequence of the definition of Γ-convergence, but quite important.
Theorem 6.1.1. (Let $X$ satisfy the first axiom of countability.) Let the sequence of functions $F_n : X \to \overline{\mathbb{R}}$ Γ-converge to $F : X \to \overline{\mathbb{R}}$.
Let $\inf_{y \in X} F_n(y) > -\infty$ for every $n \in \mathbb{N}$.
Let $x_n$ be an $\varepsilon_n$-minimizer for $F_n$.
Assume $\varepsilon_n \to 0$ and $x_n \to x$ for some $x \in X$. Then $x$ is a minimizer for $F$, and
\[ F(x) = \lim_{n \to \infty} F_n(x_n). \tag{6.1.1} \]
Proof. If $x$ were not a minimizer for $F$, there would exist $x' \in X$ with
\[ F(x') < F(x). \tag{6.1.2} \]
Since $F_n$ Γ-converges to $F$, there exists a sequence $(x'_n) \subset X$ with
\[ \lim x'_n = x', \qquad \lim F_n(x'_n) = F(x'). \]
We put $\delta := \frac{1}{4}(F(x) - F(x'))$. We may choose $n$ so large that
\[ \varepsilon_n < \delta \tag{6.1.3} \]
\[ F_n(x'_n) < F(x') + \delta \tag{6.1.4} \]
\[ F_n(x_n) > F(x) - \delta \quad \text{(by property (i) of Definition 6.1.1)}. \tag{6.1.5} \]
Since $x_n$ is an $\varepsilon_n$-minimizer of $F_n$,
\[ F_n(x'_n) \ge F_n(x_n) - \varepsilon_n \tag{6.1.6} \]
\[ > F_n(x_n) - \delta \quad \text{by (6.1.3)} \]
\[ > F(x) - 2\delta \quad \text{by (6.1.5)}. \]
From (6.1.4) and (6.1.6), we get
\[ F(x) < F(x') + 3\delta \]
contradicting (6.1.2) by definition of $\delta$. Thus, $x$ is a minimizer for $F$. If (6.1.1) did not hold, then after selection of a subsequence,
\[ F(x) < \lim F_n(x_n) \]
whereas by property (ii) of Definition 6.1.1, there would exist a sequence $(x'_n)$ converging to $x$ with
\[ F(x) = \lim F_n(x'_n), \]
and we would again contradict the $\varepsilon_n$-minimizing property of $x_n$.
q.e.d.
Corollary 6.1.1. (Let $X$ satisfy the first axiom of countability.) Let $F_n : X \to \overline{\mathbb{R}}$ Γ-converge to $F : X \to \overline{\mathbb{R}}$. Let $x_n$ be a minimizer for $F_n$. If $x_n \to x$, then $x$ minimizes $F$, and
\[ F(x) = \liminf F_n(x_n). \]
The following result is similarly both trivial and important.
Theorem 6.1.2. (Let $X$ satisfy the first axiom of countability.) Let $F_n$ Γ-converge to $F$. Then $F$ is lower semicontinuous.
Proof. Otherwise, there exist some $x \in X$ and some sequence $(x_m)_{m \in \mathbb{N}}$ with
\[ \lim_{m \to \infty} x_m = x \]
\[ \lim_{m \to \infty} F(x_m) < F(x). \tag{6.1.7} \]
By Γ-convergence, for every $m$, there exists a sequence $(x_{m,n})_{n \in \mathbb{N}} \subset X$ with
\[ \lim_{n \to \infty} x_{m,n} = x_m \]
\[ \lim_{n \to \infty} F_n(x_{m,n}) = F(x_m). \tag{6.1.8} \]
We assume $-\infty < \lim F(x_m)$, $F(x) < \infty$ simply to avoid case distinctions. We let
\[ \delta := \frac{1}{4} \Big( F(x) - \lim_{m \to \infty} F(x_m) \Big) > 0 \quad \text{by (6.1.7)}. \]
For every $m \in \mathbb{N}$, we may find $n_m \in \mathbb{N}$ with
\[ F_{n_m}(x_{m,n_m}) - F(x_m) < \delta \tag{6.1.9} \]
\[ \lim_{m \to \infty} x_{m,n_m} = x, \qquad \lim_{m \to \infty} n_m = \infty. \]
Then by Γ-convergence
\[ F(x) \le \liminf_{m \to \infty} F_{n_m}(x_{m,n_m}). \tag{6.1.10} \]
We may then choose $m$ so large that
\[ F(x_m) < F(x) - 3\delta \tag{6.1.11} \]
and
\[ F_{n_m}(x_{m,n_m}) > F(x) - \delta. \tag{6.1.12} \]
Equations (6.1.9), (6.1.11) and (6.1.12) are not compatible, and the resulting contradiction proves the lower semicontinuity.
q.e.d.
Remark. As a consequence of Corollary 3.2.2 and Theorems 3.1.3, 3.3.1, and 3.3.3, in combination with Lemma 2.2.4, the weak topology of $L^p(\Omega)$ and $W^{k,p}(\Omega)$ for $1 < p < \infty$ satisfies the first axiom of countability so that the preceding notions are applicable.
The reference for this section is
G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993
6.2 Homogenization
In this section and the next one, we describe two important examples of Γ-convergence. They are taken from H. Attouch, Variational Convergence for Functions and Operators, Pitman, Boston, 1984.
In the discussion of these two examples, we shall be more sketchy about some technical details than in the rest of the book, because the main point of these examples is to show how the concept of Γ-convergence can be usefully applied to concrete problems that arise in various applications of the calculus of variations.
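Let $M$ be a smooth subset of the open unit cube $(0,1)^d$ of $\mathbb{R}^d$. $M$ is considered as a hole. Let
\[ M_\varepsilon := \bigcup_{m \in \mathbb{Z}^d} \varepsilon(M + m) \]
($\varepsilon(M + m) := \{ x = \varepsilon y + \varepsilon m \text{ with } y \in M \}$) be a periodic lattice of 'holes' of scale $\varepsilon$.
Let $\Omega \subset \mathbb{R}^d$, $\Omega_\varepsilon := \Omega \setminus (M_\varepsilon \cap \Omega)$, i.e. a domain with many small holes. Such domains occur in many physical problems like crushed ice, porous media etc. Often, the physical value of $\varepsilon$ is so small that it is useful to perform the mathematical analysis for $\varepsilon \to 0$. This is called homogenization. Let
\[ a(x) := \iota_{\mathbb{R}^d \setminus M_1}(x) := \begin{cases} 0 & \text{for } x \in \mathbb{R}^d \setminus M_1 \\ \infty & \text{for } x \in M_1 \end{cases} \]
be the indicator function of $\mathbb{R}^d \setminus M_1$. $a(\frac{x}{\varepsilon})$ then is the indicator function of $\mathbb{R}^d \setminus M_\varepsilon$. We consider the functional
\[ F_\varepsilon(u) := \frac{1}{2} \varepsilon^2 \int_\Omega |Du(x)|^2\,dx + \int_\Omega a\Big( \frac{x}{\varepsilon} \Big) u^2(x)\,dx. \tag{6.2.1} \]
A minimizer of the functional
\[ F_\varepsilon(u) - \int_\Omega f(x) u(x)\,dx \]
(for given $f \in L^2(\Omega)$) satisfies
\[ \Delta u = -\frac{f}{\varepsilon^2} \text{ in } \Omega_\varepsilon \quad \text{and} \quad u = 0 \text{ on } \partial\Omega_\varepsilon. \tag{6.2.2} \]
Here $\partial\Omega_\varepsilon = \partial\Omega \cup (\partial M_\varepsilon \cap \Omega)$. The boundary condition on $\partial\Omega$ comes from the requirement that $u \in H_0^{1,2}(\Omega)$, while the boundary condition on $\partial M_\varepsilon$ is forced by the functional.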
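Theorem 6.2.1. With respect to weak $L^2(\Omega)$ convergence
\[ \Gamma\text{-}\lim_{\varepsilon \to 0} F_\varepsilon = F, \tag{6.2.3} \]
with
\[ F(u) := \frac{1}{2\mu(M)} \int_\Omega u^2(x)\,dx, \]
where
\[ \mu(M) := \int_{(0,1)^d \setminus M} |D\eta(x)|^2\,dx = \int_{(0,1)^d} \eta(x)\,dx, \]
and $\eta$ is the solution of
\[ \Delta\eta = -1 \text{ in } (0,1)^d \setminus M, \qquad \eta = 0 \text{ in } M, \tag{6.2.4} \]
$\eta$ is $\mathbb{Z}^d$-periodic (i.e. $\eta(x + m) = \eta(x)$ for $x \in (0,1)^d$, $m \in \mathbb{Z}^d$).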
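Proof. We put $\eta_\varepsilon(x) := \eta(\frac{x}{\varepsilon})$. By Lemma 5.2.1, $\eta_\varepsilon$ converges weakly in $L^2(\Omega)$ to $\mu(M)$ as $\varepsilon \to 0$. Let now $u \in L^2(\Omega)$. By approximation, we may assume that $u$ is smooth, e.g. contained in $W^{1,2}(\Omega) \cap C^0(\Omega)$. We put
\[ u_\varepsilon := \frac{1}{\mu(M)} u \eta_\varepsilon. \]
Then $u_\varepsilon$ converges weakly in $L^2(\Omega)$ to $u$, and
\[ u_\varepsilon = 0 \quad \text{on } M_\varepsilon. \]
Moreover
\[ F_\varepsilon(u_\varepsilon) = \frac{\varepsilon^2}{2} \int_\Omega |Du_\varepsilon|^2 = \frac{\varepsilon^2}{2\mu(M)^2} \int_\Omega \big( u^2 |D\eta_\varepsilon|^2 + 2u\eta_\varepsilon Du \cdot D\eta_\varepsilon + \eta_\varepsilon^2 |Du|^2 \big). \tag{6.2.5} \]
If $U \subset \Omega$ is open, because of the periodicity, $\int_U |D\eta_\varepsilon|^2$ asymptotically behaves like
\[ \frac{\operatorname{meas} U}{\varepsilon^d} \int_{(0,\varepsilon)^d \setminus M_\varepsilon} |D\eta_\varepsilon|^2 = \frac{\operatorname{meas} U}{\varepsilon^2} \int_{(0,1)^d \setminus M} |D\eta|^2. \]
This means that
\[ \lim_{\varepsilon \to 0} \varepsilon^2 \int_U |D\eta_\varepsilon|^2 = \operatorname{meas} U \int_{(0,1)^d \setminus M} |D\eta|^2 = \operatorname{meas} U\, \mu(M) \tag{6.2.6} \]
hence, approximating $u$ by step functions, we also get
\[ \lim_{\varepsilon \to 0} \varepsilon^2 \int_\Omega u^2 |D\eta_\varepsilon|^2 = \mu(M) \int_\Omega u^2 \tag{6.2.7} \]
(note that we assume $u$ to be continuous). Moreover, since $\eta_\varepsilon$ is bounded independently of $\varepsilon$,
\[ \lim_{\varepsilon \to 0} \varepsilon^2 \int_\Omega \eta_\varepsilon^2 |Du|^2 = 0, \tag{6.2.8} \]
and from (6.2.6), (6.2.7) and the Schwarz inequality, also
\[ \lim_{\varepsilon \to 0} \varepsilon^2 \int_\Omega u \eta_\varepsilon Du \cdot D\eta_\varepsilon = 0. \tag{6.2.9} \]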
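Equations (6.2.5)-(6.2.9) imply
\[ \lim_{\varepsilon \to 0} F_\varepsilon(u_\varepsilon) = F(u). \tag{6.2.10} \]
In order to complete the proof of Γ-convergence, we need to verify that whenever functions $v_\varepsilon$ that vanish on $M_\varepsilon$ converge weakly in $L^2(\Omega)$ to $u$, then
\[ \liminf_{\varepsilon \to 0} F_\varepsilon(v_\varepsilon) \ge F(u). \tag{6.2.11} \]
By an approximation argument, we may assume $u \in C_0^\infty(\Omega)$. We put
\[ u_\varepsilon = \frac{1}{\mu(M)} u \eta_\varepsilon \]
as before. We have
\[ F_\varepsilon(v_\varepsilon) + F_\varepsilon(u_\varepsilon) \ge \varepsilon^2 \int_\Omega Dv_\varepsilon \cdot Du_\varepsilon = \frac{\varepsilon^2}{\mu(M)} \int_\Omega \big( \eta_\varepsilon Dv_\varepsilon \cdot Du + u Dv_\varepsilon \cdot D\eta_\varepsilon \big). \tag{6.2.12} \]
Using (6.2.10), we obtain from (6.2.12) in the limit $\varepsilon \to 0$
\[ \liminf_{\varepsilon \to 0} F_\varepsilon(v_\varepsilon) + \frac{1}{2\mu(M)} \int_\Omega u^2 \ge \liminf_{\varepsilon \to 0} \frac{\varepsilon^2}{\mu(M)} \int_\Omega u Dv_\varepsilon \cdot D\eta_\varepsilon \tag{6.2.13} \]
since the other term on the right-hand side of (6.2.12) goes to 0 by a similar reasoning as above. Equation (6.2.4) implies
\[ \varepsilon^2 \Delta\eta_\varepsilon = -1 \quad \text{in } \Omega_\varepsilon. \tag{6.2.14} \]
Moreover
\[ \varepsilon^2 \int_\Omega Du\, v_\varepsilon \cdot D\eta_\varepsilon \le \varepsilon^2 \Big( \int_\Omega |Du|^2 v_\varepsilon^2 \Big)^{1/2} \Big( \int_\Omega |D\eta_\varepsilon|^2 \Big)^{1/2} \to 0, \tag{6.2.15} \]
since $v_\varepsilon$ as a weakly converging sequence is bounded in $L^2$, $|Du|$ is bounded by our approximation assumption that $u$ is smooth enough, and since we may use (6.2.6).
Integrating the right-hand side of (6.2.13) by parts, and using (6.2.14) and (6.2.15), we obtain
\[ \liminf_{\varepsilon \to 0} F_\varepsilon(v_\varepsilon) + \frac{1}{2\mu(M)} \int_\Omega u^2 \ge \liminf_{\varepsilon \to 0} \frac{1}{\mu(M)} \int_\Omega v_\varepsilon u = \frac{1}{\mu(M)} \int_\Omega u^2 \]
since $v_\varepsilon$ converges weakly in $L^2$ to $u$. This implies (6.2.11) and concludes the proof.
q.e.d.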
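6.3 Thin insulating layers
We consider an insulating layer of width $2\varepsilon$ and conductivity $\lambda$, and we want to analyse the limit where $\varepsilon$ and $\lambda$ tend to 0.
Let $\Omega \subset \mathbb{R}^3$ be bounded and open, $S$ a smooth complete surface in $\mathbb{R}^3$, e.g. a plane, $\Sigma := \Omega \cap S$,
\[ S_\varepsilon := \{ x \in \mathbb{R}^3 : \operatorname{dist}(x, S) < \varepsilon \} \]
\[ \Sigma_\varepsilon := \Omega \cap S_\varepsilon \]
\[ \Omega_\varepsilon := \Omega \setminus \overline{\Sigma_\varepsilon}. \]
Conductivity coefficient
\[ a_{\varepsilon,\lambda} := \begin{cases} 1 & \text{on } \Omega_\varepsilon \\ \lambda & \text{on } \Sigma_\varepsilon \end{cases} \qquad (\lambda > 0). \]
Variational problem:
\[ I^{\varepsilon,\lambda} : H_0^{1,2}(\Omega) \to \mathbb{R} \]
\[ I^{\varepsilon,\lambda}(u) := \frac{1}{2} \int_{\Omega_\varepsilon} |Du|^2\,dx + \frac{\lambda}{2} \int_{\Sigma_\varepsilon} |Du|^2\,dx \]
\[ I^{\varepsilon,\lambda}(u) - \int_\Omega fu \to \min \quad (f \in L^2(\Omega) \text{ given}). \tag{6.3.1} \]
The Euler-Lagrange equations are
\[ \Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Omega_\varepsilon \tag{6.3.2} \]
\[ \lambda \Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Sigma_\varepsilon \tag{6.3.3} \]
\[ u_{\varepsilon,\lambda}|_{\Omega_\varepsilon} = u_{\varepsilon,\lambda}|_{\Sigma_\varepsilon} \quad \text{on } \partial\Omega_\varepsilon \cap \partial\Sigma_\varepsilon \tag{6.3.4} \]
\[ \lambda \frac{\partial u_{\varepsilon,\lambda}|_{\Sigma_\varepsilon}}{\partial n_{\Sigma_\varepsilon}} = -\frac{\partial u_{\varepsilon,\lambda}|_{\Omega_\varepsilon}}{\partial n_{\Omega_\varepsilon}} \quad \text{on } \partial\Omega_\varepsilon \cap \partial\Sigma_\varepsilon \tag{6.3.5} \]
(where $n_U$ denotes the exterior normal of a set $U$)
\[ u_{\varepsilon,\lambda} = 0 \quad \text{on } \partial\Omega. \tag{6.3.6} \]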
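Theorem 6.3.1. We let $\varepsilon \to 0$, $\lambda \to 0$. If $\frac{\lambda}{\varepsilon} \to \alpha$ with $0 \le \alpha \le \infty$, then $u_{\varepsilon,\lambda} \rightharpoonup u$ weakly in $L^2(\Omega)$, $u_{\varepsilon,\lambda} \rightrightarrows u$ uniformly on every $\Omega_0 \subset\subset \Omega \setminus \Sigma$,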
w/iere it solves
Au + f = 0 o n f i \ E
^lan = 0
du dit
r
,
and on\
2
where [U]J: is the jump ofu across E, and! | | , and | ^, ore /ie exterior
normal derivatives for the two components of ft \ E. (7n case a = oo, u
is continuous across E, and! Aw = / m ft.) Furthermore
- 5j L
,D
"
l > +
U
M i d
J
C
'
A
r-converges w.r.t. the weak L
2
-topology to I(u):
0<a<oo:I(u) = {Un\z\u\
2
+ lMyV ifueH
l
0
>
2
(n\ll)
I oo otherwise
a = 00
:
I{u) =
lUn\Du\
2
ifueH^n)
I oo otherwise
(in this case,
the result holds
a = 0 : J( ) = ( i /n\= l^ l
2
*
e
#
1,3
(
n
\
E
) { T ^ f
nff
l oo otherwise
L
-topology
in place of
the weak one)
Thus, in case $\alpha = 0$, we obtain a perfect insulation in the limit, whereas for $\alpha = \infty$, the limiting layer does not insulate at all.
We assume for simplicity $S = \{ x^3 = 0 \}$.
Lemma 6. 3. 1. There exists a constant c\ (depending on / , fi, 5, but not
on e, A) such that for all sufficiently small e, A
/
n
< >< c(l
+
(^)
/ \ Du,,^ + \ l |IHi , |' < c
L
( l + -|) .
Proof.
i
2
, A|
f \Du
e
,
x
\
2
+ A / \Du^
= - / Au
c
,
A
ii
C)A
+ / u
yX
~^ - A / Au
c>A
u
c
,
A
+ A / U
C
,
A
-T7
= / / w
C)A
because of the Euler-Lagrange equations
^ I / I L
2
( 0) ' l
M
.*lz,
2
(n) *
By the Poincare inequality (Theorem 3.4.2),
/ u
2
x
<
c
2 / | ^ C, A|
2
/ <
A
< c
3
/ \Du
yX
\
2
.
By a change of scale
y
3
= ex
3
(y
1
= x\y
2
= x
2
),
/ u
2
eX
< c
3
e / \Du^
JJ2

JY>

| 2
:, A|
(we only get e instead of e
2
, because the area of the portion of #E
C
on
which u
e
,\ vanishes, namely dQ D 5

, is proportional to e). Altogether


f <
A
< c
4
( l 4- j) (J \ Du^
x
\
2
+ \ J |Du
f
2
A|
<
and the estimates follow.
q.e.d.
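Proof (Theorem 6.3.1). We only consider the case $0 < \alpha < \infty$ (the other cases follow from a limiting argument). We first observe
\[ \Gamma\text{-}\lim I^{\varepsilon,\lambda}(u) = \infty \quad \forall u \in L^2(\Omega) \setminus H_0^{1,2}(\Omega \setminus \Sigma). \]
We assume for simplicity
\[ \lambda = \lambda(\varepsilon) = \varepsilon\alpha, \qquad I^\varepsilon := I^{\varepsilon,\lambda(\varepsilon)}. \]
Let $u \in H_0^{1,2}(\Omega \setminus \Sigma)$. We first check property (ii) of Γ-convergence:
We need to find $u_\varepsilon \rightharpoonup u$ weakly in $L^2(\Omega)$ with $\lim I^\varepsilon(u_\varepsilon) = I(u)$. We define
\[ u_\varepsilon(x^1, x^2, x^3) := \begin{cases} u(x^1, x^2, x^3) & \text{if } |x^3| > \varepsilon \\ \frac{1}{2}\{ u(x^1, x^2, \varepsilon) + u(x^1, x^2, -\varepsilon) \} + \frac{x^3}{2\varepsilon}\{ u(x^1, x^2, \varepsilon) - u(x^1, x^2, -\varepsilon) \} & \text{if } |x^3| \le \varepsilon. \end{cases} \]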
2
(ft \ E), u
e
- - u weakly in L
2
(fi) for e -+ 0,
r ( u
e
)
= 5 / \Du\
2
+ ^ I \]-D(u(x\x
2
,e) + u(x\x
2
,-e))
/ 3 \ \
2
+D(^(u(x\x
2
,e)-u(x
1
,x
2
,-e)))\
~ \ f \Du\
2
+ ?- [ \D(x*(u(x\x
2
,e)-u(x\x
2
,-e)))\
2
~ 5 / | ^ |
2
- f / \u(x\a*,e)-u(x\x
2
,-e)\
2
+ terms that contain a:
3
and go to zero as e 0 (|rr
3
| < e).
If w is smooth (which we may assume by an approximation argument),
therefore for e > 0
i >
c
) - 4/ l^^l
2
+ ? / M
2
J Q\ E
4
J E
2
E*
We now check property (i) of T-convergence:
Let v
e
* u weakly in L
2
(Q). We need to show
For u
e
as above,
ae f
\Dv<\'
lim inf F
e- 0
<
+
Hjou,r
(Ve
>
)>n
ae I
u).
Du
e
Dv
(
> f / (D{u(.,e)+u(.,-e)}
J E
e
X
3
+-D{u(;e)-u(;-e)}
e3
+-(u(;e)-u(;-e)))-Dv
(
,
where e$ of course is the unit vector in the x
3
direction.
We may assume u smooth (otherwise, we use an approximation argu-
ment). Then as above
U
^
f
?/
E
J^ I
2
n/
E
M-
> lim inf / (DW, 6)-ti(. , -e))). ^x
3
+ lim inf - / e
:3
Dv
e
(u(-, e) - ii(-, - e ) ) .
Without loss of generality liminf
e
_^o I
e
{v

) < oc. Then


supe / |D*;
C
|
2
< oo. (6.3.7)
Consequently,
lima^y (^Du(-,e) + ^Du(-,-e)YDv
e
<cae(J \Dv
e
\
2
J
cae
-+ 0 for e -> 0.
Similarly,
lim sup a / (D(u(-,e) - w(-,-e))) Dv
e
x
3
0
e->o JE
C
since \x
3
\ < e. Thus
W T / J ^ + U M*
> lim i nf - / e
3
' Dt ;
e
( w( ' , 6) - u( - , - c ) ) .
Since it(-, e) and u(-, e) do not depend on x
3
, we obtain by integration
~ / e
3
- I t o

( u( - , ) - u( - , - ) )
= ? / ^ K- , ) - w(-, - c)) - ^ / t;

, c) - u(- , - c) ) ,
where here of course dZf = 0 D {x
3
= c}. Since we may assume
liminf
c
_o^
c
(^c) < oo, v
e
is bounded in jfiF
1,2
(n
c
). Therefore, we may
assume that the traces of v

on d

converge*)". Since u is assumed smooth


and v

converges to u weakly in L
2
, we may assume
v
,(ra
e
) - " u^O*) weakly in L
2
(<9
e
).
We then get
l i m | / e
3
-Dv
e
(u(-
y
c)-u(',-e)) = - I [ti]| .
Altogether
liminf | j f \Dv
e
\
2
+ <\ Jju]l > f / M | .
liminf I >
e
) > \f \Du\
2
+ | J[u}l
Therefore
q.e.d.
Exercises
6.1 Determine the Γ-limits of the following sequences of functions $F_n : \mathbb{R} \to \mathbb{R}$:
F
n
(#) := n(sinn -f 1)
v
' l l forx = 0
2
x for 0 < a; <
F
B
( x) :-{ ;
F
n
(#) := sin n# 4- cos nrr.
2n - n
2
x for - < x < -
n n
6.2 Show the following result: Let $X$ be a topological space satisfying the first axiom of countability, $F_n, G_n : X \to \overline{\mathbb{R}}$. Suppose that $F_n$ Γ-converges to $F$, $G_n$ Γ-converges to $G$, $F_n + G_n$ Γ-converges to $H$ (assume that the sums $F_n + G_n$, $F + G$ are always well defined; for example, there must not exist $x \in X$ with $F(x) = \infty$, $G(x) = -\infty$ or vice versa). Then
\[ F + G \le H. \]
Does one get equality '$=$' instead of '$\le$' here? (Hint: Consider $F_n(x) = \sin nx$, $G_n(x) = -\sin nx$.)
† For this technical point, see e.g. W. Ziemer, Weakly Differentiable Functions, Springer, GTM 120, New York, 1989, pp. 189ff.
7
BV-functionals and Γ-convergence: the example of Modica and Mortola
7.1 The space $BV(\Omega)$
Let $C_0^0(\mathbb{R}^d)$ be the space of continuous functions on $\mathbb{R}^d$ with compact support. For each Radon measure $\mu$ and each $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere, we can form a linear functional
\[ L : C_0^0(\mathbb{R}^d) \to \mathbb{R}, \qquad L(f) = \int_{\mathbb{R}^d} f \nu\,d\mu. \]
Conversely, we have the Riesz representation theorem, given here without proof (see e.g. N. Dunford, J. Schwartz, Linear Operators, Vol. I, Interscience, New York, 1958, p. 265).
Theorem 7.1.1. Let $L : C_0^0(\mathbb{R}^d) \to \mathbb{R}$ be a linear functional with
\[ \|L\|_K := \sup\{ L(f) : f \in C_0^0(\mathbb{R}^d),\ |f| \le 1,\ \operatorname{supp} f \subset K \} < \infty \tag{7.1.1} \]
for each compact $K \subset \mathbb{R}^d$. Then there exist a Radon measure $\mu$ on $\mathbb{R}^d$ and a $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere with
\[ L(f) = \int_{\mathbb{R}^d} f \nu\,d\mu \quad \text{for all } f \in C_0^0(\mathbb{R}^d). \tag{7.1.2} \]
If $L$ is nonnegative, i.e. $L(f) \ge 0$ whenever $f \ge 0$ everywhere, then $\nu \equiv 1$, i.e.
\[ L(f) = \int_{\mathbb{R}^d} f\,d\mu. \tag{7.1.3} \]
Thus, the Radon measures on R
d
are precisely the nonnegative linear
functionals on Co(K
rf
). (Note that (7.1.1) automatically holds if L is
nonnegative; namely
\\L\\
K
= L(
X
K)
in that case where \K is the characteristic function of K.)
The same result more generally holds for $C_0^0(\mathbb{R}^d, H)$ where $H$ is a
finite dimensional Hilbert space with scalar product $\langle \cdot, \cdot \rangle$. Then linear
functionals $L : C_0^0(\mathbb{R}^d, H) \to \mathbb{R}$ satisfying (7.1.1) are represented as
$$L(f) = \int_{\mathbb{R}^d} \langle f, \nu \rangle \, d\mu \qquad (7.1.4)$$
where $\mu$ again is a Radon measure and $\nu : \mathbb{R}^d \to H$ is $\mu$-measurable with
$|\nu| = 1$ $\mu$-almost everywhere. Also, in the situation of Theorem 7.1.1,
one has
$$\mu(\Omega) = \sup\{ L(f) : f \in C_0^0(\Omega),\ |f| \le 1 \}$$
for any open $\Omega \subset \mathbb{R}^d$.
The expression $\nu \, d\mu$ in (7.1.4) ($|\nu| = 1$ $\mu$-almost everywhere) is called
a vector-valued signed measure. ($\mu$ is supposed to be a Radon measure
and $\nu$ a $\mu$-measurable function with values in $H$.)
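Definition 7.1.1. Let $\Omega \subset \mathbb{R}^d$ be open. The space $BV(\Omega)$ consists of
all functions $u \in L^1(\Omega)$ for which there exists a vector-valued signed
measure $\nu\mu$ with $\mu(\Omega) < \infty$ and
$$\int_\Omega u \operatorname{div} g \, dx = -\int_\Omega g \cdot \nu \, d\mu \qquad (7.1.5)$$
for all $g \in C_0^\infty(\Omega, \mathbb{R}^d)$. In this case, we write $Du = \nu\mu$, $D_i u = \nu^i \mu$
($\nu = (\nu^1, \ldots, \nu^d)$, $i = 1, \ldots, d$). For $u \in BV(\Omega)$, we put
$$\|Du\|(\Omega) := \mu(\Omega) = \sup\Big\{ \int_\Omega u \operatorname{div} g \, dx : g = (g^1, \ldots, g^d) \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le 1 \text{ for all } x \in \Omega \Big\} < \infty$$
and
$$\|u\|_{BV(\Omega)} := \|u\|_{L^1(\Omega)} + \|Du\|(\Omega).$$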
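For $u \in BV(\Omega)$, $\|Du\|$ is a Radon measure on $\Omega$:
$$\|Du\|(\Omega_0) = \sup\Big\{ \int_{\Omega_0} u \operatorname{div} g \, dx : g \in C_0^\infty(\Omega_0, \mathbb{R}^d),\ |g| \le 1 \Big\}$$
for $\Omega_0$ open in $\Omega$. We write
$$\|Du\|(\Omega_0) =: \int_{\Omega_0} \|Du\|,$$
and also
$$\|Du\|(f) =: \int_\Omega f \, \|Du\|$$
for a nonnegative Borel measurable function $f$ on $\Omega$.
We have for $f \in C_0^0(\Omega)$, $f \ge 0$:
$$\|Du\|(f) = \sup\Big\{ \int_\Omega u \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le f(x) \ \forall x \in \Omega \Big\}. \qquad (7.1.6)$$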
Lemma 7.1.1. If $u \in W^{1,1}(\Omega)$, then $u \in BV(\Omega)$, and
$d\mu = |Du| \, dx$ where $Du$ is the weak derivative of $u$ and $dx$ is $d$-dimensional
Lebesgue measure, and
$$\nu(x) = \begin{cases} \dfrac{Du(x)}{|Du(x)|} & \text{if } Du(x) \ne 0 \\ 0 & \text{otherwise.} \end{cases}$$
The proof is obvious. q.e.d.
On a compact hypersurface $S \subset \mathbb{R}^d$ of class $C^\infty$, we have an induced
metric and in particular a volume form $dS$. The $(d-1)$-dimensional
volume of $S$ then is
$$|S|_{d-1} = \int_S dS.$$
Lemma 7.1.2. Let $E$ be a bounded open set in $\mathbb{R}^d$ with a boundary $\partial E$
of class $C^\infty$. Then
$$|\partial E|_{d-1} = \|D\chi_E\|(\mathbb{R}^d), \qquad (7.1.7)$$
where $\chi_E$ is the characteristic function of $E$.
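Proof. We have to show
$$|\partial E|_{d-1} = \sup\Big\{ \int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1 \Big\}.$$
By the Gauss theorem
$$\int_E \operatorname{div} g = \int_{\partial E} g(x) \cdot n(x) \, d(\partial E)$$
where $n(x)$ is the exterior normal. Therefore
$$|\partial E|_{d-1} \ge \sup\Big\{ \int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1 \Big\}.$$
For the converse inequality, we use a partition of unity to extend $n$ to a
$C^\infty$-vector field $V$ on $\mathbb{R}^d$ with $|V(x)| \le 1$ for all $x \in \mathbb{R}^d$. For $\varphi \in C_0^\infty(\mathbb{R}^d)$
with $|\varphi| \le 1$, we put $g = \varphi V$ and get
$$\int_E \operatorname{div} g = \int_{\partial E} \varphi \, d(\partial E).$$
Consequently
$$\sup\Big\{ \int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1 \Big\} \ge \sup\Big\{ \int_{\partial E} \varphi \, d(\partial E) : \varphi \in C_0^\infty(\mathbb{R}^d),\ |\varphi| \le 1 \Big\} = |\partial E|_{d-1}.$$
This completes the proof.
q.e.d.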
The same conclusion holds if $E \subset\subset \Omega$ for some bounded open set $\Omega$;
namely
$$|\partial E|_{d-1} = \|D\chi_E\|(\Omega) = \sup\Big\{ \int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1 \Big\}$$
in that case.
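Definition 7.1.2. A Borel set $E \subset \mathbb{R}^d$ has finite perimeter in an open
set $\Omega$ if $\chi_E|_\Omega \in BV(\Omega)$. The perimeter of $E$ in $\Omega$ in that case is
$$P(E, \Omega) := \|D\chi_E\|(\Omega) \ \Big( = \sup\Big\{ \int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1 \Big\} \Big). \qquad (7.1.8)$$
$E$ is a set of finite perimeter if $\chi_E \in BV(\mathbb{R}^d)$.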
The following lower semicontinuity result is easy to prove and very
useful.
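Theorem 7.1.2. Let $\Omega \subset \mathbb{R}^d$ be open, $(u_n)_{n \in \mathbb{N}} \subset BV(\Omega)$, and suppose
$$u_n \to u \quad \text{in } L^1(\Omega).$$
Then for every open $U \subset \Omega$
$$\|Du\|(U) \le \liminf_{n \to \infty} \|Du_n\|(U). \qquad (7.1.9)$$
If in addition
$$\sup\{ \|Du_n\|(\Omega) : n \in \mathbb{N} \} < \infty, \qquad (7.1.10)$$
then
$$u \in BV(\Omega).$$
Proof. Let $g \in C_0^\infty(U, \mathbb{R}^d)$ with $|g| \le 1$. Then
$$\int_U u \operatorname{div} g = \lim_{n \to \infty} \int_U u_n \operatorname{div} g \le \liminf_{n \to \infty} \|Du_n\|(U).$$
Taking the supremum over all such $g$, we obtain (7.1.9). If $\varphi \in C_0^\infty(\Omega)$,
then for $i = 1, \ldots, d$
$$\lim_{n \to \infty} \int_\Omega \varphi \, D_i u_n = -\lim_{n \to \infty} \int_\Omega u_n D_i \varphi = -\int_\Omega u \, D_i \varphi$$
and hence
$$\Big| \int_\Omega u \, D_i \varphi \Big| \le \sup |\varphi| \, \liminf_{n \to \infty} \|Du_n\|(\Omega) < \infty$$
in case (7.1.10) holds. Since $C_0^\infty(\Omega)$ is dense in $C_0^0(\Omega)$, for $i = 1, \ldots, d$,
$$D_i u(\varphi) := -\int_\Omega u \, D_i \varphi$$
then is a bounded linear functional on $C_0^0(\Omega)$, and thus $u \in BV(\Omega)$.
q.e.d.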
We next discuss the approximation of BV-functions by smooth ones
through mollification. As usual, we let $\rho \in C_0^\infty(\mathbb{R}^d)$ be a mollifier
with $\rho \ge 0$, $\operatorname{supp} \rho \subset B(0, 1)$, $\int_{\mathbb{R}^d} \rho(x) \, dx = 1$, and we also impose the
symmetry condition
$$\rho(x) = \rho(-x). \qquad (7.1.11)$$
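We then put as in Section 3.2
$$\rho_h(x) := h^{-d} \rho\Big(\frac{x}{h}\Big),$$
and for $u \in L^1(\Omega)$, we extend $u$ to $L^1(\mathbb{R}^d)$ by defining $u(x) = 0$ for
$x \in \mathbb{R}^d \setminus \Omega$ and put
$$u_h(x) := \rho_h * u(x) := \int_{\mathbb{R}^d} \rho_h(x - y) u(y) \, dy \in C^\infty(\Omega).$$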
Theorem 7.1.3. If $u \in BV(\Omega)$, then $u_h \to u$ in $L^1(\Omega)$ and $\|Du_h\| \to \|Du\|$
in the sense of Radon measures as $h \to 0$, i.e. for every $f \in C_0^0(\Omega)$
$$\lim_{h \to 0} \int_\Omega f \, \|Du_h\| = \int_\Omega f \, \|Du\|. \qquad (7.1.12)$$
In particular,
$$\lim_{h \to 0} \|Du_h\|(\Omega) = \|Du\|(\Omega). \qquad (7.1.13)$$
Proof. $u_h \to u$ in $L^1(\Omega)$ by Theorem 3.2.1. It suffices to consider the
case $f \ge 0$. From (7.1.3) it follows as in the proof of Theorem 7.1.2
that for every $f \in C_0^0(\Omega)$ with $f \ge 0$
$$\int_\Omega f \, \|Du\| \le \liminf_{h \to 0} \int_\Omega f \, \|Du_h\|. \qquad (7.1.14)$$
It thus remains to prove that for such $f$
$$\limsup_{h \to 0} \int_\Omega f \, \|Du_h\| \le \int_\Omega f \, \|Du\|. \qquad (7.1.15)$$
For that purpose, we first obtain from (7.1.6)
$$\int_\Omega f \, \|Du_h\| = \sup\Big\{ \int_\Omega g(x) \cdot Du_h(x) \, dx : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le f(x) \ \forall x \in \Omega \Big\}. \qquad (7.1.16)$$
Here, $Du_h = \big( \frac{\partial u_h}{\partial x^1}, \ldots, \frac{\partial u_h}{\partial x^d} \big)$ is the gradient of $u_h$, since $u_h$ is smooth.
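For any such $g$ as in (7.1.16)
$$\int_\Omega g(x) \cdot Du_h(x) \, dx = -\int_\Omega u_h(x) \operatorname{div} g(x) \, dx$$
$$= -\int \int \rho_h(x - y) u(y) \, dy \, \operatorname{div} g(x) \, dx$$
$$= -\int \int \rho_h(y - x) \operatorname{div} g(x) \, dx \, u(y) \, dy \quad \text{by (7.1.11)}$$
$$= -\int u(y) \operatorname{div}(g_h)(y) \, dy. \qquad (7.1.17)$$
Since we assume $|g| \le f$, we have
$$|g_h| \le |g|_h \le f_h,$$
and since $f$ is continuous, $f_h \to f$ uniformly as $h \to 0$ (see Lemma
3.2.2), i.e. $|f_h(x) - f(x)| \le \eta_h$ for all $x \in \Omega$, with $\lim_{h \to 0} \eta_h = 0$. By
definition of $\|Du\|$, the right-hand side of (7.1.17) therefore is bounded
by $\int_\Omega (f + \eta_h) \|Du\|$.
Thus, for every such $g$
$$\lim_{h \to 0} \int_\Omega g(x) \cdot Du_h(x) \, dx \le \int_\Omega f \, \|Du\|,$$
and (7.1.15) follows (cf. (7.1.16)).
q.e.d.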
Corollary 7.1.1. Let $\Omega$ be a bounded, open subset of $\mathbb{R}^d$. Then any
sequence $(u_n)_{n \in \mathbb{N}} \subset BV(\Omega)$ with
$$\|u_n\|_{BV} \le K \quad \text{for some } K$$
contains a subsequence that converges in $L^1(\Omega)$ to some $u \in BV(\Omega)$ with
$$\|u\|_{BV} \le K.$$
Proof. By Theorem 7.1.3, there exist functions $v_n \in C^\infty(\Omega)$ with
$$\|u_n - v_n\|_{L^1(\Omega)} < \frac{1}{n}, \qquad \|Dv_n\|(\Omega) \le K + 1.$$
Therefore $(v_n)_{n \in \mathbb{N}}$ is bounded in $W^{1,1}(\Omega)$. By the Rellich-Kondrachov
compactness theorem 3.4.1, after selection of a subsequence, $(v_n)_{n \in \mathbb{N}}$
converges in $L^1(\Omega)$ to some $u \in L^1(\Omega)$. $(u_n)$ has to converge to $u$ as well
(in $L^1(\Omega)$). By the lower semicontinuity Theorem 7.1.2, $u \in BV(\Omega)$, and
$$\|u\|_{BV} \le K.$$
q.e.d.
A reference for the BV theory is W. Ziemer, Weakly Differentiable
Functions, Springer, GTM 120, New York, 1989, Chapter 5.
7.2 The example of Modica-Mortola
We now come to the theorem of Modica-Mortola:
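Theorem 7.2.1. Let
$$F_n(u) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \Big( \frac{|Du|^2}{n} + n \sin^2(\pi n u) \Big) dx & \text{for } u \in H^{1,2}(\mathbb{R}^d) \cap L^1(\mathbb{R}^d) \\ \infty & \text{otherwise,} \end{cases}$$
$$F(u) := \begin{cases} \displaystyle\frac{4}{\pi} \int_{\mathbb{R}^d} |Du| = \frac{4}{\pi} \|Du\|(\mathbb{R}^d) & \text{for } u \in BV(\mathbb{R}^d) \\ \infty & \text{otherwise.} \end{cases}$$
Then w.r.t. $L^1(\mathbb{R}^d)$ convergence
$$F = \Gamma\text{-}\lim_{n \to \infty} F_n. \qquad (7.2.1)$$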
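Proof.
(i) We first want to show
$$F(u) \le \liminf_{n \to \infty} F_n(u_n) \qquad (7.2.2)$$
whenever
$$u_n \to u \quad \text{in } L^1(\mathbb{R}^d).$$
For that purpose, we put
$$h_n(t) := \frac{1}{n} \int_0^{nt} |\sin(\pi\tau)| \, d\tau.$$
We note that
$$|h_n(s) - h_n(t)| \le |s - t| \quad \text{for all } n \in \mathbb{N},\ s, t \in \mathbb{R}.$$
Therefore
$$\|h_n \circ u_n - h_n \circ u\|_{L^1} \le \|u_n - u\|_{L^1} \to 0 \quad \text{as } n \to \infty. \qquad (7.2.3)$$
Also
$$\lim_{n \to \infty} h_n(t) = \frac{2}{\pi} t. \qquad (7.2.4)$$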
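We now obtain
$$\Big\| h_n \circ u_n - \frac{2}{\pi} u \Big\|_{L^1} \le \|h_n \circ u_n - h_n \circ u\|_{L^1} + \Big\| h_n \circ u - \frac{2}{\pi} u \Big\|_{L^1} \to 0 \quad \text{as } n \to \infty \qquad (7.2.5)$$
by (7.2.3), (7.2.4), and Lebesgue's Theorem 1.2.3 on dominated
convergence. We may assume
$$u_n \in H^{1,2}(\mathbb{R}^d) \quad \text{for every } n \in \mathbb{N}, \qquad (7.2.6)$$
because otherwise $F_n(u_n) = \infty$, and (7.2.2) is trivial. Then
$$\liminf_{n \to \infty} F_n(u_n) \ge 2 \liminf_{n \to \infty} \int_{\mathbb{R}^d} |Du_n| \, |\sin(\pi n u_n)|$$
$$= 2 \liminf_{n \to \infty} \int_{\mathbb{R}^d} |D(h_n \circ u_n)|$$
$$\ge \frac{4}{\pi} \int_{\mathbb{R}^d} |Du| \quad \text{by (7.2.5) and Theorem 7.1.2}$$
$$= F(u).$$
This shows (7.2.2).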
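The two facts about $h_n$ used in part (i), the Lipschitz bound and the limit (7.2.4), can be checked numerically. The following is only a sketch: the helper name `h`, the midpoint quadrature, and the sample point `t = 0.73` are illustrative choices, not part of the text.

```python
import math

def h(n, t, steps=20000):
    """Approximate h_n(t) = (1/n) * int_0^{n t} |sin(pi * tau)| d tau (midpoint rule)."""
    upper = n * t
    width = upper / steps
    total = sum(abs(math.sin(math.pi * (k + 0.5) * width)) for k in range(steps))
    return total * width / n

t = 0.73                      # arbitrary sample point (assumption for illustration)
limit = 2.0 * t / math.pi     # right-hand side of (7.2.4)

# Distance between h_n(t) and (2/pi) t for growing n
errors = [abs(h(n, t) - limit) for n in (1, 10, 100)]

# Lipschitz check: |h_n(s) - h_n(t)| <= |s - t|
lipschitz_gap = abs(h(5, 0.8) - h(5, 0.3))
```

The errors shrink as $n$ grows, in line with (7.2.4), because the mean value of $|\sin(\pi\tau)|$ over one period is $2/\pi$.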
(ii) We want to show that for every $u \in L^1(\mathbb{R}^d)$, there exists a
sequence $(u_n)_{n \in \mathbb{N}} \subset L^1(\mathbb{R}^d)$ converging to $u$ in $L^1(\mathbb{R}^d)$ with
$$\limsup_{n \to \infty} F_n(u_n) \le F(u), \qquad (7.2.7)$$
thereby completing the proof of Γ-convergence. This inequality
will be much harder to show than (7.2.2), however. We shall
proceed in several steps:
(1) We may assume $u \in C_0^\infty(\mathbb{R}^d)$. By a slight extension of the
reasoning of Theorem 7.1.3, we may find $u_h \in C_0^\infty(\mathbb{R}^d)$
(take a smooth $\varphi_h$ with $\varphi_h = 1$ on $B(0, \frac{1}{h})$, $\varphi_h = 0$ on
$\mathbb{R}^d \setminus B(0, \frac{1}{h} + 1)$, $|D\varphi_h| \le 2$ and multiply the mollification
of $u$ with parameter $h$ by $\varphi_h$) with
$$\lim_{h \to 0} \int_{\mathbb{R}^d} |u_h(x) - u(x)| \, dx = 0$$
$$\lim_{h \to 0} F(u_h) = F(u).$$
Applying Lemma 6.1.1, we may indeed assume $u \in C_0^\infty(\mathbb{R}^d)$.
(2) We now want to show that it suffices to verify the claim
for certain step functions.
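By (1), we assume $u \in C_0^\infty(\mathbb{R}^d)$. By Sard's theorem, for
almost all $t \in \mathbb{R}$,
$$u^{-1}(t) = \{x : u(x) = t\}$$
is a hypersurface of class $C^\infty$. For every $\nu \in \mathbb{Z}$, $n \in \mathbb{N}$, we
may then choose $t_{\nu,n}$ with this property, with
$$\frac{\nu}{n} < t_{\nu,n} < \frac{\nu + 1}{n}$$
and satisfying
$$|u^{-1}(t_{\nu,n})|_{d-1} \le n \int_{\nu/n}^{(\nu+1)/n} |u^{-1}(t)|_{d-1} \, dt.$$
The coarea formula (Theorem A.1) then implies
$$\int_{\mathbb{R}^d} |Du(x)| \, dx = \int_{-\infty}^{\infty} |u^{-1}(t)|_{d-1} \, dt$$
$$= \sum_{\nu = -\infty}^{\infty} \int_{\nu/n}^{(\nu+1)/n} |u^{-1}(t)|_{d-1} \, dt$$
$$\ge \sum_{\nu = -\infty}^{\infty} \frac{1}{n} |u^{-1}(t_{\nu,n})|_{d-1}$$
$$= \sum_{\nu = -\infty}^{\infty} \frac{1}{n} \|D\chi_{\{u > t_{\nu,n}\}}\| \quad \text{by Lemma 7.1.2.}$$
We choose $N(n) \in \mathbb{N}$ with $N(n) > n \max |u| + 1$ and put
$$u_n := -\frac{N(n)}{n} + \frac{1}{n} \sum_{\nu = -N(n)}^{N(n)} \chi_{\{u > t_{\nu,n}\}}.$$
The preceding inequality implies
$$\limsup_{n \to \infty} F(u_n) \le F(u).$$
If $t_{\nu,n} < u(x) < t_{\nu+1,n}$, then $u_n(x) = \frac{\nu + 1}{n}$. Therefore
$$\operatorname{supp} u_n \subset \operatorname{supp} u,$$
and for all $x$
$$|u(x) - u_n(x)| \le \frac{2}{n}.$$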
Since $u$ is assumed to have compact support, therefore
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |u_n(x) - u(x)| \, dx = 0.$$
Lemma 6.1.1 then implies that it suffices to prove the claim
for the functions $u_n$.
(3) In (2), we have reduced the claim to step functions
$$u = \sum_{i=1}^{N} \alpha_i \chi_{\Omega_i},$$
where the $\Omega_i$ are disjoint bounded open sets with boundary
$\partial\Omega_i$ of class $C^\infty$. Since the general case is completely
analogous, for simplicity, we only consider the case $N = 1$,
i.e.
$$u = \alpha \chi_\Omega \qquad (7.2.8)$$
with $\Omega$ bounded and $\partial\Omega$ of class $C^\infty$. Thus
$$F(u) = \frac{4}{\pi} \alpha \, |\partial\Omega|_{d-1} \quad \text{(cf. Lemma 7.1.1).} \qquad (7.2.9)$$
We let $0 < \rho < \varepsilon_0$, where $\varepsilon_0$ is given in Lemma B.1. Thus,
the signed distance function $d(x)$ as defined in Appendix
B is smooth on $\{x \in \mathbb{R}^d : \operatorname{dist}(x, \partial\Omega) < \rho\}$. We need the
following auxiliary result:
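Lemma 7.2.1. Let $n \in \mathbb{N}$, let $\alpha_n \in \mathbb{R}$, with $\lim_{n \to \infty} \alpha_n = \alpha \in \mathbb{R}$,
$n\alpha_n \in \mathbb{Z}$, and let
$$\phi_n(\chi) := \int_{\mathbb{R}} \Big( \frac{|\chi'(t)|^2}{n} + n \sin^2(\pi n \chi(t)) \Big) dt$$
be the one-dimensional analogue of $F_n$. Then there exist
Lipschitz functions $\chi_n : \mathbb{R} \to \mathbb{R}$ with
$$\chi_n(t) = 0 \quad \text{for } t \le 0$$
$$\chi_n(t) = \alpha_n \quad \text{for } t \ge \frac{1}{\sqrt{n}}$$
$$0 \le \chi_n(t) \le \alpha_n \quad \text{for } 0 \le t \le \frac{1}{\sqrt{n}},$$
and
$$\limsup_{n \to \infty} \phi_n(\chi_n) \le \frac{4}{\pi} \alpha. \qquad (7.2.10)$$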
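We postpone the proof of Lemma 7.2.1 and proceed with
the proof of the theorem. We choose a sequence
$$(\alpha_n)_{n \in \mathbb{N}} \subset \mathbb{R}$$
with
$$\lim_{n \to \infty} \alpha_n = \alpha,$$
and $n\alpha_n \in \mathbb{Z}$ as in Lemma 7.2.1. We put
$$\Omega_n := \Big\{ x \in \Omega : d(x) < \frac{1}{\sqrt{n}} \Big\}$$
and
$$u_n(x) := \chi_n(d(x)) \quad \text{with } \chi_n \text{ as in Lemma 7.2.1.} \qquad (7.2.11)$$
Then
$$u_n(x) = 0 \quad \text{for } x \in \mathbb{R}^d \setminus \Omega$$
$$u_n(x) = \alpha_n \quad \text{for } x \in \Omega \setminus \Omega_n$$
$$0 \le u_n(x) \le \alpha_n \quad \text{for } x \in \Omega_n.$$
We also note
$$\lim_{n \to \infty} |\Omega_n|_d = 0. \qquad (7.2.12)$$
Thus (cf. (7.2.8))
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |u(x) - u_n(x)| \, dx = 0, \qquad (7.2.13)$$
and $u_n$ converges to $u$ in $L^1$. We also let (as in Appendix B)
$$\Sigma_t := \{x \in \mathbb{R}^d : d(x) = t\}.$$
We note
$$Du_n(x) = 0, \quad \sin(n\pi u_n(x)) = 0 \quad \text{for } x \in \mathbb{R}^d \setminus \Omega_n, \qquad (7.2.14)$$
and
$$|Dd(x)| = 1 \quad \text{for } x \in \Omega_n \text{ by Lemma B.1.} \qquad (7.2.15)$$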
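Then
$$\limsup_{n \to \infty} F_n(u_n) = \limsup_{n \to \infty} \int_{\Omega_n} \Big( \frac{|Du_n(x)|^2}{n} + n \sin^2(n\pi u_n(x)) \Big) |Dd(x)| \, dx$$
$$= \limsup_{n \to \infty} \int_0^{1/\sqrt{n}} \Big( \frac{|\chi_n'(t)|^2}{n} + n \sin^2(n\pi \chi_n(t)) \Big) |\Sigma_t|_{d-1} \, dt \quad \text{by Corollary A.1 (coarea formula)}$$
$$\le \limsup_{n \to \infty} \Big( \phi_n(\chi_n) \sup_{0 \le t \le 1/\sqrt{n}} |\Sigma_t|_{d-1} \Big)$$
$$\le \frac{4}{\pi} \alpha \, |\partial\Omega|_{d-1} \quad \text{by Lemma 7.2.1 and Lemma B.1}$$
$$= F(u) \quad \text{(cf. (7.2.9)).}$$
This is (7.2.7).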
(4) It only remains to prove Lemma 7.2.1:
The idea is of course to minimize $\phi_n(\chi)$ under the given
side conditions on $\chi$. The Euler-Lagrange equations for $\phi_n$
are
$$\frac{1}{n^2} \chi'' = \pi n \sin(\pi n \chi) \cos(\pi n \chi),$$
and these are implied by
$$\frac{1}{n^2} \chi'^2 = \sin^2(\pi n \chi) + c_1. \qquad (7.2.16)$$
We now construct a solution of (7.2.16) with the desired
properties: w.l.o.g. $\alpha > 0$ (the case $\alpha < 0$ is analogous).
We choose $c_1 = \frac{1}{n}$ in (7.2.16). We put
$$\psi_n(t) := \int_0^t \frac{1}{n} \Big( \frac{1}{n} + \sin^2(n\pi s) \Big)^{-1/2} ds$$
$$\eta_n := \psi_n(\alpha_n).$$
Then
$$0 < \eta_n \le \frac{1}{\sqrt{n}} \alpha_n.$$
We let
$$\chi_n : [0, \eta_n] \to [0, \alpha_n]$$
be the inverse of $\psi_n$. Then $\chi_n$ is of class $C^1$ and
$$\chi_n'(t) = n \Big( \frac{1}{n} + \sin^2(n\pi \chi_n(t)) \Big)^{1/2}. \qquad (7.2.17)$$
We extend $\chi_n$ to $\mathbb{R}$ as a Lipschitz function by putting
$$\chi_n(t) = 0 \quad \text{for } t \le 0$$
$$\chi_n(t) = \alpha_n \quad \text{for } t \ge \eta_n.$$
Then
$$\phi_n(\chi_n) = \int_0^{\eta_n} \Big( \frac{|\chi_n'(t)|^2}{n} + n \sin^2(\pi n \chi_n(t)) \Big) dt$$
$$\le \int_0^{\eta_n} \Big( \frac{|\chi_n'(t)|^2}{n} + n \Big( \frac{1}{n} + \sin^2(\pi n \chi_n(t)) \Big) \Big) dt$$
$$= \int_0^{\eta_n} 2 \Big( \frac{1}{n} + \sin^2(\pi n \chi_n(t)) \Big)^{1/2} \chi_n'(t) \, dt \quad \text{by (7.2.17)}$$
$$= 2 \int_0^{\alpha_n} \Big( \frac{1}{n} + \sin^2(\pi n s) \Big)^{1/2} ds$$
and Lemma 7.2.1 follows.
q.e.d.
References
L. Modica and St. Mortola, Un esempio di Γ-convergenza, Boll. U.M.I. (5),
14-B (1977), 285-99.
L. Modica, The gradient theory of phase transitions and the minimal
interface criterion, Arch. Rat. Mech. Anal. 98 (1987), 123-42.
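The last step of the proof of Lemma 7.2.1 bounds $\phi_n(\chi_n)$ by $2\int_0^{\alpha_n} (1/n + \sin^2(\pi n s))^{1/2} ds$, and the lemma then rests on this quantity tending to $\frac{4}{\pi}\alpha$. A minimal numerical sketch of that limit follows; the function name `upper_bound`, the midpoint rule, and the choice $\alpha_n = \alpha = 2$ (so that $n\alpha_n \in \mathbb{Z}$) are assumptions made for illustration.

```python
import math

def upper_bound(n, alpha_n, steps_per_period=200):
    """Midpoint-rule value of 2 * int_0^{alpha_n} sqrt(1/n + sin^2(pi n s)) ds.

    Assumes n * alpha_n is a positive integer, as in Lemma 7.2.1.
    """
    steps = int(round(n * alpha_n)) * steps_per_period
    width = alpha_n / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * width
        total += math.sqrt(1.0 / n + math.sin(math.pi * n * s) ** 2)
    return 2.0 * total * width

alpha = 2.0
target = 4.0 * alpha / math.pi     # right-hand side of (7.2.10)
bounds = {n: upper_bound(n, alpha) for n in (1, 10, 100, 400)}
```

Since $\sqrt{1/n + \sin^2} \le 1/\sqrt{n} + |\sin|$ and $\int_0^{\alpha_n} |\sin(\pi n s)| \, ds = \frac{2}{\pi}\alpha_n$ when $n\alpha_n \in \mathbb{Z}$, each value lies between $\frac{4}{\pi}\alpha$ and $\frac{4}{\pi}\alpha + 2\alpha/\sqrt{n}$, which is what the computed numbers should reflect.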
Let us also quote without proof the following result of L. Modica,
loc. cit., which plays an important role in the theory of phase transitions:
Let $\Omega \subset \mathbb{R}^d$ be open and bounded with Lipschitz boundary, $W : \mathbb{R} \to \mathbb{R}^+$
be continuous with precisely two zeroes $\alpha$, $\beta$ (which then are absolute
minima, because $W$ is nonnegative),
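$$F_n(u) := \begin{cases} \displaystyle\int_\Omega \Big( \frac{1}{n} \|Du(x)\|^2 + n W(u(x)) \Big) dx & \text{for } u \in H^{1,2}(\Omega) \\ \infty & \text{otherwise} \end{cases}$$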
and
F (u) = < ^
c
fn H^^ll f
or u
^ BV{0) and for almost all x G M
1 oo otherwise
with
$$c_0 = \int_\alpha^\beta W^{1/2}(s) \, ds.$$
Then $F_0$ is the Γ-limit of $F_n$ w.r.t. $L^1$-convergence.
The proof is similar to the one of Theorem 7.2.1, except that we cannot
apply Sard's lemma anymore, because even for a smooth function $u$, $\alpha$
and $\beta$ need not be regular values. Thus, one has to consider nonsmooth
level sets as well and appeal to some general results about BV-functions
and sets of finite perimeter.
The interpretation of Modica's theorem is the following:
Consider first the problem
$$\int_\Omega W(u(x)) \, dx \to \min \qquad (7.2.18)$$
under the constraint
$$\frac{1}{\operatorname{meas} \Omega} \int_\Omega u(x) \, dx = \gamma,$$
with $\alpha < \gamma < \beta$ (w.l.o.g. assume $\alpha < \beta$). A minimizer then is of the
form
$$u_0 = \begin{cases} \alpha & \text{on } A_1 \\ \beta & \text{on } A_2 \end{cases} \quad \text{for } A_1, A_2 \subset \Omega$$
such that $A_1 \cup A_2 = \Omega$,
$$\alpha \operatorname{meas} A_1 + \beta \operatorname{meas} A_2 = \gamma \operatorname{meas} \Omega. \qquad (7.2.19)$$
$u_0$ thus jumps from the value $\alpha$ to the value $\beta$ along $\partial A_1 \cap \Omega = \partial A_2 \cap \Omega =: \Gamma$.
However, apart from the preceding relation (7.2.19), $A_1$ and $A_2$ and
hence also $\Gamma$ are completely arbitrary. In particular, $\Gamma$ may be very
irregular. In order to gain some control over the transition hypersurface
$\Gamma$, one adds the regularizing term $\int_\Omega \|Du(x)\|^2$ to the functional,
albeit with an arbitrarily small weight, and in fact one passes to the limit
where this weight vanishes so that one preserves (7.2.18), (7.2.19).
Although this regularizing term disappears in the limit, it still has the effect
of regularizing the hypersurface $\Gamma$ along which the transition from $\alpha$ to
$\beta$ occurs. Namely, the hypersurface of discontinuity of the minimizer $u$
now is constrained by the requirement that the BV norm of $u$, $\int_\Omega \|Du\|$,
be minimized. This means that $\Gamma$ is a so-called minimal hypersurface.
The existence and regularity theory for such minimal hypersurfaces may
be found for example in E. Giusti, Minimal Surfaces and Functions of
Bounded Variation, Birkhäuser, Boston 1984, pp. 3-134.
Exercises
7.1 Try to construct bounded sets in $\mathbb{R}^d$ that do not have a finite
perimeter.
7.2 Prove the preceding theorem of L. Modica for $d = 1$.
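Appendix A
The coarea formula
Theorem A.1 (coarea formula for smooth functions).
Let $u \in C_0^\infty(\mathbb{R}^d)$. Then by Sard's theorem,
$$C_u := \{t \in \mathbb{R} : \exists x \in \mathbb{R}^d : Du(x) = 0,\ u(x) = t\}$$
has one-dimensional Lebesgue measure zero, and thus, for almost all
$t \in \mathbb{R}$, $u^{-1}(t)$ is a smooth hypersurface by the implicit function theorem.
We then have for every open $\Omega \subset \mathbb{R}^d$
$$\int_\Omega |Du(x)| \, dx = \int_{-\infty}^{\infty} |u^{-1}(t) \cap \Omega|_{d-1} \, dt. \qquad (A.1)$$
Proof.
(1) We first show the result for a linear map
$$l : \mathbb{R}^d \to \mathbb{R}$$
(w.l.o.g. $l \ne 0$). Let $\pi : \mathbb{R}^d \to \mathbb{R}$ be the projection onto the first
coordinate. We may find $A \in Gl(1, \mathbb{R})$, $R \in O(d, \mathbb{R})$† with
$$l = A \circ \pi \circ R.$$
For every measurable subset $E$ of $\mathbb{R}^d$, we have by Fubini's theorem
$$|E|_d = \int_{-\infty}^{\infty} |E \cap \pi^{-1}(t)|_{d-1} \, dt,$$
where
$$|E|_d = \int_{\mathbb{R}^d} \chi_E$$
is the Lebesgue measure of $E$. Since $R$ is orthogonal, we likewise
have
$$|E|_d = \int_{-\infty}^{\infty} |E \cap R^{-1} \circ \pi^{-1}(t)|_{d-1} \, dt.$$
We then change variables via $s = At$ and obtain
$$|A| \, |E|_d = \int_{-\infty}^{\infty} |E \cap R^{-1} \circ \pi^{-1} \circ A^{-1}(s)|_{d-1} \, ds = \int_{-\infty}^{\infty} |E \cap l^{-1}(s)|_{d-1} \, ds. \qquad (A.2)$$
Since $|A| = |dl|$, and $l$ is linear, this is the coarea formula for
linear maps.
† $Gl(d, \mathbb{R}) := \{d \times d\text{-matrices } A \text{ with real entries and } \det A \ne 0\}$, $O(d, \mathbb{R}) := \{A \in Gl(d, \mathbb{R}) \mid A^t = A^{-1}\}$ (orthogonal group).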
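(2) Let
$$S_u := \{x \in \mathbb{R}^d : Du(x) = 0\}$$
$$U_t := \{x \in \mathbb{R}^d : u(x) > t\} \quad \text{for } t \in \mathbb{R}.$$
We put
$$u_t := \begin{cases} \chi_{U_t} & \text{if } t \ge 0 \\ -\chi_{\mathbb{R}^d \setminus U_t} & \text{if } t < 0. \end{cases}$$
Then
$$u(x) = \int_{\mathbb{R}} u_t(x) \, dt.$$
Let $\varphi \in C_0^\infty(\mathbb{R}^d \setminus S_u, \mathbb{R}^d)$, $|\varphi| \le 1$. Then
$$\int_{\mathbb{R}^d} u(x) \operatorname{div} \varphi(x) \, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}} u_t(x) \operatorname{div} \varphi(x) \, dt \, dx = \int_{\mathbb{R}} \int_{\mathbb{R}^d} u_t(x) \operatorname{div} \varphi(x) \, dx \, dt \qquad (A.3)$$
by Fubini's theorem.
By definition of $S_u$ and the implicit function theorem, $u^{-1}(t) \cap (\mathbb{R}^d \setminus S_u)$
is a hypersurface of class $C^\infty$. Since we assume $\operatorname{supp} \varphi \subset \mathbb{R}^d \setminus S_u$,
we may apply the divergence theorem to obtain
$$\int_{U_t} \operatorname{div} \varphi(x) \, dx = \int_{(\partial U_t) \cap (\mathbb{R}^d \setminus S_u)} \varphi(x) \cdot n(x) \, d(\partial U_t)(x)$$
and
$$\int_{\mathbb{R}^d \setminus U_t} \operatorname{div} \varphi(x) \, dx = -\int_{(\partial U_t) \cap (\mathbb{R}^d \setminus S_u)} \varphi(x) \cdot n(x) \, d(\partial U_t)(x)$$
where $n(x)$ is the exterior normal of $U_t$. We use this in (A.3) (recall
the definition of $u_t$) to obtain
$$-\int_{\mathbb{R}^d} Du(x) \cdot \varphi(x) \, dx = \int_{\mathbb{R}^d} u(x) \operatorname{div} \varphi(x) \, dx$$
$$= \int_{\mathbb{R}} \int_{(\partial U_t) \cap (\mathbb{R}^d \setminus S_u)} \varphi(x) \cdot n(x) \, d(\partial U_t)(x) \, dt$$
$$\le \int_{\mathbb{R}} |u^{-1}(t) \cap (\mathbb{R}^d \setminus S_u)|_{d-1} \, dt \quad \text{since we assume } |\varphi| \le 1.$$
Taking the supremum over all such $\varphi$, we obtain
$$\int_{\mathbb{R}^d} |Du(x)| \, dx = \int_{\mathbb{R}^d \setminus S_u} |Du(x)| \, dx \le \int_{\mathbb{R}} |u^{-1}(t)|_{d-1} \, dt. \qquad (A.4)$$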
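(3) We now prove the reverse inequality. We let $l_n : \mathbb{R}^d \to \mathbb{R}$ be
piecewise linear maps with
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |l_n - u| = 0 \qquad (A.5)$$
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |Dl_n| = \int_{\mathbb{R}^d} |Du|. \qquad (A.6)$$
Let
$$U_t^n := \{x \in \mathbb{R}^d : l_n(x) > t\}.$$
By (A.5), there exists a countable set $T_1 \subset \mathbb{R}$ with the property
that for all $t \notin T_1$
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |\chi_t - \chi_t^n| = 0, \qquad (A.7)$$
where $\chi_t$ is the characteristic function of $\{u(x) > t\}$, and $\chi_t^n$ the
one of $\{l_n(x) > t\}$. As noted above, by Sard's theorem and the
implicit function theorem, there exists a null set $T_2 \subset \mathbb{R}$ such
that for all $t \notin T_2$, $u^{-1}(t)$ is a smooth hypersurface of class $C^\infty$.
We put
$$T := T_1 \cup T_2.$$
Let $t \in \mathbb{R} \setminus T$, $\varepsilon > 0$. By Lemma 7.1.2, there exists $g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d)$
with $|g| \le 1$ and
$$|u^{-1}(t)|_{d-1} \le \int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx + \frac{\varepsilon}{2}. \qquad (A.8)$$
We let $M := \int_{\mathbb{R}^d} |\operatorname{div} g(x)| \, dx$. We choose $n_0$ so large that for
$n \ge n_0$
$$\int_{\mathbb{R}^d} |\chi_t - \chi_t^n| \, dx < \frac{\varepsilon}{2M}. \qquad (A.9)$$
Then for $n \ge n_0$
$$\Big| \int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx - \int_{\{l_n(x) > t\}} \operatorname{div} g(x) \, dx \Big| \le M \int_{\mathbb{R}^d} |\chi_t - \chi_t^n| \, dx < \frac{\varepsilon}{2}. \qquad (A.10)$$
(A.8) and (A.10) imply
$$|u^{-1}(t)|_{d-1} \le \int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx + \frac{\varepsilon}{2}$$
$$\le \int_{\{l_n(x) > t\}} \operatorname{div} g(x) \, dx + \varepsilon$$
$$= \int_{\partial\{l_n(x) > t\}} g(x) \cdot n(x) \, d(\partial\{l_n(x) > t\}) + \varepsilon,$$
$n(x)$ denoting the exterior normal of $\{l_n(x) > t\}$,
$$\le |l_n^{-1}(t)|_{d-1} + \varepsilon.$$
Thus, for $t \notin T$,
$$|u^{-1}(t)|_{d-1} \le \liminf_{n \to \infty} |l_n^{-1}(t)|_{d-1}.$$
From Fatou's lemma (Theorem 1.2.2), (A.2) and (A.6), we obtain
$$\int_{\mathbb{R}} |u^{-1}(t)|_{d-1} \, dt \le \liminf_{n \to \infty} \int_{\mathbb{R}} |l_n^{-1}(t)|_{d-1} \, dt$$
$$\le \liminf_{n \to \infty} \int_{\mathbb{R}^d} |Dl_n(x)| \, dx$$
$$= \int_{\mathbb{R}^d} |Du(x)| \, dx. \qquad (A.11)$$
(A.4) and (A.11) easily imply the claim.
q.e.d.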
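In dimension $d = 1$ the quantity $|u^{-1}(t) \cap \Omega|_{0}$ is simply the number of points of the level set, so (A.1) can be checked by counting. The sketch below does this for the illustrative choice $u(x) = \sin x$ on $\Omega = (0, 2\pi)$; this $u$ is not compactly supported, so the example relies on the more general validity of the formula rather than on Theorem A.1 as stated, and all function names are hypothetical.

```python
import math

def gradient_integral(u_prime, a, b, steps=20000):
    """Left-hand side of (A.1) for d = 1: midpoint rule for int_a^b |u'(x)| dx."""
    width = (b - a) / steps
    return sum(abs(u_prime(a + (k + 0.5) * width)) for k in range(steps)) * width

def count_level_points(values, t):
    """Count sign changes of u - t along the grid, i.e. crossings of the level t."""
    count = 0
    for left, right in zip(values, values[1:]):
        if (left - t) * (right - t) < 0:
            count += 1
    return count

def level_set_integral(u, a, b, t_min, t_max, x_steps=1000, t_steps=400):
    """Right-hand side of (A.1) for d = 1: int #(u^{-1}(t) in (a, b)) dt."""
    values = [u(a + (b - a) * k / x_steps) for k in range(x_steps + 1)]
    t_width = (t_max - t_min) / t_steps
    total = 0
    for j in range(t_steps):
        t = t_min + (j + 0.5) * t_width
        total += count_level_points(values, t)
    return total * t_width

lhs = gradient_integral(math.cos, 0.0, 2.0 * math.pi)
rhs = level_set_integral(math.sin, 0.0, 2.0 * math.pi, -1.0, 1.0)
```

Both sides come out close to 4: the integral of $|\cos x|$ over one period is 4, and every level $t \in (-1, 1)$, $t \ne 0$, is attained exactly twice in $(0, 2\pi)$.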
Corollary A.1. Let $u \in C_0^\infty(\mathbb{R}^d)$, $g : \mathbb{R} \to \mathbb{R}$ integrable, $\Omega \subset \mathbb{R}^d$ open.
Then
$$\int_\Omega g(u(x)) \, |Du(x)| \, dx = \int_{-\infty}^{\infty} g(t) \, |u^{-1}(t) \cap \Omega|_{d-1} \, dt. \qquad (A.12)$$
Proof. (A.12) follows from Theorem A.1 if $g$ is the characteristic function
of an open set and similarly if $g$ is the characteristic function of a
measurable set. By considering
$$g^+(t) := \max(0, g(t))$$
$$g^-(t) := \max(0, -g(t))$$
separately, it suffices to consider the case where $g \ge 0$, since always
$g(t) = g^+(t) - g^-(t)$. We thus assume $g \ge 0$. Let now $(\rho_n)_{n \in \mathbb{N}} \subset \mathbb{R}^+$
with
$$\lim_{n \to \infty} \rho_n = 0$$
$$\sum_{n=1}^{\infty} \rho_n = \infty,$$
and put inductively
$$A_n := \Big\{ x \in \mathbb{R} : g(x) \ge \rho_n + \sum_{j < n} \rho_j \chi_{A_j}(x) \Big\}.$$
Then for all $x \in \mathbb{R}$
$$g(x) = \sum_{n=1}^{\infty} \rho_n \chi_{A_n}(x). \qquad (A.13)$$
Since we observed that (A.12) holds for $\chi_{A_n}$ in place of $g$, the
representation in (A.13) in conjunction with Beppo Levi's Theorem 1.2.1 on
monotone convergence then implies (A.12) for $g$.
q.e.d.
Remark A.1. The coarea formula is due to Federer. It holds more
generally for Lipschitz functions $u : \mathbb{R}^d \to \mathbb{R}$. See H. Federer, Geometric
Measure Theory, Springer, New York, 1969, pp. 241-760, 268-71.
Appendix B
The distance function from smooth
hypersurfaces
We also need some elementary results about the (signed) distance
function from a smooth hypersurface. Let $\Omega \subset \mathbb{R}^d$ be open with nonempty
boundary $\partial\Omega$. We put
$$d(x) := \begin{cases} \operatorname{dist}(x, \partial\Omega) & \text{if } x \in \Omega \\ -\operatorname{dist}(x, \partial\Omega) & \text{if } x \in \mathbb{R}^d \setminus \Omega. \end{cases}$$
$d$ is Lipschitz continuous with Lipschitz constant 1. Namely, for $x, y \in \mathbb{R}^d$,
we find $\pi_y \in \partial\Omega$ with $d(y) = |y - \pi_y|$, hence
$$d(x) \le |x - \pi_y| \le |x - y| + |y - \pi_y| = |x - y| + d(y),$$
and interchanging the roles of $x$ and $y$ yields
$$|d(x) - d(y)| \le |x - y|.$$
We now assume that $\partial\Omega$ is of class $C^2$. Let $x_0 \in \partial\Omega$. Let $n(x_0)$ be the
outer normal vector of $\Omega$ at $x_0$, and let $T_{x_0}$ be the tangent plane of $\partial\Omega$
at $x_0$. We rotate the coordinates of $\mathbb{R}^d$ so that the $x^d$ coordinate axis
is pointing in the direction of $-n(x_0)$. In some neighbourhood $U(x_0)$ of
$x_0$, $\partial\Omega$ can then be represented as
$$x^d = f(x') \qquad (B.1)$$
with $x' = (x^1, \ldots, x^{d-1})$, where $f \in C^2(T_{x_0} \cap U(x_0))$, $Df(x_0') = 0$. The
Hessian $D^2 f(x_0')$ is symmetric, and therefore, after a further rotation of
coordinates, it becomes diagonalized,
$$D^2 f(x_0') = \begin{pmatrix} \kappa_1 & & 0 \\ & \ddots & \\ 0 & & \kappa_{d-1} \end{pmatrix}. \qquad (B.2)$$
$\kappa_1, \ldots, \kappa_{d-1}$ are the eigenvalues of $D^2 f(x_0')$, and they do not depend on
the special position of our coordinates. They are invariants of $\partial\Omega$, and
are called the principal curvatures of $\partial\Omega$ at $x_0$. The mean curvature of
$\partial\Omega$ at $x_0$ is
$$H(x_0) = \frac{1}{d-1} \sum_{i=1}^{d-1} \kappa_i = \frac{1}{d-1} \Delta f(x_0'). \qquad (B.3)$$
The outer normal vector $n(x)$ at $x \in \partial\Omega \cap U(x_0)$ has components
$$n^i(x) = \frac{\frac{\partial f}{\partial x^i}(x')}{(1 + |Df(x')|^2)^{1/2}}, \quad i = 1, \ldots, d-1 \qquad (B.4)$$
$$n^d(x) = \frac{-1}{(1 + |Df(x')|^2)^{1/2}} \qquad (B.5)$$
($x' = (x^1, \ldots, x^{d-1})$). In particular
$$\frac{\partial}{\partial x^j} n^i(x_0) = \kappa_i \delta_{ij} \quad \text{for } i, j = 1, \ldots, d-1. \qquad (B.6)$$
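Lemma B.1. Suppose $\Omega$ is open in $\mathbb{R}^d$ and that $\partial\Omega$ is bounded and of
class $C^k$ with $k \ge 2$. For $\eta \in \mathbb{R}$, put
$$\Sigma_\eta := \{x \in \mathbb{R}^d : d(x) = \eta\}.$$
There exists $\varepsilon_0 > 0$ (depending on $\partial\Omega$) with the property that for
$|\eta| < \varepsilon_0$,
$\Sigma_\eta$ is a hypersurface of class $C^k$. Also,
$$\lim_{\eta \to 0} |\Sigma_\eta|_{d-1} = |\partial\Omega|_{d-1}. \qquad (B.7)$$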
Proof. Since $\partial\Omega$ is compact and of class $C^2$, there exists $\varepsilon > 0$ with the
following property: Whenever $|\eta| < \varepsilon$, for each $x_0 \in \partial\Omega$, there exist two
unique open balls $B_1$, $B_2$ with $B_1 \subset \Omega$, $B_2 \subset \mathbb{R}^d \setminus \bar\Omega$,
$$\bar B_1 \cap \partial\Omega = x_0 = \bar B_2 \cap \partial\Omega$$
of radius $|\eta|$. The eigenvalues of the Hessian $D^2 f(x_0')$ of a normalized
representation $f$ of $\partial\Omega$ at $x_0$ as above then have to lie between $-\frac{1}{\varepsilon}$ and
$\frac{1}{\varepsilon}$, i.e.
$$|\kappa_i| \le \frac{1}{\varepsilon} \qquad (B.8)$$
for the principal curvatures $\kappa_1, \ldots, \kappa_{d-1}$. If $x$ is a centre of such a ball,
then $x_0 = \pi_x$. Also, by uniqueness, these balls depend continuously on
$x_0 \in \partial\Omega$. Thus, if $|\eta| < \varepsilon$, each $x \in \Sigma_\eta$ is the centre of such a ball, and
$$\pi_x = x + n(x) d(x) \quad \text{with } n(x) := n(\pi_x) \qquad (B.9)$$
is the unique point in $\partial\Omega$ with $|x - \pi_x| = |d(x)|$. We once again employ
the coordinates used for the definition of $f$ and rewrite (B.9) as
$$x = F(x', d) = (x', f(x')) - n(x', f(x')) \, d. \qquad (B.10)$$
Then $F \in C^{k-1}((T_{x_0} \cap U(x_0)) \times \mathbb{R}, \mathbb{R}^d)$ and at the point $(x_0', d(x))$
$$DF = \begin{pmatrix} 1 - \kappa_1 d(x) & & & 0 \\ & \ddots & & \\ & & 1 - \kappa_{d-1} d(x) & \\ 0 & & & 1 \end{pmatrix} \quad \text{by (B.6).} \qquad (B.11)$$
By (B.8) and since $|\eta| < \varepsilon$,
$$\det DF \ne 0.$$
By the inverse function theorem, $x'$ and $d$ therefore locally are $C^{k-1}$-functions
of $x$ (cf. (B.9)). Since
$$d(x) = d(x_0 - \eta n(x_0)) = \eta,$$
we have
$$Dd(x) \cdot n(x_0) = -1.$$
Since $d$ is Lipschitz with Lipschitz constant 1, we conclude
$$|Dd(x)| = 1$$
and
$$Dd(x) = -n(x_0) \in C^{k-1}.$$
Thus $d \in C^k$ locally, and the level hypersurfaces $\Sigma_\eta$ are of class $C^k$. For
(B.7), we may w.l.o.g. take $\eta > 0$ as the case $\eta < 0$ succumbs to the
same reasoning. We consider the vector field
$$V(x) = Dd(x).$$
The Gauss theorem yields
$$\int_{\{0 < d(x) < \eta\}} \operatorname{div} V = \int_{\Sigma_0} V(x) \cdot n(x) \, d\Sigma_0(x) + \int_{\Sigma_\eta} V(x) \cdot n_\eta(x) \, d\Sigma_\eta(x),$$
where $n_\eta$ is the normal vector of $\Sigma_\eta$ pointing in the direction opposite
to $n$. Since the measure of $\{0 < d(x) < \eta\}$ goes to zero with $\eta$ and
$$V(x) = -n(x) \quad \text{for } x \in \Sigma_0 = \partial\Omega$$
$$V(x) = n_\eta(x) \quad \text{for } x \in \Sigma_\eta,$$
(B.7) easily follows.
q.e.d.
References
D. Gilbarg, N. Trudinger, Elliptic Partial Differential Equations, Springer,
Berlin, 2nd edition, 1983, pp. 354-6.
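For a concrete feel for the statements of Appendix B, one can take $\Omega$ to be the unit disc in $\mathbb{R}^2$, where the signed distance function is explicit, $d(x) = R - |x|$ with $R = 1$. The sketch below checks the Lipschitz bound, $|Dd| = 1$ near the boundary, and the limit (B.7) in this special case; the disc, the sample points, and all function names are assumptions chosen for illustration.

```python
import math
import random

R = 1.0  # radius of the disc Omega = B(0, R) in the plane (illustrative choice)

def signed_distance(x, y):
    """d > 0 inside the disc, d < 0 outside, d = 0 on the circle."""
    return R - math.hypot(x, y)

def gradient_norm(x, y, step=1e-6):
    """|Dd| at (x, y) by central differences."""
    dx = (signed_distance(x + step, y) - signed_distance(x - step, y)) / (2 * step)
    dy = (signed_distance(x, y + step) - signed_distance(x, y - step)) / (2 * step)
    return math.hypot(dx, dy)

def level_set_length(eta):
    """Length of Sigma_eta = {d = eta}, a circle of radius R - eta for eta < R."""
    return 2.0 * math.pi * (R - eta)

# Lipschitz constant 1: |d(p) - d(q)| <= |p - q| on random pairs of points
rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    px, py, qx, qy = (rng.uniform(-2.0, 2.0) for _ in range(4))
    dist = math.hypot(px - qx, py - qy)
    if dist > 1e-12:
        gap = abs(signed_distance(px, py) - signed_distance(qx, qy))
        worst_ratio = max(worst_ratio, gap / dist)

grad = gradient_norm(0.6, 0.5)                       # a point with 0 < d < R
lengths = [level_set_length(eta) for eta in (0.1, 0.01, 0.001)]
boundary_length = 2.0 * math.pi * R
```

Here the only principal curvature is $1/R$, so $\varepsilon_0$ can be any number below $R$, and $|\Sigma_\eta|_{1} = 2\pi(R - \eta)$ converges to $|\partial\Omega|_1 = 2\pi R$ as $\eta \to 0$.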
8
Bifurcation theory
8.1 Bifurcation problems in the calculus of variations
We wish to consider a variational problem depending on a parameter $\lambda$,
and to investigate how the space of solutions depends on this parameter.
We thus consider
$$I(u, \lambda) := \int_a^b F(t, u(t), \dot u(t), \lambda) \, dt.$$
$\lambda$ is supposed to vary in some open set $\Lambda \subset \mathbb{R}^l$. Often, one has $l = 1$.
We assume that
$$F : [a, b] \times \mathbb{R}^d \times \mathbb{R}^d \times \Lambda \to \mathbb{R}$$
is sufficiently often differentiable so that all derivatives taken in the
sequel exist. For that purpose, one may simply assume that $F$ is of class
$C^\infty$ in all its arguments although that is a little stronger than needed
in the sequel.
Remark 8.1.1. One may also impose boundary conditions depending on
$\lambda$, i.e.
$$u(a) = u_1(\lambda)$$
$$u(b) = u_2(\lambda),$$
and finally, one may vary the boundary points themselves,
$$a = a(\lambda)$$
$$b = b(\lambda).$$
This latter variation, however, can formally be incorporated in the
variation of $F$, by transforming the integral.
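Let
$$\tau(\cdot, \lambda) : [a(\lambda_0), b(\lambda_0)] \to [a(\lambda), b(\lambda)]$$
be a bijective linear map, for some fixed $\lambda_0$. Then
$$\int_{a(\lambda)}^{b(\lambda)} F(\tau, u(\tau), \dot u(\tau)) \, d\tau = \int_{a(\lambda_0)}^{b(\lambda_0)} F(\tau(t, \lambda), u(\tau(t, \lambda)), \dot u(\tau(t, \lambda))) \frac{\partial \tau(t, \lambda)}{\partial t} \, dt,$$
and putting
$$v(t) := u(\tau(t, \lambda))$$
$$\tilde F(t, v(t), \dot v(t), \lambda) := F\Big( \tau(t, \lambda), v(t), \dot v(t) \Big( \frac{\partial \tau}{\partial t} \Big)^{-1} \Big) \frac{\partial \tau(t, \lambda)}{\partial t}$$
$$J(v, \lambda) := \int_{a(\lambda_0)}^{b(\lambda_0)} \tilde F(t, v(t), \dot v(t), \lambda) \, dt$$
yields a parameter-dependent variational integral for $v$ with fixed
boundary points $a(\lambda_0)$, $b(\lambda_0)$.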
As established in Theorem 1.1.1 of Part I, a critical point $u$ of $I(\cdot, \lambda)$ of
class $C^2$ satisfies the Euler-Lagrange equations
$$F_{pp}(t, u(t), \dot u(t), \lambda) \ddot u(t) + F_{pu}(t, u(t), \dot u(t), \lambda) \dot u(t) + F_{pt}(t, u(t), \dot u(t), \lambda) - F_u(t, u(t), \dot u(t), \lambda) = 0. \qquad (8.1.1)$$
We abbreviate (8.1.1) as
$$L_\lambda u = 0. \qquad (8.1.2)$$
In the light of Theorems 1.2.2 and 1.2.4 and Lemma 1.3.1 of Part I, we
shall assume
$$\det F_{pp}(t, u(t), \dot u(t), \lambda) \ne 0 \qquad (8.1.3)$$
for all functions $u$ occurring in the sequel. Equation (8.1.3) implies that
(8.1.1) can be solved for $\ddot u$ in terms of $u$ and $\dot u$, i.e.
$$\ddot u = -F_{pp}(t, u(t), \dot u(t), \lambda)^{-1} \{ F_{pu}(t, u(t), \dot u(t), \lambda) \dot u(t) + F_{pt}(t, u(t), \dot u(t), \lambda) - F_u(t, u(t), \dot u(t), \lambda) \} =: f(t, u(t), \dot u(t), \lambda), \qquad (8.1.4)$$
see (1.2.10) of Part I. (8.1.2) thus is equivalent to
$$\ddot u(t) - f(t, u(t), \dot u(t), \lambda) = 0. \qquad (8.1.5)$$
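The topic of bifurcation theory then is to study the space of solutions of
(8.1.5) in its dependence on the parameter $\lambda$. Before approaching this
problem from a general point of view in the next section, we should
briefly comment on the relations with the Jacobi theory introduced in
Section 1.3 of Part I. For a critical point $u$ of $I(\cdot, \lambda)$ and $\eta \in D^1(I, \mathbb{R}^d)$,
we had established the expansion
$$I(u + s\eta, \lambda) = I(u, \lambda) + \frac{1}{2} s^2 \delta^2 I(u, \eta, \lambda) + o(s^2) \quad \text{for } s \to 0, \qquad (8.1.6)$$
with
$$\delta^2 I(u, \eta, \lambda) = Q_\lambda(\eta) := \frac{d^2}{ds^2} \int_a^b F(t, u(t) + s\eta(t), \dot u(t) + s\dot\eta(t), \lambda) \, dt \Big|_{s=0}$$
$$= \int_a^b \{ F_{p^i p^j}(t, u, \dot u, \lambda) \dot\eta_i \dot\eta_j + 2 F_{p^i u^j}(t, u, \dot u, \lambda) \dot\eta_i \eta_j + F_{u^i u^j}(t, u, \dot u, \lambda) \eta_i \eta_j \} \, dt, \qquad (8.1.7)$$
abbreviated as
$$\int_a^b \{ F_{\lambda, pp} \dot\eta \dot\eta + 2 F_{\lambda, pu} \dot\eta \eta + F_{\lambda, uu} \eta \eta \} \, dt.$$
Critical points of $Q_\lambda$ satisfy the Jacobi equations
$$J_\lambda(u)\eta := \frac{d}{dt} \big( F_{pp}(t, u, \dot u, \lambda) \dot\eta + F_{pu}(t, u, \dot u, \lambda) \eta \big) - F_{pu}(t, u, \dot u, \lambda) \dot\eta - F_{uu}(t, u, \dot u, \lambda) \eta = 0. \qquad (8.1.8)$$
$J_\lambda(u)$ is called the Jacobi operator associated with the critical point $u$
of $I(\cdot, \lambda)$. We also observe that
$$J_\lambda(u)\eta = \frac{d}{ds} L_\lambda(u + s\eta) \Big|_{s=0}. \qquad (8.1.9)$$
Of course, this is not surprising since $L_\lambda$ represents the first variation of
$I(\cdot, \lambda)$ and $J_\lambda$ the second one. From the expansion (8.1.6) we see that
$$I(u + s\eta, \lambda) < I(u, \lambda) \quad \text{if } \delta^2 I(u, \eta, \lambda) < 0. \qquad (8.1.10)$$
No such conclusions can be achieved, however, if
$$\delta^2 I(u, \eta, \lambda) = 0. \qquad (8.1.11)$$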
Now by Lemma 1.3.2 of Part I, for a Jacobi field $\eta$ that vanishes at
the boundary points $a$ and $b$, (8.1.11) holds. This indicates that Jacobi
fields play a decisive role for deciding about the minimizing property of
a critical point $u$ of $I(\cdot, \lambda)$. Jacobi fields satisfy
$$J_\lambda(u)\eta = 0, \qquad (8.1.12)$$
i.e. are solutions of the linearization of the equation $L_\lambda u = 0$ satisfied
by $u$. This also indicates that Jacobi fields will play a decisive role in
analysing the bifurcation behaviour of $L_\lambda u = 0$ as $\lambda$ varies. Namely,
in finite dimensional problems, the presence of a nontrivial solution of
the linearization of a parameter-dependent equation $L_\lambda u = 0$ at some
parameter value $\lambda_0$ either results from a nontrivial family $u(r)$ of
solutions of $L_{\lambda_0} u(r) = 0$ by differentiating the equation w.r.t. the parameter
$r$, or it indicates a nontrivial bifurcation as $\lambda$ varies in the vicinity of
$\lambda_0$. In the next section, we shall see that under appropriate
assumptions, the same also holds in the present infinite dimensional context.
In fact, the bifurcation problem will be reduced to a finite dimensional
one via Lyapunov-Schmidt reduction. The reason why this is possible in
our variational context is that under our assumption (8.1.3), the space of
Jacobi fields is always finite dimensional. Namely, analogously to (8.1.4),
(8.1.5), the assumption (8.1.3) implies that (8.1.8) can be solved w.r.t.
$\ddot\eta$, i.e.
$$\ddot\eta - \varphi(t, u, \dot u, \eta, \dot\eta, \lambda) = 0. \qquad (8.1.13)$$
(Although this is not indicated by the notation, (8.1.13) is a linear
equation for $\eta$, and so the space of solutions is a linear space.)
Now suppose that we have a sequence $(\eta_n)_{n \in \mathbb{N}}$ of solutions of (8.1.13)
(for fixed $u$, $\lambda$) that are bounded in some appropriate function space
like $C^2(I)$ or $W^{2,2}(I)$. For concreteness, let us consider $C^2(I)$, i.e. for
example
$$\|\eta_n\|_{C^2(I)} \le 1.$$
By the Arzela-Ascoli theorem, after selection of a subsequence, $(\eta_n)_{n \in \mathbb{N}}$
then converges in $C^1(I)$ to some limit denoted by $\eta_0$. (8.1.13) then
implies that $(\ddot\eta_n)_{n \in \mathbb{N}}$ converges in $C^0(I)$ (as it follows from our assumptions
on the differentiability of $F$ that $\varphi$ is smooth, in particular
continuous). Thus (since the uniform limit of derivatives is the derivative of the
limit), $(\eta_n)_{n \in \mathbb{N}}$ converges in $C^2(I)$ to $\eta_0$, and consequently $\eta_0$ also solves
(8.1.13). From this compactness result, one easily deduces that the space
of solutions of (8.1.13) has finite dimension.
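To see how Jacobi fields single out parameter values, consider the following toy case, which is an assumption for illustration and not taken from the text: $d = 1$, $[a, b] = [0, \pi]$, and $F(t, u, p, \lambda) = \frac{1}{2}(p^2 - \lambda u^2)$, so that $u \equiv 0$ is a critical point for every $\lambda$ and the Jacobi equation (8.1.8) at $u = 0$ reads $\ddot\eta + \lambda\eta = 0$. The sketch integrates this equation with a classical Runge-Kutta step (the helper name `shoot` is hypothetical) and checks for which $\lambda$ the field starting with $\eta(0) = 0$, $\dot\eta(0) = 1$ vanishes again at $t = \pi$.

```python
import math

def shoot(lam, steps=2000):
    """Integrate eta'' = -lam * eta on [0, pi] with eta(0) = 0, eta'(0) = 1 (RK4).

    Returns eta(pi); a zero means a nontrivial Jacobi field vanishing at both ends.
    """
    h = math.pi / steps
    eta, deta = 0.0, 1.0
    for _ in range(steps):
        k1e, k1d = deta, -lam * eta
        k2e, k2d = deta + 0.5 * h * k1d, -lam * (eta + 0.5 * h * k1e)
        k3e, k3d = deta + 0.5 * h * k2d, -lam * (eta + 0.5 * h * k2e)
        k4e, k4d = deta + h * k3d, -lam * (eta + h * k3e)
        eta += h * (k1e + 2.0 * k2e + 2.0 * k3e + k4e) / 6.0
        deta += h * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
    return eta

end_values = {lam: shoot(lam) for lam in (0.5, 1.0, 2.5, 4.0)}
```

In this toy case $\eta(\pi) = \sin(\sqrt{\lambda}\,\pi)/\sqrt{\lambda}$, so a nontrivial Jacobi field with zero boundary values exists exactly for $\lambda = 1, 4, 9, \ldots$, and for each such $\lambda$ the space of these fields is one-dimensional.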
8.2 The functional analytic approach to bifurcation theory
We consider the following general situation. We have Banach spaces
$V$, $W$, and a parameter space $\Lambda$. We assume that $\Lambda$ is an open subset
of some Banach space. We consider a parameter dependent family of
equations
$$L_\lambda u = 0, \qquad (8.2.1)$$
with
$$L : V \times \Lambda \to W$$
$$(u, \lambda) \mapsto L_\lambda u.$$
We assume that $L_\lambda u$ is sufficiently often differentiable w.r.t. $u$ and
$\lambda$ so that all subsequent expansions are valid. The aim of bifurcation
theory is to study the set of solutions $u$ of (8.2.1) as $\lambda$ varies, to identify
the bifurcation values of $\lambda$, i.e. those values of $\lambda$ where the structure
of the solution set changes, and to investigate that structure at such
bifurcation points. In order to arrive at concrete results, we need an
additional assumption. We consider the derivative of $L_\lambda u$ w.r.t. $u$,
$$J_\lambda(u)v := (D_u L_\lambda(u))v := \frac{d}{dt} L_\lambda(u + tv) \Big|_{t=0} \qquad (8.2.2)$$
for $v \in V$. We assume that $J_\lambda$ is a Fredholm operator of index 0, i.e. that
$\ker J_\lambda$ and $\operatorname{coker} J_\lambda$ are of finite and equal dimension, and furthermore
that there exists a canonical isomorphism
$$\ker J_\lambda \cong \operatorname{coker} J_\lambda. \qquad (8.2.3)$$
We first consider the case where
$$L_{\lambda_0} u_0 = 0 \qquad (8.2.4)$$
$$\ker J_{\lambda_0}(u_0) = \{0\} \quad \text{for some } \lambda_0 \in \Lambda,\ u_0 \in V. \qquad (8.2.5)$$
We shall see that in this case, no bifurcation can occur at $\lambda_0$. Namely,
we have:
Theorem 8.2.1. Let $L_{\lambda_0} u_0 = 0$ for some $\lambda_0 \in \Lambda$, $u_0 \in V$, $\ker J_{\lambda_0}(u_0) = \{0\}$.
Then there exist neighbourhoods $U(\lambda_0)$ of $\lambda_0$ in $\Lambda$ and $V(u_0)$ of $u_0$
in $V$ such that for all $\lambda \in U(\lambda_0)$, there exists a unique $u \in V(u_0)$ with
$$L_\lambda u = 0.$$
Proof. Since $J_{\lambda_0}$ is assumed to be a Fredholm operator of index 0, (8.2.5)
implies that
$$J_{\lambda_0} : V \to W$$
is an isomorphism. Thus the derivative w.r.t. the variable $u$ of the map
$$(u, \lambda) \mapsto L_\lambda u$$
is an isomorphism at $(u_0, \lambda_0)$, and the implicit function Theorem 2.4.1
implies that the equation
$$L_\lambda u = 0$$
can be locally resolved w.r.t. $u$, i.e. there exist neighbourhoods
$U(\lambda_0)$, $V(u_0)$ and a map
$$U(\lambda_0) \to V(u_0)$$
$$\lambda \mapsto u(\lambda)$$
such that
$$L_\lambda u = 0$$
precisely if
$$u = u(\lambda).$$
q.e.d.
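A one-dimensional toy instance may help to see what Theorem 8.2.1 asserts. Assume, purely for illustration, $V = W = \Lambda = \mathbb{R}$ and $L_\lambda u = u + u^3 - \lambda$, so that $L_0 0 = 0$ and $J_0(0) = 1$ has trivial kernel. The sketch below follows the solution branch $\lambda \mapsto u(\lambda)$ near $\lambda_0 = 0$ by Newton's method; the function names and the sample parameter values are hypothetical.

```python
def L(u, lam):
    """Toy equation L_lambda(u) = u + u^3 - lambda (an assumed example)."""
    return u + u ** 3 - lam

def J(u):
    """Derivative of L with respect to u; it never vanishes, so its kernel is {0}."""
    return 1.0 + 3.0 * u * u

def solve(lam, u_start=0.0, tol=1e-14, max_iter=50):
    """Newton iteration for L_lambda(u) = 0, started at the known solution u_0 = 0."""
    u = u_start
    for _ in range(max_iter):
        step = L(u, lam) / J(u)
        u -= step
        if abs(step) < tol:
            break
    return u

branch = {lam: solve(lam) for lam in (-0.2, -0.1, 0.0, 0.1, 0.2)}
residuals = {lam: abs(L(u, lam)) for lam, u in branch.items()}
```

Because $J$ is invertible along the whole branch, each nearby $\lambda$ has exactly one solution close to $u_0 = 0$, and the solution depends monotonically on $\lambda$; no bifurcation occurs.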
We next consider the case where
L
Xo
u
0
= 0 (8.2.6)
K : kerJ\
0
(uo) is one-dimensional. (8.2.7)
The assumption that this kernel is one-dimensional may look restrictive,
but it is typically satisfied in variational problems, and in this situation,
we can already see the typical phenomena of bifurcation while avoid-
ing additional technical complications that arise for higher dimensional
kernels. In the sequel, we shall assume for simplicity
\[ u_0 = 0 \]
(which can always be achieved by changing the dependent variables in
our equation by a translation). In the sequel, we shall also usually write
$J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0) = J_{\lambda_0}(0)$. We may write
\[ V = K \oplus V_1 \tag{8.2.8} \]
and in view of (8.2.3), we may also write
\[ W = K \oplus W_1, \tag{8.2.9} \]
with
\[ W_1 = J_{\lambda_0}(V) = J_{\lambda_0}(V_1). \tag{8.2.10} \]
We denote by
\[ \pi : V \to K \]
the projection onto $K$ according to (8.2.8), and we consider $\pi(V)$ as a
subspace of $W$, according to (8.2.9). Thus, if
\[ u = \xi + w \]
with $\xi \in K$, $w \in V_1$, then
\[ \pi(u) = \xi. \]
In particular,
\[ \pi(0) = 0. \]
We consider the map
\[ A_{\lambda_0} : V \to W, \qquad u \mapsto L_{\lambda_0} u + \pi(u). \]
Lemma 8.2.1. $A_{\lambda_0}$ is a local diffeomorphism, i.e. the derivative
$DA_{\lambda_0} = DA_{\lambda_0}(0) : V \to W$ is an isomorphism.
Proof. The derivative is computed as
\[ DA_{\lambda_0} v = J_{\lambda_0} v + \pi(v) \quad \text{for } v \in V. \tag{8.2.11} \]
The Fredholm operator $J_{\lambda_0}$ yields a bijective continuous linear map between
$V_1$ and $W_1$ because of the decompositions (8.2.8), (8.2.9), (8.2.10),
and its inverse is likewise continuous (by Definition 2.3.1). From the
definition of $K$ and $\pi$ and (8.2.3) we then conclude that $DA_{\lambda_0}$ is an
isomorphism.
q.e.d.
We now consider the map
\[ A : V \times \Lambda \to W, \qquad (u, \lambda) \mapsto A_\lambda(u) := L_\lambda(u) + \pi(u). \]
By Lemma 2.3.4, there exists a neighbourhood $V(\lambda_0)$ of $\lambda_0$ in $\Lambda$ such
that for all $\lambda \in V(\lambda_0)$, $A_\lambda$ is a local diffeomorphism at $0$. We may therefore
apply the implicit function Theorem 2.4.1. Consequently, as
\[ A(0, \lambda_0) = 0, \tag{8.2.12} \]
there exist neighbourhoods $U(0)$ of $u_0 = 0$ in $V$, $U_1(0)$ of $0$ in $W$ such
that for all $\lambda \in V(\lambda_0)$ and $\xi \in U_1(0)$, there exists a unique $u \in U(0)$
with
\[ A(u, \lambda) = \xi, \tag{8.2.13} \]
i.e.
\[ L_\lambda u + \pi(u) = \xi. \tag{8.2.14} \]
We write
\[ u = u(\xi, \lambda) \]
for the solution $u$ of (8.2.13). We have in particular
\[ u(0, \lambda_0) = 0, \tag{8.2.15} \]
since $L_{\lambda_0} 0 = 0$, $\pi(0) = 0$ (remember $u_0 = 0$). In this notation, (8.2.13)
is
\[ A(u(\xi, \lambda), \lambda) = \xi. \]
The aim now is to find $\xi$ with
\[ \pi(u(\xi, \lambda)) = \xi, \tag{8.2.16} \]
because (8.2.14) will then give
\[ L_\lambda u(\xi, \lambda) = 0, \tag{8.2.17} \]
which is the equation that we wish to solve. Since the image of $\pi$ is
assumed to be one-dimensional (and in any case finite dimensional as $J_\lambda$
is supposed to be a Fredholm operator), we have reduced our bifurcation
problem to a finite dimensional problem. In the sequel, we shall thus let
$\xi$ vary only in $K$, the image of $\pi$. Thus, we may consider $\xi$ as a scalar
quantity, $\xi = \alpha\xi_0$, with $\alpha \in \mathbb{R}$, where $\xi_0$ is a generator of $K$. We denote
the derivative of $u(\alpha\xi_0, \lambda)$ w.r.t. $\alpha$ and $\lambda$, respectively, at $\alpha = 0$, $\lambda = \lambda_0$
by
\[ \partial_\alpha u \ \text{and} \ \partial_\lambda u, \ \text{respectively.} \]
(Note that $\lambda$ in general is not a scalar quantity, as we do not assume
that $\Lambda$ is one-dimensional.) Differentiating (8.2.14) w.r.t. $\alpha$ yields
\[ J_{\lambda_0} \partial_\alpha u + \pi(\partial_\alpha u) = \xi_0. \tag{8.2.18} \]
Since $\xi_0 \in K$, also
\[ J_{\lambda_0} \xi_0 + \pi(\xi_0) = \xi_0. \tag{8.2.19} \]
Lemma 8.2.1 then implies
\[ \partial_\alpha u = \xi_0. \tag{8.2.20} \]
We are now ready for the essential point, namely the asymptotic expansion
of the equation (8.2.16), i.e.
\[ \pi(u(\xi, \lambda)) = \xi \tag{8.2.21} \]
near $\xi = 0$, $\lambda = \lambda_0$.
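We let $\partial_\alpha^2 u$, $\partial_\lambda^2 u$ be the second derivatives of $u(\alpha\xi_0, \lambda)$ w.r.t. $\alpha$ and $\lambda$,
respectively, at $\alpha = 0$, $\lambda = \lambda_0$, and likewise $\partial_{\alpha\lambda}^2 u$ be the mixed second
derivative w.r.t. $\alpha$ and $\lambda$. Higher derivatives will be denoted similarly by
corresponding symbols. Writing $\mu$ for the increment $\lambda - \lambda_0$, the Taylor
expansion of (8.2.16) then is
\[ \begin{aligned}
\xi = \pi(u(\xi, \lambda)) = {} & \pi(0) + \pi(\partial_\alpha u)\alpha + \pi(\partial_\lambda u)\mu \\
& + \tfrac{1}{2}\pi(\partial_\alpha^2 u)\alpha^2 + \pi(\partial_{\alpha\lambda}^2 u)\alpha\mu + \tfrac{1}{2}\pi(\partial_\lambda^2 u)(\mu, \mu) \\
& + \text{terms of higher order in } \alpha \text{ and } \mu.
\end{aligned} \tag{8.2.22} \]
Since $\pi(0) = 0$ and since, by (8.2.20), $\partial_\alpha u = \xi_0$, hence $\pi(\partial_\alpha u)\alpha = \alpha\xi_0 = \xi$,
we may write (8.2.22) as
\[ \begin{aligned}
0 = {} & \pi(\partial_\lambda u)\mu + \tfrac{1}{2}\pi(\partial_\alpha^2 u)\alpha^2 + \text{higher order terms in } \alpha \text{ only} \\
& + \pi(\partial_{\alpha\lambda}^2 u)\alpha\mu + \text{higher order terms that also involve } \mu.
\end{aligned} \tag{8.2.23} \]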
Remark 8.2.1. In order to interpret the terms in this expansion, we
differentiate (8.2.14), i.e.
\[ L_\lambda u(\xi, \lambda) + \pi(u(\xi, \lambda)) = \xi = \alpha\xi_0 \tag{8.2.24} \]
twice w.r.t. $\alpha$. One differentiation yields
\[ J_\lambda(u)\partial_\alpha u + \pi(\partial_\alpha u) = \xi_0, \tag{8.2.25} \]
and differentiating once more gives
\[ DJ_\lambda (\partial_\alpha u)^2 + J_\lambda \partial_\alpha^2 u + \pi(\partial_\alpha^2 u) = 0. \tag{8.2.26} \]
We put $\lambda = \lambda_0$ and project onto $K$ in the decomposition (8.2.9). We
may also denote that projection by $\pi$, and we then have $\pi \circ J_{\lambda_0} = 0$.
Also, from (8.2.20), $\partial_\alpha u = \xi_0$, and so we get
\[ \pi(DJ_{\lambda_0}\, \xi_0^2) = -\pi(\partial_\alpha^2 u). \tag{8.2.27} \]
Thus, the first term in the expansion of $\xi$ in (8.2.24) can be expressed via
$DJ_\lambda$. In a variational context, $J_\lambda$ represents the second variation, and so
$DJ_\lambda$ represents the third variation of the variational integral. Likewise,
if $\partial_\alpha^2 u$ vanishes, i.e. if the third variation vanishes on the Jacobi field $\xi_0$,
then $\pi(\partial_\alpha^3 u)$ can be expressed by the fourth variation, and so on.
We now discuss the simplest case of a bifurcation, namely where
\[ \pi(\partial_\alpha^2 u) \neq 0. \tag{8.2.28} \]
We put $\alpha =: t\tau$, $\mu =: t^2\tilde\mu$, $a_0 := \pi(\partial_\lambda u)\tilde\mu$, $a_1 := \tfrac{1}{2}\pi(\partial_\alpha^2 u)$, and (8.2.23)
becomes
\[ 0 = a_0 t^2 + a_1 t^2 \tau^2 + t^2 E(t, \tau, \tilde\mu) \tag{8.2.29} \]
with
\[ E(t, \tau, \tilde\mu) = O(t) \quad \text{for fixed } \tau, \tilde\mu \text{ for } t \to 0. \tag{8.2.30} \]
For $t \neq 0$, (8.2.29) is equivalent to
\[ 0 = a_0 + a_1 \tau^2 + E(t, \tau, \tilde\mu). \tag{8.2.31} \]
We shall now see by a simple application of the implicit function theorem
that the bifurcation behaviour of equation (8.2.31) is equivalent to the
one of
\[ 0 = a_0 + a_1 \tau^2. \tag{8.2.32} \]
We assume $a_0 \neq 0$; as will be discussed below (see Lemma 8.2.2), this
can be derived from a suitable assumption about the variation of $L_\lambda$ as a
function of $\lambda$. (8.2.28) of course means that $a_1 \neq 0$. If $a_0/a_1 > 0$, then there
is no solution $\tau$ of (8.2.32), whereas for $a_0/a_1 < 0$, we have two solutions
$\tau_1, \tau_2$. We keep $\tilde\mu$ fixed for the moment and write (8.2.31) as
\[ 0 = a_0 + a_1 \tau^2 + E(t, \tau, \tilde\mu) =: \Phi(t, \tau). \tag{8.2.33} \]
We consider the case $a_0/a_1 < 0$ with the solutions $\tau_1, \tau_2$ of (8.2.32). As
$E(0, \tau, \tilde\mu) = 0$, we have
\[ \Phi(0, \tau_i) = 0 \quad \text{for } i = 1, 2, \tag{8.2.34} \]
whereas
\[ \frac{\partial}{\partial\tau}\Phi(0, \tau_i) \neq 0, \quad \text{because of } a_0, a_1 \neq 0 \text{ and (8.2.30).} \tag{8.2.35} \]
The implicit function theorem then implies the existence of (locally
unique) functions
\[ \tau_i(t) : (-\epsilon, \epsilon) \to \mathbb{R} \]
for $i = 1, 2$, $0 < |t| < \epsilon$, for some $\epsilon > 0$ that satisfy
\[ \Phi(t, \tau_i(t)) = 0. \tag{8.2.36} \]
We have thus found two solutions $\tau_1(t), \tau_2(t)$ of (8.2.33), hence (8.2.22),
hence (8.2.16), hence (8.2.17), i.e. (8.2.1) for $t \neq 0$, for the parameters
\[ \lambda_t = \lambda_0 + t^2\tilde\mu. \tag{8.2.37} \]
In the other case, $a_0/a_1 > 0$, (8.2.30) implies that for sufficiently small
$|t| \neq 0$, there is no solution of (8.2.33), i.e. of (8.2.1). Thus, as promised,
the bifurcation behaviour in case $\pi(\partial_\alpha^2 u) \neq 0$ ((8.2.28)) is completely
described by the simple quadratic equation (8.2.32). Of course, replacing
$\tilde\mu$ by $-\tilde\mu$ changes the sign of $a_0$ and thus interchanges the cases $a_0/a_1 > 0$
and $a_0/a_1 < 0$.
We summarize our result in:
Theorem 8.2.2. We consider a parameter dependent family of equations
\[ L_\lambda u = 0 \tag{8.2.38} \]
as above,
\[ V \times \Lambda \to W, \qquad (u, \lambda) \mapsto L_\lambda u, \]
where $V$, $W$ are Banach spaces and $\Lambda$ is an open subset of some Banach
space, and $L_\lambda u$ is smooth in $u$ and $\lambda$. We suppose that
\[ L_{\lambda_0} 0 = 0, \]
and we wish to find the solutions of $L_\lambda u = 0$ in the vicinity of $0$ as $\lambda$
varies in the vicinity of $\lambda_0$.
With
\[ J_\lambda(u)v := (D_u L_\lambda(u))v = \frac{d}{dt} L_\lambda(u + tv)\Big|_{t=0}, \]
we assume that there is a canonical isomorphism
\[ \ker J_\lambda \cong \operatorname{coker} J_\lambda \quad \text{(see (8.2.3))}, \tag{8.2.39} \]
and we let
\[ \pi : V \to \ker J_{\lambda_0} \qquad (J_{\lambda_0} = J_{\lambda_0}(0)) \]
be a projection as defined above (see (8.2.8)-(8.2.10)). We assume furthermore
that
\[ \dim \ker J_{\lambda_0} = 1 \quad \text{(see (8.2.7))}. \tag{8.2.40} \]
We assume that there exists some $\mu$ with
\[ a_0 := \pi(\partial_\lambda u)\mu \ \Big(= \frac{d}{dt}\pi(u(0, \lambda_0 + t\mu))\Big|_{t=0}\Big) \neq 0 \tag{8.2.41} \]
(see Lemma 8.2.2 below), and also
\[ 2a_1 := \pi(\partial_\alpha^2 u) \neq 0 \tag{8.2.42} \]
(nonvanishing of the third variation, see Remark 8.2.1). Then there exist
$\epsilon > 0$ and a variation $\lambda_t = \lambda_0 + t^2\mu$ of $\lambda_0$ with the property that for
$0 < t < \epsilon$, there exists a neighbourhood $U_t$ of $0$ in $V$ such that the
number of solutions $u \in U_t$ of
\[ L_{\lambda_t} u = 0 \tag{8.2.43} \]
equals the number of solutions of the quadratic equation
\[ a_0 + a_1 \tau^2 = 0. \tag{8.2.44} \]
q.e.d.
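The comparison between the reduced quadratic equation and its perturbation can be checked numerically. The sketch below is an illustration under assumptions and not part of the text: the remainder $E(t, \tau) = t\tau^3$ is a hypothetical choice with $E = O(t)$, and solutions are only counted in a bounded window of $\tau$, in the spirit of the neighbourhood $U_t$ of the theorem.

```python
def count_sign_changes(f, lo, hi, n=4000):
    """Count sign changes of f on a uniform grid over [lo, hi] (simple roots only)."""
    count = 0
    prev = f(lo)
    for k in range(1, n + 1):
        x = lo + (hi - lo) * k / n
        cur = f(x)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count


def reduced_count(a0, a1):
    """Number of real solutions of a0 + a1 * tau^2 = 0 for a0, a1 != 0."""
    return 2 if a0 / a1 < 0 else 0


def perturbed_count(a0, a1, t, window=5.0):
    """Count solutions of a0 + a1*tau^2 + E(t, tau) = 0 with |tau| <= window.

    E(t, tau) = t * tau^3 is a hypothetical remainder with E = O(t).
    """
    return count_sign_changes(
        lambda tau: a0 + a1 * tau**2 + t * tau**3, -window, window
    )
```

For small $t$, `perturbed_count` agrees with `reduced_count` in both sign cases of $a_0/a_1$; the additional far-away root of the cubic lies outside the window and plays no role locally.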
Remark 8.2.2. Since $\ker J_{\lambda_0}$, the image of $\pi$, is assumed to be one-dimensional,
we have simply considered $\pi(\partial_\lambda u)$, $\pi(\partial_\alpha^2 u)$ as scalar quantities.
We now consider the case where
\[ \pi(\partial_\alpha^2 u) = 0, \tag{8.2.45} \]
but
\[ \pi(\partial_\alpha^3 u) \neq 0. \tag{8.2.46} \]
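(8.2.23) then becomes
\[ 0 = \pi(\partial_\lambda u)\mu + \tfrac{1}{6}\pi(\partial_\alpha^3 u)\alpha^3 + \pi(\partial_{\alpha\lambda}^2 u)\alpha\mu + \text{higher order terms.} \tag{8.2.47} \]
For a complete description of the bifurcation behaviour, this time we
need to consider a two parameter variation. We assume that there exist
$\mu_1, \mu_2$ with
\[ \pi(\partial_\lambda u)\mu_1 \neq 0 \quad \text{(see Lemma 8.2.2 below)} \tag{8.2.48} \]
and
\[ \pi(\partial_{\alpha\lambda}^2 u)\mu_2 \neq 0, \quad \text{but} \quad \pi(\partial_\lambda u)\mu_2 = 0. \tag{8.2.49} \]
We put $\alpha := t\tau$, $\mu = t^3 b_1\mu_1 + t^2 b_2\mu_2$, with parameters $b_1, b_2$, and rewrite
(8.2.47) as
\[ \begin{aligned}
0 &= t^3\big(\pi(\partial_\lambda u)\mu_1 b_1 + \pi(\partial_{\alpha\lambda}^2 u)\mu_2 b_2 \tau + \tfrac{1}{6}\pi(\partial_\alpha^3 u)\tau^3 + E(t, \tau, \mu_1, \mu_2)\big) \\
&=: c_0 t^3\big(a_0 + a_1\tau + \tau^3 + E(t, \tau, \mu_1, \mu_2)\big),
\end{aligned} \tag{8.2.50} \]
with $c_0 = \tfrac{1}{6}\pi(\partial_\alpha^3 u) \neq 0$ (see (8.2.46)). For $t \neq 0$, this is equivalent to
\[ 0 = a_0 + a_1\tau + \tau^3 + E(t, \tau, \mu_1, \mu_2). \tag{8.2.51} \]
Again
\[ E(t, \tau, \mu_1, \mu_2) = O(t) \quad \text{as } t \to 0, \text{ for fixed } \tau, \mu_1, \mu_2. \tag{8.2.52} \]
As before, we may thus invoke the implicit function theorem to conclude
that the qualitative description of the bifurcation behaviour is furnished
by the solution structure of the cubic equation
\[ 0 = a_0 + a_1\tau + \tau^3. \tag{8.2.53} \]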
In particular, locally there exist at most three solutions. We summarize
our result in:
Theorem 8.2.3. As in Theorem 8.2.2, assume the general conditions
(8.2.38)-(8.2.40). Furthermore, we assume that there exist parameter
variations $\mu_1, \mu_2$ with
\[ \pi(\partial_\lambda u)\mu_1 \neq 0, \tag{8.2.54} \]
\[ \pi(\partial_{\alpha\lambda}^2 u)\mu_2 \neq 0, \quad \text{but} \quad \pi(\partial_\lambda u)\mu_2 = 0 \quad \text{(see (8.2.48), (8.2.49))}. \tag{8.2.55} \]
Then there exist $\epsilon > 0$ and a two-parameter variation of $\lambda_0$,
\[ \lambda_t = \lambda_0 + t^3 b_1\mu_1 + t^2 b_2\mu_2, \tag{8.2.56} \]
such that for $0 < t < \epsilon$, there exists a neighbourhood $U_t$ of $0$ in $V$ for
which the number of solutions $u \in U_t$ of
\[ L_{\lambda_t} u = 0 \tag{8.2.57} \]
equals the number of solutions of the cubic equation
\[ a_0 + a_1\tau + \tau^3 = 0. \tag{8.2.58} \]
($a_0 = 6\pi(\partial_\lambda u)b_1\mu_1/\pi(\partial_\alpha^3 u)$, $a_1 = 6\pi(\partial_{\alpha\lambda}^2 u)b_2\mu_2/\pi(\partial_\alpha^3 u)$, noting Remark
8.2.2 again.) q.e.d.
What we are seeing in Theorem 8.2.3 is the so-called cusp catastrophe
(in the language of R. Thom's theory of catastrophes), the bifurcation of
the zero set of a cubic polynomial depending on the parameters $a_0$, $a_1$. In
the same manner, one may also identify conditions where the bifurcation
behaviour is described by other so-called elementary catastrophes, as
classified by R. Thom (see e.g. Th. Bröcker, Differentiable Germs and
Catastrophes, LMS Lect. Notes 17, Cambridge Univ. Press, Cambridge,
1975). The higher the order of the polynomial involved, however, the
more independent parameters one needs. The general idea is that the
singular behaviour at a bifurcation point, in particular the nonsmooth
structure of the solution set at such a point, is simply the result of the
projection of a smooth hypersurface in the product of the solution space
and the parameter space onto the solution space. The singularity arises
because that hypersurface happens to have a vertical tangent plane over
the solution space at the bifurcation point.
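The solution structure of the cubic $a_0 + a_1\tau + \tau^3 = 0$ in the cusp picture can be read off from its discriminant $-4a_1^3 - 27a_0^2$. The following sketch counts the distinct real roots as a function of the two parameters; it is an illustration using the standard discriminant of a depressed cubic, not a construction taken from the text.

```python
def cubic_real_root_count(a0, a1):
    """Number of distinct real roots of tau^3 + a1*tau + a0 = 0.

    Uses the discriminant -4*a1^3 - 27*a0^2 of the depressed cubic:
    positive -> three real roots, negative -> one real root,
    zero -> a multiple root (the cusp curve in the (a0, a1) plane).
    """
    disc = -4.0 * a1**3 - 27.0 * a0**2
    if disc > 0:
        return 3
    if disc < 0:
        return 1
    # On the cusp curve: a triple root at the origin, otherwise a double root
    # plus a simple one.
    return 1 if a0 == 0 and a1 == 0 else 2
```

Three solutions occur only for $a_1 < 0$ and small $|a_0|$, i.e. inside the cusp $4a_1^3 + 27a_0^2 < 0$; crossing the cusp curve changes the number of solutions from three to one.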
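In order to discuss the assumption (8.2.41), (8.2.54), we provide
Lemma 8.2.2. Assume that for every $\beta \in \mathbb{R}$, there exists some $\mu$ with
\[ \beta = \pi\big(D_\lambda L_{\lambda_0} u(0, \lambda_0)\big)(\mu) \quad \Big(:= \pi\Big(\frac{d}{dt} L_{\lambda_0 + t\mu} u(0, \lambda_0)\Big)\Big|_{t=0}\Big). \tag{8.2.59} \]
(Again, we write $\beta$ in place of $\beta\xi_0$ and consider it as a scalar quantity,
as the image of $\pi$ is assumed to be one-dimensional.) Then for every
$\beta \in \mathbb{R}$, there exists some $\mu$ with
\[ \pi((\partial_\lambda u)\mu) = \beta. \tag{8.2.60} \]
Proof. We abbreviate
\[ \lambda_t = \lambda_0 + t\mu \quad (\text{as } \Lambda \text{ is open, } \lambda_0 + t\mu \in \Lambda \text{ for sufficiently small } |t|). \]
By (8.2.14)
\[ L_{\lambda_t} u(\xi, \lambda_t) + \pi(u(\xi, \lambda_t)) = \xi. \]
Taking the derivative w.r.t. $t$ at $t = 0$ and $\xi = 0$ gives
\[ \begin{aligned}
\pi((\partial_\lambda u)\mu) &= -\frac{d}{dt}\big(L_{\lambda_t} u(0, \lambda_t)\big)\Big|_{t=0} \\
&= -\Big(\frac{d}{dt} L_{\lambda_t}\Big) u(0, \lambda_0)\Big|_{t=0} - (D_u L_{\lambda_0})\,\frac{d}{dt} u(0, \lambda_t)\Big|_{t=0}.
\end{aligned} \]
Since $D_u L_{\lambda_0} = J_{\lambda_0}$, and $\pi \circ J_{\lambda_0} = 0$ by definition of $\pi$, applying $\pi$ to
both sides of the preceding equation gives
\[ \pi((\partial_\lambda u)\mu) = -\pi\big(D_\lambda L_{\lambda_0} u(0, \lambda_0)\big)(\mu), \]
and by assumption (8.2.59), we may find $\mu$ for which the right-hand side
becomes $\beta$. (We take $-\beta$ in place of $\beta$ in (8.2.59).)
q.e.d.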
The approach to bifurcation theory presented here originated with
L. Lichtenstein, Untersuchung über zweidimensionale reguläre Variationsprobleme,
Monatsh. Math. Phys. 28 (1917), and was developed in
X. Li-Jost, Eindeutigkeit und Verzweigung von Minimalflächen, Thesis,
Bonn, 1991, see also X. Li-Jost, Bifurcation near solutions of variational
problems with degenerate second variation, Manuscr. Math. 86 (1995),
1-14, J. Jost, X. Li-Jost, X. W. Peng, Bifurcation of minimal surfaces
in Riemannian manifolds, Trans. AMS 347 (1995), 51-62, Correction
ibid. 349 (1997), 4689-90.
The reduction of a bifurcation problem in an infinite dimensional setting
to a finite dimensional one is an example of the Lyapunov-Schmidt
reduction which we now wish to discuss.
As before, we consider a parameter dependent family of equations
\[ L_\lambda u = 0 \tag{8.2.61} \]
with
\[ V \times \Lambda \to W, \qquad (u, \lambda) \mapsto L_\lambda u \]
($V$, $W$ Banach spaces, $\Lambda$ an open subset of some Banach space) near
$(u_0, \lambda_0)$ with
\[ L_{\lambda_0} u_0 = 0. \tag{8.2.62} \]
Again, we assume that $J_\lambda(u) = D_u L_\lambda(u)$ is a Fredholm operator. Thus
\[ V_0 := \ker J_{\lambda_0}(u_0) \]
is finite dimensional, and we have decompositions
\[ V = V_0 \oplus V_1 \tag{8.2.63} \]
\[ W = W_0 \oplus W_1, \quad \text{with } W_1 = R(J_{\lambda_0}(u_0)),\ W_0 \text{ finite dimensional.} \tag{8.2.64} \]
($R$ denotes the range of an operator as in Definition 2.3.1.) We let
\[ \pi : W \to W_0 \]
be the projection defined by the decomposition (8.2.64). Then our equation
$L_\lambda u = 0$ is equivalent to
\[ \pi L_\lambda u = 0 \quad \text{and} \quad (\mathrm{Id} - \pi)L_\lambda u = 0. \tag{8.2.65} \]
We first consider
\[ (\mathrm{Id} - \pi)L : V_1 \times V_0 \times \Lambda \to W_1, \]
and with $X := V_0 \times \Lambda$, we write
\[ (\mathrm{Id} - \pi)L_\lambda(v'' + v') =: g(v'', v', \lambda) \quad \text{with } v' \in V_0,\ v'' \in V_1,\ \lambda \in \Lambda. \]
Then, at $v'' + v' = u_0$,
\[ D_{v''}\, g(v'', v', \lambda_0) = (\mathrm{Id} - \pi) D_u L_{\lambda_0}(v'' + v')\big|_{V_1} : V_1 \to W_1 \]
is an isomorphism by definition of $V_1$, $W_1$; namely it is simply $J_{\lambda_0}(u_0)$,
considered as a map from $V_1$ to $W_1$. Therefore, by the implicit function
Theorem 2.4.1, near $(u_0, \lambda_0)$, we may find a unique
\[ v'' = \varphi(v', \lambda) \]
with
\[ (\mathrm{Id} - \pi)L_\lambda(v' + \varphi(v', \lambda)) = 0. \tag{8.2.66} \]
Thus $u = v' + \varphi(v', \lambda)$ solves $L_\lambda u = 0$ if and only if
\[ \pi L_\lambda(v' + \varphi(v', \lambda)) = 0. \tag{8.2.67} \]
Equation (8.2.67) is a finite dimensional system of equations, because
the image of $\pi$, $W_0$, is finite dimensional. This is a Lyapunov-Schmidt
reduction, and we have seen an instance of this in detail in the preceding
for the case where $V_0$ and $W_0$ are one-dimensional. A general reference
for this and other topics and methods in bifurcation theory is S. N. Chow,
J. Hale, Methods of Bifurcation Theory, Springer, New York, 1982.
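The two steps of the Lyapunov-Schmidt reduction can be traced in a finite dimensional toy problem. The sketch below assumes a hypothetical map $L_\lambda(x, y) = (\lambda x - x^3 + xy,\ y + x^2)$ on $\mathbb{R}^2$, which is not from the text. At $(0, 0)$, $\lambda_0 = 0$, its derivative has kernel $V_0$ spanned by the $x$-direction and range $W_1$ spanned by the second coordinate, so $\pi$ is the projection onto the first coordinate. The range equation is solved for $y = \varphi(x, \lambda)$, and the remaining scalar equation is the reduced problem.

```python
def L(x, y, lam):
    """Hypothetical map R^2 x R -> R^2 with L(0, 0, 0) = 0 (illustration only)."""
    return (lam * x - x**3 + x * y, y + x**2)


def phi(x, lam, tol=1e-14, max_iter=50):
    """Solve the range equation (Id - pi) L = 0 for y = phi(x, lam) by Newton's method."""
    y = 0.0
    for _ in range(max_iter):
        residual = L(x, y, lam)[1]
        dresidual_dy = 1.0  # derivative of y + x^2 with respect to y
        step = residual / dresidual_dy
        y -= step
        if abs(step) < tol:
            break
    return y


def reduced(x, lam):
    """Bifurcation function pi L(x, phi(x, lam), lam): one scalar equation in x."""
    return L(x, phi(x, lam), lam)[0]
```

Here $\varphi(x, \lambda) = -x^2$ and the reduced equation is $\lambda x - 2x^3 = 0$, so besides $x = 0$ two further solutions $x = \pm\sqrt{\lambda/2}$ appear for $\lambda > 0$.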
8.3 The existence of catenoids as an example of a bifurcation
process
We consider the variational problem
\[ I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \tag{8.3.1} \]
with
\[ F(t, u, \dot u) = u\sqrt{1 + \dot u^2}. \tag{8.3.2} \]
This variational problem is of the type considered in Section 1.1 of Part I.
$I(u)$ with $F$ given by (8.3.2) is, up to the constant factor $2\pi$, the area of
the surface of revolution obtained by rotating the curve $u(t)$, $a \le t \le b$,
about the $t$-axis. Critical points are so-called minimal surfaces of
revolution. According to Theorem 1.1.1 of Part I, the corresponding
Euler-Lagrange equation is computed as
\[ \begin{aligned}
0 &= \frac{d}{dt} F_p(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)) \\
&= F_{pp}(t, u(t), \dot u(t))\,\ddot u(t) + F_{pu}(t, u(t), \dot u(t))\,\dot u(t) \\
&\quad + F_{pt}(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)),
\end{aligned} \]
which in the present case becomes
\[ \frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1 + \dot u^2}}\Big) - \sqrt{1 + \dot u^2} = 0, \tag{8.3.3} \]
or equivalently
\[ u\ddot u - \dot u^2 - 1 = 0. \tag{8.3.4} \]
By (1.1.7) of Part I, we have $F - \dot u F_p = $ constant, since $F_t = 0$, hence
in our case $F = u\sqrt{1 + \dot u^2}$,
\[ \frac{u}{\sqrt{1 + \dot u^2}} = \text{constant} =: \lambda. \]
Therefore, for $\lambda \neq 0$, the general solution of (8.3.4) is
\[ u(t) = \lambda\cosh\Big(\frac{t - t_0}{\lambda}\Big) \]
with parameters $\lambda$, $t_0$. Here $\lambda \neq 0$, and we may assume $\lambda > 0$ as the case
$\lambda < 0$ is symmetric to the case $\lambda > 0$. Also, since $t_0$ just represents a
translation of the independent variables, we may assume $t_0 = 0$, i.e.
\[ u(t) = \lambda\cosh\Big(\frac{t}{\lambda}\Big). \tag{8.3.5} \]
The curve $u(t)$ is called a catenary, and the minimal surface of revolution
obtained by revolving $u(t)$ about the $t$-axis is called a catenoid. For the
sake of normalization, we consider the interval $I = [-1, 1]$. In order to
use the general theory of Section 8.2, we need to choose appropriate
Banach spaces $V$, $W$ and $\Lambda = \mathbb{R}$ and consider the operator
\[ \begin{aligned}
L_\lambda &: V \times \Lambda \to W \\
(u, \lambda) &\mapsto \Big(\frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1 + \dot u^2}}\Big) - \sqrt{1 + \dot u^2},\ u(1) - \lambda\cosh\frac{1}{\lambda},\ u(-1) - \lambda\cosh\frac{1}{\lambda}\Big).
\end{aligned} \tag{8.3.6} \]
On the right hand side, we have a differential operator of second order
and a Dirichlet boundary condition. The boundary values are real numbers,
and so $W$ should contain $\mathbb{R}^2$ as a factor as we have two boundary
points. Otherwise, $V$ and $W$ should differ by two orders of differentiability.
Thus, possible choices are Sobolev spaces
\[ V = W^{k+2,p}(I), \quad W = W^{k,p}(I) \times \mathbb{R}^2 \quad \text{for some } k, p \]
or spaces of differentiable functions
\[ V = C^{k+2}(I), \quad W = C^k(I) \times \mathbb{R}^2. \]
Here, we shall take
\[ V = W^{2,2}(I), \quad W = L^2(I) \times \mathbb{R}^2, \tag{8.3.7} \]
but the reader should also convince herself or himself that the other
choices work as well, although the space $L^2$ will always play some auxiliary
role.
In the sequel, we shall denote the scalar product in $L^2(I)$ by $(\cdot, \cdot)_{L^2}$,
i.e.
\[ (w_1, w_2)_{L^2} = \int_{-1}^{1} w_1(t) w_2(t)\,dt \]
for $w_1, w_2 \in L^2(I)$. The scalar product on $W = L^2(I) \times \mathbb{R}^2$ for
$\bar w_1 = (w_1, s_1)$, $\bar w_2 = (w_2, s_2)$ ($w_1, w_2 \in L^2(I)$, $s_1, s_2 \in \mathbb{R}^2$),
\[ (\bar w_1, \bar w_2)_W = (w_1, w_2)_{L^2} + s_1 \cdot s_2 \]
is obtained from the scalar products on $L^2(I)$ and on $\mathbb{R}^2$.
The Jacobi operator $J_\lambda(u)v = D_u L_\lambda(u)v$ is computed at the catenary $u$
given by (8.3.5). In order to determine the kernel of $J_\lambda(u)$, we need to solve
the equation
\[ J_\lambda(u)v = 0. \tag{8.3.9} \]
This is equivalent to
\[ \ddot v(t) - \frac{2}{\lambda}\tanh\Big(\frac{t}{\lambda}\Big)\dot v(t) + \frac{1}{\lambda^2} v(t) = 0 \tag{8.3.10} \]
\[ v(-1) = v(1) = 0. \tag{8.3.11} \]
The space of solutions of (8.3.10) is generated by
\[ v_1(t) = \sinh\frac{t}{\lambda}, \qquad v_2(t) = \cosh\frac{t}{\lambda} - \frac{t}{\lambda}\sinh\frac{t}{\lambda}. \]
(These solutions are simply obtained by differentiating the general solution
$\lambda\cosh\big(\frac{t - t_0}{\lambda}\big)$ of (8.3.4) w.r.t. the parameters $\lambda$ and $t_0$ (at $t_0 = 0$),
cf. Theorem 1.3.3 of Part I.) The boundary condition (8.3.11) cannot be
satisfied by $v_1$, and so we have to find out for which values of $\lambda$
\[ v(t) := v_2(t) = \cosh\frac{t}{\lambda} - \frac{t}{\lambda}\sinh\frac{t}{\lambda} \tag{8.3.12} \]
satisfies $v(1) = v(-1) = 0$. This is the case precisely if
\[ \lambda = \tanh\frac{1}{\lambda}. \tag{8.3.13} \]
We agreed above to consider only positive values of $\lambda$, and this equation
has precisely one positive solution which we denote by $\lambda_0$, and likewise,
we put
\[ u_0(t) = \lambda_0\cosh\Big(\frac{t}{\lambda_0}\Big), \]
cf. (8.3.5).
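The critical parameter $\lambda_0$ has no closed form, but it is easy to approximate. The following sketch locates the positive zero of $\lambda \mapsto v(1) = \cosh(1/\lambda) - (1/\lambda)\sinh(1/\lambda)$ by bisection; the bracketing interval $[0.5, 1]$ is an assumption of this illustration, chosen because $v(1)$ changes sign on it.

```python
import math


def jacobi_field_boundary_value(lam):
    """v(1) for the Jacobi field v(t) = cosh(t/lam) - (t/lam) sinh(t/lam)."""
    s = 1.0 / lam
    return math.cosh(s) - s * math.sinh(s)


def solve_lambda0(lo=0.5, hi=1.0, tol=1e-13):
    """Bisection for the zero of lam -> v(1); assumes a sign change on [lo, hi]."""
    f_lo = jacobi_field_boundary_value(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = jacobi_field_boundary_value(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


lambda0 = solve_lambda0()
```

The computed value satisfies $\lambda_0 = \tanh(1/\lambda_0)$ up to rounding and lies between 0.8335 and 0.8336.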
The only solutions of (8.3.10), (8.3.11) are $\alpha v(t)$ with $\alpha \in \mathbb{R}$ and $v(t)$
given in (8.3.12), and so we have
\[ \dim\ker J_{\lambda_0}(u_0) = 1. \tag{8.3.14} \]
We call $v$ a weak solution of the Jacobi equation if
\[ \int_I v(t)\,\frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}\,\dot\eta(t)\Big)dt + \int_I \frac{1}{\lambda_0^2\cosh^2\frac{t}{\lambda_0}}\,v(t)\eta(t)\,dt = 0 \tag{8.3.15} \]
for all $\eta \in C_0^\infty(I)$.
In the sequel, we shall need a little regularity result, namely that any
solution $v$ of (8.3.15) of class $L^2(I)$ is automatically smooth, in fact of
class $C^\infty(I)$. As we are dealing with a one-dimensional problem here,
this result is not too hard to demonstrate, but since that would lead us
too far astray, we omit the proof. It can be found in most good books on
differential equations or functional analysis, e.g. K. Yosida, Functional
Analysis, Springer-Verlag, Berlin, 5th edition, 1978, pp. 177-82.
Of course, if $v$ is of class $C^2$, (8.3.15) is equivalent to
\[ \int_I \Big(\frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}\,\dot v(t)\Big) + \frac{1}{\lambda_0^2\cosh^2\frac{t}{\lambda_0}}\,v(t)\Big)\eta(t)\,dt = 0 \tag{8.3.16} \]
for all $\eta \in C_0^\infty(I)$,
and by Lemma 1.1.1 of Part I, this is equivalent to $v$ being a solution of
the Jacobi equation.
We shall now identify $\ker J_{\lambda_0}(u_0)$ and $\operatorname{coker} J_{\lambda_0}(u_0)$ as required in
(8.2.3). We shall simply write $J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0)$. According to our
choice (8.3.7), we consider $J_{\lambda_0}$ as an operator
\[ J_{\lambda_0} : W^{2,2}(I) \to L^2(I) \times \mathbb{R}^2. \]
If $w \in R_0(J_{\lambda_0}) := R\big(J_{\lambda_0}|_{H_0^{2,2}(I)}\big)$, i.e. if there exists $v \in H_0^{2,2}(I)$ with
$J_{\lambda_0} v = w$, then for all $\varphi \in \ker J_{\lambda_0}$
\[ (w, \varphi)_W = (J_{\lambda_0} v, \varphi)_{L^2} = (v, J_{\lambda_0}\varphi)_{L^2} = 0 \]
(in the same manner as the equivalence of (8.3.15) and (8.3.16) and
noting that $\varphi$ is smooth and $v$ and $\varphi$ both vanish on $\partial I$). Thus if
$w \in R_0(J_{\lambda_0})$, then also $w \in (\ker J_{\lambda_0})^\perp$, where $\perp$ denotes the orthogonal
complement in the Hilbert space $L^2(I)$, as in Corollary 2.2.4. Consequently,
if we denote the closure of a linear subspace $M$ of $L^2(I) \times \mathbb{R}^2$
by $\overline{M}$, then also
\[ \overline{R_0(J_{\lambda_0})} \subset (\ker J_{\lambda_0})^\perp. \]
Conversely, if $w \in R_0(J_{\lambda_0})^\perp$ $\big(= (\overline{R_0(J_{\lambda_0})})^\perp\big)$, then
\[ (w, J_{\lambda_0} v)_W = 0 \quad \text{for all } v \in H_0^{2,2}(I). \]
By the regularity result mentioned above, this implies that $w$ is smooth,
and so we may integrate by parts to get
\[ (w, J_{\lambda_0} v)_W = (J_{\lambda_0} w, v)_W \quad \text{for all } v \in H_0^{2,2}(I) \]
and hence $w \in \ker J_{\lambda_0}$. Altogether, we have shown that
\[ \ker J_{\lambda_0} = R_0(J_{\lambda_0})^\perp. \]
Since, according to Corollary 2.2.4, we have the decomposition
\[ L^2(I) = \overline{R_0(J_{\lambda_0})} \oplus R_0(J_{\lambda_0})^\perp, \]
we may also consider $R_0(J_{\lambda_0})^\perp$ as $\operatorname{coker} J_{\lambda_0}$, and so we get the required
identification
\[ \ker J_{\lambda_0} \cong \operatorname{coker} J_{\lambda_0}. \tag{8.3.17} \]
We note that this depends on the fact that $J_{\lambda_0}$ is formally self-adjoint
in the sense that
\[ (v, J_{\lambda_0} w) = (J_{\lambda_0} v, w) \tag{8.3.18} \]
if e.g. $v, w \in H_0^{2,2}(I)$.
Remark 8.3.1. The situation here is slightly different from the one in
Section 8.2 inasmuch as we identify $\operatorname{coker} J_{\lambda_0}$ here with $R_0(J_{\lambda_0})^\perp$ and
not with $R(J_{\lambda_0})^\perp$. Therefore, in the present situation, if $\pi$ denotes the
orthogonal projection onto $\ker J_{\lambda_0} = \operatorname{coker} J_{\lambda_0}$, we have
\[ \pi(J_{\lambda_0} v) = 0 \]
only for $v \in H_0^{2,2}(I)$, but not for all $v \in H^{2,2}(I)$. This is for example
relevant for the argument of the proof of Lemma 8.2.2.
Regularity theory also implies that $R(J_{\lambda_0})$ is closed. Namely, if for
$(v_n)_{n \in \mathbb{N}} \subset (\ker J_{\lambda_0})^\perp \subset W^{2,2}(I)$, we have
\[ J_{\lambda_0} v_n =: f_n \]
and $f_n$ converges to $f_0$ in $L^2(I) \times \mathbb{R}^2$, then $\|v_n\|_{W^{2,2}(I)}$ is bounded.
By Rellich's Theorem 3.4.1, after selection of a subsequence, $v_n$ then
converges in $W^{1,2}(I)$. From the equation, i.e.
\[ \frac{1}{\cosh^2\frac{t}{\lambda_0}}\,\ddot v_n + \frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}\Big)\dot v_n + \frac{1}{\lambda_0^2\cosh^2\frac{t}{\lambda_0}}\,v_n = f_n, \]
we then see that $\ddot v_n$ also converges in $L^2(I)$. Thus, $v_n$ converges in
$W^{2,2}(I)$, and the limit $v_0$ then satisfies
\[ J_{\lambda_0} v_0 = f_0. \]
Thus, the image of $J_{\lambda_0}$ is closed. Thus, $J_{\lambda_0}$ is a Fredholm operator of
index 0.
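Our aim is now to check that the assumptions of Theorem 8.2.2 hold.
In order to verify (8.2.42), i.e. $\pi(\partial_\alpha^2 u) \neq 0$, according to Remark 8.2.1,
we need to compute $dJ_{\lambda_0}$, i.e. the second derivative of $L_{\lambda_0}$. Starting from
(8.3.3) and inserting (8.3.5), i.e. $u_0 = \lambda_0\cosh(t/\lambda_0)$, we obtain
\[ dJ_{\lambda_0}(u_0)(v, v) = \frac{d}{dt}\Big(\frac{2}{\cosh^3\frac{t}{\lambda_0}}\,v\dot v - \frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\,\dot v^2\Big) - \frac{1}{\cosh^3\frac{t}{\lambda_0}}\,\dot v^2. \]
By (8.2.27), we have to project this onto the kernel of $J_{\lambda_0}(u_0)$ and check
that the result is nonzero, for our Jacobi field $v$ given in (8.3.12), i.e.
$v = \cosh(t/\lambda_0) - (t/\lambda_0)\sinh(t/\lambda_0)$. Since here the projection $\pi$ is given by the
orthogonal projection in the Hilbert space $L^2(I) \times \mathbb{R}^2$ onto $\ker J_{\lambda_0}(u_0)$,
which is generated by the Jacobi field $v$, we simply have to verify that
the $L^2$-product of $dJ_{\lambda_0}(u_0)(v, v)$ with $v$ is nonzero. Thus, we compute
\[ \begin{aligned}
(dJ_{\lambda_0}(u_0)(v, v), v)_{L^2}
&= \int_{-1}^{1}\Big\{\frac{d}{dt}\Big(\frac{2}{\cosh^3\frac{t}{\lambda_0}}\,v(t)\dot v(t) - \frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\,\dot v(t)^2\Big) - \frac{1}{\cosh^3\frac{t}{\lambda_0}}\,\dot v(t)^2\Big\}v(t)\,dt \\
&= \int_{-1}^{1}\Big(\frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\,\dot v(t)^3 - \frac{3}{\cosh^3\frac{t}{\lambda_0}}\,v(t)\dot v(t)^2\Big)dt \quad \text{by an integration by parts} \\
&= \int_{-1}^{1}\frac{3\dot v(t)^2}{\cosh^3\frac{t}{\lambda_0}}\Big(\lambda_0\tanh\frac{t}{\lambda_0}\,\dot v(t) - v(t)\Big)dt.
\end{aligned} \]
Now with $v = \cosh\frac{t}{\lambda_0} - \frac{t}{\lambda_0}\sinh\frac{t}{\lambda_0}$, we have
\[ \lambda_0\tanh\frac{t}{\lambda_0}\,\dot v(t) - v(t) = -\cosh\frac{t}{\lambda_0}, \]
and so
\[ (dJ_{\lambda_0}(u_0)(v, v), v)_{L^2} = -3\int_{-1}^{1}\frac{\dot v(t)^2}{\cosh^2\frac{t}{\lambda_0}}\,dt < 0. \]
Thus, indeed
\[ \pi(\partial_\alpha^2 u) > 0. \tag{8.3.19} \]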
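We finally consider (8.2.41). Thus, we have to verify that
\[ \pi(\partial_\lambda u) \neq 0, \quad \text{with } \partial_\lambda u = \frac{d}{dt}u(0, \lambda_t)\Big|_{t=0} \text{ for a suitable family } \lambda_t \text{ of parameters.} \tag{8.3.20} \]
We start with (8.2.14), i.e. in the notations of Section 8.2
\[ L_{\lambda_t} u(\xi, \lambda_t) + \pi(u(\xi, \lambda_t)) = \xi. \tag{8.3.21} \]
In the present case $L_\lambda$ is given by (8.3.6), and $\pi$ is the orthogonal
projection in $L^2(I)$ onto $\ker J_{\lambda_0}$, the one-dimensional space generated
by $v(t) = \cosh(t/\lambda_0) - (t/\lambda_0)\sinh(t/\lambda_0)$ (see (8.3.12)), where $\lambda_0$ is so chosen
that $v(1) = v(-1) = 0$. Thus, this $v$ can be taken as the $\xi_0$ of Section 8.2.
However, since
\[ \frac{d}{d\lambda}\Big(\lambda\cosh\frac{1}{\lambda}\Big)\Big|_{\lambda = \lambda_0} = 0 \]
by choice of $\lambda_0$ (see (8.3.13)), we shall need to employ a variation of the
parameter somewhat different from the family $\lambda_t = \lambda_0 + t\mu$ employed in
Section 8.2. Here, we put
\[ \nu_0 := \lambda_0\cosh\frac{1}{\lambda_0} \]
and choose the family $\lambda_t$ such that
\[ \lambda_t\cosh\frac{1}{\lambda_t} = \nu_0 + t\mu. \tag{8.3.22} \]
We then differentiate (8.3.21) w.r.t. $t$ at $t = 0$, $\xi = 0$ to obtain
\[ \begin{aligned}
J_{\lambda_0}(u_0)\partial_\lambda u + \frac{1}{\|v\|_{L^2(I)}^2}(\partial_\lambda u, v)_{L^2}\,v &= 0 \\
\partial_\lambda u(1) = \partial_\lambda u(-1) &= \mu.
\end{aligned} \tag{8.3.23} \]
Then
\[ \begin{aligned}
(J_{\lambda_0}(u_0)\partial_\lambda u, v)_{L^2}
&= \int_{-1}^{1}\Big\{\frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}\frac{d}{dt}\partial_\lambda u(t)\Big) + \frac{1}{\lambda_0^2\cosh^2\frac{t}{\lambda_0}}\,\partial_\lambda u(t)\Big\}v(t)\,dt \\
&= \frac{2}{\lambda_0^2\cosh\frac{1}{\lambda_0}}\,\mu,
\end{aligned} \]
integrating by parts and using $J_{\lambda_0}(u_0)v = 0$, $v(1) = 0 = v(-1)$; this is
$< 0$ for $\mu < 0$.
Equation (8.3.23) then implies
\[ \pi(\partial_\lambda u) = \frac{1}{\|v\|_{L^2(I)}^2}(\partial_\lambda u, v)_{L^2}\,v \neq 0, \]
i.e. (8.3.20).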
We thus have verified all the assumptions of Theorem 8.2.2 (for the
family $\lambda_t$ defined by (8.3.22) in place of the family $\lambda_t = \lambda_0 + t\mu$).
Theorem 8.2.2 thus describes the bifurcation behaviour of the solutions
of (8.3.3) or (8.3.4), i.e. the critical points of (8.3.1), (8.3.2) near
$u_0(t) = \lambda_0\cosh(t/\lambda_0)$: For boundary values $u(1) = u(-1) < \lambda_0\cosh(1/\lambda_0)$,
there is no solution (at least in the vicinity of $u_0$), whereas for $u(1) =
u(-1) > \lambda_0\cosh(1/\lambda_0)$, we may find two solutions. Of course, this may
also be verified directly without going through all the abstract machinery
of Section 8.2, but hopefully this example can serve to illustrate
the general scheme. The catenoids are frequently discussed in books on
the calculus of variations, e.g. O. Bolza, Vorlesungen über Variationsrechnung,
Teubner, Leipzig, Berlin, 1909, or M. Giaquinta, St. Hildebrandt,
Calculus of Variations, Springer, Berlin, 1996, I, p. 366 and
II, pp. 263-70. A discussion in terms of bifurcation theory also in the
case of not necessarily symmetric boundary conditions (i.e. not requiring
$u(1) = u(-1)$) is given by H. Wenk, Extremverhalten der Stabilität
von Catenoiden als Rotationsminimalfläche, Diplom thesis, Bochum,
1994.
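The direct verification mentioned above can be sketched numerically. Within the symmetric family $u(t) = \lambda\cosh(t/\lambda)$, the boundary value is $h(\lambda) = \lambda\cosh(1/\lambda)$, which has its minimum at $\lambda_0$. The code below is an illustration under assumptions (the search interval for $\lambda$ and the function names are choices made here, not taken from the text); it counts the catenaries through a prescribed boundary height $\nu$.

```python
import math


def boundary_height(lam):
    """h(lam) = lam * cosh(1/lam), the value at t = +-1 of lam * cosh(t/lam)."""
    return lam * math.cosh(1.0 / lam)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def critical_lambda():
    """Zero of h'(lam) = cosh(1/lam) - (1/lam) sinh(1/lam), i.e. lam = tanh(1/lam)."""
    return bisect(lambda lam: math.cosh(1 / lam) - math.sinh(1 / lam) / lam, 0.5, 1.0)


def catenaries_through(nu, lam_min=0.05, lam_max=50.0):
    """Parameters lam in [lam_min, lam_max] with lam * cosh(1/lam) = nu."""
    lam0 = critical_lambda()
    if nu <= boundary_height(lam0):
        return []  # at or below the fold value; the fold itself is not treated here
    if nu >= boundary_height(lam_max):
        raise ValueError("nu too large for the chosen search interval")
    g = lambda lam: boundary_height(lam) - nu
    return [bisect(g, lam_min, lam0), bisect(g, lam0, lam_max)]
```

The fold value $\nu_0 = \lambda_0\cosh(1/\lambda_0)$ is roughly 1.509: below it no symmetric catenary exists, above it there are two, one on each side of $\lambda_0$.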
Exercises
8.1 How many parameters are needed for a complete description of
the bifurcation behaviour of the roots of a fourth-order polynomial?
8.2 Consider the problem of finding critical points of
\[ I(u) = \int_{-\kappa}^{\kappa} u(t)\sqrt{1 + \dot u(t)^2}\,dt, \qquad u(-\kappa) = u(\kappa) = 1 \]
for a parameter $\kappa > 0$. Determine the value $\kappa_0$ for which a
bifurcation occurs.
(Hint: This problem can be reduced to the one considered in
Section 8.3.)
8.3 Consider geodesics on $S^2$ as in Chapter 2 of Part I. More precisely,
we take two points $p, q \in S^2$ with distance $d(p, q) = \lambda$,
and consider geodesic arcs between $p$ and $q$ of length $\lambda$, i.e.
length minimizing arcs. What happens at $\lambda = \pi$? Does this fit
into the framework described in Section 8.2?
9
The Palais-Smale condition and unstable
critical points of variational problems
9.1 The Palais-Smale condition
In this chapter, we take up a direction that has already been presented in
Chapter 3 of Part I, namely the search for nonminimizing critical points
of variational problems. This chapter will consequently be independent
of Chapters 4-8 of the present Part II. In Section 3.1 of Part I, we presented
existence results for unstable critical points of functionals $F$ of
class $C^1$ on some finite dimensional Euclidean space $\mathbb{R}^d$. We only needed
a coercivity condition on the functional guaranteeing that a critical sequence
$(x_n)_{n \in \mathbb{N}}$ (i.e. satisfying $DF(x_n) \to 0$, $|F(x_n)|$ bounded) stayed
in a bounded set. The local compactness of $\mathbb{R}^d$ then allowed the extraction
of a convergent subsequence whose limit $x_0$ satisfied $DF(x_0) = 0$,
because of the continuity of $DF$. In Sections 2.3 and 3.2 of Part I, we
also presented examples where variational problems could be reduced to
such finite dimensional problems. The domain was a little more complicated
than $\mathbb{R}^d$, but being finite dimensional, it was still locally compact
so that we had no difficulties finding limits of subsequences for critical
sequences. In the remainder of this book, however, we have had am-
ple opportunity to realize that variational problems are typically and
naturally posed on some infinite-dimensional Hilbert or Banach space.
Such a space is not locally compact anymore w.r.t. its Hilbert or Banach
space topology, so that the previous strategy encounters a serious prob-
lem. Also weak topologies do not help much as the functionals under
consideration typically are not continuous w.r.t. the weak topology. If
one searches for minimizers, this problem can be overcome by introduc-
ing convexity assumptions as we have seen in Chapters 4 and 8, but any
convexity assumption excludes the existence of critical points other than
minima.
Nevertheless, the lack of compactness of the underlying space must
be compensated by an assumption on the functional that guarantees the
appropriate compactness of critical sequences. In other words we do not
require the compactness of arbitrary bounded sequences on our space
(which is impossible, as argued) but only of critical sequences. This is
the idea of the Palais-Smale condition which we now formulate:
Definition 9.1.1. Let $(V, \|\cdot\|)$ be a Banach space, $F : V \to \mathbb{R}$ a functional
of class $C^1$. We say that $F$ satisfies the Palais-Smale condition,
abbreviated as (PS), if any sequence $(x_n)_{n \in \mathbb{N}} \subset V$ satisfying
(i) $|F(x_n)| \le c$ for some constant $c$
(ii) $\|DF(x_n)\| \to 0$ for $n \to \infty$
contains a convergent subsequence.
Note that a limit $x_0$ of such a subsequence satisfies $DF(x_0) = 0$ (i.e.
is a critical point of $F$) because $DF$ is continuous.
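A one-dimensional example shows how (PS) can fail even in a locally compact space. The sketch below uses the hypothetical functional $F(x) = \arctan x$ on $\mathbb{R}$, which is not taken from the text: along $x_n = n$ the values stay bounded and the derivatives tend to $0$, but the sequence has no convergent subsequence (and $F$ has no critical point at all).

```python
import math


def F(x):
    """Hypothetical functional on R: F(x) = arctan(x), bounded, no critical points."""
    return math.atan(x)


def DF(x):
    """Derivative F'(x) = 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)


# Candidate Palais-Smale sequence x_n = n: conditions (i) and (ii) hold,
# yet consecutive terms stay a distance 1 apart, so no subsequence converges.
sequence = [float(n) for n in range(1, 2001)]
values = [F(x) for x in sequence]
gradients = [abs(DF(x)) for x in sequence]
```

Coercivity would exclude such escaping sequences in $\mathbb{R}^d$; in infinite dimensions (PS) has to be imposed as a separate assumption on $F$.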
A direct consequence of the definition is:
Lemma 9.1.1. Suppose $F : V \to \mathbb{R}$ satisfies (PS). Then for any $\alpha \in \mathbb{R}$
\[ K_\alpha := \{x \in V : F(x) = \alpha,\ DF(x) = 0\} \]
(the set of critical points of $F$ with value $\alpha$) is compact.
q.e.d.
We also have:
Lemma 9.1.2. Suppose $F : V \to \mathbb{R}$ satisfies (PS). For $\alpha \in \mathbb{R}$, we put
\[ U_{\alpha,\rho} := \bigcup_{x \in K_\alpha}\{z \in V \mid \|x - z\| < \rho\} \quad (\rho > 0), \]
\[ N_{\alpha,\delta} := \{y \in V \mid |F(y) - \alpha| < \delta,\ \|DF(y)\| < \delta\} \quad (\delta > 0). \]
Then the families $(U_{\alpha,\rho})_{\rho > 0}$ and $(N_{\alpha,\delta})_{\delta > 0}$ are fundamental systems of
neighbourhoods of $K_\alpha$ (i.e. each neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$
and some $N_{\alpha,\delta}$).
Proof. It is clear that $U_{\alpha,\rho}$ and $N_{\alpha,\delta}$ are neighbourhoods of $K_\alpha$ for $\rho > 0$
respectively $\delta > 0$. It follows from the compactness of $K_\alpha$ that each
neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$. Concerning the same property
of the $N_{\alpha,\delta}$, let us assume on the contrary that there exist a neighbourhood
$U$ of $K_\alpha$ and a sequence $(y_n)_{n \in \mathbb{N}}$ with $y_n \in N_{\alpha,1/n} \setminus (U \cap N_{\alpha,1/n})$
for all $n$. (PS) implies that a subsequence of $(y_n)_{n \in \mathbb{N}}$ converges to some
$y_0 \in K_\alpha \subset U$, contradicting the openness of $U$.
q.e.d.
In our applications below, we shall also encounter the situation where
we want to find critical points of the restriction of some functional F to
the level hypersurface $G(x) = \beta$ of some other functional $G$. For that
purpose, we shall need a relative version of the Palais-Smale condition
which we shall formulate only for the case of a Hilbert space:
Definition 9.1.2. Let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space, $F, G : H \to \mathbb{R}$
functionals of class $C^1$, $\beta \in \mathbb{R}$. Suppose
\[ DG(x) \neq 0 \quad \text{for all } x \text{ with } G(x) = \beta. \]
We say that $F$ satisfies (PS) relative to $G = \beta$ if every sequence
$(x_n)_{n \in \mathbb{N}} \subset H$ with $G(x_n) = \beta$ and satisfying
(i) $|F(x_n)| \le c$ for some constant $c$
(ii) $DF(x_n) - \dfrac{\langle DF(x_n), DG(x_n)\rangle}{\|DG(x_n)\|^2}\,DG(x_n) \to 0$ for $n \to \infty$
contains a convergent subsequence.
A limit $x_0$ of such a subsequence then satisfies
\[ G(x_0) = \beta \tag{9.1.1} \]
and
\[ DF(x_0) - \frac{\langle DF(x_0), DG(x_0)\rangle}{\|DG(x_0)\|^2}\,DG(x_0) = 0, \tag{9.1.2} \]
i.e. is a critical point of the restriction of $F$ to $G(x) = \beta$. Of course, results
analogous to Lemmas 9.1.1 and 9.1.2 hold in the relative case. One
simply intersects the corresponding sets with $\{G(x) = \beta\}$ and replaces
$DF$ by its projection to that level set.
As in Sections 2.3, 3.1, 3.2 of Part I, in order to find critical points of
a functional, one needs to construct (local) deformations that decrease
the value of the functional except at or at least away from critical points.
We shall now do so in stages of increasing generality.
We start with a functional
\[ F : H \to \mathbb{R} \]
of class $C^2$ on some Hilbert space $(H, \langle\cdot,\cdot\rangle)$ that satisfies (PS). For each
$u \in H$, $DF(u)$ is a linear functional on $H$, and by Corollary 2.2.3, it can
therefore be identified with an element $\nabla F(u)$ of $H$, called the gradient
of $F$ at $u$. Thus, $\nabla F(u)$ satisfies
\[ DF(u)(\nabla F(u)) = \|DF(u)\|^2 \tag{9.1.3} \]
\[ \|\nabla F(u)\| = \|DF(u)\|. \tag{9.1.4} \]
Since $F$ is assumed to be of class $C^2$, $DF$ and hence $\nabla F$ are of class $C^1$
in their dependence on $u$. In particular, $\nabla F$ is locally Lipschitz.
We now consider the (negative) gradient flow induced by $F$:
\[ \frac{\partial}{\partial t}\psi(u, t) = -\nabla F(\psi(u, t)) \quad \text{for } t > 0 \tag{9.1.5} \]
\[ \psi(u, 0) = u. \tag{9.1.6} \]
Because of the Lipschitz property, by Theorem 2.4.2 and Corollary 2.4.2,
for small $t > 0$, there exists a unique solution $\psi(u, t)$ satisfying the
semigroup property
\[ \psi(u, s + t) = \psi(\psi(u, s), t) \tag{9.1.7} \]
for sufficiently small $s, t > 0$.
Moreover,
\[ \psi(u, t) = u \quad \text{for all } u \text{ with } \nabla F(u) = 0, \text{ i.e. for all critical points of } F. \tag{9.1.8} \]
Finally
\[ \begin{aligned}
F(\psi(u, t)) &= F(u) + \int_0^t \frac{d}{d\tau}F(\psi(u, \tau))\,d\tau \\
&= F(u) + \int_0^t DF(\psi(u, \tau))\,\frac{\partial}{\partial\tau}\psi(u, \tau)\,d\tau \\
&= F(u) - \int_0^t \|DF(\psi(u, \tau))\|^2\,d\tau \quad \text{by (9.1.5), (9.1.3)} \\
&< F(u) \quad \text{for } t > 0, \text{ if } DF(u) = DF(\psi(u, 0)) \neq 0,
\end{aligned} \tag{9.1.9} \]
i.e. if $u$ is not a critical point of $F$.
Thus, we have found the prototype of a deformation that decreases the
value of F except at its critical points. For technical reasons, however,
the above flow will need some modifications and generalizations.
First of all, a solution of (9.1.5) need not exist for all $t > 0$ because it may become unbounded in finite 'time' $t$. This can be easily remedied by using the Lipschitz function
$\eta : \mathbb{R}^+ \to \mathbb{R}^+$,
$\eta(s) = 1$ for $0 \le s \le 1$, $\eta(s) = \frac{1}{s}$ for $s \ge 1$,
and putting
$\widetilde{\nabla} F(u) := \eta(\|\nabla F(u)\|)\, \nabla F(u)$
(i.e. $\widetilde{\nabla} F(u) = \nabla F(u)$ for $\|\nabla F(u)\| \le 1$ and $\|\widetilde{\nabla} F(u)\| \le 1$ for all $u$) and replacing (9.1.5) by
$\frac{\partial}{\partial t}\psi(u,t) = -\widetilde{\nabla} F(\psi(u,t))$. (9.1.10)
Of course, we still use (9.1.6).
Since $\|\widetilde{\nabla} F(u)\| \le 1$ for all $u$, the solution of (9.1.10), (9.1.6) now exists for all $t \ge 0$, and satisfies (9.1.7) for all $s, t \ge 0$. Equation (9.1.8) also still holds, and as in the derivation of (9.1.9), we get
$F(\psi(u,t)) = F(u) - \int_0^t \eta(\|\nabla F(\psi(u,\tau))\|)\, \|DF(\psi(u,\tau))\|^2\, d\tau$
$< F(u)$ for $t > 0$,
if $u$ is not a critical point of $F$. More generally, we have
$F(\psi(u,t)) \le F(\psi(u,s))$ whenever $0 \le s \le t$, for all $u$.
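The truncated flow (9.1.10) can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the functional `F` on $\mathbb{R}^2$, the starting point and the explicit Euler discretisation are illustrative choices that do not come from the text. It shows the two properties used above, namely that the flow moves with speed at most 1 and that $F$ does not increase along it.

```python
import math

def F(u):
    # Illustrative functional on R^2 (an assumption, not taken from the text).
    x, y = u
    return 0.25 * x**4 + 0.5 * y**2

def grad_F(u):
    x, y = u
    return (x**3, y)

def eta(s):
    # Cut-off from the text: eta(s) = 1 for 0 <= s <= 1 and 1/s for s >= 1.
    return 1.0 if s <= 1.0 else 1.0 / s

def truncated_grad(u):
    # eta(|grad F(u)|) * grad F(u); its norm never exceeds 1.
    g = grad_F(u)
    s = math.hypot(*g)
    return tuple(eta(s) * gi for gi in g)

def flow(u, t, steps=20000):
    # Explicit Euler approximation of d/dt psi = -eta(|grad F|) grad F.
    h = t / steps
    values = [F(u)]
    for _ in range(steps):
        v = truncated_grad(u)
        u = tuple(ui - h * vi for ui, vi in zip(u, v))
        values.append(F(u))
    return u, values

u_end, values = flow((3.0, 4.0), 2.0)
```

Since each Euler step has length at most `h`, the end point stays within distance `t` of the starting point, which is the discrete counterpart of the global existence argument.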
Next, we wish to localize the construction near a level $\alpha$. Thus, for given $\varepsilon_0 > 0$ and a neighbourhood $U$ of $K_\alpha$ we want to have a flow $\psi(u,t)$ with (9.1.7), (9.1.8) and also
$\psi(u,t) = u$ if $|F(u) - \alpha| \ge \varepsilon_0$, (9.1.11)
and the following more explicit local decrease of the value of $F$: For $\alpha \in \mathbb{R}$, we put
$F_\alpha := \{v \in H \mid F(v) < \alpha\}$.
We want to find $0 < \varepsilon < \varepsilon_0$ with
$\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$ (9.1.12)
$\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$, (9.1.13)
and of course also
$F(\psi(u,t)) \le F(\psi(u,s))$ if $0 \le s \le t$ for all $u$. (9.1.14)
We let $\varphi : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with
$\varphi(s) = 0$ for $|\alpha - s| \ge \varepsilon_0$
$\varphi(s) = 1$ for $|\alpha - s| \le \frac{\varepsilon_0}{2}$
$0 \le \varphi(s) \le 1$ for all $s$
and replace (9.1.10) by
$\frac{\partial}{\partial t}\psi(u,t) = -\varphi(F(\psi(u,t)))\, \widetilde{\nabla} F(\psi(u,t))$. (9.1.15)
Again, a solution $\psi(u,t)$ exists for all $t \ge 0$ and satisfies (9.1.7) for all $s, t \ge 0$, as well as (9.1.8) and (9.1.14) (for the latter it was necessary to require $\varphi \ge 0$). (9.1.11) also is clear from the choice of $\varphi$. We now verify (9.1.12), (9.1.13). If $0 < \varepsilon \le \frac{\varepsilon_0}{2}$ and $u \in F_{\alpha+\varepsilon}$ and if $F(\psi(u,1)) \ge \alpha - \varepsilon$, from (9.1.14)
$|F(\psi(u,t)) - \alpha| \le \varepsilon$ for all $0 \le t \le 1$,
and therefore
$\varphi(F(\psi(u,t))) = 1$ for all $0 \le t \le 1$. (9.1.16)
As before, we may now compute
$F(\psi(u,1)) = F(u) + \int_0^1 \frac{d}{d\tau} F(\psi(u,\tau))\, d\tau$
$= F(u) - \int_0^1 \varphi(F(\psi(u,\tau)))\, \eta(\|\nabla F(\psi(u,\tau))\|)\, \|DF(\psi(u,\tau))\|^2\, d\tau$
$< \alpha + \varepsilon - \int_0^1 \min(1, \|DF(\psi(u,\tau))\|^2)\, d\tau$ (9.1.17)
since we assume $u \in F_{\alpha+\varepsilon}$, using (9.1.16) and the properties of $\eta$.
By Lemma 9.1.2, we may find $\delta, \rho > 0$ with
$N_{\alpha,\delta} \subset U_{\alpha,\rho} \subset U_{\alpha,2\rho} \subset U$ (9.1.18)
(here, we are using (PS)!). From the definition of $N_{\alpha,\delta}$, thus $\|DF(\psi(u,\tau))\|^2 \ge \delta^2$ whenever $\psi(u,\tau) \notin N_{\alpha,\delta}$. Without loss of generality $\delta \le 1$. (9.1.17) then yields
$F(\psi(u,1)) < \alpha + \varepsilon - (\operatorname{meas}\{0 \le \tau \le 1 \mid \psi(u,\tau) \notin N_{\alpha,\delta}\})\, \delta^2$. (9.1.19)
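From (9.1.18), we have for $v \in H \setminus U$
$\operatorname{dist}(v, N_{\alpha,\delta}) \left(:= \inf_{w \in N_{\alpha,\delta}} \|v - w\|\right) \ge \rho$.
Since
$\left\|\frac{\partial}{\partial t}\psi(u,t)\right\| \le 1$, (9.1.20)
therefore, if $u \notin U$, then also $\psi(u,\tau) \notin N_{\alpha,\delta}$ for $0 \le \tau \le \rho$, and similarly, if $\psi(u,1) \notin U$, then also $\psi(u,\tau) \notin N_{\alpha,\delta}$ for $1 - \rho \le \tau \le 1$. Therefore, if either $u \notin U$ or $\psi(u,1) \notin U$, then
$\operatorname{meas}\{0 \le \tau \le 1 \mid \psi(u,\tau) \notin N_{\alpha,\delta}\} \ge \rho$.
Thus, from (9.1.19), if $u \notin U$ or if $\psi(u,1) \notin U$,
$F(\psi(u,1)) < \alpha + \varepsilon - \rho\delta^2 \le \alpha - \varepsilon$ if we choose $\varepsilon \le \frac{1}{2}\rho\delta^2$. (9.1.21)
Thus, for $0 < \varepsilon \le \min(\frac{1}{2}\varepsilon_0, \frac{1}{2}\rho\delta^2)$, we get (9.1.12), (9.1.13).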
In conclusion, we have shown the following deformation result:
Theorem 9.1.1. Let $F : H \to \mathbb{R}$ be a $C^2$ functional on a Hilbert space $H$, satisfying (PS). Let $\alpha \in \mathbb{R}$, and put
$F_\alpha := \{v \in H : F(v) < \alpha\}$,
$K_\alpha := \{v \in H : F(v) = \alpha, DF(v) = 0\}$.
Let $\varepsilon_0 > 0$ and a neighbourhood $U$ of $K_\alpha$ be given. Then there exist $0 < \varepsilon < \varepsilon_0$ and a continuous family
$\psi : H \times [0, \infty) \to H$
with the semigroup property $\psi(\psi(u,s),t) = \psi(u,s+t)$ for all $s, t \ge 0$, $u \in H$ and with
(i) $\psi(u,0) = u$ for all $u \in H$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$ for all $u \in H$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$, in particular for $u \in K_\alpha$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F(u)$ is even (i.e. $F(u) = F(-u)$ for all $u$), then also $F(\psi(u,t))$ is even in $u$ for all $t$ (i.e. $F(\psi(u,t)) = F(\psi(-u,t))$).
(Property (vi) follows from the construction: All the auxiliary functions are invariant under replacing $u$ by $-u$ if $F$ is even, and $\nabla F(-u) = -\nabla F(u)$ in the even case.) q.e.d.
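Properties (iv) and (v) of Theorem 9.1.1 can be observed in a one-dimensional toy case. The sketch below is an illustration under stated assumptions: $F(x) = x^2/2$ on $\mathbb{R}$, the level $\alpha = 2$ (which is not a critical value, so $K_\alpha = \emptyset$), $\varepsilon_0 = 1$, a piecewise linear choice of the cut-off $\varphi$, and an explicit Euler scheme are all choices made here and are not taken from the text.

```python
def F(x):
    # Illustrative functional on R (an assumption); its only critical value is 0.
    return 0.5 * x * x

def dF(x):
    return x

def eta(s):
    # eta(s) = 1 for 0 <= s <= 1 and 1/s for s >= 1, as in the text.
    return 1.0 if s <= 1.0 else 1.0 / s

def phi(s, alpha, eps0):
    # Piecewise linear Lipschitz cut-off: 1 if |alpha - s| <= eps0/2,
    # 0 if |alpha - s| >= eps0, linear in between.
    d = abs(alpha - s)
    if d <= eps0 / 2:
        return 1.0
    if d >= eps0:
        return 0.0
    return 2.0 * (eps0 - d) / eps0

def localized_flow(x, alpha, eps0, t=1.0, steps=10000):
    # Explicit Euler approximation of (9.1.15) in one dimension.
    h = t / steps
    for _ in range(steps):
        g = dF(x)
        x = x - h * phi(F(x), alpha, eps0) * eta(abs(g)) * g
    return x

far_above = localized_flow(3.0, alpha=2.0, eps0=1.0)   # F = 4.5, stays fixed
far_below = localized_flow(1.0, alpha=2.0, eps0=1.0)   # F = 0.5, stays fixed
near_level = localized_flow(2.09, alpha=2.0, eps0=1.0) # F < alpha + 0.2 initially
```

In this example a point with $F(u) < \alpha + 0.2$ is carried below the level $\alpha - 0.2$ within time 1, while points whose value differs from $\alpha$ by at least $\varepsilon_0$ do not move.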
Corollary 9.1.1. If under the preceding assumptions, $F$ has no critical point with value $\alpha$, i.e. $K_\alpha = \emptyset$, then there exists a deformation $\psi$ with the preceding properties and
$\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon}$ for some $\varepsilon > 0$. (9.1.22)
Proof. If $K_\alpha = \emptyset$, we may choose $U = \emptyset$ in Theorem 9.1.1.
q.e.d.
We shall now extend Theorem 9.1.1 in two directions. First, we consider the relative case, where in addition to $F$, we have another $C^2$ functional $G : H \to \mathbb{R}$ with
$DG(x) \neq 0$ for all $x$ with $G(x) = \beta$,
for some given value $\beta \in \mathbb{R}$. We wish to find critical points of the restriction of $F$ to $G = \beta$. We assume that $F$ satisfies the relative (PS) condition of Definition 9.1.2 on $G = \beta$. We then perform the preceding construction with
$\nabla_G F(u) := \nabla F(u) - \frac{\langle \nabla F(u), \nabla G(u)\rangle}{\|\nabla G(u)\|^2}\, \nabla G(u)$ (9.1.23)
in place of $\nabla F(u)$. We then have
$\frac{d}{dt} G(\psi(u,t)) = -\varphi(F(\psi(u,t)))\, \eta(\|\nabla_G F(\psi(u,t))\|)\, \langle \nabla_G F(\psi(u,t)), \nabla G(\psi(u,t))\rangle$
from the chain rule and the analogue of (9.1.15)
$= 0$,
since $\langle \nabla_G F(v), \nabla G(v)\rangle = 0$ for all $v \in H$ (with $\nabla G(v) \neq 0$). Therefore, the flow $\psi(u,t)$ now leaves $G = \beta$ invariant. We obtain:
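The orthogonality $\langle \nabla_G F(v), \nabla G(v)\rangle = 0$ behind (9.1.23) is easy to check in finite dimensions. The following sketch is an illustration only: the quadratic functionals on $\mathbb{R}^3$, the coefficients `A` and the sample points are assumptions made here, with $G(x) = \frac{1}{2}|x|^2$, so that $G = \frac{1}{2}$ is the unit sphere.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

A = (1.0, 2.0, 3.0)  # hypothetical coefficients of F(x) = 1/2 * sum a_i x_i^2

def grad_F(x):
    return tuple(a * xi for a, xi in zip(A, x))

def grad_G(x):
    # G(x) = 1/2 |x|^2, so grad G(x) = x.
    return tuple(x)

def projected_grad(x):
    # grad_G F = grad F - (<grad F, grad G> / |grad G|^2) grad G, cf. (9.1.23).
    gF, gG = grad_F(x), grad_G(x)
    c = dot(gF, gG) / dot(gG, gG)
    return tuple(f - c * g for f, g in zip(gF, gG))

x = (0.6, 0.0, 0.8)                      # a point on the unit sphere
tangential = projected_grad(x)           # orthogonal to grad G(x)
at_eigenvector = projected_grad((1.0, 0.0, 0.0))  # vanishes: constrained critical point
```

In this example the constrained critical points are the eigenvectors of the diagonal matrix with entries `A`, where the projected gradient vanishes.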
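Theorem 9.1.2. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, \langle\cdot,\cdot\rangle)$ with $F$ satisfying (PS) relative to $G = \beta$. Let $\alpha \in \mathbb{R}$,
$F_\alpha^\beta := \{v \in H \mid F(v) < \alpha, G(v) = \beta\}$,
$K_\alpha^\beta := \{v \in H \mid F(v) = \alpha, G(v) = \beta, \nabla_G F(v) = 0\}$.
Let $\varepsilon_0 > 0$, and let $U$ be a neighbourhood of $K_\alpha^\beta$ in $\{G(v) = \beta\}$. Then there exist $\varepsilon > 0$ and a continuous semigroup family
$\psi : \{G(v) = \beta\} \times [0, \infty) \to \{G(v) = \beta\}$
satisfying
(i) $\psi(u,0) = u$ for all $u \in \{G(v) = \beta\}$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$
(iii) $\psi(u,t) = u$ for all $u \in K_\alpha^\beta$
(iv) $\psi(u,t) = u$ for all $t$ if $|F(u) - \alpha| \ge \varepsilon_0$
(v) $\psi(F_{\alpha+\varepsilon}^\beta \setminus U, 1) \subset F_{\alpha-\varepsilon}^\beta$, $\psi(F_{\alpha+\varepsilon}^\beta, 1) \subset F_{\alpha-\varepsilon}^\beta \cup U$
(vi) If $F$ and $G$ are even, so is $F(\psi(\cdot,t))$ for all $t$.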
Secondly, we wish to extend the preceding construction to functionals
on Banach spaces. For a functional on a Banach space, in general one
does not have a good notion of a gradient. We therefore need to introduce
Palais' concept of a pseudo-gradient:
Definition 9.1.3. Let $(V, \|\cdot\|)$ be a Banach space, $U \subset V$, $F : U \to \mathbb{R}$ a functional of class $C^1$. A pseudo-gradient vector field for $F$ is a locally Lipschitz continuous vector field $v : U \to V$ satisfying
(i) $\|v(u)\| \le \min(1, \|DF(u)\|)$
(ii) $DF(u)(v(u)) \ge \frac{1}{2}\min(\|DF(u)\|, \|DF(u)\|^2)$
for all $u \in U$.
Lemma 9.1.3. Let $F : V \to \mathbb{R}$ be a functional of class $C^1$ on the Banach space $V$. Then $F$ admits a pseudo-gradient vector field on
$V' := \{u \in V \mid DF(u) \neq 0\}$.
Proof. For each $u \in V'$, we can find $w = w(u)$ with
$\|w\| < \min(1, \|DF(u)\|)$ (9.1.24)
$DF(u)(w) > \frac{1}{2}\min(\|DF(u)\|, \|DF(u)\|^2)$. (9.1.25)
Since $DF$ is continuous (as we assume $F \in C^1$), $w$ satisfies (9.1.24), (9.1.25) also for all $v$ in some neighbourhood $N_u$ of $u$. Since $\{N_u : u \in V'\}$ is an open covering of $V'$, it possesses a locally finite refinement† $\{M_\alpha\}_{\alpha \in I}$. Let
$\rho_\alpha(v) := \operatorname{dist}(v, V' \setminus M_\alpha)$.
† This holds for any open covering of a paracompact set, see e.g. J. Dieudonné, Grundzüge der modernen Analysis, 2, Vieweg, Braunschweig, second edition, 1987, pp. 26-9; $V'$ is paracompact for example because it is metrizable.
$\rho_\alpha$ is Lipschitz continuous, and $\rho_\alpha(v) = 0$ for $v \notin M_\alpha$. We put
$\varphi_\alpha(v) := \frac{\rho_\alpha(v)}{\sum_{\beta \in I} \rho_\beta(v)}$.
Since each $v$ is only contained in finitely many $M_\beta$, because of the local finiteness of the covering, the denominator of $\varphi_\alpha$ is a finite sum. $(\varphi_\alpha)_{\alpha \in I}$ is a partition of unity subordinate to $\{M_\alpha\}$, i.e. $0 \le \varphi_\alpha \le 1$, $\varphi_\alpha = 0$ outside $M_\alpha$, $\sum_{\alpha \in I} \varphi_\alpha = 1$. Also, the $\varphi_\alpha$ are Lipschitz continuous. Then
$v(u) := \sum_{\alpha \in I} \varphi_\alpha(u)\, w(u_\alpha)$ for some $u_\alpha \in M_\alpha$
is a convex combination of vectors satisfying (9.1.24), (9.1.25) and hence satisfies these relations, too.
$v(u)$ thus is a pseudo-gradient vector field for $F$.
q.e.d.
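The partition of unity used in this proof can be written down explicitly in a toy situation. The sketch below is only an illustration: the cover of the interval $(0,5)$ by three open intervals, the vectors `w` attached to them, and the use of the distance to the complement in $\mathbb{R}$ (rather than in $V'$) are assumptions made here for simplicity.

```python
def rho(v, interval):
    # Distance from v to the complement of the open interval (a, b) in R;
    # it is Lipschitz continuous and vanishes outside the interval.
    a, b = interval
    return max(0.0, min(v - a, b - v))

def partition_of_unity(v, cover):
    # phi_k(v) = rho_k(v) / sum_j rho_j(v); the sum is positive whenever
    # v lies in at least one interval of the cover.
    weights = [rho(v, interval) for interval in cover]
    total = sum(weights)
    return [w / total for w in weights]

def blend(v, cover, w):
    # Convex combination sum_k phi_k(v) * w_k of the attached values w_k.
    return sum(p * wk for p, wk in zip(partition_of_unity(v, cover), w))

cover = [(0.0, 2.0), (1.0, 4.0), (3.0, 5.0)]  # hypothetical locally finite cover
w = [1.0, 2.0, 4.0]                            # hypothetical local choices w(u_k)

phis = partition_of_unity(1.2, cover)
value = blend(1.5, cover, w)
```

Because the weights are nonnegative and sum to 1, any convex condition satisfied by the individual `w[k]` on their intervals is inherited by the blended field, which is the mechanism used for (9.1.24) and (9.1.25).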
Note that we only need to require $F \in C^1$, and not $F \in C^2$, in order to construct a locally Lipschitz pseudo-gradient field. We then have the following deformation for $C^1$-functionals on Banach spaces.
Theorem 9.1.3. Let $F : V \to \mathbb{R}$ be a $C^1$-functional on a Banach space $V$ satisfying (PS). Let $\alpha \in \mathbb{R}$, $\varepsilon_0 > 0$, $U$ a neighbourhood of $K_\alpha$ as in Theorem 9.1.1. Then there exist $0 < \varepsilon < 1$ and a continuous family $\psi : V \times [0, \infty) \to V$ satisfying the semigroup property w.r.t. $t \ge 0$, and
(i) $\psi(u,0) = u$ for all $u \in V$
(ii) $F(\psi(u,s)) \le F(\psi(u,t))$ whenever $0 \le t \le s$, $u \in V$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F(\cdot)$ is even, so is $F(\psi(\cdot,t))$ for all $t$.
Proof. The proof is the same as the one of Theorem 9.1.1, replacing $\nabla F(u)$ by a pseudo-gradient vector field $v(u)$ except for the following technical point: Lemma 9.1.3 asserts the existence of a pseudo-gradient field only on $\{x \in V \mid DF(x) \neq 0\}$. We therefore have to choose another
Lipschitz continuous cut-off function 7 : V R with 0 < 7 < 1, y(u)
0 if u G N
a
s, 7(1/) = 1 for u G V \ N
a
j. We may then consider
^ j M
=
-j(rp(
u
,t)MF(i>(u,t)))r](Hi>(u,t))\\)v(^(u,t)) ( 9.1.26)
with v?, 77 as before. This has the additional effect that
dtp(u,t)
0* ~ '
whenever tp(u,t) N
a
, which is a neighbourhood of K
a
, while the
evolution is the same as before (with v(u) in place of VF(u)) outside
N
a
j. This cut-off near K
a
does not affect the rest of the construction.
If F is even, we may also choose 7 even.
However, there might still exist critical points of F in F
a + C
\ N
a
j. In
order to take account of those, we strengthen the requirements on the
above cut-off function (p to
c
<p(s) = 0 for | a s| > min(e
0
, - )
<p(s) = 1 for | a - s\ < mi n( y, - ) .
With such a </?, the right-hand side of (9.1.26) vanishes near any critical
point of F, and it is therefore defined on all of V. If we then also impose
the additional restriction
everything works out as before.
q.e.d.
It is possible, and not overly difficult, to extend Theorem 9.1.3 to the
relative case and to obtain a result analogous to Theorem 9.1.2. Here,
however, we refrain from doing so.
9.2 The mountain pass theorem
With the help of the deformation theorems of the previous section, one
may easily derive existence results for critical points of a functional sat-
isfying (PS). To illustrate this point, we start with the trivial
Lemma 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space satisfying (PS). If
$\alpha := \inf_{u \in V} F(u) > -\infty$,
then $F$ possesses a critical point $u_0$ with value $\alpha$ (i.e. $F(u_0) = \alpha$, $DF(u_0) = 0$).
Proof. Suppose that $K_\alpha = \emptyset$. Then $U = \emptyset$ is a neighbourhood of $K_\alpha$. Let $\varepsilon_0 > 0$ be arbitrary. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$,
$F_{\alpha+\varepsilon} \neq \emptyset$, $F_{\alpha-\varepsilon} = \emptyset$.
Therefore, it is impossible that, as Theorem 9.1.3 (v) asserts, the deformation $\psi(\cdot, 1)$ maps $F_{\alpha+\varepsilon}$ into $F_{\alpha-\varepsilon}$. This contradiction implies $K_\alpha \neq \emptyset$, which means the existence of the desired critical point.
q.e.d.
Of course, the methods presented in Chapter 4 yield more general
existence results for minimizers of variational problems. The strength
of the Palais-Smale approach rather lies in its capability of producing
nonminimizing critical points. To demonstrate this, we now present the
mountain pass theorem of Ambrosetti-Rabinowitz.
Theorem 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space $(V, \|\cdot\|)$ satisfying (PS). Suppose $F(0) = 0$ and
(i) $\exists \rho > 0$, $\beta > 0$: $F(u) \ge \beta$ for all $u$ with $\|u\| = \rho$
(ii) $\exists u_1$ with $\|u_1\| > \rho$ and $F(u_1) < \beta$.
We let
$\Gamma := \{\gamma \in C([0,1], V) \mid \gamma(0) = 0, \gamma(1) = u_1\}$.
Then
$\alpha := \inf_{\gamma \in \Gamma} \sup_{\tau \in [0,1]} F(\gamma(\tau))$ $(> 0)$
is a critical value of $F$ (i.e. there exists $u_0$ with $F(u_0) = \alpha$, $DF(u_0) = 0$).
Proof. Suppose again that $K_\alpha = \emptyset$, and take the neighbourhood $U = \emptyset$ of $K_\alpha$. We let $\varepsilon_0 = \min(\beta, \beta - F(u_1))$. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$, there exists $\gamma_0 \in \Gamma$ with
$\sup_{\tau \in [0,1]} F(\gamma_0(\tau)) < \alpha + \varepsilon$,
while no such $\gamma$ can satisfy
$\sup_{\tau \in [0,1]} F(\gamma(\tau)) < \alpha - \varepsilon$,
i.e. satisfy $\gamma([0,1]) \subset F_{\alpha-\varepsilon}$. However, if we apply the deformation $\psi(\cdot, 1)$ of Theorem 9.1.3, we obtain a path
$\gamma(\tau) := \psi(\gamma_0(\tau), 1) \subset F_{\alpha-\varepsilon}$
with $\gamma(0) = \gamma_0(0) = 0$ and $\gamma(1) = \gamma_0(1) = u_1$ by choice of $\varepsilon_0$. This contradiction implies $K_\alpha \neq \emptyset$, i.e. the existence of the desired critical point.
q.e.d.
Let us summarize the essential features of the preceding reasoning:
(1) One chooses a family of sets, here $\Gamma$, that exploits some properties of $F$ and is invariant under the deformation $\psi(\cdot, 1)$.
(2) This family yields a minimax value $\alpha$.
(3) $\alpha$ can be estimated from above with the help of any member of our family $\Gamma$ ($\alpha \le \sup_{\tau \in [0,1]} F(\gamma(\tau))$ for any $\gamma \in \Gamma$) and from below through the constraints that the members of $\Gamma$ have to satisfy (in Theorem 9.2.1, every $\gamma \in \Gamma$ intersects $\partial B(0, \rho)$, and therefore $\alpha \ge \beta > 0$, and therefore in particular, the critical point produced is different from 0).
(4) A reasoning by contradiction, based on the deformation theorem, shows that $\alpha$ is a critical value.
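The minimax value of Theorem 9.2.1 and the two-sided estimate in step (3) can be illustrated in the plane. The sketch below rests on assumptions that are not in the text: the function $F(x,y) = x^2 - \frac{1}{2}x^4 + y^2$, the end point $u_1 = (2,0)$, and a one-parameter family of test paths indexed by a hypothetical bending amplitude `a`. On the unit circle one has $F = 1 - \frac{1}{2}x^4 \ge \frac{1}{2}$, so $\rho = 1$, $\beta = \frac{1}{2}$ are admissible and $\alpha \ge \frac{1}{2}$; the straight path gives $\alpha \le \frac{1}{2}$. A restricted family of paths in general only yields an upper bound for $\alpha$.

```python
import math

def F(x, y):
    # Illustrative function (an assumption): local minimum at 0, saddle at (1, 0).
    return x**2 - 0.5 * x**4 + y**2

def grad_F(x, y):
    return (2 * x - 2 * x**3, 2 * y)

def path(a, tau):
    # gamma_a joins (0, 0) to u_1 = (2, 0); a is a hypothetical bending amplitude.
    return (2.0 * tau, a * math.sin(math.pi * tau))

def sup_along_path(a, n=200):
    # Maximum of F over n + 1 sample points of gamma_a.
    return max(F(*path(a, i / n)) for i in range(n + 1))

amplitudes = [k / 10 for k in range(-10, 11)]
minimax = min(sup_along_path(a) for a in amplitudes)
```

In this example the minimum over the sampled family is attained by the straight path, whose highest point $(1,0)$ is a saddle point of `F` with value $\frac{1}{2}$, in agreement with the theorem.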
As an application of the mountain pass theorem, we consider the fol-
lowing example:
Theorem 9.2.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1, 2$). Then the Dirichlet problem
$\Delta u + |u|^{p-2}u = 0$ in $\Omega$ (9.2.1)
$u = 0$ on $\partial\Omega$, (9.2.2)
admits at least two nontrivial (weak) solutions ('nontrivial' means not identically 0).
Proof. If $u$ is a solution, so is $-u$. Therefore, it suffices to verify the existence of one nontrivial solution. (9.2.1), (9.2.2) are the Euler-Lagrange equations in $H_0^{1,2}(\Omega)$ for the functional
$F(u) = \frac{1}{2}\int_\Omega |Du|^2 - \frac{1}{p}\int_\Omega |u|^p$. (9.2.3)
This functional is a continuous functional on $H_0^{1,2}(\Omega)$, because $\int_\Omega |Du|^2$ clearly is continuous there, and $\int_\Omega |u|^p$ too, because of the Sobolev Embedding Theorem 3.4.3 as we assume $p < \frac{2d}{d-2}$. $F$ is also differentiable, with
$DF(u)(\varphi) = \int_\Omega Du \cdot D\varphi - \int_\Omega |u|^{p-2}u\varphi$. (9.2.4)
Again
$\varphi \mapsto \int_\Omega Du \cdot D\varphi$
clearly is continuous on $H_0^{1,2}(\Omega)$, whereas
$\varphi \mapsto \int_\Omega |u|^{p-2}u\varphi$
is continuous, because we have by Hölder's inequality
$\left|\int_\Omega |u|^{p-2}u\varphi\right| \le \left(\int_\Omega |u|^p\right)^{\frac{p-1}{p}} \left(\int_\Omega |\varphi|^p\right)^{\frac{1}{p}}$ (9.2.5)
$\le c_0 \left(\int_\Omega |u|^p\right)^{\frac{p-1}{p}} \|\varphi\|_{H^{1,2}(\Omega)}$ (9.2.6)
by the Sobolev Embedding Theorem 3.4.3 for some constant $c_0$. Thus $F : H_0^{1,2}(\Omega) \to \mathbb{R}$ is of class $C^1$.
We shall verify the Palais-Smale condition for $F$: Suppose $(u_n)_{n \in \mathbb{N}} \subset H_0^{1,2}(\Omega)$ satisfies
$|F(u_n)| \le c_1$ for some constant $c_1$ (9.2.7)
$DF(u_n) \to 0$ for $n \to \infty$. (9.2.8)
Thus
$\left|\frac{1}{2}\int_\Omega |Du_n|^2 - \frac{1}{p}\int_\Omega |u_n|^p\right| \le c_1$ (9.2.9)
and
$\sup_{\varphi \in H_0^{1,2}(\Omega),\ \|\varphi\|_{H^{1,2}} \le 1} \left|\int_\Omega Du_n \cdot D\varphi - \int_\Omega |u_n|^{p-2}u_n\varphi\right| \to 0$ for $n \to \infty$. (9.2.10)
In (9.2.10), we use $\varphi = \frac{u_n}{\|u_n\|_{H^{1,2}}}$ and obtain
$-\int_\Omega |Du_n|^2 + \int_\Omega |u_n|^p \le c_2 \|u_n\|_{H^{1,2}}$. (9.2.11)
Since $p > 2$, (9.2.9) and (9.2.11) imply
$\int_\Omega |Du_n|^2 \le c_3 \|u_n\|_{H^{1,2}} + c_4$. (9.2.12)
Since by the Poincaré inequality (Theorem 3.4.2)
$\|u_n\|^2_{H^{1,2}(\Omega)} = \int_\Omega |u_n|^2 + \int_\Omega |Du_n|^2 \le c_5 \int_\Omega |Du_n|^2$, (9.2.13)
we conclude from (9.2.12)
$\|u_n\|_{H^{1,2}(\Omega)} \le c_6$. (9.2.14)
Thus, any 'critical sequence' $(u_n)_{n \in \mathbb{N}}$ is bounded. We now claim that such a sequence $(u_n)_{n \in \mathbb{N}}$ contains a convergent subsequence, thereby completing the verification of (PS). We need to show that, after selection of a subsequence,
$\int_\Omega |Du_n - Du_m|^2 \to 0$ for $n, m \to \infty$ (9.2.15)
(using again the Poincaré inequality as in (9.2.13)). Now
$\int_\Omega Du_n \cdot D(u_n - u_m) - \int_\Omega |u_n|^{p-2}u_n(u_n - u_m) \to 0$ for $n, m \to \infty$ (9.2.16)
by (9.2.10), (9.2.14).
By the Rellich-Kondrachev theorem (Corollary 3.4.1), we may also assume (by selecting a subsequence) that $(u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^p(\Omega)$. Then, using Hölder's inequality as in (9.2.5),
$\left|\int_\Omega |u_n|^{p-2}u_n(u_n - u_m)\right| \le \left(\int_\Omega |u_n|^p\right)^{\frac{p-1}{p}} \left(\int_\Omega |u_n - u_m|^p\right)^{\frac{1}{p}} \to 0$ for $n, m \to \infty$. (9.2.17)
Equation (9.2.16) then implies
$\int_\Omega Du_n \cdot D(u_n - u_m) \to 0$ for $m, n \to \infty$,
which implies (9.2.15). We have thus verified (PS) for F. We shall now
check the remaining assumptions of Theorem 9.2.1. First of all,
F(0) = 0.
Recalling that by the Sobolev Embedding Theorem 3.4.3 (and the Poincaré inequality, see (9.2.13))
$\|u\|_{L^p(\Omega)} \le c_7 \|Du\|_{L^2(\Omega)}$,
we have
$F(u) \ge \left(\frac{1}{2} - \frac{c_7^p}{p}\|Du\|^{p-2}_{L^2(\Omega)}\right)\|Du\|^2_{L^2(\Omega)} \ge \beta > 0$
if $\|u\|_{H^{1,2}(\Omega)} = \rho$ is sufficiently small.
Finally, take any $u_2 \in H_0^{1,2}(\Omega)$ with $\int_\Omega |u_2|^p \neq 0$. Then for sufficiently large $\lambda > 0$, $u_1 := \lambda u_2$ satisfies
$F(u_1) = \frac{\lambda^2}{2}\int_\Omega |Du_2|^2 - \frac{\lambda^p}{p}\int_\Omega |u_2|^p < 0$.
We have now verified all the assumptions of the mountain pass Theorem 9.2.1, and we consequently get a critical point $u$ of $F$ with
$F(u) \ge \beta > 0$.
This is the desired nontrivial solution. (In fact, regularity theory implies that any weak solution of (9.2.1) is smooth in $\Omega$, see e.g. Gilbarg-Trudinger, loc. cit.)
q.e.d.
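The sign change of $\lambda \mapsto F(\lambda u_2)$ used to verify assumption (ii) can be made concrete. The following sketch is an illustration under assumptions chosen here: $d = 1$, $\Omega = (0,1)$, $p = 4$, the test function $u_2(x) = \sin(\pi x)$, and a midpoint rule for the integrals.

```python
import math

def integrate(f, n=1000):
    # Midpoint rule on the interval (0, 1).
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

p = 4  # hypothetical exponent; any p > 2 is admissible for d = 1

def u2(x):
    return math.sin(math.pi * x)          # vanishes at x = 0 and x = 1

def du2(x):
    return math.pi * math.cos(math.pi * x)

A = integrate(lambda x: du2(x) ** 2)      # integral of |Du_2|^2, equals pi^2 / 2
B = integrate(lambda x: abs(u2(x)) ** p)  # integral of |u_2|^p, equals 3/8 for p = 4

def F_scaled(lam):
    # F(lam * u_2) = lam^2 / 2 * A - lam^p / p * B
    return 0.5 * lam**2 * A - lam**p / p * B

small, large = F_scaled(1.0), F_scaled(6.0)
```

Since $p > 2$, the negative term of order $\lambda^p$ dominates for large $\lambda$, so `large` is negative while `small` is positive.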
Remark 9.2.1. By the same method, we can also treat the equation
$\Delta u - \lambda u + |u|^{p-2}u = 0$ for any $\lambda > 0$. (9.2.18)
9.3 Topological indices and critical points
In Section 3.2 of Part I, we have seen an example where a topological construction permitted us to deduce the existence of more than one (unstable) critical point of a functional. In the present section, we first
give an axiomatic approach to such constructions and then apply this
in conjunction with the Palais-Smale condition to a concrete variational
problem to show the existence of infinitely many solutions.
Such global topological constructions originated with the work of
Lyusternik. Contributors also include Schnirelman, and, more recently,
Rabinowitz, and many others. The reader will find detailed references
in the monographs quoted at the end of this chapter.
Definition 9.3.1. Let $X$ be a topological space, $F : X \to \mathbb{R}$ continuous. $x \in X$ is called a special point for $F$, with value $\alpha$,
$x \in \operatorname{spec}_\alpha F$ ($\alpha$ then is called a special value)
if $x$ is contained in all $A \subset X$ with the following property: For each open $U \supset A$ there exist $\varepsilon = \varepsilon(U) > 0$ and a continuous
$\psi : X \times [0,1] \to X$
satisfying
(i) $\psi(y, 0) = y$ for $y \in X$
(ii) $F(\psi(y,t)) \le F(\psi(y,s))$ for all $y \in X$, $0 \le s \le t \le 1$
(iii) For every $y \in X \setminus U$ with
$F(y) < \alpha + \varepsilon$,
we have
$F(\psi(y,1)) < \alpha - \varepsilon$.
Of course, the $\psi$ of the preceding definition is an abstract version of the deformations constructed in Section 9.1, and the notion of special point is a topological version of the notion of critical point.
Remark 9.3.1. Since the composition of any two deformations $\psi_1, \psi_2$ satisfying the properties of Definition 9.3.1 continues to satisfy these properties, the intersection of any two sets $A_1, A_2$ still satisfies the property expressed in Definition 9.3.1 if $A_1, A_2$ do. Therefore, if $\operatorname{spec}_\alpha F = \emptyset$, we may take $U = A = \emptyset$ in Definition 9.3.1 and find a deformation $\psi$ that satisfies (i)-(iii) for all $y \in X$.
In order to illustrate the notion of special point as well as the topo-
logical constructions to follow, we now present the simple:
Lemma 9.3.1. Let $F : X \to \mathbb{R}$ be a continuous function on the topological space $X$. Let $\mathcal{M}$ be a (nonempty) class of nonempty subsets of $X$. If $\operatorname{spec}_\alpha F = \emptyset$, we require that $\mathcal{M}$ is invariant under the deformations considered in Definition 9.3.1:
If $A \in \mathcal{M}$, then also $\psi(A, 1) \in \mathcal{M}$. (9.3.1)
Suppose
$-\infty < \alpha = \inf_{A \in \mathcal{M}} \sup_{y \in A} F(y) < \infty$. (9.3.2)
Then $\alpha$ is a special value for $F$, i.e. there exists
$x_0 \in \operatorname{spec}_\alpha F$.
Proof. Suppose $\operatorname{spec}_\alpha F = \emptyset$. According to the preceding remark, we may then take $U = \emptyset$ and find $\psi : X \times [0,1] \to X$ and $\varepsilon > 0$ with
$F(\psi(y, 1)) < \alpha - \varepsilon$ whenever $F(y) < \alpha + \varepsilon$. (9.3.3)
We may find $A_0 \in \mathcal{M}$ with
$\sup_{y \in A_0} F(y) < \alpha + \varepsilon$, (9.3.4)
but no $A \in \mathcal{M}$ can satisfy
$\sup_{y \in A} F(y) < \alpha - \varepsilon$. (9.3.5)
However, if we take $A_1 := \psi(A_0, 1)$ then $A_1 \in \mathcal{M}$ by assumption, and by (9.3.3)
$\sup_{y \in A_1} F(y) < \alpha - \varepsilon$,
contradicting (9.3.5). Thus, $\operatorname{spec}_\alpha F \neq \emptyset$.
q.e.d.
In order to obtain the existence of further special points, we now shall introduce the notion of a (topological) index. Such an index is based on symmetry or invariance properties of the functional under consideration. Here, we only consider the case of the simplest nontrivial symmetry group, namely $\mathbb{Z}_2$, although the subsequent constructions easily generalize to any compact group $G$. We thus make the following symmetry assumptions:
$X$ is a topological space with a nontrivial involution, i.e. there exists a continuous map $j : X \to X$, $j \neq \mathrm{id}$, with
$j^2 = \mathrm{id}$.
$F : X \to \mathbb{R}$ is continuous and even, i.e.
$F(j(x)) = F(x)$ for all $x \in X$.
$\mathcal{M} := \{A \subset X \mid j(A) = A$ and for all $x \in A$, $j(x) \neq x$ (i.e. $A$ contains no fixed points of $j$)$\}$.
We now also require $\psi(j(x), t) = j(\psi(x,t))$ for all the deformations of Definition 9.3.1.
Definition 9.3.2. An index for $(X, F)$ is a map
$i : \mathcal{M} \to \{0, 1, 2, \ldots, \infty\}$
satisfying for all $A, A_1, A_2 \in \mathcal{M}$:
(i) $i(A) = 0 \Leftrightarrow A = \emptyset$
(ii) $A$ finite ($A \neq \emptyset$) $\Rightarrow i(A) = 1$
(iii) $i(A_1 \cup A_2) \le i(A_1) + i(A_2)$
(iv) $A_1 \subset A_2 \Rightarrow i(A_1) \le i(A_2)$
(v) $i(A) \le i(j(A))$
(vi) $A$ compact $\Rightarrow \exists$ neighbourhood $U$ of $A$ in $X$ with $U \in \mathcal{M}$, $i(A) = i(U) < \infty$.
For $n \in \{0, 1, 2, \ldots, \infty\}$, we put
$\mathcal{M}_n := \{A \in \mathcal{M} \mid i(A) \ge n\}$.
Remark 9.3.2. More precisely, one should call an $i$ as in Definition 9.3.2 an index for $(X, F, \mathbb{Z}_2)$, in order to specify the symmetry group involved.
For $n \in \{0, 1, 2, \ldots, \infty\}$, we define
$\alpha_n := \inf_{A \in \mathcal{M}_n} \sup_{y \in A} F(y)$.
Theorem 9.3.1. Suppose the above symmetry assumptions hold, an index $i$ for $(X, F)$ exists, and
$-\infty < \alpha_n < \infty$.†
(i) Then
$\operatorname{spec}_{\alpha_n} F \neq \emptyset$ (9.3.6)
(ii) If furthermore for some $k \ge 1$, $\alpha_n = \alpha_{n+1} = \ldots = \alpha_{n+k}$, then $\operatorname{spec}_{\alpha_n} F$ is infinite.
† Since the infimum over an empty set is $\infty$, this contains the assumption $\mathcal{M}_n \neq \emptyset$.
Proof. We note that property (v) of Definition 9.3.2 implies that $\mathcal{M}_n$ is invariant under (symmetric) deformations $\psi$. Therefore, Lemma 9.3.1 implies $\operatorname{spec}_{\alpha_n} F \neq \emptyset$. For the second statement, we claim that for $A_0 = \operatorname{spec}_{\alpha_n} F$,
$i(A_0) \ge k + 1$. (9.3.7)
If $k \ge 1$, property (ii) of Definition 9.3.2 then implies the existence of infinitely many special points with value $\alpha_n$.
Suppose on the contrary that
$i(A_0) \le k$. (9.3.8)
By Definition 9.3.2 (vi), we may find a neighbourhood $U$ of $A_0$ with $U \in \mathcal{M}$ and
$i(A_0) = i(U)$. (9.3.9)
Since $A_0$ consists of special points, we may find a (symmetric) deformation $\psi$ with
$F(\psi(y, 1)) < \alpha_n - \varepsilon$ for all $y \in X \setminus U$ with $F(y) < \alpha_n + \varepsilon$
for some $\varepsilon > 0$. Since $\alpha_n = \alpha_{n+k}$, we may find $A \in \mathcal{M}_{n+k}$ with
$\sup_{y \in A} F(y) < \alpha_n + \varepsilon$,
hence
$\sup_{z \in \psi(A \setminus U, 1)} F(z) < \alpha_n - \varepsilon$. (9.3.10)
We have
$i(A \setminus U) \ge i(A) - i(U)$ by (iii)
$\ge n + k - k$, using (9.3.8), (9.3.9), $A \in \mathcal{M}_{n+k}$
$= n$.
Thus
$A \setminus U \in \mathcal{M}_n$,
hence $A \setminus U \neq \emptyset$ by (i). Since, as noted in the beginning, $\mathcal{M}_n$ is invariant under $\psi$, we get
$\psi(A \setminus U, 1) \in \mathcal{M}_n$,
hence
$\sup_{y \in \psi(A \setminus U, 1)} F(y) \ge \alpha_n$,
contradicting (9.3.10).
q.e.d.
In order to apply the preceding considerations, we need to construct
an index with the properties listed in Definition 9.3.2. We shall present
here Coffman's version of the genus of Krasnoselskij.
Definition 9.3.3. Suppose the symmetry assumptions stated before Definition 9.3.2 hold. The genus of $A \neq \emptyset$, $A \in \mathcal{M}$ is defined as follows:
$\operatorname{gen}(A) := \inf\{n \in \{1, 2, 3, \ldots, \infty\} \mid \exists$ continuous $f : A \to \mathbb{R}^n \setminus \{0\}$ with $f(j(x)) = -f(x)$ for all $x \in A\}$
while $\operatorname{gen}(\emptyset) := 0$.
As an example, we state:
Lemma 9.3.2. The genus of the unit sphere $S^{n-1} = \{\|x\| = 1\}$ in $\mathbb{R}^n$ (with involution $j(x) = -x$) is equal to $n$.
Proof. The inclusion map $S^{n-1} \hookrightarrow \mathbb{R}^n$ satisfies the properties of Definition 9.3.3, and so $\operatorname{gen}(S^{n-1}) \le n$. If $n \ge 2$, $S^{n-1}$ is connected, and therefore, by the intermediate value theorem, there is no continuous map $f : S^{n-1} \to \mathbb{R}^1 \setminus \{0\}$ with $f(-x) = -f(x)$ for all $x$. Hence $\operatorname{gen}(S^{n-1}) \ge 2$. In fact, by the Borsuk-Ulam theorem†, there is no such continuous map to $\mathbb{R}^m \setminus \{0\}$ with $m < n$. Therefore, $\operatorname{gen}(S^{n-1}) \ge n$.
q.e.d.
Corollary 9.3.1. The genus of the unit sphere $S := \{x \in V : \|x\| = 1\}$ in an infinite dimensional Banach space $(V, \|\cdot\|)$ is $\infty$.
Proof. For any $n$-dimensional subspace $V_n$ of $V$,
$\operatorname{gen}(S) \ge \operatorname{gen}(S \cap V_n) \ge n$ by Lemma 9.3.2.
q.e.d.
Theorem 9.3.2. The genus as defined in Definition 9.3.3 is an index in the sense of Definition 9.3.2.
Proof. We need to check the properties (i)-(vi) of Definition 9.3.2.
(i) is obvious.
(ii) If $A \in \mathcal{M}$ is finite, then $A$ is of the form $\{x_\nu, j(x_\nu) \mid \nu = 1, \ldots, k\}$ for some $k$. We define $f : A \to \mathbb{R}^1 \setminus \{0\}$ by $f(x_\nu) = 1$, $f(j(x_\nu)) = -1$ for all $\nu$ (of course, we may assume $x_\mu \neq j(x_\nu)$ for all $\mu, \nu$).
(iii) Let $\operatorname{gen}(A_\nu) = n_\nu < \infty$, $\nu = 1, 2$, and let the continuous $f_\nu : A_\nu \to \mathbb{R}^{n_\nu} \setminus \{0\}$ satisfy $f_\nu(j(x)) = -f_\nu(x)$ for all $x$. By the Tietze extension theorem‡, $f_\nu$ can be continuously extended to $f_\nu : X \to \mathbb{R}^{n_\nu}$. By considering $\frac{1}{2}(f_\nu(x) - f_\nu(j(x)))$ in place of $f_\nu$, we may assume that the extension still satisfies
$f_\nu(j(x)) = -f_\nu(x)$ for all $x$.
The map $(f_1, f_2) : A_1 \cup A_2 \to \mathbb{R}^{n_1+n_2} \setminus \{0\}$ then shows that
$\operatorname{gen}(A_1 \cup A_2) \le n_1 + n_2 = \operatorname{gen}(A_1) + \operatorname{gen}(A_2)$.
(iv) is obvious.
(v) follows, since $f \circ j$ shares the necessary properties with $f$.
† See e.g. E. Zeidler, Nonlinear Functional Analysis and its Applications, I, Springer, New York, 1984, p. 708, for a proof.
‡ See E. Zeidler, loc. cit., p. 49.
(vi) Let $A \in \mathcal{M}$ be compact. Since $j(x) \neq x$ for all $x \in A$ (by the properties of $\mathcal{M}$), for each $x \in A$, we may find a neighbourhood $U(x)$ with $U(x) \cap j(U(x)) = \emptyset$. Since $A$ is compact, it can be covered by finitely many such neighbourhoods $U_\nu$, $\nu = 1, \ldots, n$. For each $U_\nu$, we choose a continuous function $\varphi_\nu : X \to \mathbb{R}$ with $\varphi_\nu(x) > 0$ for $x \in U_\nu$, $\varphi_\nu(x) = 0$ for $x \in X \setminus U_\nu$. We then define $h = (h^1, \ldots, h^n) : A \to \mathbb{R}^n \setminus \{0\}$ by
$h^\nu(x) := \varphi_\nu(x)$ for $x \in U_\nu$, $h^\nu(x) := -\varphi_\nu(j(x))$ for $x \in A \setminus U_\nu$, in particular for $x \in j(U_\nu)$.
(Since every $x \in A$ is contained in some $U_\nu$, we have $h(x) \neq 0$ for all $x \in A$.)
Thus $\operatorname{gen}(A) \le n < \infty$.
If $A \in \mathcal{M}$ is compact with $\operatorname{gen}(A) = n$, and
$f : A \to \mathbb{R}^n \setminus \{0\}$ is continuous with $f(j(x)) = -f(x)$,
we may extend $f$ as before to $\bar{f} : X \to \mathbb{R}^n$ (with the same symmetry property). Since $A$ is compact, so is $f(A)$, and therefore, we may find an open neighbourhood $V$ of $f(A)$ with $\bar{V} \subset \mathbb{R}^n \setminus \{0\}$. Then $U := \bar{f}^{-1}(V)$ satisfies
$n = \operatorname{gen}(A)$
$\le \operatorname{gen}(U)$ by (iv)
$\le n$ since $\bar{f}(U)$ is contained in $V \subset \mathbb{R}^n \setminus \{0\}$.
Thus $\operatorname{gen}(U) = \operatorname{gen}(A)$ as required.
q.e.d.
We may now obtain a general existence theorem for critical points of
functionals satisfying (PS):
Theorem 9.3.3. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, \langle\cdot,\cdot\rangle)$ that are even, i.e. $F(x) = F(-x)$, $G(x) = G(-x)$ for all $x \in H$. Suppose $F$ satisfies (PS) relative to $G = \beta$, and is bounded from below. Let
$\mathcal{M} := \{A \subset \{G(x) = \beta\} \mid 0 \notin A$ and $(x \in A \Leftrightarrow -x \in A)\}$.
Let $\gamma_0 := \sup\{\operatorname{gen}(K) \mid K \in \mathcal{M}$ compact$\}$ $(\le \infty)$. Then $F$ possesses at least $\gamma_0$ critical points relative to $G = \beta$.
Proof. Since (PS) holds, by Theorem 9.1.2, all special points (in the sense of Definition 9.3.1) for the restriction of $F$ to $X := \{x \in H \mid G(x) = \beta\}$ are critical points for $F$ relative to $G = \beta$. Hence, it suffices to produce $\gamma_0$ special points of $F$ on $X$. Let
$\alpha_n := \inf_{A \in \mathcal{M},\ \operatorname{gen}(A) \ge n} \sup_{x \in A} F(x)$.
Since $F$ is bounded below, and since in the definition of $\gamma_0$, we only consider compact sets, we have
$-\infty < \alpha_n < \infty$ whenever $n \le \gamma_0$.
By Theorem 9.3.2, we may apply Theorem 9.3.1 to the genus as an index. We have in fact
$-\infty < \alpha_1 \le \alpha_2 \le \ldots \le \alpha_n \le \ldots < \infty$ whenever $n \le \gamma_0$.
If we always have strict inequality, then the
$x_n \in \operatorname{spec}_{\alpha_n} F$
produced by Theorem 9.3.1 (i) are all different, because their values $F(x_n)$ are all different. If however any two such numbers $\alpha_{n-1}$ and $\alpha_n$ are equal, then by Theorem 9.3.1 (ii) we even obtain infinitely many special points. Thus, in any case, we have at least $\gamma_0$ special, hence critical points.
q.e.d.
As an application of Theorem 9.3.3, we consider the example of the
previous section:
Corollary 9.3.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1, 2$). Then for any $\lambda \ge 0$, the Dirichlet problem
$\Delta u - \lambda u + |u|^{p-2}u = 0$ in $\Omega$ (9.3.11)
$u = 0$ on $\partial\Omega$ (9.3.12)
admits infinitely many (weak) solutions.
Proof. We consider the even functionals
$F(u) = \frac{1}{2}\int_\Omega \left(|Du|^2 + \lambda |u|^2\right)$,
$G(u) = \frac{1}{p}\int_\Omega |u|^p$.
We claim that $F$ satisfies (PS) relative to $G = 1$. The proof is similar to the argument employed for the demonstration of Theorem 9.2.2: let $(u_n)_{n \in \mathbb{N}}$ be a critical sequence, i.e.
$F(u_n) \le c_1$ (9.3.13)
$DF(u_n) - \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2}\, DG(u_n) \to 0$ for $n \to \infty$ (9.3.14)
where all norms and scalar products are from $H_0^{1,2}(\Omega)$. From (9.3.13) (and the Poincaré inequality in case $\lambda = 0$), we obtain
$\|u_n\|_{H^{1,2}(\Omega)} \le c_2$. (9.3.15)
We obtain as in the proof of Theorem 9.2.2 (cf. (9.2.5)), by using Hölder's inequality, that
$|DG(u_n)(u_n - u_m)| = \left|\int_\Omega |u_n|^{p-2}u_n(u_n - u_m)\right| \le \left(\int_\Omega |u_n|^p\right)^{\frac{p-1}{p}} \left(\int_\Omega |u_n - u_m|^p\right)^{\frac{1}{p}}$. (9.3.16)
Since $p < \frac{2d}{d-2}$, from (9.3.15) and Sobolev's Embedding Theorem 3.4.3, we conclude that $\int_\Omega |u_n|^p$ is bounded, whereas (9.3.15) and the Rellich-Kondrachev theorem (Corollary 3.4.1) imply that, after selection of a subsequence, $(u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^p(\Omega)$. Thus, from (9.3.16)
$DG(u_n)(u_n - u_m) \to 0$ for $n, m \to \infty$. (9.3.17)
Also
$\|DG(u_n)\| = \sup_{w \in H_0^{1,2}(\Omega),\ w \neq 0} \frac{|DG(u_n)(w)|}{\|w\|_{H^{1,2}}} \ge \frac{|DG(u_n)(u_n)|}{\|u_n\|_{H^{1,2}}} = \frac{\int_\Omega |u_n|^p}{\|u_n\|_{H^{1,2}}} \ge \frac{p}{c_2} > 0$ (9.3.18)
from (9.3.15) and
$\frac{1}{p}\int_\Omega |u_n|^p = G(u_n) = 1$. (9.3.19)
From (9.3.17), (9.3.18) we conclude that there exist $h_{nm} \in H_0^{1,2}(\Omega)$ with
$DG(u_n)(u_n - u_m + h_{nm}) = 0$ for all $n, m$ (9.3.20)
$\|h_{nm}\|_{H^{1,2}} \to 0$ for $n, m \to \infty$. (9.3.21)
Therefore, from (9.3.14)
$DF(u_n)(u_n - u_m + h_{nm}) \to 0$,
i.e.
$\int_\Omega \left(Du_n \cdot (D(u_n - u_m) + Dh_{nm}) + \lambda u_n(u_n - u_m + h_{nm})\right) \to 0$ for $n, m \to \infty$
and because of (9.3.21) then also
$\int_\Omega \left(Du_n \cdot D(u_n - u_m) + \lambda u_n(u_n - u_m)\right) \to 0$.
This implies
$\int_\Omega \left(|D(u_n - u_m)|^2 + \lambda |u_n - u_m|^2\right) \to 0$ for $n, m \to \infty$,
and consequently, $(u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $H_0^{1,2}(\Omega)$. This verifies (PS) relative to $G = 1$.
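In order to apply Theorem 9.3.3, we thus only need to check that in the present case, $\gamma_0 = \infty$. However,
$\left\{u \in H_0^{1,2}(\Omega) \mid \frac{1}{p}\int_\Omega |u|^p = 1\right\}$
is the intersection of a sphere centered at the origin in $L^p(\Omega)$ with the subspace $H_0^{1,2}(\Omega)$. Therefore, the argument of Lemma 9.3.2 easily implies $\gamma_0 = \infty$. Theorem 9.3.3 thus produces infinitely many solutions $u_n$ of
$DF(u_n) - \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2}\, DG(u_n) = 0$,
i.e. with
$\mu_n := \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2}$,
weak solutions of
$\Delta u_n - \lambda u_n + \mu_n |u_n|^{p-2}u_n = 0$ in $\Omega$
$u_n = 0$ on $\partial\Omega$.
Testing this equation with $u_n$ gives $\int_\Omega (|Du_n|^2 + \lambda |u_n|^2) = \mu_n \int_\Omega |u_n|^p = p\mu_n$, so that $\mu_n > 0$. If we choose $\nu_n > 0$ with $\nu_n^{p-2} = \mu_n$, then $v_n := \nu_n u_n$ solves (9.3.11), (9.3.12) weakly, since $\Delta v_n - \lambda v_n + |v_n|^{p-2}v_n = \nu_n(\nu_n^{p-2} - \mu_n)|u_n|^{p-2}u_n = 0$. Again, we remark that elliptic regularity theory implies that all $u_n$ and $v_n$ are smooth in $\Omega$, so that in fact we obtain classical solutions of (9.3.11), (9.3.12).
q.e.d.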
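The rescaling at the end of this proof, which removes the multiplier $\mu_n$, can be checked on a finite-dimensional analogue. The sketch below is purely illustrative and rests on assumptions made here: a diagonal matrix `L` stands in for the operator $-\Delta + \lambda$, and the vector `u`, the multiplier `mu` and the exponent `p` are hypothetical values chosen so that $Lu = \mu |u|^{p-2}u$ holds componentwise. The scaling used is $\nu^{p-2} = \mu$.

```python
p = 4                      # hypothetical exponent
mu = 2.25                  # hypothetical Lagrange multiplier, mu > 0
u = [0.5, 1.0, 2.0]        # hypothetical solution vector

# Diagonal entries chosen so that L u = mu * |u|^(p-2) * u componentwise.
L = [mu * abs(ui) ** (p - 2) for ui in u]

def residual(v, multiplier):
    # Componentwise residual of L v - multiplier * |v|^(p-2) * v.
    return [l * vi - multiplier * abs(vi) ** (p - 2) * vi
            for l, vi in zip(L, v)]

nu = mu ** (1.0 / (p - 2))         # scaling factor with nu^(p-2) = mu
v = [nu * ui for ui in u]          # rescaled vector

res_u = residual(u, mu)            # u solves the equation with multiplier mu
res_v = residual(v, 1.0)           # v solves the equation with multiplier 1
```

Both residuals vanish, which mirrors the identity $\Delta v - \lambda v + |v|^{p-2}v = \nu(\nu^{p-2} - \mu)|u|^{p-2}u$ for $v = \nu u$.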
In Theorem 9.2.2 and in Corollary 9.3.2, we had imposed the restriction
$p < \frac{2d}{d-2}$ (in case $d \ge 3$),
and the reader may wonder whether this is necessary. To pursue this question, we shall now discuss the theorem of Pohozaev:
Theorem 9.3.4. Let $\Omega \subset \mathbb{R}^d$ be a smooth domain which is strictly star shaped w.r.t. $0 \in \mathbb{R}^d$ (this means that the outer normal $\nu$ of $\partial\Omega$ satisfies $\langle x, \nu(x)\rangle > 0$ for all $x \in \partial\Omega$). Then for $\lambda \ge 0$, any solution of
$\Delta u - \lambda u + |u|^{\frac{4}{d-2}}u = 0$ in $\Omega$ (9.3.22)
$u = 0$ on $\partial\Omega$ (9.3.23)
vanishes identically.
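We shall present a complete proof only for $\lambda > 0$ and for smooth
solutions $u$ (elliptic regularity implies that any weak solution of (9.3.22),
(9.3.23) is automatically smooth† on $\bar\Omega$, but the present book does not
treat this topic):
We multiply (9.3.22) by $\sum_{i=1}^d x^i \frac{\partial u}{\partial x^i}$ and obtain
$$\left( \Delta u - \lambda u + |u|^{\frac{4}{d-2}} u \right) \sum_{i=1}^d x^i \frac{\partial u}{\partial x^i} = 0. \qquad (9.3.24)$$
By (9.3.23), we have $u = 0$ on $\partial\Omega$, hence also $\sum_{i=1}^d x^i \frac{\partial u}{\partial x^i} = \sum_{i=1}^d x^i \nu^i \frac{\partial u}{\partial \nu}$
($\nu = (\nu^1, \ldots, \nu^d)$ is the exterior normal of $\Omega$). Integrating (9.3.25) therefore yields
$$\frac{d-2}{2} \int_\Omega |Du|^2 + \frac{\lambda d}{2} \int_\Omega |u|^2 - \frac{d-2}{2} \int_\Omega |u|^{\frac{2d}{d-2}} + \frac{1}{2} \int_{\partial\Omega} \left| \frac{\partial u}{\partial \nu} \right|^2 \sum_{i=1}^d x^i \nu^i = 0. \qquad (9.3.26)$$
† See for example Appendix B in M. Struwe, Variational Methods, Springer, Berlin,
2nd edition, 1996.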
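On the other hand, multiplying (9.3.22) by $u$ leads to
$$\int_\Omega |Du|^2 + \lambda \int_\Omega |u|^2 - \int_\Omega |u|^{\frac{2d}{d-2}} = 0. \qquad (9.3.27)$$
Equations (9.3.26) and (9.3.27) imply
$$2\lambda \int_\Omega |u|^2 + \int_{\partial\Omega} \left| \frac{\partial u}{\partial \nu} \right|^2 \sum_{i=1}^d x^i \nu^i = 0. \qquad (9.3.28)$$
If $\lambda > 0$, this implies $u \equiv 0$, hence the result. (If $\lambda = 0$, one still concludes
that $\frac{\partial u}{\partial \nu} = 0$ on $\partial\Omega$. Since also $u = 0$ on $\partial\Omega$ by (9.3.23), one may invoke a
unique continuation theorem for solutions of elliptic equations to obtain
$u \equiv 0$ in $\Omega$. We omit the details.)
q.e.d.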
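The step from (9.3.24) to (9.3.26) rests on writing each product in divergence form. The following is a sketch of that standard computation under the hypotheses of the theorem, not a transcription of the book's own formula; the abbreviations $p := \frac{2d}{d-2}$, $x \cdot Du := \sum_i x^i \frac{\partial u}{\partial x^i}$ and $x \cdot \nu := \sum_i x^i \nu^i$ are notation chosen here.

```latex
% Each term of (9.3.24) as a divergence plus a remainder (u smooth, p = 2d/(d-2)):
\begin{align*}
\Delta u \, (x \cdot Du)
  &= \operatorname{div}\!\left( (x \cdot Du)\, Du - \tfrac{1}{2} |Du|^2 x \right)
     + \tfrac{d-2}{2} |Du|^2, \\
- \lambda u \, (x \cdot Du)
  &= - \operatorname{div}\!\left( \tfrac{\lambda}{2} u^2 x \right)
     + \tfrac{\lambda d}{2} u^2, \\
|u|^{p-2} u \, (x \cdot Du)
  &= \operatorname{div}\!\left( \tfrac{1}{p} |u|^p x \right) - \tfrac{d}{p} |u|^p,
  \qquad \tfrac{d}{p} = \tfrac{d-2}{2}.
\end{align*}
% Boundary terms: u = 0 on \partial\Omega gives Du = (\partial u / \partial\nu)\,\nu there, hence
\begin{align*}
\int_{\partial\Omega} \left( (x \cdot Du)(Du \cdot \nu) - \tfrac{1}{2} |Du|^2 (x \cdot \nu) \right)
  = \tfrac{1}{2} \int_{\partial\Omega} \left| \tfrac{\partial u}{\partial \nu} \right|^2 (x \cdot \nu).
\end{align*}
% Subtracting (d-2)/2 times (9.3.27) from (9.3.26):
\begin{align*}
\left( \tfrac{\lambda d}{2} - \tfrac{\lambda (d-2)}{2} \right) \int_\Omega |u|^2
  + \tfrac{1}{2} \int_{\partial\Omega} \left| \tfrac{\partial u}{\partial \nu} \right|^2 (x \cdot \nu)
  = \lambda \int_\Omega |u|^2
  + \tfrac{1}{2} \int_{\partial\Omega} \left| \tfrac{\partial u}{\partial \nu} \right|^2 (x \cdot \nu) = 0 .
\end{align*}
```

Multiplying the last line by 2 gives (9.3.28). Since $x \cdot \nu > 0$ on $\partial\Omega$ for a strictly star shaped domain, both terms are non-negative, which is why $\lambda > 0$ forces $u \equiv 0$.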
Theorem 9.3.4 implies that for $p = \frac{2d}{d-2}$ in Theorem 9.2.2 and Corollary 9.3.2,
the Palais-Smale condition no longer holds. Namely, if it did,
the proofs of those results would yield the existence of nontrivial solutions.
It also shows that if the Palais-Smale condition fails, the whole
scheme developed in the present chapter for producing critical points
breaks down.
Since for $p < \frac{2d}{d-2}$, (PS) does hold, the case $p = \frac{2d}{d-2}$ can be considered
as a limit case for (PS). In fact, such limit cases of the Palais-Smale condition
occur in many variational problems that are of importance in Riemannian
geometry, e.g. the Yang-Mills functional on a four-dimensional
Riemannian manifold, two-dimensional harmonic maps, surfaces of constant
mean curvature, the Yamabe functional etc. The interested reader
is for example referred to
K. C. Chang, Infinite Dimensional Morse Theory and Multiple Solution
Problems, Birkhäuser, Boston, 1993,
J. Jost, Riemannian Geometry and Geometric Analysis, Springer, Berlin,
2nd edition, 1998,
M. Struwe, Variational Methods, Springer, Berlin, 2nd edition, 1996,
and the references contained therein.
The basic references that have been used in writing the present chapter
are the monograph of M. Struwe just quoted, as well as
P. Rabinowitz, Minimax Methods in Critical Point Theory with Applications
to Differential Equations, CBMS Reg. Conf. Ser. 65, AMS, Providence,
1986
and
E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer,
Berlin, 1984.
These three monographs contain not only detailed bibliographical ref-
erences which the reader is urged to consult in order to find the
original sources of the results of the present chapter but also many
further results and examples concerning the Palais-Smale condition and
index theories.
Exercises
9.1 Why is Theorem 9.2.1 called 'mountain pass theorem'? Hint:
Try to find an analogy between the statement of that result and
the geometry of mountain passes.
9.2 Try to find conditions for a function
$f : \Omega \times \mathbb{R} \to \mathbb{R}$
so that the reasoning of Theorem 9.2.2 can be extended to the
Dirichlet problem
$$\Delta u(x) = f(x, u(x)) \quad \text{for } x \in \Omega$$
$$u(x) = 0 \quad \text{for } x \in \partial\Omega$$
in a smooth bounded domain $\Omega$. (An answer can be found in
Theorem 6.2 of the quoted monograph of M. Struwe.)
9.3 Develop an index theory for a general compact group $G$ in place
of $\mathbb{Z}_2$.
9.4 Extend Theorem 9.1.3 to the relative case as indicated at the
end of Section 9.1.
Index
$|x|^2 = x \cdot x$, xv
$AC([a, b])$, 11
$\Gamma^i_{jk} := \frac{1}{2} g^{il} (g_{jl,k} + g_{kl,j} - g_{jk,l})$, 39
$d(p, q) := \inf \{ L(c) \mid c : [a, b] \to M$ rectifiable curve with $c(a) = p, c(b) = q \}$, 51
meas, 118
$\int_A f(x)\,dx$, 120
$\mathbb{R}^+ := \{ t \in \mathbb{R} \mid t > 0 \}$, 130
$x_n \rightharpoonup x$, 135
$M^\perp := \{ x \in H : (x, y) = 0$ for all $y \in M \}$, 141
$L(V, W)$, 145
$\ker T := \{ x \in V : Tx = 0 \}$, 145
$V = V_1 \oplus V_2$, 146
coker $T$, 147
ind $T$, 147
$F(V, W)$, 147
$DF(u)$, 150
$D^2 F(u)$, 150
ODE, 155
$\|f\|_p := \|f\|_{L^p(A)} := \left( \int_A |f(x)|^p \, dx \right)^{1/p}$, 159
$\operatorname{ess\,sup}_{x \in A} f(x) := \inf \{ \lambda \in \mathbb{R} \mid f(x) \leq \lambda$ for almost all $x \in A \}$, 162
$C_0^\infty(\Omega)$, 166
supp $\varphi$, 166
$\Omega' \subset\subset \Omega$, 167
$\alpha := (\alpha_1, \ldots, \alpha_d)$, 171
$|\alpha| := \sum_{i=1}^d \alpha_i$, 171
$D_\alpha u$, 171
$W^{k,p}(\Omega)$, 171
$\|u\|_{W^{k,p}(\Omega)} := \left( \sum_{|\alpha| \leq k} \int_\Omega |D_\alpha u|^p \right)^{1/p}$, 171
$H^{k,p}(\Omega)$, 171
$H_0^{k,p}(\Omega)$, 171
$D_i u$, 172
$Du$, 172
lsc, 185
$F_\lambda(x) := \inf_{y \in X} (\lambda F(y) + d^2(x, y))$, 190
$J_\lambda(x)$, 191
$\Delta := \sum_{i=1}^d \frac{\partial^2}{(\partial x^i)^2}$, 199
$sc^- F$, 208
$\Gamma$-$\lim_{n \to \infty} F_n$, 225
$BV(\Omega)$, 242
$\|Du\|$, 242
$\|u\|_{BV(\Omega)} := \|u\|_{L^1(\Omega)} + \|Du\|(\Omega)$, 242
$P(E, \Omega) := \|D\chi_E\|(\Omega)$, 244
$\rho_h * u(x)$, 246
$Gl(d, \mathbb{R})$, 257
$O(d, \mathbb{R})$, 257
$(\cdot, \cdot)_{L^2}$, 283
(PS), 292
$K_\alpha$, 292
$\nabla F(\cdot)$, 294
gen($A$), 310
accessory variational problem, 19
accumulation point, 185, 208
Ambrosetti, 302
angular momentum, 26, 28, 30
arc-length, 3
Arzelà-Ascoli theorem, 176
Banach fixed point theorem, 150, 152
Banach space, 126, 129, 132-134, 138,
145, 161, 162, 270, 291, 292,
299-301
Banach spaces, 150
Bellman equation, 105, 108
Bellman function, 105, 107
Bellman's method, 106
bifurcation theory, 268, 270
Borel measure, 118
Borel set, 117
Borel σ-algebra, 117
brachystochrone, 4
canonical equation, 85, 89, 95, 97,
99-101, 111
canonical equations, 80, 93
canonical system, 80
canonical transformation, 95-100, 103
Cantor diagonalization, 135
catenary, 283
catenoid, 283
Cauchy sequence, 126
characteristic function, 119, 211, 243
Christoffel symbols, 39
classical calculus of variations, 3
closed geodesic, 67
coarea formula, 250, 257
coercive, 186
coercivity condition, 291
Coffman, 310
cokernel, 147
compactness condition, 183
compactness of critical sequences, 292
complementary subspace, 146
complete, 126, 134
complete integral, 84, 93
completely integrable, 100
conjugate, 22, 24
conjugate point, 43
conservation law, 26
conserved quantities, 26
constant of motion, 80, 99
continuous linear functional, 133
continuous linear operator, 144
control condition, 109
control equation, 106, 108, 109, 111, 207
control parameter, 104
control problem, 109
control restriction, 105
control variable, 111, 207
converge, 125
convex, 68, 127, 130, 143, 186, 191, 193,
214, 219, 222
convex combination, 142
convex curve, 68
convex function, 122
convex functional, 188
convexity, 291
coordinate transformation, 36
cost, 105
cost function, 207
countable base, 184
countably additive, 118
critical family, 75
critical point, 5, 62, 66, 293, 294, 298,
301, 303, 306, 307, 312, 317
critical sequence, 291, 292, 304, 314
critical value, 302, 303
cusp catastrophe, 279
de Giorgi, 225
deformation, 293, 294, 297, 298, 302,
307-309
dense, 169
diffeomorphism, 34, 95
differentiable, 150
differentiable map, 150
differentiation under the integral, 124
Dirac delta distribution, 173
Dirac distribution, 166
direct method, 183
Dirichlet boundary condition, 3, 26,
183, 190
Dirichlet principle, 199
Dirichlet's integral, 199, 203
distance, 51
distance function from a smooth
hypersurface, 262
distributional derivative, 173
dual space, 133, 163
eiconal, 82
eiconal equation, 83, 86, 90
elementary catastrophes, 279
ellipticity assumption, 198
energy, 26, 30, 32, 34
ε-minimizer, 229
equivalence classes of functions, 159
essential supremum, 162
Euler-Lagrange equation, 6, 8-10, 16,
17, 19, 21-23, 29, 38, 60, 79, 80, 83,
88, 89, 111, 197, 267, 282, 303
example of Bolza, 206
extension, 130
Federer, 261
feedback control, 109
Fermat's principle, 4
field of geodesics, 46
field of solutions, 90, 93
finite perimeter, 244
first axiom of countability, 137, 184,
185, 209, 225, 227, 228
first conjugate point, 23
first integral of motion, 30
flow, 298
foliated by tori, 100
Fréchet differentiable, 150
Fredholm alternative, 149
Fredholm operator, 147-149, 270, 281,
287
free boundary condition, 26
Friedrichs mollifier, 166
fundamental lemma of the calculus of
variations, 5
Γ-convergence, 225, 227, 229, 231
generating function, 100
genus, 310, 311, 313
genus of Krasnoselskij, 310
geodesic, 39, 43, 45, 50, 51, 55, 57, 58,
60, 88, 102
geodesic distance, 82, 90, 93
geodesic parallel coordinates, 45, 49
geometric optics, 86
gradient, 294, 299
gradient flow, 294
great circle, 42
Hölder continuous, 179
Hölder's inequality, 160, 163
Hahn-Banach theorem, 129, 134, 137,
143, 166
Hamilton-Jacobi equation, 83-86, 89,
92, 93, 101
Hamilton-Jacobi theory, 111
Hamiltonian, 80, 89
Hamiltonian flow, 95, 98
harmonic, 199, 201
harmonic oscillator, 87
Hessian, 4
Hilbert space, 126, 128, 141, 162, 293,
297
Hilbert's invariant integral, 92
homogenization, 232
implicit function theorem, 151, 152
index, 147, 308, 311, 313, 318
indicator function, 210
inner radius, 70
insulating layer, 235
integrable, 120
integral, 155
integral of motion, 27
integral of the Hamiltonian flow, 99
invariant integral, 93
inverse function theorem, 154
inverse operator theorem, 145
involution, 308
isometry, 34
Jacobi, 22
Jacobi equation, 20, 24, 268
Jacobi field, 20-22, 24, 269
Jacobi identity, 103
Jacobi operator, 268, 284
Jacobi's method, 99
Jensen's inequality, 122
Jordan curve, 35
Jordan curve Theorem, 68
Kakutani, 139
Kepler problem, 102
Kolmogorov-Arnold-Moser theory, 100
Kondrachev, 175
Lagrange multiplier, 9
Laplace operator, 199, 200
Lebesgue integral, 117, 120
Lebesgue measure, 117, 118
Legendre condition, 20, 112
Legendre transformation, 79, 88
length, 32, 34
length minimizing curve, 8
light ray, 4
limit cases of the Palais-Smale
condition, 317
linear functional, 132, 133, 241
linear functionals, 129
linear operator, 144
Lipschitz continuous, 155, 203
local chart, 25, 32
local minimum, 22
lower semicontinuity, 184
lower semicontinuous, 185, 186, 188,
193, 208, 230
lower semicontinuous w.r.t. weak
convergence, 187
lower semicontinuous envelope, 208
Lyapunov-Schmidt, 280
Lyapunov-Schmidt reduction, 269
Lyusternik, 306
Lyusternik-Schnirelman, 67
mean curvature, 263
mean value property, 201
measurable, 118-120
measure, 117
metric tensor, 33, 47
minimal hypersurface, 255
minimal hypersurfaces, 203
minimal surface of revolution, 282
minimax value, 303
minimizer, 4-6, 12, 183, 186, 229, 291,
302
minimizer of a convex variational
problem, 189
minimizing, 3
minimizing sequence, 183
Minkowski functional, 143
Minkowski's inequality, 160
Modica, 254
Möbius strip, 75, 76
mollification, 167, 174, 175, 200, 245
momenta, 80
momentum, 26, 28, 30
monotonically increasing sequence, 122
Moreau-Yosida approximation, 190
Moreau-Yosida transform, 212
Morrey, 222
mountain pass theorem, 302, 303, 306,
318
neighbourhood system, 184
Newtonian motion, 81
Noether, 26
nonminimizing critical point, 291, 302
norm, 125
norm convergence, 125, 132
norm of a linear functional, 133
normed space, 125
null class, 159
null function, 159
optimal control theory, 111, 207
ordinary differential equation, 155
ordinary differential equations in
Banach spaces, 155
orthogonal, 90
orthogonal complement, 141
Palais, 299
Palais-Smale condition, 77, 292, 293,
304, 306, 312, 317
parallel surfaces, 92
parallelogram identity, 128
parameterization invariant, 34
parameterized by arc-length, 8, 35, 36,
43, 88
parameterized proportionally to
arc-length, 35, 38, 55, 89
perimeter, 244
phase space, 98, 100
phase transition, 254
Picard-Lindelöf theorem, 155
Poincaré inequality, 177, 304
Poisson bracket, 102
polar coordinate, 49
polar coordinates, 48
Pontryagin function, 110, 111
Pontryagin maximum principle, 110-112
principal curvature, 263
projection theorem, 142
proper, 62
pseudo-gradient, 299, 300
quasiconvex, 219, 222
quasilinear partial differential equation,
198
Rabinowitz, 302, 306
Radon measure, 118, 241
range, 147
rectifiable, 35
reflexive, 134, 135, 137-139, 163, 174,
186
regularity, 11
regularity theory, 198, 286, 306, 316
regularizing term, 255
relative minimum, 62, 66
relatively compact, 167
relaxation, 208
relaxed function, 208, 214
relaxed functional, 209
Rellich, 175
Rellich-Kondrachev theorem, 305
reparameterization, 8
Riccati equation, 86, 108
Riemannian manifold, 43, 52, 53
Riemannian normal coordinates, 48
Riemannian polar coordinate, 49, 51, 60
Riesz representation theorem, 241
rotational invariance, 200
Sard's theorem, 250, 257
scalar product, 126
Schnirelman, 306
Schwarz inequality, 35, 127
second axiom of countability, 184
second variation, 18, 23
semigroup family, 299
semigroup property, 157, 294, 297, 300
separable, 135, 169, 173, 184, 186
shortest geodesic, 52, 53, 55
shortest length, 50
σ-algebra, 117
signed measure, 242
simple function, 119
smoothing kernel, 166
Sobolev Embedding Theorem, 175, 179,
303, 305
Sobolev inequalities, 179
Sobolev space, 171, 173
special point, 306-309, 312
special value, 306, 307
sphere, 39
star shaped, 316
state variable, 207
step function, 119
strictly normed, 157
strong convergence, 125, 174
submanifold, 24, 32, 43, 52, 53
summation convention, xv, 19
support, 166
surface of revolution, 60, 282
symmetry assumption, 308-310
symplectic geometry, 96
symplectomorphism, 97
Taylor expansion, 274
test functions, 166
theorem of B. Levi, 122
theorem of Clarkson, 164
theorem of de Giorgi and Nash, 198
theorem of E. Noether, 28
theorem of Fatou, 123
theorem of Fubini, 122
theorem of Helly, 132
theorem of Jacobi, 84, 93, 101
theorem of Kondrachev, 180
theorem of Lebesgue, 123
theorem of Liouville, 98
theorem of Lyusternik-Schnirelman, 67
theorem of Mazur, 142
theorem of Milman, 139
theorem of Modica-Mortola, 248
theorem of Morrey, 179
theorem of Picard-Lindelöf, 39, 155
theorem of Pohozaev, 316
theorem of Rellich, 175
theorem of Riesz, 141
theorem of Riesz-Fischer, 161
theorem of Sobolev, 179
theorem on dominated convergence, 123
theory of catastrophes, 279
Thom, 279
topological space, 185
translation invariance, 118
transversality condition, 110
triangle inequality, 125, 126, 159
uniform convergence, 126, 168
uniformly continuous, 168
uniformly convex, 127, 129, 139, 157,
164
unstable critical point, 291
variational problem, 9
volume preserving, 98
weak convergence, 135-137, 142, 174,
186, 214
weak* convergence, 135
weak* convergent, 135
weak derivative, 171, 172
weak limit, 138
weak solution, 306, 315
weak solution of the Jacobi equation,
285
weak topology, 291
weak* topology, 137
weakly convergent, 135, 136
weakly lower semicontinuous, 222
weakly proper, 185
Weierstraß, 46
Weierstrass approximation theorem, 170
Weierstraß condition, 112
Weyl's lemma, 199
Young's inequality, 160
Zorn's lemma, 131
