
M820

The Calculus of Variations and

Advanced Calculus

Course Notes

Prepared by

Prof D. Richards

(December 2009)

2.1

First published 2008.

Second edition 2009.

The Open University


Contents

1 Preliminary Analysis 7

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Notation and preliminary remarks . . . . . . . . . . . . . . . . . . . . . 10

1.2.1 The Order notation . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3 Functions of a real variable . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.2 Continuity and Limits . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.3 Monotonic functions and inverse functions . . . . . . . . . . . . . 17

1.3.4 The derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3.5 Mean Value Theorems . . . . . . . . . . . . . . . . . . . . . . . . 22

1.3.6 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.3.7 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

1.3.8 Taylor series for one variable . . . . . . . . . . . . . . . . . . . . 31

1.3.9 Taylor series for several variables . . . . . . . . . . . . . . . . . . 36

1.3.10 L’Hospital’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

1.3.11 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

2.2 General definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.3 First-order equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.3.1 Existence and uniqueness of solutions . . . . . . . . . . . . . . . 60

2.3.2 Separable and homogeneous equations . . . . . . . . . . . . . . . 62

2.3.3 Linear first-order equations . . . . . . . . . . . . . . . . . . . . . 63

2.3.4 Bernoulli’s equation . . . . . . . . . . . . . . . . . . . . . . . . . 65

2.3.5 Riccati’s equation . . . . . . . . . . . . . . . . . . . . . . . . . . 66

2.4 Second-order equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.4.2 General ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.4.3 The Wronskian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

2.4.4 Second-order, constant coefficient equations . . . . . . . . . . . . 76

2.4.5 Inhomogeneous equations . . . . . . . . . . . . . . . . . . . . . . 78

2.4.6 The Euler equation . . . . . . . . . . . . . . . . . . . . . . . . . . 80

2.5 An existence and uniqueness theorem . . . . . . . . . . . . . . . . . . . 81


2.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

2.7.1 Applications of differential equations . . . . . . . . . . . . . . . . 91

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

3.2 The shortest distance between two points in a plane . . . . . . . . . . . 93

3.2.1 The stationary distance . . . . . . . . . . . . . . . . . . . . . . . 94

3.2.2 The shortest path: local and global minima . . . . . . . . . . . . 96

3.2.3 Gravitational Lensing . . . . . . . . . . . . . . . . . . . . . . . . 98

3.3 Two generalisations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

3.3.1 Functionals depending only upon y'(x) . . . . . . . . . . . . . 99

3.3.2 Functionals depending upon x and y'(x) . . . . . . . . . . . . 101

3.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.5 Examples of functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.5.1 The brachistochrone . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.5.2 Minimal surface of revolution . . . . . . . . . . . . . . . . . . . . 106

3.5.3 The minimum resistance problem . . . . . . . . . . . . . . . . . . 106

3.5.4 A problem in navigation . . . . . . . . . . . . . . . . . . . . . . . 110

3.5.5 The isoperimetric problem . . . . . . . . . . . . . . . . . . . . . . 110

3.5.6 The catenary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

3.5.7 Fermat’s principle . . . . . . . . . . . . . . . . . . . . . . . . . . 112

3.5.8 Coordinate free formulation of Newton’s equations . . . . . . . . 114

3.6 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4.2 Preliminary remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

4.2.1 Relation to differential calculus . . . . . . . . . . . . . . . . . . . 122

4.2.2 Differentiation of a functional . . . . . . . . . . . . . . . . . . . . 123

4.3 The fundamental lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

4.4 The Euler-Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . 128

4.4.1 The first-integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

4.5 Theorems of Bernstein and du Bois-Reymond . . . . . . . . . . . . . . . 134

4.5.1 Bernstein’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . 135

4.6 Strong and Weak variations . . . . . . . . . . . . . . . . . . . . . . . . . 137

4.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

5.2 The brachistochrone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

5.2.1 The cycloid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

5.2.2 Formulation of the problem . . . . . . . . . . . . . . . . . . . . . 149

5.2.3 A solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

5.3 Minimal surface of revolution . . . . . . . . . . . . . . . . . . . . . . . . 154

5.3.1 Derivation of the functional . . . . . . . . . . . . . . . . . . . . . 155

5.3.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

5.3.3 The solution in a special case . . . . . . . . . . . . . . . . . . . . 157


5.4 Soap Films . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

5.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

6.2 Invariance of the Euler-Lagrange equation . . . . . . . . . . . . . . . . . 173

6.2.1 Changing the independent variable . . . . . . . . . . . . . . . . . 174

6.2.2 Changing both the dependent and independent variables . . . . . 176

6.3 Functionals with many dependent variables . . . . . . . . . . . . . . . . 181

6.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

6.3.2 Functionals with two dependent variables . . . . . . . . . . . . . 182

6.3.3 Functionals with many dependent variables . . . . . . . . . . . . 185

6.3.4 Changing dependent variables . . . . . . . . . . . . . . . . . . . . 186

6.4 The Inverse Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

6.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.2 Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.2.1 Invariance under translations . . . . . . . . . . . . . . . . . . . . 196

7.3 Noether’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

7.3.1 Proof of Noether’s theorem . . . . . . . . . . . . . . . . . . . . . 205

7.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

8.2 Stationary points of functions of several variables . . . . . . . . . . . . . 210

8.2.1 Functions of one variable . . . . . . . . . . . . . . . . . . . . . . 210

8.2.2 Functions of two variables . . . . . . . . . . . . . . . . . . . . . . 211

8.2.3 Functions of n variables . . . . . . . . . . . . . . . . . . . . . . . 212

8.3 The second variation of a functional . . . . . . . . . . . . . . . . . . . . 215

8.3.1 Short intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

8.3.2 Legendre’s necessary condition . . . . . . . . . . . . . . . . . . . 218

8.4 Analysis of the second variation . . . . . . . . . . . . . . . . . . . . . . . 220

8.4.1 Analysis of the second variation . . . . . . . . . . . . . . . . . . . 222

8.5 The Variational Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 226

8.6 The Brachistochrone problem . . . . . . . . . . . . . . . . . . . . . . . . 229

8.7 Surface of revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

8.8 Jacobi’s equation and quadratic forms . . . . . . . . . . . . . . . . . . . 232

8.9 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

9.1 Introduction: parametric equations . . . . . . . . . . . . . . . . . . . . . 239

9.1.1 Lengths and areas . . . . . . . . . . . . . . . . . . . . . . . . . . 241

9.2 The parametric variational problem . . . . . . . . . . . . . . . . . . . . 244

9.2.1 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

9.2.2 The Brachistochrone problem . . . . . . . . . . . . . . . . . . . . 250


9.3 The parametric and the conventional formulation . . . . . . . . . . . . . 251

9.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

10.2 Natural boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . 259

10.2.1 Natural boundary conditions for the loaded beam . . . . . . . . . 263

10.3 Variable end points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

10.4 Parametric functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

10.5 Weierstrass-Erdmann conditions . . . . . . . . . . . . . . . . . . . . . . 271

10.5.1 A taut wire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

10.5.2 The Weierstrass-Erdmann conditions . . . . . . . . . . . . . . . . 273

10.5.3 The parametric form of the corner conditions . . . . . . . . . . . 277

10.6 Newton’s minimum resistance problem . . . . . . . . . . . . . . . . . . . 277

10.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

11.2 The Lagrange multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

11.2.1 Three variables and one constraint . . . . . . . . . . . . . . . . . 291

11.2.2 Three variables and two constraints . . . . . . . . . . . . . . . . 293

11.2.3 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

11.3 The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

11.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

12.2 Conditional Stationary values of functionals . . . . . . . . . . . . . . . . 300

12.2.1 Functional constraints . . . . . . . . . . . . . . . . . . . . . . . . 300

12.2.2 The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . 304

12.2.3 The catenary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

12.3 Variable end points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

12.4 Broken extremals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

12.5 Parametric functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

12.6 The Lagrange problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

12.6.1 A single non-holonomic constraint . . . . . . . . . . . . . . . . . 317

12.6.2 An example with a single holonomic constraint . . . . . . . . . . 318

12.7 Brachistochrone in a resisting medium . . . . . . . . . . . . . . . . . . . 319

12.8 Brachistochrone with Coulomb friction . . . . . . . . . . . . . . . . . . . 329

12.9 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

13.2 The origin of Sturm-Liouville systems . . . . . . . . . . . . . . . . . . . 342

13.3 Eigenvalues and functions of simple systems . . . . . . . . . . . . . . . . 348

13.3.1 Bessel functions (optional) . . . . . . . . . . . . . . . . . . . . . . 353

13.4 Sturm-Liouville systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 357


13.4.2 Self-adjoint operators . . . . . . . . . . . . . . . . . . . . . . . . 363

13.4.3 The oscillation theorem (optional) . . . . . . . . . . . . . . . . . 365

13.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

14.2 Basic ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

14.3 Eigenvalues and eigenfunctions . . . . . . . . . . . . . . . . . . . . . . . 379

14.4 The Rayleigh-Ritz method . . . . . . . . . . . . . . . . . . . . . . . . . . 384

14.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388

15.1 Solutions for chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

15.2 Solutions for chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

15.3 Solutions for chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461

15.4 Solutions for chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471

15.5 Solutions for chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

15.6 Solutions for chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510

15.7 Solutions for chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523

15.8 Solutions for chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531

15.9 Solutions for chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544

15.10 Solutions for chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . 554

15.11 Solutions for chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 574

15.12 Solutions for chapter 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . 583

15.13 Solutions for chapter 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . 602

15.14 Solutions for chapter 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . 622


Chapter 1

Preliminary Analysis

1.1 Introduction

This course is about two related mathematical concepts which are of use in many areas

of applied mathematics, are of immense importance in formulating the laws of theoretical
physics and also produce important, interesting and some unsolved mathematical
problems. These are the functional and the variational principle: the theory of these
entities is named the Calculus of Variations.

A functional is a generalisation of a function of one or more real variables. A real

function of a single real variable maps an interval of the real line to real numbers: for

instance, the function (1 + x²)⁻¹ maps the whole real line to the interval (0, 1]; the

function ln x maps the positive real axis to the whole real line. Similarly a real function

of n real variables maps a domain of Rn into the real numbers.

A functional maps a given class of functions to real numbers. A simple example of

a functional is

S[y] = ∫₀¹ dx √(1 + y'(x)²),   y(0) = 0, y(1) = 1,   (1.1)

which associates a real number with any real function y(x) which satisfies the boundary

conditions and for which the integral exists. We use the square bracket notation¹ S[y]

to emphasise the fact that the functional depends upon the choice of function used to

evaluate the integral. In chapter 3 we shall see that a wide variety of problems can be

described in terms of functionals. Notice that the boundary conditions, y(0) = 0 and

y(1) = 1 in this example, are often part of the definition of the functional.
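To see concretely how a functional assigns a number to a function, the integral in equation (1.1) can be approximated numerically. The sketch below is our own illustration, not part of the course notes; the discretisation and the name `arc_length` are illustrative choices.

```python
import math

def arc_length(y, n=100000):
    """Midpoint-rule approximation to S[y], the integral of sqrt(1 + y'(x)^2)
    over [0, 1], with y'(x) estimated by a central difference."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h                       # midpoint of the i-th subinterval
        dy = (y(x + h / 2) - y(x - h / 2)) / h  # central-difference derivative
        total += math.sqrt(1.0 + dy * dy) * h
    return total

# Two functions satisfying the boundary conditions y(0) = 0, y(1) = 1:
print(arc_length(lambda x: x))      # straight line: sqrt(2) ≈ 1.414214
print(arc_length(lambda x: x * x))  # parabola: ≈ 1.478943, a longer curve
```

Different admissible functions give different values of S[y], which is exactly what it means for a functional to map a class of functions to real numbers.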

Real functions of n real variables can have various properties; for instance they

can be continuous, they may be differentiable or they may have stationary points and

local and global maxima and minima: functionals share many of these properties. In
the theory of functionals, stationary points are especially important, and this gives rise
to the idea of a variational principle, which arises when the solution to a problem is
given by the function making a particular functional stationary. Variational principles
are common and important in the natural sciences.

¹ In this course we use conventions common in applied mathematics and theoretical physics. A
function of a real variable x will usually be represented by symbols such as f(x) or just f, often
with no distinction made between the function and its value; it is often clearer to use context to
provide meaning, rather than precise definitions, which initially can hinder clarity. Similarly, we use
the older convention, S[y], for a functional, to emphasise that y is itself a function; this distinction
is not made in modern mathematics. For an introductory course we feel that the older convention,
used in most texts, is clearer and more helpful.

The simplest example of a variational principle is that of finding the shortest distance

between two points. Suppose the two points lie in a plane, with one point at the origin,

O, and the other at point A with coordinates (1, 1), then if y(x) represents a smooth

curve passing through O and A the distance between O and A, along this curve is given

by the functional defined in equation 1.1. The shortest path is that which minimises the

value of S[y]. If the surface is curved, for instance a sphere or ellipsoid, the equivalent

functional is more complicated, but the shortest path is that which minimises it.
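Returning to the plane, the claim that the straight line minimises S[y] can be checked on a one-parameter family of trial paths. In the sketch below (our own illustration, not from the notes) we take y(x) = x + ε sin(πx), every member of which passes through O and A; here y'(x) is known in closed form, so no numerical differentiation is needed.

```python
import math

def S(eps, n=20000):
    """Arc length of y(x) = x + eps*sin(pi*x) from (0, 0) to (1, 1),
    using y'(x) = 1 + eps*pi*cos(pi*x) and the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        dy = 1.0 + eps * math.pi * math.cos(math.pi * x)
        total += math.sqrt(1.0 + dy * dy) * h
    return total

for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(eps, S(eps))  # the smallest value occurs at eps = 0, the straight line
```

Of course, a minimum over this particular family does not prove a minimum over all smooth curves; that is what the machinery developed in later chapters provides.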

Variational principles are important for three principal reasons. First, many problems
are naturally formulated in terms of a functional and an associated variational

principle. Several of these will be described in chapter 3 and some solutions will be

obtained as the course develops.

Second, most equations of mathematical physics can be derived from variational

principles. This is important partly because it suggests a unifying theme in our description
of nature and partly because such formulations are independent of any particular

coordinate system, so making the essential mathematical structure of the equations

more transparent and easier to understand. This aspect of the subject is not considered
in this course; a good discussion of these problems can be found in Yourgrau and

Mandelstam (1968)².

Finally, variational principles provide powerful computational tools; we explore aspects
of this theory in chapter 13.

Consider the problem of finding the shortest path between two points on a curved

surface. The associated functional assigns a real number to each smooth curve joining

the points. A first step to solving this problem is to find the stationary values of the

functional; it is then necessary to decide which of these provides the shortest path. This

is very similar to the problem of finding extreme values of a function of n variables,

where we first determine the stationary points and then classify them: the important

and significant difference is that the space of allowed functions is not usually finite

in dimension. The infinite dimensional spaces of functions, with which we shall be
dealing, have many properties similar to those possessed by finite dimensional spaces,
and in many problems the difference is not significant. However, this generalisation

does introduce some practical and technical difficulties some of which are discussed in

section 4.6. In this chapter we review ordinary calculus in order to prepare for these
more general ideas.

In elementary calculus and analysis, the functions studied first are ‘real functions, f ,

of one real variable’, that is, functions with domain either R, or a subset of R, and

codomain R. Without any other restrictions on f , this definition is too general to be

useful in calculus and applied mathematics. Most functions of one real variable that

are of interest in applications have smooth graphs, although sometimes they may fail

to be smooth at one or more points where they have a ‘kink’ (fail to be differentiable),

or even a break (where they are discontinuous). This smooth behaviour is related to

² Yourgrau W and Mandelstam S, Variational Principles in Dynamics and Quantum Theory (Pitman).


the fact that most important functions of one variable describe physical phenomena

and often arise as solutions of ordinary differential equations. Therefore it is usual to

restrict attention to functions that are differentiable or, more usually, differentiable a

number of times.

The most useful generalisation of differentiability to functions defined on sets other

than R requires some care. It is not too hard in the case of functions of several (real)

variables but we shall have to generalise differentiation and integration to functionals,

not just to functions of several real variables.

Our presentation conceals very significant intellectual achievements made at the

end of the nineteenth century and during the first half of the twentieth century. During

the nineteenth century, although much work was done on particular equations, there

was little systematic theory. This changed when the idea of infinite dimensional vector

spaces began to emerge. Between 1900 and 1906, fundamental papers appeared by

Fredholm3 , Hilbert4 , and Fréchet5 . Fréchet’s thesis gave for the first time definitions of

limit and continuity that were applicable in very general sets. Previously, the concepts

had been restricted to special objects such as points, curves, surfaces or functions. By

introducing the concept of distance in more general sets he paved the way for rapid

advances in the theory of partial differential equations. These ideas together with the

theory of Lebesgue integration, introduced by Lebesgue in his doctoral thesis⁶ of 1902,

led to the modern theory of functional analysis. This is now the usual framework of

the theoretical study of partial differential equations. These ideas are required also for an

elucidation of some of the difficulties in the Calculus of Variations. However, in this

introductory course, we concentrate on basic techniques of solving practical problems,

because we think this is the best way to motivate and encourage further study.

This preliminary chapter, which is not assessed, is about real analysis and introduces

many of the ideas needed for our treatment of the Calculus of Variations. It is possible

that you are already familiar with the mathematics described in this chapter, in which

case you could start the course with chapter 2. You should ensure, however, that you

have a good working knowledge of differentiation, both ordinary and partial, Taylor

series of one and several variables and differentiation under the integral sign, all of

which are necessary for the development of the theory. In addition familiarity with the

theory of linear differential equations with both initial and boundary value problems is

assumed.

Very many exercises are set, in the belief that mathematical ideas cannot be understood
without attempting to solve problems at various levels of difficulty and that

one learns most by making one’s own mistakes, which is time consuming. You should

not attempt all these exercises at a first reading, but they provide practice in essential
mathematical techniques and in the use of a variety of ideas, so you should do as many

as time permits; thinking about a problem, then looking up the solution is usually of

³ I. Fredholm, On a new method for the solution of Dirichlet's problem, reprinted in Oeuvres
⁴ D. Hilbert published six papers between 1904 and 1906. They were republished as Grundzüge
einer allgemeinen Theorie der Integralgleichungen by Teubner (Leipzig and Berlin), 1924. The most
crucial paper is the fourth.
⁵ M. Fréchet, Doctoral thesis, Sur quelques points du Calcul fonctionnel, Rend. Circ. Mat. Palermo
22 (1906), pp 1–74.
⁶ H. Lebesgue, Doctoral thesis, Paris 1902, reprinted in Annali Mat. Pura e Appl., 7 (1902), pp
231–359.


little value until you have attempted your own solution. The exercises at the end of

this chapter are examples of the type of problem that commonly occur in applications:

they are provided for extra practice if time permits and it is not necessary for you to

attempt them.

1.2 Notation and preliminary remarks

We start with a discussion about notation and some of the basic ideas used throughout
this course.

A real function of a single real variable, f , is a rule that maps a real number x

to a single real number y. This operation can be denoted in a variety of ways. The

approach of scientists is to write y = f (x) or just y(x), and the symbols y, y(x), f

and f (x) are all used to represent the function. Mathematics uses the more formal

and precise notation f : X → Y , where X and Y are subsets of the real line: the set

X is named the domain, or the domain of definition of f , and set Y the codomain.

With this notation the symbol f denotes the function and the symbol f (x) the value

of the function at the point x. In applications this distinction is not always made and

both f and f (x) are used to denote the function. In recent years this has come to be

regarded as heresy by some: however, there are good practical reasons for using this

freer notation that do not affect pure mathematics. In this text we shall frequently use

the Leibniz notation, f (x), and its extensions, because it generally provides a clearer

picture and is helpful for algebraic manipulations, such as when changing variables and

integrating by parts.

Moreover, in the sciences the domain and codomain are frequently omitted, either

because they are ‘obvious’ or because they are not known. But, perversely, the scientist,

by writing y = f (x), often distinguishes between the two variables x and y, by saying

that x is the independent variable and that y is the dependent variable because it depends

upon x. This labelling can be confusing, because the role of variables can change, but

it is also helpful because in physical problems different variables can play quite different

roles: for instance, time is normally an independent variable.

In pure mathematics the term graph is used in a slightly specialised manner. A graph

is the set of points (x, f(x)): this is normally depicted as a line in a plane using rectangular
Cartesian coordinates. In other disciplines the whole figure is called the graph,

not the set of points, and the graph may be a less restricted shape than those defined

by functions; an example is shown in figure 1.5 (page 28).

Almost all the ideas associated with real functions of one variable generalise to

functions of several real variables, but notation needs to be developed to cope with this

extension. Points in Rn are represented by n-tuples of real numbers (x1 , x2 , . . . , xn ).

It is convenient to use bold faced symbols, x, a and so on, to denote these points,

so x = (x1 , x2 , . . . , xn ) and we shall write x and (x1 , x2 , . . . , xn ) interchangeably. In

hand-written text a bold character, x, is usually denoted by an underline, x.

A function f (x1 , x2 , . . . , xn ) of n real variables, defined on Rn , is a map from Rn , or a

subset, to R, written as f : Rn → R. Where we use bold face symbols like f or φ to refer

to functions, it means that the image under the function f (x) or φ(y) may be considered

as a vector in Rm with m ≥ 2, so f : Rn → Rm; in this course normally m = 1 or m = n.

Although the case m = 1 will not be excluded when we use a bold face symbol, we shall

continue to write f and φ where the functions are known to be real valued and not vector


valued. We shall also write without further comment f (x) = (f1 (x), f2 (x), . . . , fm (x)),

so that the fi are the m component functions, fi : Rn → R, of f .

On the real line the distance between two points x and y is naturally defined by

|x − y|. A point x is in the open interval (a, b) if a < x < b, and is in the closed interval

[a, b] if a ≤ x ≤ b. By convention, the intervals (−∞, a), (b, ∞) and (−∞, ∞) = R are

also open intervals. Here, (−∞, a) means the set of all real numbers strictly less than

a. The symbol ∞ for ‘infinity’ is not a number, and its use here is conventional. In

the language and notation of set theory, we can write (−∞, a) = {x ∈ R : x < a}, with

similar definitions for the other two types of open interval. One reason for considering

open sets is that the natural domain of definition of some important functions is an

open set. For example, the function ln x as a function of one real variable is defined for

x ∈ (0, ∞).

The space of points Rn is an example of a linear space. Here the term linear has

the normal meaning that for every x, y in Rn , and for every real α, x + y and αx are

in Rn. Explicitly,

(x1, x2, . . . , xn) + (y1, y2, . . . , yn) = (x1 + y1, x2 + y2, . . . , xn + yn)

and

α(x1, x2, . . . , xn) = (αx1, αx2, . . . , αxn).

Functions f : Rn → Rm may also be added and multiplied by real numbers. Therefore

a function of this type may be regarded as a vector in the vector space of functions —

though this space is not finite dimensional like Rn .
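The vector-space operations on functions are defined pointwise: (f + g)(x) = f(x) + g(x) and (αf)(x) = αf(x). A minimal sketch (the helper names `add` and `scale` are ours, for illustration only):

```python
import math

def add(f, g):
    """Pointwise sum of two real functions: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """Pointwise scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

# h(x) = sin(x) + 2*cos(x) is again a function from R to R:
h = add(math.sin, scale(2.0, math.cos))
print(h(0.0))  # 2.0
```

Linear combinations of functions are again functions, which is all that is meant by calling the space of functions a vector space.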

In the space Rn the distance |x| of a point x from the origin is defined by the natural
generalisation of Pythagoras' theorem, |x| = √(x1² + x2² + · · · + xn²). The distance
between two vectors x and y is then defined by

|x − y| = √((x1 − y1)² + (x2 − y2)² + · · · + (xn − yn)²).   (1.2)

This is a direct generalisation of the distance along a line, to which it collapses when

n = 1.

This distance has the three basic properties

(a) |x − y| ≥ 0, with |x − y| = 0 if and only if x = y,

(b) |x − y| = |y − x|, (1.3)

(c) |x − y| + |y − z| ≥ |x − z|, (Triangle inequality).
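These properties can be sanity-checked numerically. The helper below is our own direct transcription of equation (1.2), and the loop verifies the three properties on a sample of random vectors in R4.

```python
import math
import random

def dist(x, y):
    """Euclidean distance between two points of R^n, equation (1.2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

random.seed(1)
for _ in range(1000):
    x, y, z = ([random.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(3))
    assert dist(x, y) >= 0.0                              # non-negativity
    assert dist(x, y) == dist(y, x)                       # symmetry
    assert dist(x, y) + dist(y, z) >= dist(x, z) - 1e-12  # triangle inequality

print("all three properties hold on this sample")
```

The small tolerance in the triangle inequality allows for floating-point rounding; the mathematical statement is exact.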

In the more abstract spaces, such as the function spaces we need later, a similar concept

of a distance between elements is needed. This is named the norm; it maps pairs of
elements of the space to the non-negative real numbers and satisfies the above
three rules. In function spaces there is no natural choice of the distance function and

we shall see in chapter 4 that this flexibility can be important.

For functions of several variables, that is, for functions defined on sets of points in

Rn , the direct generalization of open interval is an open ball.

Definition 1.1

The open ball Br(a) of radius r and centre a ∈ Rn is the set of points {x ∈ Rn : |x − a| < r}.


Thus the ball of radius 1 and centre (0, 0) in R2 is the interior of the unit circle, not

including the points on the circle itself. And in R, the ‘ball’ of radius 1 and centre 0

is the open interval (−1, 1). However, for R2 and for Rn with n > 2, open balls are not

quite general enough. For example, the open square

{(x, y) ∈ R2 : |x| < 1, |y| < 1}

is not a ball, but in many ways is similar. (You may know for example that it may be

mapped continuously to an open ball.) It turns out that the most convenient concept

is that of open set7, which we can now define.

Definition 1.2

Open sets. A set U in Rn is said to be open if for every x ∈ U there is an open ball

Br (a) wholly contained within U which contains x.

In other words, every point in an open set lies in an open ball contained in the set.

Any open ball is in many ways like the whole of the space Rn — it has no isolated or

missing points. Also, every open set is a union of open balls (obviously). Open sets

are very convenient and important in the theory of functions, but we cannot study the

reasons here. A full treatment of open sets can be found in books on topology8 . Open

balls are not the only type of open sets and it is not hard to show that the open square,

{(x, y) ∈ R2 : |x| < 1, |y| < 1}, is in fact an open set, according to the definition we gave;

and in a similar way it can be shown that the set {(x, y) ∈ R2 : (x/a)² + (y/b)² < 1},

which is the interior of an ellipse, is an open set.

Exercise 1.1

Show that the open square is an open set by constructing explicitly for each (x, y)

in the open square {(x, y) ∈ R2 : |x| < 1, |y| < 1} a ball containing (x, y) and

lying in the square.

It is often useful to have a bound for the magnitude of a function that does not require exact calculation. For example, the function f(x) = √(sin(x² cosh x) − x² cos x) tends to zero at a similar rate to x² as x → 0 and this information is sometimes more helpful than the detailed knowledge of the function. The order notation is designed for this purpose.

Definition 1.3

Order notation. A function f(x) is said to be of order xⁿ as x → 0 if there is a non-zero constant C such that |f(x)| < C|xⁿ| for all x in an interval around x = 0. This is written as

f(x) = O(xⁿ) as x → 0. (1.4)

The conditional clause ‘as x → 0’ is often omitted when it is clear from the context.

More generally, this order notation can be used to compare the size of functions, f (x)

7 As with many other concepts in analysis, formulating clearly the concepts, in this case an open

8 See for example W A Sutherland, Introduction to Metric and Topological Spaces, Oxford University

Press.


and g(x): we say that f (x) is of the order of g(x) as x → y if there is a non-zero

constant C such that |f (x)| < C|g(x)| for all x in an interval around y; more succinctly,

f (x) = O(g(x)) as x → y.

When used in the form f (x) = O(g(x)) as x → ∞, this notation means that

|f (x)| < C|g(x)| for all x > X, where X and C are positive numbers independent

of x.

This notation is particularly useful when truncating power series: thus, the series for sin x up to O(x³) is written

sin x = x − x³/3! + O(x⁵),

meaning that the remainder is smaller than C|x|⁵, as x → 0, for some C. Note that in this course the phrase “up to O(x³)” means that the x³ term is included. The following exercises provide practice in using the O-notation and exercise 1.2 proves an important result.
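The truncated series for sin x can be probed numerically: if the remainder really is O(x⁵), the ratio |sin x − (x − x³/3!)|/|x|⁵ stays bounded as x → 0 (in fact it approaches 1/5! = 1/120). An illustrative Python sketch, not part of the course text:

```python
import math

def remainder_ratio(x):
    """|sin x - (x - x^3/6)| / |x|^5: stays bounded as x -> 0 if the remainder is O(x^5)."""
    return abs(math.sin(x) - (x - x**3 / 6)) / abs(x) ** 5

for x in [0.5, 0.1, 0.02]:
    assert remainder_ratio(x) < 0.01  # bounded by a modest constant C
print(remainder_ratio(0.01))  # close to 1/120, about 0.00833
```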

Exercise 1.2

Show that if f(x) = O(x²) as x → 0 then also f(x) = O(x).

Exercise 1.3

Use the binomial expansion to find the order of the following expressions as x → 0.

(a) x√(1 + x²), (b) x/(1 + x), (c) x^(3/2)/(1 − e⁻ˣ).

Exercise 1.4

Use the binomial expansion to find the order of the following expressions as x → ∞.

(a) x/(x − 1), (b) √(4x² + x) − 2x, (c) (x + b)ᵃ − xᵃ, a > 0.

The order notation extends to functions of several variables by using the distance |x|. Thus, we say that f(x) = O(|x|ⁿ) if there is a non-zero constant C and a small number δ such that |f(x)| < C|x|ⁿ for |x| < δ.

Exercise 1.5

(a) If f1 = x and f2 = y show that f1 = O(f) and f2 = O(f) where f(x, y) = (x² + y²)^(1/2).
(b) Show that the polynomial φ(x, y) = ax² + bxy + cy² vanishes to at least the same order as the polynomial f(x, y) = x² + y² at (0, 0). What conditions are needed for φ to vanish faster than f as √(x² + y²) → 0?

A related notation is

f(x) = o(|x|), which is shorthand for lim_{|x|→0} f(x)/|x| = 0.

Informally this means that f(x) vanishes faster than |x| as |x| → 0. More generally f = o(g) if lim_{|x|→0} |f(x)/g(x)| = 0, meaning that f(x) vanishes faster than g(x) as |x| → 0.

1.3 Functions of a real variable

1.3.1 Introduction

In this section we introduce important ideas pertaining to real functions of a single real

variable, although some mention is made of functions of many variables. Most of the

ideas discussed should be familiar from earlier courses in elementary real analysis or

Calculus, so our discussion is brief and all exercises are optional.

The study of Real Analysis normally starts with a discussion of the real number

system and its properties. Here we assume all necessary properties of this number

system and refer the reader to any basic text if further details are required: adequate

discussion may be found in the early chapters of the texts by Whittaker and Watson 9 ,

Rudin10 and by Kolmogorov and Fomin11 .

1.3.2 Continuity and Limits

A continuous function is one whose graph has no vertical breaks: otherwise, it is discontinuous. The function f1(x), depicted by the solid line in figure 1.1, is continuous for x1 < x < x2. The function f2(x), depicted by the dashed line, is discontinuous at x = c.

Figure 1.1 Figure showing examples of a continuous function, f1(x), and a discontinuous function f2(x).

A function f(x) is continuous at a point x = a if f(a) exists and if, given any arbitrarily small positive number ε, we can find a neighbourhood of x = a such that in it |f(x) − f(a)| < ε. We can express this in terms of limits: since a point a on the real line can be approached only from the left or the right, a function is continuous at a point x = a if it approaches the same value, independent of the direction. Formally we have

Definition 1.4
Continuity: a function, f, is continuous at x = a if f(a) is defined and

lim_{x→a} f(x) = f(a).

For a function of one variable, this is equivalent to saying that f(x) is continuous at x = a if f(a) is defined and the left and right-hand limits

lim_{x→a−} f(x) and lim_{x→a+} f(x)

both exist and are equal to f(a).

9 A Course of Modern Analysis by E T Whittaker and G N Watson, Cambridge University Press.
10 Principles of Mathematical Analysis by W Rudin (McGraw-Hill).
11 Introductory Real Analysis by A N Kolmogorov and S V Fomin (Dover).

If the left and right-hand limits exist but are not equal the function is discontinuous

at x = a and is said to have a simple discontinuity at x = a.

If they both exist and are equal, but do not equal f (a), then the function is said to

have a removable discontinuity at x = a.

Quite elementary functions exist for which neither limit exists: these are also dis-

continuous, and said to have a discontinuity of the second kind at x = a, see Rudin

(1976, page 94). An example of a function with such a discontinuity at x = 0 is

f(x) = sin(1/x) for x ≠ 0, with f(0) = 0.

We shall have no need to consider this type of discontinuity in this course, but simple

discontinuities will arise.

A function that behaves as

|f(x + ε) − f(x)| = O(ε) as ε → 0

is continuous, though the converse is not true, a counter example being f(x) = √|x| at x = 0.

Most functions that occur in the sciences are either continuous or piecewise continu-

ous, which means that the function is continuous except at a discrete set of points. The

Heaviside function and the related sgn functions are examples of commonly occurring

piecewise continuous functions that are discontinuous. They are defined by

H(x) = 1 for x > 0, H(x) = 0 for x < 0, and sgn(x) = 1 for x > 0, sgn(x) = −1 for x < 0, so that sgn(x) = −1 + 2H(x). (1.5)

These functions are discontinuous at x = 0, where they are not normally defined. In some texts these functions are defined at x = 0; for instance H(0) may be defined to have the value 0, 1/2 or 1.

If lim_{x→c} f(x) = A and lim_{x→c} g(x) = B, then it can be shown that the following (obvious) rules are adhered to:

(a) lim_{x→c} (αf(x) + βg(x)) = αA + βB;
(b) lim_{x→c} (f(x)g(x)) = AB;
(c) lim_{x→c} f(x)/g(x) = A/B, if B ≠ 0;
(d) if lim_{x→B} f(x) = fB then lim_{x→c} f(g(x)) = fB.

The value of a limit is normally found by a combination of suitable re-arrangements and expansions. An example of an expansion is

lim_{x→0} sinh(ax)/x = lim_{x→0} (ax + (ax)³/3! + O(x⁵))/x = lim_{x→0} (a + O(x²)) = a.

An example of a re-arrangement is

lim_{x→0} sinh(ax)/sinh(bx) = lim_{x→0} (sinh(ax)/x)(x/sinh(bx)) = lim_{x→0} sinh(ax)/x · lim_{x→0} x/sinh(bx) = a/b, (b ≠ 0).
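The re-arrangement result can be confirmed numerically; a quick illustrative sketch (the values a = 3, b = 2 are arbitrary choices):

```python
import math

a, b = 3.0, 2.0

def ratio(x):
    """sinh(ax)/sinh(bx), which tends to a/b as x -> 0."""
    return math.sinh(a * x) / math.sinh(b * x)

for x in [0.1, 0.01, 0.001]:
    print(x, ratio(x))          # values approach a/b = 1.5
assert abs(ratio(1e-6) - a / b) < 1e-9
```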

A function that is continuous on a closed interval is bounded above and below and attains its bounds. It is important that the interval is closed; for instance the function f(x) = x defined in the open interval 0 < x < 1 is bounded above and below, but does not attain its bounds. This example may seem trivial, but similar difficulties exist in the Calculus of Variations and are less easy to recognise.

Exercise 1.6
A function that is finite and continuous for all x is defined by

f(x) = A/(2 + x) + B, 0 ≤ x ≤ a, a > 0,
f(x) = (C + Dx)/x², a ≤ x,

where A, B, C, D and a are real numbers: if f(0) = 1 and lim_{x→∞} f(x) = 0, find these numbers.

Exercise 1.7
Find the limits of the following functions, (a)–(d) as x → 0 and (e) as w → ∞.

(a) sin(ax)/x, (b) tan(ax)/x, (c) sin(ax)/sin(bx), (d) (3x + 4)/(4x + 2), (e) (1 + z/w)ʷ.

For functions of two or more variables, the definition of continuity is essentially the same as for a function of one variable. A function f(x) is continuous at x = a if f(a) is defined and

lim_{x→a} f(x) = f(a). (1.6)

Alternatively, given any ε > 0 there is a δ > 0 such that whenever |x − a| < δ, |f(x) − f(a)| < ε.

It should be noted that if f(x, y) is continuous in each variable, it is not necessarily continuous in both variables. For instance, consider the function

f(x, y) = (x + y)²/(x² + y²) for x² + y² ≠ 0, with f(0, 0) = 1.

On any line y = β, with β ≠ 0,

f(x, β) = (x + β)²/(x² + β²) = 1 + O(x) as x → 0,

and f(x, 0) = 1 for all x: for any β this function is a continuous function of x. On the line x + y = 0, however, f = 0 except at the origin so f(x, y) is not continuous along this line. More generally, by putting x = r cos θ and y = r sin θ, −π < θ ≤ π, r ≠ 0, we can approach the origin from any angle. In this representation f = 2 sin²(θ + π/4), so on any circle round the origin f takes any value between 0 and 2. Therefore f(x, y) is not a continuous function of both x and y.
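The direction-dependence can be sampled numerically. The illustrative sketch below evaluates f on a shrinking circle and checks that the value agrees with 2 sin²(θ + π/4) whatever the radius, so no single limiting value exists at the origin:

```python
import math

def f(x, y):
    """f(x, y) = (x + y)^2 / (x^2 + y^2), away from the origin."""
    return (x + y) ** 2 / (x ** 2 + y ** 2)

r = 1e-8  # approach the origin along rays of fixed angle theta
for theta in [0.0, math.pi / 4, -math.pi / 4, math.pi / 2]:
    x, y = r * math.cos(theta), r * math.sin(theta)
    expected = 2 * math.sin(theta + math.pi / 4) ** 2  # depends on theta, not on r
    assert abs(f(x, y) - expected) < 1e-9
    print(theta, f(x, y))
```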

Exercise 1.8
Determine whether or not the following functions are continuous at the origin.

(a) f = 2xy/(x² + y²), (b) f = (x² + y²)/(x² − y²), (c) f = 2x²y/(x² + y²).

Hint: use polar coordinates x = r cos θ, y = r sin θ and consider the limit r → 0.

1.3.3 Monotonic functions and inverse functions

A function is said to be monotonic on an interval if it is always increasing or always

decreasing. Simple examples are f (x) = x and f (x) = exp(−x) which are mono-

tonic increasing and monotonic decreasing, respectively, on the whole line: the function

f (x) = sin x is monotonic increasing for −π/2 < x < π/2. More precisely, we have,

Definition 1.5

Monotonic functions: A function f (x) is monotonic increasing for a < x < b if

f (x1 ) ≤ f (x2 ) for a < x1 < x2 < b.

A monotonic decreasing function is defined in a similar way.

If f(x1) < f(x2) for a < x1 < x2 < b then f(x) is said to be strictly monotonic (increasing), or strictly increasing; strictly decreasing functions are defined in the obvious manner.

Intervals on which a function is strictly monotonic are sometimes important because on these intervals the inverse function exists. For instance

the function y = ex is monotonic increasing on the whole real line, R, and its inverse is

the well known natural logarithm, x = ln y, with y on the positive real line.

In general if f (x) is continuous and strictly monotonic on a ≤ x ≤ b and y = f (x)

the inverse function, x = f −1 (y), is continuous for f (a) ≤ y ≤ f (b) and satisfies

y = f (f −1 (y)). Moreover, if f (x) is strictly increasing so is f −1 (y).

Complications occur when a function is increasing and decreasing on neighbouring

intervals, for then the inverse may have two or more values. For example the function

f(x) = x² is monotonic increasing for x > 0 and monotonic decreasing for x < 0: hence the relation y = x² has the two familiar inverses x = ±√y, y ≥ 0. These two inverses are often referred to as the different branches of the inverse; this idea is important because most functions are monotonic only on part of their domain of definition.

Exercise 1.9
(a) Show that y = 3a²x − x³ is strictly increasing for −a < x < a and that on this interval y increases from −2a³ to 2a³.
(b) By putting x = 2a sin φ and using the identity sin³φ = (3 sin φ − sin 3φ)/4, show that the equation becomes

y = 2a³ sin 3φ and hence that x(y) = 2a sin((1/3) sin⁻¹(y/(2a³))).

(c) Find the inverse for x > 2a. Hint: put x = 2a cosh φ and use the relation cosh³φ = (cosh 3φ + 3 cosh φ)/4.
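The formula in part (b) can be checked numerically: composing it with y = 3a²x − x³ should return x for any x in (−a, a). An illustrative sketch (a = 1 is an arbitrary choice):

```python
import math

a = 1.0

def y_of_x(x):
    return 3 * a**2 * x - x**3

def x_of_y(y):
    """Inverse on -a < x < a from part (b): x = 2a sin((1/3) arcsin(y / 2a^3))."""
    return 2 * a * math.sin(math.asin(y / (2 * a**3)) / 3)

for x in [-0.9, -0.3, 0.0, 0.5, 0.8]:
    assert abs(x_of_y(y_of_x(x)) - x) < 1e-12
print("x(y) inverts y(x) on the interval (-a, a)")
```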

1.3.4 The derivative

The notion of the derivative of a continuous function, f(x), is closely related to the

geometric idea of the tangent to a curve and to the related concept of the rate of

change of a function, so is important in the discussion of anything that changes. This

geometric idea is illustrated in figure 1.2: here P is a point with coordinates (a, f (a))

on the graph and Q is another point on the graph with coordinates (a + h, f (a + h)),

where h may be positive or negative.

Figure 1.2 Illustration showing the chord PQ and the tangent line at P.

The gradient of the chord PQ is tan φ, where φ is the angle between PQ and the x-axis, and is given by the formula

tan φ = (f(a + h) − f(a))/h.

If the graph in the vicinity of x = a is represented by a smooth line, then it is intuitively

obvious that the chord P Q becomes closer to the tangent at P as h → 0; and in the

limit h = 0 the chord becomes the tangent. Hence the gradient of the tangent is given by the limit

lim_{h→0} (f(a + h) − f(a))/h.

This limit, provided it exists, is named the derivative of f(x) at x = a and is commonly denoted either by f′(a) or df/dx. Thus we have the formal definition:

Definition 1.6

The derivative: A function f(x), defined on an open interval U of the real line, is differentiable for x ∈ U and has the derivative f′(x) if

f′(x) = df/dx = lim_{h→0} (f(x + h) − f(x))/h, (1.7)

exists.

If the derivative exists at every point in the open interval U the function f (x) is said

to be differentiable in U : in this case it may be proved that f (x) is also continuous.

However, a function that is continuous at a need not be differentiable at a: indeed,

it is possible to construct functions that are continuous everywhere but differentiable

nowhere; such functions are encountered in the mathematical description of Brownian

motion.

Combining the definition of f′(x) and the definition 1.3 of the order notation shows that a differentiable function satisfies

f(x + h) = f(x) + hf′(x) + o(h) as h → 0. (1.8)

The formal definition, equation 1.7, of the derivative can be used to derive all its useful properties, but the physical interpretation, illustrated in figure 1.2, provides a more useful way to generalise it to functions of several variables.


The tangent line to the graph y = f(x) at the point a, which we shall consider to be fixed for the moment, has slope f′(a) and passes through the point (a, f(a)). These two facts determine the tangent line completely. The equation of the tangent line can be written

in parametric form as p(h) = f(a) + f′(a)h. Conversely, given a point a, and the equation of the tangent line at that point, the derivative, in the classical sense of the definition 1.6, is simply the slope, f′(a), of this line. So the information that the derivative of f at a is f′(a) is equivalent to the information that the tangent line at a has equation p(h) = f(a) + f′(a)h. Although the classical derivative, equation 1.7,

is usually taken to be the fundamental concept, the equivalent concept of the tangent

line at a point could be considered equally fundamental, perhaps more so, since a

tangent is a more intuitive idea than the numerical value of its slope. This is the key

to successfully defining the derivative of functions of more than one variable.

From the definition 1.6 the following useful results follow. If f (x) and g(x) are

differentiable on the same open interval and α and β are constants then

(a) d/dx (αf(x) + βg(x)) = αf′(x) + βg′(x),
(b) d/dx (f(x)g(x)) = f′(x)g(x) + f(x)g′(x), (The product rule)
(c) d/dx (f(x)/g(x)) = (f′(x)g(x) − f(x)g′(x))/g(x)², g(x) ≠ 0. (The quotient rule)

We leave the proof of these results to the reader, but note that the derivative of 1/g(x) follows almost trivially from the definition 1.6, exercise 1.14, so that the third expression

is a simple consequence of the second.

The other important result is the chain rule concerning the derivative of composite

functions. Suppose that f (x) and g(x) are two differentiable functions and a third is

formed by the composition,

F (x) = f (g(x)), sometimes written as F = f ◦ g,

which we assume to exist. Then the derivative of F (x) can be shown, as in exercise 1.18,

to be given by

dF/dx = (df/dg) × (dg/dx), or F′(x) = f′(g)g′(x). (1.9)

This formula is named the chain rule. Note how the prime-notation is used: it denotes

the derivative of the function with respect to the argument shown, not necessarily the

original independent variable, x. Thus f′(g) or f′(g(x)) does not mean the derivative of F(x); it means the derivative f′(x) with x replaced by g or g(x).

A simple example should make this clear: suppose f (x) = sin x and g(x) = 1/x,

x > 0, so F (x) = sin(1/x). The chain rule gives

dF/dx = (d/dg)(sin g) × (d/dx)(1/x) = cos g × (−1/x²) = −(1/x²) cos(1/x).

The derivatives of simple functions, polynomials and trigonometric functions for instance,

can be deduced from first principles using the definition 1.6: the three rules, given above,

and the chain rule can then be used to find the derivative of any function described with

finite combinations of these simple functions. A few exercises will make this process

clear.

Exercise 1.10
Find the derivative of the following functions

(a) √((a − x)(b + x)), (b) √(a sin²x + b cos²x), (c) cos(x³) cos x, (d) xˣ.

Exercise 1.11
If y = sin x for −π/2 ≤ x ≤ π/2 show that dx/dy = 1/√(1 − y²).

Exercise 1.12
(a) If y = f(x) has the inverse x = g(y), show that f′(x)g′(y) = 1, that is

dx/dy = (dy/dx)⁻¹.

(b) Express d²x/dy² in terms of dy/dx and d²y/dx².

If f′(x) is itself differentiable its derivative is the second derivative of f(x), which is denoted by

f″(x) or d²f/dx².

This process can be continued to obtain the functions

f, df/dx, d²f/dx², d³f/dx³, · · · , dⁿ⁻¹f/dxⁿ⁻¹, dⁿf/dxⁿ, · · · ,

where each member of the sequence is the derivative of the preceding member,

dᵖf/dxᵖ = d/dx (dᵖ⁻¹f/dxᵖ⁻¹), p = 2, 3, · · · .

The prime notation becomes rather clumsy after the second or third derivative, so the most common alternative is

dᵖf/dxᵖ = f⁽ᵖ⁾(x), p ≥ 2,

with the conventions f⁽¹⁾(x) = f′(x) and f⁽⁰⁾(x) = f(x). Care is needed to distinguish between the pth derivative, f⁽ᵖ⁾(x), and the pth power, denoted by f(x)ᵖ and sometimes fᵖ(x) — the latter notation should be avoided if there is any danger of confusion.

Functions for which the nth derivative is continuous are said to be n-differentiable and to belong to class Cⁿ: the notation Cⁿ(U) means the first n derivatives are continuous on the interval U; the notation Cⁿ(a, b) or Cⁿ[a, b], with obvious meaning, may also be used. The term smooth function describes functions belonging to C∞, that is functions, such as sin x, having all derivatives; we shall, however, use the term sufficiently

tions, such as sin x, having all derivatives; we shall, however, use the term sufficiently

smooth for functions that are sufficiently differentiable for all subsequent analysis to

work, when more detail is deemed unimportant.

In the following exercises some important, but standard, results are derived.


Exercise 1.13

If f (x) is an even (odd) function, show that f 0 (x) is an odd (even) function.

Exercise 1.14
Show, from first principles using the limit 1.7, that

d/dx (1/f(x)) = −f′(x)/f(x)²,

and that the product rule is true.

Exercise 1.15
Leibniz's rule
If h(x) = f(x)g(x) show that

h⁽³⁾(x) = f⁽³⁾(x)g(x) + 3f″(x)g′(x) + 3f′(x)g″(x) + f(x)g⁽³⁾(x),

and that, in general,

h⁽ⁿ⁾(x) = Σₖ₌₀ⁿ C(n, k) f⁽ⁿ⁻ᵏ⁾(x) g⁽ᵏ⁾(x),

where the binomial coefficients are given by C(n, k) = n!/(k! (n − k)!).
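Leibniz's rule is easy to test on a product whose derivatives of every order are known; the sketch below (illustrative; the choice f(x) = x³, g(x) = eˣ is arbitrary) compares the Leibniz sum for h‴ with the directly computed third derivative (x³ + 9x² + 18x + 6)eˣ:

```python
import math

def f_deriv(k, x):
    """k-th derivative of f(x) = x^3."""
    return (x**3, 3 * x**2, 6 * x, 6.0)[k] if k <= 3 else 0.0

def g_deriv(k, x):
    """k-th derivative of g(x) = e^x (all equal to e^x)."""
    return math.exp(x)

def h_deriv_leibniz(n, x):
    """n-th derivative of h = f*g via Leibniz's rule."""
    return sum(math.comb(n, k) * f_deriv(n - k, x) * g_deriv(k, x)
               for k in range(n + 1))

x = 1.3
direct = (x**3 + 9 * x**2 + 18 * x + 6) * math.exp(x)  # h'''(x) computed by hand
assert abs(h_deriv_leibniz(3, x) - direct) < 1e-9
print(h_deriv_leibniz(3, x))
```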

Exercise 1.16

d f 0 (x)

Show that ln(f (x)) = and hence that if

dx f (x)

p0 f0 f0 f0

p(x) = f1 (x)f2 (x) · · · fn (x) then = 1 + 2 + ··· + n,

p f1 f2 fn

provided p(x) 6= 0. Note that this gives an easier method of differentiating prod-

ucts of three or more factors than repeated use of the product rule.

Exercise 1.17
If the elements of a determinant D(x) are differentiable functions of x,

D(x) = | f(x)  g(x) |
       | φ(x)  ψ(x) |

show that

D′(x) = | f′(x)  g′(x) |  +  | f(x)   g(x)  |
        | φ(x)   ψ(x)  |     | φ′(x)  ψ′(x) |

Extend this result to third-order determinants.

1.3.5 Mean Value Theorems

If a function f(x) is sufficiently smooth for all points inside the interval a < x < b,

its graph is a smooth curve12 starting at the point A = (a, f (a)) and ending at B =

(b, f (b)), as shown in figure 1.3.

Figure 1.3 Diagram illustrating Cauchy's form of the mean value theorem.

From this figure it seems plausible that the tangent to the curve must be parallel to

the chord AB at least once. That is

f′(x) = (f(b) − f(a))/(b − a) for some x in the interval a < x < b. (1.10)

An equivalent statement is

f(x + h) = f(x) + hf′(x + θh), (1.11)

where θ is a number in the interval 0 < θ < 1, and is normally unknown. This relation is used frequently throughout the course. Note that equation 1.11 shows that between zeros of a differentiable function there is at least one point at which the derivative is zero.

Equation 1.10 can be proved and is enshrined in the following theorem

Theorem 1.1

The Mean Value Theorem (Cauchy’s form). If f (x) and g(x) are real and differen-

tiable for a ≤ x ≤ b, then there is a point u inside the interval at which

(f(b) − f(a)) g′(u) = (g(b) − g(a)) f′(u), a < u < b. (1.12)
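The point u whose existence the theorem asserts can be located numerically in simple cases. An illustrative sketch with the arbitrary choices f(x) = x³, g(x) = x on [0, 1], for which equation 1.12 reduces to 3u² = 1, so u = 1/√3:

```python
def bisect(fn, lo, hi, tol=1e-12):
    """Find a root of fn in [lo, hi] by bisection, assuming fn changes sign there."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(lo) * fn(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a, b = 0.0, 1.0
f = lambda x: x**3
g = lambda x: x
df = lambda x: 3 * x**2
dg = lambda x: 1.0

# Equation 1.12 rearranged: (f(b)-f(a)) g'(u) - (g(b)-g(a)) f'(u) = 0.
phi = lambda u: (f(b) - f(a)) * dg(u) - (g(b) - g(a)) * df(u)
u = bisect(phi, a, b)
assert abs(u - 3 ** -0.5) < 1e-9
print(u)  # about 0.57735
```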

A similar idea may be applied to integrals. In figure 1.4 is shown a typical continuous

function, f (x), which attains its smallest and largest values, S and L respectively, on

the interval a ≤ x ≤ b.

12 A smooth curve is one along which its tangent changes direction continuously, without abrupt

changes.

Figure 1.4 Diagram showing the upper and lower bounds of f(x) used to bound the integral.

It is clear that the area under the curve is greater than (b − a)S and less than (b − a)L, that is

(b − a)S ≤ ∫ₐᵇ dx f(x) ≤ (b − a)L.

Since f(x) is continuous it takes every value between S and L, so

∫ₐᵇ dx f(x) = (b − a)f(ξ) for some ξ ∈ [a, b]. (1.13)

Theorem 1.2

The Mean Value theorem (integral form). If, on the closed interval a ≤ x ≤ b, f(x) is continuous and φ(x) ≥ 0, then there is a ξ satisfying a ≤ ξ ≤ b such that

∫ₐᵇ dx f(x)φ(x) = f(ξ) ∫ₐᵇ dx φ(x). (1.14)

Exercise 1.18
The chain rule
In this exercise the Mean Value Theorem is used to derive the chain rule, equation 1.9, for the derivative of F(x) = f(g(x)).
Use the mean value theorem to show that

F(x + h) − F(x) = f(g(x) + hg′(x + hθ)) − f(g(x)), 0 < θ < 1,

and that

f(g(x) + hg′(x + hθ)) = f(g(x)) + hg′(x + hθ) f′(g + hφg′), 0 < φ < 1.

Deduce that

(F(x + h) − F(x))/h = f′(g + hφg′) g′(x + hθ),

and by taking the limit h → 0 derive equation 1.9.

Exercise 1.19
Use the integral form of the mean value theorem, equation 1.13, to evaluate the limits

(a) lim_{x→0} (1/x) ∫₀ˣ dt √(4 + 3t³), (b) lim_{x→1} 1/(x − 1)³ ∫₁ˣ dt ln(3t − 3t² + t³).

1.3.6 Partial Derivatives

Here we consider functions of two or more variables, in order to introduce the idea of a partial derivative. If f(x, y) is a function of the two independent variables x and

y, meaning that changes in one do not affect the other, then we may form the partial

derivative of f (x, y) with respect to either x or y using a minor modification of the

definition 1.6 (page 18).

Definition 1.7

The partial derivative of a function f (x, y) of two variables with respect to the first

variable x is

∂f/∂x = fx(x, y) = lim_{h→0} (f(x + h, y) − f(x, y))/h.

In the computation of fx the variable y is unchanged.

Similarly, the partial derivative with respect to the second variable y is

∂f/∂y = fy(x, y) = lim_{k→0} (f(x, y + k) − f(x, y))/k.

In the computation of fy the variable x is unchanged.

We use the conventional notation, ∂f /∂x, to denote the partial derivative with respect

to x, which is formed by fixing y and using the rules of ordinary calculus for the deriva-

tive with respect to x. The suffix notation, fx (x, y), is used to denote the same function:

here the suffix x shows the variable being differentiated, and it has the advantage that

when necessary it can be used in the form fx (a, b) to indicate that the partial derivative

fx is being evaluated at the point (a, b).

In practice the evaluation of partial derivatives is exactly the same as ordinary derivatives and the same rules apply. Thus if f(x, y) = xeʸ ln(2x + 3y) then the partial derivatives with respect to x and y are, respectively,

∂f/∂x = eʸ ln(2x + 3y) + 2xeʸ/(2x + 3y) and ∂f/∂y = xeʸ ln(2x + 3y) + 3xeʸ/(2x + 3y).
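These formulae can be verified by finite differences, holding the other variable fixed as the definition requires (an illustrative check; the evaluation point is arbitrary):

```python
import math

def f(x, y):
    return x * math.exp(y) * math.log(2 * x + 3 * y)

def fx_formula(x, y):
    return math.exp(y) * math.log(2 * x + 3 * y) + 2 * x * math.exp(y) / (2 * x + 3 * y)

def fy_formula(x, y):
    return x * math.exp(y) * math.log(2 * x + 3 * y) + 3 * x * math.exp(y) / (2 * x + 3 * y)

x, y, h = 1.2, 0.4, 1e-6
fx_num = (f(x + h, y) - f(x - h, y)) / (2 * h)  # y held fixed
fy_num = (f(x, y + h) - f(x, y - h)) / (2 * h)  # x held fixed
assert abs(fx_num - fx_formula(x, y)) < 1e-7
assert abs(fy_num - fy_formula(x, y)) < 1e-7
print(fx_formula(x, y), fy_formula(x, y))
```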

Exercise 1.20

(a) If u = x² sin(ln y) compute ux and uy.
(b) If r² = x² + y² show that ∂r/∂x = x/r and ∂r/∂y = y/r.

The partial derivatives are also functions of x and y, so may be differentiated again.

Thus we have

∂/∂x (∂f/∂x) = ∂²f/∂x² = fxx(x, y) and ∂/∂y (∂f/∂y) = ∂²f/∂y² = fyy(x, y). (1.15)

We may also form the mixed second derivatives

∂/∂x (∂f/∂y) and ∂/∂y (∂f/∂x). (1.16)

For most functions met in practice these are equal, giving the mixed derivative rule

∂/∂x (∂f/∂y) = ∂/∂y (∂f/∂x) = ∂²f/∂x∂y = ∂²f/∂y∂x. (1.17)

Using the suffix notation the mixed derivative rule is fxy = fyx . A sufficient condi-

tion for this to hold is that both fxy and fyx are continuous functions of (x, y), see

equation 1.6 (page 16).
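The rule fxy = fyx can be checked by finite differences for any smooth function; the sketch below (illustrative; the choice f(x, y) = x² sin y is arbitrary, and has fxy = fyx = 2x cos y):

```python
import math

def f(x, y):
    return x**2 * math.sin(y)

def fxy(x, y, h=1e-4):
    """Central-difference approximation to the mixed derivative f_xy."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x, y = 1.5, 0.7
exact = 2 * x * math.cos(y)  # f_xy = f_yx = 2x cos y for this function
assert abs(fxy(x, y) - exact) < 1e-6
print(fxy(x, y), exact)
```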

Similarly, differentiating p times with respect to x and q times with respect to y, in any order, gives the same nth order derivative,

∂ⁿf/(∂x^p ∂y^q) where n = p + q,

provided all the nth derivatives are continuous.

Exercise 1.21

If Φ(x, y) = exp(−x²/y) show that Φ satisfies the equations

∂Φ/∂x = −2xΦ/y and ∂²Φ/∂x² = 4 ∂Φ/∂y − 2Φ/y.

Exercise 1.22

Show that u = x² sin(ln y) satisfies the equation 2y² ∂²u/∂y² + 2y ∂u/∂y + x ∂u/∂x = 0.

The generalisation to functions of n variables, f(x) = f(x1, x2, . . . , xn), is straightforward: the partial derivative of f(x) with respect to xk is defined to be

∂f/∂xk = lim_{h→0} (f(x1, . . . , xk + h, . . . , xn) − f(x1, . . . , xk, . . . , xn))/h. (1.18)

All other properties of the derivatives are the same as in the case of two variables, in

particular for the mth derivative the order of differentiation is immaterial provided all

mth derivatives are continuous.

For a function of a single variable, f (x), the existence of the derivative, f 0 (x),

implies that f (x) is continuous. For functions of two or more variables the existence of

the partial derivatives does not guarantee continuity.

If f(x1, x2, . . . , xn) is a function of n variables and if each of these variables is a function of the single variable t, we may form a new function of t with the formula

F(t) = f(x1(t), x2(t), · · · , xn(t)). (1.19)

This may be interpreted as the value of f along a curve C defined parametrically by the functions (x1(t), x2(t), · · · , xn(t)). The derivative of F(t) is given by the relation

dF/dt = Σₖ₌₁ⁿ (∂f/∂xk)(dxk/dt), (1.20)

so F′(t) is the rate of change of f(x) along C. Normally, we write f(t) rather than use a different symbol F(t), and the left-hand side of the above equation is written df/dt.

This derivative is named the total derivative of f. The proof of this when n = 2 and x′ and y′ do not vanish near (x, y) is sketched below; the generalisation to larger n is straightforward. If F(t) = f(x(t), y(t)) then

F(t + ε) = f(x(t) + εx′(t + εθ), y(t) + εy′(t + εφ)), 0 < θ, φ < 1,

where we have used the mean value theorem, equation 1.11. Write the right-hand side in the form

f(x + εx′, y + εy′) = [f(x + εx′, y + εy′) − f(x, y + εy′)] + [f(x, y + εy′) − f(x, y)] + F(t)

so that

(F(t + ε) − F(t))/ε = ((f(x + εx′, y + εy′) − f(x, y + εy′))/(εx′)) x′ + ((f(x, y + εy′) − f(x, y))/(εy′)) y′.

Thus, on taking the limit as ε → 0 we have

dF/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).

This result remains true if either or both x0 = 0 or y 0 = 0, but then more care is needed

with the proof.

Equation 1.20 is used in chapter 4 to derive one of the most important results in the course: if the dependence of x upon t is linear and F(t) has the form

F(t) = f(x + th),

where the vector h is constant and the variable xk has been replaced by xk + thk, for all k. Since d/dt (xk + thk) = hk, equation 1.20 becomes

dF/dt = Σₖ₌₁ⁿ hk ∂f/∂xk. (1.21)

This result will also be used in section 1.3.9 to derive the Taylor series for several

variables.

A variant of equation 1.19, which frequently occurs in the Calculus of Variations, is the case where f depends explicitly upon the variable t, so this equation becomes F(t) = f(t, x1(t), x2(t), . . . , xn(t)) and then

dF/dt = ∂f/∂t + Σₖ₌₁ⁿ (∂f/∂xk)(dxk/dt). (1.22)

For example, if f(t, x, y) = x sin(yt) with x = eᵗ and y = e⁻²ᵗ, then

F(t) = f(t, eᵗ, e⁻²ᵗ) = eᵗ sin(te⁻²ᵗ).

Equation 1.22 gives

dF/dt = ∂f/∂t + (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
      = xy cos(yt) + eᵗ sin(yt) − 2xt cos(yt)e⁻²ᵗ,

and substituting for x and y in terms of t gives

dF/dt = (1 − 2t)e⁻ᵗ cos(te⁻²ᵗ) + eᵗ sin(te⁻²ᵗ).

The same expression can also be obtained by direct differentiation of F(t) = eᵗ sin(te⁻²ᵗ).
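The agreement between the total-derivative formula and direct differentiation can be confirmed numerically (an illustrative sketch; the evaluation point t = 0.8 is arbitrary):

```python
import math

def F(t):
    return math.exp(t) * math.sin(t * math.exp(-2 * t))

def dF_formula(t):
    """Result of equation 1.22 after substituting x = e^t, y = e^{-2t}."""
    return ((1 - 2 * t) * math.exp(-t) * math.cos(t * math.exp(-2 * t))
            + math.exp(t) * math.sin(t * math.exp(-2 * t)))

t, h = 0.8, 1e-6
numeric = (F(t + h) - F(t - h)) / (2 * h)  # central-difference approximation to F'(t)
assert abs(numeric - dF_formula(t)) < 1e-7
print(numeric, dF_formula(t))
```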

The right-hand sides of equations 1.20 and 1.22 depend upon both x and t, but because x depends upon t these expressions are often written in terms of t only. In the Calculus of Variations this is usually not helpful because the dependence upon x and t, separately, is important: for instance we often require expressions like

d/dt (∂F/∂x1) and ∂/∂x1 (dF/dt).

The second of these expressions requires some clarification because dF/dt contains the derivatives x′k. Thus

∂/∂x1 (dF/dt) = ∂/∂x1 (∂f/∂t + Σₖ₌₁ⁿ (∂f/∂xk)(dxk/dt))
             = ∂²f/∂x1∂t + Σₖ₌₁ⁿ (∂²f/∂x1∂xk)(dxk/dt)
             = d/dt (∂F/∂x1),

the last line being a consequence of the mixed derivative rule.

Exercise 1.23
If f(t, x, y) = xy − ty² and x = t², y = t³ show that

df/dt = −y² + y dx/dt + (x − 2ty) dy/dt = t⁴(5 − 7t²),

and that

∂/∂y (df/dt) = dx/dt − 2y − 2t dy/dt = 2t(1 − 4t²),

d/dt (∂f/∂y) = d/dt (x − 2ty) = dx/dt − 2y − 2t dy/dt = 2t(1 − 4t²).

Exercise 1.24
If F = √(1 + x1x2), and x1 and x2 are functions of t, show by direct calculation of each expression that

∂/∂x1 (dF/dt) = d/dt (∂F/∂x1) = x2′/(2√(1 + x1x2)) − x2(x1′x2 + x1x2′)/(4(1 + x1x2)^(3/2)).

Exercise 1.25
Euler's formula for homogeneous functions
(a) A function f(x, y) is said to be homogeneous with degree p in x and y if it has the property f(λx, λy) = λᵖ f(x, y), for any constant λ and real number p. For such a function prove Euler's formula:

x ∂f/∂x + y ∂f/∂y = p f(x, y).

Hint: use the total derivative formula 1.20 and differentiate with respect to λ.
(b) Find the equivalent result for homogeneous functions of n variables that satisfy f(λx) = λᵖ f(x).
(c) Show that if f(x1, x2, · · · , xn) is a homogeneous function of degree p, then each of the partial derivatives, ∂f/∂xk, k = 1, 2, · · · , n, is a homogeneous function of degree p − 1.
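Euler's formula in part (a) is easy to illustrate numerically; the sketch below uses the arbitrary homogeneous function f(x, y) = x²y + y³, which has degree p = 3:

```python
import math

def f(x, y):
    return x**2 * y + y**3  # homogeneous of degree p = 3

def fx(x, y):
    return 2 * x * y

def fy(x, y):
    return x**2 + 3 * y**2

p = 3
for x, y in [(1.0, 2.0), (-0.5, 0.3), (2.0, -1.0)]:
    lam = 1.7
    # homogeneity: f(lam*x, lam*y) = lam^p f(x, y)
    assert math.isclose(f(lam * x, lam * y), lam**p * f(x, y), rel_tol=1e-12)
    # Euler's formula: x f_x + y f_y = p f
    assert math.isclose(x * fx(x, y) + y * fy(x, y), p * f(x, y), rel_tol=1e-12)
print("Euler's formula holds at the sample points")
```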

1.3.7 Implicit functions

An equation of the form f(x, y) = 0, where f is a suitably well behaved function of both x and y, can define a curve in the Cartesian plane, as illustrated in figure 1.5.

Figure 1.5 Diagram showing a typical curve defined by an equation of the form f(x, y) = 0.


For some values of x the equation f(x, y) = 0 can be solved to yield one or more real values of y, which will give one or more functions of x. For instance the equation x^2 + y^2 − 1 = 0 defines a circle in the plane and for each x in |x| < 1 there are two values of y, giving the two functions y(x) = \pm\sqrt{1 - x^2}. A more complicated example is the equation x − y + sin(xy) = 0, which cannot be rearranged to express one variable in terms of the other.

Consider the smooth curve sketched in figure 1.5. On a segment in which the curve

is not parallel to the y-axis the equation f (x, y) = 0 defines a function y(x). Such a

function is said to be defined implicitly. The same equation will also define x(y), that

is x as a function of y, provided the segment does not contain a point where the curve

is parallel to the x-axis. This result, inferred from the picture, is a simple example of

the implicit function theorem stated below.

Implicitly defined functions are important because they occur frequently as solutions

of differential equations, see exercise 1.29, but there are few, if any, general rules that

help understand them. It is, however, possible to obtain relatively simple expressions

for the first derivatives, y 0 (x) and x0 (y).

We assume that y(x) exists and is differentiable, as seems reasonable from figure 1.5, so F(x) = f(x, y(x)) is a function of x only and we may use the chain rule 1.22 to differentiate with respect to x. This gives
\[
\frac{dF}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.
\]
On the curve defined by f(x, y) = 0, F'(x) = 0 and hence
\[
\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = 0
\quad\text{or}\quad
\frac{dy}{dx} = -\frac{f_x}{f_y}. \tag{1.23}
\]

Similarly, if x(y) exists and is differentiable, a similar analysis using y as the independent variable gives
\[
\frac{\partial f}{\partial x}\frac{dx}{dy} + \frac{\partial f}{\partial y} = 0
\quad\text{or}\quad
\frac{dx}{dy} = -\frac{f_y}{f_x}. \tag{1.24}
\]

This result is encapsulated in the Implicit Function Theorem which gives sufficient

conditions for an equation of the form f (x, y) = 0 to have a ‘solution’ y(x) satisfying

f (x, y(x)) = 0. A restricted version of it is given here.

Theorem 1.3

Implicit Function Theorem: Suppose that f : U → R is a function with continuous

partial derivatives defined in an open set U ⊆ R2 . If there is a point (a, b) ∈ U for

which f (a, b) = 0 and fy (a, b) 6= 0, then there are open intervals I = (x1 , x2 ) and

J = (y1 , y2 ) such that (a, b) lies in the rectangle I × J and for every x ∈ I, f (x, y) = 0

determines exactly one value y(x) ∈ J for which f (x, y(x)) = 0. The function y : I → J

is continuous, differentiable, with the derivative given by equation 1.23.

Exercise 1.26
In the case f(x, y) = y − g(x) show that equations 1.23 and 1.24 lead to the relation
\[
\frac{dx}{dy} = \left(\frac{dy}{dx}\right)^{-1}.
\]


Exercise 1.27
If ln(x^2 + y^2) = 2 tan^{-1}(y/x) find y'(x).

Exercise 1.28
If x − y + sin(xy) = 0 determine the values of y'(0) and y''(0).
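For the equation of Exercise 1.28, x − y + sin(xy) = 0, the function y(x) promised by the implicit function theorem can be computed by bisection, and the slope at x = 0 compared with the value −f_x/f_y given by equation 1.23 (which works out to 1 at the origin); a minimal Python sketch:

```python
import math

def g(x, y):
    # the implicit relation f(x, y) = x - y + sin(xy) = 0
    return x - y + math.sin(x * y)

def y_of_x(x, lo=-1.0, hi=1.0):
    # solve g(x, y) = 0 for y by bisection (g decreases in y near y = 0)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if g(x, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = 1e-3
slope = (y_of_x(h) - y_of_x(-h)) / (2 * h)
# the formula dy/dx = -f_x/f_y gives y'(0) = 1
assert abs(slope - 1.0) < 1e-4
```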

Exercise 1.29
Show that the differential equation
\[
\frac{dy}{dx} = \frac{y - a^2 x}{y + x}, \quad y(1) = A > 0,
\]
has a solution defined implicitly by
\[
\frac{1}{2}\ln\left(a^2 x^2 + y^2\right) + \frac{1}{a}\tan^{-1}\left(\frac{y}{ax}\right) = B
\quad\text{where}\quad
B = \frac{1}{2}\ln\left(a^2 + A^2\right) + \frac{1}{a}\tan^{-1}\frac{A}{a}.
\]
Hint: the equation may be put in separable form by defining a new dependent variable v = y/x.

The implicit function theorem can be generalised to deal with the set of functions
\[
f_k(x, t) = 0, \quad k = 1, 2, \cdots, n, \tag{1.25}
\]
where x = (x_1, x_2, \cdots, x_n). These equations have a unique solution for each x_k in terms of t, x_k = g_k(t), k = 1, 2, \cdots, n, in the neighbourhood of (x_0, t_0) provided that at this point the derivatives \partial f_j/\partial x_k exist and that the determinant
\[
J = \begin{vmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[2ex]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n}
\end{vmatrix} \tag{1.26}
\]
is not zero. Furthermore all the functions g_k(t) have continuous first derivatives. The determinant J is named the Jacobian determinant or, more usually, the Jacobian. It is often helpful to use either of the following notations for the Jacobian,
\[
J = \frac{\partial f}{\partial x}
\quad\text{or}\quad
J = \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)}. \tag{1.27}
\]

Exercise 1.30

Show that the equations x = r cos θ, y = r sin θ can be inverted to give functions

r(x, y) and θ(x, y) in every open set of the plane that does not include the origin.


1.3.8 Taylor series for one variable

The Taylor series is a method of representing a given sufficiently well behaved function in terms of an infinite power series, defined in the following theorem.

Theorem 1.4
Taylor's Theorem: If f(x) is a function defined on x_1 ≤ x ≤ x_2 such that f^{(n)}(x) is continuous for x_1 ≤ x ≤ x_2 and f^{(n+1)}(x) exists for x_1 < x < x_2, then if a ∈ [x_1, x_2], for every x ∈ [x_1, x_2],
\[
f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + R_{n+1}. \tag{1.28}
\]

The remainder term, R_{n+1}, can be expressed in the form
\[
R_{n+1} = \frac{(x - a)^{n+1}}{(n + 1)!} f^{(n+1)}(a + \theta h)
\quad\text{for some } 0 < \theta < 1 \text{ and } h = x - a. \tag{1.29}
\]

If all derivatives of f(x) are continuous for x_1 ≤ x ≤ x_2, and if the remainder term R_n → 0 as n → ∞ in a suitable manner, we may take the limit to obtain the infinite series
\[
f(x) = \sum_{k=0}^{\infty} \frac{(x - a)^k}{k!} f^{(k)}(a). \tag{1.30}
\]

The infinite series 1.30 is known as Taylor’s series, and the point x = a the point of

expansion. A similar series exists when x takes complex values.

Care is needed when taking the limit of 1.28 as n → ∞, because there are cases

when the infinite series on the right-hand side of equation 1.30 does not equal f (x).

If, however, the Taylor series converges to f (x) at x = ξ then for any x closer

to a than ξ, that is |x − a| < |ξ − a|, the series converges to f (x). This caveat is

necessary because of the strange example g(x) = exp(−1/x2 ) for which all derivatives

are continuous and are zero at x = 0; for this function the Taylor series about x = 0

can be shown to exist, but for all x it converges to zero rather than g(x). This means

that for any well behaved function, f (x) say, with a Taylor series that converges to

f (x) a different function, f (x) + g(x) can be formed whose Taylor series converges, but

to f (x) not f (x) + g(x). This strange behaviour is not uncommon in functions arising

from physical problems; however, it is ignored in this course and we shall assume that

the Taylor series derived from a function converges to it in some interval.

The series 1.30 was first published by Brook Taylor (1685 – 1731) in 1715: the result

obtained by putting a = 0 was discovered by Stirling (1692 – 1770) in 1717 but first

published by Maclaurin (1698 – 1746) in 1742. With a = 0 this series is therefore often

known as Maclaurin’s series.

In practice, of course, it is usually impossible to sum the infinite series 1.30, so it is necessary to truncate it at some convenient point and this requires knowledge of how, or indeed whether, the series converges to the required value. Truncation gives rise to the Taylor polynomials, with the order-n polynomial given by
\[
f_n(x) = \sum_{k=0}^{n} \frac{(x - a)^k}{k!} f^{(k)}(a). \tag{1.31}
\]
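Equation 1.31 is straightforward to evaluate once the derivatives at the point of expansion are known; the sketch below (an illustration, not from the notes) builds the order-10 polynomial for e^x at a = 0, where every derivative equals 1, and compares it with math.exp:

```python
import math

def taylor_poly(derivs, a, x):
    # sum_{k=0}^{n} (x - a)^k / k! * f^(k)(a), from a list of derivatives at a
    return sum(d * (x - a)**k / math.factorial(k)
               for k, d in enumerate(derivs))

# for f(x) = e^x every derivative at a = 0 equals 1
n = 10
p = taylor_poly([1.0] * (n + 1), 0.0, 0.5)
assert abs(p - math.exp(0.5)) < 1e-9
```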


The series 1.30 is an infinite series of the functions (x − a)n f (n) (a)/n! and summing

these requires care. A proper understanding of this process requires careful definitions

of convergence which may be found in any text book on analysis. For our purposes,

however, it is sufficient to note that in most cases there is a real number, rc , named the

radius of convergence, such that if |x − a| < rc the infinite series is well mannered and

behaves rather like a finite sum: the value of rc can be infinite, in which case the series

converges for all x.

If the Taylor series of f(x) and g(x) have radii of convergence r_f and r_g respectively, then the Taylor series of \alpha f(x) + \beta g(x), for constants \alpha and \beta, and of f(x)^a g(x)^b, for positive constants a and b, exist and have the radius of convergence min(r_f, r_g). The

Taylor series of the compositions f (g(x)) and g(f (x)) may also exist, but their radii of

convergence depend upon the behaviour of g and f respectively. Also Taylor series may

be integrated and differentiated to give the Taylor series of the integral and derivative

of the original function, and with the same radius of convergence.

Formally, the nth Taylor polynomial of a function is formed from its first n derivatives at the point of expansion. In practice, however, the calculation of high-order derivatives is very awkward and it is often easier to proceed by other means, which rely upon ingenuity. A simple example is the Taylor series of ln(1 + tanh x), to fourth order; this is most easily obtained using the known Taylor expansions of ln(1 + z) and tanh x,
\[
\ln(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + O(z^5)
\quad\text{and}\quad
\tanh x = x - \frac{x^3}{3} + \frac{2x^5}{15} + O(x^7),
\]
and then put z = tanh x, retaining only the appropriate order of the series expansion. Thus
\[
\ln(1 + \tanh x)
= \left[x - \frac{x^3}{3} + O(x^5)\right]
 - \frac{x^2}{2}\left[1 - \frac{x^2}{3} + \cdots\right]^2
 + \frac{x^3}{3} - \frac{x^4}{4} + O(x^5)
= x - \frac{x^2}{2} + \frac{x^4}{12} + O(x^5).
\]

This method is far easier than computing the four required derivatives of the original

function.
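The result can be checked numerically: the error of the fourth-order polynomial should shrink at least as fast as x^5 as x → 0 (here the x^5 term happens to vanish, so the bound below is comfortable):

```python
import math

def series(x):
    # the fourth-order Taylor polynomial x - x^2/2 + x^4/12
    return x - x * x / 2 + x**4 / 12

for x in (0.1, 0.2):
    err = abs(math.log(1 + math.tanh(x)) - series(x))
    # the truncation error should be O(x^5)
    assert err < x**5
```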

For |x − a| > r_c the infinite sum 1.30 does not exist. It follows that knowledge of r_c is important. It can be shown that, in most cases of practical interest, its value is given by either of the limits
\[
r_c = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right|
\quad\text{or}\quad
r_c = \lim_{n\to\infty}|a_n|^{-1/n}
\quad\text{where}\quad
a_k = \frac{f^{(k)}(a)}{k!}. \tag{1.32}
\]
Usually the first expression is most useful. Typically we have, for large n,
\[
f^{(n)}(a) = \frac{n!}{r_c^{\,n}}\,A\,\bigl(1 + O(1/n)\bigr)
\quad\text{so that}\quad
\left|\frac{n!}{f^{(n)}(a)}\right|^{1/n} = r_c\bigl(1 + O(1/n)\bigr)
\]
for some constant A. Then the nth term of the series behaves as ((x − a)/r_c)^n, and decreases rapidly with increasing n provided |x − a| < r_c and n is sufficiently large.

Superficially, the Taylor series appears to be a useful representation and a good

approximation. In general this is not true unless |x−a| is small; for practical applications


far more efficient approximations exist — that is they achieve the same accuracy for

far less work. The basic problem is that the Taylor expansion uses knowledge of the

function at one point only, and the larger |x − a| the more terms are required for a

given accuracy. More sensible approximations, on a given interval, take into account

information from the whole interval: we describe some approximations of this type in

chapter 13.

The first practical problem is that the remainder term, equation 1.29, depends upon

θ, the value of which is unknown. Hence Rn cannot be computed; also, it is normally

difficult to estimate.

In order to understand how these series converge we need to consider the magnitude of the nth term in the Taylor series: this type of analysis is important for any numerical evaluation of power series. The nth term is a product of (x − a)^n/n! and f^{(n)}(a). Using Stirling's approximation,
\[
n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \bigl(1 + O(1/n)\bigr), \tag{1.33}
\]
we can approximate the first part of this product by
\[
\frac{|x - a|^n}{n!} \simeq \frac{1}{\sqrt{2\pi n}}\left(\frac{e|x - a|}{n}\right)^n = g_n. \tag{1.34}
\]

The expression gn decreases very rapidly with increasing n, provided n is large enough.

Hence the term |x − a|n /n! may be made as small as we please. But for practical

applications this is not sufficient; in figure 1.6 we plot a graph of the values of log(gn ),

that is the logarithm to the base 10, for x − a = 10.

Figure 1.6 Graph showing the value of log(g_n), equation 1.34, for x − a = 10. For clarity the points are joined with a continuous line.

In this example the maximum of g_n is at n = 10 and has a value of about 2500, before it starts to decrease. It is fairly simple to show that g_n has a maximum at n ≃ |x − a| and here its value is
\[
\max(g_n) \simeq \frac{e^{|x - a|}}{\sqrt{2\pi|x - a|}}.
\]
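These estimates are easy to confirm directly, since g_n can be computed exactly from factorials; for |x − a| = 10:

```python
import math

def g(n, d):
    # g_n = d^n / n!  (d = |x - a|), the size of the n-th series factor
    return d**n / math.factorial(n)

d = 10.0
values = [g(n, d) for n in range(1, 41)]
n_max = 1 + values.index(max(values))
# the maximum sits near n = |x - a| and is roughly e^d / sqrt(2 pi d)
assert n_max in (9, 10)
estimate = math.exp(d) / math.sqrt(2 * math.pi * d)
assert abs(max(values) - estimate) / max(values) < 0.05
```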

The value of f (n) (a) is also difficult to estimate, but it usually increases rapidly with

n. Bizarrely, in many cases of interest, this behaviour depends upon the behaviour

of f (z), where z is a complex variable. An understanding of this requires a study

of Complex Variable Theory, which is beyond the scope of this chapter. Instead we

illustrate the behaviour of Taylor polynomials with a simple example.

First consider the Taylor series of sin x, about x = 0,
\[
\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^{n-1}\frac{x^{2n-1}}{(2n-1)!} + \cdots. \tag{1.35}
\]


Note that only odd powers occur, because sin x is an odd function, and also that the

radius of convergence is infinite. In figure 1.7 we show graphs of this series, truncated

at x2n−1 with n = 1, 4, 8 and 15 for 0 < x < 4π.

Figure 1.7 Graph comparing the Taylor polynomials, of order n (n = 1, 4, 8 and 15), for the sine function with the exact function, the dashed line.

These graphs show that for large x it is necessary to include many terms in the series to obtain an accurate representation of sin x. The reason is simply that for fixed, large x, x^{2n-1}/(2n-1)! is very large at n ≃ x, as shown in figure 1.6. Because the terms of this series alternate in sign the large terms in the early part of the series partially cancel, and this causes problems when approximating a function that is O(1): it is worth noting that as a consequence, with a computer having finite accuracy there is a value of x beyond which the Taylor series for sin x gives incorrect values, despite the fact that formally it converges for all x.
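The loss of accuracy through cancellation can be seen directly by summing the series in floating point; with x = 12 (a value chosen arbitrarily for illustration), the individual terms reach about 10^4 before cancelling, so the eight-term partial sum is useless while the thirty-term sum is accurate:

```python
import math

def sin_taylor(x, n):
    # partial sum of the sine series with n terms, up to x^(2n-1)/(2n-1)!
    return sum((-1)**(k - 1) * x**(2 * k - 1) / math.factorial(2 * k - 1)
               for k in range(1, n + 1))

x = 12.0
# too few terms: wildly wrong, the large early terms have not yet cancelled
assert abs(sin_taylor(x, 8) - math.sin(x)) > 1.0
# many terms: accurate
assert abs(sin_taylor(x, 30) - math.sin(x)) < 1e-6
```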

Exercise 1.31
Exponential and Trigonometric functions
If f(x) = exp(ix) show that f^{(n)}(x) = i^n exp(ix) and hence that its Taylor series is
\[
e^{ix} = \sum_{k=0}^{\infty} \frac{(ix)^k}{k!}.
\]
Show that the radius of convergence of this series is infinite. Deduce that
\[
\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \cdots,
\]
\[
\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + \cdots.
\]

Exercise 1.32
Binomial expansion
Show that the Taylor series of (1 + x)^a is
\[
(1 + x)^a = 1 + ax + \frac{1}{2}a(a - 1)x^2 + \cdots + \frac{a(a - 1)(a - 2)\cdots(a - k + 1)}{k!}x^k + \cdots.
\]
Deduce that when a is a positive integer, n, the series terminates and gives the binomial expansion
\[
(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k
\quad\text{where}\quad
\binom{n}{k} = \frac{n!}{k!\,(n - k)!}.
\]

Exercise 1.33
If f(x) = tan x find the first three derivatives to show that
\[
\tan x = x + \frac{x^3}{3} + O(x^5).
\]

Exercise 1.34
The natural logarithm
(a) Show that
\[
\frac{1}{1 + t} = 1 - t + t^2 - \cdots + (-1)^n t^n + \cdots
\]
and use the definition of the natural logarithm, \ln(1 + x) = \int_0^x \frac{dt}{1 + t}, to show that
\[
\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + \frac{(-1)^{n-1} x^n}{n} + \cdots.
\]
(c) Use this result to show that
\[
\ln\left(\frac{1 + x}{1 - x}\right) = 2\left(x + \frac{x^3}{3} + \cdots + \frac{x^{2n-1}}{2n - 1} + \cdots\right).
\]

Exercise 1.35
The inverse tangent function
Use the definition \tan^{-1} x = \int_0^x \frac{dt}{1 + t^2} to show that for |x| < 1,
\[
\tan^{-1} x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{2k + 1}.
\]

Exercise 1.36
Show that
\[
\ln(1 + \sinh x) = x - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5x^4}{12} + O(x^5).
\]

Exercise 1.37
Obtain the first five terms of the Taylor series of the function that satisfies the equation
\[
(1 + x)\frac{dy}{dx} = 1 + xy + y^2, \quad y(0) = 0.
\]
Hint: use Leibniz's rule given in exercise 1.15 (page 21) to differentiate the equation n times.

1.3.9 Taylor series for several variables

The Taylor series of a function f : R^m → R is trivially derived from the Taylor expansion of a function of one variable using the chain rule, equation 1.21 (page 26). The only difficulty is that the algebra very quickly becomes unwieldy with increasing order.

We require the expansion of f(x) about x = a, so we need to represent f(a + h) as some sort of power series in h. To this end, define a function of the single variable t by the relation
\[
F(t) = f(a + t h) \quad\text{so}\quad F(0) = f(a),
\]
and F(t) gives values of f(x) on the straight line joining a to a + h. The Taylor series of F(t) about t = 0 is, on using equation 1.28 (page 31),
\[
F(t) = F(0) + tF'(0) + \frac{t^2}{2!}F''(0) + \cdots + \frac{t^n}{n!}F^{(n)}(0) + R_{n+1}, \tag{1.36}
\]
which we assume to exist for |t| ≤ 1. Now we need only express the derivatives F^{(n)}(0) in terms of the partial derivatives of f(x). Equation 1.21 (page 26) gives
\[
F'(0) = \sum_{k=1}^{m} f_{x_k}(a)\, h_k.
\]
Putting t = 1 then gives the first order expansion
\[
f(a + h) = f(a) + \sum_{k=1}^{m} h_k f_{x_k}(a) + R_2
= f(a) + h \cdot \frac{\partial f}{\partial a} + R_2, \tag{1.37}
\]
where R_2 is the remainder term which is second order in h and is given below. Here we have introduced the notation \partial f/\partial x for the vector function,
\[
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_m}\right)
\quad\text{with the scalar product}\quad
h \cdot \frac{\partial f}{\partial x} = \sum_{k=1}^{m} h_k \frac{\partial f}{\partial x_k}.
\]

For the second derivative we use equation 1.21 (page 26) again,
\[
F''(t) = \frac{d}{dt}\sum_{k=1}^{m} h_k f_{x_k}(a + t h)
= \sum_{k=1}^{m} h_k \left(\sum_{i=1}^{m} h_i f_{x_k x_i}(a + t h)\right).
\]
Hence
\[
F''(0) = \sum_{k=1}^{m}\sum_{i=1}^{m} h_k h_i f_{x_k x_i}(a)
= \sum_{k=1}^{m} h_k^2 f_{x_k x_k}(a)
 + 2\sum_{k=1}^{m-1}\sum_{i=k+1}^{m} h_k h_i f_{x_k x_i}(a), \tag{1.38}
\]
where the second relation comprises fewer terms because the mixed derivative rule has been used. This gives the second order Taylor series,
\[
f(a + h) = f(a) + \sum_{k=1}^{m} h_k f_{x_k}(a)
+ \frac{1}{2!}\sum_{k=1}^{m}\sum_{i=1}^{m} h_k h_i f_{x_k x_i}(a) + R_3. \tag{1.39}
\]


The higher-order terms are derived in exactly the same manner, but the algebra quickly becomes cumbersome. It helps, however, to use the linear differential operator h · ∂/∂a to write the derivatives of F(t) at t = 0 in the more convenient form,
\[
F'(0) = \left(h \cdot \frac{\partial}{\partial a}\right) f(a), \quad
F''(0) = \left(h \cdot \frac{\partial}{\partial a}\right)^2 f(a)
\quad\text{and}\quad
F^{(n)}(0) = \left(h \cdot \frac{\partial}{\partial a}\right)^n f(a). \tag{1.40}
\]
Then we can write the Taylor series in the form
\[
f(a + h) = f(a) + \sum_{s=1}^{n} \frac{1}{s!}\left(h \cdot \frac{\partial}{\partial a}\right)^s f(a) + R_{n+1}, \tag{1.41}
\]
where the remainder term is
\[
R_{n+1} = \frac{1}{(n + 1)!} F^{(n+1)}(\theta) \quad\text{for some } 0 < \theta < 1.
\]

Because the high order derivatives are so cumbersome and for the practical reasons

discussed in section 1.3.8, in particular figure 1.7 (page 34), Taylor series for many vari-

ables are rarely used beyond the second order term. This term, however, is important

for the classification of stationary points, considered in chapter 8.

For functions of two variables, (x, y), the Taylor series is
\[
f(a + h, b + k) = f(a, b) + h f_x + k f_y
 + \frac{1}{2}\left(h^2 f_{xx} + 2hk f_{xy} + k^2 f_{yy}\right)
 + \frac{1}{6}\left(h^3 f_{xxx} + 3h^2 k f_{xxy} + 3hk^2 f_{xyy} + k^3 f_{yyy}\right) + \cdots
 + \sum_{r=0}^{s} \frac{h^{s-r} k^r}{(s - r)!\, r!} \frac{\partial^s f}{\partial x^{s-r}\,\partial y^r} + \cdots + R_{n+1}, \tag{1.42}
\]
where all derivatives are evaluated at (a, b). In this case the sth term is relatively easy to obtain by expanding the differential operator (h∂/∂x + k∂/∂y)^s using the binomial expansion (which works because the mixed derivative rule means that the two operators ∂/∂x and ∂/∂y commute).
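A numerical check of the second-order part of 1.42, for the sample function f(x, y) = sin x sin y with its exact partial derivatives (the expansion point and increments are arbitrary choices for illustration):

```python
import math

def f(x, y):
    return math.sin(x) * math.sin(y)

def taylor2(a, b, h, k):
    # second-order expansion: f + h fx + k fy + (h^2 fxx + 2hk fxy + k^2 fyy)/2
    sa, ca, sb, cb = math.sin(a), math.cos(a), math.sin(b), math.cos(b)
    fx, fy = ca * sb, sa * cb
    fxx, fyy, fxy = -sa * sb, -sa * sb, ca * cb
    return (f(a, b) + h * fx + k * fy
            + 0.5 * (h * h * fxx + 2 * h * k * fxy + k * k * fyy))

a, b, h, k = 0.3, 0.5, 1e-2, -2e-2
err = abs(f(a + h, b + k) - taylor2(a, b, h, k))
# the error of a second-order expansion is of third order in the increments
assert err < 1e-5
```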

Exercise 1.38
Find the Taylor expansions about x = y = 0, up to and including the second order terms, of the functions
\[
\text{(a)}\ f(x, y) = \sin x \sin y, \qquad
\text{(b)}\ f(x, y) = \sin\left(x + e^{-y} - 1\right).
\]

Exercise 1.39
Show that the third-order Taylor series for a function, f(x, y, z), of three variables is
\[
\begin{aligned}
f(a + h, b + k, c + l) = {}& f(a, b, c) + h f_x + k f_y + l f_z \\
&+ \frac{1}{2!}\left(h^2 f_{xx} + k^2 f_{yy} + l^2 f_{zz} + 2hk f_{xy} + 2kl f_{yz} + 2lh f_{zx}\right) \\
&+ \frac{1}{3!}\Bigl(h^3 f_{xxx} + k^3 f_{yyy} + l^3 f_{zzz} + 6hkl f_{xyz} \\
&\qquad + 3hk^2 f_{xyy} + 3hl^2 f_{xzz} + 3kh^2 f_{yxx} + 3kl^2 f_{yzz} \\
&\qquad + 3lh^2 f_{zxx} + 3lk^2 f_{zyy}\Bigr).
\end{aligned}
\]


1.3.10 L'Hospital's rule

Ratios of functions occur frequently and if
\[
R(x) = \frac{f(x)}{g(x)} \tag{1.43}
\]
the value of R(x) is normally computed by dividing the value of f(x) by the value of g(x): this works provided g(x) is not zero at the point in question, x = a say. If g(x) and f(x) are simultaneously zero at x = a, the value of R(a) may be redefined as a limit. For instance if
\[
R(x) = \frac{\sin x}{x} \tag{1.44}
\]
then the value of R(0) is not defined, though R(x) does tend to the limit R(x) → 1 as x → 0. Here we show how this limit may be computed using L'Hospital's rule13 and its extensions, discovered by the French mathematician G F A Marquis de l'Hospital (1661 – 1704).

Suppose that at x = a, f(a) = g(a) = 0 and that each function has a Taylor series about x = a, with finite radii of convergence: thus near x = a we have, for small, non-zero |\epsilon|,
\[
R(a + \epsilon) = \frac{f(a + \epsilon)}{g(a + \epsilon)}
= \frac{\epsilon f'(a) + O(\epsilon^2)}{\epsilon g'(a) + O(\epsilon^2)}
= \frac{f'(a)}{g'(a)} + O(\epsilon) \quad\text{provided } g'(a) \neq 0.
\]
Hence, on taking the limit \epsilon → 0, we obtain the result given by the following theorem.

Theorem 1.5
L'Hospital's rule. Suppose that f(x) and g(x) are real and differentiable for −∞ ≤ a < x < b ≤ ∞. If
\[
\lim_{x\to a} f(x) = 0, \quad \lim_{x\to a} g(x) = 0
\quad\text{and}\quad \lim_{x\to a} \frac{f'(x)}{g'(x)} \text{ exists},
\]
then
\[
\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}. \tag{1.45}
\]
More generally, if f^{(k)}(a) = g^{(k)}(a) = 0, k = 0, 1, \cdots, n − 1, and g^{(n)}(a) ≠ 0, then
\[
\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f^{(n)}(x)}{g^{(n)}(x)}.
\]

Consider the function defined by equation 1.44; at x = 0 L'Hospital's rule gives
\[
R(0) = \lim_{x\to 0} \frac{\sin x}{x} = \lim_{x\to 0} \frac{\cos x}{1} = 1.
\]
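The rule can also be illustrated numerically: for the sample pair f(x) = 1 − cos x and g(x) = x^2 (chosen here for illustration), two applications of the rule give the limit 1/2, and the computed ratio f(x)/g(x) approaches this value as x → 0:

```python
import math

# f(0) = g(0) = 0 and f'(0) = g'(0) = 0, so the rule is applied twice:
# lim (1 - cos x)/x^2 = lim sin x/(2x) = lim cos x/2 = 1/2
f = lambda x: 1 - math.cos(x)
g = lambda x: x * x

for x in (0.1, 0.01, 0.001):
    # the ratio differs from the limit 1/2 by x^2/24, which shrinks with x
    assert abs(f(x) / g(x) - 0.5) < x
```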

13 Here we use the spelling of the French national bibliography, as used by L'Hospital. Some modern texts use the spelling l'Hôpital.


Exercise 1.40
Find the values of the following limits:
\[
\text{(a)}\ \lim_{x\to a} \frac{\cdots}{\sinh x - \sinh a}, \quad
\text{(b)}\ \lim_{x\to 0} \frac{\cdots}{x\cos x - x}, \quad
\text{(c)}\ \lim_{x\to 0} \frac{\cdots}{2^x - 2^{-x}}.
\]

Exercise 1.41
(a) If f(a) = g(a) = 0 and \(\lim_{x\to a} \frac{f'(x)}{g'(x)} = \infty\), show that \(\lim_{x\to a} \frac{f(x)}{g(x)} = \infty\).
(b) If both f(x) and g(x) are positive in a neighbourhood of x = a, tend to infinity as x → a and \(\lim_{x\to a} \frac{f'(x)}{g'(x)} = A\), show that \(\lim_{x\to a} \frac{f(x)}{g(x)} = A\).

1.3.11 Integration

The study of integration arose from the need to compute areas and volumes. The

theory of integration was developed independently from the theory of differentiation

and the Fundamental Theorem of Calculus, described in note P I on page 40, relates

these processes. It should be noted, however, that Newton knew of the relation between

gradients and areas and exploited it in his development of the subject.

In this section we provide a very brief outline of the simple theory of integration

and discuss some of the methods used to evaluate integrals. This section is included

for reference purposes; however, although the theory of integration is not central to

the main topic of this course, you should be familiar with its contents. The important

idea, needed in chapter 4, is that of differentiating with respect to a parameter, or

‘differentiating under the integral sign’ described in equation 1.52 (page 43).

In this discussion of integration we use an intuitive notion of area and refer the

reader to suitable texts, Apostol (1963), Rudin (1976) or Whittaker and Watson (1965)

for instance, for a rigorous treatment.

If f(x) is a real, continuous function on the interval a ≤ x ≤ b, it is intuitively clear

that the area between the graph and the x-axis can be approximated by the sum of the

areas of a set of rectangles as shown by the dashed lines in figure 1.8.

Figure 1.8 Diagram showing how the area under the curve y = f(x) may be approximated by a set of rectangles. The intervals x_k − x_{k−1} need not be the same length.

The interval a ≤ x ≤ b is divided by the ordered points
\[
a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b
\]
to produce n sub-divisions: in figure 1.8 n = 6 and the spacings are equal. On each interval we construct a rectangle: on the kth rectangle the height is f(l_k), chosen to be the smallest value of f(x) in the interval. These rectangles are shown in the figure. Another set of rectangles of height f(h_k), chosen to be the largest value of f(x) in the interval, can also be formed. If A is the area under the graph it follows that
\[
\sum_{k=1}^{n} (x_k - x_{k-1})\, f(l_k) \le A \le \sum_{k=1}^{n} (x_k - x_{k-1})\, f(h_k). \tag{1.46}
\]
This construction provides a method of approximating integrals and, as will be seen in chapter 4, is the basis of Euler's approximations to variational problems.
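The two bounding sums of 1.46 are simple to compute; the sketch below uses f(x) = x^2, which is increasing on [0, 1], so the left-hand and right-hand rectangle heights give the smallest and largest values of f on each sub-interval:

```python
def riemann_bounds(f, a, b, n):
    # lower and upper rectangle sums for an increasing f on [a, b]
    w = (b - a) / n
    xs = [a + i * w for i in range(n + 1)]
    lower = sum(w * f(xs[i]) for i in range(n))      # left ends: smallest value
    upper = sum(w * f(xs[i + 1]) for i in range(n))  # right ends: largest value
    return lower, upper

lo, hi = riemann_bounds(lambda x: x * x, 0.0, 1.0, 1000)
# the exact area is 1/3 and must lie between the two sums
assert lo <= 1/3 <= hi
assert hi - lo < 2e-3   # the bounds approach each other as n grows
```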

The theory of integration developed by Riemann (1826 – 1866) shows that for con-

tinuous functions these two bounds approach each other, as n → ∞ in a meaningful

manner, and defines the wider class of functions for which this limit exists. When these

limits exist their common value is named the integral of f(x) and is denoted by
\[
\int_a^b dx\, f(x) \quad\text{or}\quad \int_a^b f(x)\, dx. \tag{1.47}
\]

In this context the function f (x) is named the integrand, and b and a the upper and

lower integration limits, or just limits. It can be shown that the integral exists for

bounded, piecewise continuous functions and also some unbounded functions.

From this definition the following elementary properties can be derived.

P I: If F(x) is a differentiable function and F'(x) = f(x) then
\[
F(x) = F(a) + \int_a^x dt\, f(t).
\]
This is the Fundamental theorem of Calculus and is important because it provides one of the most useful tools for evaluating integrals.

P II: \(\int_a^b dx\, f(x) = -\int_b^a dx\, f(x)\).

P III: \(\int_a^b dx\, f(x) = \int_a^c dx\, f(x) + \int_c^b dx\, f(x)\) provided all integrals exist. Note, it is not necessary that c lies in the interval (a, b).

P IV: \(\int_a^b dx\, \bigl(\alpha f(x) + \beta g(x)\bigr) = \alpha \int_a^b dx\, f(x) + \beta \int_a^b dx\, g(x)\), where α and β are real or complex numbers.

P V: \(\left|\int_a^b dx\, f(x)\right| \le \int_a^b dx\, |f(x)|\). This is the analogue of the finite sum inequality
\[
\left|\sum_{k=1}^{n} a_k\right| \le \sum_{k=1}^{n} |a_k|,
\]
where a_k, k = 1, 2, \cdots, n, are a set of complex numbers or functions.


P VI: The Cauchy–Schwarz inequality:
\[
\left(\int_a^b dx\, f(x) g(x)\right)^2 \le \left(\int_a^b dx\, f(x)^2\right)\left(\int_a^b dx\, g(x)^2\right),
\]
with equality if and only if g(x) = c f(x) for some real constant c. This inequality is sometimes named the Cauchy inequality and sometimes the Schwarz inequality. It is the analogue of the finite sum inequality
\[
\left(\sum_{k=1}^{n} a_k b_k\right)^2 \le \left(\sum_{k=1}^{n} a_k^2\right)\left(\sum_{k=1}^{n} b_k^2\right),
\]
with equality if and only if b_k = c a_k for all k and some real constant c.
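The finite-sum form of the inequality, and the equality case b_k = c a_k, can be checked with random data:

```python
import random

random.seed(1)
a = [random.uniform(-1, 1) for _ in range(50)]
b = [random.uniform(-1, 1) for _ in range(50)]

lhs = sum(x * y for x, y in zip(a, b)) ** 2
rhs = sum(x * x for x in a) * sum(y * y for y in b)
assert lhs <= rhs   # the Cauchy-Schwarz inequality

# equality holds when b = c a for some constant c
c = 2.5
b2 = [c * x for x in a]
lhs2 = sum(x * y for x, y in zip(a, b2)) ** 2
rhs2 = sum(x * x for x in a) * sum(y * y for y in b2)
assert abs(lhs2 - rhs2) < 1e-9 * rhs2
```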

P VII: The Hölder inequality: if \(\frac{1}{p} + \frac{1}{q} = 1\), p > 1 and q > 1, then
\[
\left|\int_a^b dx\, f(x) g(x)\right| \le \left(\int_a^b dx\, |f(x)|^p\right)^{1/p} \left(\int_a^b dx\, |g(x)|^q\right)^{1/q},
\]
is valid for complex functions f(x) and g(x), with equality if and only if |f(x)|^p |g(x)|^{-q} and arg(fg) are independent of x. It is the analogue of the finite sum inequality
\[
\sum_{k=1}^{n} |a_k b_k| \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} \left(\sum_{k=1}^{n} |b_k|^q\right)^{1/q}, \quad \frac{1}{p} + \frac{1}{q} = 1,
\]
with equality if and only if |a_n|^p |b_n|^{-q} and arg(a_n b_n) are independent of n (or a_k = 0 for all k, or b_k = 0 for all k). If all a_k and b_k are positive and p = q = 2 these inequalities reduce to the Cauchy–Schwarz inequalities.

P VIII: The Minkowski inequality for any p > 1 and real functions f(x) and g(x) is
\[
\left(\int_a^b dx\, |f(x) + g(x)|^p\right)^{1/p} \le \left(\int_a^b dx\, |f(x)|^p\right)^{1/p} + \left(\int_a^b dx\, |g(x)|^p\right)^{1/p},
\]
with equality if and only if g(x) = c f(x), with c a non-negative constant. It is the analogue of the finite sum inequality, valid for a_k, b_k > 0, for all k, and p > 1,
\[
\left(\sum_{k=1}^{n} (a_k + b_k)^p\right)^{1/p} \le \left(\sum_{k=1}^{n} a_k^p\right)^{1/p} + \left(\sum_{k=1}^{n} b_k^p\right)^{1/p},
\]
with equality if and only if b_k = c a_k for all k and c a non-negative constant.

Sometimes it is convenient to ignore the integration limits, here a and b, and write \(\int dx\, f(x)\): this is named the indefinite integral; its value is undefined to within an additive constant. However, it is almost always possible to express problems in terms of definite integrals, that is, those with limits.


The theory of integration is concerned with understanding the nature of the inte-

gration process and with extending these simple ideas to deal with wider classes of

functions. The sciences are largely concerned with evaluating integrals, that is convert-

ing integrals to numbers or functions that can be understood: most of the techniques

available for this activity were developed in the nineteenth century or before, and we

describe them later in this section.

There are two important extensions to the integral defined above. If either or both −a and b tend to infinity we define an infinite integral as a limit of integrals: thus if b → ∞ we have
\[
\int_a^\infty dx\, f(x) = \lim_{b\to\infty}\left(\int_a^b dx\, f(x)\right), \tag{1.48}
\]
with similar definitions for the integrals
\[
\int_{-\infty}^b dx\, f(x) \quad\text{and}\quad \int_{-\infty}^\infty dx\, f(x).
\]
Some care is needed with the last of these: the limit
\[
\lim_{a\to\infty} \int_{-a}^a dx\, f(x) \quad\text{may exist, but the limit}\quad \lim_{a\to\infty}\lim_{b\to\infty} \int_{-b}^a dx\, f(x)
\]
may not. An example is f(x) = x/(1 + x^2), for which
\[
\int_{-b}^a dx\, \frac{x}{1 + x^2} = \frac{1}{2}\ln\left(\frac{1 + a^2}{1 + b^2}\right).
\]
If a = b the right-hand side is zero for all a (because f(x) is an odd function) and the first limit is zero: if a ≠ b the second limit does not exist.

Whether or not infinite integrals exist depends upon the behaviour of f(x) as |x| → ∞. Consider the limit 1.48. If f(x) ≠ 0 for x > X, for some X > 0, the limit exists provided |f(x)| → 0 faster than x^{-\alpha}, α > 1: if f(x) decays to zero slower than 1/x^{1-\epsilon}, for any ε > 0, the integral diverges; see, however, exercise 1.52 (page 45).
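The decay criterion can be seen numerically by comparing \(\int_1^X dx\, x^{-2}\), which settles down to a limit, with \(\int_1^X dx/x = \ln X\), which does not; the midpoint-rule integrator below is only an illustrative sketch:

```python
def integral(f, a, b, n=100000):
    # midpoint rule, adequate for these smooth decaying integrands
    w = (b - a) / n
    return sum(w * f(a + (i + 0.5) * w) for i in range(n))

# decay like x^(-2), that is faster than x^(-1): the limit exists (here it is 1)
tail = integral(lambda x: x**-2, 1.0, 1000.0)
assert abs(tail - 1.0) < 1e-2

# decay like 1/x: the integral grows like ln X, so there is no limit
slow = [integral(lambda x: 1.0 / x, 1.0, X) for X in (100.0, 1000.0)]
assert slow[1] - slow[0] > 2.0   # still increasing by about ln 10
```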

If the integrand is oscillatory cancellation between the positive and negative parts

of the integral gives convergence when the magnitude of the integrand tends to zero.

In this case we have the following useful theorem from 1853, due to Chartier14.

Theorem 1.6
If f(x) → 0 monotonically as x → ∞ and if \(\int_a^x dt\, \phi(t)\) is bounded as x → ∞, then \(\int_a^\infty dx\, f(x)\phi(x)\) exists.

For instance if φ(x) = sin(λx) and f(x) = x^{-α}, 0 < α < 2, this shows that \(\int_0^\infty dx\, x^{-\alpha}\sin\lambda x\) exists: if α = 1 its value is π/2, for any λ > 0. It should be mentioned that the very cancellation which ensures convergence may cause difficulties when evaluating such integrals numerically.

The second important extension deals with integrands that are unbounded. Suppose that f(x) is unbounded at x = a; then we define
\[
\int_a^b dx\, f(x) = \lim_{\epsilon\to 0^+} \int_{a+\epsilon}^b dx\, f(x), \tag{1.49}
\]

14 J Chartier, Journal de Math 1853, XVIII, pages 201-212.


provided the limit exists. As a general rule, provided |f(x)| tends to infinity slower than |x − a|^{\beta}, β > −1, the integral exists, which is why, in the previous example, we needed α < 2; note that if f(x) = O(ln(x − a)), as x → a, it is integrable. For functions

needed α < 2; note that if f (x) = O(ln(x − a)), as x → a, it is integrable. For functions

unbounded at an interior point the natural extension to P III is used.

The evaluation of integrals of any complexity in closed form is normally difficult, or impossible, but there are a few tools that help. The main technique is to use the Fundamental theorem of Calculus in reverse and simply involves recognising those F(x) whose derivative is the integrand: this requires practice and ingenuity. The main purpose of the other tools is to convert integrals into recognisable types. The first is integration by parts, derived from the product rule for differentiation:
\[
\int_a^b dx\, u\frac{dv}{dx} = \Bigl[uv\Bigr]_a^b - \int_a^b dx\, \frac{du}{dx}v. \tag{1.50}
\]

The second method is to change variables:
\[
\int_a^b dx\, f(x) = \int_A^B dt\, \frac{dx}{dt} f(g(t)) = \int_A^B dt\, g'(t) f(g(t)), \tag{1.51}
\]
where x = g(t), g(A) = a, g(B) = b, and g(t) is monotonic for A < t < B. In these circumstances the Leibniz notation is helpfully transparent because dx/dt can be treated like a fraction, making the equation easier to remember. The geometric significance of this formula is simply that the small element of length δt, at t, becomes the element of length δx = g'(t)δt, where x = g(t), under the variable change.

The third method involves differentiation with respect to a parameter. Consider a function f(x, u) of two variables, which is integrated with respect to x; then
\[
\frac{d}{du}\int_{a(u)}^{b(u)} dx\, f(x, u)
= f(b, u)\frac{db}{du} - f(a, u)\frac{da}{du} + \int_{a(u)}^{b(u)} dx\, \frac{\partial f}{\partial u}, \tag{1.52}
\]

provided a(u) and b(u) are differentiable and fu (x, u) is a continuous function of both

variables; the derivation of this formula is considered in exercise 1.50. If neither limit

depends upon u the first two terms on the right-hand side vanish. A simple example

shows how this method can work. Consider the integral
\[
I(u) = \int_0^\infty dx\, e^{-xu}, \quad u > 0.
\]
Differentiating with respect to u gives
\[
I'(u) = -\int_0^\infty dx\, x e^{-xu}
\quad\text{and, in general,}\quad
I^{(n)}(u) = (-1)^n \int_0^\infty dx\, x^n e^{-xu}.
\]
But the original integral is trivially integrated to I(u) = 1/u, so differentiation gives
\[
\int_0^\infty dx\, x^n e^{-xu} = \frac{n!}{u^{n+1}}.
\]

This result may also be found by repeated integration by parts but the above method

involves less algebra.
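The final formula is easy to test numerically; the sketch below truncates the infinite range at x = 40, where the integrand is negligible, and takes n = 3, u = 2 as arbitrary sample values:

```python
import math

def integral(f, a, b, n=200000):
    # midpoint rule; the integrand is smooth and the tail beyond b is negligible
    w = (b - a) / n
    return sum(w * f(a + (i + 0.5) * w) for i in range(n))

n_, u = 3, 2.0
approx = integral(lambda x: x**n_ * math.exp(-x * u), 0.0, 40.0)
exact = math.factorial(n_) / u**(n_ + 1)   # n!/u^(n+1) = 6/16
assert abs(approx - exact) < 1e-6
```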

The application of these methods usually requires some skill, some trial and error

and much patience. Please do not spend too long on the following problems.


Exercise 1.42
(a) If f(x) is an odd function, f(−x) = −f(x), show that \(\int_{-a}^{a} dx\, f(x) = 0\).
(b) If f(x) is an even function, f(−x) = f(x), show that \(\int_{-a}^{a} dx\, f(x) = 2\int_0^a dx\, f(x)\).

Exercise 1.43
Show that, if λ > 0, the value of the integral \(I(\lambda) = \int_0^\infty dx\, \frac{\sin\lambda x}{x}\) is independent of λ. How are the values of I(λ) and I(−λ) related?

Exercise 1.44
Use integration by parts to evaluate the following indefinite integrals.
\[
\text{(a)}\ \int dx\, \ln x, \quad
\text{(b)}\ \int dx\, \frac{x}{\cos^2 x}, \quad
\text{(c)}\ \int dx\, x\ln x, \quad
\text{(d)}\ \int dx\, x\sin x.
\]

Exercise 1.45
Evaluate the following integrals
\[
\text{(a)}\ \int_0^{\pi/4} dx\, \sin x\,\ln(\cos x), \quad
\text{(b)}\ \int_0^{\pi/4} dx\, x\tan^2 x, \quad
\text{(c)}\ \int_0^1 dx\, x^2\sin^{-1} x.
\]

Exercise 1.46
If \(I_n = \int_0^x dt\, t^n e^{at}\), n ≥ 0, use integration by parts to show that \(aI_n = x^n e^{ax} - nI_{n-1}\) and deduce that
\[
I_n = n!\, e^{ax} \sum_{k=0}^{n} \frac{(-1)^{n-k} x^k}{a^{n-k+1}\, k!} - \frac{(-1)^n n!}{a^{n+1}}.
\]

Exercise 1.47

(a) Using the substitution u = a − x, show that ∫_0^a dx f(x) = ∫_0^a dx f(a − x).

(b) Hence show that

    I = ∫_0^{π/2} dθ sin θ/(sin θ + cos θ) = ∫_0^{π/2} dφ cos φ/(cos φ + sin φ)

and hence evaluate I.

1.3. FUNCTIONS OF A REAL VARIABLE 45

Exercise 1.48

Use the substitution t = tan(x/2) to prove that if a > |b| > 0

    ∫_0^π dx 1/(a + b cos x) = π/√(a² − b²).

Use this result and the technique of differentiating the integral to determine the values of

    ∫_0^π dx 1/(a + b cos x)²,  ∫_0^π dx 1/(a + b cos x)³,  ∫_0^π dx cos x/(a + b cos x)²,  ∫_0^π dx ln(a + b cos x).
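The quoted closed form can be checked numerically before attempting the differentiations. The sketch below (sample values A = 3, B = 1 chosen arbitrarily) also tests the formula ∫_0^π dx/(A + B cos x)² = πA/(A² − B²)^{3/2}, which is what differentiating the closed form once with respect to A suggests; treat it as a check on the method rather than a worked answer.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

A, B = 3.0, 1.0   # arbitrary sample values with A > |B| > 0
base = simpson(lambda x: 1.0 / (A + B * math.cos(x)), 0.0, math.pi)
assert abs(base - math.pi / math.sqrt(A * A - B * B)) < 1e-10

# differentiating pi/sqrt(A^2 - B^2) once with respect to A suggests
# int_0^pi dx/(A + B cos x)^2 = pi*A/(A^2 - B^2)^(3/2); check it:
second = simpson(lambda x: 1.0 / (A + B * math.cos(x)) ** 2, 0.0, math.pi)
assert abs(second - math.pi * A / (A * A - B * B) ** 1.5) < 1e-10
```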

Exercise 1.49
Prove that y(t) = (1/ω) ∫_a^t dx f(x) sin ω(t − x) is the solution of the differential equation

    d²y/dt² + ω²y = f(t),  y(a) = 0,  y′(a) = 0.

Exercise 1.50
(a) Consider the integral F(u) = ∫_0^{b(u)} dx f(x), where only the upper limit depends upon u. Using the basic definition, equation 1.7 (page 18), derive the derivative F′(u).

(b) Consider the integral F(u) = ∫_a^b dx f(x, u), where only the integrand depends upon u. Using the basic definition derive the derivative F′(u).

Exercise 1.51

Assuming that both integrals exist, show that

    ∫_{−∞}^{∞} dx f(x − 1/x) = ∫_{−∞}^{∞} dx f(x).

Hence show that

    ∫_{−∞}^{∞} dx exp(−x² − 1/x²) = √π/e².

You will need the result ∫_{−∞}^{∞} dx e^{−x²} = √π.

Exercise 1.52

Find the limits as X → ∞ of the following integrals

    ∫_2^X dx 1/(x ln x)  and  ∫_2^X dx 1/(x (ln x)²).

Hint: note that if f(x) = ln(ln x) then f′(x) = (x ln x)⁻¹.

Exercise 1.53

Determine the values of the real constants a > 0 and b > 0 for which the following limit exists

    lim_{X→∞} ∫_2^X dx 1/(x^a (ln x)^b).


The following exercises can be tackled using the method described in the corresponding

section, though other methods may also be applicable.

Limits

Exercise 1.54

Find, using first principles, the following limits

    (a) lim_{x→1} (x^a − 1)/(x − 1),  (b) lim_{x→0} (√(1 + x) − 1)/(1 − √(1 − x)),  (c) lim_{x→a} (x^{1/3} − a^{1/3})/(x^{1/2} − a^{1/2}),

    (d) lim_{x→(π/2)⁻} (π − 2x) tan x,  (e) lim_{x→0⁺} x^{1/x},  (f) lim_{x→0} ((1 + x)/(1 − x))^{1/x}.

Inverse functions

Exercise 1.55

Show that the inverse functions of y = cosh x, y = sinh x and y = tanh x, for x > 0, are, respectively,

    x = ln(y + √(y² − 1)),  x = ln(y + √(y² + 1))  and  x = (1/2) ln((1 + y)/(1 − y)).

Exercise 1.56

The function y = sin x may be defined to be the solution of the differential equation

    d²y/dx² + y = 0,  y(0) = 0,  y′(0) = 1.

Show that the inverse function x(y) satisfies the differential equation

    d²x/dy² = y (dx/dy)³,  which gives  x(y) = sin⁻¹y = ∫_0^y du 1/√(1 − u²).

Hint: you may find it helpful to solve the equation by defining z = dx/dy.

Derivatives

Exercise 1.57

Find the derivative of y(x) where

    (a) y = f(x)^{g(x)},  (b) y = √((p + x)/(p − x)) √((q + x)/(q − x)),  (c) yⁿ = x + √(1 + x²).

Exercise 1.58

If y = sin(a sin⁻¹x) show that (1 − x²)y′′ − xy′ + a²y = 0.

1.4. MISCELLANEOUS EXERCISES 47

Exercise 1.59

If y(x) satisfies the equation (1 − x²) d²y/dx² − 2x dy/dx + λy = 0, where λ is a constant and |x| ≤ 1, show that changing the independent variable, x, to θ, where x = cos θ, changes this to

    d²y/dθ² + cot θ dy/dθ + λy = 0.

Exercise 1.60

The Schwarzian derivative of a function f(x) is defined to be

    Sf(x) = f′′′(x)/f′(x) − (3/2) (f′′(x)/f′(x))² = −2 √(f′(x)) d²/dx² (1/√(f′(x))).

Show that if f (x) and g(x) both have negative Schwarzian derivatives, Sf (x) < 0

and Sg(x) < 0, then the Schwarzian derivative of the composite function h(x) =

f (g(x)) also satisfies Sh(x) < 0.

Note the Schwarzian derivative is important in the study of the fixed points of

maps.

Partial derivatives

Exercise 1.61

If z = f(x + ay) + g(x − ay) − (x/(2a²)) cos(x + ay), where f(u) and g(u) are arbitrary functions of a single variable and a is a constant, prove that

    a² ∂²z/∂x² − ∂²z/∂y² = sin(x + ay).

Exercise 1.62

If f (x, y, z) = exp(ax + by + cz)/xyz, where a, b and c are constants, find the

partial derivatives fx , fy and fz , and solve the equations fx = 0, fy = 0 and

fz = 0 for (x, y, z).

Exercise 1.63

The equation f(u² − x², u² − y², u² − z²) = 0 defines u as a function of x, y and z. Show that

    (1/x) ∂u/∂x + (1/y) ∂u/∂y + (1/z) ∂u/∂z = 1/u.

Implicit functions

Exercise 1.64

Show that the function f(x, y) = x² + y² − 1 satisfies the conditions of the Implicit Function Theorem for most values of (x, y), and that the function y(x) obtained from the theorem has derivative y′(x) = −x/y.
The equation f(x, y) = 0 can be solved explicitly to give y = ±√(1 − x²). Verify that the derivatives of both these functions are the same as that obtained from the Implicit Function Theorem.


Exercise 1.65

Prove that the equation x cos xy = 0 has a unique solution, y(x), near the point (1, π/2), and find its first and second derivatives.

Exercise 1.66

The folium of Descartes has equation f(x, y) = x³ + y³ − 3axy = 0. Show that at all points on the curve where y² ≠ ax, the implicit function y(x) has derivative

    dy/dx = −(x² − ay)/(y² − ax).

Taylor series

Exercise 1.67

By sketching the graphs of y = tan x and y = 1/x for x > 0 show that the equation

x tan x = 1 has an infinite number of positive roots. By putting x = nπ + z, where

n is a positive integer, show that this equation becomes (nπ + z) tan z = 1 and

use a first order Taylor expansion of this to show that the root nearest nπ is given approximately by x_n = nπ + 1/(nπ).

Exercise 1.68

Determine the constants a and b such that (1 + a cos 2x + b cos 4x)/x⁴ is finite at the origin.

Exercise 1.69

Find the Taylor series, to 4th order, of the following functions:

(a) ln cosh x,  (b) ln(1 + sin x),  (c) e^{sin x},  (d) sin²x.

Exercise 1.70

If f(x) is a function such that f′(x) increases with increasing x, use the Mean Value Theorem to show that f′(x) < f(x + 1) − f(x) < f′(x + 1).

Exercise 1.71

Use the functions f₁(x) = ln(1 + x) − x and f₂(x) = f₁(x) + x²/2 and the Mean Value Theorem to show that, for x > 0,

    x − x²/2 < ln(1 + x) < x.


L’Hospital’s rule

Exercise 1.72

Show that lim_{x→1} sin(ln x)/(x⁵ − 7x³ + 6) = −1/16.

Exercise 1.73

Determine the limits lim_{x→0} (cos x)^{1/tan²x} and lim_{x→0} (a sin bx − b sin ax)/x³.

Integrals

Exercise 1.74

Using differentiation under the integral sign show that

    ∫_0^∞ dx tan⁻¹(ax)/(x(1 + x²)) = (π/2) ln(1 + a).

Exercise 1.75

Prove that, if |a| < 1

    ∫_0^{π/2} dx ln(1 + cos πa cos x)/cos x = (π²/8)(1 − 4a²).

Exercise 1.76
If f(x) = (sin x)/x, show that ∫_0^{π/2} dx f(x) f(π/2 − x) = (2/π) ∫_0^π dx f(x).

Exercise 1.77

Use the integral definition

    tan⁻¹x = ∫_0^x dt 1/(1 + t²)  to show that for x > 0,  tan⁻¹(1/x) = ∫_x^∞ dt 1/(1 + t²),

and deduce that tan⁻¹x + tan⁻¹(1/x) = π/2.

Exercise 1.78
Determine the values of x that make g′(x) = 0 if g(x) = ∫_x^{2x} dt f(t) and
(a) f(t) = e^t, and (b) f(t) = (sin t)/t.

Exercise 1.79

If f(x) is integrable for a ≤ x ≤ a + h show that

    lim_{n→∞} (1/n) Σ_{k=1}^{n} f(a + kh/n) = (1/h) ∫_a^{a+h} dx f(x).

Use this result to find the following limits:

    (a) lim_{n→∞} n⁻⁶ (1 + 2⁵ + 3⁵ + ⋯ + n⁵),  (b) lim_{n→∞} (1/(1 + n) + 1/(2 + n) + ⋯ + 1/(3n)),

    (c) lim_{n→∞} (1/n) (sin(y/n) + sin(2y/n) + ⋯ + sin y),  (d) lim_{n→∞} n⁻¹ [(n + 1)(n + 2) ⋯ (2n)]^{1/n}.
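The Riemann-sum formula of this exercise is easily explored numerically. The sketch below (sample functions chosen arbitrarily; skip it if you prefer to do the exercise unaided, since f(x) = x⁵ on [0, 1] corresponds to part (a)) confirms that the finite sums approach the corresponding integrals.

```python
import math

def riemann(f, a, h, n):
    """(1/n) * sum_{k=1}^n f(a + k*h/n); tends to (1/h) * int_a^{a+h} f."""
    return sum(f(a + k * h / n) for k in range(1, n + 1)) / n

n = 100000
# with f(x) = x^5 on [0, 1]: n^{-6}(1^5 + ... + n^5) -> int_0^1 x^5 dx
assert abs(riemann(lambda x: x ** 5, 0.0, 1.0, n) - 1 / 6) < 1e-4
# and with f(x) = sin x on [0, y], the substitution relevant to part (c)
y = 1.3
assert abs(riemann(math.sin, 0.0, y, n) - (1 - math.cos(y)) / y) < 1e-4
```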


Exercise 1.80

If the functions f(x) and g(x) are differentiable find expressions for the first derivative of the functions

    F(u) = ∫_0^u dx f(x)/√(u² − x²)  and  G(u) = ∫_0^u dx g(x)/(u − x)^a  where 0 < a < 1.

This is a fairly difficult problem. The formula 1.52 does not work because the

integrands are singular, yet by substituting simple functions for f (x) and g(x), for

instance 1, x and x2 , we see that there are cases for which the functions F (u) and

G(u) are differentiable. Thus we expect an equivalent to formula 1.52 to exist.

Chapter 2

Ordinary Differential

Equations

2.1 Introduction

Differential equations are an important component of this course, so in this chapter we

discuss relevant, elementary theory and provide practice in solving particular types of

equations. You should be familiar with all the techniques discussed here, though some

of the more general theorems may be new. If you already feel confident with the theory

presented here, then a detailed study may not be necessary, though you should attempt

some of the end of section exercises. If you wish to delve deeper into the subject there

are many books available; those used in the preparation of this chapter include Birkhoff
and Rota1 (1962), Arnold (1973)2 , Ince3 (1956) and the older text by Piaggio4 , which

provides a different slant on the subject than modern texts.

Differential equations are important because they can be used to describe a wide

variety of physical problems. One reason for this is that Newton’s equations of motion

usually relate the rates of change of the position and momentum of particles to their

position, so physical systems ranging in size from galaxies to atoms are described by

differential equations. Besides these important traditional physical problems, ordinary

differential equations are used to describe some electrical circuits, population changes

when populations are large and chemical reactions. In this course we deal with the sub-

class of differential equations that can be derived from variational principles, a concept

introduced in chapter 3. A simple example is the stationary chain hanging between two

fixed supports: this assumes a shape that minimises its gravitational energy, and we

can use this fact to derive a differential equation, the solution of which describes the

chain’s shape. This problem is dealt with in chapter 12.

The term ‘differential equation’ first appeared in English in 1763 in ‘The method of
increments’ by William Emerson (1701 – 1782) but was introduced by Leibniz (1646 – 1716) in 16845 .

1 Birkhoff G and Rota G-C 1962 Ordinary differential equations (Blaisdell Publishing Co.).

2 Arnold V I 1973, Ordinary Differential Equations (The MIT Press), translated by R A Silverman

3 Ince E L 1956 Ordinary differential equations (Dover).

4 Piaggio H T H 1968 An Elementary treatise on Differential Equations, G Bell and Sons, first

published in 1920.

51


The study of differential equations began with Newton (1642 – 1727) and Leibniz.

Newton considered ‘fluxional equations’, which related a fluxion to its fluent, a fluent

being a function and a fluxional its derivative: in modern terminology he considered the

two types of equation dy/dx = F (x) and dy/dx = F (x, y), and derived solutions using

power series in x, a method which he believed to be universally applicable. Although

this work was completed in the 1670s it was not published until the early 18 th century,

too late to affect the development of the general theory which had progressed rapidly

in the intervening period.

Much of this progress was due to Leibniz and the two Bernoulli brothers, James

(1654 – 1705), and his younger brother John (1667 – 1748), but others of this scientifi-

cally talented family also contributed; many of these are mentioned later in this chapter

so the following genealogical tree of the scientifically important members of this family

is shown in figure 2.1.

[Figure 2.1: a genealogical tree of the Bernoulli family, descending from Nicholas (1623 – 1708) through the brothers James (1654 – 1705), Nicholas (1662 – 1716) and John (1667 – 1748), then Nicolas II (1687 – 1759), Nicholas III (1695 – 1726), Daniel (1700 – 1782) and John II (1710 – 1790), then John III (1744 – 1807), Daniel II (1751 – 1834) and James II (1759 – 1789), down to Christoph (1782 – 1863).]

Figure 2.1 Some of the posts held by some members of the Bernoulli family are:

James: Prof of Mathematics, Basle (1687-1705);

John: Prof of Mathematics, Groningen (1695-1705), Basle (1705-1748);

Nicholas III: Prof at Petrograd;

Daniel: Prof at Petrograd and Basle (Bernoulli principle in hydrodynamics);

John II: Prof at Basle;

John III: Astronomer Royal and Director of Mathematical studies at Berlin;

James II: Prof at Basle, Verona and Petrograd.

In 1690 James Bernoulli solved the brachistochrone problem, discussed in section 5.2,

which involves solving a nonlinear, first-order equation: in 1692 Leibniz discovered the

method of solving first-order homogeneous and linear problems, sections 2.3.2 and 2.3.3:

Bernoulli’s equation, section 2.3.4, was proposed in 1695 by James Bernoulli and solved

by Leibniz and John Bernoulli soon after. Thus within a few years of their discovery

many of the methods now used to solve differential equations had been discovered.

5 Acta Eruditorum, Oct 1684


The first treatise to provide a systematic discussion of differential equations and their

solutions was published in four volumes by Euler (1707 – 1783), the first in 1755 and

the remaining three volumes between 1768 and 1770.

This work on differential equations involved rearranging the equation, using alge-

braic manipulations and transformations, so that the solution could be expressed as an

integral. This type of solution became known as solving the equation by quadrature,

a term originally used to describe the area under a plane curve and in particular to

the problem of finding a square having the same area as a given circle: it was in this

context that the term was first introduced into English in 1596. The term quadrature

is used regardless of whether the integral can actually be evaluated in terms of known

functions. Other common terms used to describe this type of solution are closed-form

solution and analytic solution: none of these terms have a precise definition.

Much of this early work was concerned with the construction of solutions but raised

fundamental questions concerning what is meant by a function or by a ‘solution’ of

a differential equation, which led to important advances in analysis. These questions

broadened the scope of enquiries, and the first of these newer studies was the work of

Cauchy (1789 – 1857) who investigated the existence and uniqueness of solutions, and

in 1824 proved the first existence theorem; this is quoted on page 81. The extension of

this theorem6 , due to Picard (1856 – 1941), is quoted in section 2.3. These theorems,

although important, deal only with a restricted class of equations, which do not include

many of the quite simple equations arising in this course, or many other practical

problems.

In 1836 Sturm introduced a different approach to the subject, whereby properties of

solutions to certain differential equations are derived directly from the equation, without

the need to find solutions. Subsequently, during the two years 1836-7, Sturm (1803 –

1855) and Liouville (1809 – 1882) developed these ideas, some of which are discussed in

chapter 13. The notion of extracting information from an equation, without solving it,

may seem rather strange, but you can obtain some idea of what can be achieved by

doing exercises 2.57 and 2.58 at the end of this chapter.

Liouville was also responsible for starting another important strand of enquiry. He

was interested in the ‘problem of integration’, the main objective of which is to decide

if a given class of indefinite integrals can be integrated in terms of a finite expression

involving algebraic, logarithmic or exponential functions. This work was performed

between 1833 and 1841 and towards the end of this period Liouville turned his atten-

tion to similar problems involving first and second-order differential equations, a far

more difficult problem. A readable history of this work is provided by Lützen7 . This

line of enquiry became of practical significance with the advent of Computer Assisted

Algebra during the last quarter of the 20 th century, and is now an important part of

software such as Maple (used in MS325 and M833) and Mathematica: for some modern

developments see Davenport et al 8 .

Applications that involve differential equations often require solutions, so the third

approach to the subject involves finding approximations to those equations that cannot

be solved exactly in terms of known functions. There are far too many such methods

6 Picard E 1893 J de Maths, 9 page 217.

7 Joseph Liouville 1809-1882: Master of Pure and Applied Mathematics, 1990 by J Lützen, pub-

lished by Springer-Verlag.

8 Davenport J H, Siret Y and Tournier E 1989 Computer Algebra. Systems and algorithms for algebraic computation.


The current chapter has two principal aims. First, to give useful existence and

uniqueness theorems, for circumstances where they exist. Second, to describe the var-

ious classes of differential equation which can be solved by the standard techniques

known to Euler. In section 2.3 we discuss first-order equations: some aspects of second-

order equations are discussed in section 2.4. In the next section some general ideas are

introduced.

An nth order differential equation is an equation that gives the nth derivative of a real

function, y(x), of a real variable x, in terms of x and some or all of the lower derivatives

of y,

    d^n y/dx^n = F(x, y, y′, y′′, ⋯, y^{(n−1)}),  a ≤ x ≤ b.    (2.1)

The function y is named the dependent variable, and the real variable x the independent
variable; x is often limited to a given interval of the real axis, which may be the
whole axis or an infinite portion of it. The function F must be single-valued and

differentiable in all variables, see theorem 2.2 (page 81).

Frequently we obtain equations of the form

    F(x, y, y′, y′′, ⋯, y^{(n)}) = 0,    (2.2)

and this is also referred to as an nth order differential equation. But in order to progress, it is usually necessary to rearrange 2.2 into the form of 2.1, and this usually gives more than one equation. A simple example is the first-order equation y′² + y² = c², c a constant, which gives the two equations y′ = ±√(c² − y²).

Another important type of system is the set of n coupled first-order equations

    dz_k/dx = f_k(x, z₁, z₂, ⋯, z_n),  k = 1, 2, ⋯, n,  a ≤ x ≤ b,    (2.3)

where fk are a set of n real-valued, single-valued functions of (x, z1 , z2 , · · · , zn ). If all

the fk are independent of x these equations are described as autonomous; if one or more

of the fk depend explicitly upon x they are named non-autonomous 9 .

The nth order equation 2.1 can always be expressed as a set of n coupled, first-order

equations. For instance, if we define

    z₁ = y,  z₂ = y′,  z₃ = y′′,  ⋯,  z_n = y^{(n−1)},

then 2.1 becomes z′_k = z_{k+1} for k = 1, 2, ⋯, n − 1, together with z′_n = F(x, z₁, z₂, ⋯, z_n).

9 This distinction is important in dynamics where the independent variable, x, is usually the time.

The significance of this difference is that if y(x) is a solution of an autonomous equation then so is
y(x + a), for any constant a: it will be seen in chapter 7, when we consider Noether’s theorem, that

this has an important consequence and in dynamics results in energy being constant.

2.2. GENERAL DEFINITIONS 55

This transformation is not unique, as seen in exercise 2.4. Coupled, first-order equations

are important in many applications, and are used in many theorems quoted later in this

chapter, which is why we mention them here.
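As a concrete illustration of the reduction just described (an assumed example, not from the text): the second-order equation y′′ + y = 0 with y(0) = 0, y′(0) = 1 becomes the pair z₁′ = z₂, z₂′ = −z₁, which can be integrated by the crudest stepping scheme and compared with the exact solution sin x.

```python
import math

def rhs(x, z):
    """Right-hand side of the coupled system: z1 = y, z2 = y'."""
    z1, z2 = z
    return (z2, -z1)

x, z, h = 0.0, (0.0, 1.0), 1e-4      # y(0) = 0, y'(0) = 1
while x < 1.0 - 1e-12:
    f1, f2 = rhs(x, z)
    z = (z[0] + h * f1, z[1] + h * f2)
    x += h
# the exact solution of y'' + y = 0 with these conditions is sin x
assert abs(z[0] - math.sin(1.0)) < 1e-3
```

Any second-order equation in the form 2.1 can be treated the same way, which is why numerical libraries work with first-order systems.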

In this course most differential equations encountered are first-order, n = 1, or

second-order, n = 2.

A solution of equation 2.1 is any function that satisfies the equation, and it is helpful

and customary to distinguish two types of solutions. The general solution of an nth

order equation is a function

f (x, y, c1 , c2 , · · · , cn ) = 0 (2.4)

involving x, y and n arbitrary constants which satisfies equation 2.1 for all values of

these constants in some domain: this solution is also named the complete primitive.

The most general solution of an nth order equation contains n arbitrary constants, but

this is difficult to prove for a general equation.

A particular solution or particular integral g(x, y) = 0 is a function satisfying equa-

tion 2.1, but containing no arbitrary constants: particular integrals can be obtained

from a general solution by giving the constants particular values, but some equations

have particular solutions that are independent of the general solution:

these are named singular solutions. For instance the equation

    y′′ = y y′/x  has the general solution  y(x) = 2c₁ tan(c₂ + c₁ ln x) − 1

and also the singular solution y = c3 , where c3 is an arbitrary constant, which cannot

be obtained from the general solution, see exercise 2.49(c) (page 84). Another example

is given in exercise 2.6 (page 59); one origin of singular solutions of first-order equations

is discussed in section 2.6.
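The quoted general solution can be verified numerically; the Python sketch below (sample constants c₁, c₂ and sample point x₀ chosen arbitrarily) checks y′′ = yy′/x at one point using central finite differences.

```python
import math

c1, c2 = 0.7, 0.2                    # arbitrary sample constants

def y(x):
    """The quoted general solution of y'' = y*y'/x."""
    return 2 * c1 * math.tan(c2 + c1 * math.log(x)) - 1

x0, h = 1.5, 1e-4                    # sample point and difference step
yp = (y(x0 + h) - y(x0 - h)) / (2 * h)
ypp = (y(x0 + h) - 2 * y(x0) + y(x0 - h)) / h ** 2
# the equation y'' = y*y'/x should hold at x0, up to difference error
assert abs(ypp - y(x0) * yp / x0) < 1e-4
```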

If y(x) = 0 is a solution of a differential equation it is often referred to as the trivial

solution.

The values of the n arbitrary constants are determined by n subsidiary conditions

which will be discussed later.

An important category of differential equation is that of linear equations, that is, equations
of first degree in the dependent variable and all its derivatives: the most general

nth order, linear differential equation has the form

    a_n(x) d^n y/dx^n + a_{n−1}(x) d^{n−1}y/dx^{n−1} + ⋯ + a₁(x) dy/dx + a₀(x) y = h(x)    (2.5)

where h(x) and a_k(x), k = 0, 1, ⋯, n, are functions of x, but not y.

If h(x) = 0 the equation is said to be homogeneous, otherwise it is an inhomogeneous

equation.

Linear equations are important for three principal reasons. First, they often approx-

imate physical situations where the appropriate variable, here y, has small magnitude,

so terms O(y 2 ) can be ignored. Second, by comparison with nonlinear equations they

are relatively easy to solve. Third, their solutions have benign properties that are


well understood: some of these properties are discussed in section 2.4 and others in

chapter 13.

Differential equations which are not linear are nonlinear equations. These equations

are usually difficult to solve and their solutions often have complicated behaviours.

Most equations encountered in this course are nonlinear.

An important distinction we need to mention is that between initial value problems and

boundary value problems, which we discuss in the context of the second-order equation

    d²y/dx² = F(x, y, y′),  a ≤ x ≤ b.    (2.6)

The general solution of this equation contains two arbitrary constants, and in practical

problems the values of these constants are determined by conditions imposed upon the

solution.

In an initial value problem 10 the value of the solution and its first derivative are

defined at the point x = a. Thus a typical initial value problem is

    d²y/dx² + y = 0,  y(a) = A,  y′(a) = B.    (2.7)

In a boundary value problem the value of the solution is prescribed at two distinct points,

normally the end points of the range, x = a and x = b. A typical problem is

    d²y/dx² + y = 0,  y(a) = A,  y(b) = B.    (2.8)

The distinction between initial and boundary value problems is very important. For

most initial value problems occurring in practice a unique solution exists, see theo-

rems 2.1 and 2.2, pages 61 and 81 respectively. On the contrary, for most boundary

value problems it is not known whether a solution exists and, if it does, whether it is

unique: we encounter examples that illustrate this behaviour later in the course. It is

important to be aware of this difficulty when numerical methods are used.

For example the solutions of equations 2.7 and 2.8 are, respectively,

    y = A cos(x − a) + B sin(x − a)  and  y = (A sin(b − x) + B sin(x − a))/sin(b − a).

The former solution exists for all a, A and B; the latter exists only if sin(b − a) ≠ 0.
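The boundary value solution quoted above can be checked directly. In this sketch (sample values for a, b, A, B chosen arbitrarily, with sin(b − a) ≠ 0) the boundary conditions are verified exactly and y′′ + y = 0 is tested with a finite-difference second derivative.

```python
import math

a, b, A, B = 0.3, 2.0, 1.5, -0.7     # arbitrary sample values

def y(x):
    """Quoted solution of y'' + y = 0 with y(a) = A, y(b) = B."""
    return (A * math.sin(b - x) + B * math.sin(x - a)) / math.sin(b - a)

# boundary conditions hold exactly
assert abs(y(a) - A) < 1e-12 and abs(y(b) - B) < 1e-12
# y'' + y = 0, tested with a central second difference at a sample point
x0, h = 1.1, 1e-4
ypp = (y(x0 + h) - 2 * y(x0) + y(x0 - h)) / h ** 2
assert abs(ypp + y(x0)) < 1e-5
```

Choosing b − a equal to a multiple of π makes sin(b − a) vanish, and the same code then fails with a division by zero, mirroring the non-existence of the solution.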

Other types of boundary conditions occur, and are important; they are introduced later as needed.

The solution of the nonlinear equation

    dy/dx = y²,  y(0) = A,  is  y(x) = A/(1 − Ax).

10 It is named an initial value problem because in this type of system the independent variable, x, is

often related to the time and we require the solution subsequent to the initial time, x = a.


This solution is undefined at x = 1/A, a point which depends upon the initial condition.

Thus this singularity in the solution moves as the initial condition changes, and is

therefore named a movable singularity.
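The movable singularity can be seen numerically: a crude Euler integration of y′ = y², y(0) = A (an assumed illustration, with the blow-up threshold 10⁶ and step length chosen arbitrarily) grows without bound close to x = 1/A, and the blow-up point moves as A changes.

```python
def blow_up_estimate(A, h=1e-4):
    """Euler-integrate y' = y^2, y(0) = A, until y exceeds 10^6."""
    x, y = 0.0, A
    while y < 1e6:
        y += h * y * y
        x += h
    return x

# the numerical blow-up point tracks 1/A as the initial condition changes
for A in (1.0, 2.0, 4.0):
    assert abs(blow_up_estimate(A) - 1.0 / A) < 0.05
```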

On the other hand the general, non-trivial solution of the linear equation

    dy/dx + y/x = 0  is  y = C/x,  C ≠ 0,    (2.9)

where C is a constant. This solution is undefined at x = 0, regardless of the value of

the integration constant C. This type of singularity in the solution is named a fixed

singularity. The significance of this classification is that the singularities in the solutions

of nonlinear equations are almost always movable. For the solutions of linear equations

they are always fixed and their positions are at the same points as the singularities of

the coefficient functions defining the equation: in equation 2.9 the coefficient of y is

1/x, which is why the solution is singular at x = 0. For the linear equation 2.5 any
singularities in the solution are at points where one or more of the ratios a_k(x)/a_n(x),
k = 0, 1, ⋯, n − 1, has a singularity.

In the above examples the singularity is the point where the solution is unbounded.

But a singularity of a function is not necessarily a point where it is unbounded; a careful

definition of a singularity can only be provided in the context of complex variable theory

and for single-valued functions there are two types of singularity, poles and essential

singularities. We cannot describe this theory here11 but, instead we list some typical

examples of these singularities.

    Functional form                                 Name of singularity
    1/(a − x)ⁿ,  n = 1, 2, ⋯                        Pole
    (a − x)^α,  α a real, non-integer number        Essential singularity

Other types of essential singularities are exp(−1/(x − a)), exp(√(x − a)) and ln(x − a).

Functions of a real variable are less refined and can misbehave in a variety of unruly ways: some typical examples are √|x|, 1/|x| and ln |x|.

Exercise 2.1

Show that the following equations have the solutions given, and state whether the

singularity in each solution is fixed or movable, and whether the equation is linear

or nonlinear.

(a) dy/dx = xy³,  y(0) = A,  y = A/√(1 − A²x²).

(b) dy/dx + y/x = x,  y(1) = A,  y(x) = (3A − 1)/(3x) + x²/3.

A general solution of a differential equation can take many different forms, some more

useful than others. Most useful are those where y(x) is expressed as a formula involving

a finite number of familiar functions of x; this is rarely possible. This type of solution is

11 A brief summary of the relevant theory is provided in the course Glossary; for a fuller discussion


possible, however, to express solutions as an infinite series of increasing powers of x:

these solutions are sometimes useful, but normally only for a limited range of x.

A solution may be obtained in the form f (x, y) = 0, which cannot be solved to

provide a formula for y(x). In such cases the equation f (x, y) = 0, for fixed x, often

has many solutions, so the original differential equation has many solutions. A simple

example is the function f(x, y) = y² − 2xy + C = 0, where C is a constant, which
is a solution of the equation y′ = y/(y − x), which therefore has the two solutions
y = x ± √(x² − C).

Another type of solution involves some form of approximation. From the beginning
of the 18 th century to the mid 20 th century a range of techniques was developed that
approximate solutions in terms of simple finite formulae: these approximations and the

associated techniques are important but do not feature in this course, except for the

method described in chapter 14.

Another type of approximation is obtained by solving the equation numerically,

but these methods find only particular solutions, not general solutions, and may fail

for initial value problems on large intervals and for nonlinear boundary value problems.

Moreover, if the equations contain several parameters it is usually difficult to understand

the effect of changing the parameters.

Exercise 2.2

Which of the following differential equations are linear and which are nonlinear?

(a) y′′ + x²y = 0,  (b) y′′ + xy² = 1,  (c) y′′ + |y| = 0,
(d) y′′′ + x y′² + y = 0,  (e) y′′ + y sin x = eˣ,  (f) y′′ + y = 1 if y > 0, −1 if y ≤ 0,
(g) y′ = |x|, y(1) = 2,  (h) y′ = 0, y(1)² = 1,
(i) y′′ = x, y(0) + y′(0) = 1, y(1) = 2.

Exercise 2.3

Which of the following problems are initial value problems, which are boundary

value problems and which are neither?

(a) y′′ + y = sin x, y(0) = 0, y(π) = 0,
(b) y′′ + y = |x|, y(0) = 1, y′(π) = 0,
(c) y′′ + 2y′ + y = 0, y(0) + y(1) = 1, y′(0) = 0,
(d) y′′ − y = cos x, y(1) = y(2), y′(1) = 0,
(e) y′′′ + 2y′′ + x²y′² + |y| = 0, y(0) = y′(0) = 0, y′′(1) = 1,
(f) y⁽⁴⁾ + 3y′′′ + 2x²y′² + x³|y| = x, y(0) = 1, y′(0) = 2, y′′(0) = 1, y′′′(0) = 1,
(g) y′′ sin x + y cos x = 0, y(π/2) = 0, y′(π/2) = 1,
(h) y′′ + y² = y(x²), y(0) = 0, y′(0) = 1.

12 Used in this context the solution need not be analytic in the sense of complex variable theory.


Exercise 2.4

Liénard’s equation,

    d²x/dt² + νf(x) dx/dt + g(x) = 0,  ν ≥ 0,

where f(x) and g(x) are well behaved functions and ν a constant, describes certain important dynamical systems.

(a) Show that if y = dx/dt this equation can be written as the two coupled first-order equations

    dx/dt = y,  dy/dt = −νf(x)y − g(x).

(b) If F(x) = ∫_0^x du f(u), by defining z = (1/ν) dx/dt + F(x), show that an alternative representation of Liénard’s equation is

    dx/dt = ν(z − F(x)),  dz/dt = −g(x)/ν.

This exercise demonstrates that there is no unique way of converting a second-order equation to a pair of coupled first-order equations. The transformation of part (b) may seem rather artificial, but if ν ≫ 1 it provides a basis for a good approximation to the periodic solution of the original equations which is not easily obtained by other means.

Exercise 2.5

Clairaut’s equation
An equation considered by A C Clairaut (1713 – 1765) is

    y = px + f(p)  where  p = dy/dx.

By differentiating with respect to x show that (x + f′(p)) p′ = 0 and deduce that one solution is p = c, a constant, and hence that the general solution is y = cx + f(c).
Show also that the function derived by eliminating p from the equations y = px + f(p) and x + f′(p) = 0 is also a solution. This is a singular solution, and its connection with the general solution is discussed in exercise 2.67 (page 90).

Exercise 2.6

Find the general and singular solutions of the differential equation y = px − e^p, p = y′(x).

Exercise 2.7

Consider the second-order differential equation F(x, y′, y′′) = 0 in which y(x) is not explicitly present. Show that by introducing the new dependent variable p = dy/dx, this equation is reduced to the first-order equation F(x, p, p′) = 0.


Exercise 2.8

Consider the second-order differential equation F(y, y′, y′′) = 0 in which the independent variable x is not explicitly present.
Define p = dy/dx and show that by considering p as a function of y,

    d²y/dx² = dp/dx = p dp/dy,

and hence that the equation reduces to the first-order equation F(y, p, p dp/dy) = 0.

Of all ordinary differential equations, first-order equations are usually the easiest to solve using conventional methods and there are five types that are amenable to these methods. When confronted with an arbitrary first-order equation, the trick is to recognise the type, or a transformation that converts the equation to one of these types. Before describing these types we first discuss the existence and uniqueness of their solutions.

The first-order equation

    dy/dx = F(x, y),  a ≤ x ≤ b,    (2.10)

does not have a unique solution unless the value of y is specified at some point x =

c ∈ [a, b]. Why this is so can be seen geometrically: consider the Cartesian plane Oxy,

shown in figure 2.2. Take any point (x, y) with x ∈ [a, b] and where F (x, y) is defined

and single valued, so a unique value of y 0 (x) is defined: this gives the gradient of the

solution passing through this point, as shown by the arrows.

Figure 2.2 Construction of the solution through a given point in the Oxy-plane.

At the neighbouring point x + δx the solution takes the value y(x) + δx F(x, y(x)) + O(δx²), as shown. By taking the successive values of y at x + kδx, k = 1, 2, · · · , we obtain a unique curve passing through the initial point. By letting δx → 0 it can be shown that this construction gives the exact solution. This unique


solution can be found only if the initial value of y is specified. Normally y(a) is defined

and this gives the initial value problem

dy/dx = F(x, y),   y(a) = A,   a ≤ x ≤ b.   (2.11)
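The stepping construction described above is Euler's method. A minimal sketch, using the assumed test equation dy/dx = y, y(0) = 1, shows the constructed values approaching the exact solution as the step δx → 0:

```python
import math

def euler(F, a, A, b, n):
    """Step the construction y(x + dx) ≈ y(x) + dx*F(x, y(x)) from (a, A) to x = b."""
    x, y, dx = a, A, (b - a)/n
    for _ in range(n):
        y += dx*F(x, y)   # follow the local gradient given by the equation
        x += dx
    return y

# dy/dx = y, y(0) = 1, whose exact solution is y = e^x, so y(1) = e = 2.71828...
for n in (10, 100, 1000):
    print(n, euler(lambda x, y: y, 0.0, 1.0, 1.0, n))
```

As n increases (δx decreases) the computed value of y(1) converges to e, in line with the limiting argument above.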

If F (x, y) and its derivatives Fx and Fy are continuous in a suitable neighbourhood

surrounding the initial point, defined in theorem 2.2 (page 81), then it can be shown

that a unique solution satisfying the initial condition y(a) = A exists. This is essentially the result developed by Cauchy in his lectures at the École Polytechnique between 1820 and 1830, see Ince (1956, page 76). The solution may not, however, exist in the desired

interval [a, b]. The following, more useful, result was obtained by Picard13 in 1893, and

shows how, in principle, a solution can be constructed.

Theorem 2.1

In a rectangular region D of the Oxy plane, a − h ≤ x ≤ a + h, A − H ≤ y ≤ A + H, if in D we can find positive numbers M and L such that
a) |F(x, y)| < M, and
b) |F(x, y₁) − F(x, y₂)| < L|y₁ − y₂|,
then the sequence of functions

y_{n+1}(x) = A + ∫_a^x dt F(t, y_n(t)),   y₀(x) = A,   n = 0, 1, · · · ,   (2.12)

converges to the unique solution of the initial value problem 2.11 in an interval about x = a in which conditions a) and b) are satisfied.

The proof of this theorem, valid for nth order equations, can be found in Ince¹⁴, Piaggio¹⁵ and a more modern treatment in Arnold¹⁶. In general this iterative formula results in very long expressions even after the first few iterations, or the integrals cannot be evaluated in closed form.
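The rapid lengthening of the iterates can be seen by generating them symbolically. The equation y′ = x + y², y(0) = 0, used below is an assumed example, chosen only because its integrals stay elementary:

```python
import sympy as sp

x, t = sp.symbols('x t')

# An assumed example (not from the text): dy/dx = x + y**2 with y(0) = 0
y = sp.Integer(0)                          # y0(x) = A = 0
for n in range(1, 4):
    # formula (2.12): y_{n+1}(x) = A + integral from 0 to x of F(t, y_n(t)) dt
    y = sp.integrate(t + y.subs(x, t)**2, (t, 0, x))
    print(n, sp.expand(y))
```

Each iteration roughly doubles the number of terms; for a less convenient F(x, y) the integrals quickly stop being expressible in closed form.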

Typically, if the integrals can be evaluated, equation 2.12 gives the solution as a

series in powers of x, but from this it is usually difficult to determine the radius of

convergence of the series, see for instance exercise 2.10. Even if it converges for all x, it

may be of little practical value when |x| is large: the standard example that illustrates

these difficulties is the Taylor series for sin x, which converges for all x, but for large |x|

is practically useless because of rounding errors, see section 1.3.8.

Exercise 2.9

Use the iterative formula 2.12 to find the infinite series solution of

dy/dx = y,   y(0) = 1,   x ≥ 0.

For which values of x does this solution exist?

Note that you will need to use induction to construct the infinite series.

13 Picard E 1893, J de Maths 9 page 217: a history of this development is given by Ince (1956, page 63).
14 Ince E L 1956, Ordinary Differential Equations, Dover, chapter 3.
15 Piaggio H T H 1962, An Elementary Treatise on Differential Equations and their applications, G Bell and Sons.
16 Arnold V I 1973, Ordinary Differential Equations, Translated and Edited by R A Silverman, The MIT Press.


Exercise 2.10

(a) Use the iterative formula 2.12 to show that the second iterate of the solution to

dy/dx = 1 + xy²,   y(0) = A,   x ≥ 0,

is

y(x) = A + x + (1/2)A²x² + (2/3)Ax³ + (1/4)(1 + A³)x⁴ + (1/5)A²x⁵ + (1/24)A⁴x⁶.

(b) An alternative method of obtaining this series is by direct calculation of the Taylor series, but for the same accuracy more work is usually required. Find the values of y′(0), y″(0) and y‴(0) directly from the differential equation and construct the third-order Taylor series of the solution.

(c) The differential equation shows that for x > 0, y(x) is a monotonic increasing function, so we expect that for sufficiently large x, xy(x)² ≫ 1, and hence that the solution will be given approximately by the equation y′ = xy². Use this approximation to deduce that y(x) → ∞ at some finite value of x. Explain the likely effect of this on the radius of convergence of this series. In exercise 2.21 (page 68) it is shown that for large A the singularity is approximately at x = √(2/A).
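A rough numerical check of this blow-up, with the assumed value A = 50 (so that the estimate √(2/A) = 0.2), can be made by stepping the equation with a small increment until y becomes very large:

```python
import math

# Crude Euler stepping of y' = 1 + x*y**2, y(0) = A, until y escapes
A = 50.0                        # a large assumed initial value
x, y, dx = 0.0, A, 1.0e-5
while y < 1.0e6 and x < 1.0:
    y += dx*(1 + x*y*y)
    x += dx
print(x, math.sqrt(2/A))        # escape point versus the estimate sqrt(2/A) = 0.2
```

The escape point lies close to the approximation √(2/A), as predicted.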

Equation 2.11 is separable if F(x, y) can be written in the form F = f(x)g(y), with f(x) depending only upon x and g(y) depending only upon y. Then the equation can be rearranged in the form of two integrals: the following expression also incorporates the initial condition

∫_A^y dv/g(v) = ∫_a^x du f(u).   (2.13)

Provided these integrals can be evaluated this gives a representation of the solution, although rarely in the convenient form y = h(x). This is named the method of separation of variables, and was used by both Leibniz and John Bernoulli at the end of the 17th century.

Sometimes a non-separable equation y′ = F(x, y) can be made separable if new dependent and independent variables, u and v, can be found such that it can be written in the form

du/dv = U(u)V(v),

with U(u) depending only upon u and V(v) depending only upon v.

A typical separable equation is

dy/dx = cos y sin x,   y(0) = A,

so its solution can be written in the form

∫_A^y dv/cos v = ∫_0^x du sin u,   that is   ln[ tan(y/2 + π/4) / tan(A/2 + π/4) ] = 1 − cos x,

which simplifies to y = −π/2 + 2 tan⁻¹[ exp(1 − cos x) tan(A/2 + π/4) ].
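This closed form can be spot-checked symbolically; the sample values x = 0.7, A = 0.3 used below are arbitrary:

```python
import sympy as sp

x, A = sp.symbols('x A')
y = -sp.pi/2 + 2*sp.atan(sp.exp(1 - sp.cos(x))*sp.tan(A/2 + sp.pi/4))

# residual of the equation dy/dx = cos(y)*sin(x), evaluated at a sample point
residual = sp.diff(y, x) - sp.cos(y)*sp.sin(x)
print(residual.subs({x: 0.7, A: 0.3}).evalf())   # effectively zero
print(y.subs({x: 0, A: 0.3}).evalf())            # the initial condition y(0) = A
```

Both the differential equation and the initial condition are satisfied to machine precision.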


Exercise 2.11

Use the method of separation of variables to find solutions of the following equations.
(a) (1 + x²)y′ = x(1 − y²),   (b) (1 + x)y′ − xy = x,
(c) (1 + x)y′ = x√(1 + y),  y(0) = 0,   (d) y′ = (1 + 2x + 2y)/(1 − 2x − 2y). Hint: define z = x + y.

A sub-class of equations that can be transformed into separable equations are those for which F(x, y) depends only upon the ratio y/x, rather than on x and y separately,

dy/dx = F(y/x),   y(a) = A.   (2.14)

Such equations are often named homogeneous equations. The general theory of this type of equation is developed in the following important exercise.

Exercise 2.12

(a) Show that by introducing the new dependent variable v(x) by the relation y = xv, equation 2.14 is transformed to the separable form

dv/dx = (F(v) − v)/x,   v(a) = A/a.

Use this transformation to find solutions of the following equations.
(b) y′ = exp(−y/x) + y/x,   (c) y′ = (x + 3y)/(3x + y),  y(1) = 0,
(d) x(x + y)y′ = x² + y²,   (e) y′ = (3x² − xy + 3y²)/(2x² + 3xy),
(f) y′ = (4x − 3y − 1)/(3x + 4y − 7). Hint: set x = ξ + a and y = η + b, where (a, b) is the point of intersection of the lines 4x − 3y = 1 and 3x + 4y = 7.

The equation

dy/dx + yP(x) = Q(x),   y(a) = A,   (2.15)

where P(x) and Q(x) are real functions of x only, is linear because y and y′ occur only to first-order. Its solution can always be expressed as an integral, by first finding a function, p(x), to write the equation as

d/dx ( y p(x) ) = Q(x)p(x),   y(a) = A,   (2.16)

which can be integrated directly. The unknown function, p(x), is found by expanding 2.16, dividing by p(x) and equating the coefficient of y(x) with that in the original equation. This gives

p′/p = P(x)   which integrates to   p(x) = exp( ∫ dx P(x) ).   (2.17)


The function p(x) is named the integrating factor: rather than remembering the formula for p(x) it is better to remember the idea behind the transformation, because similar ideas are used in other contexts.
Equation 2.16 integrates directly to give

y(x)p(x) = C + ∫ dx p(x)Q(x),   (2.18)

where C is an arbitrary constant: in this analysis there is no need to include an arbitrary constant in the evaluation of p(x).

This method produces a formula for the solution only if both integrals can be evaluated in terms of known functions. If this is not the case it is often convenient to write the solution in the form

y(x)p(x) = A + ∫_a^x dt Q(t)p(t),   p(t) = exp( ∫_a^t du P(u) ),   (2.19)

because this expression automatically satisfies the initial condition and the integrals can be evaluated numerically.
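A sketch of the whole procedure, for the assumed coefficients P(x) = 1 and Q(x) = x:

```python
import sympy as sp

x, C = sp.symbols('x C')
P, Q = sp.Integer(1), x            # hypothetical coefficients in y' + P(x)y = Q(x)

p = sp.exp(sp.integrate(P, x))     # integrating factor (2.17): here p(x) = e^x
y = (C + sp.integrate(p*Q, x))/p   # equation (2.18) rearranged for y

assert sp.simplify(sp.diff(y, x) + P*y - Q) == 0
print(sp.simplify(y))              # equivalent to x - 1 + C*exp(-x)
```

For these coefficients the general solution is y = x − 1 + C e^(−x); the assertion confirms it satisfies the original equation.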

Exercise 2.13

Use a suitable integrating factor to find solutions of the following equations. In each case show that the singularity in the solution is fixed and relate its position to properties of the coefficient functions.
(a) (x + 2)y′ + (x + 3)y = 4 exp(−x),
(b) y′ cos x + y sin x = 2 cos²x sin x,  y(0) = 0,
(c) x²y′ + 1 + (1 − 2x)y = 0.
(d) Without solving it, use the properties of the differential equation

cos²x dy/dx − y sin x cos x + (1 + cos²x) tan x = 0,   y(0) = 2,

to show that the solution is stationary at x = 0.
Find this solution and show that y(0) is a local maximum.

Exercise 2.14

Variation of Parameters

Another method of solving equation 2.15 is to use the method of variation of parameters, which involves finding a function, f(x), which is either a solution of part of the equation or a particular integral of the whole equation, expressing the required solution in the form y(x) = v(x)f(x), and finding a simpler differential equation for the unknown function v(x).
For equation 2.15, we first find the solution of

dy/dx + yP(x) = 0.   (2.20)

(a) Show that the solution of equation 2.20 with condition y(a) = 1 is

f(x) = exp( − ∫_a^x dt P(t) ).


(b) Assume that the solution of

dy/dx + yP(x) = Q(x),   y(a) = A,

can be written as y = v(x)f(x), v(a) = A, and show that f v′ = Q, and hence that the required solution is

y(x) = f(x) ( A + ∫_a^x dt Q(t)/f(t) ).

Relate this solution to that given by equation 2.19.

Exercise 2.15

Use the idea introduced in exercise 2.7 (page 59) to solve the differential equation

x d²y/dx² − dy/dx = 3x²,   y(1) = A,  y′(1) = A′.

Exercise 2.16

Use the idea introduced in exercise 2.8 (page 60) to solve the differential equation

d²y/dx² + ω²y = 0,   y(0) = A,  y′(0) = 0,  ω > 0.

Two Bernoulli brothers, James and John, and Leibniz studied the nonlinear, first-order equation

dy/dx + yP(x) = yⁿQ(x),   (2.21)

where n ≠ 1 is a constant and P(x) and Q(x) are functions only of x; this equation is now named Bernoulli's equation. The method used by John Bernoulli is to set z = y^(1−n), so that

dz/dx = ((1 − n)/yⁿ) dy/dx,

and equation 2.21 becomes

dz/dx + (1 − n)P(x)z = (1 − n)Q(x),   (2.22)

which is a first-order equation of the type treated in the previous section.

An example of such an equation is

x(x² − 1) dy/dx − y = x³y²,   y(2) = A.

By dividing through by x(x² − 1) we see that P(x) = −1/(x(x² − 1)), Q(x) = x²/(x² − 1) and n = 2.
Thus equation 2.22 becomes

dz/dx + z/(x(x² − 1)) = −x²/(x² − 1),   z = 1/y,   z(2) = 1/A.


The integrating factor is therefore

p(x) = exp( ∫ dx 1/(x(x² − 1)) ) = exp( ∫ dx [ 1/(2(x − 1)) + 1/(2(x + 1)) − 1/x ] ) = √(x² − 1)/x,

so the equation for z becomes

d/dx ( √(x² − 1) z/x ) = − x/√(x² − 1),   z(2) = 1/A,

which integrates to give

1/y = √3 ( 1 + 1/(2A) ) x/√(x² − 1) − x.

Exercise 2.17

Solve the following equations.
(a) y′ = 2y − xy²,  y(0) = 1,   (b) x(1 − x²)y′ + (2x² − 1)y = x²y³,
(c) y′ cos x − y sin x = y³ cos²x,  y(0) = 1,   (d) x³y′ = y(x² + y).

Jacopo Francesco, Count Riccati of Venice (1676 – 1754), introduced an important class

of first-order, nonlinear equations. Here we consider the most general of this type of

equation which was introduced by Euler, namely

dy/dx = P(x) + yQ(x) + y²R(x),   (2.23)

where P, Q and R are functions only of x. This equation is now named Riccati's equation¹⁷. If R(x) = 0 Riccati's equation is a linear equation of the type already considered, and if P(x) = 0 it reduces to Bernoulli's equation, so we ignore these cases.
Riccati's studies were mainly limited to the equations

dy/dx = ay² + bx^β   and   dy/dx = ay² + bx + cx²,

where a, b, c and β are constants. The first of these equations was introduced in Riccati's 1724 paper¹⁸.

be represented by known functions if β = −2 or β = −4k/(2k − 1), k = 1, 2, · · · , and in

1841 Liouville showed that for any other values of β its solution cannot be expressed as

an integral of elementary functions19 . The more general equation 2.23 was also studied

17 It was apparently D'Alembert who in 1770 first used the name 'Riccati's equation' for this equation.
18 Acta Eruditorum, Suppl. VIII 1724, pp. 66-73.
19 Here the term 'elementary function' has a specific meaning which is defined in the glossary.


by Euler. This equation has since appeared in many contexts, indeed whole books are

devoted to it and its generalisations: we shall meet it again in chapter 8.

This type of equation arose in Riccati's investigations into plane curves with radii of curvature solely dependent upon the ordinate. The radius of curvature, ρ, of a curve described by a function y(x), where x and y are Cartesian coordinates, is given by

1/ρ = y″(x) / ( 1 + y′(x)² )^(3/2).   (2.24)

This expression is derived in exercise 2.66. Thus if ρ depends only upon the ordinate, y, we would have a second-order equation f(y, y′, y″) = 0, which does not depend explicitly upon x. Such equations can be converted to first-order equations by the simple device of regarding y as the independent variable: define p = dy/dx and express y″(x) in terms of p and p′(y), using the chain rule as follows,

d²y/dx² = dp/dx = (dp/dy)(dy/dx) = p dp/dy.

Thus the second-order equation f(y, y′, y″) = 0 is reduced to the first-order equation f(y, p(y), p′(y)) = 0. Riccati chose particular functions to give the equations quoted at the beginning of this section, but note that the symbols have changed their meaning.

Exercise 2.18

If a function y(x) can be expressed as the ratio

y = ( c g(x) + G(x) ) / ( c f(x) + F(x) ),

where c is a constant and g, G, f and F are differentiable functions of x, by eliminating the constant c from this equation and its first derivative, show that y satisfies a Riccati equation.

Later we shall see that all solutions of Riccati’s equations can be expressed in this

form.

Riccati's equation is an unusual nonlinear equation because it can be converted to a linear, second-order equation by defining a new dependent variable u(x) with the equation

y = − (1/(uR)) du/dx   (2.25)

to give, assuming R(x) ≠ 0 in the interval of interest,

d²u/dx² − ( Q + R′/R ) du/dx + PRu = 0,   (2.26)

which is a linear, second-order equation.
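The transformation can be verified on an assumed special case, P = 1, Q = 0, R = −1, for which equation 2.26 reduces to u″ − u = 0:

```python
import sympy as sp

x = sp.symbols('x')

# Assumed case: P = 1, Q = 0, R = -1, i.e. the Riccati equation y' = 1 - y**2.
# Equation (2.26) then reads u'' - u = 0, since R'/R = 0 and PR = -1.
u = sp.cosh(x)                 # one solution of the linear equation u'' - u = 0
y = -sp.diff(u, x)/(u*(-1))    # transformation (2.25) with R = -1: y = u'/u = tanh(x)

assert sp.simplify(sp.diff(y, x) - (1 - y**2)) == 0
```

Each solution of the linear equation for u yields, through 2.25, a solution of the nonlinear Riccati equation.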

Exercise 2.19

Derive equation 2.26.


Exercise 2.20

(a) Consider the equation

p₂(x) d²u/dx² + p₁(x) du/dx + p₀(x)u = 0.

By introducing a new function y(x) by u = exp( ∫ dx y ) show that y satisfies the Riccati equation

dy/dx = − p₀/p₂ − (p₁/p₂) y − y².

(b) The general solution of the second-order equation for u has two arbitrary constants. The general solution of the first-order equation for y has one arbitrary constant. Explain how this contradiction can be resolved.

Exercise 2.21

(a) Show that the Riccati equation considered in exercise 2.10 (page 62),

dy/dx = 1 + xy²,   y(0) = A,

has the associated linear equation

x d²u/dx² − du/dx + x²u = 0,   where   y = − (1/(xu)) du/dx.

(b) Substitute the series u = Σ_{k=0} a_k x^k into this equation to show that a_{3k+1} = 0, k = 0, 1, · · · , and that the coefficients satisfy the recurrence relation a_{m+1} = −a_{m−2}/(m² − 1).
By choosing (a₀, a₂) = (1, 0) and (a₀, a₂) = (0, 1) obtain the two independent solutions

u₁(x) = 1 + a₃x³ + a₆x⁶ + · · · + a_{3k}x^{3k} + · · · ,
u₂(x) = x² + b₅x⁵ + b₈x⁸ + · · · + b_{3k+2}x^{3k+2} + · · · ,

where

a_{3k} = (−1)^k / [ (2² − 1)(5² − 1) · · · ((3k − 1)² − 1) ]   and   b_{3k+2} = (−1)^k / [ (4² − 1)(7² − 1) · · · ((3k + 1)² − 1) ].

Deduce that the radii of convergence of the series for u₁(x) and u₂(x) are infinite.
(c) Show that the solution of the original Riccati equation is

y(x) = − ( u₁′(x) − A u₂′(x)/2 ) / ( x (u₁(x) − A u₂(x)/2) ).

By considering the denominator show that for large A the singularity in y(x) is at x = √(2/A), approximately.


Euler noted that if a particular solution v(x) is known then the substitution y = v + 1/z gives a linear equation for z(x), from which the general solution can be constructed. Substituting y = v + 1/z into Riccati's equation gives

v′ − z′/z² = P + Qv + Rv² + Q/z + 2Rv/z + R/z²,

which simplifies to

z′ + P₁z = −R   where   P₁ = Q + 2Rv.   (2.27)

This is a linear, first-order equation that can be solved using the methods previously discussed. There are a number of special values for P, Q and R for which this method yields the general solution in terms of an integral: for completeness these are listed in table 2.1. You are not expected to remember this table.

Table 2.1: A list of the coefficients for Riccati's equation, y′ = P(x) + Q(x)y + R(x)y², for which a simple particular integral, v(x), can be found. In this list λ is a real number and n an integer, but in some cases it may be a real number.
Cases 7 and 13 have two particular integrals and this allows the general solution to be expressed as an integral, see equation 2.28.
Case 16 is special, because the transformation z = xⁿy makes the equation separable, see exercise 2.26.
Case 17 has an explicit solution if n = −2 and reduces to a Bessel function if n ≠ −2, see exercise 2.28.

Case  P(x)                                   Q(x)               R(x)           v(x)
 1    −a(a + f(x))                           f(x)               1              a
 2    −b(a + bf(x))                          a                  f(x)           b
 3    f(x)                                   xf(x)              1              −1/x
 4    anx^(n−1)                              −axⁿf(x)           f(x)           axⁿ
 5    anx^(n−1) − a²x^(2n)f(x)               0                  f(x)           axⁿ
 6    −f(x)                                  x^(n+1)f(x)        −(n+1)xⁿ       x^(−n−1)
 7    ax^(2n−1)f(x)                          n/x                f(x)/x         ±√(−a) xⁿ
 8    −a²f(x) − ag(x)                        g(x)               f(x)           a
 9    −a²x^(2n)f(x) − axⁿg(x) + anx^(n−1)    g(x)               f(x)           axⁿ
10    λf(x)                                  ae^(λx)f(x)        ae^(λx)        −λe^(−λx)/a
11    aλe^(λx)                               −ae^(λx)f(x)       f(x)           ae^(λx)
12    aλe^(λx) − a²e^(2λx)f(x)               0                  f(x)           ae^(λx)
14    f′(x) − f(x)²                          0                  1              f(x)
15    g′(x)                                  −f(x)g(x)          f(x)           g(x)
16    bf(x)/x                                (axⁿf(x) − n)/x    x^(2n−1)f(x)
17    bxⁿ                                    0                  a


Exercise 2.22

Use the method described in this section to find the solutions of the following equations using the form of the particular solution, v(x), suggested, where a, b are constants to be determined.
(a) y′ = xy² + (1 − 2x)y + x − 1,  y(0) = 1/2,  v = a,
(b) y′ = 1 + x − x³ + 2x²y − xy²,  y(1) = 1,  v = ax + b,
(c) 2y′ = 1 + (y/x)²,  y(1) = 1,  v = ax + b,
(d) 2y′ = (1 + e^x)y + y² − e^x,  y(0) = −1,  v = a e^(bx).

Exercise 2.23

For the equation

dy/dx = −a(a + f(x)) + f(x)y + y²,

which is case 1 of table 2.1, show that the general solution is

y = a + p(x) / ( C − ∫ dx p(x) ),   p(x) = exp( 2ax + ∫ dx f(x) ).

Exercise 2.24

Decide which of the cases listed in table 2.1 corresponds to the equation

dy/dx = 1 − xy + y².

Find the general solution in terms of an integral, and the solution for the condition

y(0) = a.

If two particular integrals, v₁(x) and v₂(x), are known then the general solution can be expressed as an integral of a known function. Suppose that y is the unknown, general solution: then from the defining equations,

y′ − v_k′ = Q(y − v_k) + R(y² − v_k²),   k = 1, 2,

and hence

(y′ − v₁′)/(y − v₁) − (y′ − v₂′)/(y − v₂) = (v₁ − v₂)R.

This equation can be integrated directly to give

ln( (y − v₁)/(y − v₂) ) = ∫ dx (v₁ − v₂)R,   (2.28)

which determines the general solution in terms of an integral of known functions.
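A sketch of this construction for the assumed equation y′ = −2/x² + y², which has the two particular integrals 1/x and −2/x:

```python
import sympy as sp

x, B = sp.symbols('x B')
Y = sp.Symbol('Y')

# Assumed Riccati equation: y' = -2/x**2 + y**2, so P = -2/x**2, Q = 0, R = 1
v1, v2 = 1/x, -2/x              # two particular integrals, found from the trial y = a/x
for v in (v1, v2):
    assert sp.simplify(sp.diff(v, x) + 2/x**2 - v**2) == 0

# (2.28): ln((y - v1)/(y - v2)) = ∫(v1 - v2)*1 dx = 3 ln x, so (y - v1)/(y - v2) = B*x**3
y = sp.solve(sp.Eq((Y - v1)/(Y - v2), B*x**3), Y)[0]
assert sp.simplify(sp.diff(y, x) + 2/x**2 - y**2) == 0
print(sp.simplify(y))
```

Solving the algebraic relation for y gives the general solution, with B the arbitrary constant.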


Exercise 2.25

Using the trial function y = Ax^a, where A and a are constants, for each of the following equations find two particular integrals and hence the general solution.
(a) x² dy/dx + 2 + x²y² = 2xy,
(b) (x² − 1) dy/dx + x + 1 − (x² + 1)y + (x − 1)y² = 0,
(c) x² dy/dx = 2 − x²y².

Exercise 2.26

Show that the equation

dy/dx = b²/(x(1 + x²)) − (2/x) y + x³y²/(1 + x²)

is an example of case 16 of table 2.1 and hence find its general solution.

Exercise 2.27

Use case 7 to show that the particular and general solutions of

dy/dx = −A²x^(2n−1)f(x) + (n/x) y + (f(x)/x) y²

are y = ±Axⁿ and

y = Axⁿ ( 1 + B exp(F(x)) ) / ( 1 − B exp(F(x)) ),   where   F(x) = 2A ∫ dx x^(n−1)f(x).

Exercise 2.28

This exercise is about case 17, that is, the Riccati equation

dy/dx = bxⁿ + ay².

(a) Using the transformation y = −w′/(aw) transform this equation to the linear equation

d²w/dx² + abxⁿw = 0.

Using the further transformation z = x^α of the independent variable, and w(z) = z^β u(z) of the dependent variable, show that u(z) satisfies the equation

z² d²u/dz² + (2β + 1 − 1/α) z du/dz + ( β² − β/α + (ab/α²) z^((n+2)/α) ) u = 0.

Choosing the coefficient of zu′ to be unity and α such that (n + 2) = 2α, show that

z² d²u/dz² + z du/dz + ( (4ab/(n + 2)²) z² − 1/(n + 2)² ) u = 0,   n ≠ −2.


Deduce that the general solution of this equation is

u(z) = A J_{1/(n+2)}( (2√(ab)/(n + 2)) z ) + B Y_{1/(n+2)}( (2√(ab)/(n + 2)) z ),   z = x^((n+2)/2),

where J_ν(ξ) and Y_ν(ξ) are the two ordinary Bessel functions satisfying Bessel's equation

ξ² d²w/dξ² + ξ dw/dξ + (ξ² − ν²)w = 0,

and α = (n + 2)/2 and β = 1/(n + 2).

(b) If n = −2 show that solutions of the equation for w(x) are w = Bx^γ, where B is an arbitrary constant and γ are the solutions of γ² − γ + ab = 0. Deduce that particular solutions of the original Riccati equation are y = −γ/(ax) and hence that its general solution is

y = ( Aγ₂ − γ₁x^d ) / ( ax(x^d − A) ),   d = √(1 − 4ab),   γ_{1,2} = (1 ± d)/2.

2.4 Second-order equations

2.4.1 Introduction

In this section we introduce some aspects of linear, second-order equations, which frequently arise in the description of physical systems. There are two themes to this section: first, in section 2.4.2 we discuss some important general properties of linear equations, which are largely due to linearity and which make this type of equation much easier to deal with than nonlinear equations: this discussion is continued in chapter 13 where we shall see that many properties of the solutions of some equations can be determined without finding explicit solutions. Second, in section 2.4.4 we describe various 'tricks' to find solutions for particular types of equation.

In this section we describe some of the general properties of linear, second-order differential equations. The equation we consider is the inhomogeneous equation,

p₂(x) d²y/dx² + p₁(x) dy/dx + p₀(x)y = h(x),   a ≤ x ≤ b,   (2.29)

where the coefficients p_k(x), k = 0, 1, 2, are real and assumed to be continuous for x ∈ (a, b). The interval (a, b) may be finite or infinite.

The nature of the solutions depends upon p₂(x), the coefficient of y″(x). The theory is valid in intervals for which p₂(x) ≠ 0 and for which p₁/p₂ and p₀/p₂ are continuous. If p₂(x) = 0 at some point x = c the equation is said to be singular at x = c, or to have a singular point. Singular points, when they exist, always define the ends of intervals of definition; hence we may always choose p₂(x) ≥ 0 for x ∈ [a, b].


The associated homogeneous equation is obtained by setting h(x) = 0,

p₂(x) d²y/dx² + p₁(x) dy/dx + p₀(x)y = 0,   a ≤ x ≤ b.   (2.30)

All homogeneous equations have the trivial solution y(x) = 0, for all x. Solutions that do not vanish identically are called nontrivial.
Equations 2.29 and 2.30 can be transformed into other forms which are more useful. The two most useful changes are dealt with in exercise 2.31; the first of these is important for the general theory discussed in this course and the second is particularly useful for certain types of approximations.

The solutions of equations 2.29 and 2.30 satisfy the following properties.

P1: Solutions of the homogeneous equation satisfy the superposition principle:

that is if f (x) and g(x) are solutions of equation 2.30 then so is any linear com-

bination

y(x) = c1 f (x) + c2 g(x)

where c1 and c2 are any constants.

P2: Uniqueness of the initial value problem. If p₁/p₂ and p₀/p₂ are continuous for x ∈ [a, b] then at most one solution of equation 2.29 can satisfy the given initial conditions y(a) = α₀, y′(a) = α₁, theorem 2.2 (page 81).
P3: If f(x) and g(x) are solutions of the homogeneous equation 2.30 and if, for some x = ξ, the vectors (f(ξ), f′(ξ)) and (g(ξ), g′(ξ)) are linearly independent, then every solution of equation 2.30 can be written as a linear combination of f(x) and g(x),

y(x) = c₁ f(x) + c₂ g(x).

The two functions f (x) and g(x) are said to form a basis of the differential equa-

tion.

P4: The general solution of the inhomogeneous equation 2.29 is given by the sum of

any particular solution and the general solution of the homogeneous equation 2.30.

Finally we observe that an ordinary point, x0 , is where p1 (x)/p2 (x) and p0 (x)/p2 (x)

can be expanded as a Taylor series about x0 , and that at every ordinary point the

solutions of the homogeneous equation 2.30 can also be represented by a Taylor series.

It is common, however, for either or both of p₁(x)/p₂(x) and p₀(x)/p₂(x) to be singular at some point x₀: such points are named singular points, and they are divided into two classes. If (x − x₀)p₁(x)/p₂(x) and (x − x₀)²p₀(x)/p₂(x) can be expanded as a Taylor series, the singular point is regular: otherwise it is irregular. Irregular singular

points do not occur frequently in physical problems but, for the geometric reasons

discussed in chapter 13, regular singular points are common. For ordinary and regular

singular points there is a well developed and important theory of deriving the series

representation for the solutions of the homogeneous equation, but this is not relevant for

this course; good treatments can be found in Ince20 , Piaggio21 and Simmons22 . There

is no equivalent theory for nonlinear equations.

20 Ince E L 1956, Ordinary Differential Equations, chapter XVI (Dover).
21 Piaggio H T H 1968, An Elementary Treatise on Differential Equations, chapter IX, G Bell and Sons, first published in 1920.
22 Simmons G F 1981, Differential Equations, chapter 5, McGraw-Hill Ltd.


Exercise 2.29

Use property P2 to show that if a nontrivial solution y(x) of equation 2.30 is zero at x = ξ, then y′(ξ) ≠ 0, that is the zeros of the solutions are simple.

Exercise 2.30

Consider the two vectors x = (x₁, x₂) and y = (y₁, y₂) in the Cartesian plane. Show that they are linearly independent, that is not parallel, if the determinant

| x₁  x₂ |
| y₁  y₂ |  = x₁y₂ − x₂y₁ ≠ 0.

Exercise 2.31

Consider the second-order, homogeneous, linear differential equation

p₂(x) d²y/dx² + p₁(x) dy/dx + p₀(x)y = 0.

(a) Show that it may be put in the canonical form

d/dx ( p(x) dy/dx ) + q(x)y = 0,   (2.31)

where p(x) = exp( ∫ dx p₁(x)/p₂(x) ) and q(x) = (p₀(x)/p₂(x)) p(x).

Equation 2.31 is known as the self-adjoint form and this transformation shows that most linear, second-order, homogeneous differential equations may be cast into this form: the significance of this transformation will become apparent in chapter 13.
(b) By putting y = uv, with a judicious choice of the function v(x), show that equation 2.31 may be cast into the form

d²u/dx² + I(x)u = 0,   u = y√p,   (2.32)

where I(x) = ( p′² + 4qp − 2pp″ ) / (4p²). Equation 2.32 is sometimes known as the normal form and I(x) the invariant of the original equation.

In property P3 we introduced the vectors (f, f′) and (g, g′) and in exercise 2.30 it was shown that these vectors are linearly independent if

W(f, g; x) = f(x)g′(x) − f′(x)g(x) ≠ 0.   (2.33)

The function W(f, g; x) is named the Wronskian²³ of the functions f(x) and g(x). This notation for the Wronskian shows which functions are used to construct it and the

23 Josef Hoëné (1778 – 1853) was born in Poland, moved to France and became a French citizen in 1800. He moved to Paris in 1810 and adopted the name Josef Hoëné de Wronski at about that time, just after he married.


independent variable, but the abbreviated notation W(x) or W(f, g) is freely used.

If W(f, g; x) ≠ 0 for a < x < b the functions f(x) and g(x) are said to be linearly independent in (a, b); alternatively if W(f, g; x) = 0 they are linearly dependent. These rules apply only to sufficiently smooth functions.

The Wronskian of any two solutions, f and g, of equation 2.30 satisfies the identity

W(f, g; x) = W(f, g; a) exp( − ∫_a^x dt p₁(t)/p₂(t) ).   (2.34)

This identity is proved in exercise 2.36 by showing that W (x) satisfies a first-order

differential equation and solving it. Because the right-hand side of equation 2.34 always

has the same sign, it follows that the Wronskian of two solutions is either always positive,

always negative or always zero. Thus, if f and g are linearly independent at one point

of the interval (a, b) they are linearly independent at all points of (a, b). Conversely, if

W (f, g) vanishes anywhere it vanishes everywhere. Further, if p1 (x) = 0 the Wronskian

is constant.

The Wronskian can be used with one known solution to construct another. Suppose that f(x) is a known solution and let g(x) be another (unknown) solution. The equation for W(x) can be interpreted as a first-order equation for g,

g′f − gf′ = W(x),

and, because g′f − gf′ = f² d/dx (g/f), this equation, with 2.34, can be written in the form

d/dx (g/f) = ( W(a)/f(x)² ) exp( − ∫_a^x dt p₁(t)/p₂(t) ),

having the general solution

g(x) = f(x) [ C + W(a) ∫_a^x ds (1/f(s)²) exp( − ∫_a^s dt p₁(t)/p₂(t) ) ],   (2.35)

where C is an arbitrary constant.
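As a check of equation 2.35, take the assumed known solution f(x) = cos x of y″ + y = 0; since p₁ = 0 the exponential factor is constant and the formula reduces to g = f ∫ ds/f(s)²:

```python
import sympy as sp

x, s = sp.symbols('x s')

f = sp.cos(x)    # a known solution of y'' + y = 0 (assumed example; here p1 = 0)
# With p1 = 0 the exponential factor in (2.35) is constant, so a second solution is
# g = f * integral of ds/f(s)**2 (the lower limit only changes g by a multiple of f)
g = f*sp.integrate(1/sp.cos(s)**2, s).subs(s, x)

assert sp.simplify(sp.diff(g, x, 2) + g) == 0   # g also solves y'' + y = 0
assert sp.simplify(g - sp.sin(x)) == 0          # g reduces to the expected solution sin(x)
```

The construction recovers sin x, which is indeed linearly independent of cos x.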

Exercise 2.32

If F(z) is a differentiable function and g = F(f), with f(x) a differentiable, non-constant function of x, show that W(f, g) = 0 only if g(x) = cf(x) for some constant c.

Exercise 2.33

Show that the functions a₁ sin x + a₂ cos x and b₁ sin x + b₂ cos x are linearly independent if a₁b₂ ≠ a₂b₁.

Exercise 2.34

Use equation 2.35 to show that if f(x) is any nontrivial solution of the equation y″ + q(x)y = 0 for a < x < b, then another solution is

g(x) = f(x) ∫_a^x ds/f(s)².


Exercise 2.35

(a) If f and g are linearly independent solutions of the homogeneous differential equation y″ + p₁(x)y′ + p₀(x)y = 0, show that

p₁(x) = − ( f g″ − g f″ ) / W(f, g; x)   and   p₀(x) = ( f′g″ − g′f″ ) / W(f, g; x).

(b) Use these expressions to find the differential equations having the following bases of solutions:
(i) (x, sin x),  (ii) (x^a, x^(a+b)),  (iii) (x, e^(ax)),
where a and b are real numbers. Determine any singular points of these equations, and in case (ii) consider the limit b = 0.

Exercise 2.36

By differentiating the Wronskian W(f, g; x), where f and g are linearly independent solutions of equation 2.30, show that it satisfies the first-order differential equation

dW/dx = − ( p₁(x)/p₂(x) ) W,

and hence derive equation 2.34.

A linear, second-order equation with constant coefficients has the form

a₂ d²y/dx² + a₁ dy/dx + a₀y = h(x),   (2.36)

where a_k, k = 0, 1, 2, are real constants, h(x) a real function of only x and, with no loss of generality, a₂ > 0. Normally this type of equation is solved by finding the general solution of the homogeneous equation,

a₂ d²y/dx² + a₁ dy/dx + a₀y = 0,   (2.37)

which contains two arbitrary constants, and adding to this any particular solution of the original inhomogeneous equation, defined by equation 2.36.
The first part of this process is trivial because, for any constant λ, the nth derivative of exp(λx) is λⁿ exp(λx), that is, a constant multiple of the original function. Thus if we substitute y = exp(λx) into the homogeneous equation a quadratic equation for λ is obtained,

a₂λ² + a₁λ + a₀ = 0.   (2.38)

This has two roots, λ₁ and λ₂, and provided λ₁ ≠ λ₂ we have two independent solutions, giving the general solution

y(x) = c₁ exp(λ₁x) + c₂ exp(λ₂x).   (2.39)

If λ₁ and λ₂ are real, so are the constants c₁ and c₂. If the roots are complex then λ₁ = λ₂* and, to obtain a real solution, we need c₁ = c₂*. The case λ₁ = λ₂ is special and will be considered after the next exercise.
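A sketch of the whole procedure for the assumed equation y″ − 3y′ + 2y = 0, whose auxiliary quadratic has the distinct roots 1 and 2:

```python
import sympy as sp

x, L = sp.symbols('x L')
c1, c2 = sp.symbols('c1 c2')

a2, a1, a0 = 1, -3, 2                     # assumed constants: y'' - 3y' + 2y = 0
roots = sp.solve(a2*L**2 + a1*L + a0, L)  # the quadratic (2.38)
y = c1*sp.exp(roots[0]*x) + c2*sp.exp(roots[1]*x)

assert set(roots) == {1, 2}
assert sp.simplify(a2*sp.diff(y, x, 2) + a1*sp.diff(y, x) + a0*y) == 0
```

The two exponentials exp(x) and exp(2x) form a basis, and the general solution contains the two arbitrary constants c₁ and c₂.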


Exercise 2.37

Find real solutions of the following constant coefficient, differential equations: if no initial or boundary values are given find the general solution. Here ω and k are real.
(a) y″ + 5y′ + 6y = 0,
(b) 4y″ + 8y′ + 3y = 0,
(c) y″ + y′ + y = 0,
(d) y″ + 4y′ + 5y = 0,  y(0) = 0,  y′(0) = 2,
(e) y″ + 6y′ + 13y = 0,  y(0) = 2,  y′(0) = 1,
(f) y″ + ω²y = 0,  y(0) = a,  y′(0) = b,
(g) y″ − ω²y = 0,  y(0) = a,  y′(0) = b,
(h) y″ + 2ky′ + (ω² + k²)y = 0.

Repeated roots

If the roots of equation 2.38 are identical, that is a_1^2 = 4a_0a_2, then λ = −a_1/(2a_2) and the above method yields only one solution, y = exp(λx). The other solution of 2.37 is found

using the method of variation of parameters, introduced in exercise 2.14. Assuming

that the other solution is y = v(x) exp(λx), where v(x) is an unknown function, and

substituting into equation 2.37 gives

\frac{d^2v}{dx^2} = 0.   (2.40)

Hence v(x) = c_1 + c_2 x and the general solution is

y_g(x) = (c_1 + c_2 x)\exp(\lambda x),  \quad \lambda = -\frac{a_1}{2a_2},  \quad (a_1^2 = 4a_0a_2).   (2.41)
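As a quick numerical sanity check (illustrative, not from the notes), the repeated-root solution of equation 2.41 can be substituted back into the differential equation using central finite differences; the residual should be zero to within discretisation error.

```python
import math

def repeated_root_solution(a2, a1, c1, c2, x):
    """y = (c1 + c2*x)*exp(lam*x) with lam = -a1/(2*a2), equation 2.41."""
    lam = -a1 / (2 * a2)
    return (c1 + c2 * x) * math.exp(lam * x)

def residual(a2, a1, a0, f, x, h=1e-4):
    """Central-difference estimate of a2*f'' + a1*f' + a0*f at x."""
    f0, fp, fm = f(x), f(x + h), f(x - h)
    d1 = (fp - fm) / (2 * h)
    d2 = (fp - 2 * f0 + fm) / (h * h)
    return a2 * d2 + a1 * d1 + a0 * f0

# y'' + 2y' + y = 0 has a1**2 = 4*a0*a2, so (c1 + c2*x)*exp(-x) should work.
y = lambda x: repeated_root_solution(1, 2, 3.0, -1.5, x)
r = residual(1, 2, 1, y, 0.4)
```

The residual `r` is limited only by the finite-difference step, not by the formula itself.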

Exercise 2.38

Derive equation 2.40.

Exercise 2.39

Find the solutions of

(b) y'' + 2y' + y = 0,  y(0) = a,  y(X) = b.

78 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS

The general solution of the inhomogeneous equation

a_2\frac{d^2y}{dx^2} + a_1\frac{dy}{dx} + a_0 y = h(x)   (2.42)

can be written as the sum of the general solution of the homogeneous equation and

any particular integral of the inhomogeneous equation. This is true whether or not the

coefficients ak , k = 0, 1 and 2, are constant: but here we consider only the simpler

constant coefficient case. Boundary or initial conditions must be applied to this sum,

not the component parts.

There are a variety of methods for attempting to find a particular integral. The

problem can sometimes be made simpler by splitting h(x) into a sum of simpler terms,

h = h_1 + h_2, and finding particular integrals y_1 and y_2 for h_1 and h_2: because the equation is

linear the required particular integral is y1 + y2 .

Sometimes the integral can be found by a suitable guess. Thus if h(x) = x^n, n being a positive integer, we expect a particular integral to have the form \sum_{k=0}^{n} c_k x^k. By substituting this into equation 2.42 and equating the coefficients of x^k on both sides, n + 1 equations for the n + 1 coefficients are obtained.
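The system obtained by equating powers of x is triangular, so it can be solved by back-substitution starting from the coefficient of x^n. The sketch below (illustrative; it assumes a_0 ≠ 0, otherwise the guess must be modified) implements this directly.

```python
def poly_particular(a2, a1, a0, n):
    """Coefficients c[0..n] of a polynomial particular integral of
    a2*y'' + a1*y' + a0*y = x**n, found by equating powers of x
    from x**n downwards.  Assumes a0 != 0."""
    c = [0.0] * (n + 1)
    for k in range(n, -1, -1):
        rhs = 1.0 if k == n else 0.0
        if k + 1 <= n:                      # contribution of a1*y' to x**k
            rhs -= a1 * (k + 1) * c[k + 1]
        if k + 2 <= n:                      # contribution of a2*y'' to x**k
            rhs -= a2 * (k + 2) * (k + 1) * c[k + 2]
        c[k] = rhs / a0
    return c

# y'' + y = x**2 gives y_p = x**2 - 2 (compare exercise 2.40 with omega = 1).
coeffs = poly_particular(1, 0, 1, 2)
```

For the example shown, `coeffs` comes out as `[-2.0, 0.0, 1.0]`, that is y_p = x² − 2.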

Exercise 2.40

Find the general solution of

\frac{d^2y}{dx^2} + \omega^2 y = x^2,  \quad \omega > 0,

and find the solution that satisfies the initial conditions y(0) = a, y 0 (0) = b.

If h(x) = e^{\mu x}, for some constant μ, substituting the trial function y = Ae^{\mu x} shows that, provided a_2\mu^2 + a_1\mu + a_0 \neq 0, a particular integral is e^{\mu x}/(a_2\mu^2 + a_1\mu + a_0). But if a_2\mu^2 + a_1\mu + a_0 = 0 we can use the method of variation of parameters by substituting y = v(x)e^{\mu x} into the equation, to form a simpler equation for v(x). These calculations form the basis of the next exercise.

Exercise 2.41

(a) For the equation

a_2\frac{d^2y}{dx^2} + a_1\frac{dy}{dx} + a_0 y = e^{\mu x},

by substituting the function y = Aeµx , A a constant, into the equation find a

particular integral if a2 µ2 + a1 µ + a0 6= 0.

(b) If a2 µ2 + a1 µ + a0 = 0, put y = v(x)eµx and show that v satisfies the equation

a_2\frac{d^2v}{dx^2} + (2\mu a_2 + a_1)\frac{dv}{dx} = 1,

and that this equation has the general solution

v = \frac{x}{2\mu a_2 + a_1} + B - \frac{A a_2}{2\mu a_2 + a_1}\,e^{-x(2\mu + a_1/a_2)}.

Hence show that a particular integral is y = \frac{x\,e^{\mu x}}{2\mu a_2 + a_1}.


Exercise 2.42

Find the solutions of the following inhomogeneous equations with the initial con-

ditions y(0) = a, y 0 (0) = b.

(a) \frac{d^2y}{dx^2} + y = e^{ix},  (b) \frac{d^2y}{dx^2} - y = \sin x,  (c) \frac{d^2y}{dx^2} - 4y = 6,

(d) \frac{d^2y}{dx^2} + 9y = 1 + 2x,  (e) \frac{d^2y}{dx^2} - \frac{dy}{dx} - 6y = 14\sin 2x + 18\cos 2x.

The problem of finding a particular integral of equation 2.42 can always be reduced to an integral. This transformation is achieved by applying the method of variation of parameters using two linearly independent solutions of the homogeneous equation, which we denote by f(x) and g(x). We assume that the solution of the inhomogeneous equation can be written in the form

y(x) = c_1(x)f(x) + c_2(x)g(x),   (2.43)

where c1 (x) and c2 (x) are unknown functions, to be found. It transpires that both of

these are given by separable, first-order equations; but the analysis to derive this result

is a bit involved.

By substituting this expression into the differential equation, it becomes

a_2\left(c_1''f + 2c_1'f' + c_1 f'' + c_2''g + 2c_2'g' + c_2 g''\right) + a_1(c_1'f + c_1 f') + a_1(c_2'g + c_2 g') + a_0(c_1 f + c_2 g) = h(x).

We expect this expression to simplify because f and g satisfy the homogeneous equation: some re-arranging gives

c_1(a_2 f'' + a_1 f' + a_0 f) + c_2(a_2 g'' + a_1 g' + a_0 g)
  + a_2(c_1''f + 2c_1'f') + a_1 c_1'f
  + a_2(c_2''g + 2c_2'g') + a_1 c_2'g = h(x).

The first line of this expression is identically zero; the second line can be written in the form

a_2(c_1''f + c_1'f') + a_2 c_1'f' + a_1 c_1'f = a_2(c_1'f)' + a_2(c_1'f') + a_1(c_1'f),

and similarly for the third line. Adding these two expressions we obtain

a_2(c_1'f + c_2'g)' + a_2(c_1'f' + c_2'g') + a_1(c_1'f + c_2'g) = h(x).   (2.44)

This identity will hold if c1 and c2 are chosen to satisfy the two equations

c_1'f + c_2'g = 0,
c_1'f' + c_2'g' = \frac{h(x)}{a_2}.   (2.45)

Any solutions of these equations will yield a particular integral.

For each x, these are linear equations in c_1' and c_2', and since the Wronskian W(f, g) \neq 0, for any x, they have unique solutions given by

c_1'(x) = -\frac{h g}{a_2 W(f, g)},  \quad c_2'(x) = \frac{h f}{a_2 W(f, g)}.   (2.46)


Integrating these gives a particular integral. Notice that in this derivation, at no point

did we need to assume that the coefficients a0 , a1 and a2 are constant. Hence this result

is true for the general case, when these coefficients are not constant, although it is then

more difficult to find the solutions, f and g, of the homogeneous equation.

As an example we re-consider the problem of exercise 2.40, for which two linearly independent solutions are f = cos ωx and g = sin ωx, giving W(f, g) = ω, and equations 2.46 give

c_1 = -\frac{1}{\omega}\int dx\, x^2\sin\omega x = \left(\frac{x^2}{\omega^2} - \frac{2}{\omega^4}\right)\cos\omega x - \frac{2x}{\omega^3}\sin\omega x,

c_2 = \frac{1}{\omega}\int dx\, x^2\cos\omega x = \left(\frac{x^2}{\omega^2} - \frac{2}{\omega^4}\right)\sin\omega x + \frac{2x}{\omega^3}\cos\omega x.

Thus

y = c_1\cos\omega x + c_2\sin\omega x + \frac{x^2}{\omega^2} - \frac{2}{\omega^4},

the result obtained previously, although the earlier method was far easier.
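The particular integral produced by this calculation is easy to verify directly, since y_p = x²/ω² − 2/ω⁴ has the exact second derivative 2/ω². The short check below (illustrative only) confirms that y_p'' + ω²y_p − x² vanishes.

```python
def particular_residual(w, x):
    """Residual of y'' + w**2*y = x**2 for y_p = x**2/w**2 - 2/w**4.
    Here y_p'' = 2/w**2 exactly, so no numerical differentiation is needed."""
    yp = x * x / w**2 - 2.0 / w**4
    ypp = 2.0 / w**2
    return ypp + w * w * yp - x * x   # should be zero

r = particular_residual(1.7, 0.3)
```

The cancellation is exact apart from floating-point rounding, for any ω > 0 and any x.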

Exercise 2.43

Find the general solution of the equation \frac{d^2y}{dx^2} + y = \tan x,  0 \le x < \frac{\pi}{2}.

The linear, second-order differential equation

a_2 x^2\frac{d^2y}{dx^2} + a_1 x\frac{dy}{dx} + a_0 y = 0,  \quad a_2 > 0,   (2.47)

where the coefficients a0 , a1 and a2 are constants, is named a (homogeneous) Euler

equation, of second order. This equation is normally defined on an interval of the x-

axis which does not include the origin except, possibly, as an end point. It is one of the

relatively few equations with variable coefficients that can be solved in terms of simple

functions.

If we introduce a new independent variable, t, by x = et , then

\frac{dy}{dx} = \frac{dy}{dt}\frac{dt}{dx} = \frac{1}{x}\frac{dy}{dt},  \quad\text{that is}\quad x\frac{dy}{dx} = \frac{dy}{dt}.   (2.48)

A second differentiation gives

x\frac{d}{dx}\!\left(x\frac{dy}{dx}\right) = \frac{d^2y}{dt^2},  \quad\text{that is}\quad x^2\frac{d^2y}{dx^2} = \frac{d^2y}{dt^2} - \frac{dy}{dt},   (2.49)

and equation 2.47 becomes

a_2\frac{d^2y}{dt^2} + (a_1 - a_2)\frac{dy}{dt} + a_0 y = 0.   (2.50)

This can be solved using the methods described in section 2.4.4.
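Equivalently, substituting y = x^m into equation 2.47 gives the indicial quadratic a₂m(m−1) + a₁m + a₀ = 0, which is exactly equation 2.50 with y = exp(mt). The sketch below (an illustrative check, using an example not taken from the exercises) computes the indicial roots and confirms that x^m satisfies the Euler equation.

```python
import math

def indicial_roots(a2, a1, a0):
    """Roots of a2*m*(m-1) + a1*m + a0 = 0, i.e. a2*m**2 + (a1-a2)*m + a0 = 0."""
    b = a1 - a2
    disc = math.sqrt(b * b - 4 * a2 * a0)
    return (-b + disc) / (2 * a2), (-b - disc) / (2 * a2)

def euler_residual(a2, a1, a0, m, x):
    """a2*x^2*y'' + a1*x*y' + a0*y for y = x**m, using exact derivatives."""
    y, yp, ypp = x**m, m * x**(m - 1), m * (m - 1) * x**(m - 2)
    return a2 * x * x * ypp + a1 * x * yp + a0 * y

# Example: x^2*y'' - 2*y = 0 has m^2 - m - 2 = 0, so m = 2 or m = -1.
m1, m2 = indicial_roots(1, 0, -2)
```

This assumes distinct real roots; repeated or complex roots are handled through equation 2.50 as in section 2.4.4.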


Exercise 2.44

Use the method described above to solve the equation

x^2\frac{d^2y}{dx^2} + 2x\frac{dy}{dx} - 6y = 0,  \quad y(1) = 1,  \quad y'(1) = 0,  \quad x \ge 1.

Exercise 2.45

Find the solution of

x\frac{d^2y}{dx^2} + \frac{dy}{dx} = 0,  \quad y(1) = A,  \quad y'(1) = A',  \quad x \ge 1.

Exercise 2.46

Show that if x = e^t then \frac{d^3y}{dt^3} = x^3\frac{d^3y}{dx^3} + 3x^2\frac{d^2y}{dx^2} + x\frac{dy}{dx}, and hence that

x^3\frac{d^3y}{dx^3} = \frac{d^3y}{dt^3} - 3\frac{d^2y}{dt^2} + 2\frac{dy}{dt}.

Hence find the general solution of the equation

x^3\frac{d^3y}{dx^3} - 3x^2\frac{d^2y}{dx^2} + 6x\frac{dy}{dx} - 6y = \sqrt{x},  \quad x \ge 0.

2.5 An existence and uniqueness theorem

Here we quote a basic existence theorem for coupled first-order systems, which is less

restrictive than theorem 2.1, but which does not provide a method of constructing

the solution. This proof was first given by Cauchy in his lecture course at the École

polytechnique between 1820 and 1830.

Theorem 2.2

For the n coupled first-order, autonomous, initial value system

\frac{dy_k}{dx} = f_k(\mathbf{y}),  \quad \mathbf{y}(x_0) = \mathbf{A},   (2.51)

where y = (y1 , y2 , . . . , yn ), A = (A1 , A2 , . . . , An ) and where fk (y) are differentiable

functions of y on some domain D, ak ≤ yk ≤ bk , −∞ ≤ ak < bk ≤ ∞, k = 1, 2, · · · , n,

then:

(i) for every real x0 and A ∈ D there exists a solution satisfying the initial conditions

y(x0 ) = A, and;

(ii) this solution is unique in some interval of x containing x_0.

A geometric understanding of this theorem comes from noting that in any region

where fk (y) 6= 0, for some k, all solutions are non-intersecting, smooth curves. More

precisely, in a neighbourhood of a point \mathbf{y}_0 where f_k(\mathbf{y}_0) \neq 0, for some k, if all f_k(\mathbf{y}) have continuous second derivatives, it is possible to find a new set of variables \mathbf{u} such that in the neighbourhood of \mathbf{y}_0 equation 2.51 transforms to

\frac{du_1}{dx} = 1  \quad\text{and}\quad  \frac{du_k}{dx} = 0,  \quad k = 2, 3, \ldots, n.

Such a transformation is said to rectify the system. From this it follows that a unique solution exists. A proof of the above theorem that uses this idea is given in Arnold²⁴.

Two remarks are worth making. First, the non-autonomous system

\frac{dy_k}{dx} = f_k(\mathbf{y}, x),  \quad \mathbf{y}(x_0) = \mathbf{A},

can, by setting x = y_{n+1}, f_{n+1} = 1, be converted to an (n+1)-dimensional autonomous system.
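This conversion is exactly what is done in practice when integrating a non-autonomous equation numerically. The sketch below (illustrative; the equation dy/dx = xy and the use of the classical fourth-order Runge–Kutta step are my choices, not the notes') writes dy/dx = xy as the autonomous pair y₁' = y₂y₁, y₂' = 1 with y₂ = x, and integrates it; the exact solution through (0, 1) is y = exp(x²/2).

```python
import math

def rk4(f, y, h, steps):
    """Classical fourth-order Runge-Kutta for the autonomous system y' = f(y)."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

f = lambda y: [y[1] * y[0], 1.0]        # (y1, y2) = (y, x): y' = x*y, x' = 1
y1, y2 = rk4(f, [1.0, 0.0], 0.01, 100)  # integrate from x = 0 to x = 1
```

After 100 steps y₂ has advanced to x = 1 and y₁ approximates exp(1/2).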

Second, we note that differentiability of fk (y), for all k, is necessary for uniqueness.

Consider, for instance, the system dy/dx = y 2/3 , y(0) = 0, which has the two solutions

y(x) = 0 and y(x) = (x/3)^3.
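Both claimed solutions can be checked pointwise (an illustrative verification, not part of the notes): the zero function satisfies the equation trivially, and for y = (x/3)³ the derivative x²/9 equals y^{2/3} = (x/3)² for x ≥ 0, so the initial value problem genuinely has two solutions.

```python
def rhs(y):
    """Right-hand side of dy/dx = y**(2/3)."""
    return y ** (2.0 / 3.0)

def y_cubic(x):
    """The non-trivial solution y = (x/3)**3 through y(0) = 0."""
    return (x / 3.0) ** 3

def dy_cubic(x):
    """Its derivative, x**2/9."""
    return x * x / 9.0
```

The failure of uniqueness here is consistent with the theorem: y^{2/3} is not differentiable at y = 0.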

2.6 Envelopes of families of curves (optional)

The equation f(x, y) = 0 defines a curve in the Cartesian space Oxy. If the function

contains a parameter C, the equation becomes f (x, y, C) = 0 and a different curve is

obtained for each value of C. By varying C over an interval we obtain a family of curves:

the envelope of this family is the curve that touches every member of this family.

This envelope curve is given by eliminating C between the two equations

f(x, y, C) = 0  \quad\text{and}\quad  \frac{\partial f}{\partial C} = 0.   (2.52)

Before proving this result we illustrate the idea with the equation

x\cos\phi + y\sin\phi = r,   (2.53)

where φ is a parameter. For each value of φ this equation defines a straight line cutting the x and y axes at r/cos φ and r/sin φ, respectively, and passing a minimum distance of r from the origin. Segments of five of these lines are shown in figure 2.3, and it is not too difficult to imagine more segments and to see that the envelope is a circle of radius r.

[Figure 2.3: Diagram showing five examples of the line defined in equation 2.53, with r = 1 and φ = kπ/14, k = 2, 3, …, 6.]

For this example equations 2.52 become

x\cos\phi + y\sin\phi = r  \quad\text{and}\quad  -x\sin\phi + y\cos\phi = 0.

Solving these for x and y gives x = r\cos\phi, y = r\sin\phi, that is x^2 + y^2 = r^2:
a circle with radius r and centre at the origin.
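The elimination of the parameter can be mirrored numerically (an illustrative sketch, not from the notes): for each φ the pair of equations 2.52 is a 2×2 linear system for (x, y), whose solution should always land on the circle x² + y² = r².

```python
import math

def envelope_point(r, phi):
    """Solve f = x*cos(phi) + y*sin(phi) - r = 0 together with
    df/dphi = -x*sin(phi) + y*cos(phi) = 0 for the point of tangency."""
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    return x, y

pts = [envelope_point(1.0, k * math.pi / 14) for k in range(2, 7)]
```

Each returned point satisfies both equations of 2.52 and lies on the unit circle, as the proof below explains in general.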

24 Arnold V I 1973 Ordinary Differential Equations, section 32.6, translated and edited by R A


The connection between envelopes and differential equations is as follows. Suppose that f(x, y, C) = 0 is the general solution of a first-order equation,

so on each member of the family of curves f (x, y, C) = 0 the gradient satisfies the

differential equation. Where the envelope touches a member of the family the gradient

and coordinates of the point on the envelope also satisfy the differential equation. But,

by definition, the envelope touches some member of the family at every point along its

length. We therefore expect the envelope to satisfy the differential equation: since it

does not include any arbitrary constant and is not one of the family of curves, it is a

singular solution.

We prove equation 2.52 by considering neighbouring members of the family of curves f(x, y, C + kδC) = 0, k = 1, 2, 3, …, 0 < δC ≪ |C|, such that the curves defined by f(x, y, C) and f(x, y, C + δC) intersect at P, those by f(x, y, C + δC) and f(x, y, C + 2δC) at Q, and so on, as shown in figure 2.4. As δC → 0 the members of this family of curves approach each other, as do the points P, Q and R. The locus of these points forms a curve, each point of which lies on successive members of the original family.

[Figure 2.4: Four neighbouring members of the family, k = 1, 2, 3, 4, with successive intersections P, Q and R.]

The points P, Q, R, … lie on pairs of adjacent curves, so as δC → 0 we require values of x and y that satisfy both f(x, y, C) = 0 and f(x, y, C + δC) = 0. The second equation can be expanded,

f(x, y, C + \delta C) = f(x, y, C) + \delta C\,\frac{\partial f}{\partial C}(x, y, C) + O(\delta C^2).

Thus the points on the locus of P, Q, R, … each satisfy both equations f(x, y, C) = 0

and fC (x, y, C) = 0, so the equation of the envelope is obtained by eliminating C from

these equations.

Exercise 2.47

The equation of a straight line intersecting the x-axis at a and the y-axis at b is

x/a + y/b = 1.

(a) Find the envelope, in the first quadrant, of the family of straight lines such that the sum of the intercepts is constant, a + b = d > 0.

(b) Find the envelope, in the first quadrant, of the family of straight lines such that the product of the intercepts is constant, ab = d^2.


Exercise 2.48

Find the solution of each of the following differential equations: if no initial or

boundary values are given, find the general solution.

(a) \frac{dy}{dx} + y = y^{1/2},  \quad y(1) = A > 0.

(b) \frac{dy}{dx} - y = y^{1/5},  \quad y(1) = A > 0.

(c) \frac{1}{y}\frac{dy}{dx} - x = xy,  \quad y(0) = \frac{1}{2}.

(d) \frac{dy}{dx} = \sin(x - y),  \quad y(0) = 0.

(e) x\frac{dy}{dx} = y + \sqrt{x^2 + y^2},  \quad y(0) = A > 0.

(f) \frac{dy}{dx} = \frac{x + 2y - 1}{2x + 4y + 3},  \quad y(1) = 0.

(g) y\frac{dy}{dx} - x + y = 0,  \quad y(1) = 1.

(h) \frac{dy}{dx} + y\sinh x = \sinh 2x,  \quad y(0) = 1.

(i) x\frac{dy}{dx} + 2y = x^3,  \quad y(1) = 0.

(j) \frac{dy}{dx} = y\tan x + y^3\tan^3 x,  \quad y(0) = \sqrt{2}.

(k) x^3\frac{dy}{dx} = y(x^2 + y).

Exercise 2.49

Find the solution of each of the following differential equations: if no initial or

boundary values are given, find the general solution.

(a) \frac{d^2y}{dx^2} = 2x\left(\frac{dy}{dx}\right)^2,  \quad y(0) = 0,  \quad y'(0) = 1.

(b) x^2\frac{d^2y}{dx^2} - x\frac{dy}{dx} + y = x^3\ln x.

(c) x\frac{d^2y}{dx^2} = y\frac{dy}{dx}.

(d) x\frac{d^2y}{dx^2} - \frac{dy}{dx} = 3x^2.

(e) x\frac{d^2y}{dx^2} = \left(\frac{dy}{dx}\right)^2.

(f) \frac{d^2y}{dx^2} + (x + a)\left(\frac{dy}{dx}\right)^2 = 0,  \quad y(0) = A,  \quad y'(0) = B,  \quad 0 < Ba^2 < 2.

(g) (y - a)\frac{d^2y}{dx^2} + \left(\frac{dy}{dx}\right)^2 = 0.

2.7. MISCELLANEOUS EXERCISES 85

Exercise 2.50

For each of the following equations, show that the given function, v(x), is a solution

and hence find the general solution.

(a) x(1 - x)^2\frac{d^2y}{dx^2} + (1 - x^2)\frac{dy}{dx} + (1 + x)y = 0,  \quad v(x) = 1 - x.

(b) x\frac{d^2y}{dx^2} + 2\frac{dy}{dx} + xy = 0,  \quad v = \frac{\cos x}{x}.

(c) \frac{d^2y}{dx^2} + 2\tan x\,\frac{dy}{dx} + 2y\tan^2 x = 0,  \quad v = e^x\cos x.

Exercise 2.51

(a) Consider the Riccati equation with constant coefficients,

\frac{dy}{dx} = a + by + cy^2,  \quad c \neq 0,

where a, b and c are constants. Show that if b^2 \neq 4ac the general solution is

y(x) = -\frac{b}{2c} + \begin{cases} \dfrac{\omega}{c}\tan(\omega x + \alpha), & \omega = \tfrac{1}{2}\sqrt{4ac - b^2}, & \text{if } 4ac > b^2, \\[1ex] -\dfrac{\nu}{c}\tanh(\nu x + \alpha), & \nu = \tfrac{1}{2}\sqrt{b^2 - 4ac}, & \text{if } 4ac < b^2, \end{cases}

where α is a constant. Also find the general solution if b^2 = 4ac.

(b) Find the solutions of the following equations:

(i) y' = 2 + 3y + y^2,  (ii) y' = 9 - 4y^2,  (iii) y' = 1 - 2y + y^2,  (iv) y' = 1 + 4y + 5y^2.

Exercise 2.52

(a) Show that the change of variable v = y'/y reduces the second-order equation

\frac{d^2y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)y = 0

to the Riccati equation

\frac{dv}{dx} + v^2 + a_1(x)v + a_0(x) = 0.

This equation is named the associated Riccati equation. Hence deduce that the problem of solving the original second-order equation is equivalent to solving the coupled first-order equations

\frac{dy}{dx} = vy,  \quad \frac{dv}{dx} = -v^2 - a_1(x)v - a_0(x).

(b) Using an appropriate solution of y'' + ω²y = 0, where ω is a real constant, show that the general solution of v' + v² + ω² = 0 is v = −ω tan(ωx + c).


Exercise 2.53

(a) If x(t) and y(t) satisfy the pair of coupled, linear equations

\frac{dx}{dt} = ax + by,  \quad \frac{dy}{dt} = cx + dy,

where a, b, c and d are constants, show that the ratio z = y/x satisfies the Riccati equation

\frac{dz}{dt} = c + (d - a)z - bz^2.

(b) Hence show that the general solution of this Riccati equation is

z = \frac{\lambda_1 e^{\lambda_1 t} + C\lambda_2 e^{\lambda_2 t}}{b\left(e^{\lambda_1 t} + Ce^{\lambda_2 t}\right)},  \quad\text{where}\quad 2\lambda_{1,2} = (d - a) \pm \sqrt{(d - a)^2 + 4bc},

and C is an arbitrary constant.

Exercise 2.54

In this and the next exercise you will show that some of the equations studied by

Riccati have closed form solutions.

(a) Consider the equation

x\frac{dz}{dx} = az - bz^2 + cx^n,   (2.54)

where a, b and c are constants. By putting z = yx^a show that this becomes the Riccati equation

\frac{dy}{dx} = -bx^{a-1}y^2 + cx^{n-a-1}

and by changing the independent variable to ξ = x^a transform this to

\frac{dy}{d\xi} = \frac{c}{a}\,\xi^{(n-2a)/a} - \frac{b}{a}\,y^2.

Deduce that if n = 2a the solution of the original equation can be expressed in terms of simple functions.

(b) By substituting z = \frac{a}{b} + \frac{x^n}{u} into equation 2.54 show that it becomes

x\frac{du}{dx} = (n + a)u - cu^2 + bx^n,

which is the same equation but with (a, b, c) replaced by (n + a, c, b). Deduce that the solution of equation 2.54 can be expressed in terms of simple functions if n = 2a or n = 2(n + a).

Using further, similar transformations show that the original equation has closed-form solutions if n = 2(ns + a), s = 0, 1, 2, ….

Exercise 2.55

By putting z = x^n/u into equation 2.54 show that u satisfies the equation

x\frac{du}{dx} = (n - a)u - cu^2 + bx^n

and deduce that z(x) can be expressed in terms of simple functions if n = 2(n - a). By making further transformations of the type used in exercise 2.54, show that z(x) can be expressed in terms of simple functions if n = 2(ns - a), s = 1, 2, ….


Exercise 2.56

Consider the sequence of functions

y_0(x) = A + A'(x - a),
y_n(x) = \int_a^x dt\,(t - x)G(t)y_{n-1}(t),  \quad n = 1, 2, \ldots.

Show that if

y(x) = \sum_{k=0}^{\infty} y_k(x),

and assuming that the infinite series is uniformly convergent on an interval containing x = a, then y(x) satisfies the second-order equation

\frac{d^2y}{dx^2} + G(x)y = 0,  \quad y(a) = A,  \quad y'(a) = A'.

Exercise 2.57

It is well known that the exponential function, E(x) = e^x, is the solution of the first-order equation

\frac{dE}{dx} = E,  \quad E(0) = 1.   (2.55)

Not so well known is the fact that many of the properties of e^x, for real x, can be deduced directly from this equation.

(a) Using theorem 2.2 (page 81) deduce that there are no real values of x at which

E(x) = 0.

(b) By defining the function W(x) = 1/E(x), show that W'(y) = W(y), W(0) = 1, where y = −x, and deduce that E(x)E(−x) = 1.

(c) If Z(x) = E(x + y) show that Z'(x) = Z(x), Z(0) = E(y), and hence deduce that E(x + y) = E(x)E(y).

(d) If L(y) is the inverse function, that is if E(x) = y then L(y) = x, show that L'(y) = 1/y, L(y_1y_2) = L(y_1) + L(y_2) and L(1/y) = −L(y).

(e) Show that the Taylor series of E(x), L(1 + z) and L\!\left(\frac{1+z}{1-z}\right) are

E(x) = \sum_{n=0}^{\infty}\frac{x^n}{n!},  \quad L(1 + z) = \sum_{n=1}^{\infty}(-1)^{n-1}\frac{z^n}{n}  \quad\text{and}\quad  L\!\left(\frac{1+z}{1-z}\right) = 2\sum_{n=0}^{\infty}\frac{z^{2n+1}}{2n+1}.

Exercise 2.58

In this exercise you will derive some important properties of the sine and cosine

functions directly from the differential equations that can be used to define them.

Your solutions must not make use of trigonometric functions.

(a) Show that the solution of the initial value problem

\frac{d^2z}{dx^2} + z = 0,  \quad z(0) = \alpha,  \quad z'(0) = \beta,

can be written as an appropriate linear combination of the functions C(x) and

S(x) which are defined to be the solutions of the equations

\begin{pmatrix} C' \\ S' \end{pmatrix} = A\begin{pmatrix} C \\ S \end{pmatrix},  \quad A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},  \quad C(0) = 1,  \quad S(0) = 0.


(c) If, for any real constant a, f(x) = C(x + a) and g(x) = S(x + a), show that

\begin{pmatrix} f' \\ g' \end{pmatrix} = A\begin{pmatrix} f \\ g \end{pmatrix},  \quad f(0) = C(a),  \quad g(0) = S(a),

and deduce that C(x + a) = C(x)C(a) − S(x)S(a) and S(x + a) = S(x)C(a) + C(x)S(a).

(d) Show that there is a non-negative number X such that

(e) Show that

S\!\left(\frac{X}{4}\right) = 1,  C\!\left(\frac{X}{4}\right) = 0;  \quad S\!\left(\frac{X}{2}\right) = 0,  C\!\left(\frac{X}{2}\right) = -1;  \quad S\!\left(\frac{3X}{4}\right) = -1,  C\!\left(\frac{3X}{4}\right) = 0.

(f) Show that A^2 = -I and hence that A^{2n} = (-1)^n I and A^{2n+1} = (-1)^n A. By repeated differentiation of the equations defining C and S show that

\begin{pmatrix} C^{(n)}(0) \\ S^{(n)}(0) \end{pmatrix} = A^n\begin{pmatrix} 1 \\ 0 \end{pmatrix}

and deduce that the Taylor expansions of C(x) and S(x) are

C(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots,

S(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots.

Exercise 2.59

Find the normal forms, as defined in exercise 2.31(b) of Legendre’s equation

\frac{d}{dx}\!\left((1 - x^2)\frac{dy}{dx}\right) + \lambda y = 0.

Exercise 2.60

Show that changing to the independent variable t = \int_a^x dx\,\sqrt{q(x)} converts the equation y'' + p_1(x)y' + q(x)y = 0, a \le x \le b, q(x) > 0, into

\frac{d^2y}{dt^2} + \frac{q'(x) + 2p_1 q}{2q^{3/2}}\,\frac{dy}{dt} + y = 0.


Exercise 2.61

If f(x), g(x) and h(x) are any solutions of the second-order equation y'' + p_1(x)y' + q(x)y = 0, show that the following determinant is zero:

\begin{vmatrix} f & f' & f'' \\ g & g' & g'' \\ h & h' & h'' \end{vmatrix} = 0.

Exercise 2.62

Use the results found in exercise 2.35 (page 76) to construct a linear, homoge-

neous, second-order differential equation having the solutions

(a) (sinh x, sin x), (b) (tan x, 1/ tan x).

Exercise 2.63

Use the results found in exercise 2.35 (page 76) to show that the equation

\frac{d^2y}{dx^2} - \frac{u'}{u}\frac{dy}{dx} - u^2 y = 0,  \quad u = \frac{f'}{f},

Exercise 2.64

Let f (x), g(x) and h(x) be three solutions of the linear, third order differential

equation

\frac{d^3y}{dx^3} + p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)y = 0.

Derive a first-order differential equation for the Wronskian

W(x) = \begin{vmatrix} f & g & h \\ f' & g' & h' \\ f'' & g'' & h'' \end{vmatrix}.

You will need the result that the derivative of an n × n determinant, A, where the elements depend upon x, is

\frac{d}{dx}\det(A) = \sum_{k=1}^{n}\det(A_k),

where A_k is the determinant formed by replacing the elements of the kth row of A by their derivatives.

Exercise 2.65

The Schwarzian derivative

(a) If f(x) and g(x) are any two linearly independent solutions of the equation y'' + q(x)y = 0, show that the ratio v = f/g is a solution of the third-order, nonlinear equation S(v) = 2q(x), where

S(v) = \frac{v'''}{v'} - \frac{3}{2}\left(\frac{v''}{v'}\right)^2.

90 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS

(b) Show that, for any constants a, b, c and d with ad - bc \neq 0,

S\!\left(\frac{av + b}{cv + d}\right) = S(v).

The function S(v) is named the Schwarzian derivative and has the important

property that if S(F) < 0 and S(G) < 0 in an interval, then S(H) < 0, where H(x) = F(G(x)). This result is useful in the study of bifurcations of the fixed points of one-dimensional maps.

Exercise 2.66

The radius of curvature

The equation of the normal to a curve represented by the function y = f(x), through the point (ξ, η), is

y - \eta = -\frac{1}{m(\xi)}(x - \xi),  \quad m(x) = \frac{df}{dx}.

[Figure: the curve y = f(x), with normals through (ξ, η) and (ξ + δξ, η + δη) intersecting at a distance r from the curve.]

(a) Consider the adjacent normal, through the point (ξ + δξ, η + δη), where δη =

f 0 (ξ)δξ, and find the point where this intersects the normal through (ξ, η), correct

to first order in δξ.

(b) If the curve defined by f (x) is a segment of a circle of radius r, all normals

intersect at its centre, a distance r from (ξ, η). The point of intersection found in

part (a) will be a distance r(ξ, δξ) from the point (ξ, η) and we define the radius

of curvature by the limit ρ(ξ) = limδξ→0 r(ξ, δξ). Use this definition to show that

\frac{1}{\rho} = \frac{f''(\xi)}{\left(1 + f'(\xi)^2\right)^{3/2}}.

Exercise 2.67

The tangent to a curve C intersects the x- and y-axes at x = a and y = b,

respectively. If the product ab = 2∆ is constant as the tangent moves on C, show

that the differential equation for C is given by

2p\Delta = -(px - y)^2,  \quad\text{where}\quad p = \frac{dy}{dx}.

Notice that |Δ| is the area of the triangle formed by the axes and the tangent. Show that the singular solution of this equation is the hyperbola xy = \frac{\Delta}{2}, and show that the general solution is a family of straight lines.


This section of exercises contains a few elementary applications giving rise to simple

first-order equations. Part of each of these questions involves deriving a differential

equation, so all of these exercises are optional.

Exercise 2.68

The number, N , of a particular species of atom that decays in sufficiently large

volume of material decreases at a rate proportional to N . The half-life of a sub-

stance containing only one species of decaying atoms is defined to be the time

for N to decrease to N/2. The half-life of Carbon-14 is 5600 years; if initially

there are N0 Carbon-14 atoms find an expression for N (t), the number of atoms

at t ≥ 0.

Exercise 2.69

A moth ball evaporates, losing mass at a rate proportional to its surface area.

Initially it has radius 10 cm and after a month this has become 5 cm. Find its

radius as a function of time and the time at which it vanishes.

Exercise 2.70

A tank contains 1000 L of pure water. At time t = 0 brine containing 1 kg of

salt/L is added at a rate of one litre a minute, with the mixture kept uniform by

constant stirring, and one litre of the mixture is run off every minute, so the total

volume remains constant. When will there be 50 kg of dissolved salt in the tank?

Exercise 2.71

Torricelli’s law

Torricelli’s law states that water flows out of an open tank through a small hole

at a speed it would acquire falling freely from the surface to the hole.

A hemispherical bowl of radius R has a small circular hole, of radius a, drilled in

its bottom. It is initially full of water and at time t = 0 the hole is uncovered.

How long does it take for the bowl to empty?

Exercise 2.72

Water clocks

Water clocks, or clepsydra meaning ‘water thief’, are devices for measuring time

using the regular rate of flow of water, and were in use from the 15th century BC, in Egypt, to about 100 BC.²⁵

A simple version is a vessel from which water escapes from a small hole in the

bottom. It was used in Greek and Roman courts to time the speeches of lawyers.

Determine the shape necessary for the water level to fall at a constant rate.


Exercise 2.73

By winding a rope round a circular post, a rope can be used to restrain large

weights with a small force. If T (θ) and T (θ + δθ) = T (θ) + δT are the tensions

in the rope at angles θ and θ + δθ, then it can be shown that a normal force of

approximately δT is exerted by the rope on the post in (θ, θ + δθ). If µ is the

coefficient of friction between the rope and the post, then µT δθ ' δT .

Use this to find a differential equation satisfied by T (θ) and by solving this

find T (θ).

Exercise 2.74

A chain of length L starts with a length l0 hanging over the edge of a horizontal

table. It is released from rest at time t = 0. Neglecting friction determine how

long it takes to fall off the table.

Exercise 2.75

Lambert’s law of absorption

Lambert’s law of absorption states that the percentage of incident light absorbed

by a thin layer of translucent material is proportional to the thickness of the

layer. If sunlight falling vertically on ocean water is reduced to one-half its initial

intensity at a depth of 10 feet, find a differential equation for the intensity as a

1

function of the depth and determine the depth at which the intensity is 16 th of

the initial intensity.

Exercise 2.76

Liquid in a U-shaped tube, as shown in figure 2.5, will oscillate if one side of the liquid is initially higher than the other side. If the liquid is initially a height h_0 above the other side, use conservation of energy to show that, if friction can be ignored,

\dot h^2 = \frac{2g}{L}\left(h_0^2 - h^2\right),

where h(t) is the difference in height at time t, L is the total length of the tube and g is the acceleration due to gravity. Use this formula to find h(t) and to show that the period of oscillations is T = \pi\sqrt{2L/g}.

[Figure 2.5: a U-shaped tube containing liquid, the two surfaces at heights h_1 and h_2, with h = h_1 − h_2.]

Exercise 2.77

It can be shown that a body inside the earth is attracted towards the centre by a

force that is directly proportional to the distance from the centre.

If a hole joining any two points on the surface is drilled through the earth and

a particle can move without friction along this tube, show that the period of

oscillation is independent of the end points. The rotation of the earth should be

ignored.

Chapter 3

The Calculus of Variations

3.1 Introduction

In this chapter we consider the particular variational principle defining the shortest

distance between two points in a plane. It is well known that this shortest path is the straight line; however, it is almost always easiest to understand a new idea by applying it

to a simple, familiar problem; so here we introduce the essential ideas of the Calculus of

Variations by finding the equation of this line. The algebra may seem overcomplicated

for this simple problem, but the same theory can be applied to far more complicated problems, and we shall see in chapter 4 that the most important equation of the Calculus of Variations, the Euler-Lagrange equation, can be derived with almost no extra effort.

The chapter ends with a description of some of the problems that can be formulated

in terms of variational principles, some of which will be solved later in the course.

The approach adopted is intuitive, that is we assume that functionals behave like

functions of n real variables. This is exactly the approach used by Euler (1707 – 1783)

and Lagrange (1736 – 1813) in their original analysis and it can be successfully applied

to many important problems. However, it masks a number of problems, all to do

with the subtle differences between infinite and finite dimensional spaces which are not

considered in this course.

3.2 The shortest distance between two points in a plane

The distance between two points Pa = (a, A) and Pb = (b, B) in the Oxy-plane along a

given curve, defined by the function y(x), is given by the functional

S[y] = \int_a^b dx\,\sqrt{1 + y'(x)^2}.   (3.1)

The curve must pass through the end points, so y(x) satisfies the boundary conditions,

y(a) = A and y(b) = B. We shall usually assume that y 0 (x) is continuous on (a, b).

We require the equation of the function that makes S[y] stationary, that is we need

to understand how the values of the functional S[y] change as the path between Pa and


94 CHAPTER 3. THE CALCULUS OF VARIATIONS

Pb varies. These ideas are introduced here, and developed in chapter 4, using analogies

with the theory of functions of many real variables.

In the theory of functions of several real variables a stationary point is one at which the

values of the function at all neighbouring points are ‘almost’ the same as at the station-

ary point. To be precise, if G(x) is a function of n real variables, x = (x1 , x2 , · · · , xn ),

we compare values of G at x and the nearby point x + εξ, where |ε| ≪ 1 and |ξ| = 1. Taylor's expansion, equation 1.37 (page 36), gives

G(x + \epsilon\xi) - G(x) = \epsilon\sum_{k=1}^{n}\frac{\partial G}{\partial x_k}\,\xi_k + O(\epsilon^2).   (3.2)

A stationary point is defined to be one for which the term O(ε) is zero for all ξ. This gives the familiar conditions for a point to be stationary, namely ∂G/∂x_k = 0 for k = 1, 2, …, n.

For a functional we proceed in the same way. That is, we choose adjacent paths

joining Pa to Pb and compare the values of S along these paths. If a path is represented

by a differentiable function y(x), adjacent paths may be represented by y(x) + h(x),

where is a real variable and h(x) another differentiable function. Since all paths must

pass through Pa and Pb , we require y(a) = A, y(b) = B and h(a) = h(b) = 0; otherwise

h(x) is arbitrary. The difference

δS = S[y + h] − S[y],

may be considered as a function of the real variable , for arbitrary y(x) and h(x) and

for small values of , || 1. When = 0, δS = 0 and for small || we expect δS to be

proportional to ; in general this is true as seen in equation 3.3 below.

However, there may be some paths for which δS is proportional to 2 , rather than .

These paths are special and we define these to be the stationary paths, curves or sta-

tionary functions. Thus a necessary condition for a path y(x) to be a stationary path

is that

S[y + h] − S[y] = O(2 ),

for all suitable h(x). The equation for the stationary function y(x) is obtained by

examining this difference more carefully.

The distances along these adjacent curves are

S[y] = \int_a^b dx\,\sqrt{1 + y'(x)^2}  \quad\text{and}\quad  S[y + \epsilon h] = \int_a^b dx\,\sqrt{1 + [y'(x) + \epsilon h'(x)]^2}.

We proceed by expanding the integrand of S[y + εh] in powers of ε, retaining only the terms proportional to ε. One way of making this expansion is to consider the integrand as a function of ε and to use Taylor's series to expand in powers of ε,

\sqrt{1 + (y' + \epsilon h')^2} = \sqrt{1 + y'^2} + \epsilon\,\frac{d}{d\epsilon}\sqrt{1 + (y' + \epsilon h')^2}\,\bigg|_{\epsilon=0} + O(\epsilon^2) = \sqrt{1 + y'^2} + \epsilon\,\frac{y'h'}{\sqrt{1 + y'^2}} + O(\epsilon^2).


Substituting this expansion into the integral and rearranging gives the difference between the two lengths,

S[y + \epsilon h] - S[y] = \epsilon\int_a^b dx\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}}\,h'(x) + O(\epsilon^2).   (3.3)

This difference depends upon both y(x) and h(x), just as for functions of n real variables the difference G(x + εξ) − G(x), equation 3.2, depends upon both x and ξ, the equivalents of y(x) and h(x) respectively.
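The claimed ε² behaviour on the straight line can be seen numerically. The sketch below (illustrative, not from the notes; the choice y = x on [0, 1] and h(x) = sin πx is mine) evaluates the length functional with Simpson's rule and measures δS = S[y + εh] − S[y]: halving ε should divide δS by roughly four.

```python
import math

def length(yprime, n=2000):
    """Composite Simpson estimate of S = integral_0^1 sqrt(1 + y'(x)^2) dx."""
    h = 1.0 / n
    s = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        s += w * math.sqrt(1.0 + yprime(i * h) ** 2)
    return s * h / 3.0

def dS(eps):
    """S[y + eps*h] - S[y] for y = x and h = sin(pi*x) (so h(0) = h(1) = 0)."""
    curved = length(lambda x: 1.0 + eps * math.pi * math.cos(math.pi * x))
    return curved - length(lambda x: 1.0)
```

Since the O(ε) term in equation 3.3 vanishes for this y, `dS(eps)` is positive and scales as ε², which is precisely the defining property of a stationary path.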

Since S[y] is stationary it follows, by definition, that

\int_a^b dx\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}}\,h'(x) = 0.   (3.4)

We shall see in chapter 4 that because 3.4 holds for all those functions h(x) for

which h(a) = h(b) = 0 and h0 (x) is continuous, this equation is sufficient to determine

y(x) uniquely. Here, however, we simply show that if

\frac{y'(x)}{\sqrt{1 + y'(x)^2}} = \alpha = \text{constant for all } x,   (3.5)

then the integral in equation 3.4 is zero for all h(x). Assuming that 3.5 is true, equa-

tion 3.4 becomes

Z b

dx αh0 (x) = α {h(b) − h(a)} = 0 since h(a) = h(b) = 0.

a

In section 4.3 we show that condition 3.5 is necessary as well as sufficient for equation 3.4

to hold.

Equation 3.5 shows that y′(x) = m, where m is a constant, and integration gives the general solution,

y(x) = mx + c

for another constant c: this is the equation of a straight line as expected. The constants m and c are determined by the conditions that the straight line passes through Pa and Pb:

y(x) = (B − A)/(b − a) x + (Ab − Ba)/(b − a).  (3.6)

This analysis shows that the functional S[y] defined in equation 3.1 is stationary along

the straight line joining Pa to Pb . We have not shown that this gives a minimum

distance: this is proved in exercise 3.2.
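The defining property of a stationary path — S[y + εh] − S[y] = O(ε²) on the straight line but O(ε) on other paths — can be seen numerically. The following is a minimal sketch (not from the text); the perturbation h(x) = x(1 − x) and the comparison path y = x² are arbitrary choices.

```python
# A numerical sketch of the defining property of a stationary path: for the
# arc-length functional S[y] = integral of sqrt(1 + y'(x)^2), the difference
# S[y + eps*h] - S[y] is O(eps^2) on the straight line but O(eps) on a
# non-stationary path.  The paths and perturbation are invented examples.
import math

def arc_length(yp, a=0.0, b=1.0, n=20000):
    """Midpoint-rule estimate of the arc length of a path with derivative yp."""
    dx = (b - a) / n
    return sum(math.sqrt(1.0 + yp(a + (i + 0.5) * dx) ** 2) for i in range(n)) * dx

hp = lambda x: 1.0 - 2.0 * x        # h(x) = x(1 - x), so h(0) = h(1) = 0

def delta(yp, eps):
    """S[y + eps*h] - S[y] for a path whose derivative is yp."""
    return arc_length(lambda x: yp(x) + eps * hp(x)) - arc_length(yp)

line  = lambda x: 1.0               # y = x: a stationary path
curve = lambda x: 2.0 * x           # y = x^2: not stationary

print(delta(line, 1e-3))            # O(eps^2): of order 1e-7
print(delta(curve, 1e-3))           # O(eps): far larger in magnitude
```

Halving ε roughly quarters the first number but only halves the second, which is the numerical signature of the O(ε²) versus O(ε) behaviour.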

Exercise 3.1
Use the above method on the functional

S[y] = ∫_0^1 dx √(1 + y′(x)), y(0) = 0, y(1) = B > −1,

to show that the stationary path is the straight line y(x) = Bx, and that the value of the functional on this line is S[y] = √(1 + B).


In this section we show that the straight line 3.6 gives the minimum distance. For

practical reasons this analysis is divided into two stages. First, we show that the

straight line is a local minimum of the functional, using an analysis that is generalised

in chapter 8 to functionals. Second, we show that, amongst the class of differentiable

functions, the straight line is actually a global minimum: this analysis makes use of

special features of the integrand.

The distinction between local and global extrema is illustrated in figure 3.1. Here

we show a function f (x), defined in the interval a ≤ x ≤ b, having three stationary

points B, C and D, two of which are minima, the other being a maximum. It is clear

from the figure that at the stationary point D, f (x) takes its smallest value in the

interval — so this is the global minimum. The function is largest at A, but this point

is not stationary — this is the global maximum. The stationary point at B is a local

minimum, because here, f (x) is smaller than at any point in the neighbourhood of B:

likewise the points C and D are local maxima and minima, respectively. The adjective

local is frequently omitted. In some texts local extrema are named relative extrema.

Figure 3.1 Diagram to illustrate the difference between local and global extrema.

It is clear from this example that to classify a point as a local extremum requires an

examination of the function values only in the neighbourhood of the point. Whereas,

determining whether a point is a global extremum requires examining all values of the

function; this type of analysis usually invokes special features of the function.

The local analysis of a stationary point of a function, G(x), of n variables proceeds by making a second order Taylor expansion about a point x = a,

G(a + ξ) = G(a) + Σ_{k=1}^n (∂G/∂x_k) ξ_k + (1/2) Σ_{k=1}^n Σ_{j=1}^n (∂²G/∂x_k∂x_j) ξ_k ξ_j + ···,

where the derivatives are evaluated at x = a; at a stationary point all the first derivatives are zero. The nature of the stationary point is usually determined by

the behaviour of the second order term. For a stationary point to be a local minimum

it is necessary for the quadratic terms to be strictly positive, that is

Σ_{k=1}^n Σ_{j=1}^n (∂²G/∂x_k∂x_j) ξ_k ξ_j > 0 for all ξ with |ξ| = 1.

The stationary point is a local maximum if this quadratic form is strictly

negative. For large n it is usually difficult to determine whether these inequalities are

satisfied, although there are well defined tests which are described in chapter 8.
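For a symmetric matrix of second derivatives the condition can be checked by computing its eigenvalues: the quadratic form is strictly positive for all ξ exactly when every eigenvalue is positive. A small numerical illustration (the function G below is an invented example, not one from the text):

```python
# The quadratic form in the second-order Taylor term is positive for every
# xi exactly when the matrix of second derivatives is positive definite,
# i.e. when all its eigenvalues are positive.  Example (invented):
# G(x1, x2) = x1^2 + x1*x2 + x2^2, stationary at the origin, where its
# matrix of second derivatives is [[2, 1], [1, 2]].
import math

a, b, c = 2.0, 1.0, 2.0           # symmetric 2x2 matrix [[a, b], [b, c]]

# Eigenvalues of a symmetric 2x2 matrix from the characteristic equation.
mean = 0.5 * (a + c)
r = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
lam1, lam2 = mean - r, mean + r

print(lam1, lam2)                 # 1.0 3.0: both positive
is_local_min = lam1 > 0.0         # the smallest eigenvalue decides
print(is_local_min)               # True: the origin is a local minimum
```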


For a functional we proceed in the same way: the nature of a stationary path is usually determined by the second order expansion. If S[y] is stationary then, by definition,

S[y + εh] − S[y] = (1/2) ε² ∆₂[y, h] + O(ε³)

for some quantity ∆₂[y, h], depending upon both y and h; special cases of this expansion are found in exercises 3.2 and 3.3. Then S[y] is a local minimum if ∆₂[y, h] > 0 for all h(x), and a local maximum if ∆₂[y, h] < 0 for all h(x). Normally it is difficult to establish these inequalities, and the general theory is described in chapter 8. For the functional defined by equation 3.1, however, the proof is straightforward; the following exercise guides you through it.

Exercise 3.2
(a) Use the binomial expansion, exercise 1.32 (page 34), to obtain the following expansion in ε,

√(1 + (α + εβ)²) = √(1 + α²) + ε αβ/√(1 + α²) + ε² β²/(2(1 + α²)^{3/2}) + O(ε³).

(b) Use this result to show that if y(x) is the straight line defined in equation 3.6 and S[y] the functional 3.1, then,

S[y + εh] − S[y] = ε²/(2(1 + m²)^{3/2}) ∫_a^b dx h′(x)² + O(ε³),  m = (B − A)/(b − a).

Deduce that the straight line is a local minimum for the distance between Pa and Pb.

Exercise 3.3
In this exercise the functional defined in exercise 3.1 is considered in more detail. By expanding the integrand of S[y + εh] to second order in ε show that, if y(x) is the stationary path, then

S[y + εh] = S[y] − ε²/(8(1 + B)^{3/2}) ∫_0^1 dx h′(x)²,  B > −1.

Deduce that the path y(x) = Bx, B > −1, is a local maximum of this functional.
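The conclusion of exercise 3.3 can be checked numerically; the sketch below (not part of the exercise) takes the arbitrary values B = 1 and h(x) = x(1 − x) and evaluates the perturbed functional directly.

```python
# A numerical check of the local-maximum claim: with B = 1 and
# h(x) = x(1 - x), both arbitrary choices, S[y] = integral of
# sqrt(1 + y'(x)) over [0, 1] equals sqrt(1 + B) on the line y = Bx,
# and perturbations of either sign decrease it.
import math

B = 1.0
hp = lambda x: 1.0 - 2.0 * x        # h'(x) for h(x) = x(1 - x)

def S(eps, n=20000):
    """Midpoint rule for the integral of sqrt(1 + B + eps*h'(x)) over [0, 1]."""
    dx = 1.0 / n
    return sum(math.sqrt(1.0 + B + eps * hp((i + 0.5) * dx)) for i in range(n)) * dx

print(S(0.0))                       # sqrt(2), approximately 1.41421
print(S(0.1) < S(0.0))              # True
print(S(-0.1) < S(0.0))             # True: y = Bx is a local maximum
```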

Now we show that the straight line between the points (0, 0) and (a, A) gives a global

minimum of the functional, not just a local minimum. This analysis relies on a special

property of the integrand that follows from the Cauchy-Schwarz inequality.

Exercise 3.4
Use the Cauchy-Schwarz inequality (page 41) with a = (1, z) and b = (1, z + u) to show that

√(1 + (z + u)²) √(1 + z²) ≥ 1 + z² + zu

and hence that

√(1 + (z + u)²) − √(1 + z²) ≥ zu/√(1 + z²).


The distance between the points (0, 0) and (a, A) along the path y(x) is

S[y] = ∫_0^a dx √(1 + y′²), y(0) = 0, y(a) = A.

On using the inequality derived in the previous exercise, with z = y′(x) and u = h′(x), we see that

S[y + h] − S[y] ≥ ∫_0^a dx y′h′/√(1 + y′²).

But on the stationary path y′ is a constant and since h(0) = h(a) = 0 we have S[y + h] ≥ S[y] for all h(x).

This analysis did not assume that |h| is small, and since all admissible paths can be expressed in the form y(x) + h(x), we have shown that in the class of differentiable functions the straight line gives the global minimum of the functional.

An observation

Problems involving shortest distances on surfaces other than a plane illustrate other

features of variational problems. Thus if we replace the plane by the surface of a sphere

then the shortest distance between two points on the surface is the arc length of a

great circle joining the two points — that is the circle created by the intersection of

the spherical surface and the plane passing through the two points and the centre of

the sphere; this problem is examined in exercise 5.20 (page 168). Now, for most points,

there are two stationary paths corresponding to the long and the short arcs of the great

circle. However, if the points are at opposite ends of a diameter, there are infinitely

many shortest paths. This example shows that solutions to variational problems may

be complicated.

In general, the stationary paths between two points on a surface are named geodesics 1 .

For a plane surface the only geodesics are straight lines; for a sphere, most pairs of points

are joined by just two geodesics that are the segments of the great circle through the

points. For other surfaces there may be several stationary paths: an example of the

consequences of such complications is described next.

The general theory of relativity, discovered by Einstein (1879 – 1955), shows that the

path taken by light from a source to an observer is along a geodesic on a surface in a

four-dimensional space. In this theory gravitational forces are represented by distortions

to this surface. The theory therefore predicts that light is “bent” by gravitational

forces, a prediction that was first observed in 1919 by Eddington (1882 – 1944) in his

measurements of the position of stars during a total solar eclipse: these observations

provided the first direct confirmation of Einstein’s general theory of relativity.

The departure from a straight line path depends upon the mass of the body be-

tween the source and observer. If it is sufficiently massive, two images may be seen as

illustrated schematically in figure 3.2.

1 In some texts the name geodesic is used only for the shortest path.


Figure 3.2 Diagram showing how an intervening galaxy can sufficiently distort a path of light from a bright object, such as a quasar, to provide two stationary paths and hence two images. Many examples of such multiple images, and more complicated but similar optical effects, have now been observed. Usually there are more than two stationary paths.

3.3 Two generalisations

3.3.1 Functionals depending only upon y′(x)

The functional 3.1 (page 93) depends only upon the derivative of the unknown function.

Although this is a special case it is worth considering in more detail in order to develop

the notation we need.

If F(z) is a differentiable function of z then a general functional of the form of 3.1 is

S[y] = ∫_a^b dx F(y′), y(a) = A, y(b) = B,  (3.7)

where F(y′) simply means that in F(z) all occurrences of z are replaced by y′(x). Thus for the distance between two points F(z) = √(1 + z²), so F(y′) = √(1 + y′(x)²).

The difference between the functional evaluated along y(x) and the adjacent paths y(x) + εh(x), where |ε| ≪ 1 and h(a) = h(b) = 0, is

S[y + εh] − S[y] = ∫_a^b dx [F(y′ + εh′) − F(y′)].  (3.8)

Taylor's theorem gives

F(z + εu) = F(z) + εu dF/dz + O(ε²).

The expansion of F(y′ + εh′) is obtained from this simply by the replacements z → y′(x) and u → h′(x), which gives

F(y′ + εh′) − F(y′) = εh′(x) d/dy′ F(y′) + O(ε²)  (3.9)

where the notation dF/dy′ means

d/dy′ F(y′) = [dF/dz]_{z=y′(x)}.  (3.10)


For instance, if F(z) = √(1 + z²) then

dF/dz = z/√(1 + z²) and dF/dy′ = y′(x)/√(1 + y′(x)²).

Exercise 3.5
Find the expressions for dF/dy′ when
(a) F(y′) = (1 + y′²)^{1/4}, (b) F(y′) = sin y′, (c) F(y′) = exp(y′).

Substituting the expansion 3.9 into equation 3.8 gives

S[y + εh] − S[y] = ε ∫_a^b dx h′(x) d/dy′ F(y′) + O(ε²).  (3.11)

The functional S[y] is stationary if the term O(ε) is zero for all suitable functions h(x).

As before we give a sufficient condition, deferring the proof that it is also necessary. In

this analysis it is important to remember that F (z) is a given function and that y(x)

is an unknown function that we need to find. Observe that if

d/dy′ F(y′) = α = constant  (3.12)

then

∫_a^b dx h′(x) d/dy′ F(y′) = α {h(b) − h(a)} = 0 since h(a) = h(b) = 0.

In general equation 3.12 is true only if y′(x) is also constant, and hence

y(x) = mx + c and therefore y(x) = (B − A)/(b − a) x + (Ab − Ba)/(b − a),

the last result following from the boundary conditions y(a) = A and y(b) = B.

This is the same solution as given in equation 3.6. Thus, for this class of functional,

the stationary function is always a straight line, independent of the form of the inte-

grand, although its nature can sometimes depend upon the boundary conditions, see

for instance exercise 3.18 (page 117).

The exceptional example is when F (z) is linear, in which case the value of S[y]

depends only upon the end points and not the values of y(x) in between, as shown in

the following exercise.

Exercise 3.6
If F(z) = Cz + D, where C and D are constants, by showing that the value of the functional S[y] = ∫_a^b dx F(y′) is independent of the chosen path, deduce that equation 3.12 does not imply that y′(x) = constant.
What is the effect of making either, or both, C and D a function of x?
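The first part of exercise 3.6 can be seen numerically: since ∫_a^b y′ dx = y(b) − y(a), the functional S[y] = C(y(b) − y(a)) + D(b − a) is fixed by the end points alone. A minimal sketch (constants and paths are invented examples):

```python
# With F(z) = C*z + D the functional
#   S[y] = integral over [a, b] of (C*y'(x) + D)
#        = C*(y(b) - y(a)) + D*(b - a),
# so two very different paths with the same end points give the same value.
# The constants and paths below are arbitrary choices.
import math

C, D, a, b = 3.0, 2.0, 0.0, 1.0

def S(yp, n=10000):
    """Midpoint rule for the integral of C*y'(x) + D over [a, b]."""
    dx = (b - a) / n
    return sum(C * yp(a + (i + 0.5) * dx) + D for i in range(n)) * dx

straight_p = lambda x: 1.0                                  # y = x
wiggly_p = lambda x: 1.0 + 2.0 * math.pi * math.cos(2.0 * math.pi * x)  # y = x + sin(2*pi*x)

print(S(straight_p))     # C*1 + D*1 = 5.0
print(S(wiggly_p))       # also 5.0: same end points, same value
```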


Now consider the slightly more general functional

S[y] = ∫_a^b dx F(x, y′), y(a) = A, y(b) = B,  (3.13)

where the integrand F(x, y′) depends explicitly upon the two variables x and y′. The difference in the value of the functional along adjacent paths is

S[y + εh] − S[y] = ∫_a^b dx [F(x, y′ + εh′) − F(x, y′)].  (3.14)

In this example F(x, z) is a function of two variables and we require the expansion

F(x, z + εu) = F(x, z) + εu ∂F/∂z + O(ε²),

where Taylor's series for functions of two variables is used. Comparing this with the expression in equation 3.9 we see that the only difference is that the derivative with respect to y′ has been replaced by a partial derivative. As before, replacing z by y′(x) and u by h′(x), equation 3.14 becomes

S[y + εh] − S[y] = ε ∫_a^b dx h′(x) ∂/∂y′ F(x, y′) + O(ε²).  (3.15)

If y(x) is the stationary path it is necessary that

∫_a^b dx h′(x) ∂/∂y′ F(x, y′) = 0 for all h(x).

As before a sufficient condition for this is that F_{y′}(x, y′) = constant, which gives the following differential equation for y(x),

∂/∂y′ F(x, y′) = c, y(a) = A, y(b) = B,  (3.16)

where c is a constant. This is the equivalent of equation 3.12, but now the explicit presence of x in the equation means that y′(x) = constant is not a solution.

Exercise 3.7
Consider the functional

S[y] = ∫_0^1 dx √(1 + x + y′²), y(0) = A, y(1) = B.

Show that equation 3.16 gives

y′(x) = c √(1 + x + y′(x)²),

and solve this equation to show that

y(x) = A + (B − A)/(2^{3/2} − 1) ((1 + x)^{3/2} − 1).
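The quoted solution can be checked against the first-order condition 3.16: differentiating it gives y′(x) = k√(1 + x) with k = (3/2)(B − A)/(2^{3/2} − 1), so ∂F/∂y′ = k/√(1 + k²), the same constant for every x. A quick numerical confirmation (the boundary values A = 0, B = 1 are arbitrary choices):

```python
# Check that along the quoted stationary path of exercise 3.7 the quantity
# dF/dy' = y'(x)/sqrt(1 + x + y'(x)^2) is constant, as equation 3.16 requires.
# A = 0, B = 1 are invented boundary values for the illustration.
import math

A, B = 0.0, 1.0
k = 1.5 * (B - A) / (2.0 ** 1.5 - 1.0)     # y'(x) = k*sqrt(1 + x)

def yprime(x):
    return k * math.sqrt(1.0 + x)

values = [yprime(x) / math.sqrt(1.0 + x + yprime(x) ** 2)
          for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(values)      # all equal to k/sqrt(1 + k^2)
```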


3.4 Notation

In the previous sections we used the notation F(y′) to denote a function of the derivative of y(x) and proceeded to treat y′ as an independent variable, so that the expression dF/dy′ had the meaning defined in equation 3.10. This notation and its generalisation are very important in subsequent analysis; it is therefore essential that you are familiar with it and can use it.

Consider a function F(x, u, v) of three variables, for instance F = x√(u² + v²), and assume that all necessary partial derivatives of F(x, u, v) exist. If y(x) is a function of x we may form a function of x with the substitutions u → y(x), v → y′(x), thus F(x, y(x), y′(x)). This may be regarded either as a function of the single variable x, as when evaluating the integral ∫_a^b dx F(x, y(x), y′(x)), or as a function of three independent variables (x, y, y′). In the latter case the first partial derivatives with respect to y and y′ are just

∂F/∂y = [∂F/∂u]_{u=y, v=y′} and ∂F/∂y′ = [∂F/∂v]_{u=y, v=y′}.

Because y depends upon x we may also form the total derivative of F(x, y, y′) with respect to x using the chain rule, equation 1.22 (page 27),

dF/dx = ∂F/∂x + y′(x) ∂F/∂y + y″(x) ∂F/∂y′.  (3.17)

In the particular case F(x, u, v) = x√(u² + v²) these rules give

∂F/∂x = √(y² + y′²), ∂F/∂y = xy/√(y² + y′²), ∂F/∂y′ = xy′/√(y² + y′²).

Second derivatives are defined in the same manner, for instance

∂²F/∂y² = [∂²F/∂u²]_{u=y, v=y′}, ∂²F/∂y′² = [∂²F/∂v²]_{u=y, v=y′} and ∂²F/∂y∂y′ = [∂²F/∂u∂v]_{u=y, v=y′}.
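The chain rule 3.17 can be verified by a finite-difference check; the sketch below uses the example F = x√(u² + v²) from above, with the arbitrary choice y(x) = x², so y′ = 2x and y″ = 2.

```python
# Finite-difference check of the total-derivative formula 3.17 for the
# example F(x, u, v) = x*sqrt(u^2 + v^2), with the invented path y(x) = x^2.
import math

F   = lambda x, u, v: x * math.sqrt(u * u + v * v)
y   = lambda x: x * x
yp  = lambda x: 2.0 * x
ypp = lambda x: 2.0

def total_derivative(x):
    """Right-hand side of 3.17: dF/dx + y'*dF/dy + y''*dF/dy'."""
    u, v = y(x), yp(x)
    r = math.sqrt(u * u + v * v)
    return r + yp(x) * (x * u / r) + ypp(x) * (x * v / r)

def numerical_derivative(x, d=1e-6):
    """Central difference of g(x) = F(x, y(x), y'(x))."""
    g = lambda t: F(t, y(t), yp(t))
    return (g(x + d) - g(x - d)) / (2.0 * d)

x0 = 0.7
print(total_derivative(x0))        # approximately 3.128
print(numerical_derivative(x0))    # agrees to many figures
```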

Because you must be able to use this notation we suggest that you do all the following

exercises before proceeding.

Exercise 3.8
If F(x, y′) = √(x² + y′²) find ∂F/∂x, ∂F/∂y, ∂F/∂y′, dF/dx and d/dx(∂F/∂y′). Also, show that

d/dx(∂F/∂y′) = ∂/∂y′ (dF/dx).


Exercise 3.9
Show that for an arbitrary differentiable function F(x, y, y′)

d/dx(∂F/∂y′) = ∂²F/∂y′² y″ + ∂²F/∂y∂y′ y′ + ∂²F/∂x∂y′,

and that, in general,

d/dx(∂F/∂y′) ≠ ∂/∂y′ (dF/dx),

with equality only if F does not depend explicitly upon y.

Exercise 3.10
Use the first identity found in exercise 3.9 to show that the equation

d/dx(∂F/∂y′) − ∂F/∂y = 0

can be written in the form

∂²F/∂y′² y″ + ∂²F/∂y∂y′ y′ + ∂²F/∂x∂y′ − ∂F/∂y = 0.

Note that the first equation will later be seen as crucial to the general theory described in chapter 4. The fact that it is a second-order differential equation means that unique solutions can be obtained only if two initial or two boundary conditions are given. Note also that the coefficient of y″(x), ∂²F/∂y′², is very important in the general theory of the existence of solutions of this type of equation.

Exercise 3.11
(a) If F(y, y′) = y√(1 + y′²) find ∂F/∂y, ∂F/∂y′, ∂²F/∂y′² and show that the equation

d/dx(∂F/∂y′) − ∂F/∂y = 0 becomes y d²y/dx² − 1 − (dy/dx)² = 0,

and also that it can be written in the form

d/dx(∂F/∂y′) − ∂F/∂y = (1 + y′²)^{−3/2} ( y² d/dx (y′/y) − 1 ).

(b) Show that the general solution of

y d²y/dx² − 1 − (dy/dx)² = 0 is y = (1/A) cosh(Ax + B),

for some constants A and B. Hint, let y be the independent variable and define a new variable z by the equation yz(y) = dy/dx to obtain an expression for dy/dx that can be integrated.
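The quoted general solution is easy to confirm numerically, since y″ = A cosh(Ax + B) and 1 + sinh² = cosh²; the constants A and B below are arbitrary choices for the illustration.

```python
# Numerical confirmation that y = cosh(A*x + B)/A solves
# y*y'' - 1 - (y')^2 = 0: here y*y'' = cosh^2 and 1 + sinh^2 = cosh^2,
# so the residual vanishes identically.  A and B are invented values.
import math

A, B = 0.7, -0.3

def residual(x):
    y   = math.cosh(A * x + B) / A
    yp  = math.sinh(A * x + B)
    ypp = A * math.cosh(A * x + B)
    return y * ypp - 1.0 - yp * yp

residuals = [residual(x) for x in (-1.0, 0.0, 0.5, 2.0)]
print(residuals)      # all zero to rounding error
```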


3.5 Examples of functionals

In this section we describe a variety of problems that can be formulated in terms of functionals, with solutions that are stationary paths of these functionals. This list is provided because it is likely that you will not be familiar with these descriptions and will be unaware of the wide variety of problems for which variational principles are useful, and sometimes essential. You should not spend long on this section if time is short; in this case you should aim at obtaining a rough overview of the examples. Indeed, you may move directly to chapter 4 and return to this section at a later date, if necessary.

In each of the following sub-sections a different problem is described and the relevant

functional is written down; some of these are derived later. In compiling this list one

aim has been to describe a reasonably wide range of applications: if you are unfamiliar

with the underlying physical ideas behind any of these examples, do not worry because

they are not an assessed part of the course. Another aim is to show that there are

subtly different types of variational problems, for instance the isoperimetric and the

catenary problems, described in sections 3.5.5 and 3.5.6 respectively.
3.5.1 The brachistochrone

Given two points Pa = (a, A) and Pb = (b, B) in the same vertical plane, as in the

diagram below, we require the shape of the smooth wire joining Pa to Pb such that a

bead sliding on the wire under gravity, with no friction, and starting at Pa with a given

speed shall reach Pb in the shortest possible time.

Figure 3.3 The curved line joining Pa to Pb is a segment of a cycloid. In this diagram the axes are chosen to give a = A = 0.

The name given to this curve is the brachistochrone, from the Greek, brachistos, shortest,

and chronos, time.

If the y-axis is vertical it can be shown that the time taken along the curve y(x) is

T[y] = ∫_a^b dx √( (1 + y′²)/(C − 2gy) ), y(a) = A, y(b) = B,

where g is the acceleration due to gravity and C a constant depending upon the initial speed of the particle. This expression is derived in section 5.2.
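The claim that the cycloid beats the straight line can be illustrated numerically. The sketch below makes several assumptions, none from the text: the bead starts at rest (so C = 0), g = 9.81, the end point is (1, −1) with y measured upwards, and the travel time along the cycloid is computed from its parametric form.

```python
# Illustration that the cycloid is faster than the chord.  Assumptions:
# the bead starts at rest (C = 0), g = 9.81, end point (1, -1), y upwards,
# so T[y] = integral over [0, 1] of sqrt((1 + y'^2)/(-2*g*y)).
import math

g = 9.81

# Straight line y = -x: the integrand is 1/sqrt(g*x), so T = 2/sqrt(g) exactly.
T_line = 2.0 / math.sqrt(g)

# Cycloid x = a(t - sin t), y = -a(1 - cos t) through (1, -1): find the final
# parameter t1 from (t1 - sin t1)/(1 - cos t1) = 1 by bisection on (1, 3).
f = lambda t: (t - math.sin(t)) / (1.0 - math.cos(t)) - 1.0
lo, hi = 1.0, 3.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
t1 = 0.5 * (lo + hi)
a = 1.0 / (t1 - math.sin(t1))

# Along the cycloid ds = a*sqrt(2(1 - cos t)) dt and the speed is
# sqrt(2*g*a*(1 - cos t)), so the time element is sqrt(a/g) dt and
# T = t1*sqrt(a/g).
T_cycloid = t1 * math.sqrt(a / g)

print(T_line, T_cycloid)    # about 0.639 and 0.583: the cycloid is faster
```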


This problem was first considered by Galileo (1564 – 1642) in his 1638 work Two

New Sciences, but lacking the necessary mathematical methods he concluded, erro-

neously, that the solution is the arc of a circle passing vertically through Pa ; exercise 5.4

(page 150) gives part of the reason for this error.

It was John Bernoulli (1667 – 1748), however, who made the problem famous when in

June 1696 he challenged the mathematical world to solve it. He followed his statement

of the problem by a paragraph reassuring readers that the problem was very useful in

mechanics, that it is not the straight line through Pa and Pb and that the curve is well

known to geometers. He also stated that he would show that this is so at the end of

the year provided no one else had.

In December 1696 Bernoulli extended the time limit to Easter 1697, though by this

time he was in possession of Leibniz’s solution, sent in a letter dated 16 th June 1696,

Leibniz having received notification of the problem on 9 th June. Newton also solved

the problem quickly: apparently2 the letter from Bernoulli arrived at Newton’s house,

in London, on 29 th January 1697 at the time when Newton was Warden of the Mint.

He returned from the Mint at 4pm, set to work on the problem and had solved it by the early hours of the next morning. The solution was returned anonymously, but to no avail, with Bernoulli stating upon receipt "The lion is recognised by his paw". Further

details of this history and details of these solutions may be found in Goldstine (1980,

chapter 1).

The curve giving this shortest time is a segment of a cycloid, which is the curve traced

out by a point fixed on the circumference of a vertical circle rolling, without slipping,

along a straight line. The parametric equations of the cycloid shown in figure 3.3 are

x = a(θ − sin θ), y = −a(1 − cos θ),

where a is the radius of the circle: these equations are derived in section 5.2.1, where

other properties of the cycloid are discussed.

Other historically important names are the isochronous curve and the tautochrone.

A tautochrone is a curve such that a particle travelling along it under gravity reaches

a fixed point in a time independent of its starting point; a cycloid is a tautochrone

and a brachistochrone. Isochronal means “equal times” so isochronous curves and

tautochrones are the same.

There are many variations of the brachistochrone problem. Euler3 considered the

effect of resistance proportional to v 2n , where v is the speed and n an integer. The

problem of a wire with friction, however, was not considered until 1975⁴. Both these

extensions require the use of Lagrange multipliers and are described in chapter 11.

Another variation was introduced by Lagrange5 who allowed the end point, Pb in fig-

ure 3.3, to lie on a given surface and this introduces different boundary conditions that

the cycloid needs to satisfy: the simpler variant in which the motion remains in the

plane and one or both end points lie on given curves is treated in chapter 10.

2 This anecdote is from the records of Catherine Conduitt, née Barton, Newton’s niece who acted as

his housekeeper in London, see Newton’s Apple by P Aughton, (Weidenfeld and Nicolson), page 201.

3 Chapter 3 of his 1744 opus, The Method of Finding Plane Curves that Show Some Property of

Maximum or Minimum. . . .

4 Ashby A, Brittin W E, Love W F and Wyss W, Brachistochrone with Coulomb Friction, Amer J

Physics 43 902-5.

5 Essay on a new method. . . , published in Vol II of the Miscellanea Taurinensai, the memoirs of

3.5.2 Minimal surface of revolution

Here the problem is to find a curve y(x) passing through two given points Pa = (a, A)

and Pb = (b, B), with A ≥ 0 and B > 0, as shown in the diagram, such that when

rotated about the x-axis the area of the curved surface formed is a minimum.

Figure 3.4 The surface of revolution produced when a curve y(x), joining (a, A) to (b, B), is rotated about the x-axis.

The area of the surface formed is given by the functional

S[y] = 2π ∫_a^b dx y(x) √(1 + y′²),

and we shall see that this problem has solutions that can be expressed in terms of

differentiable functions only for certain combinations of A, B and b − a.
3.5.3 The minimum resistance problem

Newton formulated one of the first problems to involve the ideas of the Calculus of

Variations. Newton’s problem is to determine the shape of a solid of revolution with

the least resistance to its motion along its axis through a stationary fluid.

Newton was interested in the problem of fluid resistance and performed many exper-

iments aimed at determining its dependence on various parameters, such as the velocity

through the fluid. These experiments were described in Book II of Principia (1687) 6 ;

an account of Newton’s ideas is given by Smith (2000)7 . It is to Newton that we owe

the idea of the drag coefficient, CD , a dimensionless number allowing the force on a

body moving through a fluid to be written in the form

FR = (1/2) CD ρ Af v²,  (3.18)

where Af is the frontal area of the body, ρ the fluid density8 , v = |v| where v is the

relative velocity of the body and the fluid. For modern cars CD has values between

about 0.30 and 0.45, with frontal areas of about 30 ft² (about 2.8 m²).

6 The full title is Philosophiae naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy).

7 Smith G E Fluid Resistance: Why Did Newton Change His Mind?, in The Foundations of New-

tonian Scholarship.

8 Note that this suggests that the 30◦ C change in temperature between summer and winter changes


Newton considered two types of force acting on the body:
a) those imposed on the front of the body which oppose the motion, and

b) those at the back of the body resulting from the disturbance of the fluid and which

may be in either direction.

He also considered two types of fluid:

a) rarefied fluids comprising non-interacting particles spread out in space, such as a gas,

and

b) continuous fluids, comprising particles packed together so that each is in contact

with its neighbours, such as a liquid.

The ideas sketched below are most relevant to rarefied fluids and ignore the second

type of force. They were used by Newton in 1687 to derive a functional, equation 3.21

below, for which the stationary path yields, in theory, a surface of minimum resistance.

This solution does not, however, agree with observation largely because the physical

assumptions made are too simple. Moreover, the functional has no continuously dif-

ferentiable paths that can satisfy the boundary conditions, although stationary paths

with one discontinuity in the derivative exist; but, Weierstrass (1815 – 1897) showed

that this path does not yield a strong minimum. These details are discussed further in

section 10.6. Nevertheless, the general problem is important and Newton’s approach,

and the subsequent variants, are of historical and mathematical importance: we shall

mention a few of these variants after describing the basic problem.

It is worth noting that the problem of fluid resistance is difficult and was not properly

understood until the early part of the 20th century. In 1752 d'Alembert (1717 – 1783) published a paper, Essay on a New Theory of the Resistance of Fluids, in which he derived the partial differential equations describing the motion of an ideal, incompressible inviscid fluid; the solution of these equations showed that the resisting force was zero, regardless

of the shape of the body: this was in contradiction to observations and was hence-

forth known as d’Alembert’s paradox. It was not resolved until Prandtl (1875 – 1953)

developed the theory of boundary layers in 1904. This shows how fluids of relatively

small viscosity, such as water or air, may be treated mathematically by taking account

of friction only in the region where essential, namely in the thin layer that exists in

the neighbourhood of the solid body. This concept was introduced in 1904, but many

decades passed before its ramifications were understood: an account of these ideas can

be found in Schlichting (1955)9 and a modern account of d’Alembert’s paradox can be

found in Landau and Lifshitz (1959)10 . An effect of the boundary layer, and also turbu-

lence, is that the drag coefficient, defined in equation 3.18, becomes speed dependent;

thus for a smooth sphere in air it varies between 0.07 and 0.5, approximately.

We now return to the main problem, which is to determine a functional for the

fluid resistance. In deriving this it is necessary to make some assumptions about the

resistance and this, it transpires, is why the stationary path is not a minimum. The

main result is given by equation 3.21, and you may ignore the derivation if you wish.

It is assumed that the resistance is proportional to the square of the velocity. To

see why, consider a small plane area moving through a fluid comprising many isolated

stationary particles, with density ρ: the area of the plane is δA and it is moving with

velocity v along its normal, as seen in the left-hand side of figure 3.5.

In order to derive a simple formula for the force on the area δA it is helpful to

9 Schlichting H Boundary Layer Theory (McGraw-Hill, New York).

10 Landau L D and Lifshitz E M Fluid mechanics (Pergamon).


imagine the fluid as comprising many particles, each of mass m and all stationary. If

there are N particles per unit volume, the density is ρ = mN . In the small time δt the

area δA sweeps through a volume vδtδA, so N vδtδA particles collide with the area, as

shown schematically on the left-hand side of figure 3.5.

Figure 3.5 Diagram showing the motion of a small area, δA, through a rarefied gas. On the left-hand side the normal to the area is parallel to the relative velocity; on the right-hand side the area is at an angle. The direction of the arrows is in the direction of the gas velocity relative to the area.

For an elastic collision between a very large mass (that of which δA is the small surface

element) with velocity v, and a small initially stationary mass, m, the momentum

change of the light particle is 2mv — you may check this by doing exercise 3.23,

although this is not part of the course. Thus in a time δt the total momentum transfer

is in the opposite direction to v, ∆P = (2mv) × (N vδtδA). Newton’s law equates force

with the rate of change of momentum, so the force on the area opposing the motion is,

since ρ = mN ,

δF = ∆P/δt = 2ρv² δA.  (3.19)

Equation 3.19 is a justification for the v²-law. If the normal, ON, to the area δA is at an angle ψ to the velocity, as in the right-hand side of figure 3.5, where the arrows

denote the fluid velocity relative to the body, then the formula 3.19 is modified in two

ways. First, the significant area is the projection of δA onto v, so δA → δA cos ψ.

Second, the fluid particles are elastically scattered through an angle 2ψ (because the angle of incidence equals the angle of reflection), so the momentum transfer along the direction of travel is v(1 + cos 2ψ) = 2v cos²ψ: hence 2v → 2v cos²ψ, and the force in the direction (−v) is δF = 2ρv² cos³ψ δA. We now apply this formula to find the

force on a surface of revolution. We define Oy to be the axis: consider a segment CD

of the curve in the Oxy-plane, with normal P N at an angle ψ to Oy, as shown in the

left-hand panel of figure 3.6.


Figure 3.6 Diagram showing the change in velocity of a particle colliding with the element CD, on the left, and the whole curve which is rotated about the y-axis, on the right.

The force on the ring formed by rotating the segment CD about Oy is, because of axial symmetry, in the y-direction. The area of the ring is 2πxδs, where δs is the length of the element CD, so the magnitude of the force opposing the motion is

δF = 2ρv² cos³ψ × 2πx δs = 4πρv² x cos³ψ δs.

The total force on the curve in figure 3.6 is obtained by integrating from x = 0 to x = b, and is given by the functional,

F[y] = 4πρv² ∫_{x=0}^{x=b} ds x cos³ψ, y(0) = A, y(b) = 0.  (3.20)

Since tan ψ = y′(x) we have cos ψ = 1/√(1 + y′²) and ds = dx√(1 + y′²), so that ds cos³ψ = dx/(1 + y′²) and the functional becomes

F[y]/(4πρv²) = ∫_0^b dx x/(1 + y′²), y(0) = A, y(b) = 0.  (3.21)

For a disc of area Af, y′(x) = 0, and this reduces to F = 2Af ρv², giving a drag coefficient CD = 4, which compares with the measured value of about 1.3. Newton's

problem is to find the path making this functional a minimum and this is solved in

section 10.6.

Exercise 3.12
Use the definition of the drag coefficient, equation 3.18, to show that, according to the theory described here,

CD = (8/b²) ∫_0^b dx x/(1 + y′²).

Note that the measured value of the drag coefficient for the motion of a sphere in air varies between 0.07 and 0.5, depending on its speed.
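As an illustration of this formula (the shape is an invented example, not from the text), for a cone y(x) = A(1 − x/b) the slope y′ = −A/b is constant, so the integral can be done exactly: CD = 4/(1 + A²/b²), which reduces to the flat-disc value CD = 4 when A = 0. The sketch below checks this against a quadrature.

```python
# Drag coefficient of a cone y(x) = A*(1 - x/b) according to the formula in
# exercise 3.12: constant slope gives C_D = 4/(1 + A^2/b^2) exactly.
# A and b are arbitrary values for the illustration.
A, b = 2.0, 1.0
slope2 = (A / b) ** 2

def CD(n=20000):
    """Midpoint rule for (8/b^2) * integral over [0, b] of x/(1 + y'^2)."""
    dx = b / n
    integral = sum((i + 0.5) * dx / (1.0 + slope2) for i in range(n)) * dx
    return 8.0 / b ** 2 * integral

print(CD())                      # 4/(1 + 4) = 0.8
print(4.0 / (1.0 + slope2))      # exact value for comparison
```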

Variations of this problem were considered by Newton: one is the curve CBD, shown

in figure 3.7, rotated about Oy.


Figure 3.7 Diagram showing the modified geometry considered by Newton. Here the variable a is an unknown, the line CB is parallel to the x-axis and the coordinates of C are (0, A).

In this problem the position D is fixed, but the position of B is not; it is merely

constrained to be on the line y = A, parallel to Ox. The resisting force is now given by

the functional

F₁[y]/(4πρv²) = (1/2) a² + ∫_a^b dx x/(1 + y′²), y(a) = A, y(b) = 0.  (3.22)

Now the path y(x) and the number a are to be chosen to make the functional stationary.

Problems such as this, where the position of one (or both) of the end points is also to be determined, are known as variable end point problems and are dealt with in chapter 10.

Given a river with straight, parallel banks a distance b apart, and a boat that can travel with constant speed c in still water, the problem is to cross the river in the shortest time, starting and landing at given points. If the y-axis is chosen to be the left bank, the starting point to be the origin, O, and the water is assumed to be moving parallel to the banks with speed v(x), a known function of the distance from the left-hand bank, then the time of passage along the path y(x) is, assuming c > max(v(x)),

    T[y] = ∫_0^b dx [√(c²(1 + y′²) − v(x)²) − v(x)y′]/(c² − v(x)²),   y(0) = 0, y(b) = B,

where the final destination is a distance B along the right-hand bank. The derivation of this result is set in exercise 3.22, one of the harder exercises at the end of this chapter. A variation of this problem is obtained by not defining the terminus, so there is only one boundary condition, y(0) = 0, and then we need to find both the path, y(x), and the terminal point. It transpires that this is an easier problem and that the path is the solution of y′(x) = v(x)/c, as is shown in exercise 10.7 (page 262).

Among all curves, y(x), represented by functions with continuous derivatives, that join the two points Pa and Pb in the plane and have given length L[y], determine that which, together with the ordinates at x = a and x = b and the segment of the x-axis between them, encloses the largest area, S[y], shown in figure 3.8.

3.5. EXAMPLES OF FUNCTIONALS 111

Figure 3.8 Diagram showing the area, S[y], under a curve of given length, L[y], joining Pa to Pb.

This is a classic problem discussed by Pappus of Alexandria in about 300 AD. Pappus

showed, in Book V of his collection, that of two regular polygons having equal perimeters

the one with the greater number of sides has the greater area. In the same book he

demonstrates that for a given perimeter the circle has a greater area than does any

regular polygon. This work seems to follow closely the earlier work of Zenodorus (circa

180 BC): extant fragments of his work include a proposition that of all solid figures, the

surface areas of which are equal, the sphere has the greatest volume.

Returning to figure 3.8, a modern analytic treatment of the problem requires a differentiable function y(x), satisfying y(a) = A, y(b) = B, such that the area,

    S[y] = ∫_a^b dx y,

is as large as possible, subject to the length of the curve,

    L[y] = ∫_a^b dx √(1 + y′²),

being given.

This problem differs from the first three because an additional constraint — the

length of the curve — is imposed. We consider this type of problem in chapter 12.

A related classic problem is to determine the shape assumed by a uniform, flexible chain hanging between supports at both ends. In figure 3.9 we show an example of such a curve when the points of support, (−a, A) and (a, A), are at the same height.


Figure 3.9 Diagram showing the catenary formed by a uniform chain hanging between two points at the same height.

If the lowest point of the chain is taken as the origin, the catenary equation is shown in section 12.2.3 to be

    y = c(cosh(x/c) − 1)   (3.23)

for some constant c determined by the length of the chain and the value of a.

If a curve is described by a differentiable function y(x) it can be shown, see exercise 3.19, that the potential energy E of the chain is proportional to the functional

    S[y] = ∫_{−a}^{a} dx y√(1 + y′²).

The curve that minimises this functional, subject to the length of the chain,

    L[y] = ∫_{−a}^{a} dx √(1 + y′²),

remaining constant, is the shape assumed by the hanging chain. In common with the previous example, the catenary problem involves a constraint — again the length of the chain — and is dealt with using the methods described in chapter 12.

Light and other forms of electromagnetic radiation are wave phenomena. However, in

many common circumstances light may be considered to travel along lines joining the

source to the observer: these lines are named rays and are often straight lines. This is

why most shadows have distinct edges and why eclipses of the Sun are so spectacular.

In a vacuum, and normally in air, these rays are straight lines and the speed of light in a vacuum is c ≈ 3.0 × 10¹⁰ cm/sec, independent of its colour. In other uniform media, for example water, the rays also travel in straight lines, but the speed is different: if the speed of light in a uniform medium is cₘ then the refractive index is defined to be the ratio n = c/cₘ. The refractive index usually depends on the wavelength: thus for water it is 1.333 for red light (wavelength 6.5 × 10⁻⁵ cm) and 1.343 for blue light (wavelength 4.5 × 10⁻⁵ cm); this difference in the refractive index is one cause of rainbows.

In non-uniform media, in which the refractive index depends upon position, light rays

follow curved paths. Mirages are one consequence of a position-dependent refractive

index.

A simple example of the ray description of light is the reflection of light in a plane

mirror. In diagram 3.10 the source is S and the light ray is reflected from the mirror


at R to the observer at O. The plane of the mirror is perpendicular to the page and it

is assumed that the plane SRO is in the page.

Figure 3.10 Diagram showing the path of a light ray from the source S to an observer O, via a reflection at R. The angles of incidence and of reflection are defined to be θ1 and θ2, respectively.

It is known that light travels in straight lines and is reflected from the mirror at a

point R as shown in the diagram. But without further information the position of R is

unknown. Observations, however, show that the angle of incidence, θ1 , and the angle

of reflection, θ2 , are equal. This law of reflection was known to Euclid (circa 300 BC)

and Aristotle (384 – 322 BC); but it was Hero of Alexandria (circa 125 BC) who showed

by geometric argument that the equality of the angles of incidence and reflection is a

consequence of the Aristotelean principle that nature does nothing the hard way; that

is, if light is to travel from the source S to the observer O via a reflection in the mirror

then it travels along the shortest path.

This result was generalised by the French mathematician Fermat (1601 – 1665) into

what is now known as Fermat’s principle which states that the path taken by light rays

is that which minimises the time of passage11. For the mirror, because the speed along

SR and RO is the same this means that the distance along SR plus RO is a minimum.

If AB = d and AR = x, the total distance travelled by the light ray depends only upon x and is

    f(x) = √(x² + h1²) + √((d − x)² + h2²).

This function has a minimum when θ1 = θ2 , that is when the angle of incidence, θ1 ,

equals the angle of reflection, θ2 , see exercise 3.14.

In general, for light moving in the Oxy-plane, in a medium with refractive index n(x, y), with the source at the origin and observer at (a, A), the time of passage, T, along an arbitrary path y(x) joining these points is

    T[y] = (1/c) ∫_0^a dx n(x, y)√(1 + y′²),   y(0) = 0, y(a) = A.

This follows because the time taken to travel along an element of length δs is n(x, y)δs/c and δs = √(1 + y′(x)²) δx. If the refractive index, n(x, y), is constant then this integral reduces to the integral 3.1 and the path of a ray is a straight line, as would be expected.

11 Fermat’s original statement was that light travelling between two points seeks a path such that the

number of waves is equal, as a first approximation, to that in a neighbouring path. This formulation

has the form of a variational principle, which is remarkable because Fermat announced this result in

1658, before the calculus of either Newton or Leibniz was developed.


Fermat's principle can be used to show that for light reflected at a mirror the angle of incidence equals the angle of reflection. For light crossing the boundary between two media it gives Snell's law,

    sin α1/sin α2 = c1/c2,

where α1 and α2 are the angles between the ray and the normal to the boundary and ck is the speed of light in medium k, as shown in figure 3.11: in water the speed of light is approximately c2 = c1/1.3, where c1 is the speed of light in air, so 1.3 sin α2 = sin α1.
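As a small illustration of Snell's law (the incident angle and the ratio 1.3 for water below are sample values, not taken from the text):

```python
import math

# Snell's law, sin(a1)/sin(a2) = c1/c2: given the angle in air, find the
# angle of the refracted ray in water, where c2 = c1/1.3.
def refraction_angle(alpha1_deg, ratio=1.3):
    """Angle (degrees) of the ray in water for incident angle alpha1 in air."""
    return math.degrees(math.asin(math.sin(math.radians(alpha1_deg)) / ratio))

alpha2 = refraction_angle(45.0)
print(alpha2)   # about 33 degrees: the ray bends towards the normal
```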

Figure 3.11 Diagram showing the refraction of light at the surface of water. The angles of incidence and refraction are defined to be α2 and α1 respectively; these are connected by Snell's law.

In figure 3.11 the observer at O sees an object S in a pond and the light ray from S to O travels along the two straight lines SN and NO, but the observer perceives the object to be at S′, on the straight line OS′. This explains why a stick put partly into water appears bent.

Newton’s laws of motion accurately describe a significant portion of the physical world,

from the motion of large molecules to the motion of galaxies. However, Newton’s

original formulation is usually difficult to apply to even quite simple mechanical systems

and hides the mathematical structure of the equations of motion, which is important

for the advanced developments in dynamics and for finding approximate solutions. It

transpires that in many important circumstances Newton’s equations of motion can be

expressed as a variational principle the solution of which is the equations of motion.

This reformulation took some years to accomplish and was originally motivated partly

by Snell’s law and Fermat’s principle, that minimises the time of passage, and partly

by the ancient philosophical belief in the “Economy of Nature”; for a brief overview of

these ideas the introduction of the book by Yourgrau and Mandelstam (1968) should

be consulted.

The first variational principle for dynamics was formulated in 1744 by Maupertuis

(1698 – 1759), but in the same year Euler (1707 – 1783) described the same principle

more precisely. In 1760 Lagrange (1736 – 1813) clarified these ideas, by first reformu-

lating Newton’s equations of motion into a form now known as Lagrange’s equations of

motion: these are equivalent to Newton’s equations but easier to use because the form

of the equations is independent of the coordinate system used — this basic property


of variational principles is discussed in chapter 6 — and this allows easier use of more

general coordinate systems.

The next major step was taken by Hamilton (1805 – 1865), in 1834, who cast La-

grange’s equations as a variational principle; confusingly, we now name this Lagrange’s

variational principle. Hamilton also generalised this theory to lay the foundations for

the development of modern physics that occurred in the early part of the 20th century.

These developments are important because they provide a coordinate-free formulation

of dynamics which emphasises the underlying mathematical structure of the equations

of motion, which is important in helping to understand how solutions behave.

Summary

These few examples provide some idea of the significance of variational principles. In

summary, they are important for three distinct reasons:

• A variational principle is often the easiest or the only method of formulating a

problem.

• Often conventional boundary value problems may be re-formulated in terms of a

variational principle which provides a powerful tool for approximating solutions.

This technique is introduced in chapter 13.

• A variational formulation provides a coordinate free method of expressing the

laws of dynamics, allowing powerful analytic techniques to be used in ordinary

Newtonian dynamics. The use of variational principles also paved the way for

the formulation of dynamical laws describing motion of objects moving at speeds

close to that of light (special relativity), particles interacting through gravita-

tional forces (general relativity) and the laws of the microscopic world (quantum

mechanics).


Exercise 3.13

Functionals do not need to have the particular form considered in this chapter. The following expressions also map functions to real numbers:

    (b) K[y] = ∫_0^1 dx [a(x)y(x) + y(1)y′(x)];

    (c) L[y] = [xy(x)y′(x)]_0^1 + ∫_0^1 dx [a(x)y′(x) + b(x)y(x)], where a(x) and b(x) are prescribed functions;

    (d) S[y] = ∫_0^1 ds ∫_0^1 dt (s² + st) y(s)y(t).

Find the values of these functionals for the functions y(x) = x² and y(x) = cos πx when a(x) = x and b(x) = 1.
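Functional (d) can be spot-checked numerically for y(x) = x²: the exact value is ∫s⁴ds ∫t²dt + ∫s³ds ∫t³dt = 1/15 + 1/16 = 31/240. The grid size below is an arbitrary choice of ours.

```python
# Midpoint-rule evaluation of S[y] = ∫_0^1 ∫_0^1 (s² + st) y(s)y(t) ds dt
# for y(x) = x², compared with the exact value 31/240.
def double_integral(f, n=400):
    """Composite midpoint rule on the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            t = (j + 0.5) * h
            total += f(s, t)
    return total * h * h

y = lambda x: x * x
S = double_integral(lambda s, t: (s * s + s * t) * y(s) * y(t))
print(S)   # close to 31/240 ≈ 0.1291666...
```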

Exercise 3.14

Show that the function

    f(x) = √(x² + h1²) + √((d − x)² + h2²),

where h1, h2 are defined in figure 3.10 (page 113) and x and d denote the lengths AR and AB respectively, is stationary when θ1 = θ2, where

    sin θ1 = x/√(x² + h1²),   sin θ2 = (d − x)/√((d − x)² + h2²).
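This result can be checked numerically: minimising f(x) on a fine grid, the two sines should agree at the minimum. The geometry h1, h2, d below is chosen arbitrarily for the check; for these values the minimum is at x = d·h1/(h1 + h2) = 4/3.

```python
import math

# Minimise f(x) = sqrt(x² + h1²) + sqrt((d-x)² + h2²) on a grid and check
# that the reflection point equalises the angles of incidence and reflection.
h1, h2, d = 1.0, 2.0, 4.0

def f(x):
    return math.hypot(x, h1) + math.hypot(d - x, h2)

xs = [d * k / 40_000 for k in range(40_001)]
x_min = min(xs, key=f)
sin1 = x_min / math.hypot(x_min, h1)
sin2 = (d - x_min) / math.hypot(d - x_min, h2)
print(x_min, sin1, sin2)   # sin1 and sin2 agree at the minimum
```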

Exercise 3.15

Consider the functional

    S[y] = ∫_0^1 dx y′√(1 + y′),   y(0) = 0, y(1) = B > −1.

(a) Show that the functional is stationary on the straight line y(x) = Bx and that the value of the functional on this line is S[y] = B√(1 + B).

(b) By expanding the integrand of S[y + εh] to second order in ε, show that

    S[y + εh] = S[y] + ε² (4 + 3B)/(8(1 + B)^{3/2}) ∫_0^1 dx h′(x)²,   B > −1,

where h(0) = h(1) = 0.
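A numerical check of this exercise for the sample value B = 1 (the grid and the perturbation h = sin πx are our choices): the value on the straight line should be B√(1 + B) = √2, and since 4 + 3B > 0 for B > −1 the perturbed path should give a larger value.

```python
import math

# S[y] = ∫_0^1 y'√(1+y') dx, evaluated by the trapezoidal rule from the
# slope function dy(x); end points are fixed since h(0) = h(1) = 0.
B, N = 1.0, 2000
step = 1.0 / N

def S(dy):
    vals = [dy(k * step) * math.sqrt(1.0 + dy(k * step)) for k in range(N + 1)]
    return step * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

S_line = S(lambda x: B)                                         # y = Bx
eps = 0.3
S_pert = S(lambda x: B + eps * math.pi * math.cos(math.pi * x)) # y = Bx + ε sin πx
print(S_line, S_pert)   # S_line = √2; the perturbation increases S
```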

3.6. MISCELLANEOUS EXERCISES 117

Exercise 3.16

Using the method described in the text, show that the functionals

    S1[y] = ∫_a^b dx (1 + xy′)y′   and   S2[y] = ∫_a^b dx xy′²,

where b > a > 0, y(b) = B and y(a) = A, are both stationary on the same curve, namely

    y(x) = A + (B − A) ln(x/a)/ln(b/a).

Explain why the same function makes both functionals stationary.

Exercise 3.17

In this exercise the theory developed in section 3.3.1 is extended. The function F(z) has a continuous second derivative and the functional S is defined by the integral

    S[y] = ∫_a^b dx F(y′).

(a) Show that

    S[y + εh] − S[y] = ε ∫_a^b dx (dF/dy′) h′(x) + (ε²/2) ∫_a^b dx (d²F/dy′²) h′(x)² + O(ε³),

where h(a) = h(b) = 0.

(b) Show that if y(x) is chosen to make dF/dy′ constant then the functional is stationary.

(c) Deduce that this stationary path makes the functional either a maximum or a minimum, provided F″(y′) ≠ 0.

Exercise 3.18

Show that the functional

    S[y] = ∫_0^1 dx (1 + y′(x)²)^{1/4},   y(0) = 0, y(1) = B,

is stationary on the straight line y(x) = Bx. In addition, show that this straight line gives a minimum value of the functional only if B < √2, otherwise it gives a maximum.
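The two regimes can be seen numerically (the perturbation h = sin πx and the grid are our own choices): with F(z) = (1 + z²)^{1/4}, perturbing the straight line raises S when B = 1 < √2 but lowers it when B = 2 > √2.

```python
import math

# Trapezoidal evaluation of S[y] = ∫_0^1 (1 + y'^2)^(1/4) dx from the slope
# function dy(x); the perturbed path y = Bx + ε sin πx keeps the end points.
N, eps = 4000, 0.1
step = 1.0 / N

def S(dy):
    vals = [(1.0 + dy(k * step) ** 2) ** 0.25 for k in range(N + 1)]
    return step * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

results = {}
for B in (1.0, 2.0):
    s_line = S(lambda x, B=B: B)   # B=B: capture the current B in the lambda
    s_pert = S(lambda x, B=B: B + eps * math.pi * math.cos(math.pi * x))
    results[B] = (s_line, s_pert)
print(results)   # perturbation raises S for B = 1, lowers it for B = 2
```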

Harder exercises

Exercise 3.19

If a uniform, flexible, inextensible chain of length L is suspended between two

supports having the coordinates (a, A) and (b, B), with the y-axis pointing verti-

cally upwards, show that, if the shape assumed by the chain Ris described by the

b

p

differentiable function y(x), then its length is given by L[y] = a dx 1 + y 0 2 and

its potential energy by

Z b p

E[y] = gρ dx y 1 + y 0 2 , y(a) = A, y(b) = B,

a

where ρ is the line-density of the chain and g the acceleration due to gravity.


Exercise 3.20

This question is about the shortest distance between two points on the surface of a right-circular cylinder, so is a generalisation of the theory developed in section 3.2.

(a) If the cylinder axis coincides with the z-axis we may use the polar coordinates (ρ, φ, z) to label points on the cylindrical surface, where ρ is the cylinder radius. Show that the Cartesian coordinates (x, y) of a point are given by x = ρ cos φ, y = ρ sin φ, and hence that the distance between two adjacent points on the cylinder, (ρ, φ, z) and (ρ, φ + δφ, z + δz), is, to first order, given by δs² = ρ²δφ² + δz².

(b) A curve on the surface may be defined by prescribing z as a function of φ. Show that the length of a curve from φ = φ1 to φ2 is

    L[z] = ∫_{φ1}^{φ2} dφ √(ρ² + z′(φ)²).

(c) Deduce that the shortest distance on the cylinder between the two points (ρ, 0, 0) and (ρ, α, ζ) is along the curve z = ζφ/α.
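Part (c) can be checked numerically: the helix z = ζφ/α has length √(ρ²α² + ζ²), which is the straight-line distance on the unrolled cylinder, and any perturbed curve with the same end points is longer. The parameter values below are arbitrary.

```python
import math

# Trapezoidal evaluation of L[z] = ∫_0^α sqrt(ρ² + z'(φ)²) dφ from the slope
# function dz(φ); the perturbation 0.5 sin(πφ/α) vanishes at both end points.
rho, alpha, zeta = 2.0, 1.5, 3.0
N = 4000
dphi = alpha / N

def length(dz):
    vals = [math.sqrt(rho**2 + dz(k * dphi) ** 2) for k in range(N + 1)]
    return dphi * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

L_helix = length(lambda p: zeta / alpha)
L_exact = math.sqrt(rho**2 * alpha**2 + zeta**2)
L_pert = length(lambda p: zeta / alpha + 0.5 * math.pi / alpha * math.cos(math.pi * p / alpha))
print(L_helix, L_exact, L_pert)   # L_helix = L_exact, and L_pert is larger
```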

Exercise 3.21

An inverted cone has its apex at the origin and axis along the z-axis. Let α be the angle between this axis and the sides of the cone, and define a point on the conical surface by the coordinates (ρ, φ), where ρ is the perpendicular distance to the z-axis and φ is the polar angle measured from the x-axis.

Show that the distance on the cone between adjacent points (ρ, φ) and (ρ + δρ, φ + δφ) is, to first order,

    δs² = ρ²δφ² + δρ²/sin²α.

Hence show that if ρ(φ), φ1 ≤ φ ≤ φ2, is a curve on the conical surface then its length is

    L[ρ] = ∫_{φ1}^{φ2} dφ √(ρ² + ρ′²/sin²α).

Exercise 3.22

A straight river of uniform width b flows with velocity (0, v(x)), where the axes are chosen so the left-hand bank is the y-axis and where v(x) > 0. A boat can travel with constant speed c > max(v(x)) relative to still water. If the starting and landing points are chosen to be the origin and (b, B), respectively, show that the path giving the shortest time of crossing is given by minimising the functional

    T[y] = ∫_0^b dx [√(c²(1 + y′²) − v(x)²) − v(x)y′]/(c² − v(x)²),   y(0) = 0, y(b) = B.

Exercise 3.23

In this exercise the basic dynamics required for the derivation of the minimum resistance functional, equation 3.21, is derived. This exercise is optional, because it requires knowledge of elementary mechanics which is not part of, or a prerequisite of, this course.

Consider a block of mass M sliding smoothly on a plane, the cross section of which is shown in figure 3.12.

Figure 3.12 Diagram showing the speeds of the block, M, and of the particle, m, before and after the collision.

The block is moving from left to right, with speed V, towards a small particle of mass m moving with speed v, such that initially the distance between the particle and the block is decreasing. Suppose that after the inevitable collision the block is moving with speed V′, in the same direction, and the particle is moving with speed v′ to the right. Use conservation of energy and linear momentum to show that (V′, v′) are related to (V, v) by the equations

    MV² + mv² = MV′² + mv′²   and   MV − mv = MV′ + mv′.

Hence show that

    V′ = V − (2m/(M + m))(V + v)   and   v′ = (2MV + (M − m)v)/(M + m).

Show that in the limit m/M → 0, V′ = V and v′ = 2V + v, and give a physical interpretation of these equations.
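The stated formulas can be checked against both conservation laws; the masses and speeds below are sample values.

```python
# Given the pre-collision speeds (V, v), compute the post-collision speeds
# (V', v') from the formulas of exercise 3.23 and confirm that energy and
# momentum are conserved.
def after_collision(M, m, V, v):
    Vp = V - 2.0 * m / (M + m) * (V + v)
    vp = (2.0 * M * V + (M - m) * v) / (M + m)
    return Vp, vp

M, m, V, v = 5.0, 0.5, 3.0, 1.0
Vp, vp = after_collision(M, m, V, v)
energy_error = M * V**2 + m * v**2 - (M * Vp**2 + m * vp**2)
momentum_error = (M * V - m * v) - (M * Vp + m * vp)
print(Vp, vp, energy_error, momentum_error)   # the errors vanish
```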


Chapter 4

4.1 Introduction

In this chapter we apply the methods introduced in section 3.2 to more general problems and derive the most important result of the Calculus of Variations. We show that for the functional

    S[y] = ∫_a^b dx F(x, y, y′),   y(a) = A, y(b) = B,   (4.1)

where F(x, u, v) is a real function of three real variables, a necessary and sufficient condition for the twice differentiable function y(x) to be a stationary path is that it satisfies the equation

    d/dx(∂F/∂y′) − ∂F/∂y = 0   and the boundary conditions y(a) = A, y(b) = B.   (4.2)

This equation is known either as Euler’s equation or the Euler-Lagrange equation, and is

a second-order equation for y(x), exercise 3.10 (page 103). Conditions for a stationary

path to give either a local maximum or a local minimum are more difficult to find and

we defer a discussion of this problem to chapter 8.

In order to derive the Euler-Lagrange equation it is helpful to first discuss some

preliminary ideas. We start by briefly describing Euler’s original analysis, because

it provides an intuitive understanding of functionals and provides a link between the

calculus of functions of many variables and the Calculus of Variations. This leads

directly to the idea of the rate of change of a functional, which is required to define

a stationary path. This section is followed by the proof of the fundamental lemma of

the Calculus of Variations which is essential for the derivation of the Euler-Lagrange

equation, which follows.

The Euler-Lagrange equation is usually a nonlinear boundary value problem: this combination causes severe difficulties, both theoretical and practical. First, solutions may not exist and, if they do, uniqueness is not ensured; second, if solutions do exist it is often difficult to compute them. These difficulties are in sharp contrast to initial value problems and, because the differences are so marked, in section 4.5 we compare these two types of equations in a little detail. Finally, in section 4.6, we show why the limiting process used by Euler is subtle and can lead to difficulties.


122 CHAPTER 4. THE EULER-LAGRANGE EQUATION

4.2.1 Relation to differential calculus

Euler (1707 – 1783) was the first to make a systematic study of problems that can

be described by functionals, though it was Lagrange (1736 – 1813) who developed the

method we now use. Euler studied functionals having the form defined in equation 4.1.

He related these functionals to functions of many variables using the simple device of dividing the abscissa into N + 1 equal intervals,

    a = x0, x1, x2, . . . , xN, xN+1 = b,   where xk+1 − xk = δ,

and replacing the curve y(x) with segments of straight lines with vertices at the points (xk, yk), as shown in figure 4.1.

Figure 4.1 Diagram showing the rectification of a curve by a series of six straight lines, N = 5.

Approximating the derivative at xk by the difference (yk − yk−1)/δ, the functional 4.1 is replaced by a function of the N variables (y1, y2, · · · , yN),

    S(y1, y2, · · · , yN) = δ Σ_{k=1}^{N+1} F(xk, yk, (yk − yk−1)/δ),   where δ = (b − a)/(N + 1),   (4.3)

with y0 = A and yN+1 = B. This connection with functions of many variables can illuminate the nature of functionals and, if all else fails, it can be used as the basis of a numerical approximation; examples of this procedure are given in

exercises 4.1 and 4.21. The integral 4.1 is obtained from this sum by taking the limit

N → ∞; similarly the Euler-Lagrange equation 4.2 may be derived by taking the same limit of the N algebraic equations ∂S/∂yk = 0, k = 1, 2, · · · , N , see exercise 4.30 (page 141).

In any mathematical analysis care is usually needed when such limits are taken and the

Calculus of Variations is no exception; however, here we discuss these problems only

briefly, in section 4.6.

Euler made extensive use of this method of finite differences. By replacing smooth curves by polygonal lines he reduced the problem of finding stationary paths of functionals to finding stationary points of a function of N variables: he then obtained exact solutions by passing to the limit N → ∞.

4.2. PRELIMINARY REMARKS 123

Functionals may be regarded as functions of infinitely many variables — that is, the values of the function y(x) at distinct points — and the Calculus of Variations may be regarded as the corresponding analogue of differential calculus.

Exercise 4.1

If the functional depends only upon y′,

    S[y] = ∫_a^b dx F(y′),   y(a) = A, y(b) = B,

show that the approximation defined by equation 4.3 becomes

    S(y1, y2, · · · , yN) = δ { F((y1 − A)/δ) + F((y2 − y1)/δ) + · · · + F((yk − yk−1)/δ) + · · · + F((yN − yN−1)/δ) + F((B − yN)/δ) }.

Hence show that a stationary point of S satisfies the equations

    F′((yk − yk−1)/δ) = c,   k = 1, 2, · · · , N + 1,

where y0 = A, yN+1 = B and c is a constant, independent of k. Deduce that, if F(z) is sufficiently smooth, S(y1, y2, · · · , yN) is stationary when the points (xk, y(xk)) lie on a straight line.
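The discretised problem is easy to explore numerically. A minimal sketch, with F(z) = √(1 + z²) (the shortest-path functional) and our own choice of grid and gradient-descent parameters, shows the stationary point approaching the straight line:

```python
import math

# Minimise the discretised functional of exercise 4.1 by plain gradient
# descent: the interior values y1..yN should settle onto the straight line
# joining (0, A) to (1, B).
A, B, N = 0.0, 1.0, 9
delta = 1.0 / (N + 1)

def gradient(y):
    """dS/dy_k = F'(s_{k-1}) - F'(s_k), with F'(z) = z/sqrt(1 + z**2)."""
    full = [A] + y + [B]
    slopes = [(full[k + 1] - full[k]) / delta for k in range(N + 1)]
    Fp = [s / math.hypot(1.0, s) for s in slopes]
    return [Fp[k] - Fp[k + 1] for k in range(N)]

y = [0.0] * N                       # a deliberately poor starting path
for _ in range(20_000):
    y = [yk - 0.02 * gk for yk, gk in zip(y, gradient(y))]

line = [A + (k + 1) * delta * (B - A) for k in range(N)]
err = max(abs(u - v) for u, v in zip(y, line))
print(err)   # tiny: the stationary values lie on the straight line
```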

The stationary points of a function of n variables are where all n first partial derivatives

vanish. The stationary paths of a functional are defined in a similar manner and

the purpose of this section is to introduce the idea of the derivative of a functional

and to show how it may be calculated. First, however, it is necessary to make a few

preliminary remarks in order to emphasise the important differences between functionals

and functions of n variables: we return to these problems later.

In the study of functions of n variables, it is convenient to use geometric language

and to regard the set of n numbers (x1 , x2 , · · · , xn ) as a point in an n-dimensional

space. Similarly, we regard each function y(x), belonging to a given class of functions,

as a point in some function space.

For functions of n variables it is sufficient to consider a single space, for instance

the n-dimensional Euclidean space. But, there is no universal function space and the

nature of the problem determines the choice of function space. For instance, when

dealing with a functional of the form 4.1 it is natural to use the set of all functions with

a continuous first derivative. In the case of functionals of the form

Z b

dx F (x, y, y 0 , y 00 )

a

The concept of continuity of functions is important and you will recall, section 1.3.2,

that a function f (x) is continuous at x = c if the values of f (x) at neighbouring values

of c are close to f (c); more precisely we require that

lim f (c + ) = f (c).

→0


Remember that if the derivative of a function exists at a point x, then the function is continuous at x.

The type of functional defined by equation 4.1 involves paths joining the points

(a, A) and (b, B) which are differentiable or piecewise differentiable for a ≤ x ≤ b.

In order to find a stationary path we need to compare values of the functional on

nearby paths; this means that a careful definition of the distance between nearby paths

(functions) is important. This is achieved most easily by using the notion of a norm of

a function. A norm defined on a function space is a map taking elements of the space

to the non-negative real numbers; it represents the ‘distance’ from an element to the

origin (zero function). It has the same properties as the Euclidean distance defined in

equation 1.2 (page 11).

In Rn the Euclidean distance suffices for most purposes. In infinite dimensional

function spaces there is no obvious choice of norm that can be used in all circumstances.

Use of different norms and the corresponding concepts of ‘distance’ can lead to different

classifications of stationary paths as is seen in section 4.6.

For this reason it is usual to distinguish between a function space and a normed

space by using a different name whenever a specific norm on the set of functions is being

considered. For example, we have introduced the space C0 [a, b] of continuous functions

on the interval [a, b]. One of the simplest norms on this space is the supremum norm1

    ‖y‖ = max_{a≤x≤b} |y(x)|,

and this norm can be shown to satisfy the conditions of equation 1.3 (page 11). The

‘distance’ between two functions y and z is of course ‖y − z‖. When we wish to

emphasise that we are considering this particular normed space, and not just the space

of continuous functions, we shall write D0 [a, b], by which we shall mean the space of

continuous functions with the specified norm. When we write C0 [a, b], no particular

norm is implied.

In what follows, we shall sometimes need to restrict attention to functions which

have a continuous and bounded derivative. A suitable norm for such functions is

    ‖y‖1 = max_{a≤x≤b} |y(x)| + max_{a≤x≤b} |y′(x)|,

and we shall denote by D1 [a, b] the normed space of functions with continuous bounded

derivative equipped with the norm ‖ · ‖1 defined above. This space consists of the same

functions as the space C1 [a, b], but as before use of the latter notation will not imply

the use of any particular norm on the space.

It is usually necessary to restrict the class of functions we consider to the subset

of all possible functions that satisfy the boundary conditions, if defined. Normally we

shall simply refer to this restricted class of functions as the admissible functions: these

are defined to be those differentiable functions that satisfy any boundary conditions

and, in most circumstances, to be in D1[a, b], because it is important to bound the variation in y′(x). Later we shall be less restrictive and allow piecewise differentiable

functions.

We now come to the most important part of this section, that is the idea of the rate

of change of a functional which is implicit in the idea of a stationary path. Recall that a

1 In analysis texts max |y(x)| is replaced by sup |y(x)|, but for continuous functions on closed, finite intervals the two coincide.


function of n real variables, G(x1, x2, · · · , xn), is stationary at a point if all its first partial derivatives are zero, ∂G/∂xk = 0, k = 1, 2, · · · , n.

This result follows by considering the difference between the values of G(x) at adjacent points using the first-order Taylor expansion, equation 1.39 (page 36),

    G(x + εξ) − G(x) = ε Σ_{k=1}^{n} (∂G/∂xk) ξk + O(ε²),   |ξ| = 1,

dividing by ε and taking the limit ε → 0,

    ΔG(x, ξ) = lim_{ε→0} [G(x + εξ) − G(x)]/ε = Σ_{k=1}^{n} (∂G/∂xk) ξk.   (4.4)

A stationary point is defined to be one at which the rate of change, ∆G(x, ξ), is zero

in every direction; it follows that at a stationary point all first partial derivatives must

be zero.

The idea embodied in equation 4.4 may be applied to the functional

    S[y] = ∫_a^b dx F(x, y, y′),   y(a) = A, y(b) = B,

which has a real value for each admissible function y(x). The rate of change of a functional S[y] is obtained by examining the difference between neighbouring admissible paths, S[y + εh] − S[y]; since both y(x) and y(x) + εh(x) are admissible functions for all real ε, it follows that h(a) = h(b) = 0. This difference is a function of the real variable ε, so we define the rate of change of S[y] by the limit,

    ΔS[y, h] = lim_{ε→0} (S[y + εh] − S[y])/ε = d/dε S[y + εh] |_{ε=0},   (4.5)

which we assume exists. The functional ΔS depends upon both y(x) and h(x), just as the limit of the difference [G(x + εξ) − G(x)]/ε, of equation 4.4, depends upon x and ξ.

Definition 4.1
The functional S[y] is said to be stationary if y(x) is an admissible function and if ΔS[y, h] = 0 for all h(x) for which y(x) and y(x) + εh(x) are admissible.

The functions for which S[y] is stationary are named stationary paths. The stationary path, y(x), and the varied path y(x) + εh(x) must be admissible: for most variational problems considered in this chapter both paths need to satisfy the boundary conditions, so h(a) = h(b) = 0. But in more general problems considered later, particularly in chapter 10, these conditions on h(x) are removed, but see exercises 4.12 and 4.13. If y(x) is an admissible path the allowed variations, h(x), are defined to be those for which y(x) + εh(x) is admissible.

On a stationary path the functional may achieve a maximum or a minimum value,

and then the path is named an extremal. The nature of stationary paths is usually

determined by the term O(2 ) in the expansion of S[y + h]: this theory is described in

chapter 8.


It may happen that the rate of change

    ΔS[y, h] = d/dε S[y + εh] |_{ε=0}

is linear in h, that is, if c is any constant then ΔS[y, ch] = cΔS[y, h]; in this case it is named the Gâteaux differential.

Notice that if S is an ordinary function of n variables, (y1, y2, · · · , yn), rather than a functional, then the Gâteaux differential is

    ΔS = lim_{ε→0} d/dε S(y + εh) = Σ_{k=1}^{n} (∂S/∂yk) hk,

which has the same form as equation 4.4.

As an example, consider the functional

    S[y] = ∫_a^b dx √(1 + y′²),   y(a) = A, y(b) = B,

for the distance between (a, A) and (b, B), discussed in section 3.2.1. We have

    d/dε S[y + εh] = ∫_a^b dx d/dε √(1 + (y′ + εh′)²) = ∫_a^b dx (y′ + εh′)h′/√(1 + (y′ + εh′)²).

Note that we may change the order of differentiation with respect to ε and integration with respect to x because a and b are independent of ε and all integrands are assumed to be sufficiently well-behaved functions of x and ε. Hence, on putting ε = 0,

    ΔS[y, h] = d/dε S[y + εh] |_{ε=0} = ∫_a^b dx y′h′/√(1 + y′²).
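This example is easy to verify numerically: a finite-difference estimate of dS[y + εh]/dε at ε = 0 should match the integral just derived. The paths y = x², h = x(1 − x) on [0, 1] and the grid are our own choices for the check.

```python
import math

# Compare the analytic Gâteaux differential, ∫ y'h'/√(1+y'²) dx, with a
# central-difference estimate of dS[y+εh]/dε at ε = 0, both evaluated with
# the same trapezoidal rule so that discretisation errors cancel.
N = 2000
xs = [k / N for k in range(N + 1)]

def trap(vals):
    return (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1]) / N

def S(eps):
    # y' = 2x and h' = 1 - 2x, so (y + eps*h)' = 2x + eps*(1 - 2x)
    return trap([math.sqrt(1.0 + (2*x + eps*(1 - 2*x))**2) for x in xs])

analytic = trap([2*x*(1 - 2*x) / math.sqrt(1.0 + 4*x*x) for x in xs])
numeric = (S(1e-5) - S(-1e-5)) / 2e-5
print(analytic, numeric)   # the two estimates agree closely
```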

For our final comment, we note the approximation defined in equation 4.3 (page 122) gives a function of N variables, so the associated differential is

    ΔS[y, h] = lim_{ε→0} [S(y1 + εh1, · · · , yN + εhN) − S(y1, · · · , yN)]/ε = Σ_{k=1}^{N} (∂S/∂yk) hk,   hk = h(xk).

Comparing this with ∆G, equation 4.4, we can make the equivalences y ≡ x and h ≡ ξ.

However, for functions of N variables there is no relation between the variables ξk and ξk+1, but h(x) is differentiable, so |hk − hk+1| = O(δ). This suggests that some care is

required in taking the limit N → ∞ of equation 4.3 and shows why problems involving

finite numbers of variables can be different from those with infinitely many variables

and why the choice of norms, discussed above, is important. Nevertheless, provided

caution is exercised, the analogy with functions of several variables can be helpful.

4.3. THE FUNDAMENTAL LEMMA 127

Exercise 4.2

Find the Gâteaux differentials of the following functionals:

(a) S[y] = ∫_0^{π/2} dx ( y′² − y² ),          (b) S[y] = ∫_a^b dx y′²/x³,   b > a > 0,

(c) S[y] = ∫_a^b dx ( y′² + y² + 2ye^x ),      (d) S[y] = ∫_0^1 dx √(x² + y²) √(1 + y′²).

Exercise 4.3

Show that the Gâteaux differential of the functional,

    S[y] = ∫_a^b ds ∫_a^b dt K(s, t) y(s) y(t)

is

    ∆S[y, h] = ∫_a^b ds h(s) ∫_a^b dt ( K(s, t) + K(t, s) ) y(t).
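Because S is quadratic in y, this formula can be checked exactly on a grid. The sketch below (plain Python; the kernel K(s, t) = e^{s−2t}, the path y(s) = s² and the variation h(s) = s(1 − s) are arbitrary illustrative choices, not part of the exercise) compares a central difference in ε with the symmetrized-kernel integral:

```python
from math import exp

n = 100
w = 1.0 / n
pts = [(k + 0.5) * w for k in range(n)]   # midpoint grid on [0, 1]

K = lambda s, t: exp(s - 2.0 * t)         # an arbitrary non-symmetric kernel
y = lambda s: s * s
v = lambda s: s * (1.0 - s)               # the variation h(s)

def S(f):
    """Discretized S[f] = ∫∫ K(s,t) f(s) f(t) ds dt."""
    return w * w * sum(K(s, t) * f(s) * f(t) for s in pts for t in pts)

eps = 1e-5
dS_fd = (S(lambda s: y(s) + eps * v(s)) - S(lambda s: y(s) - eps * v(s))) / (2 * eps)
dS_exact = w * w * sum(v(s) * (K(s, t) + K(t, s)) * y(t) for s in pts for t in pts)

print(abs(dS_fd - dS_exact) < 1e-7)
```

Since the functional is quadratic, the central difference in ε is exact up to rounding, so the discrepancy is essentially machine noise.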

This section contains the essential result upon which the Calculus of Variations depends.

Using the result obtained here we will be able to use the stationary condition that

∆S[y, h] = 0, for all suitable h(x), to form a differential equation for the unknown

function y(x).

The result required is the following: if z(x) is continuous for a ≤ x ≤ b, and if

    ∫_a^b dx z(x)h(x) = 0

for all functions h(x) that are continuous for a ≤ x ≤ b and are zero at x = a and x = b, then z(x) = 0 for a ≤ x ≤ b.

In order to prove this we assume, on the contrary, that z(η) ≠ 0 for some η satisfying a < η < b. Then, since z(x) is continuous, there is an interval [x₁, x₂] around η with

    a < x₁ ≤ η ≤ x₂ < b

in which z(x) ≠ 0. We now construct a suitable function h(x) that yields a contradiction. Define h(x) to be

    h(x) = { (x − x₁)(x₂ − x),   x₁ ≤ x ≤ x₂,
           { 0,                  otherwise,

so that

    ∫_a^b dx z(x)h(x) = ∫_{x₁}^{x₂} dx z(x)(x − x₁)(x₂ − x) ≠ 0,


since the integrand is continuous and non-zero on (x₁, x₂). However, by hypothesis ∫_a^b dx z(x)h(x) = 0, so we have a contradiction.

Thus the assumptions that z(x) is continuous and z(x) ≠ 0 for some x ∈ (a, b)

lead to a contradiction and we deduce that z(x) = 0 for a < x < b: because z(x) is

continuous it follows that z(x) = 0 for a ≤ x ≤ b. This result is named the fundamental

lemma of the Calculus of Variations.

This proof assumed only that h(x) is continuous and made no assumptions about

its differentiability. In previous applications h(x) had to be differentiable for x ∈ (a, b).

However, for the function h(x) defined above h′(x) does not exist at x₁ and x₂. The

proof is easily modified to deal with this case. If h(x) needs to be n times differentiable

then we use the function

    h(x) = { (x − x₁)^{n+1} (x₂ − x)^{n+1},   x₁ ≤ x ≤ x₂,
           { 0,                               otherwise.

Exercise 4.4

In this exercise a result due to du Bois-Reymond (1831 – 1889) which is closely

related to the fundamental lemma will be derived. This is required later, see

exercise 4.11.

If z(x) and h′(x) are continuous, h(a) = h(b) = 0 and

    ∫_a^b dx z(x)h′(x) = 0

for all such h(x), then z(x) is constant for a ≤ x ≤ b.

Prove this result by defining a constant C and a function g(x) by the relations

    C = (1/(b − a)) ∫_a^b dx z(x)   and   g(x) = ∫_a^x dt ( C − z(t) ),

noting that g(a) = g(b) = 0, so that g(x) is an admissible choice of h(x), and that

    ∫_a^b dx z(x)g′(x) = ∫_a^b dx z(x)( C − z(x) ) = − ∫_a^b dx ( C − z(x) )².

This section contains the most important result of this chapter. Namely, that if

F (x, u, v) is a sufficiently differentiable function of three variables, then a necessary

and sufficient condition for the functional²

    S[y] = ∫_a^b dx F(x, y, y′),   y(a) = A,  y(b) = B,    (4.6)

² Many texts state that a necessary condition for y(x) to be an extremal of S[y] is that it satisfies the

Euler-Lagrange equation. Here we consider stationary paths and then the condition is also sufficient.

4.4. THE EULER-LAGRANGE EQUATIONS 129

to be stationary on the path y(x) is that it satisfies the differential equation and bound-

ary conditions,

    d/dx( ∂F/∂y′ ) − ∂F/∂y = 0,   y(a) = A,  y(b) = B.    (4.7)

This is named Euler’s equation or the Euler-Lagrange equation. It is a second-order

differential equation, as shown in exercise 3.10, and is the analogue of the conditions

∂G/∂xk = 0, k = 1, 2, · · · , n, for a function of n real variables to be stationary, as

discussed in section 4.2.2. We now derive this equation.

The integral 4.6 is defined for functions y(x) that are differentiable for a ≤ x ≤ b.

Using equation 4.5 we find that the rate of change of S[y] is

    ∆S[y, h] = d/dε ∫_a^b dx F( x, y + εh, y′ + εh′ ) |_{ε=0}

             = ∫_a^b dx d/dε F( x, y + εh, y′ + εh′ ) |_{ε=0} .    (4.8)

The integration limits a and b are independent of ε and we assume that the order of integration and differentiation may be interchanged. The integrand of equation 4.8 is a total derivative with respect to ε and equation 1.21 (page 26) shows how to write this expression in terms of the partial derivatives of F. Using equation 1.21 with n = 3, t = ε and the variable changes (x₁, x₂, x₃) = (x, y, y′) and (h₁, h₂, h₃) = (0, h(x), h′(x)), we obtain

    d/dε F( x, y + εh, y′ + εh′ ) = h ∂F/∂y + h′ ∂F/∂y′ .

Now set ε = 0, so the partial derivatives are evaluated at (x, y, y′), to obtain

    ∆S[y, h] = ∫_a^b dx ( h(x) ∂F/∂y + h′(x) ∂F/∂y′ ).    (4.9)

The second term of the integrand may be integrated by parts,

    ∫_a^b dx h′(x) ∂F/∂y′ = [ h(x) ∂F/∂y′ ]_a^b − ∫_a^b dx h(x) d/dx( ∂F/∂y′ ),

assuming that Fy′ is differentiable. But h(a) = h(b) = 0, so the boundary term on the right-hand side vanishes and the rate of change of the functional S[y] becomes

    ∆S[y, h] = − ∫_a^b dx [ d/dx( ∂F/∂y′ ) − ∂F/∂y ] h(x).    (4.10)


If S[y] is stationary then, by definition, ∆S[y, h] = 0 for all allowed h and it follows

from the fundamental lemma of the Calculus of Variations that y(x) satisfies the second-

order differential equation

    d/dx( ∂F/∂y′ ) − ∂F/∂y = 0,   y(a) = A,  y(b) = B.    (4.11)

Thus a necessary and sufficient condition for S[y] to be stationary on a sufficiently differentiable path, y(x), is that it satisfies the Euler-Lagrange equation 4.7.

The paths that satisfy the Euler-Lagrange equation are not necessarily extremals,

that is do not necessarily yield maxima or minima, of the functional. The Euler-

Lagrange equation is, in most cases, a second-order, nonlinear, boundary value problem

and there may be no solutions or many. Finally, note that functionals that are equal

except for multiplicative or additive constants have the same Euler-Lagrange equations.

Exercise 4.5

Show that the Euler-Lagrange equation for the functional

    S[y] = ∫_0^X dx ( y′² − y² ),   y(0) = 0,  y(X) = 1,   X > 0,

is y″ + y = 0 and hence, provided X is not an integer multiple of π, that the stationary function is y = sin x/sin X.

The significance of the point X = π will be revealed in chapter 8, in particular

exercise 8.12. There it is shown that for 0 < X < π this solution is a minimum of

the functional, but for X > π it is simply a stationary point. In this example at

the boundary, X = π, the Euler-Lagrange equation does not have a solution.

The Euler-Lagrange equation is a second-order differential equation. But if the integrand does not depend explicitly upon x, so the functional has the form

    S[y] = ∫_a^b dx G(y, y′),   y(a) = A,  y(b) = B,    (4.12)

then the Euler-Lagrange equation reduces to the first-order equation

    y′ ∂G/∂y′ − G = c,   y(a) = A,  y(b) = B,    (4.13)

where c is a constant determined by the boundary conditions, see for example exer-

cise 4.6 below. The expression on the left-hand side of this equation is often named the

first-integral of the Euler-Lagrange equation. This result is important because, when

applicable, it often saves a great deal of effort, because it is usually far easier to solve

this lower order equation. Two proofs of equation 4.13 are provided: the first involves

deriving an algebraic identity, see exercise 4.7, and it is important to do this yourself.

The second proof is given in section 7.2.1 and uses the invariance properties of the inte-

grand G(y, y 0 ). A warning, however; in some circumstances a solution of equation 4.13


will not be a solution of the original Euler-Lagrange equation, see exercise 4.8, also

section 5.3 and chapter 6.

Another important consequence is that the stationary function, the solution of 4.13,

depends only upon the variables u = x − a and b − a (besides A and B), rather than

x, a and b independently, as is the case when the integrand depends explicitly upon x.

A specific example illustrating this behaviour is given in exercise 4.20.
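A quick numerical illustration of the first-integral (a sketch in plain Python, not from the text): the minimal-surface integrand G = y√(1 + y′²), which reappears in section 4.5, has no explicit x-dependence, and the catenary y = cosh x satisfies its Euler-Lagrange equation y″ = (1 + y′²)/y. Along this path the quantity y′∂G/∂y′ − G should therefore be constant:

```python
from math import cosh, sinh, sqrt

def first_integral(x):
    """Evaluate y′ ∂G/∂y′ − G for G = y√(1 + y′²) on the catenary y = cosh x."""
    y, yp = cosh(x), sinh(x)
    G = y * sqrt(1.0 + yp * yp)
    G_yp = y * yp / sqrt(1.0 + yp * yp)   # ∂G/∂y′
    return yp * G_yp - G

values = [first_integral(x) for x in (-1.5, -0.4, 0.0, 0.8, 2.0)]
print(values)  # every entry is −1, the constant c, up to rounding
```

Algebraically y′∂G/∂y′ − G = −y/√(1 + y′²) = −cosh x/cosh x = −1, so the constant here is c = −1.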

An observation
You may have noticed that the original functional 4.6 is defined on the class of functions for which F(x, y(x), y′(x)) is integrable: if F(x, u, v) is differentiable in all three variables this condition is satisfied if y′(x) is piecewise continuous. However, the Euler-Lagrange equation 4.11 requires the stronger condition that y′(x) is differentiable. This extra condition is created by the derivation of the Euler-Lagrange equation, in particular the step between equations 4.9 and 4.10: a necessary condition for the functional S[y] to be stationary, that does not make this step and does not require y″ to exist, is derived in exercise 4.11.

There are important problems where y″(x) does not exist at all points on a stationary path — the minimal surface of revolution, dealt with in the next chapter, is one simple example; the general theory of this type of problem will be considered in chapter 10.

Exercise 4.6

Consider the functional

    S[y] = ∫_0^1 dx ( y′² − y ),   y(0) = 0,  y(1) = 1.

Show that the Euler-Lagrange equation is

    2 d²y/dx² + 1 = 0,   y(0) = 0,  y(1) = 1,

and find its solution.

Show that the first-integral, equation 4.13, becomes the nonlinear equation

    ( dy/dx )² + y = c.

Find the general solution of this equation and find the solution that satisfies the

boundary conditions.

In this example it is easier to solve the linear second-order Euler-Lagrange equation

than the first-order equation 4.13, which is nonlinear. Normally, both equations

are nonlinear and then it is easier to solve the first-order equation. In the examples

considered in sections 5.3 and 5.2 it is more convenient to use the first-integral.

Exercise 4.7

If G(y, y 0 ) does not depend explicitly upon x, that is ∂G/∂x = 0, show that

    y′(x) [ d/dx( ∂G/∂y′ ) − ∂G/∂y ] = d/dx ( y′ ∂G/∂y′ − G )

and hence derive equation 4.13.

Hint: you will find the result derived in exercise 3.10 (page 103) helpful.


Exercise 4.8

(a) Show that provided Gy′(y, 0) exists the differential equation 4.13 (without the boundary conditions) has a solution y(x) = γ, where the constant γ is defined implicitly by the equation G(γ, 0) = −c.

(b) Under what circumstances is the solution y(x) = γ also a solution of the Euler-Lagrange equation 4.11?

Exercise 4.9

Show that the Euler-Lagrange equation for the functional

    S[y] = ∫_0^1 dx ( y′² + y² + 2axy ),   y(0) = 0,  y(1) = B,

is y″ − y = ax and that its solution is

    y(x) = (a + B) sinh x / sinh 1 − ax.

By expanding S[y + εh] to second order in ε show that this solution makes the functional a minimum.
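The stated solution of this exercise is easy to verify numerically (a sketch in plain Python; the parameter values a and B below are arbitrary choices, not from the text): the path satisfies the boundary conditions exactly and the equation y″ − y = ax to finite-difference accuracy.

```python
from math import sinh

a, B = 0.7, 1.3   # arbitrary sample values of the parameters
y = lambda x: (a + B) * sinh(x) / sinh(1.0) - a * x

# boundary conditions
print(y(0.0), y(1.0))   # 0 and B

# residual of y″ − y = ax at an interior point, via a central second difference
d, x0 = 1e-4, 0.4
ypp = (y(x0 + d) - 2.0 * y(x0) + y(x0 - d)) / d ** 2
print(abs(ypp - y(x0) - a * x0) < 1e-5)
```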

Exercise 4.10

In this exercise we consider a problem, due to Weierstrass (1815 – 1897), in which

the functional achieves its minimum value of zero for a piecewise continuous func-

tion but for continuous functions the functional is always positive.

The functional is

    J[y] = ∫_{−1}^{1} dx x² y′²,   y(−1) = −1,  y(1) = 1,

and the function

    y(x) = { −1,   −1 ≤ x < 0,
           {  1,    0 < x ≤ 1,

has a piecewise continuous derivative and J[y] = 0.

(a) Show that the associated Euler-Lagrange equation gives x²y′ = A for some constant A and that the solutions of this that satisfy the boundary conditions at x = −1 and x = 1 are, respectively,

    y(x) = { −1 − A − A/x,   −1 ≤ x < 0,
           {  1 + A − A/x,    0 < x ≤ 1.

Deduce that no continuous function satisfies the Euler-Lagrange equation and the

boundary conditions.

(b) Show that for the class of continuous functions defined by

    y(x) = { −1,    −1 ≤ x ≤ −ε,
           { x/ε,   |x| < ε,
           {  1,     ε ≤ x ≤ 1,

where ε is a small positive number, J[y] = 2ε/3. Deduce that for continuous functions the functional can be made arbitrarily close to the smallest possible value of J, that is zero, so there is no stationary path.


(c) A similar result can be proved for a class of continuously differentiable functions. For the functions

    y(x) = (1/β) tan⁻¹(x/ε),   tan β = 1/ε,   0 < ε < 1,

show that

    J[y] = 2ε/π + O(ε²).

Deduce that J[y] may take arbitrarily small values, but cannot be zero.
Hint: the relation tan⁻¹(1/z) = π/2 − tan⁻¹(z) is needed.

It may be shown that for no continuous function satisfying the boundary condi-

tions is J[y] = 0. Thus on the class of continuous functions J[y] never equals its

minimum value, but can approach it arbitrarily closely.
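The value J[y] = 2ε/3 quoted in part (b) of exercise 4.10 is easy to confirm numerically (a sketch in plain Python, not part of the exercise): only the central segment, where y′ = 1/ε, contributes to the integral.

```python
def J(eps, n=100000):
    """J[y] = ∫_{−1}^{1} x²y′² dx for the piecewise-linear y of part (b);
    y′ = 1/eps on (−eps, eps) and 0 elsewhere, so only that interval
    contributes.  Midpoint rule with n panels."""
    h = 2.0 * eps / n
    return h * sum(((-eps + (k + 0.5) * h) / eps) ** 2 for k in range(n))

for eps in (0.5, 0.1, 0.01):
    print(J(eps), 2.0 * eps / 3.0)   # the two columns agree
```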

Exercise 4.11

The Euler-Lagrange equation 4.11 requires that y″(x) exists, yet the original functional does not. The second derivative arises when equation 4.9 is integrated by parts to replace h′(x) by h(x). In this exercise you will show that this step may be avoided and that a sufficient condition not depending upon y″(x) may be derived.

Define the function φ(x) by the integral

    φ(x) = ∫_a^x dt Fy(t, y(t), y′(t)),

so that φ(a) = 0 and φ′(x) = Fy(x, y, y′), and show that equation 4.9 becomes

    ∆S = ∫_a^b dx h′(x) [ ∂F/∂y′ − φ(x) ].

Using the result derived in exercise 4.4 show that a necessary condition for S[y] to be stationary is that

    ∂F/∂y′ − ∫_a^x dt ∂F/∂y = C,

where C is a constant.

In practice, this equation is not usually as useful as the Euler-Lagrange equation.

Exercise 4.12

The boundary conditions y(a) = A, y(b) = B are not always appropriate so we

need functionals that yield different conditions. In this exercise we illustrate how

this can sometimes be achieved. The technique used here is important and will

be used extensively in chapter 10.

Consider the functional

    S[y] = −G(y(b)) + (1/2) ∫_a^b dx ( y′² + y² ),   y(a) = A,

with no condition being given at x = b. For this functional the variation h(x) satisfies h(a) = 0, but h(b) is not constrained.


(a) Use the fact that h(a) = 0 to show that the Gâteaux differential can be written

in the form

    ∆S[y, h] = ( y′(b) − Gy(y(b)) ) h(b) − ∫_a^b dx ( y″ − y ) h.

(b) Using a subset of variations with h(b) = 0 show that the stationary paths satisfy the equation y″ − y = 0, y(a) = A, and that on this path

    ∆S[y, h] = ( y′(b) − Gy(y(b)) ) h(b).

Deduce that S[y] is stationary only if y(b) and y′(b) satisfy the equation y′(b) = Gy(y(b)).

Deduce that for the functional

    S[y] = −B y(b) + (1/2) ∫_a^b dx ( y′² + y² ),   y(a) = A,

the condition at x = b is y′(b) = B.

Exercise 4.13

Use the ideas outlined in the previous exercise to show that if G(b, y, B) is defined by the integral

    G(b, y, B) = ∫^y dz Fy′(b, z, B),

the functional

    S[y] = −G(b, y(b), B) + ∫_a^b dx F(x, y, y′),   y(a) = A,

is stationary on the paths satisfying

    d/dx( ∂F/∂y′ ) − ∂F/∂y = 0,   y(a) = A,  y′(b) = B.

In section 4.4 it was shown that a necessary condition for a function, y(x), to represent a stationary path of the functional S = ∫_a^b dx F(x, y, y′), y(a) = A, y(b) = B, is that it satisfies the Euler-Lagrange equation 4.11 or, in expanded form, exercise 3.10 (page 103),

    Fy′y′ y″ + Fyy′ y′ + Fxy′ − Fy = 0,   y(a) = A,  y(b) = B.    (4.14)

Without the boundary conditions this equation cannot normally be solved in terms of known functions: the addition of the boundary values normally makes it even harder to solve. It

functions: the addition of the boundary values normally makes it even harder to solve. It

is therefore frequently necessary to resort to approximate or numerical methods to find

solutions, in which case it is helpful to know that solutions actually exist and that they

4.5. THEOREMS OF BERNSTEIN AND DU BOIS-REYMOND 135

are unique: indeed it is possible for “black-box” numerical schemes to yield solutions

when none exists. In this course there is insufficient space to discuss approximate

and numerical methods, but this section is devoted to a discussion of a theorem that

provides some information about the existence and uniqueness of solutions for the Euler-

Lagrange equation. In the last part of this section we contrast these results with those

for the equivalent equation, but with initial conditions rather than boundary values.
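To make the remark about numerical methods concrete, here is a minimal finite-difference sketch (plain Python; the tridiagonal solver and the linear test problem y″ − y = x, y(0) = 0, y(1) = 1 — the equation of exercise 4.14 below — are illustrative choices, not the course's prescribed method). The boundary-value character shows up in the need to solve a coupled linear system rather than to step forward from initial data:

```python
from math import cosh, sinh

def solve_bvp(A, B, n=100):
    """Solve y″ − y = x, y(0)=A, y(1)=B by central differences on n panels:
    y_{k−1} − (2 + h²)y_k + y_{k+1} = h²x_k, k = 1, …, n−1 (Thomas algorithm)."""
    h = 1.0 / n
    sub = [1.0] * (n - 1)
    dia = [-2.0 - h * h] * (n - 1)
    sup = [1.0] * (n - 1)
    rhs = [h * h * (k * h) for k in range(1, n)]
    rhs[0] -= A                     # known boundary values move to the RHS
    rhs[-1] -= B
    for k in range(1, n - 1):       # forward elimination
        m = sub[k] / dia[k - 1]
        dia[k] -= m * sup[k - 1]
        rhs[k] -= m * rhs[k - 1]
    y = [0.0] * (n - 1)
    y[-1] = rhs[-1] / dia[-1]
    for k in range(n - 3, -1, -1):  # back substitution
        y[k] = (rhs[k] - sup[k] * y[k + 1]) / dia[k]
    return [A] + y + [B]

A, B = 0.0, 1.0
y = solve_bvp(A, B)
# exact solution: y = −x + c₁ sinh x + A cosh x, c₁ = (B + 1 − A cosh 1)/sinh 1
c1 = (B + 1.0 - A * cosh(1.0)) / sinh(1.0)
err = max(abs(y[k] - (-(k / 100) + c1 * sinh(k / 100) + A * cosh(k / 100)))
          for k in range(101))
print(err < 1e-3)
```

The second-difference scheme is O(h²) accurate, so with n = 100 the maximum error against the exact solution is far below 10⁻³.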

First, however, we return to the question, discussed on page 131, of whether the

second derivative of the stationary path exists, that is whether it satisfies the Euler-

Lagrange equation in the whole interval.

The following theorem due to the German mathematician du Bois-Reymond (1831 –

1889) gives necessary conditions for the second derivative of a stationary path to exist.

Theorem 4.1

If

(a) y(x) has a continuous first derivative,

(b) ∆S[y, h] = 0 for all allowed h(x),

(c) F (x, u, v) has continuous first and second derivatives in all variables and

(d) ∂²F/∂y′² ≠ 0 for a ≤ x ≤ b,

then y(x) has a continuous second derivative and satisfies the Euler-Lagrange equa-

tion 4.11 for all a ≤ x ≤ b.

This result is of limited practical value because its application sometimes requires

knowledge of the solution, or at least some of its properties. A proof of this theorem may

be found in Gelfand and Fomin (1963, page 17)³. An example in which Fy′y′ = 0 on the

stationary path and where this path does not possess a second derivative, yet satisfies

the Euler-Lagrange equation almost everywhere, is given in exercise 4.28 (page 141).

The theorem quoted in this section concerns the boundary value problem that can be written in the form of the second-order, nonlinear equation

    d²y/dx² = H( x, y, dy/dx ),   y(a) = A,  y(b) = B.    (4.15)

For such equations this is one of the few general results about the nature of the solutions

and is due to the Ukrainian mathematician S N Bernstein (1880 – 1968). This theorem

provides a sufficient condition for equation 4.15 to have a unique solution.

Theorem 4.2

If for all finite y, y′ and x in an open interval containing [a, b], that is c < a ≤ x ≤ b < d,
(a) the functions H, Hy and Hy′ are continuous,
(b) there is a constant k > 0 such that Hy > k, and
(c) for any Y > 0 and all |y| < Y and a ≤ x ≤ b there are positive constants α(Y) and β(Y), depending upon Y, and possibly c and d, such that

    |H(x, y, y′)| ≤ α(Y) y′² + β(Y),

then equation 4.15 has one and only one solution.

³ I M Gelfand and S V Fomin, Calculus of Variations (Prentice Hall, translated from the Russian).


A proof of this theorem may be found in Akhiezer (1962, page 30)⁴. The conditions

required by this theorem are far more stringent than those required by theorems 2.1

(page 61) and 2.2 (page 81), which apply to initial value problems. These theorems

emphasise the significant differences between initial and boundary value problems as

discussed in section 2.2.

Some examples

The usefulness of Bernstein’s theorem is somewhat limited because the conditions of

the theorem are too stringent; it is, however, one of the rare general theorems applying

to this type of problem. Here we apply it to the two problems dealt with in the next

chapter, for which the integrands of the functionals are

p

F = y 1 + y0 2 Minimal surface of revolution,

s

1 + y0 2

F = Brachistochrone.

y

Substituting these into the Euler-Lagrange equation 4.14 we obtain the following ex-

pressions for H,

1 + y0 2

y 00 = H = Minimal surface of revolution,

y

1 + y0 2

y 00 = H = − Brachistochrone.

2y

In both cases H is discontinuous at y = 0, so the conditions of the theorem do not hold.

In fact, the Euler-Lagrange equation for the minimal surface problem has one piecewise

smooth solution and, in addition, either two or no differentiable solutions, depending

upon the boundary values. The brachistochrone problem always has one, unique solu-

tion. These examples emphasise the fact that Bernstein’s theorem gives sufficient as

opposed to necessary conditions.

Exercise 4.14

Use Bernstein's theorem to show that the equation y″ − y = x, y(0) = A, y(1) = B,

has a unique solution, and find this solution.

Exercise 4.15

(a) Apply Bernstein's theorem to the equation y″ + y = x, y(0) = 0, y(X) = 1 with X > 0.

(b) Show that the solution of this equation is

    y = x + (1 − X) sin x / sin X

and explain why this does not contradict Bernstein’s theorem.

Exercise 4.16

The integrand of the functional for the Brachistochrone problem, described in section 3.5.1, is F = √(1 + y′²)/√y. Show that the associated Euler-Lagrange equation is

    y″ = −(1 + y′²)/(2y)

and that this may be written as the pair of first-order equations

    dy₁/dx = y₂,   dy₂/dx = −(1 + y₂²)/(2y₁),   where y₁ = y.

⁴ N I Akhiezer, The Calculus of Variations (Blaisdell).

4.6. STRONG AND WEAK VARIATIONS 137

Exercise 4.17
Consider the functional S[y] = ∫_{−1}^{1} dx y² (1 − y′)²,  y(−1) = 0, y(1) = 1, the smallest value of which is zero. Show that the solution of the Euler-Lagrange equation that minimises this functional is

    y(x) = { 0,   −1 ≤ x ≤ 0,
           { x,    0 < x ≤ 1,

which has a discontinuous derivative at x = 0. Show that this result is consistent

with theorem 4.1 of du Bois-Reymond.

In section 4.2.2 we briefly discussed the idea of the norm of a function. Here we show

why the choice of the norm is important.

Consider the functional for the distance between the origin and the point (1, 0), on

the x-axis,

    S[y] = ∫_0^1 dx √(1 + y′²),   y(0) = 0,  y(1) = 0.    (4.16)

It is obvious, and proved in section 3.2, that in the class of smooth functions the

stationary path is the segment of the x-axis between 0 and 1, that is y(x) = 0 for

0 ≤ x ≤ 1.

Now consider the value of the functional as the path is varied about y = 0, that is S[εh], where h(x) is first restricted to D₁(0, 1) and then to D₀(0, 1).

In the first case the norm of h(x) is taken to be

    ||h||₁ = max_{0≤x≤1} |h(x)| + max_{0≤x≤1} |h′(x)|,    (4.17)

and without loss of generality we may restrict h to satisfy ||h||₁ = 1, so that |h′(x)| ≤ H₁ < 1. On the varied path the value of the functional is

    S[εh] = ∫_0^1 dx √( 1 + ε²h′² ) ≤ √( 1 + (εH₁)² )

and hence

    S[εh] − S[0] ≤ √( 1 + (εH₁)² ) − 1 = (εH₁)² / ( 1 + √( 1 + (εH₁)² ) ) < (εH₁)² < ε².

Thus if h(x) belongs to D₁(0, 1), S[y] changes by O(ε²) on the neighbouring path and, since S[εh] − S[0] > 0 for all ε, the straight line path is a minimum.

Now consider the less restrictive norm

    ||h||₀ = max_{0≤x≤1} |h(x)|,    (4.18)

which restricts the magnitude of h, but not the magnitude of its derivative. A suitable

path close to y = 0 is given by h(x) = sin nπx, n being a positive integer, so that the varied path is εh(x) = ε sin nπx. Now we have

    S[εh] = ∫_0^1 dx √( 1 + (εnπ)² cos² nπx ) ≥ εnπ ∫_0^1 dx | cos nπx | .


But

    ∫_0^1 dx | cos nπx | = 2n ∫_0^{1/2n} dx cos nπx = 2/π.

Hence S[εh] ≥ 2nε. Thus for any ε > 0 we may choose a value of n to make S[εh] as large as we please, even though the varied path is arbitrarily close to the straight-line path: hence the path y = 0 is not stationary when this norm is used.

This analysis shows that the definition of the distance between paths is important

because different definitions can change the nature of a path; consequently two types

of stationary path are defined.

The functional S[y] is said to have a weak stationary path, ys, if there exists a δ > 0 such that S[ys + g] − S[ys] has the same sign for all variations g satisfying ||g||₁ < δ. On the other hand, S[y] is said to have a strong stationary path, ys, if there exists a δ > 0 such that S[ys + g] − S[ys] has the same sign for all variations g satisfying ||g||₀ < δ.

A strong stationary path is also a weak stationary path because if ||g||1 < δ then

||g||0 < δ. The converse is not true in general.

It is easier to find weak stationary paths and, fortunately, these are often the most

important. The Gâteaux differential is defined only for weak variations and, as we have

seen, it leads to a differential equation for the stationary path.

Exercise 4.18

In this exercise we give another example of a path, arbitrarily close to the line y = 0 in the ||·||₀ norm, for which S is arbitrarily large.

Consider the isosceles triangle with base AC of length a, height h and base angle β,

as shown on the left-hand side of the figure.

[Figure 4.2: on the left, the isosceles triangle ABC with base AC = a, height h and base angle β; on the right, the two smaller triangles AB₁D and DB₂C constructed in part (a).]

(a) Construct the two smaller triangles AB1 D and DB2 C by halving the height

and width of ABC, as shown on the right. If AB = l and BD = h, show that

AB1 = l/2, 2l = a/ cos β and h = l sin β. Hence show that the lengths of the lines

AB1 DB2 C and ABC are the same and equal to 2l.

(b) Show that after n such divisions there are 2ⁿ similar triangles of height 2⁻ⁿh

and that the total length of the curve is 2l. Deduce that arbitrarily close to AC,

the shortest distance between A and C, we may find a continuous curve every

point of which is arbitrarily close to AC, but which has any given length.

4.7. MISCELLANEOUS EXERCISES 139

Exercise 4.19

Show that the Euler-Lagrange equation for the functional

    S[y] = ∫_0^1 dx ( y′² − y² − 2xy ),   y(0) = y(1) = 0,

is y″ + y = −x. Hence show that the stationary function is y(x) = sin x/sin 1 − x.

Exercise 4.20

Consider the functional

    S[y] = ∫_a^b dx F(y, y′),   y(a) = A,  y(b) = B,

where F(y, y′) does not depend explicitly upon x. By changing the independent

variable to u = x − a show that the solution of the Euler-Lagrange equation

depends on the difference b − a rather than a and b separately.

Exercise 4.21

Euler's original method for finding solutions of variational problems is described in equation 4.3 (page 122). Consider approximating the functional defined in exercise 4.19 using the polygon passing through the points (0, 0), (1/2, y₁) and (1, 0), so there is one variable y₁ and two segments.

This polygon can be defined by the straight line segments

    y(x) = { 2y₁x,         0 ≤ x ≤ 1/2,
           { 2y₁(1 − x),   1/2 ≤ x ≤ 1.

Show that, on this path, the functional of exercise 4.19 becomes

    S(y₁) = (11/3) y₁² − (1/2) y₁,

and hence that the stationary polygon is given by y(1/2) ≈ y₁ = 3/44. Note that this gives y(1/2) ≈ 0.0682 by comparison to the exact value 0.0697.

Exercise 4.22

Find the stationary paths of the following functionals.

(a) S[y] = ∫_0^1 dx ( y′² + 12xy ),   y(0) = 0,  y(1) = 2.

(b) S[y] = ∫_0^1 dx ( 2y²y′² − (1 + x)y² ),   y(0) = 1,  y(1) = 2.

(c) S[y] = −(1/2) B y(2) + ∫_1^2 dx y′²/x²,   y(1) = A.

(d) S[y] = −y(0)²/A³ + ∫_0^b dx y/y′²,   y(b) = B²,  B² > 2Ab > 0.

Hint: for (c) and (d) use the method described in exercise 4.12.


Exercise 4.23

What is the equivalent of the fundamental lemma of the Calculus of Variations in

the theory of functions of many real variables?

Exercise 4.24

Find the general solution of the Euler-Lagrange equation corresponding to the functional

    S[y] = ∫_a^b dx w(x) √(1 + y′²),

and find explicit solutions in the special cases w(x) = √x and w(x) = x.

Exercise 4.25
Consider the functional S[y] = ∫_0^1 dx ( y′² − 1 )²,  y(0) = 0,  y(1) = A > 0.

(a) Show that the Euler-Lagrange equation reduces to y′² = m², where m is a constant.

(b) Show that the equation y′² = m², with m > 0, has the following three solutions that fit the boundary conditions: y₁(x) = Ax,

    y₂(x) = { mx,             0 ≤ x ≤ (A + m)/(2m),
            { A + m(1 − x),   (A + m)/(2m) ≤ x ≤ 1,
    with m > A,

and

    y₃(x) = { −mx,            0 ≤ x ≤ (m − A)/(2m),
            { A − m(1 − x),   (m − A)/(2m) ≤ x ≤ 1,
    with m > A.

Show also that on these solutions the functional has the values

    S[y₁] = (A² − 1)²,   S[y₂] = (m² − 1)²   and   S[y₃] = (m² − 1)².

(c) Deduce that if A ≥ 1 the minimum value of S[y] is (A² − 1)² and that this occurs on the curve y₁(x), but if A < 1 the minimum value of S[y] is zero and this occurs on the curves y₂(x) and y₃(x) with m = 1.

Exercise 4.26

Show that the following functionals do not have stationary values:

(a) S[y] = ∫_0^1 dx y′,   (b) S[y] = ∫_0^1 dx y y′,   (c) S[y] = ∫_0^1 dx x y y′.

Exercise 4.27

Show that the Euler-Lagrange equations for the functionals

    S₁[y] = ∫_a^b dx F(x, y, y′)   and   S₂[y] = ∫_a^b dx [ F(x, y, y′) + d/dx G(x, y) ]

are identical.


Exercise 4.28
Show that the functional S[y] = ∫_{−1}^{1} dx y² ( 2x − y′ )²,  y(−1) = 0, y(1) = 1, achieves its minimum value, zero, when

    y(x) = { 0,    −1 ≤ x ≤ 0,
           { x²,    0 ≤ x ≤ 1,

which has no second derivative at x = 0. Show that, despite the fact that y″(x) does not exist everywhere, the Euler-Lagrange equation is satisfied for x ≠ 0.

Exercise 4.29
The functional S[y] = ∫_a^b dx F(x, y, y′),  y(a) = A, y(b) = B, is stationary on those paths satisfying the Euler-Lagrange equation

    d/dx( ∂F/∂y′ ) − ∂F/∂y = 0,   y(a) = A,  y(b) = B.

In this formulation of the problem we choose to express y in terms of x: however,

we could express x in terms of y, so the functional has the form

    J[x] = ∫_A^B dy G(y, x, x′),   x(A) = a,  x(B) = b,

where x′ = dx/dy.

(a) Show that G(y, x, x′) = x′ F(x, y, 1/x′), and that the Euler-Lagrange equation for this functional,

    d/dy( ∂G/∂x′ ) − ∂G/∂x = 0,   x(A) = a,  x(B) = b,

when expressed in terms of the original function F is

    ( Fy′y′ / x′³ ) x″ − (1/x′) Fyy′ − Fxy′ + Fy = 0

where, for instance, the function Fy′ is the derivative of F(x, y, y′) with respect to y′, expressed in terms of x′ after differentiation.

(b) Derive the same result from the original Euler-Lagrange equations for F .

Exercise 4.30

Use the approximation 4.3 (page 122) to show that the equations for the values of y = (y₁, y₂, ..., yₙ), where x_{k+1} = x_k + δ, that make S(y) stationary are

    ∂S/∂y_k = δ ∂F/∂u (z_k) + ∂F/∂v (z_k) − ∂F/∂v (z_{k+1}) = 0,   k = 1, 2, ..., n,

where z_k = (x_k, u, v), u = y_k, v = (y_k − y_{k−1})/δ and where y₀ = A and y_{n+1} = B.

Show also that z_{k+1} = z_k + δ (1, yk′, yk″) + O(δ²), and hence that

    ∂S/∂y_k = δ ( ∂F/∂u − ∂²F/∂x∂v − yk′ ∂²F/∂u∂v − yk″ ∂²F/∂v² ) + O(δ²)

            = −δ ( d/dx( ∂F/∂v ) − ∂F/∂u ) + O(δ²),

where F and its derivatives are evaluated at z = zk .

Hence derive the Euler-Lagrange equations.


Harder exercises

Exercise 4.31

This exercise is a continuation of exercise 4.21 and uses a set of n variables to define the polygon. Take a set of n + 2 equally spaced points on the x-axis, x_k = k/(n + 1), k = 0, 1, ..., n + 1 with x₀ = 0 and x_{n+1} = 1, and a polygon passing through the points (x_k, y_k). Since y(0) = y(1) = 0 we have y₀ = y_{n+1} = 0, leaving n unknown variables.

Show that the functional defined in exercise 4.19 approximates to

    S = (1/h) Σ_{k=0}^{n} { (y_{k+1} − y_k)² − h² ( y_k² + (2k/(n+1)) y_k ) },   h = 1/(n+1).

(a) For n = 1, the case treated in exercise 4.21, show that this reduces to

    S(y₁) = (7/2) y₁² − (1/2) y₁.

Explain the difference between this and the previous expression for S(y₁), given in exercise 4.21.

(b) For n = 2 show that this becomes

    S = (17/3) y₁² + (17/3) y₂² − 6y₁y₂ − (2/9) y₁ − (4/9) y₂,

and hence that the equations for y₁ and y₂ are

    34y₁ − 18y₂ = 2/3,   34y₂ − 18y₁ = 4/3.

Solve these equations to show that y(1/3) ≈ 35/624 ≈ 0.0561 and y(2/3) ≈ 43/624 ≈ 0.0689. Note that these compare favourably with the exact values, y(1/3) = 0.0555 and y(2/3) = 0.0682.

Exercise 4.32
Consider the functional S[y] = ∫_a^b dx F(y″), where F(z) is a differentiable function and the admissible functions are at least twice differentiable and satisfy the boundary conditions y(a) = A₁, y(b) = B₁, y′(a) = A₂ and y′(b) = B₂.

(a) Show that the function making S[y] stationary satisfies the equation

    ∂F/∂y″ = c(x − a) + d

where c and d are constants.

(b) In the case that F(z) = z²/2 show that the solution is

    y(x) = (1/6) c (x − a)³ + (1/2) d (x − a)² + A₂(x − a) + A₁,

where c and d satisfy the equations

    (1/6) c D³ + (1/2) d D² = B₁ − A₁ − A₂ D,   where D = b − a,

    (1/2) c D² + d D = B₂ − A₂.

(c) Show that this stationary function is also a minimum of the functional.


Exercise 4.33

The theory described in the text considered functionals with integrands depending only upon x, y(x) and y′(x). However, functionals depending upon higher derivatives also exist and are important, for example in the theory of stiff beams, and the equivalent of the Euler-Lagrange equation may be derived using a direct extension of the methods described in this chapter.

Consider the functional

    S[y] = ∫_a^b dx F(x, y, y′, y″),   y(a) = A₁,  y′(a) = A₂,  y(b) = B₁,  y′(b) = B₂.

Show that its Gâteaux differential is

    ∆S[y, h] = ∫_a^b dx ( h ∂F/∂y + h′ ∂F/∂y′ + h″ ∂F/∂y″ ),

and, using integration by parts, that

    ∫_a^b dx h″ ∂F/∂y″ = ∫_a^b dx h d²/dx²( ∂F/∂y″ ),

being careful to describe the necessary properties of h(x). Hence show that S[y]

is stationary for the functions that satisfy the fourth-order differential equation

\[ \frac{d^2}{dx^2}\!\left(\frac{\partial F}{\partial y''}\right) - \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) + \frac{\partial F}{\partial y} = 0. \]

Exercise 4.34

Using the result derived in the previous exercise, find the stationary functions of

the functionals

\[ \text{(a)}\quad S[y] = \int_0^1 dx\, \left(1 + y''^2\right), \quad y(0) = 0, \quad y'(0) = y(1) = y'(1) = 1, \]
\[ \text{(b)}\quad S[y] = \int_0^{\pi/2} dx\, \left(y''^2 - y^2 + x^2\right), \quad y(0) = 1, \quad y'(0) = y(\pi/2) = 0, \quad y'(\pi/2) = -1. \]
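The Euler-Lagrange equation of exercise 4.33 reduces these problems to y'''' = 0 for (a) and y'''' = y for (b). As an independent check (a sketch only: the candidate functions below are plausible guesses matching the boundary conditions, not answers quoted from the text), one can verify that simple functions satisfy both the differential equations and the end conditions:

```python
import math

# Case (a): F = 1 + y''**2 gives d^2/dx^2 (2*y'') = 0, i.e. y'''' = 0, so the
# stationary function is a cubic; y(x) = x happens to satisfy all four
# boundary conditions y(0) = 0, y'(0) = y(1) = y'(1) = 1.
ya = lambda x: x
dya = lambda x: 1.0
case_a = (ya(0.0), dya(0.0), ya(1.0), dya(1.0))

# Case (b): F = y''**2 - y**2 + x**2 gives 2*y'''' - 2*y = 0, i.e. y'''' = y;
# y(x) = cos(x) satisfies this equation and the end conditions used here.
yb = math.cos
dyb = lambda x: -math.sin(x)
case_b = (yb(0.0), dyb(0.0), yb(math.pi / 2), dyb(math.pi / 2))
print(case_a, case_b)
```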


Chapter 5
Applications of the Euler-Lagrange equation

5.1 Introduction

In this chapter we solve the Euler-Lagrange equations for two classic problems, the

brachistochrone, section 5.2, and the minimal surface of revolution, section 5.3. These

examples are of historic importance and special because the Euler-Lagrange equations

can be solved in terms of elementary functions. They are also important because they

are relatively simple yet provide some insight into the complexities of variational prob-

lems.

The first example, the brachistochrone problem, is the simpler of these two prob-

lems and there is always a unique solution satisfying the Euler-Lagrange equation. The

second example is important because it is one of the simplest examples of a minimum

energy problem; but it also illustrates the complexities inherent in nonlinear boundary

value problems and we shall see that there are sometimes two and sometimes no differ-

entiable solutions, depending upon the values of the various parameters. This example

also shows that some stationary paths have discontinuous derivatives and therefore cannot satisfy the Euler-Lagrange equations everywhere. This effect is illustrated in the discussion of soap films in section 5.4, and is considered in more detail in chapter 10.

In both these cases you may find the analysis leading to the required solutions com-

plicated. It is, however, important that you are familiar with this type of mathematics

so you should understand the text sufficiently well to be able to write the analysis in

your own words.

5.2 The brachistochrone
The problem, described previously in section 3.5.1 (page 104), is to find the smooth

curve joining two given points Pa and Pb , lying in a vertical plane, such that a bead

sliding on the curve, without friction but under the influence of gravity, travels from

Pa to Pb in the shortest possible time, the initial speed at Pa being given. It was

pointed out in section 3.5.1 that John Bernoulli made this problem famous in 1696


and that several solutions were published in 1697: Newton’s comprised the simple

statement that the solution was a cycloid, giving no proof. In section 5.2.3 we prove

this result algebraically, but first we describe necessary preliminary material. In the next

section we derive the parametric equations for the cycloid after giving some historical

background. In section 5.2.2 the brachistochrone problem is formulated in terms of a

functional and the stationary path of this is found in section 5.2.3.

The cycloid is one of a class of curves formed by a point fixed on a circle that rolls,

without slipping, on another curve. A cycloid is formed when the fixed point is on the

circumference of the circle and the circle rolls on a straight line, as shown in figure 5.1:

other curves with similar constructions are considered in chapter 9. A related curve is

the trochoid where the point tracing out the curve is not on the circle circumference;

clearly different types of trochoids are produced depending on whether the point is inside

or outside the circle, see exercise 9.18 (page 253).

Figure 5.1 Diagram showing how the cycloid OP D is traced out by a circle rolling along a straight line.

In figure 5.1 a circle of radius a rolls along the x-axis, starting with its centre on the

y-axis. Fix attention on the point P attached to the circle, initially at the origin O. As

the circle rolls P traces out the curve OP D named the cycloid.

The cycloid has been studied by many mathematicians from the time of Galileo

(1564 – 1642), and was the cause of so many controversies and quarrels in the 17th
century that it became known as “the Helen of geometers”. Galileo named the cycloid

but knew insufficient mathematics to make progress. He tried to find the area between

it and the x-axis, but the best he could do was to trace the curve on paper, cut out the

arc and weigh it, to conclude that its area was a little less than three times that of the

generating circle — in fact it is exactly three times the area of this circle, as you can

show in exercise 5.3. He abandoned his study of the cycloid, suggesting only that the

cycloid would make an attractive arch for a bridge. This suggestion was implemented

in 1764 with the building of a bridge with three cycloidal arches over the river Cam in

the grounds of Trinity College, Cambridge, shown in figure 5.2.

The reason why cycloidal arches were used is no longer known, all records and

original drawings having been lost. However, it seems likely that the architect, James

Essex (1722 – 1784), chose this shape to impress Robert Smith (1689 – 1768), the Master

of Trinity College, who was keen to promote the study of applied mathematics.


Figure 5.2 Essex’s bridge over the Cam, in the grounds of Trinity College, having three cycloidal arches.

The area under a cycloid was first calculated in 1634 by Roberval (1602 – 1675). In

1638 he also found the tangent to the curve at any point, a problem solved at about

the same time by Fermat (1601 – 1665) and Descartes (1596 – 1650). Indeed, it was at

this time that Fermat gave the modern definition of a tangent to a curve. Later, in

1658, Wren (1632 – 1723), the architect of St Paul’s Cathedral, determined the length

of a cycloid.

Pascal’s (1623 – 1662) last mathematical work, in 1658, was on the cycloid and,

having found certain areas, volumes and centres of gravity associated with the cycloid,

he proposed a number of such questions to the mathematicians of his day with first and

second prizes for their solution. However, publicity and timing were so poor that only

two solutions were submitted and because these contained errors no prizes were awarded,

which caused a degree of aggravation among the two contenders A de Lalouvère (1600 –

1664) and John Wallis (1616 – 1703).

At about the time of this contest Huygens (1629 – 1695) designed the first pendulum

clock, which was made by Salomon Coster in 1658, but was aware that the period of the

pendulum depended upon the amplitude of the swing. It occurred to him to consider the

motion of an object sliding on an inverted cycloidal arch and he found that the object

reaches the lowest point in a time independent of the starting point. The question

that remained was how to persuade a pendulum to oscillate in a cycloidal, rather than

a circular arc. Huygens now made the remarkable discovery illustrated in figure 5.3.

If a pendulum of the same length as one of the semi-arcs is suspended from a point P at the cusp between two inverted cycloidal arcs P Q and P R, then it will swing in a cycloidal arc QSR which has the same size and shape as the cycloidal arcs of which P Q

and P R are parts. Such a pendulum will have a period independent of the amplitude

of the swing.


Figure 5.3 Diagram showing how Huygens’ cycloidal pendulum, P T , swings between two fixed, similar cycloidal arcs P R and P Q.

Huygens made a pendulum clock with cycloidal jaws, but found that in practice it

was no more accurate than an ordinary pendulum clock: his results on the cycloid

were published in 1673 when his Horologium Oscillatorium appeared¹. However, the

discovery illustrated in figure 5.3 was significant in the development of the mathematical

understanding of curves in space.

The equation of the cycloid is obtained by finding the coordinates of P , in figure 5.1,

after the circle has rolled through an angle θ, so the length of the longer circular arc P A

is aθ. Because there is no slipping, OA = P A = aθ and coordinates of the circle centre

are C = (aθ, a). The distances P B and BC are P B = −a cos θ and BC = −a sin θ and

hence the coordinates of P are
\[ x = a(\theta - \sin\theta), \qquad y = a(1 - \cos\theta), \tag{5.1} \]
which are the parametric equations of the cycloid. For |θ| ≪ 1, x and y are related approximately by y = (a/2)(6x/a)^{2/3}, see exercise 5.2. The arc OP D is traced out as θ increases from 0 to 2π.
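A quick numerical sketch (illustrative only, with an assumed radius a = 1) confirms the small-angle relation between x and y on the cycloid:

```python
import math

a, theta = 1.0, 0.1                  # illustrative radius and small rolling angle
x = a * (theta - math.sin(theta))    # parametric equations of the cycloid
y = a * (1 - math.cos(theta))
y_approx = (a / 2) * (6 * x / a) ** (2 / 3)
rel_err = abs(y - y_approx) / y
print(rel_err)                       # small for |theta| much less than 1
```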

If, in figure 5.3 the y-axis is in the direction P S, that is pointing downwards, the

upper arc QP R, with the cusp at P is given by these equations with −π ≤ θ ≤ π and

it can be shown, see exercise 5.28, that the lower arc is described by x = a(θ + sin θ),

y = a(3 + cos θ), and the same range of θ. The following three exercises provide practice

in the manipulation of the cycloid equations; further examples are given in exercises 5.26

– 5.28.

Exercise 5.1

Show that the gradient of the cycloid is given by
\[ \frac{dy}{dx} = \frac{1}{\tan(\theta/2)}. \]
Deduce that the

cycloid intersects the x-axis perpendicularly when θ = 0 and 2π.

¹A more detailed account of Huygens’ work is given in Unrolling Time by J G Yoder (Cambridge University Press).


Exercise 5.2
By using the Taylor series of sin θ and cos θ show that for small |θ|, x ≈ aθ³/6 and y ≈ aθ²/2. By eliminating θ from these equations show that near the origin y ≈ (a/2)(6x/a)^{2/3}.

Exercise 5.3

Show that the area under the arc OP D in figure 5.1 is 3πa² and that the length of the cycloidal arc OP is s(θ) = 8a sin²(θ/4).
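Both results of exercise 5.3 can be checked by straightforward quadrature of the parametric forms (an illustrative sketch, with a = 1 and an arbitrary end angle θ₀):

```python
import math

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule for a definite integral."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a = 1.0
# Area under one arch: integral of y(theta) * x'(theta) over 0 <= theta <= 2*pi,
# with x'(theta) = a*(1 - cos(theta)) and y = a*(1 - cos(theta)).
area = midpoint(lambda t: (a * (1 - math.cos(t))) ** 2, 0.0, 2 * math.pi)

# Arc length of OP: integral of sqrt(x'^2 + y'^2) up to theta0.
theta0 = 1.7
arc = midpoint(lambda t: math.hypot(a * (1 - math.cos(t)), a * math.sin(t)),
               0.0, theta0)
print(area / (math.pi * a ** 2), arc)   # area ratio close to 3
```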

In this section we formulate the variational principle for the brachistochrone by obtain-

ing an expression for the time of passage from given points (a, A) to (b, B) along a curve

y(x).

Define a coordinate system Oxy with the y-axis vertically upwards and the origin

chosen to make a = B = 0, so the starting point, at (0, A), is on the y-axis and the

final point is on the x-axis at (b, 0), as shown in figure 5.4.

Figure 5.4 Diagram showing the curve y(x) through (0, A) and (b, 0) on which the bead slides. Here s(x) is the distance along the curve from the starting point to P = (x, y(x)) on it.

At a point P = (x, y(x)) on this curve let s(x) be the distance along the curve from the

starting point, so the speed of the bead is defined to be v = ds/dt. The kinetic energy

of a bead having mass m at P is ½mv² and its potential energy is mgy; because the bead is sliding without friction, energy conservation gives
\[ \tfrac{1}{2}mv^2 + mgy = E, \tag{5.2} \]
where the energy E is given by the initial conditions, E = ½mv₀² + mgA, v₀ being the

initial speed at Pa = (0, A). Small changes in s are given by δs² = δx² + δy², and so
\[ \left(\frac{ds}{dt}\right)^2 = \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 = \left(\frac{dx}{dt}\right)^2\left(1 + y'(x)^2\right). \tag{5.3} \]

Thus on rearranging equation 5.2 we obtain
\[ \left(\frac{ds}{dt}\right)^2 = \frac{2E}{m} - 2gy \quad\text{or}\quad \frac{dx}{dt}\sqrt{1 + y'(x)^2} = \sqrt{\frac{2E}{m} - 2gy(x)}. \tag{5.4} \]


The time of passage along the curve is therefore
\[ T = \int_0^T dt = \int_0^b dx\, \frac{1}{dx/dt}. \]

Thus on rearranging equation 5.4 to express dx/dt in terms of y(x) we obtain the required functional,
\[ T[y] = \int_0^b dx\, \sqrt{\frac{1 + y'^2}{2E/m - 2gy}}. \tag{5.5} \]

This functional may be put in a slightly more convenient form by noting that the energy and the initial conditions are related by equation 5.2, so by defining the new dependent variable
\[ z(x) = A + \frac{v_0^2}{2g} - y(x) \quad\text{we obtain}\quad T[z] = \int_0^b dx\, \sqrt{\frac{1 + z'^2}{2gz}}. \tag{5.6} \]

Exercise 5.4

(a) Find the time, T , taken for a particle of mass m to slide down the straight

line, y = Ax, from the point (X, AX) to the origin when the initial speed is v0 .

Show that if v0 = 0 this is
\[ T = \sqrt{\frac{2X}{gA}}\sqrt{1 + A^2}. \]

(b) Show also that if the point (X, AX) lies on the circle of radius R and with centre at (0, R), so the equation of the circle is x² + (y − R)² = R², then the time taken to slide along the straight line to the origin is independent of X and is given by
\[ T = 2\sqrt{\frac{R}{g}}. \]

This surprising result was known by Galileo and seems to have been one reason

why he thought that the solution to the brachistochrone problem was a circle.
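The result of exercise 5.4(a) can be checked by evaluating the time integral T = ∫ ds/v directly (a numerical sketch with illustrative values of the slope A and extent X):

```python
import math

g, A, X = 9.81, 0.5, 2.0   # illustrative slope A and horizontal extent X

# Starting from rest at (X, A*X): v = sqrt(2*g*(A*X - y)) and ds = sqrt(1+A^2) dx,
# so T = integral over 0 < x < X of sqrt(1 + A**2) / sqrt(2*g*A*(X - x)) dx.
n = 400000
h = X / n
T_num = sum(math.sqrt(1 + A ** 2) / math.sqrt(2 * g * A * (X - (i + 0.5) * h)) * h
            for i in range(n))
T_formula = math.sqrt(2 * X / (g * A)) * math.sqrt(1 + A ** 2)
print(T_num, T_formula)
```

The midpoint rule is used because it avoids evaluating the integrable square-root singularity at x = X.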

Exercise 5.5
Show that the functional defined in equation 5.6, when expressed using z as the independent variable and if v0 = 0, becomes
\[ T[x] = \frac{1}{\sqrt{2g}}\int_0^A dz\, \sqrt{\frac{1 + x'(z)^2}{z}}, \qquad x(0) = 0, \quad x(A) = b. \]


5.2.3 A solution

The integrand of the functional 5.6 is independent of x, so we may use equation 4.13

(page 130) to write Euler’s equation in the form

\[ z'\frac{\partial F}{\partial z'} - F = \text{constant} \quad\text{where}\quad F(z, z') = \sqrt{\frac{1 + z'^2}{z}}. \]

Note that the external constant (2g)^{−1/2} can be ignored. Since
\[ \frac{\partial F}{\partial z'} = \frac{z'}{\sqrt{z(1 + z'^2)}} \quad\text{this gives}\quad \frac{z'^2}{\sqrt{z(1 + z'^2)}} - \sqrt{\frac{1 + z'^2}{z}} = -\frac{1}{c} \]

for some positive constant c — note that c must be positive because the left-hand side

of the above equation is negative. Rearranging the last expression gives

\[ z\left(1 + z'^2\right) = c^2 \quad\text{or}\quad \frac{dz}{dx} = \pm\sqrt{\frac{c^2}{z} - 1}. \tag{5.7} \]

This first-order differential equation is separable and can be solved. First, however, note

that because the y-axis is vertically upwards we expect the solution y(x) to decrease

away from x = 0, that is z(x) will increase so we take the positive sign and then

integration gives
\[ x = \int dz\, \sqrt{\frac{z}{c^2 - z}}. \]

Now substitute z = c² sin² φ to give
\[ x = 2c^2\int d\phi\, \sin^2\phi = c^2\int d\phi\,(1 - \cos 2\phi) = \frac{1}{2}c^2(2\phi - \sin 2\phi) + d \quad\text{and}\quad z = \frac{1}{2}c^2(1 - \cos 2\phi), \tag{5.8} \]

where d is a constant. Both c and d are determined by the values of A, b and the

initial speed, v0 . Comparing these equations with equation 5.1 we see that the required

stationary curve is a cycloid. It is shown in chapter 8 that, in some cases, this solution

is a global minimum of T [z].

In the case that the particle starts from rest, v0 = 0, these solutions give
\[ x = d + \frac{1}{2}c^2(2\phi - \sin 2\phi), \qquad y = A - \frac{1}{2}c^2(1 - \cos 2\phi), \]
where c and d are constants determined by the known end points of the curve.

At the starting point y = A so here φ = 0 and since x = 0 it follows that d = 0:

because φ(0) = 0 the particle initially falls vertically downwards. At the final point of

the curve, x = b, y = 0, let φ = φb . Then

\[ \frac{2b}{c^2} = 2\phi_b - \sin 2\phi_b, \qquad \frac{2A}{c^2} = 1 - \cos 2\phi_b, \]

giving two equations for c and φb : we now show that these equations have a unique,

real solution. Consider the cycloid

u = 2θ − sin 2θ, v = 1 − cos 2θ, 0 ≤ θ ≤ π. (5.9)


The value of φb is given by the value of θ where this cycloid intersects the straight line

Au = bv. The graphs of these two curves are shown in the following figure.

Figure 5.5 Graph of the cycloid defined in equation 5.9 and the straight line bv = Au.

Because the gradient of the cycloid at θ = 0, (u = v = 0), is infinite this graph shows

that there is a single value of φb for all positive values of the ratio A/b. By dividing the

first of equations 5.9 by the second we see that φb is given by solving the equation

\[ \frac{2\phi_b - \sin 2\phi_b}{2\sin^2\phi_b} = \frac{b}{A}, \qquad 0 < \phi_b < \pi. \tag{5.10} \]

Unless b/A is small this equation can only be solved numerically. Once φb is known,

the value of c is given from the equation 2A/c² = 1 − cos 2φb, which may be put in the more convenient form c² = A/sin²φb.
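Equation 5.10 is easily solved by bisection, since its left-hand side increases monotonically from 0 to infinity on (0, π). The following sketch (illustrative values of g, A and b; not part of the course software) finds φ_b and the corresponding time of passage from exercise 5.7:

```python
import math

def phi_b(b_over_A, tol=1e-12):
    # The left-hand side of equation 5.10 increases monotonically from 0 to
    # infinity on (0, pi), so bisection is safe.
    lo, hi = 1e-9, math.pi - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lhs = (2 * mid - math.sin(2 * mid)) / (2 * math.sin(mid) ** 2)
        if lhs < b_over_A:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

g, A = 9.81, 1.0
b = A * math.pi / 2                   # with b = A*pi/2 we expect phi_b = pi/2
phi = phi_b(b / A)
T = math.sqrt(2 * A / g) * phi / math.sin(phi)   # time of passage, exercise 5.7
print(phi, T)
```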

Exercise 5.6
Show that if A ≫ b then φb ≈ 3b/(2A) and that y/A ≈ 1 − (x/b)^{2/3}.

Exercise 5.7

Use the solution defined in equation 5.8 to show that on the stationary path the

time of passage is

\[ T[z] = \sqrt{\frac{2A}{g}}\,\frac{\phi_b}{\sin\phi_b}. \]

We end this section by showing a few graphs of the solution 5.8 and quoting some

formulae that help understand them; the rest of this section is not assessed.

In the following figure are depicted graphs of the stationary paths for A = 1 and

various values of b, ranging from small to large, so all curves start at (0, 1) but end at

the points (b, 0), with 0.1 ≤ b ≤ 4.


Figure 5.6 Graphs showing the stationary paths joining the points (0, 1) and (b, 0) for b = 0.1, 1/2, 1, π/2, 2, 3 and 4.

From figure 5.6 we see that for small b the stationary path is close to that of a straight

line, as would be expected. In this case φb is small and it was shown in exercise 5.6

that
\[ \phi_b = \frac{3b}{2A} - \frac{9b^3}{20A^3} + O(b^5) \quad\text{and that}\quad \frac{y}{A} \simeq 1 - \left(\frac{x}{b}\right)^{2/3}. \]
Also the time of passage is
\[ T = \sqrt{\frac{2A}{g}}\left(1 + \frac{3b^2}{8A^2} - \frac{81b^4}{640A^4} + O(b^6)\right). \]

By comparison, if a particle slides down the straight line joining (0, A) to (b, 0), that is

y/A + x/b = 1, so z = Ax/b, then the time of passage is

\[ T_{SL} = \sqrt{\frac{2(A^2 + b^2)}{Ag}} = \begin{cases} \sqrt{\dfrac{2A}{g}}\left(1 + \dfrac{b^2}{2A^2} + O(b^4)\right), & b \ll A, \\[2ex] b\sqrt{\dfrac{2}{Ag}}\left(1 + \dfrac{A^2}{2b^2} + O(b^{-4})\right), & b \gg A. \end{cases} \]
Thus, for small b, the relative difference is
\[ T_{SL} - T = \frac{b^2}{8A^2}\,T + O(b^4). \]

Returning to figure 5.6 we see for small b the stationary paths cross the x-axis at

the terminal point. At some critical value of b the stationary path is tangential to the

x-axis at the terminal point. We can see from the equation for y(φ) that this critical path occurs when y'(φ) = 0, that is when φb = π/2 and, from equation 5.10, we see

that this gives b = Aπ/2. On this path the time of passage is

\[ T = \frac{\pi}{2}\sqrt{\frac{2A}{g}} \quad\text{and also}\quad T_{SL} = T\sqrt{1 + \frac{4}{\pi^2}} = 1.185\,T. \]

For b > Aπ/2 the stationary path dips below the x-axis and approaches the terminal point from below. For b ≫ Aπ/2 it can be shown that φb = π − √(Aπ/b) + O(b^{−3/2}),


\[ x \simeq \frac{b}{2\pi}(2\phi - \sin 2\phi), \qquad y \simeq A - \frac{b}{\pi}\sin^2\phi, \]
and that
\[ T = \sqrt{\frac{2\pi b}{g}}\left(1 - \sqrt{\frac{A}{b\pi}} + \frac{\sqrt{\pi}}{6}\left(\frac{A}{b}\right)^{3/2} + \cdots\right). \]

Thus the time of passage increases as √b, compared with the time to slide down the straight line, which is proportional to b, for large b. Further, the stationary path reaches its lowest point when φ = π/2, where y = A − b/π; in other words the distance it falls below the x-axis is about 1/3 the distance it travels along it, provided b ≫ Aπ. That

is, the particle first accelerates to a high speed, reaching a speed v ≈ √(2gb/π), before slowing to reach the terminal point at speed v = √(2gA): on the straight line path the

particle accelerates uniformly to this speed.

Exercise 5.8
Galileo thought that the solution to the brachistochrone problem was given by the circle passing through the initial and final points, (0, A) and (b, 0), and tangential to the y-axis at the start point.
Show that the equation of this circle is (x − R)² + (y − A)² = R², where R is its radius given by 2bR = A² + b². Show also that if x = R(1 − cos θ) and y = A − R sin θ, then the time of passage is
\[ T = \sqrt{\frac{R}{2g}}\int_0^{\theta_b} \frac{d\theta}{\sqrt{\sin\theta}} \quad\text{where}\quad \sin\theta_b = \frac{A}{R} = \frac{2Ab}{A^2 + b^2}. \]
If b ≪ A show that T ≈ √(2A/g).

5.3 Minimal surface of revolution
The problem is to find the non-negative, smooth function y(x), with given end points

y(a) = A and y(b) = B, such that the cylindrical surface formed by rotating the curve

y(x) about the x-axis has the smallest possible area. The left-hand side of the following

figure shows the construction of this surface: note that the end discs do not contribute

to the area considered.

Figure 5.7 Diagram showing the construction of a surface of revolution, on the left, and, on the right, the small segment used to construct the integral 5.11.


This section is divided into three parts. First, we derive the functional S[y] giving the

required area. Second, we derive the equation that a sufficiently differentiable function

must satisfy to make the functional stationary. Finally we solve this equation in a

simple case and show that even this relatively simple problem has pitfalls.

An expression for the area of this surface is obtained by first finding the area of the

edge of a thin disc of width δx, shown in the right-hand side of figure 5.7. The small

segment of the boundary curve may be approximated by a straight line provided δx is

sufficiently small, so its length, δs, is given by

\[ \delta s = \sqrt{1 + y'^2}\,\delta x + O(\delta x^2). \]

The area δS traced out by this segment as it rotates about the x-axis is the circumference of the circle of radius y(x) times δs; to order δx this is
\[ \delta S = 2\pi y(x)\delta s = 2\pi y\sqrt{1 + y'^2}\,\delta x. \]

Hence the area of the whole surface from x = a to x = b is given by the functional
\[ S[y] = 2\pi\int_a^b dx\, y\sqrt{1 + y'^2}, \qquad y(a) = A \ge 0, \quad y(b) = B > 0; \tag{5.11} \]
with no loss of generality we may assume that A ≤ B and hence that B > 0.

Exercise 5.9
Show that the equation of the straight line joining (a, A) to (b, B) is
\[ y = \frac{B - A}{b - a}(x - a) + A. \]
Use this together with equation 5.11 to show that the surface area of the frustum of the cone shown in figure 5.8 is given by
\[ S = \pi(B + A)\sqrt{(b - a)^2 + (B - A)^2}. \]
Note that the frustum of a solid is that part of the solid lying between two parallel planes which cut the solid; its area does not include the area of the parallel ends.

Figure 5.8 Diagram showing the frustum of a cone, the unshaded area. The slant-height is l and the radii of the circular ends are A and B.

Show further that this expression may be written in the form π(A + B)l where l

is the length of the slant height and A and B are the radii of the end circles.
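The frustum formula of exercise 5.9 can be checked by evaluating the functional 5.11 numerically along the straight line (an illustrative sketch with assumed end points and radii):

```python
import math

def area_functional(y, dy, lo, hi, n=100000):
    # S[y] = 2*pi * integral of y*sqrt(1 + y'^2) dx  (equation 5.11),
    # by the composite midpoint rule.
    h = (hi - lo) / n
    return 2 * math.pi * h * sum(
        y(lo + (i + 0.5) * h) * math.sqrt(1 + dy(lo + (i + 0.5) * h) ** 2)
        for i in range(n))

a, b, A, B = 0.0, 2.0, 1.0, 3.0       # illustrative end points and radii
m = (B - A) / (b - a)                 # slope of the straight line
S_num = area_functional(lambda x: m * (x - a) + A, lambda x: m, a, b)
S_frustum = math.pi * (B + A) * math.sqrt((b - a) ** 2 + (B - A) ** 2)
print(S_num, S_frustum)
```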


The following exercise may seem a non-sequitur, but it illustrates two important points.

First, it shows how a simple version of Euler’s method, section 4.2, can provide a useful

approximation to a functional. Second, it shows how a very simple approximation can

capture the essential, quite complicated, behaviour of a functional: this is important

because only rarely can the Euler-Lagrange equation be solved exactly. In particular

it suggests that in the simple case A = B, with y(x) defined on |x| ≤ a, there are

stationary paths only if A/a is sufficiently large and then there are two stationary

paths.

Exercise 5.10

Consider the case A = B and with −a ≤ x ≤ a, so the functional 5.11 becomes

\[ S[y] = 2\pi\int_{-a}^{a} dx\, y\sqrt{1 + y'^2}, \qquad y(\pm a) = A > 0. \]

(a) Assume that the required stationary paths are even and use a variation of

Euler’s method, described in section 4.2.1, by assuming that

\[ y(x) = \alpha + \frac{A - \alpha}{a}x, \qquad 0 \le x \le a, \]

where α is a constant, to derive an approximation, S(α), for S[y].

(b) By differentiating this expression with respect to α show that S(α) is stationary if α = α± = (A ± √(A² − 2a²))/2, and deduce that no such solutions exist if A < a√2. Note that the exact calculation, described below, shows that there are no continuous stationary paths if A < 1.51a.

(c) Show that if A > a√2 the two stationary values of S satisfy S(α−) > S(α+).

(d) If A ≫ a show that the two values of α are given approximately by
\[ \alpha_+ = A - \frac{a^2}{2A} + \cdots \quad\text{and}\quad \alpha_- = \frac{a^2}{2A}\left(1 + \frac{a^2}{2A^2} + \cdots\right), \]
and find suitable approximations for the associated stationary paths. Show also that the stationary values of S are given approximately by S(α−) ≈ 2πA² and S(α+) ≈ 4πAa, and give a physical interpretation of these values.
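The algebra in exercise 5.10 can be checked numerically. With the trial function above, a short calculation (a sketch, assuming the even piecewise-linear form) gives S(α) = 2π(A + α)√(a² + (A − α)²), and a finite-difference derivative confirms that α± are stationary points:

```python
import math

A, a = 2.0, 1.0        # illustrative values with A > a*sqrt(2)

def S(alpha):
    # Functional 5.11 evaluated on the even piecewise-linear trial function:
    # S(alpha) = 2*pi*(A + alpha)*sqrt(a**2 + (A - alpha)**2).
    return 2 * math.pi * (A + alpha) * math.sqrt(a ** 2 + (A - alpha) ** 2)

root = math.sqrt(A ** 2 - 2 * a ** 2)
alpha_plus, alpha_minus = (A + root) / 2, (A - root) / 2

def dS(alpha, h=1e-6):
    # central finite-difference derivative of S
    return (S(alpha + h) - S(alpha - h)) / (2 * h)

print(dS(alpha_plus), dS(alpha_minus), S(alpha_minus) > S(alpha_plus))
```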

The integrand of the functional 5.11 does not depend explicitly upon x, hence the first-

integral of the Euler-Lagrange equation 4.13 (page 130) may be used. In this case we may take the integrand to be G(y, y') = y√(1 + y'²) so that
\[ \frac{\partial G}{\partial y'} = \frac{yy'}{\sqrt{1 + y'^2}} \quad\text{and}\quad y'\frac{\partial G}{\partial y'} - G = -\frac{y}{\sqrt{1 + y'^2}}. \]
Hence the first-integral is
\[ \frac{y}{\sqrt{1 + y'^2}} = c, \qquad y(a) = A \ge 0, \quad y(b) = B > 0, \tag{5.12} \]


for some constant c; since y(b) > 0 we may assume that c is positive. By squaring and

re-arranging this equation we obtain the simpler first-order equation

\[ \frac{dy}{dx} = \pm\frac{\sqrt{y^2 - c^2}}{c}, \qquad y(a) = A \ge 0, \quad y(b) = B > 0. \tag{5.13} \]

The solutions of equation 5.13, if they exist, ensure that the functional 5.11 is stationary.

We shall see, however, that suitable solutions do not always exist and that when they

do further work is necessary in order to determine the nature of the stationary point.

Here we solve the first-order differential equation 5.13 when the ends of the cylinder

have the same radius, that is A = B > 0: in this case it is convenient to put b = −a,

so that the origin is at the centre of the cylinder which has length 2a. Now there

are two independent parameters, the lengths a and A; since there are no other length

scales we expect the solution to depend upon a single, dimensionless parameter, which

may be taken to be the ratio A/a. If B ≠ A, there are two independent dimensionless

parameters, A/a and B/a for instance, and this makes understanding the behaviour

of the solutions more difficult. However, even the seemingly simple case A = B has

surprises in store and so provides an indication of the sort of difficulties that may

be encountered with variational problems: such difficulties are typical of nonlinear

boundary value problems. Because the following analysis involves several strands, you

will probably understand it more easily by re-writing it in your own words.

The ends have the same radius so it is convenient to introduce a symmetry by re-

defining a and putting the cylinder ends at x = ±a. This change, which is merely a

shift along the x-axis, does not affect the differential equation 5.13 (because its right-

hand side is independent of x); but the boundary conditions are slightly different. If we

denote the required solution by f (x), then, from equation 5.13 we see that it satisfies

the differential equation and boundary conditions,

\[ \frac{df}{dx} = \pm\frac{\sqrt{f^2 - c^2}}{c}, \qquad f(-a) = f(a) = A > 0. \tag{5.14} \]

The identity cosh²z − sinh²z = 1 suggests changing the dependent variable from f to φ, where f = c cosh φ. This gives the simpler equation c dφ/dx = ±1 with solution cφ = β ± x for some real constant β. Hence the general solution² is
\[ f(x) = c\cosh\left(\frac{\beta \pm x}{c}\right). \]

The boundary conditions give
\[ \frac{A}{c} = \cosh\left(\frac{\beta + a}{c}\right) = \cosh\left(\frac{\beta - a}{c}\right), \quad\text{that is}\quad \sinh\frac{\beta}{c}\sinh\frac{a}{c} = 0. \]

Since a ≠ 0, the only way of satisfying this equation is to set β = 0, which gives
\[ f(x) = c\cosh\frac{x}{c} \quad\text{with c determined by}\quad A = c\cosh\frac{a}{c}. \tag{5.15} \]

²Another solution is f(x) = c in the special case that c = A; however, this solution is not a solution of the original Euler-Lagrange equation, see the discussion in section 4.4, in particular exercise 4.8.


Notice that f (0) = c, so c is the height of the curve at the origin, where f (x) is

stationary; also, because β = 0 the solution is even. The required solutions are obtained

by finding the real values of c satisfying this equation. Unfortunately, the equation

A = c cosh(a/c) cannot be inverted to express c in terms of known functions of A.

Numerical solutions may be found, but first it is necessary to determine those values of

a and A for which real solutions exist.

A convenient way of writing this equation is to introduce a new dimensionless vari-

able η = a/c so we may write the equation for c in the form

\[ \frac{A}{a} = g(\eta) \quad\text{where}\quad g(\eta) = \frac{\cosh\eta}{\eta}. \tag{5.16} \]

This equation shows directly that η depends only upon the dimensionless ratio A/a. In

terms of η and A the solution 5.15 becomes

\[ f(x) = \frac{a}{\eta}\cosh\left(\frac{x\eta}{a}\right) = A\,\frac{\cosh(x\eta/a)}{\cosh\eta}. \tag{5.17} \]

The stationary solutions are found by solving the equation A/a = g(η) for η. The

graph of g(η), depicted in figure 5.9, shows that g(η) has a single minimum and that for

A/a > min(g) there are two real solutions, η1 and η2 , with η1 < η2 , giving the shapes

f1 (x) and f2 (x) respectively.

Figure 5.9 Graph of g(η) = η⁻¹ cosh η showing the solutions of the equation g(η) = A/a.

This graph also suggests that g(η) → ∞ as η → 0 and ∞; this behaviour can be verified

with the simple analysis performed in exercise 5.12, which shows that

\[ g(\eta) \sim \frac{1}{\eta} \quad\text{for}\quad \eta \ll 1 \qquad\text{and}\qquad g(\eta) \sim \frac{e^\eta}{2\eta} \quad\text{for}\quad \eta \gg 1. \]

The minimum of g(η) is at the real root of η tanh η = 1, see exercise 5.13; this may be

found numerically, and is at ηm ≈ 1.200, and here g(ηm) = 1.509. Hence if A < 1.509a

there are no real solutions of equation 5.16, meaning that there are no functions with

continuous derivatives making the area stationary. For A > 1.509a there are two

real solutions giving two stationary values of the functional 5.11; we denote these two

solutions by η1 and η2 with η1 < η2 . Because there is no upper bound on the area

neither solution can be a global maximum. Recall that in exercise 5.10 it was shown that a simple polygon approximation to the stationary path did not exist if A < a√2 and there were two solutions if A > a√2.
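The minimum of g can be located by bisection on η tanh η = 1 (an illustrative sketch; the course obtains the same values analytically in exercises 5.12 and 5.13):

```python
import math

# g(eta) = cosh(eta)/eta has g'(eta) = (eta*sinh(eta) - cosh(eta))/eta**2,
# so the minimum is at the root of eta*tanh(eta) = 1; bisect on (0.5, 2),
# where the left-hand side is increasing and changes sign relative to 1.
lo, hi = 0.5, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mid * math.tanh(mid) < 1.0:
        lo = mid
    else:
        hi = mid
eta_m = 0.5 * (lo + hi)
g_min = math.cosh(eta_m) / eta_m
print(eta_m, g_min)     # close to 1.200 and 1.509
```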


The following graph shows values of the dimensionless area S/a² for these two stationary solutions as functions of A/a when A/a ≥ g(ηm) ≈ 1.509. The area associated with the smaller root, η1, is denoted by S1, with S2 denoting the area associated with η2. These graphs show that S2 > S1 for A > ag(ηm) ≈ 1.51a.

Figure 5.10 Graphs showing how the dimensionless area S/a² varies with A/a.

It is difficult to find simple approximations for the area S[f] except when A ≫ a, in which case the results obtained in exercises 5.12 and 5.13 may be used, as shown in the

following analysis. We consider the smaller and larger roots separately.

If A ≫ a the smaller root, η1, is seen from figure 5.9 to be small. The approximation developed in exercise 5.12 gives η1 ≈ a/A so that equation 5.17 becomes
\[ f_1(x) \simeq A\cosh(x/A) \simeq A, \]
since |x| ≤ a ≪ A and cosh(x/A) ≈ 1. Because f1(x) is approximately constant the original functional, equation 5.11, is easily evaluated to give
\[ S_1 = S[f_1] = 4\pi aA \quad\text{or}\quad \frac{S_1}{a^2} = 4\pi\frac{A}{a}. \]

The latter expression is the equation of the approximately straight line seen in fig-

ure 5.10. The area S1 is that of the right circular cylinder formed by joining the ends

with parallel lines.

For the larger root, η2, since cosh η ≈ e^η/2 for large η, equation 5.16 for η becomes, see exercise 5.12,
\[ \frac{A}{a} = \frac{1}{2\eta}e^\eta \tag{5.18} \]

and
\[ f_2(x) \simeq A\exp\left(-\frac{\eta_2}{a}(a - x)\right) + A\exp\left(-\frac{\eta_2}{a}(a + x)\right), \qquad \eta_2 \gg 1. \]

For positive x the second term is negligible (because η2 ≫ 1) provided xη2 ≫ a. For

negative x the first term is negligible, for the same reason. Hence an approximation for

f2(x) is
\[ f_2(x) \simeq A\exp\left(-\frac{\eta_2}{a}(a - |x|)\right) \quad\text{provided}\quad |x|\eta_2 \gg a. \tag{5.19} \]

The behaviour of this function as η → ∞ is discussed after equation 5.20. In exer-

cise 5.12 it is shown that the area is given by

\[ S_2 = S[f_2] \simeq 2\pi A^2 \quad\text{or}\quad \frac{S_2}{a^2} = 2\pi\left(\frac{A}{a}\right)^2, \]


which is the same as the area of the cylinder ends. The latter expression increases

quadratically with A/a, as seen in figure 5.10.

These approximations show directly that if A ≫ a then S2 > S1, confirming the conclusions drawn from figure 5.10. They also show that when A ≫ a the smallest area

is given when the surface of revolution approximates that of a right circular cylinder.
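These asymptotic estimates can be checked against the exact area: for f(x) = c cosh(x/c) the integral 5.11 evaluates to S = πc(2a + c sinh(2a/c)) (a short calculation using f√(1 + f'²) = c cosh²(x/c)). The sketch below (illustrative, with the assumed ratio A/a = 10) finds both roots of g(η) = A/a by bisection and compares the exact areas with 4πaA and 2πA²:

```python
import math

def g(eta):
    return math.cosh(eta) / eta

def bisect(f, lo, hi, n=200):
    # assumes f(lo) < 0 < f(hi) and f increasing on [lo, hi]
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, A = 1.0, 10.0
eta_m = 1.1997                       # position of the minimum of g
# g decreases on (0, eta_m) and increases on (eta_m, inf), so bracket each root.
eta1 = bisect(lambda e: A / a - g(e), 1e-6, eta_m)   # small root of g = A/a
eta2 = bisect(lambda e: g(e) - A / a, eta_m, 50.0)   # large root

def area(eta):
    # Exact area for f(x) = c*cosh(x/c) with c = a/eta:
    # S = pi*c*(2*a + c*sinh(2*a/c)).
    c = a / eta
    return math.pi * c * (2 * a + c * math.sinh(2 * a / c))

S1, S2 = area(eta1), area(eta2)
print(S1 / (4 * math.pi * a * A), S2 / (2 * math.pi * A ** 2))  # both near 1
```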

In the following three figures we show examples of these solutions for A = 2a,

A = 10a and A = 100a. In the first example, on the left, the ratio A/a = 2 is only

a little larger than min(g(η)) ≈ 1.509, but the two solutions differ substantially, with

f1 (x) already close to the constant value of A for all x. In the two other figures the

ratio A/a is larger and now f1 (x) is indistinguishable from the constant A, while f2 (x)

is relatively small for most values of x.

Figure 5.11 Graphs showing the stationary solutions f(x)/A = cosh(xη/a)/cosh η as a function of x/a and for various values of A/a, with a = 1.

These figures and the preceding analysis show that when the ends are relatively close,

that is A/a large, f1(x) ≈ A, for all x, and that as A/a → ∞, f2(x) tends to the

function

\[ f_2(x) \to f_G(x) = \begin{cases} 0, & |x| < a, \\ A, & |x| = a. \end{cases} \tag{5.20} \]

This result may be derived from the approximate solution given in equation 5.19. Con-

sider positive values of x, with xη2 ≫ a. If x = a(1 − δ), where δ is a small positive number, then
\[ f_2(x) \simeq A e^{-\delta\eta_2}. \]

But from equation 5.18 ln(A/a) = η − ln(2η) and if η 1, ln(2η) η, so η ' ln(A/a)

and the above approximation for f2 (x) becomes

f2 (x) a δ

= , x = a(1 − δ).

A A

Hence, provided δ > 0, that is x 6= a, f2 /A → 0 as A/a → ∞.

The surface defined by the limiting function fG(x) comprises two discs of radius A, a distance 2a apart, so has area SG = 2πA², independent of a. Since this limiting solution has discontinuous derivatives at x = ±a it is not an admissible function. Nevertheless it is important because if A < ag(ηm) ≃ 1.509a it can be shown that this surface gives the global minimum of the area and, as will be seen in the next subsection, has physical significance. This solution to the problem was first found by B C W Goldschmidt in 1831 and is now known as the Goldschmidt curve or Goldschmidt solution.

5.3. MINIMAL SURFACE OF REVOLUTION 161

5.3.4 Summary

We have considered the special case where the ends of the cylinder are at x = ±a and

each end has the same radius A; in this case the curve y = f (x) is symmetric about

x = 0 and we have obtained the following results.

1. If the radius of the ends is small by comparison to the distance between them,

A < ag(ηm) ≃ 1.509a, there are no curves described by differentiable functions

making the traced out area stationary. In this case it can be shown that the

smallest area is given by the Goldschmidt solution, fG (x), defined in equation 5.20,

and that this is the global minimum.

2. If A > 1.51a there are two smooth stationary curves. One of these approaches

the Goldschmidt solution as A/a → ∞ and the other approaches the constant

function f (x) → A in this limit, and this gives the smaller area. This solution is

a local minimum of the functional, as will be shown in chapter 8.

The nature of the stationary solutions is not easy to determine. In the following graph

we show the areas S1 /a2 and S2 /a2 , as in figure 5.10 and also, with the dashed lines,

the areas given by the Goldschmidt solution, SG/a² = 2π(A/a)², curve G, and the area of the right circular cylinder, Sc/a² = 4πA/a, curve c.

Figure 5.12 Graphs showing how the dimensionless area S/a² varies with A/a. Here the curves labelled k, for k = 1, 2, denote the areas Sk/a² as in figure 5.10; curve G the scaled area of the Goldschmidt curve, SG/a² = 2π(A/a)², and curve c the scaled area of the cylinder, Sc/a² = 4πA/a.

If A > ag(ηm) ≃ 1.509a it will be shown in chapter 8 that S₁ is a local minimum of the functional. The graphs shown in figure 5.12 suggest that for large enough A/a, S₁ < SG, but that for smaller values of A/a, SG < S₁. The value of η at which SG = S₁ is given by the solution of 1 + e^{−2η} = 2η, see exercise 5.14. The numerical solution of this equation gives η = 0.639, at which A ≃ 1.895a. Hence if A < 1.89a the Goldschmidt curve yields a smaller area, even though S₁ is a local minimum; for A > 1.89a, S₁ gives the smallest area.
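The two numbers quoted here are easily verified; the following short script (an editorial check, not part of the original notes) finds the root of 1 + e^{−2η} = 2η by bisection and evaluates A/a = g(η) = cosh η/η there.

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Simple bisection for a root of f in [lo, hi]; assumes a sign change."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Solve 1 + exp(-2*eta) = 2*eta, the condition S_G = S_1.
eta = bisect(lambda t: 1.0 + math.exp(-2.0 * t) - 2.0 * t, 0.5, 1.0)
ratio = math.cosh(eta) / eta   # A/a = g(eta) at the crossover
```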

This relatively simple example of a variational problem provides some idea of the

possible complications that can arise with nonlinear boundary value problems.

Exercise 5.11
(a) If f(x) = c cosh(x/c) show that

    S[f]/a² = (2π/η²)(η + sinh η cosh η),   η = a/c.

(b) Show that S[f], considered as a function of η, is stationary at the root of η tanh η = 1.
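The closed form in part (a) can be checked numerically. The following sketch (not part of the original notes; the values c = 1, a = 0.7 are arbitrary choices) compares Simpson's-rule quadrature of the area integral 2π∫f√(1 + f′²) dx with the stated formula.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

c, a = 1.0, 0.7          # arbitrary sample values
eta = a / c
f = lambda x: c * math.cosh(x / c)
df = lambda x: math.sinh(x / c)
# Area of the surface of revolution: S = 2*pi * Int_{-a}^{a} f*sqrt(1+f'^2) dx
area = simpson(lambda x: 2 * math.pi * f(x) * math.sqrt(1 + df(x) ** 2), -a, a)
exact = (2 * math.pi / eta ** 2) * (eta + math.sinh(eta) * math.cosh(eta)) * a ** 2
```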

162 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION

Exercise 5.12
(a) Use the expansion cosh η = 1 + η²/2 + O(η⁴) to show that, for small η, g(η) = 1/η + η/2 + O(η³), where g(η) is defined in equation 5.16. Hence show that if A ≫ a then η ≃ a/A, and hence that c ≃ A and f(x) ≃ A. Using the result obtained in the previous exercise, or otherwise, show that S₁ = 4πAa.
(b) Show that if η₂ is large the equation defining it is given approximately by

    A/a ≃ e^η/(2η)

and, using the result obtained in the previous exercise, that

    S₂/a² ≃ 2π(e^η/(2η))² + 2π/η ≃ 2π(e^η/(2η))²,   (η = η₂).

Exercise 5.13
(a) Show that the position of the minimum of the function g(η) = η⁻¹ cosh η, η > 0, is at the real root, ηm, of η tanh η = 1.
By sketching the graphs of y = 1/η and y = tanh η, for η > 0, show that the equation η tanh η = 1 has only one real root.
(b) If a/c = ηm and A/a = g(ηm) use the result derived in exercise 5.11 to show that the area of the surface formed is Sm = 2πA²ηm, and that Sm/a² = 2πηm⁻¹ cosh² ηm.
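As a numerical illustration (an editorial check, not part of the original notes), the root ηm of η tanh η = 1 and the corresponding minimum value of g can be computed by bisection:

```python
import math

# Find eta_m, the root of eta*tanh(eta) = 1, by bisection;
# eta*tanh(eta) is increasing, below 1 at eta=1 and above 1 at eta=1.5.
lo, hi = 1.0, 1.5
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mid * math.tanh(mid) < 1.0:
        lo = mid
    else:
        hi = mid
eta_m = 0.5 * (lo + hi)
g_min = math.cosh(eta_m) / eta_m   # the minimum of g(eta) = cosh(eta)/eta
```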

Exercise 5.14

Use the result derived in exercise 5.12 to show that SG = S₁ when η satisfies the equation cosh²η = η + sinh η cosh η. Show that this equation simplifies to 1 + e^{−2η} = 2η and that there is only one positive root, given by η = 0.639232.

Exercise 5.15
(a) Show that the functional

    S[y] = ∫₋₁¹ dx √(y(1 + y′²)),   y(−1) = y(1) = A > 0,

is stationary on the paths

    y(x) = (4c⁴ + x²)/(4c²),   where   c² = c²± = ½(A ± √(A² − 1)).

In the following these solutions are denoted by y±(x).
(b) Show that on these stationary paths

    S[y] = 2c + 1/(6c³),

and deduce that when A > 1, S[y₋] > S[y₊], and that when A = 1, S[y] = 4√2/3. Show also that if A ≫ 1

    S[y₋] ≃ (4/3)A^{3/2}   and   S[y₊] ≃ 2√A.

(c) Consider the function

    yδ(x) = { 0,                      0 ≤ |x| < 1 − δ,
            { A − (A/δ)(1 − |x|),     1 − δ ≤ |x| ≤ 1,

where 0 < δ ≪ 1. Show that as δ → 0, yδ(x) → fG(x), the Goldschmidt curve defined in equation 5.20. Show also that

    lim_{δ→0} S[yδ] = S[fG] = (4/3)A^{3/2}.
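Both parts of exercise 5.15 can be verified numerically; the sketch below (not part of the original notes; A = 2 is an arbitrary choice) evaluates the functional on the paths y± by quadrature and compares with 2c + 1/(6c³).

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

A = 2.0
results = {}
for sign in (+1, -1):                         # +1 gives y_+, -1 gives y_-
    c2 = 0.5 * (A + sign * math.sqrt(A * A - 1.0))
    c = math.sqrt(c2)
    y = lambda x, c=c: (4 * c ** 4 + x * x) / (4 * c * c)
    dy = lambda x, c=c: x / (2 * c * c)
    S_num = simpson(lambda x: math.sqrt(y(x) * (1 + dy(x) ** 2)), -1.0, 1.0)
    S_exact = 2 * c + 1.0 / (6 * c ** 3)
    results[sign] = (S_num, S_exact)
```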

5.4 Soap films

An easy way of forming soap films is to dip a loop of wire into soap solution and then to

blow on it. Almost everyone will have noticed that the initially flat soap film bounded by the wire forms a segment of a sphere when blown. It transpires that there is a very close

connection between these surfaces and problems in the Calculus of Variations. The exact

physics of soap films is complicated, but a fairly simple and accurate approximation

shows that the shapes assumed by soap films are such as to minimise their areas, because

the surface-tension energy is approximately proportional to the area and equilibrium

positions are given by the minimum of this energy. Thus, in some circumstances the

shapes given by the minimum surface of revolution, described above, are those assumed

by soap films.

The study of the formation and shapes of soap films has a very distinguished pedi-

gree: Newton, Young, Laplace, Euler, Gauss, Poisson are some of the eminent scientists

and mathematicians who have studied the subject. Here we cannot do the subject justice, but the interested reader should obtain a copy of Isenberg’s fascinating book³.

The essential property is that a stable soap film is formed in the shape of a surface of

minimal area that is consistent with a wire boundary.

Probably the simplest example is that of a soap film supported by a circular loop of

wire. If we distort it by blowing on it gently to form a portion of a sphere, when we stop

blowing the surface returns to its previous shape, that is a circular disc. Essentially this

is because in each case the surface-tension energy, which is proportional to the area, is

smallest in the assumed configuration.

Imagine a framework comprising two identical circular wires of radius A, held a

distance 2a apart (like wheels on an axle), as in figure 5.13 below. What shape soap

film can such a frame support? These figures illustrate the alternatives suggested by

the analysis of the previous section and agree qualitatively with the solutions one would

intuitively expect.

The left-hand configuration (large separation), with two distinct surfaces, is the

Goldschmidt solution, equation 5.20, and it gives an absolute minimum area if A <

1.89a. The shape on the right is a catenoid of revolution and represents the absolute

minimum if A > 1.89a. It is a local minimum if 1.51a < A < 1.89a and does not exist

if A < 1.51a. When 1.51a < A < 1.89a the catenoid is unstable and we have only

to disturb it slightly, by blowing on it for instance, and it may suddenly jump to the

Goldschmidt solution which has a smaller area, as seen in figure 5.12.

3 The Science of Soap Films and Soap Bubbles, by C Isenberg (Dover 1992).


Figure 5.13 Diagrams showing two configurations assumed by soap films on two rings of radius A and a distance 2a apart. On the left, A < 1.89a, the soap film simply fills the two circular wires because they are too far apart: this is the Goldschmidt solution, equation 5.20. On the right, A > 1.51a, the soap film joins the two rings in the shape defined by equation 5.17 with η = η₁.

The methods discussed previously provide the shape of the right-hand film, but the

matter of determining whether these stationary positions are extrema, local or global,

is of a different order of difficulty. The complexity of this physical problem is further

compounded when one realises that there can be minimum energy solutions of a quite

unexpected form. The following diagram illustrates a possible configuration of this kind.

We do not expect the theory described in the previous section to find such a solution

because the mathematical formulation of the physical problem makes no allowance for

this type of behaviour.

2a

Figure 5.14 Diagram showing a possible soap film. In this example a circular

film, perpendicular to the axis, is formed in the centre and this is joined to

both outer rings by a catenary.

The relationship between soap films and some problems in the Calculus of Variations

can certainly add to our intuitive understanding, but this example should provide a

salutary warning against dependence on intuition.

Examples of the complex shapes that soap films can form, but which are difficult

to describe mathematically, are produced by dipping a wire frame into a soap solution.

Photographs of the varied shapes obtained with cubic and tetrahedral frames are provided in Isenberg’s book.

Here we describe a conceptually simple problem which is difficult to deal with math-

ematically, but which helps to understand the difficulties that may be encountered with

certain variational problems. Further, this example has potential practical applications.

Consider the soap film formed between two clear, parallel planes joined by a number

of pins, of negligible diameter, perpendicular to the planes. When dipped into a soap


solution the resulting film will join the pins in such a manner as to minimise the length

of film, because the surface tension energy is proportional to the area, which is propor-

tional to the length of film. In figure 5.15 we show three cases, viewed from above, with

two and three pins.

In panel A there are two pins: the natural shape for the soap films is the straight line

joining them. In panels B and C there are three pins and two different configurations

are shown which, it transpires, are the only two allowed; but which of the pair is actually

assumed depends upon the relative positions of the pins.

Figure 5.15 Diagrams showing the configurations of soap films for two pins (panel A) and three pins (panels B and C).

The reason for this follows from elementary geometry and the application of one of

Plateau’s (1801 – 1883)4 three geometric rules governing the shapes of soap films, which

he inferred from his experiments. In the present context the relevant rule is that three

intersecting planes meet at equal angles of 120◦ : this is a consequence of the surface

tension forces in each plane being equal. Plateau’s other two rules are given by Isenberg

(1992, pages 83 – 4).

We can see how this works, and some of the consequences for certain problems in

the Calculus of Variations, by fixing two points, a and b, and allowing the position of

the third point to vary. The crucial mathematical result needed is Proposition 20 of

Euclid5 , described next.

Euclid: proposition 20
The angle subtended by a chord AB at the centre O of the circle is twice the angle subtended at any point C on the circumference of the circle: if the angle at C is α, the angle at O is 2α, as shown in the figure. This is proved using the properties of isosceles triangles.

With this result in mind draw a circle through the points a and b such that the angle subtended by ab on the circumference is 120°, figures 5.16 and 5.17. If L is the distance between a and b the radius of this circle is R = L/√3. The orientation of this circle is chosen so the third point is on the same side of the line ab as the 120° angle.
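The construction can be checked with coordinates (an editorial sketch, not part of the original notes): with |ab| = L, a point on the minor arc of the circle of radius L/√3 through a and b sees the chord at exactly 120°.

```python
import math

L = 1.0
R = L / math.sqrt(3.0)
a = (0.0, 0.0)
b = (L, 0.0)
# Centre of the circle through a and b, above the chord.
centre = (0.5 * L, math.sqrt(R * R - 0.25 * L * L))
# A point on the minor arc, on the opposite side of ab from the centre.
c = (centre[0], centre[1] - R)

def angle(p, q1, q2):
    """Angle at p subtended by q1 and q2, in degrees."""
    v1 = (q1[0] - p[0], q1[1] - p[1])
    v2 = (q2[0] - p[0], q2[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

angle_at_c = angle(c, a, b)
```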

Then for any point c outside this circle the shortest set of lines is obtained by joining c to the centre of the circle, O, and if c′ is the point where this line intersects the circle, see figure 5.16, the lines cc′, ac′ and c′b are the shortest set of lines joining the three points a, b and c.

⁴ Joseph Plateau was a Belgian physicist who made extensive studies of the surface properties of fluids.
⁵ See Euclid’s Elements, Book III.

Figure 5.16 Diagram of the shortest length for a point c outside the circle; the point O is the centre of the circle. Figure 5.17 Diagram of the shortest length for a point c inside the circle.

If the third point c is inside this circle the shortest line joining the points comprises

the two straight line segments ac and cb, as shown in figure 5.17. A proof is given by Isenberg (1992, pages 67 – 73); see also exercise 5.16.

As the point c moves radially from outside to inside the circle the shortest config-

uration changes its nature: this type of behaviour is generally difficult to predict and

may cause problems in the conventional theory of the Calculus of Variations.

If more pins join the parallel planes the soap film will form configurations making

the total length a local minimum; there are usually several different minimum configu-

rations, and which is found depends upon a variety of factors, such as the orientation of

the planes when extracted from the soap solution. The problem of minimising the total

length of a path joining n points in a plane was first investigated by the Swiss mathe-

matician Steiner (1796 – 1863) and such problems are now known as Steiner problems.

The mathematical analysis of such problems is difficult. One physical manifestation of

this type of situation is the laying of pipes between a number of centres, where, all else

being equal, the shortest total length of pipe is desirable.

Exercise 5.16

Consider the three points, O, A and C, in the Cartesian plane with coordinates

O = (0, 0), A = (a, 0) and C = (c, d) and where the angle OAC is less than 120◦ .

Consider a point X, with coordinates (x, y) inside the triangle OAC. Show that

the sum of the lengths OX, AX and CX is stationary and is a minimum when

the angles between the three lines are all equal to 120◦ .
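Exercise 5.16 can be explored numerically with the Weiszfeld iteration for the point minimising the sum of distances to the vertices (a sketch, not part of the original notes; the triangle chosen is arbitrary, with all its angles below 120°): at the limit point the three angles are all 120°.

```python
import math

pts = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]   # O, A, C

# Weiszfeld iteration: X <- sum(P_i/d_i) / sum(1/d_i), which converges to
# the point minimising OX + AX + CX when that point is interior.
x, y = 1.5, 1.0
for _ in range(1000):
    wsum = wx = wy = 0.0
    for (px, py) in pts:
        d = math.hypot(x - px, y - py)
        wsum += 1.0 / d
        wx += px / d
        wy += py / d
    x, y = wx / wsum, wy / wsum

# Angles between the three lines at the minimising point.
angles = []
for i in range(3):
    for j in range(i + 1, 3):
        v1 = (pts[i][0] - x, pts[i][1] - y)
        v2 = (pts[j][0] - x, pts[j][1] - y)
        cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angles.append(math.degrees(math.acos(cosang)))
```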

Exercise 5.17
Consider the case where four pins are situated at the corners of a square with side of length L.
(a) One possible configuration of the soap films is for them to lie along the two diagonals, forming a cross. Show that the length of the films is 2√2 L ≃ 2.83L.
(b) Another configuration is the ‘H’-shape. Show that the length of film is 3L.
(c) Another possible configuration is one in which the angle between three intersecting lines is 120°. Show that the length of film is (1 + √3)L ≃ 2.73L.
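The three lengths can be checked directly (an editorial sketch, not part of the original notes); the 120° configuration of part (c) is built here from two junctions on the mid-line of the square, each a distance L/(2√3) from the nearer side.

```python
import math

L = 1.0
cross = 2 * math.sqrt(2.0) * L                 # (a) the two diagonals
h_shape = 3.0 * L                              # (b) the 'H' configuration
# (c) two junctions on the vertical mid-line at (L/2, L/2 ± L/(2*sqrt(3))),
# each joined to the two nearest corners, plus the central segment.
yj = L / (2.0 * math.sqrt(3.0))
corner_leg = math.hypot(0.5 * L, yj)           # corner to nearest junction
steiner = 4 * corner_leg + (L - 2 * yj)
```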


Exercise 5.18
Consider the configuration of four pins forming a rectangle with sides of length L and aL.
(a) For the configuration in which the two threefold junctions lie on the axis parallel to the sides of length aL, show that the total line length is d₁ = L(a + √3); for junctions on the axis parallel to the sides of length L it is d₂ = L(1 + a√3).
(b) Show that the minimum of these two lengths is d₁ if a > 1 and d₂ if a < 1.

5.5 Miscellaneous exercises

Exercise 5.19

Show that the Euler-Lagrange equation for the minimal surface of revolution on

the interval 0 ≤ x ≤ a with the boundary conditions y(0) = 0, y(a) = A > 0, has

no solution.

Note that in this case the only solution is the Goldschmidt curve, equation 5.20,

page 160.

Exercise 5.20
Show that the functional giving the distance between two points on a sphere of radius r, labelled by the spherical polar coordinates (θa, φa) and (θb, φb), can be expressed in either of the forms

    S = r ∫_{θa}^{θb} dθ √(1 + φ′(θ)² sin²θ)   or   S = r ∫_{φa}^{φb} dφ √(θ′(φ)² + sin²θ),

and that the corresponding Euler-Lagrange equations are

    φ′ sin²θ = c √(1 + φ′(θ)² sin²θ),   φ(θa) = φa,  φ(θb) = φb,

where c is a constant, and

    θ′′ sin θ − 2θ′² cos θ − sin²θ cos θ = 0,   θ(φa) = θa,  θ(φb) = θb.

Both these equations can be solved, but this task is made easier with a sensible

choice of orientation. The two obvious choices are:

(a) put the initial point at the north pole, so θa = 0 and φa is undefined, and

(b) put both points on the equator, so θa = θb = π/2, and we may also choose

φa = 0.

Using one of these choices show that the stationary paths are great circles.

Exercise 5.21
Consider the minimal surface problem with end points Pa = (0, A) and Pb = (b, B), where b, A and B are given and A ≤ B.
(a) Show that the general solution of the appropriate Euler-Lagrange equation is

    y = c cosh((α − x)/c),

where α and c are real constants with c > 0. Show that if c = bη the boundary conditions give the following equation for η,

    ℬ = f(η)   where   f(x) = 𝒜 cosh(1/x) − √(𝒜² − x²) sinh(1/x),

and 𝒜 = A/b, ℬ = B/b, with 0 ≤ η ≤ 𝒜.
(b) Show that for x ≪ 𝒜 and x ≃ 𝒜 the function f(x) behaves, respectively, as

    f(x) ≃ (x²/4𝒜) e^{1/x}   and   f(x) ≃ 𝒜 cosh(1/𝒜) − √(2𝒜(𝒜 − x)) sinh(1/𝒜).

Deduce that f(x) has at least one minimum in the interval 0 < x < 𝒜 and that the equation ℬ = f(η) has at least two roots for sufficiently large values of ℬ and none for small ℬ.
(c) If 𝒜 ≪ 1 show that the minimum value of f(x) occurs near x = 𝒜 and that min(f) ≃ ½𝒜 exp(1/𝒜). Deduce that if 𝒜 ≪ 1 there are two solutions of the Euler-Lagrange equation if ℬ > ½𝒜 exp(1/𝒜), approximately, and otherwise there are no solutions.
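The small-x behaviour in part (b) can be illustrated numerically (an editorial sketch, not part of the original notes; 𝒜 = 1 and x = 0.1 are arbitrary choices):

```python
import math

Abar = 1.0   # the scaled end radius, A/b

def f(x):
    """The function whose roots determine eta in exercise 5.21."""
    return Abar * math.cosh(1.0 / x) - math.sqrt(Abar ** 2 - x * x) * math.sinh(1.0 / x)

x = 0.1
approx = x * x * math.exp(1.0 / x) / (4.0 * Abar)   # small-x asymptote of f
rel_err = abs(f(x) - approx) / f(x)
```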

Exercise 5.22
(a) For the brachistochrone problem suppose that the initial and final points of the curve are (x, y) = (0, A) and (b, 0), respectively, as in the text, but that the initial speed, v₀, is not zero.
Show that the parametric equations for the stationary path are

    x = d + ½c²(2φ − sin 2φ),   z = c² sin²φ,   y = A + v₀²/(2g) − z,

where φ₀ ≤ φ ≤ φb, for some constants c, d, φ₀ and φb. Show that these four constants are related by the equations

    sin²φ₀ = k² sin²φb,   k² = v₀²/(v₀² + 2gA) < 1,

    b = [v₀²/(4gk² sin²φb)] ((2φb − sin 2φb) − (2φ₀ − sin 2φ₀)),

    c² sin²φb = A + v₀²/(2g).

(b) If v₀² ≪ Ag, show that k is small and find an approximate solution for these equations. Note, this last part is technically demanding.

Exercise 5.23
In this exercise you will show that the cycloid is a local minimum for the brachistochrone problem, using the functional found in exercise 5.5. Consider the varied path x(z) + εh(z) and show that (ignoring the irrelevant factor 1/√(2g))

    T[x + εh] − T[x] = (ε²/2) ∫₀^A dz h′(z)² / (√z (1 + x′²)^{3/2}) + O(ε³)
                     = ε² c ∫₀^{φ_A} dφ h′(z)² cos⁴φ,

where z = ½c²(2φ − sin 2φ) and A = c² sin²φ_A. Deduce that T[x + εh] > T[x] for |ε| > 0 and all h(z), and hence that the stationary path is actually a local minimum.

Exercise 5.24
The Oxy-plane is vertical with the Oy-axis vertically upwards. A straight line is drawn from the origin to the point P with coordinates (x, f(x)), for some differentiable function f(x). Show that the time taken for a particle to slide smoothly from P to the origin is

    T(x) = 2 √((x² + f(x)²)/(2gf(x))).


By forming a differential equation for f(x), and solving it, show that T(x) is independent of x if f satisfies the equation x² + (f − α)² = α², for some constant α. Describe the shape of the curve defined by this equation.
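The conclusion of this exercise — that T(x) is constant on the circle x² + (f − α)² = α² — can be checked numerically (an editorial sketch, not part of the original notes; on the circle x² + f² = 2αf, so T = 2√(α/g)):

```python
import math

g = 9.81
alpha = 1.0

def T(x):
    """Descent time from (x, f(x)) on the circle x^2 + (f-alpha)^2 = alpha^2."""
    fx = alpha - math.sqrt(alpha * alpha - x * x)   # branch through the origin
    return 2.0 * math.sqrt((x * x + fx * fx) / (2.0 * g * fx))

times = [T(x) for x in (0.2, 0.5, 0.9, 0.999)]
```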

Exercise 5.25
A cylindrical shell of negligible thickness is formed by rotating the curve y(x), a ≤ x ≤ b, about the x-axis. If the material is uniform with density ρ the moment of inertia about the x-axis is given by the functional

    I[y] = πρ ∫_a^b dx y³ √(1 + y′²),   y(a) = A,   y(b) = B,

where A and B are the radii of the ends and are given.
(a) In the case A = B and with the end points at x = ±a, show that I[y] is stationary on the curve y = c cosh φ(x), where φ(x) is given implicitly by

    x/c = ∫₀^φ dv / √(1 + cosh²v + cosh⁴v),

and where φa = φ(a) is a solution of the equation

    a/A = f(φa)   where   f(z) = (1/cosh z) ∫₀^z dv / √(1 + cosh²v + cosh⁴v).

(b) Show that

    f(z) ≃ z/√3 + O(z³)  as z → 0,   f(z) ≃ 2βe^{−z}  as z → ∞,   where   β = ∫₀^∞ dv / √(1 + cosh²v + cosh⁴v).

Hence show that for a/A ≪ 1 there are two solutions. Show also that there is a critical value of a/A above which there are no appropriate solutions of the Euler-Lagrange equation.

Problems on cycloids

Exercise 5.26
The cycloid OPD of figure 5.1 (page 146) is rotated about the x-axis to form a solid of revolution. Show that the surface area, S, and volume, V, of this solid are

    S = 2π ∫₀^{2π} dθ y (ds/dθ) = 4πa² ∫₀^{2π} dθ (1 − cos θ) sin(θ/2) = 64πa²/3,

    V = π ∫₀^{2π} dθ y² (dx/dθ) = πa³ ∫₀^{2π} dθ (1 − cos θ)³ = 5π²a³.
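Both values can be confirmed by quadrature (an editorial check, not part of the original notes), using the cycloid x = a(θ − sin θ), y = a(1 − cos θ), for which ds/dθ = 2a sin(θ/2):

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

a = 1.0
# Surface area: S = 4*pi*a^2 Int_0^{2pi} (1 - cos t) sin(t/2) dt
S = simpson(lambda t: 4 * math.pi * a * a * (1 - math.cos(t)) * math.sin(t / 2),
            0.0, 2 * math.pi)
# Volume: V = pi*a^3 Int_0^{2pi} (1 - cos t)^3 dt
V = simpson(lambda t: math.pi * a ** 3 * (1 - math.cos(t)) ** 3, 0.0, 2 * math.pi)
```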

Exercise 5.27
The half cycloid with parametric equations x = a(φ − sin φ), y = a(1 − cos φ), with 0 ≤ φ ≤ θ ≤ π, is rotated about the y-axis to form a container.
(a) Show that the surface area, S(θ), and volume, V(θ), are given by

    S(θ) = 4πa² ∫₀^θ dφ (φ − sin φ) sin(φ/2),

    V(θ) = πa³ ∫₀^θ dφ (φ − sin φ)² sin φ.

(b) If x = a(θ − sin θ) denotes the radius of the rim of the container, show that for small θ

    S(x) = (2π/5) 6^{2/3} a^{1/3} x^{5/3} + O(x^{7/3})   and   V(x) = (π/8) 6^{2/3} a^{1/3} x^{8/3} + O(x^{10/3}).

(c) Find the general expressions for S(θ) and V(θ) and their values at θ = π.

Exercise 5.28

This exercise shows that the arc QST in figure 5.3, (page 148) is a cycloid, a

result discovered by Huygens and used in his attempt to construct a pendulum

with period independent of its amplitude for use in a clock.

Consider the situation shown in figure 5.18, where the arcs ABO and OCD are

cycloids defined parametrically by the equations

x = a(φ − sin φ), y = a(1 − cos φ), −2π ≤ φ ≤ 2π,

where B and C are at the points φ = ±π, respectively.

Figure 5.18 Diagram showing the two cycloids ABO and OCD, and the string OQR whose straight portion QR makes angle θ with the x-axis.

The curve OQR has length l, is wrapped round the cycloid along OQ, is a straight

line between Q and R and is tangential to the cycloid at Q.

(a) If the point Q has the coordinates

xQ = a(φ − sin φ) and yQ = a(1 − cos φ)

show that the angle θ between QR and the x-axis is given by θ = (π − φ)/2.

(b) Show that the coordinates of the point R are

xR = xQ + (l − s(φ)) sin(φ/2) and yR = yQ + (l − s(φ)) cos(φ/2),

where s(φ) is the arc length OQ.

(c) If the length of OQR is the same as the length of OQC show that

xR = a(φ + sin φ) and yR = a(3 + cos φ).
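The formulas of part (c) can be verified numerically (an editorial sketch, not part of the original notes), using s(φ) = 4a(1 − cos(φ/2)) for the arc length OQ and l = 4a for the length of OQC:

```python
import math

a = 1.0
l = 4.0 * a                       # length of the string OQR, equal to arc OQC

def s(phi):
    # Arc length of the cycloid from O (phi=0) to Q:
    # ds/dphi = 2a sin(phi/2), so s(phi) = 4a(1 - cos(phi/2)).
    return 4.0 * a * (1.0 - math.cos(phi / 2.0))

def R_point(phi):
    xQ = a * (phi - math.sin(phi))
    yQ = a * (1.0 - math.cos(phi))
    rem = l - s(phi)              # straight part QR of the string
    return (xQ + rem * math.sin(phi / 2.0), yQ + rem * math.cos(phi / 2.0))

# Differences from the stated closed forms x_R = a(phi + sin phi),
# y_R = a(3 + cos phi), at several sample values of phi.
checks = []
for phi in (0.3, 1.0, 2.0, 3.0):
    xR, yR = R_point(phi)
    checks.append((xR - a * (phi + math.sin(phi)), yR - a * (3.0 + math.cos(phi))))
```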


Chapter 6
Further theoretical developments

6.1 Introduction

In this chapter we continue the development of the general theory, by first considering the effects of changing variables and then by introducing functionals with several dependent variables. The chapter ends with a discussion of whether any second-order differential equation can be expressed as an Euler-Lagrange equation and hence whether its solutions are stationary paths of a functional.

The motivation for changing variables is simply that most problems can be simpli-

fied by a judicious choice of variables, both dependent and independent. If a set of

differential equations can be derived from a functional it transpires that changing vari-

ables in the functional is easier than the equivalent change to the differential equations,

because the order of differentiation is always smaller.

The introduction of two or more dependent variables is needed when stationary paths

are described parametrically — an idea introduced in chapter 9. Another very important

use, however, is in the reformulation of Newton’s laws as a variational principle, an

important topic we have no room for in this course.

6.2 Invariance of the Euler-Lagrange equation

In this section we consider the effect of changing both the dependent and independent variables and show that the form of the Euler-Lagrange equation remains unchanged, an important property first noted by Euler in 1744. This technique is useful because one of the principal methods of solving a differential equation is to change variables with the aim of converting it to a standard, recognisable form. For instance the unfamiliar

equation

    z d²y/dz² + (1 − a) dy/dz + a² z^{2a−1} y = 0   (6.1)


is transformed, by the change of independent variable x = z^a, into the familiar equation

    d²y/dx² + y = 0.

It is rarely easy to find suitable new variables, but if the equation can be derived from

a variational principle the task is usually made easier because the algebra is simpler:

you will see why in exercise 6.1 where the above example is treated with a = 2.

We start with functionals having only one dependent variable, but the full power of

this technique becomes apparent mainly in the advanced study of dynamical systems

which cannot be dealt with here.

The easiest way of understanding why the form of the Euler-Lagrange equation is invariant under a coordinate change is to examine the effect of changing only the independent variable x. Thus for the functional

    S[y] = ∫_a^b dx F(x, y(x), y′(x))   (6.2)

we change the independent variable x to u, where x = g(u), for some function g(u), assumed to be monotonic so the inverse exists. With this change of variable y(x) becomes a function of u and it is convenient to define Y(u) = y(g(u)).

Then the chain rule gives

    dy/dx = (dy/du)(du/dx) = Y′(u)/(dx/du) = Y′(u)/g′(u),

so that the functional becomes

    S[Y] = ∫_c^d du g′(u) F(g(u), Y(u), Y′(u)/g′(u)),   (6.3)

with the integration limits, c and d, defined implicitly by the equations a = g(c) and

b = g(d).

The integrand of the original functional depends upon x, y(x) and y′(x). The integrand of the transformed functional depends upon u, Y(u) and Y′(u), so if we define

    ℱ(u, Y(u), Y′(u)) = g′(u) F(g(u), Y(u), Y′(u)/g′(u)),   (6.4)

the functional can be written as

    S[Y] = ∫_c^d du ℱ(u, Y(u), Y′(u)).   (6.5)

The Euler-Lagrange equation for this functional is

    d/du (∂ℱ/∂Y′) − ∂ℱ/∂Y = 0,   (6.6)


which is the transformed form of the original equation

    d/dx (∂F/∂y′) − ∂F/∂y = 0.   (6.7)

These two equations have the same form, in the sense that the formula 6.6 is obtained from 6.7 by replacing the explicit occurrences of x, y, y′ and F by u, Y, Y′ and ℱ respectively. The new second-order differential equation for Y, obtained from 6.6, is, however, normally quite different from the equation for y derived from 6.7, because F and ℱ have different functional forms.

A simple example is the functional

    S[y] = ∫₁² dx y′²/x²,   y(1) = 1,  y(2) = 2,

which is similar to the example dealt with in exercise 4.22(c) (page 139). The general solution of the Euler-Lagrange equation is y(x) = β + αx³ and the boundary conditions give β = 6/7 and α = 1/7.

Now make the transformation x = u^a, for some constant a: the chain rule gives

    dy/dx = (dy/du)(du/dx) = Y′(u)/(a u^{a−1})   where   Y(u) = y(u^a),

and the functional becomes

    S[Y] = ∫₁^{2^{1/a}} du (a u^{a−1}) (1/u^{2a}) (Y′(u)/(a u^{a−1}))² = (1/a) ∫₁^{2^{1/a}} du Y′(u)²/u^{3a−1}.

Choosing 3a = 1 simplifies this functional to

    S[Y] = 3 ∫₁⁸ du Y′²,   Y(1) = 1,  Y(8) = 2.

The Euler-Lagrange equation for this functional is Y′′(u) = 0, with general solution Y = C + Du. The boundary conditions give C + 8D = 2 and C + D = 1, and hence

    Y(u) = (6 + u)/7   giving   y(x) = Y(u(x)) = (6 + x³)/7.

In this example little was gained, because the Euler-Lagrange equation is equally easily

solved in either representation. This is not always the case as the next exercise shows.
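A quick check (not part of the original notes) that y(x) = (6 + x³)/7 satisfies the boundary conditions and the first integral of the original Euler-Lagrange equation, which for F = y′²/x² reduces to y′(x)/x² = constant:

```python
def y(x):
    return (6.0 + x ** 3) / 7.0

def dy(x):
    return 3.0 * x * x / 7.0

# Boundary conditions y(1) = 1, y(2) = 2, and the first integral y'(x)/x^2.
bc = (y(1.0), y(2.0))
first_integral = [dy(x) / (x * x) for x in (1.0, 1.3, 1.7, 2.0)]
```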

Exercise 6.1
The functional S[y] = ∫₀^X dx (y′² − ω²y²), where ω is a constant, gives rise to the Euler-Lagrange equation y′′ + ω²y = 0.

(a) Show that changing the independent variable to z, where x = z², gives the functional

    S[y] = (1/2) ∫₀^Z dz (y′(z)²/z − 4ω²z y²),   Z = √X,

with the associated Euler-Lagrange equation

    z d²y/dz² − dy/dz + 4ω²z³y = 0.

Show that this is the same as equation 6.1 when a = 2 and ω = 1.


(b) Show that

    d²y/dx² = (1/(4z³)) (z d²y/dz² − dy/dz)

and hence derive the above Euler-Lagrange equation directly.

Note that the first method requires only that we compute dy/dx and avoids the

need to calculate the more difficult second derivative, d²y/dx², required by the

second method. This is why it is normally easier to transform the functional rather

than the differential equation.
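Part (a) of the exercise can be checked numerically (an editorial sketch, not part of the original notes): since x = z² turns y″ + ω²y = 0 into the transformed equation, y = sin(ωz²) should make the residual z y″(z) − y′(z) + 4ω²z³y vanish. Here ω = 1.3 is an arbitrary choice and the derivatives are taken by finite differences.

```python
import math

omega = 1.3

def y(z):
    return math.sin(omega * z * z)   # solution of y'' + omega^2 y = 0 in x = z^2

def residual(z, h=1e-5):
    # Central finite differences for y'(z) and y''(z).
    d1 = (y(z + h) - y(z - h)) / (2 * h)
    d2 = (y(z + h) - 2 * y(z) + y(z - h)) / (h * h)
    return z * d2 - d1 + 4 * omega ** 2 * z ** 3 * y(z)

residuals = [abs(residual(z)) for z in (0.5, 1.0, 1.7)]
```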

Exercise 6.2
A simpler type of transformation involves a change of the dependent variable. Consider the functional

    S[y] = ∫_a^b dx y′².

(a) Show that the associated Euler-Lagrange equation is y′′(x) = 0.
(b) Define a new variable z related to y by the differentiable monotonic function y = G(z) and show that the functional becomes

    S[z] = ∫_a^b dx G′(z)² z′².

Exercise 6.3
Show that if y = G(z), where G(z) is a differentiable function, the functional

    S[y] = ∫_a^b dx F(x, y, y′)   transforms to   S[z] = ∫_a^b dx F(x, G(z), G′(z)z′),

with the associated Euler-Lagrange equation

    d/dx (G′(z) ∂F/∂y′) − G′(z) ∂F/∂y − G′′(z) (∂F/∂y′) (dz/dx) = 0.

In the previous section it was seen that when changing the independent variable the

algebra is simpler if the transformation is made to the functional rather than the associ-

ated Euler-Lagrange equation because changing the functional involves only first-order

derivatives, recall exercise 6.1.

For the same reason it is far easier to apply more general transformations to the

functional than to the Euler-Lagrange equation. The most general transformation we

need to consider will be between the Cartesian coordinates (x, y) and two new variables

(u, v): such transformations are defined by two equations,

    x = x(u, v),   y = y(u, v),


which we assume take each point (u, v) to a unique point (x, y) and vice-versa, so the

Jacobian determinant of the transformation, equation 1.26 (page 30), is not zero in the

relevant ranges of u and v.

Before dealing with the general case we illustrate the technique using the particular

example in which (u, v) are polar coordinates, which highlights all relevant aspects of

the analysis.

The Cartesian coordinates (x, y) are defined in terms of the plane polar coordinates

(r, θ) by

x = r cos θ, y = r sin θ, r ≥ 0, −π < θ ≤ π. (6.8)

The inverse transformation is (for r ≠ 0)

    r² = x² + y²,   tan θ = y/x,   (6.9)

where the signs of x and y need to be taken into account when inverting the tan function.

At the origin r = 0, but θ is undefined. In Cartesian coordinates we normally choose x

to be the independent variable, so points on the curve C joining the points (a, A) and

(b, B), figure 6.1 below, are given by the Cartesian coordinates (x, y(x)).

Figure 6.1 Diagram showing the relation between the Cartesian and polar representations of a curve joining (a, A) and (b, B).

Alternatively, θ may be used as the independent variable, and then the curve is defined by the polar coordinates (r(θ), θ).

The aim is to transform a functional

    S[y] = ∫_a^b dx F(x, y(x), y′(x))   (6.10)

to an integral over θ in which y(x) and y′(x) are replaced by expressions involving θ, r(θ) and r′(θ). First we change to the new independent variable θ: then since x = r cos θ and y = r sin θ we have

    S[r] = ∫_{θa}^{θb} dθ F(r cos θ, r sin θ, y′(x)) dx/dθ.   (6.11)

The derivative dx/dθ is obtained from the relation x = r cos θ using the chain rule and remembering that r depends upon θ,

    dx/dθ = (dr/dθ) cos θ − r sin θ   and similarly   dy/dθ = (dr/dθ) sin θ + r cos θ.   (6.12)


It remains only to express y′(x) in terms of r, θ and r′, and this is given by the relation

    dy/dx = (dy/dθ)/(dx/dθ) = (r′ sin θ + r cos θ)/(r′ cos θ − r sin θ),   (6.13)

where r is assumed to depend upon θ. Hence the functional transforms to

    S[r] = ∫_{θa}^{θb} dθ ℱ(θ, r, r′),   (6.14)

where

    ℱ = (r′ cos θ − r sin θ) F(r cos θ, r sin θ, (r′ sin θ + r cos θ)/(r′ cos θ − r sin θ)).   (6.15)

The new functional depends only upon θ, r(θ) and the first derivative r′(θ), so the

Euler-Lagrange equation is

    d/dθ(∂ℱ/∂r′) − ∂ℱ/∂r = 0,    (6.16)

which is the transformed version of

    d/dx(∂F/∂y′) − ∂F/∂y = 0.    (6.17)

This analysis shows that the transformation to polar coordinates keeps the form of

Euler’s equation invariant because the transformation of the functional introduces only

first derivatives, via equations 6.12 and 6.13, so does not alter the derivation of the

Euler-Lagrange equation. The same transformation applied to the Euler-Lagrange

equation 6.17 involves finding a suitable expression for the second derivative, y″(x),

which is harder.
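As a concrete check of equation 6.15, take the arc-length integrand F(x, y, y′) = √(1 + y′²); the transformed integrand should reduce to √(r² + r′²), the polar arc-length element. The sketch below verifies this symbolically (the use of Python with sympy is an assumption, not part of the course notes); squaring avoids the sign ambiguity in the square root.

```python
import sympy as sp

theta = sp.symbols('theta')
r = sp.Function('r')(theta)
rp = r.diff(theta)

# equations 6.12 and 6.13: dx/dtheta and y'(x) evaluated on the curve
x_t = rp*sp.cos(theta) - r*sp.sin(theta)
yp = (rp*sp.sin(theta) + r*sp.cos(theta))/x_t

F = sp.sqrt(1 + yp**2)            # arc-length integrand on the curve
calF_sq = sp.expand((x_t*F)**2)   # square of the right side of equation 6.15

assert sp.simplify(calF_sq - (r**2 + rp**2)) == 0
```

So, up to the sign of dx/dθ (see exercise 6.4), the transformed integrand is √(r² + r′²), the form used in exercise 6.6.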

Exercise 6.4

The integrand of the functional 6.14 contains the denominator r′ cos θ − r sin θ.

Why can we assume that this is not zero?

Exercise 6.5

Show that if r is taken to be the independent variable the functional 6.10 becomes

    S[θ] = ∫_{ra}^{rb} dr ℱ(r, θ, θ′)   where   ℱ = (cos θ − rθ′ sin θ) F(r cos θ, r sin θ, (sin θ + rθ′ cos θ)/(cos θ − rθ′ sin θ)).

Exercise 6.6

(a) Show that the Euler-Lagrange equation for the functional

    S[r] = ∫_{θa}^{θb} dθ √(r² + r′(θ)²)   is   r d²r/dθ² − 2(dr/dθ)² − r² = 0   or   r d/dθ((1/r²) dr/dθ) = 1.

(b) Show that the general solution of this equation is r = 1/(A cos θ + B sin θ), for constants A and B.


(c) By showing that dθ/dx = (xy′ − y)/(x² + y²) and dr/dθ = (yy′ + x)r/(xy′ − y), where (x, y) are the Cartesian coordinates, show that this functional becomes

    S[y] = ∫_a^b dx √(1 + y′(x)²).

(d) If the boundary conditions in the Cartesian plane are (x, y) = (a, a) and (b, b + ε), b > a and ε > 0, show that in each representation the stationary path is

    y = (1 + ε/(b − a)) x − εa/(b − a)   and   r = εa/((b − a + ε) cos θ − (b − a) sin θ).

Consider the limit ε → 0 and explain why the polar equation fails in this limit.
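The general solution claimed in exercise 6.6(b) can be substituted directly into the Euler-Lagrange equation of part (a); a short symbolic sketch (sympy assumed):

```python
import sympy as sp

theta, A, B = sp.symbols('theta A B')
r = 1/(A*sp.cos(theta) + B*sp.sin(theta))   # general solution from part (b)

# Euler-Lagrange equation from part (a): r r'' - 2 r'^2 - r^2 = 0
residual = r*r.diff(theta, 2) - 2*r.diff(theta)**2 - r**2
assert sp.simplify(residual) == 0
```

This is just the straight line A x + B y = 1 written in polar coordinates, consistent with part (c).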

This example illustrates that simplification can sometimes occur when suitable transformations are made: the art is to find such transformations. The last part of exercise 6.6 also shows that representations that are undefined at isolated points can cause difficulties. In this case a problem is created because polar coordinates are not unique at the origin, where θ is undefined. The same problems occur when using spherical polar coordinates at the north and south poles, where the azimuthal angle is undefined.

Exercise 6.7

Show that in polar coordinates the functional

    S[y] = ∫_a^b dx √(x² + y²) √(1 + y′(x)²)   becomes   S[r] = ∫_{θa}^{θb} dθ r √(r² + r′(θ)²),

and that the Euler-Lagrange equation can be written as

    d²r/dθ² − (3/r)(dr/dθ)² − 2r = 0   or   d²/dθ²(1/r²) + 4/r² = 0.

Hence show that equations for the stationary paths are

    1/r² = A cos 2θ + B sin 2θ   or   A(x² − y²) + 2Bxy = 1,

where A and B are constants and 0 ≤ θ < π.

The analysis for the general transformation x = f(u, v), y = g(u, v) is very similar to the special case dealt with above and, as in that case (see exercise 6.6), it is necessary that the transformation is invertible, so that the Jacobian determinant, equation 1.26 (page 30), is not zero,

    ∂(f, g)/∂(u, v) = | ∂f/∂u   ∂f/∂v |
                      | ∂g/∂u   ∂g/∂v | ≠ 0.


If the admissible curves are denoted by y(x) and v(u) in the two representations, with

a ≤ x ≤ b and c ≤ u ≤ d, then the functional

    S[y] = ∫_a^b dx F(x, y, y′)   transforms to   S[v] = ∫_c^d du ℱ(u, v, v′),    (6.18)

where

    ℱ = (fu + fv v′) F(f, g, (gu + gv v′)/(fu + fv v′)).    (6.19)

This result follows because the chain rule gives

    dx/du = fu + fv dv/du   and   dy/du = gu + gv dv/du.

In the (u, v)-coordinate system the stationary path is given by the Euler-Lagrange

equation

    d/du(∂ℱ/∂v′) − ∂ℱ/∂v = 0.

Exercise 6.8

Consider the elementary functional

    S[y] = ∫_a^b dx F(y′),   y(a) = A,   y(b) = B.

If the roles of the dependent and independent variables are interchanged, by noting that y′(x) = 1/x′(y), show that the functional becomes

    S[x] = ∫_A^B dy G(x′)   where   G(u) = uF(1/u).

Exercise 6.9

Consider the functional

    S[y] = ∫ dx ( ½ A(x) y′² + B(x, y) );

note that the boundary conditions play no role in this question, so are omitted.

(a) If a new independent variable, u, is defined by the relation x = f(u), where f(u) is a differentiable, monotonic increasing function, show that with an appropriate choice of f the functional can be written in the form

    S[y] = ∫ du ( ½ y′(u)² + AB ).

(b) Hence show that the equation

    x d²y/dx² − dy/dx − 4x³y = 8x³   can be converted to   d²y/du² − 4y = 8,

with a suitable choice of the variable u.


Exercise 6.10

Consider the functional S[y] = ∫_1^2 dx y′²/x², y(1) = A, y(2) = B, where A and B are both positive.

(a) Using the fact that y′(x) = 1/x′(y) show that if y is used as the independent variable the functional becomes

    S[x] = ∫_A^B dy 1/(x² x′(y)),   x(A) = 1,   x(B) = 2.

(b) Show that the Euler-Lagrange equation for the functional S[x] is

    d/dy ( 1/(x² x′²) ) − 2/(x³ x′) = 0   which can be written as   d²x/dy² + (2/x)(dx/dy)² = 0.

(c) Show that this equation can be written in the form

    (1/x²) d/dy ( x² dx/dy ) = 0,   x(A) = 1,   x(B) = 2,

and integrating twice show that the stationary path is x³ = (7y + B − 8A)/(B − A).
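The stationary path of exercise 6.10 can be verified directly; the following sketch (sympy assumed) checks both the second-order equation of part (b) and the boundary conditions:

```python
import sympy as sp

y, A, B = sp.symbols('y A B', positive=True)
x = ((7*y + B - 8*A)/(B - A))**sp.Rational(1, 3)   # the claimed stationary path

residual = x.diff(y, 2) + (2/x)*x.diff(y)**2        # equation from part (b)
assert sp.simplify(residual) == 0
assert sp.simplify(x.subs(y, A)**3 - 1) == 0        # x(A) = 1
assert sp.simplify(x.subs(y, B)**3 - 8) == 0        # x(B) = 2, since 2^3 = 8
```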

6.3 Functionals with many dependent variables

6.3.1 Introduction

In chapter 4 we considered functionals of the type

    S[y] = ∫_a^b dx F(x, y, y′),   y(a) = A,   y(b) = B,    (6.20)

which involve one independent variable, x, and a single dependent variable, y(x), and

its first derivative. There are many useful and important extensions to this type of

functional and in this chapter we discuss one of these — the first in the following list —

which is important in the study of dynamical systems and when representing stationary

paths in parametric form, an idea introduced in chapter 9. Before proceeding we list

other important generalisations in order to provide you with some idea of the types of

problems that can be tackled: some are treated in later chapters.

(i) The integrand of the functional 6.20 depends upon the independent variable x and

a single dependent variable y(x), which is determined by the requirement that S[y]

be stationary. A simple generalisation is to integrands that depend upon several

dependent variables yk (x), k = 1, 2, · · · , n, and their first derivatives. This type

of functional is studied later in this section.

(ii) The integrand of 6.20 depends upon y(x) and its first derivative. Another simple

generalisation involves functionals depending upon second or higher derivatives.

Some examples of this type are treated in exercises 4.32, 4.33 (page 143) and 7.12.

The elastic theory of stiff beams and membranes requires functionals containing

the second derivative which represents the bending energy and some examples are

described in chapter 10.

(iii) Sometimes a stationary path is a solution of the Euler-Lagrange equation with a piecewise continuous first derivative. A simple example

of such a solution is the Goldschmidt curve defined by equation 5.20, (page 160).

That such solutions are important is clear, partly because they occur in the rela-

tively simple case of the surface of minimum revolution and also from observations

of soap films that often comprise spherical segments such that across common

boundaries the normal to the surface changes direction discontinuously. We con-

sider broken extremals in chapter 10.

(iv) In all examples so far considered the end points of the curve have been fixed.

However, there are variational problems where the ends of the path are free to

move on given curves: an example of this type of problem is described at the end

of section 3.5.1, equation 3.22. The general theory is considered in chapter 10.

(v) The integral defining the functional may be over a surface, S, rather than along

a line,

    J[y] = ∬_S dx1 dx2 F(x1, x2, y, ∂y/∂x1, ∂y/∂x2),

where S is a region in the (x1 , x2 )-plane, so the functional depends upon two inde-

pendent variables, (x1 , x2 ), rather than just one. In this case the Euler-Lagrange

equation is a partial differential equation. Many of the standard equations of

mathematical physics can be derived from such functionals. There is, of course, a

natural extension to integrals over higher-dimensional spaces; such problems are

not considered in this course.

First we find the necessary conditions for a functional depending on two functions to

be stationary. We are ultimately interested in functionals depending upon any finite

number of variables, so we shall often use a notation for which this further generalisation

becomes almost trivial.

If the two dependent variables are (y1 (x), y2 (x)) and the single independent variable

is x, the functional is

    S[y1, y2] = ∫_a^b dx F(x, y1, y2, y1′, y2′),    (6.21)

with the boundary conditions

    y1(a) = A1,   y1(b) = B1,   y2(a) = A2,   y2(b) = B2.    (6.22)

We require functions (y1 (x), y2 (x)) that make this functional stationary and proceed in

the same manner as before. Let y1 (x) and y2 (x) be two admissible functions — that

is, functions having continuous first derivatives and satisfying the boundary conditions

— and use the Gâteaux differential of S[y1 , y2 ] to calculate its rate of change. This is

    ΔS[y1, y2, h1, h2] = [ (d/dε) S[y1 + εh1, y2 + εh2] ]_{ε=0},    (6.23)


where yk(x) + εhk(x), k = 1, 2, are also admissible functions, which means that hk(a) = hk(b) = 0, k = 1, 2. As in equation 4.8 (page 129) we have

    ΔS = ∫_a^b dx [ (d/dε) F(x, y1 + εh1, y2 + εh2, y1′ + εh1′, y2′ + εh2′) ]_{ε=0}

and

    [ dF/dε ]_{ε=0} = (∂F/∂y1) h1 + (∂F/∂y1′) h1′ + (∂F/∂y2) h2 + (∂F/∂y2′) h2′,

so that

    ΔS = ∫_a^b dx ( (∂F/∂y1) h1 + (∂F/∂y1′) h1′ + (∂F/∂y2) h2 + (∂F/∂y2′) h2′ ).    (6.24)

For a stationary path we need, by definition (chapter 4, page 125), ∆S = 0 for all

h1 (x) and h2 (x). An allowed subset of variations is obtained by setting h2 (x) = 0,

then the above equation becomes the same as equation 4.9, (page 129), with y and h

replaced by y1 and h1 respectively. Hence we may use the same analysis to obtain the

second-order differential equation

    d/dx(∂F/∂y1′) − ∂F/∂y1 = 0,   y1(a) = A1,   y1(b) = B1.    (6.25)

This equation looks the same as equation 4.11, (page 130), but remember that here F also

depends upon the unknown function y2 (x).

Similarly, by setting h1 (x) = 0, we obtain another second-order equation

    d/dx(∂F/∂y2′) − ∂F/∂y2 = 0,   y2(a) = A2,   y2(b) = B2.    (6.26)

Equations 6.25 and 6.26 are the Euler-Lagrange equations for the functional 6.21. These

two equations will normally involve both y1(x) and y2(x), so are called coupled differential equations; normally this makes them far harder to solve than the Euler-Lagrange

equations of chapter 4, which contain only one dependent variable.

An example will make this clear: consider the quadratic functional

    S[y1, y2] = ∫_0^{π/2} dx ( y1′² + y2′² + 2y1y2 ).    (6.27)

Here equation 6.25 becomes

    d²y1/dx² − y2 = 0,    (6.28)

which involves both y1(x) and y2(x), and equation 6.26 becomes

    d²y2/dx² − y1 = 0,    (6.29)

which also involves both y1(x) and y2(x).


Equations 6.28 and 6.29 now have to be solved. Coupled differential equations are

normally very difficult to solve and their solutions can behave in bizarre ways, including

chaotically; but these equations are linear which makes the task of solving them much

easier, and the solutions are generally better behaved. One method is to use the first

equation to write y2 = y1″, so the second equation becomes the fourth-order linear equation

    d⁴y1/dx⁴ − y1 = 0.

By substituting a function of the form y1 = α exp(λx), where α and λ are constants,

into this equation we obtain an equation for λ that gives λ⁴ = 1, showing that there

are four solutions obtained by setting λ = ±1, ±i and hence that the general solution

is a linear combination of these functions,

    y1(x) = A e^x + B e^{−x} + C cos x + D sin x,   with   y2(x) = y1″(x) = A e^x + B e^{−x} − C cos x − D sin x.

The four arbitrary constants may now be determined from the four boundary conditions,

as demonstrated in the following exercise.
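The general solution of the fourth-order equation, and the recovery of y2 from it, can be checked symbolically before solving the boundary-value problem (sympy assumed; the constants are named A, B, C and D as in exercise 6.11):

```python
import sympy as sp

x, A, B, C, D = sp.symbols('x A B C D')
y1 = A*sp.exp(x) + B*sp.exp(-x) + C*sp.cos(x) + D*sp.sin(x)
assert sp.simplify(y1.diff(x, 4) - y1) == 0     # y1'''' = y1

y2 = y1.diff(x, 2)                              # equation 6.28: y2 = y1''
assert sp.simplify(y2.diff(x, 2) - y1) == 0     # equation 6.29: y2'' = y1
```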

Exercise 6.11

Find the values of the constants A, B, C and D if the functional 6.27 has the boundary conditions y1(0) = 0, y2(0) = 0, y1(π/2) = 1, y2(π/2) = −1.

Exercise 6.12

Show that the Euler-Lagrange equations for the functional

    S[y1, y2] = ∫_0^1 dx ( y1′² + y2′² + y1′y2′ ),   y1(0) = 0,   y2(0) = 1,   y1(1) = 1,   y2(1) = 2,

can be integrated to 2y1′ + y2′ = a1 and 2y2′ + y1′ = a2, where a1 and a2 are constants. Deduce that the stationary path is given by the equations

    y1(x) = x,   y2(x) = 1 + x.

Exercise 6.13

By defining a new variable z1 = y1 + y2 /2, show that the functional defined in the

previous exercise becomes

    S[z1, y2] = ∫_0^1 dx ( z1′² + (3/4) y2′² ),   z1(0) = 1/2,   y2(0) = 1,   z1(1) = 2,   y2(1) = 2,

and that the corresponding Euler-Lagrange equations are

    d²z1/dx² = 0   and   d²y2/dx² = 0.

Solve these equations to derive the solution obtained in the previous exercise.


Note that by using the variables (z1 , y2 ) each of the new Euler-Lagrange equations

depends only upon one of the dependent variables and is therefore far easier to

solve. Such systems of equations are said to be uncoupled and one of the main

methods of solving coupled Euler-Lagrange equations is to find a transformation

that converts them to uncoupled equations. In real problems finding such trans-

formations is difficult and often relies upon understanding the symmetries of the

problem and then the methods described in sections 6.2 and 7.3 can be useful.

The extension of the above analysis to functionals involving n dependent variables,

their first derivatives and a single independent variable is straightforward. It is helpful,

however, to use the notation y(x) = (y1 (x), y2 (x), · · · , yn (x)) to denote the set of n

functions. There is still only one independent variable, so the functional is

    S[y] = ∫_a^b dx F(x, y, y′),   y(a) = A,   y(b) = B,    (6.30)

where y′ = (y1′, y2′, …, yn′), A = (A1, A2, …, An), B = (B1, B2, …, Bn) and h = (h1, h2, …, hn). If y(x) and y(x) + εh(x) are admissible functions, so that h(a) =

h(b) = 0, the Gâteaux differential is given by the relation

    ΔS[y, h] = [ (d/dε) S[y + εh] ]_{ε=0} = ∫_a^b dx [ (d/dε) F(x, y + εh, y′ + εh′) ]_{ε=0},

and for y to be a stationary path this must be zero for all allowed h. Using the chain

rule we have

    [ (d/dε) F(x, y + εh, y′ + εh′) ]_{ε=0} = Σ_{k=1}^n ( (∂F/∂yk) hk + hk′ (∂F/∂yk′) ),

and hence

    ΔS[y, h] = Σ_{k=1}^n ∫_a^b dx ( (∂F/∂yk) hk + hk′ (∂F/∂yk′) ).    (6.31)

Now integrate by parts to cast this in the form

    ΔS[y, h] = Σ_{k=1}^n [ hk ∂F/∂yk′ ]_a^b − Σ_{k=1}^n ∫_a^b dx ( d/dx(∂F/∂yk′) − ∂F/∂yk ) hk.    (6.32)

But, since h(a) = h(b) = 0, the boundary term vanishes. Further, since ∆S[y, h] = 0

for all allowed h, by the same reasoning used when n = 2, we obtain the set of n coupled

equations

    d/dx(∂F/∂yk′) − ∂F/∂yk = 0,   yk(a) = Ak,   yk(b) = Bk,   k = 1, 2, …, n.    (6.33)

This set of n coupled equations is usually nonlinear and difficult to solve. The one

circumstance when the solution is relatively simple is when the integrand of the func-

tional S[y] is a quadratic form in both y and y0 , and then the Euler-Lagrange equations

are coupled linear equations; this is an important example because it describes small

oscillations about an equilibrium position of an n-dimensional dynamical system.


Exercise 6.14

(a) If A and B are real, symmetric, positive definite, n × n matrices¹ consider the functional

    S[y] = ∫_a^b dx Σ_{i=1}^n Σ_{j=1}^n ( yi′ Aij yj′ − yi Bij yj ),

with the integrand quadratic in y and y′. Show that the n Euler-Lagrange equations are the set of coupled, linear equations

    Σ_{j=1}^n ( Akj d²yj/dx² + Bkj yj ) = 0,   1 ≤ k ≤ n.

(b) Show that if we interpret y as an n-dimensional column vector and its transpose yᵀ as a row vector, the functional can be written in the equivalent matrix form

    S[y] = ∫_a^b dx ( y′ᵀ A y′ − yᵀ B y ),

and that the Euler-Lagrange equations can be written in the matrix form

    A d²y/dx² + B y = 0.

Show that this can also be written in the form

    d²y/dx² + A⁻¹B y = 0.    (6.34)

(c) It can be shown that the matrix A⁻¹B has non-negative eigenvalues ωk² and n orthogonal eigenvectors zk, k = 1, 2, …, n, possibly complex, which each satisfy A⁻¹B zk = ωk² zk. By expressing y as the linear combination of the zk,

    y = Σ_{k=1}^n ak(x) zk,

show that the functions ak(x) satisfy

    d²aj/dx² + ωj² aj = 0,   j = 1, 2, …, n.
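Part (c) is the basis of the normal-mode method for small oscillations. A minimal numerical sketch (numpy assumed; the 2 × 2 matrices are invented purely for illustration) computes the ωk² and checks the eigenvector property:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])     # real, symmetric, positive definite
B = np.array([[3.0, 1.0],
              [1.0, 2.0]])

M = np.linalg.inv(A) @ B
omega2, Z = np.linalg.eig(M)   # omega_k^2 and the eigenvectors z_k
assert np.all(omega2 > 0)      # non-negative eigenvalues, as claimed

for k in range(2):
    # each z_k satisfies A^{-1} B z_k = omega_k^2 z_k
    assert np.allclose(M @ Z[:, k], omega2[k]*Z[:, k])
```

Each mode then oscillates independently with angular frequency ωk.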

6.3.2 Changing the dependent variables

In this section we consider the effect of changing the dependent variables. A simple

example of such a transformation was dealt with in exercise 6.13 where it was shown how

a linear transformation uncoupled the Euler-Lagrange equations. In general the aim of

changing variables is to simplify the Euler-Lagrange equations and it is generally easier

to apply the transformation to the functional rather than the Euler-Lagrange equations.

Before explaining the general theory we deal with a specific example, which high-

lights all salient points. The functional is

    S[y1, y2] = ∫_a^b dt ( ½(y1′² + y2′²) − V(r) ),   r = √(y1² + y2²),    (6.35)

¹ A real symmetric matrix, A, has real elements satisfying Aij = Aji, for all i and j, and it can be shown that its eigenvalues are real; a positive definite matrix has positive eigenvalues.


where V (r) is any suitable function: this functional occurs frequently because it arises

when describing the planar motion of a particle acted upon by a force depending only

on the distance from a fixed point, for example in a simplified description of the motion

of the Earth round the Sun; in this case the independent variable, t, is the time. The

functional S[y] is special because its integrand depends only upon the combinations

y1′² + y2′² and y1² + y2², which suggests that changing to polar coordinates may lead to simplification. These are (r, θ) where y1 = r cos θ and y2 = r sin θ, so that y1² + y2² = r²

and, on using the chain rule,

dy1 dr dθ dy2 dr dθ

= cos θ − r sin θ and = sin θ + r cos θ.

dt dt dt dt dt dt

Squaring and adding these equations gives y1′² + y2′² = r′² + r²θ′². Hence the functional becomes

    S[r, θ] = ∫_a^b dt ( ½ r′² + ½ r²θ′² − V(r) ).    (6.36)

Exercise 6.15

(a) Show that the Euler-Lagrange equations for the functional 6.35 are

    d²y1/dt² + V′(r) y1/r = 0   and   d²y2/dt² + V′(r) y2/r = 0.    (6.37)

(b) Show that the Euler-Lagrange equations for the functional 6.36 can be written in the form

    d²r/dt² − L²/r³ + V′(r) = 0   and   dθ/dt = L/r²,    (6.38)

where L is a constant. Note that the equation for r does not depend upon θ and

that θ(t) is obtained from r(t) by a single integration. In older texts on dynamics,

see for instance Whittaker (1904), problems are said to be soluble by quadrature

if their solutions can be reduced to known functions or integrals of such functions.
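The constant L of equation 6.38 is the angular momentum, and its conservation follows directly from equations 6.37: along any solution, L = y1y2′ − y2y1′ (which equals r²θ′) has zero time derivative. A symbolic sketch (sympy assumed; Vp is a placeholder for the arbitrary derivative V′):

```python
import sympy as sp

t = sp.symbols('t')
y1, y2 = sp.Function('y1')(t), sp.Function('y2')(t)
Vp = sp.Function('Vp')                      # stands for V'(r), V arbitrary
r = sp.sqrt(y1**2 + y2**2)

L = y1*y2.diff(t) - y2*y1.diff(t)           # angular momentum, = r^2 theta'
eom = {y1.diff(t, 2): -Vp(r)*y1/r,          # equations 6.37
       y2.diff(t, 2): -Vp(r)*y2/r}
dL = L.diff(t).subs(eom)
assert sp.simplify(dL) == 0
```

This is a first example of the link between symmetry (rotational invariance of the integrand) and a first-integral, taken up in chapter 7.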

The general theory is not much more complicated. Suppose that y = (y1 , y2 , · · · , yn )

and z = (z1 , z2 , · · · , zn ) are two sets of dependent variables related by the equations

yk = ψk (z), k = 1, 2, · · · , n, (6.39)

where we assume, in order to slightly simplify the analysis, that each of the ψk is not

explicitly dependent upon the independent variable, x. The chain rule gives

    dyk/dx = Σ_{i=1}^n (∂ψk/∂zi) (dzi/dx),

showing that each of the yk′ depends linearly upon the zi′. These linear equations can be inverted to give zi′ in terms of yk′, k = 1, 2, …, n, if the n × n matrix with elements ∂ψk/∂zi is nonsingular, that is if the Jacobian determinant, equation 1.26 (page 30), is non-zero. This is also the condition for the transformation between y and z to be invertible.

Under this transformation the functional

    S[y] = ∫_a^b dx F(x, y, y′)   becomes   S[z] = ∫_a^b dx G(x, z, z′),    (6.40)


where

    G = F( x, ψ1(z), ψ2(z), …, ψn(z), Σ_{i=1}^n (∂ψ1/∂zi) zi′, …, Σ_{i=1}^n (∂ψn/∂zi) zi′ ),

that is, G(x, z, z′) is obtained from F(x, y, y′) simply by replacing y and y′. In practice, of course, the transformation 6.39 is chosen to ensure that G(x, z, z′) is simpler than F(x, y, y′).

Exercise 6.16

Show that under the transformation y1 = ρ cos φ, y2 = ρ sin φ, y3 = z, the functional

    S[y1, y2, y3] = ∫_a^b dt ( ½(y1′² + y2′² + y3′²) − V(ρ) ),   ρ = √(y1² + y2²),

becomes

    S[ρ, φ, z] = ∫_a^b dt ( ½(ρ′² + ρ²φ′² + z′²) − V(ρ) ).

Find the Euler-Lagrange equations and show that those for ρ and z are uncoupled.

6.4 The inverse problem

The ideas described in this chapter have shown that there are several advantages in

formulating a system of differential equations as a variational principle. This naturally

raises the question as to whether any given system of equations can be formulated in

the form of the Euler-Lagrange equations and hence possesses an associated variational

principle.

In this section it is shown that any second-order equation of the form

    d²y/dx² = f(x, y, y′),    (6.41)

where f (x, y, y 0 ) is a sufficiently well behaved function of the three variables can, in

principle, be expressed as a variational principle. When there are two or more dependent

variables there is no such general result, although there are special classes of equations

for which similar results hold: here, however, we do not discuss these more difficult

cases.

First, consider linear, second-order equations, the most general equation of this type

being,

    a2(x) d²y/dx² + a1(x) dy/dx + a0(x) y = b(x),    (6.42)

where ak(x), k = 0, 1 and 2, and b(x) depend only upon x and a2(x) ≠ 0 in the relevant

interval of x. This equation may be transformed to the canonical form

    d/dx ( p(x) dy/dx ) + q(x) y = p(x)b(x)/a2(x),    (6.43)

where

    p(x) = exp( ∫ dx a1(x)/a2(x) )   and   q(x) = (a0(x)/a2(x)) p(x).    (6.44)
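As a concrete instance of equation 6.44 (computed here with sympy, an assumption), take Bessel's equation 6.52 of exercise 6.26, with a2 = x², a1 = x and a0 = x² − n²; the formulas give p = x and q = x − n²/x, precisely the coefficients that appear in the functional of exercise 6.26(a):

```python
import sympy as sp

x, n = sp.symbols('x n', positive=True)
a2, a1, a0 = x**2, x, x**2 - n**2          # Bessel's equation 6.52

p = sp.exp(sp.integrate(a1/a2, x))         # equation 6.44
q = sp.simplify(a0*p/a2)

assert sp.simplify(p - x) == 0
assert sp.simplify(q - (x - n**2/x)) == 0
```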


This equation is the Euler-Lagrange equation for the functional

    S[y] = ∫ dx ( p (dy/dx)² − q y² + (2pb/a2) y ).    (6.45)

Exercise 6.17

(a) Show that equations 6.42 and 6.43 are equivalent if p(x) and q(x) are defined

as in equation 6.44.

(b) Show that the Euler-Lagrange equation associated with the functional 6.45 is

equation 6.43.

Now consider the more general equation 6.41. Suppose that the equivalent Euler-Lagrange equation exists and is

    d/dx(∂F/∂y′) − ∂F/∂y = 0,   that is   y″Fy′y′ + y′Fyy′ + Fxy′ − Fy = 0.    (6.46)

Equation 6.41 allows the second derivative, y″, to be expressed in terms of x, y and y′ to give

    f Fy′y′ + y′Fyy′ + Fxy′ − Fy = 0,    (6.47)

which is an equation relating the partial derivatives of F and may therefore be regarded as a second-order partial differential equation for F. As it stands this equation is of limited

practical value because it can rarely be solved directly. If, however, we define a new

function z = Fy′y′, we shall see that z satisfies a first-order equation for which the

solutions are known to exist. This is seen by differentiating equation 6.47 with respect

to y 0 ,

    f Fy′y′y′ + fy′ Fy′y′ + y′Fy′y′y + Fy′y′x = 0.

In terms of z this becomes the first-order equation,

    f ∂z/∂y′ + y′ ∂z/∂y + ∂z/∂x + fy′ z = 0.    (6.48)

It can be shown that solutions of this partial differential equation for z(x, y, y′) exist, see for example Courant and Hilbert² (1937b). It follows that the function F(x, y, y′) exists and that equation 6.41 can be written as an Euler-Lagrange equation and that there is an associated functional.

Finding F(x, y, y′) explicitly, however, is not usually easy or even possible because this involves first solving the partial differential equation 6.48 and then integrating this solution twice with respect to y′. At either stage it may prove impossible to express the result in a useful form. Some examples illustrate this procedure in simple cases.

Consider the differential equation

    y″ = f(x, y)

² R. Courant and D. Hilbert, Methods of Mathematical Physics, Volume 2.


where the right-hand side is independent of y′. Then equation 6.48 contains only derivatives of z and one solution is z = c, a constant. Now the equation Fy′y′ = z can be integrated directly to give

    F(x, y, y′) = ½ c y′² + y′A(x, y) + B(x, y),

where A and B are some functions of x and y, but not y′. The derivatives of F are

    Fy = y′Ay + By,   Fyy′ = Ay,   Fxy′ = Ax,

and substituting these into equation 6.47 gives

    c y″ = By − Ax.

Comparing this with the original equation gives c = 1, and By − Ax = f (x, y): two

obvious solutions are

    B(x, y) = ∫_{c1}^y dv f(x, v),  A = 0   and   A = −∫_{c2}^x du f(u, y),  B = 0,

where c1 and c2 are constants, so that the integrands of the required functional are

    F1 = ½ y′² + ∫_{c1}^y dv f(x, v)   or   F2 = ½ y′² − y′ ∫_{c2}^x du f(u, y).    (6.49)

It may seem strange that this procedure yields two seemingly quite different expressions for F(x, y, y′). But, recall that different functionals will give the same Euler-Lagrange equation if the integrands differ by a function that is the derivative with respect to x of a function g(x, y), see exercises 4.27 (page 140) and 6.22 (page 191). Thus, we expect that there is a function g(x, y) such that F1 − F2 = dg/dx.

In the next exercise it is shown that the Euler-Lagrange equations associated with

F1 and F2 are identical and an explicit expression is found for g(x, y).
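For a specific choice, say f(x, y) = xy (an invented Airy-like example, so the equation is y″ = xy), the equality of the two Euler-Lagrange equations can be checked symbolically (sympy assumed):

```python
import sympy as sp

x, c1, c2, v, u = sp.symbols('x c1 c2 v u')
y = sp.Function('y')(x)
yp = y.diff(x)

def euler_lagrange(F):
    # d/dx(dF/dy') - dF/dy, with d/dx the total derivative
    return (F.diff(yp).diff(x) - F.diff(y)).doit()

# equation 6.49 with f(x, y) = x*y
F1 = yp**2/2 + sp.integrate(x*v, (v, c1, y))
F2 = yp**2/2 - yp*sp.integrate(u*y, (u, c2, x))

EL1, EL2 = euler_lagrange(F1), euler_lagrange(F2)
assert sp.simplify(EL1 - EL2) == 0
assert sp.simplify(EL1 - (y.diff(x, 2) - x*y)) == 0
```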

Exercise 6.18

(a) Show that F1 and F2 , defined in equation 6.49 give the same Euler-Lagrange

equations.

(b) Show that F1 − F2 = dg/dx for some g(x, y), and find g(x, y).

Exercise 6.19

Find a functional for the equation d²y/dx² + α dy/dx + y = 0, where α is a constant.

6.5 Miscellaneous exercises

Exercise 6.20

Using the functional S[y] = ∫_a^b dx ( y′² − ω²y² ) and the change of variable z = x^{1/c}, show that the differential equation y″ + ω²y = 0 is transformed into

    z d²y/dz² + (1 − c) dy/dz + c²ω²z^{2c−1} y = 0.

Exercise 6.21

Show that the Euler-Lagrange equations for the functional S[y1, y2] = ∫_a^b dx F(y1′, y2′), which depends only upon the first derivatives of y1 and y2, are

    (∂²F/∂y1′²) y1″ + (∂²F/∂y1′∂y2′) y2″ = 0   and   (∂²F/∂y1′∂y2′) y1″ + (∂²F/∂y2′²) y2″ = 0.

Deduce that if the determinant

    d = | ∂²F/∂y1′²       ∂²F/∂y1′∂y2′ |
        | ∂²F/∂y1′∂y2′    ∂²F/∂y2′²    |

is not zero, the stationary paths are the straight lines

    y1(x) = Ax + B,   y2(x) = Cx + D,

for constants A, B, C and D. What is the equivalent condition if there is only one dependent variable?

Exercise 6.22

If Φ(x, y1, y2) is any twice differentiable function show that the functionals

    S1[y1, y2] = ∫_a^b dx F(x, y1, y2, y1′, y2′)   and

    S2[y1, y2] = ∫_a^b dx [ F(x, y1, y2, y1′, y2′) + Ψ(x, y1, y2, y1′, y2′) ],

where

    Ψ = dΦ/dx = ∂Φ/∂x + (∂Φ/∂y1) y1′ + (∂Φ/∂y2) y2′   for some Φ(x, y1, y2),

lead to the same Euler-Lagrange equation.

Note that this is the direct generalisation of the result derived in exercise 4.27

(page 140).


Exercise 6.23

Consider the two functionals

    S1[y1, y2] = ∫_a^b dx [ ½(y1′² + y2′²) + g1(x)y1′ + g2(x)y2′ − V(x, y1, y2) ]

and

    S2[y1, y2] = ∫_a^b dx [ ½(y1′² + y2′²) − V̄(x, y1, y2) ],

where V̄ = V + g1′(x)y1 + g2′(x)y2. Use the result proved in the previous exercise

to show that S1 and S2 give rise to identical Euler-Lagrange equations.

Exercise 6.24

(a) Show that the Euler-Lagrange equation of the functional

    S[y] = ∫_0^∞ dx e^{−x} √( y − e^x y′ )   is   d²y/dx² − (3e^{−x} − 1) dy/dx + 2e^{−2x} y = 0.    (6.50)

(b) Show that the change of variables u = e^{−x}, with inverse x = ln(1/u), transforms this functional to

    S[Y] = ∫_0^1 du √( Y(u) + Y′(u) ),   Y(u) = y(−ln u),

with Euler-Lagrange equation

    d²Y/du² + 3 dY/du + 2Y = 0.    (6.51)

(c) By making the substitution x = −ln(u), show that equation 6.50 transforms into equation 6.51.

Exercise 6.25

Show that the stationary paths of the functional S[y, z] = ∫_0^{π/2} dx ( y′² + z′² + 2yz ), with the boundary conditions y(0) = 0, z(0) = 0, y(π/2) = 3/2 and z(π/2) = 1/2, satisfy the equations

    d²y/dx² − z = 0,   d²z/dx² − y = 0.

Show that the solution of these equations is

    y(x) = sinh x/sinh(π/2) + ½ sin x,   z(x) = sinh x/sinh(π/2) − ½ sin x.

Exercise 6.26

The ordinary Bessel function, denoted by Jn (x), is defined to be proportional to

the solution of the second-order differential equation

    x² d²y/dx² + x dy/dx + (x² − n²) y = 0,   n = 0, 1, 2, …,    (6.52)

that behaves as (x/2)ⁿ near the origin.


(a) Show that equation 6.52 is the Euler-Lagrange equation of the functional

    F[y] = ∫_0^X dx ( x y′(x)² − (x − n²/x) y(x)² ),   y(X) = Y ≠ 0,

(b) Define a new independent variable u by the equation x = f (u), where f (u) is

monotonic and smooth, and set w(u) = y(f (u)) to cast this functional into the

form

    F[w] = ∫_{u0}^{u1} du ( (f(u)/f′(u)) w′(u)² − ( f(u)f′(u) − n² f′(u)/f(u) ) w(u)² ),

(c) Hence show that if f (u) = eu , w(u) satisfies the equation

    d²w/du² + (e^{2u} − n²) w = 0    (6.53)

and deduce that a solution of equation 6.53 is w(u) = Jn (eu ).
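The conclusion of part (c) is easy to spot-check numerically (sympy's besselj assumed): for, say, n = 2, the residual of equation 6.53 with w(u) = J2(e^u) should vanish to machine precision.

```python
import sympy as sp

u = sp.symbols('u')
n = 2
w = sp.besselj(n, sp.exp(u))
residual = w.diff(u, 2) + (sp.exp(2*u) - n**2)*w

# sample the residual at a few invented points
for u0 in (0.3, 0.7, 1.1):
    assert abs(residual.subs(u, u0).evalf()) < 1e-10
```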


Chapter 7

Symmetries and Noether's theorem

7.1 Introduction

In this chapter we show how symmetries can help solve the Euler-Lagrange equations.

The simplest example of the theory presented here was introduced in section 4.4.1 where

it was shown that a first-integral existed if the integrand did not depend explicitly upon

the independent variable. This simplification was used to help solve the brachistochrone

and the minimum surface area problems, sections 5.2 and 5.3. Here we show how the

first-integral may be derived using a more general principle which can be used to derive

other first-integrals.

Students knowing some dynamics will be aware of how important the conservation of

energy, linear and angular momentum can be: the theory described in section 7.3 unifies

all these conservation laws. In addition these ideas may be extended to deal with those

partial differential equations that can be derived from a variational principle, although

this theory is not included in the present course.

7.2 Symmetries

The Euler-Lagrange equations for the brachistochrone, section 5.2, and the minimal surface area, section 5.3, were solved using the fact that the integrand, F(y, y′), does

not depend explicitly upon x, that is, ∂F/∂x = 0. In this situation it was shown in

exercise 4.7 (page 131) that

    y′(x) ( d/dx(∂F/∂y′) − ∂F/∂y ) = d/dx ( y′ ∂F/∂y′ − F ).

This result is important because it shows that if y(x) satisfies the second-order Euler

equation it also satisfies the first-order equation

    y′ ∂F/∂y′ − F = constant,


which is often simpler: we ignore the possibility that y′(x) = 0 for the reasons discussed

after exercise 7.8. This result is proved algebraically in exercise 4.7, (page 131), and

there we relied only on the fact that ∂F/∂x = 0. In the following section we re-derive

this result using the equivalent but more fundamental notion that the integrand F(y, y′)

is invariant under translations in x: this is a fruitful method because it is more readily

generalised to other types of transformations; for instance in three-dimensional problems

the integrand of the functional may be invariant under all rotations, or just rotations

about a given axis. The general theory is described in section 7.3, but first we introduce

the method by applying it to functionals that are invariant under translations.
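Before working through the translation argument it is worth confirming the identity quoted above on a concrete integrand; the sketch below (sympy assumed) uses the brachistochrone-like choice F = √(1 + y′²)/√y, which has no explicit x-dependence:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)
yp = y.diff(x)

F = sp.sqrt(1 + yp**2)/sp.sqrt(y)            # no explicit x-dependence

EL = (F.diff(yp).diff(x) - F.diff(y)).doit()  # Euler-Lagrange expression
first_integral = yp*F.diff(yp) - F

# the identity: y' * (EL expression) = d/dx (y' F_{y'} - F)
assert sp.simplify(yp*EL - first_integral.diff(x)) == 0
```

So on any solution of the Euler-Lagrange equation the quantity y′F_{y′} − F is constant.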

The algebra of the following analysis is fairly complicated and requires careful thought

at each stage, so you may need to read it carefully several times and complete the

intermediate steps.

Consider the functional

    S[y] = ∫_a^b dx F(y, y′),   y(a) = A,   y(b) = B,    (7.1)

where the integrand does not depend explicitly upon x, that is ∂F/∂x = 0. The

stationary function, y(x), describes a curve C in the two-dimensional space with axes

Oxy, so that a point, P , on the curve has coordinates (x, y(x)) as shown in figure 7.1.

[Figure 7.1: Diagram showing the two coordinate systems Oxy and Ōx̄ȳ, connected by a translation along the x-axis by a distance δ.]

Consider now the coordinate system Ōx̄ȳ, where x = x̄ + δ and y = ȳ, with the origin, Ō, of this system at x = δ, y = 0 in the original coordinate system; that is, Ōx̄ȳ is translated from Oxy a distance δ along the x-axis. In this coordinate system the curve C is described by ȳ(x̄), so the coordinates of a point P are (x̄, ȳ(x̄)) and these are related to coordinates in Oxy, (x, y(x)), by

    x = x̄ + δ   and   ȳ(x̄) = y(x),

the latter equation defining the function ȳ; differentiation, using the chain rule, gives

    dȳ/dx̄ = (dy/dx)(dx/dx̄) = dy/dx.

7.2. SYMMETRIES 197

The functional 7.1 can be computed in either coordinate system and, for reasons that

will soon become apparent, we consider the integral in the Ōx̄ȳ representation over a

limited, but arbitrary, range

\[
\bar G = \int_{\bar c}^{\bar d} du\, F\bigl(\bar y(u), \bar y'(u)\bigr)
\quad\text{where}\quad \bar y' = \frac{d\bar y}{du},
\]

c̄ = c − δ, d̄ = d − δ and where a < c < d < b. The integrand of Ḡ depends on u only
through the function ȳ(u): this means that at each value of u the integrand has the
same value as the integrand of S[y] at the equivalent point, x = u + δ. Hence

\[
\int_{\bar c}^{\bar d} du\, F\bigl(\bar y(u), \bar y'(u)\bigr)
= \int_{c}^{d} dx\, F\bigl(y(x), y'(x)\bigr)
\quad\text{where } x = u + \delta. \tag{7.2}
\]

Now consider small values of δ and expand to O(δ), first writing the integral in the

form

\[
\begin{aligned}
\bar G &= \int_{c-\delta}^{d-\delta} du\, F\bigl(\bar y(u), \bar y'(u)\bigr)\\
&= \int_{c}^{d} du\, F(\bar y, \bar y') + \int_{c-\delta}^{c} du\, F(\bar y, \bar y')
- \int_{d-\delta}^{d} du\, F(\bar y, \bar y'). \tag{7.3}
\end{aligned}
\]

The widths of the last two integration ranges are δ, so for small δ and any continuous function g,
\[
\int_{z-\delta}^{z} du\, g(u) = g(z)\,\delta + O(\delta^2),
\]

and to this order

\[
\bar y(u) = y(u+\delta) = y(u) + y'(u)\,\delta + O(\delta^2), \quad\text{and}\quad
\bar y'(u) = y'(u) + y''(u)\,\delta + O(\delta^2).
\]

Thus the expression 7.3 for Ḡ becomes, to first order in δ,

\[
\begin{aligned}
\bar G &= \int_c^d du\, F(y + y'\delta,\, y' + y''\delta) - \delta\Bigl[F(y, y')\Bigr]_c^d + O(\delta^2)\\
&= \int_c^d du\left(F(y, y') + \delta\frac{\partial F}{\partial y}y' + \delta\frac{\partial F}{\partial y'}y''\right)
- \delta\Bigl[F(y, y')\Bigr]_c^d + O(\delta^2).
\end{aligned}
\]

But Ḡ = G, by equation 7.2, so the terms of first order in δ must vanish:
\[
0 = \delta\int_c^d du\left(y'\frac{\partial F}{\partial y} + y''\frac{\partial F}{\partial y'}\right)
- \delta\Bigl[F(y, y')\Bigr]_c^d + O(\delta^2). \tag{7.4}
\]

The second term of the integrand is integrated by parts,
\[
\int_c^d du\, y''\frac{\partial F}{\partial y'}
= \left[y'\frac{\partial F}{\partial y'}\right]_c^d - \int_c^d du\, y'\frac{d}{du}\frac{\partial F}{\partial y'}.
\]

Substituting this into 7.4 and dividing by δ gives

\[
0 = \left[y'\frac{\partial F}{\partial y'} - F\right]_c^d
- \int_c^d du\, y'\left(\frac{d}{du}\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y}\right) + O(\delta). \tag{7.5}
\]


On a stationary path y(x) satisfies the Euler-Lagrange equation, so the integral in this expression is
zero, and hence on letting δ → 0,
\[
\left(F - y'\frac{\partial F}{\partial y'}\right)_{x=c} = \left(F - y'\frac{\partial F}{\partial y'}\right)_{x=d}.
\]

Finally, recall that c and d are arbitrary and hence, for any x in the interval a < x < b,

the function

\[
y'\frac{\partial}{\partial y'}F(y, y') - F(y, y') = \text{constant}. \tag{7.6}
\]

Because the function on the left-hand side is continuous the equality is true in the

interval a ≤ x ≤ b. This relation is always true if the integrand of the functional does

not depend explicitly upon x, that is ∂F/∂x = 0.

Equation 7.6 relates y'(x) and y(x) and by rearranging it we obtain one, or more,

first-order equations for the unknown function y(x). Noether’s theorem, stated below,

shows that solutions of the Euler-Lagrange equation also satisfy equation 7.6. In prac-

tice, because this equation is usually easier to solve than the original Euler-Lagrange

equation, it is implicitly assumed that its solutions are also solutions of the Euler-

Lagrange equation. In general this is true, but the examples treated in exercises 4.8

(page 132) and 7.8 show that care is sometimes needed. By differentiating equation 7.6

with respect to x the Euler-Lagrange equation is regained, as shown in exercise 4.7

(page 131).
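As a quick check, not part of the text, sympy can confirm the identity of exercise 4.7 for a sample integrand; the choice F(y, y') = y'² + y² is an arbitrary illustration, not an example from the course.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
yp = sp.Derivative(y(x), x)

# A sample integrand with no explicit x-dependence: F(y, y') = y'^2 + y^2.
F = yp**2 + y(x)**2

# d/dx of the candidate first-integral y'*F_{y'} - F ...
lhs = sp.diff(yp*sp.diff(F, yp) - F, x)
# ... equals y' times the Euler-Lagrange expression d/dx(F_{y'}) - F_y.
rhs = yp*(sp.diff(sp.diff(F, yp), x) - sp.diff(F, y(x)))

assert sp.simplify(lhs - rhs) == 0
```

Any other x-free integrand can be substituted for F to repeat the check.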

The function y'F_{y'} − F is named the first-integral of the Euler-Lagrange equation,

this name being suggestive of it being derived by integrating the original second-order

equation once to give a first-order equation. For the same reason in dynamics, quantities

that are conserved, for instance energy, linear and angular momentum, are also named

first-integrals, integrals of the motion or constants of the motion, and these dynamical

quantities have exactly the same mathematical origin as the first-integral defined in

equation 7.6.

This proof of equation 7.6 may seem a lot more elaborate than that given in ex-

ercise 4.7 (page 131). However, there are circumstances when the algebra required to

use the former method is too unwieldy to be useful and then the present method is

superior. An example of such a problem is given in exercise 7.12.

In the context of Newtonian dynamics the equivalent of equation 7.6 is the conserva-

tion of energy in those circumstances when the forces are conservative and are indepen-

dent of the time; that is in Newtonian mechanics energy conservation is a consequence

of the invariance of the equations of motion under translations in time. Similarly, in-

variance under translations in space gives rise to conservation of linear momentum and

invariance under rotations in space gives rise to conservation of angular momentum.

As an example of a functional that is not invariant under translations of the inde-

pendent variable, consider

\[
J[y] = \int_a^b dx\, x\, y'(x)^2, \qquad y(a) = A, \quad y(b) = B.
\]

It is instructive to go through the above proof to see where and how it breaks down.

7.3. NOETHER’S THEOREM 199

Following the same steps as before gives
\[
\bar J[\bar y] = \int_{\bar c}^{\bar d} du\, u\,\bar y'(u)^2 = \int_c^d dv\,(v - \delta)\, y'(v)^2
= J[y] - \delta\int_c^d dv\, y'(v)^2 \neq J[y],
\]

and we see how the explicit dependence on x destroys the invariance needed for the

existence of the first-integral.

In this section we treat functionals having several dependent variables. The analysis

is a straightforward generalisation of that presented above but takes time to absorb.

For a first reading, ensure that you understand the fundamental ideas and try to avoid

getting lost in algebraic details. That is, you should try to understand the definition of

an invariant functional, the meaning of Noether’s theorem, rather than the proof, and

should be able to do exercises 7.1 – 7.3.

There are two ingredients to Noether’s theorem:

(i) functionals that are invariant under transformations of either or both dependent
and independent variables;

(ii) families of transformations that depend upon one or more real parameters, though

here we deal with situations where there is just one parameter.

We consider each of these in turn in relation to the functional

\[
S[\mathbf{y}] = \int_a^b dx\, F(x, \mathbf{y}, \mathbf{y}'), \qquad
\mathbf{y} = (y_1, y_2, \cdots, y_n), \tag{7.7}
\]

which has stationary paths defined by the solutions of the Euler-Lagrange equations.

We do not include boundary conditions because they play no role in this theory.

The value of the functional depends upon the path taken which, in this section, is

not always restricted to stationary paths. We shall consider the change in the value

of the functional when the path is changed according to a given transformation: in

particular we are interested in those transformations which change the path but not

the value of the functional.

Consider, for instance, the two functionals

\[
S_1[\mathbf{y}] = \int_0^1 dx\,\bigl(y_1'^2 + y_2'^2\bigr)
\quad\text{and}\quad
S_2[\mathbf{y}] = \int_0^1 dx\,\bigl(y_1'^2 + y_2'^2\bigr)y_1. \tag{7.8}
\]

A path γ can be defined by the pair of functions (f (x), g(x)), 0 ≤ x ≤ 1, and on each γ

the functionals have a value.

Consider the transformation

y1 = y1 cos α − y2 sin α y1 = y1 cos α + y 2 sin α

with inverse (7.9)

y2 = y1 sin α + y2 cos α y2 = −y1 sin α + y 2 cos α


This rotates points anticlockwise about the origin through the angle α. Hence under this transformation the curve γ is rotated bodily to the curve γ̄,

as shown in figure 7.2.

[Figure 7.2 (diagram omitted): Diagram showing the rotation of the curve γ anticlockwise through the angle α to the curve γ̄.]

The values of S₁ on the two curves are
\[
S_1(\gamma) = \int_0^1 dx\,\bigl[f'(x)^2 + g'(x)^2\bigr]
\quad\text{and}\quad
S_1(\bar\gamma) = \int_0^1 dx\,\bigl[\bar f'(x)^2 + \bar g'(x)^2\bigr],
\]
where (f̄(x), ḡ(x)) is the pair of functions defining γ̄.
But on using equation 7.10 we obtain f̄'(x)² + ḡ'(x)² = f'(x)² + g'(x)², which gives
\[
S_1(\bar\gamma) = \int_0^1 dx\,\bigl[f'(x)^2 + g'(x)^2\bigr] = S_1(\gamma).
\]

That is, the functional S₁ has the same value on γ and γ̄ for all α and is therefore

invariant with respect to the rotation 7.9.

On the other hand the values of S2 are

\[
S_2(\gamma) = \int_0^1 dx\,\bigl[f'(x)^2 + g'(x)^2\bigr] f(x)
\]
and
\[
\begin{aligned}
S_2(\bar\gamma) &= \int_0^1 dx\,\bigl[\bar f'(x)^2 + \bar g'(x)^2\bigr]\bar f(x)\\
&= \int_0^1 dx\,\bigl[f'(x)^2 + g'(x)^2\bigr]\bigl[f(x)\cos\alpha - g(x)\sin\alpha\bigr]\\
&= S_2(\gamma)\cos\alpha - \sin\alpha\int_0^1 dx\,\bigl[f'(x)^2 + g'(x)^2\bigr] g(x).
\end{aligned}
\]


In this case the functional has different values on γ and γ̄, unless α is an integer multiple

of 2π. That is, S2 [y] is not invariant with respect to the rotation 7.9.
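A numerical sketch, not from the text, illustrates this pair of results; the sample curve (f, g) = (x, x²) and the angle α = 0.7 are arbitrary choices made only for illustration.

```python
import numpy as np

def integrate(vals, x):
    """Simple trapezoidal rule on a sampled grid."""
    return float(np.sum(0.5*(vals[1:] + vals[:-1])*np.diff(x)))

# Sample curve gamma = (f, g) = (x, x^2) on [0, 1], rotated by alpha = 0.7.
x = np.linspace(0.0, 1.0, 20001)
f, g = x, x**2
fp, gp = np.ones_like(x), 2*x

alpha = 0.7
fb = f*np.cos(alpha) - g*np.sin(alpha)       # fbar, from equation 7.9
fbp = fp*np.cos(alpha) - gp*np.sin(alpha)    # derivative of fbar
gbp = fp*np.sin(alpha) + gp*np.cos(alpha)    # derivative of gbar

S1 = integrate(fp**2 + gp**2, x)
S1bar = integrate(fbp**2 + gbp**2, x)
S2 = integrate((fp**2 + gp**2)*f, x)
S2bar = integrate((fbp**2 + gbp**2)*fb, x)

assert abs(S1 - S1bar) < 1e-9    # S1 is invariant under the rotation
assert abs(S2 - S2bar) > 0.5     # S2 is not
```

Any other smooth curve and angle give the same qualitative outcome, since the first invariance is an algebraic identity.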

The transformation 7.9 does not involve changes to the independent variable x,

whereas the transformation considered in section 7.2.1 involved only a change in the

independent variable, via a translation along the x-axis, see figure 7.1. In general it is

necessary to deal with a transformation in both dependent and independent variables,

which can be written as
\[
\bar x = \Phi(x, \mathbf{y}, \mathbf{y}'), \qquad
\bar y_k = \Psi_k(x, \mathbf{y}, \mathbf{y}'), \quad k = 1, 2, \cdots, n. \tag{7.11}
\]

We assume that these relations can be inverted to give x and y in terms of x̄ and ȳ.
For a curve γ, defined by the equation y = f(x), a ≤ x ≤ b, this transformation moves
γ to another curve γ̄ defined by the transformed equation ȳ = f̄(x̄).

Definition 7.1

The functional 7.7 is said to be invariant under the transformation 7.11 if

\[
\bar G = G \quad\text{where}\quad
G = \int_c^d dx\, F\left(x, \mathbf{y}, \frac{d\mathbf{y}}{dx}\right), \qquad
\bar G = \int_{\bar c}^{\bar d} d\bar x\, F\left(\bar x, \bar{\mathbf{y}}, \frac{d\bar{\mathbf{y}}}{d\bar x}\right),
\]
and where c̄ = Φ(c, y(c), y'(c)) and d̄ = Φ(d, y(d), y'(d)), for all c and d satisfying
a ≤ c < d ≤ b.

y(x) and ȳ(x̄) define two curves, γ and γ̄, in an n-dimensional space, each parametrised
by the independent variable. The functional G is the integral of F(x, y, y') along γ
and Ḡ is the integral of the same function along γ̄.

In the case x̄ ≠ x the parametrisation along γ and γ̄ is changed. An important
example of a change to the independent variable, x, is the uniform shift x̄ = x + δ,
where δ is independent of x, y and y', which is the example dealt with in the previous
section. The scale transformation, whereby x̄ = (1 + δ)x, is also useful, see exercise 7.8.

A one-parameter family of transformations is the set of transformations
\[
\bar x = \Phi(x, \mathbf{y}, \mathbf{y}'; \delta), \qquad
\bar y_k = \Psi_k(x, \mathbf{y}, \mathbf{y}'; \delta), \quad k = 1, 2, \cdots, n, \tag{7.12}
\]
depending upon the single parameter δ, which reduces to the identity when δ = 0, that
is

x = Φ(x, y, y'; 0) and y_k = Ψ_k(x, y, y'; 0), k = 1, 2, · · · , n,

and where Φ and all the Ψk have continuous first derivatives in all variables, including δ.

This last condition ensures that the transformation is invertible in the neighbourhood of

δ = 0, provided the Jacobian determinant is not zero. An example of a one-parameter

family of transformations is defined by equation 7.9, which becomes the identity when

α = 0.

Exercise 7.1

Which of the following is a one-parameter family of transformations?
(a) x̄ = x − yδ, ȳ = y + xδ,
(b) x̄ = x cosh δ − y sinh δ, ȳ = x sinh δ − y cosh δ,
(c) ȳ = y exp(Aδ) where A is a square, non-singular, n × n matrix.
Note that if B is a square matrix the matrix e^B is defined by the sum
\[
e^B = \sum_{k=0}^{\infty} B^k/k!.
\]


Exercise 7.2

Families of transformations are very common and are often generated by solutions

of differential equations, as illustrated by the following example.

(a) Show that the solution of the equation

\[
\frac{dy}{dt} = y(1 - y), \quad 0 \le y(0) \le 1, \quad\text{is}\quad
y = \psi(z, t) = \frac{z e^t}{1 + (e^t - 1)z}, \quad\text{where } z = y(0).
\]

(b) Show that this defines a one-parameter family of transformations, y = ψ(z, t),

with parameter t, so that for each t, ψ(z, t) transforms the initial point z to the

value y(t).
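The group property behind this family can be checked symbolically; the following sketch, not part of the text, verifies the logistic equation, the initial condition, and the composition rule ψ(ψ(z, s), t) = ψ(z, s + t), which is what makes the set of maps a one-parameter family.

```python
import sympy as sp

z, t, s = sp.symbols('z t s')

# The claimed solution of dy/dt = y(1 - y), y(0) = z.
psi = z*sp.exp(t)/(1 + (sp.exp(t) - 1)*z)

# It satisfies the logistic equation and the initial condition ...
assert sp.simplify(sp.diff(psi, t) - psi*(1 - psi)) == 0
assert psi.subs(t, 0) == z

# ... and composing the s-map with the t-map gives the (s+t)-map.
composed = psi.subs(z, psi.subs(t, s))
assert sp.simplify(composed - psi.subs(t, s + t)) == 0
```

The last assertion is exactly the statement that each ψ(·, t) transforms the initial point z forward by time t.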

Exercise 7.3
Show that the functional
\[
S[\mathbf{y}] = \int_a^b dx\,\bigl(y_1'^2 - y_2'^2\bigr)
\]
is invariant under the transformation

We have finally arrived at the main part of this section, the statement and proof of

Noether’s theorem. The theorem was published in 1918 by Emmy Noether (1882-

1935), a German mathematician, considered to be one of the most creative abstract

algebraists of modern times. The theorem was derived for certain Variational Principles,

and has important applications to physics especially relativity and quantum mechanics,

besides systematising many of the known results of classical dynamics; in particular it

provides a uniform description of the laws of conservation of energy, linear and angular

momentum, which are, respectively, due to invariance of the equations of motion under

translations in time, space and rotations in space. The theorem can also be applied to

partial differential equations that can be derived from a variational principle.

The theorem deals with arbitrarily small changes in the coordinates, so in equa-
tion 7.12 we assume |δ| ≪ 1 and write the transformation in the form

\[
\begin{aligned}
\bar x &= x + \delta\,\phi(x, \mathbf{y}, \mathbf{y}') + O(\delta^2),
& \phi &= \left.\frac{\partial\Phi}{\partial\delta}\right|_{\delta=0},\\
\bar y_k &= y_k + \delta\,\psi_k(x, \mathbf{y}, \mathbf{y}') + O(\delta^2),
& \psi_k &= \left.\frac{\partial\Psi_k}{\partial\delta}\right|_{\delta=0},
\end{aligned}
\tag{7.13}
\]

where we have used the fact that when δ = 0 the transformation becomes the identity.

In all subsequent analysis second order terms in the parameter, here δ, are ignored.

Exercise 7.4

Show that to first order in α the rotation defined by equation 7.9 becomes
\[
\bar y_1 = y_1 - \alpha y_2 + O(\alpha^2), \qquad \bar y_2 = y_2 + \alpha y_1 + O(\alpha^2),
\]
so that ψ₁ = −y₂ and ψ₂ = y₁.


Theorem 7.1

Noether’s theorem: If the functional

\[
S[\mathbf{y}] = \int_c^d dx\, F(x, \mathbf{y}, \mathbf{y}') \tag{7.14}
\]

is invariant under the family of transformations 7.13, for arbitrary c and d, then

\[
\sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}\,\psi_k
+ \left(F - \sum_{k=1}^{n} y_k'\frac{\partial F}{\partial y_k'}\right)\phi = \text{constant}. \tag{7.15}
\]

The function defined on the left-hand side of this equation is often named a first-

integral of the Euler-Lagrange equations.1

If φ = 1 and ψ_k = 0, corresponding to the translation x̄ = x + δ, equation 7.15
reduces to the result derived in the previous section, equation 7.6 (page 198). In general,

if n = 1, equation 7.15 furnishes a first-order differential equation which is usually easier

to solve than the corresponding second-order Euler-Lagrange equation, as was seen in

sections 5.2 and 5.3. Normally, solutions of this first-order equation are also solutions of

the Euler-Lagrange equations: however, this is not always true, so some care is sometimes

needed, see for instance exercise 7.8 and the following discussion. A proof of Noether’s

theorem is given after the following exercises.

Exercise 7.5
Use the fact that the functional
\[
S[\mathbf{y}] = \int_a^b dx\,\bigl(y_1'^2 + y_2'^2\bigr)
\]
is invariant under the rotation defined by equation 7.9 and the result derived in exercise 7.4 to show that
a first-integral, equation 7.15, is y₁y₂' − y₂y₁' = constant.

In the context of dynamics this first-integral is the angular momentum.

Exercise 7.6
Show that the functional
\[
S[\mathbf{y}] = \int_a^b dx\,\bigl(y_1'^2 + y_2'^2\bigr)
\]
is invariant under the following three transformations
(i) ȳ₁ = y₁ + δg(x), ȳ₂ = y₂, x̄ = x,
(ii) ȳ₁ = y₁, ȳ₂ = y₂ + δg(x), x̄ = x,
(iii) ȳ₁ = y₁, ȳ₂ = y₂, x̄ = x + δg(x),
only if g(x) is a constant.

In the case g = 1 show that these three invariant transformations lead to the

first-integrals,

(i) y₁' = constant, (ii) y₂' = constant, (iii) y₁'² + y₂'² = constant.

1 The name first-integral comes from the time when differential equations were solved by successive

integration, with n integrations being necessary to find the general solution of an nth order equation.

The term solution dates back to Lagrange, but it was Poincaré who established its use; what is now

named a solution used to be called an integral or a particular integral.
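A small symbolic check, not in the text: the stationary paths of S = ∫dx (y₁'² + y₂'²) are straight lines, and the angular-momentum first-integral of exercise 7.5 is indeed constant on every such path.

```python
import sympy as sp

x, a1, b1, a2, b2 = sp.symbols('x a1 b1 a2 b2')

# A general stationary path of the functional: a pair of straight lines.
y1 = a1 + b1*x
y2 = a2 + b2*x

# The first-integral of exercise 7.5 (the angular momentum) ...
J = y1*sp.diff(y2, x) - y2*sp.diff(y1, x)

# ... is constant along every such path (here J = a1*b2 - a2*b1).
assert sp.simplify(sp.diff(J, x)) == 0
```

The constant value a₁b₂ − a₂b₁ depends on the path, as a first-integral should.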


Exercise 7.7

Show that the functional
\[
S[\mathbf{y}] = \int_a^b dx\left[\frac{1}{2}\bigl(y_1'^2 + y_2'^2\bigr) + V(y_1 - y_2)\right]
\]
is invariant under the transformation
ȳ₁ = y₁ + δg(x), ȳ₂ = y₂ + δg(x), x̄ = x,

Exercise 7.8

A version of the Emden-Fowler equation can be written in the form

\[
\frac{d^2y}{dx^2} + \frac{2}{x}\frac{dy}{dx} + y^5 = 0.
\]

(a) Show that this is the Euler-Lagrange equation associated with the functional

\[
S[y] = \int_a^b dx\, x^2\left(y'^2 - \frac{1}{3}y^6\right).
\]

(b) Show that this functional is invariant under the scaling transformation x̄ = αx,
ȳ = βy, where α and β are constants satisfying αβ² = 1. Use Noether's theorem
to deduce that a first-integral is
\[
x^2 y y' + x^3\left(y'^2 + \frac{1}{3}y^6\right) = c,
\]

where c is a constant.

(c) By substituting the trial function y = Ax^γ into the first-integral, find a value

of γ that yields a solution of the first-integral, for any A. Show also that this is

a solution of the original Euler-Lagrange equation, but only for particular values

of A.

(d) By differentiating the first-integral, obtain the following equation for y'',
\[
\bigl(xy'' + 2y' + xy^5\bigr)\bigl(2x^2y' + xy\bigr) = 0.
\]
Show that the solutions of this equation are y = Ax^{-1/2}, for any constant A,

together with the solutions of the Euler-Lagrange equation.
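The claims of parts (c) and (d) can be checked with sympy; this sketch is illustrative and not part of the text.

```python
import sympy as sp

x, A = sp.symbols('x A', positive=True)
y = sp.Function('y')
yp = sp.Derivative(y(x), x)

# First-integral of exercise 7.8 and the Euler-Lagrange expression.
Phi = x**2*y(x)*yp + x**3*(yp**2 + y(x)**6/3)
EL = x*sp.Derivative(y(x), x, 2) + 2*yp + x*y(x)**5

# Part (d): d(Phi)/dx factorises as EL * (2x^2 y' + x y).
assert sp.simplify(sp.diff(Phi, x) - EL*(2*x**2*yp + x*y(x))) == 0

# Part (c): y = A x^(-1/2) makes Phi constant for every A ...
trial = A/sp.sqrt(x)
Phi_t = sp.simplify(Phi.subs(y(x), trial).doit())
assert sp.diff(Phi_t, x) == 0

# ... but satisfies the Euler-Lagrange equation only for special A (A^4 = 1/4).
EL_t = sp.simplify(EL.subs(y(x), trial).doit())
assert sp.simplify(EL_t.subs(A, 1/sp.sqrt(2))) == 0
```

Here Phi_t evaluates to A⁶/3 − A²/4, a constant whose value depends on A, which is why the trial function satisfies the first-integral for every A.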

In the previous exercise we saw that the first-order differential equation defined by
the first-integral had a solution that was not a solution of the original Euler-Lagrange
equation. This feature, surprising at first, is typical and a consequence of the original
equation being nonlinear in x and y. In general the Euler-Lagrange equation
can be written in the form y'' = f(x, y, y'); suppose this possesses the first-integral
φ(x, y, y') = c, then differentiation of this eliminates the constant c to give the
second-order equation y''φ_{y'} + y'φ_y + φ_x = 0, which is linear in y''. By definition solutions of
the Euler-Lagrange equation satisfy the first-integral, so this equation factorises in the
form
\[
y''\phi_{y'} + y'\phi_y + \phi_x = \bigl(y'' - f(x, y, y')\bigr)\,g(x, y, y') = 0,
\]


for some function g(x, y, y'), which may be a constant. This latter equation may also
have another solution given by g(x, y, y') = 0, which may be integrated to furnish a

relation between x and y involving the single constant c. Usually this function is not a

solution of the original Euler-Lagrange equation.

The general solution of the Euler-Lagrange equation contains two arbitrary constants,
which are determined by the boundary conditions that can be varied indepen-

dently. The extra solution of the first-integral, if it exists, can involve at most one

constant, so does not usually satisfy both boundary conditions. A simple example of

this was seen in the minimal surface area problem, equation 5.14 (page 157).

Noether’s theorem is proved by substituting the transformation 7.13 into the func-

tional 7.14 and expanding to first order in δ. The algebra is messy, so we proceed in

two stages.

First, we assume that x̄ = x, that is φ = 0, which simplifies the algebra. It is easiest

to start with the transformed functional

\[
\bar G = \int_c^d dx\, F\left(x, \bar{\mathbf{y}}, \frac{d\bar{\mathbf{y}}}{dx}\right),
\qquad (\text{since } \bar x = x).
\]

Using the transformation 7.13, with φ = 0, this becomes
\[
\bar G = \int_c^d dx\, F\left(x,\; \mathbf{y} + \delta\boldsymbol{\psi},\;
\mathbf{y}' + \delta\frac{d\boldsymbol{\psi}}{dx}\right),
\]

\[
= \int_c^d dx\, F(x, \mathbf{y}, \mathbf{y}')
+ \delta\int_c^d dx\,\sum_{k=1}^{n}\left(\frac{\partial F}{\partial y_k}\psi_k
+ \frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}\right). \tag{7.16}
\]

But the first term is merely the untransformed functional which, by definition equals

the transformed functional — because it is invariant under the transformation. Also,

using integration by parts

\[
\int_c^d dx\,\frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}
= \left[\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d
- \int_c^d dx\,\psi_k\frac{d}{dx}\frac{\partial F}{\partial y_k'},
\]

so that, since Ḡ = G, equation 7.16 gives
\[
0 = \delta\int_c^d dx\,\sum_{k=1}^{n}\psi_k\left\{\frac{\partial F}{\partial y_k}
- \frac{d}{dx}\frac{\partial F}{\partial y_k'}\right\}
+ \delta\left[\sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d. \tag{7.17}
\]

The term in curly brackets is, by virtue of the Euler-Lagrange equation, zero on a

stationary path and hence it follows that

\[
\left.\sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}\right|_{x=d}
= \left.\sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}\right|_{x=c}.
\]
Since c and d are arbitrary it follows that this sum is constant along the stationary
path, which is equation 7.15 with φ = 0.


Now consider the general case with φ ≠ 0. As before we start with the transformed functional, which is now

\[
\bar G = \int_{\bar c}^{\bar d} d\bar x\, F\left(\bar x, \bar{\mathbf{y}},
\frac{d\bar{\mathbf{y}}}{d\bar x}\right),
\]

where d̄ = d + δφ(d), c̄ = c + δφ(c), with φ(c) denoting φ(c, y(c), y'(c)). Now we have

to change the integration variable and limits, besides expanding F . First consider the

differential; using equations 7.13 and the chain rule,

\[
\frac{d\bar{\mathbf{y}}}{dx} = \frac{d\mathbf{y}}{dx} + \delta\frac{d\boldsymbol{\psi}}{dx}
\quad\text{but}\quad
\frac{d\bar x}{dx} = 1 + \delta\frac{d\phi}{dx}, \tag{7.18}
\]

giving
\[
\frac{dx}{d\bar x} = 1 - \delta\frac{d\phi}{dx} + O(\delta^2)
\]
and so, to first order in δ, we have

\[
\frac{d\bar{\mathbf{y}}}{d\bar x}
= \left(\frac{d\mathbf{y}}{dx} + \delta\frac{d\boldsymbol{\psi}}{dx}\right)
\left(1 - \delta\frac{d\phi}{dx}\right)
= \frac{d\mathbf{y}}{dx}
+ \delta\left(\frac{d\boldsymbol{\psi}}{dx} - \frac{d\mathbf{y}}{dx}\frac{d\phi}{dx}\right).
\]

Changing the integration variable from x̄ to x, with limits c and d, then gives
\[
\bar G = \int_c^d dx\,\frac{d\bar x}{dx}\,
F\left(x + \delta\phi,\; \mathbf{y} + \delta\boldsymbol{\psi},\;
\frac{d\mathbf{y}}{dx} + \delta\left(\frac{d\boldsymbol{\psi}}{dx}
- \frac{d\mathbf{y}}{dx}\frac{d\phi}{dx}\right)\right).
\]

Now expand to first order in δ and use the fact that the functional is invariant. After

some algebra we find that

\[
0 = \delta\int_c^d dx\left\{
\left(F - \sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)\frac{d\phi}{dx}
+ \phi\frac{\partial F}{\partial x}
+ \sum_{k=1}^{n}\left(\frac{\partial F}{\partial y_k}\psi_k
+ \frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}\right)\right\}. \tag{7.19}
\]

Notice that if φ = 0 this is the equivalent of equation 7.17. Now integrate those terms

containing dφ/dx and dψ_k/dx by parts to cast this equation into the form
\[
\begin{aligned}
0 = {}& \delta\left[\left(F - \sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)\phi
+ \sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d\\
&+ \delta\int_c^d dx\,\phi\left\{\frac{\partial F}{\partial x}
- \frac{d}{dx}\left(F - \sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)\right\}\\
&+ \delta\int_c^d dx\,\sum_{k=1}^{n}\psi_k\left(\frac{\partial F}{\partial y_k}
- \frac{d}{dx}\frac{\partial F}{\partial y_k'}\right). \tag{7.20}
\end{aligned}
\]

Finally, we need to show that on stationary paths the integrals are zero. The second

integral is clearly zero, by virtue of the Euler-Lagrange equations. On expanding the

integrand of the first integral it becomes

\[
\frac{\partial F}{\partial x}
- \left\{\frac{\partial F}{\partial x}
+ \sum_{k=1}^{n}\frac{\partial F}{\partial y_k}y_k'
+ \sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}y_k''\right\}
+ \sum_{k=1}^{n}\left(\frac{\partial F}{\partial y_k'}y_k''
+ y_k'\frac{d}{dx}\frac{\partial F}{\partial y_k'}\right).
\]


Using the Euler-Lagrange equations to modify the last term it is seen that this expres-

sion is zero. Hence, because c and d are arbitrary we have shown that the function

\[
\left(F - \sum_{k=1}^{n}\frac{\partial F}{\partial y_k'}\,y_k'\right)\phi
+ \sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}
\]
is constant along stationary paths, which is equation 7.15, and the theorem is proved.

Exercise 7.9

Derive equations 7.19 and 7.20.

Exercise 7.10
Consider the functional
\[
S[y] = \int_a^b dx\, F(x, y'),
\]
where the integrand depends only upon x and y'. Show that Noether's theorem gives the
first-integral F_{y'}(x, y') = constant, and that this is consistent with the Euler-Lagrange
equation.


Exercise 7.11

(a) Show that the Euler-Lagrange equation for the functional
\[
S[y] = \int_a^b dx\, x y y'^2
\quad\text{is}\quad
2xy\frac{d^2y}{dx^2} + x\left(\frac{dy}{dx}\right)^2 + 2y\frac{dy}{dx} = 0.
\]

(b) Show that this functional is invariant with respect to scale changes in the
independent variable, x, that is, under the change to the new variable x̄ = (1 + δ)x,
where δ is a constant. Use Noether's theorem to show that the first-integral of the
above differential equation is
\[
x^2 y\left(\frac{dy}{dx}\right)^2 = c,
\]
for some constant c.
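The factorisation property discussed after exercise 7.8 appears here too; the following sympy sketch, not part of the text, verifies that d/dx of the first-integral is the product of the Euler-Lagrange expression and x y'.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
yp = sp.Derivative(y(x), x)

# First-integral and Euler-Lagrange expression of exercise 7.11.
Phi = x**2*y(x)*yp**2
EL = 2*x*y(x)*sp.Derivative(y(x), x, 2) + x*yp**2 + 2*y(x)*yp

# d(Phi)/dx = x*y' * EL: every Euler-Lagrange solution makes Phi constant.
assert sp.simplify(sp.diff(Phi, x) - x*yp*EL) == 0
```

The identity holds for arbitrary y(x), not just stationary paths, which is why differentiating the first-integral regains the Euler-Lagrange equation.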

Exercise 7.12
Consider the functional
\[
S[y] = \int dx\, x^3 y^2 y'^2.
\]
(a) Show that S is invariant under the scale transformation x̄ = αx, ȳ = βy if
αβ² = 1. Hence show that a first-integral is x³y³y' + x⁴y²y'² = c = constant.

(b) Using the function y = Ax^γ find a solution of this equation; show also that

this is not a solution of the associated Euler-Lagrange equation.

(c) Show that the general solution of the Euler-Lagrange equation is y² + Ax⁻² = B,

where A and B are arbitrary constants.

(d) Using the independent variable u where x = u^a, show that with a suitable
choice of the constant a the functional becomes
\[
S[y] = \frac{1}{a}\int du\left(y\frac{dy}{du}\right)^2.
\]

Find the first-integral of this functional and show that the solution of the first-

integral found in part (b) does not satisfy this new first-integral.
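Parts (a) and (b) can be checked symbolically; this sympy sketch is illustrative and not part of the text.

```python
import sympy as sp

x, A = sp.symbols('x A', positive=True)
y = sp.Function('y')
yp = sp.Derivative(y(x), x)

# First-integral and Euler-Lagrange expression of S = integral of x^3 y^2 y'^2.
Phi = x**3*y(x)**3*yp + x**4*y(x)**2*yp**2
EL = sp.diff(2*x**3*y(x)**2*yp, x) - 2*x**3*y(x)*yp**2

# d(Phi)/dx = (y + 2xy')/2 * EL, so Phi = c has the extra solutions of
# y + 2xy' = 0, i.e. y = A x^(-1/2).
assert sp.simplify(sp.diff(Phi, x) - (y(x) + 2*x*yp)/2*EL) == 0

# y = A x^(-1/2) makes Phi constant for every A ...
trial = A/sp.sqrt(x)
Phi_t = sp.simplify(Phi.subs(y(x), trial).doit())
assert sp.diff(Phi_t, x) == 0

# ... but does not satisfy the Euler-Lagrange equation (for A > 0).
EL_t = sp.simplify(EL.subs(y(x), trial).doit())
assert EL_t != 0
```

Here Phi_t evaluates to the constant −A⁴/4, confirming that the trial function solves the first-integral but not the original equation.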

Exercise 7.13
Show that the Euler-Lagrange equation of the functional
\[
S[y] = \int_a^b dx\, F(x, y, y', y''), \qquad
y(a) = A_1,\ y'(a) = A_2,\ y(b) = B_1,\ y'(b) = B_2,
\]
has the first-integral
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y''}\right) - \frac{\partial F}{\partial y'}
= \text{constant}
\]
if the integrand does not depend explicitly upon y(x), and the integral
\[
y''\frac{\partial F}{\partial y''}
- \left(\frac{d}{dx}\left(\frac{\partial F}{\partial y''}\right)
- \frac{\partial F}{\partial y'}\right)y' - F = \text{constant}
\]
if the integrand does not depend explicitly upon x.

Hint: the second part of this question is most easily done using the theory de-

scribed in section 7.2.

Chapter 8

The Second Variation

8.1 Introduction

In this chapter we derive necessary and sufficient conditions for the functional

\[
S[y] = \int_a^b dx\, F(x, y, y'), \qquad y(a) = A, \quad y(b) = B, \tag{8.1}
\]

to have an actual extremum, rather than simply a stationary value. You will recall from

chapter 4 that a necessary and sufficient condition for this functional to be stationary on

a sufficiently differentiable curve y(x) is that it satisfies the Euler-Lagrange equation,

\[
\frac{d}{dx}\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = 0,
\qquad y(a) = A, \quad y(b) = B. \tag{8.2}
\]

Necessary and sufficient conditions for a stationary path to be an extremum are stated in theorems 8.3 and 8.4 (page 221) respectively. This

theory is important because many variational principles require the functional to have

a minimum value. But the theory is limited because it does not determine whether an

extremum is local or global, section 3.2.2, and sometimes this distinction is important,

as for example with geodesics. The treatment of local and global extrema is different.

For a local extremum we need compare only neighbouring paths — the behaviour far

away is irrelevant. For a global extremum, we require information about all admissible

paths, which is clearly a far more demanding, and often impossible, task. For this

reason we shall concentrate on local extrema but note that the analysis introduced in

exercise 3.4 (page 97) uses a global property of the functional which can be used for

some brachistochrone problems, as shown in section 8.6. This, and other, methods are

analysed in more depth by Troutman¹.

We start this chapter with a description of the standard procedure used to classify

the stationary points of functions of several real variables, leading to a statement of

the Morse lemma, which shows that with n variables most stationary points can be

categorised as one of n + 1 types, only one of which is a local minimum and one a

local maximum. In section 8.4 we derive a sufficient condition for the functional 8.1 to

1 Troutman J L, 1983, Variational Calculus with Elementary Convexity, Springer-Verlag.

209

210 CHAPTER 8. THE SECOND VARIATION

have a minimum: this condition involves Jacobi's equation and the notion of conjugate points, so these are introduced first. Theorem 8.4

(page 221) is important and useful because it provides a test for determining whether a

stationary path actually yields a local extremum; in sections 8.6 and 8.7 we shall apply

it to the brachistochrone problem and the minimum surface of revolution, respectively.

Finally, in section 8.8 we complete the story by showing how the classification method

used for functions of n variables tends to Jacobi’s equation as n → ∞.

Suppose that x = (x₁, x₂, · · · , x_n) and F(x) is a function in Rⁿ that possesses deriva-
tives of at least second-order. Using the Taylor expansion of F(x + εξ), equation 1.39
(page 36), with |ξ| = 1 to guarantee that εξ tends to zero with ε, we have
\[
F(\mathbf{x} + \epsilon\boldsymbol{\xi}) = F(\mathbf{x})
+ \epsilon\,\Delta F[\mathbf{x}, \boldsymbol{\xi}]
+ \frac{1}{2}\epsilon^2\,\Delta^2 F[\mathbf{x}, \boldsymbol{\xi}]
+ O(\epsilon^3), \tag{8.3}
\]

where ∆F is the Gâteaux differential

\[
\Delta F[\mathbf{x}, \boldsymbol{\xi}] = \sum_{k=1}^{n}\frac{\partial F}{\partial x_k}\xi_k
\quad\text{and}\quad
\Delta^2 F[\mathbf{x}, \boldsymbol{\xi}]
= \sum_{k=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 F}{\partial x_k\,\partial x_j}\xi_k\xi_j. \tag{8.4}
\]

If F is stationary at x = a then all the first partial derivatives, ∂F/∂x_k, must be zero at x = a. Provided ∆²F[x, ξ]

is not identically zero for any ξ the nature of the stationary point is determined by

the behaviour of ∆²F[a, ξ]. If this is positive for all ξ then F has a local minimum
at x = a: note that the adjective local is usually omitted. If it is negative for all ξ
then F has a local maximum at x = a. If the sign of ∆²F[a, ξ] changes with ξ the

stationary point is said to be a saddle. In some texts the terms stationary point and

critical point are synonyms.

For functions of one variable, n = 1, there are three types of stationary points as

illustrated in figure 8.1. The function shown in this figure has a local maximum and

a local minimum, but the global maximum and minimum are at the ends of the range

and are not stationary points.

[Figure 8.1 (diagram omitted): Diagram showing the three possible types of stationary point of a function, f(x), of one variable: a local maximum, a point of inflection and a local minimum.]

8.2. STATIONARY POINTS OF FUNCTIONS OF SEVERAL VARIABLES 211

At a typical maximum f'(x) = 0 and f''(x) < 0; at a typical minimum f'(x) = 0 and
f''(x) > 0, whereas at a stationary point which is also a point of inflection² f'(x) =
f''(x) = 0. Care is needed when classifying stationary points because there are many

special cases; for instance the function f(x) = x⁴ is stationary at the origin and
f^(k)(0) = 0 for k = 1, 2 and 3. For this reason we restrict the discussion to typical
stationary points, defined to be those at which the second derivative is not zero:

without this restriction complications arise, for the reasons discussed in the following

exercise.

Exercise 8.1

Stationary points which are also points of inflection are not typical because small,

arbitrary changes to a function with such a stationary point usually change its

nature.

(a) Show that the function f(x) = x³ is stationary and has a point of inflection
at the origin. By adding εx, with 0 < |ε| ≪ 1, so f(x) becomes x³ + εx, show that
the stationary point is removed if ε > 0 or converted to two ordinary stationary
points (at which the second derivative is not zero) if ε < 0.

(b) If f(x) has a stationary point at x = a with
f'(a) = f''(a) = 0, but f^(3)(a) ≠ 0, show that the function F(x) = f(x) + εg(x),
where 0 < |ε| ≪ 1 and where g(a) = 0 and g'(a) ≠ 0, is either not stationary or
has ordinary stationary points in the neighbourhood of x = a. You may assume
that all functions possess a Taylor expansion in the neighbourhood of x = a.

Note that a sufficiently smooth function f(x) defined on an interval [a, b] is often said
to be structurally stable if the number and nature of its stationary points are unchanged
by the addition of a small, arbitrary function; that is, for arbitrarily small ε, f(x) and
f(x) + εg(x), also sufficiently smooth, have the same stationary point structure on [a, b].

Generally, functions describing the physical world are structurally stable, provided f

and g have the same symmetries. For functions of one real variable there are just two

typical stationary points, maxima and minima.
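The perturbed cubic of exercise 8.1 can be illustrated with sympy; this sketch is not part of the text, and the perturbation strength ±1/100 is an arbitrary choice.

```python
import sympy as sp

x = sp.symbols('x', real=True)

def real_stationary_points(f):
    """Real solutions of f'(x) = 0."""
    return sp.solveset(sp.diff(f, x), x, domain=sp.S.Reals)

# f(x) = x^3 has a single (degenerate) stationary point at the origin ...
assert real_stationary_points(x**3) == sp.FiniteSet(0)

# ... which an arbitrarily small perturbation either removes (eps > 0)
# or splits into two ordinary stationary points (eps < 0).
assert real_stationary_points(x**3 + x/100) == sp.S.EmptySet
assert len(real_stationary_points(x**3 - x/100)) == 2
```

This is exactly the sense in which the unperturbed point of inflection is not structurally stable.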

If n = 2 there are three types of stationary points, maxima, minima and saddles,

examples of which are

\[
f(\mathbf{x}) = \begin{cases}
-x_1^2 - x_2^2, & \text{maximum},\\
\phantom{-}x_1^2 + x_2^2, & \text{minimum},\\
\phantom{-}x_1^2 - x_2^2, & \text{saddle}.
\end{cases} \tag{8.5}
\]

These three functions all have stationary points at the origin and their shapes are

illustrated in the following figures.

² A general point of inflection is where f''(x) changes sign, though f'(x) need not be zero.


In the neighbourhood of a maximum or a minimum the value of the function is always smaller or
always larger, respectively, than its value at the stationary point. In the neighbourhood

of a saddle it is both larger and smaller.

The nature of a stationary point is determined by the value of the Hessian determi-
nant at the stationary point. The Hessian determinant is the determinant of the real,
symmetric Hessian matrix with elements H_ij = ∂²f/∂x_i∂x_j. A stationary point is said
to be non-degenerate if, at the stationary point, det(H) ≠ 0; a degenerate stationary

point is one at which det(H) = 0. At a degenerate stationary point a function can have

the characteristics of an extremum or a saddle, but there are other cases. In this text

the adjectives typical and non-degenerate when used to describe a stationary point are

synonyms.

Exercise 8.2

Find the Hessian determinant of the functions defined in equation 8.5.

Exercise 8.3

Show that the function f(x, y) = x³ − 3xy² has a degenerate stationary point at

the origin.

For a scalar function, f(x), of n variables the Hessian matrix, H, is the n × n, real
symmetric matrix with elements H_ij(x) = ∂²f/∂x_i∂x_j. A stationary point, at x = a,
is said to be typical, or non-degenerate, if det(H(a)) ≠ 0 and the classification of these
points depends entirely upon the second-order term of the Taylor expansion, that is

∆2 F [a, ξ], equation 8.4. Further, the following lemma, due to Morse (1892 – 1977),

shows that there are n + 1 types of stationary points, only two of which are extrema.

The Morse Lemma: If a is a non-degenerate stationary point of a smooth function

f (x) then there is a local coordinate system3 (y1 , y2 , · · · , yn ), where yk (a) = 0, for all

k, such that

\[
f(\mathbf{y}) = f(\mathbf{a}) - y_1^2 - \cdots - y_l^2
+ y_{l+1}^2 + y_{l+2}^2 + \cdots + y_n^2,
\]
for some 0 ≤ l ≤ n.

3 This means that in the neighbourhood of x = a the transformation from x to y is one to one and

that each coordinate yk (x) satisfies the conditions of the implicit function theorem.


Note that this representation of the function is exact in the neighbourhood of the

stationary point: it is not an expansion. The integer l is a topological invariant, meaning

that a smooth, invertible coordinate change does not alter its value, so it is a property

of the function not the coordinate system used to represent it.

At the extremes, l = 0 and l = n, we have

\[
f(\mathbf{y}) = \begin{cases}
\displaystyle f(\mathbf{a}) + \sum_{k=1}^{n} y_k^2, & (l = 0), \quad\text{minimum},\\[8pt]
\displaystyle f(\mathbf{a}) - \sum_{k=1}^{n} y_k^2, & (l = n), \quad\text{maximum}.
\end{cases}
\]

For 0 < l < n the function is said to have a Morse l-saddle and if n ≫ 1 there are many
more types of saddles than extrema⁴.

The Morse lemma is an existence theorem: it does not provide a method of determining the transformation to the coordinates y(x) or the value of the index l: this is usually determined using the second-order term of the Taylor expansion, most conveniently written in terms of the Hessian matrix, H, evaluated at the stationary point a,

    ∆2F[a, z] = z^T H(a) z = Σ_{i=1}^{n} Σ_{j=1}^{n} H_ij(a) z_i z_j   where z = x − a,   (8.6)

and z^T is the transpose of the vector z. Thus the nature of the non-degenerate stationary point depends upon the behaviour of the quadratic form ∆2.

(a) if ∆2 is positive definite, ∆2 > 0 for all |z| > 0, the stationary point is a minimum;

(b) if ∆2 is negative definite, ∆2 < 0 for all |z| > 0, the stationary point is a maximum;

(c) otherwise the stationary point is a saddle.

The following two statements are equivalent and both provide necessary and sufficient

conditions to determine the behaviour of ∆2 and hence the nature of the stationary

point of f (x).

(I) If the eigenvalues of H(a) are λk , k = 1, 2, · · · , n, then

(a) ∆2 will be positive definite if λk > 0 for k = 1, 2, · · · , n:

(b) ∆2 will be negative definite if λk < 0 for k = 1, 2, · · · , n:

(c) ∆2 will be indefinite if the λk are of both signs; further, the index l is equal

to the number of negative eigenvalues.

Since H is real and symmetric all its eigenvalues are real. If one of its eigenvalues

is zero the stationary point is degenerate.

(II) Let D_r be the determinant derived from H by retaining only the first r rows and columns, so that

    D_r = | H_11  H_12  ···  H_1r |
          | H_21  H_22  ···  H_2r |
          |   ·     ·   ···    ·  |
          | H_r1  H_r2  ···  H_rr | ,   r = 1, 2, ···, n.

Then
(a) ∆2 will be positive definite if D_k > 0 for k = 1, 2, ···, n;
(b) ∆2 will be negative definite if (−1)^k D_k > 0 for k = 1, 2, ···, n;
(c) ∆2 will be indefinite if neither condition (a) nor (b) is satisfied.

4 Note that some texts use l′ = n − l in place of l, in which case the function has a minimum when l′ = n and a maximum when l′ = 0.

214 CHAPTER 8. THE SECOND VARIATION

The proof of this statement may be found in Jeffrey⁵ (1990, page 288).

The determinants D_k, k = 1, 2, ···, n, are known as the descending principal minors of H. In general a minor of a given element, a_ij, of an Nth-order determinant is the (N − 1)th-order determinant obtained by removing the row and column of the given element; multiplied by the sign (−1)^{i+j} it is usually called the cofactor of a_ij.
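The test in statement (II) is easy to mechanise. The following pure-Python sketch computes the descending principal minors D_1, ···, D_n of a Hessian supplied as a list of lists and reads off the definiteness of ∆2; the function names and the Laplace-expansion determinant are our own illustrative choices, not taken from the text.

```python
# Sketch of criterion (II): classify a quadratic form z^T H z by the
# signs of the descending principal minors D_1, ..., D_n of H.

def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def principal_minors(H):
    """D_r = determinant of the leading r-by-r block, r = 1, ..., n."""
    return [det([row[:r] for row in H[:r]]) for r in range(1, len(H) + 1)]

def nature(H):
    """Apply conditions (a), (b), (c) of statement (II)."""
    D = principal_minors(H)
    if any(d == 0 for d in D):
        return "degenerate (test inconclusive)"
    if all(d > 0 for d in D):
        return "positive definite: minimum"
    # (b): (-1)^k D_k > 0 for k = 1, ..., n (here k = idx + 1)
    if all((-1) ** (idx + 1) * d > 0 for idx, d in enumerate(D)):
        return "negative definite: maximum"
    return "indefinite: saddle"

print(nature([[2, -1], [-1, 2]]))   # minors 2, 3: positive definite
```

The recursive determinant is O(n!) and is used here only because the Hessians in this chapter are small; for large n one would factorise H instead.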

Exercise 8.4

Determine the nature of the stationary points at the origin of the following quadratic

functions:

(a) f = 2x² − 8xy + y²,
(b) f = 2x² + 4y² + z² + 2(xy + yz + xz).
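Criterion (I) can also be checked numerically. The sketch below, assuming NumPy is available, estimates the Hessian by central differences, classifies a stationary point from the signs of the eigenvalues, and reports the Morse index l (the number of negative eigenvalues); the sample function, step size and tolerance are illustrative choices, not taken from the text.

```python
# Sketch of criterion (I): classify a stationary point of f at x = a
# from the eigenvalues of a finite-difference Hessian.
import numpy as np

def hessian(f, a, eps=1e-5):
    """Central-difference estimate of H_ij = d^2 f / dx_i dx_j at a."""
    a = np.asarray(a, dtype=float)
    n = a.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(a + ei + ej) - f(a + ei - ej)
                       - f(a - ei + ej) + f(a - ei - ej)) / (4 * eps**2)
    return H

def classify(f, a, tol=1e-6):
    """Return the nature of the stationary point and the Morse index l."""
    lam = np.linalg.eigvalsh(hessian(f, a))   # real, since H is symmetric
    if np.any(np.abs(lam) < tol):
        return "degenerate", None
    l = int(np.sum(lam < 0))                  # number of negative eigenvalues
    if l == 0:
        return "minimum", l
    if l == lam.size:
        return "maximum", l
    return "saddle", l

# f(x, y) = x^2 - y^2 has a saddle at the origin with Morse index l = 1.
print(classify(lambda x: x[0]**2 - x[1]**2, [0.0, 0.0]))
```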

Exercise 8.5

The functions f(x, y) = x² + y³ and g(x, y) = x² + y⁴ are both stationary at the origin. Show that for both functions this is a degenerate stationary point, classify it and determine the expression for ∆2.

Exercise 8.6

Show that a non-degenerate stationary point of a function of two variables, f(x, y), is a minimum if

    ∂²f/∂x² > 0,   ∂²f/∂y² > 0   and   (∂²f/∂x²)(∂²f/∂y²) − (∂²f/∂x∂y)² > 0,

and a maximum if

    ∂²f/∂x² < 0,   ∂²f/∂y² < 0   and   (∂²f/∂x²)(∂²f/∂y²) − (∂²f/∂x∂y)² > 0,

where all derivatives are evaluated at the stationary point. Show also that if

    (∂²f/∂x²)(∂²f/∂y²) − (∂²f/∂x∂y)² < 0

the stationary point is a saddle.

Exercise 8.7

(a) Show that the function f(x, y) = (x³ + y³) − 3(x² + y² + 2xy) has stationary points at (0, 0) and (4, 4), and classify them.
(b) Find the stationary points of the function f(x, y) = x⁴ + 64y⁴ − 2(x + 8y)², and classify them.

5 Linear Algebra and Ordinary Differential Equations by A Jeffrey, (Blackwell Scientific Publica-

tions).

8.3. THE SECOND VARIATION OF A FUNCTIONAL 215

Exercise 8.8

The least squares fit
Given a set of N pairs of data points (x_i, y_i) we require the straight line y = a + bx with the constants (a, b) chosen to minimise the function

    Φ(a, b) = Σ_{i=1}^{N} (a + bx_i − y_i)².

Show that (a, b) are given by the solutions of the linear equations

    aN + b Σ x_i = Σ y_i,
    a Σ x_i + b Σ x_i² = Σ x_i y_i.

Hint: use the Cauchy-Schwarz inequality for sums (page 41) to show that the stationary point is a minimum.
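The pair of normal equations in exercise 8.8 can be checked directly. The following sketch solves them by Cramer's rule; the data set is an illustrative one lying exactly on the line y = 1 + 2x, so the fit should recover a = 1 and b = 2.

```python
# Solve the least-squares normal equations
#     a N     + b Sum(x_i)    = Sum(y_i)
#     a Sum(x_i) + b Sum(x_i^2) = Sum(x_i y_i)
# for the intercept a and slope b, by Cramer's rule.

def least_squares_line(xs, ys):
    N = len(xs)
    Sx = sum(xs)
    Sy = sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    D = N * Sxx - Sx * Sx          # determinant of the 2x2 system
    a = (Sy * Sxx - Sx * Sxy) / D  # intercept
    b = (N * Sxy - Sx * Sy) / D    # slope
    return a, b

a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)   # recovers intercept 1 and slope 2
```

The determinant D is positive whenever the x_i are not all equal, which is one way of seeing that the stationary point of Φ is unique.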

As with functions of n real variables, the nature of a stationary path of a functional is usually determined by considering the second-order expansion, that is the term O(ε²) in the difference δS = S[y + εh] − S[y], where y(x) is a solution of the Euler-Lagrange equation. In order to determine this we use the Taylor expansion of the integrand,

    F(x, y + εh, y′ + εh′) = F(x, y, y′) + ε( h ∂F/∂y + h′ ∂F/∂y′ )
        + ½ε²( h′² ∂²F/∂y′² + 2hh′ ∂²F/∂y∂y′ + h² ∂²F/∂y² ) + O(ε³),

and assume that h(x) belongs to D1(a, b), defined on page 124; that is, we are considering weak variations, section 4.6. It is convenient to write

    δS = ε∆[y, h] + ½ε²∆2[y, h] + o(ε²),

where ∆[y, h] is the Gâteaux differential, introduced in equation 4.5 (page 125), and ∆2[y, h] is named the second variation⁶ and is given by

    ∆2[y, h] = ∫_a^b dx ( h′² ∂²F/∂y′² + 2hh′ ∂²F/∂y∂y′ + h² ∂²F/∂y² ).

On a stationary path the first variation vanishes, ∆[y, h] = 0, and we shall assume that on this path

    δS = ½ε²∆2[y, h] + ε²R(ε)   with   lim_{ε→0} R(ε) = 0, for all admissible h.   (8.7)

6 Note, in some texts ε²∆2/2 is named the second variation but, whichever convention is used, the subsequent analysis is identical.

216 CHAPTER 8. THE SECOND VARIATION

This means that for small ε the first term dominates and the sign of δS is the same as the sign of the second variation, ∆2, which therefore determines the nature of the stationary path.

The expression for ∆2 may be simplified by integrating the term involving hh′ by parts, giving

    ∫_a^b dx ∂²F/∂y∂y′ hh′ = ½ ∫_a^b dx ∂²F/∂y∂y′ d(h²)/dx
                           = [ ½ h² ∂²F/∂y∂y′ ]_a^b − ½ ∫_a^b dx h² d/dx( ∂²F/∂y∂y′ ).

Because of the boundary conditions h(a) = h(b) = 0 the boundary term vanishes, giving

    2 ∫_a^b dx ∂²F/∂y∂y′ hh′ = − ∫_a^b dx h² d/dx( ∂²F/∂y∂y′ ).

Thus the second variation becomes

    ∆2[y, h] = ∫_a^b dx [ P(x)h′(x)² + Q(x)h(x)² ],   (8.8)

where

    P(x) = ∂²F/∂y′²   and   Q(x) = ∂²F/∂y² − d/dx( ∂²F/∂y∂y′ )   (8.9)

are known functions of x, because here y(x) is a solution of the Euler-Lagrange equation.
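Equation 8.8 is easy to evaluate numerically in a concrete case. The sketch below treats the simple functional S[y] = ∫_0^1 dx y′², for which P(x) = 2 and Q(x) = 0, with the trial variation h(x) = sin(πx), which vanishes at both ends; the exact value of ∆2 is then π² > 0, consistent with the straight line being a minimum. The trapezoidal quadrature routine and the trial function are our own illustrative choices.

```python
# Numerical evaluation of the second variation, equation 8.8,
#     Delta_2[y, h] = Integral_a^b (P h'^2 + Q h^2) dx,
# by the trapezoidal rule.
import math

def second_variation(P, Q, h, hp, a, b, n=2000):
    """Trapezoidal estimate of the integral in equation 8.8."""
    total = 0.0
    dx = (b - a) / n
    for k in range(n + 1):
        x = a + k * dx
        w = 0.5 if k in (0, n) else 1.0    # trapezoidal end-point weights
        total += w * (P(x) * hp(x)**2 + Q(x) * h(x)**2) * dx
    return total

# F = y'^2, so P = 2 and Q = 0; trial variation h = sin(pi x) on [0, 1].
d2 = second_variation(lambda x: 2.0, lambda x: 0.0,
                      lambda x: math.sin(math.pi * x),
                      lambda x: math.pi * math.cos(math.pi * x),
                      0.0, 1.0)
print(d2)   # close to pi^2 and positive, as expected for a minimum
```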

The analysis of ∆2 leads to the first two important results, conveniently expressed as theorems, which we shall not prove⁷.

Theorem 8.1

A sufficient condition for the functional S[y] to have a minimum on a path y(x) for which the first variation vanishes, ∆[y, h] = 0, is that ∆2[y, h] > 0 for all allowed h ≠ 0. For a maximum we reverse the inequality.

Note that the condition ∆2 > 0 for all h is often described by the statement “∆2 is

strongly positive”.

If ∆2 [y, h] = 0 for some h(x) then for these h the sign of δS is determined by the

higher-order terms in the expansion, as for the examples considered in exercise 8.5.

Theorem 8.2

A necessary condition for the functional S[y] to have a minimum along the path y(x)

is that ∆2 [y, h] ≥ 0 for all allowed h. For a maximum, the sign ≥ is replaced by ≤.

The properties of ∆2[y, h] are therefore crucial in determining the nature of a stationary path. We need to show that ∆2[y, h] ≠ 0 for all admissible h, and this is not easy; the remaining part of this theory is therefore devoted to understanding the behaviour of ∆2.

7 Proofs of theorems 8.1 and 8.2 are provided in I M Gelfand and S V Fomin Calculus of Variations,


For short intervals, that is sufficiently small b − a, there is a very simple condition for a functional to have an extremum, namely that for a ≤ x ≤ b, P(x) ≠ 0: if P(x) > 0 the stationary path is a minimum and if P(x) < 0 it is a maximum. Unfortunately, estimates of the magnitude of b − a necessary for this condition to be valid are usually hard to find.

This result follows because if h(a) = h(b) = 0 the variation of h′(x)² is larger than that of h(x)². We may prove this using Schwarz's inequality, that is

    ( ∫_a^b dx u(x)v(x) )² ≤ ∫_a^b dx |u(x)|² ∫_a^b dx |v(x)|²,   (8.10)

provided all integrals exist. Since h(x) = ∫_a^x du h′(u), we have

    h(x)² = ( ∫_a^x du h′(u) )² ≤ ∫_a^x du ∫_a^x du h′(u)² = (x − a) ∫_a^x du h′(u)²
          ≤ (x − a) ∫_a^b du h′(u)².

Integrating this inequality over the range a ≤ x ≤ b gives

    ∫_a^b dx h(x)² ≤ ½ (b − a)² ∫_a^b dx h′(x)².   (8.11)

Exercise 8.9

As an example consider the function g(x) = (x − a)(b − x) and show that

    I = ∫_a^b dx g(x)² = (b − a)⁵/30,   I′ = ∫_a^b dx g′(x)² = (b − a)³/3,

and deduce that I < I′ if b − a < √10.
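The two integrals quoted in exercise 8.9 can be checked by quadrature. In the sketch below the interval a = 0, b = 2 is an illustrative choice satisfying b − a < √10, so we expect I < I′; the trapezoidal routine is our own.

```python
# Check of exercise 8.9: for g(x) = (x - a)(b - x),
#     I  = Integral g^2  dx = (b - a)^5 / 30,
#     I' = Integral g'^2 dx = (b - a)^3 / 3.

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b] with n panels."""
    dx = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + k * dx) for k in range(1, n))
    return s * dx

a, b = 0.0, 2.0
g  = lambda x: (x - a) * (b - x)
gp = lambda x: a + b - 2 * x        # g'(x) = a + b - 2x

I      = trapezoid(lambda x: g(x)**2, a, b)    # expect (b - a)^5 / 30
Iprime = trapezoid(lambda x: gp(x)**2, a, b)   # expect (b - a)^3 / 3
print(I, Iprime)   # I < I' since b - a = 2 < sqrt(10)
```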

Now all that is necessary is an application of the integral mean value theorem (page 23): consider each component of ∆2, equation 8.8, separately:

    ∆2^(P) = ∫_a^b dx P(x)h′(x)² = P(x_p) ∫_a^b dx h′(x)²,
    ∆2^(Q) = ∫_a^b dx Q(x)h(x)² = Q(x_q) ∫_a^b dx h(x)²,

for some points x_p and x_q in the interval (a, b).

If Q(x) > 0 and P (x) > 0 for a < x < b then ∆2 > 0 for all admissible h and

the stationary path is a local minimum. However, this is neither a common nor very

interesting case, and the result follows directly from equation 8.8. We need to consider

the effect of Q(x) being negative with P (x) > 0.


If Q(x) is negative in all or part of the interval (a, b) it is necessary to show that ∆2^(P) + ∆2^(Q) > 0. We have, on using the above results,

    |∆2^(Q)| = |Q(x_q)| ∫_a^b dx h(x)² ≤ ½ |Q(x_q)| (b − a)² ∫_a^b dx h′(x)² = ½ (b − a)² ( |Q(x_q)| / P(x_p) ) ∆2^(P).

Since P(x_p) > 0 it follows that for sufficiently small b − a, |∆2^(Q)| < ∆2^(P) and hence that ∆2 > 0. If P(x) < 0 for a ≤ x ≤ b, we simply consider −∆2.

Thus for sufficiently small b − a we have

a) if P(x) = ∂²F/∂y′² > 0, a ≤ x ≤ b, S[y] has a minimum;
b) if P(x) = ∂²F/∂y′² < 0, a ≤ x ≤ b, S[y] has a maximum.

This analysis shows that for short intervals, provided P (x) does not change sign, the

functional has either a maximum or a minimum and no other type of stationary point

exists. This result highlights the significance of the sign of P (x) which, as we shall see,

pervades the whole of this theory. In practice this result is of limited value because it

is rarely clear how small the interval needs to be.

Dynamical systems
For a one-dimensional dynamical system described by a Lagrangian defined by the difference between the kinetic and potential energy, L = ½mq̇² − V(q, t), where q is the generalised coordinate, the action, defined by the functional S[q] = ∫_{t_1}^{t_2} dt L(t, q, q̇), is stationary along the orbit from q_1 = q(t_1) to q_2 = q(t_2). For short times the kinetic energy dominates the motion, which is therefore similar to rectilinear motion, and the action has an actual minimum along the orbit. A similar result holds for many-dimensional dynamical systems.

Comment
The analysis of this section emphasises the fact that for short intervals the solutions of most differential equations behave similarly and very simply; this is the idea behind the rectification described after theorem 2.2 (page 81).

In this section we show that a necessary condition for ∆2[y, h] ≥ 0 for all admissible h(x) is that P(x) = F_{y′y′} ≥ 0 for a ≤ x ≤ b. Unlike the result derived in the previous section this does not depend upon the interval being small, though only a necessary condition is obtained. This result is due to Legendre: it is important because it is usually easier to apply than the necessary condition of theorem 8.2. Further, it is of historical significance because Legendre attempted (unsuccessfully) to show that a sufficient condition for S[y] to have a weak minimum on the path y(x) is that F_{y′y′} > 0 at every point of the curve.

Recall theorem 8.2, which states that a necessary condition for S[y] to have a minimum on the stationary path y(x) is that ∆2[y, h] ≥ 0 for all allowed h(x). We now show that a necessary condition for ∆2 ≥ 0, for all h(x) in D1(a, b) (defined on page 124) such that h(a) = h(b) = 0, is that P(x) = ∂²F/∂y′² ≥ 0 for a ≤ x ≤ b. The proof is by contradiction.


Suppose that at some point x_0 ∈ [a, b], P(x_0) = −2p, (p > 0). Then since P(x) is continuous

    P(x) < −p   if   a ≤ x_0 − α ≤ x ≤ x_0 + α ≤ b

for some α > 0. We now construct a suitable function h(x) for which ∆2 < 0. Let

    h(x) = sin²( π(x − x_0)/α ),   x_0 − α ≤ x ≤ x_0 + α,
    h(x) = 0,   otherwise.

Then

    ∆2 = ∫_a^b dx ( P(x)h′² + Q(x)h² )
       = (π²/α²) ∫_{x_0−α}^{x_0+α} dx P(x) sin²( 2π(x − x_0)/α ) + ∫_{x_0−α}^{x_0+α} dx Q(x) sin⁴( π(x − x_0)/α )
       < − pπ²/α + (3/4)Mα,   M = max_{a≤x≤b} |Q|.

For sufficiently small α the last expression becomes negative and hence ∆2 < 0. But it is necessary that ∆2 ≥ 0, theorem 8.2, and hence it follows that we need P(x) ≥ 0 for x ∈ [a, b]. Note that, as in the analysis leading to equation 8.11, it is the term depending upon h′(x) that dominates the integral.
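The contradiction argument can also be seen numerically. The sketch below takes the illustrative choices P(x) = −1 and Q(x) = 1 on [0, 1], builds the sin² bump used in the proof with x_0 = 0.5 and α = 0.2, and evaluates ∆2 by the trapezoidal rule; since here P ≡ −1 and Q ≡ 1 the result should be close to −π²/α + 3α/4, which is negative, illustrating how the h′² term dominates for small α.

```python
# Demonstration of the proof's construction: with P negative near x0,
# the bump h(x) = sin^2(pi (x - x0)/alpha) on [x0 - alpha, x0 + alpha]
# makes Delta_2 = Integral (P h'^2 + Q h^2) dx negative for small alpha.
import math

def delta2(P, Q, x0, alpha, a=0.0, b=1.0, n=4000):
    """Trapezoidal estimate of Delta_2 for the sin^2 bump variation."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        x = a + k * dx
        if abs(x - x0) <= alpha:
            z = math.pi * (x - x0) / alpha
            h  = math.sin(z)**2
            hp = (math.pi / alpha) * math.sin(2 * z)   # h'(x)
        else:
            h, hp = 0.0, 0.0
        w = 0.5 if k in (0, n) else 1.0
        total += w * (P(x) * hp**2 + Q(x) * h**2) * dx
    return total

d2 = delta2(lambda x: -1.0, lambda x: 1.0, x0=0.5, alpha=0.2)
print(d2)   # negative: close to -pi^2/alpha + 3*alpha/4
```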

Exercise 8.10

Explain why h(x) has to be in D1 (a, b).

We have shown that a necessary condition for ∆2 [y, h] ≥ 0 is that P (x) ≥ 0 for x ∈ [a, b].

Using theorem 8.2 this shows that a necessary condition for S[y] to be a minimum on

the stationary path y(x) is that P (x) ≥ 0.

Legendre also attempted, unsuccessfully, to show that the weaker condition P(x) > 0, x ∈ [a, b], is also sufficient. That this cannot be true is shown by the following counter-example.

We know that the minimum distance between two points on a sphere is along the

shorter arc of the great circle passing through them — assuming that the two points are

not on the same diameter. Thus for the three points A, B and C, on the great circle

through A and B, shown in figure 8.5, the shortest distance between A and B and

between B and C is along the short arcs and on these P > 0, exercise 5.20 (page 168).

[Figure 8.5: the great circle through A and B, the diameter through A, and the point C on the arc beyond B.]

Hence on the arc ABC we have P > 0, but this arc is not the shortest path between A and C: the condition P > 0 is therefore not sufficient for a stationary path to give a minimum.


In this section we continue our analysis of the second variation

    ∆2[y, h] = ∫_a^b dx [ P(x)h′(x)² + Q(x)h(x)² ]   (8.12)

where, as in equation 8.9,

    P(x) = ∂²F/∂y′²   and   Q(x) = ∂²F/∂y² − d/dx( ∂²F/∂y∂y′ ).

In order that the functional S[y], defined in equation 8.1, has a minimum (maximum)

on y(x) it is necessary and sufficient that ∆2 > 0 (∆2 < 0) for