
Universitext

Gheorghe Moroşanu

Functional
Analysis for
the Applied
Sciences
Universitext

Series editors
Sheldon Axler
San Francisco State University, San Francisco, CA, USA

Carles Casacuberta
Universitat de Barcelona, Barcelona, Spain

John Greenlees
University of Warwick, Coventry, UK

Angus MacIntyre
Queen Mary University of London, London, UK

Kenneth Ribet
University of California, Berkeley, CA, USA

Claude Sabbah
École Polytechnique, CNRS, Université Paris-Saclay, Palaiseau, France

Endre Süli
University of Oxford, Oxford, UK

Wojbor A. Woyczyński
Case Western Reserve University, Cleveland, OH, USA

Universitext is a series of textbooks that presents material from a wide variety of


mathematical disciplines at master’s level and beyond. The books, often well class-tested by their author, may have an informal, personal, even experimental approach
to their subject matter. Some of the most successful and established books in the se-
ries have evolved through several editions, always following the evolution of teach-
ing curricula, into very polished texts.

Thus as research topics trickle down into graduate-level teaching, first textbooks
written for new, cutting-edge courses may make their way into Universitext.

More information about this series at http://www.springer.com/series/223


Gheorghe Moroşanu

Functional Analysis
for the Applied Sciences
Gheorghe Moroşanu
Romanian Academy of Sciences
Bucharest, Romania
Department of Mathematics
Babes-Bolyai University
Cluj-Napoca, Romania

ISSN 0172-5939 ISSN 2191-6675 (electronic)


Universitext
ISBN 978-3-030-27152-7 ISBN 978-3-030-27153-4 (eBook)
https://doi.org/10.1007/978-3-030-27153-4

Mathematics Subject Classification (2010): 32A70

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publica-
tion does not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dedicated to my wife, Carmen
Preface

The goal of this book is to present in a friendly manner some of the


main results and techniques in Functional Analysis and use them to
explore various areas in mathematics and its applications. Special
attention is paid to creating appropriate frameworks towards solving
different problems in the field of differential and integral equations. In
fact, the flavor of this book is given by the fine interplay between the
tools offered by Functional Analysis and some specific problems which
are of interest in the Applied Sciences.
The table of contents of the book (see below) offers a fairly good
description of the material. In contrast with other books in the field,
we present in Chap. 1 the real number system, describing the Cantor–
Méray model which is most appropriate for our purposes here. Indeed,
it is based on a completion procedure, allowing the extension from ra-
tional numbers to real numbers. This procedure involves the concepts
of limit and infinity that are specific to analysis. We consider the
Cantor–Méray construction as the cornerstone of mathematical analysis, which is why we pay attention to this subject, which is usually assumed to be well known.
In order to help the reader to understand the richness of ideas and
methods offered by Functional Analysis, we have included a section of
exercises at the end of each chapter. Some of these exercises supple-
ment the theoretical material discussed in the corresponding chapter,
while others are mathematical problems that are related to the real
world. Some of the exercises are borrowed from other books, being
reformulated and/or presented in a form adapted to the needs of the
corresponding chapter. We do not indicate the books where individual
exercises come from, but all those sources are included in the reference list of our book. In any event, we do not claim originality in such
cases. Other exercises were invented by us to offer the reader enough


material to understand the theoretical part of the book and gain ex-
pertise in solving practical problems. In the last chapter of the book
(Chap. 12), we provide solutions to almost all exercises. This is in con-
trast to many other books which include exercises without solutions.
For easy exercises, we provide hints or final solutions, and answers to
very easy exercises are left to the reader. I encourage everybody to
spend some time working on an exercise before looking at its solution.
We shall refer to an exercise by indicating the chapter and exercise
numbers (and not the section number). For example, Exercise 11.3
will mean Exercise 3 in the last section of Chap. 11 (which is Sect. 11.3
in this case).
The book is addressed to graduate students and researchers in
applied mathematics and neighboring fields of science.
I would like to thank the anonymous reviewers whose pertinent
comments improved the initial version of the book.
Special thanks are due to a former American student of mine, Ivan
Andrus, who wrote the first draft of the present book as lecture notes
for my Functional Analysis lectures in 2010. He also carefully checked
the final version of the book and suggested several minor changes.
I am also indebted to my former student Liviu Nicolaescu for read-
ing the first part of the book and correcting some errors.
Last but not least, I would like to thank Mrs. Elizabeth Loew,
Executive Editor at Springer, for our very kind cooperation that led
to the successful completion of this book project.

Cluj-Napoca, Romania Gheorghe Moroşanu


Contents

1 Introduction 1
1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Real Numbers . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Complex Numbers . . . . . . . . . . . . . . . . . . . . 15
1.5 Linear Spaces . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Metric Spaces 31
2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Completeness . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Compact Sets . . . . . . . . . . . . . . . . . . . . . . . 40
2.4 Continuous Functions on Compact Sets . . . . . . . . . 44
2.5 The Banach Contraction Principle . . . . . . . . . . . 55
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3 The Lebesgue Integral and Lp Spaces 65


3.1 Measurable Sets in Rk . . . . . . . . . . . . . . . . . . 65
3.2 Measurable Functions . . . . . . . . . . . . . . . . . . . 71
3.3 The Lebesgue Integral . . . . . . . . . . . . . . . . . . 75
3.4 Lp Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4 Continuous Linear Operators and Functionals 89


4.1 Definitions, Examples, Operator Norm . . . . . . . . . 89
4.2 Main Principles of Functional Analysis . . . . . . . . . 93
4.3 Compact Linear Operators . . . . . . . . . . . . . . . . 96
4.4 Linear Functionals, Dual Spaces, Weak Topologies . . 97
4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5 Distributions, Sobolev Spaces 107


5.1 Test Functions . . . . . . . . . . . . . . . . . . . . . . . 107
5.2 Friedrichs’ Mollification . . . . . . . . . . . . . . . . . . 112
5.3 Scalar Distributions . . . . . . . . . . . . . . . . . . . . 119
5.3.1 Some Operations with Distributions . . . . . . 121
5.3.2 Convergence in Distributions . . . . . . . . . . 122
5.3.3 Differentiation of Distributions . . . . . . . . . 125
5.3.4 Differential Equations for Distributions . . . . . 131
5.4 Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . 143
5.5 Bochner’s Integral . . . . . . . . . . . . . . . . . . . . . 149
5.6 Vector Distributions, W m,p (a, b; X) Spaces . . . . . . . 155
5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 160

6 Hilbert Spaces 165


6.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.2 Jordan–von Neumann Characterization Theorem . . . 168
6.3 Projections in Hilbert Spaces . . . . . . . . . . . . . . 171
6.4 The Riesz Representation Theorem . . . . . . . . . . . 175
6.5 Lax–Milgram Theorem . . . . . . . . . . . . . . . . . . 180
6.6 Fourier Series Expansions . . . . . . . . . . . . . . . . 186
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7 Adjoint, Symmetric, and Self-adjoint Linear


Operators 201
7.1 The Adjoint of a Linear Operator . . . . . . . . . . . . 201
7.2 Adjoints of Operators on Hilbert Spaces . . . . . . . . 204
7.2.1 The Case of Compact Operators . . . . . . . . 205
7.3 Symmetric Operators and Self-adjoint Operators . . . 209
7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 212

8 Eigenvalues and Eigenvectors 217


8.1 Definition and Examples . . . . . . . . . . . . . . . . . 217
8.2 Main Results . . . . . . . . . . . . . . . . . . . . . . . 219
8.3 Eigenvalues of −Δ Under the Dirichlet Boundary
Condition . . . . . . . . . . . . . . . . . . . . . . . . . 226
8.4 Eigenvalues of −Δ Under the Robin Boundary
Condition . . . . . . . . . . . . . . . . . . . . . . . . . 228
8.5 Eigenvalues of −Δ Under the Neumann Boundary
Condition . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.6 Some Comments . . . . . . . . . . . . . . . . . . . . . 232
8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 239

9 Semigroups of Linear Operators 243


9.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 244
9.2 Some Properties of C0 -Semigroups . . . . . . . . . . . 246
9.3 Uniformly Continuous Semigroups . . . . . . . . . . . 252
9.4 Groups of Linear Operators. Definitions and Link
to Operator Semigroups . . . . . . . . . . . . . . . . . 254
9.5 Translation Semigroups . . . . . . . . . . . . . . . . . . 257
9.6 The Hille–Yosida Generation Theorem . . . . . . . . . 260
9.7 The Lumer–Phillips Theorem . . . . . . . . . . . . . . 265
9.8 The Feller–Miyadera–Phillips Theorem . . . . . . . . . 268
9.9 A Perturbation Result . . . . . . . . . . . . . . . . . . 271
9.10 Approximation of Semigroups . . . . . . . . . . . . . . 273
9.11 The Inhomogeneous Cauchy Problem . . . . . . . . . . 279
9.12 Applications . . . . . . . . . . . . . . . . . . . . . . . . 283
9.12.1 The Heat Equation . . . . . . . . . . . . . . . . 283
9.12.2 The Wave Equation . . . . . . . . . . . . . . . 286
9.12.3 The Transport Equation . . . . . . . . . . . . . 288
9.12.4 The Telegraph System . . . . . . . . . . . . . . 291
9.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 293

10 Solving Linear Evolution Equations


by the Fourier Method 297
10.1 First Order Linear Evolution
Equations . . . . . . . . . . . . . . . . . . . . . . . . . 297
10.2 Second Order Linear Evolution
Equations . . . . . . . . . . . . . . . . . . . . . . . . . 304
10.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 309

11 Integral Equations 315


11.1 Volterra Equations . . . . . . . . . . . . . . . . . . . . 315
11.2 Fredholm Equations . . . . . . . . . . . . . . . . . . . 325
11.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 336

12 Answers to Exercises 341


12.1 Answers to Exercises for Chap. 1 . . . . . . . . . . . . 341
12.2 Answers to Exercises for Chap. 2 . . . . . . . . . . . . 343
12.3 Answers to Exercises for Chap. 3 . . . . . . . . . . . . 354
12.4 Answers to Exercises for Chap. 4 . . . . . . . . . . . . 359
12.5 Answers to Exercises for Chap. 5 . . . . . . . . . . . . 365

12.6 Answers to Exercises for Chap. 6 . . . . . . . . . . . . 375


12.7 Answers to Exercises for Chap. 7 . . . . . . . . . . . . 383
12.8 Answers to Exercises for Chap. 8 . . . . . . . . . . . . 390
12.9 Answers to Exercises for Chap. 9 . . . . . . . . . . . . 398
12.10 Answers to Exercises for Chap. 10 . . . . . . . . . . . . 407
12.11 Answers to Exercises for Chap. 11 . . . . . . . . . . . . 417

Bibliography 429
Chapter 1

Introduction

This chapter comprises definitions, notation, and basic results related


to set theory, real and complex numbers, and linear spaces.

1.1 Sets
We assume that the reader is familiar with the basic concepts and
results of set theory. However, we are going to recall or specify some
concepts and symbols that will be frequently used in this book.
First of all, in this book the notation A ⊂ B or B ⊃ A indicates that
every element (member) of the set A is also an element of the set B.
In particular, A ⊂ A. The empty set, i.e., the set with no elements,
will be denoted as usual by ∅. The empty set is a subset of every set
A, ∅ ⊂ A. The sets A, B are equal, A = B, if and only if A ⊂ B and
B ⊂ A.
We assume that the sets
N = {1, 2, . . . } (natural numbers),
Z = {. . . , −2, −1, 0, 1, 2, . . . } (integers), and
Q = {0} ∪ {±m/n; m, n ∈ N, (m, n) = 1} (rational numbers)
are well known, including their axiomatic definitions.
A set A is called countable if there exists an injective function from A
to N. If one can find a bijective function from A to N, then A is called
countably infinite. In particular, N, Z, and Q are countably infinite
sets. In fact, a countable set is either finite or countably infinite.
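The countability of Z can be made concrete by exhibiting a bijection with N. The following Python sketch (illustrative only; the function names are ours) enumerates Z as 0, 1, −1, 2, −2, …:

```python
# A bijection f: N -> Z witnessing that Z is countably infinite,
# enumerating the integers as 0, 1, -1, 2, -2, ...
# (Here N = {1, 2, 3, ...}, as in the text.)

def f(n: int) -> int:
    # even n -> positive integers, odd n -> 0 and the negatives
    return n // 2 if n % 2 == 0 else -(n // 2)

def f_inverse(k: int) -> int:
    return 2 * k if k > 0 else -2 * k + 1

values = [f(n) for n in range(1, 12)]
print(values)  # [0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5]

# f_inverse undoes f, so f is injective and onto Z:
assert all(f_inverse(f(n)) == n for n in range(1, 1000))
```

The same zig-zag idea, applied to pairs (m, n), yields an enumeration of Q.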


Ordered Sets. A partial order on a given set A is a binary relation


≤ over A satisfying the following conditions for x, y, z ∈ A: (a) x ≤ x;
(b) if x ≤ y and y ≤ x, then x = y; (c) if x ≤ y and y ≤ z, then x ≤ z.
We say that x < y if x ≤ y and x ≠ y. The symbols ≥ and > have
natural meanings: x ≥ y iff y ≤ x, and x > y iff y < x.
If A is endowed with a partial order, then A is called a partially ordered
set. For example, N is partially ordered with respect to the divisibility
relation (m ≤ n if m is a divisor of n); also, the set of subsets of a given
set S is partially ordered by the inclusion relation. Note that in these
examples there are pairs of elements which are not comparable with
respect to the corresponding order, which is why the order is called
partial.
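As a quick sanity check, the three axioms for the divisibility order can be verified on a finite sample of N (a sketch, not a proof; the helper name `divides` is ours):

```python
# Checking the partial-order axioms (a)-(c) for the divisibility
# relation on a finite sample of N.

def divides(m: int, n: int) -> bool:
    return n % m == 0

S = range(1, 50)

# (a) reflexivity
assert all(divides(x, x) for x in S)
# (b) antisymmetry
assert all(x == y for x in S for y in S
           if divides(x, y) and divides(y, x))
# (c) transitivity
assert all(divides(x, z) for x in S for y in S for z in S
           if divides(x, y) and divides(y, z))

# Incomparable pairs exist, so the order is only partial:
assert not divides(2, 3) and not divides(3, 2)
```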
If A is a set with a partial order ≤, then a subset B ⊂ A is said
to be totally ordered (or a chain) if any two elements x, y ∈ B are
comparable, i.e., either x ≤ y or y ≤ x (including the case x = y).
Let B be a subset of A. An element z ∈ A is an upper bound for B if
x ≤ z for all x ∈ B. If B has an upper bound, it is said to be bounded
above. An element m ∈ A is a maximal element of A if there is no
x ∈ A, x ≠ m, such that m ≤ x. A maximal element of A is not
necessarily an upper bound for A.
The set A is called inductive if any totally ordered subset of A has an
upper bound.
Now, let us recall an important result which is known as Zorn’s
Lemma1 :

Theorem 1.1 (Zorn’s Lemma). Every nonempty, partially ordered,


inductive set has a maximal element.

If B is a nonempty subset of a partially (possibly totally) ordered set


A, the supremum of B, denoted sup B, is defined as the least upper
bound of B. An element b ∈ A is the least upper bound of B if and
only if

(i) x ≤ b for all x ∈ B;

(ii) if a < b then a is not an upper bound of B, i.e., there exists an


x ∈ B such that a < x.

If sup B exists, then it is unique. If B has a greatest element b (i.e.,


x ≤ b for all x ∈ B), then b = sup B.
¹ Max August Zorn, German mathematician, 1906–1993.

1.2 Sequences
A sequence in a nonempty set X is an ordered list of elements from
X, and can be defined as a function f : D → X whose domain D
is a countable, totally ordered set. The case when D is finite is not
considered in this book. We shall mostly consider that D = N and
the sequence is usually denoted (an )n∈N , or simply (an ), where an =
f (n) for all n ∈ N. Sometimes we consider infinite subsets of N, for
instance, D = {m, m + 1, . . . }, m ∈ N, m > 1, and in this case the
sequence is denoted (an )n≥m . A sequence can also be indicated by
listing its terms: (an )n∈N = (a1 , a2 , . . . ). For example, (1, 3, 5, 7, . . . )
is the sequence of odd natural numbers. It is worth pointing out
that a term (element) can appear several times in a sequence, e.g.,
(an )n∈N = (0, 1, 0, 1, 0, 1, . . . ), where a2k−1 = 0 and a2k = 1 for all
k ∈ N.
A subsequence of a given sequence (an )n∈N = (a1 , a2 , . . . ) is a new
sequence (bk )k∈N , obtained by removing some terms from (a1 , a2 , . . . )
and preserving the order of the remaining terms, i.e.,

bk = ank , k ∈ N,

where n1 < n2 < · · ·


We close this section by noting that further details on sequences will
be discussed later.

1.3 Real Numbers


While everybody feels comfortable dealing with rational numbers, in
order to understand the larger set of real numbers some effort is
needed.
Real numbers are needed since the set of rational numbers Q is not
sufficiently large for many purposes. For example, the equation p² = 2 has no solution in Q. This assertion was first proved by Euclid.² In fact, it was observed that the diagonal and the side of any square are incommensurable, i.e., the length p of the diagonal of the unit square is not a rational number. Indeed, p must satisfy the equation p² = 2.
One needs to find a number p (which cannot be a rational one) to
represent the length of that diagonal. Many other similar examples
² Greek mathematician, known as the father of Geometry, born around 330 BC, presumably in Alexandria, Egypt.

appear when trying to express areas, volumes, weights, etc. So, it was
really necessary to enlarge the set Q to obtain a set R, called the set of
real numbers, within which inconveniences such as those described above do not occur. The elements of R \ Q will be called irrational numbers. In particular, the irrational number √2 will be the precise representation for the length of the diagonal of the unit square. In fact, we will see that the equation p² = 2 discussed above has two solutions in R, +√2 and −√2.
Roughly speaking, R is the completion of Q, as we will explain below.
First of all, let us recall an axiomatic definition of R: R is an ordered
field, containing Q as a subfield, and having the least upper bound
property. More precisely, R, endowed with two internal operations,
addition and multiplication, denoted “+” and “·”, and a total order,
denoted “≤”, satisfies the following axioms:

(A1) x + y = y + x for all x, y ∈ R;

(A2) (x + y) + z = x + (y + z) for all x, y, z ∈ R;

(A3) there exists an element 0 ∈ R such that x + 0 = x for all


x ∈ R;

(A4) for all x ∈ R there exists an element −x ∈ R such that


x + (−x) = 0;

(M1) xy = yx for all x, y ∈ R (note that here and in what follows


x · y is also denoted xy);

(M2) (xy)z = x(yz) for all x, y, z ∈ R;

(M3) there exists an element 1 ∈ R, 1 ≠ 0, such that 1 · x = x for


all x ∈ R;

(M4) for all x ∈ R  {0} there exists an element x−1 ∈ R (called


the inverse of x, also denoted x1 or 1/x) such that x · x−1 = 1;

(D) x(y + z) = xy + xz for all x, y, z ∈ R (the distributive law);

(O1) if x, y ∈ R and x ≤ y, then x + z ≤ y + z for all z ∈ R;

(O2) if x, y ∈ R and x ≥ 0, y ≥ 0, then xy ≥ 0;

(LUBP) for every nonempty subset A of R that is bounded above (i.e.,


A has an upper bound) there exists sup A ∈ R.

The axiom (LUBP) is called the least upper bound property (which
is why it is so denoted) or the completeness axiom (this name will be
clarified in the following).
Remark 1.2. The fact that Q is a subfield of R means that Q ⊂ R and
the operations of addition and multiplication in R are also internal
operations in Q. In fact, any ordered field K contains a subfield QK
which is isomorphic to Q. Indeed, the function g : Q → K, defined by
g(m/n) = (m · 1K ) · (n · 1K )−1 , is an injective morphism, so g(Q) is a
subfield of K isomorphic to Q. Thus, the condition from the definition
above, that R contains Q as a subfield is superfluous if we admit that
Q is unique up to isomorphism. We merely wanted to make it clear
that R is an extension of Q.
Remark 1.3. It is worth pointing out that the extension from rational
numbers to real numbers is the result of a long investigative process
extended over more than 2000 years. The problem was clarified in
the nineteenth century. There are several models for R defined by
the above system of axioms, such as the Stolz–Weierstrass model,3
based on decimal expansions; Dedekind’s model,4 based on the so-
called Dedekind cuts and the Cantor–Méray model.5 All these
models are based on approximation (as are all models of R). We shall
describe the Cantor–Méray construction which involves Cauchy se-
quences of rational numbers and uses the basic properties of Q as an
ordered field. Intuitively speaking, according to this construction, R
will consist of all rational numbers, plus “limits” of Cauchy6 sequences
in Q which are not rational numbers. The most important step in this
construction (completion procedure) will be to show that the com-
pleteness axiom is satisfied by this model, denoted RC−M (C − M
comes from Cantor–Méray), thus ensuring that any Cauchy sequence
of rational numbers is “convergent” (has a “limit”) in RC−M . But
such “limits” cannot be used in this construction (one cannot define
real numbers by themselves!), so instead we consider as elements of
RC−M the equivalence classes of Cauchy rational sequences (two se-
quences being equivalent if the corresponding sequence of differences
³ Otto Stolz, Austrian mathematician, 1842–1905; Karl Weierstrass, German, known as the father of modern analysis, 1815–1897.
⁴ Richard Dedekind, German mathematician, 1831–1916.
⁵ Georg Cantor, German mathematician, 1845–1918; Charles Méray, French mathematician, 1835–1911.
⁶ Augustin-Louis Cauchy, French mathematician, engineer and physicist, 1789–1857.

approaches zero); one considers equivalence classes because the se-


quence which is supposed to define (“converge to”) a real number is
not unique.
Finally, we will prove that any two copies of R are isomorphic, thus
concluding that R is unique up to isomorphism.
Before presenting in detail the Cantor–Méray model, we will make a
few comments and derive some abstract results regarding R as defined
by the axioms above.
Remark 1.4. It is easily seen that (LUBP) implies that for any non-
empty set A ⊂ R which is bounded below (i.e., has a lower bound),
there exists the greatest lower bound of A, denoted inf A ∈ R. In fact,
inf A = − sup {x ∈ R; −x ∈ A}. The converse implication is also true,
so one may replace (LUBP) by this equivalent statement.
Remark 1.5. It is worth pointing out that the (LUBP) is precisely
what makes the difference between R and Q. Indeed, Q is an ordered
field, but does not satisfy the (LUBP), as illustrated by the following
counterexample:
Let A ⊂ Q denote the set {p ∈ Q : p > 0, p² < 3}. A is nonempty, since 1 ∈ A. Obviously, A is bounded above (e.g., 2 is an upper bound of A). Assume by contradiction that there exists a number α ∈ Q which is the least upper bound of A, α = sup A. Then α ≥ 1 and we need to examine the following three possibilities: α² < 3, α² = 3, and α² > 3.
If α² < 3, then (2α + 3)/(α + 2) > α, and (2α + 3)/(α + 2) ∈ A, so α is not even an upper bound of A.
The case α ∈ Q, α² = 3 is impossible (prove it!).
Finally, if α² > 3, then β := (2α + 3)/(α + 2) ∈ Q, β > 0 (since α ∈ Q, α ≥ 1), and α − β = (α² − 3)/(α + 2) > 0, hence β < α. On the other hand, 3 − β² = (3 − α²)/(α + 2)² < 0, so β² > 3, and hence p < β for every p ∈ A. It follows that β is an upper bound for A, with β < α. This contradicts the fact that α = sup A.
Since none of the above cases is possible, there is no rational number α such that α = sup A. Therefore Q does not satisfy the (LUBP).
Note that if A is considered as a subset of R, then there exists sup A = √3 ∈ R \ Q (see below).
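The map α ↦ (2α + 3)/(α + 2) used in the argument can be iterated with exact rational arithmetic: starting from 1 ∈ A it produces ever larger elements of A whose squares approach 3, illustrating why A has no rational least upper bound (an illustrative sketch using Python's `fractions` module):

```python
from fractions import Fraction

# The map from Remark 1.5: if a^2 < 3, then g(a) is a strictly
# larger element of A, so no element of A is an upper bound.
def g(a: Fraction) -> Fraction:
    return (2 * a + 3) / (a + 2)

a = Fraction(1)              # 1 is in A since 1^2 < 3
for _ in range(6):
    b = g(a)
    assert b > a and b * b < 3   # g(a) beats a while staying in A
    a = b

print(a, float(a * a))       # the squares approach 3 but never reach it
```

With six iterations the square of the current rational already agrees with 3 to several decimal places, yet remains strictly below it.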
Now, we present a result known as the Archimedean⁷ property:

⁷ Archimedes of Syracuse, 287–212 BC.

Theorem 1.6. If x, y ∈ R and x > 0, then there exists n ∈ N such


that nx > y.
Proof. Assume that, on the contrary, nx ≤ y for all n ∈ N, so the set
A = {nx; n ∈ N} is bounded above. Then the (LUBP) implies that
there exists α = sup A ∈ R. Since α − x < α, there exists an element
of A, say mx, with m ∈ N, such that α − x < mx which is equivalent
to α < (m + 1)x ∈ A. This contradicts the fact that α is an upper
bound of A.

Theorem 1.7. Q is dense in R, i.e., between any two distinct real


numbers there is a rational number.
Proof. Let x, y ∈ R, x < y. Since y − x > 0 it follows by the
Archimedean property that there exists an n ∈ N such that
n(y − x) > 1. (1.3.1)
By the same Archimedean property there exist w, z ∈ N such that
−w < −nx < z. In fact, w can be replaced by m := − sup{r ∈ Z :
−w ≤ r < −nx}, so nx < m. Moreover,
nx < m ≤ nx + 1. (1.3.2)
By (1.3.1) and (1.3.2) we can conclude that x < m/n < y.

Theorem 1.8 (existence of n-th roots of positive reals). For all x ∈ R,


x > 0, and for all n ∈ N, n ≥ 2, there exists a unique y ∈ R, y > 0, such that yⁿ = x.
Proof. The uniqueness of y follows from the implication 0 < y₁ < y₂ ⇒ y₁ⁿ < y₂ⁿ. To prove the existence of y consider the set A = {t ∈ R; t > 0, tⁿ < x}. A is nonempty, since it contains t₁ = x/(1 + x). Indeed, t₁ⁿ < t₁ < x. A is also bounded above (for example, 1 + x is an upper bound for A). By the (LUBP) there exists y = sup A ∈ R, y > 0. Let us prove that yⁿ = x. Assuming that yⁿ < x, we have for 0 < ε < 1,
(y + ε)ⁿ − yⁿ = ε[(y + ε)ⁿ⁻¹ + y(y + ε)ⁿ⁻² + · · · + yⁿ⁻¹] < εn(y + 1)ⁿ⁻¹.
Hence
(y + ε)ⁿ < yⁿ + εz, (1.3.3)
where z = n(y + 1)ⁿ⁻¹. By the Archimedean property, there is a k ∈ N, k ≥ 2, such that ε = 1/k satisfies
εz < x − yⁿ. (1.3.4)

From (1.3.3) and (1.3.4) it follows that y + ε ∈ A which contradicts


the fact that y = sup A. We can also show that yⁿ > x leads to a contradiction. Hence, yⁿ = x.
The n-th root y of the real number x > 0 is denoted ⁿ√x (√x if n = 2) or x^{1/n}. At this moment, we can see that in particular the equation p² = 2 can be solved in R: p² = 2 ⇔ p² − (√2)² = 0 ⇔ (p − √2)(p + √2) = 0, so there are two solutions, p = √2 and p = −√2. The number √2, which is irrational, represents in particular the length of the diagonal of the unit square. So, the difficulty pointed out by Euclid can be handled in R. Similarly, √3 is an irrational number representing the length of the diagonal of the unit cube.
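The existence proof of Theorem 1.8 suggests a computation: keep one point of A and one upper bound of A, and repeatedly bisect. The sketch below (our own, using exact rational arithmetic; the starting values are those from the proof) traps the n-th root between the two:

```python
from fractions import Fraction

# Approximating y = sup{t > 0 : t^n < x} (Theorem 1.8) by bisection:
# lo always stays in A, hi always stays an upper bound of A,
# and the interval [lo, hi] shrinks geometrically around the n-th root.
def nth_root_bounds(x: Fraction, n: int, steps: int = 40):
    lo = x / (1 + x)          # lies in A, as in the proof
    hi = 1 + x                # an upper bound of A, as in the proof
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n < x:
            lo = mid          # mid is in A
        else:
            hi = mid          # mid is an upper bound of A
    return lo, hi

lo, hi = nth_root_bounds(Fraction(2), 2)
assert lo ** 2 < 2 < hi ** 2            # sqrt(2) is trapped between them
assert hi - lo < Fraction(1, 10**9)     # the gap is already tiny
```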
Remark 1.9. Sometimes it is useful to represent numbers by points on
a straight line. First, let us mark arbitrarily two distinct points O
and A on the straight line to represent the numbers 0 and 1. The line
segment OA is called the unit segment. If we choose a point P to the
right of A, such that OP consists of m unit segments, m ∈ N, m ≥ 2,
then P represents the natural number m. The negative integers are
similarly represented by points on the left of O, following the natural
order . . . , −3, −2, −1. So now we have a directed straight line, called
the number line, including the positive half-line (on the right of O)
and the negative half-line. One can also associate with any rational
number a point on the number line. For example, if one divides OA into 2 equal parts and chooses a point R on the positive half-line, such that OR is equal to 3 such parts, then R represents 3/2. Obviously, the points corresponding to distinct rational numbers are distinct too. Note that the set of points on the number line corresponding to all the rational numbers does not cover the number line. For example, the point D corresponding to the length of the diagonal of the unit square (i.e., √2) is on the number line (D being constructible by using a ruler
and compass). We will discuss later the representation of irrational
numbers by points on the number line.

Sequences of Real Numbers. A sequence (an )n∈N in R is said to be


increasing (or nondecreasing) if an ≤ an+1 for all n ∈ N. If an < an+1
for all n ∈ N, then (an ) is called strictly increasing.
Similarly, if the order relations “≤” and “<” are replaced by “≥”
and “>”, we obtain the definitions for a decreasing (or nonincreasing)
sequence, and a strictly decreasing sequence, respectively.

A sequence (an )n∈N in R is said to be bounded above (bounded below)


if there exists an M ∈ R such that an ≤ M (an ≥ M , respectively) for
all n ∈ N. If (an ) is bounded both above and below, then it is called
bounded.
A sequence (an )n∈N in R is said to be convergent if there exists a
number a ∈ R (called limit of (an )) such that
∀ε ∈ R, ε > 0, ∃N = N (ε) ∈ N such that ∀ n > N, |an − a| < ε.
Here, | · | means the absolute value function, i.e., |x| = x if x ≥ 0, and
|x| = −x if x < 0. The above definition (of a convergent sequence)
will be discussed again later in a more general framework. Here we
are interested in some properties of sequences of real numbers.
It is easily seen that any convergent sequence is bounded, and its limit
is unique.
Next, we state the so-called Monotone Convergence Theorem:

Theorem 1.10 (Monotone Convergence Theorem). Any sequence


(an )n∈N in R which is increasing (decreasing) and bounded is
convergent.

Proof. We consider the case when (an ) is increasing and bounded (the
other case is similar). Since the set of all an ’s (where repetitions are
eliminated) is bounded above, it follows by (LUBP) that there exists
its supremum a ∈ R. Thus, for all ε ∈ R, ε > 0, there exists an N ∈ N such that a − ε < aN . Since (an ) is increasing, we have a − ε < an for
all n > N , so
|an − a| = a − an < ε ∀n > N.
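A concrete instance of Theorem 1.10, checked with exact arithmetic (an illustrative sketch): the sequence aₙ = 1 − (1/2)ⁿ is increasing and bounded above, and its terms eventually stay within any ε of its supremum 1.

```python
from fractions import Fraction

# a_n = 1 - (1/2)^n: increasing, bounded above, with sup = limit = 1.
a = [1 - Fraction(1, 2) ** n for n in range(1, 60)]

assert all(x < y for x, y in zip(a, a[1:]))   # strictly increasing
assert all(x < 1 for x in a)                  # bounded above by sup = 1

# The estimate from the proof: |a_n - 1| < eps for all n > N.
eps = Fraction(1, 10**6)
N = 20                                        # since 2^20 > 10^6
assert all(1 - x < eps for x in a[N:])
```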

We continue with the following result known as Bolzano–Weierstrass’


Theorem.8

Theorem 1.11 (Bolzano–Weierstrass). Every bounded sequence in R


has a convergent subsequence.

Proof. Let (an )n∈N be a bounded sequence in R. Let k be a natural


number with the property ak > am for all m > k. Assume there are
infinitely many such k’s, say k = nj , n1 < n2 < · · · < nj < · · · . Then,
the subsequence (anj )j∈N is strictly decreasing, hence convergent since
it is also bounded (cf. Theorem 1.10).
⁸ Bernard Bolzano, Bohemian mathematician, logician, philosopher, and theologian, 1781–1848.

If the set of such k’s is finite (possibly empty), we denote by K the


maximum of such k's. Obviously, for n1 = K + 1 there exists an n2 > n1 such that an1 ≤ an2 . Now, since n2 does not belong to the set of k's, there exists an n3 > n2 such that an2 ≤ an3 . Continuing this
procedure we obtain a subsequence (anj )j∈N which is increasing and
bounded, hence convergent (cf. Theorem 1.10).
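The "peak" device in the first part of the proof can be illustrated on a finite list (our own sketch; for a finite list a peak is simply an index beating everything after it): any two peaks k < k′ satisfy a_k > a_{k′}, so the peaks always form a strictly decreasing run.

```python
# A "peak" (as in the proof of Theorem 1.11) is an index k with
# a_k > a_m for all m > k.  Reading the peaks left to right yields
# a strictly decreasing subsequence -- the first case of the proof.
def peak_indices(a):
    return [k for k in range(len(a))
            if all(a[k] > a[m] for m in range(k + 1, len(a)))]

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
peaks = peak_indices(a)
print([a[k] for k in peaks])   # [9, 6, 5] -- strictly decreasing
assert all(a[i] > a[j] for i, j in zip(peaks, peaks[1:]))
```

When the peaks run out, the second case of the proof takes over, climbing from index K + 1 to build an increasing subsequence instead.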

A sequence (an )n∈N in R is said to be a Cauchy sequence if

∀ε ∈ R, ε > 0, ∃N = N (ε) ∈ N such that


∀ n, m > N, |an − am | < ε.

Theorem 1.12. A sequence in R is Cauchy if and only if it is con-


vergent.
Proof. Let (an )n∈N be a Cauchy sequence in R. It is easily seen that
(an ) is bounded. Thus, by Theorem 1.11, there is a convergent subse-
quence, say (ank )k∈N . Let a ∈ R be its limit. By the triangle inequality
(which obviously holds in R), we have

|an − a| ≤ |an − ank | + |ank − a|.

Using this inequality we easily conclude that (an ) is convergent (with


the same limit a).
The converse implication is trivial.

The facts recalled above, derived from the axiomatic definition of R,


are important in real analysis and also help us understand the Cantor–
Méray model for R.

The Cantor–Méray Construction. Assume that Q (the ordered


field of rational numbers) is known. We want to extend Q to obtain
a larger ordered field satisfying in addition the (LUBP). Denote by
SQ the collection of all Cauchy sequences of rational numbers. When
defining a Cauchy sequence in Q we require ε ∈ Q, ε > 0 (since the
extension of Q is not yet known). Define the following equivalence
relation in SQ

(an ) ∼ (bn ) iff ∀ε ∈ Q, ε > 0, ∃N ∈ N such that


∀n > N, |an − bn | < ε.

For example, the sequences (an ), (bn ), (cn ), defined by an = 1/n,


bn = n/(n2 + 1), cn = 0 for all n ≥ 1, belong to the same equivalence

class, i.e., the class of the constant sequence (0, 0, . . . ), which can be
identified with 0 ∈ Q. We identify any r ∈ Q with the equivalence
class of the constant sequence (r, r, . . . ).
Let us denote by RC−M the set of all equivalence classes in SQ (with
respect to the equivalence relation defined above). Obviously, Q can
be regarded as a subset of RC−M (in view of the natural identification
mentioned above).
Now, one defines in a natural manner the operations of addition and
multiplication in RC−M . Specifically, if a, b are classes in RC−M with
representatives (an ), (bn ) ∈ SQ , then a + b and ab are defined as the
equivalence classes of (an + bn ) and (an bn ), respectively. Also, a ≤ b
if for all ε ∈ Q, ε > 0, there exists an N ∈ N such that bn − an ≥ −ε
for all n ≥ N . Note that the strict inequality a < b (i.e., a ≤ b
and a ≠ b) can be equivalently expressed as follows: there exists an
ε0 ∈ Q, ε0 > 0, such that bn −an ≥ ε0 for all n large enough. Likewise,
these definitions do not depend on specific representatives.
It is easily seen that RC−M is an ordered field satisfying axioms (A1)−
(A4), (M 1) − (M 4), (D), and (O1) − (O2).
Let us now prove that RC−M also satisfies the (LUBP). Let Ω be a
nonempty subset of RC−M which is bounded above, with an upper bound
a ∈ RC−M . One may assume that a is the class of a constant
sequence (u0 , u0 , . . . ) with u0 ∈ Q (if this is not the case, we can use
the information that a Cauchy sequence in Q has an upper bound in Q,
so a can be replaced by the class of a constant sequence (u0 , u0 , . . . ),
where u0 is a large rational number).
Let us pick an s0 ∈ Ω and a rational number l0 such that l0 < s0 ,
where l0 is identified with the class of the constant sequence (l0 , l0 , . . . ).
Next, we construct two sequences of rational numbers (un ) and (ln ) as
follows: u1 = u0 and l1 = l0 , then, successively, for n = 1, 2, . . . , either
un+1 = (un + ln )/2, ln+1 = ln if (un + ln )/2 is an upper bound of Ω, or
un+1 = un and ln+1 = (un + ln )/2 if (un + ln )/2 is not an upper bound
of Ω. By induction we can see that un is an upper bound of Ω for all
n ∈ N, while ln is not an upper bound of Ω for any n ∈ N. Obviously,
(un ) and (ln ) are Cauchy sequences, so their classes u, l ∈ RC−M , and
in fact u = l, since |un − ln | = un − ln = (u0 − l0 )/2^{n−1} , n ≥ 1. It
is also obvious that u is an upper bound of Ω. Let us prove that u is
the least upper bound: u = sup Ω. Assume that there exists a smaller
upper bound, say v ∈ RC−M , v < u = l. Since lk ≤ lk+1 for all k ∈ N
and (lk ) converges to l, there exists an N ∈ N such that v < lN . But
lN is not an upper bound of Ω, hence v cannot be an upper bound of
Ω either, leading to a

contradiction. Therefore, RC−M satisfies all the axioms and is indeed


a model for R.
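The bisection scheme used in this proof is constructive and easy to simulate. Below is a minimal sketch (our own illustration, not part of the text), taking for Ω the set {x ∈ Q : x² < 2}, whose supremum is the irrational √2: starting from a rational upper bound u0 and a rational l0 that is not an upper bound, the midpoint rule generates the two Cauchy sequences (un ), (ln ) of rationals squeezing sup Ω.

```python
from fractions import Fraction

def is_upper_bound(q):
    # Upper bound test for the illustrative set Omega = {x in Q : x^2 < 2}:
    # q is an upper bound iff q >= 0 and q^2 >= 2.
    return q >= 0 and q * q >= 2

def bisect_sup(u0, l0, steps):
    """Run the (u_n), (l_n) construction from the proof for `steps` iterations."""
    u, l = Fraction(u0), Fraction(l0)
    for _ in range(steps):
        mid = (u + l) / 2
        if is_upper_bound(mid):
            u = mid          # mid is an upper bound: shrink from above
        else:
            l = mid          # mid is not an upper bound: grow from below
    return u, l

u, l = bisect_sup(2, 1, 50)
# u_n - l_n = (u0 - l0)/2^(n-1), so both rationals approximate sqrt(2)
print(float(u), float(l))
```

Both endpoints are exact rationals, yet their common limit is not: this is precisely the gap in Q that the Cantor–Méray construction fills.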
Remark 1.13. Let us summarize: any element x ∈ RC−M is the equiv-
alence class of a Cauchy sequence in Q, say (rn ) (this could be a
constant sequence if x ∈ Q); since RC−M is a model for R (a complete
ordered field), we know that (rn ) is convergent (see Theorem 1.12); its
limit (which is independent of the choice of (rn ) in the class x) can
be identified with x. So now we have a clear representation of RC−M ,
including rational and irrational numbers.

The Real Number System (Model) is Unique up to Isomor-


phism. Let R̂ be another model for R. As before, we admit that Q
is unique up to isomorphism, so Q is a subfield of both RC−M and R̂.
Since Q is dense in R̂ (see Theorem 1.7), for any x ∈ R̂, there exists a
sequence of rational numbers (rn ) that converges to x (this sequence
can be the constant sequence (x, x, . . . ) if x ∈ Q). Of course, (rn ) is a
Cauchy sequence. We associate with such an x the class of (rn ) with
respect to the equivalence relation “∼” defined above. So we have defined
a mapping φ : R̂ → RC−M , φ(x) = the class of (rn ). It is easily seen
that φ is a bijection, and
φ(x + y) = φ(x) + φ(y) ∀x, y ∈ R̂,
φ(x · y) = φ(x) · φ(y) ∀x, y ∈ R̂,
x > 0 =⇒ φ(x) > 0.
Therefore, R̂ is isomorphic to RC−M , hence any two real number mod-
els are isomorphic. So in what follows we will consider that the real
number system is unique and denote it by R.

The Dedekind–Cantor Axiom on Continuity of a Straight


Line. We discussed in Remark 1.9 how to represent rational numbers
on a directed straight line. Now, taking into account the Cantor–Méray
construction, we can complete the procedure by representing irrational
numbers. We see that to every real number there corresponds a unique
point of the directed straight line, and the correspondence is one-to-
one. The Dedekind–Cantor axiom stipulates that there are no gaps on
the line after representing all real numbers, that is there is a one-to-
one correspondence between R and the points of the directed straight
line. The directed straight line will be called the real line, and real
numbers will be sometimes called points.

The Extended Real Number System. Sometimes it is necessary to


describe mathematically what happens “beyond” real numbers. For
example, 1/x gets closer and closer to zero when x gets larger and
larger. Having in mind that the point on the real line corresponding
to x goes far away to the right, we usually say that x tends to infinity,
and write x → +∞. The fact that 1/x tends to zero as x → ∞ can be
written as 1/(+∞) = 0.
Similar situations require the introduction of the symbol −∞. So we
are led to the so-called extended real number system,

R̄ := R ∪ {−∞, +∞} .

The usual order in R is preserved, and we define

−∞ < x < +∞ ∀x ∈ R.

Then +∞ (−∞) is an upper bound (lower bound, respectively) of
every nonempty subset of R̄. Moreover, any nonempty subset of R̄ has a
least upper bound. For instance, E = {x + 1/x : x ∈ R, x ≠ 0} has
sup E = +∞ and inf E = −∞. The symbol +∞ is also denoted by ∞.
In accordance with our intuition, we adopt the following conventions
x + ∞ = ∞, x − ∞ = −∞, x/∞ = x/(−∞) = 0 ∀x ∈ R;
x · ∞ = ∞, x · (−∞) = −∞ ∀x ∈ R, x > 0;
x · ∞ = −∞, x · (−∞) = +∞ ∀x ∈ R, x < 0;
∞ + ∞ = ∞, −∞ − ∞ = −∞,
∞ · ∞ = ∞, ∞ · (−∞) = −∞, (−∞) · (−∞) = +∞.

On the other hand, operations like



0 · (±∞), ∞ − ∞,

are not accepted. For example, x/(1 + x2 ) approaches 0 as x → ∞,
while x2 /(1 + x) approaches +∞ as x → ∞. Thus, the quotient of
two large numbers may approach either 0 or ∞. That is why we say
that ∞/∞ does not make sense.
Note that R̄ does not form a field (why?).

We assume familiarity of the reader with sequences and series of real


numbers. For information see, for example, [33, 41, 42].

The Number e. Sometimes checking whether a real number is irra-


tional is not a trivial task. The number often known as e is an example
in this respect. It is defined as the sum of a series, namely

e = \sum_{n=0}^{\infty} \frac{1}{n!} ,

where n! = 1 · 2 · 3 · · · n for n ≥ 1, and 0! = 1. Let sn denote the partial
sum of the series, i.e., sn = \sum_{k=0}^{n} \frac{1}{k!} . By the ratio test we see that the
series converges, hence e ∈ R. More precisely,

2 < e < 1 + \sum_{k=0}^{\infty} \frac{1}{2^k} = 3. (1.3.5)

Note that

e − sn < \frac{1}{(n+1)!} \sum_{k=0}^{\infty} \frac{1}{(k+1)^2} ,

hence

0 < e − sn < \frac{2}{n! \, n} . (1.3.6)
Let us now prove that e is irrational. Assume the contrary, that
e = p/q, where p, q ∈ N, (p, q) = 1. In fact, q > 1 (see (1.3.5)).
From (1.3.6) we infer that
0 < q! \left( \frac{p}{q} − s_q \right) < \frac{2}{q} . (1.3.7)

Observing that q!sq ∈ N, we have m := q! (p/q − sq ) ∈ N. So we deduce
from (1.3.7) that 0 < m < 1, which is impossible (there is no integer
between 0 and 1). Therefore e is irrational, as claimed.
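As a quick sanity check (ours, not part of the proof), estimate (1.3.6) can be tested with exact rational arithmetic; here e is replaced by the far more accurate partial sum s60:

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    # s_n = sum_{k=0}^{n} 1/k!
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

e_approx = partial_sum(60)   # s_60 agrees with e to far more digits than needed here

for n in range(1, 15):
    gap = e_approx - partial_sum(n)
    # gap = s_60 - s_n is slightly below e - s_n, so inequality (1.3.6) must hold for it
    assert 0 < gap < Fraction(2, factorial(n) * n)
```

Since all quantities are Fractions, the comparisons are exact; no floating-point rounding enters the check.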
Remark 1.14. By an argument from Rudin [42, p. 64] we see that e
is also the limit of the sequence (xn )n∈N defined by xn = (1 + 1/n)^n .
Using the binomial formula we can write

xn = 1 + 1 + \frac{1}{2!} \left(1 − \frac{1}{n}\right) + \frac{1}{3!} \left(1 − \frac{1}{n}\right)\left(1 − \frac{2}{n}\right) + \cdots
+ \frac{1}{n!} \left(1 − \frac{1}{n}\right)\left(1 − \frac{2}{n}\right) \cdots \left(1 − \frac{n−1}{n}\right).

Then, for all n, m ∈ N, n ≥ m ≥ 2, we have

1 + 1 + \frac{1}{2!} \left(1 − \frac{1}{n}\right) + \cdots + \frac{1}{m!} \left(1 − \frac{1}{n}\right)\left(1 − \frac{2}{n}\right) \cdots \left(1 − \frac{m−1}{n}\right) ≤ xn ≤ sn ,

which implies

1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{m!} ≤ \liminf xn ≤ \limsup xn ≤ e.
Therefore, e = lim xn exists, as claimed.
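A brief numerical illustration of this limit (purely for intuition; the convergence is slow, with error roughly e/(2n)):

```python
import math

def x(n):
    # x_n = (1 + 1/n)^n
    return (1 + 1 / n) ** n

# x_n increases toward e; each tenfold increase of n shrinks the error about tenfold
errors = [math.e - x(10 ** k) for k in range(1, 5)]
assert all(e1 > e2 > 0 for e1, e2 in zip(errors, errors[1:]))
print(errors)
```

Compare this with the partial sums sn of the series, which converge to e factorially fast.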

1.4 Complex Numbers


We assume that the reader is familiar with the complex field. In what
follows we just recall its construction and some notation.
Let C denote the Cartesian9 product R×R equipped with two internal
operations, addition and multiplication, defined as follows:

(x, y) + (u, v) = (x + u, y + v), (x, y) · (u, v) = (xu − yv, xv + yu).

It is easy to check that C is a field, with (0, 0) and (1, 0) in the role of
0 and 1, respectively. In particular, for any z = (x, y) ∈ C, z ≠ (0, 0),
we have z^{−1} = \left( \frac{x}{x^2+y^2} , \frac{−y}{x^2+y^2} \right).

Note that the set R1 := {(x, 0); x ∈ R} is a subfield of C with respect


to the above operations that read in this case

(x, 0) + (u, 0) = (x + u, 0), (x, 0) · (u, 0) = (xu, 0).

Thus any (x, 0) can be identified with x and R1 with these operations
can be identified with R with the usual operations of addition and
multiplication. So R can be viewed as a subfield of C.
Any z = (x, y) ∈ C can be decomposed as z = (x, 0) + (y, 0) · (0, 1),
so in view of the above identification, we can write z = x + yi, where
i := (0, 1). Note that (0, 1)·(0, 1) = (−1, 0), thus we can write i2 = −1;
i is called the imaginary unit.
Summarizing, we can write C = {x + yi; x, y ∈ R} and observe that
the two operations initially defined can be viewed as the addition and
multiplication similar to those used for real numbers if we admit that
i2 = −1.
The elements z = x + yi of C are called complex numbers and C is
known as the complex field or complex number system. For a complex
number z = x + yi, the real numbers x and y are called the real
part and the imaginary part of z, respectively (denoted x = Re z,
9
René Descartes, latinized Renatus Cartesius, French mathematician, philoso-
pher, and scientist, 1596–1650.

y = Im z). Complex numbers z = x + yi can be represented by points


(of coordinates x, y) in the complex plane which is determined by two
orthogonal directed straight lines with the same unit, the x-axis (real
axis) and the y-axis (imaginary axis).
Let z̄ = x − yi be the complex conjugate of z = x + yi. Note that
z · z̄ = x2 + y 2 ∈ R. The number |z| = \sqrt{x^2 + y^2} is called the magnitude of z,
and it represents the length of the segment connecting the origin O of
the complex plane and the point of coordinates x and y corresponding
to z.
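The pair definition above can be transcribed literally into code. A small sketch (the helper names are ours) verifying that i² = −1 and that z · z̄ = x² + y² under the stated multiplication rule:

```python
def c_add(z, w):
    # (x, y) + (u, v) = (x + u, y + v)
    return (z[0] + w[0], z[1] + w[1])

def c_mul(z, w):
    # (x, y) * (u, v) = (xu - yv, xv + yu)
    x, y = z
    u, v = w
    return (x * u - y * v, x * v + y * u)

def conj(z):
    # complex conjugate: (x, y) -> (x, -y)
    return (z[0], -z[1])

i = (0.0, 1.0)
assert c_mul(i, i) == (-1.0, 0.0)          # i^2 = -1

z = (3.0, 4.0)
assert c_mul(z, conj(z)) == (25.0, 0.0)    # z * conj(z) = x^2 + y^2 = |z|^2
```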

1.5 Linear Spaces


Recall that a nonempty set X is said to be a linear space (or vector
space) over a field K if there exist a binary operation on X, called
addition, + : X × X → X, and an external binary operation, called
scalar multiplication, · : K × X → X, such that the following axioms
are satisfied

(A1) (x + y) + z = x + (y + z) ∀x, y, z ∈ X;
(A2) x + y = y + x ∀x, y ∈ X;
(A3) ∃0 ∈ X, called zero, such that x + 0 = x ∀x ∈ X;
(A4) ∀x ∈ X ∃ − x ∈ X such that x + (−x) = 0;
(A5) 1 · x = x ∀x ∈ X, where 1 is the unit element of
the field K;
(A6) α(βx) = (αβ)x ∀α, β ∈ K, ∀x ∈ X;
(A7) (α + β)x = αx + βx ∀α, β ∈ K, ∀x ∈ X;
(A8) α(x + y) = αx + αy ∀α ∈ K, ∀x, y ∈ X.

The first four axioms ensure that X is an Abelian10 group with respect
to addition.
In the following K will be either the field R of real numbers or the field
C of complex numbers, and X will be called a real or complex space,
respectively.
A nonempty subset Y of X which is a linear space with respect to the
same operations is called a subspace of X. In fact, a necessary and

10
Niels Henrik Abel, Norwegian mathematician, 1802–1829.

sufficient condition for a nonempty subset Y of X to be a subspace is


that Y be closed under the operations, i.e.,

∀ x, y ∈ Y, ∀ α ∈ K, x + y ∈ Y, αx ∈ Y.

If S is a nonempty subset of a linear space X, we denote by Span S
the collection of all finite linear combinations of elements of S, i.e.,

Span S = \left\{ \sum_{i=1}^{k} α_i x_i = α_1 x_1 + \cdots + α_k x_k \; ; \; k ∈ N, \; α_i ∈ K, \; x_i ∈ S, \; i = 1, \ldots , k \right\}.

Obviously, Span S is a linear subspace of X, called the linear subspace


generated by S (and S is said to be a system of generators).
We recall that x1 , x2 , . . . , xk ∈ X (where X is a linear space) are said
to be linearly dependent if there exist some scalars α1 , . . . , αk ∈ K,
not all zero, such that α1 x1 + · · · + αk xk = 0. Otherwise, the vectors
x1 , x2 , . . . , xk are called linearly independent (and {x1 , x2 , . . . , xk } is
said to be a linearly independent system). In this case, S = {x1 , x2 ,
. . . , xk } is a basis of the space Y = Span S (which could be the whole
of X), and we say that Y has dimension k, dim Y = k, and any vector
x ∈ Y can be uniquely expressed as a linear combination,

x = \sum_{i=1}^{k} α_i x_i = α_1 x_1 + \cdots + α_k x_k ,

where α1 , . . . , αk ∈ K are called coordinates of x with respect to the


basis S.
A basis is not unique.
A linear space X is infinite dimensional (written as dim X = ∞) if
for any k ∈ N there exist k vectors in X which are linearly indepen-
dent. If X contains only the null vector, then by convention we define
dim X = 0.
Recall that any two linear spaces X, Y are isomorphic if there exists
a bijection φ : X → Y which satisfies

φ(αx + βy) = αφ(x) + βφ(y) ∀ α, β ∈ K, ∀ x, y ∈ X .

If either of the two (isomorphic) spaces is finite dimensional then the


other is also finite dimensional and dim X = dim Y (prove it!).

Scalar Product. An important concept that allows the extension


of some properties of classical Euclidean geometry to general linear
spaces is the scalar product.
A scalar product (or inner product) on a linear space X is a mapping
from X × X to K, denoted (·, ·), which satisfies the following axioms

(a1 ) (x, y) = \overline{(y, x)} ∀ x, y ∈ X ,


(a2 ) (x + y, z) = (x, z) + (y, z) ∀ x, y, z ∈ X ,
(a3 ) (αx, y) = α(x, y) ∀ α ∈ K, ∀ x, y ∈ X ,
(a4 ) (x, x) ≥ 0 ∀x ∈ X, and (x, x) = 0 ⇐⇒ x = 0 .

We have denoted by \overline{(y, x)} the complex conjugate of (y, x) (obviously,
\overline{(y, x)} = (y, x) if K = R). A space X together with such a product is
called an inner product space. It is easily seen that (x, αy) = \overline{α} (x, y)
for all α ∈ K and all x, y ∈ X.
Two vectors x, y ∈ X are called orthogonal if their scalar product
is equal to zero: (x, y) = 0 (this is sometimes denoted x ⊥ y). One
can also define the length of a vector x ∈ X as ‖x‖ = \sqrt{(x, x)}. The
mapping x → ‖x‖ satisfies the following properties:

(i) ‖x‖ = 0 ⇐⇒ x = 0 ;
(ii) ‖αx‖ = |α| · ‖x‖ ∀α ∈ K, ∀ x ∈ X ;
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ ∀ x, y ∈ X .

The mapping ‖ · ‖ : X → [0, ∞) defined above is a norm on X, and X


is a normed space. In general a mapping from X to [0, ∞) satisfying
(i), (ii), (iii) is called a norm on X. A given space may have many
different norms, but the above is a special norm, being generated by a
scalar product.
While (i) and (ii) are trivial, property (iii) (called triangle inequality)
follows from the Bunyakovsky–Cauchy–Schwarz11 inequality:

|(x, y)| ≤ ‖x‖ · ‖y‖ ∀x, y ∈ X , (1.5.8)

which is valid in any normed space whose norm is generated by a scalar


product. Indeed,

11
Viktor Y. Bunyakovsky, Russian mathematician, 1804–1889; Karl Hermann
Amandus Schwarz, German mathematician, 1843–1921.

‖x + y‖2 = ‖x‖2 + 2 Re(x, y) + ‖y‖2
≤ ‖x‖2 + 2|(x, y)| + ‖y‖2
≤ ‖x‖2 + 2‖x‖ · ‖y‖ + ‖y‖2
= (‖x‖ + ‖y‖)2 ,

which clearly implies (iii). As far as (1.5.8) is concerned, its proof is


based on the inequality

0 ≤ ‖x + αy‖2 = ‖x‖2 + 2 Re \overline{α} (x, y) + |α|2 ‖y‖2 , (1.5.9)

for all α ∈ K and all x, y ∈ X. In fact, we can assume x ≠ 0 and y ≠ 0
(otherwise (1.5.8) is trivial). Now replacing in (1.5.9) α = −(x, y)/‖y‖2
yields (1.5.8).
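For concreteness, here is a direct numerical check (our sketch) of (1.5.8) and of the substitution α = −(x, y)/‖y‖² in R³ with the usual dot product; by (1.5.9) that choice of α collapses the right-hand side to ‖x‖² − (x, y)²/‖y‖²:

```python
def dot(x, y):
    # usual real scalar product on R^n
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return dot(x, x) ** 0.5

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, 1.0]

# Bunyakovsky-Cauchy-Schwarz: |(x, y)| <= ||x|| ||y||
assert abs(dot(x, y)) <= norm(x) * norm(y)

# The substitution alpha = -(x, y)/||y||^2 in (1.5.9):
alpha = -dot(x, y) / norm(y) ** 2
lhs = norm([a + alpha * b for a, b in zip(x, y)]) ** 2
rhs = norm(x) ** 2 - dot(x, y) ** 2 / norm(y) ** 2
assert abs(lhs - rhs) < 1e-12   # both sides agree, and rhs >= 0 gives (1.5.8)
```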

We continue with some examples:


Example 1. For a given n ∈ N, consider X = Rn , which is the set of all
ordered n-tuples (here arranged as n × 1 matrices) x = (x1 , . . . , xn )T ,
where x1 , . . . , xn ∈ R. It is easily seen that X = Rn is a linear space
over R with respect to the usual operations of addition and scalar
multiplication:

x + y = (x1 + y1 , . . . , xn + yn ), αx = (αx1 , . . . , αxn ) ,

for all x = (x1 , . . . , xn )T , y = (y1 , . . . , yn )T ∈ X, α ∈ R. The null


(zero) element of X is (0, 0, . . . , 0)T , while the inverse of any x =
(x1 , . . . , xn )T ∈ X with respect to the addition is (−x1 , . . . , −xn )T .
The usual scalar product of X = Rn is defined by

(x, y) = \sum_{i=1}^{n} x_i y_i ∀ x = (x1 , . . . , xn )T , y = (y1 , . . . , yn )T ∈ X ,

and the corresponding norm is

‖x‖ = \sqrt{(x, x)} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} .

If n = 1 the above scalar product is the usual multiplication in R,


while the corresponding norm is the absolute value. If n = 2 or n = 3
then the above scalar product is nothing else but the scalar product
(dot product) of two vectors in the Euclidean plane or space, respec-
tively, while the corresponding norm of a vector represents its length.

Orthogonality of two vectors means the usual geometric orthogonal-


ity. For this reason X = Rn so equipped is called Euclidean n-space.
By extension, a general normed space whose norm is generated by a
scalar (inner) product is called a generalized Euclidean space (or inner
product space, as previously mentioned).
Analogously, Cn is a linear space over C with respect to the usual
operations of addition and scalar multiplication. Here, the usual scalar
product is defined by

(x, y) = \sum_{i=1}^{n} x_i \overline{y_i} ∀ x = (x1 , . . . , xn )T , y = (y1 , . . . , yn )T ∈ Cn ,

and the corresponding (Euclidean) norm is

‖x‖ = \sqrt{(x, x)} = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2} .

Note that any n-dimensional linear space X over K is isomorphic to Kn .


Indeed, an isomorphism φ : X → Kn is the mapping which associates
with any x ∈ X the vector constructed with the coordinates of x with
respect to a basis in X. Thus any such space X can be equipped with
a scalar product as follows:

(x, y)X := (φ(x), φ(y)) = \sum_{i=1}^{n} φ(x)_i \cdot \overline{φ(y)_i} ∀x, y ∈ X .

Therefore, any finite dimensional linear space X can be organized as


a generalized Euclidean (inner product) space.

Example 2. Let X be the set of all functions from [0, 1] to K. Obviously,


X is a linear space with respect to the usual operations of addition
and scalar multiplication

(f + g)(t) = f (t) + g(t),


(αf )(t) = αf (t) ∀ t ∈ [0, 1], ∀ f, g ∈ X, ∀ α ∈ K.

Consider the set Y of all polynomial functions f : [0, 1] → K (i.e.,


f (t) = a0 + a1 t + a2 t2 + · · · + ak tk , a0 , . . . , ak ∈ K, k ∈ {0} ∪ N).
Obviously Y is a (proper) subspace of X. Note that Y is infinite
dimensional (dim Y = ∞) and hence so is X. Indeed, for any k ∈ N

the set of polynomials {1, t, t2 , . . . , tk } ⊂ Y is an independent system.


We can define on Y the scalar product

(f, g) = \int_0^1 f (t) \overline{g(t)} \, dt ∀f, g ∈ Y,

and the corresponding norm ‖f‖ = \sqrt{(f, f )} = \left( \int_0^1 |f (t)|^2 dt \right)^{1/2} .
Another norm on Y is the following

‖f‖∗ = \sup_{t∈[0,1]} |f (t)| ∀f ∈ Y ,

but this one is not generated by a scalar product. Indeed, if we assume


that ‖ · ‖∗ is generated by a scalar product, then it must satisfy the
parallelogram law

‖f + g‖∗^2 + ‖f − g‖∗^2 = 2 \left( ‖f‖∗^2 + ‖g‖∗^2 \right) ∀f, g ∈ Y , (1.5.10)

which is valid in any inner product space. But, for example, the poly-
nomial functions f (t) = t, g(t) = 1 − t do not satisfy (1.5.10), which
confirms our assertion above.
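For the pair f(t) = t, g(t) = 1 − t this failure is easy to check numerically (a sketch of ours; for these piecewise-monotone functions the sup over [0, 1] is attained on any grid containing 0, 1/2, and 1, so the grid values are exact):

```python
def sup_norm(f, grid_size=1001):
    # ||f||_* approximated by the maximum over a uniform grid on [0, 1]
    return max(abs(f(k / (grid_size - 1))) for k in range(grid_size))

f = lambda t: t
g = lambda t: 1 - t

# f + g is identically 1, and f - g = 2t - 1 peaks at the endpoints
lhs = sup_norm(lambda t: f(t) + g(t)) ** 2 + sup_norm(lambda t: f(t) - g(t)) ** 2
rhs = 2 * (sup_norm(f) ** 2 + sup_norm(g) ** 2)
assert lhs == 2 and rhs == 4   # the parallelogram law (1.5.10) fails: 2 != 4
```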
Now, let Z be the set of polynomials f ∈ Y of degree less than or
equal to n − 1, for a given natural number n. This is a finite dimen-
sional subspace of Y , with basis {1, t, t2 , . . . , tn−1 } and dimension n.
Therefore, Z is isomorphic to Kn . A natural isomorphism between Z
and Kn is the mapping which associates with any polynomial func-
tion f (t) = a0 + a1 t + a2 t2 + · · · + an−1 tn−1 the n-dimensional vector
(a0 , a1 , a2 , . . . , an−1 )T ∈ Kn . Thus, besides the scalar product above,
one can define on Z another scalar product

(f, g)Z = \sum_{i=0}^{n−1} a_i \cdot \overline{b_i} ,

for all f (t) = a0 + a1 t + · · · + an−1 tn−1 , g(t) = b0 + b1 t + · · · + bn−1 tn−1 ,


where ai , bi ∈ K, i ∈ {0, 1, . . . , n − 1}. This scalar product generates
a new norm on Z,

‖f‖Z = \sqrt{(f, f )Z} = \left( \sum_{i=0}^{n−1} |a_i|^2 \right)^{1/2}
∀ f (t) = a0 + a1 t + · · · + an−1 tn−1 ∈ Z.

Orthogonal Systems. Let S ⊂ X \ {0} be a nonempty countable


set, where X is an inner product space with a scalar product (·, ·) and
the norm ‖ · ‖ generated by it. We further assume that S is a linearly
independent system (otherwise we eliminate those vectors which are
linear combinations of other vectors from S). Recall that an infinite
set is linearly independent if all finite subsets of it are independent.
Consider first the case when S is a finite independent system, S =
{u1 , . . . , un }. Starting from S one can construct an orthogonal system
S′ = {v1 , . . . , vn }, i.e., (vi , vi ) ≠ 0 and (vi , vj ) = 0 if i ≠ j.
In what follows we present the Gram–Schmidt12 orthogonalization
method. To create S′ let

v 1 = u1 ,
v2 = u2 + αv1 ,

such that v2 ⊥v1 , i.e.,

0 = (v2 , v1 ) = (u2 + αv1 , v1 )
= (u2 , v1 ) + α‖v1 ‖2 ,

giving

α = − \frac{(u2 , v1 )}{‖v1 ‖2} .
Note that v1 = u1 ≠ 0 (by assumption) and also v2 ≠ 0. To see that,
we suppose by contradiction that v2 = 0, i.e., u2 + αu1 = 0. But this
is impossible since u1 , u2 are independent vectors.
After having determined the first p members of S′, define

v_{p+1} = u_{p+1} + \sum_{j=1}^{p} β_j v_j ,

so that vp+1 ⊥ vk for all k = 1, . . . , p, that is,

0 = (up+1 , vk ) + βk ‖vk ‖2 ,
βk = − \frac{(up+1 , vk )}{‖vk ‖2} , k = 1, . . . , p.

12
Jorgen Pedersen Gram, Danish actuary and mathematician, 1850–1916; Er-
hard Schmidt, Baltic German mathematician, 1876–1959.

As before vp+1 ≠ 0 since each vk is a linear combination of uk ’s, k =
1, . . . , p, namely

v_{p+1} = u_{p+1} + \sum_{k=1}^{p} θ_k u_k ,
and u1 , u2 , . . . , up , up+1 are independent vectors. Continue the process
until finished.
Since S′ is an orthogonal system, it follows that it is independent
(prove it!). S′ can be simply replaced by an orthonormal (independent)
system S′′ = {w1 , . . . , wn }, by defining wj = ‖vj ‖^{−1} vj , j = 1, . . . , n.
In particular, any n-dimensional inner product space possesses an or-
thonormal basis (since any basis can be replaced by an orthonormal
one).
If S is a countably infinite, independent system in X, S = {u1 ,
u2 , . . . , un , . . . }, then using the same Gram–Schmidt method, one
can construct an orthonormal system S′′ = {w1 , w2 , . . . , wn , . . . }, i.e.,
(wi , wj ) = δij , where δij is the Kronecker13 symbol, δii = 1 and δij = 0
for i ≠ j.
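The procedure translates directly into code. A minimal real-valued sketch (using the dot product on R^n; the function names are ours), which normalizes each new vector as it is produced so the projection coefficients are simply (u, wk):

```python
def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors in R^n."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    ortho = []
    for u in vectors:
        # subtract from u its projections onto the already-built orthonormal w_k's
        v = list(u)
        for w in ortho:
            c = dot(u, w)                     # coefficient (u, w), since ||w|| = 1
            v = [vi - c * wi for vi, wi in zip(v, w)]
        n = dot(v, v) ** 0.5                  # v != 0 by linear independence
        ortho.append([vi / n for vi in v])
    return ortho

W = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

The returned list W satisfies (wi, wj) = δij up to floating-point rounding.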

Linear, Bilinear, Sesquilinear, Quadratic Forms. Let X be a


linear space over K. A function f : X → K is said to be a linear form
on X if

f (αx + βy) = αf (x) + βf (y) ∀ α, β ∈ K, ∀ x, y ∈ X .

The set of all linear forms on X, denoted X ∗ , is a linear space with


respect to the usual operations on functions and is called the dual of X.
If X is finite dimensional, with a basis B = {u1 , . . . , un }, n ∈ N, then
any linear form f has a specific form:

f (x) = \sum_{i=1}^{n} a_i α_i ∀ x = \sum_{i=1}^{n} α_i u_i ∈ X ,

where ai = f (ui ) are called coefficients of the linear form with respect
to the basis B. X ∗ is isomorphic to Kn (hence dim X ∗ = n), since the
mapping associating each f ∈ X ∗ to the vector (f (u1 ), . . . , f (un ))T ∈
Kn is an isomorphism (prove it!).
A function a : X ×X → K which is linear with respect to each variable
is called a bilinear form on X (more precisely, a(·, y) is a linear form
for all y ∈ X, and a(x, ·) is also a linear form for all x ∈ X).
13
Leopold Kronecker, German mathematician, 1823–1891.

If K = C and the above condition on a(x, ·) is replaced by the linearity
of the complex conjugate function \overline{a(x, ·)} (x ∈ X) then a is said to be
a sesquilinear form on X. For example, a scalar product on X is a
sesquilinear form.
If X is finite dimensional, with a basis B = {u1 , . . . , un }, and a is a
bilinear form on X, then for all x = \sum_{i=1}^{n} α_i u_i , y = \sum_{j=1}^{n} β_j u_j ∈ X
we have

a(x, y) = \sum_{i,j=1}^{n} c_{ij} α_i β_j , (1.5.11)

where cij = a(ui , uj ), i, j = 1, . . . , n. Hence a is represented by the ma-


trix C = (cij ) (which depends on the basis of X). If a is a sesquilinear
form, then instead of (1.5.11) we have

a(x, y) = \sum_{i,j=1}^{n} c_{ij} α_i \overline{β_j} .

A bilinear form a : X × X → R is said to be symmetric if a(x, y) =


a(y, x) for all x, y ∈ X. If X is finite dimensional, then the symmetry of
a bilinear form a is expressed by the symmetry of the matrix associated
with a (the symmetry of that matrix being independent of the basis
of space X). Any symmetric bilinear form a defines a quadratic form
F : X → R by setting F (x) = a(x, x). Given a quadratic form F , one
can recover the corresponding bilinear form a. Indeed, from

a(x + y, x + y) = a(x, x) + 2a(x, y) + a(y, y),

we deduce

a(x, y) = \frac{1}{2} [a(x + y, x + y) − a(x, x) − a(y, y)]
= \frac{1}{2} [F (x + y) − F (x) − F (y)] .

A quadratic form F is said to be positive definite (positive semidefinite)


if F (x) > 0 for all x ∈ X, x ≠ 0 (F (x) ≥ 0 for all x ∈ X, respectively).
F is called negative definite (negative semidefinite) if −F is positive
definite (positive semidefinite, respectively).
If F is a positive definite quadratic form on the real linear space X
then the corresponding a is a scalar product on X.

If X is a real n-dimensional linear space, with a basis B = {u1 , . . . , un },


and F is a quadratic form on X, then

F (x) = a(x, x) = \sum_{i,j=1}^{n} c_{ij} α_i α_j ,

where αj ’s are the coordinates of x with respect to basis B (in par-


ticular, the components of x if X = Rn with its usual basis). It can
be shown (using the well-known Gauss14 method), that for any such
quadratic form F , there is a convenient basis of X such that F can be
written as follows:

F (x) = λ1 β12 + λ2 β22 + · · · + λn βn2 ,

where β1 , . . . , βn are the coordinates of x with respect to the new basis,


and λ1 , . . . , λn ∈ R (some of these λ’s could be zero). In fact, starting
from the new basis, one can simply define another basis, such that F
can be written under the following canonical form:


n
F (x) = εi γi2 , εj ∈ {−1, 0, +1}, j = 1, . . . , n ,
i=1

where γi ’s are the coordinates of x with respect to the last basis.


Obviously, F is positive definite (positive semidefinite) if and only if
εj = 1, j = 1, . . . , n (εj ∈ {0 , +1}, j = 1, . . . , n, respectively).
Let us also recall that for a quadratic form F : X → R (X being an n-
dimensional real linear space), whose matrix C with respect to a basis
of X has nonzero NW principal minors (i.e., Δi ≠ 0, i = 1, . . . , n), there
always exists a decomposition (called Jacobi’s formula15 ) as follows:

F (x) = \sum_{i=1}^{n} \frac{Δ_{i−1}}{Δ_i} β_i^2 ,

where Δ0 := 1, and β1 , . . . , βn are the coordinates of x with respect to


a new basis of X. Therefore, F is positive definite (negative definite)
if and only if Δi > 0, i = 1, . . . , n (respectively, (−1)i Δi > 0, i =
1, . . . , n). These are known as Sylvester’s conditions.16

14
Carl Friedrich Gauss, German mathematician and physicist, 1777–1855.
15
Carl Gustav Jacob Jacobi, German mathematician, 1804–1851.
16
James Joseph Sylvester, English mathematician, 1814–1897.
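Sylvester's conditions are easy to test mechanically. A small sketch of ours computing the NW (leading) principal minors Δi by cofactor expansion, which is adequate for the small matrices that arise in hand computations:

```python
def det(M):
    # determinant by cofactor expansion along the first row (fine for small n)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_minors(C):
    # Delta_i = determinant of the top-left i x i block, i = 1, ..., n
    return [det([row[:i] for row in C[:i]]) for i in range(1, len(C) + 1)]

def is_positive_definite(C):
    # Sylvester's condition: all leading minors strictly positive
    return all(d > 0 for d in leading_minors(C))

C = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
assert leading_minors(C) == [2.0, 3.0, 4.0]
assert is_positive_definite(C)
```

Negative definiteness would instead require the signs of the Δi to alternate, starting with Δ1 < 0.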

If X is a complex linear space and a : X × X → C is a sesquilinear


form, then a is called Hermitian17 if a(x, y) = \overline{a(y, x)} for all x, y ∈ X.
Such a form a defines a quadratic form F (x) = a(x, x), x ∈ X, with
values in R. If X is an n-dimensional complex linear space, then one
can find a basis in X such that F takes the form

F (x) = \sum_{i=1}^{n} λ_i β_i \overline{β_i} = \sum_{i=1}^{n} λ_i |β_i|^2 ,

where λi ∈ R, i = 1, . . . , n, and βi , i = 1, . . . , n, are the coordinates


of x with respect to that basis. The Jacobi formula also works in this
complex case, and Sylvester’s conditions remain valid.
We close this chapter by inviting the reader to consult other books to
find more information on the topics addressed in this chapter, such as
[6, 16, 28, 33, 37, 41, 42, 51].

1.6 Exercises
1. Let A, B, C be some arbitrary subsets of a universe U . Show
that

(a) A \ (B ∪ C) = (A \ B) ∩ (A \ C) = (A \ B) \ C ;
(b) A \ (B ∩ C) = (A \ B) ∪ (A \ C) ;
(c) (A ∩ B) \ C = A ∩ (B \ C) = (A \ C) ∩ B ;
(d) (A ∪ B) \ C = (A \ C) ∪ (B \ C) ;

2. Let A, B, C be given sets, which are subsets of a universe U .


Determine the set X ⊂ U satisfying

A ∩ X = B and A ∪ X = C .

The same question if X satisfies

A \ X = B, and X \ A = C .

3. Prove that for all sets A, B, C satisfying A ∩ C = B ∩ C and


A ∪ C = B ∪ C, we have A = B.

17
Charles Hermite, French mathematician, 1822–1901.

4. Let A, B, C, D be some arbitrary subsets of a universe U . Which


of the following statements are true?

(a) (A ∩ B) × (C ∩ D) = (A × C) ∩ (B × D);
(b) (A ∪ B) × (C ∪ D) = (A × C) ∪ (B × D);
(c) (A \ B) × C = (A × C) \ (B × C),

where “×” denotes the Cartesian product?

5. Let A be a set with a partial order ≤ . If A has a smallest element


a = min A, then a is the unique minimal element of A.

6. Let A = {an ; an = \frac{1}{1·2} + \frac{1}{2·3} + \cdots + \frac{1}{n(n+1)} , n ∈ N}. Find inf A
and sup A.

7. Define on C (the set of complex numbers) the binary relation 


as follows:

z1 = x1 + y1 i  z2 = x2 + y2 i ⇐⇒ x1 ≤ x2 and y1 ≤ y2 .

Show that

(a)  is a partial order on C, but not a total order on C;


(b) for each a ≥ 0,  is a total order on Xa = {z = x + yi ∈
C; y = ax} (i.e., Xa is a chain);
(c) there exists a partial order on C such that, for each a < 0,
Xa defined as above is a chain of C with respect to this
partial order.

8. Show that the sequence (an )n≥1 defined by

a_1 = \sqrt{2} , \quad a_n = \sqrt{2 + a_{n−1}} , \; n ≥ 2 ,

is convergent and calculate its limit.

9. Let a be a given real number and let (an )n≥1 be a sequence in


R such that any subsequence of it has a convergent subsequence
whose limit is a. Show that an → a.

10. Let X be a vector space. If {v1 , v2 , v3 } ⊂ X is a linearly in-


dependent system, then show that {v1 + v2 , v2 + v3 , v3 + v1 } is
too.

11. Let X be the real vector space of all functions f : R → R. Show


that each of the following systems of functions in X

S1 = {1, cos x, (cos x)2 }, S2 = {ex , xex , . . . , xk ex }, k ∈ N,

is linearly independent.

12. Let X be the real vector space of all continuous functions f :


[0, 1] → R. Consider on X the scalar product

(f, g) = \int_0^1 f (t) · g(t) \, dt ∀f, g ∈ X ,

and the induced norm.

(a) Which of the following systems of functions in X is linearly


independent?
(i) f1 (t) = 1, f2 (t) = t, f3 (t) = t2 ;
(ii) f1 (t) = 1 − t, f2 (t) = t(1 − t), f3 (t) = 1 − t2 ;
(iii) f1 (t) = 1, f2 (t) = et , f3 (t) = 2e−t ;
(iv) f1 (t) = 3t, f2 (t) = t + 5, f3 (t) = −2t2 , f4 (t) = (t + 1)2 ;
(v) f1 (t) = (t + 1)2 , f2 (t) = t2 − 1, f3 (t) = 2t2 + 2t − 3;
(vi) f1 (t) = 1, f2 (t) = 1 + t, f3 (t) = 1 + t + t2 , . . . , fk (t) =
1+t+t2 +· · ·+tk−1 , where k is a given natural number.
(b) Let Y be the vector subspace of X generated by B =
{f1 , f2 , f3 }, where f1 (t) = 1, f2 (t) = t, f3 (t) = t2 for
t ∈ [0, 1]. By using the Gram–Schmidt method, construct
an orthonormal basis in Y with respect to the above scalar
product.

13. Show that the system B = {1, t − 1, (t − 1)2 , (t − 1)3 } is a


basis of the real vector space X of all polynomials of degree ≤ 3
with real coefficients, and find the coordinates of a polynomial
p = p(t) ∈ X with respect to this basis.

14. Let X be a linear space equipped with a scalar product (·, ·).
Show that a system S = {x1 , x2 , . . . , xk } ⊂ X is linearly inde-
pendent if and only if the following determinant (called the Gram
determinant)
 
det ((xi , xj ))1≤i,j≤k ≠ 0.
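A small illustration (in the assumed special case of R³ with the standard dot product; the exercise is stated for an arbitrary scalar product):

```python
import numpy as np

# The Gram determinant det((x_i, x_j)) is nonzero exactly when the
# vectors are linearly independent.
def gram_det(vectors):
    G = np.array([[np.dot(x, y) for y in vectors] for x in vectors])
    return np.linalg.det(G)

independent = [np.array([1.0, 0.0, 0.0]),
               np.array([1.0, 1.0, 0.0]),
               np.array([1.0, 1.0, 1.0])]
dependent = [independent[0], independent[1], independent[0] + independent[1]]

print(gram_det(independent))  # nonzero
print(gram_det(dependent))    # 0 up to rounding
```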

15. Show that the mapping (·, ·) : R3 × R3 −→ R defined by

(x, y) = (1/2) ∑_{i=1}^{3} xi yi − (1/4)(x1 y2 + x1 y3 + x2 y3 )

is a scalar product. Construct a basis of R3 which is orthonormal


with respect to this scalar product.
16. Let X be the real vector space of polynomials of degree ≤ m
with real coefficients, where m is a given natural number. Find
the expression of the linear form f : X → R defined by
f (p) = ∫_0^1 p(t) dt ∀p(t) = a0 + a1 t + a2 t2 + · · · + am tm ∈ X
with respect to each of the bases

B = {1, t, t2 , . . . , tm }, B′ = {1, 1 + t, 1 + t + t2 , . . . ,
1 + t + t2 + · · · + tm }.
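One way to check the coefficients (a sketch in exact rational arithmetic; the expression of a linear form in a basis is obtained by evaluating the form on the basis vectors):

```python
from fractions import Fraction

# f(t^k) = ∫_0^1 t^k dt = 1/(k+1), so in B = {1, t, ..., t^m} the form is
# f(p) = sum_k a_k/(k+1). For the second basis, evaluate f on each basis
# vector q_j = 1 + t + ... + t^j.
m = 3  # an illustrative choice of m
coeffs_B = [Fraction(1, k + 1) for k in range(m + 1)]
coeffs_Bprime = [sum(Fraction(1, k + 1) for k in range(j + 1))
                 for j in range(m + 1)]
print(coeffs_B)       # [1, 1/2, 1/3, 1/4]
print(coeffs_Bprime)  # [1, 3/2, 11/6, 25/12]
```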

17. Let X be a vector space over R. A bilinear form a : X × X → R


is said to be antisymmetric if
a(x, y) = −a(y, x) ∀x, y ∈ X .
Show that
(i) a bilinear form a : X × X → R is antisymmetric ⇐⇒
a(x, x) = 0 ∀x ∈ X.
(ii) any bilinear form on X is the sum of a symmetric bilinear
form and an antisymmetric one.
18. Let A be an n × n matrix with real entries, and let B = aIn +
AT A, where AT denotes the transpose of A, In denotes the n × n
identity matrix, and a > 0. Show that the quadratic form F :
Rn → R whose matrix with respect to the canonical basis of Rn
is B is positive definite. What about the case a = 0?
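The key identity here is xᵀBx = a‖x‖² + ‖Ax‖², which an eigenvalue check confirms numerically (a sketch on a random matrix, illustration only):

```python
import numpy as np

# x^T B x = a*||x||^2 + ||Ax||^2, so every eigenvalue of B is >= a > 0.
# For a = 0, B = A^T A is only positive SEMIdefinite in general
# (definite exactly when A is invertible).
rng = np.random.default_rng(0)
n, a = 4, 0.5
A = rng.standard_normal((n, n))
B = a * np.eye(n) + A.T @ A

print(np.linalg.eigvalsh(B).min())  # at least a = 0.5
```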
19. Consider the quadratic form F : R3 → R,
F (x) = x21 + x22 + 3x23 + 4x1 x2 + 2x1 x3 + 2x2 x3 ∀x ∈ R3 .
Determine a basis of R3 such that F (x) = ∑_{i=1}^{n} εi ξi² , where
ξ1 , . . . , ξn are the coordinates of x with respect to this basis, and
εj ∈ {−1, 0, +1}, j = 1, . . . , n. Check whether F is positive
definite, negative definite, or neither.
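A numerical cross-check of the definiteness question (an aside, using the symmetric matrix of F; the signs εᵢ in any such representation match the signs of its eigenvalues):

```python
import numpy as np

# Symmetric matrix of F(x) = x1^2 + x2^2 + 3x3^2 + 4x1x2 + 2x1x3 + 2x2x3
# (off-diagonal entries are half the mixed coefficients).
M = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])
eigs = np.linalg.eigvalsh(M)
print(eigs)  # mixed signs: F is neither positive nor negative definite
```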

20. Use Sylvester’s conditions to show that the quadratic form


F : R4 → R,

F (x) = 2x21 +2x22 +x23 +4x24 −x1 x2 +x1 x3 +x2 x4 −x3 x4 ∀x ∈ R4 ,

is positive definite. Determine a basis of R4 such that F can be


written as a sum of squares with respect to this basis.
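The leading principal minors can be checked numerically (a sketch; the symmetric matrix below encodes F with halved off-diagonal coefficients):

```python
import numpy as np

# Symmetric matrix of F: diagonal 2, 2, 1, 4; off-diagonal entries are
# half of the mixed coefficients (-x1x2, +x1x3, +x2x4, -x3x4).
M = np.array([[ 2.0, -0.5,  0.5,  0.0],
              [-0.5,  2.0,  0.0,  0.5],
              [ 0.5,  0.0,  1.0, -0.5],
              [ 0.0,  0.5, -0.5,  4.0]])
minors = [np.linalg.det(M[:k, :k]) for k in range(1, 5)]
print(minors)  # all positive, so F is positive definite by Sylvester
```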
Chapter 2

Metric Spaces

Metric spaces offer a sufficiently large framework for most of the prob-
lems we discuss in this book.

2.1 Definitions
Definition 2.1. A metric (or a distance function) on a nonempty
set X is a function d : X × X → [0, ∞) satisfying

(M 1) d(x, y) = 0 ⇐⇒ x = y ;
(M 2) d(x, y) = d(y, x) ∀x, y ∈ X ;
(M 3) d(x, y) ≤ d(x, z) + d(z, y) ∀x, y, z ∈ X .

A set X equipped with a metric d is called a metric space and is


sometimes denoted (X, d).
Any set X ≠ ∅ can be equipped with a metric. The “simplest” metric
is the so-called discrete metric which is defined by

d(x, y) = 1 if x, y ∈ X, x ≠ y and d(x, x) = 0 ∀x ∈ X .

Note that this metric is not very useful in practice, but is suitable for
counterexamples.

Now let (X,  · ) be a normed (linear) space. Then X can be equipped


with the metric

d(x, y) = ‖x − y‖, x, y ∈ X . (2.1.1)

© Springer Nature Switzerland AG 2019 31


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 2

Note also that any finite dimensional linear space can be equipped with
a norm (e.g., with the Euclidean norm—see the previous chapter), and
hence with the metric generated by that norm (cf. (2.1.1)).

If (X, d) is a metric space and ∅ ≠ Y ⊂ X, then Y is also a metric


space with respect to d restricted to Y × Y .

Definition 2.2. Let (X, d) be a metric space. For x0 ∈ X and r > 0


define
B(x0 , r) := {x ∈ X; d(x, x0 ) < r} ,
which is called the open ball centered at x0 with radius r.

Definition 2.3. A nonempty set A ⊂ (X, d) is said to be open if for


each x ∈ A there exists an ε > 0 such that B(x, ε) ⊂ A. By convention
the empty set is considered open.

Obviously, the collection τ of all open sets forms a topology:

(a) ∅, X ∈ τ ;
(b) the union of any sub-collection of τ is in τ ;
(c) the intersection of any finite sub-collection of τ is in τ.

Note that the intersection of an infinite collection of open sets may


not be open. For example, in X = R, with d(x, y) = |x − y|, we have
for a fixed x0 ∈ R

⋂_{n=1}^{∞} ( x0 − 1/n, x0 + 1/n ) = {x0},

and obviously {x0 } does not belong to the (usual) topology of R defined
by | · |.

In what follows X denotes a metric space endowed with the topology


τ generated by its metric d (see above), called metric topology. If
d is defined by a norm, i.e., d(x, y) = ‖x − y‖ (x, y ∈ X), then τ is
called a norm topology.

A set V ⊂ X is said to be a neighborhood of a point p ∈ X if there


is an r > 0 such that B(p, r) ⊂ V . In particular, any open set D is a
neighborhood of any p ∈ D.

A set C ⊂ X is said to be closed (with respect to topology τ ) if X \ C


is open (i.e., X \ C ∈ τ ). In particular, for any x0 ∈ X and r > 0, we
have B(x0 , r) ∈ τ , and
B̄(x0 , r) := {x ∈ X; d(x, x0 ) ≤ r}
is closed, i.e., X \ B̄(x0 , r) ∈ τ (prove these assertions!).
A subset A of a metric space (X, d) is said to be bounded (with
respect to d) if it is contained in a closed ball (equivalently, in an open
ball). Otherwise, A is called unbounded (with respect to d). For
example, N ⊂ R is bounded with respect to the discrete metric on R,
but is unbounded with respect to the usual norm topology (the norm
being the absolute value function | · |).
A sequence (an )n∈N in (X, d) is said to be convergent if there exists
a ∈ X such that d(an , a) → 0. This is denoted an → a, or limn→∞ an =
a, or lim an = a, and we say that (an ) converges to a, or that a is the
limit of (an ). It is easily seen that the limit is unique.
Let S be a nonempty subset of a metric space (X, d). S is closed if
and only if the limit of any convergent sequence of points in S is also
a point of S (prove it!).
A point p ∈ (X, d) is called an accumulation point (or limit point)
of a set S ⊂ X if (V ∩ S) \ {p} ≠ ∅ for every neighborhood V of p.
Note that p is not necessarily an element of S. If q ∈ S and q is not
an accumulation point of S, then q is an isolated point of S.
Obviously, p is an accumulation point of S if and only if there exists
a sequence (pn ) in S such that pn → p. By the above assertion, S is
closed if and only if S contains all its accumulation points.
Let (an )n∈N a sequence in (X, d). A point p ∈ X is called a cluster
point of (an ) if for every ε > 0 there are infinitely many an such that
d(an , p) < ε (in other words, (an ) has a subsequence converging to p).
A point p ∈ S is called an interior point of S if there is an r > 0
such that B(p, r) ⊂ S. The set of all interior points of S is called the
interior of S, and is denoted Int S.
Obviously,
• Int S is the union of all open subsets of S, and hence Int S is an open
set (possibly ∅);
• S is open if and only if S = Int S.

For a set S ⊂ (X, d) the closure of S, denoted Cl S or S, is the


intersection of all closed sets containing S.
Clearly, Cl S is a closed set, and
• S is closed if and only if S = Cl S;
• Cl S = S ∪ {accumulation points of S}.

A metric space (X, d) is called separable if it has a countable, dense


subset S, i.e., Cl S = X (the closure being related to the metric topol-
ogy generated by d).
For example, R is separable with respect to its usual topology (since Q
is dense in R with respect to this topology), but is not separable with
respect to the discrete topology, i.e., the topology associated with the
discrete metric on R. This is because any subset of R is closed with
respect to the discrete topology, so there is no dense countable subset
of R.

The boundary of a set S ⊂ (X, d) is defined to be the set

∂S := Cl S ∩ Cl (X \ S).

Obviously, ∂S = ∂(X \ S), and p ∈ ∂S if and only if B(p, ε) ∩ S ≠ ∅
and B(p, ε) ∩ (X \ S) ≠ ∅ for all ε > 0.

2.2 Completeness
We start this section with the definition of a Cauchy sequence which
is essential in what follows.

Definition 2.4. A sequence (an )n∈N in a metric space (X, d) is called


a Cauchy sequence if for all ε > 0 there exists an N = N (ε) ∈ N
such that d(an , am ) < ε for all m, n > N .

It is easy to see that any convergent sequence in a metric space is a


Cauchy sequence. The converse implication is not true in general.

Definition 2.5. A metric space (X, d) is called complete if every


Cauchy sequence (an )n∈N in X converges (i.e., there exists a point
a ∈ X such that d(an , a) → 0).

For example, R with its usual topology (defined by | · |) is a complete


metric space (as shown in the previous chapter, see Theorem 1.12).
More generally, for any n ∈ N, Rn , equipped with the Euclidean metric

(generated by the Euclidean norm), is complete, because a Cauchy


sequence in Rn is Cauchy in each coordinate. In fact, we will see later
that Rn endowed with any norm is complete.
On the other hand, the metric space (Q, d), where d(x, y) = |x − y|
(x, y ∈ X) is not complete. For example, the sequence in Q, defined
by
a1 = 2, an+1 = (1/2)(an + 2/an ), n = 1, 2, . . .

is convergent in (R, | · |) (since an ≥ √2 and an+1 /an ≤ 1, n = 1, 2, . . . ),
hence Cauchy with respect to | · |, but its limit is √2 ∉ Q.
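This can be made concrete in exact rational arithmetic (a sketch; Python's Fraction keeps every iterate in Q):

```python
from fractions import Fraction

# Newton/Babylonian iteration a_{n+1} = (a_n + 2/a_n)/2 starting from a_1 = 2.
# Every iterate is rational, yet a_n^2 - 2 shrinks toward 0, so the
# sequence is Cauchy in (Q, |.|) without having a limit in Q.
a = Fraction(2)
for _ in range(6):
    a = (a + 2 / a) / 2

print(float(a))    # ≈ 1.4142135623730951
print(a * a == 2)  # False: no rational has square 2
```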

Now, let us examine another example. Let S be a nonempty set.


Define
B(S; R) = {f : S → R; f (S) is bounded} ,
where the boundedness condition on f (S) means: ∃M > 0 such that
|f (s)| ≤ M for all s ∈ S. Obviously, X = B(S; R) is a real linear
space with respect to the usual operations (addition and scalar multi-
plication). It can be equipped with a norm ‖ · ‖ defined by

‖f ‖ := sup_{s∈S} |f (s)| ∀f ∈ X ,

which gives a metric d : X × X → [0, ∞), d(f, g) = ‖f − g‖ (f, g ∈ X).


Moreover, it is easily seen that (X, d) is a complete metric space. The
key condition ensuring the completeness of X is the completeness of
R with respect to its usual metric.
Convergence in X = B(S; R) is called uniform convergence on S.
It is stronger than the pointwise convergence. In particular,

d(fn , f ) = ‖fn − f ‖ → 0 ⇒ lim_{n→∞} fn (s) = f (s) ∀s ∈ S ,

but the converse implication is not true in general.
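A standard illustration of the gap (an example chosen here, not taken from the text): fn (s) = sⁿ on S = [0, 1) converges pointwise to 0 but not uniformly, since sup over [0, 1) of sⁿ equals 1 for every n:

```python
# f_n(s) = s^n on S = [0, 1): at each fixed s < 1 the values tend to 0, ...
for n in (1, 10, 100, 1000):
    print(n, 0.5**n, 0.99**n)

# ... yet sup_{0 <= s < 1} s^n = 1 for every n (let s -> 1), so
# d(f_n, 0) = ||f_n|| = 1 in B(S; R): no uniform convergence to 0.
```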

Definition 2.6. A normed space (X,  · ) which is complete (i.e.,


(X, d) is complete for d(x, y) = ‖x − y‖, x, y ∈ X) is called a Banach
space.

In particular, B(S; R) with the norm above (called uniform conver-


gence norm) is a Banach space. The subset XK = {f ∈ B(S; R) :
|f (s)| ≤ K ∀s ∈ S}, where K is a given positive constant, is a com-
plete metric space with respect to the same metric (generated by the

uniform convergence norm), since XK is closed in B(S; R). Note that


XK is not a Banach space because it is not a linear space.
In general, if (X, d) is a complete metric space, then any nonempty
closed set Y ⊂ X is also a complete metric space with the metric d
restricted to Y × Y .
Definition 2.7. Two metric spaces (X1 , d1 ), (X2 , d2 ) are isometric if
there exists a bijection φ : X1 → X2 such that d2 (φ(x), φ(y)) = d1 (x, y)
for all x, y ∈ X1 .

An important result, due to Hausdorff,1 says that any metric space


can be extended (uniquely up to isometry) to a complete metric space
(see [44, Chapter 2]). More precisely we have
Theorem 2.8. For any metric space (X, d) there exists a complete
metric space (X̄, d̄) such that

(j) there exists X1 ⊂ X̄ such that (X, d) is isometric to (X1 , d̄);
(jj) X1 is dense in (X̄, d̄).

(X̄, d̄) with the above properties is unique up to isometry.
Proof. One can construct an extension (completion) of (X, d) by a pro-
cedure similar to that used in the previous chapter to construct the
Cantor–Méray model for R starting from rational numbers. Specif-
ically, let E denote the set of all Cauchy sequences in (X, d). E is
nonempty as it contains constant sequences (c, c, . . . ), c ∈ X. We
define an equivalence relation in E as follows: (an ), (bn ) ∈ E are
equivalent iff d(an , bn ) → 0. In other words, two sequences convergent
in (X, d) with the same limit are equivalent. It is easily seen that
the relation defined above is indeed an equivalence relation. Let X̄
be the collection of all equivalence classes in E with respect to this
equivalence relation. Denote by A, B, C, . . . the classes of sequences
(an ), (bn ), (cn ), . . . Now, define d̄ : X̄ × X̄ → [0, ∞) by

d̄(A, B) = lim_{n→∞} d(an , bn ) ∀A, B ∈ X̄. (2.2.2)

The limit above exists since

|d(an+p , bn+p ) − d(an , bn )| ≤ d(an+p , an ) + d(bn+p , bn ) ,


1
Felix Hausdorff, German mathematician, 1868–1942.

which says that (d(an , bn )) is a Cauchy sequence in (R, | · |).


Note also that the limit in (2.2.2) does not depend on representatives,
as the following inequality shows
|d(an , bn ) − d(a′n , b′n )| ≤ d(an , a′n ) + d(bn , b′n ) → 0 .
Thus d̄ is well defined. It is easy to check that d̄ is a metric.
Now, let ψ : X → X̄ be the mapping which associates with every
a ∈ X the class A of the constant sequence (a, a, . . . ): ψ(a) = A.
Obviously, ψ is injective, so if we denote X1 = ψ(X), then ψ is a
bijection between X and X1 . Moreover, for any A, B ∈ X1 we have
d̄(ψ(a), ψ(b)) = d̄(A, B) = lim d(a, b) = d(a, b).
Hence (X, d) and (X1 , d̄) are isometric, i.e., (j) holds true.
Let us now prove (jj). To this purpose, let A be an arbitrary element
of X̄ and let (an ) be a representative of A. For each k ∈ N denote
by Ak the class of the constant sequence (ak , ak , . . . ). Since (an ) is a
Cauchy sequence in (X, d), we can write
∀ε > 0, ∃N ∈ N : d(ak+p , ak ) < ε ∀k > N, p ∈ N.
Therefore,
d̄(A, Ak ) = lim_{m→∞} d(am , ak ) ≤ ε ∀k > N .

This shows that A is approximated by Ak ∈ X1 , hence (jj) holds true.


In order to prove that (X̄, d̄) is complete, let (An ) be a Cauchy se-
quence in (X̄, d̄). For each class Ak there is a class Bk ∈ X1 such
that d̄(Ak , Bk ) < 1/k (see (jj)). Notice that Bk is the class of some
constant sequence (bk , bk , . . . ) with bk ∈ X. We can show that (bk ) is
a Cauchy sequence in (X, d):
d(bk , bm ) = d̄(Bk , Bm )
           ≤ d̄(Bk , Ak ) + d̄(Ak , Am ) + d̄(Am , Bm )
           ≤ 1/k + 1/m + d̄(Ak , Am ) → 0,
k m
as k, m → ∞, so the class B of the sequence (bk ) belongs to X̄. We
claim that B is the limit of (Ak ) with respect to d̄. Indeed, given ε > 0,

d̄(B, Ak ) ≤ d̄(B, Bk ) + d̄(Bk , Ak ) ≤ lim_{m→∞} d(bm , bk ) + 1/k < ε

for large enough k, so lim_{k→∞} d̄(Ak , B) = 0, as claimed. Therefore
(X̄, d̄) is complete.
Finally, we need to show that any two complete metric spaces (X̄, d̄)
and (X̂, d̂) satisfying (j) and (jj) are isometric. Let X1 ⊂ X̄ and
X2 ⊂ X̂ be such that each of these spaces is isometric to (X, d). Let
g : (X, d) → (X1 , d̄) and h : (X, d) → (X2 , d̂) be the corresponding
isometries. Then (X1 , d̄) and (X2 , d̂) are isometric, and θ = h ◦ g −1 is
an isometry between these spaces.
Let A be an arbitrary element of X̄. By (jj) there exists a sequence
(An ) in X1 such that d̄(An , A) → 0. Obviously, Bn = θ(An ) ∈ X2 and
(Bn ) is a Cauchy sequence in (X̂, d̂), so it is convergent since (X̂, d̂)
is complete. Let B ∈ X̂ be its limit: d̂(Bn , B) → 0. Denote by θ̃
the mapping that takes A to B. Note that B does not depend on the
choice of (An ) so it is unique for each A, i.e., θ̃ is well defined. In fact,
θ̃ is an extension of θ to the whole X̄. It is easily seen that θ̃ is a
bijection between X̄ and X̂.
It remains to prove that θ̃ is an isometry. Let A, A′ ∈ X̄ and let
(An ), (A′n ) be sequences in X1 which converge, respectively, to A and
A′ with respect to d̄. Let B, B ′ be the limits of Bn = θ(An ) and
Bn′ = θ(A′n ) in (X̂, d̂). By letting n tend to infinity in the equation

d̂(Bn , Bn′ ) = d̄(An , A′n ) ,

we obtain
d̂(θ̃(A), θ̃(A′ )) = d̂(B, B ′ ) = d̄(A, A′ ),
by using the inequality

|d̄(An , A′n ) − d̄(A, A′ )| ≤ d̄(An , A) + d̄(A′n , A′ ) ,

and the similar one for Bn , Bn′ , B, B ′ . Therefore, (X̄, d̄) and (X̂, d̂)
are indeed isometric.

Remark 2.9. Let X be a nonempty subset of a given complete metric


space (Z, d). Then (Cl X, d) is also a complete metric space, where
Cl X is the closure of X in (Z, d), also denoted X̄. Clearly, (Cl X, d)
plays the role of (X̄, d̄) in Theorem 2.8, so (Cl X, d) can be regarded
as the completion of X with respect to d.
To illustrate this case, consider X = (0, 1] and Z = R with d(x, y) =
|x − y|. Then, Cl X = [0, 1] and so ([0, 1], d) is the
completion of ((0, 1], d) (which is not itself complete). Further ex-
amples will be discussed later, including examples involving function
spaces.

Note that in Theorem 2.8 we had to construct X̄ because it was not a


priori known.

We continue with Baire’s Theorem2 that is used to derive some im-


portant principles of Functional Analysis: the Uniform Boundedness
Principle, the Open Mapping Theorem, and the Closed Graph Theo-
rem (see Theorems 4.7, 4.8, and 4.10).

Theorem 2.10 (Baire). Let (X, d) be a complete metric space and let
Xn ⊂ X, n ∈ N, be closed sets satisfying

Int Xn = ∅ ∀n ∈ N . (2.2.3)

Then,

Int ( ⋃_{n=1}^{∞} Xn ) = ∅. (2.2.4)

Proof. Notice first that for all F ⊂ X we have

Cl (X \ F ) = X ⇐⇒ Int F = ∅ .

So by (2.2.3) Dn = X \ Xn is dense in X and is also open for all n ∈ N.


We have to show that (2.2.4) holds or, equivalently, that M := ⋂_{n=1}^{∞} Dn
is dense in X, i.e., for every open set D ⊂ X we have D ∩ M ≠ ∅. Fix
such an open set D and choose some x0 ∈ D and r0 > 0 such that
the closed ball B(x0 , r0 ) ⊂ D. Since D1 is open and dense in X there
exist x1 ∈ B(x0 , r0 ) ∩ D1 and r1 > 0 such that
r0
B(x1 , r1 ) ⊂ B(x0 , r0 ) ∩ D1 , 0 < r1 < .
2
By induction one can find sequences (xn ) and (rn ) such that
rn
B(xn+1 , rn+1 ) ⊂ B(xn , rn ) ∩ Dn+1 , 0 < rn+1 < ,
2
for n = 0, 1, 2, . . . It is easily seen that (xn ) is Cauchy, hence convergent
(since (X, d) is complete). If a denotes its limit then a ∈ D ∩ M , hence
D ∩ M ≠ ∅, as claimed.

2
René-Louis Baire, French mathematician, 1874–1932.

2.3 Compact Sets


Let A be a subset of a metric space (X, d). A cover of A is a collection
of sets {Di }i∈I whose union contains A:

A ⊂ ⋃_{i∈I} Di ,

where I is a finite or infinite index set. If all Di are open sets then
{Di }i∈I is called an open cover.

Definition 2.11. A subset A of (X, d) is called compact if every


open cover of A has a finite subcover.

The next result is a characterization of compact sets in metric spaces.

Theorem 2.12. A subset A of a metric space (X, d) is compact if and


only if every sequence in A has a subsequence that converges to a point
of A (in other words A is sequentially compact).

Proof. The proof is divided into several steps.

Step 1: If A is compact then A is closed.


We need to show that X \ A is open. Let x ∈ X \ A. If y ∈ A we have
d(y, x) > 0 and so y belongs to

Dn = {z ∈ X; d(z, x) > 1/n}

for n ∈ N large enough. Thus {Dn }n∈N is an open cover of A. Since A


is compact, there is a finite subcover of A. In fact, this subcover can be
reduced to one set DN with N large. By construction B(x, 1/N ) ⊂ X \ A,
and hence X \ A is open, therefore A is closed, as claimed.

Step 2: If A is compact and B ⊂ A is closed, then B is compact.


If {Di }i∈I is an open cover of B, then {Di }i∈I ∪ {X \ B} is an open
cover of A. Since A is compact, we can extract a finite subcover of A,
say {Di1 , Di2 , . . . , Dim , X \ B}. Thus {Di1 , Di2 , . . . , Dim } is a finite
subcover of B extracted from {Di }i∈I .

Step 3: A being compact implies A is sequentially compact.


Assume, by contradiction, that there is a sequence (xn ) in A that
has no convergent subsequence. So (xn ) has infinitely many distinct
points y1 = xn1 , y2 = xn2 , . . . such that for each ym there is an open

ball B(ym , rm ) containing no other yi (otherwise ym is a cluster point


of the sequence (yi )). The set C = {y1 , y2 , . . . } is closed since all its
points are isolated. By Step 2, C is compact. On the other hand,
{B(ym , rm )}m∈N is an open cover of C which has no finite subcover.
Hence (xn ) must have a convergent subsequence. Its limit belongs to
A, since A is closed (see Step 1).

Step 4: If A is sequentially compact, then for every open cover {Di }i∈I
of A, there exists an r > 0 such that ∀y ∈ A, B(y, r) is contained in
some Di .
Assume to the contrary that this is not the case. Thus there exists
an open cover {Di } of A such that ∀n ∈ N there is some yn ∈ A
so that B(yn , 1/n) is not contained in any Di . By hypothesis (yn ) has
a subsequence (z1 = yn1 , z2 = yn2 , . . . ) converging to some z ∈ A.
Obviously, z belongs to some Di0 and since Di0 is open and zn → z,
we can choose some large N such that B(zN , 1/N ) ⊂ Di0 , which is a
contradiction.

Step 5: A being sequentially compact implies that for all ε > 0 there is
a finite number of open balls of radius ε covering A (i.e., A is totally
bounded).
We need to analyze the case when A is not finite, otherwise the conclu-
sion is obvious. Assume that A is not totally bounded, i.e., for some
ε > 0 we cannot cover A with finitely many open balls of radius ε.
Choose y1 ∈ A and y2 ∈ A \ B(y1 , ε). By the same assumption there
exists a point y3 ∈ A \ (B(y1 , ε) ∪ B(y2 , ε)). Repeating this process we
obtain a sequence
 
yn ∈ A \ ( ⋃_{i=1}^{n−1} B(yi , ε) ) ,

which satisfies d(yn , ym ) ≥ ε for all n, m ∈ N, n ≠ m. In other


words, (yn ) has no Cauchy subsequence and hence has no convergent
subsequence, thus contradicting sequential compactness.

Step 6: If A is sequentially compact then A is compact.


Let {Di } be an open cover of A. Associate with this cover a positive r
given by Step 4. By Step 5 (see also its proof) there is a finite number
of points, say y1 , y2 , . . . , yp ∈ A, such that
A ⊂ ∪pj=1 B(yj , r) .
By Step 4, each ball B(yj , r) is contained in some Dij . Hence {Di1 ,
Di2 , . . . , Dip } is a finite (open) subcover of A.

Definition 2.13. A set A ⊂ (X, d) is called relatively compact if


Cl A is compact.

Corollary 2.14. A set A ⊂ (X, d) is relatively compact if and only if


every sequence in A has a convergent subsequence.

Proof. Assume that any sequence in A has a convergent subsequence


(its limit being a point of Cl A). Then Cl A is sequentially compact
(hence compact) in (X, d). Indeed, if (xn ) is a sequence in Cl A, then
there exists a sequence (yn ) in A such that d(xn , yn ) < 1/n for all
n ∈ N. As (yn ) has a convergent subsequence (ynk ), it follows that
(xnk ) is also convergent. So the statement of the corollary holds true
by Theorem 2.12.

Now let us recall a result due to Bolzano and Weierstrass.

Theorem 2.15 (Bolzano–Weierstrass). Every bounded sequence in Rk


endowed with the Euclidean norm has a convergent subsequence.

Proof. This theorem is known for k = 1 (see Theorem 1.11) and ex-
tends easily to Rk : a bounded sequence in Rk is bounded in each
coordinate.

From the proof of Theorem 2.12 we see that every compact set in a
metric space is closed and bounded. The converse implication is not
true in general. However, we have the following result attributed to
Heine and Borel.3

Corollary 2.16 (Heine–Borel). Let ∅ ≠ A ⊂ Rk endowed with the


usual Euclidean metric. A is compact if and only if A is closed and
bounded (with respect to the same metric).

Proof. The forward implication is valid in every metric space, as ob-


served above. Conversely, assume that A is closed and bounded. Then
any sequence in A is bounded so it has a convergent subsequence (cf.
Theorem 2.15). Its limit belongs to A because A is closed. Thus A is
sequentially compact, hence compact by Theorem 2.12.

Remark 2.17. The Heine–Borel Theorem extends to any finite dimen-


sional space with Euclidean metric but may not be true for other

3
Heinrich Eduard Heine, German mathematician, 1821–1881; Émile Borel,
French mathematician, 1871–1956.

metrics. As an example, consider (R, d0 ), where d0 is the discrete


metric

d0 (x, y) = 0 if x = y, d0 (x, y) = 1 if x ≠ y.

Let A = N ⊂ R. A is bounded with respect to d0 because A ⊂ B(0, 2).


A = N is closed with respect to d0 , but it is not compact because the
open cover {B(n, 1/2)}n∈N has no finite subcover.
A collection of subsets of (X, d) is said to have the finite intersection
property if the intersection of every finite sub-collection of the family
is nonempty.

Theorem 2.18. If a collection of compact subsets of a metric space


(X, d), say {Ki }i∈I , has the finite intersection property, then
∩i∈I Ki ≠ ∅.

Proof. The statement is trivial if I is a finite set, so let us assume that


I is infinite. Assume to the contrary that ∩i∈I Ki = ∅. Hence

X = ∪i∈I (X \ Ki )
  = (X \ Ki0 ) ∪ ( ∪i∈I1 (X \ Ki ) ) , (2.3.5)

where i0 ∈ I is an arbitrary but fixed index, and I1 = I \ {i0 }. It


follows that
Ki0 ⊂ ∪i∈I1 (X \ Ki ) .
As Ki0 is compact and {X \ Ki }i∈I1 is an open cover of Ki0 , there is
a finite set J ⊂ I1 such that

Ki0 ⊂ ∪i∈J (X \ Ki ).

Therefore (see (2.3.5)),

X = ∪i∈J1 (X \ Ki ) ,

where J1 = J ∪ {i0 }, or equivalently

∅ = ∩i∈J1 Ki ,

which contradicts our assumption because J1 is finite.



2.4 Continuous Functions on Compact Sets


Let (X, d) and (X1 , d1 ) be two metric spaces. A function

f : D ⊂ (X, d) → (X1 , d1 )

is said to be continuous at some point x0 ∈ D if for every neigh-


borhood V ⊂ (X1 , d1 ) of f (x0 ) there exists a neighborhood U ⊂ (X, d)
of x0 such that f (U ∩ D) ⊂ V , or equivalently

∀ε > 0, ∃δ > 0 : ∀x ∈ D, d(x, x0 ) < δ ⇒ d1 (f (x), f (x0 )) < ε .


(2.4.6)
U and δ depend on ε and x0 . The continuity of f at x0 ∈ D can also
be equivalently expressed by using sequences

xn ∈ D, d(xn , x0 ) → 0 =⇒ d1 (f (xn ), f (x0 )) → 0 .

If f is continuous at all x0 ∈ D then we say that f is continuous


on D (or simply continuous). The function f is called uniformly
continuous on D if δ can be the same for all x0 ∈ D, i.e., δ is
independent of x0 ∈ D (it depends only on ε).

Theorem 2.19. If D ⊂ (X, d) is a nonempty compact set and f :


D → (X1 , d1 ) is continuous (on D), then the following hold:

• f is uniformly continuous on D;

• the set f (D) := {f (x); x ∈ D} is compact in (X1 , d1 );

• C(D; X1 ) := {f : D → (X1 , d1 ); f continuous on D} is a metric


space with respect to the metric d̃(f, g) = sup_{x∈D} d1 (f (x), g(x)).

If in addition (X1 , d1 ) is complete, then (C(D; X1 ), d̃) is also complete.

The proof is left to the reader as an exercise.

Theorem 2.20 (Weierstrass). If D ⊂ (X, d) is a nonempty compact


set and f : D → R (R being equipped with the usual metric), then
f (D) is closed and bounded, and there exist x0 , y0 ∈ D such that
f (x0 ) = inf f (D) and f (y0 ) = sup f (D).

Proof. The first part of the theorem follows from Theorem 2.19 which
in particular says that f (D) is compact (in R), hence closed and
bounded. So the infimum and supremum of f (D), denoted m and
M , are finite numbers. Now, for all n ∈ N there exists an xn ∈ D such
that
1
m ≤ f (xn ) < m + . (2.4.7)
n
As D is a compact set, (xn ) has a subsequence which converges to
some x0 ∈ D. This fact combined with (2.4.7) implies m = f (x0 ).
Similarly, there is a point y0 ∈ D such that M = f (y0 ).

Equivalent Norms. Let X be a linear space over K (as usual K is


either R or C). Two norms on X, say ‖ · ‖ and ‖ · ‖∗ , are said to be
equivalent if there exist two positive constants C1 , C2 such that
C1 ‖x‖ ≤ ‖x‖∗ ≤ C2 ‖x‖ ∀x ∈ X . (2.4.8)
Obviously, two equivalent norms on X generate the same topology
on X.
If X is a k-dimensional linear space, k ∈ N, with a basis B = {u1 , . . . ,
uk }, then X can be equipped with different norms, such as
‖x‖max = max_{1≤i≤k} |αi |,
‖x‖p = ( ∑_{i=1}^{k} |αi |p )^{1/p} , p ∈ [1, ∞),

for all x = ∑_{i=1}^{k} αi ui ∈ X. Note that ‖ · ‖2 is precisely the Euclidean
norm of X introduced before.
Theorem 2.21. If X is a k-dimensional linear space, k ∈ N, then
any two norms on X are equivalent.
Proof. It is enough to show that any norm ‖ · ‖ on X is equivalent to the
Euclidean norm ‖ · ‖2 . On the one hand, for any x = ∑_{i=1}^{k} αi ui ∈ X,
we have

‖x‖ ≤ ∑_{i=1}^{k} |αi | · ‖ui ‖
     ≤ max_{1≤i≤k} ‖ui ‖ · ∑_{i=1}^{k} |αi |
     ≤ √k max_{1≤i≤k} ‖ui ‖ · ‖x‖2 . (2.4.9)

We have used the triangle inequality and the Bunyakovsky–Cauchy–
Schwarz inequality. Denoting C := √k max_{1≤i≤k} ‖ui ‖, we can derive
from (2.4.9)
‖x‖ ≤ C‖x‖2 ∀x ∈ X . (2.4.10)
In order to get the other inequality we use Theorem 2.20. Observe
that ‖ · ‖ is a continuous function on (X, ‖ · ‖2 ):

|‖x‖ − ‖x0 ‖| ≤ ‖x − x0 ‖ ≤ C‖x − x0 ‖2 ,

so ‖ · ‖ is bounded and attains its infimum, say C1 , on the unit sphere
S2 (0, 1) = {x ∈ X; ‖x‖2 = 1} (which is compact in (X, ‖ · ‖2 )), i.e.,

‖x‖ ≥ C1 ∀x ∈ S2 (0, 1) . (2.4.11)

C1 cannot be zero since it is the value of ‖ · ‖ at a point in S2 (0, 1).


From (2.4.11) we easily derive

C1 ‖x‖2 ≤ ‖x‖ ∀x ∈ X . (2.4.12)

According to (2.4.10) and (2.4.12), the two norms are equivalent, as


claimed.

Remark 2.22. In infinite dimensional linear spaces there exist norms


which are not equivalent. For instance, let us consider the following
two norms on the real linear space X = C[a, b] := C([a, b]; R), −∞ <
a < b < +∞,
‖f ‖ = sup{|f (t)|; a ≤ t ≤ b}, ‖f ‖1 = ∫_a^b |f (t)| dt .
We have
‖f ‖1 ≤ (b − a)‖f ‖ ∀f ∈ X ,
i.e., the sup-norm ‖ · ‖ is stronger than ‖ · ‖1 . But the two norms are
not equivalent. Indeed, let (fn ) be the sequence in X defined by

fn (t) = 0 for a ≤ t ≤ b − 1/n, fn (t) = nt + 1 − nb for b − 1/n < t ≤ b ,

where n ∈ N, n > 1/(b − a). Clearly ‖fn ‖ = 1, but

‖fn ‖1 = ∫_{b−1/n}^{b} |nt + 1 − nb| dt = 1/(2n) ,

so there does not exist C such that ‖fn ‖ ≤ C‖fn ‖1 because ‖fn ‖1 → 0
as n → ∞.
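The computation behind this example can be verified numerically (a sketch with [a, b] = [0, 1]; a Riemann sum approximates the integral):

```python
import numpy as np

# On [a,b] = [0,1]: the sup-norm of f_n is 1 (attained at t = b), while the
# L^1-norm is the area of a triangle with base 1/n and height 1, i.e. 1/(2n).
a, b = 0.0, 1.0
t = np.linspace(a, b, 200001)
h = t[1] - t[0]
for n in (2, 10, 100):
    fn = np.where(t <= b - 1.0 / n, 0.0, n * t + 1 - n * b)
    sup_norm = np.abs(fn).max()
    l1_norm = h * np.abs(fn).sum()   # Riemann-sum approximation of ∫|f_n|
    print(n, sup_norm, round(l1_norm, 6), 1 / (2 * n))
```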

Remark 2.23. It follows from Theorem 2.21 that any norm on a finite
dimensional linear space generates the same topology as that defined
by the Euclidean norm, so any topological result involving the Eu-
clidean norm is also valid with respect to any other norm. In particu-
lar, the Heine–Borel Theorem is valid in any finite dimensional linear
space equipped with any norm. Throughout the rest of this book,
Rk and any other finite dimensional linear space is always considered
as a normed space, equipped with the norm topology (generated by
any convenient norm), unless otherwise specified. The next result is a
characterization (due to Riesz4 ) of the finite dimensionality of normed
spaces clarifying the Heine–Borel Theorem.

Theorem 2.24 (Riesz). Let (X, ‖ · ‖) be a normed linear space. X is


finite dimensional if and only if every closed bounded subset of X is
compact.

In order to prove Theorem 2.24, we need the following lemma.

Lemma 2.25. Let (X, ‖ · ‖) be a normed space. Let X1 ⊂ X be a


proper, closed linear subspace of X. Then there exists x0 ∈ X \ X1
such that

‖x0 ‖ = 1 ,
‖x − x0 ‖ ≥ 1/2 ∀x ∈ X1 .
Proof. Choose x1 ∈ X \ X1 and let ρ = d(x1 , X1 ) := inf{‖x1 − z‖; z ∈
X1 }. We first prove ρ > 0. Suppose ρ = 0. Then there exists a
sequence zn ∈ X1 such that ‖x1 − zn ‖ < 1/n, hence zn → x1 . As X1
is closed, this implies x1 ∈ X1 , which is a contradiction.
By the definition of ρ there exists x2 ∈ X1 such that ‖x1 − x2 ‖ < 2ρ.
Let
x0 = (x1 − x2 )/‖x1 − x2 ‖ .

4
Frigyes Riesz, Hungarian mathematician, 1880–1956.

Clearly x0 ∈ X \ X1 and ‖x0 ‖ = 1. For x ∈ X1 we have

‖x − x0 ‖ = ‖x − ‖x1 − x2 ‖−1 (x1 − x2 )‖
          = (1/‖x1 − x2 ‖) ‖x1 − v‖
          ≥ (1/(2ρ)) ‖x1 − v‖
          ≥ ρ/(2ρ) = 1/2 ,

where v = x2 + ‖x1 − x2 ‖x ∈ X1 .

Proof of Theorem 2.24. The necessity part follows from the Heine–
Borel Theorem extended to finite dimensional linear spaces (see Re-
mark 2.23).
To prove sufficiency, assume by way of contradiction that X is not
finite dimensional, i.e., there exist infinitely many distinct points in
X, say x1 , x2 , . . . , such that for all n ∈ N, Bn = {x1 , x2 , . . . , xn } is a
linearly independent system. Let Xn = Span Bn . Now, (Xn , ‖ · ‖) is
a closed space and Xn ⊂ Xn+1 (proper inclusion) for all n ∈ N. By
Lemma 2.25, there exists yn ∈ Xn+1 \ Xn for n ∈ N such that ‖yn ‖ = 1
and
‖yn − x‖ ≥ 1/2 ∀x ∈ Xn .
In particular ‖yn − ym ‖ ≥ 1/2 for all m, n ∈ N, m ≠ n. So (yn ) has
no Cauchy subsequence, hence no convergent subsequence. On the
other hand, yn ∈ Cl B(0, 1) ∀n ∈ N, so (yn ) should have a conver-
gent subsequence (since Cl B(0, 1) is compact by assumption). This
contradiction completes the proof.

Arzelà–Ascoli Criterion⁵
Let (X, d) and (X1 , d1 ) be metric spaces and let ∅ ≠ A ⊂ X. Denote as
usual by C(A; X1 ) the set of all continuous functions from A ⊂ (X, d)
to (X1 , d1 ).
Definition 2.26. A family of functions F ⊂ C(A; X1 ) is called
equicontinuous if for all ε > 0 and all x ∈ A there exists δ > 0
such that y ∈ A and d(x, y) < δ implies d1 (f (x), f (y)) < ε for all
f ∈ F, i.e., δ = δ(ε, x) is independent of f .
⁵Cesare Arzelà, Italian mathematician, 1847–1912; Giulio Ascoli, Italian mathematician, 1843–1896.
2.4 Continuous Functions on Compact Sets 49

Definition 2.27. If in addition δ = δ(ε) (i.e., δ is independent of x
and f ), then F is uniformly equicontinuous, i.e., ∀ε > 0 ∃δ > 0 such
that ∀x, y ∈ A, d(x, y) < δ implies d1 (f (x), f (y)) < ε for all f ∈ F.

Remark 2.28. If A ⊂ (X, d) is compact and F ⊂ C(A; X1 ) is equicon-
tinuous, then F is uniformly equicontinuous (see Exercise 2.22 below).
Note also that if A is compact then C(A; X1 ) is a metric space with
respect to the metric d̃(f, g) = supx∈A d1 (f (x), g(x)); if in addition
(X1 , d1 ) is complete then (C(A; X1 ), d̃) is complete too, and in partic-
ular C(A; Rk ), k ∈ N, is a Banach space with respect to the sup-norm
‖f‖C(A; Rk ) = supx∈A ‖f (x)‖, where ‖·‖ is a norm of Rk .

Theorem 2.29 (Arzelà–Ascoli Criterion). Let ∅ ≠ A ⊂ (X, d) be
compact. Assume that F ⊂ C(A; Rk ) is equicontinuous and bounded
in C(A; Rk ) (i.e., ∃M > 0 such that ‖f (x)‖ ≤ M , ∀x ∈ A, ∀f ∈ F).
Then F is relatively compact in C(A; Rk ) equipped with the sup-norm.

Proof. For any δ > 0 we have A ⊂ ∪x∈A B(x, δ) and since A is com-
pact, there exists a finite subcover, so that A ⊂ ∪pj=1 B(yj , δ). Let
Cδ = {y1 , y2 , . . . , yp } and consider C = ∪i∈N C1/i . C is dense in A and
countable so C = {x1 , x2 , . . . }.
In order to prove that F is relatively compact in C(A; Rk ) it suffices
to show that any sequence in F has a convergent subsequence in this
space (cf. Corollary 2.14). So let (fn )n∈N be a sequence in F. Since
F is bounded in C(A; Rk ), the sequence (fn (x1 ))n∈N is bounded in Rk so there
exists a subsequence of (fn ),
f11 , f12 , . . . , f1n , . . .
which is convergent at x = x1 . By the same assumption this subse-
quence has a subsequence
f21 , f22 , . . . , f2n , . . .
which is convergent at x = x2 (and at x = x1 as well). Continuing the
process we obtain successive subsequences
fm1 , fm2 , . . . , fmn , . . .
..
.
Think of it as an infinite matrix and consider the diagonal sequence
(gn ) = (f11 , f22 , . . . , fnn , . . . ) which converges at any point of C. On
the other hand, as F is equicontinuous and A is compact, F is in fact

uniformly equicontinuous, i.e., for every ε > 0 there exists a δ = δ(ε) >
0 such that

    ∀z, w ∈ A, d(z, w) < δ implies ‖gn (z) − gn (w)‖ < ε  ∀n ∈ N .   (2.4.13)

We can choose δ = 1/i, with i ∈ N sufficiently large.
Now, for a given ε fix a δ = 1/i, so Cδ = C1/i is a finite set Cδ =
{y1 , . . . , yp } ⊂ C. If x ∈ A then it belongs to a ball B(yj , δ) for some
j ∈ {1, . . . , p} and we have, by (2.4.13) and the convergence of (gn (yj )),

gn (x) − gm (x)≤ gn (x) − gn (yj ) + gn (yj ) − gm (yj ) + gm (yj )
− gm (x) < ε + ε + ε = 3ε ∀n, m > N (ε, j) .

Therefore,

    ‖gn − gm‖C(A; Rk ) ≤ 3ε  ∀n, m > N (ε) := max j∈{1,...,p} N (ε, j) .

As C(A; Rk ) is a Banach space it follows that (gn ) converges in this
space.

Notice that in the above proof we have used two essential arguments:
the completeness of the space (Rk , ‖·‖) (implying that C(A; Rk ) is
a Banach space) and the fact that the set {f (x); f ∈ F} is bounded
in Rk (equivalently, relatively compact in this space) for all x ∈ A.
So the following generalization holds true:

Theorem 2.30. Let F ⊂ C(A; X1 ), where A ≠ ∅ is a compact subset
of (X, d) and (X1 , d1 ) is a complete metric space. Assume that F is
equicontinuous and {f (x); f ∈ F} is relatively compact in (X1 , d1 ) for
all x ∈ A. Then F is relatively compact in C(A; X1 ).

Peano’s Existence Theorem⁶


In what follows we illustrate the Arzelà–Ascoli Criterion with Peano’s
Existence Theorem which is a fundamental result in the theory of
ordinary differential equations.

Theorem 2.31 (Peano). Let a, b ∈ (0, ∞), t0 ∈ R, x0 ∈ Rk , and let Rk
be equipped with the norm ‖v‖ = max1≤i≤k |vi |. Let D be the set

    D = {(t, v) ∈ R × Rk ; |t − t0 | ≤ a, ‖v − x0‖ ≤ b} ⊂ Rk+1
⁶Giuseppe Peano, Italian mathematician, 1858–1932.

and let f : D → Rk be a continuous function. Then there exists a
continuously differentiable function x : [t0 − δ, t0 + δ] → Rk satisfying
the equation

    x′(t) = f (t, x(t))  ∀t ∈ [t0 − δ, t0 + δ] ,     (2.4.14)

and the initial (Cauchy) condition

    x(t0 ) = x0 ,                                    (2.4.15)

where δ = min(a, b/M ) with M = sup{‖f (t, v)‖; (t, v) ∈ D}. M is
assumed to be a positive number, because the case M = 0 ⇐⇒ f ≡ 0
is trivial.
Proof. We shall use Euler’s method of polygonal lines.⁷
Since f ∈ C(D; Rk ) and D is compact, f is uniformly continuous, i.e.,
∀ε > 0, ∃δ = δ1 (ε) > 0 such that

    (t, v1 ), (s, v2 ) ∈ D, |t − s| < δ1 , ‖v1 − v2‖ < δ1 ⇒ ‖f (t, v1 ) − f (s, v2 )‖ < ε.

We shall only prove existence on I := [t0 , t0 + δ]. By symmetry we get
the other side; however, we have to check that the solution is differen-
tiable at t = t0 . Given

    x(t) = { xr (t),  t ∈ [t0 , t0 + δ],
           { xl (t),  t ∈ [t0 − δ, t0 ],

we have

    x′− (t0 ) = (dxl /dt)(t0 ) = f (t0 , x0 ) = (dxr /dt)(t0 ) = x′+ (t0 ) .
Consider the uniform subdivision

    Δ : t0 < t1 < · · · < tN = t0 + δ ,

i.e., tj = t0 + jhε , j = 0, 1, . . . , N , with hε ≤ min{δ1 (ε), δ1 (ε)/M }.


Now for a given ε > 0 construct φε : I → Rk as

    φε (t) = { φε (tj ) + (t − tj )f (tj , φε (tj )),  tj < t ≤ tj+1 ,
             { x0 ,                                   t = t0 .
⁷Leonhard Euler, Swiss mathematician, physicist, astronomer, logician, and engineer, 1707–1783.

The graph of φε is a polygonal line, called Euler’s polygonal line, and
we shall see that, for small ε, it approximates the trajectory of the
solution of problem (2.4.14) and (2.4.15). For k = 1 Euler’s polygonal
line can be visualized in the (t, x) coordinate plane.
Consider the family F = {φε ; ε > 0}. Let us first show that φε
is well defined on I for all ε > 0. On the interval [t0 , t1 ], φε (t) =
x0 + (t − t0 )f (t0 , x0 ) and

    ‖φε (t) − x0‖ ≤ M (t − t0 ) ≤ M δ ≤ b ,

so (t, φε (t)) ∈ D. In particular (t1 , φε (t1 )) ∈ D.


So on [t1 , t2 ], φε (t) = φε (t1 ) + (t − t1 )f (t1 , φε (t1 )) is well defined and

    ‖φε (t) − x0‖ ≤ ‖φε (t) − φε (t1 )‖ + ‖φε (t1 ) − x0‖
                 ≤ (t − t1 )M + (t1 − t0 )M
                 ≤ (t − t0 )M ≤ M δ ≤ b ,

so by induction φε (t) is well defined and continuous on I and

    ‖φε (t) − x0‖ ≤ M (t − t0 ) ≤ M δ ≤ b        (2.4.16)

for all t ∈ I. Thus

    ‖φε (t)‖ ≤ ‖φε (t) − x0‖ + ‖x0‖ ≤ b + ‖x0‖ .

Therefore, F is a bounded subset of C(I; Rk ). In order to apply the
Arzelà–Ascoli Theorem, we need to show that F is equicontinuous.
If t, s ∈ [tj , tj+1 ] then ‖φε (t) − φε (s)‖ ≤ M |t − s|. If t, s are in different
intervals, say t ∈ [tp , tp+1 ], s ∈ [tq , tq+1 ] with p < q, then

    ‖φε (t) − φε (s)‖ ≤ ‖φε (s) − φε (tq )‖ + ‖φε (tq ) − φε (tq−1 )‖ + · · ·
                          + ‖φε (tp+1 ) − φε (t)‖
                      ≤ M (s − tq ) + M (tq − tq−1 ) + · · · + M (tp+1 − t)
                      ≤ M (s − t) = M |t − s| ,

so F is equicontinuous; in fact, it is Lipschitz equicontinuous. Thus
by the Arzelà–Ascoli Criterion there is a sequence εn → 0+ such that

φεn converges in C(I; Rk ) to some φ ∈ C(I; Rk ) as n → ∞. Also
(see (2.4.16))

    ‖φ(t) − x0‖ ≤ b ,

so (t, φ(t)) ∈ D for all t ∈ I.
Now it simply remains to prove that x = φ(t) is a solution of prob-
lem (2.4.14) and (2.4.15). Define

    gεn (t) = { φ′εn (t) − f (t, φεn (t)),  t ≠ tⁿj ,
              { 0,                         otherwise ,

where {tⁿj } is the subdivision of I corresponding to εn . If tⁿj < t < tⁿj+1
then φ′εn (t) = f (tⁿj , φεn (tⁿj )). For t ∈ (tⁿj , tⁿj+1 ), we have |t − tⁿj | ≤
hεn ≤ δ1 (εn ), and

    ‖φεn (t) − φεn (tⁿj )‖ ≤ M |t − tⁿj | ≤ M hεn ≤ δ1 (εn ) .

The final inequality holds by the definition of hε . Because f is uni-
formly continuous,

    ‖gεn (t)‖ ≤ εn  ∀n, ∀t ∈ I ,

so gεn converges uniformly to 0.


On the other hand, for all t ∈ I

    ∫_{t0}^{t} gεn (s) ds = φεn (t) − x0 − ∫_{t0}^{t} f (s, φεn (s)) ds .     (2.4.17)

Now since φεn converges uniformly to φ on I and f is continuous on
D, f (s, φεn (s)) → f (s, φ(s)) uniformly on I as n → ∞. Therefore,
passing to the limit in (2.4.17), we get

    φ(t) = x0 + ∫_{t0}^{t} f (s, φ(s)) ds ,  t ∈ I,

so x = φ(t) is a solution to the given Cauchy problem (2.4.14) and
(2.4.15).

Remark 2.32. There is no guarantee of uniqueness. For example the
Cauchy problem

    x′(t) = 2√|x(t)| ,  x(0) = 0 ,

with a = b = 1, D = [−1, 1] × [−1, 1], f (t, v) = 2√|v|, has the following
solutions:

    x1 (t) = 0,  −1 ≤ t ≤ 1,

    x2 (t) = { −t²,  −1 ≤ t ≤ 0,
             { 0,     0 < t ≤ 1,

    x3 (t) = { 0,    −1 ≤ t ≤ 0,
             { t²,    0 < t ≤ 1,

    x4 (t) = { −t²,  −1 ≤ t ≤ 0,
             { t²,    0 < t ≤ 1.

Note that all these solutions are defined on the whole interval [−1, 1],
even if the existence interval given by Peano’s Theorem is smaller:
δ = min{a, b/M } = min{1, 1/2} = 1/2. A solution which is defined on
the whole initial interval [t0 − a, t0 + a] in the case of problem (2.4.14)
and (2.4.15) is called a global solution. In particular, the above
four solutions are global solutions. In fact, there are infinitely many
solutions of the above Cauchy problem (see Exercise 2.28 below).
Peano’s Theorem provides only a local solution, i.e., a solution de-
fined on an interval around t0 which in general is smaller than the
initial interval. If f in (2.4.14) is defined on an open set Ω ⊂ Rk+1
then one can associate with each pair (t0 , x0 ) ∈ Ω a box D ⊂ Ω so
that Peano’s Theorem gives a local solution to problem (2.4.14) and
(2.4.15) defined on an interval which depends on (t0 , x0 ).
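A quick numerical sanity check (my own illustration, not part of the text): the gluing x4(t) = −t² for t ≤ 0 and t² for t > 0, i.e., x4(t) = t|t|, really does satisfy x′ = 2√|x|; away from the gluing point t = 0 a central difference reproduces the right-hand side.

```python
def x4(t):
    # x4(t) = t|t|: equals -t^2 for t <= 0 and t^2 for t > 0
    return -t * t if t <= 0 else t * t

def rhs(x):
    # right-hand side f(t, v) = 2*sqrt(|v|) of the Cauchy problem
    return 2.0 * abs(x) ** 0.5

h = 1e-6
for k in range(-9, 10):
    t = k / 10.0
    if t == 0.0:
        continue  # at the gluing point both one-sided derivatives are 0 = rhs(0)
    dxdt = (x4(t + h) - x4(t - h)) / (2 * h)  # central difference
    assert abs(dxdt - rhs(x4(t))) < 1e-5
print("x4 satisfies x'(t) = 2*sqrt(|x(t)|) at the sampled points")
```

The same check applied to the zero solution is trivial, which is exactly the non-uniqueness phenomenon of the remark.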

By requiring additional conditions one can guarantee uniqueness. For
example, we get uniqueness if, in addition, f satisfies a Lipschitz con-
dition: ∃L > 0 such that

    ‖f (t, v1 ) − f (t, v2 )‖ ≤ L‖v1 − v2‖        (2.4.18)

for all (t, v1 ), (t, v2 ) ∈ D. Let x = φ(t), y = ψ(t) for t ∈ I = [t0 , t0 + δ]
be two solutions of problem (2.4.14) and (2.4.15). Then

    ‖φ(t) − ψ(t)‖ ≤ L ∫_{t0}^{t} ‖φ(s) − ψ(s)‖ ds,

or, equivalently,

    (d/dt) [ e^(−Lt) ∫_{t0}^{t} ‖φ(s) − ψ(s)‖ ds ] ≤ 0 ,

for all t ∈ I. It follows easily that φ(t) = ψ(t) for all t ∈ I. Uniqueness
on [t0 − δ, t0 ] follows by converting problem (2.4.14) and (2.4.15) on
[t0 − δ, t0 ] into a similar Cauchy problem on [0, δ] by using the change
τ = t0 − t. Therefore, we can state the following result.

Theorem 2.33. Under the assumptions of Peano’s Theorem (Theo-
rem 2.31), plus (2.4.18), there exists a unique function x ∈ C 1 ([t0 −
δ, t0 + δ]; Rk ) satisfying (2.4.14) and (2.4.15), where δ is the same as
in Theorem 2.31.

Remark 2.34. Peano’s Theorem is no longer valid in infinite dimen-
sions, i.e., if Rk is replaced by an infinite dimensional Banach space
(see [18]).
Euler’s Difference Scheme.
If x = φ(t) is unique, then φε → φ in C(I; Rk ) as ε → 0+ so the
polygonal line corresponding to φε approximates the graph of φ. Let
Δ : t0 < t1 < · · · < tN = t0 + δ with tj = t0 + jh and h = δ/N . The
points (tj , φε (tj )) give us the polygonal line approximation. Denoting
φj := φε (tj ) we have

    φj+1 = φj + hf (tj , φj ),  j = 0, 1, . . . , N − 1,
    φ0 = x0 .

This is an explicit difference scheme, called Euler’s scheme. Its solution
provides the vertices of a polygonal line approximation, so Euler’s
scheme is important for the numerical analysis of the solutions of
differential equations.
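Euler’s scheme lends itself directly to implementation. The following Python sketch is my own illustration (not from the book); the test equation x′ = x, x(0) = 1 is a hypothetical choice whose exact solution eᵗ makes the discretization error easy to observe.

```python
import math

def euler(f, t0, x0, delta, N):
    """Euler's explicit scheme: phi_{j+1} = phi_j + h f(t_j, phi_j)."""
    h = delta / N
    t, phi = t0, x0
    for j in range(N):
        phi = phi + h * f(t, phi)  # advance along the tangent line
        t = t0 + (j + 1) * h
    return phi  # approximation of x(t0 + delta)

# Hypothetical test problem: x' = x, x(0) = 1, exact solution exp(t).
approx = euler(lambda t, x: x, 0.0, 1.0, 1.0, 100000)
print(abs(approx - math.e))  # discretization error, roughly e/(2N)
```

Halving h roughly halves the error, consistent with the scheme being first-order accurate.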

2.5 The Banach Contraction Principle


We saw in the previous section that under the assumptions of Peano’s
Existence Theorem (Theorem 2.31) plus the Lipschitz condition (2.4.18)
the Cauchy problem

    x′(t) = f (t, x(t)),  x(t0 ) = x0        (2.5.19)

has a unique solution x ∈ C 1 (I; Rk ), where I = [t0 − δ, t0 + δ], with


δ as defined in the statement of Theorem 2.31. This (existence and
uniqueness) result can also be derived by applying the general Banach⁸
⁸Stefan Banach, Polish mathematician, 1892–1945.

Contraction Principle (also known as the Banach Fixed Point Theo-
rem) we present below. Before stating this principle let us explain how
problem (2.5.19) can be reduced to a fixed point problem. Note that
problem (2.5.19) is equivalent to the integral equation

    x(t) = x0 + ∫_{t0}^{t} f (s, x(s)) ds .        (2.5.20)

Denote X = {v ∈ C(I; Rk ); ‖v(t) − x0‖ ≤ b, t ∈ I}. This is a complete
metric space since it is a closed subset of the Banach space C(I; Rk )
equipped with the sup-norm, denoted ‖·‖C , which gives the metric
d(u, v) = ‖u − v‖C . Define on X the map (operator) T by

    (T v)(t) = x0 + ∫_{t0}^{t} f (s, v(s)) ds ,  ∀v ∈ X .

We prefer the notation T v instead of T (v). It is easily seen that under
the assumptions above T v ∈ X for all v ∈ X, i.e., T : X → X.
Equation (2.5.20) can be simply written as

    x = Tx ,        (2.5.21)

so the above Cauchy problem (or Eq. (2.5.20)) reduces to solving
Eq. (2.5.21) in X. In other words, the Cauchy problem (2.5.19) has
a unique solution x defined on I if and only if T has a unique fixed
point x: x = T x. We do not go into further details concerning the
above Cauchy problem, or Eq. (2.5.20), since later on we will address
Volterra equations which are more general. We simply wanted to mo-
tivate the Banach Contraction Principle which is applicable to many
other problems.

Theorem 2.35 (Banach Contraction Principle). Let (X, d) be a com-
plete metric space, and assume T : X → X is a contraction, i.e.,
∃α ∈ (0, 1) such that d(T x, T y) ≤ αd(x, y) for all x, y ∈ X. Then T
has a unique fixed point (i.e., ∃! x∗ ∈ X such that T x∗ = x∗ ).

Proof. We will use the method of successive approximations. Define
a sequence xn = T xn−1 for n ∈ N with x0 ∈ X arbitrary. We have by
induction

    d(xn+1 , xn ) ≤ αⁿ d(x1 , x0 ) = αⁿ d(T x0 , x0 ) ,  ∀n ∈ N .     (2.5.22)



We now prove that (xn ) is Cauchy in (X, d):

    d(xn+p , xn ) ≤ d(xn+p , xn+p−1 ) + d(xn+p−1 , xn+p−2 ) + · · · + d(xn+1 , xn ) ,

which by (2.5.22) is

    ≤ αⁿ (1 + α + · · · + αᵖ⁻¹ ) d(T x0 , x0 )
    = αⁿ ((1 − αᵖ)/(1 − α)) d(T x0 , x0 )
    ≤ (αⁿ/(1 − α)) d(T x0 , x0 ) .

So (xn ) is Cauchy in (X, d) (as αⁿ → 0), and since (X, d) is complete, xn
converges to some x∗ ∈ X: d(xn , x∗ ) → 0. Now,

    d(x∗ , T x∗ ) ≤ d(T x∗ , xn ) + d(xn , x∗ )
                 = d(T x∗ , T xn−1 ) + d(xn , x∗ )
                 ≤ αd(x∗ , xn−1 ) + d(xn , x∗ ) ,

which converges to 0 as n → ∞, so d(x∗ , T x∗ ) ≤ 0 and thus x∗ is a
fixed point of T .
We now wish to show that x∗ is unique. Suppose that y ∗ is also a
fixed point of T , then d(x∗ , y ∗ ) = d(T x∗ , T y ∗ ) ≤ αd(x∗ , y ∗ ), so (1 −
α)d(x∗ , y ∗ ) ≤ 0 which implies x∗ = y ∗ .

Remark 2.36. The assumption α < 1 in Theorem 2.35 is essential as
the following counterexample from Natanson⁹ [38, p. 571] shows. If
X = R, and T : R → R is given by T x = x + π/2 − arctan x, then T has
no fixed point because π/2 − arctan x > 0 ∀x ∈ R. On the other hand,
by the Mean Value Theorem, we have for all x, y ∈ R, x ≠ y,

    |T x − T y| ≤ |x − y − arctan x + arctan y|
               = |x − y − (x − y)/(1 + z²)|  for some z between x and y
               = |x − y| · (1 − 1/(1 + z²))
               < |x − y| ,
⁹Isidor P. Natanson, Russian mathematician, 1906–1963.

so, even though the inequality is strict, α = 1 and hence T is not
a contraction. Thus, the fact that this T has no fixed point is not
surprising.
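This counterexample is easy to watch numerically; the sketch below (my own illustration) iterates T and confirms that the gap T x − x stays positive while the iterates drift off toward +∞.

```python
import math

# Natanson's map from Remark 2.36: strictly nonexpansive, yet fixed-point free,
# since T x - x = pi/2 - arctan x > 0 for every real x.
def T(x):
    return x + math.pi / 2 - math.atan(x)

x = 0.0
for _ in range(100000):
    x = T(x)
print(x, T(x) - x)  # the iterates keep growing; the gap shrinks but never vanishes
```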
Remark 2.37. From the above proof we see that

    d(xn , x∗ ) ≤ (αⁿ/(1 − α)) d(T x0 , x0 ) ,

which gives us an approximation of x∗ .
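The proof of Theorem 2.35, together with the error bound of Remark 2.37, translates into a short algorithm. In the Python sketch below (my own illustration), T x = cos x on [0, 1] is an assumed example of a contraction, with α = sin 1 < 1 by the Mean Value Theorem; its fixed point is the so-called Dottie number ≈ 0.7390851.

```python
import math

def fixed_point(T, x0, alpha, tol=1e-12, max_iter=100000):
    """Successive approximations x_{n+1} = T x_n (proof of Theorem 2.35).

    Stops when the a priori bound of Remark 2.37,
    d(x_n, x*) <= alpha**n / (1 - alpha) * d(T x0, x0), drops below tol.
    """
    d0 = abs(T(x0) - x0)
    x, n = x0, 0
    while alpha ** n / (1 - alpha) * d0 > tol and n < max_iter:
        x = T(x)
        n += 1
    return x

# Assumed example: cos maps [0, 1] into itself and |cos'| <= sin 1 < 1 there.
x_star = fixed_point(math.cos, 0.5, math.sin(1.0))
print(x_star)  # the unique fixed point of cos, about 0.7390851332
```

The stopping rule is exactly the a priori estimate: no knowledge of x∗ is needed to certify the accuracy of xₙ.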

Remark 2.38. Suppose that T ᵏ = T ◦ · · · ◦ T (k factors), k ≥ 2, is a
contraction (even though T may not be); then there is a unique fixed
point for T .
Proof. A fixed point of T is obviously a fixed point of T ᵏ. Conversely, if
x∗ is a fixed point of T ᵏ (which exists and is unique by Theorem 2.35)
then T x∗ = T ᵏ⁺¹ x∗ = T ᵏ (T x∗ ), so both x∗ and T x∗ are fixed points
of T ᵏ, and consequently T x∗ = x∗ .

2.6 Exercises
1. Let A1 , A2 , . . . be subsets of a metric space. Prove that
   Cl (∪n i=1 Ai ) = ∪n i=1 Cl Ai for all n ∈ N and Cl (∪∞ i=1 Ai ) ⊃ ∪∞ i=1 Cl Ai .
   Show by an example that the latter inclusion can be proper.

2. Let A be a subset of a metric space. Do A and Cl A always have
   the same interior? Do A and Int A always have the same closure?

3. Prove that if X ≠ ∅ and d0 is the discrete metric on X, then any
   subset of (X, d0 ) is open.

4. Let ∅ ≠ A ⊂ (X, d). Prove that

       p ∈ Cl A ⇔ inf {d(p, x) : x ∈ A} = 0 .

5. Let (X, d) be a metric space, ∅ ≠ A ⊂ (X, d), and let (Y, ‖·‖)
   be a Banach space. Denote

       BC(A; Y ) := {f : (A, d) → (Y, ‖·‖); f continuous and bounded}.

   Prove that BC(A; Y ) is a Banach space with respect to the sup-
   norm: ‖f‖sup = supx∈A ‖f (x)‖.

6. Find the accumulation points of the following subsets of R2
   (equipped with the Euclidean metric):

   (a) Z × Z ;
   (b) Q × Q ;
   (c) {(m/n, 1/n); m, n ∈ Z, n ≠ 0} ;
   (d) {(1/n + 1/m, 0); m, n ∈ Z \ {0}} .

7. Find the boundaries of the following sets:

   (a) A = [0, 1] ∩ Q ;
   (b) B = {1/n; n ∈ N} ;
   (c) C = {(x, y) ∈ R2 ; x² − y² > 1} .

8. Let (X, d) be a linear, metric space with d defined by a norm ‖·‖
   (i.e., d(x, y) = ‖x − y‖, ∀x, y ∈ X). Prove that the closure of any
   open ball B(x, r) := {v ∈ X; d(v, x) < r} in (X, d) is the closed
   ball B̄(x, r) := {v ∈ X; d(v, x) ≤ r}. Show that this property
   fails to be true if X is equipped with the discrete metric d0 .

9. Show that any Cauchy sequence in a metric space can have at
   most one cluster point.

10. Find the cluster points of the following sequences:

    (a) xn = sin (2π√(n² + 3n)), n = 1, 2, . . . ;
    (b) yn = sin (π√(n² + n)), n = 1, 2, . . .

11. Show that B := {f ∈ C([0, 1]; R) : f (x) > 0 for all x ∈ [0, 1]}
is open in C([0, 1]; R) equipped with the metric generated by
the sup-norm. What is the closure of B in this metric (in fact
Banach) space?

12. Denote

        BC(R; R) := {f : R → R; f is continuous and f (R) is bounded} .
Let D := {f ∈ BC(R; R); f (x) > 0 for all x ∈ R}. Is D open in
BC(R; R) equipped with the sup-norm? If not, what is Int D?
What is Cl D?

13. Find an open cover of (0, 1] ⊂ (R, | · |) which has no finite sub-
cover.

14. Find a necessary and sufficient condition for a discrete subset of
    a metric space (X, d) to be compact. [Recall that S ⊂ (X, d) is
    discrete if all its elements are isolated.]

15. If A is a nonempty compact subset of a metric space (X, d), then
    A is separable (i.e., there exists a countable subset S of A such
    that A = Cl S).

16. Let A, B be nonempty subsets of a normed space (X, ‖·‖)
    equipped with the topology given by the metric d defined by
    d(x, y) = ‖x − y‖, x, y ∈ X. We have the following:

    (a) If A, B are both compact sets, then A + B := {u + v; u ∈
        A, v ∈ B} is compact, too;
    (b) If A is closed and B is compact, then A + B is closed, but
        not necessarily compact (give a counterexample).

17. Let f : [0, ∞) → R,

        f (x) = { sin (π(2x − 1)),  x ∈ [1/2, 1],
                { 0,                otherwise.

    Let fn (x) = f (2ⁿ x) for x ∈ [0, 1], n ∈ N. Show that F =
    {fn ; n ∈ N} is closed and bounded in C[0, 1] := C([0, 1]; R)
    equipped with the sup-norm, but not compact.

18. Let (X, d) be a complete metric space. If ∅ ≠ A ⊂ X is a totally
    bounded set, show that A is relatively compact.

19. Let l1 be the set of all sequences of real numbers a = (an )n∈N
    satisfying ∑∞ n=1 |an | < ∞. Show that

    (a) l1 is a Banach space over R with respect to the norm
        ‖a‖ = ∑∞ n=1 |an |, a ∈ l1 .
    (b) the set A = {a = (an )n∈N ∈ l1 ; ∑∞ n=1 n|an | ≤ 1} is compact
        in (l1 , ‖·‖) (i.e., in (X, d), where d is the metric generated
        by ‖·‖: d(a, b) = ‖a − b‖, a, b ∈ l1 ).

20. Let F be the set of all functions f : D = [0, 1] → R,

        f (x) = ∑∞ n=1 an sin (nπx),

    where a = (an )n∈N is a sequence in R satisfying ∑∞ n=1 n|an | ≤
    1. Show that F is a compact subset of C[0, 1] := C([0, 1]; R)
    equipped with the sup-norm. Does the result hold if the domain
    of the f ’s is D = R?
21. Let −∞ < a < b < ∞, un ∈ C 1 ([a, b]; R), n = 1, 2, . . . , such that
    (un )n∈N and (u′n )n∈N are bounded in Lp ([a, b]; R), p ∈ (1, ∞),
    equipped with the usual norm. Show that (un ) has a subsequence
    which is convergent in C([a, b]; R) with respect to the sup-norm.
    (Information on Lp spaces is available in Chap. 3 below.)
22. Let (X, d), (Y, ρ) be metric spaces, and let F ⊂ C(A; Y ), where
    ∅ ≠ A ⊂ X. If A is compact (with respect to d) and F is an
    equicontinuous family, then F is uniformly equicontinuous.
23. For a ∈ R consider fa : [0, 1] → R, fa (x) = x²/(1 + a²x²). Show that
    F = {fa ; a ∈ R} is relatively compact in C[0, 1] := C([0, 1]; R)
    equipped with the sup-norm, but not compact.
24. (a) Prove Gronwall’s lemma, namely given

            u(t) ≤ a(t) + ∫_{t0}^{t} b(s)u(s) ds,  t ∈ I = [t0 , T ],

        where u, a, b : I → R are all continuous functions and b ≥ 0,
        then

            u(t) ≤ a(t) + ∫_{t0}^{t} a(s)b(s) e^(∫_{s}^{t} b(τ )dτ ) ds  ∀t ∈ I.

        In particular, prove Bellman’s lemma, which states that in
        the case a is a constant function, i.e., a(t) = C ∀t ∈ I, then

            u(t) ≤ C e^(∫_{t0}^{t} b(s)ds) ,  t ∈ I.

    (b) Let x = x(t) : [t0 − δ, t0 + δ] → Rk be a solution given by
        Theorem 2.31 (Peano’s Theorem). Assume (in addition to
        continuity on D) that f satisfies the Lipschitz condition (2.4.18).
        Use Bellman’s lemma to prove that x is the unique solution
        of the corresponding Cauchy problem.

25. Prove that if

        (1/2) x(t)² ≤ (1/2) c² + ∫_{t0}^{t} f (s)x(s) ds  ∀t ∈ I = [t0 , T ] ,

    where c ∈ R, f, x ∈ C(I) := C(I; R), f ≥ 0 for all t ∈ I, then

        |x(t)| ≤ |c| + ∫_{t0}^{t} f (s) ds  ∀t ∈ I .

26. Show that the following Cauchy problem in R

        x′(t) = 1 + t² + x(t)²/(1 + x(t)²) ;  x(0) = 0 ,

    has a unique solution defined on R.

27. Do the same for the Cauchy problem

        x′(t) = 2e^(−t²) + ln (1 + x(t)²) ;  x(0) = 0 .

28. Show that the Cauchy problem

        x′(t) = 2√|x(t)|,  t ∈ R;  x(0) = 0,

    has infinitely many solutions defined on R.

29. Show that for every x0 ∈ R the Cauchy problem

        x′(t) = (1 + t)(1 + x(t)²),  t ≥ 0;  x(0) = x0 ,

    has a unique solution defined on a bounded interval.

30. Show that the Cauchy problem

        x′(t) = t² + x(t)² ,  x(0) = 0 ,

    has a solution whose maximal interval is (−T, T ), with
    √2/2 < T < ∞.

31. Let ∅ ≠ Ω ⊂ Rk+1 , k ≥ 2, be an open set, and let f : Ω → Rk be
    a continuous function. Then, for any (t0 , x0 ) ∈ Ω, the Cauchy
    problem

        (CP )  x′(t) = f (t, x(t)),  x(t0 ) = x0 ,

    has at least one solution defined on an interval around t0 . If, in
    addition, f satisfies the condition: ∀ compact K ⊂ Ω, ∃LK > 0
    such that ∀(t, u), (t, v) ∈ K,

        ‖f (t, u) − f (t, v)‖ ≤ LK ‖u − v‖ ,

    where ‖·‖ is a norm of Rk , then the (local) solution of (CP ) is
    unique.

32. Consider in an interval I ⊂ R the Cauchy problem

        x′(t) = A(t)x(t) + b(t),  t ∈ I ,
        x(t0 ) = x0 ,

    where t0 ∈ I, x0 = (x01 , x02 , . . . , x0k )ᵀ ∈ Rk , A(t) = (aij (t))
    is a k × k-matrix, and b(t) = (b1 (t), . . . , bk (t))ᵀ with aij , bj ∈
    C(I) := C(I; R), i, j = 1, 2, . . . , k. Show that the above Cauchy
    problem has a unique solution on the whole interval I.

33. Let T : B(0, 1) → B(0, 1) be a map satisfying

        ∀x, y ∈ B(0, 1),  d2 (T x, T y) ≤ d2 (x, y) ,

    where B(0, 1) is the closed unit ball of (Rk , d2 ), and d2 is the
    Euclidean metric. Show that T has at least one fixed point.

34. Prove that for every f ∈ C[0, 1] := C([0, 1]; R) and α ∈ (0, 1)
    the integral equation

        x(t) = f (t) + ∫_{0}^{1} e^(−ts) cos (αx(s)) ds ,  t ∈ [0, 1]

    has a unique solution x ∈ C[0, 1].

35. Let (X, ‖·‖) be a Banach space and let f : [0, ∞) × X → X be
    a continuous function satisfying

        ‖f (t, x1 ) − f (t, x2 )‖ ≤ a(t)‖x1 − x2‖ ,  t ∈ [0, ∞), x1 , x2 ∈ X ,

    where a ∈ C([0, ∞); R). Show that the Cauchy problem

        x′(t) = f (t, x(t)),  t ≥ 0;  x(0) = x0 ,

    has a unique solution x ∈ C 1 ([0, ∞); X).


Chapter 3

The Lebesgue Integral and Lp Spaces

In this chapter we discuss Lebesgue¹ measurable sets, Lebesgue mea-
surable functions, Lebesgue integration, and Lp spaces. These spaces,
equipped with appropriate norms, are significant examples of Banach
spaces.

3.1 Measurable Sets in Rk


Here we essentially follow [46]. First of all, for any closed cube C ⊂ Rk ,

    C = [a1 , b1 ] × [a2 , b2 ] × · · · × [ak , bk ],

where bi − ai = c > 0, i = 1, 2, . . . , k, we denote v(C) := cᵏ (which is
called the volume of C).
A collection of cubes in Rk is said to be almost disjoint if the interiors
of the cubes are disjoint.
It is easily seen that every open set D ⊂ Rk (equipped with the usual
norm topology) can be written as a countable union of almost disjoint
closed cubes: D = ∪∞ j=1 Cj . To prove this, consider a grid in Rk of
closed cubes of side length 1/n, with n sufficiently large, retaining the
cubes of the grid that are completely contained in D. Then, we bisect
each cube of the above grid into 2ᵏ cubes with side length 1/(2n) and

¹Henri Léon Lebesgue, French mathematician, 1875–1941.


retain those new cubes that are contained in D. Thus, repeating in-
definitely the procedure, we construct a countable collection of almost
disjoint closed cubes whose union equals D, as claimed.
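The grid-and-bisection argument above is constructive, and a one-dimensional Python sketch (my own illustration; the set D = (0, 1) and the starting interval (−2, 2) are assumed test data) makes it concrete: dyadic closed intervals inside D are kept, intervals meeting the boundary of D are bisected, and the total length of the kept intervals approaches the measure of D.

```python
def dyadic_decomposition(inside, meets, depth, lo=-2.0, hi=2.0):
    """1-D version of the construction in the text: collect almost disjoint
    closed dyadic intervals contained in an open set D ⊆ (lo, hi).

    `inside(a, b)`: is [a, b] ⊆ D?   `meets(a, b)`: does [a, b] meet D?
    Undecided intervals are bisected, up to `depth` levels.
    """
    selected, pending = [], [(lo, hi)]
    for _ in range(depth):
        next_pending = []
        for a, b in pending:
            m = (a + b) / 2
            for c, d in ((a, m), (m, b)):
                if inside(c, d):
                    selected.append((c, d))      # keep: entirely inside D
                elif meets(c, d):
                    next_pending.append((c, d))  # straddles the boundary: bisect
        pending = next_pending
    return selected

# Hypothetical test set D = (0, 1).
cubes = dyadic_decomposition(lambda a, b: 0 < a and b < 1,
                             lambda a, b: b > 0 and a < 1, depth=30)
total = sum(b - a for a, b in cubes)
print(total)  # approaches m(D) = 1 as depth grows
```

In Rᵏ the same loop would produce 2ᵏ sub-cubes per bisection; here the “cubes” are just intervals.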
Now, for any set M ⊂ Rk , we define the exterior measure of M by

    me (M ) = inf ∑∞ j=1 v(Cj ),

where the infimum is taken over all countable covers of M , ∪∞ j=1 Cj ⊃ M ,
with closed cubes Cj .

Some Remarks on the Exterior Measure

(a) Obviously, the exterior measure of a singleton is zero, and
    me (∅) = 0.

(b) If M1 ⊂ M2 ⊂ Rk , then me (M1 ) ≤ me (M2 ).

(c) If C is a closed cube in Rk , then me (C) = v(C).

    Indeed, we clearly have me (C) ≤ v(C), and in order to prove
    the converse inequality it suffices to show that for any cover by
    closed cubes ∪∞ j=1 Cj ⊃ C, we have

        v(C) ≤ ∑∞ j=1 v(Cj ).        (3.1.1)

    Let ε > 0 be arbitrary but fixed. Choose for each j an open cube
    C′j ⊃ Cj such that v(C′j ) ≤ (1 + ε)v(Cj ). Since {C′j }∞ j=1 is an
    open cover of the compact set C, there exists a finite subcover
    {C′j1 , . . . , C′jm }, C ⊂ ∪m i=1 C′ji . It follows that

        v(C) ≤ (1 + ε) ∑m i=1 v(Cji ) ≤ (1 + ε) ∑∞ j=1 v(Cj ).

    As ε was arbitrarily chosen, this implies (3.1.1).

(d) If C is an open cube in Rk , then me (C) = v(C̄).

(e) If M = ∪∞ j=1 Mj , then

        me (M ) ≤ ∑∞ j=1 me (Mj ).        (3.1.2)

    We can assume me (Mj ) < ∞ for all j ∈ N, otherwise the inequality
    is trivially satisfied. For arbitrary ε > 0 we can choose
    for each j a cover by closed cubes Mj ⊂ ∪∞ q=1 Cj,q such that

        ∑∞ q=1 v(Cj,q ) < me (Mj ) + ε/2ʲ .

    Then, M ⊂ ∪∞ j,q=1 Cj,q , hence

        me (M ) ≤ ∑∞ j,q=1 v(Cj,q )
                ≤ ∑∞ j=1 ( me (Mj ) + ε/2ʲ )
                = ∑∞ j=1 me (Mj ) + ε,

    which implies (3.1.2).

(f) For every M ⊂ Rk , we have

        me (M ) = inf{me (D); D open, D ⊃ M }.

    Clearly,

        me (M ) ≤ inf{me (D); D open, D ⊃ M }.

    For the converse inequality, let ε > 0 and choose a cover of M
    by closed cubes, M ⊂ ∪∞ j=1 Cj , such that

        ∑∞ j=1 v(Cj ) < me (M ) + ε/2 .

    Choose for every j an open cube C′j , such that Cj ⊂ C′j and

        v(C′j ) ≤ v(Cj ) + ε/2ʲ⁺¹ .

    Then, denoting D′ = ∪∞ j=1 C′j , we have that D′ is an open set
    and, by (e),

        me (D′ ) ≤ ∑∞ j=1 me (C′j ) = ∑∞ j=1 v(C′j )
                ≤ ∑∞ j=1 ( v(Cj ) + ε/2ʲ⁺¹ )
                = ∑∞ j=1 v(Cj ) + ε/2
                < me (M ) + ε.

    Hence, inf{me (D); D open, D ⊃ M } ≤ me (M ), as claimed.

(g) If M is a countable union of almost disjoint closed cubes, M =
    ∪∞ j=1 Cj , then me (M ) = ∑∞ j=1 v(Cj ).

    Indeed, by (c) and (e), me (M ) ≤ ∑∞ j=1 v(Cj ), and for the converse
    inequality we consider, for a fixed m ∈ N and an arbitrary
    but fixed ε > 0, closed cubes C̃j ⊂ Int(Cj ), j = 1, . . . , m, such that

        v(Cj ) < v(C̃j ) + ε/2ʲ ,  j = 1, . . . , m.

    Then,

        me (M ) ≥ me (∪m j=1 C̃j ) = ∑m j=1 v(C̃j ) ≥ ∑m j=1 v(Cj ) − ε,

    which implies me (M ) ≥ ∑∞ j=1 v(Cj ).

Definition 3.1. A set M ⊂ Rk is Lebesgue measurable (or simply
measurable) if for every ε > 0 there exists an open set D such that
D ⊃ M and me (D \ M ) < ε. If M is measurable, we define the
Lebesgue measure (or measure) of M by m(M ) := me (M ).

Some Properties of Measurable Sets

(A) It follows from the above definition that every open set is mea-
surable.

(B) If me (M ) = 0, then M is measurable and m(M ) = 0.

    Indeed, we know (see (f) above) that

        0 = me (M ) = inf{me (D); D open, D ⊃ M },

    so for any ε > 0 there exists an open set Dε such that Dε ⊃ M
    and me (Dε ) < ε. As Dε \ M ⊂ Dε , we have me (Dε \ M ) < ε.
(C) If M = ∪∞ j=1 Mj , where each Mj is measurable, then M is mea-
    surable.

    Indeed, for a given ε > 0, we can choose for each j an open set
    Dj , Dj ⊃ Mj , such that me (Dj \ Mj ) < ε/2ʲ. Hence D = ∪∞ j=1 Dj
    is open, D ⊃ M and D \ M ⊂ ∪∞ j=1 (Dj \ Mj ), which implies
    me (D \ M ) ≤ ∑∞ j=1 me (Dj \ Mj ) < ε.

(D) If K ⊂ Rk is a compact set, then K is measurable.

    Since K is compact, hence bounded, we have me (K) < ∞. For
    any ε > 0 there exists an open set D, D ⊃ K, such that me (D) <
    me (K) + ε/2 (cf. (f)). The open set D \ K can be written as a
    countable union of almost disjoint closed cubes: D \ K = ∪∞ j=1 Cj .
    Now, for a given p ∈ N, K1 = ∪p j=1 Cj is a compact set with
    K1 ∩ K = ∅, K ∪ K1 ⊂ D, and

        me (D) ≥ me (K ∪ K1 )
              = me (K) + me (K1 )
              = me (K) + ∑p j=1 v(Cj ),

    which implies that

        ∑p j=1 v(Cj ) ≤ me (D) − me (K) < ε/2 ,

    hence

        me (D \ K) ≤ me (∪∞ j=1 Cj ) ≤ ∑∞ j=1 me (Cj ) = ∑∞ j=1 v(Cj ) ≤ ε/2 < ε,

    so K is indeed measurable. It follows that



(D1) any closed set F ⊂ Rk is measurable.

     Indeed, F can be written as a countable union of compact sets,

         F = ∪∞ n=1 (F ∩ Cl B(0, n)),

     so the assertion follows from (C) and (D).

(E) If M ⊂ Rk is measurable, then Rk \ M is also measurable.

    To prove this, observe first that for all n ∈ N there exists an
    open set Dn such that M ⊂ Dn and me (Dn \ M ) < 1/n. Since
    Rk \ Dn is a closed set, it is measurable, hence E := ∪∞ n=1 (Rk \ Dn )
    is also measurable (cf. (C)). We have E ⊂ Rk \ M and
    Rk \ (M ∪ E) ⊂ Dn \ M , hence me (Rk \ (M ∪ E)) < 1/n. Therefore
    me (Rk \ (M ∪ E)) = 0, so Rk \ (M ∪ E) is measurable (cf. (B)).
    Since
        Rk \ M = [Rk \ (M ∪ E)] ∪ E,
    we conclude by (C) that Rk \ M is measurable, as claimed.

(F) Any countable intersection of measurable sets is also a measur-
    able set.

    This follows easily from ∩∞ j=1 Mj = Rk \ [∪∞ j=1 (Rk \ Mj )] (see also
    (C) and (E)).

Now let us state an important result related to measurable sets:

Theorem 3.2. If {Mn }∞ n=1 is any collection of disjoint measurable
sets, then m(∪∞ n=1 Mn ) = ∑∞ n=1 m(Mn ).

Proof. In a first stage, we assume that each Mn is bounded. Let ε > 0
be arbitrary but fixed. Since Rk \ Mn is measurable, for any n ∈ N
there exists a closed set Fn ⊂ Mn such that me (Mn \ Fn ) < ε/2ⁿ. For
each fixed p ∈ N, F1 , . . . , Fp are compact and disjoint, and, denoting
M = ∪∞ n=1 Mn , we have

    m(M ) ≥ m(∪p n=1 Fn ) = ∑p n=1 m(Fn ) ≥ ∑p n=1 m(Mn ) − ε,

which implies (ε > 0 and p ∈ N being arbitrary) m(M ) ≥ ∑∞ n=1 m(Mn ).
This concludes the proof in the case when each Mn is bounded, since
the converse inequality is also satisfied. In the general case, we consider
the closed cubes Ci centered at the origin with side length i ∈ N and
define Mn,1 = Mn ∩ C1 , Mn,i = Mn ∩ (Ci \ Ci−1 ), i = 2, 3, . . . Then

    Mn = ∪i Mn,i ,  M = ∪n,i Mn,i ,

so, as each Mn,i is bounded, we can use what we obtained above to
write

    m(M ) = ∑n,i m(Mn,i ) = ∑n ( ∑i m(Mn,i ) ) = ∑n m(Mn ).

Remark 3.3. There are subsets of Rk which are not Lebesgue measur-
able. See, for example, [46, p. 24].
Remark 3.4. Denote by A the collection of all measurable subsets of
Rk . According to the usual terminology, as ∅ ∈ A and (E) and (C)
hold, the pair (Rk , A) is a σ-algebra. As the Lebesgue measure m is
a nonnegative function on A satisfying m(∅) = 0 and Theorem 3.2,
the triple (Rk , A, m) is a measure space. This definition of a measure
space can be also used for sets other than Rk . In particular, if Ω ⊂ Rk
is a Lebesgue measurable set and we define B = {B ∩ Ω; B ∈ A}, then
(Ω, B, m) is a measure space (where m is the restriction to B of the
Lebesgue measure defined above).

3.2 Measurable Functions

In what follows we consider the measure space (R^k, A, m) defined in the previous section. Note that similar considerations apply to any other measure space. Assume that R = R^1 is equipped with the usual topology.
Definition 3.5. A function f : Rk → R is called measurable if for
all λ ∈ R the set {f > λ} := {x ∈ Rk ; f (x) > λ} is measurable (i.e.,
it belongs to A).
Remark 3.6. Equivalent definitions are obtained if the set {f > λ} is
replaced by {f ≥ λ}, {f < λ}, or {f ≤ λ}, λ ∈ R. Indeed, if {f > λ}
is measurable for all λ ∈ R then so is
{f ≥ λ} = ∩_{n=1}^∞ {f > λ − 1/n} ∀λ ∈ R,
hence so is
{f < λ} = Rk \ {f ≥ λ} ∀λ ∈ R,
and so on (the other implications are trivially satisfied).
72 3 The Lebesgue Integral and Lp Spaces
Theorem 3.7. f : R^k → R is measurable if and only if for every open set D ⊂ R the set f^{-1}(D) := {x ∈ R^k; f(x) ∈ D} is measurable.

Proof. The set D = (λ, ∞) is open for any λ ∈ R, and f^{-1}((λ, ∞)) = {f > λ}; so if f^{-1}(D) is measurable for every open set D ⊂ R, then f is measurable. Conversely, let us assume that f is measurable. If ∅ ≠ D ⊂ R is an open set, then it can be represented as a countable union of disjoint open intervals. Indeed, for x ∈ D denote by I(x) the maximal open interval containing x and included in D. If x, y are distinct points in D, then I(x), I(y) either coincide or are disjoint. Obviously, D = ∪_{x∈D} I(x). Since each I(x) contains a rational number, the number of distinct I(x) must be countable, so D = ∪_{n=1}^∞ I_n. Since f is measurable, we have f^{-1}(I_n) ∈ A for all n ∈ N, which implies f^{-1}(D) = ∪_{n=1}^∞ f^{-1}(I_n) ∈ A. □
We say that a property (P) holds almost everywhere (abbreviated a.e.) in Ω ⊂ R^k if it holds in Ω \ E with m(E) = 0; in other words, (P) holds for almost all (abbreviated a.a.) x ∈ Ω.
Theorem 3.8. Let f, g : R^k → R. If f is measurable and g = f a.e., then g is also measurable.
Proof. Denote E = {g ≠ f}; by hypothesis, m(E) = 0. We have for any λ ∈ R,

{g > λ} ∪ E = {f > λ} ∪ E ∈ A,

hence G := {g > λ} ∪ E ∈ A. Since {g > λ} differs from G by a set of measure zero, it follows that {g > λ} ∈ A. □
Observe that the equality a.e. is an equivalence relation in the set of all measurable functions.
Theorem 3.9. If f : R^k → R is measurable and g : R → R is continuous, then g ◦ f is measurable.

Proof. As g is a continuous function, for any open set D ⊂ R, g^{-1}(D) is open, too. Hence, as f is measurable, we conclude that (g ◦ f)^{-1}(D) = f^{-1}(g^{-1}(D)) is measurable for any open set D ⊂ R. □
Remark 3.10. It follows from the above result that, if f is measurable, then so are the functions λf (λ ∈ R), |f|^p (p > 0), f^+ = max{f, 0}, f^- = −min{f, 0}, etc.
Theorem 3.11. If f, g are measurable, then so are f + g and f g. If, in addition, g ≠ 0 a.e., then f/g is measurable.

Proof. For any λ ∈ R we have

{f + g > λ} = ∪_{q∈Q} {f > q > λ − g} = ∪_{q∈Q} ({f > q} ∩ {g > λ − q}),

where Q is the set of rational numbers. It follows that f + g is measurable. The function f g is also measurable since

f g = (1/4)[(f + g)² − (f − g)²].

In order to prove the last statement, it suffices to prove that 1/g is measurable. This follows from

{1/g > λ} = ({g > 0} ∩ {λg < 1}) ∪ ({g < 0} ∩ {λg > 1}). □
Theorem 3.12. If (f_n)_{n∈N} is a sequence of measurable functions, then all of sup_{n∈N} f_n, inf_{n∈N} f_n, lim sup_{n→∞} f_n, and lim inf_{n→∞} f_n are measurable. In particular, if f_n → f a.e. then f is measurable.

Proof. For any λ ∈ R we have {sup_{n∈N} f_n > λ} = ∪_{n∈N} {f_n > λ}, which implies that sup_{n∈N} f_n is measurable. The function inf_{n∈N} f_n is also measurable since it is equal to −sup_{n∈N} (−f_n). The other statements follow from

lim sup_{n→∞} f_n = inf_i {sup_{n≥i} f_n},  lim inf_{n→∞} f_n = sup_i {inf_{n≥i} f_n},

which coincide a.e. with f = lim f_n when f_n → f a.e. □


Now, let us recall the definition of the characteristic function of a set E, denoted χ_E:

χ_E(x) = 1 if x ∈ E, and χ_E(x) = 0 if x ∉ E.

Let E ⊂ R^k. It is easily seen that χ_E is measurable if and only if E is measurable.
Definition 3.13. A function f : R^k → R is called a simple function if it has the form

f(x) = Σ_{i=1}^p y_i χ_{M_i}(x),    (3.2.3)

where p ∈ N, y_i ∈ R, i = 1, ..., p, and the M_i's are disjoint, measurable subsets of R^k, with m(M_i) < ∞, i = 1, ..., p.
Any simple function is measurable, as a finite linear combination of characteristic functions of measurable sets. Normally, in the above definition y_1, ..., y_p are distinct numbers.
Theorem 3.14. If f : R^k → R is a measurable function, then there exists a sequence of simple functions (f_n)_{n∈N} such that

|f_n(x)| ≤ |f_{n+1}(x)|, x ∈ R^k, n = 1, 2, ...    (3.2.4)

and

lim_{n→∞} f_n(x) = f(x), x ∈ R^k.    (3.2.5)

If, in addition, f ≥ 0, then one can find f_n ≥ 0, n = 1, 2, ...
Proof. We assume first that f is a nonnegative measurable function. For a given n ∈ N, define the following subsets of R^k:

M_j = {(j−1)/2^n ≤ f < j/2^n}, j = 1, 2, ..., n2^n, and P_n = {f ≥ n},

which are all measurable. Let

g_n(x) = Σ_{j=1}^{n2^n} ((j−1)/2^n) χ_{M_j}(x) + n χ_{P_n}(x), x ∈ R^k, n = 1, 2, ...

It is easily seen that 0 ≤ g_n ≤ g_{n+1} and 0 ≤ f(x) − g_n(x) ≤ 1/2^n whenever f(x) ≤ n, hence g_n → f pointwise. Thus, the sequence (g_n) satisfies all the properties for a sequence (f_n) mentioned in the statement of the theorem, except for m(P_n) < ∞, m(M_j) < ∞ for all n ∈ N, j = 1, 2, ..., n2^n (see Definition 3.13). This inconvenience can be easily removed as follows. For any n ∈ N, consider the closed cube C_n centered at the origin with side length n and define

f_n(x) = g_n(x) χ_{C_n}(x) = Σ_{j=1}^{n2^n} ((j−1)/2^n) χ_{M_j ∩ C_n}(x) + n χ_{P_n ∩ C_n}(x), x ∈ R^k.

It is easily seen that (f_n) satisfies all the desired properties, including

0 ≤ f_n(x) ≤ f_{n+1}(x), x ∈ R^k, n = 1, 2, ...

For a general measurable function f one can use the decomposition f = f^+ − f^-, which implies |f| = f^+ + f^-. Since f^+ and f^- are both measurable and nonnegative, it follows from the proof above that there exist sequences (f_n^+) and (f_n^-) that satisfy the properties mentioned above and approximate f^+ and f^-, respectively. Then (f_n = f_n^+ − f_n^-) is a sequence of simple functions satisfying (3.2.4) and (3.2.5). □
Remark 3.15. Taking into account Theorems 3.12 and 3.14, one can
say that a function f : Rk → R is measurable (in the sense of Defini-
tion 3.5) if and only if f is the limit of a sequence of simple functions
(fn ), i.e., fn (x) → f (x), as n → ∞, for a.a. x ∈ Rk . This equivalent
condition can be used to define the notion of an X-valued measurable
function, where X is a Banach space.
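The dyadic construction g_n from the proof of Theorem 3.14 is easy to evaluate numerically. The following Python sketch (an illustration of ours, not part of the text; the sample function f(x) = x² on [0, 1] and the grid are arbitrary choices) checks the bound 0 ≤ f − g_n ≤ 2^{−n} on the region where f ≤ n:

```python
def g_n(f, x, n):
    """Dyadic approximation from the proof of Theorem 3.14:
    g_n = (j - 1)/2**n on {(j-1)/2**n <= f < j/2**n}, and g_n = n on {f >= n}."""
    v = f(x)
    if v >= n:
        return float(n)
    return int(v * 2**n) / 2**n   # (j - 1)/2**n with j - 1 = floor(v * 2**n)

def f(x):
    return x * x   # a sample nonnegative measurable function, with f <= 1

for n in (1, 2, 5, 10):
    xs = [i / 1000 for i in range(1001)]                 # grid on [0, 1]
    gaps = [f(x) - g_n(f, x, n) for x in xs]
    assert all(0 <= gap <= 2 ** -n for gap in gaps)      # bound from the proof
```

Multiplying by 2^n and flooring is exactly the "which dyadic level set does f(x) fall into" test, so the code mirrors the definition of g_n word for word.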

3.3 The Lebesgue Integral

If f : R^k → R is a simple function as in (3.2.3), the Lebesgue integral of f is defined by

∫_{R^k} f(x) dx := Σ_{i=1}^p m(M_i) · y_i.    (3.3.6)

If Ω is a measurable subset of R^k then g = f χ_Ω is also a simple function and we define

∫_Ω f(x) dx := ∫_{R^k} f(x) χ_Ω(x) dx.
Denote by S the set of all simple functions f : R^k → R. It is easily seen that S is a linear space over R with respect to the usual operations: addition of functions and scalar multiplication. We have the following statements:

• ∫_{R^k} (αf + βg) dx = α ∫_{R^k} f dx + β ∫_{R^k} g dx ∀f, g ∈ S, α, β ∈ R;
• f, g ∈ S, f ≤ g ⟹ ∫_{R^k} f dx ≤ ∫_{R^k} g dx;
• If Ω_1, Ω_2 ⊂ R^k are disjoint measurable sets with m(Ω_i) < ∞, i = 1, 2, then

∫_{Ω_1 ∪ Ω_2} f dx = ∫_{Ω_1} f dx + ∫_{Ω_2} f dx;

• If f ∈ S, then so is |f| and

|∫_{R^k} f dx| ≤ ∫_{R^k} |f| dx.
The proofs are easy and are left to the reader.

In what follows we are concerned with the Lebesgue integration of nonnegative measurable functions. Denote by S^+ the set of all nonnegative simple functions f : R^k → R (i.e., functions of the form (3.2.3), where each y_i ≥ 0).
Definition 3.16. A nonnegative measurable function f : R^k → R is called integrable in the sense of Lebesgue (or simply integrable) if

sup{∫_{R^k} s dx; s ∈ S^+, s ≤ f} < +∞,

and we denote

∫_{R^k} f dx := sup{∫_{R^k} s dx; s ∈ S^+, s ≤ f}.

If sup{∫_{R^k} s dx; s ∈ S^+, s ≤ f} = ∞, we write ∫_{R^k} f dx = ∞.
Note that if f is a nonnegative simple function, i.e., a function of the form (3.2.3) with y_i ≥ 0, i = 1, ..., p, then using this definition we recover ∫_{R^k} f(x) dx = Σ_{i=1}^p m(M_i) · y_i.
If f : R^k → R is a nonnegative integrable function and Ω ⊂ R^k is a measurable set, then ∫_Ω f dx := ∫_{R^k} f χ_Ω dx.

We have the following immediate statements for f, g : Rk → R non-
negative measurable functions and α ≥ 0:
 
• f ≤ g =⇒ Rk f dx ≤ Rk g dx ;

• If Ω1 ⊂ Ω2 ⊂ Rk are measurable sets, with Ω1 ⊂ Ω2 , then


Ω1 f dx ≤ Ω2 f dx ; We also have:

• If f : Rk → R is a nonnegative measurable function, then: f = 0


a.e. if and only if Rk f dx = 0.
Proof. Observe first that if f = 0 a.e., then for any s ∈ S + , with
s ≤ f , we have s = 0 a.e., so Rks dx = 0. Therefore Rk f dx =
0. Conversely, let us assume that Rk f dx = 0. Define Ωn = {x ∈
Rk ; f (x) ≥ 1/n}, n ∈ N. We have for all n ∈ N

1 1
0= f dx ≥ χΩn dx = m(Ωn ) .
R k R k n n
So m(Ωn ) = 0 for all n ∈ N =⇒ m({f > 0}) = m(∪∞
n=1 Ωn ) = 0 =⇒
f = 0 almost everywhere.
Let us now state the so-called Monotone Convergence Theorem, or Beppo Levi's theorem.²

Theorem 3.17 (Monotone Convergence Theorem). Let 0 ≤ f_1 ≤ f_2 ≤ ··· ≤ f_n ≤ ··· be a sequence of measurable functions. Denote f(x) := lim_{n→∞} f_n(x). Then

lim_{n→∞} ∫_{R^k} f_n dx = ∫_{R^k} f dx.

Proof. Since f_n ≤ f_{n+1} ≤ f for all n, the sequence of integrals is nondecreasing, so its limit exists (possibly infinite) and obviously

lim_{n→∞} ∫_{R^k} f_n dx ≤ ∫_{R^k} f dx.

In order to prove the converse inequality, let s ∈ S^+, s ≤ f, and let ε ∈ (0, 1). Define M_n = {x ∈ R^k; f_n(x) ≥ ε s(x)}, n ∈ N. We have R^k = ∪_{n=1}^∞ M_n. Indeed, if x ∈ R^k and f(x) = 0, then s(x) = 0, so x ∈ M_1. If f(x) > 0, then f(x) > ε s(x), hence x ∈ M_n for n large enough.
Next,

∫_{R^k} f_n dx ≥ ∫_{M_n} f_n dx ≥ ε ∫_{M_n} s dx.

Since M_n ⊂ M_{n+1} for all n ∈ N and ∪_{n=1}^∞ M_n = R^k, we have ∫_{M_n} s dx → ∫_{R^k} s dx, so the last inequality implies

lim_{n→∞} ∫_{R^k} f_n dx ≥ ε ∫_{R^k} s dx,

hence, as ε ∈ (0, 1) was arbitrary,

lim_{n→∞} ∫_{R^k} f_n dx ≥ ∫_{R^k} s dx ∀s ∈ S^+, s ≤ f.

This implies

lim_{n→∞} ∫_{R^k} f_n dx ≥ ∫_{R^k} f dx,

as claimed. □
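As a numerical illustration of the Monotone Convergence Theorem (a sketch of ours, not from the text), take f(x) = x^{−1/2} on (0, 1), whose integral equals 2, and the increasing truncations f_n = min(f, n); the exact values ∫_0^1 f_n dx = 2 − 1/n increase to 2:

```python
def fn(x, n):
    return min(n, x ** -0.5)   # increasing truncations of f(x) = x**(-1/2)

def integral(n, m=200_000):
    # midpoint Riemann sum over (0, 1); accurate here since f_n is bounded by n
    h = 1.0 / m
    return sum(fn((i + 0.5) * h, n) for i in range(m)) * h

vals = [integral(n) for n in (1, 2, 4, 8)]
assert all(a < b for a, b in zip(vals, vals[1:]))   # the integrals increase
assert abs(vals[-1] - (2 - 1 / 8)) < 1e-3           # matches 2 - 1/n, tending to 2
```

The truncations f_n = min(f, n) are the standard device for reducing an unbounded nonnegative integrand to bounded ones, exactly the monotone setting of Theorem 3.17.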

Remark 3.18. Combining Theorems 3.14 and 3.17, we infer that for any nonnegative integrable function f : R^k → R, there exists an increasing sequence (s_n)_{n∈N} in S^+ such that s_n → f pointwise (or a.e.) and ∫_{R^k} s_n dx → ∫_{R^k} f dx. Using this observation, one can readily deduce that
² Beppo Levi, Italian mathematician, 1875–1961.
• if f, g : R^k → R are nonnegative integrable functions, then so is f + g and

∫_{R^k} (f + g) dx = ∫_{R^k} f dx + ∫_{R^k} g dx.

We also have

• ∫_{R^k} αf dx = α ∫_{R^k} f dx ∀α ≥ 0.
The next result is known as Fatou’s lemma.³

Theorem 3.19. Let f_n : R^k → R be a sequence of nonnegative measurable functions. Set f = lim inf_{n→∞} f_n. Then,

∫_{R^k} f dx ≤ lim inf_{n→∞} ∫_{R^k} f_n dx.    (3.3.7)

Proof. Denote g_n = inf_{m≥n} f_m, n ∈ N. Since (g_n) is an increasing sequence, we have

f = sup_{n∈N} g_n = lim_{n→∞} g_n.

By the Monotone Convergence Theorem we have

lim_{n→∞} ∫_{R^k} g_n dx = ∫_{R^k} f dx.    (3.3.8)

On the other hand, since g_n ≤ f_n, n ∈ N, we have

∫_{R^k} g_n dx ≤ ∫_{R^k} f_n dx, n ∈ N.    (3.3.9)

Combining (3.3.8) and (3.3.9) yields (3.3.7). □
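The inequality (3.3.7) can be strict. A standard example (our numerical sketch, not from the text) is f_n = n χ_{(0,1/n)} on (0, 1): f_n → 0 pointwise, so ∫ (lim inf f_n) dx = 0, while every ∫ f_n dx = 1:

```python
def fn(x, n):
    return n if 0.0 < x < 1.0 / n else 0.0   # f_n = n on (0, 1/n), else 0

def integral(n, m=100_000):
    # midpoint Riemann sum over (0, 1)
    h = 1.0 / m
    return sum(fn((i + 0.5) * h, n) for i in range(m)) * h

# the integrals all equal 1, so lim inf of the integrals is 1 ...
assert all(abs(integral(n) - 1.0) < 1e-3 for n in (10, 100, 1000))
# ... while f_n(x) -> 0 for every fixed x in (0, 1), so the left side of (3.3.7) is 0
assert all(fn(x, 10 ** 6) == 0.0 for x in (0.001, 0.5, 0.999))
```

The mass of f_n escapes into an ever-thinner spike near 0, which is precisely the phenomenon Fatou's lemma permits.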

Now, we are going to define the Lebesgue integral for a general measurable function f : R^k → R. One can use the decomposition f = f^+ − f^-. Obviously, f is measurable if and only if both f^+ and f^- are measurable.

Definition 3.20. A measurable function f : R^k → R is called integrable if both f^+ and f^- are integrable, and in this case

∫_{R^k} f dx := ∫_{R^k} f^+ dx − ∫_{R^k} f^- dx.
³ Pierre Joseph Louis Fatou, French mathematician, 1878–1929.
Denote by L(R^k) the set of all (measurable and) integrable functions f : R^k → R.
One can prove by elementary arguments the following statements:

• If f : R^k → R is measurable, then so is |f|, and f ∈ L(R^k) ⟺ |f| ∈ L(R^k);
• If f, g : R^k → R are measurable, g ∈ L(R^k) and |f| ≤ g, then f ∈ L(R^k);
• If f ∈ L(R^k) and α ∈ R, then αf ∈ L(R^k) and

∫_{R^k} αf dx = α ∫_{R^k} f dx.

We also have

• If f, g ∈ L(R^k), then f + g ∈ L(R^k) and

∫_{R^k} (f + g) dx = ∫_{R^k} f dx + ∫_{R^k} g dx.
Proof. Assume f, g ∈ L(R^k). Then f^+, f^-, g^+, g^-, f + g, (f + g)^+, (f + g)^- are measurable, and f^+, f^-, g^+, g^- ∈ L(R^k). From (f + g)^+ ≤ f^+ + g^+ and (f + g)^- ≤ f^- + g^- we infer that (f + g)^+, (f + g)^- ∈ L(R^k), which implies f + g ∈ L(R^k). On the other hand,

(f + g)^+ − (f + g)^- = f + g = f^+ − f^- + g^+ − g^-,

so

(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+,

which involves only nonnegative integrable functions. Hence,

∫_{R^k} (f + g)^+ dx + ∫_{R^k} f^- dx + ∫_{R^k} g^- dx = ∫_{R^k} (f + g)^- dx + ∫_{R^k} f^+ dx + ∫_{R^k} g^+ dx,

which gives the desired equality. □
• Let f, g : R^k → R be such that f ∈ L(R^k) and g = f a.e. Then, g ∈ L(R^k) and ∫_{R^k} g dx = ∫_{R^k} f dx.

Proof. From g = f a.e. we derive g^+ = f^+ ≥ 0 a.e. and g^- = f^- ≥ 0 a.e., so

∫_{R^k} g^+ dx = ∫_{R^k} f^+ dx, ∫_{R^k} g^- dx = ∫_{R^k} f^- dx,

and the result follows. □
• If f, g ∈ L(R^k) and f ≤ g a.e., then ∫_{R^k} f dx ≤ ∫_{R^k} g dx. The proof is easy.

• For every f ∈ L(R^k) we have

|∫_{R^k} f dx| ≤ ∫_{R^k} |f| dx.

Proof. We know that f ∈ L(R^k) ⟹ |f| ∈ L(R^k). We have

∫_{R^k} f dx = ∫_{R^k} f^+ dx − ∫_{R^k} f^- dx ≤ ∫_{R^k} f^+ dx + ∫_{R^k} f^- dx = ∫_{R^k} |f| dx.

Similarly,

−∫_{R^k} f dx ≤ ∫_{R^k} |f| dx,

so the result follows. □
Theorem 3.21. Let f ∈ L(R^k). Then, for every ε > 0 there exists δ > 0, such that for every measurable set M ⊂ R^k with m(M) < δ, we have ∫_M |f| dx < ε.

Proof. For n ∈ N define

g_n(x) = |f(x)| if |f(x)| ≤ n, and g_n(x) = n if |f(x)| > n.

Observe that, for every n ∈ N, 0 ≤ g_n ≤ |f|, so g_n ∈ L(R^k). Moreover, (g_n) is an increasing sequence converging pointwise to |f|. By Beppo Levi’s theorem,

lim_{n→∞} ∫_{R^k} g_n dx = ∫_{R^k} |f| dx,

so, for a given ε > 0, there exists an N ∈ N such that

∫_{R^k} (|f| − g_N) dx < ε/2.    (3.3.10)

Choosing δ = ε/(2N), we have, for all M ∈ A with m(M) < δ,

∫_M g_N dx ≤ ∫_M N dx = N m(M) < ε/2.    (3.3.11)

Now, we derive from (3.3.10) and (3.3.11),

∫_M |f| dx = ∫_M (|f| − g_N) dx + ∫_M g_N dx < ε. □
Recall that the equality a.e. is an equivalence relation in the linear space of measurable functions, in particular in L(R^k). Denote by L¹(R^k) the quotient space L(R^k)/∼, where ∼ stands for this equivalence relation. In general, any equivalence class in L¹(R^k) is identified with a representative of the corresponding class, which is usually selected to be the most regular one. If Ω ⊂ R^k is a measurable set, we can similarly define L¹(Ω) := L(Ω)/∼. Based on this identification, we can say that the above theory works for functions (in fact classes of functions) belonging to L¹(R^k) or to L¹(Ω).
The next result is known as Lebesgue’s Dominated Convergence Theorem.
Theorem 3.22 (Lebesgue’s Dominated Convergence Theorem). Let Ω ⊂ R^k be a measurable set, possibly Ω = R^k. Let (f_n)_{n∈N} be a sequence in L¹(Ω) such that

(a) f_n(x) → f(x) a.e. on Ω;
(b) ∃g ∈ L¹(Ω) such that |f_n(x)| ≤ g(x) a.e. on Ω.

Then, f ∈ L¹(Ω) and lim_{n→∞} ∫_Ω |f_n(x) − f(x)| dx = 0.

Proof. According to (a), f is measurable. Passing to the limit in (b) we get |f| ≤ g a.e., so f ∈ L¹(Ω). Set h_n := |f_n − f|. We have h_n → 0 a.e. on Ω and h_n ≤ g̃ := g + |f| ∈ L¹(Ω). Applying Fatou’s lemma to the sequence (g̃ − h_n), we get

∫_Ω g̃ dx ≤ lim inf_{n→∞} ∫_Ω (g̃ − h_n) dx = ∫_Ω g̃ dx − lim sup_{n→∞} ∫_Ω h_n dx,

which implies

lim sup_{n→∞} ∫_Ω h_n dx ≤ 0.

Thus

lim_{n→∞} ∫_Ω h_n dx = 0. □
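A quick numerical illustration of Theorem 3.22 (our sketch, not from the text): on Ω = (0, 1), f_n(x) = x^n is dominated by g ≡ 1 ∈ L¹(0, 1) and converges a.e. to f = 0, and indeed ∫_0^1 |f_n − f| dx = 1/(n + 1) → 0:

```python
def fn(x, n):
    return x ** n   # |f_n| <= g = 1 on (0, 1), and f_n -> 0 for a.a. x

def integral_abs_diff(n, m=100_000):
    # midpoint Riemann sum of |f_n - f| over (0, 1), where the limit is f = 0
    h = 1.0 / m
    return sum(abs(fn((i + 0.5) * h, n)) for i in range(m)) * h

vals = [integral_abs_diff(n) for n in (1, 10, 100, 1000)]
assert all(b < a for a, b in zip(vals, vals[1:]))   # exact values are 1/(n + 1)
assert vals[-1] < 2e-3                              # tends to 0, as DCT predicts
```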

3.4 Lp Spaces

Throughout this section Ω denotes a measurable subset of R^k (possibly Ω = R^k). As usual, any class of measurable functions with respect to the equality a.e. will be identified with one of its representatives.
We have already defined the space L¹(Ω) as the set of all functions f : Ω → R which are integrable over Ω, i.e., f is measurable and ∫_Ω |f| dx < ∞. This definition can be extended as follows:

L^p(Ω) := {f : Ω → R; f is measurable and |f|^p ∈ L¹(Ω)},

for 1 ≤ p < ∞. We also define

L^∞(Ω) := {f : Ω → R; f is measurable and there exists C ≥ 0 such that |f(x)| ≤ C a.e. on Ω}.

It is easily seen that, for every 1 ≤ p ≤ ∞, L^p(Ω) is a linear space over R.
Now, for 1 < p < ∞ denote by q the conjugate of p, i.e.,

1/p + 1/q = 1.

Recall the so-called Young’s inequality: for all a, b ≥ 0,

ab ≤ a^p/p + b^q/q.    (3.4.12)

This inequality follows (for a, b > 0; it is trivial otherwise) from the fact that the log function is concave on (0, ∞), so

log(a^p/p + b^q/q) ≥ (1/p) log a^p + (1/q) log b^q = log(ab).

Now, we set for 1 ≤ p < ∞

‖f‖_{L^p(Ω)} := (∫_Ω |f(x)|^p dx)^{1/p} ∀f ∈ L^p(Ω),

and

‖f‖_{L^∞(Ω)} := inf{C; |f(x)| ≤ C a.e. on Ω} ∀f ∈ L^∞(Ω).

We are going to prove that these are norms. To this purpose, we need the following auxiliary result, which is known as Hölder’s inequality.⁴

Lemma 3.23 (Hölder’s Inequality). Let 1 < p < ∞. If f ∈ L^p(Ω) and g ∈ L^q(Ω), then f g ∈ L¹(Ω) and

∫_Ω |f g| dx ≤ ‖f‖_{L^p(Ω)} ‖g‖_{L^q(Ω)},    (3.4.13)

where q is the conjugate of p.

Proof. If f = 0 a.e. on Ω, then (3.4.13) is trivially satisfied, so we can assume ‖f‖_{L^p(Ω)} > 0. By Young’s inequality we have

|f g| ≤ (1/p)|f|^p + (1/q)|g|^q a.e. on Ω.

This shows that f g ∈ L¹(Ω) and

∫_Ω |f g| dx ≤ (1/p)‖f‖^p_{L^p(Ω)} + (1/q)‖g‖^q_{L^q(Ω)}.

By replacing in this inequality f by αf with α > 0, we obtain

∫_Ω |f g| dx ≤ (α^{p−1}/p)‖f‖^p_{L^p(Ω)} + (1/(αq))‖g‖^q_{L^q(Ω)},

whose right-hand side achieves its minimum for α = ‖g‖^{q/p}_{L^q(Ω)} / ‖f‖_{L^p(Ω)}, thus (3.4.13) follows. □
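Hölder's inequality is easy to check numerically. The sketch below (our illustration; the sample functions on Ω = (0, 1) are arbitrary choices, not from the text) compares ∫|fg| with ‖f‖_p ‖g‖_q for a few conjugate pairs:

```python
def lp_norm(f, p, m=100_000):
    # ||f||_{L^p(0,1)} via a midpoint Riemann sum
    h = 1.0 / m
    return (sum(abs(f((i + 0.5) * h)) ** p for i in range(m)) * h) ** (1.0 / p)

def l1_product(f, g, m=100_000):
    # ∫_0^1 |f g| dx via a midpoint Riemann sum
    h = 1.0 / m
    return sum(abs(f((i + 0.5) * h) * g((i + 0.5) * h)) for i in range(m)) * h

f = lambda x: x ** 0.5
g = lambda x: 1.0 - x
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)   # conjugate exponent: 1/p + 1/q = 1
    assert l1_product(f, g) <= lp_norm(f, p) * lp_norm(g, q) + 1e-9
```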

Theorem 3.24. ‖·‖_{L^p(Ω)} is a norm on L^p(Ω) for all 1 ≤ p ≤ ∞.

Proof. The result is trivial for p = 1.
Now, if f ∈ L^∞(Ω), then

|f(x)| ≤ ‖f‖_{L^∞(Ω)} a.e. on Ω.    (3.4.14)

Indeed, we infer from the definition of ‖·‖_{L^∞(Ω)} that, for each n ∈ N, there exists a constant C_n such that

‖f‖_{L^∞(Ω)} ≤ C_n < ‖f‖_{L^∞(Ω)} + 1/n and |f(x)| ≤ C_n,

⁴ Otto Ludwig Hölder, German mathematician, 1859–1937.

for x ∈ Ω \ An with m(An ) = 0. Setting A = ∪∞ n=1 An , we have


m(A) = 0 and
|f (x)| ≤ Cn , x ∈ Ω \ A .
As Cn → f L∞ (Ω) we derive (3.4.14) by passing to the limit in the
last inequality.
Using (3.4.14) one can easily prove that ‖·‖_{L^∞(Ω)} is a norm on L^∞(Ω).
Now, let us consider the case 1 < p < ∞. We only have to prove the triangle inequality (since the other axioms are trivially satisfied). For f, g ∈ L^p(Ω), we have f + g ∈ L^p(Ω) (note that |f + g|^p ≤ 2^{p−1}(|f|^p + |g|^p)) and

‖f + g‖^p_{L^p(Ω)} = ∫_Ω |f + g|^{p−1} |f + g| dx ≤ ∫_Ω |f + g|^{p−1} |f| dx + ∫_Ω |f + g|^{p−1} |g| dx.    (3.4.15)

Noting that |f + g|^{p−1} ∈ L^q(Ω), we obtain by Hölder’s inequality

‖f + g‖^p_{L^p(Ω)} ≤ ‖f + g‖^{p−1}_{L^p(Ω)} (‖f‖_{L^p(Ω)} + ‖g‖_{L^p(Ω)}),

which implies (the case ‖f + g‖_{L^p(Ω)} = 0 being trivial)

‖f + g‖_{L^p(Ω)} ≤ ‖f‖_{L^p(Ω)} + ‖g‖_{L^p(Ω)}. □
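The triangle inequality just proved can likewise be tested numerically (our sketch, with arbitrary sample functions on (0, 1) discretized at midpoints):

```python
def lp_norm(vals, p, h):
    # discrete L^p norm of a function sampled at midpoints with step h
    return (sum(abs(v) ** p for v in vals) * h) ** (1.0 / p)

m = 50_000
h = 1.0 / m
xs = [(i + 0.5) * h for i in range(m)]
f = [x ** 2 for x in xs]
g = [1.0 - x for x in xs]
s = [a + b for a, b in zip(f, g)]
for p in (1.0, 1.5, 2.0, 4.0):
    # ||f + g||_p <= ||f||_p + ||g||_p
    assert lp_norm(s, p, h) <= lp_norm(f, p, h) + lp_norm(g, p, h) + 1e-9
```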

Theorem 3.25. For every 1 ≤ p ≤ ∞, L^p(Ω) equipped with ‖·‖_{L^p(Ω)} is a Banach space.

Proof. The fact that ‖·‖_{L^p(Ω)} is a norm was shown before (see Theorem 3.24). So we only need to prove that this norm is complete. We distinguish two cases.

Case 1: 1 ≤ p < ∞. Let (f_n)_{n∈N} be a Cauchy sequence in L^p(Ω). Then there exists a subsequence (f_{n_m})_{m∈N} which satisfies

‖f_{n_{m+1}} − f_{n_m}‖_{L^p(Ω)} ≤ 1/2^m, m = 1, 2, ...    (3.4.16)

Indeed, one may first choose n_1 ∈ N such that ‖f_m − f_n‖_{L^p(Ω)} ≤ 1/2 ∀m, n ≥ n_1; then choose n_2 ∈ N, n_2 ≥ n_1, such that ‖f_m − f_n‖_{L^p(Ω)} ≤ 1/2² ∀m, n ≥ n_2, and so on. We are going to show that there is a function f ∈ L^p(Ω) such that ‖f_{n_m} − f‖_{L^p(Ω)} → 0 as m → ∞. If we show this, the initial sequence (f_n) will be convergent in L^p(Ω), as a Cauchy sequence with a convergent subsequence. For simplicity, we redenote f_m := f_{n_m}, so (3.4.16) becomes

‖f_{m+1} − f_m‖_{L^p(Ω)} ≤ 1/2^m, m = 1, 2, ...    (3.4.17)

Set

g_n(x) = Σ_{i=1}^n |f_{i+1}(x) − f_i(x)|.

According to (3.4.17), we have

‖g_n‖_{L^p(Ω)} ≤ 1, n = 1, 2, ...

By the Monotone Convergence Theorem, g_n(x) converges a.e. to a finite limit g(x), and g ∈ L^p(Ω). Now, for m ≥ n ≥ 2 and for almost all x ∈ Ω,

|f_m(x) − f_n(x)| ≤ |f_m(x) − f_{m−1}(x)| + ··· + |f_{n+1}(x) − f_n(x)| = g_{m−1}(x) − g_{n−1}(x) ≤ g(x) − g_{n−1}(x).    (3.4.18)

It follows that for almost all x ∈ Ω, (f_n(x))_{n∈N} is Cauchy, so it converges to some f(x). We also obtain for almost all x ∈ Ω

|f(x) − f_n(x)| ≤ g(x), n = 2, 3, ...

so, in particular, f ∈ L^p(Ω). As |f_n − f|^p → 0 a.e. on Ω and |f_n − f|^p ≤ g^p ∈ L¹(Ω), we are in a position to apply the Dominated Convergence Theorem to conclude that ‖f_n − f‖_{L^p(Ω)} → 0.
Case 2: p = ∞. Let (f_n) be a Cauchy sequence in L^∞(Ω). So, for any j ∈ N, there exists N_j ∈ N such that

‖f_n − f_m‖_{L^∞(Ω)} ≤ 1/j ∀n, m ≥ N_j.

Hence, there exists a set M_j with m(M_j) = 0 such that

|f_n(x) − f_m(x)| ≤ 1/j ∀x ∈ Ω \ M_j, m, n ≥ N_j.    (3.4.19)

Obviously, the set M = ∪_{j=1}^∞ M_j has measure zero. For each x ∈ Ω \ M the sequence (f_n(x)) is Cauchy and therefore convergent to some f(x) ∈ R. Now, we deduce from (3.4.19)

|f_n(x) − f(x)| ≤ 1/j ∀x ∈ Ω \ M, n ≥ N_j,

hence f ∈ L^∞(Ω) and

‖f_n − f‖_{L^∞(Ω)} ≤ 1/j ∀n ≥ N_j.

So (f_n) converges to f in L^∞(Ω). □

3.5 Exercises

1. A set Ω ⊂ R^k is measurable ⟺ for every ε > 0 there exists a closed set F ⊂ Ω such that m_e(Ω \ F) < ε.

2. Let Ω ⊂ R^k be a measurable set with m(Ω) < ∞. Show that, for every ε > 0, there exists a compact set K ⊂ Ω such that m(Ω \ K) < ε.

3. Let A ⊂ Ω ⊂ B ⊂ R^k, where A, B are measurable sets with m(A) = m(B) < ∞. Then Ω is measurable.

4. Let h ∈ R^k \ {0} and α ∈ R. Show that for every measurable set Ω ⊂ R^k we have

(a) Ω_h := {x + h; x ∈ Ω} is measurable and m(Ω_h) = m(Ω) (translation invariance);
(b) αΩ := {αx; x ∈ Ω} is measurable and m(αΩ) = |α|^k m(Ω).

5. Let h ∈ R^k \ {0} and α ∈ R \ {0}. If f ∈ L¹(R^k), then so are the functions x ↦ f(x − h), x ↦ f(αx), and

∫_{R^k} f(x − h) dx = ∫_{R^k} f(x) dx, ∫_{R^k} f(αx) dx = (1/|α|^k) ∫_{R^k} f(x) dx.

6. Let −∞ < a < b < +∞ and let f : [a, b] → R be a bounded function. If f is Riemann integrable then f ∈ L¹(a, b) := L¹((a, b); R), and the two integrals coincide:

(L) ∫_a^b f(x) dx = (R) ∫_a^b f(x) dx.

Use the (Dirichlet) function D : [0, 1] → R,

D(x) = 1 if x ∈ Q ∩ [0, 1], and D(x) = 0 if x ∈ [0, 1] \ Q,

to show that the converse implication is not true in general.

7. Let f_n : [0, 1] → R be defined by

f_n(x) = n x^{n−1} / (1 + x), x ∈ [0, 1], n ∈ N.

Show that

lim_{n→∞} ∫_0^1 f_n(x) dx = 1/2.

8. Show that f : [1, ∞) → R defined by

f(x) = x^{−2} ln x, x ∈ [1, ∞),

is Lebesgue integrable and

∫_1^∞ f(x) dx = 1.

9. Show that

lim_{n→∞} ∫_0^n (1 + x/n)^n e^{−2x} dx = 1.

10. Let f : [0, ∞) → R be a continuous function such that

lim_{x→∞} f(x) = a,

where a ∈ R. Show that, for every b ∈ (0, ∞),

lim_{n→∞} ∫_0^b f(nx) dx = ab.

11. Let f : [0, 1] → R be defined by

f(x) = 0 if x = 0, and f(x) = √n if x ∈ (1/(n+1), 1/n], n ∈ N.

Show that

(a) f is not Riemann integrable on [0, 1];
(b) f ∈ L^p(0, 1) for 1 ≤ p < 2, and f ∉ L^p(0, 1) for 2 ≤ p ≤ ∞.

12. Show that the following functions are not Lebesgue integrable:

(a) f(x) = 1/x, x ∈ (0, 1);
(b) g(x) = sin x + cos x, x ∈ (0, ∞).

13. Let f ∈ C[0, 1] := C([0, 1]; R) be such that f(0) = 0 and f is differentiable at x = 0. Prove that g : (0, 1) → R, defined by

g(x) = x^{−3/2} f(x), x ∈ (0, 1),

belongs to L¹(0, 1).
14. If f ∈ L¹(0, 1), show that ∫_0^1 x^n f(x) dx → 0 as n → ∞.

15. Let Ω ⊂ R^k be a measurable set with m(Ω) < ∞ and let 1 ≤ p < q ≤ ∞. Prove that L^q(Ω) ⊂ L^p(Ω) and

‖f‖_{L^p(Ω)} ≤ m(Ω)^{(q−p)/(pq)} ‖f‖_{L^q(Ω)} ∀f ∈ L^q(Ω).

16. Let Ω ⊂ R^k be a measurable set with m(Ω) < ∞ and let f ∈ L^∞(Ω). Prove that

lim_{p→∞} ‖f‖_{L^p(Ω)} = ‖f‖_{L^∞(Ω)}.
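For instance (our numerical sketch for the limit in Exercise 16, not a proof; the sample function is an arbitrary choice), with Ω = (0, 1) and f(x) = x(1 − x) one can watch ‖f‖_{L^p} increase towards ‖f‖_{L^∞} = 1/4:

```python
def lp_norm(f, p, m=100_000):
    # ||f||_{L^p(0,1)} via a midpoint Riemann sum
    h = 1.0 / m
    return (sum(abs(f((i + 0.5) * h)) ** p for i in range(m)) * h) ** (1.0 / p)

f = lambda x: x * (1.0 - x)   # ||f||_{L^∞(0,1)} = 1/4, attained at x = 1/2
norms = [lp_norm(f, p) for p in (1, 2, 8, 32, 128)]
assert all(a < b for a, b in zip(norms, norms[1:]))   # increasing in p (m(Ω) = 1)
assert abs(norms[-1] - 0.25) < 0.02                   # already close to 1/4
```

The monotonicity in p is consistent with the embedding inequality of Exercise 15, since here m(Ω) = 1.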
Chapter 4

Continuous Linear Operators and Functionals

In this chapter we discuss linear operators between linear spaces, but our presentation is restricted at this stage to the space of continuous (bounded) linear operators between normed spaces. When the target space is either R or C, such operators are called (continuous linear) functionals and are used to define dual spaces and weak topologies.
Unless otherwise specified, this chapter only considers linear spaces over the field K, with K being R or C. When two or more linear spaces are involved, all of them will be over the same field.

4.1 Definitions, Examples, Operator Norm
We begin this section with some basic definitions.
Definition 4.1. Let X, Y be linear spaces and let A : D(A) ⊂ X → Y .
A is called a linear operator if D(A) is a linear subspace of X and
A(αx + βy) = αAx + βAy, ∀α, β ∈ K, ∀x, y ∈ D(A) .
We denote the range of A by R(A), i.e., R(A) = {Ax; x ∈ D(A)}. The
range R(A) is a linear subspace of Y .
We say that A is injective or one-to-one if N (A), the nullspace of
A, defined by N (A) = {x ∈ D(A); Ax = 0}, is precisely {0}. The
operator A is called surjective or onto if R(A) = Y .

© Springer Nature Switzerland AG 2019 89


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 4
Example 1.
Let X = R^n, Y = R^m with n, m ∈ N. If M is an m × n matrix with real entries, then A : D(A) = X → Y defined by

Au = Mu ∀u = (u_1, ..., u_n)^T ∈ X

is a linear operator, and in fact all linear maps between these spaces can be represented in this way. Here the elements of both X and Y are regarded as column vectors. If m = 1, then A is a linear form on X, as defined in Chap. 1.

Example 2.
For X = Y = C[a, b] := C([a, b]; R) with −∞ < a < b < ∞, the derivative operator Af = f′ is defined on D(A) = C¹[a, b] (which is the set of all continuously differentiable functions f : [a, b] → R), and its range is R(A) = C[a, b] = Y, so A is surjective. Note that A is not injective, because its nullspace N(A) := {f ∈ D(A); Af = 0} ≠ {0} (more precisely, N(A) consists of all constant functions).

Example 3.
For X = Y = C[a, b], −∞ < a < b < ∞, the antiderivative operator (Af)(t) = ∫_a^t f(s) ds is defined on D(A) = C[a, b] = X. It is injective because Af = 0 implies f = 0. However, A is not surjective because (Af)(a) = 0 for all f ∈ D(A) = C[a, b], and thus R(A) is a proper subset of Y = C[a, b].

Proposition 4.2. Let (X, ‖·‖_X), (Y, ‖·‖_Y) be normed (linear) spaces and let A : X → Y be a linear operator. Then the following are equivalent:

1. A is continuous on X;

2. A is continuous at x = 0;

3. A maps bounded subsets of X to bounded subsets of Y ;

4. There exists c > 0 such that AuY ≤ cuX for all u ∈ X.

Proof. An exercise.

Remark 4.3. If X, Y are finite dimensional spaces, then both of them can be equipped with norms, and every linear operator between the two spaces is continuous (prove it!). In fact, any such operator can be represented by a matrix which depends on the bases of the two spaces. So continuity of linear operators is interesting only in the case of infinite dimensional linear spaces.
Remark 4.4. A linear operator A : D(A) ⊂ X → Y is said to be bounded if

sup {‖Ax‖_Y; x ∈ D(A), ‖x‖_X ≤ 1} < ∞.    (4.1.1)

Otherwise, A is called unbounded.
Obviously, any continuous linear operator from (X, ‖·‖_X) to (Y, ‖·‖_Y) is bounded. Conversely, if A : D(A) ⊂ X → Y is a bounded linear operator, then denoting by ĉ the supremum in (4.1.1) we have

‖Ax‖_Y ≤ ĉ‖x‖_X ∀x ∈ D(A),    (4.1.2)

so A is continuous from (D(A), ‖·‖_X) to (Y, ‖·‖_Y) (see Proposition 4.2). That is why continuous linear operators are also called bounded.
Note that if A is a continuous (bounded) linear operator from (D(A), ‖·‖_X) to (Y, ‖·‖_Y), then A can be extended by continuity to a continuous linear operator A_1 : D(A_1) = X_1 → Y_1, where X_1, Y_1 denote the completions of D(A) and Y with respect to ‖·‖_X and ‖·‖_Y, respectively.

For (X, ‖·‖_X), (Y, ‖·‖_Y) normed spaces, denote

L(X, Y) = {A : X → Y; A is linear and continuous}.

Obviously, L(X, Y) is a linear space. It is a normed space with the so-called operator norm

‖A‖ = sup {‖Au‖_Y; u ∈ X, ‖u‖_X ≤ 1}.

Clearly, we have

‖Au‖_Y ≤ ‖A‖ · ‖u‖_X ∀u ∈ X.

If (Z, ‖·‖_Z) is another normed space, and A ∈ L(X, Y), B ∈ L(Z, X), then AB ∈ L(Z, Y) and

‖AB‖ ≤ ‖A‖ · ‖B‖,

where AB denotes the composition A ◦ B.


In the case X = Y we simply write L(X) = L(X, X).

Examples. If X = C[0, 1] is equipped with the usual sup-norm, the antiderivative operator A : X → X, (Af)(t) = ∫_0^t f(s) ds, t ∈ [0, 1], f ∈ X, is linear and continuous (hence bounded) with ‖A‖ = 1. On the other hand, for the same space X, the derivative operator B : D(B) = C¹[0, 1] ⊂ X → X, Bf = f′, is linear but unbounded, because for f_n(t) = t^n, t ∈ [0, 1], n ∈ N, we have ‖f_n‖ = 1, while ‖Bf_n‖ = n → ∞.
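These two examples can be checked numerically. The sketch below (our illustration, not from the text; it uses a grid sup-norm and a left Riemann sum for the antiderivative) reproduces ‖f_n‖ = 1 and ‖Bf_n‖ = n for f_n(t) = t^n, and the bound ‖Af‖ ≤ ‖f‖ suggesting ‖A‖ = 1:

```python
def sup_norm(f, m=1_000):
    return max(abs(f(i / m)) for i in range(m + 1))   # sup-norm on a grid of [0, 1]

def antiderivative(f, m=1_000):
    # (Af)(t) = ∫_0^t f(s) ds, approximated by a left Riemann sum
    h = 1.0 / m
    def Af(t):
        return sum(f(j * h) for j in range(int(t * m))) * h
    return Af

for n in (1, 5, 25):
    fn = lambda t, n=n: t ** n
    dfn = lambda t, n=n: n * t ** (n - 1)       # Bf_n = f_n'
    assert sup_norm(fn) == 1.0                  # ||f_n|| = 1 for every n
    assert sup_norm(dfn) == n                   # ||Bf_n|| = n -> ∞: B is unbounded
    assert sup_norm(antiderivative(fn)) <= 1.0  # consistent with ||A|| = 1
one = antiderivative(lambda t: 1.0)
assert abs(sup_norm(one) - 1.0) < 1e-9          # ||A|| = 1 is attained at f ≡ 1
```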
Remark 4.5. If X ≠ {0}, then

‖A‖ = sup {‖Au‖_Y; u ∈ X, ‖u‖_X = 1}.    (4.1.3)

Proof. If we denote the right-hand side by a, then clearly

a ≤ ‖A‖.    (4.1.4)

Now, from the inequality

‖A(‖u‖_X^{−1} u)‖_Y ≤ a ∀u ∈ X \ {0}

we derive

‖Au‖_Y ≤ a‖u‖_X ∀u ∈ X.    (4.1.5)

By taking the supremum in (4.1.5) over all u ∈ X, ‖u‖_X ≤ 1, we find ‖A‖ ≤ a, which combined with (4.1.4) proves (4.1.3). □

Theorem 4.6. If (X, ‖·‖_X) is a normed space and (Y, ‖·‖_Y) is a Banach space, then L(X, Y) is a Banach space with respect to the operator norm.

Proof. We know that L(X, Y) is a normed space, so we have to show that it is complete. For the sake of simplicity we denote by ‖·‖ both the norms ‖·‖_X and ‖·‖_Y. Consider a Cauchy sequence (A_n) in L(X, Y), i.e.,

∀ε > 0 ∃N_ε such that ‖A_n − A_m‖ < ε ∀n, m > N_ε.

For the same ε, we have

‖A_n v − A_m v‖ ≤ ε‖v‖ ∀v ∈ X, n, m > N_ε.

Now, (A_n v) converges in Y since Y is a Banach space, so we have an operator A : X → Y, Av = lim_{n→∞} A_n v, and because each A_n is linear, A is as well. Since for all v ∈ X

‖Av‖ ≤ ‖Av − A_{N_ε+1} v‖ + ‖A_{N_ε+1} v‖ ≤ ε‖v‖ + ‖A_{N_ε+1}‖ · ‖v‖ = (ε + ‖A_{N_ε+1}‖)‖v‖,

we see that A is continuous, so A ∈ L(X, Y).
Since

‖A_n v − Av‖ ≤ ε

for v ∈ X such that ‖v‖ ≤ 1 and n > N_ε, we get

‖A_n − A‖ ≤ ε ∀n > N_ε,

which implies that A_n → A in L(X, Y). □

4.2 Main Principles of Functional Analysis

In this section we present some important principles of Functional Analysis: the Uniform Boundedness Principle, the Open Mapping Theorem, and the Closed Graph Theorem. We begin with the Uniform Boundedness Principle, which was proven by Banach and Steinhaus.¹

Theorem 4.7 (Banach–Steinhaus, Uniform Boundedness Principle). Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be Banach spaces and let {T_i}_{i∈I} ⊂ L(X, Y) be a collection of operators satisfying

sup_{i∈I} ‖T_i x‖_Y < ∞ ∀x ∈ X.    (4.2.6)

Then,

sup_{i∈I} ‖T_i‖ < ∞.    (4.2.7)

Proof. Denote

X_n = {x ∈ X; sup_{i∈I} ‖T_i x‖_Y ≤ n}, n ∈ N.

¹ Hugo Steinhaus, Polish mathematician, 1887–1972.

Obviously, X_n is a closed set for every n ∈ N, and by (4.2.6) we have

X = ∪_{n=1}^∞ X_n.

It follows by Baire’s Theorem (Theorem 2.10) that there exists an n_0 ∈ N such that Int X_{n_0} ≠ ∅, i.e., there is a ball B(x_0, r_0) ⊂ X_{n_0}, r_0 > 0. Hence,

‖T_i(x_0 + r_0 w)‖_Y ≤ n_0 ∀i ∈ I, ∀w ∈ B(0, 1),

which implies

r_0 ‖T_i‖ ≤ n_0 + ‖T_i x_0‖_Y ∀i ∈ I.

This shows that (4.2.7) holds true (see also (4.2.6)). □

Theorem 4.8 (Open Mapping Theorem). Let (X, ‖·‖_X), (Y, ‖·‖_Y) be Banach spaces. If A : D(A) ⊂ X → Y is a linear, continuous, and surjective operator, then A maps open sets in X to open sets in Y.

Proof. It suffices to prove that there exists a constant r > 0 such that

B_Y(0, r) ⊂ A(B_X(0, 1)),    (4.2.8)

where B_X(0, 1), B_Y(0, r) denote the open balls in X and Y centered at 0 with radii 1 and r, respectively. In order to prove (4.2.8) we shall first show the existence of a constant r_1 > 0 such that

B_Y(0, r_1) ⊂ Cl(A(B_X(0, 1))).    (4.2.9)

Denote Y_n = n Cl(A(B_X(0, 1))), n ∈ N. Since A is surjective, we have Y = ∪_{n∈N} Y_n. By Baire’s Theorem (Theorem 2.10), Int Y_{n_0} ≠ ∅ for some n_0 ∈ N, hence Int Cl(A(B_X(0, 1))) ≠ ∅. So, for some y_0 ∈ Y and some r_1 > 0, we have

B_Y(y_0, 2r_1) ⊂ Cl(A(B_X(0, 1))).    (4.2.10)

Adding the fact that −y_0 ∈ Cl(A(B_X(0, 1))) to (4.2.10), we obtain

B_Y(0, 2r_1) ⊂ Cl(A(B_X(0, 1))) + Cl(A(B_X(0, 1))) = 2 Cl(A(B_X(0, 1)))

(since Cl(A(B_X(0, 1))) is a convex set), hence (4.2.9) holds true.
Now we are going to prove (4.2.8) by using (4.2.9) with r_1 = 2r, i.e.,

B_Y(0, 2r) ⊂ Cl(A(B_X(0, 1))).    (4.2.11)

Choose an arbitrary y ∈ B_Y(0, r). By (4.2.11) we have

∀ε > 0 ∃v ∈ B_X(0, 1/2) such that ‖y − Av‖_Y < ε.    (4.2.12)

In particular, for ε = r/2 there exists a v_1 ∈ B_X(0, 1/2) with

‖y − Av_1‖_Y < r/2.

Now choosing y − Av_1 instead of y and ε = r/2² in (4.2.12), we can find some v_2 ∈ B_X(0, 1/2²) with

‖(y − Av_1) − Av_2‖_Y < r/2².

Continuing the process we find v_n ∈ B_X(0, 1/2^n) such that

‖y − A(v_1 + v_2 + ··· + v_n)‖_Y < r/2^n.    (4.2.13)

Obviously, x_n = v_1 + v_2 + ··· + v_n defines a Cauchy sequence in X, hence x_n converges to some x ∈ X with ‖x‖_X < 1 and y = Ax since A ∈ L(X, Y) (see (4.2.13)). As y was an arbitrary vector in B_Y(0, r), the proof of (4.2.8) is complete. □

Remark 4.9. If (X, ‖·‖_X), (Y, ‖·‖_Y) are Banach spaces and A ∈
L(X, Y ) is bijective, then A⁻¹ ∈ L(Y, X). This follows from (4.2.8).

Theorem 4.10 (Closed Graph Theorem). Let (X, ‖·‖_X), (Y, ‖·‖_Y)
be Banach spaces. If A : X → Y is a linear operator and its graph
G(A) := {(x, Ax); x ∈ X} is closed in X × Y (in other words, A is a
closed operator), then A ∈ L(X, Y ).

Proof. Define on X the norm

    ‖x‖_A = ‖x‖_X + ‖Ax‖_Y ,   x ∈ X,

which is called the graph norm. Since G(A) is a closed set in
(X, ‖·‖_X) × (Y, ‖·‖_Y), it follows that (X, ‖·‖_A) is a Banach space.
Obviously,

    ‖x‖_X ≤ ‖x‖_A   ∀x ∈ X,

so the identity operator I : (X, ‖·‖_A) → (X, ‖·‖_X) is continuous. So,
by Remark 4.9, its inverse I⁻¹ = I ∈ L((X, ‖·‖_X), (X, ‖·‖_A)), i.e.,
there exists a constant C > 0 such that

    ‖x‖_A ≤ C‖x‖_X   ∀x ∈ X.

In particular,
    ‖Ax‖_Y ≤ C‖x‖_X   ∀x ∈ X,

which means A is continuous from (X, ‖·‖_X) to (Y, ‖·‖_Y).

4.3 Compact Linear Operators

If X, Y are normed spaces and A : X → Y is a linear operator, then
A is called compact or completely continuous if A takes bounded
sets of X into relatively compact subsets of Y .

Example.
Let X = Y = C[a, b], −∞ < a < b < +∞, equipped with the usual
sup-norm, and let A : X → X be defined by

    (Af)(t) = ∫_a^b k(t, s)f(s) ds   ∀f ∈ X, ∀t ∈ [a, b] ,

where k ∈ C([a, b] × [a, b]).
Obviously A is a linear operator. Moreover, it follows from Arzelà–
Ascoli’s Criterion that A is a compact operator. The key argument
here is that the equicontinuity condition is a consequence of the
uniform continuity of k.
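As a numerical aside (not part of the original argument), compactness of such a kernel operator can be seen by discretization: a quadrature rule turns A into a matrix whose singular values decay rapidly, so A is well approximated by finite-rank operators. The interval [0, 1], the sample kernel k(t, s) = e^{−(t−s)²}, the grid size, and the rank threshold below are all illustrative choices.

```python
import numpy as np

# Midpoint-rule discretization of (Af)(t) = \int_0^1 k(t, s) f(s) ds with the
# illustrative smooth kernel k(t, s) = exp(-(t - s)^2): the matrix entry
# A[i, j] = k(t_i, s_j) * h acts on the samples f(s_j).
n = 200
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
A = np.exp(-(t[:, None] - t[None, :]) ** 2) * h

# Rapid singular-value decay: A is close (in operator norm) to operators with
# finite-dimensional range -- the numerical face of compactness.
s = np.linalg.svd(A, compute_uv=False)
print(s[0], s[20])
assert s[20] < 1e-6 * s[0]   # a rank-20 truncation is already very accurate
```

The decay rate depends on the smoothness of k; for merely continuous kernels it is slower, but the operator remains compact.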

A compact linear operator is clearly continuous (see Proposition 4.2).
Denote by

    K(X, Y ) = {A ∈ L(X, Y ); A is compact} .

It is clear that K(X, Y ) is a linear subspace of L(X, Y ). Moreover, we
have the following theorem.

Theorem 4.11. If X is a normed space and Y is a Banach space,
then K(X, Y ) is a closed linear subspace of L(X, Y ), i.e., K(X, Y ) is
a Banach space with respect to the operator norm (see Theorem 4.6).

Proof. We shall denote by ‖·‖ all the three norms of X, Y , and
L(X, Y ). Let (A_n) be a sequence in K(X, Y ) which converges to some
A ∈ L(X, Y ), namely ‖A_n − A‖ → 0. So, for ε > 0 there exists m ∈ N
sufficiently large such that

    ‖A_m − A‖ < ε/(3r) .                                   (4.3.14)

Let (x_n) be a sequence in the ball B(0, r) ⊂ X, where r > 0 is arbitrary
but fixed. Since A_m is compact there exists a subsequence of (x_n), say
(x_{n_k})_{k≥1}, such that (A_m x_{n_k})_{k≥1} is convergent, hence Cauchy. Thus, for
any ε > 0 (which can be the same as above), there exists N ∈ N such
that
    ‖A_m x_{n_k} − A_m x_{n_j}‖ < ε/3   ∀k, j > N .        (4.3.15)

Using (4.3.14) and (4.3.15) we deduce

    ‖Ax_{n_k} − Ax_{n_j}‖
      ≤ ‖Ax_{n_k} − A_m x_{n_k}‖ + ‖A_m x_{n_k} − A_m x_{n_j}‖ + ‖A_m x_{n_j} − Ax_{n_j}‖
      ≤ ‖A − A_m‖ · ‖x_{n_k}‖ + ‖A_m x_{n_k} − A_m x_{n_j}‖ + ‖A_m − A‖ · ‖x_{n_j}‖
      < r · ε/(3r) + ε/3 + r · ε/(3r) = ε ,

in other words, (Ax_{n_k}) is Cauchy, hence convergent, and therefore
A ∈ K(X, Y ).

Remark 4.12. It is worth pointing out that if A ∈ K(X, Y ), where X
is a normed space and Y is a Hilbert space (see Chap. 6), then there
exists a sequence (A_n)_{n≥1} in L(X, Y ), such that the range of A_n is
finite dimensional (hence A_n is compact) for all n ≥ 1 and ‖A_n − A‖ → 0.
For the proof of this nice result see Brezis² [6, Remark 1, pp. 157–158].

4.4 Linear Functionals, Dual Spaces, Weak Topologies

We begin this section by defining the important concept of a dual
space.

Definition 4.13. Let (X, ‖·‖) be a normed space. Define the dual
of X, denoted X*, by

    X* = {f : X → K; f is linear and continuous} ,

so X* is in fact L(X, K). The elements of X* are called functionals.
Since (K, |·|) is a Banach space, X* is also a Banach space with respect
to
    ‖f‖ = sup {|f(v)|; v ∈ X, ‖v‖ ≤ 1} .

By definition
    |f(v)| ≤ ‖f‖ · ‖v‖   ∀v ∈ X, ∀f ∈ X* .

² Haim Brezis, French mathematician, born 1944.

Example 1.
Let X be the linear space of all sequences of real numbers (x_n)_{n≥1}
satisfying
    ∑_{n=1}^∞ |x_n| < ∞ .

X is usually denoted by l¹ and is a Banach space (over R) with respect
to the norm
    ‖(x_n)‖ = ∑_{n=1}^∞ |x_n| .

See Exercise 2.19.
It is easily seen that any functional f ∈ X* has the form

    f((x_n)) = ∑_{n=1}^∞ a_n x_n ,

where (a_n) is a bounded sequence in R. X* is usually denoted by l^∞
and is a Banach space with the norm

    ‖(a_n)‖_∞ = sup_{n≥1} |a_n| .
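A quick finite-truncation sanity check of this pairing (a numerical aside; the particular sequences and the truncation at 1000 terms are illustrative):

```python
import numpy as np

# Truncated check of the l^1 / l^infty pairing f((x_n)) = sum a_n x_n:
# the bound |f(x)| <= sup|a_n| * sum|x_n| holds term by term.
rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, 1000)                           # a bounded sequence
x = rng.normal(size=1000) / (np.arange(1, 1001) ** 2)  # a summable sequence

f_x = np.dot(a, x)
assert abs(f_x) <= np.max(np.abs(a)) * np.sum(np.abs(x))

# The dual norm sup|a_n| is attained (here, within the truncation) by testing
# f on the signed unit sequence e_m with m = argmax |a_m|:
m = np.argmax(np.abs(a))
e_m = np.zeros(1000)
e_m[m] = np.sign(a[m])                                 # ||e_m||_{l^1} = 1
assert np.isclose(np.dot(a, e_m), np.max(np.abs(a)))
```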

Example 2.
Let X = C[a, b], −∞ < a < b < +∞, with the sup-norm, denoted
‖·‖. For a fixed v ∈ X define f : X → R by

    f(u) = ∫_a^b u(t)v(t) dt   ∀u ∈ X .

We see that f is linear and also continuous because

    |f(u)| ≤ (b − a)‖v‖ · ‖u‖   ∀u ∈ X ,

and therefore f ∈ X*.

Now, consider the same space X = C[a, b] equipped with another
norm, namely the L²-norm, and the same functional f , which can be
expressed as the scalar product

    f(u) = (u, v)_{L²(a,b)}   ∀u ∈ X .

Again, f is linear and by the Bunyakovsky–Cauchy–Schwarz inequality

    |f(u)| ≤ ‖v‖_{L²(a,b)} · ‖u‖_{L²(a,b)}   ∀u ∈ X ,

so f ∈ (X, ‖·‖_{L²(a,b)})*.
Question: Given f ∈ (X, ‖·‖_{L²(a,b)})*, does there exist v ∈ X = C[a, b]
such that f(u) = (u, v)_{L²(a,b)} for all u ∈ X? We shall show later
(Theorem 6.10) that there exists such a v in L²(a, b), but not
necessarily in X = C[a, b].

In what follows we present the Hahn³–Banach Theorem on the
extension of linear (not necessarily continuous) R-valued functionals.

Theorem 4.14 (Hahn–Banach). Let X be a real linear space, and let
p : X → R be a map which satisfies

    p(x + y) ≤ p(x) + p(y)   ∀x, y ∈ X ,
    p(αx) = αp(x)   ∀α > 0, x ∈ X .

If Y is a linear subspace of X and f : Y → R is a linear functional
satisfying
    f(x) ≤ p(x)   ∀x ∈ Y ,

then there exists a linear functional g : X → R such that

    g(x) = f(x)   ∀x ∈ Y ,
    g(x) ≤ p(x)   ∀x ∈ X .

Proof. The case Y = X is trivial, so we assume that Y is a proper
subspace of X. Consider the collection E of all linear extensions of f
in the above sense, i.e., h ∈ E if and only if D(h) is a linear subspace
of X, Y ⊂ D(h), h is linear, h extends f , and h(x) ≤ p(x) ∀x ∈ D(h).
Clearly f ∈ E so E is nonempty. Define on E the order relation

    h₁ ≼ h₂ ⟺ D(h₁) ⊂ D(h₂) and h₂(x) = h₁(x) ∀x ∈ D(h₁) .

³ Hans Hahn, Austrian mathematician, 1879–1934.
We wish to apply Zorn’s Lemma, so let G = {h_i}_{i∈I} be a totally ordered
subset of E and consider the functional h defined by

    D(h) = ⋃_{i∈I} D(h_i),   h(x) = h_i(x) if x ∈ D(h_i) for some i ∈ I .

Obviously, h is well defined, belongs to E, and is an upper bound
for G. Hence E is inductive, so by Zorn’s Lemma E has a maximal
element g ∈ E.
To complete the proof let us show that D(g) = X. Assume by
contradiction that this is not the case, so ∃x₀ ∈ X \ D(g). Consider
Z = Span({x₀} ∪ D(g)), and define on Z a linear functional g̃ of the
form
    g̃(tx₀ + x) = αt + g(x),   t ∈ R, x ∈ D(g) ,

where α is a real parameter. We shall prove that there exists an α
such that g̃ ∈ E, i.e.,

    αt + g(x) ≤ p(tx₀ + x)   ∀x ∈ D(g), t ∈ R .            (4.4.16)

In particular,

    g(x) + α ≤ p(x + x₀)   ∀x ∈ D(g) ,
    g(y) − α ≤ p(y − x₀)   ∀y ∈ D(g) ,

hence α should satisfy

    g(y) − p(y − x₀) ≤ α ≤ p(x + x₀) − g(x)   ∀x, y ∈ D(g) ,

which is equivalent to

    sup_{y∈D(g)} [g(y) − p(y − x₀)] ≤ α ≤ inf_{x∈D(g)} [p(x + x₀) − g(x)] .

Such an α exists indeed since

    g(y) − p(y − x₀) ≤ p(x + x₀) − g(x)
    ⟺ g(x + y) ≤ p(x + x₀) + p(y − x₀) ,

which is clearly valid for all x, y ∈ D(g), because g(x + y) ≤ p(x + y) ≤
p(x + x₀) + p(y − x₀) by the subadditivity of p. It is easy to check that g̃
with this α satisfies (4.4.16), so g̃ ∈ E. But g̃ is a proper extension
of g (since D(g) is a proper subset of D(g̃) = Z) and this contradicts
the maximality of g.
Corollary 4.15. Let (X, ‖·‖) be a normed space and let Y be a linear
subspace of X. If f ∈ Y* := (Y, ‖·‖)*, then there exists an extension
g of f such that g ∈ X* := (X, ‖·‖)* and

    ‖g‖_{X*} = ‖f‖_{Y*} .

Proof. If K = R then we can apply the Hahn–Banach Theorem with
p(x) = ‖f‖_{Y*}‖x‖ to derive the existence of a linear extension g : X →
R satisfying
    g(x) ≤ ‖f‖_{Y*}‖x‖   ∀x ∈ X .

Since −g(x) = g(−x) satisfies a similar inequality, we have g ∈ X*
and
    ‖g‖_{X*} ≤ ‖f‖_{Y*} .

Obviously, the converse inequality is also satisfied, so ‖g‖_{X*} = ‖f‖_{Y*}.
If K = C define
    q(x) := Re f(x)   ∀x ∈ Y.

Then,
    f(x) = q(x) − iq(ix)   ∀x ∈ Y,

and
    |q(x)| ≤ ‖f‖_{Y*}‖x‖   ∀x ∈ Y.                         (4.4.17)

Now, if we regard X, Y as real linear spaces and take into account
(4.4.17), we deduce from the first part of the proof the existence of a
continuous linear functional h : X → R which extends q and satisfies

    |h(x)| ≤ ‖f‖_{Y*}‖x‖   ∀x ∈ X .                        (4.4.18)

Set
    g(x) = h(x) − ih(ix),   x ∈ X .

Functional g : X → C is an extension of f and is linear on the complex
space X. Let us prove that

    |g(x)| ≤ ‖f‖_{Y*}‖x‖   ∀x ∈ X .

Indeed, for each x ∈ X, g(x) can be written as g(x) = re^{iθ}, r ≥ 0, so

    |g(x)| = r = Re(e^{−iθ}g(x)) = Re g(e^{−iθ}x) = h(e^{−iθ}x)
            ≤ ‖f‖_{Y*}‖x‖   (by (4.4.18)).

Therefore, g ∈ X*, and ‖g‖_{X*} ≤ ‖f‖_{Y*}. As the converse inequality is
trivially satisfied, we have ‖g‖_{X*} = ‖f‖_{Y*}.

Remark 4.16. In fact, even Theorem 4.14 above can be extended to
the complex case K = C by a similar procedure.

Corollary 4.17. Let (X, ‖·‖) be a normed space. Then for every
x₀ ∈ X \ {0} there exists a functional g ∈ X* such that

    ‖g‖_{X*} = 1 and g(x₀) = ‖x₀‖ .

Proof. Apply Corollary 4.15 with Y = Span{x₀} and f : Y → K
defined by
    f(x) = t‖x₀‖ for x = tx₀,  t ∈ K .

Corollary 4.18. Let (X, ‖·‖) be a normed space. Then for every
x ∈ X we have

    ‖x‖ = sup {|f(x)|; f ∈ X*, ‖f‖_{X*} ≤ 1} ,             (4.4.19)

where the sup is attained.

Proof. For x = 0, (4.4.19) is obvious. Let x ∈ X \ {0} and denote by a
the right-hand side of (4.4.19). Clearly, a ≤ ‖x‖. In fact, a = ‖x‖ by
virtue of Corollary 4.17.
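A finite-dimensional numerical instance of (4.4.19) (an illustrative aside, with X = R⁵ under the Euclidean norm, where functionals are inner products against vectors c and the dual norm of c is again ‖c‖₂):

```python
import numpy as np

# In (R^5, ||.||_2) every functional is f(.) = <c, .>, with dual norm ||c||_2.
# Formula (4.4.19): ||x|| = sup{|f(x)| : ||f|| <= 1}, attained at c = x/||x||.
rng = np.random.default_rng(1)
x = rng.normal(size=5)

# random trial functionals with ||c||_2 = 1 never exceed ||x||_2 ...
trials = rng.normal(size=(2000, 5))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
assert np.max(np.abs(trials @ x)) <= np.linalg.norm(x) + 1e-9

# ... and the sup is attained by the normalized functional c = x/||x||_2:
c = x / np.linalg.norm(x)
assert np.isclose(c @ x, np.linalg.norm(x))
```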

Remark 4.19. Let (X, ‖·‖) be a normed space. Define

    J(x) = {x* ∈ X*; ‖x*‖_{X*} = ‖x‖, x*(x) = ‖x‖²} .

From Corollary 4.17 we see that J(x) is nonempty for all x ∈ X. In
general, J(x) is not a singleton, but there are cases when this happens
for all x ∈ X (e.g., if X is a Hilbert space, as will be shown later). The
set-valued map x → J(x) is called the duality map from X to X*.

Recall that, given a normed space (X, ‖·‖), the strong (norm) topology
of X is the metric topology generated by d(x, y) = ‖x − y‖ for
x, y ∈ X. In fact, we can consider that X is a Banach space (in other
words, ‖·‖ is complete, or d is complete), otherwise we can use the
completion procedure (see Theorem 2.8) to reach this framework.
Definition 4.20. The weak topology of X is the one generated by
neighborhoods of the origin of the form

    V_{x₁*,x₂*,...,x_m*;ε} = {x ∈ X; |x_j*(x)| < ε, j = 1, 2, . . . , m} ,

for all finite systems of functionals {x₁*, x₂*, . . . , x_m*} ⊂ X* and for
all ε > 0. We write x_n →_w x or x_n ⇀ x to mean convergence in the
weak topology, i.e., x*(x_n) → x*(x) for all x* ∈ X*.

Remark 4.21. If x_n → x, i.e., ‖x_n − x‖ → 0, then x_n ⇀ x. Indeed, for
all x* ∈ X*,

    |x*(x_n) − x*(x)| = |x*(x_n − x)| ≤ ‖x*‖ · ‖x_n − x‖ ,

which tends to 0. The converse is not true in general, and we shall see
some examples later.
However, if X is finite dimensional then strong and weak convergence
are equivalent. Indeed, by choosing particular functionals, one can see
that weak convergence reduces to convergence on coordinates.
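As a numerical preview of one standard example (it relies on the fact, established later for Hilbert spaces, that every functional on l² is given by pairing with an l² sequence): in l² the unit vectors e_n converge weakly to 0 but not strongly, since ‖e_n‖ = 1 for all n.

```python
import numpy as np

# Pairing e_n against a fixed square-summable (a_k) gives <a, e_n> = a_n,
# which tends to 0 because sum a_k^2 < infinity; the norms ||e_n|| stay 1.
N = 10_000
a = 1.0 / np.arange(1, N + 1)        # a_k = 1/k, an l^2 sequence (truncated)
pairings = a                          # <a, e_n> = a_n for n = 1, ..., N
norms = np.ones(N)                    # ||e_n||_{l^2} = 1 for every n

assert abs(pairings[-1]) < 1e-3       # weak convergence: the pairings vanish
assert norms[-1] == 1.0               # no strong convergence: ||e_n - 0|| = 1
```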
Definition 4.22. In X*, besides the strong topology and the weak
topology, defined by means of functionals from X** := (X*)* (the
bidual of X), we have the so-called weak-star topology w*, starting
from another neighborhood basis consisting of

    V_{x₁,x₂,...,x_m;ε} = {x* ∈ X*; |x*(x_j)| < ε, j = 1, 2, . . . , m} ,

for all finite systems {x₁, x₂, . . . , x_m} ⊂ X, and for all ε > 0. So
convergence x_n* →_{w*} x* means x_n*(x) → x*(x) for all x ∈ X, i.e.,
pointwise convergence for a sequence of functionals. In general this is
different from w-convergence.
In general X is embedded into X**, which is to say that there is an
injection i : x → f_x defined by f_x(x*) = x*(x) for all x* ∈ X*. Clearly,
i ∈ L(X, X**) since
    |f_x(x*)| ≤ ‖x*‖ · ‖x‖ .

Moreover, using Corollary 4.17, we see that i is an isometry.
If i : X → X** is onto (surjective), then X is said to be reflexive. In
particular Hilbert spaces are reflexive, as will be shown later.

Remark 4.23. It is easily seen that if X is reflexive then w = w* on
X*.
4.5 Exercises

1. Let X, Y be linear spaces. Find a necessary and sufficient condition
for a subset G ⊂ X × Y to be the graph of a linear operator
from X into Y .

2. Let X, Y be normed spaces over R. If A : X → Y is a continuous
operator satisfying the condition

    A(x₁ + x₂) = Ax₁ + Ax₂   ∀x₁, x₂ ∈ X,

then A is linear (hence A ∈ L(X, Y )).

3. Let −∞ < a < b < +∞. Find the operator norm of A ∈ L(X)
given by
    (Af)(t) = tf(t),   t ∈ [a, b], f ∈ X,
when
(i) X = C[a, b] with the sup-norm;
(ii) X = Lᵖ(a, b), with the usual norm, for some 1 ≤ p < ∞.

4. Let X = C[a, b], where −∞ < a < b < +∞. Assume that X is
equipped with the usual sup-norm and consider the operator A
defined by
    (Af)(t) = ∫_a^t g(s)f(s) ds,   f ∈ X, t ∈ [a, b],
where g is a given function in L¹(a, b) with g(s) ≥ 0 for almost
all s ∈ (a, b). Show that A is a compact linear operator from X
into itself (i.e., A ∈ K(X) ⊂ L(X)) and calculate ‖A‖.

5. Let (X, ‖·‖_X), (Y, ‖·‖_Y) be normed spaces. Show that a linear
operator A : X → Y is continuous if and only if the following
implication holds:

    (∗) x_n ∈ X, ‖x_n‖_X → 0 ⟹ (‖Ax_n‖_Y) is a bounded sequence.

6. Let (X, ‖·‖_X) be a normed space and let (Y, ‖·‖_Y) be a Banach
space. Show that, for any sequence (A_n)_{n∈N} in L(X, Y ) satisfying
‖A_n‖ ≤ a_n ∀n ∈ N with ∑_{n=1}^∞ a_n < ∞, the series ∑_{n=1}^∞ A_n
is convergent in L(X, Y ).
7. Let (X, ‖·‖) be a Banach space. Show that

(i) for all A ∈ L(X) the series

    I + (1/1!)A + (1/2!)A² + · · · + (1/n!)Aⁿ + · · ·

is convergent in L(X) with its usual operator norm (the sum
of the series being denoted e^A). Here I denotes the identity
operator on X.

(ii) for all A ∈ L(X) with ‖A‖ < 1, I − A is invertible and
(I − A)⁻¹ ∈ L(X).
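A 2 × 2 matrix sanity check of both items (an illustrative aside, taking X = R² so that L(X) is the space of 2 × 2 matrices; the particular matrix A is an arbitrary choice with ‖A‖ < 1):

```python
import numpy as np

# (ii): for ||A|| < 1 the Neumann series sum_n A^n converges to (I - A)^{-1};
# (i): the exponential series sum_n A^n/n! converges for any A.
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
assert np.linalg.norm(A, 2) < 1          # spectral (operator) norm

# partial sums of the Neumann series sum A^n
S = np.zeros((2, 2)); P = np.eye(2)
for _ in range(200):
    S += P
    P = P @ A
assert np.allclose(S, np.linalg.inv(np.eye(2) - A))

# partial sums of the exponential series sum A^n / n!
E = np.zeros((2, 2)); P = np.eye(2)
for n in range(1, 30):
    E += P
    P = P @ A / n
# reference value via eigendecomposition (A here has distinct real eigenvalues)
w, V = np.linalg.eig(A)
assert np.allclose(E, (V * np.exp(w)) @ np.linalg.inv(V))
```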
8. Let (X, ‖·‖) be a Banach space. For every pair of operators
A, B ∈ L(X) that commute (i.e., AB = BA) one has e^A e^B =
e^{A+B} (for the notation see the previous exercise).
9. Let (X, ‖·‖_X), (Y, ‖·‖_Y) be Banach spaces. Let (T_n)_{n∈N} be a
sequence in L(X, Y ) which is pointwise convergent, i.e.,

    ∀x ∈ X ∃y_x ∈ Y such that ‖T_n x − y_x‖_Y → 0.

Define T : X → Y by T x = y_x, x ∈ X. Show that
(a) (T_n)_{n∈N} is bounded in L(X, Y );
(b) T ∈ L(X, Y );
(c) ‖T‖ ≤ lim inf ‖T_n‖.

10. Let (X, ‖·‖) be a Banach space and let S be a nonempty subset
of X such that for all f ∈ X* the set

    f(S) = {f(x); x ∈ S} is bounded in (K, |·|).

Prove that S is bounded in (X, ‖·‖).
11. Let X be a Banach space and let A : X → X* be a linear
operator satisfying

    (Ax)(y) = (Ay)(x)   ∀x, y ∈ X.

Show that A is a continuous operator, i.e., A ∈ L(X, X*).

12. Let (X, ‖·‖_X), (Y, ‖·‖_Y) be Banach spaces. If A : D(A) ⊂ X →
Y is a linear closed operator with D(A) closed in (X, ‖·‖_X), then
prove there exists a constant C > 0 such that

    ‖Ax‖_Y ≤ C‖x‖_X   ∀x ∈ D(A).
13. Let X be a Banach space and let A : X → X* be a linear
operator satisfying

    (Ax)(x) ≥ 0   ∀x ∈ X.

Show that A ∈ L(X, X*).

14. Let X be a linear space, equipped with two norms, ‖·‖₁ and ‖·‖₂,
such that X is Banach for both norms. Assume there exists a
constant C > 0 such that

    ‖x‖₂ ≤ C‖x‖₁   ∀x ∈ X.

Show that ‖·‖₁ and ‖·‖₂ are equivalent.

15. Let X be an n-dimensional linear space, with n ∈ N. Let B =
{u₁, u₂, . . . , u_n} be a basis in X.
For any linear functional f : X → K we have

    f(u) = ∑_{i=1}^n α_i f_i   ∀u = ∑_{i=1}^n α_i u_i ∈ X,

where f_i := f(u_i), i = 1, 2, . . . , n. Obviously, any such f is
continuous with respect to any norm of X, i.e., f ∈ X*.
Compute the norm of f , ‖f‖_{X*}, explicitly, in terms of the f_i’s,
when the norm of X is defined by

(i) ‖u‖_∞ = max_{1≤i≤n} |α_i|   ∀u = ∑_{i=1}^n α_i u_i ∈ X;
(ii) ‖u‖₁ = ∑_{i=1}^n |α_i|   ∀u = ∑_{i=1}^n α_i u_i ∈ X;
(iii) ‖u‖_p = (∑_{i=1}^n |α_i|^p)^{1/p}   ∀u = ∑_{i=1}^n α_i u_i ∈ X, where p ∈ (1, ∞).

16. Let X = {u ∈ C[0, 1]; u(0) = 0} with the usual sup-norm. Let
f : X → R be defined by

    f(u) = ∫₀¹ u(s) ds   ∀u ∈ X.

Show that f ∈ X* and compute ‖f‖_{X*}. Can one find some
u ∈ X such that ‖u‖_sup = 1 and f(u) = ‖f‖_{X*}?
Chapter 5

Distributions, Sobolev Spaces

In this chapter we first present test functions, which are then used
to introduce scalar distributions. The space D′(Ω) of distributions is
analyzed in detail and some related applications are discussed: the
interpretation of the density of a mass concentrated at a point by
means of the Dirac distribution, solving the Poisson equation in D′(Ω),
solving ordinary differential equations in D′(R), solving the equation
of the vibrating string with non-smooth initial displacement function,
and the boundary controllability for a problem associated with the
same wave equation. We also introduce and discuss Sobolev spaces.
In order to introduce vector distributions we shall present in a separate
section the Bochner integral for vector functions. Vector distributions
and W^{k,p}(a, b; X) spaces are then presented. These will later be used
in solving problems associated with parabolic and hyperbolic PDEs.

5.1 Test Functions


Let Ω ⊂ Rk be a nonempty open set in Rk (which is equipped with
the usual topology).
For u : Ω → R define the support of u by

    supp u = Cl {x ∈ Ω; u(x) ≠ 0} .
For a given m ∈ N, let C^m(Ω) denote the set of all functions u : Ω → R
such that u, and all its n-th order partial derivatives, 1 ≤ n ≤ m, exist
and are continuous. Further let C₀^m(Ω) = {u ∈ C^m(Ω); supp u is a
compact (bounded) set ⊂ Ω}. For m = ∞ extend the definitions
above in the obvious way. The elements (functions) in C₀^∞(Ω) are
called test functions since they serve as arguments of distributions
that will be defined later.

© Springer Nature Switzerland AG 2019
G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4_5
A typical example of a test function is φ : Ω = R^k → R defined by

    φ(x) = { exp(1/(‖x‖₂² − 1)),  ‖x‖₂ < 1,
           { 0,                    ‖x‖₂ ≥ 1,

where ‖·‖₂ is the Euclidean norm. The function φ ∈ C^∞(R^k) (prove
it!) and so φ ∈ C₀^∞(R^k) with supp φ = Cl B(0, 1). For later use we also
define

    ω(x) = Cφ(x) with C > 0 such that ∫_{R^k} ω(x) dx = 1 .   (5.1.1)

Obviously, C₀^∞(Ω) is a real linear space with respect to the usual
operations (addition of functions and scalar multiplication).
In what follows, we introduce the usual topology on C₀^∞(Ω). To this
purpose, we must first discuss a few important concepts.

Seminorms, Locally Convex Spaces, Inductive Limit

Let X be a linear space over K (as usual, K is either R or C). A
function p : X → R is called a seminorm if the following conditions
are satisfied:

(j) p(x + y) ≤ p(x) + p(y),   x, y ∈ X,
(jj) p(αx) = |α|p(x),   α ∈ K, x ∈ X.

If p is a seminorm, then p(0) = 0, while the case when p(x) = 0 for
some x ≠ 0 is not excluded. We also have

    |p(x₁) − p(x₂)| ≤ p(x₁ − x₂)   ∀x₁, x₂ ∈ X,            (5.1.2)

which in particular shows that p(x) ≥ 0 for all x ∈ X. Indeed, by (j),
p(x₁) − p(x₂) ≤ p(x₁ − x₂) and so p(x₁ − x₂) = p(x₂ − x₁) ≥ p(x₂) − p(x₁).
Obviously, (5.1.2) follows from these two inequalities.
We will use seminorms to equip X with a topology. If p is a seminorm
and M is the set {x ∈ X; p(x) < ε}, where ε is a positive constant,
then, obviously, 0 ∈ M and M is convex, balanced (i.e., x ∈ M and
|α| ≤ 1 implies αx ∈ M ), and absorbing (i.e., for any x ∈ X there
exists an α > 0 such that α⁻¹x ∈ M ).
Let F = {p_i : X → R; i ∈ I} be a family of seminorms satisfying
the axiom of separation: for any y ∈ X, y ≠ 0, there exists j ∈ I
such that p_j(y) ≠ 0. Consider the collection V(0) of all sets which are
finite intersections of sets {x ∈ X; p_i(x) < ε_i}, i ∈ I, ε_i > 0. Such an
intersection looks like

    V = {x ∈ X; p_i(x) < ε_i, i = 1, . . . , n},

where {p₁, . . . , p_n} ⊂ F and {ε₁, . . . , ε_n} ⊂ (0, ∞). Obviously, V is
a convex, balanced, and absorbing set. Each V ∈ V(0) is considered
to be a neighborhood of 0 ∈ X and y + V := {y + v; v ∈ V } a
neighborhood of any y ∈ X.
Now, a set D ⊂ X which is a neighborhood of any y ∈ D is called
open. Indeed, the collection τ of all such sets, plus ∅ ⊂ X, satisfies
the axioms of a topology, so (X, τ ) is a topological space.
Using the separating property of F we can infer that singletons are
closed sets. Indeed, let y ∈ X be a given point. For each x ∈ X, x ≠ y,
let D_x be an open set containing x but not y. Then D = ⋃_{x≠y} D_x is
open and its complement is {y}, so the singleton {y} is closed, as
claimed. Note that if F does not satisfy the axiom of separation then
the closedness of singletons is not guaranteed.
It is easily seen that the mappings X × X ∋ (x, y) → x + y ∈ X and
K × X ∋ (α, x) → αx ∈ X are both continuous, so X is a topological
linear space.

Definition. A topological linear space X is called locally convex if


every open set containing 0 includes a convex, balanced, and absorbing
open subset.
To summarize, we can say that any linear space X equipped (as above)
with the topology generated by a family of seminorms {pi ; i ∈ I}
satisfying the axiom of separation is a locally convex space in which
any seminorm pi is continuous (cf. (5.1.2)).
Conversely, any locally convex space X is a topological linear space
whose topology is generated by a collection of seminorms. In order to
show this, we define for a convex, balanced, and absorbing set M ⊂ X
the so-called Minkowski functional:

    p_M(x) = inf{α; α > 0, α⁻¹x ∈ M},   x ∈ X.


Observe that M = {x ∈ X; p_M(x) ≤ 1}. p_M is a seminorm on X.
Indeed, by the convexity of M and the obvious relations

    x/(p_M(x) + ε) ∈ M,   y/(p_M(y) + ε) ∈ M,   ε > 0,

we deduce

    (p_M(x) + ε)/(p_M(x) + p_M(y) + 2ε) · x/(p_M(x) + ε)
      + (p_M(y) + ε)/(p_M(x) + p_M(y) + 2ε) · y/(p_M(y) + ε) ∈ M.

Hence
    p_M(x + y) ≤ p_M(x) + p_M(y) + 2ε   ∀ε > 0
    ⇒ p_M(x + y) ≤ p_M(x) + p_M(y).

We also have p_M(αx) = |α|p_M(x), since M is balanced.
So, the topology of a given locally convex space X is the one generated
by the collection of seminorms obtained as the Minkowski functionals
associated with convex, balanced, and absorbing open subsets of X.

Definition. Let X be a linear space over K and let {Xα ; α ∈ J} be a


collection of linear subspaces of X such that X = ∪α∈J Xα . Suppose
that each Xα is a locally convex space such that, if Xα1 ⊂ Xα2 , then
the topology of Xα1 coincides with the relative topology of Xα1 as a
subset of Xα2 . Every convex, balanced, and absorbing set D ⊂ X is
considered open ⇐⇒ D∩Xα is an open set of Xα containing 0 ∈ Xα for
all α ∈ J. If X is a locally convex space with respect to the topology
defined in this way, then X is called the inductive limit of the Xα ’s.
Now let us return to C₀^∞(Ω). For any compact K ⊂ Ω define the set

    D_K(Ω) = {φ ∈ C₀^∞(Ω); supp φ ⊂ K},

which is a linear subspace of C₀^∞(Ω).
For m ∈ N₀ = N ∪ {0} and K ⊂ Ω compact,

    p_{K,m}(φ) = sup_{x∈K, |α|≤m} |D^α φ(x)|

is a seminorm on D_K(Ω), where α = (α₁, α₂, . . . , α_k) ∈ N₀^k, |α| =
α₁ + α₂ + · · · + α_k, and the α-derivative of φ is defined as

    D^α φ = ∂^{|α|} φ / (∂x₁^{α₁} ∂x₂^{α₂} · · · ∂x_k^{α_k}) .

Note that the order of differentiation is not important since φ is a
smooth function. If α = (0, 0, . . . , 0), then D^α φ = φ by convention.
Then D_K(Ω) is a locally convex space and, if K₁ ⊂ K₂, the topology
of D_{K₁}(Ω) coincides with the relative topology of D_{K₁}(Ω) as a subset
of D_{K₂}(Ω). Then C₀^∞(Ω) can be regarded as the inductive limit of the
D_K(Ω)’s, where K ranges over all compact subsets of Ω. The space
C₀^∞(Ω), topologized in this way, is denoted by D(Ω).
One of the seminorms defining the topology of D(Ω) is

    p(φ) = sup_{x∈Ω} |φ(x)|,   φ ∈ C₀^∞(Ω).

If D = {φ ∈ C₀^∞(Ω); p(φ) < 1} and K is a compact subset of Ω, then

    D ∩ D_K(Ω) = {φ ∈ D_K(Ω); p_K(φ) := sup_{x∈K} |φ(x)| < 1}.

Theorem 5.1. Convergence of a sequence φ_n → 0 in D(Ω) means
that the following conditions are satisfied:

(a) there exists a compact set K ⊂ Ω such that supp φ_n ⊂ K for all
n;

(b) D^α φ_n → 0 uniformly on K as n → ∞ for all α ∈ N₀^k.

Proof. Let φ_n → 0 in D(Ω). If (a) is satisfied, then (b) is satisfied,
too. So all we need to do is to prove (a). Assume by contradiction
that (a) is not satisfied.
So there exists a sequence (x_j)_{j≥1} in Ω with no cluster point in Ω and
a subsequence (φ_{n_j})_{j≥1} such that φ_{n_j}(x_j) ≠ 0 for all j ≥ 1. Define a
seminorm p : C₀^∞(Ω) → R by

    p(φ) = ∑_{j=1}^∞ 2 sup_{x∈K_j\K_{j−1}} |φ(x)|/|φ_{n_j}(x_j)| ,   φ ∈ C₀^∞(Ω),

where the sequence of compacts K₁ ⊂ K₂ ⊂ · · · ⊂ Ω satisfies ⋃_{j≥1} K_j =
Ω and x_j ∈ K_j \ K_{j−1}, j = 1, 2, . . . , K₀ = ∅. Clearly, the set V = {φ ∈
C₀^∞(Ω); p(φ) < 1} is a neighborhood of 0 ∈ C₀^∞(Ω) and none of the
φ_{n_j} belongs to V , which gives a contradiction.

Obviously, the convergence φ_n → φ in D(Ω) means that (φ_n) satisfies
condition (a) with some compact K ⊂ Ω, and D^α φ_n → D^α φ uniformly
on K as n → ∞ for all α ∈ N₀^k.

Example 1. For Ω = R^k let φ_n(x) = (1/n)ω(x), where ω is the test
function defined by (5.1.1). Then K = Cl B(0, 1) and all derivatives of
φ_n converge uniformly to 0, so φ_n → 0 in D(Ω).

Example 2. For Ω = R^k let ψ_n(x) = (1/n)ω((1/n)x) for x ∈ R^k. Then
D^α ψ_n → 0 uniformly as n → ∞ for all α ∈ N₀^k, but there is no K
satisfying (a). In fact supp ψ_n = Cl B(0, n), therefore ψ_n does not
converge in D(Ω).

5.2 Friedrichs’ Mollification

Friedrichs’ mollification will allow us to associate with “bad functions”
very good approximate functions.¹
Consider again the test function ω : R^k → R defined in the previous
section, i.e.,

    ω(x) = { C exp(1/(‖x‖₂² − 1)),  ‖x‖₂ < 1,
           { 0,                      ‖x‖₂ ≥ 1,

with C > 0 such that ∫_{R^k} ω(x) dx = ∫_{B(0,1)} ω(x) dx = 1.

Definition 5.2. For ε > 0 define ω_ε(x) = (1/ε^k) ω((1/ε)x) for all
x ∈ R^k. This is called the mollifier.

The mollifier ω_ε has the following properties:

1. ω_ε ∈ C^∞(R^k) ;
2. supp ω_ε = Cl B(0, ε) ;
3. ∫_{R^k} ω_ε(x) dx = ∫_{B(0,ε)} ω_ε(x) dx = 1 .

Definition 5.3. Let f ∈ L¹_loc(R^k), i.e., f is a real measurable function
and f ∈ L¹(K) for any compact K ⊂ R^k. For ε > 0 define f_ε(x), the
Friedrichs mollification of f , as

    f_ε(x) = (ω_ε ∗ f)(x) ,

where ∗ denotes the convolution product:

    f_ε(x) = ∫_{R^k} ω_ε(x − y)f(y) dy ,

and, by changing variables,

    f_ε(x) = ∫_{R^k} ω_ε(y)f(x − y) dy = ∫_{B(0,ε)} ω_ε(y)f(x − y) dy

for almost all x ∈ R^k.
If f ∈ L¹_loc(Ω), then f can be extended as f = 0 for x ∈ R^k \ Ω, and
we can define f_ε as before.

¹ Kurt Otto Friedrichs, German-American mathematician, 1901–1982.

For ε > 0 and f ∈ L¹_loc(R^k), we have

1. f_ε ∈ C^∞(R^k) ;
2. supp f_ε ⊂ supp f + Cl B(0, ε), i.e., not much larger than supp f ;
3. If f has compact support, so does f_ε.
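A one-dimensional numerical sketch of these properties and of Theorem 5.5 below (an illustrative aside: everything is discretized on a uniform grid, Riemann sums stand in for integrals, and the grid step, ε, and the test function are arbitrary choices):

```python
import numpy as np

def omega(x):
    # the bump exp(1/(x^2 - 1)) on (-1, 1), extended by 0 (normalized below)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 / (x[inside] ** 2 - 1.0))
    return out

h = 1e-3
grid = np.arange(-3.0, 3.0, h)

eps = 0.25
w_eps = omega(grid / eps) / eps        # omega_eps(x) = (1/eps) omega(x/eps)
w_eps /= np.sum(w_eps) * h             # normalize so the Riemann sum is 1

# property 2: supp omega_eps lies in [-eps, eps]
assert np.all(w_eps[np.abs(grid) > eps] == 0.0)
# property 3: \int omega_eps = 1 (discretely, by construction)
assert np.isclose(np.sum(w_eps) * h, 1.0)

f = (np.abs(grid) < 1).astype(float)   # a discontinuous function in L^1
f_eps = np.convolve(f, w_eps, mode="same") * h   # f_eps = omega_eps * f

# Theorem 5.5, statement 1 (p = 1): mollification does not increase the norm
assert np.sum(np.abs(f_eps)) * h <= np.sum(np.abs(f)) * h + 1e-8
# Theorem 5.5, statement 2: f_eps is L^1-close to f for small eps
assert np.sum(np.abs(f_eps - f)) * h < 4 * eps
```

The discrete convolution smooths the jumps of f over a layer of width ε, which is exactly where the L¹ discrepancy is concentrated.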


Proposition 5.4. If f ∈ C₀(Ω), then f_ε(x) → f(x) uniformly as
ε → 0⁺ in Ω, where C₀(Ω) = {u ∈ C(Ω); u has compact (bounded)
support ⊂ Ω}.

Proof. Set K = supp f and K′ = K + Cl B(0, ε₀), where ε₀ > 0. Then
supp f_ε ⊂ K′ ⊂ Ω for 0 < ε ≤ ε₀, if ε₀ is small enough.
For 0 < ε ≤ ε₀ and x ∈ K′,

    |f_ε(x) − f(x)| = |∫_{R^k} f(x − y)ω_ε(y) dy − ∫_{R^k} f(x)ω_ε(y) dy|
                    ≤ ∫_{B(0,ε)} |f(x − y) − f(x)| ω_ε(y) dy .

f is continuous on K′, hence uniformly continuous on K′, so for any
η > 0, |f(x − y) − f(x)| < η for all y ∈ B(0, ε) with ε > 0 small. Thus
sup_{x∈Ω} |f_ε(x) − f(x)| ≤ η for all ε > 0 sufficiently small, hence f_ε → f
uniformly in Ω as ε → 0⁺.

Theorem 5.5. If f ∈ L^p(Ω) for some 1 ≤ p < ∞, then (the restriction
to Ω of ) f_ε is in L^p(Ω) for all ε > 0 and

1. ‖f_ε‖_{L^p(Ω)} ≤ ‖f‖_{L^p(Ω)} for all ε > 0 ,
2. ‖f_ε − f‖_{L^p(Ω)} → 0 as ε → 0⁺ .

Proof. It suffices to consider Ω = R^k, because we can extend f to R^k
as before, and the two conclusions of the theorem for the extension of
f will imply the same conclusions for f ∈ L^p(Ω).
Consider first the case p = 1, i.e., f ∈ L¹(R^k). Note that

    (x, y) → |f(y)| ω_ε(x − y)                              (5.2.3)

is measurable on R^k × R^k and

    ∫_{R^k} |f(y)| ω_ε(x − y) dx = |f(y)| ∫_{R^k} ω_ε(x − y) dx = |f(y)|

for almost all y ∈ R^k. Next,

    ∫_{R^k} ∫_{R^k} |f(y)| ω_ε(x − y) dx dy = ∫_{R^k} |f(y)| dy
                                            = ‖f‖_{L¹(R^k)} < ∞ .   (5.2.4)

Thus, by Fubini–Tonelli’s Theorem (see, e.g., [51, p. 18]), function
(5.2.3) is a member of L¹(R^k × R^k) and

    ∫_{R^k} |f_ε(x)| dx = ∫_{R^k} |∫_{R^k} ω_ε(x − y)f(y) dy| dx
                        ≤ ∫_{R^k} ∫_{R^k} |f(y)| ω_ε(x − y) dx dy
                        = ‖f‖_{L¹(R^k)} ,

so that
    ‖f_ε‖_{L¹(R^k)} ≤ ‖f‖_{L¹(R^k)} ,

as claimed.
We now consider the case 1 < p < ∞, using again the function (5.2.3).
Then f_ε ∈ L^p(R^k) and, denoting by p′ the conjugate of p (i.e., (1/p) +
(1/p′) = 1), we have

    |f_ε(x)| ≤ ∫_{R^k} |f(y)| ω_ε(x − y) dy
             = ∫_{R^k} ω_ε(x − y)^{1/p′} ω_ε(x − y)^{1/p} |f(y)| dy ,

which by Hölder’s inequality is

    ≤ (∫_{R^k} ω_ε(x − y) dy)^{1/p′} (∫_{R^k} ω_ε(x − y)|f(y)|^p dy)^{1/p} ,

where the first factor equals 1, so that

    |f_ε(x)|^p ≤ ∫_{R^k} ω_ε(x − y)|f(y)|^p dy ,

and, integrating,

    ∫_{R^k} |f_ε(x)|^p dx ≤ ∫_{R^k} (∫_{R^k} ω_ε(x − y)|f(y)|^p dy) dx
                          = ∫_{R^k} |f(y)|^p (∫_{R^k} ω_ε(x − y) dx) dy
                          = ∫_{R^k} |f(y)|^p dy = ‖f‖^p_{L^p(R^k)} ,

so that
    ‖f_ε‖_{L^p(R^k)} ≤ ‖f‖_{L^p(R^k)} ,

which concludes the proof of the first statement of the theorem. Before
we continue the proof of the theorem we shall prove two auxiliary
results.

Lemma 5.6. For all compact K ⊂ Ω there exist an open neighborhood
V of K such that Cl V ⊂ Ω and a continuous map g : Ω → R satisfying

    g(x) = 1 for all x ∈ K,
    g(x) = 0 for all x ∈ Ω \ V, and
    0 ≤ g(x) ≤ 1 for all x ∈ Ω .

Proof. Let K ⊂ Ω be a compact set. Consider δ > 0 small and let V
be the δ-neighborhood of K whose closure Cl V lies in Ω. Let W = Ω \ V
and ρ(x) = d(x, W ) := inf_{w∈W} ‖x − w‖₂, which is a continuous
function.
Now let α = inf_{x∈K} ρ(x) > 0, and let g(x) = min{1, (1/α)ρ(x)}, which
is also a continuous function. Clearly g(x) = 1 for x ∈ K, g(x) = 0 for
x ∈ W = Ω \ V , and 0 ≤ g(x) ≤ 1 for x ∈ V \ K.
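A one-dimensional numerical instance of this construction (an illustrative aside; Ω = (0, 1), K = [0.3, 0.7], and δ = 0.1 are arbitrary choices):

```python
import numpy as np

# Lemma 5.6 in dimension one: V = (0.2, 0.8) is the delta-neighborhood of K,
# rho(x) = d(x, Omega \ V), alpha = inf_K rho (= delta here), and
# g = min(1, rho/alpha).
delta = 0.1
K_lo, K_hi = 0.3, 0.7
V_lo, V_hi = K_lo - delta, K_hi + delta

def rho(x):
    # distance to W = Omega \ V: positive inside V, zero outside
    return np.maximum(0.0, np.minimum(x - V_lo, V_hi - x))

alpha = delta
x = np.linspace(0.0, 1.0, 1001)
g = np.minimum(1.0, rho(x) / alpha)

on_K = (x >= K_lo) & (x <= K_hi)
off_V = (x <= V_lo) | (x >= V_hi)
assert np.all(g[on_K] >= 1.0 - 1e-9)   # g = 1 on K (up to rounding)
assert np.all(g[off_V] == 0.0)          # g = 0 outside V
assert np.all((g >= 0.0) & (g <= 1.0))  # 0 <= g <= 1 everywhere
```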

Lemma 5.7. C₀(Ω) is dense in L^p(Ω) for all 1 ≤ p < ∞, i.e.,
Cl_{L^p(Ω)} C₀(Ω) = L^p(Ω) (every L^p(Ω) function can be approximated
by C₀(Ω) functions with respect to the usual norm of L^p(Ω)).

Proof. Let u ∈ L^p(Ω). We have u = u⁺ − u⁻, where both u⁺ and u⁻
are nonnegative L^p(Ω) functions. So, it suffices to consider nonnegative
L^p(Ω) functions u, which we approximate by simple functions

    s = ∑_{i=1}^m y_i χ_{M_i} ,

where the sets M_i ⊂ Ω are mutually disjoint and measurable with
m(M_i) < ∞, and the χ_{M_i} are their characteristic functions. Consider
a sequence of simple functions (s_n), such that 0 ≤ s_n ≤ u and s_n → u
as n → ∞ for almost all x ∈ Ω, so s_n → u in L^p(Ω). Thus u can
be approximated by simple functions and so all reduces to
approximating characteristic functions u = χ_M where M ⊂ Ω is a Lebesgue
measurable set with m(M ) < ∞. In fact, we only need to consider
K ⊂ M compact such that m(M \ K) = m(M ) − m(K) is small (see
Exercise 3.2), so

    ∫_Ω |χ_K − χ_M|^p dx = ∫_{M\K} 1 dx = m(M \ K)

is small. Now, choose V as in Lemma 5.6 such that m(V \ K) < ε^p;
then there exists g ∈ C₀(Ω) such that

    ∫_Ω |g − χ_K|^p dx = ∫_{V\K} g^p dx ≤ ∫_{V\K} 1 dx = m(V \ K) < ε^p ,

so
    ‖g − χ_K‖_{L^p(Ω)} < ε .

Thus the characteristic functions u = χ_M can indeed be approximated
by C₀(Ω) functions.

Proof of Theorem 5.5, continuation.
Consider f ∈ L^p(Ω) and approximate it using Lemma 5.7: for θ > 0
there exists g ∈ C₀(Ω) such that

    ‖f − g‖_{L^p(Ω)} < θ/3 .                                (5.2.5)

We have

    ‖f_ε − f‖_{L^p(Ω)} ≤ ‖f_ε − g_ε‖_{L^p(Ω)} + ‖g_ε − g‖_{L^p(Ω)} + ‖g − f‖_{L^p(Ω)} ,

which by the first statement of the theorem is

    ≤ 2‖f − g‖_{L^p(Ω)} + ‖g_ε − g‖_{L^p(Ω)} ,

so by (5.2.5)

    < (2/3)θ + ‖g_ε − g‖_{L^p(Ω)} ,

which by Proposition 5.4 is

    < (2/3)θ + constant · ‖g_ε − g‖_{C(K′)} < (2/3)θ + θ/3 = θ

for all ε > 0 small. Therefore,

    lim sup_{ε→0⁺} ‖f_ε − f‖_{L^p(Ω)} = 0 ⟹ lim_{ε→0⁺} ‖f_ε − f‖_{L^p(Ω)} = 0 .

This completes the proof.

The following is a fundamental theorem.


Theorem 5.8. Let Ω ⊂ Rk be a nonempty open set. Then C0∞(Ω) is
dense in Lp(Ω) for all 1 ≤ p < ∞ (i.e., every Lp(Ω) function
can be approximated by test functions).

Proof. Let f ∈ Lp(Ω). By Lemma 5.7, for all η > 0 there exists g ∈
C0(Ω) such that ‖f − g‖_{Lp(Ω)} < η/2. On the other hand, the mollified
functions gε belong to C0∞(Ω), and by Theorem 5.5 ‖gε − g‖_{Lp(Ω)} < η/2
for ε > 0 small. Therefore,

‖f − gε‖_{Lp(Ω)} ≤ ‖f − g‖_{Lp(Ω)} + ‖gε − g‖_{Lp(Ω)} < η/2 + η/2 = η

for ε > 0 small.
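These density theorems can be illustrated numerically. The following is a minimal Python sketch (our addition, not from the text; it assumes NumPy, and the helper names omega and mollify are ours): it mollifies the characteristic function of [0, 1] with the normalized Friedrichs bump and checks that the L1 error decreases as ε → 0⁺.

```python
import numpy as np

def omega(x):
    # standard Friedrichs bump on (-1, 1), unnormalized
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    out[m] = np.exp(1.0 / (x[m] ** 2 - 1.0))
    return out

s = np.linspace(-1.0, 1.0, 4001)
ds = s[1] - s[0]
C = 1.0 / (omega(s).sum() * ds)            # normalization: integral of C*omega is 1

def mollify(f, x, eps):
    # (f * omega_eps)(x), with omega_eps(y) = (C/eps) * omega(y/eps)
    y = eps * s                            # so omega(y/eps) = omega(s)
    w = (C / eps) * omega(s)
    dy = y[1] - y[0]
    return np.array([np.sum(f(xi - y) * w) * dy for xi in x])

f = lambda t: ((t >= 0) & (t <= 1)).astype(float)   # characteristic function of [0, 1]

grid = np.linspace(-1.0, 2.0, 1501)
dx = grid[1] - grid[0]
errs = [np.sum(np.abs(mollify(f, grid, eps) - f(grid))) * dx
        for eps in (0.2, 0.1, 0.05)]       # L^1 errors, decreasing with eps
```

The error decays roughly like a constant times ε, since the mismatch between fε and f is confined to ε-neighborhoods of the two jump points.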

Theorem 5.9. If f ∈ L1loc(Ω) is such that

∫_Ω f(x)φ(x) dx = 0 ∀φ ∈ C0∞(Ω), (5.2.6)

then f = 0 a.e. on Ω.

Proof. First of all let us extend (5.2.6) to

∫_Ω f(x)g(x) dx = 0 (5.2.7)

for all g ∈ L∞(Ω) such that g vanishes almost everywhere on Ω \ K,
where K ⊂ Ω is a compact set. Obviously, such a function g belongs
in particular to L1(Ω) and (by Theorem 5.5)

‖gε − g‖_{L1(Ω)} → 0 as ε → 0⁺.

Hence, there exists a sequence εj → 0 such that

g_{εj}(x) → g(x) as j → ∞ for a.a. x ∈ Ω. (5.2.8)

Therefore by (5.2.6) we have

∫_Ω f(x) g_{εj}(x) dx = 0, (5.2.9)

for j large enough such that supp g_{εj} ⊂ K′, where K′ is a compact set,
K ⊂ K′ ⊂ Ω. We also have

|f(x) g_{εj}(x)| ≤ |f(x)| · |g_{εj}(x)|
≤ |f(x)| ∫_Ω |g(y)| ω_{εj}(x − y) dy
≤ |f(x)| · ‖g‖_{L∞(Ω)},

for j large enough and for almost all x ∈ K′. So we can apply
the Lebesgue Dominated Convergence Theorem (see also (5.2.8) and
(5.2.9)) to get (5.2.7) for all g ∈ L∞(Ω) such that g vanishes a.e. on
Ω \ K.
Now choose an arbitrary compact set K ⊂ Ω and let g = sign f · χ_K.
Then by (5.2.7) we have

∫_Ω f g dx = ∫_Ω |f| χ_K dx = ∫_K |f| dx = 0,

which implies f = 0 for almost all x ∈ K. Since K is arbitrary, f = 0
a.e. on Ω.

5.3 Scalar Distributions


Let Ω ⊂ Rk be a nonempty open set. Recall that C0∞ (Ω), topologized
as the inductive limit of the DK (Ω)’s, where K runs over all compact
subsets of Ω, is denoted by D(Ω) (see Sect. 5.1).

Definition 5.10. A functional u : D(Ω) → R is said to be a (scalar)


distribution (on Ω) if u is linear and continuous, i.e.,

• u(α1 φ1 + α2 φ2 ) = α1 u(φ1 ) + α2 u(φ2 ) for all α1 , α2 ∈ R, and all


φ1 , φ2 ∈ D(Ω);

• φn → φ in D(Ω) implies u(φn ) → u(φ) in R.

In fact, it is enough to consider φ = 0 for the second condition because
of linearity. Let D′(Ω) denote the set of all distributions on Ω.

It is easily seen that D′(Ω) is a real linear space. Sometimes we shall
write (u, φ) instead of u(φ).
Notice that, in general, a distribution is not defined point-wise on Ω,
unless it is a regular distribution, i.e., a distribution defined by a usual
function, as explained below.

Regular Distributions
Let u ∈ L1loc(Ω) and define ũ : D(Ω) → R by

ũ(φ) = ∫_Ω u(x)φ(x) dx ∀φ ∈ D(Ω).

Since φ has compact support, uφ ∈ L1(Ω) and so ũ is well defined.
Clearly, ũ is linear and continuous and therefore a distribution. Note
that the mapping i : L1loc(Ω) → D′(Ω), i(u) = ũ, is injective. Since i is
linear, injectivity can be seen by showing the implication ũ = i(u) =
0 ⟹ u = 0 for a.a. x ∈ Ω. This is indeed the case by Theorem 5.9.
We now simply identify ũ with u and write

u(φ) = ∫_Ω u(x)φ(x) dx ∀φ ∈ D(Ω).

A distribution which arises this way is called a regular distribution.



The Dirac Distribution2


Let Ω = Rk and define (δ, φ) = δ(φ) = φ(0) for all φ ∈ D(Ω). It is
linear and continuous, so δ ∈ D′(Ω). δ is called the Dirac distribution
or delta function, to follow the original denomination, even though it
is not in fact a function.
Claim: The distribution δ is not a regular distribution.

Proof. Suppose, by way of contradiction, that there exists a function
f ∈ L1loc(Rk) such that

(δ, φ) = ∫_{Rk} f(x)φ(x) dx ∀φ ∈ D(Rk).

This means

∫_{Rk} f φ dx = φ(0) ∀φ ∈ D(Rk),

and, in particular,

∫_{Rk} f φ dx = 0 ∀φ ∈ D(Rk) with supp φ ⊂ Rk \ {0},

hence

∫_{Rk \ {0}} f φ dx = 0 ∀φ ∈ D(Rk \ {0}).

Then according to Theorem 5.9, f = 0 for almost all x ∈ Rk \ {0}, so
f = 0 for almost all x ∈ Rk, thus φ(0) = (δ, φ) = 0 for all φ ∈ D(Rk),
which is false.

A physical interpretation of δ will be provided later.


For a given x0 ∈ Rk one can define a similar Dirac distribution, denoted
δx0 , by
(δx0 , φ) = φ(x0 ) ∀φ ∈ D(Rk ) .
The Dirac distribution associated with x0 = 0 is precisely δ. Of course,
linear combinations of Dirac distributions are also distributions. In
fact, the space of distributions is a large one, as shown below.

2
Paul Adrien Maurice Dirac, English theoretical physicist, 1902–1984.

5.3.1 Some Operations with Distributions


Besides addition and scalar multiplication there are some further op-
erations we can perform on distributions.

• Multiplication by a C ∞ function.

For u ∈ D′(Ω) and a ∈ C∞(Ω), define au by

(au, φ) := (u, aφ) ∀φ ∈ D(Ω).

Note that aφ is still a test function, and au is linear and continuous
on D(Ω), so au ∈ D′(Ω).
This is a generalization of the usual multiplication of functions. Indeed,
if u ∈ L1loc(Ω) (i.e., u is a regular distribution), then

(au, φ) = (u, aφ) = ∫_Ω u(aφ) dx = ∫_Ω (au)φ dx,

so (au)(x) = a(x)u(x) for almost all x ∈ Ω.

• Reflection about the origin.

For the sake of simplicity, consider Ω = Rk. Let u ∈ D′(Rk). Sometimes
we write u(x) instead of u even though it is not a function. For
example, this helps denote the reflection of u:

(u(−x), φ(x)) := (u(x), φ(−x)) ∀φ ∈ D(Rk).

Clearly u(−x) ∈ D′(Rk). Notice that if u ∈ L1loc(Rk), then u(−x) ∈
D′(Rk) is precisely the regular distribution generated by the function
x ↦ u(−x),

∫_{Rk} u(−x)φ(x) dx = ∫_{Rk} u(x)φ(−x) dx ∀φ ∈ D(Rk),

and this explains the notation for the reflection of the distribution u.

• Translation by a vector.
For u ∈ D′(Rk) and h ∈ Rk, define u(x + h) by

(u(x + h), φ(x)) := (u(x), φ(x − h)) ∀φ ∈ D(Rk).

It is clear that u(x + h) ∈ D′(Rk). Again, the notation u(x + h) is
justified by the case when u is a locally integrable function.
Note that the Dirac distribution δ_{x0} defined before is precisely δ(x − x0)
in terms of the above notation.

5.3.2 Convergence in Distributions


Let (un)_{n∈N} be a sequence in D′(Ω). We say that (un) converges in
D′(Ω) if there exists u ∈ D′(Ω) such that

lim_{n→∞} (un, φ) = (u, φ) ∀φ ∈ D(Ω). (5.3.10)

In fact, D′(Ω) is sequentially complete, so if (5.3.10) holds then u is
automatically in D′(Ω). More precisely,

Claim: If (un, φ) is convergent for all φ ∈ D(Ω), then the functional
u : D(Ω) → R defined by (5.3.10) is linear and continuous.

Proof. While the linearity of u follows trivially from (5.3.10), its con-
tinuity is not immediate, see Gel’fand and Shilov [17].3 Assume that
u is not continuous, i.e., there exists a sequence φn → 0 in D(Ω) such
that, on a subsequence again denoted φn , we have

|u(φn)| ≥ δ > 0 ∀n. (5.3.11)

Choosing another subsequence, we may assume that for all n

sup_{x∈Ω} |D^α φn(x)| < 1/2^{2n} ∀ |α| ≤ n. (5.3.12)

We consider ψn = 2^n φn. By (5.3.12) we get

ψn → 0 in D(Ω). (5.3.13)

On the other hand (see (5.3.11)),

|u(ψn)| ≥ 2^n δ → ∞. (5.3.14)

3
Israel M. Gel’fand, Russian mathematician, 1913–2009; Georgiy E. Shilov,
Russian mathematician, 1917–1975.

Let us now extract new subsequences, say (ũn) and (ψ̃n). In view
of (5.3.14) we can pick a ψ̃1 such that |(u, ψ̃1)| > 1. Thus, by virtue
of (5.3.10), we can choose ũ1 such that |(ũ1, ψ̃1)| > 1. Now, assuming
that ũj and ψ̃j have been chosen for j = 1, 2, . . . , n − 1, we can pick
(by the continuity of the ũj's and by (5.3.14)) a test function ψ̃n such
that

|(ũk, ψ̃n)| < 1/2^{n−k}, k = 1, 2, . . . , n − 1, (5.3.15)

|(u, ψ̃n)| > Σ_{j=1}^{n−1} |(u, ψ̃j)| + n + 1. (5.3.16)

Taking into account (5.3.10) and (5.3.16), we can pick ũn such that

|(ũn, ψ̃n)| > Σ_{j=1}^{n−1} |(ũn, ψ̃j)| + n + 1. (5.3.17)

So by induction we obtain (ũn) and (ψ̃n) satisfying (5.3.15) and
(5.3.17). Set

ψ = Σ_{n=1}^{∞} ψ̃n.

Since (ψ̃n) is a subsequence of (ψn), the above series is convergent in
D(Ω) (see (5.3.12)), hence ψ ∈ D(Ω). Now, let us estimate |(ũn, ψ)|
by using the decomposition

(ũn, ψ) = Σ_{j≠n} (ũn, ψ̃j) + (ũn, ψ̃n). (5.3.18)

From (5.3.15) we get

Σ_{j=n+1}^{∞} |(ũn, ψ̃j)| < Σ_{j=n+1}^{∞} 1/2^{j−n} = 1. (5.3.19)

Finally, from (5.3.17), (5.3.18), and (5.3.19) we see that

|(ũn, ψ)| > n,

which contradicts (5.3.10).

As an example, consider the sequence of Friedrichs mollifiers un =
ω_{1/n}, i.e., un(x) = n^k ω(nx) for x ∈ Ω = Rk, n ∈ N, where ω is
the test function defined before in (5.1.1). The graphs of the un's for
k = 1 or k = 2 can be visualized in corresponding coordinate systems
to observe the behavior of the un's as n gets larger and larger. The
pointwise limit of (un) is as follows:

lim_{n→∞} un(x) = 0 for x ≠ 0, and lim_{n→∞} un(0) = +∞,

which is not an R-valued function. On the other hand, viewing the
un's as distributions, we have un → δ in D′(Rk):

(un, φ) = ∫_{Rk} un(x)φ(x) dx = ∫_{B(0,1/n)} un(x)φ(x) dx → φ(0) = (δ, φ),

for all φ ∈ D(Rk).
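For k = 1 this convergence to δ is easy to observe numerically. In the sketch below (our addition, assuming NumPy; cos is used as a smooth stand-in for a test function near 0), the substitution x = s/n turns (un, φ) into ∫ ω(s)φ(s/n) ds, which approaches φ(0) = 1.

```python
import numpy as np

def omega(x):
    # the Friedrichs bump on (-1, 1); normalized below
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    out[m] = np.exp(1.0 / (x[m] ** 2 - 1.0))
    return out

s = np.linspace(-1.0, 1.0, 4001)
ds = s[1] - s[0]
w = omega(s)
w /= w.sum() * ds                    # now the integral of w is 1

phi = np.cos                         # smooth stand-in for a test function, phi(0) = 1

# (u_n, phi) = int n*omega(n x)*phi(x) dx = int omega(s)*phi(s/n) ds
vals = [np.sum(w * phi(s / n)) * ds for n in (1, 5, 50)]
```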

Physical Interpretation of the Dirac Distribution: The Dirac
distribution represents, for instance, the density of a unit mass concentrated
at some point. To explain that, let us suppose that a unit
mass, which is concentrated at the origin of a coordinate system in R3,
is distributed uniformly in B(0, 1/n) ⊂ R3. Thus the corresponding
mass density is given by

δn(x) = 3n^3/(4π) if ‖x‖2 ≤ 1/n, δn(x) = 0 otherwise,

and obviously the total mass ∫ δn dx = 1. For n → ∞ the mass
concentrates in x = 0. Obviously, δn(x) → 0 as n → ∞ for all x ≠ 0,
and δn(0) → +∞, so δn does not converge pointwise to a function.
However, δn → δ in D′(R3) as n → ∞:

(δn, φ) = (3n^3/(4π)) ∫_{B(0,1/n)} φ(x) dx → φ(0) = (δ, φ) ∀φ ∈ D(R3),

so δ can indeed be interpreted as the density of an idealized point
mass.

5.3.3 Differentiation of Distributions


For u ∈ C1(Ω) and φ ∈ D(Ω) one can write

(∂u/∂xi, φ) = ∫_Ω (∂u/∂xi) φ dx = ∫_{supp φ} (∂u/∂xi) φ dx, i = 1, . . . , k,

and switching to a rectangular cell including supp φ to ease computation,

= ∫_{cell} (∂u/∂xi) φ dx,

and integrating by parts,

= −∫_{cell} u (∂φ/∂xi) dx = −∫_Ω u (∂φ/∂xi) dx = −(u, ∂φ/∂xi).

Hence, if u ∈ C1(Ω) we have

(∂u/∂xi, φ) = −(u, ∂φ/∂xi) ∀φ ∈ D(Ω), i = 1, . . . , k. (5.3.20)

If u is an arbitrary distribution, then we use (5.3.20) as the definition
of ∂u/∂xi, which is also an element of D′(Ω). Whenever u is a smooth
function, its distributional derivative defined by (5.3.20) coincides with
the classical derivative of u.
Since u ∈ D′(Ω) implies ∂u/∂xi ∈ D′(Ω) for i = 1, . . . , k, we deduce by
induction that every distribution u ∈ D′(Ω) is infinitely differentiable,
and we have

(D^α u, φ) = (−1)^{|α|} (u, D^α φ) ∀φ ∈ D(Ω), α = (α1, . . . , αk) ∈ N0^k. (5.3.21)

By convention D^{(0,0,...,0)} u = u. It is clear from (5.3.21) that mixed
derivatives in the sense of distributions do not depend on the order of
differentiation.

Let us now discuss some examples.


Example 1. Consider the Heaviside function H, defined on Ω = R
by

H(x) = 1 for x ≥ 0, H(x) = 0 for x < 0.

We use Ḣ to indicate the pointwise derivative, and H′ for the derivative
in D′(R). Obviously Ḣ(x) = 0 for x ≠ 0, hence Ḣ = 0 a.e. On the
other hand, H′ = δ: for all φ ∈ D(R) we have

(H′, φ) = −(H, φ̇) = −∫_{−∞}^{∞} H(x)φ̇(x) dx = −∫_{0}^{∞} φ̇(x) dx = −φ(x)|_{x=0}^{x=∞},

and since φ has compact support,

= φ(0) = (δ, φ).

So, if u is not smooth, the distributional derivative may not coincide
with the pointwise derivative.
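The computation (H′, φ) = φ(0) can be replayed numerically for a concrete test function. The sketch below (our addition, assuming NumPy) uses the bump φ(x) = exp(1/(x² − 1)) on (−1, 1); since φ(1) = 0, one should get −∫_0^1 φ′(x) dx = φ(0) = e^{−1}.

```python
import numpy as np

def dphi(x):
    # analytic derivative of the bump phi(x) = exp(1/(x^2 - 1)) on (-1, 1)
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    xm = x[m]
    out[m] = np.exp(1.0 / (xm ** 2 - 1.0)) * (-2.0 * xm / (xm ** 2 - 1.0) ** 2)
    return out

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
lhs = -np.sum(dphi(x)) * dx      # (H', phi) = -(H, phi') = -int_0^infty phi'(x) dx
rhs = np.exp(-1.0)               # (delta, phi) = phi(0) = e^{-1}
```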

Example 2. Consider u = u(x1, x2) : Ω = R2 → R,

u = x1 H(x2), i.e., u = x1 if x2 ≥ 0 and u = 0 if x2 < 0.

By a straightforward computation we find that the distributional derivative
D^{(1,0)} u = H(x2), which coincides with the classical partial
derivative ∂u/∂x1. On the other hand,

(D^{(0,1)} u, φ) = −∫_{R2} u (∂φ/∂x2) dx1 dx2
= −∫_{−∞}^{+∞} x1 [∫_{0}^{+∞} (∂φ/∂x2) dx2] dx1
= ∫_{−∞}^{+∞} x1 φ(x1, 0) dx1 ∀φ ∈ D(R2).

Note that D^{(0,1)} u is not a regular distribution. Indeed, assuming the
contrary, we obtain D^{(0,1)} u = 0 almost everywhere in R2 by using
test functions with support in R × (0, +∞) and in R × (−∞, 0), while
D^{(0,1)} u cannot be zero. So D^{(0,1)} u is different from the classical partial
derivative ∂u/∂x2 (which is zero almost everywhere).

Example 3. Let Ω = R3 and consider u = 1/r, where r = ‖x‖2 =
(x1² + x2² + x3²)^{1/2}. We want to calculate

Δu = Σ_{i=1}^{3} ∂²u/∂xi²,

where the derivatives are in the sense of distributions. The operator
Δ is called the Laplace operator (or Laplacian).⁴ Note that, because
of the singularity at the origin, the classical second derivatives of u are
not elements of L1loc(R3), so that we cannot define the distribution
Δu directly. We replace u with

un = 1/r for r ≥ 1/n, un = 0 for r < 1/n,

which belongs to L1loc(R3), for all n ∈ N. For any test function φ ∈
D(R3) we have

(Δun, φ) = (un, Δφ),

and since un is regular,

= ∫_{r ≥ 1/n} (Δφ/r) dx.

We wish to accept

(Δu, φ) = (Δ(1/r), φ) = lim_{n→∞} ∫_{r ≥ 1/n} (Δφ/r) dx

as the definition of Δu, but of course we must show that this limit
exists. For a fixed φ ∈ D(R3), define the spherical shell

Sn = {x ∈ R3 ; 1/n ≤ r ≤ a},

4
Pierre-Simon Laplace, French mathematician and astronomer, 1749–1827.

where a is large enough that supp φ ⊂ B(0, a). We then use the second
Green formula⁵ (see, for example, [14, p. 628]) to deduce that

∫_{r ≥ 1/n} [(1/r) Δφ − φ Δ(1/r)] dx = ∫_{Sn} [(1/r) Δφ − φ Δ(1/r)] dx,

where Δ(1/r) = 0 in Sn. Changing the direction of the normal and
consequently the sign in the surface integral below (we can ignore the
outer edge of the shell since φ vanishes there), this becomes

= −∫_{r = 1/n} [(1/r)(∂φ/∂r) − φ (∂/∂r)(1/r)] dσ,

and since r = 1/n on the edge,

= −n ∫_{r = 1/n} (∂φ/∂r) dσ − n² ∫_{r = 1/n} φ dσ,

and because ∂φ/∂r is bounded,

= −n O(1/n²) − n² ∫_{r = 1/n} φ dσ,

which as n → ∞ becomes

→ −4πφ(0) = −4π(δ, φ).

So, the limit exists and

lim_{n→∞} ∫_{r ≥ 1/n} (Δφ/r) dx = −4πφ(0).

Hence

(Δ(1/r), φ) = −4π(δ, φ) ∀φ ∈ C0∞(R3),

that is to say, Δ(1/r) = −4πδ.
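The limit just computed can be checked numerically for a concrete radial function. The sketch below (our addition; it assumes NumPy and SciPy, and uses φ(r) = e^{−r²} as a rapidly decaying stand-in for a test function, for which Δφ = (4r² − 6)e^{−r²} in R³) evaluates ∫_{r ≥ 1/n} Δφ/r dx in spherical coordinates and compares it with −4πφ(0).

```python
import numpy as np
from scipy.integrate import quad

phi = lambda r: np.exp(-r ** 2)                      # radial stand-in for a test function
lap = lambda r: (4 * r ** 2 - 6) * np.exp(-r ** 2)   # its Laplacian in R^3

def I(eps):
    # int_{r >= eps} (lap/r) dx = 4*pi * int_eps^inf lap(r) * r dr
    val, _ = quad(lambda r: lap(r) * r, eps, np.inf)
    return 4 * np.pi * val

approx = [I(1.0 / n) for n in (1, 10, 100)]
target = -4 * np.pi * phi(0.0)                       # = -4*pi
```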

5
George Green, British mathematical physicist, 1793–1841.

This result can be easily generalized to higher dimensions by showing
that for k ≥ 3,

Δ(1/r^{k−2}) = −(k − 2) ak δ

in D′(Rk), where ak is the “area” of the unit hyper-sphere in Rk.

Also, for k = 2 we have

Δ ln(1/r) = −2πδ

in D′(R2), so that defining for k ≥ 2

E(x) = −1/((k − 2) ak r^{k−2}) for k ≥ 3, E(x) = −(1/(2π)) ln(1/r) for k = 2,

we have
ΔE = δ in D′(Rk). (5.3.22)
E is called the fundamental solution of the Laplacian Δ. In particular,
it can be used to find a solution to the Poisson equation⁶
Δu = f(x), x ∈ Rk. (5.3.23)
Assume that f ∈ L∞(Rk) and vanishes almost everywhere outside a
compact set. Then the function

u(x) = (E ∗ f)(x) = ∫_{Rk} E(x − y) f(y) dy (5.3.24)

is well defined (since E is locally summable) and satisfies
Eq. (5.3.23). Indeed, we first notice that for all y ∈ Rk and φ ∈
C0∞(Rk),

∫_{Rk} E(x − y) Δφ(x) dx = (E(x − y), Δφ(x)) = (Δ_x E(x − y), φ(x)),

and taking into account (5.3.22),

= (δ(x − y), φ(x)) = (δ(x), φ(x + y)) = φ(y).
6
Siméon Denis Poisson, French mathematician, engineer, and physicist, 1781–
1840.

Now,

(Δu, φ) = (u, Δφ) = ∫_{Rk} u(x) Δφ(x) dx
= ∫_{Rk} [∫_{Rk} E(x − y) f(y) dy] Δφ(x) dx
= ∫_{Rk} f(y) [∫_{Rk} E(x − y) Δφ(x) dx] dy,

and using the last relation above,

= ∫_{Rk} f(y) φ(y) dy = (f, φ) ∀φ ∈ D(Rk),

which implies Δu = f, as claimed.


Remark 5.11. We point out (without proof) the following result known
as Weyl’s regularity lemma7 (see, e.g., [47]): if ∅ = Ω ⊂ Rk is open,
f ∈ C∞(Ω), u ∈ D′(Ω) and Δu = f in D′(Ω), then u ∈ C∞(Ω).

The following result says that differentiation in D′(Ω) is a continuous
operation.

Proposition 5.12. Suppose that ∅ ≠ Ω ⊂ Rk is open. If un → u in
D′(Ω), then ∂un/∂xi → ∂u/∂xi in D′(Ω) for all i = 1, . . . , k.

Proof. Let un → u in D′(Ω). We have for all φ ∈ D(Ω)

(∂un/∂xi, φ) = −(un, ∂φ/∂xi),

and since ∂φ/∂xi is a test function, as n → ∞ the right-hand side converges
to

−(u, ∂φ/∂xi) = (∂u/∂xi, φ).
7
Hermann Weyl, German mathematician, theoretical physicist, and philoso-
pher, 1885–1955.

Remark 5.13. As an immediate consequence of Proposition 5.12, if
un → u in D′(Ω), then D^α un → D^α u in D′(Ω) for all α = (α1, . . . , αk) ∈
N0^k with |α| > 0.

Series in D′(Ω)
Suppose (un)_{n∈N} is a sequence in D′(Ω). Then we can associate with
this sequence the series

u1 + u2 + · · · + un + · · ·

and say that it converges in D′(Ω) if the sequence of partial sums

sn = u1 + · · · + un

converges, sn → u in D′(Ω), and write

u1 + u2 + · · · + un + · · · = u.

By Remark 5.13, sn → u implies D^α sn → D^α u in D′(Ω) for all α,
hence we can differentiate the series term by term as many times as
we wish, i.e.,

D^α u1 + D^α u2 + · · · + D^α un + · · · = D^α u

in D′(Ω). This is not the case in classical analysis. For example,
with Ω = R, un(x) = (1/n) sin(nx) converges uniformly to 0 as n → ∞
(and uniform convergence implies convergence in D′(Ω)), but un′(x) =
cos(nx), which does not converge, even pointwise. However, it does
converge in D′(R). In fact, un^{(j)} → 0 as n → ∞ in D′(R) for all
j = 1, 2, . . . .
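The fact that cos(nx) → 0 in D′(R), an instance of the Riemann–Lebesgue phenomenon, can also be seen numerically. The sketch below (our addition, assuming NumPy; a Gaussian serves as a smooth stand-in for a test function) evaluates the pairing ∫ cos(nx)φ(x) dx for increasing n.

```python
import numpy as np

phi = lambda x: np.exp(-x ** 2)      # smooth, rapidly decaying stand-in for a test function
x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]

# pairing of u_n'(x) = cos(n x) with phi; should tend to 0 as n grows
pair = [np.sum(np.cos(n * x) * phi(x)) * dx for n in (1, 10, 40)]
```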

5.3.4 Differential Equations for Distributions


Consider Ω = R, u, b ∈ D′(R) and smooth functions a1, a2, . . . , an ∈
C∞(R). Then, if u^{(j)} indicates the j-th derivative of u in D′(R), the
differential equations

u^{(n)} + a1 u^{(n−1)} + · · · + a_{n−1} u^{(1)} + an u = 0 (E0)

u^{(n)} + a1 u^{(n−1)} + · · · + a_{n−1} u^{(1)} + an u = b (E)

make sense. Classically, there are nice solutions u to (E0 ) and they
are solutions in the sense of distributions as well. In fact, there are

no other solutions to (E0 ) in D (R) as long as ai ∈ C ∞ (R), as proven


below.
The equation u = 0 has constant solutions C in D (R) since for all
φ ∈ D(R)

 
(C , φ) = −(C, φ ) = −C φ dt = 0,
−∞

But, are constant functions the only solutions to the equation u′ = 0
in the sense of distributions? We answer this question in the following
way. If u ∈ D′(R) and u′ = 0 in D′(R), we have

0 = (u′, φ) = −(u, φ′) ∀φ ∈ D(R). (5.3.25)

Given φ ∈ D(R) define

ψ(t) = φ(t) − ω(t) ∫_{−∞}^{+∞} φ(s) ds

for all t ∈ R, with ω = ω(t) defined as in (5.1.1) (where k = 1). Note
that ψ ∈ D(R) and ∫_{−∞}^{+∞} ψ dt = 0. Define

φ1(t) = ∫_{−∞}^{t} ψ(s) ds, t ∈ R,

and notice that φ1 ∈ D(R) and φ1′ = ψ. Now for all φ ∈ D(R)

(u, φ) = (u, ψ) + [∫_{−∞}^{+∞} φ(s) ds] (u, ω),

where (u, ω) is a constant, say C, so

= (u, φ1′) + ∫_{−∞}^{+∞} C φ ds,

and according to (5.3.25), (u, φ1′) = 0, hence

(u, φ) = (C, φ),

thus u = C. Therefore, any distributional solution of the equation
u′ = 0 is a constant distribution (i.e., a distribution generated by a
constant function).

Now consider the linear differential system

u1′ = a11 u1 + a12 u2 + · · · + a1n un
u2′ = a21 u1 + a22 u2 + · · · + a2n un
· · ·
un′ = an1 u1 + an2 u2 + · · · + ann un        (5.3.26)

where the aij ∈ C∞(R). Denoting by A(t) the matrix (aij(t)) and by
u the column vector (u1, . . . , un)^T, we can rewrite (5.3.26) as

u′ = Au. (5.3.27)

Let X = X(t) be a fundamental matrix of the system (5.3.26). We
know from the classical theory of linear differential systems (see, e.g.,
[8, 11]) that X is invertible and X′ = AX for all t ∈ R. Consider the
transformation u = Xz; then in the sense of distributions

u′ = X′z + Xz′ = AXz + Xz′,

and by (5.3.27),

u′ = Au = AXz.

Hence Xz′ = 0, so having in mind the fact that X is invertible, we
deduce that z′ = 0. We have denoted by u′ and z′ the column vectors
whose components are the distributional derivatives of u1, . . . , un and
z1, . . . , zn, respectively. As z′ = 0, z must be a constant vector z = c ∈
Rn and we find u = Xc. Therefore, there are no solutions in D′(R)^n
to system (5.3.26) other than the classical ones.
Finally, consider the homogeneous Eq. (E0 ). Since it can be writ-
ten in the vector form (5.3.27) which has only classical solutions, so
does (E0 ). The non-homogeneous case (E) has a general solution
which is obtained by adding to the general solution of (E0 ) a particu-
lar solution to (E) in the sense of distributions. Indeed, if up ∈ D (R)
is such a particular solution, and u is an arbitrary solution in D (R)
of (E), then u − up is a (classical) solution of (E0 ), hence a linear
combination of the functions belonging to the fundamental system of
solutions.

Example. Consider in D′(R) the differential equation

u″ − 2u′ + u = 2δ(t − 1), t ∈ R. (5.3.28)

In order to solve this equation, we first notice that if u is a distributional
solution of it, then

u″ − 2u′ + u = 0 in D′(Ωi), i = 1, 2,


where Ω1 = (−∞, 1), Ω2 = (1, +∞). Therefore, u is a classical solution
of the corresponding homogeneous equation within each of these two
intervals, i.e., u is a function (regular distribution) of the form

u(t) = (c1 t + c2)e^t for t ∈ (−∞, 1), u(t) = (c3 t + c4)e^t for t ∈ (1, +∞),

where c1, c2, c3, c4 are real constants. Not all these functions u are
solutions of the given differential equation. The fact that such a function
u is a solution means

∫_{−∞}^{1} u (φ″ + 2φ′ + φ) dt + ∫_{1}^{∞} u (φ″ + 2φ′ + φ) dt = 2φ(1) ∀φ ∈ D(R).

Integrating by parts and bearing in mind that u is a classical solution
of the homogeneous equation in (−∞, 1) and also in (1, ∞), plus the
fact that φ(1) and φ′(1) can be any real numbers, we obtain

u(1 + 0) = u(1 − 0), u̇(1 + 0) − u̇(1 − 0) = 2,

so
c3 = c1 + 2e^{−1}, c4 = c2 − 2e^{−1}.

Thus the general solution of the given equation is

u(t) = (c1 t + c2)e^t + 2(t − 1)e^{t−1} for t > 1, u(t) = (c1 t + c2)e^t for t < 1,

i.e., u(t) = (c1 t + c2)e^t + 2(t − 1)e^{t−1} H(t − 1).


It is worth pointing out that there is no classical solution of the given
equation, more precisely there is a jump at t = 1 in the first derivative
of any solution (which is caused by the Dirac distribution in the right-
hand side of the equation).
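The transmission conditions at t = 1 can be verified numerically for a concrete choice of the constants. The sketch below (our addition, assuming NumPy; the values of c1, c2 are arbitrary) checks that u is continuous at t = 1, that u̇ jumps there by 2, and that the homogeneous equation u″ − 2u′ + u = 0 holds away from t = 1, all via finite differences.

```python
import numpy as np

c1, c2 = 0.7, -0.3                                   # arbitrary constants
H = lambda t: np.where(t > 1, 1.0, 0.0)              # Heaviside shifted to t = 1
u = lambda t: (c1 * t + c2) * np.exp(t) + 2 * (t - 1) * np.exp(t - 1) * H(t)

h = 1e-6
jump_u = u(1 + h) - u(1 - h)                         # continuity of u at t = 1
du_r = (u(1 + 2 * h) - u(1 + h)) / h                 # one-sided derivatives at t = 1
du_l = (u(1 - h) - u(1 - 2 * h)) / h
jump_du = du_r - du_l                                # should be close to 2

k, t = 1e-5, 1.7                                     # residual of u'' - 2u' + u away from t = 1
res = (u(t + k) - 2 * u(t) + u(t - k)) / k ** 2 \
      - (u(t + k) - u(t - k)) / k + u(t)
```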
Remark 5.14. Note that in equation (E) above the coefficient of u^{(n)} is
1, i.e., we do not have any singularity in the coefficient of the leading
term. Otherwise, some difficulties may occur. For example, consider
the simple equation
tu′ = 0 in D′(R). (5.3.29)

If u is a distributional solution of (5.3.29), then it must be constant


in (−∞, 0) and in (0, ∞) as well. So the general solution is

u(t) = c1 + c2 H(t), t ∈ R,

where c1 , c2 are real constants. Note that in this case there are two
independent solutions (e.g., u1 (t) = 1, u2 (t) = H(t)), even if the given
equation is of order one.

Now, in order to illustrate the need for distributions in solving prob-


lems associated with partial differential equations, consider the follow-
ing examples:

Example 1.
Consider the equation of an infinite vibrating string with no external
force acting on it

utt − uxx = 0, (t, x) ∈ R2, (5.3.30)

with some conditions at t = 0, say,

u(0, x) = ψ(x), ut(0, x) = 0, x ∈ R, (5.3.31)

where

ut := ∂u/∂t, utt := ∂²u/∂t², uxx := ∂²u/∂x².
First assume that ψ ∈ C 2 (R). Recall that using the change of variables

α = x + t, β = x − t ,

Eq. (5.3.30) can be reduced to the equation

uαβ = 0 .

So it is easily seen that any solution of the Eq. (5.3.30) has the form
u = g(x+t)+h(x−t), and so applying (5.3.31), we find the d'Alembert
formula⁸

u = (1/2)[ψ(x + t) + ψ(x − t)] (d'Alembert's formula). (5.3.32)
8
Jean-Baptiste le Rond d’Alembert, French mathematician, mechanician,
physicist, philosopher, and music theorist, 1717–1783.

Clearly, u is a C 2 function. It is the unique classical solution of prob-


lem (5.3.30), (5.3.31).
On the other hand, assuming that ψ ∈ C 1 (R), then u given by (5.3.32)
is no longer a classical solution of Eq. (5.3.30). However, this u still
satisfies conditions (5.3.31).
Now, assume that ψ ∈ C(R). In this case, the function u given
by (5.3.32) only satisfies classically the condition u(0, x) = ψ(x), x ∈
R. However, there should be some relation between this u and
problem (5.3.30), (5.3.31). Indeed, we can show that this u satisfies (5.3.30)
and the condition ut(0, x) = 0 (x ∈ R) in a weak sense, that is, in the
sense of distributions.
If ψ′, ψ″ denote the first and second derivative of ψ in D′(R), then it
is easily seen that

D^{(1,0)} ψ(x + t) = D^{(0,1)} ψ(x + t) = ψ′(x + t) in D′(R2),
D^{(2,0)} ψ(x + t) = D^{(0,2)} ψ(x + t) = ψ″(x + t) in D′(R2).

Similarly,

D^{(0,1)} ψ(x − t) = ψ′(x − t) = −D^{(1,0)} ψ(x − t) in D′(R2),
D^{(2,0)} ψ(x − t) = D^{(0,2)} ψ(x − t) = ψ″(x − t) in D′(R2).

Consequently, for all φ ∈ D(R2), we have

(D^{(2,0)} u(t, x), φ(t, x)) = (1/2)(ψ″(x + t) + ψ″(x − t), φ(t, x)) = (D^{(0,2)} u(t, x), φ(t, x)),

which shows that u given by (5.3.32) satisfies Eq. (5.3.30) in the sense
of distributions:

D^{(2,0)} u − D^{(0,2)} u = 0 in D′(R2).

We also have

D^{(1,0)} u(t, x) = (1/2)[ψ′(x + t) − ψ′(x − t)] = 0, if t = 0.
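For a smooth profile ψ, d'Alembert's formula can be tested by finite differences. The sketch below (our addition, assuming NumPy; the Gaussian ψ is an illustrative choice) checks that u from (5.3.32) satisfies the wave equation numerically and matches the initial conditions.

```python
import numpy as np

psi = lambda x: np.exp(-x ** 2)                      # smooth initial profile (illustrative)
u = lambda t, x: 0.5 * (psi(x + t) + psi(x - t))     # d'Alembert's formula (5.3.32)

h, t0, x0 = 1e-4, 0.8, 0.3
utt = (u(t0 + h, x0) - 2 * u(t0, x0) + u(t0 - h, x0)) / h ** 2
uxx = (u(t0, x0 + h) - 2 * u(t0, x0) + u(t0, x0 - h)) / h ** 2
wave_residual = utt - uxx                            # u_tt - u_xx, should be ~0

ut0 = (u(h, x0) - u(-h, x0)) / (2 * h)               # u_t(0, x0) = 0 by symmetry
init_err = abs(u(0.0, x0) - psi(x0))                 # u(0, x) = psi(x)
```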

Example 2.
Here we discuss the boundary controllability of the 1-dimensional wave
equation describing the vibrations of a finite string. Specifically, let us
consider the following initial-boundary value problem:


utt − uxx = 0, 0 < x < 1, t > 0,
u(t, 0) = 0, u(t, 1) = f(t), t > 0, (5.3.33)
u(0, x) = u0(x), ut(0, x) = u1(x), 0 < x < 1,

where u0, u1 ∈ L1(0, 1).


We shall prove that

there exists T > 0 such that for every (u0, u1) ∈ L1(0, 1)² there is an
f ∈ L1loc[0, ∞) for which u = 0 for all t > T,

where u is the corresponding solution of problem (5.3.33). In fact, we
shall see that there exists a lowest time instant T with this property,
precisely T = 2. Obviously, any T > 2 satisfies the same property.
This result is in accordance with similar results previously obtained
by other authors by using different arguments (see, e.g., [34, p. 57]).
Our direct approach is more advantageous since it provides the solution
u (in a generalized sense, under weak assumptions on the data) as a
function of u0 , u1 , and f and allows us to determine the minimal time
interval (0, 2) and an explicit control function f (depending on u0 and
u1 ) which steers the solution u to zero. It may happen that this direct
approach is known, but we could not find anything about it in the
literature. Nevertheless, we present it here as a nice application and
do not claim originality.
Existence of Solutions to Problem (5.3.33)
Denote R = {(t, x); t ≥ 0, 0 ≤ x ≤ 1}. Consider in the first instance
that u = u(t, x) is a classical solution of problem (5.3.33) corresponding
to regular u0 , u1 , and f . Obviously, the solution of the above wave
equation has the general form

u(t, x) = g(x + t) + h(x − t) . (5.3.34)

From the initial and boundary conditions we can determine


g(x + t) and h(x − t) (hence u = u(t, x)) within different subsets
(triangles or squares) of R, as follows:
From the initial conditions we get

g(x) + h(x) = u0(x), g(x) − h(x) = ∫_0^x u1 + c, 0 < x < 1,

where c is a real constant, hence

g(x) = (1/2)[u0(x) + ∫_0^x u1 + c], 0 < x < 1,
h(x) = (1/2)[u0(x) − ∫_0^x u1 − c], 0 < x < 1. (5.3.35)

From the boundary conditions we obtain

g(t) + h(−t) = 0, t > 0 , (5.3.36)

and
g(1 + t) + h(1 − t) = f (t), t > 0 . (5.3.37)
Now (5.3.36) yields (see also (5.3.35))

h(−t) = −g(t) = −(1/2)[u0(t) + ∫_0^t u1 + c], 0 < t < 1. (5.3.38)

From (5.3.37) we derive (see also (5.3.35))

g(1 + t) = f(t) − h(1 − t) = f(t) − (1/2)[u0(1 − t) − ∫_0^{1−t} u1 − c], 0 < t < 1. (5.3.39)

We also have for 0 < t < 1

h(−t − 1) = −g(1 + t) = −f(t) + (1/2)[u0(1 − t) − ∫_0^{1−t} u1 − c], (5.3.40)

g(2 + t) = f(1 + t) − h(−t) = f(1 + t) + (1/2)[u0(t) + ∫_0^t u1 + c], (5.3.41)

h(−t − 2) = −g(t + 2) = −f(1 + t) − (1/2)[u0(t) + ∫_0^t u1 + c], (5.3.42)

g(3 + t) = f(2 + t) − h(−t − 1) = f(2 + t) + f(t) − (1/2)[u0(1 − t) − ∫_0^{1−t} u1 − c], (5.3.43)

and so on. By using the above formulas we can determine u = u(t, x)
in R. We decompose R into triangles and squares as in Fig. 5.1.

[Figure 5.1: Regions of the plane. The strip R = {(t, x); t ≥ 0, 0 ≤ x ≤ 1} is divided by the characteristic lines x + t = const and x − t = const into triangles and squares labeled A, B, C, D, E, F, G, H, I, J.]

We first determine g(x + t) in the triangle A ∪ B (i.e., the intersection
of R and the strip {0 ≤ x + t ≤ 1}) (see (5.3.35)):

g(x + t) = (1/2)[u0(x + t) + ∫_0^{x+t} u1 + c]. (5.3.44)

Now let us determine g(x + t) in the parallelogram C ∪ D ∪ E (i.e., the
intersection of R and the strip {1 ≤ x + t ≤ 2}) (see (5.3.39)):

g(x + t) = g((x + t − 1) + 1) = f(x + t − 1) − (1/2)[u0(2 − x − t) − ∫_0^{2−x−t} u1 − c]. (5.3.45)

Note that choosing f : (0, 1) → R,

f(y) = (1/2)[u0(1 − y) − ∫_0^{1−y} u1 − c] ∀y ∈ (0, 1),

implies g(x + t) = 0 in C ∪ D ∪ E.

For (t, x) ∈ F ∪ G ∪ H (i.e., 2 ≤ x + t ≤ 3, 0 ≤ x ≤ 1) we have
(see (5.3.41))

g(x + t) = g((x + t − 2) + 2) = f(x + t − 1) + (1/2)[u0(x + t − 2) + ∫_0^{x+t−2} u1 + c]. (5.3.46)

If we choose f : (1, 2) → R,

f(z) = −(1/2)[u0(z − 1) + ∫_0^{z−1} u1 + c] ∀z ∈ (1, 2),

then g(x + t) = 0 in F ∪ G ∪ H.

In what follows we shall try to determine h(x − t).

First, in the triangle A ∪ C (see (5.3.35)),

h(x − t) = (1/2)[u0(x − t) − ∫_0^{x−t} u1 − c]. (5.3.47)

Next, for (t, x) ∈ B ∪ D ∪ F we have

h(x − t) = h(−(t − x)) = −g(t − x) = −(1/2)[u0(t − x) + ∫_0^{t−x} u1 + c]. (5.3.48)

For (t, x) ∈ E ∪ G ∪ I we have

h(x − t) = h(−(t − x − 1) − 1) = −f(t − x − 1) + (1/2)[u0(2 + x − t) − ∫_0^{2+x−t} u1 − c]. (5.3.49)

Observe that if f : (0, 1) → R is the function defined above, then
h(x − t) = 0 in E ∪ G ∪ I.

For (t, x) ∈ H ∪ J we have

h(x − t) = h(−(t − x − 2) − 2) = −f(t − x − 1) − (1/2)[u0(t − x − 2) + ∫_0^{t−x−2} u1 + c]. (5.3.50)

With f : (1, 2) → R as defined before, we have h(x − t) = 0 in H ∪ J.

In fact all the above computations are valid for u0, u1 ∈ L1(0, 1) and
f ∈ L1loc[0, ∞).
These calculations lead to the following theorem:

Theorem 5.15. For any u0 , u1 ∈ L1 (0, 1) and f ∈ L1loc [0, ∞) (i.e., f


is Lebesgue summable on (0, m) for all m > 0) problem (5.3.33) has
a unique weak solution u. If u0 ∈ C[0, 1], u1 ∈ L1 (0, 1), f ∈ C[0, ∞),
and the following compatibility conditions are satisfied:

u0 (0) = 0, u0 (1) = f (0) , (5.3.51)

then u = u(t, x) ∈ C([0, ∞) × [0, 1]).

Proof. Using the above computations we construct u(t, x) = g(x + t) +


h(x − t). Obviously, u satisfies the wave equation in the distribution
sense on the interior of each of the sets A, B, C, D, E, F, G, and so
on. In this sense, u is a weak solution of the wave equation. By
construction, the initial and boundary conditions are also satisfied. It
is easily seen that the constant c disappears when constructing u =
g(x + t) + h(x − t) in A, B, C, . . . so the solution u is unique.
If u0 ∈ C[0, 1], u1 ∈ L1 (0, 1), f ∈ C[0, ∞) and u0 , f satisfy (5.3.51),
then u is continuous on [0, ∞) × [0, 1]. It suffices to observe that u is
continuous on the characteristic lines {x − t = −i}, i = 0, 1, . . . , and
{x + t = j}, j = 1, 2, . . . , restricted to the infinite strip R.

Remark 5.16. For higher regularity of u one needs to assume more


regularity of the data and additional compatibility conditions.

Exact Boundary Controllability


A careful analysis of the above computations shows that there are pairs
(u0 , u1 ) for which there are no functions f : (0, T ) → R, T < 2, making
u = 0 in the trapezoid A ∪ B ∪ C ∪ D ∪ F . In other words, the waves
cannot be controlled in [0, T ] if T < 2. On the other hand, we have

Theorem 5.17. For any pair (u0, u1) ∈ L1(0, 1) × L1(0, 1) there exists
a control function f : (0, +∞) → R defined by

f(y) = (1/2)[u0(1 − y) − ∫_0^{1−y} u1 − c], y ∈ (0, 1),
f(y) = −(1/2)[u0(y − 1) + ∫_0^{y−1} u1 + c], y ∈ (1, 2), (5.3.52)
f(y) = 0, y > 2,

with c = −u0(1) − ∫_0^1 u1, which makes u = 0 in the infinite trapezoid
{x ≤ t − 1} ∩ {0 < x < 1}.

Proof. The proof follows easily from the computations performed above,
including the remarks on the regions where g(x+t) = 0 or h(x−t) = 0
(based on the fact that f is that given in (5.3.52)). Of course, u van-
ishes in {x < t − 1} ∩ {0 < x < 1} since f (t) = 0 for t > 2.

Remark 5.18. Let us emphasize that, if f is chosen as in (5.3.52), then


the corresponding (unique) solution u vanishes starting from the line
segment defined as the intersection of R and the characteristic line
{x − t = −1} and remains zero everywhere on the right side of that
segment, which can be interpreted as a threshold. So the waves can
be controlled in the minimal time interval (0, 2) and in fact in any
interval (0, T ) with T ≥ 2.
Remark 5.19. While the solution u is unique in Theorem 5.15, the
control function f is not since the constant c in (5.3.52) can be chosen
arbitrarily. Indeed, the restriction of f to the interval (0, 2) is unique
up to an additive constant, as follows from the computations above.
We chose $c=-u_0(1)-\int_0^1 u_1$ in Theorem 5.17 in order to obtain a
continuous control function f .
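Although the argument above is purely analytical, the traveling-wave structure u(t, x) = g(x + t) + h(x − t) on which it rests is easy to observe numerically: for any smooth profiles g, h such a function satisfies the wave equation. The following Python sketch (the profiles g, h are arbitrary illustrative choices, not the ones constructed in the proof) checks this via second-order central differences.

```python
import numpy as np

# Any u(t, x) = g(x + t) + h(x - t) with smooth g, h solves u_tt = u_xx.
# g and h below are arbitrary smooth profiles (illustrative assumptions).
g = lambda s: np.sin(2.0 * s)
h = lambda s: np.exp(-s**2)

def u(t, x):
    return g(x + t) + h(x - t)

d = 1e-3                      # step for second-order central differences
t0, x0 = 0.7, 0.3
u_tt = (u(t0 + d, x0) - 2 * u(t0, x0) + u(t0 - d, x0)) / d**2
u_xx = (u(t0, x0 + d) - 2 * u(t0, x0) + u(t0, x0 - d)) / d**2
print(abs(u_tt - u_xx))       # negligible: the wave equation holds
```

The leading discretization errors of the two second differences coincide (both involve g⁗ + h⁗), so the computed difference is essentially floating-point noise.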

5.4 Sobolev Spaces


Let ∅ ≠ Ω ⊂ Rk be an open set. For m ∈ N, 1 ≤ p ≤ ∞ define the
Sobolev space of order m to be9

W m,p (Ω) = {u ∈ Lp (Ω); Dα u ∈ Lp (Ω) ∀α ∈ Nk0 , 0 < |α| ≤ m} ,

where the derivatives Dα u are considered in the sense of distributions.


Obviously, W m,p (Ω) is a linear space with respect to the usual operations of addition and scalar multiplication. In particular,

$$
W^{1,p}(\Omega)=\Bigl\{u\in L^p(\Omega);\ \frac{\partial u}{\partial x_i}\in L^p(\Omega)\ \ \forall i=1,\dots,k\Bigr\},
$$

where ∂u/∂xi is the partial derivative of u with respect to xi in the sense
of distributions.
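The definition can be made tangible numerically: on (0, 1) the function w is the distributional derivative of u exactly when ∫ u φ′ dx = −∫ w φ dx for every test function φ. The Python sketch below (u, w, and φ are illustrative choices) verifies this for u(x) = |x − 1/2|, which has no classical derivative at 1/2 but has weak derivative sign(x − 1/2).

```python
import numpy as np

# Weak derivative on (0,1):  w = u' in the sense of distributions means
#   ∫ u φ' dx = -∫ w φ dx   for every test function φ.
# u(x) = |x - 1/2| has weak derivative w(x) = sign(x - 1/2);
# φ is one illustrative test function vanishing at the endpoints.
x = np.linspace(0.0, 1.0, 200001)
u = np.abs(x - 0.5)
w = np.sign(x - 0.5)
phi = (x * (1.0 - x))**2          # φ and φ' vanish at 0 and 1
dphi = np.gradient(phi, x)

def trapz(vals, xs):              # simple trapezoidal quadrature
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2)

lhs = trapz(u * dphi, x)
rhs = -trapz(w * phi, x)
print(lhs, rhs)                   # the two integrals agree
```

One test function is of course not a proof, but any further φ ∈ C0∞(0, 1) would give the same agreement.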

Theorem 5.20. For all m ∈ N, 1 ≤ p ≤ ∞, W m,p (Ω) is a real Banach


space with respect to the norm
$$
\|u\|_{m,p}=\Bigl(\sum_{|\alpha|\le m}\|D^\alpha u\|_{L^p(\Omega)}^p\Bigr)^{1/p},\quad 1\le p<\infty,
$$
$$
\|u\|_{m,\infty}=\max_{|\alpha|\le m}\|D^\alpha u\|_{L^\infty(\Omega)},\quad p=\infty.
$$

Proof. Obviously,  · m,p is a norm for all 1 ≤ p ≤ ∞.


Let (un )n∈N be a Cauchy sequence in W m,p (Ω), i.e., for all ε > 0 there
exists N = N (ε) ∈ N such that

‖un − um ‖m,p < ε for all n, m > N .

It follows that (Dα un ) is Cauchy in Lp (Ω) for all α ∈ Nk0 , |α| ≤ m.


Since Lp (Ω) is a Banach space (with respect to  · Lp (Ω) ), there exist
u, uα ∈ Lp (Ω) such that

un → u, Dα un → uα in Lp (Ω) ∀α ∈ Nk0 , 0 < |α| ≤ m . (5.4.53)

On the other hand, we have the following.


Claim: In general, if vn → v in Lp (Ω), 1 ≤ p ≤ ∞, then vn → v in
D′(Ω).

9
Sergei L. Sobolev, Russian mathematician, 1908–1989.

Indeed, for all φ ∈ D(Ω) we have


 
 
$$
|(v_n-v,\varphi)|=\Bigl|\int_\Omega (v_n-v)\varphi\,dx\Bigr|
$$

which for p = 1

$$
\le \|v_n-v\|_{L^1(\Omega)}\,\sup_\Omega|\varphi|
$$

and for 1 < p < ∞, by Hölder's inequality,

$$
\le \|v_n-v\|_{L^p(\Omega)}\,\|\varphi\|_{L^{p'}(\Omega)}\,,
$$

so vn → v in D′(Ω), and similarly for p = ∞.


By the above claim, it follows that the convergences in (5.4.53) also
hold in D′(Ω). Since Dα is a closed operation in D′(Ω), it follows that

$$
u_\alpha=D^\alpha u\quad\forall \alpha\in\mathbb{N}_0^k,\ 0<|\alpha|\le m\,.
$$

Therefore, u ∈ W m,p (Ω) and ‖un − u‖m,p → 0 as n → ∞.

Now, for m ∈ N, 1 ≤ p ≤ ∞, denote as usual by W0m,p (Ω) the closure


of C0∞ (Ω) in (W m,p (Ω),  · m,p ). Obviously, W0m,p (Ω) is a Banach
space with respect to  · m,p for all m ∈ N, 1 ≤ p ≤ ∞.
For p = 2 there are specific notations

H m (Ω) := W m,2 (Ω), H0m (Ω) := W0m,2 (Ω),

and corresponding norm  · m :=  · m,2 . These are Hilbert spaces


with the scalar product

$$
(u,v)_m:=\sum_{|\alpha|\le m}\bigl(D^\alpha u,\,D^\alpha v\bigr)_{L^2(\Omega)}\,.
$$

(A Banach space (X, ‖·‖) is called Hilbert if ‖·‖ is given by a scalar
product (·, ·), i.e., ‖x‖ = √(x, x), x ∈ X; see Chap. 6 for more information on Hilbert spaces.)
In particular, the scalar product of H 1 (Ω) (and of H01 (Ω) as well) is
$$
(u,v)_1=(u,v)_{L^2(\Omega)}+\sum_{j=1}^{k}\Bigl(\frac{\partial u}{\partial x_j},\frac{\partial v}{\partial x_j}\Bigr)_{L^2(\Omega)}
=\int_\Omega uv\,dx+\sum_{j=1}^{k}\int_\Omega \frac{\partial u}{\partial x_j}\,\frac{\partial v}{\partial x_j}\,dx\,,
$$

where the derivatives are in the sense of distributions, so that


$$
\|u\|_1^2=\|u\|_{L^2(\Omega)}^2+\sum_{j=1}^{k}\int_\Omega\Bigl(\frac{\partial u}{\partial x_j}\Bigr)^2 dx\,.
$$

It is well known that W m,p (Ω) is separable if 1 ≤ p < ∞. See, e.g.,


[1, p. 47], where further results on Sobolev spaces can be found. See
also [2, 6, 14].
Let us recall (without proof) the following approximation result (cf.,
e.g., [14, p. 252]).

Theorem 5.21. Let ∅ ≠ Ω ⊂ Rk be an open bounded set of class


C 1 , and let 1 ≤ p < ∞. Then for every u ∈ W m,p (Ω) there exists a
sequence (un ) in C ∞ (Ω) such that un → u in W m,p (Ω).

For the definition of a C 1 open set see [6, p. 272]. Generally, in appli-
cations ∂Ω is smooth enough and consequently Ω is of class C 1 .
Notice also that W0m,p (Rk ) = W m,p (Rk ), i.e., C0∞ (Rk ) is dense in
W m,p (Rk ) (see, e.g., [1, p. 56]). But, in general, W0m,p (Ω) is a proper
subspace of W m,p (Ω).
Let us also state (without proof) a unified version of some results due
to Sobolev, Rellich & Kondrashov10 (see, e.g., [2, pp. 3–4]).

Theorem 5.22. If ∅ ≠ Ω ⊂ Rk is an open set of class C 1 and 1 ≤
p < ∞, then there are the continuous embeddings:

(a) if m < k/p, then W m,p (Ω) → Lq (Ω) ∀q ∈ [p, p∗ ], where
p∗ = kp/(k − mp);

(b) if m = k/p, then W m,p (Ω) → Lq (Ω) ∀q ∈ [p, ∞);

(c) if m > k/p, then W m,p (Ω) → C 0,α (Ω) (which is the space of
Hölder continuous functions defined on Ω with exponent α ∈
(0, 1), and with α = 1 if m − k/p > 1).

If, in addition, Ω is bounded, then all the above embeddings are com-
pact except for the case q = p∗ in (a), and furthermore, if we replace
W m,p (Ω) by W0m,p (Ω), then all these embeddings (including the com-
pact ones) hold without any regularity condition on ∂Ω.

10
Vladimir I. Kondrashov, Russian mathematician, 1909–1971; Franz Rellich,
Austrian-German mathematician, 1906–1955.

The above embeddings are the natural linear injective maps between
the corresponding spaces. In particular, the embedding (c) above as-
sociates with every u ∈ W m,p (Ω) (which is a class of functions with
respect to the a.e. equality) its continuous representative. Continuity
and compactness of the above embeddings are understood in the usual
sense.
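Case (c) is easy to experience numerically. For k = 1, m = 1, p = 2 one has m − k/p = 1/2, so H¹(0, 1) embeds into the 1/2-Hölder functions; the concrete estimate behind this (a one-line Cauchy–Schwarz argument) is |u(x) − u(y)| ≤ ‖u′‖L²(0,1) |x − y|^{1/2}. The Python sketch below spot-checks it for one arbitrary smooth u (an illustrative assumption, not a proof).

```python
import numpy as np

# Case (c) with k = 1, m = 1, p = 2: m - k/p = 1/2, so every u in
# H^1(0,1) is 1/2-Hölder, with the Cauchy-Schwarz bound
#   |u(x) - u(y)| <= ||u'||_{L^2(0,1)} |x - y|^{1/2}.
# u below is an arbitrary smooth illustrative choice.
u = lambda x: np.sin(3.0 * x)
du = lambda x: 3.0 * np.cos(3.0 * x)

xs = np.linspace(0.0, 1.0, 10001)
vals = du(xs)**2
norm_du = float(np.sqrt(np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2))

rng = np.random.default_rng(0)
ratios = []
for _ in range(1000):
    a, b = rng.uniform(0.0, 1.0, 2)
    if a != b:
        ratios.append(abs(u(a) - u(b)) / abs(a - b)**0.5)

print(max(ratios), norm_du)   # max Hölder quotient stays below the norm
```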

We continue with a few words on the trace of functions from W m,p (Ω)
on the boundary ∂Ω of Ω. The concept of trace is important for appli-
cations to boundary value problems for partial differential equations.
We restrict our attention to W 1,p (Ω), 1 ≤ p < ∞, since this case is
sufficient for the applications that will be discussed later.
Clearly, for a function u ∈ C(Ω) its restriction to ∂Ω, denoted u|∂Ω , is
well defined. But if u ∈ W 1,p (Ω) then u is only defined a.e. on Ω so it
does not make sense to speak about the restriction of u to ∂Ω because
the k-dimensional Lebesgue measure of ∂Ω is zero; however, there is
a trace of u on ∂Ω which plays the role of the restriction u|∂Ω . More
precisely, we have the following theorem (cf. [14, pp. 258–259]):

Theorem 5.23. Let ∅ = Ω ⊂ Rk be an open bounded set of class C 1 ,


and let 1 ≤ p < ∞. There exists a continuous linear operator γ :
W 1,p (Ω) → Lp (∂Ω) such that γ(u) = u|∂Ω for all u ∈ W 1,p (Ω) ∩ C(Ω).
Moreover, u ∈ W01,p (Ω) if and only if u ∈ W 1,p (Ω) and γ(u) = 0.

In fact, the operator γ from the above statement is the extension by


continuity of the classical restriction to ∂Ω from W 1,p (Ω) ∩ C(Ω) to
Lp (∂Ω). This extension is unique since W 1,p (Ω) ∩ C(Ω) is dense in
(W 1,p (Ω),  · 1,p ) (see Theorem 5.21). If u ∈ W01,p (Ω), hence γ(u) = 0,
we say that u = 0 on ∂Ω in a generalized sense. For details on traces
and Lp (∂Ω), 1 ≤ p < ∞, see [14].

The case k = 1
If Ω = (a, b) ⊂ R, −∞ ≤ a < b ≤ +∞, we denote

Lp (a, b) := Lp ((a, b)), W m,p (a, b) := W m,p ((a, b)),
W0m,p (a, b) := W0m,p ((a, b)), H m (a, b) := H m ((a, b)),
H0m (a, b) := H0m ((a, b)).

The case Ω = (a, b) will be discussed later in Sect. 5.6 on vector


distributions. In particular, we shall see that for 1 ≤ p < ∞ and

−∞ < a < b < +∞ every u ∈ W 1,p (a, b) has a representative which


is an absolutely continuous function on [a, b], so identifying u with
this representative, u(a) and u(b) make sense classically. According
to Theorem 5.23, u is in W01,p (a, b) if and only if u ∈ W 1,p (a, b) and
u(a) = 0 = u(b). This shows in particular that W01,p (a, b) is a proper
subspace of W 1,p (a, b).

Green’s Identity
Let ∅ ≠ Ω ⊂ Rk be an open and bounded set of class C 1 . Recall the
classical divergence (Gauss–Ostrogradski11) formula

$$
\int_\Omega \nabla\cdot F\,dx=\int_{\partial\Omega} F\cdot n\,ds
\quad\forall F=(f_1,\dots,f_k),\ f_i\in C^1(\overline{\Omega}),\ i=1,\dots,k, \qquad(5.4.54)
$$

where n is the outward pointing unit normal. Choosing in (5.4.54)


F = g∇f , with f ∈ C 2 (Ω) and g ∈ C 1 (Ω), one obtains the classical
Green identity

$$
\int_\Omega g\,\Delta f\,dx+\int_\Omega \nabla f\cdot\nabla g\,dx=\int_{\partial\Omega} g\,\frac{\partial f}{\partial n}\,ds\,. \qquad(5.4.55)
$$

Taking into account Theorems 5.21 and 5.23, the identity (5.4.55) can
be easily extended by density to

$$
\int_\Omega g\,\Delta f\,dx+\int_\Omega \nabla f\cdot\nabla g\,dx
=\int_{\partial\Omega} g\,\frac{\partial f}{\partial n}\,ds\quad\forall f\in W^{2,p}(\Omega),\ g\in W^{1,q}(\Omega)\,, \qquad(5.4.56)
$$

where 1 < p < ∞ and q is the conjugate of p, i.e., q = p/(p − 1). Here,
the functions under the integral over ∂Ω actually represent their
traces on ∂Ω.
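In one dimension, Ω = (0, 1), identity (5.4.55) reduces to ∫ g f″ dx + ∫ f′ g′ dx = g(1)f′(1) − g(0)f′(0), the right-hand side being the boundary term g ∂f/∂n evaluated at the two endpoints. The Python sketch below verifies this for arbitrary smooth illustrative choices of f and g.

```python
import numpy as np

# One-dimensional instance of Green's identity (5.4.55) on (0, 1):
#   ∫ g f'' dx + ∫ f' g' dx = g(1) f'(1) - g(0) f'(0).
# f and g are arbitrary smooth illustrative choices.
f, df, d2f = (lambda x: x**3), (lambda x: 3 * x**2), (lambda x: 6 * x)
g, dg = (lambda x: np.cos(x)), (lambda x: -np.sin(x))

x = np.linspace(0.0, 1.0, 100001)

def trapz(vals, xs):
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2)

lhs = trapz(g(x) * d2f(x), x) + trapz(df(x) * dg(x), x)
rhs = g(1.0) * df(1.0) - g(0.0) * df(0.0)
print(lhs, rhs)   # equal up to quadrature error
```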

Poincaré’s Inequality12
Now we present an important inequality which holds in W01,p (Ω) for
1 ≤ p < ∞ and Ω open and bounded.
11
Mikhail V. Ostrogradski, Russian-Ukrainian mathematician, mechanician, and
physicist, 1801–1862.
12
Henri Poincaré, French mathematician, theoretical physicist, engineer, and
philosopher of science, 1854–1912.

Theorem 5.24 (Poincaré). Let ∅ ≠ Ω ⊂ Rk be an open bounded set


and let 1 ≤ p < ∞. Then

$$
\|u\|_{L^p(\Omega)}\le C\,\|\nabla u\|_{L^p(\Omega)}\quad\forall u\in W_0^{1,p}(\Omega)\,, \qquad(5.4.57)
$$

where C is a positive constant depending on Ω and

$$
\|\nabla u\|_{L^p(\Omega)}:=\Bigl(\sum_{i=1}^{k}\|\partial u/\partial x_i\|_{L^p(\Omega)}^p\Bigr)^{1/p}.
$$

Proof. Taking into account the definition of W01,p (Ω), it is enough to


prove (5.4.57) for all u ∈ C0∞ (Ω).
Consider first the case k = 1, i.e., Ω = (a, b), −∞ < a < b < ∞. If
u ∈ C0∞ (a, b) := C0∞ ((a, b)), then

$$
u(x)=\int_a^x u'(t)\,dt\ \Longrightarrow\ |u(x)|\le\int_a^b |u'(t)|\,dt\quad\forall x\in[a,b]\,.
$$

If p = 1 we obtain (5.4.57) with C = b − a by integrating the last


inequality over [a, b].
If 1 < p < ∞ then we can derive from the same inequality by using
Hölder's inequality

$$
|u(x)|\le (b-a)^{1/p'}\,\|u'\|_{L^p(a,b)}\quad\forall x\in[a,b]\,,
$$

where p′ = p/(p − 1). It follows that

$$
\int_a^b |u(x)|^p\,dx\le (b-a)^p\,\|u'\|_{L^p(a,b)}^p\,,
$$

so (5.4.57) holds again with C = b − a.

Now, consider the case k = 2. Let D = [a, b] × [c, d] be a rectangle in
the xy-plane such that Ω ⊂ D. Take u ∈ C0∞ (Ω) and extend it as zero
in D \ Ω. We have

$$
u(x,y)=\int_a^x \frac{\partial}{\partial s}\,u(s,y)\,ds\ \Longrightarrow\ |u(x,y)|\le\int_a^b \Bigl|\frac{\partial}{\partial s}\,u(s,y)\Bigr|\,ds\quad\forall (x,y)\in D\,.
$$

If p = 1 we obtain by integrating the last inequality over D


$$
\|u\|_{L^1(D)}\le (b-a)\,\Bigl\|\frac{\partial u}{\partial x}\Bigr\|_{L^1(D)}\ \Longrightarrow\ \|u\|_{L^1(\Omega)}\le (b-a)\,\Bigl\|\frac{\partial u}{\partial x}\Bigr\|_{L^1(\Omega)}\,.
$$

If 1 < p < ∞ we derive by using Hölder's inequality

$$
\|u\|_{L^p(\Omega)}\le (b-a)\,\Bigl\|\frac{\partial u}{\partial x}\Bigr\|_{L^p(\Omega)}\,, \qquad(5.4.58)
$$

so, in fact, (5.4.58) is valid for p ∈ [1, ∞). Similarly,

$$
\|u\|_{L^p(\Omega)}\le (d-c)\,\Bigl\|\frac{\partial u}{\partial y}\Bigr\|_{L^p(\Omega)}\,. \qquad(5.4.59)
$$
By (5.4.58) and (5.4.59) it follows that (5.4.57) holds with C = 2 max{b − a, d − c}.
The proof is similar for k ≥ 3.

Remark 5.25. An inspection of the above proof shows that the Poincaré
inequality still holds if the Lebesgue measure of Ω is finite, and also if
the projection of Ω on some coordinate plane is bounded.
Remark 5.26. If Ω is bounded or satisfies one of the conditions in the
previous remark then, according to the Poincaré inequality, W01,p (Ω)
can be equipped with a new norm

$$
\|u\|_{1,p}^{*}=\|\nabla u\|_{L^p(\Omega)}\,,
$$

which is equivalent to the usual norm ‖·‖1,p.
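The Poincaré inequality is simple to test numerically. On Ω = (0, 1) with p = 2, the constant produced by the proof is C = b − a = 1, so ‖u‖L² ≤ ‖u′‖L² for every u vanishing at the boundary. The sample u in the Python sketch below is one illustrative choice.

```python
import numpy as np

# Poincaré inequality (5.4.57) on (0,1) with p = 2 and C = b - a = 1:
#   ||u||_{L^2} <= ||u'||_{L^2}   for u vanishing at the boundary.
# u is an illustrative sample.
x = np.linspace(0.0, 1.0, 100001)
u = x * np.sin(np.pi * x)          # u(0) = u(1) = 0
du = np.gradient(u, x)

def trapz(vals, xs):
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2)

norm_u = np.sqrt(trapz(u**2, x))
norm_du = np.sqrt(trapz(du**2, x))
print(norm_u, norm_du)             # norm_u <= 1 * norm_du
```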

5.5 Bochner’s Integral


Let ∅ ≠ Ω ⊂ Rk be a Lebesgue measurable set, and let (X, ‖·‖) be a
real Banach space.
As in the case of R-valued functions, a function g : Ω → X is a simple
function if it is of the form

$$
g(s)=\sum_{i=1}^{p}\chi_{M_i}(s)\,y_i
$$

for some yi ∈ X, Mi ⊂ Ω measurable with finite measure (i.e., m(Mi ) <


∞), and Mi ∩ Mj = ∅ if i = j. Here, we prefer to use s to denote a
generic point in Ω (instead of x which could be used to designate points
of X).
A function f : Ω → X is called strongly measurable (or simply measurable) if there exists a sequence of simple functions gn : Ω → X such
that

$$
\lim_{n\to\infty}\|g_n(s)-f(s)\|=0\quad\text{for a.a. } s\in\Omega\,.
$$

If g is a simple function as above, then it is clearly measurable. Define


its integral over Ω to be

$$
\int_\Omega g(s)\,ds:=\sum_{i=1}^{p} m(M_i)\,y_i\,.
$$

If g is a simple function, then ‖g‖ (i.e., the function s → ‖g(s)‖) is
a simple function as well (hence Lebesgue integrable over Ω) and the
following inequality holds:

$$
\Bigl\|\int_\Omega g(s)\,ds\Bigr\|\le\int_\Omega \|g(s)\|\,ds\,.
$$

Denote by S the set of all simple functions g : Ω → X. Clearly S is


a real linear space with respect to the usual operations (addition of
functions and scalar multiplication), and

(α1 g1 + α2 g2 ) ds = α1 g1 ds
Ω

Ω

+α2 g2 ds ∀α1 , α2 ∈ R, ∀g1 , g2 ∈ S .


Ω

Definition 5.27. f : Ω → X is said to be Bochner integrable (over


Ω)13 if there exists a sequence of simple functions gn : Ω → X con-
verging strongly to f a.e. in Ω (so f is measurable) and

$$
\lim_{n,m\to\infty}\int_\Omega \|g_n(s)-g_m(s)\|\,ds=0\,, \qquad(5.5.60)
$$

and the Bochner integral of f is defined as

$$
\int_\Omega f(s)\,ds:=\lim_{n\to\infty}\int_\Omega g_n(s)\,ds\,. \qquad(5.5.61)
$$

Let us justify the above definition. We have


$$
\Bigl\|\int_\Omega g_n\,ds-\int_\Omega g_m\,ds\Bigr\|=\Bigl\|\int_\Omega (g_n-g_m)\,ds\Bigr\|\le\int_\Omega \|g_n-g_m\|\,ds\,.
$$

13
Salomon Bochner, American mathematician, 1899–1982.

So (5.5.60) implies

$$
\lim_{n,m\to\infty}\Bigl\|\int_\Omega g_n\,ds-\int_\Omega g_m\,ds\Bigr\|=0\,,
$$

i.e., the limit in (5.5.61) exists. To prove the limit does not depend on
the choice of (gn ), consider another sequence (g̃n ) satisfying the same
properties. Then, by (5.5.60), we have for all ε > 0

$$
\int_\Omega \|g_n-\tilde g_n-g_m+\tilde g_m\|\,ds\le\int_\Omega \|g_n-g_m\|\,ds+\int_\Omega \|\tilde g_n-\tilde g_m\|\,ds\le\varepsilon\quad\forall n,m>N_\varepsilon\,.
$$

Letting m → ∞ it follows from Fatou’s Lemma that



$$
\int_\Omega \|g_n-\tilde g_n\|\,ds\le\varepsilon\quad\forall n>N_\varepsilon\,. \qquad(5.5.62)
$$

Now, since gn , g̃n are simple functions, we have


$$
\Bigl\|\int_\Omega g_n\,ds-\int_\Omega \tilde g_n\,ds\Bigr\|=\Bigl\|\int_\Omega (g_n-\tilde g_n)\,ds\Bigr\|\le\int_\Omega \|g_n-\tilde g_n\|\,ds\,. \qquad(5.5.63)
$$
From (5.5.62) and (5.5.63) we deduce

$$
\lim_{n\to\infty}\int_\Omega \tilde g_n\,ds=\lim_{n\to\infty}\int_\Omega g_n\,ds=\int_\Omega f\,ds\,,
$$

so the definition is correct.
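For a finite-dimensional X the defining limit over simple functions can be computed directly. The following Python sketch (the integrand is an illustrative choice) approximates the Bochner integral of the R²-valued f(s) = (cos s, sin s) over (0, π) by simple functions that are constant on n subintervals; the limit is just the pair of Lebesgue integrals (0, 2).

```python
import numpy as np

# Bochner integral of the R^2-valued f(s) = (cos s, sin s) over (0, π),
# approximated by simple functions g_n constant on n subintervals M_i:
#   ∫ g_n ds = Σ m(M_i) y_i  →  (∫ cos, ∫ sin) = (0, 2).
def simple_integral(n):
    edges = np.linspace(0.0, np.pi, n + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    lengths = np.diff(edges)                                # m(M_i)
    values = np.column_stack([np.cos(mids), np.sin(mids)])  # y_i in R^2
    return lengths @ values                                 # Σ m(M_i) y_i

approx = simple_integral(10000)
print(approx)   # close to [0, 2]
```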


Remark 5.28. Note that if X = RN , N ∈ N, then f = (f1 , . . . , fN ) is
measurable in the sense above if and only if fi is Lebesgue measurable
for all i = 1, . . . , N , and integrability of f in the sense of Bochner
means integrability of all fi ’s in the sense of Lebesgue. If (X,  · ) is
an infinite dimensional Banach space, then, in addition to the concept
of strong measurability of a function from Ω to X as defined before,
there is also a concept of weak measurability, namely f : Ω → X is
said to be weakly measurable if s → x∗ (f (s)) is Lebesgue measurable
for every continuous linear functional x∗ : (X,  · ) → R. If X is a
separable Banach space, then the weak measurability of f is equiv-
alent to its strong measurability. In fact, this equivalence holds if f
is almost separably valued, that is {f (s); s ∈ Ω \ M } is a separable
set, where M ⊂ Ω has zero Lebesgue measure. This result belongs to

Pettis,14 see, e.g., [51, p. 131]. It is worth mentioning that, in all the
applications discussed in this book, X will always stand for separable
Banach spaces.
The next result says that Bochner integrability of any X-valued func-
tion f reduces to Lebesgue integrability of f .
Theorem 5.29 (Bochner). Let (X,  · ) be a real Banach space and
let Ω ⊂ Rk be a measurable set. If f : Ω → X is strongly measurable,
then f is Bochner integrable if and only if ‖f‖ is Lebesgue integrable,
where ‖f‖(s) := ‖f (s)‖ for almost all s ∈ Ω.
Proof. Since f is strongly measurable, ‖f‖ is also (Lebesgue) measurable, because a sequence of simple functions converging to f gives a sequence of simple
functions upon taking the norm.
To prove necessity, assume that f is Bochner integrable. If (gn ) is
a sequence of simple functions as in Definition 5.27, we can write
(see (5.5.60))

$$
\int_\Omega \|g_n-g_m\|\,ds\le\varepsilon\quad\forall n,m>N_\varepsilon\,.
$$

Applying Fatou’s Lemma, we get



$$
\int_\Omega \|g_n-f\|\,ds\le\varepsilon\quad\forall n>N_\varepsilon\,,
$$

i.e., ‖gn − f‖ is Lebesgue integrable for all n > Nε . So integrating the
obvious inequality

$$
\|f\|\le\|f-g_n\|+\|g_n\|
$$
we obtain

$$
\int_\Omega \|f\|\,ds\le\int_\Omega \|f-g_n\|\,ds+\int_\Omega \|g_n\|\,ds<\infty\quad\forall n>N_\varepsilon\,,
$$

hence ‖f‖ is Lebesgue integrable.

In order to prove sufficiency, assume that ‖f‖ is Lebesgue integrable
and consider a sequence of simple functions hn : Ω → X such that

$$
\lim_{n\to\infty}\|h_n(s)-f(s)\|=0\quad\text{for almost all } s\in\Omega\,.
$$

Define

$$
g_n(s)=\begin{cases} h_n(s) & \text{if } \|h_n(s)\|\le(1+\delta)\|f(s)\|,\\ 0 & \text{otherwise,}\end{cases}
$$
14
Billy James Pettis, American mathematician, 1913–1979.

where δ is a positive constant. This is a sequence of simple functions


and

$$
\lim_{n\to\infty}\|g_n(s)-f(s)\|=0\quad\text{for a.a. } s\in\Omega\,. \qquad(5.5.64)
$$

We must show

$$
\lim_{n,m\to\infty}\int_\Omega \|g_n-g_m\|\,ds=0\,. \qquad(5.5.65)
$$

To do this, we shall apply the Lebesgue Dominated Convergence Theorem to the sequence (‖gn − f‖). The first condition of this theorem
is satisfied (see (5.5.64)), and

$$
\|g_n(s)-f(s)\|\le\|g_n(s)\|+\|f(s)\|\le(1+\delta)\|f(s)\|+\|f(s)\|=(2+\delta)\|f(s)\|\,,
$$

so the second condition of the Lebesgue Dominated Convergence Theorem is also satisfied, hence

$$
\lim_{n\to\infty}\int_\Omega \|g_n-f\|\,ds=0\,.
$$

This along with the obvious inequality

$$
\int_\Omega \|g_n-g_m\|\,ds\le\int_\Omega \|g_n-f\|\,ds+\int_\Omega \|g_m-f\|\,ds
$$

implies (5.5.65).

Remark 5.30. It is worth pointing out that for every f : Ω → X which
is Bochner integrable, we have

$$
\Bigl\|\int_\Omega f\,ds\Bigr\|\le\int_\Omega \|f\|\,ds\,,
$$

because this inequality holds for simple functions. In general, the usual
properties of the Lebesgue integral are also satisfied by the Bochner
integral.
Remark 5.31. Let (X, ‖·‖) and (Y, ‖·‖∗ ) be real Banach spaces. If
f : Ω → X is Bochner integrable over Ω and A is a continuous linear
operator from (X, ‖·‖) to (Y, ‖·‖∗ ), then A◦f is also Bochner integrable
and

$$
\int_\Omega A{\circ}f\,ds=A\int_\Omega f\,ds\,.
$$

Indeed, if (gn ) is a sequence of simple functions converging to f , then


(A◦gn ) is also a sequence of simple functions which converges to A◦f .
Moreover,

$$
\int_\Omega \|A{\circ}g_n-A{\circ}g_m\|\,ds\le\|A\|\int_\Omega \|g_n-g_m\|\,ds\to 0\ \text{ as } n,m\to\infty\,.
$$

It follows that

$$
\int_\Omega A{\circ}f\,ds=\lim_{n\to\infty}\int_\Omega A{\circ}g_n\,ds=\lim_{n\to\infty}A\int_\Omega g_n\,ds=A\int_\Omega f\,ds\,,
$$

as claimed.
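This commutation property is easy to observe numerically when X = Y = R² and A is a matrix (a continuous linear operator); the f below is an illustrative choice.

```python
import numpy as np

# Remark 5.31 with X = Y = R^2 and A a 2x2 matrix:
#   ∫ (A∘f) ds = A ∫ f ds.   f(s) = (e^s, s^2) is illustrative.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
s = np.linspace(0.0, 1.0, 100001)
f = np.column_stack([np.exp(s), s**2])

def trapz_vec(vals, xs):                  # componentwise quadrature
    return (vals[1:] + vals[:-1]).T @ np.diff(xs) / 2

int_f = trapz_vec(f, s)                   # ∫ f ds
int_Af = trapz_vec(f @ A.T, s)            # ∫ A f(s) ds
print(int_Af, A @ int_f)                  # the two vectors coincide
```

The agreement is exact up to rounding because quadrature is itself a linear operation, mirroring the argument above.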

For X a real Banach space, Ω ⊂ Rk measurable, and 1 ≤ p < ∞ define

$$
\mathcal{L}^p(\Omega;X)=\Bigl\{f:\Omega\to X;\ f \text{ is measurable and } \int_\Omega \|f\|^p\,ds<\infty\Bigr\}\,.
$$

We also define

$$
\mathcal{L}^\infty(\Omega;X)=\Bigl\{f:\Omega\to X;\ f \text{ is measurable and } \operatorname*{ess\,sup}_{s\in\Omega}\|f(s)\|<\infty\Bigr\}\,,
$$

where

$$
\operatorname*{ess\,sup}_{s\in\Omega}\|f(s)\|:=\inf\{C;\ \|f(s)\|\le C \text{ a.e. on } \Omega\}\,.
$$

Let ∼ denote equality a.e. and define the quotient space

$$
L^p(\Omega;X):=\mathcal{L}^p(\Omega;X)/{\sim}\,.
$$

This is a real Banach space for 1 ≤ p ≤ ∞ with respect to the norm

$$
\|f\|_{L^p(\Omega;X)}:=\Bigl(\int_\Omega \|f\|^p\,ds\Bigr)^{1/p},\quad 1\le p<\infty,
$$
$$
\|f\|_{L^\infty(\Omega;X)}:=\operatorname*{ess\,sup}_{s\in\Omega}\|f(s)\|\,.
$$

The proof follows by arguments similar to those from the proof of the
classical theorem corresponding to the case X = R (Theorem 3.25),
so we leave it to the reader as an exercise. The key condition is the
completeness of X.
If Ω = (a, b) with −∞ ≤ a < b ≤ ∞ denote Lp (a, b; X) := Lp ((a, b); X).

5.6 Vector Distributions, W m,p (a, b; X) Spaces

Let X be a Banach space and let −∞ ≤ a < b ≤ ∞. Denote as before
D(a, b) = C0∞ (a, b) := C0∞ ((a, b)), equipped with the inductive limit
topology.

Definition 5.32. An X-valued distribution over (a, b) is an operator


u : D(a, b) → X which is linear and continuous (in the sense that
if φn → 0 in D(a, b) then u(φn ) → 0). The set of all such vector
distributions is denoted D′(a, b; X).

As in the scalar case, a regular distribution is one which is generated


by a locally integrable function u ∈ L1loc (a, b; X), i.e., u : (a, b) → X
is strongly measurable and u ∈ L1 (K) for all K ⊂ (a, b) compact.
Define ũ : D(a, b) → X by

$$
\tilde u(\varphi):=\int_a^b \varphi(t)\,u(t)\,dt\quad\forall \varphi\in\mathcal{D}(a,b)\,.
$$

The mapping u → ũ is injective, as its null set is {0}. Indeed, for


φ ∈ D(a, b) and v ∈ L1loc (a, b; X) satisfying
$$
\int_a^b \varphi(t)\,v(t)\,dt=0\,,
$$

we have (cf. Remark 5.31)


$$
\int_a^b \varphi(t)\,x^*(v(t))\,dt=0\quad\forall x^*\in X^*\,,
$$

where X ∗ is the dual of X. Since t → x∗ (v(t)) is a real, locally


summable function, it follows by Theorem 5.9 that

x∗ (v(t)) = 0 ∀x∗ ∈ X ∗ , and a.a. t ∈ (a, b)

so v(t) = 0 for a.a. t ∈ (a, b).


Consequently, one can identify the (regular) distribution ũ with the
locally summable function u, and write
$$
u(\varphi):=\int_a^b \varphi(t)\,u(t)\,dt\quad\forall \varphi\in\mathcal{D}(a,b)\,.
$$

Of course, as in the scalar case, not all vector distributions arise in


this way, e.g., u : D(R) → X defined by u(φ) = φ(0)x for all φ ∈ D(R)
and a fixed x ∈ X \ {0}.

For u ∈ D′(a, b; X) define the derivative

$$
u'(\varphi):=-u(\varphi')\quad\forall \varphi\in\mathcal{D}(a,b)\,,
$$

and inductively,

$$
u^{(j)}(\varphi)=(-1)^j\,u(\varphi^{(j)})\quad\forall \varphi\in\mathcal{D}(a,b),\ j\in\mathbb{N}\,,
$$

and by convention u(0) = u.

In applications, intervals (a, b) are sufficient, though the theory extends


to Ω ⊂ Rk .

For m ∈ N, 1 ≤ p ≤ ∞, we set

$$
W^{m,p}(a,b;X):=\{u\in\mathcal{D}'(a,b;X);\ u^{(j)}\in L^p(a,b;X),\ j=0,1,\dots,m\}\,,
$$

so, in fact, u is a regular distribution because j = 0 is included. Also,


all (distributional) derivatives above are regular as well.

Theorem 5.33. If X is a Banach space then, for all m ∈ N and


1 ≤ p ≤ ∞, W m,p (a, b; X) is a Banach space with respect to the norm


$$
\|u\|_{W^{m,p}(a,b;X)}:=\Bigl(\sum_{j=0}^{m}\|u^{(j)}\|_{L^p(a,b;X)}^p\Bigr)^{1/p},\quad 1\le p<\infty,
$$
$$
\|u\|_{W^{m,\infty}(a,b;X)}:=\max_{0\le j\le m}\|u^{(j)}\|_{L^\infty(a,b;X)},\quad p=\infty.
$$

Proof. Similar to the proof of Theorem 5.20.



The notation $W^{m,p}_{\mathrm{loc}}(a,b;X)$ indicates the set of all u ∈ D′(a, b; X) such
that u ∈ W m,p (t1 , t2 ; X) for every bounded interval (t1 , t2 ) ⊂ (a, b).
For p = 2 denote H m (a, b; X) = W m,2 (a, b; X). If X is a Hilbert
space, then so is H m (a, b; X) with respect to the inner product

$$
(u,v)_{H^m(a,b;X)}=\sum_{j=0}^{m}\int_a^b \bigl(u^{(j)}(t),v^{(j)}(t)\bigr)_X\,dt\,.
$$

Now for −∞ < a < b < +∞ denote by Am,p (a, b; X) the space of all
functions f : [a, b] → X which are absolutely continuous on [a, b], the
pointwise derivatives dj f /dtj exist and are absolutely continuous on
[a,b] for j = 1, 2, . . . , m − 1, and dm f /dtm ∈ Lp (a, b; X).
Remark 5.34. If X is reflexive, it follows by a well-known theorem due
to Kōmura15 (see [25]; see also [45, p. 105]) that

A1,1 (a, b; X) = AC([a, b]; X) ,

where AC([a, b]; X) is the space of all X-valued absolutely continuous


functions on [a, b].

Theorem 5.35. Let m ∈ N, 1 ≤ p ≤ ∞, −∞ < a < b < ∞, and
u ∈ Lp (a, b; X). Then the following are equivalent:

(j) u ∈ W m,p (a, b; X) ;

(jj) there exists u1 ∈ Am,p (a, b; X) such that u1 (t) = u(t) for almost
all t ∈ (a, b) .

Proof. We shall prove the case m = 1, and then the result follows by
induction.
To prove the implication (j) ⇒ (jj) fix u ∈ W 1,p (a, b; X) and extend
it as zero in R \ (a, b). For ε > 0 small define uε as before, i.e.,

$$
u_\varepsilon(t)=\int_{\mathbb{R}}\omega_\varepsilon(t-s)\,u(s)\,ds\,,
$$

where

$$
\omega_\varepsilon(t)=\frac{1}{\varepsilon}\,\omega(t/\varepsilon),\qquad
\omega(t)=\begin{cases} C\,e^{-\frac{1}{1-t^2}}, & |t|<1\,,\\[1mm] 0, & |t|\ge 1\,,\end{cases}
$$
15
Yukio Kōmura, Japanese mathematician, born 1931.

with C > 0 such that $\int_{\mathbb{R}}\omega(t)\,dt=1$. We have

$$
\dot u_\varepsilon(t)=\frac{d}{dt}\,u_\varepsilon(t)=\int_{\mathbb{R}}\omega_\varepsilon'(t-s)\,u(s)\,ds\quad\forall t\in\mathbb{R}\,,
$$

which is a function, but we understand it as a distribution and apply
it to a test function φ ∈ C0∞ (R):

$$
(\dot u_\varepsilon,\varphi)=\int_{\mathbb{R}}\varphi(t)\,\dot u_\varepsilon(t)\,dt
=\int_{\mathbb{R}}\varphi(t)\Bigl(\int_{\mathbb{R}}\omega_\varepsilon'(t-s)\,u(s)\,ds\Bigr)dt
$$

and interchanging the order of integration (with $\varphi_\varepsilon(s):=\int_{\mathbb{R}}\omega_\varepsilon(t-s)\varphi(t)\,dt$, so that $\varphi_\varepsilon'(s)=-\int_{\mathbb{R}}\omega_\varepsilon'(t-s)\varphi(t)\,dt$)

$$
=\int_{\mathbb{R}}\Bigl(\int_{\mathbb{R}}\omega_\varepsilon'(t-s)\,\varphi(t)\,dt\Bigr)u(s)\,ds
=-\int_{\mathbb{R}}\varphi_\varepsilon'(s)\,u(s)\,ds
=-(u,\varphi_\varepsilon')
=u'(\varphi_\varepsilon)
=\int_{\mathbb{R}}\varphi_\varepsilon(t)\,u'(t)\,dt
=\int_{\mathbb{R}}\Bigl(\int_{\mathbb{R}}\omega_\varepsilon(t-s)\,\varphi(s)\,ds\Bigr)u'(t)\,dt
$$

(the last step uses that ω is even) and changing the order of integration again

$$
=\int_{\mathbb{R}}\varphi(s)\Bigl(\int_{\mathbb{R}}\omega_\varepsilon(t-s)\,u'(t)\,dt\Bigr)ds
=\int_{\mathbb{R}}\varphi(s)\,(u')_\varepsilon(s)\,ds\,,
$$

so that

$$
(\dot u_\varepsilon,\varphi)=((u')_\varepsilon,\varphi)\quad\forall\varphi\in C_0^\infty(\mathbb{R})\,. \qquad(5.6.66)
$$

In other words, the pointwise derivative u̇ε is equal to (u′)ε .
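The identity (5.6.66) can also be observed numerically: mollify u, differentiate pointwise, and compare with the mollification of the weak derivative. In the Python sketch below u(t) = |t| (weak derivative sign t), ε = 0.1, and the comparison is restricted to a region unaffected by truncation of the discrete convolution.

```python
import numpy as np

# Check of (5.6.66): d/dt (u_ε) = (u')_ε for u(t) = |t|, u' = sign(t),
# with the Friedrichs kernel ω above and ε = 0.1.
eps = 0.1
dt = 5e-4
t = (np.arange(4001) - 2000) * dt            # symmetric grid on [-1, 1]

def omega(s):
    out = np.zeros_like(s)
    inside = np.abs(s) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - s[inside]**2))
    return out

ker = omega(t / eps) / eps
ker /= ker.sum() * dt                        # normalize: ∫ ω_ε = 1

u, du = np.abs(t), np.sign(t)
u_eps = np.convolve(u, ker, mode="same") * dt      # u_ε
du_eps = np.convolve(du, ker, mode="same") * dt    # (u')_ε
lhs = np.gradient(u_eps, dt)                 # pointwise derivative of u_ε

mid = slice(500, 3501)   # stay away from truncation at the array ends
err = float(np.max(np.abs(lhs[mid] - du_eps[mid])))
print(err)               # essentially zero: the two sides coincide
```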



Now, integrate to obtain


$$
u_\varepsilon(t)-u_\varepsilon(s)=\int_s^t (u')_\varepsilon(\tau)\,d\tau\,. \qquad(5.6.67)
$$

Note that uε → u and (u′)ε → u′ in Lp (a, b; X) as ε → 0+ (the proof
is the same as in the scalar case).
Hence, there exists a function u1 such that
$$
u_1(t)-u_1(s)=\int_s^t u'(\tau)\,d\tau\quad\text{for a.a. } s,t\in(a,b)\,.
$$

Therefore, u1 ∈ AC([a, b]; X) and u̇1 = u′ for almost all t ∈ (a, b), i.e.,
the pointwise derivative u̇1 is a representative of the distributional
derivative u′ ∈ Lp (a, b; X). So u̇1 ∈ Lp (a, b; X), which together with
absolute continuity implies that u1 ∈ A1,p (a, b; X).
For the implication (jj) =⇒ (j), assume there exists
u1 ∈ A1,p (a, b; X) which is an element of the class u. We must show that
u ∈ W 1,p (a, b; X). Since u1 ∈ AC([a, b]; X) and u ∈ Lp (a, b; X), it remains to
show that u′ ∈ Lp (a, b; X). We start with u̇1 and interpret it as a
distribution. For all φ ∈ D(a, b), we have
$$
(\dot u_1,\varphi)=\int_a^b \varphi\,\dot u_1\,dt
$$

and, integrating by parts,

$$
=-\int_a^b \dot\varphi\,u_1\,dt
$$

and, since changing u1 to another element of its class does not affect the
integral,

$$
=-\int_a^b \dot\varphi\,u\,dt=-u(\dot\varphi)=u'(\varphi)\,.
$$

Therefore, u̇1 = u′ as distributions, but since u̇1 is a function, so is u′,
and u̇1 ∈ Lp (a, b; X), so u′ ∈ Lp (a, b; X).

Note that usually good representatives are preferred since their values
at particular points make sense.
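Theorem 5.35 can be illustrated numerically for X = R²: the absolutely continuous representative satisfies u₁(t) − u₁(s) = ∫ₛᵗ u′(τ) dτ. The function, one component of which is nonsmooth, and the sample points s, t below are illustrative choices.

```python
import numpy as np

# Theorem 5.35 for X = R^2: the absolutely continuous representative of
# u(t) = (|t - 1/2|, t^2) satisfies u(t) - u(s) = ∫_s^t u'(τ) dτ with
# weak derivative u'(τ) = (sign(τ - 1/2), 2τ).
s0, t0 = 0.2, 0.9
tau = np.linspace(s0, t0, 100001)
du = np.column_stack([np.sign(tau - 0.5), 2.0 * tau])

integral = (du[1:] + du[:-1]).T @ np.diff(tau) / 2    # ∫_s^t u'(τ) dτ
diff = np.array([abs(t0 - 0.5) - abs(s0 - 0.5), t0**2 - s0**2])
print(integral, diff)    # the two vectors agree
```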

5.7 Exercises
1. Let Ω = R × (−1, +1) ⊂ R2 and let u : Ω → R be defined by

u(x) = |x1 |x21 (1 + x1 x2 + |x2 |x22 ).

Show that u ∈ C 2 (Ω) and find supp u.

2. Find a collection F of seminorms on C[0, 1] := C([0, 1]; R) such


that the topology generated by F coincides with the pointwise
convergence topology.

3. Let Ω ⊂ Rk be a nonempty open set. For any compact set K ⊂ Ω


and m ∈ N ∪ {0} define the seminorm pK,m : C ∞ (Ω) → R

$$
p_{K,m}(f)=\sup_{x\in K,\ |\alpha|\le m}|D^\alpha f(x)|\,,\quad f\in C^\infty(\Omega),
$$

where α = (α1 , . . . , αk ) are multi-indices, |α| = α1 + · · · + αk , and

$$
D^\alpha f(x)=\frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\cdots\,\partial x_k^{\alpha_k}}\,f(x_1,\dots,x_k).
$$

Consider a sequence of compact sets K1 ⊂ K2 ⊂ · · · ⊂ Kn ⊂ · · · ⊂ Ω
such that $\Omega=\cup_{n=1}^{\infty}K_n$. Define for each j ∈ N


$$
d_j(f,g)=\sum_{m=0}^{j}\frac{1}{2^m}\cdot\frac{p_{K_j,m}(f-g)}{1+p_{K_j,m}(f-g)}\,,\quad f,g\in C^j(\Omega),
$$

and

$$
d(f,g)=\sum_{j=1}^{\infty}\frac{1}{2^j}\cdot\frac{d_j(f,g)}{1+d_j(f,g)}\,,\quad f,g\in C^\infty(\Omega).
$$

Show that d is a metric on C ∞ (Ω).

4. Find a function φ ∈ C ∞ (R) with supp φ = [0, 4], φ ≥ 0 and


maxR φ = 1.

5. Let φ ∈ C0∞ (Rk ). Prove that there exists ψ ∈ C0∞ (Rk ) such that
$\varphi=\partial^k\psi/(\partial x_1\cdots\partial x_k)$ if and only if $\int_{\mathbb{R}^k}\varphi(x)\,dx=0$.

6. Let (an )n∈N be a sequence of real numbers. Prove that there


exists a function φ ∈ C0∞ (R) such that φ(n) = an ∀n ∈ N if and
only if there exists an n0 ∈ N such that an = 0 ∀n > n0 .

7. Let m ∈ N and ψ ∈ C0∞ (Rk ). Define the sequence (φn ) by

φn (x) = 2−n nm ψ(nx), x ∈ Rk , n ∈ N.

Show that φn → 0 in D(Rk ) as n → ∞.

8. Let h be a nonzero vector in Rk and let ψ ∈ C0∞ (Rk ). Consider


the sequence (φn )n∈N , where
$$
\varphi_n(x)=n\Bigl(\psi\bigl(x+\tfrac{1}{n}\,h\bigr)-\psi(x)\Bigr),\quad x\in\mathbb{R}^k,\ n\in\mathbb{N}.
$$
Prove that

$$
\varphi_n\ \to\ \sum_{j=1}^{k}h_j\,\frac{\partial\psi}{\partial x_j}\quad\text{in }\mathcal{D}(\mathbb{R}^k).
$$

Deduce from this result the convergence in D(Rk ) to 0 of the


sequence (γn )n∈N defined by
$$
\gamma_n(x)=n\Bigl(\psi\bigl(x+\tfrac{1}{n}\,h\bigr)-\psi\bigl(x-\tfrac{1}{n}\,h\bigr)\Bigr),\quad x\in\mathbb{R}^k,\ n\in\mathbb{N}.
$$

9. Let Ω ⊂ Rk be a nonempty open set. For φ ∈ C0∞ (Ω) consider



$$
\varphi_n(x)=\int_\Omega \omega_{1/n}(x-y)\,\varphi(y)\,dy,\quad x\in\Omega,\ n\in\mathbb{N}\ \text{sufficiently large,}
$$

where ω1/n denotes the usual Friedrichs mollifier. Prove that φn


converges to φ in D(Ω).

10. Let Ω ⊂ Rk be a nonempty open set. For a given point a ∈ Ω


and for a multi-index α ∈ Nk0 , define u : D(Ω) → R by

u(φ) = Dα φ(a) ∀φ ∈ D(Ω).

Here N0 = N ∪ {0}. Prove that u ∈ D′(Ω) and u is not a regular


distribution.

11. Let Ω ⊂ Rk be a nonempty open set. Show that, if φ ∈ D(Ω)


satisfies
u(φ) = 0 ∀u ∈ D′(Ω),
then φ = 0.

12. Let u : D(R) → R,



 
$$
u(\varphi)=\sum_{i=1}^{\infty}\bigl(\varphi(1/i^2)-\varphi(0)\bigr),\quad \varphi\in\mathcal{D}(\mathbb{R}).
$$

Prove that u is well defined, u ∈ D′(R), and u is not a regular


distribution.

13. Show that mixed derivatives of distributions do not depend on


the order of differentiation.

14. Let ∅ ≠ Ω ⊂ Rk be an open set, u ∈ D′(Ω), a ∈ C ∞ (Ω). Show


that
$$
\frac{\partial(au)}{\partial x_i}=\frac{\partial a}{\partial x_i}\,u+a\,\frac{\partial u}{\partial x_i}\,.
$$
Extend this formula to Dα (au) for a general multi-index α.

15. Find the n-th derivatives (n = 1, 2, 3) in the sense of distribu-


tions of f, g : R → R,
1
f (x) = x|x|, x ∈ R ,
2
g(x) = H(x) · cos x, x ∈ R ,

where H denotes the usual Heaviside function.

16. Find a sequence (Hn )n∈N in C0∞ (R) such that Hn → H in D′(R),
where H is the Heaviside function.

17. Let u : D(R2 ) → R,



$$
u(\varphi)=\int_{-\infty}^{+\infty}\varphi(x_1,0)\,dx_1\quad\forall\varphi\in\mathcal{D}(\mathbb{R}^2).
$$

(i) Prove that u ∈ D′(R2 );


(ii) Show that u is not a regular distribution;
(iii) Check that ∂u/∂x1 = 0.

18. Let Ω ⊂ Rk be a nonempty open set and let S ⊂ Ω be a countably
infinite set of isolated points, S = {x1 , x2 , . . . , xn , . . . }. Show
that for any sequence of real numbers (an )n∈N the series
$\sum_{n=1}^{\infty} a_n\delta_{x_n}$ converges in D′(Ω).

19. Let (xn )n∈N be a sequence in Rk . Prove the following implication:
$\delta_{x_n}\to 0$ in D′(Rk ) =⇒ ‖xn ‖ → ∞.

20. Solve the following equations in D′(R):

(a) u′ + tu = χ[0,1] (t) (where χ[0,1] is the characteristic function
of [0, 1]);
(b) u′ + u = H + δ′ (where H denotes the Heaviside function
and δ is the Dirac distribution);
(c) u″ − 2u′ + u = 2δ(t − 1) + δ(t − 2);
(d) u″ − 4u = δ′ − δ″ − 8.

21. Solve the Cauchy problem

u″ − u = δ(t − 1) + 2δ(t − 3) − 2t − 1 in D′(R),
u(0) = 1, u′(0) = 0.

22. Prove that the solution set of the equation

(sin t) · u = 0 in D′(R)

is an infinite dimensional linear subspace of D′(R).

23. Find u1 , u2 , u3 ∈ D′(R) satisfying the differential system

u1′ = 4u1 − u2 + H,
u2′ = 3u1 + u2 − u3 + δ,
u3′ = u1 + u3 + H.

24. Let a be a given real number. If u ∈ W01,1 (a, ∞) := W01,1 ((a, ∞); R),
prove that there exists a function v ∈ C[a, ∞) which
is a representative of the class u, and v(a) = 0.

25. Let p ∈ (1, ∞). Show that W 2,p (0, 1) is compactly embedded
into C 1 [0, 1]. The Sobolev space W 2,p (0, 1) is equipped with the
usual norm, and C 1 [0, 1] is equipped with the norm

$$
\|f\|_{C^1}=\max_{0\le t\le 1}|f(t)|+\max_{0\le t\le 1}|f'(t)|\quad\forall f\in C^1[0,1].
$$

26. Let φ ∈ C0∞ (R) \ {0} and let 1 ≤ p ≤ +∞. Define un : R → R


by un (t) = φ(t + n), t ∈ R, n ∈ N. Prove that

(i) (un )n∈N is bounded in W m,p (R) for every m ∈ N;


(ii) there exists no subsequence of (un ) converging strongly in
Lq (R) for any 1 ≤ q ≤ ∞.

27. Let ∅ ≠ Ω ⊂ Rk be an open bounded set. If u, v ∈ H 1 (Ω) =
W 1,2 (Ω), show that uv ∈ W 1,1 (Ω) and

$$
\frac{\partial}{\partial x_i}(uv)=\frac{\partial u}{\partial x_i}\cdot v+u\cdot\frac{\partial v}{\partial x_i}\,,\quad i=1,2,\dots,k,
$$

in D′(Ω) and a.e. in Ω.


Chapter 6

Hilbert Spaces

Let X be a linear space over K equipped with a scalar (inner) product


(·, ·) (i.e., X is an inner product space or a generalized Euclidean space,
as defined in Chap. 1). As usual, throughout this chapter K is either
R or C. Define the norm

‖x‖ = √(x, x), x ∈ X.

If (X, ‖·‖) is a Banach space (i.e., (X, d) is a complete metric space,
where d(x, y) = ‖x − y‖, x, y ∈ X), then X is said to be a Hilbert1
space. In other words, a Hilbert space is a Banach space (X, ‖·‖)
whose norm is given by a scalar product.

6.1 Examples
We have already met some Hilbert spaces, such as the Euclidean space
Rk , Ck , L2 (Ω), H m (Ω), m ∈ N, these spaces being equipped with their
usual scalar products, i.e.,

1
David Hilbert, German mathematician, 1862–1943.

© Springer Nature Switzerland AG 2019 165


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 6


$$
(x,y)=\sum_{i=1}^{k}x_i\,y_i\,,\quad x=(x_1,\dots,x_k),\ y=(y_1,\dots,y_k)\in\mathbb{R}^k,
$$
$$
(x,y)=\sum_{i=1}^{k}x_i\,\overline{y_i}\,,\quad x=(x_1,\dots,x_k),\ y=(y_1,\dots,y_k)\in\mathbb{C}^k,
$$
$$
(u,v)_{L^2(\Omega)}=\int_\Omega uv\,dx\,,\quad u,v\in L^2(\Omega)\,,
$$
$$
(u,v)_m=\sum_{|\alpha|\le m}\bigl(D^\alpha u,\,D^\alpha v\bigr)_{L^2(\Omega)}\,,\quad u,v\in H^m(\Omega)\,,
$$

and the corresponding induced norms

$$
\|x\|^2=\sum_{i=1}^{k}x_i^2\,,\quad x\in\mathbb{R}^k,\qquad
\|x\|^2=\sum_{i=1}^{k}|x_i|^2\,,\quad x\in\mathbb{C}^k,
$$
$$
\|u\|_{L^2(\Omega)}^2=\int_\Omega u^2\,dx\,,\quad u\in L^2(\Omega)\,,\qquad
\|u\|_m^2=\sum_{|\alpha|\le m}\|D^\alpha u\|_{L^2(\Omega)}^2\,,\quad u\in H^m(\Omega)\,,
$$

where Ω is a measurable or open subset of Rk in the third and fourth


cases, respectively.
Obviously, every Cauchy sequence in Rk is convergent since the cor-
responding coordinate sequences are Cauchy in (R, | · |), hence con-
vergent in that space. So the Euclidean space Rk equipped with the
above scalar product and norm is a Hilbert space over R. Similarly, Ck
equipped with the above scalar product and norm is a Hilbert space
over C.
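The conjugation in the Cᵏ scalar product is exactly what makes ‖x‖² = (x, x) real and nonnegative. A quick numerical check (random vectors; note that numpy's `vdot` conjugates its *first* argument) also confirms the Cauchy–Schwarz inequality |(x, y)| ≤ ‖x‖ ‖y‖.

```python
import numpy as np

# C^k scalar product (x, y) = Σ x_i conj(y_i): the induced ||x||^2 is
# real and nonnegative, and Cauchy-Schwarz |(x, y)| <= ||x|| ||y|| holds.
rng = np.random.default_rng(1)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(y, x)                  # (x, y): vdot conjugates arg 1
norm_x = np.sqrt(np.vdot(x, x).real)   # (x, x) is real and >= 0
norm_y = np.sqrt(np.vdot(y, y).real)
print(abs(inner) <= norm_x * norm_y)   # Cauchy-Schwarz holds
```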
Note also that Lp (Ω) equipped with the usual norm is a Banach space
for all 1 ≤ p ≤ ∞ (see Theorem 3.25). So (L2 (Ω),  · L2 (Ω) ) is a real
Hilbert space. Also, H m (Ω) equipped with the above scalar product
and norm is a real Hilbert space, and so is its closed subspace H0m (Ω),
m ∈ N.
It is worth pointing out that H01 (Ω) can be equipped with a different
scalar product,


$$(u, v)_1 = \int_\Omega \nabla u \cdot \nabla v \, dx\,, \quad u, v \in H_0^1(\Omega)\,,$$

and the induced norm

$$\|u\|_* = \|\nabla u\|_{L^2(\Omega)}\,, \quad u \in H_0^1(\Omega)\,,$$

whenever Ω is open and has finite measure, or its projection on a coordinate plane is bounded (see Theorem 5.24 and Remarks 5.25 and 5.26).

Note also that, for −∞ ≤ a < b ≤ ∞ and a Hilbert space X, L^2(a, b; X) equipped with the scalar product

$$(u, v)_{L^2(a,b;\,X)} = \int_a^b \big( u(t), v(t) \big)_X \, dt\,, \quad u, v \in L^2(a, b;\, X)\,,$$

and the induced norm

$$\|u\|^2_{L^2(a,b;\,X)} = \int_a^b \|u(t)\|_X^2 \, dt\,,$$

is a Hilbert space, too. Also, H^m(a, b; X) is a Hilbert space for any m ∈ N with respect to the scalar product

$$(u, v)_m = \sum_{j=0}^{m} \int_a^b \big( u^{(j)}(t), v^{(j)}(t) \big)_X \, dt\,, \quad u, v \in H^m(a, b;\, X)\,,$$

and the induced norm

$$\|u\|_m^2 = \sum_{j=0}^{m} \int_a^b \|u^{(j)}(t)\|_X^2 \, dt\,, \quad u \in H^m(a, b;\, X)\,.$$

Let us point out that any inner product space can be extended
(uniquely up to isomorphism) to a Hilbert space, by a completion pro-
cedure similar to that used in the proof of Theorem 2.8. To illustrate
this consider the space C[0, 2] endowed with the scalar product
$$\langle u, v \rangle = \int_0^2 u(t) v(t) \, dt\,, \quad u, v \in C[0, 2]\,,$$

and the induced norm

$$\|u\|^2_{L^2} = \langle u, u \rangle = \int_0^2 u(t)^2 \, dt\,, \quad u \in C[0, 2]\,.$$
168 6 Hilbert Spaces

The space (C[0, 2], ‖·‖_{L^2}) is not complete (i.e., it is not a Hilbert space), as can be seen by using the sequence (u_n)_{n≥2} defined by


$$u_n(t) = \begin{cases} 0, & 0 \le t \le 1 - \frac{1}{n}\,, \\ nt - n + 1, & 1 - \frac{1}{n} < t < 1\,, \\ 1, & 1 \le t \le 2\,, \end{cases}$$

but it can be extended to the Hilbert space (L^2(0, 2), ‖·‖_{L^2}) (each element of C[0, 2] being identified with its L^2 equivalence class).
If X is a finite dimensional, inner product space, then it is a Hilbert
space with respect to the norm induced by the corresponding inner
product, so no extension is needed (in particular, Rk and Ck are Hilbert
spaces).

6.2 Jordan–von Neumann Characterization Theorem
Our aim in this chapter is to present the main properties of Hilbert
spaces which are of course common to all the particular spaces men-
tioned above. First of all, we state the following characterization result
due to Jordan and von Neumann.2

Theorem 6.1 (Jordan–von Neumann). Let (H, ‖·‖) be a normed space. Then the norm ‖·‖ is given by a scalar product (i.e., there exists a scalar product (·, ·) : H × H → K such that ‖x‖ = √(x, x), x ∈ H) if and only if ‖·‖ satisfies the parallelogram law. (Hence, a Banach space (H, ‖·‖) is Hilbert ⇐⇒ its norm ‖·‖ satisfies the parallelogram law.)

Proof. Necessity has already been proved in Chap. 1, though we repeat here the proof, which is immediate. Assuming that ‖·‖ is generated by a scalar product (·, ·), we have for all x, y ∈ H
$$\|x + y\|^2 + \|x - y\|^2 = (x + y, x + y) + (x - y, x - y) = 2 \left( \|x\|^2 + \|y\|^2 \right), \tag{6.2.1}$$
i.e., the norm satisfies the parallelogram law.

² Pascual Jordan, German theoretical and mathematical physicist, 1902–1980; John von Neumann, Hungarian-American mathematician, physicist, and computer scientist, 1903–1957.

Now let us prove sufficiency. Assume that the norm ‖·‖ of H satisfies the parallelogram law (see (6.2.1)).
Consider first the case K = R. Define f : H × H → R by
$$f(x, y) = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right), \quad x, y \in H\,,$$
which we will show is a scalar product on H. Clearly,
$$f(x, x) = \tfrac{1}{4} \|2x\|^2 = \|x\|^2 \quad \forall x \in H\,, \tag{6.2.2}$$
$$f(x, y) = f(y, x) \quad \forall x, y \in H\,, \tag{6.2.3}$$
$$f(x, 0) = 0 \quad \forall x \in H\,. \tag{6.2.4}$$
Obviously, for any x₁, x₂, y ∈ H, we have
$$f(x_1 + x_2, y) = \frac{1}{4} \left( \|x_1 + x_2 + y\|^2 - \|x_1 + x_2 - y\|^2 \right),$$
$$f(x_1 - x_2, y) = \frac{1}{4} \left( \|x_1 - x_2 + y\|^2 - \|x_1 - x_2 - y\|^2 \right).$$
Add the two equations and apply the parallelogram law to get
$$f(x_1 + x_2, y) + f(x_1 - x_2, y) = \frac{1}{2} \left( \|x_1 + y\|^2 + \|x_2\|^2 - \|x_1 - y\|^2 - \|x_2\|^2 \right) = \frac{1}{2} \left( \|x_1 + y\|^2 - \|x_1 - y\|^2 \right) = 2 f(x_1, y)\,. \tag{6.2.5}$$

In the special case x₁ = x₂ = x we have (see also (6.2.4) and (6.2.3))
$$f(2x, y) = 2 f(x, y) \quad \forall x, y \in H\,. \tag{6.2.6}$$
Now choose in (6.2.5) x₁ + x₂ = x and x₁ − x₂ = x′ to obtain
$$f(x, y) + f(x', y) = 2 f\left( \frac{x + x'}{2}, y \right),$$
which by (6.2.6) gives
$$f(x + x', y) = f(x, y) + f(x', y) \quad \forall x, x', y \in H\,. \tag{6.2.7}$$

From (6.2.7) we obtain f(nx, y) = n f(x, y) for all n ∈ N, which can be extended to
$$f(nx, y) = n f(x, y) \quad \forall x, y \in H,\ \forall n \in \mathbb{Z}\,, \tag{6.2.8}$$
since f(−x, y) = −f(x, y) (by (6.2.7)). Now for a rational number r = m/n, m, n ∈ Z, n ≠ 0, we have (by (6.2.8))
$$f\left( \frac{m}{n}\, x, y \right) = m\, f\left( \frac{1}{n}\, x, y \right) = \frac{m}{n}\, f(x, y)\,,$$
so
$$f(rx, y) = r f(x, y) \quad \forall x, y \in H,\ \forall r \in \mathbb{Q}\,.$$
Since f is continuous on H × H, this extends to r ∈ R, i.e.,
$$f(rx, y) = r f(x, y) \quad \forall x, y \in H,\ \forall r \in \mathbb{R}\,. \tag{6.2.9}$$

Summarizing, we see that f satisfies (6.2.2), (6.2.3), (6.2.7), and (6.2.9), so f(·, ·) is a scalar product and generates the given norm: ‖x‖² = f(x, x), x ∈ H.
Sufficiency in the complex case K = C can be treated similarly, with f : H × H → C defined by
$$f(x, y) = \frac{1}{4} \sum_{m=0}^{3} i^m \|x + i^m y\|^2\,, \quad x, y \in H\,,$$
where i is the imaginary unit.

where i is the imaginary unit.

Remark 6.2. In fact, the scalar product generating a norm is unique. Indeed, if (·, ·) and ⟨·, ·⟩ are two scalar products such that (x, x) = ⟨x, x⟩ = ‖x‖², x ∈ H, then we easily derive from
$$(x + y, x + y) = \langle x + y, x + y \rangle \quad \forall x, y \in H\,,$$
that
$$\operatorname{Re}(x, y) = \operatorname{Re}\langle x, y \rangle \quad \forall x, y \in H\,, \tag{6.2.10}$$
and this completes the proof in the real case. If K = C, then by replacing y by iy in (6.2.10), we also get
$$\operatorname{Im}(x, y) = \operatorname{Im}\langle x, y \rangle \quad \forall x, y \in H\,.$$

Remark 6.3. We have already noticed that R^k equipped with the usual Euclidean norm is a Hilbert space, but R^k is not Hilbert with respect to other norms, such as
$$\|u\|_1 = \sum_{i=1}^{k} |u_i| \quad \text{or} \quad \|u\|_{\max} = \max_{1 \le i \le k} |u_i|\,, \quad u = (u_1, \dots, u_k) \in \mathbb{R}^k\,.$$

Indeed, one can easily find pairs of vectors that do not satisfy the
parallelogram law expressed in terms of these norms.
Similarly, L1 (a, b), −∞ ≤ a < b ≤ ∞, equipped with its usual norm,
is not a Hilbert space, as can be seen by finding a pair of functions
f, g ∈ L1 (a, b) that does not satisfy the parallelogram law (do it!).
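The parallelogram-law test of Remark 6.3 is easy to carry out numerically. The following sketch (an illustration added here, not part of the book; the chosen pair x = (1, 0), y = (0, 1) is ours) computes the parallelogram defect for the Euclidean, 1-, and max-norms on R²:

```python
import numpy as np

def parallelogram_defect(x, y, norm):
    # ||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the law holds for this pair
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

d_eucl = parallelogram_defect(x, y, np.linalg.norm)             # Euclidean norm
d_one = parallelogram_defect(x, y, lambda v: np.abs(v).sum())   # 1-norm
d_max = parallelogram_defect(x, y, lambda v: np.abs(v).max())   # max-norm

print(d_eucl, d_one, d_max)  # 0.0 4.0 -2.0: the law fails for the last two norms
```

A nonzero defect for a single pair already shows the norm cannot come from a scalar product.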

6.3 Projections in Hilbert Spaces


A Hilbert space is similar in many respects to k-dimensional Euclidean
space. That is why Hilbert spaces are more useful in applications than
general Banach spaces.
Theorem 6.4. Let H be a Hilbert space with scalar product (·, ·) and induced norm ‖·‖, and let C be a nonempty, convex, closed subset of H. Then for all x ∈ H there exists a unique y ∈ C such that
$$\|x - y\| = d(x, C) := \inf_{v \in C} \|x - v\|\,. \tag{6.3.11}$$

Proof. First we prove the existence of y. If x ∈ C then d(x, C) = 0 so


a good candidate is y = x.
Assume x ∈ H \ C. Denote ρ = d(x, C). By the definition of inf, for
all n ∈ N there exists yn ∈ C such that
$$\rho \le \|x - y_n\| < \rho + \frac{1}{n}\,,$$
which gives
$$\lim_{n \to \infty} \|x - y_n\| = \rho\,. \tag{6.3.12}$$
We have ρ > 0. Indeed if ρ = 0, then by (6.3.12) yn → x and C is
closed, so x ∈ C, contradiction.
Apply the parallelogram law (see (6.2.1)) to x − y_n and x − y_m to get
$$\|2x - (y_n + y_m)\|^2 + \|y_n - y_m\|^2 = 2 \left( \|x - y_n\|^2 + \|x - y_m\|^2 \right), \tag{6.3.13}$$
for all n, m. Consider the first term on the left-hand side of (6.3.13) and factor out a 4:
$$4 \|x - (1/2)(y_n + y_m)\|^2 \ge 4\rho^2\,. \tag{6.3.14}$$
Note that (1/2)(y_n + y_m) is a convex combination of elements of C and therefore is in C by convexity. Hence (see (6.3.13) and (6.3.14)),
$$\|y_n - y_m\|^2 \le 2 \left( \|x - y_n\|^2 + \|x - y_m\|^2 \right) - 4\rho^2\,. \tag{6.3.15}$$

Using (6.3.12) we get that (y_n) is Cauchy, because the right-hand side of (6.3.15) converges to 0 as n, m → ∞. Therefore (y_n) converges strongly to some y, and y ∈ C because C is closed. It follows from (6.3.12) that
$$\|x - y\| = \rho\,.$$
We now prove uniqueness. Suppose ‖x − y‖ = ρ = ‖x − y′‖ for some y, y′ ∈ C. We use the parallelogram law for x − y, x − y′ to obtain
$$\|2x - (y + y')\|^2 + \|y - y'\|^2 = 2 \left( \|x - y\|^2 + \|x - y'\|^2 \right),$$
which implies
$$4 \|x - (1/2)(y + y')\|^2 + \|y - y'\|^2 = 4\rho^2\,. \tag{6.3.16}$$
Since (1/2)(y + y′) ∈ C (it is a convex combination), we have
$$4 \|x - (1/2)(y + y')\|^2 \ge 4\rho^2\,,$$
yielding (see (6.3.16))
$$\|y - y'\|^2 \le 4\rho^2 - 4\rho^2 = 0\,,$$
and thus y = y′.

Remark 6.5. Both assumptions (C closed and convex) are essential.


For example, if C is an open disc in R2 , then there is no y for x ∈ R2 \C.
On the other hand, if C is not convex there may exist more (possibly
infinitely many) y’s for the same x, as the reader can easily imagine.

Definition 6.6. Let ∅ ≠ C ⊂ H be a closed and convex set. A point y as above is called the projection of x on C and is denoted y = P_C x. Since the projection exists and is unique for every x ∈ H, we can define a projection operator P_C : H → C, x ↦ y = P_C x.
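For a concrete closed convex set the projection can be computed explicitly. In the sketch below (added for illustration; the box C = [0, 1]² and all names are our choices, not the book's), P_C is componentwise clipping, and we check numerically both the minimizing property (6.3.11) and the variational inequality (x − y, y − v) ≥ 0 that characterizes the projection in the real case:

```python
import numpy as np

# Projection onto the closed convex box C = [0, 1]^2 is componentwise clipping.
project = lambda x: np.clip(x, 0.0, 1.0)

x = np.array([2.0, -0.5])
y = project(x)                     # PC x = [1.0, 0.0]

rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.uniform(0.0, 1.0, size=2)            # random point of C
    # minimizing property: ||x - y|| <= ||x - v||
    assert np.linalg.norm(x - y) <= np.linalg.norm(x - v) + 1e-12
    # variational inequality: (x - y, y - v) >= 0
    assert np.dot(x - y, y - v) >= -1e-12

print(y)  # [1. 0.]
```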

Theorem 6.7. Let H be a Hilbert space and let ∅ ≠ C ⊂ H be a closed and convex set. For x ∈ H, y ∈ C the following are equivalent:

(a) y = P_C x;

(b) ‖x − y‖ ≤ ‖x − v‖ for all v ∈ C;

(c) Re(x − y, y − v) ≥ 0 for all v ∈ C;

(d) Re(x − v, y − v) ≥ 0 for all v ∈ C.



If H is a real Hilbert space, then the “Re” from (c) and (d) can be
removed.
Proof.
(a) ⇐⇒ (b): Trivial.
(b) =⇒ (c): ‖x − y‖² ≤ ‖x − v‖² for all v ∈ C. Let v = (1 − λ)y + λw for 0 < λ < 1 and w ∈ C. Since v is a convex combination, v is in C. We have
$$\|x - y\|^2 \le \|x - y + \lambda(y - w)\|^2 = \|x - y\|^2 + 2\lambda \operatorname{Re}(x - y, y - w) + \lambda^2 \|y - w\|^2\,,$$
so that
$$0 \le 2 \operatorname{Re}(x - y, y - w) + \lambda \|y - w\|^2\,.$$
Let λ → 0⁺ to find
$$\operatorname{Re}(x - y, y - w) \ge 0 \quad \text{for all } w \in C\,.$$
(c) =⇒ (b): Since Re(x − y, y − x + x − v) ≥ 0 we have
$$\|x - y\|^2 \le \operatorname{Re}(x - y, x - v) \le |(x - y, x - v)| \le \|x - y\| \cdot \|x - v\| \quad \forall v \in C\,,$$
so if ‖x − y‖ = 0 then we are done; otherwise divide by it, and we get
$$\|x - y\| \le \|x - v\| \quad \forall v \in C\,.$$
(c) =⇒ (d): Re(x − v + v − y, y − v) ≥ 0 for all v ∈ C, so that
$$\operatorname{Re}(x - v, y - v) \ge \|y - v\|^2 \ge 0 \quad \forall v \in C\,.$$
(d) =⇒ (c): Replacing v in (d) by (1 − λ)y + λw for λ ∈ (0, 1), w ∈ C, we get
$$\operatorname{Re}(x - y + \lambda(y - w), \lambda(y - w)) \ge 0\,,$$
and, as λ is strictly positive, this implies
$$\operatorname{Re}(x - y, y - w) + \lambda \|y - w\|^2 \ge 0\,.$$
Thus, letting λ → 0⁺ we obtain
$$\operatorname{Re}(x - y, y - w) \ge 0 \quad \forall w \in C\,.$$

Remark 6.8. The projection operator is Lipschitz.

Proof. Using condition (c) of Theorem 6.7 we have
$$\operatorname{Re}(x_1 - P_C x_1,\ P_C x_1 - P_C x_2) \ge 0\,,$$
$$\operatorname{Re}(x_2 - P_C x_2,\ P_C x_2 - P_C x_1) \ge 0\,.$$
Add the two to obtain
$$\operatorname{Re}(P_C x_1 - P_C x_2,\ x_1 - P_C x_1 - x_2 + P_C x_2) \ge 0\,,$$
which implies
$$\operatorname{Re}(P_C x_1 - P_C x_2,\ x_1 - x_2) \ge \|P_C x_1 - P_C x_2\|^2\,.$$
By the Bunyakovsky–Cauchy–Schwarz inequality, this leads to
$$\|P_C x_1 - P_C x_2\| \le \|x_1 - x_2\| \quad \forall x_1, x_2 \in H\,.$$
Thus P_C is Lipschitz with constant L = 1. For this reason the operator P_C is also called nonexpansive.
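This nonexpansiveness is easy to observe numerically. The following sketch (our illustration, with the box C = [0, 1]³ as an assumed example set) samples random pairs and records the worst Lipschitz ratio of the projection:

```python
import numpy as np

# PC for the box C = [0, 1]^3 (componentwise clipping), checked to be 1-Lipschitz.
project = lambda x: np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(1)
worst_ratio = 0.0
for _ in range(1000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    num = np.linalg.norm(project(x1) - project(x2))
    den = np.linalg.norm(x1 - x2)
    worst_ratio = max(worst_ratio, num / den)

assert worst_ratio <= 1.0 + 1e-12    # ||PC x1 - PC x2|| never exceeds ||x1 - x2||
```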

Remark 6.9. Let C ⊂ H be a closed linear subspace. By condition (c)


of Theorem 6.7 we have for all v ∈ C, Re(x − y, y − v) ≥ 0, and in fact
we can write it as Re(x − y, v) ≥ 0 for all v ∈ C since C is a linear
subspace. Both v, −v ∈ C because of linearity and this gives equality
Re(x − y, v) = 0 for all v ∈ C. We can also replace v with iv, and so
Im(x − y, v) = 0, therefore

(x − y, v) = 0, ∀v ∈ C . (6.3.17)

In general, when two vectors w1 , w2 ∈ H satisfy (w1 , w2 ) = 0 they


are said to be orthogonal by analogy with orthogonality in Euclidean
space, and we write w1 ⊥ w2 . So, (6.3.17) can be expressed as (x−y) ⊥
C. The reader is invited to imagine what the orthogonality relation
(6.3.17) looks like in the Euclidean space R3 equipped with the usual
scalar product and norm.

6.4 The Riesz Representation Theorem
Let (H, (·, ·), ‖·‖) be a Hilbert space and let M ⊂ H be a closed linear subspace. The orthogonal complement M⊥ of M is defined as
$$M^\perp = \{ u \in H;\ (u, v) = 0 \ \forall v \in M \}$$
and is a closed subspace, because (·, ·) : H × H → K is continuous.

Orthogonal Decomposition of H: We claim that any vector u ∈ H


can be written as u = u1 + u2 with u1 ∈ M and u2 ∈ M ⊥ , and this
decomposition is unique. We write H = M ⊕ M ⊥ and call it a direct
sum.

Proof. Note that u₁ = P_M u (which is unique) is the component in M, while u₂ = u − u₁ = u − P_M u is in M⊥ because (u − P_M u, v) = 0 for all v ∈ M (see (6.3.17)). Let us now prove that this decomposition (u = u₁ + u₂) is unique.
Suppose that u = u₁ + u₂ = u₁′ + u₂′ with u₁, u₁′ ∈ M and u₂, u₂′ ∈ M⊥. Then
$$0 = (u_1 - u_1' + u_2 - u_2',\ u_1 - u_1') = \|u_1 - u_1'\|^2 + (u_2 - u_2',\ u_1 - u_1')\,,$$
where the second term is 0 because u₁ − u₁′ ∈ M and u₂ − u₂′ ∈ M⊥. Thus ‖u₁ − u₁′‖² = 0, so u₁ = u₁′, which in turn implies u₂ = u₂′.
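In R³ the decomposition u = u₁ + u₂ can be computed with the standard orthogonal-projector formula. The sketch below (our illustration; the subspace M and the vector u are arbitrary choices) verifies both the decomposition and the orthogonality:

```python
import numpy as np

# M = column span of B, a 2-dimensional subspace of R^3.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
# Orthogonal projector onto M: P = B (B^T B)^{-1} B^T.
P = B @ np.linalg.solve(B.T @ B, B.T)

u = np.array([3.0, -1.0, 2.0])
u1 = P @ u        # component in M
u2 = u - u1       # component in M^perp

assert np.allclose(u1 + u2, u)        # u = u1 + u2
assert np.allclose(B.T @ u2, 0.0)     # u2 is orthogonal to every column of B, hence to M
assert np.allclose(P @ u2, 0.0)       # projecting u2 onto M gives 0
```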

Theorem 6.10 (Riesz Representation Theorem). Let (H, (·, ·), ‖·‖) be a Hilbert space. For all f ∈ H∗ (i.e., f is a continuous linear functional from H to K) there exists a unique v ∈ H such that
$$f(u) = (u, v) \quad \forall u \in H\,, \qquad \text{and} \qquad \|v\| = \|f\|\,.$$

Proof.

Step 1. We first show that such a v is unique. Suppose that (u, v) = (u, v′) for all u ∈ H; then (u, v − v′) = 0 for all u ∈ H, and in particular (v − v′, v − v′) = 0, so v = v′.

Step 2. We now prove the existence of v. If f = 0 then clearly v = 0 works. If f ≠ 0, consider the nullspace N(f) = {z ∈ H; f(z) = 0}. It is a closed linear subspace, so H = N(f) ⊕ N(f)⊥. In fact N(f) ≠ H because f is not identically 0. Thus there exists u₀ ∈ N(f)⊥ \ {0}. We may assume f(u₀) = 1 by scaling. Let
u ∈ H be arbitrary and define

w = u − f (u)u0 .

Now consider

f (w) = f (u) − f (u)f (u0 ) = f (u) − f (u) = 0 ,

showing that w ∈ N (f ). So

u = w + f (u)u0 with w ∈ N (f ), f (u)u0 ∈ N (f )⊥ ,

and this decomposition is unique. Thus
$$(u, u_0) = (w, u_0) + f(u)(u_0, u_0) = 0 + f(u) \|u_0\|^2\,,$$

and solving for f(u),
$$f(u) = \Big( u,\ \frac{1}{\|u_0\|^2}\, u_0 \Big)\,,$$
so f is of the given form with v = ‖u₀‖⁻² u₀, and v is unique by the previous step.

Step 3. We finally prove that ‖v‖ = ‖f‖. For f = 0 this is obvious, so assume that f ≠ 0, which implies v ≠ 0. By Bunyakovsky–Cauchy–Schwarz,
$$|f(u)| \le \|v\| \cdot \|u\|\,,$$
and by considering those u with ‖u‖ ≤ 1, we get
$$\|f\| \le \|v\|\,. \tag{6.4.18}$$
Now f(v) = (v, v) = ‖v‖², so that
$$f\Big( \frac{1}{\|v\|}\, v \Big) = \|v\|\,,$$

which combined with (6.4.18) shows that ‖f‖ = ‖v‖.
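In R^k the theorem is concrete: every continuous linear functional is f(u) = (u, v) for a fixed representer v, and ‖f‖ = ‖v‖ with the supremum attained at u = v/‖v‖. The sketch below (our illustration; the representer v = (3, −4) is an arbitrary choice) checks this numerically:

```python
import numpy as np

# Riesz representation in R^2: the functional f(u) = (u, v) has operator norm ||v||.
v = np.array([3.0, -4.0])
f = lambda u: float(np.dot(u, v))

v_norm = np.linalg.norm(v)                   # = 5.0
# the supremum of |f(u)| over ||u|| <= 1 is attained at u = v/||v||
assert abs(f(v / v_norm) - v_norm) < 1e-12

rng = np.random.default_rng(2)
for _ in range(1000):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)                   # random unit vector
    assert abs(f(u)) <= v_norm + 1e-12       # Cauchy-Schwarz: |f(u)| <= ||v|| ||u||
```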

Remark 6.11. Recall that in Sect. 4.4 of Chap. 4 we asked whether functionals f from the dual of (C[a, b], ‖·‖_{L²(a,b)}), −∞ < a < b < ∞, can be expressed as f(u) = (u, v)_{L²(a,b)}, u ∈ C[a, b], with v ∈ C[a, b]. The answer is, in general, no. First of all, any f ∈ (C[a, b], ‖·‖_{L²(a,b)})∗ can be extended by continuity to (L²(a, b), ‖·‖_{L²(a,b)}), which is a Hilbert space. By the Riesz Representation Theorem, for each such f (extended to L²(a, b)) there exists a unique v ∈ L²(a, b) such that f(u) = (u, v)_{L²(a,b)} for all u ∈ L²(a, b), but this v is not necessarily an element of C[a, b] (i.e., v has no representative in C[a, b]). In fact, we can consider f(u) = (u, v)_{L²(a,b)}, u ∈ L²(a, b), with v ∈ L²(a, b) \ C[a, b]; this f is continuous on (C[a, b], ‖·‖_{L²(a,b)}) and its representation as a scalar product, f(u) = (u, v)_{L²(a,b)}, is unique (i.e., v is unique); but this v is not an element of C[a, b], so the answer to the above question is negative.
Remark 6.12. In the proof of Theorem 6.10 we saw that for all u ∈ H and 0 ≠ f ∈ H∗ we have the decomposition u = w + f(u)u₀ with w ∈ N(f), u₀ ∈ N(f)⊥, f(u₀) = 1, so that dim N(f)⊥ = 1. Another way to say this is that the codimension of N(f) is 1. For such a functional f
and for some a ∈ K we have an affine subspace of H,

Y := {u ∈ H; f (u) = a} = au0 + N (f ) ,

whose codimension is 1 (i.e., the codimension of N (f ) is 1), thus Y is


a usual hyperplane if H is the Euclidean space.
Conversely, given a closed affine subspace Y of H of codimension 1,
i.e., Y = u1 + Z, for some u1 ∈ H, Z ⊂ H a closed linear subspace
with codimension 1, there exists u₀ ∈ H \ {0} which is orthogonal to Z, i.e., (u, u₀) = 0 for all u ∈ Z. Define f : H → K,

f (u) = (u, u0 ), ∀u ∈ H ,

so that f ∈ H∗, N(f) = Z, f ≠ 0 (since f(u₀) = ‖u₀‖² ≠ 0), and Y can be expressed by means of this f as follows:
can be expressed by means of this f as follows:

Y = u1 + N (f ) = {u ∈ H; f (u) = f (u1 )} .
A simple example is H = L²(0, 1), Z = {u ∈ H; ∫₀¹ u(t) dt = 0}. Clearly, Z is a closed linear subspace of H with codim Z = 1. Indeed, any v ∈ H can be uniquely decomposed into
$$v(t) = \int_0^1 v(s)\, ds + \Big( v(t) - \int_0^1 v(s)\, ds \Big) = C + u(t)\,, \quad \text{for a.a. } t \in (0, 1)\,,$$
where u ∈ Z and C is a constant, i.e., H = Span{1} ⊕ Z. We can choose u₀ to be the constant function 1, so f(u) = ∫₀¹ u(t) dt.
The Weak Topology of H
Taking into account the Riesz Representation Theorem, we see that
the weak topology of H is generated by the neighborhood system

Vv1 ,v2 ,...,vp ;ε = {x ∈ H; |(x, vj )| < ε, j = 1, . . . , p},


ε > 0, v1 , . . . , vp ∈ H, p ∈ N .

So the fact that a sequence (xn ) in H converges weakly to some x


means (xn , v) → (x, v) for all v ∈ H.
If dim H = ∞ then we can use the Gram–Schmidt method (see Chap. 1)
to construct an infinite orthonormal sequence
(x1 , x2 , . . . , xn , . . . ). This sequence converges weakly to 0. Indeed,
for v ∈ H arbitrary, we have
$$\Big\| \sum_{n=1}^{N} (v, x_n) x_n - v \Big\|^2 = \sum_{n=1}^{N} |(x_n, v)|^2 - 2 \sum_{n=1}^{N} |(x_n, v)|^2 + \|v\|^2 = \|v\|^2 - \sum_{n=1}^{N} |(x_n, v)|^2 \ge 0\,,$$
so that
$$\sum_{n=1}^{N} |(x_n, v)|^2 \le \|v\|^2 \quad \forall N \in \mathbb{N}\,,$$
which is known as Bessel's inequality.³ So the series $\sum_{n=1}^{\infty} |(x_n, v)|^2$ is convergent and consequently

³ Friedrich Wilhelm Bessel, German astronomer, mathematician, physicist and geodesist, 1784–1846.

$$(x_n, v) \to 0 \quad \forall v \in H\,,$$
i.e., (x_n) converges weakly to 0. But (x_n) is not strongly convergent (to 0), since ‖x_n‖ = 1 for all n ∈ N. Therefore, weak convergence in any infinite dimensional Hilbert space is different from strong convergence.
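Bessel's inequality and the weak convergence (x_n, v) → 0 can be observed numerically. The sketch below (our illustration; the orthonormal system √(2/π) sin(nt) in L²(0, π) and the test function v(t) = t are assumed choices, and the scalar product is approximated by midpoint quadrature) computes the first fifty Fourier coefficients:

```python
import numpy as np

# Orthonormal system x_n(t) = sqrt(2/pi) sin(nt) in L^2(0, pi), tested against v(t) = t.
N = 20000
t = (np.arange(N) + 0.5) * (np.pi / N)            # midpoint quadrature grid on (0, pi)
ip = lambda f, g: float(np.sum(f * g) * (np.pi / N))

v = t
coeffs = [ip(np.sqrt(2.0 / np.pi) * np.sin(n * t), v) for n in range(1, 51)]

bessel_sum = sum(c ** 2 for c in coeffs)
norm2 = ip(v, v)                                  # ||v||^2 = pi^3 / 3

assert bessel_sum <= norm2 + 1e-6                 # Bessel's inequality
assert abs(coeffs[49]) < 0.1 * abs(coeffs[0])     # (x_n, v) -> 0: weak convergence to 0
```

Note that each ‖x_n‖ equals 1, so the sequence cannot converge strongly even though its scalar products against any fixed v die out.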

Based on the Riesz Representation Theorem, we can define the so-called Riesz operator R : H → H∗ by v ↦ (·, v), so that (Rv)(u) = (u, v) for all u, v ∈ H and ‖Rv‖ = ‖v‖. As seen before, R is also bijective.

Theorem 6.13. Every Hilbert space is reflexive.


Proof. Let φ : H → H∗∗, v ↦ fv ∈ H∗∗, where fv(x∗) = x∗(v) for all x∗ ∈ H∗.
As we have already seen, φ is injective. For the convenience of the
reader, let us prove this again in the present context. If fv = 0,
x∗ (v) = 0 for all x∗ ∈ H ∗ which implies, by the Riesz Representation
Theorem, that (v, w) = 0 for all w ∈ H so that v = 0. Thus φ is
injective.
We now prove that φ is surjective. Let x∗∗ ∈ H ∗∗ and define u∗ ∈ H ∗
by u∗ (v) := x∗∗ (Rv) for all v ∈ H. Denote u = R−1 u∗ and calculate
x∗∗ (x∗ ) = x∗∗ (R(R−1 x∗ ))
= u∗ (R−1 x∗ )
= (R−1 x∗ , u)
= (u, R−1 x∗ )
= x∗ (u)
= fu (x∗ ) ,
so that all functionals x∗∗ are of the form fu (x∗ ), and φ is onto, i.e.,
for all x∗∗ ∈ H ∗∗ there exists u ∈ H such that x∗∗ = fu .

Remark 6.14. The above proof is a direct one. In fact, Theorem 6.13
follows from the Milman–Pettis4 general result we state without proof:
every uniformly convex Banach space is reflexive.
Recall that a normed space (H, ‖·‖) is said to be uniformly convex if ∀ε ∈ (0, 2) ∃δ > 0 such that for all x, y ∈ H with ‖x‖ ≤ 1, ‖y‖ ≤ 1, ‖x − y‖ > ε, we have ‖(1/2)(x + y)‖ < 1 − δ.
⁴ David P. Milman, Soviet and later Israeli mathematician, 1912–1982.

If H is a Hilbert space, it follows easily by using the parallelogram law


that H is uniformly convex, hence reflexive (by Milman–Pettis).

6.5 Lax–Milgram Theorem


We begin this section with a preparatory lemma whose proof is based
on the Banach Contraction Principle.
Lemma 6.15. Let (H, (·, ·), ‖·‖) be a real Hilbert space and let A : H → H be a (not necessarily linear) operator satisfying

(a) (Au − Av, u − v) ≥ c‖u − v‖² for all u, v ∈ H (strong monotonicity);

(b) ‖Au − Av‖ ≤ L‖u − v‖ for all u, v ∈ H (Lipschitz condition),

where c and L are given positive constants. Then for all w ∈ H there exists a unique u∗ ∈ H such that Au∗ = w, i.e., A is a bijection.
Proof. We first prove uniqueness. Suppose u₁, u₂ ∈ H are such that Au₁ = w = Au₂. Then by (a),
$$0 = (Au_1 - Au_2, u_1 - u_2) \ge c \|u_1 - u_2\|^2\,,$$
which implies u₁ = u₂.
We now prove existence. First we note that c ≤ L, by using (a) and (b) together with Bunyakovsky–Cauchy–Schwarz. For a fixed w ∈ H, define B : H → H by
$$Bu = u - t(Au - w)\,, \quad t > 0\,,\ u \in H\,.$$
Note that if there is a fixed point of B, then it is the desired u∗. We wish to apply the Banach Contraction Principle in (H, d), where d(u, v) = ‖u − v‖. Using (a) and (b), we have for all u, v ∈ H
$$d(Bu, Bv)^2 = \|Bu - Bv\|^2 = \|u - v\|^2 - 2t(u - v, Au - Av) + t^2 \|Au - Av\|^2 \le (1 - 2tc + t^2 L^2)\, \|u - v\|^2 =: m\, \|u - v\|^2 = m\, d(u, v)^2\,.$$

Obviously, m ≥ 0. We choose t to minimize m = m(t) and find that t = c/L². Thus the minimum value of m is
$$m = 1 - \frac{2c^2}{L^2} + \frac{c^2}{L^2} = 1 - \frac{c^2}{L^2} \ge 0\,,$$
since c ≤ L. If c = L, then m = 0, so B is constant, i.e., Bu = w₀, so that w₀ = u − (c/L²)(Au − w). In this case A is affine, namely
$$Au = \frac{L^2}{c}\, (u - w_0) + w\,,$$
so that u∗ = w₀.
When c < L, then 0 < m < 1, so B is a contraction and hence by the Banach Contraction Principle (see Sect. 2.5) B has a unique fixed point u∗.
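The proof is constructive: iterating B with the step t = c/L² converges to the solution. The sketch below (our illustration; the operator A(u) = 3u + sin(u), which is strongly monotone with c = 2 and Lipschitz with L = 4, is an assumed example) solves Au = w by this Picard iteration:

```python
import numpy as np

# A(u) = 3u + sin(u) componentwise: strongly monotone (c = 2) and Lipschitz (L = 4).
A = lambda u: 3.0 * u + np.sin(u)
c, L = 2.0, 4.0
t = c / L ** 2            # step from the proof; contraction factor sqrt(1 - c^2/L^2)

w = np.array([1.0, -2.0, 0.5])
u = np.zeros(3)
for _ in range(300):
    u = u - t * (A(u) - w)   # Picard iteration for Bu = u - t(Au - w)

residual = np.linalg.norm(A(u) - w)
assert residual < 1e-10      # the fixed point solves Au = w
```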

Theorem 6.16 (Nonlinear Lax–Milgram Theorem).⁵ Let H be a real Hilbert space and consider two functionals a : H × H → R and b : H → R satisfying

1. for all u ∈ H the map v ↦ a(u, v) is linear and continuous on H (i.e., it belongs to H∗);

2. a(u, u − v) − a(v, u − v) ≥ c‖u − v‖² for all u, v ∈ H and some c > 0;

3. |a(u, w) − a(v, w)| ≤ L‖u − v‖ · ‖w‖ for all u, v, w ∈ H and some L > 0;

4. b is a continuous linear functional (i.e., b ∈ H∗).

Then there exists a unique u ∈ H such that
$$a(u, v) = b(v) \quad \forall v \in H\,. \tag{6.5.19}$$

Proof. By the first assumption and the Riesz Representation The-


orem 6.10 for all u ∈ H there exists a unique z ∈ H such that
a(u, v) = (v, z) for all v ∈ H. So there exists an operator A : H → H
defined by Au := z. We now rewrite the second condition

$$a(u, u - v) - a(v, u - v) = (u - v, Au) - (u - v, Av) = (u - v, Au - Av)$$
and, since K = R,
$$= (Au - Av, u - v) \ge c \|u - v\|^2\,,$$
for all u, v ∈ H, so A satisfies condition (a) of the previous lemma. From the third assumption we have, for all u, v, z ∈ H,
$$|a(u, z) - a(v, z)| = |(z, Au) - (z, Av)| = |(z, Au - Av)| \le L \|u - v\| \cdot \|z\|\,.$$
⁵ Peter D. Lax, Hungarian-born American mathematician, born 1926; Arthur N. Milgram, American mathematician, 1912–1961.
Choosing z = Au − Av we see that operator A also satisfies condition
(b) of Lemma 6.15.
On the other hand, by the fourth assumption and the Riesz Represen-
tation Theorem there exists a unique w such that b(v) = (v, w) for all
v ∈ H. Now (6.5.19) can be written as
$$(v, Au) = (v, w) \quad \forall v \in H \iff Au = w\,,$$
so the conclusion of the theorem follows by Lemma 6.15.

Theorem 6.17 (Classic Lax–Milgram Theorem). Let H be a real Hilbert space and consider two functionals a : H × H → R and b : H → R satisfying

1. a is bilinear;

2. a is bounded (continuous) on H × H, namely |a(u, v)| ≤ L‖u‖ · ‖v‖ for all u, v ∈ H and some L > 0;

3. a is strongly positive (or coercive), i.e., there exists c > 0 such that a(v, v) ≥ c‖v‖² for all v ∈ H;

4. b is linear and continuous (i.e., b ∈ H∗).

Then there exists a unique u ∈ H satisfying
$$a(u, v) = b(v) \quad \forall v \in H\,. \tag{6.5.19′}$$
If, in addition, a is symmetric (i.e., a(u, v) = a(v, u) for all u, v ∈ H), then u is a solution of (6.5.19′) if and only if it is a solution (minimizer) of the quadratic minimization problem
$$\min_{v \in H} \left\{ \frac{1}{2}\, a(v, v) - b(v) \right\}. \tag{6.5.20}$$

Proof. Observe that the conditions of Theorem 6.16 are satisfied, so


all that remains is to prove the final statement.
Define
$$F(v) = \frac{1}{2}\, a(v, v) - b(v)\,, \quad v \in H\,.$$
If u is a solution of (6.5.20), then F(u) ≤ F(v) for all v ∈ H. Define φ(t) = F(u + tv) for t ∈ R, v ∈ H. We have
$$\varphi(t) = \frac{1}{2}\, a(u + tv, u + tv) - b(u + tv) = \frac{1}{2}\, a(u, u) + t\, a(u, v) + \frac{1}{2}\, t^2 a(v, v) - b(u) - t\, b(v) = F(u) + t \left( a(u, v) - b(v) \right) + \frac{1}{2}\, t^2 a(v, v)\,.$$
Therefore,
$$\varphi'(t) = a(u, v) - b(v) + t\, a(v, v)\,,$$
hence
$$a(u, v) - b(v) = \varphi'(0) = 0\,,$$
since t = 0 is a minimizer of φ, so that u satisfies (6.5.19′) because v is arbitrary.
Conversely, suppose that u satisfies (6.5.19′). We must show F(u) ≤ F(v) for all v ∈ H. It is enough to prove that F(u + v) − F(u) is nonnegative:
$$F(u + v) - F(u) = \frac{1}{2}\, a(u + v, u + v) - b(u + v) - \frac{1}{2}\, a(u, u) + b(u) = \frac{1}{2}\, a(u, u) + a(u, v) + \frac{1}{2}\, a(v, v) - b(u) - b(v) - \frac{1}{2}\, a(u, u) + b(u)$$
(using the symmetry of a)
$$= \underbrace{a(u, v) - b(v)}_{=0} + \frac{1}{2}\, a(v, v) = \frac{1}{2}\, a(v, v) \ge 0\,.$$
So F(u) ≤ F(u + v) for all v ∈ H, which implies that u is a solution to (6.5.20).
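In finite dimensions the symmetric Lax–Milgram setting reduces to a linear system and a quadratic minimization. The sketch below (our illustration; the matrix A and vector f are arbitrary choices) takes a(u, v) = vᵀAu with A symmetric positive definite and b(v) = vᵀf, and checks that the solution of Au = f minimizes the quadratic functional of (6.5.20):

```python
import numpy as np

# a(u, v) = v^T A u with A symmetric positive definite; b(v) = v^T f.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
f = np.array([1.0, 2.0])

u = np.linalg.solve(A, f)                   # unique u with a(u, v) = b(v) for all v
F = lambda v: 0.5 * v @ A @ v - f @ v       # quadratic functional of (6.5.20)

rng = np.random.default_rng(3)
for _ in range(1000):
    v = rng.normal(size=2)
    assert F(u) <= F(v) + 1e-12             # u minimizes F over all tested v
```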

Next we illustrate the above results with some applications.



Dirichlet's Principle⁶: Let ∅ ≠ Ω ⊂ R^k be a bounded domain. For all f ∈ L²(Ω) there exists a unique u ∈ H₀¹(Ω) which is a solution to the following minimization problem:
$$\min_{v \in H_0^1(\Omega)} \left\{ \frac{1}{2} \int_\Omega \nabla v \cdot \nabla v \, dx - \int_\Omega f v \, dx \right\}, \tag{6.5.21}$$
and equivalently u is a solution to
$$u \in H_0^1(\Omega)\,, \qquad \int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f v \, dx \quad \forall v \in H_0^1(\Omega)\,. \tag{6.5.22}$$

Remark 6.18. In the sense of distributions we can rewrite (6.5.22) as
$$u \in H_0^1(\Omega)\,, \quad -\Delta u = f \ \text{in } \Omega\,, \tag{6.5.23}$$
which is known as the Euler–Lagrange equation⁷ associated with the minimization problem (6.5.21) (a Poisson equation in this example); u being 0 on the boundary is interpreted as meaning that the trace of u on the boundary ∂Ω is 0. Indeed, for every test function φ ∈ C₀∞(Ω) we have
$$\int_\Omega \nabla u \cdot \nabla \varphi \, dx = \int_\Omega f \varphi \, dx \iff (-\Delta u, \varphi) = (f, \varphi)\,,$$
i.e., −Δu = f in D′(Ω). Since f is in L²(Ω), −Δu is as well, so u satisfies the equation −Δu = f for a.a. x ∈ Ω. In fact, if ∂Ω is smooth enough, then u ∈ H₀¹(Ω) ∩ H²(Ω) (see [39, Theorem 3.1, p. 212]). Moreover, if f ∈ C∞(Ω) then so is u. Actually, the following regularity result holds.

Lemma 6.19 (Weyl). If ∅ ≠ Ω ⊂ R^k is open, f ∈ L∞(Ω), and u ∈ D′(Ω) satisfies the equation −Δu = f in the sense of distributions, then u ∈ C∞(Ω).

Proof of Dirichlet’s Principle.


We wish to use the classical Lax–Milgram Theorem 6.17. Denote H :=
H01 (Ω). Recall that H is a real Hilbert space as a closed subspace of
H 1 (Ω). According to Remark 5.26, H can be equipped with the norm

⁶ Johann Peter Gustav Lejeune Dirichlet, German mathematician, 1805–1859.
⁷ Joseph-Louis Lagrange, Italian mathematician and astronomer, 1736–1813.
$$\|u\|_* = \Big( \int_\Omega |\nabla u|^2 \, dx \Big)^{1/2}, \quad u \in H = H_0^1(\Omega)\,,$$

which is equivalent to the usual H¹(Ω) norm. Define a : H × H → R and b : H → R by
$$a(u, v) := \int_\Omega \nabla u \cdot \nabla v \, dx\,, \qquad b(v) := \int_\Omega f v \, dx\,.$$
Clearly a is bilinear and symmetric. Moreover, a is also continuous (bounded),
$$|a(u, v)| \le \|u\|_* \cdot \|v\|_* \quad \forall u, v \in H\,,$$
and coercive,
$$a(v, v) = \int_\Omega \nabla v \cdot \nabla v \, dx = \|v\|_*^2 \quad \forall v \in H\,.$$
Obviously, b is linear and also continuous because, by Poincaré's inequality,
$$|b(v)| \le \|f\|_{L^2(\Omega)} \|v\|_{L^2(\Omega)} \le C \|f\|_{L^2(\Omega)} \|v\|_* \quad \forall v \in H\,.$$
Thus all the conditions of Theorem 6.17 are fulfilled, so the proof of Dirichlet's Principle is complete.
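A one-dimensional discretization makes the principle tangible. The sketch below (our illustration, not from the book; the data f = π² sin(πx) on Ω = (0, 1) is an assumed example with exact solution sin(πx)) assembles the standard 3-point approximation of −u″ with homogeneous Dirichlet conditions and solves the resulting linear system, which is the discrete analogue of (6.5.22):

```python
import numpy as np

# -u'' = f on (0, 1), u(0) = u(1) = 0, with f = pi^2 sin(pi x); exact solution sin(pi x).
N = 200
h = 1.0 / N
x = np.linspace(h, 1.0 - h, N - 1)          # interior grid points
f = np.pi ** 2 * np.sin(np.pi * x)

# standard 3-point discretization of -d^2/dx^2 with homogeneous Dirichlet conditions
K = (2.0 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / h ** 2
u = np.linalg.solve(K, f)

err = np.max(np.abs(u - np.sin(np.pi * x)))
assert err < 1e-3       # second-order accuracy of the scheme
```

The matrix K is symmetric positive definite, so the discrete problem also satisfies the hypotheses of the classic Lax–Milgram theorem in R^{N−1}.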

Now let us consider the following nonlinear boundary value problem:



$$\begin{cases} -\Delta u(x) + \beta(u(x)) = f(x)\,, & x \in \Omega\,, \\ u = 0\,, & x \in \partial\Omega\,, \end{cases} \tag{6.5.24}$$
where ∅ ≠ Ω ⊂ R^k is a bounded domain, f ∈ L²(Ω), and β : R → R is a nonlinear Lipschitz continuous, nondecreasing function. We wish to prove that problem (6.5.24) has a unique solution u ∈ H₀¹(Ω). To this purpose we can apply Theorem 6.16 with H = H₀¹(Ω) equipped with the norm ‖·‖_* as above, and with a : H × H → R, b : H → R defined by
$$a(u, v) = \int_\Omega \nabla u \cdot \nabla v \, dx + \int_\Omega \beta(u) v \, dx\,, \qquad b(v) = \int_\Omega f v \, dx\,.$$

It is a simple exercise to show that all the assumptions of Theorem 6.16 are fulfilled, so there is a unique u ∈ H = H₀¹(Ω) satisfying
$$a(u, v) = b(v) \quad \forall v \in H\,,$$
i.e., −Δu + β∘u = f in D′(Ω). Note that β∘u ∈ L²(Ω), so −Δu = f − β(u) is in L²(Ω) as well, i.e., u satisfies the given equation for a.a. x ∈ Ω. In fact, if ∂Ω is smooth enough, then u ∈ H²(Ω) (cf. [39, Theorem 3.1, p. 212]).

6.6 Fourier Series Expansions


Let (H, (·, ·), ‖·‖) be a Hilbert space with m := dim H ≥ 1.
If m < ∞, then starting from a basis of H, say B = {e₁, ..., e_m}, one can construct by the Gram–Schmidt procedure (see Chap. 1) an orthonormal basis B′ = {u₁, ..., u_m}, i.e., (u_i, u_j) = δ_{ij}, i, j = 1, ..., m. So every u ∈ H can be written as
$$u = \sum_{i=1}^{m} c_i u_i\,, \quad c_i \in K\,,\ i = 1, \dots, m\,.$$
This yields c_i = (u, u_i), i = 1, ..., m, hence
$$u = \sum_{i=1}^{m} (u, u_i)\, u_i \quad \forall u \in H\,. \tag{6.6.25}$$
(6.6.25) is called the Fourier expansion of u, and the (u, u_i) are called Fourier coefficients.⁸
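The finite-dimensional expansion (6.6.25) can be verified directly. The sketch below (our illustration; the starting basis B of R³ and the vector u are arbitrary choices) runs classical Gram–Schmidt and reconstructs u from its Fourier coefficients:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)   # subtract projections
        basis.append(w / np.linalg.norm(w))
    return basis

B = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(B)

u = np.array([2.0, -1.0, 3.0])
expansion = sum(np.dot(u, ui) * ui for ui in U)   # u = sum_i (u, u_i) u_i, as in (6.6.25)
assert np.allclose(expansion, u)

gram = np.array([[np.dot(a, b) for b in U] for a in U])
assert np.allclose(gram, np.eye(3))               # (u_i, u_j) = delta_ij
```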

In what follows we are interested in Fourier series expansions in the case m = ∞. A set S ⊂ H is said to be an orthonormal set if for any pair u, v ∈ S, u ≠ v, we have
$$\|u\| = \|v\| = 1\,, \quad \text{and} \quad (u, v) = 0\,.$$
An orthonormal set S ⊂ H is called a complete orthonormal system in H if it is not properly included in any other orthonormal set in H.

⁸ Jean-Baptiste Joseph Fourier, French mathematician and physicist, 1768–1830.

Remark 6.20. Claim: any Hilbert space H ≠ {0} has a complete orthonormal system, and any orthonormal set can be extended to a complete orthonormal system. Indeed, choosing x ∈ H \ {0} and denoting u₁ = (1/‖x‖)x, we see that {u₁} is an orthonormal system in H. Consider the collection of all orthonormal systems in H which contain {u₁}. This collection is partially ordered with respect to the usual inclusion relation. By Zorn's Lemma there exists a maximal element of the collection, which is a complete orthonormal system in H. If m = ∞ then this system is infinite, be it countable or not (this issue will be clarified later).
Theorem 6.21. Let (H, (·, ·), ‖·‖) be an infinite dimensional Hilbert space and let S = {u_n}_{n∈N} ⊂ H be a countably infinite orthonormal system. Then the following are equivalent:

(a) S is complete;

(b) $u = \sum_{n=1}^{\infty} (u, u_n)\, u_n$ for all u ∈ H;

(c) $\sum_{n=1}^{\infty} |(u, u_n)|^2 = \|u\|^2$ for all u ∈ H (Parseval's relation)⁹;

(d) Span S is dense in H.


Proof. First of all, using the orthogonality of the system S, we have for all u ∈ H and N ∈ N
$$0 \le \Big\| \sum_{n=1}^{N} (u, u_n)\, u_n - u \Big\|^2 = \|u\|^2 - \sum_{n=1}^{N} |(u, u_n)|^2\,. \tag{6.6.26}$$

We deduce from (6.6.26) that (b) ⇐⇒ (c).
Let us prove that (b) =⇒ (a). Assume by contradiction that (b) holds but S is not complete, i.e., there exists a vector û ∈ H \ S such that ‖û‖ = 1 and (û, u_n) = 0 ∀n ∈ N. From (b) with u = û it then follows that û = 0, which is a contradiction.
Now we prove that (a) =⇒ (b). Fix u ∈ H. By a standard computation we get
$$\Big\| \sum_{n=m}^{m+p} (u, u_n)\, u_n \Big\|^2 = \sum_{n=m}^{m+p} |(u, u_n)|^2\,. \tag{6.6.27}$$
Since the numerical series $\sum_{n=1}^{\infty} |(u, u_n)|^2$ is convergent (see (6.6.26)), we deduce from (6.6.27) that the sequence of partial
⁹ Marc-Antoine Parseval, French mathematician, 1755–1836.

sums of the series in (b) is Cauchy in H, hence convergent to some ũ ∈ H, so we can write
$$\tilde{u} = \sum_{n=1}^{\infty} (u, u_n)\, u_n\,. \tag{6.6.28}$$
We compute
$$(\tilde{u}, u_j) = \lim_{N \to \infty} \Big( \sum_{n=1}^{N} (u, u_n)\, u_n,\ u_j \Big) = (u, u_j) \quad \forall j \in \mathbb{N}\,,$$
so
so
(ũ − u, uj ) = 0 ∀j ∈ N,
which implies ũ = u by the completeness of S. Therefore ũ in (6.6.28)
can be replaced by u.
It is clear that (b) =⇒ (d). To complete the proof it suffices to show
(d) =⇒ (a). Assume by contradiction that (d) holds but S is not
complete, i.e., there exists a vector v ∈ H \ S such that ‖v‖ = 1 and
(v, un ) = 0 ∀n ∈ N. According to (d), we obtain (v, w) = 0 ∀w ∈ H,
hence v = 0, another contradiction.

Remark 6.22. According to Theorem 6.21, if S = {un }n∈N is a com-


plete orthonormal system in H, then every u ∈ H is the sum of the
Fourier series associated with it (see (b)), similar to the finite dimen-
sional case m < ∞. That is why S is also called a countable or-
thonormal basis of H. The next result is a characterization of the
Hilbert spaces possessing countable orthonormal bases.
Theorem 6.23. A Hilbert space has a countable orthonormal basis if
and only if it is separable.
Proof. Let H be a Hilbert space. Denote m := dim H.
If m < ∞, then the result is trivial, so let us assume m = ∞.
Let S = {un }n∈N be a (countable) orthonormal basis in H. Then
Span S is dense in H (cf. Theorem 6.21). On the other hand, using
the fact that Q is dense in R, we can show that there exists a countable
subset of Span S which is dense in Span S, hence in H. Indeed, for any u ∈ Span S, say $u = \sum_{k=1}^{p} \alpha_k u_k$, and any ε > 0, there are numbers r_k ∈ Q if H is a real Hilbert space, or r_k ∈ Q + iQ if H is a complex Hilbert space, such that
$$|r_k - \alpha_k| < \frac{\varepsilon}{p}\,,\ k = 1, \dots, p \quad \Longrightarrow \quad \Big\| u - \sum_{k=1}^{p} r_k u_k \Big\| < \varepsilon\,.$$

Thus H is separable.
Conversely, assume H is separable, i.e., there exists a countably in-
finite set, say M = {x1 , x2 , . . . , xn , . . . } such that M = H. Using
Gram–Schmidt (see Chap. 1) we can construct with vectors from M
an orthonormal system S = {u1 , u2 , . . . , un , . . . } eliminating depen-
dent vectors of M if any. An inspection of the Gram–Schmidt method
shows that in fact M ⊂ Span S so that

H = Cl M ⊂ Cl Span S ⊂ H ⇒ Cl Span S = H,

so S is an orthonormal basis (cf. Theorem 6.21).
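The Gram–Schmidt construction invoked in this proof is easy to sketch in code. The fragment below is an illustration only (the vectors in R³ are hypothetical): it orthonormalizes a list of vectors and silently discards dependent ones, which is exactly the elimination step described above.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors`, discarding (numerically) dependent ones."""
    basis = []
    for v in vectors:
        # subtract the projections onto the already accepted directions
        w = v - sum(np.dot(v, b) * b for b in basis)
        norm = np.linalg.norm(w)
        if norm > tol:  # keep only independent vectors
            basis.append(w / norm)
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([2.0, 1.0, 1.0])]  # the third is the sum of the first two
basis = gram_schmidt(vs)
print(len(basis))  # 2: the dependent vector was eliminated
```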

Remark 6.24. If H is not separable, then the existence of a complete


orthonormal system S = {ui }i∈I in H is still valid (cf. Remark 6.20).
Obviously, the index set I is no longer countable. Surprisingly, in this
case, for every u ∈ H there is a sequence of indices i1 , i2 , . . . such that


u = ∑_{j=1}^{∞} (u, u_{ij}) u_{ij},

i.e., u has a Fourier series expansion as in the separable case. For the
proof of this result, see [51, pp. 86–87].

A Classical Fourier Series Expansion


Let H = L2(−π, π) with the usual scalar product
(f, g) = ∫_{−π}^{π} f(x) g(x) dx, f, g ∈ H,
and ‖f‖ = (f, f)^{1/2} for all f ∈ H. Let S = {un}_{n=0}^{∞}, where
u0 = 1/√(2π), u_{2k−1}(x) = (1/√π) cos kx, u_{2k}(x) = (1/√π) sin kx, k = 1, 2, . . . .

By a straightforward computation we can see that S is an orthonormal


system in H. Moreover, S is complete as stated in the following result.
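This orthonormality can also be checked numerically. The sketch below (an illustration, not part of the text) approximates the scalar products (um, un) of L²(−π, π) with the trapezoidal rule and compares the Gram matrix with the identity.

```python
import numpy as np

def trapz(y, x):
    # explicit trapezoidal rule
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x = np.linspace(-np.pi, np.pi, 20001)

def u(n):
    if n == 0:
        return np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))
    k = (n + 1) // 2
    if n % 2 == 1:
        return np.cos(k * x) / np.sqrt(np.pi)  # u_{2k-1}
    return np.sin(k * x) / np.sqrt(np.pi)      # u_{2k}

G = np.array([[trapz(u(m) * u(n), x) for n in range(7)] for m in range(7)])
print(np.max(np.abs(G - np.eye(7))) < 1e-8)  # True: Gram matrix ~ identity
```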

Theorem 6.25 (Fischer10 –Riesz). The orthonormal system S as above


is a basis in H = L2 (−π, π).

10
Ernst Sigismund Fischer, Austrian mathematician, 1875–1954.
190 6 Hilbert Spaces

Proof. According to Theorem 6.21 it suffices to prove that


Cl Span S = H. We know that C0∞(−π, π) is dense in L2(−π, π) (see
Theorem 5.8). To conclude we can use Weierstrass’ lemma below (cf.
[52, p. 205]). This is an approximation result with respect to the
sup-norm of C[−π, π] which is obviously stronger than the norm of
H = L2 (−π, π).

Lemma 6.26 (Weierstrass). Span S is dense in the space

X = {f ∈ C[−π, π]; f (−π) = f (π)}

equipped with the sup-norm ‖ · ‖C , where S is the function system


defined above.

Proof. Let f ∈ X be even, i.e., f (−x) = f (x), x ∈ [−π, π]. Since the
function y → f (arccos y) is continuous on [−1, 1], for all ε > 0 there
exists a Bernstein11 polynomial p such that

sup_{y∈[−1,1]} |f(arccos y) − p(y)| < ε ⇐⇒ sup_{x∈[0,π]} |f(x) − p(cos x)| < ε. (6.6.29)
In fact, since both f and x → p(cos x) are even, we can extend (6.6.29)
to [−π, π],
sup_{x∈[−π,π]} |f(x) − p(cos x)| < ε. (6.6.30)

By elementary trigonometric formulas we see that p(cos x) ∈ Span S,


so (6.6.30) concludes the proof in the case when f is even.
Now, consider an odd function f ∈ X, so f(−π) = f(π) = f(0) = 0.
Then x ↦ f(x)/sin x is an even function, but it has singularities at x = 0, ±π.
So for δ > 0 small we consider
f̃(x) = f(π(x − δ)/(π − 2δ)) for x ∈ (δ, π − δ), f̃(x) = 0 for x ∈ [0, δ] ∪ [π − δ, π],

and f˜(x) := −f˜(−x) for x ∈ [−π, 0). Clearly, f˜ is a continuous odd


function which approximates f uniformly. Now define

ψ(x) = f̃(x)/sin x for x ∈ [−π, π] \ {0, ±π}, ψ(x) = 0 for x ∈ {0, ±π}.
11
Sergei N. Bernstein, Russian mathematician, 1880–1968.

Thus, by the first part of the proof,

∀ε > 0, ∃q ∈ Span S such that ‖ψ − q‖C < ε =⇒ ‖f̃ − q sin x‖C ≤ ε.

Obviously q sin x ∈ Span S and thus odd continuous functions can be


approximated as well by elements in Span S.
To conclude the proof, it is enough to notice that any function f can
be decomposed into f = fe + fo where

fe(x) = (1/2)[f(x) + f(−x)], fo(x) = (1/2)[f(x) − f(−x)],
are even and odd, respectively.

Some Comments

1. Since L2 (−π, π) has a countable orthonormal basis S, it fol-


lows that L2 (−π, π) is separable (by Theorem 6.23). Obviously,
L2 (a, b) is separable for any a < b. In fact, for any measurable
set Ω ⊂ Rk , Lp (Ω) is separable for all p ∈ [1, ∞) (see, e.g., [6, p.
95]).

2. By Theorems 6.21 and 6.25 it follows that every


u ∈ L2 (−π, π) is the sum of the Fourier series associated with it,
i.e.,
u = ∑_{n=0}^{∞} (u, un)un, (6.6.31)

meaning that sn(u) = ∑_{k=0}^{n} (u, uk)uk converges strongly to u
in L2 (−π, π). Taking into account the structure of the basis S,
(6.6.31) can be written as

u(x) = a0/2 + ∑_{n=1}^{∞} (an cos nx + bn sin nx), (6.6.32)

where
a0 = (1/π) ∫_{−π}^{π} u(t) dt, ak = (1/π) ∫_{−π}^{π} u(t) cos(kt) dt,
bk = (1/π) ∫_{−π}^{π} u(t) sin(kt) dt, (6.6.33)

for all k ∈ N. Note that (6.6.32) is precisely the classical form of


the Fourier series associated with u. For the moment, we know
(by Fischer–Riesz) that for u ∈ L2 (−π, π) the series expansion
(6.6.32) is valid in L2 (−π, π), i.e.,

sn(u)(x) = a0/2 + ∑_{k=1}^{n} (ak cos kx + bk sin kx) (6.6.34)

converges to u in L2 (−π, π). Then there is a subsequence of


(sn (u)) that converges to u for a.a. x ∈ (−π, π). There is a
question whether the sequence (sn (u)) itself converges a.e., i.e.,
(6.6.32) holds for a.a. x ∈ (−π, π). This question was posed in
1920 by Luzin.12 In 1966, Carleson13 proved that this is indeed
the case. The proof is not trivial and is omitted. Later, Hunt14
extended the result to Lp -functions, i.e., the series expansion
(6.6.32) holds a.e. for every Lp -function u, for 1 < p < ∞. On
the other hand, in 1922 Kolmogorov15 gave a counterexample
showing that it does not hold for p = 1. However, the Fourier
expansion (6.6.32) holds for L1 -functions in the sense of distri-
butions, as explained below.
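The L² convergence of the partial sums (6.6.34) is easy to observe numerically. The sketch below is an illustration with the sample function u(x) = x (an arbitrary choice): it computes the coefficients by (6.6.33) with the trapezoidal rule and checks that the L²(−π, π) error of sn(u) decreases as n grows.

```python
import numpy as np

def trapz(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x = np.linspace(-np.pi, np.pi, 20001)
u = x.copy()  # sample function u(x) = x

def s_n(n):
    # partial sum (6.6.34) with coefficients from (6.6.33)
    s = np.full_like(x, trapz(u, x) / np.pi / 2.0)
    for k in range(1, n + 1):
        ak = trapz(u * np.cos(k * x), x) / np.pi
        bk = trapz(u * np.sin(k * x), x) / np.pi
        s += ak * np.cos(k * x) + bk * np.sin(k * x)
    return s

def l2_err(n):
    return np.sqrt(trapz((u - s_n(n)) ** 2, x))

print(l2_err(5) > l2_err(20) > l2_err(80))  # True: the error decreases
```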

Fourier Series Expansions of L1 Functions


Recall that in general L1 functions do not admit Fourier series expan-
sions in classical theory. However, the Fourier coefficients of u (see
(6.6.33)) are still well defined if u ∈ L1 (−π, π). Fix such a function
u ∈ L1 (−π, π) and associate with it the series

u(x) ≈ a0/2 + ∑_{n=1}^{∞} (an cos nx + bn sin nx).

We can prove that



u(x) = a0/2 + ∑_{n=1}^{∞} (an cos nx + bn sin nx) in D′(−π, π), (6.6.35)

where ak , bk are the Fourier coefficients of u defined in (6.6.33).


12
Nikolai N. Luzin, Russian mathematician, 1883–1950.
13
Lennart Axel Edvard Carleson, Swedish mathematician, born 1928.
14
Richard Allen Hunt, American mathematician, 1937–2009.
15
Andrey N. Kolmogorov, Russian mathematician, 1903–1987.

Recall that distributions are not defined pointwise, and the appearance
of x in (6.6.35) is simply for convenience.
In order to prove (6.6.35), consider the series

(a0/4) x² + ∑_{n=1}^{∞} ( −(an/n²) cos nx − (bn/n²) sin nx ),

which is obtained by formally integrating twice in the right-hand side


of (6.6.35). This series is uniformly and absolutely convergent since
for all n ≥ 1

| −(an/n²) cos nx − (bn/n²) sin nx | ≤ (1/n²)(|an| + |bn|) ≤ (4/(n²π)) ∫_{−π}^{π} |u(t)| dt = C/n².

Let

s(x) = (a0/4) x² + ∑_{n=1}^{∞} ( −(an/n²) cos nx − (bn/n²) sin nx ). (6.6.36)

Of course, uniform convergence on [−π, π] implies convergence in
D′(−π, π), so (6.6.36) also holds in D′(−π, π). Differentiating (6.6.36)
twice in the sense of distributions we get

a0/2 + ∑_{n=1}^{∞} (an cos nx + bn sin nx) = s″ in D′(−π, π). (6.6.37)

Finally we must show that s″ = u, i.e., that the distribution s″ is generated by the
function u. We consider the partial sums

sl(u)(x) = a0/2 + ∑_{n=1}^{l} (an cos nx + bn sin nx).

For φ ∈ D(−π, π) = C0∞ (−π, π) we have


(sl(u), φ) = ∫_{−π}^{π} sl(u)(x) φ(x) dx
= ∫_{−π}^{π} φ(x) [ a0/2 + ∑_{n=1}^{l} (an cos nx + bn sin nx) ] dx
= ∫_{−π}^{π} φ(x) [ (1/2π) ∫_{−π}^{π} u(t) dt
+ ∑_{n=1}^{l} ( (1/π) cos nx ∫_{−π}^{π} u(t) cos nt dt
+ (1/π) sin nx ∫_{−π}^{π} u(t) sin nt dt ) ] dx,
π −π

and now we change the order of integration to get


(sl(u), φ) = ∫_{−π}^{π} u(t) sl(φ)(t) dt. (6.6.38)
−π

On the other hand,

lim_{l→∞} sl(φ)(t) = φ(t) uniformly for t ∈ [−π, π]. (6.6.39)

Indeed, if we denote

Ak = (1/π) ∫_{−π}^{π} φ(t) cos kt dt (k ≥ 0), Bk = (1/π) ∫_{−π}^{π} φ(t) sin kt dt (k ≥ 1),

we may integrate by parts twice (since φ is infinitely differentiable), so


that for k ≥ 1

Ak = −(1/(kπ)) ∫_{−π}^{π} φ′(t) sin kt dt = −(1/(k²π)) ∫_{−π}^{π} φ″(t) cos kt dt,

and similarly

Bk = −(1/(k²π)) ∫_{−π}^{π} φ″(t) sin kt dt.

Therefore there exists a constant C1 > 0 (depending on φ) such that

|Ak| ≤ C1/k², |Bk| ≤ C1/k² ∀k ≥ 1. (6.6.40)

As Ak , Bk are the Fourier coefficients of φ, we deduce from (6.6.40)


that the Fourier series of φ is uniformly convergent (see Weierstrass’ M
Test) and its sum is φ (by the classical theory, or by Theorem 6.25),
i.e., (6.6.39) holds. Finally, taking into account (6.6.39) and letting
l → ∞ in (6.6.38), we get
(s″, φ) = ∫_{−π}^{π} u(t)φ(t) dt = (u, φ).

As φ was arbitrarily chosen, this implies s″ = u, as claimed.
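The 1/k² decay of Ak, Bk obtained above by two integrations by parts can be observed numerically. The sketch below uses the hypothetical test function φ(t) = (π² − t²)³ (chosen for illustration: it is smooth and vanishes together with φ′ at ±π) and checks that k² · max(|Ak|, |Bk|) remains bounded.

```python
import numpy as np

def trapz(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

t = np.linspace(-np.pi, np.pi, 20001)
phi = (np.pi ** 2 - t ** 2) ** 3  # hypothetical smooth test function

def coeff(k):
    Ak = trapz(phi * np.cos(k * t), t) / np.pi
    Bk = trapz(phi * np.sin(k * t), t) / np.pi
    return max(abs(Ak), abs(Bk))

scaled = [k ** 2 * coeff(k) for k in range(1, 40)]
print(max(scaled) < 1e4)  # True: k^2 |A_k|, k^2 |B_k| stay bounded
```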

6.7 Exercises
1. Let ∅ = Ω ⊂ Rk be an open set and let p ∈ (1, ∞). It is well
known that Lp (Ω) is a Banach space with respect to the usual
norm
‖u‖_{Lp(Ω)} = ( ∫_Ω |u(x)|^p dx )^{1/p}, u ∈ Lp(Ω).
 
Prove that (Lp(Ω), ‖ · ‖_{Lp(Ω)}) is a Hilbert space if and only if p = 2.

2. Let H be a pre-Hilbert space, i.e., a linear space equipped with


a scalar product (·, ·) and the induced norm ‖ · ‖. Show that for
x, y ∈ H we have |(x, y)| = ‖x‖ · ‖y‖ if and only if x and y are
linearly dependent.

3. Let −∞ < a < b < ∞. Show that C[a, b] with the sup-norm is
not a Hilbert space.

4. Let n be a given natural number. Let C be the set of all poly-


nomials with real coefficients of degree ≤ n. Show that for any
u ∈ L2 (0, 1) there exists a unique pu ∈ C such that

‖u − pu‖_{L2(0,1)} ≤ ‖u − p‖_{L2(0,1)} ∀p ∈ C.



5. Let (H, ‖ · ‖) be a Hilbert space. Define P : H → H by

Pu = u if ‖u‖ ≤ 1, Pu = ‖u‖⁻¹ u if ‖u‖ > 1.

(Operator P is called radial retraction). Prove that

(i) P is nonexpansive, i.e., Lipschitzian with Lipschitz constant


L = 1;
(ii) if H is a general Banach space, then P is Lipschitzian with
L = 2.
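Part (i) can be probed numerically before attempting a proof. The sketch below takes H = R³ with the Euclidean norm (an illustrative choice) and checks ‖Pu − Pv‖ ≤ ‖u − v‖ on random pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def P(u):
    # radial retraction onto the closed unit ball
    n = np.linalg.norm(u)
    return u if n <= 1.0 else u / n

worst = 0.0
for _ in range(2000):
    u = 3.0 * rng.standard_normal(3)
    v = 3.0 * rng.standard_normal(3)
    d = np.linalg.norm(u - v)
    if d > 0.0:
        worst = max(worst, np.linalg.norm(P(u) - P(v)) / d)
print(worst <= 1.0 + 1e-12)  # True: P is nonexpansive in a Hilbert space
```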

6. Let R3 be equipped with the usual scalar product and Euclidean


norm. Set

M = {x = (x1 , x2 , x3 )T ∈ R3 ; 2x1 − x2 − 3x3 = 0}.

Show that M is a closed linear subspace of R3 . Determine M ⊥


and for x = (1, 2, −1)T determine PM x and write x as a direct
sum of vectors in M and M ⊥ , i.e., x = x1 +x2 , x1 ∈ M, x2 ∈ M ⊥ .

7. Let −∞ < a < b < +∞ and let L2 (a, b) := L2 (a, b; R) be


equipped with the usual scalar product and norm. Show that
M = { u ∈ L2(a, b); ∫_a^b u(t) dt = 0 }

is a closed linear subspace of L2 (a, b). Determine M ⊥ and write


any u ∈ L2 (a, b) as a direct sum of vectors in M and M ⊥ , i.e.,
u = u1 + u2 , u1 ∈ M, u2 ∈ M ⊥ .

8. Same exercise for L2 (−1, 1) and

M = {u ∈ L2 (−1, 1); u(t) = u(−t) for a.a. t ∈ (−1, 1)}.

9. Show that for any linear subspace Y of a Hilbert space (H, (·, ·)) one
has (Y⊥)⊥ = Cl Y.

10. Let H = L2(0, 1) be the real Hilbert space equipped with the
usual scalar product and norm. Is the subspace Y = {u ∈
H; ∫_0^1 (u(t)/t) dt = 0} closed in H?

11. Prove that the dual of any Hilbert space is a Hilbert space, too.
12. Let {un }∞n=1 be an orthonormal basis in a Hilbert space H and
let (an )n∈N be a bounded sequence in R. Prove that

(i) the sequence (vn )n∈N defined by

vn = (1/n) ∑_{i=1}^{n} ai ui, n ∈ N,

converges strongly to zero;



(ii) the sequence (√n vn)n∈N converges weakly to zero.
13. Let (H,  · ) be a Hilbert space and A ∈ L(H). Show that the
following two conditions are equivalent:

(i) there exists a constant c > 0 such that c‖x‖ ≤ ‖Ax‖ ∀x ∈ H;

(ii) there exists an operator B ∈ L(H) such that B ◦ A = I,


where I is the identity operator on H.
14. Let (H,  · , (·, ·)) be a real Hilbert space. For any A ∈ L(H)
satisfying (Ax, x) ≥ 0 ∀x ∈ H, we have

(i) H = N (A) ⊕ [Cl R(A)];

(ii) for all t > 0, I + tA is bijective and


lim (I + tA)−1 u = PN (A) u ∀u ∈ H,
t→∞

where I denotes the identity operator.


15. Let (un)n∈N be a sequence in a Hilbert space (H, ‖ · ‖) which
is weakly convergent to a point u ∈ H. If, in addition,
lim sup ‖un‖ ≤ ‖u‖, then show ‖un − u‖ → 0.
16. Prove that for any f ∈ L1 (0, 1) there exists a unique u ∈ H01 (0, 1)
satisfying
∫_0^1 u′(t)v′(t) dt + ∫_0^1 u(t)v(t) dt = ∫_0^1 f(t)v(t) dt ∀v ∈ H_0^1(0, 1),

and, furthermore, u ∈ W 2,1 (0, 1) and


−u″ + u = f a.e. in (0, 1),
u(0) = 0, u(1) = 0.

17. Let f ∈ L2 (0, 1) and α > 0.

(i) Show that the following boundary value problem, denoted


(P ),



u ∈ H2(0, 1),
−u″(t) + αu(t) = f(t) for a.a. t ∈ (0, 1),
u′(0) = 0, u′(1) = u(1),

is equivalent to the variational formulation, denoted (P̃ ),

u ∈ H1(0, 1), −u(1)v(1) + ∫_0^1 u′v′ + α ∫_0^1 uv = ∫_0^1 fv ∀v ∈ H1(0, 1).

(ii) Using Lax–Milgram prove that for α large enough there


exists a unique solution u of problem (P ).

(iii) Show that the solution u can be expressed as the minimizer


of a functional defined on H 1 (0, 1).

18. Let (H, (·, ·)) be a Hilbert space and let Y ⊂ H be a closed
subspace with an orthonormal basis {un}_{n=1}^{∞}. Prove that for every
y ∈ H the closest point to y in Y is ∑_{n=1}^{∞} (y, un)un.

19. Let (H,  · ) be an infinite dimensional, separable Hilbert space.


Show that for any x ∈ H, ‖x‖ ≤ 1, there exists a sequence
(xn)n∈N in H such that ‖xn‖ = 1 for all n ∈ N and xn → x
weakly.

20. Find the Fourier expansions of the functions

f1(x) = cos x − |x|, −π ≤ x ≤ π,

f2(x) = −3x + sin x, −π ≤ x ≤ π,

f3(x) = −1 for −π ≤ x ≤ 0, f3(x) = x + 1 for 0 ≤ x ≤ π,

f4(x) = x + 1 for −1 ≤ x ≤ 0, f4(x) = x² − 1 for 0 ≤ x ≤ 1.
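As a numerical aid for this exercise (an illustration, not a solution), the sketch below evaluates the coefficient formulas (6.6.33) for f2. Since f2 is odd, the cosine coefficients vanish; the −3x part contributes −6(−1)^{n+1}/n to bn and sin x adds 1 to b1, so b1 = −5.

```python
import numpy as np

def trapz(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x = np.linspace(-np.pi, np.pi, 20001)
f2 = -3.0 * x + np.sin(x)

a = [trapz(f2 * np.cos(k * x), x) / np.pi for k in range(0, 6)]
b = [trapz(f2 * np.sin(k * x), x) / np.pi for k in range(1, 6)]
print(max(abs(v) for v in a) < 1e-8)  # True: f2 is odd, no cosine terms
print(abs(b[0] - (-5.0)) < 1e-4)      # True: b_1 = -5
```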
Chapter 7

Adjoint, Symmetric, and


Self-adjoint Linear
Operators

Here we first recall the definition of the adjoint of a linear operator


and discuss some related results. Then we shall address the case of
compact operators A : H → H, where H is a Hilbert space, and
present the Fredholm theorem as an application. The last section is
devoted to symmetric operators and self-adjoint operators.
Throughout this chapter we consider linear operators between linear
spaces over K, where K is either R or C, unless otherwise specified.

7.1 The Adjoint of a Linear Operator


Let X, Y be Banach spaces with duals X∗ and Y∗ and let A : D(A) ⊂
X → Y be a linear operator that is densely defined: Cl D(A) = X. The
adjoint of A is an operator A∗ : D(A∗) ⊂ Y∗ → X∗ defined as follows.
The domain of A∗ is the set

D(A∗) = {y∗ ∈ Y∗; ∃c > 0 such that |y∗(Ax)| ≤ c‖x‖ ∀x ∈ D(A)},

which is a linear subspace of Y ∗ . Note that for y ∗ ∈ D(A∗ ) the linear


functional f(x) = y∗(Ax) is continuous on D(A) (equipped with the
norm ‖ · ‖ of X), i.e., |f(x)| ≤ c‖x‖ for all x ∈ D(A). According to the
Hahn–Banach Theorem, f can be extended to a functional g ∈ X ∗ ,

© Springer Nature Switzerland AG 2019 201


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 7

such that |g(x)| ≤ c‖x‖ for all x ∈ X. This extension is unique since
D(A) is dense in X. We now define

A∗ y ∗ = g

and we can write

y ∗ (Ax) = (A∗ y ∗ )(x) ∀x ∈ D(A), y ∗ ∈ D(A∗ ) . (7.1.1)

Example.
Let X = Y = l1 (for the definition of l1 see Chap. 4). Let A : D(A) ⊂
l1 → l1 be defined by

D(A) = {(un)n≥1 ∈ l1; (nun)n≥1 ∈ l1}, A(un) = (nun).

Obviously, D(A) is dense in l1 . It is also easily seen that

D(A∗ ) = {(yn )n≥1 ∈ l∞ ; (nyn )n≥1 ∈ l∞ }, A∗ (yn ) = (nyn ) .

Note that both A and A∗ are closed operators, i.e., their graphs are
closed in l1 ×l1 and l∞ ×l∞ , respectively. In fact, we have the following
general result:

Theorem 7.1. Let X and Y be Banach spaces and let A : D(A) ⊂


X → Y be a densely defined, linear operator. Then A∗ is closed.

Proof. Let (yn∗ ) be a sequence in D(A∗ ) such that yn∗ → y ∗ in Y ∗ and


A∗ yn∗ → x∗ in X ∗ . We have

yn∗ (Ax) = (A∗ yn∗ )(x) ∀x ∈ D(A) ,

which yields (by letting n → ∞)

y ∗ (Ax) = x∗ (x) ∀x ∈ D(A) .

Therefore y ∗ ∈ D(A∗ ) and A∗ y ∗ = x∗ .

We also have the following results about continuity.



Theorem 7.2. Let X, Y be Banach spaces with duals X ∗ and Y ∗ . If


A ∈ L(X, Y), then A∗ ∈ L(Y∗, X∗), and ‖A‖ = ‖A∗‖.

Proof. Obviously, D(A∗ ) = Y ∗ . From (7.1.1) we deduce (using the


same symbol ‖ · ‖ for different norms)

|(A∗y∗)(x)| ≤ ‖y∗‖ · ‖A‖ · ‖x‖ ∀x ∈ X, y∗ ∈ Y∗.

Therefore,

‖A∗y∗‖ ≤ ‖A‖ · ‖y∗‖ ∀y∗ ∈ Y∗ ⇒ ‖A∗‖ ≤ ‖A‖.

On the other hand, using (7.1.1) again, we obtain

|y∗(Ax)| ≤ ‖A∗‖ · ‖y∗‖ · ‖x‖ ∀x ∈ X, y∗ ∈ Y∗,

hence, by Corollary 4.18 in Chap. 4,

‖Ax‖ ≤ ‖A∗‖ · ‖x‖ ∀x ∈ X ⇒ ‖A‖ ≤ ‖A∗‖.

We continue with some simple properties of adjoint operators. Let


X, Y, Z be three Banach spaces over K, where K is the same (either
R or C) for all the three spaces. Then the following properties hold:

(a) If A : D(A) ⊂ X → Y is a densely defined, linear operator, and


B : D(B) ⊂ X → Y is another linear operator, such that A ⊂ B (i.e.,
D(A) ⊂ D(B) and Bx = Ax ∀x ∈ D(A)), then B ∗ ⊂ A∗ ;

(b) For all α , β ∈ K and A, B ∈ L(X, Y ) ,

(αA + βB)∗ = αA∗ + βB ∗ ;

(c) If A ∈ L(X, Y ) and B ∈ L(Y, Z), then (B ◦ A)∗ = A∗ ◦ B ∗ .

(d) If A ∈ L(X, Y ) is bijective, then A∗ is bijective too, and

(A∗ )−1 = (A−1 )∗ .

The proofs are left to the reader.



7.2 Adjoints of Operators on Hilbert Spaces


Let (H, (·, ·),  · ) be a Hilbert space. Let A : D(A) ⊂ H → H be a
densely defined, linear operator. Taking into account the Riesz Rep-
resentation Theorem, the adjoint of A can be redefined as an operator
from H into itself, as follows:

D(A∗) = {y ∈ H; ∃c > 0 such that |(Ax, y)| ≤ c‖x‖ ∀x ∈ D(A)}.

Now, for y ∈ D(A∗ ) the linear functional x → (Ax, y) (which is con-


tinuous on (D(A),  · )) can be extended uniquely to a functional
belonging to H ∗ so (by Riesz) there is a corresponding element in H,
denoted A∗ y. Thus we have a linear operator A∗ : D(A∗ ) ⊂ H → H,
such that

(Ax, y) = (x, A∗ y) ∀x ∈ D(A), y ∈ D(A∗ ) . (7.2.2)

In fact, if R : H → H ∗ denotes the Riesz isomorphism, this adjoint is


nothing else but the operator R−1 ◦ A∗ ◦ R, with A∗ being the adjoint
defined in the previous section. Whenever we deal with a densely
defined linear operator A : D(A) ⊂ H → H, we shall associate with A
the A∗ defined in this section. It is easily seen that all the properties
discussed in the previous section remain valid, except for (b) which
now takes the form
(b′) For all α, β ∈ K and A, B ∈ L(H),

(αA + βB)∗ = ᾱA∗ + β̄B ∗ ,

where ᾱ, β̄ denote the complex conjugates of α, β. In fact, for any


α ∈ K and any densely defined, linear operator A : D(A) ⊂ X → Y ,
we have
(αA)∗ = ᾱA∗ .

If H is finite dimensional, then the matrix corresponding to A∗ is


the transposed conjugate of the matrix corresponding to A (while the
matrix associated with the adjoint of A as defined in the previous
section is just the transpose of the matrix corresponding to A. This
shows the difference between the two notions of adjoint).
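This finite-dimensional fact is easy to verify numerically. The sketch below (with a random complex 4×4 matrix, an illustrative choice) checks the defining identity (Ax, y) = (x, A∗y) when A∗ is taken to be the conjugate transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

def inner(u, v):
    # (u, v) = sum_j u_j * conj(v_j); np.vdot conjugates its first argument
    return np.vdot(v, u)

lhs = inner(A @ x, y)
rhs = inner(x, A.conj().T @ y)   # A* = conjugate transpose
print(abs(lhs - rhs) < 1e-10)    # True
```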

If A ∈ L(H) := L(H, H), then A∗∗ := (A∗ )∗ = A. Indeed, we have

(Ax, y) = (x, A∗ y)
= (A∗ y, x)
= (y, A∗∗ x)
= (A∗∗ x, y) ∀x, y ∈ H ,

which proves the assertion.

7.2.1 The Case of Compact Operators


Denote by K(H) := K(H, H) the space of compact linear operators
from H into itself. This is a closed subspace of L(H) := L(H, H) with
respect to the operator norm, hence K(H) is a Banach space with
respect to this norm (see Theorem 4.11).

Theorem 7.3. If (H, (·, ·), ·) is a Hilbert space and A ∈ K(H), then
the nullspace of I − A, denoted N = N (I − A), is a finite dimensional
subspace of H, where I denotes the identity operator of H.

Proof. Obviously N is a (closed) linear subspace of (H,  · ). Let Q be


a bounded subset of N . Since A is compact and Q = AQ we deduce
that Q is relatively compact in (N ,  · ). According to Theorem 2.24,
N is finite dimensional.

Theorem 7.4 (Schauder1 ). If (H, (·, ·),  · ) is a Hilbert space and


A ∈ K(H) then A∗ ∈ K(H), too.

Proof. Let r > 0 be arbitrary but fixed. Since A∗ ∈ L(H), the set
A∗B(0, r) is bounded: ‖x‖ < r =⇒ ‖A∗x‖ ≤ r‖A∗‖. As A is compact,
it follows that for any sequence (xn )n≥1 in B(0, r) the sequence ((A ◦
A∗ )xn )n≥1 has a convergent subsequence, say ((A ◦ A∗ )xnk )k≥1 . We
also have

‖A∗xnk − A∗xnj‖² = (A∗(xnk − xnj), A∗(xnk − xnj))
= (xnk − xnj, A(A∗(xnk − xnj)))
≤ 2r ‖(A ◦ A∗)xnk − (A ◦ A∗)xnj‖,

so (A∗ xnk )k≥1 is convergent.

1
Juliusz Pawel Schauder, Polish mathematician, 1899–1943.

Remark 7.5. Let A ∈ L(H). Then A is compact if and only if A∗ is


compact. This follows from Schauder’s Theorem above combined with
(A∗ )∗ = A.
Remark 7.6. If A, B ∈ L(H) and at least one is compact, then A ◦ B
is compact as well.
We continue with an important result, essentially due to Fredholm,2
that provides a necessary and sufficient condition for an operator equa-
tion involving a compact linear operator to have a solution.

Theorem 7.7 (Fredholm). Let (H, (·, ·),  · ) be a Hilbert space and
let A ∈ K(H). The equation x − A∗ x = f has a solution if and only if
f ∈ N ⊥ , where N = N (I − A) (the nullspace of I − A).

Corollary 7.8. If (H, (·, ·),  · ) is a Hilbert space and A ∈ K(H),


then the equation x − Ax = f has a solution if and only if f ∈ N (I −
⊥
A∗ ) .

Proof. Use Theorem 7.7 with A∗ instead of A.

In order to prove Fredholm’s Theorem, we need the following lemma.

Lemma 7.9. Let (H, (·, ·),  · ) be a Hilbert space and let A ∈ K(H).
Then there exists a constant C > 0 such that
C‖x‖ ≤ ‖(I − A)x‖ ∀x ∈ N⊥, (7.2.3)
where N = N (I − A).

Proof. Assume by contradiction that (7.2.3) is not true, i.e., for all
n ∈ N there exists an xn ∈ N⊥ such that ‖xn‖ = 1 and
‖(I − A)xn‖ < 1/n.
Therefore,
xn − Axn → 0 . (7.2.4)
As A is compact there is a subsequence of (xn )n≥1 , say (xnk )k≥1 , such
that (Axnk )k≥1 is convergent. By (7.2.4) we deduce that (xnk )k≥1 is
also convergent, and its limit x ∈ N ⊥ (since N ⊥ is closed). Using again
(7.2.4), we infer that x − Ax = 0, i.e., x ∈ N . Since N ∩ N ⊥ = {0},
we have x = 0, which contradicts ‖xn‖ = 1 ∀n ≥ 1.

2
Erik Ivar Fredholm, Swedish mathematician, 1866–1927.

Proof of Fredholm’s Theorem.


Necessity. Assume that the equation x−A∗ x = f has a solution x ∈ H.
Then, for all y ∈ N , we have
(f, y) = (x, y) − (A∗x, y) = (x, y) − (x, Ay) = (x, (I − A)y) = (x, 0) = 0.
Therefore f ∈ N ⊥ .

Sufficiency. Assume f ∈ N⊥. Since N⊥ is a closed subspace of
(H, ‖ · ‖), N⊥ is a Hilbert space with the same scalar product and norm.
According to Lemma 7.9, ‖ · ‖ is equivalent (on N⊥) with the norm
defined by the scalar product
⟨x, y⟩ = (Tx, Ty) ∀x, y ∈ N⊥,
where T = I − A. Since the functional x → (x, f ) is linear and
continuous on N ⊥ , it follows by the Riesz Representation Theorem
that there exists xf ∈ N ⊥ such that
(x, f) = ⟨x, xf⟩ = (Tx, Txf) ∀x ∈ N⊥. (7.2.5)

In fact, (7.2.5) holds for all x ∈ H, since x = x′ + x″ with x′ ∈ N,
x″ ∈ N⊥. Denoting x̃ = Txf, we can write (see (7.2.5) extended to H)

(Tx, x̃) = (x, x̃ − A∗x̃) = (x, f) ∀x ∈ H,
=(x,x̃−A∗ x̃)

so
x̃ − A∗ x̃ = f .

The following result provides some information that supplements The-


orem 7.7.
Theorem 7.10. Let (H, (·, ·),  · ) be a Hilbert space and let A ∈
K(H). Then,
R(I − A) = H ⇐⇒ N = {0} ⇐⇒ N ∗ = {0} ⇐⇒ R(I − A∗ ) = H ,
where N = N (I − A), N ∗ = N (I − A∗ ), and R(I − A), R(I − A∗ )
denote the ranges of I − A, I − A∗ .

Proof. Keeping in mind Theorem 7.7 and Corollary 7.8, it suffices to


prove that
R(I − A) = H ⇐⇒ R(I − A∗ ) = H . (7.2.6)
Assume R(I − A) = H. Let us prove that N = {0}. Assume by way
of contradiction that N = {0}, i.e., there exists an x0 ∈ N , x0 = 0.
As R(I − A) = H we can construct a sequence (xn )n≥1 in D(A) such
that
T xn = xn−1 ∀n ≥ 1 ,
where T := I − A. We have

T n xn = x0 = 0, and T n+1 xn = 0 ,

where T k := T ◦ T ◦ · · · ◦ T (k factors). Hence, denoting Hn = N (T n ),


we have that Hn is a proper linear subspace of Hn+1 for all n ∈ N.
According to Theorem 7.3, every Hn is a finite dimensional space,
hence closed, since

T^n = (I − A)^n = I − ∑_{k=1}^{n} (−1)^{k+1} \binom{n}{k} A^k,

where the operator subtracted from I is compact.

By Lemma 2.25 there exists a sequence (un )n≥1 such that


un ∈ Hn+1, ‖un‖ = 1, ‖un − u‖ ≥ 1/2 ∀u ∈ Hn.
Since for 1 ≤ m < n

T n (T un + Aum ) = T n+1 un + AT n um = 0 ,

we have T un + Aum ∈ Hn , and


‖Aun − Aum‖ = ‖un − (Tun + Aum)‖ ≥ 1/2.
Thus the sequence (Aun )n≥1 cannot have Cauchy (hence convergent)
subsequences. This contradicts the fact that A is compact combined
with ‖un‖ = 1 for all n ≥ 1. Therefore, N = {0}, which (by Theorem 7.7) implies that R(I − A∗) = H. Thus we have proved the
implication
R(I − A) = H ⇒ R(I − A∗ ) = H .
The converse implication follows by replacing A with A∗ .

Remark 7.11. From Corollary 7.8 and Theorem 7.10 we deduce that
if the equation x − Ax = f has a solution uf for all f ∈ H then
uf is unique (since N (I − A) = {0}). So we can now state the so-
called Fredholm’s alternative regarding the equation x − Ax = f with
A ∈ K(H), namely one of the following must hold:
• for every f ∈ H the equation x − Ax = f has a unique solution
(equivalently, N (I − A) = {0});
• N (I − A) = {0}, in which case the equation x − Ax = f is solvable if
and only if f ⊥ N (I − A∗ ) (i.e., f satisfies m orthogonality relations,
where m = dim N (I − A∗ ) = dim N (I − A)).
We shall later apply Fredholm’s alternative to a class of integral equa-
tions that are named after him.
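In finite dimensions every bounded operator is compact, so the alternative can be illustrated directly. The sketch below uses a hypothetical symmetric diagonal A (so that N(I − A∗) = N(I − A)) and checks that x − Ax = f is solvable exactly when f ⊥ N(I − A∗).

```python
import numpy as np

A = np.diag([1.0, 0.5, 0.25])   # symmetric, N(I - A) = span{e1}
T = np.eye(3) - A

f_good = np.array([0.0, 1.0, 2.0])  # orthogonal to N(I - A*) = span{e1}
f_bad = np.array([1.0, 0.0, 0.0])   # not orthogonal to N(I - A*)

x = np.linalg.lstsq(T, f_good, rcond=None)[0]
r = np.linalg.lstsq(T, f_bad, rcond=None)[0]
print(np.allclose(T @ x, f_good))  # True: solvable
print(np.allclose(T @ r, f_bad))   # False: no solution exists
```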
Remark 7.12. In fact, the above theory is valid in a general Banach
space H (see, e.g., [6, Chapter 6] or [15, Chapter 5]).

7.3 Symmetric Operators and


Self-adjoint Operators
We begin this section with the following definition.

Definition 7.13. Let (H, (·, ·),  · ) be a Hilbert space and let A :
D(A) ⊂ H → H be a densely defined, linear operator.
(a) A is called symmetric if A ⊂ A∗ , i.e.,

(Ax, y) = (x, Ay) ∀x, y ∈ D(A) ;

(b) A is called self-adjoint if A = A∗ , i.e., A ⊂ A∗ and A∗ ⊂ A.

Obviously, if D(A) = H then A is symmetric if and only if it is self-


adjoint, and in this case A is closed (by Theorem 7.1), hence A ∈ L(H)
(by the Closed Graph Theorem).

Example 1. Let X = L2 (a, b; K), where −∞ < a < b < +∞ and let
A : X → X be defined by
b
(Af )(t) = k(t, s)f (s) ds, a ≤ t ≤ b ,
a

where k ∈ C([a, b] × [a, b]; K). The space X equipped with the usual
scalar product and norm is a Hilbert space and A ∈ L(X). Moreover,

it is easy to see (by using Arzelà–Ascoli’s Criterion) that A ∈ K(X).


Note that for all f, g ∈ X we have

(Af, g)_{L²(a,b;K)} = ∫_a^b (Af)(t) · \overline{g(t)} dt
= ∫_a^b ( ∫_a^b k(t, s) f(s) ds ) · \overline{g(t)} dt
= ∫_a^b f(s) · \overline{( ∫_a^b \overline{k(t, s)} g(t) dt )} ds
= ∫_a^b f(t) · \overline{( ∫_a^b \overline{k(s, t)} g(s) ds )} dt,

thus

(A∗g)(t) = ∫_a^b \overline{k(s, t)} · g(s) ds ∀g ∈ X.

Obviously,

A = A∗ ⇐⇒ k(t, s) = \overline{k(s, t)} ∀t, s ∈ [a, b].
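On a grid, the integral operator becomes a matrix with entries k(t_i, s_j) and the adjoint kernel k(s, t) becomes the transposed matrix (for a real kernel, where conjugation is trivial). The sketch below (with the hypothetical real kernel k(t, s) = t s² on [0, 1]) checks the adjoint identity (Af, g) = (f, A∗g) numerically.

```python
import numpy as np

n = 2001
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]
K = np.outer(t, t ** 2)  # K[i, j] = k(t_i, s_j) = t_i * s_j**2

rng = np.random.default_rng(1)
f = rng.standard_normal(n)
g = rng.standard_normal(n)

Af = h * (K @ f)         # (Af)(t_i) ~ sum_j k(t_i, s_j) f(s_j) h
Ag = h * (K.T @ g)       # (A* g)(t_i) ~ sum_j k(s_j, t_i) g(s_j) h

lhs = h * np.sum(Af * g)  # (Af, g) in L^2(0, 1)
rhs = h * np.sum(f * Ag)  # (f, A* g)
print(abs(lhs - rhs) < 1e-10)  # True
```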

Example 2. Let X = L2 (R; K) with its usual scalar product and


Hilbertian norm, and let A : D(A) ⊂ X → X be given by
D(A) = {f ∈ X; tf (t) ∈ X} ,
(Af)(t) := tf(t) for a.a. t ∈ R, f ∈ D(A).
It is easily seen that A is self-adjoint.

Example 3. Let H = L2 (Ω) be equipped with the usual scalar product


and norm, where ∅ = Ω ⊂ RN , N ≥ 2, is a bounded domain with
smooth boundary. Let A : D(A) ⊂ H → H, where
D(A) = C0∞ (Ω), Au = Δu ∀u ∈ D(A) .
Obviously, D(A) is dense in H. By Green’s identity, we have

∫_Ω v Δu dx = ∫_Ω u Δv dx ∀u ∈ D(A) = C0∞(Ω), v ∈ H²(Ω).

Thus H²(Ω) ⊂ D(A∗) and A∗v = Δv for all v ∈ H²(Ω). Therefore
A is symmetric but not self-adjoint because D(A) is a proper subset
of D(A∗). If the domain of A = Δ is extended to H01(Ω) ∩ H²(Ω),
then A becomes self-adjoint. More precisely, we have the following
proposition.

Proposition 7.14. Let H = L2 (Ω) be equipped with the usual scalar


product (·, ·) and the induced norm  · , where ∅ = Ω ⊂ RN , N ≥ 2,
is a bounded domain with smooth boundary. Let B : D(B) ⊂ H → H
be defined by D(B) = H 2 (Ω) ∩ H01 (Ω), Bu = Δu for all u ∈ D(B).
Then B is self-adjoint.

Proof. Clearly, D(B) is dense in H and, by Green’s formula, we have


for all u, v ∈ D(B)

(Bu, v) = ∫_Ω Δu · v dx = ∫_Ω u · Δv dx = (u, Bv),

hence D(B) ⊂ D(B ∗ ) and B ∗ v = Bv for all v ∈ D(B) (i.e., B is sym-


metric). Let us prove that D(B ∗ ) = D(B). Using the Lax–Milgram
Theorem, we can see that R(I + B) = H. In addition, since B is
positive, I + B is invertible and J := (I + B)−1 ∈ L(H). As B is
symmetric, so is J. Now, let v be an arbitrary function in D(B ∗ ).
Denoting g = v + B ∗ v, we have

(g, u) = (v, u + Bu) ∀u ∈ D(B).

Therefore, for every h ∈ H, we have

(g, Jh) = (v, h) =⇒ (Jg, h) = (v, h),

so v = Jg ∈ R(J) = D(B).

We know that, for every bijective A ∈ L(H), A∗ is also bijective and


(A∗ )−1 = (A−1 )∗ . In fact, the following more general result holds.

Theorem 7.15. Let (H, (·, ·), ‖ · ‖) be a Hilbert space and let A :
D(A) ⊂ H → H be a symmetric linear operator, with Cl R(A) = H.
Then
(A−1 )∗ = (A∗ )−1 ,
where all operations are permitted. If, in addition, A is self-adjoint,
then so is A−1 .

Proof. A is injective. Indeed, if u ∈ D(A) and Au = 0 then

0 = (Au, v) = (u, Av) ∀v ∈ D(A) ,

which implies u = 0 since R(A) is dense in H.



A∗ is also injective because if v ∈ D(A∗ ) and A∗ v = 0, then

(Au, v) = (u, A∗ v) = 0 ∀u ∈ D(A),

and thus v = 0 since Cl R(A) = H. Therefore, A⁻¹ and (A∗)⁻¹ exist,
with D(A⁻¹) = R(A), D((A∗)⁻¹) = R(A∗). Since D(A⁻¹) is dense in
H, (A⁻¹)∗ exists. Denote B := (A⁻¹)∗. We have

(u, v) = (A−1 (Au), v) = (Au, Bv) ∀u ∈ D(A), v ∈ D(B), (7.3.7)

and

(z, w) = (A(A⁻¹z), w) = (A⁻¹z, A∗w) ∀z ∈ D(A⁻¹) = R(A), w ∈ D(A∗). (7.3.8)

By (7.3.7) Bv ∈ D(A∗ ) and

v = A∗ (Bv) ∀v ∈ D(B) . (7.3.9)


 
On the other hand, by (7.3.8), A∗w ∈ D((A⁻¹)∗) = D(B) and

w = (A⁻¹)∗(A∗w) = B(A∗w) ∀w ∈ D(A∗). (7.3.10)

From (7.3.9) and (7.3.10) we derive

B = (A∗ )−1 ⇐⇒ (A−1 )∗ = (A∗ )−1 .

If A = A∗ , then (A−1 )∗ = A−1 .

7.4 Exercises
1. Let X, Y be Banach spaces. Let A : D(A) ⊂ X → Y be a
densely defined, closed linear operator and B ∈ L(X, Y ). Define
T : D(T ) = D(A) ⊂ X → Y by T x = Ax + Bx ∀x ∈ D(A).
Prove that
(i) T is a closed operator;
(ii) D(T ∗ ) = D(A∗ ) and T ∗ = A∗ + B ∗ .

2. Let X, Y be Banach spaces and let A : D(A) ⊂ X → Y be a


densely defined linear operator. Show that A∗ is injective if and
only if Cl R(A) = Y .

3. Let H be a Hilbert space. If A : D(A) ⊂ H → H is a symmetric


linear operator with R(A) = H, then A is self-adjoint, i.e., A =
A∗ .

4. Let H be a Hilbert space, with the scalar product denoted (·, ·),
and let A, B ∈ L(H). Show that

A∗ A = B ∗ B ⇐⇒ (Ax, Ay) = (Bx, By) ∀x, y ∈ H.

5. Let H be a Hilbert space. For any A ∈ L(H), show that ‖A∗A‖ = ‖A‖².

6. Let (H, (·, ·)) be a Hilbert space over C and let A ∈ L(H). Prove
that
A is symmetric (hence self-adjoint) ⇐⇒ (Ax, x) ∈ R ∀x ∈ H.

7. Let (H, (·, ·)) be a Hilbert space over R. Prove that for any a > 0
and any A ∈ L(H) the operator T = I + aA∗ A is invertible and
T −1 ∈ L(H), where I denotes the identity operator on H.

8. Let H be a Hilbert space over C and let A ∈ L(H) be a symmetric


(hence self-adjoint) operator. Denote T = A + iI, where i2 = −1
and I is the identity operator on H. Prove that

(a) T is a normal operator (i.e., T ∗ T = T T ∗ );


(b) T is invertible and T −1 ∈ L(H).

9. Let H be a Hilbert space over C. For A ∈ L(H) and a0 , a1 , . . . , an ∈


C, denote by P (A) the operator polynomial a0 I + a1 A + · · · +
an An , where I stands for the identity operator.

(j) If A is symmetric (hence self-adjoint) and a0 , a1 , . . . , an ∈


R, then P (A) is symmetric, too;
(jj) If A is a normal operator (i.e., A∗ A = AA∗ ), then so is
P (A).

10. Let H1 , H2 be Hilbert spaces. Define H = H1 × H2 to be the


Hilbert space consisting of all pairs (x1 , x2 )T , x1 ∈ H1 and x2 ∈
H2 , with ! " ! " ! "
x1 y1 x 1 + y1
+ = ,
x2 y2 x 2 + y2
! " ! "
x αx1
α 1 = ∀α ∈ K,
x2 αx2
and a scalar product defined by
+! " ! ", ! " ! "
x1 y1 x y1
, = (x1 , y1 )H1 + (x2 , y2 )H2 ∀ 1 , ∈ H.
x2 y2 x2 y2

Given A1 ∈ L(H1 ) and A2 ∈ L(H2 ), define the matrix operator


- .
A1 0
A= .
0 A2

Prove that A ∈ L(H) and ‖A‖ = max{‖A1‖, ‖A2‖}. Find A∗.

11. Let A ∈ L(H), where H is a Hilbert space over C. As in the


previous exercise, define Y = H × H to be the Hilbert space
consisting of all pairs (x1 , x2 )T , x1 ∈ H and x2 ∈ H, with the
corresponding operations and scalar product. Define on Y the
matrix operator B by
- .
0 iA
B= ,
−iA∗ 0

where i² = −1. Prove that B ∈ L(Y), ‖B‖ = ‖A‖, and that B∗ = B.
Now, assume that A : D(A) ⊂ H → H is a linear, densely
defined operator. Prove that B : D(A∗ ) × D(A) ⊂ Y → Y is
symmetric.

12. Let H be a Hilbert space and let A ∈ L(H) satisfy ‖A‖ ≤ 1.
Prove that Ax = x if and only if A∗x = x.

13. Let H be the real Hilbert space L2(0, 1) equipped with the usual scalar product and induced norm. Define A : D(A) ⊂ H → H by

D(A) = {u ∈ H1(0, 1); u(0) = 0}, Au = u′.

(a) Show that D(A) is dense in H and that A is closed;
(b) Compute N(A) and R(A);
(c) Determine A∗ and show that D(A∗) is dense in H.
14. Let H be the real Hilbert space L2(0, 1) equipped with the usual scalar product and induced norm. Let A : D(A) ⊂ H → H be the operator defined by Au = u′, where

(a) D(A) = H01(0, 1);
(b) D(A) = {u ∈ H1(0, 1); u(0) = αu(1)} for some α ∈ R \ {0}.

Determine N(A), R(A), A∗, N(A∗), R(A∗) in each of these two cases.
15. Let H be the real Hilbert space L2(0, 1) equipped with the usual scalar product and induced norm. Let A : D(A) ⊂ H → H, Au = u″, where D(A) is specified below. Determine A∗ in each of the following cases:

(a) D(A) = {u ∈ H2(0, 1); u(0) = u(1) = 0};
(b) D(A) = {u ∈ H2(0, 1); u(0) = u(1) = u′(0) = u′(1) = 0};
(c) D(A) = {u ∈ H2(0, 1); u(0) = u′(1) = 0};
(d) D(A) = {u ∈ H2(0, 1); u(0) = u(1)}.
16. Let H = l2(C) be the complex Hilbert space of all sequences of complex numbers x = (xn)n∈N satisfying ∑_{n=1}^∞ |xn|² < ∞, with the usual scalar product

⟨x, y⟩ = ∑_{n=1}^∞ xn ȳn ∀x = (xn), y = (yn) ∈ H,

and the induced norm, denoted ‖ · ‖. Define the operators A : H → H and B : D(B) ⊂ H → H by

A(xn) = (xp+1, xp+2, xp+3, . . . ), for a given p ∈ N,

B(xn) = ( (nᵅ iⁿ / (1 + n)) xn ), for a given α ∈ R.

(a) Show that A ∈ L(H) and compute ‖A‖ and A∗;
(b) Show that if α ≤ 1 then D(B) = H and B ∈ L(H); compute ‖B‖;
(c) For α > 1 find (the maximal domain) D(B) and prove that D(B) is dense in H;
(d) Compute B∗ for all α ∈ R;
(e) Check whether A and B with α ≤ 1 are normal operators.
Chapter 8
Eigenvalues and Eigenvectors
In this chapter we present the main results regarding eigenvalues and eigenvectors of compact and/or symmetric operators. This includes the Hilbert–Schmidt Theorem and its applications to the main eigenvalue problems for the Laplacian.
Throughout this chapter we consider linear operators defined on linear spaces over K, where K is either R or C, unless otherwise specified.

8.1 Definition and Examples


We first introduce the concept of an eigenpair (i.e., eigenvector + the
corresponding eigenvalue).
Definition 8.1. Let X be a linear space. A vector u ∈ X \ {0} is said to be an eigenvector of a linear operator A : X → X if there exists λ ∈ K such that Au = λu. Such a λ is called an eigenvalue corresponding to u, and the pair (u, λ) is called an eigenpair.
Remark 8.2. For a given eigenvector u of A the corresponding eigenvalue λ is unique. Indeed,

λu = Au = λ1u =⇒ (λ − λ1)u = 0 =⇒ λ − λ1 = 0,

since u ≠ 0.
© Springer Nature Switzerland AG 2019 217


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 8
For a given eigenvalue λ of A, the set of the corresponding eigenvectors is N(λI − A) \ {0}, where I is the identity operator of X.
Remark 8.3. Note also that a set of eigenvectors u1, u2, . . . , um of A corresponding to distinct eigenvalues λ1, λ2, . . . , λm (m ∈ N) is a linearly independent system. The proof is by induction.
Example 1. Let X = Cn , A : X → X, Au = M u ∀u = (u1 , . . . , un )T ∈
X, where M = (aij ) is an n × n matrix with entries aij ∈ C. Then,
λ is an eigenvalue of A if and only if det(λI − M ) = 0, where I is the
n × n identity matrix.
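This criterion is easy to check numerically. The sketch below is an illustration only (the matrix M is an arbitrary choice, not one from the text): NumPy computes the eigenvalues of a symmetric 2 × 2 matrix and we verify that each one makes λI − M singular.

```python
import numpy as np

# Illustrative matrix (an assumption, not taken from the text)
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues of Au = Mu are exactly the roots of det(lambda*I - M) = 0
eigvals = np.linalg.eigvals(M)

for lam in eigvals:
    # lambda*I - M must be singular for each eigenvalue lambda
    assert abs(np.linalg.det(lam * np.eye(2) - M)) < 1e-9

# characteristic equation (lambda - 2)^2 - 1 = 0 has roots 1 and 3
assert np.allclose(sorted(eigvals.real), [1.0, 3.0])
```

The same check works for any square matrix; for non-symmetric M the eigenvalues may of course be complex.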

Example 2. Let H = l2(C) be the complex Hilbert space of all sequences of complex numbers x = (xn)n∈N satisfying ∑_{n=1}^∞ |xn|² < ∞, with the usual scalar product

⟨x, y⟩ = ∑_{n=1}^∞ xn ȳn ∀x = (xn), y = (yn) ∈ H,

and the induced norm, denoted ‖ · ‖. Define the linear operator A by

A(xn) = ( (2/1) x2, (3/2) x3, . . . , (n/(n − 1)) xn, . . . ).
We have for all x = (xn) ∈ H

‖Ax‖² = ∑_{n=2}^∞ |n(n − 1)⁻¹ xn|² ≤ 4 ∑_{n=2}^∞ |xn|² ≤ 4‖x‖²,

so A ∈ L(H) and ‖A‖ ≤ 2. In fact, ‖A‖ = 2, since for x̃ = (0, 1, 0, 0, . . . ) we have ‖x̃‖ = 1 and ‖Ax̃‖ = 2.
Consider the equation Ax = λx, or, equivalently,

((n + 1)/n) xn+1 = λxn,  n = 1, 2, . . .   (8.1.1)

Observe that λ = 0 is an eigenvalue of A with eigenvectors (x1, 0, 0, . . . ), x1 ∈ C \ {0}.
If λ ≠ 0, then it follows easily from (8.1.1) that

xn = (1/n) λⁿ⁻¹ x1,  n = 1, 2, . . .
In order for (xn) to be an eigenvector, we choose x1 ≠ 0. The condition (xn) ∈ H is equivalent to |λ| ≤ 1. So the set {λ ∈ C; |λ| ≤ 1} is the set of all eigenvalues of A.
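The eigenpair formula can be sanity-checked on a truncated sequence. The sketch below (a numerical illustration, not part of the text; λ = 0.5 is a sample eigenvalue) verifies componentwise that xn = λⁿ⁻¹x1/n satisfies the recurrence (8.1.1), and that the components are square-summable for |λ| ≤ 1.

```python
import numpy as np

lam, x1, N = 0.5, 1.0, 60          # sample eigenvalue with |lam| <= 1
n = np.arange(1, N + 1)

# candidate eigenvector x_n = lam^(n-1) * x1 / n
x = lam ** (n - 1) * x1 / n

# first N-1 components of Ax, where (Ax)_n = ((n+1)/n) * x_{n+1}
Ax = (n[:-1] + 1) / n[:-1] * x[1:]

assert np.allclose(Ax, lam * x[:-1])      # recurrence (8.1.1) holds
assert np.all(np.abs(x) <= 1.0 / n)       # |x_n| <= 1/n, so sum |x_n|^2 < infinity
```

Choosing |λ| > 1 instead makes |xn| grow like |λ|ⁿ/n, so the candidate vector leaves l2(C), matching the conclusion above.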
8.2 Main Results

We begin this section with a general result about the eigenvalues of a compact linear operator.
Theorem 8.4. Let (X, ‖ · ‖) be a normed space and let A ∈ K(X) (i.e., A : X → X is linear and sends bounded sets to relatively compact sets). Then A has a countable set of eigenvalues, and the only possible accumulation point of the set of eigenvalues is λ = 0. Moreover, for any eigenvalue λ ≠ 0, dim N(λI − A) < ∞ (one says that λ has finite rank or finite multiplicity).
Proof. The proof is trivial if X is finite dimensional, so let us assume that X is infinite dimensional. To prove the first statement of the theorem, it suffices to show that for all r > 0 the set {λ ∈ K; |λ| ≥ r} contains a finite number of eigenvalues. Suppose not, i.e., there exist r0 > 0 and infinitely many distinct eigenvalues λ1, λ2, . . . such that |λn| ≥ r0 ∀n ≥ 1. Then there exists a sequence un ∈ X \ {0} such that Aun = λnun ∀n ≥ 1, and we may assume that ‖un‖ = 1 ∀n ≥ 1. Because the λn’s are distinct, Bn = {u1, u2, . . . , un} are independent systems. Set Xn = Span Bn, n = 1, 2, . . . By Lemma 2.25, there exists yn ∈ Xn \ Xn−1 such that ‖yn‖ = 1 ∀n ≥ 2 and

‖yn − v‖ ≥ 1/2 ∀v ∈ Xn−1, n ≥ 2 =⇒ ‖yn − ym‖ ≥ 1/2 ∀n ≠ m.
Thus (yn) has no Cauchy (hence no convergent) subsequence. On the other hand, assuming that 1 ≤ m < n, we have

Ayn − Aym = λnyn − λmym + (Ayn − λnyn) − (Aym − λmym) = λnyn − vmn

with vmn ∈ Xn−1, since λmym ∈ Xm ⊂ Xn−1, Aym − λmym ∈ Xm ⊂ Xn−1, and, writing yn = ∑_{i=1}^n αin ui,

Ayn − λnyn = A( ∑_{i=1}^n αin ui ) − λn ∑_{i=1}^n αin ui
           = ∑_{i=1}^n αin λi ui − λn ∑_{i=1}^n αin ui
           = ∑_{i=1}^n αin (λi − λn) ui
           = ∑_{i=1}^{n−1} αin (λi − λn) ui,

which is in Xn−1. Hence we have

‖Ayn − Aym‖ = ‖λnyn − vmn‖ = |λn| · ‖yn − λn⁻¹vmn‖ ≥ r0 ‖yn − λn⁻¹vmn‖ ≥ r0/2,
so (Ayn ) has no Cauchy (hence no convergent) subsequence. But A
is compact and yn  = 1 ∀n ≥ 1 so (Ayn ) must have a convergent
subsequence. This contradiction shows that {λ ∈ K; |λ| ≥ r} contains
a finite number of eigenvalues of A for all r > 0, as claimed.
The proof of the latter statement of the theorem is similar to the proof
of Theorem 7.3.

Proposition 8.5. Let (H, (·, ·), ‖ · ‖) be a Hilbert space and let A : H → H be a symmetric (hence self-adjoint) operator. Then,
(i) every eigenvalue of A is real, even if K = C;
(ii) every two eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

Proof. To prove (i) suppose λ is an eigenvalue of A. Let u ∈ H \ {0} be a corresponding eigenvector, i.e., Au = λu. Then

(Au, u) = (λu, u) = λ‖u‖²,
(u, Au) = (u, λu) = λ̄‖u‖².

As A is symmetric and u ≠ 0, we infer that λ̄ = λ.
To prove (ii), consider two eigenpairs of A, (u1, λ1), (u2, λ2), where λ1, λ2 ∈ R (from (i)) and λ1 ≠ λ2. We have

λ1(u1, u2) = (Au1, u2) = (u1, Au2) = λ2(u1, u2),

thus

(λ1 − λ2)(u1, u2) = 0,

and since λ1 − λ2 ≠ 0, we conclude that (u1, u2) = 0.
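In finite dimensions both conclusions of Proposition 8.5 are easy to illustrate. The sketch below (the matrix is an arbitrary illustrative choice, not from the text) checks them for a self-adjoint matrix on C².

```python
import numpy as np

A = np.array([[2.0, 1j],
              [-1j, 3.0]])
assert np.allclose(A, A.conj().T)        # A is self-adjoint (symmetric)

w, V = np.linalg.eigh(A)                 # eigh is for Hermitian matrices

# (i) the eigenvalues are real (eigh returns them as real numbers),
#     and here they are distinct: (5 +/- sqrt(5)) / 2
assert np.all(np.isreal(w)) and w[0] != w[1]

# (ii) eigenvectors for the two distinct eigenvalues are orthogonal
assert abs(np.vdot(V[:, 0], V[:, 1])) < 1e-12
```

Note that `np.vdot` conjugates its first argument, which is exactly the complex scalar product used in the proposition.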
Proposition 8.6. Let (H, (·, ·), ‖ · ‖) be a Hilbert space, H ≠ {0}, and let A ∈ L(H) be a symmetric operator. Then,

‖A‖ = sup { |(Ax, x)|; x ∈ H, ‖x‖ = 1 }.
Proof. Trivial if A = 0 (equivalently ‖A‖ = 0). Assume A ≠ 0 (i.e., ‖A‖ > 0) and set

a = sup { |(Ax, x)|; x ∈ H, ‖x‖ = 1 }.

Since

|(Ax, x)| ≤ ‖Ax‖ · ‖x‖ ≤ ‖A‖ · ‖x‖² ∀x ∈ H,

we infer that

a ≤ ‖A‖.   (8.2.2)

Now, for given b > 0 and x ∈ H such that ‖x‖ = 1 and ‖Ax‖ > 0, we have

‖Ax‖² = (1/4)[ (A(bx + b⁻¹Ax), bx + b⁻¹Ax) − (A(bx − b⁻¹Ax), bx − b⁻¹Ax) ].   (8.2.3)

We also have

|(Av, v)| ≤ a‖v‖² ∀v ∈ H.   (8.2.4)

Combining (8.2.3) and (8.2.4) we obtain

‖Ax‖² ≤ (a/4)( ‖bx + b⁻¹Ax‖² + ‖bx − b⁻¹Ax‖² ) = (a/2)( b²‖x‖² + b⁻²‖Ax‖² ),

so for ‖x‖ = 1 and b² = ‖Ax‖ > 0 we have

‖Ax‖² ≤ a‖Ax‖.

Therefore,

‖Ax‖ ≤ a ∀x ∈ H, ‖x‖ = 1 =⇒ ‖A‖ ≤ a.

This together with (8.2.2) implies ‖A‖ = a.
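The identity ‖A‖ = sup |(Ax, x)| over the unit sphere is easy to observe numerically. In the finite-dimensional sketch below (a random symmetric matrix, an illustrative assumption), the supremum equals the operator norm and is attained at an eigenvector of the eigenvalue of largest modulus.

```python
import numpy as np

rng = np.random.default_rng(0)

B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # a random symmetric matrix

op_norm = np.linalg.norm(A, 2)           # operator norm of A

w, V = np.linalg.eigh(A)
k = int(np.argmax(np.abs(w)))
x = V[:, k]                              # unit eigenvector for the largest |eigenvalue|

# sup |(Ax, x)| over the unit sphere equals ||A|| and is attained at x
assert np.isclose(abs(x @ A @ x), op_norm)
for _ in range(200):
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)
    assert abs(v @ A @ v) <= op_norm + 1e-12
```

For a non-symmetric matrix the supremum can be strictly smaller than the norm (e.g., a nonzero nilpotent matrix has (Ax, x) small on much of the sphere), which is why the symmetry assumption is essential.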
Note that the assumption that A is symmetric in the above proposition is essential.
We have the following central theorem.
Theorem 8.7 (Hilbert–Schmidt). Let (H, (·, ·), ‖ · ‖) be an infinite dimensional, separable Hilbert space and let A : H → H be a symmetric (equivalently, self-adjoint), compact linear operator, with N(A) = {0}. Then there exist a sequence of eigenvalues of A, (λ1, λ2, . . . , λn, . . . ), such that (|λn|) is a decreasing sequence of positive numbers converging to 0, and a complete orthonormal system (basis) in H of corresponding eigenvectors {un}∞n=1 (i.e., Aun = λnun for n = 1, 2, . . . ).
Proof. We first observe that ‖A‖ > 0 (equivalently, A ≠ 0), since N(A) = {0}. Let us prove that either ‖A‖ or −‖A‖ is an eigenvalue of A. By Proposition 8.6 there exists (vn)n≥1, with ‖vn‖ = 1 ∀n ≥ 1, such that |(Avn, vn)| → ‖A‖. In fact, one can extract from (vn) a subsequence, again denoted (vn), such that (Avn, vn) converges to either ‖A‖ or −‖A‖, say

(Avn, vn) → λ1 := ‖A‖.   (8.2.5)

Since A is compact we can now take another subsequence, also denoted (vn), such that

Avn → u1,   (8.2.6)
and this is the subsequence we keep. Now, passing to the limit in

0 ≤ ‖Avn − λ1vn‖² = ‖Avn‖² − 2λ1(Avn, vn) + λ1²,   (8.2.7)

we get (see (8.2.5) and (8.2.6))

0 ≤ ‖u1‖² − λ1² =⇒ |λ1| ≤ ‖u1‖.

Hence, in particular, u1 ≠ 0. The converse inequality is also true since we have

‖Avn‖ ≤ ‖A‖ · ‖vn‖ = ‖A‖,

so by (8.2.6)

‖u1‖ ≤ ‖A‖ = |λ1|.

Therefore,

‖u1‖ = |λ1| = ‖A‖.   (8.2.8)

From (8.2.7) (see also (8.2.5), (8.2.6) and (8.2.8)) we derive

‖Avn − λ1vn‖ → 0.   (8.2.9)
So, in view of (8.2.6), (λ1vn) converges to u1, and thus by (8.2.9) and continuity of A we get

Au1 = λ1u1,

i.e., (u1, λ1) is an eigenpair of A. We normalize without changing notation, u1 := |λ1|⁻¹u1, since we want an orthonormal system of eigenvectors.
It is worth pointing out that any other eigenvalue λ satisfies |λ| ≤ |λ1|. Indeed, if we assume by contradiction the existence of an eigenpair (u, λ), with |λ| > |λ1| and ‖u‖ = 1, then |(Au, u)| = |λ|, which contradicts |λ1| = ‖A‖ being the supremum from Proposition 8.6.
We now use induction to prove the existence of eigenpairs (un, λn) for n = 2, 3, . . .
Denote by Y the orthogonal complement of Span{u1}, i.e.,

Y = { u ∈ H; (u, u1) = 0 }.
Since H is infinite dimensional, so is Y. Moreover Y is a Hilbert space (with the scalar product and norm of H), and is invariant under A in the sense that AY ⊂ Y, because for y ∈ Y, using the symmetry of A,

(Ay, u1) = (y, Au1) = (y, λ1u1) = λ1(y, u1) = 0.

The restriction A|Y is not 0 (otherwise we would have Y ⊂ N(A) = {0}). In fact, all the properties are inherited (A|Y is symmetric, compact, and N(A|Y) = {0}), and by the previous step we have an eigenvalue

λ2 = ± sup { |(Av, v)|; v ∈ Y, ‖v‖ = 1 },

and a corresponding eigenvector

u2 ∈ Y, ‖u2‖ = 1, Au2 = λ2u2.

Moreover, |λ2| ≤ |λ1|.

Next, take
⊥
Z = { u ∈ Y ; (u, u2 ) = 0 } = Span{u1 , u2 } ,

which is an infinite dimensional (Hilbert) subspace of H, and obtain


a new eigenpair (u3 , λ3 ), with u3  = 1, |λ3 | ≤ |λ2 |. We may continue
doing this, each time obtaining an infinite dimensional subspace. We
thus construct a sequence of eigenvalues (λn ) such that
|λ1 | ≥ |λ2 | ≥ · · · ≥ |λn | ≥ · · · , (8.2.10)
and the corresponding sequence of eigenvectors (un),

Aun = λnun, ‖un‖ = 1, n ≥ 1,

forms an orthonormal system by construction.

Next, we prove that

Au = ∑_{n=1}^∞ λn(u, un)un ∀u ∈ H.   (8.2.11)

Define the space

Vm := {u ∈ H; (u, uj) = 0, j = 1, . . . , m} = (Span{u1, . . . , um})⊥,
which is an infinite dimensional Hilbert space (with respect to (·, ·), ‖ · ‖), invariant under A (i.e., Av ∈ Vm ∀v ∈ Vm). By the previous step of our proof, there is an eigenpair (um+1, λm+1) of A such that

|λm+1| = ‖A|Vm‖ = sup {|(Av, v)|; v ∈ Vm, ‖v‖ = 1}.

In particular,

‖Av‖ ≤ |λm+1| · ‖v‖ ∀v ∈ Vm.   (8.2.12)
Now, choose the particular element

wm = u − ∑_{n=1}^m (u, un)un

and notice that wm ∈ Vm because (wm, uj) = (u, uj) − (u, uj) = 0 ∀j = 1, . . . , m. Calculate

‖wm‖² = ‖u‖² − ∑_{n=1}^m |(u, un)|² ≤ ‖u‖².   (8.2.13)

Since

Awm = Au − ∑_{n=1}^m (u, un)Aun = Au − ∑_{n=1}^m λn(u, un)un,

combining (8.2.12) and (8.2.13) we get

‖Awm‖ ≤ ‖A|Vm‖ · ‖wm‖ = |λm+1| · ‖wm‖ ≤ |λm+1| · ‖u‖.   (8.2.14)
On the other hand, λn → 0. Indeed, since (|λn|) is decreasing (see (8.2.10)), there exists

lim_{n→∞} |λn| = α ≥ 0.

Suppose by way of contradiction that α > 0. Obviously, |λn| ≥ α for all n ≥ 1, and so

‖λn⁻¹un‖ = ‖un‖/|λn| = 1/|λn| ≤ 1/α ∀n ≥ 1.

Since A is compact, un = A(λn⁻¹un) has a convergent subsequence. But this is impossible because

‖un − um‖² = ‖un‖² + ‖um‖² = 2 ∀n ≠ m.

So α = 0, i.e., λn → 0, as claimed.
Consequently, we have by (8.2.14) that ‖Awm‖ → 0 as m → ∞, i.e., (8.2.11) holds true.

Finally, let us prove that {un}∞n=1 is a basis in H.
We know from the proof of Theorem 6.21 that for all u ∈ H the series ∑_{n=1}^∞ (u, un)un converges (as {un}∞n=1 is an orthonormal system), so we can write

v = ∑_{n=1}^∞ (u, un)un

and we simply need to check that u = v. Consider the sequence of partial sums sm = ∑_{n=1}^m (u, un)un, which converges strongly to v as m → ∞, so Asm → Av. On the other hand, by (8.2.11) we have that

Asm = ∑_{n=1}^m λn(u, un)un → Au as m → ∞.

Hence,

Av = Au =⇒ A(v − u) = 0 =⇒ v = u,

since N(A) = {0}. Thus the system {un}∞n=1 is complete, i.e., a basis in H (cf. Theorem 6.21).
Remark 8.8. If we assume in addition that A is positive (i.e., (Av, v) > 0 for all v ∈ H \ {0}), then it has eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn ≥ · · · , with λn > 0 ∀n ≥ 1. This follows from

(Aun, un) = λn‖un‖² = λn, n ≥ 1.

Note also that

λ1 = ‖A‖ = sup{(Av, v); v ∈ H, ‖v‖ = 1} and
λn+1 = ‖A|Vn‖ = sup{(Av, v); v ∈ Vn, ‖v‖ = 1} ∀n ≥ 1,

where Vn = (Span{u1, u2, . . . , un})⊥, n ≥ 1.
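The constructive scheme of the proof (maximize |(Av, v)| on the unit sphere, then restrict A to the orthogonal complement of the eigenvector found, and repeat) has a well-known finite-dimensional counterpart, deflation. The sketch below is an illustration under assumed data: a 4 × 4 symmetric matrix with prescribed, well-separated spectrum, with power iteration standing in for the maximization step.

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric matrix with eigenvalues 3, -2, 1, 0.5 (an illustrative
# choice, so that power iteration converges quickly).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([3.0, -2.0, 1.0, 0.5]) @ Q.T

def dominant_eigenpair(M, iters=500):
    """Power iteration: approximates the eigenvalue of largest modulus."""
    v = rng.standard_normal(M.shape[0])
    for _ in range(iters):
        w = M @ v
        v = w / np.linalg.norm(w)
    return v @ M @ v, v

eigs = []
M = A.copy()
for _ in range(4):
    lam, u = dominant_eigenpair(M)
    eigs.append(lam)
    M = M - lam * np.outer(u, u)     # deflate: pass to Span{u}^perp

# the eigenvalues come out ordered by decreasing modulus, as in the proof
assert np.all(np.diff(np.abs(eigs)) <= 1e-6)
assert np.allclose(sorted(eigs), [-2.0, 0.5, 1.0, 3.0], atol=1e-6)
```

In infinite dimensions the analogous construction cannot terminate, and compactness is what forces the extracted eigenvalues to decrease to 0 in modulus.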

8.3 Eigenvalues of −Δ Under the Dirichlet Boundary Condition

In what follows we apply the Hilbert–Schmidt Theorem to an eigenvalue problem for the Laplace operator. Specifically, let ∅ ≠ Ω ⊂ RN, N ≥ 2, be a bounded domain with smooth boundary ∂Ω. Consider the Dirichlet eigenvalue problem

−Δu = λu in Ω,
u = 0 on ∂Ω.   (8.3.15)
Definition 8.9. A real number λ is said to be an eigenvalue of the Dirichlet problem (8.3.15) if there is a function u ∈ H01(Ω) \ {0} such that the problem is satisfied in the sense that

∫_Ω ∇u · ∇v dx = λ ∫_Ω uv dx ∀v ∈ H01(Ω),   (8.3.16)

or, equivalently,

−Δu = λu in D′(Ω).

Remark 8.10. As ∂Ω is assumed to be smooth, the eigenfunction u is in fact more regular (see [6, Theorem 9.25, p. 298]).
Theorem 8.11. Let ∅ ≠ Ω ⊂ RN be a bounded domain with smooth boundary ∂Ω. Then there exist an increasing sequence of positive eigenvalues λn for (8.3.15) such that λn → +∞, and a complete orthonormal system (in H = L2(Ω)) of eigenfunctions un satisfying problem (8.3.15) with λ = λn, n = 1, 2, . . .
Proof. Let H = L2(Ω) equipped with the usual inner product and norm. H is an infinite dimensional, separable Hilbert space (over R). We know that for every f ∈ H = L2(Ω) the problem

−Δu = f in Ω,
u = 0 on ∂Ω,

has a unique solution u ∈ H01(Ω) (by Dirichlet’s Principle, Chap. 6). Define an operator A : H → H by assigning f → u. Note that A is linear and N(A) = {0}. Moreover, A is symmetric (hence self-adjoint since D(A) = H). Indeed, if v = Ag with g ∈ H, i.e.,

−Δv = g in Ω,
v = 0 on ∂Ω,
then, by Green’s Identity, we can write

∫_Ω ∇u · ∇v dx = ∫_Ω f v dx = ∫_Ω f Ag dx,
∫_Ω ∇v · ∇u dx = ∫_Ω gu dx = ∫_Ω gAf dx,

so ∫_Ω f Ag dx = ∫_Ω gAf dx, as desired.
Let us show that the operator A is also compact, i.e., for every constant M > 0, the set

SM := {Af ; f ∈ L2(Ω), ‖f‖L2(Ω) ≤ M}

is relatively compact in H = L2(Ω). Indeed, if u = Af ∈ SM, it follows from (8.3.16) with v = u that

‖∇u‖²L2(Ω) = ∫_Ω f u dx ≤ ‖f‖L2(Ω) · ‖u‖L2(Ω) ≤ C‖f‖L2(Ω) · ‖∇u‖L2(Ω),

where the last step uses the Poincaré inequality. Finally,

‖∇u‖L2(Ω) ≤ C‖f‖L2(Ω) ≤ CM,

so that ‖Af‖H01(Ω) is bounded by a constant independent of f. We know that bounded sets in H01(Ω) are relatively compact in L2(Ω), so SM is relatively compact in this space.
We can apply the Hilbert–Schmidt Theorem, which guarantees the existence of a sequence of eigenpairs for A, {(un, μn)}∞n=1, such that |μn| decreases to zero and {un}∞n=1 is a complete orthonormal system (basis) in H = L2(Ω). Note that Aun = μnun says that un satisfies the problem

−Δun = λnun in Ω,
un = 0 on ∂Ω,

where λn = 1/μn, i.e., (un, λn) is an eigenpair of problem (8.3.15). Note also that

λn = λn ∫_Ω un² dx = ∫_Ω |∇un|² dx > 0 ∀n ≥ 1,

so (λn)n≥1 is an increasing sequence of positive numbers, and λn → +∞ (since |μn| = μn decreases to 0).
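Although the theorem is stated for N ≥ 2, the same phenomenon is visible in the one-dimensional analogue Ω = (0, 1), where the Dirichlet eigenvalues are known explicitly to be π²n² (see Section 8.6). The finite-difference sketch below is a numerical illustration, not part of the text: it approximates the smallest eigenvalues of −u″ with u(0) = u(1) = 0.

```python
import numpy as np

N = 400                                  # interior grid points on (0, 1)
h = 1.0 / (N + 1)

# tridiagonal discretization of -u'' with u(0) = u(1) = 0
L = (np.diag(2.0 * np.ones(N))
     - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2

lam = np.sort(np.linalg.eigvalsh(L))

# the smallest eigenvalues approach pi^2 n^2, and the sequence increases
exact = (np.pi * np.arange(1, 4)) ** 2
assert np.allclose(lam[:3], exact, rtol=1e-3)
assert np.all(np.diff(lam) > 0)
```

The matrix is symmetric positive definite, mirroring the positivity λn > 0 obtained above, and refining the grid (larger N) improves the agreement with π²n².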

8.4 Eigenvalues of −Δ Under the Robin Boundary Condition

Let again ∅ ≠ Ω ⊂ RN, N ≥ 2, be a bounded domain with smooth boundary ∂Ω. Consider the classical Robin eigenvalue problem

−Δu = λu in Ω,
∂u/∂ν + αu = 0 on ∂Ω,   (8.4.17)

where α is a positive constant, ν denotes the outward unit normal to ∂Ω, and ∂u/∂ν is the corresponding outward normal derivative. In this case we have the following natural definition:

Definition 8.12. A real number λ is said to be an eigenvalue of the Robin problem (8.4.17) if there is a function u ∈ H1(Ω) \ {0} such that

∫_Ω ∇u · ∇v dx + α ∫_∂Ω uv ds = λ ∫_Ω uv dx ∀v ∈ H1(Ω).   (8.4.18)

Remark 8.13. Again, as ∂Ω was assumed to be smooth enough, the eigenfunction u is, in fact, more regular.
Theorem 8.14. Assume ∅ ≠ Ω ⊂ RN is a bounded domain with smooth boundary ∂Ω and α is a positive constant. Then there exist an increasing sequence of positive eigenvalues λn for (8.4.17) such that λn → +∞, and a complete orthonormal system (in H = L2(Ω)) of eigenfunctions un satisfying problem (8.4.17) with λ = λn, n = 1, 2, . . .
Proof. Again, let H = L2(Ω) equipped with the usual inner product and norm. By the Lax–Milgram Theorem (see Chap. 6) we easily infer that for every f ∈ H = L2(Ω) the problem

−Δu + u = f in Ω,
∂u/∂ν + αu = 0 on ∂Ω,

has a unique solution u ∈ H1(Ω). Now define A : H → H by assigning f → u. It is an easy exercise to check that A is positive and satisfies all the conditions of the Hilbert–Schmidt Theorem. In contrast with the previous Dirichlet case, we have replaced −Δ by −Δ + I in order to ensure the strong positivity (coercivity) of the corresponding bilinear form as well as the compactness of A (based on Theorem 5.22). Therefore there exists a sequence of eigenpairs for A, {(un, μn)}∞n=1, such that |μn| = μn decreases to 0 and {un}∞n=1 is an orthonormal basis in H. The fact that Aun = μnun can be written as

−Δun = λnun in Ω,
∂un/∂ν + αun = 0 on ∂Ω,

where λn = −1 + 1/μn, i.e., (un, λn) is an eigenpair of problem (8.4.17). Note that

λn = λn ∫_Ω un² dx = ∫_Ω |∇un|² dx + α ∫_∂Ω un² ds > 0 ∀n ≥ 1,   (8.4.19)

so (λn)n≥1 is an increasing sequence of positive numbers converging to ∞ (since |μn| = μn decreases to 0).

8.5 Eigenvalues of −Δ Under the Neumann Boundary Condition

Under the same conditions on Ω we consider the Neumann eigenvalue problem

−Δu = λu in Ω,
∂u/∂ν = 0 on ∂Ω,   (8.5.20)

i.e., α > 0 in the Robin eigenvalue problem is replaced by α = 0. The definition of an eigenvalue is the same as before (see Definition 8.12) with α = 0 in (8.4.18). We have a result similar to Theorem 8.14,
which we explain in what follows. One can again consider H = L2 (Ω)
with its usual scalar product and norm, and A : H → H the operator
which associates with each f ∈ H the unique solution u ∈ H1(Ω) of the problem

−Δu + u = f in Ω,
∂u/∂ν = 0 on ∂Ω.
The Hilbert–Schmidt Theorem is again applicable (see also Remark 8.8); thus there exist a decreasing sequence of positive eigenvalues of the operator A, say (μn)n≥0, μn → 0, and a corresponding complete orthonormal system {un}∞n=0, i.e., Aun = μnun for n = 0, 1, 2, . . . So denoting λn = −1 + 1/μn we have

−Δun = λnun in Ω,
∂un/∂ν = 0 on ∂Ω,

for n = 0, 1, 2, . . . , and (λn) is an increasing sequence converging to ∞. We also have (8.4.19) with α = 0, hence λn ≥ 0 for all n ≥ 0. Note that λ0 = 0 is the first eigenvalue of problem (8.5.20), the corresponding eigenfunctions being the nonzero constant functions. Thus λ0 = 0 has multiplicity one (so λ0 = 0 is said to be a simple eigenvalue) and the corresponding normalized eigenfunction is u0 = ±1/√(m(Ω)), where m(Ω) denotes the Lebesgue measure of Ω. Consequently, a result similar to Theorem 8.14 holds, with the only difference that the first eigenvalue is no longer a positive number (it is λ0 = 0).

In fact, the proof can also be done as in the Dirichlet case, as explained below. Denote by V0 the one-dimensional space generated by u0 = 1/√(m(Ω)): V0 = Span{u0} = Span{1}. Obviously, the space H = L2(Ω) can be written as a direct sum

H = V0 ⊕ V1, V1 = V0⊥ = {v ∈ H; ∫_Ω v dx = 0}.

The space V1 is a closed linear subspace of H, so it is a real Hilbert space with respect to the same scalar product and norm. We can use V1 as a basic space to show the existence of (λn, un) for n = 1, 2, . . . Note that W = V1 ∩ H1(Ω) is a real Hilbert space with respect to the scalar product (see (8.5.21) below)

⟨v, w⟩ = ∫_Ω ∇v · ∇w dx ∀v, w ∈ W,

and the corresponding induced norm. Indeed, we can show that

β = inf { ∫_Ω |∇v|² dx; v ∈ W, ∫_Ω v² dx = 1 }
  = inf_{v ∈ W \ {0}} ( ∫_Ω |∇v|² dx ) / ( ∫_Ω v² dx )

(the latter expression is the Rayleigh quotient)

is a positive number. If we assume by way of contradiction that β = 0, then there exists a minimizing sequence (vk)k≥1 in W, ‖vk‖L2(Ω) = 1 ∀k ≥ 1, such that (vk) converges to some v̂ weakly in H1(Ω) and strongly in V1. From

‖∇v̂‖²L2(Ω) = ∫_Ω ∇v̂ · ∇(v̂ − vk) dx + ∫_Ω ∇v̂ · ∇vk dx
            ≤ ∫_Ω ∇v̂ · ∇(v̂ − vk) dx + ‖∇v̂‖L2(Ω) ‖∇vk‖L2(Ω),

we derive

∫_Ω |∇v̂|² dx ≤ lim inf ∫_Ω |∇vk|² dx = 0,

which implies

∫_Ω |∇v̂|² dx = 0,

and so v̂ is a constant function. Since v̂ ∈ V1 it follows that v̂ = 0. On the other hand, one can derive from ‖vk‖L2(Ω) = 1, k ≥ 1, that

‖v̂‖L2(Ω) = 1, a contradiction. Thus β > 0, as claimed. In particular, this implies the following Poincaré-type inequality:

β‖v‖²L2(Ω) ≤ ‖∇v‖²L2(Ω) ∀v ∈ W.   (8.5.21)

Now, according to the Lax–Milgram Theorem, for each f ∈ V1 the problem

−Δu = f in Ω,
∂u/∂ν = 0 on ∂Ω,

has a unique solution u ∈ W. Moreover, the operator A : V1 → V1 defined by Af = u, f ∈ V1 (i.e., A = (−Δ)⁻¹), is positive and satisfies the conditions of the Hilbert–Schmidt Theorem. Therefore the existence of {(λn, un)}∞n=1 is again guaranteed.
Summarizing what we have done so far, we obtain the following result.

Theorem 8.15. Assume ∅ ≠ Ω ⊂ RN is a bounded domain with smooth boundary ∂Ω. Then there exist a sequence of eigenvalues for (8.5.20), 0 = λ0 < λ1 ≤ λ2 ≤ · · · ≤ λn ≤ · · · , such that λn → ∞, and a complete orthonormal system (in H = L2(Ω)) of eigenfunctions un verifying problem (8.5.20) with λ = λn, n = 0, 1, 2, . . . ; in addition, λ0 = 0 is simple and u0 = ±1/√(m(Ω)).

8.6 Some Comments

1. Let f ∈ L2(Ω). The Neumann problem

−Δu = f in Ω,
∂u/∂ν = 0 on ∂Ω,

has a solution (that is unique up to an additive constant) if and only if f ∈ V1 (i.e., ∫_Ω f dx = 0). Sufficiency follows by the Lax–Milgram Theorem, as noticed before, while the converse implication follows by Green’s Identity.

2. Define

λ_1^D = inf { ∫_Ω |∇v|² dx; v ∈ H01(Ω), ∫_Ω v² dx = 1 }
      = inf_{v ∈ H01(Ω) \ {0}} ( ∫_Ω |∇v|² dx ) / ( ∫_Ω v² dx )

(the Rayleigh quotient).

It is easily seen that λ_1^D is positive and is attained for a function u_1^D ∈ W_D = H01(Ω), ‖u_1^D‖L2(Ω) = 1, which is an eigenfunction corresponding to λ_1^D. Moreover, λ_1^D is the first eigenvalue (or principal eigenvalue), i.e., λ_1^D = λ1 given by Theorem 8.11, λ_1^D is simple, and u_1^D is positive within Ω (see [14, Theorem 2, p. 336]). If we define

W_1^D = {v ∈ H01(Ω); ∫_Ω u_1^D v dx = 0},

then

λ_2^D = inf { ∫_Ω |∇v|² dx; v ∈ W_1^D, ∫_Ω v² dx = 1 }

is attained at some u_2^D ∈ W_1^D, ‖u_2^D‖L2(Ω) = 1, which is an eigenfunction corresponding to λ_2^D, with u_2^D ⊥ u_1^D. In general, setting

W_{n−1}^D = {v ∈ H01(Ω); ∫_Ω u_j^D v dx = 0, j = 1, . . . , n − 1}, n ≥ 2,

λ_n^D = inf { ∫_Ω |∇v|² dx; v ∈ W_{n−1}^D, ∫_Ω v² dx = 1 },

we obtain a sequence of eigenpairs (λ_n^D, u_n^D) such that

λ_1^D < λ_2^D ≤ · · · ≤ λ_n^D ≤ · · · , λ_n^D = λn → ∞,

and {u_n^D}∞n=1 is an orthonormal basis in L2(Ω).
This method is an alternative to that described in the proof of the Hilbert–Schmidt Theorem.
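The Rayleigh-quotient characterization behind this alternative method can be illustrated in finite dimensions. In the sketch below (a random symmetric positive definite matrix, an illustrative assumption, not from the text) the infimum of the quotient is the smallest eigenvalue, attained at an eigenvector, and minimizing over the orthogonal complement of that eigenvector produces the next eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)

B = rng.standard_normal((6, 6))
A = B.T @ B + np.eye(6)                  # symmetric positive definite

w, V = np.linalg.eigh(A)                 # w is sorted increasingly

rayleigh = lambda v: (v @ A @ v) / (v @ v)

# the infimum of the Rayleigh quotient is the smallest eigenvalue ...
assert np.isclose(rayleigh(V[:, 0]), w[0])   # ... attained at an eigenvector
for _ in range(300):
    v = rng.standard_normal(6)
    assert rayleigh(v) >= w[0] - 1e-10

# minimizing over the orthogonal complement of V[:, 0] yields the next one
u = rng.standard_normal(6)
u -= (u @ V[:, 0]) * V[:, 0]             # project onto Span{v1}^perp
assert rayleigh(u) >= w[1] - 1e-10
```

Iterating the projection step over the already computed eigenvectors is exactly the finite-dimensional counterpart of the nested spaces W_{n−1}^D above.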

Similar arguments work for the Robin and Neumann eigenvalue problems. We just recall that the lowest positive eigenvalues are given by

λ_1^R = inf { ∫_Ω |∇v|² dx + α ∫_∂Ω v² ds; v ∈ H1(Ω), ∫_Ω v² dx = 1 }
      = inf_{v ∈ H1(Ω) \ {0}} ( ∫_Ω |∇v|² dx + α ∫_∂Ω v² ds ) / ( ∫_Ω v² dx ),   (8.6.22)

λ_1^N = inf { ∫_Ω |∇v|² dx; v ∈ W = V1 ∩ H1(Ω), ∫_Ω v² dx = 1 }
      = inf_{v ∈ W \ {0}} ( ∫_Ω |∇v|² dx ) / ( ∫_Ω v² dx ).

It is readily seen that both λ_1^R and λ_1^N (the latter being equal to β defined before) are positive numbers and are attained for functions u_1^R ∈ H1(Ω) and u_1^N ∈ W, respectively, which are the corresponding eigenfunctions. It is also well known that both λ_1^R and λ_1^N are simple and that u_1^R and u_1^N do not change sign within Ω.

3. For all f ∈ L2(Ω) the Robin problem

−Δu = f in Ω,
∂u/∂ν + αu = 0 on ∂Ω,

where α is a given positive constant, has a unique solution u ∈ H1(Ω). Indeed, by (8.6.22) we have the inequality

λ_1^R ∫_Ω v² dx ≤ ∫_Ω |∇v|² dx + α ∫_∂Ω v² ds ∀v ∈ H1(Ω),   (8.6.23)

which (along with the continuity of the canonical injection of H1(Ω) into L2(∂Ω)) shows that its right-hand side defines a norm equivalent to the usual norm in H1(Ω). So the claim follows from the Lax–Milgram Theorem applied to the bilinear form

(u, v) → ∫_Ω ∇u · ∇v dx + α ∫_∂Ω uv ds.

4. For some particular sets Ω ⊂ RN the eigenpairs (λn, un) can be calculated explicitly. In the one-dimensional case (N = 1), if Ω = (0, 1), the three eigenvalue problems look as follows:

−u″ = λu, 0 < x < 1,
u(0) = u(1) = 0;

−u″ = λu, 0 < x < 1,
u′(0) = u′(1) = 0;

−u″ = λu, 0 < x < 1,
−u′(0) + αu(0) = 0 = u′(1) + αu(1),

where α is a given positive constant. In the first two cases (Dirichlet and Neumann) we obtain by easy computations

λ_n^D = π²n², u_n^D(x) = √2 sin(nπx), n = 1, 2, . . . ;

λ_0^N = 0, u_0^N(x) = 1; λ_n^N = π²n², u_n^N(x) = √2 cos(nπx), n = 1, 2, . . .

In the Robin case we cannot calculate by elementary methods the corresponding eigenpairs (un, λn), n ≥ 1.
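Non-elementary does not mean inaccessible. Inserting u = c1 cos(sx) + c2 sin(sx), with λ = s², into the Robin boundary conditions leads (after eliminating c1, c2) to the transcendental equation (s² − α²) sin s = 2αs cos s, which can be solved numerically; this derivation and the code below are an illustration, not part of the text. The sketch finds the first root by bisection and cross-checks λ1 = s² against a ghost-point finite-difference discretization of the same problem.

```python
import numpy as np

alpha = 1.0
f = lambda s: (s**2 - alpha**2) * np.sin(s) - 2 * alpha * s * np.cos(s)

# bisection: f changes sign on (0, pi), where the first root lies
a, b = 1e-6, np.pi - 1e-6
for _ in range(200):
    m = (a + b) / 2
    if f(a) * f(m) <= 0:
        b = m
    else:
        a = m
lam1 = ((a + b) / 2) ** 2                # first Robin eigenvalue, lambda = s^2

# finite differences on [0, 1]; the Robin conditions enter through
# ghost points, which modify the first and last rows of the matrix
N = 400
h = 1.0 / N
L = np.zeros((N + 1, N + 1))
for i in range(1, N):
    L[i, i - 1] = L[i, i + 1] = -1.0 / h**2
    L[i, i] = 2.0 / h**2
L[0, 0] = L[N, N] = (2.0 + 2.0 * h * alpha) / h**2
L[0, 1] = L[N, N - 1] = -2.0 / h**2

lam_fd = np.sort(np.linalg.eigvals(L).real)[0]

assert np.isclose(lam_fd, lam1, rtol=1e-3)
assert 1.6 < lam1 < 1.8                  # for alpha = 1 the root is s ~ 1.307
```

As α → 0 the computed eigenvalues approach the Neumann values π²n² (with λ0 = 0), and as α → ∞ they approach the Dirichlet ones, consistent with the three problems above.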
5. In the Dirichlet case above, the system {wn = λn^{−1/2} un}∞n=1 is an orthonormal basis in W_D = H01(Ω). Indeed, we can deduce from

−Δun = λnun in Ω,
un = 0 on ∂Ω,   (8.6.24)

that

∫_Ω ∇wn · ∇wk dx = ∫_Ω un uk dx = δnk ∀n, k ≥ 1,

which shows that {wn}∞n=1 is an orthonormal system in W_D. Now, since {un}∞n=1 is complete in H = L2(Ω), any u ∈ H can be written as (see Theorem 6.21)

u = ∑_{n=1}^∞ (u, un)L2(Ω) un = ∑_{n=1}^∞ ( ∫_Ω u un dx ) un,

so, according to (8.6.24),

u = ∑_{n=1}^∞ ( ∫_Ω ∇u · ∇wn dx ) wn.

Thus {wn}∞n=1 is complete in W_D.

Similar statements hold true for the other two cases (Neumann and Robin) within W_N = V1 ∩ H1(Ω) and W_R = H1(Ω), equipped with the scalar products

(w1, w2)_N = ∫_Ω ∇w1 · ∇w2 dx,
(w1, w2)_R = ∫_Ω ∇w1 · ∇w2 dx + α ∫_∂Ω w1 w2 ds.

In fact, these statements on the negative Laplacian with Dirichlet, Neumann or Robin boundary conditions can be derived from the abstract framework we describe below, related to the so-called energetic extension of a linear operator Q satisfying the following assumptions:

(a) Q : D(Q) ⊂ H → H is a linear, densely defined, self-adjoint,


strongly positive operator, where (H, (·, ·),  · ) is a real, infinite di-
mensional, separable Hilbert space.
Define on the vector subspace D(Q) the so-called energetic scalar prod-
uct
(u, v)E = (Qu, v) ∀u, v ∈ D(Q).
It induces the energetic norm on D(Q): u2E = (u, u)E , u ∈ D(Q).
Denote by HE the completion of (D(Q),  · E ). Then HE is a Hilbert
space with respect to the scalar product

(u, v)E := lim (uk , vk )E ,


k→∞

where (uk) and (vk) are sequences in D(Q) converging to u and v, respectively. Since Q is strongly positive, i.e., there exists a constant c > 0 such that

(Qu, u) ≥ c‖u‖² ∀u ∈ D(Q),   (8.6.25)

we have

‖u‖ ≤ (1/√c) ‖u‖_E ∀u ∈ H_E,

so the identity map from H_E to H is continuous (i.e., H_E is continuously embedded in H). Denote by Q_E the Riesz isomorphism from (H_E, ‖ · ‖_E) onto its dual H_E∗, namely,

(Q_E u)(v) = (u, v)_E ∀u, v ∈ H_E.
Identifying H with its dual, we have

D(Q) ⊂ H_E ⊂ H ⊂ H_E∗.

Since D(Q) is dense in H, we see that

Q_E u = Qu ∀u ∈ D(Q),

i.e., Q_E is an extension of Q, which is called the energetic extension. The term energetic will become clear later when we discuss examples. We also assume that

(b) the identity map from H_E into H is compact (i.e., H_E is compactly embedded into H).

Now we can state the following abstract spectral result.

Theorem 8.16. Assume (a) and (b) above are fulfilled. Then there exist an increasing sequence (λn)n≥1 in (0, ∞) converging to ∞, and an orthonormal basis {un}∞n=1 in H such that

un ∈ D(Q), Qun = λnun ∀n ≥ 1.   (8.6.26)

In addition, {λn^{−1/2} un}∞n=1 is an orthonormal basis in H_E (the energetic space defined above).
Proof. We shall adapt the proof of Theorem 8.11 to the present abstract framework.
First of all, note that Q : D(Q) ⊂ H → H is bijective since its extension Q_E : H_E → H_E∗ is. Denote A = Q⁻¹. Obviously, A ∈ L(H), N(A) = {0}, and A is self-adjoint. The operator A is also compact. Indeed, if for some M > 0 we take f ∈ H such that ‖f‖ ≤ M, then we have for u = Af (equivalently Qu = f),

‖u‖²_E = (Qu, u) ≤ ‖f‖ · ‖u‖ ≤ M‖u‖.   (8.6.27)

Combining (8.6.27) with (8.6.25) yields

‖Af‖_E = ‖u‖_E ≤ M/√c,

i.e., A sends bounded sets in H to bounded sets in H_E, hence A is compact (cf. (b)). According to the Hilbert–Schmidt Theorem there exists a sequence of eigenpairs for A = Q⁻¹, {(μn, un)}n≥1, with the known properties, and with μn > 0, n ≥ 1, since Q is strongly positive. Thus, the first part of the theorem follows with {(λn, un)}n≥1, where λn = 1/μn, n = 1, 2, . . .
In order to prove the second part, denote wn = λn^{−1/2} un, n ≥ 1. It follows from (8.6.26) that

(wn, wk)_E = (λnλk)^{−1/2} (Qun, uk) = (λn/λk)^{1/2} (un, uk) = δnk ∀n, k ≥ 1,

i.e., the system {wn}∞n=1 is orthonormal in H_E. Now, since {un}∞n=1 is complete in H, any u ∈ H can be expressed as (cf. Theorem 6.21)

u = ∑_{n=1}^∞ (u, un)un.

So, by virtue of (8.6.26),

u = ∑_{n=1}^∞ (u, wn)_E wn,

and so we can conclude that {wn}∞n=1 is complete in H_E.

Remark 8.17. In addition to the conclusions of Theorem 8.16 one can


show that {λn un }∞ ∗
1/2
n=1 is an orthonormal basis in HE .
For more details on energetic spaces and extensions we refer the reader
to [52, Chapter 5]. See also [22, Chapter 1, p. 18].
Remark 8.18. One can reobtain from Theorem 8.16 the previous statements related to Q = −Δ with the Dirichlet, Neumann or Robin boundary condition.
In the Dirichlet case we have H = L2(Ω) with its usual scalar product and induced norm, D(Q) = H01(Ω) ∩ H2(Ω), and H_E = H01(Ω) with the energetic scalar product (u, v)_E = ∫_Ω ∇u · ∇v dx and ‖u‖_E = √((u, u)_E). Note that H_E is equal to W_D defined above.
In the Neumann case, H = V1 := {v ∈ L2(Ω); ∫_Ω v dx = 0} with the scalar product and norm inherited from L2(Ω), D(Q) = V1 ∩ H2(Ω), and H_E = V1 ∩ H1(Ω) (denoted above by W_N) with (u, v)_E = ∫_Ω ∇u · ∇v dx, ‖u‖²_E = (u, u)_E. Of course, in this case we have an additional eigenvalue λ0 = 0, as specified before.
Finally, in the case of the Robin boundary condition, H = L2(Ω) with its usual scalar product and norm, D(Q) = H2(Ω), and H_E = H1(Ω) (denoted above by W_R) with (u, v)_E = ∫_Ω ∇u · ∇v dx + α ∫_∂Ω uv ds and ‖u‖²_E = (u, u)_E.
There are also many other specific examples covered by Theorem 8.16,
in particular the case Q = −Δ with different conditions on parts of
the boundary of Ω.
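As a numerical sanity check of the Dirichlet case above (a sketch of mine, not part of the text): discretizing Q = −d²/dx² on Ω = (0, 1) with the standard second-order finite-difference stencil yields a symmetric positive definite matrix whose lowest eigenvalues approximate λk = (kπ)².

```python
import numpy as np

# Sketch (my own illustration): finite-difference approximation of
# Q = -d^2/dx^2 on (0, 1) with Dirichlet boundary conditions.
n = 200                      # number of interior grid points
h = 1.0 / (n + 1)
Q = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Q is symmetric positive definite; its lowest eigenvalues approximate
# lambda_k = (k*pi)^2, the Dirichlet eigenvalues of -d^2/dx^2 on (0, 1).
eigvals = np.sort(np.linalg.eigvalsh(Q))
exact = np.array([(k * np.pi)**2 for k in range(1, 6)])
print(eigvals[:5])
print(exact)
```

The corresponding discrete eigenvectors approximate the eigenfunctions sin(kπx), an orthonormal basis of L2(0, 1) up to normalization.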
Remark 8.19. In order to develop the above theory on energetic exten-
sions we can begin with an operator Q which satisfies all the assump-
tions in (a), with one exception: Q is only symmetric, not self-adjoint.
Everything works similarly and HE and QE can be constructed by us-
ing the same arguments. Now define an operator Q̂ : D(Q̂) ⊂ H → H
as follows:
D(Q̂) = {v ∈ HE ; QE v ∈ H}, Q̂v = QE v ∀v ∈ D(Q̂) .
Obviously, Q̂ is an extension of Q so D(Q̂) is dense in H. It is also
easily seen that Q̂ is strongly positive. As QE is bijective so is Q̂ since
it is a restriction of QE . Note also that Q̂−1 ∈ L(H) and is symmetric,
hence self-adjoint. Thus Q̂ is self-adjoint as well. Operator Q̂ is called
the Friedrichs extension of Q. It is easily seen that the energetic space
and the energetic extension defined by Q̂ are exactly HE and QE .
Summarizing, we see that Q̂ satisfies all the conditions in (a) and plays
the role of the former Q. So assuming in (a) that Q is a self-adjoint

operator (not a symmetric one) does not restrict the generality. In


fact, in this case the Friedrichs extension of Q is Q itself.
For example, if we choose H = L2 (Ω) (where Ω ⊂ RN is an open
bounded set with smooth boundary) and D(Q) = C0∞ (Ω), Qu = −Δu,
then Q is symmetric in H (not self-adjoint), the corresponding ener-
getic space is HE = H01(Ω), and QE : HE → H∗_E is given by

QE(u)(v) = ∫_Ω ∇u · ∇v dx ∀u, v ∈ H01(Ω),
i.e., the same energetic extension we had before (see Remark 8.18).
Obviously, the corresponding Friedrichs extension of Q is given by

D(Q̂) = H01 (Ω) ∩ H 2 (Ω), Q̂u = −Δu ∀u ∈ D(Q̂) .

8.7 Exercises
1. Let X denote the real linear space of all polynomials with real
coefficients of degree ≤ 3. Define A : X → X by

(Ap)(x) = x p′(x), x ∈ R, p ∈ X,

where p′ denotes the derivative of p.

(a) Determine N (A) and R(A);


(b) Find all the eigenpairs of A.

2. Let X = C[0, 1] be the usual real Banach space equipped with


the sup-norm. Define on X the operator A by

(Au)(t) = (at + b)u(t), t ∈ [0, 1], u ∈ X,

where a, b are real constants.

(i) Show that A ∈ L(X);

(ii) Find the eigenvalues and eigenvectors of A.

3. Let X be a Banach space over K. Let A, B ∈ L(X) and λ ∈ K,


λ ≠ 0. Prove that λ is an eigenvalue of AB := A ◦ B if and only
if λ is an eigenvalue of BA := B ◦ A.

4. Let X denote the real Banach space C[0, 1] with the usual sup-
norm. Let k = k(t, s) ∈ C([0, 1] × [0, 1]), with ∂k/∂t ∈ C([0, 1] ×
[0, 1]), k(t, t) ≠ 0 ∀t ∈ [0, 1]. Define on X the operator A by

(Au)(t) = ∫_0^t k(t, s)u(s) ds, t ∈ [0, 1].

Show that

(a) A ∈ L(X);
(b) A has no eigenvalue.

Solve the same exercise for X = L2 (0, 1) with the usual norm.

5. Let H = l2 be the usual Hilbert space of sequences x = (x1,
x2, . . . ) in C satisfying Σ_{n=1}^∞ |xn|² < ∞, with the inner product

x, y = Σ_{i=1}^∞ xi ȳi, x = (x1, x2, . . . ), y = (y1, y2, . . . ) ∈ H,

and the corresponding Hilbertian norm. Define the multiplica-


tion operator A by

Ax = (λ1 x1 , λ2 x2 , . . . ) ∀x = (x1 , x2 , . . . ) ∈ H,

where (λn )n∈N is a given sequence in C with supn∈N |λn | < ∞.

(a) Show that A ∈ L(H) and determine ‖A‖;


(b) Show that A is symmetric (hence self-adjoint) ⇐⇒ λn ∈ R
for all n ∈ N;
(c) Find all the eigenvalues of A.

6. Let H = L2 (0, 1) be the real Hilbert space equipped with the


usual scalar product and the induced norm, denoted  · . Define
A : H → H by

(Au)(t) = t ∫_t^1 u(s) ds + ∫_0^t s u(s) ds, 0 ≤ t ≤ 1, u ∈ H.

(a) Check that A ∈ L(H);


(b) Prove that A is a compact operator;
(c) Prove that A is symmetric (hence self-adjoint);

(d) Find all the eigenvalues and eigenvectors (eigenfunctions)


of A and use this information to determine an orthonormal
basis of H.

7. Let (H, (·, ·),  · ) be a Hilbert space. Show that x ∈ H \ {0} is


an eigenvector of A ∈ L(H) ⇐⇒ |(Ax, x)| = ‖Ax‖ · ‖x‖.

8. Let (H, (·, ·),  · ) be a Hilbert space and let u, v ∈ H \ {0} be


two orthogonal vectors (i.e., (u, v) = 0). Define A : H → H by

Ax = (x, v)u + (x, u)v, x ∈ H.

Obviously, A ∈ L(H).

(a) Calculate ‖A‖;


(b) Show that A is symmetric (hence self-adjoint);
(c) Using (a) calculate ‖A‖, where A : L2(−π, π)
→ L2(−π, π) is the linear operator defined by

(Af)(t) = sin t ∫_{−π}^{π} f(s) cos s ds + cos t ∫_{−π}^{π} f(s) sin s ds,
t ∈ [−π, π],
−π

for all f ∈ L2 (−π, π);


(d) Find all the eigenpairs of A.

9. Let (H, (·, ·),  · ) be a Hilbert space and let {e1 , e2 , . . . , em }


⊂ H be an orthonormal system, where m is a given natural
number. Define A : H → H by


Ax = Σ_{i=1}^m ci (x, ei) ei, x ∈ H,

where ci ∈ K \ {0}, i = 1, . . . , m.

(a) Show that A ∈ L(H) and determine ‖A‖, R(A) and N(A);
(b) Show that A is symmetric ⇐⇒ ci ∈ R ∀i ∈ {1, . . . , m};
(c) Determine all the eigenvalues of A.

10. Let H = L2 (0, 1) be the real Hilbert space equipped with the
usual scalar product and norm. Define A : H → H by
(Au)(t) = (t/(1 + t)) ∫_0^1 (s/(1 + s)) u(s) ds, t ∈ [0, 1], u ∈ H.

(a) Show that A ∈ L(H) and A is symmetric (hence self-adjoint);


(b) Determine R(A) and N (A);
(c) Determine all the eigenpairs of A.

11. Let H = L2 (0, 1) be the real Hilbert space equipped with the
usual scalar product and norm. For u ∈ H consider the problem

v′′(t) = u(t) a.e. in (0, 1),

v′(0) = 0, v(1) = 0.

Define A : H → H by Au = v, u ∈ H, where v is the solution of


the above problem corresponding to u.

(a) Show that A ∈ L(H) and N (A) = {0};


(b) Prove that A is symmetric and compact;
(c) Find all the eigenpairs of A and use this information to
determine an orthonormal basis of H.

12. Solve the Dirichlet eigenvalue problem



−Δu = λu in Ω ⊂ R2 ,
u=0 on ∂Ω,

where Ω is the rectangle (0, a) × (0, b) ⊂ R2 , a, b ∈ (0, ∞).

13. Consider in Ω = (0, a) × (0, b) ⊂ R2, a, b ∈ (0, ∞), the eigenvalue


problem for −Δ with Neumann conditions on all sides of the rect-
angle Ω or combinations of Dirichlet and Neumann conditions on
different sides of Ω. Solve all these eigenvalue problems.
Chapter 9

Semigroups of Linear
Operators

Let A be an n × n matrix with entries aij ∈ C for all i, j = 1, 2, . . . , n.


Consider the Cauchy problem

u (t) = Au(t), t ≥ 0, (E)

u(0) = x, (IC)
where x is a given (column) vector in Cn . It is well known that problem
(E), (IC) has a unique solution given by

u(t) = etA x, t ≥ 0, (9.0.1)

where etA represents the fundamental matrix of the linear differential


system (E) which equals I (the n × n identity matrix) for t = 0. We
have
e^{tA} = Σ_{k=0}^∞ (t^k / k!) A^k, (9.0.2)

which is valid for all t ∈ R. Here A and etA can be interpreted as


linear operators A, etA ∈ L(X), where X = Cn , equipped with one of
its equivalent norms, and L(X) denotes, as usual, the space of bounded
linear operators from X into itself. As we will see later, the family of
matrices (operators) {T (t) = etA ; t ≥ 0} is a uniformly continuous
semigroup on X = Cn . What’s more, the family {T (t); t ≥ 0} extends

© Springer Nature Switzerland AG 2019 243


G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4 9

to a group of linear operators, {etA ; t ∈ R}. The representation of the


solution u(t) as
u(t) = T (t)x, t ≥ 0 (9.0.3)
allows the derivation of some properties of solutions from the proper-
ties of the family {T (t); t ≥ 0}. This idea extends easily to the case
when X is a general Banach space and A is a bounded (continuous)
linear operator, A ∈ L(X).
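The matrix case can be experimented with directly. Below is a minimal sketch (the 2×2 matrix A is my own choice) using SciPy's `expm` to realize T(t) = e^{tA} and formula (9.0.1); for this particular A one has A² = −I, so e^{tA} is a rotation matrix and the solution is known in closed form.

```python
import numpy as np
from scipy.linalg import expm

# Sketch (the matrix A is my own choice): u(t) = e^{tA} x solves
# u'(t) = A u(t), u(0) = x, as in formula (9.0.1).
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # here A^2 = -I, so e^{tA} is a rotation
x = np.array([1.0, 0.0])

def u(t):
    return expm(t * A) @ x           # T(t)x with T(t) = e^{tA}

# Closed form for this A: e^{tA} = I cos t + A sin t, hence
# u(t) = (cos t, -sin t).
t = 0.7
print(u(t))
```

For an unbounded generator no such direct exponential is available, which is precisely what the Hille–Yosida theory below addresses.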
If A is not an element of L(X), then the operator exponential etA no
longer makes sense. This case is not trivial, rather it is much more
interesting and very useful in applications. If A : D(A) ⊂ X → X
satisfies certain conditions, then one can associate with A a so-called
C0 -semigroup of linear operators {T (t); t ≥ 0} ⊂ L(X) (see Defi-
nition 9.1 below), so that the solution of the Cauchy problem (E),
(IC) can again be represented by the above formula (9.0.3). Indeed,
there is a central result in the linear semigroup theory, known as the
Hille–Yosida theorem,1 which establishes the necessary and sufficient
conditions for a linear operator A to “generate” a C0 -semigroup of
linear operators {T (t); t ≥ 0} ⊂ L(X). In this way, one can solve
linear partial differential equations of the form (E), where A repre-
sents unbounded linear differential operators with respect to the space
variables, defined on convenient function spaces.
The linear semigroup theory received considerable attention in the
1930s as a new approach in the study of parabolic and hyperbolic
linear partial differential equations. This theory has since developed
as an independent theory with applications in some other fields, such
as ergodic theory, the theory of Markov processes, etc.
In this chapter we present some of the most important results of the
linear semigroup theory and provide some related applications.

9.1 Definitions
Throughout this chapter X will be a Banach space over K with norm
 · , where K is either R or C. Denote as usual by L(X) the space
of all bounded (continuous) linear operators T : X → X, which is a
Banach space with respect to the operator norm
‖T‖ = sup {‖T x‖ : x ∈ X, ‖x‖ ≤ 1}.

1
Carl Einar Hille, American mathematician, 1894–1980; Kosaku Yosida,
Japanese mathematician, 1909–1990.

Definition 9.1. A one-parameter family {T (t); t ≥ 0} ⊂ L(X) is said


to be a semigroup if

(i) T (0) = I (the identity operator on X);

(ii) T (t + s) = T (t)T (s) for all t, s ≥ 0 (the semigroup property).

If, in addition,

(iii) limt→0+ ‖T(t)x − x‖ = 0 for all x ∈ X,

then {T (t); t ≥ 0} is called a C0 -semigroup (or a semigroup of


class C0 , or a strongly continuous semigroup).

Condition (iii) says that the function t → T (t)x is continuous at t = 0,


that is why {T (t); t ≥ 0} is called a C0 -semigroup.

Definition 9.2. A family {T (t); t ≥ 0} ⊂ L(X) is said to be a uni-


formly continuous semigroup if it satisfies conditions (i) and (ii)
above, and

(iii)’ limt→0+ ‖T(t) − I‖ = 0.

Remark 9.3. Obviously, condition (iii)’ is stronger than (iii). Indeed,


for any x ∈ X, we have

‖T(t)x − x‖ ≤ ‖T(t) − I‖ · ‖x‖,

which proves the assertion.

Definition 9.4. Let {T (t); t ≥ 0} ⊂ L(X) be a C0 -semigroup. Denote

Ax := lim_{h→0+} (1/h)[T(h)x − x], (9.1.4)

for all x ∈ X for which the above limit exists. If D(A) is the set of all
such x’s, then we have a linear operator A : D(A) ⊂ X → X, which is
called the infinitesimal generator of the semigroup {T (t); t ≥ 0}.

Now let us state a first result on semigroups of linear operators:

Theorem 9.5. For any operator A ∈ L(X) the family {T (t) = etA ; t ≥
0} is a uniformly continuous semigroup whose infinitesimal generator
is A.

Proof. Recall that


e^{tA} = Σ_{k=0}^∞ (t^k / k!) A^k,

meaning that for any t ≥ 0 this series is convergent in L(X) and its
sum is etA . It is easily seen that the family {T (t) = etA ; t ≥ 0} satisfies
(i) and (ii). Condition (iii) is also satisfied since
‖T(t) − I‖ ≤ Σ_{k=1}^∞ (t^k / k!) ‖A‖^k ≤ t‖A‖ · e^{t‖A‖}

for all t ≥ 0. Note also that for all h > 0

‖h^{-1}[T(h) − I] − A‖ ≤ h‖A‖² e^{h‖A‖},

which shows that A is the infinitesimal generator of {T (t) = etA ;


t ≥ 0}.
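The two estimates in this proof can be observed numerically. A small sketch (matrix, time t, and tolerances are my choices): partial sums of the series (9.0.2) converge to e^{tA}, and the difference quotient h⁻¹(T(h) − I) approaches A as h → 0+.

```python
import numpy as np
from scipy.linalg import expm

# Sketch (matrix, t, and tolerances are my choices): the partial sums of
# the exponential series (9.0.2) converge to e^{tA}, and the difference
# quotient h^{-1}(T(h) - I) converges to the generator A as h -> 0+.
A = np.array([[1.0, 2.0],
              [0.0, -1.0]])
t = 0.5

S, term = np.eye(2), np.eye(2)
for k in range(1, 30):               # partial sum of sum_k (tA)^k / k!
    term = term @ (t * A) / k
    S = S + term

h = 1e-6
quotient = (expm(h * A) - np.eye(2)) / h

print(np.max(np.abs(S - expm(t * A))))      # series truncation error (tiny)
print(np.max(np.abs(quotient - A)))         # O(h), cf. the estimate above
```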

Remark 9.6. We will see later that, in fact, every uniformly continuous
semigroup is a family of operator exponentials {etA ; t ≥ 0} with A ∈
L(X). Note that A can be obtained from the right derivative of T (t) =
etA calculated at t = 0. This explains the above definition of the
generator of a C0 -semigroup {T (t); t ≥ 0}. In this case, we can expect
only the existence of the right derivative at t = 0 of T (t)x for some
points x ∈ X.
Examples of C0 -semigroups (that do not belong to the class of uni-
formly continuous semigroups) will be provided later.

9.2 Some Properties of C0 -Semigroups


We start this section with a basic result in the linear semigroup theory:

Theorem 9.7. If {T (t); t ≥ 0} ⊂ L(X) is a C0 -semigroup, then the


following hold:

(a) there exist constants M ≥ 1 and ω ∈ R such that

‖T(t)‖ ≤ M e^{ωt} ∀t ≥ 0; (9.2.5)

(b) the function t → T (t)x is continuous on [0, ∞) for all x ∈ X.



Proof. Assertion (a): Let us first prove that there exists a constant
δ > 0 such that T (t) is bounded on [0, δ], i.e.,

sup {‖T(t)‖ : 0 ≤ t ≤ δ} =: C < ∞. (9.2.6)

Assume, by way of contradiction, that this is not the case, i.e., there
exists a sequence of real numbers tk ↘ 0 such that ‖T(tk)‖ → ∞. On
the other hand, condition (iii) of Definition 9.1 implies that for each
x ∈ X there exists a natural number N = N (x) such that

‖T(tk)x‖ ≤ ‖T(tk)x − x‖ + ‖x‖ ≤ 1 + ‖x‖, ∀k > N. (9.2.7)

By the Uniform Boundedness Principle, we derive from (9.2.7) that


(‖T(tk)‖) is bounded, which contradicts the assumption above. Thus
(9.2.6) holds true for some δ > 0. Since ‖T(0)‖ = ‖I‖ = 1, we have
C ≥ 1.
Now, for all t ≥ 0 we have the decomposition (division with remainder)

t = nδ + r, n ∈ N, 0 ≤ r < δ.

So, by using condition (ii) of Definition 9.1, we can derive the estimate

‖T(t)‖ ≤ ‖T(δ)‖^n · ‖T(r)‖ ≤ C^{n+1}.

Therefore,
‖T(t)‖ ≤ C · C^{t/δ}, t ≥ 0,
which shows that (9.2.5) holds true with M = C and ω = (ln C)/δ.
Assertion (b): Let t0 > 0 and x ∈ X be arbitrary but fixed. For any
h > 0 we have (cf. condition (ii) from Definition 9.1)

‖T(t0 + h)x − T(t0)x‖ = ‖T(t0)[T(h)x − x]‖
    ≤ ‖T(t0)‖ · ‖T(h)x − x‖,

which shows that the function t → T (t)x is continuous from the right
at t = t0 (cf. condition (iii) of Definition 9.1). Now, for 0 < h < t0 ,
we can write (cf. (ii) and (9.2.5))

‖T(t0 − h)x − T(t0)x‖ = ‖T(t0 − h)[x − T(h)x]‖
    ≤ M e^{ω(t0−h)} ‖x − T(h)x‖,

which implies that t → T (t)x is continuous from the left at t = t0 .



Remark 9.8. In fact, if {T (t); t ≥ 0} ⊂ L(X) is a C0 -semigroup,


one can easily derive the following property that is stronger than (b)
above: the map (t, x) → T (t)x is continuous from [0, ∞) × X to X
(see Exercise 9.4).
Remark 9.9. The constant ω in (9.2.5) determined in the proof above
is nonnegative, but this is not the best constant. Indeed, sometimes ω
can be negative (e.g., this is the case if T (t) = etA , where A is a real
square matrix whose eigenvalues have negative real parts).
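A concrete instance of Remark 9.9 (the example matrix is mine): when all eigenvalues of A have negative real parts, ‖e^{tA}‖ decays, so (9.2.5) holds with some ω < 0 (possibly with M > 1).

```python
import numpy as np
from scipy.linalg import expm

# Sketch of Remark 9.9 (the matrix is my own choice): both eigenvalues of
# A are negative (-1 and -2), so ||e^{tA}|| decays and (9.2.5) holds with
# a negative omega.
A = np.array([[-1.0, 4.0],
              [0.0, -2.0]])
norms = [np.linalg.norm(expm(t * A), 2) for t in (1.0, 5.0, 10.0)]
print(norms)        # strictly decreasing toward 0
```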
Theorem 9.10. Let {T (t) : t ≥ 0} ⊂ L(X) be a C0 -semigroup and
let A be its infinitesimal generator. Then,
(c) A is densely defined: D(A) = X;

(d) A is a closed operator;

(e) for all t ≥ 0, x ∈ D(A), we have T (t)x ∈ D(A) and


d
T (t)x = AT (t)x = T (t)Ax. (9.2.8)
dt

Proof of (c): Obviously,


x = lim_{t→0+} (1/t) ∫_0^t T(s)x ds ∀x ∈ X.

Since D(A) is a linear subspace of X, to prove (c) it suffices to show


that
∫_0^t T(s)x ds ∈ D(A) ∀t > 0, x ∈ X. (9.2.9)
Indeed, for some given t > 0, x ∈ X, and for all h > 0, we have

h^{-1}[T(h) − I] ∫_0^t T(s)x ds = h^{-1} ∫_0^t [T(s + h)x − T(s)x] ds
    = h^{-1} ∫_h^{t+h} T(s)x ds − h^{-1} ∫_0^t T(s)x ds
    = h^{-1} ∫_t^{t+h} T(s)x ds − h^{-1} ∫_0^h T(s)x ds.

Therefore, there exists


lim_{h→0+} h^{-1}[T(h) − I] ∫_0^t T(s)x ds = T(t)x − x, (9.2.10)

which implies (9.2.9).


Proof of (d): Let (xn ) be a sequence in D(A) such that xn → x and
Axn → y. Using (9.2.10), we can write

T(t)xn − xn = lim_{h→0+} ∫_0^t T(s) h^{-1}[T(h)xn − xn] ds
            = ∫_0^t T(s)Axn ds ∀t > 0.

It follows that

T(t)x − x = ∫_0^t T(s)y ds ∀t > 0,

so
lim_{t→0+} t^{-1}[T(t)x − x] = y.

It follows that x ∈ D(A) and y = Ax.


Proof of (e): Let t ≥ 0 and x ∈ D(A). We have

T(t)Ax = lim_{h→0+} T(t){h^{-1}[T(h)x − x]}
       = lim_{h→0+} h^{-1}[T(h)T(t)x − T(t)x],

which shows that T (t)x ∈ D(A) and

T (t)Ax = AT (t)x. (9.2.11)

On the other hand,

lim_{h→0+} h^{-1}[T(t + h)x − T(t)x] = lim_{h→0+} T(t){h^{-1}[T(h)x − x]}
                                     = T(t)Ax. (9.2.12)

From (9.2.11) and (9.2.12) we derive

(d⁺/dt) T(t)x = AT(t)x = T(t)Ax. (9.2.13)
We have used d⁺/dt to denote the right derivative. To conclude, we need
to show that the left derivative of T (t)x exists and equals its right
derivative at any t > 0. For 0 < h < t, we have

‖−h^{-1}[T(t − h)x − T(t)x] − T(t)Ax‖
    = ‖T(t − h){h^{-1}[T(h)x − x] − T(h)Ax}‖
    ≤ M e^{ω(t−h)} {‖h^{-1}[T(h)x − x] − Ax‖ + ‖Ax − T(h)Ax‖}.

It follows that for all t > 0 and x ∈ D(A),

(d⁻/dt) T(t)x = T(t)Ax. (9.2.14)
Obviously, (e) follows from (9.2.13) and (9.2.14).

Remark 9.11. In fact, if A is the generator of a C0 -semigroup {T (t); t ≥


0} ⊂ L(X), then the subspace Y := ∩_{n=1}^∞ D(A^n) is dense in X, where
the operators A^n : D(A^n) → X are inductively defined as follows:

D(A^n) = {x ∈ D(A^{n−1}); A^{n−1}x ∈ D(A)},
A^n x = A(A^{n−1}x) ∀x ∈ D(A^n),

for all n ∈ N, n ≥ 2. Now, for any x ∈ X and φ ∈ C0∞(R), with
supp φ ⊂ (0, +∞), define

x(φ) = ∫_0^∞ φ(t)T(t)x dt.

For h > 0 we have

(1/h)[T(h) − I] x(φ) = (1/h) [∫_0^∞ φ(t)T(t + h)x dt − ∫_0^∞ φ(t)T(t)x dt]
    = (1/h) [∫_h^∞ φ(t − h)T(t)x dt − ∫_0^∞ φ(t)T(t)x dt]
    = ∫_0^∞ ((φ(t − h) − φ(t))/h) T(t)x dt,
which converges to −x(φ′) as h → 0+. Hence x(φ) ∈ D(A) and Ax(φ) =
−x(φ′). We infer by induction that x(φ) ∈ D(A^n) and A^n x(φ) =

(−1)^n x(φ^{(n)}) for all n ∈ N, hence x(φ) ∈ Y. Now, let us prove that
any x ∈ X can be approximated by x(φ) for suitable φ’s (see [49, p.
44]). If ω ∈ C0∞ (R) is the usual test function with supp ω = [−1, +1]
and ∫_{−1}^{+1} ω(t) dt = 1, define the mollifier
 
φε(t) = (1/ε) ω(t/ε − 2) ∀t ∈ R, ε > 0.

Since

‖x(φε) − x‖ = ‖∫_ε^{3ε} φε(t)[T(t)x − x] dt‖
    ≤ ∫_ε^{3ε} φε(t) ‖T(t)x − x‖ dt
    ≤ sup_{t∈[ε,3ε]} ‖T(t)x − x‖ ∫_ε^{3ε} φε(t) dt
    = sup_{t∈[ε,3ε]} ‖T(t)x − x‖.

Therefore
lim_{ε→0+} ‖x(φε) − x‖ = 0.

Theorem 9.12. If two C0 -semigroups have the same infinitesimal


generator, then they coincide.
Proof. Let A be the common generator of two C0 -semigroups, say
{T (t); t ≥ 0} and {S(t); t ≥ 0}. For any t > 0 and x ∈ D(A) we have
(see Theorem 9.10, (e))
(d/ds)[T(t − s)S(s)x]
    = −T(t − s)AS(s)x + T(t − s)AS(s)x = 0, ∀ 0 ≤ s < t.

Hence, for all t > 0 and x ∈ D(A), the function s → T (t − s)S(s)x is


constant on the interval [0, t]. In particular, T (t)x = S(t)x on D(A)
for all t ≥ 0. This concludes the proof since D(A) = X.

Remark 9.13. Property (e) of Theorem 9.10 says that for every x ∈
D(A) the function u(t) = T (t)x is continuously differentiable on [0, ∞)
and satisfies the Cauchy problem

u (t) = Au(t), t ≥ 0; u(0) = x. (CP )



This u, which is a C 1 -solution of problem (CP ) (hence a classical


solution on every bounded interval [0, r] in the sense of Definition 9.44
below) is unique. Indeed, if ũ is also a C 1 -solution of problem (CP ),
then for any t > 0 we have
(d/ds) T(t − s)ũ(s) = −T(t − s)Aũ(s) + T(t − s)ũ′(s) = 0 ∀ s ∈ (0, t),
hence s → T (t − s)ũ(s) is a constant function on [0, t]. In particular,
its values at s = 0 and s = t coincide:
ũ(t) = T (t)ũ(0) = T (t)x,
which proves that the solution of (CP ) is unique and is given by u(t) =
T (t)x, t ≥ 0.
Now, if x ∈ X\D(A), then the function u(t) = T (t)x satisfies the initial
condition u(0) = x, but is no longer differentiable (see Sect. 9.5 below),
so it cannot satisfy the Cauchy problem above in a classical sense.
However, u can be regarded as a generalized solution (or mild solution,
as it will be called later, see Sect. 9.11) since the initial condition is still
satisfied, u(0) = x, and there exists a sequence (un) of C1-solutions
of the equation in (CP), such that un → u in C([0, r]; X) for all r > 0.
Indeed, one can choose a sequence (xn ) in D(A), such that xn → x (cf.
Theorem 9.10, (c)), and obviously un (t) = T (t)xn are all C 1 -solutions
satisfying the required condition:
‖T(t)xn − T(t)x‖ ≤ ‖T(t)‖ · ‖xn − x‖ ≤ M e^{ωr} ‖xn − x‖,
for all t ∈ [0, r]. Clearly, the definition of the generalized solution is
independent of the choice of the sequence (un ) (or (xn = un (0))).
It is worth pointing out that in the discussion above A was assumed to
be the infinitesimal generator of a C0 - semigroup. Now, given a linear
operator A we want to know the conditions on A ensuring the existence
of a C0 -semigroup whose generator is precisely A. This will allow us
to solve Cauchy problems like (CP ) above. From Theorem 9.10 we
know that such an A has to necessarily be densely defined and closed.
The complete answer will be provided later.

9.3 Uniformly Continuous Semigroups


Uniformly continuous semigroups have been defined before. We have
also seen that for any A ∈ L(X), the family {T (t) = etA ; t ≥ 0} is
a uniformly continuous semigroup whose generator is A. According

to Theorem 9.12, this is the unique C0 -semigroup, hence the unique


uniformly continuous semigroup, having A as its generator. The next
result shows that, in fact, the class of uniformly continuous semigroups
reduces to {{etA ; t ≥ 0}; A ∈ L(X)}.

Theorem 9.14. Let {T (t); t ≥ 0} ⊂ L(X) be a uniformly continuous


semigroup. If A is its infinitesimal generator, then A ∈ L(X).

Proof. Since

lim_{t→0+} ‖I − (1/t) ∫_0^t T(s) ds‖ = 0,

there exists a t0 > 0 such that

‖I − B‖ < 1, where B = (1/t0) ∫_0^{t0} T(s) ds.
Therefore, B is invertible and B^{-1} = [I − (I − B)]^{-1} ∈ L(X). Now,
for all h > 0, we have
(1/h)[T(h) − I]B = (1/(h t0)) [∫_0^{t0} T(s + h) ds − ∫_0^{t0} T(s) ds]
    = (1/t0) [(1/h) ∫_{t0}^{t0+h} T(s) ds − (1/h) ∫_0^h T(s) ds].

Therefore, there exists


lim_{h→0+} (1/h)[T(h) − I]B = (1/t0)[T(t0) − I], (9.3.15)
with respect to the topology of L(X). Since the generator of {T (t); t ≥
0} is A, it follows from (9.3.15) that
AB = (1/t0)[T(t0) − I]. (9.3.16)

Since B is invertible and B^{-1} ∈ L(X), we infer from (9.3.16)

A = (1/t0)[T(t0) − I] B^{-1} ∈ L(X).

In fact, every uniformly continuous semigroup {etA ; t ≥ 0}, A ∈ L(X),


can naturally be extended to the group {etA ; t ∈ R} (see the next
section).

Remark 9.15. Let {T (t); t ≥ 0} ⊂ L(X) be a C0 -semigroup whose


infinitesimal generator A : D(A) ⊂ X → X is bounded, i.e., there
exists a constant c > 0 such that ‖Ax‖ ≤ c‖x‖ for all x ∈ D(A).
Then, D(A) = X, A ∈ L(X) and so the semigroup is in fact uniformly
continuous: T (t) = etA , t ≥ 0. Indeed, since D(A) = X, A has an
extension à ∈ L(X). Denote by {T̃ (t) = età ; t ≥ 0} the (uniformly
continuous) semigroup with generator Ã. For an arbitrary t > 0 and
x ∈ D(A), we have
(d/ds)[T̃(t − s)T(s)x]
    = −T̃(t − s)ÃT(s)x + T̃(t − s)AT(s)x = 0 ∀s ∈ (0, t),

since T(s)x ∈ D(A) ⊂ D(Ã) = X and (d/ds) T(s)x = AT(s)x for all
s ∈ (0, t). It follows that the function s → T̃ (t − s)T (s)x is constant
on [0, t], and hence T̃ (t)x = T (t)x for all x ∈ D(A) which shows that
T̃ (t)x = T (t)x for all x ∈ X. Therefore, A coincides with à and the
assertion follows.

9.4 Groups of Linear Operators.


Definitions and Link to Operator
Semigroups
Definition 9.16. A family {G(t); t ∈ R} ⊂ L(X) is called a group
if
(j) G(0) = I (the identity operator on X);

(jj) G(t + s) = G(t)G(s) for all t, s ∈ R (the group property).


If, in addition,
(jjj) limt→0 ‖G(t)x − x‖ = 0 for all x ∈ X,
then {G(t); t ∈ R} is called a C0 -group (or a group of class C0 ).
The infinitesimal generator A of a group {G(t); t ∈ R} is defined
by
Ax = lim_{h→0} (1/h)[G(h)x − x] ∀x ∈ D(A),

where D(A) is the set of all x ∈ X for which the limit above exists.
If {G(t); t ∈ R} satisfies conditions (j), (jj) and, in addition,

(jjj)’ limt→0 ‖G(t) − I‖ = 0,

(which is stronger than (jjj)), then {G(t); t ∈ R} is called a uni-


formly continuous group.
Remark 9.17. If {G(t); t ∈ R} is a C0 -group, then the families {G(t);
t ≥ 0} and {G(−t); t ≥ 0} are both C0 -semigroups, with generators
A and −A, respectively (prove it!). Conversely, if {T+ (t); t ≥ 0},
{T− (t); t ≥ 0} are C0 -semigroups with generators A and −A, respec-
tively, then one can define a C0 -group

T+ (t) if t ≥ 0,
G(t) =
T− (−t) if t < 0,

having A as its generator. The proof of this assertion relies on the


identity
T+ (t)T− (t) = T− (t)T+ (t) = I ∀t ≥ 0. (9.4.17)
Indeed, for any x ∈ D(A) = D(−A) and t ≥ 0, we have (cf. Theo-
rem 9.10, (e))
(d/dt) T+(t)T−(t)x = T+(t)AT−(t)x − T+(t)AT−(t)x = 0,

hence t → T+(t)T−(t)x is a constant function. Since it takes the
value x for t = 0, it follows that

T+ (t)T− (t)x = x ∀t ≥ 0, x ∈ D(A). (9.4.18)

We know that D(A) = X, therefore (9.4.18) holds for all x ∈ X, i.e.,

T+ (t)T− (t) = I ∀t ≥ 0.

Similarly,
T− (t)T+ (t) = I ∀t ≥ 0,
so (9.4.17) holds true. Identity (9.4.17) shows that T+ (t) and T− (t) are
invertible for all t ≥ 0, being inverse to each other. Thus {G(t); t ∈ R}
satisfies the group property (jj). Since (j) and (jjj) are trivially
satisfied, we conclude that {G(t); t ∈ R} constructed above is indeed
a C0 -group, and its generator is A, as claimed.
Note that all the members G(t) of any group are necessarily invertible
operators, since G(t)G(−t) = I = G(−t)G(t). The next result shows
that invertibility allows one to extend any C0 -semigroup to a C0 -group.

It is worth pointing out that if {T (t); t ≥ 0} ⊂ L(X) is a semigroup


and T (t0 ) is a bijection from X to itself (hence T (t0 ) is invertible) for
some t0 > 0, then so is T (t) for all t ≥ 0. Indeed, for t ∈ (0, t0 ), we
have
T (t0 ) = T (t)T (t0 − t) = T (t0 − t)T (t),
which shows that T (t) is bijective. For t > t0 we write t as t = nt0 + s,
where n ∈ N and 0 ≤ s < t0 (division with remainder) and so T (t) =
T (t0 )n T (s), which clearly shows that T (t) is also bijective in this case.

Theorem 9.18. Let {T (t); t ≥ 0} be a C0 -semigroup and let A de-


note its infinitesimal generator. If T (t) is a bijection from X to itself
for all t > 0 (equivalently, T (t0 ) is a bijection for some t0 > 0),
then {T (t)−1 ; t ≥ 0} is a C0 -semigroup with the generator −A, so
{G(t); t ∈ R} defined by

T (t) if t ≥ 0,
G(t) =
T (−t)−1 if t < 0,

is a C0 -group whose generator is A.

Proof. Denote S(t) = T (t)−1 , t ≥ 0. Obviously, S(0) = I and

S(t + s) = [T (s)T (t)]−1 = T (t)−1 T (s)−1 = S(t)S(s),

for all t, s ≥ 0. Thus, the family {S(t) = T (t)−1 ; t ≥ 0} is a semigroup,


and {G(t); t ∈ R} defined in the statement above is a group. Now, let
us prove that the semigroup {S(t) = G(−t); t ≥ 0} satisfies condition
(iii) of Definition 9.1. Let x ∈ X and s > 1. Denote y := T (s)−1 x.
For 0 < t < 1, we have

‖S(t)x − x‖ = ‖G(−t)x − x‖
    = ‖G(−t)G(s)y − T(s)y‖
    = ‖T(s − t)y − T(s)y‖ → 0 as t → 0+,

since t → T (t)y is continuous on [0, ∞). Therefore {S(t); t ≥ 0}


satisfies condition (iii) as claimed, i.e., it is a C0 -semigroup. Let B be
the infinitesimal generator of {S(t) = T (t)−1 ; t ≥ 0}. For x ∈ D(A)
we have
lim_{h→0+} {(1/h)[x − T(h)x] + Ax} = 0.

This implies that


lim_{h→0+} S(h){(1/h)[x − T(h)x] + Ax} = 0,

since ‖S(h)‖ ≤ M1 e^{ω1 h} for some M1 ≥ 0 and ω1 ∈ R (cf. Theorem 9.7,


(a)). Therefore,

lim_{h→0+} h^{-1}[T(h)^{-1}x − x] = −Ax,

i.e., D(A) ⊂ D(B) and Bx = −Ax ∀x ∈ D(A). Since T(t) =
[T(t)^{-1}]^{-1}, t ≥ 0, we also have D(B) ⊂ D(A). Hence, D(A) = D(B)
and Bx = −Ax ∀x ∈ D(A), i.e., B = −A.

Remark 9.19. Let {G(t); t ∈ R} ⊂ L(X) be a group. If for all x ∈ X


the function t → G(t)x is continuous from the right (or from the left)
at some point t = t0 ∈ R, then there exist constants M ≥ 1 and ω ∈ R
such that
‖G(t)‖ ≤ M e^{ω|t|} ∀t ∈ R. (9.4.19)
This follows by the Uniform Boundedness Principle (see the proof of
Theorem 9.7). Moreover, using this estimate and the invertibility of
every G(t), one can easily see that t → G(t)x is continuous on R; even
more, the function (t, x) → G(t)x is continuous from R × X to X.
Remark 9.20. If A ∈ L(X), then {G(t) = etA ; t ∈ R} is a uniformly
continuous group. In fact, it follows from the discussion above that
the class of uniformly continuous groups is precisely {{etA ; t ∈ R}; A ∈
L(X)}.
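For bounded A this group structure is easy to verify numerically. A sketch with an arbitrary 2×2 matrix of my own choosing: G(t) = e^{tA} satisfies the group property, and G(−t) is the inverse of G(t).

```python
import numpy as np
from scipy.linalg import expm

# Sketch (matrix and parameters are my choices): for A in L(X) the family
# G(t) = e^{tA}, t in R, is a group; in particular G(-t) = G(t)^{-1}.
A = np.array([[0.5, 1.0],
              [-1.0, 0.5]])

def G(r):
    return expm(r * A)

t, s = 0.8, -1.3
print(np.allclose(G(t) @ G(-t), np.eye(2)))   # G(t)G(-t) = I
print(np.allclose(G(t + s), G(t) @ G(s)))     # group property (jj)
```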

9.5 Translation Semigroups


In this section we present the first examples of C0 -semigroups which
are not uniformly continuous ones.
Let X be the space of all functions f : [0, ∞) → R which are uniformly
continuous and bounded. The space X is a real Banach space with
respect to the norm
‖f‖∞ = sup_{t≥0} |f(t)|.

For each t ≥ 0 define T(t) : X → X by

(T(t)f)(s) = f(t + s), s ∈ [0, ∞), f ∈ X.

It is easily seen that the family {T (t); t ≥ 0} is a C0 -semigroup. Its


infinitesimal generator is defined by

D(A) = {f ∈ X; f is differentiable on [0, ∞) and f′ ∈ X}, (9.5.20)

Af = f′ ∀f ∈ D(A). (9.5.21)
Indeed, if f ∈ X, f is differentiable on [0, ∞), and f′ ∈ X, then for all
h > 0 and s ≥ 0
(h^{-1}[T(h)f − f])(s) = h^{-1}[f(s + h) − f(s)] = f′(θ),

for some θ ∈ (s, s + h), so

(h^{-1}[T(h)f − f])(s) − f′(s) = f′(θ) − f′(s) → 0,

as h → 0+, uniformly in s (since f′ is uniformly continuous). Therefore,
f ∈ D(A) and Af = f′.
To conclude the proof, we need to show that (9.5.20) holds true, i.e.,
the converse inclusion relation is valid. To this end, let f ∈ D(A),
which means there exists

lim_{h→0+} h^{-1}[T(h)f − f] = lim_{h→0+} h^{-1}[f(· + h) − f(·)] = f′₊ ∈ X,

where f′₊ denotes the right derivative of f. It remains to prove that
f is differentiable on [0, ∞) so that f′₊ = f′. For an arbitrary ε > 0
define
g(t) = f(t) − f(0) − ∫_0^t f′₊(s) ds − εt.
We have g(0) = 0 and

g′₊(t) = −ε < 0 ∀t ≥ 0,

which implies g(t) ≤ 0, which in turn means


f(t) ≤ f(0) + ∫_0^t f′₊(s) ds,

for all t ≥ 0 (since ε was arbitrarily chosen). Similarly, replacing −ε


by +ε, we obtain the converse inequality, so
f(t) = f(0) + ∫_0^t f′₊(s) ds ∀t ≥ 0,

which shows that f is indeed differentiable on [0, ∞) and so f′ = f′₊ ∈


X, as claimed.
The semigroup defined above is called a translation semigroup. Obvi-
ously,
‖T(t)f‖∞ ≤ ‖f‖∞ ∀t ≥ 0,

which shows that ‖T(t)‖ ≤ 1 for all t ≥ 0, i.e., the estimate in Theo-
rem 9.7 holds with M = 1 and ω = 0.
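A discrete sketch of the translation semigroup (my own illustration, with f sampled on a finite grid, so the shift is only a finite-dimensional stand-in): translating the sample vector realizes (T(t)f)(s) = f(t + s), and the sup-norm can only decrease, matching ‖T(t)‖ ≤ 1.

```python
import numpy as np

# Discrete sketch (my own illustration): sample f on a uniform grid and
# realize (T(t)f)(s) = f(t + s) by shifting the sample vector; the
# sup-norm can only decrease, matching ||T(t)|| <= 1.
s = np.linspace(0.0, 20.0, 2001)
ds = s[1] - s[0]
f = np.exp(-s) * np.sin(s)               # uniformly continuous and bounded

def translate(f_vals, t, step):
    k = int(round(t / step))             # shift by k grid points
    return f_vals[k:]                    # samples of f(t + .)

Tf = translate(f, 2.0, ds)
print(np.max(np.abs(Tf)) <= np.max(np.abs(f)))   # contraction in sup-norm
```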
It is worth pointing out that A is not a member of L(X) in this case,
so {T (t); t ≥ 0} is not a uniformly continuous semigroup (see The-
orem 9.14). This confirms the fact that the unit sphere of X is not
equicontinuous (equivalently, condition (iii)’ is not valid).
Remark 9.21. If f ∈ D(A) (see (9.5.20)), then
 
u(t) = u(t, ·) = T (t)f (·) = f (t + ·)

satisfies the Cauchy problem in X



u′(t) = Au(t) ∀t ≥ 0,
u(0) = f,

i.e.,
∂u/∂t (t, s) = ∂u/∂s (t, s), t, s ≥ 0,
u(0, s) = f(s), s ≥ 0.

If f ∈ X is not differentiable, then u(t, s) = f (t + s) does not satisfy


the above partial differential equation in a classical sense; it has to be
interpreted as a generalized solution of the Cauchy problem above.
If X is replaced by the space of all functions f : R → R which are
uniformly continuous and bounded, with the norm

‖f‖∞ = sup_{t∈R} |f(t)|,

then one can define similarly a semigroup of translations, T (t) : X →


X, t ≥ 0,

(T(t)f)(s) = f(t + s) ∀s ∈ R, f ∈ X.

In this case, the family {T (t); t ≥ 0} is again a C0 -semigroup, with


‖T(t)‖ = 1 for all t ≥ 0, and its infinitesimal generator A is given by

D(A) = {f ∈ X; f is differentiable on R and f′ ∈ X},

Af = f′ ∀f ∈ D(A).

It is worth mentioning that this C0 -semigroup can be extended to a


C0 -group {G(t); t ∈ R} defined by
 
(G(t)f)(s) = f(t + s) ∀t, s ∈ R, f ∈ X.

This is not a uniformly continuous group, since its infinitesimal gen-


erator does not belong to L(X).

9.6 The Hille–Yosida Generation


Theorem
Let X be a Banach space and let A : D(A) ⊂ X → X be a linear
closed operator, not necessarily bounded. The set

ρ(A)
= {λ ∈ K; λI − A is a bijective operator from D(A) to X}
(9.6.22)

is called the resolvent set of A. If ρ(A) is nonempty, then, for λ ∈ ρ(A),


denote
R(λ, A) = (λI − A)−1 , (9.6.23)

which is called the resolvent of A. Since A is a closed operator, so is


R(λ, A) for all λ ∈ ρ(A). If we also take into account the fact that
D(R(λ, A)) = X, we infer that R(λ, A) ∈ L(X) for all λ ∈ ρ(A) (cf.
Theorem 4.10 (Closed Graph Theorem)).
Now, let us state a central result in the theory of semigroups of linear
operators, which belongs to E. Hille and K. Yosida.

Theorem 9.22. A linear operator A : D(A) ⊂ X → X is the in-
finitesimal generator of a C0-semigroup of contractions {T(t); t ≥ 0}
(i.e., ‖T(t)‖ ≤ 1 ∀t ≥ 0) if and only if

(k) D(A) is dense in X and A is closed;
(kk) (0, ∞) ⊂ ρ(A) and ‖R(λ, A)‖ ≤ 1/λ  ∀λ > 0.

Proof. Necessity: If A is the generator of a C0-semigroup, then the two
conditions of (k) are fulfilled (cf. Theorem 9.10). It remains to prove
(kk), under the assumption that {T(t); t ≥ 0} is a C0-semigroup of
contractions. To this end, define

Rλ x = ∫_0^∞ e^{−λt} T(t)x dt  ∀λ > 0, x ∈ X.  (9.6.24)

Note that Rλ is well defined, since

‖e^{−λt} T(t)x‖ ≤ e^{−λt} ‖x‖  ∀t ≥ 0.

Furthermore, Rλ ∈ L(X) and

‖Rλ x‖ ≤ ∫_0^∞ e^{−λt} ‖T(t)‖ · ‖x‖ dt
       ≤ ( ∫_0^∞ e^{−λt} dt ) ‖x‖
       = (1/λ) ‖x‖  ∀x ∈ X, λ > 0,

which implies that

‖Rλ‖ ≤ 1/λ  ∀λ > 0.  (9.6.25)
Let us prove that, for all λ > 0 and x ∈ X, Rλ x ∈ D(A). For all h > 0
we have

h^{−1}[T(h) − I]Rλ x = h^{−1} ( ∫_0^∞ e^{−λt} T(t + h)x dt − ∫_0^∞ e^{−λt} T(t)x dt )
                    = ((e^{λh} − 1)/h) Rλ x − (e^{λh}/h) ∫_0^h e^{−λτ} T(τ)x dτ.

Observe that the right-hand side of the last equality converges to
λRλ x − x as h → 0+. Therefore, Rλ x ∈ D(A) and

ARλ x = λRλ x − x,

i.e.,
(λI − A)Rλ = I  ∀λ > 0.  (9.6.26)

On the other hand, for all x ∈ D(A) and t ≥ 0, T(t)x ∈ D(A) (cf.
Theorem 9.10, (e)) and

ARλ x = lim_{h→0+} ∫_0^∞ e^{−λt} h^{−1}[T(t + h)x − T(t)x] dt
      = ∫_0^∞ e^{−λt} T(t)Ax dt
      = Rλ Ax,

hence (see also (9.6.26))

Rλ(λI − A) = I_{D(A)}  ∀λ > 0,  (9.6.27)

where I_{D(A)} is the identity operator on D(A). From (9.6.26) and
(9.6.27) we infer that λI − A is a bijective operator from D(A) to X
and
Rλ = (λI − A)^{−1}  ∀λ > 0.

Therefore, (0, ∞) ⊂ ρ(A) and

Rλ = R(λ, A)  ∀λ > 0,

so (9.6.25) implies that

‖R(λ, A)‖ ≤ 1/λ  ∀λ > 0.
Thus the proof of necessity is complete.
Sufficiency: Assume that both (k) and (kk) hold. For the convenience
of the reader, the proof will be divided into several steps.
Step 1: lim_{λ→∞} λR(λ, A)x = x  ∀x ∈ X.
If x ∈ D(A), then, according to (kk), we have

‖λR(λ, A)x − x‖ = ‖R(λ, A)Ax‖ ≤ (1/λ)‖Ax‖,

which shows that

lim_{λ→∞} λR(λ, A)x = x  ∀x ∈ D(A).  (9.6.28)

Now, if x ∈ X, according to (k), there exists a sequence (xn) in D(A)
such that xn → x. Since

‖λR(λ, A)x − x‖ ≤ ‖λR(λ, A)(x − xn)‖ + ‖λR(λ, A)xn − xn‖ + ‖xn − x‖
              ≤ ‖λR(λ, A)xn − xn‖ + 2‖xn − x‖,

we have (see (9.6.28))

lim sup_{λ→∞} ‖λR(λ, A)x − x‖ ≤ 2‖xn − x‖,

and, since ‖xn − x‖ → 0, this concludes Step 1.
Step 2: Define Aλ := λAR(λ, A), λ > 0 (the Yosida approximation of
A); then, for all λ > 0, Aλ ∈ L(X), and

lim_{λ→∞} Aλ x = Ax  ∀x ∈ D(A).  (9.6.29)

Indeed, since

Aλ x = λ²R(λ, A)x − λx  ∀x ∈ X, λ > 0,

we have Aλ ∈ L(X) and, if x ∈ D(A),

Aλ x = λR(λ, A)Ax  ∀λ > 0.

According to Step 1, this implies (9.6.29), thus the proof of Step 2 is
complete.
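In finite dimensions every matrix A generates a (uniformly continuous) semigroup, and the Yosida approximation is directly computable, so (9.6.29) can be observed numerically. A small sketch (the particular 2×2 matrix is an arbitrary assumption with spectrum in the closed left half-plane):

```python
import numpy as np

# Yosida approximation A_lam = lam^2 R(lam, A) - lam I, where
# R(lam, A) = (lam I - A)^{-1}; here A_lam -> A in operator norm,
# since everything is finite dimensional.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
I = np.eye(2)

def yosida(lam):
    R = np.linalg.inv(lam * I - A)   # resolvent R(lam, A)
    return lam**2 * R - lam * I      # equals lam * A R(lam, A)

errs = [np.linalg.norm(yosida(lam) - A) for lam in (10.0, 100.0, 1000.0)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 1e-2
```

The error behaves like ‖A²‖/λ, in line with A_lam − A = A²R(λ, A).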
Step 3: For all t ≥ 0, x ∈ X, and λ, ν > 0, we have

‖e^{tAλ} x − e^{tAν} x‖ ≤ t‖Aλ x − Aν x‖.  (9.6.30)

First of all, note that for all t ≥ 0 and λ > 0,

‖e^{tAλ}‖ = ‖e^{−λt} e^{tλ²R(λ,A)}‖
          ≤ e^{−λt} · e^{tλ²‖R(λ,A)‖}
          ≤ 1.

It is also easily seen that e^{tAλ}, e^{tAν}, Aλ, Aν commute with each other.
Using this information, we infer that

‖e^{tAλ} x − e^{tAν} x‖ ≤ ∫_0^1 ‖(d/ds)[e^{tsAλ} e^{t(1−s)Aν} x]‖ ds
                       ≤ ∫_0^1 t ‖e^{tsAλ} e^{t(1−s)Aν} (Aλ x − Aν x)‖ ds
                       ≤ t‖Aλ x − Aν x‖,  (9.6.31)

as claimed.
Step 4: The limit limλ→∞ etAλ x =: T (t)x, t ≥ 0, x ∈ X exists, and
{T (t); t ≥ 0} is a C0 -semigroup of contractions having A as its gener-
ator.

First of all, according to Steps 2 and 3, the above limit exists for each
x ∈ D(A), uniformly on compact subintervals of [0, ∞), thus t → T(t)x
is a continuous function on [0, ∞). It is also easily seen that

T(0)x = x,  ‖T(t)x‖ ≤ ‖x‖  ∀x ∈ D(A), t ≥ 0.  (9.6.32)

Obviously, T(t) extends by continuity to the closure of D(A), which is
the whole of X, as a bounded (continuous) operator, and (9.6.32) holds
for all x ∈ X. Moreover, t → T(t)x is continuous on [0, ∞) for all
x ∈ X. Indeed, if (xn) is a sequence in D(A) converging to x, then

‖T(t)xn − T(t)xm‖ = ‖T(t)(xn − xm)‖ ≤ ‖xn − xm‖,

hence T(t)xn → T(t)x uniformly in t, and so the function t → T(t)x is
indeed continuous on [0, ∞). On the other hand,

‖T(t)x − e^{tAλ}x‖ ≤ ‖T(t)x − T(t)xn‖ + ‖T(t)xn − e^{tAλ}xn‖
                     + ‖e^{tAλ}(xn − x)‖
                  ≤ 2‖x − xn‖ + ‖T(t)xn − e^{tAλ}xn‖,

which implies that

T(t)x = lim_{λ→∞} e^{tAλ}x  ∀x ∈ X,

uniformly on compact subintervals of [0, ∞). Since

e^{(t+s)Aλ} x = e^{tAλ} e^{sAλ} x  ∀λ > 0, t, s ≥ 0, x ∈ X,

we have
T(t + s)x = T(t)T(s)x  ∀t, s ≥ 0, x ∈ X.

Thus we have already proved that {T(t); t ≥ 0} is a C0-semigroup of
contractions, and all we have to prove next is that its generator, say
B, coincides with the given operator A. If x ∈ D(A), we have

T (t)x − x = lim etAλ x − x
λ→∞
t
= lim esAλ Aλ x ds
λ→∞ 0
t
= T (s)Ax ds, (9.6.33)
0
9.7 The Lumer–Phillips Theorem 265

since esAλ Aλ x → T (s)Ax uniformly on bounded subintervals of [0, ∞),


as λ → ∞. From (9.6.33) we easily see that D(A) ⊂ D(B) and
Bx = Ax for all x ∈ D(A). Now, by assumption 1 ∈ ρ(A). On
the other hand, according to the forward implication, we also have
1 ∈ ρ(B) (since B is the generator of a C0 -semigroup of contractions).
So both A and B are bijections from D(A) and respectively D(B) to
X. Since Ax = Bx for all x ∈ D(A), it follows that D(A) = D(B)
and A = B.

9.7 The Lumer–Phillips Theorem


In this section we discuss another result which also provides neces-
sary and sufficient conditions for a linear operator to generate a C0 -
semigroup of contractions. This result belongs to Lumer and Phillips²
and is useful in applications. Before stating this result we need the
following definition.

Definition 9.23. A linear operator A : D(A) ⊂ X → X is said to be
dissipative if

λ‖x‖ ≤ ‖λx − Ax‖  ∀λ > 0, x ∈ D(A).  (9.7.34)

If in addition R(λI − A) = X for all λ > 0, then A is called
m-dissipative.

Remark 9.24. If A is a dissipative linear operator, then it is m-dissipative
if and only if there exists a λ0 > 0 such that R(λ0 I − A) = X. Indeed,
by the dissipativity condition (9.7.34) it follows that λ0 I − A is a bijec-
tion between D(A) and X, (λ0 I − A)^{−1} ∈ L(X), and ‖(λ0 I − A)^{−1}‖ ≤
1/λ0. Using this information and Banach's Fixed Point Theorem it
follows easily that R(λI − A) = X for all λ ∈ (0, 2λ0). Obviously, this
interval can be extended indefinitely to the right, so R(λI − A) = X
for all λ > 0.

Theorem 9.25 (Lumer–Phillips). A linear operator A : D(A) ⊂ X → X
is the infinitesimal generator of a C0-semigroup of contractions if
and only if the following conditions hold: (a) D(A) is dense in X, and
(b) A is m-dissipative.

² Günter Lumer, German-born mathematician, 1929–2005; Ralph S. Phillips,
American mathematician, 1913–1998.

Proof. Sufficiency: Assume that both (a) and (b) hold. From (b) it
follows that for every λ > 0 we have λ ∈ ρ(A), R(λ, A) ∈ L(X), and
‖R(λ, A)‖ ≤ 1/λ (see the remark above). Also, A is a closed operator
since (λI − A)^{−1} ∈ L(X) for all λ > 0. It follows by the Hille–Yosida
Theorem that A generates a C0-semigroup of contractions.
Necessity: Assume that A is the generator of a C0-semigroup of con-
tractions {T(t); t ≥ 0}. According to the Hille–Yosida Theorem, it
suffices to show that A is dissipative. Let x ∈ D(A) and x∗ ∈ J(x),
where J is the duality mapping of X. We have

Re x∗(Ax) = lim_{h→0+} Re x∗(h^{−1}[T(h)x − x])
          = lim_{h→0+} h^{−1} [Re x∗(T(h)x) − ‖x‖²]
          ≤ 0,

since
Re x∗(T(h)x) ≤ ‖x∗‖ · ‖T(h)‖ · ‖x‖ ≤ ‖x‖²,

where Re denotes the real part. Therefore,

Re x∗(λx − Ax) = λ‖x‖² − Re x∗(Ax) ≥ λ‖x‖²  ∀λ > 0,

which implies (9.7.34), since λ‖x‖² ≤ Re x∗(λx − Ax) ≤ ‖x∗‖ · ‖λx − Ax‖
= ‖x‖ · ‖λx − Ax‖.

Remark 9.26. A linear operator A : D(A) ⊂ X → X is dissipative if


and only if

∀x ∈ D(A) ∃x∗ ∈ J(x) such that Re x∗ (Ax) ≤ 0. (9.7.35)

From the proof above we see that (9.7.35) implies (9.7.34). For the
proof of the converse implication, see [13, p. 81] or [39, p. 14]. If X
is a Hilbert space, then this implication follows easily. If X is a real
Hilbert space, then (9.7.35) means that A is negative semidefinite:
(Ax, x) ≤ 0 ∀x ∈ D(A) (equivalently, −A is positive semidefinite or
monotone).
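In a real Hilbert space the implication (9.7.35) ⇒ (9.7.34) is elementary: ‖λx − Ax‖² = λ²‖x‖² − 2λ(Ax, x) + ‖Ax‖² ≥ λ²‖x‖² whenever (Ax, x) ≤ 0. A quick numerical sanity check on R⁴ (the random negative semidefinite matrix is an assumption chosen for the experiment):

```python
import numpy as np

# A = -M M^T is negative semidefinite, i.e., (Ax, x) <= 0, hence
# dissipative; we verify lam ||x|| <= ||lam x - Ax|| on sample points.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = -(M @ M.T)
for _ in range(100):
    x = rng.standard_normal(4)
    assert (A @ x) @ x <= 1e-12          # (Ax, x) <= 0
    for lam in (0.5, 1.0, 10.0):
        assert lam * np.linalg.norm(x) <= np.linalg.norm(lam * x - A @ x) + 1e-9
```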
Note that if X is assumed to be reflexive then condition (a) in the
Lumer–Phillips Theorem becomes superfluous, so we have

Theorem 9.27. Assume X is a reflexive Banach space. Then a linear
operator A : D(A) ⊂ X → X is the infinitesimal generator of a C0-
semigroup of contractions if and only if A is m-dissipative.
Proof. Bearing in mind the Lumer–Phillips Theorem, we need to prove
that if X is reflexive and A is m-dissipative (equivalently, A satisfies
(9.7.35) and R(λ0 I − A) = X for some λ0 > 0), then D(A) is dense in X.
Obviously, (0, ∞) ⊂ ρ(A), R(λ, A) ∈ L(X), and ‖R(λ, A)‖ ≤ 1/λ for
all λ > 0. Now, for x ∈ D(A) and λ > 0 denote xλ := λR(λ, A)x. As
in the proof of the Hille–Yosida Theorem, we can see that

‖xλ − x‖ ≤ (1/λ)‖Ax‖,

hence xλ converges to x as λ → +∞. (Note that this property cannot
be extended, for the time being, to all x ∈ X, as in the proof of
the Hille–Yosida Theorem, since the density of D(A) in X is now a
target, not a hypothesis.) It is also easily seen that for x ∈ D(A) and λ > 0

‖Axλ‖ = ‖λR(λ, A)Ax‖ ≤ ‖Ax‖.

Now, by the reflexivity assumption on X we derive the existence of a
sequence λn → ∞ such that (Axλn) converges weakly. Moreover, since
A is m-dissipative, its graph is closed in X × X, hence (being a closed
subspace) weakly closed, so we have

lim_{n→∞} x∗(Axλn) = x∗(Ax)  ∀x∗ ∈ X∗,  (9.7.36)

where X∗ denotes the dual of X. Now, let x∗ ∈ X∗ be such that x∗(x) = 0
for all x ∈ D(A). Since Axλ = λR(λ, A)Ax ∈ D(A) for all λ > 0, we
derive from (9.7.36) that

x∗(Ax) = 0  ∀x ∈ D(A).  (9.7.37)

Taking into account (9.7.37) and R(λ0 I − A) = X, we infer that x∗ = 0.
Therefore D(A) is dense in X, as claimed.

Remark 9.28. The reflexivity of X is an essential assumption in Theo-
rem 9.27, as the following counterexample shows: X = C[0, 1] equipped
with the usual sup-norm (which is a non-reflexive Banach space),
A : D(A) ⊂ X → X, D(A) = {u ∈ C¹[0, 1]; u(0) = 0}, Au = −u′. It
is easily seen that A is m-dissipative, but not densely defined (the
closure of D(A) is {u ∈ C[0, 1]; u(0) = 0}). Hence, according to the
Lumer–Phillips Theorem, A cannot be the generator of a C0-semigroup
in X. This counterexample clearly shows that Theorem 9.27 fails to
hold in non-reflexive Banach spaces.
We close this section with the following result which is valid in a general
Banach space X.

Theorem 9.29. If A : D(A) ⊂ X → X is a closed linear operator
such that D(A) is dense in X and both A and A∗ are dissipative (where
A∗ denotes the adjoint of A), then A is m-dissipative (hence, according to
the Lumer–Phillips Theorem, A is the generator of a C0-semigroup of
contractions).

Proof. Let x∗ ∈ X∗ be such that x∗(x − Ax) = 0 for all x ∈ D(A).
It follows that x∗ ∈ D(A∗) and x∗ − A∗x∗ = 0. Since A∗ is assumed
to be dissipative, we infer that x∗ = 0, so R(I − A) is dense in X. In fact,
R(I − A) is a closed subspace of X (since A is dissipative and closed),
hence R(I − A) = X.

9.8 The Feller–Miyadera–Phillips Theorem
The Hille–Yosida Theorem has the following significant generalization
that belongs to Feller, Miyadera, and Phillips.³

Theorem 9.30. A linear operator A : D(A) ⊂ X → X is the infinites-
imal generator of a C0-semigroup {T(t); t ≥ 0} satisfying ‖T(t)‖ ≤
M e^{ωt}, t ≥ 0, with M ≥ 1, ω ∈ R, if and only if

(k) D(A) is dense in X and A is closed;
(kk)′ (ω, ∞) ⊂ ρ(A) and ‖R(λ, A)^n‖ ≤ M/(λ − ω)^n  ∀λ > ω, n = 1, 2, . . .

Proof. Necessity: This is similar to the necessity part of the proof
of the Hille–Yosida Theorem. Here Rλ is well defined for λ > ω,
and one can similarly prove that (ω, ∞) ⊂ ρ(A), R(λ, A) = Rλ, and
‖R(λ, A)‖ ≤ M/(λ − ω) for all λ > ω. Then, for all λ > ω and x ∈ X,
we have

³ William S. Feller, Croatian-American mathematician, 1906–1970; Isao
Miyadera, Japanese mathematician, born 1926.

R(λ, A)² x = ∫_0^∞ e^{−λt} T(t) Rλ x dt
           = ∫_0^∞ e^{−λt} T(t) ( ∫_0^∞ e^{−λs} T(s)x ds ) dt
           = ∫_0^∞ ∫_0^∞ e^{−λ(t+s)} T(t + s)x ds dt
           = ∫_0^∞ ( ∫_t^∞ e^{−λr} T(r)x dr ) dt
           = ∫_0^∞ t e^{−λt} T(t)x dt.

It follows by induction that for all λ > ω, x ∈ X and n = 1, 2, . . .

R(λ, A)^n x = (1/(n − 1)!) ∫_0^∞ t^{n−1} e^{−λt} T(t)x dt.  (9.8.38)

We derive from (9.8.38) and the exponential estimate satisfied by the
semigroup that for all λ > ω, x ∈ X and n = 1, 2, . . .

‖R(λ, A)^n x‖ ≤ (M/(n − 1)!) ∫_0^∞ t^{n−1} e^{(ω−λ)t} dt · ‖x‖
             = (M/(λ − ω)^n) · ‖x‖,

which completes the proof of necessity.
Sufficiency: To simplify the proof, we note that, in general, if {T(t); t ≥
0} is a C0-semigroup satisfying ‖T(t)‖ ≤ M e^{ωt}, t ≥ 0, for some M ≥ 1
and ω ∈ R, with generator A, then the family {S(t) = e^{−ωt} T(t); t ≥
0} is also a C0-semigroup, with generator A − ωI. Thus, one
can assume in the following that ω = 0 (i.e., (0, ∞) ⊂ ρ(A) and
‖λ^n R(λ, A)^n‖ ≤ M for all λ > 0 and n = 1, 2, . . . ). The idea that
can be used to complete the proof is to define a new norm on X, say
‖ · ‖∗, equivalent to the original one, such that the corresponding operator
norm of R(λ, A) is less than or equal to 1/λ for all λ > 0. Then the
conclusion will follow from the Hille–Yosida Theorem. First, define for
ν > 0 the following norm on X:

‖x‖ν = sup{‖ν^n R(ν, A)^n x‖; n ∈ N ∪ {0}}.

Obviously, the new norm is equivalent to the original one because

‖x‖ ≤ ‖x‖ν ≤ M‖x‖  ∀x ∈ X,  (9.8.39)

and the operator norm of R(ν, A) with respect to the new norm satisfies

‖R(ν, A)‖ν ≤ 1/ν  ∀ν > 0.  (9.8.40)

In addition,

‖R(λ, A)‖ν ≤ 1/λ  for all 0 < λ ≤ ν.  (9.8.41)

This follows easily from (9.8.40) and the so-called resolvent identity:

R(λ, A) − R(ν, A) = (ν − λ)R(ν, A)R(λ, A).  (9.8.42)

Now, define
‖x‖∗ = sup{‖x‖ν; ν > 0},

and observe that (see (9.8.39) and (9.8.41))

‖x‖ ≤ ‖x‖∗ ≤ M‖x‖  and  ‖R(λ, A)‖∗ ≤ 1/λ  ∀λ > 0.

So, according to the Hille–Yosida Theorem, A generates a C0-semigroup
{T(t); t ≥ 0} ⊂ L(X) satisfying

‖T(t)‖∗ ≤ 1  ∀t ≥ 0,

hence
‖T(t)‖ ≤ M  ∀t ≥ 0.

Remark 9.31. Obviously, if ‖ · ‖ν is the norm defined in the proof of
Theorem 9.30, then

‖λ^n R(λ, A)^n x‖ ≤ ‖λ^n R(λ, A)^n x‖ν
                 ≤ ‖x‖ν  ∀ 0 < λ ≤ ν, x ∈ X, n = 0, 1, 2, . . . ,

which implies

‖x‖λ ≤ ‖x‖ν  ∀x ∈ X, 0 < λ ≤ ν.

Therefore, the norm ‖ · ‖∗ can be obtained as a limit:

‖x‖∗ = lim_{λ→∞} ‖x‖λ.

Taking into account the above discussion on groups and their relation-
ship with semigroups, one can easily derive the following extension to
groups of the Feller–Miyadera–Phillips generation theorem.

Theorem 9.32. A linear operator A : D(A) ⊂ X → X is the in-
finitesimal generator of a C0-group {G(t); t ∈ R} satisfying ‖G(t)‖ ≤
M e^{ω|t|}, t ∈ R, with M ≥ 1, ω ∈ R, if and only if

(k) D(A) is dense in X and A is closed;
(kk) for every λ ∈ R with |λ| > ω one has λ ∈ ρ(A) and ‖R(λ, A)^n‖ ≤
M/(|λ| − ω)^n  ∀n = 1, 2, . . .

Remark 9.33. Obviously, if M = 1 in the above theorem, then the in-
equality ‖R(λ, A)^n‖ ≤ 1/(|λ| − ω)^n  ∀n = 1, 2, . . . is equivalent to
‖R(λ, A)‖ ≤ 1/(|λ| − ω). If M = 1 and ω = 0, then ‖G(t)‖ = 1 for all
t ∈ R, or equivalently ‖G(t)x‖ = ‖x‖ for all t ∈ R and x ∈ X (i.e., all
G(t)'s are isometries).
Summarizing, we have the following result.

Corollary 9.34. A linear operator A : D(A) ⊂ X → X is the in-
finitesimal generator of a C0-group of isometries {G(t); t ∈ R} if and
only if

(k) D(A) is dense in X and A is closed;
(kk)* for every λ ∈ R \ {0} one has λ ∈ ρ(A) and ‖R(λ, A)‖ ≤ 1/|λ|.

9.9 A Perturbation Result


It is intuitive that perturbing the generator A of a C0 -semigroup with
any operator B ∈ L(X) yields a generator. Indeed, the following result
holds.

Theorem 9.35. Let A : D(A) ⊂ X → X be the generator of a C0-
semigroup {T(t); t ≥ 0} ⊂ L(X) satisfying ‖T(t)‖ ≤ M e^{ωt} for all
t ≥ 0, with M ≥ 1, ω ∈ R, and let B ∈ L(X). Then the operator
C = A + B with D(C) = D(A) is the generator of a C0-semigroup
{S(t); t ≥ 0} ⊂ L(X) satisfying ‖S(t)‖ ≤ M e^{(ω+M‖B‖)t} for all t ≥ 0.

Proof. As in the proof of the Feller–Miyadera–Phillips Theorem (The-
orem 9.30), one can assume that ω = 0. Next, we also assume that
M = 1. Then (0, ∞) ⊂ ρ(A) and for all λ > 0 we can write

λI − C = [I − BR(λ, A)](λI − A).  (9.9.43)

For all λ > ‖B‖ we have ‖BR(λ, A)‖ ≤ ‖B‖ · ‖R(λ, A)‖ < 1, so
I − BR(λ, A) is invertible in L(X). Thus, taking into account (9.9.43),
we can see that (‖B‖, ∞) ⊂ ρ(C) and for all λ > ‖B‖

R(λ, C) = R(λ, A)[I − BR(λ, A)]^{−1}
        = R(λ, A) Σ_{n=0}^∞ [BR(λ, A)]^n,

which shows that

‖R(λ, C)‖ ≤ 1/(λ − ‖B‖)  ∀λ > ‖B‖.

This is enough to conclude that C generates a C0-semigroup {S(t); t ≥
0} satisfying ‖S(t)‖ ≤ e^{‖B‖t} for all t ≥ 0.
Now, let us consider the general case M ≥ 1 (and ω = 0). Define the
norm
‖x‖∗ = sup_{t≥0} ‖T(t)x‖,

which is equivalent to the original norm of X:

‖x‖ ≤ ‖x‖∗ ≤ M‖x‖  ∀x ∈ X.

Obviously, ‖T(t)‖∗ ≤ 1 for all t ≥ 0 and

‖Bx‖∗ ≤ M‖Bx‖ ≤ M‖B‖ · ‖x‖ ≤ M‖B‖ · ‖x‖∗  ∀x ∈ X.

By the above proof for the case M = 1, C = A + B generates a
C0-semigroup {S(t); t ≥ 0} satisfying

‖S(t)‖∗ ≤ e^{‖B‖∗ t},  t ≥ 0,

where ‖B‖∗ (≤ M‖B‖) denotes the norm of B with respect to ‖ · ‖∗.
Therefore,

‖S(t)x‖ ≤ ‖S(t)x‖∗ ≤ e^{‖B‖∗ t} ‖x‖∗ ≤ M e^{M‖B‖t} ‖x‖  ∀t ≥ 0,

which concludes the proof.



9.10 Approximation of Semigroups


An example of approximation has already been encountered in the
proof of Theorem 9.22. Specifically, we saw that if {T(t); t ≥ 0} ⊂
L(X) is a C0-semigroup of contractions with generator A, then T(t)x
can be approximated (uniformly for t in compact intervals) by e^{tAλ}x as
λ → ∞, where Aλ denotes the Yosida approximation of A. In fact,
this approximation result extends to any C0-semigroup.
In what follows, we present another approximation result, known as
the Trotter Theorem,⁴ which is relevant for applications. As in [39],
for M ≥ 1 and ω ∈ R denote by G(M, ω) the class of operators which
generate C0-semigroups {T(t); t ≥ 0} satisfying ‖T(t)‖ ≤ M e^{ωt} ∀t ≥
0. The Trotter Theorem [48] says that the convergence of a sequence
An ∈ G(M, ω) to A ∈ G(M, ω) in some sense (see below) is equivalent
to the convergence of the corresponding semigroups.
Theorem 9.36. If A, An ∈ G(M, ω) and {T (t); t ≥ 0}, {Tn (t); t ≥ 0}
are the C0 -semigroups generated by A, An (n = 1, 2, . . . ), then the
following conditions are equivalent:
(a) for some λ > ω and for all x ∈ X, R(λ, An )x → R(λ, A)x as
n → ∞;
(b) for all x ∈ X and t ≥ 0, Tn (t)x → T (t)x as n → ∞, uniformly
for t in compact subintervals of [0, ∞).
Proof. We first prove that (a) implies (b). For a given t > 0, every
s ∈ (0, t), and every x ∈ X, we have

(d/ds)[Tn(t − s)R(λ, An)T(s)R(λ, A)x]
  = −Tn(t − s)An R(λ, An)T(s)R(λ, A)x
    + Tn(t − s)R(λ, An)AT(s)R(λ, A)x
  = Tn(t − s)[−An R(λ, An)R(λ, A) + R(λ, An)AR(λ, A)]T(s)x
  = Tn(t − s)[R(λ, A) − R(λ, An)]T(s)x.

Note that all the above operations are allowed. Integrating the above
equality over [0, t] yields

R(λ, An)[Tn(t) − T(t)]R(λ, A)x
  = ∫_0^t Tn(t − s)[R(λ, An) − R(λ, A)]T(s)x ds.  (9.10.44)
⁴ Hale F. Trotter, Canadian mathematician, born 1931.

It follows from (9.10.44) that, for all t in an arbitrary compact interval
[0, t1], one has

‖R(λ, An)[Tn(t) − T(t)]R(λ, A)x‖
  ≤ ∫_0^t ‖Tn(t − s)‖ · ‖[R(λ, An) − R(λ, A)]T(s)x‖ ds
  ≤ ∫_0^{t1} ‖Tn(t − s)‖ · ‖[R(λ, An) − R(λ, A)]T(s)x‖ ds.  (9.10.45)

Note that the sequence of the integrands in (9.10.45) converges point-
wise to zero on [0, t1] and has on this interval the upper bound
2M³e^{ωt1}‖x‖(λ − ω)^{−1}. Thus, according to the Lebesgue Dominated
Convergence Theorem, one gets from (9.10.45)

lim_{n→∞} ‖R(λ, An)[Tn(t) − T(t)]R(λ, A)x‖ = 0,

uniformly for t in every compact subinterval of [0, ∞). In fact, since
the range of R(λ, A) is D(A), we have

lim_{n→∞} ‖R(λ, An)[Tn(t) − T(t)]x‖ = 0  ∀x ∈ D(A),  (9.10.46)

uniformly for t in every compact subinterval of [0, ∞). Now, let us
estimate

‖[Tn(t) − T(t)]R(λ, A)x‖ ≤ ‖Tn(t)[R(λ, A) − R(λ, An)]x‖
                            + ‖R(λ, An)[Tn(t) − T(t)]x‖
                            + ‖[R(λ, An) − R(λ, A)]T(t)x‖.
                                                        (9.10.47)

The right-hand side of (9.10.47) has three terms, say Si = Si(t, n, x),
i = 1, 2, 3. Using our assumption (a) and the estimate ‖Tn(t)‖ ≤
M e^{ωt}, t ≥ 0, we can see that, for each x ∈ X, S1(t, n, x) converges to
zero as n → ∞, uniformly for t in every compact subinterval of [0, ∞).
A similar conclusion holds for S2(t, n, x), x ∈ D(A) (see (9.10.46)).
Taking again assumption (a) into account, it follows that S3(t, n, x),
x ∈ X, also converges to zero as n → ∞, uniformly for t in every
compact subinterval of [0, ∞) (here we use the fact that {T(t)x; 0 ≤
t ≤ t1} is a compact set for each t1 > 0). Summarizing, we derive from
(9.10.47) that

lim_{n→∞} ‖[Tn(t) − T(t)]R(λ, A)x‖ = 0,  x ∈ D(A).

Hence,

lim_{n→∞} ‖[Tn(t) − T(t)]z‖ = 0  ∀z ∈ D(A²),

uniformly on every compact subinterval of [0, ∞). Since D(A²) is dense
in X (see Remark 9.11) and the operators Tn(t) − T(t) are uniformly
bounded on compact t-intervals, this conclusion extends to all z ∈ X,
so (b) holds.
Conversely, assuming now that (b) is satisfied, we have for any λ > ω
and x ∈ X

‖R(λ, An)x − R(λ, A)x‖ = ‖ ∫_0^∞ e^{−λt} [Tn(t)x − T(t)x] dt ‖
                       ≤ ∫_0^∞ e^{−λt} ‖Tn(t)x − T(t)x‖ dt.
                                                        (9.10.48)

Using again Lebesgue's Dominated Convergence Theorem for the right-
hand side of the above inequality, we conclude that indeed (b) implies
(a).

Remark 9.37. It is obvious from the proof above that condition (a)
is equivalent to (a)′: for all x ∈ X and all λ > ω, R(λ, An)x →
R(λ, A)x as n → ∞. If one assumes that, for some λ > ω,
R(λ, An )x converges as n → ∞ to some Rλ x for all x ∈ X, and if in
addition the range of Rλ is assumed to be dense in X, then Rλ is the
resolvent R(λ, A) of an operator A ∈ G(M, ω). For the proof of this
implication, see [24] and [39, p. 86]. This implication can be used to
replace Theorem 9.36 by an improved version, in which the existence
of A ∈ G(M, ω) is no longer assumed. The reformulation of the Trotter
Theorem in view of the above information is left to the reader.
Remark 9.38. It is worth pointing out that the Trotter Theorem or
suitable versions of it can be used successfully in the numerical analysis
of various initial-boundary value problems.
We continue this section with a result known as the Chernoff product
formula.⁵
Theorem 9.39. Let A ∈ G(M, ω) for some M ≥ 1 and ω ∈ R and let
F : [0, ∞) → L(X) be a function satisfying F(0) = I and ‖F(t)^k‖ ≤
M e^{kωt} for all t ≥ 0, k ∈ N. Assume that

Ax = lim_{s→0+} s^{−1}[F(s)x − x]  ∀x ∈ D(A).  (9.10.49)
⁵ Paul R. Chernoff, American mathematician, born 1942.

Then,

T(t)x = lim_{n→∞} F(t/n)^n x,  (9.10.50)

for all x ∈ X, uniformly for t in compact subintervals of [0, ∞), where


{T (t); t ≥ 0} is the C0 -semigroup generated by A.

In order to prove this theorem we need the following lemma.

Lemma 9.40. Let Q ∈ L(X) be such that ‖Q^j‖ ≤ M for some M ≥ 1
and all j ∈ N. Then we have

‖e^{n(Q−I)}x − Q^n x‖ ≤ M √n ‖Qx − x‖  ∀n ∈ N, x ∈ X.

Proof. Let n ∈ N be arbitrary but fixed. We have

e^{n(Q−I)} − Q^n = e^{−n} e^{nQ} − e^{−n} e^{n} Q^n
                = e^{−n} Σ_{k=0}^∞ (n^k/k!)(Q^k − Q^n).  (9.10.51)

Note that for k > n we have

Q^k − Q^n = Σ_{j=n}^{k−1} Q^j (Q − I),

and a similar identity holds for k < n. So we obtain, by using ‖Q^j‖ ≤ M,

‖Q^k x − Q^n x‖ ≤ M |n − k| · ‖Qx − x‖.  (9.10.52)

Now, using (9.10.51), (9.10.52), and the Bunyakovsky–Cauchy–Schwarz
inequality, we derive

‖e^{n(Q−I)}x − Q^n x‖ ≤ e^{−n} Σ_{k=0}^∞ (n^k/k!) M |n − k| · ‖Qx − x‖
  ≤ M e^{−n} ‖Qx − x‖ ( Σ_{k=0}^∞ n^k/k! )^{1/2}
      × ( Σ_{k=0}^∞ (n^k/k!)(n − k)² )^{1/2}
  = M e^{−n} ‖Qx − x‖ (e^n)^{1/2} (n e^n)^{1/2}
  = M √n ‖Qx − x‖.
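The estimate of Lemma 9.40 can be sanity-checked numerically for a concrete power-bounded operator; a plane rotation Q is convenient, since then ‖Q^j‖ = 1 (so M = 1). The small series-based matrix exponential below is an assumed helper, not part of the text:

```python
import numpy as np

# Check ||e^{n(Q-I)}x - Q^n x|| <= sqrt(n) ||Qx - x|| for a rotation Q.
def expm(Mat, terms=60):
    out, term = np.eye(Mat.shape[0]), np.eye(Mat.shape[0])
    for k in range(1, terms):
        term = term @ Mat / k        # Mat^k / k!
        out = out + term
    return out

th = 0.3
Q = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
x = np.array([1.0, 1.0])
for n in (1, 5, 25):
    lhs = np.linalg.norm(expm(n * (Q - np.eye(2))) @ x
                         - np.linalg.matrix_power(Q, n) @ x)
    rhs = np.sqrt(n) * np.linalg.norm(Q @ x - x)
    assert lhs <= rhs + 1e-9
```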

Proof of Theorem 9.39. We consider first the case ω = 0. Define, for
s > 0,
As x = s^{−1}[F(s) − I]x,  x ∈ X.

Obviously, As ∈ L(X) for all s > 0 and (cf. (9.10.49))

lim_{s→0+} As x = Ax  ∀x ∈ D(A).  (9.10.53)

Note also that, for each s > 0,

‖e^{tAs}‖ ≤ e^{−t/s} Σ_{k=0}^∞ (t^k/(s^k k!)) ‖F(s)^k‖ ≤ M  ∀t ≥ 0,  (9.10.54)

i.e., As ∈ G(M, 0). For λ > ω = 0 and y = (λI − A)x, x ∈ D(A), we
have

R(λ, As)y = R(λ, As)[(λI − As)x + (As x − Ax)]
          = x + R(λ, As)(As x − Ax).

Therefore, according to (9.10.53), we have

R(λ, As)y → R(λ, A)y, as s → 0+,  ∀y ∈ X,  (9.10.55)

since ‖R(λ, As)‖ ≤ M/λ. Now, using (9.10.53), (9.10.54), and (9.10.55),
it follows by Theorem 9.36 (which also works with the continuous
parameter s → 0+ in place of n → ∞) that

‖T(t)x − e^{tAs}x‖ → 0, as s → 0+,  ∀x ∈ X,

uniformly for t in compact subintervals of [0, ∞), and hence

‖T(t)x − e^{tA_{t/n}}x‖ → 0, as n → ∞,  ∀x ∈ X,  (9.10.56)

uniformly for t in compact subintervals of [0, ∞). On the other hand,
by Lemma 9.40 (applied with Q = F(t/n), so that tA_{t/n} = n[F(t/n) − I]),
we have

‖e^{tA_{t/n}}x − F(t/n)^n x‖ = ‖e^{n[F(t/n)−I]}x − F(t/n)^n x‖
                            ≤ M √n ‖F(t/n)x − x‖
                            = (Mt/√n) ‖A_{t/n}x‖ → 0, as n → ∞,  (9.10.57)

for all x ∈ D(A), uniformly for t in compact subintervals of [0, ∞).
Combining (9.10.56) and (9.10.57), we derive (9.10.50) for all x ∈
D(A). Since D(A) is dense in X, (9.10.50) extends to the whole of X.

The case ω ≠ 0 can be reduced to the previous one. Indeed, the func-
tion F̃, defined by F̃(t) = e^{−ωt}F(t), satisfies F̃(0) = I and ‖F̃(t)^k‖ ≤ M
for all t ≥ 0 and k ∈ N. Moreover, (9.10.49) is satisfied with F̃ instead
of F, and A − ωI instead of A. So the conclusion of Theorem 9.39
follows easily.

Corollary 9.41. For every A ∈ G(M, ω), M ≥ 1, ω ∈ R, we have

T(t)x = lim_{n→∞} (I − (t/n)A)^{−n} x  ∀x ∈ X,  (9.10.58)

uniformly for t in compact subintervals of [0, ∞), where {T(t); t ≥
0} ⊂ L(X) is the C0-semigroup generated by A.
Proof. We can assume that ω > 0. Define F : [0, ∞) → L(X) by

F(t) = I if t = 0;  F(t) = (1/t)R(1/t, A) if t ∈ (0, δ);  F(t) = 0 if t ≥ δ,

for some δ ∈ (0, 1/ω). We choose δ > 0 small enough so that

‖F(t)^k‖ ≤ (1/t^k) ‖R(1/t, A)^k‖
         ≤ M/[t^k (t^{−1} − ω)^k]
         = M/(1 − ωt)^k
         ≤ M e^{k(ω+1)t}  ∀t ∈ (0, δ), k ∈ N.

We also have

lim_{t→0+} t^{−1}[F(t)x − x] = lim_{t→0+} (1/t)R(1/t, A)Ax = Ax  ∀x ∈ D(A).

Thus, all the assumptions of Theorem 9.39 are fulfilled and so (9.10.58)
holds.
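Formula (9.10.58) is the abstract backward (implicit) Euler scheme. For a matrix generator it can be tested directly; in the sketch below A generates the rotation group, so e^{tA} is known in closed form (this concrete choice is an assumption made for illustration):

```python
import numpy as np

# Backward-Euler product formula:  T(t)x = lim (I - (t/n)A)^{-n} x.
# For A = [[0, 1], [-1, 0]] we have A^2 = -I, hence
# e^{tA} = (cos t) I + (sin t) A, the rotation by angle t.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
t = 1.3
exact = np.cos(t) * I + np.sin(t) * A
x = np.array([1.0, 2.0])

def euler(n):
    step = np.linalg.inv(I - (t / n) * A)        # one implicit Euler step
    return np.linalg.matrix_power(step, n) @ x   # (I - tA/n)^{-n} x

errs = [np.linalg.norm(euler(n) - exact @ x) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]   # first-order convergence, error ~ C/n
assert errs[2] < 1e-2
```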

Another consequence of the Chernoff product formula is the so-called


Trotter product formula corresponding to perturbed semigroups:
Corollary 9.42. Let A ∈ G(M, ω), M ≥ 1, ω ∈ R, and B ∈ L(X). If
{T (t); t ≥ 0} is the C0 -semigroup generated by A, S(t) = etB , t ≥ 0,
and {U (t); t ≥ 0} is the C0 -semigroup generated by A + B, then
U(t)x = lim_{n→∞} [T(t/n)S(t/n)]^n x,  (9.10.59)

for all x ∈ X, uniformly for t in compact subintervals of [0, ∞).



Proof. By Theorem 9.35, A + B ∈ G(M, ω + M‖B‖). Making use of
the previous renorming procedure (see the proof of Theorem 9.30), we
can assume M = 1. So, defining F(t) = T(t)S(t), t ≥ 0, we have

‖F(t)^k‖ ≤ (‖T(t)‖ · ‖S(t)‖)^k ≤ e^{kωt} e^{k‖B‖t} = e^{k(ω+‖B‖)t}  ∀t ≥ 0, k ∈ N,

and for all x ∈ D(A + B) = D(A)

lim_{t→0+} t^{−1}[F(t)x − x] = lim_{t→0+} T(t) (S(t)x − x)/t + lim_{t→0+} (T(t)x − x)/t
                            = Bx + Ax.

Therefore, Theorem 9.39 is again applicable and (9.10.59) follows.
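For matrices, (9.10.59) is the classical Lie-Trotter splitting, and the convergence (at rate O(1/n)) is easy to observe. A self-contained sketch with two assumed non-commuting matrices (the truncated power series serves as a small matrix exponential helper):

```python
import numpy as np

# Lie-Trotter splitting: [e^{(t/n)A} e^{(t/n)B}]^n -> e^{t(A+B)} as n -> oo,
# even though A and B do not commute.
def expm(Mat, terms=40):
    out, term = np.eye(Mat.shape[0]), np.eye(Mat.shape[0])
    for k in range(1, terms):
        term = term @ Mat / k        # Mat^k / k!
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[-0.5, 0.0], [0.0, -0.2]])
t, x = 1.0, np.array([1.0, -1.0])
exact = expm(t * (A + B)) @ x

def trotter(n):
    step = expm((t / n) * A) @ expm((t / n) * B)
    return np.linalg.matrix_power(step, n) @ x

errs = [np.linalg.norm(trotter(n) - exact) for n in (4, 16, 64)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 1e-2
```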

Remark 9.43. The Trotter product formula is valid, under appropriate


conditions, for two general C0 -semigroups (see, e.g., [13, p. 154]).

9.11 The Inhomogeneous Cauchy Problem
Consider the Cauchy (initial value) problem

u′(t) = Au(t) + f(t), t ∈ [0, r];  u(0) = x,  (CP)

where A is the generator of a C0-semigroup {T(t); t ≥ 0} ⊂ L(X), f
is a given function from [0, r] to X, and r ∈ (0, ∞). The case f ≡ 0 was
discussed before.
Definition 9.44. A function u : [0, r] → X is a classical solution of
problem (CP ) if u is continuous on [0, r] and continuously differen-
tiable on (0, r], u(t) ∈ D(A) for all t ∈ (0, r], u(0) = x, and u satisfies
equation (CP )1 for all t ∈ (0, r].
Remark 9.45. If f ∈ C([0, r]; X) and u is a classical solution of problem
(CP), then for 0 < s < t ≤ r we have

(d/ds)[T(t − s)u(s)] = −T(t − s)Au(s) + T(t − s)u′(s)
                     = −T(t − s)Au(s) + T(t − s)Au(s) + T(t − s)f(s)
                     = T(t − s)f(s).

Therefore, by integration over [0, t] one obtains

u(t) = T(t)x + ∫_0^t T(t − s)f(s) ds,  t ∈ [0, r],  (9.11.60)

showing that u is unique (since A generates a unique C0-semigroup;
see Theorem 9.12).
Note also that the integral term in the right-hand side of
Eq. (9.11.60) makes sense for f ∈ L¹(0, r; X), since (see Theorem 9.7)

‖T(t − s)f(s)‖ ≤ M e^{ω(t−s)} ‖f(s)‖,  0 ≤ s ≤ t ≤ r.

This observation leads to the introduction of a new concept of solution
for the Cauchy problem (CP).
Definition 9.46. Let x ∈ X, f ∈ L¹(0, r; X), and let A be the
generator of a C0-semigroup {T(t); t ≥ 0} ⊂ L(X). The function
u ∈ C([0, r]; X) given by

u(t) = T(t)x + ∫_0^t T(t − s)f(s) ds  ∀t ∈ [0, r]  (9.11.61)

is called a mild solution of problem (CP).
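In the scalar case X = R with A = a (so T(t) = e^{at}), the mild solution (9.11.61) can be evaluated by quadrature and compared with the closed-form answer. A minimal sketch with the assumed data a = −1, f ≡ 1, x = 2:

```python
import numpy as np

# Mild solution u(t) = e^{at} x + \int_0^t e^{a(t-s)} f(s) ds for
# a = -1, f = 1, x = 2; exact value: e^{-t} x + (1 - e^{-t}).
a, x, t = -1.0, 2.0, 1.5
s = np.linspace(0.0, t, 2001)
vals = np.exp(a * (t - s))                                   # integrand, f = 1
integral = np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(s))   # trapezoidal rule
u = np.exp(a * t) * x + integral
exact = np.exp(-t) * x + (1.0 - np.exp(-t))
assert abs(u - exact) < 1e-6
```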


Obviously, if A is the generator of a C0 -semigroup {T (t); t ≥ 0}, then
for each (x, f ) ∈ X × L1 (0, r; X) problem (CP ) has a unique mild
solution (since the C0 -semigroup generated by A is unique). Formula
(9.11.61) above is often called the variation of constants formula. Un-
der certain conditions on x and f it gives a classical solution of problem
(CP ). The following theorem is one such example.
Theorem 9.47. Let A : D(A) ⊂ X → X be the infinitesimal gen-
erator of a C0 -semigroup, say {T (t); t ≥ 0}, and let x ∈ D(A) and
f ∈ C 1 ([0, r]; X). Then problem (CP ) has a unique classical solution
(given by (9.11.61)).
Proof. Uniqueness is already known (see the remark above). To prove
existence it suffices to show that

v(t) = ∫_0^t T(t − s)f(s) ds

satisfies equation (CP)₁ on (0, r] (see also Theorem 9.10). Indeed,

v(t) = ∫_0^t T(s)f(t − s) ds

and so there exists

v′(t) = T(t)f(0) + ∫_0^t T(s)f′(t − s) ds
      = T(t)f(0) + ∫_0^t T(t − s)f′(s) ds  ∀t ∈ (0, r].

On the other hand, for each t ∈ (0, r) and h > 0 small enough, we
have

h^{−1}[T(h) − I]v(t) = h^{−1} ∫_0^t T(t + h − s)f(s) ds − h^{−1}v(t)
                    = h^{−1}[v(t + h) − v(t)]
                      − h^{−1} ∫_t^{t+h} T(t + h − s)f(s) ds,

which converges to v′(t) − f(t) as h → 0+. Therefore, v(t) ∈ D(A)
and
Av(t) = v′(t) − f(t)  ∀t ∈ (0, r).

In fact, f can be extended to the right of t = r as a continuously
differentiable function, so v(r) ∈ D(A) and there exists v′(r) = Av(r) +
f(r). Even more, there exists v′(0) = f(0), so the function u(t) =
T(t)x + v(t) is continuously differentiable on [0, r] and satisfies equation
(CP)₁ for all t ∈ [0, r].

Remark 9.48. From the proof above we see that (under the conditions
of Theorem 9.47)

u′(t) = T(t)Ax + T(t)f(0) + ∫_0^t T(t − s)f′(s) ds  ∀t ∈ [0, r].  (9.11.62)

Remark 9.49. Let A be the infinitesimal generator of a
C0-semigroup {T(t); t ≥ 0} and let (x, f) ∈ X × L¹(0, r; X). If u
is the corresponding mild solution of problem (CP), then it is the uni-
form limit of a sequence of C¹-solutions (hence classical solutions) of
(CP). Indeed, let (xn, fn) be a sequence in D(A) × C¹([0, r]; X)
which approximates (x, f) in X × L¹(0, r; X). For each (xn, fn) there
exists a unique C¹-solution un of problem (CP) with x := xn and
f := fn, given by the variation of constants formula:

un(t) = T(t)xn + ∫_0^t T(t − s)fn(s) ds.

By standard arguments one gets, for all t ∈ [0, r],

‖un(t) − u(t)‖ ≤ ‖T(t)(xn − x)‖ + ∫_0^t ‖T(t − s)‖ · ‖fn(s) − f(s)‖ ds
  ≤ M e^{ωt} ‖xn − x‖ + ∫_0^t M e^{ω(t−s)} ‖fn(s) − f(s)‖ ds
  ≤ M e^{ωr} ( ‖xn − x‖ + ∫_0^r ‖fn(s) − f(s)‖ ds ).

Therefore, un → u in C([0, r]; X).
Remark 9.50. The semigroup approach can be used to solve Cauchy
problems for semilinear evolution equations. Specifically, let us con-
sider the following problem,

u′(t) = Au(t) + f(t, u(t)), t ∈ [0, r];  u(0) = x ∈ X,  (NCP)

where A is the infinitesimal generator of a C0-semigroup {T(t); t ≥
0} ⊂ L(X) and f : [0, r] × X → X is continuous and satisfies the
Lipschitz condition

‖f(t, x1) − f(t, x2)‖ ≤ L‖x1 − x2‖,  (t, x1), (t, x2) ∈ [0, r] × X.

Here L is a positive constant. One can consider the following "mild"
form of (NCP):

u(t) = T(t)x + ∫_0^t T(t − s)f(s, u(s)) ds,  t ∈ [0, r].  (9.11.63)
If u is a classical solution of problem (NCP), then it satisfies (9.11.63).
One can prove the existence of a solution u ∈ Y := C([0, r]; X) of
(9.11.63) by using the Banach Contraction Principle. For this purpose,
let us consider the Bielecki norm⁶ on Y:

‖g‖B = sup_{0≤t≤r} e^{−βt} ‖g(t)‖,  g ∈ Y,

where β is a large positive constant. This Bielecki norm is equivalent
to the usual sup-norm of Y, so Y is a Banach space with respect to
‖ · ‖B. Define on Y an operator Q by

(Qu)(t) = T(t)x + ∫_0^t T(t − s)f(s, u(s)) ds,  t ∈ [0, r], u ∈ Y.
⁶ Adam Bielecki, Polish mathematician, 1910–2003.

Obviously, Q maps Y into itself, and for all t ∈ [0, r] and u1, u2 ∈ Y
we have

‖(Qu1)(t) − (Qu2)(t)‖ ≤ LM ∫_0^t e^{ω(t−s)} ‖u1(s) − u2(s)‖ ds
  = LM e^{ωt} ∫_0^t e^{(β−ω)s} e^{−βs} ‖u1(s) − u2(s)‖ ds
  ≤ LM e^{ωt} ‖u1 − u2‖B ∫_0^t e^{(β−ω)s} ds
  = (LM/(β − ω)) ‖u1 − u2‖B (e^{βt} − e^{ωt})
  ≤ (LM/(β − ω)) ‖u1 − u2‖B e^{βt}.

Thus,

e^{−βt} ‖(Qu1)(t) − (Qu2)(t)‖ ≤ (LM/(β − ω)) ‖u1 − u2‖B,
t ∈ [0, r], u1, u2 ∈ Y,

which implies

‖Qu1 − Qu2‖B ≤ (LM/(β − ω)) ‖u1 − u2‖B,  u1, u2 ∈ Y.

So, for β > LM + ω, Q is a contraction and the Banach Contraction


Principle ensures the existence of a unique fixed point u of Q. This u
is the unique solution in Y of Eq. (9.11.63), which can be called a mild
solution of the given semilinear Cauchy problem. In general, a mild
solution is not a classical one. However, under appropriate conditions
on x and f it is.

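The contraction estimate above is easy to observe numerically. The sketch below is our own illustration, not the book's: we take the hypothetical scalar case A = −1 (so T(t)x = e^{−t}x, with M = 1 and ω = −1) and f(t, u) = sin u, which is Lipschitz with L = 1, iterate the map Q on a grid, and measure the ratio of successive differences in the Bielecki norm; the observed factor should stay below the theoretical bound LM/(β − ω).

```python
import numpy as np

# Scalar sketch (our choices, not the book's): A = -1, so T(t)x = e^{-t}x
# (M = 1, omega = -1), and f(t, u) = sin(u), Lipschitz with L = 1.
r, N = 1.0, 2001
t = np.linspace(0.0, r, N)
dt = t[1] - t[0]
x0 = 1.0
L_lip, M, omega = 1.0, 1.0, -1.0
beta = L_lip * M + omega + 1.0          # any beta > L*M + omega works; bound = LM/(beta-omega) = 1/2

def trap(y):
    # composite trapezoidal rule on the uniform grid
    return 0.0 if y.size < 2 else dt * (y.sum() - 0.5 * (y[0] + y[-1]))

def Q(u):
    # mild-solution map (Qu)(t) = T(t)x + int_0^t T(t-s) f(s, u(s)) ds
    fu = np.sin(u)
    return np.array([np.exp(-ti) * x0 + trap(np.exp(-(ti - t[:i + 1])) * fu[:i + 1])
                     for i, ti in enumerate(t)])

def bielecki(g):
    # ||g||_B = sup_t e^{-beta t} |g(t)|
    return np.max(np.exp(-beta * t) * np.abs(g))

u_prev = np.full(N, x0)                 # start the Picard iteration from a constant
u_curr = Q(u_prev)
ratios = []
for _ in range(6):
    u_next = Q(u_curr)
    ratios.append(bielecki(u_next - u_curr) / bielecki(u_curr - u_prev))
    u_prev, u_curr = u_curr, u_next

print("observed contraction factors:", [round(q, 3) for q in ratios])
```

The iterates converge geometrically in the Bielecki norm, exactly as the fixed-point argument predicts.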
9.12 Applications

In this section we illustrate the above theory with some applications.

9.12.1 The Heat Equation
Consider the heat (diffusion) equation

ut = uxx + f(t, x), t ∈ [0, r], x ∈ (0, 1), (9.12.64)
with Dirichlet boundary conditions

u(t, 0) = 0 = u(t, 1), t ∈ [0, r], (9.12.65)

and initial condition

u(0, x) = u₀(x), x ∈ (0, 1), (9.12.66)

where u₀ ∈ L²(0, 1), f ∈ L¹(0, r; L²(0, 1)), and u = u(t, x) is the unknown function representing the temperature (or density in the case of a general diffusion process). We have denoted ut := ∂u/∂t and uxx := ∂²u/∂x². In order to solve problem (9.12.64)–(9.12.66), we choose X = L²(0, 1) as the basic space, equipped with the usual scalar product

⟨p, q⟩ = ∫₀¹ p(x)q(x) dx,

and the corresponding (Hilbertian) norm. Define A : D(A) ⊂ X → X by

D(A) = H²(0, 1) ∩ H₀¹(0, 1), Av = v″ = d²v/dx².

So, regarding u = u(t, x) as an X-valued function of t ∈ [0, r], problem (9.12.64)–(9.12.66) can be expressed as the Cauchy problem in X

(d/dt) u(t, ·) = Au(t, ·) + f(t, ·), t ∈ [0, r]; u(0, ·) = u₀. (9.12.67)

Note that the boundary conditions (9.12.65) are incorporated into the definition of D(A). It turns out that A is the generator of a C₀-semigroup of contractions, say {T(t) : X → X; t ≥ 0}, so there is a unique mild solution u of problem (9.12.64)–(9.12.66) given by the variation of constants formula (see (9.11.61))

u(t, ·) = T(t)u₀(·) + ∫₀ᵗ T(t − s)f(s, ·) ds, t ∈ [0, r]. (9.12.68)

In order to show that A is a generator of a C0 -semigroup of contrac-


tions one could use the Hille–Yosida Theorem. A better option is to
use the Lumer–Phillips Theorem. In fact, as X is a Hilbert (hence
reflexive) space, it suffices to prove that A is an m-dissipative oper-
ator (cf. Theorem 9.27). This means that we do not need to check
the density condition on D(A) (that actually follows by the density of
C0∞ (0, 1) in X and the obvious inclusion relation C0∞ (0, 1) ⊂ D(A)).
As the dissipativeness of A follows trivially, let us prove that for any λ > 0 we have R(λI − A) = X. In other words, for any λ > 0 and g ∈ X, there exists a solution v ∈ H²(0, 1) of the following boundary value problem

λv − v″ = g, v(0) = 0 = v(1).

But this follows easily by imposing the boundary conditions on the general solution of the above differential equation.
One could also use Theorem 9.29 and the fact that A is a self-adjoint operator.
According to Theorem 9.47 (see also its proof), if u₀ ∈ D(A) = H₀¹(0, 1) ∩ H²(0, 1) and f ∈ C¹([0, r]; X) then u ∈ C¹([0, r]; X). Moreover, since u satisfies the heat equation it follows that u ∈ C([0, r]; H²(0, 1)). Note that the condition u₀ ∈ D(A) incorporates the compatibility of u₀ with the boundary conditions: u₀(0) = u₀(1) = 0. It is also worth pointing out that higher regularity of u can be obtained under additional conditions on u₀ and f.
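For this particular A the semigroup can be written down explicitly: the eigenfunctions are eₙ(x) = √2 sin(nπx) with eigenvalues −(nπ)², so T(t)u₀ = Σₙ e^{−(nπ)²t}(u₀, eₙ)eₙ. The sketch below is our own numerical illustration (the initial datum u₀(x) = x(1 − x) is an arbitrary choice); it checks two consequences of the theory, the contraction estimate ‖T(t)u₀‖ ≤ e^{−π²t}‖u₀‖ and the semigroup law T(t + s) = T(t)T(s).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4001)
dx = x[1] - x[0]

def trap(y):
    # composite trapezoidal rule on [0, 1]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def heat_semigroup(u, t, modes=200):
    # T(t)u = sum_n e^{-(n pi)^2 t} (u, e_n) e_n  with  e_n = sqrt(2) sin(n pi x)
    out = np.zeros_like(u)
    for n in range(1, modes + 1):
        e_n = np.sqrt(2.0) * np.sin(n * np.pi * x)
        out += np.exp(-(n * np.pi) ** 2 * t) * trap(u * e_n) * e_n
    return out

def l2norm(u):
    return np.sqrt(trap(u ** 2))

u0 = x * (1.0 - x)                      # hypothetical initial datum of ours
ut = heat_semigroup(u0, 0.1)

# semigroup law: T(0.15) = T(0.1) T(0.05)
a = heat_semigroup(u0, 0.15)
b = heat_semigroup(heat_semigroup(u0, 0.05), 0.1)
print(l2norm(ut) / l2norm(u0), np.max(np.abs(a - b)))
```

Note how the Dirichlet boundary conditions are built into the eigenfunctions: T(t)u₀ vanishes at x = 0 and x = 1 automatically, mirroring the fact that they are encoded in D(A).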

The above discussion can be extended to more dimensions. Specifi-


cally, let Ω ⊂ Rn , n ≥ 2, be a bounded domain with a sufficiently
smooth boundary ∂Ω. Consider the n-dimensional heat equation

ut = Δu + f (t, x), t ∈ [0, r], x ∈ Ω,

and associate with it the homogeneous Dirichlet boundary condition

u = 0 on ∂Ω,

and the initial condition

u(0, x) = u0 (x), x ∈ Ω.

We have denoted by Δ the classical Laplacian with respect to x. Let


X = L2 (Ω) and let A = Δ with D(A) = H01 (Ω) ∩ H 2 (Ω). So the above
initial-boundary value problem can be viewed as a Cauchy problem in
X. The fact that A is a dissipative operator follows from Green’s for-
mula, and its m-dissipativity can be derived by using the Lax–Milgram
Theorem. The reader is encouraged to continue the discussion and de-
rive existence, uniqueness, and regularity of the solution to the above
problem. The reader could also consider the case of the homogeneous
Neumann or Robin boundary condition and investigate it along the
same lines.
9.12.2 The Wave Equation

Consider in a first stage the one-dimensional wave equation

utt − uxx = f(t, x), t ≥ 0, x ∈ (0, 1), (9.12.69)

with the homogeneous Dirichlet boundary conditions

u(t, 0) = 0 = u(t, 1), t ≥ 0, (9.12.70)

and initial conditions

u(0, x) = u₀(x), ut(0, x) = v₀(x), x ∈ (0, 1). (9.12.71)

Recall that this problem describes the evolution of the displacement u(t, x) of an elastic string fixed at both its ends (x = 0 and x = 1), where f(t, x) represents an external force.
Denoting v = ut, problem (9.12.69)–(9.12.71) can be equivalently written as

∂/∂t [u, v] = [v, uxx + f], t ≥ 0, x ∈ (0, 1),
u(t, 0) = u(t, 1) = 0, t ≥ 0,
[u, v](0, x) = [u₀(x), v₀(x)], x ∈ (0, 1).

Let X = H₀¹(0, 1) × L²(0, 1) (the so-called phase space), which is a real Hilbert space with the scalar product

⟨[p₁, q₁], [p₂, q₂]⟩ = ∫₀¹ p₁′p₂′ dx + ∫₀¹ q₁q₂ dx,

and the induced norm. Define A : D(A) ⊂ X → X by

D(A) = [H₀¹(0, 1) ∩ H²(0, 1)] × H₀¹(0, 1), A[p, q] = [q, p″].

Thus the above problem can be expressed as the following Cauchy problem in X

(d/dt)[u(t, ·), v(t, ·)] = A[u(t, ·), v(t, ·)] + [0, f(t, ·)], t ≥ 0,
[u(0, ·), v(0, ·)] = [u₀, v₀].

Denote this Cauchy problem by (CP). In order to derive existence results for (CP), we are going to show in what follows that A is the generator of a C₀-group of isometries. For this purpose, we can use Corollary 9.34.
First, noting that C₀^∞(0, 1) is dense in H₀¹(0, 1) as well as in L²(0, 1), and C₀^∞(0, 1) × C₀^∞(0, 1) ⊂ D(A), we infer that the closure of D(A) in X equals X. It is also easily seen that A is a closed operator. So we need only to show that condition (kk)* of Corollary 9.34 is fulfilled.
Let λ ∈ (−∞, 0) ∪ (0, ∞) and let [g, h] be an arbitrary pair in X. We claim that there exists a unique [p, q] ∈ D(A) such that

λ[p, q] − A[p, q] = [g, h], (9.12.72)

or, equivalently, there exists a unique p ∈ H₀¹(0, 1) ∩ H²(0, 1) satisfying the equation

λ²p − p″ = h + λg.

We know from the preceding discussion on the heat equation that the last assertion is true. We also have q = λp − g ∈ H₀¹(0, 1), which concludes the proof of our claim. Hence λI − A is invertible.
Now, multiplying Eq. (9.12.72) by [p, q] and taking into account the definition of A we get

λ‖[p, q]‖²_X − ∫₀¹ q′p′ dx + ∫₀¹ p′q′ dx = ⟨[g, h], [p, q]⟩,

where the two integrals (coming from ⟨A[p, q], [p, q]⟩ after an integration by parts) cancel each other, which implies

|λ| · ‖[p, q]‖²_X = |⟨[g, h], [p, q]⟩| ≤ ‖[g, h]‖_X · ‖[p, q]‖_X.

Therefore,

|λ| · ‖(λI − A)⁻¹[g, h]‖_X ≤ ‖[g, h]‖_X ∀[g, h] ∈ X, λ ∈ (−∞, 0) ∪ (0, ∞),

and so

‖(λI − A)⁻¹‖ ≤ 1/|λ| ∀λ ∈ (−∞, 0) ∪ (0, ∞).

Thus, according to Corollary 9.34, A generates a group of isometries, say {G(t); t ∈ R} ⊂ L(X). Therefore, for all [u₀, v₀] ∈ X and f ∈ L¹loc([0, ∞); X) there exists a unique mild solution [u, v] of (CP) given by the variation of constants formula

[u(t, ·), v(t, ·)] = G(t)[u₀, v₀] + ∫₀ᵗ G(t − s)[0, f(s, ·)] ds, t ≥ 0,
hence u ∈ C([0, ∞); H₀¹(0, 1)). This u can be viewed as a generalized solution of problem (9.12.69)–(9.12.71). If [u₀, v₀] ∈ D(A) = [H₀¹(0, 1) ∩ H²(0, 1)] × H₀¹(0, 1) and f ∈ C¹([0, ∞); L²(0, 1)), then [u, v] ∈ C¹([0, ∞); X) (cf. Theorem 9.47). It follows that u ∈ C²([0, ∞); L²(0, 1)) ∩ C¹([0, ∞); H₀¹(0, 1)) ∩ C([0, ∞); H²(0, 1)) and u is a classical solution of problem (9.12.69)–(9.12.71).
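In the basis eₙ(x) = √2 sin(nπx) the wave group decouples into harmonic oscillators: mode n of [u, v] rotates with frequency ωₙ = nπ, and the phase-space norm ‖[u, v]‖²_X = Σ ωₙ²aₙ² + Σ bₙ² (the H₀¹ × L² norm) is conserved, which is the "group of isometries" statement in coordinates. A quick numerical check of ours (the mode coefficients below are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(1, 51)
w = n * np.pi                            # mode frequencies w_n = n pi
a = rng.normal(size=50) / n ** 2         # position coefficients (arbitrary)
b = rng.normal(size=50) / n              # velocity coefficients (arbitrary)

def G(t, a, b):
    # action of the wave group on the mode coefficients (harmonic oscillator)
    return (a * np.cos(w * t) + (b / w) * np.sin(w * t),
            -a * w * np.sin(w * t) + b * np.cos(w * t))

def phase_norm2(a, b):
    # ||[u, v]||_X^2 in mode coordinates
    return np.sum(w ** 2 * a ** 2) + np.sum(b ** 2)

E0 = phase_norm2(a, b)
energies = [phase_norm2(*G(t, a, b)) for t in np.linspace(0.0, 3.0, 7)]

# group law: G(0.4) = G(0.3) G(0.1)
a1, b1 = G(0.4, a, b)
a2, b2 = G(0.3, *G(0.1, a, b))
print(E0, max(abs(E - E0) for E in energies))
```

Mode by mode, ωₙ²(aₙ cos ωₙt + (bₙ/ωₙ) sin ωₙt)² + (−aₙωₙ sin ωₙt + bₙ cos ωₙt)² = ωₙ²aₙ² + bₙ², so the conservation is exact up to floating-point rounding.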
The above discussion can be extended to the n-dimensional case

utt − Δu = f(t, x), t ≥ 0, x ∈ Ω,
u(t, x) = 0, t ≥ 0, x ∈ ∂Ω,
u(0, x) = u₀(x), ut(0, x) = v₀(x), x ∈ Ω,

where Ω ⊂ Rⁿ, n ≥ 2, is a bounded domain with sufficiently smooth boundary ∂Ω, and Δ is the Laplacian with respect to x. In this case, using the substitution v = ut again, the above initial-boundary value problem can similarly be expressed as a Cauchy problem in the phase space X = H₀¹(Ω) × L²(Ω), associated with the operator A : D(A) ⊂ X → X defined by

D(A) = [H₀¹(Ω) ∩ H²(Ω)] × H₀¹(Ω), A[p, q] = [q, Δp].

One can again use Corollary 9.34 to prove that A generates a C₀-group of isometries on X. In particular, to show that Eq. (9.12.72) has a solution in D(A) we need to use Green's formula (instead of integration by parts) and Lax–Milgram. The rest follows similarly. The case of the homogeneous Neumann or Robin boundary condition can be addressed in a similar manner.
9.12.3 The Transport Equation

Let a be a given vector in Rⁿ, n ≥ 1. Consider the equation

ut + a · ∇u = f(t, x), t ≥ 0, x ∈ Rⁿ, (9.12.73)

with the initial condition

u(0, x) = u₀(x), x ∈ Rⁿ, (9.12.74)

where ∇u means the gradient of u with respect to x, and a · b denotes the usual scalar product of a, b ∈ Rⁿ. Equation (9.12.73) is known as the transport equation. The case a = 0 is trivial, so in what follows we assume a ≠ 0 (i.e., a = (a₁, . . . , aₙ) contains nonzero components).
Let us choose X = Lᵖ(Rⁿ), p ∈ (1, ∞), equipped with the usual norm. If f ≡ 0 and u₀ is a smooth function, then the solution of problem (9.12.73) and (9.12.74) is given by

u(t, x) = u₀(x − ta), t ≥ 0, x ∈ Rⁿ.

This formula leads us to the definition of the semigroup {T(t) : X → X; t ≥ 0},

(T(t)v)(x) = v(x − ta), v ∈ X, x ∈ Rⁿ, t ≥ 0.

It is easily seen that this is a semigroup of isometries, of class C₀:

lim_{t→0+} ‖T(t)v − v‖ᵖ_X = lim_{t→0+} ∫_{Rⁿ} |v(x − ta) − v(x)|ᵖ dx = 0, ∀v ∈ X,

by virtue of the Lebesgue Dominated Convergence Theorem.
In order to determine its infinitesimal generator A : D(A) ⊂ X → X, consider Eq. (9.12.73) with f ≡ 0 and deduce Av = −a · ∇v for all v ∈ D(A). This follows from the fact that the right derivative of t ↦ T(t)v at t = 0 is equal to Av. Indeed, if v ∈ C₀^∞(Rⁿ) (which is dense in X), then v ∈ D(A) and

lim_{h→0+} ‖h⁻¹[T(h)v − v] + a · ∇v‖ᵖ_X = lim_{h→0+} ∫_{Rⁿ} |h⁻¹[v(x − ha) − v(x)] + a · ∇v(x)|ᵖ dx = 0,

by virtue of the Mean Value Theorem (which ensures uniform convergence as h → 0+ under the above integral). Since the range of A must be a subset of X, the maximal domain of A is

D(A) = {v ∈ X; ∂v/∂xᵢ ∈ X for all i ∈ {1, . . . , n} for which aᵢ ≠ 0},

where ∂v/∂xᵢ denotes the partial derivative of v with respect to xᵢ in the sense of distributions. Since C₀^∞(Rⁿ) is dense in X and C₀^∞(Rⁿ) ⊂ D(A), it follows that D(A) is dense in X. Obviously, A is a closed operator. We can use Theorem 9.29 to prove that A is a generator (the generator of {T(t) : X → X; t ≥ 0}). Indeed, for all u ∈ D(A) \ {0}
and u* = J(u) = ‖u‖_X^{2−p} |u|^{p−2}u (here J denotes the duality mapping of X), we have

u*(Au) = ‖u‖_X^{2−p} ∫_{Rⁿ} Au · |u|^{p−2}u dx
  = −‖u‖_X^{2−p} Σ_{i=1}^n aᵢ ∫_{Rⁿ} (∂u/∂xᵢ) |u|^{p−2}u dx
  = −(1/p) ‖u‖_X^{2−p} Σ_{i=1}^n aᵢ ∫_{Rⁿ} (∂/∂xᵢ)|u|ᵖ dx
  = 0,

so A is dissipative. To derive the last equality, we have used the fact that the function g(xᵢ) = ∫_{R^{n−1}} |u|ᵖ dx₁ . . . dxᵢ₋₁ dxᵢ₊₁ . . . dxₙ belongs to W^{1,1}(R), so g(xᵢ) → 0 as |xᵢ| → ∞ (prove it, or see [6, Corollary 8.9, p. 214]). Let X* = L^q(Rⁿ) be the dual of X (i.e., 1/p + 1/q = 1). The adjoint A* of A is defined by

D(A*) = {w ∈ X*; ∂w/∂xᵢ ∈ X* ∀i ∈ {1, . . . , n} for which aᵢ ≠ 0},
A*w = a · ∇w.

By a computation similar to that performed above for operator A, we infer that operator A* is also dissipative. Thus, according to Theorem 9.29, A is m-dissipative, hence it is indeed the generator of {T(t) : X → X; t ≥ 0}. In fact, this semigroup extends to a C₀-group of isometries,

(T(t)v)(x) = v(x − ta), x ∈ Rⁿ, t ∈ R,

with generator A (see Sect. 9.4).
Therefore, for all u₀ ∈ X = Lᵖ(Rⁿ) and f ∈ L¹(0, ∞; X), problem (9.12.73) and (9.12.74) has a unique mild solution u,

u(t, x) = (T(t)u₀)(x) + ∫₀ᵗ (T(t − s)f(s, ·))(x) ds
  = u₀(x − ta) + ∫₀ᵗ f(s, x − (t − s)a) ds, ∀t ≥ 0.

If u₀ ∈ D(A) and f ∈ C¹([0, ∞); X), then u ∈ C¹([0, ∞); X) and u is a classical solution, with the additional property a · ∇u ∈ C([0, ∞); X).
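The two defining features of the translation group — it preserves the Lᵖ norm and satisfies the group law T(t + s) = T(t)T(s) — can be checked directly. The sketch below is our own illustration with the hypothetical choices n = 1, a = 2, p = 3, and a Gaussian profile (negligible outside the grid):

```python
import numpy as np

a_vec, p = 2.0, 3.0                      # n = 1, a = 2, p = 3: our choices
x = np.linspace(-30.0, 30.0, 120001)
dx = x[1] - x[0]
v = lambda y: np.exp(-y ** 2)            # smooth profile, negligible at +-30

def T(t, f):
    # translation group (T(t)f)(y) = f(y - t a)
    return lambda y: f(y - t * a_vec)

def lp_norm(f):
    # Riemann-sum approximation of the L^p norm on the grid
    return (np.sum(np.abs(f(x)) ** p) * dx) ** (1.0 / p)

t, s = 0.7, 1.3
iso_gap = abs(lp_norm(T(t, v)) - lp_norm(v))        # isometry
g1, g2 = T(t + s, v), T(t, T(s, v))                 # group law
print(iso_gap, np.max(np.abs(g1(x) - g2(x))))
```

For smooth rapidly decaying profiles the discretized norm is insensitive to the shift, so both gaps are essentially at machine precision.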
Remark 9.51. If n = 1 and a = −1, then the above group {T(t); t ∈ R} is a group of translations defined on X = Lᵖ(R).

Remark 9.52. Since the above operator A generates a C₀-group of isometries, it follows by Corollary 9.34 that R \ {0} ⊂ ρ(A). Therefore, for all λ ∈ R \ {0} and g ∈ X = Lᵖ(Rⁿ) there exists a unique solution u ∈ D(A) satisfying the equation

λu(x) + a · ∇u(x) = g(x), x ∈ Rⁿ.
9.12.4 The Telegraph System

For an electrical long line we have the following PDE system, called the telegraph system (see, e.g., [36, p. 320])

L it + vx + R i = e(t, x),
C vt + ix + G v = 0, t ≥ 0, x ∈ (0, 1),

with the boundary conditions (Ohm's law at both ends of the line)

v(t, 0) + R₀ i(t, 0) = 0, R₁ i(t, 1) = v(t, 1), t ≥ 0,

and initial conditions

i(0, x) = i₀(x), v(0, x) = v₀(x), x ∈ (0, 1),

where i = i(t, x) is the current flowing in the line and v = v(t, x) represents the voltage across the line; R ≥ 0, R₀ > 0, R₁ > 0, L > 0, C > 0, G ≥ 0 are constants representing resistances, inductance, capacitance, and conductance, respectively; e = e(t, x) is the voltage per unit length impressed along the line in series with it.
We regard the unknown pair [i, v] as a function of t ≥ 0 with values in X = L²(0, 1) × L²(0, 1). Consider in X the scalar product

⟨[f₁, g₁], [f₂, g₂]⟩ = L ∫₀¹ f₁f₂ dx + C ∫₀¹ g₁g₂ dx,

and the norm induced by this scalar product, so X is a real Hilbert space. Define A : D(A) ⊂ X → X by

D(A) = {[f, g] ∈ X; f′, g′ ∈ L²(0, 1), g(0) + R₀f(0) = 0, R₁f(1) = g(1)},
A[f, g] = [−(1/L)(g′ + Rf), −(1/C)(f′ + Gg)].
Operator A is densely defined, since C₀^∞(0, 1) × C₀^∞(0, 1) ⊂ D(A) and is dense in X. It is also easily seen that A is a closed operator: it suffices to note that the derivative is a closed operator in L²(0, 1) and that convergence in H¹(0, 1) implies convergence in C[0, 1] (cf. Arzelà–Ascoli). It turns out that A is an m-dissipative operator (thus confirming the fact that A is densely defined and closed, cf. Theorems 9.10 and 9.27). Indeed, for all [f, g] ∈ D(A) we have

⟨A[f, g], [f, g]⟩ = ⟨[−(1/L)(g′ + Rf), −(1/C)(f′ + Gg)], [f, g]⟩
  = −∫₀¹ f(g′ + Rf) dx − ∫₀¹ g(f′ + Gg) dx
  = −∫₀¹ (fg)′ dx − R ∫₀¹ f² dx − G ∫₀¹ g² dx
  ≤ −∫₀¹ (fg)′ dx
  = f(0)g(0) − f(1)g(1)
  = −R₀ f(0)² − R₁ f(1)²
  ≤ 0,

that is to say, A is dissipative (with respect to the scalar product ⟨·, ·⟩).
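The chain of equalities above collapses to the identity ⟨A[f, g], [f, g]⟩ = −R∫₀¹f² − G∫₀¹g² − R₀f(0)² − R₁f(1)² ≤ 0, which is easy to verify numerically. In the sketch below (our illustration) f and g are arbitrary smooth profiles chosen so that the boundary conditions g(0) + R₀f(0) = 0 and R₁f(1) = g(1) hold; the constants are hypothetical sample values.

```python
import numpy as np

R, G_, R0, R1 = 0.4, 0.2, 2.0, 3.0       # sample line constants (our choices)
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]

# smooth test profiles satisfying the boundary conditions of D(A)
f = np.cos(np.pi * x) + 0.5              # f(0) = 1.5, f(1) = -0.5
fp = -np.pi * np.sin(np.pi * x)          # f'
g0, g1 = -R0 * f[0], R1 * f[-1]          # endpoint values forced by the BCs
g = g0 * (1 - x) + g1 * x + np.sin(np.pi * x)
gp = (g1 - g0) + np.pi * np.cos(np.pi * x)

def trap(y):
    # composite trapezoidal rule on [0, 1]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

lhs = -trap(f * (gp + R * f)) - trap(g * (fp + G_ * g))
rhs = -R * trap(f ** 2) - G_ * trap(g ** 2) - R0 * f[0] ** 2 - R1 * f[-1] ** 2
print(lhs, rhs)
```

Note that the weights L and C of the scalar product cancel against the 1/L and 1/C in A, which is why they do not appear in the computation.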
Let us now prove that R(λI − A) = X for all λ > 0, i.e., for all λ > 0 and [h, k] ∈ X there exists a solution [f, g] ∈ D(A) of the equation

λ[f, g] − A[f, g] = [h, k]. (9.12.75)

Equation (9.12.75) can be written as the following boundary value problem

f′ + (Cλ + G)g = Ck,
g′ + (Lλ + R)f = Lh,
g(0) + R₀f(0) = 0, R₁f(1) = g(1).

We compute the general solution [f, g] of the above differential system (see the solution of Exercise 9.13) and then impose upon it the above boundary conditions to deduce that there exists a unique [f, g] ∈ D(A) satisfying the problem. The details are left to the reader. Thus, A is m-dissipative, so it generates a C₀-contraction semigroup on X, say {T(t) : X → X; t ≥ 0} (cf. Theorem 9.27). Therefore, for all [i₀, v₀] ∈ X and e ∈ L¹loc([0, ∞); L²(0, 1)) there exists a unique mild
solution [i, v] ∈ C([0, ∞); X) of the Cauchy problem

(d/dt)[i(t, ·), v(t, ·)] = A[i(t, ·), v(t, ·)] + [e(t, ·), 0], t ≥ 0,
[i(0, ·), v(0, ·)] = [i₀, v₀],

which is the representation in X of our initial-boundary value problem formulated above. This mild solution can be written explicitly in terms of T(t), i₀, v₀, and e, by using the usual variation of constants formula.
If [i₀, v₀] ∈ D(A) and e ∈ C¹([0, ∞); L²(0, 1)), then [i, v] is a classical solution, [i, v] ∈ C¹([0, ∞); X) ∩ C([0, ∞); H¹(0, 1)²). It is worth pointing out that the condition [i₀, v₀] ∈ D(A) implies compatibility of the initial data with the boundary conditions and, as a by-product of this compatibility plus smoothness of the function e, we obtain a classical solution [i, v] with the above properties. In particular, i, v are continuous on [0, ∞) × [0, 1] and satisfy the boundary conditions for all t ≥ 0.

Remark 9.53. All the above applications can be extended to the semilinear case, as pointed out in Remark 9.50.
Comment. This chapter represents a short introduction to the theory of semigroups of linear operators, including its implications for linear evolution equations and some applications. Some subjects in the field have not been addressed, e.g., semigroups of compact operators, differentiable semigroups, analytic semigroups, dual semigroups, etc. For more information about linear operator semigroups and their applications, the reader is referred to [7], [12], [19], [21], [39], [49], [51]. For more details on the regularity of solutions to linear evolution equations, including significant examples from the theory of linear partial differential equations, see [6], [19], [39], [49].
9.13 Exercises

1. Compute T(t) = e^{tA}, t ∈ R, where

(i) A = [ 1  1 ; −1  −1 ];  (ii) A = [ 0  1 ; −1  0 ];  (iii) A = [ −1  −1 ; 2  −4 ]

(rows separated by semicolons).
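The matrix exponentials of Exercise 1 can be checked numerically with `scipy.linalg.expm`. Matrix (i) happens to be nilpotent (A² = 0), so e^{tA} = I + tA, while matrix (ii) generates the rotation group; the sketch below (our illustration) verifies both closed forms at a sample time.

```python
import numpy as np
from scipy.linalg import expm

t = 0.8
A1 = np.array([[1.0, 1.0], [-1.0, -1.0]])   # exercise matrix (i): A1 @ A1 = 0
A2 = np.array([[0.0, 1.0], [-1.0, 0.0]])    # exercise matrix (ii): rotation generator

E1 = expm(t * A1)
E2 = expm(t * A2)

# nilpotent case: e^{tA} = I + tA; rotation case: e^{tA} = [[cos t, sin t], [-sin t, cos t]]
print(np.allclose(E1, np.eye(2) + t * A1))
print(np.allclose(E2, [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]]))
```

Matrix (iii) has distinct negative eigenvalues, so e^{tA} can likewise be checked against a diagonalization.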
2. If A is an n × n complex matrix, then the following equivalences hold true:
(a) sup_{t≥0} ‖e^{tA}‖ < ∞ ⇐⇒ all eigenvalues λ of A satisfy Re λ ≤ 0 and, whenever Re λ = 0, λ is a simple eigenvalue;
(b) lim_{t→∞} ‖e^{tA}‖ = 0 ⇐⇒ all eigenvalues λ of A satisfy Re λ < 0.
3. Let (X, ‖·‖) be a Banach space and let A ∈ L(X). Consider in X the Cauchy problem

u′(t) = Au(t), t ∈ R,
u(0) = u₀.

Show that if u₀ ≠ 0 then u(t) ≠ 0 for all t ∈ R.
4. Let (X, ‖·‖) be a Banach space. Show that for every C₀-semigroup {T(t) : X → X; t ≥ 0} the X-valued function (t, x) ↦ T(t)x is continuous on [0, ∞) × X.
5. Let X denote the space of all functions f : R → R which are continuous and bounded, equipped with the sup-norm. For some λ > 0 and δ > 0 define G(t) : X → X by

(G(t)f)(x) = e^{−λt} Σ_{k=0}^∞ ((λt)^k / k!) f(x − kδ), t ∈ R, f ∈ X, x ∈ R.

(a) Prove that {G(t) : X → X; t ∈ R} is a uniformly continuous group and determine its infinitesimal generator;
(b) Show that ‖G(t)‖ = 1 if t ≥ 0, and ‖G(t)‖ = e^{−2λt} if t < 0.
6. Let X be the real Banach space of all functions f : R → R that are continuous on R and p-periodic with some period p > 0, equipped with the sup-norm

‖f‖ = sup_{0≤s≤p} |f(s)| ∀f ∈ X.

Define

(T(t)f)(s) = f(t + s), t, s ∈ R, f ∈ X.

Show that {T(t) : X → X; t ∈ R} is a C₀-group of isometries, i.e., ‖T(t)‖ = 1, t ∈ R. Find its infinitesimal generator.
7. Let M = (mᵢⱼ) be a k × k matrix with real entries. Denote X = Lᵖ(Rᵏ), where p ∈ [1, ∞). For t ∈ R define G(t) : X → X by

(G(t)f)(x) = f(e^{−tM}x), f ∈ X, a.a. x ∈ Rᵏ.

(a) Show that {G(t) : X → X; t ∈ R} is a C₀-group and determine its infinitesimal generator;
(b) If Σ_{i=1}^k mᵢᵢ = 0, then ‖G(t)‖ = 1 for all t ∈ R.
8. Let X be the real Banach space of all functions f : [0, ∞) → R that are bounded and uniformly continuous on [0, ∞), equipped with the usual sup-norm. Define

(T(t)f)(s) = f(s − t) for s − t ≥ 0, and (T(t)f)(s) = f(0) for s − t < 0.

Show that {T(t) : X → X; t ≥ 0} is a C₀-semigroup and determine its infinitesimal generator.
9. For a given 1 ≤ p < ∞, consider the real Banach space X = lᵖ of all sequences (xₙ)_{n∈N} in R satisfying Σ_{n=1}^∞ |xₙ|ᵖ < ∞, equipped with the usual norm

‖(xₙ)‖_p = ( Σ_{n=1}^∞ |xₙ|ᵖ )^{1/p} ∀(xₙ)_{n∈N} ∈ X.

Let (cₙ)_{n∈N} be a sequence of positive numbers. Define T(t) : X → X by

T(t)(xₙ)_{n∈N} = (e^{−cₙt}xₙ)_{n∈N} ∀(xₙ)_{n∈N} ∈ X, t ≥ 0.

(a) Show that {T(t) : X → X; t ≥ 0} is a C₀-semigroup of contractions;
(b) Determine its infinitesimal generator;
(c) Prove that {T(t) : X → X; t ≥ 0} is uniformly continuous if and only if (cₙ) is bounded.
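Part (c) of Exercise 9 hinges on the identity ‖T(t) − I‖ = sup_n (1 − e^{−cₙt}) for this diagonal semigroup. The sketch below (our illustration, using a finite truncation of the sequence) contrasts the unbounded choice cₙ = n, for which the gap does not shrink as t → 0⁺, with a constant bounded sequence, for which it does.

```python
import numpy as np

# ||T(t) - I|| = sup_n (1 - e^{-c_n t}) for the diagonal semigroup on l^p.
def gap(t, c):
    return np.max(1.0 - np.exp(-c * t))

n = np.arange(1.0, 100001.0)             # truncation of the unbounded case c_n = n
c_bounded = np.full(n.shape, 3.0)        # a bounded sequence, c_n = 3

for t in (1.0, 0.1, 0.01, 0.001):
    print(t, gap(t, n), gap(t, c_bounded))
```

As t decreases, the unbounded column stays near 1 (no uniform continuity at t = 0), while the bounded column behaves like c·t and tends to 0.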
10. Let H = L²(0, 1) be equipped with the usual scalar product and the corresponding induced norm. Define A : D(A) ⊂ H → H by

D(A) = {v ∈ H¹(0, 1); v(0) = 0}, Av = −v′ ∀v ∈ D(A).
Show that A generates a C₀-semigroup of contractions {T(t) : H → H; t ≥ 0}. Find the explicit form of this semigroup and show that, for u₀ ∈ H, u(t, x) = (T(t)u₀)(x) satisfies the transport equation ut + ux = 0 in Ω = (0, ∞) × (0, 1) in the sense of distributions.
11. Consider the initial-boundary value problem

ut − uxx + au = f(t, x), t > 0, x ∈ (0, 1),
u(t, 0) = 0, ux(t, 1) + αu(t, 1) = 0, t > 0,
u(0, x) = u₀(x), x ∈ (0, 1),

where a ∈ R, α > 0, u₀ ∈ L²(0, 1), f ∈ L¹loc[0, ∞). Solve this problem using the semigroup approach. Solve the more general problem obtained by replacing the term au in the above equation by h(u), where h : R → R is a Lipschitz function.
12. Consider the initial-boundary value problem

utt − uxx = f(t, x), t > 0, x ∈ (0, 1),
u(t, 0) = 0, ux(t, 1) = 0, t > 0,
u(0, x) = u₀(x), ut(0, x) = u₁(x), x ∈ (0, 1),

where u₀ ∈ H¹(0, 1), u₀(0) = 0, u₁ ∈ L²(0, 1), f ∈ L¹loc[0, ∞). Solve this problem using the semigroup approach.
13. Consider the telegraph differential system

L it + vx + R i = e(t, x),
C vt + ix + G v = 0, t ≥ 0, x ∈ (0, 1),

with the following boundary conditions

v(t, 0) + R₀ i(t, 0) = 0, −i(t, 1) + C₁vt(t, 1) + D₁v(t, 1) = e₁(t), t > 0,

and initial conditions

i(0, x) = i₀(x), v(0, x) = v₀(x), x ∈ (0, 1),

where C > 0, C₁ > 0, L > 0, D₁ ≥ 0, G ≥ 0, R ≥ 0, R₀ ≥ 0, and e, e₁ are given functions.

(a) Solve the above problem by using the semigroup approach;
(b) What can you say about existence in the case when D₁, G, R are Lipschitz functions from R into itself?
Chapter 10

Solving Linear Evolution Equations by the Fourier Method

In Chap. 9 we used the linear semigroup approach to solve inhomogeneous linear evolution equations. For the same purpose, we use here the Fourier method. More precisely, under appropriate conditions on the linear operators governing such equations, we find the solutions in the form of Fourier series expansions. This approach is based in an essential way on the results discussed in Chap. 8.
10.1 First Order Linear Evolution Equations

Consider the Cauchy problem

u′(t) + Qu(t) = f(t), 0 < t < T, (E)
u(0) = u₀, (IC)

where Q satisfies the set of conditions (a) originally presented in Chap. 8:
© Springer Nature Switzerland AG 2019
G. Moroşanu, Functional Analysis for the Applied Sciences,
Universitext, https://doi.org/10.1007/978-3-030-27153-4_10
(a) Q : D(Q) ⊂ H → H is a linear, densely defined, self-adjoint, strongly positive operator, where (H, (·, ·), ‖·‖) is a real, infinite dimensional, separable Hilbert space.

We also assume that the energetic space H_E defined in Chap. 8 satisfies

(b) H_E is compactly embedded into H,

so that Theorem 8.16 holds true. The notation in the statement of that theorem will also be used in what follows.
If Q satisfies (a) then −Q generates a C₀-semigroup of contractions (see Theorem 9.29), and so for any u₀ ∈ H and f ∈ L¹(0, T; H) there exists a unique mild solution u = u(t) of problem (E), (IC) given by the variation of constants formula. If u₀ ∈ D(Q) and f ∈ C¹([0, T]; H), then u is a classical solution (cf. Theorem 9.47). The Fourier method we are going to discuss next offers more possibilities to investigate the regularity of solutions and provides good approximations of solutions in terms of eigenfunctions of the operator Q.
Let us start with a specific result.
Theorem 10.1. Assume that (a) and (b) above are fulfilled. Then for all u₀ ∈ H and f ∈ L²(0, T; H) there exists a unique function u ∈ C([0, T]; H) ∩ C((0, T]; H_E) ∩ L²(0, T; H_E) with √t u′ ∈ L²(0, T; H) which satisfies (IC) and Eq. (E) for a.a. t ∈ (0, T). This function u (called a strong solution of problem (E), (IC)) is expressed as the Fourier series expansion

u(t) = Σ_{n=1}^∞ uₙ(t)eₙ, (10.1.1)

where {eₙ}_{n=1}^∞ is the orthonormal basis in H provided by Theorem 8.16, and uₙ(t) = (u(t), eₙ), n = 1, 2, . . . If u₀ ∈ H_E and f ∈ L²(0, T; H), then u ∈ H¹(0, T; H) ∩ C([0, T]; H_E), u(t) ∈ D(Q) for a.a. t ∈ (0, T), and Qu ∈ L²(0, T; H).
Proof. Assume first that u₀ ∈ H and f ∈ L²(0, T; H). As mentioned before, we already know that problem (E), (IC) has a unique mild solution u given by the variation of constants formula. A strong solution is clearly a mild one, so the uniqueness part of the theorem is obvious.
In fact, uniqueness also follows by a simple direct proof. If y = y(t) denotes the difference of two strong solutions of problem (E), (IC), then y(0) = 0 and

y′(t) + Qy(t) = 0 for a.a. t ∈ (0, T).

Multiplying this equation by y(t) and taking into account the positivity of Q we obtain

(1/2) (d/dt) ‖y(t)‖² = (y′(t), y(t)) ≤ 0 for a.a. t ∈ (0, T),

which shows that the function t ↦ ‖y(t)‖ is nonincreasing on [0, T]. Since y(0) = 0 it follows that y is the null function, i.e., the two strong solutions coincide.
We could show that, under our assumptions, the mild solution u is in fact a strong solution by a limiting procedure applied to a sequence of strong solutions uₙ ∈ C¹([0, T]; H) (given by Theorem 9.47) corresponding to sequences u₀ₙ ∈ D(Q) and fₙ ∈ C¹([0, T]; H) which satisfy ‖u₀ₙ − u₀‖ → 0, ‖fₙ − f‖_{L²(0,T;H)} → 0. However, we shall provide here the existence proof using the Fourier method. Specifically, we seek a solution in the form (10.1.1), where the uₙ's are unknown real valued functions. For u₀ we have the Fourier expansion

u₀ = Σ_{n=1}^∞ u₀ₙeₙ with u₀ₙ = (u₀, eₙ), ‖u₀‖² = Σ_{n=1}^∞ u₀ₙ².

Similarly, for a.a. t ∈ (0, T),

f(t) = Σ_{n=1}^∞ fₙ(t)eₙ with fₙ(t) = (f(t), eₙ), ‖f(t)‖² = Σ_{n=1}^∞ fₙ(t)².

Denoting sₖ(t) = Σ_{n=1}^k fₙ(t)eₙ, we can see that

‖sₖ(t)‖² = Σ_{n=1}^k fₙ(t)² ≤ ‖f(t)‖² ∀k ∈ N, a.a. t ∈ (0, T),

so by the Lebesgue Dominated Convergence Theorem sₖ → f strongly in L²(0, T; H). Now we impose conditions on u (given by (10.1.1)) to formally satisfy Eq. (E),

Σ_{n=1}^∞ uₙ′(t)eₙ + Σ_{n=1}^∞ uₙ(t)λₙeₙ = Σ_{n=1}^∞ fₙ(t)eₙ,

and (IC),

 ∞

un (0)en = u0n en ,
n=1 n=1

so identifying the coefficients of the en ’s yields


un (t) + λn un (t) = fn (t) for all n ∈ N and a.a. t ∈ (0, T ) , (10.1.2)
un (0) = u0n , n ∈ N , (10.1.3)
hence
t
−λn t
un (t) = e u0n + e−λn (t−s) fn (s) ds ∀t ∈ [0, T ], n ∈ N .
0

Therefore, un ∈ H 1 (0, T ) and, since λn ≥ λ1 > 0, we easily obtain by


Hölder’s inequality

 T 
un (t) ≤ 2
2
u20n +T fn (s)2 ds ∀t ∈ [0, T ], n ∈ N . (10.1.4)
0
T
Since u20n and 0 fn (s)2 ds are terms of convergent series, ∞ it follows
from (10.1.4), by the Weierstrass M-test, that the series n=1 un (t)2 is
uniformly convergent in [0, T ] and consequently so is the series (10.1.1)
and its sum u is in C([0, T ]; H).
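Each Fourier mode is just a scalar linear ODE, so the variation of constants formula for uₙ can be tested directly. In the sketch below (our illustration) we take the hypothetical sample data λₙ = (nπ)² and fₙ(s) = cos s, for which the mode equation uₙ′ + λuₙ = cos t has the closed-form solution (u₀ₙ − λ/(1 + λ²))e^{−λt} + (λ cos t + sin t)/(1 + λ²), and compare it with the integral formula evaluated by quadrature.

```python
import numpy as np

def trap(y, ds):
    # composite trapezoidal rule on a uniform grid
    return ds * (y.sum() - 0.5 * (y[0] + y[-1]))

u0n, tmax = 0.7, 1.0
diffs = []
for n_mode in (1, 2, 3):
    lam = (n_mode * np.pi) ** 2          # sample eigenvalue lambda_n = (n pi)^2
    s = np.linspace(0.0, tmax, 200001)
    # u_n(t) = e^{-lam t} u_{0n} + int_0^t e^{-lam(t-s)} f_n(s) ds,  f_n = cos
    formula = (np.exp(-lam * tmax) * u0n
               + trap(np.exp(-lam * (tmax - s)) * np.cos(s), s[1] - s[0]))
    # closed-form solution of u' + lam u = cos t, u(0) = u0n
    exact = ((u0n - lam / (1 + lam ** 2)) * np.exp(-lam * tmax)
             + (lam * np.cos(tmax) + np.sin(tmax)) / (1 + lam ** 2))
    diffs.append(abs(formula - exact))
print(diffs)
```

The rapid decay of e^{−λₙ(t−s)} for large n is also what drives the uniform convergence argument via the Weierstrass M-test.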
Next, we multiply Eq. (10.1.2) by tuₙ′(t) and then integrate the resulting equation over [0, T] to obtain, ∀n ∈ N,

∫₀ᵀ tuₙ′(t)² dt + (λₙ/2) Tuₙ(T)²
  = (λₙ/2) ∫₀ᵀ uₙ(t)² dt + ∫₀ᵀ tfₙ(t)uₙ′(t) dt
  ≤ (λₙ/2) ∫₀ᵀ uₙ(t)² dt + (1/2) ∫₀ᵀ tuₙ′(t)² dt + (1/2) ∫₀ᵀ tfₙ(t)² dt. (10.1.5)

On the other hand, multiplying (10.1.2) by uₙ(t) and then integrating over [0, T] we obtain

(1/2)(uₙ(T)² − u₀ₙ²) + λₙ ∫₀ᵀ uₙ(t)² dt = ∫₀ᵀ fₙ(t)uₙ(t) dt
  ≤ (1/2) ∫₀ᵀ fₙ(t)² dt + (1/2) ∫₀ᵀ uₙ(t)² dt,
for all n ∈ N, so

Σ_{n=1}^∞ λₙ ∫₀ᵀ uₙ(t)² dt < ∞, (10.1.6)

hence

Σ_{n=1}^∞ (u(t), λₙ^{−1/2}eₙ)_E² = Σ_{n=1}^∞ (u(t), λₙ^{−1/2}Qeₙ)² = Σ_{n=1}^∞ λₙuₙ(t)²

is convergent for a.a. t ∈ (0, T), and t ↦ ‖u(t)‖²_E = Σ_{n=1}^∞ λₙuₙ(t)² is summable on (0, T), i.e., u ∈ L²(0, T; H_E).
From (10.1.5) and (10.1.6) we infer that

Σ_{n=1}^∞ ∫₀ᵀ tuₙ′(t)² dt < ∞,

so √t u′ ∈ L²(0, T; H). We also have the inequality (similar to (10.1.5))

(λₙ/2) tuₙ(t)² ≤ −(1/2) ∫₀ᵗ suₙ′(s)² ds + (λₙ/2) ∫₀ᵀ uₙ(s)² ds + (1/2) ∫₀ᵀ sfₙ(s)² ds,

for all t ∈ [0, T], n ∈ N, which combined with (10.1.6) implies (by the Weierstrass M-test) that Σ_{n=1}^∞ λₙtuₙ(t)² is uniformly convergent in [0, T], so √t u ∈ C([0, T]; H_E). This shows that u ∈ C((0, T]; H_E).
Now, passing to the limit in L²(0, T; H) as k → ∞ in the equation

Σ_{n=1}^k fₙ(t)eₙ = Σ_{n=1}^k uₙ′(t)eₙ + Σ_{n=1}^k uₙ(t)λₙeₙ = Σ_{n=1}^k uₙ′(t)eₙ + Q ( Σ_{n=1}^k uₙ(t)eₙ ),

we conclude that u satisfies Eq. (E) for a.a. t ∈ (0, T). This uses the fact that Q is a closed operator. It is also obvious that u(0) = u₀.
Now, let us assume that u₀ ∈ H_E and f ∈ L²(0, T; H). Multiplying Eq. (10.1.2) by uₙ′(t) we obtain

(λₙ/2) (d/dt)(uₙ(t)²) + uₙ′(t)² = fₙ(t) · uₙ′(t) for a.a. t ∈ (0, T), ∀n ∈ N. (10.1.7)

It follows, by integration over [0, T ], that


T
T
λn  
un (t)2 dt
+ un (T ) − u0n =
2 2
fn (t) · un (t) dt
0 2 0

1 T 1 T  2
≤ 2
fn (t) dt + u (t) dt , (10.1.8)
2 0 2 0 n
∞
for all n ∈ N. Since u0 ∈ HE (i.e., n=1 λn u0n < ∞), the last
2

inequality implies
∞ T
un (t)2 dt < ∞ ,
n=1 0
∞
hence n=1 un (t)en is convergent in L2 (0, T ; H) and, obviously, its
sum is u ∈ L2 (0, T ; H).
Integration over [0, t] of (10.1.7)
∞ leads to an inequality similar to
2
(10.1.8) which implies that n=1 λn un (t) is uniformly convergent in
[0, T ] and so u ∈ C([0, T ]; HE ). As u , f ∈ L2 (0, T ; H) we derive from
Eq. (E) that Qu ∈ L2 (0, T ; H).

Remark 10.2. For further regularity results see, e.g., [22, Chapter 7].
We continue with a result on the existence of a periodic solution of Eq. (E).

Theorem 10.3. Assume that (a) and (b) are fulfilled and f ∈ L²(0, T; H). Then, there exists a unique function u ∈ H¹(0, T; H) ∩ C([0, T]; H_E) satisfying Eq. (E) for a.a. t ∈ (0, T) and u(0) = u(T), and u is given by Eq. (10.1.1), where

uₙ(t) = dₙe^{−λₙt} + ∫₀ᵗ e^{−λₙ(t−s)}fₙ(s) ds,

with

dₙ = (1 − e^{−λₙT})⁻¹ ∫₀ᵀ e^{−λₙ(T−s)}fₙ(s) ds, n = 1, 2, . . .
Proof. By Theorem 10.1, for all u₀ ∈ H there is a unique strong solution u = u(t, u₀) of problem (E), (IC), which belongs to C([0, T]; H) ∩ C((0, T]; H_E) ∩ L²(0, T; H_E) with √t u′(t, u₀) ∈ L²(0, T; H). For two vectors u₀, v₀ ∈ H we have

(d/dt)[u(t, u₀) − u(t, v₀)] + Q[u(t, u₀) − u(t, v₀)] = 0 for a.a. t ∈ (0, T).
If we multiply this equation by u(t, u0 ) − u(t, v0 ) and use the strong


positivity of Q (with some constant c > 0), we get
1 d
u(t, u0 ) − u(t, v0 )2
2 dt
+ cu(t, u0 ) − u(t, v0 )2 ≤ 0 for a.a. t ∈ (0, T ) ,
or, equivalently,
d  2ct 
e u(t, u0 ) − u(t, v0 )2 ≤ 0 for a.a. t ∈ (0, T )
dt
which shows that the function t → ect u(t, u0 ) − u(t, v0 ) is nonin-
creasing and hence
u(t, u0 ) − u(t, v0 ) ≤ e−ct u0 − v0  ∀t ∈ [0, T ] . (10.1.9)
Now let us consider the so-called Poincaré operator P : H → H defined
by
P u0 = u(T ; u0 ) ∀u0 ∈ H .
From (10.1.9) we see that P is a contraction:
P u0 − P v0  ≤ e−cT u0 − v0  ∀u0 , v0 ∈ H .
By the Banach Contraction Principle (see Chap. 2) it follows that P
has a unique fixed point u∗0 ∈ H, i.e., P u∗0 = u∗0 . In other words,
u(T, u∗0 ) = u∗0 , which is to say, u(t, u∗0 ) is the unique periodic solu-
tion of Eq. (E). Since u∗0 = u(T, u∗0 ) we deduce from the first part
of Theorem 10.1 that u∗0 ∈ HE . Therefore, by the second part of
Theorem 10.1, it follows that u(t, u∗0 ) ∈ H 1 (0, T ; H) ∩ C([0, T ]; HE ).
Clearly u(t, u∗0 ) is the sum of a Fourier series of the form (10.1.1) which
is convergent in C([0, T ]; HE ) since u∗0 ∈ HE . From the periodicity
condition u∗0 = u(T, u∗0 ) we infer
un (0) = un (T ) ∀n ∈ N , (10.1.10)
where the un ’s are solutions of (10.1.2), i.e.,
u_n(t) = d_n e^{-λ_n t} + ∫_0^t e^{-λ_n (t-s)} f_n(s) ds ∀t ∈ [0,T], n ∈ N .

Taking into account (10.1.10) we can easily find

d_n = (1 - e^{-λ_n T})^{-1} ∫_0^T e^{-λ_n (T-s)} f_n(s) ds ,   n ∈ N .
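These formulas are easy to test numerically on a single mode. The following sketch (an illustration added here, not part of the proof; the data λ = 3, T = 1 and f(s) = cos 2πs are arbitrary choices) computes d_n by quadrature, checks that the resulting solution is T-periodic, and iterates the Poincaré map to its fixed point, as in the Banach fixed-point argument above.

```python
import math

# Illustrative sketch (not from the text): for one Fourier mode, the ODE
#   u'(t) + lam*u(t) = f(t)  on (0, T)
# has the unique T-periodic solution
#   u(t) = d*e^{-lam*t} + int_0^t e^{-lam*(t-s)} f(s) ds,
# with d = (1 - e^{-lam*T})^{-1} * int_0^T e^{-lam*(T-s)} f(s) ds.
# The choices lam = 3, T = 1, f(s) = cos(2*pi*s) are arbitrary sample data.

def trapezoid(g, a, b, m=4000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / m
    total = 0.5 * (g(a) + g(b))
    for i in range(1, m):
        total += g(a + i * h)
    return total * h

T, lam = 1.0, 3.0
f = lambda s: math.cos(2.0 * math.pi * s / T)

# d from the formula above
d = trapezoid(lambda s: math.exp(-lam * (T - s)) * f(s), 0.0, T) / (1.0 - math.exp(-lam * T))

def u(t, u0):
    """Solution of the mode equation with initial value u0."""
    conv = trapezoid(lambda s: math.exp(-lam * (t - s)) * f(s), 0.0, t) if t > 0 else 0.0
    return u0 * math.exp(-lam * t) + conv

# Periodicity: starting from d, the solution returns to d at time T ...
print(abs(u(T, d) - d))          # ~ 0

# ... and the Poincare map P(u0) = u(T; u0) is a contraction whose Picard
# iterates converge geometrically (rate e^{-lam*T}) to the fixed point d.
v = 0.0
for _ in range(40):
    v = u(T, v)
print(abs(v - d))                # ~ 0
```

Both printed values are at the level of rounding error, confirming u(0) = u(T) = d and the geometric convergence of the Picard iterates.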
10.2 Second Order Linear Evolution Equations
In this section we keep the notation and assumptions used in the previous section. Consider the Cauchy problem

u''(t) + Qu(t) = f(t), 0 < t < T , (e)

u(0) = u_0 , u'(0) = u_1 . (ic)
Theorem 10.4. Assume that conditions (a) and (b) are fulfilled. Then for all u_0 ∈ D(Q) (i.e., Qu_0 ∈ H), u_1 ∈ H_E and f ∈ L²(0,T; H_E) there exists a unique function u ∈ C¹([0,T]; H_E) ∩ H²(0,T; H) which satisfies (ic) and (e) for a.a. t ∈ (0,T), and Qu ∈ C([0,T]; H). If, in addition, f ∈ C([0,T]; H), then u'' ∈ C([0,T]; H). Alternatively, if u_0 ∈ D(Q), u_1 ∈ H_E and f ∈ H¹(0,T; H), then u ∈ C¹([0,T]; H_E) ∩ C²([0,T]; H) (hence Qu ∈ C([0,T]; H)). In both cases the solution u is given by a Fourier series expansion of the form (10.1.1).
Proof. Let us first prove uniqueness. Let y ∈ H²(0,T; H) be the difference of two solutions of problem (e), (ic). Then y(0) = 0, y'(0) = 0, and

y''(t) + Qy(t) = 0 for a.a. t ∈ (0,T) .

We multiply this equation by y'(t) to obtain

(y''(t), y'(t)) + (Qy(t), y'(t)) = 0 ,
so, as Q is self-adjoint, we can write

d/dt [ ‖y'(t)‖² + (Qy(t), y(t)) ] = 0 ,

for a.a. t ∈ (0,T). This shows that y is the null function (since y(0) = 0, y'(0) = 0 and Q is strongly positive), so the solution is indeed unique (if it exists).
In order to prove existence, we seek a solution u to problem (e), (ic) in the form (10.1.1). Requiring this series to formally satisfy (e) and (ic) we find

u_n''(t) + λ_n u_n(t) = f_n(t) ∀n ∈ N and a.a. t ∈ (0,T) , (10.2.11)

u_n(0) = u_{0n} , u_n'(0) = u_{1n} ∀n ∈ N , (10.2.12)
where f_n(t), u_{0n} and u_{1n} are the Fourier coefficients of f(t), u_0 and u_1, respectively. For each n ∈ N problem (10.2.11) and (10.2.12) has the solution

u_n(t) = u_{0n} cos(√λ_n t) + (u_{1n}/√λ_n) sin(√λ_n t) + (1/√λ_n) ∫_0^t sin(√λ_n (t-s)) f_n(s) ds , (10.2.13)

for all t ∈ [0,T]. Therefore,
  
un (t) = − λn u0n sin( λn t) + u1n cos( λn t)
t
 
+ cos λn (t − s) fn (s) ds , (10.2.14)
 0
   
t √
= 0 cos λn s fn (t−s) ds

and
  
u_n''(t) = -λ_n u_{0n} cos(√λ_n t) - √λ_n u_{1n} sin(√λ_n t) + f_n(t) - √λ_n ∫_0^t sin(√λ_n (t-s)) f_n(s) ds , (10.2.15)

or, equivalently (differentiating the convolution form of (10.2.14)),

u_n''(t) = -λ_n u_{0n} cos(√λ_n t) - √λ_n u_{1n} sin(√λ_n t) + cos(√λ_n t) f_n(0) + ∫_0^t cos(√λ_n s) f_n'(t-s) ds , (10.2.16)

where the last integral equals ∫_0^t cos(√λ_n (t-s)) f_n'(s) ds.
From (10.2.13)–(10.2.16) we deduce (where C_1, C_2, C_3, C_4 are constants)

u_n(t)² ≤ C_1 ( u_{0n}² + (1/λ_n) u_{1n}² + (1/λ_n) ∫_0^T f_n(s)² ds ) , (10.2.17)

u_n'(t)² ≤ C_2 ( λ_n u_{0n}² + u_{1n}² + ∫_0^T f_n(s)² ds ) , (10.2.18)

u_n''(t)² ≤ C_3 ( λ_n² u_{0n}² + λ_n u_{1n}² + f_n(t)² + λ_n ∫_0^T f_n(s)² ds ) , (10.2.19)

and

u_n''(t)² ≤ C_4 ( λ_n² u_{0n}² + λ_n u_{1n}² + f_n(0)² + ∫_0^T f_n'(s)² ds ) . (10.2.20)
Assume u_0 ∈ D(Q), u_1 ∈ H_E and f ∈ L²(0,T; H_E). Then

∑_{n=1}^∞ λ_n² u_{0n}² < ∞ , ∑_{n=1}^∞ λ_n u_{1n}² < ∞ , ∑_{n=1}^∞ λ_n ∫_0^T f_n(t)² dt < ∞ . (10.2.21)
It follows from (10.2.17)–(10.2.19) and (10.2.21) that the series (10.1.1) is convergent in different spaces and its sum u satisfies

u ∈ C¹([0,T]; H_E) ∩ H²(0,T; H), Qu ∈ C([0,T]; H) .

If, in addition, f ∈ C([0,T]; H) then, according to (10.2.19), u'' ∈ C([0,T]; H). If u_0 ∈ D(Q), u_1 ∈ H_E and f ∈ H¹(0,T; H) then, according to (10.2.17), (10.2.18), (10.2.20), and (10.2.21), u ∈ C¹([0,T]; H_E) ∩ C²([0,T]; H) (hence Qu ∈ C([0,T]; H)).

Finally, it is easily seen (as in the proof of Theorem 10.1) that in both cases u, expressed as the sum of the series (10.1.1), satisfies (e), (ic).
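The variation-of-parameters formula (10.2.13) is easy to check numerically on a single mode. In the following sketch (an illustration added here, not part of the proof) the data λ = 4 and f(s) = 2 + 4s² are arbitrary choices, picked so that the exact solution with u_0 = u_1 = 0 is u(t) = t².

```python
import math

# Illustrative check of formula (10.2.13) on one mode:
#   u'' + lam*u = f,  u(0) = u0,  u'(0) = u1,
#   u(t) = u0*cos(r t) + (u1/r)*sin(r t) + (1/r) * int_0^t sin(r (t-s)) f(s) ds,
# with r = sqrt(lam).  For lam = 4 and f(s) = 2 + 4*s**2 (sample data, not
# from the text) the exact solution with u0 = u1 = 0 is u(t) = t**2.

def trapezoid(g, a, b, m=4000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / m
    total = 0.5 * (g(a) + g(b))
    for i in range(1, m):
        total += g(a + i * h)
    return total * h

def mode_solution(t, lam, u0, u1, f):
    """Evaluate (10.2.13) by quadrature."""
    r = math.sqrt(lam)
    conv = trapezoid(lambda s: math.sin(r * (t - s)) * f(s), 0.0, t) if t > 0 else 0.0
    return u0 * math.cos(r * t) + (u1 / r) * math.sin(r * t) + conv / r

lam = 4.0
f = lambda s: 2.0 + 4.0 * s * s
t = 0.7
print(mode_solution(t, lam, 0.0, 0.0, f), t * t)   # the two values agree
```

The agreement (up to quadrature error) confirms that (10.2.13) solves the mode equation with the prescribed initial data.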

Remark 10.5. Obviously, further regularity results can be stated under different conditions on u_0, u_1 and f.

On the other hand, using the semigroup approach, one can derive the existence of a solution to problem (e), (ic) which comes from the mild solution for the Cauchy problem associated with a first order differential equation in the product space X = V × H equipped with the scalar product

([v_1, h_1], [v_2, h_2])_X = (v_1, v_2)_E + (h_1, h_2) ∀[v_1, h_1], [v_2, h_2] ∈ X .

Obviously, X is a real Hilbert space. Define A : D(A) ⊂ X → X by

D(A) = D(Q) × H_E , A[v, h] = [h, -Qv] ∀[v, h] ∈ D(A) .
It is easily seen that A is linear, densely defined, closed, and dissipative. In fact, for all [v, h] ∈ D(A), we have

(A[v, h], [v, h])_X = ([h, -Qv], [v, h])_X = (h, Qv) - (Qv, h) = 0 .
Thus, according to Remark 9.26, A is a dissipative operator. We also
have A∗ = −A, so A∗ is also dissipative. By Theorem 9.29 it follows
that A is m-dissipative, so (according to the Lumer–Phillips Theorem)
it generates a C0 -semigroup of contractions, say {S(t) : X → X; t ≥
0}.
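The computation above can be mimicked in finite dimensions. In the sketch below (an illustration added here, not from the text; all names and data are arbitrary), a single Fourier mode with eigenvalue λ turns the energetic product into (v, v')_E = λ v v', the X-product into λ v_1 v_2 + h_1 h_2, and A into the 2×2 matrix [[0, 1], [−λ, 0]]; the identity (Ax, x)_X = 0 then holds for every x.

```python
import math
import random

# Illustrative finite-dimensional analogue (sample data, not from the text):
# on one Fourier mode with eigenvalue lam, the X-product of [v1, h1], [v2, h2]
# is lam*v1*v2 + h1*h2, and the operator A[v, h] = [h, -lam*v] satisfies
# (A x, x)_X = 0 for every x, the 2x2 counterpart of the dissipativity
# computation above.

def x_product(x, y, lam):
    """X-scalar product of x = (v1, h1) and y = (v2, h2) on one mode."""
    return lam * x[0] * y[0] + x[1] * y[1]

def A(x, lam):
    """A[v, h] = [h, -lam*v]."""
    return (x[1], -lam * x[0])

random.seed(0)
lam = 5.0
for _ in range(100):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    assert abs(x_product(A(x, lam), x, lam)) < 1e-12
print("(A x, x)_X = 0 for all sampled x")
```

This is exactly the cancellation (h, Qv) − (Qv, h) = 0 written out for a 2×2 model.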
Problem (e), (ic) can be expressed as the following Cauchy problem in X:

(d/dt)[u(t), w(t)] = A[u(t), w(t)] + [0, f(t)], 0 < t < T ; [u, w](0) = [u_0, u_1] . (10.2.22)

According to Sect. 9.11, for [u_0, u_1] ∈ X and f ∈ L¹(0,T; H) this problem has a unique mild solution [u, w] ∈ C([0,T]; X),

[u(t), w(t)] = S(t)[u_0, u_1] + ∫_0^t S(t-s)[0, f(s)] ds , t ∈ [0,T] . (10.2.23)
The first component u = u(t) can be called a mild solution of problem (e), (ic). In fact, w(t) = u'(t). In order to show this, we approximate [u_0, u_1] ∈ X by [u_0^k, u_1^k] ∈ D(Q) × H_E, and f ∈ L¹(0,T; H) by f^k ∈ H¹(0,T; H). Denote by [u^k, w^k] = [u^k, (u^k)'] the solution of problem (10.2.22) with [u_0, u_1] := [u_0^k, u_1^k] and f := f^k, which is a strong solution belonging to C¹([0,T]; H_E) ∩ C²([0,T]; H) (cf. Theorem 10.4). Obviously,

[u^k(t), (u^k)'(t)] = S(t)[u_0^k, u_1^k] + ∫_0^t S(t-s)[0, f^k(s)] ds , t ∈ [0,T] . (10.2.24)
As {S(t) : X → X; t ≥ 0} is a semigroup of contractions, we have for all t ∈ [0,T]

‖[u^k(t) - u^m(t), (u^k)'(t) - (u^m)'(t)]‖_X ≤ ‖[u_0^k - u_0^m, u_1^k - u_1^m]‖_X + ∫_0^T ‖f^k(s) - f^m(s)‖ ds ,
hence u^k converges in C([0,T]; H_E) to some u ∈ C([0,T]; H_E), and (u^k)' converges in C([0,T]; H) to w = u' ∈ C([0,T]; H). Passing to the limit in (10.2.24) we reobtain (10.2.23) with w = u'. So the mild solution u belongs to C([0,T]; H_E) ∩ C¹([0,T]; H). Since u is a limit of strong solutions u^k that admit Fourier series expansions (as stated in Theorem 10.4), we can easily show that u is the sum of the Fourier series (10.1.1), where u_n(t) = (u(t), e_n) for n = 1, 2, . . .
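On one mode the abstract setup becomes completely explicit and the mild-solution formula (10.2.23) can be evaluated directly. In the sketch below (an illustration added here, not from the text), λ = 9 and f(s) = sin s are arbitrary sample data, for which the exact solution with u_0 = u_1 = 0 is u(t) = sin(t)/8 − sin(3t)/24; the semigroup S(t) is the 2×2 matrix group generated by [[0, 1], [−λ, 0]].

```python
import math

# Illustrative finite-dimensional analogue of (10.2.23): on one Fourier mode,
# A[v, h] = [h, -lam*v] is the 2x2 matrix [[0, 1], [-lam, 0]], whose group
# S(t) = exp(tA) is explicit (r = sqrt(lam)):
#   S(t) = [[cos(r t), sin(r t)/r], [-r*sin(r t), cos(r t)]].
# The mild solution [u, w](t) = S(t)[u0, u1] + int_0^t S(t-s)[0, f(s)] ds
# then satisfies w = u'.  Sample data (not from the text): lam = 9, f = sin,
# u0 = u1 = 0, for which u(t) = sin(t)/8 - sin(3t)/24 exactly.

def S(t, lam):
    r = math.sqrt(lam)
    return ((math.cos(r * t), math.sin(r * t) / r),
            (-r * math.sin(r * t), math.cos(r * t)))

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def mild(t, lam, u0, u1, f, m=4000):
    """Evaluate (10.2.23) by the trapezoid rule."""
    u, w = apply(S(t, lam), (u0, u1))
    h = t / m
    acc_u = acc_w = 0.0
    for i in range(m + 1):
        s = i * h
        du, dw = apply(S(t - s, lam), (0.0, f(s)))
        wgt = 0.5 if i in (0, m) else 1.0
        acc_u += wgt * du
        acc_w += wgt * dw
    return u + h * acc_u, w + h * acc_w

lam, t = 9.0, 0.7
u, w = mild(t, lam, 0.0, 0.0, math.sin)
print(u - (math.sin(t) / 8 - math.sin(3 * t) / 24))   # ~ 0
print(w - (math.cos(t) / 8 - math.cos(3 * t) / 8))    # ~ 0, i.e. w = u'
```

The second printed residual checks precisely the claim w = u' discussed above.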

10.3 Examples
Let ∅ ≠ Ω ⊂ R^N, N ≥ 2, be a bounded domain with smooth boundary ∂Ω. Consider the following problem (associated with the heat equation):

u_t - Δu = f(t, x), t ≥ 0, x ∈ Ω ,
u(t, x) = 0, t ≥ 0, x ∈ ∂Ω , (10.3.25)
u(0, x) = u_0(x), x ∈ Ω .
This problem can be solved by the Fourier method using the results presented in Chap. 8 and in Sect. 10.1 above. Thus, the Fourier method provides an approach for solving the above initial-boundary value problem which is complementary to the semigroup approach.
Specifically, consider H = L²(Ω) equipped with the usual scalar product and Hilbertian norm, Q = -Δ with D(Q) = H_0¹(Ω) ∩ H²(Ω), and H_E = H_0¹(Ω) (the corresponding energetic space) with

(p, q)_E = ∫_Ω ∇p · ∇q dx , ‖p‖_E² = (p, p)_E .
By Theorem 8.16 there exist an increasing sequence (λ_n)_{n≥1} in (0, ∞) converging to ∞ and an orthonormal basis {e_n}_{n=1}^∞ in H = L²(Ω) such that

-Δe_n = λ_n e_n in Ω , ∀n ≥ 1 .
Thus, Theorem 10.1 is applicable to problem (10.3.25), which is of the form (E), (IC) with the above choices. In particular, under suitable conditions, the solution of (10.3.25) is given by

u(t, x) = ∑_{n=1}^∞ u_n(t) e_n(x) , (10.3.26)
where the u_n's are solutions of

u_n'(t) + λ_n u_n(t) = f_n(t), t ≥ 0 ,
u_n(0) = u_{0n} , n = 1, 2, . . .

with

f_n(t) = ∫_Ω f(t, ξ) e_n(ξ) dξ , u_{0n} = ∫_Ω u_0(ξ) e_n(ξ) dξ , n = 1, 2, . . .
Theorem 10.3 is also applicable to problem (10.3.25).
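For a concrete (hedged) illustration of this recipe, one can take the square Ω = (0, π)², a special case of the rectangle in Exercise 6 below, where the Dirichlet eigenpairs are explicit. All sample data in the following sketch are arbitrary choices, not taken from the text.

```python
import math

# Illustrative sketch of the Fourier recipe above on the square
# Omega = (0, pi) x (0, pi): the Dirichlet Laplacian has the explicit
# eigenpairs e_{mn}(x, y) = (2/pi) sin(m x) sin(n y), lambda_{mn} = m**2 + n**2.
# With f = 0 each coefficient evolves as u_{mn}(t) = u0_{mn} e^{-lambda_{mn} t}.
# Sample data (not from the text): u0(x, y) = sin(x) sin(y), whose exact
# solution is u(t, x, y) = e^{-2t} sin(x) sin(y).

def trapezoid(g, a, b, m=400):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / m
    total = 0.5 * (g(a) + g(b))
    for i in range(1, m):
        total += g(a + i * h)
    return total * h

def coeff(u0x, u0y, m, n):
    """Coefficient (u0, e_{mn}) for separable u0(x, y) = u0x(x) * u0y(y):
    the double integral over Omega factors into two 1-D integrals."""
    cx = trapezoid(lambda x: u0x(x) * math.sin(m * x), 0.0, math.pi)
    cy = trapezoid(lambda y: u0y(y) * math.sin(n * y), 0.0, math.pi)
    return (2.0 / math.pi) * cx * cy

def heat(t, x, y, modes=3):
    """Truncated Fourier series solution of (10.3.25) with f = 0."""
    total = 0.0
    for m in range(1, modes + 1):
        for n in range(1, modes + 1):
            lam = m * m + n * n
            total += (coeff(math.sin, math.sin, m, n) * math.exp(-lam * t)
                      * (2.0 / math.pi) * math.sin(m * x) * math.sin(n * y))
    return total

t, x, y = 0.5, 1.0, 2.0
print(heat(t, x, y), math.exp(-2 * t) * math.sin(x) * math.sin(y))  # agree
```

Since u_0 here is itself an eigenfunction, only the (1, 1) coefficient survives and the truncated series reproduces the exact solution.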

Theorem 10.4 can be illustrated with the following problem (associated with the wave equation):

u_tt - Δu = f(t, x), t ≥ 0, x ∈ Ω ,
u(t, x) = 0, t ≥ 0, x ∈ ∂Ω , (10.3.27)
u(0, x) = u_0(x), u_t(0, x) = u_1(x), x ∈ Ω .
The cases of the boundary conditions of Neumann or Robin type can
also be analyzed along the same lines.

10.4 Exercises
1. Consider the following initial-boundary value problem:

u_t - u_xx = f(t, x), t ∈ (0,T), x ∈ (0,1),
u(t, 0) = 0, u_x(t, 1) = 0, t ∈ [0,T],
u(0, x) = u_0(x), x ∈ (0,1).
Denote H = L²(0,1). Assume that H is equipped with the usual scalar product (·,·) and the induced norm ‖·‖ (hence H is a real Hilbert space which is infinite dimensional and separable). Define Q : D(Q) ⊂ H → H by

D(Q) = {v ∈ H²(0,1); v(0) = 0, v'(1) = 0}, Qv = -v'' ∀v ∈ D(Q).
The above problem can be expressed as a Cauchy problem in H:

u'(t) + Qu(t) = f(t), 0 < t < T, u(0) = u_0 , (CP)

where u(t) := u(t, ·) ∈ H.
(i) Show that Q satisfies condition (a) of Theorem 10.1 (i.e., Q is densely defined, self-adjoint, and strongly positive);

(ii) Find all the eigenpairs of Q and construct a corresponding orthonormal basis {e_n}_{n=1}^∞ of H;

(iii) Determine the energetic space H_E, show that H_E is compactly embedded in H, and determine an orthonormal basis of H_E;

(iv) Find the explicit Fourier series solution u(t, x) = ∑_{n=1}^∞ u_n(t) e_n(x) for

u_0(x) = x(1 - x), f(t, x) = (t + 1)x.
2. Consider a homogeneous thin metal rod occupying an interval [0, l], l > 0. The temperature at time t = 0 of the rod is constant: u = u_0 for x ∈ [0, l]. The temperatures at the ends of the rod are kept constant in time: u(t, 0) = u_1, u(t, l) = u_2, t ∈ [0,T], where T > 0 is a given time instant. Find the temperature distribution u = u(t, x) on the rod, if there is no external heat source distributed along the rod.
3. Consider the following initial-boundary value problem:

u_t - u_xx = f(t, x), t ∈ (0,T), x ∈ (0,1),
-u_x(t, 0) + αu(t, 0) = 0, u_x(t, 1) = 0, t ∈ [0,T],
u(0, x) = u_0(x), x ∈ (0,1),
where α is a given positive number. Denote as before H = L²(0,1) and define Q : D(Q) ⊂ H → H by

D(Q) = {v ∈ H²(0,1); -v'(0) + αv(0) = 0, v'(1) = 0}, Qv = -v'' ∀v ∈ D(Q).
Thus, the above problem can be expressed as a Cauchy problem in H:

u'(t) + Qu(t) = f(t), 0 < t < T, u(0) = u_0 , (CP)

where u(t) := u(t, ·) ∈ H.
Show that Q satisfies the conditions (a) and (b) of Theorem 10.1
(thus ensuring existence, uniqueness, and regularity of solutions
to the given problem).

4. Repeat Exercise 10.1 above, replacing the boundary conditions by the following (Neumann) boundary conditions:

u_x(t, 0) = 0, u_x(t, 1) = 0, t ∈ [0,T].
5. Let (H, (·,·), ‖·‖) be a real Hilbert space and let A : D(A) ⊂ H → H be a linear and positive operator, i.e., (Ap, p) ≥ 0 ∀p ∈ D(A). Assume that Q = A + αI satisfies both conditions (a) and (b) of Theorem 10.1, where α is a positive constant and I is the identity operator on H.
(a) Solve the following Cauchy problem:

u'(t) + Au(t) = f(t), 0 < t < T, u(0) = u_0 , (CP)

for some given u_0 ∈ H and f ∈ L²(0,T; H).

(b) Show that, given T and f , if α is small enough, then there
exists u0 ∈ H such that u(T ) is close to u0 , i.e., u(T ) − u0 
is small, where u is the solution of (CP ) corresponding to
u0 and f .

6. Let Ω = (0, a) × (0, b) ⊂ R², a, b ∈ (0, ∞). Consider the initial-boundary value problem

u_t - Δu = f(t, x), (t, x) ∈ (0,T) × Ω,
u(t, x) = 0, (t, x) ∈ [0,T] × ∂Ω,
u(0, x) = u_0(x), x ∈ Ω.
Find the general Fourier series expansion of the solution u = u(t, x) of the above problem for u_0 ∈ H = L²(Ω) and f ∈ L²((0,T) × Ω), and determine an explicit expansion for u_0(x) = c and f(t, x) = t x_1 x_2, where c is a real constant.
7. Repeat the previous exercise with Neumann conditions on ∂Ω (instead of the preceding Dirichlet boundary conditions). Consider also combinations of Dirichlet and Neumann conditions on different sides of the rectangle Ω.
8. Solve the following initial-boundary value problem:

u_t - u_xx = αδ(x - 1) + βδ(x - 2), (t, x) ∈ (0, ∞) × (0, 3),
u(t, 0) = 0, u(t, 3) = 0, t ≥ 0,
u(0, x) = 0, x ∈ [0, 3],

where α, β are real constants, and δ(x - 1), δ(x - 2) are the usual Dirac distributions in D'(0, 3), also denoted δ_1, δ_2.
9. Consider an elastic string of length l > 0, held fixed at both ends x = 0 and x = l. Find the displacement u = u(t, x) in the string, which is set in motion from its straight equilibrium position, with the initial velocity v_0 defined by

v_0(x) = Ax for 0 ≤ x ≤ l/2, v_0(x) = A(l - x) for l/2 ≤ x ≤ l,

where A is a positive constant.
10. Consider an elastic string of length l > 0, held fixed at the end
x = 0, while the end x = l is free. Find the displacement
u = u(t, x) in the string, if it is s