Principles of Analysis
Measure, Integration, Functional Analysis,
and Applications
Hugo D. Junghenn
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
AND TO MY WIFE
Mary
AS ALWAYS
Contents
Preface
0 Preliminaries
0.1 Sets
    Set Operations
    Number Systems
    Relations
    Functions
    Cardinality
0.2 Algebraic Structures
    Semigroups and Groups
    Linear Spaces
    Linear Transformations
    Quotient Linear Spaces
    Algebras
0.3 Metric Spaces
    Open and Closed Sets
    Interior, Closure, and Boundary
    Sequential Convergence. Completeness
    Continuity
    Category
0.4 Normed Linear Spaces
    Norms and Seminorms
    Banach Spaces
    Completion of a Normed Space
    Infinite Series in Normed Spaces
    Unordered Sums in Normed Spaces
    Bounded Linear Transformations
    Banach Algebras
0.5 Topological Spaces
    Open and Closed Sets
    Neighborhood Systems
    Neighborhood Bases
    Relative Topology
    Nets
0.6 Continuity in Topological Spaces
    Definition and General Properties
    Initial Topologies
    Product Topology
    Final Topologies
    Quotient Topology
2 Measurable Functions
2.1 Measurable Transformations
    General Properties
    Exercises
2.2 Measurable Numerical Functions
    Criteria for Measurability
    Almost Everywhere Properties
    Combinatorial and Limit Properties of Measurable Functions
    Exercises
2.3 Simple Functions
    A Fundamental Convergence Theorem
    Applications
    Exercises
2.4 Convergence of Measurable Functions
    Modes of Convergence
    Relationships Among the Modes of Convergence
    Exercises
3 Integration
3.1 Construction of the Integral
    Integral of a Nonnegative Simple Function
    Integral of a Real-Valued Function
    Integral of a Complex-Valued Function
    Integral over a Measurable Set
3.2 Basic Properties of the Integral
    Almost Everywhere Properties
    Monotone Convergence Theorem
    Linearity of the Integral
    Integration Against an Image Measure
    Integration Against a Measure with Density
    Change of Variables Theorem
    Exercises
3.3 Connections with the Riemann Integral on $\mathbb{R}^d$
    The Darboux Integral
    The Riemann Integral
    Measure Zero Criterion for Riemann Integrability
4 $L^p$ Spaces
4.1 Definition and General Properties
    The Case 1 ≤ p < ∞
    The Case p = ∞
    The Case 0 < p < 1
    $\ell^p$-Spaces
    Exercises
4.2 $L^p$ Approximation
    Approximation by Simple Functions
    Approximation by Continuous Functions
    Approximation by Step Functions
    Exercises
4.3 $L^p$ Convergence
    Exercises
*4.4 Uniform Integrability
    Exercises
*4.5 Convex Functions and Jensen’s Inequality
    Exercises
5 Differentiation
5.1 Signed Measures
    Definition and a Fundamental Example
    The Hahn-Jordan Decomposition
    Exercises
5.2 Complex Measures
    The Total Variation Measure
    The Vitali-Hahn-Saks Theorem
    The Banach Space of Complex Measures
    Integration against a Signed or Complex Measure
    Exercises
    Exercises
*12.4 Hilbert-Schmidt Operators
    The Hilbert-Schmidt Norm
    The Hilbert-Schmidt Inner Product
    The Hilbert-Schmidt Operator A ⊗ B
    Hilbert-Schmidt Integral Operators
    Exercises
*12.5 Trace Class Operators
    The Trace Norm
    The Trace
    The Dual Spaces $\mathcal{B}_0(H)'$ and $\mathcal{B}_1(H)'$
    Exercises
IV Appendices
A Change of Variables Theorem
References
Index
Preface
The purpose of this book is to provide a rigorous and detailed treatment of the essentials
of measure, integration, and functional analysis at the graduate level. It is assumed that the
reader has an undergraduate background in what is now traditionally called real analysis,
including elementary set theory and a rigorous treatment of limits, continuity, differentiation,
Riemann integration, and uniform convergence. An acquaintance with complex function
theory, in particular the complex exponential function $e^z$ and Cauchy’s integral formula,
is needed for a few applications. A knowledge of basic linear algebra, at least the notions
of subspace, basis, and linear transformation, is also assumed. Metric spaces and general
topology are developed in detail in Chapter 0. The former topic will be needed for the
treatment of $L^p$ spaces and the latter for the chapters on Radon measures and weak
topologies.
The book has four parts. Part I consists of Chapters 1–7 and develops the general theory of
Lebesgue integration. A course in the subject could consist of Chapters 1–5 with Chapters 6
or 7 optional.
Part II is organized as a course in functional analysis. Chapters 8–12 could form the core
of such a course, with Chapter 13 optional. Some of the applications and examples in Part II
rely on the measure and integration developed in Part I. The reader with a background in
this subject could safely omit Part I. Chapter 14 consists of deeper theorems in functional
analysis as well as applications. Some of the applications in the remainder of the book rely
on results of this chapter.
Part III consists of a variety of topics and applications that depend on, and indeed are
meant to illustrate the power of, topics developed in the first two parts. The chapters here
are largely independent, with the exception of Chapter 17, which depends on some results
in Chapter 16. The goal of these chapters is to provide a relatively quick overview of the
essentials of the subjects treated therein. The approach to these is sufficiently detailed so
that the reader can follow the development with relative ease. It is hoped that the treatment
here will inspire the reader to consult some of the many fine texts that specialize in these
subjects, some of which are listed in the bibliography.
Part IV consists of two appendices with proofs of the change of variables theorem and a
theorem on separate and joint continuity. The reader may safely omit the proofs without
disturbing the flow of the text.
The book contains nearly 700 exercises. Hints and/or a framework of intermediate steps
are given for the more difficult exercises. Many of these are extensions of material in the
text or are of special independent interest. Exercises related in a critical way to material
elsewhere in the text are marked with either an upward arrow, referring to earlier results,
or a downward arrow, referring to later material. Instructors with suitable bona fides may
obtain complete solutions to the exercises from the publisher.
A word about numbering: Proclamations (theorems, lemmas, examples, etc.) are numbered
consecutively in each section. Thus 1.2.3 refers to the third proclamation in Section 2 of
Chapter 1. Important equations are numbered consecutively in each chapter. Thus (4.5)
refers to the fifth such equation of Chapter 4. Equations within a proof that are only locally
relevant are referenced by symbols such as (†), (α), etc. Exercises are numbered consecutively
within each chapter. Thus Ex. 6.7 refers to the seventh exercise of Chapter 6.
The book is an outgrowth of courses in analysis taught at The George Washington
University. Specific notes for the book have been tested in classes over the last three years
and have benefitted greatly from comments, questions, and corrections from students; for
these I am grateful. It goes without saying that the book has also benefitted from several
excellent texts in analysis that have served as valuable resources—several of these are listed
in the bibliography. Finally, I wish to express my gratitude to my teacher C.T. Taam who
first exposed me to much of the mathematics that appears in this book.
Hugo D. Junghenn
Washington, D.C.
Chapter 0
Preliminaries
In this chapter we assemble the basic material needed for the topics treated in the book.
The reader may wish to simply skim the chapter at first, returning to specific topics as the
need arises.
0.1 Sets
The terms set, collection, and family are synonymous, although in some contexts one
term may be preferred over another, as in a collection of sets or a family of functions. Sets
are usually denoted by capital letters in various styles, and members of sets by small letters.
As usual, the notation x ∈ A denotes membership of x in A.
A concrete set may be described either by (perhaps only partially) listing its members or
by set-builder notation. The latter is of the form {x : P (x)}, which is read “the set of all x
such that P (x),” where P (x) is a well-defined property that x must possess to belong to the
set. For example, the set of all odd integers may be described as
{. . . , −3, −1, 1, 3, . . .} or {n : n = 2m + 1 for some m ∈ Z}.
Set Operations
If A is a subset of B, we write A ⊆ B. If all sets in a particular discussion are subsets
of a set X, then X is called a universal set. The power set of a set X is the collection
P(X) of all subsets of X. If A, B ⊆ X, then A ∪ B, A ∩ B, and A \ B denote the union,
intersection and relative difference of A and B, respectively, and Ac denotes the complement
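of A in X. If $\mathcal{A} \subseteq \mathcal{P}(X)$ and $B \in \mathcal{P}(X)$, we define the trace of $\mathcal{A}$ on $B$ by
$$\mathcal{A} \cap B = \{A \cap B : A \in \mathcal{A}\}.$$
The union and intersection of an indexed family $\mathcal{A} = \{A_i : i \in I\}$ of sets are denoted,
respectively, by
$$\bigcup \mathcal{A} = \bigcup_{i \in I} A_i \quad\text{and}\quad \bigcap \mathcal{A} = \bigcap_{i \in I} A_i.$$
If the index set in these operations is {1, 2, . . . , n} or {1, 2, . . .}, we write instead
$$\bigcup_{j=1}^{n} A_j = A_1 \cup \cdots \cup A_n, \quad \bigcap_{j=1}^{n} A_j = A_1 \cap \cdots \cap A_n, \quad \bigcup_{j=1}^{\infty} A_j = A_1 \cup A_2 \cup \cdots, \quad \bigcap_{j=1}^{\infty} A_j = A_1 \cap A_2 \cap \cdots.$$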
Parts (a) and (b) of the proposition are known as DeMorgan’s laws, and parts (c) and (d)
are called distributive laws.
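As a finite sanity check, the following sketch tests the standard forms of De Morgan's laws and the distributive laws on small sets. The universal set X, the family, and the set B are arbitrary illustrative choices, and the four identities are written in their usual textbook form, which is assumed to match the proposition referred to above.

```python
# Finite sanity check of De Morgan's laws and the distributive laws.
# X, family, and B are arbitrary illustrative choices (assumptions).
X = set(range(10))                            # universal set
family = [{0, 1, 2}, {2, 3, 4}, {2, 5, 9}]    # indexed family A_1, A_2, A_3
B = {1, 2, 3, 7}

def complement(A):
    """Complement of A relative to the universal set X."""
    return X - A

union = set().union(*family)
intersection = set.intersection(*family)

# De Morgan: the complement of a union is the intersection of the complements,
# and the complement of an intersection is the union of the complements.
de_morgan_union = complement(union) == set.intersection(*(complement(A) for A in family))
de_morgan_inter = complement(intersection) == set().union(*(complement(A) for A in family))

# Distributive laws: intersection distributes over union and vice versa.
dist_inter = (B & union) == set().union(*(B & A for A in family))
dist_union = (B | intersection) == set.intersection(*(B | A for A in family))

print(de_morgan_union, de_morgan_inter, dist_inter, dist_union)
```

A check of this kind proves nothing in general, of course; it only guards against misremembering the direction of an identity.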
A family $\{A_i : i \in I\}$ of sets is disjoint if $A_i \cap A_j = \emptyset$ whenever $i \ne j$. In this case, the
union $\bigcup_{i \in I} A_i$ is said to be disjoint. A partition of a set X is a collection of nonempty,
disjoint sets whose union is X.
A sequence of sets An is said to be increasing if A1 ⊆ A2 ⊆ · · · , in which case we write
An ↑. Similarly, the sequence is decreasing if A1 ⊇ A2 ⊇ · · · , written An ↓. In the first
case we also write An ↑ A, where A = A1 ∪ A2 ∪ · · · , and in the second An ↓ A, where
A = A1 ∩ A2 ∩ · · · .
Cartesian products of finite or infinite sequences of sets $A_1, A_2, \ldots$ are denoted, respectively,
by
$$\prod_{n=1}^{d} A_n = A_1 \times \cdots \times A_d \quad\text{and}\quad \prod_{n=1}^{\infty} A_n = A_1 \times A_2 \times \cdots.$$
Number Systems
The following notation is used for the standard number systems:
N := the set of positive integers.
Z := the set of integers.
Q := the set of rational numbers.
R := the set of real numbers.
C := the set of complex numbers.
Two subsets of C are of particular importance:
D := {z ∈ C : |z| < 1} the open unit disk and T := {z ∈ C : |z| = 1} the circle group.
$\mathbb{R}^d := \mathbb{R} \times \cdots \times \mathbb{R}$ and $\mathbb{C}^d := \mathbb{C} \times \cdots \times \mathbb{C}$ (d factors).
Relations
A relation on a nonempty set X is a nonempty set ∼ of ordered pairs from X. It is
customary to write x ∼ y rather than the prolix (x, y) ∈ ∼. A relation is said to be
(a) reflexive if x ∼ x for every x ∈ X;
(b) symmetric if x ∼ y ⇒ y ∼ x;
(c) transitive if x ∼ y and y ∼ z ⇒ x ∼ z;
(d) antisymmetric if x ∼ y and y ∼ x ⇒ x = y.
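For a relation on a finite set, the four properties can be tested by brute force. The sketch below represents a relation as a Python set of ordered pairs; the function names and the two sample relations (congruence modulo 3 and the usual order on {0, ..., 5}) are illustrative choices, not notation from the text.

```python
from itertools import product

def is_reflexive(X, R):
    """x ~ x for every x in X."""
    return all((x, x) in R for x in X)

def is_symmetric(R):
    """x ~ y implies y ~ x."""
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    """x ~ y and y ~ z imply x ~ z."""
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def is_antisymmetric(R):
    """x ~ y and y ~ x imply x = y."""
    return all(x == y for (x, y) in R if (y, x) in R)

X = range(6)
congruent_mod_3 = {(x, y) for x, y in product(X, X) if (x - y) % 3 == 0}
less_or_equal = {(x, y) for x, y in product(X, X) if x <= y}

for name, R in [("congruence mod 3", congruent_mod_3), ("<=", less_or_equal)]:
    print(name, is_reflexive(X, R), is_symmetric(R), is_transitive(R), is_antisymmetric(R))
```

Congruence modulo 3 satisfies (a), (b), and (c) but not (d), while the order relation satisfies (a), (c), and (d) but not (b).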
[x] := {y ∈ X : x ∼ y}.
Functions
The terms mapping, transformation, and function are synonymous. A function f
with domain dom f = X and range ran f ⊆ Y is symbolized by f : X → Y . We also
occasionally write x ↦ f(x), as in x ↦ √x, to describe a function. The collection of all
functions from X to Y is denoted by $Y^X$.
The image of A ⊆ X and the preimage of B ⊆ Y under a function f : X → Y are
defined, respectively, by
$$f(A) = \{f(x) : x \in A\} \quad\text{and}\quad f^{-1}(B) = \{x \in X : f(x) \in B\}.$$
Equality holds in (d) and (h) if f is injective. Equality holds in (f ) and (g) if f is surjective.
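The role of injectivity can be seen in a small computation. The sketch below computes images and preimages for finite sets, taking the image of A to be the set of values f(x) with x in A and the preimage of B to be the set of points of X that f maps into B. The helper names and the squaring map are illustrative assumptions, and the identities checked are standard ones that are not claimed to correspond to particular lettered parts of the proposition.

```python
def image(f, A):
    """Image of A under f: the set of all values f(x) with x in A."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """Preimage of B under f: X -> Y: all x in X with f(x) in B."""
    return {x for x in X if f(x) in B}

X = {-2, -1, 0, 1, 2}
square = lambda x: x * x        # not injective on X
A1, A2 = {-2, -1}, {1, 2}
B1, B2 = {0, 1}, {1, 4}

# Images need not respect intersections when f fails to be injective.
image_of_intersection = image(square, A1 & A2)                  # image of the empty set
intersection_of_images = image(square, A1) & image(square, A2)

# Preimages always respect intersections.
preimage_respects_intersection = (
    preimage(square, X, B1 & B2) == preimage(square, X, B1) & preimage(square, X, B2)
)

# A is always contained in the preimage of its image, possibly strictly.
pulled_back = preimage(square, X, image(square, A1))

print(image_of_intersection, intersection_of_images, pulled_back)
```

Here the image of the intersection is empty while the intersection of the images is {1, 4}, and pulling back the image of A1 yields the strictly larger set {-2, -1, 1, 2}.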
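For f : X → Y , $\mathcal{A} \subseteq \mathcal{P}(X)$ and $\mathcal{B} \subseteq \mathcal{P}(Y)$, we define the collections
$$f(\mathcal{A}) = \{f(A) : A \in \mathcal{A}\} \subseteq \mathcal{P}(Y) \quad\text{and}\quad f^{-1}(\mathcal{B}) = \{f^{-1}(B) : B \in \mathcal{B}\} \subseteq \mathcal{P}(X).$$
The identity function $\mathrm{id}_X$ on a set X is defined by $\mathrm{id}_X(x) = x$ for all x ∈ X. If A ⊆ X,
then the restriction of $\mathrm{id}_X$ to A is called the inclusion map and is frequently denoted by
$\iota_A : A \hookrightarrow X$.
If f : X → Y is bijective, then the inverse $f^{-1} : Y \to X$ of f is defined by the rule
$x = f^{-1}(y)$ iff $y = f(x)$. One then has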
$x = x^+ - x^-$ and $|x| = x^+ + x^-$.
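The real and imaginary parts of a complex number z are denoted, respectively, by Re z
and Im z, the conjugate by $\bar{z}$, and the modulus by |z|. Thus
$$z = \operatorname{Re} z + i \operatorname{Im} z, \quad \bar{z} = \operatorname{Re} z - i \operatorname{Im} z, \quad\text{and}\quad |z| = \sqrt{(\operatorname{Re} z)^2 + (\operatorname{Im} z)^2}.$$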
φ∗ (f ) = f ◦ φ, f ∈ F.
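$$(\operatorname{Re} f)(x) = \operatorname{Re} f(x), \quad (\operatorname{Im} f)(x) = \operatorname{Im} f(x), \quad \bar{f}(x) = \overline{f(x)}, \quad\text{and}\quad |f|(x) = |f(x)|.$$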
Cardinality
Two sets A and B are said to have the same cardinality if there exists a bijection
from A to B. A set A is finite if either A is the empty set or A has the same cardinality
as {1, 2, . . . , n} for some positive integer n. In the latter case, the members of A may be
labeled with the numbers 1, 2, . . . , n so that A may be written {a1 , a2 , . . . , an }. A set A is
countably infinite if it has the same cardinality as the set N of positive integers, in which
case we may write A = {a1 , a2 , . . .}. A set is countable if it is either finite or countably
infinite; otherwise, it is said to be uncountable. The set of all integers is countably infinite,
as is the set of rational numbers. The set of all real numbers is uncountable, as is any
(nondegenerate) interval of real numbers. The cardinality of R is denoted by c and that of
N by ℵ0 . For a detailed discussion of cardinality, the reader is referred to [23].
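The claim that the set of all integers is countably infinite can be made concrete by writing down a bijection from N onto Z. The sketch below uses the familiar enumeration 0, 1, −1, 2, −2, . . . ; the function names are ours, and this is only one of many possible bijections.

```python
def nat_to_int(n):
    """Send n = 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)

def int_to_nat(k):
    """Inverse map: positive k goes to the even number 2k, and k <= 0 to the odd number 1 - 2k."""
    return 2 * k if k > 0 else 1 - 2 * k

first_terms = [nat_to_int(n) for n in range(1, 8)]
print(first_terms)
```

Checking that the two maps are mutually inverse on a finite range is a useful guard against off-by-one errors, although the general verification is a short case analysis on the parity of n.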
the last notation assuming that G is a group. The notation is modified in the obvious way if
multiplication is written additively.
The sets R and C are groups under addition and are semigroups under multiplication.
Removing zero in each case yields a group under multiplication. The interval [1, ∞) is a semi-
group under both addition and multiplication. The disk D and its closure are subsemigroups
of C under multiplication, and T is a group. These examples are obviously all commutative.
The collection of nonsingular n × n matrices over K (n ≥ 2) is a noncommutative group
under matrix multiplication. The subset of matrices with determinant 1 is a subgroup.
If G and G′ are semigroups, then a function ϕ : G → G′ satisfying
ϕ(st) = ϕ(s)ϕ(t), s, t ∈ G,
ker ϕ := {x ∈ G : ϕ(x) = e′},
Linear Spaces
A linear space (or vector space) over K is an additively written abelian group V with
identity 0 and an operation scalar multiplication K × V → V, (s, v) → sv, satisfying
for all s, t ∈ K and v, w ∈ V. It follows that 0 v = 0 for all v ∈ V. Linear spaces are
always taken over K, whether or not explicitly mentioned. Euclidean space is a familiar
example of a linear space. Numerous additional examples appear throughout the text. It is
assumed that the reader has some familiarity with the basic theory of finite dimensional
vector spaces.
A subspace of a linear space V is a nonempty subset W that is closed under the
operations of addition and scalar multiplication. If A ⊆ V, then the span of A is the
subspace of V consisting of all linear combinations of members of A:
$$\operatorname{span} A := \Big\{ \sum_{j=1}^{m} c_j a_j : a_j \in A,\ c_j \in \mathbb{K},\ m \in \mathbb{N} \Big\}.$$
and balanced if
cC ⊆ C, for all c ∈ K with |c| ≤ 1.
A subspace of a linear space is obviously convex and balanced. The line segment from a
to b, defined by
[a : b] = {(1 − t)a + tb : 0 ≤ t ≤ 1},
is convex but generally not balanced. The disk D is both convex and balanced in the real
linear space R2 , while T is neither balanced nor convex.
The convex hull co A of a subset A of a linear space V is the intersection of all convex
subsets of V containing A. It is the smallest convex set (in the sense of containment)
containing A. Similarly, the convex balanced hull cobal A of A is the intersection of all
convex balanced subsets of V containing A. Here are important alternate descriptions of
these sets.
0.2.1 Proposition. Let A be a subset of a linear space V. Then
(a) co A consists of all sums of the form $\sum_{j=1}^{n} t_j x_j$, where $n \in \mathbb{N}$, $x_j \in A$, $t_j \ge 0$, and
$\sum_{j=1}^{n} t_j = 1$.
(b) cobal A consists of all sums of the form $\sum_{j=1}^{n} c_j x_j$, where $n \in \mathbb{N}$, $x_j \in A$, $c_j \in \mathbb{K}$,
and $\sum_{j=1}^{n} |c_j| \le 1$.
Proof. Let C denote the collection of all sums in (a). One easily checks that C is convex.
Since C ⊇ A, we have C ⊇ co A. For the reverse inclusion, let D be any convex set containing
A. By induction, D ⊇ C. Since co A is the intersection of all such sets D, co A ⊇ C. This
proves (a). The proof of (b) is similar.
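The convexity of C asserted in the proof may be checked as follows. This is a routine verification, assuming the usual definition that a set is convex if it contains (1 − λ)u + λv whenever it contains u and v and 0 ≤ λ ≤ 1; the symbols u, v, s_i, y_j, and λ are introduced here for the purpose.

```latex
Let $u = \sum_{i=1}^{m} s_i x_i$ and $v = \sum_{j=1}^{n} t_j y_j$ be members of $C$,
so that $x_i, y_j \in A$, $s_i, t_j \ge 0$, and $\sum_i s_i = \sum_j t_j = 1$.
For $0 \le \lambda \le 1$,
\[
  (1-\lambda)u + \lambda v
  = \sum_{i=1}^{m} (1-\lambda)s_i\, x_i + \sum_{j=1}^{n} \lambda t_j\, y_j ,
\]
where the coefficients $(1-\lambda)s_i$ and $\lambda t_j$ are nonnegative and
\[
  \sum_{i=1}^{m} (1-\lambda)s_i + \sum_{j=1}^{n} \lambda t_j
  = (1-\lambda) + \lambda = 1 .
\]
Hence $(1-\lambda)u + \lambda v \in C$.
```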
The sum in part (a) of the proposition is called a convex combination and the sum in
(b) an absolutely convex combination.
0.2.2 Theorem. Every linearly independent set A in a vector space may be extended to a
basis. Thus every (nontrivial) vector space has a basis.
Proof. Partially order the collection of linearly independent sets containing A by inclusion
and note that the union of a chain of such sets is linearly independent. By Zorn’s lemma,
there exists a maximal linearly independent set, which is necessarily a basis.
A basis for V is also called a Hamel basis to distinguish it from other types of bases,
for example Schauder bases.
Linear Transformations
Let V and W be linear spaces over K. A linear transformation from V into W is a
function T : V → W such that
The collection of all linear transformations from V to W is a linear space under pointwise
addition and scalar multiplication
ker T = {x ∈ V : T x = 0}.
For example, the restriction of a linear transformation to a convex set is affine. The function
x ↦ a · x + b on a convex subset of $\mathbb{R}^d$ is affine.
1 The notation T x for T (x) is standard for linear transformations.
(x + U) + (y + U) = (x + y) + U and c(x + U) = cx + U.
Algebras
An algebra (over K) is a linear space A with an associative multiplication (x, y) → xy
that satisfies
The ordered pair (X, d), as well as the set X, is called a metric space. A nonempty subset
Y of X with the metric $d|_{Y \times Y}$ is called a subspace of X. A metric has the property
$$|d(x, y) - d(u, v)| \le d(x, u) + d(y, v),$$
as may be seen from the triangle inequality d(x, y) ≤ d(x, u) + d(u, v) + d(v, y) and its
counterpart.
The real number system R is a metric space under the usual metric d(x, y) = |x − y|.
More generally, the set $\mathbb{R}^d$ is a metric space under the Euclidean metric
$$d(x, y) = |x - y| = \Big( \sum_{j=1}^{d} (x_j - y_j)^2 \Big)^{1/2}.$$
For another example, let X be a nonempty set and define d(x, y) = 1 if x ≠ y and d(x, x) = 0.
Then d is a metric, called the discrete metric on X.
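The defining properties of a metric can be tested numerically on a finite sample of points. The sketch below assumes the usual axioms (nonnegativity, d(x, y) = 0 exactly when x = y, symmetry, and the triangle inequality) and applies them to the Euclidean and discrete metrics. The function names, the sample points, and the non-example squared_gap are our own illustrative choices, and a small tolerance absorbs floating-point rounding in the triangle inequality.

```python
import math
from itertools import product

def euclidean(x, y):
    """Euclidean metric on tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete(x, y):
    """Discrete metric: 0 on the diagonal, 1 elsewhere."""
    return 0 if x == y else 1

def squared_gap(x, y):
    """Squared distance on the line; NOT a metric (the triangle inequality fails)."""
    return (x[0] - y[0]) ** 2

def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check the metric axioms for d on every pair and triple of sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or d(x, y) != d(y, x) or (d(x, y) == 0) != (x == y):
            return False
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in product(points, repeat=3))

plane_points = [(0, 0), (1, 0), (0, 1), (3, 4), (-2, 5)]
line_points = [(0,), (1,), (2,)]

print(satisfies_metric_axioms(plane_points, euclidean))
print(satisfies_metric_axioms(plane_points, discrete))
print(satisfies_metric_axioms(line_points, squared_gap))
```

The squared distance fails because the points 0, 1, 2 give d(0, 2) = 4 > 1 + 1, which shows that the exponent 1/2 in the Euclidean metric cannot simply be dropped.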
are called, respectively, the open and closed balls with center x and radius r. The set
is called the sphere with center x and radius r. For example, the open (closed) balls in
R with the usual metric are the bounded open (closed) intervals. The open (closed) balls in
Euclidean space R2 are open (closed) disks and the spheres are circles. The open and closed
balls in a discrete metric space X are the sets X and {x}; the spheres are X \ {x} and the
empty set.
A subset U of X is said to be open if either U = ∅ or else for each x ∈ U there exists
an r > 0 such that Br (x) ⊆ U . A subset of X is closed if its complement is open. An
application of the triangle inequality shows that an open ball is open. Indeed, if y ∈ Bε (x),
then Bδ (y) ⊆ Bε (x) for δ = ε − d(x, y), which shows that Bε (x) is a union of open balls
Bδ (y). A similar argument shows that a closed ball is closed.
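Written out, the triangle inequality step in the preceding argument reads as follows, where z denotes an arbitrary point of the smaller ball (the name z is ours):

```latex
Let $y \in B_\varepsilon(x)$ and set $\delta = \varepsilon - d(x, y)$, so that $\delta > 0$.
If $z \in B_\delta(y)$, then
\[
  d(x, z) \le d(x, y) + d(y, z) < d(x, y) + \delta
          = d(x, y) + \bigl(\varepsilon - d(x, y)\bigr) = \varepsilon ,
\]
hence $z \in B_\varepsilon(x)$. Therefore $B_\delta(y) \subseteq B_\varepsilon(x)$.
```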
A neighborhood of a point a in X is any set containing an open set containing a. As
we shall see, certain concepts such as continuity and convergence are conveniently phrased
in terms of neighborhoods.
0.3.1 Proposition. Open and closed sets have the following properties:
(a) If $\mathcal{U}$ is a collection of open sets, then $\bigcup \mathcal{U}$ is open.
(b) If V1 , . . . , Vn are open, then V := V1 ∩ · · · ∩ Vn is open.
(c) If $\mathcal{C}$ is a family of closed sets, then $\bigcap \mathcal{C}$ is closed.
Proof. (a) Let (yn ) be a Cauchy sequence in Y . Since X is complete, there exists x ∈ X
such that yn → x. Since Y is closed, x ∈ Y . Therefore, Y is complete.
(b) Let (yn ) be a sequence in Y such that yn → x ∈ X. Then (yn ) is Cauchy and hence
converges to some y ∈ Y . Since limits are unique, x = y ∈ Y . Therefore, Y is closed.
0.3.5 Proposition. Let A ⊆ X. Then x ∈ cl(A) iff there exists a sequence (an ) in A such
that an → x.
Proof. Let C be the set of all limits of convergent sequences in A including constant sequences,
so A ⊆ C ⊆ cl(A), the second inclusion by 0.3.3. We show that C is closed, proving the
assertion.
Suppose C is not closed. Then $C^c$ is not open, hence there exists $y \in C^c$ and for each n a
point $y_n \in B_{1/n}(y) \cap C$. Since each $y_n$ is the limit of a sequence in A, there exists $a_n \in A$
such that $d(y_n, a_n) < 1/n$. By the triangle inequality, $d(a_n, y) < 2/n$, hence $a_n \to y$. But
then y ∈ C, a contradiction.
Continuity
Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is said to be continuous
at a ∈ X if for each ε > 0 there exists a δ > 0 such that d(x, a) < δ ⇒ ρ(f(x), f(a)) < ε.
In terms of open balls,
$$f\big(B_\delta(a)\big) \subseteq B_\varepsilon\big(f(a)\big). \tag{0.2}$$
If E ⊆ X and f is continuous at each point of E, then f is said to be continuous on E. If f
is continuous at each member of X, then f is said to be continuous. A homeomorphism
from X to Y is a bijection f : X → Y such that both f and $f^{-1}$ are continuous.
The following proposition describes a useful characterization of continuity in terms of
neighborhoods. It will have implications later in the formulation of the definition of continuity
in the more general setting of topological spaces.
0.3.6 Proposition. A function f : X → Y is continuous at a iff for each neighborhood M
of f (a) there exists a neighborhood N of a such that f (N ) ⊆ M .
Proof. Let f be continuous at a and let M be a neighborhood of f (a). Choose ε > 0 such
that Bε (f (a)) ⊆ M and choose δ > 0 as in (0.2). Then Bδ (a) is the required neighborhood N .
Conversely, assume the neighborhood property holds and let ε > 0. Choose a neighborhood
N of a such that f (N ) ⊆ Bε (f (a)) and choose δ so that Bδ (a) ⊆ N . Then (0.2) holds.
It is clear from the proof that the neighborhoods M and N in 0.3.6 may be taken to be
open.
0.3.7 Proposition. Let f : (X, d) → (Y, ρ) and a ∈ X. Then f is continuous at a iff
f (an ) → f (a) for any sequence (an ) in X with an → a.
Proof. If f is continuous at a, then for any neighborhood M of f (a) there exists a neighbor-
hood N of a such that f (N ) ⊆ M . If an → a, then an ∈ N for all sufficiently large n, and
for such n, f (an ) ∈ M . Therefore, f (an ) → f (a).
Conversely, if f is not continuous at a, then for some ε > 0 and each n ∈ N there exists
an $a_n \in B_{1/n}(a)$ with $f(a_n) \notin B_\varepsilon(f(a))$. Then $a_n \to a$ but $f(a_n) \not\to f(a)$, so the sequential property fails.
0.3.8 Theorem. Let f : (X, d) → (Y, ρ). The following statements are equivalent:
(a) f is continuous.
Category
The diameter of a nonempty subset E of a metric space (X, d) is defined by
$$d(E) := \sup\{d(x, y) : x, y \in E\}.$$
Note that the continuity of the metric implies that d(E) = d(cl(E)).
Here is an important characterization of completeness of a metric space in terms of
diameters.
0.3.11 Cantor Intersection Theorem. A metric space X is complete iff the intersection
of any decreasing sequence of nonempty closed sets Cn in X with d(Cn ) → 0 consists of a
single point.
Proof. Assume X is complete. For each n choose $x_n \in C_n$. Since $C_n \downarrow$ and $d(C_n) \to 0$, $(x_n)$
is Cauchy. Let $x_n \to x$. Since $x_m \in C_n$ for all m ≥ n and $C_n$ is closed, letting m → ∞ we
see that $x \in C_n$ for all n, that is, $x \in C := \bigcap_n C_n$. Since $d(C) \le d(C_n) \to 0$, C = {x}.
Conversely, let X have the stated intersection property and let $(x_n)$ be a Cauchy sequence
in X. Set $C_n := \operatorname{cl}\{x_k : k \ge n\}$. By the Cauchy property, $d(C_n) \to 0$. Since $C_n \downarrow$, by our
hypothesis $\bigcap_n C_n$ contains a point x. It follows easily that $x_{n_k} \to x$ for some subsequence
$(x_{n_k})$. By 0.3.2, $x_n \to x$.
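In the complete space R, the theorem underlies the bisection method: repeatedly halving a closed interval produces a decreasing sequence of closed sets whose diameters tend to zero, and the single point in their intersection is the number sought. The sketch below locates √2 in this way. The function name and the choice of x² − 2 on [1, 2] are illustrative assumptions, and floating-point numbers only approximate the real line.

```python
import math

def nested_intervals(f, a, b, steps):
    """Bisect [a, b] repeatedly, keeping a closed half on which f changes sign.

    Assumes f(a) and f(b) have opposite signs. Returns the list of intervals
    C_0, C_1, ..., C_steps as (left, right) pairs, each contained in the last.
    """
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m          # a sign change occurs on [a, m]
        else:
            a = m          # otherwise it occurs on [m, b]
        intervals.append((a, b))
    return intervals

intervals = nested_intervals(lambda x: x * x - 2, 1.0, 2.0, 40)
diameters = [right - left for left, right in intervals]
left_end, right_end = intervals[-1]

print(diameters[-1], left_end, right_end, math.sqrt(2))
```

Each interval is closed and contained in its predecessor, and the diameter is halved at every step, so the hypotheses of the theorem hold for the idealized sequence of intervals in R.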
The following consequence of Cantor’s theorem is a key step in the proofs of several
important results in analysis. In §0.12 we give a version of the theorem for locally compact
spaces.
0.3.12 Baire Category Theorem. Let X be a complete metric space. If (Xn ) is a sequence
of closed sets with union X, then int Xn ≠ ∅ for some n.
Proof. Suppose for a contradiction that int Xn = ∅ for all n. Choose an open ball B(x0 , r0 )
with r0 = 1. Since int X1 = ∅, there exists x1 ∈ B(x0 , r0 ) \ X1 , and since B(x0 , r0 ) \ X1
is open, there exists r1 ∈ (0, 1/2) such that C(x1 , r1 ) ⊆ B(x0 , r0 ) \ X1 . Since int X2 = ∅
and B(x1 , r1 ) \ X2 is open there exists x2 ∈ B(x1 , r1 ) \ X2 and r2 ∈ (0, 1/3) such that
C(x2 , r2 ) ⊆ B(x1 , r1 ) \ X2 . In this way we construct sequences (xn ) in X and (rn ) in R such
that
C(xn , rn ) ⊆ B(xn−1 , rn−1 ) \ Xn , 0 < rn−1 ≤ 1/n, n ≥ 1.
Since the closed balls are decreasing and their diameters tend to zero, their intersection
C is nonempty (0.3.11). But this is impossible: C ∩ Xn = ∅ for all n, whereas the sets Xn cover X.
Indeed, the first inequality may be established by a simple induction argument, and the
second by applying the triangle inequality to ‖x‖ = ‖x − y + y‖ and ‖y‖ = ‖y − x + x‖.
If ‖·‖ is a norm on X, then the pair (X, ‖·‖) is called a normed space. It is easy to
check that the mapping (x, y) ↦ ‖x − y‖ is a metric on X, making the entire machinery
of metric spaces available. Unless stated otherwise, convergence and continuity in a normed
space are taken relative to this metric.
Banach Spaces
A normed space (X, ‖·‖) that is complete in the metric (x, y) ↦ ‖x − y‖ is called a
Banach space. A familiar example is Euclidean space K^d. Many other examples appear
throughout the text. For now we content ourselves with the following.
0.4.1 Example. (The space of bounded functions). Let X be a nonempty set and let B(X)
denote the vector space (under pointwise addition and scalar multiplication) of all bounded
functions f : X → K. The supremum norm or uniform norm on B(X) is defined by
\[ \|f\|_\infty = \sup\{ |f(x)| : x \in X \}. \]
That ‖·‖∞ is a norm is easily established using familiar properties of absolute value. For
example, the triangle inequality follows by taking the supremum over X in
\[ |f(x) + g(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty. \]
To verify completeness, let (fn) be a Cauchy sequence in B(X) and ε > 0. Choose N such
that ‖fn − fm‖∞ < ε for all m, n ≥ N. For such indices and each x ∈ X we then have
\[ |f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty < \varepsilon, \tag{†} \]
which shows that (fn(x)) is a Cauchy sequence in K. Since K is complete, fn(x) → f(x) ∈ K.
Fixing n ≥ N in (†) and letting m → ∞ yields |fn(x) − f(x)| ≤ ε for all x ∈ X and n ≥ N.
Therefore, f = f − fn + fn ∈ B(X) and ‖fn − f‖∞ ≤ ε. ♦
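The norm axioms and the notion of uniform convergence in this example can be checked numerically on a finite set, where the supremum is a maximum. The sketch below is only an illustration under that assumption; the helper `sup_norm`, the sample set `X`, and the functions chosen are hypothetical.

```python
import math

def sup_norm(f, X):
    """Supremum norm of f over a finite, nonempty set X (the sup is then a max)."""
    return max(abs(f(x)) for x in X)

X = [k / 10 for k in range(-50, 51)]     # finite sample set standing in for X
f = math.sin
g = math.cos

# Triangle inequality: ||f + g|| <= ||f|| + ||g||
lhs = sup_norm(lambda x: f(x) + g(x), X)
rhs = sup_norm(f, X) + sup_norm(g, X)

# Uniform convergence: f_n = f + 1/n satisfies ||f_n - f|| = 1/n -> 0
def f_n(n):
    return lambda x: f(x) + 1.0 / n

distances = [sup_norm(lambda x, n=n: f_n(n)(x) - f(x), X) for n in (1, 10, 100)]
```

The list `distances` records ‖f_n − f‖∞ for n = 1, 10, 100, which decreases like 1/n.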
0.4.4 Proposition. If the unordered sums on the right in the following equality exist, then
the unordered sum on the left exists and
\[ \sum_{i\in I} (a x_i + b y_i) = a \sum_{i\in I} x_i + b \sum_{i\in I} y_i. \]
Proof. Let x = ∑_{i∈I} x_i and y = ∑_{i∈I} y_i. Given ε > 0, choose finite Fε, Gε ⊆ I such that
\[ \Bigl\| \sum_{i\in F} x_i - x \Bigr\| < \varepsilon/2 \quad\text{and}\quad \Bigl\| \sum_{i\in G} y_i - y \Bigr\| < \varepsilon/2 \]
for all finite F ⊇ Fε and G ⊇ Gε. Then for finite F ⊇ Fε ∪ Gε, by the extended triangle
inequality (0.3) we have
\[ \Bigl\| \sum_{i\in F} (x_i + y_i) - (x + y) \Bigr\| \le \Bigl\| \sum_{i\in F} x_i - x \Bigr\| + \Bigl\| \sum_{i\in F} y_i - y \Bigr\| < \varepsilon. \]
This shows that ∑_{i∈I} (x_i + y_i) = ∑_{i∈I} x_i + ∑_{i∈I} y_i. An even simpler argument shows that
∑_{i∈I} a x_i = a ∑_{i∈I} x_i.
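The role of the finite sets Fε in the proof can be seen in a small numerical experiment: once a finite index set contains Fε, adding further indices moves the partial sum by less than ε. The Python sketch below assumes a finite stand-in for the index set and a hypothetical family x_i = (−1)^i / 2^i; all names are illustrative.

```python
import random

# A finite stand-in for an index set I, with an absolutely summable family.
I = list(range(60))
x = {i: (-1) ** i / 2 ** i for i in I}
total = sum(x.values())           # plays the role of the unordered sum over I

def partial_sum(F):
    """Sum of x_i over a finite index set F."""
    return sum(x[i] for i in F)

eps = 1e-6
# F_eps: all indices whose terms are not negligible; the remaining
# terms have absolute values summing to less than eps.
F_eps = [i for i in I if 2.0 ** -i >= eps / 4]

random.seed(0)
errors = []
for _ in range(100):
    rest = [i for i in I if i not in F_eps]
    extra = random.sample(rest, k=random.randint(0, 10))
    F = F_eps + extra             # an arbitrary finite F containing F_eps
    errors.append(abs(partial_sum(F) - total))
```

Every entry of `errors` stays below `eps`, regardless of which extra indices are added.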
0.4.7 Theorem. Let X be a Banach space and let {x_i : i ∈ I} ⊆ X be such that
s := sup_F ∑_{i∈F} ‖x_i‖ < ∞, where the supremum is taken over all finite F ⊆ I. Then the family
{‖x_i‖ : i ∈ I} converges unconditionally to s, {x_i : i ∈ I} converges unconditionally to
some x ∈ X, and ‖x‖ ≤ s, that is,
\[ \Bigl\| \sum_{i\in I} x_i \Bigr\| \le \sum_{i\in I} \|x_i\|. \]
Proof. By 0.4.5 and 0.4.6, s = ∑_{i∈I} ‖x_i‖ = ∑_{k=1}^∞ ‖x_{i_k}‖. Thus the series ∑_{k=1}^∞ x_{i_k} is
absolutely convergent, so converges to some x ∈ X (0.4.3). Given ε > 0, choose m such that
\[ \Bigl\| \sum_{k=1}^{n} x_{i_k} - x \Bigr\| < \varepsilon \quad\text{for all } n \ge m. \]
Banach Algebras
A normed algebra is an algebra A over C with a norm that satisfies
‖xy‖ ≤ ‖x‖ ‖y‖, x, y ∈ A.
A complete normed algebra is called a Banach algebra. These structures occur in many
important settings, particularly in the theory of operators on Hilbert spaces. The Banach
space B(X) of all bounded functions under pointwise multiplication is a simple example of
a commutative unital Banach algebra. Other examples appear throughout the text. General
commutative Banach algebras are discussed in detail in Chapter 13.
(a) X, ∅ ∈ T,
(b) U ⊆ T ⇒ ⋃U ∈ T, (0.5)
(c) U, V ∈ T ⇒ U ∩ V ∈ T.
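On a finite set the axioms (a) to (c) can be verified mechanically. The following Python sketch is an illustration only; the function `is_topology` and the two sample families are hypothetical, and the check relies on the fact that for a finite family closure under pairwise unions already gives closure under arbitrary unions.

```python
from itertools import combinations

def is_topology(X, T):
    """Check axioms (a)-(c) for a family T of subsets of a finite set X.

    For a finite family, closure under pairwise unions implies closure
    under arbitrary unions, so checking pairs suffices.
    """
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if X not in T or frozenset() not in T:
        return False
    return all((U | V) in T and (U & V) in T for U, V in combinations(T, 2))

X = {1, 2, 3}
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]       # the union {1, 2} of {1} and {2} is missing
```

Calling `is_topology(X, good)` and `is_topology(X, bad)` distinguishes the two families.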
topological analog.
Proof. Let (Un ) be a countable base and xn ∈ Un . For any open neighborhood U of x, there
exists n such that Un ⊆ U , hence xn ∈ U . Therefore, (xn ) is dense in X.
Now let X be a metric space with countable dense set {x1 , x2 , . . .}. The collection
B := {B_{1/m}(x_n) : m, n ∈ N} is then countable. We show that every nonempty open set U
is a union of members of B. Let x ∈ U and choose m such that B_{2/m}(x) ⊆ U. Next, choose
x_n ∈ B_{1/m}(x). Then x ∈ B_{1/m}(x_n) ⊆ B_{2/m}(x) ⊆ U. Therefore, U is a union of the balls
B_{1/m}(x_n).
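The two choices made in the second half of the proof (first m, then a point of the dense set) can be carried out explicitly in R, with the rationals as the countable dense set. The Python sketch below is a hedged illustration; the function `basic_ball` and the sample interval are assumptions introduced here.

```python
from fractions import Fraction
import math

def basic_ball(x, lo, hi):
    """Given x in the open interval (lo, hi), return (q, m) with q rational
    such that x lies in B_{1/m}(q) and B_{1/m}(q) is contained in (lo, hi)."""
    assert lo < x < hi
    dist = min(x - lo, hi - x)           # distance from x to the complement
    m = math.floor(2 / dist) + 1         # guarantees 2/m < dist, so B_{2/m}(x) ⊆ (lo, hi)
    q = Fraction(round(x * 2 * m), 2 * m)    # rational with |q - x| <= 1/(4m) < 1/m
    return q, m

# Hypothetical example: x = sqrt(2) inside the open set U = (1.4, 1.5).
q, m = basic_ball(math.sqrt(2), 1.4, 1.5)
```

The returned ball B_{1/m}(q) is a member of the countable family B that contains x and lies inside U.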
Neighborhood Systems
The notion of neighborhood of a point x in a topological space X is defined as in the
case of a metric space, namely as a superset of an open set containing x. The collection of
all neighborhoods of x is called the neighborhood system at x and is denoted by N(x).
Neighborhood systems clearly have the following properties:
(a) X ∈ N(x) ∀ x ∈ X.
(b) N ∈ N(x) ⇒ x ∈ N.
(c) N ∈ N(x) and M ⊇ N ⇒ M ∈ N(x). (0.6)
(d) N1 , N2 ∈ N(x) ⇒ N1 ∩ N2 ∈ N(x).
(e) N ∈ N(x) ⇒ there exists U ∈ N(x) with U ⊆ N such that U ∈ N(y) ∀ y ∈ U.
Proof. Let T be the collection of all sets U such that either U = ∅ or U ∈ N(x) for each
x ∈ U . By (a), X ∈ T, and, by (c) and (d), T is closed under arbitrary unions and finite
intersections. Therefore, T is a topology for X satisfying (ii).
Now let {N_T(x) : x ∈ X} be the T-neighborhood system. If M ∈ N_T(x) and U is open
with x ∈ U ⊆ M then, by definition of T, U ∈ N(x), hence, by (c), M ∈ N(x). Conversely,
if N ∈ N(x), then the set U in (e) is in T, hence N ∈ N_T(x). Therefore, N(x) = N_T(x).
To prove uniqueness, let T′ be a topology satisfying (i) and (ii). If x ∈ V ∈ T′, then
V ∈ N(x) by (ii), hence there exists U ∈ T such that x ∈ U ⊆ V. Therefore, V is a union of
T-open sets and so is T-open. This shows that T′ ⊆ T. Similarly, T ⊆ T′.
Neighborhood Bases
Let X be a topological space. A neighborhood base at x ∈ X is a subset B(x) of N(x)
such that every member of N(x) contains a member of B(x). For example, the collection of
open neighborhoods of x is clearly a neighborhood base at x.
If each x ∈ X has a neighborhood base B(x), then the resulting system {B(x) : x ∈ X}
has the following properties, derived from those of N(x):
(a) B ∈ B(x) ⇒ x ∈ B.
(b) B1 , B2 ∈ B(x) ⇒ there exists B3 ∈ B(x) with B3 ⊆ B1 ∩ B2 .
(0.7)
(c) B ∈ B(x) ⇒ there exists U ∈ B(x) with U ⊆ B such that U contains a
member of B(y) for each y ∈ U .
0.5.3 Proposition. Let X be a nonempty set and for each x ∈ X let B(x) be a collection
of subsets of X with properties (a) – (c) of (0.7). Then there exists a unique topology T on
X such that (i) B(x) is a neighborhood base at x and (ii) every open set is a neighborhood
of each of its points.
Proof. Let N(x) be the collection of all supersets of members of B(x). Then N(x) satisfies
the conditions (a) – (e) of (0.6), and the assertions follow from 0.5.2.
Relative Topology
If Y is a subset of a topological space X_T, then the trace T ∩ Y is a topology called the
relative topology of Y . The collection of closed sets in Y is easily seen to be the trace
of the collection of closed sets in X. Open (closed) sets of Y are frequently referred to as
relatively open (closed). The neighborhood system of y ∈ Y is the trace on Y of the
T-neighborhood system of y. If B(y) is a T-neighborhood base at y ∈ Y , then B(y) ∩ Y
is a neighborhood base at y. For example, the collection of intervals [0, 1/n) (n ∈ N) is a
neighborhood base at 0 in the relative topology of [0, 1].
Nets
A directed set is a nonempty set A together with a relation ⪯ that is reflexive, transitive,
and has the property that every pair of elements has an upper bound. For example, the
neighborhood system of a point x in a topological space is directed by reverse inclusion,
that is,
N_x ⪯ M_x iff M_x ⊆ N_x.
The collection of all partitions of an interval [a, b] is directed by inclusion:
P ⪯ Q iff Q is a refinement of P.
The Cartesian product A × B of directed sets A and B is directed by the product ordering
(a, b) ⪯ (a′, b′) iff a ⪯ a′ and b ⪯ b′.
If f : [a, b] → R is Riemann integrable, then the Riemann sums S(f, P, ξ) form a net such
that lim_{(P,ξ)} S(f, P, ξ) = ∫_a^b f(x) dx. (See §3.3.)
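To make the net of Riemann sums concrete, one can follow a single chain of successively refined partitions and watch the sums approach the integral. This Python sketch is an illustration under stated assumptions: uniform dyadic partitions, midpoint tags, and f(x) = x² on [0, 1] are all choices made here, and the helper names are hypothetical.

```python
def riemann_sum(f, partition, tags):
    """S(f, P, xi): sum of f(xi_i) * (x_i - x_{i-1}) for a tagged partition."""
    return sum(f(t) * (b - a) for a, b, t in zip(partition, partition[1:], tags))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * k / n for k in range(n + 1)]

f = lambda x: x * x
sums = []
for j in range(1, 12):
    P = uniform_partition(0.0, 1.0, 2 ** j)            # each partition refines the previous one
    tags = [(u + v) / 2 for u, v in zip(P, P[1:])]     # midpoint tags, one arbitrary choice of xi
    sums.append(riemann_sum(f, P, tags))
```

The values in `sums` approach 1/3, the integral of x² over [0, 1]. A chain of partitions is only one path through the directed set; the net limit requires the same behavior for all sufficiently fine tagged partitions.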
Many properties of sequential convergence in a metric space carry over to nets in general
topological spaces. In fact, the notion of net was introduced to describe convergence in
topological spaces that are not first countable. Here is the net analog of 0.3.5.
0.5.4 Proposition. Let X be a topological space and E ⊆ X. Then x ∈ cl(E) iff there
exists a net (xα ) in E converging to x.
Proof. If x ∈ cl(E), then N ∩ E ≠ ∅ for each neighborhood N of x. Choosing x_N ∈ N ∩ E
and directing the neighborhood system at x by reverse inclusion, we obtain a net in E
converging to x. Conversely, if x ∉ cl(E) then every net converging to x is eventually in the
open set cl(E)^c ⊆ E^c, hence no net in E can converge to x.
The notion of subsequence has the following net counterpart: A net (y_β)_B is a subnet of
a net (x_α)_A if there exists a function β ↦ α_β : B → A such that
(i) y_β = x_{α_β}, and
(ii) for each α0 ∈ A there exists β0 ∈ B such that β ⪰ β0 implies α_β ⪰ α0.
While this generalizes the notion of subsequence, it should be noted that a subnet of a
sequence need not be a subsequence. (Consider a subnet (x_{n_β}) of a sequence (x_n) with
n_1 = n_2 = 1, n_3 = n_4 = 2, etc.)
A point x in a topological space X is said to be a cluster point of a net (x_α) if (x_α) is
frequently in every neighborhood of x. The connection between cluster points and subnets is
analogous to the connection between cluster points and subsequences in a metric space:
0.5.5 Proposition. A net (x_α)_A in a topological space X has a cluster point x iff (x_α) has
a subnet converging to x.
Proof. Let (x_{α_β}) be a subnet of (x_α) converging to x. If N is a neighborhood of x, then there
exists β1 such that x_{α_β} ∈ N for all β ⪰ β1. This implies that (x_α) is frequently in N. Indeed,
by definition of subnet, for each α0 ∈ A there exists β0 ∈ B such that β ⪰ β0 ⇒ α_β ⪰ α0.
Thus if β ⪰ β0 and β ⪰ β1, then α_β ⪰ α0 and x_{α_β} ∈ N.
Conversely, let x be a cluster point of (x_α) and let N(x) be the neighborhood system at
x directed by reverse inclusion. Direct the pairs (α, N) ∈ A × N(x) by the product ordering.
For each (γ, N), the net (x_α) is frequently in N, hence there exists α(γ, N) ⪰ γ such that
x_{α(γ,N)} ∈ N. Then (x_{α(γ,N)})_{A×N(x)} is the required subnet. Indeed, for any α0 ∈ A and
N0 ∈ N(x), (γ, N) ⪰ (α0, N0) ⇒ α(γ, N) ⪰ γ ⪰ α0 and x_{α(γ,N)} ∈ N ⊆ N0.
0.5.6 Corollary. Let X have topologies T and T′. Then T′ ≤ T iff every net (x_α) that
T-converges to some member x of X also T′-converges to x.
Proof. The necessity is clear, since every T′-neighborhood of x is a T-neighborhood of x.
For the sufficiency, suppose for a contradiction that C is T′-closed but not T-closed. Take
any x ∈ cl_T(C) \ C and for each T-neighborhood N of x choose x_N ∈ C ∩ N. Then x_N → x
in X_T, hence also in X_{T′}. But then x ∈ C by 0.5.4, since C is T′-closed, a contradiction.
Proof. Suppose that f is continuous at x and let x_α → x. Given a neighborhood N_{f(x)} of f(x), choose a neighborhood N_x of x such that
f(N_x) ⊆ N_{f(x)}. Next, choose α0 so that x_α ∈ N_x for all α ⪰ α0. For such α, f(x_α) ∈ N_{f(x)}.
Therefore, f(x_α) → f(x).
Conversely, if f is not continuous at x, then there exists a neighborhood N_{f(x)} such that
f(N) ⊈ N_{f(x)} for all N ∈ N(x). For each N choose x_N ∈ N so that f(x_N) ∉ N_{f(x)}. Then
the net (x_N) converges to x, but f(x_N) is never in N_{f(x)}.
The next result is proved exactly as in the metric case (0.3.8), except that in the proof of
(d) ⇒ (a) one must use nets instead of sequences. We leave the details to the reader.
0.6.2 Theorem. The following statements are equivalent:
(a) f is continuous.
0.6.3 Corollary. Let X and Y be topological spaces and let the topology of Y be generated
by a collection S of subsets of Y. Then a function f : X → Y is continuous iff f⁻¹(U) is
open in X for each U ∈ S.
Proof. The necessity is clear. For the sufficiency, assume f⁻¹(U) is open in X for each U ∈ S.
Then the collection of all sets V ⊆ Y for which f⁻¹(V) is open in X is a topology containing
S and so contains T(S).
For example, a real-valued function f on a topological space is continuous iff f⁻¹((−∞, a))
and f⁻¹((a, ∞)) are open for all rational numbers a.
Initial Topologies
Let X be a set, Y a topological space, and F a family of maps f : X → Y . The topology
T on X generated by the sets f⁻¹(U), where f ∈ F and U is open in Y, is called the initial
topology on X with respect to F.
0.6.4 Proposition. The initial topology T has the following properties:
(a) T is the weakest topology on X relative to which every member of F is continuous.
(b) For each y ∈ Y, let B_y be an open neighborhood base at y. For x ∈ X, let B_x denote
the collection of all sets of the form f_1⁻¹(U_1) ∩ ··· ∩ f_n⁻¹(U_n) containing x, where
f_j ∈ F and U_j ∈ B_{f_j(x)}. Then B_x is a neighborhood base at x for the initial topology.
(c) A net (xα ) in X T-converges to x iff f (xα ) → f (x) for every f ∈ F.
Product Topology
Let {X_i : i ∈ I} be a family of topological spaces and set X := ∏_{i∈I} X_i. The product
topology on X is the initial topology with respect to the family of projection mappings
πi : X → Xi . By 0.6.4(c), a net (fα ) in X converges to f in this topology iff fα (i) =
πi (fα ) → πi (f ) = f (i) for each i ∈ I. For this reason the product topology is also called
the topology of pointwise convergence on I. Note that the product topology on R^d is
simply the topology defined by the Euclidean metric.
Final Topologies
Let X be a topological space, Y a nonempty set and F a family of maps f : X → Y .
The collection T of all subsets V of Y such that f⁻¹(V) is open in X for each f ∈ F is a
topology called the final topology on Y with respect to F.
0.6.5 Proposition. The final topology T has the following properties:
(a) T is the strongest topology on Y relative to which every member of F is continuous.
Quotient Topology
Let X be a topological space and ∼ an equivalence relation on X. The final topology on
X/∼ with respect to the quotient map Q : X → X/∼ is called the quotient topology.
The open sets of X/∼ are precisely those collections V of equivalence classes [x] such that
Q⁻¹(V) = ⋃_{[x]∈V} [x] is open in X. Quotient topologies play an important role in the theory
of normed linear spaces (see §8.4).
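For a finite space the description of the quotient topology above can be evaluated directly: list every family of equivalence classes and keep those whose union is open in X. The Python sketch below is a small illustration; the function `quotient_topology`, the four-point space, and the identification 2 ∼ 3 are assumptions made for this example.

```python
from itertools import chain, combinations

def quotient_topology(T, classes):
    """Open sets of X/~ for a finite space: the families V of equivalence
    classes whose union (the preimage of V under Q) belongs to T."""
    T = {frozenset(U) for U in T}
    classes = [frozenset(c) for c in classes]
    families = chain.from_iterable(
        combinations(classes, k) for k in range(len(classes) + 1))
    return [set(V) for V in families if frozenset().union(*V) in T]

# X = {1, 2, 3, 4} with the open sets below; identify 2 ~ 3.
T = [set(), {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]
classes = [{1}, {2, 3}, {4}]
open_sets = quotient_topology(T, classes)
```

In this example the family consisting of the single class {2, 3} is not open in the quotient, because its preimage {2, 3} is not open in X.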
Clearly, C(X) is closed under addition and multiplication and so is an algebra. We define
the related space of all bounded, continuous functions f : X → C by
where the infimum is taken over all open neighborhoods of x. Then for each r > 0 the set
Wr := {x ∈ X : F (x) < r} is open. Thus W0 := {x ∈ X : F (x) = 0} is a Gδ set.
Proof. Let x0 ∈ Wr and choose an open neighborhood U of x0 such that
Proof. We claim that W0 is the set of continuity points of f. Indeed, f is continuous at x iff
for each ε > 0 there exists a neighborhood U of x such that d(f(x′), f(x)) < ε for all x′ ∈ U iff for
each ε > 0 there exists a neighborhood U of x such that d(f(x′), f(x″)) < ε for all x′, x″ ∈ U iff
F(x) < ε for all ε iff F(x) = 0.
From the proposition we see that no function f : R → R can be continuous precisely at
the rationals. The reader may easily find examples of functions that are continuous precisely
at the irrationals. (See Ex. 2.20.)
Urysohn’s Lemma
0.7.2 Theorem (Urysohn). If X is a normal topological space and A and B are disjoint
closed subsets, then there exists a continuous function f : X → [a, b] such that f = a on A
and f = b on B.
Proof. We may assume a = 0 and b = 1 (otherwise, replace f by a + (b − a)f). Let
D := {r = k2^{−n} : n ∈ N, 0 < k < 2^n}, the set of dyadic rational numbers in (0, 1). We show
by induction on n that there exists a family of open sets U_r indexed by members r of D
such that
A ⊆ U_r ⊆ cl U_r ⊆ U_s ⊆ B^c for all r, s ∈ D with r < s. (†)
By 0.7.1, there exists an open set U_{1/2} such that A ⊆ U_{1/2} ⊆ cl U_{1/2} ⊆ B^c. This defines
U_r for the case k = n = 1. Now assume that sets U_r have been constructed for r = k/2^n
(0 < k < 2^n). Since k/2^n = 2k/2^{n+1}, it remains to construct U_r for r = (2k + 1)/2^{n+1}. But
since cl U_{k2^{−n}} ⊆ U_{(k+1)2^{−n}}, there exists by 0.7.1 an open set U_r such that
A ⊆ cl U_{k2^{−n}} ⊆ U_r ⊆ cl U_r ⊆ U_{(k+1)2^{−n}} ⊆ B^c,
establishing (†).
Now set U_1 = X and define f on X by f(x) = inf{r ∈ D ∪ {1} : x ∈ U_r}. Obviously, 0 ≤ f ≤ 1.
Also, since no member of B is in U_r for r < 1, f = 1 on B. Moreover, since A ⊆ U_r for all r,
f = 0 on A. To see that f is continuous, let 0 < t < 1 and note that f(x) < t iff x ∈ U_r for
some r < t, and f(x) > t iff x ∉ cl(U_r) for some r > t. Thus we have open sets
\[ \{f < t\} = \bigcup_{r<t} U_r \quad\text{and}\quad \{f > t\} = \bigcup_{r>t} (\operatorname{cl} U_r)^c. \]
Since the intervals (−∞, t), (t, ∞) generate the topology of R, f is continuous by 0.6.3.
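The construction can be traced in a simple case. Take X = R, A = (−∞, 0], B = [1, ∞), and the family U_r = (−∞, r), which satisfies (†); these choices, and the function name `urysohn_level`, are assumptions made only for illustration. The Python sketch approximates f(x) = inf{r : x ∈ U_r} using dyadic rationals of a fixed level n.

```python
from fractions import Fraction

def urysohn_level(x, n):
    """Approximate f(x) = inf{r : x in U_r} using only dyadic rationals
    of level n, for the illustrative family U_r = (-inf, r) and U_1 = X."""
    candidates = [Fraction(k, 2 ** n) for k in range(1, 2 ** n)
                  if x < Fraction(k, 2 ** n)]
    candidates.append(Fraction(1))       # U_1 = X always contains x
    return min(candidates)

values = {x: float(urysohn_level(x, 10)) for x in (-1.0, 0.0, 0.3, 0.75, 1.0, 2.0)}
```

For this family the limiting function is x clipped to [0, 1], which is 0 on A and 1 on B, and the level-10 values agree with it to within 2^{−10}.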
Conversely, if X is not compact, then there exists an open cover U with no finite subcover.
Taking C to be the collection of complements of members of U, we see that C has the finite
intersection property but has empty intersection.
0.8.2 Proposition. A compact subset of a Hausdorff space X is closed.
Proof. Let A ⊆ X be compact. We show that A^c is open. Let b ∈ A^c. For each x ∈ A, let M_x
and N_x be disjoint open neighborhoods of x and b, respectively. Then {M_x : x ∈ A} is an
open cover of A, hence there exists a finite subset A0 of A such that U_b := ⋃_{x∈A0} M_x ⊇ A.
Set V_b := ⋂_{x∈A0} N_x. Then V_b is a neighborhood of b, and since V_b ∩ M_x = ∅ for every x ∈ A0,
V_b ⊆ U_b^c ⊆ A^c. Therefore A^c is open.
0.8.3 Proposition. A compact subset Y of a metric space (X, d) is bounded.
Proof. Fix y ∈ Y . The collection of open balls Bn (y) with center y ∈ Y and radius n ∈ N is
an open cover of Y and so has a finite subcover. Therefore, Y ⊆ Bn (y) for some n.
0.8.4 Proposition. A closed subset of a compact space X is compact.
Proof. Let Y ⊆ X be closed. If U is a cover of Y by open sets of X, then enlarging U by
including the open set X \ Y results in an open cover of X. Since X is compact, there exist
U_1, ..., U_n ∈ U such that X = (X \ Y) ∪ ⋃_j U_j. Then Y ⊆ ⋃_j U_j.
0.8.5 Corollary. Let X have topologies T_1 ≤ T_2 such that (X, T_2) is compact and (X, T_1)
is Hausdorff. Then T_1 = T_2.
Proof. Let C be T_2-closed. By 0.8.4, C is T_2-compact, hence T_1-compact. By 0.8.2, C is
T_1-closed. Therefore, T_1 and T_2 have the same closed sets and so are equal.
The following proposition asserts that disjoint compact sets in a Hausdorff space may be
separated by open sets.
0.8.6 Proposition. Let A and B be disjoint compact subsets of a Hausdorff space X. Then
there exist disjoint open sets U and V with A ⊆ U and B ⊆ V .
Proof. By the proof of 0.8.2, for each b ∈ B there exist disjoint open sets U_b ⊇ A and V_b ∋ b.
Then {V_b : b ∈ B} is an open cover of B, so by compactness there exists a finite set B0 ⊆ B
such that B ⊆ V := ⋃_{b∈B0} V_b. Now set U := ⋂_{b∈B0} U_b.
From 0.8.4 and 0.8.6 we have the following:
0.8.7 Corollary. A compact Hausdorff space is normal.
We shall see in the next section that in a metric space the nets in the last theorem may
be replaced by sequences.
Proof. By 0.8.10(a), f (X) is compact, hence closed and bounded in R. Thus f (X) must
contain its supremum and infimum.
Proof. (a) ⇒ (b): Let (an) be a sequence in X with no cluster point. Then for each x ∈ X
there must exist an open ball B(x) with center x that contains only finitely many terms
of (an). This implies that every finite subfamily of the open cover {B(x) : x ∈ X} of X
contains only finitely many terms of the sequence and so cannot cover X. Therefore, X is
not compact.
(b) ⇒ (c): Let X be sequentially compact. That X is complete follows from 0.3.2.
Suppose X is not totally bounded. Then there exists ε > 0 such that no finite collection
of open balls of radius ε covers X. Choose any a1 ∈ X. Since Bε (a1 ) does not cover
X, there exists a2 ∈ X \ Bε(a1). Since Bε(a1) ∪ Bε(a2) does not cover X, there exists
a3 ∈ X \ (Bε(a1) ∪ Bε(a2)). Continuing in this manner we obtain a sequence (an) in X with
an ∈ X \ (Bε(a1) ∪ Bε(a2) ∪ ··· ∪ Bε(a_{n−1})).
It follows that d(an, am) ≥ ε for all m ≠ n. But then no subsequence of (an) can converge.
Therefore, X must be totally bounded.
(c) ⇒ (a): Assume that X is complete and totally bounded but not compact. Then X has
an open cover U = {Ui : i ∈ I} with no finite subcover. For each k let Fk be a finite set of
points in X such that {B1/k (x) : x ∈ Fk } is a cover of X. Consider the case k = 1. If for
each x ∈ F1 the ball B1 (x) could be covered by finitely many members of U, then X itself
would have a finite cover, contradicting our assumption. Thus there exists x1 ∈ F1 such that
E1 := B1 (x1 ) cannot be covered by finitely many members of U. Since {B1/2 (x) : x ∈ F2 }
covers X, {E1 ∩ B1/2 (x) : x ∈ F2 } covers E1 , so by similar reasoning applied to E1 there
exists x2 ∈ F2 such that E2 := E1 ∩ B1/2 (x2 ) cannot be covered by finitely many members
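of U. In this way we obtain a sequence (xn) in X and decreasing sets
E_n := E_{n−1} ∩ B_{1/n}(x_n) ⊆ B_{1/n}(x_n), n ≥ 2,
none of which can be covered by finitely many members of U. In particular each E_n is nonempty,
from which it follows that (xn) is a Cauchy sequence. Since X is complete, xn → x for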
some x ∈ X. Choose i ∈ I such that x ∈ Ui . Since Ui is open, there exists r > 0 such that
Br (x) ⊆ Ui . Taking n > 2/r so that d(xn , x) < r/2 we then have En ⊆ B1/n (xn ) ⊆ Br (x) ⊆
Ui , contradicting the non-covering property of En . Therefore, X must be compact.
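The selection process used in the step (b) ⇒ (c) above, where each new point is taken outside the ε-balls around the points already chosen, can be run on a finite point set, where it must stop. The Python sketch below is a hedged illustration; the function `greedy_eps_net`, the random sample of the unit square, and ε = 0.2 are assumptions made here.

```python
import math
import random

def greedy_eps_net(points, eps):
    """Pick a_1, a_2, ... with each a_n outside the eps-balls around the
    earlier choices.

    On a finite (hence totally bounded) point set the process stops, and
    every point then lies within eps of some chosen center.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

random.seed(1)
points = [(random.random(), random.random()) for _ in range(2000)]
centers = greedy_eps_net(points, 0.2)
```

The chosen centers are pairwise at distance at least ε, exactly as in the proof, and the ε-balls around them cover the sample. In a space that is not totally bounded the same process would continue indefinitely.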
0.10 Equicontinuity
We have seen that every closed ball in Rd is compact. By contrast, closed balls in the
space C[0, 1] with the supremum norm are not compact, as may be inferred from the fact
that the sequence of functions f_n(x) = x^n has no convergent subsequence in C[0, 1]. The
additional property of equicontinuity is needed to characterize compact subsets of such
spaces.
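The failure of compactness can be seen numerically. For m ≥ 2n, evaluating at the point where x^n = 1/2 gives |x^n − x^m| ≥ 1/4, so no subsequence of (f_n) is Cauchy in the supremum norm. The Python sketch below estimates ‖f_n − f_{2n}‖∞ on a grid; the function name `sup_distance` and the grid size are illustrative assumptions.

```python
def sup_distance(n, m, grid_size=100001):
    """Approximate the sup-norm distance between x**n and x**m on [0, 1]
    by taking the maximum over a uniform grid."""
    xs = (k / (grid_size - 1) for k in range(grid_size))
    return max(abs(x ** n - x ** m) for x in xs)

gaps = [sup_distance(n, 2 * n) for n in (1, 5, 25)]
```

Each entry of `gaps` is close to 1/4, the maximum of t − t² over t ∈ [0, 1].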
Let X be a topological space. A family F of functions in C(X) is said to be equicontin-
uous at a point a ∈ X if, for each ε > 0, there exists a neighborhood N of a such that
|f (x) − f (a)| < ε for all x ∈ N and all f ∈ F. If F is equicontinuous at each point of X,
then F is said to be equicontinuous. The distinguishing feature of equicontinuity is that,
while the neighborhood N may vary with the point a, the same N works for all f ∈ F.
Here is the main result regarding equicontinuity.
0.10.1 Theorem (Arzelà–Ascoli). Let X be a compact Hausdorff space. A subset F of
C(X) is relatively compact in the uniform norm topology iff it is equicontinuous and pointwise
bounded, that is,
sup{|f(x)| : f ∈ F} < ∞ for all x ∈ X.
The next proposition gives a key property of locally compact spaces that underlies the
utility and importance of these spaces.
0.12.2 Proposition. If X is a locally compact Hausdorff space, then for each x ∈ X the
collection of compact neighborhoods of x is a neighborhood base.
Proof. Let N be an open neighborhood of x. We may assume that cl(N ) is compact, otherwise
replace N by the smaller open neighborhood int(M )∩N , where M is a compact neighborhood
of x. By 0.8.6, there exist disjoint open sets U and V with x ∈ U and cl(N ) \ N ⊆ V . If
y ∈ cl(U ∩ N) \ N, then y ∈ V, hence V ∩ (U ∩ N) ≠ ∅, which is impossible. Therefore,
x ∈ U ∩ N ⊆ cl(U ∩ N ) ⊆ N , so cl(U ∩ N ) is the desired compact neighborhood contained
in N .
The following version of 0.7.1 will be needed below.
0.12.3 Proposition. Let X be locally compact and Hausdorff. If K ⊆ U ⊆ X with U
open and K compact, then there exists an open set V with compact closure such that
K ⊆ V ⊆ cl(V ) ⊆ U .
Proof. By 0.12.2, for each x ∈ K there exists an open neighborhood V_x of x with compact
closure contained in U. By compactness of K, there exists a finite set F ⊆ K such that
V := ⋃_{x∈F} V_x ⊇ K. Then cl(V) ⊆ ⋃_{x∈F} cl(V_x), hence cl(V) is compact and ⊆ U.
Baire Spaces
0.12.4 Proposition. Let X be a topological space. The following statements are equivalent:
(a) If Un is open and dense in X for each n, then ⋂_{n=1}^∞ Un is dense in X.
(b) If Cn is closed and ⋃_{n=1}^∞ Cn has an interior point, then some Cn has an interior point.
Proof. The equivalence follows from De Morgan’s laws and the fact that an open set is dense
in X iff its complement has empty interior.
A Baire space is a topological space X with the equivalent properties in the proposition.
For example, a complete metric space is a Baire space (0.3.12). Here is another important
example.
0.12.5 Theorem. A locally compact Hausdorff space X is a Baire space.
Proof. We show that (a) of 0.12.4 holds. Set D := ⋂_{n=1}^∞ Un and let U be any nonempty
open set in X. We show that D ∩ U ≠ ∅. Since U ∩ U1 is open and nonempty, there exists a
nonempty open set V1 such that cl V1 is compact and contained in U ∩ U1 (0.12.3). Since
V1 ∩ U2 is open and nonempty, there exists a nonempty open set V2 such that cl V2 is compact
and contained in V1 ∩ U2 and hence is contained in U ∩ U1 ∩ U2 ∩ V1. Proceeding in this
manner we construct a sequence of nonempty open sets Vn with compact closure contained
in U ∩ U1 ∩ ··· ∩ Un ∩ V_{n−1}. Since the compact sets cl Vn are decreasing, their intersection is
nonempty. Any point in this intersection is a member of D ∩ U.
Thus supp(f ) is the smallest closed set on whose complement f = 0. The collection of all
functions f ∈ C(X) with compact support is denoted by Cc (X):
K(f + g, ε) ⊆ K(f, ε/2) ∪ K(g, ε/2), K(cf, ε) = K(f, ε/|c|) and K(fg, ε) ⊆ K(f, ε/‖g‖∞)
|fn (xα )| = |f (xα ) + fn (xα ) − f (xα )| ≥ |f (xα )| − |fn (xα ) − f (xα )| > ε/2,
hence, xα ∈ K(fn , ε/2). Since K(fn , ε/2) is compact, there exists a subnet (xβ ) that
converges to some x ∈ K(fn , ε/2). Since |f (xα )| ≥ ε for all α and f is continuous, |f (x)| ≥ ε.
Therefore, x ∈ K(f, ε). By 0.8.8, K(f, ε) is compact.
0.12.10 Proposition. If X is locally compact and Hausdorff, then Cc (X) is dense in
C0 (X).
Proof. Let f ∈ C0 (X) and ε > 0. Since K(f, ε) ⊆ U := {x ∈ X : |f (x)| > ε/2}, there exists
a function g : X → [0, 1] in Cc(X) such that g = 1 on K(f, ε) and g = 0 on U^c (0.12.6).
Then fg ∈ Cc(X) and ‖fg − f‖∞ ≤ ε.
K ⊆ X and each α there exists an α_K ⪰ α with x_{α_K} ∉ K. Direct the collection of compact
subsets upward by inclusion. Then (x_{α_K}) is a subnet of (x_α), and for any compact K0,
x_{α_K} ∉ K0 for all K ⊇ K0. Therefore, x_{α_K} → ∞ and so by hypothesis f(x_{α_K}) → 0. But this
is impossible, since |f(x_α)| ≥ ε for all α. This shows that (x_α) must have a cluster point in
K(f, ε) and so K(f, ε) is compact.
We may now prove the following extension property of one-point compactifications:
0.12.12 Proposition. Let X and Y be noncompact, locally compact Hausdorff spaces and
let ϕ : X → Y be continuous such that g ◦ ϕ ∈ C0 (X) for all g ∈ C0 (Y ). Then the extension
ϕ∞ : X∞ → Y∞ of ϕ defined by ϕ∞ (∞) = ∞ is continuous.
Proof. Let xα ∈ X and xα → ∞. Let K ⊆ Y be compact and choose g ∈ Cc (Y ) such that
g = 1 on K. By hypothesis, f := g ◦ ϕ ∈ C0(X), hence f(x_α) → 0. Thus the net (ϕ(x_α)) is
eventually in K^c, hence ϕ(x_α) → ∞. This establishes continuity of ϕ∞ at ∞.
We conclude this subsection with a locally compact version of the Stone-Weierstrass
theorem. It is derived from the compact version via the one-point compactification.
0.12.13 Stone-Weierstrass Theorem. Let X be a locally compact noncompact Hausdorff
topological space and let A be a conjugate closed subalgebra of C0(X) that separates points
of X and with the property that ⋂_{f∈A} {x ∈ X : f(x) = 0} = ∅. Then A is dense in C0(X) in
the uniform norm.
Proof. Identify C0(X) with the closed subspace of C(X∞) consisting of all f with f(∞) = 0.
Let A1 denote the subalgebra of C(X∞) generated by A and the constant function 1. Then
A1 trivially separates points of X∞, hence is dense in C(X∞). Moreover, every member g of
A1 may be written uniquely as g = g0 + g(∞), where g0 ∈ C0(X). Now let f ∈ C0(X) and
ε > 0, and choose g ∈ A1 such that ‖g − f‖∞ < ε. In particular, |g(∞)| < ε, hence setting
g0 = g − g(∞) we have g0 ∈ A and for all x ∈ X
|f(x) − g0(x)| = |f(x) − g(x) + g(∞)| ≤ |g(x) − f(x)| + |g(∞)| < 2ε.
|α| = α1 + · · · + αd .
While this conflicts with the notation for the Euclidean norm on R^d, context will make
clear which notion is being referenced. The partial differential operator of order |α| is
defined by
\[ \partial^\alpha = \partial_x^\alpha = \Bigl(\frac{\partial}{\partial x_1}\Bigr)^{\alpha_1} \cdots \Bigl(\frac{\partial}{\partial x_d}\Bigr)^{\alpha_d}. \]
If α = (0, . . . , 0), then ∂^α is the identity operator. The following spaces of differentiable
functions figure prominently in the study of Fourier analysis and distributions on R^d. (See
Chapters 6 and 15.)
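The multi-index notation is easy to exercise on monomials, where ∂^α acts coordinate by coordinate. The Python sketch below is illustrative only; the helper names `order` and `d_alpha_monomial` and the representation of x^β by its exponent tuple are assumptions introduced here.

```python
from math import factorial

def order(alpha):
    """|alpha| = alpha_1 + ... + alpha_d."""
    return sum(alpha)

def d_alpha_monomial(alpha, beta):
    """Apply the operator with multi-index alpha to the monomial x^beta.

    Returns (coefficient, exponent) with d^alpha x^beta = coefficient * x^exponent,
    or (0, None) if some alpha_i exceeds beta_i (the derivative vanishes).
    """
    if any(a > b for a, b in zip(alpha, beta)):
        return 0, None
    coeff = 1
    for a, b in zip(alpha, beta):
        coeff *= factorial(b) // factorial(b - a)    # b (b-1) ... (b-a+1)
    return coeff, tuple(b - a for a, b in zip(alpha, beta))

# One x-derivative and two y-derivatives applied to x^3 y^2.
result = d_alpha_monomial((1, 2), (3, 2))
```

Here `result` encodes 6x², and the zero multi-index leaves every monomial unchanged, matching the remark that ∂^α is then the identity operator.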
\[ C^k(U) = \{f : \partial^\alpha f \in C(U) \text{ for all } |\alpha| \le k\}, \qquad C^\infty(U) := \bigcap_{k=1}^{\infty} C^k(U), \]
\[ C_c^k(U) = \{f : \partial^\alpha f \in C_c(U) \text{ for all } |\alpha| \le k\}, \qquad C_c^\infty(U) := \bigcap_{k=1}^{\infty} C_c^k(U). \]
By the standard rules of differentiation, these spaces are closed under addition, multiplication,
and scalar multiplication and so are algebras. Moreover, the C^∞ spaces satisfy ∂^α C^∞ ⊆ C^∞
for all α.
\[ \eta_j := (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_j). \]
For j > 1, η_{j−1} − η_j = (1 − ψ_1)(1 − ψ_2) ··· (1 − ψ_{j−1})[1 − (1 − ψ_j)] = φ_j, hence
\[ \sum_{j=1}^{p} \phi_j = \phi_1 + \sum_{j=2}^{p} (\eta_{j-1} - \eta_j) = \phi_1 + \eta_1 - \eta_p = 1 - \eta_p. \]
Since K ⊆ ⋃_j V_j ⊆ ⋃_j K_j and φ_j = 1 on K_j, η_p = 0 on K, hence ∑_{j=1}^p φ_j = 1 on K,
completing the proof.
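The telescoping identity can be checked pointwise with arbitrary values ψ_1, ..., ψ_p in [0, 1]. The Python sketch below assumes the form φ_1 = ψ_1 and φ_j = (1 − ψ_1) ··· (1 − ψ_{j−1}) ψ_j, which is what the computation above yields for j > 1; the function name `telescoping_check` is hypothetical.

```python
import random
from math import prod, isclose

def telescoping_check(psi):
    """Given values psi_1, ..., psi_p in [0, 1] at one point, return the pair
    (sum of phi_j, 1 - eta_p), where phi_1 = psi_1 and
    phi_j = (1 - psi_1)...(1 - psi_{j-1}) psi_j (assumed form)."""
    phis = [prod(1 - s for s in psi[:j]) * psi[j] for j in range(len(psi))]
    eta_p = prod(1 - s for s in psi)
    return sum(phis), 1 - eta_p

random.seed(2)
psi = [random.random() for _ in range(6)]
lhs, rhs = telescoping_check(psi)
```

The two returned numbers agree up to rounding, and they both equal 1 as soon as one of the ψ_j equals 1 at the point in question.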
0.14.2 Lemma. Let a < b. Then there exists a C^∞ function h : R → [0, +∞) such that
h > 0 on (a, b), and h = 0 on (a, b)^c.
Proof. Define h by
\[ h(x) = \begin{cases} \exp\bigl((x - a)^{-1}(x - b)^{-1}\bigr) & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases} \]
Clearly, h^{(m)} = 0 on [a, b]^c for all m ≥ 0. Moreover, if x ∈ (a, b), then h^{(m)}(x) is a sum of
terms of the form
\[ \frac{\pm h(x)}{(x - a)^p (x - b)^q}, \qquad p, q \in \mathbb{Z}^+. \]
Since the exponent (x − a)^{−1}(x − b)^{−1} in h(x) is negative on (a, b), l’Hospital’s rule is
applicable and yields
\[ \lim_{x \to a^+} \frac{h(x)}{(x - a)^p (x - b)^q} = 0, \qquad a < x < b. \]
Therefore, lim_{x→a} h^{(m)}(x) = 0. An induction argument then shows that h^{(m)}(a) = 0 for all
m. A similar argument holds for b. Thus h is C^∞ on R.
0.14.3 Lemma. Let a < b. Then there exists a C ∞ function g : R → R such that 0 ≤ g ≤ 1,
g = 0 on (−∞, a], and g = 1 on [b, +∞).
Proof. Take g(x) := (∫_a^b h)^{−1} ∫_a^x h, where h is the function in 0.14.2.
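The functions of 0.14.2 and 0.14.3 are easy to evaluate numerically. The Python sketch below implements h directly and approximates g by the midpoint rule; the default endpoints a = 0, b = 1, the number of quadrature nodes, and the nested helper `integral` are assumptions made for illustration, and the quadrature gives only approximate values of g.

```python
import math

def h(x, a=0.0, b=1.0):
    """h(x) = exp(1 / ((x - a)(x - b))) on (a, b), and 0 elsewhere."""
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

def g(x, a=0.0, b=1.0, n=2000):
    """Smooth step g(x) = (integral of h over [a, x]) / (integral of h over [a, b]),
    approximated by the midpoint rule with n nodes."""
    def integral(upper):
        upper = min(max(upper, a), b)        # h vanishes outside (a, b)
        width = (upper - a) / n
        return sum(h(a + (k + 0.5) * width, a, b) for k in range(n)) * width
    return integral(x) / integral(b)
```

With these definitions g is 0 to the left of a, 1 to the right of b, and increases in between, as the lemma asserts.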
Proof. For each j, let h_j : R → [0, +∞) be a C^∞ function such that h_j > 0 on (a_j, b_j) and
h_j = 0 on (a_j, b_j)^c. Now set f(x_1, . . . , x_n) := h_1(x_1) ··· h_n(x_n).
We may now prove the following C ∞ version of Urysohn’s lemma:
0.14.5 Theorem. Let K ⊆ U ⊆ R^d, where K is compact and U is open. Then there exists
a C^∞ function ψ : R^d → [0, 1] such that supp(ψ) ⊆ U and ψ = 1 on K.
Proof. For each x ∈ K, let Vx be an open cube with center x and edge 2r:
0.15 Connectedness
A pair of open sets U , V in a topological space X is said to separate X if
X = U ∪ V, U ≠ ∅, V ≠ ∅, and U ∩ V = ∅.
The pair (U, V ) is then called a separation of X. The space X is said to be disconnected if
it has a separation, and connected if no separation exists. A subset E of X is disconnected
(connected) if it is disconnected (connected) as a subspace of X. Thus if E is disconnected,
then there exist sets U , V open in X such that (E ∩ U, E ∩ V ) is a separation of E.
In any topological space, the singletons {x} are trivially connected. In a discrete space
the only connected sets are the singletons. The set Q is not connected in R, since the open
sets (−∞, √2) ∩ Q and (√2, +∞) ∩ Q separate Q.
0.15.1 Theorem. A topological space X is disconnected iff there exists a continuous function
from X onto {0, 1}. Equivalently, X is connected iff every continuous function from X into
{0, 1} is constant.
Proof. Assume that X is disconnected and let (U, V) separate X. The function
\[ g(x) = \begin{cases} 0 & \text{if } x \in U, \\ 1 & \text{if } x \in V. \end{cases} \]
(c) Cx is closed in X.
Proof. Part (a) follows directly from the definition of component. Part (b) follows from (a)
and 0.15.5. Part (c) follows from 0.15.4 and (a) by considering the closure of Cx . For (d) let
U ⊆ X be open and C a component of U . If x ∈ C and r is chosen so that Br (x) ⊆ U , then,
since Br (x) is connected, C ∪ Br (x) is connected (0.15.5), hence Br (x) ⊆ C by (a).
Part I
1.1 Introduction
This chapter begins the development of Lebesgue integration, which constitutes Part
I of the text. The theory may be seen as arising from the need to overcome some of the
shortcomings of the Riemann integral, which is restrictive in both the kind of function that
may be integrated and the space over which the integration takes place. These shortcomings
make the Riemann integral unsuitable for certain applications, for example those involving
random parameters. A further complication with the Riemann theory concerns the integration
of a pointwise limit of a sequence of Riemann integrable functions, such limits sometimes
failing to be Riemann integrable. The removal of these limitations may be seen as a reason
for the wide applicability of the Lebesgue theory.
Nevertheless, the Riemann integral still occupies an important position in analysis. Indeed,
as we shall see, the set of Lebesgue integrable functions on [a, b] is the completion in a precise
sense of the set of Riemann integrable functions, much as the real number system is the
completion of the rational number system.
It is illuminating to compare the construction of the two integrals in terms of how the
domain [a, b] of an integrand f is partitioned. In the case of the Riemann integral, [a, b] is
partitioned into subintervals [x_{i−1}, x_i] and a point x_i^∗ is chosen in each. A suitable limit of
the corresponding Riemann sums Σ_i f(x_i^∗) Δx_i then produces the Riemann integral of f.
By contrast, in the Lebesgue theory it is the range of the function that is partitioned into
subintervals, these inducing, via preimages under f , a partition of [a, b]. This partition will in
general not consist of intervals. However, the Lebesgue theory provides a way of “measuring”
the members of the partition. The Lebesgue integral is then constructed by multiplying
these measured values by (approximate) function values, summing, and taking limits.
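The contrast between the two partitioning schemes can be tried numerically. The following Python sketch is an illustration under stated assumptions, not the construction developed in the text: the names `riemann_sum` and `lebesgue_sum` are hypothetical, and the "measure" of each preimage is crudely estimated by counting equally spaced sample points instead of being computed from a measure.

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum: partition the DOMAIN [a, b] into n subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def lebesgue_sum(f, a, b, levels, samples=20_000):
    """Lebesgue-style sum: partition the RANGE of f into `levels` subintervals.

    The size of each preimage {x : y_j <= f(x) < y_{j+1}} is estimated by the
    share of equally spaced sample points that land in it. This estimate is an
    assumption made for illustration; it stands in for the measure of the set.
    """
    dx = (b - a) / samples
    values = [f(a + (i + 0.5) * dx) for i in range(samples)]
    lo, hi = min(values), max(values)
    if hi == lo:                       # f is constant on the sample points
        return lo * (b - a)
    dy = (hi - lo) / levels
    measure = [0.0] * levels           # estimated size of each preimage
    for v in values:
        j = min(int((v - lo) / dy), levels - 1)
        measure[j] += dx
    # multiply each (lower) function value by the size of its preimage and sum
    return sum((lo + j * dy) * measure[j] for j in range(levels))


if __name__ == "__main__":
    square = lambda x: x * x
    print(riemann_sum(square, 0.0, 1.0, 1000))
    print(lebesgue_sum(square, 0.0, 1.0, 200))
```

Both calls approximate the integral of x² over [0, 1], whose exact value is 1/3. The second sum never refers to subintervals of the domain; it only needs the size of each preimage.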
The preceding discussion suggests (correctly) that a fundamental feature of the Lebesgue
theory is the notion of “measure” of a set. Such measures are constructed by starting with
a collection A of elementary sets, such as intervals in R or rectangles in R², and a set
function that assigns a natural “size” to each member of A, for example length in the case
of intervals and area in the case of rectangles. The collection A is then enlarged to a richer
class of sets that can still be “measured,” the so-called σ-field of measurable sets. Unlike A,
this collection is closed under standard set-theoretic operations, including countable unions
and intersections, a feature eventually resulting in limit theorems of a sort unavailable in
Riemann integration, these theorems underlying much of modern analysis. The first step
then in the construction of the Lebesgue integral is to develop the notion of measurable set
and measure, which is the goal of this chapter.
Note that (a) and (b) imply that ∅ ∈ F. An induction argument using (c) shows that a field
F is closed under finite unions, that is,
A1 , . . . , An ∈ F ⇒ A1 ∪ · · · ∪ An ∈ F.
Of course, every field with only finitely many members is a σ-field, since in this case countable
unions reduce to finite unions. De Morgan’s law
\[ (A_1 \cap A_2 \cap \cdots \cap A_n)^c = A_1^c \cup A_2^c \cup \cdots \cup A_n^c \]
together with (b) shows that a field is closed under finite intersections and thus, for example,
under the operation of symmetric difference defined by
A △ B := (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A).
Furthermore, every finite union of members of a field may be expressed as a disjoint union
of members of the field via the construction
\[ \bigcup_{k=1}^{n} A_k = A_1 \cup (A_2 \cap A_1^c) \cup \cdots \cup (A_n \cap A_1^c \cap \cdots \cap A_{n-1}^c). \tag{1.1} \]
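The construction (1.1) is easy to carry out for finite sets. The following Python sketch is a minimal illustration; the function name `disjointify` is hypothetical, and Python `set` objects stand in for members of the field.

```python
def disjointify(sets):
    """Return B_1, ..., B_n with B_k = A_k minus (A_1 ∪ ... ∪ A_{k-1}).

    The pieces are pairwise disjoint and have the same union as the input.
    """
    seen = set()                 # running union A_1 ∪ ... ∪ A_{k-1}
    pieces = []
    for A in sets:
        A = set(A)
        pieces.append(A - seen)  # A_k ∩ A_1^c ∩ ... ∩ A_{k-1}^c
        seen |= A
    return pieces


if __name__ == "__main__":
    print(disjointify([{1, 2}, {2, 3}, {1, 3, 4}]))
```

Each piece is built from the original sets using only complements and intersections, which is why it stays inside the field.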
Similar remarks apply to σ-fields: Part (d) of the above definition asserts that a σ-field is
closed under countable unions, and an application of De Morgan’s law shows that a σ-field
is closed under countable intersections as well. As a consequence, a σ-field F is closed under
the operations of limit infimum and limit supremum defined, respectively, by
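\[ \underline{\lim}_n A_n := \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \quad\text{and}\quad \overline{\lim}_n A_n := \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k. \]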
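A point lies in the limit infimum iff it belongs to all but finitely many A_n, and in the limit supremum iff it belongs to infinitely many A_n. For an eventually periodic sequence both sets can be computed from finitely many terms. The following Python sketch assumes the sequence is given as a finite `prefix` followed by a nonempty `cycle` repeated forever; the function name `lim_inf_sup` and this encoding are hypothetical.

```python
def lim_inf_sup(prefix, cycle):
    """Limit infimum and limit supremum of prefix + cycle + cycle + ...

    `cycle` must be nonempty. Two copies of the cycle suffice: every tail that
    starts before the second copy already contains one full period.
    """
    P, L = len(prefix), len(cycle)
    seq = [set(A) for A in prefix] + [set(A) for A in cycle] * 2
    tails = [seq[n:] for n in range(P + L)]
    # union over n of the intersection of the n-th tail
    lim_inf = set().union(*(set.intersection(*t) for t in tails))
    # intersection over n of the union of the n-th tail
    lim_sup = set.intersection(*(set().union(*t) for t in tails))
    return lim_inf, lim_sup


if __name__ == "__main__":
    print(lim_inf_sup([{1, 2, 3}], [{1, 2}, {2, 4}]))
```

In the usage example the element 3 occurs only in the first set, so it belongs to neither limit, while 2 occurs in every set and belongs to both.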
1.2.1 Examples.
(a) The power set P(X) is obviously a σ-field, as is the collection {∅, X}. A field clearly
cannot have exactly three members. All fields with exactly four members are of the form
{∅, X, A, Ac }.
(b) A subset A of X is said to be cofinite if Ac is finite. The collection F of all sets that
are either finite or cofinite is a field. If X is infinite, then F is not a σ-field (Ex. 1.9).
(c) A subset A of X is said to be cocountable if Ac is countable. The collection F of all
sets that are either countable or cocountable is a σ-field. For example, to see that F is closed
under countable unions A = ⋃_{n=1}^∞ An, note that if each An is countable, then A is countable
and if some An is cocountable then A is cocountable. In either case, A ∈ F.
(d) If F is a field (σ-field) on X, then the trace
F ∩ E = {A ∩ E : A ∈ F}
is a field (σ-field) on E. For example, if A, B ∈ F, then the relations
(A ∪ B) ∩ E = (A ∩ E) ∪ (B ∩ E) and (A \ B) ∩ E = (A ∩ E) \ (B ∩ E)
show that F ∩ E is closed under unions and differences. Note that F ∩ E ⊆ F iff E ∈ F, in which case F ∩ E is simply
the collection of all sets A ∈ F with A ⊆ E. ♦
hence F is closed under complements as well. Since A ⊆ F ⊆ σ(A), the minimality property
implies that σ(A) = F. The analogous assertions hold for finite partitions of X. ♦
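On a finite set every field is a σ-field, so a generated σ-field can be found by brute force: close the generators under complements and pairwise unions until nothing new appears. The following Python sketch does this; the name `generated_field` is hypothetical, and the approach is practical only for small finite X.

```python
from itertools import combinations


def generated_field(X, generators):
    """Smallest family containing the generators, the empty set and X that is
    closed under complements and finite unions (X finite)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(G) for G in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:                      # close under complements
            if X - A not in family:
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):  # close under pairwise unions
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family


if __name__ == "__main__":
    fam = generated_field({1, 2, 3, 4}, [{1}, {1, 2}])
    print(sorted(sorted(A) for A in fam))
```

In the usage example the resulting family consists of all unions of the sets {1}, {2}, {3, 4}, which form a partition of X, in line with the description of a σ-field generated by a finite partition.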
Borel Sets
Let X be a topological space. The σ-field generated by the collection of all open subsets
of X is called the Borel σ-field on X and is denoted by B(X). A member of B(X) is
called a Borel set. The minimality property of B(X) takes the following form:
If a σ-field F contains all open sets, then it contains all Borel sets.
Borel σ-fields provide a bridge between topology and measure theory, allowing, for example,
the entry of continuous functions into integration theory.
Since closed sets are complements of open sets, B(X) is also generated by the collection
of closed sets. For Euclidean space Rd , more can be said:
Proof. For ease of notation we prove the proposition for d = 1; the proof for the general
case is entirely similar.
(a) Let O denote the collection of all open sets in R. Since OI ⊆ O, by minimality we have
σ(OI ) ⊆ σ(O) = B(R). On the other hand, every member of O is a countable union of sets
in OI , hence O ⊆ σ(OI ) and so B(R) ⊆ σ(OI ).
(b) Let C denote the collection of all closed sets in R. As in part (a), σ(CI) ⊆ σ(C) = B(R).
Moreover, every bounded open interval (a, b) may be expressed as ⋃_n [a + 1/n, b − 1/n],
hence OI ⊆ σ(CI). By part (a) and minimality, B(R) = σ(OI) ⊆ σ(CI).
(c) From the representations (a, b) = ⋃_n (a, b − 1/n] and (c, d] = ⋂_n (c, d + 1/n), we see
that OI ⊆ σ(HI ) and HI ⊆ σ(OI ). By minimality, σ(OI ) ⊆ σ(HI ) and σ(HI ) ⊆ σ(OI ).
An application of (a) completes the argument.
The collection HI will figure prominently in the development of the Lebesgue integral on
Euclidean space Rd .
The collection of all such sets, together with the Borel subsets of R, is called the extended
Borel σ-field and is denoted by B(R̄). One easily checks that B(R̄) is indeed a σ-field with
trace B(R) on R. It may be shown that R̄ has a natural topology whose open sets generate
B(R̄) (Exercise 2.30).
\[ \mathcal{A}_1 \times \cdots \times \mathcal{A}_d = \{A_1 \times \cdots \times A_d : A_j \in \mathcal{A}_j,\ j = 1, \ldots, d\}. \]
F1 ⊗ · · · ⊗ Fd := σ(F1 × · · · × Fd ).
In particular,
B(Rd ) = B(R) ⊗ · · · ⊗ B(R) (d factors).
Proof. By definition, B(Rdj ) = σ(Oj ) and B(Rd ) = σ(O), where Oj is the collection of all
open subsets of Rdj and O is the collection of all open subsets of Rd . By the theorem,
the desired equality (1.4) will then follow by minimality. The first inclusion in (†) follows
from the definition of the product topology of Rd1 × · · · × Rdk (the latter identified with
Rd ). For the second inclusion, recall that each U ∈ O is a countable union of open intervals
I = (a1 , b1 ) × · · · × (ad , bd ). Since each such interval may be written as Id1 × · · · × Idk ,
where Idj is a dj -dimensional open interval, U ∈ B(Rd1 ) ⊗ · · · ⊗ B(Rdk ). Therefore, (†) holds,
completing the proof.
Note that (a) and (b) imply that a λ-system is closed under complements and contains
the empty set. The importance of λ-systems is that they provide an indirect method for
establishing various properties of certain collections of sets. (See, for example, 1.6.8.) The
method is based on Dynkin’s π-λ theorem, which makes a connection between π-systems,
λ-systems, and σ-fields.
Proof. Let ℓ(P) denote the intersection of all λ-systems containing P. Then ℓ(P) is a
λ-system, as is easily verified, and ℓ(P) ⊆ σ(P). If we show that ℓ(P) is a σ-field, it will then
follow by minimality that σ(P) = ℓ(P) ⊆ L, establishing the theorem.
To show that ℓ(P) is closed under finite intersections, let A ∈ ℓ(P) and define
LA := {B ∈ ℓ(P) : A ∩ B ∈ ℓ(P)}.
Exercises
1.1 Let A, B, C, An , Bn ⊆ X. Verify the following:
Show that the inclusions in (c), (f), and (i) may be strict.
1.5 Let {an } be a sequence in R and set An = (−∞, an ) and Bn = (an , ∞). Prove:
(a) x ∈ lim inf_n An ⇒ x ≤ lim inf_n an. (b) x < lim inf_n an ⇒ x ∈ lim inf_n An.
(c) x ∈ lim sup_n An ⇒ x ≤ lim sup_n an. (d) x < lim sup_n an ⇒ x ∈ lim sup_n An.
(e) x ∈ limn Bn ⇒ limn an ≤ x.
1.6 Determine all sets in the field on X = {1, 2, 3, 4, 5, 6} generated by the sets
(a) {1, 2}, {2, 3}, {3, 4}, {4, 5}. (b) {1, 2, 3}, {2, 3, 4}, {3, 4, 5}.
(c) {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}.
1.7 Let F be a σ-field on X and E ⊆ X. Show that σ(F ∪ {E}) consists of all sets of the form
(A ∩ E) ∪ (B ∩ E^c), A, B ∈ F.
1.8 Let F ⊆ P(X) such that X ∈ F and A \ B ∈ F whenever A, B ∈ F. Show that F is a field.
1.9 Show that if X is infinite, then the field consisting of all finite or cofinite sets is not a σ-field.
1.11 Find examples of fields F and G on X = {1, 2, 3} such that F ∪ G is not a field.
1.12 Describe the σ-field F on (0, 1) generated by all singletons {x}, x ∈ (0, 1). Show that F is
contained in B(0, 1) and contains no proper open subinterval of (0, 1).
1.13 Let F be the collection of all finite disjoint unions of intervals [a, b) ⊆ [0, 1). Show that F is a
field on [0, 1) but not a σ-field.
1.15 Let Ff denote the field consisting of the subsets of X that are either finite or cofinite. Show
that σ(Ff ) is the σ-field Fc consisting of the countable or cocountable subsets of X.
1.19 Let X be a topological space and let E ⊆ X have the relative topology. Prove that B(X) ∩ E =
B(E).
1.20 [↓ 2.30] Let a, b ∈ R and let [a, b] and (a, b) have the relative topology from R. Show that
B([a, b]) consists of the sets B, B ∪ {a}, B ∪ {b}, and B ∪ {a, b} where B ∈ B((a, b)).
1.23 Let A ⊆ P(X) and let F be the union of all σ-fields σ(C), where C is a countable subfamily of
A. Prove that F = σ(A).
1.24 Let F = {B1 , . . . , Bm } be a finite field on X. Show that there exists a finite partition A of X
by sets in F such that every member of F is a union of members of A. ⟦Consider C1 ∩ · · · ∩ Cm,
where Cj = Bj or Bj^c.⟧
1.25 Show that every infinite σ-field F has an infinite sequence of disjoint nonempty sets. Conclude
that F has cardinality at least that of the continuum. Conclude that no σ-field can have
cardinality ℵ0 . Find a field that has cardinality ℵ0 .
1.26 A nonempty collection M of subsets of X is a monotone class if for any sequence {An } in
M, An ↑ A or An ↓ A ⇒ A ∈ M. Carry out steps (a)–(f) below to prove the monotone class
theorem, due to Halmos: If F is a field, M is a monotone class, and F ⊆ M, then σ(F) ⊆ M.
(a) Show that a monotone class that is closed under finite unions (intersections) is closed under
countable unions (intersections).
(b) Let m(F) denote the intersection of all monotone classes containing F. Show that m(F) is a
monotone class.
(c) Show that A := {A ∈ m(F) : Ac ∈ m(F)} is monotone and m(F) = A. Conclude that m(F)
is closed under complements.
(d) Let B = {B ∈ m(F) : A ∪ B ∈ m(F) for all A ∈ F}. Show that B is a monotone class and
B = m(F). Conclude that A ∪ B ∈ m(F) for all B ∈ m(F) and all A ∈ F.
(e) Let C = {C ∈ m(F) : C ∪ B ∈ m(F) for all B ∈ m(F)}. Show that C is monotone and
C = m(F). Conclude that m(F) is closed under finite unions.
(f) Show that m(F) is closed under countable unions. Conclude that σ(F) ⊆ m(F) ⊆ M.
1.3 Measures
Set Functions
Let X be a nonempty set. A collection of subsets of X containing the empty set is
called a paving of X. A function µ on a paving A of X that takes values in R̄ is called a
set function on A. Until Chapter 5, we consider only nonnegative set functions, that
is, those taking values in [0, ∞]. An important example is the function that assigns the
length b − a to intervals [a, b]. This set function and its d-dimensional generalization will be
examined in detail in §1.7.
Let µ be a nonnegative set function on a paving A and let A1 , A2 , . . . ∈ A. Then µ is
said to be
(b) (Continuity at A from above). An ↓ A and µ(A1 ) < ∞ implies µ(An ) ↓ µ(A).
Proof. For (a), let {An} be a sequence of disjoint sets in F with union A ∈ F and set
Bn := ⋃_{k=1}^n Ak. Then Bn ∈ F and Bn ↑ A. By finite additivity and continuity from below,
\[ \sum_{k=1}^{\infty} \mu(A_k) = \lim_n \sum_{k=1}^{n} \mu(A_k) = \lim_n \mu(B_n) = \mu(A). \]
(c) Let X be uncountable and F the σ-field of countable or cocountable subsets of X (see
1.2.1(c)). Define µ(A) = 0 if A is countable and µ(A) = 1 if A is cocountable. Then µ is a
probability measure on F.
(d) Dirac measure. Let (X, F) be a measurable space. For x ∈ X and A ∈ F define
δx (A) = 1A (x). Then δx is a probability measure on F.
(e) If µj are measures on a σ-field F and aj ≥ 0, then Σ_{j=1}^n aj µj is a measure on F. In
particular, a nonnegative linear combination of Dirac measures is a measure.
(f) If (X, F, µ) is a measure space and E ∈ F, then µE (A) := µ(A ∩ E) defines a measure
on F. Note that µE agrees with µ on the trace F ∩ E.
(g) Counting measure. Let X be a nonempty set. For A ⊆ X let µ(A) be the number of
elements in A if A is finite and µ(A) = ∞ otherwise. Then µ is clearly finitely additive on
P(X). To show that µ is a measure, let An ↑ A. If there exists an m such that Am = A,
then An = A for all n ≥ m and so, trivially, µ(An ) ↑ µ(A). On the other hand, if no such
m exists, then A must be infinite and A_{n_{k−1}} ⊊ A_{n_k} for some sequence of indices. Since
µ(A_{n_k}) ≥ µ(A_{n_{k−1}}) + 1,
where the sum may be infinite. (By convention, the sum over the empty set is zero.) The
rearrangement theorem for nonnegative series implies that µ is well-defined and finitely
additive. Let An ↑ A. If A is finite, then eventually An = A, so obviously µ(An) ↑ µ(A). If A
is infinite, then µ(A) may be written as an infinite series µ(A) = Σ_{k=1}^∞ p_{n_k}. Let r < µ(A),
choose k such that Σ_{i=1}^k p_{n_i} > r, and choose m so that Am contains the indices n1, . . . , nk.
Then µ(An ) ≥ µ(Am ) > r for all n ≥ m. Since r was arbitrary, µ(An ) → µ(A). By 1.3.2, µ
is a measure on P(N). Note that if pk ≡ 1, then µ is simply counting measure on N. ♦
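The discrete measures in these examples are simple to model on finite sets. The following Python sketch is illustrative only: the names `dirac`, `counting`, and `weighted` are hypothetical, sets are finite Python sets, and the weights p_n are stored in a dictionary.

```python
import math


def dirac(x):
    """Dirac measure at x: delta_x(A) = 1 if x is in A, else 0."""
    return lambda A: 1 if x in A else 0


def counting(A):
    """Counting measure of a finite set A (infinite sets are not modeled)."""
    return len(A)


def weighted(p):
    """mu(A) = sum of p[n] over n in A, for nonnegative weights p (a dict)."""
    return lambda A: math.fsum(p.get(n, 0.0) for n in A)


if __name__ == "__main__":
    mu = weighted({1: 0.5, 2: 0.25, 3: 0.25})
    A, B = {1}, {2, 3}                      # disjoint sets
    print(mu(A | B), mu(A) + mu(B))         # finite additivity
    ones = weighted({n: 1.0 for n in range(10)})
    print(ones({2, 3, 5}), counting({2, 3, 5}))
```

With all weights equal to 1 the weighted measure agrees with counting measure, as noted at the end of the example above.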
Exercises
1.27 Let A ⊆ P(X) and ∅ ∈ A. Show that if µ is a countably additive, finite set function on A, then
µ(∅) = 0.
1.28 Verify that the set functions defined in 1.3.3 (c) and (d) are measures.
1.30 [↑ 1.2.1] Let F be the field of finite or cofinite subsets of X and define µ(A) = 0 if A is finite and
µ(A) = 1 if A is cofinite. (a) Show that µ is finitely additive but in general is not countably
additive. (b) Show that µ is countably additive if X is uncountable.
1.31 Let µ be a finitely additive, nonnegative set function on a field F. Prove that if µ(A) and µ(B)
are finite, then |µ(A) − µ(B)| ≤ µ(A △ B).
1.32 (Inclusion-exclusion I). Let µ be a finitely additive nonnegative set function on a field F. Prove
that µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B).
1.33 Let µ be a finitely additive, nonnegative set function on a field F and let A, B ∈ F with
µ(B) = 0. Show that µ(A ∪ B) = µ(A \ B) = µ(A).
1.34 (Inclusion-exclusion II). Let µ be a finitely additive, nonnegative set function on a field F and
let A1 , . . . , An ∈ F with union A such that µ(A) < ∞. Prove that for n ≥ 2
\[ \mu(A) = \sum_{i=1}^{n} \mu(A_i) - \sum_{1 \le i < j \le n} \mu(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} \mu(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} \mu(A_1 \cap \cdots \cap A_n). \]
1.35 (Inclusion-exclusion III). Let µ be a finitely additive, nonnegative set function on a field F with
µ(X) < ∞ and let B1 , . . . , Bn ∈ F with intersection B. Prove that for n ≥ 2,
\[ \mu(B) = \sum_{i=1}^{n} \mu(B_i) - \sum_{1 \le i < j \le n} \mu(B_i \cup B_j) + \sum_{1 \le i < j < k \le n} \mu(B_i \cup B_j \cup B_k) - \cdots + (-1)^{n-1} \mu(B_1 \cup \cdots \cup B_n). \]
1.36 Let (X, F, µ) be a measure space and let An ∈ F such that µ(Am ∩ An) = 0 for m ≠ n. Prove
that µ(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ µ(An).
1.38 Let (X, F) be a measurable space and let x1 , x2 ∈ X. For A ∈ P(X), define µ(A) = 1 if
{x1 , x2 } ⊆ A and µ(A) = 0 otherwise. Prove that µ is continuous from below. Is µ a measure?
1.40 [↓ Ex. 3.3] Let µn be a sequence of measures on a σ-field F on X such that µn (A) ≤ µn+1 (A)
for all A ∈ F. Define the set function µ on F by µ(A) = limn µn (A). Prove that µ is a measure.
1.42 Let (X, F, µ) be a finite measure space. Show that there can be at most countably many pairwise
disjoint sets of positive measure.
1.43 Let (X, F, µ) be a σ-finite measure space and E a collection of pairwise disjoint members of F.
Show that for any A ∈ F, µ(A ∩ E) > 0 for at most countably many members of E.
Show that µ0 is a measure on F. Show also that µ0 = µ iff the following condition holds:
For each A ∈ F with µ(A) = ∞ there exists B ∈ F such that B ⊆ A and 0 < µ(B) < ∞.
1.45 Let (X, F, µ) be a measure space and {Ek} be a sequence in F. For fixed m ∈ N, let A denote
the set of all x such that x ∈ Ek for exactly m values of k; B the set of all x such that x ∈ Ek
for finitely many and at least m values of k; and C the set of all x such that x ∈ Ek for at most
m values of k. Prove that A, B, C ∈ F. If s(D) := Σ_{k=1}^∞ µ(D ∩ Ek), prove that
Completion Theorem
Here is the general technique for completing a measure space. Part (a) of the theorem
gives the construction and part (b) describes a minimality property of a completion.
1.4.2 Theorem. Let (X, F, µ) be a measure space. Define
Fµ := {A ∪ N : A ∈ F, N ⊆ M ∈ F, µ(M) = 0} and µ̄(A ∪ N) := µ(A). (1.6)
(a) Fµ is a σ-field containing F and µ̄ is a measure on Fµ that extends µ such that
(X, Fµ, µ̄) is complete.
(b) If (X, G, ν) is a complete measure space such that F ⊆ G and ν is an extension of µ,
then Fµ ⊆ G and the restriction of ν to Fµ is µ̄.
Proof. (a) To see that µ̄ is well-defined, let A1 ∪ N1 = A2 ∪ N2, where Nj ⊆ Mj, Aj, Mj ∈ F
and µ(Mj) = 0. Then A1 ⊆ A2 ∪ M2 and A2 ⊆ A1 ∪ M1, hence µ(A2) = µ(A1).
Clearly, F ⊆ Fµ . To see that Fµ is closed under complements note that in the notation
of (1.6)
(A ∪ N )c = (Ac ∩ M c ) ∪ (Ac ∩ N c ∩ M ), Ac ∩ M c ∈ F and Ac ∩ N c ∩ M ⊆ M.
For closure under countable unions, let Bn := An ∪ Nn ∈ Fµ and B := ⋃_n Bn, where
Nn ⊆ Mn, An, Mn ∈ F, and µ(Mn) = 0. Then
\[ B = A \cup N, \quad\text{where}\quad A := \bigcup_{n=1}^{\infty} A_n \quad\text{and}\quad N := \bigcup_{n=1}^{\infty} N_n \subseteq M := \bigcup_{n=1}^{\infty} M_n. \]
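For a finite measure space the completion in (1.6) can be computed by enumeration: collect every subset N of a measure-zero set M ∈ F, form all unions A ∪ N, and assign each the value µ(A). The following Python sketch does this; the names `powerset`, `measure_from_atoms`, and `completion` are hypothetical, and the σ-field is assumed to be given by a finite partition of X into atoms.

```python
from itertools import combinations


def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def measure_from_atoms(atoms):
    """atoms: dict mapping disjoint frozensets to their measure.
    Returns a dict defined on every union of atoms (the generated sigma-field)."""
    mu = {}
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            mu[frozenset().union(*combo)] = sum(atoms[a] for a in combo)
    return mu


def completion(mu):
    """mu: dict sigma-field member -> measure. Returns the completed measure
    as a dict defined on all sets A ∪ N with N a subset of a null set."""
    null_subsets = set()
    for M, value in mu.items():
        if value == 0:
            null_subsets.update(powerset(M))
    completed = {}
    for A, value in mu.items():
        for N in null_subsets:
            # the value must not depend on the representation A ∪ N
            assert completed.setdefault(A | N, value) == value
    return completed


if __name__ == "__main__":
    mu = measure_from_atoms({frozenset({1}): 1, frozenset({2, 3}): 0,
                             frozenset({4}): 2})
    bar_mu = completion(mu)
    print(len(mu), len(bar_mu))
    print(bar_mu[frozenset({2})], bar_mu[frozenset({1, 3})])
```

In the usage example the atom {2, 3} has measure zero, so its subsets {2} and {3} become measurable in the completion even though they do not belong to the original σ-field. The internal assertion checks the well-definedness established in the first step of the proof.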
Null Sets
The sets N in the completion theorem, namely the subsets of F-measurable sets M
with measure zero, are called µ-null sets. Such sets appear throughout measure theory,
frequently in the following context:
A property P (x) of points x ∈ X is said to hold µ-almost everywhere, abbreviated
µ-a.e., if the set of all x for which P (x) is false is a µ-null set, that is,
µ {x ∈ X : P (x) is false} = 0.
In this case we also say that the property P (x) holds for µ-almost all x, abbreviated µ-a.a.
x. If the measure is clear from context we drop the qualifier µ and simply write a.e. or a.a.
For example, if a function f in 1.4.1 is defined by f (j) = j, then f = 1 a.e. For an example
with far reaching implications, consider functions fn , f : X → C. The notation fn → f a.e.
then means that
µ{x ∈ X : lim_n fn(x) ≠ f(x)} = 0.
This type of convergence will be examined in Chapter 2.
Exercises
1.46 [↑ 1.3.3(d).] Let (X, F) be a measurable space, E a finite subset of X, and µ := Σ_{x∈E} δx.
Describe the completion of (X, F, µ).
1.47 Show that if G ⊆ F are σ-fields, µ is a measure on F, and ν = µ|G, then Gν ⊆ Fµ and
ν̄ = µ̄|Gν.
1.50 Let ν and η be measures on a σ-field F and set µ := ν + η. Show that Fµ ⊆ Fν ∩ Fη and
µ̄ = ν̄ + η̄ on Fµ.
1.51 [↑ 1.3.3(f)] Let E ∈ F. Prove that FµE ∩ E = Fµ ∩ E and µE = µE on FµE .
1.52 Let (X, F, µ) be a finite measure space. For E ⊆ X define
µ_∗(E) = sup{µ(A) : A ∈ F, A ⊆ E} and µ^∗(E) = inf{µ(B) : B ∈ F, B ⊇ E}.
Show that Fµ = {E ⊆ X : µ_∗(E) = µ^∗(E)}.
Carathéodory’s Theorem
Let µ∗ be any outer measure on X. A subset E of X is said to be µ∗-measurable if
µ∗(C) = µ∗(C ∩ E) + µ∗(C ∩ E^c) for all C ⊆ X. (1.8)
The definition asserts that E “splits” the outer measure of each subset C of X, a property
that may be seen as a precursor to finite additivity. Note that by subadditivity the inequality
≤ in (1.8) always holds. Thus the measurability criterion singles out precisely those sets E
for which the inequality ≥ in (1.8) is satisfied. The collection of all µ∗ -measurable subsets of
X is denoted by M(µ∗ ). Here is the main result regarding outer measure.
µ∗(C ∩ E) + µ∗(C ∩ E^c) ≤ µ∗(E) + µ∗(C ∩ E^c) = µ∗(C ∩ E^c) ≤ µ∗(C),
The verifications of (a) and (b) are carried out in the following steps. For convenience, call a
set C for which the equality in (1.8) holds a test set for E.
µ∗(C) = µ∗(C ∩ E) + µ∗(C ∩ E^c) and
µ∗(C ∩ E^c) = µ∗(C ∩ E^c ∩ F) + µ∗(C ∩ E^c ∩ F^c).
µ∗(C) = µ∗(C ∩ E) + µ∗(C ∩ E^c ∩ F) + µ∗(C ∩ E^c ∩ F^c)
≥ µ∗((C ∩ E) ∪ (C ∩ E^c ∩ F)) + µ∗(C ∩ E^c ∩ F^c) (by subadditivity)
= µ∗(C ∩ (E ∪ F)) + µ∗(C ∩ (E ∪ F)^c).
Therefore, E ∪ F ∈ M.⟧
(2) C ⊆ X, E, F ∈ M and E ∩ F = ∅ ⇒ µ∗(C ∩ (E ∪ F)) = µ∗(C ∩ E) + µ∗(C ∩ F).
⟦Using C ∩ (E ∪ F) as a test set for E we have
µ∗(C ∩ (E ∪ F)) = µ∗(C ∩ (E ∪ F) ∩ E) + µ∗(C ∩ (E ∪ F) ∩ E^c)
= µ∗(C ∩ E) + µ∗(C ∩ F).⟧
(3) If the sets En are disjoint, then F := ⋃_{n=1}^∞ En ∈ M and µ(F) = Σ_{n=1}^∞ µ(En).
⟦Let Fn := ⋃_{k=1}^n Ek and C ⊆ X. By steps (1) and (2) and induction, Fn ∈ M and
µ∗(C ∩ Fn) = Σ_{k=1}^n µ∗(C ∩ Ek). Therefore, by monotonicity,
µ∗(C) = µ∗(C ∩ Fn) + µ∗(C ∩ Fn^c) ≥ Σ_{k=1}^n µ∗(C ∩ Ek) + µ∗(C ∩ F^c)
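On a small finite set the measurability criterion, µ∗(C) = µ∗(C ∩ E) + µ∗(C ∩ E^c) for every test set C ⊆ X, can be checked by brute force. The following Python sketch makes two assumptions that are labeled here because the relevant definition is not reproduced above: the outer measure generated by a paving is taken to be the infimum of the total size of covers by paving members (on a finite paving only finite subfamilies need to be examined), and the names `powerset`, `outer_measure`, and `measurable_sets` are hypothetical.

```python
from itertools import combinations


def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer_measure(paving, E):
    """Infimum of sum(mu(A)) over subfamilies of the paving that cover E.

    paving: dict frozenset -> nonnegative value (assumed cover-based definition).
    Returns infinity if no subfamily covers E.
    """
    best = float("inf")
    members = list(paving)
    for r in range(len(members) + 1):
        for cover in combinations(members, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(paving[A] for A in cover))
    return best


def measurable_sets(X, paving):
    """All E with mu*(C) = mu*(C ∩ E) + mu*(C ∩ E^c) for every C ⊆ X."""
    subsets = powerset(frozenset(X))
    mu_star = {E: outer_measure(paving, E) for E in subsets}
    return {E for E in subsets
            if all(mu_star[C] == mu_star[C & E] + mu_star[C - E]
                   for C in subsets)}


if __name__ == "__main__":
    paving = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2}
    for E in sorted(measurable_sets({0, 1, 2, 3}, paving), key=sorted):
        print(sorted(E))
```

In the usage example the set {0} fails the criterion with test set C = {0, 1}, since µ∗({0, 1}) = 1 while µ∗({0}) + µ∗({1}) = 2. Integer weights are used so that the equality test is exact.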
Exercises
1.53 Define an outer measure µ∗ on P(X) by µ∗(∅) = 0 and µ∗(E) = 1 if E ≠ ∅. Find M(µ∗).
1.54 Let OI denote the collection of all bounded open subintervals of R and let µ := δ0 be the Dirac
measure at 0 on OI . Show that the outer measure µ∗ generated by (OI , µ) is the Dirac measure
at 0 on P(R). Find M(µ∗ ).
1.55 Let X be an uncountable set and define µ∗ (E) = 0 if E = ∅ and µ∗ (E) = 1 otherwise. Show
that µ∗ (E) = 0 or 1 according as E is countable or uncountable. Show also that M(µ∗ ) is the
σ-field of sets that are countable or cocountable.
1.56 [↑ 1.3.3(f)] Let µ be a monotone set function on a field F. For E ∈ F, let µE denote the set
function on F defined by µE (A) = µ(E ∩ A) and let (µE )∗ be the outer measure generated by
(F, µE ). Prove that (µ∗ )E = (µE )∗ .
1.57 [↓ 1.8.1.] Let A and B be pavings of X such that each contains a sequence with union X. Let
µ be a measure on A ∪ B and let µ_a^∗ and µ_b^∗ be the outer measures generated by (A, µ) and
(B, µ), respectively. Suppose that
µ_a^∗(E) = µ_b^∗(E) = µ(E) ∀ E ∈ A ∪ B. (†)
Prove that µ_a^∗ = µ_b^∗. Show that the assertion fails if the condition in (†) is not assumed.
1.58 Let µ∗ be an outer measure on X, E ⊆ X, and A ∈ M(µ∗) with E ∩ A = ∅. Show that
µ∗(E ∪ A) = µ∗(E) + µ(A).
1.59 Let µ∗ be an outer measure on X, E ⊆ X, and A, B ∈ M(µ∗ ) with A ∩ B = ∅. Show that
µ∗(E ∩ (A ∪ B)) = µ∗(E ∩ A) + µ∗(E ∩ B). Show that the conclusion holds for countable disjoint
unions as well.
1.60 Let µ be a nonnegative set function on a paving A of X with µ(∅) = 0, and let µ∗ be the outer
measure generated by (A, µ). Prove that E ∈ M(µ∗ ) for any E ⊆ X satisfying
µ∗ (A) = µ∗ (A ∩ E) + µ∗ (A ∩ E c ) for all A ∈ A.
1.6.1 Lemma. The set Au of all finite disjoint unions of members of A is a ring.
Proof. Let A, B ∈ Au , say
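\[ A = \bigcup_{j=1}^{m} A_j,\ A_j \in \mathcal{A}, \quad\text{and}\quad B = \bigcup_{k=1}^{n} B_k,\ B_k \in \mathcal{A} \quad \text{(disjoint unions).} \]
To see that A \ B ∈ Au, for each j and k choose finitely many disjoint sets Cijk ∈ A such
that Aj \ Bk = ⋃_i Cijk. Then Aj \ Bk ∈ Au and
\[ A \setminus B = \bigcup_{j=1}^{m} A_j \cap B^c = \bigcup_{j=1}^{m} \bigcap_{k=1}^{n} (A_j \setminus B_k) = \bigcup_{j=1}^{m} \bigcap_{k=1}^{n} \bigcup_{i} C_{ijk}. \]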
Summing, we obtain
\[ \sum_{j=1}^{m} \mu(A_j) = \sum_{j=1}^{m} \sum_{k=1}^{n} \mu(A_j \cap B_k) = \sum_{k=1}^{n} \mu(B_k). \]
By definition of µu ,
\[ \mu_u(E) = \sum_{i=1}^{m} \mu(A_i) \quad\text{and}\quad \mu_u(E_k) = \sum_{i=1}^{m} \sum_{j=1}^{m_k} \mu(A_i \cap B_{k,j}). \tag{$\alpha$} \]
1.6.3 Lemma. The outer measures generated by (A, µ) and (Au , µu ) are the same.
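Proof. Let E ⊆ X. Typical sums in the definitions of µ∗(E) and µu∗(E) are, respectively,
\[ s = \sum_{n=1}^{\infty} \mu(A_n),\ A_n \in \mathcal{A},\ E \subseteq \bigcup_{n=1}^{\infty} A_n, \quad\text{and}\quad t = \sum_{n=1}^{\infty} \mu_u(B_n),\ B_n \in \mathcal{A}_u,\ E \subseteq \bigcup_{n=1}^{\infty} B_n. \]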
Since A ⊆ Au , every sum s is also a sum t. On the other hand, since each Bn is a finite
disjoint union of members of A and µu is additive, every t may be decomposed and written
as an s. The infima over these sums are therefore the same.
We may now prove
1.6.4 Theorem. Let A be a semiring on a set X, µ a measure on A, µ∗ the outer measure
generated by (A, µ), and M = M(µ∗ ) the σ-field of µ∗ -measurable sets. Then σ(A) ⊆ M
and the measure µ∗|M is an extension of µ.¹
Proof. By the last lemma, we may assume that A is a ring. To show that A ⊆ M(µ∗ ), let
A ∈ A and C ⊆ X. We show that
µ∗(C ∩ A) + µ∗(C ∩ A^c) ≤ µ∗(C). (†)
Let Cn ∈ A such that C ⊆ ⋃_{n=1}^∞ Cn. Since A is a ring, Cn ∩ A, Cn ∩ A^c ∈ A. Moreover,
C ∩ A ⊆ ⋃_{n=1}^∞ (Cn ∩ A) and C ∩ A^c ⊆ ⋃_{n=1}^∞ (Cn ∩ A^c), so
\[ \mu^*(C \cap A) \le \sum_{n=1}^{\infty} \mu(C_n \cap A) \quad\text{and}\quad \mu^*(C \cap A^c) \le \sum_{n=1}^{\infty} \mu(C_n \cap A^c). \]
Adding we have
\[ \mu^*(C \cap A) + \mu^*(C \cap A^c) \le \sum_{n=1}^{\infty} \mu(C_n \cap A) + \sum_{n=1}^{\infty} \mu(C_n \cap A^c) = \sum_{n=1}^{\infty} \mu(C_n). \]
Taking infima over all such sequences {An} yields µ(A) ≤ µ∗(A). On the other hand, the
sequence A, ∅, ∅, . . . is a cover of A by members of A, hence µ∗(A) ≤ µ(A). Therefore,
µ∗|A = µ, completing the proof of the theorem.
¹ We frequently denote this extension also by µ, depending on context.
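we have
\[ \mu\Big(E \,\triangle \bigcup_{j=1}^{n} E_j\Big) \le \mu\Big(E^c \cap \bigcup_{j=1}^{\infty} E_j\Big) + \mu\Big(\bigcup_{j=n+1}^{\infty} E_j\Big) \le \sum_{j=1}^{\infty} \mu(B_j) - \mu(E) + \sum_{j=n+1}^{\infty} \mu(E_j) \]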
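\[ \mu^*(E^c) \le \mu(A_n) \le \sum_{j=1}^{\infty} \mu(A_{n,j}) \le \mu^*(E^c) + 1/n, \quad\text{where } A_n := \bigcup_{j=1}^{\infty} A_{n,j} \supseteq E^c, \text{ and} \]
\[ \mu^*(E) \le \mu(B_n) \le \sum_{j=1}^{\infty} \mu(B_{n,j}) \le \mu^*(E) + 1/n, \quad\text{where } B_n := \bigcup_{j=1}^{\infty} B_{n,j} \supseteq E. \]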
Next, let
\[ A = \bigcup_{n=1}^{\infty} A_n^c \quad\text{and}\quad B = \bigcap_{n=1}^{\infty} B_n. \]
completion M(µ∗n ). By Ex. 1.62 again, M(µ∗n ) = M(µ∗ ) ∩ Xn and µ∗n is the restriction
of µ∗ to M(µ∗n ). Now let E ∈ M(µ∗ ). By the preceding paragraph, for each n there exist
Mn , An ∈ Fn with µn (Mn ) = 0 and Nn ⊆ Mn such that E ∩ Xn = An ∪ Nn . Setting
\[ A = \bigcup_{n=1}^{\infty} A_n, \quad M = \bigcup_{n=1}^{\infty} M_n, \quad\text{and}\quad N = \bigcup_{n=1}^{\infty} N_n, \]
Proof. Let ν∗ denote the outer measure generated by (ν|A, A). Then the measures ν∗|σ(A)
Exercises
1.61 Let Ai be a semiring on Xi , i = 1, 2. Show that A1 × A2 is a semiring.
1.62 Let µ be a measure on a semiring A ⊆ P(X) and E ∈ A.
(a) Prove that A ∩ E is a semiring consisting of the members of A that are subsets of E.
(b) Let ν be the restriction of µ to A ∩ E and let µ∗ and ν ∗ be the outer measures generated
by (X, A, µ) and (E, A ∩ E, ν). Show that ν ∗ is the restriction of µ∗ to P(E).
(c) Prove that M(ν ∗ ) = M(µ∗ ) ∩ E.
1.63 Let µ be as in 1.6.4 and let ν be a measure on σ(A) that equals µ on A.
(a) Show that ν(E) ≤ µ(E) for all E ∈ σ(A). (1.6.10 shows equality may not hold.)
(b) Show that ν(E) = µ(E) for all E ∈ σ(A) with µ(E) < ∞. ⟦Assume that A is a ring (how?).
Choose A ∈ A such that E ⊆ A and µ(A) < µ(E) + ε. Then ν(E) + ν(A \ E) < ν(E) + ε.⟧
1.64 Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated by
(A, µ). Prove that for any E ⊆ X there exists A ∈ σ(A) such that E ⊆ A and µ∗ (E) = µ(A).
1.65 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Prove the weak inclusion-exclusion principle
µ∗ (E ∪ F ) + µ∗ (E ∩ F ) ≤ µ∗ (E) + µ∗ (F ), E, F ⊆ X.
1.66 [↑ 1.64] Let µ and ν be measures on a semiring A ⊆ P(X) and let µ∗ and ν ∗ be the outer
measures generated by (A, µ) and (A, ν), respectively. Prove that (µ + ν)∗ = µ∗ + ν ∗ and
M(µ∗ ) ∩ M(ν ∗ ) ⊆ M(µ∗ + ν ∗ ). Show that the inclusion may be strict.
1.67 [↑ 1.64, 1.40] Let µ and µn be σ-finite measures on a semiring A ⊆ P(X) with µn ↑ µ on σ(A).
Let µ∗, µn∗ be the outer measures generated by (A, µ) and (A, µn). Prove that µn∗ ↑ µ∗ on P(X).
1.68 [↑ 1.66, 1.67] Let µn be measures on a semiring A on X and define µ(A) = Σ_{n=1}^∞ µn(A) (A ∈ A).
Let µ∗ and µn∗ be the outer measures generated by (A, µ) and (A, µn), respectively. Prove that
µ∗ = Σ_{n=1}^∞ µn∗.
1.69 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Prove that µ∗ is continuous from below. Why doesn’t this imply that µ∗ is a measure
on P(X)?
1.70 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Suppose that µ∗(X) < ∞. Show that E ∈ M(µ∗) iff µ(X) = µ∗(E) + µ∗(E^c).
In this section we apply the results of §1.6 to the pair (HI , λ) to construct d-dimensional
Lebesgue measure. The following lemma is key to the construction.
FIGURE 1.1: Pairwise disjoint interval grid of H.
of [a, b] and [c, d], respectively. These partitions generate a grid of disjoint subrectangles
Ri,j = (xi , xi+1 ] × (yj , yj+1 ] with union H such that each Hk is a union of such subrectangles.
The procedure for case (a) is illustrated in Figure 1.1. Since
\[ b - a = \sum_{i=0}^{p-1} (x_{i+1} - x_i) \quad\text{and}\quad d - c = \sum_{j=0}^{q-1} (y_{j+1} - y_j), \]
Similarly,
\[ \lambda(H_k) = \sum_{(i,j):\, R_{i,j} \subseteq H_k} \lambda(R_{i,j}) \]
so that
\[ \sum_{k} \lambda(H_k) = \sum_{k} \sum_{(i,j):\, R_{i,j} \subseteq H_k} \lambda(R_{i,j}). \tag{1.10} \]
to obtain
1.7.3 Theorem. The volume set function λ on HI has a unique extension to B(Rd ).
Moreover, M(λ∗ ) is the completion of B(Rd ).
The members of M(λ∗) are called Lebesgue measurable sets and λ := λ∗|M(λ∗) is
called Lebesgue measure on Rd .
Exercises
1.71 Let I ∈ HI . Show that λ(I) = λ(int I) = λ(cl I). Also, in the definition
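\[ \lambda^*(E) := \inf\Big\{ \sum_{n=1}^{\infty} \lambda(A_n) : A_n \in \mathcal{A} \text{ and } E \subseteq \bigcup_{n=1}^{\infty} A_n \Big\}, \quad E \subseteq \mathbb{R}^d, \]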
Regularity
The following theorem complements the approximation property 1.6.5.
1.8.1 Theorem. Let µ be a Lebesgue-Stieltjes measure on Rd and let E ∈ B(Rd ). Then
(a) µ(E) = inf{µ(U ) : U open and U ⊇ E}.
where the sum is taken over all indices n for which cn ≤ x. (If there are no such indices,
the sum is defined to be 0.) Note that because the order of summation is irrelevant, F is
well-defined. The Lebesgue-Stieltjes measure corresponding to F is given by
\[ \mu(B) = \sum_{n:\, c_n \in B} p_n \quad\text{for all Borel sets } B. \]
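For finitely many atoms the relation between such a distribution function and its measure can be checked exactly. The following Python sketch assumes finitely many points c_n with weights p_n, held in two parallel lists; the names `distribution_function` and `mu_half_open` are hypothetical, and exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction


def distribution_function(points, weights):
    """F(x) = sum of p_n over all n with c_n <= x (empty sum = 0)."""
    def F(x):
        return sum((p for c, p in zip(points, weights) if c <= x), Fraction(0))
    return F


def mu_half_open(points, weights, a, b):
    """mu((a, b]) = sum of p_n over all n with a < c_n <= b."""
    return sum((p for c, p in zip(points, weights) if a < c <= b), Fraction(0))


if __name__ == "__main__":
    pts = [0, 1, 2]
    w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    F = distribution_function(pts, w)
    for a, b in [(-1, 0), (0, 1), (0.5, 2), (1, 1.5)]:
        # F(b) - F(a) should equal mu((a, b])
        print((a, b), F(b) - F(a), mu_half_open(pts, w, a, b))
```

The function F jumps by p_n at c_n and is constant in between, so it is right continuous but not left continuous at the points c_n.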
Proof of Theorem 1.8.2. For the first part of the theorem, define F : R → R as follows:
Let F (0) be arbitrary and set
\[ F(x) := \begin{cases} F(0) + \mu(0, x] & \text{if } x > 0, \\ F(0) - \mu(x, 0] & \text{if } x < 0. \end{cases} \]
By considering cases, we see that for a < b, F (b) − F (a) = µ(a, b]. Therefore, F is
nondecreasing and right continuous. If also G(b) − G(a) = µ(a, b] for all a < b, then
F (x) − F (0) = G(x) − G(0) for all x, hence F = G + F (0) − G(0).
For the converse, let F : R → R be a distribution function. To construct the Lebesgue-
Stieltjes measure defined by F , we apply the results of §1.6 to (HI , µ), where µ is the
set function on HI given by (1.11). Thus the proof of the theorem will be complete if we
show that µ is countably additive on HI . The following lemmas, analogous to those of §1.7,
establish this.
\[ \mu(H) = \sum_{i} \mu(I_i) \quad\text{and}\quad \mu(H_j) = \sum_{i:\, I_i \subseteq H_j} \mu(I_i). \]
= (b1 − a1 ) · · · (bd − ad ).
Thus △_{a1}^{b1} · · · △_{ad}^{bd} F(x1, x2, . . . , xd) is the Lebesgue measure of the d-dimensional interval
(a1 , b1 ] × · · · × (ad , bd ]. This sort of connection holds more generally and is described in the
theorem below. For the statement of the theorem we need the following definitions:
A function F : Rd → R is a distribution function if it is nondecreasing in the sense
that
\[ \triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, \ldots, x_d) \ge 0, \quad a_i < b_i,\ i = 1, \ldots, d, \]
and right continuous in the sense that
\[ x_{n,i} \downarrow_n x_i,\ i = 1, \ldots, d \ \Rightarrow\ F(x_{n,1}, \ldots, x_{n,d}) \to F(x_1, \ldots, x_d). \]
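The iterated difference in the monotonicity condition can be evaluated as a signed sum over the 2^d vertices of the box. The following Python sketch assumes the usual coordinate-wise meaning of the operator, namely that △_{a_i}^{b_i} replaces the i-th argument by b_i and by a_i and subtracts; that definition is not reproduced in the passage above, so it is an assumption here, and the name `box_increment` is hypothetical.

```python
from itertools import product
from math import prod


def box_increment(F, a, b):
    """Iterated difference of F over the box (a_1, b_1] x ... x (a_d, b_d].

    Expands the d nested differences into a sum over vertices: a vertex that
    uses k lower endpoints a_i enters with sign (-1)**k.
    """
    d = len(a)
    total = 0
    for choice in product((0, 1), repeat=d):   # 1 -> use b_i, 0 -> use a_i
        vertex = [b[i] if choice[i] else a[i] for i in range(d)]
        sign = (-1) ** (d - sum(choice))
        total += sign * F(*vertex)
    return total


if __name__ == "__main__":
    volume_F = lambda *x: prod(x)              # F(x_1, ..., x_d) = x_1 ... x_d
    a, b = (0, 1, 2), (3, 5, 4)
    print(box_increment(volume_F, a, b))
    print(prod(bi - ai for ai, bi in zip(a, b)))
```

For F(x_1, ..., x_d) = x_1 ··· x_d the signed sum factors into the product of the side lengths b_i − a_i, which is the volume of the box.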
Exercises
1.79 Describe the Lebesgue-Stieltjes measure for each of the following distribution functions.
(a) F(x) = ⌊x⌋, the greatest integer function.
(b) F(x) = x·1_{[0,1)} + 1_{[1,∞)}.
1.80 Show that the sum of finitely many distribution functions and the product of finitely many
nonnegative distribution functions are distribution functions.
1.81 Verify that the function in 1.8.3(b) is a distribution function. Prove also that F is left continuous
at a iff a ≠ cn for every n.
and set
F (−∞) := F ((−∞)+) and F (∞) := F (∞−).
Let F be a distribution function and µ the associated Lebesgue-Stieltjes measure. Prove the
following, when defined:
(a) µ(a, b) = F (b−) − F (a).
(b) µ[a, b) = F (b−) − F (a−).
(c) µ[a, b] = F (b) − F (a−).
Prove also that µ{x} = 0 iff F is continuous at x.
1.83 Let µ be a finite Lebesgue-Stieltjes measure on B(R) such that µ {x} = 0 for all x. Show that
1.84 Show that a monotone function f : R → R has countably many discontinuities. Conclude that
if µ is a Lebesgue-Stieltjes measure, then there exist at most countably many x ∈ R such that
µ({x}) > 0. ⟦For each t ∈ R, define $a_t = \lim_{x \to t^-} f(x)$ and $b_t = \lim_{x \to t^+} f(x)$. Then $a_t < b_t$ iff
f is discontinuous at t.⟧
1.85 Let µ be a Lebesgue-Stieltjes measure on R with a continuous distribution function and let
A ∈ B(R) with µ(A) > 0. Prove that for each b ∈ (0, µ(A)) there exists a Borel set B ⊆ A such
that µ(B) = b. ⟦Use the intermediate value theorem on $G(x) = \mu(A \cap [-n, x])$ for suitable n.⟧
[Figure 1.4: the intervals $I_{k,j}$ for k = 0, 1, 2, 3, each labeled by the leading ternary digits (0 or 2) shared by its points, from $I_{0,1} = [0, 1]$ down to $I_{3,1}, \ldots, I_{3,8}$ with labels .000..., .002..., .020..., .022..., .200..., .202..., .220..., .222....]
To show that C is uncountable, consider the ternary representation of a number x ∈ [0, 1]:
\[ x = .d_1 d_2 \ldots = \sum_{k=1}^{\infty} d_k 3^{-k}, \quad \text{where } d_k \in \{0, 1, 2\}. \tag{1.13} \]
By induction, using the fact that $x \in I_{k-1,j} \Rightarrow x \in I_{k,\,2j-1+d_k/2}$, one shows that x ∈ C iff x has
an expansion with even digits (see Figure 1.4). Define ϕ : C → [0, 1] by
\[ \varphi\bigl(.d_1 d_2 \ldots \text{ (ternary)}\bigr) = .e_1 e_2 \ldots \text{ (binary)}, \quad \text{where } d_k \in \{0, 2\} \text{ and } e_k = d_k/2. \]
The function ϕ is not one-to-one, but by removing from C the countable set of all numbers
with ternary representations ending in a sequence of zeros we obtain a set D on which ϕ is
one-to-one. Since ϕ(D) = (0, 1), C is uncountable.
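The digit characterization of C can be tested exactly on rational points. The following is a minimal sketch, not part of the text's argument: the function names `in_cantor_set` and `phi_partial` are ours, and the test strips one ternary digit at a time, using the fact that the orbit of a rational under this process is finite.

```python
from fractions import Fraction


def in_cantor_set(x: Fraction) -> bool:
    """Decide whether a rational x lies in the Cantor ternary set.

    Stripping a leading ternary digit 0 sends x to 3x, and stripping a
    leading digit 2 sends x to 3x - 2.  The point x is in C iff this
    process never lands in the open middle third (1/3, 2/3).  For a
    rational x the orbit is finite, so we stop once a value repeats.
    """
    seen = set()
    while x not in seen:
        seen.add(x)
        if 0 <= x <= Fraction(1, 3):
            x = 3 * x            # leading digit 0
        elif Fraction(2, 3) <= x <= 1:
            x = 3 * x - 2        # leading digit 2
        else:
            return False         # outside [0, 1] or in the middle third
    return True


def phi_partial(x: Fraction, n: int) -> Fraction:
    """Binary number .e_1 ... e_n with e_k = d_k / 2 (assumes x is in C)."""
    total = Fraction(0)
    for k in range(1, n + 1):
        if x <= Fraction(1, 3):
            digit, x = 0, 3 * x
        else:
            digit, x = 2, 3 * x - 2
        total += Fraction(digit // 2, 2 ** k)
    return total
```

For instance, 1/4 = .0202... (ternary) lies in C, and its first four binary digits under ϕ are .0101, that is, 5/16.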
Non-Lebesgue-Measurable Sets
We show the following:
Every Lebesgue measurable set A with λ(A) > 0
contains a set that is not Lebesgue measurable.
Since $A = \bigcup_{n \in \mathbb{Z}} A \cap [n, n+1]$, we may suppose that A is bounded. Define an equivalence
relation on A by x ∼ y iff x − y ∈ Q. Let B be the subset of A obtained by choosing exactly
one point from each distinct equivalence class. (The existence of B requires the axiom of
choice.) Now observe that the sets r + B, r ∈ Q, are disjoint. Indeed, if (r + B) ∩ (s + B) ≠ ∅,
then r + x = s + y for some x, y ∈ B, so x = y and r = s. Moreover, since A is bounded
so is B + [0, 1]. Let (rn ) be an enumeration of the rationals in [0, 1] and assume that B is
measurable. Then
\[ \infty > \lambda\Bigl( \bigcup_{n} (B + r_n) \Bigr) = \sum_{n} \lambda(B + r_n) = \sum_{n} \lambda(B), \]
and $f_n$ is linear on the complementary intervals $I_{n,j}$. Since $|f_n(x) - f_{n+1}(x)| \le 1/2^{n+1}$, the
sequence $\{f_n\}$ is uniformly Cauchy and so converges to a continuous function f, the Cantor
function.
To construct the desired non-Borel set, note first that since fn (0) = 0, fn (1) = 1, and fn is
nondecreasing on [0, 1], f also has these properties. Thus, by the intermediate value theorem,
f (I) = I. Since the values of f on the intervals Jn,k are already assumed at the endpoints
and since these endpoints lie in C, f (Jn,k ) contributes nothing additional to the range of f ,
hence f (C) = I. Now set g(x) = (f (x)+x)/2, x ∈ I. Then g is continuous, strictly increasing,
g(0) = 0, and g(1) = 1, hence g(I) = I. It follows that g : I → I is a homeomorphism, hence
g(C) is closed. Thus g(I \ C) is a proper nonempty open subset of I and so has positive
Lebesgue measure. Moreover, g takes the interval Jn,k , on which f is constant, to an open
interval half its length, so by countable additivity $\lambda(g(I \setminus C)) = \lambda(I \setminus C)/2 = 1/2$ and
therefore λ(g(C)) = 1/2. Now let E be a subset of g(C) that is not Lebesgue measurable and
let $A := g^{-1}(E)$. Then A ⊆ C and so is Lebesgue measurable with λ(A) = 0. However, A
cannot be a Borel set since g maps Borel sets onto Borel sets. (This is proved in Chapter 2.)
1.9.1 Remark. While the intricate nature of the construction of A might lead one to
believe that such sets are rare, there are in fact many more Lebesgue measurable sets than
Borel sets. Indeed, since the Cantor set C is uncountable and every subset of C is Lebesgue
measurable, the collection of Lebesgue measurable sets has cardinality $2^{c}$, where c is the
cardinality of the continuum. On the other hand, it may be shown that B(R) has only
cardinality c. (See, for example, [38].) ♦
Exercises
1.86 Show that (R, B(R), λ) is not complete.
1.87 Carry out the steps below to prove the following assertion: If A ⊆ R has positive Lebesgue measure
then the set A − A := {x − y : x, y ∈ A} contains an interval (−r, r) for some r > 0.
(a) Show that it suffices to consider the case A compact.
(b) Choose an open set U ⊇ A such that λ(U ) < 2λ(A) (how?). Define a distance function
d : U → R by $d(x) = \inf\{|x - y| : y \in U^c\}$. Show that d is continuous and positive. Conclude
that d has a minimum r > 0 on A.
(c) Show that |x| < r ⇒ x + A ⊆ U ⇒ (x + A) ∩ A ≠ ∅. Conclude that (−r, r) ⊆ A − A.
1.88 [↑ 1.87] Show that the only subgroup of (R, +) that has positive Lebesgue measure is R.
1.89 Let (an ) be a sequence in (0, 1) and set bn := 1 − an . Mimic the construction of the Cantor
ternary set by removing the middle part of [0, 1] of length a1 , leaving two intervals with union
$E_1$, each of length $b_1/2$, then removing the middle part of length $a_2 b_1/2$ from these leaving four
intervals with union $E_2$, each of length $b_1 b_2/4$, and so forth. The intersection $E := \bigcap_n E_n$ is
1.90 Let A be the set of all x ∈ [0, 1] having a decimal expansion .d1 d2 . . . with no digit equal to 3.
Show that A is uncountable, A ∈ B(R), and λ(A) = 0.
Chapter 2
Measurable Functions
In this chapter we consider functions that are measurable with respect to a given σ-field
F, that is, functions f for which (in the real-valued case) the sets {x ∈ X : f (x) ∈ (a, b)}
are F-measurable. As we shall see, such functions are natural candidates for integration
with respect to Lebesgue measure. We begin with the more general notion of measurable
transformation.
General Properties
2.1.1 Proposition. If T : (X, F) → (Y, G) and S : (Y, G) → (Z, H) are measurable, then
S ◦ T : (X, F) → (Z, H) is measurable.
Proof. This follows from $(S \circ T)^{-1}(A) = T^{-1}\bigl(S^{-1}(A)\bigr)$, A ∈ H.
The following result characterizes measurability in terms of the generators of a σ-field. It
will play an important role in what follows.
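2.1.2 Theorem. Let $\mathcal{A} \subseteq \mathcal{P}(Y)$ and T : X → Y. Then $\sigma\bigl(T^{-1}(\mathcal{A})\bigr) = T^{-1}\bigl(\sigma(\mathcal{A})\bigr)$. In
particular, $T : (X, \mathcal{F}) \to (Y, \sigma(\mathcal{A}))$ is measurable iff $T^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{A}$.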
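On finite sets the identity in 2.1.2 can be checked by brute force, since a σ-field on a finite set is just a family closed under complements and pairwise unions. The sketch below is only a sanity check under that assumption; the helper names, the sets X and Y, the map T, and the generators are all made up for illustration.

```python
from itertools import combinations


def generate_sigma_field(universe, generators):
    """Smallest family on a finite universe containing the generators and
    closed under complement and (finite) union."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if universe - a not in family:
                family.add(universe - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family


def preimage(T, domain, subset):
    """T^{-1}(subset) for a map T given as a dict on a finite domain."""
    return frozenset(x for x in domain if T[x] in subset)


# Hypothetical data: T maps X into Y and misses the point 'd'.
X = {0, 1, 2, 3, 4, 5}
Y = {"a", "b", "c", "d"}
T = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c", 5: "c"}
generators = [{"a", "b"}, {"b", "c"}]

# Left side: sigma(T^{-1}(A)); right side: T^{-1}(sigma(A)).
lhs = generate_sigma_field(X, [preimage(T, X, g) for g in generators])
rhs = {preimage(T, X, s) for s in generate_sigma_field(Y, generators)}
```

Here both families consist of the eight unions of the atoms {0, 1}, {2}, and {3, 4, 5}.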
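Proof. Since $T^{-1}\bigl(\sigma(\mathcal{A})\bigr)$ is a σ-field and $T^{-1}(\mathcal{A}) \subseteq T^{-1}\bigl(\sigma(\mathcal{A})\bigr)$, it follows by minimality
that $\sigma\bigl(T^{-1}(\mathcal{A})\bigr) \subseteq T^{-1}\bigl(\sigma(\mathcal{A})\bigr)$. For the reverse inclusion, observe that the set
\[ \bigl\{ A \in \sigma(\mathcal{A}) : T^{-1}(A) \in \sigma\bigl(T^{-1}(\mathcal{A})\bigr) \bigr\} \]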
Proof. Proposition 2.1.1 gives the necessity. For the sufficiency, if Ti ◦ T is F0 /Fi -measurable
for every i ∈ I, then
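\[ T^{-1}(\mathcal{E}) = T^{-1}\Bigl( \bigcup_{i \in I} T_i^{-1}(\mathcal{F}_i) \Bigr) = \bigcup_{i \in I} (T_i \circ T)^{-1}(\mathcal{F}_i) \subseteq \mathcal{F}_0, \]
hence $\sigma\bigl(T^{-1}(\mathcal{E})\bigr) \subseteq \mathcal{F}_0$. But by the theorem, $\sigma\bigl(T^{-1}(\mathcal{E})\bigr) = T^{-1}(\mathcal{F})$.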
πi : X → Xi , πi (x1 , . . . , xd ) = xi ,
hence πi is F/Fi -measurable. The set E in 2.1.4 corresponding to the maps πi is the
collection of all such sets, and taking intersections produces F1 × · · · × Fd . Therefore,
σ(E) = F1 ⊗ · · · ⊗ Fd , and the conclusion of the theorem follows from 2.1.4.
2.1.6 Corollary. Let (Xi , Fi ) be measurable spaces (i = 0, 1, . . . , d) and Ti : X0 → Xi
arbitrary mappings (i = 1, . . . , d). Define
\[ T = (T_1, \ldots, T_d) : X_0 \to X_1 \times \cdots \times X_d, \quad T(x) = \bigl(T_1(x), \ldots, T_d(x)\bigr). \]
Exercises
2.1 Show that for a measurable transformation T : (X, F) → (Y, G) it is not necessarily the case
that T (F) ⊆ G.
2.4 Let (X, F), (Y, G), and (Z, H) be measurable spaces and let T : X → Y have countable range.
Assume that G contains the singletons (e.g., a Borel σ-field). Show that
(a) T is F/G-measurable iff $T^{-1}(\{y\}) \in \mathcal{F}$ for every y ∈ Y.
(b) If T is F/G-measurable, then S ◦ T is F/H-measurable for any mapping S : Y → Z.
2.5 Let (X, F) and (Y, G) be measurable spaces. Show that if A ⊆ X and B ⊆ Y are nonempty
and A × B ∈ F ⊗ G, then A ∈ F and B ∈ G.
2.7 Let {(Xi , Fi ) : i ∈ I} be a family of measurable spaces with union X and let F be the σ-field
of all sets E ⊆ X such that E ∩ Xi ∈ Fi for all i ∈ I. Let (X0 , F0 ) be a measurable space and
T : X → $X_0$. Show that T is $\mathcal{F}/\mathcal{F}_0$-measurable iff $T|_{X_i}$ is $\mathcal{F}_i/\mathcal{F}_0$-measurable for every i ∈ I.
2.10 Let (X, F), (Z, H) be measurable spaces, T : X → Y surjective, and $\mathcal{G} = \{A \subseteq Y : T^{-1}(A) \in \mathcal{F}\}$.
Let R : (X, F) → (Z, H) be measurable such that T(x) = T(x′) ⇒ R(x) = R(x′). Show that
there exists a measurable transformation S : (Y, G) → (Z, H) such that R = S ◦ T .
2.11 Let (Y, G), (Z, H) be measurable spaces, T : X → Y, and set $\mathcal{F} := T^{-1}(\mathcal{G})$, so that the map
T : (X, F) → (Y, G) is measurable. Let R : (X, F) → (Z, H) be measurable with countable range.
Show that if H contains the singletons, then there exists a measurable S : (Y, G) → (Z, H) such
that R = S ◦ T .
2.12 Prove that if S, T : Rp → Rq are continuous and S = T λ-a.e., then S = T . What if only one of
the mappings is continuous?
2.13 [↓ 3.5.2] Let (X, F), (Y, G), and (Z, H) be measurable spaces and T : X × Y → Z an arbitrary
mapping. We say that T is separately measurable if Ty := T (·, y) is F/H-measurable for
each y ∈ Y and Tx := T (x, ·) is G/H-measurable for each x ∈ X. To distinguish from separate
measurability, we sometimes refer to F ⊗ G/H-measurability of T as joint measurability.
Show that if T is jointly measurable, then it is separately measurable.
It follows easily that if the range of f is countable, say ran f = (an ), then f is measurable
iff {f = an } ∈ F for all n.
2.2.1 Example. Let dn (x) denote the nth digit of the decimal expansion of x ∈ [0, 1),
where for definiteness we exclude expansions that end in a sequence of 9’s, choosing for
example .500 · · · over .499 · · · . Let en ∈ {0, 1, . . . , 9}. Then
The following proposition shows that measurable R-valued functions may be combined in
standard ways to produce new measurable functions.
2.2.5 Proposition. If f, g : X → K are measurable and c ∈ C, then f + g, f g, cf, $\overline{f}$, and
|f | are measurable. Moreover, if K = R, then f ∨ g and f ∧ g are measurable.
and
\[ \liminf_{n} f_n = \sup_{n} \inf_{k \ge n} f_k, \qquad \limsup_{n} f_n = -\liminf_{n} (-f_n). \]
Proof. By considering real and imaginary parts, we may assume that fn and f are R-valued.
Part (a) follows from the fact that $f = \lim_n f_n$. For (b), let $N = \{x : \lim_n f_n(x) \ne f(x)\}$
and set $g_n = f_n 1_{N^c}$ and $g = f 1_{N^c}$. Then $g_n$ is Fµ-measurable and $g_n \to g$, hence g is
Fµ -measurable by part (a). Since g = f a.e., f is Fµ -measurable.
2.2.8 Example. Let f : X × R → C have the property that f (x, t) is left continuous in t
for each x and F-measurable in x for each t. We show that f is F ⊗ B(R)-measurable. For
this, it suffices to take f real-valued.
For each n, the collection of intervals of the form $I_{n,k} := [k/n, (k+1)/n)$, k ∈ Z, partitions
R. Define
\[ f_n(x, t) = f(x, k/n), \quad t \in I_{n,k}, \ k \in \mathbb{Z}, \ x \in X. \]
Then $f_n$ is F ⊗ B(R)-measurable, as may be seen by writing
\[ f_n(x, t) = \sum_{k \in \mathbb{Z}} f(x, k/n)\, 1_{I_{n,k}}(t) \]
Exercises
2.14 Give an example of a nowhere continuous function equal a.e. to a continuous function.
2.15 Show that if F ≠ P(X), then there exists a nonmeasurable function f such that |f | is measurable.
2.16 Let fn : X → R be F-measurable for every n. Prove that the following sets are F-measurable:
(a) {x : limn fn (x) exists in R}. (b) {x : limn fn (x) exists in R}.
2.19 Prove that f = 1[0,1] is not equal a.e. to a continuous function on R. Show, however, that f is a
pointwise limit of continuous functions fn such that for each ε > 0, λ{|fn − f | ≥ ε} → 0.
2.22 Let f : X × [a, b] → R be such that f(x, t) is F-measurable in x for each t and continuous in t for
each x. Show that the Riemann integral $\int_a^b f(x, t)\,dt$ is F-measurable in x.
2.23 [↑ 2.2.1] For x ∈ (0, 1) define f (x) to be the first digit in the decimal expansion of x that is greater
than 5 and f (x) = 0 if there is no such digit. (For definiteness, use decimal expansions that do
not end in a sequence of 9’s.) Also, define g(x) to be the first time a digit is greater than 5, and
g(x) = ∞ if there is no such digit. Prove that f and g are Borel measurable.
2.24 Show that the supremum of an uncountable family of Borel functions on R need not be Lebesgue
measurable.
2.25 [↑ 2.9] Let f : Rd → R have the property that for each i, f (x1 , . . . , xi , . . . xd ) is either left
continuous or right continuous in xi when the other variables are fixed. Show that f is Borel
measurable.
2.26 Let F be a σ-field on Rd such that every continuous function f : Rd → R that vanishes outside
a bounded interval is F-measurable. Prove that B(Rd ) ⊆ F.
2.27 Let µ be a finite measure on B(Rd ) and A ∈ B(Rd ). Define f (x) = µ(A + x), x ∈ Rd . Show
that f is Borel measurable. ⟦Assume first that A is closed and show that $A_t := \{f \ge t\}$ is
closed.⟧
(d) For arbitrary f, the functions $g(x) := \limsup_{t \to x} f(t)$ and $h(x) := \liminf_{t \to x} f(t)$ are, respectively,
upper and lower semicontinuous on R.
(e) The set {x : limt→x f (t) exists in R} is Borel measurable.
(f) The set {x : limt→x f (t) exists in R} is Borel measurable.
2.30 [↑ 1.20] Define a topology on [−∞, ∞] with open sets O such that B([−∞, ∞]) = σ(O).
[FIGURE 2.3: Components of $f_n$ and $f_{n+1}$.]
(b) If f is R-valued, apply (a) to $f^+$ and $f^-$. Suppose f is C-valued. Let $g_n$ and $h_n$ be real-valued
simple functions such that $g_n \to \operatorname{Re} f$, $h_n \to \operatorname{Im} f$, $|g_n| \le |\operatorname{Re} f|$, and $|h_n| \le |\operatorname{Im} f|$.
Then $g_n + i h_n$ is a simple function, $g_n + i h_n \to f$, and
\[ |g_n + i h_n|^2 = g_n^2 + h_n^2 \le (\operatorname{Re} f)^2 + (\operatorname{Im} f)^2 = |f|^2. \]
The proof of part (c) is left to the reader as an exercise (2.34).
Proof. Since $1_{T^{-1}(A)} = 1_A \circ T$ (A ∈ G), the assertion holds for F-simple functions. If f ≥ 0,
let (fn ) be a sequence of nonnegative F-simple functions such that fn ↑ f . For each n
there exists a nonnegative G-measurable function hn : Y → R such that fn = hn ◦ T . Set
gn = h1 ∨ · · · ∨ hn . Then gn is G-measurable and gn ↑ on Y , hence g := limn gn exists and is
G-measurable. Moreover, hn ↑ on T (X), hence fn = gn ◦ T . Taking limits, we have f = g ◦ T .
If f is real-valued, choose G-measurable functions $g_1, g_2 : Y \to \mathbb{R}$ such that $f^+ = g_1 \circ T$
and $f^- = g_2 \circ T$. Then $f = (g_1 - g_2) \circ T$.
Finally, if f is complex-valued, choose G-measurable functions h1 , h2 : Y → R such that
Re f = h1 ◦ T and Im f = h2 ◦ T . Then f = (h1 + ih2 ) ◦ T .
Hereafter, in arguments such as those in the preceding theorem we shall frequently omit
the part of the proof that constitutes the transition from the nonnegative case to the complex
case, this argument usually being straightforward.
2.3.3 Theorem. Let (X, F, µ) be a measure space and let f : X → K be Fµ -measurable.
Then there exists an F-measurable function g such that f = g a.e.
Proof. Let f = 1E (E ∈ Fµ ). Then E = A ∪ N , where A ∈ F, N ⊆ M , and µ(M ) = 0. Set
$g = 1_A$. Then g is F-measurable, and {f ≠ g} ⊆ N, so f = g a.e. Therefore, the assertion
holds for indicator functions.
If f is a Fµ -simple function in standard form then, by the first paragraph, each of its
terms is equal a.e. to an F-measurable function. By considering a finite union of sets of
measure zero we see that f has this property.
If f is nonnegative, there exists a sequence of nonnegative Fµ-simple functions $f_n$ such
that $f_n \to f$ on X. By the previous paragraph, for each n there exists an F-measurable
function $g_n$ such that $f_n = g_n$ a.e. Let $N_n := \{x : f_n \ne g_n\}$ and $N := \bigcup_{n=1}^{\infty} N_n$. Then
N ∈ Fµ, µ(N) = 0 and $f_n = g_n$ on $N^c$ for all n. Let M denote the set of all x such that
the sequence $(g_n(x))$ does not converge in R. Then M ⊆ N, and by Ex. 2.16, M ∈ F. Let
$g = \lim_n g_n 1_{M^c}$. Then g is measurable and {g ≠ f} ⊆ N, so g = f a.e. The general case
f : X → K follows by a standard argument.
Exercises
2.31 Prove that 1E is measurable iff E ∈ F.
2.33 [↓ 3.5] Let X be uncountable and let F be the σ-field of countable or cocountable subsets of X.
Show that a function f : X → C is measurable iff f is constant on some cocountable set. ⟦Use
2.3.1.⟧
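\[
\begin{array}{lll}
1_{[0,1/n]} \xrightarrow{\text{a.e.}} 0, & 1_{[0,1/n]} \xrightarrow{\lambda} 0, & 1_{[0,1/n]} \xrightarrow{\text{a.u.}} 0, \\[2pt]
1_{[n,n+1]} \xrightarrow{\text{a.e.}} 0, & 1_{[n,n+1]} \overset{\lambda}{\not\to} 0, & 1_{[n,n+1]} \overset{\text{a.u.}}{\not\to} 0, \\[2pt]
1_{[n,n+1/n]} \xrightarrow{\text{a.e.}} 0, & 1_{[n,n+1/n]} \xrightarrow{\lambda} 0, & 1_{[n,n+1/n]} \overset{\text{a.u.}}{\not\to} 0.
\end{array}
\tag{2.2}
\]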
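The entries of (2.2) can be explored numerically. The sketch below is an illustration only, with helper names of our own choosing. It uses two observations: for an indicator function, λ{|f_n| ≥ ε} is the length of the underlying interval whenever 0 < ε ≤ 1, and almost uniform convergence of indicators to 0 forces the exceptional set to contain the supports of all f_n with n large, so the measure of a tail union of supports is the relevant quantity.

```python
from fractions import Fraction

# Each sequence is described by the interval [left(n), right(n)] on which f_n = 1.
SEQUENCES = {
    "1_[0,1/n]": lambda n: (Fraction(0), Fraction(1, n)),
    "1_[n,n+1]": lambda n: (Fraction(n), Fraction(n + 1)),
    "1_[n,n+1/n]": lambda n: (Fraction(n), n + Fraction(1, n)),
}


def superlevel_measure(interval_of, n):
    """lambda{|f_n| >= eps} for 0 < eps <= 1: the length of the interval."""
    left, right = interval_of(n)
    return right - left


def tail_union_measure(interval_of, start, stop):
    """Lebesgue measure of the union of the supports of f_n, start <= n < stop."""
    intervals = sorted(interval_of(n) for n in range(start, stop))
    total = Fraction(0)
    cur_left, cur_right = intervals[0]
    for left, right in intervals[1:]:
        if left <= cur_right:          # overlapping or touching: merge
            cur_right = max(cur_right, right)
        else:                          # gap: close the current block
            total += cur_right - cur_left
            cur_left, cur_right = left, right
    return total + (cur_right - cur_left)
```

For the first sequence the tail union from n = N on is [0, 1/N], whose measure tends to 0. For the third, the tail unions have measure given by partial sums of the harmonic series, which grow without bound as `stop` increases, consistent with the failure of almost uniform convergence.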
Proof. We prove the proposition for convergence in measure. Part (a) follows from
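\[ \bigl\{ |(af + bg) - (af_n + bg_n)| \ge \varepsilon \bigr\} \subseteq \Bigl\{ |f - f_n| \ge \frac{\varepsilon}{2(|a| + 1)} \Bigr\} \cup \Bigl\{ |g - g_n| \ge \frac{\varepsilon}{2(|b| + 1)} \Bigr\}, \]
For the first part of (b), use the inequality $\mu\{\, \bigl| |f_n| - |f| \bigr| \ge \varepsilon \} \le \mu\{ |f_n - f| \ge \varepsilon \}$. Part (c)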
follows from
hence $f_n \xrightarrow{\lambda} 0$ on [0, 1). On the other hand, for any x ∈ [0, 1), $f_n(x) = 1$ for infinitely many
Exercises
2.35 Discuss the convergence behavior of $f_n(x) = x^n 1_{[0,1]}$ on (R, B(R), λ).
2.36 Let $E_n \in \mathcal{F}$ and let f be measurable. Suppose that $1_{E_n} \xrightarrow{\mu} f$. Show that $f = 1_E$ a.e. for some
E ∈ F.
2.37 Let $E_n \in \mathcal{F}$, $A := \limsup_n E_n$, and $B := \liminf_n E_n$. Show that $1_{E_n} \xrightarrow{\text{a.e.}} f$ for some f iff µ(A \ B) = 0.
2.38 Let $f, f_n, g : X \to \mathbb{C}$ be measurable, $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$. Show that f = g a.e.
2.39 Let $f, f_n : X \to \mathbb{R}$ be measurable and $f_n \xrightarrow{\mu} f$. Show that if $f_n \uparrow$ then $f_n \xrightarrow{\text{a.e.}} f$.
2.40 Let $f, f_n : X \to \mathbb{C}$ be measurable. Show that $f_n \xrightarrow{\mu} f$ iff for each ε > 0 there exists m such that
$\mu\{|f - f_n| \ge \varepsilon\} < \varepsilon$ for all n ≥ m.
2.41 Let $f_n : X \to \mathbb{C}$ be measurable. Show that $f_n \xrightarrow{\text{a.e.}} f$ for some Fµ-measurable f iff $g_{m,n} :=
f_m - f_n \to 0$ a.e. as m, n → ∞.
2.42 [↑ 2.41] Let µ(X) < ∞, $a_n > 0$, and $\sum_n a_n < \infty$. Let $f_n : X \to \mathbb{C}$ be measurable and set
$A_n := \{|f_n - f_{n+1}| \ge a_n\}$. Show that if $\sum_n \mu(A_n) < \infty$, then $f_n \xrightarrow{\text{a.e.}} f$ for some function
f : X → C. ⟦By 1.37, $\mu(\limsup_n A_n) = 0$. Consider $\sum_{k=n}^{m-1} [f_k(x) - f_{k+1}(x)]$.⟧
2.43 [↑ 2.42] Let µ(X) < ∞ and $f_n : X \to \mathbb{C}$ measurable. Show that $f_n \xrightarrow{\mu} f$ for some f : X → C
iff $f_m - f_n \xrightarrow{\mu} 0$ as m, n → ∞. ⟦For the sufficiency, modify the proof of 2.4.4 to obtain a
strictly increasing sequence of positive integers $n_k$ such that $\mu\{|f_n - f_m| \ge 1/2^k\} < 1/2^k$ for
all $m, n \ge n_k$.⟧
2.44 (Fréchet). Let µ(X) < ∞. Define
\[ \rho(f, g) = \inf_{r > 0} \bigl[ r + \mu\{x : |f(x) - g(x)| \ge r\} \bigr]. \]
Show that if functions that are equal µ-a.e. are identified, then ρ becomes a metric on the space
$L^0 = L^0(X, \mathcal{F}, \mu)$ of all measurable functions on X. Show also that $\rho(f, f_n) \to 0$ iff $f_n \xrightarrow{\mu} f$.
Thus, by 2.43, the metric space is complete.
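To get a feel for the quantity ρ in 2.44, one can compute it exactly on a finite measure space. The sketch below assumes a hypothetical space whose points carry weights; the function name `frechet_rho` is ours. It relies on the observation that r + µ{|f − g| ≥ r} is increasing in r between consecutive values of |f − g|, so the infimum is approached as r decreases to 0 or to one of those values v, where the limit is v + µ{|f − g| > v}.

```python
from fractions import Fraction


def frechet_rho(f, g, weights):
    """inf over r > 0 of r + mu{|f - g| >= r} on a finite weighted space.

    f, g    : sequences of values, one per point of the space
    weights : the measure of each point (nonnegative, finite)
    """
    diffs = [abs(a - b) for a, b in zip(f, g)]
    candidates = {0} | set(diffs)
    return min(
        v + sum(w for d, w in zip(diffs, weights) if d > v)
        for v in candidates
    )


# Hypothetical three-point space with total mass 1.
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
f = [Fraction(0), Fraction(0), Fraction(0)]
g = [Fraction(0), Fraction(1, 10), Fraction(5)]
rho_fg = frechet_rho(f, g, weights)
```

In this example a large deviation on a set of small measure contributes at most the measure of that set, which is why ρ detects convergence in measure rather than uniform convergence.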
2.48 Let µ and ν be finite measures on (X, F) with the same sets of measure zero and let $f, f_n : X \to \mathbb{C}$
be measurable. Show that $f_n \xrightarrow{\mu} f$ iff $f_n \xrightarrow{\nu} f$.
Chapter 3
Integration
In this chapter we construct the general Lebesgue integral. The construction proceeds
in stages. The integral is first defined on the class of nonnegative simple functions and
then extended to nonnegative measurable functions and ultimately to complex measurable
functions. The basic properties of the integral are also developed in this chapter. Additional
properties are discussed in subsequent chapters.
Note that the above sum may contain terms of the form a · ∞, where a ∈ [0, ∞). Such terms
have value either ∞ or 0, depending on whether a > 0 or a = 0 (see §0.1). In particular, the
integral of the identically zero function is 0 · µ(X) = 0, whether or not µ(X) is finite.
The following lemma summarizes the elementary properties of the integral of nonnegative
simple functions. These will be used later to obtain analogous properties of the general
integral.
Proof. Part (a) is immediate from the definition of the integral, and (d) follows from (c). To
prove (b), let f and g have standard representations
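\[ f = \sum_{i=1}^{m} a_i 1_{A_i} \quad\text{and}\quad g = \sum_{j=1}^{n} b_j 1_{B_j}. \]
Since $X = \bigcup_{i=1}^{m} A_i = \bigcup_{j=1}^{n} B_j$ (disjoint), we have $\mu(A_i) = \sum_{j=1}^{n} \mu(A_i \cap B_j)$ and $\mu(B_j) = \sum_{i=1}^{m} \mu(A_i \cap B_j)$, hence
\[ \int f = \sum_{i=1}^{m} a_i \mu(A_i) = \sum_{i,j} a_i \mu(A_i \cap B_j) \quad\text{and}\quad \int g = \sum_{j=1}^{n} b_j \mu(B_j) = \sum_{i,j} b_j \mu(A_i \cap B_j). \tag{†} \]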
and so
\[ \int (f + g) = \sum_{k=1}^{p} c_k \mu(C_k) = \sum_{k=1}^{p} c_k \sum_{a_i + b_j = c_k} \mu(A_i \cap B_j) = \sum_{i,j} (a_i + b_j) \mu(A_i \cap B_j) = \int f + \int g, \]
Note that the integral is nonnegative and could be infinite. (For an extreme example, consider
the measure µ on P(X) that assigns ∞ to every nonempty set. Then $\int f\,d\mu = \infty$ for all
nonnegative functions except the identically zero function.)
The integral of a measurable function f : X → R is defined as
\[ \int f\,d\mu := \int f^+\,d\mu - \int f^-\,d\mu, \]
provided at least one of the integrals on the right is finite. If both $\int f^+\,d\mu$ and $\int f^-\,d\mu$ are
finite, then f is said to be integrable.
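that is,
\[ \operatorname{Re} \int f\,d\mu = \int \operatorname{Re} f\,d\mu \quad\text{and}\quad \operatorname{Im} \int f\,d\mu = \int \operatorname{Im} f\,d\mu. \]
It follows that
\[ \int \overline{f}\,d\mu = \overline{\int f\,d\mu}. \]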
We have now constructed the integral with respect to µ on the class of all (suitably
restricted) measurable functions f : X → K. The special cases of the integral with respect
to Lebesgue measure on Rd and Lebesgue-Stieltjes measures on Rd are important examples.
Here is another example:
3.1.2 Example. Let x ∈ X and let δx be the Dirac measure defined in 1.3.3(d). Then
\[ \int f\,d\delta_x = f(x) \tag{3.2} \]
for every F-measurable function f : X → K. Indeed, this clearly holds for indicator functions
f , and, by 3.1.1(a) and (b), it holds for nonnegative F-simple functions. If f ≥ 0, then
\[ \int f\,d\delta_x = \sup\bigl\{ f_s(x) : 0 \le f_s \le f,\ f_s \text{ simple} \bigr\} = f(x), \]
the last equality by 2.3.1. For the general real-valued case, use the positive and negative
parts of f . For the complex case, consider the real and imaginary parts of f . ♦
To see this, first take $f = 1_A$, A ∈ F. Then the left side of (3.3) is simply µ(A ∩ E), and since
$1_A|_E$ is the indicator function of E ∩ A on the domain E, the right side is ν(A ∩ E) = µ(A ∩ E).
Thus (3.3) holds for indicator functions, hence for nonnegative F-simple functions. Taking
suprema over integrals of simple functions shows that the equation holds for nonnegative
measurable functions, hence for arbitrary measurable R-valued functions via f = f + − f − ,
and finally for measurable C-valued functions using f = Re f + i Im f . ♦
The preceding remark implies that general properties of integrals $\int f\,d\mu$ are immediately
valid for $\int_E f\,d\mu$; no special argument is necessary.
Applying the proposition to the real and imaginary parts of f and g, we have
and so
\[ \int f_n\,d\mu \ge r \sum_{j=1}^{m} a_j \mu(E_n \cap A_j). \]
Then ν is a measure on F.
Proof. For countable additivity, apply 3.2.9 to gn = 1En · h.
3.2.11 Corollary. Let f, g : X → K be measurable.
Proof. (a) Suppose first that f is R-valued. If f is integrable, then, by definition, $f^+$ and
$f^-$ are integrable, hence by the theorem $|f| = f^+ + f^-$ is integrable. Conversely, if |f| is
integrable, then $0 \le \int f^{\pm}\,d\mu \le \int |f|\,d\mu$, hence $f^+$ and $f^-$ are integrable.
Now let f be C-valued. If f is integrable then by definition Re f and Im f are integrable.
By the first paragraph, |Re f | and |Im f | are integrable, hence, by the theorem |Re f | + |Im f |
is integrable. Since |f | ≤ |Re f | + |Im f |, |f | is integrable. This proves the necessity of (a). A
similar argument shows that if |f | is integrable, then Re f and Im f are integrable, verifying
the sufficiency.
(b) By part (a), |f | is integrable. The inequality |g| ≤ |f | then implies that |g| is integrable.
By (a) again, g is integrable.
(c) This follows from (b), since |f 1E | ≤ |f |.
We may now prove linearity for the real-valued case:
3.2.12 Theorem. Let f, g : X → R be measurable, g integrable, and a, b ∈ R. If $\int f\,d\mu$
exists, then $\int (af + bg)\,d\mu$ exists and
\[ \int (af + bg)\,d\mu = a \int f\,d\mu + b \int g\,d\mu. \]
\[ (f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+ \]
proving additivity.
If a ≥ 0, then $(af)^+ = af^+$ and $(af)^- = af^-$, hence, by 3.2.8,
\[ \int af = \int (af)^+ - \int (af)^- = a \int f^+ - a \int f^- = a \int f. \]
Therefore, if a < 0,
\[ \int af = \int (-a)(-f) = -a \int (-f) = a \int f. \]
Proof. By 3.2.8, |α| |f | + |β| |g| is integrable. Since |αf + βg| ≤ |α| |f | + |β| |g|, by 3.2.11
αf + βg is integrable. Now let α = a + i b and set fr = Re f and fi = Im f . Then
αf = a fr − b fi + i [b fr + a fi ],
in the sense that if one side is defined, then so is the other and equality holds.
Proof. Since $1_{T^{-1}(A)} = 1_A \circ T$, (3.4) holds for indicator functions g, hence by linearity
for simple functions. Taking a sequence of nonnegative simple functions increasing to g
and applying the monotone convergence theorem yields (3.4) for nonnegative measurable
functions g. The general case follows by standard arguments.
Applying the theorem to the transformations x → x + y and x → rx (r ≠ 0) on $\mathbb{R}^d$, we have
3.2.16 Corollary. The following are valid in the sense that if one side of an equation is
defined, then so is the other and equality holds.
\[ \int f(x + y)\,d\lambda^d(x) = \int f(x)\,d\lambda^d(x) \quad\text{and}\quad \int f(rx)\,d\lambda^d(x) = |r|^{-d} \int f(x)\,d\lambda^d(x). \tag{3.5} \]
Properties (3.5) express, respectively, the translation invariance and dilation proper-
ties of the Lebesgue integral. The special case r = −1 gives the reflection invariance of
the integral.
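Properties (3.5) are easy to observe numerically in dimension d = 1. The sketch below is an illustration, not a proof: it approximates the integrals of the Gaussian f(x) = exp(−x²) by a midpoint rule on a large bounded interval, and the interval, step count, shift y, and factor r are arbitrary choices of ours.

```python
import math


def midpoint_integral(func, a, b, steps):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def f(x):
    # Rapidly decaying integrand, so truncating to [A, B] loses almost nothing.
    return math.exp(-x * x)


A, B, STEPS = -12.0, 12.0, 24000
y, r = 0.7, -2.5

base = midpoint_integral(f, A, B, STEPS)                          # integral of f(x)
translated = midpoint_integral(lambda x: f(x + y), A, B, STEPS)   # integral of f(x + y)
dilated = midpoint_integral(lambda x: f(r * x), A, B, STEPS)      # integral of f(rx)
```

Up to discretization error, `translated` agrees with `base`, and `dilated` agrees with `base / abs(r)`, the case d = 1 of the factor |r|^{-d}.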
(See 3.2.10.) We also express this by writing d(hµ) = h dµ. Densities arise as Radon-Nikodym
derivatives and in particular as conditional expectations in probability theory. The proof of
the following theorem is similar to that of 3.2.15. The details are left to the reader (Ex. 3.15).
3.2.17 Theorem. Let f be F-measurable. Then
\[ \int f\,d(h\mu) = \int (f \cdot h)\,d\mu \]
in the sense that if one side is defined, then so is the other and equality holds.
Note that the Dirac measure δx on R (1.3.3(d)) has no density with respect to λ. Never-
theless, it is customary in physics and elsewhere to write
\[ f(x) = \int_{-\infty}^{\infty} f(y)\,\delta(y - x)\,dy = \int_{-\infty}^{\infty} f(x + y)\,\delta(y)\,dy \]
for a symbolic density function δ(·), the so-called Dirac delta function. This interpretation
can be made rigorous using distribution theory. (See §15.1.)
in the sense that if one side is defined, then so is the other and then equality holds.
A proof of the theorem is given in Appendix A. Note that for all Lebesgue measurable
functions f ≥ 0 on V ,
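\[
\int_V f\,d\lambda^d = \int_U (f \circ \varphi) \cdot |\det \varphi'|\,d\lambda^d = \int_U (f \circ \varphi) \cdot \bigl( |\det \varphi'| \circ \varphi^{-1} \circ \varphi \bigr)\,d\lambda^d = \int_V f \cdot \bigl( |\det \varphi'| \circ \varphi^{-1} \bigr)\,d\varphi(\lambda^d).
\]
Replacing f by $f \cdot \bigl( |\det \varphi'| \circ \varphi^{-1} \bigr)^{-1}$ we have
\[ \int_V f \cdot \bigl( |\det \varphi'| \circ \varphi^{-1} \bigr)^{-1}\,d\lambda^d = \int_V f\,d\varphi(\lambda^d), \]
hence
\[ d\varphi(\lambda^d) = \frac{d\lambda^d}{|\det \varphi'| \circ \varphi^{-1}}, \tag{3.7} \]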
Exercises
3.1 Let $f \in L^1(\mathbb{R})$ be positive and let a ∈ R, a ≠ 0. Prove that
\[ \Bigl| \int e^{iat} f(t)\,dt \Bigr| < \int f(t)\,dt. \]
3.2 Let µ be the infinite series measure of 1.3.3(h). Prove that the equation
\[ \int f\,d\mu = \sum_{k=1}^{\infty} f(k)\,p_k \]
holds for any function f : N → C in the sense that if one side is defined, then so is the other
and equality holds. What is the significance of the case pk ≡ 1?
3.3 [↑ 1.40] Let (X, F) be a measurable space and {µn } a sequence of measures such that µn (A)
is a nondecreasing sequence for each A ∈ F. Then µ(A) = limn µn (A) defines a measure on
F. Prove that if f is a nonnegative µ-integrable Borel measurable function on X, then f is $\mu_n$-integrable
for every n and $\int f\,d\mu = \lim_n \int f\,d\mu_n$.
3.5 Let X be uncountable and let F be the σ-field consisting of the countable and cocountable
subsets of X. Let µ be the probability measure on F that assigns 0 to countable sets and 1 to
cocountable sets. By 2.33, an F-measurable function f is constant on some cocountable set.
Show that the constant is unique and equals $\int f\,d\mu$.
3.6 Let f : X → C be measurable with countable range $\{a_1, a_2, \ldots\}$. Set $A_n := \{f = a_n\}$. Prove
that f is integrable iff the series $\sum_{n=1}^{\infty} a_n \mu(A_n)$ converges absolutely, in which case the value
of the series equals $\int f\,d\mu$.
3.8 [↑ 2.2.1] Let $d_n(x)$ denote the nth digit in the decimal expansion of a number x ∈ [0, 1). Find
$\int_{[0,1]} d_n(x)\,d\lambda(x)$.
integral is 9/4.
3.11 Let f be Lebesgue integrable on $\mathbb{R}^d$ and $\int |f| > 0$. Prove that the series $\sum_{n=1}^{\infty} n^{-p} f(nx)$
converges absolutely a.e. on $\mathbb{R}^d$ iff p > 1 − d.
3.12 Let µ be a Lebesgue-Stieltjes measure on R such that for all integrable functions f ,
\[ \int f(x + y)\,d\mu(x) = 2^{y} \int f(x)\,d\mu(x), \quad y \in \mathbb{R}. \]
3.13 [↓ 4.3.4] Let f ≥ 0 be µ-integrable. Prove that for each ε > 0 there exists a δ > 0 such that
\[ E \in \mathcal{F} \text{ and } \mu(E) < \delta \ \Rightarrow\ \int_E f\,d\mu < \varepsilon. \]
3.16 Let dµ = h dν, where h is positive, finite and measurable. Show that $d\nu = h^{-1}\,d\mu$.
3.19 [↑ 3.13] Let $f_n, f : X \to \mathbb{C}$ be integrable and $E_n, E \in \mathcal{F}$ such that $\lim_n \int |f_n - f|\,d\mu = 0$ and
3.22 Let (X, F, µ) be a finite measure space and let L0 = L0 (X, F, µ) denote the linear space of all
measurable functions f : X → K. Show that
\[ d(f, g) = \int \frac{|f - g|}{1 + |f - g|}\,d\mu, \quad f, g \in L^0, \]
defines a metric on L0 , where we identify functions equal a.e. Show also that convergence in
this metric is convergence in measure.
3.23 Let f : X → R be µ-integrable. Prove that a ≤ f ≤ b a.e. iff $\frac{1}{\mu(A)} \int_A f\,d\mu \in [a, b]$ for all A ∈ F
with 0 < µ(A) < ∞.
3.25 Show that (X, F, µ) is σ-finite iff there exists a positive integrable function f on X.
3.26 Let I be an arbitrary index set. For i ∈ I and ai ∈ [0, ∞], define the extended real number
\[ \sum_{i \in I} a_i := \sup\Bigl\{ \sum_{j \in F} a_j : F \subseteq I,\ F \text{ finite} \Bigr\}. \]
(a) Show that there exists a sequence $\{i_n\}$ in I such that $\sum_{i \in I} a_i = \sum_{n=1}^{\infty} a_{i_n}$.
(b) Let (Xi , Fi , µi ), i ∈ I, be a family of measure spaces, where the sets Xi are disjoint. The
direct sum of these measure spaces is the triple (X, F, µ), where
\[ X := \bigcup_{i \in I} X_i, \quad \mathcal{F} := \{E \subseteq X : E \cap X_i \in \mathcal{F}_i \ \forall\, i \in I\}, \quad \mu(E) := \sum_{i \in I} \mu_i(E \cap X_i). \]
3.28 (Weighted mean value theorem for integrals). Let µ be a Lebesgue-Stieltjes measure on B(Rd )
and E ⊆ Rd compact and connected. Let f, g : E → R with g µ-integrable and f continuous. If
g does not change sign on E, show that for some c ∈ E,
\[ \int_E f g\,d\mu = f(c) \int_E g\,d\mu. \]
3.29 Let f be measurable and µ(X) < ∞. Set $A_n = \{|f| \ge n\}$. Prove that f is integrable iff
$\sum_{n=1}^{\infty} \mu(A_n)$ converges, in which case $\lim_n n\mu(A_n) = 0$. ⟦Consider $B_n := \{n \le |f| < n + 1\}$.⟧
3.32 Let (Xi , Fi ), i = 1, 2, be measurable spaces and T : (X1 , F1 ) → (X2 , F2 ) measurable with
measurable inverse. Let µ be a measure on (X1 , F1 ) and let h ≥ 0 be F1 -measurable. Show that
$T(h\mu) = (h \circ T^{-1})\,T(\mu)$.
3.33 Let V be a linear subspace of $\mathbb{R}^d$ of dimension m < d. Use the change of variables theorem to
show that $\lambda^d(V) = 0$. ⟦Construct a suitable linear transformation.⟧
3.34 Let X be a metric space and $\mu, \mu_1, \mu_2, \ldots$ finite measures on B(X) such that $\lim_n \int f\,d\mu_n = \int f\,d\mu$
for all bounded continuous f : X → R. Carry out the following steps to show that
$\lim_n \mu_n(E) = \mu(E)$ for all E ∈ B(X) with µ(bd(E)) = 0.
(a) Show that for each open U ⊆ X, there exists a sequence of closed sets Cn ↑ U .
(b) Referring to (a), show that there exist bounded continuous functions fk ↑ 1U .
(c) Show that $\int f_k\,d\mu \le \liminf_n \mu_n(U)$ and hence $\mu(U) \le \liminf_n \mu_n(U)$.
(d) Apply (c) to U = int(E) and U = X \ cl(E) to obtain the desired conclusion.
where the supremum and infimum are taken over all partitions P of [a, b]. If the upper and
lower integrals are equal, then f is said to be Darboux-integrable on [a, b], the common
value of these integrals then being denoted by $\int_a^b f$.
For a limit description of $\int_a^b f$, we need the following notions: A refinement of P =
P1 × · · · × Pd is a partition Q = Q1 × · · · × Qd of [a, b] such that, as sets of points, Qj ⊇ Pj
for each j. Every member I of P is then a union of members J of Q, and because boundaries
of intervals have Lebesgue measure zero,
\[ |I| = \sum_{J \in \mathcal{Q},\ J \subseteq I} |J|. \]
The common refinement of partitions P and Q is the partition of [a, b] whose jth
coordinate partition consists of the points in Pj ∪ Qj . The following lemma shows that
taking refinements decreases the difference of upper and lower sums.
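3.3.1 Lemma. If Q is a refinement of P, then
\[ \underline{S}(f, \mathcal{P}) \le \underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P}). \]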
Proof. The second inequality is clear, and the first inequality follows from the third by
considering −f . For the third inequality, we have
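\[ \overline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} M_I |I| = \sum_{I \in \mathcal{P}} M_I \sum_{J \in \mathcal{Q},\, J \subseteq I} |J| \ge \sum_{I \in \mathcal{P}} \sum_{J \in \mathcal{Q},\, J \subseteq I} M_J |J| = \overline{S}(f, \mathcal{Q}). \]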
3.3.2 Lemma. For any partition P of [a, b], $\underline{S}(f, \mathcal{P}) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P})$.
Proof. The first and last inequalities are immediate from the definition of lower and upper integrals. For the middle inequality, let $P$ and $Q$ be partitions of $[a, b]$, and let $R$ be a refinement of both $P$ and $Q$. By 3.3.1,
$$\underline{S}(f, P) \le \underline{S}(f, R) \le \overline{S}(f, R) \le \overline{S}(f, Q).$$
Taking the supremum over $P$ and the infimum over $Q$ yields the desired inequality.
3.3.3 Corollary. A bounded function f : [a, b] → Rd is Darboux integrable iff for each
ε > 0 there exists a partition $P$ of $[a, b]$ such that $\overline{S}(f, P) - \underline{S}(f, P) < \varepsilon$.
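The effect of refinement described in 3.3.1 and the criterion in 3.3.3 can be checked numerically. The following is a minimal sketch in one dimension, assuming the sample function $f(x) = x^2$ on $[0, 1]$; because this $f$ is increasing, its infimum and supremum on a subinterval are attained at the endpoints. The helper names (`darboux_sums`, `uniform_partition`) are hypothetical and chosen only for this illustration.

```python
def darboux_sums(f_min, f_max, points):
    """Lower and upper Darboux sums for a 1-D partition given by sorted points.

    f_min(a, b) and f_max(a, b) must return inf f and sup f on [a, b].
    """
    cells = list(zip(points, points[1:]))
    lower = sum(f_min(a, b) * (b - a) for a, b in cells)
    upper = sum(f_max(a, b) * (b - a) for a, b in cells)
    return lower, upper


def f(x):
    # Assumed example function; increasing on [0, 1].
    return x * x


def inf_f(a, b):
    return f(a)  # valid only because f is increasing on [0, 1]


def sup_f(a, b):
    return f(b)  # valid only because f is increasing on [0, 1]


def uniform_partition(n):
    return [k / n for k in range(n + 1)]


coarse = uniform_partition(4)
fine = uniform_partition(8)  # contains every point of `coarse`, so it refines it

lo_c, up_c = darboux_sums(inf_f, sup_f, coarse)
lo_f, up_f = darboux_sums(inf_f, sup_f, fine)
gap_c = up_c - lo_c  # difference of upper and lower sums on the coarse partition
gap_f = up_f - lo_f  # the same difference after refining
```

In this example the lower sum increases and the upper sum decreases under refinement, and the gap halves when the number of cells doubles, so the gap can be made smaller than any given ε.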
We may now describe the integral as a limit of Darboux sums. Given L ∈ R and a
real-valued function F (P) of partitions P of [a, b], we write
$$L = \lim_{P} F(P)$$
if, given ε > 0, there exists a partition Pε such that |F (P) − L| < ε for all partitions P that
refine Pε . By applying standard techniques, one easily shows that such limits are unique
and have the usual combinatorial properties.1 Using this notion we can give the following
characterization of the Darboux integral. The proof is left as an exercise.
3.3.4 Theorem. A bounded function f : [a, b] → Rd is Darboux integrable iff the limits
$\lim_P \underline{S}(f, P)$ and $\lim_P \overline{S}(f, P)$ exist and are equal. In this case, their common value is $\int_a^b f$.
A more useful limit characterization of the Darboux integral may be given in terms of
the following. The mesh of a partition P is the value
The Darboux integral may be expressed as a limit of Darboux sums as $\|P\| \to 0$. For this we need the following technical lemma:
3.3.5 Lemma. Let $P' = P'_1 \times \cdots \times P'_d$ be a partition of $[a, b]$. Then there exists a positive constant $C$ such that for all partitions $P$ with $\|P\|$ sufficiently small,
Proof. Let $P' = P'_1 \times \cdots \times P'_d$ and let $P = P_1 \times \cdots \times P_d$ with $\|P\|$ sufficiently small so that each interval $I = I_1 \times \cdots \times I_d$ of $P$ has the property that either some $I_j$ contains exactly one interior point of $P'_j$, or no $I_j$ contains such a point. Let $J_\alpha$ denote the $d$-dimensional intervals of $P$ of the former type and $J_\beta$ the intervals of the latter type. The construction is illustrated in the figure, where $[x, y]$ is a coordinate interval of several $J_\alpha$'s and $z$ is an interior point of $P'_2$. Let $N$ be the number of intervals of type $J_\alpha$ and note that $N$ depends only on $P'$. Let $P''$ denote the common refinement of $P$ and $P'$. An interval in $P''$ is either a $J_\beta$ or was formed from a $J_\alpha$. Denote intervals of the latter type by $J_\gamma$. Since the introduction of a point into a $j$th coordinate interval of a $J_\alpha$ results in two $j$th coordinate intervals, each $J_\alpha$ can produce at most $2^d$ $J_\gamma$'s. Thus the number of $J_\gamma$'s is at most $2^d N$. Since the terms of $\overline{S}(f, P)$ and $\overline{S}(f, P'')$ corresponding to intervals $J_\beta$ are identical, we have
$$\overline{S}(f, P) - \overline{S}(f, P'') = \overline{S}(f, P) + \underline{S}(-f, P'') = \sum_\alpha M_{J_\alpha}(f)\,|J_\alpha| + \sum_\gamma m_{J_\gamma}(-f)\,|J_\gamma|. \tag{†}$$
Since the number of terms in the first sum in (†) is no more than $N$, we see that this sum is majorized by $M N \|P\|^d \le M N \|P\|$, where $M = \|f\|_\infty$ and $\|P\|$ is taken $< 1$. Similarly, the second sum in (†) is majorized by $M N 2^d \|P''\|^d \le M N 2^d \|P\|$, the inequality following from the fact that $P''$ is a refinement of $P$. Thus there exists a constant $C$ depending only on $P'$ and $f$ such that $\overline{S}(f, P) - \overline{S}(f, P'') \le C\|P\|$. Since $P''$ is a refinement of $P'$, $\overline{S}(f, P) - \overline{S}(f, P') \le C\|P\|$.
FIGURE 3.1: The intervals of $P$ (solid), $P'$ (dotted), and $P''$.
1 If the set of partitions of [a, b] is partially ordered by refinement, then the described convergence is
We may now prove the following complement to 3.3.4.
3.3.6 Theorem. A bounded function $f : [a, b] \to \mathbb{R}$ is Darboux integrable iff the limits $\lim_{\|P\| \to 0} \underline{S}(f, P)$ and $\lim_{\|P\| \to 0} \overline{S}(f, P)$ exist and are equal. In this case, their common value is $\int_a^b f$.
Proof. It suffices to prove that
$$\overline{\int_a^b} f = \lim_{\|P\| \to 0} \overline{S}(f, P) \quad\text{and}\quad \underline{\int_a^b} f = \lim_{\|P\| \to 0} \underline{S}(f, P).$$
Given ε > 0, choose a partition $P'$ such that $\overline{S}(f, P') < \overline{\int_a^b} f + \varepsilon$. In the notation of 3.3.5, for all partitions $P$ with sufficiently small mesh,
$$\overline{\int_a^b} f \le \overline{S}(f, P) \le \overline{S}(f, P') + C\|P\| < \overline{\int_a^b} f + \varepsilon + C\|P\|.$$
Therefore $\overline{S}(f, P) - \overline{\int_a^b} f < 2\varepsilon$ for all $P$ with sufficiently small mesh. This establishes the first limit. The second follows from the first by considering $-f$.
exists in the sense that, given ε > 0 there exists a δ > 0 such that
$|S(f, P, \xi) - R(f)| < \varepsilon$ for all partitions $P$ with $\|P\| < \delta$ and all choices of $\xi$.
The connection between the Darboux and Riemann integrals is given in the following result.
3.3.7 Theorem. A bounded function $f : [a, b] \to \mathbb{R}$ is Darboux integrable iff it is Riemann integrable. In this case, $R(f) = \int_a^b f$.
Proof. Since $\underline{S}(f, P) \le S(f, P, \xi) \le \overline{S}(f, P)$ for all $\xi$, the necessity follows from 3.3.6. For
the sufficiency, given ε > 0 choose a partition Pε such that
Since ξ is arbitrary, the approximation properties of suprema and infima imply that
⟦By the approximation property of infima and suprema, for each $n$ there exist partitions $P'_n$ and $P''_n$ of $[a, b]$ such that
$$\underline{\int_a^b} f - \frac{1}{n} < \underline{S}(f, P'_n) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, P''_n) < \overline{\int_a^b} f + \frac{1}{n}.$$
Since refinements decrease upper sums and increase lower sums, the inequalities still hold if $P'_n$ and $P''_n$ are replaced by refinements. Now let $P_1$ be a refinement of $P'_1$ and $P''_1$ with $\|P_1\| < 1$, then let $P_2$ be a refinement of $P_1$, $P'_2$, and $P''_2$ with $\|P_2\| < 1/2$, etc.⟧
(3) $f$ is Riemann integrable on $[a, b]$ iff $g = h$ a.e. In this case, $f$ is Lebesgue measurable and $\int_a^b f = \int_{[a,b]} f$.
⟦From (†), $f$ is Riemann integrable iff $\int_{[a,b]} (g - h) = 0$, which is equivalent to $g = h$ a.e. If the latter holds, then $\{f \ne h\}$ and $\{f \ne g\}$ are null sets, hence $f$ is Lebesgue measurable and the integrals are equal.⟧
(4) If f is continuous at x ∈ [a, b], then h(x) = g(x).
JGiven ε > 0, choose δ > 0 such that d(x, y) < δ implies |f (x) − f (y)| < ε, where d is
the metric on Rd defined by d(x, y) = maxj |xj − yj |. Choose m so that kPn k < δ for
all n ≥ m. For such n and for x ∈ I ∈ Pn ,
Letting n → ∞ yields
(5) Let x ∈ [a, b] such that x is not on the boundary of any subinterval of any Pn . If
h(x) = g(x), then f is continuous at x.
JGiven ε > 0, choose n such that
|gn (x) − g(x)| < ε/2 and |hn (x) − h(x)| < ε/2.
h(x) − ε/2 < hn (x) ≤ f (y) ≤ gn (x) < g(x) + ε/2 = h(x) + ε/2.
To complete the proof of the theorem, observe that, by step (3), f is Riemann integrable
iff λd (A) = 0. By step (6), this occurs iff λd (D) = 0.
(b) If g is Lebesgue integrable on [a, b), then g is improperly integrable on [a, b) and (3.8)
holds.
Proof. That $g$ is Lebesgue measurable on $[a, b)$ follows from 3.3.8. To prove (a), let $b_n \uparrow b$ and let $D$ denote the set of discontinuities of $g$ on $[a, b)$. Since $g$ is Riemann integrable on $[a, b_n]$, $\lambda([a, b_n] \cap D) = 0$. Then $1_{[a, b_n]}\, g$ is Lebesgue measurable for every $n$ and
$$\int_a^{b_n} g(x)\,dx = \int 1_{[a, b_n]}\, g\,d\lambda.$$
Taking limits, using 3.2.7, we see that $g$ is Lebesgue measurable on $[a, b)$ and (3.8) holds.
For (b) note that by 3.3.9 the functions $g^\pm$ are locally Riemann integrable on $[a, b)$. Therefore, by (a), $\int_a^n g^\pm(x)\,dx = \int_{[a,n]} g^\pm\,d\lambda$ for all $n$, and an application of the monotone
Exercises
3.35 Let µ be a Lebesgue-Stieltjes measure whose distribution function F has a positive continuous
derivative on R. Show that $d\mu = F'\,d\lambda$.
Z b Z b
Show that i(f ) = f dλ and s(f ) = f dλ.
a a
3.37 Show that a bounded function f on [a, b] is Riemann integrable iff there exists a real number L
such that S(f, Pn , ξ n ) → L for each sequence of tagged partitions (Pn , ξ n ) with kPn k → 0.
3.38 The gamma function is defined by $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, $x > 0$.
(a) Show that the integral converges.
(b) Integrate by parts to show that $\Gamma(x + 1) = x\Gamma(x)$ for all $x > 0$.
(c) Show that $\Gamma(n + 1) = n!$ for all $n \in \mathbb{N}$.
(d) Given that $\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2$, evaluate $\Gamma(1/2)$, $\Gamma(3/2)$, and $\Gamma(5/2)$.
(e) The formula $\Gamma(x) = x^{-1}\Gamma(x + 1)$ may be used to extend the gamma function to noninteger values $x < 0$. Use this to show that $\Gamma(-1/2) = -2\sqrt{\pi}$ and $\Gamma(-3/2) = 4\sqrt{\pi}/3$.
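3.39 Show that for $n \ge 2$,
$$\int_0^{\pi/2} \sin^n x\,dx = \int_0^{\pi/2} \cos^n x\,dx = \int_0^1 \frac{x^n}{\sqrt{1 - x^2}}\,dx =
\begin{cases}
\dfrac{(n-1)(n-3)\cdots 4\cdot 2}{n(n-2)\cdots 5\cdot 3}, & n \text{ odd},\\[2ex]
\dfrac{\pi}{2}\,\dfrac{(n-1)(n-3)\cdots 5\cdot 3}{n(n-2)\cdots 4\cdot 2}, & n \text{ even}.
\end{cases}$$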
3.41 Show that if $f_n$ is Riemann integrable on $[a, b]$ and $f_n \to f$ uniformly on $[a, b]$, then $f$ is Riemann integrable and $\int_a^b f_n \to \int_a^b f$. Show also that the assertion is false if the convergence is merely pointwise.
is improperly Riemann integrable on [1, ∞) for any p > 0, but is Lebesgue integrable iff p > 1.
3.43 Show that $(x^{-1} \sin x)^2$ extended continuously to $[0, \infty)$ is Lebesgue integrable and improperly Riemann integrable on $[0, \infty)$ and
$$\int_0^\infty \left(\frac{\sin x}{x}\right)^2 dx = \int_0^\infty \frac{\sin x}{x}\,dx.$$
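3.44 Let $p > 0$. Show that
$$\int_0^1 \frac{dx}{1 + x^p} = \sum_{n=0}^\infty \left[\frac{1}{2np + 1} - \frac{1}{(2n + 1)p + 1}\right].$$
Show that for suitable $p$ the formula yields
$$\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \quad\text{and}\quad \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$
⟦Use the identity $(1 + y)^{-1} = \sum_{n=0}^\infty \left(y^{2n} - y^{2n+1}\right)$, $0 \le y < 1$.⟧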
(b) Show that $x^{2p}/(e^{x^2} - 1)$, extended continuously to $[0, \infty)$, is Lebesgue integrable and
$$\int_0^\infty \frac{x^{2p}}{e^{x^2} - 1}\,dx = \frac{\sqrt{\pi}\,(2p - 1)!}{4^p\,(p - 1)!} \sum_{n=1}^\infty \frac{1}{n^{p + 1/2}}.$$
⟦Use $(z - 1)^{-1} = \sum_{n=1}^\infty z^{-n}$, $z > 1$.⟧
Fatou’s Lemma
The following result is useful in cases where limn fn does not exist.
3.4.2 Theorem. If $f_n$ and $g$ are measurable, $f_n \ge g$ a.e. for all $n$, and $\int g^-\,d\mu < \infty$, then
$$\int \liminf_n f_n\,d\mu \le \liminf_n \int f_n\,d\mu. \tag{3.10}$$
The inequality in (3.10) may be strict. For example, if $\mu = \lambda$ and $f_n = n^2\, 1_{[0,1/n]}$, then the left side of (3.10) is zero while the right side is $\infty$.
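The strict-inequality example can be made concrete with a short computation. The sketch below is only an illustration: the function names and the sample point `x = 0.01` are assumptions introduced here. It evaluates $f_n = n^2 1_{[0,1/n]}$ at a fixed $x > 0$ for large $n$ and records the integrals $\int f_n\,d\lambda = n$.

```python
def f(n, x):
    """f_n = n^2 times the indicator of [0, 1/n], evaluated at x."""
    return n * n if 0 <= x <= 1 / n else 0


def integral(n):
    """Integral of f_n over R: height n^2 times width 1/n, which equals n."""
    return n * n / n


x = 0.01  # assumed sample point; any fixed x > 0 behaves the same way
tail_values = [f(n, x) for n in range(101, 200)]  # here n > 1/x, so f_n(x) = 0
integrals = [integral(n) for n in (1, 10, 100, 1000)]
```

For every fixed $x > 0$ the values $f_n(x)$ are eventually zero, so $\liminf_n f_n = 0$ a.e., while the integrals grow without bound.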
and
$$\int (g - f)\,d\mu \le \liminf_n \int (g - f_n)\,d\mu = \int g\,d\mu - \limsup_n \int f_n\,d\mu.$$
Subtracting $\int g\,d\mu$ in each inequality yields
$$\int f\,d\mu \le \liminf_n \int f_n\,d\mu \le \limsup_n \int f_n\,d\mu \le \int f\,d\mu.$$
We note that the hypothesis that the functions $|f_n|$ be dominated by an integrable function cannot be omitted. For example, $1_{[n,2n]} \to 0$ on $\mathbb{R}$, but $\int 1_{[n,2n]}\,d\lambda \to \infty$.
We conclude this section with two applications of the dominated convergence theorem.
The first, whose proof is an exercise for the reader, describes a continuity property of integrals.
The second gives sufficient conditions for differentiating “under the integral sign.”
3.4.4 Corollary. Let $I$ be an open $d$-dimensional interval and let $f$ be $\mathcal{B}(I) \otimes \mathcal{F}$-measurable on $I \times X$ such that $f(t, x)$ is continuous in $t$ for each $x \in X$. If there exists an integrable function $g$ on $X$ such that $|f(t, x)| \le g(x)$ for all $t$ and $x$, then $\int f(t, x)\,d\mu(x)$ is continuous in $t$.
3.4.5 Corollary. Let $I$ be an open $d$-dimensional interval and let $f$ be $\mathcal{B}(I) \otimes \mathcal{F}$-measurable on $I \times X$ such that for each $t$ in $I$ the function $f(t, \cdot)$ is $\mu$-integrable. Let $\alpha$ be a fixed multi-index and assume that for all multi-indices $\beta$ with $|\beta| \le |\alpha|$ the derivative $\partial_t^\beta f(t, x)$ exists for each $t$ and $x$ and is measurable in $x$ for each fixed $t$. If there exists an integrable function $g$ on $X$ such that $|\partial_t^\beta f(t, x)| \le g(x)$ for all such $\beta$, $t$ and $x$, then
$$\partial_t^\alpha \int f(t, x)\,d\mu(x) = \int \partial_t^\alpha f(t, x)\,d\mu(x).$$
Proof. We prove the right-hand derivative version for the case $d = 1$. The general formula follows by induction. Fix $t \in I$ and let $t_n \downarrow t$. Set
$$H(t) = \int f(t, x)\,d\mu(x) \quad\text{and}\quad h_n(x) = \frac{f(t_n, x) - f(t, x)}{t_n - t}.$$
By the mean value theorem, $h_n(x) = f_t(s, x)$ for some $s = s(n, x) \in (t, t_n)$. Then $|h_n| \le g$ and $h_n(x) \to f_t(t, x)$, hence, by the dominated convergence theorem,
$$\frac{H(t_n) - H(t)}{t_n - t} = \int h_n(x)\,d\mu(x) \to \int f_t(t, x)\,d\mu(x).$$
This shows that the right-hand derivative of $H$ exists at $t$ and equals $\int f_t(t, x)\,d\mu(x)$.
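A quick numerical sanity check of differentiation under the integral sign can be sketched as follows. The choices here are assumptions made for illustration only: $X = [0, 1]$ with Lebesgue measure, $f(t, x) = \sin(tx)$ (so $|f_t(t, x)| = |x \cos(tx)| \le 1$, an integrable bound), a midpoint rule standing in for the integral, and a central difference standing in for the derivative of $H$.

```python
import math


def midpoint_integral(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f(t, x):
    return math.sin(t * x)


def df_dt(t, x):
    # Partial derivative of f with respect to t; bounded by 1 on [0, 1].
    return x * math.cos(t * x)


def H(t):
    return midpoint_integral(lambda x: f(t, x), 0.0, 1.0)


t = 1.3      # assumed evaluation point
step = 1e-4  # assumed finite-difference step
derivative_of_integral = (H(t + step) - H(t - step)) / (2 * step)
integral_of_derivative = midpoint_integral(lambda x: df_dt(t, x), 0.0, 1.0)
# Closed form of the integral of x*cos(t*x) over [0, 1], obtained by parts.
closed_form = math.sin(t) / t - (1 - math.cos(t)) / t**2
```

The three quantities agree up to discretization error, as the corollary predicts for this dominated family.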
Exercises
3.47 Find all $p > 0$ for which there is a $\lambda$-integrable function $g$ on $\mathbb{R}^+$ such that $n^{-p}\, 1_{[0,n]} \le g$ for all $n$.
3.48 Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathcal{B}(\mathbb{R})$ and $f$ $\mu$-integrable. Show that $\lim_n \int_{a_n}^{b_n} f\,d\mu = 0$ for any pair of sequences $(a_n)$ and $(b_n)$ with $a_n < b_n$ and $a_n \to \infty$. Show that this may not hold if $f \ge 0$ is not integrable.
3.49 Let µ be a Lebesgue-Stieltjes measure on B(R) and f integrable. Let g be measurable and
bounded on R such that r := limt→∞ g(t) exists and is finite. Show that
$$\lim_{x \to \infty} \int g(x + t) f(t)\,d\mu(t) = r \int f\,d\mu.$$
3.50 Let µ be a Lebesgue-Stieltjes measure on B(Rd ) and f > 0 µ-integrable. Prove that
Z n Z Z
(a) n ln(1 + n−1 f ) dµ → f dµ. (b) n ln(1 + n−2 f ) dµ → 0.
1/n
Z Z Z
np sinp n−1 f 1/p dµ → f 1/n dµ → µ(E).
(c) f dµ. (d)
E
3.53 [↑ 3.38] Prove that the $k$th derivative of the gamma function is
$$\Gamma^{(k)}(x) = \int_0^\infty t^{x-1} e^{-t} \ln^k t\,dt, \quad x > 0.$$
3.55 Let $f : \mathbb{R} \to \mathbb{K}$ be $\lambda$-integrable. Show that the series $\sum_{k=-\infty}^{\infty} f(k + x) := \lim_n \sum_{k=-n}^{n} f(k + x)$ converges absolutely a.e. on $\mathbb{R}$.
3.56 Let f : R → R be Lebesgue integrable on every interval and satisfy f (x + y) = f (x) + f (y) for
all $x$, $y$. Show that $f(x) = f(1)x$ for all $x$. ⟦Show first that $f$ is continuous.⟧
X
3.57 Let fn : X → [0, ∞) be integrable and fn+1 ≤ fn a.e. for all n. Show that (−1)n+1 fn is
Z X Z n
X
integrable and that fn dµ = fn dµ.
n n
3.58 Let g be integrable on X and let (fn ) be a sequence of real-valued measurable functions on X
such that |fn | ≤ g. Prove that
$$\int \liminf_n f_n\,d\mu \le \liminf_n \int f_n\,d\mu \le \limsup_n \int f_n\,d\mu \le \int \limsup_n f_n\,d\mu.$$
n n n n
3.59 Let $f$, $g$, $f_n$, $g_n$ be real valued and integrable such that $f_n \to f$, $g_n \to g$ a.e., $|f_n| \le g_n$ a.e., and $\int g_n\,d\mu \to \int g\,d\mu$. Prove that $\int f_n\,d\mu \to \int f\,d\mu$.
3.60 [↑ 3.59] Let $f_n$, $g_n$, $h_n$, $f$, $g$, $h$ be integrable, $f_n \le g_n \le h_n$ a.e. for all $n$, $f_n \to f$ a.e., $g_n \to g$ a.e., and $h_n \to h$ a.e. Show that if $\int f_n\,d\mu \to \int f\,d\mu$ and $\int h_n\,d\mu \to \int h\,d\mu$, then $\int g_n\,d\mu \to \int g\,d\mu$.
3.61 Show that the dominated convergence theorem holds if the hypothesis $f_n \to f$ a.e. is replaced by $f_n \xrightarrow{\mu} f$.
Moreover, if the measure spaces (X, F, µ) and (Y, G, ν) are σ-finite, then the measure µ ⊗ ν
is unique with respect to property (3.11).
Proof. Define $\mu \otimes \nu$ on the semiring $\mathcal{R}$ by Equation (3.11). We claim that $\mu \otimes \nu$ is a measure on $\mathcal{R}$. Clearly, $(\mu \otimes \nu)(\emptyset) = 0$. For countable additivity, let $(A_n \times B_n)$ be a disjoint sequence in $\mathcal{R}$ such that $\bigcup_n A_n \times B_n = A \times B \in \mathcal{R}$. Then for $(x, y) \in X \times Y$, $1_A(x) 1_B(y) = \sum_n 1_{A_n}(x) 1_{B_n}(y)$. For fixed $x$ we can integrate with respect to $y$ and use 3.2.9 to obtain $1_A(x)\nu(B) = \sum_n 1_{A_n}(x)\nu(B_n)$. Integrating with respect to $x$ yields $(\mu \otimes \nu)(A \times B) = \sum_n (\mu \otimes \nu)(A_n \times B_n)$, verifying the claim. By 1.6.4, $\mu \otimes \nu$ may be extended to a measure on $\mathcal{F} \otimes \mathcal{G}$. If $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$ are σ-finite, then $(\mathcal{R}, \mu \otimes \nu)$ is σ-finite, hence uniqueness follows from 1.6.9.
The measure space (X × Y, F ⊗ G, µ ⊗ ν) is called the product of the measure spaces
(X, F, µ) and (Y, G, ν).
Fubini’s Theorem
3.5.2 Theorem (Fubini-Tonelli). Let (X, F, µ) and (Y, G, ν) be σ-finite measure spaces and
let f : X × Y → K be F ⊗ G-measurable.
(a) If $f \ge 0$, then the functions $\int_X f(x, y)\,d\mu(x)$ and $\int_Y f(x, y)\,d\nu(y)$ are measurable in $y$ and $x$, respectively, and
$$\int_{X \times Y} f(x, y)\,d(\mu \otimes \nu)(x, y) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x). \tag{3.12}$$
Proof. Recall that a measurable function f (x, y) is separably measurable, that is, measurable
in x for each fixed y and measurable in y for each fixed x (Ex. 2.13). Thus the inner integrals
in (3.12) are legitimate.
We now make the following reductions. First, part (b) of the theorem is a consequence
of part (a). Indeed, if one of the inequalities in (b) holds, then f is integrable by part (a)
applied to |f |. By considering real, imaginary, positive, and negative parts, we see that (3.12)
holds. Second, to prove (a) we may assume by the usual arguments that f is an indicator
function. Thus to prove the theorem it suffices to show that for any C ∈ F ⊗ G,
$$\eta(C) = \int_Y \int_X 1_C(x, y)\,d\mu(x)\,d\nu(y) = \int_X \int_Y 1_C(x, y)\,d\nu(y)\,d\mu(x), \quad\text{where } \eta := \mu \otimes \nu. \tag{†}$$
For this we may assume that the measure spaces (X, F, µ) and (Y, G, ν) are finite. Indeed,
if (†) holds in the finite case and if Xn ↑ X, where µ(Xn ) < ∞ and ν(Y ) < ∞, then by
Therefore, H is closed under increasing unions, completing the proof that H is a λ-system
and establishing the theorem.
Note that a special case of part (a) is the interchange of summation in $\sum_n \sum_m a_{mn}$, where $a_{mn} \ge 0$, even when the double sum $\sum_{m,n} a_{mn}$ is infinite.
3.5.3 Remarks. (a) The σ-finiteness hypothesis in Fubini’s theorem is essential: Consider
Lebesgue measure λ and counting measure ν on ([0, 1], B[0, 1]). The diagonal E = {(t, t) :
t ∈ [0, 1]} is closed and so is a member of B[0, 1] ⊗ B[0, 1]. But for all x and y
$$\int 1_E(t, y)\,d\lambda(t) = \lambda\{y\} = 0 \quad\text{and}\quad \int 1_E(x, t)\,d\nu(t) = \nu\{x\} = 1,$$
(b) Part (b) of the theorem fails if the absolute values on the integrands are removed.
Indeed for Lebesgue measure on [0, 1] we have
$$\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx = -\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx\,dy = \frac{\pi}{4}.$$
Thus $(x^2 - y^2)(x^2 + y^2)^{-2}$ is not integrable on $[0, 1] \times [0, 1]$. (See Ex. 3.71.) ♦
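The value $\pi/4$ in remark (b) can be verified by hand. The following worked computation is a sketch of one way to do it (the antiderivative used is an elementary observation, not taken from the text): for fixed $x > 0$ the inner integrand is an exact $y$-derivative.

```latex
% For fixed x > 0:
\frac{\partial}{\partial y}\,\frac{y}{x^2 + y^2}
  = \frac{(x^2 + y^2) - 2y^2}{(x^2 + y^2)^2}
  = \frac{x^2 - y^2}{(x^2 + y^2)^2},
\qquad\text{so}\qquad
\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy = \frac{1}{1 + x^2}.
% Integrating in x:
\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx
  = \int_0^1 \frac{dx}{1 + x^2} = \arctan 1 = \frac{\pi}{4}.
% Interchanging the roles of x and y changes the sign of the integrand,
% so the iterated integral in the other order equals -\pi/4.
```

Since the two iterated integrals differ, part (b) of the Fubini-Tonelli theorem shows that the integrand cannot be integrable on the square.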
Moreover, if the measure spaces (Xi , Fi , µi ) are σ-finite, then µ is unique with respect to
property (3.13).
3.5.5 Example. Consider the measure spaces (Rpi , B(Rpi ), λpi ) (i = 1, . . . , d) and
(Rp , B(Rp ), λp ), where p = p1 + · · · + pd . Since λp1 ⊗ · · · ⊗ λpd = λp on the semiring of
half-open intervals, the measures must be equal on B(Rp1 ) ⊗ · · · ⊗ B(Rpd ) = B(Rp ). ♦
3.5.6 Theorem. Let the measure spaces (Xi , Fi , µi ) be σ-finite and let f : X → R be
F-measurable.
(a) If $f \ge 0$, then
$$\int f\,d\mu = \int \cdots \int f(x_1, \ldots, x_d)\,d\mu_1(x_1) \cdots d\mu_d(x_d), \tag{3.14}$$
where $\int f(x_1, \ldots, x_i, \ldots, x_d)\,d\mu_i(x_i)$ is measurable in $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$, and the iterated integration may be carried out in any of the $d!$ orders.
(b) If for some permutation $(i_1, i_2, \ldots, i_d)$ of the indices $1, 2, \ldots, d$
$$\int \cdots \int |f(x_{i_1}, \ldots, x_{i_d})|\,d\mu_{i_1}(x_{i_1}) \cdots d\mu_{i_d}(x_{i_d}) < \infty, \tag{3.15}$$
then $f$ is $\mu$-integrable and (3.14) holds, where $\int \cdots \int f(x_1, \ldots, x_d)\,d\mu_1(x_1) \cdots d\mu_i(x_i)$ is defined and finite for a.a. values of $x_{i+1}, \ldots, x_d$ and is integrable in these variables.
3.5.7 Example. In elementary calculus, integration is sometimes carried out on regions
in R3 bounded by surfaces. This idea generalizes to higher dimensions as follows: Given
continuous functions u2 (x1 ) ≤ v2 (x1 ) on E1 := [a, b], and in general continuous functions
uk+1 (x1 , . . . , xk ) ≤ vk+1 (x1 , . . . , xk ) defined on the set
$$E_k := \{(x_1, \ldots, x_k) : a \le x_1 \le b,\ u_2(x_1) \le x_2 \le v_2(x_1),\ \ldots,\ u_k(x_1, \ldots, x_{k-1}) \le x_k \le v_k(x_1, \ldots, x_{k-1})\},$$
Exercises
3.62 Show that the product of complete measure spaces need not be complete.
3.63 Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R}^d)$. Find $\int \mu(I_x)\,dx$, where
3.64 Let $a, b > 0$. Use Fubini's theorem, the dominated convergence theorem, and the identity $1/x = \int_0^\infty e^{-xt}\,dt$, $x > 0$, to prove that
(a) $\displaystyle\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}$. (b) $\displaystyle\int_0^\infty \frac{e^{-ax} - e^{-bx}}{x}\,dx = \ln(b) - \ln(a)$.
3.65 Let µ be a Lebesgue-Stieltjes measure on R. Show that if 0 < µ(E) < ∞ and a > 0, then
$$\frac{1}{\mu(E)} \int_{-\infty}^{\infty} \mu\big((x, x + a] \cap E\big)\,dx = a.$$
3.66 Let (Xi , Fi , µi ) (i = 1, 2) be σ-finite measure spaces and let fi ≥ 0 be Fi -measurable. Find a
density function for the product measure (f1 µ1 ) ⊗ (f2 µ2 ) .
3.67 Let (X, F, µ) be σ-finite and f : X → [0, ∞) measurable. Prove that the integral of f is the
“area under the graph,” that is,
Z
f dµ = (µ ⊗ λ){(x, t) : 0 < t < f (x)} = (µ ⊗ λ){(x, t) : 0 < t ≤ f (x)}.
Conclude that if f is integrable, then the graph {(x, t) : t = f (x)} has measure zero.
3.68 (Cavalieri’s principle). For E ∈ B(Rd ) and t ∈ R, define
Et := {x = (x1 , . . . , xd−1 ) ∈ Rd−1 : (x, t) ∈ E}.
Show that $E_t \in \mathcal{B}(\mathbb{R}^{d-1})$ for all $t \in [a, b]$ and prove that
$$\lambda^d\big(E \cap (\mathbb{R}^{d-1} \times [a, b])\big) = \int_a^b \lambda^{d-1}(E_t)\,dt.$$
Thus the “volume” of the portion of E between the hyperplanes xd = a and xd = b is the
integral from a to b of the “cross-sectional areas” λd−1 (Et ).
3.69 Let (X, F, µ) be σ-finite and f : X → [0, ∞) measurable. Suppose that ϕ : [0, ∞) → [0, ∞) has
a positive continuous derivative and ϕ(0) = 0. Prove that
$$\int_X \varphi \circ f\,d\mu = \int_0^\infty \varphi'(x)\,\mu\{f \ge x\}\,dx.$$
Deduce, in particular, that $\int_X f^p\,d\mu = \int_0^\infty p x^{p-1}\,\mu\{f \ge x\}\,dx$ ($p \ge 1$).
Use Fubini's theorem and induction to show that $\lambda^n\big(S(a, n)\big) = \dfrac{a^n}{n!}$.
3.71 Verify the assertions in 3.5.3. Also, show directly that $\int_{[0,1]^2} |x^2 - y^2|(x^2 + y^2)^{-2}\,d\lambda^2(x, y) = \infty$.
3.72 Let µ be a translation invariant Lebesgue-Stieltjes measure on B(Rd ) and set E = [0, 1]d . Use
Fubini’s theorem to show that for all B ∈ B(Rd ),
$$\iint 1_E(x) 1_B(y)\,d\lambda^d(x)\,d\mu(y) = \iint 1_E(y) 1_B(x)\,d\lambda^d(x)\,d\mu(y),$$
hence $\mu(B) = \mu(E)\lambda^d(B)$. Conclude that Lebesgue measure $\lambda^d$ is the only σ-finite translation invariant measure $\mu$ on $\mathcal{B}(\mathbb{R}^d)$ with $\mu([0, 1]^d) = 1$. ⟦Consider $\iint 1_E(x + y) 1_B(y)\,d\lambda^d(x)\,d\mu(y)$.⟧
The integrand here is the density of a normal random variable with mean m and standard
deviation σ.
Integration by Parts
Let F and G be distribution functions on R with limx→−∞ F (x) = limx→−∞ G(x) = 0,
and let µ and ν be the corresponding Lebesgue-Stieltjes measures:
[Figure: the regions $R_1$ and $R_2$, with corner point $(a, a)$.]
By Fubini's theorem,
$$(\mu \otimes \nu)(R_1) = \iint 1_{(a,b]}(x)\, 1_{(a,x]}(y)\,d\nu(y)\,d\mu(x) = \int_{(a,b]} [G(x) - G(a)]\,d\mu(x) = \int_{(a,b]} G(x)\,d\mu(x) - G(a)\big[F(b) - F(a)\big],$$
and
$$(\mu \otimes \nu)(R_2) = \iint 1_{(a,b]}(y)\, 1_{(a,y)}(x)\,d\mu(x)\,d\nu(y) = \int_{(a,b]} [F(y-) - F(a)]\,d\nu(y) = \int_{(a,b]} F(y-)\,d\nu(y) - F(a)\big[G(b) - G(a)\big].$$
Adding these equations and using (†), we find after cancellations that
$$F(b)G(b) - F(a)G(a) = \int_{(a,b]} G(x)\,d\mu(x) + \int_{(a,b]} F(y-)\,d\nu(y).$$
Spherical Coordinates
yj = xj /(r sin θ1 ), 2 ≤ j ≤ d,
has a unique solution (θ2 , . . . , θd−1 ). Then (3.18) has the unique solution (r, θ1 , . . . , θd−1 ).
By standard properties of determinants and a reduction argument,
For $d \ge 1$, let $C_r^d(x)$ denote the closed ball in $\mathbb{R}^d$ with center $x$ and radius $r$. We show that $\lambda^d\big(C_r^d(x)\big) = r^d \alpha_d$, where
$$\alpha_d = \begin{cases}
\dfrac{(2\pi)^{d/2}}{d(d-2)\cdots 4\cdot 2} & \text{if } d \text{ is even},\\[2ex]
\dfrac{2(2\pi)^{(d-1)/2}}{d(d-2)\cdots 3\cdot 1} & \text{if } d \text{ is odd}
\end{cases}
\;=\; \text{volume of } C_1^d(0) \text{ in } \mathbb{R}^d. \tag{3.20}$$
By translation invariance and the dilation property of Lebesgue measure, $\lambda^d\big(C_r^d(x)\big) = r^d \lambda^d\big(C_1^d(0)\big)$, hence it suffices to establish the formula for the case $r = 1$ and $x = 0$, which is the version expressed in (3.20).
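Formula (3.20) is easy to tabulate. The sketch below evaluates it for small $d$ and, as a cross-check, compares it with the expression $\pi^{d/2}/\Gamma(d/2 + 1)$; that gamma-function form is a standard identity brought in here as an assumption, not a formula quoted from this section, and the function names are hypothetical.

```python
import math


def alpha(d):
    """Volume of the closed unit ball in R^d via the double-factorial formula (3.20)."""
    double_factorial = math.prod(range(d, 0, -2))  # d(d-2)... ending in 2 or 1
    if d % 2 == 0:
        return (2 * math.pi) ** (d // 2) / double_factorial
    return 2 * (2 * math.pi) ** ((d - 1) // 2) / double_factorial


def alpha_gamma(d):
    """Same volume through the assumed identity pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


volumes = {d: alpha(d) for d in range(1, 8)}
```

For $d = 1, 2, 3$ this reproduces the familiar values $2$, $\pi$, and $4\pi/3$.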
To simplify notation, for $1 \le k \le d$ let $C^k(r) := C_r^k(0)$ and let $1_k(r; x_1, \ldots, x_k)$ denote the indicator function of $C^k(r)$. Formula (3.20) is easily verified for $d = 1$ and $2$, so we assume that $d > 2$. Since
Let $S^{d-1} := \{x \in \mathbb{R}^d : |x| = 1\}$, where $|x|$ is the Euclidean norm of $x$. The theorem in this subsection asserts that the Lebesgue integral of a function on $\mathbb{R}^d$ may be calculated by a two-stage process, integrating first over $S^{d-1}$ with respect to a surface measure $\mu$ and then radially outward. The surface measure is constructed as follows: Set $\mathbb{R}^d_* := \mathbb{R}^d \setminus \{0\}$ and define a mapping
We then have
3.6.1 Theorem. If $f : \mathbb{R}^d_* \to \mathbb{K}$ is Borel measurable, then
$$\int_{\mathbb{R}^d_*} f(x)\,d\lambda^d(x) = \int_0^\infty \int_{S^{d-1}} r^{d-1} f(rx)\,d\mu(x)\,dr$$
in the sense that if one side of the equation is defined, then so is the other and equality holds.
Proof. Define a measure $\rho$ on $\mathcal{B}(0, \infty)$ by $d\rho := r^{d-1}\,d\lambda$. By Fubini's theorem, the desired equation may be written
$$\int_{\mathbb{R}^d_*} f(x)\,d\lambda^d(x) = \int_{(0,\infty) \times S^{d-1}} (f \circ T)(r, x)\,d(\rho \times \mu)(r, x).$$
Since the collection of intervals is a π-system, by the measure uniqueness theorem it suffices
to take A = (a, b]. The above equation then reduces to
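$$\lambda^d\big(T((a, b] \times B)\big) = d^{-1}(b^d - a^d)\,\mu(B).$$
But this follows from the dilation property of $\lambda^d$, using the relations
$$T\big((a, b] \times B\big) = T\big((0, b] \times B\big) \setminus T\big((0, a] \times B\big), \qquad T\big((0, c] \times B\big) = c\,T\big((0, 1] \times B\big).$$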
Theorem 3.6.1 is useful for calculating integrals of radial functions, that is, functions f
on Rd of the form f (x) = g(|x|).
3.6.2 Corollary. Let $g$ be a Borel function on $(0, \infty)$. Then
$$\int_{\mathbb{R}^d_*} g(|x|)\,d\lambda(x) = d\alpha_d \int_0^\infty r^{d-1} g(r)\,dr$$
Taking $g(r) = e^{-r^2}$ we have
$$\int_{\mathbb{R}^d_*} e^{-|x|^2}\,dx = \mu(S^{d-1}) \int_0^\infty r^{d-1} e^{-r^2}\,dr = \mu(S^{d-1})\,\frac{\pi^{d/2}}{d\alpha_d},$$
where the last equality is from Ex 3.74. By Fubini's theorem and (3.16), the integral on the left is $\pi^{d/2}$. Therefore, $\mu(S^{d-1}) = d\alpha_d$, completing the proof.
Corollary 3.6.2 may be used to establish the integrability of certain functions on Rd :
3.6.3 Example. Let f (x) = (1 + c|x|s )−t , where c, s, t > 0. Then
Z ∞ Z ∞ Z 1 Z ∞
rd−1 1
(dαd )−1 f dλ = dr ≤ rd−1 dr + c−t dr.
−∞ 0 (1 + crs )t 0 1 rst−d+1
Exercises
3.73 Let F be a continuous distribution function on R with finite limits
3.77 Let 1 ≤ d < m, U ⊆ Rd open and f : U → Rm such that |f (x) − f (y)| ≤ C|x − y|p , where
C > 0 and p > d/m. Verify the following to prove that λm f (U ) = 0. What if p ≤ d/m?
(a) It suffices to prove that λm f (I) = 0 for a d-dimensional interval I = [a, b] ⊆ U .
(b) For fixed n ∈ N and each k form the partition Pk,n = {ak + j(bk − ak )/n : j = 0, . . . , n} of
the kth coordinate interval [ak , bk ] of I, k = 1, . . . , d. Let Jk ∈ Pk,n , J := J1 × · · · × Jd and y
the midpoint of J. Then for all x ∈ J
$$|f(x) - f(y)| \le C n^{-p} \big[(a_1 - b_1)^2 + \cdots + (a_d - b_d)^2\big]^{p/2} := M n^{-p}.$$
3.78 Show that the measure µ in 3.6.1 satisfies L(µ) = | det(L−1 )| µ for any 1-1 linear transformation
L on Rd for which |L(x)| = |x|. In particular, µ is invariant under rotations.
3.79 Let M , a, and ε be positive constants. Suppose f is Borel measurable on Rd and satisfies
$$|f(x)| \le \begin{cases} M|x|^{\varepsilon - d} & \text{if } |x| \le a,\\ M|x|^{-\varepsilon - d} & \text{if } |x| > a. \end{cases}$$
3.80 Let 0 ≤ a < b ≤ ∞ and set A(a, b) = {x ∈ Rd : a < |x| < b}. Prove that
$$\int_{A(a,b)} f(x)\,d\lambda^d(x) = \int_a^b \int_{S^{d-1}} r^{d-1} f(rx)\,d\mu(x)\,dr,$$
In this chapter we examine the properties of spaces of measurable functions $f$ for which $|f|^p$ ($p > 0$) is integrable, the so-called $L^p$ spaces. These are among the most important examples of Banach spaces. In particular, the case $p = 2$ is of critical importance in Fourier analysis.
If there is no ambiguity, we write $L^p(X)$, $L^p(\mu)$, or $L^p$ instead of $L^p(X, \mathcal{F}, \mu)$. Note that $L^1(\mu)$ is just the space of $\mu$-integrable functions.
The quantity $\|f\|_p$ is called the $L^p$ norm of $f$. This terminology is a slight abuse of language, since the property of positivity of a norm does not always hold. Indeed, $\|f\|_p = 0$ implies only that $f = 0$ a.e. We resolve this discrepancy informally by identifying functions that are equal a.e. This will cause no problems as long as the reader keeps in mind that the symbol $f$ has the dual interpretation of a function as well as the equivalence class of all measurable functions equal a.e. to $f$. A precise resolution may be given in terms of quotient spaces. (See Ex. 8.56.)
The following inequality will be needed to establish that $\|\cdot\|_p$ is indeed a norm (subject to the aforementioned convention of identifying functions that are equal a.e.).
4.1.1 Lemma. Let $a, b > 0$ and $0 < t < 1$. Then $a^t b^{1-t} \le ta + (1 - t)b$, equality holding iff $a = b$.
Proof. Equality clearly holds if $a = b$. Assume $a < b$ and set $x = ta + (1 - t)b$. To prove that $a^t b^{1-t} < x$ we use the strict concavity of $\ln x$ established as follows: By the mean value theorem there exist $y \in (a, x)$ and $z \in (x, b)$ such that
$$\frac{\ln b - \ln x}{b - x} = \frac{1}{z} < \frac{1}{y} = \frac{\ln x - \ln a}{x - a}.$$
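The inequality of Lemma 4.1.1 can be spot-checked numerically. The sketch below is only an illustration under assumed sampling ranges ($a, b \in [0.1, 10]$, $t \in [0.01, 0.99]$) with a fixed random seed; the helper name `weighted_gap` is hypothetical.

```python
import random


def weighted_gap(a, b, t):
    """Return t*a + (1-t)*b - a**t * b**(1-t), which Lemma 4.1.1 says is >= 0."""
    return t * a + (1 - t) * b - a**t * b ** (1 - t)


random.seed(0)  # fixed seed so the sample is reproducible
samples = [
    (random.uniform(0.1, 10), random.uniform(0.1, 10), random.uniform(0.01, 0.99))
    for _ in range(1000)
]
smallest_gap = min(weighted_gap(a, b, t) for a, b, t in samples)
gap_when_equal = weighted_gap(3.0, 3.0, 0.25)  # the equality case a = b
```

Up to rounding, the gap is nonnegative on the whole sample and vanishes when $a = b$, in line with the lemma.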
hence, by the second part of 4.1.2, there exist nonnegative constants a1 , b1 not both zero
and nonnegative constants a2 , b2 not both zero such that
hence
$$|f + g| = |f| + |g| = (1 + b)|g| \quad\text{a.e. on the set } E := \{f + g \ne 0\}.$$
Therefore,
$$\left|1 + \frac{f}{g}\right| = 1 + b = 1 + \left|\frac{f}{g}\right| \quad\text{a.e. on } E.$$
4.1.5 Minkowski’s Integral Inequality. Let (X, F, µ) and (Y, G, ν) be σ-finite measure
spaces, f a nonnegative F ⊗ G-measurable function, and 1 ≤ p < ∞. Then
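$$\left(\int \left(\int f(x, y)\,d\nu(y)\right)^{p} d\mu(x)\right)^{1/p} \le \int \left(\int f(x, y)^p\,d\mu(x)\right)^{1/p} d\nu(y), \tag{4.1}$$
Then
$$g(x)^q = \|h\|_p^{q - qp}\, h(x)^{qp - q} = \|h\|_p^{-p}\, h(x)^p, \quad \|g\|_q = 1, \quad\text{and}\quad \int hg\,d\mu = \|h\|_p. \tag{†}$$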
The Case p = ∞
The space of L∞ functions on X is defined by
show that $L^\infty$ is a linear space and that $\|f\|_\infty$ satisfies the triangle inequality. Moreover, from 4.1.6, $\|f\|_\infty \ge 0$, equality holding iff $f = 0$ a.e.
To see that $L^\infty$ is complete, we use 0.4.3 again. Let $(f_n)$ be a sequence in $L^\infty$ such that $\sum_{n=1}^\infty \|f_n\|_\infty < \infty$. By 4.1.6, the sets $N_k := \{|f_k| > \|f_k\|_\infty\}$ have measure zero, hence so does $N := \bigcup_k N_k$. Moreover, the series $\sum_{n=1}^\infty |f_n|$ converges on $N^c$, hence the function $f := 1_{N^c} \sum_{n=1}^\infty f_n$ is finite a.e., measurable, and is a version of $\sum_{n=1}^\infty f_n$. Since
$$\left| f - \sum_{k=1}^n f_k \right| = \left| \sum_{k>n} f_k \right| \le \sum_{k>n} \|f_k\|_\infty \quad\text{a.e.,}$$
by 4.1.6(b) we have
$$\left\| f - \sum_{k=1}^n f_k \right\|_\infty \le \sum_{k>n} \|f_k\|_\infty.$$
This shows that $f \in L^\infty$ and that $\sum_{n=1}^\infty f_n$ converges to $f$ in the $L^\infty$ norm.
Hölder's inequality may now be extended to the case $1 \le p \le \infty$, where $\frac{1}{\infty} := 0$:
$$\|fg\|_1 \le \|f\|_1 \|g\|_\infty, \quad f \in L^1,\ g \in L^\infty.$$
(Ex. 4.2), which implies that Lp (µ) is a linear space and d(f, g) = kf − gk is a metric. One
may prove, as in the case p ≥ 1, that Lp (µ) is complete in this metric.
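$\ell^p$-Spaces
An important special case of an $L^p$ space is obtained by taking $X = \mathbb{N}$ and $\mu$ = counting measure on $\mathbb{N}$. In this case we write $\ell^p(\mathbb{N})$ instead of $L^p(\mathbb{N})$. Thus for $1 \le p < \infty$,
$$\ell^p(\mathbb{N}) := \Big\{ x := (x_n) : x_n \in \mathbb{K},\ \|x\|_p^p = \sum_{n=1}^\infty |x_n|^p < \infty \Big\},$$
and for $p = \infty$
$$\ell^\infty(\mathbb{N}) := \Big\{ x := (x_n) : x_n \in \mathbb{K},\ \|x\|_\infty = \sup_n |x_n| < \infty \Big\}.$$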
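For finitely supported sequences these norms are directly computable. The following minimal sketch uses an assumed sample vector `x = [3, -4, 1]` and hypothetical helper names. It also illustrates the standard fact, not asserted in this passage, that for such a vector $\|x\|_p$ decreases toward $\|x\|_\infty$ as $p$ grows.

```python
def lp_norm(x, p):
    """l^p norm of a finite sequence x for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1 / p)


def sup_norm(x):
    """l^infinity norm of a finite sequence x."""
    return max(abs(t) for t in x)


x = [3.0, -4.0, 1.0]  # assumed sample sequence (all later terms zero)
exponents = (1, 2, 10, 100)
norms = {p: lp_norm(x, p) for p in exponents}
limit_value = sup_norm(x)
```

Here `norms[1]` is the sum of the absolute values, `norms[2]` the Euclidean length, and `norms[100]` is already very close to `limit_value`.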
Note that $\mathbb{K}^d$ may be identified with a linear subspace of $\ell^p(\mathbb{N})$ and, as such, inherits the $\ell^p$
Exercises
4.1 Let $a, b > 0$ and $p \ge 1$. Prove that $(a + b)^p \le 2^{p-1}(a^p + b^p)$. ⟦Consider $\varphi(x) = x^p$.⟧
4.2 Let $a, b > 0$ and $0 < p < 1$. Prove that $(a + b)^p \le a^p + b^p$. ⟦Consider the function $\varphi(x) = a^p + x^p - (a + x)^p$.⟧
4.3 Show that the mapping (f, g) → f g : L2 (µ) × L2 (µ) → C is continuous in the L2 norm.
R
4.7 Let f be continuous and bounded on Rd . Show that kf k∞ = sup{|f (x)| : x ∈ Rd } (relative to
Lebesgue measure).
4.8 Let f : X → C be measurable. The essential range of f is defined as
rane (f ) = {z ∈ C : µ{|f − z| < ε} > 0 for all ε > 0} .
Prove:
(a) rane (f ) is closed and contained in cl f (X).
(b) f = g a.e. ⇒ rane (f ) = rane (g).
T
(c) rane (f ) = f =g a.e. cl g(X) .
(d) If f ∈ L∞ , then rane (f ) is compact and kf k∞ = sup{x : x ∈ rane (|f |)}.
4.9 Let 1 < p < ∞, 0 < r < 1, and f ∈ Lp (0, ∞), λ . Define g(x, y) := f (x)x−1 sin(xy).
(b) limp→∞ kf kp ≤ kf k∞ .
(c) Assume kf k∞ > 0. Let 0 < r < kf k∞ and r < t ≤ kf k∞ such that µ(Et ) > 0, where
Et = {|f | > t} > 0. Then limp→∞ kf kp ≥ r.
(d) Conclude that limp→∞ kf kp ≥ kf k∞ .
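4.13 Let $1 \le p, q, r < \infty$, $r^{-1} = p^{-1} + q^{-1}$. Prove that if $f \in L^p$ and $g \in L^q$, then $fg \in L^r$ and $\|fg\|_r \le \|f\|_p \|g\|_q$.
4.14 Let $f$ and $g$ be nonnegative and measurable and $0 < p < q < r < \infty$. Prove:
(a) $\left(\int f g^q\,d\mu\right)^{r-p} \le \left(\int f g^p\,d\mu\right)^{r-q} \left(\int f g^r\,d\mu\right)^{q-p}$.
(b) $\left(\int f g\,d\mu\right)^{r} \le \left(\int f\,d\mu\right)^{r-1} \int f g^r\,d\mu$ for $r > 1$.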
Js + t = 1. If q = ∞, then rs/p = 1; if q < ∞, then p/sr and q/tr are conjugate exponents.K
(c) kf kr ≤ max{kf kp , kf kq }.
(d) If f ∈ Lp ∩ L∞ , then limr→∞ kf kr = kf k∞ . JUse (b) for one inequality. For the reverse
inequality, note that kf krr ≥ M r µ{|f | ≥ M }.K
4.16 Let T : L1 (µ) → L1 (µ) be a continuous linear transformation, and let g(t, x) be continuous in
$t \in [a, b]$ for each $x \in X$ and measurable in $x \in X$ for each $t$, and set $g_t = g(t, \cdot)$. Suppose that there exists an integrable function $h \ge 0$ such that $|g(t, x)| \le h(x)$ for all $t$ and $x$. Let $\int_a^b g_t\,dt$ denote the function $x \mapsto \int_a^b g_t(x)\,dt$. Assume that $[T g_t](x)$ is continuous in $t$ for each $x \in X$. Carry out the following to show that $\int_a^b g_t\,dt$ is in $L^1$ and
$$T \int_a^b g_t\,dt = \int_a^b T g_t\,dt. \tag{†}$$
(a) Let (Pn , tn ) be any sequence of tagged partitions of [a, b] with kPn k → 0 and let S(g, Pn , tn )
denote the function X
x 7→ S(g(·, x), Pn , tn ) = g(tj,n , x) |I|
I∈Pn
4.2 Lp Approximation
In this section we prove three approximation theorems that are useful in establishing
certain properties of Lp functions, as illustrated by Corollary 4.2.3 below.
Proof. Let $\{f_n\}$ be a sequence of simple functions such that $f_n \to f$ and $|f_n| \le |f|$ (2.3.1). The case $p = \infty$ follows from part (c) of that theorem. Assume that $p < \infty$. Then $|f_n - f|^p \le 2^{p+1}|f|^p$, hence $\|f_n - f\|_p \to 0$ by the dominated convergence theorem. The first assertion of the theorem follows by taking $f_s = f_n$ for sufficiently large $n$. For the second assertion, let $f_s = \sum_{k=1}^m a_k 1_{A_k}$, where $a_k \ne 0$ and the sets $A_k$ are disjoint. Then
$$\int |f_s|^p\,d\mu = \sum_{k=1}^m |a_k|^p \mu(A_k).$$
Since the integral is finite and $a_k \ne 0$, $\mu(A_k) < \infty$. Therefore, $f_s = 0$ outside $\bigcup_{k=1}^m A_k$, a set of finite measure.
4.2.2 Theorem. Let $1 \le p < \infty$, $f \in L^p(\lambda^d)$, and $\varepsilon > 0$. Then there exists a continuous function $g$ vanishing outside a bounded interval such that $\|f - g\|_p < \varepsilon$.
Proof. By 4.2.1, we may assume that $f$ is simple with standard representation $\sum_{k=1}^m a_k 1_{A_k}$, where $a_k \ne 0$ and $\lambda^d(A_k) < \infty$. We may further assume that $A_k$ is bounded, otherwise replace $A_k$ by $A_k \cap I$, where $I$ is a bounded interval with $\lambda^d(A_k) - \lambda^d(A_k \cap I)$ sufficiently small so that $f$ may be approximated by $\sum_{k=1}^m a_k 1_{A_k \cap I}$.
Now let $\alpha > 0$. By 1.8.1 we may choose for each $k$ a compact set $C_k$ and a bounded open set $U_k$ such that $C_k \subseteq A_k \subseteq U_k$ and $\lambda^d(U_k \setminus C_k) < \alpha$. By 0.3.10, there exists a continuous function $g_k : \mathbb{R}^d \to [0, 1]$ such that $g_k = 1$ on $C_k$ and $g_k = 0$ on $U_k^c$. Since $g_k = 1_{A_k}$ on $U_k^c \cup C_k = (U_k \setminus C_k)^c$,
$$\|a_k 1_{A_k} - a_k g_k\|_p^p = |a_k|^p \int_{U_k \setminus C_k} |1_{A_k} - g_k|^p\,d\lambda^d \le 2^p |a_k|^p \lambda^d(U_k \setminus C_k) < (2M)^p \alpha,$$
where $M := \sup_k |a_k|$. The function $g := \sum_{k=1}^m a_k g_k$ is continuous, and by the triangle inequality $\|f - g\|_p < 2mM\alpha^{1/p}$. We then have $\|f - g\|_p < \varepsilon$ for sufficiently small $\alpha$. Furthermore, $g = 0$ outside $\bigcup_k U_k$, which is contained in a bounded interval.
Here is an important application of 4.2.2.
4.2.3 Corollary. Let $1 \le p < \infty$, and for $y \in \mathbb{R}^d$ let $T_y$ be the translation operator $T_y f(x) = f(x + y)$. Then for each $f \in L^p(\mathbb{R}^d, \lambda)$, $\lim_{y \to y_0} \|T_y f - T_{y_0} f\|_p = 0$.
Proof. By translation invariance of the integral, we may take $y_0 = 0$. By the theorem, given $\varepsilon > 0$ there exists a continuous function $g$ such that $\|f - g\|_p < \varepsilon$ and $g = 0$ on the complement of some interval $[a, b]$. By translation invariance, $\|T_y f - T_y g\|_p = \|f - g\|_p$, hence
$$\|T_y f - f\|_p \le \|T_y f - T_y g\|_p + \|T_y g - g\|_p + \|g - f\|_p < 2\varepsilon + \|T_y g - g\|_p.$$
It now suffices to prove that $\lim_{y \to 0} \|T_y g - g\|_p = 0$. Let $c = (1, \dots, 1)$ and let $y_n \to 0$ such that $|y_{n,j}| < 1$ ($1 \le j \le d$). For $x \in [a - c, b + c]^c$, $x + y_n \in [a, b]^c$, hence $g(x + y_n) = 0$. Thus if $M$ is a bound for $|g|$, then
$$|g(x + y_n) - g(x)|^p \le (2M)^p\,1_{[a-c,\,b+c]}(x).$$
By continuity of $g$, the left side of the inequality tends to zero so, by the dominated convergence theorem, $\int |g(x + y_n) - g(x)|^p\,d\lambda^d \to 0$.
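The statement of 4.2.3 can be checked by hand in a simple case. The following is a minimal sketch, assuming $d = 1$ and $f = 1_{[0,1]}$ (a choice made here for illustration only, not taken from the text). For this $f$, $|T_y f - f|$ is the indicator of the symmetric difference of $[-y, 1-y]$ and $[0,1]$, a set of Lebesgue measure $\min(2|y|, 2)$, so the $L^p$ distance has a closed form. The sup-norm distance is included for contrast with the case $p = \infty$.

```python
def translate_distance_p(y: float, p: float) -> float:
    """Return ||T_y f - f||_p for f = indicator of [0, 1], 1 <= p < infinity.

    |T_y f - f| is the indicator of a set of Lebesgue measure min(2|y|, 2),
    so the p-th power of the norm is exactly that measure.
    """
    symmetric_difference_measure = min(2.0 * abs(y), 2.0)
    return symmetric_difference_measure ** (1.0 / p)


def translate_distance_sup(y: float) -> float:
    """Return ||T_y f - f||_infinity for the same f.

    For y != 0 the symmetric difference has positive measure, so the
    essential supremum of the indicator is 1.
    """
    return 0.0 if y == 0 else 1.0


for y in (0.1, 0.01, 0.001):
    print(y, translate_distance_p(y, 2.0), translate_distance_sup(y))
```

The $L^p$ distances shrink like $(2|y|)^{1/p}$ as $y \to 0$, while the sup-norm distance stays at 1 for every nonzero shift.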
Exercises
4.17 [↑ 3.2.16] Let $D_r$ be the dilation operator $D_r f(x) = f(rx)$ on $L^p(\mathbb{R}^d)$, $1 \le p < \infty$. Show that $\lim_{r \to s} \|D_r f - D_s f\|_p = 0$, $r, s > 0$.
4.18 Let $f \in L^1(\mathbb{R})$ and let $g$ be bounded with bounded continuous derivative. Prove that
$$\lim_n \int f(x)\,g'(nx)\,d\lambda(x) = 0.$$
4.19 Show that the last assertion of 4.2.1 fails for the case $p = \infty$. Show also that 4.2.2 does not hold for $p = \infty$.
4.3 Lp Convergence
Let $f_n, f \in L^p(X, \mathcal{F}, \mu)$ ($p \ge 1$). Convergence of $f_n$ to $f$ in the $L^p$ norm is called $L^p$ convergence and is written $f_n \xrightarrow{L^p} f$. For example, the approximation theorems in the preceding section may be phrased in terms of $L^p$ convergence. The results in the present section relate $L^p$ convergence to various modes of convergence considered in §2.4. The case $p = \infty$ is easy to treat:
4.3.1 Theorem. Let $f_n, f \in L^\infty$. Then $f_n \xrightarrow{L^\infty} f$ iff there exists a set $A$ of measure zero such that $f_n \to f$ uniformly on $A^c$. In particular, $f_n \xrightarrow{\text{a.u.}} f$.
Proof. Let $f_n \xrightarrow{L^\infty} f$ and let $A_n$ be a set of measure zero such that $|f_n - f| \le \|f_n - f\|_\infty$ on $A_n^c$ (4.1.6). Set $A = \bigcup_n A_n$. Then on $A^c$, $|f_n - f| \le \|f_n - f\|_\infty$ for all $n$, hence $f_n \to f$ uniformly on $A^c$.
Conversely, let $\mu(A) = 0$ and $f_n \to f$ uniformly on $A^c$. Given $\varepsilon > 0$, choose $N$ so that $|f_n - f| \le \varepsilon$ on $A^c$ for all $n \ge N$. By 4.1.6(b), for such $n$ we have $\|f_n - f\|_\infty \le \varepsilon$. Therefore, $f_n \xrightarrow{L^\infty} f$.
The case 1 ≤ p < ∞ is more delicate. We shall need the following lemma.
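4.3.2 Lemma. Let $1 \le p < \infty$ and $f, f_n \in L^p$. If $\|f_n\|_p \to \|f\|_p$ and $f_n \xrightarrow{\text{a.e.}} f$, then $f_n \xrightarrow{L^p} f$.
Proof. From the inequality $|f_n - f|^p \le 2^p(|f_n|^p + |f|^p)$ we have $2^p(|f_n|^p + |f|^p) - |f_n - f|^p \ge 0$. Moreover,
$$\lim_n \bigl[2^p(|f_n|^p + |f|^p) - |f_n - f|^p\bigr] = 2^{p+1}|f|^p \quad \text{a.e.}$$
By Fatou's lemma and the hypothesis $\|f_n\|_p \to \|f\|_p$,
$$2^{p+1}\int |f|^p\,d\mu \le \liminf_n \int \bigl[2^p(|f_n|^p + |f|^p) - |f_n - f|^p\bigr]\,d\mu = 2^{p+1}\int |f|^p\,d\mu - \limsup_n \int |f_n - f|^p\,d\mu.$$
Therefore, $\lim_n \int |f_n - f|^p\,d\mu = 0$, hence $f_n \xrightarrow{L^p} f$.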
The following result characterizes Lp convergence in terms of convergence in measure.
4.3.3 Theorem. Let $1 \le p < \infty$ and $f, f_n \in L^p$. Then $f_n \xrightarrow{L^p} f$ iff both $f_n \xrightarrow{\mu} f$ and $\|f_n\|_p \to \|f\|_p$. In this case, there exists a subsequence $f_{n_k} \xrightarrow{\text{a.e.}} f$.
Proof. The necessity follows from the inequalities $\bigl|\,\|f\|_p - \|f_n\|_p\bigr| \le \|f - f_n\|_p$ and
$$\mu\{|f_n - f| \ge \varepsilon\} = \int 1_{\{|f_n - f| \ge \varepsilon\}}\,d\mu = \int 1_{\{|f_n - f|^p \ge \varepsilon^p\}}\,d\mu \le \varepsilon^{-p} \int |f_n - f|^p\,d\mu.$$
For the sufficiency, suppose for a contradiction that $\|f_n - f\|_p \not\to 0$. Then there exists an $\varepsilon > 0$ and an infinite subset $S$ of $\mathbb{N}$ such that $\|f_n - f\|_p \ge \varepsilon$ for all $n \in S$. Since $f_n \xrightarrow{\mu} f$ holds for subsequences and since convergence in measure implies a.e. convergence for some subsequence (2.4.4), we may choose a subsequence $(f_{n_k})$ of $(f_n)$ with indices in $S$ such that $f_{n_k} \xrightarrow{\text{a.e.}} f$. But then by 4.3.2, $f_{n_k} \xrightarrow{L^p} f$, which is impossible by definition of $S$.
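To see why the norm condition in 4.3.3 cannot be dropped, one may look at a standard example. The sketch below assumes Lebesgue measure on $(0,1]$ and the hypothetical sequence $f_n = n^{1/p}\,1_{(0,1/n]}$ (chosen here for illustration), which tends to $0$ in measure while $\|f_n\|_p = 1$ for every $n$, so $\|f_n\|_p \not\to \|0\|_p$ and there is no $L^p$ convergence to $0$.

```python
def measure_exceeding(n: int, eps: float, p: float) -> float:
    """Lebesgue measure of {|f_n| >= eps} for f_n = n**(1/p) * 1_(0, 1/n], eps > 0."""
    height = n ** (1.0 / p)
    # f_n equals `height` on a set of measure 1/n and vanishes elsewhere.
    return 1.0 / n if height >= eps else 0.0


def lp_norm(n: int, p: float) -> float:
    """||f_n||_p = (height**p * (1/n))**(1/p), which equals 1 up to rounding."""
    height = n ** (1.0 / p)
    return (height ** p / n) ** (1.0 / p)


for n in (1, 10, 100, 1000):
    print(n, measure_exceeding(n, 0.5, 2.0), lp_norm(n, 2.0))
```

The first column of values tends to zero (convergence in measure) while the second stays at 1.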
A deeper result is the following, whose proof brings together some earlier results on
convergence.
4.3.4 Vitali Convergence Theorem I. Let $1 \le p < \infty$ and $f_n, f \in L^p$ such that $f_n \xrightarrow{\text{a.e.}} f$. Then $f_n \xrightarrow{L^p} f$ iff for each $\varepsilon > 0$ the following conditions hold:
(a) There exists $A \in \mathcal{F}$ with finite measure such that $\sup_n \|f_n 1_{A^c}\|_p \le \varepsilon$.
(b) There exists $\delta > 0$ such that $E \in \mathcal{F}$ and $\mu(E) < \delta$ ⇒ $\lim_n \|f_n 1_E\|_p \le \varepsilon$.
Proof. Suppose $f_n \xrightarrow{L^p} f$. To establish (a), choose $m$ so that $\|f_n - f\|_p < \varepsilon/2$ for all $n > m$. For such $n$ and any $E \in \mathcal{F}$,
Set $A = E \cup E_1 \cup \cdots \cup E_m$. Then, by (α) and (β), $\|f_n 1_{A^c}\|_p \le \varepsilon$ for all $n$, verifying (a).
To establish (b), choose $\delta$ so that $\|1_E f\|_p < \varepsilon$ for all $E$ with $\mu(E) < \delta$ (Ex. 3.13). For such $E$ and all $n$,
We show that the right side of (γ) may be made arbitrarily small. Enlarging $A$ if necessary, we may assume by Ex. 3.17 that $\|f 1_{A^c}\|_p < \varepsilon$. By (a) we then have
Exercises
4.20 Let $\mu$ be a probability measure and $f_n \in L^p(\mu)$. Prove:
(a) If $f_n \to f$ uniformly on $X$, then $f \in L^p(\mu)$ and $f_n \to f$ in $L^p$.
(b) If $f_n \xrightarrow{\text{a.u.}} f$ and the functions $f_n$ and $f$ are uniformly bounded, then $f_n \to f$ in $L^p$.
4.21 Let $\|f_n\|_\infty \le C < \infty$ for all $n$ and $f_n \xrightarrow{\text{a.u.}} f$. Show that $f \in L^\infty$.
For a finite measure, additional convergence results may be obtained via the notion of
uniform integrability. The following proposition motivates the definition.
Proof. Suppose that $f$ is integrable. Then $A := \{|f| = \infty\}$ has measure zero. Set $A_n := \{|f| > n\}$. Then $|f| \ge 1_{A_n}|f| \downarrow 1_A |f|$, so by the dominated convergence theorem, $\int_{A_n} |f|\,d\mu \to \int 1_A |f|\,d\mu = 0$, which implies (4.2).
Conversely, suppose that (4.2) holds. Choose $t$ so that $\int_{\{|f| \ge t\}} |f|\,d\mu < 1$. Then
$$\int |f|\,d\mu = \int_{\{|f| \ge t\}} |f|\,d\mu + \int_{\{|f| < t\}} |f|\,d\mu \le 1 + t \cdot \mu(X) < \infty,$$
hence $f$ is integrable.
Note that on an infinite measure space a nonzero constant function trivially satisfies (4.2)
yet is not integrable. Thus the sufficiency of the proposition fails on infinite measure spaces.
With the preceding proposition in mind, we say that a family $\mathscr{F}$ of measurable functions $f : X \to \mathbb{K}$ is uniformly integrable (u.i.) if
$$\lim_{t \to \infty} \sup\Bigl\{\int_{\{|f| \ge t\}} |f|\,d\mu : f \in \mathscr{F}\Bigr\} = 0. \tag{4.3}$$
By 4.4.1, each member of such a family is integrable. Conversely, by the same proposition, any finite family of integrable functions is u.i. Moreover, it is trivially the case that any uniformly bounded family of measurable functions is uniformly integrable.
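The role of the supremum in (4.3) can be seen on a concrete family. The following is a minimal sketch, assuming Lebesgue measure on $(0,1]$ and the family $f_n = n\,1_{(0,1/n]}$ (the sequence that also appears in the exercises below). Each tail integral is computed exactly with rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction


def tail_integral(n: int, t) -> Fraction:
    """Integral of |f_n| over {|f_n| >= t} for f_n = n * 1_(0, 1/n], with t > 0.

    f_n equals n on a set of measure 1/n and 0 elsewhere, so for t > 0 the set
    {|f_n| >= t} is (0, 1/n] when n >= t and is empty otherwise.
    """
    return Fraction(n) * Fraction(1, n) if n >= t else Fraction(0)


def sup_tail(t, n_max: int) -> Fraction:
    """Supremum of the tail integrals over the first n_max members of the family."""
    return max(tail_integral(n, t) for n in range(1, n_max + 1))


for t in (10, 100, 1000):
    print(t, sup_tail(t, 5000))
```

Each individual tail integral vanishes once $t > n$, but for every $t$ the members with $n \ge t$ contribute a tail integral equal to 1, so the supremum in (4.3) does not tend to zero for this family.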
The following result is sometimes useful for establishing uniform integrability of a family
of functions. The reader should compare the conditions in the theorem with the Vitali
convergence conditions (4.3.4).
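4.4.2 Theorem. A family $\mathscr{F}$ of measurable functions is u.i. iff the following conditions hold:
(a) $\sup\{\|f\|_1 : f \in \mathscr{F}\} < \infty$.
(b) For each $\varepsilon > 0$ there exists $\delta > 0$ such that $\sup\{\|f 1_E\|_1 : f \in \mathscr{F}\} < \varepsilon$ for all $E \in \mathcal{F}$ with $\mu(E) < \delta$.
Proof. Suppose that $\mathscr{F}$ is u.i. Given $\varepsilon > 0$, choose $t$ so that
$$\int_{\{|f| \ge t\}} |f|\,d\mu < \frac{\varepsilon}{2} \quad \text{for all } f \in \mathscr{F}.$$
Then for $E \in \mathcal{F}$ and all $f \in \mathscr{F}$,
$$\int_E |f|\,d\mu = \int_{E \cap \{|f| \ge t\}} |f|\,d\mu + \int_{E \cap \{|f| < t\}} |f|\,d\mu \le \frac{\varepsilon}{2} + t \cdot \mu(E).$$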
Exercises
4.24 Consider Lebesgue measure on $\mathcal{B}(0, 1]$. Show that the sequence of functions $f_n = n 1_{(0,1/n]}$ is not u.i. even though the sequence $(\|f_n\|_1)$ converges.
4.25 Show that $\{f_n\}$ is u.i. iff $\{f_n^+\}$ and $\{f_n^-\}$ are u.i.
4.26 Let $(f_n)$ be a sequence of $\mathcal{F}$-measurable functions such that $\sup_n \int |f_n|^r\,d\mu < \infty$ for some $r > 1$.
Strict convexity is defined by replacing weak inequality by strict inequality. Thus a function is convex iff the line segment connecting any two points on its graph lies above the part of the graph between the two points. A function $f$ is (strictly) concave if $-f$ is (strictly) convex.
FIGURE 4.1: A strictly convex function.
A function $\varphi$ with an increasing derivative (in particular, a function with a nonnegative second derivative) is convex. Indeed, if $x = (1 - t)u + tv$ ($0 < t < 1$) then, by the mean value theorem, there exist points $y \in (u, x)$ and $z \in (x, v)$ such that
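It remains to verify the inequalities (1)–(3) above. For $a < c < d < b$, let $L_{cd}$ denote the function whose graph is the line segment from $(c, \varphi(c))$ to $(d, \varphi(d))$. Since $u < x < y < v$, convexity implies that $\varphi(x) \le L_{uy}(x)$ and $\varphi(y) \le L_{uv}(y)$, hence
$$\frac{\varphi(x) - \varphi(u)}{x - u} \le \frac{L_{uy}(x) - \varphi(u)}{x - u} = \text{slope of } L_{uy} = \frac{\varphi(y) - \varphi(u)}{y - u},$$
$$\frac{\varphi(y) - \varphi(u)}{y - u} \le \frac{L_{uv}(y) - \varphi(u)}{y - u} = \text{slope of } L_{uv} = \frac{L_{uv}(v) - L_{uv}(y)}{v - y}, \text{ and}$$
$$\frac{L_{uv}(v) - L_{uv}(y)}{v - y} \le \frac{L_{uv}(v) - \varphi(y)}{v - y} = \frac{\varphi(v) - \varphi(y)}{v - y},$$
verifying (1) and (2). A similar argument establishes (3).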
4.5.2 Corollary. A convex function is continuous.
4.5.3 Corollary. If a convex function $\varphi$ is differentiable at $x \in (u, v)$, then
$$\varphi'(x)(t - x) + \varphi(x) \le \varphi(t) \quad \text{for all } t \in (u, v).$$
That is, the tangent line at $(x, \varphi(x))$ lies below the graph of $\varphi$ on $(u, v)$.
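4.5.4 Jensen’s Inequality. Let $(X, \mathcal{F}, \mu)$ be a probability space and let $\varphi : (a, b) \to \mathbb{R}$ be convex. If $f : X \to (a, b)$ and $f, \varphi \circ f \in L^1$, then
$$\varphi\Bigl(\int f\,d\mu\Bigr) \le \int \varphi \circ f\,d\mu.$$
Proof. By 4.5.1(c), for fixed $z \in (a, b)$ there exists a constant $c$ such that $\varphi(t) \ge \varphi(z) + c(t - z)$ for all $t \in (a, b)$. Replacing $t$ by $f(x)$ and integrating with respect to the probability measure $\mu$ gives
$$\int \varphi \circ f\,d\mu \ge \varphi(z) + c\Bigl(\int f\,d\mu - z\Bigr).$$
Taking $z = \int f\,d\mu$ produces the desired inequality.
Note that the inequality in 4.5.4 reverses for concave functions, as may be seen by considering $-\varphi$.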
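Jensen's inequality is easy to test numerically on a finite probability space, where integrals are weighted sums. The following is a minimal sketch; the sample values, the weights, and the helper name `jensen_gap` are assumptions introduced here for illustration.

```python
import math


def jensen_gap(phi, values, weights):
    """Return  sum_i w_i * phi(x_i)  -  phi(sum_i w_i * x_i).

    `weights` are the point masses of a probability measure on a finite set and
    `values` are the values of f there; the gap is >= 0 when phi is convex.
    """
    mean = sum(w * x for w, x in zip(weights, values))
    return sum(w * phi(x) for w, x in zip(weights, values)) - phi(mean)


values = [0.5, 1.0, 2.0, 4.0]
weights = [0.1, 0.2, 0.3, 0.4]

print(jensen_gap(math.exp, values, weights))  # exp is convex: nonnegative gap
print(jensen_gap(math.log, values, weights))  # log is concave: nonpositive gap
```

For an affine function the gap is zero up to rounding, which is consistent with the supporting-line inequality used in the proof.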
Exercises
4.28 Prove that for $x_j, t_j > 0$ and $\sum_{j=1}^n t_j = 1$,
$$\prod_{j=1}^n x_j^{t_j} \le \sum_{j=1}^n t_j x_j.$$
4.29 Use Jensen’s inequality to verify the following for a probability measure $\mu$:
(a) $\|f\|_p$ is increasing on $(0, \infty]$.
(b) $\|f\|_1 \|1/f\|_p \ge 1$ ($p > 0$).
(c) $\|\ln f\|_1 \le \ln \|f\|_1$, ($f > 0$).
(d) $\|f\|_1 \ln \|f\|_1 \le \ln \|f \ln f\|_1$, ($f > 0$).
4.30 Let $\mu$ be a probability measure, $0 < q < \infty$, and $f \in L^q$ with $\|f\|_q > 0$. Verify (a)–(e) and conclude that
$$\lim_{p \to 0} \ln \|f\|_p = \int \ln |f|\,d\mu.$$
In this chapter we consider countably additive set functions that take values in K, so-called
signed and complex measures. These set functions play an important role in the description
of linear functionals on spaces of continuous functions, a topic considered in Chapter 7, as
well as in harmonic analysis, developed in Chapter 16. The main result of the chapter
is the Radon-Nikodym theorem, which establishes in terms of integrals the existence of a
derivative of one measure with respect to another. This notion of measure differentiation is
made concrete for Lebesgue-Stieltjes measures on Rd .
A signed measure on $(X, \mathcal{F})$ is an $\overline{\mathbb{R}}$-valued set function $\mu$ with the following properties:
(a) µ(∅) = 0.
Property (b) is needed to avoid expressions such as ∞ − ∞. Property (c) asserts that µ
is countably additive. This property, together with (a), implies that µ is also finitely
additive, which may be verified by considering sequences with a “tail end” of empty sets.
Note that because the left side of (c) is invariant under permutations of the sequence (An ),
the right side must also have this property. We shall therefore make that assumption.
To emphasize the distinction between signed measures and the set functions considered
in Chapters 1–4, we sometimes refer to the latter as nonnegative measures. Nevertheless,
the unadorned term measure will continue to refer to the nonnegative set functions studied
in previous chapters.
The sum $\mu_1 + \mu_2$ of signed measures $\mu_1$ and $\mu_2$ is defined by
$$(\mu_1 + \mu_2)(E) := \mu_1(E) + \mu_2(E), \quad E \in \mathcal{F}.$$
For this to be well-defined, the right side must not be of the form ∞ − ∞ or −∞ + ∞. When
dealing with such sums we shall therefore tacitly assume that this restriction holds.
A signed measure µ is said to be finite if µ(X) is finite. Note that in this case µ(A) is
finite for all A ∈ F (use additivity on the sequence {A, Ac }).
5.1.1 Example. Let $\nu$ and $\eta$ be measures on $\mathcal{F}$ at least one of which, say $\eta$, is finite. We show that $\mu := \nu - \eta$ is a signed measure. Properties (a) and (b) are clear. For (c) we consider two cases: If $\mu(A) = \infty$, then $\nu(A) = \infty$ and $\eta(A) < \infty$, hence $\sum_n \nu(A_n) = \infty$ and $\sum_n \eta(A_n) < \infty$, which implies that $\sum_n \mu(A_n) = \infty$. If $\mu(A)$ is finite, then both $\nu(A)$ and $\eta(A)$ are finite and are equal to $\sum_n \nu(A_n)$ and $\sum_n \eta(A_n)$, respectively. In each case, $\mu(A) = \sum_n \mu(A_n)$. ♦
(d) If also $\mu^+(E) = \mu(E \cap P_1)$ and $\mu^-(E) = -\mu(E \cap P_1^c)$ for some $P_1 \in \mathcal{F}$, then $\mu^+(P \triangle P_1) = \mu^-(P \triangle P_1) = 0$.
We give the proof below. The measures µ+ and µ− in the statement of the theorem are
called the positive and negative variations of µ, and the measure
|µ| := µ+ + µ−
is called the total variation measure of µ. The quantity |µ|(X), which may be infinite,
is called the total variation of µ. The equation µ = µ+ − µ− in (b) is called the Jordan
decomposition of µ. The decomposition of X into a disjoint union of measurable sets P
and P c such that µ ≥ 0 on F ∩ P and µ ≤ 0 on F ∩ P c is called a Hahn decomposition
for µ. Thus part (a) guarantees the existence of a Hahn decomposition (P, P c ) and part (d)
asserts that the decomposition is unique up to a set of total variation measure zero.
5.1.3 Example. Let $\nu$ be a measure on $\mathcal{F}$ and let $f$ be measurable such that $\int f\,d\nu$ is defined (hence $\int_E f\,d\nu$ is defined for all $E \in \mathcal{F}$). Set
$$\mu(E) = \int_E f\,d\nu, \quad \mu_1(E) = \int_E f^+\,d\nu, \quad \text{and} \quad \mu_2(E) = \int_E f^-\,d\nu.$$
Then $\mu_1$ and $\mu_2$ are measures and $\mu = \mu_1 - \mu_2$. Moreover, $f^- = 0$ on $P := \{f \ge 0\}$ and $f^+ = 0$ on $P^c = \{f < 0\}$, hence $\mu_1(P^c) = \mu_2(P) = 0$. Therefore, $\mu_1 \perp \mu_2$. By uniqueness, $\mu_1 = \mu^+$ and $\mu_2 = \mu^-$, hence the total variation measure of $\mu$ is $|\mu| = |f|\nu$. ♦
5.1.4 Corollary. If $\mu$ is a signed measure, then for all $E \in \mathcal{F}$,
$$|\mu|(E) = \sup\Bigl\{\sum_{j=1}^n |\mu(E_j)| : E_1, \dots, E_n \text{ is a measurable partition of } E\Bigr\}.$$
Proof. Let $\nu(E)$ denote the expression on the right. We show that $\nu(E) = \mu^+(E) + \mu^-(E)$. This is clear if $|\mu(E)| = \infty$, since then $\nu(E) = \infty$ and either $\mu^+(E) = \infty$ or $\mu^-(E) = \infty$. Now let $|\mu(E)| < \infty$, so $\mu^+(E) < \infty$ and $\mu^-(E) < \infty$. Let $A$ and $B$ be measurable subsets of $E$ and set $C := A \cap B$. Then
the last inequality because $A \setminus C$ and $B \setminus C$ are disjoint subsets of $E$ and so are members of a measurable partition of $E$. Therefore, by (b) of the theorem,
For the reverse inequality, let $E_1, \dots, E_n$ be a measurable partition of $E$ and let $A$ be the union of those $E_j$ for which $\mu(E_j) \ge 0$ and $B$ the union of the remaining $E_j$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j:\,\mu(E_j) \ge 0} \mu(E_j) - \sum_{j:\,\mu(E_j) < 0} \mu(E_j) = \mu(A) - \mu(B) \le \mu^+(E) + \mu^-(E).$$
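On a finite set the supremum in 5.1.4 can be computed by brute force and compared with $\mu^+(E) + \mu^-(E)$. The following is a minimal sketch, assuming a signed measure on a four-point set given by hypothetical point masses; all names are introduced here for illustration.

```python
from itertools import combinations

# Hypothetical point masses of a signed measure mu on X = {0, 1, 2, 3}.
MASS = {0: 2, 1: -3, 2: 1, 3: -1}
# A Hahn set: mu >= 0 on every subset of P and mu <= 0 on every subset of its complement.
P = {x for x, m in MASS.items() if m >= 0}


def mu(E):
    return sum(MASS[x] for x in E)


def mu_plus(E):
    return mu(set(E) & P)


def mu_minus(E):
    return -mu(set(E) - P)


def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def variation_by_partitions(E):
    """Supremum of sum |mu(E_j)| over all partitions of E (a maximum, since E is finite)."""
    return max(
        sum(abs(mu(block)) for block in partition)
        for partition in set_partitions(sorted(E))
    )


def all_subsets(X):
    X = sorted(X)
    return [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]


for E in all_subsets(MASS):
    assert variation_by_partitions(E) == mu_plus(E) + mu_minus(E)
print(variation_by_partitions(set(MASS)))
```

The maximum is attained by any partition that separates the points of positive mass from those of negative mass, which mirrors the choice of $A$ and $B$ in the proof.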
$\mathcal{A}^+ := \{A \in \mathcal{F} : \mu \ge 0 \text{ on } \mathcal{F} \cap A\}$ and $\mathcal{A}^- := \{A \in \mathcal{F} : \mu \le 0 \text{ on } \mathcal{F} \cap A\}$.
Note that the sets contain $\emptyset$ and hence are nonempty. Set $a := \sup\{\mu(A) : A \in \mathcal{A}^+\}$ and choose $A_n \in \mathcal{A}^+$ such that $\mu(A_n) \to a$. Define $P := \bigcup_n A_n$. We claim that $P \in \mathcal{A}^+$, that is, $\mu(E) \ge 0$ for all $E \in \mathcal{F} \cap P$. To see this, set $B_1 := A_1 \cap E$, $B_n := A_n \cap A_{n-1}^c \cap \cdots \cap A_1^c \cap E$, $n \ge 2$. The sets $B_n$ are disjoint members of $\mathcal{A}^+$ with union $E$, hence $\mu(E) = \sum_n \mu(B_n) \ge 0$.
We show next that $P^c \in \mathcal{A}^-$, which will complete the proof of (a). Since $P \in \mathcal{A}^+$, $a \ge \mu(P) = \mu(A_n) + \mu(P \cap A_n^c) \ge \mu(A_n)$ for all $n$. Taking limits we see that $\mu(P) = a$. Since $a \ge 0$ and $\mu$ never takes on the value $\infty$, $a$ is finite. These facts imply that $P^c \in \mathcal{A}^-$. Indeed, let $E \subseteq P^c$ be measurable and suppose for a contradiction that $\mu(E) > 0$. We claim that there exists an $F \in \mathcal{A}^+$ such that $F \subseteq E$ and $\mu(F) > 0$. Assuming this for the moment, we then have $\mu(P \cup F) = \mu(P) + \mu(F) > \mu(P) = a$. On the other hand, since $P$ and $F$ are disjoint, $P \cup F \in \mathcal{A}^+$ and so $\mu(P \cup F) \le a$. With this contradiction we see that $\mu(E) \le 0$, hence $P^c \in \mathcal{A}^-$.
It remains to verify the claim, namely:
If E ∈ F with µ(E) > 0, then there exists a set F ∈ A+ such that F ⊆ E and µ(F ) > 0.
If E\E1 ∈ A+ , take F = E\E1 . Otherwise, apply the same argument to E\E1 , obtaining a set
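$E_2 \in \mathcal{F} \cap (E \setminus E_1)$ and $n_2 \ge n_1$ such that $\mu(E_2) < -1/n_2$. Continue inductively. If the process stops at some point, we are done. Otherwise, we generate a sequence $1 \le n_1 \le n_2 \le \dots$ in $\mathbb{N}$ and disjoint $E_1, E_2, \dots \in \mathcal{F}$ such that
$$E_k \subseteq E \setminus \bigcup_{j=1}^{k-1} E_j, \quad \mu(E_k) < -1/n_k, \quad \text{and}$$
$$\mu(A) \ge -1/(n_k - 1) \text{ for all } k \text{ with } n_k > 1 \text{ and all } A \subseteq E \setminus \bigcup_{j=1}^{k-1} E_j. \tag{$\dagger$}$$
Set $F := E \setminus \bigcup_{k=1}^\infty E_k$. Then $E \setminus F = \bigcup_{k=1}^\infty E_k$, hence $\mu(E \setminus F) = \sum_{k=1}^\infty \mu(E_k) < 0$. Because $\mu(E)$ is finite, so is $\mu(E \setminus F)$, hence the series converges and so $\mu(E_k) \to 0$. Since $-\mu(E_k) > 1/n_k$, $n_k \to \infty$. Also $\mu(F) > \mu(F) + \mu(E \setminus F) = \mu(E) > 0$. Finally, if $A \in \mathcal{F} \cap F$, letting $k \to \infty$ in (†) yields $\mu(A) \ge 0$. Therefore, $F \in \mathcal{A}^+$.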
Proof of (b): Let s denote the supremum in (b). Since µ+ (E) = µ(E ∩ P ), µ+ (E) ≤ s. For
the reverse inequality, let A ∈ F with A ⊆ E. By definition of µ± ,
Taking the sup over all such A yields s ≤ µ+ (E). This proves the first part of (b). The proof
of the second part is similar.
Proof of (c): The set functions µ± are clearly mutually singular measures and µ = µ+ − µ− .
Suppose also that µ = µ1 − µ2 , where µ1 and µ2 are nonnegative singular measures. Let
E ∈ F. For any measurable A ⊆ E, µ1 (E) ≥ µ1 (A) ≥ µ(A), hence, taking the sup over all
such A, we have µ1 (E) ≥ µ+ (E) by (b). For the reverse inequality, use the mutual singularity
to obtain B ∈ F such that µ1 (B c ) = µ2 (B) = 0. Then
Exercises
5.1 Show that a signed measure µ is finite iff |µ| is finite.
5.2 Let $\mu_1$ and $\mu_2$ be signed measures such that $\mu_1 + \mu_2$ is defined. Prove that
$$(\mu_1 + \mu_2)^+ + \mu_1^- + \mu_2^- = (\mu_1 + \mu_2)^- + \mu_1^+ + \mu_2^+.$$
5.3 Let Q ∈ F have the property that µ(E ∩ Q) ≥ 0 and µ(E ∩ Qc ) ≤ 0 for all E ∈ F. Show that
µ+ (E) = µ(E ∩ Q) and µ− (E) = µ(E ∩ Qc ).
5.4 Let µ be a finite measure and x ∈ X. Find the Hahn decomposition of µ − aδx , where a = µ(X).
5.5 Let µ be a signed measure with Hahn decomposition (P, P c ). Show that (−µ)+ = µ− and
(−µ)− = µ+ and that (P c , P ) is a Hahn decomposition of −µ.
This definition is compatible with the corresponding notion for signed measures (5.1.4). In
the latter case, however, the total variation was immediately seen to be a measure. In the
complex case, some work is required to verify this.
5.2.1 Theorem. If µ is a complex measure, then |µ| is a finite measure and |µ(E)| ≤ |µ|(E)
for all E ∈ F. Moreover, if ν is a complex measure, then |µ + ν| ≤ |µ| + |ν|.
hence
$$\sum_{j=1}^n |\mu(E_j)| \le \mu_r^+(E) + \mu_r^-(E) + \mu_i^+(E) + \mu_i^-(E) < \infty.$$
To show countable additivity, let $(A_n)$ be a sequence of disjoint measurable sets with union $A$ and let $\{E_1, \dots, E_n\}$ be a measurable partition of $A$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j=1}^n |\mu(A \cap E_j)| = \sum_{j=1}^n \Bigl|\sum_{k=1}^\infty \mu(A_k \cap E_j)\Bigr| \le \sum_{k=1}^\infty \sum_{j=1}^n |\mu(A_k \cap E_j)| \le \sum_{k=1}^\infty |\mu|(A_k),$$
$$|\mu|(A) \ge \sum_{k=1}^m \sum_{j=1}^{n_k} |\mu(E_{k,j})| = \sum_{k=1}^m \sum_{E \in \mathcal{P}_k} |\mu(E)|.$$
Taking the suprema over each of the partitions $\mathcal{P}_k$ yields $|\mu|(A) \ge \sum_{k=1}^m |\mu|(A_k)$. Since $m$ was arbitrary, $|\mu|(A) \ge \sum_{k=1}^\infty |\mu|(A_k)$. This establishes countable additivity of $|\mu|$.
The inequality |µ(E)| ≤ |µ|(E) follows directly from the definition of |µ|(E). The proof
of the triangle inequality is an exercise (5.15).
5.2.2 Example. Let $\nu$ be a measure on $\mathcal{F}$ and let $f : X \to \mathbb{C}$ be $\nu$-integrable. Define the complex measure $\mu$ by $\mu(E) = \int_E f\,d\nu$, that is, $d\mu = f\,d\nu$. We show that $d|\mu| = |f|\,d\nu$.
Let $\{E_1, \dots, E_n\}$ be an arbitrary measurable partition of $E$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j=1}^n \Bigl|\int_{E_j} f\,d\nu\Bigr| \le \sum_{j=1}^n \int_{E_j} |f|\,d\nu = \int_E |f|\,d\nu,$$
hence $|\mu|(E) \le \int_E |f|\,d\nu$. In particular, $|\mu|\{f = 0\} = 0$, so for the reverse inequality $\int_E |f|\,d\nu \le |\mu|(E)$ we may assume that $f$ is never zero, otherwise remove the part of $E$ on which $f = 0$. Consider the polar form of $z \ne 0$, written as $|z| = z e^{i\theta(z)}$, where $-\pi \le \theta(z) < \pi$. For each $n$ define
$$g_n(z) = \sum_{k=0}^{n-1} e^{i\theta_k}\,1_{[\theta_k, \theta_{k+1})}\bigl(\theta(z)\bigr), \quad \theta_k = -\pi + 2\pi k/n, \quad k = 0, 1, \dots, n.$$
Then $z g_n(z) \to |z|$ and $|g_n(z)| = 1$. Therefore, $f_n := g_n \circ f$ is an $\mathcal{F}$-simple function satisfying $|f_n| = 1$ and $f \cdot f_n \to |f|$ on $X$. Let $E \in \mathcal{F}$ and let $f_n$ have standard form
$$f_n = \sum_{j=1}^{m_n} c_j 1_{A_j}, \quad |c_j| = 1.$$
Then $\int_E f \cdot f_n\,d\nu = \sum_{j=1}^{m_n} c_j \int_{E \cap A_j} f\,d\nu = \sum_{j=1}^{m_n} c_j\,\mu(E \cap A_j)$, hence
$$\Bigl|\int_E f \cdot f_n\,d\nu\Bigr| \le \sum_{j=1}^{m_n} |c_j|\,|\mu(E \cap A_j)| \le |\mu|(E).$$
Letting $n \to \infty$ and applying the dominated convergence theorem yields $\int_E |f|\,d\nu \le |\mu|(E)$, as required. ♦
and similarly $d(A', B') \le d(A, B)$. That $d$ is a metric follows easily from the properties of the $L^1$ norm. To show completeness, let $(A_n)$ be a Cauchy sequence in $\mathcal{F}$. Then $(1_{A_n})$ is a Cauchy sequence in $L^1$, hence there exists $f \in L^1$ such that $\|f - 1_{A_n}\|_1 \to 0$. Choose a subsequence $(1_{A_{n_k}})$ that converges a.e. to $f$. Then $f$ takes on the values 0 and 1 a.e., hence $f = 1_A$ a.e., where $A = \{f = 1\}$. Therefore, $d(A_n, A) = \|1_{A_n} - 1_A\|_1 \to 0$.
5.2.4 Vitali-Hahn-Saks Theorem. Let $(X, \mathcal{F})$ be a measurable space and $(\mu_n)$ a sequence of complex measures on $\mathcal{F}$ such that the limit
$$\mu(A) := \lim_n \mu_n(A)$$
exists for every $A \in \mathcal{F}$. Then $\mu$ is countably additive and hence is a complex measure.
Proof. The set function $\mu$ is clearly finitely additive, and $\mu(\emptyset) = 0$. It remains to show that $\mu$ is continuous from below. To this end, apply the lemma to the finite measure
$$\eta(A) := \sum_{n=1}^\infty \frac{1}{2^n}\,\frac{|\mu_n|(A)}{1 + |\mu_n|(X)}, \quad A \in \mathcal{F}.$$
For each $n$, the function $A \mapsto \mu_n(A)$ on $(\mathcal{F}, d)$ (viewed as a collection of equivalence classes) is well-defined, since $d(A, A') = 0 \Rightarrow |\mu_n|(A \triangle A') = 0 \Rightarrow \mu_n(A) = \mu_n(A')$. Moreover, from
$$|\mu_n(A) - \mu_n(B)| \le |\mu_n|(A \triangle B) \le 2^n\bigl(1 + |\mu_n|(X)\bigr)\,\eta(A \triangle B) = 2^n\bigl(1 + |\mu_n|(X)\bigr)\,d(A, B)$$
$$|\mu_k(A) - \mu_{m+k}(A)| \le \varepsilon \text{ for all } k \ge 1 \text{ and all } A \text{ with } d(A, A_0) < \delta. \tag{$\dagger$}$$
$$A = B \setminus C, \quad C \subseteq B, \quad B \triangle A_0 \subseteq A, \quad \text{and} \quad C \triangle A_0 \subseteq A,$$
hence $d(B, A_0), d(C, A_0) < \delta$ and $\mu_n(A) = \mu_n(B) - \mu_n(C)$. Therefore, for all $n \ge m$,
Since $\varepsilon$ was arbitrary, $\lim_n \mu(E \setminus E_n) = 0$, which shows that $\mu$ is continuous from below.
so $(\nu_n(E))$ is a Cauchy sequence in $\mathbb{C}$. Let $\nu_n(E) \to \nu(E)$. By the Vitali-Hahn-Saks theorem, $\nu$ is a complex measure. For any measurable partition $E_1, \dots, E_p$ of $X$,
$$\sum_{j=1}^p |\nu_m(E_j) - \nu_n(E_j)| \le \sum_{k=n+1}^m \sum_{j=1}^p |\mu_k(E_j)| \le \sum_{k > n} \|\mu_k\|,$$
and letting $m \to \infty$ we have $\sum_{j=1}^p |\nu(E_j) - \nu_n(E_j)| \le \sum_{k > n} \|\mu_k\|$. Since the partition was arbitrary, $\|\nu - \nu_n\| \le \sum_{k > n} \|\mu_k\|$. Therefore, $\|\nu_n - \nu\| \to 0$, proving that $M$ is complete.
We summarize this discussion in
5.2.5 Proposition. The linear space M (X, F) of complex measures on a measurable space
(X, F) is a Banach space under the total variation norm.
It is straightforward to check that in each case the integrals are well-defined, linear, and satisfy
$$\Bigl|\int f\,d\mu\Bigr| \le \int |f|\,d|\mu| \quad \text{and} \quad \overline{\int f\,d\mu} = \int \overline{f}\,d\overline{\mu} \tag{5.4}$$
(Ex. 5.16). Moreover, the dominated convergence theorem holds for a signed or complex measure $\mu$, as may be seen by decomposing $\mu$ into a linear combination of the measures $\mu_{r,i}^{\pm}$.
Exercises
5.14 Show that if $\mu$ is a complex measure and $\|\mu\| = \mu(X) < \infty$, then $\mu$ is a nonnegative measure.
5.15 Verify the inequality $|\mu + \nu| \le |\mu| + |\nu|$ for complex measures.
5.16 Let $\mu$ be a signed or complex measure. Verify that the integral with respect to $\mu$ is well-defined and linear. Also, verify the assertions in (5.4).
5.17 Show that in the definition of $|\mu|$, the finite measurable partition $E_1, \dots, E_n$ may be replaced by a countable measurable partition.
5.18 Let $\mu$ be a complex measure and $E \in \mathcal{F}$. Prove that
$$|\mu|(E) = \sup\Bigl\{\Bigl|\int_E f\,d\mu\Bigr| : f \text{ is measurable and } |f| \le 1\Bigr\}.$$
5.19 Let $\mu$ and $\nu$ be complex measures on measurable spaces $(X, \mathcal{F})$ and $(Y, \mathcal{G})$, respectively.
(a) Show that there exists a unique complex measure $\mu \otimes \nu$ on $\mathcal{F} \otimes \mathcal{G}$ such that $(\mu \otimes \nu)(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{F}$ and $B \in \mathcal{G}$.
5.20 Let $(X, \mathcal{F})$ and $(Y, \mathcal{G})$ be measurable spaces and $T : (X, \mathcal{F}) \to (Y, \mathcal{G})$ measurable. If $\mu$ is a signed or complex measure on $(X, \mathcal{F})$, then the image of $\mu$ under $T$ is the signed or complex measure $T(\mu)$ on $(Y, \mathcal{G})$ defined as before by $T(\mu)(E) = \mu\bigl(T^{-1}(E)\bigr)$, $E \in \mathcal{G}$. Show that in the signed case $|T(\mu)| \le T(|\mu|)$, $(T\mu)^+ \le T(\mu^+)$, and $(T\mu)^- \le T(\mu^-)$, and in the complex case $|T(\mu)| \le T(|\mu|)$.
Proof. (a) Let $\nu \ll \mu$, $\mu(E) = 0$, and $A \subseteq E$ measurable. Then $\mu(A) = 0$, hence $\nu(A) = 0$. Therefore, by 5.1.2, $\nu^+(E) = \nu^-(E) = 0$, hence also $|\nu|(E) = 0$. The converses are clear.
(b) This follows from $|\nu_{r,i}| \le |\nu| \le |\nu_r| + |\nu_i|$ and (a).
(c) For the signed case, let $|\eta|(E^c) = \mu(E) = 0$. Since $\nu \ll \mu$, $\nu(A) = 0$ for all measurable $A \subseteq E$, hence $|\nu|(E) = 0$. Therefore, $|\nu| \perp |\eta|$. The complex case is obtained by using $\nu_{r,i}$.
(d) By (c), $|\nu| \perp |\nu|$, that is, $|\nu|(E) = |\nu|(E^c) = 0$ for some $E$. Therefore, $|\nu| = 0$.
5.3.2 Proposition. Let $\mu$ be a measure and $\nu$ a complex measure. Then $\nu \ll \mu$ iff $\lim_{\mu(E) \to 0} \nu(E) = 0$ for all $E \in \mathcal{F}$.
Proof. The limit assertion means that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|\nu(E)| < \varepsilon$ for all $E \in \mathcal{F}$ with $\mu(E) < \delta$. Suppose this holds. If $\mu(E) = 0$, then the $\delta$-inequality holds trivially, hence $|\nu(E)| < \varepsilon$ for all $\varepsilon$ and so $\nu(E) = 0$. Therefore, $\nu \ll \mu$.
Conversely, suppose $\nu \ll \mu$. Then by 5.3.1(b), $|\nu| \ll \mu$. If we show that $\lim_{\mu(E) \to 0} |\nu|(E) = 0$, then the inequality $|\nu(E)| \le |\nu|(E)$ will imply that $\lim_{\mu(E) \to 0} \nu(E) = 0$. Thus we may assume without loss of generality that $\nu$ is nonnegative. Suppose that the $\varepsilon$-$\delta$ condition does not hold. Then there exists $\varepsilon > 0$ and for each $n \in \mathbb{N}$ a measurable set $E_n$ such that
$\mu(E_n) < 1/2^n$ and $\nu(E_n) \ge \varepsilon$. Let $E = \limsup_n E_n$. Since $\sum_n \mu(E_n) < +\infty$, $\mu(E) = 0$ (Ex. 1.37). But by continuity from above (since $\nu$ is finite), $\nu(E) \ge \limsup_n \nu(E_n) \ge \varepsilon$, contradicting the assumption that $\nu \ll \mu$.
5.3.3 Remark. The necessity of 5.3.2 does not necessarily hold if $\nu$ is not finite. For example, let $\nu$ be counting measure on $\mathbb{N}$ and let $\mu(E) := \sum_{n \in E} 1/2^n$. Clearly $\nu \ll \mu$. On the other hand, if $A_n := \{n, n + 1, \dots\}$ then $\mu(A_n) \to 0$ but $\nu(A_n) = \infty$ for all $n$. ♦
Therefore,
$$\Bigl|\frac{d\nu}{d|\nu|}\Bigr| = 1, \quad |\nu|\text{-a.e.}$$
Then $\mu_f \ll \mu$ on $\mathcal{G}$ and
$$\int_E \frac{d\mu_f}{d\mu}\,d\mu = \int_E f\,d\mu \quad \text{for all } E \in \mathcal{G}. \tag{5.6}$$
The salient point here is that $d\mu_f/d\mu$ has the same integral property as $f$ but is $\mathcal{G}$-measurable, while, of course, $f$ need not be. If $\mu$ is a probability measure, then $d\mu_f/d\mu$ is called the conditional expectation of $f$ given $\mathcal{G}$, studied in detail in Chapter 18. For a concrete example, let $(X, \mathcal{F}, \mu)$ be the product of the probability spaces $(X_1, \mathcal{F}_1, \mu_1)$ and $(X_2, \mathcal{F}_2, \mu_2)$ and take $\mathcal{G} = \mathcal{F}_1 \times X_2$ so that
$$\int_{E_1 \times X_2} f\,d\mu = \int_{E_1} \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2)\,d\mu_1(x_1), \quad E_1 \in \mathcal{F}_1.$$
Thus
$$\frac{d\mu_f}{d\mu}(x_1) = \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2),$$
where we have omitted the redundant argument $x_2$ in $d\mu_f/d\mu$. Viewing $X_1 \times X_2$ as the set of outcomes of a two-stage experiment and taking $x_1$ as the outcome of stage one, we see that $d\mu_f/d\mu(x_1)$ is the average of $f$ over the possible outcomes of stage two. Thus if the σ-field $\mathcal{F}_1$ is interpreted as “given information,” namely as the information revealed after the first stage, then $d\mu_f/d\mu$ incorporates both the “known” and an average over the “unknown.” Therefore, $d\mu_f/d\mu$ may be interpreted as the best information regarding $f$ that is available after stage one but before stage two. ♦
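The two-stage interpretation can be made concrete on a finite product space, where the conditional expectation is a weighted average over the second coordinate. The following is a minimal sketch; the spaces, the weights, and the values of $f$ are hypothetical and chosen here only to illustrate the identity (5.6) for sets of the form $E_1 \times X_2$.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical probability weights on X1 = {0, 1} and X2 = {0, 1, 2}.
MU1 = {0: Fr(1, 3), 1: Fr(2, 3)}
MU2 = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(1, 4)}
# Hypothetical values f(x1, x2).
F = {(0, 0): Fr(1), (0, 1): Fr(5), (0, 2): Fr(-2),
     (1, 0): Fr(0), (1, 1): Fr(4), (1, 2): Fr(8)}


def cond_exp(x1):
    """Average of f(x1, .) over stage two; a function of x1 alone."""
    return sum(F[(x1, x2)] * MU2[x2] for x2 in MU2)


def integral_f(E1):
    """Integral of f over E1 x X2 with respect to the product measure."""
    return sum(F[(x1, x2)] * MU1[x1] * MU2[x2] for x1 in E1 for x2 in MU2)


def integral_cond_exp(E1):
    """Integral of the conditional expectation over E1 x X2."""
    return sum(cond_exp(x1) * MU1[x1] for x1 in E1)


SUBSETS_X1 = [set(c) for r in range(len(MU1) + 1) for c in combinations(sorted(MU1), r)]
for E1 in SUBSETS_X1:
    assert integral_f(E1) == integral_cond_exp(E1)
print({x1: cond_exp(x1) for x1 in MU1})
```

Both integrals agree on every set $E_1 \times X_2$, even though the averaged function no longer depends on $x_2$.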
To see this, let $s$ denote the supremum on the right and let $f_n \in \mathscr{F}$ such that $\int f_n\,d\mu \to s$. Replacing $f_n$ by $f_1 \vee \cdots \vee f_n$ if necessary, we may assume that $f_n \uparrow h$ for some measurable $h \ge 0$. By the monotone convergence theorem,
$$\int_E h\,d\mu = \lim_n \int_E f_n\,d\mu \le \nu(E), \quad E \in \mathcal{F}.$$
Therefore, $h \in \mathscr{F}$ and $s = \int_X h\,d\mu$, verifying the claim.
Now define
$$\eta(E) := \nu(E) - \int_E h\,d\mu, \quad E \in \mathcal{F}.$$
Since $h \in \mathscr{F}$, $\eta(E) \ge 0$ for every $E \in \mathcal{F}$. Therefore, $\eta$ is a finite measure. The proof of the theorem for Case I will be complete once we show that $\eta(X) = 0$. Let $r > 0$ and let $(P, P^c)$ be a Hahn decomposition for the signed measure $\eta - r\mu$. Since $(\eta - r\mu)(E \cap P) \ge 0$,
$$\nu(E) = \int_E h\,d\mu + \eta(E) \ge \int_E h\,d\mu + r\mu(E \cap P) = \int_E (h + r 1_P)\,d\mu, \quad E \in \mathcal{F},$$
But if $\mu(B^c \cap E) = 0$, then by absolute continuity $\nu(B^c \cap E) = 0$, hence both sides of (‡) are zero. On the other hand, if $\mu(B^c \cap E) > 0$, then the right side is $\infty$. In this case the left side must also be $\infty$, since otherwise $B \cup (B^c \cap E)$ would be in $\mathcal{A}$, impossible because $\mu\bigl(B \cup (B^c \cap E)\bigr) = s + \mu(B^c \cap E) > s$.
Case IV. µ is σ-finite and ν is an arbitrary measure.
The proof is similar to that of Case II. The details are left to the reader.
Case V. µ is σ-finite and ν is an arbitrary signed measure.
Apply Case IV to $\nu^+$ and $\nu^-$ to obtain nonnegative measurable functions $h_1$ and $h_2$ such that
$$\nu^+(E) = \int_E h_1\,d\mu \quad \text{and} \quad \nu^-(E) = \int_E h_2\,d\mu, \quad E \in \mathcal{F}.$$
Since $\nu^+(X)$ and $\nu^-(X)$ are not both infinite, one of the $h_j$ is $\mu$-integrable. Taking $h := h_1 - h_2$ produces the desired result.
Case VI. µ is σ-finite and ν is an arbitrary complex measure.
Apply Case V to νr and νi . The details are left to the reader.
Lebesgue Decomposition of a Measure
The following result, a consequence of the Radon-Nikodym theorem, asserts that for a suitable pair of measures $\mu$ and $\rho$, the former may be decomposed into parts that are, respectively, absolutely continuous and singular with respect to the latter. This decomposition will lead to an important result in the next section regarding the derivative of a Lebesgue-Stieltjes measure on $\mathbb{R}^d$.
5.3.7 Lebesgue Decomposition Theorem. Let $\rho$ be a σ-finite measure and $\mu$ a signed (resp., complex) measure on $(X, \mathcal{F})$ such that $|\mu|$ is σ-finite. Then there exist unique signed (resp., complex) measures $\mu_a$ and $\mu_s$ such that $\mu = \mu_a + \mu_s$, $\mu_a \ll \rho$, and $|\mu_s| \perp \rho$. Furthermore, if $\mu$ is a measure, then so are $\mu_a$ and $\mu_s$.
Proof. Suppose first that $\mu$ is a measure. Consider the σ-finite measure $m = \rho + \mu$. By 5.3.6(a) there exists a measurable function $h$ ($0 \le h \le 1$) such that for all $E \in \mathcal{F}$
$$\mu(E) = \int_E h\,dm \quad \text{and} \quad \rho(E) = \int_E (1 - h)\,dm.$$
Define
$$\mu_a(E) = \mu\bigl(E \cap \{h < 1\}\bigr) \quad \text{and} \quad \mu_s(E) = \mu\bigl(E \cap \{h = 1\}\bigr).$$
Clearly, $\mu_a + \mu_s = \mu$. If $\rho(E) = 0$, then $h = 1$ $m$-a.e. and hence also $\mu$-a.e. on $E$ and so $\mu_a(E) = 0$. Therefore, $\mu_a \ll \rho$. Since $\mu_s(h < 1) = 0 = \rho(h = 1)$, $\mu_s \perp \rho$. This proves the theorem for the case $\mu$ a measure.
If $\mu$ is a signed measure, then, by the previous paragraph, there exist measures $\mu_{a1}, \mu_{a2}$ and $\mu_{s1}, \mu_{s2}$ such that
$$\mu^+ = \mu_{a1} + \mu_{s1}, \ \mu_{a1} \ll \rho, \ |\mu_{s1}| \perp \rho, \quad \text{and} \quad \mu^- = \mu_{a2} + \mu_{s2}, \ \mu_{a2} \ll \rho, \ |\mu_{s2}| \perp \rho.$$
Set $\mu_a = \mu_{a1} - \mu_{a2}$ and $\mu_s = \mu_{s1} - \mu_{s2}$. Clearly, $\mu_a \ll \rho$. Also, if $|\mu_{sj}|(E_j) = \rho(E_j^c) = 0$, then $|\mu_s|(E_1 \cap E_2) \le |\mu_{s1}|(E_1 \cap E_2) + |\mu_{s2}|(E_1 \cap E_2) = 0$ and $\rho\bigl((E_1 \cap E_2)^c\bigr) = 0$, so $|\mu_s| \perp \rho$. A similar argument proves the complex case.
For uniqueness, assume that
$$\mu = \mu_a' + \mu_s', \quad \text{where } \mu_a' \ll \rho \text{ and } |\mu_s'| \perp \rho.$$
Then, $\mu_a - \mu_a' = \mu_s' - \mu_s$, hence the common value is both absolutely continuous and singular with respect to $\rho$ and so must be zero (5.3.1(d)).
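The construction in the proof can be traced on a finite set, where every measure is given by point masses and the density $h = d\mu/d(\rho + \mu)$ is a ratio of masses. The following is a minimal sketch under that assumption; the masses are hypothetical, and $h$ is set to $0$ at points where $\rho + \mu$ vanishes (it is only determined up to $(\rho+\mu)$-null sets).

```python
from fractions import Fraction
from itertools import combinations

X = [0, 1, 2, 3]
# Hypothetical point masses of rho and mu.
RHO = {0: Fraction(1), 1: Fraction(2), 2: Fraction(0), 3: Fraction(0)}
MU = {0: Fraction(3), 1: Fraction(0), 2: Fraction(5), 3: Fraction(0)}


def measure(weights, E):
    return sum(weights[x] for x in E)


def density_h(x):
    """h = d(mu)/d(rho + mu) at the point x; set to 0 where rho + mu vanishes."""
    m = RHO[x] + MU[x]
    return MU[x] / m if m != 0 else Fraction(0)


def mu_a(E):
    """Absolutely continuous part: mu restricted to {h < 1}."""
    return sum(MU[x] for x in E if density_h(x) < 1)


def mu_s(E):
    """Singular part: mu restricted to {h = 1}."""
    return sum(MU[x] for x in E if density_h(x) == 1)


SUBSETS = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]
S = {x for x in X if density_h(x) == 1}  # the set {h = 1}

for E in SUBSETS:
    assert mu_a(E) + mu_s(E) == measure(MU, E)       # mu = mu_a + mu_s
    if measure(RHO, E) == 0:
        assert mu_a(E) == 0                          # mu_a << rho
assert measure(RHO, S) == 0 and mu_s(set(X) - S) == 0  # mu_s is singular to rho
print(S, mu_a(set(X)), mu_s(set(X)))
```

In this discrete setting the singular part is simply the portion of $\mu$ carried by the points to which $\rho$ assigns no mass.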
5.3.8 Remark. The conclusion of the theorem is false if $|\mu|$ is not σ-finite. For example, take $\rho = \lambda$ and let $\mu$ = counting measure on $\mathcal{B}[0, 1]$. Suppose that $\mu = \mu_a + \mu_s$, where $\mu_a \ll \lambda$ and $\mu_s \perp \lambda$. Then $\mu_s(A^c) = \lambda(A) = 0$ for some $A \in \mathcal{B}[0, 1]$ and $\mu_a\{x\} = 0$ for all $x$. Since $\mu_s\{x\} = \mu_s\{x\} + \mu_a\{x\} = \mu\{x\} = 1$, $A^c = \emptyset$. But then $A = [0, 1]$, impossible. ♦
Exercises
5.21 Let $\mu$ and $\nu$ be finite measures with $\nu \ll \mu$ and let $a > 0$. Find a Hahn decomposition of $\nu - a\mu$ in terms of $h = d\nu/d\mu$.
5.22 Let $p > 0$ and define $\nu(E) = \int_E x^p\,d\lambda(x)$, $E \in \mathcal{B}[1, \infty)$. Show that $\nu \ll \lambda$, but the limit
5.23 Let $f$ be the (increasing, continuous) Cantor function and let $\nu$ be the probability measure on $\mathcal{B}[0, 1]$ with distribution function $f$. Show that $\nu \perp \lambda$.
5.24 Let $\nu_1$ and $\nu_2$ be complex measures, $\mu$ a σ-finite measure, and $c_1, c_2 \in \mathbb{C}$. Show that if $\nu_1 \ll \mu$ and $\nu_2 \ll \mu$, then $c_1\nu_1 + c_2\nu_2 \ll \mu$ and
$$\frac{d(c_1\nu_1 + c_2\nu_2)}{d\mu} = c_1\frac{d\nu_1}{d\mu} + c_2\frac{d\nu_2}{d\mu}.$$
5.25 Let $\mu_1$ and $\mu_2$ be signed measures. Find a Hahn decomposition for $\mu_1 + \mu_2$. ⟦Consider Radon-Nikodym derivatives.⟧
5.26 Let $\mu$ be a σ-finite measure and $\nu$ a signed or complex measure with $\nu \ll \mu$. Show that
$$\frac{d|\nu|}{d\mu} = \Bigl|\frac{d\nu}{d\mu}\Bigr|.$$
5.27 Let $\mu_j$ be σ-finite measures with $\mu_1 \ll \mu_2$ and $\mu_2 \ll \mu_3$. Prove:
$$\frac{d\mu_1}{d\mu_3} = \frac{d\mu_1}{d\mu_2}\,\frac{d\mu_2}{d\mu_3}, \quad \mu_3\text{-a.e.}$$
5.28 Let $\mu$ and $\nu$ be σ-finite measures with $\nu \ll \mu$. Show that
$$\frac{d\nu}{d(\mu + \nu)} = \frac{d\nu}{d\mu}\Bigl(1 + \frac{d\nu}{d\mu}\Bigr)^{-1}.$$
5.29 [↑ 5.13] Let $\mu$ be a measure and let $\mu_1$ and $\mu_2$ be finite signed measures with $\mu_j \ll \mu$. Show that $(\mu_1 \vee \mu_2) \ll \mu$ and
$$\frac{d(\mu_1 \vee \mu_2)}{d\mu} = \frac{d\mu_1}{d\mu} \vee \frac{d\mu_2}{d\mu}.$$
Show conversely that if $(\mu_1 \vee \mu_2) \ll \mu$, then $\mu_1 \ll \mu$ and $\mu_2 \ll \mu$. Formulate and prove the analogous assertions for $\mu_1 \wedge \mu_2$.
5.30 Let $\mu_1$ and $\mu_2$ be finite measures. Show that $\mu_1 \perp \mu_2$ iff $\mu_1 \wedge \mu_2 = 0$ iff $\mu_1 \vee \mu_2 = \mu_1 + \mu_2$. ⟦In one direction use Ex. 5.8.⟧
5.31 Two σ-finite measures $\mu$ and $\nu$ are said to be equivalent if $\mu \ll \nu$ and $\nu \ll \mu$, that is, $\mu$ and $\nu$ have the same sets of measure zero.
(a) Show that $\mu$ and $\nu$ are equivalent iff there exists a finite, positive, measurable function $h$ such that $\nu = h\mu$.
(b) Show that every σ-finite measure $\mu$ is equivalent to some probability measure $\nu$. ⟦Consider an infinite series of measures.⟧
5.32 For $j = 1, 2$, let $\mu_j$ and $\nu_j$ be nontrivial σ-finite measures on $(X_j, \mathcal{F}_j)$. Show that $\nu_1 \otimes \nu_2 \ll \mu_1 \otimes \mu_2$ iff $\nu_1 \ll \mu_1$ and $\nu_2 \ll \mu_2$, in which case
$$\frac{d(\nu_1 \otimes \nu_2)}{d(\mu_1 \otimes \mu_2)}(x_1, x_2) = \frac{d\nu_1}{d\mu_1}(x_1)\,\frac{d\nu_2}{d\mu_2}(x_2).$$
5.33 Let T : (X, F) → (Y, G) be measurable, µ a σ-finite measure and ν a complex or signed measure
on X such that ν µ. (a) Show that T (ν) T (µ). (b) Suppose also that T −1 : (Y, G) → (X, F)
exists and is measurable. Prove that
d T (ν) dν
= ◦ T −1 .
d T (µ) dµ
∞
dµ X dηn
= ν-a.e.
dν n=1
dν
5.36 Let µ, ν, νn be measures with µ σ-finite and νn(E) ↑ ν(E) for every E ∈ F. Show that ν ≪ µ iff νn ≪ µ for all n, in which case
\[ \frac{d\nu}{d\mu} = \lim_n \frac{d\nu_n}{d\mu} \quad \mu\text{-a.e.} \]
5.37 [↓ 5.5.9] Let µ and µn be finite measures with µ = Σ_n µn and let µn = µna + µns and µ = µa + µs be the Lebesgue decompositions with respect to a σ-finite measure ρ. Show that
\[ \mu_a = \sum_n \mu_{na} \quad\text{and}\quad \mu_s = \sum_n \mu_{ns}. \]
Let µ be a signed measure on B(R^d) which is finite on bounded sets. We shall call such a measure a Lebesgue-Stieltjes signed measure. For each x ∈ R^d and r > 0, let $\mathcal{B}(x, r)$ denote the collection of all open balls containing x and with radius less than r. Define
\[ \overline{D}(\mu; x, r) := \sup\Big\{\frac{\mu(B)}{\lambda(B)} : B \in \mathcal{B}(x, r)\Big\} \quad\text{and}\quad \underline{D}(\mu; x, r) := \inf\Big\{\frac{\mu(B)}{\lambda(B)} : B \in \mathcal{B}(x, r)\Big\}, \]
where for simplicity of notation we set λ = λ^d. Note that for fixed x, the functions $\overline{D}(\mu; x, r)$ and $\underline{D}(\mu; x, r)$ decrease and increase, respectively, as r ↓ 0. Moreover, for each c and r the sets $\{x : \overline{D}(\mu; x, r) > c\}$ and $\{x : \underline{D}(\mu; x, r) < c\}$ are open (Ex. 5.38). Thus $\overline{D}(\mu; x, r)$ and $\underline{D}(\mu; x, r)$ are Borel measurable in x for fixed r.
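Now define the upper and lower derivates $\overline{D}\mu$ and $\underline{D}\mu$ of µ by
\[ \overline{D}\mu(x) := \lim_{r\to0^+}\overline{D}(\mu; x, r) = \inf_{r>0}\overline{D}(\mu; x, r), \qquad \underline{D}\mu(x) := \lim_{r\to0^+}\underline{D}(\mu; x, r) = \sup_{r>0}\underline{D}(\mu; x, r). \]
Then
\[ \underline{D}(\mu; x, r) \uparrow \underline{D}\mu(x) \quad\text{and}\quad \overline{D}(\mu; x, r) \downarrow \overline{D}\mu(x) \quad\text{as } r\downarrow0, \]
so by the preceding observations the functions $\overline{D}\mu$ and $\underline{D}\mu$ are Borel measurable. If $\overline{D}\mu(x)$ and $\underline{D}\mu(x)$ are finite and equal, then µ is said to be differentiable at x. In this case, the common value is denoted by Dµ(x) and is called the derivative of µ at x. Note that the inequalities
\[ \underline{D}(\mu; x, r) \le \frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \overline{D}(\mu; x, r) \]
imply that
\[ \underline{D}\mu(x) \le \liminf_{r\to0}\frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \limsup_{r\to0}\frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \overline{D}\mu(x). \]
Letting r → 0, we obtain
\[ \underline{D}\mu(x) \le \liminf_n\frac{\mu(B_n)}{\lambda(B_n)} \le \limsup_n\frac{\mu(B_n)}{\lambda(B_n)} \le \overline{D}\mu(x). \]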
\[ \underline{D}\mu(x) = \lim_{r\to0^+}\,\inf_{0<|h|\le r}\frac{F(x+h)-F(x)}{h} \quad\text{and}\quad \overline{D}\mu(x) = \lim_{r\to0^+}\,\sup_{0<|h|\le r}\frac{F(x+h)-F(x)}{h}. \]
Proof. Note first that F has at most countably many discontinuities. This follows from
µ = µ+ − µ− , allowing us to write F (b) − F (a) = [F+ (b) − F+ (a)] − [F− (b) − F− (a)], where
F± are distribution functions, hence nondecreasing, and so have at most countably many
discontinuities (Ex. 1.84).
We prove only the equality for $\underline{D}\mu(x)$. (The other equality follows by considering −µ.) Define
\[ f(r) := \inf_{0<|h|\le r}\frac{F(x+h)-F(x)}{h} \quad\text{and}\quad f(0+) := \lim_{r\to0^+} f(r). \]
\[ \frac{F(x+(1+t_n)h_n) - F(x-t_nh_n)}{h_n} < a. \]
Since F has at most countably many discontinuities, we may take tn so that F is continuous at x + (1 + tn)hn. Setting Bn = (x − tn hn, x + (1 + tn)hn) and rn = hn(1 + 2tn)/2 (the radius of the interval Bn), we then have
\[ \underline{D}(\mu, x, r_n) \le \frac{\mu(B_n)}{\lambda(B_n)} = \frac{F(x+(1+t_n)h_n) - F(x-t_nh_n)}{h_n(1+2t_n)} < \frac{a}{1+2t_n}. \]
\[ \frac{F(x) - F(x-k_n)}{k_n} < a. \]
By right continuity at x, there exists 0 < tn < 1/n such that
\[ \frac{F(x+t_nk_n) - F(x-k_n)}{k_n} < a. \]
Setting Bn = (x − kn, x + tn kn) and rn = kn(1 + tn)/2 we then have
\[ \underline{D}(\mu, x, r_n) \le \frac{\mu(B_n)}{\lambda(B_n)} = \frac{F(x+t_nk_n) - F(x-k_n)}{k_n(1+t_n)} < \frac{a}{1+t_n}. \]
⟦Choose the notation so that the radius of Bi decreases as i increases. Let k1 = 1, and successively choose ki ∈ N such that $k_{i+1}$ is the smallest index j > ki for which Bj is disjoint from $B_{k_1} \cup \cdots \cup B_{k_i}$. Let km be the index for which the process stops, so that the collection $\{B_{k_1}, \ldots, B_{k_m}\}$ is disjoint. By choice of km, if j > km or $k_i < j < k_{i+1}$, then $B_j \cap B_{k_q} \ne \emptyset$ for some q with kq < j. Thus for each j = 1, . . . , n there exists kq ≤ j such that $B_j \cap B_{k_q} \ne \emptyset$. Now let $A_{k_q}$ be the ball with the same center as $B_{k_q}$ and with triple the radius. Since j ≥ kq, the radius of Bj is no larger than that of $B_{k_q}$, hence $B_j \subseteq A_{k_q}$. Therefore $\bigcup_{j=1}^{n} B_j \subseteq \bigcup_{i=1}^{m} A_{k_i}$, and the desired inequality follows from the dilation property of λ.⟧
(2) Let µ be nonnegative, c > 0, and K ⊆ {Dµ > c} compact. Then cλ(K) ≤ 3^d µ(K).
⟦Let r > 0. For each x ∈ K, choose $B \in \mathcal{B}(x, r)$ such that µ(B)/λ(B) > c. By compactness, there exists a finite subcover B1, . . . , Bn of K of such balls. Choose $\{B_{k_1}, \ldots, B_{k_m}\}$ as in step (1). Then
\[ \lambda(K) \le 3^d\sum_{i=1}^{m}\lambda(B_{k_i}) \le \frac{3^d}{c}\sum_{i=1}^{m}\mu(B_{k_i}) = \frac{3^d}{c}\,\mu\Big(\bigcup_{i=1}^{m} B_{k_i}\Big). \]
Since $B_{k_i}$ has radius < r and meets K it must be contained in Ur := {x : d(x, K) < 2r}. Therefore, λ(K) ≤ (3^d/c)µ(Ur). Letting r ↓ 0 yields λ(K) ≤ (3^d/c)µ(K).⟧
(3) If µ is nonnegative and µ(E) = 0, then Dµ(x) = 0 for λ-a.a. x ∈ E.
⟦Let c > 0 and $B := E \cap \{\overline{D}\mu > c\}$. We show that λ(B) = 0. By regularity, it suffices to show that λ(K) = 0 for any compact K ⊆ B. But this follows from step (2), since λ(K) ≤ (3^d/c)µ(K) ≤ (3^d/c)µ(E) = 0.⟧
hence
\[ \frac{\mu(B)}{\lambda(B)} \le \frac{\rho(B)}{\lambda(B)} + t. \]
Since B was arbitrary, $\overline{D}\mu \le \overline{D}\rho + t$. Set E := {h < t}. Then ρ(E) = 0, so by step (3) applied to ρ, Dρ(x) = 0 for λ-a.a. x ∈ E. Therefore, $\overline{D}\mu(x) \le t$ for λ-a.a. x ∈ E, as required.⟧
The desired equality now follows by applying step (4) to µs and step (5) to µa .
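The selection procedure described in the hint for step (1) can be sketched in code. The following is only an illustration in dimension one, where balls are open intervals stored as hypothetical `(center, radius)` pairs; the function names are not from the text. It scans the balls by decreasing radius, keeps each ball that is disjoint from those already kept, and then checks that every original ball lies inside a kept ball with tripled radius.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (center, radius) of an open interval in R


def select_disjoint(balls: List[Interval]) -> List[Interval]:
    """Greedy selection: scan balls by decreasing radius and keep each
    ball that is disjoint from all balls kept so far."""
    ordered = sorted(balls, key=lambda b: b[1], reverse=True)
    kept: List[Interval] = []
    for c, r in ordered:
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are
        # disjoint exactly when |c - c2| >= r + r2.
        if all(abs(c - c2) >= r + r2 for c2, r2 in kept):
            kept.append((c, r))
    return kept


def covered_by_tripled(ball: Interval, kept: List[Interval]) -> bool:
    """Check that `ball` lies inside some kept ball with tripled radius."""
    c, r = ball
    return any(abs(c - c2) + r <= 3 * r2 for c2, r2 in kept)


balls = [(0.0, 1.0), (1.5, 0.8), (2.9, 0.5), (0.2, 0.3), (5.0, 0.4), (5.3, 0.2)]
kept = select_disjoint(balls)
all_covered = all(covered_by_tripled(b, kept) for b in balls)
```

A discarded ball meets a kept ball of at least its own radius, which is why tripling the kept radii is enough to cover it.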
5.4.6 Corollary. Let µ and F be as in 5.4.4. Then F′ = dµa/dλ λ-a.e.
Exercises
5.38 Let µ be a Lebesgue-Stieltjes signed measure on B(R^d). Show that the sets $\{x : \overline{D}(\mu; x, r) > c\}$ and $\{x : \underline{D}(\mu; x, r) < c\}$ are open.
5.40 Prove that if µ is a nonnegative measure, then $\int_E (D\mu)\,d\lambda \le \mu(E)$ for all E ∈ B(R^d).
5.42 Let f be locally integrable on R^d. Verify (a)–(c) to prove the Lebesgue differentiation theorem: For λ-a.e. x,
\[ \lim_{r\to0}\frac{1}{\lambda(B(x, r))}\int_{B(x,r)}|f(y)-f(x)|\,dy = 0. \]
(c) Set $N = \bigcup_{a\in\mathbb{Q}} N_a$. Let ε > 0, x ∈ N^c and choose a ∈ Q such that |f(x) − a| < ε. There exists δ > 0 such that for all r < δ,
\[ \frac{1}{\lambda(B(x, r))}\int_{B(x,r)}|f(y)-f(x)|\,dy < 2\varepsilon. \]
Note that if Q ⊇ P, then, by the triangle inequality, VI,Q (f ) ≥ VI,P (f ). The total variation
of f on I is the extended real number
where the supremum is taken over all ordered subsets P of I. We say that f has bounded
variation on I if VI (f ) < ∞. We shall mainly be concerned with the cases I = R and
I = [a, b]. For the latter, we may assume in (5.9) that P is a partition of [a, b], as the
supremum does not change by adjoining the points a and b. We denote the set of all functions
with bounded variation on I by BV(I).
By the mean value theorem, a real-valued function with a bounded derivative has bounded
variation on bounded intervals. In particular, sin x has bounded variation on any bounded
interval (but not on R: consider partition points (2k + 1)π/2).
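As a small numerical illustration of the remark about sin x, the sketch below computes the P-variation over the partition points (2k + 1)π/2 mentioned above. The helper names `variation` and `odd_half_pi` are hypothetical; the point is only that each consecutive pair of points contributes a jump of size 2, so the sums grow without bound as more points are used.

```python
import math
from typing import Callable, List, Sequence


def variation(f: Callable[[float], float], points: Sequence[float]) -> float:
    """P-variation: sum of |f(x_j) - f(x_{j-1})| over the ordered points P."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))


def odd_half_pi(n: int) -> List[float]:
    """Partition points (2k + 1) * pi / 2 for k = 0, ..., n."""
    return [(2 * k + 1) * math.pi / 2 for k in range(n + 1)]


v10 = variation(math.sin, odd_half_pi(10))    # 10 consecutive pairs
v100 = variation(math.sin, odd_half_pi(100))  # 100 consecutive pairs
```

Up to rounding, `v10` is 20 and `v100` is 200, consistent with sin having unbounded variation on R.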
The following proposition summarizes the elementary properties of BV (I). The proof is
left as an exercise for the reader (5.44).
5.5.1 Proposition. Let I be any interval.
(a) A bounded, monotone function f on R has bounded variation.
By the proposition, the difference of two bounded monotone functions on I has bounded
variation on I. The converse also holds:
5.5.2 Proposition. If f ∈ BV (I) is real-valued, then there exist nondecreasing functions g
and h on I such that f = g − h. In particular, f is a Borel function.
Proof. For definiteness, we take I = R. For x ∈ R, define
so that f = g − h. Clearly, g is increasing. To see that h is increasing, let a < x < y, let Px
be an arbitrary partition of [a, x], and set Py := Px ∪ {y}. Then
Taking the supremum over all partitions Px yields V[a,x] (f ) + f (y) − f (x) ≤ g(y). Since a
was arbitrary, g(x) + f (y) − f (x) ≤ g(y), that is, h(x) ≤ h(y).
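The decomposition in 5.5.2 has a simple discrete analogue, sketched here under the assumption that f is only sampled on a finite ordered grid. In this sketch g is the cumulative variation along the grid (not the supremum over all partitions used in the proof) and h = g − f; the function name `jordan_decomposition` is hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Tuple


def jordan_decomposition(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Given samples f(x_0), ..., f(x_n) on an ordered grid, return (g, h)
    with g the cumulative variation along the grid and h = g - f."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    g = [0.0] + list(accumulate(steps))
    h = [gi - fi for gi, fi in zip(g, values)]
    return g, h


grid = [k * 0.01 for k in range(1001)]  # grid on [0, 10]
f = [math.sin(x) for x in grid]
g, h = jordan_decomposition(f)
```

Each increment of h equals |Δf| − Δf ≥ 0, which is the discrete form of the inequality h(x) ≤ h(y) obtained in the proof.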
Since monotone functions have at most countably many discontinuities (Ex. 1.84), we
have
5.5.3 Corollary. If f ∈ BV (R), then f has at most countably many discontinuities.
\[ T_f(x) := V_{(-\infty,x]}(f), \quad x \in \mathbb{R}. \]
Clearly, Tf is increasing, hence has bounded variation on any bounded interval. The theorem
below makes a connection between the total variation function and the total variation
measure of a complex measure. For the proof we need the following lemmas.
5.5.4 Lemma. Let f ∈ BV (R). Then for x < y, Tf (y) − Tf (x) = V(x,y] (f ).
Proof. Set T := Tf . Note first that for the sets P = {x0 < x1 < · · · < xn = y} ⊆ (−∞, y]
implicit in the definition of T (y), we may assume that x0 ≤ x, otherwise simply adjoin a
suitable point to P, increasing the P-variation of f but not altering T (y). Choose k so that
xk ≤ x < xk+1 and set
Since |f(x1) − f(x0)| ≤ |f(x1) − f(x)| + |f(x0) − f(x)| < 2ε, we have
\[ \sum_{j=2}^{n}|f(x_j)-f(x_{j-1})| = \sum_{j=1}^{n}|f(x_j)-f(x_{j-1})| - |f(x_1)-f(x_0)| \ge T(y) - T(x) - 3\varepsilon. \tag{†} \]
Since the left side is ≤ V(x,y] = T (y) − T (x), we see that T (x1 ) − T (x) ≤ 4ε. Letting x1 ↓ x
yields T (x+) − T (x) ≤ 4ε. Therefore, T (x+) = T (x), as required.
5.5.6 Corollary. Let f ∈ BV (R) be right continuous. Then f = g − h, where g and h are
distribution functions.
For the reverse inequality, let E1 , . . . , Ek be a measurable partition of (a, b]. Given ε > 0,
by regularity of |µ| (1.8.1) there exist compact sets Kj ⊆ Ej such that |µ|(Ej \ Kj ) < ε/k,
hence
\[ \sum_j |\mu(E_j)| \le \varepsilon + \sum_j |\mu(K_j)|. \tag{†} \]
Since the sets Kj are disjoint, there exist disjoint open sets Uj ⊇ Kj . Each Uj is a countable
union of disjoint open intervals (ajn , bjn ), hence
\[ \sum_j |\mu(K_j)| \le \sum_j\sum_n \big|\mu\big((a_{jn}, b_{jn}] \cap (a, b]\big)\big|. \]
Define
\[ \nu := \lim_{m\to\infty}\sum_{|x_n|<m} h(x_n)\,\delta_{x_n}. \]
By (†), ν is a Lebesgue-Stieltjes measure on R. Moreover, for any a < x < y < b, the quantity
h(x) + h(y) is zero unless x or y is one of the discontinuity points of f , in which case its
value is at most h(xm ) + h(xn ) for some m and n. Therefore, h(x) + h(y) ≤ ν(a, b) for all
a < x < y < b. For r > 0, we then have
\[ \frac{|h(x+r)-h(x)|}{r} \le \frac{h(x+r)+h(x)}{r} \le \frac{\nu(x-2r, x+2r)}{r} = 4\,\frac{\nu(B(x, 2r))}{\lambda(B(x, 2r))}, \]
hence
\[ \limsup_{r\to0}\frac{|h(x+r)-h(x)|}{r} \le 4\,\limsup_{r\to0}\frac{\nu(B(x, r))}{\lambda(B(x, r))}. \]
Since λ{x1, x2, . . .} = 0 = ν({x1, x2, . . .}^c), ν ⊥ λ, hence the right side of the preceding inequality is zero for λ-a.a. x (5.4.5). Therefore, lim_{r→0} r^{−1}|h(x + r) − h(x)| = 0 for λ-a.a. x,
completing the proof.
5.5.9 Corollary. Let (fn) be a sequence of nonnegative, nondecreasing functions on R such that the series f(x) := Σ_n fn(x) converges for all x. Then f′(x) = Σ_n fn′(x) λ-a.e.
Proof. Let g and gn correspond to f and fn as in the theorem. Then g(x) = Σ_n gn(x), as is easily verified, hence we may assume that f and fn are distribution functions on R. Let µn and µ be the corresponding Lebesgue-Stieltjes measures. By the hypothesis, µ = Σ_n µn on intervals (a, b], hence, by the uniqueness theorem for measures, µ = Σ_n µn on B(R). Let µn = µna + µns and µ = µa + µs be the Lebesgue decompositions with respect to λ. By Ex. 5.37, µa = Σ_n µna, so
\[ \int_E \frac{d\mu_a}{d\lambda}\,d\lambda = \mu_a(E) = \sum_n \mu_{na}(E) = \int_E \sum_n \frac{d\mu_{na}}{d\lambda}\,d\lambda \quad\text{for all } E \in \mathcal{B}(\mathbb{R}), \]
hence
\[ \frac{d\mu_a}{d\lambda} = \sum_n \frac{d\mu_{na}}{d\lambda} \quad \lambda\text{-a.e.} \]
The assertion now follows from 5.4.6.
Exercises
5.44 Prove 5.5.1.
5.45 Let f, g ∈ BV (I) be real-valued. Prove that f ∨g, f ∧g ∈ BV (I). Show also that if |f (x)| ≥ c > 0
for all x, then 1/f ∈ BV (I).
5.46 Show that if E ⊆ R and E^c are dense in R, then 1_E ∉ BV(I) for all intervals I.
5.48 Let F ∈ BV(R) be right continuous and suppose that F(−∞) exists and is finite. Show that F′ ∈ L^1(λ). ⟦May assume F is a distribution function. Define a finite measure µ on B(R) so that µ(−∞, x] = F(x) − F(−∞).⟧
FIGURE 5.2: Construction of the sequence $x_{k_i}$.
choose an index k > ki for which xk − xki−1 < δ. Thus if ℓ denotes the length of I, then
\[ \ell \ge b - a \ge x_{k_{m-1}} - a = \sum_{i=1}^{m-1}\big(x_{k_i} - x_{k_{i-1}}\big) \ge (m-1)\delta/2. \tag{†} \]
Since $\sum_{j=k_{i-1}}^{k_i-1}(x_{j+1}-x_j) = x_{k_i} - x_{k_{i-1}} < \delta$, we have by absolute continuity
\[ \sum_{j=k_{i-1}}^{k_i-1}|f(x_{j+1})-f(x_j)| < 1. \]
Note that the inclusion in the proposition is always strict (see 5.6.7) and is clearly false
for unbounded intervals.
The next result complements 5.5.2.
5.6.3 Proposition. Let I be an arbitrary interval and let f ∈ AC(I) ∩ BV (I) be real-valued.
Then there exist monotone increasing functions g, h ∈ AC(I) ∩ BV (I) such that f = g − h.
Proof. As in the proof of 5.5.2, for definiteness we take I = R. It suffices to show that
the function g(x) := V(−∞,x] (f ) = Tf (x) in that proof is absolutely continuous, since
then h = g − f will also be absolutely continuous. Let ε > 0 and let δ correspond to ε in the definition of absolute continuity of f. Let (a1, b1), . . . , (an, bn) be disjoint with $\sum_{j=1}^{n}(b_j - a_j) < \delta$. For each j, let Pj be a partition of [aj, bj]. The open intervals formed by the totality of these partitions are disjoint and have total length < δ, hence, by absolute continuity, $\sum_{j=1}^{n} V_{P_j}(f) < \varepsilon$. Taking the supremum over P1, . . . , Pn and using 5.5.4 yields
\[ \sum_{j=1}^{n}[g(b_j)-g(a_j)] = \sum_{j=1}^{n} V_{(a_j,b_j]}(f) \le \varepsilon. \]
Thus µ(E) ≤ µ(U ) ≤ ε and so µ(E) = 0. The last assertion follows from 5.4.6.
Note that condition (a) in the preceding theorem implies that f is absolutely continuous.
Weakening this condition by requiring f to be merely absolutely continuous yields the
following version of the fundamental theorem of calculus.
5.6.6 Theorem. Let f : [a, b] → C. The following are equivalent:
(a) f is absolutely continuous.
\[ P_\varepsilon = \{\varepsilon < a_p < b_p < a_{p-1} < \cdots < a_k < b_k < \cdots < b_{q+1} < a_q < b_q < 1\} \]
of [ε, 1], where p and q are, respectively, the largest and smallest integers satisfying the inequalities ε < ap < bq < 1, or equivalently
\[ \frac{1}{2\pi} < q < p < \frac{2-\pi\varepsilon}{4\pi\varepsilon}. \]
From $f(a_k) - f(b_k) = c/(4k+1)^{\alpha}$ we have
\[ V_{[0,1]}(f) \ge V_{[\varepsilon,1]}(f) \ge V_{P_\varepsilon}(f) \ge c\sum_{k=q}^{p}\frac{1}{(4k+1)^{\alpha}}. \]
By choosing ε arbitrarily small, the upper limit p of the sum may be made arbitrarily large. Since $\sum_{k=1}^{\infty}(4k+1)^{-\alpha}$ diverges, $V_{[0,1]}(f) = \infty$. ♦
Exercises
5.49 Show that f, g ∈ AC[a, b] ⇒ f g ∈ AC[a, b].
5.50 Let p, q > 0. Show that the function f(x) := x^p sin(x^{−q}) (x > 0), f(0) = 0, is absolutely continuous on [0, 1] iff p > q.
5.51 The Cantor function f is an example of a continuous nondecreasing function on [0, 1] with f′ = 0 a.e. Extend the Cantor function to a nondecreasing function on R by defining f(x) = 0, x ≤ 0, and f(x) = 1, x ≥ 1. Define g on [0, 1] by $g(x) = \sum_{n=1}^{\infty} 2^{-n} f\big((x-a_n)/(b_n-a_n)\big)$, where the [an, bn] are the closed intervals in [0, 1] with rational endpoints. Show that g is continuous, strictly increasing, and g′ = 0 a.e.
5.52 Let f ∈ AC[c, d] and let g : [a, b] → [c, d] be strictly increasing with g([a, b]) = [c, d]. Show that
g ∈ AC[a, b] ⇒ f ◦ g ∈ AC[a, b]. Give an example of a strictly increasing function g for which
f ◦ g ∈ BV [a, b] \ AC[a, b] for nontrivial f ∈ AC[c, d].
The subject of this chapter plays an important role in many areas of science, technology, and
mathematics, including quantum physics, image processing, probability theory, statistics,
and differential equations. We begin with the notion of convolution, which is central to
Fourier analysis.
The basic properties of convolution are summarized in the following proposition. Note that
parts (a)–(e) of the proposition collectively assert that L1 (Rd ) is a commutative Banach
algebra under convolution.
6.1.1 Proposition. Let f, g, h ∈ L1 (Rd ), c ∈ C, φ ∈ Cc∞ (Rd ), and α a multi-index. Then
convolution f ∗ g is well-defined, f ∗ g ∈ L1 (Rd ), and the following hold.
Proof. To see that convolution is well-defined, note first that since the function (x, y) ↦ x − y is Borel measurable, the integrand is measurable in (x, y). Thus if f, g ≥ 0, the integral exists for all x. The inequality ‖f ∗ g‖_1 ≤ ‖f‖_1 ‖g‖_1, proved next, shows that f ∗ g is finite a.e. and in L^1. Considering real and imaginary parts and then positive and negative parts, we see that f ∗ g ∈ L^1 for every f, g ∈ L^1.
(a) By Fubini’s theorem and translation invariance,
\[ \int |(f*g)(x)|\,dx \le \iint |f(x-y)g(y)|\,dx\,dy = \iint |f(x)g(y)|\,dx\,dy = \|f\|_1\|g\|_1. \]
Approximate Identities
The Banach algebra L^1(R^d) does not possess an identity, that is, there is no function e such that f ∗ e = e ∗ f = f for all f ∈ L^1 (Ex. 6.2). However, L^1(R^d) has an approximate identity, as described in the following.
6.1.2 Theorem. Let φ ∈ L^1(R^d) with ∫φ dλ^d = 1. For n ∈ N and x ∈ R^d define φn(x) := n^d φ(nx). If 1 ≤ p < ∞, then f ∗ φn → f in L^p for all f ∈ L^p. The same conclusion holds for p = ∞ if f is uniformly continuous and bounded.
Proof. Let Tz denote translation by −z, that is, Tz f(x) = f(x − z). By the dilation property of λ^d,
\[ f*\phi_n(x) - f(x) = n^d\int f(x-y)\phi(ny)\,dy - \int f(x)\phi(y)\,dy = \int \big[T_{y/n}f(x) - f(x)\big]\phi(y)\,dy. \tag{†} \]
Therefore,
\[ \|f*\phi_n - f\|_p \le \Big(\int\Big(\int |T_{y/n}f(x)-f(x)|\,|\phi(y)|\,dy\Big)^{p}dx\Big)^{1/p} \le \int\Big(\int |T_{y/n}f(x)-f(x)|^{p}\,dx\Big)^{1/p}|\phi(y)|\,dy = \int \|T_{y/n}f-f\|_p\,|\phi(y)|\,dy, \]
the second inequality by 4.1.5. Since ‖T_{y/n} f − f‖_p → 0 (4.2.3) and ‖T_{y/n} f − f‖_p |φ(y)| ≤ 2‖f‖_p |φ(y)|, the dominated convergence theorem implies that ‖f ∗ φn − f‖_p → 0. This proves the first part of the theorem. The second part follows from (†), since by uniform continuity ‖T_{y/n} f − f‖_∞ → 0.
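The convergence in 6.1.2 can be observed numerically in a simple case. The sketch below assumes d = 1, f = 1_[−1,1], and φ(x) = exp(−πx²) (which integrates to 1); these choices, and the helper names, are illustrative and not taken from the text. For this pair f ∗ φn has a closed form in terms of the error function, and the L^1 error is estimated by a midpoint rule on a truncated interval.

```python
import math


def smoothed_indicator(x: float, n: int) -> float:
    """(f * phi_n)(x) for f = 1_[-1,1] and phi(x) = exp(-pi x^2), in closed form."""
    s = math.sqrt(math.pi) * n
    return 0.5 * (math.erf(s * (x + 1.0)) - math.erf(s * (x - 1.0)))


def l1_error(n: int, half_width: float = 4.0, step: float = 1e-3) -> float:
    """Midpoint-rule estimate of the L^1 norm of f * phi_n - f
    on [-half_width, half_width]."""
    count = int(round(2 * half_width / step))
    total = 0.0
    for k in range(count):
        x = -half_width + (k + 0.5) * step
        f = 1.0 if abs(x) <= 1.0 else 0.0
        total += abs(smoothed_indicator(x, n) - f) * step
    return total


errors = [l1_error(n) for n in (1, 2, 4, 8, 16)]
```

In this example the estimated errors decrease as n grows, roughly halving each time n doubles, in line with the theorem for p = 1.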
6.1.3 Remark. The function φ in the statement of the theorem may be taken to be C^∞ with support contained in a given compact interval. To see this, let h : R^d → [0, +∞) be a C^∞ function such that h > 0 on (−a, a), and h = 0 on (−a, a)^c, where a = (1, . . . , 1) (0.14.4). Then φ := (∫h)^{−1} h is C^∞ with support contained in [−a, a] and ∫φ = 1. As a consequence, given ε = (ε, . . . , ε), the support of φn is contained in [−ε, ε] for all large n. For future reference we note that because the interval is symmetric, h may be taken to be even. (Take a = −b in 0.14.2.) ♦
Exercises
6.1 Let f, g ∈ L^1(R^d). Show that ∫(f ∗ g)(x) dx = ∫f(x) dx · ∫g(x) dx.
6.2 Show that there is no function e ∈ L^1(R^d) such that f ∗ e = f for all f ∈ L^1.
6.3 Let a > 0 and f (x) = 1[−a,a] . Show that f ∗ f (x) = (2a − |x|)1[−2a,2a] .
6.4 Let Ta denote translation by a. Show that Ta (f ∗ g) = (Ta f ) ∗ g = f ∗ (Ta g).
6.5 Let 1 ≤ p < ∞, q conjugate to p, f ∈ L^p and g ∈ L^q. Prove:
(a) ‖f ∗ g‖_∞ ≤ ‖f‖_p ‖g‖_q. (b) f ∗ g is uniformly continuous. (c) lim_{|x|→∞} f ∗ g(x) = 0 (p > 1).
6.6 Show that if f, g ∈ L1 (Rd ), then supp f ∗ g is contained in the closure K of supp f + supp g. In
particular, the members of Cc (Rd ) ∗ Cc (Rd ) have compact support.
6.7 Let f ∈ L^1(R^d) and g ∈ L^p(R^d) (1 ≤ p ≤ ∞). Prove that f ∗ g(x) exists for a.a. x and that ‖f ∗ g‖_p ≤ ‖f‖_1 ‖g‖_p.
6.8 [↑ 6.7, 4.6] Let p, q, r ∈ [1, ∞] such that p^{−1} + q^{−1} = 1 + r^{−1}, and let f ∈ L^p(R^d), g ∈ L^q(R^d). Prove that ∫|f(x − y)g(y)| dy < ∞ for a.a. x and that ‖f ∗ g‖_r ≤ ‖f‖_p ‖g‖_q. ⟦Eliminate the special cases (1) p = q/(q − 1), r = ∞, (2) q = 1, r = p, and (3) p = 1, r = q. Then let p, q, r be finite and write |f(x − y)g(y)| = |f(x − y)|^{1−p/r} |g(y)|^{1−q/r} |f(x − y)|^{p/r} |g(y)|^{q/r}.⟧
formula.
Additional properties of the transform are given in the next proposition. The following
notation will be needed:
\[ x^{\alpha} := x_1^{\alpha_1}\cdots x_d^{\alpha_d}, \]
where x = (x1, . . . , xd) and α = (α1, . . . , αd) is a multi-index.
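(a) $\widehat{f*g} = \hat f\,\hat g$. (b) $\widehat{\partial^{\alpha}\phi}(\xi) = (2\pi i\,\xi)^{\alpha}\,\hat\phi(\xi)$.
(c) $\widehat{f\circ T} = |\det T|^{-1}\,\hat f\circ T^{*-1}$. (d) $\widehat{T_a f}(\xi) = e^{2\pi i\,\xi\cdot a}\,\hat f(\xi)$.
(e) $T_a\hat f = \hat h$, $h(x) := e^{-2\pi i\,a\cdot x}f(x)$. (f) $\widehat{D_r f} = r^{-d}D_{1/r}\hat f$.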
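\[ \widehat{\frac{\partial\phi}{\partial x_1}}(\xi) = \int\!\cdots\!\int e^{-2\pi i\,\xi\cdot x}\,\phi_{x_1}(x)\,dx_1\cdots dx_d = 2\pi i\,\xi_1\int e^{-2\pi i\,\xi\cdot x}\phi(x)\,dx = 2\pi i\,\xi_1\,\hat\phi(\xi). \]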
(The constant term is absent because φ has compact support.) The analogous result holds
for the remaining variables. The desired formula now follows by induction.
For (c), we apply the change of variable theorem:
\[ \widehat{f\circ T}(\xi) = \int e^{-2\pi i\,x\cdot\xi}f(Tx)\,dx = \int e^{-2\pi i\,Tx\cdot T^{*-1}\xi}f(Tx)\,dx = |\det T|^{-1}\hat f(T^{*-1}\xi). \]
The next theorem describes one of the most important properties of the Fourier transform,
one that is largely responsible for the utility of the transform. For the proof we need the
following lemma.
Proof. Let F (x) denote the left side of the equation. Consider first the case d = 1. Differen-
tiating and then integrating by parts, we have
\[ F'(x) = ib\int \xi\exp(ib\xi x - a\xi^2)\,d\xi = -\frac{ib}{2a}\exp(ib\xi x - a\xi^2)\Big|_{-\infty}^{\infty} - \frac{b^2}{2a}\,x\int\exp(ib\xi x - a\xi^2)\,d\xi = \frac{-b^2}{2a}\,xF(x). \]
It follows that the derivative of F(x) exp(b²x²/4a) is zero and so
\[ F(x)\exp(b^2x^2/4a) = F(0) = \int_{-\infty}^{\infty} e^{-a\xi^2}\,d\xi = \sqrt{\frac{\pi}{a}}, \]
(2) For the function φ in (1) define φn(x) = n^d φ(nx) as in 6.1.2. For n ∈ N and x ∈ R^d, define ψ_{n,x}(ξ) = exp(2πiξ · x − πn^{−2}|ξ|²). Then $\hat\psi_{n,x}(y) = \phi_n(x-y)$.
⟦Take a = π/n² and b = 2π in 6.2.3 to obtain
\[ \hat\psi_{n,x}(y) = \int e^{2\pi i\,\xi\cdot(x-y) - \pi n^{-2}|\xi|^2}\,d\xi = n^d e^{-\pi n^2|x-y|^2} = \phi_n(x-y). \]
⟧
(3) For g, h ∈ L^1(λ^d), $\int \hat g\,h = \int g\,\hat h$.
⟦By Fubini’s theorem, ∬ g(x)h(y)e^{−2πi x·y} dx dy = ∬ g(x)h(y)e^{−2πi x·y} dy dx.⟧
(4) $\int \psi_{n,x}(\xi)\,\hat f(\xi)\,d\xi = f*\phi_n(x)$.
⟦By (2) and (3), the left side of (4) is $\int \hat\psi_{n,x}(y)f(y)\,dy = \int \phi_n(x-y)f(y)\,dy$, which is the right side.⟧
To complete the proof of the theorem, let n → ∞ in (4). Since ψ_{n,x}(ξ) → exp(2πiξ · x) and $\hat f \in L^1$, the left side tends to $\int e^{2\pi i\,\xi\cdot x}\hat f(\xi)\,d\xi$ by the dominated convergence theorem. By 6.1.2 the right side tends in L^1 to f, hence a subsequence tends to f(x) a.e. (4.3.3). Thus the two functions are equal a.e.
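The Gaussian identity used in step (2) can be checked numerically. The sketch below works in dimension d = 1 and compares a trapezoid-rule estimate of ∫ exp(ibξx − aξ²) dξ with the closed form √(π/a) exp(−b²x²/4a), for the choice a = π/n², b = 2π. It also assumes φ(x) = exp(−πx²), which is what the display in step (2) suggests; the function names and the truncation cutoff are illustrative choices.

```python
import cmath
import math


def gaussian_integral(a: float, b: float, x: float,
                      cutoff: float = 12.0, step: float = 1e-3) -> complex:
    """Trapezoid-rule estimate of the integral of exp(i b xi x - a xi^2)
    over [-cutoff, cutoff]."""
    count = int(round(2 * cutoff / step))
    total = 0j
    for k in range(count + 1):
        xi = -cutoff + k * step
        weight = 0.5 if k in (0, count) else 1.0
        total += weight * cmath.exp(1j * b * xi * x - a * xi * xi)
    return total * step


def closed_form(a: float, b: float, x: float) -> float:
    """Right side of the identity: sqrt(pi / a) * exp(-b^2 x^2 / (4 a))."""
    return math.sqrt(math.pi / a) * math.exp(-(b * x) ** 2 / (4 * a))


n, x = 2, 0.3
a, b = math.pi / n**2, 2 * math.pi  # the choice made in step (2), with d = 1
numeric = gaussian_integral(a, b, x)
exact = closed_form(a, b, x)
phi_n = n * math.exp(-math.pi * (n * x) ** 2)  # phi_n(x) for phi(x) = exp(-pi x^2)
```

With these parameters the closed form agrees with φn(x), which is the content of the display in step (2).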
Exercises
6.10 The indicator function h = 1(0,∞) is called the Heaviside function. Let f ∈ L1 (R) be differentiable
with f 0 ∈ L1 (R). Find (f 0 h) ∗ h and h ∗ h ∗ · · · ∗ h (n factors).
6.12 Let a > 0 and f(x) = 1_{[−a,a]}. Show that $\hat f(\xi) = (\pi\xi)^{-1}\sin 2\pi a\xi$.
6.18 [↑ 6.14] Let g(x) = (1 + x²)^{−1}. Use the inversion formula to show that g ∗ g(x) = π/(4 + x²).
(b) For each multi-index β, there exists a constant B > 0 and m ∈ N, each depending only
on β, such that
where S denotes the set of d + 1 multi-indices (0, 0 . . . , 0), (0, . . . , 0, 2n, 0, . . . , 0). Multiplying
by |∂ α φ(x)| and taking suprema yields the inequality in (a).
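(b) For the multi-index β, set $m := \sum_{j=1}^{d}\beta_j$, tj := βj/m, and t = (t1, . . . , td). By Ex. 4.28 and the Cauchy-Schwarz inequality,
\[ \prod_{j=1}^{d}|x_j|^{\beta_j} = \Big(\prod_{j=1}^{d}|x_j|^{t_j}\Big)^{m} \le \Big(\sum_{j=1}^{d}t_j|x_j|\Big)^{m} \le \Big(\sum_{j=1}^{d}t_j^2\Big)^{m/2}\Big(\sum_{j=1}^{d}|x_j|^2\Big)^{m/2} \le d^{m/2}\Big(\sum_{j=1}^{d}|x_j|^2\Big)^{m/2}. \]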
A function φ that satisfies the equivalent conditions (a) and (b) of the corollary is called
a rapidly decreasing or Schwartz function. For example, xα exp (−|x|2 ) is rapidly
decreasing. The collection of all rapidly decreasing functions is called the Schwartz space
on Rd and is denoted by S = S(Rd ). Clearly, the following inclusions hold:
Moreover, from the sum and product rules for ∂ α it follows that S(Rd ) is an algebra and is
closed under the operations ∂ α and multiplication by xα .
6.3.3 Proposition. Let 1 ≤ p < ∞. Then in the Lp norm, Cc∞ (Rd ) is dense in S(Rd ) and
S(Rd ) is dense in Lp (Rd ).
Proof. Let φ ∈ S(R^d). Choose n > d/p and C > 0 so that (1 + |x|)^n |φ(x)| ≤ C for all x.
Then |φ(x)| ≤ C(1 + |x|)^{−n}, so φ ∈ L^p by 3.6.3. Thus S(R^d) ⊆ L^p. Since Cc^∞(R^d) ⊆ S(R^d)
and Cc∞ (Rd ) is dense in Lp (Rd ) (6.1.4), the assertions follow.
The following result will be needed in the proof of the Plancherel theorem below.
6.3.4 Theorem. $\widehat{S} = S$.
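Proof. Let φ ∈ S. Let α and β be any multi-indices and set c := −2πi. Differentiating under the integral sign, we have
\[ (c\xi)^{\beta}\partial^{\alpha}\hat\phi(\xi) = (c\xi)^{\beta}\int \partial_{\xi}^{\alpha}e^{c\,\xi\cdot x}\phi(x)\,dx = \int (c\xi)^{\beta}e^{c\,\xi\cdot x}(cx)^{\alpha}\phi(x)\,dx = \int \partial_{x}^{\beta}\big(e^{c\,\xi\cdot x}\big)(cx)^{\alpha}\phi(x)\,dx. \]
for some function ψ in S. In particular, $\xi^{\beta}\partial^{\alpha}\hat\phi(\xi)$ is bounded, hence $\hat\phi \in S$. Therefore, $\widehat{S} \subseteq S$.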
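For the reverse inclusion, set $\phi_1(x) := \hat\phi(-x)$. By the first paragraph, φ1 ∈ S, and by the inversion theorem
\[ \widehat{\phi_1}(x) = \int e^{-2\pi i\,\xi\cdot x}\,\hat\phi(-\xi)\,d\xi = \int e^{2\pi i\,\xi\cdot x}\,\hat\phi(\xi)\,d\xi = \phi(x). \]
Therefore, $\phi \in \widehat{S}$.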
\[ = \|\hat\phi\|_2^2. \]
Exercises
6.19 Prove that a C0 -function f on Rd is uniformly continuous. Conclude that a Schwartz function
is uniformly continuous.
6.20 [↑ 6.12, 3.43] Use the Plancherel theorem to show that
\[ \int_0^{\infty}\frac{\sin x}{x}\,dx = \int_0^{\infty}\Big(\frac{\sin x}{x}\Big)^{2}dx = \frac{\pi}{2}. \]
6.21 Let φ ∈ S and α a multi-index. Show that $\partial^{\alpha}\hat\phi = \hat\psi$, where ψ(x) := (−2πix)^α φ(x).
(e) For sufficiently large n, there exists a constant Dn depending on d such that
|(f ∗ g)(x)| ≤ Dn (1 + |x|2 )−n for all x.
(c) For any ψ ∈ S and multi-indices α, β, there exists a constant C and a finite set F of multi-indices such that
\[ \big\|\xi^{\beta}\partial^{\alpha}\hat\psi(\xi)\big\|_{\infty} \le C\sum_{\alpha',\beta'\in F}\big\|x^{\beta'}\partial^{\alpha'}\psi(x)\big\|_{1}. \]
6.25 (Heisenberg uncertainty principle). The principle states that a nonzero function and its Fourier transform cannot both be sharply localized. The precise analytical statement takes the form
\[ \int |x|^2|\phi(x)|^2\,dx\cdot\int |\xi|^2|\hat\phi(\xi)|^2\,d\xi \ge \frac{\|\phi\|_2^4}{16\pi^2}. \]
Establish this for φ ∈ S(R) by verifying (a) and (b) and then using $\widehat{\phi'}(\xi) = (2\pi i\,\xi)\hat\phi(\xi)$.
(a) $\int |\phi(x)|^2\,dx = -2\,\mathrm{Re}\int x\,\overline{\phi(x)}\,\phi'(x)\,dx$.
(b) $\|\phi\|_2^4 \le 4\int x^2|\phi(x)|^2\,dx\int |\phi'(x)|^2\,dx = 4\int x^2|\phi(x)|^2\,dx\int |\widehat{\phi'}(\xi)|^2\,d\xi$.
Convolution of Measures
The convolution of complex measures µ and ν on B(Rd ) is the complex measure
µ ∗ ν defined by
\[ (\mu*\nu)(E) = \int 1_E(x+y)\,d(\mu\otimes\nu)(x, y) = \iint 1_E(x+y)\,d\mu(x)\,d\nu(y), \quad E \in \mathcal{B}(\mathbb{R}^d). \tag{6.5} \]
Note that if A : Rd ×Rd → Rd is the addition operator A(x, y) := x+y, then µ∗ν = A(µ⊗ν),
the image measure of µ ⊗ ν under A. Thus, by 3.2.15 and Fubini’s theorem, for all suitable h
\[ \int h(z)\,d(\mu*\nu)(z) = \int h(x+y)\,d(\mu\otimes\nu)(x, y) = \iint h(x+y)\,d\mu(x)\,d\nu(y). \tag{6.6} \]
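For finitely supported measures, the definition (6.5) and the formula (6.6) reduce to finite sums, which makes them easy to check directly. The sketch below represents a measure on R as a hypothetical dictionary from points to masses; `convolve` pushes the product measure forward under addition, and the two sides of (6.6) are compared for h(z) = z².

```python
from collections import defaultdict
from typing import Callable, Dict

Measure = Dict[float, complex]  # finitely supported measure: point -> mass


def convolve(mu: Measure, nu: Measure) -> Measure:
    """mu * nu for finitely supported measures: the image of the product
    measure under the addition map (x, y) -> x + y."""
    out: Dict[float, complex] = defaultdict(complex)
    for x, mx in mu.items():
        for y, my in nu.items():
            out[x + y] += mx * my
    return dict(out)


def integrate(h: Callable[[float], complex], mu: Measure) -> complex:
    """Integral of h with respect to a finitely supported measure."""
    return sum(h(x) * m for x, m in mu.items())


mu = {0.0: 0.5, 1.0: 0.5}
nu = {0.0: 0.25, 2.0: 0.75}


def h(z: float) -> float:
    return z * z


lhs = integrate(h, convolve(mu, nu))
rhs = sum(h(x + y) * mx * my for x, mx in mu.items() for y, my in nu.items())
total_mass = sum(convolve(mu, nu).values())
```

Here both sides of (6.6) come out equal, and the total mass of µ ∗ ν is the product of the total masses of µ and ν.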
Proof. Parts (a)–(d) and (g)–(i) are exercises (6.26). For (e) we use (6.6) and 5.18:
\[ |\mu*\nu|(E) = \sup_{|f|\le1}\Big|\int_E f\,d(\mu*\nu)\Big| \le \sup_{|f|\le1}\iint 1_E(x+y)|f(x+y)|\,d|\mu|(x)\,d|\nu|(y) \le \iint 1_E(x+y)\,d|\mu|(x)\,d|\nu|(y) = (|\mu|*|\nu|)(E). \]
For (f) we have |µ ∗ ν|(R^d) ≤ |µ| ∗ |ν|(R^d) = ∬ 1_{R^d}(x + y) d|µ| d|ν| = |µ|(R^d)|ν|(R^d).
For (j) note first that ∬ |f(x − y)| d|ν|(y) dx ≤ ∬ |f(x)| dx d|ν|(y) = ‖f‖_1 |ν|(R^d) < ∞, hence the function ∫ f(x − y) dν(y) is defined for a.a. x and is integrable. Moreover, this calculation together with (5.4) shows that ‖f ∗ ν‖_1 ≤ ‖f‖_1 ‖ν‖.
From 6.4.1 and 5.2.5 we have
6.4.2 Corollary. The space M (Rd ) of complex measures on B(Rd ) is a commutative Banach
algebra under convolution and the total variation norm.
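(a) $\hat\mu$ is continuous. (b) $\|\hat\mu\|_{\infty} \le \|\mu\|$.
(c) $\widehat{a\mu + b\nu} = a\hat\mu + b\hat\nu$. (d) $\widehat{\mu*\nu} = \hat\mu\,\hat\nu$.
(e) $\widehat{T(\mu)} = \hat\mu\circ T^{*}$.
(f) $\hat\mu(\alpha+\xi) = \hat\nu(\xi)$, where dν(x) := e^{−2πi(α·x)} dµ(x).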
Proof. Part (a) follows from the dominated convergence theorem. Parts (b) and (c) are clear.
For (d) we have
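\[ \widehat{\mu*\nu}(\xi) = \int e^{-2\pi i\,x\cdot\xi}\,d(\mu*\nu)(x) = \iint e^{-2\pi i\,(x+y)\cdot\xi}\,d\mu(x)\,d\nu(y) = \hat\mu(\xi)\,\hat\nu(\xi). \]
For (e),
\[ \widehat{T(\mu)}(\xi) = \int e^{-2\pi i\,x\cdot\xi}\,dT(\mu)(x) = \int e^{-2\pi i\,T(x)\cdot\xi}\,d\mu(x) = \int e^{-2\pi i\,x\cdot T^{*}\xi}\,d\mu(x) = \hat\mu(T^{*}\xi). \]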
Since the Fourier transform S → S is surjective, ∫φ dµ = ∫φ dν for all φ ∈ S. Let ε = (ε, . . . , ε) and a, b ∈ R^d with aj < bj for all j. Choose a C^∞ function φε so that 1_{[a,b]} ≤ φε ≤ 1_{(a−ε,b+ε)} (0.14.5). By dominated convergence, lim_{ε→0} ∫φε dµ = µ[a, b] and similarly for ν. Therefore, µ[a, b] = ν[a, b] for all [a, b]. By the uniqueness theorem for measures (1.6.8), µ = ν.
Exercises
6.26 Verify parts (a)–(d) and (g)–(i) of 6.4.1.
6.27 Let ν be a complex measure on R^d, 1 ≤ p ≤ ∞, and f ∈ L^p(R^d). Show that (f ∗ ν)(x) exists λ-a.e., f ∗ ν ∈ L^p(R^d) and ‖f ∗ ν‖_p ≤ ‖f‖_p ‖ν‖. ⟦Let 1 ≤ p < ∞. Consider first the case f ≥ 0 and ν ≥ 0 and use Minkowski’s inequality for integrals to show that ‖f ∗ ν‖_p ≤ ‖f‖_p ‖ν‖. Apply this to |f| and |ν| in the general case.⟧
6.34 Let µ, ν ∈ M(R^d) and µ ≪ λ. Show that µ ∗ ν ≪ λ and find d(µ ∗ ν)/dλ in terms of dµ/dλ.
Chapter 7
Measures on Locally Compact Spaces
In this chapter we describe a fundamental connection between topology and measure in the
setting of locally compact Hausdorff spaces. Many of the results will be seen as generalizations
of already established links between Borel measures on Rd and the Euclidean topology.
Properties (b) and (c) assert, respectively, that µ is inner regular on open sets and
outer regular on Borel sets. If µ is a Radon measure on X, we shall call the pair (X, µ)
a Radon measure space.
If µ is a finite measure, then conditions (b) and (c) are equivalent to the assertion that
for each Borel set E and each ε > 0 there exist a compact set K and an open set U such
that K ⊆ E ⊆ U and µ(U \ K) < ε. It follows that if η is a measure with η ≤ µ, then η is a
Radon measure.
A Radon measure that satisfies (b) for every Borel set U is said to be regular. For example, Lebesgue-Stieltjes measures on R^d are regular Radon measures (1.8.1). The following proposition shows that every σ-finite Radon measure is regular.
7.1.1 Proposition. A Radon measure µ is inner regular on σ-finite sets E, that is,
Consequences of Regularity
The proof of the following result is the same as that for the special case X = R^d (4.2.2), since
the proof of the latter uses only the properties (7.1) of λ^d.
7.1.2 Theorem. Let (X, µ) be a Radon measure space and f ∈ Lp (µ) (1 ≤ p < ∞). Then
for each ε > 0 there exists g ∈ Cc (X) such that kf − gkp < ε.
The following is an important application of the preceding theorem. The proof brings
together several familiar results on convergence of sequences of functions as well as Tietze’s
extension theorem.
7.1.3 Lusin’s Theorem. Let (X, µ) be a Radon measure space and f : X → C Borel
measurable such that µ{f ≠ 0} < ∞. Then for each ε > 0 there exists g ∈ Cc(X) such that
g = f except on a set of measure < ε. Moreover, if f is bounded, then g may be chosen so
that kgk∞ ≤ kf k∞ .
Proof. Set E := {f ≠ 0}. Suppose first that f is bounded. Then f ∈ L^1(µ), hence by 7.1.2 there exists a sequence of continuous functions fn with compact support such that fn → f in L^1. By 4.3.3, there exists a subsequence $(f_{n_k})$ that converges to f a.e. By Egoroff’s theorem (2.4.5), there exists a set A ⊆ E with µ(E \ A) < ε/3 such that $f_{n_k} \to f$ uniformly on A. In particular, f is continuous on A. By regularity, we may choose a compact set K and an open set U such that K ⊆ A ⊆ E ⊆ U, µ(A \ K) < ε/3, and µ(U \ E) < ε/3, hence µ(U \ K) < ε. By 0.12.8, there exists a continuous function F on X with compact support contained in U such that F = f on K. Now define a continuous function φ : C → C by
\[ \phi(z) = \begin{cases} z & \text{if } |z| \le \|f\|_{\infty}, \\ \|f\|_{\infty}\,\operatorname{sgn} z & \text{if } |z| > \|f\|_{\infty}. \end{cases} \]
Taking the supremum over all such partitions yields |µ|(U \K) ≤ 4ε. Therefore, |µ| ∈ Mra (X).
Conversely, let |µ| ∈ Mra (X). The inequality µj (E) ≤ |µ|(E) implies that µj ∈ Mra (X).
By definition, µ ∈ Mra (X).
7.1.6 Theorem. Mra (X) is a Banach space under the total variation norm.
Proof. By 7.1.4, Mra (X) is a linear subspace of M (X), the space of all complex Borel
measures on X. Since the latter is complete (5.2.5), it suffices to show that Mra (X) is closed
in M (X). Let µn ∈ Mra (X) and µ ∈ M (X) such that kµn − µk = |µn − µ|(X) → 0. Let
ε > 0 and choose n so that |µn − µ|(X) < ε. Given E ∈ B(X), choose a compact set K and
an open set U such that K ⊆ E ⊆ U and |µn |(U \ K) < ε (7.1.5). Then
|µ|(U \ K) ≤ |µn − µ|(U \ K) + |µn |(U \ K) < 2ε,
hence µ ∈ Mra (X).
Thus supp(µ) is the smallest closed set on which the measure µ is concentrated. Exercise 7.2
gives various properties of the support.
Exercises
7.1 Let µ be a regular Radon measure on a locally compact, Hausdorff space X and let Y ⊆ X be
closed, hence locally compact (0.12.1). Show that the restriction ν of µ to B(Y ) = B(X) ∩ Y is
a Radon measure on Y .
(e) Let x ∈ X, c > 0. Show that supp(µ + cδx) = supp(µ) ∪ {x}. Conclude that the support of $\sum_{j=1}^{n} c_j\delta_{x_j}$ (cj > 0) is {x1, . . . , xn}.
7.3 Let µ be a Radon measure on X and f ∈ L^1(X). Given ε > 0, show that there exists a compact set K such that $\int_{X\setminus K}|f|\,d\mu < \varepsilon$.
7.4 The Baire σ-field Ba (X) is the smallest σ-field relative to which each member of Cc (X) is
measurable. Show that Ba (X) is generated by the compact Gδ sets.
7.5 [↑ 1.85] (Intermediate value property of measures). Let µ be a regular Borel measure on X with
the property that µ{x} = 0 for all x ∈ X. Let E be a Borel set and 0 < c < µ(E). Verify the
following assertions to show that there exists a compact subset C of E such that µ(C) = c.
(a) Let A := {C : C is compact, C ⊆ E, and µ(C) ≥ c}. Then A is nonempty.
(b) Order A by reverse inclusion. If $\mathcal{C}$ is a chain in A and $B = \bigcap\mathcal{C}$, then $\mu(B) = \inf_{C\in\mathcal{C}}\mu(C)$. Conclude that A has a minimal element C. ⟦Argue by contradiction, using outer regularity on B and the finite intersection property for compact sets.⟧
(c) If µ(C) > c and x ∈ C, then there exists an open set U ∋ x such that µ(U) < µ(C) − c.
Hence there exists a proper closed subset C1 of C such that µ(C1 ) > c.
7.6 Let µ be a σ-finite Radon measure on X and ν a complex measure such that ν ≪ µ. Show that
ν is a Radon measure.
7.7 Let X and Y be locally compact Hausdorff spaces and T : X → Y continuous. Let µ be a
regular Borel measure on X. Prove:
(a) T µ is inner regular on Borel subsets of Y . (b) If X and Y are compact, then T µ is regular.
Proof. The basic idea of the proof is to construct an outer measure from I and then use
Carathéodory’s theorem to obtain µ. This is accomplished in the following steps, the first of
which establishes uniqueness, for which regularity is crucial. The remaining steps establish
existence.
⟦The first two assertions are clear. For the inequality, let f ∈ C_U and set K := supp(f).
Since K ⊆ U, there exists a finite subcover {U1, . . . , Un} of K and nonnegative fi ∈ Cc(X)
such that supp(fi) ⊆ Ui and $\sum_{i=1}^{n} f_i = 1$ on K (0.14.1). Then $\sum_{j=1}^{n} f \cdot f_j = f$, and
since f · fj ∈ C_{Uj},
\[ I(f) = \sum_{j=1}^{n} I(f \cdot f_j) \le \sum_{j=1}^{n} \mu(U_j) \le \sum_{j=1}^{\infty} \mu(U_j). \]
Taking the supremum over all f ∈ C_U and applying (1) yields the desired inequality.⟧
(3) For an arbitrary E ⊆ X, define µ∗ (E) by
⟦For the inclusion, it suffices to show that M(µ∗) contains all open sets U, that is,
For this we may assume that µ∗(E) < ∞. Suppose first that E is open. Then V :=
E ∩ U is open, so given ε > 0 there exists f ∈ C_V such that I(f) > µ(V) − ε. Also,
W := E \ supp(f) is open, so there exists g ∈ C_W such that I(g) > µ(W) − ε. Since
f = 0 on V^c = E^c ∪ U^c and g = 0 on W^c = E^c ∪ supp(f), f + g ∈ C_E. Therefore,
⟦The first set of inequalities is an immediate consequence of (7). For the second set,
observe that for any open set U containing Kj−1, nfj ∈ C_U, hence I(nfj) ≤ µ(U).
Taking the infimum over U and applying (4) and (5) produces the second inequality.⟧
(9) $I(f) = \int f\,d\mu$ for all f ∈ Cc(X).
⟦Let f ∈ Cc(X). By considering positive and negative parts, we may assume that
f ≥ 0. Furthermore, dividing by ‖f‖∞, we may also assume that f ≤ 1. Summing the
inequalities in (8) and using $\sum_j f_j = f$, we obtain
\[ \frac{1}{n}\sum_{j=1}^{n} \mu(K_j) \le \int f\,d\mu \le \frac{1}{n}\sum_{j=0}^{n-1} \mu(K_j) \quad\text{and}\quad \frac{1}{n}\sum_{j=1}^{n} \mu(K_j) \le I(f) \le \frac{1}{n}\sum_{j=0}^{n-1} \mu(K_j), \]
The following result is immediate from step (1) of the preceding proof.
7.2.2 Corollary. Let (X, µ) be a Radon measure space. Then for each open subset U of X,
\[ \mu(U) = \sup\Bigl\{ \int f\,d\mu : 0 \le f \le 1,\ \operatorname{supp}(f) \subseteq U \Bigr\}. \]
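As a concrete check of the corollary, the following Python sketch takes µ to be Lebesgue measure on R and U = (0, 1) (both choices are assumptions made for illustration, as are the names `f` and `integral`). It uses trapezoidal functions f_k ∈ Cc(U) with 0 ≤ f_k ≤ 1 whose integrals 1 − 3/k increase toward µ(U) = 1.

```python
from fractions import Fraction

def f(k, x):
    """Trapezoid with compact support in (0, 1): equal to 1 on [2/k, 1 - 2/k],
    equal to 0 off (1/k, 1 - 1/k), and linear in between (requires k >= 4)."""
    left = (x - Fraction(1, k)) * k          # rises from 0 at 1/k to 1 at 2/k
    right = (1 - Fraction(1, k) - x) * k     # falls from 1 at 1 - 2/k to 0 at 1 - 1/k
    return max(Fraction(0), min(Fraction(1), left, right))

def integral(k):
    """Exact Lebesgue integral of f(k, .): plateau of length 1 - 4/k plus two
    triangles, each of base 1/k and height 1."""
    return (1 - Fraction(4, k)) + Fraction(1, k)

# The integrals increase toward 1 = Lebesgue measure of (0, 1) without reaching it.
for k in (4, 10, 100, 1000):
    print(k, integral(k))
```

No single f_k attains the supremum here, which is consistent with the corollary asserting only a supremum.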
Exercises
7.8 [↑ 7.1] Let µ be a regular Radon measure on X and Y a closed subset of X. For each f ∈ Cc(Y),
define a measurable function f̃ on X by f̃ = f on Y and f̃ = 0 on X \ Y. Then $I(f) = \int_X \tilde{f}\,d\mu$
defines a positive linear functional on Cc (Y ). Describe the corresponding Radon measure in
terms of µ and justify your assertion.
7.9 Let I be a positive linear functional on C0 (X) with corresponding Radon measure µ and let
U ⊆ X be open. Then U is a locally compact Hausdorff space. For g ∈ C0(U) define g̃ by
g̃ = g on U and g̃(X \ U) = 0.
(a) Show that g̃ ∈ C0(X).
(b) Show that J(g) = I(g̃) defines a positive linear functional on C0(U).
(c) What is the connection between µ and the Radon measure corresponding to J?
7.10 [↑ 7.7] Let X and Y be compact Hausdorff spaces and T : X → Y continuous. Given a positive
linear functional I on Cc (X) define a positive linear functional J on Cc (Y ) by J(f ) = I(f ◦ T ).
Find a connection between the associated Radon measures and justify your assertions.
the second equality by 1.2.4. If X and Y are second countable with countable bases (Un )
and (Vn ), respectively, then every open set in X × Y is a countable union of sets of the
form Un × Vm , hence the inclusion in (7.4) is equality and so B(X) ⊗ B(Y ) = B(X × Y ). In
general, however, the inclusion may be strict (see, for example, [20]), in which case µ ⊗ ν is
not a Borel measure on X × Y . In spite of this shortcoming, if µ and ν are Radon measures it
is possible to extend µ ⊗ ν to a Borel measure on X × Y . For this we need a preliminary result
which is of some independent interest. The development is facilitated by the introduction of
some standard notation.
Given functions g on X and h on Y , define the tensor product g ⊗ h of g and h on
X × Y by
(g ⊗ h)(x, y) = g(x)h(y), x ∈ X, y ∈ Y.
If G and H are linear spaces of functions on X and Y , respectively, the tensor product
G ⊗ H of G and H is the linear span of the set of all functions g ⊗ h, g ∈ G and h ∈ H.
7.3.1 Proposition. Cc (X) ⊗ Cc (Y ) is dense in Cc (X × Y ) in the uniform norm.
Proof. Let πX : X × Y → X and πY : X × Y → Y denote the projection mappings. For
f ∈ Cc (X × Y ), the sets KX := πX supp(f ) and KY := πY supp(f ) are compact and
supp(f ) ⊆ KX × KY . Choose open sets UX ⊆ X and UY ⊆ Y with compact closure
such that KX ⊆ UX and KY ⊆ UY and set K := cl UX × cl UY. By the Stone-Weierstrass
theorem, C(cl UX) ⊗ C(cl UY) is dense in C(K), hence, given ε > 0, there exists a function
\[ F := \sum_{i=1}^{n} g_i \otimes h_i \in C(\operatorname{cl} U_X) \otimes C(\operatorname{cl} U_Y) \]
7.3.2 Theorem. Let (X, µ) and (Y, ν) be Radon measure spaces. Then Cc (X × Y ) ⊆
L1 (µ ⊗ ν), and for all f ∈ Cc (X × Y )
\[ \int f(x,y)\,d(\mu \otimes \nu)(x,y) = \iint f(x,y)\,d\mu(x)\,d\nu(y) = \iint f(x,y)\,d\nu(y)\,d\mu(x). \tag{7.5} \]
Proof. In the notation of the proof of 7.3.1, f = 0 off KX × KY , which has finite measure.
Therefore, the inclusion holds and (7.5) is a consequence of Fubini’s theorem applied to
KX × KY .
Now define a positive linear functional I on Cc(X × Y) by letting I(f) be the common value in
(7.5). The corresponding measure from the Riesz representation theorem is then defined on
B(X × Y) and is an extension of µ ⊗ ν. We denote this measure by $\mu \,\overline{\otimes}\, \nu$. In summary:
7.3.3 Corollary. There exists a unique Radon measure $\mu \,\overline{\otimes}\, \nu$ on B(X × Y) whose restriction
to B(X) ⊗ B(Y) is µ ⊗ ν.
s := (i1, . . . , in), ij ∈ I, ij ≠ ik,
Moreover, because the µi are probability measures, s ≤ s′ ⇒ Is(g) = Is′(g). Define a positive
linear functional I on F by
If also g ∈ C_{s′}(X), then Is(g) = I_{s∪s′}(g) = Is′(g), hence I is well-defined. Since F is dense
in C(X) and |I(g)| ≤ ‖g‖∞, I has an extension to a positive linear functional on C(X).
Indeed, if (gn) is a sequence in F and gn → g ∈ C(X), then I(gn) is a Cauchy sequence
in C hence converges to some I(g) ∈ C, independent of the sequence (gn), giving the desired
extension. By the Riesz representation theorem, there exists a unique Radon probability
measure µ on X such that $I(g) = \int_X g\,d\mu$, g ∈ C(X), which implies (7.6) for continuous f.
It remains to show that πs (µ) = µi1 ⊗ · · · ⊗ µin . Since these define equal positive linear
functionals on C(Xs ) and since µi1 ⊗ · · · ⊗ µin is a Radon measure on Xs , it suffices by the
uniqueness part of the Riesz representation theorem to show that πs (µ) is a Radon measure
on Xs . But since πs : X → Xs is continuous, this follows directly from Ex. 7.7.
Exercises
7.11 For each n ∈ N, let Xn be a compact Hausdorff space, µn a Radon probability measure on
Xn , πn : X → Xn the projection map, and (X, µ) the product measure space. Show that
the projection mappings πn : X → Xn are independent, that is, if n1 < n2 < · · · < nk and
Bj ∈ B(Xnj ), then
\[ \mu\bigl(\pi_{n_j} \in B_j,\ j = 1, \ldots, k\bigr) = \prod_{j=1}^{k} \mu\bigl(\pi_{n_j} \in B_j\bigr). \]
for all f ∈ C0(X). For example, if (xn) is a sequence in R^d and xn → x, then $\delta_{x_n} \xrightarrow{v} \delta_x$.
Note that since the measures µn and µ may be identified with continuous linear functionals
on C0 (X), vague convergence is simply weak∗ sequential convergence in the dual of C0 (X)
(see §10.2).
Vague convergence does not necessarily imply that (7.7) holds for all f ∈ Cb (X) (Ex. 7.16).
Additional conditions are needed, as described in the next theorem.
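The gap between C0(X) and Cb(X) can be seen in a small numerical sketch. It assumes X = R, µn = δn (the Dirac measure at the integer n) and µ = 0, choices made here only for illustration; integrating against a Dirac measure is just evaluation at the point.

```python
def integrate_against_dirac(f, point):
    """Integral of f with respect to the Dirac measure at `point`, i.e. f(point)."""
    return f(point)

def f_vanishing(x):
    """A member of C0(R): continuous and tending to 0 at infinity."""
    return 1.0 / (1.0 + x * x)

def f_constant(x):
    """A member of Cb(R) that is not in C0(R)."""
    return 1.0

points = (1, 10, 100, 1000)
values_c0 = [integrate_against_dirac(f_vanishing, n) for n in points]
values_cb = [integrate_against_dirac(f_constant, n) for n in points]

print(values_c0)  # tends to 0, the integral of f_vanishing against the zero measure
print(values_cb)  # stays at 1, so the limit relation fails for this bounded f
```

In this example the total variation norms of the δn equal 1 while the limit measure has norm 0, so the norms do not converge to the norm of the limit.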
7.4.1 Theorem. Let µ, µn ∈ Mra(X) be nonnegative. Then (7.7) holds for all f ∈ Cb(X)
iff $\mu_n \xrightarrow{v} \mu$ and ‖µn‖ → ‖µ‖.
Proof. The necessity is obvious. For the sufficiency, we may assume that ‖µ‖ > 0. Choose
0 < ε < ‖µ‖. By 7.2.2, there exists a φ ∈ Cc(X) with 0 ≤ φ ≤ 1 such that $\int \varphi\,d\mu > \mu(X) - \varepsilon$.
Let f ∈ Cb(X). Since µn(X) → µ(X),
\[ \limsup_n \Bigl| \int f(1-\varphi)\,d\mu_n \Bigr| \le \|f\|_\infty \lim_n \int (1-\varphi)\,d\mu_n = \|f\|_\infty \int (1-\varphi)\,d\mu \le \varepsilon\,\|f\|_\infty. \tag{†} \]
Since $\int f\varphi\,d\mu_n \to \int f\varphi\,d\mu$ (because fφ ∈ Cc(X)), we see from the expansion
\[ \int f\,d\mu_n - \int f\,d\mu = \int f(1-\varphi)\,d\mu_n - \int f(1-\varphi)\,d\mu + \int f\varphi\,d\mu_n - \int f\varphi\,d\mu \]
The following result gives a sufficient condition for vague convergence on B(Rd ) in terms of
Fourier-Stieltjes transforms. It will be needed later in the proof of the central limit theorem.
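7.4.2 Theorem. Let µ, µ1, µ2, . . . be complex measures on B(R^d) such that sup_n ‖µn‖ < ∞
and $\widehat{\mu}_n \to \widehat{\mu}$ pointwise. Then $\mu_n \xrightarrow{v} \mu$.
Proof. We use the Fourier inversion formula: For φ ∈ S(R^d) and any complex measure ν,
\[ \int \varphi(x)\,d\nu(x) = \iint \widehat{\varphi}(\xi)\,e^{2\pi i\,\xi\cdot x}\,d\nu(x)\,d\xi = \int \widehat{\varphi}(\xi)\,\widehat{\nu}(-\xi)\,d\xi, \]
hence
\[ \Bigl| \int \varphi(x)\,d\mu_n(x) - \int \varphi(x)\,d\mu(x) \Bigr| \le \int |\widehat{\varphi}(\xi)|\,|\widehat{\mu}_n(-\xi) - \widehat{\mu}(-\xi)|\,d\xi. \]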
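By hypothesis, the integrand on the right tends pointwise to 0. Since the integrand is
dominated by the L1 function $(\sup_n \|\mu_n\| + \|\mu\|)\,|\widehat{\varphi}(\xi)|$, $\int \varphi\,d\mu_n \to \int \varphi\,d\mu$. Now let f ∈ Cc(R^d) and
ε > 0, and choose φ ∈ Cc^∞(R^d) such that ‖f − φ‖∞ < ε (6.1.4). Then
\[ \Bigl| \int f\,d\mu_n - \int f\,d\mu \Bigr| \le \Bigl| \int (f-\varphi)\,d\mu_n \Bigr| + \Bigl| \int (f-\varphi)\,d\mu \Bigr| + \Bigl| \int \varphi\,d\mu_n - \int \varphi\,d\mu \Bigr| \]
\[ \le \|f-\varphi\|_\infty \bigl( \|\mu_n\| + \|\mu\| \bigr) + \Bigl| \int \varphi\,d\mu_n - \int \varphi\,d\mu \Bigr|. \]
Since $\int \varphi\,d\mu_n \to \int \varphi\,d\mu$, $\limsup_n \bigl| \int f\,d\mu_n - \int f\,d\mu \bigr| \le \varepsilon \bigl( \sup_n \|\mu_n\| + \|\mu\| \bigr)$.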
Since an arbitrary f ∈ Cc(R) may be uniformly approximated by functions g ∈ C_c^1(R) (6.1.4)
and since sup_n ‖µn‖ < ∞, it follows that $\mu_n \xrightarrow{v} \mu$.
For the converse, let x be a continuity point of F. Fix k ∈ N and δ > 0 and construct
a piecewise linear function f ∈ Cc(R) such that f = 1 on the interval [−k, x] and f = 0
on (−∞, −k − δ] ∪ [x + δ, ∞). Given ε > 0, choose N so that $\int f\,d\mu_n \le \int f\,d\mu + \varepsilon$ for all
n > N. For such n and all k
\[ F_n(x) - F_n(-k) = \mu_n(-k, x] \le \int f\,d\mu_n \le \varepsilon + \int f\,d\mu \le \varepsilon + F(x+\delta) - F(-k-\delta), \]
hence
Fn(x) ≤ ε + F(x + δ) − F(−k − δ) + Fn(−k).
Letting k → ∞ we have Fn(x) ≤ ε + F(x + δ) for all n ≥ N and so lim sup_n Fn(x) ≤ F(x + δ).
Letting δ → 0 we then have lim sup_n Fn(x) ≤ F(x). Similarly, by taking g ∈ Cc(R) such that
g = 1 on [−k + δ, x − δ] and g = 0 on (−∞, −k] ∪ [x, ∞) and linear on the remaining intervals,
we see that lim inf_n Fn(x) ≥ F(x). Therefore, Fn(x) → F(x).
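The restriction to continuity points of F can be illustrated with a short Python sketch. It assumes, as the proof above indicates, that the distribution functions are F(x) = µ(−∞, x] and Fn(x) = µn(−∞, x], and it takes µn = δ_{1/n} and µ = δ_0 as a hypothetical example.

```python
def F_n(n, x):
    """Distribution function of the Dirac measure at 1/n: mass of (-inf, x]."""
    return 1.0 if x >= 1.0 / n else 0.0

def F(x):
    """Distribution function of the Dirac measure at 0; its only jump is at x = 0."""
    return 1.0 if x >= 0.0 else 0.0

# At continuity points of F the values F_n(x) eventually agree with F(x).
for x in (-0.1, 0.25, 2.0):
    print(x, [F_n(n, x) for n in (1, 10, 100, 1000)], F(x))

# At the jump x = 0 every F_n vanishes although F(0) = 1.
print(0.0, [F_n(n, 0.0) for n in (1, 10, 100, 1000)], F(0.0))
```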
Exercises
7.15 Let X be locally compact and Hausdorff and µ, µn complex measures with sup_n ‖µn‖ < ∞.
Show that the set V of all functions f ∈ Cb(X) for which $\int f\,d\mu_n \to \int f\,d\mu$ is a closed linear
subspace of Cb(X) in the uniform norm. Conclude that if $\int f\,d\mu_n \to \int f\,d\mu$ for all f ∈ Cc(X),
then $\mu_n \xrightarrow{v} \mu$.
7.16 Show that the condition kµn k → kµk in 7.4.1 cannot be removed.
7.17 Show that the convergence Fn (x) → F (x) in 7.4.3 need not hold at points x where F is
discontinuous.
7.18 Consider the space [0, 1] with Lebesgue measure λ. Set $f_n = n 1_{[0,1/n]}$. Show that $f_n \cdot \lambda \xrightarrow{v} \delta_0$.
7.19 Let $\mu_n(E) = 2^{-n} \sum_{j=-\infty}^{\infty} 1_E(j/2^n)$, E ∈ B([a, b]). Show that $\mu_n \xrightarrow{v} \lambda|_{[a,b]}$.
7.20 [↑ 7.7] Let X and Y be compact Hausdorff spaces and T : X → Y continuous. Let µn and µ be
Radon measures on X. Show that if $\mu_n \xrightarrow{v} \mu$, then $T(\mu_n) \xrightarrow{v} T(\mu)$.
7.21 Let (X, F, µ) be a probability space and gn, g real-valued, measurable functions on X such
that $g_n \xrightarrow{\mu} g$. Show that $g_n(\mu) \xrightarrow{v} g(\mu)$. ⟦Let f ∈ C0(R), ε > 0, and set En = {|gn − g| ≥ δ}
for a suitable δ obtained from the uniform continuity of f.⟧ Show that the converse is false. ⟦
Consider the space [0, 1) with Lebesgue measure λ. Set
\[ A_n = [0, 1/2^n) \cup [2/2^n, 3/2^n) \cup [4/2^n, 5/2^n) \cup \cdots \cup [(2^n - 2)/2^n, (2^n - 1)/2^n) \]
and $r_n = 1_{A_n}$. (The functions rn are called Rademacher functions.) Show that $r_n(\lambda) \xrightarrow{v} r_1(\lambda)$
but $r_n \not\xrightarrow{\lambda} f$ for any f.⟧
7.22 Let µ, µ1, µ2, . . . be probability measures on B(R) such that $\mu_n \xrightarrow{v} \mu$. Let F, Fn be as in 7.4.3.
Carry out the following steps to show that if F is continuous, then Fn (x) → F (x) uniformly on
R. Give an example to show that the continuity of F is needed here, that is, in general Fn need
not converge uniformly to F on the set of continuity points of F .
(a) Given ε > 0, choose a < b so that F (a) < ε and 1 − F (b) < ε. Then there exists a partition
P = {x0 = a < x1 < · · · < xk = b} such that |F (xi ) − F (xi−1 )| < ε for all i.
(b) There exists N such that |Fn (xi ) − F (xi )| < ε for all n ≥ N and all i. Fix such an n.
(c) If x ≤ a, then 0 ≤ F (x) < ε and 0 ≤ Fn (x) < 2ε.
(d) If x ≥ b, then 0 ≤ 1 − F (x) < ε and 0 ≤ 1 − Fn (x) < 2ε.
(e) If x ∈ [xi−1 , xi ], then F (xi−1 ) ≤ F (x) < F (xi−1 )+ε and F (xi−1 )−ε < Fn (x) ≤ F (xi−1 )+2ε.
(f) Conclude that |Fn (x) − F (x)| < 4ε for all x.
• I is positive: f ≥ 0 ⇒ I(f ) ≥ 0.
• I is continuous from above: fn ↓ 0 ⇒ I(fn ) → 0.
Note that I must then have the additional properties
Indeed, (x, t) in the left side iff f1 (x) ≤ t < g1 (x) and either t < f2 (x) or t ≥ g2 (x), that
is, iff (a) f1 (x) ≤ t < f1 (x) ∨ g1 (x) ∧ f2 (x) or (b) g1 (x) ∧ g2 (x) ∧ f1 (x) ≤ t < g1 (x).
Moreover, since (a) and (b) cannot occur simultaneously, the union is disjoint. Therefore
H is a semiring.⟧
(3) Define a set function ν on H by ν(f, g] = I(g − f ). Then ν is a measure on H and
hence, by 1.6.4, has an extension to σ(H).
⟦For countable additivity, let $(f, g] = \bigcup_{n=1}^{\infty} (f_n, g_n]$ (disjoint). Then for each x ∈ X,
$(f(x), g(x)] = \bigcup_{n=1}^{\infty} (f_n(x), g_n(x)]$ (disjoint). Applying Lebesgue measure λ, we have
$g(x) - f(x) = \sum_n \bigl( g_n(x) - f_n(x) \bigr)$. Since the partial sums of the series increase
monotonically to g − f, $I(g - f) = \sum_n I(g_n - f_n)$, that is, $\nu(f, g] = \sum_n \nu(f_n, g_n]$.⟧
(4) Let f ∈ L with f ≥ 0 and c > 0. Then there exists a sequence of nonnegative functions
fn in L such that fn ↑ 1{f >1} , hence (0, cfn ] ↑ (0, c1{f >1} ] = {f > 1} × (0, c] (by (1)).
⟦Define fn = n(f − f ∧ 1) ∧ 1. If f(x) ≤ 1, then fn(x) = 0 for all n. If f(x) > 1, then
eventually fn(x) = 1. Therefore, $f_n \uparrow 1_{\{f > 1\}}$.⟧
(5) σ(H) contains all sets of the form {a < f ≤ b} × (0, c], f ∈ L, 0 < a < b, c > 0.
⟦Since a ≥ 0, the sets are unchanged when f is replaced by f⁺, so we may assume that
f ≥ 0. By (4), {f > 1} × (0, c] ∈ H. Since
\[ \{a < f \le b\} \times (0, c] = \bigl( \{1 < a^{-1} f\} \times (0, c] \bigr) \setminus \bigl( \{1 < b^{-1} f\} \times (0, c] \bigr), \]
By a minor modification of the proof of 2.3.1 (necessitated by the use of left open
rather than right open intervals in the definition of Ci), there exists a sequence (hn)
of such simple functions such that hn ↑ f. Taking limits in $\nu(0, h_n] = \int h_n\,d\mu$ yields
$I(f) = \nu(0, f] = \int f\,d\mu$.⟧
Part II
Functional Analysis
Chapter 8
Banach Spaces
Several examples of Banach spaces have played important roles in Part I of the text, notably
Lp spaces and various spaces of continuous functions. In this chapter we develop the basic
properties of general normed spaces. Additional properties are considered in Chapters 10
and 14.
(a) ‖x‖ ≥ 0, (b) x ≠ 0 ⇒ ‖x‖ ≠ 0, (c) ‖cx‖ = |c| ‖x‖, (d) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
A seminorm has the same properties with the possible exception of (b). We also recall the
following variations of the triangle inequality:
\[ \Bigl\| \sum_{j=1}^{n} x_j \Bigr\| \le \sum_{j=1}^{n} \|x_j\| \quad\text{and}\quad \bigl|\, \|x\| - \|y\| \,\bigr| \le \|x - y\|. \tag{8.1} \]
For ease of reference, we list below the main examples of normed spaces discussed in the
first part of the text together with some new ones. All are Banach spaces except (d) and (j).
The sequence spaces (h) – (k) are special cases of the function spaces (a) – (e). We remind
the reader that ‖·‖p is in general only a seminorm unless one adopts the convention (which
we do) of identifying functions that are equal a.e.
8.1.1 Examples.
(a) L^p(X, F, µ) = {f : X → K : f is F-measurable and ‖f‖p < ∞}, where
\[ \|f\|_p = \Bigl( \int |f|^p\,d\mu \Bigr)^{1/p} \ (1 \le p < \infty), \qquad \|f\|_\infty = \sup\{t : \mu\{|f| > t\} > 0\}. \]
(b) B(X) = the space of all bounded functions f : X → C with norm ‖f‖∞ = sup |f(X)|,
where X is a nonempty set.
(c) Cb(X) = the space of all bounded continuous functions f : X → C with norm ‖·‖∞,
where X is a topological space.
(d) Cc(X) = {f ∈ Cb(X) : supp(f) is compact} with norm ‖·‖∞, where X is a locally
compact Hausdorff topological space.
(e) C0(X) = closure of Cc(X) in Cb(X), X a locally compact Hausdorff space.
(f) M(X) = space of complex measures on a measurable space (X, F) with the total variation
norm ‖µ‖ = |µ|(X).
(g) Mra(X) = space of complex Radon measures on B(X) with the total variation norm
‖µ‖ = |µ|(X), where X is a locally compact Hausdorff space.
(h) ℓ^p = ℓ^p(N) := {x = (xn) : ‖x‖p < ∞}, where
\[ \|x\|_p = \Bigl( \sum_{n=1}^{\infty} |x_n|^p \Bigr)^{1/p} \ (1 \le p < \infty), \qquad \|x\|_\infty = \sup_n |x_n|. \]
(i) ℓ^p(Z) = the space of all bilateral sequences x = (. . . , x−1, x0, x1, . . .) such that
‖x‖p < ∞, where
\[ \|x\|_p := \Bigl( \sum_{n=-\infty}^{\infty} |x_n|^p \Bigr)^{1/p} \ (1 \le p < \infty), \qquad \|x\|_\infty := \sup_{n \in \mathbb{Z}} |x_n|. \]
(j) c00 := {x = (xn) : xn = 0 for all but finitely many n}, ‖x‖∞ := sup_n |xn|.
(k) c0 := {x = (xn) : lim_n xn = 0}, ‖x‖∞ := sup_n |xn|.
(l) c := {x = (xn) : lim_n xn exists}, ‖x‖∞ := sup_n |xn|. ♦
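For finitely supported sequences, that is, members of c00, the norms in (h) can be computed directly. The sketch below is only an illustration; the function names `p_norm` and `sup_norm` and the sample vector are choices made here. It also shows numerically that ‖x‖p decreases toward ‖x‖∞ as p grows.

```python
def p_norm(x, p):
    """The l^p norm of a finitely supported sequence x, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    """The l^infinity norm of a finitely supported sequence x."""
    return max(abs(t) for t in x)

x = [3.0, -4.0, 1.0]

# The p-norms decrease as p increases and approach the sup norm.
for p in (1, 2, 10, 200):
    print(p, p_norm(x, p))
print("sup", sup_norm(x))
```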
The open ball, closed ball, and sphere of radius r and center x in a normed space X take
the forms
In case of ambiguity, we include the norm symbol in the notation, as in Br(x, ‖·‖). We also
use the simplified notation
The ball B1 is called the open unit ball and C1 is called the closed unit ball. The
following relations are occasionally useful (Ex. 8.5):
The reader may check that Cr (x) is the closure of Br (x) and Br (x) is the interior of Cr (x)
(Ex. 8.3), properties not shared by general metric spaces (consider a discrete space). The
balls Br (x) and Cr (x) are easily seen to be convex; Br and Cr have the additional property
of being balanced (see §0.2).
Separable Spaces
A normed linear space is separable if it is separable in the metric topology. Such spaces
are important in contexts where a metric is needed for the weak or weak∗ topologies discussed
in Chapter 10.
8.1.2 Examples.
(a) For 1 ≤ p < ∞, the space L^p(R^d) is separable. For example, the collection of all step
functions $\sum_{j=1}^{n} a_j 1_{I_j}$, where aj ∈ Q and Ij is a bounded open interval whose coordinate
intervals have rational endpoints, is dense in L^p.
(b) The space L^∞(R^d) is not separable. To see this for the case d = 1, let $f_t := 1_{(-\infty,t)}$ and
note that the balls B_{1/2}(ft) are disjoint. Since there are uncountably many of these, L^∞(R)
cannot contain a countable dense set.
(c) The space C[a, b] is separable under the uniform norm. Indeed, by the Weierstrass
approximation theorem, the set of polynomials on [a, b] with rational coefficients is dense in
C[a, b]. A similar argument shows that C(X) is separable for any compact subset X of Rd .
(d) The space Cb (R) of bounded continuous functions on R is not separable in the uniform
norm. The basic idea is a variation of the argument for L∞ : For each doubly infinite sequence
s = (. . . , s−1, s0, s1, . . .), where sn = 0 or 1, define fs ∈ Cb(R) such that fs(n) = sn and fs
is linear for n ≤ x ≤ n + 1 (n ∈ Z). Then ‖fs − ft‖ = 1 (s ≠ t), hence the balls B_{1/2}(fs) are
disjoint. Since the set of all such sequences is uncountable, Cb (R) cannot contain a countable
dense set. ♦
(e) The disk algebra A(D) is the algebra of continuous functions on the closed unit disk
cl D that are analytic on D. We show that A(D) is separable in the uniform norm by showing
that the set of all polynomials P(z) is dense in A(D). To this end, let 0 < r < 1 and note
that if f ∈ A(D), then fr(z) := f(rz) is analytic on the disk r^{−1}D ⊇ cl D. The Taylor series
$\sum_{k=0}^{\infty} c_k z^k$ for fr therefore converges uniformly to fr on cl D. Given ε > 0, use the uniform
continuity of f on cl D to choose r so that |f(z) − fr(z)| < ε for all z ∈ cl D, and then choose a
partial sum Pn of the series such that |fr(z) − Pn(z)| < ε for all z ∈ cl D. Then
|f(z) − Pn(z)| < 2ε on cl D.
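The separation argument in example (d) can be checked numerically. In the sketch below, a doubly infinite 0-1 sequence is modeled as a Python function from the integers to {0, 1}, and the sup distance is sampled on a finite grid over [−5, 5]; the grid, the window, and the sample sequences are assumptions made here for illustration.

```python
import math

def f_s(s, x):
    """Piecewise linear function with f_s(n) = s(n) at every integer n."""
    n = math.floor(x)
    t = x - n                      # position of x inside [n, n + 1)
    return (1 - t) * s(n) + t * s(n + 1)

def sup_distance(s1, s2, lo=-5, hi=5, steps_per_unit=10):
    """Largest sampled value of |f_s1 - f_s2| on a grid that contains the integers."""
    grid = [lo + i / steps_per_unit for i in range((hi - lo) * steps_per_unit + 1)]
    return max(abs(f_s(s1, x) - f_s(s2, x)) for x in grid)

s_even = lambda n: 1 if n % 2 == 0 else 0    # 1 at even integers
s_zero = lambda n: 0                          # identically 0
s_three = lambda n: 1 if n == 3 else 0        # differs from s_zero at n = 3 only

print(sup_distance(s_even, s_zero))
print(sup_distance(s_zero, s_three))
```

Two sequences that differ in even a single entry give functions at distance 1, which is what makes the balls of radius 1/2 disjoint.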
Equivalent Norms
Two norms k·k and ||| · ||| on a vector space X are said to be equivalent if the associated
metrics are equivalent, that is, if there exist positive real numbers a and b such that
‖x‖ ≤ a |||x||| and |||x||| ≤ b ‖x‖ for all x ∈ X. (8.3)
\[ |||x|||_p := \|\vec{x}\|_p, \quad\text{where } x := x_1 v_1 + \cdots + x_d v_d \text{ and } \vec{x} := (x_1, \ldots, x_d). \tag{8.4} \]
These norms are easily seen to be equivalent. A somewhat surprising result is the following:
8.1.4 Theorem. All norms on a finite dimensional vector space X are equivalent.
Proof. Let ‖·‖ be an arbitrary norm on X. It suffices to show that ‖·‖ is equivalent to the
complete norm |||·|||_2 defined in (8.4).
One inequality in (8.3) is easy: In the notation of (8.4), we have, by the triangle and CBS
inequalities,
\[ \|x\| \le \sum_{k=1}^{d} \|v_k\|\,|x_k| \le \Bigl( \sum_{k=1}^{d} \|v_k\|^2 \Bigr)^{1/2} \Bigl( \sum_{k=1}^{d} |x_k|^2 \Bigr)^{1/2} = a\,|||x|||_2, \]
where $a := \bigl( \sum_{k=1}^{d} \|v_k\|^2 \bigr)^{1/2}$. For the other inequality, define a function F : K^d → R⁺ by
$F(\vec{x}) = \|x\|$. Then
\[ |F(\vec{x}) - F(\vec{y})| \le \|x - y\| \le a\,\|\vec{x} - \vec{y}\|_2, \]
hence F is continuous. Moreover, if $\vec{x} \ne 0$ then, by linear independence, x ≠ 0. Thus
F is positive on the compact Euclidean sphere $\{\vec{x} : \|\vec{x}\|_2 = 1\}$ and so has a positive
minimum m there. For any $\vec{x} \ne 0$ we then have $\bigl\| x / \|\vec{x}\|_2 \bigr\| = F\bigl( \vec{x} / \|\vec{x}\|_2 \bigr) \ge m$, hence
$\|x\| \ge m\,\|\vec{x}\|_2 = m\,|||x|||_2$.
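The two constants in the proof can be computed in a small case. The sketch below assumes X = R² with the standard basis and takes the "arbitrary" norm to be the 1-norm; these choices, and the sampling of the unit circle used to approximate the minimum m, are illustrative assumptions rather than part of the proof.

```python
import math

def norm(x):
    """The norm under study; here the 1-norm on R^2 (an illustrative choice)."""
    return abs(x[0]) + abs(x[1])

def euclid(x):
    """The Euclidean norm, playing the role of |||.|||_2 for the standard basis."""
    return math.hypot(x[0], x[1])

basis = [(1.0, 0.0), (0.0, 1.0)]

# Upper constant from the proof: a = (sum of squared norms of basis vectors)^(1/2).
a = math.sqrt(sum(norm(v) ** 2 for v in basis))

# Lower constant: minimum of F on the Euclidean unit circle, approximated by sampling.
N = 3600
m = min(norm((math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)))
        for k in range(N))

def check(x, tol=1e-9):
    """Verify m * ||x||_2 <= ||x|| <= a * ||x||_2 up to a rounding tolerance."""
    return m * euclid(x) - tol <= norm(x) <= a * euclid(x) + tol

print(a, m, check((3.0, -4.0)), check((1.0, 1.0)))
```

A sampled minimum can only overestimate the true minimum in general; in this example the minimum is attained on the coordinate axes, which lie on the grid.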
Theorem 8.1.4 shows that in a finite dimensional normed space X one may always choose
an equivalent norm relative to which X is isometrically isomorphic to a Euclidean space K^d.
This implies that the metric properties of Kd carry over to X. In particular,
8.1.5 Corollary. A finite dimensional normed space is complete, its subspaces are closed,
and its bounded sets are relatively compact.
Interestingly, the last assertion of the corollary actually characterizes finite dimensional
spaces: a normed space with a compact ball is finite dimensional. The proof depends on
the following result, which guarantees the existence of vectors in a normed space that are
“nearly orthogonal” to a given closed subspace.
8.1.6 Theorem (F. Riesz). Let Y be a proper closed subspace of a normed space X. Then
for each ε ∈ (0, 1) there exists xε ∈ X such that
Proof. Choose any x ∈ X \ Y and set d := inf{‖x − y‖ : y ∈ Y}. Since Y is closed, d > 0.
[Figure: the points x, xε, 0, and y0 and the subspace Y.]
hence
\[ \|x_\varepsilon - y\| = \frac{1}{\|x - y_0\|}\,\|x - z\| \ge \frac{d}{\|x - y_0\|} \ge 1 - \varepsilon. \]
8.1.7 Theorem. Let X be a normed space with S1 = {x ∈ X : ‖x‖ = 1} compact. Then
X is finite dimensional.
Proof. Assume that X is infinite dimensional. Choose x1 ∈ X with ‖x1‖ = 1. Since the
span of x1 is a proper closed subspace of X, by 8.1.6 there exists a vector x2 with ‖x2‖ = 1
such that ‖x2 − y‖ ≥ 1/2 for all y ∈ span{x1}. Proceeding by induction, we obtain an
infinite sequence (xn) in S1 such that
In particular, ‖xm − xn‖ ≥ 1/2 for all m ≠ n. On the other hand, the compactness of S1
implies that (xn ) has a convergent subsequence. As these assertions are incompatible, X
must be finite dimensional.
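A separated sequence of the kind produced in the proof is easy to exhibit in c0. The sketch below works with the first eight coordinates only (a truncation assumed here for illustration) and checks that the standard unit vectors lie on the unit sphere and are pairwise at sup-norm distance 1, so no subsequence of them can be Cauchy.

```python
def unit_vector(j, dim):
    """The j-th standard unit vector, truncated to `dim` coordinates."""
    return [1.0 if i == j else 0.0 for i in range(dim)]

def sup_norm(x):
    return max(abs(t) for t in x)

def dist(x, y):
    """Distance induced by the sup norm."""
    return sup_norm([a - b for a, b in zip(x, y)])

dim = 8
vectors = [unit_vector(j, dim) for j in range(dim)]

# Smallest pairwise distance among the unit vectors.
min_gap = min(dist(vectors[i], vectors[j])
              for i in range(dim) for j in range(i + 1, dim))
print(min_gap)
```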
∗
Strictly Convex Spaces
A normed space is strictly convex if it satisfies the equivalent conditions in the following
proposition, these conditions asserting in various ways that a sphere does not contain line
segments.
8.1.8 Proposition. Let X be a normed space. The following statements are equivalent:
(b) x ≠ y and ‖x‖ = ‖y‖ = 1 ⇒ ‖(1 − t)x + ty‖ < 1 for all 0 < t < 1.
Proof. (a) ⇒ (b): Let x ≠ y and ‖x‖ = ‖y‖ = 1. By hypothesis, the inequality in (b)
holds for t = 1/2. Now let 0 < t < 1/2. Then 0 < 2t < 1 and
hence
‖tx + (1 − t)y‖ ≤ t‖x + y‖ + (1 − 2t)‖y‖ < 2t + (1 − 2t) = 1.
Thus the inequality in (b) holds for 0 < t < 1/2. Similarly, if 1/2 < t < 1, then 0 < 2t − 1 < 1
and
tx + (1 − t)y = (1 − t)(x + y) + (2t − 1)x,
hence
\[ \frac{\|x\|}{\|x + y\|} + \frac{\|y\|}{\|x + y\|} = 1, \]
which forces ‖x‖^{−1}x = ‖y‖^{−1}y; otherwise, by (b) with t := ‖y‖ ‖x + y‖^{−1},
\[ 1 = \Bigl\| \frac{x + y}{\|x + y\|} \Bigr\| = \Bigl\| (1 - t)\,\frac{x}{\|x\|} + t\,\frac{y}{\|y\|} \Bigr\| < 1. \]
‖x − y0‖ = inf{‖x − y‖ : y ∈ C}.
The relevance of this notion here is that if X is strictly convex then best approximations, if
they exist, are unique. To see this, let α denote the infimum and suppose that ‖x − z0‖ = α
for some point z0 ∈ C distinct from y0. Then x − z0 ≠ x − y0, and since ‖x − z0‖ =
‖x − y0‖ = α, we have, by strict convexity,
\[ \bigl\| x - \tfrac{1}{2}(z_0 + y_0) \bigr\| = \tfrac{1}{2}\,\|(x - z_0) + (x - y_0)\| < \alpha. \]
Note that, as a special case, a nonempty convex subset of a strictly convex space X
cannot have more than one member with smallest norm.
While strict convexity guarantees uniqueness of best approximations, it does not guarantee
existence. For this additional conditions must be placed on X. One such condition is uniform
convexity, discussed in §10.4. For now, we offer the following more modest result, the proof
of which is left to the reader as an exercise (8.25).
8.1.10 Proposition. Let X be a normed space and Y a finite dimensional subspace of X.
Then for each x ∈ X there exists a best approximation to x out of Y.
For example, if 1 < p < ∞ and f ∈ Lp [0, 1], then there exists a unique polynomial on
[0, 1] of degree ≤ n that best approximates f in Lp norm out of all polynomials of degree
≤ n.
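The role of strict convexity in uniqueness can be seen in R². The sketch below compares best approximations to x = (1, 0) out of the subspace Y = span{(0, 1)} under the sup norm, which is not strictly convex, and under the Euclidean norm, which is. The candidate parameters are restricted to a finite grid, an assumption made here so that the minimizers can be listed.

```python
import math

def sup_norm(v):
    return max(abs(v[0]), abs(v[1]))

def euclid(v):
    return math.hypot(v[0], v[1])

def best_parameters(norm, ts, tol=1e-12):
    """Grid values t for which t*(0, 1) is closest to x = (1, 0) in the given norm."""
    dists = [norm((1.0, -t)) for t in ts]     # x - t*(0, 1) = (1, -t)
    d = min(dists)
    return [t for t, val in zip(ts, dists) if val <= d + tol]

ts = [k / 10 for k in range(-20, 21)]          # candidate parameters in [-2, 2]

print(len(best_parameters(sup_norm, ts)))      # many minimizers in the sup norm
print(best_parameters(euclid, ts))             # a single minimizer in the Euclidean norm
```

In both norms a best approximation exists, as 8.1.10 guarantees for a finite dimensional subspace; only the Euclidean norm yields a unique one.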
Exercises
8.1 Prove that the operations of addition and scalar multiplication in a normed linear space are
continuous.
8.2 Let X be a normed space and x ≠ 0 ∈ X. Show that if (cn) is a sequence in K such that
cn x → y ∈ X, then c := limn cn exists in K and cx = y.
8.3 Show that in a normed linear space, Cr (x) = cl Br (x) and Br (x) = int Cr (x).
8.4 Let C be a nonempty, closed subset of a normed space X with the property x, y ∈ C ⇒
½(x + y) ∈ C. Show that C is convex. ⟦Consider the dyadic rationals.⟧
8.6 Let Y be a dense subspace of a normed linear space X. Show that the open unit ball B1 ∩ Y of
Y is dense in the open unit ball B1 of X.
8.8 Show that the norms ‖·‖1 and ‖·‖∞ on C[0, 1] are not equivalent.
8.10 Show that an infinite dimensional Banach space X has a nonclosed linear subspace. ⟦Use the
Baire category theorem.⟧
8.11 Show that the linear space D[a, b] of differentiable functions on [a, b] is not complete in either
the uniform norm or the L1 norm.
8.12 Show that the space c of all convergent sequences in C is a Banach space under the sup norm.
8.13 Let 0 < α < 1. A function f ∈ Cb(R) is Hölder continuous of order α if
\[ \|f\|_{0,\alpha} := \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} < \infty. \]
Show that the set C^{0,α}(R) of all such functions is a Banach space under the norm |||f||| :=
‖f‖_{0,α} + ‖f‖∞.
8.14 (a) Prove that ‖f‖bv := |f(a)| + V[a,b](f) defines a norm on the space BV[a, b] of functions of
bounded variation (see §5.5) and that BV[a, b] is a Banach space under this norm.
(b) Show that the space AC[a, b] of absolutely continuous functions on [a, b] is a closed subspace
of the Banach space BV[a, b].
(c) Show that the norms ‖·‖bv and ‖·‖∞ on AC[a, b] are not equivalent.
8.15 Show that the spaces Cc (Rd ) and C0 (Rd ) are separable.
8.16 Prove that if X is a separable Banach space and Y is a closed subspace, then Y is separable.
8.17 Let A ⊆ R be Lebesgue measurable with λ(A) > 0. Show that Lp (A) is infinite dimensional,
1 ≤ p ≤ ∞.
8.18 Show that the Lp and Lq norms on C[0,1] are not equivalent if 1 ≤ p < q ≤ ∞.
8.19 Let X and Y be Hausdorff topological spaces and let Z be a dense subset of X. Let Cb (X)
and Cb (Y ) have the supremum norms. Suppose that f : X × Y → C is bounded such that
f ( · , y) ∈ Cb (X) for all y ∈ Y , f (z, · ) ∈ Cb (Y ) for all z ∈ Z, and the set of mappings
f (Z, · ) := {f (z, · ) : z ∈ Z} is relatively compact in Cb (Y ). Prove the following:
(a) The collection of mappings f (X, · ) = {f (x, · ) : x ∈ X} is relatively compact in Cb (Y ).
⟦Let x ∈ X and zα ∈ Z with zα → x. Then f(zβ, · ) → g ∈ Cb(Y).⟧
(b) f ∈ Cb(X × Y). ⟦Let (xα, yα) → (x0, y0) in X × Y and use (a).⟧
(c) If Y is compact, then x → f(x, · ) maps X continuously into Cb(Y). ⟦Argue by contradiction,
using (b).⟧
8.20 Show that each of the conditions below is equivalent to strict convexity:
(i) x ≠ y and ‖x‖ = ‖y‖ ⇒ ‖x + y‖ < ‖x‖ + ‖y‖.
(ii) x ≠ y and ‖x‖, ‖y‖ ≤ 1 ⇒ ‖x + y‖ < 2.
(iii) x ≠ y and ‖x‖ = ‖y‖ = 1 ⇒ ‖(1 − s)x + sy‖ < 1 for some 0 < s < 1.
8.21 Prove the converse of 8.1.9: Let a normed space X have the property that for each closed convex
subset C, every x ∈ X has at most one best approximation out of C. Then X is strictly convex.
8.22 Show that for a locally compact Hausdorff space containing at least two points, C0 (X) is not
strictly convex.
8.23 Show that L1 and L∞ are not strictly convex except in trivial cases.
8.24 Let X be a normed space and C a nonempty subset of X. Show that the set of best approxima-
tions to x out of C is convex.
8.25 Prove 8.1.10.
For the reverse inequality, recall that ‖φ‖∞ = inf{t > 0 : |φ| ≤ t a.e.} (4.1.6). Thus for
0 < r < 1, the set on which |φ| ≥ r‖φ‖∞ has positive measure. Since X is σ-finite, the
inequality holds on some set E of positive finite measure. Since $f := \mu(E)^{-1/p} 1_E$ has L^p
norm equal to one, $\|M_\varphi\|^p \ge \|f\varphi\|_p^p = \mu(E)^{-1} \int_E |\varphi|^p\,d\mu \ge (r\,\|\varphi\|_\infty)^p$. Letting r → 1 shows
Bilinear Transformations
Let X, Y, and Z be normed spaces over K. A mapping B : X × Y → Z is said to be
bilinear if B(x, y) is linear in x for each fixed y and linear in y for each fixed x. B is
said to be bounded if for some M > 0
‖B(x, y)‖ ≤ M ‖x‖ ‖y‖ for all x ∈ X and y ∈ Y. (8.10)
The set BI(X × Y, Z) of all bounded bilinear mappings is easily seen to be a vector
space under pointwise addition and scalar multiplication. Defining ‖B‖ to be the infimum of the
constants M in (8.10), we have
8.2.5 Theorem. BI(X × Y, Z) is a normed space and
‖B‖ = sup{‖B(x, y)‖ : ‖x‖ ≤ 1, ‖y‖ ≤ 1}. (8.11)
Moreover, if Z is complete, then BI(X × Y, Z) is complete.
Proof. The proof is similar to that of (8.2.2). For example, to verify (8.11) let s denote
the supremum and let M be as in (8.10). If ‖x‖, ‖y‖ ≤ 1, then ‖B(x, y)‖ ≤ M, hence
s ≤ M. Taking the infimum of the M's yields s ≤ ‖B‖. Since ‖B(‖x‖^{−1}x, ‖y‖^{−1}y)‖ ≤ s
(x, y ≠ 0), we have ‖B(x, y)‖ ≤ s ‖x‖ ‖y‖ and so ‖B‖ ≤ s by (8.10).
A bilinear transformation B : X × X → K is called a bilinear form on X. For example,
for f, g ∈ X′, the mapping f ⊗ g : X × X → K defined by (f ⊗ g)(x, y) = f(x)g(y) is a
bilinear form with ‖f ⊗ g‖ = ‖f‖ ‖g‖. The mapping $(f, g) \to \int fg\,d\mu$ is a bilinear form on
L²(µ) which, since $|\int fg\,d\mu| \le \|f\|_2 \|g\|_2$ by the CBS inequality, has norm ≤ 1.
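The identity ‖f ⊗ g‖ = ‖f‖ ‖g‖ can be checked numerically via formula (8.11). The sketch below assumes X = R² with the Euclidean norm and functionals f(x) = a · x and g(y) = b · y for hypothetical vectors a and b, and it approximates the suprema by sampling the unit circle (the supremum over the unit ball of these functions is attained on the sphere).

```python
import math

a = (3.0, 4.0)   # f(x) = a . x, so ||f|| = |a|_2 = 5
b = (1.0, 2.0)   # g(y) = b . y, so ||g|| = |b|_2 = sqrt(5)

def f(x):
    return a[0] * x[0] + a[1] * x[1]

def g(y):
    return b[0] * y[0] + b[1] * y[1]

def B(x, y):
    """The bilinear form f (tensor) g."""
    return f(x) * g(y)

def unit(theta):
    return (math.cos(theta), math.sin(theta))

N = 360
angles = [2 * math.pi * k / N for k in range(N)]

# Sampled versions of the suprema defining ||f||, ||g||, and ||B||.
sup_f = max(abs(f(unit(t))) for t in angles)
sup_g = max(abs(g(unit(t))) for t in angles)
sup_B = max(abs(B(unit(s), unit(t))) for s in angles for t in angles)

print(sup_f, sup_g, sup_B, sup_f * sup_g)
```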
Exercises
8.26 Show that every infinite dimensional normed linear space has an unbounded operator. ⟦Extend
a linearly independent sequence (xn) to a basis and start by defining Txn.⟧
8.27 Show that ‖T‖ = sup{‖Tx‖ : ‖x‖ = 1}.
8.28 Let X and Y be normed spaces and T : X → Y linear. Show that if T(Sr) is bounded for some
r > 0, then T ∈ B(X, Y).
8.29 Prove (8.9).
8.31 Let X and Y be normed spaces with X finite dimensional. Show that every linear transformation
T : X → Y is continuous.
8.32 [↑ 3.2.16] (Translation operator ). For f : Rd → C, define (Tx f )(y) = f (x + y). Show that Tx
is an isometric isomorphism on both Cb (Rd ) and Lp (λd ) (1 ≤ p ≤ ∞). If we consider Tx on
Cb[0, ∞) and take x > 0, then ‖Tx‖ = 1 but Tx is not an isometry.
8.33 [↑ 3.2.16] (Dilation operator). For f : R^d → R, define (Dr f)(x) = f(rx), r ≠ 0. Show that Dr
is an isometric isomorphism on Cb(R^d) and |r|^{d/p} Dr is an isometric isomorphism on L^p(λ^d).
8.34 (Left and right shift operators). Define Tℓ and Tr on sequences x = (xn) by
Tℓ(x1, x2, . . .) = (x2, x3, . . .) and Tr(x1, x2, . . .) = (0, x1, x2, . . .).
Clearly Tℓ X ⊆ X and Tr X ⊆ X for the spaces ℓ^p, c0, c. Show that in each case the operators
have norm one and that Tr is an isometry but Tℓ is not. Show also that Tℓ Tr = I ≠ Tr Tℓ.
8.35 (Evaluation functional). For x ∈ [0, 1] define the linear functional x̂(f) = f(x), f ∈ C[0, 1].
Show that x̂ is continuous on C[0, 1] in the uniform norm but not in the L¹ norm.
8.36 Let P[0, 1] denote the space of all polynomials on [0, 1]. Show that the derivative operator
Df = f′ on P[0, 1] is unbounded in both the uniform norm and the L¹ norm.
8.37 Let X and Y be normed linear spaces such that B(X, Y) is complete. Show that if X′ ≠ {0},
then Y is complete.
8.38 Let X and Z be Banach spaces, Y a dense subspace of X, and T ∈ B(Y, Z). Show that T
extends uniquely to a member of B(X, Z) with the same norm.
8.39 Fix a ∈ `∞ and define T : `∞ → `∞ by T (x) = (a1 x1 , a2 x2 , . . .). (a) Find kT k. (b) Show that
ran(T ) need not be closed. (c) If T is 1-1, show that T −1 may not be bounded on ran T .
8.40 For x ∈ c define L(x) = lim_n xn. Show that L ∈ c′ and ‖L‖ = 1.
8.41 Let µ and ν be σ-finite measures on a measurable space (X, F) such that µ ≪ ν and set
ϕ = dµ/dν. Show that the mapping Tf = ϕ^{1/p} f is a linear isometry from L^p(µ) to L^p(ν). Show
that T is surjective iff ν ≪ µ.
8.42 Let (X, F, µ) be a σ-finite measure space and k : X × X → C measurable such that the functions
$F(x) := \int |k(x, y)|\,d\mu(y)$ and $G(y) := \int |k(x, y)|\,d\mu(x)$ are in L^∞. Let 1 < p < ∞ and let q be
conjugate to p. Show that the integral operator
\[ Kf(x) := \int k(x, y) f(y)\,d\mu(y), \quad f \in L^p(\mu), \]
8.43 Let X be strictly convex and $P \in B(X)$ such that $P^2 = P$ and $\|P\| \le 1$. Suppose that for
each $x \in X$ there exists $T_x \in B(X)$ with $\|T_x\| \le 1$ such that $T_x P x = x$. Show that $P$ is the
identity operator.
The Dual of $c_0$ is $\ell^1$
For $x = (x_1, x_2, \ldots) \in \ell^1$, define a linear map $f_x$ by
\[ f_x(y) := \sum_{n=1}^{\infty} x_n y_n, \quad y := (y_1, y_2, \ldots) \in c_0. \]
We show that the mapping $x \to f_x$ is an isometric isomorphism of $\ell^1$ onto $c_0'$ with inverse
$f \to x_f$, where
\[ x_f := \big(f(e_1), f(e_2), \ldots\big). \]
Clearly, $f_x \in c_0'$ with $\|f_x\| \le \|x\|_1$. Now let $f \in c_0'$ be arbitrary and set $y_j := \operatorname{sgn} f(e_j)$.
Then
\[ y^{(n)} := \sum_{j=1}^{n} y_j e_j = (y_1, \ldots, y_n, 0, 0, \ldots) \in c_0 \quad\text{and}\quad \|y^{(n)}\|_\infty \le 1, \]
hence $\|f\| \ge |\langle y^{(n)}, f\rangle| = \sum_{j=1}^{n} y_j f(e_j) = \sum_{j=1}^{n} |f(e_j)|$, which shows that $\|f\| \ge \|x_f\|_1$.
Moreover, if $z = (z_1, z_2, \ldots) \in c_0$ and $z^{(n)} := (z_1, \ldots, z_n, 0, \ldots)$, then $\|z^{(n)} - z\|_\infty \to 0$
and so $f(z) = \lim_n f(z^{(n)}) = \lim_n \sum_{k=1}^{n} z_k f(e_k) = f_{x_f}(z)$. Therefore, $f = f_{x_f}$.
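The norm computation above can be checked numerically. The following is a minimal sketch in the real case, assuming the illustrative choice $x_n = (-1/2)^n$ (truncated to 40 terms); the helper names `pairing` and `sgn` are hypothetical and not part of the text. It pairs $x$ with the truncated sign sequences $y^{(n)}$ and compares the result with $\|x\|_1$.

```python
def pairing(x, y):
    """f_x(y) = sum_n x_n y_n for finitely supported sequences given as lists."""
    return sum(a * b for a, b in zip(x, y))

def sgn(t):
    """Real sign function, so that t * sgn(t) = |t|."""
    return (t > 0) - (t < 0)

N = 40
x = [(-0.5) ** n for n in range(1, N + 1)]   # illustrative x, truncated to N terms
l1_norm = sum(abs(t) for t in x)

for n in (1, 5, 10, 40):
    # y^(n) = (sgn x_1, ..., sgn x_n, 0, 0, ...) lies in c_0 and has sup norm 1
    y_n = [sgn(t) if j < n else 0 for j, t in enumerate(x)]
    print(n, pairing(x, y_n), l1_norm)
```

The printed pairings increase toward the $\ell^1$ norm of $x$, which is the mechanism behind the inequality $\|f\| \ge \|x_f\|_1$.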
The Dual of $c$ is $\ell^1$
For $x = (x_1, x_2, \ldots) \in \ell^1$ define a linear map $f_x$ by
\[ f_x(y) := x_1 \lim_n y_n + \sum_{n=1}^{\infty} x_{n+1} y_n, \quad y := (y_1, y_2, \ldots) \in c. \]
Clearly, $x \to f_x$ is linear. Moreover, since $|\lim_n y_n| \le \|y\|_\infty$, $\|f_x\| \le \|x\|_1$. Now let
$f \in c'$. As above, $\sum_{j=1}^{\infty} |f(e_j)| < \infty$, hence $x_f \in \ell^1$. Set
\[ d_n = (\underbrace{0, \ldots, 0}_{n}, 1, 1, \ldots) = e - \sum_{j=1}^{n} e_j \quad\text{and}\quad y^{(n)} = \operatorname{sgn}(f(d_n))\, d_n + \sum_{j=1}^{n} \operatorname{sgn} f(e_j)\, e_j. \]
Then $y^{(n)} \in c$ and $\|y^{(n)}\| \le 1$, and since $f(d_n) \to f(e) - \sum_{n=1}^{\infty} f(e_n)$ we have
\[ \|f\| \ge |\langle y^{(n)}, f\rangle| = |f(d_n)| + \sum_{j=1}^{n} |f(e_j)| \to \|x_f\|_1. \]
Finally, if $z = (z_1, z_2, \ldots) \in c$, $\alpha := \lim_n z_n$, and $z^{(n)} := (z_1, \ldots, z_n, \alpha, \alpha, \ldots)$, then
$\|z^{(n)} - z\|_\infty \to 0$, hence
\[ f(z) = \lim_n f(z^{(n)}) = \lim_n \Big( \alpha f(d_n) + \sum_{k=1}^{n} z_k f(e_k) \Big) = f_{x_f}(z). \]
The Dual of $L^p$ is $L^q$
Let $(X, \mathcal{F}, \mu)$ be a σ-finite measure space, $1 \le p < \infty$, and let $q$ be conjugate to $p$. For
$g \in L^q$ define $\varphi_g$ on $L^p$ by
\[ \varphi_g(f) = \int f g\,d\mu, \quad f \in L^p. \]
Since $L^p_n \subseteq L^p_{n+1}$, $g_{n+1} = g_n$ a.e. on $X_n$, hence we may define a measurable function $g$ on
$X$ such that $g = g_n$ on $X_n$. Since $|g_n| \le |g_{n+1}|$ on $X_{n+1}$, $\|g_n\|_q \to \|g\|_q$ by the monotone
convergence theorem, hence $\|g\|_q \le \|\varphi\|$. Furthermore, if $f \in L^p(\mu)$, then $\|f 1_{X_n} - f\|_p \to 0$ by
the dominated convergence theorem and so
\[ \varphi(f) = \lim_n \varphi(f 1_{X_n}) = \lim_n \int f g_n\,d\mu = \lim_n \int f 1_{X_n} g\,d\mu = \int f g\,d\mu = \varphi_g(f). \]
Thus if the assertion holds for the finite case then it holds for the σ-finite case.
We now establish the existence of $g$ for the case $\mu(X) < \infty$. To this end, define a set
function $\nu$ on $\mathcal{F}$ by $\nu(E) = \langle 1_E, \varphi\rangle$. Then $\nu$ is countably additive. Indeed, if $(E_n)$ is a disjoint
sequence in $\mathcal{F}$ with union $E$, then
\[ \int \Big| 1_E - \sum_{j=1}^{n} 1_{E_j} \Big|^p d\mu = \int \sum_{j>n} 1_{E_j}\,d\mu = \sum_{j>n} \mu(E_j) \to 0, \]
that is, $\sum_{j=1}^{n} 1_{E_j} \to 1_E$ in $L^p$. Countable additivity now follows from the linearity and continuity
of $\varphi$.
Next, observe that the inequality $|\nu(E)| \le \|\varphi\| \|1_E\|_p$ implies that $\nu \ll \mu$. Thus, by the
Radon-Nikodym theorem, there exists a function $g \in L^1(\mu)$ such that $\langle 1_E, \varphi\rangle = \int_E g\,d\mu$ for
all $E \in \mathcal{F}$. In particular, for all simple functions $f$, we have
\[ \varphi(f) = \int f g\,d\mu. \qquad \text{(a)} \]
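We claim that $fg \in L^1$ and that (a) holds for all $f \in L^p(\mu)$. Define $f_n = 1_{E_n} f$, where
$E_n = \{|f| \le n\}$. For each $n$ choose a sequence of simple functions $(f_{n,k})_k$ such that
$f_{n,k} \to f_n$ a.e. and $|f_{n,k}| \le |f_n|$ for all $k$ (2.3.1). By the dominated convergence theorem,
$\lim_k \|f_{n,k} - f_n\|_p = 0$, hence from (a)
\[ \varphi(f_n) = \lim_k \varphi(f_{n,k}) = \lim_k \int f_{n,k}\, g\,d\mu = \int f_n g\,d\mu, \qquad \text{(b)} \]
the last equality by the dominated convergence theorem, since $|f_{n,k}\, g| \le |n g|$. Now set
$|g| = e^{i\theta} g$. Replacing $f$ by $e^{i\theta}|f|$ in the above, we have
\[ \|\varphi\| \|f\|_p \ge \|\varphi\| \|f_n\|_p = \|\varphi\| \big\| e^{i\theta}|f_n| \big\|_p \ge |\varphi(e^{i\theta}|f_n|)| = \int g e^{i\theta} |f_n|\,d\mu = \int |g f_n|\,d\mu. \]
Letting $n \to \infty$ and applying the monotone convergence theorem gives $\int |gf|\,d\mu \le \|\varphi\| \|f\|_p$,
hence $fg \in L^1$. Moreover, since $\|f - f_n\|_p^p = \int_{\{|f|>n\}} |f|^p\,d\mu \to 0$ we have $\varphi(f_n) \to \varphi(f)$. Using
the dominated convergence theorem in (b), we see that (a) holds for all $f \in L^p(\mu)$.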
We now show that $g \in L^q$ and that $\|g\|_q \le \|\varphi\|$, completing the argument. Suppose first
that $q < \infty$. Define $g_n = g$ if $|g| \le n$ and $g_n = 0$ otherwise, so that
We show that the mapping $\mu \to \varphi_\mu$ is an isometry from $M_{ra}(X)$ onto $C_0(X)'$. This result is
known as the Riesz representation theorem.
Clearly, $\varphi_\mu$ is linear and $|\varphi_\mu(f)| \le \|\mu\| \|f\|_\infty$, hence $\varphi_\mu \in C_0(X)'$ and $\|\varphi_\mu\| \le \|\mu\|$. To
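show equality, let $\mu = e^{i\theta}|\mu|$ be the polar decomposition of $\mu$ (5.3.6(b)). Since $|\mu|$ is a Radon
measure (7.1.5), by Lusin's theorem (7.1.3) given $\varepsilon > 0$ there exists $g \in C_c(X)$ such that
$|g| \le 1$ and $g = e^{-i\theta}$ on a set $E$ with $|\mu|(E^c) < \varepsilon/2$. Then
\[ \|\mu\| = \int e^{-i\theta}\,d\mu \le \Big| \int g\,d\mu \Big| + \Big| \int_{E^c} (e^{-i\theta} - g)\,d\mu \Big| \le |\varphi_\mu(g)| + 2|\mu|(E^c) \le \|\varphi_\mu\| + \varepsilon, \]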
By considering real and imaginary parts of f ∈ Cc (X) we see that (‡) holds for f ∈ Cc (X).
Since Cc (X) is dense in C0 (X), (‡) holds for all f ∈ C0 (X). Therefore, ϕ = ϕµ with
µ = µr + iµi . ♦
Exercises
8.44 Show that the dual of $c_{00}$ is $\ell^1$.
8.45 Let $ba(\mathbb{N})$ denote the linear space of finitely additive, complex set functions $\mu$ on $\mathbb{N}$ with the
total variation norm $\|\mu\|$. (The latter is defined exactly as in the case of complex measures.) If
$g = \sum_{j=1}^{n} a_j 1_{E_j}$ is a simple function in standard form, then $\int g\,d\mu$ may be defined as in § 3.1.
Moreover, (a) and (b) of 3.1.1 (linearity) hold since only finite additivity is used in the proof.
Verify the following to show that the dual of $\ell^\infty(\mathbb{N})$ is $ba(\mathbb{N})$.
(a) For $\mu \in ba(\mathbb{N})$, $\varphi_\mu(g) := \int g\,d\mu$ is a bounded linear functional on the subspace of $\ell^\infty$
8.46 Give the space $C^k[0, 1]$ of $k$-times continuously differentiable functions on $[0, 1]$ the norm
\[ |||f||| = \sum_{j=0}^{k} \|f^{(j)}\|_\infty. \]
where $\mu$ is a complex Radon measure on $[0, 1]$, $a = (a_0, \ldots, a_{k-1}) \in \mathbb{R}^k$, and $\vec{f}(a) =
(f(a), f'(a), \ldots, f^{(k-1)}(a))$.
(d) Show that the mapping $S : (a, \mu) \to \varphi_{a,\mu}$ is a topological isomorphism from the product
space $\mathbb{K}^k \times M_{ra}[0, 1]$ onto the dual of $C^k[0, 1]$.
Product Spaces
Let X and Y be normed linear spaces over K. The product vector space is the set
X × Y together with the operations
There is no canonical norm for $X \times Y$; however, the following equivalent norms are frequently
used:
\[ \|(x, y)\|_1 := \|x\| + \|y\|, \quad \|(x, y)\|_2 := \sqrt{\|x\|^2 + \|y\|^2}, \quad \|(x, y)\|_\infty := \max\{\|x\|, \|y\|\}. \]
Each of these norms induces the product topology on X × Y. More generally, we have the
following result, which may be seen as a direct consequence of 8.1.3. The proof is left as an
exercise (8.47).
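The equivalence of the three product norms can be made concrete through the chain $\|\cdot\|_\infty \le \|\cdot\|_2 \le \|\cdot\|_1 \le 2\|\cdot\|_\infty$. The sketch below checks this chain on a few sample values; the function name `product_norms` and the sample inputs are illustrative assumptions, and only the numbers $\|x\|$ and $\|y\|$ enter the computation.

```python
import math

def product_norms(norm_x, norm_y):
    """Return (||.||_1, ||.||_2, ||.||_inf) of a pair (x, y), given ||x|| and ||y||."""
    return (norm_x + norm_y,
            math.hypot(norm_x, norm_y),
            max(norm_x, norm_y))

for a, b in [(3.0, 4.0), (1.0, 0.0), (2.5, 2.5)]:
    n1, n2, ninf = product_norms(a, b)
    # chain of inequalities showing that the three norms are equivalent
    assert ninf <= n2 <= n1 <= 2 * ninf
    print((a, b), n1, n2, ninf)
```

The last sample pair shows that the constant 2 in $\|\cdot\|_1 \le 2\|\cdot\|_\infty$ is attained when $\|x\| = \|y\|$.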
8.4.1 Proposition. All norms on X × Y that generate the product topology are equivalent.
The projection maps PX : X × Y → X and PY : X × Y → Y are defined by
These are clearly linear and continuous in the product topology. The straightforward proof
of the following proposition is left to the reader.
Direct Sums
Let Z be a vector space over K with subspaces X and Y. Then Z is said to be the
algebraic direct sum of X and Y if the following conditions hold:
$(x, y) \mapsto x + y : X \times Y \to Z$. (8.13)
PX (x + y) = x and PY (x + y) = y.
Uniqueness of representation implies that the mappings are well-defined. Moreover, the
mappings are easily seen to be linear. The identities
Topological direct sums are defined as above by requiring that the map $x_1 + \cdots + x_n \mapsto
(x_1, \ldots, x_n)$ be continuous. The proof of the following proposition is a straightforward
modification of that of 8.4.3.
8.4.4 Proposition. Let X be a normed space which is the algebraic direct sum of subspaces
$X_1, \ldots, X_n$. Then X is the topological direct sum iff the projection mappings $P_j$ are
continuous.
Quotient Spaces
Recall that if Y is a subspace of a linear space X, then X/Y is the linear space of all
equivalence classes x + Y with the operations
(x1 + Y) + (x2 + Y) = (x1 + x2 ) + Y and c(x + Y) = cx + Y.
Relative to these operations the quotient map
Q : X → X/Y, Qx := x + Y,
is linear with kernel Y. We show in this subsection that if X is a normed space, then the
quotient space has a natural norm, called the quotient norm, with respect to which Q is
continuous and open.
8.4.5 Theorem. Let Y be a closed linear subspace of a normed space X. Then
\[ \|Qx\| = \|x + Y\| := \inf\{\|x + y\| : y \in Y\} \tag{8.16} \]
defines a norm on $X/Y$. Moreover, if X is complete, then so is $X/Y$.
Proof. Since $0 \in Y$, $\|Q(0)\| = 0$. Let $x, x_1, x_2 \in X$ and $c \in \mathbb{K}$. If $c \ne 0$, then
\[ \|c(x + Y)\| = \inf\{\|cx + y\| : y \in Y\} = |c| \inf\{\|x + c^{-1} y\| : y \in Y\} = |c| \|x + Y\|. \]
For the triangle inequality, note that for any $y_1, y_2 \in Y$,
\[ \|(x_1 + Y) + (x_2 + Y)\| \le \|x_1 + x_2 + y_1 + y_2\| \le \|x_1 + y_1\| + \|x_2 + y_2\|. \]
Taking infima over $y_1$ and $y_2$ yields $\|(x_1 + Y) + (x_2 + Y)\| \le \|x_1 + Y\| + \|x_2 + Y\|$.
For positivity, assume that $\|Qx\| = 0$. Then there exists a sequence $(y_n)$ in Y such that
$\|x + y_n\| \to 0$. Since Y is closed, $x \in Y$, hence $x + Y = Y$, that is, $Qx = 0$. Therefore,
(8.16) defines a norm.
Now assume that X is complete. To show that $X/Y$ is complete, we use 0.4.3: Let $(x_n)$
be a sequence in X such that $\sum_{n=1}^{\infty} \|x_n + Y\| < \infty$. For each $n$ choose $y_n \in Y$ such that
$\|x_n + y_n\| < \|x_n + Y\| + 1/2^n$. Then $\sum_{n=1}^{\infty} \|x_n + y_n\| < \infty$, so the sequence of partial
sums $\sum_{j=1}^{n} (x_j + y_j)$ converges to some $x \in X$. Since
\[ \Big\| \sum_{j=1}^{n} (x_j + Y) - (x + Y) \Big\| \le \Big\| \sum_{j=1}^{n} (x_j + y_j) - x \Big\|, \]
the series $\sum_{n=1}^{\infty} (x_n + Y)$ converges to $x + Y$.
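As a concrete instance of the quotient norm (8.16), the following sketch takes X to be Euclidean $\mathbb{R}^2$ and Y the line spanned by $(1, 1)$; both choices, and the name `quotient_norm`, are illustrative assumptions. In this Euclidean setting the infimum is attained at the orthogonal projection, so $\|x + Y\|$ is the distance from $x$ to the line.

```python
import math

U = (1.0, 1.0)   # Y = span{U}, a closed subspace of Euclidean R^2 (illustrative choice)

def quotient_norm(x):
    """inf{ ||x + t*U|| : t real }, attained at t = -<x,U>/<U,U> in the Euclidean case."""
    t = -(x[0] * U[0] + x[1] * U[1]) / (U[0] ** 2 + U[1] ** 2)
    return math.hypot(x[0] + t * U[0], x[1] + t * U[1])

x1, x2 = (3.0, 1.0), (-1.0, 2.0)
s = (x1[0] + x2[0], x1[1] + x2[1])
print(quotient_norm(x1))             # distance from x1 to the line Y
print(quotient_norm((5.0, 5.0)))     # elements of Y have quotient norm 0
print(quotient_norm(s) <= quotient_norm(x1) + quotient_norm(x2))   # triangle inequality
```

The three printed values illustrate, respectively, the definition, the fact that $Qx = 0$ exactly when $x \in Y$, and the triangle inequality established in the proof.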
8.4.6 Theorem. Let Y be a closed linear subspace of a normed space X and let $X/Y$ have
the quotient norm. Then the quotient map $Q$ has the following properties:
(a) $Q$ is a bounded linear operator. If $Y \ne X$, then $\|Q\| = 1$.
(b) $Q$ is an open mapping.
(c) If Z is a normed space and $T : X/Y \to Z$ a linear mapping such that $TQ$ is bounded,
then $T$ is bounded.
Proof. (a) By (8.16), $\|Qx\| \le \|x\|$, hence $\|Q\| \le 1$. If $Y \ne X$, then $\|Qx\| = 1$ for some
$x \in X$ and so for each $r > 1$ there exists $y_r \in Y$ such that $\|x + y_r\| < r$. Then
$\|Q\| \|x + y_r\| \ge \|Q(x + y_r)\| = \|Qx\| = 1$, hence $\|Q\| \ge 1/\|x + y_r\| > 1/r$. Since $r$ was
arbitrary, $\|Q\| \ge 1$.
(b) Note first that $Q(B_r(0)) = B_r(Q(0))$ (Ex. 8.57). Since an open set is a union of open
balls and since translations of open balls are open balls, it follows that $Q$ is open.
(c) If $TQ$ is continuous and $V$ is open in Z, then $U := Q^{-1}(T^{-1}(V))$ is open in X
and so $T^{-1}(V) = Q(U)$ is open in $X/Y$ by (b).
The machinery of quotient spaces allows a simple proof of the following result.
8.4.7 Proposition. Let X be a normed space, Y a closed subspace of X, and F a finite
dimensional subspace of X. Then $Y + F$ is a closed subspace of X.
Proof. Since $Q(F)$ is a finite dimensional subspace of $X/Y$, it is closed. Therefore, $Y + F =
Q^{-1}(Q(F))$ is closed.
Exercises
8.47 Prove 8.4.1.
8.49 Prove that c is the topological direct sum Ke ⊕ c0 , where e = (1, 1, . . .).
8.50 Let $(X_n)$ be a sequence of normed spaces. The product vector space $X := \prod_n X_n$ is the
collection of all sequences $(x_1, x_2, \ldots)$ ($x_n \in X_n$), with coordinate-wise addition and scalar
multiplication. Show that there exists a norm on X that induces the product topology of X iff
$X_n = \{0\}$ for all sufficiently large $n$.
8.51 [↑ 8.50] Let $(X_n)$ be a sequence of normed spaces and let $\prod_n X_n$ have the product vector space
structure. For $x = (x_1, x_2, \ldots)$ define $\|x\|_\infty := \sup_n \|x_n\|$. Show that $X := \{x : \|x\|_\infty < \infty\}$
is a normed space under $\|\cdot\|_\infty$. Show also that X is complete iff each space $X_n$ is complete.
8.52 Let Z be a linear space with Z = X ⊕ Y and let T : Z → Z be linear. Prove that T PX = PX T
iff T X ⊆ X and T Y ⊆ Y.
8.53 Let Z be a normed space that is a topological direct sum of closed subspaces X and Y. Show
that Z/X is topologically isomorphic to Y.
8.54 Let X denote any of the sequence spaces c0 , c, `1 , or `∞ . Show that X = X1 ⊕ X2 , where the
summands are closed subspaces of X isometrically isomorphic to X.
8.55 Let X be a normed space, Y a complete (hence closed) subspace of X. Show that if X/Y is
complete, then X is complete.
8.56 Let $\|\cdot\|$ be a seminorm on a linear space X. The notions of sequential convergence and Cauchy
sequence still make sense in this setting, except that limits, if they exist, may not be unique.
The device for handling this situation is as follows: Let $Y = \{y \in X : \|y\| = 0\}$. Show that Y
is a linear subspace of X and that $\|x + Y\| = \|x\|$ defines a norm on $Z := X/Y$. Show also
that if the seminorm has the property that every Cauchy sequence converges in X then Z is
complete.
8.57 Let Y be a closed linear subspace of a normed space X and $Q : X \to X/Y$ the quotient map.
Show that $Q(B_r(0)) = B_r(Q(0))$.
8.58 Let X be a noncompact locally compact Hausdorff space and let X∞ := X ∪ {∞} be the
one-point compactification of X (§ 0.12). Show that C0 (X) is isometrically isomorphic to the
space {f ∈ C(X∞ ) : f (∞) = 0} and that C(X∞ ) is topologically isomorphic to the direct
product C0 (X) × C.
8.59 Let X and Y be normed spaces with dual spaces $X'$ and $Y'$, and let $X \times Y$ and $X' \times Y'$ have the
$\|\cdot\|_2$-norms. Given $z' \in (X \times Y)'$, define $Sz' \in X'$ and $Tz' \in Y'$ by $\langle x, Sz'\rangle = \langle (x, 0), z'\rangle$ and
$\langle y, Tz'\rangle = \langle (0, y), z'\rangle$. Show that the mapping $Rz' := (Sz', Tz')$ is an isometric isomorphism
from $(X \times Y)'$ onto $X' \times Y'$ such that $\langle (x, y), z'\rangle = \langle x, Sz'\rangle + \langle y, Tz'\rangle$.
⟦For $\|z'\| \le \|Rz'\|$ use the Cauchy-Schwarz inequality in $\mathbb{K}^2$. For the reverse inequality
let $\varepsilon > 0$, and find $x \in X$ and $y \in Y$ with norm one such that $\|Sz'\| \le |\langle x, Sz'\rangle| + \varepsilon$
and $\|Tz'\| \le |\langle y, Tz'\rangle| + \varepsilon$. Choose $a$, $b$ with $|a| = |b| = 1$ such that $|\langle x, Sz'\rangle| = a\langle x, Sz'\rangle$ and
$|\langle y, Tz'\rangle| = b\langle y, Tz'\rangle$ and consider $\langle (a\|Sz'\| x, b\|Tz'\| y), z'\rangle$.⟧
or, equivalently,
g(y) + g(ỹ) = g(y + ỹ) ≤ p(y + ỹ) = p(x0 + y − x0 + ỹ) ≤ p(x0 + y) + p(−x0 + ỹ).
Thus (†) holds, which shows that g has the required extension to Z.
Now consider the collection $\mathcal{E}$ of all real linear extensions $f$ of $g$ for which $f \le p$ on $\operatorname{dom}(f)$.
For two such functions, write $f_1 \preceq f_2$ if $f_2$ is an extension of $f_1$, that is, $\operatorname{dom}(f_1) \subseteq \operatorname{dom}(f_2)$
and $f_1 = f_2$ on $\operatorname{dom}(f_1)$. Then $\preceq$ is a partial order on $\mathcal{E}$ such that every chain has an upper
bound. By Zorn's lemma, there exists a maximal extension $f \in \mathcal{E}$. From the first part of the
proof and maximality, $\operatorname{dom} f = X$. Thus $f$ is the desired extension of $g$.
Proof. The real case follows from 8.5.1, so we may assume that X is a complex linear space.
By the lemma, gr := Re g and gi := Im g are real linear functionals on Y. Since gr ≤ p on
Y, there exists a real linear extension fr of gr such that fr ≤ p on X. Define f as in (8.17).
By the lemma,
hence, since $d > 0$ (because Y is closed), $\|g\| \ge 1$ and so $\|g\| = 1$. An application of 8.5.4
completes the argument.
The second part of the next corollary asserts that $X'$ separates points of X.
8.5.6 Corollary. For any $x_0 \ne 0$ in a normed space X, there exists $f \in X'$ such that
$\|f\| = 1$ and $f(x_0) = \|x_0\|$. In particular, if $x_1 \ne x_2$ then there exists $f \in X'$ such that
$f(x_1) \ne f(x_2)$.
Proof. For the first part, take $Y = \{0\}$ in 8.5.5. For the second part take $x_0 = x_1 - x_2$.
8.5.7 Corollary. Let X be a normed space. If $X'$ is separable, then X is separable.
Proof. Let $(f_n)$ be dense in $X'$. For each $n$, choose $x_n \in X$ such that $\|x_n\| = 1$ and
$|\langle x_n, f_n\rangle| \ge \|f_n\|/2$, and set $Y = \operatorname{cl}\operatorname{span}\{x_1, x_2, \ldots\}$. We claim that $Y = X$. If not, then
by 8.5.5 we may choose $f \in X'$ with $\|f\| = 1$ and $f(Y) = \{0\}$. But then
Proof. Let $s$ denote the supremum. Since $|\langle x, f\rangle| \le \|x\| \|f\|$, $s \le \|x\|$. By 8.5.6, there
exists $f \in X'$ such that $\|f\| = 1$ and $\langle x, f\rangle = \|x\|$. Therefore, $s \ge \|x\|$.
\[ \langle f, \widehat{x}\rangle = \langle x, f\rangle, \quad f \in X'. \]
The collection of all evaluation functionals is denoted by $\widehat{X}$. Corollary 8.5.8 asserts that
$\widehat{X} \subseteq X''$ and $\|\widehat{x}\| = \|x\|$. For example, from §8.3 we see that the bidual of $c_0$ may be
identified with $\ell^\infty$. To find $\widehat{c_0}$ in this identification, note that for $x = (x_n) \in c_0$, $\widehat{x}$ is the
mapping
\[ \langle y, \widehat{x}\rangle = \langle x, y\rangle = \sum_{n=1}^{\infty} x_n y_n, \quad y \in \ell^1 = c_0'. \]
Now recall that in the identification of $\ell^\infty$ with $(\ell^1)'$, a sequence $(x_n)$ in $\ell^\infty$ is identified
with the linear functional on $\ell^1$ defined precisely by the above equation. Thus we see that
$\widehat{c_0}$ may be identified with the subspace $c_0$ of $\ell^\infty$.
8.5.9 Theorem. Let X be a normed space. Then the mapping $x \to \widehat{x}$ is a linear isometry
into the bidual of X. Thus the closure of $\widehat{X}$ in $X''$ is a concrete realization of the completion
of X.
*Invariant Versions of the Hahn-Banach Theorem
A semigroup of operators on a vector space X is a set S of linear operators S : X → X
that is closed under composition. A subspace Y of X is said to be S-invariant if SY ⊆ Y
for all S ∈ S. A function G on an S-invariant subspace Y is said to be S-invariant if
G(Sy) = G(y) for all y ∈ Y and S ∈ S. The following versions of the Hahn-Banach
theorem, due to Agnew and Morse, address the problem of extending linear functionals that
are invariant under the action of a semigroup of operators.
8.5.10 Theorem. Let X be a real vector space, S a commutative semigroup of operators on
X, and p a Minkowski functional on X such that p(Sx) ≤ p(x) for all x ∈ X and S ∈ S.
Let Y be an S-invariant subspace of X and G an S-invariant, real-valued, linear functional
on Y such that G ≤ p on Y. Then G extends to a real-valued S-invariant linear functional
F on X such that F ≤ p on X.
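Proof. We may assume that $\mathcal{S}$ contains the identity operator $I$. Let $\operatorname{co}\mathcal{S}$ denote the set of
convex combinations of members of $\mathcal{S}$:
\[ \operatorname{co}\mathcal{S} = \Big\{ \sum_{j=1}^{n} t_j S_j : S_j \in \mathcal{S},\ t_j \ge 0, \text{ and } \sum_{j=1}^{n} t_j = 1 \Big\}. \]
Define $q(x)$ on X by
\[ q(x) := \inf\{p(Tx) : T \in \operatorname{co}\mathcal{S}\}. \]
By linearity of $T$, $q$ is a Minkowski functional on X. Since $G(y) = G(Ty) \le p(Ty)$ for all
$T \in \operatorname{co}\mathcal{S}$ and $y \in Y$, $G \le q$ on Y. By the Hahn-Banach theorem, $G$ has a linear extension
$F$ on X such that $F \le q$. It remains to show that $F$ is $\mathcal{S}$-invariant.
Fix $S \in \mathcal{S}$ and for each $n$ define $T_n = \frac{1}{n} \sum_{j=0}^{n-1} S^j$ where $S^0 := I$. Then
\[ T_n(I - S) = \frac{1}{n} \Big( \sum_{j=0}^{n-1} S^j - \sum_{j=1}^{n} S^j \Big) = \frac{1}{n}(I - S^n), \]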
Exercises
8.60 Show that the converse of 8.5.7 is false.
8.61 Let X be a normed space. Show that $\widehat{X}$ is closed in $X''$ iff X is a Banach space.
8.62 Let $x_1, \ldots, x_n$ be linearly independent vectors in a normed space X. Show that $X'$ has at least
$n$ linearly independent vectors.
8.63 Show that if X is strictly convex, then for each $f \in X'$ with $\|f\| > 0$ there is at most one $x$
such that $\|x\| = 1$ and $f(x) = \|f\|$.
8.64 (a) Show that for each $n \in \mathbb{N}$ there exists a probability measure $\mu_n$ on $[0, 1]$ such that
$\int_0^1 x^k\,d\mu_n(x) = k + 1$ for all integers $0 \le k \le n$. Can this hold for all $k \ge 0$? (b) Show that there
exists a probability measure $\mu$ on $[0, 1]$ such that $\int_0^1 x^k\,d\mu(x) = (k + 1)^{-1}$ for all integers $k \ge 0$.
8.65 (Krein) Let X be a set and $G \subseteq F$ linear spaces of real-valued functions $f$ on X such that for
each $f \in F$ there exists $g \in G$ with $g \ge f$. Let $I$ be a positive linear functional on $G$, that is,
$I(g) \ge 0$ whenever $g \in G$ and $g \ge 0$. Show that $I$ extends to a positive linear functional on $F$.
⟦Consider $p(f) := \inf\{I(g) : g \in G \text{ and } g \ge f\}$.⟧
8.66 Show that a finite dimensional subspace F of a normed space X is complemented, that is,
there exists a closed subspace Y of X such that $X = F \oplus Y$. ⟦Let $\{x_1, \ldots, x_d\}$ be a basis for
F and define suitable $x_j' \in X'$.⟧
If cn ∈ [0, 1] and c0 = 1, then the problem can be stated in probabilistic terms: When
does there exist a probability measure on [0, 1] with given moments cn ? Note that by the
Stone-Weierstrass theorem, the solution, if one exists, is unique.
Since the integral $\int_0^1 g\,dF$ defines a continuous linear functional on $C[0, 1]$, the moment
problem may be stated somewhat more abstractly as follows: Given a sequence $(c_n)$, when
does there exist a continuous linear functional $F$ on $C[0, 1]$ such that $\langle t^n, F\rangle = c_n$ for all $n$?
This suggests that the problem may be cast in a broader context, where C[0, 1] is replaced by
an arbitrary normed space X and the functions tn are replaced by members of X. Here is the
precise statement of the general moment problem, the resolution of which is a consequence
of the Hahn-Banach theorem.
8.6.1 Theorem. Let X be a normed space, $I$ an arbitrary index set, $\{x_i : i \in I\} \subseteq X$, and
$\{c_i : i \in I\} \subseteq \mathbb{K}$. Then the following statements are equivalent:
(a) There exists $x' \in X'$ such that $\langle x_i, x'\rangle = c_i$ for all $i \in I$.
(b) There exists $M > 0$ such that for all finite subsets $I_0 \subseteq I$ and all $t_j \in \mathbb{K}$,
\[ \Big| \sum_{j \in I_0} t_j c_j \Big| \le M \Big\| \sum_{j \in I_0} t_j x_j \Big\|. \]
(b) ⇒ (a): Let Y be the linear span of the set $\{x_i : i \in I\}$. A typical member of Y may
be written $\sum_{j \in I_0} t_j x_j$, where $I_0 \subseteq I$ is finite. Define a mapping $x'$ on Y by
\[ x'\Big( \sum_{j \in I_0} t_j x_j \Big) = \sum_{j \in I_0} t_j c_j. \]
The inequality in (b) implies that $x'$ is well-defined. Moreover, $x'$ is linear and (b) shows
that $\|x'\| \le M$. Therefore, by 8.5.4, $x'$ has an extension to a member of $X'$.
Invariant Means
Let S be a nonempty set. A mean on B(S, R) is a linear functional m such that
where Rs and Ls are the right and left translation operators on B(S, R) defined by
8.6.3 Example. Let $S$ be the free group on two generators $a$ and $b$. Thus $S$ consists of
an identity 1 and all concatenations of the symbols $a$, $b$, $a^{-1}$, and $b^{-1}$, these concatenations
called words. A word may be reduced to a unique expression of the form $s_1^{\varepsilon_1} s_2^{\varepsilon_2} \cdots s_n^{\varepsilon_n}$ where
the $\varepsilon_j$ are integers and $a^1 := a$, $b^1 := b$. (Any pairs $aa^{-1}$ etc. are omitted.) Assume $B(S, \mathbb{R})$
has an invariant mean $m$. Let $B$ denote the subset of $S$ consisting of the identity and all
reduced words starting with $b$. Since the sets $a^j B$ are disjoint, $1 \ge \sum_{j=1}^{n} 1_{a^j B}$, hence
\[ 1 = m(1) \ge \sum_{j=1}^{n} m(1_{a^j B}) = n\, m(1_B) \quad \text{for all } n \]
Banach Limits
A Banach limit on $B\big((0, \infty), \mathbb{R}\big)$ is a linear functional, typically denoted by $\operatorname{Lim}_{t\to\infty} f(t)$,
with the following properties:
(a) $\operatorname{Lim}_{t\to\infty} f(t) = \lim_{t\to\infty} f(t)$ whenever the limit on the right exists.
(b) $\operatorname{Lim}_{t\to\infty} f(t + s) = \operatorname{Lim}_{t\to\infty} f(t)$ for all $f \in B\big((0, \infty), \mathbb{R}\big)$ and $s \in (0, \infty)$.
(c) $\liminf_{t\to\infty} f(t) \le \operatorname{Lim}_{t\to\infty} f(t) \le \limsup_{t\to\infty} f(t)$ for all $f \in B\big((0, \infty), \mathbb{R}\big)$.
8.6.4 Theorem. Banach limits exist.
Proof. Define a Minkowski functional $p$ on $B\big((0, \infty), \mathbb{R}\big)$ by $p(f) = \limsup_{t\to\infty} f(t)$. Let $F$ be
the subspace of all functions $f : (0, \infty) \to \mathbb{R}$ such that the limit $L(f) := \lim_{t\to\infty} f(t)$ exists in
$\mathbb{R}$. If $T_s$ is the translation operator $T_s f(t) = f(s + t)$, then $T_s F \subseteq F$, $L(T_s f) = L(f) = p(f)$
for all $f \in F$, and $p(T_s f) = p(f)$ for all $f \in B\big((0, \infty), \mathbb{R}\big)$. An application of 8.5.10 yields
the desired functional.
The reader may easily formulate the analogous notion of Banach limit on `∞ (N), replacing
limt→∞ f (t) by limn xn , etc.
Exercises
8.67 Show that a Banach limit on B(0, ∞) or `∞ (N) is continuous.
8.68 Let aj ∈ R. Find the Banach limit of the sequence x = (a1 , . . . am , a1 , . . . am , . . .).
8.69 Show that there exists a continuous linear functional $f \to \operatorname{Lim}_{t\to 0} f(t)$ on $B := B\big((-1, 1), \mathbb{R}\big)$
8.70 Show that there exists a finitely additive, translation invariant measure µ on P(R) such that
µ(E) = λ(E) for every bounded, Lebesgue measurable set E ⊆ R.
8.7.1 Uniform Boundedness Principle. Let X and Y be Banach spaces and let $\mathcal{T}$ be a
subset of $B(X, Y)$ such that $\sup_{T \in \mathcal{T}} \|Tx\| < \infty$ for each $x \in X$. Then $\sup_{T \in \mathcal{T}} \|T\| < \infty$.
Proof. The set $X_n := \{x \in X : \|Tx\| \le n \ \forall\, T \in \mathcal{T}\}$ is closed and, by hypothesis,
$X = \bigcup_n X_n$. By Baire's theorem, some $X_n$ contains a closed ball $C(x_0, r)$. Thus
8.7.2 Banach-Steinhaus Theorem. Let X and Y be Banach spaces and let $(T_n)$ be a
sequence in $B(X, Y)$. Then $\lim_n T_n x$ exists in Y for all $x \in X$ iff the following conditions
hold:
(a) $\sup_n \|T_n\| < \infty$ and
Moreover, if (a) and (b) hold, then the pointwise limit $T := \lim_n T_n$ is a member of $B(X, Y)$
and $\|T\| \le \liminf_n \|T_n\| \le \sup_n \|T_n\|$.
Proof. If $Tx := \lim_n T_n x$ exists for all $x \in X$, then $T$ is linear and $\sup_n \|T_n x\| < \infty$ for
all $x$. Therefore, by the uniform boundedness principle, $\sup_n \|T_n\| < \infty$. From $\|Tx\| =
\lim_n \|T_n x\| \le \|x\| \liminf_n \|T_n\|$ we have $\|T\| \le \liminf_n \|T_n\|$.
Now assume (a) and (b) hold and set $s = \sup_n \|T_n\|$. For $x \in X$ and $\varepsilon > 0$, choose $u \in D$
such that $\|x - u\| < \varepsilon/s$. Then
The expression on the right is $< 3\varepsilon$ for all sufficiently large $m$ and $n$, hence $(T_n x)$ is a
Cauchy sequence. Since Y is complete, $(T_n x)$ converges in Y.
the second inclusion following from the first because $-B_r = B_r$. Thus for $y \in B_\varepsilon$ we have
$y \pm y_0 \in \operatorname{cl} T(B_1)$, so by convexity $y = \frac{1}{2}(y + y_0) + \frac{1}{2}(y - y_0) \in \operatorname{cl} T(B_1)$, as required.
Proof. Since the algebraic isomorphism (x, y) → x + y is continuous, the assertion follows
from the Banach isomorphism theorem.
8.7.8 Corollary. Let Z be a Banach space and $P : Z \to Z$ linear such that $P^2 = P$. If
$\operatorname{ran} P$, $\ker P$ are closed, then $P$ is continuous and Z is the topological direct sum of $\operatorname{ran} P$
and $\ker P$.
Proof. As noted earlier, Z is the algebraic direct sum of ran P and ker P . By the preceding
corollary, the sum is topological, hence P is continuous (8.4.3).
8.7.9 Corollary. Let X and Y be Banach spaces, let T ∈ B(X, Y) be surjective, and
let Q : X → X/ ker T be the quotient map. Then there exists a topological isomorphism
S : X/ ker T → Y such that SQ = T .
FIGURE 8.2: Sard Quotient Theorem
Proof. The necessity is obvious. For the sufficiency use 8.7.9 to obtain a topological isomorphism
$S_{XY} : X/\ker T_{XY} \to Y$ such that $S_{XY} Q = T_{XY}$, where $Q : X \to X/\ker T_{XY}$ is the
quotient map. Since $\ker Q = \ker T_{XY} \subseteq \ker T_{XZ}$, we may define $S_{XZ} \in B(X/\ker T_{XY}, Z)$
so that $S_{XZ} Q = T_{XZ}$. Since $Q = S_{XY}^{-1} T_{XY}$, we have $T_{XZ} = S_{XZ} S_{XY}^{-1} T_{XY}$. Therefore, $S_{XZ} S_{XY}^{-1}$
is the desired map $T_{YZ}$.
8.7.11 Example. Let $X$ be a compact Hausdorff space and $Y \subseteq X$ closed. We show that
$C(Y)$ is isometrically isomorphic to $C(X)/\mathcal{Y}$, where $\mathcal{Y} = \{g \in C(X) : g(Y) = 0\}$.
Define a bounded linear map $T : C(X) \to C(Y)$ by $Tf = f|_Y$. Then $\ker T = \mathcal{Y}$, and $T$
is surjective by Tietze's extension theorem. Let $Q : C(X) \to C(X)/\mathcal{Y}$ denote the quotient
map. By 8.7.9, there exists a topological isomorphism $S \in B(C(X)/\ker T, C(Y))$ such that
$SQ = T$. It remains to show $S$ is an isometry, that is,
\[ \inf\{\|f + g\|_\infty : g \in \mathcal{Y}\} = \|f|_Y\|_\infty, \quad f \in C(X). \]
Then $U$ is open and contains the compact set $Y$, hence there exists a continuous function $h$
such that $0 \le h \le 1$, $h = 0$ on $Y$, and $h = 1$ on $U^c$. Setting $g = -fh$ we have $g \in \mathcal{Y}$ and
\[ |f(x) + g(x)| = |f(x)|\, |1 - h(x)| \le \|f|_Y\|_\infty + \varepsilon, \quad x \in X. \]
Therefore $\alpha \le \|f + g\|_\infty \le \|f|_Y\|_\infty + \varepsilon$. Since $\varepsilon$ was arbitrary, $\alpha \le \|f|_Y\|_\infty$. ♦
In particular, if T is continuous, then GT is closed. The converse holds for linear maps:
8.7.12 Closed Graph Theorem. Let X and Y be Banach spaces and let T : X → Y be
a linear map such that GT is closed in X × Y. Then T is continuous.
Proof. Give $X \times Y$ the norm $\|(x, y)\| = \max\{\|x\|, \|y\|\}$, which generates the product
topology (see §8.4). Since $T$ is linear, $G_T$ is a linear subspace of $X \times Y$. Define projection
mappings $P_X : G_T \to X$ and $P_Y : G_T \to Y$ by $P_X(x, Tx) = x$ and $P_Y(x, Tx) = Tx$.
These maps are clearly linear, and both are continuous since $\|x\|, \|Tx\| \le \|(x, Tx)\|$. Moreover,
because $G_T$ is closed in the Banach space $X \times Y$, it is complete. Since $P_X$ is a bijection,
$P_X^{-1} : X \to G_T$ is continuous by the Banach isomorphism theorem. Thus $T = P_Y P_X^{-1}$ is continuous.
The following corollary is sometimes called the two norm theorem.
8.7.13 Corollary. Let X be a Banach space with respect to norms $\|x\|$ and $|||x|||$. Suppose
there exists a constant $c$ such that $|||x||| \le c\|x\|$ for all $x$. Then the norms are equivalent.
Proof. We show that the identity map $I : (X, |||\cdot|||) \to (X, \|\cdot\|)$ is continuous. It will follow
that $\|x\| = \|Ix\| \le \|I\|\, |||x|||$, proving the corollary.
Let $(x_n, Ix_n) = (x_n, x_n) \to (x, y)$ in $(X, |||\cdot|||) \times (X, \|\cdot\|)$, so $|||x_n - x||| \to 0$ and
$\|x_n - y\| \to 0$. Since $|||x_n - y||| \le c\|x_n - y\|$, we also have $|||x_n - y||| \to 0$. Therefore,
$x = y$. By the closed graph theorem, $I$ is continuous.
Exercises
8.71 Define linear functionals $f_n(x) = \sum_{j=1}^{n} x_j$ on $c_{00}$. Show that $\sup_n |f_n(x)| < \infty$ for all $x$, yet
$\sup_n \|f_n\| = \infty$. Conclude that the completeness of X and Y in 8.7.1 is essential.
8.72 Let X be a normed space and $A \subseteq X$ such that $\sup\{|f(x)| : x \in A\} < \infty$ for
every $f \in X'$. Prove that $\sup\{\|x\| : x \in A\} < \infty$. Thus weak boundedness implies norm
boundedness.
8.73 Let X, Y be Banach spaces and T : X → Y linear such that T is weakly continuous, that
is, f ◦ T is continuous for each f ∈ Y 0 . Show that T is continuous.
8.74 [↑ 8.2] Let X, Y, Z be Banach spaces and let B : X × Y → Z be bilinear and separately
continuous, that is, continuous in x for each y and continuous in y for each x. Show that B is
bounded.
8.75 Let X, Y be Banach spaces and $T \in B(X, Y)$ injective. Prove: $T^{-1}$ is continuous on $\operatorname{ran}(T)$ iff
$\operatorname{ran}(T)$ is closed.
8.76 [↓ 10.2.11] Let X, Y be Banach spaces and $T \in B(X, Y)$ surjective. Show that there exists
$c > 0$ such that for each $x$ there exists $x_1$ with $Tx_1 = Tx$ and $\|x_1\| \le c\|Tx\|$. ⟦Use 8.7.9.⟧
8.77 Let X and Y be Banach spaces and T : X → Y linear. Suppose T has the property that
xn → 0 and T xn → y ⇒ y = 0. Prove that T is continuous.
8.78 Let $(X, \mathcal{F}, \mu)$ be a measure space and $T : L^1(\mu) \to L^1(\mu)$ linear with the property that if $(f_n)$
is a sequence in $L^1$ with $f_n \to 0$ a.e., then $Tf_n \to 0$ a.e. Show that $T$ is bounded.
8.79 Let X be a Banach space, $T \in B(X)$ injective, and $S : X \to X$ linear such that $TS$ is continuous.
Prove that $S$ is continuous.
8.80 Let $C^1[0, 1]$ and $C[0, 1]$ have the sup norms. Show that the linear map $D : C^1[0, 1] \to C[0, 1]$,
$Df = f'$, has a closed graph but is unbounded. Thus the completeness hypothesis in the closed
graph theorem is essential.
8.82 Let (X, F, µ) be a measure space and g measurable such that f g ∈ L1 for all f ∈ L1 . Show
that the linear mapping T : f → f g on L1 is continuous and that g ∈ L∞ .
8.83 Let $(X, \mathcal{F}, \mu)$ be a measure space and $E \in \mathcal{F}$. Let $Y = \{g \in L^1 : g(E) = 0\}$. Show that
$L^1(X, \mathcal{F}, \mu)/Y$ is isometrically isomorphic to $L^1(E, \mathcal{F} \cap E, \nu)$, where $\nu = \mu|_{\mathcal{F} \cap E}$.
*8.8 Applications
Divergent Fourier Series
Let $f : \mathbb{R} \to \mathbb{C}$ be a periodic function with period $2\pi$. The Fourier series of $f$ is the
formal series
\[ f(t) \sim \sum_{k=-\infty}^{\infty} c_k e^{ikt}, \quad\text{where}\quad c_k := \frac{1}{2\pi} \int_0^{2\pi} e^{-ikx} f(x)\,dx. \]
The $L^2$ convergence of Fourier series is discussed in §11.3. Deeper questions center around
which is the nth partial sum of the Fourier series for f evaluated at t = 0. We show
that limn kFn k = ∞. It will then follow from the uniform boundedness principle that
supn |Fn (f )| = ∞ for some f ∈ X, as claimed.
The $n$th partial sum of the Fourier series for $f$ may now be written
\[ \sum_{k=-n}^{n} \Big( \frac{1}{2\pi} \int_0^{2\pi} e^{-ikx} f(x)\,dx \Big) e^{ikt} = \frac{1}{\pi} \int_0^{2\pi} f(x) D_n(t - x)\,dx. \]
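The growth of $\|F_n\|$ can be observed numerically. The sketch below assumes the normalization $D_n(s) = \frac{1}{2}\sum_{k=-n}^{n} e^{iks} = \frac{1}{2} + \sum_{k=1}^{n} \cos ks$, which is what the displayed identity forces, and it assumes the standard fact that $\|F_n\| = \frac{1}{\pi}\int_0^{2\pi} |D_n(x)|\,dx$; neither the function names nor the midpoint-rule approximation come from the text.

```python
import math

def dirichlet_kernel(n, x):
    """D_n(x) = 1/2 + sum_{k=1}^n cos(kx), i.e. (1/2) * sum_{k=-n}^{n} e^{ikx}."""
    return 0.5 + sum(math.cos(k * x) for k in range(1, n + 1))

def lebesgue_constant(n, samples=4000):
    """Midpoint-rule approximation of (1/pi) * integral over [0, 2 pi] of |D_n(x)| dx."""
    h = 2 * math.pi / samples
    total = sum(abs(dirichlet_kernel(n, (i + 0.5) * h)) for i in range(samples))
    return total * h / math.pi

for n in (0, 1, 5, 25):
    print(n, lebesgue_constant(n))
```

The printed values grow slowly with $n$, consistent with logarithmic growth and with the claim $\lim_n \|F_n\| = \infty$.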
we see that $|\langle g(z) - g(w), x'\rangle| \le 4Mr^{-2}|z - w|$. Therefore, $\|g(z) - g(w)\| \le 4Mr^{-2}|z - w|$,
verifying the Cauchy property.
Summability
Let $A$ be an infinite matrix with entries $a_{mn} \in \mathbb{C}$. Then $A$ maps sequences $x = (x_1, x_2, \ldots)$
onto sequences $y = Ax$ with $m$th term the series $y_m := \sum_{n=1}^{\infty} a_{mn} x_n$ (which may or may
not converge). We denote the limit of a sequence $x$, if it exists, by $\lim x$. The following
theorem characterizes those matrices that preserve limits. It asserts that the summability
property $\lim Ax = \lim x$ holds iff the $\ell^1(\mathbb{N})$ norms of the rows of $A$ are uniformly bounded,
the columns are members of $c_0$, and the row sums tend to one.
8.8.2 Theorem (Silverman-Toeplitz). $Ax \in c$ and $\lim Ax = \lim x$ for all $x \in c$ iff the
following conditions are satisfied:
\[ \text{(a)}\ \sup_m \sum_{n=1}^{\infty} |a_{mn}| < \infty, \qquad \text{(b)}\ \lim_m a_{mn} = 0 \ \ \forall\, n \in \mathbb{N}, \qquad \text{(c)}\ \lim_m \sum_{n=1}^{\infty} a_{mn} = 1. \]
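To see the three conditions at work, consider the following sketch. The matrix $a_{mn} = 2n/(m(m+1))$ for $n \le m$ (and 0 otherwise) and the test sequence $x_n = 1 + 1/n$ are illustrative assumptions chosen here, not taken from the text; the rows are finitely supported, so each $y_m$ is a finite sum.

```python
def row(m):
    """Row m of the illustrative matrix a_mn = 2n / (m(m+1)) for n <= m, 0 otherwise."""
    return [2 * n / (m * (m + 1)) for n in range(1, m + 1)]

def transform(x, m):
    """y_m = sum_n a_mn x_n; the sum is finite because row m vanishes beyond n = m."""
    return sum(a * x(n) for n, a in enumerate(row(m), start=1))

def x(n):
    """A convergent sequence with limit 1."""
    return 1 + 1 / n

for m in (1, 10, 100, 1000):
    r = row(m)
    # (a) and (c): the entries are nonnegative, so sum |a_mn| = sum a_mn = 1
    # (b): the first-column entry 2/(m(m+1)) tends to 0 as m grows
    print(m, sum(r), r[0], transform(x, m))
```

For this matrix a direct computation gives $y_m = 1 + 2/(m+1)$, so the transformed sequence has the same limit as $x$, as the theorem predicts.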
Proof. (Sufficiency) Let $\mathbf{x} = (x_n) \in c$ and $x = \lim \mathbf{x}$. By (a), the series $y_m := \sum_{n=1}^{\infty} a_{mn} x_n$ is
absolutely convergent for each $m$. Now write
\[ y_m = \sum_{n=1}^{\infty} a_{mn}(x_n - x) + x \sum_{n=1}^{\infty} a_{mn}. \]
By (c), the second term on the right has limit $x$ as $m \to \infty$. Therefore, to show that $y_m \to x$
it suffices to show that the first term on the right tends to zero as $m \to \infty$. Let $s$ denote the
supremum in (a). Given $\varepsilon > 0$, choose $N$ so that $|x_n - x| < \varepsilon/s$ for all $n > N$. Then
\[ \Big| \sum_{n=1}^{\infty} a_{mn}(x_n - x) \Big| \le \sum_{n=1}^{N} |a_{mn}|\, |x_n - x| + \sum_{n>N} |a_{mn}|\, |x_n - x| \le \sum_{n=1}^{N} |a_{mn}|\, |x_n - x| + \varepsilon, \]
hence, by (b), $\limsup_m \big| \sum_{n=1}^{\infty} a_{mn}(x_n - x) \big| \le \varepsilon$, which implies the desired conclusion.
(Necessity) Fix $n$ and let $\mathbf{x} := (0, \ldots, 0, 1, 0, \ldots)$, with 1 in the $n$th position. Then $A\mathbf{x} = (a_{1n}, a_{2n}, \ldots)$ and
$\lim \mathbf{x} = 0$, hence $\lim_{m\to\infty} a_{mn} = \lim A\mathbf{x} = 0$, proving (b). For (c), take $\mathbf{x} = (1, 1, \ldots)$ and
argue similarly.
To prove (a), we show first that $\sum_{n=1}^{\infty} |a_{mn}| < \infty$. If this is not the case, then there exists
a strictly increasing sequence of indices $n_k$ such that $\sum_{j=n_k+1}^{n_{k+1}} |a_{mj}| > k$. Define
Schauder Bases
A sequence (en ) in a normed space X is said to be a Schauder basis or simply a basis
for X if ‖en‖ = 1 for all n and if each x ∈ X can be represented uniquely as a series
$\sum_{k=1}^{\infty} c_k e_k$, that is, there exist unique scalars ck ∈ K such that
$$\lim_n \Big\| x - \sum_{k=1}^{n} c_k e_k \Big\| = 0.$$
For example, the sequences en := (0, . . . , 0, 1, 0, . . .), with the 1 in the nth position, form a basis for each of the spaces c0
and ℓp , 1 ≤ p < ∞. In c, one must augment this set by e = (1, 1, . . .) (Ex. 8.85, 8.86).
The uniqueness of the representation implies that the coefficients ck depend linearly on x.
Thus we may write
$$x = \sum_{k=1}^{\infty} c_k(x) e_k, \qquad c_k(e_j) = \delta_{jk}. \qquad (8.19)$$
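To illustrate the expansion in the space c, where the basis is e together with the en, the coefficient of e is lim x and the coefficient of en is xn − lim x. The sketch below checks the sup-norm error of the partial sums on a finite window of coordinates; the sample sequence xn = 1 + 1/n and the names `partial_sum`, `sup_error`, and `WINDOW` are illustrative assumptions.

```python
from fractions import Fraction

def partial_sum(x, limit, N):
    """Coordinates of limit*e + sum_{n<=N} (x_n - limit) e_n on the first len(x) entries."""
    return [x[i] if i < N else limit for i in range(len(x))]

def sup_error(x, limit, N):
    """Sup-norm distance between x and its N-th partial sum, over the window."""
    s = partial_sum(x, limit, N)
    return max(abs(a - b) for a, b in zip(x, s))

WINDOW = 1000
limit = Fraction(1)
x = [1 + Fraction(1, n) for n in range(1, WINDOW + 1)]  # x_n -> 1
errors = [sup_error(x, limit, N) for N in (0, 9, 99)]
print(errors)  # [Fraction(1, 1), Fraction(1, 10), Fraction(1, 100)]
```

The error after N terms is sup over n > N of |xn − lim x|, which tends to zero exactly because x converges.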
Taking the supremum over all p, we have ||| x − xn ||| ≤ ε, proving that xn → x in (X, ||| · |||).
Therefore, (X, ||| · |||) is complete.
It now follows from 8.7.13 that ‖·‖ and ||| · ||| are equivalent. Since |ck (x)| = ‖ck (x)ek‖ ≤
||| x |||, ck is ||| · |||-continuous, hence also ‖·‖-continuous.
Exercises
8.84 Show that the matrix A = [aij ], where aij = 1/i for j ≤ i and aij = 0 otherwise, satisfies (a),
(b), and (c) of 8.8.2. Conclude that limn (x1 + · · · + xn )/n = limn xn .
8.85 Show that {en = (0, . . . , 0, 1, 0, . . .) : n ∈ N}, where en has its 1 in the nth position, is a basis for c0 and ℓp , 1 ≤ p < ∞, but not for ℓ∞.
8.86 Show that the vectors en = (0, . . . , 0, 1, 0, . . .) together with e = (1, 1, . . .) form a basis for c.
8.87 Let d1 = (1, 0, . . .) and dn = (1, 0, . . . , 0, 1/n, 0, . . .), with 1/n in the nth position, n ≥ 2.
(a) Show that (dn ) is a basis for $c_{00}$.
⟦Consider $c_1(x) := x_1 - \sum_{k=2}^{\infty} k x_k$ and $c_n(x) := n x_n$ (n ≥ 2).⟧
(b) Show that ‖dn − d1‖ → 0 but c1 (dn − d1 ) ↛ 0. Conclude that completeness of the
normed space X in 8.8.3 is essential.
⟨x, T′f⟩ = ⟨Tx, f⟩, x ∈ X, f ∈ Y′.
Annihilators
Let X be a normed space. The annihilators A⊥ and ⊥B of subsets A ⊆ X and B ⊆ X′
are defined by
T′(T⁻¹)′ = (T⁻¹T)′ = I and (T⁻¹)′T′ = (TT⁻¹)′ = I,
The following theorem uses the quotient map to identify the dual of X/Y with Y ⊥ .
8.9.7 Theorem. Let X be a normed space, Y a closed subspace, and Q : X → X/Y the
quotient map. Then Q′ : (X/Y)′ → X′ is an isometry onto Y⊥.
Proof. We claim that the range of Q′ is Y⊥. Indeed, if ψ ∈ (X/Y)′ and y ∈ Y, then
⟨y, Q′ψ⟩ = ⟨Qy, ψ⟩ = 0, hence Q′ψ ∈ Y⊥. Conversely, if f ∈ Y⊥, then the equation
⟨Qx, ψ⟩ = ⟨x, f⟩ defines ψ ∈ (X/Y)′ with Q′ψ = f.
Now, since ‖Q‖ ≤ 1,
To see that ‖ψ‖ ≤ ‖Q′ψ‖, let 0 < r < 1. Since Q is surjective, we may choose Qx with
norm one such that |⟨Qx, ψ⟩| > r‖ψ‖. Since ‖Qx‖ < r⁻¹ we may choose y ∈ Y with
‖x + y‖ < r⁻¹. Then
r‖ψ‖ < |⟨Qx, ψ⟩| = |⟨Q(x + y), ψ⟩| = |⟨x + y, Q′ψ⟩| ≤ r⁻¹‖Q′ψ‖,
Proof. Let I : Y ↪ X denote the inclusion map. Then I′ : X′ → Y′ is the restriction
mapping f ↦ f|Y , which has kernel Y⊥ and which is surjective by 8.5.4. By 8.7.9, there
Taking the infimum on g yields ‖TQf‖ ≤ ‖Qf‖. On the other hand, given h ∈ Y′ there exists
an f ∈ X′ such that I′f = h and ‖f‖ = ‖h‖ (8.5.4), so ‖TQf‖ = ‖I′f‖ = ‖f‖ ≥ ‖Qf‖.
Exercises
8.88 Prove 8.9.2.
8.89 Let X, Y be normed linear spaces and T ∈ B(X, Y). Prove that $\|Tx\| = \|T''\hat{x}\|$.
8.90 Let X and Y be normed spaces and T ∈ B(X, Y). Prove that T is an isometry onto Y iff T′
is an isometry onto X′.
8.91 [↑ 8.34] Let Tr and Tℓ be the right and left shift operators on c0 . Identify c0′ with ℓ1 as in §8.3.
Find Tr′ and Tℓ′.
8.92 Find the dual of the multiplication map Mφ of 8.2.3(c) for the case 1 < p < ∞.
8.95 Let 1 ≤ p < ∞, r ≠ 0, and let $D_r : L^p(\lambda^d) \to L^p(\lambda^d)$ be the dilation operator Dr f (x) = f (rx).
8.97 Let Y be a closed subspace of a normed space X. Prove that X/Y is isometrically isomorphic
to (Y⊥)′.
8.98 Show that there is a norm one projection of X‴ onto X′ (identified with $\widehat{X'}$).
where y1 , . . . , yn is a basis for ran T and x′j ∈ X′. The collection of all operators of finite
rank is denoted by B00 (X, Y):
with the analogous inclusions holding for B00 . In particular, B0 (X) and B00 (X) are ideals
in the Banach algebra B(X).
8.10.2 Theorem. B0 (X, Y) is operator-norm closed in B(X, Y).
Proof. Let T ∈ B(X, Y) and Tn ∈ B0 (X, Y) with ‖Tn − T‖ → 0. Given ε > 0 choose n
such that ‖Tn − T‖ < ε, and let x1 , . . . , xm ∈ C1 so that $T_n(C_1) \subseteq \bigcup_{j=1}^{m} B_\varepsilon(T_n x_j)$. Then
for each x ∈ C1 there exists j such that ‖Tn x − Tn xj‖ < ε, hence
An application of the CBS inequality shows that ‖Kf‖₂ ≤ ‖k‖₂ ‖f‖₂, hence K is a bounded
linear operator on L2 (µ) with ‖K‖ ≤ ‖k‖₂. The operator K is called an integral operator
with kernel k. We show that K is compact.
First, assume that k is continuous. Then the collection of functions F := {k(·, y) : y ∈ X}
is compact in C(X), hence given ε > 0 there exist yj ∈ X such that $F \subseteq \bigcup_{j=1}^{n} B_\varepsilon(k(\cdot, y_j))$.
Let
Then T has finite rank, and for all f with ‖f‖₂ ≤ 1 and all x ∈ X,
$$|Kf(x) - Tf(x)| \le \sum_{j=1}^{n} \int_{B_j} |k(x,y) - k(x,y_j)|\,|f(y)|\,d\mu(y) \le \varepsilon \int |f|\,d\mu \le \varepsilon.$$
|gm (y) − gn (y)| ≤ |gm (y) − gm (yk )| + |gm (yk ) − gn (yk )| + |gn (yk ) − gn (y)|
≤ 2C‖y − yk‖ + |gm (yk ) − gn (yk )|,
and since y may be approximated by a yk we see that (gn (y))n is a Cauchy sequence,
verifying the claim.
Now let g(y) := limn gn (y) (y ∈ cl ran T ). Clearly, g is linear and |g(y)| = limn |gn (y)| ≤
s‖y‖, hence g is continuous on cl ran T . Therefore, g ◦ T ∈ X′, and for any x ∈ X
We claim that ‖T′gn − g ◦ T‖ → 0. Suppose the claim is false. Then there exists ε > 0 such
that ‖T′gn − g ◦ T‖ ≥ ε for infinitely many n, say for n ∈ S. For each n ∈ S choose xn with
norm one such that
Since T is compact, there exists a strictly increasing sequence (nk )k in S and y ∈ Y such that
$Tx_{n_k} \to y$. Since supn ‖gn‖ < ∞, $g_{n_k}(Tx_{n_k}) \to g(y)$. But this contradicts (†). Therefore,
‖T′gn − g ◦ T‖ → 0, hence T′ is compact. The proof that T′ compact ⇒ T compact is left
as an exercise (8.103).
∗
Fredholm Alternative for Compact Operators
Let A be an n × n matrix. A standard argument shows that one of the following holds:
(i) The system of equations Ax = 0 has a nonzero solution in Kn .
(ii) The system Ax = y has a unique solution for each y ∈ Kn .
In this subsection we prove an infinite dimensional version of this result using the following
lemmas.
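In the finite dimensional setting the dichotomy is decided by the rank of A: case (ii) holds exactly when the rank equals n, and case (i) otherwise. A minimal sketch, assuming exact rational arithmetic and hypothetical helper names `rank` and `alternative`:

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a square matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n = len(rows)
    r = 0  # number of pivots found so far
    for col in range(n):
        pivot = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def alternative(matrix):
    """Return "(i)" if Ax = 0 has a nonzero solution,
    "(ii)" if Ax = y has a unique solution for every y."""
    return "(ii)" if rank(matrix) == len(matrix) else "(i)"

print(alternative([[1, 2], [2, 4]]))  # (i)
print(alternative([[1, 2], [3, 4]]))  # (ii)
```

The first matrix has the nonzero null vector (2, −1); the second is invertible.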
8.10.5 Lemma. Let T ∈ B(X) be compact, and for each x ∈ X let d(x) denote the
distance from x to ker(I − T ). Then there exists M > 0 such that d(x) ≤ M‖(I − T )x‖
for all x.
hence f = S′h ∈ ran S′.
We may now prove
8.10.7 Theorem (Fredholm). Let T ∈ B0 (X) and λ ≠ 0. Then λI − T is surjective iff
λI − T is injective. Thus one of the following holds:
(i) The equation T x − λx = 0 has a nonzero solution.
(ii) The equation T x − λx = y has a unique solution for any y ∈ X.
Proof. Since λI − T = λ(I − λ⁻¹T) and λ⁻¹T is compact, we may take λ = 1. Set S := I − T.
Suppose that S is surjective but not injective. Then Sx1 = 0 for some x1 ≠ 0. We claim that
the containment $\ker(S^{n-1}) \subseteq \ker(S^n)$ is proper. Indeed, since S is surjective, there exists x2
such that Sx2 = x1 , and in general there exists a vector xn such that $Sx_n = x_{n-1}$. Then
$S^n x_n = S^{n-1} x_{n-1} = \cdots = Sx_1 = 0$ and $S^{n-1} x_n = S^{n-2} x_{n-1} = \cdots = Sx_2 = x_1 \ne 0$, so
$x_n \in \ker(S^n) \setminus \ker(S^{n-1})$, verifying the claim. By 8.1.6 there exists $y_n \in \ker(S^n) \setminus \ker(S^{n-1})$
such that
$$\|y_n\| = 1 \quad\text{and}\quad \inf\{\|y_n - y\| : y \in \ker(S^{n-1})\} \ge 1/2.$$
Now write
$$Ty_n - Ty_m = (I - S)y_n - (I - S)y_m = y_n + [Sy_m - y_m - Sy_n].$$
The term in square brackets is in $\ker(S^{n-1})$ for all n > m, hence ‖T yn − T ym‖ ≥ 1/2. But
then (T yn ) has no convergent subsequence, contradicting that T is compact. Therefore, S
is injective.
Conversely, assume that S is injective. We claim that ran(S) is closed. To verify this, we
use the following simple observation regarding sequences (xn ) in X:
Exercises
8.99 Let X be a Banach space, (xn ), (x′n ) sequences in X and X′ with norm ≤ 1, and (cn ) ∈ ℓ1 (N).
Show that the operator $Tx = \sum_{k=1}^{\infty} c_k \langle x, x'_k\rangle x_k$ is compact.
8.102 Let X be a normed space, λ ≠ 0, and T ∈ B(X) compact. Prove that $\ker(\lambda - T)^m$ is finite
dimensional for all m ∈ N.
8.103 Let X and Y be Banach spaces and T ∈ B(X, Y). Prove: If T′ is compact, then T is compact.
Chapter 9
Locally Convex Spaces
Seminormed Spaces
Let P be a family of seminorms on a vector space X. The initial topology induced by
the collection of all functions of the form z ↦ p(z − y), where p ∈ P and y ∈ X, is called
the seminorm topology generated by P. The space X with this topology is called a
seminormed space. A neighborhood base at x for a seminorm topology consists of finite
intersections of sets of the form
where pj ∈ P and ε > 0. In particular, a net (xα ) converges to x in this topology iff
p(xα − x) → 0 for all p ∈ P. It follows easily from properties of seminorms that the
seminorm topology on X is a vector topology.
By continuity of scalar multiplication, sx ∈ U for sufficiently small s > 0, hence pU (x) < ∞.
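As a numerical illustration, assume the usual definition pU (x) = inf{t > 0 : x ∈ tU } and take the hypothetical open, convex, balanced set U = {(a, b) ∈ R² : |a| + 2|b| < 1}, for which pU (a, b) = |a| + 2|b|. The sketch below recovers pU by bisection on t; the names `in_U` and `minkowski` are illustrative only.

```python
def in_U(x):
    """Membership in U = {(a, b) : |a| + 2|b| < 1} (a hypothetical example set)."""
    a, b = x
    return abs(a) + 2 * abs(b) < 1

def minkowski(x, in_set, tol=1e-12):
    """Approximate p_U(x) = inf{t > 0 : x in tU} by bisection on t."""
    hi = 1.0
    while not in_set((x[0] / hi, x[1] / hi)):
        hi *= 2.0  # terminates because sx lies in U for small s > 0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if in_set((x[0] / mid, x[1] / mid)):
            hi = mid
        else:
            lo = mid
    return hi

x, y = (3.0, -1.0), (-1.0, 0.5)
p_x, p_y = minkowski(x, in_U), minkowski(y, in_U)
p_sum = minkowski((x[0] + y[0], x[1] + y[1]), in_U)
print(round(p_x, 6), round(p_y, 6), round(p_sum, 6))
```

The bisection is valid because membership of x in tU is monotone in t when U is convex and contains zero; the computed values are consistent with subadditivity of pU.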
The following result is the key to establishing the connection between locally convex spaces
and seminormed spaces.
9.1.4 Proposition. Let U be an open, convex, balanced neighborhood of zero in a TVS X.
Then pU is a Minkowski functional that is continuous in the topology of X. Moreover,
Proof. To verify the subadditivity property, let x ∈ sU and y ∈ tU (s, t > 0). By convexity
of U ,
$$\frac{1}{s+t}(x + y) = \frac{s}{s+t}(s^{-1}x) + \frac{t}{s+t}(t^{-1}y) \in U,$$
hence pU (x + y) ≤ s + t. Since s and t were arbitrary, pU (x + y) ≤ pU (x) + pU (y).
For positive homogeneity, let c ∈ F, c ≠ 0. Since U is balanced, $c^{-1}U = |c^{-1}|U$, hence
Fréchet Spaces
If X is a LCS with a countable generating class (pn ) of seminorms (or, equivalently, a
countable basis of open convex neighborhoods of zero), then
$$d(x, y) := \sum_{n=1}^{\infty} 2^{-n}\,\frac{p_n(x - y)}{1 + p_n(x - y)} \qquad (9.4)$$
defines a metric for the locally convex topology of X, as is readily verified. If X is complete
in this metric, then X is called a Fréchet space. The metric d is not derived from a norm,
since homogeneity fails (dramatically). We shall call d the standard metric for X. Clearly
every Banach space is a Fréchet space. Here are some nontrivial examples:
9.1.6 Examples.
(a) The space C(U ). Let $U \subseteq \mathbb{R}^d$ be open. Define compact subsets of U by
$$K_n := \{x \in \mathbb{R}^d : |x| \le n,\ d(x, U^c) \ge 1/n\}, \qquad n \in \mathbb{N}.$$
Then Kn ⊆ int Kn+1 and Kn ↑ U . Now define seminorms pn on C(U ) by
pn (f ) = sup{|f (x)| : x ∈ Kn }.
Since the sets $\{x : d(x, U^c) > 1/n\}$ form an increasing open cover of U , every compact set
is contained in some Kn . Thus convergence in the locally convex topology generated by the
seminorms pn is uniform convergence on compact subsets of U , also called local uniform
convergence. Since each space C(Kn ) is complete, C(U ) is a Fréchet space.
(b) The space H(U ). For $U \subseteq \mathbb{R}^2$, the space H(U ) of analytic (holomorphic) functions is a
closed subspace of C(U ) in (a), since the property of analyticity is conveyed by local uniform
convergence. Therefore H(U ) is also a Fréchet space.
(c) The space C ∞ (U ). Let U and (Kn ) be as in (a). Define a countable family of seminorms
pm,α on C ∞ (U ) by
pm,α (f ) = sup{|∂ α f (x)| : x ∈ Km },
where α = (α1 , . . . , αd ) (αj ∈ Z+ ), is a multi-index. A sequence (fn ) converges to zero in
the locally convex topology generated by these seminorms iff ∂ α fn → 0 locally uniformly for
all α. To see that C ∞ (U ) is a Fréchet space, let (φn ) be a Cauchy sequence with respect to
the standard metric, so that
$$\lim_{m,n}\ \sup_{x \in K_j} |\partial^\alpha \varphi_n(x) - \partial^\alpha \varphi_m(x)| = 0 \quad \forall\, j \text{ and } \forall \text{ multi-indices } \alpha.$$
Since C(Kj ) is complete and Kj ↑ U , for each multi-index α there exists $\varphi_\alpha \in C(U)$ such
that $\partial^\alpha \varphi_n \to \varphi_\alpha$ uniformly on each compact subset of U . Set $\varphi = \varphi_{(0,\dots,0)}$, so $\varphi_n \to \varphi$ locally
uniformly. Letting n → ∞ in
$$\varphi_n(x_1, \dots, x_d) = \int_0^{x_1} \partial^{(1,0,\dots,0)} \varphi_n(t_1, x_2, \dots, x_d)\,dt_1,$$
we obtain
$$\varphi(x_1, \dots, x_d) = \int_0^{x_1} \varphi_{(1,0,\dots,0)}(t_1, x_2, \dots, x_d)\,dt_1.$$
This shows that $\partial^{(1,0,\dots,0)} \varphi(x_1, \dots, x_d)$ exists and equals $\varphi_{(1,0,\dots,0)}(x_1, \dots, x_d)$. In a similar
manner, it may be shown that $\partial^\alpha \varphi(x_1, \dots, x_d)$ exists and equals $\varphi_\alpha(x_1, \dots, x_d)$ for all
multi-indices α. Therefore, $C^\infty(U)$ is complete.
For later reference we note that the space Cc∞ (U ) is dense in C ∞ (U ). Indeed, by
Urysohn’s lemma for C ∞ functions, for each n there exists a function φn ∈ Cc∞ (U ) such
that φn = 1 on Kn . For any f ∈ C ∞ (U ) we then have φn f ∈ Cc∞ (U ) and φn f = f on Kn ,
hence for n > m and all α, pm,α (φn f − f ) = 0.
(d) Schwartz space. The space S of rapidly decreasing functions is a Fréchet space under
the countable family of norms $q_{\alpha,m}$ defined by
$$q_{\alpha,m}(\varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|)^m\,|\partial^\alpha \varphi(x)|.$$
The proof that S is complete with respect to the standard metric is similar to that of (c).
By 6.3.2, the same Fréchet topology is obtained by using the countable family of norms
$$p_{\alpha,\beta}(\varphi) = \sup_{x \in \mathbb{R}^d} |x^\alpha \partial^\beta \varphi(x)|.$$ ♦
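The failure of homogeneity of the standard metric (9.4) is easy to see in a toy case. The sketch below assumes coordinate seminorms pn (z) = |zn | on finitely many coordinates, which is an illustrative stand-in rather than one of the spaces above; the name `standard_metric` is hypothetical.

```python
from fractions import Fraction

def standard_metric(x, y):
    """d(x, y) = sum_n 2^{-n} p_n(x - y) / (1 + p_n(x - y)) with p_n(z) = |z_n|,
    summed over the finitely many coordinates supplied."""
    total = Fraction(0)
    for n, (a, b) in enumerate(zip(x, y), start=1):
        p = abs(Fraction(a) - Fraction(b))
        total += Fraction(1, 2 ** n) * p / (1 + p)
    return total

zero = [0, 0, 0]
x = [1, 0, 0]
two_x = [2, 0, 0]
print(standard_metric(x, zero))      # 1/4
print(standard_metric(two_x, zero))  # 1/3
```

Doubling x raises the distance to zero from 1/4 to 1/3 rather than to 1/2, and every distance is bounded by 1, so d cannot come from a norm.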
There are metrizable TVS that are not locally convex and hence not Fréchet spaces. Here
is one such example:
9.1.7 Example. Let (X, F, µ) be a finite measure space and let $L^0 = L^0(X, \mathcal{F}, \mu)$ denote
the linear space of measurable functions f : X → K. Then
$$d(f, g) = \int \frac{|f - g|}{1 + |f - g|}\,d\mu$$
defines a metric on $L^0$ (where, as usual, we identify functions equal a.e.). Convergence in
this metric is simply convergence in measure (Ex. 3.22). The inequalities
$$\mu\{|(f_n + g_n) - (f + g)| \ge 2\varepsilon\} \le \mu\{|f - f_n| \ge \varepsilon\} + \mu\{|g - g_n| \ge \varepsilon\}$$
and
$$\mu\{|c_n f_n - cf| \ge 2\varepsilon\} \le \mu\{|c_n f_n - c_n f| \ge \varepsilon\} + \mu\{|c_n f - cf| \ge \varepsilon\} \le \mu\{|f_n - f| \ge (|c_n| + 1)^{-1}\varepsilon\} + \mu\{|f| \ge |c_n - c|^{-1}\varepsilon\}$$
then imply that $L^0$ is a TVS under the usual pointwise operations.
Now consider the measure space ([0, 1], B[0, 1], λ). If $L^0$ were locally convex, then the
open ball $B_{1/2}(0)$ would contain an open convex neighborhood of zero, which in turn would
contain an open ball $B_r(0)$, whose convex hull is then contained in $B_{1/2}(0)$. For each n > max{1, 1/r},
let $f_j = n^2\,1_{[(j-1)/n,\,j/n)}$, 1 ≤ j ≤ n. Then
$$d(f_j, 0) = \int \frac{|f_j|}{1 + |f_j|}\,d\lambda = \frac{n}{1 + n^2} < \frac{1}{n} < r,$$
hence the convex combination $f := (1/n)\sum_{j=1}^{n} f_j$ is in $B_{1/2}(0)$. But f = n a.e. and so
$$d(f, 0) = \int \frac{n}{1 + n}\,d\lambda = \frac{n}{n+1} > \frac{1}{2} \quad (n > 1),$$
a contradiction. ♦
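The distances in this example reduce to elementary arithmetic for step functions, which can be checked exactly. The sketch below assumes step functions of a common height h on the n intervals [(j − 1)/n, j/n); the height parameter and the names `dist_step` and `check` are illustrative assumptions. It compares heights of order n² with height 1.

```python
from fractions import Fraction

def dist_step(height, length):
    """d(h * 1_I, 0): the integral of h / (1 + h) over an interval I of the given length."""
    h = Fraction(height)
    return h / (1 + h) * Fraction(length)

def check(n, height):
    """Return (d(f_j, 0), d(f, 0)), where f is the average of the n step functions."""
    d_fj = dist_step(height, Fraction(1, n))
    # The average of n disjointly supported steps of equal height is height/n a.e. on [0, 1].
    d_f = dist_step(Fraction(height, n), 1)
    return d_fj, d_f

n = 10
tall = check(n, n * n)  # height n^2
flat = check(n, 1)      # height 1
print(tall)  # (Fraction(10, 101), Fraction(10, 11))
print(flat)  # (Fraction(1, 20), Fraction(1, 11))
```

With height n² each step is within 1/n of zero while the average lies at distance n/(n + 1) > 1/2; with height 1 the average is at distance 1/(n + 1), which stays inside the ball of radius 1/2.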
Exercises
9.1 Let X be a TVS and A, B ⊆ X with A compact and B closed. Show that A + B is closed.
9.3 Show that the closed convex hull of a subset A of a TVS X is the closure of co A and that the
closed, convex, balanced hull of A is the closure of cobal A.
9.4 Let X be a linear space. Show that if U is balanced and |a| ≤ |b|, then aU ⊆ bU .
9.5 Let U and V be open, convex, balanced neighborhoods of zero in a TVS X. Show that
pU ∩V = max{pU , pV }.
9.6 A subset E of a TVS is bounded if for each neighborhood V of zero there exists t > 0 such
that E ⊆ tV . Verify the following:
(a) If E1 , . . . , En are bounded, then E1 + · · · + En , $\bigcup_{j=1}^{n} E_j$, cE1 , and cl E1 are bounded.
(b) Every compact set K is bounded.
(c) E is bounded iff xn ∈ E, tn ∈ K and tn → 0 ⇒ tn xn → 0.
(d) In a LCS, E is bounded iff p(E) is bounded for every continuous seminorm p.
9.7 (Kolmogorov). Let X be a TVS with a bounded, convex, balanced neighborhood U of zero.
Show that pU is a norm that gives the original topology of X. ⟦For positivity, let x ≠ 0 and V
a balanced neighborhood of zero that does not contain x. If U ⊆ tV , then pU (x) ≥ 1/t. For the
equality of topologies, consider suitable nets.⟧
9.8 Let X be a LCS generated by a family of seminorms P and let Y be a linear subspace of X. Prove
that the relative topology of Y is the locally convex topology τ induced by the seminorms p|Y
(p ∈ P).
9.9 Let p and q be seminorms on a vector space X such that {x : p(x) < 1} = {x : q(x) < 1}.
Show that p = q.
9.10 Let X be a TVS and p a seminorm on X such that the set {x ∈ X : p(x) < 1} is open. Show
that p is continuous.
9.11 Let X be a vector space with locally convex topologies τ1 and τ2 . Show that τ1 ≤ τ2 iff every
τ1 -continuous seminorm is τ2 -continuous.
and observe that the second term is in ker f . Therefore, X = K x0 + ker f . The sum is direct
since if cx0 ∈ ker f , then 0 = f (cx0 ) = cf (x0 ), hence c = 0.
9.2.2 Proposition. Let f be a linear functional on a TVS X. The following statements
are equivalent:
(a) f is continuous.
Proof. That (a) ⇒ (b) is clear. For (b) ⇒ (c) we may assume that f is not identically zero.
Let x ∉ ker f and choose a neighborhood U of 0 such that (U + x) ∩ ker f = ∅. By 9.1.2,
we may assume that U is balanced. We claim that f is bounded on U . If not, then for any
c ∈ K there exists u ∈ U such that |f (u)| > |c|. Setting a := c/f (u) we have |a| < 1 and
so c = f (au) ∈ f (U ). Thus f (U ) = K, and in particular f (u) = −f (x) for some u ∈ U .
But this contradicts (U + x) ∩ ker f = ∅. Therefore, f (U ) must be bounded.
To prove (c) ⇒ (a), let |f (u)| < r for all u in a neighborhood U of zero. If xα → 0 and
ε > 0, then eventually (r/ε)xα ∈ U and so |f (xα )| < ε. Therefore, f is continuous.
It is possible for a TVS not to have any nontrivial continuous linear functionals, as the
following example demonstrates.
9.2.3 Example. We show that the space $L^0[0, 1]$ of Example 9.1.7 has no nontrivial
continuous linear functionals. Let F be such a functional and choose $f \in L^0[0, 1]$ such that
F (f ) ≠ 0. Next, choose whichever of the functions $f 1_{[0,1/2)}$ or $f 1_{[1/2,1]}$, call it f1 , has the
property F (f1 ) ≠ 0, and note that λ{f1 ≠ 0} ≤ 1/2. By induction, we obtain a sequence
(fn ) such that αn := F (fn ) ≠ 0 and $\lambda\{f_n \ne 0\} \le 1/2^n$. Set $g_n := \alpha_n^{-1} f_n$. Then
$$d(g_n, 0) = \int \frac{|f_n|}{|\alpha_n| + |f_n|}\,d\lambda = \int_{\{|f_n| \ne 0\}} \frac{|f_n|}{|\alpha_n| + |f_n|}\,d\lambda \le \lambda\{f_n \ne 0\} \to 0,$$
We shall see in the next section that, unlike the TVS case, a LCS always has a rich supply
of continuous linear functionals.
Proof. (a) ⇒ (b): By continuity of f at zero, there exists a basic neighborhood U of zero
as in (9.1) such that |f (u)| < 1 for all u ∈ U . Set p(x) = maxj pj (x). For any x ∈ X and
δ > 0, εx/(p(x) + δ) ∈ U hence |f (x)| < ε⁻¹(p(x) + δ). Letting δ → 0 yields (b) with
M = 1/ε.
(b) ⇒ (c): Take q = M maxj pj (x).
(c) ⇒ (a): If xα → 0, then q(xα ) → 0, hence f (xα ) → 0.
Exercises
9.12 Let X be a real TVS and f a linear functional on X such that {f ≤ t} is closed for some t.
Show that f is continuous.
9.13 Let X be a TVS and f a linear functional on X such that ker f is not dense in X. Show that
f is continuous.
9.14 Let X and Y be locally convex spaces. Show that a linear transformation T : X → Y is
continuous iff p ◦ T is continuous for every continuous seminorm p on Y.
9.15 Let X be a TVS, f ∈ X′ a nontrivial real linear functional, and t ∈ R. Let C = {x : f (x) ≤ t}
and U = {x : f (x) < t}. Show that cl U = C and int C = U .
[Figure: the hyperplane f = t and the regions f < t and f > t, with the set A.]
Proof. Suppose first that K = R. Fix x0 ∈ A and y0 ∈ B and let z0 := y0 − x0 . The set
U := A − B + z0 is convex, contains zero, and is open, the last property because U is a
union of the open sets A − y + z0 (y ∈ B). Let p be the Minkowski functional of U . Since
A and B are disjoint, z0 ∉ U , hence p(z0 ) ≥ 1 by (9.3). Define g on the one-dimensional
space Y := Rz0 by g(cz0 ) = c. Then g ≤ p on Y, hence g extends to a linear functional f
on X with f ≤ p (8.5.1). Since p < 1 on U , −ε < f < ε on the open set −εU ∩ εU , hence f
is continuous at zero and therefore everywhere. If x ∈ A and y ∈ B, then x − y + z0 ∈ U ,
hence
f (x) − f (y) = f (x − y + z0 ) − 1 ≤ p(x − y + z0 ) − 1 < 0
and so f (x) < f (y). Since convex sets are connected, f (A) and f (B) are disjoint intervals
in R, hence f (A) lies to the left of f (B). Moreover, since A is open and f is nontrivial, f (A)
is open. Therefore, we may take t in (9.6) to be the right endpoint of f (A).
For the case K = C, apply the first part to X as a real linear space to obtain a real
linear functional fr that satisfies fr (x) < t ≤ fr (y) for all x ∈ A and y ∈ B. Then
f (x) := fr (x) − ifr (ix) defines a complex linear functional satisfying (9.6).
Proof. Suppose A is compact. Let U0 be a neighborhood base at zero of open convex sets.
We claim that there exists U ∈ U0 such that (U + A) ∩ B = ∅. Assuming this and noting
that C := U + A is open and convex, we may choose by 9.3.1 f ∈ X′ and t ∈ R such that
proving (9.7).
To verify the claim, for each $x \in A \subseteq B^c$ choose Vx ∈ U0 such that $x + V_x \subseteq B^c$.
Next, choose Ux ∈ U0 so that Ux + Ux ⊆ Vx . This is possible by continuity of addition at
(0, 0). Then the sets x + Ux + Ux and B are disjoint. Moreover, by compactness, there exist
x1 , . . . , xn ∈ A such that $A \subseteq \bigcup_{j=1}^{n} (x_j + U_{x_j})$. Setting $U = \bigcap_{j=1}^{n} U_{x_j}$, we have
$$A + U \subseteq \bigcup_{j=1}^{n} (x_j + U_{x_j} + U) \subseteq \bigcup_{j=1}^{n} (x_j + U_{x_j} + U_{x_j}) \subseteq B^c,$$
verifying the claim and completing the proof for case A compact.
If B is compact, then reversing the roles of A and B yields
For x ∈ A, write $|f(x)| = e^{i\theta} f(x) = f(e^{i\theta}x) = \operatorname{Re} f(e^{i\theta}x)$. Since $e^{i\theta}x \in A$ we have
sup{|f (x)| : x ∈ A} < t < Re f (y) ≤ |f (y)| for all y ∈ B, verifying (9.8).
But because Y is a linear space, Re g(Y) cannot be bounded above unless Re g(Y) = {0},
which then implies that g(x0 ) ≠ 0. Since Im g(y) = −Re g(iy), we have g(Y) = {0}. Now
take f = g/g(x0 ).
Therefore, f extends g.
9.3.6 Corollary. A finite dimensional subspace Y of a LCS X is closed.
Proof. Let y1 , . . . , yd be a basis for Y. Then $y = \sum_{j=1}^{d} g_j(y) y_j$ (y ∈ Y), where gj is a
linear functional on Y. By 9.2.6, gj is continuous and so has a continuous extension fj ∈ X′.
Therefore, if yα ∈ Y and yα → x ∈ X we have
$$x = \lim_\alpha \sum_j f_j(y_\alpha) y_j = \sum_j f_j(x) y_j \in Y.$$
9.3.7 Corollary. A LCS X is finite dimensional iff it has a compact neighborhood of zero.
Proof. If X is finite dimensional, then X is topologically isomorphic to $\mathbb{K}^d$ (9.2.5), proving
the necessity.
For the sufficiency, let V be a neighborhood of zero in X with compact closure. Then
there exists a finite subset F of X such that
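$$\operatorname{cl} V \subseteq \bigcup_{x \in F} \big(x + \tfrac{1}{2} V\big) = F + \tfrac{1}{2} V.$$
hence
$$V \subseteq Y + \tfrac{1}{2} V \subseteq Y + Y + \tfrac{1}{2^{n+1}} V = Y + \tfrac{1}{2^{n+1}} V,$$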
$$A^0 = \{f \in X' : |f(x)| \le 1\ \forall\, x \in A\} \quad\text{and}\quad {}^0B = \{x \in X : |f(x)| \le 1\ \forall\, f \in B\}.$$
Proof. Let C denote the closed convex balanced hull of A. Since $A \subseteq {}^0\!A^0$ and ${}^0\!A^0$ is
closed, convex, and balanced, $C \subseteq {}^0\!A^0$. For the reverse inclusion, let $y \in C^c$ and choose
f ∈ X′ and t ∈ R so that sup{|f (x)| : x ∈ C} < t < |f (y)| (9.3.3). Set g := f /t. Then
sup{|g(x)| : x ∈ A} < 1 < |g(y)|, hence $y \notin {}^0\!A^0$.
Exercises
9.16 A half-space in $\mathbb{R}^d$ is a set of the form $\{x \in \mathbb{R}^d : a_1 x_1 + \cdots + a_d x_d \le a\}$. Show that a closed,
convex subset C of $\mathbb{R}^d$ is the intersection of all half-spaces that contain it.
9.17 Let X be a LCS, A ⊆ X, and B ⊆ X 0 . The annihilators of A and B are defined as for normed
spaces by
$$A^{\perp} = \{f \in X' : \langle x, f\rangle = 0\ \forall\, x \in A\} \quad\text{and}\quad {}^{\perp}B = \{x \in X : \langle x, f\rangle = 0\ \forall\, f \in B\}.$$
9.19 Show that if A and B are open in 9.3.1, then there exists t ∈ R such that Re f (x) < t < Re f (y)
for all x ∈ A and y ∈ B.
Quotient Spaces
The following results generalize theorems in §8.4 on quotients of normed spaces.
9.4.1 Theorem. Let X be a TVS, Y a closed subspace of X, and X/Y the algebraic
quotient space with quotient map Q : X → X/Y. Then X/Y is a TVS in the quotient
topology and Q is an open map. Moreover, if X is locally convex (Fréchet), then X/Y is
locally convex (Fréchet).
Proof. Recall that the quotient topology on X/Y is the strongest topology relative to which
Q is continuous; equivalently, W is open in X/Y iff Q⁻¹(W ) is open in X. Now, if U is
open in X, then
$$Q^{-1}(Q(U)) = U + Y = \bigcup_{y \in Y} (U + y),$$
Proof. Let P denote the family of all seminorms p on X with the property that the restriction
$p|_{X_n}$ is a continuous seminorm on Xn . The identically zero seminorm obviously has this
property, hence P is nonempty. Let τ denote the locally convex topology on X generated by
P and let $\tau'_n$ denote the relative topology on Xn induced by τ . By Ex. 9.8, $\tau'_n$ is generated
by the collection $P|_{X_n}$. Since, by definition, the seminorms in $P|_{X_n}$ are $\tau_n$-continuous,
$\tau'_n \le \tau_n$ (Ex. 9.11). To show that $\tau_n \le \tau'_n$, it suffices to show that every $\tau_n$-continuous
seminorm pn may be extended to a τ -continuous seminorm p on X. Indeed, it will then
follow that pn is continuous in the relative topology, implying the inequality. To construct
the extension, we use 9.4.5. By induction, for each m ≥ n there exists a $\tau_{m+1}$-continuous
seminorm $p_{m+1}$ on $X_{m+1}$ such that $p_{m+1}|_{X_m} = p_m$. Define p on $X = \bigcup_{m \ge n} X_m$ so that
p = pm on each Xm . Then p is a well-defined seminorm on X and by construction p ∈ P.
Now let σ be a locally convex topology with the property that the relative topology on Xn
induced by σ is $\tau_n$. If q is a σ-continuous seminorm on X, then U := {x ∈ X : q(x) < 1}
is σ-open, hence U ∩ Xn is $\tau_n$-open, which implies that $q|_{X_n}$ is $\tau_n$-continuous (Ex. 9.10).
Thus q ∈ P, hence σ ≤ τ .
It remains to verify (d). Assume that Xn is Hausdorff for all n. Let x ∈ X and x ≠ 0.
Then x ∈ Xn for some n, hence there exists a continuous seminorm pn with pn (x) ≠ 0.
By the preceding, pn extends to a τ -continuous seminorm p on X. Since p(x) ≠ 0, τ is
Hausdorff. The converse is similar.
The space X with the topology τ is called the inductive limit of the system (Xn , τ n ).
9.4.7 Corollary. Let each Xn be a Fréchet space. Then the inductive limit topology τ has
the following properties:
(a) A sequence (xn ) τ -converges to x in X iff there exists a k such that (xn ) ⊆ Xk and
xn → x in the topology τ k .
(b) If T is a linear mapping from X to a LCS Y, then T is τ -continuous iff for each k the
restriction of T to Xk is τ k -continuous. In particular, τ -continuity and τ -sequential
continuity of linear maps on X are equivalent.
Proof. (a) The sufficiency is clear. For the necessity, we may take x = 0. Suppose, for a
contradiction, that the necessity is false. Thus for each k, xn ∉ Xk for infinitely many n. Set
Y1 = X1 and choose $x_{n_1} \notin Y_1$. Next, choose j > 1 such that $x_{n_1} \in X_j$ and set Y2 = Xj .
Continuing in this manner, we obtain a subsequence (yk := xnk ) of (xn ) and a subsequence
(Yk ) of (Xn ) such that Yk ↑ X and yk ∈ Yk+1 \ Yk . It is easy to see that the inductive limit
of (Yk ) is the same as that of (Xn ) (Ex. 9.21). Now let p1 be a continuous seminorm on Y1
such that p1 (y1 ) = 1. By the construction in the proof of 9.4.6, there exists a continuous
seminorm p on X that extends p1 such that p|Yk is a continuous seminorm on Yk for each
k. Incorporating the second assertion of 9.4.5 into this construction shows that p may be
chosen so that p(yk ) ≥ 1 for all k. Then (yk ) cannot converge to zero in X.
(b) The necessity is clear. For the sufficiency, let q be any continuous seminorm on Y.
Then p := q ◦ T is a seminorm on X. Since T |Xn is continuous, p|Xn is continuous, so p is
continuous on X. Therefore, T is continuous (Ex. 9.14).
Exercises
9.21 Let (Xn , τ n ) be a strict inductive system for X and let (nk ) be a strictly increasing sequence
of positive integers. Set Yk = Xnk and σk = τ nk . Show that the inductive limit of (Yk , σk ) is
the same as that of (Xn , τ n ).
9.22 Show that X is not a Fréchet space. ⟦Assume the contrary. Choose $x_n \in X_{n+1} \setminus X_n$ and εn > 0
so that d(εn xn , 0) < 1/n and apply (a) of 9.4.7.⟧
Chapter 10
Weak Topologies on Normed Spaces
In this chapter we consider two important locally convex topologies: the weak topology on a
normed space X and the weak∗ topology on its dual X′. The chapter relies on some of the
material developed in Sections 9.1–9.3.
xα + yα → x + y and cα xα → cx.
It follows that w is a vector topology. By 0.6.4, a neighborhood base at zero is given by the
open, convex, balanced sets
$$U(f_1, \dots, f_k; \varepsilon) := \{y : |f_j(y)| < \varepsilon,\ j = 1, \dots, k\}, \qquad f_j \in X',\ \varepsilon > 0. \qquad (10.1)$$
Thus $X_w$ is a LCS with generating seminorms pf (x) = |f (x)|. (The separating property is
a consequence of 9.3.4.)
By definition of initial topologies, every member of $X'_\tau$ is w-continuous, and since w ≤ τ ,
every member of $X'_w$ is τ -continuous. Thus
$$X'_w = X'_\tau \quad\text{and}\quad (X_w)_w = X_w.$$
For the remainder of the chapter, we shall be mainly concerned with the weak topology on
normed spaces rather than on general LCS. (We return to the general case in later chapters.)
For ease and uniformity of notation, we frequently denote the norm topology on X by s (for
strong topology). The following result shows that for infinite dimensional normed spaces it is
always the case that w < s.
10.1.1 Proposition. If X is a normed space, then w = s iff X is finite dimensional.
Proof. Assume w = s. Then U := {x : ‖x‖ < 1} is w-open and hence contains a neighborhood
of 0 of the form U0 := U (f1 , . . . , fn ; ε), as in (10.1). We then have $\bigcap_{j=1}^{n} \ker f_j \subseteq U_0 \subseteq U$,
and since U is norm bounded, $\bigcap_{j=1}^{n} \ker f_j = \{0\}$. The linear map x ↦ (f1 (x), . . . , fn (x))
from X to $\mathbb{K}^n$ is therefore 1-1 and so X is finite dimensional.
Conversely, assume that X is finite dimensional. We may then identify X with Euclidean
space $\mathbb{K}^d$ for some d. Since the open ball Bn with center 0 and radius n has compact closure,
the weak and norm topologies agree on Bn (0.8.5). Thus if U is norm open, then U ∩ Bn is
open in the weak topology for every n and so $U = \bigcup_n (U \cap B_n)$ is weakly open.
By the Vitali-Hahn-Saks theorem (5.2.4), ν(E) := limn νn (E) defines a complex measure on
F. Moreover, ν ≪ µ, hence dν = f dµ for some f ∈ L1 (µ). Thus
$$\lim_n \int f_n 1_E\,d\mu = \nu(E) = \int f 1_E\,d\mu, \qquad E \in \mathcal{F}.$$
Taking D in 10.1.2 to be the collection of measurable indicator functions we see that (fn )
converges weakly to f in L1 .
Proposition 10.1.1, together with Corollary 0.5.6, imply that in every infinite dimensional
normed space there are nets that converge weakly but not strongly. The same assertion
cannot be made for sequences:
10.1.4 Theorem (Schur). A weakly convergent sequence in ℓ1 (N) converges in norm. Thus
the notions of weak and norm sequential convergence in ℓ1 (N) coincide.
Proof. Suppose the assertion is false. Then there exists a sequence (xn ) in ℓ1 and ε > 0 such
that $x_n \xrightarrow{w} 0$ and ‖xn‖ ≥ 5ε for all n. We construct a subsequence $(x_{n_k})$ and a member y of
the dual space ℓ∞ such that $|\langle x_{n_k}, y\rangle| \ge \varepsilon$ for all k, producing the desired contradiction.
Since $x_n \xrightarrow{w} 0$, xn (j) → 0 for all j. Set m0 = n0 = 1. Let n1 be an integer > n0
such that $\sum_{j=1}^{m_0} |x_{n_1}(j)| = |x_{n_1}(m_0)| < \varepsilon$, and let m1 be an integer > m0 such that
$\sum_{j=m_1+1}^{\infty} |x_{n_1}(j)| < \varepsilon$. Next, let n2 be an integer > n1 such that $\sum_{j=1}^{m_1} |x_{n_2}(j)| < \varepsilon$, and let
m2 be an integer > m1 such that $\sum_{j=m_2+1}^{\infty} |x_{n_2}(j)| < \varepsilon$. In this way we construct strictly
increasing sequences (mk ) and (nk ) such that
$$\sum_{j=1}^{m_{k-1}} |x_{n_k}(j)| < \varepsilon \quad\text{and}\quad \sum_{j=m_k+1}^{\infty} |x_{n_k}(j)| < \varepsilon \quad \forall\, k. \qquad (†)$$
Now define y ∈ ℓ∞ by $y(j) = \operatorname{sgn} x_{n_k}(j)$ ($m_{k-1} < j \le m_k$, k ∈ N). Fix k and set
as required.
Combining the last theorem with 10.1.3, we obtain
10.1.5 Corollary. A bounded sequence (xn ) in ℓ1 (N) converges in norm to x iff xn (j) →
x(j) for each j.
Note that Theorem 10.1.4 does not hold in ℓp for 1 < p < ∞ (see Ex. 10.1).
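A quick numerical illustration of the contrast in ℓ2: the unit vectors en have norm one, yet their pairings with a fixed element y of ℓ2 are the coordinates y(n), which tend to zero. The sketch below works with truncated vectors and tests against a single y = (1/k) chosen for illustration, so it indicates the behavior without proving weak convergence; the names `unit_vector` and `pairing` are hypothetical.

```python
import math

def unit_vector(n, length):
    """e_n truncated to `length` coordinates (1 in position n, counted from 1)."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def pairing(x, y):
    """The pairing <x, y> = sum_k x(k) y(k) for real sequences of equal length."""
    return sum(a * b for a, b in zip(x, y))

LENGTH = 1000
y = [1.0 / k for k in range(1, LENGTH + 1)]  # a fixed element of l^2, truncated
vectors = [unit_vector(n, LENGTH) for n in (1, 10, 100)]
pairings = [pairing(e, y) for e in vectors]
norms = [math.sqrt(pairing(e, e)) for e in vectors]
print(pairings)  # [1.0, 0.1, 0.01]
print(norms)     # [1.0, 1.0, 1.0]
```

In ℓ1 the same vectors paired with y = (1, 1, . . .) ∈ ℓ∞ give the constant value 1, so they do not converge weakly to zero there, consistent with Schur's theorem.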
$$x = w\text{-}\sum_{j=1}^{\infty} c_j(x) e_j,$$
where the cj are linear functionals satisfying cj (ei ) = δij . We show that if X is a Banach
space, then a weak basis is a Schauder basis.
Let $X^\infty$ denote the linear space of all functions f = (f (1), f (2), . . .) : N → X such that
‖f‖∞ := supn ‖f (n)‖ < ∞. The space $X^\infty$ is easily seen to be a Banach space under this
norm. Define a linear map $T : X \to X^\infty$ by
$$Tx = (S_1 x, S_2 x, \dots), \quad\text{where}\quad S_m x := \sum_{j=1}^{m} c_j(x) e_j \xrightarrow{w} x.$$
Note that ‖T x‖∞ = supn ‖Sn x‖. We use the closed graph theorem to show that T is
continuous. Let xn → x in X and T xn → f in $X^\infty$. In particular, we have the coordinate-wise
convergence
$$\lim_n S_m x_n = f(m) \quad\text{for each } m. \qquad (†)$$
We claim that
$$f(m) = \sum_{j=1}^{m} \alpha_j e_j \quad\text{for some } \alpha_j \in \mathbb{C}. \qquad (‡)$$
$$f(m + 1) = \lim_n S_{m+1} x_n = \lim_n \big[S_m x_n + c_{m+1}(x_n) e_{m+1}\big] = \sum_{j=1}^{m+1} \alpha_j e_j$$
for some $\alpha_{m+1}$. Therefore, the claim holds by induction. Now let x′ ∈ X′. From (†),
‖Sm x − x‖ ≤ ‖Sm (x − yn )‖ + ‖x − yn‖ ≤ ‖T‖ ‖x − yn‖ + ‖x − yn‖ ≤ ε.
Exercises
10.1 Find a sequence (xn ) in c0 that converges weakly but not strongly to zero. Do the same for `p ,
1 < p < ∞.
10.2 (von Neumann). Let 1 < p < ∞. For each pair m, n ∈ N with 1 ≤ m < n, define xm,n ∈ `p
m n
by xm,n := (0, . . . 0, 1 , 0 . . . , 0, m, 0, . . .). Let A be the set of all xm,n . Show that zero is in the
weak closure of A in `p (N), but no sequence in A converges in norm to zero.
10.3 Show that the sequence of functions xn (t) = tn in C[0, 1], k·k∞ converges weakly but not
strongly.
10.4 Show that $x_n \xrightarrow{\,w\,} x$ in $c$ iff the following hold:
(a) $\sup_n \|x_n\| < \infty$, (b) $x_n(j) \to x(j)$ $\forall\, j$, and (c) $\lim_n \lim_j x_n(j) = \lim_j x(j)$.
10.5 Let $(X, \mathcal{F}, \mu)$ be a σ-finite measure space and $1 < p < \infty$. Show that a sequence $(f_n)$ in $L^p$
converges weakly to $f \in L^p$ iff the following hold:
(a) $\sup_n \|f_n\|_p < \infty$,
(b) $\int_E f_n\,d\mu \to \int_E f\,d\mu$ $\forall\, E \in \mathcal{F}$ with $\mu(E) < \infty$.
10.6 Let $X$ be a locally compact Hausdorff space. Show that $f_n \xrightarrow{\,w\,} f$ in $C_0(X)$ iff $\sup_n \|f_n\|_\infty < \infty$
and $f_n \to f$ pointwise on $X$.
10.7 Let X and Y be Banach spaces and T : X → Y linear. Show that T is norm continuous iff T is
weak-weak continuous.
10.8 Let X be a normed space. Show that if C is weakly compact, then {cx : x ∈ C, |c| ≤ r} is
weakly compact.
10.9 Let X be a Banach space. Prove the following:
(a) If X is infinite dimensional, then every weak neighborhood U of zero is unbounded.
(b) The weak topology of a normed space $X$ is metrizable iff $X$ is finite dimensional. ⟦Consider
$\{x \in X : d(x, 0) < 1/n\}$ and use the uniform boundedness principle.⟧
10.10 A sequence $(x_n)$ in a normed space $X$ is said to be weakly Cauchy if $\langle x_n, x' \rangle$ is Cauchy
in $\mathbb{K}$ for all $x' \in X'$. The space $X$ is weakly sequentially complete if every weakly Cauchy
sequence $(x_n)$ in $X$ converges weakly to a member of $X$. Prove:
(a) A weakly Cauchy sequence is norm bounded.
(b) $\ell^1$ is weakly sequentially complete.
(c) $c_0$ is not weakly sequentially complete.
(d) $C[0, 1]$ with the uniform norm is not weakly sequentially complete.
10.11 Let $X$ be compact and $(f_n)$ a bounded sequence in $C(X)$ that converges pointwise to $f \in C(X)$.
Show that there exists a sequence of convex combinations of members of $(f_n)$ that converges in
the uniform norm to $f$.
10.12 Prove that in an infinite dimensional normed space the weak closure of $S_1$ is $C_1$. ⟦Suppose
there exists $x_0 \in C_1 \setminus \mathrm{cl}_w S_1$. Choose an open, convex, weak neighborhood $U$ of zero such that
$V := U + x_0$ does not meet $S_1$.⟧
10.13 Prove the following result on compact convergence of bounded nets: A bounded net $(x_\alpha)$
converges weakly to $x_0$ in a normed space $X$ iff $\langle x_\alpha, f \rangle \to \langle x_0, f \rangle$ uniformly in $f$ on compact
subsets of $X'$.
⟦For each norm compact $K \subseteq X'$ and $\varepsilon > 0$, define
\[ U(K; \varepsilon) := \{ x \in X : \sup_{f \in K} |\langle x, f \rangle| < \varepsilon \}. \]
If $B$ is a bounded subset of $X$, then, for each $x_0 \in B$, the sets $(x_0 + U(K; \varepsilon)) \cap B$ form a
neighborhood base of $x_0$ in the relative weak topology of $B$.⟧
It follows that $w^*$ is a vector topology. By 0.6.4, a neighborhood base at zero is given by the
open, convex, balanced sets
\[ U(x_1, \ldots, x_k; \varepsilon) := \{ f \in X' : |f(x_j)| < \varepsilon,\ j = 1, \ldots, k \}. \qquad (10.2) \]
Proof. By definition of the $w^*$-topology, there exist $x_j \in X$ and $\varepsilon > 0$ such that $|\varphi(f)| < 1$
for all $f \in U := U(x_1, \ldots, x_k; \varepsilon)$. In particular, if $f(x_j) = 0$ for all $j$ then $nf \in U$ for all
$n \in \mathbb{N}$, hence $\varphi(f) = 0$. Thus $\bigcap_j \ker \hat{x}_j \subseteq \ker \varphi$, which implies that $\varphi$ is a linear combination
of the $\hat{x}_j$, say $\varphi = \sum_{j=1}^{k} c_j \hat{x}_j$ (0.2.3). Therefore, $\varphi = \hat{x}$, where $x = \sum_{j=1}^{k} c_j x_j$.
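But the left side is unbounded unless $f = 0$, in which case $x''(f) = 0$, impossible.
10.2.3 Corollary. Let $A \subseteq X$ be finite. Then $A^{00} \subseteq \hat{X}$.
Proof. Let $\psi \in A^{00}$. We show that $\psi$ is weak∗ continuous on $X'$. Let $f_\alpha \xrightarrow{\,w^*\,} 0$ in $X'$ and
$\varepsilon > 0$. Then $\varepsilon^{-1} f_\alpha(x) = f_\alpha(\varepsilon^{-1} x) \to 0$ for each $x \in X$, and since $A$ is finite, there exists
$\alpha_0$ such that
\[ \sup\{\varepsilon^{-1} |f_\alpha(x)| : x \in A\} \le 1 \quad\text{for all } \alpha \ge \alpha_0. \]
For such $\alpha$, $\varepsilon^{-1} f_\alpha \in A^0$, hence $|\psi(f_\alpha)| \le \varepsilon$. Therefore, $\psi(f_\alpha) \to 0$.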
defines a metric on $C_1'$ such that $d(f_\alpha, f) \to 0$ iff $f_\alpha \xrightarrow{\,w^*\,} f$. Therefore, the metric and
$w^*$-topologies agree on $C_1'$, and the conclusion follows from the theorem.
∗
Application: Means on Function Spaces
Let S be a set and F a norm-closed, conjugate-closed, linear subspace of B(S) that
contains the constant functions. A mean on F is a linear functional m on F such that
Weak∗ Continuity
Here is a significant extension of 10.2.1 for Banach spaces:
10.2.9 Theorem. Let $X$ be a Banach space. If $\varphi$ is a linear functional on $X'$ whose
restriction to the closed unit ball $C_1'$ is $w^*$-continuous, then $\varphi = \hat{x}$ for some $x \in X$.
Proof. Fix n ∈ N. By hypothesis, the set
\[ n\varphi \in V^0 + C_1'', \quad n \in \mathbb{N}, \qquad (\alpha) \]
where $C_1''$ is the closed unit ball in $X''$. To see this, note first that $V^0 + C_1''$ is convex and
balanced, and since $V^0$ is weak∗ closed and $C_1''$ is weak∗ compact, $V^0 + C_1''$ is weak∗ closed
in $X''$ (Ex. 9.1). If $n\varphi \notin V^0 + C_1''$, then by 9.3.3 and 10.2.1 there exists $f \in X'$ such that
inequalities show that $f \in C_1' \cap {}^0(V^0) = C_1' \cap V \subseteq U$. But then $|\varphi(nf)| < 1$, contradicting (β).
Therefore, (α) holds.
By 10.2.3, $V^0 = A^{00} \subseteq \hat{X}$, hence from (α) there exist $\hat{x}_n \in V^0$ and $\phi_n \in C_1''$ such that
$n\varphi = \hat{x}_n + \phi_n$ ($n \in \mathbb{N}$). It follows that
hence $(x_n/n)$ is a Cauchy sequence. Setting $x := \lim_n x_n/n$, we have $\varphi = \hat{x}$ by (γ).
10.2.10 Corollary. Let $X$ and $Y$ be Banach spaces and $T : X' \to Y$ linear. If $T$ restricted
to the closed unit ball $C_1'$ of $X'$ is weak∗-weak continuous, then $T$ is weak∗-weak continuous
on $X'$.
Proof. For each $y' \in Y'$, the map $x' \mapsto \langle Tx', y' \rangle$ is $w^*$-continuous on $C_1'$, hence there
exists $x$ depending on $y'$ such that $\langle Tx', y' \rangle = \langle x, x' \rangle$ for all $x' \in X'$. Thus $\langle Tx', y' \rangle$ is
weak∗ continuous in $x'$.
∗
The Closed Range Theorem
Let $X$ and $Y$ be normed spaces and $T \in B(X, Y)$. Recall that
\[ (\mathrm{ran}\, T)^\perp = \ker T' \quad\text{and}\quad {}^\perp[\mathrm{ran}\, T'] = \ker T. \]
(a) ran T is norm closed. (b) ran T 0 is w∗ closed. (c) ran T 0 is norm closed.
Proof. (a) ⇒ (b): By the preceding, it suffices to show that $[\ker T]^\perp \subseteq \mathrm{ran}\, T'$. To this end
let $x' \in [\ker T]^\perp$ and define $g$ on $\mathrm{ran}\, T$ by $g(Tx) = \langle x, x' \rangle$. Then $g$ is well-defined and
linear. We claim that $g$ is continuous. For the verification, we use Ex. 8.76, which asserts
that for some $c > 0$ and each $y \in \mathrm{ran}\, T$, the inequality $\|x\| \le c\,\|y\|$ holds for some $x$ with
$Tx = y$. Let $y_n \in \mathrm{ran}\, T$ such that $y_n \to 0$, and choose $x_n$ with $\|x_n\| \le c\,\|y_n\|$ such that $Tx_n = y_n$.
Then $x_n \to 0$, hence $g(y_n) = \langle x_n, x' \rangle \to 0$, establishing the claim. By the Hahn-Banach
theorem there exists $y' \in Y'$ that extends $g$, that is, $\langle Tx, y' \rangle = \langle x, x' \rangle$ for all $x$. It follows
that $T'y' = x'$, hence $x' \in \mathrm{ran}\, T'$.
(c) ⇒ (a): Let $S : X \to Z := \mathrm{cl}\, \mathrm{ran}\, T$ be the mapping $T$ but with the indicated new
codomain. Let $I : Z \hookrightarrow Y$ denote the inclusion map, so that $T = IS$ and the dual map
$I' : Y' \to Z'$ is the restriction mapping. By the Hahn-Banach theorem, $I'$ is surjective.
It follows that $\mathrm{ran}\, S' = S'(Z') = S'(I'(Y')) = T'(Y') = \mathrm{ran}\, T'$, hence $\mathrm{ran}\, S'$ is closed.
Moreover, if $z' \in Z'$ and $S'z' = 0$ then $z' = 0$ on $\mathrm{ran}\, S = \mathrm{ran}\, T$, hence $z' = 0$ on $Z$.
Therefore, $S'$ is 1-1 and so $S' : Z' \to \mathrm{ran}\, S'$ is invertible. Thus there exists $\varepsilon > 0$ such that
$\|S'z'\| \ge \varepsilon\,\|z'\|$ for all $z' \in Z'$. We claim that in the space $Z$, $B_\varepsilon \subseteq \mathrm{cl}\, S(B_1)$ $(= \mathrm{cl}\, S(C_1))$;
it will follow from 8.7.4 that $S$ is surjective, hence $\mathrm{ran}\, T = S(X) = Z = \mathrm{cl}\, \mathrm{ran}\, T$, completing
the proof. To verify the claim, let $z \in Z \setminus \mathrm{cl}\, S(C_1)$. By 9.3.3, there exists $z' \in Z'$ with
norm one such that
\[ \sup\{|\langle Sx, z' \rangle| : \|x\| \le 1\} < |\langle z, z' \rangle|. \]
The right side is $\le \|z\|$ and the left side equals $\sup\{|\langle x, S'z' \rangle| : \|x\| \le 1\} = \|S'z'\| \ge \varepsilon\,\|z'\| = \varepsilon$. Therefore,
$\|z\| > \varepsilon$, as required.
10.2.12 Corollary. $T$ is surjective iff $\mathrm{ran}\, T'$ is closed and $T'$ is injective. In this case, $T'$
has a continuous inverse $(T')^{-1} : \mathrm{ran}\, T' \to Y'$.
Proof. (Necessity). By the theorem, $\mathrm{ran}\, T'$ is closed. Moreover, $\ker T' = (\mathrm{ran}\, T)^\perp = \{0\}$,
hence $T'$ is injective. By the open mapping theorem, $(T')^{-1} : \mathrm{ran}\, T' \to Y'$ is continuous.
(Sufficiency). By the theorem, $\mathrm{ran}\, T$ is closed. Thus $\mathrm{ran}\, T = {}^\perp[\ker T'] = {}^\perp\{0\} = Y$.
10.2.13 Corollary. $T'$ is surjective iff $\mathrm{ran}\, T$ is closed and $T$ is injective. In this case, $T$
has a continuous inverse $T^{-1} : \mathrm{ran}\, T \to X$.
Proof. (Necessity). By the theorem, $\mathrm{ran}\, T$ is closed. Moreover, $\ker T = {}^\perp[\mathrm{ran}\, T'] = \{0\}$.
Therefore, $T$ is injective and so has a continuous inverse $T^{-1} : \mathrm{ran}\, T \to X$.
(Sufficiency). By the theorem, $\mathrm{ran}\, T'$ is $w^*$-closed. Thus $\mathrm{ran}\, T' = [\ker T]^\perp = \{0\}^\perp = X'$.
Exercises
10.14 [↑ 9.6] Show that the set $E := \{n e_n : n \in \mathbb{N}\}$ is bounded in the weak∗ topology of $\ell^1(\mathbb{N}) = c_{00}'$
but is not norm bounded.
10.15 Find an example of a Banach space $X$ for which the unit sphere in $X'$ is not weak∗ compact.
10.19 [↑ 10.9] Show that the weak∗ topology of $X'$ is metrizable iff $X$ is finite dimensional.
10.20 Let $X$ and $Y$ be Banach spaces with $Y$ separable and let $T \in B(X, Y)$. Prove that $T'$ is
compact iff $T'$ carries weak∗ convergent sequences $(y_n')$ in $Y'$ onto norm convergent sequences
$(T'y_n')$ in $X'$.
10.21 Let $X$ be a normed space and $E \subseteq X'$. Show that $E$ is weak∗ dense in $X'$ iff for every $x \ne 0$
there exists $x' \in E$ such that $\langle x, x' \rangle \ne 0$.
10.23 Show that the dual space $X'$ of a Banach space $X$ is $w^*$-sequentially complete, that is, if $\langle x, x_n' \rangle$
is Cauchy in $\mathbb{K}$ for all $x \in X$, then $(x_n')$ in $X'$ weak∗ converges to a member of $X'$. Give an
example to show that the assertion is generally false if $X$ is not complete.
10.24 Let $X$ and $Y$ be Banach spaces. Prove that a linear map $T : Y' \to X'$ is $w^*$-$w^*$ continuous iff
$T = S'$ for some $S \in B(X, Y)$. Thus $w^*$-$w^*$ continuity implies s-s continuity.
10.25 [↑ §7.4] Prove the analog of 10.1.2 for weak∗ sequential convergence. Conclude the following:
Let $X$ be compact and Hausdorff and let $(\mu_n)$ be a sequence in $M(X)$. Then $(\mu_n)$ converges in
the weak∗ topology iff $\sup_n \|\mu_n\| < \infty$ and $\lim_n \mu_n(E)$ exists for every $E \in \mathcal{F}$.
10.26 Let $X$ be a locally compact Hausdorff space and $\mu$ a Radon measure on $X$. Prove that $C_0(X)$
is weak∗ dense in $L^\infty(\mu)$.
10.28 Show that F := {f dλ : f ∈ L1 [a, b]} is a norm-closed, non-weak∗ closed subspace of M [a, b].
10.29 Let $X$ be a locally compact Hausdorff space, $\{f_i : i \in I\} \subseteq C_0(X)$, and $\{c_i : i \in I\} \subseteq \mathbb{C}$. Suppose
for each finite set $F \subseteq I$ there exists $\mu_F \in M_{ra}(X)$ with $\|\mu_F\| \le 1$ such that $\int f_i\,d\mu_F = c_i$ for
all $i \in F$. Prove that there exists $\mu \in M_{ra}(X)$ with $\|\mu\| \le 1$ such that $\int f_i\,d\mu = c_i$ for all $i \in I$.
Formulate more generally.
10.30 Let $S$ be a nonempty set and $F$ a conjugate closed, norm closed subspace of $B(S)$. Show that
the convex balanced hull of $\{\delta_s : s \in S\}$ is weak∗-dense in the closed unit ball $C_1'$ of $F'$.
10.31 Let $X$ be a compact Hausdorff space and $P(X)$ the space of probability measures on $X$.
Identifying $P(X)$ with a subset of $C(X)'$, show that $P(X)$ is the $w^*$-closed convex hull of the
set $\delta_X$ of all Dirac measures on $X$.
that a reflexive space is a dual space and hence is complete. Moreover, by Alaoglu’s theorem,
the ball $C_1$ in a reflexive space $X$ is weakly compact.$^1$ Note that the property of reflexivity
is invariant under a change to an equivalent norm. This is a consequence of the fact that
dual spaces are defined topologically and hence remain the same under such a change.
Every finite dimensional space $X$ is reflexive since $\hat{X}$ and $X''$ have the same dimension.
The spaces $L^p$ ($1 < p < \infty$) are reflexive, as can be seen by identifying $(L^p)'$ with $L^q$ and
$(L^q)'$ with $L^p$, where $q$ is conjugate to $p$. The space $L^1$ is not reflexive unless it is finite
dimensional. This may be seen as a simple consequence of a later result on extreme points
that implies in the infinite dimensional case that $L^1$ is not a dual space (see 14.4.7(b)). The
spaces $c_0$ and $c$ are not reflexive, as their bidual is $\ell^\infty$ (§8.3). The spaces $C(X)$, $X$ compact,
and $L^\infty$ are not reflexive unless they are finite dimensional (Ex. 10.42, 10.43).
The next theorem shows that the property of reflexivity is either common to both $X$ and
$X'$ or to neither. The proof is a simple consequence of the following general result.
10.3.1 Lemma. In any normed space $X$, $X''' = \widehat{X'} \oplus \hat{X}^{\perp}$.
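\[ \langle Tf, \varphi \rangle = \langle f, T'\varphi \rangle = \langle f, \hat{x} \rangle = \langle x, f \rangle = \langle x, Tf \rangle = \langle Tf, \hat{x} \rangle. \]
Since $T$ is surjective (8.5.4), $\varphi = \hat{x}$.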
that
\[ \sup\{|\langle \hat{x}, \hat{f} \rangle| : \|x\| \le 1\} < |\langle \varphi, \hat{f} \rangle|. \]
But the left side is $\|f\|$ while the right side is $\le \|f\|$. Therefore, $K = C_1''$.
10.3.5 Theorem. Let $X$ be a normed space. Then $X$ is reflexive iff $C_1$ is weakly compact.
Proof. We have already noted the necessity. For the sufficiency, if $C_1$ is weakly compact,
then, by definition of the $w^*$-topology, $\hat{C}_1$ is weak∗-compact in $X''$. By the lemma, $\hat{C}_1 = C_1''$,
hence $\hat{X} = X''$.
10.3.6 Theorem. The closed unit ball $C_1$ in a reflexive Banach space $X$ is weakly sequentially compact.
Proof. Let $x_n \in X$ with $\|x_n\| \le 1$ for all $n$. The closed linear span $Y$ of $(x_n)$ is separable
and by 10.3.3 is also reflexive. Thus $\hat{Y} = Y''$ is separable and therefore so is $Y'$ (8.5.8). By
10.2.5, the closed unit ball in $Y''$ is weak∗ sequentially compact. By reflexivity, this is simply
the assertion that the closed unit ball in $Y$ is weakly sequentially compact. Thus $(x_n)$ has
a weakly convergent subsequence.
Exercises
10.32 Let X and Y be reflexive spaces. Show that T ∈ B(X, Y) is compact iff T (C1 ) is compact.
10.33 Let $X$ be a normed linear space. Show that the weak and weak∗ topologies on $X'$ are equal iff
$X$ is reflexive.
10.34 Let $X$ be a Banach space and $T : X \to X'' : x \to \hat{x}$ the canonical embedding. Show that $X$ is
reflexive iff the adjoint $T' : X''' \to X'$ is 1-1.
10.35 Let $Y$ be a subspace of a normed space $X$. Prove that $\hat{Y} \subseteq Y^{\perp\perp}$ and that equality holds iff
$Y$ is reflexive. ⟦If $Y$ is reflexive and $F \in Y^{\perp\perp}$, define $G$ on $Y'$ by $G(g) = F(\tilde{g})$, where $\tilde{g}$ is an
extension of $g$ to $X$ with $\|\tilde{g}\| = \|g\|$.⟧
10.36 Prove that $X$ is reflexive iff every norm closed subspace of $X'$ is weak∗ closed.
10.37 Show that the weak∗ analog of 10.1.6 holds in $X'$ iff $X$ is reflexive.
10.38 Let $X$ be reflexive and $A \subseteq X'$. Show that $({}^\perp A)^\perp$ is the norm closed linear span of $A$.
10.40 Use 10.3.6 to show that $\ell^1$ is not reflexive. Conclude that $c_0$ and $c$ are not reflexive.
10.41 Let $X$ be reflexive and $x' \in X'$. Prove that $\|x'\| = \langle x, x' \rangle$ for some $x$ with $\|x\| = 1$. (R.C.
James showed that every space with this property is reflexive.) Give an example of a Banach
space for which the assertion is false.
10.43 Show that $M_{ra}[0, 1]$, and therefore $C[0, 1]$, is not reflexive by using the following argument:
Consider first the version of $\ell^1$ that consists of real sequences and define $T : \ell^1 \to M_{ra}[0, 1]$ by
$Tx = \sum_n x_n \delta_{1/n}$. Show that $T$ is an isometry. Then consider the complex case.
Geometrically, this says that the midpoints of line segments in the closed unit ball with
lengths bounded away from zero are uniformly distant from the surface. For ease of reference
we let $P(x, y)$ denote the antecedent of the implication (10.4) and $Q(x, y)$ the consequent.
A normed space that satisfies the parallelogram law$^2$
\[ \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \]
is uniformly convex. Indeed, if $P(x, y)$ holds, then $\|x + y\|^2 \le 4 - \varepsilon^2$, hence $Q(x, y)$
holds for $\delta := 1 - \frac12\sqrt{4 - \varepsilon^2}$. In particular, $L^2(X, \mathcal{F}, \mu)$ is uniformly convex. More generally,
Clarkson has shown that $L^p(X, \mathcal{F}, \mu)$ is uniformly convex for $1 < p < \infty$ [8]. (See Ex. 10.44
for the case $p \ge 2$.)
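As a numerical sanity check of the modulus $\delta = 1 - \frac12\sqrt{4 - \varepsilon^2}$, the sketch below samples pairs in the Euclidean unit ball of $\mathbb{R}^3$. It assumes that the implication (10.4) reads: $\|x\| \le 1$, $\|y\| \le 1$, $\|x - y\| \ge \varepsilon$ imply $\|\frac12(x+y)\| \le 1 - \delta$. The function names, the dimension, and the sampling scheme are illustrative choices, not part of the text.

```python
import math
import random

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(a * a for a in v))

def random_point_in_ball(dim, rng):
    """A sample from the closed Euclidean unit ball (not uniformly distributed)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    r = rng.random()
    n = norm(v)
    return [r * a / n for a in v]

def modulus(eps):
    """delta := 1 - (1/2) sqrt(4 - eps^2)."""
    return 1.0 - 0.5 * math.sqrt(4.0 - eps * eps)

def check(eps, dim=3, trials=20000, seed=0):
    """Largest midpoint norm seen among sampled pairs with ||x - y|| >= eps,
    together with the bound 1 - delta it should not exceed."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = random_point_in_ball(dim, rng)
        y = random_point_in_ball(dim, rng)
        if norm([a - b for a, b in zip(x, y)]) >= eps:
            mid = norm([(a + b) / 2 for a, b in zip(x, y)])
            worst = max(worst, mid)
    return worst, 1.0 - modulus(eps)

worst, bound = check(0.5)
print(worst, bound)   # the first number should not exceed the second
```

A sampling check of this kind can only fail to find a counterexample; the proof is the one-line computation with the parallelogram law given above.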
Here is a useful sequential characterization of uniform convexity:
10.4.1 Proposition. A normed linear space $X$ is uniformly convex iff for any sequences
$(x_n)$ and $(y_n)$ in $C_1$ with $\|\frac12(x_n + y_n)\| \to 1$ it follows that $\|x_n - y_n\| \to 0$.
2 Spaces that satisfy the parallelogram law are called inner product spaces. These are discussed in detail
Proof. Let $X$ be uniformly convex and let $(x_n)$, $(y_n)$ be sequences in $C_1$ such that $\|x_n - y_n\| \not\to 0$. Then
there exists $\varepsilon > 0$ such that $\|x_n - y_n\| \ge \varepsilon$ for infinitely many $n$. By uniform convexity,
there exists $\delta > 0$ such that $\|\frac12(x_n + y_n)\| \le 1 - \delta$ for infinitely many $n$. Therefore,
$\|\frac12(x_n + y_n)\| \not\to 1$.
Now suppose that $X$ is not uniformly convex. Then there exist an $\varepsilon \in (0, 2)$ and
sequences $(x_n)$ and $(y_n)$ with $\|x_n\| \le 1$, $\|y_n\| \le 1$, $\|x_n - y_n\| \ge \varepsilon$, and $\|\frac12(x_n + y_n)\| > 1 - 1/n$.
It follows that $\|\frac12(x_n + y_n)\| \to 1$. Since $\|x_n - y_n\| \not\to 0$, the sequential criterion fails.
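\[ \|x'' - \hat{x}\| \ge 2\varepsilon \quad\text{for all } x \in C_1. \qquad (\dagger) \]
Define
\[ \delta = \delta(\varepsilon) := \inf\bigl\{ 1 - \|\tfrac12(x + y)\| : x, y \in C_1,\ \|x - y\| \ge \varepsilon \bigr\}. \]
It follows easily from 10.4.1 that $\delta(\varepsilon) > 0$. Since $\|x''\| = 1$, we may choose $f \in C_1'$ such that
$|\langle f, x'' \rangle - 1| < \delta/2$. Thus $x''$ is in the weak∗ open set
\[ V := \{ y'' \in X'' : |\langle f, y'' \rangle - 1| < \delta/2 \}. \]
By 10.3.4, $x''$ is in the weak∗ closure of $V \cap \hat{C}_1$ (approximate $\langle f, x'' \rangle$ by $\langle f, \hat{x} \rangle$). Now, for
any $\hat{x}$ and $\hat{y}$ in $V \cap \hat{C}_1$,
\[ |\langle x + y, f \rangle| = |2 + (\langle f, \hat{x} \rangle - 1) + (\langle f, \hat{y} \rangle - 1)| \ge 2 - \delta, \]
hence $\|x + y\| \ge 2 - \delta$ and so $1 - \|\frac12(x + y)\| \le \frac12\delta < \delta$. From the definition of $\delta$,
$\|x - y\| < \varepsilon$. Thus $V \cap \hat{C}_1 \subseteq \hat{x} + \varepsilon C_1''$. Since $\hat{x} + \varepsilon C_1''$ is weak∗ closed and since $x''$ is in
the weak∗ closure of $V \cap \hat{C}_1$, we conclude that $x'' \in \hat{x} + \varepsilon C_1''$. But this contradicts (†).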
Exercises
10.44 Verify steps (a)–(d) below and then use (d) to show that $L^p$ is uniformly convex for $p \ge 2$.
(a) For $c > 0$, the function $f(p) = (1 + c^p)^{1/p}$ is strictly decreasing in $p$ on $[2, \infty)$.
(b) $(a^p + b^p)^{1/p} \le (a^2 + b^2)^{1/2}$ ($a \ge 0$, $b \ge 0$, $p \ge 2$).
(c) $c^2 + d^2 \le 2^{1-2/p}(c^p + d^p)^{2/p}$ ($c \ge 0$, $d \ge 0$). ⟦For $p > 2$, use Hölder’s inequality.⟧
(d) $|s + t|^p + |s - t|^p \le 2^{p-1}(|s|^p + |t|^p)$ ($s, t \in \mathbb{R}$, $p \ge 2$).
10.45 Let $X$ be a uniformly convex Banach space and $f \ne 0$ in $X'$. Show that there exists a unique
$x \in S_1$ such that $\|f\| = f(x)$. ⟦It suffices to consider the case $\|f\| = 1$. Let $x_n \in S_1$ and
$f(x_n) \to 1$. Use 10.4.4 to show that $(x_n)$ is Cauchy. (One may also use 10.41.) For uniqueness,
suppose also that $f(y) = 1$ for some $y \in S_1$ with $\|x - y\| > 0$ and consider $\frac12(x + y)$.⟧
Chapter 11
Hilbert Spaces
A Hilbert space is a Banach space whose norm is derived from an inner product. This
feature endows Hilbert spaces with rich geometric structure that accounts for the broad
applicability of the subject to areas such as harmonic analysis, differential equations, and
quantum mechanics. In this chapter we examine the structure of Hilbert spaces. The next
chapter treats operators on these spaces.
Since $\sum_{k=0}^{3} i^k = 0$ and $\sum_{k=0}^{3} (-1)^k = 0$, the desired formula follows.
11.1.2 Corollary. If $\mathbb{K} = \mathbb{C}$, then a sesquilinear form $B$ on $X$ is Hermitian iff $B(x, x)$ is
real for all $x$.
Proof. The necessity is clear. For the sufficiency, factor out $i^k$ to write the general term of
the sum in (b) as $i^k B(\bar{i}^{\,k} x + y, \bar{i}^{\,k} x + y)$. Taking conjugates in (b) and using the hypothesis
we have
\[ 4\,\overline{B(x, y)} = \sum_{k=0}^{3} \bar{i}^{\,k} B(\bar{i}^{\,k} x + y, \bar{i}^{\,k} x + y) = \sum_{k=0}^{3} i^k B(i^k x + y, i^k x + y) = 4B(y, x). \]
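The polarization identity behind this corollary can be checked numerically. The sketch below assumes the convention that a sesquilinear form is linear in its first argument and conjugate linear in its second, and that the identity has the form $4B(x, y) = \sum_{k=0}^{3} i^k B(x + i^k y, x + i^k y)$; the matrix representation of $B$ and all names are illustrative.

```python
import random

def B(A, x, y):
    """Sesquilinear form B(x, y) = sum_{j,k} A[j][k] x_j conj(y_k)."""
    return sum(A[j][k] * x[j] * y[k].conjugate()
               for j in range(len(x)) for k in range(len(y)))

def polarization(A, x, y):
    """(1/4) sum_{k=0}^{3} i^k B(x + i^k y, x + i^k y)."""
    total = 0j
    for k in range(4):
        w = 1j ** k
        z = [a + w * b for a, b in zip(x, y)]
        total += w * B(A, z, z)
    return total / 4

rng = random.Random(0)
def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

d = 3
A = [[rand_c() for _ in range(d)] for _ in range(d)]    # an arbitrary form
x = [rand_c() for _ in range(d)]
y = [rand_c() for _ in range(d)]
print(abs(polarization(A, x, y) - B(A, x, y)))          # close to 0

# A Hermitian coefficient matrix gives real values B(x, x).
H = [[(A[j][k] + A[k][j].conjugate()) / 2 for k in range(d)] for j in range(d)]
print(abs(B(H, x, x).imag))                             # close to 0
```

Note that the identity holds for every sesquilinear form, Hermitian or not, which is why the values $B(x, x)$ determine $B$ in the complex case.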
Semi-Inner-Product Spaces
For the remainder of the chapter, we use the notation $(\cdot \mid \cdot)$ for positive Hermitian
sesquilinear forms. A vector space $X$ over $\mathbb{K}$ equipped with such a form is called a semi-inner-product space. Define an associated function $\|\cdot\| : X \to [0, \infty)$ by
\[ \|x\| = \sqrt{(x \mid x)}, \quad x \in X. \qquad (11.1) \]
The polarization identities may then be written as
\[ 4(x \mid y) = \|x + y\|^2 - \|x - y\|^2 \ (\mathbb{K} = \mathbb{R}) \quad\text{and}\quad 4(x \mid y) = \sum_{k=0}^{3} i^k \|x + i^k y\|^2 \ (\mathbb{K} = \mathbb{C}). \]
Then
\[ \begin{aligned} 4(x \mid y) &= \|x + y\|^2 - \|x - y\|^2 + i\bigl(\|x + iy\|^2 - \|x - iy\|^2\bigr) \\ &= \|y + x\|^2 - \|y - x\|^2 - i\bigl(\|y + ix\|^2 - \|y - ix\|^2\bigr) \\ &= 4\,\overline{(y \mid x)}. \end{aligned} \]
\[ \begin{aligned} 4(x + y \mid z) + 4(x - y \mid z) &= \bigl[\|(x + z) + y\|^2 + \|(x + z) - y\|^2\bigr] - \bigl[\|(x - z) + y\|^2 + \|(x - z) - y\|^2\bigr] \\ &\quad + i\bigl[\|(x + iz) + y\|^2 + \|(x + iz) - y\|^2\bigr] - i\bigl[\|(x - iz) + y\|^2 + \|(x - iz) - y\|^2\bigr]. \end{aligned} \]
Applying the parallelogram identity to each bracketed expression reduces the right side to
\[ 2\bigl[\|x + z\|^2 - \|x - z\|^2\bigr] + 2i\bigl[\|x + iz\|^2 - \|x - iz\|^2\bigr] = 8(x \mid z). \]
We now have
\[ (x + y \mid z) + (x - y \mid z) = 2(x \mid z). \qquad (\ddagger) \]
Taking $x = y$ and noting that $(0 \mid z) = 0$, we have $(2y \mid z) = 2(y \mid z)$, hence $2(\frac12 y \mid z) = (y \mid z)$ for all $y$. Setting $x + y = u$ and $x - y = v$ in (‡) yields
\[ (u \mid z) + (v \mid z) = 2\bigl(\tfrac12(u + v) \mid z\bigr) = (u + v \mid z). \]
The associated norm is the Euclidean norm on $\mathbb{K}^d$. Note that the parallelogram law fails for
the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbb{K}^d$, hence these are not inner product norms.
(b) Let A := [aij ]d×d be a matrix with entries in Kd that satisfies
Then
d X
X d
(x | y) := aij xj yj , x := (x1 , . . . , xd ), y := (y1 , . . . , yd ),
i=1 j=1
defines an inner product on Kd . One obtains the Euclidean inner product of (a) by taking A
to be the identity matrix.
(c) The trace $\mathrm{tr}(A)$ of a square matrix $A$ is the sum of the diagonal elements of $A$.
Clearly, $\mathrm{tr}(\cdot)$ is linear and $\mathrm{tr}(A^*) = \overline{\mathrm{tr}(A)}$, where $A^*$ is the conjugate transpose of $A$. Let
$M_{mn} = M_{mn}(\mathbb{C})$ denote the vector space over $\mathbb{C}$ of $m \times n$ complex matrices. For $A, B \in M_{mn}$,
$(A \mid B) = \mathrm{tr}(B^* A)$ defines an inner product on $M_{mn}$ called the trace inner product.
(d) The space $L^2(X, \mathcal{F}, \mu)$ with the $L^2$ norm is a Hilbert space under the inner product
\[ (f \mid g) = \int f\,\overline{g}\,d\mu, \quad f, g \in L^2. \]
(As usual, we identify functions equal a.e.) In particular, $\ell^2$ is a Hilbert space. On the other
hand, for $p \ne 2$ the $L^p$ norm is not induced by an inner product, since the parallelogram law
fails (Ex. 11.3).
(e) Let $U$ be open in $\mathbb{C}$ and let $A^2(U)$ be the space of functions in $L^2(U)$ that are analytic
on $U$. Then $A^2(U)$ is closed in the $L^2$-norm and hence is a Hilbert space. To see this, we
first establish the formula
\[ f(z) = \frac{1}{\pi r^2} \int_{C_r(z)} f(w)\,d\lambda^2(w), \quad z \in U,\ C_r(z) \subseteq U,\ f \in A^2(U). \qquad (\dagger) \]
Thus
\[ \int_{C_r(z)} f(w)\,d\lambda^2(w) = 2\pi f(z) \int_0^r t\,dt = \pi r^2 f(z), \]
which is (†).
Now let $K$ be a compact subset of $U$ and let $r := \frac12 \operatorname{dist}(K, U^c)$. If $z \in K$, then
$C_r(z) \subseteq U$, hence from (†) and the CBS inequality,
\[ |f(z)| \le \frac{1}{\pi r^2} \int_{C_r(z)} |f(w)|\,d\lambda^2(w) \le \frac{1}{\sqrt{\pi}\,r} \left( \int_{C_r(z)} |f(w)|^2\,d\lambda^2(w) \right)^{1/2} \le \frac{1}{\sqrt{\pi}\,r}\,\|f\|_2. \]
By considering a finite cover of $K$ by disks $C_r(z)$, we see from the above inequality that if
$f_n \in A^2(U)$ and $\|f_m - f_n\|_2 \to 0$, then $(f_n)$ is uniformly Cauchy on compact subsets of $U$
and therefore converges uniformly (and in $L^2$) to a continuous function $f$. Thus $f$ is analytic
and so $A^2(U)$ is closed in $L^2(U)$. ♦
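The remarks in (a) and (d) about the parallelogram law failing for $p \ne 2$ can be seen already in two dimensions. The following sketch computes the defect $\|x+y\|^2 + \|x-y\|^2 - 2\|x\|^2 - 2\|y\|^2$ for the $p$-norm on $\mathbb{R}^2$ with $x = e_1$, $y = e_2$; the function names and the choice of test vectors are illustrative.

```python
def p_norm(v, p):
    """The p-norm of a finite real vector; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 in the p-norm.
    The defect vanishes for all x, y exactly when the parallelogram law holds."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * p_norm(x, p) ** 2 - 2 * p_norm(y, p) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
for p in (1, 2, 3, float("inf")):
    print(p, parallelogram_defect(x, y, p))
```

For these vectors the defect equals $2 \cdot 2^{2/p} - 4$, which is zero only for $p = 2$.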
Exercises
11.1 Verify the parallelogram law.
11.2 Show that the uniform norm on C[0, 1] is not an inner product norm.
11.3 Show that for $1 \le p < \infty$, the $L^p$ norm on $C[0, 1]$ is not an inner product norm unless $p = 2$.
11.5 Let $(X, (\cdot \mid \cdot))$ be a semi-inner-product space with associated seminorm $\|\cdot\|$. By Ex. 8.56,
$Y := \{x : \|x\| = 0\}$ is a subspace of $X$. Let $Q : X \to X/Y$ denote the quotient map. Show
that $\langle Qx \mid Qy \rangle := (x \mid y)$ is a well-defined inner product on $X/Y$.
11.6 Let $H$ denote the linear space of absolutely continuous functions $f$ on $[0, 1]$ such that $f(0) = 0$
and $f' \in L^2[0, 1]$. Show that $(f \mid g) = \int_0^1 f'(t)\,\overline{g'(t)}\,dt$ defines an inner product on $H$ relative to
11.7 Prove directly (without using uniform convexity) that an inner product space is strictly convex.
11.2 Orthogonality
The central feature of a Hilbert space that accounts for its rich structure is the concept of
orthogonality. This leads to the notions of orthogonal complement and orthonormal bases,
considered in this section and the next.
Orthogonal Complements
Vectors x and y in H are said to be orthogonal, written x ⊥ y, if (x | y) = 0. The
following result on orthogonality generalizes the classical Pythagorean theorem.
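11.2.1 Proposition. Let $x, y \in H$.
(a) If $\mathbb{K} = \mathbb{R}$, then $x \perp y$ iff $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
(b) If $\mathbb{K} = \mathbb{C}$, then $x \perp y$ iff $\|\alpha x + \beta y\|^2 = \|x\|^2 + \|y\|^2$ for all $\alpha, \beta \in \mathbb{T}$.
\[ S^\perp := \{x \in H : (x \mid y) = 0\ \ \forall\, y \in S\}. \]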
and so
\[ \|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,\mathrm{Re}\,(x \mid y) \le 2\|x\|^2 + 2\|y\|^2 - 4d^2. \]
Now let $x_n \in K$ and $\|x_n\| \to d$. Then $2\|x_n\|^2 + 2\|x_m\|^2 \to 4d^2$, hence from (†),
$\|x_n - x_m\|^2 \to 0$. The limit $x := \lim_n x_n$ is then a member of $K$ with smallest norm.
If $y \in K$ also has smallest norm, then by (†), $\|x - y\|^2 \le 2d^2 + 2d^2 - 4d^2 = 0$, hence
$x = y$.
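For a concrete instance of the minimal-norm element, the sketch below takes the closed convex set to be an affine subspace $x_0 + \mathrm{span}\{m_1, m_2\}$ of $\mathbb{R}^4$ (an assumed special case, chosen because the minimizer can be computed by subtracting projections). It then compares the norm of the computed element with the norms of randomly chosen competitors in the same set. All names are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors):
    """Orthonormal basis of span(vectors) by successive subtraction of projections."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [a - c * b for a, b in zip(w, u)]
        n = math.sqrt(dot(w, w))
        if n > 1e-12:                      # skip (numerically) dependent vectors
            basis.append([a / n for a in w])
    return basis

def min_norm_element(x0, spanning):
    """Element of smallest norm in the affine set x0 + span(spanning)."""
    y = list(x0)
    for u in orthonormalize(spanning):
        c = dot(y, u)
        y = [a - c * b for a, b in zip(y, u)]
    return y

x0 = [1.0, 2.0, 3.0, 4.0]
M = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
y = min_norm_element(x0, M)

rng = random.Random(0)
for _ in range(5):
    c1, c2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
    other = [a + c1 * b + c2 * c for a, b, c in zip(x0, M[0], M[1])]
    print(math.sqrt(dot(y, y)) <= math.sqrt(dot(other, other)))
```

In this special case the minimizer is orthogonal to the spanning vectors, which can be confirmed by evaluating `dot(y, M[0])` and `dot(y, M[1])`.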
11.2.4 Theorem. If $M$ is a closed subspace of $H$, then $H = M \oplus M^\perp$. Moreover, if
$x = m + m^\perp$, then $m$ is the unique member of $M$ closest to $x$.
Proof. For a fixed $x \in H$, there exists, by the lemma, a unique member $y$ of $x + M$ such
that
\[ \|y\| \le \|x + m\| \quad\text{for all } m \in M. \]
We show that $y \in M^\perp$. Let $m \in M$. Since $y + tm = x + (y - x + tm) \in x + M$, the
function
\[ f(t) := \|y + tm\|^2 = \|y\|^2 + 2t\,\mathrm{Re}\,(y \mid m) + t^2\|m\|^2 \]
has minimum value $\|y\|^2 = f(0)$. It follows that $f'(0) = 0$ and so $\mathrm{Re}\,(y \mid m) = 0$. Replacing
$m$ by $im$ yields $\mathrm{Im}\,(y \mid m) = 0$. Therefore, $(y \mid m) = 0$, hence $y \in M^\perp$.
We may now write $x = (x - y) + y$, which shows that $H = M + M^\perp$. The sum is direct
since $z \in M \cap M^\perp \Rightarrow (z \mid z) = 0$. Since
\[ 0 = (x \mid z) = (y \mid z) + (z \mid z) = (z \mid z), \]
\[ f_y(x) := (x \mid y) \]
Then $f_y$ is a linear functional with $\|f_y\| = \|y\|$ (11.1.5). Furthermore, $f_{ay+bz} = \bar{a} f_y + \bar{b} f_z$.
Thus the map $y \to f_y$ is a conjugate linear isometry from $H$ into $H'$. The next theorem
asserts that the mapping is surjective.
11.2.6 Riesz Representation Theorem. Every $f \in H'$ is of the form $f_y$ for some
$y \in H$.
Proof. We may assume that $f$ is not the zero functional. Then $H = \ker f \oplus (\ker f)^\perp$, where
$(\ker f)^\perp$ has dimension one. Choose $z \in (\ker f)^\perp$ with $f(z) = 1$ and set $a = 1/\|z\|^2$. For
$x \in H$ we may write $x = u + cz$, where $u \in \ker f$, hence
\[ f(x) = c = ac\,\|z\|^2 = (cz \mid az) = (x \mid az). \]
Therefore, $f = f_{az}$.
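The construction in this proof can be traced in the finite dimensional space $\mathbb{C}^d$ with the Euclidean inner product $(x \mid y) = \sum_j x_j \overline{y_j}$. The sketch below assumes a nonzero functional of the form $f(x) = \sum_j c_j x_j$; the function names are illustrative. It builds $z \in (\ker f)^\perp$ with $f(z) = 1$, sets $a = 1/\|z\|^2$, and returns $y = az$.

```python
def inner(x, y):
    """(x | y) = sum_j x_j conj(y_j): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply_functional(c, x):
    """f(x) = sum_j c_j x_j."""
    return sum(cj * xj for cj, xj in zip(c, x))

def riesz_vector(c):
    """Representing vector y of f, built as in the proof (c must be nonzero)."""
    norm_c_sq = sum(abs(cj) ** 2 for cj in c)
    # conj(c) is orthogonal to ker f; the scaling makes f(z) = 1
    z = [cj.conjugate() / norm_c_sq for cj in c]
    a = 1.0 / inner(z, z).real            # a = 1 / ||z||^2
    return [a * zj for zj in z]           # y = a z

c = [1 + 2j, -1j, 3 + 0j]
y = riesz_vector(c)
x = [0.5 - 1j, 2 + 0j, -1 + 4j]
print(apply_functional(c, x), inner(x, y))   # the two values should agree
```

In this setting the representing vector works out to the entrywise conjugate of $c$, which reflects the conjugate linearity of $y \to f_y$.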
Recall that a net $(x_\alpha)$ in a normed space $X$ converges weakly to $x \in X$ if $\langle x_\alpha, f \rangle \to \langle x, f \rangle$
for all $f \in X'$. Thus from the Riesz representation theorem we have
11.2.7 Corollary. A net $(x_\alpha)$ in $H$ converges weakly to $x$ iff $(x_\alpha \mid y) \to (x \mid y)$ $\forall\, y \in H$.
11.2.8 Corollary. A Hilbert space is reflexive.
Proof. The dual space $H'$ is a Hilbert space under the inner product
\[ (f_x \mid f_y) = (y \mid x). \]
(The transposition of the elements on the right side is necessary to compensate for the
conjugate linearity of the mapping $x \to f_x$.) Let $\varphi \in H''$. By the Riesz representation
theorem applied to $H'$, there exists $f_y$ such that for all $x$,
\[ \varphi(f_x) = (f_x \mid f_y) = (y \mid x) = f_x(y) = \hat{y}(f_x). \]
Therefore, $\varphi = \hat{y}$ and so $H'' = \hat{H}$.
Exercises
11.9 Let $S$ be a subset of a Hilbert space $H$. Use 11.2.4 to prove that $S^{\perp\perp} = \mathrm{cl}\,\mathrm{span}(S)$.
11.13 A function $f \in L^2[-1, 1]$ is odd (even) if $f(-t) = -f(t)$ ($f(-t) = f(t)$) for a.a. $t \in [-1, 1]$.
Let $O$ ($E$) denote the linear space of odd (even) functions. Show that each space is the orthogonal
complement of the other and that $L^2[-1, 1] = O \oplus E$.
11.14 For each linear functional $F$ on $\ell^2(\mathbb{N})$, find a function $g$ such that $F(f) = (f \mid g)$ for all $f$.
(a) $F(f) = \sum_{j=1}^{m} f(j)$. (b) $F(f) = f(2) - f(1)$. (c) $F(f) = \sum_{n=1}^{\infty} 2^{-n}[f(n) - f(n + 1)]$.
11.15 Find an example of an inner product space $X$ and a continuous linear functional $f \in X'$ such
that no vector $y \in X$ exists for which $f(x) = (x \mid y)$ for all $x \in X$.
11.16 Let $H$ be the Hilbert space defined in Ex. 11.6. Let $F$ be the evaluation functional $F(f) = f(1/2)$.
Find a function $g$ such that $F(f) = (f \mid g)$ for all $f \in H$.
11.17 Let $T \in B(H)$ be weak-norm continuous. Show that $\mathrm{ran}\, T$ is finite dimensional. ⟦There exist
$x_j \in H$ and $\varepsilon > 0$ such that $|(x \mid x_j)| < \varepsilon$ ($1 \le j \le n$) implies $\|Tx\| < 1$.⟧
11.18 [↑ 8.34] Let $T_r$ and $T_\ell$ denote the right and left shift operators on $\ell^2$ and let $x \in H$. Compute
the weak limits $\lim_n T_\ell^n x$ and $\lim_n T_r^n x$.
where at most countably many of the terms in the sum are nonzero.
Proof. Let $F \subseteq E$ be finite and set $y = \sum_{e \in F} (x \mid e)\, e$. By orthonormality and sesquilinearity,
\[ (y \mid y) = \sum_{e \in F} (x \mid e)\,\overline{(x \mid e)} = (x \mid y), \]
hence $(x - y \mid y) = 0$. Thus
\[ \|x\|^2 = \|x - y + y\|^2 = \|x - y\|^2 + \|y\|^2 \ge \|y\|^2 = \sum_{e \in F} |(x \mid e)|^2. \]
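The two facts used in this proof, the orthogonality $(x - y \mid y) = 0$ and the resulting inequality, can be checked on a small example. The sketch below works in $\mathbb{C}^4$ with the Euclidean inner product and a two-element orthonormal set chosen purely for illustration.

```python
import math

def inner(x, y):
    """(x | y) = sum_j x_j conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def fourier_projection(x, orthonormal_set):
    """y = sum over e in F of (x | e) e."""
    y = [0j] * len(x)
    for e in orthonormal_set:
        c = inner(x, e)
        y = [yj + c * ej for yj, ej in zip(y, e)]
    return y

s = 1 / math.sqrt(2)
F = [[1, 0, 0, 0], [0, s, s * 1j, 0]]      # a finite orthonormal set in C^4
x = [2 - 1j, 1j, 3, -4]
y = fourier_projection(x, F)
residual = [a - b for a, b in zip(x, y)]

print(abs(inner(residual, y)))                               # close to 0
print(sum(abs(inner(x, e)) ** 2 for e in F), inner(x, x).real)
```

The first printed number is the size of $(x - y \mid y)$; the second line shows the sum of squared Fourier coefficients next to $\|x\|^2$.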
In (b) – (d), at most countably many of the Fourier coefficients $(x \mid e)$ are nonzero.
Proof. (a) ⇒ (b): Denote the nonzero Fourier coefficients of $x$ by $(x \mid e_n)$. We show that
\[ x = \sum_{n=1}^{\infty} (x \mid e_n)\, e_n. \]
which, by Bessel’s inequality, tends to 0 as $n \to \infty$. Therefore, the sequence of partial sums
$\sum_{k=1}^{n} (x \mid e_k)\, e_k$ is Cauchy and so converges to some $y$. It remains to show that $y = x$.
Now, for any $e \in E$, by continuity of the inner product we have
\[ (y \mid e) = \lim_n \sum_{k=1}^{n} (x \mid e_k)(e_k \mid e). \]
If $e = e_m$ for some $m$, then the right side is $(x \mid e_m)$. If $e \ne e_m$ for all $m$, then both
$(x \mid e)$ and $(y \mid e)$ are zero. Thus $(x - y \mid e) = 0$ for all $e \in E$. Since $E$ is a basis, $x = y$.
(b) ⇒ (c): Using a common sequence $(e_n)$ for $x$ and $y$ we have
\[ x = \sum_{n=1}^{\infty} (x \mid e_n)\, e_n \quad\text{and}\quad y = \sum_{n=1}^{\infty} (y \mid e_n)\, e_n. \]
Letting $n \to \infty$ and using the continuity of the inner product yields (c).
(d) ⇒ (a): Then $\|x\|^2 = 0$ for every $x \in E^\perp$, hence $E$ is a basis.
By Bessel’s inequality, $Tx$ is well-defined, and at most countably many terms are nonzero.
By sesquilinearity and continuity of the inner product,
\[ (Tx \mid Ty) = \sum_{e \in E} \sum_{\tilde{e} \in E} (x \mid e)\,\overline{(y \mid \tilde{e})}\,\bigl(\Psi(e) \mid \Psi(\tilde{e})\bigr) = \sum_{e \in E} (x \mid e)(e \mid y) = (x \mid y). \]
is not zero. Define en+1 = yn+1 / kyn+1 k. Then en+1 ∈ span An+1 , (en+1 | ek ) = 0 for
k ≤ n, and span An+1 = span{e1 , . . . , en+1 }.
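The inductive step just described is the Gram-Schmidt process. A minimal sketch in $\mathbb{C}^d$ follows; it assumes the standard formula $y_{n+1} = x_{n+1} - \sum_{k \le n} (x_{n+1} \mid e_k)\, e_k$ for the vector being normalized, and a linearly independent input list. The function names and the tolerance are illustrative.

```python
import math

def inner(x, y):
    """(x | y) = sum_j x_j conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors in C^d."""
    basis = []
    for x in vectors:
        y = list(x)
        for e in basis:
            c = inner(x, e)                          # coefficient (x_{n+1} | e_k)
            y = [yj - c * ej for yj, ej in zip(y, e)]
        norm_y = math.sqrt(inner(y, y).real)
        if norm_y < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        basis.append([yj / norm_y for yj in y])      # e_{n+1} = y_{n+1} / ||y_{n+1}||
    return basis

vs = [[1, 1, 0], [1, 0, 1j], [0, 1, 1]]
es = gram_schmidt(vs)
for j in range(3):
    print([round(abs(inner(es[j], es[k])), 12) for k in range(3)])
```

The printed rows are the absolute values of the Gram matrix entries $(e_j \mid e_k)$, which should form the identity matrix up to rounding.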
For this and other interesting examples of orthonormal bases on L2 [a, b], the reader is referred
to [28].
Most infinite dimensional Hilbert spaces one encounters in applications are separable.
Analysis of such spaces is somewhat easier because of the following result:
11.3.8 Proposition. If a Hilbert space $H$ is separable, then it has a countable basis.
Proof. We may assume that $H$ is not finite dimensional. Let $(x_n)$ be a dense sequence of
nonzero vectors in $H$. If $x_2$ is a multiple of $x_1$, we may remove it without changing the span
of $(x_n)$. Likewise, if $x_n$ is a linear combination of its predecessors, then it may be removed
without affecting the span. By induction, we obtain a linearly independent subsequence $(y_n)$
of $(x_n)$ with $\mathrm{span}\,(y_n) = \mathrm{span}\,(x_n)$. The Gram-Schmidt process may be applied to $(y_n)$
to obtain an orthonormal sequence $(e_n)$ such that $\mathrm{span}\,(e_n) = \mathrm{span}\,(y_n) = \mathrm{span}\,(x_n)$. If
$x \perp e_n$ for all $n$, then $x \perp x_n$ for all $n$, and since $(x_n)$ is dense in $H$, $x = 0$. Therefore,
$(e_n)$ is a basis.
For example, the vectors $e_n = (0, \ldots, 0, 1, 0, \ldots)$ ($n \in \mathbb{N}$) form an orthonormal basis in
$\ell^2$. It follows from 11.3.8 that every separable Hilbert space is isomorphic to $\ell^2$. This fact,
however, does not necessarily lead to simplifications in the study of separable Hilbert spaces,
as the isomorphism may obscure certain essential properties of concrete Hilbert spaces such
as $L^2[0, 1]$. Nonetheless, it is of some interest to know that, structurally, all separable Hilbert
spaces are “like” $\ell^2$.
Fourier Series
We show that the functions
form an orthonormal basis for $L^2[0, 1]$ with respect to Lebesgue measure. The calculation
\[ \int_0^1 e^{2\pi i n t}\,\overline{e^{2\pi i m t}}\,dt = \int_0^1 e^{2\pi i (n - m) t}\,dt \]
shows that $(e_n)_n$ is an orthonormal set. Let $A$ denote the algebra of continuous functions
$f : [0, 1] \to \mathbb{C}$ with $f(0) = f(1)$. Since $C[0, 1]$ is dense in $L^2[0, 1]$, a simple linearization
argument shows that the same is true for $A$. For each $f \in A$ define $F_f : \mathbb{T} \to \mathbb{C}$ by
\[ F_f(e^{2\pi i t}) = f(t), \quad t \in [0, 1]. \]
Let $\mathcal{T}$ denote the collection of all such functions $\tau$. By the Stone-Weierstrass theorem, $\{F_\tau : \tau \in \mathcal{T}\}$
is uniformly dense in $C(\mathbb{T})$. It follows that $\mathcal{T}$ is dense in $A$ in the uniform norm and is
therefore dense in $L^2[0, 1]$. Thus $(e_n)_{n \in \mathbb{Z}}$ is a basis, as claimed.
From 11.3.4 we see that every $f \in L^2[0, 1]$ has a Fourier series expansion
\[ f = \sum_{n=-\infty}^{\infty} \hat{f}(n)\, e_n, \quad \hat{f}(n) := (f \mid e_n) = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt, \qquad (11.6) \]
where convergence is in $L^2[0, 1]$. The function $\hat{f}$ is called the Fourier transform of $f$.
The convergence of the series in (11.6) implies that $\lim_n \hat{f}(n) = 0$, which is the classical
Riemann-Lebesgue lemma.
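To make (11.6) concrete, the sketch below approximates the Fourier coefficients of the sample function $f(t) = t$ on $[0, 1]$ by a midpoint rule. The choice of $f$, the quadrature, and the step count are illustrative assumptions. Integration by parts gives $\hat{f}(0) = 1/2$ and $\hat{f}(n) = i/(2\pi n)$ for $n \ne 0$, so the coefficients decay like $1/|n|$, and the partial sums of $\sum_n |\hat{f}(n)|^2$ stay below $\|f\|_2^2 = 1/3$.

```python
import cmath
import math

def fourier_coefficient(f, n, steps=4000):
    """Midpoint-rule approximation of the integral of f(t) e^{-2 pi i n t} over [0, 1]."""
    h = 1.0 / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * h
        total += f(t) * cmath.exp(-2j * math.pi * n * t)
    return total * h

def f(t):
    return t

for n in (1, 2, 5, 10):
    approx = fourier_coefficient(f, n)
    exact = 1j / (2 * math.pi * n)        # closed form for n != 0
    print(n, abs(approx), abs(approx - exact))

partial = sum(abs(fourier_coefficient(f, n)) ** 2 for n in range(-10, 11))
print(partial, 1.0 / 3.0)                 # the partial sum approaches 1/3 from below
```

The decay of `abs(approx)` with `n` is the Riemann-Lebesgue behavior for this particular function.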
The following is an interesting application to the Fourier transform of a rapidly decreasing
function on $\mathbb{R}$ (see §6.3).
11.3.9 Theorem (Poisson Summation Formula). Let $\varphi$ be a rapidly decreasing function on
$\mathbb{R}$ with Fourier transform $\hat{\varphi}$. Then
\[ \sum_{n=-\infty}^{\infty} \varphi(n) = \sum_{n=-\infty}^{\infty} \hat{\varphi}(n). \]
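Proof. Define
\[ f(t) := \sum_{n=-\infty}^{\infty} \varphi(t + n), \quad t \in \mathbb{R}. \qquad (\dagger) \]
The rapidly decreasing property of $\varphi$ implies that the series, as well as all derived series,
converge absolutely and locally uniformly. Thus $f$ is a $C^\infty$ function. Moreover, $\sum_{n=-\infty}^{\infty} \hat{\varphi}(n)$
converges because $\hat{\varphi}$ is also rapidly decreasing. Since $f(t + 1) = f(t)$ for all $t$, we may consider
$f \in L^2[0, 1]$. Multiplying (†) by $e^{-2\pi i m t}$ and integrating term by term, we have
\[ \begin{aligned} \hat{f}(m) &= \sum_{n=-\infty}^{\infty} \int_0^1 \varphi(t + n)\, e^{-2\pi i m t}\,dt = \sum_{n=-\infty}^{\infty} \int_n^{n+1} \varphi(t)\, e^{-2\pi i m (t - n)}\,dt \\ &= \sum_{n=-\infty}^{\infty} \int_n^{n+1} \varphi(t)\, e^{-2\pi i m t}\,dt = \int_{-\infty}^{\infty} \varphi(t)\, e^{-2\pi i m t}\,dt \\ &= \hat{\varphi}(m), \end{aligned} \]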
Since both series are continuous in t, the equation holds for all t. Setting t = 0 yields the
desired equality.
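The formula can be tested on Gaussians, for which both sides are rapidly convergent series. The sketch below assumes the convention $\hat{\varphi}(\xi) = \int \varphi(t)\, e^{-2\pi i \xi t}\,dt$ used in the proof, under which $\varphi(t) = e^{-\pi a t^2}$ ($a > 0$) has transform $\hat{\varphi}(\xi) = a^{-1/2} e^{-\pi \xi^2 / a}$. The parameter values and the truncation of the sums are illustrative.

```python
import math

def gaussian(a):
    """phi(t) = exp(-pi a t^2), a > 0."""
    return lambda t: math.exp(-math.pi * a * t * t)

def gaussian_hat(a):
    """Fourier transform of phi under the stated convention."""
    return lambda xi: math.exp(-math.pi * xi * xi / a) / math.sqrt(a)

def lattice_sum(g, cutoff=50):
    """Truncated sum of g over the integers -cutoff, ..., cutoff."""
    return sum(g(n) for n in range(-cutoff, cutoff + 1))

for a in (0.5, 1.0, 3.0):
    left = lattice_sum(gaussian(a))
    right = lattice_sum(gaussian_hat(a))
    print(a, left, right)     # the two sums should agree to rounding error
```

For $a = 1$ the function is its own transform, so the two sums agree trivially; the other parameter values give a genuine check.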
Exercises
11.19 Let $(e_n)_n$ be an orthonormal basis and fix $y \in H$. Show that the minimum value of the function
$x \to \|x - y\|$ for $x \in \mathrm{span}\,\{e_1, \ldots, e_m\}$ occurs when $x = \sum_{j=1}^{m} (y \mid e_j)\, e_j$.
11.20 Show that the sequence $1, z, z^2, \ldots$ is orthogonal in $L^2(\mathbb{D}, \lambda^2)$. Is the normalized sequence
$z^n \|z^n\|_2^{-1}$ a basis?
11.21 A Hamel basis for a vector space is a linearly independent set that spans the space. Let $H$ be
an infinite dimensional Hilbert space. Show that an orthonormal basis cannot be a Hamel basis.
Show that a Hamel basis in $H$ is uncountable.
11.22 Show that in a Hilbert space, $x_n \xrightarrow{w} 0$ iff $\sup_n \|x_n\| < \infty$ and (xn | e) → 0 for every e in an
orthonormal basis.
11.23 (Wirtinger’s inequality). Let $f \in C^1[0,a]$ with f (0) = f (a) = 0. Show that $\pi \|f\|_2 \le a \|f'\|_2$.
⟦Extend f to [−a, a] as an odd function. Use Parseval’s identity on $f \in L^2[-a,a]$ with the basis
$\frac{1}{\sqrt{2a}}\, e^{ibnt}$ (b := π/a) and integrate $\hat{f}(n)$ by parts.⟧
11.24 Show that the Fourier transform is a linear isometry from $L^2[0,1]$ onto $\ell^2(\mathbb{Z})$.
11.25 Let (X, F, µ) be σ-finite and φ ∈ L∞ (µ). Show that the range of the multiplication mapping
Mφ f := f φ on L2 (µ) is closed iff φ = 1E for some E ∈ F.
11.26 Let (X, F, µ) be σ-finite and φ ∈ L∞ (µ). Show that $\phi^{-1} \in L^\infty$ iff $\sup_n \|f_n\|_2 < \infty$ for any
sequence (fn ) in $L^2$ for which (φfn ) converges in $L^2$. ⟦For the sufficiency, suppose $\phi^{-1} \notin L^\infty$.
Choose An ∈ F such that $A_n \subseteq \{|\phi| < 1/n^2\}$ and 0 < µ(An ) < ∞ (how?) and set
$f_n = \sum_{k=1}^{n} \mu(A_k)^{-1/2}\, 1_{A_k}$.⟧
The Hilbert space adjoint of an operator T ∈ B(H) is closely related to the Banach space
dual operator T′, the essential difference being that the former acts on H while the latter
acts on H′. The existence of an adjoint operation in B(H) accounts to a large extent for
the rich structure of B(H) and its various subalgebras, a structure that is absent in the Banach
space case. For the construction of the adjoint we need the following notion.
BT (x, y) := (x | T y)H , x ∈ H, y ∈ K,
the last equality from 11.1.5. One easily checks that (11.7) defines a norm on the linear
space S(H × K) of all bounded sesquilinear functionals on H × K and that S(H × K) is
complete in this norm (Ex. 11.27). Moreover, the mapping T → BT is a conjugate linear
isometric isomorphism from B(K, H) into S(H × K). The following theorem shows that
the mapping is surjective.
11.4.1 Theorem. If B is a bounded sesquilinear functional on H × K, then B = BT for
some T ∈ B(K, H).
Proof. Fix y ∈ K. Since B(·, y) ∈ H′, by the Riesz representation theorem there exists a
unique vector T y ∈ H such that
(x | T y)H = B(x, y) for all x ∈ H.
For each x, the right side is conjugate linear in y, so T is linear. Moreover, since ‖B‖ < ∞,
T is bounded.
The following is the Hilbert adjoint analog of 8.9.2. The proof is an exercise for the reader.
11.4.5 Proposition. Let T ∈ B(H). Then ker T∗ = [ran T ]⊥ and ker T = [ran T∗ ]⊥ .
B(H) as a C ∗ -algebra
The properties in the conclusion of 11.4.4 assert that B(H) is a C ∗ -algebra. A norm
closed subalgebra C of B(H) that is closed under the operation of involution is called a
C ∗ -subalgebra of B(H). For example, if T ∈ B(H) and T T ∗ = T ∗ T , then the closure in
B(H) of the set of all polynomials in T , T ∗ is a commutative C ∗ -algebra (see §13.1).
The following concept will occasionally be needed: The commutant of a subset $\mathcal{S}$ of
B(H) is the set
\[
\mathcal{S}' := \{T \in B(H) : TS = ST \ \ \forall\, S \in \mathcal{S}\}.
\]
The notation is in conflict with that for dual spaces, but this should not be a problem,
as context will indicate the intended meaning. The bicommutant of $\mathcal{S}$ is defined by
$\mathcal{S}'' := (\mathcal{S}')'$, that is, the commutant of the commutant. The proof of the following is an exercise
(11.35).
11.4.6 Proposition. The commutant of $\mathcal{S} \subseteq B(H)$ is a C∗-subalgebra of B(H) containing
the identity operator. Moreover, $\mathcal{S} \subseteq \mathcal{S}''$.
Exercises
11.27 Prove that (11.7) defines a complete norm on S(H).
\[
T^*S = \frac{1}{4} \sum_{k=0}^{3} i^k\, (S + i^k T)^* (S + i^k T).
\]
11.30 Let T ∈ B(H). Suppose there exist a, b > 0 such that ‖T x‖ ≥ a ‖x‖ and ‖T∗ x‖ ≥ b ‖x‖ for
all x. Show that T is invertible.
11.31 Let H be a Hilbert space, T ∈ B(H), and M a closed subspace of H. Then M is said to be
invariant under T if T M ⊆ M. If both M and M ⊥ are T -invariant, then M is said to reduce
T . Let P be the orthogonal projection onto M. Prove:
11.33 [↑ 8.34] Find the adjoints of the left and right shift operators $T_\ell$ and $T_r$ on $\ell^2$.
11.34 Show that $M_\phi^* = M_{\bar{\phi}}$ for the multiplication operator $M_\phi$ on $L^2(X, F, \mu)$, where φ ∈ L∞ .
The special structure of Hilbert spaces allows the construction of classes of operators that
have no analogs in general Banach spaces. In this chapter we discuss the main properties of
these operators and consider as well various algebras of operators on Hilbert spaces.
(T ∗ T x | y) = (T x | T y) = (T ∗ x | T ∗ y) = (T T ∗ x | y) ,
Self-Adjoint Operators
An operator T ∈ B(H) is said to be self-adjoint if T ∗ = T . For example, a multiplication
operator Mφ is self-adjoint iff φ is real-valued. Clearly, every self-adjoint operator is normal.
On the other hand, the operator iI is normal but not self-adjoint.
It is clear that the sum of self-adjoint operators is self-adjoint. The product of self-adjoint
operators S, T need not be self-adjoint. Indeed, the equality (ST )∗ = T ∗ S ∗ = T S shows
that ST is self-adjoint iff ST = T S.
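The criterion can be seen in a small matrix example. The sketch below assumes H = C² with operators given as 2×2 matrices (an illustrative finite-dimensional case, where the adjoint is the conjugate transpose); the helper names and the particular matrices S, T, D are hypothetical choices.

```python
def adjoint(A):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(complex(A[i][k]) * complex(B[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]


def is_self_adjoint(A):
    return adjoint(A) == [[complex(x) for x in row] for row in A]


# Three self-adjoint matrices (hypothetical choices for illustration).
S = [[1, 0], [0, -1]]
T = [[0, 1], [1, 0]]
D = [[2, 0], [0, 3]]

ST, TS = matmul(S, T), matmul(T, S)
SD, DS = matmul(S, D), matmul(D, S)

print(adjoint(ST) == TS)      # (ST)* = TS always holds for self-adjoint S, T
print(is_self_adjoint(ST))    # S and T do not commute
print(is_self_adjoint(SD))    # S and D commute
```

Here S and T fail to commute and their product is not self-adjoint, while the commuting pair S, D has a self-adjoint product.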
For any S ∈ B(H), the operators S∗S, SS∗, S + S∗ and i(S − S∗) are self-adjoint. These
examples suggest that self-adjoint operators may be viewed as the analogs of real numbers
in the complex number system, the adjoint operation being the analog of conjugation. The
following proposition strengthens this analogy. The proof is left as an exercise for the reader
(12.2).
12.1.3 Proposition. For T ∈ B(H), define
\[
\operatorname{Re} T = \tfrac{1}{2}(T + T^*) \quad \text{and} \quad \operatorname{Im} T = \tfrac{1}{2i}(T - T^*).
\]
\begin{align*}
0 = (T(cx+y) \mid cx+y) &= |c|^2 (Tx \mid x) + (Ty \mid y) + c\,(Tx \mid y) + \bar{c}\,(Ty \mid x) \\
&= c\,(Tx \mid y) + \bar{c}\,(Ty \mid x).
\end{align*}
Proof. Let s denote the supremum. Obviously, s ≤ ‖T‖. For the reverse inequality, let
x, y ∈ C1 . Since $(Ty \mid x) = (y \mid Tx) = \overline{(Tx \mid y)}$, we have
\[
(T(x+y) \mid x+y) - (T(x-y) \mid x-y) = 2\,(Tx \mid y) + 2\,(Ty \mid x) = 4 \operatorname{Re}\,(Tx \mid y).
\]
Since $|(Tz \mid z)| \le s\|z\|^2$ for every z, the left side is at most
$s(\|x+y\|^2 + \|x-y\|^2) = 2s(\|x\|^2 + \|y\|^2) \le 4s$ by the parallelogram law.
Therefore, Re (T x | cy) ≤ s for all x and y with norm ≤ 1 and all c with |c| = 1. Choosing
c so that Re (T x | cy) = | (T x | y) |, we have | (T x | y) | ≤ s. Taking the supremum over
all x, y ∈ C1 shows that ‖T‖ ≤ s.
We give an application of Rayleigh’s theorem in 12.1.9. The theorem actually holds more
generally for normal operators (13.2.10), but the proof is considerably deeper, depending on
notions of spectral theory.
Positive Operators
An operator T is said to be positive, written T ≥ 0, if (T x | x) ≥ 0 for all x ∈ H. Thus
a positive operator is self-adjoint; the converse is trivially false.
If S ∈ B(H), then S ∗ S and SS ∗ are clearly positive. The next theorem shows that all
positive operators are of this form. The theorem reinforces the analogies between self-adjoint
operators and real numbers and between positive operators and nonnegative real numbers.
A direct proof of the theorem may be given now, but we prefer to wait until §13.6 when the
machinery for a simpler proof will be available.
12.1.7 Theorem. Let T ∈ B(H).
(a) If T is positive, then T has a unique positive square root, that is, a unique positive
operator T 1/2 that satisfies (T 1/2 )2 = T . Moreover, if T is invertible, then T −1 is
positive, T 1/2 is invertible, and (T 1/2 )−1 = (T −1 )1/2 .
(b) If T is self-adjoint, then there exists a unique pair of positive operators T + and T −
such that T = T + − T − and T + T − = T − T + = 0.
(c) The operators $T^{1/2}$ in (a) and $T^{\pm}$ in (b) are members of the bicommutant {T }″ of T .
12.1.8 Corollary. The operator $|T| := (T^*T)^{1/2}$ is the unique positive operator |T | with
the property ‖T x‖ = ‖ |T |x ‖ for all x ∈ H. Moreover, $|T| = T^+ + T^-$.
Proof. For the norm equality we have
\[
\|Tx\|^2 = (Tx \mid Tx) = (T^*Tx \mid x) = (|T|^2 x \mid x) = (|T|x \mid |T|x) = \|\,|T|x\,\|^2.
\]
12.1.9 Theorem. Let R and Tn be self-adjoint operators such that Tn ≤ Tn+1 ≤ R for all
n. Then there exists a self-adjoint operator T such that
(a) Tn ≤ T for all n.
By 11.4.1, there exists a self-adjoint operator T such that B(x, y) = (T x | y) for all
x, y ∈ H. Thus (Tn x | y) → (T x | y) and (Tn x | x) ↑ (T x | x) for all x, y. In particular,
T satisfies (a) and (b). Since Sn := T − Tn ≥ 0, by the CBS inequality applied to the positive
sesquilinear form (Sn x | y) we have, for any pair of unit vectors x, y,
\[
|(S_n x \mid y)|^2 \le (S_n x \mid x)\,(S_n y \mid y) \le (S_n x \mid x)\,\|S_n\| \le (S_n x \mid x)\,(c + \|T\|).
\]
Taking the sup over all such y yields $\|S_n x\|^2 \le (S_n x \mid x)\,(c + \|T\|)$. Since (Sn x | x) → 0
we see that ‖Sn x‖ → 0, proving (c).
Proof. If P is an orthogonal projection, then (a) – (f) obviously hold. If (c) holds, then
from 12.1.1, P x = 0 iff P ∗ x = 0, hence ker P = ker P ∗ = (ran P )⊥ . Taking orthogonal
complements yields (e). Therefore, (c) implies (d) and (e). Conversely, if (e) holds, then we
have the orthogonal decomposition H = ran P ⊕ ker P , hence P is an orthogonal projection.
It follows that (a) – (e) are equivalent and imply that P is an orthogonal projection.
Finally, we show that if ‖P‖ ≤ 1, then (e) holds. Let x ∈ (ker P )⊥ . Since x − P x ∈ ker P ,
\[
\|x\|^2 = (x - Px + Px \mid x) = (Px \mid x) \le \|Px\|\,\|x\| \le \|x\|^2,
\]
hence $\|x\|^2 = \|Px\|^2 = (Px \mid x)$. Therefore,
\[
\|x - Px\|^2 = \|x\|^2 + \|Px\|^2 - 2\operatorname{Re}\,(Px \mid x) = 0
\]
and so x ∈ ran P . Thus (ker P )⊥ ⊆ ran P . For the reverse inclusion, let x ∈ ran P and write
x = y + z, where y ∈ ker P and z ∈ (ker P )⊥ ⊆ ran P . Then x = P x = P y + P z =
P z = z, hence x ∈ (ker P )⊥ .
Unitary Operators
An operator U ∈ B(H) is said to be unitary if
U ∗ U = U U ∗ = I. (12.1)
Proof. The necessity is clear. Conversely, if U is a surjective isometry, then (12.2) holds by
the polarization identity, hence U ∗ U = I. Therefore, U ∗ = U −1 , hence U U ∗ = I.
For example, the translation operator and the Fourier transform are unitary operators on
$L^2(\mathbb{R}^d)$. The right shift on $\ell^2(\mathbb{N})$ is an isometry that is not unitary.
Note that the operator αI is unitary iff |α| = 1. This suggests that the set of unitary
operators is the analog of the subset T of C. The next proposition reinforces this analogy.
12.1.15 Proposition. The set U of all unitary operators in B(H) is a group under
composition.
Proof. If U ∈ U, then (U −1 )∗ U −1 = U ∗∗ U ∗ = U U ∗ = I and similarly U −1 (U −1 )∗ = I,
hence U −1 ∈ U. If V ∈ U , then (U V )∗ (U V ) = V ∗ U ∗ U V = V ∗ IV = I hence U V ∈ U.
Here is an application of unitary operators due to von Neumann. We give a generalization
in Corollary 17.6.9.
12.1.16 Mean Ergodic Theorem. Let U ∈ B(H) be unitary and let P : H → M be the
orthogonal projection from H to M := {m ∈ H : U m = m}. Then for every x ∈ H,
\[
\lim_n S_n x = Px, \quad \text{where } S_n := \frac{1}{n} \sum_{k=0}^{n-1} U^k. \tag{12.3}
\]
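As a numerical illustration of (12.3), the following sketch assumes the simplest nontrivial setting: H = C² and the diagonal unitary U = diag(1, e^{iθ}) with θ = 1, so that M is spanned by the first basis vector and Px = (x₁, 0). These choices, and the helper name `cesaro_average`, are hypothetical and serve only to make the averages S_n x concrete.

```python
import cmath


def cesaro_average(diagonal, x, n):
    """Return S_n x = (1/n) * sum_{k=0}^{n-1} U^k x for the diagonal operator U."""
    acc = [0j] * len(x)
    v = [complex(c) for c in x]
    for _ in range(n):
        acc = [a + c for a, c in zip(acc, v)]      # add U^k x to the running sum
        v = [d * c for d, c in zip(diagonal, v)]   # advance to U^{k+1} x
    return [a / n for a in acc]


theta = 1.0
U_diag = [1.0 + 0j, cmath.exp(1j * theta)]   # unitary: both entries have modulus 1
x = [1 + 2j, 3 - 1j]
projection = [x[0], 0j]                       # P x, since M = span{(1, 0)} here

errors = {}
for n in (10, 1000, 10000):
    s = cesaro_average(U_diag, x, n)
    errors[n] = max(abs(a - p) for a, p in zip(s, projection))

print(errors)
```

In this example the error is governed by the geometric sum $\frac{1}{n}\sum_{k<n} e^{ik\theta}$, whose modulus is at most $2/(n\,|1 - e^{i\theta}|)$, so it decays like 1/n.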
Proof. (F. Riesz). The set K of all x for which (12.3) holds is clearly a linear space containing
M. We claim that U x − x ∈ K for all x ∈ H. Indeed, the calculation
\[
(Ux - x \mid m) = (x \mid U^{-1}m) - (x \mid m) = (x \mid m) - (x \mid m) = 0
\]
shows that U x − x ⊥ M, hence P (U x − x) = 0, and because $\|U^n\| \le \|U\|^n \le 1$ we also
have
\[
\lim_n S_n(Ux - x) = \lim_n \tfrac{1}{n}(U^n x - x) = 0,
\]
For example, the left shift $T_\ell(x_1, x_2, \ldots) = (x_2, x_3, \ldots)$ on $\ell^2(\mathbb{N})$ is a partial isometry
with final space $\ell^2$ and initial space consisting of all vectors of the form (0, x2 , x3 , . . .). The
orthogonal projection P is $T_\ell^* T_\ell x = T_r T_\ell x = (0, x_2, x_3, \ldots)$.
Exercises
12.1 Show that if T is self-adjoint (normal), then T n is self-adjoint (normal) (n ∈ N).
12.3 Let S, T ∈ B(H) be normal and T ∗ S = ST ∗ . Prove that S + T and ST are normal.
12.5 Let U ∈ B(H). Prove that U is an isometry iff $\|Ux - x\|^2 = 2\operatorname{Re}\,(x - Ux \mid x)$.
12.6 Show that if Tn ∈ B(H) is normal for all n and Tn → T ∈ B(H), then T is normal.
12.8 Show that the relation T ≤ S iff S − T ≥ 0 is a partial order on the set of all self-adjoint
operators on H.
12.12 Let S, T ∈ B(H) with S ≥ 0 and T ≥ 0. Show that ST ≥ 0 iff ST = T S. Show that one then
has (ST )1/2 = S 1/2 T 1/2 .
12.13 Show that T ∈ B(H) is normal iff the real and imaginary parts of T commute.
12.24 Let T ∈ B(H) be normal. Show that if T is invertible, then so is |T | and $|T|^{-1} = |T^{-1}|$.
12.25 (a) Let T ∈ B(H) be self-adjoint. Show that T + iI is invertible.
(b) Define the Cayley transform U of T by U = (T − iI)(T + iI)−1 . Show that U is unitary.
(c) Let U be a unitary operator on H such that I − U is invertible. Show that the operator
T := i(I + U )(I − U )−1 is self-adjoint and that U is its Cayley transform.
12.26 Let φ, φ−1 ∈ L∞ (X, F, µ). Find the polar decomposition of the multiplication operator Mφ .
12.27 Let T = U |T | be the polar decomposition of T ∈ B(H). Show that
(a) U ∗ U |T | = |T |, U ∗ T = |T |, and U U ∗ T = T . (b) U |T |U ∗ = |T ∗ | (use uniqueness of |T ∗ |).
(c) T is normal iff |T ∗ | = |T |. (d) T is normal iff U |T | = |T |U and U U ∗ = U ∗ U .
\[
(x \otimes y)\tilde{x} = (\tilde{x} \mid x)\, y, \qquad \tilde{x} \in H. \tag{12.4}
\]
Clearly, every linear combination of rank one operators is of finite rank. Conversely, every
T ∈ B00 (H, K) may be written
\[
T = \sum_{j=1}^{n} x_j \otimes y_j \tag{12.6}
\]
An Approximation Theorem
Here is the main result of the section.
12.2.1 Theorem. B0 (H, K) is the operator norm closure of B00 (H, K).
Proof. We show that an arbitrary operator T ∈ B0 (H, K) is the limit of a sequence of
operators of finite rank. Since cl ran T is separable (Ex. 8.101), it has a countable orthonormal
basis (en ). For each n define a finite rank operator
\[
P_n := \sum_{k=1}^{n} (T^* e_k) \otimes e_k.
\]
Since
\[
Tx = \sum_{k=1}^{\infty} (Tx \mid e_k)\, e_k \quad \text{and} \quad P_n x = \sum_{k=1}^{n} (Tx \mid e_k)\, e_k,
\]
These facts, together with the compactness of T (C1 ), imply that ‖Pn − T‖ → 0. Indeed,
given ε > 0, choose x1 , . . . , xm ∈ C1 such that $T(C_1) \subseteq \bigcup_{j=1}^{m} B_\varepsilon(Tx_j)$. Let x ∈ C1 and
choose j so that ‖T x − T xj‖ < ε. Then
12.2.2 Theorem. Let T ∈ B0 (H). Then there exists a net (Pα ) of projections of finite
rank such that ‖Pα T − T‖ → 0.
Proof. Let E be an orthonormal basis for H, and for each finite set α ⊆ E let Pα denote the
projection of H onto span α. Then (Pα ) is a net, where the indices are directed upward by
inclusion. Set Qα := Pα − I. For each $x = \sum_{e \in E} (x \mid e)\, e$ we have, by Parseval’s identity,
\[
\|Q_\alpha x\|^2 = \sum_{e \in E \setminus \alpha} |(x \mid e)|^2 \to 0. \tag{†}
\]
If it is not the case that ‖Pα T − T‖ → 0, then there exists an ε > 0, a subnet (Qβ ), and a
net (xβ ) of unit vectors with ‖Qβ T xβ‖ ≥ ε for all β. Since T is compact we may assume
that T xβ → y for some y. But then
Exercises
12.28 Let T ∈ B(H). Show that the commutant of B00 (H) is C I, hence B00 (H)00 = B(H).
12.29 Prove that T ∈ B(H) is compact iff $x_n \xrightarrow{w} 0 \Rightarrow \|Tx_n\| \to 0$. Show that this is false in $\ell^1(\mathbb{N})$.
12.30 Show that T ∈ B(H) is compact iff the following condition holds:
\[
x_n \xrightarrow{w} x \ \text{ and } \ y_n \xrightarrow{w} y \ \Rightarrow \ (Tx_n \mid y_n) \to (Tx \mid y).
\]
12.31 Let φ ∈ c0 . Show that the multiplication operator Mφ on $\ell^2(\mathbb{N})$ is compact. Show that the
analogous assertion for φ ∈ c is false.
12.34 Let φ ∈ L∞ (0, 1). Show that if the multiplication operator Mφ on L2 (0, 1) is compact then
φ = 0 a.e. Find an example of a measure space (X, F, µ) for which the assertion is false in
L2 (X, F, µ).
12.37 Show that T is compact (has finite rank) iff |T | is compact (has finite rank). ⟦Use a polar
decomposition.⟧
(b) If α and β are distinct eigenvalues, then the eigenspaces ker(αI − T ) and ker(βI − T )
are mutually orthogonal.
Proof. (a) Since αI − T is normal with adjoint $\bar{\alpha}I - T^*$, we have $\|(\alpha I - T)x\| =
\|(\bar{\alpha}I - T^*)x\|$. Therefore, (αI − T )x = 0 iff $(\bar{\alpha}I - T^*)x = 0$.
(b) Let T x = αx and T y = βy, where x, y ≠ 0. Then
\[
\alpha\,(x \mid y) = (Tx \mid y) = (x \mid T^*y) = (x \mid \bar{\beta}y) = \beta\,(x \mid y).
\]
Since α ≠ β, (x | y) = 0.
Diagonalizable Operators
An operator T ∈ B(H) is said to be diagonalizable if there exists an orthonormal basis
{ei : i ∈ I} of H and a bounded set of complex numbers {αi : i ∈ I} such that
\[
Tx = \sum_{i} \alpha_i\,(x \mid e_i)\, e_i \quad \text{for all } x \in H. \tag{12.7}
\]
or simply
\[
T = \sum_{i \in I} (Te_i \mid e_i)\; e_i \otimes e_i.
\]
From (12.7) we see that T x = αx iff $\sum_i (\alpha_i - \alpha)\,(x \mid e_i)\, e_i = 0$ iff $\sum_i |\alpha_i - \alpha|^2\, |(x \mid e_i)|^2 = 0$.
Thus the eigenvalues of T are the numbers αi . Moreover, since x is an eigenvector
corresponding to α iff (x | ei ) = 0 for all i with αi ≠ α, we see that the eigenspace
corresponding to α is the span of those ei for which αi = α. Thus
\[
x = \sum_{i:\,\alpha_i = \alpha} (x \mid e_i)\, e_i, \qquad x \in \ker(\alpha I - T). \tag{12.8}
\]
The next two propositions give the basic properties of diagonalizable operators.
1 We remove the compactness requirement in §13.6.
the first equation holding in the operator norm and the second and third holding pointwise in
the norm topology of H. Moreover, the sequence (|λn |) may be taken to be decreasing, hence
in the infinite case |λn | ↓ 0.
Proof. The first assertion follows from the preceding lemma. For the proof of (12.10), we
consider only the case where the sequence (λn ) is infinite. Collecting together the terms in
the expansion (12.7) corresponding to the same αi , we have
\[
Tx = \sum_{n} \lambda_n P_n x, \qquad P_n x = \sum_{\alpha_i = \lambda_n} (x \mid e_i)\, e_i = \sum_{\alpha_i = \lambda_n} (e_i \otimes e_i)x \quad \text{and} \quad x = \sum_{n} P_n x.
\]
Since (λn ) vanishes at infinity, given 0 < ε ≤ |λ1 | we may choose the smallest n = n(ε) for
which |λk | < ε for all k > n. For k ≤ n we then have |αi | ≥ ε for all αi coinciding with λk .
Thus $\sum_{k=1}^{n} \lambda_k P_k$ is the operator Tε in the lemma. Since n(ε) increases as ε decreases, the
lemma implies that $T = \sum_n \lambda_n P_n$ holds in the operator norm. By considering the finite sets
{|λn | ≥ 1} ⊆ {|λn | ≥ 1/2} ⊆ · · · , we may arrange the sequence (λn ) so that |λn+1 | ≤ |λn |
for all n.
The multiplicity of an eigenvalue λn is the dimension of ran Pn , where Pn is the
projection of the theorem.
The following result is proved in Chapter 13. It will be used here to prove the existence
of eigenvalues for a compact normal operator, the essential ingredient in the proof of the
spectral theorem.
12.3.6 Lemma. Let T ∈ B(H). Then σ(T ) is nonempty and bounded. Moreover
\[
\sup\{|\lambda| : \lambda \in \sigma(T)\} = \lim_n \|T^n\|^{1/n}. \tag{12.11}
\]
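Formula (12.11) can be watched at work on a small matrix. The sketch below is only an illustration under stated assumptions: it takes H = R² and the upper triangular matrix T = [[0.5, 1], [0, 0.5]], whose only eigenvalue is 0.5 although ‖T‖ > 1, and it computes the operator norm of a real 2×2 matrix as the square root of the largest eigenvalue of AᵀA. The helper names are hypothetical.

```python
import math


def matmul2(A, B):
    """Product of two real 2x2 matrices given as lists of rows."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]


def op_norm2(A):
    """Operator norm of a real 2x2 matrix: sqrt of the top eigenvalue of A^T A."""
    (a, b), (c, d) = A
    tr = a * a + b * b + c * c + d * d        # trace of A^T A
    det = (a * d - b * c) ** 2                # determinant of A^T A
    disc = max(tr * tr - 4.0 * det, 0.0)      # guard against rounding below zero
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)


T = [[0.5, 1.0], [0.0, 0.5]]   # spectrum {0.5}, so the spectral radius is 0.5

power = [[1.0, 0.0], [0.0, 1.0]]
estimates = {}
for n in range(1, 201):
    power = matmul2(power, T)                  # power is now T^n
    estimates[n] = op_norm2(power) ** (1.0 / n)

print(op_norm2(T), estimates[20], estimates[200])
```

The quantities ‖Tⁿ‖^{1/n} start above 1 and drift down toward 0.5; the convergence is slow here because ‖Tⁿ‖ carries a factor that grows linearly in n.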
such that ‖(I − S)xn‖ ≤ 1/n. By compactness of S we may take a convergent subsequence
$Sx_{n_k} \to y$. We then have
\[
x_{n_k} = (I - S)x_{n_k} + Sx_{n_k} \to y
\]
and so Sy = y, that is, T y = λy, and ‖y‖ = 1.
Here is the main result of the section:
12.3.9 Theorem. Let T ∈ B(H) be compact and normal. Then T is diagonalizable.
Proof. Let O denote the family of all orthonormal sets whose members are eigenvectors of T .
By 12.3.8, O ≠ ∅. A standard Zorn’s lemma argument shows that O has a maximal member,
that is, an orthonormal set E of eigenvectors that is not properly contained in a larger such
set. Let K denote the closed linear span of E and observe that T (K) ⊆ K. Also, by 12.3.2,
T ∗ (K) ⊆ K, hence T (K ⊥ ) ⊆ K ⊥ . Since T is diagonalizable on K it therefore suffices to
show that K ⊥ = {0}.
Suppose that K⊥ ≠ {0}. We consider two cases: If T K⊥ = 0, then every unit vector
in K⊥ is an eigenvector with eigenvalue zero. If T K⊥ ≠ 0, then, by 12.3.8, T has an
eigenvector in K⊥ . Each outcome contradicts the maximality of E, hence K⊥ = {0}.
The following application of the spectral theorem will be needed in the discussion of
Hilbert-Schmidt integral operators in the next section.
12.3.10 Corollary. If T ∈ B0 (H, K) is not the zero operator, then there exist orthonormal
(possibly finite) sequences (xn ) ⊆ H, (yn ) ⊆ K, and (αn ) ⊆ (0, ∞) such that in the operator
norm
\[
T = \sum_{n} \alpha_n\,(x_n \otimes y_n). \tag{12.12}
\]
If the sequences are infinite, then αn ↓ 0.
Proof. By the spectral theorem applied to T ∗ T ∈ B0 (H), there exists an orthonormal
sequence (xn ) of eigenvectors of T ∗ T and a decreasing sequence of corresponding eigenvalues
βn > 0 such that
\[
T^*Tx = \sum_{n=1}^{\infty} \beta_n\,(x \mid x_n)\, x_n, \qquad x \in H. \tag{†}
\]
We assume that (βn ) is an infinite sequence (hence βn ↓ 0); otherwise the sum in (†) is finite
and the notation in the remainder of the proof may be adjusted accordingly.
Set $\alpha_n = \sqrt{\beta_n}$ and $y_n = \alpha_n^{-1} T x_n$. The calculation
\[
\alpha_m \alpha_n\,(y_m \mid y_n) = (Tx_m \mid Tx_n) = (T^*Tx_m \mid x_n) = \alpha_m^2\,(x_m \mid x_n)
\]
implies that (yn ) is orthonormal, hence it remains to show that (12.12) holds.
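Now, by Bessel’s inequality,
\[
\Big\| \sum_{k=n}^{m} \alpha_k\,(x \mid x_k)\, y_k \Big\|^2 = \sum_{k=n}^{m} \alpha_k^2\, |(x \mid x_k)|^2 \le \beta_n \|x\|^2 \le \beta_n, \qquad \|x\| \le 1,
\]
hence the partial sums of the series $\sum_k \alpha_k\, x_k \otimes y_k$ form a Cauchy sequence in B00 (H, K). Let
\[
S := \lim_n \sum_{k=1}^{n} \alpha_k\, x_k \otimes y_k \quad \text{(operator norm convergence)},
\]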
Exercises
12.38 Let (X, F, µ) be a σ-finite measure space and let φ ∈ L∞ (X, F, µ). Show that λ is an eigenvalue
of the multiplication operator Mφ on L2 iff φ = λ on a set of positive measure.
12.39 Find the eigenvalues of the left shift operator $T_\ell$ on $\ell^2$. Show that the right shift operator $T_r$
has no eigenvalues.
12.40 Let $f, g \in L^2[0,1]$ and extend f periodically to R so that the convolution operator Tg f := f ∗ g
is defined on [0, 1]:
\[
(T_g f)(x) = \int_0^1 f(x-y)\, g(y)\, dy.
\]
12.41 Show that the operator T on L2 [0, 1] defined by (T f )(t) = tf (t) is self-adjoint with no
eigenvalues.
12.42 Show that the operator T on C[0, 1] defined by $Tf(x) = \int_0^x f(t)\, dt$ does not have an eigenvalue.
12.43 Let T ∈ B(H) be self-adjoint, λ ∈ C, and let P be the projection of H onto ker (λ − T ).
Show that S ∈ B(H) and ST = T S ⇒ SP = P S. Conclude in (12.10) that for T self-adjoint,
ST = T S ⇒ SPn = Pn S for all n. (By 13.6.2, these assertions hold for normal T .)
Thus the definition of ‖T‖₂ is independent of the choice of the orthonormal basis and
‖T‖₂ = ‖T∗‖₂. If ‖T‖₂ < ∞, then T is called a Hilbert-Schmidt operator. The set of all
Hilbert-Schmidt operators is denoted by B2 (H, K). It is easy to check that B2 (H, K) is a
linear space and ‖T‖₂ is a norm. For example, the triangle inequality ‖T + S‖₂ ≤ ‖T‖₂ + ‖S‖₂
follows easily from the CBS inequality in H and the triangle inequality in $\ell^2(\mathbb{N})$.
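Basis independence of the Hilbert-Schmidt norm, and its relation to the operator norm, can be checked on a 2×2 example. The sketch below assumes the definition $\|T\|_2^2 = \sum_{e \in E} \|Te\|^2$ over an orthonormal basis E, takes H = K = R² with an illustrative matrix, and compares the standard basis with a rotated one. All names and numerical choices are hypothetical.

```python
import math


def apply(A, v):
    """Apply a real 2x2 matrix (list of rows) to a vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def hs_norm_squared(A, basis):
    """Sum of ||A e||^2 over the vectors e of an orthonormal basis."""
    return sum(sum(c * c for c in apply(A, e)) for e in basis)


def op_norm2(A):
    """Operator norm of a real 2x2 matrix: sqrt of the top eigenvalue of A^T A."""
    (a, b), (c, d) = A
    tr = a * a + b * b + c * c + d * d
    det = (a * d - b * c) ** 2
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)


T = [[1.0, 2.0], [3.0, 4.0]]

standard = [(1.0, 0.0), (0.0, 1.0)]
theta = 0.7   # arbitrary rotation angle
rotated = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]

hs_standard = hs_norm_squared(T, standard)
hs_rotated = hs_norm_squared(T, rotated)

print(hs_standard, hs_rotated)
print(op_norm2(T), math.sqrt(hs_standard))
```

Both bases give the same value (the sum of the squared matrix entries), and the operator norm does not exceed the Hilbert-Schmidt norm.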
The following proposition makes important connections between the operator norm and
the Hilbert-Schmidt norm.
12.4.1 Proposition. If S ∈ B(L, H), T ∈ B2 (H, K), and R ∈ B(K, L), then
Proof. By Parseval’s identity and the CBS inequality, for x ∈ H with ‖x‖ ≤ 1,
\[
\|Tx\|^2 = \sum_{f} |(Tx \mid f)|^2 = \sum_{f} |(x \mid T^*f)|^2 \le \sum_{f} \|T^*f\|^2 = \|T^*\|_2^2 = \|T\|_2^2,
\]
12.4.2 Theorem. The inclusions B00 (H, K) ⊆ B2 (H, K) ⊆ B0 (H, K) hold. Moreover,
under the Hilbert-Schmidt norm, B2 (H, K) is a Banach space and B00 (H, K) is dense in
B2 (H, K).
Proof. To show that B2 (H, K) is complete, let (Tn ) be a Cauchy sequence in B2 (H, K) with
respect to ‖·‖₂. Then (Tn ) is Cauchy with respect to the operator norm, hence there exists
T ∈ B(H, K) such that ‖Tn − T‖ → 0. Given ε > 0, choose N so that ‖Tm − Tn‖₂ < ε
for all m, n ≥ N . For such n and any finite F ⊆ E,
\[
\sum_{e \in F} \|(T - T_n)e\|^2 = \lim_m \sum_{e \in F} \|(T_m - T_n)e\|^2 \le \lim_m \|T_m - T_n\|_2^2 \le \varepsilon^2.
\]
Since F was arbitrary, $\|T - T_n\|_2^2 \le \varepsilon^2$. Therefore, T = T − Tn + Tn ∈ B2 (H, K) and Tn → T
in the Hilbert-Schmidt norm, proving that B2 (H, K) is a Banach space.
Now let T ∈ B00 (H, K) and choose an orthonormal basis {f1 , . . . , fn } in ran T . Then
by Parseval’s identity,
\[
\sum_{e \in E} \|Te\|^2 = \sum_{e \in E} \sum_{j=1}^{n} |(Te \mid f_j)|^2 = \sum_{j=1}^{n} \sum_{e \in E} |(e \mid T^*f_j)|^2 = \sum_{j=1}^{n} \|T^*f_j\|^2 < \infty.
\]
(x ⊗ y | u ⊗ v) = (x | u) (y | v) , x, u ∈ H, y, v ∈ K. (12.14)
Proof. From the polarization identity $4\,(Se \mid Te) = \sum_{k=0}^{3} i^k \|Se + i^k Te\|^2$, we have
\[
4\,(S \mid T) = \sum_{k=0}^{3} i^k \sum_{e \in E} \big((S + i^k T)e \mid (S + i^k T)e\big) = \sum_{k=0}^{3} i^k\, \|S + i^k T\|_2^2,
\]
which shows that the series in (12.13) converges absolutely and that the definition of (S | T )
is independent of the basis. The proof that (S | T ) is an inner product is straightforward.
For the verification of (12.14), note that the left side is
\[
\sum_{e \in E} \big((x \otimes y)e \mid (u \otimes v)e\big) = \sum_{e \in E} \big((e \mid x)\,y \mid (e \mid u)\,v\big) = (y \mid v)\,(u \mid x).
\]
hence (T | x ⊗ y) = 0 for all x and y. Since B00 (H, K) is dense in B2 (H, K), (T | T ) = 0,
hence T = 0. Therefore, G is a basis.
12.4.5 Example. Let (X, F, µ) and (Y, G, ν) be measure spaces such that L2 (µ) and
$L^2(\nu)$ are separable with orthonormal bases (φn ) and (ψn ), respectively. We show that
$B_2\big(L^2(\nu), L^2(\mu)\big)$ and $L^2(\mu \otimes \nu)$ are isomorphic as Hilbert spaces under a mapping U such
that U (f ⊗ g) = f g, where (f g)(x, y) = f (x)g(y).
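The calculation
\begin{align*}
(\phi_m \psi_n \mid \phi_j \psi_k) &= \int_{X \times Y} (\phi_m \psi_n)\,\overline{(\phi_j \psi_k)} = \int_{X \times Y} \phi_m(x)\psi_n(y)\,\overline{\phi_j(x)\psi_k(y)}\; d\mu(x)\, d\nu(y) \\
&= \int_X \phi_m \overline{\phi_j} \cdot \int_Y \psi_n \overline{\psi_k} = (\phi_m \mid \phi_j)\,(\psi_n \mid \psi_k) \\
&= (\phi_m \otimes \psi_n \mid \phi_j \otimes \psi_k)
\end{align*}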
shows that the image (φm ψn )m,n under U of the orthonormal basis (φm ⊗ ψn )m,n is an
orthonormal set. It remains then to show that the set is complete.
Let $f \in L^2(\mu \otimes \nu)$ such that for all m, n,
\begin{align*}
0 = (\phi_m \psi_n \mid f) &= \iint \phi_m(x)\psi_n(y)\, \overline{f(x,y)}\; d\mu(x)\, d\nu(y) \\
&= \int \psi_n(y) \int \phi_m(x)\, \overline{f(x,y)}\; d\mu(x)\, d\nu(y). \tag{†}
\end{align*}
By Fubini’s theorem,
\[
\iint |f(x,y)|^2\, d\mu(x)\, d\nu(y) = \|f\|_2^2 < \infty,
\]
hence $f(\cdot, y) \in L^2(\mu)$ for a.a. y. For such y, by the CBS inequality
\[
\Big| \int \phi_m(x) f(x,y)\, d\mu(x) \Big| \le \int |\phi_m(x)|\, |f(x,y)|\, d\mu(x) \le \|\phi_m\|_2\, \|f(\cdot, y)\|_2 < \infty.
\]
Thus the inner integral in (†) is an $L^2$ function of y and so must be zero, by the completeness of
(ψn )n . Using the completeness of (φm )m , we conclude that f = 0 a.e. Therefore, (φm ψn )m,n
is complete. ♦
Clearly A ⊗ B is linear in T and since ‖BT A∗‖₂ ≤ ‖B‖ ‖T‖₂ ‖A∗‖ (12.4.1), we see that
A ⊗ B is bounded with ‖A ⊗ B‖₂ ≤ ‖A‖ ‖B‖. By (12.5),
For uniqueness, simply note that a pair of bounded linear operators on B2 (H, K) that
agree on the set {x ⊗ y : x ∈ H, y ∈ K} must in fact be equal, since the span of this set
is dense in B2 (H, K) (12.4.2).
12.4.7 Proposition. The following properties hold:
(a) (A, B) → A ⊗ B is sesquilinear.
(c) (A ⊗ B)∗ = A∗ ⊗ B ∗ .
(e) A ⊗ B is invertible iff both A and B are invertible, and then (A ⊗ B)−1 = A−1 ⊗ B −1 .
Proof. Parts (a)–(c) follow from uniqueness and the properties of rank one operators. For
example,
[(A1 + A2 ) ⊗ B](x ⊗ y) = (A1 + A2 )x ⊗ (By) = (A1 x) ⊗ (By) + (A2 x) ⊗ (By)
= (A1 ⊗ B)(x ⊗ y) + (A2 ⊗ B)(x ⊗ y),
(A ⊗ B)(C ⊗ D)(x ⊗ y) = (A ⊗ B)(Cx ⊗ Dy) = (ACx) ⊗ (BDy)
= [(AC) ⊗ (BD)](x ⊗ y) and
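\begin{align*}
\big((A \otimes B)^*(x \otimes y) \mid u \otimes v\big) &= \big(x \otimes y \mid (Au) \otimes (Bv)\big) = (x \mid Au)\,(y \mid Bv) \\
&= (A^*x \mid u)\,(B^*y \mid v) \\
&= \big((A^*x) \otimes (B^*y) \mid u \otimes v\big).
\end{align*}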
For (d), we have already shown that ‖A ⊗ B‖₂ ≤ ‖A‖ ‖B‖. For the reverse inequality, let
‖x‖ = ‖y‖ = 1. Then ‖x ⊗ y‖ = ‖x‖ ‖y‖ = 1, hence
Taking the supremum over all such x and y yields ‖A ⊗ B‖₂ ≥ ‖A‖ ‖B‖.
For (e), if A and B are invertible, then (A ⊗ B)(A−1 ⊗ B −1 ) = AA−1 ⊗ BB −1 = I ⊗ I,
which is the identity operator in B2 (H, K). Conversely, suppose that A ⊗ B is invertible.
Then
I ⊗ I = (A ⊗ B)−1 (A ⊗ B) = (A ⊗ B)−1 (I ⊗ B)(A ⊗ I)
and
I ⊗ I = (A ⊗ B)(A ⊗ B)−1 = (A ⊗ I)(I ⊗ B)(A ⊗ B)−1 ,
hence A ⊗ I is invertible. Thus there exists c > 0 such that
Taking y ≠ 0 we see that ‖Ax‖ ≥ c ‖x‖ for all x, which implies that A is injective. Since
(A ⊗ B)∗ is invertible and (A ⊗ B)∗ = A∗ ⊗ B ∗ , the same argument applied to A∗ ⊗ B ∗ shows
that A∗ is injective. Therefore, A is surjective and so is invertible. Similarly, B is invertible.
Finally, if A and B are unitary, then A∗A = I and B∗B = I, hence (A∗ ⊗ B∗)(A ⊗ B) =
A∗A ⊗ B∗B = I ⊗ I and so A ⊗ B is unitary, proving (f).
Note that the converse of (f) is false. (Take A = (1/2)I and B = 2I.)
By Ex. 12.44, K is bounded with ‖K‖ ≤ ‖k‖₂. We show in this subsection that K is a
Hilbert-Schmidt operator.
First, we show that K is compact. Let (φn )n be an orthonormal basis for L2 (µ) and
define φn φm on X × X by
This is a slight variation of the definition given in (12.4.5), but still gives an orthonormal
basis for L2 (µ ⊗ µ). Thus we have the Fourier expansion
\[
k = \sum_{m,n=1}^{\infty} (k \mid \phi_n \phi_m)\; \phi_n \phi_m.
\]
Moreover,
\[
(K\phi_m \mid \phi_n) = \iint k(x,y)\, \phi_m(y)\, \overline{\phi_n(x)}\; d\mu(y)\, d\mu(x) = (k \mid \phi_n \phi_m),
\]
Now let Pn denote the orthogonal projection of $L^2(\mu)$ onto the span of {φ1 , . . . , φn } and
set Kn = KPn + Pn K − Pn KPn . Then Kn has finite rank, hence to show K is compact it
suffices to show that Kn → K in operator norm. For $f \in L^2(\mu)$ and ck := (f | φk ) we have
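\begin{align*}
f &= \sum_{k=1}^{\infty} c_k \phi_k, & P_n f &= \sum_{k=1}^{n} c_k \phi_k, \\
Kf &= \sum_{k=1}^{\infty} c_k K\phi_k, & KP_n f &= \sum_{k=1}^{n} c_k K\phi_k, \\
P_n K f &= \sum_{k=1}^{n} (Kf \mid \phi_k)\, \phi_k = \sum_{k=1}^{n} \sum_{i=1}^{\infty} c_i\,(K\phi_i \mid \phi_k)\, \phi_k, \quad \text{and} \\
P_n K P_n f &= \sum_{k=1}^{n} (KP_n f \mid \phi_k)\, \phi_k = \sum_{k=1}^{n} \sum_{i=1}^{n} c_i\,(K\phi_i \mid \phi_k)\, \phi_k.
\end{align*}
Thus
\[
Kf - K_n f = (K - KP_n)f + (P_n K P_n - P_n K)f = \sum_{i>n} c_i K\phi_i - \sum_{k=1}^{n} \sum_{i>n} c_i\,(K\phi_i \mid \phi_k)\, \phi_k,
\]
so for each j
\[
(Kf - K_n f \mid \phi_j) = \sum_{i>n} c_i\,(K\phi_i \mid \phi_j) - \sum_{k=1}^{n} \sum_{i>n} c_i\,(K\phi_i \mid \phi_k)\,(\phi_k \mid \phi_j).
\]
The right side is zero if j ≤ n and equals $\sum_{i>n} c_i\,(K\phi_i \mid \phi_j)$ otherwise. By the CBS inequality
in $\ell^2(\mathbb{N})$ and by Bessel’s inequality, for j > n we have
\[
|(Kf - K_n f \mid \phi_j)|^2 \le \sum_{i>n} |c_i|^2 \sum_{i>n} |(K\phi_i \mid \phi_j)|^2 \le \|f\|_2^2 \sum_{i>n} |(K\phi_i \mid \phi_j)|^2,
\]
hence
\[
\|Kf - K_n f\|_2^2 = \sum_{j>n} |(Kf - K_n f \mid \phi_j)|^2 \le \|f\|_2^2 \sum_{j>n} \sum_{i>n} |(K\phi_i \mid \phi_j)|^2.
\]
By (†), the term on the right tends to 0 as n → ∞, hence ‖Kn − K‖ → 0. This shows that K
is compact.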
To show that K is a Hilbert-Schmidt operator we use 12.3.10, which guarantees the
existence of orthonormal sequences (ϑn ) and (ψn ) in L2 (X) and λn ↓ 0 such that
\[
Kf = \sum_{n} \lambda_n\,(f \mid \vartheta_n)\, \psi_n, \qquad f \in L^2(X).
\]
Now, for fixed x, $\lambda_n \psi_n(x) = K\vartheta_n(x) = \int k(x,y)\,\vartheta_n(y)\, d\mu(y)$, the integral being a Fourier
coefficient of the function k(x, ·) with respect to the basis (ϑn ). By Bessel’s inequality,
\[
\sum_{n} |K\vartheta_n(x)|^2 = \sum_{n} |\lambda_n \psi_n(x)|^2 \le \|k(x, \cdot)\|_2^2 = \int |k(x,y)|^2\, d\mu(y).
\]
Exercises
12.44 Let K be as in (12.16).
(a) Show that K is bounded with kKk ≤ kkk2 .
(b) Compute the adjoint of K. When is K self-adjoint?
(c) Let L be the Hilbert-Schmidt operator with kernel `. Find the kernel of LK. Give a condition
on the kernels of K and L that implies LK = KL.
(d) Use (b) and (c) to give a sufficient condition on k for K to be normal.
(e) Show that the Volterra operator $(Kf)(t) = \int_0^t f(s)\, ds$ (t ∈ [0, 1]) is a Hilbert-Schmidt
integral operator on $L^2[0,1]$.
If ‖T‖₁ < ∞, then T is said to be of trace class. The set of all trace class operators is
denoted by B1 (H). The calculation
\[
\|T\|_1 = \sum_{e \in E} \big(|T|^{1/2}e \mid |T|^{1/2}e\big) = \sum_{e \in E} \big\|\,|T|^{1/2}e\,\big\|^2 = \big\|\,|T|^{1/2}\,\big\|_2^2
\]
shows that ‖T‖₁ is independent of the choice of orthonormal basis and that T ∈ B1 (H) iff
$|T|^{1/2} \in B_2(H)$.
We show below that B1 (H) is a linear space and that ‖·‖₁ is indeed a norm on B1 (H).
First, we establish some preliminary results.
12.5.1 Proposition. T ∈ B1 (H) iff any one (hence both) of the following conditions holds:
(a) T = AB for some A, B ∈ B2 (H).
hence T ∈ B1 (H).
Finally, if (a) holds, then using the polar decomposition of T again we have |T | = U∗T =
(U∗A)B (12.27), which gives (b).
12.5.2 Corollary. B00 (H) ⊆ B1 (H) ⊆ B2 (H) ⊆ B0 (H).
Proof. The second inclusion follows from the proposition and the fact that B2 (H) is an
algebra. For the first inclusion, let T ∈ B00 (H) and let T = U |T | be the polar decomposition
of T . From U ∗ T = |T | we see that ran |T | is finite dimensional. Thus we may choose an
orthonormal basis E for H so that some finite subset F is an orthonormal basis for ran |T |.
Since e ⊥ ran |T | for e ∈ E \ F , the sum in (12.17) is finite and so T ∈ B1 (H).
12.5.3 Theorem. B1 (H) is a self-adjoint ideal of B(H) and the trace norm is a norm.
Proof. Absolute homogeneity of ‖·‖₁ follows from Ex. 12.10. For the triangle inequality,
let S, T ∈ B1 (H) and let S = U |S|, T = V |T |, and S + T = W |S + T | be the polar
decompositions. Then
Similarly,
\[
\big\|\,|T|^{1/2} V^* W\,\big\|_2 \le \big\|\,|T|^{1/2}\,\big\|_2\, \|V^* W\| \le \big\|\,|T|^{1/2}\,\big\|_2.
\]
Since F was arbitrary, we obtain from (†) the triangle inequality
\[
\|S + T\|_1 \le \big\|\,|S|^{1/2}\,\big\|_2^2 + \big\|\,|T|^{1/2}\,\big\|_2^2 = \|S\|_1 + \|T\|_1.
\]
where the αf are the eigenvalues of |T | with corresponding eigenvectors f. Since
$\sum_{f \in F} (|T|f \mid f) = \|T\|_1 = 0$ and the terms αf = (|T |f | f) are nonnegative, αf = 0
for all f. Therefore, |T | = 0 and so T = 0.
To show that B1 (H) is an ideal in B(H), let T ∈ B1 (H) and S ∈ B(H). By 12.5.1,
T = AB for some A, B ∈ B2 (H), hence T S = A(BS). Thus T is a product of members
of B2 (H), hence B1 (H)B(H) ⊆ B1 (H). Similarly B(H)B1 (H) ⊆ B1 (H). Therefore,
B1 (H) is an ideal of B(H). Since T ∗ = B ∗ A∗ and A∗ , B ∗ ∈ B2 (H), T ∗ ∈ B1 (H).
Therefore, B1 (H) is self-adjoint.
The Trace
The trace tr T of T ∈ B1 (H) is defined in terms of the orthonormal basis E by
\[
\operatorname{tr} T := \sum_{e \in E} (Te \mid e). \tag{12.18}
\]
The following proposition shows that tr T is well-defined and independent of the basis.
12.5.4 Proposition. For T ∈ B1 (H), the sum $\sum_{e \in E} (Te \mid e)$ converges absolutely. Moreover,
\[
\operatorname{tr}(B^*A) = (A \mid B), \qquad A, B \in B_2(H), \tag{12.19}
\]
where the right side is the Hilbert-Schmidt inner product of A and B.
Proof. By 12.5.1, T = B∗A, where A, B ∈ B2 (H). Then
\[
|(Te \mid e)| = |(Ae \mid Be)| \le \|Ae\|\,\|Be\| \le \tfrac{1}{2}\|Ae\|^2 + \tfrac{1}{2}\|Be\|^2.
\]
This proves the first assertion of the proposition. The second assertion follows directly from
the definition of the trace and the Hilbert-Schmidt inner product.
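In finite dimensions the basis independence of the trace can be checked numerically. The following Python sketch is only an illustration: the matrix `T`, the angle `theta`, the phase `0.3`, and the helper `trace_in_basis` are assumptions chosen for the example and do not come from the text.

```python
import cmath
import math

def trace_in_basis(T, basis):
    """Return the sum of (T e | e) over the given orthonormal basis,
    where (x | y) = sum_i x_i * conj(y_i)."""
    n = len(basis)
    total = 0
    for e in basis:
        Te = [sum(T[i][j] * e[j] for j in range(n)) for i in range(n)]
        total += sum(Te[i] * e[i].conjugate() for i in range(n))
    return total

# An arbitrary 2x2 complex matrix (hypothetical example data).
T = [[1 + 2j, 3], [-1j, 4]]

# Standard basis and a second orthonormal basis of C^2.
standard = [[1, 0], [0, 1]]
theta = 0.7
phase = cmath.exp(0.3j)
rotated = [
    [math.cos(theta), math.sin(theta)],
    [-math.sin(theta) * phase, math.cos(theta) * phase],
]

trace_standard = trace_in_basis(T, standard)  # sum of the diagonal entries
trace_rotated = trace_in_basis(T, rotated)    # same value up to rounding
```

Both sums agree with the sum of the diagonal entries of `T`, as Ex. 12.48 asserts for the finite dimensional case.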
Here are additional noteworthy properties of the trace and the trace norm.
12.5.5 Theorem. Let T ∈ B1 (H), S ∈ B(H). Then
(a) tr(·) is a linear functional on B1 (H) and is positive, that is, T ≥ 0 ⇒ tr T ≥ 0.
(b) $\operatorname{tr} T^* = \overline{\operatorname{tr} T}$ ($=: \overline{\operatorname{tr}}\, T$).
(e) kT ∗ k1 = kT k1 .
kT ∗ k1 = tr |T ∗ | = tr(U |T |U ∗ ) = tr(U ∗ U |T |) = tr |T | = kT k1 ,
proving (e).
For (f), let $ST = V|ST|$ be the polar decomposition of $ST$. Then $|ST| = V^*ST = V^*SU|T|$, hence, by (d), $\|ST\|_1 = \operatorname{tr}(V^*SU|T|) \le \|V^*SU\|\,\||T|\|_1 \le \|S\|\,\||T|\|_1$. Using this
\[ \cdots \le \lim_m \|T_m - T_n\|_1 \le \varepsilon, \]
In (1), |ΨA (T )| ≤ kAk1 kT k and in (2) |ΨA (T )| ≤ kT k1 kAk (12.5.5). Thus we obtain linear
mappings
The next theorem uses the map $\Psi$ to identify $B_1(H)$ with $B_0(H)'$ and $B(H)$ with $B_1(H)'$.
12.5.7 Theorem. The mappings $\Psi$ in (1′) and (2′) are isometric isomorphisms.
Proof. We follow the treatment in [36]. For (1′) we need to prove that $\|A\|_1 \le \|\Psi_A\|$ and that every member $\psi$ of $B_0(H)'$ is of the form $\Psi_A$ for some $A \in B_1(H)$.
Since $\|S\| \le \|S\|_2$ for $S \in B_2(H) \subseteq B_0(H)$, $\psi$ restricted to $B_2(H)$ is a member of $B_2(H)'$. By the Riesz representation theorem, there exists a $T$ in the Hilbert space $B_2(H)$ such that $\psi(\cdot) = (\,\cdot \mid T)$. Set $A := T^*$, so that $\psi(S) = \operatorname{tr}(AS)$ for all $S \in B_2(H)$. It remains to show that $A \in B_1(H)$. For this let $A = U|A|$ be the polar decomposition of $A$. If $F \subseteq E$ is finite and $P$ is the projection of $H$ onto the span of $F$, then
\[ \sum_{e \in F} (|A|e \mid e) = \sum_{e \in F} (U^*Ae \mid e) = \sum_{e \in E} (PU^*Ae \mid e) = \operatorname{tr}(PU^*A) = \operatorname{tr}(APU^*) = \psi(PU^*). \]
Since $|\psi(PU^*)| \le \|\psi\|\,\|PU^*\| \le \|\psi\|$, we have $\sum_{e \in F} (|A|e \mid e) \le \|\psi\|$ for all finite $F$. Therefore $\|A\|_1 < \infty$, completing the proof of the first part of the theorem.
For (2′), we need to prove that $\|A\| \le \|\Psi_A\|$ and that every member $\psi$ of $B_1(H)'$ is of the form $\Psi_A$ for some $A \in B(H)$. Now, for any $x, y \in H$, by direct calculation we have
\[ (y \otimes x)^* = x \otimes y \quad\text{and}\quad (x \otimes y)(y \otimes x) = \|x\|^2 (y \otimes y), \tag{$\dagger$} \]
Therefore, B is bounded with kBk ≤ kψk. By 11.4.1, there exists an operator S ∈ B(H)
with kSk = kBk ≤ kψk such that
the last equality from 12.5.5(c). Since every operator is a linear combination of self-adjoint
operators, ψ = ΨA .
Exercises
12.48 If H is finite dimensional, show that tr(T ) is the sum of the diagonal elements of the matrix of
T relative to any basis.
13.1 Introduction
In this chapter we develop the essential properties of commutative Banach algebras. The
main goal is the Gelfand representation theorem, which asserts that such an algebra may be
represented as the algebra of continuous functions on some topological space. Applications
to operator theory, including the spectral theorem for normal operators, are given in §13.6.
If A and B are Banach ∗-algebras and ϕ(x∗ ) = ϕ(x)∗ for all x ∈ A, then ϕ is called a
∗-homomorphism.
Recall that an ideal $I$ of a Banach algebra $A$ is a linear subspace such that $xy, yx \in I$ for all $x \in A$ and $y \in I$. If $I \ne A$, then $I$ is called a proper ideal. If $I$ is closed, then
A/I is a Banach algebra under multiplication (x + I)(y + I) = xy + I, and the quotient
map is an algebra homomorphism (Ex. 13.7). Quotient algebras will be of considerable
importance later in connection with maximal ideals and characters of a Banach algebra.
We have seen several examples of Banach algebras and C ∗ -algebras throughout the text.
For convenience, we include some of these in the following list.
13.1.1 Examples.
(a) If X is a (nontrivial) Banach space, then B(X) is a unital, noncommutative Banach
algebra under the operator norm and with respect to operator composition.
(b) If H is a Hilbert space, then B(H) is a C ∗ -algebra, where involution is the adjoint
operation. The spaces B00 (H), B0 (H), B1 (H), and B2 (H) are ideals of B(H).
(c) If $X$ is a set, then $B(X)$ is a unital, commutative $C^*$-algebra with involution $f \mapsto \bar f$.
(d) If X is a topological space, then Cb (X) is a unital, commutative C ∗ -subalgebra of B(X).
(e) If X is a noncompact, locally compact, Hausdorff topological space, then C0 (X) is a
non-unital C ∗ -subalgebra of Cb (X).
(f) $\ell^1$ group algebra. The space $\ell^1(\mathbb Z)$ of all bilateral sequences $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$ is a commutative Banach ∗-algebra under the norm $\|x\|_1 := \sum_{k=-\infty}^{\infty} |x_k| < \infty$ with convolution product $x * y$ and involution $x^*$ defined by
\[ (x * y)(n) = \sum_{k=-\infty}^{\infty} x_{n-k}\, y_k, \quad\text{and}\quad x^*(n) = \overline{x(-n)}. \]
Moreover, $\ell^1(\mathbb Z)$ has identity $e_0 := (\ldots, 0, 1, 0, \ldots)$, with the 1 in position 0. In general $\|x^* * x\|_1 \ne \|x\|_1^2$, hence $\ell^1(\mathbb Z)$ is not a $C^*$-algebra (Ex. 13.2).
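The failure of the $C^*$-identity in $\ell^1(\mathbb Z)$ can be seen on a finitely supported sequence. The Python sketch below is an illustration under stated assumptions: sequences are stored as dictionaries from index to value, and the helper names `convolve`, `involution`, `norm1` and the sample sequence `x` are hypothetical choices, not part of the text.

```python
def convolve(x, y):
    """(x * y)(n) = sum_k x(n - k) y(k) for finitely supported sequences
    stored as {index: value}."""
    out = {}
    for i, xi in x.items():
        for k, yk in y.items():
            out[i + k] = out.get(i + k, 0) + xi * yk
    return out

def involution(x):
    """x*(n) = complex conjugate of x(-n)."""
    return {-n: complex(v).conjugate() for n, v in x.items()}

def norm1(x):
    """The l^1 norm: sum of the absolute values of the entries."""
    return sum(abs(v) for v in x.values())

e0 = {0: 1}                       # identity of the convolution algebra
x = {0: 1, 1: 1, 2: -1}           # sample sequence (hypothetical data)

lhs = norm1(convolve(involution(x), x))  # ||x* * x||_1
rhs = norm1(x) ** 2                      # ||x||_1 squared
identity_check = convolve(e0, x) == x
```

For this `x` the two quantities differ (5 versus 9), so the identity $\|x^* * x\| = \|x\|^2$ fails in the $\ell^1$ norm.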
(g) $L^1$ group algebra. The space $L^1(\mathbb R^d)$ is a commutative, non-unital Banach ∗-algebra under convolution $f * g$ and involution $f^*$ defined by
\[ f * g(x) = \int f(x - y)\, g(y)\, dy, \quad\text{and}\quad f^*(x) = \overline{f(-x)}. \]
(h) Measure algebra. The space M (Rd ) of complex Borel measures on Rd with the total
variation norm is a commutative Banach algebra under convolution.
13.1.3 Corollary. If $x \in A$ and $z \in \mathbb C$ with $|z| > \|x\|$, then $ze - x$ is invertible and
\[ (ze - x)^{-1} = \sum_{n=0}^{\infty} z^{-n-1} x^n . \]
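The series in 13.1.3 can be tested in the Banach algebra of $2 \times 2$ matrices. The following Python sketch is a numerical illustration only: the matrix `x`, the scalar `z`, the number of terms, and the helper names are assumptions made for the example. The maximum row sum is used as the norm, so $\|x\| = 1.5 < |z|$.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent_series(x, z, terms=60):
    """Partial sum of sum_{n>=0} z^(-n-1) x^n for a 2x2 matrix x."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]      # x^0 = e
    for n in range(terms):
        coeff = z ** (-n - 1)
        total = [[total[i][j] + coeff * power[i][j] for j in range(2)]
                 for i in range(2)]
        power = matmul(power, x)
    return total

x = [[0.5, 1.0], [0.0, -0.25]]   # max row sum norm 1.5 (hypothetical data)
z = 3.0                          # |z| > ||x||, so the series converges

series = resolvent_series(x, z)
ze_minus_x = [[z - x[0][0], -x[0][1]], [-x[1][0], z - x[1][1]]]
product = matmul(ze_minus_x, series)   # should be close to the identity
```

Multiplying the partial sum by $ze - x$ returns the identity matrix up to rounding, which is what the corollary predicts.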
13.1.4 Theorem. The group $G$ of invertible elements in $A$ is open and the map $x \mapsto x^{-1}$ on $G$ is continuous.
Proof. Let $x_0 \in G$ and set $r = \|x_0^{-1}\|^{-1}$. Then $G$ contains the open ball $B_r(x_0)$. Indeed, if $\|x - x_0\| < r$, then
\[ \|x x_0^{-1} - e\| = \|(x - x_0)x_0^{-1}\| \le \|x - x_0\|\,\|x_0^{-1}\| < 1, \]
hence $x x_0^{-1}$ is invertible. Denoting the inverse by $y$ and setting $a = x_0^{-1} y$, we see that $xa = x x_0^{-1} y = e$. A similar argument produces an element $b$ such that $bx = e$. Thus $x$ is invertible, verifying the claim and proving that $G$ is open.
To show continuity of the inverse at $e$, let $x_n \to e$ in $G$. By 13.1.2, for sufficiently large $n$ we have $x_n^{-1} = \sum_{k=0}^{\infty} (e - x_n)^k$, hence for $0 < \varepsilon < 1$ and $\|e - x_n\| < \varepsilon$,
\[ \|x_n^{-1} - e\| \le \sum_{k=1}^{\infty} \|e - x_n\|^k \le \frac{\varepsilon}{1 - \varepsilon}. \]
Therefore $\limsup_n \|x_n^{-1} - e\| \le \varepsilon(1 - \varepsilon)^{-1}$ and letting $\varepsilon \to 0$ shows that $\lim_n x_n^{-1} = e$.
In the general case, let $x_n \to x$ in $G$. Then $x_n x^{-1} \to e$, hence, by the preceding paragraph, $x x_n^{-1} = (x_n x^{-1})^{-1} \to e$ and so $x_n^{-1} \to x^{-1}$.
Then
kan (B0 − B) + an−1 (B1 − B) + · · · + an−N (BN − B)k < ε/2. (‡)
Exercises
13.1 Verify that multiplication in a Banach algebra is jointly continuous.
13.3 (Banach algebra generated by x and e). Let $A$ be a commutative unital Banach algebra and let $x \in A$. Show that the intersection $B$ of all closed subalgebras of $A$ containing $x$ and $e$ is the closure of the set $P$ of all polynomials $\sum_{j=0}^{n} c_j x^j$ in $x$, where $x^0 := e$.
13.4 The commutant of a nonempty subset $E$ of a unital Banach algebra $A$ is the set $E' := \{x : xy = yx \ \forall\, y \in E\}$. The bicommutant $E''$ of $E$ is the commutant of the commutant: $E'' = (E')'$. Show that $E'$ is a closed unital subalgebra of $A$. Show also that if $x \in G_A$, then $x^{-1} \in \{x\}''$.
13.5 Let X, Y be topological spaces and τ : Y → X a continuous function. Show that ϕ(f ) := f ◦ τ
defines a ∗-homomorphism from Cb (X) into Cb (Y ).
13.6 When is the dilation operator (Dr f )(x) = f (rx) (r > 0), a homomorphism on the group algebra
L1 (Rd )?
13.7 Let $A$ be a Banach algebra and $I$ a closed ideal in $A$. Show that the Banach space $A/I$ is a Banach algebra under multiplication $(x + I)(y + I) = xy + I$ and that the quotient map $Q$ is a homomorphism. Show also that if $A$ is a Banach ∗-algebra and $I$ is closed under involution, then $A/I$ is a Banach ∗-algebra under involution $(x + I)^* = x^* + I$ and $Q$ is a ∗-homomorphism.
13.8 [↓ 13.3.4] Let I be a proper ideal of a unital Banach algebra. Show that cl I is a proper ideal.
13.9 [↑ 8.46] Show that the space $C^n[0, 1]$ of $n$-times continuously differentiable functions on $[0, 1]$ is a Banach algebra with the norm $\|f\| = \sum_{k=0}^{n} \|f^{(k)}\|_\infty$.
13.11 Let A and B be unital Banach algebras and Φ : A → B a homomorphism that maps identity
onto identity. Show that Φ(GA ) ⊆ GB .
13.12 Let $A$ denote the Banach algebra of bounded linear operators on $L^2[0, 1]$, set $e_n(t) = e^{2\pi i n t}$, and define $T \in A$ so that $Te_n = e_{n+1}$, that is, $Tx = \sum_{n=-\infty}^{\infty} (x \mid e_n)\, e_{n+1}$. Let $B$ be the Banach algebra generated by $T$ and $I$. Show that $T \in G_A \setminus G_B$.
13.13 Let $A$ be a unital Banach algebra and $(x_n) \subseteq G_A$ such that $x_n \to x \notin G_A$. Show that $\|x_n^{-1}\| \to \infty$.
13.14 [↑ 8.1.2] (Disk algebra). Let $A(\mathbb D)$ denote the algebra of all bounded continuous functions on the closed unit disk $\operatorname{cl}(\mathbb D)$ that are analytic on $\mathbb D$. Show that $A(\mathbb D)$ is a unital commutative Banach ∗-algebra with respect to the sup norm and involution $f^*(z) = \overline{f(\bar z)}$.
13.15 [↑ 6.4, 7.1.6] Show that the set of all measures $\mu \in M(\mathbb R^d)$ with $\mu \ll \lambda$ is an ideal in $M(\mathbb R^d)$.
13.16 (Arens multiplication). Let $A$ be a Banach algebra. For $f$ in the dual space $A'$ and $x \in A$, define ${}_x f \in A'$ by ${}_x f(y) = f(xy)$. Next, for $F, G$ in the bidual $A''$ and $f \in A'$ define $Gf \in A'$ by $Gf(x) = G({}_x f)$ and $FG \in A''$ by $FG(f) = F(Gf)$. Show that $A''$ is a Banach algebra under the multiplication $(F, G) \mapsto FG$ and that the canonical embedding $x \mapsto \hat x$ is a homomorphism.
Proof. By 13.1.3, if ze − x is not invertible, then |z| ≤ kxk. Therefore, σ(x) is bounded
and r(x) ≤ kxk. Since the mapping f (z) = ze − x is continuous and ρ(x) = f −1 (GA ) is
open, σ(x) is closed.
The following lemma will be used to prove the key property that $\sigma(x) \ne \emptyset$.
13.2.2 Lemma. Let $x \in A$ and $\varphi \in A'$. Define $f$ on the open set $\rho(x)$ by
\[ f(z) = \langle (ze - x)^{-1}, \varphi \rangle . \tag{13.1} \]
Then $f$ is analytic on $\rho(x)$ and $f'(z) = -\langle (ze - x)^{-2}, \varphi \rangle$.
hence lim|z|→∞ |f (z)| = 0. By Liouville’s theorem, f is identically zero. Since ϕ was arbitrary,
(ze − x)−1 is zero for all z, impossible.
13.2.4 Theorem (Gelfand-Mazur). If A is a division algebra (that is, every nonzero element
in A is invertible), then A = Ce.
Proof. Let $x \in A$. Since $\sigma(x) \ne \emptyset$, we may choose $z \in \sigma(x)$. Then $ze - x$ is not invertible and so equals 0, that is, $x = ze$.
Proof. Note first that for kxk ≤ r the series g(x) is absolutely convergent, hence converges.
Now let |z| ≤ r. From the identity
we have
\[ g(z)e - g(x) = \sum_{n=1}^{\infty} a_n (z^n e - x^n) = (ze - x) \sum_{n=1}^{\infty} a_n y_n . \]
Since $\|y_n\| \le n r^{n-1}$, the series on the right converges to some $y \in A$ which commutes with $ze - x$, that is,
\[ g(z)e - g(x) = (ze - x)y = y(ze - x). \]
Thus if $g(z)e - g(x)$ is invertible, then so is $ze - x$, verifying (13.2).
13.2.6 Theorem. $r(x) = \lim_n \|x^n\|^{1/n}$.
Proof. By 13.2.5, $z \in \sigma(x) \Rightarrow z^n \in \sigma(x^n) \Rightarrow |z^n| \le \|x^n\| \Rightarrow |z| \le \|x^n\|^{1/n}$. Therefore, $r(x) \le \liminf_n \|x^n\|^{1/n}$.
To see that $\limsup_n \|x^n\|^{1/n} \le r(x)$, note first that if $|z| > \|x\|$, then the function $f$ in (13.1) with $\|\varphi\| \le 1$ is well-defined and $f(z) = \sum_{k=0}^{\infty} \langle x^k, \varphi \rangle z^{-k-1}$. By 13.2.2, $f(z)$ is analytic on the larger set $|z| > r(x)$. It follows that the preceding Laurent series expansion for $f$ is valid for $|z| > r(x)$ and converges uniformly on $|z| \ge r$ for any $r > r(x)$. Multiplying the series expansion by $z^{n+1}$ and integrating term by term along the contour $z = re^{i\theta}$ yields
\[ \int_0^{2\pi} r^{n+1} e^{i(n+1)\theta} f(re^{i\theta})\, d\theta = \sum_{k=0}^{\infty} \langle x^k, \varphi \rangle\, r^{n-k} \int_0^{2\pi} e^{i(n-k)\theta}\, d\theta = 2\pi \langle x^n, \varphi \rangle . \]
Now set $s := \sup_\theta \|(re^{i\theta} e - x)^{-1}\|$. Noting from (13.1) that $|f(z)| \le \|(ze - x)^{-1}\|$, we have
\[ |\langle x^n, \varphi \rangle| = \frac{1}{2\pi} \left| \int_0^{2\pi} r^{n+1} e^{i(n+1)\theta} f(re^{i\theta})\, d\theta \right| \le r^{n+1} \sup_\theta |f(re^{i\theta})| \le r^{n+1} s . \]
Since $\varphi$ was arbitrary, $\|x^n\| \le r^{n+1} s$. Thus $\limsup_n \|x^n\|^{1/n} \le r$, and since $r > r(x)$ was arbitrary, $\limsup_n \|x^n\|^{1/n} \le r(x)$, as required.
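The spectral radius formula can be observed numerically for a matrix whose norm exceeds its spectral radius. The Python sketch below is an illustration under stated assumptions: the algebra is that of $2 \times 2$ matrices with the maximum row sum norm, and the matrix `x` and helper names are hypothetical. Here $\sigma(x) = \{0.5\}$, so $r(x) = 0.5$ while $\|x\| = 1.5$.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def row_sum_norm(a):
    """Maximum absolute row sum, a submultiplicative matrix norm."""
    return max(sum(abs(v) for v in row) for row in a)

def root_norms(x, n_max):
    """Return the list of ||x^n||^(1/n) for n = 1, ..., n_max."""
    values = []
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, n_max + 1):
        power = matmul(power, x)
        values.append(row_sum_norm(power) ** (1.0 / n))
    return values

# Upper triangular, non-normal matrix with the single eigenvalue 0.5.
x = [[0.5, 1.0], [0.0, 0.5]]
values = root_norms(x, 400)
first, last = values[0], values[-1]   # ||x|| and ||x^400||^(1/400)
```

The first value is the norm $1.5$, while the later values decrease toward the spectral radius $0.5$, in line with 13.2.6.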
Proof. (a) $z \in \sigma(x^*)$ iff $ze - x^*$ is not invertible iff $\bar z e - x = (ze - x^*)^*$ is not invertible iff $\bar z \in \sigma(x)$.
(b) If $z \in \sigma(x)$, then $\bar z \in \sigma(x^*)$ by (a). Since $x^* = x^{-1}$, $\bar z^{-1} \in \sigma(x)$ by Ex. 13.21. Therefore, $|z|$ and $|z^{-1}|$ are both $\le \|x\| = 1$ and so $|z| = 1$.
(c) If $x$ is self-adjoint, then $\exp(ix)$ is unitary (13.2.7), hence $\sigma(\exp(ix)) \subseteq \mathbb T$ by (b). Now let $z \in \sigma(x)$. By 13.2.5, $e^{iz} \in \sigma(\exp(ix))$. Since $|e^{iz}| = 1$, $z \in \mathbb R$.
For a normal element x, the converses of (b) and (c) hold (Ex. 13.37). Moreover, if x is
self-adjoint, then x ≥ 0 iff σ(x) ⊆ R+ (Ex. 13.51). The proofs use the functional calculus
developed in §13.6.
13.2.9 Proposition. If $x \in A$ is normal, then $\|x\| = r(x)$.
Proof. If $x$ is self-adjoint, then $\|x^2\| = \|x\|^2$; iterating yields $\|x^{2^n}\| = \|x\|^{2^n}$. In the general case, apply this result to the self-adjoint element $x^*x$ using $\|x^*x\| = \|x\|^2$ to obtain
\[ \|x\|^{2^{n+1}} = \|x^*x\|^{2^n} = \|(x^*x)^{2^n}\| = \|(x^{2^n})^* x^{2^n}\| = \|x^{2^n}\|^2 , \]
where the third equality uses the normality of $x$. The assertion now follows from 13.2.6.
Here is an application of 13.2.9 to normal operators. The formula for the special case of a
self-adjoint operator was proved in 12.1.6.
13.2.10 Corollary. Let $H$ be a complex Hilbert space and $T \in B(H)$ normal. Then
\[ \|T\| = \sup\{ |(Tx \mid x)| : \|x\| = 1 \}. \]
Proof. Let s denote the supremum. By 13.2.9, we may choose λ ∈ σ(T ) such that |λ| = kT k.
By 12.3.7, there exists a sequence (xn ) with unit norm such that kT xn − λxn k → 0. Then
Therefore, s ≥ | (T xn | xn ) | → |λ| = kT k ≥ s.
Exercises
13.18 [↓ 13.6.1] Let $A$ and $B$ be unital Banach algebras and $\Phi : A \to B$ a homomorphism that maps the identity onto the identity. Show that $\sigma(\Phi(x)) \subseteq \sigma(x)$.
13.19 Let $A$ and $B$ be unital $C^*$-algebras and $\Phi : A \to B$ a ∗-homomorphism that maps identity onto identity. Show that $\|\Phi(x)\| \le \|x\|$ and hence that $\Phi$ is continuous. ⟦Consider $r(x^*x)$ and $r(\Phi(x^*)\Phi(x))$.⟧
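\[ x = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ 0 & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{nn} \end{bmatrix}. \]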
13.21 Let $A$ be a unital algebra and $x \in A$ invertible. Show that $\sigma(x^{-1}) = \{z : z^{-1} \in \sigma(x)\}$.
13.22 Let $X$ be a nonempty set and $f \in B(X)$. Show that $\sigma(f) = \operatorname{cl} f(X)$.
13.23 Let U be an open subset of C and let A be the Banach algebra of all bounded analytic functions
on U with the sup norm. Show that for any f ∈ A, σ(f ) = cl f (U ).
13.25 [↑ 8.34] Find the spectrum of the left shift and right shift operators on `2 .
13.27 Let x, y ∈ A such that xy = yx. Show that r(xy) ≤ r(x)r(y) and that equality holds if
x = y.
13.28 (Resolvent identity). The resolvent function of a member x of a unital Banach algebra is
the function R(z) = (ze − x)−1 , z ∈ ρ(x). Verify that R(z) − R(w) = (w − z)R(z)R(w).
13.29 Consider the Banach algebra C 1 [0, 1] of Ex. 13.9. Let f (x) = x. Show that r(f ) = 1 < kf k.
13.30 Let A be a unital C ∗ -algebra and B a closed C ∗ -subalgebra of A containing the identity. Let
x ∈ B. Obviously, σA (x) ⊆ σB (x), hence ρB (x) ⊆ ρA (x). Carry out the following steps to
prove that σB (x) ⊆ σA (x) and hence that σA (x) = σB (x).
(a) If $U \subseteq V$ are open subsets of $\mathbb C$ and $V \cap \operatorname{bd}(U) = \emptyset$, then every component of $U$ is a component of $V$. ⟦If $U'$ is a component of $U$, then $\operatorname{bd} U' \subseteq \operatorname{bd} U$.⟧
(b) If $z$ is a boundary point of the open set $\rho_B(x)$ and $z_n \in \rho_B(x)$ with $z_n \to z$, then $\|(z_n e - x)^{-1}\| \to \infty$. ⟦Use Ex. 13.13.⟧
(c) $\rho_A(x) \cap \operatorname{bd} \rho_B(x) = \emptyset$.
(d) $\sigma_B(x)$ is the union of $\sigma_A(x)$ and certain bounded components of $\rho_A(x)$.
(e) If $x$ is self-adjoint, then $\sigma_A(x) = \sigma_B(x)$. ⟦$\rho_A(x)$ is connected.⟧
(f) If $x$ is invertible in $A$, it is invertible in $B$. ⟦$x^*x$ is invertible in $B$.⟧
(g) σA (x) = σB (x).
Characters
A character of $A$ is a homomorphism $\chi$ from $A$ into $\mathbb C$ that is not identically zero. Thus $\chi(e) \ne 0$, and it follows from the calculation $\chi(e) = \chi(e^2) = \chi(e)^2$ that $\chi(e) = 1$. The collection of all characters of $A$ is called the spectrum or character space of $A$ and is denoted by $\sigma(A)$. For example, if $X$ is a topological space and $x \in X$, then the mapping $f \mapsto f(x)$ is a character of the Banach algebra $C_b(X)$.
13.3.1 Proposition. If χ is a character, then χ is continuous and kχk ≤ 1.
Proof. Let x ∈ A and suppose that |χ(x)| > kxk. Set α = 1/χ(x). Then kαxk < 1, so
e − αx is invertible. Denote the inverse by y, so that y − αyx = y(e − αx) = e. But
then 1 = χ(e) = χ(y) − αχ(y)χ(x) = χ(y) − χ(y) = 0. Therefore, |χ(x)| ≤ kxk, hence
kχk ≤ 1.
The preceding proposition shows that $\sigma(A)$ is a subset of the closed unit ball of $A'$. As such it inherits the weak∗ topology of $A'$, also called the Gelfand topology of $\sigma(A)$.
13.3.2 Example. The spectrum of $C(X)$. Let $X$ be a compact Hausdorff space. For $x \in X$ let $\hat x$ denote the character $\hat x(f) = f(x)$, $f \in C(X)$. We show that the mapping $x \mapsto \hat x$ is a homeomorphism onto the spectrum $\Sigma := \sigma(C(X))$ of $C(X)$.
The mapping $x \mapsto \hat x$ is obviously continuous in the weak∗ topology of $C(X)'$. Moreover, since the functions in $C(X)$ separate points (Urysohn's lemma), the mapping is 1-1. It remains to verify surjectivity.
Let $\chi \in \Sigma$. We claim that there exists $x_0 \in X$ such that $g(x_0) = 0$ for all $g \in \ker \chi$. If this is not the case, then for each $x \in X$ there exists $g_x \in \ker \chi$ such that $g_x(x) \ne 0$. By continuity, there exists an open neighborhood $U_x$ of $x$ such that $g_x \ne 0$ on $U_x$. By compactness of $X$, there exist $x_1, \ldots, x_n \in X$ such that $X = U_{x_1} \cup \cdots \cup U_{x_n}$. Set $g_j = g_{x_j}$. The function $g := \sum_{j=1}^{n} g_j \overline{g_j} = \sum_{j=1}^{n} |g_j|^2$ is then positive on $X$ and hence invertible in $C(X)$. On the other hand, $\chi(g) = \sum_{j=1}^{n} \chi(g_j)\chi(\overline{g_j}) = 0$, impossible for an invertible element. This verifies the claim.
Now let $f \in C(X)$. Then $h := f - \chi(f) \cdot 1 \in \ker \chi$, hence $h(x_0) = 0$ and so $\hat x_0(f) = \chi(f)$. Therefore, the mapping $x \mapsto \hat x$ is surjective. ♦
Maximal Ideals
A maximal ideal of A is a proper ideal that is not contained in a larger proper ideal.
Here is an interesting and illuminating example.
13.3.3 Example. Let X be a (nontrivial) compact Hausdorff space. For a subset Y of X,
set IY := {f ∈ C(X) : f (Y ) = 0}. Then IY is easily seen to be a proper ideal of C(X). We
show that IY is maximal iff Y is a singleton.
To show that $I_y := I_{\{y\}}$ is maximal, suppose that $I_y$ is properly contained in an ideal $I$ and let $f \in I \setminus I_y$, so that $f(y) \ne 0$. Define $g(x) = f(x) - f(y)$ ($x \in X$). Then $g(y) = 0$, hence $g \in I_y \subseteq I$. It follows that the nonzero constant function $f - g = f(y)$ is in $I$, hence $I = C(X)$. Therefore, $I_y$ is maximal.
Conversely, if Y has more than one element and y ∈ Y , then by Urysohn’s lemma we can
construct a function f ∈ Iy \ IY . Then IY is properly contained in Iy , so is not maximal. ♦
13.3.4 Proposition. Every proper ideal I is contained in a maximal ideal and every
maximal ideal is closed.
Proof. Partially order the collection of proper ideals of $A$ containing $I$ by inclusion. The union $J$ of a chain of proper ideals containing $I$ is an ideal containing $I$ and is proper since $e \notin J$. Therefore, $J$ is an upper bound for the chain. By Zorn's lemma, $I$ is contained in a maximal ideal.
For the second part of the proposition, suppose that $M$ is a maximal ideal that is not closed. Then $M$ is properly contained in $\operatorname{cl}(M)$. But $\operatorname{cl}(M)$ is a proper ideal (Ex. 13.8), contradicting the maximality of $M$.
Recall that the quotient space A/I of A by a closed ideal I is a Banach algebra and
the quotient map Q : A → A/I is a continuous homomorphism (Ex. 13.7). The following
theorem will be needed in the proof of 13.3.6 below.
The first term on the right is in I and the second is a member of ker χ ⊆ I. Therefore,
a ∈ I, proving that I = A. Therefore, ker χ is maximal.
Now let M be any maximal ideal. By 13.3.5, A/M is a field, hence, by the Gelfand-Mazur
theorem, A/M = {ze + M : z ∈ C}. Now define χ0 (ze + M) = z and set χ = χ0 ◦ QM .
Then χ is a character with kernel M.
Finally, if ker χ1 = ker χ2 , then cχ1 = χ2 for some c ∈ C (0.2.3). Since χ1 (e) = χ2 (e) = 1,
χ1 = χ2 .
Because of the 1-1 correspondence in 13.3.6, the spectrum σ(A) of A is also called the
maximal ideal space of A.
Exercises
13.31 Show that $G_A^c$ is the union of all maximal ideals in $A$.
13.32 Let A and B be commutative, unital Banach algebras and Φ : A → B a surjective homomor-
phism. Prove: if M is a maximal ideal in A, then Φ(M) is a maximal ideal in B.
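13.33 [↑ 13.14] Show that the evaluation mapping $z \mapsto \hat z$ is a homeomorphism from $\operatorname{cl} \mathbb D$ onto the spectrum $\Sigma := \sigma(A(\mathbb D))$ of the disk algebra. ⟦For surjectivity, let $\chi \in \Sigma$ and show that there exists a $z \in \operatorname{cl} \mathbb D$ such that $\chi(P) = P(z)$ for every polynomial $P$ on $\operatorname{cl} \mathbb D$.⟧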
13.34 Let A be a unital Banach algebra, x ∈ A and let B be the closed subalgebra of A generated
by x and e. Show that the map F (χ) = χ(x) defines a homeomorphism from ΣB onto σB (x).
13.35 Let $\chi$ be a linear functional on $A$ with $\chi(e) \ne 0$. Prove that the following are equivalent.
(a) $\chi$ is a character of $A$. (b) $\ker \chi$ is an ideal. (c) $\ker \chi$ is a subalgebra and $\chi(e) = 1$.
13.36 The radical of A is the intersection of all maximal ideals in A. Prove that the radical consists
of all x ∈ A such that limn (cx)n = 0 for all c ∈ C.
\[ \hat x(f) := f(x), \quad f \in F \]
\[ \iota_S : S \to S^F, \quad \iota_S(x) = \hat x, \]
is called the canonical mapping from $S$ to $S^F$. The Gelfand representation theorem yields
a simple proof of the following generalization of the Stone-Čech compactification theorem:
13.4.2 Theorem. Let $S$ be a topological space and $F$ a unital $C^*$-subalgebra of $C_b(S)$.
(a) $S^F$ is a compact Hausdorff topological space and $\iota_S$ is a continuous function from $S$ onto a dense subset of $S^F$.
(b) The adjoint map $\iota_S^* : C(S^F) \to F$ is a surjective isometric isomorphism.
(c) Let $T$ be a topological space, $G$ a unital $C^*$-subalgebra of $C_b(T)$, and $\varphi : S \to T$ a continuous function such that the dual map $\varphi^* : C_b(T) \to C_b(S)$ maps $G$ into $F$. Then there exists a continuous map $\tilde\varphi : S^F \to T^G$ such that the following diagram commutes:
\[ \begin{array}{ccc} S^F & \xrightarrow{\ \tilde\varphi\ } & T^G \\ {\scriptstyle \iota_S}\big\uparrow & & \big\uparrow{\scriptstyle \iota_T} \\ S & \xrightarrow{\ \varphi\ } & T \end{array} \]
hence $\tilde\varphi \circ \iota_S = \iota_T \circ \varphi$. Clearly, $\tilde\varphi$ is continuous in the Gelfand topology.
In this subsection we use the Gelfand representation theorem to prove the following classical
result:
13.4.4 Theorem (N. Wiener). The reciprocal of an absolutely convergent, nonvanishing
trigonometric series is an absolutely convergent trigonometric series.
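Proof. (Gelfand). We apply the representation theorem to the unital, commutative Banach algebra $\ell^1(\mathbb Z)$ (13.1.1(f)). We claim that the characters of $\ell^1(\mathbb Z)$ are the functions $\chi_z$ defined by
\[ \chi_z(x) = \sum_{n=-\infty}^{\infty} x_n z^n, \quad x \in \ell^1(\mathbb Z),\ z \in \mathbb T . \]
Clearly $\chi_z(e_0) = 1$. The calculation
\[ \chi_z(x * y) = \sum_{n=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} x_{n-k}\, y_k\, z^n = \sum_{k=-\infty}^{\infty} y_k z^k \sum_{n=-\infty}^{\infty} z^{n-k} x_{n-k} = \chi_z(x)\chi_z(y) \]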
Thus the absolutely convergent trigonometric series are precisely the Gelfand transforms $\hat x$. If $\hat x$ is never zero, then by (d) of the representation theorem the reciprocal $1/\hat x$ is the Gelfand transform of a member of $\ell^1(\mathbb Z)$, proving the theorem.
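The multiplicativity of the characters $\chi_z$ used in the proof can be checked on finitely supported sequences. The Python sketch below is an illustration only: the dictionary representation of sequences, the helper names, and the sample data `x`, `y`, `z` are assumptions made for the example.

```python
import cmath

def convolve(x, y):
    """(x * y)(n) = sum_k x(n - k) y(k) for finitely supported sequences
    stored as {index: value}."""
    out = {}
    for i, xi in x.items():
        for k, yk in y.items():
            out[i + k] = out.get(i + k, 0) + xi * yk
    return out

def character(z, x):
    """chi_z(x) = sum_n x_n z^n for a finitely supported sequence x."""
    return sum(v * z ** n for n, v in x.items())

z = cmath.exp(0.9j)                      # a point of the unit circle
x = {-1: 2.0, 0: 1.0, 3: -0.5j}          # hypothetical sample sequences
y = {-2: 1j, 1: 4.0}

lhs = character(z, convolve(x, y))       # chi_z(x * y)
rhs = character(z, x) * character(z, y)  # chi_z(x) chi_z(y)
value_at_identity = character(z, {0: 1}) # chi_z(e_0)
norm_x = sum(abs(v) for v in x.values()) # ||x||_1
```

The two sides agree up to rounding, $\chi_z(e_0) = 1$, and $|\chi_z(x)| \le \|x\|_1$, consistent with 13.3.1.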
Exercises
13.37 Let A be a unital C ∗ -algebra and x ∈ A normal. Prove the following:
(a) x is unitary iff σ(x) ⊆ T.
(b) x is self-adjoint iff σ(x) ⊆ R.
(c) x is a projection iff σ(x) ⊆ {0, 1}.
(d) If A is commutative, then x ≥ 0 iff σ(x) ⊆ R+ . (See Ex. 13.51 for a strengthened version.)
JConsider C ∗ (x).K
13.38 Let $A$ be a unital commutative $C^*$-algebra and let $x \in A$ be a projection such that $x \ne 0$ and $x \ne e$. Show that the spectrum of $x$ is disconnected.
13.39 Show that the spectrum of $x \in \ell^1(\mathbb Z)$ consists of all numbers $\sum_{n=-\infty}^{\infty} x_n z^n$ with $z \in \mathbb T$.
13.40 Let $A$ be a unital Banach algebra and $x, y \in A$ with $xy = yx$. Prove: $r(x + y) \le r(x) + r(y)$.
13.41 Let $A$ be the Banach algebra $C^1[0, 1]$ with the norm $\|f\| = \|f\|_\infty + \|f'\|_\infty$. One may argue exactly as in 13.3.2 that the mapping $x \mapsto \hat x$ is a homeomorphism from $[0, 1]$ onto the spectrum of $A$, so the spectrum may be identified with $[0, 1]$. Show that the Gelfand transform $\Gamma : A \to C[0, 1]$ is neither surjective nor an isometry.
Then A1 is an algebra with identity (0, 1). Moreover, (x, a) = (x, 0) + (0, a)(0, 1), so
identifying A × {0} with A and {0} × C with C we may write (x, a) = x + a. With this
algebraic identification, A is a maximal ideal in A1 . Moreover, it is easy to check that A1 is
a Banach algebra under the norm
\[ \|(x, a)\| := \|x\| + |a| \]
and that $A$ is isometrically isomorphic to $A \times \{0\}$. (In Ex. 13.43, the reader is asked to verify these assertions.) The algebra $A_1$ is called the unitization of $A$.
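The unitization can be made concrete in a small case. The Python sketch below is an illustration under stated assumptions: it takes $A = \mathbb C^2$ with pointwise multiplication and the sup norm, and it assumes the standard unitization product $(x, a)(y, b) = (xy + ay + bx, ab)$. The helper names `mul` and `norm` and the sample elements are hypothetical.

```python
def mul(p, q):
    """Assumed unitization product (x, a)(y, b) = (xy + ay + bx, ab),
    with A = C^2 under pointwise multiplication."""
    (x, a), (y, b) = p, q
    return (tuple(xi * yi + a * yi + b * xi for xi, yi in zip(x, y)), a * b)

def norm(p):
    """||(x, a)|| = ||x|| + |a|, using the sup norm on A = C^2."""
    x, a = p
    return max(abs(xi) for xi in x) + abs(a)

one = ((0, 0), 1)                 # the identity (0, 1) of A_1
p = ((1 + 1j, -2), 0.5)           # hypothetical sample elements of A_1
q = ((3, 1j), -1j)

left_identity = mul(one, p) == p
right_identity = mul(p, one) == p
submultiplicative = norm(mul(p, q)) <= norm(p) * norm(q) + 1e-12
```

On these samples $(0, 1)$ acts as a two-sided identity and the norm is submultiplicative, as Ex. 13.43 asks the reader to prove in general.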
The spectrum of $A_1$ is related to that of $A$ as follows: For $\chi \in \sigma(A)$ define
\[ \chi_1(x, a) := \chi(x) + a. \tag{13.3} \]
Then $\chi_1$ is easily seen to be a character of $A_1$ (Ex. 13.44). In particular, $|\chi(x)| = |\chi_1(x, 0)| \le \|\chi_1\|\,\|(x, 0)\| = \|x\|$, so $\|\chi\| \le 1$. Thus a character of $A$ is a member of the closed unit ball of $A'$. The spectrum of $A$ may not be closed, but it is the case that $\sigma(A) \cup \{0\}$ is closed and hence weak∗ compact. Indeed, if $\chi_\alpha \to \varphi$ in the weak∗ topology of $A'$, then $\varphi$ is easily seen to be a homomorphism, hence either $\varphi = 0$ or $\varphi \in \sigma(A)$. Now let $\varphi$ be any character of $A_1$. Then
ϕ(x, a) = ϕ(x, 0) + ϕ(0, a) = ϕ(x, 0) + a.
The map $x \mapsto \varphi(x, 0)$ is either a character of $A$ or the zero homomorphism. In the former case, $\varphi$ is of the form $\chi_1$ as in (13.3), and in the latter case $\varphi$ is the character $\varphi_0(x + a) := a$. Thus we see that
\[ \sigma(A_1) = \{\chi_1 : \chi \in \sigma(A)\} \cup \{\varphi_0\} \quad\text{and}\quad \sigma(A_1)\big|_A = \sigma(A) \cup \{0\}. \tag{13.4} \]
In this way we may identify σ(A1 ) with σ(A) ∪ {0}. From (13.3), the Gelfand transforms
Γ : A → σ(A) and Γ1 : A1 → σ(A1 ) are related by
Proof. Since σ(A) ∪ {0} is weak∗ compact and since removing a point from a compact space
produces a locally compact space, we see that σ(A) is locally compact, proving (a). Thus if
σ(A) is not compact, then σ(A) ∪ {0} is the one-point compactification of σ(A).
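Part (b) is clear. To prove (c) recall that, by the unital case,
\[ \big\|\widehat{(x, 0)}\big\|_\infty = \lim_n \|(x, 0)^n\|^{1/n} = \lim_n \|x^n\|^{1/n} . \]
Furthermore,
\[ \widehat{(x, 0)}(\chi_1) = \Gamma_1((x, 0))(\chi_1) = \Gamma(x)(\chi) = \hat x(\chi) \quad\text{and}\quad \widehat{(x, 0)}(\varphi_0) = 0, \]
hence $\big\|\widehat{(x, 0)}\big\|_\infty = \|\hat x\|_\infty$. Therefore, (c) holds.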
Finally, the hypothesis in (d) implies that Γ(A) is conjugate closed. Since Γ(A) trivially
separates points and characters are not identically zero, the locally compact version of the
Stone-Weierstrass theorem (0.12.13) implies that Γ(A) is dense in C0 (X).
Then χ∞ agrees with χ on the space C0 (X), which may be identified with the set of functions
in C(X∞ ) that are zero at ∞. We claim that χ∞ is in the spectrum Σ∞ of C(X∞ ). Clearly,
χ∞ is linear and χ∞ (1) = 1. From
\[ f_0 g_0 = \big(f|_X - f(\infty)\big)\big(g|_X - g(\infty)\big) = (fg)|_X - f(\infty) g_0 - g(\infty) f_0 - f(\infty) g(\infty) \]
we have
\[ (fg)_0 = (fg)|_X - f(\infty) g(\infty) = f_0 g_0 + f(\infty) g_0 + g(\infty) f_0 . \]
Since $f_0, g_0 \in C_0(X)$,
\[ \chi_\infty(fg) = \chi\big((fg)_0\big) + (fg)(\infty) = \chi(f_0)\chi(g_0) + f(\infty)\chi(g_0) + g(\infty)\chi(f_0) + f(\infty) g(\infty) = \chi_\infty(f)\chi_\infty(g). \]
Therefore, $\chi_\infty \in \Sigma_\infty$ and so $\chi_\infty = \hat\infty$ or $\hat x$ for some $x \in X$. But if the former, then for all $f \in C(X_\infty)$ we have $f(\infty) = \chi_\infty(f) = \chi(f_0) + f(\infty)$, which implies that $\chi(g) = 0$ for all $g \in C_0(X)$, contrary to the definition of character. Thus $\chi_\infty = \hat x$ for some $x \in X$ and so $\chi = \hat x = F(x)$, proving that $F$ is surjective.
It remains to show that $F^{-1}$ is continuous. This follows from the implications $\hat x_\alpha \to \hat x$ in $\Sigma$ ⇒ $f(x_\alpha) \to f(x)$ for all $f \in C_0(X)$ ⇒ $f(x_\alpha) \to f(x)$ for all $f \in C(X_\infty)$ ⇒ $\hat x_\alpha \to \hat x$ in $\Sigma_\infty$ ⇒ $x_\alpha \to x$ in $X_\infty$ ⇒ $x_\alpha \to x$ in $X$. ♦
We show that the spectrum Σ of the Banach algebra L1 (Rd ) (13.1.1(g)) may be identified
with Rd as follows: For t ∈ Rd , define a function φt on L1 (Rd ) by
\[ \phi_t(f) := \int e^{i\, t \cdot x} f(x)\, dx. \tag{$\dagger$} \]
It follows that for each $f \in L^1$, $\phi(f)h(y) = \phi(f_y)$ for a.a. $y$. Choosing $f$ so that $\phi(f) \ne 0$, we then have $h(y) = \phi(f)^{-1}\phi(f_y)$ for a.a. $y$. The right side of this equation is then a continuous version of $h$. Replace $h$ by this version and note that $h$, which is uniquely determined by $\phi$, does not depend on $f$. Thus for all $f \in L^1$ and $y \in \mathbb R^d$, $\phi(f)h(y) = \phi(f_y)$ and so
Replacing $f(x)$ by $e^{itx} f(x)$, we may take $t = 0$. Thus we must show that $t_\alpha \to 0$. Taking $f = 1_{[0,1]}$ in (‡) and integrating shows that the net $(t_\alpha)$ must be bounded. Let $(t_\beta)$ be any convergent subnet, say $t_\beta \to s$. Then $e^{i t_\beta x} \to e^{i s x}$ uniformly in $x \in [0, 1]$, and taking $f = (e^{-i t_\beta x} - 1) 1_{[0,1]}$ in (‡) we see that
\[ \int_0^1 |e^{i s x} - 1|^2\, dx = 0. \]
Therefore, $e^{i s x} = 1$ for all $x \in [0, 1]$, which is possible only if $s = 0$. This shows that $t_\alpha \to 0$, completing the argument. ♦
Exercises
13.42 Let $f \in C_b(\mathbb R^d)$ such that $f(x) \ne 0$ and $f(x + y) = f(x)f(y)$ for all $x, y \in \mathbb R^d$. Carry out the following steps to prove that there exists $t \in \mathbb R^d$ such that $f(x) = \exp(i\, t \cdot x)$ for all $x$.
(a) There exists $a > 0$ such that $\alpha := \int_0^a \cdots \int_0^a f(y_1, \ldots, y_d)\, dy_1 \ldots dy_d \ne 0$.
(b) $\alpha f(x) = \int_{x_1}^{a + x_1} \cdots \int_{x_d}^{a + x_d} f(y_1, \ldots, y_d)\, dy_1 \ldots dy_d$, hence $f$ is continuously differentiable.
13.43 Let $A$ be a Banach algebra and $A_1$ the unitization of $A$. Prove the following:
(a) $A_1$ is an algebra with identity $1 := (0, 1)$.
(b) $(x, a) = (x, 0) + (0, a)(0, 1)$, so that identifying $A \times \{0\}$ with $A$ and $\{0\} \times \mathbb C$ with $\mathbb C$ we may write $(x, a) = x + a$.
(c) $A_1$ is commutative iff $A$ is commutative.
(d) $A_1$ is a Banach algebra with the norm $\|(x, a)\| = \|x\| + |a|$ and $A$ is isometrically isomorphic to $A \times \{0\}$.
(e) A is a maximal ideal of A1 .
13.44 Let A be a nonunital commutative Banach algebra with spectrum Σ and let A1 be the unitization
of A. Prove that χ1 is a character of A1 and that the mapping χ → χ1 is an injection from Σ
into the spectrum Σ1 of A1 .
\[ \Psi(1) = \Gamma^{-1}(1 \circ \hat x) = \Gamma^{-1}(1) = e. \]
is bounded on C. We claim that f is an entire function. Assuming this for the moment, we
conclude from Liouville’s theorem that fxy is constant. Therefore,
for all x, y, that is, R(z) = S for all z. Thus S exp(zT ∗ ) = exp(zT ∗ )S and so by induction
\[ \sum_{n=0}^{\infty} \frac{z^n}{n!}\, c_n = 0 \quad\text{for all } z. \]
Since for some $M > 0$, $|c_n| \le M \|T^*\|^n$, the series converges uniformly on bounded sets and therefore defines an analytic function of $z$. Since the function is identically zero, the coefficients $c_n$ are zero. In particular $c_1 = 0$, which implies the desired result.
To see that $f := f_{xy}$ is entire, set $c_{m,n} := ((T^*)^n S (T^*)^m x \mid y)$ and note that
\[ f(z) = \big(\exp(-zT^*)\, S \exp(zT^*)x \mid y\big) = \sum_{m,n} \frac{(-1)^n}{n!\, m!}\, z^{n+m} c_{m,n} . \]
Since for some $C > 0$, $|c_{m,n}| \le C \|T^*\|^{m+n}$, the series converges uniformly on bounded sets. It follows that $f$ is entire.
13.6.3 Corollary. Let $S, T \in B(H)$ with $T$ normal. If $ST = TS$, then $Sf(T) = f(T)S$ for all $f \in C(\sigma(T))$. That is, $f(T) \in \{T\}''$.
(b) If $(f_n)$ is a uniformly bounded sequence in $BL(K)$ that converges pointwise to $f$, then $f_n(T)x \xrightarrow{w} f(T)x$ for every $x \in H$.
(c) $\big(\sum_{k=0}^{n} a_k z^k\big)(T) = \sum_{k=0}^{n} a_k T^k$.
(d) $\|f(T)\| \le \|f\|_\infty$.
Moreover, the ∗-homomorphism $f \mapsto f(T)$ is unique with respect to properties (a) and (b).
Proof. (a) and (c) hold by the continuous functional calculus $f \mapsto f(T) : C(K) \to B(H)$. We extend this to $BL(K)$ as follows: For each pair $x, y \in H$, the mapping $f \mapsto (f(T)x \mid y)$ is a bounded linear functional on $C(K)$, hence, by the Riesz representation theorem, there exists a complex measure $\mu(x, y)$ on $K$ such that for each $f \in C(K)$
\[ (f(T)x \mid y) = \int_K f\, d\mu(x, y). \tag{13.5} \]
We claim that
(i) $\mu(ax + by, z) = a\mu(x, z) + b\mu(y, z)$. (ii) $\mu(y, x) = \overline{\mu(x, y)}$.
(iii) $\mu(x, x) \ge 0$. (iv) $d\mu(g(T)x, y) = g\, d\mu(x, y)$, $g \in C(K)$.
Indeed, by integrating against a continuous function $f$ and using (13.5), we see that (i) holds because $(f(T)x \mid y)$ is sesquilinear in $(x, y)$, and (ii) follows from the calculation
\[ \int_K f\, d\mu(y, x) = (f(T)y \mid x) = (y \mid f(T)^* x) = (y \mid \bar f(T)x) = \overline{(\bar f(T)x \mid y)} = \overline{\int_K \bar f\, d\mu(x, y)} = \int_K f\, d\overline{\mu(x, y)} . \]
For (iii), if $f \ge 0$ and $g = f^{1/2}$, then, by the continuous functional calculus, we have $f(T) = g^2(T) = g(T)g(T)$, hence
\[ \int_K f\, d\mu(x, x) = (f(T)x \mid x) = (g(T)x \mid g(T)x) \ge 0. \]
which shows that $f(T)g(T) = (fg)(T)$. We have proved that the mapping $f \mapsto f(T)$ from $BL(K)$ into $B(H)$ is a ∗-homomorphism satisfying (a), (c), and (d).
To verify (b), we apply the dominated convergence theorem to obtain
\[ (f_n(T)x \mid y) = \int_K f_n\, d\mu(x, y) \to \int_K f\, d\mu(x, y) = (f(T)x \mid y) . \]
proving (e).
It remains to show uniqueness with respect to properties (a) and (b). Let $f \mapsto \tilde f(T)$ be another ∗-homomorphism with these properties. Then the collection of all $f \in BL(K)$ for which $\tilde f(T) = f(T)$ is a conjugate closed algebra containing all polynomials on $K$ and is closed under pointwise limits of uniformly bounded sequences and so must coincide with $BL(K)$ by 13.6.7.
The mapping $f \mapsto f(T)$ in the above theorem is known as the Borel functional calculus.
(c) P (E ∩ F ) = P (E)P (F ).
(d) If $E_1, E_2, \ldots$ are disjoint, and $E = \bigcup_n E_n$, then the series $\sum_{n=1}^{\infty} P(E_n)x$ converges in norm to $P(E)x$ for every $x$.
Proof. Parts (a) – (c) follow immediately from the Borel functional calculus, as does (d) for finite sequences. For infinite sequences, set $F_n = \bigcup_{j=1}^{n} E_j$. Then $1_{F_n} \to 1_E$ pointwise on $K$, hence $P(F_n)x \xrightarrow{w} P(E)x$ for all $x \in H$, by part (b) of 13.6.8. Set $T_n = P(E) - P(F_n) = P(E \setminus F_n)$. Then $T_n x \xrightarrow{w} 0$ and
\[ \|P(E)x - P(F_n)x\|^2 = (T_n x \mid T_n x) = (T_n^* T_n x \mid x) = (T_n x \mid x) \to 0, \]
proving (d).
We may now formulate the functional calculus in terms of integrals. For each $x, y \in H$, define $P_{x,y}(E) := (P(E)x \mid y)$. Then
\[ P_{x,y}(E) = (1_E(T)x \mid y) = \int_K 1_E\, d\mu(x, y) = \mu(x, y)(E), \]
so the set function $P_{x,y}$ is simply the measure $\mu(x, y)$ of the Borel functional calculus, and (13.5) may be written
\[ (f(T)x \mid y) = \int_{\sigma(T)} f(z)\, dP_{x,y}(z) \]
or simply
\[ f(T) := \int_{\sigma(T)} f(z)\, dP(z). \]
This expresses $f(T)$ as an integral with respect to the set function $P$, which is called the spectral measure for $T$. The special case
\[ I = \int_{\sigma(T)} 1\, dP(z) \]
is the motivation for the alternate terminology spectral resolution of the identity. The special case $f(z) = z$ results in the spectral theorem for normal operators:
Note that if T is compact, then σ(T ) is a sequence (λn ) ∈ c0 , hence the last integral
reduces to an infinite series, giving the spectral theorem of §12.3.
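The finite-dimensional case makes the integral formula concrete. The following is a minimal sketch, assuming T is a diagonal matrix (hence normal) whose spectral measure assigns to a set E the coordinate projection onto the eigenvectors with eigenvalue in E. The helper names `P` and `f_of_T` and the sample eigenvalues are illustrative choices, not notation from the text.

```python
# Sketch: T = diag(eigs) on C^n (a normal operator); all names are illustrative.

def P(E, eigs):
    """Diagonal of the spectral projection P(E): entry j is 1 if eigs[j] lies in E."""
    return [1 if lam in E else 0 for lam in eigs]

def f_of_T(f, eigs):
    """Diagonal of f(T) = sum over the spectrum of f(lam) * P({lam})."""
    diag = [0] * len(eigs)
    for lam in set(eigs):
        proj = P({lam}, eigs)
        diag = [d + f(lam) * p for d, p in zip(diag, proj)]
    return diag

eigs = [2, 2, 1j, -1]            # hypothetical eigenvalues, with multiplicity
E, F = {2, 1j}, {1j, -1}

# P(E ∩ F) = P(E) P(F): diagonal matrices multiply entrywise.
product = [a * b for a, b in zip(P(E, eigs), P(F, eigs))]
identity = f_of_T(lambda z: 1, eigs)      # f = 1 gives the identity
recovered = f_of_T(lambda z: z, eigs)     # f(z) = z gives T itself
squared = f_of_T(lambda z: z * z, eigs)   # (z^2)(T) should equal T^2
```

Here the integral over σ(T) is a finite sum, which is the simplest instance of the series that appears for compact T.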
Exercises
13.45 Let A be a unital C∗-algebra. Show that if x∗ = −x, then σ(x) ⊆ iR.
13.46 Let A be a unital C∗-algebra, x ∈ A unitary and σ(x) ≠ 𝕋. Show that $x = e^{iy}$ for some self-adjoint y.
13.47 Verify the following assertions to obtain an alternate proof that the operators $T^\pm$ are unique:
(a) Let T = A − B, for positive operators A and B with AB = 0. Then AT = TA and BT = TB.
(b) A and B commute with $T^\pm$.
(c) If C and D are positive operators and CD = DC, then CD is positive.
(d) Set $S := T^+ - A = T^- - B$. Then $0 \le S^*S = S^2 = -(T^-A + T^+B) \le 0$.
(e) S = 0.
Use this to prove that if x is self-adjoint, then σ(x) ⊆ [0, ∞) iff ‖ce − x‖ ≤ c for some (for every) c ≥ ‖x‖. [Use 13.2.5.]
13.50 Let A be a unital C∗-algebra and x, y ∈ A positive. Use the preceding exercise to show that
13.51 Let A be a unital C∗-algebra and let x ∈ A be self-adjoint. Prove the following to conclude that x ≥ 0 iff $\sigma(x) \subseteq \mathbb{R}^+$.
(a) If $\sigma(x) \subseteq \mathbb{R}^+$, then x ≥ 0. [Consider the functional calculus on C∗(x).]
(b) Let x ≥ 0. Then $x = x^+ - x^-$, where $\sigma(x^\pm) \subseteq \mathbb{R}^+$. [Use the functional calculus exactly as in 13.6.5.]
(c) Set $z = yx^-$. Then $\sigma(z^*z) \subseteq (-\infty, 0]$.
(d) $\sigma(z^*z + zz^*) \subseteq [0, \infty)$. [Write z = u + iv, where u and v are self-adjoint and use Ex. 13.50.]
(e) $\sigma(zz^*) \subseteq [0, \infty)$. [Use (c), (d), and Ex. 13.50.]
(f) $z^*z = 0$. [Use (c), (e) and Ex. 13.26.]
(g) $\sigma(x) \subseteq \mathbb{R}^+$.
13.52 Show that the definition x ≤ y iff y − x ≥ 0 gives a partial order on the set of self-adjoint members of a unital C∗-algebra such that $x \le y \Rightarrow z^*xz \le z^*yz$.
13.53 Let A be a unital C∗-algebra. Show that if x, y ∈ A are positive and xy = yx, then xy is positive. [First assume A is commutative.]
13.55 Let T ∈ B(H) be normal. Show that (f ◦ g)(T) = f(g(T)), where g is a bounded Borel function on σ(T) and f is a bounded Borel function on the closure K of g(σ(T)). [Fix g and let B denote the set of Borel functions f on K for which the equality holds. Then B is a conjugate closed algebra with properties (a) and (b) of 13.6.7.]
Chapter 14
Miscellaneous Topics
In this chapter we consider some of the deeper aspects of functional analysis and give several
important applications. Additional applications may be found in Chapters 15, 16, and 17.
Proof. (a) ⇒ (b): Assume that the limits in (b) exist. By the hypothesis, (fn ) has a p
sequential limit point, say limk fnk = f ∈ C(X). Let x ∈ X be a limit point of (xm ), say
xmα → x. Then
\[
\lim_k \lim_\alpha f_{n_k}(x_{m_\alpha}) = \lim_k f_{n_k}(x) = f(x) = \lim_\alpha f(x_{m_\alpha}) = \lim_\alpha \lim_k f_{n_k}(x_{m_\alpha}).
\]
(iii) |f1 (x) − f (x)| < 1 and |fn+1 (y) − f (y)| < 1/(n + 1), y ∈ {x, x1 , . . . , xn }.
of x which is used in (†) to obtain the point xn in (i) and (ii); and (iii) uses the fact that
f is in the pointwise closure of A. Now, since f is bounded, there exists a subsequence
(yk := xmk ) such that f (yk ) → c for some c ∈ K. Then, by (i) and (iii),
\[
\lim_k \lim_n f_n(y_k) = \lim_k f(y_k) = c \quad\text{and}\quad \lim_n \lim_k f_n(y_k) = \lim_n f_n(x) = f(x).
\]
Give $Z := X/\!\sim$ the quotient topology and let $Q : X \to Z$ denote the quotient map. Define $\tilde f_n$ on $Z$ by $\tilde f_n \circ Q = f_n$. Since $\tilde f_n$ is continuous, the initial topology $\tau$ defined by $(\tilde f_n)$ is weaker than the quotient topology $\tau_q$. Furthermore, $\tau$ is metrizable by
\[
d(Q(x), Q(y)) = \sum_{n=1}^\infty \frac{1}{2^n}\,\frac{|f_n(x) - f_n(y)|}{1 + |f_n(x) - f_n(y)|}.
\]
Since $\tau$ is Hausdorff and $\tau_q$ is compact, $\tau = \tau_q$. Now, by (a), $(f_n)$ has a p-limit point $f$ in $C(X)$, say $f_{n_\alpha} \to f$. Define $\tilde f$ on $Z$ so that $\tilde f \circ Q = f$. Then $\tilde f$ is well-defined, since $Q(x) = Q(y) \Rightarrow f(x) = \lim_\alpha f_{n_\alpha}(x) = \lim_\alpha f_{n_\alpha}(y) = f(y)$. Since $\tilde f$ is a p-limit point of $(\tilde f_n)$, by the preceding paragraph $\tilde f_{n_k} \xrightarrow{p} \tilde f$ for some subsequence $(f_{n_k})$. Therefore $f_{n_k} \xrightarrow{p} f$, proving (a).
Part (b) of the lemma is known as Grothendieck’s double limit property.
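A standard example shows how the double limit property can fail. The sketch below assumes the property has its usual form (the two iterated limits agree whenever both exist) and uses the hypothetical choices $f_n(x) = x^n$ on [0, 1] and $x_m = 1 - 1/m$. For fixed n the limit over m is $f_n(1) = 1$, while for fixed m the limit over n is 0, so the iterated limits are 1 and 0. This matches the fact that $(x^n)$ has a discontinuous pointwise limit and so has no pointwise limit point in C[0, 1].

```python
# Hypothetical example: f_n(x) = x**n on [0, 1] and x_m = 1 - 1/m.

def f(n, x):
    return x ** n

def x_seq(m):
    return 1 - 1 / m

# Take m large first (n fixed): f_n(x_m) is close to f_n(1) = 1.
inner_m_first = [f(n, x_seq(10**9)) for n in (1, 5, 50)]

# Take n large first (m fixed): f_n(x_m) is close to 0.
inner_n_first = [f(10**4, x_seq(m)) for m in (2, 10, 100)]
```

The two lists approximate the inner limits numerically; they do not prove anything, but they show the order of the limits matters for this family.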
convergence in this space. It follows that $C_1'$, which is weak∗ compact, is metrizable under the metric
\[
d(f, g) = \sum_{n=1}^\infty \frac{1}{2^n}\,\frac{|f(x_n) - g(x_n)|}{1 + |f(x_n) - g(x_n)|}.
\]
In particular, $C_1'$ has a weak∗ dense sequence $(f_m)$. By a diagonal argument, there exists a subsequence $(y_n)$ of $(x_n)$ such that $\alpha_m := \lim_n \langle y_n, f_m\rangle$ exists for each $m$. Since $(y_n)$ is relatively weakly compact, there exists $y \in Y$ and a subnet $(y_\alpha)$ of $(y_n)$ such that $y_\alpha \xrightarrow{w} y$. Therefore, $\langle y, f_m\rangle = \lim_\alpha \langle y_\alpha, f_m\rangle = \alpha_m$ for all $m$. If $z$ is another such limit point, then $\langle z, f_m\rangle = \alpha_m$ for all $m$, hence $y = z$ because $(f_m)$ is weak∗ dense in $C_1'$. Therefore, $(y_n)$ has a unique weak limit point and so must converge weakly.
For the sufficiency, note that the hypothesis and 10.1.2 imply that $A$ is norm bounded. Let $(x_\alpha)$ be a net in $A$. Then $(\hat x_\alpha)$ is a norm bounded net in $\hat A$ and so has a subnet $(\hat x_\beta)$ that weak∗ converges to some $\varphi$ in $X''$. It remains to show that $\varphi \in \hat X$, that is, $\varphi$ is weak∗ continuous. By 10.2.9, it suffices to show that the restriction of $\varphi$ to the closed unit ball $C_1'$ in $X'$ is w∗-continuous. But this topology is simply the topology of pointwise convergence on $C_1'$. Thus we have reduced the problem to showing that $\hat A|_{C_1'}$ is relatively p-compact in the space of continuous functions on $C_1'$. But this follows from the hypothesis and 14.1.1, since $\hat A|_{C_1'}$ is relatively p-sequentially compact.
By the proof of the Vitali-Hahn-Saks theorem (5.2.4), given ε > 0 there exists δ > 0 and
m ∈ N such that
Now observe that $\eta \ll \mu$, hence we may choose $\delta_0 > 0$ so that $\mu(A) < \delta_0 \Rightarrow \eta(A) < \delta$. For such $A$, $\sup_n |\mu_n(A)| \le 3\varepsilon$ from (‡). Thus, by 4.4.2, $(g_n)$ is uniformly integrable.
Suppose U is weakly relatively compact but not uniformly integrable. Then there exists δ > 0 such that
\[
\lim_{n\to\infty} \sup_{f \in U} \int_{\{|f| > n\}} |f|\,d\mu \ge 2\delta.
\]
By the Eberlein-Šmulian theorem, $(f_n)$ has a subsequence $(g_n)$ that converges weakly to some $g$. But then $(g_n)$ is uniformly integrable by 14.2.3, contradicting (α).
Conversely, suppose that U is uniformly integrable. We show that a sequence (fn ) in U
has a weakly convergent subsequence. By considering real, imaginary, positive, and negative
parts, we may assume that fn ≥ 0 for all n. Note that by 4.4.2
Then choose N so that |µn (Hk ∩ G) − µm (Hk ∩ G)| < ε for all m, n ≥ N . By the triangle
inequality, |µn (H ∩ G) − µm (H ∩ G)| < 3ε for all n, m ≥ N . Therefore the sequence
(µn (H ∩ G)) is Cauchy, hence H ∈ H.
We have shown that the limit η(E) exists for all E ∈ F0. By the Vitali-Hahn-Saks theorem, η is a measure, and clearly $\eta \ll \mu$. By the Radon-Nikodym theorem, there exists an F0-measurable function g such that dη = g dµ. Thus
\[
\int h g\,d\mu = \lim_n \int h g_n\,d\mu \tag{γ}
\]
holds for all F0 -measurable indicator functions h, hence for all F0 -simple functions. Since
the simple functions are dense in L∞ (F0 , µ) (4.2.1), an approximation argument shows that
(γ) holds for all h ∈ L∞ (F0 , µ). Therefore, gn → g weakly in the subspace L1 (F0 , µ), hence
also in the ambient space L1 (F, µ).
Since |βj | ≤ 1, cl B is the closed convex hull of the weakly compact set (cl D) · K and hence
is weakly compact.
Mazur’s Theorem
Here is an analog of the Krein-Šmulian theorem for Fréchet spaces, but in the original topology.
is continuous, co F is compact. Therefore, there exists a finite set E such that co F ⊆ V + E. It follows that co K ⊆ U + E. Indeed, let $y \in \operatorname{co} K$, say $y := \sum_{j=1}^m t_j y_j$, where $y_j \in K$, $t_j \ge 0$, and $\sum_{j=1}^m t_j = 1$. By choice of $F$, there exist $z_j \in F$ such that $y_j - z_j \in V$. By convexity of $V$,
\[
y = \sum_{j=1}^m t_j (y_j - z_j) + \sum_{j=1}^m t_j z_j \in V + \operatorname{co} F \subseteq V + V + E \subseteq U + E.
\]
We may assume each $c_j \ne 0$, otherwise reduce the above sums accordingly. Choose $k$ so that $|c_j/t_j| \le |c_k/t_k|$ for $j = 1, \ldots, m$. Then $t_j/t_k \ge |c_j/c_k| \ge c_j/c_k$, hence, using (†), we have
\[
t_j - t_k\frac{c_j}{c_k} \ge 0, \qquad \sum_{j=1}^m \Big(t_j - t_k\frac{c_j}{c_k}\Big) = 1, \qquad\text{and}\qquad \sum_{j=1}^m \Big(t_j - t_k\frac{c_j}{c_k}\Big) a_j = x.
\]
Since the kth coefficient in the last sum is 0, x is now expressed as a convex combination of fewer than m vectors in A. Continuing this reduction process verifies the claim.
Now let $S = \{(t_1, \ldots, t_{d+1}) : t_j \ge 0,\ \sum_j t_j = 1\}$. By the result of the previous paragraph applied to A = K, we see that co K is the image of the compact set $S \times K \times \cdots \times K$ under the continuous map $(t_1, \ldots, t_{d+1}, x_1, \ldots, x_{d+1}) \mapsto \sum_{j=1}^{d+1} t_j x_j$. Therefore, co K is compact.
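The reduction step in the preceding argument can be carried out mechanically. The following is a minimal sketch in exact rational arithmetic, assuming that (†) asserts the existence of coefficients $c_j$, not all zero, with $\sum_j c_j = 0$ and $\sum_j c_j a_j = 0$ (which is available whenever more than d + 1 points of $\mathbb{R}^d$ are involved). The function names `null_vector` and `caratheodory` and the sample points are illustrative, not taken from the text.

```python
from fractions import Fraction

def null_vector(rows, m):
    """Return a nonzero c in Q^m with row . c = 0 for every row (needs m > len(rows))."""
    A = [[Fraction(v) for v in r] for r in rows]
    pivots, r = [], 0
    for col in range(m):                      # Gauss-Jordan elimination
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        A[r] = [v / A[r][col] for v in A[r]]
        for i in range(len(A)):
            if i != r and A[i][col] != 0:
                A[i] = [vi - A[i][col] * vr for vi, vr in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
        if r == len(A):
            break
    free = next(c for c in range(m) if c not in pivots)
    c = [Fraction(0)] * m
    c[free] = Fraction(1)                     # set one free variable to 1
    for row, pc in zip(A, pivots):
        c[pc] = -row[free]                    # solve for the pivot variables
    return c

def caratheodory(points, weights):
    """Rewrite sum t_j a_j as a convex combination of at most d + 1 of the a_j."""
    d = len(points[0])
    pairs = [(tuple(Fraction(v) for v in p), Fraction(t))
             for p, t in zip(points, weights) if t != 0]
    pts, ts = [p for p, _ in pairs], [t for _, t in pairs]
    while len(pts) > d + 1:
        m = len(pts)
        # Rows encode sum c_j a_j = 0 (one row per coordinate) and sum c_j = 0.
        rows = [[p[i] for p in pts] for i in range(d)] + [[1] * m]
        c = null_vector(rows, m)
        k = max(range(m), key=lambda j: abs(c[j] / ts[j]))
        new = [ts[j] - ts[k] * c[j] / c[k] for j in range(m)]   # k-th entry is 0
        keep = [j for j in range(m) if new[j] != 0]
        pts, ts = [pts[j] for j in keep], [new[j] for j in keep]
    return pts, ts

points = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]   # hypothetical data in R^2
weights = [Fraction(1, 5)] * 5
reduced_pts, reduced_ts = caratheodory(points, weights)
```

Each pass removes at least one point while keeping the weights nonnegative with sum 1 and the barycenter unchanged, which is exactly the content of the displayed relations.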
Proof. Assume that (a) holds. The verification of (b) is a simple induction argument. Indeed, the assertion is obviously true for $n = 2$, and if the assertion holds for $n$ and $z = \sum_{j=1}^{n+1} t_j x_j$, then, setting $t = 1 - t_{n+1}$, we have
\[
z = t \sum_{j=1}^n \frac{t_j}{t}\,x_j + (1 - t)x_{n+1}, \qquad \sum_{j=1}^n \frac{t_j}{t} = 1,
\]
Proof. Let x ∈ int K and choose y ∈ int K with y ≠ x. Since the interior of a convex set is convex (9.1.1), the function f(t) = y + t(x − y) = tx + (1 − t)y maps [0, 1] into int K. Since f is continuous, f[0, 1 + ε] ⊆ int K for some ε > 0. Set z = f(1 + ε) = (1 + ε)x − εy. Then z, y ∈ K, z ≠ y, and $x = \varepsilon(1 + \varepsilon)^{-1} y + (1 + \varepsilon)^{-1} z$, hence x is not an extreme point.
14.4.3 Examples.
(a) Let X be a strictly convex normed space. It follows from 8.1.8 that the extreme points
of the closed unit ball in X are the points on the boundary S1 . In particular, this holds for
Hilbert spaces and Lp spaces (1 < p < ∞).
(b) The closed unit ball C1 in c0 has no extreme points. Indeed, if x = (xn ) ∈ S1 and n is
chosen so that |xn | < 1/2, then the equation
is closed and does not contain $x_0$, hence we may choose a nonzero $g \in C(X)$ such that $g = 0$ on $C$ and $\|g\|_\infty < r$, $r > 0$ to be determined. Now, $f = \tfrac12(f + g) + \tfrac12(f - g)$ so if we can choose $r$ so that $\|f \pm g\| \le 1$ it will follow that $f$ is not extreme. Thus it suffices to show that for suitable $r$, $|f(x)| + |g(x)| \le 1$ for $x \in C^c$. But for such $x$,
hence
\[
|f(x)| + |g(x)| \le \tfrac12\big(1 + |f(x_0)|\big) + r.
\]
Choosing $r = \tfrac12\big(1 - |f(x_0)|\big)$ completes the argument.
A similar argument shows that the extreme points of the closed unit ball in L∞ are
the functions f with |f (x)| = 1 a.e. (Or one may use the fact that L∞ is isometric and
isomorphic to C(X), where X is the spectrum of the C ∗ -algebra L∞ .)
(f) Let X be a compact Hausdorff space. Identify the dual of C(X) with the space of all complex regular Borel measures µ on X with total variation norm |µ|(X). Let $C_1'$ denote the closed unit ball in $C(X)'$ and P the convex subset of probability measures. We show:
(i) The extreme points of $C_1'$ are the complex measures $c\delta_x$, $c \in \mathbb{C}$, $|c| = 1$.
(ii) The extreme points of P are the Dirac measures $\delta_x$.
To see that $c\delta_x$ is extreme in $C_1'$, let $c\delta_x = t\mu + (1 - t)\nu$, where $\mu, \nu \in C_1'$ and $0 < t < 1$. For any Borel set $E \ni x$,
hence $\mu(E) = \nu(E) = c$. In particular, $\mu(X) = \nu(X) = c$, hence $\mu(E^c) = \mu(X) - \mu(E) = 0 = \nu(E^c)$. Therefore, $\mu = \nu$ and so $c\delta_x$ is extreme in $C_1'$.
Conversely, suppose that $\|\mu\| = |\mu|(X) = 1$ and that the support $K$ of $|\mu|$ contains at least two points $x$ and $y$. Choose disjoint open sets $U \ni x$ and $V \ni y$. Then $|\mu|(U) > 0$ and $|\mu|(V) > 0$, hence also $|\mu|(U^c) > 0$. Define
\[
\nu(E) = \frac{\mu(U \cap E)}{|\mu|(U)} \quad\text{and}\quad \eta(E) = \frac{\mu(U^c \cap E)}{|\mu|(U^c)},
\]
and convex, B ⊆ K. Suppose the containment is proper, and let x ∈ K \ B. By 9.3.2, there
exists a real continuous linear functional f such that f (x) < inf f (B). Now, since the set
C := {z ∈ K : f (z) = inf f (K)} is nonempty, compact, and convex, it has an extreme point
z, by the first paragraph. Since C is an extreme subset of K (by the lemma), z is an extreme
point of K. In particular z ∈ B, which is impossible, since f (z) = inf f (K) ≤ f (x) < f (y)
for all y ∈ B. Therefore, it must be the case that B = K.
The following theorem describes a minimality property of ex K. It asserts that the closure
of any subset E of K that “generates” K must already contain the extreme points of K.
14.4.6 Theorem. Let X be a LCS and let K ⊆ X be a nonempty, compact, convex, subset
of X. If K = cl co E, then ex K ⊆ cl E.
Proof. We may assume that E is closed, hence compact. Suppose for a contradiction that x
is an extreme point of K not contained in E. Let U be a closed, balanced, neighborhood of
zero such that (x + U ) ∩ E = ∅. By compactness, there exist z1 , . . . , zn ∈ E such that the
sets zj + U cover E. Set Ej := E ∩ (zj + U ), these sets being compact and contained in K.
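Now, the mapping
\[
\Big\{(t_1, \ldots, t_n) : t_j \ge 0,\ \sum_{j=1}^n t_j = 1\Big\} \times E_1 \times \cdots \times E_n \to K : (t_1, \ldots, t_n, x_1, \ldots, x_n) \mapsto \sum_{j=1}^n t_j x_j
\]
is continuous and so has compact range $\operatorname{co} \bigcup_{j=1}^n E_j$. Since $E \subseteq \operatorname{co} \bigcup_{j=1}^n E_j$, we have
\[
K = \operatorname{cl}\operatorname{co} E \subseteq \operatorname{cl}\operatorname{co} \bigcup_{j=1}^n E_j = \operatorname{co} \bigcup_{j=1}^n E_j.
\]
Thus $x$ may be expressed as $x = \sum_{j=1}^n t_j x_j$, where $x_j \in E_j$, $t_j \ge 0$, and $\sum_{j=1}^n t_j = 1$. Since $x$ is extreme, $x = x_j$ for some $j$. Thus $x \in E_j \subseteq z_j + U \subseteq E + U$. But then $x = e + u$ for some $e \in E$ and $u \in U$, producing the contradiction $x - u = e \in (x + U) \cap E = \emptyset$.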
14.4.7 Remarks.
(a) The set of extreme points of a compact convex set need not be closed, even in the finite
dimensional case, as the figure illustrates.
(b) If X is the dual of a normed space, then the closed unit ball C1 is weak∗ compact and so
C1 is the closed convex hull of its extreme points. Thus, by (b) and (c) of 14.4.3, L1 (Rd , λd )
and c0 are not dual spaces.
(c) The space C(X, R), where X is a nontrivial, compact, connected, Hausdorff topological space, is not a dual space. Indeed, the extreme points of $C_1$ are the functions f with |f| = 1. For such a function, $f^{-1}\{-1\}$ and $f^{-1}\{1\}$ are disjoint open sets whose union is X, hence one of these sets must equal X. Therefore, the extreme points of $C_1$ are the constant functions ±1 and so ex $C_1$ consists of constant functions. However, Urysohn’s lemma implies the existence of nonconstant functions in $C_1$. Thus $\operatorname{cl}\operatorname{co}(\operatorname{ex} C_1) \ne C_1$, verifying the assertion. ♦
Let $f \in C(X)$. Since $T(\mu_n) = \frac{1}{n} \sum_{j=1}^n T^j(\mu)$ we have
Thus if ν is any weak∗ limit point of (µn ), then T (ν)(f ) = ν(f ) for all f ∈ C(X), hence PT
is nonempty.
Now let $\mu \in P_T$. The mapping $Uf := f \circ T$ maps $L^2(\mu)$ onto $L^2(\mu)$ and is unitary. By the mean ergodic theorem (12.1.16),
\[
\lim_n \frac{1}{n} \sum_{j=0}^{n-1} U^j f = Pf \tag{†}
\]
in $L^2(\mu)$ norm, where $P$ is the projection onto the closed linear subspace of $L^2(\mu)$ consisting of those $g \in L^2$ with $Ug = g$. Applying the continuous linear functional $h \mapsto \int h\,d\mu = \mu(h)$, we see by invariance of $\mu$ that $\mu(Pf) = \mu(f)$.
We claim that if µ is ergodic, then P f must be constant and that constant must be
µ(f ). To see this, observe that U maps real functions onto real functions, hence so does P .
By considering real and imaginary parts, we may take f to be real. Set g = P f . Because
g = U g = g ◦ T , the set An := {g ≥ µ(g) + 1/n} satisfies T −1 (An ) = An µ-a.e. and so has
measure zero or one. If the measure were one, then by integrating we would obtain the
absurdity µ(g) ≥ µ(g) + 1/n. Therefore, µ(An ) = 0 for all n and so g ≤ µ(g) µ-a.e. A similar
argument shows that g ≥ µ(g) µ-a.e. Thus P f = µ(P f ) = µ(f ) for all f ∈ C(X), verifying
the claim.
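A finite model illustrates the claim that Pf is the constant µ(f) in the ergodic case. The sketch below assumes X = {0, ..., m − 1} with the cyclic shift T(x) = x + 1 mod m and uniform µ, which is T-invariant and ergodic because the shift has a single orbit. The names `cesaro_average`, `shift`, and `square` are illustrative. In this finite setting, norm convergence in $L^2(\mu)$ is the same as convergence at every point.

```python
from fractions import Fraction

def cesaro_average(f, T, x, n):
    """(1/n) * sum_{j=0}^{n-1} f(T^j x): the n-th average of U^j f evaluated at x."""
    total, y = Fraction(0), x
    for _ in range(n):
        total += f(y)
        y = T(y)
    return total / n

m = 7                                   # hypothetical size of the state space

def shift(x):
    return (x + 1) % m                  # cyclic shift; preserves the uniform measure

def square(x):
    return Fraction(x * x)              # any function on {0, ..., m - 1}

mean = sum(square(x) for x in range(m)) / m          # mu(f) for uniform mu

# Over whole periods the average equals mu(f) exactly, at every starting point.
full_period = [cesaro_average(square, shift, x, 10 * m) for x in range(m)]

# For general n the average is only close to mu(f), with error of order 1/n.
error = abs(cesaro_average(square, shift, 0, 10**4) - mean)
```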
Setting $t = \mu(A)$ we have $0 < t < 1$ and $\mu = t\nu + (1 - t)\eta$, hence $\mu$ is not extreme. This verifies the claim for the case $\mathcal{T} = \{T\}$.
Now consider the general case. Let $S, T \in \mathcal{T}$. Because the maps commute, $T$ maps $P_S$ onto itself, hence we may restrict the mapping $\mu \to T(\mu)$ in the argument of the second paragraph to the set $P_S$ to conclude that $\mu$ is both $T$- and $S$-invariant, that is, $P_{\{S,T\}} \ne \emptyset$. More generally, $P_F \ne \emptyset$ for every finite $F \subseteq \mathcal{T}$. Since these sets are compact, their intersection $P_{\mathcal{T}}$ is nonempty. The entire argument in the preceding paragraphs then goes through if $\{T\}$ is replaced by $\mathcal{T}$.
Proof. We identify the space L∞ (E) with the subspace of all f ∈ L∞ (X) such that f = 0 on
E c . Since ν is σ-finite, we may suppose that ν(E) < ∞, otherwise consider a subset F of E
with positive finite measure and work with the subspace L∞ (F ). Set E0 = E. Since ν is non-
atomic, we may choose measurable sets En such that En ⊆ En−1 and 0 < ν(En ) < ν(En−1 )
for all n. Set Fn = En−1 \ En . Then the sets Fn are disjoint and have positive measure
implying that the indicator functions 1Fn are linearly independent.
14.5.5 Theorem (Lyapunov). Let $\mu_1, \ldots, \mu_d$ be real-valued non-atomic measures on F. For E ∈ F, define $\mu(E) = \big(\mu_1(E), \ldots, \mu_d(E)\big)$. Then the set $\mu(F) := \{\mu(E) : E \in F\}$ is a compact convex subset of $\mathbb{R}^d$.
Proof. Set $\nu = \sum_{j=1}^d |\mu_j|$ and note that $\nu$ is a non-atomic measure on (X, F) with $|\mu_j| \ll \nu$ for each $j$. By the Radon-Nikodym theorem, there exists $g_j \in L^1(\nu)$ such that $d\mu_j = g_j\,d\nu$, so $\mu = (g_1\nu, \ldots, g_d\nu)$. Define a linear map
\[
T : L^\infty(\nu) \to \mathbb{R}^d, \qquad Tf = \Big(\int_X f\,d\mu_1, \ldots, \int_X f\,d\mu_d\Big) = \Big(\int_X f g_1\,d\nu, \ldots, \int_X f g_d\,d\nu\Big).
\]
Then $T$ is continuous with respect to the weak∗ topology of $L^\infty(\nu) = (L^1)'$ and the norm topology of $\mathbb{R}^d$. Moreover, $T1_E = \mu(E)$ for all E ∈ F. Now consider the convex set $C := \{f \in L^\infty(\nu) : 0 \le f \le 1\ \nu\text{-a.e.}\}$. If $(f_\alpha)$ is a net in $C$ that w∗-converges to $f$, then $0 \le \int_E f\,d\nu \le \int_E 1\,d\nu$ for all $E$, hence $0 \le f \le 1$ ν-a.e. Therefore, $C$ is w∗-closed and so is w∗-compact, by the Banach-Alaoglu theorem. Thus $T(C)$ is compact and convex in $\mathbb{R}^d$.
We claim that µ(F) = T (C), which will prove the theorem. By definition of C, we
have µ(E) = T (1E ) ∈ T (C) for all E, that is, µ(F) ⊆ T (C). For the reverse inclusion, let
x ∈ T (C) and consider the convex, weak∗ compact set K := {f ∈ C : T f = x}. By the
Krein-Milman theorem, K has an extreme point g. We show that g is an indicator function.
If not, then $\nu\{g(1 - g) \ne 0\} > 0$ and so for some ε > 0 the set $E := \{\varepsilon \le g \le 1 - \varepsilon\}$ has positive ν measure. By the lemma, $L^\infty(E)$ is infinite dimensional. Since $T L^\infty(E)$ is finite dimensional, it follows that $T$ cannot be 1-1, hence $Th = 0$ for some nonzero $h \in L^\infty(E)$ with $\|h\|_\infty \ne 0$. Multiplying by $\varepsilon \|h\|_\infty^{-1}$, we may assume that $-\varepsilon \le h \le \varepsilon$. But then $g \pm h$ are distinct members of $K$ and $g = \tfrac12(g + h) + \tfrac12(g - h)$, contradicting that $g$ is extreme in
Proof. By 14.5.8, it suffices to show that T is noncontracting. Let x, y ∈ C and let $(T_\alpha)$ be a net in T such that $T_\alpha x - T_\alpha y \to 0$. Let V be an arbitrary neighborhood of 0 and choose U as in the theorem. Next, choose $\alpha_0$ such that $T_\alpha(x) - T_\alpha(y) \in U$ for all $\alpha \ge \alpha_0$. For such α, $x - y = T_\alpha^{-1} T_\alpha x - T_\alpha^{-1} T_\alpha y \in V$. Since V was arbitrary, x = y.
We are now in a position to prove the main result of the section. We give a nontrivial
application in Chapter 16.
14.5.10 Theorem (Ryll-Nardzewski). Let C be a nonempty, weakly compact, convex subset
of a locally convex space Xτ and let T be a τ -noncontracting semigroup of weakly continuous
affine maps from C into itself. Then T has a fixed point.
Proof. (Dugundji-Granas) By 14.5.6, it suffices to prove that $\bigcap_{T \in F} F_T \ne \emptyset$, where F ⊆ T is
finite. Let S denote the subsemigroup of T generated by F. Then S consists of all products
of members of F. Choose any point x0 ∈ C. Since S is countable, the S-invariant convex
set K := clτ co(Sx0 ) ⊆ C is τ -separable. Moreover, by 10.1.6, K is weakly closed. Let X be
a weakly closed, minimal, S-invariant subset of K. We show that S is noncontracting on X
in the weak topology. It will follow from 14.5.7 that S has a fixed point in K, proving the
theorem.
Let x and y be distinct members of X. Since S is τ -noncontracting, there exists a τ -open,
convex neighborhood U of 0 such that the neighborhood V − V of zero is disjoint from
{Sx − Sy : S ∈ S}, where V := cl U . Since {z + U : z ∈ X} is a cover of X and X is
τ -separable, there exist countably many sets (zn + V ) ∩ X that cover X. Since V is τ -closed
and convex, it is weakly closed. Therefore, the weakly compact set X is a countable union of
weakly closed sets (zn + V ) ∩ X. By Baire’s theorem (0.12.5), some set (zn + V ) ∩ X contains
a nonempty, weakly open set W. Now, the collection $\{S^{-1}(W) : S \in S\}$ of weakly open sets covers X; otherwise $X \setminus \bigcup_{S \in S} S^{-1}(W)$ would be a weakly closed, nonempty, S-invariant subset properly contained in the minimal set X. To show that S is noncontracting in the weak topology, suppose for a contradiction that there exists a net $(S_\alpha)$ in S such that w-$\lim_\alpha [S_\alpha x - S_\alpha y] = 0$. We may assume by the weak compactness of X that the limits w-$\lim_\alpha S_\alpha x$ and w-$\lim_\alpha S_\alpha y$ exist and hence are equal. Let z denote their common value. Since $z \in X \subseteq \bigcup_{S \in S} S^{-1}(W)$, we may choose S so that Sz ∈ W, implying that $SS_\alpha x$ and $SS_\alpha y$ are eventually in $W \subseteq z_n + V$. But then $SS_\alpha x - SS_\alpha y$ is eventually in V − V, contradicting the choice of V.
Moreover, if X is a Hilbert space and (14.3) holds for weakly integrable f and g, then
\[
\Big(\int_X f\,d\mu \,\Big|\, \int_X g\,d\mu\Big) = \int_X \int_X \big(f(x) \mid g(y)\big)\,d\mu(x)\,d\mu(y). \tag{14.5}
\]
Indeed, taking $x'(\cdot) = \big(\,\cdot \mid \int_X g\,d\mu\big)$ in (14.3), we have
\[
\Big(\int_X f\,d\mu \,\Big|\, \int_X g\,d\mu\Big) = \int_X \Big(f(x) \,\Big|\, \int_X g(y)\,d\mu(y)\Big)\,d\mu(x),
\]
and using (14.3) on the inner integral yields (14.5). Finally, if T : X → Y is continuous and linear and (14.3) holds for f and Tf, where (Tf)(x) = T(f(x)), then
\[
\Big\langle T\!\int_X f\,d\mu,\ y'\Big\rangle = \Big\langle \int_X f\,d\mu,\ T'y'\Big\rangle = \int_X \langle f, T'y'\rangle\,d\mu = \int_X \langle Tf, y'\rangle\,d\mu = \Big\langle \int_X Tf\,d\mu,\ y'\Big\rangle,
\]
that is,
\[
T\!\int_X f(x)\,d\mu(x) = \int_X T(f(x))\,d\mu(x). \tag{14.6}
\]
For the construction of the weak integral, we consider first the case of a Banach space.
In particular,
\[
\langle x', T_f' 1_E\rangle = \int_E \langle f, x'\rangle\,d\mu, \qquad E \in F,\ x' \in X'.
\]
We denote $T_f' 1_E$ by $\int_E f\,d\mu$. Thus $\int_E f\,d\mu$ is the unique member of $X''$ satisfying
\[
\Big\langle x', \int_E f\,d\mu\Big\rangle = \int_E \langle f(x), x'\rangle\,d\mu(x), \qquad \forall\ E \in F \text{ and } x' \in X'. \tag{14.7}
\]
The vector $\int_E f\,d\mu$ is called the Dunford integral of f over E. If $\int_E f\,d\mu \in X\ (= \hat X)$ for all E, then f is said to be Pettis integrable and $\int_E f\,d\mu$ is called the Pettis integral of f over E. In this case, (14.7) may be written
\[
\Big\langle \int_E f\,d\mu,\ x'\Big\rangle = \int_E \langle f(x), x'\rangle\,d\mu(x), \qquad \forall\ E \in F \text{ and } x' \in X'. \tag{14.8}
\]
If X is reflexive, then the Dunford and Pettis integrals clearly coincide. The following example shows this is not necessarily the case for nonreflexive spaces.
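14.6.1 Example. Let µ be counting measure on $\mathbb{N}$ and let $a = (a_n) \in \ell^\infty(\mathbb{N})$. Define $f : \mathbb{N} \to c_0$ by $f(n) = a_n e_n$. For any $x = (x_n) \in \ell^1(\mathbb{N}) = c_0'$,
\[
\langle f(n), x\rangle = \sum_{j=1}^\infty a_j x_j e_n(j) = a_n x_n,
\]
hence
\[
\int_{\mathbb{N}} \langle f(n), x\rangle\,d\mu(n) = \sum_{n=1}^\infty a_n x_n = \langle x, a\rangle.
\]
Therefore, f is Dunford integrable with $\int_{\mathbb{N}} f\,d\mu = a \in \ell^\infty(\mathbb{N}) = c_0''$, and f is Pettis integrable iff $a \in c_0$. ♦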
The next theorem gives a simple sufficient condition for Pettis integrability.
as required.
The question of countable additivity of the Dunford and Pettis integrals is of critical importance in applications. The Dunford integral is countably additive in the weak∗ sense but not in the norm sense. To see the former, let $(E_n)$ be a disjoint sequence in F and set $F_n = \bigcup_{j=1}^n E_j$ and $E = \bigcup_n E_n$. Then for all $x'$,
\[
\lim_n \Big\langle x', \int_{F_n} f\,d\mu\Big\rangle = \lim_n \int_{F_n} \langle f(x), x'\rangle\,d\mu(x) = \int_E \langle f(x), x'\rangle\,d\mu(x) = \Big\langle x', \int_E f\,d\mu\Big\rangle,
\]
On the other hand, taking a = (1, 1, . . . ) in Example 14.6.1 we have for any n
\[
\Big\langle e_n, \int_{\mathbb{N}} f\,d\mu\Big\rangle = \int_{\mathbb{N}} \langle e_n, f(k)\rangle\,d\mu(k) = \int_{\mathbb{N}} \langle e_n, e_k\rangle\,d\mu(k) = \sum_k \langle e_n, e_k\rangle = 1,
\]
It is a remarkable fact that, in contrast to the Dunford integral, the Pettis integral is also
countably additive in the norm sense. This may be seen as a consequence of the Orlicz-Pettis
theorem regarding weak subseries-convergence of sequences. For details the reader is referred
to [12] or [45].
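The contrast can be seen numerically in Example 14.6.1. The sketch below assumes a finite window of N coordinates stands in for an infinite sequence, so it only suggests the behavior of the tails. With $F_n = \{1, \ldots, n\}$, the integral of f over $F_n$ is the sequence a cut off after n terms, and its sup-norm distance to a is $\sup_{k > n} |a_k|$. The names `partial_integral` and `sup_dist` are illustrative.

```python
def partial_integral(a, n):
    """Integral of f(k) = a_k e_k over F_n = {1, ..., n}: a truncated after n terms."""
    return [a[k] if k < n else 0.0 for k in range(len(a))]

def sup_dist(u, v):
    """Sup-norm distance between two finite sequences of equal length."""
    return max(abs(s - t) for s, t in zip(u, v))

N = 1000                                        # finite window (an assumption)
ones = [1.0] * N                                # a = (1, 1, ...), not in c_0
harmonic = [1.0 / (k + 1) for k in range(N)]    # a = (1/k), in c_0

cuts = (10, 100, 500)
tail_ones = [sup_dist(ones, partial_integral(ones, n)) for n in cuts]
tail_harm = [sup_dist(harmonic, partial_integral(harmonic, n)) for n in cuts]
```

For a = (1, 1, ...) the distance stays at 1, so the partial integrals do not converge in norm, whereas for a in $c_0$ the distance tends to 0, in line with the norm countable additivity of the Pettis integral.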
14.6.4 Corollary. Let X be a LCS, X a compact Hausdorff topological space, and P(X) the set of all Radon probability measures µ on X. If f : X → X is continuous and cl co f(X) is compact, then the integral $\int_X f\,d\mu$ exists in X. Moreover, the mapping $\mu \to \int_X f\,d\mu$ from P(X) to X is w∗-w continuous.
We say that f is Bochner integrable if $\mu(E_k) < \infty$ for all k. In this case we define the Bochner integral of f by
\[
\int_X f\,d\mu = \sum_{k=1}^n \mu(E_k)\,x_k.
\]
Note that
\[
\Big\| \int_X f\,d\mu \Big\| \le \sum_{k=1}^n \mu(E_k)\,\|x_k\| = \int_X \|f\|\,d\mu. \tag{14.9}
\]
An argument entirely similar to that of 3.1.1(b) shows that for Bochner integrable simple functions f and g and scalars c,
\[
\int_X (f + g)\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu \quad\text{and}\quad \int_X cf\,d\mu = c\int_X f\,d\mu.
\]
Thus the integral is linear on the vector space of all Bochner integrable simple functions.
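For a concrete instance of the definition and of (14.9), the sketch below takes values in $\mathbb{R}^2$ with the Euclidean norm and assumes the sets $E_k$ are disjoint, so that the norm of f equals $\|x_k\|$ on $E_k$. The data and the names `bochner_simple` and `norm` are hypothetical.

```python
import math

def bochner_simple(terms):
    """Integral of a simple function: terms is a list of (mu(E_k), x_k), x_k in R^2."""
    return tuple(sum(mu * x[i] for mu, x in terms) for i in range(2))

def norm(v):
    return math.hypot(*v)

# Hypothetical simple function: values x_k on disjoint sets E_k of measure mu(E_k).
terms = [(0.5, (1.0, 0.0)), (0.25, (0.0, 2.0)), (0.25, (-1.0, 0.0))]

integral = bochner_simple(terms)                  # sum of mu(E_k) * x_k
lhs = norm(integral)                              # norm of the integral
rhs = sum(mu * norm(x) for mu, x in terms)        # integral of the norm
```

The gap between `lhs` and `rhs` comes from cancellation between the values (1, 0) and (−1, 0), which the integral of the norm cannot see.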
A function f : X → X is strongly measurable if there exists a sequence of simple functions $f_n : X \to X$ such that $\lim_n \|f_n(x) - f(x)\| = 0$ for µ-a.a. x ∈ X. In this case we write $f_n \xrightarrow{\text{a.e.}} f$. It is easy to check that the set of strongly measurable functions is a linear space under pointwise operations. Moreover, since $\|f_n\|$ is measurable and $\|f_n(x)\| \to \|f(x)\|$ a.e., we see that the norm of a strongly measurable function is measurable.
A strongly measurable function f : X → X is said to be Bochner integrable if there exists a sequence of Bochner integrable simple functions $f_n$ such that
\[
f_n \xrightarrow{\text{a.e.}} f \quad\text{and}\quad \lim_n \int_X \|f_n - f\|\,d\mu = 0.
\]
We shall call the sequence of simple functions $(f_n)$ in this definition a defining sequence for the integral of f. To see that the limit in (14.10) exists, note that by (14.9)
\[
\Big\| \int_X f_m\,d\mu - \int_X f_n\,d\mu \Big\| \le \int_X \|f_m - f_n\|\,d\mu \le \int_X \|f_m - f\|\,d\mu + \int_X \|f_n - f\|\,d\mu,
\]
hence $\big(\int f_n\,d\mu\big)$ is a Cauchy sequence and so converges. To see that the limit in (14.10) is independent of the defining sequence $(f_n)$, let $(g_n)$ be another such sequence. Then
\[
\Big\| \int_X f_n\,d\mu - \int_X g_n\,d\mu \Big\| \le \int_X \|f_n - f\|\,d\mu + \int_X \|g_n - f\|\,d\mu \to 0.
\]
and
\[
\int_X cf\,d\mu = \lim_n \int_X cf_n\,d\mu = \lim_n c\int_X f_n\,d\mu = c\int_X f\,d\mu.
\]
The last assertion of the proposition follows from the obvious fact that $(f_n 1_E)$ is a defining sequence for $\int_E f$.
Here is a useful characterization of Bochner integrability:
14.6.6 Theorem. Let f : X → X be strongly measurable. Then f is Bochner integrable iff $\|f(\cdot)\|$ is integrable. In this case, $\big\| \int_X f\,d\mu \big\| \le \int_X \|f\|\,d\mu$.
Proof. If f is Bochner integrable and $(f_n)$ is a defining sequence for $\int_X f$, then, by definition, $\|f_n - f\|$ is integrable. Since $\|f\| \le \|f_n - f\| + \|f_n\|$, $\|f\|$ is integrable.
Conversely, assume that $\|f\|$ is integrable. Choose a sequence of simple functions $f_n$ with $\|f_n(x) - f(x)\| \to 0$ µ-a.e. Set $E_n = \{x : \|f_n(x)\| \le 2\|f(x)\|\}$ and $g_n := f_n 1_{E_n}$. Then $g_n$ is simple, and because $\|f_n(x)\| \to \|f(x)\|$, $x \in E_n$ for all sufficiently large n and so $\|g_n(x) - f(x)\| \to 0$ a.e. Since $\|g_n(x) - f(x)\| \le 3\|f(x)\|$, $\int \|g_n - f\|\,d\mu \to 0$ by the dominated convergence theorem. Therefore, f is Bochner integrable, $\int g_n\,d\mu \to \int f\,d\mu$, and $\int \|g_n\|\,d\mu \to \int \|f\|\,d\mu$, the last limit by the dominated convergence theorem. Finally, taking limits in $\big\| \int_X g_n\,d\mu \big\| \le \int_X \|g_n\|\,d\mu$, we obtain the desired inequality.
Next, we prove a dominated convergence theorem for the Bochner integral. For this we
need the following lemma.
14.6.7 Lemma. Suppose that (X, F, µ) is σ-finite. If $f_n : X \to X$ is strongly measurable for all n and $\lim_n \|f_n(x) - f(x)\| = 0$ a.e., then f is strongly measurable.
Proof. Since (X, F, µ) is σ-finite, there exists a positive, integrable function ψ on X (Ex. 3.25). Since $\|f_n - f_m\|$ is measurable, $\|f_n - f\| = \lim_m \|f_n - f_m\|$ is measurable. For measurable functions g and h, define
\[
d(g, h) = \int \frac{\|g(x) - h(x)\|}{1 + \|g(x) - h(x)\|}\,\psi\,d\mu.
\]
Since $\|f(x) - f_n(x)\| \to 0$ a.e., $d(f, f_n) \to 0$ by the dominated convergence theorem. For each n, let $(g_{n,k})_k$ be a sequence of simple functions converging a.e. in norm to $f_n$. Then $\lim_k d(f_n, g_{n,k}) = 0$. Since $d(f, g_{n,k}) \le d(f, f_n) + d(f_n, g_{n,k})$, we may choose a sequence of simple functions $h_n$ such that $d(f, h_n) \to 0$. Passing to a subsequence if necessary, we may assume (since ψ is positive) that $\|h_n - f\| \to 0$ a.e. Therefore, f is strongly measurable.
Now let $(f_n)$ be a defining sequence of simple functions for the integral of f. From $\|Tf_n(x) - Tf(x)\| \le \|T\|\,\|f_n(x) - f(x)\|$, we see that $(Tf_n)$ is a defining sequence of simple functions for the integral of Tf. Therefore,
\[
\int_X Tf\,d\mu = \lim_n \int_X Tf_n\,d\mu = \lim_n T\!\int_X f_n\,d\mu = T \lim_n \int_X f_n\,d\mu = T\!\int_X f\,d\mu.
\]
Here is the connection between the Pettis integral and the Bochner integral.
14.6.10 Proposition. If f : X → X is Bochner integrable, then f is Pettis integrable and
the Bochner and Pettis integrals coincide.
Proof. Let g be a Bochner integrable simple function, say $g = \sum_{k=1}^m x_k 1_{E_k}$, $\mu(E_k) < \infty$. Then for any $x' \in X'$,
\[
\Big\langle \int_X g,\ x'\Big\rangle = \Big\langle \sum_{k=1}^m \mu(E_k) x_k,\ x'\Big\rangle = \sum_{k=1}^m \mu(E_k) \langle x_k, x'\rangle = \int_X \sum_{k=1}^m \langle x_k, x'\rangle 1_{E_k}\,d\mu = \int_X \Big\langle \sum_{k=1}^m 1_{E_k} x_k,\ x'\Big\rangle\,d\mu = \int_X \langle g, x'\rangle\,d\mu. \tag{†}
\]
Now let $(g_n)$ be the defining sequence for the Bochner integral of f constructed in the proof of 14.6.6. For any $x' \in X'$, $(x' \circ f)(x) = \lim_n (x' \circ g_n)(x)$ a.e. and $|(x' \circ g_n)(x)| \le \|x'\|\,\|g_n(x)\| \le 2\|x'\|\,\|f(x)\|$, hence by the dominated convergence theorem and (†)
\[
\Big\langle \int_X f\,d\mu,\ x'\Big\rangle = \lim_n \Big\langle \int_X g_n\,d\mu,\ x'\Big\rangle = \lim_n \int_X \langle g_n, x'\rangle\,d\mu = \int_X \langle f, x'\rangle\,d\mu.
\]
Replacing f by $f 1_E$ shows that f is Pettis integrable and that the Bochner integral $\int_E f\,d\mu$ is the same as the Pettis integral.
Note that the function f in Example 14.6.1 is Bochner integrable iff $a \in \ell^1(\mathbb{N})$ (14.6.6). Choosing $a \in c_0 \setminus \ell^1(\mathbb{N})$ produces an example of a Pettis integrable function that is not Bochner integrable.
(f) $c(f - g) \le \|f - g\|_\infty$.
Proof. (a) The first inequality is clear, and the second follows from the fact that the constant function $h(x) := \|f\|_\infty$ is affine.
(b) Let 0 < t < 1. If $h_j \in A(K, \mathbb{R})$ and $h_j \ge f_j$, then $th_1 + (1 - t)h_2 \in A(K, \mathbb{R})$ and $tf_1 + (1 - t)f_2 \le th_1 + (1 - t)h_2$, hence
\[
c\big(tf_1 + (1 - t)f_2\big)(x) \le th_1(x) + (1 - t)h_2(x) \quad\text{for all } x \in K.
\]
For the second part of (b), note that if c(f )(x) < a, then h(x) < a for some h ∈ A(K, R)
with h ≥ f . Since h < a on some neighborhood U of x, the inequality c(f ) < a holds on U .
Thus {c(f ) < a} is open, hence measurable.
(c) Assume for a contradiction that f(y) < c(f)(y) for some y. Since f is continuous and concave, C := {(x, t) : t ≤ f(x)} is a closed convex subset of the real LCS X × R. Since $\big(y, c(f)(y)\big) \notin C$, by the separation theorem there exists an a ∈ R and a continuous linear functional F on X × R such that
\[
F(x, t) \le a < F\big(y, c(f)(y)\big) \quad \forall\ (x, t) \in C.
\]
In particular, $F(y, f(y)) < F\big(y, c(f)(y)\big)$, and by subtracting and normalizing we see that
\[
F(0, 1) = \frac{F\big(y, c(f)(y)\big) - F(y, f(y))}{c(f)(y) - f(y)} > 0.
\]
Now define
\[
h(x) := \frac{a - F(x, 0)}{F(0, 1)}, \qquad x \in K.
\]
Then
\[
F\big(x, h(x)\big) = F\big(0, h(x)\big) + F(x, 0) = h(x)F(0, 1) + F(x, 0) = a.
\]
If also $F(x, t) = a$, then $a - F(x, 0) = tF(0, 1)$, and dividing by $F(0, 1)$ shows that t = h(x). Thus h(x) is the unique real number satisfying F(x, h(x)) = a. It follows that h ∈ A(K, R). Since for all x ∈ K,
\[
a \ge F\big(x, f(x)\big) = F(x, 0) + f(x)F(0, 1) = a - h(x)F(0, 1) + f(x)F(0, 1),
\]
we have h ≥ f on K, so h ≥ c(f). On the other hand, the same computation applied to $a < F\big(y, c(f)(y)\big)$ gives $a < a - h(y)F(0, 1) + c(f)(y)F(0, 1)$, hence h(y) < c(f)(y). With this contradiction we see that (c) holds.
(d) The first inequality is proved by considering affine functions h and k majorizing f and
g, respectively, and noting that h + k is affine. The second follows from the fact that h is
affine iff th is affine.
(e) By (a) and (d) it suffices to show c(f ) + h ≤ c(f + h). But if k is affine and k ≥ f + h
then k − h is affine and majorizes f , so k − h ≥ c(f ), or k ≥ c(f ) + h. Taking infima over k
gives the desired inequality.
(f) By part (d), c(f ) = c(f −g+g) ≤ c(f −g)+c(g), hence c(f )−c(g) ≤ c(f −g) ≤ kf − gk∞ ,
the last inequality by (a).
We may now prove
14.7.3 Theorem (Choquet). Let X be a real LCS and K a nonempty, compact, convex,
metrizable subset of X. Then each x0 ∈ K is represented by a Radon probability measure µ
on cl ex K supported by ex K.
Since weak inequality holds for the remaining functions hk in the definition of g, it follows
that g(tx + (1 − t)cy) < tg(x) + (1 − t)g(cy), verifying the claim.
Fix x0 ∈ K and define a functional p on C(K, R) by p(f ) = c(f )(x0 ), where c(f ) is the
function in 14.7.2. From 14.7.2(d), p is subadditive and positively homogeneous. Define a
linear functional ϕ on the subspace B := A(K, R)+R g of C(K, R) by ϕ(h+rg) = (h+rg)(x0 ).
In particular, ϕ(h) = h(x0 ), ϕ(g) = g(x0 ), and ϕ(1) = 1. We claim that ϕ ≤ p on B, that
is,
h(x0 ) + rg(x0 ) ≤ c(h + rg)(x0 ) ∀ r ∈ R and h ∈ A(K, R).
Indeed, if r ≥ 0, then c(h + rg) = h + rc(g) ≥ h + rg by 14.7.2(a,e), and if r < 0, then rg is
concave, hence c(h + rg) = h + rg by 14.7.2(c), verifying the claim. By the Hahn-Banach
theorem, ϕ extends to a linear functional µ on C(K, R) such that µ ≤ p on C(K, R). Noting
that f ≤ 0 ⇒ µ(f ) ≤ p(f ) = c(f )(x0 ) ≤ 0, we see that µ is a positive linear functional
on C(K, R). Since ϕ(1) = 1, we may identify µ with a Radon probability measure on K.
Since h(x0) = ϕ(h) = µ(h) for all h ∈ A(K, R), µ represents x0. It remains to show that
supp µ ⊆ ex K.
We claim that µ(g) = µ(c(g)) (recalling that c(g) is a bounded Borel function). Indeed, by 14.7.2(a), µ(g) ≤ µ(c(g)). For the reverse inequality, let h ∈ A(K, R) and h ≥ g. Then h ≥ c(g), hence h(x0) = ϕ(h) = µ(h) ≥ µ(c(g)). Taking the infimum over all such h yields
\[ \mu(g) = \phi(g) = c(g)(x_0) \ge \mu\big(c(g)\big). \]
We now have $\int [c(g) - g]\,d\mu = 0$. Since c(g) − g ≥ 0 (14.7.2(a)), µ({g < c(g)}) = 0.
To complete the proof it therefore suffices to show that K \ ex K ⊆ {g < c(g)}. But if x ∈ K \ ex K and y ≠ z ∈ K with x = ½(y + z), then, by the strict convexity of g and the concavity of c(g),
\[ g(x) < \tfrac12 g(y) + \tfrac12 g(z) \le \tfrac12 c(g)(y) + \tfrac12 c(g)(z) \le c(g)(x), \]
as required.
The proof of Choquet’s Theorem given above is due to Bonsall. This, as well as a proof
of the more general Choquet-Bishop-deLeeuw theorem (where the metrizability hypothesis
on K is removed), may be found in [37], which contains many related results. Here is one of
particular interest:
14.7.4 Corollary (Rainwater). Let X be a separable normed space, x ∈ X, and $(x_n)$ a bounded sequence in X. If $\lim_n \langle x_n, x'\rangle = \langle x, x'\rangle$ for every extreme point $x'$ of the closed unit ball $C_1'$ of $X'$, then $x_n \xrightarrow{w} x$.
Proof. We may assume that X is a real normed space. It suffices to show that $\lim_n \langle x_n, y'\rangle = \langle x, y'\rangle$ for $y' \in C_1'$. Since $C_1'$ is compact, convex, and metrizable in the weak∗ topology, $y'$ may be represented by a probability measure µ on $C_1'$ supported by the extreme points:
\[ y' = \int_{\operatorname{ex} C_1'} x'\,d\mu(x'). \]
Applications
Chapter 15
Distributions
Spaces of distributions are the duals of spaces of C ∞ functions on open subsets of Rd . The
operations of differentiation, convolution, and Fourier transform of functions may be extended
by duality to distributions, opening up the possibility of finding non-differentiable solutions,
so-called weak solutions, of differential equations that may not have smooth solutions. For
example, consider the partial differential equation
\[ \sum_{\alpha \in S} \psi_\alpha(x)\,\partial^\alpha f(x) = g(x) \]
There are no constant terms here because φ has compact support. Functions f that satisfy
the last equation for every φ ∈ Cc∞ (U ) are called weak solutions of the original PDE. There
is no reason to assume that these solutions must be smooth.
It is beyond the scope of the text to delve into the distributional theory of PDEs. Our
goal here is merely to define the main distribution spaces, describe their functional analytic
properties, and discuss the standard operations on distributions. We do, however, give a
simple application to PDEs in §15.6.
Examples of Distributions
(a) Let f : U → C be locally Lebesgue integrable, that is, f is measurable and the restriction $f|_K$ is Lebesgue integrable for every compact subset K of U. Denote the space of all locally integrable functions by $L^1_{\mathrm{loc}}(U)$. For each $f \in L^1_{\mathrm{loc}}(U)$, the equation
\[ \langle F_f, \varphi\rangle := \int_U f\varphi\,d\lambda^d \]
defines a distribution, as may be seen by taking m = 0 in 15.1.3. Note that the mapping $f \to F_f : L^1_{\mathrm{loc}}(U) \to D'(U)$ is linear. Moreover, if we identify functions that are equal a.e., then the map is 1-1. Indeed, this amounts to the assertion that $\int_U f\varphi = 0$ for all $\varphi \in C_c^\infty(U)$ ⇒ f = 0 a.e., which is valid by a standard approximation argument, since $C_c^\infty(U)$ is dense in $L^1(U)$. In view of this correspondence and to simplify notation one frequently writes f for $F_f$, so that
\[ \langle f, \varphi\rangle = \langle F_f, \varphi\rangle = \int_U f\varphi\,d\lambda^d, \quad \varphi \in D(U). \tag{15.1} \]
(b) Let µ be a Radon measure on U. Then $\varphi \to \int_U \varphi\,d\mu$ defines a distribution, again by taking m = 0 in 15.1.3. More generally, for fixed α, the mapping $\varphi \to \int_U \partial^\alpha\varphi\,d\mu$ defines a distribution, this time by taking m = |α|.
(c) A special case of (b) is obtained by taking µ to be the Dirac measure δx at x ∈ U . This
gives the Dirac delta distribution φ → φ(x) at x.
1 These refer to the existence and the basic properties of strict inductive limits. Since an understanding of
the material in the current chapter does not depend on the abstract notion of inductive limit, the reader
may simply accept the statement of Theorem 15.1.2.
15.1.4 Remarks. The Dirac delta distribution is not given by a function as in (a). To see this, take the special case U = R and x = 0. For r > 0 consider the test function
\[ \varphi_r(x) = \begin{cases} \exp\big(1 - [1 - (x/r)^2]^{-1}\big) & \text{if } |x| < r, \\ 0 & \text{otherwise.} \end{cases} \]
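Here we have used the convention described in (15.1). The right side of the equation makes sense for any locally integrable function f. Thus we define the distributional or weak derivative $\partial^\alpha f$ of f by
\[ \langle \partial^\alpha f, \varphi\rangle := (-1)^{|\alpha|} \int_U f\,\partial^\alpha\varphi\,d\lambda^d, \quad \varphi \in D(U). \]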
For example, the classical derivative of f (x) = |x| does not exist on R, but the distributional
derivative of f exists and equals 1(0,∞) − 1(−∞,0) . Indeed, if φ ∈ D(R), then integrating by
parts we have
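\[ -\langle f, \varphi'\rangle = \int_{-\infty}^0 x\varphi'(x)\,dx - \int_0^\infty x\varphi'(x)\,dx = -\int_{-\infty}^0 \varphi + \int_0^\infty \varphi = \langle 1_{(0,\infty)} - 1_{(-\infty,0)}, \varphi\rangle. \]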
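The identity for f(x) = |x| can be checked numerically. The sketch below is an illustration under stated assumptions: the test function is a shifted bump (so that neither side vanishes by symmetry), the integrals are approximated by a composite midpoint rule, and every name in the code is hypothetical.

```python
import math

def bump(t):
    """Smooth function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def bump_prime(t):
    """Classical derivative of bump."""
    if abs(t) >= 1.0:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2

SHIFT = 0.3  # shift the bump so the test function is not even

def phi(x):
    return bump(x - SHIFT)

def phi_prime(x):
    return bump_prime(x - SHIFT)

def midpoint(func, a, b, n):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# The support of phi is [-0.7, 1.3]; the grid is chosen so that 0 is a cell boundary.
A, B, N = -0.7, 1.3, 20000

# Left side: -<f, phi'> with f(x) = |x|.
lhs = -midpoint(lambda x: abs(x) * phi_prime(x), A, B, N)
# Right side: <1_(0,inf) - 1_(-inf,0), phi>.
rhs = midpoint(lambda x: math.copysign(1.0, x) * phi(x), A, B, N)
print(lhs, rhs)
```

Both numbers agree up to quadrature error, consistent with the distributional derivative of |x| being the sign function.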
Derivative of a Distribution
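Generalizing the preceding, we define the derivative $\partial^\alpha F$ of $F \in D'(U)$ by
\[ \langle \partial^\alpha F, \varphi\rangle := (-1)^{|\alpha|} \langle F, \partial^\alpha\varphi\rangle, \quad \varphi \in D(U). \]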
It follows directly from 15.1.3 that $\partial^\alpha F \in D'(U)$. For an example, take $H = 1_{[0,\infty)}$, the so-called Heaviside function on R. For any φ ∈ D(R) we have
\[ \Big\langle \frac{d}{dx} F_H, \varphi \Big\rangle = -\int_0^\infty \varphi' = \varphi(0), \]
hence $\frac{d}{dx} F_H = \delta_0$.
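that is, $fF_g = F_{fg}$. Furthermore, for any 1 ≤ k ≤ d and φ ∈ D, the classical product rule $\partial_k(f\varphi) = f\,\partial_k\varphi + \varphi\,\partial_k f$ implies that
\[ \begin{aligned} \langle \partial_k(fF), \varphi\rangle &= -\langle fF, \partial_k\varphi\rangle = -\langle F, f\,\partial_k\varphi\rangle = -\langle F, \partial_k(f\varphi)\rangle + \langle F, \varphi\,\partial_k f\rangle \\ &= \langle \partial_k F, f\varphi\rangle + \langle (\partial_k f)F, \varphi\rangle = \langle f(\partial_k F), \varphi\rangle + \langle (\partial_k f)F, \varphi\rangle, \end{aligned} \]
that is,
\[ \partial_k(fF) = f(\partial_k F) + (\partial_k f)F. \]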
This is the product rule for distributions.
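As a sanity check of the product rule, one can take F to be the Dirac delta at 0 on R, for which every pairing reduces to point values at 0. The following sketch assumes the particular smooth multiplier f(x) = 2 + sin x and a shifted bump test function; these choices and all function names are illustrative, and the derivative of fφ is approximated by a central difference.

```python
import math

def bump(t):
    """Smooth function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def bump_prime(t):
    """Classical derivative of bump."""
    if abs(t) >= 1.0:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2

def f(x):
    return 2.0 + math.sin(x)

def f_prime(x):
    return math.cos(x)

def phi(x):
    return bump(x - 0.3)

def phi_prime(x):
    return bump_prime(x - 0.3)

def central_difference(func, x, step=1e-4):
    """Second-order finite-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)

# Left side: <d(f*delta), phi> = -<f*delta, phi'> = -f(0) * phi'(0).
lhs = -f(0.0) * phi_prime(0.0)

# Right side: <f*(d delta), phi> + <f'*delta, phi>
#           = -(f*phi)'(0) + f'(0) * phi(0).
rhs = -central_difference(lambda x: f(x) * phi(x), 0.0) + f_prime(0.0) * phi(0.0)
print(lhs, rhs)
```

The two values agree up to the finite-difference error, as the distributional product rule predicts for this choice of F.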
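which we write as
\[ F_{f \circ T}(\varphi) = |\det T|^{-1} F_f(\varphi \circ T^{-1}), \quad \varphi \in D(U),\ f \in C_c(V). \]
The identification $f \leftrightarrow F_f$ then suggests the following definition of F ∘ T for an arbitrary distribution F:
\[ (F \circ T)(\varphi) = |\det T|^{-1} F(\varphi \circ T^{-1}), \quad \varphi \in D(U),\ F \in D'(V). \]
One easily checks that $F \circ T \in D'(U)$. In particular, for the reflection T(x) = −x we define the distribution $\widetilde{F}$ by
\[ \widetilde{F}(\varphi) := (F \circ T)(\varphi) = F(\widetilde{\varphi}), \quad \text{where } \widetilde{\varphi}(x) := \varphi(-x). \]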
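\[ E'(U) := \{F \in D'(U) : \operatorname{supp}(F) \text{ is compact}\}. \]
Recall from 9.1.6(c) that the topology on the Fréchet space $C^\infty(U)$ is defined by the seminorms
\[ p_{m,\alpha}(\varphi) := \sup_{x \in K_m} |\partial^\alpha \varphi(x)|, \]
where the $K_m$ are compact, $K_m \subseteq \operatorname{int}(K_{m+1})$, and $K_m \uparrow U$, and that $C_c^\infty(U)$ is dense in
$C^\infty(U)$. The next theorem asserts that the dual of $C^\infty(U)$ is $E'(U)$. For the statement, we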
employ the following convenient notation: Let X be a LCS and Y a linear subspace of X
with a locally convex topology with respect to which the inclusion mapping Y ,→ X is
continuous. This simply means that the given topology of Y is stronger than the relative
topology from X. It follows that the restriction to Y of every member of the dual $X'$ of X is a member of $Y'$. We express this by writing $X'|_Y \subseteq Y'$.
15.3.3 Theorem. The inclusion mapping $D(U) \hookrightarrow C^\infty(U)$ is continuous, and the restriction to D(U) of a member G of the dual of $C^\infty(U)$ is a distribution F. Moreover, F has compact support, and every member of $D(U)'$ with compact support arises in this manner, that is, extends (uniquely) to a member of the dual of $C^\infty(U)$. Thus
\[ C^\infty(U)'\big|_{D(U)} = E'(U). \]
Proof. Let $(\varphi_n)$ τ-converge to 0 in D(U) as in (a) of 15.1.2. Thus there exists a compact K ⊆ U such that $(\varphi_n) \subseteq C_K^\infty(U)$ and $\partial^\alpha\varphi_n \to 0$ uniformly on U for all multi-indices α. Then, trivially, $\partial^\alpha\varphi_n \to 0$ uniformly on any compact subset of U. Thus by (b) of 15.1.2, $D(U) \hookrightarrow C^\infty(U)$ is continuous. This shows that every continuous linear functional G on $C^\infty(U)$ restricts to a continuous linear functional F on D(U). To see that F has compact support, by continuity of G choose C > 0 and m, N ≥ 1 such that $|G(\varphi)| \le C \max_{|\alpha| \le N} p_{m,\alpha}(\varphi)$ for all $\varphi \in C^\infty(U)$. If $\varphi \in C_c^\infty(U)$ and $\operatorname{supp}(\varphi) \subseteq K_m^c$, then $p_{m,\alpha}(\varphi) = 0$ and so F(φ) = G(φ) = 0. Therefore, $K_m^c$ is one of the open sets comprising U \ supp(F), hence $\operatorname{supp} F \subseteq K_m$.
Conversely, let $F \in D'(U)$ have compact support. Choose $\psi \in C_c^\infty(U)$ such that ψ = 1 on supp F and set K := supp ψ ⊇ supp F. Define G on $C^\infty(U)$ by G(f) = F(fψ). By continuity of F on $C_K^\infty(U)$, there exist M > 0 and N ≥ 1 such that $|F(f\psi)| \le M \max_{|\alpha| \le N} \|\partial^\alpha(f\psi)\|_\infty$ for all $f \in C^\infty(U)$. Now, by the product rule, $\partial^\alpha(f\psi)$ is a sum of derivatives $(\partial^\beta f)(\partial^\gamma\psi)$ (|β| + |γ| = |α|), and each of the terms $\partial^\gamma\psi$ has support in K. Letting $M'$ be a bound for the sum of the terms $|\partial^\gamma\psi|$, we then have for sufficiently large m
\[ |G(f)| = |F(f\psi)| \le MM' \max_{|\alpha| \le N} \sup_K |\partial^\alpha f| \le MM' \max_{|\alpha| \le N} p_{m,\alpha}(f), \quad f \in C^\infty(U), \]
An argument similar to that of the preceding paragraph (using the mean value theorem) shows that if $t_n \to 0$, then
\[ t_n^{-1}\big[\psi_{x+t_n}(y) - \psi_x(y)\big] = t_n^{-1}\big[\psi(x + t_n - y) - \psi(x - y)\big] \to [(\partial/\partial x_1)\psi](x - y) \]
in $D(\mathbb{R}^d)$ and so $(\partial/\partial x_1)(F * \psi)(x) = F\big(((\partial/\partial x_1)\psi)_x\big)$. Analogous arguments apply to the other variables. By induction we obtain $\partial^\alpha(F * \psi)(x) = F\big((\partial^\alpha\psi)_x\big)$, proving (a) and (b).
From the definitions of convolution and derivative,
verifying (c).
15.4.2 Proposition. If $F \in D'(\mathbb{R}^d)$ and $\psi \in C_c^\infty(\mathbb{R}^d)$, then supp(F ∗ ψ) ⊆ supp F + supp ψ. In particular, the members of $E'(\mathbb{R}^d) * C_c^\infty(\mathbb{R}^d)$ have compact support, that is, the inclusion $E'(\mathbb{R}^d) * C_c^\infty(\mathbb{R}^d) \subseteq E'(\mathbb{R}^d)$ holds.
Proof. Since supp F is closed and supp ψ is compact, the set C := supp F + supp ψ is closed. Let U be open with compact closure contained in $C^c$. Then cl U − supp ψ is compact and does not meet the closed set supp F, hence there exists $g \in C^\infty(\mathbb{R}^d)$ such that g = 0 on an open set V ⊇ cl U − supp ψ and g = 1 on an open set W ⊇ supp F. Then for all $\varphi \in D(\mathbb{R}^d)$, $\operatorname{supp}(g\varphi - \varphi) \subseteq W^c \subseteq (\operatorname{supp} F)^c$ and so F(gφ − φ) = 0. In particular, $F(g\psi_x - \psi_x) = 0$, that is, $(F * \psi)(x) = F(g\psi_x)$. But if x ∈ U, then $g\psi_x$ is identically equal to zero. Indeed, assume that $g(y)\psi_x(y) \ne 0$ for some y. Then x − y ∈ supp ψ, hence y ∈ V. But g = 0 on V. Thus $(F * \psi)(x) = F(g\psi_x) = 0$ on U. Since U was arbitrary, F ∗ ψ = 0 on the open set $C^c$ and so supp(F ∗ ψ) ⊆ C.
The following lemma will be used to prove the associative law for convolutions.
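15.4.3 Lemma. Let $F \in D'(\mathbb{R}^d)$ and $\psi, \varphi \in D(\mathbb{R}^d)$. Then $\langle F * \psi, \varphi\rangle = \langle F, \widetilde{\psi} * \varphi\rangle$, where $\widetilde{\psi}(x) := \psi(-x)$.
Proof. The left side of the desired equality may be written
\[ \langle F * \psi, \varphi\rangle = \int (F * \psi)(x)\varphi(x)\,dx = \int \big\langle F, \varphi(x)\psi_x \big\rangle\,dx, \]
where the integral may be taken to be a Bochner integral. Thus we must show that
\[ \int \big\langle F, \varphi(x)\psi_x \big\rangle\,dx = \Big\langle F, \int \varphi(x)\psi_x(\cdot)\,dx \Big\rangle. \]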
To this end, note first that the integrand on the right, as a function of y, is supported in the
compact set K := supp(φ) + supp(ψ). Overlay K with a grid Q of cubes $Q_j$ with volumes $v_j$ and let $x_j \in Q_j$. Set
\[ S(y, Q) := \sum_j \varphi(x_j)\psi_{x_j}(y)\,v_j. \]
Then
\[ \int \varphi(x)\psi_x(y)\,dx - S(y, Q) = \sum_j \int_{Q_j} \big[\varphi(x)\psi_x(y) - \varphi(x_j)\psi_{x_j}(y)\big]\,dx. \]
as required.
Here is the aforementioned associative law for convolutions:
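15.4.4 Theorem. Let $F \in D'(\mathbb{R}^d)$ and $\psi, \varphi \in D(\mathbb{R}^d)$. Then F ∗ (ψ ∗ φ) = (F ∗ ψ) ∗ φ.
Proof. For all y,
\[ (\psi * \varphi)_x(y) = (\psi * \varphi)(x - y) = \int \psi(z)\varphi(x - y - z)\,dz = \int \widetilde{\psi}(z)\varphi_x(y - z)\,dz = (\widetilde{\psi} * \varphi_x)(y). \]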
We show that given qH,k there exists n ≥ 0 and M > 0 such that
and so
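Recall that the space $C_c(\mathbb{R}^d)$ may be viewed as a subspace of $E'(\mathbb{R}^d)$ via the identification $f \leftrightarrow F_f$. Since $F * \psi \in C_c^\infty(\mathbb{R}^d)$ for $F \in E'(\mathbb{R}^d)$ and $\psi \in C_c^\infty(\mathbb{R}^d)$ (15.4.2), the following theorem implies that the space $C_c^\infty(\mathbb{R}^d)$ is weak∗ dense in $E'(\mathbb{R}^d)$.
15.4.6 Theorem. There exists a sequence $(\phi_n)$ in $C_c^\infty(\mathbb{R}^d)$ such that for every $F \in D'(\mathbb{R}^d)$, $F * \phi_n \xrightarrow{w^*} F$, that is,
\[ \langle F, \varphi\rangle = \lim_n \langle F * \phi_n, \varphi\rangle = \lim_n \int (F * \phi_n)\varphi, \quad \varphi \in D(\mathbb{R}^d). \]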
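Proof. Let $(\phi_n) \subseteq C_c^\infty(\mathbb{R}^d)$ be a sequence such that $f * \phi_n \to f$ uniformly for all uniformly continuous and bounded functions f on $\mathbb{R}^d$, where $\operatorname{supp}(\phi_n) \subseteq B_{1/n}(0)$ and each $\phi_n$ is an even function (6.1.3). By associativity,
\[ \langle F * \phi_n, \varphi\rangle = \int (F * \phi_n)(x)\varphi(x)\,d\lambda^d(x) = [(F * \phi_n) * \widetilde{\varphi}](0) = [F * (\phi_n * \widetilde{\varphi})](0) = \langle F, \phi_n * \varphi\rangle, \]
the last equality from $(\phi_n * \widetilde{\varphi})_0 = \phi_n * \varphi$ (because $\phi_n$ is even). But the sequence $(\phi_n * \varphi)$ is supported in a compact set K and $\partial^\alpha(\phi_n * \varphi) = \phi_n * \partial^\alpha\varphi \to \partial^\alpha\varphi$ uniformly on K for all α. Therefore, $\phi_n * \varphi \to \varphi$ in $D(\mathbb{R}^d)$ and so $\langle F * \phi_n, \varphi\rangle \to \langle F, \varphi\rangle$.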
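The simplest instance of the theorem is F = δ0 on R, for which F ∗ ϕn = ϕn and the statement reduces to ∫ϕn φ → φ(0). The sketch below illustrates this numerically, assuming a rescaled, normalized bump as the mollifier and a shifted bump as the test function; the normalization constant is itself computed by quadrature, and all names are illustrative.

```python
import math

def bump(t):
    """Smooth even function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def midpoint(func, a, b, n):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Normalizing constant so that each mollifier integrates to 1.
Z = midpoint(bump, -1.0, 1.0, 4000)

def mollifier(n, x):
    """Even mollifier supported in [-1/n, 1/n] (plays the role of the n-th term)."""
    return n * bump(n * x) / Z

def phi(x):
    """Test function: a bump of radius 2 centered at 0.3 (illustrative choice)."""
    return bump((x - 0.3) / 2.0)

def pairing(n, samples=4000):
    """Approximates <delta_0 * mollifier_n, phi> = integral of mollifier_n * phi."""
    r = 1.0 / n
    return midpoint(lambda x: mollifier(n, x) * phi(x), -r, r, samples)

# Distance from the limit value <delta_0, phi> = phi(0).
errors = [abs(pairing(n) - phi(0.0)) for n in (1, 4, 16, 64)]
print(errors)
```

The error shrinks as n grows, which is the weak∗ convergence of the theorem tested against a single φ.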
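15.4.7 Remark. Lemma 15.4.3 suggests the following definition of convolution in $E'(\mathbb{R}^d)$:
\[ \langle F * G, \varphi\rangle = \langle F, \widetilde{G} * \varphi\rangle, \quad \varphi \in C_c^\infty(\mathbb{R}^d),\ F, G \in E'(\mathbb{R}^d). \]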
It may be shown that F ∗ G is a distribution with compact support and that convolution on
E0 (Rd ) is commutative, associative, and bilinear. (See, for example, [48]). ♦
Moreover, if supp(φ) ⊆ Km , then the supremum in the definition of qα,n (φ) may be taken
over Km , hence for a suitable M > 0 depending only on m,
\[ q_{\alpha,n}(\varphi) = \sup_{x \in K_m} (1 + |x|^2)^n |\partial^\alpha\varphi(x)| \le M \sup_{x \in K_m} |\partial^\alpha\varphi(x)| = M p_{m,\alpha}(\varphi). \tag{15.4} \]
(c) $D(\mathbb{R}^d)$ is dense in $S(\mathbb{R}^d)$ and $S(\mathbb{R}^d)$ is dense in the Fréchet space $C^\infty(\mathbb{R}^d)$.
Proof. Part (a) is clear. For (b) let φn → 0 in D(Rd ). Then there exists m such that
supp(φn ) ⊆ Km for all n, hence, by (15.4), qα,m (φn ) → 0. This shows that D(Rd ) ,→ S(Rd )
is continuous. A similar argument using (15.3) shows that S(Rd ) ,→ C ∞ (Rd ) is continuous.
(c) Since Cc∞ (Rd ) is contained in S(Rd ) and is dense in C ∞ (Rd ) (9.1.6(b)), S(Rd ) must
be dense in C ∞ (Rd ). To show that Cc∞ (Rd ) is dense in S(Rd ), let f ∈ S(Rd ) and choose
φ ∈ Cc∞ (Rd ) such that φ(x) = 1 for all |x| ≤ 1. The function fn (x) := f (x)φ(x/n) is in
Cc∞ (Rd ), hence the desired conclusion will follow if we show that fn → f in the topology of
S(Rd ), that is,
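\[ (1 + |x|^2)^k\, \partial^\alpha\big[f(x)\big(1 - \varphi(x/n)\big)\big] \to 0 \quad \text{uniformly on } \mathbb{R}^d. \]
Now, $\partial^\alpha[f(x)(1 - \varphi(x/n))]$ is a sum of terms $\partial^\beta f(x) \cdot \partial^\gamma(1 - \varphi(x/n))$. Moreover, for any compact set K, $\sup_{x \in K} |1 - \varphi(x/n)| = 0$ for all large n. Thus the sequence $\partial^\alpha[f(x)(1 - \varphi(x/n))]$ converges uniformly to zero on compact sets. Since $(1 + |x|^2)^k \partial^\beta f(x)$ is in $C_0(\mathbb{R}^d)$, it follows that $(1 + |x|^2)^k \partial^\beta f(x) \cdot \partial^\gamma(1 - \varphi(x/n))$ converges uniformly to zero on $\mathbb{R}^d$, completing the proof.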
implies that ∂ β φ ∈ S(Rd ) and that the function φ → ∂ β φ is continuous. Now consider
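\[ q_{\alpha,n}(f\varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^n |\partial^\alpha(f\varphi)(x)|. \]
By the product rule, $(1 + |x|^2)^n \partial^\alpha(f\varphi)(x)$ is a sum of products $(1 + |x|^2)^n \partial^\beta f(x) \cdot \partial^\gamma\varphi(x)$,
which are majorized by $q_{\beta,n}(f) \cdot q_{\gamma,0}(\varphi)$. This shows that $f\varphi \in S(\mathbb{R}^d)$ and that φ ↦ fφ is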
continuous. A similar argument shows that φ 7→ gφ is continuous.
The members of $S'(\mathbb{R}^d)$ are called tempered distributions. By 15.5.1(b), they may be
viewed as distributions that are continuous in a weaker topology and with an enlarged space
of test functions. Their importance derives from connections with Fourier analysis, discussed
in the next subsection.
15.5.3 Examples.
(a) A distribution F with compact support is tempered. To see this, let K = supp(F ) and
for φ ∈ S(Rd ) set G(φ) := F (φψ) where ψ ∈ Cc∞ (Rd ) and ψ = 1 on K. For any φ ∈ Cc∞ (Rd ),
supp(φ(1 − ψ)) ⊆ $K^c$, hence F(φ(1 − ψ)) = 0, that is, G = F on $C_c^\infty(\mathbb{R}^d)$. Therefore,
G is a linear extension of F to S(Rd ). To see that G is continuous, let φn → 0 in S(Rd ).
Then ∂ α φn → 0 uniformly on Rd , hence, by the product rule and the boundedness of the
derivatives of ψ, ∂ α (ψφn ) → 0 uniformly on Rd . Therefore, ψφn → 0 in D(Rd ) and so
G(φn ) = F (ψφn ) → 0.
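(b) A polynomial f on $\mathbb{R}^d$ is tempered. This is simply the assertion that the linear functional $\varphi \to \int f\varphi$ on $S(\mathbb{R}^d)$ is continuous, that is, for some continuous seminorm $q_{\alpha,m}$,
\[ \Big| \int f\varphi \Big| \le q_{\alpha,m}(\varphi) \quad \forall\, \varphi \in S(\mathbb{R}^d). \]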
It therefore suffices to choose m sufficiently large so that the term in parentheses is finite.
(See 3.6.3.)
(d) If F is a tempered distribution, then so are ∂ α F , f F (f ∈ S(Rd )) and gF (g a polynomial).
This follows immediately from 15.5.2. ♦
The Sobolev inequalities, proved below, imply that one actually obtains ordinary derivatives by taking $f \in L^p_m(U)$ for sufficiently large m.
Define a norm on $L^p_k(U)$ by
\[ \|f\|_{k,p} := \Big( \sum_{|\alpha| \le k} \int_U |\partial^\alpha f|^p \Big)^{1/p}. \]
Sobolev spaces, being defined in terms of $L^p$ norms, tend to be somewhat easier to manage than spaces of distributions. Moreover, they have an advantage over $L^p$ spaces in that a derivative of a member of $L^p_k(U)$ is a member of $L^p_{k-1}(U)$. These features make Sobolev spaces important tools in the study of weak solutions of PDEs.
15.6.1 Theorem. $L^p_k(U)$ is a Banach space and $L^2_k(U)$ is a Hilbert space.
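Proof. Let $(f_n)$ be a Cauchy sequence in $L^p_k(U)$. Then for each α with |α| ≤ k, $(\partial^\alpha f_n)$ is a Cauchy sequence in $L^p(U)$ and so converges to some $f_\alpha \in L^p(U)$. For any $\varphi \in C_c^\infty(U)$ we then have
\[ \langle f_\alpha, \varphi\rangle = \int f_\alpha\varphi = \lim_n \int (\partial^\alpha f_n)\varphi = \lim_n (-1)^{|\alpha|} \int f_n(\partial^\alpha\varphi) = (-1)^{|\alpha|} \int f(\partial^\alpha\varphi) = \langle \partial^\alpha f, \varphi\rangle, \]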
Banach space.
where ∂i = ∂/∂xi . We assume all functions are real-valued. Further, we assume that the
matrix [gij ] is strictly positive definite, that is,
\[ \sum_{i,j=1}^d y_i y_j g_{ij}(x) > 0 \quad \text{for all } y_j \in \mathbb{R} \text{ and } x \in U_0. \tag{15.6} \]
where the last equality comes from an integration by parts. Since the functions $g_{ij}$ are bounded, it follows from the definition of inner product in $H^2_1(U)$ and the CBS inequality that for some constant c > 0
\[ |B(\varphi, \psi)| \le c\,\|\varphi\|_{1,2}\,\|\psi\|_{1,2}. \]
Therefore, B extends continuously to a sesquilinear form on $H^2_1(U)$. Furthermore, by (15.8),
\[ B(\varphi, \varphi) = \int a\varphi^2 + \sum_{i,j=1}^d \int g_{ij}(\partial_i\varphi)(\partial_j\varphi) \ge m \sum_{j=0}^d \int (\partial_j\varphi)^2 = m\,\|\varphi\|_{1,2}^2. \tag{15.9} \]
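Now, since
\[ \Big| \int hg \Big| \le \Big( \int |h|^2 \Big)^{1/2} \Big( \int |g|^2 \Big)^{1/2} \le \Big( \int |h|^2 \Big)^{1/2} \|g\|_{1,2}, \]
the functional $g \to \int hg$ is continuous on $H^2_1(U)$. By the Lax-Milgram theorem (11.4.2), there exists a unique $f \in H^2_1(U)$ such that $B(f, g) = \int hg$. In particular, for all $\psi \in C_c^\infty(U)$,
\[ \langle Pf + af, \psi\rangle = B(f, \psi) = \int h\psi = \langle h, \psi\rangle, \]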
Sobolev Inequalities
These inequalities are important tools in determining existence and uniqueness of solutions
of a variety of PDEs, as well as in the study of regularity properties of these solutions. In
this subsection we give the reader a flavor of the subject by proving two such inequalities.
15.6.2 Theorem. If $f \in L^1_d(\mathbb{R}^d)$, then $\|f\|_\infty \le c\,\|f\|_{1,d}$ and there exists $g \in C_b(\mathbb{R}^d)$ such that f = g a.e. Moreover, if $f \in L^1_{d+k}(\mathbb{R}^d)$ (k ≥ 1), then one may take $g \in C_b^k(\mathbb{R}^d)$.
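Proof. Consider first the case $f \in C^\infty(\mathbb{R}^d)$ and d = 2. For any $\psi \in C_c^\infty(\mathbb{R}^2)$,
\[ \psi(x, y) = \int_{-\infty}^y \int_{-\infty}^x \frac{\partial^2\psi}{\partial x\,\partial y}(s, t)\,ds\,dt, \]
hence
\[ |\psi(x, y)| \le \int_{-\infty}^y \int_{-\infty}^x \Big| \frac{\partial^2\psi}{\partial x\,\partial y}(s, t) \Big|\,ds\,dt \le \Big\| \frac{\partial^2\psi}{\partial x\,\partial y} \Big\|_1. \]
Replacing ψ by fψ, we have
\[ |f\psi(x, y)| \le \Big\| \frac{\partial^2(f\psi)}{\partial x\,\partial y} \Big\|_1, \quad x, y \in \mathbb{R}. \]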
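Since
\[ \frac{\partial^2(f\psi)}{\partial x\,\partial y} = \psi\,\frac{\partial^2 f}{\partial x\,\partial y} + \frac{\partial f}{\partial x}\frac{\partial\psi}{\partial y} + \frac{\partial f}{\partial y}\frac{\partial\psi}{\partial x} + f\,\frac{\partial^2\psi}{\partial x\,\partial y}, \]
we see that
\[ |f\psi(x, y)| \le \|\psi\|_\infty \Big\| \frac{\partial^2 f}{\partial x\,\partial y} \Big\|_1 + \Big\| \frac{\partial\psi}{\partial y} \Big\|_\infty \Big\| \frac{\partial f}{\partial x} \Big\|_1 + \Big\| \frac{\partial\psi}{\partial x} \Big\|_\infty \Big\| \frac{\partial f}{\partial y} \Big\|_1 + \|f\|_1 \Big\| \frac{\partial^2\psi}{\partial x\,\partial y} \Big\|_\infty. \]
Now let 0 ≤ ψ ≤ 1 be such that ψ = 1 on [−1, 1] × [−1, 1] and ψ = 0 outside [−2, 2] × [−2, 2]. Set $\psi_n(x) = \psi(x/n)$. Since the partial derivatives of ψ are bounded, there exists a constant c depending only on ψ such that for all (x, y) and n,
\[ |f\psi_n(x, y)| \le c \Big( \Big\| \frac{\partial^2 f}{\partial x\,\partial y} \Big\|_1 + \Big\| \frac{\partial f}{\partial x} \Big\|_1 + \Big\| \frac{\partial f}{\partial y} \Big\|_1 + \|f\|_1 \Big). \]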
The norm on the right is a sum of terms that are L1 norms of the derivatives
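\[ \begin{aligned} \partial^\alpha(f * \varphi_n)(x) &= \int f(y)\,\partial_x^\alpha\varphi_n(x - y)\,dy = (-1)^{|\alpha|} \int f(y)\,\partial_y^\alpha\varphi_n(x - y)\,dy \\ &= \int (\partial^\alpha f)(y)\varphi_n(x - y)\,dy = (\partial^\alpha f * \varphi_n)(x), \end{aligned} \]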
where ∂ α f is the distributional derivative. Taking absolute values and integrating with
respect to x, recalling that kφn k1 = 1, we see that k∂ α (f ∗ φn )k1 ≤ k∂ α f k1 . Taking the
sum over all |α| ≤ d and using (†) we obtain
Since $f * \varphi_n \to f$ in $L^1$, there exists a subsequence such that $f * \varphi_{n_k} \to f$ a.e. Thus $\|f\|_\infty \le c\,\|f\|_{1,d}$.
Since $f - f * \varphi_n \in L^1_d$, we may replace f in the last inequality by $f - f * \varphi_n$ to conclude that
\[ \|f - f * \varphi_n\|_\infty \le c\,\|f - f * \varphi_n\|_{1,d}. \]
and the definition of the Fourier transform of a distribution we have for |α| ≤ m
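Since $S(\mathbb{R}^d)$ is dense in $L^2(\mathbb{R}^d)$, $\mathcal{F}(\partial^\alpha f) = (-2\pi i \xi)^\alpha \mathcal{F}(f)$. Taking $L^2$ norms of the last equation and using the Plancherel theorem $\|\mathcal{F}(\partial^\alpha f)\|_2 = \|\partial^\alpha f\|_2$, we have for a suitable constant $M_1$
\[ \int |\xi^\alpha|^2\,|\widehat{f}(\xi)|^2\,d\xi = M_1 \|\partial^\alpha f\|_2^2, \quad |\alpha| \le m. \]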
hence it suffices to show that the second factor on the right is finite. Now, by taking α's of the form (0, . . . , 0) and (0, . . . , 0, m, 0, . . . , 0), we have $h(\xi) \ge 1 + \sum_{j=1}^d |\xi_j|^{2m}$. The inequality
\[ |\xi|^{2m} = \Big( \sum_{j=1}^d |\xi_j|^2 \Big)^m \le \Big( d \max_{1 \le j \le d} |\xi_j|^2 \Big)^m = d^m \max_{1 \le j \le d} |\xi_j|^{2m} \le d^m \sum_{j=1}^d |\xi_j|^{2m} \]
then shows that $h(\xi) \ge 1 + c|\xi|^{2m}$. By 3.6.3 (with s = 2m, t = 1, and p = 2) we see that $\int h^{-1} < \infty$ and so for a suitable $M_2 > 0$
\[ \|\widehat{f}\|_1 \le M_2 \sum_{|\alpha| \le m} \|\partial^\alpha f\|_2 < \infty. \]
It now follows from the Fourier inversion formula and the Riemann-Lebesgue lemma that
$f \in C_0(\mathbb{R}^d)$. Finally, since
\[ f(x) = \int \widehat{f}(\xi)\,e^{2\pi i\,x \cdot \xi}\,d\xi \]
we have $\|f\|_\infty \le \|\widehat{f}\|_1 \le M_2 \sum_{|\alpha| \le m} \|\partial^\alpha f\|_2$. This completes the proof of the first part of the theorem. The second part may be proved in a similar manner by replacing f throughout by $\partial^\beta f$, |β| ≤ k.
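The first step in the proof of 15.6.2 has a transparent one-dimensional analogue: for ψ smooth with compact support on R, ψ(x) is the integral of ψ′ over (−∞, x], so sup|ψ| ≤ ‖ψ′‖₁. The following sketch checks this numerically for a standard bump function; the choice of ψ, the quadrature rule, and all names are assumptions made for illustration only.

```python
import math

def bump(t):
    """Smooth function supported in [-1, 1] with maximum exp(-1) at 0."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def bump_prime(t):
    """Classical derivative of bump."""
    if abs(t) >= 1.0:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2

def midpoint(func, a, b, n):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

N = 20000
grid = [-1.0 + 2.0 * i / N for i in range(N + 1)]

# Sup norm of psi, sampled on a grid that contains the maximizer 0.
sup_norm = max(abs(bump(x)) for x in grid)
# L^1 norm of psi'.
l1_norm_of_derivative = midpoint(lambda x: abs(bump_prime(x)), -1.0, 1.0, N)
print(sup_norm, l1_norm_of_derivative)
```

For this ψ the L¹ norm of the derivative is twice the sup norm, since ψ rises monotonically to its maximum and then falls back to zero, so the inequality holds with room to spare.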
Chapter 16
Analysis on Locally Compact Groups
Lebesgue measure on R and counting measure on Z are examples of measures µ that are
translation invariant, that is, µ(B + x) = µ(B) for all Borel sets B. These are special cases
of a general construct called Haar measure. As we shall see, the existence of Haar measure
leads to a unification and generalization of Fourier analysis, the basic aspects of which are
presented in this chapter.
are continuous. For example, a TVS, and in particular Kd , is an abelian topological group
under addition. The set of nonzero members of K is an abelian topological group under
multiplication. The set of n × n matrices over K with determinant one is a nonabelian
topological group under matrix multiplication.
Here are useful alternate characterizations of a topological group that will be needed in
the chapter.
16.1.1 Proposition. Let G be a group with a topology. The following are equivalent:
(c) The map $(s, t) \mapsto s^{-1}t : G \times G \to G$ is continuous at (e, e), and for each a ∈ G the
translation mappings x → ax and x → xa are continuous.
Proof. (a) ⇒ (b): The map is a composition of the continuous mapping $(s, t) \mapsto (s^{-1}, t)$ and the multiplication map, hence is continuous.
(b) ⇒ (c): The first statement is clear. If $x_\alpha \to x$, then $(a^{-1}, x_\alpha) \to (a^{-1}, x)$, hence $ax_\alpha \to ax$. Therefore, x ↦ ax is continuous. Similarly, x ↦ xa is continuous.
(c) ⇒ (a): If $s_\alpha \to s$, then, by the second part of the hypothesis, $(s^{-1}s_\alpha, s^{-1}s) \to (e, e)$. Applying the first part of the hypothesis, we have $s_\alpha^{-1}s = (s^{-1}s_\alpha)^{-1}(s^{-1}s) \to e$, hence $s_\alpha^{-1} \to s^{-1}$, which shows that inversion is continuous. Since multiplication is the composition of the continuous maps $(s, t) \mapsto (s^{-1}, t)$ and $(s^{-1}, t) \mapsto st$, multiplication is continuous at (e, e). Now let $s_\alpha \to s$ and $t_\beta \to t$. Then $s^{-1}s_\alpha \to e$ and $t_\beta t^{-1} \to e$, hence $s^{-1}s_\alpha t_\beta t^{-1} \to e$ and so $s_\alpha t_\beta \to st$. Therefore, multiplication is continuous.
The basic properties of topological groups are given in the next proposition.
16.1.2 Proposition. Let G be a topological group and H a subgroup of G.
(a) For fixed a ∈ G, the mappings t → at, t → ta, and $t \to t^{-1}$ are homeomorphisms.
(b) Each neighborhood U of e contains a symmetric neighborhood of e, that is, a neighborhood V of e such that $V = V^{-1}$ $(= \{x^{-1} : x \in V\})$.
(c) Each neighborhood U of e contains a neighborhood V of e such that V V ⊆ U .
(d) The closure of H is a subgroup.
(e) If H is open, then it is also closed.
(f ) If G is Hausdorff and H is locally compact, then H is closed.
Proof. Part (a) follows from 16.1.1. For part (b), take $V = U \cap U^{-1}$. Part (c) follows from the continuity of the mapping (s, t) → st at (e, e) and part (d) from the continuity of the group operations. For (e), let x ∈ cl(H). Since xH is a neighborhood of x, xH ∩ H ≠ ∅. Then xy ∈ H for some y ∈ H and so $x = (xy)y^{-1} \in H$.
To prove (f), let $x \in \operatorname{cl}_G(H)$, $(x_\alpha) \subseteq H$, and $x_\alpha \to x$. Since H is locally compact, there exists an open neighborhood V of e in G such that $\operatorname{cl}_H(V \cap H)$ is compact in H. Therefore, $\operatorname{cl}_H(V \cap H)$ is compact in G, hence also closed. From $x_\alpha^{-1} \to x^{-1}$ and $x_\alpha^{-1} \in H$ we have $x^{-1} \in \operatorname{cl}_G(H)$. Thus since $Vx^{-1}$ is a neighborhood of $x^{-1}$, $H \cap Vx^{-1} \ne \emptyset$. Choose $y \in H \cap Vx^{-1}$. Then yx ∈ V so $yx_\alpha$ is eventually in V ∩ H. Thus yx is in the closed set $\operatorname{cl}_H(V \cap H) \subseteq H$ and so $x = y^{-1}yx \in H$.
|f (st) − f (s)| ≤ |f (st) − f (sj )| + |f (sj ) − f (s)| = |f (sj y) − f (sj )| + |f (sj ) − f (sj x)| < ε.
\[ \widetilde{f}(x) = f(x^{-1}), \quad x \in G. \]
Right invariance is defined by replacing sB by Bs. A nontrivial (that is, not identically
zero) left (right) invariant Radon measure on G is called a left (right) Haar measure.
A measure that is both a left Haar measure and a right Haar measure is called a Haar
measure. Lebesgue measure on $\mathbb{R}^d$ and counting measure on $\mathbb{Z}^d$ are Haar measures. One may show directly that the set function $B \mapsto \int_B |x|^{-1}\,dx$ defines a Haar measure on the multiplicative group of nonzero real numbers. We shall see other examples later.
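The invariance of the measure with density 1/|x| on the multiplicative group of nonzero reals is easy to test numerically on intervals. The sketch below is only an illustration: it assumes the set B = [1, 3], a few sample scaling factors s, and a midpoint-rule approximation of the integrals; all names are hypothetical.

```python
import math

def midpoint(func, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def haar(a, b):
    """Measure of the interval [a, b] (not containing 0) for the density 1/|x|."""
    return midpoint(lambda x: 1.0 / abs(x), a, b)

def scale(s, a, b):
    """Endpoints of the translate s*[a, b] under the group operation (multiplication)."""
    lo, hi = sorted((s * a, s * b))
    return lo, hi

FACTORS = (5.0, 0.25, -2.0)

base = haar(1.0, 3.0)
scaled = [haar(*scale(s, 1.0, 3.0)) for s in FACTORS]

# For comparison: Lebesgue measure (length) is NOT invariant under scaling.
lengths = [hi - lo for lo, hi in (scale(s, 1.0, 3.0) for s in FACTORS)]
print(base, scaled, lengths)
```

Every translate has the same measure log 3, while the ordinary lengths change with s, which is why the density 1/|x| is needed on this group.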
Now define a Borel measure $\widetilde{\mu}$ by
Proof. That (a), (b), and (c) are equivalent follows easily from the regularity properties of
Radon measures (7.1). Clearly, (d) implies (b) and (e).
Now suppose that (a) holds. Then
\[ \int L_s 1_B\,d\mu = \int 1_B(st)\,d\mu(t) = \mu(s^{-1}B) = \mu(B), \]
hence (d) holds for indicator functions. The usual arguments then show that (d) holds for
all f ∈ L1 . That (e) ⇒ (d) follows by approximation (7.1.2).
16.2.2 Proposition. Let µ be a left Haar measure on G. Then
(a) µ(U) > 0 for all nonempty open U ⊆ G.
Proof. (a) Suppose µ(U) = 0 for some nonempty open set U. Since any compact set K may be covered by finitely many translates sU and since µ(sU) = µ(U) = 0, µ(K) = 0. By regularity, µ(B) = 0 for all Borel sets B, contradicting the definition of Haar measure.
(b) The sufficiency follows from the definition of Radon measure. For the necessity, assume that G is not compact. Choose any open neighborhood U of e with compact closure. Then G cannot be covered by finitely many left translates of U. Letting $s_1$ be arbitrary, we may construct a sequence $(s_n)$ such that $s_{n+1} \notin \bigcup_{k=1}^n s_k U$. Now let V be a symmetric open neighborhood of e with VV ⊂ U. The sets $s_n V$ are disjoint. Indeed, if m > n and $(s_n V) \cap (s_m V) \ne \emptyset$, then $s_n v_n = s_m v_m$ for some $v_n, v_m \in V$ and we have the contradiction $s_m = s_n v_n v_m^{-1} \in s_n U$. Now set $B = \bigcup_n s_n V$. By left invariance, $\mu(B) = \sum_n \mu(V)$. But since µ(B) < ∞, µ(V) = 0, contradicting (a).
Part (c) follows from (a).
⟦Parts (a) – (d) follow directly from the definition of (f : φ). For (e), let
\[ f \le \sum_{i=1}^m a_i L_{s_i} g \quad \text{and} \quad g \le \sum_{j=1}^n b_j L_{t_j} \varphi. \]
Then $L_{s_i} g \le \sum_{j=1}^n b_j L_{t_j s_i} \varphi$, hence $f \le \sum_{i,j} a_i b_j L_{t_j s_i} \varphi$ and so
\[ (f : \varphi) \le \sum_{i,j} a_i b_j = \sum_{i=1}^m a_i \sum_{j=1}^n b_j. \]
Taking infima over all sums $\sum_{i=1}^m a_i$ and $\sum_{j=1}^n b_j$ gives (e).
Now let $f \le \sum_{i=1}^m c_i L_{s_i} g$. Then $f(x) \le \|g\|_\infty \sum_{i=1}^m c_i$ for all x and so we have $\|f\|_\infty \le \|g\|_\infty \sum_{i=1}^m c_i$. Taking infima over the sums $\sum_{i=1}^m c_i$ yields (f).⟧
(2) Let $f_0$ be an arbitrary member of $C_c^+$ and define $I_\varphi(f) := \dfrac{(f : \varphi)}{(f_0 : \varphi)}$. Then $I_\varphi$ has the following properties:
(a) $I_\varphi(f_1 + f_2) \le I_\varphi(f_1) + I_\varphi(f_2)$.
(b) $I_\varphi(cf) = cI_\varphi(f)$ ∀ c > 0.
(c) f ≤ g ⇒ $I_\varphi(f) \le I_\varphi(g)$.
(d) $I_\varphi(L_s f) = I_\varphi(f)$ ∀ s ∈ G.
(e) $(f_0 : f)^{-1} \le I_\varphi(f) \le (f : f_0)$.
⟦By (f) of (1), $(f_0 : \varphi) > 0$, hence $I_\varphi$ is well-defined. Properties (a) – (e) then follow immediately from the corresponding parts (a) – (e) of (1).⟧
⟦Let $g \in C_c^+$ be such that g = 1 on $\operatorname{supp}(f_1 + f_2)$, and let δ > 0 be arbitrary. Set $h := f_1 + f_2 + \delta g$ and $h_k := f_k/h$. Note that if h(x) = 0, then $f_k(x) = 0$, in which case the value of $h_k(x)$ is taken to be zero. With this definition, one easily checks that $h_k \in C_c^+$. By 16.1.3, there exists a neighborhood V of e such that $|h_k(x) - h_k(y)| < \delta$ whenever $y^{-1}x \in V$. If K := supp(φ) ⊆ V and $h \le \sum_i c_i L_{s_i}\varphi$, then for k = 1, 2 we have
\[ f_k(x) = h(x)h_k(x) \le \sum_i c_i \varphi(s_i x) h_k(x). \]
Since the only contribution to the sum on the right comes from terms for which $s_i x \in K$, and since for these $|h_k(x) - h_k(s_i^{-1})| < \delta$, we see that
\[ f_k(x) \le \sum_i c_i \varphi(s_i x)\big[h_k(s_i^{-1}) + \delta\big]. \]
Therefore, $(f_k : \varphi) \le \sum_i c_i [h_k(s_i^{-1}) + \delta]$, hence
\[ (f_1 : \varphi) + (f_2 : \varphi) \le \sum_i c_i \big[h_1(s_i^{-1}) + h_2(s_i^{-1}) + 2\delta\big] \le (1 + 2\delta) \sum_i c_i, \]
the last inequality because $h_1 + h_2 \le 1$. Taking the infimum over all such sums $\sum_i c_i$ and dividing by $(f_0 : \varphi)$ we have
\[ \begin{aligned} I_\varphi(f_1) + I_\varphi(f_2) \le (1 + 2\delta) I_\varphi(h) &\le (1 + 2\delta)\big[I_\varphi(f_1 + f_2) + \delta I_\varphi(g)\big] \\ &= I_\varphi(f_1 + f_2) + \big[2\delta I_\varphi(f_1 + f_2) + \delta(1 + 2\delta) I_\varphi(g)\big], \end{aligned} \]
the second inequality by (a) and (b) of (2) applied to $h = f_1 + f_2 + \delta g$. By (e) of (2), the term in square brackets is $\le 2\delta(f_1 + f_2 : f_0) + \delta(1 + 2\delta)(g : f_0)$. Choosing δ so that this expression is less than ε completes the proof of (3).⟧
(4) There exists a positive linear functional I on Cc (G) such that I(Ls f ) = I(f ) for all
s ∈ G.
⟦The aforementioned limiting process $I_\varphi \to I$ is provided by Tychonoff's theorem, using part (e) of (2): For each $f \in C_c^+$, let $J_f$ denote the interval $[(f_0 : f)^{-1}, (f : f_0)]$ and let $X := \prod_{f \in C_c^+} J_f$. Then X is compact in the product topology, that is, the topology with basic open neighborhoods
Then $C_V$ is compact and has the finite intersection property, since $C_{V_1} \cap \cdots \cap C_{V_n} \supseteq C_{V_1 \cap \cdots \cap V_n}$. By compactness of X, $\bigcap_V C_V \ne \emptyset$. If I is a member of this intersection, then, from (†), for each V, ε > 0, and $f_i \in C_c^+$ there exists φ with supp(φ) ⊆ V such that
It follows from (3) that I is additive on $C_c^+$ and has properties (b) – (e) of (2). Extending I to $C_c(G)$ by defining $I(f) := I(f^+) - I(f^-)$ produces the desired functional.⟧
It follows from the uniqueness part of 7.2.1 that µ = cν, proving the theorem.
To verify (†), for a given ε > 0 choose a compact symmetric neighborhood V of e contained
in U such that
\[ |f_i(xy) - f_i(yx)| < \varepsilon \quad \text{for all } y \in V \text{ and } x \in G,\ i = 1, 2. \]
This is possible by the uniform continuity of $f_i$. Next, choose $g \in C_c^+$ such that $g(x) = g(x^{-1})$ and supp(g) ⊆ V. (For example, one could choose $h \in C_c^+$ such that $1_{\{e\}} \le h \le 1_V$ and then take $g(x) = h(x) + h(x^{-1})$.) By left invariance of µ,
\[ \int g\,d\nu \int f_i\,d\mu = \iint g(y)f_i(x)\,d\mu(x)\,d\nu(y) = \iint g(y)f_i(yx)\,d\mu(x)\,d\nu(y), \]
and by left invariance of µ and ν, the symmetry property of g, and Fubini's theorem for Radon measures (7.3.2),
\[ \begin{aligned} \int g\,d\mu \int f_i\,d\nu &= \iint g(x)f_i(y)\,d\mu(x)\,d\nu(y) = \iint g(y^{-1}x)f_i(y)\,d\mu(x)\,d\nu(y) \\ &= \iint g(x^{-1}y)f_i(y)\,d\nu(y)\,d\mu(x) = \iint g(y)f_i(xy)\,d\nu(y)\,d\mu(x) \\ &= \iint g(y)f_i(xy)\,d\mu(x)\,d\nu(y). \end{aligned} \]
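Thus
\[ \Big| \int g\,d\nu \int f_i\,d\mu - \int g\,d\mu \int f_i\,d\nu \Big| \le \int_V \int_G g(y)\,|f_i(yx) - f_i(xy)|\,d\mu(x)\,d\nu(y) \le \varepsilon\,\mu(K_i) \int g\,d\nu \]
and so, dividing by $\int g\,d\nu \int f_i\,d\nu$,
\[ \Big| \frac{\int f_i\,d\mu}{\int f_i\,d\nu} - \frac{\int g\,d\mu}{\int g\,d\nu} \Big| \le \varepsilon\,\frac{\mu(K_i)}{\int f_i\,d\nu}, \quad i = 1, 2. \]
Therefore,
\[ \Big| \frac{\int f_1\,d\mu}{\int f_1\,d\nu} - \frac{\int f_2\,d\mu}{\int f_2\,d\nu} \Big| \le \varepsilon \Big[ \frac{\mu(K_1)}{\int f_1\,d\nu} + \frac{\mu(K_2)}{\int f_2\,d\nu} \Big]. \]
Letting ε → 0 shows that the ratios on the left are equal.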
Since this obviously holds for µ replaced by cµ, c > 0 and since all left Haar measures are
of this form, we see that ∆ is independent of the measure µ. The function ∆ is called the
modular function of G. It is an intrinsic feature of G.
Theorem 16.2.6 below gives the key properties of the modular function. For the proof we
need the following lemma, a generalization of which is given later.
16.2.5 Lemma. Let $f \in C_c(G)$ and 1 ≤ p < ∞. Then the mapping $x \mapsto R_x f$ is continuous at e in the $L^p$ norm.
Proof. Let U be a compact, symmetric neighborhood of e and set K := supp f, so that KU is compact and $\operatorname{supp}(R_x f) \subseteq KU$ for x ∈ U. By uniform continuity of f (16.1.3), given ε > 0 we may choose a neighborhood V of e contained in U such that $|f(yx) - f(y)| < \varepsilon\,(\mu(KU))^{-1/p}$ for all x ∈ V and y ∈ G. For such x we then have
\[ \|R_x f - f\|_p^p = \int_{KU} |f(yx) - f(y)|^p\,d\mu(y) \le \varepsilon^p. \]
hence (16.4) holds for measurable indicator functions f. The usual arguments then show that the equation holds for all $f \in L^1(\mu)$.
Now take $f \in C_c(G)$ such that $\int f\,d\mu \ne 0$. By the lemma, the left side of (16.4) is
continuous in x at e. It follows that ∆ is continuous at e, and since ∆ is a homomorphism,
it is continuous on G.
It follows directly from the definition that a left Haar measure is right invariant iff ∆(x) ≡ 1.
In this case, G is said to be unimodular. Abelian groups are obviously unimodular. Here
is another important class of unimodular groups.
16.2.7 Proposition. Every compact group is unimodular.
Proof. If G is compact, then 0 < µ(G) < ∞ (16.2.2). Since Gx = G, we have µ(G) =
µ(Gx) = ∆(x)µ(G), hence ∆(x) = 1.
For a compact group G, the unique Haar measure µ for which µ(G) = 1 is called
normalized Haar measure. For a finite group G = {x1 , . . . , xn } normalized Haar measure
is given by
$$\mu(B) = \frac{1}{n} \sum_{j=1}^{n} 1_B(x_j), \quad B \subseteq G.$$
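The formula for normalized Haar measure on a finite group can be checked directly. The following is a minimal sketch, assuming the symmetric group S3 as the example group and an arbitrarily chosen subset B (both are illustrative choices, not taken from the text). It confirms that the normalized counting measure is invariant under left and right translation and has total mass 1.

```python
from fractions import Fraction
from itertools import permutations

# Elements of S3 as tuples p, where p[i] is the image of i (assumed example group).
G = list(permutations(range(3)))

def mul(p, q):
    """Composition p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def haar(B):
    """Normalized counting measure mu(B) = |B| / |G|."""
    return Fraction(len(set(B)), len(G))

B = [G[0], G[1], G[3]]                               # an arbitrary subset of G
left = [haar([mul(a, b) for b in B]) for a in G]     # mu(aB) for every a
right = [haar([mul(b, a) for b in B]) for a in G]    # mu(Ba) for every a
```

Invariance here reflects the fact that translation by a group element is a bijection of G, so it preserves cardinality.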
We conclude this section with a result that relates a left Haar measure µ to the right
Haar measure $\tilde\mu$ (see (16.3)).
16.2.8 Proposition. Let µ be a left Haar measure on G. If one side of the following
equation exists, then so does the other and the equality is then valid.
$$\int f(y^{-1})\Delta(y^{-1})\,d\mu(y) = \int f(y)\,d\mu(y).$$
In particular, µ is inverse invariant, that is, $\tilde\mu = \mu$, iff G is unimodular.
Proof. Replacing f by $\tilde f$ shows that the assertion is equivalent to
$$\int f(y)\Delta(y^{-1})\,d\mu(y) = \int f(y^{-1})\,d\mu(y) = \int f(y)\,d\tilde\mu(y).$$
= I(f ),
the third equality by (16.4). Therefore, I is a right Haar integral. Since $\tilde I$ is also a right Haar
integral, there exists c > 0 such that $I = c\tilde I$, that is,
$$\int f(y)\Delta(y^{-1})\,d\mu(y) = c \int f(y^{-1})\,d\mu(y),$$
in the sense that if one side is finite then so is the other, in which case equality holds. In
particular, if f is symmetric, then
$$(1 - c) \int f\,d\mu = \int [1 - \Delta(y^{-1})] f(y)\,d\mu(y).$$
Let ε > 0 and let U be a compact symmetric neighborhood of e on which |∆ − 1| < ε. Taking
$f = 1_U$ in the last equation we have |1 − c|µ(U ) ≤ εµ(U ). Since ε was arbitrary, c = 1, hence
$\tilde I = I$, completing the proof.
Note that the conclusion of the proposition may be written
$$\int f(y)\Delta(y)\,d\tilde\mu(y) = \int f(y)\,d\mu(y),$$
hence
∆G×H (a, b) = ∆G (a)∆H (b).
It follows that G × H is unimodular iff both G and H are unimodular.
Define multiplication on G × H by
$$(a, b)(x, y) = \big(ax,\ \sigma(b, x)y\big) = \big(ax,\ \sigma_x(b)y\big), \quad a, x \in G,\ b, y \in H.$$
hence µ⊗ν is a left Haar measure on G σ H. To find the modular function, let σa (ν) denote
the image measure on B(H):
$$\sigma_a(\nu)(B) = \nu\big(\sigma_a^{-1}(B)\big) = \nu\big(\sigma_{a^{-1}}(B)\big), \quad B \in \mathcal{B}(H).$$
so σa (ν) is a left Haar measure. By essential uniqueness, σa (ν) = δ(a)ν for some δ(a) > 0.
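From
$$\delta(ax)\nu = \sigma_{ax}(\nu) = (\sigma_a \circ \sigma_x)(\nu) = \sigma_a\big(\delta(x)\nu\big) = \delta(a)\delta(x)\nu$$
we see that δ : G → (0, ∞) is a homomorphism. Moreover, from
$$\delta(a) \int_H f\,d\nu = \int_H f\,d\sigma_a(\nu) = \int_H f\big(\sigma_a(y)\big)\,d\nu(y), \quad f \in C_c(H),$$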
hence
∆G σ H
(a, b) = ∆G (a)∆H (b)δ(a).
Moreover, ∆G = ∆H on H.
Proof. We show first that the right side of the equation, which we denote by I(f ), is
well-defined. Let F (x) denote the inner integral:
$$F(x) = \int_H f(xy)\,d\nu(y) = \int_{(x^{-1}K) \cap H} f(xy)\,d\nu(y), \quad x \in G,\ K := \mathrm{supp}(f). \tag{†}$$
Now, y ∈ (x−1 K) ∩ H ⇒ Q(x) = Q(xy) ∈ Q(K), so (x−1 K) ∩ H = ∅ for all x for which
$Q(x) \in Q(K)^c$. It follows from (†) that F (x) = 0 for such x and so $\mathrm{supp}(f') \subseteq Q(K)$.
Therefore, $f' \in C_c(G/H)$ and
$$I(f) = \int_{G/H} F(x)\,d\eta(xH) = \int_{G/H} f'(xH)\,d\eta(xH), \tag{‡}$$
Proof. The proofs of (a)–(e) are entirely similar to the corresponding parts of 6.1.1, except
that care must be taken to allow for the fact that the group is not necessarily abelian and dx
is not necessarily right invariant. For example, to prove (c) use left invariance and Fubini’s
theorem2 to obtain
$$\begin{aligned}
f * (g * h)(x) &= \int f(z)(g * h)(z^{-1}x)\,dz = \iint f(z) g(y) h(y^{-1}z^{-1}x)\,dy\,dz \\
&= \iint f(z) g(z^{-1}y) h(y^{-1}x)\,dy\,dz = \int (f * g)(y) h(y^{-1}x)\,dy \\
&= (f * g) * h(x).
\end{aligned}$$
To prove (f), let $\varphi_i \in C_c(G)$ and set $K_i = \mathrm{supp}\,\varphi_i$. From $\varphi_1 * \varphi_2(x) = \int_{K_1} \varphi_1(y)\varphi_2(y^{-1}x)\,dy$
we see that if $y \in K_1$ and $x \notin yK_2$, then the integrand is zero. Therefore, $\mathrm{supp}\,\varphi_1 * \varphi_2 \subseteq K_1K_2$,
which is compact.
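Associativity of convolution and the support inclusion in (f) can be tested on a finite nonabelian group, where Haar measure is counting measure and the integrals become finite sums. The sketch below assumes the group S3 and integer-valued test functions; all names and data are hypothetical illustrations.

```python
from itertools import permutations

G = list(permutations(range(3)))          # the symmetric group S3 (assumed example)

def mul(p, q):
    """Composition p∘q (apply q first)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    q = [0] * 3
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def conv(f, g):
    """(f*g)(x) = sum_y f(y) g(y^{-1} x), with counting measure as Haar measure."""
    return {x: sum(f[y] * g[mul(inv(y), x)] for y in G) for x in G}

def support(f):
    return {x for x in G if f[x] != 0}

# Integer-valued test functions (hypothetical data), so the arithmetic is exact.
f = {x: i + 1 for i, x in enumerate(G)}
g = {x: (i * i) % 5 - 2 for i, x in enumerate(G)}
h = {x: 3 - 2 * i for i, x in enumerate(G)}
assoc_ok = conv(f, conv(g, h)) == conv(conv(f, g), h)

# Support property: supp(p1 * p2) is contained in K1 K2.
p1 = {x: 1 if x in (G[1], G[2]) else 0 for x in G}
p2 = {x: 1 if x == G[3] else 0 for x in G}
K1K2 = {mul(a, b) for a in support(p1) for b in support(p2)}
supp_ok = support(conv(p1, p2)) <= K1K2
```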
= (g ∗ f )∗ (x).
and
$$(f^* * g)(x) = \int \Delta(y^{-1})\,\overline{f(y^{-1})}\,g(y^{-1}x)\,dy = \int \overline{f(y)}\,g(yx)\,dy = (R_x g \mid f).$$
However, the theorem is valid for functions f ∈ Lp , 1 ≤ p < ∞, since it may be shown that such functions
are zero outside a σ-compact set. We may therefore invoke Fubini’s theorem and shall do so without further
comment. For the technical details, the reader is referred to [21] or [34].
Approximate Identities
In this subsection we generalize to arbitrary locally compact groups the existence of an
approximate identity, established for the group Rd in 6.1.2. 3 The proof uses the following
lemma, which expresses an important continuity property of left and right translations,
extending 16.2.5.
16.4.5 Lemma. Let f ∈ Lp (G), 1 ≤ p < ∞. Then the mappings x → Lx f and x → Rx f
are continuous in the Lp norm.
Proof. We prove the right translation version. Let ε > 0, g ∈ Cc (G), U a compact neighborhood
of e, and x ∈ U . Then
Since Cc (G) is dense in $L^p$ we may choose g ∈ Cc (G) so that the sum of the first two terms in
the last expression is < ε/2. By 16.2.5, there exists a neighborhood V of e contained in U
such that the third term is < ε/2 for x ∈ V . For such x, $\|R_x f - f\|_p < \varepsilon$, which shows that
$x \mapsto R_x f$ is $L^p$ continuous at e. Continuity at arbitrary $x_0$ follows from
16.4.6 Theorem. Let $f \in L^p$, 1 ≤ p < ∞, and ε > 0. Then there exists a neighborhood V
of the identity such that $\|f * \psi - f\|_p < \varepsilon$ and $\|\psi * f - f\|_p < \varepsilon$ for all symmetric $\psi \in C_c^+(G)$
with $\mathrm{supp}(\psi) \subseteq V$ and $\int \psi = 1$. Moreover, if p = ∞, then the first inequality holds if f is
right uniformly continuous, and the second holds if f is left uniformly continuous.
Proof. We prove only the part concerning f ∗ ψ. Given ε > 0, by the preceding lemma we
may choose a neighborhood V of e such that $\|R_y f - f\|_p < \varepsilon$ for all y ∈ V . If f is right
uniformly continuous, then we may choose V so that $\|R_y f - f\|_\infty < \varepsilon$. Now let $\psi \in C_c^+(G)$
be symmetric with $\mathrm{supp}(\psi) \subseteq V$ and $\int \psi = 1$. Then, by left invariance and symmetry of ψ,
$$f * \psi(x) - f(x) = \int f(y)\psi(y^{-1}x)\,dy - f(x) \int \psi(y)\,dy = \int \big[f(xy) - f(x)\big]\psi(y)\,dy.$$
3 For a discrete group G, L1 (G) actually has an identity, namely the indicator function 1{e} .
verifying the desired inequality for p < ∞. If f is right uniformly continuous, then
$$\|f * \psi - f\|_\infty \le \int \|R_y f - f\|_\infty\,\psi(y)\,dy < \varepsilon.$$
Theorem 16.4.6 is typically used as follows: Since the set of all neighborhoods V of the
identity is directed downward by inclusion, we may form a net (ψV )V , where ψV has the
properties in the theorem. We then have
$$\lim_V f * \psi_V = f \ \text{ in } L^p, \quad 1 \le p \le \infty.$$
It follows that f ∗ g = F ∈ I.
The measure µ∗ν is called the convolution of µ and ν. By 7.3.2 and 7.3.3, µ∗ν may also be
seen as the image measure m(µ⊗ν) of µ⊗ν under the multiplication mapping m(x, y) = xy.
Therefore, we have
$$\int h(z)\,d(\mu * \nu)(z) = \int h(xy)\,d(\mu \otimes \nu)(x, y)$$
in the usual sense that whenever one side exists then so does the other and equality holds.
It is easy to check that the collection Mra (G) of Radon measures on G is a Banach algebra
under the operation of convolution. The proof is the same as for the special case Mra (Rd )
(see 6.4.1). Moreover, Mra (G) is a ∗-algebra under involution µ → µ∗ defined by
or, equivalently,
$$\int \phi(x)\,d\mu^*(x) = \int \phi(x^{-1})\,d\overline{\mu}(x) = \overline{\int \overline{\phi(x^{-1})}\,d\mu(x)}, \quad \phi \in C_c(G).$$
Finally, the Dirac measure δe is an identity for Mra (G) as is seen, for example, from
$$\int \phi\,d(\mu * \delta_e) = \iint \phi(xy)\,d\mu(x)\,d\delta_e(y) = \int \phi(xe)\,d\mu(x) = \int \phi(x)\,d\mu(x).$$
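On a finite group a measure is determined by its point masses, so the image-measure formula for µ ∗ ν and the identity property of the Dirac measure can be verified by direct summation. The following is a sketch under that assumption, using S3 and hypothetical rational point masses; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))            # the finite group S3 (assumed example)
e = tuple(range(3))                         # identity permutation

def mul(p, q):
    """Composition p∘q (apply q first)."""
    return tuple(p[q[i]] for i in range(3))

def convolve(mu, nu):
    """(mu*nu)({z}) = sum over pairs (x, y) with xy = z of mu({x}) nu({y})."""
    out = {z: Fraction(0) for z in G}
    for x in G:
        for y in G:
            out[mul(x, y)] += mu[x] * nu[y]
    return out

def integrate(h, mu):
    return sum(h[z] * mu[z] for z in G)

# Hypothetical signed measures and test function, given pointwise.
mu = {x: Fraction(i - 2, 3) for i, x in enumerate(G)}
nu = {x: Fraction(2 * i + 1, 5) for i, x in enumerate(G)}
delta_e = {x: Fraction(int(x == e)) for x in G}
h = {x: Fraction(i * i) for i, x in enumerate(G)}

lhs = integrate(h, convolve(mu, nu))                                 # int h d(mu*nu)
rhs = sum(h[mul(x, y)] * mu[x] * nu[y] for x in G for y in G)        # int h(xy) d(mu x nu)
```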
16.5 Representations
Positive-Definite Functions
A function φ : G → C is said to be positive definite if
$$\sum_{j,k=1}^{n} c_j\,\overline{c_k}\,\phi(x_k^{-1}x_j) \ge 0 \quad \text{for all } c_j \in \mathbb{C},\ x_j \in G, \text{ and } n \in \mathbb{N}. \tag{16.5}$$
Using the Euclidean inner product, we may write this condition as (Ac | c) ≥ 0, where
$c = (c_1, \ldots, c_n)$ and $A = [a_{jk}]_{n \times n}$, $a_{jk} := \phi(x_k^{-1}x_j)$. Thus φ is a positive definite function
iff A is a positive definite matrix.
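As a numerical illustration of the definition, a character of a finite cyclic group is positive definite, because the quadratic form collapses to a squared modulus. The sketch below assumes the group Z/12 written additively, so that $x_k^{-1}x_j$ becomes $x_j - x_k$; the group, the character, and the random test data are illustrative choices only.

```python
import cmath
import random

n = 12                                    # the cyclic group Z/12 (assumed example)

def phi(x):
    """A character of Z/n, x -> exp(2*pi*i*3x/n)."""
    return cmath.exp(2j * cmath.pi * 3 * (x % n) / n)

def quad_form(cs, xs):
    """sum_{j,k} c_j conj(c_k) phi(x_k^{-1} x_j), written additively as phi(x_j - x_k)."""
    return sum(cj * ck.conjugate() * phi(xj - xk)
               for cj, xj in zip(cs, xs) for ck, xk in zip(cs, xs))

rng = random.Random(0)                    # fixed seed for reproducibility
values = []
for _ in range(200):
    m = rng.randint(1, 6)
    xs = [rng.randrange(n) for _ in range(m)]
    cs = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(m)]
    values.append(quad_form(cs, xs))
```

Each value equals $|\sum_j c_j \phi(x_j)|^2$ up to rounding, which is why it is real and nonnegative.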
16.5.1 Proposition. Let φ be positive definite and x, y ∈ G. Then
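(a) $\phi(x^{-1}) = \overline{\phi(x)}$.
Therefore, $c\phi(x) + \overline{c}\phi(x^{-1})$ is real. Taking c = 1 and c = i shows that $\phi(x) + \phi(x^{-1})$ and
$i[\phi(x) - \phi(x^{-1})]$ are real, which implies (a). Choosing c in (†) so that $c\phi(x) = -|\phi(x)|$ and
using (a), we have $0 \le 2\phi(e) - |\phi(x)| + \overline{c}\phi(x^{-1}) = 2\phi(e) - 2|\phi(x)|$, proving (b).
For (c), take n = 3, $x_1 = e$, $x_2 = x$, $x_3 = y$. For |c| = 1 and t real,
$$\begin{aligned}
0 &\le \big(A(1, tc, -tc) \mid (1, tc, -tc)\big)
= \begin{bmatrix} 1 & tc & -tc \end{bmatrix}
\begin{bmatrix} \phi(e) & \phi(x^{-1}) & \phi(y^{-1}) \\ \phi(x) & \phi(e) & \phi(y^{-1}x) \\ \phi(y) & \phi(x^{-1}y) & \phi(e) \end{bmatrix}
\begin{bmatrix} 1 \\ t\overline{c} \\ -t\overline{c} \end{bmatrix} \\
&= \phi(e)(1 + 2t^2) + ct\big[\phi(x) - \phi(y)\big] + \overline{c}t\big[\phi(x^{-1}) - \phi(y^{-1})\big] - t^2\big[\phi(y^{-1}x) + \phi(x^{-1}y)\big] \\
&= \phi(e)(1 + 2t^2) + 2t\,\mathrm{Re}\,c\big[\phi(x) - \phi(y)\big] - 2t^2\,\mathrm{Re}\,\phi(y^{-1}x),
\end{aligned}$$
the last equality by (a). Taking $c = |\phi(x) - \phi(y)|\,[\phi(x) - \phi(y)]^{-1}$ we have for all real t
$$0 \le 2\big[\phi(e) - \mathrm{Re}\,\phi(y^{-1}x)\big]t^2 + 2|\phi(x) - \phi(y)|\,t + \phi(e) =: at^2 + bt + c.$$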
Since Cc (G) is dense in L1 (G), to test for this property it suffices to take f ∈ Cc (G).
Indeed, if $f_n \in C_c(G)$ satisfies the preceding inequality for all n and if $\|f_n - f\|_1 \to 0$, then
$0 \le \int (f_n^* * f_n)\phi \to \int (f^* * f)\phi$ by $L^1$ continuity of convolution.
For future reference we note that
$$\int (g^* * f)\phi = \iint \Delta(y^{-1})\,\overline{g(y^{-1})}\,f(y^{-1}x)\,\phi(x)\,dy\,dx = \iint \overline{g(y)}\,f(x)\,\phi(y^{-1}x)\,dx\,dy, \tag{16.6}$$
where we have used 16.2.8 and the left invariance of dx. Taking g = f and considering the
conjugate of the last integral, we see that φ is of positive type iff $\overline{\phi}$ is of positive type.
We denote the set of all continuous functions of positive type by P(G):
$$P(G) := \Big\{ \phi \in C_b(G) : \int (f^* * f)\phi \ge 0 \text{ for all } f \in L^1(G) \Big\}.$$
Since
$$|I - S_\varepsilon| \le \sum_{j,k} \iint 1_{E_j \times E_k}(x, y)\,|g(x, y) - g(x_j, x_k)|\,dx\,dy \le \varepsilon |K|^2,$$
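Since $\int \psi_U = 1$ we see that
$$I_U - \sum_{j,k} c_j\,\overline{c_k}\,\phi(x_k^{-1}x_j) = \sum_{j,k} c_j\,\overline{c_k} \int_U \int_U \psi_U(x)\psi_U(y)\big[\phi(y^{-1}x_k^{-1}x_j x) - \phi(x_k^{-1}x_j)\big]\,dx\,dy.$$
which shows that $\lim_U I_U = \sum_{j,k} c_j\,\overline{c_k}\,\phi(x_k^{-1}x_j)$. Since $I_U \ge 0$, the limit is nonnegative.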
Unitary Representations
Let X be a normed space. The strong operator topology of B(X) is the locally convex
topology defined by the seminorms
The weak operator topology of B(X) is the locally convex topology defined by the
seminorms
$$p(T) = \max\{ |\langle Tx_j, x_j' \rangle| : x_j \in X,\ x_j' \in X',\ 1 \le j \le n \}.$$
Thus a net $(T_\alpha)$ in B(X) converges to T in the strong operator topology (resp., weak
operator topology) iff $T_\alpha x \xrightarrow{s} Tx$ (resp., $T_\alpha x \xrightarrow{w} Tx$) for each x ∈ X.
A representation of G on X is a mapping π from G into B(X) such that
π(xy) = π(x)π(y), x, y ∈ G.
If X = H is a Hilbert space and each π(x) is unitary, then π is called a unitary repre-
sentation of G. In this case we shall require that π be continuous in the strong operator
topology. Thus a unitary representation π : G → B(H) satisfies
16.5.4 Corollary. Let $f \in L^2(G)$ and $\tilde f(x) := \overline{f(x^{-1})}$ $(= \Delta(x) f^*(x))$. Then $f * \tilde f \in P(G)$.
R
Proof. f ∗ fe(x) = f (x−1 y)f (y) dy = πL (x)f | f .
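The corollary can be sanity-checked on a finite group, where every function is square integrable and Haar measure is counting measure. The sketch below assumes the group S3, the convention $\tilde f(x) = \overline{f(x^{-1})}$, the convolution $(f * g)(x) = \sum_y f(y) g(y^{-1}x)$, and a hypothetical complex-valued f. It tests the quadratic form (16.5) over all six group elements together with the properties $\phi(x^{-1}) = \overline{\phi(x)}$ and $|\phi(x)| \le \phi(e)$.

```python
import random
from itertools import permutations

G = list(permutations(range(3)))            # the finite group S3 (assumed example)
e = tuple(range(3))

def mul(p, q):
    """Composition p∘q (apply q first)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    q = [0, 0, 0]
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

# Hypothetical complex-valued function f on G.
f = {x: complex(i + 1, (-1) ** i * i) for i, x in enumerate(G)}

def phi(x):
    """(f * f~)(x) = sum_y f(y) conj(f(x^{-1} y)), with f~(x) = conj(f(x^{-1}))."""
    xi = inv(x)
    return sum(f[y] * f[mul(xi, y)].conjugate() for y in G)

def quad_form(c):
    """sum_{j,k} c_j conj(c_k) phi(x_k^{-1} x_j), with x_1, ..., x_6 running over G."""
    return sum(c[j] * c[k].conjugate() * phi(mul(inv(G[k]), G[j]))
               for j in range(6) for k in range(6))

rng = random.Random(1)                      # fixed seed for reproducibility
forms = [quad_form([complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in G])
         for _ in range(50)]
hermitian = all(abs(phi(inv(x)) - phi(x).conjugate()) < 1e-9 for x in G)
bounded = all(abs(phi(x)) <= phi(e).real + 1e-9 for x in G)
```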
16.5.5 Corollary. Let $P_c(G) := C_c(G) \cap P(G)$. Then $C_c(G) * C_c(G) \subseteq \mathrm{span}\,P_c(G)$.
Moreover, $\mathrm{span}\,P_c(G)$ is dense in $C_c(G)$ in the uniform norm and is dense in $L^p(G)$ in the
$L^p$ norm for 1 ≤ p < ∞.
Proof. Let $f \in C_c(G)$ and K := supp f. By 16.5.4, $f * \tilde f \in P(G)$. Also, from
$f * \tilde f(x) = \int_K f(y)\,\overline{f(x^{-1}y)}\,dy$ we see that $\mathrm{supp}(f * \tilde f) \subseteq KK^{-1}$. Therefore, $f * \tilde f \in P_c(G)$. Since the
mapping $(g, h) \mapsto g * \tilde h$ on $C_c(G) \times C_c(G)$ is sesquilinear, by the polarization identity we
have $g * \tilde h = \frac{1}{4} \sum_{k=1}^{4} i^k (g + i^k h) * (g + i^k h)^{\sim}$. Replacing h by $\tilde h$ we see that $g * h \in \mathrm{span}\,P_c(G)$.
Taking h to be an approximate identity, we conclude that $\mathrm{span}\,P_c(G)$ is dense in $C_c(G)$ in
the uniform and $L^p$ norms and hence is dense in $L^p$.
where the second equality is from (16.6). Then (f | g)φ is a positive sesquilinear form on
L1 (G) and by the CBS inequality
If f˘1 = f˘2 and g˘1 = g˘2 , then (f1 − f2 | f1 − f2 )φ = (g1 − g2 | g1 − g2 )φ = 0 and so by the
CBS inequality
Therefore, (f˘ | ğ)φ is well-defined. It is readily established that (f˘ | ğ)φ is an inner product
on $L^1(G)/N$. Denote the Hilbert space completion of $L^1(G)/N$ by $H_\phi$ (11.1.7). From (16.9),
$$\big|(\breve f \mid \breve g)_\phi\big| \le \|g\|_1\,\|f\|_1\,\|\phi\|_\infty.$$
Next, for x ∈ G define L̆x on L1 (G)/N by L̆x f˘ = (Lx f )˘. By left invariance,
$$(L_x f \mid L_x g)_\phi = \iint \overline{g(xz)}\,f(xy)\,\phi(z^{-1}y)\,dy\,dz = \iint \overline{g(z)}\,f(y)\,\phi(z^{-1}y)\,dy\,dz = (f \mid g)_\phi,$$
hence L̆x is well-defined, preserves the inner products, and therefore extends to a unitary
operator on Hφ . Now define a mapping πφ : G → B(Hφ ) by πφ (x) = L̆x−1 . Then
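$$\pi_\phi(x)\pi_\phi(y)\breve f = \breve L_{x^{-1}}(\breve L_{y^{-1}}\breve f) = \breve L_{x^{-1}}(L_{y^{-1}}f)\breve{\ } = (L_{x^{-1}}L_{y^{-1}}f)\breve{\ } = (L_{y^{-1}x^{-1}}f)\breve{\ } = \pi_\phi(xy)\breve f,$$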
and so
$$(\breve f \mid \pi_\phi(y)x)_\phi = (\pi_\phi(y)^{-1}\breve f \mid x)_\phi = \int f(yx)\phi(x)\,dx = \int f(x)\phi(y^{-1}x)\,dx.$$
It follows that if $(\breve f \mid \pi_\phi(y)x) = 0$ for all y, then $\breve f = 0$, which shows that the linear
span of $\pi_\phi(G)x$ is dense in $H_\phi$. Moreover, if $g \in C_c(G)$, then the vector integral
$I(g) := \int g(y)\pi_\phi(y)x\,dy$ exists, and from (‡) we have $(\breve f \mid \breve g)_\phi = (\breve f \mid I(g))_\phi$ for all f . Therefore,
Z Z
x | πφ (y)x φ g(y) dy = x | I(g) φ = lim ψ̆β | I(g) φ = (x | ğ)φ = gφ,
β
the last equality from (†). The desired conclusion now follows from the preceding lemma,
since Cc (G) is dense in L1 .
It is not necessarily the case that φ(·) = (π(·)x | x) a.e. on G. Indeed, as the proof shows,
such a conclusion would depend on (L1 , L∞ ) duality, which holds generally only in the
σ-finite case.
Irreducible Representations
Let π be a unitary representation of G on a Hilbert space H. An invariant subspace for
π is a subspace M of H such that π(x)M ⊆ M for all x ∈ G. If the only invariant subspaces
for π are the trivial subspaces {0} and H, then π is said to be irreducible; otherwise π is
reducible. Also, call an operator in B(H) nontrivial if it is not a multiple of the identity
operator I. The following result is a fundamental tool in the study of representations.
16.5.8 Schur’s Lemma. A unitary representation π is reducible iff there exists a nontrivial
T ∈ B(H) that commutes with every π(x).
Proof. Assume that π is reducible and let M be a nontrivial closed subspace of H such that
π(x)M ⊆ M for all x ∈ G. For x ∈ M and x⊥ ∈ M ⊥ , (x | π(x)x⊥ ) = (π(x−1 )x | x⊥ ) = 0,
hence π M ⊥ ⊆ M ⊥ . If P denotes the orthogonal projection onto M, then
hence T ∗ commutes with each π(x). Therefore, the self-adjoint operators Tr := (T + T ∗ )/2
and Ti := (T − T ∗ )/(2i) commute with π(x). Since T = Tr + iTi , at least one of the operators
is nontrivial. Thus we may as well assume that the original operator T is self-adjoint. Now
consider the Borel functional calculus $f \mapsto f(T)$. Since π(x) commutes with T it commutes
with the projections PE := 1E (T ), where E is a nontrivial Borel subset of σ(T ). Then ran PE
is a nontrivial subspace of H invariant under every π(x), hence π is reducible.
Since
$$\|m\|^2 + \|m^\perp\|^2 = \|x\|^2 = (\pi_\phi(e)x \mid x)_\phi = \phi(e) = 1,$$
equation (†) exhibits φ as a proper convex combination of members of P ∩ S1 . Therefore, φ
is not extreme.
Now assume that πφ is irreducible and let φ = θ + ψ, θ, ψ ∈ P(G). Then, by (16.8),
(f | g)φ = (f | g)θ + (f | g)ψ , which implies that (f | f )θ ≤ (f | f )φ and so
It follows that B(f˘, ğ) := (f | g)θ is a well-defined bounded Hermitian sesquilinear form on
Hφ . By 11.4.1 there exists T ∈ B(Hφ ) such that (T f˘ | ğ)φ = (f | g)θ for all f, g ∈ L1 (G).
Recalling that
(πφ (x)f˘ | ğ)φ = (L̆x−1 f˘ | ğ)φ = (Lx−1 f | g)φ ,
with the analogous equations holding for θ, we have
(T πφ (x)f˘ | ğ)φ = (T (Lx−1 f )˘| ğ)φ = (Lx−1 f | g)θ = (f | Lx g)θ = (T f˘ | L̆x ğ)φ
= (πφ (x)T f˘ | ğ)φ .
Thus T commutes with πφ (x) for all x and so T = cI for some c ∈ C by Schur’s lemma.
Therefore,
$$\int (g^* * f)\theta = (f \mid g)_\theta = (T\breve f \mid \breve g)_\phi = (c\breve f \mid \breve g)_\phi = (cf \mid g)_\phi = \int (g^* * f)\,c\phi$$
$\ge c|V|^2 > 0.$
16.5.12 Theorem (Gelfand-Raikov). Given distinct points x, y ∈ G, there exists an irreducible
unitary representation π of G such that π(x) ≠ π(y).
Proof. Let $a := x^{-1}y$ and choose $g \in C_c(G)$ such that $L_a g \ne g$. Set $f := L_a g - g \in C_c(G)$
and choose ψ ∈ P(G) as in the lemma. Normalizing, we may assume $\psi \in P(G) \cap S_1$. By
the Krein-Milman theorem, ψ is a weak∗ limit of convex combinations of extreme points of
$P(G) \cap S_1$, hence there must exist an extreme point φ such that $\int (f^* * f)\phi > 0$. Thus, in
the notation of 16.5.7, $(f \mid f)_\phi > 0$. Since
$$\big\|\pi_\phi(a^{-1})\breve g - \breve g\big\|_\phi^2 = (f \mid f)_\phi > 0,$$
$\pi_\phi(x)\breve g \ne \pi_\phi(y)\breve g$.
Finally, by 16.5.10, πφ is irreducible.
Then T is a compact, positive, nonzero operator and T π(x) = π(x)T for all x ∈ G.
Proof. For any x, y ∈ H,
$$(Tx \mid y) = \int (x \mid \pi(x)u)\,(\pi(x)u \mid y)\,dx.$$
In particular, $(Tx \mid x) = \int |(x \mid \pi(x)u)|^2\,dx \ge 0$, and because $|(u \mid \pi(x)u)|^2$ is continuous
in x and positive at x = e, (T u | u) > 0. Therefore, T is a nonzero, positive operator.
Furthermore, by translation invariance,
$$\begin{aligned}
(T\pi(y)x \mid y) &= \int (\pi(y)x \mid \pi(x)u)\,(\pi(x)u \mid y)\,dx = \int \big(x \mid \pi(y^{-1}x)u\big)\,(\pi(x)u \mid y)\,dx \\
&= \int (x \mid \pi(x)u)\,(\pi(yx)u \mid y)\,dx = \int (x \mid \pi(x)u)\,\big(\pi(x)u \mid \pi(y^{-1})y\big)\,dx \\
&= \big(Tx \mid \pi(y^{-1})y\big) = (\pi(y)Tx \mid y).
\end{aligned}$$
16.5.17 Lemma. Let K be a finite dimensional complex Hilbert space and let V be a
group of operators on K (under composition) whose identity is the identity operator. If V is
compact in B(K), then there exists an inner product on K relative to which each member
of V is unitary.
Proof. Clearly, V is a topological group under composition. If dV denotes normalized Haar
measure on V and (x | y) is the given inner product on K, then
$$\langle x \mid y \rangle := \int_{\mathrm{cl}\,V} (Vx \mid Vy)\,dV$$
is the required new inner product on K. For example, the calculation
$$\langle V_0 x \mid V_0 y \rangle = \int_V (VV_0 x \mid VV_0 y)\,dV = \langle x \mid y \rangle$$
shows that V0 ∈ V is unitary.
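For a finite group of operators, normalized Haar measure is the uniform average, so the averaging construction in the lemma can be carried out by hand. The sketch below assumes a hypothetical two-element group {I, A} acting on $\mathbb{C}^2$, where A satisfies $A^2 = I$ but is not unitary for the standard inner product; the matrix and the test vectors are illustrative choices.

```python
# A two-element operator group V = {I, A} on C^2 with A*A = I (hypothetical example).
I2 = ((1, 0), (0, 1))
A = ((1, 1), (0, -1))
V = [I2, A]

def apply(M, x):
    """Matrix-vector product M x."""
    return tuple(sum(M[i][j] * x[j] for j in range(2)) for i in range(2))

def dot(x, y):
    """Standard inner product (x | y), conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def new_dot(x, y):
    """Averaged inner product <x | y> = (1/|V|) * sum over M in V of (Mx | My)."""
    return sum(dot(apply(M, x), apply(M, y)) for M in V) / len(V)

x, y = (1 + 2j, -1j), (3, 2 - 1j)
Ax, Ay = apply(A, x), apply(A, y)
old_defect = abs(dot(Ax, Ay) - dot(x, y))          # nonzero: A is not unitary for (.|.)
new_defect = abs(new_dot(Ax, Ay) - new_dot(x, y))  # zero: A is unitary for <.|.>
```

The averaged form stays positive definite because the identity operator is one of the terms in the average.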
We may now prove
16.5.18 Theorem (Peter-Weyl). Let G be a compact topological group. Then C(G) is dense
in C(G).
Proof. By the Gelfand-Raikov theorem, C = C(G) separates points of G. We show that C
is closed under multiplication and complex conjugation. The desired conclusion will then
follow from the Stone-Weierstrass theorem.
The product of typical members of C is of the form
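$$\Big( \sum_{j=1}^{n} (\pi_j(x)x_j \mid y_j) \Big) \Big( \sum_{k=1}^{m} (\tilde\pi_k(x)\tilde x_k \mid \tilde y_k) \Big) = \sum_{j,k} (\pi_j(x)x_j \mid y_j)\,(\tilde\pi_k(x)\tilde x_k \mid \tilde y_k).$$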
Furthermore, the map $x \mapsto \xi^{-1}(x) = \overline{\xi(x)}$ is easily seen to define a character. Thus $\widehat{G}$ is an
abelian group with identity the constant function 1. We show in this subsection that $\widehat{G}$ is
locally compact under a natural topology. We use the standard notation
$$\langle x, \xi \rangle = \xi(x), \quad x \in G,\ \xi \in \widehat{G}.$$
$$\widehat{f * g} = \hat f \cdot \hat g \quad \text{and} \quad \widehat{f^*} = \overline{\hat f}. \tag{16.10}$$
shows that $\Phi_{\widehat{G}} \subseteq \Sigma$. For the reverse inclusion, let $\Phi \in \Sigma \subseteq L^1(G)'$ and choose $\phi \in L^\infty(G)$
(see footnote) such that
$$\Phi(f) = \int \phi(y) f(y)\,dy, \quad f \in L^1(G).$$
4 In the non-σ-finite case, the assertion that the dual of L1 (G) is L∞ (G) requires a modification of the
definition of L∞ (G) using the notion of local measurability. We shall assume that L∞ (G) has been so
modified. (see [21]). Alternatively, the reader may simply assume in what follows that G is σ-finite.
Therefore, φ may be identified with, and hence replaced by, the continuous function
$y \mapsto \Phi(g)^{-1}\Phi(L_{y^{-1}}g)$, which is a nonzero continuous homomorphism from G into C. Since
$\phi(y^n) = \phi(y)^n$ for every n ∈ Z and φ is bounded, we see that |φ(y)| = 1, hence $\phi \in \widehat{G}$.
Recall that Σ is locally compact in the weak∗ (Gelfand) topology of the dual of $L^1(G)$. Let
$\widehat{G}$ have the unique topology that makes the mapping $\xi \to \Phi_\xi : \widehat{G} \to \Sigma$ a homeomorphism.
Then $\widehat{G}$ is locally compact, and a basic neighborhood of $\xi_0 \in \widehat{G}$ is of the form
$$V(\xi_0; f_1, \ldots, f_n; \varepsilon) = \big\{ \xi \in \widehat{G} : |\hat f_j(\xi) - \hat f_j(\xi_0)| < \varepsilon,\ j = 1, \ldots, n \big\}, \tag{16.11}$$
as the Gelfand transform of f and the other as the Fourier transform of f , coincide:
$$\hat f(\Phi_\xi) = \Phi_\xi(f) = \hat f(\xi), \quad \xi \in \widehat{G}.$$
We now show that $\widehat{G}$ is a topological group under the topology described in the preceding
paragraph. For this it is helpful to introduce an equivalent neighborhood system on $\widehat{G}$. The
following lemmas accomplish this.
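16.6.3 Lemma. Every $\xi \in \widehat{G}$ is uniformly continuous. Moreover, ⟨x, ξ⟩ is jointly continuous
in $(x, \xi) \in G \times \widehat{G}$.
Proof. For $f \in L^1(G)$,
$$\widehat{L_x f}(\xi) = \int f(xy)\,\overline{\xi(y)}\,dy = \int f(y)\,\overline{\xi(x^{-1}y)}\,dy = \xi(x) \int f(y)\,\overline{\xi(y)}\,dy = \xi(x)\hat f(\xi),$$
hence if $\hat f(\xi) \ne 0$, then $\xi(x) = \hat f(\xi)^{-1}\,\widehat{L_x f}(\xi)$. Since
$$\big| \widehat{L_x f}(\xi) - \widehat{L_y f}(\xi) \big| \le \|L_x f - L_y f\|_1,$$
Therefore, $W(\xi_0, K, \varepsilon)$ is open in $\widehat{G}$.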
It remains to show that every neighborhood V (ξ0 ; f1 , . . . , fn ; δ) in (16.11) contains
W (ξ0 , K, ε) for suitable K and ε > 0. Since
W (ξ0 , K1 ∪ K2 , ε1 ∧ ε2 ) ⊆ W (ξ0 , K1 , ε1 ) ∩ W (ξ0 , K2 , ε2 ),
it suffices to show that, given f ∈ L1 (G) and δ > 0, W (ξ0 , K, ε) ⊆ V (ξ0 ; f ; δ) for some K
and ε, that is,
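$$|\langle x, \xi \rangle - \langle x, \xi_0 \rangle| < \varepsilon \ \ \forall\, x \in K \ \Rightarrow\ |\hat f(\xi) - \hat f(\xi_0)| < \delta. \tag{†}$$
But for any compact K ⊆ G,
$$|\hat f(\xi) - \hat f(\xi_0)| \le \int_K |(\xi - \xi_0) \cdot f| + \int_{K^c} |(\xi - \xi_0) \cdot f| \le \int_K |(\xi - \xi_0) \cdot f| + 2 \int_{K^c} |f|,$$
and choosing K so that the second term in the last inequality is < δ/2 and taking
$\varepsilon = \delta/(2\|f\|_1)$ we see that (†) holds.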
We may now prove the main result of the subsection:
16.6.6 Theorem. $\widehat{G}$ is a locally compact abelian topological group in the Gelfand topology.
Proof. All that needs to be proved is that $\widehat{G}$ is a topological group. This follows easily from
the characterization of convergence given in 16.6.5: Let $\xi_\alpha \to \xi$ and $\zeta_\alpha \to \zeta$ uniformly on
compact sets K. Then $\xi_\alpha^{-1} = \overline{\xi_\alpha} \to \overline{\xi} = \xi^{-1}$ uniformly on K, and from the inequality
$$|\xi_\alpha\zeta_\alpha - \xi\zeta| \le |\xi_\alpha\zeta_\alpha - \xi\zeta_\alpha| + |\xi\zeta_\alpha - \xi\zeta| = |\xi_\alpha - \xi| + |\zeta_\alpha - \zeta|$$
we see that $\xi_\alpha\zeta_\alpha \to \xi\zeta$ uniformly on K.
The topological group $\widehat{G}$ is called the dual group of G. The following examples give
concrete representations of various dual groups.
16.6.7 Examples.
(a) The dual of R is R: Every character of (R, +) is of the form $\xi_y(x) := e^{iyx}$, where y ∈ R.
Indeed, if ξ is a character of R, then for any a, x ∈ R,
$$\int_x^{a+x} \xi(t)\,dt = \int_0^a \xi(x + t)\,dt = \xi(x) \int_0^a \xi(t)\,dt.$$
Choosing a such that $\alpha := \int_0^a \xi(t)\,dt \ne 0$ (possible because ξ(0) = 1), we have
$$\xi(x) = \frac{1}{\alpha} \int_x^{a+x} \xi(t)\,dt,$$
For example, by the theorem the dual of R/Z consists of all characters on R of the form
$x \mapsto e^{2\pi i n x}$ (n ∈ Z), which implies that $\widehat{\mathbb{R}/\mathbb{Z}}$ is isomorphic to Z. The latter can also be seen
from the fact that R/Z is topologically isomorphic to T under the map $x + \mathbb{Z} \mapsto e^{2\pi i x}$ and
that the dual of T is Z.
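A function on R defines a function on R/Z exactly when it has period 1. The short numerical sketch below checks this for the maps $x \mapsto e^{2\pi i n x}$ at a few sample points, and contrasts them with $x \mapsto e^{ix}$, whose period is $2\pi$. The sample points and the range of n are arbitrary illustrative choices.

```python
import cmath

def chi(n):
    """The map x -> exp(2*pi*i*n*x) on R, for an integer n."""
    return lambda x: cmath.exp(2j * cmath.pi * n * x)

def period_one_defect(f, samples):
    """Largest value of |f(x + 1) - f(x)| over the sample points."""
    return max(abs(f(x + 1) - f(x)) for x in samples)

samples = [k / 7 for k in range(-14, 15)]
defects = {n: period_one_defect(chi(n), samples) for n in range(-3, 4)}

# For comparison: x -> exp(i*x) has period 2*pi, not 1.
defect_plain = period_one_defect(lambda x: cmath.exp(1j * x), samples)
```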
16.6.10 Corollary. If x ∈ G \ H, then there exists $\xi \in H^\perp$ such that ⟨x, ξ⟩ ≠ 1.
Proof. By 16.5.9 and the Gelfand-Raikov theorem, the characters of a locally compact
abelian group separate points. Thus we may choose $\zeta \in \widehat{G/H}$ such that ζ(xH) ≠ 1. Then
ξ := ζ ◦ Q has the desired properties.
Bochner’s Theorem
A function φ on G is said to be represented by $\mu \in M_{ra}(\widehat{G})$ if
$$\phi(x) = \int \langle x, \xi \rangle\,d\mu(\xi), \quad x \in G. \tag{16.12}$$
The theorem proved in this subsection gives necessary and sufficient conditions on φ for
such a representation to exist. We shall need the following lemma.
16.6.11 Lemma. Let µ and ν be complex Radon measures on $\widehat{G}$ such that
$$\int \langle x, \xi \rangle\,d\mu(\xi) = \int \langle x, \xi \rangle\,d\nu(\xi) \quad \text{for all } x \in G.$$
Then µ = ν.
Proof. First, note that for $f \in L^1(G)$,
$$\iint f(x)\langle x, \xi \rangle\,d\mu(\xi)\,dx = \iint f(x)\langle x, \xi \rangle\,dx\,d\mu(\xi) = \int \hat f(\xi^{-1})\,d\mu(\xi),$$
and similarly for ν. Thus $\int \hat f(\xi^{-1})\,d\nu(\xi) = \int \hat f(\xi^{-1})\,d\mu(\xi)$ for all $f \in L^1(G)$. Since the space
of Fourier transforms is dense in $C_0(\widehat{G})$ (16.6.2), the measures µ and ν are equal.
hence φ is of positive type. That φ is continuous follows from inner regularity of µ and 16.6.4.
Therefore, φ ∈ P(G).
Conversely, let φ ∈ P(G). We may assume that kφk∞ = 1. By the CBS inequality (see
proof of 16.5.7),
$$\Big| \int (g^* * f)\phi \Big|^2 \le \int (f^* * f)\phi \int (g^* * g)\phi, \quad f, g \in L^1(G). \tag{†}$$
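Now let $\psi_V$ be an approximate identity in $L^1(G)$ and take $g = \psi_V$ in (†). Since $\|\phi\|_\infty = 1$
and $\int \psi_V^* * \psi_V = \big| \int \psi_V \big|^2 = 1$, we have
$$\Big| \int (\psi_V^* * f)\phi \Big|^2 \le \int (f^* * f)\phi.$$
Letting V → e we obtain
$$\Big| \int f\phi \Big| \le \Big( \int (f^* * f)\phi \Big)^{1/2}, \quad f \in L^1(G).$$
hence
$$\phi(x) = \int \langle x, \xi^{-1} \rangle\,d\nu(\xi) = \int \langle x, \xi \rangle\,d\mu(\xi),$$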
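hence $\hat f \ge 0$ on $\widehat{G}$ and $\hat f > 0$ on K. Finally, for any $\xi \in \widehat{G}$ and $h \in L^1(G)$,
$$\begin{aligned}
\int (h^* * h)(\xi g) &= \iint \overline{h(y^{-1})}\,h(y^{-1}x)\,\xi(x)g(x)\,dy\,dx = \iint \overline{(\xi h)(y^{-1})}\,(\xi h)(y^{-1}x)\,g(x)\,dy\,dx \\
&= \int [(\xi h)^* * (\xi h)]\,g \ge 0,
\end{aligned}$$
hence $\int (h^* * h) f \ge 0$ and so f ∈ P(G).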
Here is the promised inversion theorem. For convenience, we indicate the property of the
function f described in the conclusion of the last lemma by writing f ∼ K.
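16.6.14 Theorem. If $f \in S := L^1(G) \cap \mathrm{span}\,P(G)$, then $\hat f \in L^1(\widehat{G})$ and
$$f(x) = \int \langle x, \xi \rangle \hat f(\xi)\,d\xi, \quad x \in G, \tag{16.13}$$
where dξ is a suitably normalized Haar measure on $\widehat{G}$.
Proof. We give the proof in several steps:
(1) For each f ∈ S there exists $\mu_f \in M_{ra}(\widehat{G})$ such that $f(x) = \int \langle x, \xi \rangle\,d\mu_f(\xi)$. Moreover,
$\hat f\,d\mu_g = \hat g\,d\mu_f$.
⟦The first assertion follows from Bochner’s theorem. For the second, let $h \in L^1(G)$. Then
$$\begin{aligned}
\int \hat h\,\hat g\,d\mu_f &= \int \widehat{h * g}\,d\mu_f = \iint \overline{\langle x, \xi \rangle}\,(h * g)(x)\,dx\,d\mu_f(\xi) = \int (h * g)(x) f(x^{-1})\,dx \\
&= [(h * g) * f](e).
\end{aligned}$$
Similarly $\int \hat h\,\hat f\,d\mu_g = [(h * f) * g](e)$. Since $[(h * g) * f] = [(h * f) * g]$ we have
$$\int \hat h\,\hat g\,d\mu_f = \int \hat h\,\hat f\,d\mu_g \quad \text{for all } h \in L^1(G).$$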
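Then I is independent of the choice of f and is a positive linear functional on $C_c(\widehat{G})$.
⟦If also g ∼ supp ϕ, then by step (1),
$$\int \frac{\varphi}{\hat f}\,d\mu_f = \int \frac{\varphi}{\hat f \hat g}\,\hat g\,d\mu_f = \int \frac{\varphi}{\hat f \hat g}\,\hat f\,d\mu_g = \int \frac{\varphi}{\hat g}\,d\mu_g.$$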
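Clearly, I is positive and I(cf ) = cI(f ). To verify additivity, let $f \sim \mathrm{supp}\,\varphi_1 \cup \mathrm{supp}\,\varphi_2$.
Then $f \sim \mathrm{supp}\,\varphi_j$, hence
$$I(\varphi_1 + \varphi_2) = \int \frac{\varphi_1 + \varphi_2}{\hat f}\,d\mu_f = \int \frac{\varphi_1}{\hat f}\,d\mu_f + \int \frac{\varphi_2}{\hat f}\,d\mu_f = I(\varphi_1) + I(\varphi_2).⟧$$
(3) For g ∈ S and $\varphi \in C_c(\widehat{G})$, $I(\varphi \hat g) = \int \varphi\,d\mu_g$. In particular, I is nontrivial.
⟦By step (1),
$$I(\varphi \hat g) = \int \frac{\varphi \hat g}{\hat f}\,d\mu_f = \int \frac{\varphi}{\hat f}\,\hat f\,d\mu_g = \int \varphi\,d\mu_g.$$
Now choose g and ϕ so that $\int \varphi\,d\mu_g \ne 0$.⟧
(4) I is translation invariant.
⟦Fix $\zeta \in \widehat{G}$ and set τ (ξ) := ξζ. For the image measure $\tau(\mu_f)$ we have
$$\int \langle x, \xi \rangle\,d\mu_{\zeta f}(\xi) = (\zeta f)(x) = \int \langle x, \zeta\xi \rangle\,d\mu_f(\xi) = \int \langle x, \xi \rangle\,d\tau(\mu_f)(\xi),$$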
To complete the proof of the theorem, let dξ denote the Haar measure corresponding to
the linear functional I, and let f ∈ S. By step (3),
$$\int \varphi(\xi)\hat f(\xi)\,d\xi = \int \varphi(\xi)\,d\mu_f(\xi) \quad \text{for all } \varphi \in C_c(\widehat{G}).$$
⟨x, ξ⟩ ≠ ⟨y, ξ⟩ for some $\xi \in \widehat{G}$.
For a given Haar measure dx on G, the measure dξ for which the conclusion of the
theorem holds is called the dual measure of dx. For example, in 6.2.4 we had the formulas
$$\hat f(\xi) = \int f(x)\,\overline{\langle x, \xi \rangle}\,dx \quad \text{and} \quad f(x) = \int \hat f(\xi)\,\langle x, \xi \rangle\,d\xi, \qquad \langle x, \xi \rangle := e^{2\pi i \xi x}.$$
The map ⟨x, ξ⟩ identifies R with its dual, and under this identification the dual of Lebesgue
measure is itself.
16.6.16 Proposition. If G is compact, then $\widehat{G}$ has the discrete topology. Moreover, if Haar
measure on G is normalized so that |G| = 1, then the characters form an orthonormal set in
$L^2(G) \subseteq L^1(G)$ and the dual measure is counting measure.
Proof. If |G| = 1 and $\xi \in \widehat{G}$, then $\xi \in L^1(G)$ and for all y
$$\int \langle x, \xi \rangle\,dx = \int \langle xy, \xi \rangle\,dx = \langle y, \xi \rangle \int \langle x, \xi \rangle\,dx.$$
Thus if ξ ≠ 1, then $\int_G \xi = 0$. It follows that $\int_G \xi\overline{\zeta} = 1$ or 0 according as ξ = ζ or ξ ≠ ζ, that
is, the characters form an orthonormal set in $L^2(G)$.
Since the function $\phi \mapsto \int \phi$ is weak∗ continuous on C(G), $U := \{\xi \in \widehat{G} : |\int \xi - 1| < 1/2\}$
is open in $\widehat{G}$. But $\int \xi = 0$ or 1, hence U = {1}. Therefore, {1} is open, which implies that $\widehat{G}$
is discrete.
Now, if g = 1 on G, then $\hat g(\xi) = \int \overline{\xi} = 1_{\{1\}}(\xi)$. Therefore, if dµ(ξ) denotes the dual
measure on $\widehat{G}$, then, by the inversion theorem,
$$1 = g(e) = \int \langle e, \xi \rangle\,\hat g(\xi)\,d\mu(\xi) = \mu(\{1\}).$$
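The orthonormality of characters can be observed numerically on a finite (hence compact) abelian group. The sketch below assumes the cyclic group Z/8 with normalized Haar measure, so that integration is averaging, and characters $\xi_k(x) = e^{2\pi i k x / n}$; the choice of group is illustrative.

```python
import cmath

n = 8  # the compact (finite) group Z/8 with normalized Haar measure (assumed example)

def char(k):
    """Character xi_k(x) = exp(2*pi*i*k*x/n) of Z/n."""
    return lambda x: cmath.exp(2j * cmath.pi * k * x / n)

def integral(f):
    """Integral against normalized Haar measure: the average over the group."""
    return sum(f(x) for x in range(n)) / n

def inner(k, l):
    """L^2 inner product of xi_k and xi_l."""
    xk, xl = char(k), char(l)
    return integral(lambda x: xk(x) * xl(x).conjugate())

# Gram matrix of the characters; it should be the identity matrix up to rounding.
gram = [[inner(k, l) for l in range(n)] for k in range(n)]
```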
Proof. If G is discrete, then the Dirac function $\delta_e$ is an identity for $L^1(G)$, hence the spectrum
$\widehat{G}$ of $L^1(G)$ is compact. If Haar measure dx on G is counting measure and $f = 1_{\{e\}}$, then
$$\hat f(\xi) = \int_G \overline{\langle x, \xi \rangle}\,f(x)\,dx = \overline{\langle e, \xi \rangle} = 1, \quad \xi \in \widehat{G},$$
hence
$$1 = f(e) = \int_{\widehat{G}} \langle e, \xi \rangle \hat f(\xi)\,d\xi = \int_{\widehat{G}} 1\,d\xi = |\widehat{G}|.$$
For example, consider the compact group T with Haar measure dθ/2π and dual group Z
with counting measure. The characters are, respectively, $\xi_n(z) = z^n$ and $\xi_\theta(n) = e^{in\theta}$; hence
the inversion theorem in this setting is
$$\hat f(n) = \int_0^{2\pi} f(\theta)e^{-in\theta}\,\frac{d\theta}{2\pi}, \qquad f(\theta) = \sum_{n=-\infty}^{\infty} \hat f(n)e^{in\theta}.$$
the last equality by (16.10). This shows that the Fourier transform is an $L^2$-isometry from
$L^1(G) \cap L^2(G)$ to $L^2(\widehat{G})$. Since $L^1(G) \cap L^2(G)$ contains $C_c(G)$, which is dense in $L^2(G)$, the
transform has a unique extension to an isometry T from $L^2(G)$ into $L^2(\widehat{G})$. It remains to
show that T is surjective. For this it suffices to show that the image of $L^1(G) \cap L^2(G)$ under T
has a trivial orthogonal complement. To this end, let $\varphi \in L^2(\widehat{G})$ with $\varphi \perp \big(L^1(G) \cap L^2(G)\big)^{\wedge}$.
For any x ∈ G, $\xi \in \widehat{G}$, and $f \in L^1(G) \cap L^2(G)$,
$$\widehat{(R_x f)}(\xi) = \int \overline{\langle y, \xi \rangle}\,f(yx)\,dy = \int \overline{\langle yx^{-1}, \xi \rangle}\,f(y)\,dy = \langle x, \xi \rangle \int \overline{\langle y, \xi \rangle}\,f(y)\,dy = \langle x, \xi \rangle \hat f(\xi),$$
hence,
$$\int \varphi(\xi)\,\overline{\langle x, \xi \rangle \hat f(\xi)}\,d\xi = \int \varphi(\xi)\,\overline{\widehat{(R_x f)}(\xi)}\,d\xi = 0.$$
Since $C_c(\widehat{G})$ is dense in $L^2(\widehat{G})$ (7.1.2), ϕ = 0 a.e. on $\widehat{G}$. Therefore, T is surjective.
We shall use the notation $\hat f$ to indicate the image of $f \in L^2(G)$ under the unitary
transformation of the theorem. By the unitary property we have Parseval’s formula
$$\int f(x)\,\overline{g(x)}\,dx = \int \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi, \quad f, g \in L^2(G). \tag{16.14}$$
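Parseval's formula can be checked in the same finite setting. The sketch below assumes the group Z/10 with normalized Haar measure, counting measure on the dual, and two hypothetical test functions; both sides of (16.14) are then finite sums.

```python
import cmath

n = 10                                                   # the group Z/10 (assumed example)
f = [complex(x % 4, (3 * x) % 5) for x in range(n)]      # hypothetical test functions
g = [complex((x * x) % 3, -x) for x in range(n)]

def fourier(h):
    """hhat(k) = (1/n) sum_x h(x) exp(-2*pi*i*k*x/n): normalized Haar measure on Z/n."""
    return [sum(h[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n)) / n
            for k in range(n)]

# Left side: integral of f * conj(g) over G against normalized Haar measure.
lhs = sum(f[x] * g[x].conjugate() for x in range(n)) / n
# Right side: sum of fhat * conj(ghat) over the dual against counting measure.
rhs = sum(a * b.conjugate() for a, b in zip(fourier(f), fourier(g)))
```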
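By the CBS inequality, the absolute value of the last term is $\le \|\phi_j\|_2\,\|g\|_2$. Since g was
arbitrary and $L^1(G) \cap L^2(G)$ is dense in $L^2(G)$, $f_j \in L^2(G)$. Since
$$f(x) = \iint \langle x, \xi \rangle\,\phi_1(\xi\zeta^{-1})\phi_2(\zeta)\,d\zeta\,d\xi = \iint \langle x, \xi\zeta \rangle\,\phi_1(\xi)\phi_2(\zeta)\,d\xi\,d\zeta = f_1(x)f_2(x),$$
$f \in L^1(G)$. Since also f ∈ span P(G), by the inversion theorem $f(x) = \int \langle x, \xi \rangle \hat f(\xi)\,d\xi$. Thus
$$\int \langle x, \xi \rangle\,\phi(\xi)\,d\xi = f(x) = \int \langle x, \xi \rangle\,\hat f(\xi)\,d\xi \quad \text{for all } x \in G$$
and so by 16.6.11, $\hat f = \phi$.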
16.6.22 Theorem (Pontrjagin). The mapping $\Phi : x \mapsto \langle x, \cdot \rangle$ is a homeomorphism and
group isomorphism from G onto $\widehat{\widehat{G}}$.
Proof. Let $x_\alpha \to x$ in G. Since $\langle x_\alpha, \xi \rangle \to \langle x, \xi \rangle$ uniformly in ξ on compact subsets of $\widehat{G}$
(16.6.4), $\Phi(x_\alpha) \to \Phi(x)$ in $\widehat{\widehat{G}}$. Conversely, let $\Phi(x_\alpha) \to \Phi(x)$ in $\widehat{\widehat{G}}$. By 16.6.5,
$$\langle x_\alpha, \xi \rangle = \langle \xi, \hat x_\alpha \rangle \to \langle \xi, \hat x \rangle = \langle x, \xi \rangle$$
Since $\mathrm{span}\,P_c(G)$ is dense in $C_c(G)$ (16.5.5), $f(x_\alpha) \to f(x)$ for all $f \in C_c(G)$. This implies
that $x_\alpha \to x$. Otherwise, there would exist a compact neighborhood U of x such that $x_\alpha$ is
frequently in $U^c$, and we obtain a contradiction by choosing $f \in C_c(G)$ such that f (x) = 1
and f = 0 on $U^c$. Therefore, Φ is a homeomorphism of G onto Φ(G).
It remains to show that $\Phi(G) = \widehat{\widehat{G}}$. Now, because Φ is a homeomorphism and group
isomorphism, Φ(G) is a locally compact subgroup of $\widehat{\widehat{G}}$ and hence is closed (16.1.2). Suppose
for a contradiction that $\Phi(G) \subsetneq \widehat{\widehat{G}}$. By 16.6.20, there exists a nonzero convolution $\varphi := \varphi_1 * \varphi_2$
of functions $\varphi_j \in C_c(\widehat{\widehat{G}})$ that vanishes identically on Φ(G). By 16.6.21, $\varphi = \hat f$ for some
$f \in L^1(\widehat{G})$. In particular, for all x ∈ G,
$$0 = \varphi(\hat x) = \int_{\widehat{G}} \overline{\langle \xi, \hat x \rangle}\,f(\xi)\,d\xi = \int_{\widehat{G}} \overline{\langle x, \xi \rangle}\,f(\xi)\,d\xi.$$
But then by 16.6.11, f = 0, producing the contradiction ϕ ≡ 0. Therefore, $\Phi(G) = \widehat{\widehat{G}}$,
completing the proof.
In this chapter we study representations of semigroups with a topology. Some of the results
here rely on, and indeed may be seen as extensions of, results in the Fourier analysis of
groups discussed in the last chapter. In particular, compact topological groups and unitary
representations play a central role.
Much of the material in this chapter is based on the papers [10], [11] and [16]. Gener-
alizations and additional material, as well as detailed references, may be found in [4] and
[41].
show that W AP (S) is a conjugate closed translation invariant subspace of Cb (S). Since
RS (f g) ⊆ (RS f )(RS g), to show that W AP (S) is an algebra it suffices to prove that the
product AB = {f g : f ∈ A, g ∈ B} of weakly compact subsets of Cb (S) is weakly compact.
Now, by 13.4.2, Cb (S) is (canonically) isometrically isomorphic to C(βS), where βS is the
spectrum of Cb (S). The images of A and B in C(βS) are then weakly compact in C(βS),
and the assertion follows easily from the equivalence of pointwise and weak compactness in
C(βS) (14.1.1).
To show that W AP (S) is closed, let fn ∈ W AP (S) and fn → f ∈ Cb (S). We show that an
arbitrary sequence (Rsk f ) of right translates of f has a weakly convergent subsequence. It will
follow from the Eberlein-Šmulian theorem that $R_S f$ is relatively weakly compact, as required.
Now, since each $R_S f_n$ is relatively weakly compact, a standard diagonal argument produces a
subsequence $(t_k)$ of $(s_k)$ and a sequence $(g_n)$ in $C_b(S)$ such that $g_n = w\text{-}\lim_k R_{t_k} f_n$ for all
n. For any $\varphi \in C_b(S)'$ with ‖ϕ‖ ≤ 1 we then have
$$|\varphi(g_n) - \varphi(g_m)| = \lim_k \big|\varphi\big(R_{t_k}(f_n - f_m)\big)\big| \le \|f_n - f_m\|_\infty,$$
hence kgn − gm k∞ ≤ kfn − fm k∞ . Thus (gn ) converges in norm to some g ∈ Cb (S). The
inequality
(c) $R_{C_1'} f$ is the weakly closed convex balanced hull of $R_S f$ and is weakly compact.
(e) The mapping $\varphi \mapsto R_\varphi f$ from $WAP(S)'$ into $WAP(S)$ is weak∗-weak continuous.
Proof. For part (a), recall that X is the weak∗ closure of the set of mappings ŝ : f ↦ f(s)
(13.4.2). Now let ϕ ∈ X and let t̂α → ϕ in the weak∗ topology of WAP(S)′. Then for each
s ∈ S,
Rtα f(s) = t̂α(Ls f) → ϕ(Ls f) = Rϕ f(s),
that is, Rtα f → Rϕ f pointwise. Since RS f is relatively weakly compact, the convergence is
also in the weak topology, proving (a).
For (b), recall from 10.2.7 that the convex hull of the set {ŝ : s ∈ S} is weak∗-dense in M.
Thus for each ϕ ∈ M there exists a net of convex sums Σ_j c_j^α ŝ_j^α such that
Σ_j c_j^α g(s_j^α) → ϕ(g) for all g ∈ WAP(S). Taking g = Ls f we have the pointwise convergence
\[ \sum_j c_j^\alpha R_{s_j^\alpha} f(s) = \sum_j c_j^\alpha f(s s_j^\alpha) \to \varphi(L_s f) = R_\varphi f(s), \qquad s \in S. \]
By the Krein-Šmulian theorem, the convex hull of RS f is relatively weakly compact, hence
the convergence of Σ_j c_j^α R_{s_j^α} f is also in the weak topology. Thus RM f is the weak closure
of the convex hull of RS f and hence is weakly compact, proving (b). Similar arguments prove
(c) (see 10.30).
Part (d) follows from (c) and right translation invariance of W AP (S).
For (e), it suffices by 10.2.10 to prove that the restriction of the mapping to C1′ is
weak∗-weak continuous. But this follows because the mapping ϕ ↦ Rϕ f : C1′ → WAP(S)
is w∗-pointwise continuous and the range R_{C1′} f is weakly compact by (c).
The calculation Rϕ1 ·ϕ2 f (s) = ϕ1 · ϕ2 (Ls f ) = ϕ1 (Rϕ2 Ls f ) = ϕ1 (Ls Rϕ2 f ) = Rϕ1 (Rϕ2 f )(s)
shows that
ι(s1 ) · ι(s2 )(f ) = ι(s1 )(Rι(s2 ) f ) = (Rι(s2 ) f )(s1 ) = ι(s2 )(Ls1 f ) = f (s1 s2 ) = ι(s1 s2 )(f ),
hence ι is a homomorphism.
The pair (ι, S^WAP) is called the weakly almost periodic compactification of S. A
key feature of this compactification is the following extension property:
17.2.7 Theorem. Given a continuous homomorphism θ from S into a semitopological
semigroup T, there exists a continuous homomorphism θ̃ : S^WAP → T^WAP such that the
following diagram commutes:
\[
\begin{array}{ccc}
S^{WAP} & \xrightarrow{\ \tilde\theta\ } & T^{WAP} \\
{\scriptstyle\iota}\big\uparrow & & \big\uparrow{\scriptstyle\iota} \\
S & \xrightarrow{\ \theta\ } & T
\end{array}
\]
Proof. By 17.2.2, θ∗ maps WAP(T) into WAP(S). Thus the assertion is an immediate
consequence of 13.4.2.
17.2.8 Corollary. A function f ∈ Cb (S) is weakly almost periodic iff LS f is relatively
weakly compact.
Proof. Let f ∈ WAP(S). Then the mapping x ↦ Lx f̂ on X := S^WAP is pointwise
continuous hence weakly continuous by 14.1.4. Therefore LX f̂ is weakly compact in C(X) and
so LS f is relatively weakly compact in WAP(S). The converse may be proved by considering
the reverse semigroup obtained from S by reversing the order of multiplication.
Since a compact semitopological semigroup is its own W AP compactification, we have
hence Tα S → T S and STα → ST in the weak operator topology. Thus, by the extension
theorem, there exists a continuous homomorphism π̃ : G^WAP → B(H). Then any coefficient
(π(·)x | y) extends to a continuous function (π̃(·)x | y) on G^WAP and so is weakly almost
periodic. The last assertion follows from 16.5.7.
From 17.2.6 we see that if f ∈ W AP (S) then |f | ∈ W AP (S). The converse is false:
17.2.13 Example. Let f(x) = tan⁻¹(x). By 17.2.11, |f| is weakly almost periodic on
S := (R, +). On the other hand, while f is uniformly continuous, it is not weakly almost
periodic. To see the latter, choose a subnet (nα) of the sequence (1, 2, . . .) such that the
limits x = limα ι(nα) and y = limα ι(−nα) exist in S^WAP. If f ∈ WAP(S) we then have
the contradiction
\[ -\frac{\pi}{2} = \lim_\alpha \lim_\beta f(n_\alpha - n_\beta) = \lim_\alpha \lim_\beta \hat f\bigl(\iota(n_\alpha)\iota(-n_\beta)\bigr) = \lim_\alpha \hat f\bigl(\iota(n_\alpha)y\bigr) = \hat f(xy) \]
and
\[ \frac{\pi}{2} = \lim_\beta \lim_\alpha f(n_\alpha - n_\beta) = \lim_\beta \lim_\alpha \hat f\bigl(\iota(n_\alpha)\iota(-n_\beta)\bigr) = \lim_\beta \hat f\bigl(x\,\iota(-n_\beta)\bigr) = \hat f(xy). \]
♦
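The two iterated limits in the example can be checked numerically. The following is only a rough sketch: the constants SMALL and BIG are hypothetical stand-ins for "an index held fixed" and "an index sent to infinity", and floating-point evaluation replaces the actual limits.

```python
import math

# Hypothetical stand-ins: SMALL plays the index held fixed,
# BIG plays the index that is sent to infinity first.
SMALL, BIG = 10**3, 10**9

# Inner limit taken in beta first: n_beta is huge compared with n_alpha,
# so n_alpha - n_beta is very negative.
beta_first = math.atan(SMALL - BIG)

# Inner limit taken in alpha first: n_alpha is huge compared with n_beta,
# so n_alpha - n_beta is very positive.
alpha_first = math.atan(BIG - SMALL)

print(beta_first, alpha_first)
```

The two values approximate −π/2 and π/2, so the order of the limits matters, which is what rules out weak almost periodicity of f in the example.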
Thus for y ∈ S^WAP,
\[ m(R_y f) = \int_{S^{WAP}} \widehat{R_y f}(x)\,d\mu(x) = \int_{S^{WAP}} x(R_y f)\,d\mu(x) = \int_{S^{WAP}} (xy)(f)\,d\mu(x) = \int_{S^{WAP}} \hat f(xy)\,d\mu(x), \]
from which we conclude that m ∈ Mr iff µ is a right invariant measure on S^WAP. The left version
follows by considering the reverse semigroup of S.
The following result makes a connection between the invariance of means on W AP (S)
and multiplication in the Banach algebra W AP (S)0 .
17.2.14 Proposition. Let m be a mean on WAP(S). Then m ∈ Mℓ (m ∈ Mr) iff ϕ · m = m
(m · ϕ = m) for all ϕ ∈ WAP(S)′.
Proof. For any f ∈ WAP(S) and s ∈ S,
(ŝ · m)(f) = ŝ(Rm f) = (Rm f)(s) = m(Ls f) and (m · ŝ)(f) = m(Rŝ f) = m(Rs f).
Therefore, m ∈ Mℓ (m ∈ Mr) iff ŝ · m = m (m · ŝ = m) for all s ∈ S. The desired equivalence
then follows by taking suitable limits, noting that C1′ is the weak∗-closed convex balanced
hull of δS (Ex. 10.30).
17.2.15 Corollary. If W AP (S) has a left invariant mean and a right invariant mean, then
it has an invariant mean.
Proof. Let mℓ be a left invariant mean, mr a right invariant mean, and set m := mr · mℓ.
By 17.2.14,
ϕ · m = (ϕ · mr) · mℓ = mℓ = mr · mℓ = m
and similarly
m · ϕ = mr · (mℓ · ϕ) = mr = mr · mℓ = m,
hence m is an invariant mean.
17.2.16 Theorem. W AP (S) has a left invariant mean iff for each f ∈ W AP (S) the set
C(f ) := clw co(RS f ) contains a constant function. The analogous assertion holds for right
invariant means.
Proof. By 17.2.3(b), C(f) = RM f. If m is a left invariant mean for WAP(S), then
Rm(f)(s) = m(Ls f) = m(f) for all s, hence Rm f is the required constant function.
Conversely, assume that RM f contains a constant function R_{µf} f for each f ∈ WAP(S).
Then for each s ∈ S, R_{µf}(Ls f) = Ls R_{µf} f = R_{µf} f, hence the set
M(f, s) := {µ ∈ M : Rµ(f − Ls f) = 0}
is nonempty. Furthermore, M(f, s) is weak∗ compact and M(WAP(S)) · M(f, s) ⊆ M(f, s),
as may be seen from Rm·µ = Rm Rµ. It follows by induction that
\[ \bigcap_{j=1}^{n} M(f_j, s_j) \ne \emptyset, \qquad f_j \in WAP(S),\ s_j \in S. \]
Indeed, if µ ∈ ⋂_{j=1}^{n−1} M(fj, sj) and ν ∈ M(Rµ fn, sn), then ν · µ ∈ ⋂_{j=1}^{n} M(fj, sj). Thus
the sets M(f, s) have the finite intersection property, so by compactness their intersection
contains a point η. Then η²(Ls f) = η(Rη Ls f) = η(Rη f) = η²(f) for all f ∈ WAP(S) and
s ∈ S, hence η² is a left invariant mean.
17.2.17 Corollary. If S is a semitopological group, then W AP (S) has an invariant mean.
Proof. RS restricted to C(f ) is a group of weakly continuous, noncontracting affine maps
from C(f ) into itself. By the Ryll-Nardzewski fixed point theorem, C(f ) has a fixed point g.
Thus g(st) = g(s) for all s and t. Taking s to be the identity of S shows that g is a constant
function. By the theorem, W AP (S) has a left invariant mean. A similar argument shows
that W AP (S) has a right invariant mean. By 17.2.15, W AP (S) has an invariant mean.
The following theorem summarizes the general properties of the spectrum S^AP of AP(S)
and the canonical map ι = ιAP : S → S^AP, ι(s) = ŝ.
17.3.4 Theorem. S^AP is a compact topological semigroup and ι : S → S^AP is a continuous
homomorphism onto a dense subsemigroup such that ι∗C(S^AP) = AP(S).
Proof. By 17.2.4, Rϕ AP(S) ⊆ AP(S), hence multiplication ϕ1 · ϕ2 is defined on AP(S)′.
Thus, as in the WAP case, AP(S)′ is a semitopological semigroup in the weak∗ topology,
S^AP is a compact semitopological semigroup, and ι is a continuous homomorphism onto a
dense subset of S^AP. It remains only to show that multiplication in S^AP is jointly continuous.
Let f ∈ AP(S). By the relative norm compactness of RS f, the map ϕ ↦ Rϕ f on C1′ is
w∗-norm continuous. It follows that ϕ1 · ϕ2(f) = ϕ1(Rϕ2 f) is jointly continuous in (ϕ1, ϕ2)
on C1′ and the conclusion follows.
Note that the theorem implies that LS f is relatively compact for all f ∈ AP (S). Thus
the notions of right almost periodicity and left almost periodicity coincide.
The pair (ι, S^AP) is called the almost periodic compactification of S. Analogous to
the weakly almost periodic case we have the following extension property, which is immediate
from 17.3.3.
17.3.5 Theorem. For each continuous homomorphism θ from S into a semitopological
semigroup T, there exists a continuous homomorphism θ̃ : S^AP → T^AP such that the
following diagram commutes:
\[
\begin{array}{ccc}
S^{AP} & \xrightarrow{\ \tilde\theta\ } & T^{AP} \\
{\scriptstyle\iota}\big\uparrow & & \big\uparrow{\scriptstyle\iota} \\
S & \xrightarrow{\ \theta\ } & T
\end{array}
\]
Since a compact topological semigroup is its own AP compactification, we have
17.3.6 Corollary. For each continuous homomorphism θ from S into a compact topological
semigroup T, there exists a continuous homomorphism θ̃ : S^AP → T such that θ̃ ◦ ιAP = θ.
Ellis’s Theorem
17.4.1 Theorem (Ellis). A compact Hausdorff semitopological group G is a topological
group.
Proof. To establish joint continuity of multiplication, it is enough to show that multiplication
is continuous at each point of {e} × G. Indeed, if xα → x and yα → y, then, by separate
continuity, x⁻¹xα → e, hence if multiplication is continuous at (e, y), then (x⁻¹xα)yα → y
and so xα yα → xy.
Fix y ∈ G. To verify continuity of multiplication at (e, y), we show first that for each
x ∈ G with x ≠ y there are neighborhoods Nx of e, Ux of x, and Vx of y such that
(Nx Vx) ∩ Ux = ∅. To see this, let g ∈ C(G) with ran g ⊆ [−1, 1], g(y) = 0 and g(x) ≠ g(y).
Define f : G × G → [−1, 1] by f(s, z) = g(sz). By B.0.8, there exists a dense subset A of G
such that f is jointly continuous at every point of A × G. Since {s ∈ G : f(s, x) ≠ f(s, y)}
is open and nonempty, it contains a member s of A. Set
|f (s, x) − f (s, y)| ≤ |f (s, x) − f (s, tv)| + |f (s, tv) − f (s, y)|
= |f (s, x) − f (s, u)| + |f (st, v) − f (s, y)|
< ε/2 + ε/2 = ε,
Existence of Idempotents
An idempotent in a semigroup is an element e satisfying e² = e. A semigroup need not
have an idempotent, as is the case, for example, for (1, ∞) under multiplication or addition.
However, in the compact case one always has idempotents:
17.4.2 Lemma. A compact Hausdorff semitopological semigroup X has an idempotent.
Proof. Order the collection of closed subsemigroups of X downward by inclusion. If C is a
chain of such semigroups, then ⋂C ≠ ∅ by compactness. By Zorn’s lemma, X has a minimal
closed subsemigroup Y. Let e ∈ Y. Then eY is a closed subsemigroup of Y, hence eY = Y
by minimality. Choose y ∈ Y such that e = ey. The set Z = {z ∈ Y : ez = e} is then a
nonempty closed subsemigroup of Y and so Z = Y. In particular, e ∈ Z and so e² = e.
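In the finite case (a finite semigroup is compact in the discrete topology) an idempotent can be found without Zorn's lemma, simply by walking through the powers of an element. The sketch below illustrates this special case only; it is not the argument of the proof, and the helper name idempotent_power and the choice of the integers mod 12 under multiplication are assumptions made for illustration.

```python
def idempotent_power(x, mul):
    """Return an idempotent among the powers x, x^2, x^3, ... of x.

    Assumes x lies in a finite semigroup with multiplication `mul`, so the
    powers are eventually periodic and the periodic part contains an idempotent.
    """
    p = x
    while mul(p, p) != p:   # stop at the first power that is idempotent
        p = mul(p, x)       # advance to the next power of x
    return p


def mul12(a, b):
    """Multiplication in the (hypothetical example) semigroup of integers mod 12."""
    return (a * b) % 12


idempotents = {x: idempotent_power(x, mul12) for x in range(12)}
print(idempotents)
```

Each value e in the resulting dictionary satisfies e·e = e, for example the powers of 2 lead to the idempotent 4.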
Ideal Structure
A nonempty subset Y of a semigroup X is a left ideal if XY ⊆ Y. A left ideal is
minimal if it properly contains no left ideal. Right ideals and minimal right ideals are
defined similarly. An ideal is a subset of X that is both a left ideal and a right ideal. An
ideal is minimal if it properly contains no ideal.
The left and right minimal ideal structures are given in the following theorems.
(c) If R is a minimal right ideal and L is a minimal left ideal, then RL is a topological
group. If e is the identity of RL, then RL = eXe.
Proof. (a) We prove the left case. A Zorn’s lemma argument in the spirit of the proof of
17.4.2 shows that minimal closed left ideals L exist. If L′ is a left ideal contained in L and
x ∈ L′, then Xx is a closed left ideal contained in L′, which forces Xx = L′ = L. Therefore,
all minimal left ideals are closed. Taking x to be an idempotent completes the proof of (a).
(b) Let L1 and L2 be distinct minimal left ideals. Then L1 and L2 are disjoint; otherwise,
by minimality, L1 = L1 ∩ L2 = L2 .
(c) Clearly RL ⊆ R ∩ L. Since LRL ⊆ L, (RL)(RL) = R(LRL) ⊆ RL, hence RL is
a semigroup. We show next that RL is a group. Let t ∈ RL. Then Lt ⊆ L, hence, by
minimality, Lt = L. Therefore, RLt = RL for all t ∈ RL. Similarly, tRL = RL for all
t ∈ RL. Let e ∈ RL such that et = t. If s ∈ RL, there exists x ∈ RL such that s = tx,
hence es = etx = tx = s. Similarly there exists e′ ∈ RL such that se′ = s for all s ∈ RL.
Then e = ee′ = e′, so e is an identity for RL. To see that every t ∈ RL has an inverse,
choose y, z ∈ RL such that yt = e = tz. Then z = ez = ytz = ye = y. Therefore, RL is a
group. Since e ∈ L, Xe ⊆ L and so Xe = L by minimality. Similarly eX = R. Therefore,
RL = eXXe ⊆ eXe ⊆ RL, so RL = eXe. Finally, by 17.4.1, eXe is a compact topological
group.
17.4.4 Theorem. Let X be a compact Hausdorff semitopological semigroup and let K =
K(X) be the union of all minimal left ideals. Then K is also the union of all minimal right
ideals and is an ideal contained in every other ideal.
Proof. K is obviously a left ideal. Let Xe be a minimal left ideal and s ∈ X. We claim
that the left ideal Xes is minimal. To see this, let L be a left ideal contained in Xes. Every
member of L is of the form ys for some y ∈ Xe, hence the set {y ∈ Xe : ys ∈ L} is nonempty.
Since it is a left ideal it must equal Xe. Thus y ∈ Xe ⇒ ys ∈ L, that is, Xes ⊆ L. Therefore,
Xes ⊆ K, so K is a right ideal and hence is an ideal.
Now, if I is any ideal in X and Xe is a minimal left ideal, then IXe is a left ideal
contained in Xe and so IXe = Xe. Since also IXe ⊆ I, Xe ⊆ I. Therefore, K ⊆ I, so K
is contained in every ideal of X. Similar arguments show that the union K′ of all minimal
right ideals is an ideal contained in every ideal of X. Therefore K = K′.
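The statement can be checked by brute force in a small finite example. The sketch below uses the semigroup of all self-maps of a three-point set under composition; this choice, and the helper names, are assumptions made purely for illustration. It relies on the fact that in a finite semigroup the minimal left (right) ideals are the inclusion-minimal principal ideals Xx (xX).

```python
from itertools import product

POINTS = range(3)
# Elements: all maps {0,1,2} -> {0,1,2}, stored as tuples (f(0), f(1), f(2)).
X = list(product(POINTS, repeat=3))


def mul(f, g):
    """Composition f∘g, i.e. (fg)(i) = f(g(i))."""
    return tuple(f[g[i]] for i in POINTS)


def minimal_sets(ideals):
    """Keep the sets in the family that properly contain no other member."""
    family = set(ideals)
    return {a for a in family if not any(b < a for b in family)}


left_principal = [frozenset(mul(y, x) for y in X) for x in X]    # Xx
right_principal = [frozenset(mul(x, y) for y in X) for x in X]   # xX

min_left = minimal_sets(left_principal)
min_right = minimal_sets(right_principal)

K_left = frozenset().union(*min_left)     # union of minimal left ideals
K_right = frozenset().union(*min_right)   # union of minimal right ideals

print(sorted(K_left), len(min_left), len(min_right))
```

In this example both unions equal the set of constant maps: there is one minimal left ideal and there are three minimal right ideals, yet the two unions coincide, as the theorem asserts.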
17.4.5 Corollary. K is the union of disjoint, compact topological groups eXe, where
e² = e ∈ K.
Proof. By minimality K² = K. But K² is the union of disjoint topological groups RL.
17.4.6 Corollary. X is a topological group iff it satisfies the left and right cancellation laws
xy = xz ⇒ y = z and yx = zx ⇒ y = z.
Proof. For the sufficiency, let e² = e ∈ K. Then for any x ∈ X, eex = ex, hence ex = x and
so X = eX. Similarly X = Xe. Thus X = eXe, so X is a group.
17.4.7 Corollary. WAP(S) has an invariant mean iff K(S^WAP) is a compact topological
group.
Proof. By the preceding, K(S^WAP) is a compact topological group iff S^WAP has a
unique minimal right ideal and a unique minimal left ideal.
Let m be an invariant mean on WAP(S). If L1 and L2 are minimal left ideals of S^WAP,
then, choosing any ηj ∈ Lj, we have m = m · ηj ∈ L1 ∩ L2 (17.2.14), so L1 = L2 by (b) of
17.4.3. Therefore, X has a unique minimal left ideal. Similarly, X has a unique minimal
right ideal. Thus K(S^WAP) is a compact topological group.
Conversely, assume K = K(S^WAP) is a compact topological group. Define a mean on
WAP(S) by
\[ m(f) = \int_K \hat f(x)\,d\mu(x), \]
where µ is normalized Haar measure on K. Then m is invariant.
For a compact Hausdorff topological group G, the Peter-Weyl theorem (16.5.18) implies
that SAP (G) = C(G), hence SAP (G) = AP (G). We show in 17.5.9 that
C0 (R+ ) \ {0} ⊆ AP (R+ , +) \ SAP (R+ , +), (17.4)
hence SAP(R+, +) ⊊ AP(R+, +).
Our immediate goal is to show that SAP (S) is a unital C ∗ -subalgebra of Cb (S). For this
we need the following lemma.
17.5.1 Lemma. Let T be a compact topological semigroup and H a subgroup of T . Then
G := cl H is a topological group.
Proof. We show first that inversion may be extended to G. Given x ∈ G, let xα ∈ H
with xα → x. By compactness, we may assume that xα⁻¹ → y for some y ∈ G. Then
xy = limα xα xα⁻¹ = e. Similarly, zx = e for some z ∈ G. Therefore, G is a group. That
inversion in G is continuous is proved as in 17.4.1.
We may now prove
17.5.2 Theorem. SAP (S) is a translation invariant unital C ∗ -subalgebra of Cb (S).
Proof. Let π be a finite dimensional unitary representation of S on H. The relations
Rs (π(t)x | y) = (π(t)π(s)x | y) and Ls (π(t)x | y) = (π(t)x | π(s)∗ y) show that SAP (S)
is translation invariant. Furthermore, the proof that SAP (S) is closed under multiplication
is the same as in the proof of the Peter-Weyl theorem (16.5.18).
It remains to show that SAP (S) is conjugate closed. For this it suffices to show that
if x0 , y0 ∈ H, then f (s) := (y0 | π(s)x0 ) is a coefficient of some unitary representation.
The proof of this is similar to but somewhat more involved than the corresponding part of
the proof of the Peter-Weyl theorem. As in the latter, let F denote the finite dimensional
subspace of Cb (S) consisting of all functions gx defined by
gx (s) = (y0 | π(s)x) , s ∈ S, x ∈ H.
Since F is right translation invariant, the mapping s ↦ Rs is a continuous representation
of S on the space F. Since Rs gx = g_{π(s)x}, Rs is surjective, hence invertible. Thus RS is
contained in a bounded group of operators on H and hence, by 17.5.1, is contained in a
compact group of such operators. Thus, by 16.5.17, there exists an inner product ⟨· | ·⟩ on
F relative to which the operators Rs are unitary. Since the closure of π(S) is a group, there
exists a sequence (sn) in S such that π(sn) → I. We may assume that the evaluation maps
ŝn on F converge to a member of the dual space F′, which, by the Riesz representation
theorem, is given by a member g_{x1} of F. Thus
\[ \lim_n g_x(s_n) = \langle g_x \mid g_{x_1} \rangle, \qquad x \in H. \]
Since F is translation invariant, the limit relation holds for Rs gx as well. It follows that
\[ f(s) = (y_0 \mid \pi(s)x_0) = \lim_n (y_0 \mid \pi(s_n)\pi(s)x_0) = \lim_n R_s g_{x_0}(s_n) = \langle R_s g_{x_0} \mid g_{x_1} \rangle, \]
which shows that f is a coefficient of the unitary representation R, completing the proof.
17.5.3 Proposition. Let T be a semitopological semigroup and θ : S → T a continuous
homomorphism. Then θ∗SAP(T) ⊆ SAP(S), where θ∗ : C(T) → C(S) is the dual map.
In particular, if S is a subsemigroup of T , then SAP (T )|S ⊆ SAP (S).
Proof. This follows essentially from the fact that if π is a continuous, finite dimensional
unitary representation of T , then π◦θ is a continuous finite dimensional unitary representation
of S.
The following theorem summarizes the general properties of the spectrum S^SAP of
SAP(S) and the canonical map ι = ιSAP : S → S^SAP, ι(s) = ŝ.
17.5.4 Theorem. S^SAP is a compact topological group and ι is a continuous homomorphism
onto a dense subsemigroup such that ι∗C(S^SAP) = SAP(S).
Proof. Since SAP(S) ⊆ AP(S), S^SAP is a topological semigroup. It remains to show that
S^SAP is a group. For this we show that S^SAP has the cancellation properties in 17.4.6. We
show that if yx = zx in S^SAP, then f̂(y) = f̂(z) for all f ∈ SAP(S), where ι∗(f̂) = f.
It suffices to show this for f(s) = (Us x | y), where U is a continuous, finite dimensional,
unitary representation of S. Let ι(sα) → x. We may assume that Usα → V for some
unitary operator V. Let g be the coefficient g(s) = (Us V⁻¹x | y). Then for all s
\[ R_x \hat g(\iota(s)) = \hat g(\iota(s)x) = \lim_\alpha \hat g(\iota(s s_\alpha)) = \lim_\alpha (U_s U_{s_\alpha} V^{-1}x \mid y) = (U_s x \mid y) = f(s), \]
The pair (ι, S^SAP) is called the strongly almost periodic compactification of S. As
in the WAP and AP cases, we have the following extension property, which may be proved
using 17.5.3.
17.5.5 Theorem. For each continuous homomorphism θ from S into a semitopological
semigroup T, there exists a continuous homomorphism θ̃ : S^SAP → T^SAP such that the
following diagram commutes:
\[
\begin{array}{ccc}
S^{SAP} & \xrightarrow{\ \tilde\theta\ } & T^{SAP} \\
{\scriptstyle\iota}\big\uparrow & & \big\uparrow{\scriptstyle\iota} \\
S & \xrightarrow{\ \theta\ } & T
\end{array}
\]
Since a compact topological group is its own SAP compactification, we have
17.5.6 Corollary. For each continuous homomorphism θ from S into a compact topological
group T, there exists a continuous homomorphism θ̃ : S^SAP → T such that θ̃ ◦ ιSAP = θ.
17.5.7 Corollary. If S is a group, then AP (S) = SAP (S).
Proof. By 17.5.1, S^AP is a topological group. Applying 17.5.6 to T = S^AP and θ = ιAP, we
obtain a continuous homomorphism θ̃ : S^SAP → S^AP such that θ̃ ◦ ιSAP = ιAP. Thus
AP(S) = ι∗AP C(S^AP) = ι∗SAP ◦ θ̃∗ C(S^AP) ⊆ SAP(S).
where
WAP(S)₀ := {f ∈ WAP(S) : m(|f|) = 0}.
Moreover, WAP(S)₀ is an ideal of the C∗-algebra WAP(S). In particular, these assertions
hold if S is a group or is commutative.
Proof. By 17.4.7, the minimal ideal K = K(S^WAP) is a compact topological group. We
denote the identity in S^SAP by 1 and the identity of K by e, so that K = S^WAP e = eS^WAP e.
The map θ(s) = ιWAP(s)e from S into K is a continuous homomorphism, hence, by 17.5.6,
there exists a continuous homomorphism θ̄ : S^SAP → K such that θ = θ̄ ◦ ιSAP. Therefore,
θ∗C(K) = (ι∗SAP ◦ θ̄∗)(C(K)) ⊆ SAP(S).
In particular, if f ∈ WAP(S), then the function R_e f(s) = f̂(ιWAP(s)e) = f̂(θ(s)) is strongly
almost periodic. Therefore, R_e WAP(S) ⊆ SAP(S). Now let g ∈ SAP(S) and choose
ĝ ∈ C(S^SAP) such that g = ι∗SAP(ĝ). If (ιWAP(tα)) → e, then ιSAP(tα) = θ̃ ◦ ιWAP(tα) →
θ̃(e) = 1, so
R_e g(t) = ĝ(t ιSAP(tα)) → ĝ(t · 1) = g(t).
We have proved that R_e is a projection from WAP(S) onto SAP(S). It remains to show
that ker R_e = WAP(S)₀ and that WAP(S)₀ is an ideal of WAP(S). Now,
\[ m(|f|) = \int_K |\hat f(x)|\,dx, \qquad f \in WAP(S), \]
where dx is Haar measure on K. Thus m(|f|) = 0 iff f̂(x) = 0 for all x ∈ K iff f̂(xe) = 0
for all x ∈ S^WAP iff R_e f = 0. Therefore, ker R_e = WAP(S)₀. That WAP(S)₀ is an ideal
follows from the inequality m(|f g|) ≤ ‖g‖∞ m(|f|).
17.5.9 Corollary. AP(R+, +) = AP(R, +)|R+ ⊕ C0(R+).
Proof. Set S = R+ . Since S is commutative, Cb (S) has an invariant mean m. By an obvious
modification of the preceding corollary,
show that Xw is an invariant linear subspace of X. To show that Xw is closed in X, let (xn )
be a sequence in Xw converging in norm to x in X. By the Eberlein-Šmulian theorem, it
suffices to show that Ux is weakly relatively sequentially compact. Let (Un x) be a sequence
in Ux. Since each set Uxn is relatively weakly sequentially compact, a standard diagonal
argument shows that there exists a subsequence (Uk) of (Un) and a sequence (yn) in X such
that Uk xn → yn weakly for each n. For any x′ ∈ X′ with ‖x′‖ ≤ 1 we then have
hence
the case if the members of S are measure-preserving, i.e., µs = µ for all s ∈ S. Define
Us f = f ◦ s, f ∈ L1. Then
\[ \|U_s f\|_1 = \int |f \circ s|\,d\mu = \int |f|\,d\mu_s = \int |f|\,\frac{d\mu_s}{d\mu}\,d\mu \le c\,\|f\|_1, \]
hence US is uniformly bounded in L1. Since ‖Us 1A‖∞ ≤ 1, US 1A is uniformly integrable and
so is relatively weakly compact, by the Dunford-Pettis theorem (14.2.4). Therefore, US f is
weakly relatively compact for every simple function f. Since these are dense in L1, the
proposition shows that US is weakly almost periodic on L1. ♦
17.6.3 Theorem. Let U be a semigroup of operators on a Banach space X.
(a) If U is weakly almost periodic, then in the weak operator topology of B(X) the closure
Uw of U is a compact semitopological semigroup of uniformly bounded operators.
(b) If U is almost periodic, then in the strong operator topology of B(X) the closure Ua
of U is a compact topological semigroup of uniformly bounded operators.
Proof. The uniform boundedness principle shows that Uw and Ua are uniformly bounded.
For each x ∈ X, let Kx denote the closure of Ux in the weak topology of X. The product
space K := ∏_{x∈X} Kx contains U and is compact by Tychonoff’s theorem. Therefore, the
closure cl(U) of U in K is compact. But cl(U) ⊆ B(X). To see this, let (Tα) be a net in U
such that Tα → T in the product topology. Thus for all x, y ∈ X,
\[ T_\alpha(x + y) \xrightarrow{w} T(x + y), \quad T_\alpha(x) \xrightarrow{w} T(x) \quad\text{and}\quad T_\alpha(y) \xrightarrow{w} T(y). \]
It follows that T is linear, and an application of the uniform boundedness principle shows
that T is bounded. Therefore, Uw = cl(U), proving that Uw is compact in the weak operator
topology. A similar argument shows that Ua is compact in the strong operator topology.
We have already seen in the proof of 17.2.12 that operator composition in B(X) is weak
operator continuous. It follows that Uw is closed under operator composition and so is a
semitopological semigroup. It remains to show that operator composition in Ua is continuous
in the strong operator topology. But if Tα → T and Sα → S in that topology, then for all
x∈X
(b) Xp is the largest closed, U-invariant subspace of X on which Uw acts as a group with
identity the identity operator,
(c) X0 = {x ∈ X : m(|⟨U(·) x, x′⟩|) = 0 ∀ x′ ∈ X′}.
It therefore suffices to show that Vφα z ∈ Xp. Now, C(G) = SAP(G) is generated by finite
dimensional, translation invariant subspaces, hence every φα is uniformly approximable by
functions φ from such spaces G. Since Vφα z is norm approximable by Vφ z, it now suffices
to show that the finite dimensional space {Vφ z : φ ∈ G} is U invariant (hence unitary). But
this follows from
\[ W V_\varphi z = \int_G \varphi(V)\,W V z\,dV = \int_G \varphi(W^{-1}V)\,V z\,dV = \int_G L_{W^{-1}}\varphi(V)\,V z\,dV = V_{L_{W^{-1}}\varphi}\,z. \]
This completes the proof that EX = Xp , which implies that Uw restricted to Xp is a group of
operators on Xp with identity the identity operator. Now let Y be any U-invariant subspace
on which Uw acts as a group with identity the identity operator. Since E² = E, E|Y = I
and so Y = EY ⊆ EX = Xp . Therefore, Xp is the largest such space.
Next, we show that (I − E)X (= ker E) = X0 . Since clw US x = Uw x it follows that
x ∈ X0 iff V x = 0 for some V ∈ Uw . Thus if x ∈ X0 , then {V ∈ Uw : V x = 0} is
nonempty, hence is a closed left ideal and so must contain the idempotent E. Therefore,
X0 = ker E.
Finally, let m be an invariant mean on AU and let g(V) := |⟨V x, x′⟩|, so that
ψ(g)(s) = |⟨Us x, x′⟩|. By (c) of 17.6.4, m(|⟨U(·) x, x′⟩|) = ∫_K |⟨V x, x′⟩| dV. It follows
that m(|⟨U(·) x, x′⟩|) = 0 for all x′ iff V x = 0 for all V ∈ K iff Ex = 0 (since K = KE)
iff x ∈ (E − I)X = X0.
The conclusions of the theorem hold if either S is commutative or a group, since in each
case, W AP (S) has an invariant mean. One also has
17.6.6 Corollary (deLeeuw-Glicksberg). If kUs k ≤ 1 for all s and if both X and X 0 are
strictly convex, then the conclusions of the theorem hold.
Proof. We show that E1 = E1 E2 = E2 for all idempotents in K(Uw ). It will follow that
K(Uw ) is a compact topological group, and we can then apply the theorem.
By minimality, Uw E1 E2 = Uw E2 , hence we may choose V so that V E1 E2 = E2 . Then
(b) The map ψ : A(V w ) → Cb (S) defined by ψ(g)(s) = g(Us ) is an isometry onto FU that
commutes with translations.
(c) FU has an invariant mean iff there exists an idempotent E in V w such that EV =
V E = E for all V ∈ V w .
Proof. The proof of (a) is essentially the same as that of part (a) of 17.6.4. The details are
left to the reader.
(b) That ψ is an isometry into Cb(S) is clear. Given a coefficient h(s) = ⟨Us x, y⟩, define
g ∈ A(V w) by g(V) = ⟨V x, y⟩. Then ψ(g) = h, which shows that FU ⊆ ran ψ and so
ψ⁻¹(FU) ⊆ A(V w). To show equality, let µ ∈ C(V w)′ such that µ = 0 on ψ⁻¹(FU). We show
that µ = 0 on A(V w); it will follow from the Hahn-Banach theorem that ψ⁻¹(FU) = A(V w)
and hence that ran ψ = FU.
Now, µ may be identified with a complex measure on V w and hence may be written as a
linear combination of probability measures µj on V w , say
µ = a1 µ1 − a2 µ2 + i(a3 µ3 − a4 µ4 ), aj ≥ 0.
X = {x : Us x = x ∀ s ∈ S} ⊕ cl span{Us x − x : x ∈ X, s ∈ S}.
The preceding theorem allows a simple proof of the following generalization of the mean
ergodic theorem of von Neumann.
17.6.9 Corollary. Let U ∈ B(X) such that the semigroup {Uⁿ : n ∈ N} is weakly almost
periodic. Then An = n⁻¹ ∑_{j=0}^{n−1} U^j converges in the strong operator topology to a projection
E ∈ B(X) satisfying EU = UE = E.
Proof. Let E be the projection in the proof of the theorem for the representation n ↦ Uⁿ.
We need only show that for fixed k, An(U^k x − x) → 0. This follows from the identity
An(Ux − x) = (1/n)(Uⁿx − x) and the uniform boundedness of U^N.
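A finite dimensional illustration may help fix ideas. In the sketch below U is taken to be the cyclic shift on R³, a hypothetical choice made only for this example; its fixed vectors are the constant vectors, so the averages An x should approach the vector whose entries all equal the mean of x. The helper names are likewise assumptions.

```python
def shift(x):
    """Cyclic shift U on R^3: (x0, x1, x2) -> (x2, x0, x1); an isometry."""
    return [x[-1]] + x[:-1]


def apply_power(x, n):
    """Return U^n x by applying the shift n times."""
    y = list(x)
    for _ in range(n):
        y = shift(y)
    return y


def cesaro_average(x, n):
    """Return A_n x = (1/n) * sum_{j=0}^{n-1} U^j x."""
    total = [0.0] * len(x)
    power = list(x)             # U^0 x
    for _ in range(n):
        total = [t + p for t, p in zip(total, power)]
        power = shift(power)    # advance to the next power of U
    return [t / n for t in total]


x = [1.0, 4.0, 7.0]
n = 1000
avg = cesaro_average(x, n)      # close to [4.0, 4.0, 4.0], the mean of x repeated

# Telescoping identity from the proof: A_n(Ux - x) = (U^n x - x) / n.
lhs = cesaro_average([u - v for u, v in zip(shift(x), x)], n)
rhs = [(u - v) / n for u, v in zip(apply_power(x, n), x)]
print(avg, lhs, rhs)
```

Here the limit operator E replaces each coordinate by the average of the coordinates, and EU = UE = E holds because averaging is unaffected by a cyclic shift.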
The preceding corollary holds for an operator U of norm ≤ 1 on a reflexive Banach space.
For a nonreflexive example, let (X, F, µ) be a probability space and ϕ : X → X measurable
such that µ(ϕ⁻¹(E)) ≤ µ(E) for all E ∈ F. Define U in L1 by Uf = f ◦ ϕ. By 17.6.2, U^N
is weakly almost periodic, hence the corollary is applicable and we have the L1 convergence
\[ \frac{1}{n} \sum_{j=0}^{n-1} U^j f \to Ef. \]
A more refined version of this result in the special case of a measure
preserving ϕ is proved in 18.5.
Chapter 18
Probability Theory
Probability theory has long been a subject of great interest, its roots dating back to the
analysis of games of chance in the sixteenth century. The development of modern probability
theory as a branch of measure theory was initiated by Kolmogorov in the early twentieth
century.
Intuitively, a probability is a number between 0 and 1 that expresses the likelihood of
an outcome in an experiment. In this context, the term experiment simply refers to a
repeatable procedure that has a well-defined set of outcomes; something as simple as tossing
a die or as complex as noting the first time a stock dips below a prescribed level. In practice,
the determination of probabilities may be based on logical deduction, analytical methods,
or statistical analysis (as in polling). For our purposes, we shall take as given a particular
assignment of probabilities and not be concerned with their origin. More precisely, our
development of the subject begins in the modern tradition with a given probability space
(Ω, F, P).¹
¹ Here, in keeping with standard conventions, we write Ω instead of X and use the symbol P for a probability
measure. Other changes of notation to accommodate convention, as well as changes in terminology, are given
in §18.1.
Variance may be seen as a measure of the dispersion of the data X from the mean. The
quantity
\[ \sigma(X) := \sqrt{V(X)} \]
is called the standard deviation of X. The covariance of L2 random variables X and Y
is the quantity
Covariance measures the degree of correlation between X and Y . For example, independent
random variables have covariance zero (see 18.2.3).
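As a small exact computation, the sketch below works on the sample space of two fair dice and assumes the standard formulas V(X) = E[(X − E X)²] and cov(X, Y) = E[(X − E X)(Y − E Y)]; the function and variable names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, each outcome has probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(OMEGA))


def expectation(X):
    """E(X) for a random variable X given as a function on OMEGA."""
    return sum(X(w) * P for w in OMEGA)


def variance(X):
    """V(X) = E[(X - E X)^2] (standard definition, assumed here)."""
    mu = expectation(X)
    return expectation(lambda w: (X(w) - mu) ** 2)


def covariance(X, Y):
    """cov(X, Y) = E[(X - E X)(Y - E Y)] (standard definition, assumed here)."""
    mx, my = expectation(X), expectation(Y)
    return expectation(lambda w: (X(w) - mx) * (Y(w) - my))


def first(w):
    return w[0]


def second(w):
    return w[1]


def total(w):
    return w[0] + w[1]


print(variance(first), covariance(first, second), covariance(first, total))
```

The two individual dice are independent and have covariance exactly zero, while the first die and the total are correlated, with covariance equal to the variance of a single die.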
The characteristic function φX of a d-dimensional random variable X = (X1 , . . . , Xd )
is defined by
φX(t) = E(e^{it·X}).
Note that this is simply a variation of the Fourier transform of the image measure X(P )
(see next subsection).
Probability Distributions
for any Borel function g for which one side or the other of the equation is defined. Every
probability distribution Q on B(Rd ) arises in this manner, that is, as the distribution of
a random variable X on a probability space (Ω, F, P ): simply take Ω = Rd , F = B(Rd ),
P = Q and X the identity mapping on Rd . A family X of d-dimensional random variables is
said to be identically distributed if PX = PY for all X, Y ∈ X.
For d = 1 the function
FX(x) = P(X ≤ x) = PX((−∞, x])
is called the cumulative distribution function (cdf) of X. In many cases of interest, the
cdf is given by a probability density fX, so that
\[ F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \quad\text{and}\quad E\,g(X) = \int_{\mathbb{R}} g(t) f_X(t)\,dt. \]
If ran X is countable, then the cdf is given by the probability mass function (pmf)
pX(x) := P(X = x).
In this case
\[ F_X(x) = \sum_{t \le x} p_X(t) \quad\text{and}\quad E\,g(X) = \sum_{x} g(x) p_X(x). \]
The following are standard distributions given in terms of the probability mass function or
density. In each case X denotes a random variable with the given distribution.
18.1.1 Examples.
• Bernoulli distribution with parameter p ∈ (0, 1):
pX (1) = 1 − pX (0) = p.
For example, the number of heads (0 or 1) that appear on a single toss of a fair coin has a
Bernoulli distribution with parameter 1/2. By an easy calculation,
E(X) = p, V(X) = pq, and φX(t) = e^{it}p + q, where q := 1 − p.
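The "easy calculation" can be carried out directly from the probability mass function. The sketch below does so for an arbitrary parameter value; the helper name bernoulli_summary and the particular values of p and t are assumptions made for illustration.

```python
import cmath
from fractions import Fraction


def bernoulli_summary(p, t):
    """Compute E(X), V(X) and phi_X(t) from the pmf of a Bernoulli(p) variable."""
    q = 1 - p
    pmf = {1: p, 0: q}
    mean = sum(x * w for x, w in pmf.items())
    var = sum((x - mean) ** 2 * w for x, w in pmf.items())
    # Characteristic function: E(e^{itX}) = sum over x of e^{itx} p_X(x).
    phi = sum(cmath.exp(1j * t * x) * float(w) for x, w in pmf.items())
    return mean, var, phi


p, t = Fraction(1, 3), 0.7
mean, var, phi = bernoulli_summary(p, t)
print(mean, var, phi)
```

With exact rational arithmetic the mean and variance agree with p and pq, and the computed characteristic function agrees with e^{it}p + q up to floating-point error.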
18.2 Independence
The notion of independence is specific to probability theory and may be seen as one of
several major points of departure of the subject from general measure theory.
Independent Events
Let (Ω, F, P ) be a probability space. A family {Ai : i ∈ I} of events in F is said to be
independent if
P (Ai1 ∩ · · · ∩ Ain ) = P (Ai1 ) · · · P (Ain )
for all choices of distinct indices ik in I. A family {𝒜i : i ∈ I} of subcollections 𝒜i of F is
independent if the collection {Ai : i ∈ I} is independent for all choices Ai ∈ 𝒜i, i ∈ I.
For example, if (Ω, F, P) = (Ω1 × Ω2, F1 ⊗ F2, P1 × P2), then, by definition of the product
measure, the σ-fields F1 × Ω2 and Ω1 × F2 are independent families. This is the basis of the
notion of independent trials. Indeed, if (ω1 , ω2 ) represents the outcome of a two stage
experiment, then in this model the events A1 × Ω2 and Ω1 × A2 , occurring in stages one
and two, respectively, are independent. This idea generalizes to arbitrary finite sequences of
trials and even to infinite sequences (see §18.4).
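The two stage model can be made concrete on a finite product space. In the sketch below stage one is a biased coin and stage two a fair die; both choices, and all names, are hypothetical and serve only to illustrate that events of the form A1 × Ω2 and Ω1 × A2 are independent under the product measure.

```python
from fractions import Fraction
from itertools import product

# Stage one: a biased coin; stage two: a fair die (both hypothetical choices).
P1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
P2 = {k: Fraction(1, 6) for k in range(1, 7)}

# Product measure on Omega1 x Omega2.
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}


def prob(event):
    """Probability of a set of outcomes in the product space."""
    return sum(P[w] for w in event)


A1 = {"H"}          # an event in stage one
A2 = {2, 4, 6}      # an event in stage two

cyl1 = {w for w in P if w[0] in A1}   # A1 x Omega2
cyl2 = {w for w in P if w[1] in A2}   # Omega1 x A2

print(prob(cyl1 & cyl2), prob(cyl1) * prob(cyl2))
```

Both printed values are equal, which is the product rule defining independence for this pair of events.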
18.2.1 Proposition. Let (Ω, F, P) be a probability space and {𝒜i : i ∈ I} an independent
family of π-systems contained in F. Then the family {σ(𝒜i) : i ∈ I} is independent.
Proof. We may suppose that Ω ∈ 𝒜i for every i, since adjoining Ω does not alter the
independence property. Since the notion of independence involves only finitely many sets at a
time, we may also assume that I is finite, say, I = {1, . . . , n}. The property of independence
may now be expressed as
for all Borel sets Bj . Note that by 18.2.1, to test for independence it suffices to take Bj
in a generating π-system. The preceding equation may be written in terms of probability
distributions as
$$P_{(X_1,\dots,X_n)}(B_1 \times \cdots \times B_n) = P_{X_1}(B_1) \cdots P_{X_n}(B_n) = (P_{X_1} \otimes \cdots \otimes P_{X_n})(B_1 \times \cdots \times B_n).$$
Thus we have
By Fubini’s theorem, the absolute value signs in this equation may be removed, proving the
theorem.
18.2.4 Proposition. Let X1 , . . . , Xn be independent and Xj ∈ L2 (P ). Then
and the conclusion follows by taking expectations, noting that the expectation of the second
sum on the right is zero, by independence.
18.2.5 Proposition. Let X1 , . . . , Xn be independent random variables. Then
Note that by uniqueness of measures (1.6.8), the equation holds for all A ∈ G iff it holds for A
in a generating π-system for G. In the special case G = σ(X1 , X2 , . . .), E(X | G) is called the
conditional expectation of X given X1 , X2 , . . . and is denoted by E(X | X1 , X2 , . . .).
To test whether 18.3 holds in this case, it suffices to restrict consideration to events A of the
form {X1 ∈ B1 , . . . , Xn ∈ Bn }.
A sub-σ-field G of F may be viewed as information regarding the location of an outcome.
For example, in the case of a repeated coin toss, the σ-field generated by all events of
the form {H} × A2 × A3 × · · · tells us with certainty that the first toss came up heads.
Conditional expectation generalizes the notion of standard expectation by incorporating
such information into its definition. It may be viewed as the best prediction of X given the
information G. The two extreme cases are E(X | {∅, Ω}) = E(X) and E(X | P(X)) = X.
In the first case, the σ-field {∅, Ω} provides no information, and one merely obtains the mean
of X. In the second case, the best prediction of X given all possible information is X itself.
The following theorem summarizes the main properties of conditional expectation. The
reader will note that several of these properties are analogs of those of ordinary expectation.
18.3.1 Theorem. Let X, Y ∈ L1 (Ω, F, P ) and let G and H be σ-fields with H ⊆ G ⊆ F.
(a) E(1 | G) = 1.
For (f), note first that the random variable XE(Y |G) is G-measurable. Now let A ∈ G.
To establish the required property that $E[1_A X E(Y \mid \mathcal{G})] = E(1_A XY)$, we may assume that
$X, Y \ge 0$. Now, for G-simple functions $X = \sum_{j=1}^{n} a_j 1_{A_j}$, we have, by definition of E(Y | G)
and linearity,
$$E[1_A X E(Y \mid \mathcal{G})] = \sum_{j=1}^{n} a_j E[1_{A \cap A_j} E(Y \mid \mathcal{G})] = \sum_{j=1}^{n} a_j E(1_{A \cap A_j} Y) = E(1_A XY).$$
The desired equality now follows by considering an increasing sequence of simple functions
Xn and applying the monotone convergence theorem.
For (g), simply note that by independence of G and σ(X) and by 18.2.3 we have
Finally, for (i) we apply (c) to conclude that E(Xn | G) ↑ Y for some G-random variable
Y . By the monotone convergence theorem, for any A ∈ G,
$$\int_A Y\,dP = \lim_n \int_A E(X_n \mid \mathcal{G})\,dP = \lim_n \int_A X_n\,dP = \int_A X\,dP.$$
B × Ωn+1 × Ωn+2 × · · · , B ∈ F1 ⊗ · · · ⊗ Fn .
Interpreting Ak as an event that occurs at “time k”, cylinder sets may be seen as events
occurring in finite time. The σ-field generated by all the cylinder sets (hence also by the
rectangular cylinder sets) is called the product σ-field and is denoted by $\bigotimes_{n=1}^{\infty} \mathcal{F}_n$. The
following analog of 2.1.5 is readily established.
18.4.2 Proposition. Let $\mathcal{F} = \bigotimes_{n=1}^{\infty} \mathcal{F}_n$ and let πn : Ω → Ωn be the nth projection map
πn (ω1 , ω2 , . . . ) = ωn . Then πn is F/Fn -measurable. Moreover, if (Ω0 , F0 ) is a measurable
space, then a mapping T : Ω0 → Ω is F0 /F-measurable iff πn ◦ T is F0 /Fn -measurable for
every n.
For the construction of a suitable probability measure on $\bigotimes_{n=1}^{\infty} \mathcal{F}_n$, we follow the elegant
argument of Saeki [42], which begins with the following lemma.
18.4.3 Lemma. Let Ω be a nonempty set and A a semiring of subsets of Ω containing Ω.
Let P be a set function on A such that P (∅) = 0 and $\sum_{n=1}^{\infty} P(A_n) = 1$ whenever (An ) is a
disjoint sequence in A with union Ω. Then P extends to a probability measure on σ(A).
Proof. Let Au denote the set of all finite disjoint unions of members of A. By the proof
of 1.6.4, Au is a field. Moreover, since ∅ ∈ A, every member A of Au can be written
(non-uniquely) as an infinite disjoint union of members An of A. We shall call (An ) a
representing sequence for A. Now extend P to Au by defining $P(A) = \sum_{n=1}^{\infty} P(A_n)$, where
(An ) is any representing sequence for A. To see that the extension is well-defined, write
$A^c$ ∈ Au as a disjoint union B1 ∪ · · · ∪ Bm , Bj ∈ A. By hypothesis,
$$\sum_{n=1}^{\infty} P(A_n) = 1 - \sum_{j=1}^{m} P(B_j).$$
As the right side is independent of the representing sequence for A, the extension P is
well-defined. Since, by definition, P is countably additive on Au , Theorem 1.6.4 guarantees
the existence of an extension of P to σ(A).
We may now prove
18.4.4 Theorem. There exists a unique probability measure P on $\bigotimes_{n=1}^{\infty} \mathcal{F}_n$ such that
$$P(A_1 \times \cdots \times A_n \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots) = P_1(A_1) \cdots P_n(A_n)$$
In particular,
where
A = A1 × · · · × An and B = B1 × · · · × Bm , Aj , Bj ∈ Fj .
Then B = A × Ωn+1 × · · · × Ωm and so
Next, we show that P has the property of the lemma. Let (An ) be a disjoint sequence in
A with union Ω. Then
$$A_n = \prod_{j=1}^{\infty} A_{nj}, \quad A_{nj} \in \mathcal{F}_j, \quad A_{nj} = \Omega_j \ (j > j_n), \quad\text{and}\quad P(A_n) = \prod_{j=1}^{\infty} P(A_{nj}).$$
Suppose, for a contradiction, that $\sum_{n=1}^{\infty} P(A_n) \ne 1$. Then there must exist an ω1 ∈ Ω1 such
that
$$\sum_{n=1}^{\infty} 1_{A_{n1}}(\omega_1) \prod_{j=2}^{\infty} P(A_{nj}) \ne 1;$$
otherwise, integrating over ω1 ∈ Ω1 would produce $\sum_{n=1}^{\infty} P(A_n) = 1$. It follows by similar
reasoning that there exists ω2 ∈ Ω2 such that
$$\sum_{n=1}^{\infty} 1_{A_{n1}}(\omega_1)\,1_{A_{n2}}(\omega_2) \prod_{j=3}^{\infty} P(A_{nj}) \ne 1.$$
we see that
$$\prod_{j=1}^{j_p} 1_{A_{pj}}(\omega_j) \prod_{j=j_p+1}^{\infty} P(A_{pj}) = 1. \tag{b}$$
Since N was arbitrary, (c) holds. From (b) and (c) we have
$$\sum_{n=1}^{\infty} \prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{\infty} P(A_{nj}) = 1.$$
is called the product of the probability spaces (Ωn , Fn , Pn ). An important special case
is the countable product of probability spaces of the form (R, B(R), Pn ). Note that in this
case, $\bigotimes_{n=1}^{\infty} \mathcal{B}(\mathbb{R}) = \mathcal{B}(\mathbb{R}^{\infty})$, where R∞ is the topological Cartesian product of countably
many copies of R. This follows from the fact that a basis for the product topology consists
of countable unions of sets of the form U1 × · · · × Un × R × R × · · · , where Uj is in a
countable basis for R. Similar remarks apply to a countable product of probability spaces
(Rd , B(Rd ), Pn ).
By definition, the random variables Xj are independent iff for each n and Bj ∈ B(R),
$$P\bigl((X_1, \dots, X_n) \in B_1 \times \cdots \times B_n\bigr) = P(X_1 \in B_1) \cdots P(X_n \in B_n).$$
The question still remains as to whether there exist sequences of independent random
variables. Theorem 18.4.4 neatly settles that question: Consider the sequence of probability
spaces (Ωn , Fn , Pn ), where Ωn = Rd , Fn = B(Rd ), and Pn is an arbitrary d-dimensional
probability distribution on Rd , and let (Ω, F, P ) denote the product space. The projection
maps Xn : Ω → Rd are then d-dimensional random variables such that
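$$\{(X_1, \dots, X_n) \in B_1 \times \cdots \times B_n\} = \{\omega : \omega_j \in B_j,\ 1 \le j \le n\} = B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \cdots.$$
In particular, PXj = Pj , and the Xj are independent since
$$P\{(X_1, \dots, X_n) \in B_1 \times \cdots \times B_n\} = P(B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \cdots) = \prod_{j=1}^{n} P_j(B_j) = \prod_{j=1}^{n} P(X_j \in B_j).$$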
We have proved
18.4.6 Proposition. Given a sequence (Pn ) of d-dimensional probability distributions, there
exists a probability space (Ω, F, P ) and a sequence of independent d-dimensional random
variables Xn on (Ω, F, P ) such that PXn = Pn for all n.
We return to the coin toss experiment:
18.4.7 Example. By the proposition, there exists a probability space and a sequence of
independent random variables Xn such that P (Xn = 1) = p = 1 − P (Xn = 0), where
0 < p < 1. This may be taken as a model for an infinite sequence of coin tosses, where
Xn = 1 if the nth toss is heads, Xn = 0 if the nth toss is tails, and p is the probability of
heads on a single toss. Using this model, we may determine probabilities of various interesting
events. For example, the probability that a head occurs on an even toss is
$$P(X_2 = 1) + P(X_2 = 0, X_4 = 1) + P(X_2 = X_4 = 0, X_6 = 1) + \cdots = p(1 + q + q^2 + \cdots) = 1,$$
where q := 1 − p. The probability that the first head occurs on an even toss is
$$P(X_1 = 0, X_2 = 1) + P(X_1 = X_2 = X_3 = 0, X_4 = 1) + \cdots = p(q + q^3 + q^5 + \cdots) = \frac{q}{1+q}.$$
For a fair coin, the latter probability is 1/3. ♦
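The closed form q/(1 + q) can be compared with a truncated version of the series in the example. This is a minimal sketch; the function names and the truncation length `terms=200` are assumptions made for illustration.

```python
def first_head_on_even_toss(p, terms=200):
    """Truncated series: sum over k of P(first head on toss 2k) = q**(2k-1) * p."""
    q = 1.0 - p
    return sum(q ** (2 * k - 1) * p for k in range(1, terms + 1))

def closed_form(p):
    """The value q / (1 + q) obtained by summing the geometric series."""
    q = 1.0 - p
    return q / (1.0 + q)

fair = first_head_on_even_toss(0.5)     # compare with 1/3
biased = first_head_on_even_toss(0.3)   # compare with closed_form(0.3)
```

For the values of p used here the neglected tail of the series is far below floating-point precision, so the truncated sum and the closed form agree to rounding error.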
Zero-One Laws
The tail σ-field of a sequence of random variables Xn is the σ-field
$$\mathcal{T} = \bigcap_{n=1}^{\infty} \sigma(X_n, X_{n+1}, \dots).$$
Members of T are called tail events. Thus tail events are unaffected by changes that occur
in finite time. For example, the events
$$\Bigl\{\omega : \frac{1}{n} \sum_{k=1}^{n} X_k(\omega) \to \frac{1}{2}\Bigr\} \quad\text{and}\quad \{\omega : X_n(\omega) \to 0\}$$
hence P (Ak ) → P (A) and P (Ak ∩ A) → P (A). But A ∈ σ(Xi : i > nk ), so by independence
P (Ak ∩ A) = P (Ak )P (A). Therefore, $P^2(A) = P(A)$.
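For a simple application, consider an infinite series $\sum_{n=1}^{\infty} X_n$ of independent random
variables. Since $\sum_{n=1}^{\infty} X_n(\omega)$ converges iff $\sum_{n=m}^{\infty} X_n(\omega)$ converges, the event
$$A = \Bigl\{\omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges}\Bigr\}$$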
is a tail event and so has probability 0 or 1. In 18.4.14 we give sufficient conditions for which
P (A) = 1, that is, for which the series converges almost surely.
The next result concerns a particularly important tail event and gives sufficient conditions
that determine the probability of the event.
18.4.9 Borel-Cantelli Lemma. Let (An ) be a sequence of events and let A = lim supn An ,
the event that An occurs infinitely often (i.o.).
(a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then P (A) = 0.
(b) If the events An are independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$, then P (A) = 1.
Proof. Part (a) follows from $P(A) \le \sum_{k=n}^{\infty} P(A_k)$ for all n. For (b) we have
$$1 - P(A) = P\Bigl(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c\Bigr) = \lim_n \lim_m P\Bigl(\bigcap_{k=n}^{m} A_k^c\Bigr) = \lim_n \lim_m \prod_{k=n}^{m} P(A_k^c),$$
Therefore, P (A) = 1.
Note that the independence hypothesis in (b) is crucial. For example, if P is Lebesgue
measure on [0, 1] and An = [0, 1/n], then $\sum_{n=1}^{\infty} P(A_n) = \infty$ but P (A) = 0.
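The counterexample can be explored numerically: the partial sums of P(An) = 1/n grow without bound, while any fixed x > 0 lies in only finitely many of the sets An. The sketch below is illustrative only; the function names and the sample point x = 0.3 are assumptions.

```python
def partial_sum_of_lengths(N):
    """Sum of P(A_n) = 1/n for n = 1..N, where A_n = [0, 1/n]."""
    return sum(1.0 / n for n in range(1, N + 1))

def number_of_sets_containing(x, N):
    """Number of indices n <= N with x in A_n = [0, 1/n]."""
    return sum(1 for n in range(1, N + 1) if x <= 1.0 / n)

growing = partial_sum_of_lengths(10 ** 5)           # harmonic partial sum, unbounded in N
count_small = number_of_sets_containing(0.3, 10 ** 4)
count_large = number_of_sets_containing(0.3, 10 ** 5)
```

The count for x = 0.3 does not change when N is increased, which reflects the fact that lim sup An = {0}, a set of Lebesgue measure zero.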
18.4.10 Example. Let (Xn ) be a sequence of independent random variables such that
Xn → 0 a.s. Then $\sum_{n=1}^{\infty} P(X_n \ge \varepsilon) < \infty$. Otherwise, we would have P (Xn ≥ ε i.o.) = 1 by
18.4.9(b) and so P (Xn → 0) = 0. ♦
$$Y_n := \frac{1}{n} \sum_{k=1}^{n} \bigl(X_k - E(X_k)\bigr) \xrightarrow{P} 0.$$
Proof. Since Yn has mean zero and variance vn (18.2.4), $P(|Y_n| \ge \varepsilon) \le v_n/\varepsilon^2 \to 0$.
By strengthening the hypothesis of the weak law of large numbers, we obtain more powerful
conclusions in the form of strong laws. For these laws we need the following generalization
of Chebyshev’s inequality.
18.4.13 Kolmogorov’s Inequality. Let X1 , . . . , Xn be independent L2 random variables
with mean 0 and set Sj := X1 + · · · + Xj , j = 1, . . . , n. Then
$$P\Bigl(\max_{1 \le j \le n} |S_j| \ge \varepsilon\Bigr) \le \frac{1}{\varepsilon^2} \sum_{j=1}^{n} V(X_j).$$
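For a small number of steps the inequality can be checked exactly by enumerating all paths. The sketch below assumes independent fair ±1 steps (mean 0, variance 1) and an integer threshold; the function name `kolmogorov_check` and the values n = 10, ε = 5 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def kolmogorov_check(n, eps):
    """Exact P(max_j |S_j| >= eps) for n fair +/-1 steps, and the bound n / eps**2.

    eps is assumed to be a positive integer so that exact fractions can be used.
    """
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        if running_max >= eps:
            hits += 1
    prob = Fraction(hits, 2 ** n)       # all 2**n paths are equally likely
    bound = Fraction(n, eps ** 2)       # (1/eps^2) * sum of V(X_j), with V(X_j) = 1
    return prob, bound

prob, bound = kolmogorov_check(10, 5)
```

Here `prob` is the exact left side of the inequality and `bound` the right side for this particular choice of steps.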
Writing
$$S_n^2 = (S_n - S_k + S_k)^2 = (S_n - S_k)^2 + 2(S_n - S_k)S_k + S_k^2,$$
we then have
$$E(S_n^2 1_{B_k}) = E\bigl[(S_n - S_k)^2 1_{B_k}\bigr] + 2E\bigl[(S_n - S_k) S_k 1_{B_k}\bigr] + E(S_k^2 1_{B_k}) \ge E(S_k^2 1_{B_k}) \ge \varepsilon^2 P(B_k),$$
which implies
$$\sum_{k=1}^{n} V(X_k) = V(S_n) \ge \int_A S_n^2\,dP = \sum_k \int_{B_k} S_n^2\,dP \ge \varepsilon^2 \sum_k P(B_k) = \varepsilon^2 P(A).$$
and
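$$\limsup_n |S_n| \le \limsup_n |S_n - S_1| + |S_1| \le \sup_{n \ge 2} |S_n - S_1| + |S_1|.$$
Therefore, the claim will follow if we show that $\sup_{n \ge 2} |S_n - S_1| < \infty$ a.s. Now, for r > 0,
$$\begin{aligned}
P\Bigl(\sup_{n \ge 2} |S_n - S_1| \ge 2r\Bigr) &\le P\Bigl(\bigcup_{n=2}^{\infty} \{|S_n - S_1| \ge r\}\Bigr) = \lim_{N \to \infty} P\Bigl(\bigcup_{n=2}^{N} \{|S_n - S_1| \ge r\}\Bigr) \\
&\le \lim_{N \to \infty} P\Bigl(\max_{2 \le n \le N} |S_n - S_1| \ge r\Bigr) \\
&\le \frac{1}{r^2} \sum_{j=2}^{\infty} V(X_j), \qquad (\dagger)
\end{aligned}$$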
$$1 + \sum_{m=1}^{\infty} P(A_m) = \sum_{m=0}^{\infty} P(A_m) = \sum_{n=0}^{\infty} (n+1) P(B_n) \le \sum_{n=1}^{\infty} n P(B_n) + 1,$$
which proves the first inequality. For the remaining inequalities, note that n1Bn ≤ |X|1Bn ≤
(n + 1)1Bn , hence
$$\sum_{n=1}^{\infty} n P(B_n) \le E|X| = \sum_{n=0}^{\infty} \int_{B_n} |X|\,dP \le \sum_{n=0}^{\infty} (n+1) P(B_n).$$
18.4.18 L1 -Strong Law of Large Numbers. Let (Xn ) be a sequence of independent and
identically distributed L1 random variables. Then
$$\lim_n \frac{1}{n} \sum_{k=1}^{n} X_k = E(X_1) \quad\text{a.s.}$$
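A seeded simulation gives a rough illustration of the statement, although a finite sample can only suggest, not prove, almost sure convergence. The sketch below assumes iid Uniform(0, 1) variables, so that E(X1) = 1/2; the function name, sample size, and seed are arbitrary choices made for illustration.

```python
import random

def sample_mean_of_uniforms(n, seed=0):
    """Average of n pseudo-random Uniform(0, 1) draws from a seeded generator."""
    rng = random.Random(seed)       # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n):
        total += rng.random()       # X_k ~ Uniform(0, 1), E(X_k) = 1/2
    return total / n

avg = sample_mean_of_uniforms(200_000)
```

With 200,000 draws the standard deviation of the sample mean is below 0.001, so `avg` should lie very close to 1/2.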
By the Borel-Cantelli lemma, Xn = Yn eventually with probability one, hence it suffices to
prove that $n^{-1} \sum_{k=1}^{n} Y_k \to E(X_1)$ a.s. Now $E(Y_n) = E(X_1 1_{\{|X_1| < n\}}) \to E(X_1)$, hence also
$n^{-1} \sum_{k=1}^{n} E(Y_k) \to E(X_1)$. We must therefore show that $n^{-1} \sum_{k=1}^{n} \bigl(Y_k - E(Y_k)\bigr) \to 0$ a.s.
For this it is sufficient by the L2 strong law to prove that $\sum_{k=1}^{\infty} V(Y_k)/k^2 < \infty$. Now,
$$V(Y_k) = E(Y_k^2) - E^2(Y_k) \le E(Y_k^2) = E\bigl(X_k^2 1_{\{|X_k| < k\}}\bigr) = E\bigl(X_1^2 1_{\{|X_1| < k\}}\bigr),$$
the general result follows from the special case µ = 0 and σ = 1, which we assume. Let Fn (x)
denote the distribution function of n−1/2 Sn and F the distribution function of a standard
normally distributed random variable. The assertion of the theorem is that Fn (x) → F (x)
for all x. By 7.4.3, this is equivalent to the vague convergence Qn → Q, where
$$Q_n(B) = P\bigl(S_n \in \sqrt{n}\,B\bigr) \quad\text{and}\quad Q(B) = \frac{1}{\sqrt{2\pi}} \int_B e^{-t^2/2}\,dt.$$
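$$\widehat{Q}_1(\xi) = 1 + \tfrac{1}{2}\,\widehat{Q}_1''(\theta(\xi))\,\xi^2, \quad\text{where } |\theta(\xi)| \le |\xi|.$$
By independence, $Q_n(B) = P_{X_1} * \cdots * P_{X_n}(\sqrt{n}\,B)$, hence
$$\begin{aligned}
\widehat{Q}_n(\xi) &= \int \cdots \int e^{-2\pi i n^{-1/2} \xi (x_1 + \cdots + x_n)}\,dP_{X_1}(x_1) \cdots dP_{X_n}(x_n) = \bigl[\widehat{Q}_1(n^{-1/2}\xi)\bigr]^n \\
&= \Bigl[1 + \frac{\xi^2}{2n}\,\widehat{Q}_1''\bigl(\theta(n^{-1/2}\xi)\bigr)\Bigr]^n = \Bigl(1 + \frac{a_n}{n}\Bigr)^n,
\end{aligned}$$
where
$$a_n := \widehat{Q}_1''\bigl(\theta(n^{-1/2}\xi)\bigr)\,\xi^2/2 \to \widehat{Q}_1''(0)\,\xi^2/2 = -2\pi^2 \xi^2.$$
Therefore, $\widehat{Q}_n(\xi) \to \widehat{Q}(\xi)$, completing the proof.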
18.4.20 Remark. In the special case that the random variables Xj are Bernoulli with
parameter p ∈ (0, 1), (18.4) becomes
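$$\lim_{n \to \infty} P\Bigl(\frac{S_n - np}{\sqrt{npq}} \le x\Bigr) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt, \qquad q := 1 - p,$$
a result known as the DeMoivre-Laplace theorem. One can use this equation to obtain
the following approximation for the pmf of the binomial random variable Sn :
$$\begin{aligned}
P(S_n = k) &= P(k - .5 < S_n < k + .5) = P\Bigl(\frac{k - .5 - np}{\sqrt{npq}} < \frac{S_n - np}{\sqrt{npq}} < \frac{k + .5 - np}{\sqrt{npq}}\Bigr) \\
&\approx \Phi\Bigl(\frac{k + .5 - np}{\sqrt{npq}}\Bigr) - \Phi\Bigl(\frac{k - .5 - np}{\sqrt{npq}}\Bigr), \qquad k = 0, 1, \dots, n,
\end{aligned}$$
where
$$\Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$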
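The quality of this approximation can be examined by comparing it with the exact binomial pmf. The following sketch expresses Φ through the error function of the standard library; the function names and the sample values n = 100, p = 1/2, k = 50 are assumptions chosen for illustration.

```python
import math

def Phi(x):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(n, p, k):
    """Exact P(S_n = k) for a binomial random variable."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def normal_approx(n, p, k):
    """Approximation Phi((k + .5 - np)/sqrt(npq)) - Phi((k - .5 - np)/sqrt(npq))."""
    s = math.sqrt(n * p * (1.0 - p))
    return Phi((k + 0.5 - n * p) / s) - Phi((k - 0.5 - n * p) / s)

exact = binom_pmf(100, 0.5, 50)
approx = normal_approx(100, 0.5, 50)
```

For these values both quantities are close to 0.08, and the discrepancy is small compared with the probability itself.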
Since S1 (ω) = X(ω) and Mn (T ω) ≥ 0, the inequality also holds for k = 0. Therefore,
Noting that Mn (ω) = max1≤j≤n Sj (ω) on the set {Mn > 0}, we have
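$$\begin{aligned}
\int_{\{M_n > 0\}} X\,dP &\ge \int_{\{M_n > 0\}} M_n(\omega)\,dP(\omega) - \int_{\{M_n > 0\}} M_n(T\omega)\,dP(\omega) \\
&\ge \int_{\{M_n > 0\}} M_n(\omega)\,dP(\omega) - \int M_n(T\omega)\,dP(\omega).
\end{aligned}$$
Since T is measure preserving, the last difference reduces to $-\int_{\{M_n \le 0\}} M_n\,dP \ge 0$.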
$$\lim_n \frac{S_n}{n} = E(X \mid \mathcal{I}) \quad\text{a.s.,} \tag{18.5}$$
where I is the σ-field of all invariant events. In particular, if T is ergodic, then E(X | I) =
E(X).
Proof. First, note that E(X | I) ◦ T = E(X | I). Indeed, since T is I-measurable so is
E(X | I) ◦ T , and, by the measure preserving property of T ,
$$\int_A E(X \mid \mathcal{I}) \circ T\,dP = \int_A E(X \mid \mathcal{I})\,dP \quad\text{for all } A \in \mathcal{I}.$$
By the invariance of A,
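$$\frac{1}{k}\,\widetilde{S}_k(\omega) = \frac{1}{k} \sum_{j=0}^{k-1} \bigl[X(T^j\omega) - \varepsilon\bigr] 1_A(T^j\omega) = \Bigl(\frac{1}{k} S_k(\omega) - \varepsilon\Bigr) 1_A(\omega),$$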
Therefore, P (A) = 0, that is, Y ≤ ε a.s. Since ε was arbitrary, $\limsup_k S_k/k \le 0$ a.s. Since the
argument holds for −X as well, we also have $\limsup_k (-S_k/k) \le 0$, that is, $\liminf_k (S_k/k) \ge 0$ a.s.
Therefore, $\lim_k S_k/k = 0$ a.s., as required.
Finally, if T is ergodic, then I consists only of sets with measure zero or one, hence
E(X | I) = E(X).
The following result is an analog of a result proved in the general setting of weakly almost
periodic semigroups of operators on Banach spaces. (See 17.6.9 and the paragraph following.)
We give a probabilistic proof here.
$$\|A_n(X) - E(X \mid \mathcal{I})\|_1 \le \|A_n(X - Y)\|_1 + \|A_n(Y) - E(Y \mid \mathcal{I})\|_1 + \|E(X - Y \mid \mathcal{I})\|_1. \qquad (\dagger)$$
We show that each of the terms on the right may be made arbitrarily small.
By the result of the first paragraph, for sufficiently large n,
Therefore, by (†), $\|A_n(X) - E(X \mid \mathcal{I})\|_1 < 3\varepsilon$ for all sufficiently large n.
Stationary Processes
A sequence of random variables Xn on a probability space (Ω, F, P ) is called a stationary
process if
$$P\bigl((X_n, X_{n+1}, \dots) \in B\bigr) = P\bigl((X_1, X_2, \dots) \in B\bigr) \quad \forall\, B \in \mathcal{B}(\mathbb{R}^{\infty}) \text{ and } n \in \mathbb{N}. \tag{18.6}$$
In particular, taking B = B1 × R × · · · , we see that P (Xn ∈ B1 ) = P (X1 ∈ B1 ), so that the
random variables Xn are identically distributed. In this section we prove an ergodic theorem
for stationary processes.
For an example of a stationary process, let T : Ω → Ω be a measure preserving transformation
and X1 a random variable. Then the sequence ($X_n := X_1 \circ T^{n-1}$) is a stationary
process. To see this, let
An := {(Xn , Xn+1 , . . .) ∈ B}
and note that because $X_n = X_{n-1} \circ T$ we have $T^{-1}(A_{n-1}) = A_n$ and so
$P(A_n) = P(T^{-1}(A_{n-1})) = P(A_{n-1})$. Iterating, we obtain (18.6).
Now let X = (Xn ) be an arbitrary stationary process on (Ω, F, P ) and let T : R∞ → R∞
denote the left shift operator T (x1 , x2 , . . .) = (x2 , x3 , . . .). Thus for all n
$$(X_{n+1}, X_{n+2}, \dots) = T^n(X_1, X_2, \dots) =: T^n \circ X.$$
Clearly, T is a measurable transformation on (R∞ , B(R∞ ), PX ). Moreover, T is measure
preserving. Indeed, from
T −1 (B1 × · · · × Bn × R × · · · ) = R × B1 × · · · × Bn × R × · · ·
we have, by stationarity,
T (PX )(B1 × · · · × Bn × R × · · · ) = P ((X1 , X2 , . . .) ∈ R × B1 × · · · × Bn × R × · · · )
= P ((X2 , X3 , . . .) ∈ B1 × · · · × Bn × R × · · · )
= P ((X1 , X2 , . . .) ∈ B1 × · · · × Bn × R × · · · )
= PX (B1 × · · · × Bn × R × · · · ).
Thus the measures T (PX ) and PX agree on the sets B1 × · · · × Bn × R × · · · and so are
equal, by the uniqueness theorem for measures.
Now call A ∈ F invariant if there exists a B ∈ B(R∞ ) such that
A = {(Xn+1 , Xn+2 , . . .) ∈ B} for all n ≥ 0.
The set I of all invariant sets is easily seen to be a σ-subfield of F. Since the preceding
relationship between A and B may be written as
1A = 1B ◦ T n ◦ X for all n ≥ 0,
the usual arguments show that a function f on Ω is I-measurable iff there exists a B(R∞ )-
measurable function g such that
f = g ◦ T n ◦ X for all n ≥ 0.
Proof. Let πn : R∞ → R denote the nth coordinate projection. Then πn is a random variable
on (R∞ , B(R∞ ), PX ) and $\pi_n = \pi_1 \circ T^{n-1}$. Since T is measure preserving, by the Birkhoff
ergodic theorem the averages $n^{-1} \sum_{j=1}^{n} \pi_j$ converge a.s. and in L1 on (R∞ , B(R∞ ), PX ).
Since (πn ) has distribution PX , $n^{-1} \sum_{j=1}^{n} X_j$ converges a.s. and in L1 to some random
variable Y ∈ L1 (P ). To see that Y = E(X1 | I), note first that since $Y = g \circ T^n \circ X$ for
all n, where g is the measurable function $g(x_1, x_2, \dots) = \lim_k (x_1 + \cdots + x_k)/k$, the random
variable Y is I-measurable. It remains to show that
$$\int_A Y\,dP = \int_A X_1\,dP \quad\text{for all } A \in \mathcal{I}. \qquad (\dagger)$$
But if A is invariant, say A = {(Xk , Xk+1 , . . .) ∈ B} for all k, then, by the stationary
property of X,
$$\int_A X_k\,dP = \int_{\{(x_k, x_{k+1}, \dots) \in B\}} x_k\,dP_X(x_1, x_2, \dots) = \int_{\{(x_1, x_2, \dots) \in B\}} x_1\,dP_X(x_1, x_2, \dots) = \int_A X_1\,dP,$$
hence
$$\int_A \frac{1}{n} \sum_{k=1}^{n} X_k\,dP = \int_A X_1\,dP.$$
Filtrations
A (discrete-time) filtration on (Ω, F, P ) is a sequence of σ-fields Fn such that
A probability space with a filtration is called a filtered probability space and is denoted
by (Ω, F, (Fn ), P ). It is sometimes useful to view a filtration as a mathematical description of
the information produced by an experiment consisting of repeated trials. At the completion of
the nth trial, Fn encapsulates the information revealed by the outcome of this and previous
trials.
A stochastic process (Xn ) is said to be adapted to a filtration (Fn ) if for all n the
random variable Xn is Fn -measurable. For example, (Xn ) is clearly adapted to the filtration
$$\mathcal{F}^X = (\mathcal{F}_n^X), \qquad \mathcal{F}_n^X := \sigma(X_1, \dots, X_n),$$
which is called the natural filtration of (Xn ). As noted above, a filtration models the
evolution of information. Thus if (Xn ) is adapted to a filtration (Fn ), then the σ-field Fn
includes all knowable information about the process up to time n. The natural filtration
includes all knowable information about the process up to time n but nothing more.
Indeed, from Xm−1 ≤ E(Xm | Fm−1 ) and the tower property we have
$$X_{m-2} \le E(X_{m-1} \mid \mathcal{F}_{m-2}) \le E\bigl(E(X_m \mid \mathcal{F}_{m-1}) \mid \mathcal{F}_{m-2}\bigr) = E(X_m \mid \mathcal{F}_{m-2}).$$
Iterating we obtain (18.7). Submartingales and martingales have analogous multistep properties.
Note that (Xn , Fn ) is a submartingale iff (−Xn , Fn ) is a supermartingale.
We may think of a martingale as the accumulated winnings of a gambler in a sequence
of fair games. The martingale condition, which may be written E(Xn+1 − Xn | Fn ) = 0,
then asserts that the best prediction of the gain Xn+1 − Xn on the next play, based on the
information Fn obtained during the first n plays, is zero, the hallmark of a fair game. The
games favor the house (respectively, the player), if the winnings constitute a supermartingale
(respectively, a submartingale).
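The gambling interpretation can be made concrete for a sequence of ±1 bets, where the conditional expectation given the first n plays is an average over the two possible outcomes of the next play. The sketch below is an illustration under that assumption; the function name `cond_exp_next` and the win probabilities 1/2 and 1/3 are hypothetical choices.

```python
from fractions import Fraction
from itertools import product

def cond_exp_next(history, p=Fraction(1, 2)):
    """E(X_{n+1} | first n plays = history) for accumulated winnings X_n.

    Each play wins 1 with probability p and loses 1 with probability 1 - p,
    independently of the past (an assumption of this sketch).
    """
    x_n = sum(history)                          # winnings after n plays
    return p * (x_n + 1) + (1 - p) * (x_n - 1)

histories = list(product((-1, 1), repeat=4))    # all outcomes of the first four plays
fair = all(cond_exp_next(h) == sum(h) for h in histories)
unfavorable = all(cond_exp_next(h, Fraction(1, 3)) < sum(h) for h in histories)
```

With p = 1/2 the predicted next value equals the current winnings (the martingale case), while p = 1/3 gives a strictly smaller prediction, as for a game favoring the house.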
(d) Consider a sequence of finite partitions Pn := {An,1 , An,2 , . . . , An,mn } of Ω such that
each member of Pn is a union of members of Pn+1 . Let Q be a probability measure such
that Q ≪ P . Define
$$X_n = \sum_{j=1}^{m_n} a_{n,j} 1_{A_{n,j}}, \qquad a_{n,j} := \frac{Q(A_{n,j})}{P(A_{n,j})},$$
(b) If (Xn , Fn ) and (Yn , Fn ) are sub (super) martingales and a, b ≥ 0, then (Zn , Fn ) is a
sub (super) martingale.
For the remainder of the subsection, we focus mainly on submartingales. Corresponding
results for supermartingales may be obtained by considering (−Xn ). The next result describes
several ways of generating submartingales.
18.5.3 Theorem. Let (Xn ) and (Yn ) be processes on a filtered probability space
(Ω, F, (Fn ), P ).
(a) If (Xn , Fn ) and (Yn , Fn ) are submartingales, then (Xn ∨ Yn , Fn ) is a submartingale.
In particular, (Xn+ , Fn ) is a submartingale.
(b) If (Xn , Fn ) is a submartingale, φ is convex and increasing, and φ(Xn ) ∈ L1 for all n,
then (φ(Xn ), Fn ) is a submartingale.
(c) If (Xn , Fn ) is a martingale, φ is convex, and φ(Xn ) ∈ L1 for all n, then (φ(Xn ), Fn )
is a submartingale. In particular, (|Xn |, Fn ) is a submartingale.
Proof. For (a) we have E(Xn+1 ∨ Yn+1 | Fn ) ≥ E(Xn+1 | Fn ) ≥ Xn , with a similar
inequality for Y . Therefore, E(Xn+1 ∨ Yn+1 | Fn ) ≥ Xn ∨ Yn . Part (b) follows from the
conditional form of Jensen’s inequality: $\varphi(X_n) \le \varphi\bigl(E(X_{n+1} \mid \mathcal{F}_n)\bigr) \le E\bigl(\varphi(X_{n+1}) \mid \mathcal{F}_n\bigr)$.
The proof of part (c) is similar.
The following theorem asserts that reducing the amount of information provided by
a filtration preserves the submartingale property. (The same is not necessarily true if
information is increased.)
18.5.4 Theorem. Let (Gn ) and (Fn ) be filtrations with Gn ⊆ Fn ⊆ F. If (Xn ) is adapted
to (Gn ) and is a submartingale with respect to (Fn ), then it is also a submartingale with
respect to (Gn ).
Proof. $E(X_{n+1} \mid \mathcal{G}_n) = E\bigl(E(X_{n+1} \mid \mathcal{F}_n) \mid \mathcal{G}_n\bigr) \ge E(X_n \mid \mathcal{G}_n) = X_n$.
show that τ is a stopping time relative to the natural filtration of (Xn ). In this connection,
note that the function
$$\sigma(\omega) = \begin{cases} \max\{n : X_n(\omega) < 0\} & \text{if } \{n : X_n(\omega) < 0\} \ne \emptyset, \\ \infty & \text{otherwise} \end{cases}$$
is not a stopping time. This is a mathematical formulation of the self-evident fact that one
cannot predict the future: By knowing merely the past history of the process, one cannot
expect (in the absence of prescience) to know when the process will be negative for the last
time. ♦
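The contrast between a time that is decided by the past and one that is not can be illustrated on finite paths. In the sketch below, `first_negative_time` is a hypothetical first-passage time used only for comparison (it is an assumption, not necessarily the τ of the example), while `last_negative_time` mirrors σ restricted to a finite path.

```python
import math

def first_negative_time(path):
    """inf{n : X_n < 0}, with the infimum of the empty set taken to be infinity."""
    for n, x in enumerate(path, start=1):   # times are numbered from 1
        if x < 0:
            return n
    return math.inf

def last_negative_time(path):
    """max{n : X_n < 0} on a finite path, or infinity if the path is never negative."""
    times = [n for n, x in enumerate(path, start=1) if x < 0]
    return max(times) if times else math.inf

# Two hypothetical paths that agree up to time 3 and differ afterwards.
path_a = [1, -2, 3, 4, 5]
path_b = [1, -2, 3, -4, 5]
```

Both paths give the same first negative time, which is already known at time 2. The last negative time differs between them even though the paths agree up to time 3, so whether σ = 2 has occurred cannot be decided from the first three observations.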
One of the most important facts regarding stopping times is that they may be combined
with submartingales to produce submartingales indexed by random times, a construct useful
in contexts where one may wish to stop a process when a certain goal is achieved. (Think of
a gambler who resolves to stop playing as soon as he has amassed sufficient winnings.) We
shall need these so-called stopped processes in the proof of Doob’s martingale convergence
theorem below.
The main result of the current subsection depends on the following notions: Let
(Ω, F, (Fn ), P ) be a filtered probability space and let (Xn ) be a process adapted to (Fn ). If
τ is a stopping time taking values in N ∪ {∞}, then the stopped random variable Xτ is
defined by
$$X_\tau(\omega) := X_{\tau(\omega)}(\omega)\,1_{\{\tau < \infty\}}(\omega) = \sum_{j=1}^{\infty} 1_{\{\tau(\omega) = j\}} X_j(\omega).$$
Fτ = {A ∈ F : A ∩ {τ ≤ n} ∈ Fn ∀ n ∈ N} .
Since
A ∩ {σ = j, τ = j + 1} = A ∩ {σ = j} ∩ {τ > j} ∈ Fj ,
the terms in the above sum are nonnegative by the submartingale property. Therefore,
Xσ ≤ E(Xτ | Fσ ). For the general case, define stopping times ρi = τ ∧ (σ + i) (0 ≤ i ≤ n).
Then σ = ρ0 ≤ ρ1 ≤ · · · ≤ ρn = τ and ρi+1 − ρi ≤ 1, hence, by the special case,
$$\int_A X_\sigma\,dP \le \int_A X_{\rho_1}\,dP \le \cdots \le \int_A X_\tau\,dP, \qquad A \in \mathcal{F}_\sigma.$$
18.5.7 Corollary. Let (Xn ) be a submartingale and let τ be a stopping time. Then (Xn∧τ )
is a submartingale.
Proof. This follows immediately from 18.5.6 and the inequality n ∧ τ ≤ (n + 1) ∧ τ .
The process (Xn∧τ ) in the corollary is called the stopped process relative to (Xn ) and τ .
Upcrossings
The martingale convergence theorem, proved in the next subsection, is one of the key
results in martingale theory. The proof is based on Doob’s notion of upcrossings, which
we now describe.
Let (xn ) be any sequence in R. Given real numbers a < b, define a sequence (τn ) with
values in N ∪ {∞} by
τ1 := inf{j ≥ 1 : xj ≤ a}, τ2 := inf{j > τ1 : xj ≥ b},
(18.8)
τ2n−1 := inf{j > τ2n−2 : xj ≤ a}, τ2n := inf{j > τ2n−1 : xj ≥ b}.
Here, as usual, we set inf ∅ = ∞. Clearly, the sequence (τn ) is increasing, τn ≥ n, and
xτ2n−1 ≤ a < b ≤ xτ2n if τ2n−1 < ∞.
From the definition we see that τ1 is the first time the sequence is below a, τ2 the first time
after τ1 that the sequence is above b, etc. It follows that τ2 is the time of the first upcrossing
of the interval [a, b], τ4 the time of the second upcrossing, and in general τ2k is the time of
the kth upcrossing. The number $U^n_{[a,b]}$ of upcrossings of the interval [a, b] up to time n by
the sequence (xn ) is the largest k for which τ2k ≤ n:
$$U^n_{[a,b]} := \sup\{k : \tau_{2k} \le n\}.$$
If the set in the definition is empty, we define $U^n_{[a,b]} = 0$. Obviously,
$$U^n_{[a,b]} \le n, \quad\text{and}\quad k > U^n_{[a,b]} \text{ iff } \tau_{2k} > n.$$
The total number of upcrossings is defined as
$$U_{[a,b]} = \sup_n U^n_{[a,b]} = \sup\{k : \tau_{2k} < \infty\}.$$
[Figure: graph of the sequence (xn ) with the times τ1 , . . . , τ6 marked.]
FIGURE 18.1: $U^{16}_{[a,b]} = 3$.
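The times in (18.8) and the upcrossing count can be computed for a finite sequence by a single pass that alternates between waiting for a value at or below a and waiting for a value at or above b. The following is a minimal sketch; the function names and the sample sequence are assumptions made for illustration.

```python
def upcrossing_times(xs, a, b):
    """Finite times tau_1, tau_2, ... of (18.8) for the finite sequence xs (times start at 1)."""
    taus = []
    waiting_for_low = True              # odd-indexed tau: wait for x_j <= a
    for j, x in enumerate(xs, start=1):
        if waiting_for_low and x <= a:
            taus.append(j)
            waiting_for_low = False     # even-indexed tau: wait for x_j >= b
        elif not waiting_for_low and x >= b:
            taus.append(j)
            waiting_for_low = True
    return taus

def upcrossings(xs, a, b, n=None):
    """Number of k with tau_{2k} <= n; n defaults to the length of xs."""
    n = len(xs) if n is None else n
    taus = upcrossing_times(xs, a, b)
    # taus[1], taus[3], ... hold tau_2, tau_4, ...
    return sum(1 for i in range(1, len(taus), 2) if taus[i] <= n)

# Hypothetical sequence, with a = 1 and b = 4.
xs = [3, 0, 2, 5, 1, -1, 4, 6, 0, 5]
taus = upcrossing_times(xs, 1, 4)
total = upcrossings(xs, 1, 4)
up_to_8 = upcrossings(xs, 1, 4, n=8)
```

For this sequence the times are 2, 4, 5, 7, 9, 10, giving three upcrossings in all and two up to time 8.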
The connection between upcrossings and convergence of the sequence (xn ) is given in the
following lemma.
18.5.8 Lemma. A sequence (xn ) of real numbers converges in the extended real line $\overline{\mathbb{R}} = [-\infty, \infty]$ iff U[a,b] < ∞ for all
a, b ∈ Q with a < b.
Proof. Set $\alpha := \liminf_n x_n$ and $\beta := \limsup_n x_n$. If U[a,b] = ∞ for some a < b, then xn ≤ a for
infinitely many n and xn ≥ b for infinitely many n, hence α ≤ a < b ≤ β and so (xn ) cannot
converge in $\overline{\mathbb{R}}$. Conversely, if (xn ) does not converge in $\overline{\mathbb{R}}$, then there exist rationals a and b
such that α < a < b < β. It follows that xn < a infinitely often and xn > b infinitely often,
hence U[a,b] = ∞.
Now consider a process X = (Xn ). For each ω and pair of real numbers a, b, we may apply
the above construction to the sequence (Xn (ω)) to obtain the N ∪ {∞}-valued functions τn ,
$U^n_{[a,b]}$, and U[a,b] on Ω. It is easily established by induction that τn is a stopping time. For
example,
In particular, $U^n_{[a,b]}$ and U[a,b] are F∞ -measurable. Here is the key result regarding upcrossings.
18.5.9 Upcrossing Inequality (Doob). Let (Xn ) be a submartingale on a filtered probability
space (Ω, F, (Fn ), P ). Then, for any a < b,
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Bigl(E(X_n - a)^+ + |a|\Bigr) \le \frac{1}{b-a}\Bigl(E|X_n| + 2|a|\Bigr), \qquad n \in \mathbb{N}.$$
Denote the first sum on the right by S1 and the second by S2 . Now, if τ2j−1 ≥ n, then
$\widetilde{X}_{\tau_{2j} \wedge n} - \widetilde{X}_{\tau_{2j-1} \wedge n} = 0$, and if τ2j−1 < n then $X_{\tau_{2j-1}} \le a$ so $\widetilde{X}_{\tau_{2j-1}} = 0$. Therefore,
S1 includes all differences corresponding to the upcrossings of [0, c] up to time n, hence
$S_1 \ge c\,\widetilde{U}^n_{[0,c]}$ and so $E\,S_1 \ge c\,E\,\widetilde{U}^n_{[0,c]}$. Moreover, by optional stopping, E S2 ≥ 0. Therefore,
$E(\widetilde{X}_n - \widetilde{X}_0) \ge E\,S_1 + E\,S_2 \ge c\,E\,\widetilde{U}^n_{[0,c]}$. Since $-\widetilde{X}_0 = -(0 - a)^+ \le |a|$, the desired inequalities
follow.
Convergence of Martingales
Throughout this subsection, (Xn ) is an adapted process on a filtered probability space
(Ω, F, (Fn ), P ). There are several important results on the convergence of martingales. One
of the most basic is the following:
18.5.10 Martingale Convergence Theorem (Doob). Let (Xn , Fn ) be a submartingale
such that $\sup_n \|X_n\|_1 < \infty$. Then (Xn ) converges almost surely to an L1 random variable
X∞ . If, additionally, (Xn ) is uniformly integrable, then the convergence is in L1 .
Proof. By 18.5.9, for each n
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Bigl(E|X_n| + 2|a|\Bigr) \le \frac{1}{b-a} \sup_n \|X_n\|_1 + \frac{2|a|}{b-a}.$$
Since $U^n_{[a,b]} \uparrow U_{[a,b]}$, by the monotone convergence theorem
$$E\,U_{[a,b]} \le \frac{1}{b-a} \sup_n \|X_n\|_1 + \frac{2|a|}{b-a} < \infty.$$
Therefore, U[a,b] is finite a.s. By 18.5.8, (Xn ) converges a.s. to a measurable function
X∞ : Ω → $\overline{\mathbb{R}}$. By Fatou’s lemma, $\int |X_\infty|\,dP \le \liminf_n \int |X_n|\,dP < \infty$, hence X∞ ∈ L1 . This
proves the first part of the theorem. The last part follows from 4.4.5.
18.5.11 Corollary. Let (Xn ) be a submartingale such that Xn ≤ 0 for all n. Then (Xn )
converges almost surely to an L1 random variable X∞ .
Proof. By the submartingale property, E X1 ≤ E Xn , hence E |Xn | = −E Xn ≤ −E X1 and
so $\sup_n \|X_n\|_1 < \infty$.
18.5.12 Corollary. Let 1 ≤ p < ∞ and let (Xn ) be a submartingale such that (|Xn |p ) is
uniformly integrable. Then (Xn ) converges almost surely and in Lp to an Lp random variable
X∞ .
Proof. By uniform integrability, $\sup_n \|X_n\|_p < \infty$ (4.4.2). Since $\|X_n\|_1 \le \|X_n\|_p$, (Xn )
converges almost surely to an L1 random variable X∞ . By 4.4.5, the convergence is in Lp
norm.
18.5.13 Corollary. Let (Xn ) be a martingale such that (Xn ) is uniformly integrable. Then
(Xn ) converges almost surely and in L1 to a random variable X∞ with the property that
Xn = E(X∞ | Fn ) a.s. for all n.
Proof. All but the last assertion follows from the preceding corollary. For the desired equality,
let A ∈ Fn and note that for all m ≥ n,
$$\int_A X_n\,dP = \int_A X_m\,dP \xrightarrow{m \to \infty} \int_A X_\infty\,dP = \int_A E(X_\infty \mid \mathcal{F}_n)\,dP,$$
so Xn = E(X∞ | Fn ) a.s.
18.5.14 Corollary. Let X ∈ L1 and denote by F∞ the σ-field generated by $\bigcup_n \mathcal{F}_n$. Then
E(X | Fn ) → E(X | F∞ ) a.s. and in L1 .
Proof. Let Xn = E(X | Fn ). Then (Xn ) is a uniformly integrable martingale (18.5.1(c))
and $\sup_n \|X_n\|_1 \le \|X\|_1 < \infty$, hence (Xn ) converges a.s. and in L1 to some F∞ -random
variable X∞ . If A ∈ Fm and n > m, then
$$\int_A E(X \mid \mathcal{F}_\infty)\,dP = \int_A X\,dP = \int_A E(X \mid \mathcal{F}_n)\,dP \to \int_A X_\infty\,dP.$$
Therefore $\int_A E(X \mid \mathcal{F}_\infty)\,dP = \int_A X_\infty\,dP$ for all A ∈ F∞ and so X∞ = E(X | F∞ ) a.s.
Recall that the L1 strong law of large numbers asserts that for independent and identically
distributed (iid) L1 random variables Xn , the sample averages Sn /n tend to E(X1 ) a.s. or,
equivalently, $\frac{1}{n} \sum_{k=1}^{n} \bigl(X_k - E(X_k)\bigr) \to 0$ a.s. The following generalization removes the iid
requirement.
18.5.15 Corollary. Let $\sup_n \|X_n\|_1 < \infty$. Set F0 = {∅, Ω}. Then
$$\frac{1}{n} \sum_{j=1}^{n} \bigl[X_j - E(X_j \mid \mathcal{F}_{j-1})\bigr] \to 0 \quad\text{a.s.}$$
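Proof. Set $Y_n = \sum_{j=1}^{n} j^{-1} \bigl[X_j - E(X_j \mid \mathcal{F}_{j-1})\bigr]$. Then
$$Y_{n+1} - Y_n = \frac{1}{n+1} \bigl[X_{n+1} - E(X_{n+1} \mid \mathcal{F}_n)\bigr],$$
hence $E(Y_{n+1} - Y_n \mid \mathcal{F}_n) = 0$, that is, (Yn ) is a martingale. Since $\sup_n \|Y_n\|_1 < \infty$, (Yn )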
converges a.s. to a random variable Y∞ . The conclusion now follows from Kronecker’s
lemma.
Reversed Martingales
A reversed filtration on a probability space (Ω, F, P ) is a sequence of sub-σ fields Fn
of F such that
· · · ⊆ Fn+1 ⊆ Fn · · · ⊆ F1 .
For example, if (Xn ) is a sequence of random variables, then Fn := σ(Xk : k ≥ n) defines a
reversed filtration.
Now let (Xn ) be an L1 process such that Xn is Fn -measurable for each n. Then (Xn , Fn )
is a reversed martingale if
Iterating, we obtain
E(Xn | Fn+p ) = Xn+p . (18.9)
One may also formulate in an analogous way the notions of reversed submartingales and
reversed supermartingales. We consider only the martingale case.
Here is the reversed martingale analog of Doob’s convergence theorem. Note that the
hypothesis $\sup_n \|X_n\|_1 < \infty$ is not needed in this setting.
18.5.16 Theorem. Let (Xn , Fn ) be a reversed martingale. Then there exists a random
variable X∞ such that
$$\lim_n X_n = X_\infty \quad\text{a.s. and in } L^1.$$
Moreover, X∞ = E(X1 | F∞ ) a.s., where $\mathcal{F}_\infty = \bigcap_n \mathcal{F}_n$.
Proof. We apply Doob’s upcrossing inequality to the number $U^n_{[a,b]}$ of upcrossings of [a, b]
of the sequence Xn , Xn−1 , . . . , X1 . This gives
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Bigl(E|X_1| + 2|a|\Bigr).$$
Since $U^n_{[a,b]} \uparrow U_{[a,b]}$, which is the number of upcrossings of the infinite sequence
· · · Xn , Xn−1 , · · · X1 , we see that U[a,b] < ∞ a.s. and so Xn → X∞ a.s., as before. Since
Xn = E(X1 | Fn ), (Xn ) is uniformly integrable (18.5.1(c)), hence the convergence is also
in L1 .
Since Xm is Fn -measurable for all m ≥ n, X∞ is Fn -measurable for all n, that is, X∞ is
F∞ -measurable. Also, from E(X1 | Fn ) = Xn we have
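$$E(X_1 \mid \mathcal{F}_\infty) = E\bigl(E(X_1 \mid \mathcal{F}_n) \mid \mathcal{F}_\infty\bigr) = E(X_n \mid \mathcal{F}_\infty),$$
hence
$$\int_A E(X_1 \mid \mathcal{F}_\infty)\,dP = \int_A X_n\,dP \to \int_A X_\infty\,dP, \qquad \forall\, A \in \mathcal{F}_\infty.$$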
For an application, consider an iid process (Xn ) with |E(X1 )| < ∞ and set Sn =
X1 + · · · + Xn and Yn = n−1 Sn − E(X1 ). Then (Yn ) is a reversed martingale with respect
to the reversed filtration Fn := σ(Xk : k ≥ n). Indeed, from
and so Yn+1 = E(Yn | Fn+1 ). Since E(Y1 ) = 0, the theorem implies that Yn → 0 a.s. and
in L1 , which is the law of large numbers with the added feature of L1 convergence (which
could also have been established originally.)
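The convergence $Y_n \to 0$ can be observed numerically. The following is a minimal sketch, assuming exponential(1) variables (so that $E(X_1) = 1$); the function name `centered_sample_means` and the choice of distribution are illustrative assumptions, not part of the text.

```python
import random

def centered_sample_means(draw, mean, n_max, seed=0):
    """Return [Y_1, ..., Y_n_max], where Y_n = S_n / n - mean."""
    rng = random.Random(seed)
    total = 0.0
    ys = []
    for n in range(1, n_max + 1):
        total += draw(rng)            # S_n = X_1 + ... + X_n
        ys.append(total / n - mean)   # Y_n = S_n / n - E(X_1)
    return ys

# Assumed example: iid exponential(1) variables, for which E(X_1) = 1.
ys = centered_sample_means(lambda rng: rng.expovariate(1.0), 1.0, 100_000)
print(abs(ys[99]), abs(ys[-1]))
```

A single simulated path only illustrates the almost sure statement; it does not, of course, verify the $L^1$ convergence.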
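The probability measures $P_{(i_1,\ldots,i_n)}$ are called the finite dimensional distributions of the
process $X$. Note that these distributions satisfy the following consistency conditions: For
all $n$ and $B_j \in \mathcal{B}$,
C1. $P_{i_{\tau 1}\cdots i_{\tau n}}\big(B_{i_{\tau 1}} \times \cdots \times B_{i_{\tau n}}\big) = P_{i_1\cdots i_n}\big(B_{i_1} \times \cdots \times B_{i_n}\big)$ for every permutation $\tau$ of $(1, \ldots, n)$.
C2. $P_{i_1\cdots i_{n+1}}\big(B_{i_1} \times \cdots \times B_{i_n} \times S\big) = P_{i_1\cdots i_n}\big(B_{i_1} \times \cdots \times B_{i_n}\big)$.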
The problem we consider in this section is the converse: Given an index set I and a family
D(I) of finite dimensional distributions satisfying the above consistency conditions, find a
probability space (Ω, F, P ) and a process such that (18.10) holds for all members of D(I).
To construct such a process, we must first define the product measurable space S I for a
general measurable space (S, F).
Let $I$ be an arbitrary index set, $(S, \mathcal{F})$ an arbitrary measurable space, and $S^I$ the
collection of all functions $f : I \to S$. For $n \in \mathbb{N}$, let $(S^n, \mathcal{F}^n) = (S \times \cdots \times S, \mathcal{F} \otimes \cdots \otimes \mathcal{F})$
denote the $n$-fold product σ-field. In what follows, we consider finite sequences $(i_1, \ldots, i_n)$
of distinct members in $I$. These will be called index sequences. For such sequences, we
write $(i_1, \ldots, i_n) \subseteq (j_1, \ldots, j_p)$ if $\{i_1, \ldots, i_n\} \subseteq \{j_1, \ldots, j_p\}$. Define the projection map
corresponding to the index sequence $(i_1, \ldots, i_n)$ by
$$\pi_{i_1\cdots i_n} : S^I \to S^n, \quad \pi_{i_1\cdots i_n}(f) = \big(f(i_1), \ldots, f(i_n)\big).$$
Proof. Let $\tau$ be a permutation of $\{1, \ldots, p\}$ such that the first $n$ coordinates of $(j_{\tau 1}, \ldots, j_{\tau p})$
are $i_1, \ldots, i_n$. Define
$$A' = \{(x_1, \ldots, x_p) : (x_{\tau 1}, \ldots, x_{\tau n}) \in A\}.$$
Then $A'$ is the preimage of $A$ under a measurable mapping $S^p \to S^n$, hence $A' \in \mathcal{F}^p$.
Moreover, $\big(f(i_1), \ldots, f(i_n)\big) = \big(f(j_{\tau 1}), \ldots, f(j_{\tau n})\big) \in A$ iff $\big(f(j_1), \ldots, f(j_p)\big) \in A'$, so the
desired equation holds.
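18.6.2 Corollary. Given cylinder sets
$$\pi^{-1}_{i_1\cdots i_n}(A) \ \text{ and } \ \pi^{-1}_{j_1\cdots j_m}(B), \quad A \in \mathcal{F}^n,\ B \in \mathcal{F}^m, \tag{18.11}$$
$$\pi^{-1}_{i_1\cdots i_n}(A) = \pi^{-1}_{k_1\cdots k_p}(A') \ \text{ and } \ \pi^{-1}_{j_1\cdots j_m}(B) = \pi^{-1}_{k_1\cdots k_p}(B'). \tag{18.12}$$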
We denote the collection of all cylinder sets by C(I). Except in trivial cases, C(I) is not a
σ-field. However,
18.6.3 Proposition. C(I) is a field.
$$\pi^{-1}_{i_1\cdots i_n}(A) \cup \pi^{-1}_{j_1\cdots j_m}(B) = \pi^{-1}_{k_1\cdots k_p}(A' \cup B'),$$
which shows that $\mathcal{C}(I)$ is closed under finite unions. Since the complement of $\pi^{-1}_{i_1\cdots i_n}(A)$ is
$\pi^{-1}_{i_1\cdots i_n}(A^c)$, $\mathcal{C}(I)$ is also closed under complementation and hence is a field.
The σ-field generated by $\mathcal{C}(I)$ is called the product σ-field and is denoted by $\mathcal{F}^I$. The
equality
$$\pi^{-1}_{i_1\cdots i_n}(B_1 \times \cdots \times B_n) = \bigcap_{j=1}^{n} \pi^{-1}_{i_j}(B_j)$$
shows that $\mathcal{F}^I$ is also the σ-field generated by all the projection mappings $\pi_i$. As a
consequence, we have
Proof. Let $\mathcal{G}$ denote the collection of all subsets $A$ of $S^I$ with the stated property. We show
that $\mathcal{F}^I \subseteq \mathcal{G}$. To this end, note first that $\mathcal{G}$ contains all sets of the form $A := \pi_i^{-1}(B)$, $B \in \mathcal{F}$;
indeed, one need only take $J_A = \{i\}$. Since these are generators for $\mathcal{F}^I$, the desired inclusion
will follow if we show that $\mathcal{G}$ is a σ-field.
Suppose $A \in \mathcal{G}$ and let $f \in A^c$, $g \in S^I$. If $f(j) = g(j)$ for all $j \in J_A$, then $g \in A^c$;
otherwise, $g$ would lie in $A$, forcing $f$ to lie in $A$. Therefore, we may take $J_{A^c} = J_A$, showing
that $\mathcal{G}$ is closed under complementation. Now let $(A_n)$ be a sequence in $\mathcal{G}$ and set $A := \bigcup_n A_n$.
Let $f \in A$, $g \in S^I$, such that $f(j) = g(j)$ for all $j \in \bigcup_n J_{A_n}$. Since $f \in A_m$ for some $m$,
$g \in A_m \subseteq A$. Therefore we may take $J_A = \bigcup_n J_{A_n}$. Since this is countable, $A \in \mathcal{G}$.
Proof. Suppose, for a contradiction, that $C[0, \infty) \in \mathcal{B}^{[0,\infty)}$. By the proposition there exists
a countable subset $D$ of $[0, \infty)$ with the property
$$f \in C[0, \infty),\ g \in \mathbb{R}^I, \ \text{and}\ f(t) = g(t)\ \ \forall\, t \in D \ \Rightarrow\ g \in C[0, \infty).$$
Proof. Define $P$ on $\mathcal{C}(I)$ by (18.13). To see that $P$ is well-defined, suppose that
$\pi^{-1}_{i_1\cdots i_n}(A) = \pi^{-1}_{j_1\cdots j_m}(B)$. Represent these cylinder sets as in (18.12). Then $A' = B'$, and it follows from
the consistency conditions that
$$P_{i_1\cdots i_n}(A) = P_{k_1\cdots k_p}(A') = P_{k_1\cdots k_p}(B') = P_{j_1\cdots j_m}(B).$$
We show next that $P$ is a probability measure on $\mathcal{C}(I)$. The conclusion of the theorem
will then follow from the measure extension theorem (1.6.4). Clearly $P(S^I) = 1$. To see that
$P$ is finitely additive on $\mathcal{C}(I)$, represent disjoint cylinder sets as in (18.12). Then $A'$ and $B'$
must be disjoint and
$$\pi^{-1}_{i_1\cdots i_n}(A) \cup \pi^{-1}_{j_1\cdots j_m}(B) = \pi^{-1}_{k_1\cdots k_p}(A' \cup B'),$$
hence
$$\begin{aligned}
P\big(\pi^{-1}_{i_1\cdots i_n}(A) \cup \pi^{-1}_{j_1\cdots j_m}(B)\big) &= P_{k_1\cdots k_p}(A' \cup B') = P_{k_1\cdots k_p}(A') + P_{k_1\cdots k_p}(B') \\
&= P\big(\pi^{-1}_{i_1\cdots i_n}(A)\big) + P\big(\pi^{-1}_{j_1\cdots j_m}(B)\big).
\end{aligned}$$
Since $\mathcal{C}(I)$ is a field, and $P$ is finitely additive, $P$ is monotone. It remains to show that if
$(A_n)$ is a sequence in $\mathcal{C}(I)$ and $A_n \downarrow \emptyset$, then $P(A_n) \to 0$. Let $r := \lim_n P(A_n)$. We show that
the assumption $r > 0$ implies the contradiction $\bigcap_n A_n \ne \emptyset$. Now, by 18.6.1, it is possible,
without affecting monotonicity or changing the intersection $\bigcap_n A_n$, to precede the sequence
$(A_n)$ by terms $S^I$ and to insert duplicate terms $A_j$. Thus we may assume there exists an
infinite sequence of distinct indices $i_n$ such that
$$A_n = \pi^{-1}_{I_n}(B_n), \quad \text{where } I_n = (i_1, \ldots, i_n) \text{ and } B_n \in \mathcal{B}^n.$$
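By regularity, choose a compact set $C_n \subseteq B_n$ with $P_{I_n}(B_n \setminus C_n) < r/2^{n+1}$ and set
$$D_n = \pi^{-1}_{I_n}(C_n) \quad \text{and} \quad E_n = \bigcap_{j=1}^{n} D_j.$$
Then by monotonicity
$$P(A_n \setminus E_n) = P\Big(\bigcup_{j=1}^{n} (A_n \setminus D_j)\Big) \le \sum_{j=1}^{n} P(A_j \setminus D_j) = \sum_{j=1}^{n} P_{I_j}(B_j \setminus C_j) \le r/2,$$
and since $E_n \subseteq A_n$ we see that $P(E_n) \ge P(A_n) - r/2 \ge r/2 > 0$. Therefore, $E_n \ne \emptyset$.
Choosing $f_n \in E_n$ we have
$$\big(f_n(i_1), \ldots, f_n(i_j)\big) \in C_j, \quad 1 \le j \le n.$$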
In particular, $f_n(i_1) \in C_1$ for all $n \ge 1$, and since $C_1$ is compact there exists a subsequence
$(f_n^{(1)})$ of $(f_n)$ such that $f_n^{(1)}(i_1) \to x_1 \in C_1$. Likewise, since $\big(f_n^{(1)}(i_1), f_n^{(1)}(i_2)\big) \in C_2$ for all
$n \ge 2$, there exists a subsequence $(f_n^{(2)})$ of $(f_n^{(1)})$ such that $\big(f_n^{(2)}(i_1), f_n^{(2)}(i_2)\big) \to (x_1, x_2) \in C_2$.
By induction we may construct successive subsequences $(f_n^{(k)})$ such that for all $k$,
$$\big(f_n^{(k)}(i_1), f_n^{(k)}(i_2), \ldots, f_n^{(k)}(i_k)\big) \to (x_1, x_2, \ldots, x_k) \in C_k \quad (n \to \infty). \tag{†}$$
For each $k$, the diagonal sequence $\big(f_n^{(n)}(i_k)\big)_n$ then converges to $x_k$. Now choose any $f$
such that $f(i_j) = x_j$ for all $j$. Then by (†), $f \in D_k \subseteq A_k$ for all $k$, which is the desired
contradiction. This proves that $P$ is a probability measure on $\mathcal{C}(I)$.
Taking Xi to be the projection map πi , we now have the following resolution to the
problem stated at the beginning of the section.
18.6.8 Corollary. Given $\mathcal{D}(I)$ as above, there exists a probability space $(\Omega, \mathcal{F}, P)$ and
a family of $\mathbb{R}^d$-valued random variables such that (18.10) holds for every finite sequence
$(i_1, \ldots, i_n)$.
The following version of the theorem is useful in the important special case I = (0, ∞)
and d = 1.
18.6.9 Corollary. Suppose that for each finite ordered sequence $t_1 < t_2 < \cdots < t_n$ in $(0, \infty)$
there exists a probability distribution $P_{t_1\cdots t_n}$ with cdf $F_{t_1\cdots t_n}$ such that for all $n$ and $k$ with
$1 \le k \le n$,
$$\lim_{x_k \to \infty} F_{t_1\cdots t_n}(x_1, \ldots, x_n) = F_{t_1\cdots t_{k-1}t_{k+1}\cdots t_n}(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n). \tag{18.14}$$
Then there exists a unique probability measure $P$ on the product space $(\mathbb{R}^{(0,\infty)}, \mathcal{B}(\mathbb{R})^{(0,\infty)})$
such that for every sequence $t_1 < t_2 < \cdots < t_n$,
$$P\big(\pi^{-1}_{t_1\cdots t_n}(A)\big) = P_{t_1\cdots t_n}(A) \quad \forall\, A \in \mathcal{B}(\mathbb{R}^n).$$
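Proof. The hypothesis implies that for all $B_j \in \mathcal{B}(\mathbb{R})$ and $t_1 < t_2 < \cdots < t_n$,
$$\begin{aligned}
&P_{t_1\cdots t_{k-1}t_{k+1}\cdots t_n}(B_1 \times \cdots \times B_{k-1} \times B_{k+1} \times \cdots \times B_n) \\
&\qquad = P_{t_1\cdots t_{k-1}t_kt_{k+1}\cdots t_n}(B_1 \times \cdots \times B_{k-1} \times \mathbb{R} \times B_{k+1} \times \cdots \times B_n). \qquad (†)
\end{aligned}$$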
The idea is to enlarge the collection of probability distributions to include all index sequences
(s1 , . . . , sn ) and then apply the extension theorem. This is accomplished as follows: Given
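an arbitrary sequence $(s_1, \ldots, s_n)$ of distinct $s_j$, define
$$P_{s_1\cdots s_n}(B_1 \times \cdots \times B_n) := P_{s_{\tau 1}\cdots s_{\tau n}}(B_{\tau 1} \times \cdots \times B_{\tau n}),$$
where $\tau$ is the unique permutation of $(1, \ldots, n)$ that orders $s_1, \ldots, s_n$, that is, that produces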
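the natural ordering $s_{\tau 1} < \cdots < s_{\tau n}$. If $\sigma$ is any permutation of $(1, \ldots, n)$ and $\tau$ is the
permutation that orders $s_{\sigma 1}, \ldots, s_{\sigma n}$, then $\tau\sigma$ is the permutation that orders $s_1, \ldots, s_n$,
hence
$$P_{s_{\sigma 1}\cdots s_{\sigma n}}(B_{\sigma 1} \times \cdots \times B_{\sigma n}) = P_{s_{\tau\sigma 1}\cdots s_{\tau\sigma n}}(B_{\tau\sigma 1} \times \cdots \times B_{\tau\sigma n}) = P_{s_1\cdots s_n}(B_1 \times \cdots \times B_n).$$
This shows that consistency condition C1 holds. To verify C2, we must show that
$$P_{s_1\cdots s_n}(B_1 \times \cdots \times B_{n-1} \times \mathbb{R}) = P_{s_1\cdots s_{n-1}}(B_1 \times \cdots \times B_{n-1}).$$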
But a permutation that orders s1 , . . . , sn then places R in some position k, and an application
of (†) yields the desired equation.
Black-Scholes model for option pricing (discussed in §18.9). In the current section we consider
a one-dimensional version of (mathematical) Brownian motion, which may be viewed as a
model for the motion of a Brownian particle projected onto a vertical axis.
For a mathematical description of Brownian motion, we need to extend some earlier
terminology. A (continuous-time) filtration on a probability space (Ω, F, P ) is a family of
σ-fields Ft indexed by t ∈ [0, ∞) such that Fs ⊆ Ft ⊆ F for all 0 ≤ s ≤ t. A probability space
with a filtration is called a filtered probability space and is denoted by Ω, F, (Ft )t≥0 , P .
As in the case of discrete time, a filtration (Ft ) may be viewed as a mathematical model for
ever more precise information produced by an experiment evolving in time. An important
example is the natural filtration $\mathcal{F}^X = (\mathcal{F}^X_t)$ of a process $X = (X_t)$, where $\mathcal{F}^X_t$ is the
σ-field $\sigma(X_s : 0 \le s \le t)$, which consists precisely of the information revealed by the process
up to time t. A stochastic process (Xt ) is said to be adapted to a filtration (Ft ) if for all
t the random variable Xt is Ft -measurable. For example, a process is always adapted to its
natural filtration, but there may be reason to consider larger filtrations.
A (one-dimensional) Brownian motion or Wiener process on a filtered probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ is a stochastic process $W = (W(t))_{t \ge 0}$ adapted to $(\mathcal{F}_t)_{t \ge 0}$ such that
the following conditions hold:
(a) $W_0 = 0$;
(b) For $0 \le s < t$, the increment $W(t) - W(s)$ is normal with mean zero and variance $t - s$,
that is,
$$P\big(W(t) - W(s) \in B\big) = \frac{1}{\sqrt{2\pi(t - s)}} \int_B \exp\Big[-\frac{x^2}{2(t - s)}\Big]\,dx, \quad B \in \mathcal{B}(\mathbb{R}).$$
Note that W (t) has independent increments, that is, if 0 < t1 < t2 < · · · < tn then
the random variables W (t1 ), W (t2 ) − W (t1 ), . . . , W (tn ) − W (tn−1 ) are independent. This
follows by induction from (c) and the fact that (Wt ) is adapted to (Ft ).
where ∆xk = xk − xk−1 and ∆tk = tk − tk−1 (x0 = t0 = 0). Then ft1 ···tn is a density for an
n-dimensional cdf Ft1 ···tn that satisfies the consistency condition (18.14). By 18.6.9 there
exists a probability space $(\Omega, \mathcal{F}, P)$ and a process $(X_t)$ such that
$$P\big((X_{t_1}, \ldots, X_{t_n}) \in B\big) = \int_B f_{t_1\cdots t_n}(x)\,dx, \quad B \in \mathcal{B}(\mathbb{R}^n).$$
This shows that (Xt ) satisfies (b) and (c) of the definition of Brownian motion. Setting
X0 = 0 completes the construction.
It remains to show that there exists a continuous process satisfying (a)–(c). The idea is
to modify the process (Xt ) obtained in the preceding paragraph on a set of probability zero
to produce the desired continuous process. This is accomplished by the following general
theorem. The proof depends on the density of the dyadic rationals j/2n (j ≥ 0, n ≥ 1) in
[0, ∞).
18.7.1 Theorem. Let $g(t)$ and $h(t)$ be nonnegative even functions on some interval $(-a, a)$
that are increasing on $(0, a)$ such that $g$ is continuous at zero and the series
$$\sum_{n=1}^{\infty} g(2^{-n}) \quad \text{and} \quad \sum_{n=1}^{\infty} n2^n h(2^{-n})$$
converge. Let $(X_t)$ be a stochastic process on a probability space $(\Omega, \mathcal{F}, P)$ that satisfies
$$P\big(|X_t - X_s| \ge g(t - s)\big) \le h(t - s) \quad \text{whenever } t, s \ge 0 \text{ and } |t - s| < a.$$
Then there exists a process (Yt ) on (Ω, F, P ) with continuous paths such that for any t,
Yt = Xt a.s. (the exceptional set depending on t). In particular, Y and X have the same
finite-dimensional distributions.
Proof. Set $t_{n,j} = j2^{-n}$ ($n = 1, 2, \ldots$, $j = 0, 1, \ldots$). For each $n$ and $\omega$, imbed in the path
$t \to X_t(\omega)$ a polygonal line $X_n(\cdot, \omega)$ with vertices $(t_{n,j}, X(t_{n,j}, \omega))$:
$$X_n(t, \omega) := X(t_{n,j}, \omega) + 2^n(t - t_{n,j})\big[X(t_{n,j+1}, \omega) - X(t_{n,j}, \omega)\big], \quad t_{n,j} \le t \le t_{n,j+1}.$$
This defines a sequence of processes $X_n(\cdot)$. Note that since $t_{n+1,2j} = t_{n,j}$,
$$X_n(t, \omega) = X(t_{n+1,2j}, \omega) + 2^n(t - t_{n+1,2j})\big[X(t_{n+1,2j+2}, \omega) - X(t_{n+1,2j}, \omega)\big].$$
The figure below illustrates the idea. Here, A and B are consecutive points of the polygon
for the process Xn (·), and C is the interpolation point required to pass to the polygon for
the process Xn+1 (·).
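The polygonal approximation can be sketched in a few lines. This is only an illustration under stated assumptions: `polygonal` is a hypothetical helper name, and the deterministic function `path` stands in for a single sample path $t \to X_t(\omega)$.

```python
import math

def polygonal(X, n, t):
    """X_n(t): linear interpolation of t -> X(t) through the vertices t_{n,j} = j / 2**n."""
    scale = 2 ** n
    j = math.floor(t * scale)      # index with t_{n,j} <= t < t_{n,j+1}
    left = j / scale               # the vertex t_{n,j}
    return X(left) + scale * (t - left) * (X((j + 1) / scale) - X(left))

# Hypothetical stand-in for one sample path of the process.
path = lambda t: t * t

# At a level-(n+1) vertex that is not a level-n vertex, X_{n+1} equals the
# path value (the point C), while X_n lies on the chord from A to B.
print(polygonal(path, 1, 0.25), polygonal(path, 2, 0.25))
```

The difference between the two printed values is the kind of quantity controlled by the suprema $M_{n,j}$ in the proof.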
FIGURE 18.2: (figure not reproduced) $C = (t_{n+1,2j+1}, X(t_{n+1,2j+1}, \omega))$
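Since the processes $X_n$ have continuous paths, $M_{n,j}(\omega)$ may be calculated as the supremum
over the rational interval $[t_{n,j}, t_{n,j+1}] \cap \mathbb{Q}$, hence $M_{n,j}$ is measurable. Moreover, from (β),
$$\begin{aligned}
P\big(M_{n,j} \ge g(1/2^{n+1})\big) &\le P\big(|X(t_{n+1,2j+1}, \omega) - X(t_{n+1,2j}, \omega)| \ge g(1/2^{n+1})\big) \\
&\quad + P\big(|X(t_{n+1,2j+1}, \omega) - X(t_{n+1,2j+2}, \omega)| \ge g(1/2^{n+1})\big) \\
&\le 2h(1/2^{n+1}), \qquad (\gamma)
\end{aligned}$$
the last inequality holding for all $n$ with $1/2^{n+1} < a$. Now set
$$M_n(\omega) := \sup\{|X_{n+1}(t, \omega) - X_n(t, \omega)| : 0 \le t \le n\}.$$
Since $[0, n] = \bigcup_{j=0}^{n2^n - 1} [t_{n,j}, t_{n,j+1}]$,
$$\{M_n \ge g(1/2^{n+1})\} \subseteq \bigcup_{j=0}^{n2^n - 1} \{M_{n,j} \ge g(1/2^{n+1})\}.$$
It follows from the hypothesis that the series $\sum_{n=1}^{\infty} P\big(M_n \ge g(1/2^{n+1})\big)$ converges. By the
Borel-Cantelli lemma we then have
$$P(A) = 0, \quad \text{where } A := \limsup_n \{M_n \ge g(1/2^{n+1})\}.$$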
Now, let $\omega \in A^c$ and $b > 0$. For any $p \in \mathbb{N}$, $t \in [0, b]$, and all sufficiently large $n$ we have
$$|X_{n+p}(t, \omega) - X_n(t, \omega)| \le \sum_{k=1}^{p} |X_{n+k}(t, \omega) - X_{n+k-1}(t, \omega)| \le \sum_{k=1}^{p} M_{n+k-1}(\omega) \le \sum_{k=1}^{p} g(1/2^{n+k}).$$
Since the series $\sum_{k=1}^{\infty} g(1/2^k)$ converges, the preceding inequality implies that the sequence
$(X_n(t, \omega))$ is uniformly Cauchy on $[0, b]$ and therefore converges uniformly on $[0, b]$. Now
define
$$Y(t, \omega) = \begin{cases} \lim_n X_n(t, \omega) & \omega \notin A, \\ 0 & \omega \in A. \end{cases}$$
Then $(Y_t)$ has continuous paths and $Y(j/2^n, \omega) = X(j/2^n, \omega)$ for all $\omega \in A^c$, $n \in \mathbb{N}$, and
$j \in \mathbb{Z}^+$.
It remains to show that $Y_t = X_t$ a.s. This is clear if $t$ is a dyadic rational. For arbitrary
$t$, choose a sequence of dyadic rationals $s_n$ so that $0 \le t - s_n < 2^{-n}$. Since $g$ and $h$ are
increasing,
$$\begin{aligned}
P\big(|X(s_n, \omega) - X(t, \omega)| \ge g(1/2^n)\big) &\le P\big(|X(s_n, \omega) - X(t, \omega)| \ge g(t - s_n)\big) \\
&\le h(t - s_n) \le h(1/2^n).
\end{aligned}$$
Since the series $\sum_n h(1/2^n)$ converges, by the Borel-Cantelli lemma again we have
$$P(B) = 0, \quad \text{where } B := \limsup_n \big\{|X(s_n, \omega) - X(t, \omega)| \ge g(1/2^n)\big\}.$$
If $\omega \notin B$, then, eventually, $|X(s_n, \omega) - X(t, \omega)| \le g(1/2^n) \to 0$, hence $X(s_n, \omega) \to X(t, \omega)$.
Therefore, $Y(t, \omega) = X(t, \omega)$ for $\omega \in (A \cup B)^c$.
18.7.2 Theorem. Brownian motion exists.
Proof. Take $X$ to be any process satisfying (a)–(c) of the definition of Brownian motion.
Define
$$g(t) = |t|^{1/4} \quad \text{and} \quad h(t) = \sqrt{\frac{2}{\pi}}\; t^{1/4} \exp\Big(\frac{-1}{2\sqrt{|t|}}\Big).$$
We show that these functions satisfy the hypotheses of 18.7.1. We may then apply that
theorem to obtain a continuous version of X, which is the desired Brownian motion.
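The functions $g$ and $h$ are clearly even and increasing in $t > 0$. Moreover,
$$\sum_{n=1}^{\infty} g(1/2^n) = \sum_{n=1}^{\infty} \frac{1}{2^{n/4}} < \infty \quad \text{and} \quad \sum_{n=1}^{\infty} n2^n h(1/2^n) = \sqrt{\frac{2}{\pi}} \sum_{n=1}^{\infty} n2^{3n/4} \exp\Big(\frac{-2^{n/2}}{2}\Big) < \infty.$$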
Making the substitution $y = x|t - s|^{-1/2}$, setting $z := |t - s|^{-1/4}$, and integrating by parts,
the left side of the inequality becomes
$$\begin{aligned}
\int_z^{\infty} \exp(-y^2/2)\,dy &= -\int_z^{\infty} \frac{1}{y}\,\frac{d}{dy} \exp(-y^2/2)\,dy = \frac{1}{z} \exp(-z^2/2) - \int_z^{\infty} \frac{1}{y^2} \exp(-y^2/2)\,dy \\
&\le \frac{1}{z} \exp(-z^2/2),
\end{aligned}$$
which is (†).
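As a numerical sanity check of the two hypotheses of 18.7.1 for this choice of $g$ and $h$, one may compare the exact Gaussian tail with $h$ and compute partial sums of the two series. The sketch below assumes $s = 0$ (so the increment is $W(t)$, of variance $t$) and uses the standard identity $P(|Z| \ge z) = \operatorname{erfc}(z/\sqrt{2})$ for a standard normal $Z$; the helper names are illustrative.

```python
import math

def g(t):
    return abs(t) ** 0.25

def h(t):
    return math.sqrt(2 / math.pi) * abs(t) ** 0.25 * math.exp(-1 / (2 * math.sqrt(abs(t))))

def increment_tail(t):
    """P(|W(t) - W(0)| >= g(t)) for a centered normal increment of variance t."""
    z = g(t) / math.sqrt(t)                # equals t ** (-1/4)
    return math.erfc(z / math.sqrt(2))     # P(|Z| >= z) for standard normal Z

for n in range(1, 8):
    t = 2.0 ** -n
    print(n, increment_tail(t), h(t))      # the tail never exceeds h(t)

# Partial sums of the two series appearing in the hypothesis of 18.7.1.
partial_g = sum(g(2.0 ** -n) for n in range(1, 200))
partial_h = sum(n * 2.0 ** n * h(2.0 ** -n) for n in range(1, 60))
print(partial_g, partial_h)
```

The first series is geometric with ratio $2^{-1/4}$; the terms of the second decay faster than any geometric sequence once $n$ is moderately large, which is why the partial sums stabilize quickly.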
For $a > 0$, let $E_a$ denote the set of all $\omega$ such that (†) holds for some $s$ and for all $n \ge n(\omega, s, a)$.
Thus if $\omega \notin E := \bigcup_{a \in \mathbb{Q}^+} E_a$, then $W(\cdot, \omega)$ is nowhere differentiable. We show that there
exists a set $A_a$ of probability zero such that $E_a \subseteq A_a$. Setting $A := \bigcup_{a \in \mathbb{Q}^+} A_a$ we then have
$E \subseteq A$, $P(A) = 0$, and $W(\cdot, \omega)$ is nowhere differentiable for every $\omega \in A^c \subseteq E^c$, completing
the proof.
Set $t_{j,n} := j/2^n$ and
$$M_{n,k} := \max_{1 \le j \le 3} |W(t_{k+j,n}) - W(t_{k+j-1,n})|.$$
The increments in the definition are independent and have the distribution of $W(1/2^n)$,
which is the same as that of $2^{-n/2}W(1)$, these being normally distributed with mean zero
and variance $1/2^n$. Thus if $g$ denotes the standard normal density, then
$$P(M_{n,k} \le \varepsilon) = \Big[P\big(|W(1)| \le 2^{n/2}\varepsilon\big)\Big]^3 \le \Big(\int_{-2^{n/2}\varepsilon}^{2^{n/2}\varepsilon} g(t)\,dt\Big)^3 \le (2^{1+n/2}\varepsilon)^3.$$
Now let $\omega \in E_a$, so that (†) holds for some $s$ and for all $n > n(\omega, s, a)$. We assume that
$s > 0$. (A separate, one-sided argument may be given for the case $s = 0$.) Then, for each
sufficiently large $n$, there exists $k \ge 0$ such that $t_{k+1,n} \le s < t_{k+2,n}$. It follows that, for
$0 \le j \le 3$, $|t_{k+j,n} - s| < 1/2^{n-1}$, hence, by (†), $|W(t_{k+j,n}, \omega) - W(s, \omega)| \le a/2^{n-1}$. By the
triangle inequality, $M_{n,k}(\omega) \le a/2^{n-2}$. Taking $n > s$ we have $k < k + 1 \le 2^n s \le n2^n$, hence
$M_n(\omega) \le a/2^{n-2}$. We have shown that
$$E_a \subseteq A_a := \liminf_n A_n, \quad \text{where } A_n := \{M_n \le a/2^{n-2}\}.$$
By (‡), $P(A_n) \le n2^n(2^{1+n/2}a/2^{n-2})^3 \to 0$. It follows that $P(A_a) = 0$, completing the
proof.
A path $t \mapsto W_t(\omega)$ is said to have bounded (unbounded) $p$th variation on $[a, b]$ if the
quantities $V^{(p)}_{\mathcal{P}}(\omega)$, taken over all partitions $\mathcal{P}$, form a bounded (unbounded) set of real
numbers. By 18.7.3 and 5.5.8 we have
18.7.4 Proposition. With probability one, the paths of Brownian motion have unbounded
first variation on every interval [a, b].
It may be shown that the paths of Brownian motion have unbounded $p$th variation for all
$p \le 2$ [47]. This state of affairs is partially redeemed by the following important result.
18.7.5 Theorem. $\lim_{\|\mathcal{P}\| \to 0} V^{(2)}_{\mathcal{P}} = b - a$ in $L^2(P)$.
Proof. Given a partition $\mathcal{P}$ as above, define
$$A_{\mathcal{P}} = V^{(2)}_{\mathcal{P}}(W) - (b - a) = \sum_{j=1}^{n} D_j, \quad D_j := (\Delta W_{t_j})^2 - \Delta t_j.$$
Since $Z_j$ is normal with mean zero and variance one, the quantity $c := E(Z_j^2 - 1)^2$ is finite,
as may be verified by expressing $c$ as an integral, using the standard normal density. We
now have
$$E(A_{\mathcal{P}}^2) \le c\|\mathcal{P}\| \sum_{j=1}^{n} \Delta t_j = c\|\mathcal{P}\|(b - a).$$
and the uniform continuity of the paths of $W$ on $[a, b]$ imply that $\lim_n V^{(2)}_{\mathcal{P}_n} = 0$ a.s.
The $L^2$ limit $\lim_{\|\mathcal{P}\| \to 0} V^{(2)}_{\mathcal{P}}$ is called the quadratic variation of Brownian motion on the
interval [a, b]. That Brownian motion has nonzero quadratic variation on any interval is a key
property of Brownian motion that accounts for some of the differences between stochastic
calculus, discussed below, and classical calculus.
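The behavior described by 18.7.5 is easy to observe in simulation. The sketch below is an illustration only: it assumes uniform partitions of $[a, b]$, draws the Brownian increments directly as independent normals with variance $\Delta t_j$, and uses the hypothetical helper name `quadratic_variation`.

```python
import random

def quadratic_variation(a, b, n, rng):
    """Sum of squared Brownian increments over a uniform partition of [a, b] into n pieces."""
    dt = (b - a) / n
    sd = dt ** 0.5                       # each increment is N(0, dt)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))

rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    # As the mesh (b - a) / n shrinks, the values cluster around b - a = 2.
    print(n, quadratic_variation(0.0, 2.0, n, rng))
```

For a uniform partition the variance of the sum is $2(b-a)^2/n$, so the fluctuations around $b - a$ shrink like $n^{-1/2}$, in line with the $L^2$ convergence asserted by the theorem.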
The continuous-time analogs of 18.5.2, 18.5.3, and 18.5.4 hold and are proved as before.
A martingale convergence theorem for continuous time is established below.
The following examples are taken relative to the natural (Brownian) filtration FW .
18.7.7 Examples. (a) Brownian motion $(W_t)$ is a martingale. Indeed, since $W_t - W_s$ is
independent of $\mathcal{F}^W_s$ for all $s \le t$, $E(W_t - W_s \mid \mathcal{F}^W_s) = E(W_t - W_s) = 0$.
(b) The process $(W_t^2 - t)_{t \ge 0}$ is a martingale: For $0 \le s \le t$ write
$$W_t^2 = (W_t - W_s)^2 + 2W_s(W_t - W_s) + W_s^2.$$
Taking conditional expectations and using linearity and the factor and independence properties
yields
$$E(W_t^2 \mid \mathcal{F}^W_s) = E(W_t - W_s)^2 + 2W_sE(W_t - W_s) + W_s^2 = t - s + W_s^2.$$
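(c) The exponential process $\big(\exp(aW_t - a^2t/2)\big)_{t \ge 0}$ is a martingale. This follows from the
calculation (for $t > s$)
$$E(e^{aW_t} \mid \mathcal{F}^W_s) = e^{aW_s}E(e^{aW_t - aW_s} \mid \mathcal{F}^W_s) = e^{aW_s}E(e^{a(W_t - W_s)}) = e^{aW_s + a^2(t - s)/2}.$$
The last equality is seen as follows: Set $\sigma = \sqrt{t - s}$. Then
$$\begin{aligned}
E(e^{a(W_t - W_s)}) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big(ax - \frac{x^2}{2\sigma^2}\Big)\,dx \\
&= \frac{\exp(a^2\sigma^2/2)}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big[-\frac{1}{2}\Big(\frac{x - a\sigma^2}{\sigma}\Big)^2\Big]\,dx \\
&= \exp(a^2\sigma^2/2). \qquad ♦
\end{aligned}$$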
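The Gaussian integral in (c) can be checked numerically. The following sketch approximates $E\,e^{aX}$ for $X \sim N(0, \sigma^2)$ with a midpoint rule and compares it with $\exp(a^2\sigma^2/2)$; the function name `exp_moment`, the truncation width, and the parameter values are assumptions made for illustration.

```python
import math

def exp_moment(a, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of E exp(a X) for X ~ N(0, sigma^2)."""
    # Centre the grid at x = a * sigma^2, where the integrand peaks.
    lo = a * sigma ** 2 - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += math.exp(a * x - x * x / (2 * sigma ** 2))
    return total * dx / (sigma * math.sqrt(2 * math.pi))

a, s, t = 1.5, 0.5, 2.0
sigma = math.sqrt(t - s)
print(exp_moment(a, sigma), math.exp(a * a * (t - s) / 2))
```

The two printed values should agree to many digits, since the integrand is smooth and decays rapidly outside the truncated range.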
Proof. For fixed $m$, let $R := \{r_1 < \cdots < r_n\}$ be a finite sequence of rationals contained in
$[0, m]$ and let $U^R_{[a,b]}$ be the number of upcrossings of $X_{r_1}, \ldots, X_{r_n}$ of $[a, b]$. By the upcrossing
lemma,
$$E\,U^R_{[a,b]} \le \frac{1}{b - a}\big(E|X_m| + 2|a|\big).$$
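Let $U^m_{[a,b]}$ denote the supremum of $U^R_{[a,b]}$ over all sets $R$. A sequence $(R_k)$ of such sets
increases to $[0, m] \cap \mathbb{Q}$, hence $U^{R_k}_{[a,b]} \uparrow U^m_{[a,b]}$. By the monotone convergence theorem,
$$E\,U^m_{[a,b]} \le \frac{1}{b - a}\big(E|X_m| + 2|a|\big) \le \frac{1}{b - a}\Big(\sup_t \|X_t\|_1 + 2|a|\Big) < \infty.$$
Now let $U_{[a,b]} = \sup_m U^m_{[a,b]}$. By the monotone convergence theorem again,
$$E\,U_{[a,b]} \le \frac{1}{b - a}\Big(\sup_t \|X_t\|_1 + 2|a|\Big).$$
In particular, $U_{[a,b]}$ is finite a.s. Set
$$S_{a,b} := \Big\{\omega : \liminf_{t \to \infty} X_t(\omega) < a < b < \limsup_{t \to \infty} X_t(\omega)\Big\} \quad \text{and} \quad S := \bigcup_{a,b \in \mathbb{Q},\, a < b} S_{a,b}.$$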
For ω ∈ Sa,b , there exists a strictly increasing sequence (rn ) in Q+ tending to ∞ such that
implying that U[a,b] (ω) = ∞. Since U[a,b] is finite a.s., P (Sa,b ) = 0 and so P (S) = 0. One
argues as in the discrete case that X∞ (ω) := limt→∞ Xt (ω) exists in R for each ω ∈ S c .
Setting X∞ = 0 on S and letting t → ∞ through Q, we may apply Fatou’s lemma as in the
discrete case to obtain X∞ ∈ L1 (Ω, F∞ , P ).
The continuous time analogs of 18.5.11–18.5.14 are valid. The proofs follow from 18.7.8
in much the same way as before. The continuous time notion of reversed martingale may be
formulated as in the discrete case, and a martingale convergence theorem may be proved in
this setting.
where $\xi_{j-1}$ is $\mathcal{F}_{t_{j-1}}$-measurable and $\xi_{j-1} \in L^2(\Omega)$. The Ito integral of $f$ on $[a, b]$ is defined
by
$$I_a^b(f) = \int_a^b f(t)\,dW(t) := \sum_{j=1}^{n} \xi_{j-1}\Delta W(t_j), \quad \text{where } \Delta W(t_j) := W(t_j) - W(t_{j-1}).$$
Note that by refining the partition in (18.16) one still has an Ito step process, and the Ito
integral is unchanged. For example, if a point $s$ is inserted into $(a, t_1)$, then
$$f(t, \omega) = \xi_0(\omega)1_{[a,s)}(t) + \xi_0(\omega)1_{[s,t_1)}(t) + \sum_{j=2}^{n} \xi_{j-1}(\omega)1_{[t_{j-1},t_j)}(t).$$
In particular, if $g(t, \omega)$ is another Ito step process, then, by taking the common refinement
of the partitions, one may assume that $f$ and $g$ are defined on the same partition. It follows
that the collection $S[a, b]$ of all Ito step processes on $[a, b]$ is a linear space and $I_a^b$ is a linear
map on $S[a, b]$. Moreover, by Fubini's theorem,
$$\|f\|^2_{L^2([a,b] \times \Omega)} = \int_{[a,b]} E(f_t^2)\,dt = \sum_{j=1}^{n} (t_j - t_{j-1})E(\xi_{j-1}^2) < \infty, \tag{18.17}$$
$$E\big(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) = \xi_{j-1}E\big(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) = \xi_{j-1}E\big(\Delta W(t_j)\big) = 0.$$
Here we have used the independence and factor properties of conditional expectation and
the fact that Brownian increments have mean zero. Taking expectations yields
$$E\,I_a^b(f) = \sum_{j=1}^{n} E\big(\xi_{j-1}\Delta W(t_j)\big) = 0.$$
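hence
$$E[I_a^b(f)]^2 = \sum_{i \ne j} E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j)\big) + \sum_{j=1}^{n} E\big(\xi_{j-1}^2[\Delta W(t_j)]^2\big). \tag{α}$$
If $i < j$, then, by conditioning and using the factor and independence properties again, we
have
$$\begin{aligned}
E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j)\big) &= E\big[E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E\big[\xi_{i-1}\xi_{j-1}\Delta W(t_i)\,E\big(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E\big[\xi_{i-1}\xi_{j-1}\Delta W(t_i)\big]\,E\big[\Delta W(t_j)\big] \\
&= 0. \qquad (β)
\end{aligned}$$
Similarly,
$$\begin{aligned}
E\big(\xi_{j-1}^2[\Delta W(t_j)]^2\big) &= E\big[E\big(\xi_{j-1}^2[\Delta W(t_j)]^2 \mid \mathcal{F}_{t_{j-1}}\big)\big] = E\big[\xi_{j-1}^2\,E\big([\Delta W(t_j)]^2 \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E(\xi_{j-1}^2)(t_j - t_{j-1}). \qquad (γ)
\end{aligned}$$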
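Combining (α), (β), and (γ) gives $E[I_a^b(f)]^2 = \sum_j (t_j - t_{j-1})E(\xi_{j-1}^2)$, and this, together with $E\,I_a^b(f) = 0$, can be checked by simulation. The sketch below is a Monte Carlo illustration under assumptions not made in the text: the interval is $[0, 1]$ with four equal subintervals, and the step process uses $\xi_{j-1} = W(t_{j-1})$, which is $\mathcal{F}_{t_{j-1}}$-measurable with $E(\xi_{j-1}^2) = t_{j-1}$.

```python
import random

def ito_step_integral(times, rng):
    """One sample of sum_j xi_{j-1} * (W(t_j) - W(t_{j-1})) with xi_{j-1} = W(t_{j-1})."""
    w = 0.0            # W(t_0); the path is started at 0 for simplicity
    integral = 0.0
    for t_prev, t_next in zip(times, times[1:]):
        dw = rng.gauss(0.0, (t_next - t_prev) ** 0.5)   # increment independent of the past
        integral += w * dw       # xi_{j-1} uses only information up to t_{j-1}
        w += dw
    return integral

times = [0.0, 0.25, 0.5, 0.75, 1.0]
rng = random.Random(2)
samples = [ito_step_integral(times, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
second_moment = sum(x * x for x in samples) / len(samples)
# sum_j (t_j - t_{j-1}) E(xi_{j-1}^2), using E W(t)^2 = t.
expected = sum((t1 - t0) * t0 for t0, t1 in zip(times, times[1:]))
print(mean, second_moment, expected)
```

The essential design point is that each $\xi_{j-1}$ is evaluated before the increment $\Delta W(t_j)$ is drawn; evaluating the integrand at the right endpoint instead would destroy both identities.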
By (18.18), the mapping $I_a^b : S[a, b] \to L^2(\Omega)$ extends to an isometry on the closure
$\operatorname{cl} S[a, b]$ of $S[a, b]$ in $L^2([a, b] \times \Omega)$. This defines the Ito integral for functions $f \in \operatorname{cl} S[a, b]$:
$$I_a^b(f) = \int_a^b f\,dW = \int_a^b f(t)\,dW(t) := \lim_n \int_a^b f_n(t)\,dW(t), \quad f \in \operatorname{cl} S[a, b],$$
where $(f_n)$ is a sequence in $S[a, b]$ such that $f_n \to f$ in $L^2([a, b] \times \Omega)$, that is,
$$\int_{[a,b]} E[f_n(t) - f(t)]^2\,dt \to 0.$$
Proof. Assume first that $f \in S[a, b]$, as given in (18.16). If $c \in [t_{k-1}, t_k)$, then
$$\begin{aligned}
\int_a^b f(t)\,dW(t) &= \sum_{j<k} \xi_{j-1}\Delta W(t_j) + \xi_{k-1}[W(c) - W(t_{k-1})] \\
&\quad + \xi_{k-1}[W(t_k) - W(c)] + \sum_{j>k} \xi_{j-1}\Delta W(t_j) \\
&= \int_a^c f(t)\,dW(t) + \int_c^b f(t)\,dW(t).
\end{aligned}$$
In the general case, let $f_n \in S[a, b]$ such that $\int_{[a,b]} E(f_n(t) - f(t))^2\,dt \to 0$. Then clearly
$f_n 1_{[a,c]}$ and $f_n 1_{[c,b]}$ are Ito step functions, and both $\int_{[a,c]} E(f_n(t) - f(t))^2\,dt \to 0$ and
$\int_{[c,b]} E(f_n(t) - f(t))^2\,dt \to 0$, hence
$$\int_a^b f\,dW = \lim_n \int_a^b f_n\,dW = \lim_n \int_a^c f_n\,dW + \lim_n \int_c^b f_n\,dW = \int_a^c f\,dW + \int_c^b f\,dW.$$
The following proposition shows that in certain circumstances the Ito integral is a limit
of Riemann-Stieltjes sums.
18.8.3 Proposition. Let $f \in L^2([a, b] \times \Omega)$ such that $f_t$ is $\mathcal{F}_t$-measurable and the mapping
$(s, t) \to E(f_sf_t)$ is continuous. Then $f \in \operatorname{cl} S[a, b]$ and
$$\int_a^b f(t)\,dW(t) = \lim_{\|\mathcal{P}\| \to 0} \sum_{j=1}^{n} f(t_{j-1})\Delta W(t_j), \quad \text{where } \mathcal{P} := \{a = t_0 < t_1 < \cdots < t_n\}.$$
Proof. Define an Ito step process fP by fP (t, ω) = f (tj−1 , ω) (tj−1 < t ≤ tj ), where the
(tj−1 , tj ] are the intervals of the partition P. Let (Pn ) be any sequence of partitions with
kPn k → 0 and set fn := fPn . From the calculation
we see that lims→t E[f (t) − f (s)]2 = 0. Since kPn k → 0 it follows that
By continuity the supremum is finite, so we may apply the dominated convergence theorem
to conclude that
$$\lim_n \int_a^b E|f(t) - f_n(t)|^2\,dt = 0,$$
that is, $f_n \to f$ in $L^2([a, b] \times \Omega)$. Therefore, $f \in \operatorname{cl} S[a, b]$ and $\int_a^b f_n\,dW \to \int_a^b f\,dW$. Since
the sequence $(\mathcal{P}_n)$ was arbitrary, the conclusion follows.
18.8.4 Example. Let $\mathcal{P} = \{a = t_0 < t_1 < \cdots < t_n = b\}$ be an arbitrary partition of $[a, b]$.
By direct expansion
$$\sum_{j=1}^{n} W(t_{j-1})\Delta W_{t_j} = \tfrac{1}{2}\big[W^2(b) - W^2(a) - V^{(2)}_{\mathcal{P}}(W)\big].$$
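The expansion is a purely algebraic identity, valid path by path, and follows from $W(t_{j-1})\Delta W_{t_j} = \tfrac12\big[W^2(t_j) - W^2(t_{j-1}) - (\Delta W_{t_j})^2\big]$. The sketch below checks it on one simulated path; the uniform partition, the starting value $W(t_0) = 0$, and the name `riemann_sum_identity` are simplifying assumptions for illustration.

```python
import random

def riemann_sum_identity(n, a=0.0, b=1.0, seed=3):
    """Return (left-endpoint sum, right side of the identity, quadratic variation sum)."""
    rng = random.Random(seed)
    dt = (b - a) / n
    w = [0.0]                          # W(t_0), ..., W(t_n); W(t_0) = 0 is a simplification
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
    left_sum = sum(w[j - 1] * (w[j] - w[j - 1]) for j in range(1, n + 1))
    quad_var = sum((w[j] - w[j - 1]) ** 2 for j in range(1, n + 1))
    right_side = 0.5 * (w[n] ** 2 - w[0] ** 2 - quad_var)
    return left_sum, right_side, quad_var

left, right, qv = riemann_sum_identity(1000)
print(left, right, qv)
```

The two sides agree up to rounding for every path, while the quadratic variation term stays close to $b - a$ for fine partitions rather than vanishing.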
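Then
$$\begin{aligned}
E\Big(\int_s^t f\,dW \,\Big|\, \mathcal{F}_s\Big) &= \sum_{j=1}^{n} E\big(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_s\big) \\
&= \sum_{j=1}^{n} E\big(E(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}) \mid \mathcal{F}_s\big) \\
&= \sum_{j=1}^{n} E\big(\xi_{j-1}E(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}) \mid \mathcal{F}_s\big).
\end{aligned}$$
The last sum is zero since, as noted earlier, $E(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}) = 0$.
For a general $f$, let $f_n \in S[s, t]$ such that $\int_s^t E|f_n(u) - f(u)|^2\,du \to 0$. By the first
paragraph,
$$E\Big(\int_s^t f\,dW \,\Big|\, \mathcal{F}_s\Big) = E\Big(\int_s^t (f - f_n)\,dW \,\Big|\, \mathcal{F}_s\Big). \tag{†}$$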
It may be shown that almost all paths of the integral process X are continuous. (See, for
example, [29].)
Here $\sigma$ and $\mu$ are constants called, respectively, the volatility and drift of the stock. The
integral equation is frequently written as a stochastic differential equation
$$dS = \sigma S\,dW + \mu S\,dt \quad \text{or} \quad \frac{dS}{S} = \sigma\,dW + \mu\,dt. \tag{18.22}$$
The latter form expresses the fact that the relative change in the stock price has a deterministic
part $\mu\,dt$, which accounts for the general trend of the stock, and a component $\sigma\,dW$, which
reflects the random nature of the stock.
The solution of (18.21) may be shown to be the geometric Brownian motion process
$$S_t = S_0 \exp\big(\sigma W_t + (\mu - \tfrac{1}{2}\sigma^2)t\big). \tag{18.23}$$
Note that because of the relationship between St and Wt , Ft = σ(Ss : 0 ≤ s ≤ t). Thus the
Brownian filtration (Ft ) reveals stock price information. We show how these facts lead to a
formula for the price of an option.
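Formula (18.23) makes the stock price easy to sample, since only $W_t \sim N(0, t)$ is needed. The sketch below draws terminal prices and compares the sample mean with $S_0 e^{\mu t}$, a value that follows from 18.7.7(c) with $a = \sigma$; the parameter values and the name `gbm_terminal` are assumptions chosen for illustration.

```python
import math
import random

def gbm_terminal(s0, sigma, mu, t, rng):
    """One sample of S_t = S_0 exp(sigma W_t + (mu - sigma^2 / 2) t), as in (18.23)."""
    w_t = rng.gauss(0.0, math.sqrt(t))      # W_t is N(0, t)
    return s0 * math.exp(sigma * w_t + (mu - 0.5 * sigma ** 2) * t)

rng = random.Random(4)
s0, sigma, mu, t = 100.0, 0.2, 0.05, 1.0    # hypothetical market parameters
samples = [gbm_terminal(s0, sigma, mu, t, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, s0 * math.exp(mu * t))
```

The correction term $-\tfrac12\sigma^2 t$ in the exponent is what keeps the mean growth rate equal to $\mu$; dropping it would inflate the sample mean by the factor $e^{\sigma^2 t/2}$.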
Self-Financing Portfolios
The key to determining the value of an option is the construction of a self-financing
portfolio based on the stock and a risk-free bond. Assuming that the bond earns interest
at a continuously compounded annual rate r and that the initial value of the bond is one
dollar, the value of the bond at time $t$ is seen to be $B_t := e^{rt}$. Now let $\varphi$ and $\theta$ be stochastic
processes adapted to the filtration $(\mathcal{F}_t)$, these representing, respectively, the number of dollar
bonds and number of shares of the stock held at time t. The value of the portfolio at time t
is the random variable
Vt = φt Bt + θt St , 0 ≤ t ≤ T,
where V0 is the initial investment in the portfolio, assumed to be a constant. The portfolio
is said to be self-financing if
dV = φ dB + θ dS, (18.24)
where the differentials represent small changes. The equation may be best understood by
considering a discrete version at times t0 = 0 < t1 < t2 < · · · < tn = T . At time tj , the
value of the portfolio before the price Sj is known is
φj Bj−1 + θj Sj−1 ,
where we write Sj for Stj , etc. After Sj becomes known and the new bond value Bj is noted,
the portfolio has value
Vj = φj Bj + θj Sj .
At this time, stocks and bonds may be bought and sold (based on the information provided
by Ftj ). For the portfolio to be self-financing, this rebalancing must not change the current
value of the portfolio. Thus the new values φj+1 and θj+1 must satisfy
φj+1 Bj + θj+1 Sj = φj Bj + θj Sj .
It follows that
Call Options
A call option based on a stock is a contract made between two parties, the buyer (holder)
of the option and the seller (writer) of the option. The contract requires the writer to offer to
sell the stock to the holder at a future time T for a predetermined amount K. At this time,
the holder may or may not decide to exercise the option. Thus the payoff for the holder is
(ST − K)+ . A self-financing portfolio may be used by the writer as a hedging strategy, that
is, an investment in shares of the stock and units of the bond devised to exactly cover the
writer’s obligation at maturity T . In this case, the portfolio is said to replicate the option.
The writer initiates the portfolio with an amount V0 , the price of the option (cost to the
holder). Here, V0 is chosen so that VT = (ST − K)+ , which is the cost to the writer of the
transaction. The law of one price (in an arbitrage-free market) then asserts that V0 is the
fair price of the option.
Now form the discounted price process $\widetilde{S}$, given by
$$\widetilde{S}_t := e^{-rt}S_t = S_0 \exp\big(\sigma W_t^* - \tfrac{1}{2}\sigma^2t\big), \quad 0 \le t \le T.$$
By 18.7.7(c), $\widetilde{S}_t$ is a $P^*$-martingale. One may show, as a consequence, that the discounted
value process $\widetilde{V}$, given by
$$\widetilde{V}_t := e^{-rt}V_t,$$
is also a $P^*$-martingale. This implies the key fact that $E^*\widetilde{V}_t$ is constant in $t$. In particular,
$$V_0 = E^*V_0 = E^*\widetilde{V}_T = e^{-rT}E^*V_T.$$
Since the portfolio value VT is assumed to be the payoff to the holder of the option,
where $\varphi$ is the standard normal density. From (18.26) and (18.27) we see that the price of
the option is given by the formula
$$V_0 = e^{-rT} \int_{-\infty}^{\infty} \Big[S_0 \exp\big\{\sigma\sqrt{T}\,y + (r - \tfrac{1}{2}\sigma^2)T\big\} - K\Big]^+ \varphi(y)\,dy. \tag{18.28}$$
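The integral in (18.28) can be evaluated directly by numerical quadrature. The sketch below does so with a midpoint rule and, as a cross-check, compares the result with the standard closed-form Black-Scholes call price; that closed form is quoted here as a well-known formula and is not claimed to be the derivation given in the text. Function names, truncation width, and parameter values are illustrative assumptions.

```python
import math

def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)

def price_by_quadrature(s0, k, r, sigma, t, half_width=10.0, steps=100_000):
    """Midpoint-rule approximation of the integral in (18.28)."""
    dy = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * dy
        s_t = s0 * math.exp(sigma * math.sqrt(t) * y + (r - 0.5 * sigma ** 2) * t)
        total += max(s_t - k, 0.0) * phi(y)     # payoff (S_T - K)^+ weighted by phi
    return math.exp(-r * t) * total * dy

def price_closed_form(s0, k, r, sigma, t):
    """Standard Black-Scholes call formula, used here only as a cross-check."""
    norm_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

args = (100.0, 100.0, 0.05, 0.2, 1.0)   # hypothetical S_0, K, r, sigma, T
print(price_by_quadrature(*args), price_closed_form(*args))
```

Note that the drift $\mu$ does not appear anywhere: only the interest rate $r$ and the volatility $\sigma$ enter the price.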
A more succinct formula for the option price may be obtained as follows. Define
Appendices
Appendix A
Change of Variables Theorem
for all Borel measurable functions $f : V \to [0, +\infty]$. Indeed, if this inequality holds for all $f$
and $\varphi$, then switching the roles of $U$ and $V$ we also have
$$\int_U g\,d\lambda^d \le \int_V (g \circ \varphi^{-1})|J_{\varphi^{-1}}|\,d\lambda^d$$
for all Borel measurable $g : U \to [0, +\infty]$. Taking $g = (f \circ \varphi)|J_\varphi|$ and recalling that
$J_\varphi J_{\varphi^{-1}} = 1$, we obtain the reverse of inequality (A.2). Finally, by the standard arguments, it
suffices to verify (A.2) for indicator functions $f = 1_B$, where $B \in \mathcal{B}(V)$. Then (A.2) reduces
to
$$\lambda^d(B) \le \int_{\varphi^{-1}(B)} |J_\varphi|\,d\lambda^d, \quad B \in \mathcal{B}(V). \tag{A.3}$$
The proof of (A.3) is accomplished by a sequence of lemmas. The first treats the case of
a linear change of variable.
A.0.2 Lemma. If T : Rd → Rd is linear and nonsingular, then
λd (T (E)) = | det T |λd (E), E ∈ B(Rd ). (A.4)
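For a concrete instance of (A.4) in dimension $d = 2$, take $E$ to be a rectangle: its image under $T$ is a parallelogram whose area can be computed exactly with the shoelace formula. The matrix and rectangle below are arbitrary choices made for illustration, and `polygon_area` and `apply` are hypothetical helper names.

```python
def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon in the plane."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def apply(T, p):
    """Apply the 2x2 matrix T (given as a pair of rows) to the point p."""
    (a, b), (c, d) = T
    return (a * p[0] + b * p[1], c * p[0] + d * p[1])

T = ((2.0, 1.0), (0.5, 3.0))                             # det T = 5.5
E = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.5), (0.0, 1.5)]     # rectangle of area 6
image = [apply(T, p) for p in E]
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
print(polygon_area(image), abs(det_T) * polygon_area(E))
```

This checks the identity only for one interval; the content of the lemma is the extension from intervals to arbitrary Borel sets.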
Proof. Since T is a homeomorphism, T (E) ∈ B(Rd ), so the left side of (A.4) is defined.
Furthermore, if (A.4) holds for T1 and T2 , then
$$\lambda^d\big(T_1T_2(E)\big) = |\det T_1|\,\lambda^d\big(T_2(E)\big) = |\det T_1|\,|\det T_2|\,\lambda^d(E) = |\det(T_1T_2)|\,\lambda^d(E).$$
T (x1 , x2 , x3 , . . . , xn ) = (x1 + x2 , x2 , x3 , . . . , xn ),
Since det T = 1, (A.4) holds in case (c). Therefore (A.4) holds for all nonsingular T and all
bounded intervals E.
Now let I be a fixed bounded interval and let GI denote the collection of all E ∈ B(Rd )
for which
λd (T (E ∩ I)) = | det T |λd (E ∩ I). (†)
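By the first part of the proof, $\mathcal{G}_I$ contains the collection $\mathcal{I}$ of all intervals of $\mathbb{R}^d$. We show that
$\mathcal{G}_I$ is a λ-system (see 1.5). Let $A, B \in \mathcal{G}_I$ with $A \subseteq B$, and set $C = A \cap I$ and $D = B \cap I$.
Then $(B \setminus A) \cap I = D \setminus C$ and
$$\lambda^d\big(T(D \setminus C)\big) = \lambda^d\big(T(D)\big) - \lambda^d\big(T(C)\big) = |\det T|\big[\lambda^d(D) - \lambda^d(C)\big] = |\det T|\,\lambda^d(D \setminus C),$$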
For the next lemma, recall that dfx : Rd → Rd denotes the differential of a function
f : U → Rd at x, that is, the linear operator whose matrix is the Jacobian matrix of f
evaluated at x.
A.0.3 Lemma. Let f : U → Rd be C 1 and let K ⊆ U be compact and convex. Then
M := supz∈K kdfz k < ∞ and kf (x) − f (y)k ≤ M kx − yk for all x, y ∈ K.
Taking $u = f(x) - f(y)$ and using the CBS and operator norm inequalities, we have
$$|f(x) - f(y)|^2 = \big(f(x) - f(y)\big) \cdot df_c(x - y) \le M|f(x) - f(y)|\,|x - y|.$$
For the remaining lemmas, we use the following terminology and notation: The cube with
center y ∈ Rd and edge r > 0 is the half-closed interval
$$\|\widetilde{\psi}(x) - \widetilde{\psi}(y)\| \le c\|x - y\| \quad \text{for all } x, y \in Q.$$
Thus, if $Q$ has center $x_0$ and edge $r$, then recalling (A.5) we have for all $x \in Q$,
$$\|\psi(x) - \psi(x_0)\| \le \|\widetilde{\psi}(x) - \widetilde{\psi}(x_0)\| + \|x - x_0\| \le (c + 1)\|x - x_0\| \le \tfrac{1}{2}(c + 1)r\sqrt{d}.$$
Thus $\psi(Q)$ is contained in the closed ball $C$ with center $\psi(x_0)$ and radius $\tfrac{1}{2}(c + 1)r\sqrt{d}$. Since
$C$ is contained in the cube with center $\psi(x_0)$ and edge $(c + 1)dr$, we have
$$\lambda^d\big(\psi(Q)\big) \le [(c + 1)dr]^d = [(c + 1)d]^d\lambda^d(Q).$$
We call a finite collection Qr of pairwise disjoint cubes with edge r that covers a subset A
of Rd a paving of A. Pavings Qr = {Qr (xj ) : 1 ≤ j ≤ m} and Qs = {Qs (xj ) : 1 ≤ j ≤ m}
with the same centers are said to be concentric. Clearly, any bounded set has a paving Qr
with arbitrarily small r.
A.0.5 Lemma. Let $K \subseteq U$ be compact. Then, for all sufficiently small $\delta$ and each $0 < r < \delta$,
there exist a compact set $K_\delta$ and a paving $Q_r$ of $K$ with $K \subseteq \bigcup Q_r \subseteq K_\delta \subseteq U$.
Proof. Since $K$ is compact and $U^c$ is closed, $d(U^c, K) > 0$. For $0 < \delta < d(U^c, K)/\sqrt{d}$, let
\[ K_\delta = \{x : d(x, K) \le \delta \sqrt{d}\}. \]
Then $K_\delta$ is compact and $K \subseteq K_\delta \subseteq U$. Let $0 < r < \delta$ and let $Q$ be a cube with edge $r$. If
$x \in Q \cap K$ and $y \in Q \cap K_\delta^c$, then
\[ \delta \sqrt{d} < d(y, K) \le |x - y| \le r \sqrt{d}. \]
Proof. Let ε > 0 and choose δ > 0 as in A.0.6. By uniform continuity of Jϕ (x) on Kδ , there
exists δ1 < δ such that
Choose pavings Qr = {Qr (y)}y and Qdr = {Qdr (y)}y as in A.0.6. For x ∈ Qdr (y) we have
|Jϕ (y)| ≤ |Jϕ (x) − Jϕ (y)| + |Jϕ (x)| < ε + |Jϕ (x)|, hence, applying (A.6),
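\[ (1+\varepsilon)^{-d}\, \lambda^d\big(\varphi(Q_r(y))\big) \le |J_\varphi(y)|\, \lambda^d\big(Q_{dr}(y)\big) \le \int_{Q_{dr}(y)} \big(|J_\varphi(x)| + \varepsilon\big)\,dx. \]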
Therefore,
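\[
\begin{aligned}
(1+\varepsilon)^{-d}\, \lambda^d\big(\varphi(K)\big)
 &\le \sum_y (1+\varepsilon)^{-d}\, \lambda^d\big(\varphi(Q_r(y))\big)
  \le \int_{K_\delta} \big(|J_\varphi(x)| + \varepsilon\big)\,dx \\
 &\le \int_K |J_\varphi(x)|\,dx + \varepsilon\,\big(1 + \lambda^d(K_\delta)\big),
\end{aligned}
\]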
as required. This completes the proof of the change of variables theorem for the case f Borel.
Now let f ≥ 0 be Lebesgue measurable on V . Then f = g on V \ E, where g ≥ 0 is Borel
measurable, E ⊆ V , and λd (E) = 0. By the first part of the proof,
\[ \int_V g(y)\,dy = \int_U (g \circ \varphi)(x)\,|J_\varphi(x)|\,dx. \]
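But the left side equals $\int_V f(y)\,dy$, and since $f \circ \varphi = g \circ \varphi$ on $U \setminus \varphi^{-1}(E)$, the right side
equals $\int_U (f \circ \varphi)(x)\,|J_\varphi(x)|\,dx$, provided we can show that
\[ \lambda^d\big(\varphi^{-1}(E)\big) = 0. \]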
To verify this, suppose first that E is bounded. Then E ⊆ K for a compact interval K with
λd (K) arbitrarily small. Applying A.0.1 “in reverse,” we have
\[ \int_U h\,d\lambda^d = \int_V (h \circ \varphi^{-1})\,|J_{\varphi^{-1}}|\,d\lambda^d \]
Since the right side may be made arbitrarily small, $\lambda^d(\varphi^{-1}(E)) = 0$. If $E$ is unbounded, take
a sequence of bounded sets $E_n$ of measure zero with $E_n \uparrow E$.
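A standard illustration of the change of variables formula is the polar-coordinate computation of the Gaussian integral. The sketch below assumes the usual hypotheses (ϕ a C¹ diffeomorphism of the open set U onto the open set V); the specific ϕ, U, and V are chosen here for illustration, and Fubini's theorem is used for the iterated integral.

```latex
% Assumed example: polar coordinates on R^2.
\[
  \varphi(r, \theta) = (r\cos\theta,\ r\sin\theta), \qquad
  U = (0,\infty) \times (0, 2\pi), \qquad
  V = \mathbb{R}^2 \setminus \{(y_1, 0) : y_1 \ge 0\}.
\]
% The Jacobian determinant is J_\varphi(r,\theta) = r\cos^2\theta + r\sin^2\theta = r,
% and the removed half-line \mathbb{R}^2 \setminus V has Lebesgue measure zero.
\[
  \int_{\mathbb{R}^2} e^{-|y|^2}\,dy
  = \int_V e^{-|y|^2}\,dy
  = \int_0^{2\pi}\!\!\int_0^\infty e^{-r^2}\, r\,dr\,d\theta
  = 2\pi \cdot \tfrac12
  = \pi .
\]
```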
Appendix B
Separate and Joint Continuity
In this appendix we prove the following theorem, which is used in Chapter 17 to establish
joint continuity of multiplication in certain algebraic structures.
B.0.8 Theorem. Let X and Y be topological spaces with X locally compact or a complete
metric space and Y compact Hausdorff. If f : X × Y → C is bounded and separately
continuous, then there exists a dense Gδ subset A of X such that f is jointly continuous at
every point of A × Y .
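A classical example suggests why the theorem asserts joint continuity only on a large subset rather than everywhere. The function below is an illustration chosen here, not one taken from the text.

```latex
% Assumed example: X = Y = [-1,1] and
\[
  f(x, y) =
  \begin{cases}
    \dfrac{2xy}{x^2 + y^2}, & (x, y) \ne (0, 0), \\[1ex]
    0, & (x, y) = (0, 0).
  \end{cases}
\]
% Bounded: |f| \le 1, since 2|xy| \le x^2 + y^2.
% Separately continuous: for fixed y \ne 0 the denominator never vanishes,
% and f(\cdot, 0) and f(0, \cdot) are identically zero.
% Not jointly continuous at (0,0): f(t, t) = 1 for every t \ne 0, while f(0,0) = 0.
% Jointly continuous at every (x, y) with x \ne 0, so one may take
\[
  A = [-1, 1] \setminus \{0\},
\]
% which is open and dense in X, hence a dense G_\delta.
```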
The proof is based on the following lemmas. For these, we assume the hypotheses of
the theorem, except we allow X to be an arbitrary topological space. We shall need the
functions F : X → C(Y ) and G : X → R+ defined by
\[ F(x) = f(x, \cdot) \quad \text{and} \quad G(x) = \inf_U \sup\{\|F(x') - F(x'')\|_\infty : x', x'' \in U\}, \]
Kr := {x ∈ X : d(F (x), K) ≤ r}
Proof. Let $x_0 \in K_r^c$ and $r < s < t < d(F(x_0), K)$. For any $g \in K$, because $Y$ is compact
there exists $y_0 \in Y$ such that
\[ d(F(x_0), K) = \min\{\|F(x_0) - h\|_\infty : h \in K\} \le \|F(x_0) - g\|_\infty = |f(x_0, y_0) - g(y_0)|. \]
Therefore, $|f(x_0, y_0) - g(y_0)| > t$, hence, by separate continuity of $f$, there exists a neighborhood
$U_g$ of $x_0$ such that
\[ |f(x, y_0) - g(y_0)| > t \quad \forall\, x \in U_g. \]
Thus if $h$ is in the ball $B_{t-s}(g)$ in $C(Y)$ and $x \in U_g$, then
\[ \|F(x) - h\|_\infty \ge \|F(x) - g\|_\infty - \|g - h\|_\infty \ge |f(x, y_0) - g(y_0)| - \|g - h\|_\infty > s. \tag{†} \]
Now, by compactness of $K$, there exist $g_1, \dots, g_n \in K$ such that $K \subseteq \bigcup_j B_{t-s}(g_j)$. Therefore,
by (†),
\[ \|F(x) - h\|_\infty > s > r \quad \forall\, h \in K \text{ and } x \in U := \bigcap_j U_{g_j}. \]
Taking the infimum over all $h \in K$ shows that the neighborhood $U$ of $x_0$ is contained in $K_r^c$.
Therefore, $K_r$ is closed.
Now assume that $X$ is a Baire space and $G \ge \varepsilon$ on $X$. Since $K$ is compact, we may
cover $K$ with closed balls $C_s(g_1), \dots, C_s(g_k)$, where $g_j \in K$ and $s = \varepsilon/4$. It follows that for
$r = \varepsilon/12$, $\{h \in C(Y) : d(h, K) \le r\} \subseteq \bigcup_{j=1}^k C_s(g_j)$ and so
\[ K_r \subseteq F^{-1}\big(\{h \in C(Y) : d(h, K) \le r\}\big) \subseteq F^{-1}\Big(\bigcup_{j=1}^k C_s(g_j)\Big). \]
By the first paragraph, $K_r$ is closed, as are the sets $F^{-1}(C_s(g_j))$ (take $K = \{g_j\}$). Since $X$
is a Baire space, if $\operatorname{int} K_r \ne \emptyset$, then $U := \operatorname{int} F^{-1}(C_s(g_j)) \ne \emptyset$ for some $j$. It follows that
$\|F(x') - F(x'')\|_\infty \le 2s$ $(x', x'' \in U)$ and so $G(x) \le 2s = \varepsilon/2$ $(x \in U)$, contradicting the
hypothesis.
B.0.12 Lemma. Let (xn ) be a sequence in X such that every subsequence has a cluster
point in X. If x0 is a cluster point of (xn ), then F (x0 ) is in the norm-closed convex hull of
the set {F (xn ) : n ∈ N}.
Proof. We show first that the set S := {F (xn ) : n ∈ N} is relatively sequentially compact in
the topology p of pointwise convergence in C(Y ). To see this, let (gk ) be a sequence in S. If (gk )
has infinitely many distinct terms, then it has a subsequence that is in fact a subsequence
of (F (xn )). Since F is clearly p-continuous, the hypothesis on (xn ) implies that (gk ) has a
subsequence that p-converges to some g ∈ C(Y ). On the other hand, if (gk ) has only finitely
many distinct terms, then it has a constant subsequence, and the same conclusion holds.
By 14.1.4, S is relatively w-compact in C(Y ), hence the weak and pointwise closures of S
coincide. Since F (x0 ) is in the pointwise closure and since the norm and weak closures of
co S are the same, the conclusion of the lemma follows.
The proof of B.0.8 is based on the following “game” on a topological space X. The game
has two players, α and β. Player β starts the game by choosing a nonempty open set U1 .
Player α then chooses a nonempty open set V1 ⊆ U1 and a point x1 ∈ V1 . Next, player β
chooses a nonempty open set U2 ⊆ V1 . In general, move n of β is the choice of an open
set Un ⊆ Vn−1 , and α’s subsequent move n is the choice (Vn , xn ), where Vn is open and
xn ∈ Vn ⊆ Un . In this way we obtain two decreasing sequences (Un ) and (Vn ) of open
sets and a sequence (xn ) of points in X. Player α wins the game (and defeats β) if every
subsequence of (xn ) has a cluster point in the common intersection $\bigcap_{n=1}^{\infty} U_n = \bigcap_{n=1}^{\infty} V_n$.
A strategy for α is a rule that governs each of α’s moves based only on the immediately
preceding move of β. A winning strategy for α is a strategy that results in the defeat of
β no matter how β moves. A topological space X for which a winning strategy for α exists
is called α-favorable.
B.0.13 Proposition. (a) A complete metric space is α-favorable.
(b) A locally compact Hausdorff space is α-favorable.
[1] R. Ash and C. Doleans-Dade, Probability and Measure Theory, 2nd Ed., Academic
Press, San Diego, 2000.
[2] G. Bachman, and L. Narici, Functional Analysis, Academic Press, New York, 1966.
[3] R. Baire, Sur les fonctions de variables réelles, Ann. di Mat. 3, 1–123, 1899.
[4] J. Berglund, H. Junghenn, and P. Milnes, Analysis on Semigroups: Function Spaces,
Compactifications, Representations, Wiley, New York, 1988.
[5] P. Billingsly, Probability and Measure, Wiley, New York, 1979.
[6] H. Brezis, Functional Analysis, Sobolev Spaces, and Partial Differential Equations,
Springer-Verlag, New York, 2011.
[7] J.P.R. Christensen, Joint continuity of separately continuous functions, Proc. Amer.
Math. Soc. 82, 455–461, 1981.
[8] J. Clarkson, Uniformly convex spaces, Trans. Amer. Math. Soc. 40, 415–420, 1936.
[9] J. Conway, A Course in Functional Analysis, Springer-Verlag, New York, 1990.
505
506 References
[19] L. Fejer, Beispiele stetiger Funktionen mit divegenter Fourierreihe, J. Reine Angew.
Math. 137, 1–5, 1910.
[20] G. Folland, Real Analysis. Modern Techniques and Their Applications, 2nd Ed. John
Wiley & Sons, New York, 1999.
[21] G. Folland, A Course in Abstract Harmonic Analysis, CRC Press, Boca Raton, 1995.
[22] P. Halmos, Lectures on Ergodic Theory, Chelsea, New York, 1956.
[23] P. Halmos, Naive Set Theory, Springer-Verlag, New York, 1994.
[24] H. Junghenn, Option Valuation: A First Course in Financial Mathemtics, CRC Press,
Boca Raton, 2012.
[25] H. Junghenn, A Course in Real Analysis, CRC Press, Boca Raton, 2015.
[26] J. Kindler, A simple proof of the Daniell-Stone representation theorem, Amer. Math.
Monthly 90, 396–397, 1983.
[27] I. Kluvnek and G. Knowles, Vector measures and control systems, North-Holland
Mathematics Studies 20, North-Holland, New York, 1976.
[28] E. Kreyszig, Introduction to Functional Analysis with Applications, John Wiley & Sons,
New York, 1978.
[29] H. Kuo, Introduction to Stochastic Integration, Springer-Verlag, New York, 2006.
[30] S. Lang, Real and Functional Analysis, 3rd Ed., Springer-Verlag, New York, 1993.
[31] J. D. Lawson, Joint continuity in semitopological semigroups, Illinois J. Math 18,
275–285, 1974.
[32] J. D. Lawson, Additional notes on continuity in semitopological semigroups, Semigroup
Forum 12, 265–280, 1976.
[33] P. Lax, Functional Analysis, 3rd Ed., Wiley Interscience, John Wiley & Sons, 2002.
[34] L. Loomis, An Introduction to Abstract Harmonic Analysis, D. Van Nostrand, Princeton,
1953.
[35] I. Namioka, Separate continuity and joint continuity, Pacific J. Math, 51, 515–531,
1974.
[36] G. Pedersen, Analysis Now, Springer-Verlag, New York, 1995.
[37] R. Phelps, Lectures on Choquet’s Theorem, 2nd Ed., Lecture Notes in Mathematics
1757, Springer-Verlag, New York, 2001.
[38] I. Rana, An Introduction to Measure and Integration, 2nd Ed., Graduate Studies in
Mathematics Vol. 45, AMS, Providence, 2002.
[39] J. Ringrose, A note on uniformly convex spaces, J. London Math. Soc. 34, p.92, 1959.
[40] M. Rosenblum, On a theorem of Fuglede and Putnam, J. London Math. Soc. 33, 376–377,
1958.
[41] W. Ruppert, Compact Semitopological Semigroups: An Intrinsic Theory, Lecture Notes
in Mathematics 1079, Springer-Verlag, New York, 1984.
References 507
[42] S. Saeki, A proof of the existence of infinite product probability measures, Amer. Math.
Monthly Vol. 103, No. 8, 682-683, Oct. 1996.
[43] S. Shreve, Stochastic Calculus for Finance, Springer-Verlag, New York, 2004.
[44] I. Singer, Bases in Banach Spaces I, Springer-Verlag, Heidelberg, 1970.
[45] C. Swartz, An Introduction to Functional Analysis, Marcel Dekker, New York, 1992.
[46] M. Taylor, Measure Theory and Integration, Graduate Studies in Mathematics Vol. 76,
American Mathematical Society, Providence, 2006.
[47] S. Taylor, Exact asymptotic estimates of Brownian path variation, Duke Math. J. Vol.
39, No. 2, 219–241, 1972.
[48] F. Treves, Topological Vector Spaces, Distributions, and Kernels, Academic Press, New
York, 1967.
List of Symbols
Convergence
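$T\text{-}\lim_\alpha x_\alpha = x$, $x_\alpha \xrightarrow{T} x$, 21; $f_n \xrightarrow{\text{a.e.}} f$, $f_n \xrightarrow{\mu} f$, $f_n \xrightarrow{\text{a.u.}} f$, 85; $f_n \xrightarrow{L^p} f$, 131; $\mu_n \xrightarrow{v} \mu$, 191;
$x_\alpha \xrightarrow{w} x$, 257; $x_\alpha \xrightarrow{w^*} x$, 262.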
Functions
idX , ιA : A ,→ X, δi,j , 1A , x+ , x− , 5; Re z, Im z, z, |z|, sgn(z), xb, δx , 6; f + , f − , f1 ∨· · ·∨fn ,
f1 ∧ · · · ∧ fn , supn fn , inf n fn , limn fn , limn fn , Re f , Im f , f and |f |, 6; xα , 172; ∆(x), 392;
f (A), f −1 (B), 5.
Function Spaces
$B(X)$, 16; $C_b(X)$, $C(X)$, 25; $C_c(X)$, 34; $C_0(X)$, 35; $C^k(U)$, $C^\infty(U)$, $C_c^k(U)$, $C_c^\infty(U)$, 36;
$L^p(X, F, \mu)$, 123; $L^\infty(X, F, \mu)$, 126; $L^0(X, F, \mu)$, 245; $BV(I)$, 160; $AC(I)$, 164; $S = S(\mathbb{R}^d)$,
175; $A(D)$, 201; $\ell^p(\mathbb{N})$, 127; $\ell^p(\mathbb{Z})$, $c_{00}$, $c_0$, $c$, 200; $C_K^\infty(U)$, 369; $D(U)$, 370; $L^p_k(U)$, 380; $P(G)$,
402; $\hat{G}$, 411; $WAP(S)$, 424; $AP(S)$, 429; $SAP(S)$, 433.
Measure
σ(A), ϕ(A), B(X), 45; OI , CI , HI , 46; B(R), 46; A1 × · · · × Ad , F1 ⊗ · · · ⊗ Fd , 46; (X, F, µ)
51; δx , µE , 52; (X, Fµ , µ), M(µ∗ ), 54; µE , 52; µ∗ , 56; M(µ∗ ), 56; λ, λd , 63; F/G, 75; T (µ),
hµ, h dµ, 96; µ ⊗ ν, 112,188; µ⊗ν, 189; µ1 ⊗ · · · ⊗ µd , 114; µi1 ⊗ · · · ⊗ µin , 189; ν ⊥ η, 140; µ+ ,
µ− , 140; |µ|, 140,144; µr , µi , 143; M (X, F), 146; ν µ, 148; dµdν
, 149; D(µ; x, r), D(µ; x, r),
D µ, D µ, D µ, 154; VI,P (f ), VI (f ), 159; Tf , 161; Mra (X), 182.
Metric Spaces
d(x, y), (X, d), 10; Br (x), Cr (x), Sr (x), 11; int(E), cl(E), bd(E), 12,19; d(A, B), d(x, A),
d(E), 14.
Integration
R R R R R Rb
f dµ, 91; S(f, P), S(f, P), a f ,
R
f, f dµ, E
f (x) dµ(x), E
f (x)µ(dx), f dF , 89; E
Rb Rb
f , a f , 101; kPk, 102; S(f, P, ξ), 103; X×Y f (x, y) d(µ⊗ν)(x, y), Y X f (x, y) dµ(x) dν(y),
R R R
a RR
112; f (x1 , . . . , xd ) dµ1 (x1 ) . . . dµd (xd ), 114.
Probability
(Ω, F, P ), 443; E(X), 443; V (X), σ(X), cov(X, Y ), φX , FX , fX , PX , 444; N∞P(X1 ,...,Xn ) ,
Q∞ PX1 ⊗N
446; · · · ⊗ P Xn N, 447; PX1 ∗ · · · ∗ PXn , 447; E(X|G), E(X|Y ), 448; n=1 Fn , 450;
∞ ∞
n=1 Ω n , n=1 F n , n=1 P n , 452; P X1 ⊗X2 ⊗··· , 452; Ω, F, (F n ), P , 464; (Xn , Fn ), 464;
Xτ , Fτ , 467; U[a,b]
n
, U[a,b] , 468; P(i1 ...in ) , 472; (S n , Fn ), 473; FI , 473; Ω, F, (Ft )t≥0 , P , 477;
(p) Rb
W = (W (t))t≥0 , 477; VP , 482; FW , 483; ∆W (tj ), 484; Iab (f ), a f (t) dW (t), 484,486; (St ),
(Vt ), 489.
Sets
S
A ∩ B, A, P(X), 1; An ↑ A, An ↓ A, 2; N, Z, Q, R, C, D, T, 2; Z+ , R+ , K, Rd , Cd , Kd , R,
K, 3; A∗ , 3; Y X , 4; A 4 B, 44; limn An , limn An , 44; span A, [a : b], co A, cobal A, 8; ker ϕ,
7; ker T , 9; V/U 10; ex K, 348.
Topological Spaces
N(x), 20; B(x), 20; Fσ , Gδ , 25; (X∞ , T ∞ ), 35; supp(f ), 34; K(f, ε), 35; Cx , 39; βS, 328.
Index
Cauchy, 261
integrable, 357
measurable, 357
sequentially complete, 261
weakly almost periodic
compactification, 426
function, 424
semigroup of operators, 437
vector, 437
weakly continuous operator, 228
Wiener process, see Brownian motion