
Principles of Analysis

Measure, Integration, Functional Analysis,
and Applications

Hugo D. Junghenn
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20180324

International Standard Book Number-13: 978-1-4987-7328-7 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Junghenn, Hugo D. (Hugo Dietrich), 1939- author.


Title: Principles of real analysis : measure, integration, functional
analysis, and applications / Hugo D. Junghenn.
Description: Boca Raton : CRC Press, Taylor & Francis Group, 2018. | Includes
bibliographical references and index.
Identifiers: LCCN 2017061660 | ISBN 9781498773287
Subjects: LCSH: Functions of real variables--Textbooks. | Mathematical
analysis--Textbooks.
Classification: LCC QA331.5 .J86 2018 | DDC 515/.8--dc23
LC record available at https://lccn.loc.gov/2017061660

Visit the e-resources at: https://www.crcpress.com/9781498773287


Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
TO MY CHILDREN

Katie and Patrick

AND TO MY WIFE

Mary

AS ALWAYS
Contents

Preface xix

0 Preliminaries 1
0.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Number Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.2 Algebraic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Semigroups and Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Linear Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Quotient Linear Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
0.3 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Interior, Closure, and Boundary . . . . . . . . . . . . . . . . . . . . . . . 12
Sequential Convergence. Completeness . . . . . . . . . . . . . . . . . . . 12
Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
0.4 Normed Linear Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Norms and Seminorms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Completion of a Normed Space . . . . . . . . . . . . . . . . . . . . . . . 16
Infinite Series in Normed Spaces . . . . . . . . . . . . . . . . . . . . . . 16
Unordered Sums in Normed Spaces . . . . . . . . . . . . . . . . . . . . . 17
Bounded Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 18
Banach Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
0.5 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Neighborhood Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Neighborhood Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Relative Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
0.6 Continuity in Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . 23
Definition and General Properties . . . . . . . . . . . . . . . . . . . . . . 23
Initial Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Product Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Final Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Quotient Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


The Space of Continuous Functions . . . . . . . . . . . . . . . . . . . . . 25


F-sigma and G-delta Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 25
0.7 Normal Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Urysohn’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Tietze Extension Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 27
0.8 Compact Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 27
Convergence in Compact Spaces . . . . . . . . . . . . . . . . . . . . . . . 28
Compactness of Cartesian Products . . . . . . . . . . . . . . . . . . . . . 29
Continuity and Compactness . . . . . . . . . . . . . . . . . . . . . . . . . 29
0.9 Totally Bounded Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . 30
0.10 Equicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
0.11 The Stone-Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . . . . 32
0.12 Locally Compact Topological Spaces . . . . . . . . . . . . . . . . . . . . . 33
General Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Functions with Compact Support . . . . . . . . . . . . . . . . . . . . . . 34
Functions That Vanish at Infinity . . . . . . . . . . . . . . . . . . . . . . 35
The One-Point Compactification . . . . . . . . . . . . . . . . . . . . . . . 35
0.13 Spaces of Differentiable Functions . . . . . . . . . . . . . . . . . . . . . . 36
0.14 Partitions of Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
0.15 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

I Measure and Integration 41


1 Measurable Sets 43
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.2 Measurable Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Fields and Sigma Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Generated Sigma Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Borel Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Extended Borel Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Product Sigma Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Pi-Systems and Lambda-Systems . . . . . . . . . . . . . . . . . . . . . . 47
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.3 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Set Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Properties and Examples of Measures . . . . . . . . . . . . . . . . . . . . 51
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.4 Complete Measure Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Completion Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Null Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.5 Outer Measure and Measurability . . . . . . . . . . . . . . . . . . . . . . 55
Construction of an Outer Measure . . . . . . . . . . . . . . . . . . . . . 56
Carathéodory’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.6 Extension of a Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
The Measure Extension Theorem . . . . . . . . . . . . . . . . . . . . . . 59
Approximation Property of the Extension . . . . . . . . . . . . . . . . . . 61
Completeness of the Extension . . . . . . . . . . . . . . . . . . . . . . . . 61
Uniqueness of the Extension . . . . . . . . . . . . . . . . . . . . . . . . . 62
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

1.7 Lebesgue Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63


The Volume Set Function . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Construction of the Measure . . . . . . . . . . . . . . . . . . . . . . . . . 65
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1.8 Lebesgue-Stieltjes Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
One-Dimensional Distribution Functions . . . . . . . . . . . . . . . . . . 67

Higher Dimensional Distribution Functions . . . . . . . . . . . . . . . . . 69
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
*1.9 Some Special Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
An Uncountable Set with Lebesgue Measure Zero . . . . . . . . . . . . . . 71
Non-Lebesgue-Measurable Sets . . . . . . . . . . . . . . . . . . . . . . . . 71
A Lebesgue Measurable, Non-Borel Set . . . . . . . . . . . . . . . . . . . 72
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

2 Measurable Functions 75
2.1 Measurable Transformations . . . . . . . . . . . . . . . . . . . . . . . . . 75
General Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2 Measurable Numerical Functions . . . . . . . . . . . . . . . . . . . . . . . 78
Criteria for Measurability . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Almost Everywhere Properties . . . . . . . . . . . . . . . . . . . . . . . . 79
Combinatorial and Limit Properties of Measurable Functions . . . . . . . 79
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.3 Simple Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A Fundamental Convergence Theorem . . . . . . . . . . . . . . . . . . . . 82
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.4 Convergence of Measurable Functions . . . . . . . . . . . . . . . . . . . . 85
Modes of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Relationships Among the Modes of Convergence . . . . . . . . . . . . . . 86
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

3 Integration 89
3.1 Construction of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Integral of a Nonnegative Simple Function . . . . . . . . . . . . . . . . . 89
Integral of a Real-Valued Function . . . . . . . . . . . . . . . . . . . . . . 90
Integral of a Complex-Valued Function . . . . . . . . . . . . . . . . . . . 91
Integral over a Measurable Set . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Basic Properties of the Integral . . . . . . . . . . . . . . . . . . . . . . . . 92
Almost Everywhere Properties . . . . . . . . . . . . . . . . . . . . . . . . 92
Monotone Convergence Theorem . . . . . . . . . . . . . . . . . . . . . . 93
Linearity of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Integration Against an Image Measure . . . . . . . . . . . . . . . . . . . 96
Integration Against a Measure with Density . . . . . . . . . . . . . . . . . 96
Change of Variables Theorem . . . . . . . . . . . . . . . . . . . . . . . . 97
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.3 Connections with the Riemann Integral on R^d . . . . . . . . . . . . . . . 100
The Darboux Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
The Riemann Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Measure Zero Criterion for Riemann Integrability . . . . . . . . . . . . . 104

Improper Riemann Integrals . . . . . . . . . . . . . . . . . . . . . . . . . 106


Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.4 Convergence Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
The General Monotone Convergence Theorem . . . . . . . . . . . . . . . 108
Fatou’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
The Dominated Convergence Theorem . . . . . . . . . . . . . . . . . . . 109
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.5 Integration against a Product Measure . . . . . . . . . . . . . . . . . . . 111
Construction of the Product of Two Measures . . . . . . . . . . . . . . . 111
Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
The d-Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.6 Applications of Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . 116
Gaussian Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Integration by Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Spherical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Volume of a d-Dimensional Ball . . . . . . . . . . . . . . . . . . . . . . . 118
Integration of Radial Functions . . . . . . . . . . . . . . . . . . . . . . . 119
Surface Area of a d-Dimensional Ball . . . . . . . . . . . . . . . . . . . . 120
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4 L^p Spaces 123
4.1 Definition and General Properties . . . . . . . . . . . . . . . . . . . . . . 123
The Case 1 ≤ p < ∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
The Case p = ∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
The Case 0 < p < 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
ℓ^p-Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.2 L^p Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Approximation by Simple Functions . . . . . . . . . . . . . . . . . . . . . 129
Approximation by Continuous Functions . . . . . . . . . . . . . . . . . . 130
Approximation by Step Functions . . . . . . . . . . . . . . . . . . . . . . 131
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.3 L^p Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
*4.4 Uniform Integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
*4.5 Convex Functions and Jensen’s Inequality . . . . . . . . . . . . . . . . . . 136
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5 Differentiation 139
5.1 Signed Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Definition and a Fundamental Example . . . . . . . . . . . . . . . . . . . 139
The Hahn-Jordan Decomposition . . . . . . . . . . . . . . . . . . . . . . 140
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.2 Complex Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
The Total Variation Measure . . . . . . . . . . . . . . . . . . . . . . . . 144
The Vitali-Hahn-Saks Theorem . . . . . . . . . . . . . . . . . . . . . . . 145
The Banach Space of Complex Measures . . . . . . . . . . . . . . . . . . 146
Integration against a Signed or Complex Measure . . . . . . . . . . . . . . 147
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

5.3 Absolute Continuity of Measures . . . . . . . . . . . . . . . . . . . . . . . 148


General Properties of Absolute Continuity . . . . . . . . . . . . . . . . . 148
The Radon-Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . 149
Lebesgue-Decomposition of a Measure . . . . . . . . . . . . . . . . . . . . 152
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.4 Differentiation of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Definition and Properties of the Derivative . . . . . . . . . . . . . . . . . 154
Connections with the Classical Derivative . . . . . . . . . . . . . . . . . . 156
Existence of the Measure Derivative . . . . . . . . . . . . . . . . . . . . . 157
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.5 Functions of Bounded Variation . . . . . . . . . . . . . . . . . . . . . . . 159
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 159
The Total Variation Function . . . . . . . . . . . . . . . . . . . . . . . . 161
Differentiation of Functions of Bounded Variation . . . . . . . . . . . . . 162
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.6 Absolutely Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . 164
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 164
Fundamental Theorems of Calculus . . . . . . . . . . . . . . . . . . . . . 165
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

6 Fourier Analysis on R^d 169


6.1 Convolution of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 169
Approximate Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.2 The Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 171
The Fourier Inversion Theorem . . . . . . . . . . . . . . . . . . . . . . . 172
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.3 Rapidly Decreasing Functions . . . . . . . . . . . . . . . . . . . . . . . . 174
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 174
The Plancherel Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.4 Fourier Analysis of Measures on R^d . . . . . . . . . . . . . . . . . . . . . 178
Convolution of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
The Fourier-Stieltjes Transform . . . . . . . . . . . . . . . . . . . . . . . 179
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

7 Measures on Locally Compact Spaces 181


7.1 Radon Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 181
Consequences of Regularity . . . . . . . . . . . . . . . . . . . . . . . . . 182
The Space of Complex Radon Measures . . . . . . . . . . . . . . . . . . . 182
The Support of a Radon Measure . . . . . . . . . . . . . . . . . . . . . . 183
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.2 The Riesz Representation Theorem . . . . . . . . . . . . . . . . . . . . . 184
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.3 Products of Radon Measures . . . . . . . . . . . . . . . . . . . . . . . . . 188
Finitely Many Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Infinitely Many Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

7.4 Vague Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
*7.5 The Daniell-Stone Representation Theorem . . . . . . . . . . . . . . . . . 193

II Functional Analysis 197


8 Banach Spaces 199
8.1 General Properties of Normed Spaces . . . . . . . . . . . . . . . . . . . . 199
Topology and Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Separable Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Equivalent Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Finite Dimensional Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 202

Strictly Convex Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.2 Bounded Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . 206
The Operator Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
The Banach Algebra B(X) . . . . . . . . . . . . . . . . . . . . . . . . . . 208
The Dual Space X′ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Bilinear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.3 Concrete Representations of Dual Spaces . . . . . . . . . . . . . . . . . . 210
The Dual of c_0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
The Dual of c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
The Dual of L^p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
The Dual of C_0(X) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8.4 Some Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Direct Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.5 Hahn-Banach Extension Theorems . . . . . . . . . . . . . . . . . . . . . . 218
Real Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Complex Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Normed Space Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
The Bidual of a Normed Space . . . . . . . . . . . . . . . . . . . . . . . 221

Invariant Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
*8.6 Applications of the Hahn-Banach Theorem . . . . . . . . . . . . . . . . . 222
The Moment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Invariant Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Banach Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Invariant Set Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.7 Baire Category in Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . 225
The Uniform Boundedness Principle . . . . . . . . . . . . . . . . . . . . 225
The Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 226
The Closed Graph Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 228
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
*8.8 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Divergent Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

Vector-Valued Analytic Functions . . . . . . . . . . . . . . . . . . . . . . 231


Summability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Schauder Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.9 The Dual Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Definition and Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Annihilators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Duals of Quotient Spaces and Subspaces . . . . . . . . . . . . . . . . . . . 235
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.10 Compact Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Fredholm Alternative for Compact Operators . . . . . . . . . . . . . . . . 238
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

9 Locally Convex Spaces 241


9.1 General Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Geometry and Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Seminormed Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Fréchet Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
9.2 Continuous Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . 246
Continuity on Topological Vector Spaces . . . . . . . . . . . . . . . . . . 246
Continuity on Locally Convex Spaces . . . . . . . . . . . . . . . . . . . . 248
Continuity on Finite Dimensional Spaces . . . . . . . . . . . . . . . . . . 248
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3 Hahn-Banach Separation Theorems . . . . . . . . . . . . . . . . . . . . . 249
Weak Separation in a TVS . . . . . . . . . . . . . . . . . . . . . . . . . 249
Strict Separation in a LCS . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Some Consequences of the Separation Theorems . . . . . . . . . . . . . . 250
The Bipolar Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
*9.4 Some Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Strict Inductive Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

10 Weak Topologies on Normed Spaces 257


10.1 The Weak Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Definition and General Properties . . . . . . . . . . . . . . . . . . . . . . 257
Weak Sequential Convergence . . . . . . . . . . . . . . . . . . . . . . . . 258
Convexity and Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

Application: Weak Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
10.2 The Weak∗ Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Definition and General Properties . . . . . . . . . . . . . . . . . . . . . . 262
The Dual of X′_{w∗} . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
The Banach-Alaoglu Theorem . . . . . . . . . . . . . . . . . . . . . . . . 263

Application: Means on Function Spaces . . . . . . . . . . . . . . . . . . . 263
Weak∗ Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

The Closed Range Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 265
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

10.3 Reflexive Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267


Examples and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 267
Weak Compactness and Reflexivity . . . . . . . . . . . . . . . . . . . . . 268
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
*10.4 Uniformly Convex Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Definition and General Properties . . . . . . . . . . . . . . . . . . . . . . 269
Connections with Strict Convexity . . . . . . . . . . . . . . . . . . . . . . 270
Weak and Strong Convergence . . . . . . . . . . . . . . . . . . . . . . . . 270
Connection with Reflexivity . . . . . . . . . . . . . . . . . . . . . . . . . 271
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

11 Hilbert Spaces 273


11.1 General Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Sesquilinear Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Semi-Inner-Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Inner Product Spaces. Hilbert Spaces . . . . . . . . . . . . . . . . . . . . 275
Isomorphisms of Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . 277
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
11.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Orthogonal Complements . . . . . . . . . . . . . . . . . . . . . . . . . . 278
The Riesz Representation Theorem . . . . . . . . . . . . . . . . . . . . . 280
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
11.3 Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
The Dimension of a Hilbert Space . . . . . . . . . . . . . . . . . . . . . . 283
The Gram-Schmidt Process . . . . . . . . . . . . . . . . . . . . . . . . . 283
Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.4 The Hilbert Space Adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . 286
Bounded Sesquilinear Functionals . . . . . . . . . . . . . . . . . . . . . . 286
The Lax-Milgram Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Definition and Properties of the Adjoint . . . . . . . . . . . . . . . . . . . 287
B(H) as a C*-algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

12 Operator Theory 289


12.1 Classes of Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Normal Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Self-Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Positive Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Orthogonal Projections and Idempotents . . . . . . . . . . . . . . . . . . 292
Unitary Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

Partial Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.2 Compact Operators and Operators of Finite Rank . . . . . . . . . . . . . 296
Rank One Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
An Approximation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 297
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
12.3 The Spectral Theorem for Compact Normal Operators . . . . . . . . . . 299
Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . 299
Diagonalizable Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
*12.4 Hilbert-Schmidt Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 303
The Hilbert-Schmidt Norm . . . . . . . . . . . . . . . . . . . . . . . . . . 303
The Hilbert-Schmidt Inner Product . . . . . . . . . . . . . . . . . . . . . 304
The Hilbert-Schmidt Operator A ⊗ B . . . . . . . . . . . . . . . . . . . . 306
Hilbert-Schmidt Integral Operators . . . . . . . . . . . . . . . . . . . . . . 307
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
*12.5 Trace Class Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
The Trace Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
The Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
The Dual Spaces $B_0(H)'$ and $B_1(H)'$ . . . . . . . . . . . . . . . . . . . 313
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

13 Banach Algebras 315


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 315
The Group of Invertible Elements . . . . . . . . . . . . . . . . . . . . . . 316
The Cauchy Product of Series . . . . . . . . . . . . . . . . . . . . . . . . 317
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
13.2 Spectral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
The Spectrum of an Element . . . . . . . . . . . . . . . . . . . . . . . . . 319
The Spectral Radius Formula . . . . . . . . . . . . . . . . . . . . . . . . 320
Normal Elements in a C*-Algebra . . . . . . . . . . . . . . . . . . . . . . 321
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.3 The Spectrum of an Algebra . . . . . . . . . . . . . . . . . . . . . . . . . 324
Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Maximal Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.4 Gelfand Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
The Representation Theorem . . . . . . . . . . . . . . . . . . . . . . . . 326
Application: The Stone-Čech Compactification . . . . . . . . . . . . . . . 327
Application: Wiener’s Theorem . . . . . . . . . . . . . . . . . . . . . . . 328
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
*13.5 The Non-unital Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
The Unitization of a Banach Algebra . . . . . . . . . . . . . . . . . . . . 329
The Non-unital Representation Theorem . . . . . . . . . . . . . . . . . . 330
The Spectrum of $C_0(X)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
The Spectrum of $L^1(\mathbb{R}^d)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
13.6 Operator Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
The Continuous Functional Calculus . . . . . . . . . . . . . . . . . . . . 333
Applications to Operators on Hilbert Space . . . . . . . . . . . . . . . . . 334
The Borel Functional Calculus . . . . . . . . . . . . . . . . . . . . . . . . 336
The Spectral Theorem for Normal Operators . . . . . . . . . . . . . . . . 338
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

14 Miscellaneous Topics 341


14.1 Weak Sequential Compactness . . . . . . . . . . . . . . . . . . . . . . . . 341
The Eberlein-Šmulian Theorem . . . . . . . . . . . . . . . . . . . . . . . 342
14.2 Weak Compactness in $L^1$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Weak Convergence and Uniform Integrability . . . . . . . . . . . . . . . . 344
The Dunford-Pettis Theorem . . . . . . . . . . . . . . . . . . . . . . . . 345


14.3 Convexity and Compactness . . . . . . . . . . . . . . . . . . . . . . . . . 346
The Krein-Šmulian Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 346
Mazur’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
The Finite Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . 347
14.4 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 348
The Krein-Milman Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 350
14.5 Applications of the Krein-Milman Theorem . . . . . . . . . . . . . . . . . 352
Existence of Ergodic Measures . . . . . . . . . . . . . . . . . . . . . . . . 352
The Stone-Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . . . . 353
The Banach-Stone Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 354
The Lyapunov Convexity Theorem . . . . . . . . . . . . . . . . . . . . . . 355
The Ryll-Nardzewski Fixed Point Theorem . . . . . . . . . . . . . . . . . 356
14.6 Vector-Valued Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Weak Integrals in Banach Spaces . . . . . . . . . . . . . . . . . . . . . . 358
Weak Integrals in Locally Convex Spaces . . . . . . . . . . . . . . . . . . 360
The Bochner Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
14.7 Choquet’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

III Applications 367


15 Distributions 369
15.1 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
The Fréchet Space $C_K^\infty(U)$ . . . . . . . . . . . . . . . . . . . . . . . . . . 369
The Spaces $\mathcal{D}(U)$ and $\mathcal{D}'(U)$ . . . . . . . . . . . . . . . . . . . . . . . . 370
Examples of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 370
15.2 Operations on Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Derivative of a Locally Integrable Function . . . . . . . . . . . . . . . . . 371
Derivative of a Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 371
Multiplication by a Smooth Function . . . . . . . . . . . . . . . . . . . . 372
Composition with Linear Maps . . . . . . . . . . . . . . . . . . . . . . . 372
15.3 Distributions with Compact Support . . . . . . . . . . . . . . . . . . . . . 372
15.4 Convolution of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 374
15.5 Tempered Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
The Fourier Transform of a Tempered Distribution . . . . . . . . . . . . . 379
15.6 Sobolev Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Application: Elliptic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Sobolev Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

16 Analysis on Locally Compact Groups 385


16.1 Topological Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Definitions and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 385
Translation and Uniform Continuity . . . . . . . . . . . . . . . . . . . . 386
16.2 Haar Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 387
Existence of Haar Measure . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Essential Uniqueness of Haar Measure . . . . . . . . . . . . . . . . . . . 391
The Modular Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
16.3 Some Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Haar Measure on Direct Products . . . . . . . . . . . . . . . . . . . . . . 394


Haar Measure on Semidirect Products . . . . . . . . . . . . . . . . . . . . 394
Haar Measure on Quotient Groups . . . . . . . . . . . . . . . . . . . . . 396
16.4 The $L^1$-Group Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Convolution and Involution . . . . . . . . . . . . . . . . . . . . . . . . . 397
Approximate Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
The Measure Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
16.5 Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Positive-Definite Functions . . . . . . . . . . . . . . . . . . . . . . . . . 401
Functions of Positive Type . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Unitary Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Irreducible Representations . . . . . . . . . . . . . . . . . . . . . . . . . 406
Unitary Representations of Compact Groups . . . . . . . . . . . . . . . . 408
16.6 Locally Compact Abelian Groups . . . . . . . . . . . . . . . . . . . . . . 411
The Dual Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
Bochner’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
The Inversion Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
The Plancherel Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
The Pontrjagin Duality Theorem . . . . . . . . . . . . . . . . . . . . . . 420

17 Analysis on Semigroups 423


17.1 Semigroups with Topology . . . . . . . . . . . . . . . . . . . . . . . . . . 423
17.2 Weakly Almost Periodic Functions . . . . . . . . . . . . . . . . . . . . . . 424
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 424
The Dual of the Space of Weakly Almost Periodic Functions . . . . . . . . 424
The Weakly Almost Periodic Compactification . . . . . . . . . . . . . . . 426
Invariant Means on Weakly Almost Periodic Functions . . . . . . . . . . 428
17.3 Almost Periodic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 429
The Almost Periodic Compactification . . . . . . . . . . . . . . . . . . . 430
17.4 The Structure of Compact Semigroups . . . . . . . . . . . . . . . . . . . . 431
Ellis’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Existence of Idempotents . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Ideal Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
17.5 Strongly Almost Periodic Functions . . . . . . . . . . . . . . . . . . . . . 433
Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 433
The Strongly Almost Periodic Compactification . . . . . . . . . . . . . . . 435
17.6 Semigroups of Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Definitions and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . 437
Dynamical Properties of Semigroups of Operators . . . . . . . . . . . . . 438
Ergodic Properties of Semigroups of Operators . . . . . . . . . . . . . . . 441

18 Probability Theory 443


18.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
18.2 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
Independent Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . 446
18.3 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
18.4 Sequences of Independent Random Variables . . . . . . . . . . . . . . . . 449
Infinite Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 450


The Distribution of a Sequence of Random Variables . . . . . . . . . . . . 452
Zero-One Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
Laws of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 458
The Individual Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . 459
Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
18.5 Discrete-Time Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . 463
Filtrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Definition and General Properties of Martingales . . . . . . . . . . . . . . 464
Stopping Times. Optional Sampling . . . . . . . . . . . . . . . . . . . . . 466
Upcrossings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
Convergence of Martingales . . . . . . . . . . . . . . . . . . . . . . . . . 469
Reversed Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
18.6 General Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . 472
The Consistency Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 472
The Product of Measurable Spaces . . . . . . . . . . . . . . . . . . . . . . 473
The Kolmogorov Extension Theorem . . . . . . . . . . . . . . . . . . . . 474
18.7 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Construction of Brownian Motion . . . . . . . . . . . . . . . . . . . . . . 477
Non-Differentiability of Brownian Paths . . . . . . . . . . . . . . . . . . 481
Variation of Brownian Paths . . . . . . . . . . . . . . . . . . . . . . . . 482
Brownian Motion as a Martingale . . . . . . . . . . . . . . . . . . . . . . 483
18.8 Stochastic Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
The Ito Integral of a Step Process . . . . . . . . . . . . . . . . . . . . . . 484
The General Ito Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
The Ito Integral as a Martingale . . . . . . . . . . . . . . . . . . . . . . . 487
18.9 An Application to Finance . . . . . . . . . . . . . . . . . . . . . . . . . . 488
The Stock Price Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
Self-Financing Portfolios . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
Call Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
The Black-Scholes Option Price . . . . . . . . . . . . . . . . . . . . . . . 490

IV Appendices 493
A Change of Variables Theorem 495

B Separate and Joint Continuity 501

References 505

List of Symbols 509

Index 511
Preface

The purpose of this book is to provide a rigorous and detailed treatment of the essentials
of measure, integration, and functional analysis at the graduate level. It is assumed that the
reader has an undergraduate background in what is now traditionally called real analysis,
including elementary set theory and a rigorous treatment of limits, continuity, differentiation,
Riemann integration, and uniform convergence. An acquaintance with complex function
theory, in particular the complex exponential function $e^z$ and Cauchy’s integral formula,
is needed for a few applications. A knowledge of basic linear algebra, at least the notions
of subspace, basis, and linear transformation, is also assumed. Metric spaces and general
topology are developed in detail in Chapter 0. The former topic will be needed for the
treatment of $L^p$ spaces and the latter for the chapters on Radon measures and weak
topologies.
The book has four parts. Part I consists of Chapters 1–7 and develops the general theory of
Lebesgue integration. A course in the subject could consist of Chapters 1–5 with Chapters 6
or 7 optional.
Part II is organized as a course in functional analysis. Chapters 8–12 could form the core
of such a course, with Chapter 13 optional. Some of the applications and examples in Part II
rely on the measure and integration developed in Part I. The reader with a background in
this subject could safely omit Part I. Chapter 14 consists of deeper theorems in functional
analysis as well as applications. Some of the applications in the remainder of the book rely
on results of this chapter.
Part III consists of a variety of topics and applications that depend on, and indeed are
meant to illustrate the power of, topics developed in the first two parts. The chapters here
are largely independent, with the exception of Chapter 17, which depends on some results
in Chapter 16. The goal of these chapters is to provide a relatively quick overview of the
essentials of the subjects treated therein. The approach to these is sufficiently detailed so
that the reader can follow the development with relative ease. It is hoped that the treatment
here will inspire the reader to consult some of the many fine texts that specialize in these
subjects, some of which are listed in the bibliography.
Part IV consists of two appendices with proofs of the change of variables theorem and a
theorem on separate and joint continuity. The reader may safely omit the proofs without
disturbing the flow of the text.
The book contains nearly 700 exercises. Hints and/or a framework of intermediate steps
are given for the more difficult exercises. Many of these are extensions of material in the
text or are of special independent interest. Exercises related in a critical way to material
elsewhere in the text are marked with either an upward arrow, referring to earlier results,
or a downward arrow, referring to later material. Instructors with suitable bona fides may
obtain complete solutions to the exercises from the publisher.
A word about numbering: Proclamations (theorems, lemmas, examples, etc.) are numbered
consecutively in each section. Thus 1.2.3 refers to the third proclamation in Section 2 of
Chapter 1. Important equations are numbered consecutively in each chapter. Thus (4.5)
refers to the fifth such equation of Chapter 4. Equations within a proof that are only locally
relevant are referenced by symbols such as (†), (α), etc. Exercises are numbered consecutively
within each chapter. Thus Ex. 6.7 refers to the seventh exercise of Chapter 6.
The book is an outgrowth of courses in analysis taught at The George Washington
University. Specific notes for the book have been tested in classes over the last three years
and have benefitted greatly from comments, questions, and corrections from students; for
these I am grateful. It goes without saying that the book has also benefitted from several
excellent texts in analysis that have served as valuable resources—several of these are listed
in the bibliography. Finally, I wish to express my gratitude to my teacher C.T. Taam who
first exposed me to much of the mathematics that appears in this book.

Hugo D. Junghenn
Washington, D.C.
Chapter 0
Preliminaries

In this chapter we assemble the basic material needed for the topics treated in the book.
The reader may wish to simply skim the chapter at first, returning to specific topics as the
need arises.

0.1 Sets
The terms set, collection, and family are synonymous, although in some contexts one
term may be preferred over another, as in a collection of sets or a family of functions. Sets
are usually denoted by capital letters in various styles, and members of sets by small letters.
As usual, the notation x ∈ A denotes membership of x in A.
A concrete set may be described either by (perhaps only partially) listing its members or
by set-builder notation. The latter is of the form {x : P (x)}, which is read “the set of all x
such that P (x),” where P (x) is a well-defined property that x must possess to belong to the
set. For example, the set of all odd integers may be described as

{±1, ±3, ±5, . . .} = {n : n = 2m − 1 for some integer m}.

Set Operations
If A is a subset of B, we write A ⊆ B. If all sets in a particular discussion are subsets
of a set X, then X is called a universal set. The power set of a set X is the collection
P(X) of all subsets of X. If A, B ⊆ X, then A ∪ B, A ∩ B, and A \ B denote the union,
intersection and relative difference of A and B, respectively, and $A^c$ denotes the complement
of A in X. If $\mathcal{A} \subseteq \mathcal{P}(X)$ and $B \in \mathcal{P}(X)$, we define the trace of $\mathcal{A}$ on $B$ by

\[ \mathcal{A} \cap B = \{A \cap B : A \in \mathcal{A}\}. \]

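The union and intersection of an indexed family $\mathcal{A} = \{A_i : i \in I\}$ of sets are denoted,
respectively, by
\[ \bigcup \mathcal{A} = \bigcup_{i \in I} A_i \quad \text{and} \quad \bigcap \mathcal{A} = \bigcap_{i \in I} A_i. \]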

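If the index set in these operations is {1, 2, . . . , n} or {1, 2, . . .}, we write instead
\[ \bigcup_{j=1}^{n} A_j = A_1 \cup \cdots \cup A_n, \quad \bigcap_{j=1}^{n} A_j = A_1 \cap \cdots \cap A_n, \quad \bigcup_{j=1}^{\infty} A_j = A_1 \cup A_2 \cup \cdots, \quad \bigcap_{j=1}^{\infty} A_j = A_1 \cap A_2 \cap \cdots. \]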


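0.1.1 Proposition. Union and intersection have the following properties:

(a) $\bigl(\bigcup_{i \in I} A_i\bigr)^c = \bigcap_{i \in I} A_i^c$. (b) $\bigl(\bigcap_{i \in I} A_i\bigr)^c = \bigcup_{i \in I} A_i^c$.
(c) $A \cup \bigcap_{i \in I} A_i = \bigcap_{i \in I} (A \cup A_i)$. (d) $A \cap \bigcup_{i \in I} A_i = \bigcup_{i \in I} (A \cap A_i)$.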

Parts (a) and (b) of the proposition are known as DeMorgan’s laws, and parts (c) and (d)
are called distributive laws.
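
The identities in Proposition 0.1.1 can be checked mechanically on a small finite example. The following Python sketch is an illustration only, not a proof; the universal set `X`, the indexed family `family`, and the set `A` are arbitrary choices made here and do not come from the text.

```python
from functools import reduce

# Hypothetical universal set and indexed family {A_i : i in I}.
X = set(range(10))
family = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {2, 4, 6, 8}}
A = {1, 2, 9}

def union(sets):
    """Union of a collection of sets."""
    return reduce(set.union, sets, set())

def intersection(sets):
    """Intersection of a collection of subsets of X."""
    return reduce(set.intersection, sets, set(X))

def complement(S):
    """Complement of S relative to the universal set X."""
    return X - S

members = list(family.values())

# (a), (b): De Morgan's laws.
de_morgan_a = complement(union(members)) == intersection(complement(S) for S in members)
de_morgan_b = complement(intersection(members)) == union(complement(S) for S in members)

# (c), (d): distributive laws.
distributive_c = (A | intersection(members)) == intersection(A | S for S in members)
distributive_d = (A & union(members)) == union(A & S for S in members)

print(de_morgan_a, de_morgan_b, distributive_c, distributive_d)
```

Starting the intersection from a copy of `X` mirrors the convention that an intersection of subsets is taken inside the universal set.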
A family $\{A_i : i \in I\}$ of sets is disjoint if $A_i \cap A_j = \emptyset$ whenever $i \ne j$. In this case, the
union $\bigcup_{i \in I} A_i$ is said to be disjoint. A partition of a set X is a collection of nonempty,
disjoint sets whose union is X.
A sequence of sets An is said to be increasing if A1 ⊆ A2 ⊆ · · · , in which case we write
An ↑. Similarly, the sequence is decreasing if A1 ⊇ A2 ⊇ · · · , written An ↓. In the first
case we also write An ↑ A, where A = A1 ∪ A2 ∪ · · · , and in the second An ↓ A, where
A = A1 ∩ A2 ∩ · · · .
Cartesian products of finite or infinite sequences of sets $A_1, A_2, \ldots$ are denoted, respectively,
by
\[ \prod_{n=1}^{d} A_n = A_1 \times \cdots \times A_d \quad \text{and} \quad \prod_{n=1}^{\infty} A_n = A_1 \times A_2 \times \cdots. \]

In the case $A_1 = A_2 = \cdots = A$, we write instead $A^d$ or $A^\infty$.


0.1.2 Proposition. Cartesian products have the following properties:

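(a) $A \times (A_1 \cup A_2 \cup \cdots) = (A \times A_1) \cup (A \times A_2) \cup \cdots$.

(b) $A \times (A_1 \cap A_2 \cap \cdots) = (A \times A_1) \cap (A \times A_2) \cap \cdots$.

(c) $(A_1 \cap A_2 \cap \cdots) \times (B_1 \cap B_2 \cap \cdots) = (A_1 \times B_1) \cap (A_2 \times B_2) \cap \cdots$.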
The Cartesian product $X := \prod_{i \in I} X_i$ of a family of nonempty sets $X_i$ is defined as the
collection of all functions $f : I \to \bigcup_{i \in I} X_i$ such that $f(i) \in X_i$ for each $i$. (The axiom of
choice asserts that the Cartesian product is nonempty.) The value f (i) is called the ith
coordinate of f . Note that if we identify a finite index set I with the set {1, . . . , n}, then
a function f in X is completely described by the n-tuple (f (1), . . . , f (n)). Thus the general
definition of Cartesian product reduces in this case to the “concrete” definition given above.
A similar remark applies if I is countably infinite.
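
For a finite index set, the identification of functions with tuples described above can be made concrete. In the following Python sketch each member of the product is stored as a dictionary, that is, as a function on the index set; the index set `I` and the family `X` are hypothetical examples chosen for illustration.

```python
from itertools import product

# Hypothetical index set I and family of nonempty sets X_i.
I = ["a", "b", "c"]
X = {"a": {0, 1}, "b": {"u", "v", "w"}, "c": {True}}

def cartesian_product(index_set, family):
    """All functions f on index_set with f(i) in family[i], each stored as a dict."""
    choices = [sorted(family[i], key=repr) for i in index_set]
    return [dict(zip(index_set, values)) for values in product(*choices)]

functions = cartesian_product(I, X)

# Listing the indices in a fixed order turns each function f into the
# tuple (f(a), f(b), f(c)), which is the "concrete" description of the product.
tuples = {tuple(f[i] for i in I) for f in functions}

print(len(functions))
```

The number of functions is the product of the sizes of the sets, here 2 · 3 · 1.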

Number Systems
The following notation is used for the standard number systems:
N := the set of positive integers.
Z := the set of integers.
Q := the set of rational numbers.
R := the set of real numbers.
C := the set of complex numbers.
Two subsets of C are of particular importance:

D := {z ∈ C : |z| < 1} the open unit disk and T := {z ∈ C : |z| = 1} the circle group.
The symbol K serves double-duty:


K := R or C.
This will be convenient, for example, when considering vector spaces where the scalar field
may be either R or C. If A ⊆ R we write $A^+$ for the set A ∩ [0, +∞). For example, the set
of nonnegative integers is
\[ \mathbb{Z}^+ = \{n \in \mathbb{Z} : n \ge 0\}. \]
If A ⊆ C we denote by $A^*$ the set of nonzero members of A.
Real and complex d-dimensional Euclidean space are defined, respectively, by

Rd := R × · · · × R and Cd := C × · · · × C (d factors).

We also use the notation


Kd := K × · · · × K
where appropriate. A d-dimensional interval in Rd is indicated with boldface, as in

\[ (\mathbf{a}, \mathbf{b}] := (a_1, b_1] \times \cdots \times (a_d, b_d], \quad \text{where } \mathbf{a} := (a_1, \ldots, a_d) \text{ and } \mathbf{b} := (b_1, \ldots, b_d). \]

The Euclidean norm on $\mathbb{K}^d$ is denoted by

\[ |z| := \sqrt{|z_1|^2 + \cdots + |z_d|^2}. \]

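The extended real number system is the set

\[ \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\} = [-\infty, +\infty] \]

with the following order structure and operations:

\[ -\infty < x < +\infty \quad \text{for all } x \in \mathbb{R}, \]

\begin{align*}
x + \infty &= \infty \ \text{ if } -\infty < x \le +\infty, & x - \infty &= -\infty \ \text{ if } -\infty \le x < +\infty, \\
x \cdot (+\infty) &= +\infty \ \text{ if } 0 < x \le +\infty, & x \cdot (+\infty) &= -\infty \ \text{ if } -\infty \le x < 0, \\
x \cdot (-\infty) &= -\infty \ \text{ if } 0 < x \le +\infty, & x \cdot (-\infty) &= +\infty \ \text{ if } -\infty \le x < 0, \\
\frac{x}{+\infty} &= \frac{x}{-\infty} = 0 \ \text{ if } -\infty < x < +\infty, \\
0 \cdot (\pm\infty) &= 0.
\end{align*}

Hereafter, we shall use the customary abbreviation ∞ for +∞. The members of $[0, \infty] = \overline{\mathbb{R}}^+$
are called extended nonnegative real numbers. The symbol $\mathbb{K}$ is modified as follows to
apply to the extended setting:
\[ \overline{\mathbb{K}} := \overline{\mathbb{R}} \text{ or } \mathbb{C}. \]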
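
The convention 0 · (±∞) = 0 differs from IEEE floating-point arithmetic, where the same product is NaN. The following Python sketch shows one way to model the extended arithmetic with floats; the helper names `ext_mul` and `ext_add` are hypothetical and are introduced here only for illustration.

```python
import math

INF = math.inf

def ext_mul(x, y):
    """Product in the extended real line, with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y

def ext_add(x, y):
    """Sum in the extended real line; inf + (-inf) is deliberately left undefined."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("the sum of +inf and -inf is undefined")
    return x + y

print(ext_mul(0, INF), ext_mul(-2, INF), ext_add(3, -INF))
```

Only the zero case needs special handling: for nonzero factors, float multiplication already follows the sign rules listed above.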

Relations
A relation on a nonempty set X is a nonempty set ∼ of ordered pairs from X. It is
customary to write x ∼ y rather than the prolix (x, y) ∈ ∼. A relation is said to be
(a) reflexive if x ∼ x for every x ∈ X;
(b) symmetric if x ∼ y ⇒ y ∼ x;
(c) transitive if x ∼ y and y ∼ z ⇒ x ∼ z;
(d) antisymmetric if x ∼ y and y ∼ x ⇒ x = y.
A relation ∼ on X that is reflexive, symmetric, and transitive is called an equivalence
relation. The equivalence class of x ∈ X is the set

[x] := {y ∈ X : x ∼ y}.

The collection X/∼ of distinct equivalence classes is a partition of X. Conversely, given a
partition of X, the relation defined by x ∼ y iff x and y are in the same partition member is
an equivalence relation on X whose equivalence classes are precisely the members of the
partition. Thus equivalence relations and partitions are two versions of the same concept. An
example of an equivalence relation on R is given by the definition x ∼ y iff x − y ∈ Q. Here,
the equivalence classes are the sets x + Q, x ∈ R (the class of every rational number being Q itself), and these classes
partition R. Additional examples of equivalence relations appear throughout the text.
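
The correspondence between equivalence relations and partitions can be illustrated on a finite set. The following Python sketch uses the integer analogue of the example above, namely x ∼ y iff x − y is a multiple of 3; the function name `equivalence_classes` and the sample set are assumptions made for this illustration.

```python
def equivalence_classes(X, related):
    """Partition X into the classes of the equivalence relation `related`."""
    classes = []
    for x in X:
        for cls in classes:
            # Comparing with a single representative suffices because
            # `related` is assumed reflexive, symmetric, and transitive.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

X = range(-6, 7)
classes = equivalence_classes(X, lambda x, y: (x - y) % 3 == 0)

for cls in classes:
    print(sorted(cls))
```

The resulting classes are nonempty, pairwise disjoint, and cover the set, and two elements are related exactly when they lie in the same class.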
A relation that is reflexive, antisymmetric, and transitive is called a partial order. For
partial orders we use the more suggestive notation $x \preceq y$ (equivalently, $y \succeq x$) rather than
$x \sim y$. A set with a partial order is called a partially ordered set. An upper (lower)
bound of a subset $Y$ of a partially ordered set $X$ is a member $x \in X$ such that $y \preceq x$
($x \preceq y$) for all $y \in Y$. The supremum (infimum) of $Y$ is an upper (lower) bound $x_0$ of $Y$
such that $x_0 \preceq x$ ($x_0 \succeq x$) for all upper (lower) bounds $x$ of $Y$. For example, if the power set
$\mathcal{X} = \mathcal{P}(X)$ is ordered upward by inclusion, that is, $A \preceq B$ iff $A \subseteq B$, and if $\mathcal{Y} \subseteq \mathcal{X}$, then $\bigcap \mathcal{Y}$
and $\bigcup \mathcal{Y}$ are, respectively, the infimum and supremum of $\mathcal{Y}$. The terms least upper bound
and greatest lower bound are synonymous with supremum and infimum, respectively. A
member $x$ of $X$ is maximal if $y \in X$ and $x \preceq y \Rightarrow y = x$, and is minimal if $y \in X$ and
$x \succeq y \Rightarrow y = x$.
A nonempty subset $Y$ of a partially ordered set $X$ is said to be totally ordered or a
chain if for all $y_1, y_2 \in Y$ either $y_1 \preceq y_2$ or $y_1 \succeq y_2$. We shall make frequent and significant
use of the following important result. (For a proof based on the axiom of choice see, for
example, [23] or [30].)
0.1.3 Zorn’s Lemma. Let X be a partially ordered set such that every chain has an upper
bound in X. Then X has a maximal element.
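
The order-theoretic notions defined above (upper bound, supremum, maximal element) can be explored on finite partially ordered sets, where no appeal to Zorn's lemma is needed. The following Python sketch uses two hypothetical examples: the power set of {1, 2, 3} ordered by inclusion, and a set `D` of integers ordered by divisibility. All function names are illustrative.

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def upper_bounds(P, Y, leq):
    """Members x of P with y <= x for every y in Y."""
    return [x for x in P if all(leq(y, x) for y in Y)]

def supremum(P, Y, leq):
    """Least upper bound of Y in P, or None if it does not exist."""
    ubs = upper_bounds(P, Y, leq)
    least = [u for u in ubs if all(leq(u, v) for v in ubs)]
    return least[0] if least else None

def maximal_elements(P, leq):
    """Members x of P such that x <= y implies y == x."""
    return [x for x in P if all(y == x for y in P if leq(x, y))]

# Power set ordered by inclusion: the supremum of a family is its union.
P = powerset({1, 2, 3})
subset = lambda a, b: a <= b
sup_Y = supremum(P, [frozenset({1}), frozenset({2})], subset)

# Divisibility order: several maximal elements, but no largest element.
D = [2, 3, 4, 6, 8, 9]
divides = lambda a, b: b % a == 0
max_D = maximal_elements(D, divides)

print(set(sup_Y), max_D)
```

The second example shows that a maximal element need not be an upper bound of the whole set: 6, 8, and 9 are all maximal in `D`, yet none of them is divisible by every member of `D`.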

Functions
The terms mapping, transformation, and function are synonymous. A function f
with domain dom f = X and range ran f ⊆ Y is symbolized by f : X → Y . We also
occasionally write $x \mapsto f(x)$, as in $x \mapsto \sqrt{x}$, to describe a function. The collection of all
functions from X to Y is denoted by $Y^X$.
The image of A ⊆ X and the preimage of B ⊆ Y under a function f : X → Y are
defined, respectively, by

\[ f(A) = \{f(x) : x \in A\} \quad \text{and} \quad f^{-1}(B) = \{x : f(x) \in B\}. \]

A function f : X → Y is surjective or onto Y if f (X) = Y , and injective, or one-to-one
(1-1), if $x_1 \ne x_2 \Rightarrow f(x_1) \ne f(x_2)$. A surjection (injection) is a map that is surjective
(injective). A mapping that is both surjective and injective is said to be bijective and is
called a bijection or a one-to-one correspondence.
An important example of a surjective function is the quotient map Q : X → X/ ∼
associated with an equivalence relation ∼ on a nonempty set X. Here Q(x) := [x], the
equivalence class containing x. The preimage of a subset B of X/ ∼ under Q is the union of
all equivalence classes [x] in B.
The basic properties of images and preimages are summarized in the following proposition.
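0.1.4 Proposition. Let $\{X_i : i \in I\}$ be a collection of subsets of $X$ and $\{Y_j : j \in J\}$ a
collection of subsets of $Y$. Then
(a) $f^{-1}\bigl(\bigcup_{j \in J} Y_j\bigr) = \bigcup_{j \in J} f^{-1}(Y_j)$. (b) $f^{-1}\bigl(\bigcap_{j \in J} Y_j\bigr) = \bigcap_{j \in J} f^{-1}(Y_j)$.
(c) $f\bigl(\bigcup_{i \in I} X_i\bigr) = \bigcup_{i \in I} f(X_i)$. (d) $f\bigl(\bigcap_{i \in I} X_i\bigr) \subseteq \bigcap_{i \in I} f(X_i)$.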
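(e) $f^{-1}(Y_j^c) = \bigl(f^{-1}(Y_j)\bigr)^c$. (f) $\bigl(f(X_i)\bigr)^c \subseteq f(X_i^c)$ if $f$ is surjective.
(g) $f\bigl(f^{-1}(Y_j)\bigr) \subseteq Y_j$. (h) $X_i \subseteq f^{-1}\bigl(f(X_i)\bigr)$.

Equality holds in (d) and (h) if f is injective, and in (g) if f is surjective. Equality holds in (f) if f is bijective.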
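
The behavior of images and preimages under unions, intersections, and complements can be observed on a small example. The following Python sketch uses the squaring map on a finite set of integers, which is neither injective nor surjective, so the inclusions in (d), (g), and (h) are strict. The sets and helper names are hypothetical choices for illustration.

```python
def image(f, A):
    """The image f(A)."""
    return {f(x) for x in A}

def preimage(f, B, domain):
    """The preimage of B under f, computed within the given domain."""
    return {x for x in domain if f(x) in B}

X = set(range(-3, 4))
Y = set(range(0, 10))
f = lambda x: x * x       # maps X into Y; not injective on X, not onto Y

X1, X2 = {-2, -1, 0}, {0, 1, 2}
Y1, Y2 = {0, 1, 2}, {1, 4, 5}

# (a), (b): preimages commute with unions and intersections.
pre_union_ok = preimage(f, Y1 | Y2, X) == preimage(f, Y1, X) | preimage(f, Y2, X)
pre_inter_ok = preimage(f, Y1 & Y2, X) == preimage(f, Y1, X) & preimage(f, Y2, X)
# (c): images commute with unions.
img_union_ok = image(f, X1 | X2) == image(f, X1) | image(f, X2)
# (d): for intersections only an inclusion holds.
img_inter_lhs = image(f, X1 & X2)
img_inter_rhs = image(f, X1) & image(f, X2)
# (e): preimages commute with complements.
pre_compl_ok = preimage(f, Y - Y1, X) == X - preimage(f, Y1, X)
# (g), (h): inclusions only.
g_lhs = image(f, preimage(f, Y2, X))
h_rhs = preimage(f, image(f, X2), X)

print(img_inter_lhs, img_inter_rhs, g_lhs, h_rhs)
```

Preimages respect all the set operations, while images are well behaved only with respect to unions; this asymmetry is the reason preimages are the natural tool in the definitions of measurability and continuity.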
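For $f : X \to Y$, $\mathcal{A} \subseteq \mathcal{P}(X)$ and $\mathcal{B} \subseteq \mathcal{P}(Y)$, we define the collections

\[ f(\mathcal{A}) = \{f(A) : A \in \mathcal{A}\} \subseteq \mathcal{P}(Y) \quad \text{and} \quad f^{-1}(\mathcal{B}) = \{f^{-1}(B) : B \in \mathcal{B}\} \subseteq \mathcal{P}(X). \]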

If f : X → Y and g : Y → Z are functions with f (X) ⊆ Y , then g ◦ f : X → Z denotes
the composition of g and f :

\[ (g \circ f)(x) = g\bigl(f(x)\bigr), \quad x \in X. \]

The following relation holds for subsets A ⊆ Z:

\[ (g \circ f)^{-1}(A) = f^{-1}\bigl(g^{-1}(A)\bigr). \]

The identity function $\mathrm{id}_X$ on a set X is defined by $\mathrm{id}_X(x) = x$ for all x ∈ X. If A ⊆ X,
then the restriction of $\mathrm{id}_X$ to A is called the inclusion map and is frequently denoted by
$\iota_A : A \hookrightarrow X$.
If f : X → Y is bijective, then the inverse $f^{-1} : Y \to X$ of f is defined by the rule
$x = f^{-1}(y)$ iff $y = f(x)$. One then has

\[ f^{-1} \circ f = \mathrm{id}_X \quad \text{and} \quad f \circ f^{-1} = \mathrm{id}_Y. \]
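
The preimage rule for compositions and the two identities for the inverse of a bijection can be verified directly for maps between finite sets. In the following Python sketch the maps `f`, `g`, and `h` are hypothetical examples given by lookup tables.

```python
def compose(g, f):
    """The composition g o f."""
    return lambda x: g(f(x))

def preimage(f, B, domain):
    """The preimage of B under f, computed within the given domain."""
    return {x for x in domain if f(x) in B}

X, Y, Z = {0, 1, 2, 3}, {"a", "b", "c"}, {10, 20}
f = {0: "a", 1: "a", 2: "b", 3: "c"}.get   # f : X -> Y
g = {"a": 10, "b": 20, "c": 20}.get        # g : Y -> Z
A = {20}

lhs = preimage(compose(g, f), A, X)            # (g o f)^{-1}(A)
rhs = preimage(f, preimage(g, A, Y), X)        # f^{-1}(g^{-1}(A))

# Inverse of a bijection h, stored as the set of pairs (x, h(x)).
h = {0: "p", 1: "q", 2: "r"}
h_inv = {y: x for x, y in h.items()}

print(lhs, rhs)
```

Note the reversal of order on the right side: the preimage under `g` is taken first and the preimage under `f` second.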

If X is a universal set and A ⊆ X, then the indicator function of A is defined by

\[ \mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \in A^c. \end{cases} \]

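Clearly, indicator functions satisfy the relations

\[ \mathbf{1}_{A \cap B} = \mathbf{1}_A \mathbf{1}_B, \quad \mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \mathbf{1}_B, \quad \text{and} \quad \mathbf{1}_{A^c} = 1 - \mathbf{1}_A. \]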
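
The three relations for indicator functions of an intersection, a union, and a complement can be confirmed pointwise on a finite universal set. The following Python sketch is an illustration under that assumption; the sets `X`, `A`, and `B` are arbitrary.

```python
def indicator(A):
    """Return the indicator function of A as a callable."""
    return lambda x: 1 if x in A else 0

X = set(range(8))
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}
one_A, one_B = indicator(A), indicator(B)

inter_ok = all(indicator(A & B)(x) == one_A(x) * one_B(x) for x in X)
union_ok = all(
    indicator(A | B)(x) == one_A(x) + one_B(x) - one_A(x) * one_B(x) for x in X
)
compl_ok = all(indicator(X - A)(x) == 1 - one_A(x) for x in X)

print(inter_ok, union_ok, compl_ok)
```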

A special case of an indicator function is the function $\delta_{xy}$ defined by

\[ \delta_{xy} = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{if } x \ne y. \end{cases} \]

Indeed, if $A = \{(x, x) : x \in X\}$, then $\delta_{xy} = \mathbf{1}_A(x, y)$.


The functions $x^+$ and $x^-$ on R are defined by

\[ x^+ = \max\{x, 0\} \quad \text{and} \quad x^- = \max\{-x, 0\}. \]

The following relations are easily established:

\[ x = x^+ - x^- \quad \text{and} \quad |x| = x^+ + x^-. \]
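The real and imaginary parts of a complex number z are denoted, respectively, by Re z
and Im z, the conjugate by $\overline{z}$, and the modulus by |z|. Thus
\[ z = \operatorname{Re} z + i \operatorname{Im} z, \quad \overline{z} = \operatorname{Re} z - i \operatorname{Im} z, \quad \text{and} \quad |z| = \sqrt{(\operatorname{Re} z)^2 + (\operatorname{Im} z)^2}. \]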

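The signum sgn(z) of a complex variable z is defined by

\[ \operatorname{sgn}(z) = \begin{cases} \dfrac{|z|}{z} & \text{if } z \ne 0, \\ 0 & \text{otherwise.} \end{cases} \]

Thus $|z| = z \operatorname{sgn}(z)$ for all $z \in \mathbb{C}$ and $|\operatorname{sgn}(z)| = 1$ for all $z \ne 0$.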
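
With this definition, sgn(z) equals the conjugate of z divided by |z| when z is nonzero, so multiplying z by sgn(z) rotates it onto the nonnegative real axis. The following Python sketch checks the two stated properties numerically on a few sample values; the function name `sgn` and the samples are choices made for this illustration.

```python
def sgn(z):
    """Signum as defined in the text: |z|/z for z != 0, and 0 otherwise."""
    z = complex(z)
    return abs(z) / z if z != 0 else 0j

samples = [3, -2.5, 1j, 3 - 4j, 0]

for z in samples:
    # z * sgn(z) should agree with |z| up to rounding error.
    print(z, sgn(z), z * sgn(z), abs(z))
```

Because the computation is in floating point, the identities should be compared with a tolerance rather than tested for exact equality.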


If F ⊆ Y^X, then the evaluation functional at x ∈ X is the function x̂ : F → Y defined
by
x̂(f) = f(x), f ∈ F.
The notation δx is sometimes used for x̂. The adjoint or dual of a map φ : Z → X (with
respect to F) is the function φ∗ : F → Y^Z defined by

φ∗ (f ) = f ◦ φ, f ∈ F.

The notation φ′ is also used.


The following notation for real-valued functions on a set X will be needed in subsequent
chapters:

f⁺(x) := max{f(x), 0},   f⁻ := (−f)⁺,

(f1 ∨ · · · ∨ fn)(x) := max_{1≤k≤n} fk(x),   (f1 ∧ · · · ∧ fn)(x) := min_{1≤k≤n} fk(x),

(sup_n fn)(x) := sup_n fn(x),   (inf_n fn)(x) := inf_n fn(x),

(lim sup_n fn)(x) := lim sup_n fn(x),   (lim inf_n fn)(x) := lim inf_n fn(x).

For complex-valued functions we define Re f, Im f, f̄, and |f| by

(Re f)(x) = Re f(x), (Im f)(x) = Im f(x), f̄(x) = the conjugate of f(x), and |f|(x) = |f(x)|.

Cardinality
Two sets A and B are said to have the same cardinality if there exists a bijection
from A to B. A set A is finite if either A is the empty set or A has the same cardinality
as {1, 2, . . . , n} for some positive integer n. In the latter case, the members of A may be
labeled with the numbers 1, 2, . . . , n so that A may be written {a1 , a2 , . . . , an }. A set A is
countably infinite if it has the same cardinality as the set N of positive integers, in which
case we may write A = {a1 , a2 , . . .}. A set is countable if it is either finite or countably
infinite; otherwise, it is said to be uncountable. The set of all integers is countably infinite,
as is the set of rational numbers. The set of all real numbers is uncountable, as is any
(nondegenerate) interval of real numbers. The cardinality of R is denoted by c and that of
N by ℵ0 . For a detailed discussion of cardinality, the reader is referred to [23].
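
The claim that the set of all integers is countably infinite can be made concrete by writing down a bijection from N onto Z. The particular enumeration 0, 1, −1, 2, −2, . . . used below is one common choice and is an assumption of this sketch, as are the function names.

```python
def nat_to_int(n: int) -> int:
    """Map n = 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)


def int_to_nat(k: int) -> int:
    """Inverse map: positive k goes to 2k, nonpositive k goes to 1 - 2k."""
    return 2 * k if k > 0 else 1 - 2 * k


# The first few terms of the enumeration of Z.
first_values = [nat_to_int(n) for n in range(1, 8)]
print(first_values)
```

Since each map undoes the other, both are bijections, so Z has the same cardinality as N.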

0.2 Algebraic Structures


Semigroups and Groups
A semigroup is a nonempty set G together with an associative binary operation
(s, t) → st : G × G → G, that is, r(st) = (rs)t for all r, s, t ∈ G. A semigroup G is
commutative or abelian if st = ts for all s, t ∈ G. An identity element for G is a
member e such that se = es = s for all s ∈ G. Identities, if they exist, are unique. Indeed, if
also se′ = e′s = s for all s, then in particular e′ = ee′ = e. An element s ∈ G has an inverse
t if st = ts = e. The inverse of s is unique: if also sr = rs = e then r = re = rst = et = t.
The inverse of s, if it exists, is denoted by s⁻¹. A group is a semigroup with identity such
that every element has an inverse. Semigroups and groups may be written additively, so that
s + t replaces st, 0 replaces e, and −s replaces s⁻¹.
A subsemigroup of a semigroup G is a nonempty subset H that is closed under multi-
plication. A subgroup of a group G is a subsemigroup that contains the identity of G and
is closed under the inverse operation (and hence is a group in its own right).
If G is a semigroup and A, B ⊆ G, t ∈ G, we set

At = {at : a ∈ A}, tA = {ta : a ∈ A}, AB = {ab : a ∈ A, b ∈ B}, A⁻¹ = {a⁻¹ : a ∈ A},

the last notation assuming that G is a group. The notation is modified in the obvious way if
multiplication is written additively.
The sets R and C are groups under addition and are semigroups under multiplication.
Removing zero in each case yields a group under multiplication. The interval [1, ∞) is a semi-
group under both addition and multiplication. The disk D and its closure are subsemigroups
of C under multiplication, and T is a group. These examples are obviously all commutative.
The collection of nonsingular n × n matrices over K (n ≥ 2) is a noncommutative group
under matrix multiplication. The subset of matrices with determinant 1 is a subgroup.
If G and G′ are semigroups, then a function ϕ : G → G′ satisfying

ϕ(st) = ϕ(s)ϕ(t), s, t ∈ G,

is called a homomorphism. A bijective homomorphism of semigroups is called an
isomorphism. Note that in this case, the inverse map is automatically a homomorphism. A
homomorphism of groups necessarily maps the identity onto the identity and inverses onto
inverses. For example, the determinant function is a homomorphism from the semigroup of
n × n matrices over K under multiplication to the multiplicative semigroup K. The mapping
x ↦ e^x is an isomorphism from the additive group of real numbers onto the multiplicative
group of positive reals.
If G is a group and H is a subgroup, then x ∼ y iff x⁻¹y ∈ H defines an equivalence
relation on G with equivalence classes xH, the left coset of H with respect to x. The
collection G/H of all left cosets is called the left coset space. If H is normal, that is,
xH = Hx for all x, then G/H is a group under the operation xH · yH = xyH, and the
quotient map Q : G → G/H is a homomorphism. Conversely, if ϕ : G → G′ is a surjective
homomorphism, then the set

ker ϕ := {x ∈ G : ϕ(x) = e′},

called the kernel of ϕ, is a normal subgroup of G and the mapping Q(x) ↦ ϕ(x) is an
isomorphism of G/ker ϕ onto G′.
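
The correspondence between cosets of the kernel and values of a surjective homomorphism can be seen in a small example. The sketch below assumes the additive group Z_12 and the reduction map onto Z_4; these groups and all names in the code are illustrative choices, not taken from the text.

```python
n = 12
G = range(n)                       # Z_12 under addition mod 12


def phi(x):
    """Surjective homomorphism Z_12 -> Z_4 (reduction mod 4)."""
    return x % 4


# Kernel of phi: the elements mapped to the identity 0 of Z_4.
kernel = frozenset(x for x in G if phi(x) == 0)


def coset(x):
    """The coset x + ker(phi), written additively."""
    return frozenset((x + h) % n for h in kernel)


cosets = {coset(x) for x in G}

# The induced map Q(x) -> phi(x) is well defined if phi is constant on each coset.
induced = {c: {phi(x) for x in c} for c in cosets}
well_defined = all(len(values) == 1 for values in induced.values())

# Distinct cosets receive distinct values, and every element of Z_4 is attained.
image = {next(iter(values)) for values in induced.values()}
print(sorted(kernel), len(cosets), well_defined, sorted(image))
```

Here the four cosets of the kernel correspond one-to-one with the four elements of Z_4, in line with the isomorphism of G/ker ϕ onto G′.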

Linear Spaces
A linear space (or vector space) over K is an additively written abelian group V with
identity 0 and an operation, called scalar multiplication, K × V → V, (s, v) ↦ sv, satisfying

(st)v = s(tv), 1 v = v, s(v + w) = sv + sw, and (s + t)v = sv + tv

for all s, t ∈ K and v, w ∈ V. It follows that 0 v = 0 for all v ∈ V. Linear spaces are
always taken over K, whether or not explicitly mentioned. Euclidean space is a familiar
example of a linear space. Numerous additional examples appear throughout the text. It is
assumed that the reader has some familiarity with the basic theory of finite dimensional
vector spaces.
A subspace of a linear space V is a nonempty subset W that is closed under the
operations of addition and scalar multiplication. If A ⊆ V, then the span of A is the
subspace of V consisting of all linear combinations of members of A:
span A := { ∑_{j=1}^{m} cj aj : aj ∈ A, cj ∈ K, m ∈ N }.

If A and B are nonempty subsets of V and c ∈ K, we define

A + B := {x + y : x ∈ A, y ∈ B} and cA := {cx : x ∈ A}.

A subset C of V is said to be convex if

tC + (1 − t)C ⊆ C for all 0 ≤ t ≤ 1,

and balanced if
cC ⊆ C, for all c ∈ K with |c| ≤ 1.
A subspace of a linear space is obviously convex and balanced. The line segment from a
to b, defined by
[a : b] = {(1 − t)a + tb : 0 ≤ t ≤ 1},
is convex but generally not balanced. The disk D is both convex and balanced in the real
linear space R2 , while T is neither balanced nor convex.
The convex hull co A of a subset A of a linear space V is the intersection of all convex
subsets of V containing A. It is the smallest convex set (in the sense of containment)
containing A. Similarly, the convex balanced hull cobal A of A is the intersection of all
convex balanced subsets of V containing A. Here are important alternate descriptions of
these sets.
0.2.1 Proposition. Let A be a subset of a linear space V. Then
(a) co A consists of all sums of the form ∑_{j=1}^{n} tj xj, where n ∈ N, xj ∈ A, tj ≥ 0, and
∑_{j=1}^{n} tj = 1.
(b) cobal A consists of all sums of the form ∑_{j=1}^{n} cj xj, where n ∈ N, xj ∈ A, cj ∈ K,
and ∑_{j=1}^{n} |cj| ≤ 1.
Proof. Let C denote the collection of all sums in (a). One easily checks that C is convex.
Since C ⊇ A, we have C ⊇ co A. For the reverse inclusion, let D be any convex set containing
A. By induction, D ⊇ C. Since co A is the intersection of all such sets D, co A ⊇ C. This
proves (a). The proof of (b) is similar.

The sum in part (a) of the proposition is called a convex combination and the sum in
(b) an absolutely convex combination.
0.2.2 Theorem. Every linearly independent set A in a vector space may be extended to a
basis. Thus every (nontrivial) vector space has a basis.
Proof. Partially order the collection of linearly independent sets containing A by inclusion
and note that the union of a chain of such sets is linearly independent. By Zorn’s lemma,
there exists a maximal linearly independent set, which is necessarily a basis.
A basis for V is also called a Hamel basis to distinguish it from other types of bases,
for example Schauder bases.

Linear Transformations
Let V and W be linear spaces over K. A linear transformation from V into W is a
function T : V → W such that

T (x + y) = T x + T y and T (cx) = cT x for all x, y ∈ V and c ∈ K. 1

The collection of all linear transformations from V to W is a linear space under pointwise
addition and scalar multiplication

(S + T )(x) := Sx + T x, (cT )(x) = c(T x), x ∈ V, c ∈ K.

The kernel of a linear transformation T : V → W is the subspace

ker T = {x ∈ V : T x = 0}.

By linearity, T is injective iff ker T = {0}. If T is a bijection, then T −1 : W → V is easily


seen to be linear. Such a mapping is called an isomorphism of linear spaces.
A linear transformation f : V → K is called a linear functional. The following
proposition will be useful later.
0.2.3 Proposition. Let f, f1, . . . , fn be linear functionals on a linear space V. Then f is a
linear combination of f1, . . . , fn iff ⋂_{j=1}^{n} ker fj ⊆ ker f.
Proof. The necessity is clear. For the sufficiency, define F (v) = (f1 (v), . . . , fn (v)), v ∈ V.
Then F maps V onto a subspace X of K^n. Define a linear functional g on X so that g ◦ F = f.
The hypothesis implies that g is well-defined. Extend g linearly in a natural way to K^n by
expanding a basis for X to a basis for K^n. Then g must be of the form g(x) = α · x, hence
f (v) = α · F (v), which is a linear combination of f1 , . . . , fn .

Let C be a convex subset of V. A function f : C → W is said to be affine if

f((1 − t)x + ty) = (1 − t)f(x) + tf(y), ∀ x, y ∈ C and 0 < t < 1.

For example, the restriction of a linear transformation to a convex set is affine. The function
x ↦ a · x + b on a convex subset of R^d is affine.
1 The notation T x for T (x) is standard for linear transformations.

Quotient Linear Spaces


The notion of cosets in a group applies to linear spaces as follows: Let U be a subspace
of a linear space V. The relation x ∼ y iff x − y ∈ U is an equivalence relation on V.
The quotient space V/U is the vector space of all sets of the form x + U, x ∈ V, where
addition and scalar multiplication are defined by

(x + U) + (y + U) = (x + y) + U and c(x + U) = cx + U.

The quotient map x ↦ x + U is easily seen to be a linear transformation. If T : V → W is
a linear transformation, then the mapping T v ↦ v + ker T is an isomorphism from ran T
onto V/ker T.

Algebras
An algebra (over K) is a linear space A with an associative multiplication (x, y) → xy
that satisfies

z(x + y) = zx + zy, (x + y)z = xz + yz, and c(xy) = (cx)y = x(cy)

for all x, y, z ∈ A and c ∈ K. An algebra A is commutative if xy = yx for all x, y ∈ A.


An identity of A is an element e such that ex = xe = x for all x ∈ A. An identity, if
it exists, is unique. An algebra with identity is said to be unital. For example, the vector
space of n × n matrices over K is an algebra with respect to matrix multiplication. More
generally, the space of linear transformations from a vector space into itself is an algebra
with respect to function composition. Additional examples appear throughout the text.
A homomorphism of algebras A and B is a linear map ϕ : A → B such that
ϕ(xy) = ϕ(x)ϕ(y). A bijective homomorphism is called an algebra isomorphism. A
subalgebra of an algebra A is a linear subspace of A that is closed under multiplication. An
ideal I of A is a subalgebra with the stronger property x ∈ A and y ∈ I ⇒ xy, yx ∈ I.
It then follows that the quotient linear space A/I is an algebra under multiplication
(x+I)(y+I) = xy+I, and the quotient map is an algebra homomorphism. For example,
in the algebra A of all functions f : R → R under pointwise addition and multiplication,
the subset I = {f ∈ A : f(x) = 0 ∀ x ∈ [0, 1]} is an ideal, and the map f + I ↦ f|[0,1] is an
algebra isomorphism from A/I onto the algebra of all real functions on [0, 1].

0.3 Metric Spaces


A metric on a nonempty set X is a function d : X × X → R with the following properties:
(a) d(x, y) ≥ 0 (nonnegativity).

(b) d(x, y) = 0 iff x = y (coincidence).

(c) d(x, y) = d(y, x) (symmetry).

(d) d(x, y) ≤ d(x, z) + d(y, z) (triangle inequality).

The ordered pair (X, d), as well as the set X, is called a metric space. A nonempty subset
Y of X with the metric d|Y×Y is called a subspace of X. A metric has the property

|d(x, y) − d(u, v)| ≤ d(x, u) + d(y, v) (0.1)



as may be seen from the triangle inequality d(x, y) ≤ d(x, u) + d(u, v) + d(v, y) and its
counterpart.
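
For the reader who wants the two steps spelled out, the following short derivation is one way to obtain (0.1); it uses only the triangle inequality and symmetry of d.

```latex
\begin{align*}
d(x,y) &\le d(x,u) + d(u,v) + d(v,y)
  &&\Longrightarrow\quad d(x,y) - d(u,v) \le d(x,u) + d(y,v),\\
d(u,v) &\le d(u,x) + d(x,y) + d(y,v)
  &&\Longrightarrow\quad d(u,v) - d(x,y) \le d(x,u) + d(y,v).
\end{align*}
```

Together the two bounds give |d(x, y) − d(u, v)| ≤ d(x, u) + d(y, v).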
The real number system R is a metric space under the usual metric d(x, y) = |x − y|.
More generally, the set R^d is a metric space under the Euclidean metric

d(x, y) = |x − y| = ( ∑_{j=1}^{d} (xj − yj)² )^{1/2}.

For another example, let X be a nonempty set and define d(x, y) = 1 if x ≠ y and d(x, x) = 0.
Then d is a metric, called the discrete metric on X.
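
The metric axioms (a) through (d) can be tested on a finite sample of points. The sketch below is only a sanity check under stated assumptions: the sample grid, the tolerance, and the function names are hypothetical, and passing the check on finitely many points can refute a candidate metric but cannot prove that it is one.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean metric on tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1


def satisfies_axioms(d, points, tol=1e-12):
    """Check axioms (a)-(d) for d on every pair and triple of sample points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                          # (a) nonnegativity
            return False
        if (d(x, y) == 0) != (x == y):           # (b) coincidence
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # (c) symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:    # (d) triangle inequality
            return False
    return True


# Assumed sample: integer grid points in the plane.
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]

# A non-example on the real line: the squared distance violates (d).
squared = lambda x, y: (x - y) ** 2

print(satisfies_axioms(euclidean, grid),
      satisfies_axioms(discrete, grid),
      satisfies_axioms(squared, [0, 1, 2]))
```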

Open and Closed Sets


Let (X, d) be a metric space. For x ∈ X and r > 0, the sets

Br (x) = B(x, r) := {y ∈ X : d(x, y) < r} and Cr (x) = C(x, r) := {y ∈ X : d(x, y) ≤ r}

are called, respectively, the open and closed balls with center x and radius r. The set

Sr (x) = S(x, r) := {y ∈ X : d(x, y) = r} = Cr (x) \ Br (x)

is called the sphere with center x and radius r. For example, the open (closed) balls in
R with the usual metric are the bounded open (closed) intervals. The open (closed) balls in
Euclidean space R2 are open (closed) disks and the spheres are circles. The open and closed
balls in a discrete metric space X are the sets X and {x}; the spheres are X \ {x} and the
empty set.
A subset U of X is said to be open if either U = ∅ or else for each x ∈ U there exists
an r > 0 such that Br (x) ⊆ U . A subset of X is closed if its complement is open. An
application of the triangle inequality shows that an open ball is open. Indeed, if y ∈ Bε (x),
then Bδ (y) ⊆ Bε (x) for δ = ε − d(x, y), which shows that Bε (x) is a union of open balls
Bδ (y). A similar argument shows that a closed ball is closed.
A neighborhood of a point a in X is any set containing an open set containing a. As
we shall see, certain concepts such as continuity and convergence are conveniently phrased
in terms of neighborhoods.
0.3.1 Proposition. Open and closed sets have the following properties:
(a) If 𝒰 is a collection of open sets, then ⋃𝒰 is open.
(b) If V1, . . . , Vn are open, then V := V1 ∩ · · · ∩ Vn is open.
(c) If 𝒞 is a family of closed sets, then ⋂𝒞 is closed.

(d) If C1 , . . . , Cn are closed, then C1 ∪ · · · ∪ Cn is closed.


Proof. (a) Let x ∈ ⋃𝒰. Then x ∈ U for some U ∈ 𝒰, and since U is open there exists r > 0
such that Br(x) ⊆ U ⊆ ⋃𝒰.
(b) Let x ∈ V . For each j there exists rj > 0 such that Brj (x) ⊆ Vj . Then Br (x) ⊆ V ,
where r := min{r1 , . . . , rn }. Therefore, V is a union of open balls and so is open.
Parts (c) and (d) follow from (a) and (b) and De Morgan’s laws.

Interior, Closure, and Boundary


Let E be a subset of a metric space X. The interior int(E) = intX (E) of E is the union
of all open subsets of X contained in E. The closure cl(E) = clX (E) of E in X is the
intersection of all closed subsets of X containing E. The boundary bd(E) = bdX (E) of E
is the set cl(E) \ int(E). Thus the interior of E is the largest open set contained in E and
the closure of E is the smallest closed set containing E. A point x ∈ X is in the boundary
of E iff every neighborhood of x meets both E and E c .
If cl E = X, then E is said to be dense in X. A metric space with a countable dense subset
is said to be separable. For example, R^d is separable with respect to the Euclidean metric,
as may be seen by considering Q^d.

Sequential Convergence. Completeness


A sequence (xn ) in a metric space (X, d) is said to converge to x ∈ X if limn d(xn , x) = 0.
In this case we write xn → x or limn xn = x. In terms of neighborhoods, xn → x iff xn is
eventually in each neighborhood N of x, that is, xn ∈ N for all sufficiently large n. The
triangle inequality shows that the limit of a sequence (xn ) in X, if it exists, is unique.
A cluster point of a sequence in X is the limit of a convergent subsequence. Thus x is
a cluster point of (xn ) iff (xn ) is frequently in every neighborhood N of x, that is, xn ∈ N
for infinitely many n.
A sequence (xn ) is said to be Cauchy if limm,n→∞ d(xm , xn ) = 0. If every Cauchy
sequence in (X, d) converges to a member of X, then (X, d) is said to be complete. For
example, Euclidean space is complete. It may be shown that every metric space X has a
completion, that is, a complete metric space that contains a dense copy of X. (See [2]).
It follows from (0.1) that xn → x and yn → y ⇒ d(xn , yn ) → d(x, y), which expresses the
continuity of the metric.
The following result is sometimes useful in establishing completeness of a metric space.
0.3.2 Proposition. A Cauchy sequence with a convergent subsequence converges.
Proof. Let (xn ) be a Cauchy sequence and xnk → x. Given ε > 0, by the Cauchy property
the inequality d(xn , xnk ) < ε holds for all sufficiently large n and k. Fixing such an n and
letting k → ∞ yields d(xn , x) ≤ ε, by continuity of the metric. Therefore, xn → x.
0.3.3 Proposition. A subset C of X is closed iff C contains the limit of each convergent
sequence in C.
Proof. Assume that C is closed and let (xn) be a sequence in C with xn → x. If x ∉ C,
then, because Cᶜ is open, there exists ε > 0 such that Bε(x) ∩ C = ∅. But this is impossible,
since xn is eventually in Bε(x) ⊆ Cᶜ. Therefore, x ∈ C.
Now suppose C is not closed. Then Cᶜ is not open, hence there exists x ∈ Cᶜ such that
B1/n(x) ∩ C ≠ ∅, for every n ∈ N. Choosing a point xn in this intersection we obtain a
sequence (xn) in C that converges to a member of Cᶜ.
0.3.4 Corollary. Let Y be a subspace of X.
(a) If X is complete and Y is closed, then Y is complete.

(b) If Y is complete, then Y is closed.

Proof. (a) Let (yn ) be a Cauchy sequence in Y . Since X is complete, there exists x ∈ X
such that yn → x. Since Y is closed, x ∈ Y . Therefore, Y is complete.
(b) Let (yn ) be a sequence in Y such that yn → x ∈ X. Then (yn ) is Cauchy and hence
converges to some y ∈ Y . Since limits are unique, x = y ∈ Y . Therefore, Y is closed.

0.3.5 Proposition. Let A ⊆ X. Then x ∈ cl(A) iff there exists a sequence (an ) in A such
that an → x.
Proof. Let C be the set of all limits of convergent sequences in A including constant sequences,
so A ⊆ C ⊆ cl(A), the second inclusion by 0.3.3. We show that C is closed, proving the
assertion.
Suppose C is not closed. Then Cᶜ is not open, hence there exists y ∈ Cᶜ and for each n a
point yn ∈ B1/n (y) ∩ C. Since each yn is the limit of a sequence in A, there exists an ∈ A
such that d(yn , an ) < 1/n. By the triangle inequality, d(an , y) < 2/n, hence an → y. But
then y ∈ C, a contradiction.

Continuity
Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is said to be continuous
at a ∈ X if for each ε > 0 there exists a δ > 0 such that d(x, a) < δ ⇒ ρ(f(x), f(a)) < ε.
In terms of open balls,
f(Bδ(a)) ⊆ Bε(f(a)). (0.2)
If E ⊆ X and f is continuous at each point of E, then f is said to be continuous on E. If f
is continuous at each member of X, then f is said to be continuous. A homeomorphism
from X to Y is a bijection f : X → Y such that both f and f −1 are continuous.
The following proposition describes a useful characterization of continuity in terms of
neighborhoods. It will have implications later in the formulation of the definition of continuity
in the more general setting of topological spaces.
0.3.6 Proposition. A function f : X → Y is continuous at a iff for each neighborhood M
of f (a) there exists a neighborhood N of a such that f (N ) ⊆ M .
Proof. Let f be continuous at a and let M be a neighborhood of f (a). Choose ε > 0 such
that Bε(f(a)) ⊆ M and choose δ > 0 as in (0.2). Then Bδ(a) is the required neighborhood N.
Conversely, assume the neighborhood property holds and let ε > 0. Choose a neighborhood
N of a such that f (N ) ⊆ Bε (f (a)) and choose δ so that Bδ (a) ⊆ N . Then (0.2) holds.

It is clear from the proof that the neighborhoods M and N in 0.3.6 may be taken to be
open.
0.3.7 Proposition. Let f : (X, d) → (Y, ρ) and a ∈ X. Then f is continuous at a iff
f (an ) → f (a) for any sequence (an ) in X with an → a.
Proof. If f is continuous at a, then for any neighborhood M of f (a) there exists a neighbor-
hood N of a such that f (N ) ⊆ M . If an → a, then an ∈ N for all sufficiently large n, and
for such n, f (an ) ∈ M . Therefore, f (an ) → f (a).
Conversely, if f is not continuous at a, then for some ε > 0 and each n ∈ N there exists
an an ∈ B1/n(a) with f(an) ∉ Bε(f(a)). Thus the sequential property fails.
0.3.8 Theorem. Let f : (X, d) → (Y, ρ). The following statements are equivalent:

(a) f is continuous.

(b) f −1 (V ) is open in X for each open subset V of Y .

(c) f −1 (C) is closed in X for each closed subset C of Y .



(d) f(cl A) ⊆ cl(f(A)) for each subset A of X.

Proof. (a) ⇒ (b): Let V ⊆ Y be open. If x ∈ f⁻¹(V), then V is a neighborhood of f(x),
hence there exists an open neighborhood N of x such that f(N) ⊆ V and so N ⊆ f⁻¹(V).
Therefore, f⁻¹(V) is a union of open sets and hence is open.
(b) ⇒ (c): This follows from (f⁻¹(C))ᶜ = f⁻¹(Cᶜ).
(c) ⇒ (d): Let C = cl f(A). Then f⁻¹(C) is closed and f⁻¹(C) ⊇ f⁻¹(f(A)) ⊇ A, hence
f⁻¹(C) ⊇ cl A and so C ⊇ f(f⁻¹(C)) ⊇ f(cl A).
(d) ⇒ (a): If f is not continuous at x, then there exist ε > 0 and a sequence (xn)
in X such that xn → x and ρ(f(xn), f(x)) ≥ ε for all n. But this is impossible, since
f(x) ∈ f(cl{x1, x2, . . .}) ⊆ cl f({x1, x2, . . .}).
A function f : (X, d) → (Y, ρ) is said to be uniformly continuous on a set E ⊆ X if,
given ε > 0, there exists δ > 0 such that u, v ∈ E and d(u, v) < δ ⇒ ρ(f (u), f (v)) < ε. The
proof of the following proposition is similar to that of 0.3.7.
0.3.9 Proposition. A function f : (X, d) → (Y, ρ) is uniformly continuous on E ⊆ X iff
ρ(f(un), f(vn)) → 0 for any pair of sequences (un) and (vn) in E with d(un, vn) → 0.
Let A and B be nonempty subsets of X. The distance between A and B is defined as

d(A, B) := inf{d(x, y) : x ∈ A, y ∈ B}.

To simplify notation we set


d(x, A) := d({x}, A).
It follows easily from the triangle inequality that |d(x, A) − d(y, A)| ≤ d(x, y), which shows
that d(x, A) is continuous, indeed uniformly continuous. A simple consequence of this is
Urysohn’s lemma for metric spaces:
0.3.10 Lemma. Let (X, d) be a metric space. If A and B are disjoint, nonempty, closed
sets in X, then the function
f(x) = d(x, A) / (d(x, A) + d(x, B))
is well-defined and continuous, 0 ≤ f ≤ 1, A = {x : f(x) = 0}, and B = {x : f(x) = 1}.
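
The function of the lemma is easy to evaluate when A and B are finite sets of real numbers, which are closed in R. The following sketch assumes the usual metric on R; the particular sets A, B and the helper names are illustrative only.

```python
def dist_to_set(x, S):
    """d(x, S) for a finite nonempty set S of reals, with d(x, y) = |x - y|."""
    return min(abs(x - s) for s in S)


def urysohn(x, A, B):
    """The function d(x, A) / (d(x, A) + d(x, B)) from the lemma."""
    dA, dB = dist_to_set(x, A), dist_to_set(x, B)
    # The denominator is positive because A and B are disjoint closed sets.
    return dA / (dA + dB)


# Assumed example: two disjoint finite (hence closed) subsets of R.
A = {0.0, 1.0}
B = {3.0, 4.0}

values = {x: urysohn(x, A, B) for x in (0.0, 1.0, 2.0, 2.5, 3.0, 4.0)}
print(values)
```

The computed values vanish exactly on A, equal 1 exactly on B, and lie strictly between 0 and 1 at the remaining sample points.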

Category
The diameter of a nonempty subset E of a metric space (X, d) is defined by

d(E) = sup{d(x, y) : x, y ∈ E}.

Note that the continuity of the metric implies that d(E) = d(cl(E)).
Here is an important characterization of completeness of a metric space in terms of
diameters.
0.3.11 Cantor Intersection Theorem. A metric space X is complete iff the intersection
of any decreasing sequence of nonempty closed sets Cn in X with d(Cn ) → 0 consists of a
single point.
Proof. Assume X is complete. For each n choose xn ∈ Cn. Since Cn ↓ and d(Cn) → 0, (xn)
is Cauchy. Let xn → x. Since xm ∈ Cn for all m ≥ n and Cn is closed, letting m → ∞ we
see that x ∈ Cn for all n, that is, x ∈ C := ⋂_n Cn. Since d(C) ≤ d(Cn) → 0, C = {x}.
Conversely, let X have the stated intersection property and let (xn) be a Cauchy sequence
in X. Set Cn := cl{xk : k ≥ n}. By the Cauchy property, d(Cn) → 0. Since Cn ↓, by our
hypothesis ⋂_n Cn contains a point x. It follows easily that xnk → x for some subsequence
(xnk). By 0.3.2, xn → x.

The following consequence of Cantor’s theorem is a key step in the proofs of several
important results in analysis. In §0.12 we give a version of the theorem for locally compact
spaces.
0.3.12 Baire Category Theorem. Let X be a complete metric space. If (Xn) is a sequence
of closed sets with union X, then int Xn ≠ ∅ for some n.
Proof. Suppose for a contradiction that int Xn = ∅ for all n. Choose an open ball B(x0 , r0 )
with r0 = 1. Since int X1 = ∅, there exists x1 ∈ B(x0 , r0 ) \ X1 , and since B(x0 , r0 ) \ X1
is open, there exists r1 ∈ (0, 1/2) such that C(x1 , r1 ) ⊆ B(x0 , r0 ) \ X1 . Since int X2 = ∅
and B(x1 , r1 ) \ X2 is open there exists x2 ∈ B(x1 , r1 ) \ X2 and r2 ∈ (0, 1/3) such that
C(x2 , r2 ) ⊆ B(x1 , r1 ) \ X2 . In this way we construct sequences (xn ) in X and (rn ) in R such
that
C(xn, rn) ⊆ B(x_{n−1}, r_{n−1}) \ Xn, 0 < r_{n−1} ≤ 1/n, n ≥ 1.
Since the closed balls are decreasing and the diameters are tending to zero, their intersection
C is nonempty (0.3.11). But this is impossible because C ∩ Xn = ∅ for all n.

0.4 Normed Linear Spaces


Norms and Seminorms
A norm on a linear space X over K is a function ‖·‖ : X → R with the following
properties:
(a) x ≠ 0 implies ‖x‖ ≠ 0 (positivity).
(b) ‖cx‖ = |c| ‖x‖ (absolute homogeneity).
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
If ‖·‖ satisfies (b) and (c) but not necessarily (a), then ‖·‖ is called a seminorm. Note
that by (b), ‖0‖ = 0 and ‖x‖ = ‖−x‖, hence by (c), 0 = ‖x − x‖ ≤ ‖x‖ + ‖−x‖ = 2‖x‖.
Therefore, seminorms and norms are nonnegative.
Property (c) has the following useful extensions:

‖∑_{j=1}^{n} xj‖ ≤ ∑_{j=1}^{n} ‖xj‖ and | ‖x‖ − ‖y‖ | ≤ ‖x − y‖. (0.3)

Indeed, the first inequality may be established by a simple induction argument, and the
second by applying the triangle inequality to ‖x‖ = ‖x − y + y‖ and ‖y‖ = ‖y − x + x‖.
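
Written out, the argument for the second inequality in (0.3) may be sketched as follows, using only the triangle inequality and absolute homogeneity.

```latex
\begin{align*}
\|x\| &= \|(x-y) + y\| \le \|x-y\| + \|y\|
  &&\Longrightarrow\quad \|x\| - \|y\| \le \|x-y\|,\\
\|y\| &= \|(y-x) + x\| \le \|y-x\| + \|x\|
  &&\Longrightarrow\quad \|y\| - \|x\| \le \|y-x\| = \|x-y\|.
\end{align*}
```

The last equality holds by absolute homogeneity with c = −1, and the two bounds together give the stated estimate for the absolute value of ‖x‖ − ‖y‖.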
If ‖·‖ is a norm on X, then the pair (X, ‖·‖) is called a normed space. It is easy to
check that the mapping (x, y) ↦ ‖x − y‖ is a metric on X, making the entire machinery
of metric spaces available. Unless stated otherwise, convergence and continuity in a normed
space are taken relative to this metric.

Banach Spaces
A normed space (X, ‖·‖) that is complete in the metric (x, y) ↦ ‖x − y‖ is called a
Banach space. A familiar example is Euclidean space K^d. Many other examples appear
throughout the text. For now we content ourselves with the following.

0.4.1 Example. (The space of bounded functions). Let X be a nonempty set and let B(X)
denote the vector space (under pointwise addition and scalar multiplication) of all bounded
functions f : X → K. The supremum norm or uniform norm on B(X) is defined by

‖f‖∞ = sup{|f(x)| : x ∈ X}.

That ‖·‖∞ is a norm is easily established using familiar properties of absolute value. For
example, the triangle inequality follows by taking the supremum over X in

|(f + g)(x)| ≤ |f(x)| + |g(x)| ≤ ‖f‖∞ + ‖g‖∞.

To verify completeness, let (fn) be a Cauchy sequence in B(X) and ε > 0. Choose N such
that ‖fn − fm‖∞ < ε for all m, n ≥ N. For such indices and each x ∈ X we then have

|fn(x) − fm(x)| < ε, (†)

which shows that (fn(x)) is a Cauchy sequence in K. Since K is complete, fn(x) → f(x) ∈ K.
Fixing n ≥ N in (†) and letting m → ∞ yields |fn(x) − f(x)| ≤ ε for all x ∈ X and n ≥ N.
Therefore, f = f − fn + fn ∈ B(X) and ‖fn − f‖∞ ≤ ε. ♦
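
On a finite set X the supremum norm is a maximum, so the norm and the uniform convergence used in the example can be computed directly. The sketch below assumes a finite sample set X in [0, 1] and sample functions chosen for illustration; none of these names come from the text.

```python
def sup_norm(f, X):
    """Supremum norm of f over a finite set X (the supremum is a maximum here)."""
    return max(abs(f(x)) for x in X)


# Assumed finite sample set in [0, 1]; it contains the endpoint 1.0.
X = [k / 10 for k in range(11)]

f = lambda x: x * x
g = lambda x: 1 - 2 * x
f_plus_g = lambda x: f(x) + g(x)

# Triangle inequality for the supremum norm.
triangle_ok = sup_norm(f_plus_g, X) <= sup_norm(f, X) + sup_norm(g, X)

# f_n(x) = x**n / n converges uniformly to 0 on X, since its norm is 1/n.
norms = [sup_norm(lambda x, n=n: x ** n / n, X) for n in range(1, 6)]
print(triangle_ok, norms)
```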

Completion of a Normed Space


The following theorem asserts that every normed space may be realized as a dense
subspace of a Banach space. A proof may be given at this point based on the fact that X has
a metric space completion; however, we prefer to wait until Chapter 8 when the machinery
for a simpler proof will be available. (See 8.5.9.)
0.4.2 Theorem. Let (X, ‖·‖) be a normed space. Then there exists a Banach space X̃ and
a linear transformation from X onto a dense subspace of X̃ that preserves norm.

Infinite Series in Normed Spaces


An infinite series ∑_{n=1}^{∞} xn of members of a normed space X is said to converge if
the sequence of partial sums ∑_{k=1}^{n} xk converges in X. By analogy with numerical series,
∑_{n=1}^{∞} xn is said to converge absolutely if ∑_{n=1}^{∞} ‖xn‖ < ∞. The following result is
sometimes useful in establishing completeness of a normed space.
0.4.3 Proposition. Let X be a normed vector space. Then X is complete (and hence is a
Banach space) iff every absolutely convergent series converges.
Proof. Assume that X is complete and ∑_{n=1}^{∞} ‖xn‖ < ∞. Then for m > n ≥ 1,

‖∑_{j=1}^{m} xj − ∑_{j=1}^{n} xj‖ ≤ ∑_{j=n+1}^{m} ‖xj‖ ≤ ∑_{j=n+1}^{∞} ‖xj‖.

Since the right side tends to 0 as n → ∞, the sequence of partial sums ∑_{j=1}^{n} xj is Cauchy
and so converges, that is, the series ∑_{n=1}^{∞} xn converges in X.
Conversely, assume that every absolutely convergent series converges. Let (xn) be a
Cauchy sequence in X. One may then obtain a strictly increasing sequence (nk) in N such
that ‖xm − xn‖ < 2^{−k} for all m, n ≥ nk. In particular, ‖x_{n_{k+1}} − x_{n_k}‖ < 2^{−k}, which implies
that the series ∑_k ‖x_{n_{k+1}} − x_{n_k}‖ converges. By hypothesis, the sequence of partial sums
∑_{j=1}^{k} (x_{n_{j+1}} − x_{n_j}) converges. But these sums collapse to x_{n_{k+1}} − x_{n_1}. Therefore, (x_{n_k})
is a convergent subsequence of (xn) and so the latter must converge (0.3.2).
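
The estimate in the first half of the proof, which bounds the gap between two partial sums by a tail of the series of norms, can be observed numerically. The following is a sketch for one assumed absolutely convergent series in R² with the Euclidean norm; the series and all function names are illustrative choices.

```python
import math


def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(*v)


def x(n):
    """n-th term of an (assumed) absolutely convergent series in R^2."""
    return (0.5 ** n, (-1 / 3) ** n)


def partial_sum(n):
    """Coordinatewise sum of the terms x(1), ..., x(n)."""
    return tuple(sum(x(j)[i] for j in range(1, n + 1)) for i in range(2))


def tail_bound(n, m):
    """Sum of the norms of x(n+1), ..., x(m)."""
    return sum(norm(x(j)) for j in range(n + 1, m + 1))


def cauchy_gap(n, m):
    """Norm of the difference of the m-th and n-th partial sums."""
    s_m, s_n = partial_sum(m), partial_sum(n)
    return norm((s_m[0] - s_n[0], s_m[1] - s_n[1]))


# Check the estimate for a range of indices m > n, allowing for rounding error.
estimate_holds = all(
    cauchy_gap(n, m) <= tail_bound(n, m) + 1e-12
    for n in range(1, 20)
    for m in range(n + 1, 30)
)
print(estimate_holds, partial_sum(60))
```

For this series the partial sums approach (1, −1/4), the coordinatewise sums of the two geometric series.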

Unordered Sums in Normed Spaces


Let {xi : i ∈ I} be a family of vectors in a normed vector space X and let x ∈ X. We
say that {xi : i ∈ I} converges unconditionally to x if for each ε > 0 there exists a
finite set Fε ⊆ I such that

‖∑_{i∈F} xi − x‖ < ε for all finite F ⊆ I with F ⊇ Fε. (0.4)

We then call x the unordered sum of {xi : i ∈ I} and write x = ∑_{i∈I} xi. In this case,
we also say that {xi : i ∈ I} is summable to x.

0.4.4 Proposition. If the unordered sums on the right in the following equality exist, then
the unordered sum on the left exists and

∑_{i∈I} (a xi + b yi) = a ∑_{i∈I} xi + b ∑_{i∈I} yi.

Proof. Let x = ∑_{i∈I} xi and y = ∑_{i∈I} yi. Given ε > 0, choose finite Fε, Gε ⊆ I such that

‖∑_{i∈F} xi − x‖ < ε/2 and ‖∑_{i∈G} yi − y‖ < ε/2

for all finite F ⊇ Fε and G ⊇ Gε. Then for finite F ⊇ Fε ∪ Gε, by the extended triangle
inequality (0.3) we have

‖∑_{i∈F} (xi + yi) − (x + y)‖ ≤ ‖∑_{i∈F} xi − x‖ + ‖∑_{i∈F} yi − y‖ < ε.

This shows that ∑_{i∈I} (xi + yi) = ∑_{i∈I} xi + ∑_{i∈I} yi. An even simpler argument shows that
∑_{i∈I} a xi = a ∑_{i∈I} xi.

0.4.5 Proposition. If the family {xi : i ∈ I} is unconditionally convergent to x, then all
but countably many members of the family are zero. If the nonzero members are arranged in
any order, say x_{i_1}, x_{i_2}, . . ., then x = ∑_{n=1}^{∞} x_{i_n}.
Proof. For each n ∈ N, let Fn be a finite set of indices such that

‖∑_{i∈F} xi − x‖ < 1/n for all finite F ⊇ Fn.

Thus if j ∉ Fn and F = {j} ∪ Fn, then

‖xj‖ = ‖∑_{i∈F} xi − ∑_{i∈Fn} xi‖ ≤ ‖∑_{i∈F} xi − x‖ + ‖∑_{i∈Fn} xi − x‖ < 2/n.

Thus if j ∉ ⋃_n Fn, then the preceding inequality holds for all n and so xj = 0.
For the last assertion of the proposition, choose a finite set Fε such that (0.4) holds. We
may assume that Fε contains no index j for which xj = 0. Then, for all sufficiently large n,
Fε ⊆ {i1, . . . , in}, hence ‖∑_{j=1}^{n} x_{i_j} − x‖ < ε.
The next result follows easily from the approximation property of suprema.
0.4.6 Proposition. A family {ti : i ∈ I} ⊆ [0, ∞) is unconditionally convergent to
sup_F ∑_{i∈F} ti, provided that the supremum, which is taken over all finite F ⊆ I, is finite.

0.4.7 Theorem. Let X be a Banach space and let {xi : i ∈ I} ⊆ X such that s :=
sup_F ∑_{i∈F} ‖xi‖ < ∞, where the supremum is taken over all finite F ⊆ I. Then the family
{‖xi‖ : i ∈ I} converges unconditionally to s, {xi : i ∈ I} converges unconditionally to
some x ∈ X, and ‖x‖ ≤ s, that is,

‖∑_{i∈I} xi‖ ≤ ∑_{i∈I} ‖xi‖.

Proof. By 0.4.5 and 0.4.6, s = ∑_{i∈I} ‖xi‖ = ∑_{k=1}^{∞} ‖x_{i_k}‖. Thus the series ∑_{k=1}^{∞} x_{i_k} is
absolutely convergent, so converges to some x ∈ X (0.4.3). Given ε > 0, choose m such that

‖∑_{k=1}^{n} x_{i_k} − x‖ < ε for all n ≥ m.

Then Fε := {i1, . . . , im} satisfies (0.4), hence the convergence is unconditional.

Bounded Linear Transformations


0.4.8 Proposition. Let X and Y be normed spaces and T : X → Y linear. Then T is
continuous iff there exists a constant M ≥ 0 such that
‖T x‖ ≤ M ‖x‖ for all x ∈ X.
Proof. If the condition holds, then continuity (in fact uniform continuity) follows from the
inequality ‖T x − T y‖ = ‖T (x − y)‖ ≤ M ‖x − y‖.
Conversely, if T is continuous there exists a δ > 0 such that ‖y‖ ≤ δ ⇒ ‖T y‖ ≤ 1. Thus
for any x ≠ 0,

(δ/‖x‖) ‖T x‖ = ‖T((δ/‖x‖) x)‖ ≤ 1

and so ‖T x‖ ≤ (1/δ)‖x‖.
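
In finite dimensions a constant M as in the proposition is easy to exhibit. The sketch below assumes a particular 2 × 2 real matrix T acting on R² with the Euclidean norm and takes M to be the Frobenius norm of T, which works by the Cauchy–Schwarz inequality; this is one admissible constant, not necessarily the smallest, and the matrix and names are hypothetical.

```python
import math
import random

# Assumed example: a linear map on R^2 given by a 2x2 matrix.
T = [[2.0, -1.0], [0.5, 3.0]]


def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]


def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))


# Frobenius norm of T: by Cauchy-Schwarz, norm(T x) <= M * norm(x) for all x.
M = math.sqrt(sum(c * c for row in T for c in row))

# Test the bound on random sample vectors (fixed seed for reproducibility).
rng = random.Random(0)
samples = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(1000)]
bound_holds = all(norm(apply(T, x)) <= M * norm(x) + 1e-9 for x in samples)
print(M, bound_holds)
```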
The proposition implies that a linear transformation T is continuous iff it is bounded on
bounded sets, that is, on sets of the form {x ∈ X : kxk ≤ r}. For this reason, a continuous
linear transformation is said to be bounded, reflecting the fact that T maps bounded sets
onto bounded sets. Note that by the above proof, T is bounded iff it is continuous at zero.
A topological isomorphism from a normed space X onto a normed space Y is a
linear bijection T : X → Y that is also a homeomorphism. A linear map T : X → Y
with the property ‖Tx‖ = ‖x‖ for all x ∈ X is called an isometry. If also T (X) = Y,
then T is called an isometric isomorphism onto Y. For example, if X and Y are sets
and ϕ : X → Y is any function, then the adjoint map ϕ∗ : B(Y ) → B(X) is a bounded
linear transformation, an isometry if ϕ is surjective, and an isometric isomorphism if ϕ is a
bijection.

Banach Algebras
A normed algebra is an algebra A over C with a norm that satisfies
‖xy‖ ≤ ‖x‖ ‖y‖ , x, y ∈ A.
A complete normed algebra is called a Banach algebra. These structures occur in many
important settings, particularly in the theory of operators on Hilbert spaces. The Banach
space B(X) of all bounded functions under pointwise multiplication is a simple example of
a commutative unital Banach algebra. Other examples appear throughout the text. General
commutative Banach algebras are discussed in detail in Chapter 13.

0.5 Topological Spaces


Open and Closed Sets
A topology on a set X is a collection T of subsets of X with the following properties:

(a) X, ∅ ∈ T,
(b) U ⊆ T ⇒ ⋃U ∈ T,          (0.5)
(c) U, V ∈ T ⇒ U ∩ V ∈ T.

A member of T is said to be T-open, or simply open if there is no possibility of confusion.


A set X with a topology T is called a topological space and is denoted by XT . In the
absence of ambiguity we omit the subscript T . We shall occasionally also use the notation O
to denote the collection of open sets of a topology.
A metric space is an important example of a topological space. The open sets here are
unions of open balls. The topology of a metric space X is called the metric topology of X.
In particular, Rd with the Euclidean metric is a topological space, its topology called the
usual topology of Rd . Other examples of topological spaces appear throughout the text.
As in the case of metric spaces, a subset of a topological space is said to be closed if it is
the complement of an open set. Proposition 0.3.1 then clearly holds for topological spaces.
If T 1 and T 2 are topologies and T 1 ⊆ T 2 , then T 1 is said to be weaker than T 2 and
T 2 stronger than T 1 . In this case we also write T 1 ≤ T 2 . Every nonempty set X has
a weakest topology, the indiscrete topology, whose only open sets are ∅ and X, and a
strongest topology, the discrete topology, for which every subset is open. The latter is
the metric topology generated by the discrete metric.
If X is a set and {T i : i ∈ I} is a family of topologies on X, then the intersection
⋂{T i : i ∈ I} is easily seen to be a topology on X. In particular, if S is a family of subsets
of X, then the intersection T(S) of all topologies containing S (the discrete topology being one
such) is well-defined and is the weakest topology relative to which all members of S are
open. T(S) is called the topology generated by S. For example, the usual topology of R
is generated by the finite open intervals with rational endpoints.
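
For a finite set X the generated topology T(S) can be computed directly: close S under finite intersections, then close the result under unions. The following Python sketch illustrates this under that finiteness assumption; the function name generated_topology and the sample sets are illustrative choices and do not come from the text.

```python
def generated_topology(points, subbase):
    """Weakest topology on the finite set `points` in which every member of `subbase` is open."""
    X = frozenset(points)
    # Step 1: all finite intersections of members of the subbase
    # (the empty intersection is taken to be X itself).
    base = {X}
    for S in subbase:
        S = frozenset(S)
        base |= {B & S for B in base}
    # Step 2: all unions of members of the base (the empty union is the empty set).
    topology = {frozenset()}
    for B in base:
        topology |= {U | B for U in topology}
    return topology


# Hypothetical example: X = {1, 2, 3} and S = {{1, 2}, {2, 3}}.
T = generated_topology({1, 2, 3}, [{1, 2}, {2, 3}])
print(sorted(sorted(U) for U in T))
```

In this example T(S) consists of ∅, {2}, {1, 2}, {2, 3}, and X; with S empty the function returns the indiscrete topology.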
A topological space XT is said to be Hausdorff if distinct points in X can be separated
by open sets, that is, if for each pair of distinct points x, y ∈ X, there exist disjoint open
sets U and V with x ∈ U and y ∈ V . For example, a metric space is a Hausdorff space, since
x and y may be separated by the open balls Br (x) and Br (y), where 0 < r ≤ d(x, y)/2.
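
On a finite set the Hausdorff condition can be checked by brute force. The sketch below is only an illustration; the name is_hausdorff and the two sample topologies on {1, 2} are assumptions made for the example.

```python
from itertools import combinations


def is_hausdorff(points, topology):
    """True if every pair of distinct points has disjoint open neighborhoods."""
    opens = [frozenset(U) for U in topology]
    return all(
        any(x in U and y in V and not (U & V) for U in opens for V in opens)
        for x, y in combinations(list(points), 2)
    )


discrete = [set(), {1}, {2}, {1, 2}]    # every subset is open
sierpinski = [set(), {1}, {1, 2}]       # the point 2 has X as its only neighborhood
print(is_hausdorff({1, 2}, discrete), is_hausdorff({1, 2}, sierpinski))
```

The discrete topology separates the two points by {1} and {2}; in the second topology every open set containing 2 also contains 1, so no separation exists.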
A base for a topology is a collection of open sets U such that every open set is a union
of members of U. A topological space is said to be second countable if it has a countable
base. For example, Rd is second countable, as may be seen by considering the open balls
B1/n (x) where x has rational coordinates.
The interior, closure, and boundary of a subset of a topological space are defined
exactly as in the case of a metric space, as are the notions of dense subspace and
separability. (Generally speaking, a metric concept phrased entirely in terms of open sets
typically has a valid topological analog.)
0.5.1 Proposition. A second countable topological space X is separable. A separable metric
space is second countable.

Proof. Let (Un ) be a countable base and xn ∈ Un . For any open neighborhood U of x, there
exists n such that Un ⊆ U , hence xn ∈ U . Therefore, (xn ) is dense in X.
Now let X be a metric space with countable dense set {x1 , x2 , . . .}. The collection
B := {B1/m (xn ) : m, n ∈ N} is then countable. We show that every nonempty open set U
is a union of members of B. Let x ∈ U and choose m such that B2/m (x) ⊆ U . Next, choose
xn ∈ B1/m (x). Then x ∈ B1/m (xn ) ⊆ B2/m (x) ⊆ U . Therefore, U is a union of the balls
B1/m (xn ).

Neighborhood Systems
The notion of neighborhood of a point x in a topological space X is defined as in the
case of a metric space, namely as a superset of an open set containing x. The collection of
all neighborhoods of x is called the neighborhood system at x and is denoted by N(x).
Neighborhood systems clearly have the following properties:

(a) X ∈ N(x) ∀ x ∈ X.
(b) N ∈ N(x) ⇒ x ∈ N.
(c) N ∈ N(x) and M ⊇ N ⇒ M ∈ N(x). (0.6)
(d) N1 , N2 ∈ N(x) ⇒ N1 ∩ N2 ∈ N(x).
(e) N ∈ N(x) ⇒ there exists U ∈ N(x) with U ⊆ N such that U ∈ N(y) ∀ y ∈ U.

The following is a converse. It allows a topological space X to be defined “locally”, that
is, by specifying a neighborhood system at each point of X.
0.5.2 Proposition. Let X be a nonempty set and for each x ∈ X let N(x) be a collection of
subsets of X for which properties (a) – (e) of (0.6) hold. Then there exists a unique topology T
on X such that (i) N(x) is the T-neighborhood system at x and (ii) x ∈ U ∈ T ⇒ U ∈ N(x).

Proof. Let T be the collection of all sets U such that either U = ∅ or U ∈ N(x) for each
x ∈ U . By (a), X ∈ T, and, by (c) and (d), T is closed under arbitrary unions and finite
intersections. Therefore, T is a topology for X satisfying (ii).
Now let {NT (x) : x ∈ X} be the T-neighborhood system. If M ∈ NT (x) and U is open
with x ∈ U ⊆ M then, by definition of T, U ∈ N(x), hence, by (c), M ∈ N(x). Conversely,
if N ∈ N(x), then the set U in (e) is in T, hence N ∈ NT (x). Therefore, N(x) = NT (x).
To prove uniqueness, let T′ be a topology satisfying (i) and (ii). If x ∈ V ∈ T′, then
V ∈ N(x) by (ii), hence there exists U ∈ T such that x ∈ U ⊆ V . Therefore, V is a union of
T-open sets and so is T-open. This shows that T′ ⊆ T. Similarly, T ⊆ T′.

Neighborhood Bases
Let X be a topological space. A neighborhood base at x ∈ X is a subset B(x) of N(x)
such that every member of N(x) contains a member of B(x). For example, the collection of
open neighborhoods of x is clearly a neighborhood base at x.
If each x ∈ X has a neighborhood base B(x), then the resulting system {B(x) : x ∈ X}
has the following properties, derived from those of N(x):
(a) B ∈ B(x) ⇒ x ∈ B.
(b) B1 , B2 ∈ B(x) ⇒ there exists B3 ∈ B(x) with B3 ⊆ B1 ∩ B2 .          (0.7)
(c) B ∈ B(x) ⇒ there exists U ∈ B(x) with U ⊆ B such that U contains a
member of B(y) for each y ∈ U .

Here is a converse based on 0.5.2:



0.5.3 Proposition. Let X be a nonempty set and for each x ∈ X let B(x) be a collection
of subsets of X with properties (a) – (c) of (0.7). Then there exists a unique topology T on
X such that (i) B(x) is a neighborhood base at x and (ii) every open set is a neighborhood
of each of its points.
Proof. Let N(x) be the collection of all supersets of members of B(x). Then N(x) satisfies
the conditions (a) – (e) of (0.6), and the assertions follow from 0.5.2.

A topological space is said to be first countable if each point x has a countable
neighborhood base. A metric space is first countable; for example, the collection of open (or
closed) balls at x with radii 1/n (n ∈ N) is a countable neighborhood base.

Relative Topology
If Y is a subset of a topological space XT , then the trace T ∩ Y is a topology called the
relative topology of Y . The collection of closed sets in Y is easily seen to be the trace
of the collection of closed sets in X. Open (closed) sets of Y are frequently referred to as
relatively open (closed). The neighborhood system of y ∈ Y is the trace on Y of the
T-neighborhood system of y. If B(y) is a T-neighborhood base at y ∈ Y , then B(y) ∩ Y
is a neighborhood base at y. For example, the collection of intervals [0, 1/n) (n ∈ N) is a
neighborhood base at 0 in the relative topology of [0, 1].
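
The trace construction is easy to check on a small finite example. The following sketch is illustrative only (the function name and the sample topology are assumptions); it computes the relative topology of a subset and confirms that the relatively closed sets are exactly the traces of the closed sets of X.

```python
def relative_topology(topology, Y):
    """Trace of `topology` on the subset Y: all sets U ∩ Y with U open."""
    Y = frozenset(Y)
    return {frozenset(U) & Y for U in topology}


X = frozenset({1, 2, 3})
T = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]   # a topology on X
Y = frozenset({1, 3})

open_in_Y = relative_topology(T, Y)
# Closed sets of Y computed two ways: as complements in Y of relatively open
# sets, and as traces on Y of the closed sets of X.
closed_in_Y = {Y - V for V in open_in_Y}
trace_of_closed = {(X - frozenset(U)) & Y for U in T}
print(closed_in_Y == trace_of_closed)
```

Here the relative topology of Y = {1, 3} turns out to be discrete, even though the topology of X is not.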

Nets
A directed set is a nonempty set A together with a relation ⪯ that is reflexive, transitive,
and has the property that every pair of elements has an upper bound. For example, the
neighborhood system of a point x in a topological space is directed by reverse inclusion,
that is,
Nx ⪯ Mx iff Mx ⊆ Nx .
The collection of all partitions of an interval [a, b] is directed by inclusion:

P ⪯ Q iff Q is a refinement of P.

The Cartesian product A × B of directed sets A and B is directed by the product ordering

(a1 , b1 ) ⪯ (a2 , b2 ) iff a1 ⪯ a2 and b1 ⪯ b2 .


A net is a function whose domain A is a directed set. We shall use notation such as (xα )A ,
or simply (xα ), for a net with values xα . A net (xα ) in a set X is said to be eventually in
a subset E of X if there exists a β such that xα ∈ E for all α ⪰ β. The net is frequently
in E if for each α ∈ A there exists a β ⪰ α such that xβ ∈ E. A net (xα ) in a topological
space XT is said to T-converge to x ∈ X if (xα ) is eventually in every neighborhood of x.
In this case we write T-limα xα = x or $x_\alpha \xrightarrow{T} x$. In the absence of ambiguity, we usually drop
the symbol T from this notation. The reader may easily verify that in a Hausdorff space
limits are unique.
An infinite sequence in a metric space X is a net directed by the set N with the usual
order ≤. Net convergence in this case is simply sequential convergence. If (xn ) is a Cauchy
sequence in X, then under the product ordering of N × N the distances d(xm , xn ) form a
net which converges to zero. For another example, consider the set of all tagged partitions
(P, ξ) of an interval [a, b] directed by

(Q, ζ) ⪯ (P, ξ) iff ‖P‖ ≤ ‖Q‖ .



If f : [a, b] → R is Riemann integrable, then the Riemann sums S(f, P, ξ) form a net such
that $\lim_{(P,\xi)} S(f, P, \xi) = \int_a^b f(x)\,dx$. (See §3.3.)
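
As a numerical illustration of the net of Riemann sums, the sketch below evaluates S(f, P, ξ) along one particular chain of tagged partitions whose meshes shrink to zero. The function names, the choice f(x) = x², and the use of uniform partitions with midpoint tags are assumptions made for the example; a finite computation of this kind can only sample the net, not establish its convergence.

```python
def riemann_sum(f, points, tags):
    """Riemann sum S(f, P, xi) for the partition `points` with one tag per subinterval."""
    return sum(f(t) * (b - a) for a, b, t in zip(points, points[1:], tags))


def uniform_tagged_partition(a, b, n):
    """Partition of [a, b] into n equal subintervals, tagged at the midpoints."""
    points = [a + (b - a) * k / n for k in range(n + 1)]
    tags = [(p + q) / 2 for p, q in zip(points, points[1:])]
    return points, tags


def f(x):
    return x * x          # the integral of f over [0, 1] is 1/3


errors = []
for n in (10, 100, 1000):
    points, tags = uniform_tagged_partition(0.0, 1.0, n)
    errors.append(abs(riemann_sum(f, points, tags) - 1 / 3))
print(errors)
```

The errors decrease as the mesh 1/n decreases, consistent with the Riemann sums being eventually in every neighborhood of the integral.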
Many properties of sequential convergence in a metric space carry over to nets in general
topological spaces. In fact, the notion of net was introduced to describe convergence in
topological spaces that are not first countable. Here is the net analog of 0.3.5.
0.5.4 Proposition. Let X be a topological space and E ⊆ X. Then x ∈ cl(E) iff there
exists a net (xα ) in E converging to x.
Proof. If x ∈ cl(E), then N ∩ E ≠ ∅ for each neighborhood N of x. Choosing xN ∈ N ∩ E
and directing the neighborhood system at x by reverse inclusion, we obtain a net in E
converging to x. Conversely, if x ∉ cl(E) then every net converging to x is eventually in the
open set $\operatorname{cl}(E)^c \subseteq E^c$, hence no net in E can converge to x.

The notion of subsequence has the following net counterpart: A net (yβ )B is a subnet of
a net (xα )A if there exists a function $\beta \mapsto \alpha_\beta : B \to A$ such that
(i) $y_\beta = x_{\alpha_\beta}$.

(ii) For each α0 ∈ A there exists β0 ∈ B such that $\alpha_\beta \succeq \alpha_0$ for all β ⪰ β0 .

While this generalizes the notion of subsequence, it should be noted that a subnet of a
sequence need not be a subsequence. (Consider a subnet (xnβ ) of a sequence (xn ) with
β(1) = β(2) = 1, β(3) = β(4) = 2, etc.)
A point x in a topological space X is said to be a cluster point of a net (xα ) if (xα ) is
frequently in every neighborhood of x. The connection between cluster points and subnets is
analogous to the connection between cluster points and subsequences in a metric space:

0.5.5 Proposition. A net (xα )A in a topological space X has a cluster point x iff (xα ) has
a subnet converging to x.
Proof. Let $(x_{\alpha_\beta})$ be a subnet of (xα ) converging to x. If N is a neighborhood of x, then there
exists β1 such that $x_{\alpha_\beta} \in N$ for all β ⪰ β1 . This implies that xα is frequently in N . Indeed,
by definition of subnet, for each α0 ∈ A there exists β0 ∈ B such that $\beta \succeq \beta_0 \Rightarrow \alpha_\beta \succeq \alpha_0$.
Thus if β ⪰ β0 and β ⪰ β1 , then $\alpha_\beta \succeq \alpha_0$ and $x_{\alpha_\beta} \in N$.
Conversely, let x be a cluster point of (xα ) and let N(x) be the neighborhood system at
x directed by reverse inclusion. Direct the pairs (α, N ) ∈ A × N(x) by the product ordering.
For each (γ, N ), the net (xα ) is frequently in N , hence there exists α(γ, N ) ⪰ γ such that
$x_{\alpha(\gamma,N)} \in N$. Then $(x_{\alpha(\gamma,N)})_{A\times N(x)}$ is the required subnet. Indeed, for any α0 ∈ A and
N0 ∈ N(x), $(\gamma, N) \succeq (\alpha_0, N_0) \Rightarrow \alpha(\gamma, N) \succeq \gamma \succeq \alpha_0 \Rightarrow x_{\alpha(\gamma,N)} \in N \subseteq N_0$.

0.5.6 Corollary. Let X have topologies T and T′. Then T′ ≤ T iff every net (xα ) that
T-converges to some member x of X also T′-converges to x.
Proof. The necessity is clear, since every T′-neighborhood of x is a T-neighborhood of x.
For the sufficiency, suppose for a contradiction that C is T′-closed but not T-closed. Take
any x ∈ clT (C) \ C and for each T-neighborhood N of x choose xN ∈ C ∩ N . Then xN → x
in XT , hence also in XT′ . But then x ∈ C.

0.6 Continuity in Topological Spaces


Definition and General Properties
Let X and Y be topological spaces. Motivated by 0.3.6, we say that a function f : X → Y
is continuous at a point x ∈ X if for each neighborhood Nf (x) of f (x) there exists a
neighborhood Nx of x such that f (Nx ) ⊆ Nf (x) . If f is continuous at each point of a subset
E of X, then f is said to be continuous on E. Note that this is not the same as the
assertion that the restriction f|E is continuous. For example, the indicator function 1Z : R → {0, 1} is not
continuous at any integer, yet its restriction to Z is continuous in the relative topology. If f
is continuous on X, we say that f is continuous. A continuous function f : X → Y with a
continuous inverse f −1 : Y → X is called a homeomorphism.
A function f : X → Y is said to be open if f (U ) is open in Y for each open subset U
of X. An open map need not be continuous (take Y to be discrete for suitable X), and a
continuous map need not be open (take X to be discrete). Obviously, a continuous bijection
is a homeomorphism iff it is open.
0.6.1 Proposition. A function f : X → Y is continuous at x iff f (xα ) → f (x) for each
net (xα ) in X that converges to x.

Proof. Suppose that f is continuous at x and let xα → x. Given Nf (x) choose Nx such that
f (Nx ) ⊆ Nf (x) . Next, choose α0 so that xα ∈ Nx for all α ⪰ α0 . For such α, f (xα ) ∈ Nf (x) .
Therefore, f (xα ) → f (x).
Conversely, if f is not continuous at x, then there exists a neighborhood Nf (x) such that
f (N ) ⊄ Nf (x) for all N ∈ N(x). For each N choose xN ∈ N so that f (xN ) ∉ Nf (x) . Then
the net (xN ) converges to x, but f (xN ) is never in Nf (x) .

The next result is proved exactly as in the metric case (0.3.8), except that in the proof of
(d) ⇒ (a) one must use nets instead of sequences. We leave the details to the reader.
0.6.2 Theorem. The following statements are equivalent:
(a) f is continuous.

(b) f⁻¹(V ) is open in X for each open subset V of Y .

(c) f⁻¹(C) is closed in X for each closed subset C of Y .

(d) f (cl A) ⊆ cl(f (A)) for each subset A of X.

0.6.3 Corollary. Let X and Y be topological spaces and let the topology of Y be generated
by a collection S of subsets of Y . Then a function f : X → Y is continuous iff f −1 (U ) is
open in X for each U ∈ S.
Proof. The necessity is clear. For the sufficiency, assume f −1 (U ) is open in X for each U ∈ S.
Then the collection of all sets V ⊆ Y for which f −1 (V ) is open in X is a topology containing
S and so contains T(S).

For example, a real-valued function f on a topological space is continuous iff f⁻¹((−∞, a))
and f⁻¹((a, ∞)) are open for all rational numbers a.
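
The criterion of 0.6.3 can be tried out on finite spaces, where it reduces to checking the preimages of the generating sets only. In the sketch below the names preimage and is_continuous, the spaces, and the maps f and g are all hypothetical choices made for illustration; maps are represented as dictionaries.

```python
def preimage(f, domain, V):
    """Set of points of `domain` that the map f (a dict) sends into V."""
    return frozenset(x for x in domain if f[x] in V)


def is_continuous(f, domain, open_sets_X, generators_Y):
    """Check continuity by testing preimages of the generating sets only (0.6.3)."""
    opens = {frozenset(U) for U in open_sets_X}
    return all(preimage(f, domain, V) in opens for V in generators_Y)


X = {1, 2, 3}
T_X = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]   # a topology on X
S_Y = [{"a"}]   # generates the topology {∅, {a}, {a, b}} on Y = {a, b}

f = {1: "b", 2: "a", 3: "b"}    # preimage of {a} is {2}, which is open
g = {1: "a", 2: "b", 3: "b"}    # preimage of {a} is {1}, which is not open
print(is_continuous(f, X, T_X, S_Y), is_continuous(g, X, T_X, S_Y))
```

Only one preimage per map has to be inspected here, although the topology on Y has three members.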

Initial Topologies
Let X be a set, Y a topological space, and F a family of maps f : X → Y . The topology
T on X generated by the sets f −1 (U ), where f ∈ F and U is open in Y , is called the initial
topology on X with respect to F.
0.6.4 Proposition. The initial topology T has the following properties:
(a) T is the weakest topology on X relative to which every member of F is continuous.

(b) For each y ∈ Y , let By be an open neighborhood base at y. For x ∈ X, let Bx denote
the collection of all sets of the form $f_1^{-1}(U_1) \cap \cdots \cap f_n^{-1}(U_n)$ containing x, where
fj ∈ F and $U_j \in \mathcal{B}_{f_j(x)}$. Then Bx is a neighborhood base at x for the initial topology.

(c) A net (xα ) in X T-converges to x iff f (xα ) → f (x) for every f ∈ F.

(d) If Z is a topological space, a mapping ϕ : Z → X is T-continuous iff f ◦ ϕ : Z → Y is
continuous for each f ∈ F.
Proof. (a) Let T′ be a topology relative to which every member of F is continuous. Then
f⁻¹(U ) ∈ T′ for all open U ⊆ Y , hence T ≤ T′.
(b) The system {Bx : x ∈ X} satisfies (0.7) and hence defines a topology T′ on X relative
to which Bx is a neighborhood base at x (0.5.3). Since every T′-open set is a union of
these basic neighborhoods and since the latter are T-open, T′ ≤ T. On the other hand, every
member of F is obviously T′-continuous, hence T ≤ T′.
(c) The necessity is clear. For the sufficiency, let (xα ) be a net in X such that f (xα ) → f (x)
for every f ∈ F. By (b), (xα ) is eventually in any basic neighborhood of x.
(d) For the sufficiency, let f ◦ ϕ : Z → Y be continuous for each f ∈ F. Then for any open
set U ⊆ Y and f ∈ F, ϕ−1 (V ) is open in Z, where V := f −1 (U ). Since the sets V generate
the topology of X, ϕ is continuous by 0.6.3.
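
For finite sets the initial topology can be computed explicitly from its definition: take all preimages f⁻¹(U) as generators and form the generated topology. The sketch below assumes finite X and Y; the helper names and the two sample maps (parity and half) are illustrative and not taken from the text.

```python
def generated_topology(points, subbase):
    """Weakest topology on the finite set `points` containing every set in `subbase`."""
    X = frozenset(points)
    base = {X}                      # finite intersections of subbase members
    for S in subbase:
        S = frozenset(S)
        base |= {B & S for B in base}
    topology = {frozenset()}        # unions of base members
    for B in base:
        topology |= {U | B for U in topology}
    return topology


def initial_topology(X, maps, open_sets_Y):
    """Topology on X generated by the sets f^{-1}(U), f in `maps`, U open in Y."""
    X = frozenset(X)
    subbase = [frozenset(x for x in X if f(x) in U) for f in maps for U in open_sets_Y]
    return generated_topology(X, subbase)


def parity(x):
    return x % 2


def half(x):
    return x // 2


X = range(4)
discrete_Y = [set(), {0}, {1}, {0, 1}]           # discrete topology on Y = {0, 1}
T1 = initial_topology(X, [parity], discrete_Y)
T2 = initial_topology(X, [parity, half], discrete_Y)
print(len(T1), len(T2))
```

With the single map parity the initial topology has only the four open sets ∅, {0, 2}, {1, 3}, X. Adding half separates all four points, so the initial topology becomes discrete; enlarging F can only strengthen the topology, in line with part (a).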

Product Topology
Let {Xi : i ∈ I} be a family of topological spaces and set $X := \prod_{i\in I} X_i$. The product
topology on X is the initial topology with respect to the family of projection mappings
πi : X → Xi . By 0.6.4(c), a net (fα ) in X converges to f in this topology iff fα (i) =
πi (fα ) → πi (f ) = f (i) for each i ∈ I. For this reason the product topology is also called
the topology of pointwise convergence on I. Note that the product topology on Rd is
simply the topology defined by the Euclidean metric.

Final Topologies
Let X be a topological space, Y a nonempty set and F a family of maps f : X → Y .
The collection T of all subsets V of Y such that f −1 (V ) is open in X for each f ∈ F is a
topology called the final topology on Y with respect to F.
0.6.5 Proposition. The final topology T has the following properties:
(a) T is the strongest topology on Y relative to which every member of F is continuous.

(b) If Z is a topological space, then a function ϕ : Y → Z is continuous iff ϕ ◦ f : X → Z
is continuous for every f ∈ F.
Proof. (a) This follows immediately from the definition of T.

(b) For the sufficiency, let U be open in Z. Then f⁻¹(ϕ⁻¹(U )) = (ϕ ◦ f )⁻¹(U ) is open in
X for every f ∈ F, hence ϕ⁻¹(U ) is open in Y .

Quotient Topology
Let X be a topological space and ∼ an equivalence relation on X. The final topology on
X/ ∼ with respect to the quotient map Q : X → X/ ∼ is called the quotient topology.
The open sets of X/ ∼ are precisely those collections V of equivalence classes [x] such that
$Q^{-1}(V) = \bigcup_{[x]\in V} [x]$ is open in X. Quotient topologies play an important role in the theory
of normed linear spaces (see §8.4).
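
The description of the open sets of X/∼ translates directly into a finite computation: a collection of equivalence classes is open exactly when its union is open in X. The sketch below is a brute-force illustration for small finite spaces; the function name and the sample data are assumptions.

```python
from itertools import combinations


def quotient_topology(open_sets_X, classes):
    """Collections V of classes whose union Q^{-1}(V) is open in X."""
    classes = [frozenset(c) for c in classes]
    opens = {frozenset(U) for U in open_sets_X}
    result = set()
    for r in range(len(classes) + 1):
        for V in combinations(classes, r):
            union = frozenset().union(*V)      # Q^{-1}(V)
            if union in opens:
                result.add(frozenset(V))
    return result


T_X = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]    # a topology on X = {1, 2, 3}
classes = [{1, 3}, {2}]                          # identify the points 1 and 3
T_Q = quotient_topology(T_X, classes)
print(len(T_Q))
```

In this example the class {2} is an open point of the quotient, while the class {1, 3} is not, since {1, 3} is not open in X.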

The Space of Continuous Functions


Let X and Y be topological spaces. The set of continuous functions f : X → Y is denoted
by C(X, Y ). If Y = C, we use the simpler notation C(X). Thus

C(X) := the space of all continuous functions f : X → C.

Clearly, C(X) is closed under addition and multiplication and so is an algebra. We define
the related space of all bounded, continuous functions f : X → C by

Cb (X) := C(X) ∩ B(X).

As the uniform limit of a sequence of continuous functions is continuous, Cb (X) is a closed
subalgebra of the Banach algebra B(X) under the supremum norm. Thus Cb (X) is a Banach
algebra.

Notation. For spaces such as C((0, 1)), etc., we usually omit the outer parentheses and
write instead C(0, 1). This convention holds for other function spaces as well.

F -sigma and G-delta Sets


A countable union of closed sets in a topological space X is called an Fσ -set. Dually, a
countable intersection of open sets is called a Gδ -set. For example, the half-open interval
(a, b] is both an Fσ -set and a Gδ -set, the set of rationals is an Fσ set, and the set of irrationals
is a Gδ set. By DeMorgan’s laws, a set is Fσ iff its complement is Gδ . In a metric space
every closed set C is a Gδ set and every open set U is an Fσ set, as may be seen from
\[ C = \bigcap_{n} \{x \in X : d(x, C) < 1/n\} \quad \text{and} \quad U = \bigcup_{n} \{x \in X : d(x, U^c) \ge 1/n\}. \]

Proposition 0.6.7 below characterizes the set of continuity points of a function as a Gδ
set. For the proof we need
0.6.6 Lemma. Let X be a topological space, (Y, d) a metric space, and f : X → Y a function.
For each x ∈ X, define
\[ F(x) = \inf_{U} \sup\{ d(f(x'), f(x'')) : x', x'' \in U \}, \]
where the infimum is taken over all open neighborhoods U of x. Then for each r > 0 the set
Wr := {x ∈ X : F (x) < r} is open. Thus W0 := {x ∈ X : F (x) = 0} is a Gδ set.
Proof. Let x0 ∈ Wr and choose an open neighborhood U of x0 such that
\[ \sup\{ d(f(x'), f(x'')) : x', x'' \in U \} < r. \]
Then F (x) < r for all x ∈ U , that is, U ⊆ Wr . Therefore, Wr is open and so $W_0 = \bigcap_{n=1}^{\infty} W_{1/n}$
is a Gδ .
0.6.7 Proposition. Let X be a topological space and (Y, d) a metric space. Then the set of
points where a function f : X → Y is continuous is a Gδ set.

Proof. We claim that W0 is the set of continuity points of f . Indeed, f is continuous at x iff
for each ε > 0 there exists a neighborhood U of x such that d(f (x′), f (x)) < ε for all x′ ∈ U iff for
each ε > 0 there exists a neighborhood U of x such that d(f (x′), f (x″)) < ε for all x′, x″ ∈ U iff
F (x) < ε for all ε iff F (x) = 0.
From the proposition we see that no function f : R → R can be continuous precisely at
the rationals. The reader may easily find examples of functions that are continuous precisely
at the irrationals. (See Ex. 2.20.)

0.7 Normal Topological Spaces


A Hausdorff topological space is said to be normal if every pair of disjoint closed subsets
A and B may be separated by open sets U and V , that is, A ⊆ U , B ⊆ V , and U ∩ V = ∅.
In this section we describe the two most important properties of normal spaces. The first
shows that an ostensibly stronger separation property holds. For its proof we need the
following lemma.
0.7.1 Lemma. Let X be normal, C closed, and U open with C ⊆ U . Then there exists an
open set V such that C ⊆ V ⊆ cl(V ) ⊆ U .
Proof. Since C and U c are disjoint closed sets, by normality there exist disjoint open sets
V ⊇ C and W ⊇ U c . If x ∈ cl(V ) ∩ U c , then W , as a neighborhood of x, meets V , a
contradiction. Therefore, cl(V ) ⊆ U .

Urysohn’s Lemma
0.7.2 Theorem (Urysohn). If X is a normal topological space and A and B are disjoint
closed subsets, then there exists a continuous function f : X → [a, b] such that f = a on A
and f = b on B.
Proof. We may assume a = 0 and b = 1 (otherwise, replace f by (f − a)/(b − a)). Let
$D := \{r = k2^{-n} : n \in \mathbb{N},\ 0 < k < 2^n\}$, the set of dyadic rational numbers in (0, 1). We show
by induction on n that there exists a family of open sets Ur indexed by members r of D
such that
\[ A \subseteq U_r \subseteq \operatorname{cl} U_r \subseteq U_s \subseteq B^c \quad \text{for all } r, s \in D \text{ with } r < s. \tag{†} \]
By 0.7.1, there exists an open set U1/2 such that $A \subseteq U_{1/2} \subseteq \operatorname{cl} U_{1/2} \subseteq B^c$. This defines
Ur for the case k = n = 1. Now assume that sets Ur have been constructed for $r = k/2^n$
($0 < k < 2^n$). Since $k/2^n = 2k/2^{n+1}$, it remains to construct Ur for $r = (2k + 1)/2^{n+1}$. But
since $\operatorname{cl} U_{k2^{-n}} \subseteq U_{(k+1)2^{-n}}$, there exists by 0.7.1 an open set Ur such that
\[ A \subseteq \operatorname{cl} U_{k2^{-n}} \subseteq U_r \subseteq \operatorname{cl} U_r \subseteq U_{(k+1)2^{-n}} \subseteq B^c , \]
establishing (†).
Now set U1 = X and define f on X by f (x) = inf{r ∈ D : x ∈ Ur }. Obviously, 0 ≤ f ≤ 1.
Also, since no member of B is in Ur for r < 1, f (B) = 1. Moreover, since A ⊆ Ur for all r,
f (A) = 0. To see that f is continuous, let 0 < t < 1 and note that f (x) < t iff x ∈ Ur for
some r < t, and f (x) > t iff x ∉ cl(Ur ) for some r > t. Thus we have open sets
\[ \{f < t\} = \bigcup_{r<t} U_r \quad \text{and} \quad \{f > t\} = \bigcup_{r>t} (\operatorname{cl} U_r)^c . \]

Since the intervals (−∞, t), (t, ∞) generate the topology of R, f is continuous by 0.6.3.

Tietze Extension Theorem


Here is one of the main applications of Urysohn’s lemma. A variation of the theorem
is given in the section on locally compact spaces.
0.7.3 Theorem (Tietze). If X is a normal topological space and Y ⊆ X is closed, then for
each continuous f : Y → [a, b] there exists a continuous F : X → [a, b] such that F|Y = f .
Proof. We may assume a = 0 and b = 1. To construct F we first construct inductively a
sequence of continuous functions gn : X → [0, 1] such that
\[ \text{(i)}\ \ 0 \le g_n \le 2^{n-1}/3^n \ \text{ on } X \quad \text{and} \quad \text{(ii)}\ \ 0 \le f - \sum_{j=1}^{n} g_j \le (2/3)^n \ \text{ on } Y. \tag{†} \]
To obtain g1 , note that the disjoint subsets f⁻¹([0, 1/3]) and f⁻¹([2/3, 1]) of Y are closed
in X, hence by Urysohn’s lemma there exists a continuous function g1 : X → [0, 1/3] with
g1 = 0 on f⁻¹([0, 1/3]) and g1 = 1/3 on f⁻¹([2/3, 1]). Thus (†) holds for n = 1. Now
assume that g1 , . . . , gn satisfy (†). By Urysohn’s lemma again, there exists a continuous
function $g_{n+1} : X \to [0, 2^n/3^{n+1}]$ such that $g_{n+1} = 0$ whenever $f - \sum_{j=1}^{n} g_j \le 2^n/3^{n+1}$,
and $g_{n+1} = 2^n/3^{n+1}$ whenever $f - \sum_{j=1}^{n} g_j \ge (2/3)^{n+1}$, completing the construction. Now
set $F = \sum_{n=1}^{\infty} g_n$. By (i), the convergence is uniform, so F is continuous. By (ii), F = f
on Y .
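
The arithmetic behind (†) can be checked exactly with rational numbers. The following sketch (the helper names g_bound and gap_bound are illustrative) verifies that the two thresholds used in the inductive step fit together, that subtracting the largest admissible value of g_{n+1} from the old gap bound gives the new one, and that the bounds in (i) sum to 1 − (2/3)^n, which is what makes the series converge uniformly.

```python
from fractions import Fraction


def g_bound(n):
    """Upper bound 2^(n-1)/3^n for g_n from condition (i)."""
    return Fraction(2 ** (n - 1), 3 ** n)


def gap_bound(n):
    """Upper bound (2/3)^n for f - (g_1 + ... + g_n) on Y from condition (ii)."""
    return Fraction(2, 3) ** n


for n in range(1, 30):
    # The upper threshold (2/3)^(n+1) is exactly twice the lower one, 2^n/3^(n+1).
    assert 2 * g_bound(n + 1) == gap_bound(n + 1)
    # Removing the maximal value of g_{n+1} from the old gap bound yields the new one.
    assert gap_bound(n) - g_bound(n + 1) == gap_bound(n + 1)
    # Partial sums of the bounds in (i): sum_{j<=n} 2^(j-1)/3^j = 1 - (2/3)^n.
    assert sum(g_bound(j) for j in range(1, n + 1)) == 1 - gap_bound(n)
```

Since the partial sums of the bounds increase to 1, the Weierstrass M-test gives the uniform convergence used in the proof, and F takes values in [0, 1].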
A Hausdorff topological space X is completely regular if for each closed set Y ⊆ X and each
x ∈ X \ Y there exists a continuous function f : X → [0, 1] such that f = 0 on Y and f (x) = 1.
It follows from Urysohn’s lemma that every normal space is completely regular. The notion
of complete regularity finds special importance in the Stone-Čech compactification theorem
(§13.4).

0.8 Compact Topological Spaces


A collection U of open subsets of a topological space X such that ⋃U = X is called an
open cover of X. A subcollection of U that is a cover of X is called a subcover. A space
X is said to be compact if each open cover of X has a finite subcover. A subset Y of X is
compact if it is compact in the relative topology, that is, if for each collection U of open
sets in X with Y ⊆ ⋃U there exists a finite subcollection U0 such that Y ⊆ ⋃U0 . A subset Y of X is
relatively compact if its closure is compact.
Finite subsets of a topological space are obviously compact. In a discrete topological space
X these are the only compact sets, since any nonempty set has an open cover of singletons.
A family A of subsets of a set X is said to have the finite intersection property (f.i.p.)
if every finite subcollection of A has a nonempty intersection. The following is a useful
characterization of compactness in terms of this notion.
0.8.1 Proposition. A topological space X is compact iff $\bigcap_{A\in\mathcal{A}} \operatorname{cl}(A) \ne \emptyset$ for every collection
𝒜 of subsets of X with the f.i.p.
Proof. Let X be compact and 𝒜 a collection of subsets of X. If $\bigcap_{A\in\mathcal{A}} \operatorname{cl}(A) = \emptyset$, then
the collection of complements $\operatorname{cl}(A)^c$ is an open cover of X and so has a finite subcover
$\{\operatorname{cl}(A_1)^c, \ldots, \operatorname{cl}(A_n)^c\}$. Then $\bigcap_{j=1}^{n} A_j \subseteq \bigcap_{j=1}^{n} \operatorname{cl}(A_j) = \emptyset$, hence 𝒜 does not have the f.i.p.
Conversely, if X is not compact, then there exists an open cover U with no finite subcover.
Taking C to be the collection of complements of members of U, we see that C has the finite
intersection property but has empty intersection.
0.8.2 Proposition. A compact subset of a Hausdorff space X is closed.
Proof. Let A ⊆ X be compact. We show that $A^c$ is open. Let $b \in A^c$. For each x ∈ A, let Mx
and Nx be disjoint open neighborhoods of x and b, respectively. Then {Mx : x ∈ A} is an
open cover of A, hence there exists a finite subset A0 of A such that $U_b := \bigcup_{x\in A_0} M_x \supseteq A$.
Set $V_b := \bigcap_{x\in A_0} N_x$. Then Vb is a neighborhood of b, and since Vb ∩ Mx = ∅ for every x ∈ A0 ,
$V_b \subseteq A^c$. Therefore $A^c$ is open.
0.8.3 Proposition. A compact subset Y of a metric space (X, d) is bounded.
Proof. Fix y ∈ Y . The collection of open balls Bn (y) with center y ∈ Y and radius n ∈ N is
an open cover of Y and so has a finite subcover. Therefore, Y ⊆ Bn (y) for some n.
0.8.4 Proposition. A closed subset of a compact space X is compact.
Proof. Let Y ⊆ X be closed. If U is a cover of Y by open sets of X, then enlarging U by
including the open set X \ Y results in an open cover of X. Since X is compact, there exist
U1 , . . . , Un ∈ U such that $X = (X \setminus Y) \cup \bigcup_j U_j$. Then $Y \subseteq \bigcup_j U_j$.
0.8.5 Corollary. Let X have topologies T 1 ≤ T 2 such that (X, T 2 ) is compact and (X, T 1 )
is Hausdorff. Then T 1 = T 2 .
Proof. Let C be T 2 -closed. By 0.8.4, C is T 2 -compact hence T 1 -compact. By 0.8.2, C is
T 1 -closed. Therefore, T 1 and T 2 have the same closed sets and so are equal.
The following proposition asserts that disjoint compact sets in a Hausdorff space may be
separated by open sets.
0.8.6 Proposition. Let A and B be disjoint compact subsets of a Hausdorff space X. Then
there exist disjoint open sets U and V with A ⊆ U and B ⊆ V .
Proof. By the proof of 0.8.2, for each b ∈ B there exist disjoint open sets Ub ⊇ A and Vb ∋ b.
Then {Vb : b ∈ B} is an open cover of B, so by compactness there exists a finite set B0 ⊆ B
such that $B \subseteq V := \bigcup_{b\in B_0} V_b$. Now set $U := \bigcap_{b\in B_0} U_b$.
From 0.8.4 and 0.8.6 we have the following:
0.8.7 Corollary. A compact Hausdorff space is normal.

Convergence in Compact Spaces


Here is an important characterization of compactness in terms of nets.
0.8.8 Theorem. A topological space X is compact iff each net in X has a convergent
subnet.
Proof. Let X be compact and let (xα )A be a net in X. By 0.5.5 it suffices to show that (xα )
has a cluster point in X. For α ∈ A, define Eα := {xβ : β ≥ α}. Since every finite subset
of A has an upper bound, the collection {Eα : α ∈ A} has the f.i.p. By compactness, there
exists an $x \in \bigcap_{\alpha} \operatorname{cl} E_\alpha$. Thus if N is a neighborhood of x then N ∩ Eα ≠ ∅ for every α. By
definition of Eα , (xα ) is frequently in N . Therefore, x is the required cluster point.
Conversely, if X is not compact, then there exists an open cover {Ui : i ∈ I} of X with
no finite subcover. Direct the finite subsets α of I upward by inclusion. For each α choose a
point $x_\alpha \in X \setminus \bigcup_{i\in\alpha} U_i$. Then (xα ) is a net in X with no cluster points. Indeed, if x ∈ X,
then x ∈ Uj for some j ∈ I, but xα ∉ Uj for all finite sets α containing j (that is, α ⪰ {j}),
hence xα is not frequently in Uj .

We shall see in the next section that in a metric space the nets in the last theorem may
be replaced by sequences.

Compactness of Cartesian Products


The following theorem is among the most powerful theorems in topology. It will have
important analytical consequences in a variety of contexts later.
0.8.9 Tychonoff’s Theorem. If {Xi : i ∈ I} is a family of compact topological spaces,
then the product space $X := \prod_{i\in I} X_i$ is compact in the product topology.
Proof. We show that $\bigcap_{A\in\mathcal{A}_0} \operatorname{cl}(A) \ne \emptyset$ for any collection 𝒜0 ⊆ P(X) with the f.i.p. Order
upward by inclusion the family 𝔄 of all collections 𝒜 ⊇ 𝒜0 with the f.i.p. It is easy to check
that every chain in 𝔄 has an upper bound, namely the union of all the collections in the
chain. By Zorn’s lemma, 𝔄 has a maximal element 𝒜. We show that $\bigcap_{A\in\mathcal{A}} \operatorname{cl}(A) \ne \emptyset$. Since
𝒜0 ⊆ 𝒜, it will follow that $\bigcap_{A\in\mathcal{A}_0} \operatorname{cl}(A) \ne \emptyset$.
Observe that
(i) A, B ∈ 𝒜 ⇒ A ∩ B ∈ 𝒜 and (ii) A ∩ B ≠ ∅ ∀ B ∈ 𝒜 ⇒ A ∈ 𝒜,
since the negation of either would allow 𝒜 to be enlarged while still retaining the f.i.p.,
contradicting the maximality of 𝒜. Let πi : X → Xi denote the projection map. Since the
collection {πi (A) : A ∈ 𝒜} has the f.i.p., by compactness of Xi there exists an
$x_i \in \bigcap_{A\in\mathcal{A}} \operatorname{cl}\big(\pi_i(A)\big)$. Thus if Ui is an open neighborhood of xi , then for every A ∈ 𝒜,
Ui ∩ πi (A) ≠ ∅ and so $\pi_i^{-1}(U_i) \cap A \ne \emptyset$. By (ii), $\pi_i^{-1}(U_i) \in \mathcal{A}$, hence by (i)
$U_F := \bigcap_{i\in F} \pi_i^{-1}(U_i) \in \mathcal{A}$ for any finite F ⊆ I. Since 𝒜 has the f.i.p., UF ∩ A ≠ ∅ for all
A ∈ 𝒜. Now set f (i) := xi , i ∈ I. Since the sets UF form a neighborhood base at f ,
$f \in \bigcap_{A\in\mathcal{A}} \operatorname{cl}(A)$.

Continuity and Compactness


0.8.10 Theorem. Let X and Y be topological spaces with X compact, and let f : X → Y
be continuous. Then the following hold:
(a) f (X) is compact.
(b) If f is a bijection and Y is Hausdorff, then f is a homeomorphism.
(c) If X and Y are metric spaces, then f : X → Y is uniformly continuous.
Proof. (a) Let $\{V_i : i \in I\}$ be an open cover of $f(X)$ in $Y$. Then $\{f^{-1}(V_i) : i \in I\}$ is an
open cover of $X$, hence there exists a finite subset $I_0$ of $I$ such that $\{f^{-1}(V_i) : i \in I_0\}$ is a
cover of $X$. It follows that $\{V_i : i \in I_0\}$ is a finite cover of $f(X)$.
(b) Let $g = f^{-1}$ and let $C$ be a closed subset of $X$. Then $C$ is compact, hence $g^{-1}(C) = f(C)$
is compact and therefore closed. By 0.6.2, $g$ is continuous.
(c) Let $X$ and $Y$ have metrics $d$ and $\rho$, respectively. Let $\varepsilon > 0$. For each $x \in X$ choose
$\gamma_x > 0$ such that
$$f\bigl(B_{\gamma_x}(x)\bigr) \subseteq B_{\varepsilon/2}\bigl(f(x)\bigr). \qquad (\dagger)$$
Set $\delta_x = \gamma_x/2$. The collection $\{B_{\delta_x}(x) : x \in X\}$ is an open cover of $X$, hence there exists a
finite set $F \subseteq X$ such that $\{B_{\delta_x}(x) : x \in F\}$ covers $X$. Let $\delta := \min_{x \in F} \delta_x$ and let $a, b \in X$
with $d(a, b) < \delta$. Choose $x \in F$ such that $a \in B_{\delta_x}(x)$. Then
$$d(x, a) < \delta_x < \gamma_x \quad \text{and} \quad d(x, b) \le d(a, b) + d(x, a) < \delta_x + \delta_x = \gamma_x,$$
so $a, b \in B_{\gamma_x}(x)$. By (†),
$$\rho\bigl(f(a), f(b)\bigr) \le \rho\bigl(f(a), f(x)\bigr) + \rho\bigl(f(x), f(b)\bigr) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Therefore, $f$ is uniformly continuous.

0.8.11 Corollary. If $f : X \to \mathbb{R}$ is continuous and $X$ is compact, then there exist points
$x_m$ and $x_M$ in $X$ such that

$$f(x_m) \le f(x) \le f(x_M) \quad \text{for all } x \in X.$$

Proof. By 0.8.10(a), f (X) is compact, hence closed and bounded in R. Thus f (X) must
contain its supremum and infimum.

0.9 Totally Bounded Metric Spaces


Let (X, d) be a metric space. In this section we give two alternate characterizations of
compactness of X.
A subset $E$ of $X$ is said to be totally bounded if for each $\varepsilon > 0$ there exist points
$x_1, \dots, x_n \in X$ such that $E \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$. Since a finite union of open balls is bounded,
every totally bounded set is bounded. The converse is false. For example, in a discrete metric
space all sets are bounded, but no infinite set can be totally bounded.
A subset E of X is said to be sequentially compact if every sequence in E has a cluster
point in E.

0.9.1 Theorem. The following statements are equivalent:


(a) X is compact.

(b) X is sequentially compact.

(c) X is complete and totally bounded.

Proof. (a) ⇒ (b): We prove the contrapositive. Let $(a_n)$ be a sequence in $X$ with no cluster point. Then for each $x \in X$
there must exist an open ball $B(x)$ with center $x$ that contains $a_n$ for only finitely many indices $n$.
Hence the union of any finite subfamily of the open cover $\{B(x) : x \in X\}$ of $X$
contains $a_n$ for only finitely many $n$ and so cannot be all of $X$. Therefore, $X$ is
not compact.
(b) ⇒ (c): Let $X$ be sequentially compact. That $X$ is complete follows from 0.3.2.
Suppose $X$ is not totally bounded. Then there exists $\varepsilon > 0$ such that no finite collection
of open balls of radius $\varepsilon$ covers $X$. Choose any $a_1 \in X$. Since $B_\varepsilon(a_1)$ does not cover
$X$, there exists $a_2 \in X \setminus B_\varepsilon(a_1)$. Since $B_\varepsilon(a_1) \cup B_\varepsilon(a_2)$ does not cover $X$, there exists
$a_3 \in X \setminus \bigl(B_\varepsilon(a_1) \cup B_\varepsilon(a_2)\bigr)$. Continuing in this manner we obtain a sequence $(a_n)$ in $X$ with

$$a_n \in X \setminus \bigl(B_\varepsilon(a_1) \cup B_\varepsilon(a_2) \cup \cdots \cup B_\varepsilon(a_{n-1})\bigr).$$

It follows that $d(a_n, a_m) \ge \varepsilon$ for all $m \ne n$. But then no subsequence of $(a_n)$ can converge.
Therefore, $X$ must be totally bounded.
(c) ⇒ (a): Assume that X is complete and totally bounded but not compact. Then X has
an open cover U = {Ui : i ∈ I} with no finite subcover. For each k let Fk be a finite set of
points in X such that {B1/k (x) : x ∈ Fk } is a cover of X. Consider the case k = 1. If for
each x ∈ F1 the ball B1 (x) could be covered by finitely many members of U, then X itself
would have a finite cover, contradicting our assumption. Thus there exists x1 ∈ F1 such that
E1 := B1 (x1 ) cannot be covered by finitely many members of U. Since {B1/2 (x) : x ∈ F2 }
covers X, {E1 ∩ B1/2 (x) : x ∈ F2 } covers E1 , so by similar reasoning applied to E1 there
exists x2 ∈ F2 such that E2 := E1 ∩ B1/2 (x2 ) cannot be covered by finitely many members
of U. In this way we obtain a sequence (xn ) in X and decreasing sets

En = B1 (x1 ) ∩ B1/2 (x2 ) ∩ · · · ∩ B1/n (xn ) = En−1 ∩ B1/n (xn ) (†)

that cannot be covered by finitely many members of $\mathcal{U}$. In particular, $E_n \ne \emptyset$. For each $n$,
choose a point $y_n \in E_n$. If $n > m$, then $y_n \in E_m$, hence from (†)

$$d(x_m, x_n) \le d(x_m, y_n) + d(y_n, x_n) < 1/m + 1/n,$$

from which it follows that $(x_n)$ is a Cauchy sequence. Since $X$ is complete, $x_n \to x$ for
some $x \in X$. Choose $i \in I$ such that $x \in U_i$. Since $U_i$ is open, there exists $r > 0$ such that
$B_r(x) \subseteq U_i$. Taking $n > 2/r$ large enough that $d(x_n, x) < r/2$, we then have $E_n \subseteq B_{1/n}(x_n) \subseteq B_r(x) \subseteq U_i$,
contradicting the non-covering property of $E_n$. Therefore, $X$ must be compact.
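
The greedy selection used in the proof of (b) ⇒ (c) can be tried out on a finite point set. The following Python sketch is illustrative only: the function name greedy_eps_separated and the sample grid are assumptions made for the example and do not come from the text. Points are chosen one at a time, each outside the open ε-balls about the points already chosen. On a finite set the selection must stop, and at that moment the ε-balls about the chosen points cover the set.

```python
import math


def greedy_eps_separated(points, eps):
    """Select points greedily so that each new point lies outside the
    open eps-balls centered at all previously selected points."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


# A finite grid in the unit square stands in for a totally bounded set.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
centers = greedy_eps_separated(grid, eps=0.25)

# Number of centers found; their open 0.25-balls cover the whole grid.
print(len(centers))
```

In a space that is not totally bounded the same selection never terminates for a suitable ε, which is how the proof produces a sequence with no convergent subsequence.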

The following result is known as the Heine-Borel theorem.


0.9.2 Corollary. A subset of $\mathbb{R}^d$ is compact iff it is closed and bounded.
Proof. We have already proved the necessity. For the sufficiency, let $C \subseteq \mathbb{R}^d$ be closed and
bounded. Since $\mathbb{R}^d$ is complete, $C$ is complete (0.3.4). Since $C$ is bounded, it is totally bounded,
as may be seen by enclosing $C$ in a bounded $d$-dimensional interval $I$ and then subdividing
$I$ into finitely many congruent subintervals of arbitrarily small diameter. Therefore, $C$ is
compact.

0.10 Equicontinuity
We have seen that every closed ball in $\mathbb{R}^d$ is compact. By contrast, closed balls in the
space $C[0, 1]$ with the supremum norm are not compact, as may be inferred from the fact
that the sequence of functions $f_n(x) = x^n$ has no convergent subsequence in $C[0, 1]$. The
additional property of equicontinuity is needed to characterize compact subsets of such
spaces.
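
One way to see why the powers of x admit no uniformly convergent subsequence is to evaluate the difference of the n-th and m-th powers at the point x whose n-th power is 1/2. For m ≥ 2n the value is 1/2 − 2^(−m/n) ≥ 1/4, so the supremum-norm distance between the two functions is at least 1/4, and no subsequence can be Cauchy. The Python sketch below checks this bound numerically; the helper name gap is an assumption made for the example.

```python
def gap(n, m):
    """Value of |x**n - x**m| at the point x in [0, 1] with x**n = 1/2.

    This is a lower bound for the sup-norm distance between
    x -> x**n and x -> x**m on [0, 1].
    """
    x = 0.5 ** (1.0 / n)  # x**n equals 1/2 up to rounding
    return abs(x ** n - x ** m)


for n in (1, 5, 50):
    for factor in (2, 3, 10):
        # Each printed gap is at least 1/4 (up to rounding).
        print(n, factor * n, gap(n, factor * n))
```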
Let $X$ be a topological space. A family $\mathcal{F}$ of functions in $C(X)$ is said to be equicontinuous
at a point $a \in X$ if, for each $\varepsilon > 0$, there exists a neighborhood $N$ of $a$ such that
$|f(x) - f(a)| < \varepsilon$ for all $x \in N$ and all $f \in \mathcal{F}$. If $\mathcal{F}$ is equicontinuous at each point of $X$,
then $\mathcal{F}$ is said to be equicontinuous. The distinguishing feature of equicontinuity is that,
while the neighborhood $N$ may vary with the point $a$, the same $N$ works for all $f \in \mathcal{F}$.
Here is the main result regarding equicontinuity.
0.10.1 Theorem (Arzelà–Ascoli). Let $X$ be a compact Hausdorff space. A subset $\mathcal{F}$ of
$C(X)$ is relatively compact in the uniform norm topology iff it is equicontinuous and pointwise
bounded, that is,
$$\sup\{|f(x)| : f \in \mathcal{F}\} < \infty \quad \text{for all } x \in X.$$

Proof. Suppose $\mathcal{F}$ is relatively compact in $C(X)$. Then $\mathcal{F}$ is bounded, hence certainly
pointwise bounded. If $\mathcal{F}$ is not equicontinuous at some $a \in X$, then there exists an $\varepsilon > 0$ and
for every $N \in \mathcal{N}_a$ a point $x_N \in N$ and a function $f_N \in \mathcal{F}$ such that $|f_N(x_N) - f_N(a)| \ge \varepsilon$.
By relative compactness of $\mathcal{F}$, there exists a subnet $(f_{N_\alpha})$ and $f \in C(X)$ such that
$\|f_{N_\alpha} - f\|_\infty \to 0$. Furthermore, since $x_N \in N$ for every $N \in \mathcal{N}_a$, there exists a subnet $(x_{N_\beta})$ of $(x_{N_\alpha})$
such that $x_{N_\beta} \to a$. But then
$$\varepsilon \le |f_{N_\beta}(x_{N_\beta}) - f_{N_\beta}(a)| \le |f_{N_\beta}(x_{N_\beta}) - f(x_{N_\beta})| + |f(x_{N_\beta}) - f(a)| + |f(a) - f_{N_\beta}(a)|$$
$$\le 2\|f_{N_\beta} - f\|_\infty + |f(x_{N_\beta}) - f(a)|, \qquad (\dagger)$$
which is impossible since the last expression in (†) tends to zero.
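Conversely, assume that $\mathcal{F}$ is equicontinuous and pointwise bounded. Since $C(X)$ is complete, to show that $\mathcal{F}$
is relatively compact it suffices by 0.9.1 to show that the closure of $\mathcal{F}$ in $C(X)$ is totally
bounded. Since the closure of a totally bounded set is totally bounded, it is enough to show
that $\mathcal{F}$ is totally bounded.
Let $\varepsilon > 0$. By equicontinuity, for each $x \in X$ we may choose an open neighborhood $N_x$ of
$x$ such that
$$|f(y) - f(x)| < \varepsilon/4 \quad \text{for all } y \in N_x \text{ and } f \in \mathcal{F}. \qquad (\ddagger)$$
By compactness, there exists a finite set $F \subseteq X$ such that $X = \bigcup_{x \in F} N_x$. Since $\mathcal{F}$ is
pointwise bounded, the set $\{f(x) : x \in F, f \in \mathcal{F}\}$ is bounded in $\mathbb{C}$ and hence is totally
bounded. Thus we may choose a finite set $C \subseteq \mathbb{C}$ such that for each $f \in \mathcal{F}$ and $x \in F$ there
exists $\varphi_f(x) \in C$ such that $|f(x) - \varphi_f(x)| < \varepsilon/4$. For each function $\varphi : F \to C$ set
$$\mathcal{F}_\varphi = \{f \in \mathcal{F} : |f(x) - \varphi(x)| < \varepsilon/4 \ \ \forall\, x \in F\}.$$
As there are only finitely many functions $\varphi$, there are only finitely many sets $\mathcal{F}_\varphi$. Moreover,
$f \in \mathcal{F}_{\varphi_f}$ for each $f \in \mathcal{F}$. Let $\mathcal{F}_j = \mathcal{F}_{\varphi_j}$ ($j = 1, \dots, m$) denote the nonempty $\mathcal{F}_\varphi$, so that
$\mathcal{F} = \bigcup_j \mathcal{F}_j$. Choosing $f_j \in \mathcal{F}_j$ we then have
$$\begin{aligned}
f \in \mathcal{F} &\Rightarrow f \in \mathcal{F}_j \text{ for some } j \\
&\Rightarrow |f(x) - f_j(x)| \le |f(x) - \varphi_j(x)| + |\varphi_j(x) - f_j(x)| < \varepsilon/2 \quad \forall\, x \in F \\
&\Rightarrow |f(y) - f_j(y)| \le |f(y) - f(x)| + |f(x) - f_j(x)| + |f_j(x) - f_j(y)| < \varepsilon \quad \forall\, x \in F \text{ and } y \in N_x \text{ (by (‡))} \\
&\Rightarrow \|f - f_j\|_\infty < \varepsilon \quad \bigl(\text{since } X = \textstyle\bigcup_{x \in F} N_x\bigr).
\end{aligned}$$
Thus $\mathcal{F} \subseteq \bigcup_{j=1}^m B_\varepsilon(f_j)$, proving that $\mathcal{F}$ is totally bounded.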

0.11 The Stone-Weierstrass Theorem


Weierstrass’s classical approximation theorem asserts that any function in C[a, b] may
be uniformly approximated by polynomials. Stone’s generalization of Weierstrass’s theorem
replaces [a, b] by an arbitrary compact Hausdorff topological space and replaces the set of
polynomials by a more general class of functions.
For the statement of the theorem, the following terminology will be needed. A collection
A of complex-valued functions on a set X is said to separate points of X if for each pair
of distinct points x and y in X there exists f ∈ A such that f (x) ≠ f (y). For example,
the algebra of all polynomials on [a, b] separates points. The set {sin x, cos x} separates the
points of [ε, 2π] for 0 < ε < 2π but not of [0, 2π], since both functions take the same values at 0 and 2π.
Here is the statement of the theorem. Rather than giving Stone’s original proof, we prefer
to wait until Chapter 14 when a shorter proof using the Krein-Milman theorem is available.
0.11.1 Stone-Weierstrass Theorem. Let X be a compact Hausdorff space and A a
subalgebra of C(X) that contains the constant functions, separates points of X, and is closed
under complex conjugation. Then A is dense in C(X) in the uniform norm.

0.12 Locally Compact Topological Spaces


General Properties
A topological space is said to be locally compact if each member of the space has
a compact neighborhood. For example, discrete spaces and Euclidean spaces are locally
compact. We shall see in Chapter 8 that no infinite dimensional normed space can be locally
compact.
The following proposition is immediate from the definition of relative topology.
0.12.1 Proposition. If X is locally compact and Y is an open or closed subset of X, then
Y is locally compact.

The next proposition gives a key property of locally compact spaces that underlies the
utility and importance of these spaces.
0.12.2 Proposition. If X is a locally compact Hausdorff space, then for each x ∈ X the
collection of compact neighborhoods of x is a neighborhood base.
Proof. Let N be an open neighborhood of x. We may assume that cl(N ) is compact, otherwise
replace N by the smaller open neighborhood int(M )∩N , where M is a compact neighborhood
of x. By 0.8.6, there exist disjoint open sets U and V with x ∈ U and cl(N ) \ N ⊆ V . If
y ∈ cl(U ∩ N ) \ N , then y ∈ V , hence V ∩ (U ∩ N ) ≠ ∅, which is impossible. Therefore,
x ∈ U ∩ N ⊆ cl(U ∩ N ) ⊆ N , so cl(U ∩ N ) is the desired compact neighborhood contained
in N .
The following version of 0.7.1 will be needed below.
0.12.3 Proposition. Let X be locally compact and Hausdorff. If K ⊆ U ⊆ X with U
open and K compact, then there exists an open set V with compact closure such that
K ⊆ V ⊆ cl(V ) ⊆ U .
Proof. By 0.12.2, for each $x \in K$ there exists an open neighborhood $V_x$ of $x$ with compact
closure contained in $U$. By compactness of $K$, there exists a finite set $F \subseteq K$ such that
$V := \bigcup_{x \in F} V_x \supseteq K$. Then $\operatorname{cl}(V) \subseteq \bigcup_{x \in F} \operatorname{cl}(V_x)$, hence $\operatorname{cl}(V)$ is compact and contained in $U$.

Baire Spaces
0.12.4 Proposition. Let $X$ be a topological space. The following statements are equivalent:
(a) If $U_n$ is open and dense in $X$ for each $n$, then $\bigcap_{n=1}^\infty U_n$ is dense in $X$.
(b) If $C_n$ is closed and $\bigcup_{n=1}^\infty C_n$ has an interior point, then some $C_n$ has an interior point.

Proof. The equivalence follows from De Morgan’s laws and the fact that an open set is dense
in X iff its complement has empty interior.
A Baire space is a topological space X with the equivalent properties in the proposition.
For example, a complete metric space is a Baire space (0.3.12). Here is another important
example.
0.12.5 Theorem. A locally compact Hausdorff space X is a Baire space.
Proof. We show that (a) of 0.12.4 holds. Set $D := \bigcap_{n=1}^\infty U_n$ and let $U$ be any nonempty
open set in $X$. We show that $D \cap U \ne \emptyset$. Since $U \cap U_1$ is open and nonempty, there exists a
nonempty open set $V_1$ such that $\operatorname{cl} V_1$ is compact and contained in $U \cap U_1$ (0.12.3). Since
$V_1 \cap U_2$ is open and nonempty, there exists a nonempty open set $V_2$ such that $\operatorname{cl} V_2$ is compact
and contained in $V_1 \cap U_2$ and hence is contained in $U \cap U_1 \cap U_2 \cap V_1$. Proceeding in this
manner we construct a sequence of nonempty open sets $V_n$ with compact closure contained
in $U \cap U_1 \cap \cdots \cap U_n \cap V_{n-1}$. Since the compact sets $\operatorname{cl} V_n$ are decreasing, their intersection is
nonempty. Any point in this intersection is a member of $D \cap U$.

Functions with Compact Support


Let X be a topological space. The support of a function f : X → C is defined as

supp(f ) := cl {x ∈ X : f (x) ≠ 0}.

Thus supp(f ) is the smallest closed set on whose complement f = 0. The collection of all
functions f ∈ C(X) with compact support is denoted by Cc (X):

Cc (X) := {f ∈ C(X) : supp(f ) is compact in X}.

Clearly, Cc (X) ⊆ Cb (X). Moreover, the relations

supp(f + g) ⊆ supp(f ) ∪ supp(g), supp(cf ) = supp(f ) (c ≠ 0), and supp(f g) ⊆ supp(f )

show that Cc (X) is an ideal in the algebra Cb (X).


The next two theorems are versions of Urysohn’s lemma and Tietze’s extension theorem
for locally compact Hausdorff spaces. They imply that such spaces have a rich supply of
continuous functions, a property crucial in the development of integration on locally compact
spaces.

0.12.6 Theorem. Let $X$ be locally compact and Hausdorff. If $K \subseteq U \subseteq X$ with $K$ compact
and $U$ open, then there exists a continuous function $h : X \to [0, 1]$ with compact support
such that $h = 1$ on $K$ and $h = 0$ on $U^c$, that is, $\mathbf{1}_K \le h \le \mathbf{1}_U$.
Proof. Let $V$ be as in 0.12.3. Since $\operatorname{cl}(V)$ is compact it is normal, hence, by Urysohn's lemma,
there exists a continuous function $h : \operatorname{cl}(V) \to [0, 1]$ such that $h = 1$ on $K$ and $h = 0$ on
$\operatorname{cl}(V) \setminus V$. Extend $h$ to $X$ by setting $h = 0$ on $\operatorname{cl}(V)^c$. Then $h$ is clearly continuous on the
open set $V \cup \operatorname{cl}(V)^c$. A simple argument shows that $h$ is continuous at each point of $\operatorname{cl}(V) \setminus V$
as well.
The proof of the following result is the same as that of Tietze’s extension theorem for
normal spaces, except that one uses the preceding theorem instead of Urysohn’s lemma.

0.12.7 Theorem. Let $X$ be locally compact and Hausdorff. If $K \subseteq X$ is compact and
$f : K \to [a, b]$ is continuous, then there exists a continuous function $F : X \to [a, b]$ such that
$f = F$ on $K$.
0.12.8 Corollary. Let X be locally compact and Hausdorff and K ⊆ U ⊆ X with U open
and K compact. If g : K → [a, b] is continuous, then there exists a continuous function
f : X → R with compact support contained in U such that f = g on K.
Proof. Let G : X → [a, b] be a continuous extension of g and let h be as in 0.12.6. Then
f := Gh satisfies the requirements.

Functions That Vanish at Infinity


Let X be a topological space. A function f : X → C is said to vanish at infinity if for
each ε > 0 the set
K(f, ε) := {x ∈ X : |f (x)| ≥ ε}
is compact. In $\mathbb{R}^d$ this is simply the assertion that $\lim_{|x| \to \infty} f(x) = 0$. The collection of all
functions in C(X) vanishing at infinity is denoted by C0 (X):

C0 (X) := {f ∈ C(X) : K(f, ε) is compact in X ∀ ε > 0}.



Note that if $f \in C_0(X)$, then $|f(x)| \le \varepsilon + \sup \bigl|f\bigl(K(f, \varepsilon)\bigr)\bigr|$ for all $x \in X$, hence $f \in C_b(X)$.
Thus we have the inclusions Cc (X) ⊆ C0 (X) ⊆ Cb (X), with equality holding throughout if
X is compact. The relations

K(f + g, ε) ⊆ K(f, ε) ∪ K(g, ε), K(cf, ε) = K(f, ε/|c|) and K(f g, ε) ⊆ K(f, ε/ kgk∞ )

imply that C0 (X) is an ideal of Cb (X). More can be said:


0.12.9 Proposition. C0 (X) is a Banach space under the uniform norm.
Proof. By 0.3.4(a), it suffices to show that C0 (X) is closed in Cb (X). Let fn ∈ C0 (X) and
fn → f ∈ Cb (X). To show that f ∈ C0 (X), given ε > 0, choose n so that ‖fn − f ‖∞ < ε/2.
To see that K(f, ε) is compact, let (xα ) be a net in K(f, ε). Then for all α,

|fn (xα )| = |f (xα ) + fn (xα ) − f (xα )| ≥ |f (xα )| − |fn (xα ) − f (xα )| > ε/2,

hence, xα ∈ K(fn , ε/2). Since K(fn , ε/2) is compact, there exists a subnet (xβ ) that
converges to some x ∈ K(fn , ε/2). Since |f (xα )| ≥ ε for all α and f is continuous, |f (x)| ≥ ε.
Therefore, x ∈ K(f, ε). By 0.8.8, K(f, ε) is compact.
0.12.10 Proposition. If X is locally compact and Hausdorff, then Cc (X) is dense in
C0 (X).
Proof. Let $f \in C_0(X)$ and $\varepsilon > 0$. Since $K(f, \varepsilon) \subseteq U := \{x \in X : |f(x)| > \varepsilon/2\}$, there exists
a function $g : X \to [0, 1]$ in $C_c(X)$ such that $g = 1$ on $K(f, \varepsilon)$ and $g = 0$ on $U^c$ (0.12.6).
Then $fg \in C_c(X)$ and $\|fg - f\|_\infty \le \varepsilon$.

The One-Point Compactification


Let X be a noncompact locally compact Hausdorff space with topology T and let ∞
be a point not in X. Define X∞ := X ∪ {∞} and let T ∞ consist of the members of T
together with all sets of the form X∞ \ K, where K ⊆ X is compact. It is straightforward
to check that T ∞ is closed under finite intersections and arbitrary unions and hence is a
topology on X∞ . The pair (X∞ , T ∞ ) is called the one-point compactification of X. This
construction is useful in extending results from a compact setting to a locally compact one,
as, for example, in 0.12.13, below.
The following proposition makes the basic connection between C0 (X) and C(X∞ ) and
justifies the terminology “vanishing at infinity” for functions f ∈ C0 (X).
0.12.11 Proposition. Let f ∈ Cb (X). Then f ∈ C0 (X) iff f (xα ) → 0 for any net (xα ) in
X with xα → ∞.
Proof. The necessity is clear, since if $x_\alpha \to \infty$ then $(x_\alpha)$ is eventually in $K(f, \varepsilon)^c$ for any
$\varepsilon > 0$. For the sufficiency, we show that the convergence hypothesis implies that $K(f, \varepsilon)$ is
compact. Let $(x_\alpha)$ be a net in $K(f, \varepsilon)$ with no cluster point in $X$. Then for each compact
$K \subseteq X$ and each $\alpha$ there exists an $\alpha_K \succeq \alpha$ with $x_{\alpha_K} \notin K$. Direct the collection of compact
subsets upward by inclusion. Then $(x_{\alpha_K})$ is a subnet of $(x_\alpha)$, and for any compact $K_0$,
$x_{\alpha_K} \notin K_0$ for all $K \supseteq K_0$. Therefore, $x_{\alpha_K} \to \infty$ and so by hypothesis $f(x_{\alpha_K}) \to 0$. But this
is impossible, since $|f(x_\alpha)| \ge \varepsilon$ for all $\alpha$. This shows that $(x_\alpha)$ must have a cluster point in
$K(f, \varepsilon)$ and so $K(f, \varepsilon)$ is compact.
We may now prove the following extension property of one-point compactifications:

0.12.12 Proposition. Let X and Y be noncompact, locally compact Hausdorff spaces and
let ϕ : X → Y be continuous such that g ◦ ϕ ∈ C0 (X) for all g ∈ C0 (Y ). Then the extension
ϕ∞ : X∞ → Y∞ of ϕ defined by ϕ∞ (∞) = ∞ is continuous.
Proof. Let xα ∈ X and xα → ∞. Let K ⊆ Y be compact and choose g ∈ Cc (Y ) such that
g = 1 on K. By hypothesis, f := g ◦ ϕ ∈ C0 (X), hence f (xα ) → 0. Thus the net (ϕ(xα )) is
eventually in K c , hence ϕ(xα ) → ∞. This establishes continuity of ϕ at ∞.
We conclude this subsection with a locally compact version of the Stone-Weierstrass
theorem. It is derived from the compact version via the one-point compactification.
0.12.13 Stone-Weierstrass Theorem. Let $X$ be a locally compact noncompact Hausdorff
topological space and let $\mathcal{A}$ be a conjugate closed subalgebra of $C_0(X)$ that separates points
of $X$ and with the property that $\bigcap_{f \in \mathcal{A}} \{x \in X : f(x) = 0\} = \emptyset$. Then $\mathcal{A}$ is dense in $C_0(X)$ in
the uniform norm.
Proof. Identify $C_0(X)$ with the closed subspace of $C(X_\infty)$ consisting of all $f$ with $f(\infty) = 0$.
Let $\mathcal{A}_1$ denote the subalgebra of $C(X_\infty)$ generated by $\mathcal{A}$ and the constant function 1. Then
$\mathcal{A}_1$ trivially separates points of $X_\infty$, hence is dense in $C(X_\infty)$. Moreover, every member $g$ of
$\mathcal{A}_1$ may be written uniquely as $g = g_0 + g(\infty)$, where $g_0 \in C_0(X)$. Now let $f \in C_0(X)$ and
$\varepsilon > 0$, and choose $g \in \mathcal{A}_1$ such that $\|g - f\|_\infty < \varepsilon$. In particular, $|g(\infty)| < \varepsilon$, hence setting
$g_0 = g - g(\infty)$ we have $g_0 \in \mathcal{A}$ and for all $x \in X$

$$|f(x) - g_0(x)| = |f(x) - g(x) + g(\infty)| \le |g(x) - f(x)| + |g(\infty)| < 2\varepsilon.$$

0.13 Spaces of Differentiable Functions


In this section we define several spaces of differentiable functions on open sets U ⊆ Rd
that will appear in later chapters. We shall need the following terminology and notation.
A multi-index is a d-tuple α = (α1 , . . . , αd ) of nonnegative integers. We set

|α| = α1 + · · · + αd .

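While this conflicts with the notation for the Euclidean norm on $\mathbb{R}^d$, context will make
clear which notion is being referenced. The partial differential operator of order $|\alpha|$ is
defined by
$$\partial^\alpha = \partial_x^\alpha = \left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial x_d}\right)^{\alpha_d}.$$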
If α = (0, . . . , 0), then ∂ α is the identity operator. The following spaces of differentiable
functions figure prominently in the study of Fourier analysis and distributions on Rd . (See
Chapters 6 and 15.)
$$C^k(U) = \{f : \partial^\alpha f \in C(U) \text{ for all } |\alpha| \le k\}, \qquad C^\infty(U) := \bigcap_{k=1}^\infty C^k(U),$$
$$C_c^k(U) = \{f : \partial^\alpha f \in C_c(U) \text{ for all } |\alpha| \le k\}, \qquad C_c^\infty(U) := \bigcap_{k=1}^\infty C_c^k(U).$$

By the standard rules of differentiation, these spaces are closed under addition, multiplication,
and scalar multiplication and so are algebras. Moreover, the $C^\infty$ spaces satisfy $\partial^\alpha C^\infty \subseteq C^\infty$
for all $\alpha$.

0.14 Partitions of Unity


In this section we prove two related results that are useful for piecing together local data
to form a global construct such as a surface integral. The first result occurs in the general
setting of locally compact spaces; the second is a C ∞ version of the first in the context of Rd .
In each case, the functions φi in the statement of the theorem are said to form a partition
of unity subordinate to the open sets Ui .
0.14.1 Theorem. Let $K$ be a compact subset of a locally compact Hausdorff space $X$ and
let $\{U_i : i \in I\}$ be an open cover of $K$. Then there exists a finite subcover $\{U_1, \dots, U_p\}$ of $K$
and nonnegative $\phi_j \in C_c(X)$ such that $\operatorname{supp}(\phi_j) \subseteq U_j$ and $\sum_{j=1}^p \phi_j = 1$ on $K$.
Proof. For each $x \in K$, let $j(x)$ be an index such that $x \in U_{j(x)}$. Choose an open neighborhood
$V_x$ of $x$ with compact closure such that $\operatorname{cl} V_x \subseteq U_{j(x)}$. Since $K$ is compact, finitely
many of the sets Vx cover K. Denote these by V1 , . . . , Vp and denote the corresponding
sets Uj(x) by U1 , . . . , Up . Since Vj ⊆ Kj := cl(Vj ) ⊆ Uj , there exists by 0.12.6 a continuous
function ψj : X → [0, 1] with compact support such that ψj = 1 on Kj and supp(ψj ) ⊆ Uj .
Now set
φ1 := ψ1 and φj := (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψj−1 )ψj , j > 1.
Then φj ∈ Cc (X), 0 ≤ φj ≤ 1, and supp(φj ) ⊆ supp(ψj ) ⊆ Uj . Finally, let

ηj := (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψj ).
 
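For $j > 1$, $\eta_{j-1} - \eta_j = (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_{j-1})\bigl[1 - (1 - \psi_j)\bigr] = \phi_j$, hence
$$\sum_{j=1}^p \phi_j = \phi_1 + \sum_{j=2}^p (\eta_{j-1} - \eta_j) = \phi_1 + \eta_1 - \eta_p = 1 - \eta_p.$$
Since $K \subseteq \bigcup_j V_j \subseteq \bigcup_j K_j$ and $\psi_j = 1$ on $K_j$, $\eta_p = 0$ on $K$, hence $\sum_{j=1}^p \phi_j = 1$ on $K$,
completing the proof.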
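
The telescoping identity in the proof is purely algebraic and can be checked pointwise. The Python sketch below is an illustration under stated assumptions: the function name partition_from_cutoffs is hypothetical, and the list psi holds made-up values of the cutoff functions at a single point of X. It forms the products defining the partition functions and confirms that their sum equals one minus the product of all the factors 1 − ψ, and that the sum is exactly 1 as soon as one cutoff equals 1 at that point.

```python
import random


def partition_from_cutoffs(psi):
    """Given values psi[0..p-1] in [0, 1] of the cutoff functions at one
    point, return (phi, eta_p), where
    phi[j] = (1 - psi[0]) * ... * (1 - psi[j-1]) * psi[j] and
    eta_p  = (1 - psi[0]) * ... * (1 - psi[p-1])."""
    phi = []
    eta = 1.0  # running product of the factors (1 - psi[k])
    for value in psi:
        phi.append(eta * value)
        eta *= 1.0 - value
    return phi, eta


random.seed(0)
psi = [random.random() for _ in range(5)]
phi, eta_p = partition_from_cutoffs(psi)
print(sum(phi), 1.0 - eta_p)  # the two numbers agree up to rounding

# At a point of K some cutoff equals 1, which forces eta_p = 0.
psi_on_K = list(psi)
psi_on_K[2] = 1.0
phi_on_K, eta_on_K = partition_from_cutoffs(psi_on_K)
print(sum(phi_on_K), eta_on_K)
```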

For the C ∞ version of 0.14.1, we need the following lemmas.

0.14.2 Lemma. Let a < b. Then there exists a C ∞ function h : R → [0, +∞) such that
h > 0 on (a, b), and h = 0 on (a, b)c .
Proof. Define $h$ by
$$h(x) = \begin{cases} \exp\bigl((x - a)^{-1}(x - b)^{-1}\bigr) & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, $h^{(m)} = 0$ on $[a, b]^c$ for all $m \ge 0$. Moreover, if $x \in (a, b)$, then $h^{(m)}(x)$ is a sum of
terms of the form
$$\frac{\pm h(x)}{(x - a)^p (x - b)^q}, \quad p, q \in \mathbb{Z}^+.$$
Since the exponent $(x - a)^{-1}(x - b)^{-1}$ in $h(x)$ is negative on $(a, b)$, l'Hospital's rule is
applicable and yields
$$\lim_{x \to a^+} \frac{h(x)}{(x - a)^p (x - b)^q} = 0.$$

Therefore, $\lim_{x \to a} h^{(m)}(x) = 0$. An induction argument then shows that $h^{(m)}(a) = 0$ for all
$m$. A similar argument holds for $b$. Thus $h$ is $C^\infty$ on $\mathbb{R}$.

0.14.3 Lemma. Let $a < b$. Then there exists a $C^\infty$ function $g : \mathbb{R} \to \mathbb{R}$ such that $0 \le g \le 1$,
$g = 0$ on $(-\infty, a]$, and $g = 1$ on $[b, +\infty)$.
Proof. Take $g(x) := \left(\int_a^b h\right)^{-1} \int_a^x h$, where $h$ is the function in 0.14.2.
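
The two lemmas translate directly into a small numerical experiment. The following Python sketch is only an approximation made for illustration: the names bump and smooth_step are assumptions, and the integrals are replaced by a midpoint rule with a fixed number of steps. It evaluates the bump function of 0.14.2 and the normalized integral of 0.14.3.

```python
import math


def bump(x, a, b):
    """The function h of Lemma 0.14.2: positive on (a, b), zero elsewhere."""
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0


def smooth_step(x, a, b, steps=2000):
    """Approximate g(x) = (int_a^b h)^(-1) * int_a^x h by the midpoint rule."""

    def integral(upper):
        if upper <= a:
            return 0.0
        upper = min(upper, b)  # h vanishes to the right of b
        width = (upper - a) / steps
        return width * sum(
            bump(a + (k + 0.5) * width, a, b) for k in range(steps)
        )

    return integral(x) / integral(b)


for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(x, smooth_step(x, 0.0, 1.0))
```

The printed values increase from 0 to 1 across the interval (0, 1) and are constant outside it, as the lemma asserts for g.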

The following consequence of 0.14.2 is useful for obtaining smooth approximations of
functions on $\mathbb{R}^d$.
0.14.4 Lemma. Let $I = (a_1, b_1) \times \cdots \times (a_d, b_d)$. Then there exists a $C^\infty$ function $f : \mathbb{R}^d \to \mathbb{R}$
such that $f > 0$ on $I$ and $f = 0$ on $I^c$.

Proof. For each $j$, let $h_j : \mathbb{R} \to [0, +\infty)$ be a $C^\infty$ function such that $h_j > 0$ on $(a_j, b_j)$ and
$h_j = 0$ on $(a_j, b_j)^c$. Now set $f(x_1, \dots, x_d) := h_1(x_1) \cdots h_d(x_d)$.
We may now prove the following C ∞ version of Urysohn’s lemma:
0.14.5 Theorem. Let K ⊆ U ⊆ Rd , where K is compact and U is open. Then there exists
a C ∞ function ψ : Rd → [0, 1] such that supp(ψ) ⊆ U and ψ = 1 on K.

Proof. For each $x \in K$, let $V_x$ be an open cube with center $x$ and edge $2r$:

$$V_x := \{y \in \mathbb{R}^d : x_j - r < y_j < x_j + r,\ j = 1, \dots, d\},$$

where $r$ is chosen so that $\operatorname{cl} V_x \subseteq U$. Let $W_x \subseteq V_x$ denote the concentric open cube
with center $x$ and edge $r$. Since $K$ is compact, there exist finitely many cubes $W_x$ whose
union contains $K$. Denote these by $W_1, \dots, W_m$ and denote the corresponding cubes $V_x$ by
$V_1, \dots, V_m$. By 0.14.4, for each $i$ there exists a $C^\infty$ function $f_i : \mathbb{R}^d \to \mathbb{R}$ such that $f_i > 0$
on $W_i$ and $f_i = 0$ on $W_i^c$. Set
$$f := \sum_{i=1}^m f_i, \quad V := \bigcup_{i=1}^m V_i, \quad \text{and} \quad W := \bigcup_{i=1}^m W_i.$$

Then $f$ is nonnegative and $C^\infty$ on $\mathbb{R}^d$, $f > 0$ on $W \supseteq K$, and $\operatorname{supp}(f) \subseteq \operatorname{cl}(V) \subseteq U$. Set
$a := \min_{x \in K} f(x)$. Since $a > 0$, by 0.14.3 there exists a $C^\infty$ function $g : \mathbb{R} \to [0, 1]$ such that
$g = 0$ on $(-\infty, 0]$ and $g = 1$ on $[a, +\infty)$. Now take $\psi := g \circ f$.
The following theorem is proved as in 0.14.1 but using the above C ∞ version of Urysohn’s
lemma.
0.14.6 Theorem. Let $K$ be a compact subset of $\mathbb{R}^d$ and let $\{U_i : i \in I\}$ be an open cover
of $K$. Then there exists a finite subcover $\{U_1, \dots, U_p\}$ of $K$ and nonnegative functions
$\phi_i \in C_c^\infty(\mathbb{R}^d)$ such that $\operatorname{supp}(\phi_i) \subseteq U_i$ and $\sum_{i=1}^p \phi_i = 1$ on $K$.

0.15 Connectedness
A pair of open sets U , V in a topological space X is said to separate X if

X = U ∪ V, U ≠ ∅, V ≠ ∅, and U ∩ V = ∅.

The pair (U, V ) is then called a separation of X. The space X is said to be disconnected if
it has a separation, and connected if no separation exists. A subset E of X is disconnected
(connected) if it is disconnected (connected) as a subspace of X. Thus if E is disconnected,
then there exist sets U , V open in X such that (E ∩ U, E ∩ V ) is a separation of E.
In any topological space, the singletons $\{x\}$ are trivially connected. In a discrete space
the only connected sets are the singletons. The set $\mathbb{Q}$ is not connected in $\mathbb{R}$, since the open
sets $(-\infty, \sqrt{2}) \cap \mathbb{Q}$ and $(\sqrt{2}, +\infty) \cap \mathbb{Q}$ separate $\mathbb{Q}$.
0.15.1 Theorem. A topological space X is disconnected iff there exists a continuous function
from X onto {0, 1}. Equivalently, X is connected iff every continuous function from X into
{0, 1} is constant.
Proof. Assume that $X$ is disconnected and let $(U, V)$ separate $X$. The function
$$g(x) = \begin{cases} 0 & \text{if } x \in U, \\ 1 & \text{if } x \in V, \end{cases}$$
from $X$ onto $\{0, 1\}$ is easily seen to be continuous. Conversely, if $h : X \to \{0, 1\}$ is continuous
and surjective, then the open sets $h^{-1}((-1, 1/2))$ and $h^{-1}((1/2, 2))$ separate $X$.
0.15.2 Corollary. The nonempty, connected subsets of R are the intervals.
Proof. By the intermediate value theorem, no continuous function from an interval into
{0, 1} can be surjective. Therefore, intervals must be connected. On the other hand, if E is
a nonempty subset of R that is not an interval, then there exist real numbers a < c < b with
a, b ∈ E and c ∉ E. The sets (−∞, c) and (c, +∞) then separate E.
0.15.3 Corollary. If f : X → Y is continuous and X is connected, then f (X) is connected.
Proof. If g : f (X) → {0, 1} is continuous, then g ◦ f : X → {0, 1} is continuous and so is
constant. Therefore, g must be constant.
0.15.4 Corollary. If A ⊆ X is connected and A ⊆ B ⊆ cl(A), then B is connected. In
particular, the closure of a connected set is connected.
Proof. Let g : B → {0, 1} be continuous. Then g|A is continuous and hence is constant.
Since B ⊆ cl A, g is constant. Therefore, B is connected.
0.15.5 Corollary. Let $\{E_i : i \in I\}$ be a family of connected subsets of $X$. If $\bigcap_i E_i \ne \emptyset$,
then $E := \bigcup_i E_i$ is connected.

Proof. Let $g : E \to \{0, 1\}$ be continuous. Then $g|_{E_i}$ is constant for each $i$. Since there is a
point common to all $E_i$, the constant is the same for all $i$. Therefore, $g$ is constant.
The component Cx of a member x of a topological space X is the union of all connected
subsets of X containing x. By 0.15.5, Cx is connected and is therefore the largest connected
set containing x. For example, the components in a discrete space are the singletons. The
following theorem summarizes the main properties of components.

0.15.6 Theorem. Let X be a topological space.

(a) Cx is not properly contained in a connected set.

(b) The set of all distinct components of X is a partition of X.

(c) Cx is closed in X.

(d) In a normed space X, the components of an open set are open.

Proof. Part (a) follows directly from the definition of component. Part (b) follows from (a)
and 0.15.5. Part (c) follows from 0.15.4 and (a) by considering the closure of Cx . For (d) let
U ⊆ X be open and C a component of U . If x ∈ C and r is chosen so that Br (x) ⊆ U , then,
since Br (x) is connected, C ∪ Br (x) is connected (0.15.5), hence Br (x) ⊆ C by (a).
Part I

Measure and Integration


Chapter 1
Measurable Sets

1.1 Introduction
This chapter begins the development of Lebesgue integration, which constitutes Part
I of the text. The theory may be seen as arising from the need to overcome some of the
shortcomings of the Riemann integral, which is restrictive in both the kind of function that
may be integrated and the space over which the integration takes place. These shortcomings
make the Riemann integral unsuitable for certain applications, for example those involving
random parameters. A further complication with the Riemann theory concerns the integration
of a pointwise limit of a sequence of Riemann integrable functions, such limits sometimes
failing to be Riemann integrable. The removal of these limitations may be seen as a reason
for the wide applicability of the Lebesgue theory.
Nevertheless, the Riemann integral still occupies an important position in analysis. Indeed,
as we shall see, the set of Lebesgue integrable functions on [a, b] is the completion in a precise
sense of the set of Riemann integrable functions, much as the real number system is the
completion of the rational number system.
It is illuminating to compare the construction of the two integrals in terms of how the
domain $[a, b]$ of an integrand $f$ is partitioned. In the case of the Riemann integral, $[a, b]$ is
partitioned into subintervals $[x_{i-1}, x_i]$ and a point $x_i^*$ is chosen in each. A suitable limit of
the corresponding Riemann sums $\sum_i f(x_i^*) \Delta x_i$ then produces the Riemann integral of $f$.
By contrast, in the Lebesgue theory it is the range of the function that is partitioned into
subintervals, these inducing, via preimages under f , a partition of [a, b]. This partition will in
general not consist of intervals. However, the Lebesgue theory provides a way of “measuring”
the members of the partition. The Lebesgue integral is then constructed by multiplying
these measured values by (approximate) function values, summing, and taking limits.
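
The contrast can be made concrete for the function x ↦ x² on [0, 1], where the preimage of a range interval [y₀, y₁) is the interval [√y₀, √y₁) and so can be measured by its length. The Python sketch below is an illustration under these assumptions only (the function name lebesgue_sums is hypothetical, and length stands in for the measure developed in this chapter). It partitions the range [0, 1] into n equal pieces and forms lower and upper sums of function value times measured preimage.

```python
import math


def lebesgue_sums(n):
    """Lower and upper sums for f(x) = x**2 on [0, 1], obtained by
    partitioning the RANGE [0, 1] into n equal subintervals [y0, y1)."""
    lower = upper = 0.0
    for k in range(n):
        y0, y1 = k / n, (k + 1) / n
        # Length of the preimage {x in [0, 1] : y0 <= x**2 < y1}.
        length = math.sqrt(y1) - math.sqrt(y0)
        lower += y0 * length
        upper += y1 * length
    return lower, upper


for n in (10, 100, 1000):
    print(n, lebesgue_sums(n))
```

The two sums bracket the value 1/3 of the integral, and their difference is 1/n, since the preimages have total length 1.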
The preceding discussion suggests (correctly) that a fundamental feature of the Lebesgue
theory is the notion of “measure” of a set. Such measures are constructed by starting with
a collection A of elementary sets, such as intervals in R or rectangles in R2 , and a set
function that assigns a natural “size” to each member of A, for example length in the case
of intervals and area in the case of rectangles. The collection A is then enlarged to a richer
class of sets that can still be “measured,” the so-called σ-field of measurable sets. Unlike A,
this collection is closed under standard set-theoretic operations, including countable unions
and intersections, a feature eventually resulting in limit theorems of a sort unavailable in
Riemann integration, these theorems underlying much of modern analysis. The first step
then in the construction of the Lebesgue integral is to develop the notion of measurable set
and measure, which is the goal of this chapter.


1.2 Measurable Spaces


For a robust theory of integration that admits the standard combinatorial and limit
operations, one requires that the collections of measurable sets on which the integration is
based be closed under the usual set-theoretic operations. In this section we discuss the most
common of such collections.

Fields and Sigma Fields


Let X be a nonempty set. A field on X is a family F of subsets of X satisfying (a)–(c)
of the following. If F also satisfies (d), then F is called a σ-field:
(a) $X \in F$. (b) $A \in F \Rightarrow A^c \in F$.
(c) $A, B \in F \Rightarrow A \cup B \in F$. (d) $A_1, A_2, \ldots \in F \Rightarrow \bigcup_{n=1}^{\infty} A_n \in F$.

Note that (a) and (b) imply that ∅ ∈ F. An induction argument using (c) shows that a field
F is closed under finite unions, that is,
A1 , . . . , An ∈ F ⇒ A1 ∪ · · · ∪ An ∈ F.
Of course, every field with only finitely many members is a σ-field, since in this case countable
unions reduce to finite unions. De Morgan’s law
$(A_1 \cap A_2 \cap \cdots \cap A_n)^c = A_1^c \cup A_2^c \cup \cdots \cup A_n^c$
together with (b) shows that a field is closed under finite intersections and thus, for example,
under the operation of symmetric difference defined by
$A \,\triangle\, B := (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A).$
Furthermore, every finite union of members of a field may be expressed as a disjoint union
of members of the field via the construction
$\bigcup_{k=1}^{n} A_k = A_1 \cup (A_2 \cap A_1^c) \cup \cdots \cup (A_n \cap A_1^c \cap \cdots \cap A_{n-1}^c).$ (1.1)
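
As an illustration of construction (1.1), here is a minimal Python sketch on finite sets of integers. The function name `disjointify` and the sample sets are assumptions made for this example only; they do not come from the text.

```python
from typing import Iterable, List, Set

def disjointify(sets: Iterable[Set[int]]) -> List[Set[int]]:
    """Return D_1, ..., D_n with D_k = A_k minus (A_1 ∪ ... ∪ A_{k-1}), as in (1.1)."""
    seen: Set[int] = set()          # running union A_1 ∪ ... ∪ A_{k-1}
    pieces: List[Set[int]] = []
    for a in sets:
        pieces.append(set(a) - seen)  # A_k ∩ A_1^c ∩ ... ∩ A_{k-1}^c
        seen |= a
    return pieces

# Hypothetical sample data.
A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5}]
D = disjointify(A)
```

The pieces in `D` are pairwise disjoint and have the same union as the sets in `A`; some pieces may be empty, as happens for the last set here.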

Similar remarks apply to σ-fields: Part (d) of the above definition asserts that a σ-field is
closed under countable unions, and an application of De Morgan’s law shows that a σ-field
is closed under countable intersections as well. As a consequence, a σ-field F is closed under
the operations of limit infimum and limit supremum defined, respectively, by
$\varliminf_n A_n := \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$ and $\varlimsup_n A_n := \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$

Moreover, every countable union of members of F may be expressed as a countable disjoint
union of members of F in the manner of (1.1):
$\bigcup_{n=1}^{\infty} A_n = A_1 \cup (A_2 \cap A_1^c) \cup \cdots \cup (A_n \cap A_1^c \cap \cdots \cap A_{n-1}^c) \cup \cdots.$ (1.2)
Members of a σ-field F on X are called F-measurable sets. The qualifier F is usually
dropped if the σ-field is understood and there is no possibility of confusion. The pair (X, F)
is called a measurable space. A finite or countably infinite sequence of disjoint measurable
sets with union A is called a measurable partition of A.

1.2.1 Examples.
(a) The power set P(X) is obviously a σ-field, as is the collection {∅, X}. A field clearly
cannot have exactly three members. All fields with exactly four members are of the form
{∅, X, A, Ac }.
(b) A subset A of X is said to be cofinite if Ac is finite. The collection F of all sets that
are either finite or cofinite is a field. If X is infinite, then F is not a σ-field (Ex. 1.9).
(c) A subset A of X is said to be cocountable if Ac is countable. The collection F of all
sets that are either countable or cocountable is a σ-field. For example, to see that F is closed
under countable unions $A = \bigcup_{n=1}^{\infty} A_n$, note that if each $A_n$ is countable, then A is countable
and if some $A_n$ is cocountable then A is cocountable. In either case, A ∈ F.
(d) If F is a field (σ-field) on X, then the trace
F ∩ E = {A ∩ E : A ∈ F}
is a field (σ-field) on E. For example, if A, B ∈ F, then the relations
(A ∪ B) ∩ E = (A ∩ E) ∪ (B ∩ E) and (A \ B) ∩ E = (A ∩ E) \ (B ∩ E)
show that F ∩ E is closed under finite unions and differences. Note that F ∩ E ⊆ F iff E ∈ F, in which case F ∩ E is simply
the collection of all sets A ∈ F with A ⊆ E. ♦

Generated Sigma Fields


The intersection of a nonempty family of σ-fields on a nonempty set X is easily seen to
be a σ-field. In particular, if A is an arbitrary nonempty collection of subsets of X, then the
intersection σ(A) of all σ-fields on X containing A is a σ-field, called the σ-field generated
by A. Note that there is at least one σ-field containing A, namely, P(X), hence σ(A) is
well-defined. Generated σ-fields have the important minimality property:
F a σ-field and A ⊆ F ⇒ σ(A) ⊆ F.
The field generated by A, denoted by ϕ(A), is defined in a similar manner and enjoys
the analogous minimality property.
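
On a finite set X the generated field and the generated σ-field coincide, and both can be computed by closing the generators under complements and pairwise unions until nothing new appears. The following Python sketch does this by brute force; the function name `generated_field` and the sample generators are hypothetical, chosen only to illustrate the minimality property.

```python
from itertools import combinations

def generated_field(X, generators):
    """Smallest family containing X and the generators that is closed under
    complements and pairwise unions (for finite X this equals sigma(A))."""
    X = frozenset(X)
    family = {X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)          # snapshot, since family grows below
        for a in current:
            if X - a not in family:     # closure under complements
                family.add(X - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:     # closure under finite unions
                family.add(a | b)
                changed = True
    return family

# Hypothetical example: generators {1} and {1, 2} on X = {1, 2, 3, 4}.
X = {1, 2, 3, 4}
F = generated_field(X, [{1}, {1, 2}])
```

In this example the generators cannot separate 3 from 4, so the resulting family consists of all unions of the sets {1}, {2}, {3, 4}.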

1.2.2 Example. Let A = {A1, A2, . . .} be a countable partition of X. Then σ(A) consists
of all unions $\bigcup_{n \in S} A_n$, where S ⊆ N. (If S = ∅, then the union is defined to be ∅.)
To see this, note first that the collection F of all such unions is a σ-field. Indeed, F is
obviously closed under countable unions, and by disjointness
$\Bigl(\bigcup_{n \in S} A_n\Bigr)^c = \bigcup_{n \in S^c} A_n,$
hence F is closed under complements as well. Since A ⊆ F ⊆ σ(A), the minimality property
implies that σ(A) = F. The analogous assertions hold for finite partitions of X. ♦

Borel Sets
Let X be a topological space. The σ-field generated by the collection of all open subsets
of X is called the Borel σ-field on X and is denoted by B(X). A member of B(X) is
called a Borel set. The minimality property of B(X) takes the following form:
If a σ-field F contains all open sets, then it contains all Borel sets.
Borel σ-fields provide a bridge between topology and measure theory, allowing, for example,
the entry of continuous functions into integration theory.
Since closed sets are complements of open sets, B(X) is also generated by the collection
of closed sets. For Euclidean space Rd , more can be said:

1.2.3 Proposition. The σ-field B(Rd ) is generated by the collection


(a) OI of all bounded, open d-dimensional intervals (a1 , b1 ) × · · · × (ad , bd ).

(b) CI of all bounded, closed d-dimensional intervals [a1 , b1 ] × · · · × [ad , bd ].

(c) HI of all bounded, left-open d-dimensional intervals (a1 , b1 ] × · · · × (ad , bd ].

Proof. For ease of notation we prove the proposition for d = 1; the proof for the general
case is entirely similar.
(a) Let O denote the collection of all open sets in R. Since OI ⊆ O, by minimality we have
σ(OI ) ⊆ σ(O) = B(R). On the other hand, every member of O is a countable union of sets
in OI , hence O ⊆ σ(OI ) and so B(R) ⊆ σ(OI ).
(b) Let C denote the collection of all closed sets in R. As in part (a), σ(CI) ⊆ σ(C) = B(R).
Moreover, every bounded open interval (a, b) may be expressed as $\bigcup_n [a + 1/n,\, b - 1/n]$,
hence OI ⊆ σ(CI). By part (a) and minimality, B(R) = σ(OI) ⊆ σ(CI).
(c) From the representations $(a, b) = \bigcup_n (a,\, b - 1/n]$ and $(c, d] = \bigcap_n (c,\, d + 1/n)$, we see
that OI ⊆ σ(HI ) and HI ⊆ σ(OI ). By minimality, σ(OI ) ⊆ σ(HI ) and σ(HI ) ⊆ σ(OI ).
An application of (a) completes the argument.
The collection HI will figure prominently in the development of the Lebesgue integral on
Euclidean space Rd .

Extended Borel Sets


To deal with functions that take values in the extended real line $\overline{\mathbb{R}} = [-\infty, \infty]$, we need to augment B(R) with the sets

B ∪ {−∞}, B ∪ {∞}, B ∪ {−∞, ∞}, B ∈ B(R).

The collection of all such sets, together with the Borel subsets of R, is called the extended
Borel σ-field and is denoted by $B(\overline{\mathbb{R}})$. One easily checks that $B(\overline{\mathbb{R}})$ is indeed a σ-field with
trace B(R) on R. It may be shown that $\overline{\mathbb{R}}$ has a natural topology whose open sets generate
$B(\overline{\mathbb{R}})$ (Exercise 2.30).

Product Sigma Fields


Let X1 , . . . , Xd be nonempty sets and set X := X1 × · · · × Xd . For arbitrary nonempty
collections Aj ⊆ P(Xj ) define

A1 × · · · × Ad = {A1 × · · · × Ad : Aj ∈ Aj , j = 1, . . . , d}.

If Fj is a σ-field on Xj , then the σ-field on X generated by F1 × · · · × Fd is called the


product σ-field and is denoted by F1 ⊗ · · · ⊗ Fd . Thus

F1 ⊗ · · · ⊗ Fd := σ(F1 × · · · × Fd ).

Members of F1 × · · · × Fd are called measurable rectangles.
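
The difference between the collection of measurable rectangles and the product σ-field it generates can be seen on a small finite example. The sketch below is written in Python under stated assumptions: the two-point sets `X1`, `X2`, the helper `sigma`, and the choice of power sets as the factor σ-fields are all hypothetical and serve only as an illustration.

```python
from itertools import combinations, product

def sigma(X, generators):
    """Closure of the generators under complements and pairwise unions (finite X)."""
    X = frozenset(X)
    family = {X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if X - a not in family:
                family.add(X - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family

# Hypothetical factors: two-point sets with their power sets as sigma-fields.
X1, X2 = {0, 1}, {"a", "b"}
P1 = [frozenset(s) for s in (set(), {0}, {1}, {0, 1})]
P2 = [frozenset(s) for s in (set(), {"a"}, {"b"}, {"a", "b"})]

X = frozenset(product(X1, X2))
rectangles = {frozenset(product(A1, A2)) for A1 in P1 for A2 in P2}
product_sigma = sigma(X, rectangles)

# The "diagonal" is a union of two rectangles but is not itself a rectangle.
diagonal = frozenset({(0, "a"), (1, "b")})
```

Here the rectangles form a proper subfamily of the product σ-field: the diagonal belongs to `product_sigma` but not to `rectangles`.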


1.2.4 Theorem. If Aj ⊆ P(Xj ), then

σ(A1 ) ⊗ · · · ⊗ σ(Ad ) = σ(A1 × · · · × Ad ). (1.3)

Proof. The inclusion ⊇ follows from σ(A1 ) ⊗ · · · ⊗ σ(Ad ) ⊇ A1 × · · · × Ad and minimality.


For the reverse inclusion, let Aj ∈ Aj , j = 2, . . . , d. Then

σ(A1 ) × {A2 } × · · · × {Ad } ⊆ σ (A1 × · · · × Ad ) . (†)



Indeed, the collection F1 of all B1 ∈ σ(A1 ) for which B1 × A2 × · · · × Ad ∈ σ (A1 × · · · × Ad )


is easily seen to be a σ-field containing A1 and so by minimality F1 = σ(A1 ).
Next, let B1 ∈ σ(A1 ) and Aj ∈ Aj , j = 3, . . . , d. By (†)

{B1 } × A2 × {A3 } × · · · × {Ad } ⊆ σ(A1 × · · · × Ad ).

Arguing as before, this time on the second coordinate, we see that

{B1 } × σ(A2 ) × {A3 } × · · · × {Ad } ⊆ σ(A1 × · · · × Ad ).

We have now shown that

σ(A1 ) × σ(A2 ) × A3 × · · · × Ad ⊆ σ(A1 × · · · × Ad ).

Continuing in this manner we eventually obtain the inclusion ⊆ in (1.3).


1.2.5 Corollary. Let d = d1 + · · · + dk , where dj ∈ N. Then

B(Rd ) = B(Rd1 ) ⊗ · · · ⊗ B(Rdk ). (1.4)

In particular,
B(Rd ) = B(R) ⊗ · · · ⊗ B(R) (d factors).
Proof. By definition, B(Rdj ) = σ(Oj ) and B(Rd ) = σ(O), where Oj is the collection of all
open subsets of Rdj and O is the collection of all open subsets of Rd . By the theorem,

σ(O1 × · · · × Ok ) = σ(O1 ) ⊗ · · · ⊗ σ(Ok ) = B(Rd1 ) ⊗ · · · ⊗ B(Rdk ).

It therefore suffices to show that

O1 × · · · × Ok ⊆ O ⊆ B(Rd1 ) ⊗ · · · ⊗ B(Rdk ); (†)

the desired equality (1.4) will then follow by minimality. The first inclusion in (†) follows
from the definition of the product topology of Rd1 × · · · × Rdk (the latter identified with
Rd ). For the second inclusion, recall that each U ∈ O is a countable union of open intervals
I = (a1 , b1 ) × · · · × (ad , bd ). Since each such interval may be written as Id1 × · · · × Idk ,
where Idj is a dj -dimensional open interval, U ∈ B(Rd1 ) ⊗ · · · ⊗ B(Rdk ). Therefore, (†) holds,
completing the proof.

Pi-Systems and Lambda-Systems


A collection P of subsets of a set X is called a π-system if it is closed under finite
intersections. Clearly, every field is a π-system, as is the collection of all open (or closed)
intervals of R.
A collection L of subsets of a set X is called a λ-system if it has the following properties:
(a) X ∈ L.
(b) A, B ∈ L and A ⊆ B ⇒ B \ A ∈ L. (1.5)
(c) An ∈ L and An ↑ A ⇒ A ∈ L.

Note that (a) and (b) imply that a λ-system is closed under complements and contains
the empty set. The importance of λ-systems is that they provide an indirect method for
establishing various properties of certain collections of sets. (See, for example, 1.6.8.) The
method is based on Dynkin’s π-λ theorem, which makes a connection between π-systems,
λ-systems, and σ-fields.

1.2.6 Theorem (Dynkin). Let L be a λ-system and P ⊆ L a π-system. Then σ(P) ⊆ L.

Proof. Let ℓ(P) denote the intersection of all λ-systems containing P. Then ℓ(P) is a λ-system,
as is easily verified, and ℓ(P) ⊆ σ(P). If we show that ℓ(P) is a σ-field, it will then
follow by minimality that σ(P) = ℓ(P) ⊆ L, establishing the theorem.
To show that ℓ(P) is closed under finite intersections, let A ∈ ℓ(P) and define

$L_A := \{B \in \ell(P) : A \cap B \in \ell(P)\}.$

One easily checks that $L_A$ is a λ-system. Furthermore, if A ∈ P, then P ⊆ $L_A$, so by
minimality ℓ(P) ⊆ $L_A$. Thus A ∩ B ∈ ℓ(P) for all A ∈ P and B ∈ ℓ(P). Fixing such a B we
have P ⊆ $L_B$, hence by minimality ℓ(P) ⊆ $L_B$. Thus A, B ∈ ℓ(P) ⇒ A ∩ B ∈ ℓ(P).
Now let $(E_n)$ be a sequence in ℓ(P). By the preceding result and induction,
$A_n := \bigcup_{k=1}^{n} E_k = \Bigl(\bigcap_{k=1}^{n} E_k^c\Bigr)^c \in \ell(P).$
By (c) of (1.5), $\bigcup_{k=1}^{\infty} E_k = \bigcup_{n=1}^{\infty} A_n \in \ell(P)$. Therefore, ℓ(P) is a σ-field, completing the
proof.
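
A λ-system need not be closed under intersections, which is why the π-system hypothesis is needed in the theorem. The Python sketch below checks this for one hypothetical example: the subsets of even cardinality of a four-point set. On a finite set, condition (c) of (1.5) holds automatically because every increasing sequence of subsets is eventually constant, so only (a) and (b) are tested.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
subsets = [frozenset(c) for r in range(5) for c in combinations(sorted(X), r)]

# Hypothetical example: all subsets of X with an even number of elements.
L = {s for s in subsets if len(s) % 2 == 0}

contains_X = X in L                                                      # property (a)
closed_proper_diff = all(b - a in L for a in L for b in L if a <= b)     # property (b)
closed_intersection = all(a & b in L for a in L for b in L)              # pi-system test
```

The first two checks succeed while the last fails (for instance {1, 2} ∩ {2, 3} = {2}), so this family is a λ-system that is not a σ-field.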

Exercises
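1.1 Let A, B, C, An, Bn ⊆ X. Verify the following:

(a) $1_{A \triangle B} = |1_A - 1_B|$.
(b) $(A \triangle B)^c = A^c \triangle B = A \triangle B^c$.
(c) $A^c \triangle B^c = A \triangle B$.
(d) $(A \triangle B) \cap C = (A \cap C) \triangle (B \cap C)$.
(e) $\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \triangle \Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) \subseteq \bigcup_{n=1}^{\infty} (A_n \triangle B_n)$.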

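1.2 Let An, Bn ⊆ X. Verify the following:

(a) $x \in \varliminf_n A_n$ iff x ∈ An for all sufficiently large n.
(b) $x \in \varlimsup_n A_n$ iff x ∈ An for infinitely many n.
(c) $\varliminf_n A_n \subseteq \varlimsup_n A_n$.
(d) $(\varliminf_n A_n)^c = \varlimsup_n A_n^c$.
(e) $(\varlimsup_n A_n)^c = \varliminf_n A_n^c$.
(f) $\varlimsup_n (A_n \cap B_n) \subseteq \varlimsup_n A_n \cap \varlimsup_n B_n$.
(g) $\varlimsup_n (A_n \cup B_n) = \varlimsup_n A_n \cup \varlimsup_n B_n$.
(h) $\varliminf_n (A_n \cap B_n) = \varliminf_n A_n \cap \varliminf_n B_n$.
(i) $\varliminf_n (A_n \cup B_n) \supseteq \varliminf_n A_n \cup \varliminf_n B_n$.

Show that the inclusions in (c), (f), and (i) may be strict.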

1.3 For An ⊆ X, write An → A if $\varliminf_n A_n = \varlimsup_n A_n = A$. Let An → A and Bn → B. Show that

(a) $A_n \cup B_n \to A \cup B$. (b) $A_n \cap B_n \to A \cap B$. (c) $A_n^c \to A^c$. (d) $A_n \triangle B_n \to A \triangle B$.

1.4 Let An, A ⊆ X and set $B = \varliminf_n A_n$ and $C = \varlimsup_n A_n$. Prove that

(a) $1_B = \varliminf_n 1_{A_n}$. (b) $1_C = \varlimsup_n 1_{A_n}$. (c) $A_n \to A$ iff $1_{A_n} \to 1_A$.

1.5 Let {an } be a sequence in R and set An = (−∞, an ) and Bn = (an , ∞). Prove:
(a) x ∈ limn An ⇒ x ≤ limn an . (b) x < limn an ⇒ x ∈ limn An .
(c) x ∈ limn An ⇒ x ≤ limn an . (d) x < limn an ⇒ x ∈ limn An .
(e) x ∈ limn Bn ⇒ limn an ≤ x.

1.6 Determine all sets in the field on X = {1, 2, 3, 4, 5, 6} generated by the sets
(a) {1, 2}, {2, 3}, {3, 4}, {4, 5}. (b) {1, 2, 3}, {2, 3, 4}, {3, 4, 5}.
(c) {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}.

1.7 Let F be a σ-field on X and E ⊆ X. Show that $\sigma(F \cup \{E\})$ consists of all sets of the form
$(A \cap E) \cup (B \cap E^c)$, A, B ∈ F.

1.8 Let F ⊆ P(X) such that X ∈ F and A \ B ∈ F whenever A, B ∈ F. Show that F is a field.

1.9 Show that if X is infinite, then the field consisting of all finite or cofinite sets is not a σ-field.

1.10 Let F1, F2, . . . be a sequence of σ-fields on X such that F1 ⊆ F2 ⊆ · · · . Show that $F := \bigcup_{n=1}^{\infty} F_n$
is a field. Show by example that F need not be a σ-field.

1.11 Find examples of fields F and G on X = {1, 2, 3} such that F ∪ G is not a field.

1.12 Describe the σ-field F on (0, 1) generated by all singletons {x}, x ∈ (0, 1). Show that F is
contained in B(0, 1) and contains no proper open subinterval of (0, 1).

1.13 Let F be the collection of all finite disjoint unions of intervals [a, b) ⊆ [0, 1). Show that F is a
field on [0, 1) but not a σ-field.

1.14 Let A ⊆ P(X). Show that σ(ϕ(A)) = σ(A).




1.15 Let Ff denote the field consisting of the subsets of X that are either finite or cofinite. Show
that σ(Ff ) is the σ-field Fc consisting of the countable or cocountable subsets of X.

1.16 Show that B(Rd ) is generated by the collection


(a) K of all compact sets. (b) Ir of all intervals (a1 , ∞) × · · · × (ad , ∞), aj ∈ Q.

1.17 Let F be a field. Prove that the following are equivalent:


(a) F is a σ-field.
(b) $\bigcup_{n=1}^{\infty} A_n \in F$ for every sequence of disjoint sets An ∈ F.
(c) $\bigcup_{n=1}^{\infty} B_n \in F$ for every increasing sequence of sets Bn ∈ F.

1.18 Let A ⊆ P(X) and E ⊆ X. Prove that σ(A ∩ E) = σ(A) ∩ E.

1.19 Let X be a topological space and let E ⊆ X have the relative topology. Prove that B(X) ∩ E =
B(E).

1.20 [↓ 2.30] Let a, b ∈ R and let [a, b] and (a, b) have the relative topology from R. Show that
B([a, b]) consists of the sets B, B ∪ {a}, B ∪ {b}, and B ∪ {a, b} where B ∈ B((a, b)).


1.21 For j = 1, . . . , d, let Aj ⊆ P(Xj ) and Ej ∈ P(Xj ). Set E := E1 × · · · × Ed . Show that


σ(A1 ∩ E1 ) ⊗ · · · ⊗ σ(Ad ∩ Ed ) = σ(A1 × · · · × Ad ) ∩ E.

1.22 Let B ∈ B(Rd ), x ∈ Rd , and r ∈ R. Prove that B +x := {b+x : b ∈ B} and rB := {rb : b ∈ B}


are Borel sets.

1.23 Let A ⊆ P(X) and let F be the union of all σ-fields σ(C), where C is a countable subfamily of
A. Prove that F = σ(A).

1.24 Let F = {B1 , . . . , Bm } be a finite field on X. Show that there exists a finite partition A of X
by sets in F such that every member of F is a union of members of A. ⟦Consider C1 ∩ · · · ∩ Cm,
where $C_j = B_j$ or $B_j^c$.⟧

1.25 Show that every infinite σ-field F has an infinite sequence of disjoint nonempty sets. Conclude
that F has cardinality at least that of the continuum. Conclude that no σ-field can have
cardinality ℵ0 . Find a field that has cardinality ℵ0 .

1.26 A nonempty collection M of subsets of X is a monotone class if for any sequence {An } in
M, An ↑ A or An ↓ A ⇒ A ∈ M. Carry out steps (a)–(f) below to prove the monotone class
theorem, due to Halmos: If F is a field, M is a monotone class, and F ⊆ M, then σ(F) ⊆ M.
(a) Show that a monotone class that is closed under finite unions (intersections) is closed under
countable unions (intersections).
(b) Let m(F) denote the intersection of all monotone classes containing F. Show that m(F) is a
monotone class.
(c) Show that A := {A ∈ m(F) : Ac ∈ m(F)} is monotone and m(F) = A. Conclude that m(F)
is closed under complements.
(d) Let B = {B ∈ m(F) : A ∪ B ∈ m(F) for all A ∈ F}. Show that B is a monotone class and
B = m(F). Conclude that A ∪ B ∈ m(F) for all B ∈ m(F) and all A ∈ F.
(e) Let C = {C ∈ m(F) : C ∪ B ∈ m(F) for all B ∈ m(F)}. Show that C is monotone and
C = m(F). Conclude that m(F) is closed under finite unions.
(f) Show that m(F) is closed under countable unions. Conclude that σ(F) ⊆ m(F) ⊆ M.

1.3 Measures
Set Functions
Let X be a nonempty set. A collection of subsets of X containing the empty set is
called a paving of X. A function µ on a paving A of X that takes values in $\overline{\mathbb{R}}$ is called a
set function on A. Until Chapter 5, we consider only nonnegative set functions, that
is, those taking values in [0, ∞]. An important example is the function that assigns the
length b − a to intervals [a, b]. This set function and its d-dimensional generalization will be
examined in detail in §1.7.
Let µ be a nonnegative set function on a paving A and let A1 , A2 , . . . ∈ A. Then µ is
said to be

• monotone if A1 ⊆ A2 implies µ(A1) ≤ µ(A2).
• finitely additive if $A := \bigcup_{k=1}^{n} A_k$ disjoint and A ∈ A implies $\mu(A) = \sum_{k=1}^{n} \mu(A_k)$.
• finitely subadditive if $A := \bigcup_{k=1}^{n} A_k \in$ A implies $\mu(A) \le \sum_{k=1}^{n} \mu(A_k)$.
• countably additive if $A := \bigcup_{n=1}^{\infty} A_n$ disjoint and A ∈ A implies $\mu(A) = \sum_{n=1}^{\infty} \mu(A_n)$.
• countably subadditive if $A := \bigcup_{n=1}^{\infty} A_n \in$ A implies $\mu(A) \le \sum_{n=1}^{\infty} \mu(A_n)$.
• finite if µ(A) < ∞ for every A ∈ A.
• σ-finite if there exist pairwise disjoint X1 , X2 , . . . ∈ A with union X and µ(Xn ) < ∞.
• a measure on A if µ is countably additive and µ(∅) = 0.
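
For a set function on the power set of a small finite set, several of the properties above can be checked by brute force. The Python sketch below is an illustration under stated assumptions: the three-point set, the checker names, and the two sample set functions (counting, and the indicator of nonemptiness) are chosen for this example. Because the power set is a field, additivity for disjoint pairs extends to finite disjoint unions by induction, so only pairs are tested.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

def is_monotone(mu):
    return all(mu(a) <= mu(b) for a in subsets for b in subsets if a <= b)

def is_finitely_additive(mu):
    # Tested on disjoint pairs only; induction gives the general finite case.
    return all(mu(a | b) == mu(a) + mu(b)
               for a in subsets for b in subsets if not (a & b))

def is_finitely_subadditive(mu):
    return all(mu(a | b) <= mu(a) + mu(b) for a in subsets for b in subsets)

counting = len                                   # number of elements
indicator_nonempty = lambda a: 1 if a else 0     # 0 on the empty set, 1 otherwise
```

Both sample set functions are monotone and finitely subadditive, but only `counting` is finitely additive: the indicator assigns 1 to {1}, {2}, and {1, 2} alike.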

If µ is a measure on a σ-field F, then the triple (X, F, µ) is called a measure space. A


member E of F that is a countable union of sets of finite measure is called a σ-finite set. If
µ(X) = 1, then µ is said to be a probability measure. Note that a measure on a field is
finitely additive: simply apply countable additivity to the sequence A1 , . . . , An , ∅, ∅, . . . .
Notation. In the sequel, if µ is a set function defined on intervals we write µ(a, b) for
µ((a, b)), µ[a, b] for µ([a, b]), etc. No confusion should arise from these abbreviations, as
context will make clear the intended meaning.

Properties and Examples of Measures


1.3.1 Proposition. A measure µ on a σ-field F is monotone and countably subadditive.
Moreover, for An ∈ F the following hold:
(a) (Continuity at A from below). An ↑ A implies µ(An ) ↑ µ(A).

(b) (Continuity at A from above). An ↓ A and µ(A1 ) < ∞ implies µ(An ) ↓ µ(A).

Proof. If A1 ⊆ A2 then µ(A2) = µ(A2 \ A1) + µ(A1) ≥ µ(A1), hence µ is monotone. For
subadditivity use (1.2), countable additivity, and monotonicity:
$\mu\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) = \mu(A_1) + \mu(A_2 \cap A_1^c) + \mu(A_3 \cap A_1^c \cap A_2^c) + \cdots \le \sum_{k=1}^{\infty} \mu(A_k).$

Part (a) is clear if some $A_k$ has infinite measure, so assume $\mu(A_k) < \infty$ for all k. Set
$A_0 = \emptyset$ and $E_k = A_k \setminus A_{k-1}$. Then A is the disjoint union $\bigcup_{k=1}^{\infty} E_k$, hence
$\mu(A) = \sum_{k=1}^{\infty} \mu(E_k) = \lim_n \sum_{k=1}^{n} \bigl(\mu(A_k) - \mu(A_{k-1})\bigr) = \lim_n \mu(A_n).$

For (b), note that A1 \ An ↑ A1 \ A, hence, by (a),
$\mu(A_1) - \mu(A) = \mu(A_1 \setminus A) = \lim_n \mu(A_1 \setminus A_n) = \mu(A_1) - \lim_n \mu(A_n).$
The preceding proposition has a converse:


1.3.2 Proposition. Let µ be a finitely additive, nonnegative set function on a field F.
(a) If µ is continuous from below, then µ is a measure.

(b) If µ(X) < ∞ and µ is continuous at ∅ from above, then µ is a measure.

Proof. For (a), let $\{A_n\}$ be a sequence of disjoint sets in F with union A ∈ F and set
$B_n := \bigcup_{k=1}^{n} A_k$. Then $B_n \in F$ and $B_n \uparrow A$. By finite additivity and continuity from below,
$\sum_{k=1}^{\infty} \mu(A_k) = \lim_n \sum_{k=1}^{n} \mu(A_k) = \lim_n \mu(B_n) = \mu(A).$

The proof of (b) is left as an exercise (1.39).


1.3.3 Examples.
(a) Set µ(∅) = 0 and µ(A) = ∞ if A ≠ ∅. Then µ is a measure on P(X).
(b) Let X be an infinite set and define µ(A) = 0 if A is countable and µ(A) = ∞ otherwise.
Then µ is a measure on P(X).

(c) Let X be uncountable and F the σ-field of countable or cocountable subsets of X (see
1.2.1(c)). Define µ(A) = 0 if A is countable and µ(A) = 1 if A is cocountable. Then µ is a
probability measure on F.
(d) Dirac measure. Let (X, F) be a measurable space. For x ∈ X and A ∈ F define
δx (A) = 1A (x). Then δx is a probability measure on F.
(e) If µj are measures on a σ-field F and aj ≥ 0, then $\sum_{j=1}^{n} a_j \mu_j$ is a measure on F. In
particular, a nonnegative linear combination of Dirac measures is a measure.
(f) If (X, F, µ) is a measure space and E ∈ F, then µE (A) := µ(A ∩ E) defines a measure
on F. Note that µE agrees with µ on the trace F ∩ E.
(g) Counting measure. Let X be a nonempty set. For A ⊆ X let µ(A) be the number of
elements in A if A is finite and µ(A) = ∞ otherwise. Then µ is clearly finitely additive on
P(X). To show that µ is a measure, let An ↑ A. If there exists an m such that Am = A,
then An = A for all n ≥ m and so, trivially, µ(An ) ↑ µ(A). On the other hand, if no such
m exists, then A must be infinite and $A_{n_{k-1}} \subsetneq A_{n_k}$ for some sequence of indices. Since
$\mu(A_{n_k}) \ge \mu(A_{n_{k-1}}) + 1$,
$\lim_n \mu(A_n) = \lim_k \mu(A_{n_k}) = \infty = \mu(A).$
By 1.3.2, µ is a measure on P(X).


(h) Infinite series measure. For an arbitrary sequence (pn ) in [0, ∞), define
$\mu(E) = \sum_{k \in E} p_k, \quad E \subseteq N,$

where the sum may be infinite. (By convention, the sum over the empty set is zero.) The
rearrangement theorem for nonnegative series implies that µ is well-defined and finitely
additive. Let An ↑ A. If A is finite, then eventually An = A, so obviously µ(An) ↑ µ(A). If A
is infinite, then µ(A) may be written as an infinite series $\mu(A) = \sum_{k=1}^{\infty} p_{n_k}$. Let r < µ(A),
choose k such that $\sum_{i=1}^{k} p_{n_i} > r$, and choose m so that $A_m$ contains the indices $n_1, \ldots, n_k$.
Then µ(An ) ≥ µ(Am ) > r for all n ≥ m. Since r was arbitrary, µ(An ) → µ(A). By 1.3.2, µ
is a measure on P(N). Note that if pk ≡ 1, then µ is simply counting measure on N. ♦
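
Several of the measures in 1.3.3 are easy to experiment with on finite sets. The Python sketch below models the Dirac measure, counting measure, and an infinite series measure restricted to finite subsets; the function names and the weights $p_k = 2^{-k}$ are assumptions made for illustration, and exact rational arithmetic is used so that equalities can be checked without rounding.

```python
from fractions import Fraction

def dirac(x):
    """Dirac measure at x: delta_x(A) = 1 if x is in A, else 0."""
    return lambda A: 1 if x in A else 0

def counting(A):
    """Counting measure of a finite set."""
    return len(A)

def series_measure(p):
    """mu(E) = sum of p[k] over k in E, for finite E and a dict p of nonnegative weights."""
    return lambda E: sum((p[k] for k in E), Fraction(0))

# Hypothetical weights p_k = 2^{-k}, k = 1, ..., 10.
p = {k: Fraction(1, 2**k) for k in range(1, 11)}
mu = series_measure(p)

A, B = {1, 2, 3}, {4, 5}                                    # disjoint sample sets
increasing = [mu(range(1, n + 1)) for n in range(1, 11)]   # mu(A_n) for A_n = {1, ..., n}
```

For these weights the values µ(A_n) equal 1 − 2^{-n}, so they increase with n, in line with the continuity from below used in the text.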

Exercises
1.27 Let A ⊆ P(X) and ∅ ∈ A. Show that if µ is a countably additive, finite set function on A, then
µ(∅) = 0.

1.28 Verify that the set functions defined in 1.3.3 (c) and (d) are measures.

1.29 Give an example of a measure µ on a σ-field F and a sequence of sets An ∈ F decreasing to A


such that $\lim_n \mu(A_n) \ne \mu(A)$.

1.30 [↑ 1.2.1] Let F be the field of finite or cofinite subsets of X and define µ(A) = 0 if A is finite and
µ(A) = 1 if A is cofinite. (a) Show that µ is finitely additive but in general is not countably
additive. (b) Show that µ is countably additive if X is uncountable.

1.31 Let µ be a finitely additive, nonnegative set function on a field F. Prove that if µ(A) and µ(B)
are finite, then |µ(A) − µ(B)| ≤ µ(A 4 B).

1.32 (Inclusion-exclusion I). Let µ be a finitely additive nonnegative set function on a field F. Prove
that µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B).

1.33 Let µ be a finitely additive, nonnegative set function on a field F and let A, B ∈ F with
µ(B) = 0. Show that µ(A ∪ B) = µ(A \ B) = µ(A).

1.34 (Inclusion-exclusion II). Let µ be a finitely additive, nonnegative set function on a field F and
let A1 , . . . , An ∈ F with union A such that µ(A) < ∞. Prove that for n ≥ 2
$\mu(A) = \sum_{i=1}^{n} \mu(A_i) - \sum_{1 \le i < j \le n} \mu(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} \mu(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} \mu(A_1 \cap \cdots \cap A_n).$

1.35 (Inclusion-exclusion III). Let µ be a finitely additive, nonnegative set function on a field F with
µ(X) < ∞ and let B1 , . . . , Bn ∈ F with intersection B. Prove that for n ≥ 2,
$\mu(B) = \sum_{i=1}^{n} \mu(B_i) - \sum_{1 \le i < j \le n} \mu(B_i \cup B_j) + \sum_{1 \le i < j < k \le n} \mu(B_i \cup B_j \cup B_k) - \cdots + (-1)^{n-1} \mu(B_1 \cup \cdots \cup B_n).$

1.36 Let (X, F, µ) be a measure space and let An ∈ F such that µ(Am ∩ An) = 0 for m ≠ n. Prove
that $\mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n)$.

1.37 [↓ 5.3.2] Let (X, F, µ) be a measure space and An ∈ F. Prove:



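(a) $\mu\bigl(\varliminf_n A_n\bigr) \le \varliminf_n \mu(A_n)$.
(b) $\mu\bigl(\varlimsup_n A_n\bigr) \ge \varlimsup_n \mu(A_n)$ if $\mu\bigl(\bigcup_n A_n\bigr) < \infty$.
(c) $\mu\bigl(\varlimsup_n A_n\bigr) = 0$ if $\sum_n \mu(A_n) < \infty$.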

1.38 Let (X, F) be a measurable space and let x1 , x2 ∈ X. For A ∈ P(X), define µ(A) = 1 if
{x1 , x2 } ⊆ A and µ(A) = 0 otherwise. Prove that µ is continuous from below. Is µ a measure?

1.39 Prove 1.3.2(b).

1.40 [↓ Ex. 3.3] Let µn be a sequence of measures on a σ-field F on X such that µn (A) ≤ µn+1 (A)
for all A ∈ F. Define the set function µ on F by µ(A) = limn µn (A). Prove that µ is a measure.

1.41 Let µn be a sequence of measures on a σ-field F on X and define µ on F by $\mu(A) = \sum_n \mu_n(A)$.
Prove that µ is a measure.

1.42 Let (X, F, µ) be a finite measure space. Show that there can be at most countably many pairwise
disjoint sets of positive measure.

1.43 Let (X, F, µ) be a σ-finite measure space and E a collection of pairwise disjoint members of F.
Show that for any A ∈ F, µ(A ∩ E) > 0 for at most countably many members of E.

1.44 Let (X, F, µ) be a measure space and for A ∈ F define

µ0 (A) = sup{µ(B) : B ∈ F, B ⊆ A and µ(B) < ∞}.

Show that µ0 is a measure on F. Show also that µ0 = µ iff the following condition holds:
For each A ∈ F with µ(A) = ∞ there exists B ∈ F such that B ⊆ A and 0 < µ(B) < ∞.

1.45 Let (X, F, µ) be a measure space and {Ek} be a sequence in F. For fixed m ∈ N, let A denote
the set of all x such that x ∈ Ek for exactly m values of k; B the set of all x such that x ∈ Ek
for finitely many and at least m values of k; and C the set of all x such that x ∈ Ek for at most
m values of k. Prove that A, B, C ∈ F. If $s(D) := \sum_{k=1}^{\infty} \mu(D \cap E_k)$, prove that

(a) µ(A) = s(A)/m. (b) µ(B) ≤ s(B)/m. (c) µ(C) ≥ s(C)/m.



1.4 Complete Measure Spaces


A measure space (X, F, µ) is said to be complete if
M ∈ F, µ(M ) = 0, and N ⊆ M ⇒ N ∈ F.
Examples (a)–(c), (g), and (h) of 1.3.3 are complete measure spaces. In this section we
show that any measure space (X, F, µ) may be enlarged in a minimal way to produce a
complete measure space. The following simple example illustrates the basic idea behind the
construction.

1.4.1 Example. Let X = {1, 2, 3} and F = {∅, {1}, {2, 3}, X}. The measure µ defined by
µ{1} = 1 and µ{2, 3} = 0 is not complete. However, by enlarging F to include {2}, {3} and
defining a new measure $\overline{\mu}$ on the augmented σ-field so that $\overline{\mu}\{1\} = 1$ and $\overline{\mu}\{2\} = \overline{\mu}\{3\} = 0$,
we obtain an extension of (X, F, µ) that is complete. ♦

Completion Theorem
Here is the general technique for completing a measure space. Part (a) of the theorem
gives the construction and part (b) describes a minimality property of a completion.
1.4.2 Theorem. Let (X, F, µ) be a measure space. Define

$F_\mu := \{A \cup N : A \in F,\ N \subseteq M \in F,\ \mu(M) = 0\}$ and $\overline{\mu}(A \cup N) := \mu(A)$. (1.6)
(a) $F_\mu$ is a σ-field containing F and $\overline{\mu}$ is a measure on $F_\mu$ that extends µ such that
$(X, F_\mu, \overline{\mu})$ is complete.
(b) If (X, G, ν) is a complete measure space such that F ⊆ G and ν is an extension of µ,
then $F_\mu \subseteq G$ and the restriction of ν to $F_\mu$ is $\overline{\mu}$.
Proof. (a) To see that $\overline{\mu}$ is well-defined, let A1 ∪ N1 = A2 ∪ N2, where Nj ⊆ Mj, Aj, Mj ∈ F
and µ(Mj) = 0. Then A1 ⊆ A2 ∪ M2 and A2 ⊆ A1 ∪ M1, hence µ(A2) = µ(A1).
Clearly, F ⊆ $F_\mu$. To see that $F_\mu$ is closed under complements note that in the notation
of (1.6)
$(A \cup N)^c = (A^c \cap M^c) \cup (A^c \cap N^c \cap M)$, $A^c \cap M^c \in F$ and $A^c \cap N^c \cap M \subseteq M$.
For closure under countable unions, let $B_n := A_n \cup N_n \in F_\mu$ and $B := \bigcup_n B_n$, where
Nn ⊆ Mn, An, Mn ∈ F, and µ(Mn) = 0. Then

$B = A \cup N$, where $A := \bigcup_{n=1}^{\infty} A_n$ and $N := \bigcup_{n=1}^{\infty} N_n \subseteq M := \bigcup_{n=1}^{\infty} M_n.$

Since µ(M) = 0, B ∈ $F_\mu$. Moreover, if the sets Bn are disjoint, then

$\overline{\mu}(B) = \mu(A) = \sum_n \mu(A_n) = \sum_n \overline{\mu}(B_n).$

Therefore, $F_\mu$ is a σ-field and $\overline{\mu}$ is a measure on $F_\mu$. Clearly, $(X, F_\mu, \overline{\mu})$ is complete.

(b) Let A, N and M be as in (1.6). Then ν(M) = µ(M) = 0, hence, since (X, G, ν) is
complete, N ∈ G. Therefore, A ∪ N ∈ G and $\overline{\mu}(A \cup N) = \mu(A) = \nu(A) = \nu(A \cup N)$, so ν is
an extension of $\overline{\mu}$.
Note that the completion theorem produces nothing new if (X, F, µ) is already complete,
since then the sets N in the above construction are already in F.
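
For a finite measure space the construction in (1.6) can be carried out by direct enumeration. The Python sketch below does this for the space of Example 1.4.1; the dictionary representation of µ and the helper `powerset` are assumptions made for this illustration. The inner `assert` checks on the fly that the value assigned to A ∪ N does not depend on the chosen representation.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
F = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 0, X: 1}

def powerset(s):
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Null sets: subsets N of some M in F with mu(M) = 0.
null_sets = {N for M in F if mu[M] == 0 for N in powerset(M)}

# Completion as in (1.6): sets A ∪ N, with mu_bar(A ∪ N) := mu(A).
mu_bar = {}
for A in F:
    for N in null_sets:
        mu_bar.setdefault(A | N, mu[A])
        assert mu_bar[A | N] == mu[A]   # the extension is well-defined
```

In this example the completed σ-field is the whole power set of X, and the extension assigns measure 0 to {2} and {3}.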

Null Sets
The sets N in the completion theorem, namely the subsets of F-measurable sets M
with measure zero, are called µ-null sets. Such sets appear throughout measure theory,
frequently in the following context:
A property P (x) of points x ∈ X is said to hold µ-almost everywhere, abbreviated
µ-a.e., if the set of all x for which P (x) is false is a µ-null set, that is,
µ {x ∈ X : P (x) is false} = 0.
In this case we also say that the property P (x) holds for µ-almost all x, abbreviated µ-a.a.
x. If the measure is clear from context we drop the qualifier µ and simply write a.e. or a.a.
For example, if a function f in 1.4.1 is defined by f (j) = j, then f = 1 a.e. For an example
with far reaching implications, consider functions fn , f : X → C. The notation fn → f a.e.
then means that
$\mu\{x \in X : \lim_n f_n(x) \ne f(x)\} = 0.$
This type of convergence will be examined in Chapter 2.

Exercises
1.46 [↑ 1.3.3(d).] Let (X, F) be a measurable space, E a finite subset of X, and $\mu := \sum_{x \in E} \delta_x$.
Describe the completion of (X, F, µ).
1.47 Show that if G ⊆ F are sigma fields, µ is a measure on F, and $\nu = \mu|_G$, then $G_\nu \subseteq F_\mu$ and
$\overline{\nu} = \overline{\mu}|_{G_\nu}$.

1.48 [↑ 1.44] Prove that µ0 = µ 0 .


1.49 Let {Fi : i ∈ I} be a collection of σ-fields on X and µ a measure on G := σ Si Fi . For each i
S 

let µi denote the restriction of µ to F . Show that Gµ = Hµ , where H := σ


i
i Fµi .
i

1.50 Let ν and η be measures on a σ-field F and set µ := ν + η. Show that $F_\mu \subseteq F_\nu \cap F_\eta$ and
$\overline{\mu} = \overline{\nu} + \overline{\eta}$ on $F_\mu$.
1.51 [↑ 1.3.3(f)] Let E ∈ F. Prove that FµE ∩ E = Fµ ∩ E and µE = µE on FµE .
1.52 Let (X, F, µ) be a finite measure space. For E ⊆ X define
$\mu_*(E) = \sup\{\mu(A) : A \in F,\ A \subseteq E\}$ and $\mu^*(E) = \inf\{\mu(B) : B \in F,\ B \supseteq E\}$.
Show that $F_\mu = \{E \subseteq X : \mu_*(E) = \mu^*(E)\}$.

1.5 Outer Measure and Measurability


As mentioned in the introduction to the chapter, the construction of a measure generally
begins with a collection A of “elementary” subsets of X and a set function µ on A, and
culminates with an extension of µ to a measure on a σ-field containing A. Of course, there
may be several σ-fields containing A, P(X) being an obvious one. However, in many cases
it is impossible to extend µ to P(X). For example, in §1.7 it is shown that the length
set-function on the collection of bounded intervals of R cannot be extended to a measure on
P(R). In general, the best one can hope for is an extension of µ to the completion of the
σ-field generated by A. This is accomplished by first constructing a related set function
on P(X), called outer measure, and then restricting this function to the class of so-called
measurable sets. The details follow.

Construction of an Outer Measure


An outer measure on a nonempty set X is a nonnegative, monotone, countably sub-
additive set function µ∗ on P(X) such that µ∗ (∅) = 0. Clearly, every measure on P(X) is
an outer measure. In particular, the set function that assigns 0 to the empty set and ∞ to
every nonempty set is an outer measure. By contrast, the set function that assigns 0 to the
empty set and 1 to every nonempty set is an outer measure that is not a measure.
The following proposition describes a general class of outer measures which are typically
not measures. The outer measure µ∗ defined in (1.7) is said to be generated by the pair
(A, µ). The sequences (An ) in (1.7) are said to cover E.
1.5.1 Proposition. Let A be a paving of X and let µ be a nonnegative set function on A
such that µ(∅) = 0. Define a set function µ∗ on P(X) by
$\mu^*(E) = \inf\Bigl\{\sum_{n=1}^{\infty} \mu(A_n) : A_n \in \mathcal{A} \text{ and } E \subseteq \bigcup_{n=1}^{\infty} A_n\Bigr\}$, (1.7)

where inf ∅ := ∞. Then µ∗ is an outer measure.


Proof. That µ∗(∅) = 0 can be seen by taking as a cover the sequence A1 = A2 = · · · = ∅.
Monotonicity of µ∗ follows from the observation that if A ⊆ B, then every cover of B is a
cover of A. For countable subadditivity, let En ∈ P(X) and $E := \bigcup_{n=1}^{\infty} E_n$. We may assume
that $\sum_{n=1}^{\infty} \mu^*(E_n) < \infty$. Given ε > 0, for each n choose a cover $\{A_{n,j}\}_j$ of $E_n$ in A such
that
$\sum_{j=1}^{\infty} \mu(A_{n,j}) < \mu^*(E_n) + \varepsilon/2^n.$

Since $\{A_{n,j}\}_{n,j}$ is a cover of E,

$\mu^*(E) \le \sum_{n,j} \mu(A_{n,j}) < \sum_n \mu^*(E_n) + \varepsilon.$

Thus $\mu^*(E) \le \sum_n \mu^*(E_n)$, as required.
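
When the paving is finite, formula (1.7) can be evaluated by brute force. Repeating a set in a cover only increases the sum, and a finite subfamily can be padded with copies of ∅ (which has µ-value 0) to form a sequence, so the infimum reduces to a minimum over subfamilies. The Python sketch below relies on that observation; the function name `outer_measure` and the sample paving on X = {1, 2, 3, 4} are hypothetical.

```python
from itertools import combinations

def outer_measure(E, paving):
    """mu*(E) from (1.7) for a finite paving given as {frozenset: value}.
    The infimum is taken as a minimum over subfamilies; inf of the empty set is infinity."""
    E = frozenset(E)
    sets = list(paving)
    best = float("inf")
    for r in range(len(sets) + 1):
        for family in combinations(sets, r):
            if E <= frozenset().union(*family):     # the family covers E
                best = min(best, sum(paving[a] for a in family))
    return best

# Hypothetical paving with a deliberately "too large" value on {1, 2, 3}.
paving = {
    frozenset(): 0,
    frozenset({1, 2}): 1,
    frozenset({2, 3}): 1,
    frozenset({1, 2, 3}): 3,
}
```

In this example µ*({1, 2, 3}) = 2 is smaller than the value 3 assigned by µ, so the generated outer measure need not agree with µ on the paving, and the uncovered point 4 has outer measure ∞.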

Carathéodory’s Theorem
Let µ∗ be any outer measure on X. A subset E of X is said to be µ∗ -measurable if

µ∗ (C) = µ∗ (C ∩ E) + µ∗ (C ∩ E c ) for all C ⊆ X. (1.8)

The definition asserts that E “splits” the outer measure of each subset C of X, a property
that may be seen as a precursor to finite additivity. Note that by subadditivity the inequality
≤ in (1.8) always holds. Thus the measurability criterion singles out precisely those sets E
for which the inequality ≥ in (1.8) is satisfied. The collection of all µ∗ -measurable subsets of
X is denoted by M(µ∗ ). Here is the main result regarding outer measure.

1.5.2 Theorem (Carathéodory). Let µ∗ be an outer measure on X. Then M := M(µ∗) is
a σ-field and the restriction µ := µ∗|M is a complete measure.
Proof. Clearly, ∅, X ∈ M, and since E and E c appear symmetrically in (1.8), E c ∈ M iff
E ∈ M. Furthermore, if µ∗ (E) = 0, then, by monotonicity,

µ∗ (C ∩ E) + µ∗ (C ∩ E c ) ≤ µ∗ (E) + µ∗ (C ∩ E c ) = µ∗ (C ∩ E c ) ≤ µ∗ (C),

hence E ∈ M. Thus M contains all sets of outer measure zero.



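It remains to show that for any sequence (En) in M,
\[
\text{(a)}\ \bigcup_{n=1}^{\infty} E_n \in \mathcal{M} \quad\text{and}\quad \text{(b)}\ \mu^*\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \sum_{n=1}^{\infty} \mu^*(E_n) \ \text{ if the union is disjoint.}
\]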

The verifications of (a) and (b) are carried out in the following steps. For convenience, call a
set C for which the equality in (1.8) holds a test set for E.

(1) M is closed under finite unions and hence is a field.


⟦Let E, F ∈ M. Take any set C as a test set for E and take C ∩ Eᶜ as a test set for
F. This gives
\[
\mu^*(C) = \mu^*(C \cap E) + \mu^*(C \cap E^c) \quad\text{and}\quad
\mu^*(C \cap E^c) = \mu^*(C \cap E^c \cap F) + \mu^*(C \cap E^c \cap F^c).
\]
Combining these we have
\begin{align*}
\mu^*(C) &= \mu^*(C \cap E) + \mu^*(C \cap E^c \cap F) + \mu^*(C \cap E^c \cap F^c) \\
&\ge \mu^*\big((C \cap E) \cup (C \cap E^c \cap F)\big) + \mu^*(C \cap E^c \cap F^c) \quad\text{(by subadditivity)} \\
&= \mu^*\big(C \cap (E \cup F)\big) + \mu^*\big(C \cap (E \cup F)^c\big).
\end{align*}
Therefore, E ∪ F ∈ M.⟧

(2) C ⊆ X, E, F ∈ M and E ∩ F = ∅ ⇒ µ∗(C ∩ (E ∪ F)) = µ∗(C ∩ E) + µ∗(C ∩ F).
⟦Using C ∩ (E ∪ F) as a test set for E we have
\begin{align*}
\mu^*\big(C \cap (E \cup F)\big) &= \mu^*\big(C \cap (E \cup F) \cap E\big) + \mu^*\big(C \cap (E \cup F) \cap E^c\big) \\
&= \mu^*(C \cap E) + \mu^*(C \cap F).
\end{align*}⟧
(3) If the sets En are disjoint, then F := ⋃_{n=1}^∞ En ∈ M and µ(F) = Σ_{n=1}^∞ µ(En).
⟦Let Fn := ⋃_{k=1}^n Ek and C ⊆ X. By steps (1) and (2) and induction, Fn ∈ M and
µ∗(C ∩ Fn) = Σ_{k=1}^n µ∗(C ∩ Ek). Therefore, by monotonicity,
\[
\mu^*(C) = \mu^*(C \cap F_n) + \mu^*(C \cap F_n^c) \ge \sum_{k=1}^{n} \mu^*(C \cap E_k) + \mu^*(C \cap F^c)
\]
for all n and so
\[
\mu^*(C) \ge \sum_{k=1}^{\infty} \mu^*(C \cap E_k) + \mu^*(C \cap F^c) \ge \mu^*(C \cap F) + \mu^*(C \cap F^c) \ge \mu^*(C).
\]
This shows that F ∈ M. Taking C = F verifies countable additivity.⟧


(4) If En ∈ M, then ⋃_n En ∈ M.
⟦Use (1), (3) and (1.2).⟧
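
The splitting condition (1.8) can be checked exhaustively on a small finite set. The sketch below is illustrative only: the set X, the partition BLOCKS, and the outer measure "number of blocks met by E" are hypothetical choices, not taken from the text. It tests every candidate E against every test set C.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
BLOCKS = [frozenset({1, 2}), frozenset({3, 4})]  # hypothetical partition of X

def mu_star(E):
    """Outer measure used for illustration: number of blocks that E meets."""
    return sum(1 for B in BLOCKS if B & E)

def subsets(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_measurable(E):
    """Check (1.8): mu*(C) = mu*(C & E) + mu*(C - E) for every C."""
    return all(mu_star(C) == mu_star(C & E) + mu_star(C - E)
               for C in subsets(X))

measurable = [E for E in subsets(X) if is_measurable(E)]
print(sorted(sorted(E) for E in measurable))
# [[], [1, 2], [1, 2, 3, 4], [3, 4]]
```

Under these assumptions the measurable sets are exactly the unions of blocks, which form a σ-field on which mu_star is additive, in line with Theorem 1.5.2. A set such as {1} fails the test with C = {1, 2}.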

Exercises
1.53 Define an outer measure µ∗ on P(X) by µ∗(∅) = 0 and µ∗(E) = 1 if E ≠ ∅. Find M(µ∗).
1.54 Let OI denote the collection of all bounded open subintervals of R and let µ := δ0 be the Dirac
measure at 0 on OI . Show that the outer measure µ∗ generated by (OI , µ) is the Dirac measure
at 0 on P(R). Find M(µ∗ ).
1.55 Let X be an uncountable set and define µ∗ (E) = 0 if E = ∅ and µ∗ (E) = 1 otherwise. Show
that µ∗ (E) = 0 or 1 according as E is countable or uncountable. Show also that M(µ∗ ) is the
σ-field of sets that are countable or cocountable.
1.56 [↑ 1.3.3(f)] Let µ be a monotone set function on a field F. For E ∈ F, let µE denote the set
function on F defined by µE (A) = µ(E ∩ A) and let (µE )∗ be the outer measure generated by
(F, µE ). Prove that (µ∗ )E = (µE )∗ .
1.57 [↓ 1.8.1.] Let A and B be pavings of X such that each contains a sequence with union X. Let
µ be a measure on A ∪ B and let µ∗a and µ∗b be the outer measures generated by (A, µ) and
(B, µ), respectively. Suppose that
µ∗a (E) = µ∗b (E) = µ(E) ∀ E ∈ A ∪ B. (†)
Prove that µ∗a = µ∗b . Show that the assertion fails if the condition in (†) is not assumed.
1.58 Let µ∗ be an outer measure on X, E ⊆ X, and A ∈ M(µ∗) with E ∩ A = ∅. Show that
µ∗(E ∪ A) = µ∗(E) + µ(A).
1.59 Let µ∗ be an outer measure on X, E ⊆ X, and A, B ∈ M(µ∗) with A ∩ B = ∅. Show that
µ∗(E ∩ (A ∪ B)) = µ∗(E ∩ A) + µ∗(E ∩ B). Show that the conclusion holds for countable disjoint
unions as well.
1.60 Let µ be a nonnegative set function on a paving A of X with µ(∅) = 0, and let µ∗ be the outer
measure generated by (A, µ). Prove that E ∈ M(µ∗ ) for any E ⊆ X satisfying
µ∗ (A) = µ∗ (A ∩ E) + µ∗ (A ∩ E c ) for all A ∈ A.

1.6 Extension of a Measure


We have seen that a suitably defined pair (A, µ) generates an outer measure µ∗ and that
the restriction of µ∗ to the σ-field M(µ∗ ) of measurable sets is a complete measure. A more
intimate connection between µ and µ∗ is possible if certain additional conditions are imposed
on (A, µ). For this we need the following definitions.
A nonempty collection A of subsets of X is called a
• semiring if A is a π-system and for any A, B ∈ A, there exist finitely many disjoint
members C1, ..., Cn of A with A \ B = ⋃_{j=1}^n Cj.
• ring if A, B ∈ A implies A ∪ B, A \ B ∈ A.
Every ring is a π-system and hence a semiring, since A ∩ B = A \ (A \ B). The collection of
all bounded intervals on R is a semiring that is not a ring. A ring that contains X is closed
under complements and hence is a field. If (X, F, µ) is a measure space, then the collection
of all members of F with finite measure is a ring that obviously need not be a field.
In this section we show that a measure µ on a semiring A may be extended to a measure
on σ(A) and that under suitable conditions the extension is unique and possesses certain
approximation and completeness properties.
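
The semiring axioms can be checked mechanically on a small model. The following sketch is an illustration under stated assumptions, not part of the text: it takes half-open intervals (a, b] with integer endpoints in {0, ..., 5}, encodes each as a pair (a, b), and uses (0, 0) for the empty set. For such intervals, comparing the integer points they contain is enough to compare the sets.

```python
def points(iv):
    """Integer points of the half-open interval (a, b]."""
    a, b = iv
    return frozenset(range(a + 1, b + 1))

def intersect(iv1, iv2):
    """(a, b] & (c, d] is again a half-open interval (pi-system property)."""
    (a, b), (c, d) = iv1, iv2
    lo, hi = max(a, c), min(b, d)
    return (lo, hi) if lo < hi else (0, 0)  # (0, 0] encodes the empty set

def difference(iv1, iv2):
    """(a, b] minus (c, d] as a list of at most two disjoint intervals."""
    (a, b), (c, d) = iv1, iv2
    if c >= d:  # second interval is empty
        pieces = [(a, b)]
    else:
        pieces = [(a, min(b, c)), (max(a, d), b)]
    return [p for p in pieces if p[0] < p[1]]

N = 5
intervals = [(a, b) for a in range(N + 1) for b in range(a, N + 1)]
all_point_sets = {points(iv) for iv in intervals}

# The union of (0, 1] and (2, 3] is not an interval, so this is not a ring.
union = points((0, 1)) | points((2, 3))
print(union in all_point_sets)  # False
```

The functions `intersect` and `difference` mirror the two semiring requirements: closure under intersection, and differences that split into finitely many disjoint members.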

The Measure Extension Theorem


Let A be a semiring on a set X, µ a measure on A, and µ∗ the outer measure generated
by (A, µ). The proof of the measure extension theorem is based on the following lemmas.

1.6.1 Lemma. The set Au of all finite disjoint unions of members of A is a ring.
Proof. Let A, B ∈ Au, say
\[
A = \bigcup_{j=1}^{m} A_j,\ A_j \in \mathcal{A}, \quad\text{and}\quad B = \bigcup_{k=1}^{n} B_k,\ B_k \in \mathcal{A} \quad\text{(disjoint unions).}
\]
To see that A \ B ∈ Au, for each j and k choose finitely many disjoint sets Cijk ∈ A such
that Aj \ Bk = ⋃_i Cijk. Then Aj \ Bk ∈ Au and
\[
A \setminus B = \bigcup_{j=1}^{m} A_j \cap B^c = \bigcup_{j=1}^{m} \bigcap_{k=1}^{n} (A_j \setminus B_k) = \bigcup_{j=1}^{m} \bigcap_{k=1}^{n} \bigcup_{i} C_{ijk}.
\]
Distributing the intersections over the unions and using that A is a π-system, we see that
this is a disjoint union of members of A, hence A \ B ∈ Au.
To show that A ∪ B ∈ Au, write A ∪ B as the disjoint union (A \ B) ∪ (B \ A) ∪ (A ∩ B)
and note that A ∩ B is the disjoint union ⋃_{j,k} Aj ∩ Bk of members of A.
1.6.2 Lemma. Define a set function µu on Au by
\[
\mu_u\Big(\bigcup_{j=1}^{m} A_j\Big) = \sum_{j=1}^{m} \mu(A_j), \quad A_j \in \mathcal{A} \ \text{(disjoint union).}
\]
Then µu is a well-defined measure on Au and µu|A = µ.
Proof. To show that µu is well-defined, let ⋃_{j=1}^m Aj = ⋃_{k=1}^n Bk be disjoint unions of members
of A. Then Aj = ⋃_{k=1}^n Aj ∩ Bk and Bk = ⋃_{j=1}^m Aj ∩ Bk, hence
\[
\mu(A_j) = \sum_{k=1}^{n} \mu(A_j \cap B_k) \quad\text{and}\quad \mu(B_k) = \sum_{j=1}^{m} \mu(A_j \cap B_k).
\]
Summing, we obtain
\[
\sum_{j=1}^{m} \mu(A_j) = \sum_{j=1}^{m} \sum_{k=1}^{n} \mu(A_j \cap B_k) = \sum_{k=1}^{n} \mu(B_k).
\]
To show countable additivity, let E1, E2, ... ∈ Au be disjoint with union E ∈ Au. Choose
disjoint sets A1, ..., Am ∈ A such that E = ⋃_{i=1}^m Ai, and for each k choose disjoint sets
Bk,1, ..., Bk,mk ∈ A such that Ek = ⋃_{j=1}^{mk} Bk,j. Then
\[
E_k = E \cap E_k = \bigcup_{i=1}^{m} A_i \cap E_k = \bigcup_{i=1}^{m} \bigcup_{j=1}^{m_k} A_i \cap B_{k,j} \quad\text{(disjoint unions).}
\]
By definition of µu,
\[
\mu_u(E) = \sum_{i=1}^{m} \mu(A_i) \quad\text{and}\quad \mu_u(E_k) = \sum_{i=1}^{m} \sum_{j=1}^{m_k} \mu(A_i \cap B_{k,j}). \tag{$\alpha$}
\]

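Also, for each i,
\[
A_i = A_i \cap E = \bigcup_{k=1}^{\infty} A_i \cap E_k = \bigcup_{k=1}^{\infty} \bigcup_{j=1}^{m_k} A_i \cap B_{k,j} \quad\text{(disjoint unions),}
\]
hence, by the countable additivity of µ,
\[
\mu(A_i) = \sum_{k=1}^{\infty} \sum_{j=1}^{m_k} \mu(A_i \cap B_{k,j}). \tag{$\beta$}
\]
By (α) and (β) and a rearrangement,
\[
\mu_u(E) = \sum_{i=1}^{m} \sum_{k=1}^{\infty} \sum_{j=1}^{m_k} \mu(A_i \cap B_{k,j}) = \sum_{k=1}^{\infty} \sum_{i=1}^{m} \sum_{j=1}^{m_k} \mu(A_i \cap B_{k,j}) = \sum_{k=1}^{\infty} \mu_u(E_k).
\]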

1.6.3 Lemma. The outer measures generated by (A, µ) and (Au , µu ) are the same.
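Proof. Let E ⊆ X. Typical sums in the definitions of µ∗(E) and µu∗(E) are, respectively,
\[
s = \sum_{n=1}^{\infty} \mu(A_n),\ A_n \in \mathcal{A},\ E \subseteq \bigcup_{n=1}^{\infty} A_n, \quad\text{and}\quad
t = \sum_{n=1}^{\infty} \mu_u(B_n),\ B_n \in \mathcal{A}_u,\ E \subseteq \bigcup_{n=1}^{\infty} B_n.
\]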

Since A ⊆ Au , every sum s is also a sum t. On the other hand, since each Bn is a finite
disjoint union of members of A and µu is additive, every t may be decomposed and written
as an s. The infima over these sums are therefore the same.
We may now prove
1.6.4 Theorem. Let A be a semiring on a set X, µ a measure on A, µ∗ the outer measure
generated by (A, µ), and M = M(µ∗ ) the σ-field of µ∗ -measurable sets. Then σ(A) ⊆ M
and the measure µ∗|M is an extension of µ.¹
Proof. By the last lemma, we may assume that A is a ring. To show that A ⊆ M(µ∗), let
A ∈ A and C ⊆ X. We show that
\[
\mu^*(C \cap A) + \mu^*(C \cap A^c) \le \mu^*(C). \tag{$\dagger$}
\]
Let Cn ∈ A such that C ⊆ ⋃_{n=1}^∞ Cn. Since A is a ring, Cn ∩ A, Cn ∩ Aᶜ ∈ A. Moreover,
C ∩ A ⊆ ⋃_{n=1}^∞ (Cn ∩ A) and C ∩ Aᶜ ⊆ ⋃_{n=1}^∞ (Cn ∩ Aᶜ), so
\[
\mu^*(C \cap A) \le \sum_{n=1}^{\infty} \mu(C_n \cap A) \quad\text{and}\quad \mu^*(C \cap A^c) \le \sum_{n=1}^{\infty} \mu(C_n \cap A^c).
\]
Adding we have
\[
\mu^*(C \cap A) + \mu^*(C \cap A^c) \le \sum_{n=1}^{\infty} \mu(C_n \cap A) + \sum_{n=1}^{\infty} \mu(C_n \cap A^c) = \sum_{n=1}^{\infty} \mu(C_n).
\]
Since the cover (Cn) of C was arbitrary, (†) holds.
To show that µ∗|A = µ, let A, An ∈ A with A ⊆ ⋃_{n=1}^∞ An. Then
\[
\mu(A) \le \sum_{n=1}^{\infty} \mu(A \cap A_n) \le \sum_{n=1}^{\infty} \mu(A_n).
\]
Taking infima over all such sequences {An} yields µ(A) ≤ µ∗(A). On the other hand, the
sequence A, ∅, ∅, ... is a cover of A by members of A, hence µ∗(A) ≤ µ(A). Therefore,
µ∗|A = µ, completing the proof of the theorem.
¹ We frequently denote this extension also by µ, depending on context.

Approximation Property of the Extension


1.6.5 Theorem. Let E ∈ σ(A) with µ(E) < ∞. Then for each ε > 0 there exist disjoint
sets A1, ..., An ∈ A such that
\[
\mu\Big(E \,\triangle\, \bigcup_{j=1}^{n} A_j\Big) < \varepsilon.
\]
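Proof. Choose a cover {Bn} of E in A such that Σ_n µ(Bn) < µ(E) + ε/2. Define
\[
E_1 = B_1 \quad\text{and}\quad E_n = B_n \cap B_1^c \cap \cdots \cap B_{n-1}^c = (B_n \setminus B_1) \cap \cdots \cap (B_n \setminus B_{n-1}), \quad n \ge 2.
\]
The sets En are disjoint and cover E. Choose n so large that Σ_{j=n+1}^∞ µ(Bj) < ε/2. From
the inclusion
\[
E \,\triangle\, \bigcup_{j=1}^{n} E_j = \Big(\bigcup_{j=1}^{n} E_j \setminus E\Big) \cup \Big(E \setminus \bigcup_{j=1}^{n} E_j\Big) \subseteq \Big(E^c \cap \bigcup_{j=1}^{\infty} E_j\Big) \cup \bigcup_{j=n+1}^{\infty} E_j
\]
we have
\begin{align*}
\mu\Big(E \,\triangle\, \bigcup_{j=1}^{n} E_j\Big) &\le \mu\Big(E^c \cap \bigcup_{j=1}^{\infty} E_j\Big) + \mu\Big(\bigcup_{j=n+1}^{\infty} E_j\Big) \le \sum_{j=1}^{\infty} \mu(B_j) - \mu(E) + \sum_{j=n+1}^{\infty} \mu(E_j) \\
&< \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{align*}
Noting that each Ej is a finite disjoint union of members of A (because A is a semiring), we
obtain the desired approximation.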

Completeness of the Extension



1.6.6 Theorem. If (A, µ) is σ-finite, then M(µ∗) is the completion of (σ(A), µ).
Proof. Let F = σ(A). Since M(µ∗ ) is complete, by minimality Fµ ⊆ M(µ∗ ). For the reverse
inclusion, assume first that µ(X) < ∞. Let E ∈ M(µ∗) and for each n choose sequences
{An,j}_{j=1}^∞ and {Bn,j}_{j=1}^∞ in A such that
\[
\mu^*(E^c) \le \mu(A_n) \le \sum_{j=1}^{\infty} \mu(A_{n,j}) \le \mu^*(E^c) + 1/n, \quad\text{where } A_n := \bigcup_{j=1}^{\infty} A_{n,j} \supseteq E^c, \text{ and}
\]
\[
\mu^*(E) \le \mu(B_n) \le \sum_{j=1}^{\infty} \mu(B_{n,j}) \le \mu^*(E) + 1/n, \quad\text{where } B_n := \bigcup_{j=1}^{\infty} B_{n,j} \supseteq E.
\]
Then Bn, Anᶜ ∈ σ(A), Anᶜ ⊆ E ⊆ Bn, and
\[
\mu(B_n) \to \mu^*(E), \quad \mu(A_n^c) = \mu(X) - \mu(A_n) \to \mu(X) - \mu^*(E^c) = \mu^*(E). \tag{$\dagger$}
\]
Next, let
\[
A = \bigcup_{n=1}^{\infty} A_n^c \quad\text{and}\quad B = \bigcap_{n=1}^{\infty} B_n.
\]
Then Anᶜ ⊆ A ⊆ E ⊆ B ⊆ Bn, hence from (†), µ(B \ A) = 0. Setting M = B \ A and
N = E \ A we have E = A ∪ N, N ⊆ M, A, M ∈ F, and µ(M) = 0 and so E ∈ Fµ.
In the general case, let X = ⋃_{n=1}^∞ Xn, where Xn ∈ A and µ(Xn) < ∞. Set An = A ∩ Xn
and µn = µ|An. Then An is a semiring on Xn (Ex. 1.62) and µn is a measure on An, so
the outer measure µn∗ generated by (An, µn) is a measure on Fn := σ(An) = F ∩ Xn with

completion M(µn∗). By Ex. 1.62 again, M(µn∗) = M(µ∗) ∩ Xn and µn∗ is the restriction
of µ∗ to M(µn∗). Now let E ∈ M(µ∗). By the preceding paragraph, for each n there exist
Mn, An ∈ Fn with µn(Mn) = 0 and Nn ⊆ Mn such that E ∩ Xn = An ∪ Nn. Setting
\[
A = \bigcup_{n=1}^{\infty} A_n, \quad M = \bigcup_{n=1}^{\infty} M_n, \quad\text{and}\quad N = \bigcup_{n=1}^{\infty} N_n,
\]
we have E = A ∪ N, N ⊆ M, M, A ∈ F and µ(M) = 0, hence E ∈ Fµ.


1.6.7 Remark. The σ-finite hypothesis in the completeness theorem cannot be removed.
For example, let A = {∅, R} with µ(∅) = 0 and µ(R) = ∞. Then µ∗(C) = ∞ for any C ≠ ∅,
hence, trivially, µ∗(C) = µ∗(C ∩ E) + µ∗(C ∩ Eᶜ) for all C, E ⊆ R, that is, M(µ∗) = P(R). On the
other hand, since the only set of measure zero is the empty set, the completion of σ(A) = A
is just A. ♦

Uniqueness of the Extension


Uniqueness is an immediate consequence of the following more general result:
1.6.8 Theorem. Let P be a π-system on a set Y and let µ1 and µ2 be measures on σ(P) that are
σ-finite on P. If µ1|P = µ2|P, then µ1 = µ2.
Proof. The proof uses Dynkin’s π-λ theorem. Suppose first that Y ∈ P and µ1 (Y ) < ∞.
Let L = {E ∈ σ(P) : µ2 (E) = µ1 (E)}. We claim that L is a λ-system. Indeed, property
(a) of (1.5) holds by assumption and (c) holds by continuity from below. To verify (b), let
A, B ∈ L with A ⊆ B. Then
µ2 (B \ A) = µ2 (B) − µ2 (A) = µ1 (B) − µ1 (A) = µ1 (B \ A),
verifying the claim. Since P ⊆ L, by Dynkin’s theorem σ(P) ⊆ L. This proves the theorem
for the case µ1 (Y ) < ∞.
Now let (Yn ) be a disjoint sequence in P with union Y and µ1 (Yn ) < ∞ for all n.
Applying the result of the first paragraph to the restriction of the measures to Yn , we see
that µ1 (A ∩ Yn ) = µ2 (A ∩ Yn ) for all A ∈ σ(P) and all n. Now use countable additivity to
complete the proof.
Applying Theorem 1.6.8 to the current setting we have
1.6.9 Theorem. If (A, µ) is σ-finite, then the extension of µ to σ(A) is unique.
1.6.10 Remarks. Without the σ-finite hypothesis the conclusion of 1.6.9 may fail. For
example, let A be the semiring of all bounded intervals and take µ to be the measure on
B(R) = σ(A) that assigns the value ∞ to every nonempty set in A (hence µ∗ (E) = ∞ for
every nonempty E ⊆ R). If ν is counting measure on B(R), then µ ≠ ν, yet the measures
agree on A. Note also that µ (vacuously) has the approximation property, but ν does not.
The conclusion of 1.6.9 may also fail if A is not a semiring. For example, let A be the
collection of all intervals (a, b] with b − a = 1. If µ(A) is the number of integers in A ∈ B(R)
and λ is Lebesgue measure on B(R) (see §1.7), then µ = λ on A but not on B(R). ♦
The following consequence of 1.6.8 will be needed later.
1.6.11 Theorem. Let ν be any measure on σ(A) that is σ-finite on A. Then
\[
\nu(E) = \inf\Big\{ \sum_{n=1}^{\infty} \nu(A_n) : A_n \in \mathcal{A} \text{ and } E \subseteq \bigcup_{n=1}^{\infty} A_n \Big\}, \quad E \in \sigma(\mathcal{A}).
\]
Proof. Let ν∗ denote the outer measure generated by (A, ν|A). Then the measures ν∗|σ(A)
and ν agree on the π-system A and so, by 1.6.8, are equal.



Exercises
1.61 Let Ai be a semiring on Xi , i = 1, 2. Show that A1 × A2 is a semiring.
1.62 Let µ be a measure on a semiring A ⊆ P(X) and E ∈ A.
(a) Prove that A ∩ E is a semiring consisting of the members of A that are subsets of E.
(b) Let ν be the restriction of µ to A ∩ E and let µ∗ and ν ∗ be the outer measures generated
by (X, A, µ) and (E, A ∩ E, ν). Show that ν ∗ is the restriction of µ∗ to P(E).
(c) Prove that M(ν ∗ ) = M(µ∗ ) ∩ E.
1.63 Let µ be as in 1.6.4 and let ν be a measure on σ(A) that equals µ on A.
(a) Show that ν(E) ≤ µ(E) for all E ∈ σ(A). (1.6.10 shows equality may not hold.)
(b) Show that ν(E) = µ(E) for all E ∈ σ(A) with µ(E) < ∞. ⟦Assume that A is a ring (how?).
Choose A ∈ A such that E ⊆ A and µ(A) < µ(E) + ε. Then ν(E) + ν(A \ E) < ν(E) + ε.⟧
1.64 Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated by
(A, µ). Prove that for any E ⊆ X there exists A ∈ σ(A) such that E ⊆ A and µ∗ (E) = µ(A).
1.65 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Prove the weak inclusion-exclusion principle
µ∗ (E ∪ F ) + µ∗ (E ∩ F ) ≤ µ∗ (E) + µ∗ (F ), E, F ⊆ X.

1.66 [↑ 1.64] Let µ and ν be measures on a semiring A ⊆ P(X) and let µ∗ and ν ∗ be the outer
measures generated by (A, µ) and (A, ν), respectively. Prove that (µ + ν)∗ = µ∗ + ν ∗ and
M(µ∗ ) ∩ M(ν ∗ ) ⊆ M(µ∗ + ν ∗ ). Show that the inclusion may be strict.
1.67 [↑ 1.64, 1.40] Let µ and µn be σ-finite measures on a semiring A ⊆ P(X) with µn ↑ µ on σ(A).
Let µ∗, µn∗ be the outer measures generated by (A, µ) and (A, µn). Prove that µn∗ ↑ µ∗ on P(X).
1.68 [↑ 1.66, 1.67] Let µn be measures on a semiring A on X and define µ(A) = Σ_{n=1}^∞ µn(A) (A ∈ A).
Let µ∗ and µn∗ be the outer measures generated by (A, µ) and (A, µn), respectively. Prove that
µ∗ = Σ_{n=1}^∞ µn∗.

1.69 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Prove that µ∗ is continuous from below. Why doesn’t this imply that µ∗ is a measure
on P(X)?
1.70 [↑ 1.64] Let µ be a measure on a semiring A ⊆ P(X) and let µ∗ be the outer measure generated
by (A, µ). Suppose that µ∗ (X) < ∞. Show that E ∈ M(µ∗ ) iff µ(X) = µ∗ (E) + µ∗ (E c ).

1.7 Lebesgue Measure


The Volume Set Function
Recall that HI denotes the semiring of bounded, left open d-dimensional intervals
(a, b] := (a1 , b1 ] × · · · × (ad , bd ], a := (a1 , . . . , ad ), b := (b1 , . . . , bd ),
where −∞ < aj ≤ bj < ∞. Define the d-dimensional volume of (a, b] by
\[
\lambda(a, b] = \lambda^d(a, b] := \prod_{j=1}^{d} (b_j - a_j).
\]

In this section we apply the results of §1.6 to the pair (HI , λ) to construct d-dimensional
Lebesgue measure. The following lemma is key to the construction.

1.7.1 Lemma. Let H, H1, ..., Hm ∈ HI.
(a) If H1, ..., Hm are disjoint and H = ⋃_{j=1}^m Hj, then λ(H) = Σ_{j=1}^m λ(Hj).
(b) If H ⊆ ⋃_{j=1}^m Hj, then λ(H) ≤ Σ_{j=1}^m λ(Hj).
(c) If H1, ..., Hm are disjoint and H ⊇ ⋃_{j=1}^m Hj, then λ(H) ≥ Σ_{j=1}^m λ(Hj).
[FIGURE 1.1: Pairwise disjoint interval grid of H.]

Proof. For ease of notation and exposition, we prove the lemma for d = 2, in which case the
intervals are rectangles. Let H = (a, b] × (c, d]. We may assume in (b) that H = ⋃_{j=1}^m Hj,
otherwise replace Hj by Hj ∩ H. Thus in each case the rectangles Hj are contained in H,
hence the coordinates of their vertices form partitions

{x0 := a < x1 < . . . < xp := b} and {y0 := c < y1 < . . . < yq := d}

of [a, b] and [c, d], respectively. These partitions generate a grid of disjoint subrectangles
Ri,j = (xi , xi+1 ] × (yj , yj+1 ] with union H such that each Hk is a union of such subrectangles.
The procedure for case (a) is illustrated in Figure 1.1. Since
\[
b - a = \sum_{i=0}^{p-1} (x_{i+1} - x_i) \quad\text{and}\quad d - c = \sum_{j=0}^{q-1} (y_{j+1} - y_j),
\]

we have, by the definition of λ,
\[
\lambda(H) = \Big(\sum_{i=0}^{p-1} (x_{i+1} - x_i)\Big) \Big(\sum_{j=0}^{q-1} (y_{j+1} - y_j)\Big) = \sum_{i=0}^{p-1} \sum_{j=0}^{q-1} \lambda(R_{i,j}). \tag{1.9}
\]
Similarly,
\[
\lambda(H_k) = \sum_{(i,j) : R_{i,j} \subseteq H_k} \lambda(R_{i,j})
\]
so that
\[
\sum_{k} \lambda(H_k) = \sum_{k} \sum_{(i,j) : R_{i,j} \subseteq H_k} \lambda(R_{i,j}). \tag{1.10}
\]
Now compare (1.9) and (1.10). In (a), every Ri,j is contained in exactly one Hk, hence the
rectangles in (1.10) appear exactly once and so λ(H) = Σ_{k=1}^m λ(Hk). In (b), a rectangle Ri,j
could be contained in more than one Hk, so λ(H) ≤ Σ_{k=1}^m λ(Hk). Finally, in (c) not every
Ri,j is necessarily contained in an Hk, hence λ(H) ≥ Σ_{k=1}^m λ(Hk).
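
The grid argument for d = 2 can be traced on concrete rectangles. The sketch below is only an illustration: the rectangle H and the two families are hypothetical, each rectangle (a, b] × (c, d] is encoded as ((a, b), (c, d)), and the rectangles are assumed to lie inside H, as in the proof. It builds the cells Ri,j from the vertex coordinates and counts how many Hk contain each cell.

```python
from fractions import Fraction as Fr

def area(rect):
    """Volume of (a, b] x (c, d], encoded as ((a, b), (c, d))."""
    (a, b), (c, d) = rect
    return (b - a) * (d - c)

def grid_cells(H, rects):
    """Cells R_ij of the grid generated by all vertex coordinates."""
    (a, b), (c, d) = H
    xs = sorted({a, b} | {x for (x0, x1), _ in rects for x in (x0, x1)})
    ys = sorted({c, d} | {y for _, (y0, y1) in rects for y in (y0, y1)})
    return [((xs[i], xs[i + 1]), (ys[j], ys[j + 1]))
            for i in range(len(xs) - 1) for j in range(len(ys) - 1)]

def contains(outer, inner):
    """True if the rectangle `inner` is a subset of `outer`."""
    (a, b), (c, d) = outer
    (p, q), (r, s) = inner
    return a <= p and q <= b and c <= r and s <= d

def multiplicities(H, rects):
    """Pairs (cell, number of rectangles H_k containing that cell)."""
    return [(R, sum(contains(Hk, R) for Hk in rects))
            for R in grid_cells(H, rects)]

# Hypothetical data: H = (0, 2] x (0, 2].
H = ((0, 2), (0, 2))
disjoint = [((0, 1), (0, 2)), ((1, 2), (0, 1)), ((1, 2), (1, 2))]   # case (a)
overlapping = [((0, Fr(3, 2)), (0, 2)), ((1, 2), (0, 2))]           # case (b)

print(sum(area(r) for r in disjoint), area(H))      # 4 4
print(sum(area(r) for r in overlapping), area(H))   # 5 4
```

In case (a) every cell has multiplicity 1, which is the comparison of (1.9) with (1.10); in case (b) the middle strip is counted twice, which accounts for the inequality.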

1.7.2 Lemma. The volume set function λ is countably additive on HI .


Proof. Part (a) of 1.7.1 gives finite additivity. Let {Hj} be a sequence of disjoint members
of HI such that H := ⋃_{j=1}^∞ Hj ∈ HI. By 1.7.1(c), λ(H) ≥ Σ_{j=1}^n λ(Hj) for all n, hence
λ(H) ≥ Σ_{j=1}^∞ λ(Hj).
For the reverse inequality, let ε > 0, and for each j let Hj^ε denote the member of HI
obtained by replacing each coordinate subinterval (c, d] of Hj by (c − δj, d + δj], where
δj is chosen so that λ(Hj^ε) < λ(Hj) + ε/2^j. Then the collection of intervals int Hj^ε is
an open covering of the compact set cl H, so there exists an m ∈ N such that H ⊆
int H1^ε ∪ ··· ∪ int Hm^ε ⊆ H1^ε ∪ ··· ∪ Hm^ε. By 1.7.1(b),
\[
\lambda(H) - \varepsilon < \lambda(H^{\varepsilon}) \le \lambda(H_1^{\varepsilon}) + \cdots + \lambda(H_m^{\varepsilon}) \le \sum_{j=1}^{\infty} \lambda(H_j) + \varepsilon.
\]
Letting ε → 0 yields λ(H) ≤ Σ_{j=1}^∞ λ(Hj), establishing countable additivity.

Construction of the Measure


Since σ(HI ) = B(Rd ), we may invoke 1.6.4 using the outer measure
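\[
\lambda^*(E) := \inf\Big\{ \sum_{n=1}^{\infty} \lambda(A_n) : A_n \in \mathcal{H}_I \text{ and } E \subseteq \bigcup_{n=1}^{\infty} A_n \Big\}, \quad E \subseteq \mathbb{R}^d,
\]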

to obtain
1.7.3 Theorem. The volume set function λ on HI has a unique extension to B(Rd ).
Moreover, M(λ∗ ) is the completion of B(Rd ).

The members of M(λ∗) are called Lebesgue measurable sets and λ := λ∗|M(λ∗) is
called Lebesgue measure on Rᵈ.

Exercises
1.71 Let I ∈ HI . Show that λ(I) = λ(int I) = λ(cl I). Also, in the definition
\[
\lambda^*(E) := \inf\Big\{ \sum_{n=1}^{\infty} \lambda(A_n) : A_n \in \mathcal{A} \text{ and } E \subseteq \bigcup_{n=1}^{\infty} A_n \Big\}, \quad E \subseteq \mathbb{R}^d,
\]
where A = HI, show that the infimum is unchanged if A is taken to be OI, CI, O := the set of
open sets of Rᵈ, or K := the set of compact subsets of Rᵈ.
1.72 Let N ⊆ Rᵈ with λ(N) = 0. Show that Nᶜ is dense in Rᵈ.
1.73 (Translation invariance of λ). Let E ⊆ Rd and x ∈ Rd . Show that
(a) λ∗ (x + E) = λ∗ (E) (b) E ∈ M(λ) ⇒ x + E ∈ M(λ).
1.74 (Dilation property of λ). Let E ⊆ Rᵈ and r ∈ R. Show that
(a) λ∗(rE) = |r|ᵈ λ∗(E) (b) E ∈ M(λ) ⇒ rE ∈ M(λ).


1.75 Show that for any ε > 0 there exists an open set U dense in Rd such that λ(U ) < ε.
1.76 Let A, B ⊆ [0, 1], where B ∈ M(λ) and λ(B) = 1. Show that λ∗ (A) = λ∗ (A ∩ B).
1.77 Let E ⊆ R with 0 < λ(E) < ∞ and let 0 < r < 1. Show that there exists an interval [a, b]
such that λ(E ∩ [a, b]) > r(b − a). ⟦Let In be closed, bounded intervals that cover E with
Σ_n λ(In) < r⁻¹ λ(E).⟧
1.78 Show that the graph G := {(x, f (x)) : x ∈ R} of a continuous function f is a Borel set with
two-dimensional Lebesgue measure zero.

1.8 Lebesgue-Stieltjes Measures


A measure on B(Rd ) that is finite on bounded, d-dimensional intervals is called a
Lebesgue-Stieltjes measure. For example, Lebesgue measure λd is a Lebesgue-Stieltjes
measure. Lebesgue-Stieltjes measures may be constructed from so-called distribution func-
tions, discussed below. Before we describe the construction, we discuss some approximation
properties possessed by these measures.

Regularity
The following theorem complements the approximation property 1.6.5.
1.8.1 Theorem. Let µ be a Lebesgue-Stieltjes measure on Rd and let E ∈ B(Rd ). Then
(a) µ(E) = inf{µ(U ) : U open and U ⊇ E}.

(b) µ(E) = sup{µ(K) : K compact and K ⊆ E}.


Proof. Assume first that E is bounded. Let ε > 0. By 1.6.11 (taking A = OI, say), there
exists a sequence of bounded, open, d-dimensional intervals Ij with union U ⊇ E such that
µ(U) ≤ Σ_j µ(Ij) < µ(E) + ε, verifying (a).
To verify (b) in the bounded case, let J be a bounded open interval with cl(E) ⊆ J. Choose
a sequence of open intervals Vk with union V ⊇ J \ E such that Σ_{k=1}^∞ µ(Vk) < µ(J \ E) + ε/2.
We may assume that Vk ⊆ J, otherwise replace Vk by Vk ∩ J. By subadditivity
\[
\mu(V) \le \sum_{k=1}^{\infty} \mu(V_k) \le \mu(J \setminus E) + \varepsilon/2 = \mu(J) - \mu(E) + \varepsilon/2.
\]
Set K = J \ V. Since K ⊆ E ⊆ cl(E) ⊆ J, K = cl(E) \ V. Therefore, K is compact and
\[
\mu(K) = \mu(J) - \mu(V) \ge \mu(J) - \big(\mu(J) - \mu(E) + \varepsilon/2\big) = \mu(E) - \varepsilon/2,
\]
verifying (b).

[FIGURE 1.2: Construction of K. Labels: V (gray), K, E, J.]


Now suppose E is unbounded. Choose a sequence of bounded sets En ∈ M(µ) such that
En ↑ E. Let ε > 0. For each n, use the first part of the proof to choose a compact set Kn
and an open set Un with finite measure such that
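\[
K_n \subseteq E_n \subseteq U_n, \quad \mu(U_n) - \mu(E_n) < \varepsilon/2^n \quad\text{and}\quad \mu(E_n) - \mu(K_n) < \varepsilon.
\]
Set U := ⋃_{n=1}^∞ Un. Then U is open, E ⊆ U, and U \ E ⊆ ⋃_n (Un \ En). If µ(E) < ∞, then
\[
\mu(U) - \mu(E) = \mu(U \setminus E) \le \sum_{n} \mu(U_n \setminus E_n) < \varepsilon,
\]
and, since µ(E \ En) → 0, for sufficiently large n,
\[
\mu(E) - \mu(K_n) = \mu(E \setminus E_n) + \mu(E_n \setminus K_n) < 2\varepsilon,
\]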
verifying (a) and (b) in this case. On the other hand, if µ(E) = ∞, then (a) clearly holds
and (b) holds as well because then µ(En ) ↑ ∞ and µ(Kn ) > µ(En ) − ε.

One-Dimensional Distribution Functions


A nondecreasing, right continuous function F : R → R is called a distribution function.
Such functions arise naturally in probability theory (see Chapter 18). The connection between
Lebesgue-Stieltjes measures and distribution functions is described in the following theorem,
the proof of which is given below.
1.8.2 Theorem. For every Lebesgue-Stieltjes measure µ on R, there exists a distribution
function F such that
µ(a, b] = F (b) − F (a) for all a < b. (1.11)
Any two distributions that satisfy (1.11) for the same µ differ by a constant. Conversely,
every distribution function F gives rise to a unique Lebesgue-Stieltjes measure µ on B(R)
satisfying (1.11).
Here are three common examples:
1.8.3 Examples.
(a) The Dirac measure δ0 on B(R) has distribution function F = 1[0,∞) .
(b) Let (cn) and (pn) be sequences in R with pn > 0 and Σ_n pn < ∞. Define
\[
F(x) = \sum_{n : c_n \le x} p_n,
\]
where the sum is taken over all indices n for which cn ≤ x. (If there are no such indices,
the sum is defined to be 0.) Note that because the order of summation is irrelevant, F is
well-defined. The Lebesgue-Stieltjes measure corresponding to F is given by
\[
\mu(B) = \sum_{n : c_n \in B} p_n \quad\text{for all Borel sets } B.
\]
The distribution in (a) is a special case, obtained by taking p1 = 1, pn = 0 for n ≥ 2, and
cn = 0 for all n.
(c) Let f be continuous and nonnegative on R. Define
\[
F(x) = F(0) + \int_0^x f(t)\, dt,
\]
where F(0) is arbitrary. The Lebesgue-Stieltjes measure corresponding to F is dµ = f dt.
(See Chapter 3.) ♦
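
Example (b) is easy to experiment with when there are finitely many atoms. The following sketch is illustrative only: the atoms cn and weights pn are hypothetical, and exact rational arithmetic is used so that the identity µ(a, b] = F(b) − F(a) of (1.11) can be compared without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical atoms c_n (keys) with weights p_n > 0 (values).
atoms = {Fr(0): Fr(1, 2), Fr(1): Fr(1, 3), Fr(5, 2): Fr(1, 6)}

def F(x):
    """Distribution function: sum of p_n over all n with c_n <= x."""
    return sum((p for c, p in atoms.items() if c <= x), Fr(0))

def mu_half_open(a, b):
    """mu(a, b] computed directly as the total weight of atoms in (a, b]."""
    return sum((p for c, p in atoms.items() if a < c <= b), Fr(0))

print(F(-1), F(0), F(1), F(3))              # 0 1/2 5/6 1
print(mu_half_open(0, 1), F(1) - F(0))      # 1/3 1/3
print(mu_half_open(-1, 0), F(0) - F(-1))    # 1/2 1/2
```

The atom at 0 is counted in (−1, 0] but not in (0, 1], which reflects the jump of F at 0 being attained from the right.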

Proof of Theorem 1.8.2. For the first part of the theorem, define F : R → R as follows:
Let F (0) be arbitrary and set
\[
F(x) := \begin{cases} F(0) + \mu(0, x] & \text{if } x > 0, \\ F(0) - \mu(x, 0] & \text{if } x < 0. \end{cases}
\]

By considering cases, we see that for a < b, F (b) − F (a) = µ(a, b]. Therefore, F is
nondecreasing and right continuous. If also G(b) − G(a) = µ(a, b] for all a < b, then
F (x) − F (0) = G(x) − G(0) for all x, hence F = G + F (0) − G(0).
For the converse, let F : R → R be a distribution function. To construct the Lebesgue-
Stieltjes measure defined by F , we apply the results of §1.6 to (HI , µ), where µ is the
set function on HI given by (1.11). Thus the proof of the theorem will be complete if we
show that µ is countably additive on HI . The following lemmas, analogous to those of §1.7,
establish this.

1.8.4 Lemma. Let H, H1, ..., Hm ∈ HI.
(a) If H1, ..., Hm are disjoint and H = ⋃_{j=1}^m Hj, then µ(H) = Σ_{j=1}^m µ(Hj).
(b) If H ⊆ ⋃_{j=1}^m Hj, then µ(H) ≤ Σ_{j=1}^m µ(Hj).
(c) If H1, ..., Hm are disjoint and H ⊇ ⋃_{j=1}^m Hj, then µ(H) ≥ Σ_{j=1}^m µ(Hj).
Proof. Let H = (a, b] and Hj = (aj, bj], where a1 < a2 < ··· < am. In (a) there can be no
“gaps” or “overlaps,” that is, a1 = a, bm = b, and bj = aj+1. Therefore,
\[
\sum_{j=1}^{m} \mu(H_j) = \sum_{j=1}^{m-1} [F(a_{j+1}) - F(a_j)] + F(b) - F(a_m) = F(b) - F(a) = \mu(H).
\]
In (b), we may assume that H = ⋃_{j=1}^m Hj, otherwise we could replace Hj by Hj ∩ H. As
in (a), a1 = a, bm = b, and aj+1 ≤ bj. However, since the intervals are no longer disjoint
it may happen that aj+1 < bj for some j, as illustrated in Figure 1.3. Form intersections
of overlapping intervals, thus partitioning (a, b] into a collection {Ii} of disjoint half-open
intervals, as shown in the figure. Each Hj is a union of some of these intervals so by (a)
\[
\mu(H) = \sum_{i} \mu(I_i) \quad\text{and}\quad \mu(H_j) = \sum_{i : I_i \subseteq H_j} \mu(I_i).
\]
Since an Ii may be contained in more than one Hj
\[
\sum_{i} \mu(I_i) \le \sum_{j} \sum_{i : I_i \subseteq H_j} \mu(I_i).
\]
Therefore, µ(H) ≤ Σ_j µ(Hj), proving (b). The proof of (c) is similar.

[FIGURE 1.3: Construction of partition. The endpoints a1 = a, a2, b1, a3, b2, a4, b3, b4 = b, in this order, determine the intervals I1, ..., I7.]
1.8.5 Lemma. The set function µ is countably additive on HI .
Proof. By 1.8.4(a), µ is finitely additive. Let Hj = (aj, bj] be disjoint members of HI
and let H = (a, b] = ⋃_{j=1}^∞ Hj. By 1.8.4(c), µ(H) ≥ Σ_{j=1}^m µ(Hj) for all m, hence µ(H) ≥
Σ_{j=1}^∞ µ(Hj). For the reverse inequality, let ε > 0 and by right continuity at a choose r ∈ (a, b)
so that F(r) ≤ F(a) + ε/2. Then
\[
\mu(r, b] = F(b) - F(r) \ge F(b) - F(a) - \varepsilon/2 = \mu(H) - \varepsilon/2. \tag{$\dagger$}
\]
Similarly, for each j choose rj > bj such that F(rj) ≤ F(bj) + ε/2^j, so
\[
\mu(a_j, r_j] = F(r_j) - F(a_j) \le F(b_j) - F(a_j) + \varepsilon/2^j = \mu(H_j) + \varepsilon/2^j. \tag{$\ddagger$}
\]
The open intervals (aj, rj) cover [r, b], hence by compactness there exists an m ∈ N such
that (r, b] ⊆ ⋃_{j=1}^m (aj, rj]. By (†), (‡), and 1.8.4(b),
\[
\mu(H) \le \varepsilon/2 + \mu(r, b] \le \varepsilon/2 + \sum_{j=1}^{m} \mu(a_j, r_j] \le 3\varepsilon/2 + \sum_{j=1}^{\infty} \mu(H_j).
\]
Letting ε → 0 yields the desired inequality.



Higher Dimensional Distribution Functions
As in the one-dimensional case, there is a close connection between Lebesgue-Stieltjes
measures on B(Rd ) and certain real-valued functions on Rd . The technical details are more
intricate, however, and depend on the following construct:
The ith coordinate difference operator on functions F : Rᵈ → R is defined by
\[
\triangle_{a_i}^{b_i} F(x_1, \ldots, x_d) = F(x_1, \ldots, x_{i-1}, b_i, x_{i+1}, \ldots, x_d) - F(x_1, \ldots, x_{i-1}, a_i, x_{i+1}, \ldots, x_d).
\]

For example, consider the function F(x1, x2, ..., xd) = x1 x2 ··· xd. For 1 ≤ i ≤ d and ai < bi,
the difference operators may be applied successively to obtain the following:
\begin{align*}
\triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, x_2, \ldots, x_d)
&= \triangle_{a_1}^{b_1} \cdots \triangle_{a_{d-1}}^{b_{d-1}} (x_1 \cdots x_{d-1})(b_d - a_d) \\
&= \triangle_{a_1}^{b_1} \cdots \triangle_{a_{d-2}}^{b_{d-2}} (x_1 \cdots x_{d-2})(b_{d-1} - a_{d-1})(b_d - a_d) \\
&\;\;\vdots \\
&= (b_1 - a_1) \cdots (b_d - a_d).
\end{align*}
Thus △_{a1}^{b1} ··· △_{ad}^{bd} F(x1, x2, ..., xd) is the Lebesgue measure of the d-dimensional interval
(a1 , b1 ] × · · · × (ad , bd ]. This sort of connection holds more generally and is described in the
theorem below. For the statement of the theorem we need the following definitions:
A function F : Rᵈ → R is a distribution function if it is nondecreasing in the sense
that
\[
\triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, \ldots, x_d) \ge 0, \quad a_i < b_i,\ i = 1, \ldots, d,
\]
and right continuous in the sense that
\[
x_{n,i} \downarrow_n x_i,\ i = 1, \ldots, d \ \Rightarrow\ F(x_{n,1}, \ldots, x_{n,d}) \to F(x_1, \ldots, x_d).
\]

Here are some standard distribution functions:


1.8.6 Examples.
(a) Let Fi be a distribution function on R, i = 1, ..., d. The function
\[
F(x_1, x_2, \ldots, x_d) := F_1(x_1) F_2(x_2) \cdots F_d(x_d)
\]
is a distribution function on Rᵈ such that
\[
\triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, \ldots, x_d) = \prod_{i=1}^{d} [F_i(b_i) - F_i(a_i)].
\]
The function F(x1, x2, ..., xd) = x1 x2 ··· xd discussed above is a special case.
(b) Let f be a nonnegative, continuous function on Rᵈ. Then
\[
F(x_1, \ldots, x_d) := \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(t_1, \ldots, t_d)\, dt_d \cdots dt_1
\]
is a distribution function on Rᵈ (provided the improper integral is finite) such that
\[
\triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, \ldots, x_d) = \int_{a_1}^{b_1} \cdots \int_{a_d}^{b_d} f(t_1, \ldots, t_d)\, dt_d \cdots dt_1 .
\]

(c) If µ is a finite measure on B(Rd ), then

F (x1 , . . . , xd ) = µ((−∞, x1 ] × · · · × (−∞, xd ])

defines a distribution function on Rd . ♦
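
Applying the d coordinate difference operators in succession amounts to an alternating sum of F over the 2^d vertices of (a1, b1] × ··· × (ad, bd], where a vertex carries the sign (−1)^k if k of its coordinates are taken from a. The following sketch computes that sum; the function name `box_difference` and the sample functions are hypothetical, chosen only to check the product formula of Example (a).

```python
from itertools import product

def box_difference(F, a, b):
    """Iterated coordinate differences of F over the box (a, b].

    Computed as the signed sum of F over the 2^d vertices: each vertex
    takes b_i or a_i in coordinate i, with sign (-1)^(number of a_i used).
    """
    d = len(a)
    total = 0
    for choice in product((0, 1), repeat=d):
        vertex = [b[i] if choice[i] else a[i] for i in range(d)]
        sign = (-1) ** (d - sum(choice))
        total += sign * F(*vertex)
    return total

# F(x, y, z) = xyz: the result is the volume (b1-a1)(b2-a2)(b3-a3).
print(box_difference(lambda x, y, z: x * y * z, (1, 2, 3), (4, 6, 5)))  # 24

# A product F1(x) F2(y) with F1(x) = x^3 and F2(y) = 2y (both nondecreasing).
print(box_difference(lambda x, y: x**3 * 2 * y, (0, 1), (2, 3)))        # 32
```

The second value equals [F1(2) − F1(0)][F2(3) − F2(1)] = 8 · 4, as the product formula predicts.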


The following theorem may be proved using a combination of ideas developed earlier in
the construction of Lebesgue measure and Lebesgue-Stieltjes measures. For a proof, the
reader is referred to [1] or [5].
1.8.7 Theorem. Let µ be a Lebesgue-Stieltjes measure on B(Rᵈ). Then there exists a
function F : Rᵈ → R such that for all ai < bi
\[
\mu\big((a_1, b_1] \times \cdots \times (a_d, b_d]\big) = \triangle_{a_1}^{b_1} \cdots \triangle_{a_d}^{b_d} F(x_1, \ldots, x_d). \tag{1.12}
\]
Conversely, given a distribution function F : Rᵈ → R, there exists a unique Lebesgue-Stieltjes
measure on B(Rᵈ) such that (1.12) holds for all ai < bi (i = 1, ..., d).

Exercises
1.79 Describe the Lebesgue-Stieltjes measure for each of the following distribution functions.
(a) F(x) = ⌊x⌋, the greatest integer function.
(b) F(x) = x 1[0,1)(x) + 1[1,∞)(x).

1.80 Show that the sum of finitely many distribution functions and the product of finitely many
nonnegative distribution functions are distribution functions.

1.81 Verify that the function in 1.8.3(b) is a distribution function. Prove also that F is left continuous
at a iff a ≠ cn for every n.

1.82 For any monotone function F : R → R and −∞ ≤ a < b ≤ ∞, define
\[
F(a+) := \lim_{x \to a^+} F(x) \quad\text{and}\quad F(b-) := \lim_{x \to b^-} F(x)
\]
and set
\[
F(-\infty) := F((-\infty)+) \quad\text{and}\quad F(\infty) := F(\infty-).
\]
Let F be a distribution function and µ the associated Lebesgue-Stieltjes measure. Prove the
following, when defined:
(a) µ(a, b) = F (b−) − F (a).
(b) µ[a, b) = F (b−) − F (a−).
(c) µ[a, b] = F (b) − F (a−).
Prove also that µ{x} = 0 iff F is continuous at x.

1.83 Let µ be a finite Lebesgue-Stieltjes measure on B(R) such that µ({x}) = 0 for all x. Show that
any distribution function F corresponding to µ is uniformly continuous on R.

1.84 Show that a monotone function f : R → R has countably many discontinuities. Conclude that
if µ is a Lebesgue-Stieltjes measure, then there exist at most countably many x ∈ R such that
µ({x}) > 0. ⟦For each t ∈ R, define at = lim_{x→t−} f(x) and bt = lim_{x→t+} f(x). Then at < bt iff
f is discontinuous at t.⟧

1.85 Let µ be a Lebesgue-Stieltjes measure on R with a continuous distribution function and let
A ∈ B(R) with µ(A) > 0. Prove that for each b ∈ (0, µ(A)) there exists a Borel set B ⊆ A such
that µ(B) = b. ⟦Use the intermediate value theorem on G(x) = µ(A ∩ [−n, x]) for suitable n.⟧

*1.9 Some Special Sets


In this section we construct subsets of R that illustrate some of the finer points of Lebesgue
and Borel measurability.

An Uncountable Set with Lebesgue Measure Zero


The Cantor ternary set C is constructed as follows: Remove from I := [0, 1] = I_{0,1} the
“middle third” open interval (1/3, 2/3), leaving closed intervals I_{1,1} and I_{1,2} with union C_1
and total length 2/3. Next, remove from each of the intervals I_{1,1} and I_{1,2} the middle third
open interval, leaving closed intervals I_{2,1}, I_{2,2}, I_{2,3}, and I_{2,4} with union C_2 and total length
4/9 = (2/3)^2. By induction, one obtains a decreasing sequence of closed sets C_k = ⋃_{j=1}^{2^k} I_{k,j}
such that λ(C_k) = (2/3)^k. (See Figure 1.4.) Then C := ⋂_k C_k is closed and λ(C) = 0.

I0,1
0 1
I1,1 I1,2
.0... .2...
I2,1 I2,2 I2,3 I2,4
.00... .02... .20... .22...
I3,1 I3,2 I3,3 I3,4 I3,5 I3,6 I3,7 I3,8
.000... .002... .020... .022... .200... .202... .220... .222...

FIGURE 1.4: Middle thirds construction.
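
The middle thirds construction can be checked numerically. The following is a minimal Python sketch, assuming exact rational arithmetic from the standard fractions module; the names cantor_stage and total_length are hypothetical helpers introduced only for this illustration. It lists the 2^k closed intervals of C_k and confirms that their total length is (2/3)^k.

```python
from fractions import Fraction

def cantor_stage(k):
    """Return the 2**k closed intervals of C_k as (left, right) pairs, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            # Keep the two outer closed thirds; the open middle third is removed.
            next_intervals.append((left, left + third))
            next_intervals.append((right - third, right))
        intervals = next_intervals
    return intervals

def total_length(intervals):
    """Sum of the lengths of the given disjoint intervals."""
    return sum(right - left for left, right in intervals)

for k in range(5):
    stage = cantor_stage(k)
    print(k, len(stage), total_length(stage), total_length(stage) == Fraction(2, 3) ** k)
```

Since the lengths (2/3)^k tend to 0, the sketch is consistent with λ(C) = 0, although of course it only inspects finitely many stages.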

To show that C is uncountable, consider the ternary representation of a number x ∈ [0, 1]:

\[ x = .d_1 d_2 \ldots = \sum_{k=1}^{\infty} d_k 3^{-k}, \quad\text{where } d_k \in \{0, 1, 2\}. \tag{1.13} \]

By induction, using the fact that x ∈ I_{k−1,j} ⇒ x ∈ I_{k,2j−1+d_k/2}, one shows that x ∈ C iff x has
an expansion with even digits (see Figure 1.4). Define ϕ : C → [0, 1] by

\[ \varphi\bigl(.d_1 d_2 \ldots \text{ (ternary)}\bigr) = .e_1 e_2 \ldots \text{ (binary)}, \quad\text{where } d_k \in \{0, 2\} \text{ and } e_k = d_k/2. \]

The function ϕ is not one-to-one, but by removing from C the countable set of all numbers
with ternary representations ending in a sequence of zeros we obtain a set D on which ϕ is
one-to-one. Since ϕ(D) = (0, 1), C is uncountable.
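
The failure of injectivity is easy to see on a concrete pair. The sketch below is an illustration only, assuming expansions of the special form "finite prefix followed by one repeated digit"; the helpers value and phi are hypothetical names, not notation from the text. It evaluates such expansions exactly and shows that the distinct points 1/3 = .0222... and 2/3 = .2000... of C have the same image 1/2.

```python
from fractions import Fraction

def value(prefix, tail, base):
    """Exact value of the expansion .d1 d2 ... dn t t t ... in the given base."""
    n = len(prefix)
    head = sum(Fraction(d, base ** (k + 1)) for k, d in enumerate(prefix))
    # The tail t/base^(n+1) + t/base^(n+2) + ... is a geometric series.
    return head + Fraction(tail, (base - 1) * base ** n)

def phi(prefix, tail):
    """Halve every ternary digit (all assumed to be 0 or 2) and read the result in base 2."""
    return value([d // 2 for d in prefix], tail // 2, 2)

x = value([0], 2, 3)   # .0222... (ternary)
y = value([2], 0, 3)   # .2000... (ternary)
print(x, y, phi([0], 2), phi([2], 0))
```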

Non-Lebesgue-Measurable Sets
We show the following:
Every Lebesgue measurable set A with λ(A) > 0
contains a set that is not Lebesgue measurable.
Since A = ⋃_{n∈Z} (A ∩ [n, n + 1]), we may suppose that A is bounded. Define an equivalence
relation on A by x ∼ y iff x − y ∈ Q. Let B be the subset of A obtained by choosing exactly
one point from each distinct equivalence class. (The existence of B requires the axiom of
choice.) Now observe that the sets r + B, r ∈ Q, are disjoint. Indeed, if (r + B) ∩ (s + B) ≠ ∅,
then r + x = s + y for some x, y ∈ B, so x = y and r = s. Moreover, since A is bounded

so is B + [0, 1]. Let (rn ) be an enumeration of the rationals in [0, 1] and assume that B is
measurable. Then
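\[ \infty > \lambda\Bigl(\bigcup_n (B + r_n)\Bigr) = \sum_n \lambda(B + r_n) = \sum_n \lambda(B), \]

which implies that λ(B) = 0. But A ⊆ B + Q, hence

\[ \lambda(A) \le \lambda\Bigl(\bigcup_{r \in \mathbb{Q}} (B + r)\Bigr) = \sum_{r \in \mathbb{Q}} \lambda(B + r) = 0, \]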

contradicting that λ(A) > 0. Therefore, B cannot be Lebesgue measurable.

A Lebesgue Measurable, Non-Borel Set


For this example, we first construct the Cantor function f : I → I, where I = [0, 1].
The construction is based on the Cantor set C described earlier in the section. For each n,
denote by J_{n,k}, k = 1, . . . , 2^n − 1, the open intervals in increasing order that were removed
from I in the construction of C, that is, the intervals that form the complement of C_n in
[0, 1]. For example, J_{2,1} = (1/9, 2/9), J_{2,2} = (1/3, 2/3), and J_{2,3} = (7/9, 8/9), hence

[0, 1] = I_{2,1} ∪ J_{2,1} ∪ I_{2,2} ∪ J_{2,2} ∪ I_{2,3} ∪ J_{2,3} ∪ I_{2,4}.

Define a continuous function f_n : I → I so that

f_n(0) = 0, f_n(1) = 1, f_n = k/2^n on J_{n,k},

and f_n is linear on the complementary intervals I_{n,j}. Since |f_n(x) − f_{n+1}(x)| ≤ 1/2^{n+1}, the
sequence {f_n} is uniformly Cauchy and so converges to a continuous function f, the Cantor
function.

FIGURE 1.5: The functions f_2 and f_3.
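
The approximating functions can be evaluated exactly. The sketch below assumes the familiar self-similar recursion for the piecewise linear approximants, started from f_0(x) = x; this starting function and the name cantor_approx are assumptions of the illustration, not part of the text. For n ≥ 1 the recursion should reproduce the functions f_n described above (value k/2^n on J_{n,k}, linear on each I_{n,j}), and the loop checks the estimate |f_n − f_{n+1}| ≤ 1/2^{n+1} on a finite grid.

```python
from fractions import Fraction

def cantor_approx(n, x):
    """Evaluate f_n at x in [0, 1] via the self-similar recursion (f_0 is the identity)."""
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_approx(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)          # constant on the removed middle third
    return Fraction(1, 2) + cantor_approx(n - 1, 3 * x - 2) / 2

grid = [Fraction(i, 729) for i in range(730)]
for n in range(5):
    gap = max(abs(cantor_approx(n, x) - cantor_approx(n + 1, x)) for x in grid)
    print(n, gap, gap <= Fraction(1, 2 ** (n + 1)))
```

A grid check is only evidence, not a proof; the uniform estimate itself follows from the construction.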
To construct the desired non-Borel set, note first that since fn (0) = 0, fn (1) = 1, and fn is
nondecreasing on [0, 1], f also has these properties. Thus, by the intermediate value theorem,
f (I) = I. Since the values of f on the intervals Jn,k are already assumed at the endpoints
and since these endpoints lie in C, f (Jn,k ) contributes nothing additional to the range of f ,

hence f(C) = I. Now set g(x) = (f(x) + x)/2, x ∈ I. Then g is continuous, strictly increasing,
g(0) = 0, and g(1) = 1, hence g(I) = I. It follows that g : I → I is a homeomorphism, hence
g(C) is closed. Thus g(I \ C) is a proper nonempty open subset of I and so has positive
Lebesgue measure. Moreover, g takes the interval J_{n,k}, on which f is constant, to an open
interval half its length, so by countable additivity λ(g(I \ C)) = λ(I \ C)/2 = 1/2 and
therefore λ(g(C)) = 1/2. Now let E be a subset of g(C) that is not Lebesgue measurable and
let A := g^{−1}(E). Then A ⊆ C and so is Lebesgue measurable with λ(A) = 0. However, A
cannot be a Borel set since g maps Borel sets onto Borel sets. (This is proved in Chapter 2.)
1.9.1 Remark. While the intricate nature of the construction of A might lead one to
believe that such sets are rare, there are in fact many more Lebesgue measurable sets than
Borel sets. Indeed, since the Cantor set C is uncountable and every subset of C is Lebesgue
measurable, the collection of Lebesgue measurable sets has cardinality 2^c, where c is the
cardinality of the continuum. On the other hand, it may be shown that B(R) has only
cardinality c. (See, for example, [38].) ♦

Exercises
1.86 Show that (R, B(R), λ) is not complete.

1.87 Carry out the steps below to prove the following assertion: If A ⊆ R has positive Lebesgue measure
then the set A − A := {x − y : x, y ∈ A} contains an interval (−r, r) for some r > 0.
(a) Show that it suffices to consider the case A compact.
(b) Choose an open set U ⊇ A such that λ(U) < 2λ(A) (how?). Define a distance function
d : U → R by d(x) = inf{|x − y| : y ∈ U^c}. Show that d is continuous and positive. Conclude
that d has a minimum r > 0 on A.
(c) Show that |x| < r ⇒ x + A ⊆ U ⇒ (x + A) ∩ A ≠ ∅. Conclude that (−r, r) ⊆ A − A.

1.88 [↑ 1.87] Show that the only subgroup of (R, +) that has positive Lebesgue measure is R.

1.89 Let (a_n) be a sequence in (0, 1) and set b_n := 1 − a_n. Mimic the construction of the Cantor
ternary set by removing the middle part of [0, 1] of length a_1, leaving two intervals with union
E_1, each of length b_1/2, then removing the middle part of length a_2 b_1/2 from these leaving four
intervals with union E_2, each of length b_1 b_2/4, and so forth. The intersection E := ⋂_n E_n is
called a generalized Cantor set.

FIGURE 1.6: Generalized middle thirds construction.

Verify the following:

(a) E is closed and λ(E) = ∏_{n=1}^{∞} b_n := lim_n ∏_{j=1}^{n} b_j.

(b) The interior of E is empty.


(c) If r > 0 and eventually a_n ≥ r (as in the Cantor ternary set), then λ(E) = 0.
(d) For each a ∈ (0, 1), there exists a generalized Cantor set with Lebesgue measure a.
⟦Consider ln ∏_{n=1}^{∞} b_n = Σ_{n=1}^{∞} ln b_n.⟧

1.90 Let A be the set of all x ∈ [0, 1] having a decimal expansion .d1 d2 . . . with no digit equal to 3.
Show that A is uncountable, A ∈ B(R), and λ(A) = 0.
Chapter 2
Measurable Functions

In this chapter we consider functions that are measurable with respect to a given σ-field
F, that is, functions f for which (in the real-valued case) the sets {x ∈ X : f (x) ∈ (a, b)}
are F-measurable. As we shall see, such functions are natural candidates for integration
with respect to Lebesgue measure. We begin with the more general notion of measurable
transformation.

2.1 Measurable Transformations


Let (X, F) and (Y, G) be measurable spaces and let T : X → Y. By standard properties of preimages,
T^{−1}(G) := {T^{−1}(E) : E ∈ G} is a σ-field. If the inclusion T^{−1}(G) ⊆ F holds, then T is said
to be F/G-measurable, or simply measurable if the σ-fields F and G are understood. In
this case, T is also called a measurable transformation or measurable mapping. To
indicate such measurability we also say that T : (X, F) → (Y, G) is measurable.
Constant functions are measurable transformations, since the only preimages are ∅ and
X. Also, every function T : X → Y is both P(X)/G-measurable and F/{∅, X}-measurable.

General Properties
2.1.1 Proposition. If T : (X, F) → (Y, G) and S : (Y, G) → (Z, H) are measurable, then
S ◦ T : (X, F) → (Z, H) is measurable.

Proof. This follows from (S ◦ T)^{−1}(A) = T^{−1}(S^{−1}(A)), A ∈ H.
The following result characterizes measurability in terms of the generators of a σ-field. It
will play an important role in what follows.
 
2.1.2 Theorem. Let 𝒜 ⊆ P(Y) and T : X → Y. Then σ(T^{−1}(𝒜)) = T^{−1}(σ(𝒜)). In
particular, T : (X, F) → (Y, σ(𝒜)) is measurable iff T^{−1}(A) ∈ F for all A ∈ 𝒜.

Proof. Since T^{−1}(σ(𝒜)) is a σ-field and T^{−1}(𝒜) ⊆ T^{−1}(σ(𝒜)), it follows by minimality
that σ(T^{−1}(𝒜)) ⊆ T^{−1}(σ(𝒜)). For the reverse inclusion, observe that the set

{A ∈ σ(𝒜) : T^{−1}(A) ∈ σ(T^{−1}(𝒜))}

is a σ-field containing 𝒜 and hence must equal σ(𝒜).
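
On finite sets the identity of Theorem 2.1.2 can be verified by brute force. The following Python sketch is illustrative only: the sets X, Y, the map T, and the generators are hypothetical choices, and sigma and preimage are helper names introduced here. On a finite set a σ-field is simply a family closed under complements and pairwise unions, which is what sigma computes.

```python
from itertools import combinations

def sigma(universe, generators):
    """Smallest family containing the generators that is closed under complement and union."""
    universe = frozenset(universe)
    field = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:
            if universe - a not in field:
                field.add(universe - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field

def preimage(T, domain, B):
    """The set of points x in the domain with T[x] in B."""
    return frozenset(x for x in domain if T[x] in B)

X = range(4)
Y = "abc"
T = {0: "a", 1: "a", 2: "b", 3: "b"}   # hypothetical map, deliberately not onto Y
gens = [{"a"}, {"a", "b"}]              # hypothetical generating class on Y

left = sigma(X, [preimage(T, X, A) for A in gens])    # sigma(T^{-1}(generators))
right = {preimage(T, X, B) for B in sigma(Y, gens)}   # T^{-1}(sigma(generators))
print(left == right, len(left))
```

Here σ of the generators has eight members on Y, but only four distinct preimages, and these coincide with the σ-field generated by the preimages of the two generators.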


2.1.3 Corollary. Let X and Y be topological spaces. If T : X → Y is continuous, then T
is B(X)/B(Y )-measurable.
Proof. Let OX and OY denote the collections of open subsets of X and Y , respectively. Then
T −1 (OY ) ⊆ OX ⊆ B(X).


For example, a linear transformation T : R^p → R^q, being automatically continuous, is
Borel measurable.
The inclusion T −1 (B(Y )) ⊆ B(X) in the proof of 2.1.3 may be strict. For example, let
X be any nontrivial set with the discrete topology, let Y = X have the indiscrete topology,
and take T to be the identity map.
2.1.4 Corollary. Let X be a set and {(X_i, F_i) : i ∈ I} a family of measurable spaces. Given
mappings T_i : X → X_i, let

\[ \mathcal{E} := \bigcup_{i \in I} T_i^{-1}(\mathcal{F}_i) \quad\text{and}\quad \mathcal{F} := \sigma(\mathcal{E}). \]

If (X_0, F_0) is a measurable space, then a mapping T : X_0 → X is F_0/F-measurable iff the
mapping T_i ◦ T is F_0/F_i-measurable for every i ∈ I.

(X_0, F_0) —T→ (X, F) —T_i→ (X_i, F_i), i = 1, 2, 3.

FIGURE 2.1: The mappings of 2.1.4.

Proof. Proposition 2.1.1 gives the necessity. For the sufficiency, if T_i ◦ T is F_0/F_i-measurable
for every i ∈ I, then

\[ T^{-1}(\mathcal{E}) = T^{-1}\Bigl(\bigcup_{i \in I} T_i^{-1}(\mathcal{F}_i)\Bigr) = \bigcup_{i \in I} (T_i \circ T)^{-1}(\mathcal{F}_i) \subseteq \mathcal{F}_0, \]

hence σ(T^{−1}(E)) ⊆ F_0. But by Theorem 2.1.2, σ(T^{−1}(E)) = T^{−1}(F).

One of the most important applications of 2.1.4 is the following:


2.1.5 Corollary. Let (Xi , Fi ) be measurable spaces (i = 1, . . . , d) and let (X, F) denote the
product measurable space (X1 × · · · × Xd , F1 ⊗ · · · ⊗ Fd ). Then for each i the projection map

πi : X → Xi , πi (x1 , . . . , xd ) = xi ,

is F/F_i-measurable. Moreover, if (X_0, F_0) is a measurable space, then a mapping T : X_0 → X
is F_0/F-measurable iff π_i ◦ T is F_0/F_i-measurable for every i.
Proof. If Ai ∈ Fi , then

πi−1 (Ai ) = X1 × · · · × Xi−1 × Ai × Xi+1 × · · · × Xd ∈ F,

hence πi is F/Fi -measurable. The set E in 2.1.4 corresponding to the maps πi is the
collection of all such sets, and taking intersections produces F1 × · · · × Fd . Therefore,
σ(E) = F1 ⊗ · · · ⊗ Fd , and the conclusion of the theorem follows from 2.1.4.
2.1.6 Corollary. Let (Xi , Fi ) be measurable spaces (i = 0, 1, . . . , d) and Ti : X0 → Xi
arbitrary mappings (i = 1, . . . , d). Define

T = (T_1, . . . , T_d) : X_0 → X_1 × · · · × X_d, T(x) = (T_1(x), . . . , T_d(x)).

Then T is F0 /(F1 ⊗ . . . ⊗ Fd )-measurable iff each Ti : (X0 , F0 ) → (Xi , Fi ) is measurable.



Proof. The mappings πi ◦ T of 2.1.5 are simply the given mappings Ti .

Here is a complement to 2.1.4. The proof is left as an exercise.


2.1.7 Proposition. Let X be any set and {(X_i, F_i) : i ∈ I} a family of measurable spaces.
Given mappings T_i : X_i → X, set

\[ \mathcal{F} := \bigcap_{i \in I} T_i^{-1}(\mathcal{F}_i). \]

Let (X_0, F_0) be a measurable space and T : X → X_0. Then T is F/F_0-measurable iff T ◦ T_i
is F_i/F_0-measurable for every i ∈ I.

(X_i, F_i) —T_i→ (X, F) —T→ (X_0, F_0), i = 1, 2, 3.

FIGURE 2.2: The mappings of 2.1.7.

Exercises
2.1 Show that for a measurable transformation T : (X, F) → (Y, G) it is not necessarily the case
that T (F) ⊆ G.

2.2 Let (X, F) and (Y, G) be measurable spaces and let T : (X, F) → (Y, G) be measurable. If
E ⊆ X, show that T|_E is (F ∩ E)/G-measurable.

2.3 Let (X, F) and (Y, G) be measurable spaces, E_n ∈ F, and X = ⋃_{n=1}^{∞} E_n. Let T : X → Y have
the property that T|_{E_n} is (F ∩ E_n)/G-measurable for every n. Prove that T is F/G-measurable.

2.4 Let (X, F), (Y, G), and (Z, H) be measurable spaces and let T : X → Y have countable range.
Assume that G contains the singletons (e.g., a Borel σ-field). Show that
(a) T is F/G-measurable iff T −1 ({y}) ∈ F for every y ∈ Y .
(b) If T is F/G-measurable, then S ◦ T is F/H-measurable for any mapping S : Y → Z.

2.5 Let (X, F) and (Y, G) be measurable spaces. Show that if A ⊆ X and B ⊆ Y are nonempty
and A × B ∈ F ⊗ G, then A ∈ F and B ∈ G.

2.6 Prove 2.1.7.

2.7 Let {(Xi , Fi ) : i ∈ I} be a family of measurable spaces with union X and let F be the σ-field
of all sets E ⊆ X such that E ∩ Xi ∈ Fi for all i ∈ I. Let (X0 , F0 ) be a measurable space and
T : X → X_0. Show that T is F/F_0-measurable iff T|_{X_i} is F_i/F_0-measurable for every i ∈ I.

2.8 Let T, S : (X, F) → (R^d, B(R^d)) be measurable. Prove that {x ∈ X : T(x) = S(x)} ∈ F.

2.9 Let (i_1, . . . , i_d) be a permutation of (1, . . . , d) and define T : R^d → R^d by T(x_1, . . . , x_d) =
(x_{i_1}, . . . , x_{i_d}). Prove that T is B(R^d)/B(R^d)-measurable.

2.10 Let (X, F), (Z, H) be measurable spaces, T : X → Y surjective, and G = {A ⊆ Y : T −1 (A) ∈ F}.
Let R : (X, F) → (Z, H) be measurable such that T(x) = T(x′) ⇒ R(x) = R(x′). Show that
there exists a measurable transformation S : (Y, G) → (Z, H) such that R = S ◦ T .

2.11 Let (Y, G), (Z, H) be measurable spaces, T : X → Y, and set F := T^{−1}(G), so that the map
T : (X, F) → (Y, G) is measurable. Let R : (X, F) → (Z, H) be measurable with countable range.
Show that if H contains the singletons, then there exists a measurable S : (Y, G) → (Z, H) such
that R = S ◦ T .

2.12 Prove that if S, T : R^p → R^q are continuous and S = T λ-a.e., then S = T. What if only one of
the mappings is continuous?

2.13 [↓ 3.5.2] Let (X, F), (Y, G), and (Z, H) be measurable spaces and T : X × Y → Z an arbitrary
mapping. We say that T is separately measurable if Ty := T (·, y) is F/H-measurable for
each y ∈ Y and Tx := T (x, ·) is G/H-measurable for each x ∈ X. To distinguish from separate
measurability, we sometimes refer to F ⊗ G/H-measurability of T as joint measurability.
Show that if T is jointly measurable, then it is separately measurable.

2.2 Measurable Numerical Functions


In this section we consider functions f : X → K, which we shall sometimes refer to as
numerical or numerically valued. A numerical function f on a measurable space (X, F) is
said to be F-measurable, or simply measurable, if f is F/B(K)-measurable. Thus the
σ-fields B(R), B(R̄), or B(C) are always understood.
Since a function f : X → C may be identified with the mapping (Re f, Im f ) : X → R2 , it
follows from 2.1.6 that f is measurable iff Re f and Im f are measurable. This fact frequently
allows one to reduce arguments from the complex case to the real case.
The following notation for preimages will be convenient in further discussions involving
measurability:
• {f ∈ A} := {x ∈ X : f (x) ∈ A} = f −1 (A), where f : X → Y , A ⊆ Y .
• {f > a} := {x ∈ X : f (x) > a}, where f : X → R, a ∈ R.

• {f ≤ g} := {x ∈ X : f (x) ≤ g(x)}, where f, g : X → R,


and so forth. Additionally, if µ is a set function we write µ(f ∈ A) rather than the more
cumbersome µ({f ∈ A}), etc. These notational conventions are frequently used in probability
theory, but they will be seen to have general utility.

Criteria for Measurability

If A is a generating class for B(K), then measurability of f : X → K is equivalent to the
assertion that f^{−1}(A) ⊆ F (2.1.2). For K = R̄, it follows from 1.2.3 that f is F-measurable
iff
• {f = ∞}, {f = −∞} ∈ F
and any one of the following conditions holds:
• {f ∈ E} ∈ F for all open (resp. closed, resp. Borel) sets E ⊆ R.
• {f ≤ t} ∈ F for all t ∈ R.
• {f < t} ∈ F for all t ∈ R.
• {f > t} ∈ F for all t ∈ R.
• {f ≥ t} ∈ F for all t ∈ R.

It follows easily that if the range of f is countable, say ran f = (an ), then f is measurable
iff {f = an } ∈ F for all n.
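
For a function on a finite set the level-set criterion becomes a finite check. The sketch below is an illustration under that assumption; the σ-field, the two test functions, and the name is_measurable are hypothetical. It groups the points of X by the value of f and tests whether every level set {f = a} belongs to the given σ-field.

```python
def is_measurable(f, X, field):
    """f has finite range here, so it suffices to test the level sets {f = a}."""
    levels = {}
    for x in X:
        levels.setdefault(f(x), set()).add(x)
    return all(frozenset(s) in field for s in levels.values())

X = [1, 2, 3, 4]
# Hypothetical sigma-field on X with atoms {1, 2} and {3, 4}.
field = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(X)}

print(is_measurable(lambda x: 0 if x <= 2 else 7, X, field))   # constant on each atom
print(is_measurable(lambda x: x, X, field))                    # separates the points of an atom
```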

2.2.1 Example. Let dn (x) denote the nth digit of the decimal expansion of x ∈ [0, 1),
where for definiteness we exclude expansions that end in a sequence of 9’s, choosing for
example .500 · · · over .499 · · · . Let en ∈ {0, 1, . . . , 9}. Then

{x : d1 (x) = e1 } = [e1 /10, (e1 + 1)/10),

hence d1 is Borel measurable. Similarly,


\[ \{x : d_2(x) = e_2\} = \bigcup_{e_1=0}^{9} \{x : d_1(x) = e_1,\ d_2(x) = e_2\} = \bigcup_{e_1=0}^{9} \Bigl[\frac{e_1}{10} + \frac{e_2}{10^2},\ \frac{e_1}{10} + \frac{e_2 + 1}{10^2}\Bigr), \]

hence d2 is Borel measurable. By induction, dn is Borel measurable for all n. ♦
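
The description of {d_2 = e_2} as a union of ten half-open intervals can be spot-checked on rational sample points. This is a minimal sketch assuming exact arithmetic with the fractions module; digit and second_digit_piece are hypothetical helper names, and the choice e_2 = 7 is arbitrary. For a rational x the formula ⌊10^n x⌋ mod 10 yields the digit of the expansion that does not end in 9's.

```python
from fractions import Fraction
import math

def digit(x, n):
    """n-th decimal digit of x in [0, 1), using the expansion that does not end in 9's."""
    return math.floor(x * 10 ** n) % 10

def second_digit_piece(e1, e2):
    """The interval [e1/10 + e2/100, e1/10 + (e2 + 1)/100) as a (left, right) pair."""
    left = Fraction(e1, 10) + Fraction(e2, 100)
    return left, left + Fraction(1, 100)

e2 = 7
pieces = [second_digit_piece(e1, e2) for e1 in range(10)]
samples = [Fraction(i, 1000) for i in range(1000)]
inside = [x for x in samples if any(a <= x < b for a, b in pieces)]
print(inside == [x for x in samples if digit(x, 2) == e2], len(inside))
```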

Almost Everywhere Properties

2.2.2 Proposition. Let f, g : X → K. If g is F-measurable and f = g a.e., then f is
measurable with respect to the completion F_µ.
Proof. By considering real and imaginary parts, we may assume that f and g are R̄-valued.
Let N = {x : f(x) ≠ g(x)} and t ∈ R. Since N, N^c ∈ F_µ,

\[ \{f < t\} = \bigl[\{g < t\} \cap N^c\bigr] \cup \bigl[\{f < t\} \cap N\bigr] \in \mathcal{F}_\mu. \]

Similarly, the sets {f = ∞} and {f = −∞} are members of F_µ.


2.2.3 Corollary. Let f, g : R^d → C. If g is continuous and f = g a.e., then f is Lebesgue
measurable.
The function f in 2.2.3 need not be Borel measurable. For example, let A ∈ M(R) \ B(R)
with λ(A) = 0 (§ 1.7) and take f = 1A , g ≡ 0.
2.2.4 Proposition. If f : Rd → K is continuous except on a set E of Lebesgue measure
zero, then f is Lebesgue measurable.
Proof. Let U ⊆ K be open. Then f^{−1}(U) = A ∪ B, where A := f^{−1}(U) ∩ E and B :=
f^{−1}(U) ∩ E^c. Since A ⊆ E and λ(E) = 0, A ∈ M(λ^d). Since f is continuous at each
point of E^c, B = V ∩ E^c for some open subset V of R^d. Therefore, B ∈ M(λ^d) and so
f^{−1}(U) ∈ M(λ^d).
By the preceding proposition, a function with at most countably many discontinuities, in
particular a monotone function, is Lebesgue measurable. In fact, the proof of the proposition
shows that such a function is Borel measurable.
Note that a function that is continuous except on a set of measure zero is not necessarily
equal a.e. to a continuous function (Ex. 2.19). Conversely, a function equal a.e. to a continuous
function need not be continuous anywhere (Ex. 2.14).

Combinatorial and Limit Properties of Measurable Functions

The following proposition shows that measurable R-valued functions may be combined in
standard ways to produce new measurable functions.
2.2.5 Proposition. If f, g : X → K are measurable and c ∈ C, then f + g, fg, cf, f̄, and
|f | are measurable. Moreover, if K = R, then f ∨ g and f ∧ g are measurable.

Proof. Let F : K × K → K be defined by F(x, y) = x + y. Then F is Borel measurable, hence
f + g = F ◦ (f, g) is measurable by 2.1.6 and 2.1.1. The proofs of the remaining assertions are similar.
The limit properties of measurable functions are given in the next results.
2.2.6 Theorem. Let f_n : X → R (n ∈ N) be measurable. Then sup_n f_n, inf_n f_n, lim sup_n f_n,
and lim inf_n f_n are measurable.
Proof. The assertions follow immediately from the relations

\[ \{\sup_n f_n \le t\} = \bigcap_n \{f_n \le t\}, \qquad \inf_n f_n = -\sup_n(-f_n) \]

and

\[ \liminf_n f_n = \sup_n \inf_{k \ge n} f_k, \qquad \limsup_n f_n = -\liminf_n(-f_n). \]

2.2.7 Corollary. Let fn : X → K be measurable and let f : X → K.


(a) If fn → f , then f is F-measurable.

(b) If fn → f a.e., then f is Fµ -measurable.

Proof. By considering real and imaginary parts, we may assume that fn and f are R-valued.
Part (a) follows from the fact that f = lim_n f_n. For (b), let N = {x : lim_n f_n(x) ≠ f(x)}
and set g_n = f_n 1_{N^c} and g = f 1_{N^c}. Then g_n is F_µ-measurable and g_n → g, hence g is
F_µ-measurable by part (a). Since g = f a.e., f is F_µ-measurable.
2.2.8 Example. Let f : X × R → C have the property that f(x, t) is left continuous in t
for each x and F-measurable in x for each t. We show that f is F ⊗ B(R)-measurable. For
this, it suffices to take f real-valued.
For each n, the collection of intervals of the form I_{k,n} := (k/n, (k + 1)/n], k ∈ Z, partitions
R. Define

\[ f_n(x, t) = f(x, k/n), \quad t \in I_{k,n},\ k \in \mathbb{Z},\ x \in X. \]

Then f_n is F ⊗ B(R)-measurable, as may be seen by writing

\[ f_n(x, t) = \sum_{k \in \mathbb{Z}} f(x, k/n)\, 1_{I_{k,n}}(t) \]

and using appropriate combinatorial properties of measurability. Now let t ∈ R and x ∈ X.
For each n ∈ N, there exists a unique k = k(t, n) such that t ∈ I_{k,n}. Since 0 < t − k/n ≤ 1/n,
by left continuity lim_n f_n(x, t) = lim_n f(x, k/n) = f(x, t). Therefore, f is a limit of F ⊗ B(R)-
measurable functions and hence is F ⊗ B(R)-measurable.
By ignoring X in the preceding argument, we see that a left continuous function on R
is Borel measurable. By taking (X, F) = (R, B(R)), we see that a function on R^2 that is
left continuous in each variable separately is a Borel function. It follows by induction that
a function on R^d that is left continuous in each variable separately is Borel measurable. Of
course, a similar result holds for separately right continuous functions. (In this regard, see
Ex. 2.25.) ♦
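
The role of left continuity in this approximation can be seen in one variable. The sketch below ignores X and is illustrative only: grid_approx is a hypothetical name, and the two step functions are chosen merely as test cases. It evaluates f_n(t) = f(k/n) for the unique k with k/n < t ≤ (k + 1)/n. At t = 0 the sample points k/n = −1/n approach t from the left, so the approximants recover f(0) for the left continuous indicator of (0, ∞) but not for the indicator of [0, ∞).

```python
from fractions import Fraction
import math

def grid_approx(f, n, t):
    """f_n(t) = f(k/n), where k is the unique integer with k/n < t <= (k + 1)/n."""
    k = math.ceil(n * t) - 1
    return f(Fraction(k, n))

left_cont = lambda t: 1 if t > 0 else 0     # indicator of (0, oo): left continuous
right_cont = lambda t: 1 if t >= 0 else 0   # indicator of [0, oo): not left continuous at 0

t = Fraction(0)
print([grid_approx(left_cont, n, t) for n in (1, 10, 100)], left_cont(t))
print([grid_approx(right_cont, n, t) for n in (1, 10, 100)], right_cont(t))
```

For the second function the analogous construction with right endpoints would be the appropriate one, in line with the remark on right continuous functions.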

Exercises
2.14 Give an example of a nowhere continuous function equal a.e. to a continuous function.

2.15 Show that if F ≠ P(X), then there exists a nonmeasurable function f such that |f| is measurable.

2.16 Let fn : X → R be F-measurable for every n. Prove that the following sets are F-measurable:
(a) {x : lim_n f_n(x) exists in R̄}. (b) {x : lim_n f_n(x) exists in R}.

2.17 Let f, g : X → R be F-measurable. Prove:


(a) If f is never zero, then 1/f is F-measurable. (b) If f > 0, then f^g is F-measurable.

2.18 Let f, g : X → R be F-measurable. Prove that {f > g} ∈ F.

2.19 Prove that f = 1[0,1] is not equal a.e. to a continuous function on R. Show, however, that f is a
pointwise limit of continuous functions fn such that for each ε > 0, λ{|fn − f | ≥ ε} → 0.

2.20 Define f : (0, 1) → R by

\[ f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational,} \\ 1/n & \text{if } x = m/n \text{, reduced.} \end{cases} \]

Show that f is continuous λ-a.e. and is equal λ-a.e. to a continuous function.

2.21 Let f : R → R be differentiable. Prove that f′ is Borel measurable.

2.22 Let f : X × [a, b] → R such that f(x, t) is F-measurable in x for each t and continuous in t for
each x. Show that the Riemann integral ∫_a^b f(x, t) dt is F-measurable in x.

2.23 [↑ 2.2.1] For x ∈ (0, 1) define f(x) to be the first digit in the decimal expansion of x that is greater
than 5 and f (x) = 0 if there is no such digit. (For definiteness, use decimal expansions that do
not end in a sequence of 9’s.) Also, define g(x) to be the first time a digit is greater than 5, and
g(x) = ∞ if there is no such digit. Prove that f and g are Borel measurable.

2.24 Show that the supremum of an uncountable family of Borel functions on R need not be Lebesgue
measurable.

2.25 [↑ 2.9] Let f : Rd → R have the property that for each i, f (x1 , . . . , xi , . . . xd ) is either left
continuous or right continuous in xi when the other variables are fixed. Show that f is Borel
measurable.

2.26 Let F be a σ-field on Rd such that every continuous function f : Rd → R that vanishes outside
a bounded interval is F-measurable. Prove that B(Rd ) ⊆ F.

2.27 Let µ be a finite measure on B(Rd ) and A ∈ B(Rd ). Define f (x) = µ(A + x), x ∈ Rd . Show
that f is Borel measurable. ⟦Assume first that A is closed and show that A_t := {f ≥ t} is
closed.⟧

2.28 A function f : R → R is said to be upper (lower) semicontinuous at x_0 if

\[ f(x_0) \ge \limsup_{x \to x_0} f(x) := \lim_{r \to 0+}\ \sup_{0 < |x - x_0| < r} f(x) \qquad \Bigl( f(x_0) \le \liminf_{x \to x_0} f(x) := \lim_{r \to 0+}\ \inf_{0 < |x - x_0| < r} f(x) \Bigr). \]

If f is upper (lower) semicontinuous at each point of R, then f is said to be upper (lower)
semicontinuous on R. Prove the following:
(a) f is upper semicontinuous at x0 iff −f is lower semicontinuous at x0 .
(b) f is upper (lower) semicontinuous on R iff {f < t} ({f > t}) is open for every t. In particular,
upper semicontinuous and lower semicontinuous functions are Borel measurable.
(c) If f (x0 ) is finite, then f is continuous at x0 iff it is upper and lower semicontinuous at x0 .

(d) For arbitrary f, the functions g(x) := lim sup_{t→x} f(t) and h(x) := lim inf_{t→x} f(t) are, respectively,
upper and lower semicontinuous on R.
(e) The set {x : lim_{t→x} f(t) exists in R̄} is Borel measurable.
(f) The set {x : lim_{t→x} f(t) exists in R} is Borel measurable.

2.29 Let 0 ∈ A ⊆ Rd . Define the “radius function” fA : Rd → R by fA (x) := sup{t ≥ 0 : tx ∈ A}.


(a) Let 0 ∈ Ak for all k and Ak ↑ A. Show that fAk ↑ fA .
(b) Show that if A is open, then fA is positive and Borel measurable.
(c) Use (b) to show that if A is compact, then fA is Borel measurable.
(d) Conclude that f_A is Borel measurable for any Borel set A ∋ 0.

2.30 [↑ 1.20] Define a topology on [−∞, ∞] with open sets O such that B([−∞, ∞]) = σ(O).

2.3 Simple Functions


We show in this section that measurable functions are generated by measurable indicator
functions. This will enable us to construct the Lebesgue integral from measurable sets, a
fundamental feature of the Lebesgue theory.
Let (X, F) be a measurable space. A measurable function f : (X, F) → K with finite
range is called an F-simple function. If there is no chance of confusion, we drop the prefix
F. An indicator function 1E is simple iff E ∈ F (Ex. 2.31). If f is simple and g : K → K
is Borel measurable, then g ◦ f, being finite-valued and measurable, is simple. Similarly, if
F : K^d → K is Borel-measurable and the functions f_1, . . ., f_d are simple, then F(f_1, . . . , f_d)
is simple. In particular, a linear combination of measurable indicator functions is simple.
Conversely, every simple function f may be written (not necessarily uniquely) as a linear
combination of measurable indicator functions. For example, if a1 , . . . , an ∈ K are the distinct
values of f , then
\[ f = \sum_{j=1}^{n} a_j 1_{A_j}, \qquad A_j := \{f = a_j\} \in \mathcal{F}. \tag{2.1} \]

The sum in (2.1) is called the standard representation of f .
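
For a function on a finite set, the standard representation amounts to grouping points by value. The following sketch is a toy illustration: the set X, the function f, and the name standard_representation are hypothetical, and measurability plays no role because every subset of a finite set is taken as measurable here. It builds the sets A_j = {f = a_j} and checks that the sum of a_j 1_{A_j} reproduces f.

```python
def standard_representation(f, X):
    """Group the points of a finite set X by the value of f: returns {a_j: A_j}."""
    rep = {}
    for x in X:
        rep.setdefault(f(x), set()).add(x)
    return rep

X = range(-3, 4)
f = lambda x: x * x                      # hypothetical simple function on X
rep = standard_representation(f, X)
print(rep)

# f is recovered pointwise as the sum of a_j * 1_{A_j}(x); the sets A_j are disjoint.
print(all(sum(a * (x in A) for a, A in rep.items()) == f(x) for x in X))
```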

A Fundamental Convergence Theorem


As we shall see in Chapter 3, the following theorem is one of the key ingredients in the
Lebesgue theory of integration, allowing the transition from measure to integral.
2.3.1 Theorem. Let (X, F) be a measurable space and f : (X, F) → K measurable.
(a) If f ≥ 0, then there exists a sequence of nonnegative, simple functions fn such that
fn ↑ f on X.
(b) In the general case, there exists a sequence of simple functions f_n such that f_n → f on X
and |fn | ≤ |f | for all n.
(c) The sequence (fn ) in (b) converges uniformly on sets E on which f is bounded.

Proof. (a) Let f_0 = 0 and for each n ∈ N define

\[ f_n = \sum_{j=1}^{n2^n} \frac{j-1}{2^n}\, 1_{A_{n,j}} + n\, 1_{A_n}, \quad\text{where} \]

A_{n,j} = {(j − 1)2^{−n} ≤ f < j2^{−n}}, j = 1, 2, . . . , n2^n, and A_n = {f ≥ n}.
We show that f_n(x) ↑ f(x) for each x ∈ X. This is clear if f(x) = ∞, since then f_n(x) = n
for all n. Suppose f(x) ∈ R and let n ∈ N. If f(x) ≥ n + 1, then f_{n+1}(x) = n + 1 > n = f_n(x).
If n ≤ f(x) < n + 1, then x ∈ A_{n+1,j} for some j with (j − 1)/2^{n+1} ≥ n, hence f_{n+1}(x) ≥ n =
f_n(x). Finally, if f(x) < n, then (j − 1)2^{−n} ≤ f(x) < j2^{−n} for some 1 ≤ j ≤ n2^n, hence

\[ \frac{2j-2}{2^{n+1}} \le f(x) < \frac{2j-1}{2^{n+1}} \quad\text{or}\quad \frac{2j-1}{2^{n+1}} \le f(x) < \frac{2j}{2^{n+1}}. \]

(See Figure 2.3.) In either case,

\[ f_{n+1}(x) \ge \frac{2j-2}{2^{n+1}} = \frac{j-1}{2^n} = f_n(x). \]

Thus f_n ↑ on X. Since 0 ≤ f(x) − f_n(x) < 2^{−n} for all sufficiently large n, f_n(x) → f(x).

FIGURE 2.3: Components of f_n and f_{n+1}.

(b) If f is R-valued, apply (a) to f^+ and f^−. Suppose f is C-valued. Let g_n and h_n be
real-valued simple functions such that g_n → Re f, h_n → Im f, |g_n| ≤ |Re f|, and |h_n| ≤ |Im f|.
Then g_n + ih_n is a simple function, g_n + ih_n → f, and

\[ |g_n + i h_n|^2 = g_n^2 + h_n^2 \le (\operatorname{Re} f)^2 + (\operatorname{Im} f)^2 = |f|^2. \]

The proof of part (c) is left to the reader as an exercise (Ex. 2.34).
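
The approximating sequence in part (a) depends on x only through the value y = f(x), which makes it easy to tabulate. The sketch below is a minimal illustration; simple_approx is a hypothetical name and the sample value y = 7/3 is arbitrary. It returns n when y ≥ n and otherwise rounds y down to the nearest multiple of 2^{−n}, then checks monotonicity in n and the error bound 2^{−n} once n exceeds y.

```python
from fractions import Fraction
import math

def simple_approx(y, n):
    """Value of f_n at a point where f takes the value y >= 0 (y may be math.inf)."""
    if y >= n:
        return Fraction(n)
    # Here (j - 1)/2^n <= y < j/2^n with j - 1 = floor(y * 2^n).
    return Fraction(math.floor(y * 2 ** n), 2 ** n)

y = Fraction(7, 3)
values = [simple_approx(y, n) for n in range(1, 8)]
print(values)
print(all(a <= b for a, b in zip(values, values[1:])))
print(all(0 <= y - simple_approx(y, n) < Fraction(1, 2 ** n) for n in range(3, 8)))
```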

Applications of the Convergence Theorem


Theorem 2.3.1 is useful in establishing certain properties of measurable functions. The
idea is to prove the property first for indicator functions, then for simple functions, then for
nonnegative measurable functions via 2.3.1, then for R-valued functions using the identity
f = f^+ − f^−, and finally for complex-valued functions by considering Re f and Im f. The
theorems in this subsection illustrate the technique.
2.3.2 Theorem. Let X be a set, (Y, G) a measurable space, T : X → Y, and F = T^{−1}(G).
If f : X → C is F-measurable, then there exists a G-measurable function g : Y → C such
that f = g ◦ T.

X —T→ Y —g→ C, with f = g ◦ T : X → C.

FIGURE 2.4: The mappings of 2.3.2.

Proof. Since 1_{T^{−1}(A)} = 1_A ◦ T (A ∈ G), the assertion holds for F-simple functions. If f ≥ 0,
let (fn ) be a sequence of nonnegative F-simple functions such that fn ↑ f . For each n
there exists a nonnegative G-measurable function hn : Y → R such that fn = hn ◦ T . Set
gn = h1 ∨ · · · ∨ hn . Then gn is G-measurable and gn ↑ on Y , hence g := limn gn exists and is
G-measurable. Moreover, hn ↑ on T (X), hence fn = gn ◦ T . Taking limits, we have f = g ◦ T .
If f is real-valued, choose G-measurable functions g_1, g_2 : Y → R such that f^+ = g_1 ◦ T
and f^− = g_2 ◦ T. Then f = (g_1 − g_2) ◦ T.
Finally, if f is complex-valued, choose G-measurable functions h1 , h2 : Y → R such that
Re f = h1 ◦ T and Im f = h2 ◦ T . Then f = (h1 + ih2 ) ◦ T .
Hereafter, in arguments such as those in the preceding theorem we shall frequently omit
the part of the proof that constitutes the transition from the nonnegative case to the complex
case, this argument usually being straightforward.
2.3.3 Theorem. Let (X, F, µ) be a measure space and let f : X → K be F_µ-measurable.
Then there exists an F-measurable function g such that f = g a.e.
Proof. Let f = 1_E (E ∈ F_µ). Then E = A ∪ N, where A ∈ F, N ⊆ M, and µ(M) = 0. Set
g = 1_A. Then g is F-measurable, and {f ≠ g} ⊆ N, so f = g a.e. Therefore, the assertion
holds for indicator functions.
If f is an F_µ-simple function in standard form then, by the first paragraph, each of its
terms is equal a.e. to an F-measurable function. By considering a finite union of sets of
measure zero we see that f has this property.
If f is nonnegative, there exists a sequence of nonnegative F_µ-simple functions f_n such
that f_n → f on X. By the previous paragraph, for each n there exists an F-measurable
function g_n such that f_n = g_n a.e. Let N_n := {x : f_n ≠ g_n} and N := ⋃_{n=1}^{∞} N_n. Then
N ∈ F_µ, µ(N) = 0 and f_n = g_n on N^c for all n. Let M denote the set of all x such that
the sequence (g_n(x)) does not converge in R̄. Then M ⊆ N, and by Ex. 2.16, M ∈ F. Let
g = lim_n g_n 1_{M^c}. Then g is measurable and {g ≠ f} ⊆ N, so g = f a.e. The general case
f : X → K follows by a standard argument.

Exercises
2.31 Prove that 1E is measurable iff E ∈ F.

2.32 Express the simple function ⌊x⌋ 1_{[−n,n]}(x) in standard form.

2.33 [↓ 3.5] Let X be uncountable and let F be the σ-field of countable or cocountable subsets of X.
Show that a function f : X → C is measurable iff f is constant on some cocountable set. ⟦Use
2.3.1.⟧

2.34 [↓ 4.2.1] Prove 2.3.1(c).



2.4 Convergence of Measurable Functions


Modes of Convergence
In this section we consider three important types of convergence of a sequence of functions
on a measure space (X, F, µ). A fourth type, Lp -convergence, is discussed in Chapter 4.
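Let f, f_n : (X, F) → C be F-measurable. The sequence (f_n) is said to converge to f
• µ-almost everywhere, written $f_n \xrightarrow{\text{a.e.}} f$, if µ{lim_n f_n ≠ f} = 0.
• in µ-measure, written $f_n \xrightarrow{\mu} f$, if lim_n µ{|f_n − f| ≥ ε} = 0 ∀ ε > 0.
• µ-almost uniformly, written $f_n \xrightarrow{\text{a.u.}} f$, if for each ε > 0 there exists a set A_ε in F
such that µ(A_ε^c) < ε and f_n → f uniformly on A_ε.
For example, on (R, B(R), λ),

\[
\begin{array}{lll}
1_{[0,1/n]} \xrightarrow{\text{a.e.}} 0, & 1_{[0,1/n]} \xrightarrow{\lambda} 0, & 1_{[0,1/n]} \xrightarrow{\text{a.u.}} 0, \\
1_{[n,n+1]} \xrightarrow{\text{a.e.}} 0, & 1_{[n,n+1]} \overset{\lambda}{\nrightarrow} 0, & 1_{[n,n+1]} \overset{\text{a.u.}}{\nrightarrow} 0, \\
1_{[n,n+1/n]} \xrightarrow{\text{a.e.}} 0, & 1_{[n,n+1/n]} \xrightarrow{\lambda} 0, & 1_{[n,n+1/n]} \overset{\text{a.u.}}{\nrightarrow} 0.
\end{array} \tag{2.2}
\]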
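
A rough numerical sketch may help explain the last entry of (2.2). Almost uniform convergence of the indicators 1_{[n,n+1/n]} to 0 would require a set A whose complement has small measure and which misses [n, n + 1/n] for all large n, so the complement would have to contain all of these intervals from some index m on. The code below, in which tail_length is a hypothetical helper and floating-point sums are used for convenience, compares the measure 1/m of a single interval, which tends to 0, with the total length of the intervals from m to 64m, which does not.

```python
import math

def tail_length(m, M):
    """Total length of the disjoint intervals [n, n + 1/n], m <= n <= M (with m >= 2)."""
    return math.fsum(1 / n for n in range(m, M + 1))

for m in (2, 100, 10_000):
    # 1/m is the measure of a single interval; the tail sum stays bounded away from 0.
    print(m, 1 / m, tail_length(m, 64 * m))
```

The tail sums are partial sums of the harmonic series, which is why they cannot be made small by increasing m.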

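2.4.1 Proposition. Let f, f_n, g, g_n : (X, F) → K be measurable, a, b ∈ K, and let m denote
any of the three modes of convergence. Then
(a) $f_n \xrightarrow{m} f$ and $g_n \xrightarrow{m} g$ ⇒ $af_n + bg_n \xrightarrow{m} af + bg$.
(b) $f_n \xrightarrow{m} f$ ⇒ $|f_n| \xrightarrow{m} |f|$ and $\overline{f_n} \xrightarrow{m} \overline{f}$.
(c) $f_n \xrightarrow{m} f$ iff $\operatorname{Re} f_n \xrightarrow{m} \operatorname{Re} f$ and $\operatorname{Im} f_n \xrightarrow{m} \operatorname{Im} f$.
(d) If K = R, then $f_n \xrightarrow{m} f$ iff $f_n^+ \xrightarrow{m} f^+$ and $f_n^- \xrightarrow{m} f^-$.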

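Proof. We prove the proposition for convergence in measure. Part (a) follows from

\[ \bigl\{|(af + bg) - (af_n + bg_n)| \ge \varepsilon\bigr\} \subseteq \Bigl\{|f - f_n| \ge \frac{\varepsilon}{2(|a| + 1)}\Bigr\} \cup \Bigl\{|g - g_n| \ge \frac{\varepsilon}{2(|b| + 1)}\Bigr\}, \]

which implies that

\[ \mu\bigl\{|(af + bg) - (af_n + bg_n)| \ge \varepsilon\bigr\} \le \mu\Bigl\{|f - f_n| \ge \frac{\varepsilon}{2(|a| + 1)}\Bigr\} + \mu\Bigl\{|g - g_n| \ge \frac{\varepsilon}{2(|b| + 1)}\Bigr\}. \]

For the first part of (b), use the inequality µ{| |f_n| − |f| | ≥ ε} ≤ µ{|f_n − f| ≥ ε}. Part (c)
follows from

\[ \mu\{|\operatorname{Re} f_n - \operatorname{Re} f| \ge \varepsilon\},\ \mu\{|\operatorname{Im} f_n - \operatorname{Im} f| \ge \varepsilon\} \le \mu\{|f_n - f| \ge \varepsilon\} \quad\text{and} \]

\[ \mu\{|f_n - f| \ge \varepsilon\} \le \mu\{|\operatorname{Re} f_n - \operatorname{Re} f| \ge \varepsilon/2\} + \mu\{|\operatorname{Im} f_n - \operatorname{Im} f| \ge \varepsilon/2\}. \]

The proof of (d) is similar, using the inequality |x^+ − y^+| ≤ |x − y|.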



Relationships Among the Modes of Convergence


The following theorems relate the three modes of convergence. The first shows that a.u.
convergence is the strongest.
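2.4.2 Theorem. If $f_n \xrightarrow{\text{a.u.}} f$, then also $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\text{a.e.}} f$.
Proof. For each $\delta > 0$, choose $A_\delta \in \mathcal{F}$ such that $\mu(A_\delta^c) < \delta$ and $f_n \to f$ uniformly on
$A_\delta$. Given $\varepsilon > 0$, choose $m$ such that $|f - f_n| < \varepsilon$ on $A_\delta$ for all $n \ge m$. For such $n$,
$\mu\{|f_n - f| \ge \varepsilon\} \le \mu(A_\delta^c) < \delta$. Thus $f_n \xrightarrow{\mu} f$. Since $f_n \to f$ pointwise on $A := \bigcup_k A_{1/k}$ and
$\mu(A^c) \le 1/k$ for all $k$, $f_n \xrightarrow{\text{a.e.}} f$.
Examples (2.2) show that $f_n \xrightarrow{\text{a.e.}} f$ does not necessarily imply $f_n \xrightarrow{\text{a.u.}} f$ or $f_n \xrightarrow{\mu} f$, and
that $f_n \xrightarrow{\mu} f$ does not necessarily imply that $f_n \xrightarrow{\text{a.u.}} f$. The following example shows that
$f_n \xrightarrow{\mu} f$ does not in general imply $f_n \xrightarrow{\text{a.e.}} f$.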
2.4.3 Example. Let $f_1 = 1_{[0,1)}$ and for each $k \in \mathbb{N}$ set
\[
f_{2^k + j} := 1_{[j/2^k,\,(j+1)/2^k)}, \quad 0 \le j \le 2^k - 1.
\]
If $0 < \varepsilon < 1$ and $n = 2^k + j$, then
\[
\lambda\{x \in [0, 1) : f_n(x) \ge \varepsilon\} = \lambda[j/2^k, (j + 1)/2^k) = 1/2^k,
\]
hence $f_n \xrightarrow{\lambda} 0$ on $[0, 1)$. On the other hand, for any $x \in [0, 1)$, $f_n(x) = 1$ for infinitely many
$n$ and $f_n(x) = 0$ for infinitely many $n$, so $f_n$ cannot converge a.e. or a.u. ♦

[FIGURE 2.5: the functions $f_4, f_5, f_6, f_7$ on $[0, 1)$; $f_n \xrightarrow{\mu} f$, $f_n \overset{\text{a.e.}}{\nrightarrow} f$.]
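
The behavior described in Example 2.4.3 can be observed directly. The Python sketch below is an illustration, not part of the text; the helper names `level_and_offset`, `f`, and `measure_of_support` are hypothetical. It decomposes an index $n$ as $2^k + j$, evaluates $f_n$ at a point using exact rational arithmetic, and lists the indices $n < 64$ at which $f_n(1/3) = 1$. One such index appears at every dyadic level $k$, while the measure of $\{f_n \ge \varepsilon\}$ shrinks like $1/2^k$.

```python
from fractions import Fraction


def level_and_offset(n):
    """Write n = 2**k + j with 0 <= j < 2**k and return (k, j)."""
    k = n.bit_length() - 1
    return k, n - (1 << k)


def f(n, x):
    """Value at x of the indicator of [j/2**k, (j+1)/2**k), where n = 2**k + j."""
    k, j = level_and_offset(n)
    return 1 if Fraction(j, 2**k) <= x < Fraction(j + 1, 2**k) else 0


def measure_of_support(n):
    """Lebesgue measure of {f_n >= eps} for 0 < eps < 1, namely 1/2**k."""
    k, _ = level_and_offset(n)
    return Fraction(1, 2**k)


x = Fraction(1, 3)
hits = [n for n in range(1, 64) if f(n, x) == 1]
print(hits)  # one index per level k = 0, ..., 5
print([measure_of_support(n) for n in (1, 2, 4, 8, 16, 32)])
```

Since every level contributes exactly one index with $f_n(x) = 1$ and all other indices at that level give $0$, the sequence $(f_n(x))$ takes both values infinitely often.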


While the sequence $(f_n)$ in the last example does not converge a.e., there are subsequences
that converge a.e. (for example, $(f_{2^k})$). This phenomenon holds generally:
2.4.4 Theorem. If $f_n \xrightarrow{\mu} f$, then there exists a subsequence $(f_{n_k})$ such that $f_{n_k} \xrightarrow{\text{a.u.}} f$,
hence also $f_{n_k} \xrightarrow{\text{a.e.}} f$.
Proof. Since $\lim_n \mu\{|f_n - f| \ge \varepsilon\} = 0$, one may choose indices $n_1 < n_2 < \cdots$ such that for
each $k$
\[
\mu\{|f_n - f| \ge 1/2^k\} < 1/2^k \quad \text{for all } n \ge n_k. \tag{2.3}
\]
Given $\varepsilon, \delta > 0$, choose $m \in \mathbb{N}$ such that $1/2^{m-1} < \min\{\varepsilon, \delta\}$ and set
\[
B := \bigcup_{k \ge m} \{|f_{n_k} - f| \ge 1/2^k\}.
\]
By (2.3), $\mu(B) \le \sum_{k=m}^{\infty} 1/2^k = 1/2^{m-1} < \varepsilon$. Moreover, for $x \in B^c$ and $k \ge m$ we have
$|f_{n_k}(x) - f(x)| < 1/2^k < \delta$. Therefore, $f_{n_k} \xrightarrow{\text{a.u.}} f$.

A converse of 2.4.2 holds, but with a restriction:


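2.4.5 Egoroff's Theorem. If $\mu(X) < \infty$ and $f_n \xrightarrow{\text{a.e.}} f$, then $f_n \xrightarrow{\text{a.u.}} f$, hence also $f_n \xrightarrow{\mu} f$.
Proof. Let $E \in \mathcal{F}$ such that $\mu(E^c) = 0$ and $f_n \to f$ on $E$. For $x \in X$, define
\[
g_n(x) = \sup_{j \ge n} |f(x) - f_j(x)|.
\]
Then $g_n$ is $\mathcal{F}$-measurable and $g_n \downarrow 0$ on $E$. For $n, k \in \mathbb{N}$, set
\[
E_{n,k} = E \cap \{g_n < 1/k\}.
\]
For each $k$, $E_{n,k} \uparrow_n E$, hence $\mu(E_{n,k}) \uparrow_n \mu(E) = \mu(X)$. Given $\varepsilon > 0$, we may therefore
construct an increasing sequence $n_k \in \mathbb{N}$ such that
\[
\mu(E_{n_k,k}^c) = \mu(X) - \mu(E_{n_k,k}) < \varepsilon/2^k.
\]
Set $A_\varepsilon = \bigcap_k E_{n_k,k}$. Then $\mu(A_\varepsilon^c) < \varepsilon$ and $g_{n_k}(x) < 1/k$ for $x \in A_\varepsilon$, that is,
$|f(x) - f_j(x)| < 1/k$ for all $j \ge n_k$ and all $x \in A_\varepsilon$. Thus $f_n \xrightarrow{\text{a.u.}} f$.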
From the above results we see that a.e. convergence and a.u. convergence are equivalent
in a finite measure space. The sequence fn = 1[n,n+1] on R shows that this is not true in
general measure spaces.

Exercises

In these exercises, (X, F, µ) denotes an arbitrary measure space.

2.35 Discuss the convergence behavior of $f_n(x) = x^n 1_{[0,1]}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$.

2.36 Let $E_n \in \mathcal{F}$ and let $f$ be measurable. Suppose that $1_{E_n} \xrightarrow{\mu} f$. Show that $f = 1_E$ a.e. for some
$E \in \mathcal{F}$.

2.37 Let $E_n \in \mathcal{F}$, $A := \overline{\lim}_n E_n$, and $B := \underline{\lim}_n E_n$. Show that $1_{E_n} \xrightarrow{\text{a.e.}} f$ for some $f$ iff $\mu(A \setminus B) = 0$.

2.38 Let $f, f_n, g : X \to \mathbb{C}$ be measurable, $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$. Show that $f = g$ a.e.

2.39 Let $f, f_n : X \to \mathbb{R}$ be measurable and $f_n \xrightarrow{\mu} f$. Show that if $f_n \uparrow$ then $f_n \xrightarrow{\text{a.e.}} f$.

2.40 Let $f, f_n : X \to \mathbb{C}$ be measurable. Show that $f_n \xrightarrow{\mu} f$ iff for each $\varepsilon > 0$ there exists $m$ such that
$\mu\{|f - f_n| \ge \varepsilon\} < \varepsilon$ for all $n \ge m$.

2.41 Let $f_n : X \to \mathbb{C}$ be measurable. Show that $f_n \xrightarrow{\text{a.e.}} f$ for some $\mathcal{F}_\mu$-measurable $f$ iff $g_{m,n} :=
f_m - f_n \to 0$ a.e. as $m, n \to \infty$.

2.42 [↑ 2.41] Let $\mu(X) < \infty$, $a_n > 0$, and $\sum_n a_n < \infty$. Let $f_n : X \to \mathbb{C}$ be measurable and set
$A_n := \{|f_n - f_{n+1}| \ge a_n\}$. Show that if $\sum_n \mu(A_n) < \infty$, then $f_n \xrightarrow{\text{a.e.}} f$ for some function
$f : X \to \mathbb{C}$. ⟦By 1.37, $\mu(\overline{\lim}_n A_n) = 0$. Consider $\sum_{k=n}^{m-1} [f_k(x) - f_{k+1}(x)]$.⟧

2.43 [↑ 2.42] Let $\mu(X) < \infty$ and $f_n : X \to \mathbb{C}$ measurable. Show that $f_n \xrightarrow{\mu} f$ for some $f : X \to \mathbb{C}$
iff $f_m - f_n \xrightarrow{\mu} 0$ as $m, n \to \infty$. ⟦For the sufficiency, modify the proof of 2.4.4 to obtain a
strictly increasing sequence of positive integers $n_k$ such that $\mu\{|f_n - f_m| \ge 1/2^k\} < 1/2^k$ for
all $m, n \ge n_k$.⟧

2.44 (Fréchet). Let $\mu(X) < \infty$. Define
\[
\rho(f, g) = \inf_{r > 0} \big[\, r + \mu\{x : |f(x) - g(x)| \ge r\} \,\big].
\]
Show that if functions that are equal $\mu$-a.e. are identified, then $\rho$ becomes a metric on the space
$L^0 = L^0(X, \mathcal{F}, \mu)$ of all measurable functions on $X$. Show also that $\rho(f, f_n) \to 0$ iff $f_n \xrightarrow{\mu} f$.
Thus, by 2.43, the metric space is complete.
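
2.45 Let $\mu(X) < \infty$ and let $f, f_n : X \to \mathbb{C}$ be measurable. Set $A_n = \{|f_n - f| \ge a_n\}$, where $a_n > 0$
and $a_n \to 0$. Show that if $\sum_n \mu(A_n) < \infty$, then $f_n \xrightarrow{\text{a.e.}} f$.

2.46 Let $\mu(X) < \infty$, let $f, g, f_n, g_n : X \to \mathbb{C}$ be measurable, $f_n \xrightarrow{\mu} f$, and $g_n \xrightarrow{\mu} g$. Show that
$f_n g_n \xrightarrow{\mu} f g$.

2.47 Let $\mu(X) < \infty$, $f, f_n : X \to \mathbb{K}$ measurable, and $g : \mathbb{K} \to \mathbb{K}$ continuous. Show that $f_n \xrightarrow{\mu} f$ $\Rightarrow$
$g \circ f_n \xrightarrow{\mu} g \circ f$.

2.48 Let $\mu$ and $\nu$ be finite measures on $(X, \mathcal{F})$ with the same sets of measure zero and let $f, f_n : X \to \mathbb{C}$
be measurable. Show that $f_n \xrightarrow{\mu} f$ iff $f_n \xrightarrow{\nu} f$.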
Chapter 3
Integration

In this chapter we construct the general Lebesgue integral. The construction proceeds
in stages. The integral is first defined on the class of nonnegative simple functions and
then extended to nonnegative measurable functions and ultimately to complex measurable
functions. The basic properties of the integral are also developed in this chapter. Additional
properties are discussed in subsequent chapters.

Throughout the chapter $(X, \mathcal{F}, \mu)$ denotes an arbitrary measure space.

3.1 Construction of the Integral


The integral of a function $f$ with respect to $\mu$ is denoted variously by
\[
\int f = \int f \, d\mu = \int_X f(x) \, d\mu(x) = \int_X f(x) \, \mu(dx) = \int f \, dF,
\]
the last notation if $\mu$ is a Lebesgue-Stieltjes measure on $\mathbb{R}$ with distribution function $F$. The
construction of the integral begins with nonnegative simple functions.

Integral of a Nonnegative Simple Function


Let $f$ be a nonnegative simple function with standard representation
\[
f = \sum_{j=1}^{m} a_j 1_{A_j}, \quad A_j := \{f = a_j\} \in \mathcal{F}.
\]
The integral of $f$ is then defined as
\[
\int f \, d\mu := \sum_{j=1}^{m} a_j \mu(A_j).
\]

Note that the above sum may contain terms of the form a · ∞, where a ∈ [0, ∞). Such terms
have value either ∞ or 0, depending on whether a > 0 or a = 0 (see §0.1). In particular, the
integral of the identically zero function is 0 · µ(X) = 0, whether or not µ(X) is finite.
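
The definition, including the convention $0 \cdot \infty = 0$, can be mirrored in a few lines of Python. This is a minimal sketch under simplifying assumptions that are not in the text: $X$ is a finite set, every subset is measurable, and the measure is given by point masses in $[0, \infty]$. The function name `integral_simple` and the dictionary encoding are hypothetical. Note that floating-point arithmetic evaluates `0 * inf` as `nan`, so the convention has to be applied explicitly.

```python
import math


def integral_simple(values, measure):
    """Integral of a nonnegative simple function from its standard representation.

    values:  dict mapping each point x of a finite set X to f(x) >= 0
    measure: dict mapping each point x of X to its mass in [0, inf]
    """
    total = 0.0
    for a in set(values.values()):                       # the distinct values a_j
        mass = sum(measure[x] for x in values if values[x] == a)   # mu(A_j)
        if a == 0 or mass == 0:
            continue                                     # convention: 0 * inf = 0
        total += a * mass
    return total


values = {"p": 0, "q": 2, "r": 2}
measure = {"p": math.inf, "q": 0.5, "r": 1.5}
print(integral_simple(values, measure))                  # 2 * mu({q, r}) = 4.0
print(integral_simple({"p": 0, "q": 0, "r": 0}, measure))  # zero function: 0.0
```

In the first call the set $\{f = 0\}$ has infinite measure but contributes nothing, in line with the convention recalled above.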
The following lemma summarizes the elementary properties of the integral of nonnegative
simple functions. These will be used later to obtain analogous properties of the general
integral.


3.1.1 Lemma. Let $f, g$ be nonnegative simple functions and $a \ge 0$. Then
(a) $\int a f \, d\mu = a \int f \, d\mu$.
(b) $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$.
(c) $\int f \, d\mu \le \int g \, d\mu$ if $f \le g$ a.e.
(d) $\int f \, d\mu = \int g \, d\mu$ if $f = g$ a.e.

Proof. Part (a) is immediate from the definition of the integral, and (d) follows from (c). To
prove (b), let $f$ and $g$ have standard representations
\[
f = \sum_{i=1}^{m} a_i 1_{A_i} \quad \text{and} \quad g = \sum_{j=1}^{n} b_j 1_{B_j}.
\]
Since $X = \bigcup_{i=1}^{m} A_i = \bigcup_{j=1}^{n} B_j$ (disjoint), we have $\mu(A_i) = \sum_{j=1}^{n} \mu(A_i \cap B_j)$ and $\mu(B_j) =
\sum_{i=1}^{m} \mu(A_i \cap B_j)$, hence
\[
\int f = \sum_{i=1}^{m} a_i \mu(A_i) = \sum_{i,j} a_i \mu(A_i \cap B_j) \quad \text{and} \quad \int g = \sum_{j=1}^{n} b_j \mu(B_j) = \sum_{i,j} b_j \mu(A_i \cap B_j). \tag{†}
\]
Now let $c_1, \ldots, c_p$ be the distinct values of $f + g$ and set
\[
C_k = \{f + g = c_k\}, \quad k = 1, \ldots, p.
\]
Then
\[
f + g = \sum_{k=1}^{p} c_k 1_{C_k} \quad \text{and} \quad C_k = \bigcup_{\{(i,j)\,:\,a_i + b_j = c_k\}} A_i \cap B_j \ \text{(disjoint)}
\]
and so
\[
\int (f + g) = \sum_{k=1}^{p} c_k \mu(C_k) = \sum_{k=1}^{p} c_k \sum_{a_i + b_j = c_k} \mu(A_i \cap B_j) = \sum_{i,j} (a_i + b_j) \mu(A_i \cap B_j) = \int f + \int g,
\]
the last equality by (†).
For (c), let $E = \{f \le g\}$. Then $\mu(E^c) = 0$ and $a_i \le b_j$ for all $i, j$ for which $A_i \cap B_j \cap E \ne \emptyset$.
From (†) and the equalities
\[
\mu(A_i \cap B_j) = \mu(A_i \cap B_j \cap E) + \mu(A_i \cap B_j \cap E^c) = \mu(A_i \cap B_j \cap E),
\]
we have
\[
\int f = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \mu(A_i \cap B_j \cap E) \le \sum_{j=1}^{n} \sum_{i=1}^{m} b_j \mu(A_i \cap B_j \cap E) = \int g.
\]

Integral of a Real-Valued Function


For a measurable function $f : X \to [0, \infty]$, define
\[
\int f \, d\mu := \sup \Big\{ \int f_s \, d\mu : 0 \le f_s \le f, \ f_s \text{ simple} \Big\}. \tag{3.1}
\]
Note that the integral is nonnegative and could be infinite. (For an extreme example, consider
the measure $\mu$ on $\mathcal{P}(X)$ that assigns $\infty$ to every nonempty set. Then $\int f \, d\mu = \infty$ for all
nonnegative functions except the identically zero function.)
The integral of a measurable function $f : X \to \mathbb{R}$ is defined as
\[
\int f \, d\mu := \int f^+ \, d\mu - \int f^- \, d\mu,
\]
provided at least one of the integrals on the right is finite. If both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are
finite, then $f$ is said to be integrable.

Integral of a Complex-Valued Function


To extend the integral to the complex case, recall that a complex-valued function f
is measurable iff both Re f and Im f are measurable. Such a function is declared to be
integrable if both Re f and Im f are integrable, in which case we define
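\[
\int f \, d\mu = \int \operatorname{Re} f \, d\mu + i \int \operatorname{Im} f \, d\mu,
\]
that is,
\[
\operatorname{Re} \int f \, d\mu = \int \operatorname{Re} f \, d\mu \quad \text{and} \quad \operatorname{Im} \int f \, d\mu = \int \operatorname{Im} f \, d\mu.
\]
It follows that
\[
\overline{\int f \, d\mu} = \int \overline{f} \, d\mu.
\]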

We have now constructed the integral with respect to µ on the class of all (suitably
restricted) measurable functions f : X → K. The special cases of the integral with respect
to Lebesgue measure on Rd and Lebesgue-Stieltjes measures on Rd are important examples.
Here is another example:
3.1.2 Example. Let $x \in X$ and let $\delta_x$ be the Dirac measure defined in 1.3.3(d). Then
\[
\int f \, d\delta_x = f(x) \tag{3.2}
\]
for every $\mathcal{F}$-measurable function $f : X \to \mathbb{K}$. Indeed, this clearly holds for indicator functions
$f$, and, by 3.1.1(a) and (b), it holds for nonnegative $\mathcal{F}$-simple functions. If $f \ge 0$, then
\[
\int f \, d\delta_x = \sup \{ f_s(x) : 0 \le f_s \le f, \ f_s \text{ simple} \} = f(x),
\]
the last equality by 2.3.1. For the general real-valued case, use the positive and negative
parts of f . For the complex case, consider the real and imaginary parts of f . ♦

Integral over a Measurable Set


Let $f : X \to \mathbb{C}$ be measurable and $E \in \mathcal{F}$. The integral of $f$ on $E$ is defined by
\[
\int_E f \, d\mu := \int f \cdot 1_E \, d\mu
\]
if the right side is defined. If $f \cdot 1_E$ is integrable, then $f$ is said to be integrable on $E$.

3.1.3 Remark. It is worth noting that the integral of $f$ on $E$ is simply the integral of $f|_E$
with respect to the measure $\nu := \mu|_{\mathcal{F} \cap E}$ on $E$, that is,
\[
\int_E f \, d\mu = \int f|_E \, d\nu. \tag{3.3}
\]
To see this, first take $f = 1_A$, $A \in \mathcal{F}$. Then the left side of (3.3) is simply $\mu(A \cap E)$, and since
$1_A|_E$ is the indicator function of $E \cap A$ on the domain $E$, the right side is $\nu(A \cap E) = \mu(A \cap E)$.
Thus (3.3) holds for indicator functions, hence for nonnegative $\mathcal{F}$-simple functions. Taking
suprema over integrals of simple functions shows that the equation holds for nonnegative
measurable functions, hence for arbitrary measurable $\mathbb{R}$-valued functions via $f = f^+ - f^-$,
and finally for measurable $\mathbb{C}$-valued functions using $f = \operatorname{Re} f + i \operatorname{Im} f$. ♦
The preceding remark implies that general properties of integrals $\int f \, d\mu$ are immediately
valid for $\int_E f \, d\mu$; no special argument is necessary.

3.2 Basic Properties of the Integral


Almost Everywhere Properties
3.2.1 Proposition. If $f, g : X \to \mathbb{R}$ are measurable, $f \le g$ a.e., and $\int f \, d\mu$, $\int g \, d\mu$ are
defined, then $\int f \, d\mu \le \int g \, d\mu$. In particular, if $f \ge 0$ and $g$ is integrable, then $f$ is integrable.
Proof. Assume first that $f, g \ge 0$. Let $f_s$ be simple with $0 \le f_s \le f$ and set $g_s := 1_E f_s$,
where $E := \{f \le g\}$. Then $g_s$ is simple, $f_s = g_s$ a.e., and $0 \le g_s \le 1_E f \le 1_E g \le g$. By
3.1.1, $\int f_s \, d\mu = \int g_s \, d\mu \le \int g \, d\mu$. Since $f_s$ was arbitrary, $\int f \, d\mu \le \int g \, d\mu$.
In the general case, $f^+ \le g^+$ and $f^- \ge g^-$ a.e., hence, by the first paragraph,
\[
\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu \le \int g^+ \, d\mu - \int g^- \, d\mu = \int g \, d\mu.
\]

Applying the proposition to the real and imaginary parts of $f$ and $g$, we have
3.2.2 Corollary. Let $f, g : X \to \mathbb{K}$ be measurable, $f$ integrable, and $f = g$ a.e. Then $g$ is
integrable and $\int f \, d\mu = \int g \, d\mu$.
3.2.3 Remark. In view of the last corollary, it makes sense to integrate functions that are
defined only a.e., that is, defined on a set $E$ with $\mu(E^c) = 0$. More precisely, such a function
$h$ is said to be integrable if it has an integrable extension $f$ to $X$, in which case we define
$\int h \, d\mu := \int f \, d\mu$. By the corollary, the integral is well-defined, that is, does not depend on
the particular extension of $h$. ♦
3.2.4 Proposition. If $f : X \to \overline{\mathbb{R}}$ is integrable, then $f$ is finite a.e.
Proof. Suppose first that $f \ge 0$. Let $A := \{f = \infty\}$. Since $n 1_A \le f$, $\mu(A) \le n^{-1} \int f \, d\mu$.
Letting $n \to \infty$ shows that $\mu(A) = 0$. In the general case, apply the preceding to $f^+$ and $f^-$
to conclude that $\mu(|f| = \infty) = 0$.
3.2.5 Proposition. Let $f \ge 0$ be measurable. Then $\int f \, d\mu = 0$ iff $f = 0$ a.e.
Proof. The sufficiency follows from 3.2.2. For the necessity, suppose that $\int f \, d\mu = 0$ and
let $B := \{f > 0\}$ and $B_n := \{f \ge 1/n\}$. Then $B = \bigcup_{n=1}^{\infty} B_n$ and $n^{-1} 1_{B_n} \le f 1_{B_n} \le f$, so
$\mu(B_n) \le n \int f \, d\mu = 0$ for all $n$. By countable subadditivity, $\mu(B) = 0$, that is, $f \le 0$ a.e.
Since $f \ge 0$, this means $f = 0$ a.e.
By 3.2.1, if $f \ge 0$ then $\int_A f \, d\mu \ge 0$ for all $A \in \mathcal{F}$. Here is a converse:
3.2.6 Proposition. Let $f : X \to \mathbb{R}$ be measurable and let $\int_A f \, d\mu$ be defined for all $A \in \mathcal{F}$.
(a) If $\int_A f \, d\mu \ge 0$ for all $A \in \mathcal{F}$, then $f \ge 0$ a.e.
(b) If $\int_A f \, d\mu = 0$ for all $A \in \mathcal{F}$, then $f = 0$ a.e.
Proof. Part (b) follows from part (a). To prove (a), let $A_n = \{f \le -n^{-1}\}$ and $A = \{f < 0\}$.
Then $A = \bigcup_{n=1}^{\infty} A_n$ and $1_{A_n} \le -n f 1_{A_n}$, hence $\mu(A_n) \le -n \int_{A_n} f \, d\mu$. Since $\int_{A_n} f \ge 0$,
$\mu(A_n) = 0$ and so $\mu(A) = 0$.

Monotone Convergence Theorem


The following result is one of the key theorems in integration theory, underlying many of
the deeper properties of the integral. A generalization is given in §3.4.
3.2.7 Theorem. If $(f_n)$ is a sequence of nonnegative measurable functions such that $f_n \uparrow f$
on $X$, then
\[
\int f \, d\mu = \lim_n \int f_n \, d\mu.
\]
Proof. By 2.2.6, $f$ is measurable. Moreover, $\int f_n \, d\mu \le \int f_{n+1} \, d\mu \le \int f \, d\mu$ for all $n$, hence
$L := \lim \int f_n \, d\mu$ exists in $\overline{\mathbb{R}}$ and $L \le \int f \, d\mu$. For the reverse inequality, it suffices to show that
$\int g \, d\mu \le L$ for any simple function $g$ with $0 \le g \le f$. Let $g$ have the standard representation
$\sum_{j=1}^{m} a_j 1_{A_j}$ and set $E_n := \{f_n \ge r g\}$, where $0 < r < 1$. Then
\[
f_n \ge r g 1_{E_n} = r \sum_{j=1}^{m} a_j 1_{E_n \cap A_j}
\]
and so
\[
\int f_n \, d\mu \ge r \sum_{j=1}^{m} a_j \mu(E_n \cap A_j).
\]
Letting $n \to \infty$ in the last inequality and noting that $E_n \uparrow X$, we obtain
\[
L \ge r \sum_{j=1}^{m} a_j \mu(A_j) = r \int g \, d\mu.
\]
Finally, letting $r \uparrow 1$ we see that $L \ge \int g \, d\mu$.
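
The theorem can be observed numerically in a simple discrete setting. The Python sketch below is an illustration under assumptions that are not in the text: the measure space is $\mathbb{N}$ with the weights $p_k = 2^{-k}$ (chosen here only for convenience), integrals are approximated by truncating the resulting series after 200 terms, and the names `integral_wrt_series_measure`, `f`, and `f_n` are hypothetical. With $f(k) = k$ and $f_n = \min(f, n)$ we have $0 \le f_n \uparrow f$, and a direct computation gives $\int f_n \, d\mu = 2 - 2^{1-n}$ and $\int f \, d\mu = 2$.

```python
def integral_wrt_series_measure(g, terms=200):
    """Approximate sum_{k>=1} g(k) * 2**(-k) by its first `terms` terms."""
    return sum(g(k) * 2.0 ** (-k) for k in range(1, terms + 1))


def f(k):
    return k


def f_n(n):
    """The truncation min(f, n); these functions increase pointwise to f."""
    return lambda k: min(k, n)


integrals = [integral_wrt_series_measure(f_n(n)) for n in range(1, 31)]
limit = integral_wrt_series_measure(f)

print(integrals[:5])      # increasing: close to 2 - 2**(1 - n)
print(integrals[-1], limit)
```

The printed integrals increase with $n$ and approach the integral of the limit function, as the monotone convergence theorem predicts.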

Linearity of the Integral


To simplify the development in this subsection, we divide the verification of linearity into
three parts. The first part treats the nonnegative case; the second and third parts treat the
real and complex cases, respectively.
3.2.8 Theorem. Let $f, g : X \to [0, \infty]$ be measurable and $a, b \in \mathbb{R}^+$. Then
\[
\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu.
\]
In particular, if $f$ and $g$ are integrable then so is $af + bg$.
Proof. Choose sequences $(f_n)$ and $(g_n)$ of nonnegative simple functions such that $f_n \uparrow f$
and $g_n \uparrow g$. Then $a f_n + b g_n \uparrow af + bg$, hence, by 3.2.7 and 3.1.1,
\[
\int (af + bg) = \lim_n \int (a f_n + b g_n) = a \lim_n \int f_n + b \lim_n \int g_n = a \int f + b \int g.
\]
3.2.9 Corollary. If $g_n$ is measurable and nonnegative for every $n$, then
\[
\int \Big( \sum_{n=1}^{\infty} g_n \Big) d\mu = \sum_{n=1}^{\infty} \int g_n \, d\mu.
\]
Proof. Let $f_n = \sum_{j=1}^{n} g_j$ and $f = \sum_{n=1}^{\infty} g_n$. Then $0 \le f_n \uparrow f$, so by the monotone
convergence theorem and linearity,
\[
\int f = \lim_n \int f_n = \lim_n \sum_{j=1}^{n} \int g_j = \sum_{j=1}^{\infty} \int g_j.
\]

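3.2.10 Corollary. Let $h \ge 0$ be measurable. Define a set function $\nu$ on $\mathcal{F}$ by
\[
\nu(E) := \int_E h \, d\mu, \quad E \in \mathcal{F}.
\]
Then $\nu$ is a measure on $\mathcal{F}$.
Proof. For countable additivity, let $(E_n)$ be a disjoint sequence in $\mathcal{F}$ and apply 3.2.9 to $g_n = 1_{E_n} \cdot h$.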
3.2.11 Corollary. Let f, g : X → K be measurable.

(a) f is integrable iff |f | is integrable.

(b) If f is integrable and |g| ≤ |f |, then g is integrable.

(c) If f is integrable and E ∈ F, then f is integrable on E.

Proof. (a) Suppose first that $f$ is $\mathbb{R}$-valued. If $f$ is integrable, then, by definition, $f^+$ and
$f^-$ are integrable, hence by the theorem $|f| = f^+ + f^-$ is integrable. Conversely, if $|f|$ is
integrable, then $0 \le \int f^{\pm} \, d\mu \le \int |f| \, d\mu$, hence $f^+$ and $f^-$ are integrable.
Now let f be C-valued. If f is integrable then by definition Re f and Im f are integrable.
By the first paragraph, |Re f | and |Im f | are integrable, hence, by the theorem |Re f | + |Im f |
is integrable. Since |f | ≤ |Re f | + |Im f |, |f | is integrable. This proves the necessity of (a). A
similar argument shows that if |f | is integrable, then Re f and Im f are integrable, verifying
the sufficiency.
(b) By part (a), |f | is integrable. The inequality |g| ≤ |f | then implies that |g| is integrable.
By (a) again, g is integrable.
(c) This follows from (b), since |f 1E | ≤ |f |.
We may now prove linearity for the real-valued case:
3.2.12 Theorem. Let $f, g : X \to \mathbb{R}$ be measurable, $g$ integrable, and $a, b \in \mathbb{R}$. If $\int f \, d\mu$
exists, then $\int (af + bg) \, d\mu$ exists and
\[
\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu.
\]
Proof. Suppose first that $f$ is integrable. The identity
\[
(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+
\]
and 3.2.8 imply that
\[
\int (f + g)^+ + \int f^- + \int g^- = \int (f + g)^- + \int f^+ + \int g^+.
\]
Since these terms are finite we may rearrange them to obtain
\[
\int (f + g) = \int (f + g)^+ - \int (f + g)^- = \int f^+ - \int f^- + \int g^+ - \int g^- = \int f + \int g,
\]
proving additivity.
If $a \ge 0$, then $(af)^+ = a f^+$ and $(af)^- = a f^-$, hence, by 3.2.8,
\[
\int af = \int (af)^+ - \int (af)^- = a \int f^+ - a \int f^- = a \int f.
\]
Also, since $(-f)^+ = f^-$ and $(-f)^- = f^+$,
\[
\int (-f) = \int (-f)^+ - \int (-f)^- = \int f^- - \int f^+ = -\int f.
\]
Therefore, if $a < 0$,
\[
\int af = \int (-a)(-f) = -a \int (-f) = a \int f.
\]
This proves linearity if both $f$ and $g$ are integrable.
Now suppose that $f$ is not integrable but that the integral of $f$ exists. There are two
possibilities:
(i) $\int f^- < \infty$ and $\int f^+ = \infty$.
(ii) $\int f^+ < \infty$ and $\int f^- = \infty$.
Suppose (i) holds. Since
\[
\int (f + g)^- \le \int (f^- + g^-) < \infty,
\]
$\int (f + g)$ is defined. If $\int (f + g)^+ < \infty$, then $(f + g)$ would be integrable, hence, by the first
part of the proof, so would $f^+ = (f + g) + f^- - g$, contrary to our assumption. Therefore,
\[
\int (f + g) = \infty = \int f + \int g.
\]
Also, if $a > 0$ ($a < 0$), then $\int af$ and $a \int f$ both equal $\infty$ ($-\infty$), hence $\int af = a \int f$. This
proves linearity in case (i). Case (ii) is similar (or apply case (i) to $-f$).
Linearity in the complex case follows from the preceding theorem by considering real and
imaginary parts of f :
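3.2.13 Theorem. Let $f, g : X \to \mathbb{C}$ be integrable and let $\alpha, \beta \in \mathbb{C}$. Then $\alpha f + \beta g$ is
integrable and
\[
\int (\alpha f + \beta g) \, d\mu = \alpha \int f \, d\mu + \beta \int g \, d\mu.
\]
Proof. By 3.2.8, $|\alpha| |f| + |\beta| |g|$ is integrable. Since $|\alpha f + \beta g| \le |\alpha| |f| + |\beta| |g|$, by 3.2.11
$\alpha f + \beta g$ is integrable. Now let $\alpha = a + i b$ and set $f_r = \operatorname{Re} f$ and $f_i = \operatorname{Im} f$. Then
\[
\alpha f = a f_r - b f_i + i [b f_r + a f_i],
\]
hence, by 3.2.12 and the definition of the complex integral,
\[
\int \alpha f = \Big( a \int f_r - b \int f_i \Big) + i \Big( b \int f_r + a \int f_i \Big) = (a + i b) \Big( \int f_r + i \int f_i \Big) = \alpha \int f.
\]
A similar argument, using 3.2.12 again, shows that $\int (f + g) = \int f + \int g$.
3.2.14 Corollary. If $f : X \to \mathbb{C}$ is integrable, then $\big| \int f \, d\mu \big| \le \int |f| \, d\mu$.
Proof. Write $\int f \, d\mu$ in polar form $e^{i\theta} \big| \int f \, d\mu \big|$, so that
\[
\Big| \int f \, d\mu \Big| = e^{-i\theta} \int f \, d\mu = \int e^{-i\theta} f \, d\mu = \int \operatorname{Re}(e^{-i\theta} f) \, d\mu \le \int |f| \, d\mu,
\]
the third equality because $\int e^{-i\theta} f \, d\mu$ is real.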

Integration Against an Image Measure


Let $(Y, \mathcal{G})$ be a measurable space and $T : (X, \mathcal{F}) \to (Y, \mathcal{G})$ a measurable transformation.
The image of $\mu$ under $T$ is the measure $T(\mu)$ on $(Y, \mathcal{G})$ defined by
\[
T(\mu)(E) = \mu\big(T^{-1}(E)\big), \quad E \in \mathcal{G}.
\]
Image measures occur frequently in probability theory as distributions of random variables
(see §18.1).
3.2.15 Theorem. Let $g : Y \to \mathbb{K}$ be $\mathcal{G}$-measurable. Then
\[
\int_Y g \, dT(\mu) = \int_X g \circ T \, d\mu \tag{3.4}
\]
in the sense that if one side is defined, then so is the other and equality holds.
Proof. Since $1_{T^{-1}(A)} = 1_A \circ T$, (3.4) holds for indicator functions $g$, hence by linearity
for simple functions. Taking a sequence of nonnegative simple functions increasing to $g$
and applying the monotone convergence theorem yields (3.4) for nonnegative measurable
functions $g$. The general case follows by standard arguments.
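
For a measure with finitely many point masses, formula (3.4) reduces to a regrouping of a finite sum, which can be verified exactly. The Python sketch below is an illustration only; the encoding of a measure as a dictionary `{point: mass}`, the helper names `integrate` and `image_measure`, and the particular choices of `mu`, `T`, and `g` are all hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def integrate(g, measure):
    """Integral of g against a finitely supported measure given as {point: mass}."""
    return sum(g(x) * m for x, m in measure.items())


def image_measure(T, measure):
    """T(mu): the mass at y is mu(T^{-1}({y}))."""
    pushed = defaultdict(Fraction)
    for x, m in measure.items():
        pushed[T(x)] += m
    return dict(pushed)


mu = {-2: Fraction(1, 4), -1: Fraction(1, 4), 1: Fraction(1, 3), 2: Fraction(1, 6)}
T = lambda x: x * x          # not injective: -2 and 2 are both sent to 4
g = lambda y: 3 * y + 1

lhs = integrate(g, image_measure(T, mu))    # integral of g with respect to T(mu)
rhs = integrate(lambda x: g(T(x)), mu)      # integral of g o T with respect to mu
print(image_measure(T, mu), lhs, rhs)
```

Both sides evaluate to the same rational number, and the image measure collects the masses of all points with the same image under $T$.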
Applying the theorem to the transformations $x \mapsto x + y$ and $x \mapsto rx$ on $\mathbb{R}^d$, we have
3.2.16 Corollary. The following are valid in the sense that if one side of an equation is
defined, then so is the other and equality holds.
\[
\int f(x + y) \, d\lambda^d(x) = \int f(x) \, d\lambda^d(x) \quad \text{and} \quad \int f(rx) \, d\lambda^d(x) = |r|^{-d} \int f(x) \, d\lambda^d(x). \tag{3.5}
\]
Properties (3.5) express, respectively, the translation invariance and dilation properties
of the Lebesgue integral. The special case $r = -1$ gives the reflection invariance of
the integral.

Integration Against a Measure with Density


Let $h$ be a nonnegative $\mathcal{F}$-measurable function. The measure with density $h$ with
respect to $\mu$ is defined by
\[
(h\mu)(E) := \int_E h \, d\mu, \quad E \in \mathcal{F}.
\]
(See 3.2.10.) We also express this by writing $d(h\mu) = h \, d\mu$. Densities arise as Radon-Nikodym
derivatives and in particular as conditional expectations in probability theory. The proof of
the following theorem is similar to that of 3.2.15. The details are left to the reader (Ex. 3.15).
3.2.17 Theorem. Let $f$ be $\mathcal{F}$-measurable. Then
\[
\int f \, d(h\mu) = \int (f \cdot h) \, d\mu
\]
in the sense that if one side is defined, then so is the other and equality holds.
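
On a finite set the statement of 3.2.17 is a one-line identity between finite sums. The following Python sketch checks it with exact rational arithmetic; the point masses `mu`, the density `h`, the integrand `f`, and the helper name `integrate` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical point masses, density, and integrand on X = {0, 1, 2}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
h = {0: Fraction(2), 1: Fraction(0), 2: Fraction(3)}       # nonnegative density
f = {0: Fraction(5), 1: Fraction(-7), 2: Fraction(1, 2)}


def integrate(func, measure):
    """Integral of func against a measure given by point masses."""
    return sum(func[x] * measure[x] for x in measure)


# The measure h*mu assigns the mass h(x) * mu({x}) to the point x.
h_mu = {x: h[x] * mu[x] for x in mu}

lhs = integrate(f, h_mu)                                   # integral of f d(h mu)
rhs = integrate({x: f[x] * h[x] for x in mu}, mu)          # integral of f*h d mu
print(lhs, rhs)
```

The point where the density vanishes carries no mass under $h\mu$, so the value of $f$ there does not affect either side.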
Note that the Dirac measure $\delta_x$ on $\mathbb{R}$ (1.3.3(d)) has no density with respect to $\lambda$. Nevertheless,
it is customary in physics and elsewhere to write
\[
f(x) = \int_{-\infty}^{\infty} f(y) \delta(y - x) \, dy = \int_{-\infty}^{\infty} f(x + y) \delta(y) \, dy
\]
for a symbolic density function $\delta(\cdot)$, the so-called Dirac delta function. This interpretation
can be made rigorous using distribution theory. (See §15.1.)

Change of Variables Theorem


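For the next theorem, recall that a vector function $\varphi = (\varphi_1, \ldots, \varphi_d)$ on an open subset $U$
of $\mathbb{R}^d$ is said to be $C^1$ if the components $\varphi_i$ have continuous first partial derivatives on $U$.
The derivative $\varphi'(x)$ of $\varphi$ at $x$ is the $d \times d$ matrix with $(i, j)$-entry $\partial_j \varphi_i$. The determinant
of this matrix is called the Jacobian of $\varphi$ at $x$.
3.2.18 Theorem. Let $U$ and $V$ be open subsets of $\mathbb{R}^d$ and let $\varphi : U \to V$ be $C^1$ on $U$ with
$C^1$ inverse $\varphi^{-1} : V \to U$. If $f$ is Lebesgue measurable on $V$, then
\[
\int_V f(y) \, d\lambda^d(y) = \int_U (f \circ \varphi)(x) \, |\det \varphi'(x)| \, d\lambda^d(x) \tag{3.6}
\]
in the sense that if one side is defined, then so is the other and then equality holds.
A proof of the theorem is given in Appendix A. Note that for all Lebesgue measurable
functions $f \ge 0$ on $V$,
\[
\begin{aligned}
\int_V f \, d\lambda^d &= \int_U (f \circ \varphi) \cdot |\det \varphi'| \, d\lambda^d = \int_U (f \circ \varphi) \cdot \big( |\det \varphi'| \circ \varphi^{-1} \circ \varphi \big) \, d\lambda^d \\
&= \int_V f \cdot \big( |\det \varphi'| \circ \varphi^{-1} \big) \, d\varphi(\lambda^d).
\end{aligned}
\]
Replacing $f$ by $f \cdot \big( |\det \varphi'| \circ \varphi^{-1} \big)^{-1}$ we have
\[
\int_V f \cdot \big( |\det \varphi'| \circ \varphi^{-1} \big)^{-1} \, d\lambda^d = \int_V f \, d\varphi(\lambda^d),
\]
hence
\[
d\varphi(\lambda^d) = \frac{d\lambda^d}{|\det \varphi'| \circ \varphi^{-1}}, \tag{3.7}
\]
which combines the notions of density and image measure.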
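
A one-dimensional instance of (3.6) can be checked numerically. The Python sketch below is an illustration under assumptions not taken from the text: $U = (1, 2)$, $V = (1, 4)$, $\varphi(x) = x^2$ (which is $C^1$ with $C^1$ inverse $y \mapsto \sqrt{y}$ on $V$), and $f(y) = 1/y$. Both integrals are approximated by midpoint sums, and the helper name `midpoint_integral` is hypothetical. The exact common value is $\ln 4$.

```python
import math


def midpoint_integral(func, a, b, n=20000):
    """Midpoint Riemann sum of func over (a, b) with n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


phi = lambda x: x * x          # maps U = (1, 2) onto V = (1, 4)
dphi = lambda x: 2 * x         # the 1 x 1 derivative matrix; its determinant is 2x
f = lambda y: 1 / y

left = midpoint_integral(f, 1, 4)                                   # integral of f over V
right = midpoint_integral(lambda x: f(phi(x)) * abs(dphi(x)), 1, 2)  # right side of (3.6)

print(left, right, math.log(4))
```

The two approximations agree with each other and with $\ln 4$ up to the discretization error of the midpoint rule.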

Exercises
3.1 Let $f \in L^1(\mathbb{R})$ be positive and $a \ne 0 \in \mathbb{R}$. Prove that
\[
\Big| \int e^{iat} f(t) \, dt \Big| < \int f(t) \, dt.
\]

3.2 Let $\mu$ be the infinite series measure of 1.3.3(h). Prove that the equation
\[
\int f \, d\mu = \sum_{k=1}^{\infty} f(k) p_k
\]
holds for any function $f : \mathbb{N} \to \mathbb{C}$ in the sense that if one side is defined, then so is the other
and equality holds. What is the significance of the case $p_k \equiv 1$?

3.3 [↑ 1.40] Let $(X, \mathcal{F})$ be a measurable space and $\{\mu_n\}$ a sequence of measures such that $\mu_n(A)$
is a nondecreasing sequence for each $A \in \mathcal{F}$. Then $\mu(A) = \lim_n \mu_n(A)$ defines a measure on
$\mathcal{F}$. Prove that if $f$ is a nonnegative $\mu$-integrable Borel measurable function on $X$ then $f$ is $\mu_n$-integrable
for every $n$ and $\int f \, d\mu = \lim_n \int f \, d\mu_n$.

3.4 Let $f, g : X \to \mathbb{R}$ be integrable. Prove:
(a) $\int_A f \, d\mu \ge \int_A g \, d\mu$ $\forall\, A \in \mathcal{F}$ $\Rightarrow$ $f \ge g$ a.e.
(b) $\int_A f \, d\mu = \int_A g \, d\mu$ $\forall\, A \in \mathcal{F}$ $\Rightarrow$ $f = g$ a.e.

3.5 Let $X$ be uncountable and let $\mathcal{F}$ be the σ-field consisting of the countable and cocountable
subsets of $X$. Let $\mu$ be the probability measure on $\mathcal{F}$ that assigns 0 to countable sets and 1 to
cocountable sets. By 2.33, an $\mathcal{F}$-measurable function $f$ is constant on some cocountable set.
Show that the constant is unique and equals $\int f \, d\mu$.

3.6 Let $f : X \to \mathbb{C}$ be measurable with countable range $\{a_1, a_2, \ldots\}$. Set $A_n := \{f = a_n\}$. Prove
that $f$ is integrable iff the series $\sum_{n=1}^{\infty} a_n \mu(A_n)$ converges absolutely, in which case the value
of the series equals $\int f \, d\mu$.

3.7 Let $f(x) := \lfloor x^{-1} \rfloor$, $0 < x \le 1$. Find
(a) $\int_{(0,1)} [1 + (-1)^{f(x)}] \, d\lambda(x)$.
(b) $\int_{(0,1)} f^p(x) \, d\lambda(x)$, $0 < p < 1$.

3.8 [↑ 2.2.1] Let $d_n(x)$ denote the $n$th digit in the decimal expansion of a number $x \in [0, 1)$. Find
$\int_{[0,1]} d_n(x) \, d\lambda(x)$.

3.9 Let $a_n \in \mathbb{C}$, $n = 1, \ldots, 9$. Define $f : [0, 1] \to \mathbb{C}$ by $f(x) = 0$ if $x$ is rational and $f(x) = a_n$ if
$x$ is irrational, where $n$ is the first nonzero digit in the decimal expansion of $x$. (Assume for
definiteness that repeated 9's are not allowed.) Prove that $f$ is Borel measurable and calculate
$\int_{[0,1]} f \, d\lambda$.

3.10 Let $a_n \in \mathbb{C}$, $n = 1, 2, \ldots$ and define $f : [0, 1) \to \mathbb{C}$ by $f(x) = 0$ if $x$ is rational and $f(x) = a_n$ if
$x$ is irrational and the first nonzero digit in the decimal expansion of $x$ occurs at the $n$th place.
(Assume for definiteness that repeated 9's are not allowed.) Prove that $f$ is Borel measurable
and calculate $\int_{[0,1)} f \, d\lambda$ whenever it is defined. In particular, show that if $a_n = 2^n$ then the
integral is 9/4.

3.11 Let $f$ be Lebesgue integrable on $\mathbb{R}^d$ and $\int |f| > 0$. Prove that the series $\sum_{n=1}^{\infty} n^{-p} f(nx)$
converges absolutely a.e. on $\mathbb{R}^d$ iff $p > 1 - d$.

3.12 Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathbb{R}$ such that for all integrable functions $f$,
\[
\int f(x + y) \, d\mu(x) = 2^y \int f(x) \, d\mu(x), \quad y \in \mathbb{R}.
\]
Find the distribution function for $\mu$.

3.13 [↓ 4.3.4] Let $f \ge 0$ be $\mu$-integrable. Prove that for each $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
E \in \mathcal{F} \text{ and } \mu(E) < \delta \ \Rightarrow \ \int_E f \, d\mu < \varepsilon.
\]
Conclude that if $\{E_n\}$ is a sequence in $\mathcal{F}$ with $\mu(E_n) \to 0$, then $\int_{E_n} f \, d\mu \to 0$. ⟦Begin with
simple functions.⟧

3.14 Let $f, g : X \to (0, \infty)$ be measurable. Prove:
\[
\int (1 + k^{-1} f)^k \, g \exp(-f) \, d\mu \to \int g \, d\mu.
\]

3.15 Prove 3.2.17.

3.16 Let $d\mu = h \, d\nu$, where $h$ is positive, finite and measurable. Show that $d\nu = h^{-1} \, d\mu$.

3.17 [↓ 4.3.4] Let $f : X \to \mathbb{K}$ be $\mu$-integrable and $\varepsilon > 0$. Prove:
(a) The set $A = \{|f| \ge \varepsilon\}$ has finite measure.
(b) There exists $B \in \mathcal{F}$ with $\mu(B) < \infty$ such that $\big| \int f \, d\mu - \int_B f \, d\mu \big| < \varepsilon$.

3.18 Let $\mu$ be a probability measure, $f$ positive and integrable, and $g_n$ measurable such that
$0 \le g_n \le C$ for all $n$. Prove that $\int f g_n \, d\mu \to 0$ iff $\int g_n \, d\mu \to 0$. ⟦Choose $a$ and $b$ so that
$\mu\{f < a\}$ and $\mu\{f > b\}$ are small.⟧

3.19 [↑ 3.13] Let $f_n, f : X \to \mathbb{C}$ be integrable and $E_n, E \in \mathcal{F}$ such that $\lim_n \int |f_n - f| \, d\mu = 0$ and
$\lim_n \mu(E_n \,\triangle\, E) = 0$. Prove that
\[
\lim_n \int_{E_n} f_n \, d\mu = \int_E f \, d\mu.
\]

3.20 Let $f_1, f_2, \ldots, f_n$ be positive and measurable and set $f := f_1 \vee \cdots \vee f_n$. Prove:
\[
\int_{\{f > t\}} f \, d\mu \le \sum_{j=1}^{n} \int_{\{f_j > t\}} f_j \, d\mu, \quad t > 0.
\]
⟦Consider $B_j := \{f = f_j\} \cap \{f_j > t\}$.⟧

3.21 Let $f$ be measurable, $\varepsilon > 0$, and $p > 0$. Prove that
\[
\mu\{|f| \ge \varepsilon\} \le \frac{1}{\varepsilon^p} \int |f|^p \, d\mu.
\]

3.22 Let $(X, \mathcal{F}, \mu)$ be a finite measure space and let $L^0 = L^0(X, \mathcal{F}, \mu)$ denote the linear space of all
measurable functions $f : X \to \mathbb{K}$. Show that
\[
d(f, g) = \int \frac{|f - g|}{1 + |f - g|} \, d\mu, \quad f, g \in L^0,
\]
defines a metric on $L^0$, where we identify functions equal a.e. Show also that convergence in
this metric is convergence in measure.

3.23 Let $f : X \to \mathbb{R}$ be $\mu$-integrable. Prove that $a \le f \le b$ a.e. iff $\dfrac{1}{\mu(A)} \int_A f \, d\mu \in [a, b]$ for all $A \in \mathcal{F}$
with $0 < \mu(A) < \infty$.

3.24 [↑ 3.21] Let $f$ be $\mu$-integrable. Show that the set $\{f \ne 0\}$ is σ-finite.

3.25 Show that (X, F, µ) is σ-finite iff there exists a positive integrable function f on X.

3.26 Let $I$ be an arbitrary index set. For $i \in I$ and $a_i \in [0, \infty]$, define the extended real number
\[
\sum_{i \in I} a_i := \sup \Big\{ \sum_{j \in F} a_j : F \subseteq I, \ F \text{ finite} \Big\}.
\]
(a) Show that there exists a sequence $\{i_n\}$ in $I$ such that $\sum_{i \in I} a_i = \sum_{n=1}^{\infty} a_{i_n}$.

(b) Let $(X_i, \mathcal{F}_i, \mu_i)$, $i \in I$, be a family of measure spaces, where the sets $X_i$ are disjoint. The
direct sum of these measure spaces is the triple $(X, \mathcal{F}, \mu)$, where
\[
X := \bigcup_{i \in I} X_i, \quad \mathcal{F} := \{E \subseteq X : E \cap X_i \in \mathcal{F}_i \ \forall\, i \in I\}, \quad \mu(E) := \sum_{i \in I} \mu_i(E \cap X_i).
\]
Verify that $(X, \mathcal{F}, \mu)$ is a measure space.

(c) Show that for a nonnegative, $\mathcal{F}$-measurable function $f$,
\[
\int f \, d\mu = \sum_{i \in I} \int f|_{X_i} \, d\mu_i.
\]

(d) Show that a σ-finite measure space is a direct sum.

3.27 Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathcal{B}(\mathbb{R}^d)$ and let $f : \mathbb{R}^d \to \mathbb{C}$ be Borel measurable and
$\mu$-integrable such that $\int_I f \, d\mu = 0$ for all closed bounded intervals $I$. Prove that $f = 0$ a.e.
⟦Reduce to the real case and consider $f^{\pm}$. Use 3.2.10 and 1.8.1.⟧

3.28 (Weighted mean value theorem for integrals). Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathcal{B}(\mathbb{R}^d)$
and $E \subseteq \mathbb{R}^d$ compact and connected. Let $f, g : E \to \mathbb{R}$ with $g$ $\mu$-integrable and $f$ continuous. If
$g$ does not change sign on $E$, show that for some $c \in E$,
\[
\int_E f g \, d\mu = f(c) \int_E g \, d\mu.
\]

3.29 Let $f$ be measurable and $\mu(X) < \infty$. Set $A_n = \{|f| \ge n\}$. Prove that $f$ is integrable iff
$\sum_{n=1}^{\infty} \mu(A_n)$ converges, in which case $\lim_n n \mu(A_n) = 0$. ⟦Consider $B_n := \{n \le |f| < n + 1\}$.⟧

3.30 Let $f \ge 0$ be Lebesgue integrable on $[1, \infty)$. Prove that $\sum_{n=1}^{\infty} f(x + n)$ is integrable on $[0, 1]$.
Conclude that the series converges a.e. on $[1, \infty)$.

3.31 Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathbb{R}$ with distribution function $F$ and let $T : \mathbb{R} \to \mathbb{R}$ be
continuous and strictly increasing with $T(\mathbb{R}) = \mathbb{R}$. Find the distribution function of $T(\mu)$.

3.32 Let (Xi , Fi ), i = 1, 2, be measurable spaces and T : (X1 , F1 ) → (X2 , F2 ) measurable with
measurable inverse. Let µ be a measure on (X1 , F1 ) and let h ≥ 0 be F1 -measurable. Show that
T (hµ) = (h ◦ T −1 )T (µ).

3.33 Let $V$ be a linear subspace of $\mathbb{R}^d$ of dimension $m < d$. Use the change of variables theorem to
show that $\lambda^d(V) = 0$. ⟦Construct a suitable linear transformation.⟧

3.34 Let $X$ be a metric space and $\mu, \mu_1, \mu_2, \ldots$ finite measures on $\mathcal{B}(X)$ such that $\lim_n \int f \, d\mu_n =
\int f \, d\mu$ for all bounded continuous $f : X \to \mathbb{R}$. Carry out the following steps to show that
$\lim_n \mu_n(E) = \mu(E)$ for all $E \in \mathcal{B}(X)$ with $\mu(\operatorname{bd}(E)) = 0$.
(a) Show that for each open $U \subseteq X$, there exists a sequence of closed sets $C_n \uparrow U$.
(b) Referring to (a), show that there exist bounded continuous functions $f_k \uparrow 1_U$.
(c) Show that $\int f_k \, d\mu \le \underline{\lim}_n \mu_n(U)$ and hence $\mu(U) \le \underline{\lim}_n \mu_n(U)$.
(d) Apply (c) to $U = \operatorname{int}(E)$ and $U = X \setminus \operatorname{cl}(E)$ to obtain the desired conclusion.

3.3 Connections with the Riemann Integral on Rd


As noted in the introduction to Chapter 1, the Lebesgue integral has several distinct
advantages over the Riemann integral. First, (proper) Riemann integration takes place on
compact subintervals of Rd while no such restriction is placed on the Lebesgue integral.
Second, the class of functions that are Lebesgue integrable on compact intervals is much
larger than the class of Riemann integrable functions. Third, and perhaps most importantly,
the Lebesgue theory makes available powerful tools in the form of limit theorems such as the
monotone convergence theorem and the dominated convergence theorem, leading to many
important results in analysis and its applications. Nevertheless, the Riemann integral still
plays an important role in mathematics and the sciences and as such is worthy of discussion
here. In this section we give a brief description of the d-dimensional Riemann integral and
compare it to the Lebesgue integral.

The Darboux Integral


Let f be a bounded, real-valued function on a d-dimensional interval [a, b]. For each
j, let Pj be a partition of the jth coordinate interval [aj , bj ] of [a, b]. The collection of
subintervals [x, y] of [aj , bj ] produced by the partition will also be denoted by Pj . Points of
Pj in (aj , bj ) are called interior points of Pj . Taking Cartesian products of subintervals
of the partitions Pj produces what we shall loosely call a partition of [a, b] and denote by
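$\mathcal{P} = \mathcal{P}_1 \times \cdots \times \mathcal{P}_d$. For ease of notation, we set $|I| := \lambda^d(I)$ for intervals $I$. The lower and
upper (Darboux) sums of $f$ over $\mathcal{P}$ are defined, respectively, by
\[
\underline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} m_I |I|, \ \ m_I := \inf_{x \in I} f(x) \quad \text{and} \quad \overline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} M_I |I|, \ \ M_I := \sup_{x \in I} f(x).
\]
The lower and upper (Darboux) integrals of $f$ are defined, respectively, by
\[
\underline{\int_a^b} f := \sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad \text{and} \quad \overline{\int_a^b} f := \inf_{\mathcal{P}} \overline{S}(f, \mathcal{P}),
\]
where the supremum and infimum are taken over all partitions $\mathcal{P}$ of $[a, b]$. If the upper and
lower integrals are equal, then $f$ is said to be Darboux-integrable on $[a, b]$, the common
value of these integrals then being denoted by $\int_a^b f$.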
Rb
For a limit description of a f , we need the following notions: A refinement of P =
P1 × · · · × Pd is a partition Q = Q1 × · · · × Qd of [a, b] such that, as a sets of points, Qj ⊇ Pj
for each j. Every member I of P is then a union of members J of Q, and because boundaries
of intervals have Lebesgue measure zero,
\[
|I| = \sum_{J \in \mathcal{Q},\ J \subseteq I} |J|.
\]

The common refinement of partitions P and Q is the partition of [a, b] whose jth
coordinate partition consists of the points in Pj ∪ Qj . The following lemma shows that
taking refinements decreases the difference of upper and lower sums.
3.3.1 Lemma. If Q is a refinement of P, then

\[
\underline{S}(f, \mathcal{P}) \le \underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P}).
\]

Proof. The second inequality is clear, and the first inequality follows from the third by
considering −f . For the third inequality, we have
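\[
\overline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} M_I\,|I| = \sum_{I \in \mathcal{P}} \sum_{J \in \mathcal{Q},\, J \subseteq I} M_I\,|J| \ \ge\ \sum_{I \in \mathcal{P}} \sum_{J \in \mathcal{Q},\, J \subseteq I} M_J\,|J| = \overline{S}(f, \mathcal{Q}).
\]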

3.3.2 Lemma. For any partition P of [a, b], $\underline{S}(f, \mathcal{P}) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P})$.
Proof. The first and last inequalities are immediate from the definition of lower and upper
integrals. For the middle inequality, let P and Q be partitions of [a, b], and let R be a
refinement of both P and Q. By 3.3.1,

\[
\underline{S}(f, \mathcal{P}) \le \underline{S}(f, \mathcal{R}) \le \overline{S}(f, \mathcal{R}) \le \overline{S}(f, \mathcal{Q}).
\]

Taking the supremum over P and the infimum over Q yields the desired inequality.
3.3.3 Corollary. A bounded function f : [a, b] → R is Darboux integrable iff for each
ε > 0 there exists a partition P of [a, b] such that $\overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) < \varepsilon$.
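As a numerical illustration of the criterion in 3.3.3, the following sketch computes lower and upper Darboux sums in one dimension. It assumes an increasing integrand, so that the infimum and supremum on each cell are attained at the endpoints; the function name `darboux_sums`, the test function x², and the uniform partitions are illustrative choices, not part of the text.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of an INCREASING f on [a, b],
    using the uniform partition into n cells (an assumption of this sketch)."""
    h = (b - a) / n
    lower = upper = 0.0
    for i in range(n):
        x, y = a + i * h, a + (i + 1) * h
        lower += f(x) * h  # inf of an increasing f on [x, y] is f(x)
        upper += f(y) * h  # sup of an increasing f on [x, y] is f(y)
    return lower, upper


def square(x):
    return x * x


gaps = []
for n in (10, 100, 1000):
    lo, up = darboux_sums(square, 0.0, 1.0, n)
    gaps.append(up - lo)
    print(n, lo, up, up - lo)
```

For this integrand the gap between upper and lower sums telescopes to (f(1) − f(0))/n, so it can be made smaller than any ε by refining the partition.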

We may now describe the integral as a limit of Darboux sums. Given L ∈ R and a
real-valued function F (P) of partitions P of [a, b], we write

\[
L = \lim_{\mathcal{P}} F(\mathcal{P})
\]

if, given ε > 0, there exists a partition Pε such that |F (P) − L| < ε for all partitions P that
refine Pε . By applying standard techniques, one easily shows that such limits are unique
and have the usual combinatorial properties.1 Using this notion we can give the following
characterization of the Darboux integral. The proof is left as an exercise.
3.3.4 Theorem. A bounded function f : [a, b] → R is Darboux integrable iff the limits
$\lim_{\mathcal{P}} \underline{S}(f, \mathcal{P})$ and $\lim_{\mathcal{P}} \overline{S}(f, \mathcal{P})$ exist and are equal. In this case, their common value is $\int_a^b f$.
A more useful limit characterization of the Darboux integral may be given in terms of
the following. The mesh of a partition P is the value

\[
\|\mathcal{P}\| := \max\{y - x : [x, y] \in \mathcal{P}_j,\ 1 \le j \le d\}.
\]

The Darboux integral may be expressed as a limit of Darboux sums as $\|\mathcal{P}\| \to 0$. For this
we need the following technical lemma:
3.3.5 Lemma. Let $\mathcal{P}' = \mathcal{P}'_1 \times \cdots \times \mathcal{P}'_d$ be a partition of [a, b]. Then there exists a positive
constant C such that for all partitions P with $\|\mathcal{P}\|$ sufficiently small,

\[
\overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + C\,\|\mathcal{P}\|.
\]

Proof. Let $\mathcal{P}' = \mathcal{P}'_1 \times \cdots \times \mathcal{P}'_d$ and let $\mathcal{P} = \mathcal{P}_1 \times \cdots \times \mathcal{P}_d$ with $\|\mathcal{P}\|$ sufficiently small so that
each interval $I = I_1 \times \cdots \times I_d$ of $\mathcal{P}$ has the property that either some $I_j$ contains exactly
one interior point of $\mathcal{P}'_j$, or no $I_j$ contains such a point. Let $J_\alpha$ denote the d-dimensional
intervals of $\mathcal{P}$ of the former type and $J_\beta$ the intervals of the latter type. The construction
is illustrated in the figure, where [x, y] is a coordinate interval of several $J_\alpha$'s and z is an
interior point of $\mathcal{P}'_2$. Let N be the number of intervals of type $J_\alpha$ and note that N depends
only on $\mathcal{P}'$.

[FIGURE 3.1: The intervals of $\mathcal{P}$ (solid), $\mathcal{P}'$ (dotted), and $\mathcal{P}''$.]

Let $\mathcal{P}''$ denote the common refinement of $\mathcal{P}$ and $\mathcal{P}'$. An interval in $\mathcal{P}''$ is either a
$J_\beta$ or was formed from a $J_\alpha$. Denote intervals of the latter type by $J_\gamma$. Since the introduction
of a point into a jth coordinate interval of a $J_\alpha$ results in two jth coordinate intervals, each
$J_\alpha$ can produce at most $2^d$ $J_\gamma$'s. Thus the number of $J_\gamma$'s is at most $2^d N$. Since the terms
of $\overline{S}(f, \mathcal{P})$ and $\overline{S}(f, \mathcal{P}'')$ corresponding to intervals $J_\beta$ are identical, we have

\[
\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') = \overline{S}(f, \mathcal{P}) + \underline{S}(-f, \mathcal{P}'') = \sum_{\alpha} M_{J_\alpha}(f)\,|J_\alpha| + \sum_{\gamma} m_{J_\gamma}(-f)\,|J_\gamma|. \tag{†}
\]

Since the number of terms in the first sum in (†) is no more than N, we see that this sum
is majorized by $M N \|\mathcal{P}\|^d \le M N \|\mathcal{P}\|$, where $M = \|f\|_\infty$ and $\|\mathcal{P}\|$ is taken < 1. Similarly,
the second sum in (†) is majorized by $M N 2^d \|\mathcal{P}''\| \le M N 2^d \|\mathcal{P}\|$, the inequality following
from the fact that $\mathcal{P}''$ is a refinement of $\mathcal{P}$. Thus there exists a constant C depending
only on $\mathcal{P}'$ and f such that $\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') \le C\,\|\mathcal{P}\|$. Since $\mathcal{P}''$ is a refinement of $\mathcal{P}'$,
3.3.1 gives $\overline{S}(f, \mathcal{P}'') \le \overline{S}(f, \mathcal{P}')$, hence $\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}') \le C\,\|\mathcal{P}\|$.

1 If the set of partitions of [a, b] is partially ordered by refinement, then the described convergence is
simply convergence of the net $(F(\mathcal{P}))_{\mathcal{P}}$.

We may now prove the following complement to 3.3.4.
3.3.6 Theorem. A bounded function f : [a, b] → R is Darboux integrable iff the limits
$\lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P})$ and $\lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P})$ exist and are equal. In this case, their common value
is $\int_a^b f$.
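Proof. It suffices to prove that
\[
\overline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P}) \qquad\text{and}\qquad \underline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P}).
\]

Given ε > 0, choose a partition $\mathcal{P}'$ such that $\overline{S}(f, \mathcal{P}') < \overline{\int_a^b} f + \varepsilon$. In the notation of 3.3.5,
for all partitions $\mathcal{P}$ with sufficiently small mesh,
\[
\overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + C\,\|\mathcal{P}\| < \overline{\int_a^b} f + \varepsilon + C\,\|\mathcal{P}\|.
\]
Therefore $\bigl|\overline{S}(f, \mathcal{P}) - \overline{\int_a^b} f\bigr| < 2\varepsilon$ for all $\mathcal{P}$ with sufficiently small mesh. This establishes the
first limit. The second follows from the first by considering −f.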

The Riemann Integral


Let f be a bounded, real-valued function on a d-dimensional interval [a, b]. Let P be
a partition of [a, b], and for each interval I in P choose an arbitrary point ξ I ∈ I. Set
ξ := {ξ I : I ∈ P}. The pair (P, ξ) is called a tagged partition of [a, b] and the quantity
\[
S(f, \mathcal{P}, \xi) := \sum_{I \in \mathcal{P}} f(\xi_I)\,|I|
\]

is called a Riemann sum of f. A bounded function f : [a, b] → R is said to be Riemann
integrable iff the limit
\[
R(f) := \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)
\]
exists in the sense that, given ε > 0, there exists a δ > 0 such that

|S(f, P, ξ) − R(f)| < ε for all partitions P with ‖P‖ < δ and all choices of ξ.
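The ε-δ definition above requires the Riemann sums to be close to R(f) for every choice of tags once the mesh is small. A minimal sketch of this, assuming the illustrative integrand x² on [0, 1], uniform partitions, and randomly chosen tags (the helper name `riemann_sum` is hypothetical):

```python
import random


def riemann_sum(f, a, b, n, rng):
    """Riemann sum of f over the uniform partition of [a, b] into n cells,
    with one tag drawn at random from each cell."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * h
        tag = left + rng.random() * h  # arbitrary point of the i-th cell
        total += f(tag) * h
    return total


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [riemann_sum(lambda x: x * x, 0.0, 1.0, 1000, rng) for _ in range(5)]
for s in samples:
    print(s, abs(s - 1.0 / 3.0))
```

Each sum lies between the lower and upper Darboux sums of the same partition, so for mesh 1/1000 every sample is within 1/1000 of the integral 1/3, whatever tags are drawn.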

The connection between the Darboux and Riemann integrals is given in the following result.
3.3.7 Theorem. A bounded function f : [a, b] → R is Darboux integrable iff it is Riemann
integrable. In this case, $R(f) = \int_a^b f$.

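Proof. Since $\underline{S}(f, \mathcal{P}) \le S(f, \mathcal{P}, \xi) \le \overline{S}(f, \mathcal{P})$ for all ξ, the necessity follows from 3.3.6. For
the sufficiency, given ε > 0 choose a partition Pε such that

R(f) − ε < S(f, Pε, ξ) < R(f) + ε for all choices of ξ.

Since ξ is arbitrary, the approximation properties of suprema and infima imply that

\[
R(f) - \varepsilon \le \underline{S}(f, \mathcal{P}_\varepsilon) \le \overline{S}(f, \mathcal{P}_\varepsilon) \le R(f) + \varepsilon.
\]

Therefore, $\overline{S}(f, \mathcal{P}_\varepsilon) - \underline{S}(f, \mathcal{P}_\varepsilon) \le 2\varepsilon$, hence f is Darboux integrable by 3.3.3.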

Measure Zero Criterion for Riemann Integrability


Here is the connection between the Riemann integral and the Lebesgue integral on [a, b].
3.3.8 Theorem. Let f : [a, b] → R be bounded and let D be the set of discontinuities of f .
Then f is Riemann integrable iff λ(D) = 0. In this case, f is Lebesgue measurable and the
Riemann and Lebesgue integrals of f on [a, b] are equal.
Proof. We carry out the proof in the following steps:
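(1) There exists a sequence of partitions Pn of [a, b] such that Pn+1 is a refinement of Pn,
‖Pn‖ → 0, and the following hold:
\[
\lim_{n \to \infty} \underline{S}(f, \mathcal{P}_n) = \underline{\int_a^b} f \qquad\text{and}\qquad \lim_{n \to \infty} \overline{S}(f, \mathcal{P}_n) = \overline{\int_a^b} f.
\]

⟦By the approximation property of infima and suprema, for each n there exist partitions
$\mathcal{P}'_n$ and $\mathcal{P}''_n$ of [a, b] such that
\[
\underline{\int_a^b} f - \frac{1}{n} < \underline{S}(f, \mathcal{P}'_n) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}''_n) < \overline{\int_a^b} f + \frac{1}{n}.
\]

Since refinements decrease upper sums and increase lower sums, the inequalities still
hold if $\mathcal{P}'_n$ and $\mathcal{P}''_n$ are replaced by refinements. Now let $\mathcal{P}_1$ be a refinement of $\mathcal{P}'_1$ and $\mathcal{P}''_1$
with $\|\mathcal{P}_1\| < 1$, then let $\mathcal{P}_2$ be a refinement of $\mathcal{P}_1$, $\mathcal{P}'_2$, and $\mathcal{P}''_2$ with $\|\mathcal{P}_2\| < 1/2$, etc.⟧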

(2) Define Borel simple functions
\[
h_n = \sum_{I \in \mathcal{P}_n} m_I 1_I \qquad\text{and}\qquad g_n = \sum_{I \in \mathcal{P}_n} M_I 1_I.
\]
Then hn ↑ h, gn ↓ g, h and g are bounded Borel functions, h ≤ f ≤ g, and
\[
\underline{\int_a^b} f = \int_{[a,b]} h\,d\lambda \qquad\text{and}\qquad \overline{\int_a^b} f = \int_{[a,b]} g\,d\lambda. \tag{†}
\]

⟦Clearly, h1 ≤ h2 ≤ · · · ≤ f ≤ · · · ≤ g2 ≤ g1, hence h and g exist, are bounded, and
h ≤ f ≤ g. Moreover, as limits of Borel functions, h and g are themselves Borel functions.
Since $\underline{S}(f, \mathcal{P}_n) = \int_{[a,b]} h_n$ and $\overline{S}(f, \mathcal{P}_n) = \int_{[a,b]} g_n$, (†) follows from step (1) and the
monotone convergence theorem applied to the (nonnegative) differences h − hn and
gn − g.⟧

(3) f is Riemann integrable on [a, b] iff g = h a.e. In this case, f is Lebesgue measurable
and $\int_a^b f = \int_{[a,b]} f$.
⟦From (†), f is Riemann integrable iff $\int_{[a,b]} (g - h) = 0$, which is equivalent to g = h
a.e. If the latter holds, then {f ≠ h} and {f ≠ g} are null sets, hence f is Lebesgue
measurable and the integrals are equal.⟧
(4) If f is continuous at x ∈ [a, b], then h(x) = g(x).
⟦Given ε > 0, choose δ > 0 such that d(x, y) < δ implies |f(x) − f(y)| < ε, where d is
the metric on R^d defined by d(x, y) = max_j |xj − yj|. Choose m so that ‖Pn‖ < δ for
all n ≥ m. For such n and for x ∈ I ∈ Pn,

f(x) − ε < f(y) < f(x) + ε for all y ∈ I.

Taking the infimum and supremum of f(y), we have

f(x) − ε ≤ hn(x) = mI ≤ MI = gn(x) ≤ f(x) + ε.

Letting n → ∞ yields

f(x) − ε ≤ h(x) ≤ g(x) ≤ f(x) + ε.

Since ε was arbitrary, g(x) = h(x).⟧

(5) Let x ∈ [a, b] such that x is not on the boundary of any subinterval of any Pn . If
h(x) = g(x), then f is continuous at x.
⟦Given ε > 0, choose n such that

|gn(x) − g(x)| < ε/2 and |hn(x) − h(x)| < ε/2.

Suppose that x is in the interior of I ∈ Pn. For all y in the interior of I, hn(y) = mI
and gn(y) = MI, hence

h(x) − ε/2 < hn(x) ≤ f(y) ≤ gn(x) < g(x) + ε/2 = h(x) + ε/2.

Therefore, |f(x) − f(y)| < ε, hence f is continuous at x.⟧

(6) Set A = {x : g(x) ≠ h(x)}. Then λ^d(A) = λ^d(D).

⟦Let B be the union of all the boundaries in the partitions Pn. Since there are countably
many boundaries, λ^d(B) = 0. By steps (4) and (5), A ⊆ D ⊆ A ∪ B. Therefore, λ^d(A) =
λ^d(D).⟧

To complete the proof of the theorem, observe that, by step (3), f is Riemann integrable
iff λd (A) = 0. By step (6), this occurs iff λd (D) = 0.

3.3.9 Corollary. If F : R^n → R is continuous and the functions f1, . . . , fn are Riemann
integrable on [a, b], then F(f1, . . . , fn) is Riemann integrable on [a, b]. In particular, if c ∈ R
and p > 0, then the functions cf1, f1 + f2, f1 f2, f1 ∨ f2, f1 ∧ f2, f1^±, and |f1|^p are Riemann
integrable on [a, b].

Improper Riemann Integrals


A real-valued function g is said to be locally Riemann integrable on an interval I if g
is Riemann integrable on every compact subinterval of I. For example, a continuous function
on I is locally Riemann integrable.
3.3.10 Theorem. Let g be locally Riemann integrable on [a, b) (where b could be infinite).
Then g is Lebesgue measurable on [a, b). Moreover:
(a) If g ≥ 0 and g is improperly integrable on [a, b), then g is Lebesgue integrable on [a, b)
and
\[
\int_a^b g(x)\,dx = \int_{[a,b)} g\,d\lambda. \tag{3.8}
\]

(b) If g is Lebesgue integrable on [a, b), then g is improperly integrable on [a, b) and (3.8)
holds.
Proof. That g is Lebesgue measurable on [a, b) follows from 3.3.8. To prove (a), let bn ↑ b
and let D denote the set of discontinuities of g on [a, b). Since g is Riemann integrable on
[a, bn], λ([a, bn] ∩ D) = 0. Then $1_{[a,b_n]}\,g$ is Lebesgue measurable for every n and
\[
\int_a^{b_n} g(x)\,dx = \int 1_{[a,b_n]}\,g\,d\lambda.
\]

Taking limits, using 3.2.7, we see that g is Lebesgue integrable on [a, b) and (3.8) holds.
For (b) note that by 3.3.9 the functions $g^{\pm}$ are locally Riemann integrable on [a, b).
Therefore, by (a), $\int_a^n g^{\pm}(x)\,dx = \int_{[a,n]} g^{\pm}\,d\lambda$ for all n, and an application of the monotone
convergence theorem completes the proof.


Notation. In view of 3.3.8 and 3.3.10 one frequently writes $\int_a^b g(x)\,dx$ for the integral
$\int_{[a,b]} g\,d\lambda$ and $\int_a^\infty g(x)\,dx$ for the integral $\int_{[a,\infty)} g\,d\lambda$.
The following example shows that part (a) of the above theorem does not necessarily hold
for functions that are not nonnegative.
3.3.11 Example. Consider the function $g(x) = x^{-1} \sin x$ on the interval [1, ∞). Integrating
by parts, we have
\[
\int_1^t g(x)\,dx = -\frac{\cos x}{x}\Big|_1^t - \int_1^t \frac{\cos x}{x^2}\,dx.
\]
Since the integral $\int_1^\infty x^{-2}\,dx$ converges, the right side converges as t → ∞, hence g is
improperly integrable on [1, ∞). On the other hand,
\[
\int_{[\pi, n\pi]} |g|\,d\lambda = \sum_{k=2}^{n} \int_{(k-1)\pi}^{k\pi} |g(x)|\,dx > \sum_{k=2}^{n} \frac{1}{k\pi} \int_{(k-1)\pi}^{k\pi} |\sin x|\,dx = \frac{2}{\pi} \sum_{k=2}^{n} \frac{1}{k},
\]
and the right side tends to ∞ as n → ∞, hence g is not Lebesgue integrable. It may be shown (Ex. 3.64) that
\[
\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \qquad ♦
\]
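The contrast in 3.3.11 can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a plain midpoint rule (the helper `midpoint_rule` is hypothetical, not from the text) to approximate the integral of sin x / x over [0, 200π], and it evaluates the harmonic lower bound (2/π) Σ 1/k for the integral of |g| over [π, nπ].

```python
import math


def midpoint_rule(f, a, b, n):
    """Composite midpoint rule with n equal cells on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def sinc(x):
    return math.sin(x) / x  # only evaluated at midpoints, so x > 0


def abs_lower_bound(n):
    """(2/pi) * sum_{k=2}^{n} 1/k, the lower bound from the example."""
    return (2.0 / math.pi) * sum(1.0 / k for k in range(2, n + 1))


T = 200 * math.pi
approx = midpoint_rule(sinc, 0.0, T, 200_000)
print("integral of sin(x)/x over [0, 200*pi]:", approx, "vs pi/2 =", math.pi / 2)

for n in (10, 1000, 100_000):
    print("lower bound for the integral of |g| over [pi, n*pi], n =", n, ":", abs_lower_bound(n))
```

The truncated signed integral stays close to π/2, while the lower bound for the integral of |g| grows like a logarithm and is therefore unbounded.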

Exercises
3.35 Let µ be a Lebesgue-Stieltjes measure whose distribution function F has a positive continuous
derivative on R. Show that dµ = F′ dλ.

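3.36 For a bounded function f : [a, b] → R, set
\[
i(f) = \inf\Bigl\{\int_a^b g\,d\lambda : g \in C[a, b],\ g \ge f\Bigr\}, \qquad s(f) = \sup\Bigl\{\int_a^b h\,d\lambda : h \in C[a, b],\ h \le f\Bigr\}.
\]
Show that $i(f) = \overline{\int_a^b} f\,d\lambda$ and $s(f) = \underline{\int_a^b} f\,d\lambda$.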
a a

3.37 Show that a bounded function f on [a, b] is Riemann integrable iff there exists a real number L
such that S(f, Pn, ξn) → L for each sequence of tagged partitions (Pn, ξn) with ‖Pn‖ → 0.
3.38 The gamma function is defined by $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, x > 0.
(a) Show that the integral converges.
(b) Integrate by parts to show that Γ(x + 1) = xΓ(x) for all x > 0.
(c) Show that Γ(n + 1) = n! for all n ∈ N.
(d) Given that $\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2$, evaluate $\Gamma(1/2)$, $\Gamma(3/2)$, and $\Gamma(5/2)$.
(e) The formula $\Gamma(x) = x^{-1}\Gamma(x + 1)$ may be used to extend the gamma function to noninteger
values x < 0. Use this to show that $\Gamma(-1/2) = -2\sqrt{\pi}$ and $\Gamma(-3/2) = 4\sqrt{\pi}/3$.
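3.39 Show that for n ≥ 2,
\[
\int_0^{\pi/2} \sin^n x\,dx = \int_0^{\pi/2} \cos^n x\,dx = \int_0^1 \frac{x^n}{\sqrt{1 - x^2}}\,dx =
\begin{cases}
\dfrac{(n-1)(n-3)\cdots 4 \cdot 2}{n(n-2)\cdots 5 \cdot 3}, & n \text{ odd},\\[2ex]
\dfrac{\pi}{2}\,\dfrac{(n-1)(n-3)\cdots 5 \cdot 3}{n(n-2)\cdots 4 \cdot 2}, & n \text{ even}.
\end{cases}
\]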

3.40 Let n ∈ Z+. Verify the formulas
(a) $\int_0^\infty x^n e^{-x}\,dx = n!$
(b) $\int_0^\infty x^n e^{-x^2/2}\,dx = (n-1)(n-3)\cdots 1 \cdot \sqrt{\pi/2}$ if n ≥ 2 is even,
    and $= (n-1)(n-3)\cdots 2$ if n ≥ 3 is odd.
(c) $\int_0^\infty x^n e^{-x^2}\,dx = 2^{-(n+1)/2} \int_0^\infty x^n e^{-x^2/2}\,dx$.
⟦For (b), use $\int_0^\infty e^{-x^2/2}\,dx = \sqrt{\pi/2}$.⟧

3.41 Show that if fn is Riemann integrable on [a, b] and fn → f uniformly on [a, b], then f is
Riemann integrable and $\int_a^b f_n \to \int_a^b f$. Show also that the assertion is false if the convergence
is merely pointwise.

3.42 Show that the function
\[
f = \sum_{n=1}^{\infty} \bigl[(2n)^{-p}\,1_{[n,n+1)} - (2n+1)^{-p}\,1_{[n+1,n+2)}\bigr]
\]
is improperly Riemann integrable on [1, ∞) for any p > 0, but is Lebesgue integrable iff p > 1.

3.43 Show that $(x^{-1} \sin x)^2$ extended continuously to [0, ∞) is Lebesgue integrable and improperly
Riemann integrable on [0, ∞) and
\[
\int_0^\infty \Bigl(\frac{\sin x}{x}\Bigr)^2 dx = \int_0^\infty \frac{\sin x}{x}\,dx.
\]
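3.44 Let p > 0. Show that
\[
\int_0^1 \frac{dx}{1 + x^p} = \sum_{n=0}^{\infty} \Bigl[\frac{1}{2np + 1} - \frac{1}{(2n+1)p + 1}\Bigr].
\]
Show that for suitable p the formula yields
\[
\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \qquad\text{and}\qquad \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots
\]
⟦Use the identity $(1 + y)^{-1} = \sum_{n=0}^{\infty} \bigl[y^{2n} - y^{2n+1}\bigr]$, 0 ≤ y < 1.⟧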

3.45 [↑ 3.39] Show that


Z π/2 ∞
dx X (2n)!
=π n+1 (n + 1)! n!
.
0 1 + sin x n=0
4

3.46 [↑ 3.40] Let p ∈ N.
(a) Show that $x^p/(e^x - 1)$, extended continuously to [0, ∞), is Lebesgue integrable and
\[
\int_0^\infty \frac{x^p}{e^x - 1}\,dx = p! \sum_{n=1}^{\infty} \frac{1}{n^{p+1}}.
\]
(b) Show that $x^{2p}/(e^{x^2} - 1)$, extended continuously to [0, ∞), is Lebesgue integrable and
\[
\int_0^\infty \frac{x^{2p}}{e^{x^2} - 1}\,dx = \frac{\sqrt{\pi}\,(2p - 1)!}{4^p\,(p - 1)!} \sum_{n=1}^{\infty} \frac{1}{n^{p+1/2}}.
\]
⟦Use $(z - 1)^{-1} = \sum_{n=1}^{\infty} z^{-n}$, z > 1.⟧

3.4 Convergence Theorems


The General Monotone Convergence Theorem
The monotone convergence theorem established in § 3.2 is one of several theorems that
give sufficient conditions for the convergence of a sequence of integrals. In this section we
consider three such theorems. Other convergence theorems are treated in the next chapter.
We begin with the following extension of 3.2.7.
3.4.1 Theorem. Let fn and f be measurable, fn ↑ f a.e., and $\int f_1^-\,d\mu < \infty$. Then
\[
\int f\,d\mu = \lim_n \int f_n\,d\mu. \tag{3.9}
\]

Proof. By hypothesis, there exists a set E ∈ F such that fn ↑ f on E and µ(E^c) = 0.
Replacing fn by fn 1_E, we may assume that fn ↑ f on X.
Since $0 \le f^- \le f_n^- \le f_1^-$ and $\int f_1^-\,d\mu < \infty$, the integrals in (3.9) are defined. If
$\int f_1^+\,d\mu = \infty$, then from $f_1^+ \le f_n^+ \le f^+$ we see that each side of (3.9) is ∞. If $\int f_1^+\,d\mu < \infty$,
then f1 is integrable and we may apply 3.2.7 to the sequence of nonnegative functions fn − f1
to obtain
\[
\int f_n = \int (f_n - f_1) + \int f_1 \to \int (f - f_1) + \int f_1 = \int f.
\]

Fatou’s Lemma
The following result is useful in cases where limn fn does not exist.
3.4.2 Theorem. If fn and g are measurable, fn ≥ g a.e. for all n, and $\int g^-\,d\mu < \infty$, then
\[
\int \liminf_n f_n\,d\mu \le \liminf_n \int f_n\,d\mu. \tag{3.10}
\]

In particular, the inequality holds if fn ≥ 0 a.e. for all n.


Proof. Let $h_n = \inf_{j \ge n} f_j$ and $h = \liminf_n f_n$. Then $\int h_1^- \le \int f_1^- \le \int g^- < \infty$, hn ≤ fn, and
hn ↑ h a.e. By 3.4.1 applied to the sequence (hn),
\[
\int \liminf_n f_n = \int h = \lim_n \int h_n = \liminf_n \int h_n \le \liminf_n \int f_n.
\]

The inequality in (3.10) may be strict. For example, if µ = λ and $f_n = n^2\,1_{[0,1/n]}$, then
the left side of (3.10) is zero while the right side is ∞.
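The strictness example can be checked by hand, and the following sketch merely records the arithmetic with exact rationals. The helper names `f` and `integral_f` are hypothetical; the integral is computed from the closed form (height n² times length 1/n), not by numerical quadrature.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n^2 times the indicator of [0, 1/n]."""
    return n * n if 0 <= x <= Fraction(1, n) else 0


def integral_f(n):
    """Lebesgue integral of f_n: height n^2 times length 1/n, i.e. n."""
    return n * n * Fraction(1, n)


x = Fraction(1, 7)
tail_values = [f(n, x) for n in range(8, 50)]  # f_n(x) = 0 as soon as n > 1/x
integrals = [integral_f(n) for n in (1, 10, 100)]
print(tail_values[:5], integrals)
```

For every fixed x > 0 the values f_n(x) are eventually zero, so lim inf f_n = 0 a.e., while the integrals equal n and tend to ∞.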

The Dominated Convergence Theorem


The third convergence theorem in this section is typically used in cases where a sequence
of functions may not be monotone.
3.4.3 Theorem. Let g ≥ 0 be integrable and let (fn : X → K) be a sequence of measurable
functions such that |fn| ≤ g a.e. for all n. If fn → f a.e., then f is integrable and
$\int f_n\,d\mu \to \int f\,d\mu$.
Proof. By considering real and imaginary parts, we may assume that the functions fn and
f are R-valued. Since |fn| ≤ g a.e. and |f| ≤ g a.e., fn and f are integrable by 3.2.11. From
Fatou’s lemma applied to g ± fn (≥ 0), we have
\[
\int (g + f)\,d\mu \le \liminf_n \int (g + f_n)\,d\mu = \int g\,d\mu + \liminf_n \int f_n\,d\mu
\]
and
\[
\int (g - f)\,d\mu \le \liminf_n \int (g - f_n)\,d\mu = \int g\,d\mu - \limsup_n \int f_n\,d\mu.
\]
Subtracting $\int g\,d\mu$ in each inequality yields
\[
\int f\,d\mu \le \liminf_n \int f_n\,d\mu \le \limsup_n \int f_n\,d\mu \le \int f\,d\mu.
\]

We note that the hypothesis that the functions |fn| be dominated by an integrable
function cannot be omitted. For example, $1_{[n,2n]} \to 0$ on R, but $\int 1_{[n,2n]}\,d\lambda \to \infty$.
We conclude this section with two applications of the dominated convergence theorem.
The first, whose proof is an exercise for the reader, describes a continuity property of integrals.
The second gives sufficient conditions for differentiating “under the integral sign.”
3.4.4 Corollary. Let I be an open d-dimensional interval and let f be B(I) ⊗ F-measurable
on I × X such that f(t, x) is continuous in t for each x ∈ X. If there exists an integrable
function g on X such that |f(t, x)| ≤ g(x) for all t and x, then $\int f(t, x)\,d\mu(x)$ is continuous
in t.

3.4.5 Corollary. Let I be an open d-dimensional interval and let f be B(I) ⊗ F-measurable
on I × X such that for each t in I the function f(t, ·) is µ-integrable. Let α be a fixed
multi-index and assume that for all multi-indices β with |β| ≤ |α| the derivative $\partial_t^\beta f(t, x)$
exists for each t and x and is measurable in x for each fixed t. If there exists an integrable
function g on X such that $|\partial_t^\beta f(t, x)| \le g(x)$ for all such β, t and x, then
\[
\partial_t^\alpha \int f(t, x)\,d\mu(x) = \int \partial_t^\alpha f(t, x)\,d\mu(x).
\]

Proof. We prove the right-hand derivative version for the case d = 1. The general formula
follows by induction. Fix t ∈ I and let tn ↓ t. Set
\[
H(t) = \int f(t, x)\,d\mu(x) \qquad\text{and}\qquad h_n(x) = \frac{f(t_n, x) - f(t, x)}{t_n - t}.
\]

By the mean value theorem, $h_n(x) = f_t(s, x)$ for some s = s(n, x) ∈ (t, tn). Then |hn| ≤ g
and $h_n(x) \to f_t(t, x)$, hence, by the dominated convergence theorem,
\[
\frac{H(t_n) - H(t)}{t_n - t} = \int h_n(x)\,d\mu(x) \to \int f_t(t, x)\,d\mu(x).
\]
This shows that the right-hand derivative of H exists at t and equals $\int f_t(t, x)\,d\mu(x)$.
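Differentiation under the integral sign can be sanity-checked numerically. The sketch below assumes the illustrative integrand f(t, x) = exp(−t x²) on X = [0, 1] with Lebesgue measure, for which |∂f/∂t| ≤ x² ≤ 1 supplies an integrable dominating function when t ≥ 0; the midpoint-rule helper and the use of a central rather than one-sided difference quotient are choices of this sketch, not of the text.

```python
import math


def midpoint_rule(f, a, b, n=2000):
    """Composite midpoint rule with n equal cells on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def H(t):
    """H(t) = integral over [0, 1] of exp(-t x^2) dx (illustrative integrand)."""
    return midpoint_rule(lambda x: math.exp(-t * x * x), 0.0, 1.0)


def integral_of_derivative(t):
    """Integral over [0, 1] of the t-derivative, -x^2 exp(-t x^2)."""
    return midpoint_rule(lambda x: -x * x * math.exp(-t * x * x), 0.0, 1.0)


t, step = 1.0, 1e-5
difference_quotient = (H(t + step) - H(t - step)) / (2 * step)
print(difference_quotient, integral_of_derivative(t))
```

The two printed numbers agree up to the finite-difference error, in line with the conclusion of 3.4.5.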

Exercises
3.47 Find all p > 0 for which there is a λ-integrable function g on R+ such that $n^{-p}\,1_{[0,n]} \le g$ for
all n.
3.48 Let µ be a Lebesgue-Stieltjes measure on B(R) and f µ-integrable. Show that $\lim_n \int_{a_n}^{b_n} f\,d\mu = 0$
for any pair of sequences (an) and (bn) with an < bn and an → ∞. Show that this may not
hold if f ≥ 0 is not integrable.

3.49 Let µ be a Lebesgue-Stieltjes measure on B(R) and f integrable. Let g be measurable and
bounded on R such that $r := \lim_{t \to \infty} g(t)$ exists and is finite. Show that
\[
\lim_{x \to \infty} \int g(x + t) f(t)\,d\mu(t) = r \int f\,d\mu.
\]

3.50 Let µ be a Lebesgue-Stieltjes measure on B(Rd ) and f > 0 µ-integrable. Prove that
Z n Z Z
(a) n ln(1 + n−1 f ) dµ → f dµ. (b) n ln(1 + n−2 f ) dµ → 0.
1/n
Z Z Z
np sinp n−1 f 1/p dµ → f 1/n dµ → µ(E).

(c) f dµ. (d)
E

3.51 [↑ 3.40] Show that
(a) $\int_0^\infty e^{-t^2} \cos(xt)\,dt = \dfrac{\sqrt{\pi}}{2}\,e^{-x^2/4}$. (b) $\int_0^\infty e^{-t^2} \sin^2(xt)\,dt = \dfrac{\sqrt{\pi}}{4}\bigl(1 - e^{-x^2}\bigr)$.
⟦For (a) use the power series for cos(xt).⟧

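3.52 [↑ 3.46] Show that
(a) $\int_0^\infty \dfrac{\sin(xt)}{e^t - 1}\,dt = \sum_{n=1}^{\infty} \dfrac{x}{n^2 + x^2}$. (b) $\int_0^\infty \dfrac{t \sin(xt)}{e^{t^2} - 1}\,dt = \dfrac{\sqrt{\pi}}{2} \sum_{n=1}^{\infty} \dfrac{x}{2n^{3/2}}\,e^{-x^2/4n}$.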

3.53 [↑ 3.38] Prove that the kth derivative of the gamma function is
\[
\Gamma^{(k)}(x) = \int_0^\infty t^{x-1} e^{-t} \ln^k t\,dt, \quad x > 0.
\]

3.54 Let gn : X → C such that $\sum_n \int |g_n|\,d\mu < \infty$. Show that the series $g := \sum_n g_n$ converges a.e.,
is integrable and $\int \sum_n g_n\,d\mu = \sum_n \int g_n\,d\mu$.


3.55 Let f : R → K be λ-integrable. Show that the series $\sum_{k=-\infty}^{\infty} f(k + x) := \lim_n \sum_{k=-n}^{n} f(k + x)$
converges absolutely a.e. on R.

3.56 Let f : R → R be Lebesgue integrable on every interval and satisfy f(x + y) = f(x) + f(y) for
all x, y. Show that f(x) = f(1)x for all x. ⟦Show first that f is continuous.⟧
3.57 Let fn : X → [0, ∞) be integrable and fn+1 ≤ fn a.e. for all n. Show that $\sum_n (-1)^{n+1} f_n$ is
integrable and that $\int \bigl(\sum_n (-1)^{n+1} f_n\bigr)\,d\mu = \sum_n (-1)^{n+1} \int f_n\,d\mu$.

3.58 Let g be integrable on X and let (fn) be a sequence of real-valued measurable functions on X
such that |fn| ≤ g. Prove that
\[
\int \liminf_n f_n\,d\mu \le \liminf_n \int f_n\,d\mu \le \limsup_n \int f_n\,d\mu \le \int \limsup_n f_n\,d\mu.
\]

3.59 Let f, g, fn, gn be real valued and integrable such that fn → f, gn → g a.e., |fn| ≤ gn a.e.,
and $\int g_n\,d\mu \to \int g\,d\mu$. Prove that $\int f_n\,d\mu \to \int f\,d\mu$.
3.60 [↑ 3.59] Let fn, gn, hn, f, g, h be integrable, fn ≤ gn ≤ hn a.e. for all n, fn → f a.e., gn → g a.e., and
hn → h a.e. Show that if $\int f_n\,d\mu \to \int f\,d\mu$ and $\int h_n\,d\mu \to \int h\,d\mu$, then $\int g_n\,d\mu \to \int g\,d\mu$.
3.61 Show that the dominated convergence theorem holds if the hypothesis fn → f a.e. is replaced by
fn → f in measure µ.

3.5 Integration against a Product Measure


In this section we construct the product of finitely many measures and prove a theorem
that gives conditions under which the corresponding integral may be expressed as an iterated
integral. This result, known as Fubini’s theorem, is fundamental in integration theory, having
both concrete and theoretical applications. Some of these are given in the next section.

Construction of the Product of Two Measures


Let (X, F, µ) and (Y, G, ν) be arbitrary measure spaces. Recall that F ⊗ G denotes the
σ-field generated by the semiring R of measurable rectangles A × B, A ∈ F and B ∈ G. The
following theorem shows that a measure on R may be constructed from the measures µ and
ν in a natural way and then extended to a measure µ ⊗ ν on F ⊗ G, called the product of
the measures µ and ν. For the statement of the theorem, we remind the reader of the
convention 0 · ∞ = ∞ · 0 = 0.

3.5.1 Theorem. There exists a measure µ ⊗ ν on (X × Y, F ⊗ G) such that

(µ ⊗ ν)(A × B) = µ(A) · ν(B) for all A ∈ F and B ∈ G. (3.11)

Moreover, if the measure spaces (X, F, µ) and (Y, G, ν) are σ-finite, then the measure µ ⊗ ν
is unique with respect to property (3.11).
Proof. Define µ ⊗ ν on the semiring R by Equation (3.11). We claim that µ ⊗ ν is a
measure on R. Clearly, (µ ⊗ ν)(∅) = 0. For countable additivity, let (An × Bn) be a
disjoint sequence in R such that $\bigcup_n A_n \times B_n = A \times B \in \mathcal{R}$. Then for (x, y) ∈ X × Y,
$1_A(x) 1_B(y) = \sum_n 1_{A_n}(x) 1_{B_n}(y)$. For fixed x we can integrate with respect to y and
use 3.2.9 to obtain $1_A(x)\,\nu(B) = \sum_n 1_{A_n}(x)\,\nu(B_n)$. Integrating with respect to x yields
$(\mu \otimes \nu)(A \times B) = \sum_n (\mu \otimes \nu)(A_n \times B_n)$, verifying the claim. By 1.6.4, µ ⊗ ν may be extended
to a measure on F ⊗ G. If (X, F, µ) and (Y, G, ν) are σ-finite, then (R, µ ⊗ ν) is σ-finite,
hence uniqueness follows from 1.6.9.
The measure space (X × Y, F ⊗ G, µ ⊗ ν) is called the product of the measure spaces
(X, F, µ) and (Y, G, ν).

Fubini’s Theorem
3.5.2 Theorem (Fubini-Tonelli). Let (X, F, µ) and (Y, G, ν) be σ-finite measure spaces and
let f : X × Y → K be F ⊗ G-measurable.
(a) If f ≥ 0, then the functions $\int_X f(x, y)\,d\mu(x)$ and $\int_Y f(x, y)\,d\nu(y)$ are measurable in y
and x, respectively, and
\[
\int_{X \times Y} f(x, y)\,d(\mu \otimes \nu)(x, y) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x). \tag{3.12}
\]

(b) If one of the quantities
\[
\int_{X \times Y} |f(x, y)|\,d(\mu \otimes \nu)(x, y), \qquad \int_Y \int_X |f(x, y)|\,d\mu(x)\,d\nu(y), \qquad \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x)
\]
is finite, then so are the other two and (3.12) holds in the sense that $\int f(x, y)\,d\mu(x)$ is
defined and finite for a.a. y and is integrable with respect to y, and $\int f(x, y)\,d\nu(y)$ is
defined and finite for a.a. x and is integrable with respect to x.

Proof. Recall that a measurable function f(x, y) is separately measurable, that is, measurable
in x for each fixed y and measurable in y for each fixed x (Ex. 2.13). Thus the inner integrals
in (3.12) are legitimate.
We now make the following reductions. First, part (b) of the theorem is a consequence
of part (a). Indeed, if one of the quantities in (b) is finite, then f is integrable by part (a)
applied to |f|. By considering real, imaginary, positive, and negative parts, we see that (3.12)
holds. Second, to prove (a) we may assume by the usual arguments that f is an indicator
function. Thus to prove the theorem it suffices to show that for any C ∈ F ⊗ G,
\[
\eta(C) = \int_Y \int_X 1_C(x, y)\,d\mu(x)\,d\nu(y) = \int_X \int_Y 1_C(x, y)\,d\nu(y)\,d\mu(x), \quad\text{where } \eta := \mu \otimes \nu. \tag{†}
\]

For this we may assume that the measure spaces (X, F, µ) and (Y, G, ν) are finite. Indeed,
if (†) holds in the finite case and if Xn ↑ X, where µ(Xn) < ∞ and ν(Y) < ∞, then by
considering the measures restricted to F ∩ Xn and (F ⊗ G) ∩ (Xn × Y) and applying the
monotone convergence theorem, we see that (†) holds for σ-finite X and finite Y. Repeating
the argument with Yn ↑ Y shows that (†) holds for σ-finite X and Y.
To prove (†) for finite measures µ and ν, let H denote the collection of all C ∈ F ⊗ G
for which the first equality in (†) holds and the inner integral in that equality is measurable in
y. We show that H is a λ-system (1.5) containing all measurable rectangles A × B. It will
follow from the π-λ theorem (1.2.6) that H = F ⊗ G, verifying the first equality in (†). The
second is proved in a similar manner.
Let A ∈ F, B ∈ G and set C := A × B. Then
\[
\int 1_C(x, y)\,d\mu(x) = \int 1_A(x) 1_B(y)\,d\mu(x) = \mu(A)\,1_B(y),
\]
which is measurable in y, hence
\[
\int\!\!\int 1_C(x, y)\,d\mu(x)\,d\nu(y) = \mu(A)\,\nu(B) = \eta(C).
\]
Thus H contains all measurable rectangles.

Now let C, D ∈ H with C ⊆ D and let E = D \ C. Then
\[
\int 1_E(x, y)\,d\mu(x) = \int 1_D(x, y)\,d\mu(x) - \int 1_C(x, y)\,d\mu(x),
\]
which is measurable in y and implies that
\[
\int\!\!\int 1_E(x, y)\,d\mu(x)\,d\nu(y) = \int\!\!\int 1_D(x, y)\,d\mu(x)\,d\nu(y) - \int\!\!\int 1_C(x, y)\,d\mu(x)\,d\nu(y) = \eta(D) - \eta(C) = \eta(E).
\]
Therefore, H is closed under relative differences.

Finally, let Cn ∈ H and Cn ↑ C. Then $1_{C_n} \uparrow 1_C$, hence, by the monotone convergence
theorem,
\[
\int 1_{C_n}(x, y)\,d\mu(x) \uparrow \int 1_C(x, y)\,d\mu(x).
\]
Thus $\int 1_C(x, y)\,d\mu(x)$ is measurable in y. Applying the monotone convergence theorem again
yields
\[
\eta(C) = \lim_n \eta(C_n) = \lim_n \int\!\!\int 1_{C_n}(x, y)\,d\mu(x)\,d\nu(y) = \int\!\!\int 1_C(x, y)\,d\mu(x)\,d\nu(y).
\]
Therefore, H is closed under increasing unions, completing the proof that H is a λ-system
and establishing the theorem.
Note that a special case of part (a) is the interchange of summation in $\sum_n \sum_m a_{mn}$,
where $a_{mn} \ge 0$, even when the double sum $\sum_{m,n} a_{mn}$ is infinite.
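The nonnegativity assumption in this remark matters. The following sketch uses a standard signed array (an illustrative choice, not taken from the text): +1 on the diagonal and −1 just below it. Each row and each column has finite support, so the inner sums are computed exactly, and the two iterated sums disagree.

```python
def a(m, n):
    """Signed array indexed by m, n >= 0: +1 if m == n, -1 if m == n + 1, else 0."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def row_sum(m):
    """Sum over n of a(m, n); the support is contained in n in {m - 1, m}."""
    return sum(a(m, n) for n in range(m + 2))


def col_sum(n):
    """Sum over m of a(m, n); the support is contained in m in {n, n + 1}."""
    return sum(a(m, n) for m in range(n + 2))


N = 1000
sum_m_of_sum_n = sum(row_sum(m) for m in range(N))  # only row 0 contributes
sum_n_of_sum_m = sum(col_sum(n) for n in range(N))  # every column sums to 0
print(sum_m_of_sum_n, sum_n_of_sum_m)
```

Here the sum of |a(m, n)| over all m and n is infinite, so neither part (a) nor part (b) of 3.5.2 applies (with counting measure on both factors), and the order of summation cannot be interchanged.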
3.5.3 Remarks. (a) The σ-finiteness hypothesis in Fubini’s theorem is essential: Consider
Lebesgue measure λ and counting measure ν on ([0, 1], B[0, 1]). The diagonal E = {(t, t) :
t ∈ [0, 1]} is closed and so is a member of B[0, 1] ⊗ B[0, 1]. But for all x and y
\[
\int 1_E(t, y)\,d\lambda(t) = \lambda\{y\} = 0 \qquad\text{and}\qquad \int 1_E(x, t)\,d\nu(t) = \nu\{x\} = 1,
\]

hence the iterated integrals are unequal.



(b) Part (b) of the theorem fails if the absolute values on the integrands are removed.
Indeed for Lebesgue measure on [0, 1] we have
\[
\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx = -\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx\,dy = \frac{\pi}{4}.
\]

Thus $(x^2 - y^2)(x^2 + y^2)^{-2}$ is not integrable on [0, 1] × [0, 1]. (See Ex. 3.71.) ♦

The d-Dimensional Case


The above concepts generalize by induction to finitely many measure spaces (Xi , Fi , µi )
(i = 1, . . . , d). We state the generalizations and leave the verifications to the reader.
3.5.4 Theorem. There exists a measure µ = µ1 ⊗ · · · ⊗ µd on the product measurable space
(X1 × · · · × Xd , F1 ⊗ · · · ⊗ Fd ) such that

µ(A1 × · · · × Ad ) = µ1 (A1 ) · · · µd (Ad ) for all Ai ∈ Fi . (3.13)

Moreover, if the measure spaces (Xi , Fi , µi ) are σ-finite, then µ is unique with respect to
property (3.13).
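3.5.5 Example. Consider the measure spaces $(\mathbb{R}^{p_i}, \mathcal{B}(\mathbb{R}^{p_i}), \lambda^{p_i})$ (i = 1, . . . , d) and
$(\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p), \lambda^p)$, where p = p1 + · · · + pd. Since $\lambda^{p_1} \otimes \cdots \otimes \lambda^{p_d} = \lambda^p$ on the semiring of
half-open intervals, the measures must be equal on $\mathcal{B}(\mathbb{R}^{p_1}) \otimes \cdots \otimes \mathcal{B}(\mathbb{R}^{p_d}) = \mathcal{B}(\mathbb{R}^p)$. ♦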
3.5.6 Theorem. Let the measure spaces (Xi , Fi , µi ) be σ-finite and let f : X → R be
F-measurable.
(a) If f ≥ 0, then
\[
\int f\,d\mu = \int \cdots \int f(x_1, \ldots, x_d)\,d\mu_1(x_1) \cdots d\mu_d(x_d), \tag{3.14}
\]
where $\int f(x_1, \ldots, x_i, \ldots, x_d)\,d\mu_i(x_i)$ is measurable in $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$, and
the iterated integration may be carried out in any of the d! orders.
(b) If for some permutation $(i_1, i_2, \ldots, i_d)$ of the indices 1, 2, . . . , d
\[
\int \cdots \int |f(x_{i_1}, \ldots, x_{i_d})|\,d\mu_{i_1}(x_{i_1}) \cdots d\mu_{i_d}(x_{i_d}) < \infty, \tag{3.15}
\]
then f is µ-integrable and (3.14) holds, where $\int \cdots \int f(x_1, \ldots, x_d)\,d\mu_1(x_1) \cdots d\mu_i(x_i)$
is defined and finite for a.a. values of $x_{i+1}, \ldots, x_d$ and is integrable in these variables.
3.5.7 Example. In elementary calculus, integration is sometimes carried out on regions
in R3 bounded by surfaces. This idea generalizes to higher dimensions as follows: Given
continuous functions u2 (x1 ) ≤ v2 (x1 ) on E1 := [a, b], and in general continuous functions
uk+1 (x1 , . . . , xk ) ≤ vk+1 (x1 , . . . , xk ) defined on the set

\[
E_k := \bigl\{(x_1, \ldots, x_k) : a \le x_1 \le b,\ u_2(x_1) \le x_2 \le v_2(x_1),\ \ldots,\ u_k(x_1, \ldots, x_{k-1}) \le x_k \le v_k(x_1, \ldots, x_{k-1})\bigr\},
\]

then for any integrable f on $E_d$,

\[
\int_{E_d} f\,d\lambda^d = \int_a^b \int_{u_2(x_1)}^{v_2(x_1)} \cdots \int_{u_d(x_1, \ldots, x_{d-1})}^{v_d(x_1, \ldots, x_{d-1})} f(x_1, \ldots, x_d)\,dx_d \cdots dx_2\,dx_1. \qquad ♦
\]

Exercises
3.62 Show that the product of complete measure spaces need not be complete.
3.63 Let µ be a probability measure on B(R^d). Find $\int \mu(I_x)\,dx$, where

$I_x := [x_1, x_1 + a_1] \times \cdots \times [x_d, x_d + a_d]$, aj > 0.

3.64 Let a, b > 0. Use Fubini’s theorem, the dominated convergence theorem, and the identity
$1/x = \int_0^\infty e^{-xt}\,dt$, x > 0, to prove that
(a) $\int_0^\infty \dfrac{\sin x}{x}\,dx = \dfrac{\pi}{2}$. (b) $\int_0^\infty \dfrac{e^{-ax} - e^{-bx}}{x}\,dx = \ln(b) - \ln(a)$.
3.65 Let µ be a Lebesgue-Stieltjes measure on R. Show that if 0 < µ(E) < ∞ and a > 0, then
\[
\frac{1}{\mu(E)} \int_{-\infty}^{\infty} \mu\bigl((x, x + a] \cap E\bigr)\,dx = a.
\]

3.66 Let (Xi , Fi , µi ) (i = 1, 2) be σ-finite measure spaces and let fi ≥ 0 be Fi -measurable. Find a
density function for the product measure (f1 µ1 ) ⊗ (f2 µ2 ) .
3.67 Let (X, F, µ) be σ-finite and f : X → [0, ∞) measurable. Prove that the integral of f is the
“area under the graph,” that is,
\[
\int f\,d\mu = (\mu \otimes \lambda)\{(x, t) : 0 < t < f(x)\} = (\mu \otimes \lambda)\{(x, t) : 0 < t \le f(x)\}.
\]

Conclude that if f is integrable, then the graph {(x, t) : t = f (x)} has measure zero.
3.68 (Cavalieri’s principle). For E ∈ B(R^d) and t ∈ R, define
$E_t := \{x = (x_1, \ldots, x_{d-1}) \in \mathbb{R}^{d-1} : (x, t) \in E\}$.
Show that $E_t \in \mathcal{B}(\mathbb{R}^{d-1})$ for all t ∈ [a, b] and prove that
\[
\lambda^d\bigl(E \cap \bigl[\mathbb{R}^{d-1} \times [a, b]\bigr]\bigr) = \int_a^b \lambda^{d-1}(E_t)\,dt.
\]

Thus the “volume” of the portion of E between the hyperplanes xd = a and xd = b is the
integral from a to b of the “cross-sectional areas” $\lambda^{d-1}(E_t)$.
3.69 Let (X, F, µ) be σ-finite and f : X → [0, ∞) measurable. Suppose that ϕ : [0, ∞) → [0, ∞) has
a positive continuous derivative and ϕ(0) = 0. Prove that
\[
\int_X \varphi \circ f\,d\mu = \int_0^\infty \varphi'(x)\,\mu\{f \ge x\}\,dx.
\]
Deduce, in particular, that $\int_X f^p\,d\mu = \int_0^\infty p x^{p-1}\,\mu\{f \ge x\}\,dx$ (p ≥ 1).

3.70 Let a > 0. Define the n-dimensional simplex in R^n by
\[
S(a, n) = \Bigl\{x : \sum_{j=1}^{n} x_j \le a \text{ and } x_j \ge 0\Bigr\}.
\]
Use Fubini’s theorem and induction to show that $\lambda^n\bigl(S(a, n)\bigr) = \dfrac{a^n}{n!}$.
3.71 Verify the assertions in 3.5.3. Also, show directly that $\int_{[0,1]^2} |x^2 - y^2| (x^2 + y^2)^{-2}\,d\lambda^2(x, y) = \infty$.

3.72 Let $\mu$ be a translation invariant Lebesgue-Stieltjes measure on $\mathcal{B}(\mathbb{R}^d)$ and set $E = [0, 1]^d$. Use Fubini’s theorem to show that for all $B \in \mathcal{B}(\mathbb{R}^d)$,
\[
\int 1_E(x) 1_B(y)\,d\lambda^d(x)\,d\mu(y) = \int 1_E(y) 1_B(x)\,d\lambda^d(x)\,d\mu(y),
\]
hence $\mu(B) = \mu(E)\lambda^d(B)$. Conclude that Lebesgue measure $\lambda^d$ is the only σ-finite translation invariant measure $\mu$ on $\mathcal{B}(\mathbb{R}^d)$ with $\mu[0, 1]^d = 1$. ⟦Consider $\int 1_E(x + y) 1_B(y)\,d\lambda^d(x)\,d\mu(y)$.⟧

3.6 Applications of Fubini’s Theorem


Gaussian Density
We show that
\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 1. \tag{3.16}
\]
The integrand in (3.16) is called the Gaussian density or the standard normal density; it is the familiar “bell curve” of statistics.
By a change of variable, it suffices to prove that $\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2$. Denoting the latter integral by $I$ we have
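\[
\begin{aligned}
I^2 &= \int_0^\infty e^{-y^2} \int_0^\infty e^{-t^2}\,dt\,dy = \int_0^\infty e^{-y^2} \int_0^\infty y e^{-x^2 y^2}\,dx\,dy && (t = xy) \\
&= \int_0^\infty \int_0^\infty y e^{-y^2(1 + x^2)}\,dy\,dx && \text{(Fubini’s theorem)} \\
&= \frac{1}{2} \int_0^\infty (1 + x^2)^{-1} \int_0^\infty e^{-u}\,du\,dx && (u = y^2(1 + x^2)).
\end{aligned}
\]

The inner integral equals 1 and $\int_0^\infty (1 + x^2)^{-1}\,dx = \pi/2$, so the last expression evaluates to $\pi/4$. Hence $I = \sqrt{\pi}/2$, establishing the formula.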
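As a numerical sanity check on the value $I = \sqrt{\pi}/2$, the following sketch approximates $\int_0^\infty e^{-t^2}\,dt$ by Simpson’s rule on a truncated interval. The function name, the cutoff 10, and the step count are illustrative assumptions and are not part of the text; the tail beyond the cutoff is negligible.

```python
import math

def gaussian_half_integral(upper=10.0, steps=200_000):
    """Approximate the integral of exp(-t^2) over [0, upper] by composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = upper / steps
    total = math.exp(0.0) + math.exp(-upper * upper)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        t = k * h
        total += weight * math.exp(-t * t)
    return total * h / 3

I = gaussian_half_integral()
print(I, math.sqrt(math.pi) / 2)   # the two values should agree closely
print(I * I, math.pi / 4)          # I^2 compared with pi/4
```

The check is only numerical evidence; the proof is the Fubini argument above.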


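Note that, by a suitable substitution, for $m \in \mathbb{R}$ and $\sigma > 0$ we have
\[
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{1}{2}\Big(\frac{x - m}{\sigma}\Big)^2\Big)\,dx = 1.
\]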

The integrand here is the density of a normal random variable with mean m and standard
deviation σ.

Integration by Parts
Let F and G be distribution functions on R with limx→−∞ F (x) = limx→−∞ G(x) = 0,
and let µ and ν be the corresponding Lebesgue-Stieltjes measures:

µ(x, y] = F (y) − F (x) and ν(x, y] = G(y) − G(x), x < y.

We establish the formula


\[
\int_{(a,b]} \frac{G(x) + G(x-)}{2}\,dF(x) + \int_{(a,b]} \frac{F(x) + F(x-)}{2}\,dG(x) = F(b)G(b) - F(a)G(a). \tag{3.17}
\]

For the verification, let R := (a, b] × (a, b] and define


R1 := {(x, y) : a < x ≤ b, a < y ≤ x}, R2 := {(x, y) : a < y ≤ b, a < x < y}.

Since R is the disjoint union of R1 and R2 ,


  
\[
(\mu \otimes \nu)(R_1) + (\mu \otimes \nu)(R_2) = (\mu \otimes \nu)(R) = \big(F(b) - F(a)\big)\big(G(b) - G(a)\big). \tag{†}
\]

FIGURE 3.2: Rectangles for integration by parts formula. (Labels: corners (a, a) and (b, b); regions R1 and R2.)

By Fubini’s theorem,
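\[
\begin{aligned}
(\mu \otimes \nu)(R_1) &= \iint 1_{(a,b]}(x) 1_{(a,x]}(y)\,d\nu(y)\,d\mu(x) = \int_{(a,b]} [G(x) - G(a)]\,d\mu(x) \\
&= \int_{(a,b]} G(x)\,d\mu(x) - G(a)\big(F(b) - F(a)\big), \quad\text{and} \\
(\mu \otimes \nu)(R_2) &= \iint 1_{(a,b]}(y) 1_{(a,y)}(x)\,d\mu(x)\,d\nu(y) = \int_{(a,b]} [F(y-) - F(a)]\,d\nu(y) \\
&= \int_{(a,b]} F(y-)\,d\nu(y) - F(a)\big(G(b) - G(a)\big).
\end{aligned}
\]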

Adding these equations and using (†), we find after cancellations that
\[
F(b)G(b) - F(a)G(a) = \int_{(a,b]} G(x)\,d\mu(x) + \int_{(a,b]} F(y-)\,d\nu(y).
\]

Since F and G play symmetrical roles,


\[
F(b)G(b) - F(a)G(a) = \int_{(a,b]} G(x-)\,d\mu(x) + \int_{(a,b]} F(y)\,d\nu(y).
\]

Averaging yields (3.17).
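Formula (3.17) can be checked exactly on pure-jump distribution functions, where each Stieltjes integral is a finite sum over the jump points. The sketch below is a minimal illustration under that assumption; the helper names and the jump data are hypothetical, and the point 2 is deliberately a common discontinuity of $F$ and $G$.

```python
from fractions import Fraction

def make_distribution(jumps):
    """Return (F, F_left) for a pure-jump distribution function.

    jumps maps a point x to the mass placed at x; F(x) is the total mass
    in (-inf, x] and F_left(x) = F(x-) is the total mass in (-inf, x).
    """
    def F(x):
        return sum((m for p, m in jumps.items() if p <= x), Fraction(0))
    def F_left(x):
        return sum((m for p, m in jumps.items() if p < x), Fraction(0))
    return F, F_left

def averaged_integral(H, H_left, jumps, a, b):
    """Integral over (a, b] of (H(x) + H(x-))/2 against the jump measure."""
    return sum(
        ((H(p) + H_left(p)) / 2 * m for p, m in jumps.items() if a < p <= b),
        Fraction(0),
    )

# Hypothetical jump data for the measures mu (from F) and nu (from G).
mu_jumps = {0: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
nu_jumps = {1: Fraction(1, 3), 2: Fraction(2, 3)}
F, F_left = make_distribution(mu_jumps)
G, G_left = make_distribution(nu_jumps)

a, b = Fraction(1, 2), 3
lhs = (averaged_integral(G, G_left, mu_jumps, a, b)
       + averaged_integral(F, F_left, nu_jumps, a, b))
rhs = F(b) * G(b) - F(a) * G(a)
print(lhs, rhs)   # both sides of (3.17)
```

Exact rational arithmetic is used so that the two sides can be compared without rounding error.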


A simpler formula is available if $F$ and $G$ have no common points of discontinuity. In this case, $G(x-) \ne G(x)$ implies $F(x-) = F(x)$, that is, $\mu\{x\} = 0$. Since the set $S$ of points $x$ with $G(x-) \ne G(x)$ is countable, $\mu(S) = 0$, hence the first integral on the left in (3.17) is $\int_{(a,b]} G\,dF$. Similarly, the second integral is $\int_{(a,b]} F\,dG$. Therefore,
\[
\int_{(a,b]} G(x)\,dF(x) + \int_{(a,b]} F(x)\,dG(x) = F(b)G(b) - F(a)G(a).
\]

For a special case, suppose that $g$ is continuously differentiable and zero outside a bounded interval. Taking first $G(x) = \int_{-\infty}^x (g')^+(t)\,dt$, then $G(x) = \int_{-\infty}^x (g')^-(t)\,dt$ in the preceding formula, and then subtracting, we obtain
\[
\int_{\mathbb{R}} g(x)\,dF(x) + \int_{\mathbb{R}} g'(x) F(x)\,dx = 0.
\]
Taking $F(x) = \int_{-\infty}^x f'(t)\,dt$ for a continuously differentiable $f$ we obtain the formula
\[
\int_{\mathbb{R}} g(x) f'(x)\,dx = -\int_{\mathbb{R}} g'(x) f(x)\,dx.
\]

Spherical Coordinates

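Define open sets $U, V \subseteq \mathbb{R}^d$ by

\[
U := (0, \infty) \times (0, \pi)^{d-2} \times (0, 2\pi) \quad\text{and}\quad V := \mathbb{R}^d \setminus \big(\mathbb{R}^{d-2} \times [0, \infty) \times \{0\}\big),
\]

and define a transformation $\varphi : U \to V$ from spherical to rectangular coordinates by

\[
x = \varphi(s), \quad x := (x_1, \ldots, x_d), \quad s = (r, \theta_1, \ldots, \theta_{d-1}),
\]

where

\[
\begin{aligned}
x_1 &= r\cos\theta_1, \quad x_2 = r\sin\theta_1\cos\theta_2, \quad x_3 = r\sin\theta_1\sin\theta_2\cos\theta_3, \ \ldots, \\
x_{d-1} &= r\sin\theta_1 \cdots \sin\theta_{d-2}\cos\theta_{d-1}, \quad\text{and}\quad x_d = r\sin\theta_1 \cdots \sin\theta_{d-2}\sin\theta_{d-1}.
\end{aligned}
\tag{3.18}
\]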
Clearly $\varphi$ is $C^\infty$ on $U$. Moreover, $\varphi$ is a bijection, that is, for each $(x_1, \ldots, x_d) \in V$ the system (3.18) has a unique solution. To see this, note that $r = \sqrt{\sum_{j=1}^d x_j^2}$ and $\theta_1 = \arccos(x_1/r)$, hence (3.18) has a unique solution for $d = 2$. Setting

yj = xj /(r sin θ1 ), 2 ≤ j ≤ d,

we may assume by induction that the reduced system

y2 = cos θ2 , y3 = sin θ2 cos θ3 , . . . , yd−1 = sin θ2 · · · sin θd−2 cos θd−1 ,


yd = sin θ2 · · · sin θd−2 sin θd−1

has a unique solution (θ2 , . . . , θd−1 ). Then (3.18) has the unique solution (r, θ1 , . . . , θd−1 ).
By standard properties of determinants and a reduction argument,

\[
\det \varphi'(s) = r^{d-1} \sin^{d-2}\theta_1 \sin^{d-3}\theta_2 \cdots \sin^2\theta_{d-3} \sin\theta_{d-2}.
\]

Since the determinant is positive on $U$, $\varphi$ has a $C^\infty$ inverse on $V$ by the inverse function theorem. By the change of variables theorem and Fubini’s theorem, if $f$ is Lebesgue measurable on $\mathbb{R}^d$ and either $f \ge 0$ or $f$ is integrable, then since $V$ differs from $\mathbb{R}^d$ by a set of measure zero we have for $s = (r, \theta_1, \ldots, \theta_{d-1})$,
\[
\int_{\mathbb{R}^d} f\,d\lambda^d = \int_U (f \circ \varphi)(s)\,\big|r^{d-1} \sin^{d-2}\theta_1 \cdots \sin^2\theta_{d-3} \sin\theta_{d-2}\big|\,dr\,d\theta_1 \cdots d\theta_{d-1}. \tag{3.19}
\]

Volume of a d-Dimensional Ball

For $d \ge 1$, let $C_r^d(x)$ denote the closed ball in $\mathbb{R}^d$ with center $x$ and radius $r$. We show that $\lambda^d\big(C_r^d(x)\big) = r^d \alpha_d$, where

\[
\alpha_d =
\begin{cases}
\dfrac{(2\pi)^{d/2}}{d(d-2) \cdots 4 \cdot 2} & \text{if $d$ is even,} \\[2ex]
\dfrac{2(2\pi)^{(d-1)/2}}{d(d-2) \cdots 3 \cdot 1} & \text{if $d$ is odd}
\end{cases}
\;=\; \text{volume of } C_1^d(0) \text{ in } \mathbb{R}^d. \tag{3.20}
\]

By translation invariance and the dilation property of Lebesgue measure, $\lambda^d\big(C_r^d(x)\big) = r^d \lambda^d\big(C_1^d(0)\big)$, hence it suffices to establish the formula for the case $r = 1$ and $x = 0$, which is the version expressed in (3.20).
To simplify notation, for $1 \le k \le d$ let $C^k(r) := C_r^k(0)$ and let $1_k(r; x_1, \ldots, x_k)$ denote the indicator function of $C^k(r)$. Formula (3.20) is easily verified for $d = 1$ and $2$, so we assume that $d > 2$. Since

\[
C^d(1) = \{(x_1, \ldots, x_d) : x_3^2 + \cdots + x_d^2 \le 1 - x_1^2 - x_2^2,\ (x_1, x_2) \in C^2(1)\},
\]



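by Fubini’s theorem we have

\[
\lambda^d\big(C^d(1)\big) = \int_{\mathbb{R}^2} 1_2(1; x_1, x_2) \int_{\mathbb{R}^{d-2}} 1_{d-2}\Big(\sqrt{1 - x_1^2 - x_2^2};\, x_3, \ldots, x_d\Big)\,d\lambda^{d-2}(x_3, \ldots, x_d)\,dx_1\,dx_2.
\]

The inner integral is

\[
\lambda^{d-2}\Big(C^{d-2}\Big(\sqrt{1 - x_1^2 - x_2^2}\Big)\Big) = (1 - x_1^2 - x_2^2)^{(d-2)/2}\,\lambda^{d-2}\big(C^{d-2}(1)\big),
\]

hence, changing to polar coordinates,

\[
\begin{aligned}
\lambda^d\big(C^d(1)\big) &= \lambda^{d-2}\big(C^{d-2}(1)\big) \int_{x_1^2 + x_2^2 \le 1} (1 - x_1^2 - x_2^2)^{(d-2)/2}\,dx_1\,dx_2 \\
&= \lambda^{d-2}\big(C^{d-2}(1)\big) \int_0^{2\pi} \int_0^1 (1 - r^2)^{(d-2)/2}\, r\,dr\,d\theta \\
&= \frac{2\pi}{d}\,\lambda^{d-2}\big(C^{d-2}(1)\big).
\end{aligned}
\]
Iterating, we obtain (3.20).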
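The even/odd formula (3.20) and the recursion $\alpha_d = (2\pi/d)\,\alpha_{d-2}$ just derived can be compared numerically. The sketch below also compares both with the standard Gamma-function expression $\pi^{d/2}/\Gamma(d/2 + 1)$, which is not used in the text and is included here only as an independent cross-check; all function names are illustrative.

```python
import math

def alpha_closed_form(d):
    """Volume of the unit ball in R^d from the even/odd formula (3.20)."""
    denom = 1
    for k in range(d, 0, -2):      # d (d-2) ... down to 2 or 1
        denom *= k
    if d % 2 == 0:
        return (2 * math.pi) ** (d // 2) / denom
    return 2 * (2 * math.pi) ** ((d - 1) // 2) / denom

def alpha_recursive(d):
    """Same volume from the recursion alpha_d = (2 pi / d) alpha_{d-2}."""
    if d == 1:
        return 2.0
    if d == 2:
        return math.pi
    return 2 * math.pi / d * alpha_recursive(d - 2)

def alpha_gamma(d):
    """Cross-check: pi^(d/2) / Gamma(d/2 + 1) (an assumption external to the text)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in range(1, 9):
    print(d, alpha_closed_form(d), alpha_recursive(d), alpha_gamma(d))
```

For $d = 3$ all three expressions give $4\pi/3$, the familiar volume of the unit ball in space.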

Integration of Radial Functions

Let $S^{d-1} := \{x \in \mathbb{R}^d : |x| = 1\}$, where $|x|$ is the Euclidean norm of $x$. The theorem in this subsection asserts that the Lebesgue integral of a function on $\mathbb{R}^d$ may be calculated by a two-stage process, integrating first over $S^{d-1}$ with respect to a surface measure $\mu$ and then radially outward. The surface measure is constructed as follows: Set $\mathbb{R}^d_* := \mathbb{R}^d \setminus \{0\}$ and define a mapping

\[
T : (0, \infty) \times S^{d-1} \to \mathbb{R}^d_* \quad\text{by}\quad T(r, x) = rx.
\]

Then $T$ is continuous with continuous inverse

\[
T^{-1} : \mathbb{R}^d_* \to (0, \infty) \times S^{d-1}, \quad T^{-1}(x) = (|x|, x/|x|).
\]

Now define a measure $\mu$ on $\mathcal{B}(S^{d-1})$ by

\[
\mu(B) := d \cdot \lambda^d\big(T((0, 1] \times B)\big), \quad B \in \mathcal{B}(S^{d-1}).
\]

We then have
3.6.1 Theorem. If $f : \mathbb{R}^d_* \to \mathbb{K}$ is Borel measurable, then
\[
\int_{\mathbb{R}^d_*} f(x)\,d\lambda^d(x) = \int_0^\infty \int_{S^{d-1}} r^{d-1} f(rx)\,d\mu(x)\,dr
\]
in the sense that if one side of the equation is defined, then so is the other and equality holds.
Proof. Define a measure $\rho$ on $\mathcal{B}(0, \infty)$ by $d\rho := r^{d-1}\,d\lambda$. By Fubini’s theorem, the desired equation may be written
\[
\int_{\mathbb{R}^d_*} f(x)\,d\lambda^d(x) = \int_{(0,\infty) \times S^{d-1}} (f \circ T)(r, x)\,d(\rho \otimes \mu)(r, x).
\]

By 3.2.15, this is equivalent to $\lambda^d = T(\rho \otimes \mu)$ or $T^{-1}(\lambda^d) = \rho \otimes \mu$. By the uniqueness theorem for product measures it therefore suffices to show that
\[
\lambda^d\big(T(A \times B)\big) = \rho(A)\mu(B), \quad A \in \mathcal{B}(0, \infty),\ B \in \mathcal{B}(S^{d-1}).
\]

Since the collection of intervals is a π-system, by the measure uniqueness theorem it suffices to take $A = (a, b]$. The above equation then reduces to

\[
\lambda^d\big(T((a, b] \times B)\big) = d^{-1}(b^d - a^d)\,\mu(B).
\]

But this follows from the dilation property of $\lambda^d$, using the relations
\[
T\big((a, b] \times B\big) = T\big((0, b] \times B\big) \setminus T\big((0, a] \times B\big), \quad T\big((0, c] \times B\big) = c\,T\big((0, 1] \times B\big).
\]

Theorem 3.6.1 is useful for calculating integrals of radial functions, that is, functions f
on Rd of the form f (x) = g(|x|).
3.6.2 Corollary. Let $g$ be a Borel function on $(0, \infty)$. Then
\[
\int_{\mathbb{R}^d_*} g(|x|)\,d\lambda^d(x) = d\alpha_d \int_0^\infty r^{d-1} g(r)\,dr
\]
whenever either side is defined, where $\alpha_d$ is given in (3.20).

Proof. Let $f(x) = g(|x|)$ and note that $f(rx) = g(r)$ for $x \in S^{d-1}$. By the theorem,
\[
\int_{\mathbb{R}^d_*} g(|x|)\,d\lambda^d(x) = \mu(S^{d-1}) \int_0^\infty r^{d-1} g(r)\,dr.
\]

Taking $g(r) = e^{-r^2}$ we have
\[
\int_{\mathbb{R}^d_*} e^{-|x|^2}\,dx = \mu(S^{d-1}) \int_0^\infty r^{d-1} e^{-r^2}\,dr = \mu(S^{d-1})\,\frac{\pi^{d/2}}{d\alpha_d},
\]
where the last equality is from Ex. 3.74. By Fubini’s theorem and (3.16), the integral on the left is $\pi^{d/2}$. Therefore, $\mu(S^{d-1}) = d\alpha_d$, completing the proof.
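The identity behind the last step, $d\alpha_d \int_0^\infty r^{d-1} e^{-r^2}\,dr = \pi^{d/2}$, can be checked numerically. The sketch below uses a midpoint rule on a truncated interval; the function names, the cutoff, and the step count are illustrative assumptions, and $\alpha_d$ is computed from the standard Gamma-function expression rather than from (3.20).

```python
import math

def unit_ball_volume(d):
    """alpha_d via pi^(d/2) / Gamma(d/2 + 1), assumed here as an equivalent of (3.20)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def radial_integral(d, upper=12.0, steps=120_000):
    """Midpoint-rule approximation of the integral of r^(d-1) exp(-r^2) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += r ** (d - 1) * math.exp(-r * r)
    return total * h

for d in (1, 2, 3, 5):
    lhs = d * unit_ball_volume(d) * radial_integral(d)   # d * alpha_d * radial integral
    print(d, lhs, math.pi ** (d / 2))
```

The two printed columns should agree to several decimal places for each dimension tried.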
Corollary 3.6.2 may be used to establish the integrability of certain functions on Rd :
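3.6.3 Example. Let $f(x) = (1 + c|x|^s)^{-t}$, where $c, s, t > 0$. Then
\[
(d\alpha_d)^{-1} \int_{\mathbb{R}^d} f\,d\lambda^d = \int_0^\infty \frac{r^{d-1}}{(1 + c r^s)^t}\,dr \le \int_0^1 r^{d-1}\,dr + c^{-t} \int_1^\infty \frac{1}{r^{st - d + 1}}\,dr.
\]

Hence if $st > d$, then $f$ is integrable on $\mathbb{R}^d$. ♦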

Surface Area of a d-Dimensional Ball


We use Corollary 3.6.2 to derive the following formula for the surface area of the sphere $S = S_r^{d-1} = \{x \in \mathbb{R}^d : |x| = r\}$:

\[
\operatorname{area}(S) = d r^{d-1} \alpha_d = d r^{-1} \lambda^d\big(C_r^d(0)\big), \tag{3.21}
\]

where $\alpha_d$ is given in (3.20).



As a starting point, we take as the definition of the surface area of a graph

\[
\{x \in \mathbb{R}^d : x_d = h(x_1, \ldots, x_{d-1}),\ (x_1, \ldots, x_{d-1}) \in U\}, \quad U \subseteq \mathbb{R}^{d-1} \text{ open},
\]
the integral
\[
\int_U \sqrt{1 + |\nabla h(u)|^2}\,d\lambda^{d-1}(u), \tag{†}
\]
where $h$ is $C^1$ on $U$. Let $H := \{x \in S : x_d > 0\}$, the upper hemisphere of $S$, and for $0 < t < r$ let $H_t$ denote the part of $H$ for which $(x_1^2 + \cdots + x_{d-1}^2)^{1/2} < t$. Then $H_t$ is the graph of
\[
h_t(u_1, \ldots, u_{d-1}) = \sqrt{r^2 - |u|^2}, \quad |u| = (u_1^2 + \cdots + u_{d-1}^2)^{1/2} < t,
\]
hence from (†)
\[
\operatorname{area}(H_t) = \int_{|u| < t} r\,(r^2 - |u|^2)^{-1/2}\,du = r(d-1)\alpha_{d-1} \int_0^t \frac{s^{d-2}}{\sqrt{r^2 - s^2}}\,ds,
\]
the last equality by 3.6.2. By the substitution $s = xr$, we have
\[
\operatorname{area}(H_t) = (d-1) r^{d-1} \alpha_{d-1} \int_0^{t/r} \frac{x^{d-2}}{\sqrt{1 - x^2}}\,dx.
\]
Therefore, defining $\operatorname{area}(H) := \lim_{t \to r} \operatorname{area}(H_t)$,² we have
\[
\operatorname{area}(H) = (d-1) r^{d-1} \alpha_{d-1} \int_0^1 \frac{x^{d-2}}{\sqrt{1 - x^2}}\,dx.
\]
By Exercises 3.75 and 3.76, the integral on the right is
\[
\frac{\alpha_{d-2}}{2\alpha_{d-3}} = \frac{d}{d-1} \cdot \frac{\alpha_d}{2\alpha_{d-1}}.
\]
(Here we assume $d \ge 3$, the desired formula for the cases $d = 1, 2$ being clear.) Therefore, $\operatorname{area}(S) = 2 \cdot \operatorname{area}(H) = d r^{d-1} \alpha_d$. Recalling that $r^d \alpha_d$ is the volume of the ball of radius $r$ in $\mathbb{R}^d$, (3.21) follows.

Exercises
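3.73 Let $F$ be a continuous distribution function on $\mathbb{R}$ with finite limits

\[
\alpha := \lim_{t \to -\infty} F(t) \quad\text{and}\quad \beta := \lim_{t \to \infty} F(t).
\]

Calculate $\int F(t)\,dF(t)$ in terms of $\alpha$ and $\beta$.
3.74 [↑ 3.40] Establish the formula $\displaystyle\int_0^\infty x^{n-1} e^{-x^2}\,dx = \frac{\pi^{n/2}}{n\alpha_n}$.
3.75 Show that $\alpha_1 = 2$ and $\alpha_2 = \pi$. Define $\alpha_0 = 1$. Prove that for $n \ge 3$,
\[
\begin{aligned}
\frac{(n-1)(n-3) \cdots 2}{n(n-2) \cdots 3} &= \frac{\alpha_n}{2\alpha_{n-1}} = \frac{n-1}{n} \cdot \frac{\alpha_{n-2}}{2\alpha_{n-3}} && \text{if $n$ is odd,} \\
\frac{(n-1)(n-3) \cdots 3}{n(n-2) \cdots 2} &= \frac{\alpha_n}{\pi\alpha_{n-1}} = \frac{n-1}{n} \cdot \frac{\alpha_{n-2}}{\pi\alpha_{n-3}} && \text{if $n$ is even.}
\end{aligned}
\]
Thus for all $n \ge 3$,
\[
\frac{\alpha_n}{\alpha_{n-1}} = \frac{n-1}{n} \cdot \frac{\alpha_{n-2}}{\alpha_{n-3}}.
\]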
2 This may be justified from general measure theory on surfaces.
3.76 [↑ 3.39] Show that for $n \ge 1$, $\displaystyle\int_0^1 \frac{x^n}{\sqrt{1 - x^2}}\,dx = \frac{\alpha_n}{2\alpha_{n-1}}$, where $\alpha_0 := 1$.

3.77 Let $1 \le d < m$, $U \subseteq \mathbb{R}^d$ open and $f : U \to \mathbb{R}^m$ such that $|f(x) - f(y)| \le C|x - y|^p$, where $C > 0$ and $p > d/m$. Verify the following to prove that $\lambda^m\big(f(U)\big) = 0$. What if $p \le d/m$?
(a) It suffices to prove that $\lambda^m\big(f(I)\big) = 0$ for a $d$-dimensional interval $I = [a, b] \subseteq U$.
(b) For fixed $n \in \mathbb{N}$ and each $k$ form the partition $P_{k,n} = \{a_k + j(b_k - a_k)/n : j = 0, \ldots, n\}$ of the $k$th coordinate interval $[a_k, b_k]$ of $I$, $k = 1, \ldots, d$. Let $J_k \in P_{k,n}$, $J := J_1 \times \cdots \times J_d$ and $y$ the midpoint of $J$. Then for all $x \in J$
\[
|f(x) - f(y)| \le C n^{-p} \big((a_1 - b_1)^2 + \cdots + (a_d - b_d)^2\big)^{p/2} =: M n^{-p}.
\]
(c) $\lambda^m\big(f(J)\big) \le \alpha_m (n^{-p} M)^m$.
(d) $\lambda^m\big(f(I)\big) \le n^{d - mp} M'$.


3.78 Show that the measure µ in 3.6.1 satisfies L(µ) = | det(L−1 )| µ for any 1-1 linear transformation
L on Rd for which |L(x)| = |x|. In particular, µ is invariant under rotations.

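3.79 Let $M$, $a$, and $\varepsilon$ be positive constants. Suppose $f$ is Borel measurable on $\mathbb{R}^d$ and satisfies
\[
|f(x)| \le
\begin{cases}
M|x|^{\varepsilon - d} & \text{if } |x| \le a, \\
M|x|^{-\varepsilon - d} & \text{if } |x| > a.
\end{cases}
\]

Prove that $f$ is integrable on $\mathbb{R}^d$.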

3.80 Let $0 \le a < b \le \infty$ and set $A(a, b) = \{x \in \mathbb{R}^d : a < |x| < b\}$. Prove that
\[
\int_{A(a,b)} f(x)\,d\lambda^d(x) = \int_a^b \int_{S^{d-1}} r^{d-1} f(rx)\,d\mu(x)\,dr,
\]

where $\mu$ is the measure in 3.6.1.


3.81 [↑ (3.19)] Show that $\displaystyle\int_0^\pi \cdots \int_0^\pi (\sin^{d-2}\theta_1)(\sin^{d-3}\theta_2) \cdots (\sin\theta_{d-2})\,d\theta_1 \cdots d\theta_{d-2} = \frac{d\alpha_d}{2\pi}$.
Chapter 4
Lp Spaces

In this chapter we examine the properties of spaces of measurable functions $f$ for which $|f|^p$ ($p > 0$) is integrable, the so-called $L^p$ spaces. These are among the most important examples of Banach spaces. In particular, the case $p = 2$ is of critical importance in Fourier analysis.

Throughout the chapter, unless otherwise stated,


(X, F, µ) denotes an arbitrary measure space.

4.1 Definition and General Properties


The Lp spaces fall naturally into three categories depending on the range of values of p.

The Case 1 ≤ p < ∞


The space of $L^p$ functions on $X$ is defined by

\[
L^p(X, \mathcal{F}, \mu) := \{f : X \to \mathbb{K} : f \text{ is $\mathcal{F}$-measurable and } \|f\|_p < \infty\}, \quad\text{where}\quad \|f\|_p := \Big(\int |f|^p\,d\mu\Big)^{1/p}.
\]

If there is no ambiguity, we write $L^p(X)$, $L^p(\mu)$, or $L^p$ instead of $L^p(X, \mathcal{F}, \mu)$. Note that $L^1(\mu)$ is just the space of $\mu$-integrable functions.
The quantity $\|f\|_p$ is called the $L^p$ norm of $f$. This terminology is a slight abuse of language, since the property of positivity of a norm does not always hold. Indeed, $\|f\|_p = 0$ implies only that $f = 0$ a.e. We resolve this discrepancy informally by identifying functions that are equal a.e. This will cause no problems as long as the reader keeps in mind that the symbol $f$ has the dual interpretation of a function as well as the equivalence class of all measurable functions equal a.e. to $f$. A precise resolution may be given in terms of quotient spaces. (See Ex. 8.56.)
The following inequality will be needed to establish that $\|\cdot\|_p$ is indeed a norm (subject to the aforementioned convention of identifying functions that are equal a.e.).
4.1.1 Lemma. Let $a, b > 0$ and $0 < t < 1$. Then $a^t b^{1-t} \le ta + (1 - t)b$, equality holding iff $a = b$.
Proof. Equality clearly holds if $a = b$. Assume $a < b$ and set $x = ta + (1 - t)b$. (The case $a > b$ follows by interchanging $a$ and $b$ and replacing $t$ by $1 - t$.) To prove that $a^t b^{1-t} < x$ we use the strict concavity of $\ln x$, established as follows: By the mean value theorem there exist $y \in (a, x)$ and $z \in (x, b)$ such that

\[
\frac{\ln b - \ln x}{b - x} = \frac{1}{z} < \frac{1}{y} = \frac{\ln x - \ln a}{x - a}.
\]

Solving for $\ln x$ we have

\[
\frac{b - x}{b - a} \ln a + \frac{x - a}{b - a} \ln b < \ln x.
\]
Since $x - a = (1 - t)(b - a)$ and $b - x = t(b - a)$, the last inequality becomes $\ln a^t + \ln b^{1-t} < \ln x$, proving the lemma.
We may now prove the following fundamental inequality:
4.1.2 Hölder’s Inequality. Let $1 < p, q < \infty$ with $p^{-1} + q^{-1} = 1$. If $f \in L^p$ and $g \in L^q$, then $fg \in L^1$ and $\|fg\|_1 \le \|f\|_p \|g\|_q$. Moreover, equality holds iff there exist nonnegative constants $a$ and $b$, not both zero, such that $a|f|^p = b|g|^q$ a.e.
Proof. If the right side of the desired inequality is zero, then either $f$ or $g$ is zero a.e. and the inequality holds trivially. So assume $\|f\|_p \|g\|_q > 0$. Fix $x$ and set
\[
F(x) = \frac{|f(x)|^p}{\|f\|_p^p} \quad\text{and}\quad G(x) = \frac{|g(x)|^q}{\|g\|_q^q}.
\]
By the lemma with $t = 1/p$,
\[
\frac{|f(x)|}{\|f\|_p} \cdot \frac{|g(x)|}{\|g\|_q} = F(x)^{1/p} G(x)^{1/q} \le \frac{F(x)}{p} + \frac{G(x)}{q} = \frac{|f(x)|^p}{p\|f\|_p^p} + \frac{|g(x)|^q}{q\|g\|_q^q}. \tag{†}
\]
Integrating we obtain
\[
\frac{1}{\|f\|_p \|g\|_q} \int |f(x) g(x)|\,d\mu(x) \le \frac{\|f\|_p^p}{p\|f\|_p^p} + \frac{\|g\|_q^q}{q\|g\|_q^q} = 1,
\]
which gives the desired inequality.
For the second part of the theorem, we may again assume that $\|f\|_p \|g\|_q > 0$, since if $\|f\|_p = 0$, say, then the desired equality holds with $a = 1$ and $b = 0$. With this assumption we see that $\|fg\|_1 = \|f\|_p \|g\|_q$ iff the integral of the right side of (†) equals the integral of the left side iff the two sides are equal a.e. iff $F(x) = G(x)$ a.e. (by the lemma).
The numbers $p$ and $q$ in the theorem are called conjugate exponents. In the special case $p = q = 2$, the inequality in 4.1.2 is known as the Cauchy-Schwarz inequality.
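Hölder’s inequality and its equality case can be illustrated on a finite measure space, where every integral is a weighted sum. The following is a minimal sketch under that assumption; the weights, the sample functions, and the helper names are hypothetical.

```python
def p_norm(values, weights, p):
    """L^p norm of a function on a finite set carrying point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)

def holder_sides(f, g, weights, p):
    """Return (||fg||_1, ||f||_p ||g||_q) with q the conjugate exponent of p."""
    q = p / (p - 1)
    lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))
    rhs = p_norm(f, weights, p) * p_norm(g, weights, q)
    return lhs, rhs

weights = [0.5, 1.0, 2.0, 0.25]           # hypothetical measure on four points
f = [1.0, -2.0, 0.5, 3.0]
g = [2.0, 1.0, -1.0, 0.5]
p = 3.0
q = p / (p - 1)

lhs, rhs = holder_sides(f, g, weights, p)
print(lhs <= rhs, lhs, rhs)

# Equality case: choose g_eq with |g_eq|^q = |f|^p, so that a|f|^p = b|g_eq|^q with a = b = 1.
g_eq = [abs(a) ** (p / q) for a in f]
lhs_eq, rhs_eq = holder_sides(f, g_eq, weights, p)
print(lhs_eq, rhs_eq)
```

In the equality case both sides reduce to $\int |f|^p\,d\mu$, since $1 + p/q = p$ and $1/p + 1/q = 1$.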
4.1.3 Minkowski’s Inequality. Let $f, g \in L^p(\mu)$ $(1 < p < \infty)$. Then $f + g \in L^p$ and $\|f + g\|_p \le \|f\|_p + \|g\|_p$. Moreover, equality holds iff there exist nonnegative constants $a$ and $b$, not both zero, such that $af = bg$ a.e.
Proof. Since $|f + g|^p \le 2^p(|f|^p + |g|^p)$, $f + g \in L^p$. For the inequality, apply Hölder’s inequality to the conjugate exponents $p$ and $q := p/(p - 1)$ to obtain
\[
\begin{aligned}
\|f + g\|_p^p = \int |f + g|^p &\le \int |f|\,|f + g|^{p-1} + \int |g|\,|f + g|^{p-1} \\
&\le \Big(\int |f|^p\Big)^{1/p} \Big(\int |f + g|^p\Big)^{1/q} + \Big(\int |g|^p\Big)^{1/p} \Big(\int |f + g|^p\Big)^{1/q} \\
&= (\|f\|_p + \|g\|_p)\,\|f + g\|_p^{p-1},
\end{aligned}
\tag{α}
\]
which is equivalent to $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Now suppose that $\|f + g\|_p = \|f\|_p + \|g\|_p$. Then the inequalities in (α) are equalities. From the second of these we have
\[
\int |f|\,|f + g|^{p-1} = \Big(\int |f|^p\Big)^{1/p} \Big(\int |f + g|^p\Big)^{1/q}
\quad\text{and}\quad
\int |g|\,|f + g|^{p-1} = \Big(\int |g|^p\Big)^{1/p} \Big(\int |f + g|^p\Big)^{1/q},
\]
hence, by the second part of 4.1.2, there exist nonnegative constants $a_1, b_1$ not both zero and nonnegative constants $a_2, b_2$ not both zero such that

\[
a_1 |f|^p = b_1 |f + g|^p \quad\text{and}\quad a_2 |g|^p = b_2 |f + g|^p \quad\text{a.e.} \tag{β}
\]

Now, if $f + g = 0$ a.e., then $0 = \|f\|_p + \|g\|_p$, hence $1 \cdot f = 0 = 1 \cdot g$ a.e. Also, if $f = 0$ a.e., then $1 \cdot f = 0 \cdot g$, and similarly if $g = 0$ a.e. Thus we may suppose that $\mu\{f + g \ne 0\} > 0$, $\mu\{f \ne 0\} > 0$, and $\mu\{g \ne 0\} > 0$. It follows that none of the constants $a_j$ and $b_j$ is zero and so by (β)
\[
|f| = a|f + g| = b|g| \quad\text{a.e.,} \tag{γ}
\]
for suitable positive constants $a$ and $b$. Since the first inequality in (α) is an equality,

\[
|f|\,|f + g|^{p-1} + |g|\,|f + g|^{p-1} = |f + g|^p \quad\text{a.e.,}
\]

hence
\[
|f + g| = |f| + |g| = (1 + b)|g| \quad\text{a.e. on the set } E := \{f + g \ne 0\}.
\]
Therefore,

\[
\Big|1 + \frac{f}{g}\Big| = 1 + b = 1 + \Big|\frac{f}{g}\Big| \quad\text{a.e. on } E.
\]

It follows that $f/g$ is real and nonnegative and so $f = bg$ a.e. on $E$. But by (γ), $f = g = 0$ a.e. on $E^c$. Therefore, $f = bg$ a.e. on $X$.
4.1.4 Theorem. $L^p$ is a linear space over $\mathbb{K}$ with respect to pointwise addition and scalar multiplication of functions. Moreover, $\|\cdot\|_p$ is a norm relative to which $L^p$ is a Banach space.
Proof. That $L^p$ is closed under scalar multiplication is clear. The triangle inequality for the case $1 < p < \infty$ is Minkowski’s inequality; for $p = 1$ it follows directly from $|f + g| \le |f| + |g|$. In particular, $L^p$ is closed under addition and so is a linear space. The remaining properties of a norm are clear. It remains to prove that $L^p$ is complete. For this we use the series characterization of completeness given in 0.4.3. Let $(f_n)$ be a sequence in $L^p$ such that $\sum_{n=1}^\infty \|f_n\|_p < \infty$ and set
\[
g_n = \sum_{k=1}^n |f_k| \quad\text{and}\quad g := \sum_{k=1}^\infty |f_k|,
\]
so that $g_n \uparrow g$ and $\|g_n\|_p \le \sum_{k=1}^n \|f_k\|_p$. By the monotone convergence theorem, $\|g_n\|_p \uparrow \|g\|_p$, hence $\|g\|_p \le \sum_{k=1}^\infty \|f_k\|_p < \infty$. Therefore, $g^p$ is integrable, hence $g$ is finite a.e. and the series $f := \sum_{k=1}^\infty f_k$ converges absolutely a.e. Defining $f$ to be zero where the series fails to converge, we see that $f$ is measurable and satisfies
\[
\Big|f - \sum_{k=1}^n f_k\Big|^p \le g^p \quad\text{and}\quad \lim_n \Big|f - \sum_{k=1}^n f_k\Big|^p = 0 \quad\text{a.e.}
\]
By the dominated convergence theorem, $\lim_n \big\|f - \sum_{k=1}^n f_k\big\|_p = 0$.

The following generalization of Minkowski’s inequality will be needed in Chapter 6.



4.1.5 Minkowski’s Integral Inequality. Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$ be σ-finite measure spaces, $f$ a nonnegative $\mathcal{F} \otimes \mathcal{G}$-measurable function, and $1 \le p < \infty$. Then
\[
\Big[\int \Big(\int f(x, y)\,d\nu(y)\Big)^p d\mu(x)\Big]^{1/p} \le \int \Big[\int f(x, y)^p\,d\mu(x)\Big]^{1/p} d\nu(y), \tag{4.1}
\]

provided the integrals in these expressions are finite.

Proof. For $p = 1$ the inequality is actually equality and is a consequence of Fubini’s theorem. Now let $1 < p < \infty$ and let $q$ be conjugate to $p$. Set $h(x) := \int f(x, y)\,d\nu(y)$, so that the left side of (4.1) is $\|h\|_p$. We may assume that $\|h\|_p > 0$. Set
\[
g(x) :=
\begin{cases}
\|h\|_p^{1-p}\, h(x)^{p-1} & \text{if } h(x) \ne 0, \\
0 & \text{otherwise.}
\end{cases}
\]

Then
\[
g(x)^q = \|h\|_p^{q - qp}\, h(x)^{qp - q} = \|h\|_p^{-p}\, h(x)^p, \quad \|g\|_q = 1, \quad\text{and}\quad \int hg\,d\mu = \|h\|_p. \tag{†}
\]

By Fubini’s theorem and Hölder’s inequality,

\[
\int hg\,d\mu = \iint f(x, y) g(x)\,d\mu(x)\,d\nu(y) \le \int \Big[\int f(x, y)^p\,d\mu(x)\Big]^{1/p} \|g\|_q\,d\nu(y),
\]

which, by virtue of (†), reduces to the desired inequality.
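On a product of two finite measure spaces, both sides of (4.1) are finite sums, so the inequality can be checked directly. The sketch below assumes point masses `mu[i]` and `nu[j]` and a nonnegative array `f[i][j]`; all names and data are hypothetical.

```python
def minkowski_integral_sides(f, mu, nu, p):
    """Both sides of (4.1) for f[i][j] >= 0 on a finite product space.

    mu[i] and nu[j] are point masses; rows index x and columns index y.
    """
    rows, cols = range(len(mu)), range(len(nu))
    # Left side: L^p(mu) norm of x -> integral of f(x, y) dnu(y).
    left = sum(mu[i] * sum(nu[j] * f[i][j] for j in cols) ** p
               for i in rows) ** (1 / p)
    # Right side: integral over y of the L^p(mu) norm of f(., y).
    right = sum(nu[j] * sum(mu[i] * f[i][j] ** p for i in rows) ** (1 / p)
                for j in cols)
    return left, right

f = [[1.0, 0.0, 2.0],
     [0.5, 3.0, 1.0]]           # hypothetical nonnegative data
mu = [1.0, 2.0]
nu = [0.5, 1.0, 0.25]

for p in (1.0, 2.0, 4.5):
    left, right = minkowski_integral_sides(f, mu, nu, p)
    print(p, left, right, left <= right + 1e-12)
```

For $p = 1$ the two sides coincide, in line with the first sentence of the proof.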

The Case p = ∞
The space of $L^\infty$ functions on $X$ is defined by

\[
L^\infty(X, \mathcal{F}, \mu) = \{f : X \to \mathbb{K} : f \text{ is $\mathcal{F}$-measurable and } \|f\|_\infty < \infty\}, \quad\text{where}\quad \|f\|_\infty := \sup\{t : \mu\{|f| > t\} > 0\}.
\]

The quantity $\|f\|_\infty$ is called the $L^\infty$ norm of $f$. As in the case $p < \infty$, for $\|f\|_\infty$ to be an actual norm we must (and do) identify functions that agree a.e. (see 4.1.7 below). The following properties of $\|\cdot\|_\infty$ will be needed.
4.1.6 Proposition. Let $f$ be measurable. Then
(a) $|f| \le \|f\|_\infty$ a.e.
(b) $\|f\|_\infty = \inf\{t > 0 : |f| \le t \text{ a.e.}\}$.
(c) $f \in L^\infty$ iff there exists $0 < t < \infty$ such that $|f| \le t$ a.e.
Proof. (a) We may assume that $\|f\|_\infty < \infty$. Set $A_n := \{|f| > \|f\|_\infty + 1/n\}$ and $A = \bigcup_n A_n$. By definition of $\|f\|_\infty$, $\mu(A_n) = 0$ for all $n$, hence $\mu(A) = 0$. Since $|f| \le \|f\|_\infty + 1/n$ on $A^c$ for every $n$, $|f| \le \|f\|_\infty$ a.e.
(b) Let $\alpha$ denote the infimum. By (a), $\alpha \le \|f\|_\infty$. For the reverse inequality, let $|f| \le t$ a.e. If $\|f\|_\infty > t$, there would exist $x$ with $t < x \le \|f\|_\infty$ and $\mu(|f| > t) \ge \mu(|f| > x) > 0$, impossible. Thus $\|f\|_\infty \le t$, and taking infima over all such $t$ yields $\|f\|_\infty \le \alpha$.
Part (c) follows from (a) and (b).
4.1.7 Theorem. $L^\infty$ is a linear space over $\mathbb{K}$ with respect to pointwise addition and scalar multiplication of functions. Moreover, $\|\cdot\|_\infty$ is a norm relative to which $L^\infty$ is a Banach space.

Proof. Let $f, g \in L^\infty$ and $c \in \mathbb{K}$. The inequalities

\[
|cf| = |c|\,|f| \le |c|\,\|f\|_\infty \quad\text{and}\quad |f + g| \le |f| + |g| \le \|f\|_\infty + \|g\|_\infty \quad\text{(a.e.)}
\]

show that $L^\infty$ is a linear space and that $\|\cdot\|_\infty$ satisfies the triangle inequality. Moreover, from 4.1.6, $\|f\|_\infty \ge 0$, equality holding iff $f = 0$ a.e.
To see that $L^\infty$ is complete, we use 0.4.3 again. Let $(f_n)$ be a sequence in $L^\infty$ such that $\sum_{n=1}^\infty \|f_n\|_\infty < \infty$. By 4.1.6, the sets $N_k := \{|f_k| > \|f_k\|_\infty\}$ have measure zero, hence so does $N := \bigcup_k N_k$. Moreover, the series $\sum_{n=1}^\infty |f_n|$ converges on $N^c$, hence the function $f := 1_{N^c} \sum_{n=1}^\infty f_n$ is finite a.e., measurable, and is a version of $\sum_{n=1}^\infty f_n$. Since
\[
\Big|f - \sum_{k=1}^n f_k\Big| = \Big|\sum_{k>n} f_k\Big| \le \sum_{k>n} \|f_k\|_\infty \quad\text{a.e.,}
\]

by 4.1.6(b) we have
\[
\Big\|f - \sum_{k=1}^n f_k\Big\|_\infty \le \sum_{k>n} \|f_k\|_\infty.
\]
This shows that $f \in L^\infty$ and that $\sum_{n=1}^\infty f_n$ converges to $f$ in the $L^\infty$ norm.
Hölder’s inequality may now be extended to the case $1 \le p \le \infty$, where $\frac{1}{\infty} := 0$:

\[
\|fg\|_1 \le \|f\|_1 \|g\|_\infty, \quad f \in L^1,\ g \in L^\infty.
\]

The verification is left to the reader.

The Case 0 < p < 1


The definitions of $\|f\|_p$ and $L^p(\mu)$ for the case $1 \le p < \infty$ clearly make sense for $0 < p < 1$. However, for such $p$ the notion of conjugate exponents is not possible, hence Hölder’s inequality is not available. Furthermore, it is easy to see that Minkowski’s inequality does not hold. Indeed, if $A$ and $B$ are disjoint sets of positive measure $a$ and $b$, respectively, then the triangle inequality for $f = 1_A$ and $g = 1_B$ reduces to $(a + b)^{1/p} \le a^{1/p} + b^{1/p}$, which is false because $1/p > 1$. On the other hand, it is the case that
\[
\|f + g\|_p^p \le \|f\|_p^p + \|g\|_p^p
\]

(Ex. 4.2), which implies that $L^p(\mu)$ is a linear space and $d(f, g) = \|f - g\|_p^p$ is a metric. One may prove, as in the case $p \ge 1$, that $L^p(\mu)$ is complete in this metric.
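The failure of the triangle inequality for $0 < p < 1$, and the substitute inequality for $\|\cdot\|_p^p$, can be seen on a two-point measure space. The sketch below takes $p = 1/2$ with $A$ and $B$ the two points, carrying hypothetical masses $a = 2$ and $b = 3$; the function names are illustrative.

```python
def p_power_norm(values, weights, p):
    """Integral of |f|^p, that is ||f||_p^p, on a finite measure space."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))

def quasi_norm(values, weights, p):
    """||f||_p = (integral of |f|^p)^(1/p); not a norm when 0 < p < 1."""
    return p_power_norm(values, weights, p) ** (1 / p)

p = 0.5
weights = [2.0, 3.0]            # measures a = 2 and b = 3 of the disjoint sets A and B
f = [1.0, 0.0]                  # indicator of A
g = [0.0, 1.0]                  # indicator of B
f_plus_g = [u + v for u, v in zip(f, g)]

# (a + b)^(1/p) versus a^(1/p) + b^(1/p): the triangle inequality fails.
print(quasi_norm(f_plus_g, weights, p), quasi_norm(f, weights, p) + quasi_norm(g, weights, p))

# ||f + g||_p^p versus ||f||_p^p + ||g||_p^p: the p-th power inequality holds.
print(p_power_norm(f_plus_g, weights, p),
      p_power_norm(f, weights, p) + p_power_norm(g, weights, p))
```

With these values the first line compares 25 with 13, while the second compares 5 with 5.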

ℓp-Spaces
An important special case of an $L^p$ space is obtained by taking $X = \mathbb{N}$ and $\mu$ = counting measure on $\mathbb{N}$. In this case we write $\ell^p(\mathbb{N})$ instead of $L^p(\mathbb{N})$. Thus for $1 \le p < \infty$,
\[
\ell^p(\mathbb{N}) := \Big\{ x := (x_n) : x_n \in \mathbb{K},\ \|x\|_p^p = \sum_{n=1}^\infty |x_n|^p < \infty \Big\},
\]

and for $p = \infty$
\[
\ell^\infty(\mathbb{N}) := \Big\{ x := (x_n) : x_n \in \mathbb{K},\ \|x\|_\infty = \sup_n |x_n| < \infty \Big\}.
\]

Note that $\mathbb{K}^d$ may be identified with a linear subspace of $\ell^p(\mathbb{N})$ and, as such, inherits the $\ell^p$ norm. The case $p = 2$ is simply the Euclidean norm.



Exercises
4.1 Let $a, b > 0$ and $p \ge 1$. Prove that $(a + b)^p \le 2^{p-1}(a^p + b^p)$. ⟦Consider $\varphi(x) = x^p$.⟧
4.2 Let $a, b > 0$ and $0 < p < 1$. Prove that $(a + b)^p \le a^p + b^p$. ⟦Consider the function $\varphi(x) = a^p + x^p - (a + x)^p$.⟧
4.3 Show that the mapping $(f, g) \mapsto \int fg : L^2(\mu) \times L^2(\mu) \to \mathbb{C}$ is continuous in the $L^2$ norm.

4.4 Show that $f, g \in L^p(\mu) \Rightarrow f \vee g,\ f \wedge g \in L^p(\mu)$.

4.5 Prove Hölder’s equality: If $1 < p < \infty$, $p^{-1} + q^{-1} = 1$, and $f \in L^q(\mu)$, then
\[
\int |f| \cdot |f|^{q/p}\,d\mu = \Big(\int |f|^q\,d\mu\Big)^{1/p} \Big(\int |f|^q\,d\mu\Big)^{1/q}.
\]
4.6 Let $f_i$ be measurable and $p_i > 1$ such that $\sum_{i=1}^n 1/p_i = 1$ (generalized conjugate exponents). Prove the generalized Hölder’s inequality
\[
\int \prod_{i=1}^n |f_i|\,d\mu \le \prod_{i=1}^n \|f_i\|_{p_i}.
\]

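4.7 Let $f$ be continuous and bounded on $\mathbb{R}^d$. Show that $\|f\|_\infty = \sup\{|f(x)| : x \in \mathbb{R}^d\}$ (relative to Lebesgue measure).
4.8 Let $f : X \to \mathbb{C}$ be measurable. The essential range of $f$ is defined as
\[
\operatorname{ran}_e(f) = \{z \in \mathbb{C} : \mu\{|f - z| < \varepsilon\} > 0 \text{ for all } \varepsilon > 0\}.
\]
Prove:
(a) $\operatorname{ran}_e(f)$ is closed and contained in $\operatorname{cl} f(X)$.
(b) $f = g$ a.e. $\Rightarrow \operatorname{ran}_e(f) = \operatorname{ran}_e(g)$.
(c) $\operatorname{ran}_e(f) = \bigcap_{f = g \text{ a.e.}} \operatorname{cl} g(X)$.
(d) If $f \in L^\infty$, then $\operatorname{ran}_e(f)$ is compact and $\|f\|_\infty = \sup\{x : x \in \operatorname{ran}_e(|f|)\}$.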
4.9 Let $1 < p < \infty$, $0 < r < 1$, and $f \in L^p\big((0, \infty), \lambda\big)$. Define $g(x, y) := f(x) x^{-1} \sin(xy)$.
(a) Show that $g(\cdot, y)$ is integrable for each $y > 0$.
(b) Define $h(y) = \int_0^\infty g(x, y)\,dx$. Prove that
\[
\lim_{t \to 0} \frac{|h(y + t^p) - h(y)|}{t^r} = 0 \quad\text{uniformly in } y > 0.
\]
4.10 Let $1 \le p < q < \infty$. Prove:
(a) If $\mu(X) < \infty$, then $L^\infty \subseteq L^q \subseteq L^p \subseteq L^1$, where the inclusions may be strict.
(b) $\ell^\infty \supseteq \ell^q \supseteq \ell^p \supseteq \ell^1$, where the inclusions are strict.
(c) For $1 \le p < q \le \infty$, neither of the spaces $L^p(\lambda^1)$ or $L^q(\lambda^1)$ is contained in the other.
4.11 Let $\mu$ be a probability measure and let $f$ and $g$ be positive and measurable such that $fg \ge 1$. Prove that $\int f\,d\mu \int g\,d\mu \ge 1$.
4.12 Let $\mu(X) < \infty$ and $f$ bounded and measurable. Show that $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$ via the following steps.
(a) For $1 \le q < p < \infty$, $\|f\|_p^p \le \|f\|_q^q\, \|f\|_\infty^{p-q}$.
(b) $\limsup_{p \to \infty} \|f\|_p \le \|f\|_\infty$.
(c) Assume $\|f\|_\infty > 0$. Let $0 < r < \|f\|_\infty$ and $r < t \le \|f\|_\infty$ such that $\mu(E_t) > 0$, where $E_t = \{|f| > t\}$. Then $\liminf_{p \to \infty} \|f\|_p \ge r$.
(d) Conclude that $\liminf_{p \to \infty} \|f\|_p \ge \|f\|_\infty$.

4.13 Let $1 \le p, q, r < \infty$, $r^{-1} = p^{-1} + q^{-1}$. Prove that if $f \in L^p$ and $g \in L^q$, then $fg \in L^r$ and $\|fg\|_r \le \|f\|_p \|g\|_q$.

4.14 Let $f$ and $g$ be nonnegative and measurable and $0 < p < q < r < \infty$. Prove:
(a) $\Big(\int f g^q\,d\mu\Big)^{r-p} \le \Big(\int f g^p\,d\mu\Big)^{r-q} \Big(\int f g^r\,d\mu\Big)^{q-p}$.
(b) $\Big(\int f g\,d\mu\Big)^{r} \le \Big(\int f\,d\mu\Big)^{r-1} \Big(\int f g^r\,d\mu\Big)$ for $r > 1$.

4.15 Let $0 < p < r < q \le \infty$. Prove the following:
(a) $L^r \subseteq L^p + L^q$. ⟦For $f \in L^r$, let $A = \{|f| > 1\}$ and consider $f \cdot 1_A$ and $f \cdot 1_{A^c}$.⟧
(b) $L^p \cap L^q \subseteq L^r$ and for $f \in L^p \cap L^q$,
\[
\|f\|_r \le \|f\|_p^s\, \|f\|_q^t, \quad\text{where}\quad s := \frac{r^{-1} - q^{-1}}{p^{-1} - q^{-1}} \quad\text{and}\quad t := \frac{p^{-1} - r^{-1}}{p^{-1} - q^{-1}}.
\]
⟦$s + t = 1$. If $q = \infty$, then $rs/p = 1$; if $q < \infty$, then $p/sr$ and $q/tr$ are conjugate exponents.⟧
(c) $\|f\|_r \le \max\{\|f\|_p, \|f\|_q\}$.
(d) If $f \in L^p \cap L^\infty$, then $\lim_{r \to \infty} \|f\|_r = \|f\|_\infty$. ⟦Use (b) for one inequality. For the reverse inequality, note that $\|f\|_r^r \ge M^r \mu\{|f| \ge M\}$.⟧

4.16 Let $T : L^1(\mu) \to L^1(\mu)$ be a continuous linear transformation, let $g(t, x)$ be continuous in $t \in [a, b]$ for each $x \in X$ and measurable in $x \in X$ for each $t$, and set $g_t = g(t, \cdot)$. Suppose that there exists an integrable function $h \ge 0$ such that $|g(t, x)| \le h(x)$ for all $t$ and $x$. Let $\int_a^b g_t\,dt$ denote the function $x \mapsto \int_a^b g_t(x)\,dt$. Assume that $[T g_t](x)$ is continuous in $t$ for each $x \in X$. Carry out the following to show that $\int_a^b g_t\,dt$ is in $L^1$ and
\[
T \int_a^b g_t\,dt = \int_a^b T g_t\,dt. \tag{†}
\]

(a) Let $(P_n, t_n)$ be any sequence of tagged partitions of $[a, b]$ with $\|P_n\| \to 0$ and let $S(g, P_n, t_n)$ denote the function
\[
x \mapsto S(g(\cdot, x), P_n, t_n) = \sum_{I \in P_n} g(t_{j,n}, x)\,|I|.
\]
Then $|S(g(\cdot, x), P_n, t_n)| \le (b - a) h(x)$.
(b) $\int_a^b g_t\,dt \in L^1$ and $\lim_n S(g, P_n, t_n) = \int_a^b g_t\,dt$ in the $L^1$ norm.
(c) $T S(g, P_n, t_n) = \sum_{I \in P_n} T g_{t_{j,n}}\,|I| \to T \int_a^b g_t\,dt$ in the $L^1$ norm.
P  

4.2 Lp Approximation
In this section we prove three approximation theorems that are useful in establishing
certain properties of Lp functions, as illustrated by Corollary 4.2.3 below.

Approximation by Simple Functions


4.2.1 Theorem. Let $(X, \mathcal{F}, \mu)$ be a measure space and $1 \le p \le \infty$. For each $f \in L^p(\mu)$ and $\varepsilon > 0$ there exists a simple function $f_s$ such that $|f_s| \le |f|$ and $\|f - f_s\|_p < \varepsilon$. Moreover, if $p < \infty$, then $f_s$ may be chosen to vanish outside a set of finite measure.

Proof. Let $\{f_n\}$ be a sequence of simple functions such that $f_n \to f$ and $|f_n| \le |f|$ (2.3.1). The case $p = \infty$ follows from part (c) of that theorem. Assume that $p < \infty$. Then $|f_n - f|^p \le 2^{p+1} |f|^p$, hence $\|f_n - f\|_p \to 0$ by the dominated convergence theorem. The first assertion of the theorem follows by taking $f_s = f_n$ for sufficiently large $n$. For the second assertion, let $f_s = \sum_{k=1}^m a_k 1_{A_k}$, where $a_k \ne 0$ and the sets $A_k$ are disjoint. Then
\[
\int |f_s|^p\,d\mu = \sum_{k=1}^m |a_k|^p \mu(A_k).
\]
Since the integral is finite and $a_k \ne 0$, $\mu(A_k) < \infty$. Therefore, $f_s = 0$ outside $\bigcup_{k=1}^m A_k$, a set of finite measure.

Approximation by Continuous Functions

4.2.2 Theorem. Let $1 \le p < \infty$, $f \in L^p(\lambda^d)$, and $\varepsilon > 0$. Then there exists a continuous function $g$ vanishing outside a bounded interval such that $\|f - g\|_p < \varepsilon$.
Proof. By 4.2.1, we may assume that $f$ is simple with standard representation $\sum_{k=1}^m a_k 1_{A_k}$, where $a_k \ne 0$ and $\lambda^d(A_k) < \infty$. We may further assume that $A_k$ is bounded; otherwise replace $A_k$ by $A_k \cap I$, where $I$ is a bounded interval with $\lambda^d(A_k) - \lambda^d(A_k \cap I)$ sufficiently small so that $f$ may be approximated by $\sum_{k=1}^m a_k 1_{A_k \cap I}$.
Now let $\alpha > 0$. By 1.8.1 we may choose for each $k$ a compact set $C_k$ and a bounded open set $U_k$ such that $C_k \subseteq A_k \subseteq U_k$ and $\lambda^d(U_k \setminus C_k) < \alpha$. By 0.3.10, there exists a continuous function $g_k : \mathbb{R}^d \to [0, 1]$ such that $g_k = 1$ on $C_k$ and $g_k = 0$ on $U_k^c$. Since $g_k = 1_{A_k}$ on $U_k^c \cup C_k = (U_k \setminus C_k)^c$,
\[
\|a_k 1_{A_k} - a_k g_k\|_p^p = |a_k|^p \int_{U_k \setminus C_k} |1_{A_k} - g_k|^p\,d\lambda^d \le 2^p |a_k|^p \lambda^d(U_k \setminus C_k) < (2M)^p \alpha,
\]
where $M := \sup_k |a_k|$. The function $g := \sum_{k=1}^m a_k g_k$ is continuous, and by the triangle inequality $\|f - g\|_p < 2mM\alpha^{1/p}$. We then have $\|f - g\|_p < \varepsilon$ for sufficiently small $\alpha$. Furthermore, $g = 0$ outside $\bigcup_k U_k$, which is contained in a bounded interval.
Here is an important application of 4.2.2.
4.2.3 Corollary. Let $1 \le p < \infty$, and for $y \in \mathbb{R}^d$ let $T_y$ be the translation operator $T_y f(x) = f(x + y)$. Then for each $f \in L^p(\mathbb{R}^d, \lambda)$, $\lim_{y \to y_0} \|T_y f - T_{y_0} f\|_p = 0$.

Proof. By translation invariance of the integral, we may take $y_0 = 0$. By the theorem, given $\varepsilon > 0$ there exists a continuous function $g$ such that $\|f - g\|_p < \varepsilon$ and $g = 0$ on the complement of some interval $[a, b]$. By translation invariance, $\|T_y f - T_y g\|_p = \|f - g\|_p$, hence

\[
\|T_y f - f\|_p \le \|T_y f - T_y g\|_p + \|T_y g - g\|_p + \|g - f\|_p < 2\varepsilon + \|T_y g - g\|_p.
\]

It now suffices to prove that $\lim_{y \to 0} \|T_y g - g\|_p = 0$. Let $c = (1, \ldots, 1)$ and let $y_n \to 0$ such that $|y_{n,j}| < 1$ $(1 \le j \le d)$. For $x \in [a - c, b + c]^c$, $x + y_n \in [a, b]^c$, hence $g(x + y_n) = 0$. Thus if $M$ is a bound for $|g|$, then

\[
|g(x + y_n) - g(x)|^p \le (2M)^p\, 1_{[a-c,\,b+c]}(x), \quad x \in \mathbb{R}^d.
\]

By continuity of $g$, the left side of the inequality tends to zero so, by the dominated convergence theorem, $\int |g(x + y_n) - g(x)|^p\,d\lambda^d \to 0$.
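Continuity of translation in $L^p$ can be illustrated with the discontinuous function $f = 1_{[0,1]}$ on $\mathbb{R}$. For $|y| \le 1$ the sets $[0, 1]$ and $[-y, 1 - y]$ differ by a set of measure $2|y|$, so $\|T_y f - f\|_p = (2|y|)^{1/p}$, which tends to 0 with $y$ even though $\sup_x |T_y f(x) - f(x)| = 1$ for every $y \ne 0$. The sketch below approximates the norm by a midpoint rule; the function names, integration window, and step count are illustrative assumptions.

```python
def indicator_unit_interval(x):
    """The function f = 1_[0,1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def translation_distance(y, p, lo=-2.0, hi=3.0, steps=200_000):
    """Midpoint-rule approximation of ||T_y f - f||_p for f = 1_[0,1], |y| <= 1."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(indicator_unit_interval(x + y) - indicator_unit_interval(x)) ** p
    return (total * h) ** (1 / p)

p = 2.0
for y in (0.5, 0.1, 0.01):
    # approximate norm next to the exact value (2|y|)^(1/p)
    print(y, translation_distance(y, p), (2 * y) ** (1 / p))
```

The example also shows why the corollary is restricted to $p < \infty$: in the $L^\infty$ norm the distance stays equal to 1.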

Approximation by Step Functions


A step function on $\mathbb{R}^d$ is a simple function of the form $\sum_{j=1}^n a_j 1_{I_j}$, where $I_j$ is a bounded open $d$-dimensional interval. The following result complements 4.2.2.
4.2.4 Theorem. Let $1 \le p < \infty$, $f \in L^p(\lambda^d)$, and $\varepsilon > 0$. Then there exists a step function $h$ vanishing outside a bounded interval such that $\|f - h\|_p < \varepsilon$.
Proof. Refer to the proof of 4.2.2. Let $\beta > 0$. Since $U_k$ is a countable disjoint union of open intervals, we may choose finitely many disjoint open intervals with union $V_k \subseteq U_k$ such that $\lambda^d(U_k \setminus V_k) < \beta$. Now,
\[
A_k \,\triangle\, V_k = (A_k \setminus V_k) \cup (V_k \setminus A_k) \subseteq (U_k \setminus V_k) \cup (U_k \setminus C_k),
\]
hence
\[
\|1_{A_k} - 1_{V_k}\|_p^p = \int |1_{A_k} - 1_{V_k}|^p\,d\lambda^d = \lambda^d(A_k \,\triangle\, V_k) < \alpha + \beta.
\]
The function $h := \sum_{k=1}^m a_k 1_{V_k}$ is a step function, and by the triangle inequality we have $\|f - h\|_p < 2mM(\alpha + \beta)^{1/p}$. Thus for sufficiently small $\alpha$ and $\beta$, $\|f - h\|_p < \varepsilon$.

Exercises
4.17 [↑ 3.2.16] Let $D_r$ be the dilation operator $D_r f(x) = f(rx)$ on $L^p(\mathbb{R}^d)$, $1 \le p < \infty$. Show that $\lim_{r \to s} \|D_r f - D_s f\|_p = 0$, $r, s > 0$.

4.18 Let $f \in L^1(\mathbb{R})$ and let $g$ be bounded with bounded continuous derivative. Prove that
\[
\lim_n \int f(x)\, g'(nx)\,d\lambda(x) = 0.
\]

4.19 Show that the last assertion of 4.2.1 fails for the case p = ∞. Show also that 4.2.2 does not
hold for p = ∞.

4.3 Lp Convergence
Let $f_n, f \in L^p(X, \mathcal{F}, \mu)$ $(p \ge 1)$. Convergence of $f_n$ to $f$ in the $L^p$ norm is called $L^p$ convergence and is written $f_n \xrightarrow{L^p} f$. For example, the approximation theorems in the preceding section may be phrased in terms of $L^p$ convergence. The results in the present section relate $L^p$ convergence to various modes of convergence considered in §2.4. The case $p = \infty$ is easy to treat:
4.3.1 Theorem. Let $f_n, f \in L^\infty$. Then $f_n \xrightarrow{L^\infty} f$ iff there exists a set $A$ of measure zero such that $f_n \to f$ uniformly on $A^c$. In particular, $f_n \xrightarrow{\text{a.u.}} f$.
Proof. Let $f_n \xrightarrow{L^\infty} f$ and let $A_n$ be a set of measure zero such that $|f_n - f| \le \|f_n - f\|_\infty$ on $A_n^c$ (4.1.6). Set $A = \bigcup_n A_n$. Then on $A^c$, $|f_n - f| \le \|f_n - f\|_\infty$ for all $n$, hence $f_n \to f$ uniformly on $A^c$.
Conversely, let $\mu(A) = 0$ and $f_n \to f$ uniformly on $A^c$. Given $\varepsilon > 0$, choose $N$ so that $|f_n - f| \le \varepsilon$ on $A^c$ for all $n \ge N$. By 4.1.6(b), for such $n$ we have $\|f_n - f\|_\infty \le \varepsilon$. Therefore, $f_n \xrightarrow{L^\infty} f$.
The case $1 \le p < \infty$ is more delicate. We shall need the following lemma.
4.3.2 Lemma. Let $1 \le p < \infty$ and $f, f_n \in L^p$. If $\|f_n\|_p \to \|f\|_p$ and $f_n \xrightarrow{\text{a.e.}} f$, then $f_n \xrightarrow{L^p} f$.
Proof. From the inequality $|f_n - f|^p \le 2^p(|f_n|^p + |f|^p)$ we have $2^p(|f_n|^p + |f|^p) - |f_n - f|^p \ge 0$. Moreover,
$$\lim_n \big[2^p(|f_n|^p + |f|^p) - |f_n - f|^p\big] = 2^{p+1}|f|^p \quad \text{a.e.}$$
Thus by Fatou's lemma
$$\int 2^{p+1}|f|^p \, d\mu \le \liminf_n \int \big[2^p(|f_n|^p + |f|^p) - |f_n - f|^p\big] \, d\mu = \int 2^{p+1}|f|^p \, d\mu - \limsup_n \int |f_n - f|^p \, d\mu.$$
Therefore, $\lim_n \int |f_n - f|^p \, d\mu = 0$, hence $f_n \xrightarrow{L^p} f$.
The following result characterizes $L^p$ convergence in terms of convergence in measure.
4.3.3 Theorem. Let $1 \le p < \infty$ and $f, f_n \in L^p$. Then $f_n \xrightarrow{L^p} f$ iff both $f_n \xrightarrow{\mu} f$ and $\|f_n\|_p \to \|f\|_p$. In this case, there exists a subsequence $f_{n_k} \xrightarrow{\text{a.e.}} f$.

Proof. The necessity follows from the inequalities $\big|\|f\|_p - \|f_n\|_p\big| \le \|f - f_n\|_p$ and
$$\mu\{|f_n - f| \ge \varepsilon\} = \int 1_{\{|f_n - f| \ge \varepsilon\}} \, d\mu = \int 1_{\{|f_n - f|^p \ge \varepsilon^p\}} \, d\mu \le \varepsilon^{-p} \int |f_n - f|^p \, d\mu.$$
For the sufficiency, suppose for a contradiction that $\|f_n - f\|_p \not\to 0$. Then there exists an $\varepsilon > 0$ and an infinite subset $S$ of $\mathbb{N}$ such that $\|f_n - f\|_p \ge \varepsilon$ for all $n \in S$. Since $f_n \xrightarrow{\mu} f$ holds for subsequences and since convergence in measure implies a.e. convergence for some subsequence (2.4.4), we may choose a subsequence $(f_{n_k})$ of $(f_n)$ with indices in $S$ such that $f_{n_k} \xrightarrow{\text{a.e.}} f$. But then by 4.3.2, $f_{n_k} \xrightarrow{L^p} f$, which is impossible by definition of $S$.
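
The role of the norm condition in 4.3.3 can be seen numerically. The sketch below, with hypothetical helper names, uses the functions $f_n = n\,1_{(0,1/n]}$ on $(0,1]$ with Lebesgue measure and $p = 1$ (an illustrative choice, not taken from the text): the measure of $\{|f_n| \ge \varepsilon\}$ tends to zero, yet $\|f_n\|_1 = 1$ for every $n$, so $\|f_n\|_1 \not\to \|0\|_1$ and there is no $L^1$ convergence to $0$.

```python
from fractions import Fraction

def norm_p_to_the_p(height, width, p):
    """||c 1_I||_p^p = |c|^p * lambda(I) for an interval I of length `width`."""
    return abs(height) ** p * width

def measure_where_at_least(height, width, eps):
    """lambda{ |c 1_I| >= eps } for eps > 0."""
    return width if abs(height) >= eps else Fraction(0)

p = 1
for n in (1, 10, 100, 1000):
    height, width = Fraction(n), Fraction(1, n)   # f_n = n * 1_(0, 1/n]
    print(n,
          norm_p_to_the_p(height, width, p),                      # stays equal to 1
          measure_where_at_least(height, width, Fraction(1, 2)))  # equals 1/n
```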
A deeper result is the following, whose proof brings together some earlier results on convergence.
4.3.4 Vitali Convergence Theorem I. Let $1 \le p < \infty$ and $f_n, f \in L^p$ such that $f_n \xrightarrow{\text{a.e.}} f$. Then $f_n \xrightarrow{L^p} f$ iff for each $\varepsilon > 0$ the following conditions hold:
(a) There exists $A \in \mathcal{F}$ with finite measure such that $\sup_n \|f_n 1_{A^c}\|_p \le \varepsilon$.

(b) There exists $\delta > 0$ such that $E \in \mathcal{F}$ and $\mu(E) < \delta$ $\Rightarrow$ $\limsup_n \|f_n 1_E\|_p \le \varepsilon$.
Proof. Suppose $f_n \xrightarrow{L^p} f$. To establish (a), choose $m$ so that $\|f_n - f\|_p < \varepsilon/2$ for all $n > m$. For such $n$ and any $E \in \mathcal{F}$,
$$\|f_n 1_E\|_p \le \|f_n - f\|_p + \|f 1_E\|_p \le \varepsilon/2 + \|f 1_E\|_p. \tag{α}$$
By Ex. 3.17, we may choose $E, E_1, \ldots, E_m \in \mathcal{F}$ with finite measure such that
$$\|1_{E^c} f\|_p,\ \|1_{E_n^c} f_n\|_p < \varepsilon/2, \quad n = 1, \ldots, m. \tag{β}$$
Set $A = E \cup E_1 \cup \cdots \cup E_m$. Then, by (α) and (β), $\|f_n 1_{A^c}\|_p \le \varepsilon$ for all $n$, verifying (a).
To establish (b), choose $\delta$ so that $\|1_E f\|_p < \varepsilon$ for all $E$ with $\mu(E) < \delta$ (Ex. 3.13). For such $E$ and all $n$,
$$\|f_n 1_E\|_p \le \|(f_n - f) 1_E\|_p + \|f 1_E\|_p \le \|f_n - f\|_p + \varepsilon,$$
hence $\limsup_n \|f_n 1_E\|_p \le \varepsilon$.


Conversely, assume that (a) and (b) hold and set $g_n = f - f_n$. For measurable sets $E \subseteq A$, $1_E + 1_{A \setminus E} + 1_{A^c} = 1$, hence multiplying by $g_n$ and applying the triangle inequality in $L^p$ we have
$$\|g_n\|_p \le \|f 1_E\|_p + \|f_n 1_E\|_p + \|g_n 1_{A \setminus E}\|_p + \|f 1_{A^c}\|_p + \|f_n 1_{A^c}\|_p. \tag{γ}$$
We show that the right side of (γ) may be made arbitrarily small. Enlarging $A$ if necessary, we may assume by Ex. 3.17 that $\|f 1_{A^c}\|_p < \varepsilon$. By (a) we then have
$$\|f 1_{A^c}\|_p + \|f_n 1_{A^c}\|_p < 2\varepsilon \quad \text{for all } n.$$
Now let $\delta$ be as in (b). Since $f_n \xrightarrow{\text{a.e.}} f$, by Egoroff's theorem there exists a measurable subset $E$ of $A$ with $\mu(E) < \delta$ such that $g_n \to 0$ uniformly on $A \setminus E$. Therefore,
$$\lim_n \|g_n 1_{A \setminus E}\|_p = 0.$$
Finally, applying (b) and Fatou's lemma, we have
$$\|f 1_E\|_p^p \le \liminf_n \int_E |f_n|^p \, d\mu \le \limsup_n \int_E |f_n|^p \, d\mu \le \varepsilon^p.$$
Therefore, by (γ), $\limsup_n \|g_n\|_p \le 4\varepsilon$, proving that $\|g_n\|_p \to 0$.

We shall call properties (a) and (b) in the theorem the Vitali convergence conditions.

Exercises
4.20 Let $\mu$ be a probability measure and $f_n \in L^p(\mu)$. Prove:
(a) If $f_n \to f$ uniformly on $X$, then $f \in L^p(\mu)$ and $f_n \to f$ in $L^p$.
(b) If $f_n \xrightarrow{\text{a.u.}} f$ and the functions $f_n$ and $f$ are uniformly bounded, then $f_n \to f$ in $L^p$.

4.21 Let $\|f_n\|_\infty \le C < \infty$ for all $n$ and $f_n \xrightarrow{\text{a.u.}} f$. Show that $f \in L^\infty$.

4.22 Let $1 \le p < \infty$ and $f, f_n \in L^p$. Show that
(a) $f_n \xrightarrow{\text{a.e.}} f$ does not necessarily imply that $f_n \xrightarrow{L^p} f$.
(b) $f_n \xrightarrow{L^p} f$ does not necessarily imply that $f_n \xrightarrow{\text{a.e.}} f$.
(c) $f_n \xrightarrow{\mu} f$ does not necessarily imply that $f_n \xrightarrow{L^p} f$.
(d) $f_n \xrightarrow{L^q} 0$ for all $1 \le q < p$ does not necessarily imply that $f_n \xrightarrow{L^p} 0$.
(e) $f_n \xrightarrow{L^q} 0$ for all $q > p$ does not necessarily imply that $f_n \xrightarrow{L^p} 0$.

4.23 Show that the hypothesis $f_n \xrightarrow{\text{a.e.}} f$ in the Vitali convergence theorem may be replaced by $f_n \xrightarrow{\mu} f$.

*4.4 Uniform Integrability


Throughout this section, (X, F, µ) is a finite measure space.

For a finite measure, additional convergence results may be obtained via the notion of
uniform integrability. The following proposition motivates the definition.

4.4.1 Proposition. A measurable function $f : X \to \mathbb{C}$ is integrable iff
$$\lim_{t \to \infty} \int_{\{|f| \ge t\}} |f| \, d\mu = 0. \tag{4.2}$$
Proof. Suppose that $f$ is integrable. Then $A := \{|f| = \infty\}$ has measure zero. Set $A_n := \{|f| > n\}$. Then $|f| \ge 1_{A_n}|f| \downarrow 1_A |f|$, so by the dominated convergence theorem, $\int_{A_n} |f| \, d\mu \to \int 1_A |f| \, d\mu = 0$, which implies (4.2).
Conversely, suppose that (4.2) holds. Choose $t$ so that $\int_{\{|f| \ge t\}} |f| \, d\mu < 1$. Then
$$\int |f| \, d\mu = \int_{\{|f| \ge t\}} |f| \, d\mu + \int_{\{|f| < t\}} |f| \, d\mu \le 1 + t \cdot \mu(X) < \infty,$$
hence $f$ is integrable.
Note that on an infinite measure space a nonzero constant function trivially satisfies (4.2)
yet is not integrable. Thus the sufficiency of the proposition fails on infinite measure spaces.
With the preceding proposition in mind, we say that a family $\mathscr{F}$ of measurable functions $f : X \to \mathbb{K}$ is uniformly integrable (u.i.), if
$$\lim_{t \to \infty} \sup\left\{ \int_{\{|f| \ge t\}} |f| \, d\mu : f \in \mathscr{F} \right\} = 0. \tag{4.3}$$
By 4.4.1, each member of such a family is integrable. Conversely, by the same proposition, any finite family of integrable functions is u.i. Moreover, it is trivially the case that any uniformly bounded family of measurable functions is uniformly integrable.
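
The supremum in (4.3) can be computed directly for simple examples. The following sketch, with hypothetical helper names and an illustrative family chosen for this example, considers $f_n = 2n\,1_{(0,1/n^2]}$ on $(0,1]$ with Lebesgue measure. The tail integral of $f_n$ is $2/n$ when $2n \ge t$ and $0$ otherwise, so the supremum over the family is $2/\lceil t/2 \rceil$, which tends to zero. The code works with a finite truncation $n \le N$, which for the values of $t$ shown gives the same supremum as the full family.

```python
from fractions import Fraction

def tail_integral(height, width, t):
    """Integral of |f| over {|f| >= t} for f = height * 1_I, lambda(I) = width."""
    return abs(height) * width if abs(height) >= t else Fraction(0)

def sup_tail(family, t):
    """Supremum of the tail integrals over a finite family of (height, width) pairs."""
    return max(tail_integral(h, w, t) for h, w in family)

# Illustrative family: f_n = 2n * 1_(0, 1/n^2], n = 1, ..., N (finite truncation).
N = 500
family = [(Fraction(2 * n), Fraction(1, n * n)) for n in range(1, N + 1)]
for t in (1, 10, 100):
    print(t, sup_tail(family, t))
```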
The following result is sometimes useful for establishing uniform integrability of a family
of functions. The reader should compare the conditions in the theorem with the Vitali
convergence conditions (4.3.4).
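4.4.2 Theorem. A family $\mathscr{F}$ of measurable functions is u.i. iff the following conditions hold:
(a) $\sup\{\|f\|_1 : f \in \mathscr{F}\} < \infty$.

(b) For each $\varepsilon > 0$ there exists $\delta > 0$ such that $\sup\{\|f 1_E\|_1 : f \in \mathscr{F}\} < \varepsilon$ for all $E \in \mathcal{F}$ with $\mu(E) < \delta$.
Proof. Suppose that $\mathscr{F}$ is u.i. Given $\varepsilon > 0$, choose $t$ so that
$$\int_{\{|f| \ge t\}} |f| \, d\mu < \frac{\varepsilon}{2} \quad \text{for all } f \in \mathscr{F}.$$
Then for $E \in \mathcal{F}$ and all $f \in \mathscr{F}$,
$$\int_E |f| \, d\mu = \int_{E \cap \{|f| \ge t\}} |f| \, d\mu + \int_{E \cap \{|f| < t\}} |f| \, d\mu \le \frac{\varepsilon}{2} + t \cdot \mu(E).$$
Taking $E = X$ establishes (a). To establish (b), choose $\delta < \varepsilon/(2t)$.

Conversely, assume that (a) and (b) hold. Let $C := \sup\{\|f\|_1 : f \in \mathscr{F}\}$. Given $\varepsilon > 0$, choose $t_0$ so that $C/t_0 < \delta$, where $\delta$ is as in (b). Then for $t \ge t_0$ and all $f \in \mathscr{F}$,
$$\mu\{|f| \ge t\} \le \frac{1}{t} \int |f| \, d\mu \le \frac{C}{t} < \delta.$$
Applying (b) with $E := \{|f| \ge t\}$, we have $\int_{\{|f| \ge t\}} |f| \, d\mu < \varepsilon$ for all $f \in \mathscr{F}$. Therefore, $\mathscr{F}$ is u.i.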

The proof of the following corollary is left as an exercise.


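4.4.3 Corollary. Let $\mathscr{F}$ and $\mathscr{G}$ be families of measurable functions such that $\mathscr{G}$ is u.i. Then the following hold:
(a) $\{\alpha f + \beta g : f, g \in \mathscr{G}\}$ is u.i., $\alpha, \beta \in \mathbb{C}$.
(b) If for each $f \in \mathscr{F}$ there exists $g \in \mathscr{G}$ such that $|f| \le |g|$ a.e., then $\mathscr{F}$ is u.i.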
Here is a general method for constructing u.i. functions.
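4.4.4 Proposition. Let $\varphi : (0, \infty) \to \mathbb{R}$ be Borel measurable such that $\lim_{x \to \infty} \varphi(x)/x = \infty$. If $\mathscr{F}$ is a family of measurable functions such that $\sup\{\|\varphi(|f|)\|_1 : f \in \mathscr{F}\} < \infty$, then $\mathscr{F}$ is u.i.
Proof. Let $s$ denote the supremum in the statement of the proposition; we may assume that $s > 0$, since otherwise $s$ may be replaced by any positive number. Given $\varepsilon > 0$, choose $t_0$ such that
$$\frac{\varphi(t)}{t} > \frac{s}{\varepsilon} \quad \text{for all } t \ge t_0.$$
For such $t$, $|f| \le (\varepsilon/s)\, \varphi(|f|)$ on the set $\{|f| \ge t\}$, hence
$$\int_{\{|f| \ge t\}} |f| \, d\mu \le \frac{\varepsilon}{s} \int |\varphi(|f|)| \, d\mu \le \varepsilon$$
for all $f \in \mathscr{F}$ and $t \ge t_0$.
For example, taking $\varphi(x) = x^p$ ($1 < p < \infty$) we see that any bounded subset of $L^p$ is uniformly integrable.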
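
For $\varphi(x) = x^p$ the estimate behind 4.4.4 reads $\int_{\{|f| \ge t\}} |f| \, d\mu \le t^{-(p-1)} \int |f|^p \, d\mu$, since $|f| \le |f|^p / t^{p-1}$ on $\{|f| \ge t\}$. The sketch below checks this for one simple function with $p = 2$. The data and helper names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def tail_integral(simple, t):
    """Integral of |f| over {|f| >= t}; `simple` lists (value, measure) pairs
    for a simple function taking `value` on disjoint sets of the given measure."""
    return sum(abs(v) * m for v, m in simple if abs(v) >= t)

def integral_phi(simple, p):
    """Integral of |f|^p, that is, ||phi(|f|)||_1 for phi(x) = x^p."""
    return sum(abs(v) ** p * m for v, m in simple)

# Hypothetical simple function on a finite measure space (f = 0 elsewhere).
f = [(Fraction(1, 2), Fraction(1, 2)), (Fraction(3), Fraction(1, 4)),
     (Fraction(-8), Fraction(1, 8)), (Fraction(20), Fraction(1, 100))]
p = 2
for t in (1, 4, 10):
    bound = integral_phi(f, p) / Fraction(t) ** (p - 1)
    print(t, tail_integral(f, t), bound)   # tail integral, then its upper bound
```

The bound depends on $f$ only through $\|f\|_p^p$, which is why it is uniform over a bounded subset of $L^p$.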
Here is the main connection between uniform integrability and Lp convergence.
4.4.5 Vitali Convergence Theorem II. Let $f$ and $f_n$ be measurable and $1 \le p < \infty$. Then $f \in L^p$ and $f_n \xrightarrow{L^p} f$ iff the sequence $(|f_n|^p)$ is u.i. and $f_n \xrightarrow{\mu} f$.
Proof. Suppose that $(|f_n|^p)$ is u.i. and $f_n \xrightarrow{\mu} f$. By 4.4.2 applied to the functions $|f_n|^p$, we see that the Vitali convergence conditions hold. Therefore, if we show that $f \in L^p$, then, by Ex. 4.23, $f_n \xrightarrow{L^p} f$. By 2.4.4, there exists a subsequence $(g_n)$ of $(f_n)$ such that $g_n \xrightarrow{\text{a.e.}} f$. By Fatou's lemma,
$$\int |f|^p \, d\mu \le \liminf_n \int |g_n|^p \, d\mu.$$
Since the right side is finite (4.4.2(a)), $f \in L^p$.
Conversely, suppose $f \in L^p$ and $f_n \xrightarrow{L^p} f$. Given $\varepsilon > 0$, choose $m$ such that $\|f - f_n\|_p^p < \varepsilon$ for all $n > m$. By 4.4.1, there exists $t_0$ such that
$$\int_{\{|f_n - f|^p \ge t\}} |f_n - f|^p \, d\mu < \varepsilon \quad \text{for } n = 1, \ldots, m \text{ and } t \ge t_0.$$
For such $t$ and all $n$ we then have $\int_{\{|f_n - f|^p \ge t\}} |f_n - f|^p \, d\mu < \varepsilon$, which shows that the sequence $(|f_n - f|^p)_n$ is u.i. Since $|f_n|^p \le 2^p(|f_n - f|^p + |f|^p)$, it follows from 4.4.3 that $(|f_n|^p)_n$ is u.i. That $f_n \xrightarrow{\mu} f$ follows from 4.3.3.

Exercises
4.24 Consider Lebesgue measure on $\mathcal{B}(0, 1]$. Show that the sequence of functions $f_n = n 1_{(0,1/n]}$ is not u.i. even though the sequence $(\|f_n\|_1)$ converges.
4.25 Show that $\{f_n\}$ is u.i. iff $\{f_n^+\}$ and $\{f_n^-\}$ are u.i.
4.26 Let $(f_n)$ be a sequence of $\mathcal{F}$-measurable functions such that $\sup_n \int |f_n|^r \, d\mu < \infty$ for some $r > 1$. Show that $(f_n)$ is u.i.
4.27 Let $\{f_n\}$ be u.i. and set $g_n := |f_1| \vee \cdots \vee |f_n|$. Use Ex. 3.20 to show that $\frac{1}{n} \int g_n \, d\mu \to 0$.

*4.5 Convex Functions and Jensen’s Inequality


In this section we develop the basic properties of convex functions on intervals (a, b),
where −∞ ≤ a < b ≤ ∞. These properties are then used to establish an important integral
inequality.
A real-valued function $\varphi$ on $(a, b)$ is said to be convex if
$$\varphi\big((1 - t)u + tv\big) \le (1 - t)\varphi(u) + t\varphi(v) \quad \text{for } a < u < v < b \text{ and } t \in (0, 1). \tag{4.4}$$
Strict convexity is defined by replacing weak inequality by strict inequality. Thus a function is convex iff the line segment connecting any two points on its graph lies on or above the part of the graph between the two points. A function $f$ is (strictly) concave if $-f$ is (strictly) convex.

FIGURE 4.1: A strictly convex function.
A function $\varphi$ with an increasing derivative (in particular, a function with a nonnegative second derivative) is convex. Indeed, if $x = (1 - t)u + tv$ ($0 < t < 1$) then, by the mean value theorem, there exist points $y \in (u, x)$ and $z \in (x, v)$ such that
$$\frac{\varphi(x) - \varphi(u)}{x - u} = \varphi'(y) \le \varphi'(z) = \frac{\varphi(v) - \varphi(x)}{v - x}.$$
Solving the inequality for $\varphi(x)$ yields
$$\varphi(x) \le \frac{v - x}{v - u}\,\varphi(u) + \frac{x - u}{v - u}\,\varphi(v) = (1 - t)\varphi(u) + t\varphi(v).$$
Analogous results hold for the strict case and the concave case. For example, taking second derivatives we see that $e^x$ and $e^{-x}$ are strictly convex on $\mathbb{R}$, $x \ln x$ and $1/x^p$ ($p > 0$) are strictly convex on $(0, \infty)$, and $\ln x$ and $x^p$ ($0 < p < 1$) are strictly concave on $(0, \infty)$.
The basic properties of convex functions are summarized in the following theorem.

4.5.1 Theorem. Let $\varphi$ be convex on $(a, b)$.

(a) For fixed $z \in (a, b)$, the difference quotients $\dfrac{\varphi(t) - \varphi(z)}{t - z}$ increase in $t$ on the intervals $(a, z)$ and $(z, b)$.
(b) The left- and right-hand derivatives
$$\varphi'_\ell(u) := \lim_{x \to u^-} \frac{\varphi(x) - \varphi(u)}{x - u}, \qquad \varphi'_r(u) := \lim_{y \to u^+} \frac{\varphi(y) - \varphi(u)}{y - u}$$
exist, are nondecreasing, and satisfy $\varphi'_\ell(x) \le \varphi'_r(x)$.
(c) $\varphi'_\ell(z) \ge \dfrac{\varphi(z) - \varphi(t)}{z - t}$ ($z > t$) and $\varphi'_r(z) \le \dfrac{\varphi(z) - \varphi(t)}{z - t}$ ($z < t$).
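Proof. Let $a < u < x < y < v < b$. The assertions are a consequence of the following numbered inequalities, which are verified below.
$$\frac{\varphi(x) - \varphi(u)}{x - u} \overset{(1)}{\le} \frac{\varphi(y) - \varphi(u)}{y - u}, \qquad \frac{\varphi(y) - \varphi(u)}{y - u} \overset{(2)}{\le} \frac{\varphi(v) - \varphi(y)}{v - y}, \qquad \frac{\varphi(v) - \varphi(x)}{v - x} \overset{(3)}{\le} \frac{\varphi(v) - \varphi(y)}{v - y}.$$
To prove (a), take $u = z$ in (1) and $v = z$ in (3). For (b), observe that because the difference quotients $[\varphi(x) - \varphi(u)]/(x - u)$ decrease as $x \downarrow u$, $\varphi'_r(u)$ exists in $\overline{\mathbb{R}}$, and by (1) and (2)
$$\varphi'_r(u) \le \frac{\varphi(v) - \varphi(y)}{v - y} < \infty.$$
Letting $v \downarrow y$ shows that $\varphi'_r(u) \le \varphi'_r(y)$. Therefore, $\varphi'_r$ is increasing. Similarly, since the difference quotients $[\varphi(v) - \varphi(y)]/(v - y)$ increase as $y \uparrow v$, $\varphi'_\ell(v)$ exists in $\overline{\mathbb{R}}$ and by (1) and (2),
$$\varphi'_\ell(v) \ge \frac{\varphi(x) - \varphi(u)}{x - u} > -\infty.$$
Letting $u \uparrow x$ shows that $\varphi'_\ell(v) \ge \varphi'_\ell(x)$. Therefore, $\varphi'_\ell$ is increasing. Taking $x = y$ in (2), we have
$$\frac{\varphi(x) - \varphi(u)}{x - u} \le \frac{\varphi(v) - \varphi(x)}{v - x}.$$
Letting $u \uparrow x$ and $v \downarrow x$, we obtain $\varphi'_\ell(x) \le \varphi'_r(x)$. In particular, $\varphi'_\ell(x)$ and $\varphi'_r(x)$ are finite. This proves (a) and (b). Part (c) follows from these.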
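[Figure: the graph of $\varphi$ with the chords $L_{uy}$, $L_{xv}$, and $L_{uv}$ over the points $u < x < y < v$.]

It remains to verify the inequalities (1)–(3) above. For $a < c < d < b$, let $L_{cd}$ denote the function whose graph is the line segment from $(c, \varphi(c))$ to $(d, \varphi(d))$. Since $u < x < y < v$, convexity implies that $\varphi(x) \le L_{uy}(x)$ and $\varphi(y) \le L_{uv}(y)$, hence
$$\frac{\varphi(x) - \varphi(u)}{x - u} \le \frac{L_{uy}(x) - \varphi(u)}{x - u} = \text{slope of } L_{uy} = \frac{\varphi(y) - \varphi(u)}{y - u},$$
$$\frac{\varphi(y) - \varphi(u)}{y - u} \le \frac{L_{uv}(y) - \varphi(u)}{y - u} = \text{slope of } L_{uv} = \frac{L_{uv}(v) - L_{uv}(y)}{v - y}, \text{ and}$$
$$\frac{L_{uv}(v) - L_{uv}(y)}{v - y} \le \frac{L_{uv}(v) - \varphi(y)}{v - y} = \frac{\varphi(v) - \varphi(y)}{v - y},$$
verifying (1) and (2). A similar argument establishes (3).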
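
The chord inequalities (1)–(3) are easy to test numerically. The sketch below uses hypothetical helper names and sample points chosen for this example. It evaluates the three inequalities for the convex function $e^x$ and, for contrast, for the concave function $-x^2$, where (1) already fails.

```python
import math

def slope(phi, s, t):
    """Difference quotient (phi(t) - phi(s)) / (t - s)."""
    return (phi(t) - phi(s)) / (t - s)

def chord_inequalities_hold(phi, u, x, y, v):
    """Check inequalities (1)-(3) of the proof of 4.5.1 at points u < x < y < v."""
    assert u < x < y < v
    one = slope(phi, u, x) <= slope(phi, u, y)
    two = slope(phi, u, y) <= slope(phi, y, v)
    three = slope(phi, x, v) <= slope(phi, y, v)
    return one and two and three

print(chord_inequalities_hold(math.exp, -1.0, 0.0, 0.5, 2.0))            # convex
print(chord_inequalities_hold(lambda s: -s * s, -1.0, 0.0, 0.5, 2.0))    # concave
```

A check at finitely many points can only refute convexity, never prove it.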
4.5.2 Corollary. A convex function is continuous.
4.5.3 Corollary. If a convex function $\varphi$ is differentiable at $x \in (u, v)$, then
$$\varphi'(x)(t - x) + \varphi(x) \le \varphi(t) \quad \text{for all } t \in (u, v).$$
That is, the tangent line at $(x, \varphi(x))$ lies below the graph of $\varphi$ on $(u, v)$.

We may now prove

4.5.4 Jensen's Inequality. Let $(X, \mathcal{F}, \mu)$ be a probability space and let $\varphi : (a, b) \to \mathbb{R}$ be convex. If $f : X \to (a, b)$ and $f, \varphi \circ f \in L^1$, then
$$\varphi\left(\int f \, d\mu\right) \le \int \varphi \circ f \, d\mu.$$
Proof. By 4.5.1(c), for fixed $z \in (a, b)$ there exists a constant $c$ such that
$$c(t - z) \le \varphi(t) - \varphi(z) \quad \text{for all } t \in (a, b).$$
Taking $t = f(x)$ and integrating with respect to $x$ yields
$$c\left(\int f \, d\mu - z\right) \le \int \varphi \circ f \, d\mu - \varphi(z).$$
Taking $z = \int f \, d\mu$ produces the desired inequality.
Note that the inequality in 4.5.4 reverses for concave functions, as may be seen by
considering −ϕ.
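
On a finite probability space the integrals in 4.5.4 are weighted sums, so the inequality can be checked directly. The following sketch uses hypothetical values and weights chosen only for illustration. It computes the gap $\int \varphi \circ f \, d\mu - \varphi\big(\int f \, d\mu\big)$ for the convex function $e^x$ and for the concave function $\ln x$.

```python
import math

def expectation(values, weights):
    """Integral of a function on a finite probability space."""
    return sum(v * w for v, w in zip(values, weights))

def jensen_gap(phi, values, weights):
    """Return (integral of phi(f)) - phi(integral of f)."""
    assert abs(sum(weights) - 1.0) < 1e-12      # weights form a probability measure
    mean = expectation(values, weights)
    return expectation([phi(v) for v in values], weights) - phi(mean)

values = [0.5, 1.0, 2.0, 4.0]     # values of f on four atoms
weights = [0.1, 0.2, 0.3, 0.4]    # measures of the atoms
print(jensen_gap(math.exp, values, weights))   # nonnegative: exp is convex
print(jensen_gap(math.log, values, weights))   # nonpositive: ln is concave
```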

Exercises
4.28 Prove that for $x_j, t_j > 0$ and $\sum_{j=1}^n t_j = 1$,
$$\prod_{j=1}^n x_j^{t_j} \le \sum_{j=1}^n t_j x_j.$$
⟦Use the concavity of $\ln$.⟧

4.29 Use Jensen's inequality to verify the following for a probability measure $\mu$:
(a) $\|f\|_p$ is increasing on $(0, \infty]$.
(b) $\|f\|_1 \|1/f\|_p \ge 1$ ($p > 0$).
(c) $\|\ln f\|_1 \le \ln \|f\|_1$, ($f > 0$).
(d) $\|f\|_1 \ln \|f\|_1 \le \ln \|f \ln f\|_1$, ($f > 0$).

4.30 Let $\mu$ be a probability measure, $0 < q < \infty$, and $f \in L^q$ with $\|f\|_q > 0$. Verify (a)–(e) and conclude that
$$\lim_{p \to 0} \ln \|f\|_p = \int \ln |f| \, d\mu.$$
(a) The function $p \mapsto \|f\|_p$ is finite and decreasing on $(0, q]$.
(b) $\int \ln |f| \, d\mu \le \ln \|f\|_p$.
(c) $\ln x \le x - 1$ ($x > 0$), hence $\ln \|f\|_p \le p^{-1} \int (|f|^p - 1) \, d\mu$.
(d) $(a^p - 1)/p \to \ln a$ monotonically as $p \downarrow 0$.
(e) $\lim_{p \to 0} p^{-1} \int (|f|^p - 1) \, d\mu = \int \ln |f| \, d\mu$.
Chapter 5
Differentiation

In this chapter we consider countably additive set functions that take values in K, so-called
signed and complex measures. These set functions play an important role in the description
of linear functionals on spaces of continuous functions, a topic considered in Chapter 7, as
well as in harmonic analysis, developed in Chapter 16. The main result of the chapter
is the Radon-Nikodym theorem, which establishes in terms of integrals the existence of a
derivative of one measure with respect to another. This notion of measure differentiation is
made concrete for Lebesgue-Stieltjes measures on Rd .

Throughout the chapter, (X, F) denotes an arbitrary measurable space.

5.1 Signed Measures


In this section we discuss countably additive, $\overline{\mathbb{R}}$-valued set functions $\mu$ on $(X, \mathcal{F})$. An example is the difference of two measures, one of which is finite. It turns out that every such set function $\mu$ is of this form, a fact of considerable importance whose proof is the main goal of the current section.

Definition and a Fundamental Example

A signed measure on $(X, \mathcal{F})$ is an $\overline{\mathbb{R}}$-valued set function $\mu$ with the following properties:
(a) $\mu(\emptyset) = 0$.

(b) The range of $\mu$ contains at most one of the values $-\infty$, $\infty$.

(c) If $(A_n)$ is a sequence of disjoint sets in $\mathcal{F}$, then $\mu\big(\bigcup_n A_n\big) = \sum_n \mu(A_n)$.

Property (b) is needed to avoid expressions such as ∞ − ∞. Property (c) asserts that µ
is countably additive. This property, together with (a), implies that µ is also finitely
additive, which may be verified by considering sequences with a “tail end” of empty sets.
Note that because the left side of (c) is invariant under permutations of the sequence (An ),
the right side must also have this property. We shall therefore make that assumption.
To emphasize the distinction between signed measures and the set functions considered
in Chapters 1–4, we sometimes refer to the latter as nonnegative measures. Nevertheless,
the unadorned term measure will continue to refer to the nonnegative set functions studied
in previous chapters.
The sum µ1 + µ2 of signed measures µ1 and µ2 is defined by

(µ1 + µ2 )(E) = µ1 (E) + µ2 (E), E ∈ F.


For this to be well-defined, the right side must not be of the form ∞ − ∞ or −∞ + ∞. When
dealing with such sums we shall therefore tacitly assume that this restriction holds.
A signed measure µ is said to be finite if µ(X) is finite. Note that in this case µ(A) is
finite for all A ∈ F (use additivity on the sequence {A, Ac }).
5.1.1 Example. Let $\nu$ and $\eta$ be measures on $\mathcal{F}$, at least one of which, say $\eta$, is finite. We show that $\mu := \nu - \eta$ is a signed measure. Properties (a) and (b) are clear. For (c), let $(A_n)$ be a sequence of disjoint sets in $\mathcal{F}$ with union $A$. We consider two cases: If $\mu(A) = \infty$, then $\nu(A) = \infty$ and $\eta(A) < \infty$, hence $\sum_n \nu(A_n) = \infty$ and $\sum_n \eta(A_n) < \infty$, which implies that $\sum_n \mu(A_n) = \infty$. If $\mu(A)$ is finite, then both $\nu(A)$ and $\eta(A)$ are finite and are equal to $\sum_n \nu(A_n)$ and $\sum_n \eta(A_n)$, respectively. In each case, $\mu(A) = \sum_n \mu(A_n)$. ♦

The Hahn-Jordan Decomposition


Two measures $\nu$ and $\eta$ on $\mathcal{F}$ are said to be mutually singular, written $\nu \perp \eta$, if $\nu(A) = \eta(A^c) = 0$ for some $A \in \mathcal{F}$. The following theorem shows that a signed measure $\mu$ may always be decomposed in the manner described in 5.1.1, with the additional property that the measures $\nu$ and $\eta$ are mutually singular.
5.1.2 Hahn-Jordan Theorem. Let $\mu$ be a signed measure on $\mathcal{F}$. Then there exists a set $P \in \mathcal{F}$ such that the following hold:
(a) $\mu^+(E) := \mu(E \cap P) \ge 0$ and $\mu^-(E) := -\mu(E \cap P^c) \ge 0$ for all $E \in \mathcal{F}$.

(b) $\mu^+(E) = \sup\{\mu(A) : A \in \mathcal{F} \cap E\}$ and $\mu^-(E) = -\inf\{\mu(A) : A \in \mathcal{F} \cap E\}$.

(c) $\mu^\pm$ are the unique mutually singular measures that satisfy $\mu = \mu^+ - \mu^-$.

(d) If also $\mu^+(E) = \mu(E \cap P_1)$ and $\mu^-(E) = -\mu(E \cap P_1^c)$ for some $P_1 \in \mathcal{F}$, then $\mu^+(P \,\triangle\, P_1) = \mu^-(P \,\triangle\, P_1) = 0$.
We give the proof below. The measures $\mu^+$ and $\mu^-$ in the statement of the theorem are called the positive and negative variations of $\mu$, and the measure
$$|\mu| := \mu^+ + \mu^-$$
is called the total variation measure of $\mu$. The quantity $|\mu|(X)$, which may be infinite, is called the total variation of $\mu$. The equation $\mu = \mu^+ - \mu^-$ in (c) is called the Jordan decomposition of $\mu$. The decomposition of $X$ into a disjoint union of measurable sets $P$ and $P^c$ such that $\mu \ge 0$ on $\mathcal{F} \cap P$ and $\mu \le 0$ on $\mathcal{F} \cap P^c$ is called a Hahn decomposition for $\mu$. Thus part (a) guarantees the existence of a Hahn decomposition $(P, P^c)$ and part (d) asserts that the decomposition is unique up to a set of total variation measure zero.
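
On a finite set every signed measure is given by point masses, and the objects in 5.1.2 can be computed by hand. The sketch below, with hypothetical data and function names, takes $P$ to be the set of points of nonnegative mass and confirms part (b) by brute force over all subsets of a set $E$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on a four-point space X (illustrative data only).
weights = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(1, 2), "d": Fraction(-1)}

def mu(E):
    """Signed measure of a subset E of X."""
    return sum((weights[x] for x in E), Fraction(0))

P = {x for x, w in weights.items() if w >= 0}   # positive set of a Hahn decomposition

def mu_plus(E):
    return mu(set(E) & P)

def mu_minus(E):
    return -mu(set(E) - P)

def subsets(E):
    """Yield every subset of E."""
    E = list(E)
    for r in range(len(E) + 1):
        for combo in combinations(E, r):
            yield set(combo)

E = {"a", "b", "d"}
print(mu_plus(E), max(mu(A) for A in subsets(E)))     # both equal 3
print(mu_minus(E), -min(mu(A) for A in subsets(E)))   # both equal 3
print(mu_plus(E) + mu_minus(E))                       # |mu|(E) = 6
```

A point of mass zero could be placed in either $P$ or $P^c$, which is the non-uniqueness described in part (d).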
5.1.3 Example. Let $\nu$ be a measure on $\mathcal{F}$ and let $f$ be measurable such that $\int f \, d\nu$ is defined (hence $\int_E f \, d\nu$ is defined for all $E \in \mathcal{F}$). Set
$$\mu(E) = \int_E f \, d\nu, \quad \mu_1(E) = \int_E f^+ \, d\nu, \quad \text{and} \quad \mu_2(E) = \int_E f^- \, d\nu.$$
Then $\mu_1$ and $\mu_2$ are measures and $\mu = \mu_1 - \mu_2$. Moreover, $f^- = 0$ on $P := \{f \ge 0\}$ and $f^+ = 0$ on $P^c = \{f < 0\}$, hence $\mu_1(P^c) = \mu_2(P) = 0$. Therefore, $\mu_1 \perp \mu_2$. By uniqueness, $\mu_1 = \mu^+$ and $\mu_2 = \mu^-$, hence the total variation measure of $\mu$ is $|\mu| = |f|\nu$. ♦
5.1.4 Corollary. If $\mu$ is a signed measure, then for all $E \in \mathcal{F}$,
$$|\mu|(E) = \sup\left\{ \sum_{j=1}^n |\mu(E_j)| : E_1, \ldots, E_n \text{ is a measurable partition of } E \right\}.$$

Proof. Let $\nu(E)$ denote the expression on the right. We show that $\nu(E) = \mu^+(E) + \mu^-(E)$. This is clear if $|\mu(E)| = \infty$, since then $\nu(E) = \infty$ and either $\mu^+(E) = \infty$ or $\mu^-(E) = \infty$. Now let $|\mu(E)| < \infty$, so $\mu^+(E) < \infty$ and $\mu^-(E) < \infty$. Let $A$ and $B$ be measurable subsets of $E$ and set $C := A \cap B$. Then
$$\mu(A) = \mu(A \setminus C) + \mu(C) \quad \text{and} \quad \mu(B) = \mu(B \setminus C) + \mu(C),$$
and since $\mu(A)$ and $\mu(B)$ are finite,
$$\mu(A) - \mu(B) = \mu(A \setminus C) - \mu(B \setminus C) \le |\mu(A \setminus C)| + |\mu(B \setminus C)| \le \nu(E),$$
the last inequality because $A \setminus C$ and $B \setminus C$ are disjoint subsets of $E$ and so are members of a measurable partition of $E$. Therefore, by (b) of the theorem,
$$\mu^+(E) + \mu^-(E) = \sup\{\mu(A) - \mu(B) : A, B \in \mathcal{F} \cap E\} \le \nu(E).$$
For the reverse inequality, let $E_1, \ldots, E_n$ be a measurable partition of $E$ and let $A$ be the union of those $E_j$ for which $\mu(E_j) \ge 0$ and $B$ the union of the remaining $E_j$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j : \mu(E_j) \ge 0} \mu(E_j) - \sum_{j : \mu(E_j) < 0} \mu(E_j) = \mu(A) - \mu(B) \le \mu^+(E) + \mu^-(E).$$
Since the partition $\{E_1, \ldots, E_n\}$ was arbitrary, $\nu(E) \le \mu^+(E) + \mu^-(E)$.

Proof of the Hahn-Jordan theorem:


We may assume that $-\infty \le \mu(E) < +\infty$ for all $E \in \mathcal{F}$ (otherwise replace $\mu$ by $-\mu$).
Proof of (a): Define
$$\mathcal{A}^+ := \{A \in \mathcal{F} : \mu \ge 0 \text{ on } \mathcal{F} \cap A\} \quad \text{and} \quad \mathcal{A}^- := \{A \in \mathcal{F} : \mu \le 0 \text{ on } \mathcal{F} \cap A\}.$$
Note that the sets contain $\emptyset$ and hence are nonempty. Set $a := \sup\{\mu(A) : A \in \mathcal{A}^+\}$ and choose $A_n \in \mathcal{A}^+$ such that $\mu(A_n) \to a$. Define $P := \bigcup_n A_n$. We claim that $P \in \mathcal{A}^+$, that is, $\mu(E) \ge 0$ for all $E \in \mathcal{F} \cap P$. To see this, set $B_1 := A_1 \cap E$, $B_n := A_n \cap A_{n-1}^c \cap \cdots \cap A_1^c \cap E$, $n \ge 2$. The sets $B_n$ are disjoint members of $\mathcal{A}^+$ with union $E$, hence $\mu(E) = \sum_n \mu(B_n) \ge 0$.
We show next that $P^c \in \mathcal{A}^-$, which will complete the proof of (a). Since $P \in \mathcal{A}^+$, $a \ge \mu(P) = \mu(A_n) + \mu(P \cap A_n^c) \ge \mu(A_n)$ for all $n$. Taking limits we see that $\mu(P) = a$. Since $a \ge 0$ and $\mu$ never takes on the value $\infty$, $a$ is finite. These facts imply that $P^c \in \mathcal{A}^-$. Indeed, let $E \subseteq P^c$ be measurable and suppose for a contradiction that $\mu(E) > 0$. We claim that there exists an $F \in \mathcal{A}^+$ such that $F \subseteq E$ and $\mu(F) > 0$. Assuming this for the moment, we then have $\mu(P \cup F) = \mu(P) + \mu(F) > \mu(P) = a$. On the other hand, since $P$ and $F$ are disjoint, $P \cup F \in \mathcal{A}^+$ and so $\mu(P \cup F) \le a$. With this contradiction we see that $\mu(E) \le 0$, hence $P^c \in \mathcal{A}^-$.
It remains to verify the claim, namely:

If $E \in \mathcal{F}$ with $\mu(E) > 0$, then there exists a set $F \in \mathcal{A}^+$ such that $F \subseteq E$ and $\mu(F) > 0$.

If $E \in \mathcal{A}^+$, take $F = E$. Otherwise, $E$ contains a set with negative measure, and in particular $E$ contains sets $E'$ with $\mu(E') < -1/n$ for some $n \in \mathbb{N}$. Let $n_1$ be the smallest positive integer for which $\mu(E_1) < -1/n_1$ for some $E_1 \in \mathcal{F} \cap E$. We then have
$$\mu(E \setminus E_1) > \mu(E \setminus E_1) + \mu(E_1) = \mu(E) > 0.$$
If $E \setminus E_1 \in \mathcal{A}^+$, take $F = E \setminus E_1$. Otherwise, apply the same argument to $E \setminus E_1$, obtaining a set $E_2 \in \mathcal{F} \cap (E \setminus E_1)$ and $n_2 \ge n_1$ such that $\mu(E_2) < -1/n_2$. Continue inductively. If the process stops at some point, we are done. Otherwise, we generate a sequence $1 \le n_1 \le n_2 \le \ldots$ in $\mathbb{N}$ and disjoint $E_1, E_2, \ldots \in \mathcal{F}$ such that
$$E_k \subseteq E \setminus \bigcup_{j=1}^{k-1} E_j, \quad \mu(E_k) < -1/n_k, \quad \text{and}$$
$$\mu(A) \ge -1/(n_k - 1) \text{ for all } k \text{ with } n_k > 1 \text{ and all measurable } A \subseteq E \setminus \bigcup_{j=1}^{k-1} E_j. \tag{†}$$
Set $F := E \setminus \bigcup_{k=1}^\infty E_k$. Then $E \setminus F = \bigcup_{k=1}^\infty E_k$, hence $\mu(E \setminus F) = \sum_{k=1}^\infty \mu(E_k) < 0$. Because $\mu(E)$ is finite, so is $\mu(E \setminus F)$, hence the series converges and so $\mu(E_k) \to 0$. Since $-\mu(E_k) > 1/n_k$, $n_k \to \infty$. Also $\mu(F) > \mu(F) + \mu(E \setminus F) = \mu(E) > 0$. Finally, if $A \in \mathcal{F} \cap F$, letting $k \to \infty$ in (†) yields $\mu(A) \ge 0$. Therefore, $F \in \mathcal{A}^+$.
Proof of (b): Let $s$ denote the supremum in (b). Since $\mu^+(E) = \mu(E \cap P)$, $\mu^+(E) \le s$. For the reverse inequality, let $A \in \mathcal{F}$ with $A \subseteq E$. By definition of $\mu^\pm$,
$$\mu(A) = \mu^+(A) - \mu^-(A) \le \mu^+(A) \le \mu^+(E).$$
Taking the sup over all such $A$ yields $s \le \mu^+(E)$. This proves the first part of (b). The proof of the second part is similar.
Proof of (c): The set functions $\mu^\pm$ are clearly mutually singular measures and $\mu = \mu^+ - \mu^-$. Suppose also that $\mu = \mu_1 - \mu_2$, where $\mu_1$ and $\mu_2$ are mutually singular nonnegative measures. Let $E \in \mathcal{F}$. For any measurable $A \subseteq E$, $\mu_1(E) \ge \mu_1(A) \ge \mu(A)$, hence, taking the sup over all such $A$, we have $\mu_1(E) \ge \mu^+(E)$ by (b). For the reverse inequality, use the mutual singularity to obtain $B \in \mathcal{F}$ such that $\mu_1(B^c) = \mu_2(B) = 0$. Then
$$\mu_1(E) = \mu_1(B \cap E) - \mu_2(B \cap E) = \mu(B \cap E) \le \mu^+(B \cap E) \le \mu^+(E)$$
and so $\mu_1(E) = \mu^+(E)$. Similarly, $\mu_2(E) = \mu^-(E)$.

Proof of (d): For any $A \in \mathcal{F}$, $\mu(A \cap P) = \mu^+(A) = \mu(A \cap P_1)$, hence $\mu(A) = 0$ for $A \subseteq P \cap P_1^c$. Therefore, $\mu^+(P \cap P_1^c) = 0$. Switching $P$ and $P_1$, $\mu^+(P_1 \cap P^c) = 0$. Therefore, $\mu^+(P \,\triangle\, P_1) = 0$. Similarly, $\mu^-(P \,\triangle\, P_1) = 0$.

Exercises
5.1 Show that a signed measure $\mu$ is finite iff $|\mu|$ is finite.

5.2 Let $\mu_1$ and $\mu_2$ be signed measures such that $\mu_1 + \mu_2$ is defined. Prove that
$$(\mu_1 + \mu_2)^+ + \mu_1^- + \mu_2^- = (\mu_1 + \mu_2)^- + \mu_1^+ + \mu_2^+.$$

5.3 Let $Q \in \mathcal{F}$ have the property that $\mu(E \cap Q) \ge 0$ and $\mu(E \cap Q^c) \le 0$ for all $E \in \mathcal{F}$. Show that $\mu^+(E) = \mu(E \cap Q)$ and $\mu^-(E) = -\mu(E \cap Q^c)$.

5.4 Let $\mu$ be a finite measure and $x \in X$. Find the Hahn decomposition of $\mu - a\delta_x$, where $a = \mu(X)$.

5.5 Let $\mu$ be a signed measure with Hahn decomposition $(P, P^c)$. Show that $(-\mu)^+ = \mu^-$ and $(-\mu)^- = \mu^+$ and that $(P^c, P)$ is a Hahn decomposition of $-\mu$.

5.6 Let $\mu$ and $\nu$ be probability measures and $\eta := \mu - \nu$. Show that
$$|\eta|(X) = 2 \sup\{|\eta(E)| : E \in \mathcal{F}\}.$$

5.7 Let $\mu = \mu_1 - \mu_2$, where $\mu_1$ and $\mu_2$ are measures. Show that $\mu_1 \ge \mu^+$ and $\mu_2 \ge \mu^-$.

5.8 [↑ 5.7] Show that if $\mu_1$ and $\mu_2$ are measures and $\mu_1(Q) = \mu_2(Q^c) = 0$, then $(Q, Q^c)$ is a Hahn decomposition for $\mu_2 - \mu_1$.
5.9 Let $\mu$ be a finite signed measure such that $|\mu|(X) = \mu(X)$. Show that $|\mu| = \mu$.
5.10 Show that for a signed measure $\mu$ and real-valued $h \in L^1(\mu)$,
$$|h\mu| = |h|\,|\mu|, \quad (h\mu)^+ = h^+\mu^+ + h^-\mu^-, \quad \text{and} \quad (h\mu)^- = h^+\mu^- + h^-\mu^+.$$

5.11 Let $\mu, \mu_1, \mu_2, \ldots$ be signed measures with $\mu_n \to \mu$. Show that $\mu^\pm(E) \le \liminf_n \mu_n^\pm(E)$, and $|\mu|(E) \le \liminf_n |\mu_n|(E)$.
5.12 Let $\mu$ be a signed measure and $\nu$ a measure. Show that $|\mu| \le \nu$ iff $-\nu \le \mu \le \nu$.
5.13 [↓ 5.29] For finite signed measures $\mu_1$ and $\mu_2$, define
$$\mu_1 \vee \mu_2 = \mu_1 + (\mu_2 - \mu_1)^+ \quad \text{and} \quad \mu_1 \wedge \mu_2 = \mu_2 - (\mu_1 - \mu_2)^-.$$
Prove:
(a) $\mu_1 \vee \mu_2 = \mu_2 \vee \mu_1$ and $\mu_1 \wedge \mu_2 = \mu_2 \wedge \mu_1$.
(b) $\mu_1, \mu_2 \le \mu_1 \vee \mu_2$, and if $\nu$ is a signed measure such that $\mu_1, \mu_2 \le \nu$, then $\mu_1 \vee \mu_2 \le \nu$. Thus $\mu_1 \vee \mu_2$ is the smallest signed measure larger than $\mu_1$ and $\mu_2$.
(c) $\mu_1 \wedge \mu_2 = -\big[(-\mu_1) \vee (-\mu_2)\big]$. Use this to formulate and prove assertions for $\mu_1 \wedge \mu_2$ analogous to those in (b).
(d) $\mu_1 \vee (-\mu_1) = |\mu_1|$ and $\mu_1 \wedge (-\mu_1) = -|\mu_1|$.
(e) $\mu_1 \wedge \mu_2 + \mu_1 \vee \mu_2 = \mu_1 + \mu_2$.
(f) If $\mu_1$ is a measure and $f, g \in L^1(\mu_1)$, then $(f \vee g)\mu_1 = (f\mu_1) \vee (g\mu_1)$ and $(f \wedge g)\mu_1 = (f\mu_1) \wedge (g\mu_1)$.

5.2 Complex Measures


A complex measure is a $\mathbb{C}$-valued set function $\mu$ on a measurable space $(X, \mathcal{F})$ with the following properties:
(a) $\mu(\emptyset) = 0$.
(b) If $(A_n)$ is a sequence of pairwise disjoint sets in $\mathcal{F}$, then $\mu\big(\bigcup_n A_n\big) = \sum_n \mu(A_n)$, where the series is assumed to converge absolutely.
For a complex measure $\mu$, define
$$\mu_r(E) := \operatorname{Re}\big(\mu(E)\big) \quad \text{and} \quad \mu_i(E) := \operatorname{Im}\big(\mu(E)\big), \quad E \in \mathcal{F}.$$
Then $\mu_r$ and $\mu_i$ are finite signed measures, called the real and imaginary parts of $\mu$, and
$$\mu = \mu_r + i\mu_i = \mu_r^+ - \mu_r^- + i(\mu_i^+ - \mu_i^-). \tag{5.1}$$
Thus complex-valued measures are linear combinations over $\mathbb{C}$ of finite measures. This observation will be useful in reducing some arguments involving complex measures to the nonnegative case. We also define the complex conjugate of $\mu$ in the expected way:
$$\overline{\mu} = \mu_r - i\mu_i. \tag{5.2}$$
The notation conflicts with that for the completion of a measure, but this should not be a problem, as context will make clear which meaning is intended.

The Total Variation Measure


The total variation measure $|\mu|$ of a complex measure $\mu$ is defined by
$$|\mu|(E) = \sup\left\{ \sum_{j=1}^n |\mu(E_j)| : E_1, \ldots, E_n \text{ is a measurable partition of } E \right\}, \quad E \in \mathcal{F}. \tag{5.3}$$

This definition is compatible with the corresponding notion for signed measures (5.1.4). In
the latter case, however, the total variation was immediately seen to be a measure. In the
complex case, some work is required to verify this.
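
For a complex measure on a small finite set, the supremum in (5.3) can be found by enumerating all partitions. The sketch below uses hypothetical point masses and helper names chosen for this example. It shows that the supremum is attained by the partition into single points and equals the sum of the moduli of the masses, while $|\mu(E)|$ itself can be much smaller because of cancellation.

```python
# Hypothetical complex point masses on a four-point set (illustrative data only).
weights = {1: 3 + 4j, 2: -3 - 4j, 3: 1j, 4: 2 + 0j}

def mu(E):
    """Complex measure of a subset E."""
    return sum(weights[x] for x in E)

def partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` as its own block
        for i in range(len(part)):                  # or joined to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def total_variation(E):
    """Supremum in (5.3), by brute force over all partitions of E."""
    return max(sum(abs(mu(block)) for block in part) for part in partitions(list(E)))

E = [1, 2, 3, 4]
print(abs(mu(E)), total_variation(E))   # |mu(E)| versus |mu|(E)
```

Brute force is feasible only for very small sets, since the number of partitions grows rapidly with the size of the set.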
5.2.1 Theorem. If $\mu$ is a complex measure, then $|\mu|$ is a finite measure and $|\mu(E)| \le |\mu|(E)$ for all $E \in \mathcal{F}$. Moreover, if $\nu$ is a complex measure, then $|\mu + \nu| \le |\mu| + |\nu|$.

Proof. To show that $|\mu|$ is finite, let $E_1, \ldots, E_n$ be an arbitrary measurable partition of $E \in \mathcal{F}$. From (5.1),
$$|\mu(E_j)| \le \mu_r^+(E_j) + \mu_r^-(E_j) + \mu_i^+(E_j) + \mu_i^-(E_j),$$
hence
$$\sum_{j=1}^n |\mu(E_j)| \le \mu_r^+(E) + \mu_r^-(E) + \mu_i^+(E) + \mu_i^-(E) < \infty.$$
To show countable additivity, let $(A_k)$ be a sequence of disjoint measurable sets with union $A$ and let $\{E_1, \ldots, E_n\}$ be a measurable partition of $A$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j=1}^n |\mu(A \cap E_j)| = \sum_{j=1}^n \Big| \sum_{k=1}^\infty \mu(A_k \cap E_j) \Big| \le \sum_{k=1}^\infty \sum_{j=1}^n |\mu(A_k \cap E_j)| \le \sum_{k=1}^\infty |\mu|(A_k),$$
the last inequality because $\{A_k \cap E_1, \ldots, A_k \cap E_n\}$ is a measurable partition of $A_k$. Therefore, $|\mu|(A) \le \sum_{k=1}^\infty |\mu|(A_k)$.
For the reverse inequality, for each $k$ let $\mathcal{P}_k := \{E_{k,1}, \ldots, E_{k,n_k}\}$ be a measurable partition of $A_k$. For each $m$, the sets $E_{k,j}$ ($1 \le k \le m$, $1 \le j \le n_k$) are part of a finite measurable partition of $A$, hence
$$|\mu|(A) \ge \sum_{k=1}^m \sum_{j=1}^{n_k} |\mu(E_{k,j})| = \sum_{k=1}^m \sum_{E \in \mathcal{P}_k} |\mu(E)|.$$
Taking the suprema over each of the partitions $\mathcal{P}_k$ yields $|\mu|(A) \ge \sum_{k=1}^m |\mu|(A_k)$. Since $m$ was arbitrary, $|\mu|(A) \ge \sum_{k=1}^\infty |\mu|(A_k)$. This establishes countable additivity of $|\mu|$.
The inequality $|\mu(E)| \le |\mu|(E)$ follows directly from the definition of $|\mu|(E)$. The proof of the triangle inequality is an exercise (5.15).
5.2.2 Example. Let $\nu$ be a measure on $\mathcal{F}$ and let $f : X \to \mathbb{C}$ be $\nu$-integrable. Define the complex measure $\mu$ by $\mu(E) = \int_E f \, d\nu$, that is, $d\mu = f \, d\nu$. We show that $d|\mu| = |f| \, d\nu$. Let $\{E_1, \ldots, E_n\}$ be an arbitrary measurable partition of $E$. Then
$$\sum_{j=1}^n |\mu(E_j)| = \sum_{j=1}^n \Big| \int_{E_j} f \, d\nu \Big| \le \sum_{j=1}^n \int_{E_j} |f| \, d\nu = \int_E |f| \, d\nu,$$
hence $|\mu|(E) \le \int_E |f| \, d\nu$. In particular, $|\mu|\{f = 0\} = 0$, so for the reverse inequality $\int_E |f| \, d\nu \le |\mu|(E)$ we may assume that $f$ is never zero, otherwise remove the part of $E$ on which $f = 0$. Consider the polar form of $z \ne 0$, written as $|z| = z e^{i\theta(z)}$, where $-\pi \le \theta(z) < \pi$. For each $n$ define
$$g_n(z) = \sum_{k=0}^{n-1} e^{i\theta_k} 1_{[\theta_k, \theta_{k+1})}\big(\theta(z)\big), \quad \theta_k = -\pi + 2\pi k/n, \quad k = 0, 1, \ldots, n.$$
Then $z g_n(z) \to |z|$ and $|g_n(z)| = 1$. Therefore, $f_n := g_n \circ f$ is an $\mathcal{F}$-simple function satisfying $|f_n| = 1$ and $f \cdot f_n \to |f|$ on $X$. Let $E \in \mathcal{F}$ and let $f_n$ have standard form
$$f_n = \sum_{j=1}^{m_n} c_j 1_{A_j}, \quad |c_j| = 1.$$
Then $E \cap A_1, \ldots, E \cap A_{m_n}$ is a measurable partition of $E$ and
$$\int_E f \cdot f_n \, d\nu = \sum_{j=1}^{m_n} c_j \int_{E \cap A_j} f \, d\nu = \sum_{j=1}^{m_n} c_j\, \mu(E \cap A_j),$$
hence
$$\Big| \int_E f \cdot f_n \, d\nu \Big| \le \sum_{j=1}^{m_n} |c_j|\, |\mu(E \cap A_j)| \le |\mu|(E).$$
Letting $n \to \infty$ and applying the dominated convergence theorem yields $\int_E |f| \, d\nu \le |\mu|(E)$, as required. ♦

The Vitali-Hahn-Saks Theorem


We give an application of the total variation measure which asserts that the limit of a
sequence of complex measures on a measurable space (X, F) is a complex measure. For this
we need the following lemma.
5.2.3 Lemma. Let $\eta$ be a finite measure on $\mathcal{F}$ and define
$$d(A, B) := \eta(A \,\triangle\, B) = \|1_A - 1_B\|_1, \quad A, B \in \mathcal{F}.$$
If we identify $A$ and $A'$ whenever $\eta(A \,\triangle\, A') = 0$, then $d$ is a metric on $\mathcal{F}$ and $(\mathcal{F}, d)$ is complete.
Proof. To see that $d$ is well-defined, let $\eta(A \,\triangle\, A') = \eta(B \,\triangle\, B') = 0$. Then
$$d(A, B) = \|1_A - 1_B\|_1 \le \|1_A - 1_{A'}\|_1 + \|1_{A'} - 1_{B'}\|_1 + \|1_{B'} - 1_B\|_1 = d(A', B'),$$
and similarly $d(A', B') \le d(A, B)$. That $d$ is a metric follows easily from the properties of the $L^1$ norm. To show completeness, let $(A_n)$ be a Cauchy sequence in $\mathcal{F}$. Then $(1_{A_n})$ is a Cauchy sequence in $L^1$, hence there exists $f \in L^1$ such that $\|f - 1_{A_n}\|_1 \to 0$. Choose a subsequence $(1_{A_{n_k}})$ that converges a.e. to $f$. Then $f$ takes on the values 0 and 1 a.e., hence $f = 1_A$ a.e., where $A = \{f = 1\}$. Therefore, $d(A_n, A) = \|1_{A_n} - 1_A\|_1 \to 0$.
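
The metric of 5.2.3 is easy to experiment with on a finite set. The sketch below uses a hypothetical measure $\eta$ with one point of measure zero, together with hypothetical helper names. It verifies the triangle inequality on all triples of subsets and shows why sets that differ by a null set must be identified.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite measure eta on X = {0, 1, 2, 3}; the point 3 has measure zero.
eta_weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(0)}

def eta(A):
    return sum((eta_weights[x] for x in A), Fraction(0))

def d(A, B):
    """d(A, B) = eta(A symmetric-difference B)."""
    return eta(set(A) ^ set(B))

X = list(eta_weights)
all_sets = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# Triangle inequality on every triple of subsets.
print(all(d(A, C) <= d(A, B) + d(B, C)
          for A in all_sets for B in all_sets for C in all_sets))
# Distinct sets differing by a null set are at distance zero.
print(d({0, 1}, {0, 1, 3}))
```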
5.2.4 Vitali-Hahn-Saks Theorem. Let $(X, \mathcal{F})$ be a measurable space and $(\mu_n)$ a sequence of complex measures on $\mathcal{F}$ such that the limit
$$\mu(A) := \lim_n \mu_n(A)$$
exists for every $A \in \mathcal{F}$. Then $\mu$ is countably additive and hence is a complex measure.

Proof. The set function µ is clearly finitely additive, and µ(∅) = 0. It remains to show that
µ is continuous from below. To this end, apply the lemma to the finite measure
X∞
1 |µn |(A)
η(A) := n 1 + |µ |(X)
, A ∈ F.
n=1
2 n

For each n, the function A 7→ µn (A) on (F, d) (viewed as a collection of equivalence classes)
is well-defined, since d(A, A0 ) = 0 ⇒ |µn |(A 4 A0 ) = 0 ⇒ µn (A) = µn (A0 ). Moreover, from
 
|µn (A) − µn (B)| ≤ |µn |(A 4 B) ≤ 2n 1 + |µn |(X) η(A 4 B) = 2n 1 + |µn |(X) d(A, B)

we see that the mapping A 7→ µn (A) is continuous in the metric topology.


Now let ε > 0 and set

\[ C_n = \bigcap_{k=1}^{\infty} \{A \in \mathcal{F} : |\mu_n(A) - \mu_{n+k}(A)| \le \varepsilon\}. \]

Then C_n is closed in the metric topology, and since µ_n(A) → µ(A), F = ⋃_n C_n. Since (F, d) is complete, the Baire category theorem implies that some C_m contains an open ball. Thus for some A_0 ∈ F and δ > 0,

|µ_m(A) − µ_{m+k}(A)| ≤ ε for all k ≥ 1 and all A with d(A, A_0) < δ. (†)

Let η(A) < δ and set B = A ∪ A_0 and C = A_0 \ A. Then

A = B \ C, C ⊆ B, B △ A_0 ⊆ A, and C △ A_0 ⊆ A,

hence d(B, A_0), d(C, A_0) < δ and µ_n(A) = µ_n(B) − µ_n(C). Therefore, for all n ≥ m,

|µn (A)| ≤ |µm (A)| + |µn (A) − µm (A)|


≤ |µm (A)| + |µn (B) − µm (B)| + |µn (C) − µm (C)|
≤ |µm (A)| + 2ε,

the last inequality by (†). Letting n → ∞ we have

|µ(A)| ≤ |µ_m(A)| + 2ε for all A with η(A) < δ. (‡)

Now let E_n ↑ E. Then η(E \ E_n) → 0, hence also µ_m(E \ E_n) → 0. Thus from (‡),

\[ \limsup_{n} |\mu(E \setminus E_n)| \le \lim_{n} |\mu_m(E \setminus E_n)| + 2\varepsilon = 2\varepsilon. \]

Since ε was arbitrary, lim_n µ(E \ E_n) = 0, which shows that µ is continuous from below.

The Banach Space of Complex Measures


Let M = M(X, F) denote the linear space of complex measures on F under pointwise addition and scalar multiplication. It is easy to check that the total variation norm ‖µ‖ := |µ|(X) is a norm on M. For example, the triangle inequality follows from the second part of 5.2.1. We claim that M is complete in this norm. For this we use 0.4.3. Let (µ_n) be a sequence in M such that Σ_n ‖µ_n‖ < ∞. The sequence of complex measures ν_n := Σ_{k=1}^{n} µ_k then converges to a complex measure in the total variation norm. To see this, let E ∈ F and note that for m > n,

\[ |\nu_m(E) - \nu_n(E)| \le \sum_{k=n+1}^{m} |\mu_k(E)| \le \sum_{k>n} \|\mu_k\|, \]

so (ν_n(E)) is a Cauchy sequence in C. Let ν_n(E) → ν(E). By the Vitali-Hahn-Saks theorem, ν is a complex measure. For any measurable partition E_1, . . . , E_p of X,

\[ \sum_{j=1}^{p} |\nu_m(E_j) - \nu_n(E_j)| \le \sum_{k=n+1}^{m} \sum_{j=1}^{p} |\mu_k(E_j)| \le \sum_{k>n} \|\mu_k\|, \]

and letting m → ∞ we have Σ_{j=1}^{p} |ν(E_j) − ν_n(E_j)| ≤ Σ_{k>n} ‖µ_k‖. Since the partition was arbitrary, ‖ν − ν_n‖ ≤ Σ_{k>n} ‖µ_k‖. Therefore, ‖ν_n − ν‖ → 0, proving that M is complete.
We summarize this discussion in
5.2.5 Proposition. The linear space M (X, F) of complex measures on a measurable space
(X, F) is a Banach space under the total variation norm.
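For a concrete feel for the total variation norm, here is a small sketch on a finite set where every subset is measurable. In that setting the supremum over partitions is attained by the partition into singletons, so |µ|(E) is the sum of the moduli of the point masses in E. The measures `mu` and `nu` and the helper names are assumptions made for this illustration.

```python
def total_variation(mu, E=None):
    """|mu|(E) for a complex measure given by point masses on a finite set.

    With all subsets measurable, the finest partition (singletons) attains
    the supremum, so |mu|(E) is the sum of |mu({x})| over x in E.
    """
    points = mu.keys() if E is None else E
    return sum(abs(mu[x]) for x in points)

def add(mu, nu):
    """Pointwise sum of two measures on the same finite set."""
    return {x: mu[x] + nu[x] for x in mu}

# Hypothetical complex measures on X = {0, 1, 2}.
mu = {0: 1 + 1j, 1: -2.0, 2: 0.5j}
nu = {0: -1.0, 1: 2.0, 2: 1.0}

norm_mu = total_variation(mu)             # ||mu|| = |mu|(X)
norm_nu = total_variation(nu)
norm_sum = total_variation(add(mu, nu))   # ||mu + nu||
```

Here the cancellation at the points 0 and 1 makes ‖µ + ν‖ strictly smaller than ‖µ‖ + ‖ν‖, which is the triangle inequality in action.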

Integration against a Signed or Complex Measure


If µ is a signed measure, define

\[ \int f \, d\mu := \int f \, d\mu^{+} - \int f \, d\mu^{-}, \qquad f \in L^1(|\mu|). \]

For a complex measure µ, define

\[ \int f \, d\mu := \int f \, d\mu_r + i \int f \, d\mu_i, \qquad f \in L^1(|\mu|). \]

It is straightforward to check that in each case the integrals are well-defined, linear, and satisfy

\[ \Bigl| \int f \, d\mu \Bigr| \le \int |f| \, d|\mu| \quad \text{and} \quad \overline{\int f \, d\mu} = \int \overline{f} \, d\overline{\mu} \tag{5.4} \]

(Ex. 5.16). Moreover, the dominated convergence theorem holds for a signed or complex measure µ, as may be seen by decomposing µ into a linear combination of the measures µ_r^± and µ_i^±.

Exercises
5.14 Show that if µ is a complex measure and ‖µ‖ = µ(X) < ∞, then µ is a nonnegative measure.
5.15 Verify the inequality |µ + ν| ≤ |µ| + |ν| for complex measures.
5.16 Let µ be a signed or complex measure. Verify that the integral with respect to µ is well-defined
and linear. Also, verify the assertions in (5.4).
5.17 Show that in the definition of |µ|, the finite measurable partition E_1, . . . , E_n may be replaced by a countable measurable partition.
5.18 Let µ be a complex measure and E ∈ F. Prove that

\[ |\mu|(E) = \sup \Bigl\{ \Bigl| \int_E f \, d\mu \Bigr| : f \text{ is measurable and } |f| \le 1 \Bigr\}. \]

5.19 Let µ and ν be complex measures on measurable spaces (X, F) and (Y, G), respectively.
(a) Show that there exists a unique complex measure µ ⊗ ν on F ⊗ G such that (µ ⊗ ν)(A × B) = µ(A)ν(B) for all A ∈ F and B ∈ G.

(b) Prove the complex version of Fubini’s theorem for an (F ⊗ G)-measurable function f: If ∬ |f(x, y)| d|µ|(x) d|ν|(y) < ∞, then

\[ \int_{X \times Y} f \, d(\mu \otimes \nu) = \int_Y \int_X f(x, y) \, d\mu(x) \, d\nu(y) = \int_X \int_Y f(x, y) \, d\nu(y) \, d\mu(x). \]

(c) Show that |µ ⊗ ν| = |µ| ⊗ |ν|.



5.20 Let (X, F) and (Y, G) be measurable spaces and T : (X, F) → (Y, G) measurable. If µ is a signed or complex measure on (X, F), then the image of µ under T is the signed or complex measure T(µ) on (Y, G) defined as before by T(µ)(E) = µ(T⁻¹(E)), E ∈ G. Show that in the signed case |T(µ)| ≤ T(|µ|), (Tµ)⁺ ≤ T(µ⁺), and (Tµ)⁻ ≤ T(µ⁻), and in the complex case |T(µ)| ≤ T(|µ|).

5.3 Absolute Continuity of Measures


Let (X, F, µ) be a measure space. A signed or complex measure ν on (X, F) is said to be absolutely continuous with respect to µ, written ν ≪ µ, if

E ∈ F and µ(E) = 0 ⇒ ν(E) = 0.

For example, if g ∈ L¹(µ) then ν(E) := ∫_E g dµ defines a signed or complex measure ν on (X, F) with ν ≪ µ. The main goal of this section is to prove the converse: if µ is σ-finite, then every signed or complex measure ν that is absolutely continuous with respect to µ is of the form g dµ for some g. This result, known as the Radon-Nikodym theorem, has many important applications in analysis and probability theory, some of which are given below. Before we state the theorem we develop a few preliminary results.

General Properties of Absolute Continuity


The following proposition is useful in reducing arguments involving absolute continuity of
signed and complex measures to the nonnegative case.
5.3.1 Proposition. Let µ be a measure and ν, η signed or complex measures.
(a) If ν is signed, then ν ≪ µ iff |ν| ≪ µ iff both ν⁺ ≪ µ and ν⁻ ≪ µ.

(b) If ν is complex, then ν ≪ µ iff |ν| ≪ µ iff both ν_r ≪ µ and ν_i ≪ µ.

(c) ν ≪ µ and |η| ⊥ µ ⇒ |ν| ⊥ |η|.

(d) ν ≪ µ and |ν| ⊥ µ ⇒ |ν| = 0.

Proof. (a) Let ν ≪ µ, µ(E) = 0, and A ⊆ E measurable. Then µ(A) = 0, hence ν(A) = 0. Therefore, by 5.1.2, ν⁺(E) = ν⁻(E) = 0, hence also |ν|(E) = 0. The converses are clear.
(b) This follows from |ν_{r,i}| ≤ |ν| ≤ |ν_r| + |ν_i| and (a).
(c) For the signed case, let |η|(E^c) = µ(E) = 0. Since ν ≪ µ, ν(A) = 0 for all measurable A ⊆ E, hence |ν|(E) = 0. Therefore, |ν| ⊥ |η|. The complex case is obtained by using ν_{r,i}.
(d) By (c), |ν| ⊥ |ν|, that is, |ν|(E) = |ν|(E^c) = 0 for some E. Therefore, |ν| = 0.
5.3.2 Proposition. Let µ be a measure and ν a complex measure. Then ν ≪ µ iff lim_{µ(E)→0} ν(E) = 0 for all E ∈ F.
Proof. The limit assertion means that for every ε > 0 there exists δ > 0 such that |ν(E)| < ε for all E ∈ F with µ(E) < δ. Suppose this holds. If µ(E) = 0, then the δ-inequality holds trivially, hence |ν(E)| < ε for all ε and so ν(E) = 0. Therefore, ν ≪ µ.
Conversely, suppose ν ≪ µ. Then by 5.3.1(b), |ν| ≪ µ. If we show that lim_{µ(E)→0} |ν|(E) = 0, then the inequality |ν(E)| ≤ |ν|(E) will imply that lim_{µ(E)→0} ν(E) = 0. Thus we may assume without loss of generality that ν is nonnegative. Suppose that the ε-δ condition does not hold. Then there exists ε > 0 and for each n ∈ N a measurable set E_n such that
µ(E_n) < 1/2^n and ν(E_n) ≥ ε. Let E = limsup_n E_n. Since Σ_n µ(E_n) < +∞, µ(E) = 0 (Ex. 1.37). But by continuity from above (since ν is finite), ν(E) ≥ limsup_n ν(E_n) ≥ ε, contradicting the assumption that ν ≪ µ.
5.3.3 Remark. The necessity in 5.3.2 does not necessarily hold if ν is not finite. For example, let ν be counting measure on N and let µ(E) := Σ_{n∈E} 1/2^n. Clearly ν ≪ µ. On the other hand, if A_n := {n, n + 1, . . .} then µ(A_n) → 0 but ν(A_n) = ∞ for all n. ♦

The Radon-Nikodym Theorem


5.3.4 Theorem. Let (X, F, µ) be a σ-finite measure space and ν a complex or signed measure on F such that ν ≪ µ. Then there exists a measurable function h : X → K, unique up to a set of µ-measure zero, such that

\[ \nu(E) = \int_E h \, d\mu \quad \text{for all } E \in \mathcal{F}. \]

Moreover, we may take h to be R-valued if ν is a signed measure and nonnegative if ν is a measure.
We prove the theorem below. The function h in the theorem is called the Radon-Nikodym derivative of ν with respect to µ and is denoted by dν/dµ. By the usual arguments,

\[ \int f \, d\nu = \int f \, \frac{d\nu}{d\mu} \, d\mu \tag{5.5} \]

in the sense that if one side is defined, then so is the other and then equality holds.
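On a finite set the Radon-Nikodym derivative can be written down directly, which may help fix ideas before the general proof. The sketch below is an illustration under stated assumptions: the point masses `MU` and `NU` and the helper names are hypothetical, and h is simply the ratio of point masses wherever µ is positive.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on X = {"a", "b", "c"}; nu << mu because
# nu vanishes at "c", the only mu-null point.
MU = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(0)}
NU = {"a": Fraction(1, 4), "b": Fraction(1), "c": Fraction(0)}

def radon_nikodym(nu, mu):
    """h = dnu/dmu on a finite set: the ratio of point masses where mu > 0.

    On mu-null points h is set to 0; any other value would do, which
    reflects uniqueness only up to a set of mu-measure zero.
    """
    if any(mu[x] == 0 and nu[x] != 0 for x in mu):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in mu}

def integrate(f, mu, E):
    """Integral of f over E with respect to mu (a finite sum)."""
    return sum(f[x] * mu[x] for x in E)

h = radon_nikodym(NU, MU)

# Check nu(E) = integral of h over E for every subset E of X.
points = list(MU)
all_E = [c for r in range(len(points) + 1) for c in combinations(points, r)]
identity_holds = all(integrate(h, MU, E) == sum(NU[x] for x in E) for E in all_E)
```

If ν charged the µ-null point "c", the function would raise an error, mirroring the fact that absolute continuity is necessary for a density to exist.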
5.3.5 Remark. The theorem is false if µ is not σ-finite. For example, if µ is counting measure on B[0, 1], then, trivially, λ ≪ µ, but there is no function g such that λ(E) = ∫_E g dµ for all E ∈ F. Otherwise, taking E = {x}, we would have 0 = g(x) for all x, implying that λ[0, 1] = 0. ♦
5.3.6 Examples. In the following, (X, F, µ) is a σ-finite measure space.
(a) Let ν be a measure such that ν(E) ≤ µ(E) for all E ∈ F. Then evidently ν ≪ µ, hence (5.5) holds for all suitable f and so 0 ≤ dν/dµ ≤ 1 a.e. In particular, if µ_1 and µ_2 are σ-finite measures, then

\[ \int f \, d\mu_1 = \int f \, \frac{d\mu_1}{d(\mu_1 + \mu_2)} \, d(\mu_1 + \mu_2), \quad \text{where } 0 \le \frac{d\mu_1}{d(\mu_1 + \mu_2)} \le 1 \text{ a.e.} \]

(b) Let ν be a complex measure. Since ν ≪ |ν|, we have

\[ \nu(E) = \int_E \frac{d\nu}{d|\nu|} \, d|\nu|, \qquad E \in \mathcal{F}. \]

Recalling 5.2.2, we see that for all E ∈ F

\[ \int_E 1 \, d|\nu| = |\nu|(E) = \int_E \Bigl| \frac{d\nu}{d|\nu|} \Bigr| \, d|\nu|. \]

Therefore,

\[ \Bigl| \frac{d\nu}{d|\nu|} \Bigr| = 1, \quad |\nu| \text{ a.e.} \]

As the function Arg(z) is continuous on C \ (−∞, 0], the function

\[ \theta := \operatorname{Arg}\Bigl( \frac{d\nu}{d|\nu|} \Bigr) \]

is measurable, and we have

\[ d\nu = \frac{d\nu}{d|\nu|} \, d|\nu| = e^{i\theta} \, d|\nu|. \]

This equation is called the polar decomposition of ν.
(c) Let U and V be open subsets of R^d and let ϕ : U → V be C¹ on U with C¹ inverse ϕ⁻¹ : V → U. By (3.7)

\[ \frac{d\varphi(\lambda)}{d\lambda} = \frac{1}{|\det \varphi'| \circ \varphi^{-1}}. \]
(d) Let G be a sub-σ-field of F and let f ∈ L¹(µ). Define

\[ \mu_f(E) = \int_E f \, d\mu, \qquad E \in \mathcal{G}. \]

Then µ_f ≪ µ on G and

\[ \int_E \frac{d\mu_f}{d\mu} \, d\mu = \int_E f \, d\mu \quad \text{for all } E \in \mathcal{G}. \tag{5.6} \]

The salient point here is that dµ_f/dµ has the same integral property as f but is G-measurable, while, of course, f need not be. If µ is a probability measure, then dµ_f/dµ is called the conditional expectation of f given G, studied in detail in Chapter 18. For a concrete example, let (X, F, µ) be the product of the probability spaces (X_1, F_1, µ_1) and (X_2, F_2, µ_2) and take G = F_1 × X_2 so that

\[ \int_{E_1 \times X_2} f \, d\mu = \int_{E_1} \int_{X_2} f(x_1, x_2) \, d\mu_2(x_2) \, d\mu_1(x_1), \qquad E_1 \in \mathcal{F}_1. \]

Identifying G with F_1 and comparing with (5.6), we see that

\[ \frac{d\mu_f}{d\mu}(x_1) = \int_{X_2} f(x_1, x_2) \, d\mu_2(x_2), \]

where we have omitted the redundant argument x2 in dµf / dµ. Viewing X1 × X2 as the
set of outcomes of a two-stage experiment and taking x1 as the outcome of stage one, we
see that dµf / dµ(x1 ) is the average of f over the possible outcomes of stage two. Thus if
the σ-field F1 is interpreted as “given information,” namely as the information revealed
after the first stage, then dµf / dµ incorporates both the “known” and an average over the
“unknown.” Therefore, dµf / dµ may be interpreted as the best information regarding f that
is available after stage one but before stage two. ♦
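The two-stage interpretation in example (d) can be checked by hand on a finite product space. The following sketch assumes hypothetical stage distributions `MU1`, `MU2` and a hypothetical function `f`; it computes the stage-two average and verifies the defining identity (5.6) on the sets E_1 × X_2.

```python
from fractions import Fraction

# Hypothetical two-stage experiment: X1 = {0, 1}, X2 = {0, 1, 2}.
MU1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
MU2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def f(x1, x2):
    """An arbitrary function on X1 x X2, chosen for illustration."""
    return x1 + 10 * x2

def cond_exp(x1):
    """Average of f(x1, .) over stage two; depends on x1 only."""
    return sum(f(x1, x2) * MU2[x2] for x2 in MU2)

def integral_f(E1):
    """Integral of f over E1 x X2 with respect to the product measure."""
    return sum(f(x1, x2) * MU1[x1] * MU2[x2] for x1 in E1 for x2 in MU2)

def integral_cond_exp(E1):
    """Integral of the stage-two average over E1 x X2."""
    return sum(cond_exp(x1) * MU1[x1] for x1 in E1)

# The identity (5.6) for each set E1 x X2 in the sub-sigma-field.
identity_holds = all(integral_f(E1) == integral_cond_exp(E1)
                     for E1 in [(), (0,), (1,), (0, 1)])
```

Note that `cond_exp` ignores the second coordinate entirely, which is the finite analogue of G-measurability.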

Proof of the Radon-Nikodym theorem:


The uniqueness of h follows from 3.4. For proof of existence, we consider several cases.
Case I. µ is finite and ν is a finite measure.
Let 𝔉 denote the collection of F-measurable functions f ≥ 0 such that ∫_E f dµ ≤ ν(E) for all E ∈ F. Clearly, 0 ∈ 𝔉. Furthermore, f, g ∈ 𝔉 ⇒ f ∨ g ∈ 𝔉. Indeed, if A = {f > g} and E ∈ F, then

\[ \int_E f \vee g \, d\mu = \int_{E \cap A} f \, d\mu + \int_{E \cap A^c} g \, d\mu \le \nu(E \cap A) + \nu(E \cap A^c) = \nu(E). \]

We claim there exists an h ∈ 𝔉 such that

\[ \int h \, d\mu = \sup \Bigl\{ \int f \, d\mu : f \in \mathfrak{F} \Bigr\}. \tag{$\dagger$} \]

To see this, let s denote the supremum on the right and let f_n ∈ 𝔉 such that ∫ f_n dµ → s. Replacing f_n by f_1 ∨ · · · ∨ f_n if necessary, we may assume that f_n ↑ h for some measurable h ≥ 0. By the monotone convergence theorem,

\[ \int_E h \, d\mu = \lim_n \int_E f_n \, d\mu \le \nu(E), \qquad E \in \mathcal{F}. \]

Therefore, h ∈ 𝔉 and s = ∫ h dµ, verifying the claim.
Now define

\[ \eta(E) := \nu(E) - \int_E h \, d\mu, \qquad E \in \mathcal{F}. \]

Since h ∈ 𝔉, η(E) ≥ 0 for every E ∈ F. Therefore, η is a finite measure. The proof of the theorem for Case I will be complete once we show that η(X) = 0. Let r > 0 and let (P, P^c) be a Hahn decomposition for the signed measure η − rµ. Since (η − rµ)(E ∩ P) ≥ 0,

\[ \nu(E) = \int_E h \, d\mu + \eta(E) \ge \int_E h \, d\mu + r\mu(E \cap P) = \int_E (h + r 1_P) \, d\mu, \qquad E \in \mathcal{F}, \]

hence h + r1_P ∈ 𝔉. By (†), µ(P) = 0, hence, by absolute continuity, ν(P) = 0. Therefore, (η − rµ)(P) = η(P) = −∫_P h dµ = 0 and so

\[ 0 \le \eta(X) = r\mu(X) + (\eta - r\mu)(X) = r\mu(X) + (\eta - r\mu)(P^c) \le r\mu(X). \]

Letting r → 0 we conclude that η(X) = 0.


Case II. µ is finite and ν is a σ-finite measure.
Let {X_n} be a sequence of pairwise disjoint sets in F such that X = ⋃_n X_n and ν(X_n) < ∞. For each n, ν ≪ µ on F ∩ X_n, hence by Case I there exists a finite F ∩ X_n-measurable function h_n ≥ 0 on X_n such that ν(E) = ∫_E h_n dµ for all E ∈ F ∩ X_n. Extend h_n to an F-measurable function on X by setting h_n = 0 on X_n^c. Then h := Σ_n h_n is F-measurable, h ≥ 0, and

\[ \nu(E) = \sum_n \nu(E \cap X_n) = \sum_n \int_E h_n \, d\mu = \int_E h \, d\mu, \qquad E \in \mathcal{F}. \]

Case III. µ is finite and ν is an arbitrary measure.


Let 𝒜 be the collection of all sets A ∈ F such that ν is σ-finite on F ∩ A. Since ∅ ∈ 𝒜, 𝒜 ≠ ∅. Choose a sequence (A_n) in 𝒜 such that µ(A_n) → s := sup{µ(A) : A ∈ 𝒜}. Then B := ⋃_n A_n ∈ 𝒜, and letting n → ∞ in µ(A_n) ≤ µ(B) ≤ s we see that µ(B) = s. By Case II, applied to µ and ν on F ∩ B, there exists a nonnegative measurable function g on B such that

\[ \nu(E) = \nu(B^c \cap E) + \nu(B \cap E) = \nu(B^c \cap E) + \int_{B \cap E} g \, d\mu \quad \text{for all } E \in \mathcal{F}. \]

Extend g to a measurable function on X by setting g = ∞ on B^c. It remains to show that

\[ \nu(B^c \cap E) = \int_{B^c \cap E} g \, d\mu, \qquad E \in \mathcal{F}. \tag{$\ddagger$} \]

But if µ(B^c ∩ E) = 0, then by absolute continuity ν(B^c ∩ E) = 0, hence both sides of (‡) are zero. On the other hand, if µ(B^c ∩ E) > 0, then the right side is ∞. In this case the left side must also be ∞, since otherwise B ∪ (B^c ∩ E) would be in 𝒜, impossible because µ(B ∪ (B^c ∩ E)) = s + µ(B^c ∩ E) > s.
Case IV. µ is σ-finite and ν is an arbitrary measure.
The proof is similar to that of Case II. The details are left to the reader.
Case V. µ is σ-finite and ν is an arbitrary signed measure.
Apply Case IV to ν⁺ and ν⁻ to obtain nonnegative measurable functions h_1 and h_2 such that

\[ \nu^{+}(E) = \int_E h_1 \, d\mu \quad \text{and} \quad \nu^{-}(E) = \int_E h_2 \, d\mu, \qquad E \in \mathcal{F}. \]

Since ν⁺(X) and ν⁻(X) are not both infinite, one of the h_j is µ-integrable. Taking h := h_1 − h_2 produces the desired result.
Case VI. µ is σ-finite and ν is an arbitrary complex measure.
Apply Case V to νr and νi . The details are left to the reader.

Lebesgue-Decomposition of a Measure
The following result, a consequence of the Radon-Nikodym theorem, asserts that for
a suitable pair of measures µ and ρ, the former may be decomposed into parts that are,
respectively, absolutely continuous and singular with respect to the latter. This decomposition
will lead to an important result in the next section regarding the derivative of a Lebesgue-
Stieltjes measure on Rd .
5.3.7 Lebesgue Decomposition Theorem. Let ρ be a σ-finite measure and µ a signed (resp., complex) measure on (X, F) such that |µ| is σ-finite. Then there exist unique signed (resp., complex) measures µ_a and µ_s such that µ = µ_a + µ_s, µ_a ≪ ρ, and |µ_s| ⊥ ρ. Furthermore, if µ is a measure, then so are µ_a and µ_s.
Proof. Suppose first that µ is a measure. Consider the σ-finite measure m = ρ + µ. By 5.3.6(a) there exists a measurable function h (0 ≤ h ≤ 1) such that for all E ∈ F

\[ \mu(E) = \int_E h \, dm \quad \text{and} \quad \rho(E) = \int_E (1 - h) \, dm. \]

Define

\[ \mu_a(E) = \mu\bigl(E \cap \{h < 1\}\bigr) \quad \text{and} \quad \mu_s(E) = \mu\bigl(E \cap \{h = 1\}\bigr). \]

Clearly, µ_a + µ_s = µ. If ρ(E) = 0, then h = 1 m-a.e. and hence also µ-a.e. on E and so µ_a(E) = 0. Therefore, µ_a ≪ ρ. Since µ_s(h < 1) = 0 = ρ(h = 1), µ_s ⊥ ρ. This proves the theorem for the case µ a measure.
If µ is a signed measure, then, by the previous paragraph, there exist measures µ_{a1}, µ_{a2} and µ_{s1}, µ_{s2} such that

µ⁺ = µ_{a1} + µ_{s1}, µ_{a1} ≪ ρ, |µ_{s1}| ⊥ ρ, and µ⁻ = µ_{a2} + µ_{s2}, µ_{a2} ≪ ρ, |µ_{s2}| ⊥ ρ.

Set µ_a = µ_{a1} − µ_{a2} and µ_s = µ_{s1} − µ_{s2}. Clearly, µ_a ≪ ρ. Also, if |µ_{sj}|(E_j) = ρ(E_j^c) = 0, then |µ_s|(E_1 ∩ E_2) ≤ |µ_{s1}|(E_1 ∩ E_2) + |µ_{s2}|(E_1 ∩ E_2) = 0 and ρ((E_1 ∩ E_2)^c) = 0, so |µ_s| ⊥ ρ. A similar argument proves the complex case.
For uniqueness, assume that

µ = µ′_a + µ′_s, where µ′_a ≪ ρ and |µ′_s| ⊥ ρ.

Then µ_a − µ′_a = µ′_s − µ_s, hence the common value is both absolutely continuous and singular with respect to ρ and so must be zero (5.3.1(d)).
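The recipe in the proof (form m = ρ + µ, take h = dµ/dm, and split µ along {h < 1} and {h = 1}) can be traced on a finite set. The sketch below is only an illustration: the point masses `RHO` and `MU` and the function name are assumptions, not part of the text.

```python
from fractions import Fraction

# Hypothetical point masses on X = {"a", "b", "c"}.
RHO = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(0)}
MU = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(5)}

def lebesgue_decomposition(mu, rho):
    """Split mu into mu_a << rho and mu_s singular to rho, as in the proof.

    h is the density of mu with respect to m = rho + mu; its value on
    m-null points is irrelevant and is set to 0 here.
    """
    m = {x: mu[x] + rho[x] for x in mu}
    h = {x: (mu[x] / m[x] if m[x] != 0 else Fraction(0)) for x in mu}
    mu_a = {x: (mu[x] if h[x] < 1 else Fraction(0)) for x in mu}   # mass on {h < 1}
    mu_s = {x: (mu[x] if h[x] == 1 else Fraction(0)) for x in mu}  # mass on {h = 1}
    return h, mu_a, mu_s

h, mu_a, mu_s = lebesgue_decomposition(MU, RHO)
```

In this example h = 1 exactly at the point "c", where ρ has no mass, so the singular part lives there and the absolutely continuous part lives on the rest.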

5.3.8 Remark. The conclusion of the theorem is false if |µ| is not σ-finite. For example, take ρ = λ and let µ = counting measure on B[0, 1]. Suppose that µ = µ_a + µ_s, where µ_a ≪ λ and µ_s ⊥ λ. Then µ_s(A^c) = λ(A) = 0 for some A ∈ B[0, 1] and µ_a{x} = 0 for all x. Since µ_s{x} = µ_s{x} + µ_a{x} = µ{x} = 1, A^c = ∅. But then A = [0, 1], impossible. ♦

Exercises
5.21 Let µ and ν be finite measures with ν ≪ µ and let a > 0. Find a Hahn decomposition of ν − aµ in terms of h = dν/dµ.

5.22 Let p > 0 and define ν(E) = ∫_E x^p dλ(x), E ∈ B[1, ∞). Show that ν ≪ λ, but the limit lim_{λ(E)→0} |ν(E)| is not zero.

5.23 Let f be the (increasing, continuous) Cantor function and let ν be the probability measure on B[0, 1] with distribution function f. Show that ν ⊥ λ.

5.24 Let ν_1 and ν_2 be complex measures, µ a σ-finite measure, and c_1, c_2 ∈ C. Show that if ν_1 ≪ µ and ν_2 ≪ µ, then c_1ν_1 + c_2ν_2 ≪ µ and

\[ \frac{d(c_1\nu_1 + c_2\nu_2)}{d\mu} = c_1 \frac{d\nu_1}{d\mu} + c_2 \frac{d\nu_2}{d\mu}. \]

5.25 Let µ_1 and µ_2 be signed measures. Find a Hahn decomposition for µ_1 + µ_2. ⟦Consider Radon-Nikodym derivatives.⟧

5.26 Let µ be a σ-finite measure and ν a signed or complex measure with ν ≪ µ. Show that

\[ \frac{d|\nu|}{d\mu} = \Bigl| \frac{d\nu}{d\mu} \Bigr|. \]

5.27 Let µ_j be σ-finite measures with µ_1 ≪ µ_2 and µ_2 ≪ µ_3. Prove:

\[ \frac{d\mu_1}{d\mu_3} = \frac{d\mu_1}{d\mu_2} \, \frac{d\mu_2}{d\mu_3}, \quad \mu_3\text{-a.e.} \]

5.28 Let µ and ν be σ-finite measures with ν ≪ µ. Show that

\[ \frac{d\nu}{d(\mu + \nu)} = \frac{d\nu}{d\mu} \Bigl( 1 + \frac{d\nu}{d\mu} \Bigr)^{-1}. \]

5.29 [↑ 5.13] Let µ be a measure and let µ_1 and µ_2 be finite signed measures with µ_j ≪ µ. Show that (µ_1 ∨ µ_2) ≪ µ and

\[ \frac{d(\mu_1 \vee \mu_2)}{d\mu} = \frac{d\mu_1}{d\mu} \vee \frac{d\mu_2}{d\mu}. \]

Show conversely that if (µ_1 ∨ µ_2) ≪ µ, then µ_1 ≪ µ and µ_2 ≪ µ. Formulate and prove the analogous assertions for µ_1 ∧ µ_2.

5.30 Let µ_1 and µ_2 be finite measures. Show that µ_1 ⊥ µ_2 iff µ_1 ∧ µ_2 = 0 iff µ_1 ∨ µ_2 = µ_1 + µ_2. ⟦In one direction use Ex. 5.8.⟧

5.31 Two σ-finite measures µ and ν are said to be equivalent if µ ≪ ν and ν ≪ µ, that is, µ and ν have the same sets of measure zero.
(a) Show that µ and ν are equivalent iff there exists a finite, positive, measurable function h such that ν = hµ.
(b) Show that every σ-finite measure µ is equivalent to some probability measure ν. ⟦Consider an infinite series of measures.⟧

5.32 For j = 1, 2, let µ_j and ν_j be nontrivial σ-finite measures on (X_j, F_j). Show that ν_1 ⊗ ν_2 ≪ µ_1 ⊗ µ_2 iff ν_1 ≪ µ_1 and ν_2 ≪ µ_2, in which case

\[ \frac{d(\nu_1 \otimes \nu_2)}{d(\mu_1 \otimes \mu_2)}(x_1, x_2) = \frac{d\nu_1}{d\mu_1}(x_1) \, \frac{d\nu_2}{d\mu_2}(x_2). \]

5.33 Let T : (X, F) → (Y, G) be measurable, µ a σ-finite measure and ν a complex or signed measure on X such that ν ≪ µ. (a) Show that T(ν) ≪ T(µ). (b) Suppose also that T⁻¹ : (Y, G) → (X, F) exists and is measurable. Prove that

\[ \frac{d\,T(\nu)}{d\,T(\mu)} = \frac{d\nu}{d\mu} \circ T^{-1}. \]

5.34 Let ν be a σ-finite measure.

(a) Let µ_n, µ be measures such that µ_n ↑ µ and µ ≪ ν. Show that

\[ \frac{d\mu_n}{d\nu} \uparrow \frac{d\mu}{d\nu}, \quad \nu\text{-a.e.} \]

In particular, dµ_n/dµ ↑ 1.

(b) Let η_n be measures, µ := Σ_{n=1}^{∞} η_n and µ ≪ ν. Show that

\[ \frac{d\mu}{d\nu} = \sum_{n=1}^{\infty} \frac{d\eta_n}{d\nu} \quad \nu\text{-a.e.} \]

5.35 Let µ, µ_n be probability measures.

(a) Show that there exists a probability measure ν such that µ ≪ ν and µ_n ≪ ν for all n. (Consider an infinite series of measures.)
(b) Suppose that |µ − µ_n|(X) → 0. Let ν be any finite measure such that µ ≪ ν and µ_n ≪ ν for all n. Show that

\[ \frac{d\mu_n}{d\nu} \xrightarrow{\ \nu\ } \frac{d\mu}{d\nu}. \]

⟦Let h_n = dµ_n/dν − dµ/dν. Consider A_n := {|h_n| ≥ ε}, B_n := {h_n ≥ ε}, C_n := {h_n ≤ −ε}.⟧

5.36 Let µ, ν, ν_n be measures with µ σ-finite and ν_n(E) ↑ ν(E) for every E ∈ F. Show that ν ≪ µ iff ν_n ≪ µ for all n, in which case

\[ \frac{d\nu}{d\mu} = \lim_n \frac{d\nu_n}{d\mu} \quad \mu\text{-a.e.} \]

5.37 [↓ 5.5.9] Let µ and µ_n be finite measures with µ = Σ_n µ_n and let µ_n = µ_{na} + µ_{ns} and µ = µ_a + µ_s be the Lebesgue decompositions with respect to a σ-finite measure ρ. Show that

\[ \mu_a = \sum_n \mu_{na} \quad \text{and} \quad \mu_s = \sum_n \mu_{ns}. \]

5.4 Differentiation of Measures


In this section we show that a Radon-Nikodym derivative on Rd can be expressed as a
limit of difference quotients, thus establishing a connection with the classical derivative.

Definition and Properties of the Derivative

Let µ be a signed measure on B(R^d) which is finite on bounded sets. We shall call such a measure a Lebesgue-Stieltjes signed measure. For each x ∈ R^d and r > 0, let ℬ(x, r) denote the collection of all open balls containing x and with radius less than r. Define

\[ \overline{D}(\mu; x, r) := \sup \Bigl\{ \frac{\mu(B)}{\lambda(B)} : B \in \mathcal{B}(x, r) \Bigr\} \quad \text{and} \quad \underline{D}(\mu; x, r) := \inf \Bigl\{ \frac{\mu(B)}{\lambda(B)} : B \in \mathcal{B}(x, r) \Bigr\}, \]

where for simplicity of notation we set λ = λ^d. Note that for fixed x, the functions D̄(µ; x, r) and D̲(µ; x, r) decrease and increase, respectively, as r ↓ 0. Moreover, for each c and r the sets {x : D̄(µ; x, r) > c} and {x : D̲(µ; x, r) < c} are open (Ex. 5.38). Thus D̄(µ; x, r) and D̲(µ; x, r) are Borel measurable in x for fixed r.
Now define the upper and lower derivates D̄µ and D̲µ of µ by

\[ \overline{D}\mu(x) := \lim_{r \to 0^+} \overline{D}(\mu; x, r) = \inf_{r > 0} \overline{D}(\mu; x, r), \qquad \underline{D}\mu(x) := \lim_{r \to 0^+} \underline{D}(\mu; x, r) = \sup_{r > 0} \underline{D}(\mu; x, r). \]

Then

\[ \underline{D}(\mu; x, r) \uparrow \underline{D}\mu(x) \quad \text{and} \quad \overline{D}(\mu; x, r) \downarrow \overline{D}\mu(x) \quad \text{as } r \downarrow 0, \]

so by the preceding observations the functions D̄µ and D̲µ are Borel measurable. If D̄µ(x) and D̲µ(x) are finite and equal, then µ is said to be differentiable at x. In this case, the common value is denoted by Dµ(x) and is called the derivative of µ at x. Note that the inequalities

\[ \underline{D}(\mu; x, r) \le \frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \overline{D}(\mu; x, r) \]

imply that

\[ \underline{D}\mu(x) \le \liminf_{r \to 0} \frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \limsup_{r \to 0} \frac{\mu(B(x, r))}{\lambda(B(x, r))} \le \overline{D}\mu(x). \]

It follows that if µ is differentiable at x, then

\[ D\mu(x) = \lim_{r \to 0} \frac{\mu(B(x, r))}{\lambda(B(x, r))}. \tag{5.7} \]
Here is a related sequential characterization of differentiability:


5.4.1 Proposition. A Lebesgue-Stieltjes signed measure µ is differentiable at x ∈ R^d iff there exists a real number a such that for any sequence of open balls B_n ∈ ℬ(x, r_n) with r_n → 0,

\[ \lim_n \frac{\mu(B_n)}{\lambda(B_n)} = a. \]

In this case, a = Dµ(x).

Proof. The inequality D̲(µ; x, r) ≤ µ(B_n)/λ(B_n) ≤ D̄(µ; x, r) (r_n ≤ r) implies that

\[ \underline{D}(\mu; x, r) \le \liminf_n \frac{\mu(B_n)}{\lambda(B_n)} \le \limsup_n \frac{\mu(B_n)}{\lambda(B_n)} \le \overline{D}(\mu; x, r). \]

Letting r → 0, we obtain

\[ \underline{D}\mu(x) \le \liminf_n \frac{\mu(B_n)}{\lambda(B_n)} \le \limsup_n \frac{\mu(B_n)}{\lambda(B_n)} \le \overline{D}\mu(x). \]

Therefore, if µ is differentiable at x, then lim_n µ(B_n)/λ(B_n) = Dµ(x).

Conversely, suppose that µ is not differentiable at x, so D̲µ(x) < D̄µ(x). Let t_n ↑ D̄µ(x). For each n, D̄(µ; x, r) > t_n for all sufficiently small r, hence for each n ∈ N we may choose a_n < 1/n and A_n ∈ ℬ(x, a_n) such that t_n < µ(A_n)/λ(A_n) ≤ D̄(µ; x, a_n). Thus lim_n µ(A_n)/λ(A_n) = D̄µ(x). Similarly, there exist b_n < 1/n and B_n ∈ ℬ(x, b_n) such that lim_n µ(B_n)/λ(B_n) = D̲µ(x). Since these limits are unequal, the sequential criterion fails.
5.4.2 Corollary. The differential operator D is linear: If Dµ(x) and Dν(x) exist and
a, b ∈ R, then D(aµ + bν)(x) exists and D(aµ + bν)(x) = aD(µ)(x) + bD(ν)(x).

Connections with the Classical Derivative


Equation (5.7) expresses the derivative of a measure as a limit of quotients, analogous to
the definition of the classical derivative of a function of a real variable. This connection is
crystallized in Proposition 5.4.4 below. For the proof we need the following lemma.
5.4.3 Lemma. Let µ be a Lebesgue-Stieltjes signed measure on R and let F be a function on R such that µ(a, b] = F(b) − F(a) for all a < b. Then at each continuity point x of F,

\[ \underline{D}\mu(x) = \lim_{r \to 0^+} \inf_{0 < |h| \le r} \frac{F(x + h) - F(x)}{h} \quad \text{and} \quad \overline{D}\mu(x) = \lim_{r \to 0^+} \sup_{0 < |h| \le r} \frac{F(x + h) - F(x)}{h}. \]

Proof. Note first that F has at most countably many discontinuities. This follows from
µ = µ+ − µ− , allowing us to write F (b) − F (a) = [F+ (b) − F+ (a)] − [F− (b) − F− (a)], where
F± are distribution functions, hence nondecreasing, and so have at most countably many
discontinuities (Ex. 1.84).
We prove only the equality for D̲µ(x). (The other equality follows by considering −µ.) Define

\[ f(r) := \inf_{0 < |h| \le r} \frac{F(x + h) - F(x)}{h} \quad \text{and} \quad f(0+) := \lim_{r \to 0^+} f(r). \]

We show first that f(0+) ≥ D̲µ(x). Suppose for a contradiction that f(0+) < D̲µ(x). Let f(0+) < a < D̲µ(x). Since f(1/n) ↑ f(0+) < a, we may choose a sequence h_n → 0 such that

\[ \frac{F(x + h_n) - F(x)}{h_n} < a. \]

By considering subsequences, it suffices to consider two cases: (a) h_n ↓ 0 and (b) h_n ↑ 0.
Suppose that (a) holds. Since F is continuous at x and right continuous at x + h_n, for each n we may choose 0 < t_n < 1/n such that

\[ \frac{F(x + (1 + t_n)h_n) - F(x - t_n h_n)}{h_n} < a. \]

Since F has at most countably many discontinuities, we may take t_n so that F is continuous at x + (1 + t_n)h_n. Setting B_n = (x − t_n h_n, x + (1 + t_n)h_n) and r_n = h_n(1 + 2t_n)/2 (the radius of the interval B_n), we then have

\[ \underline{D}(\mu; x, r_n) \le \frac{\mu(B_n)}{\lambda(B_n)} = \frac{F(x + (1 + t_n)h_n) - F(x - t_n h_n)}{h_n(1 + 2t_n)} < \frac{a}{1 + 2t_n}. \]

Letting n → ∞ produces the contradiction a < D̲µ(x) ≤ a.


Now assume that (b) holds. Let k_n = −h_n, so k_n > 0 and

\[ \frac{F(x) - F(x - k_n)}{k_n} < a. \]

By right continuity at x, there exists 0 < t_n < 1/n such that

\[ \frac{F(x + t_n k_n) - F(x - k_n)}{k_n} < a. \]

Setting B_n = (x − k_n, x + t_n k_n) and r_n = k_n(1 + t_n)/2 we then have

\[ \underline{D}(\mu; x, r_n) \le \frac{\mu(B_n)}{\lambda(B_n)} = \frac{F(x + t_n k_n) - F(x - k_n)}{k_n(1 + t_n)} < \frac{a}{1 + t_n}. \]

Letting n → ∞ gives the contradiction a < D̲µ(x) ≤ a.

We have shown that f(0+) ≥ D̲µ(x). To show equality, let b > D̲µ(x) and choose r_n, s_n ↓ 0 such that for B_n := (x − s_n, x + r_n)

\[ \frac{F(x + r_n) - F(x - s_n)}{r_n + s_n} = \frac{\mu(B_n)}{\lambda(B_n)} < b. \tag{$\dagger$} \]

Setting q_n := r_n + s_n, we may write the left side as

\[ \frac{F(x + r_n) - F(x)}{r_n} \, \frac{r_n}{q_n} + \frac{F(x - s_n) - F(x)}{-s_n} \, \frac{s_n}{q_n} \ge \min\{f(r_n), f(s_n)\}. \tag{$\ddagger$} \]

Letting n → ∞ we see from (†) and (‡) that b ≥ f(0+). Since b > D̲µ(x) was arbitrary, D̲µ(x) ≥ f(0+).
Here is the promised connection between differentiability of measures and differentiability
of functions.
5.4.4 Theorem. Let µ be a Lebesgue-Stieltjes signed measure on R and F a function on R such that µ(a, b] = F(b) − F(a) for all a < b. Then µ is differentiable at x ∈ R iff F is differentiable at x. In this case, Dµ(x) = F′(x).
Proof. If µ is differentiable at x, then F is continuous at x (Ex. 5.39), hence the expressions in the preceding lemma are all equal. This implies that F is differentiable at x with derivative Dµ(x). The converse is similar.
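The theorem can be watched numerically. The sketch below takes the hypothetical distribution function F(x) = x³ (an assumption made only for this example) and evaluates the quotients µ(B)/λ(B) over off-centre open intervals B containing x whose lengths shrink to 0, in the spirit of Proposition 5.4.1. The quotients approach F′(x) = 3x².

```python
def F(x):
    """Hypothetical distribution function; F'(x) = 3 x^2."""
    return x ** 3

def quotient(x, s, t):
    """mu(B)/lambda(B) for B = (x - s, x + t), where mu(a, b] = F(b) - F(a).

    F is continuous, so mu(B) = F(x + t) - F(x - s), and lambda(B) = s + t.
    """
    return (F(x + t) - F(x - s)) / (s + t)

x = 2.0
scales = (1e-1, 1e-2, 1e-3, 1e-4)

# Off-centre intervals (x - 0.3 r, x + 1.7 r) containing x, with r -> 0.
ratios = [quotient(x, 0.3 * r, 1.7 * r) for r in scales]
errors = [abs(q - 3 * x ** 2) for q in ratios]
```

The intervals are deliberately not centred at x, since the definition of the derivates uses all open balls containing x, not only those centred there.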

Existence of the Measure Derivative


The following theorem shows that Dµ exists λ-a.e. and is in fact a Radon-Nikodym
derivative, where as before we set λ = λd .
5.4.5 Theorem. Let µ be a Lebesgue-Stieltjes signed measure on B(R^d) with Lebesgue decomposition µ = µ_a + µ_s, where µ_a ≪ λ and |µ_s| ⊥ λ. Then

\[ D\mu(x) = \frac{d\mu_a}{d\lambda}(x) = D\mu_a(x) \quad \text{and} \quad D\mu_s(x) = 0 \quad \lambda\text{-a.e.} \]

In particular, Dµ(x) exists and is finite λ-a.e.
Proof. It is enough to prove the first equality. The proof consists of several steps, the first of
which is called the Vitali covering lemma.
(1) Let {B_1, . . . , B_n} be a collection of open balls in R^d. Then there is a subcollection of disjoint balls {B_{k_1}, . . . , B_{k_m}} such that

\[ \lambda\Bigl( \bigcup_{i=1}^{n} B_i \Bigr) \le 3^d \sum_{i=1}^{m} \lambda(B_{k_i}). \]

⟦Choose the notation so that the radius of B_i decreases as i increases. Let k_1 = 1, and successively choose k_i ∈ N such that k_{i+1} is the smallest index j > k_i for which B_j is disjoint from B_{k_1} ∪ · · · ∪ B_{k_i}. Let k_m be the index for which the process stops, so that the collection {B_{k_1}, . . . , B_{k_m}} is disjoint. By choice of k_m, if j > k_m or k_i < j < k_{i+1}, then B_j ∩ B_{k_q} ≠ ∅ for some q with k_q < j. Thus for each j = 1, . . . , n there exists k_q ≤ j such that B_j ∩ B_{k_q} ≠ ∅. Now let A_{k_q} be the ball with the same center as B_{k_q} and with triple the radius. Since j ≥ k_q, the radius of B_j is no larger than that of B_{k_q}, hence B_j ⊆ A_{k_q}. Therefore ⋃_{j=1}^{n} B_j ⊆ ⋃_{i=1}^{m} A_{k_i}, and the desired inequality follows from the dilation property of λ.⟧

FIGURE 5.1: Construction of A_{k_q}.

(2) Let µ be nonnegative, c > 0, and K ⊆ {D̄µ > c} compact. Then cλ(K) ≤ 3^d µ(K).
⟦Let r > 0. For each x ∈ K, choose B ∈ ℬ(x, r) such that µ(B)/λ(B) > c. By compactness, there exists a finite subcover B_1, . . . , B_n of K of such balls. Choose {B_{k_1}, . . . , B_{k_m}} as in step (1). Then

\[ \lambda(K) \le 3^d \sum_{i=1}^{m} \lambda(B_{k_i}) \le \frac{3^d}{c} \sum_{i=1}^{m} \mu(B_{k_i}) = \frac{3^d}{c} \, \mu\Bigl( \bigcup_{i=1}^{m} B_{k_i} \Bigr). \]

Since B_{k_i} has radius < r and meets K it must be contained in U_r := {x : d(x, K) < 2r}. Therefore, λ(K) ≤ (3^d/c)µ(U_r). Letting r ↓ 0 yields λ(K) ≤ (3^d/c)µ(K).⟧
(3) If µ is nonnegative and µ(E) = 0, then Dµ(x) = 0 for λ-a.a. x ∈ E.
⟦Let c > 0 and B := E ∩ {D̄µ > c}. We show that λ(B) = 0. By regularity, it suffices to show that λ(K) = 0 for any compact K ⊆ B. But this follows from step (2), since λ(K) ≤ (3^d/c)µ(K) ≤ (3^d/c)µ(E) = 0.⟧

(4) If |µ| ⊥ λ, then Dµ(x) = 0 for λ-a.a. x ∈ R^d.

⟦Let E ∈ B(R^d) such that λ(E^c) = |µ|(E) = 0. Then also µ⁺(E) = 0, so by step (3), Dµ⁺(x) = 0 for λ-a.a. x ∈ E, hence for λ-a.a. x ∈ R^d. Similarly Dµ⁻(x) = 0 for λ-a.a. x ∈ R^d. Therefore, Dµ = Dµ⁺ − Dµ⁻ = 0 λ-a.e.⟧
(5) If µ ≪ λ, then Dµ = dµ/dλ and is finite λ-a.e.
⟦Since µ is finite on bounded sets, h := dµ/dλ is finite λ-a.e. For the desired equality it suffices to show that D̄µ ≤ h λ-a.e.; the reverse inequality will follow by considering −µ. It therefore must be shown that λ{h < t < D̄µ} = 0 for all t ∈ R. To this end define a measure ρ on B(R^d) by

\[ \rho(A) = \int_{A \cap \{h \ge t\}} (h - t) \, d\lambda. \]

For any open ball B,

\[ \mu(B) - t\lambda(B) = \int_B (h - t) \, d\lambda \le \int_{B \cap \{h \ge t\}} (h - t) \, d\lambda = \rho(B), \]

hence

\[ \frac{\mu(B)}{\lambda(B)} \le \frac{\rho(B)}{\lambda(B)} + t. \]

Since B was arbitrary, D̄µ ≤ D̄ρ + t. Set E := {h < t}. Then ρ(E) = 0, so by step (3) applied to ρ, Dρ(x) = 0 for λ-a.a. x ∈ E. Therefore, D̄µ(x) ≤ t for λ-a.a. x ∈ E, as required.⟧
The desired equality now follows by applying step (4) to µ_s and step (5) to µ_a.
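The greedy selection in step (1) of the proof is easy to run in dimension d = 1, where balls are open intervals and the constant 3^d is 3. The following is a minimal sketch under that assumption; the sample intervals in `balls` and the function names are hypothetical.

```python
def vitali_select(balls):
    """Greedy selection from step (1) for open intervals in R (d = 1).

    `balls` is a list of (center, radius) pairs. Scanning in order of
    decreasing radius, keep each interval that is disjoint from all
    intervals kept so far.
    """
    chosen = []
    for c, r in sorted(balls, key=lambda b: -b[1]):
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are disjoint
        # exactly when their centers are at least r + r2 apart.
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def union_length(balls):
    """Lebesgue measure of the union of the open intervals."""
    total, right_end = 0.0, float("-inf")
    for a, b in sorted((c - r, c + r) for c, r in balls):
        if b > right_end:
            total += b - max(a, right_end)
            right_end = b
    return total

# Hypothetical sample data.
balls = [(0.0, 1.0), (0.5, 0.8), (1.9, 0.5), (3.0, 0.7), (3.2, 0.2), (-1.5, 0.6)]
chosen = vitali_select(balls)
lhs = union_length(balls)                    # lambda of the union of all balls
rhs = 3 * sum(2 * r for _, r in chosen)      # 3^d times total length of chosen balls
```

Every rejected interval meets a chosen interval of at least its own radius, so tripling the chosen intervals covers the whole union; this is the containment B_j ⊆ A_{k_q} from the proof.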
5.4.6 Corollary. Let µ and F be as in 5.4.4. Then F′ = dµ_a/dλ λ-a.e.

Exercises
5.38 Let µ be a Lebesgue-Stieltjes signed measure on B(R^d). Show that the sets {x : D̄(µ; x, r) > c} and {x : D̲(µ; x, r) < c} are open.

5.39 Let µ be a Lebesgue-Stieltjes signed measure on B(R^d). Prove that if µ is differentiable at x, then µ{x} = 0. In particular, if d = 1 and µ has distribution function F, then F is continuous at x.

5.40 Prove that if µ is a nonnegative measure, then ∫_E (Dµ) dλ ≤ µ(E) for all E ∈ B(R^d).

5.41 Define the metric density of E ∈ B(R^d) by

\[ d_E(x) = \lim_{r \to 0} \frac{\lambda\bigl(E \cap B(x, r)\bigr)}{\lambda\bigl(B(x, r)\bigr)}. \]

Show that d_E exists and equals 1_E λ-a.e.

5.42 Let f be locally integrable on R^d. Verify (a)–(c) to prove the Lebesgue differentiation theorem: For λ-a.e. x,

\[ \lim_{r \to 0} \frac{1}{\lambda\bigl(B(x, r)\bigr)} \int_{B(x, r)} |f(y) - f(x)| \, dy = 0. \]

Points x for which this holds are called Lebesgue points.

(a) \[ \lim_{r \to 0} \frac{1}{\lambda\bigl(B(x, r)\bigr)} \int_{B(x, r)} f(y) \, dy = f(x). \]

(b) Let a ∈ R. Then there exists N_a with λ(N_a) = 0 such that

\[ \lim_{r \to 0} \frac{1}{\lambda\bigl(B(x, r)\bigr)} \int_{B(x, r)} |f(y) - a| \, dy = |f(x) - a|, \qquad x \in N_a^c. \]

(c) Set N = ⋃_{a∈Q} N_a. Let ε > 0, x ∈ N^c and choose a ∈ Q such that |f(x) − a| < ε. There exists δ > 0 such that for all r < δ,

\[ \frac{1}{\lambda\bigl(B(x, r)\bigr)} \int_{B(x, r)} |f(y) - f(x)| \, dy < 2\varepsilon. \]

5.43 [↑ 5.42] For each x ∈ R^d, let {E(x, r) : r > 0} be a collection in B(R^d) such that E(x, r) ⊆ B(x, r) and λ(E(x, r)) ≥ cλ(B(x, r)) for all r, where c > 0 is a constant that does not depend on r. (Such a collection is said to shrink to x nicely. For example, the collection {(x, x + r) : r > 0} shrinks to x nicely.) Let µ be a measure with µ ≪ λ. Prove that for λ-a.a. x,

\[ D\mu(x) = \lim_{r \to 0} \frac{\mu\bigl(E(x, r)\bigr)}{\lambda\bigl(E(x, r)\bigr)}. \]

5.5 Functions of Bounded Variation


Definition and Basic Properties
Let I ⊆ R be an interval and f : I → C. For an ordered subset P = {x_0 < x_1 < · · · < x_n} of I, define the P-variation of f on I by

\[ V_{I,P}(f) = V_P(f) = \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})|. \tag{5.8} \]

Note that if Q ⊇ P, then, by the triangle inequality, V_{I,Q}(f) ≥ V_{I,P}(f). The total variation of f on I is the extended real number

\[ V_I(f) := \sup_{P} V_{I,P}(f), \tag{5.9} \]

where the supremum is taken over all ordered subsets P of I. We say that f has bounded variation on I if V_I(f) < ∞. We shall mainly be concerned with the cases I = R and I = [a, b]. For the latter, we may assume in (5.9) that P is a partition of [a, b], as the supremum does not change by adjoining the points a and b. We denote the set of all functions with bounded variation on I by BV(I).
By the mean value theorem, a real-valued function with a bounded derivative has bounded
variation on bounded intervals. In particular, sin x has bounded variation on any bounded
interval (but not on R: consider partition points (2k + 1)π/2).
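The P-variation in (5.8) is a finite sum and is simple to compute. The sketch below (function names are illustrative assumptions) shows that refining a partition can only increase the variation, and that for sin x the partition points (2k + 1)π/2 give variation 2n, which is unbounded in n.

```python
import math

def variation(f, points):
    """V_P(f): sum of |f(x_j) - f(x_{j-1})| over the ordered points P."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

def odd_multiples(n):
    """The points (2k + 1) * pi / 2 for k = 0, ..., n."""
    return [(2 * k + 1) * math.pi / 2 for k in range(n + 1)]

coarse = [0.0, math.pi]                      # misses the peak of sin on [0, pi]
fine = [0.0, math.pi / 2, math.pi]           # refinement that captures the peak

v_coarse = variation(math.sin, coarse)
v_fine = variation(math.sin, fine)
v_long = variation(math.sin, odd_multiples(50))
```

Each consecutive pair of points in `odd_multiples(n)` contributes |1 − (−1)| = 2, so the variation over these n + 1 points is 2n, which is why sin x fails to have bounded variation on R.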
The following proposition summarizes the elementary properties of BV (I). The proof is
left as an exercise for the reader (5.44).
5.5.1 Proposition. Let I be any interval.
(a) A bounded, monotone function f on R has bounded variation.

(b) If f ∈ BV (I), then f is bounded.

(c) f ∈ BV (I) iff Re(f ), Im(f ) ∈ BV (I).

(d) If f is real-valued, then f ∈ BV(I) iff $f^{\pm}$ ∈ BV(I).

(e) If f, g ∈ BV(I) and c ∈ C, then f + g, cf, $\bar f$, fg, |f| ∈ BV(I).

By the proposition, the difference of two bounded monotone functions on I has bounded
variation on I. The converse also holds:
5.5.2 Proposition. If f ∈ BV (I) is real-valued, then there exist nondecreasing functions g
and h on I such that f = g − h. In particular, f is a Borel function.
Proof. For definiteness, we take I = R. For x ∈ R, define

g(x) := V(−∞,x] (f ) and h(x) := g(x) − f (x),

so that f = g − h. Clearly, g is increasing. To see that h is increasing, let a < x < y, let Px
be an arbitrary partition of [a, x], and set Py := Px ∪ {y}. Then

VPx (f ) + f (y) − f (x) ≤ VPx (f ) + |f (y) − f (x)| = VPy (f ) ≤ g(y).

Taking the supremum over all partitions Px yields V[a,x] (f ) + f (y) − f (x) ≤ g(y). Since a
was arbitrary, g(x) + f (y) − f (x) ≤ g(y), that is, h(x) ≤ h(y).
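A discrete analogue of this construction may help fix ideas. The sketch below is an illustration under simplifying assumptions: f is sampled on a finite grid, and the running variation is accumulated from the first grid point rather than from −∞. The names `jordan_decomposition` and `is_nondecreasing` are hypothetical helpers, not notation from the text.

```python
import math
from itertools import accumulate

def jordan_decomposition(values):
    """Discrete analogue of f = g - h, with g the running variation of the samples."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    g = [0.0] + list(accumulate(steps))        # g grows by |f(x_j) - f(x_{j-1})| at each step
    h = [gv - fv for gv, fv in zip(g, values)]  # h := g - f
    return g, h

def is_nondecreasing(seq, tol=1e-12):
    """Check monotonicity up to a small floating-point tolerance."""
    return all(b >= a - tol for a, b in zip(seq, seq[1:]))

xs = [i / 1000 * 4 * math.pi for i in range(1001)]
f_vals = [math.sin(x) for x in xs]
g, h = jordan_decomposition(f_vals)
print(is_nondecreasing(g), is_nondecreasing(h))
```

On the grid, each increment of h equals |Δf| − Δf ≥ 0, which is the discrete counterpart of the inequality h(x) ≤ h(y) established in the proof.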
Since monotone functions have at most countably many discontinuities (Ex. 1.84), we
have
5.5.3 Corollary. If f ∈ BV (R), then f has at most countably many discontinuities.

The Total Variation Function


Let f ∈ BV(R). The function $x \mapsto V_{(-\infty,x]}(f)$ in the proof of 5.5.2 is called the total
variation function of f and is denoted by $T_f$:

\[ T_f(x) := V_{(-\infty,x]}(f), \qquad x \in \mathbb{R}. \]

Clearly, Tf is increasing, hence has bounded variation on any bounded interval. The theorem
below makes a connection between the total variation function and the total variation
measure of a complex measure. For the proof we need the following lemmas.

5.5.4 Lemma. Let f ∈ BV (R). Then for x < y, Tf (y) − Tf (x) = V(x,y] (f ).
Proof. Set T := Tf . Note first that for the sets P = {x0 < x1 < · · · < xn = y} ⊆ (−∞, y]
implicit in the definition of T (y), we may assume that x0 ≤ x, otherwise simply adjoin a
suitable point to P, increasing the P-variation of f but not altering T (y). Choose k so that
xk ≤ x < xk+1 and set

Q = {x0 < x1 < · · · < xk } and R = {xk+1 < · · · < xn = y}.

Then VP (f ) ≤ VQ (f ) + VR (f ) ≤ T (x) + V(x,y] (f ). Taking the supremum over all P yields


T (y) ≤ T (x) + V(x,y] (f ). For the reverse inequality, for arbitrary Q and R as above we have
VQ + VR ≤ T (y). Taking the supremum over Q and R yields T (x) + V(x,y] (f ) ≤ T (y).

5.5.5 Lemma. Let f ∈ BV (R). If f is right continuous at x, then so is Tf .


Proof. Set $T := T_f$ and let y > x. By right continuity at x, given ε > 0 we may choose
δ > 0 so that 0 < y − x < δ ⇒ |f(x) − f(y)| < ε. For such a fixed y, choose an ordered set
$P = \{x_0 < x_1 < \cdots < x_n = y\} \subseteq (x, y]$ such that

\[ \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| \ge V_{(x,y]}(f) - \varepsilon = T(y) - T(x) - \varepsilon. \]

Since $|f(x_1) - f(x_0)| \le |f(x_1) - f(x)| + |f(x_0) - f(x)| < 2\varepsilon$, we have

\[ \sum_{j=2}^{n} |f(x_j) - f(x_{j-1})| = \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| - |f(x_1) - f(x_0)| \ge T(y) - T(x) - 3\varepsilon. \tag{†} \]

Next, choose $Q = \{t_0 < t_1 < \cdots < t_m = x_1\} \subseteq (x, x_1]$ such that

\[ \sum_{j=1}^{m} |f(t_j) - f(t_{j-1})| \ge V_{(x,x_1]}(f) - \varepsilon = T(x_1) - T(x) - \varepsilon. \tag{‡} \]

Adding (†) and (‡), we have

\[ \sum_{j=1}^{m} |f(t_j) - f(t_{j-1})| + \sum_{j=2}^{n} |f(x_j) - f(x_{j-1})| \ge T(y) + T(x_1) - 2T(x) - 4\varepsilon. \]

Since the left side is $\le V_{(x,y]}(f) = T(y) - T(x)$, we see that $T(x_1) - T(x) \le 4\varepsilon$. Letting $x_1 \downarrow x$
yields T(x+) − T(x) ≤ 4ε. Therefore, T(x+) = T(x), as required.

5.5.6 Corollary. Let f ∈ BV (R) be right continuous. Then f = g − h, where g and h are
distribution functions.

Proof. In the proof of 5.5.2, g is the function Tf .


5.5.7 Theorem. Let µ be a complex measure on B(R) and set f (x) := µ(−∞, x]. Then
f ∈ BV (R) and Tf is a distribution function for |µ|.
Proof. That f ∈ BV(R) follows from

\[ \sum_{j=1}^{n} |\mu(x_j, x_{j+1}]| \le \sum_{j=1}^{n} |\mu|(x_j, x_{j+1}] \le |\mu|(\mathbb{R}) < \infty. \]

Since f is right continuous, $T := T_f$ is a distribution function by 5.5.5. By the measure
uniqueness theorem (1.6.8), it remains to show $\mu_T(a,b] = |\mu|(a,b]$ for all a < b. For the
inequality $\mu_T(a,b] \le |\mu|(a,b]$ we have

\begin{align*}
\mu_T(a,b] = V_{(a,b]}(f) &= \sup\Big\{ \sum_j |\mu(x_j, x_{j+1}]| : a < x_1 < \cdots < x_n = b \Big\} \\
&\le \sup\Big\{ \sum_j |\mu(E_j)| : E_1, \ldots, E_k \text{ a measurable partition of } (a,b] \Big\} \\
&= |\mu|(a,b].
\end{align*}

For the reverse inequality, let $E_1, \ldots, E_k$ be a measurable partition of (a, b]. Given ε > 0,
by regularity of |µ| (1.8.1) there exist compact sets $K_j \subseteq E_j$ such that $|\mu|(E_j \setminus K_j) < \varepsilon/k$,
hence

\[ \sum_j |\mu(E_j)| \le \varepsilon + \sum_j |\mu(K_j)|. \tag{†} \]

Since the sets $K_j$ are disjoint, there exist disjoint open sets $U_j \supseteq K_j$. Each $U_j$ is a countable
union of disjoint open intervals $(a_{jn}, b_{jn})$, hence

\[ \sum_j |\mu(K_j)| \le \sum_j \sum_n \big|\mu\big((a_{jn}, b_{jn}] \cap (a,b]\big)\big|. \]

The partial sums of the double sum on the right are terms of $V_{(a,b],P}(f)$ for suitable partitions
P and hence are $\le \mu_T(a,b]$. Therefore, by (†), $\sum_j |\mu(E_j)| \le \varepsilon + \mu_T(a,b]$. Since $E_1, \ldots, E_k$
and ε were arbitrary, $|\mu|(a,b] \le \mu_T(a,b]$.

Differentiation of Functions of Bounded Variation


We may now prove the following fundamental property of functions of bounded variation.
5.5.8 Theorem. If f ∈ BV(R), then f′(x) exists for λ-a.a. x ∈ R. Moreover, if g(x) =
f(x+) (= $\lim_{t\to x^+} f(t)$), then g′ = f′ a.e.
Proof. By decomposing f as in 5.5.2, we may assume that f is real-valued and nondecreasing.
Then g has these properties, and since g is right continuous it is a distribution function and
so is differentiable λ-a.e. (5.4.6). Moreover, g(x) = f(x) except possibly at a sequence of
points $x_n$, the points at which f is not continuous. Set h(x) := g(x) − f(x). The theorem
will be proved if we show that h′ = 0 a.e.
Since the intervals $(f(x_n), f(x_n+))$ ($|x_n| < m$) are pairwise disjoint and contained in the
interval $(f(-m), f(m))$,

\[ \sum_{|x_n|<m} h(x_n) = \sum_{|x_n|<m} [f(x_n+) - f(x_n)] < \infty. \tag{†} \]

Define

\[ \nu := \lim_{m\to\infty} \sum_{|x_n|<m} h(x_n)\,\delta_{x_n}. \]

By (†), ν is a Lebesgue-Stieltjes measure on R. Moreover, for any a < x < y < b, the quantity
h(x) + h(y) is zero unless x or y is one of the discontinuity points of f, in which case its
value is at most $h(x_m) + h(x_n)$ for some m and n. Therefore, h(x) + h(y) ≤ ν(a, b) for all
a < x < y < b. For r > 0, we then have

\[ \frac{|h(x+r) - h(x)|}{r} \le \frac{h(x+r) + h(x)}{r} \le \frac{\nu(x-2r, x+2r)}{r} = 4\,\frac{\nu(B(x,2r))}{\lambda(B(x,2r))}, \]

hence

\[ \lim_{r\to 0} \frac{|h(x+r) - h(x)|}{r} \le 4 \lim_{r\to 0} \frac{\nu(B(x,r))}{\lambda(B(x,r))}. \]

Since $\lambda\{x_1, x_2, \ldots\} = 0 = \nu(\{x_1, x_2, \ldots\}^c)$, ν ⊥ λ, hence the right side of the preceding
inequality is zero for λ-a.a. x (5.4.5). Therefore, $\lim_{r\to 0} r^{-1}|h(x+r) - h(x)| = 0$ for λ-a.a. x,
completing the proof.
5.5.9 Corollary. Let $(f_n)$ be a sequence of nonnegative, nondecreasing functions on R such
that the series $f(x) := \sum_n f_n(x)$ converges for all x. Then $f'(x) = \sum_n f_n'(x)$ λ-a.e.
Proof. Let g and $g_n$ correspond to f and $f_n$ as in the theorem. Then $g(x) = \sum_n g_n(x)$, as
is easily verified, hence we may assume that f and $f_n$ are distribution functions on R. Let
$\mu_n$ and µ be the corresponding Lebesgue-Stieltjes measures. By the hypothesis, $\mu = \sum_n \mu_n$
on intervals (a, b], hence, by the uniqueness theorem for measures, $\mu = \sum_n \mu_n$ on B(R). Let
$\mu_n = \mu_{na} + \mu_{ns}$ and $\mu = \mu_a + \mu_s$ be the Lebesgue decompositions with respect to λ. By
Ex. 5.37, $\mu_a = \sum_n \mu_{na}$, so

\[ \int_E \frac{d\mu_a}{d\lambda}\,d\lambda = \mu_a(E) = \sum_n \mu_{na}(E) = \int_E \sum_n \frac{d\mu_{na}}{d\lambda}\,d\lambda \quad \text{for all } E \in \mathcal{B}(\mathbb{R}), \]

hence

\[ \frac{d\mu_a}{d\lambda} = \sum_n \frac{d\mu_{na}}{d\lambda} \quad \lambda\text{-a.e.} \]

The assertion now follows from 5.4.6.

Exercises
5.44 Prove 5.5.1.

5.45 Let f, g ∈ BV (I) be real-valued. Prove that f ∨g, f ∧g ∈ BV (I). Show also that if |f (x)| ≥ c > 0
for all x, then 1/f ∈ BV (I).

5.46 Show that if E ⊆ R and $E^c$ are dense in R, then $1_E \notin BV(I)$ for all intervals I.

5.47 Let f be real-valued. Prove:

(a) If f′ exists and is Riemann integrable on [a, b], then $V_{[a,b]}(f) = \int_a^b |f'(x)|\,dx$.
(b) If f is right continuous at a and f′ is locally Riemann integrable on (a, b], then the equation
in (a) holds, where the integral is improper. Thus f ∈ BV[a, b] iff $\int_a^b |f'|$ converges.

5.48 Let F ∈ BV(R) be right continuous and suppose that F(−∞) exists and is finite. Show that
F′ ∈ $L^1(\lambda)$. ⟦May assume F is a distribution function. Define a finite measure µ on B(R) so
that µ(−∞, x] = F(x) − F(−∞).⟧

5.6 Absolutely Continuous Functions


Definition and Basic Properties
A complex-valued function f on an interval I is said to be absolutely continuous if
for each ε > 0 there exists a δ > 0 such that

\[ \sum_{j=1}^{n} |f(b_j) - f(a_j)| < \varepsilon \]

for any collection of disjoint subintervals $(a_1, b_1), \ldots, (a_n, b_n)$ of I with $\sum_{j=1}^{n} (b_j - a_j) < \delta$.
We denote the collection of all absolutely continuous functions on I by AC(I).
By taking a single open interval in the definition, we see that an absolutely continuous
function is necessarily uniformly continuous. The converse is false: The function f of
Example 5.6.8 below is uniformly continuous on [0, 1] but is not absolutely continuous if
α < 1.
The straightforward proof of the following proposition is left to the reader.
5.6.1 Proposition. Let I be any interval.
(a) f, g ∈ AC(I) and c ∈ C ⇒ f + g, cf, $\bar f$, |f| ∈ AC(I).

(b) f ∈ AC(I) iff Re(f), Im(f) ∈ AC(I).

(c) If f is real-valued, then f ∈ AC(I) iff $f^{\pm}$ ∈ AC(I).


5.6.2 Proposition. If I is a bounded interval, then AC(I) ⊆ BV (I).
Proof. Let f ∈ AC(I), [a, b] ⊆ I, and let δ > 0 correspond to ε = 1 in the definition of
absolute continuity. Given a partition P of [a, b], let $Q = \{a = x_0 < x_1 < \cdots < x_n = b\}$
be a refinement such that $x_k - x_{k-1} < \delta/2$ for all k. Set $k_0 = 0$ and let $k_1$ be the largest
index $> k_0$ such that $x_{k_1} - x_{k_0} < \delta$. In general, let $k_i$ be the largest index $> k_{i-1}$ such that
$x_{k_i} - x_{k_{i-1}} < \delta$. The process will terminate with an index $k_m = n$. Figure 5.2 illustrates the
construction. Note that for i = 1, . . . , m − 1, we have $x_{k_i} - x_{k_{i-1}} \ge \delta/2$, otherwise we could
choose an index $k > k_i$ for which $x_k - x_{k_{i-1}} < \delta$.

[FIGURE 5.2: Construction of the sequence $x_{k_i}$.]

Thus if ℓ denotes the length of I, then

\[ \ell \ge b - a \ge x_{k_{m-1}} - a = \sum_{i=1}^{m-1} (x_{k_i} - x_{k_{i-1}}) \ge (m-1)\delta/2. \tag{†} \]

Since $\sum_{j=k_{i-1}}^{k_i-1} (x_{j+1} - x_j) = x_{k_i} - x_{k_{i-1}} < \delta$, we have by absolute continuity

\[ \sum_{j=k_{i-1}}^{k_i-1} |f(x_{j+1}) - f(x_j)| < 1. \]

Since m − 1 such sums comprise $V_Q(f)$, $V_P(f) \le V_Q(f) \le (m-1) \le 2\ell/\delta$, the last inequality
by (†). Since P and [a, b] were arbitrary, $V_I(f) \le 2\ell/\delta$.

Note that the inclusion in the proposition is always strict (see 5.6.7) and is clearly false
for unbounded intervals.
The next result complements 5.5.2.
5.6.3 Proposition. Let I be an arbitrary interval and let f ∈ AC(I) ∩ BV (I) be real-valued.
Then there exist monotone increasing functions g, h ∈ AC(I) ∩ BV (I) such that f = g − h.
Proof. As in the proof of 5.5.2, for definiteness we take I = R. It suffices to show that
the function $g(x) := V_{(-\infty,x]}(f) = T_f(x)$ in that proof is absolutely continuous, since
then h = g − f will also be absolutely continuous. Let ε > 0 and let δ correspond to
ε in the definition of absolute continuity of f. Let $(a_1, b_1), \ldots, (a_n, b_n)$ be disjoint with
$\sum_{j=1}^{n} (b_j - a_j) < \delta$. For each j, let $P_j$ be a partition of $[a_j, b_j]$. The open intervals formed
by the totality of these partitions are disjoint and have total length < δ, hence, by absolute
continuity, $\sum_{j=1}^{n} V_{P_j}(f) < \varepsilon$. Taking the supremum over $P_1, \ldots, P_n$ and using 5.5.4 yields

\[ \sum_{j=1}^{n} [g(b_j) - g(a_j)] = \sum_{j=1}^{n} V_{(a_j,b_j]}(f) \le \varepsilon. \]

Here is the principal result regarding absolutely continuous functions.


5.6.4 Theorem. Let µ be a finite measure on R with distribution function F. Then F is
absolutely continuous iff µ ≪ λ. In this case, F′ = dµ/dλ ∈ $L^1(\lambda)$.
Proof. Assume µ ≪ λ and let ε > 0. By 5.3.2, there exists a δ > 0 such that λ(E) < δ ⇒
µ(E) < ε. Thus if $(a_1, b_1), \ldots, (a_n, b_n)$ are disjoint with union E and $\sum_{j=1}^{n} (b_j - a_j) < \delta$,
then

\[ \sum_{j=1}^{n} [F(b_j) - F(a_j)] = \sum_{j=1}^{n} \mu(a_j, b_j] = \mu(E) < \varepsilon, \]

the last equality because $\mu\{b_j\} = 0$.

Conversely, assume that F is absolutely continuous and let λ(E) = 0. Given ε > 0, choose
δ > 0 as in the definition of absolute continuity of F. By regularity, there exists an open set
U ⊇ E such that λ(U) < δ. Now write U as a disjoint union of a sequence of open intervals
$(a_n, b_n)$. By absolute continuity of F, for all n

\[ \sum_{j=1}^{n} \mu(a_j, b_j) = \sum_{j=1}^{n} [F(b_j) - F(a_j)] < \varepsilon. \]

Thus µ(E) ≤ µ(U) ≤ ε and so µ(E) = 0. The last assertion follows from 5.4.6.

Fundamental Theorems of Calculus


Theorem 5.6.4 implies a version of the fundamental theorem of calculus for Lebesgue
integrable functions. For comparison, we first state the classical version:
5.6.5 Theorem. Let f : [a, b] → C. The following are equivalent:
(a) f′ exists and is continuous on [a, b].
(b) There exists a continuous function h : [a, b] → C such that

\[ f(x) = f(a) + \int_a^x h(t)\,dt \quad \text{for all } x \in [a,b]. \]

If these hold, then f′ = h.



Note that condition (a) in the preceding theorem implies that f is absolutely continuous.
Weakening this condition by requiring f to be merely absolutely continuous yields the
following version of the fundamental theorem of calculus.
5.6.6 Theorem. Let f : [a, b] → C. The following are equivalent:
(a) f is absolutely continuous.

(b) There exists a Lebesgue integrable h : [a, b] → C such that

\[ f(x) = f(a) + \int_a^x h(t)\,dt \quad \text{for all } x \in [a,b]. \]

If these hold, then f′ exists and equals h λ-a.e.


Proof. Assume (a) holds. Using 5.6.3, we may suppose that f is a distribution function.
Extend f to R by defining f(x) = f(a) for x < a and f(x) = f(b) for x > b, obtaining an
absolutely continuous distribution function on R. Let µ be the corresponding finite measure.
By 5.6.4, µ ≪ λ. Therefore, part (b) holds for the function h = dµ/dλ = f′.
Now suppose that (b) holds. Then

\[ \sum_{j=1}^{n} |f(b_j) - f(a_j)| \le \sum_{j=1}^{n} \int_{a_j}^{b_j} |h(t)|\,dt = \int_A |h(t)|\,dt, \qquad A := \bigcup_j (a_j, b_j). \]

Since |h|λ ≪ λ, f is absolutely continuous.


5.6.7 Remark. The preceding theorem implies that if f is absolutely continuous and
f′ = 0 λ-a.e., then f is constant. On the other hand, the Cantor function f ∈ BV[0, 1]
satisfies f′ = 0 a.e. but is nonconstant. Therefore, AC[0, 1] ⊊ BV[0, 1]. It follows that
AC(I) ⊊ BV(I) for all bounded (nondegenerate) intervals I. ♦
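The failure of absolute continuity for the Cantor function can also be observed directly from the definition. The sketch below is an illustration only: `cantor` and `cantor_intervals` are hypothetical helper names, and the function is evaluated through its ternary-digit description, which is assumed here rather than taken from the text. The $2^n$ closed intervals remaining at stage n of the Cantor construction have total length $(2/3)^n$, yet the increments of f over them always sum to 1.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function on [0, 1], computed from the ternary digits of x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:          # x is in the closure of a removed middle third
            return total + scale
        total += scale * (digit // 2)
        scale /= 2
    return total

def cantor_intervals(n):
    """The 2**n closed intervals left after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined += [(a, a + third), (b - third, b)]
        intervals = refined
    return intervals

for n in (2, 5, 10):
    ivs = cantor_intervals(n)
    total_length = float(sum(b - a for a, b in ivs))
    total_increment = sum(cantor(b) - cantor(a) for a, b in ivs)
    print(n, total_length, total_increment)
```

Exact rational endpoints (`Fraction`) are used so that the ternary digits of the interval endpoints are computed without rounding. For any δ > 0 the total length eventually drops below δ while the sum of increments stays at 1, so no δ can correspond to ε < 1 in the definition of absolute continuity.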
5.6.8 Example. For α > 0, define a continuous function f on [0, 1] by

\[ f(x) := \begin{cases} x^{\alpha} \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases} \]

We show that f ∈ AC[0, 1] if α > 1 and f ∉ BV[0, 1] if α ≤ 1.

Suppose α > 1. Since

\[ |f'(x)| = |\alpha x^{\alpha-1} \sin(1/x) - x^{\alpha-2} \cos(1/x)| \le \alpha x^{\alpha-1} + x^{\alpha-2} \]

and $\int_0^1 (x^{\alpha-1} + x^{\alpha-2})\,dx$ converges, f′ is Lebesgue integrable. Moreover, for a > 0,

\[ f(x) - f(a) = \int_a^x f'(t)\,dt. \]

Letting a → 0 yields (b) of 5.6.6. Therefore, f ∈ AC[0, 1].


Now let α ≤ 1. Set

\[ a_k := \frac{2}{(4k+1)\pi}, \qquad b_k := \frac{1}{2k\pi}, \qquad c := \frac{2^{\alpha}}{\pi^{\alpha}}, \]

and note that

\[ b_{k+1} < a_k < b_k, \qquad f(b_k) = 0, \quad \text{and} \quad f(a_k) = \frac{c}{(4k+1)^{\alpha}}. \]

For sufficiently small ε > 0, form the partition

\[ P_\varepsilon = \{\varepsilon < a_p < b_p < a_{p-1} < \cdots < a_k < b_k < \cdots < b_{q+1} < a_q < b_q < 1\} \]

of [ε, 1], where p and q are, respectively, the largest and smallest integers satisfying the
inequalities $\varepsilon < a_p < b_q < 1$, or equivalently

\[ \frac{1}{2\pi} < q < p < \frac{2 - \pi\varepsilon}{4\pi\varepsilon}. \]

From $f(a_k) - f(b_k) = \dfrac{c}{(4k+1)^{\alpha}}$ we have

\[ V_{[0,1]}(f) \ge V_{[\varepsilon,1]}(f) \ge V_{P_\varepsilon}(f) \ge c \sum_{k=q}^{p} \frac{1}{(4k+1)^{\alpha}}. \]

By choosing ε arbitrarily small, the upper limit p of the sum may be made arbitrarily large.
Since $\sum_{k=1}^{\infty} (4k+1)^{-\alpha}$ diverges, $V_{[0,1]}(f) = \infty$. ♦
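The dichotomy in this example is easy to observe numerically. The following sketch is illustrative only; the function names are hypothetical, and the endpoints ε and 1 of the partition are omitted for simplicity. It computes the variation of f over the points $a_p < b_p < \cdots < a_1 < b_1$ and compares it with the lower bound $c \sum_{k=1}^{p} (4k+1)^{-\alpha}$.

```python
import math

def f(x, alpha):
    """x**alpha * sin(1/x) for x > 0, and 0 at x = 0."""
    return x ** alpha * math.sin(1.0 / x) if x > 0 else 0.0

def variation_on_peaks(alpha, p):
    """P-variation of f over the ordered points a_p < b_p < ... < a_1 < b_1."""
    pts = []
    for k in range(p, 0, -1):
        pts.append(2.0 / ((4 * k + 1) * math.pi))   # a_k, where sin(1/x) = 1
        pts.append(1.0 / (2 * k * math.pi))          # b_k, where sin(1/x) = 0
    return sum(abs(f(v, alpha) - f(u, alpha)) for u, v in zip(pts, pts[1:]))

def lower_bound(alpha, p):
    """The bound c * sum_{k=1}^{p} (4k + 1)^(-alpha) with c = (2/pi)^alpha."""
    c = (2.0 / math.pi) ** alpha
    return c * sum((4 * k + 1) ** (-alpha) for k in range(1, p + 1))

for alpha in (1.0, 2.0):
    for p in (100, 1000, 10000):
        print(alpha, p, variation_on_peaks(alpha, p), lower_bound(alpha, p))
```

For α = 1 the computed variation keeps growing (roughly logarithmically in p), whereas for α = 2 it stabilizes, in line with the two cases of the example.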

Exercises
5.49 Show that f, g ∈ AC[a, b] ⇒ f g ∈ AC[a, b].

5.50 Let p, q > 0. Show that the function $f(x) := x^p \sin(x^{-q})$ (x > 0), f(0) = 0, is absolutely
continuous on [0, 1] iff p > q.

5.51 The Cantor function f is an example of a continuous nondecreasing function on [0, 1] with
f′ = 0 a.e. Extend the Cantor function to a nondecreasing function on R by defining f(x) = 0,
x ≤ 0, and f(x) = 1, x ≥ 1. Define g on [0, 1] by $g(x) = \sum_{n=1}^{\infty} 2^{-n} f\big((x - a_n)/(b_n - a_n)\big)$, where
the $[a_n, b_n]$ are the closed intervals in [0, 1] with rational endpoints. Show that g is continuous,
strictly increasing, and g′ = 0 a.e.

5.52 Let f ∈ AC[c, d] and let g : [a, b] → [c, d] be strictly increasing with g([a, b]) = [c, d]. Show that
g ∈ AC[a, b] ⇒ f ◦ g ∈ AC[a, b]. Give an example of a strictly increasing function g for which
f ◦ g ∈ BV [a, b] \ AC[a, b] for nontrivial f ∈ AC[c, d].

5.53 Let ϕ : [a, b] → [c, d] be absolutely continuous and ϕ′ > 0 a.e.

(a) Show that ϕ is 1-1, hence $\varphi^{-1}$ exists and is continuous.
(b) Let f : [c, d] → R be Lebesgue integrable. Prove the change of variables formula

\[ \int_{[c,d]} f\,d\lambda = \int_{[a,b]} (f \circ \varphi)\,\varphi'\,d\lambda. \]

5.54 Let F be monotone nondecreasing on [a, b]. Show that

\[ \int_{[a,x]} F'\,d\lambda \le F(x+) - F(a+). \]
Chapter 6
Fourier Analysis on Rd

The subject of this chapter plays an important role in many areas of science, technology, and
mathematics, including quantum physics, image processing, probability theory, statistics,
and differential equations. We begin with the notion of convolution, which is central to
Fourier analysis.

6.1 Convolution of Functions


Definition and Basic Properties

Let f, g ∈ $L^1(\mathbb{R}^d, \lambda^d)$. The convolution of f and g is the function

\[ (f * g)(x) = \int f(x-y)\,g(y)\,dy, \qquad x \in \mathbb{R}^d. \]

The basic properties of convolution are summarized in the following proposition. Note that
parts (a)–(e) of the proposition collectively assert that L1 (Rd ) is a commutative Banach
algebra under convolution.
6.1.1 Proposition. Let f, g, h ∈ $L^1(\mathbb{R}^d)$, c ∈ C, φ ∈ $C_c^\infty(\mathbb{R}^d)$, and α a multi-index. Then the
convolution f ∗ g is well-defined, f ∗ g ∈ $L^1(\mathbb{R}^d)$, and the following hold.

(a) $\|f * g\|_1 \le \|f\|_1 \|g\|_1$.                (b) f ∗ g = g ∗ f.
(c) (f ∗ g) ∗ h = f ∗ (g ∗ h).                (d) f ∗ (cg) = (cf) ∗ g = c(f ∗ g).
(e) f ∗ (g + h) = f ∗ g + f ∗ h.              (f) $\partial^\alpha (f * \phi) = f * \partial^\alpha \phi$.

Proof. To see that convolution is well-defined, note first that since the function (x, y) → x − y
is Borel measurable, the integrand is measurable in (x, y). Thus if f, g ≥ 0, the integral
exists for all x. The inequality $\|f * g\|_1 \le \|f\|_1 \|g\|_1$, proved next, shows that f ∗ g is finite
a.e. and in $L^1$. Considering real and imaginary parts and then positive and negative parts,
we see that f ∗ g ∈ $L^1$ for every f, g ∈ $L^1$.
(a) By Fubini’s theorem and translation invariance,

\[ \int |(f * g)(x)|\,dx \le \iint |f(x-y)\,g(y)|\,dx\,dy = \iint |f(x)\,g(y)|\,dx\,dy = \|f\|_1 \|g\|_1. \]

(b) By translation and reflection invariance,

\[ (f * g)(x) = \int f(x-y)\,g(y)\,dy = \int f(-y)\,g(x+y)\,dy = \int f(y)\,g(x-y)\,dy = (g * f)(x). \]

(c) By Fubini’s theorem,

\begin{align*}
[f * (g * h)](x) &= \int f(x-y)\,(g * h)(y)\,dy = \iint f(x-y)\,g(y-z)\,h(z)\,dz\,dy \\
&= \iint f(x-y)\,g(y-z)\,h(z)\,dy\,dz = \iint f(x-z-y)\,g(y)\,h(z)\,dy\,dz \\
&= \int (f * g)(x-z)\,h(z)\,dz = [(f * g) * h](x).
\end{align*}

Parts (d) and (e) are left as exercises.

(f) By 3.4.5 and the chain rule,

\[ \partial^\alpha (f * \phi)(x) = \int \partial_x^\alpha \big[\phi(x-y)\big]\,f(y)\,dy = \int (\partial^\alpha \phi)(x-y)\,f(y)\,dy. \]
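Parts (a) and (b) of the proposition have exact discrete counterparts, which the following sketch illustrates for d = 1. It is not part of the text: the functions are replaced by samples at cell midpoints of a uniform grid with step `h`, integrals by Riemann sums, and the names `convolve` and `l1_norm` are hypothetical helpers.

```python
import math

def convolve(f_vals, g_vals, h):
    """Riemann-sum analogue of (f * g)(x) = ∫ f(x - y) g(y) dy on a grid of step h."""
    out = [0.0] * (len(f_vals) + len(g_vals) - 1)
    for i, fv in enumerate(f_vals):
        for j, gv in enumerate(g_vals):
            out[i + j] += fv * gv * h
    return out

def l1_norm(vals, h):
    """Riemann-sum analogue of the L^1 norm."""
    return h * sum(abs(v) for v in vals)

h = 0.01
xs = [-1 + (i + 0.5) * h for i in range(200)]     # midpoints of the cells of [-1, 1]
f_vals = [1.0 for _ in xs]                         # indicator function of [-1, 1]
g_vals = [x * math.exp(-x * x) for x in xs]        # a sign-changing sample function

fg = convolve(f_vals, g_vals, h)
gf = convolve(g_vals, f_vals, h)
print(l1_norm(fg, h) <= l1_norm(f_vals, h) * l1_norm(g_vals, h))   # analogue of (a)
print(max(abs(u - v) for u, v in zip(fg, gf)))                     # analogue of (b)
```

The discrete inequality follows from the triangle inequality exactly as in the proof of (a), and the two convolutions agree up to floating-point rounding.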

Approximate Identities

The Banach algebra $L^1(\mathbb{R}^d)$ does not possess an identity, that is, there is no function
e such that f ∗ e = f for all f ∈ $L^1$ (Ex. 6.2). However, $L^1(\mathbb{R}^d)$ has an approximate
identity, as described in the following.
6.1.2 Theorem. Let φ ∈ $L^1(\mathbb{R}^d)$ with $\int \phi\,d\lambda^d = 1$. For n ∈ N and x ∈ $\mathbb{R}^d$ define $\phi_n(x) := n^d \phi(nx)$.
If 1 ≤ p < ∞, then $f * \phi_n \to f$ in $L^p$ for all f ∈ $L^p$. The same conclusion holds for
p = ∞ if f is uniformly continuous and bounded.
Proof. Let $T_z$ denote translation by −z, that is, $T_z f(x) = f(x - z)$. By the dilation property
of $\lambda^d$,

\begin{align*}
f * \phi_n(x) - f(x) &= n^d \int f(x-y)\,\phi(ny)\,dy - \int f(x)\,\phi(y)\,dy \\
&= \int \big[T_{y/n} f(x) - f(x)\big]\,\phi(y)\,dy. \tag{†}
\end{align*}

Therefore,

\begin{align*}
\|f * \phi_n - f\|_p &\le \Big( \int \Big[ \int \big|T_{y/n} f(x) - f(x)\big|\,|\phi(y)|\,dy \Big]^p dx \Big)^{1/p} \\
&\le \int \Big( \int \big|T_{y/n} f(x) - f(x)\big|^p\,dx \Big)^{1/p} |\phi(y)|\,dy \\
&= \int \|T_{y/n} f - f\|_p\,|\phi(y)|\,dy,
\end{align*}

the second inequality by 4.1.5. Since $\|T_{y/n} f - f\|_p \to 0$ (4.2.3) and
$\|T_{y/n} f - f\|_p\,|\phi(y)| \le 2\|f\|_p\,|\phi(y)|$, the dominated convergence theorem implies that $\|f * \phi_n - f\|_p \to 0$. This proves
the first part of the theorem. The second part follows from (†), since by uniform continuity
$\|T_{y/n} f - f\|_\infty \to 0$.
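The convergence in the theorem can be watched in a simple one-dimensional case. The sketch below is an illustration under assumptions of our own choosing: $f = 1_{[-1/2,1/2]}$, $\phi(x) = e^{-\pi x^2}$ (which has integral 1), p = 1, and the $L^1$ norm approximated by a midpoint Riemann sum on [−3, 3]. For this pair, $f * \phi_n$ has a closed form in terms of the error function, since $\int_0^t n e^{-\pi n^2 u^2}\,du = \tfrac12 \operatorname{erf}(\sqrt{\pi}\,n t)$.

```python
import math

def f(x):
    """Indicator function of [-1/2, 1/2]."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def f_conv_phi_n(x, n):
    """(f * phi_n)(x) for phi(x) = exp(-pi x^2), phi_n(x) = n phi(n x), in closed form."""
    s = math.sqrt(math.pi) * n
    return 0.5 * (math.erf(s * (x + 0.5)) - math.erf(s * (x - 0.5)))

def l1_error(n, lo=-3.0, hi=3.0, cells=6000):
    """Midpoint-rule approximation of the L^1 norm of f * phi_n - f."""
    h = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * h
        total += abs(f_conv_phi_n(x, n) - f(x))
    return h * total

for n in (1, 2, 4, 8, 16):
    print(n, l1_error(n))
```

The printed errors decrease toward 0 as n grows; for large n they behave like 2/(πn), reflecting the smoothing of the two jumps of f.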
6.1.3 Remark. The function φ in the statement of the theorem may be taken to be $C^\infty$
with support contained in a given compact interval. To see this, let h : $\mathbb{R}^d$ → [0, +∞) be
a $C^\infty$ function such that h > 0 on (−a, a) and h = 0 on $(-a, a)^c$, where a = (1, . . . , 1)
(0.14.4). Then $\phi := \big(\int h\big)^{-1} h$ is $C^\infty$ with support contained in [−a, a] and $\int \phi = 1$. As a
consequence, given ε = (ε, . . . , ε), the support of $\phi_n$ is contained in [−ε, ε] for all large n.
For future reference we note that because the interval is symmetric, h may be taken to be
even. (Take a = −b in 0.14.2.) ♦

The following application of 6.1.2 asserts that $L^p$ functions may be approximated by
smooth functions with compact support.
6.1.4 Corollary. Let 1 ≤ p < ∞ and f ∈ $L^p(\mathbb{R}^d)$. Then for each ε > 0 there exists a
function ϕ ∈ $C^\infty$ that vanishes outside a compact interval such that $\|f - \varphi\|_p < \varepsilon$. If f is
continuous with compact support, then the same assertion holds for the case p = ∞.
Proof. By 4.2.2 we may assume that f is continuous and vanishes outside an interval [a, b].
Let $\phi_n$ be as in 6.1.3 with support contained in [−ε, ε]. Then
$\operatorname{supp}(f * \phi_n) \subseteq \operatorname{supp}(f) + \operatorname{supp}(\phi_n) \subseteq [a - \varepsilon, b + \varepsilon]$.
Moreover, by 6.1.1(f), $f * \phi_n \in C^\infty$. Since $\|f * \phi_n - f\|_p \to 0$, we need only take $\varphi = f * \phi_n$
for sufficiently large n to complete the argument.

Exercises
6.1 Let f, g ∈ $L^1(\mathbb{R}^d)$. Show that $\int (f * g)(x)\,dx = \int f(x)\,dx \cdot \int g(x)\,dx$.
6.2 Show that there is no function e ∈ $L^1(\mathbb{R}^d)$ such that f ∗ e = f for all f ∈ $L^1$.

6.3 Let a > 0 and $f(x) = 1_{[-a,a]}(x)$. Show that $f * f(x) = (2a - |x|)\,1_{[-2a,2a]}(x)$.
6.4 Let $T_a$ denote translation by a. Show that $T_a(f * g) = (T_a f) * g = f * (T_a g)$.
6.5 Let 1 ≤ p < ∞, q conjugate to p, f ∈ $L^p$ and g ∈ $L^q$. Prove:
(a) $\|f * g\|_\infty \le \|f\|_p \|g\|_q$. (b) f ∗ g is uniformly continuous. (c) $\lim_{|x|\to\infty} f * g(x) = 0$ (p > 1).
6.6 Show that if f, g ∈ $L^1(\mathbb{R}^d)$, then supp f ∗ g is contained in the closure K of supp f + supp g. In
particular, the members of $C_c(\mathbb{R}^d) * C_c(\mathbb{R}^d)$ have compact support.
6.7 Let f ∈ $L^1(\mathbb{R}^d)$ and g ∈ $L^p(\mathbb{R}^d)$ (1 ≤ p ≤ ∞). Prove that f ∗ g(x) exists for a.a. x and that
$\|f * g\|_p \le \|f\|_1 \|g\|_p$.
6.8 [↑ 6.7, 4.6] Let p, q, r ∈ [1, ∞] such that $p^{-1} + q^{-1} = 1 + r^{-1}$, and let f ∈ $L^p(\mathbb{R}^d)$, g ∈ $L^q(\mathbb{R}^d)$.
Prove that $\int |f(x-y)\,g(y)|\,dy < \infty$ for a.a. x and that $\|f * g\|_r \le \|f\|_p \|g\|_q$. ⟦Eliminate the
special cases (1) p = q/(q − 1), r = ∞, (2) q = 1, r = p, and (3) p = 1, r = q. Then let p, q, r
be finite and write $|f(x-y)\,g(y)| = |f(x-y)|^{1-p/r}\,|g(y)|^{1-q/r}\,|f(x-y)|^{p/r}\,|g(y)|^{q/r}$.⟧

6.9 Let T : $\mathbb{R}^d$ → $\mathbb{R}^d$ be linear and nonsingular. Prove that (f ∗ g) ◦ T = | det T |(f ◦ T) ∗ (g ◦ T). In
particular, if T is orthogonal, then (f ∗ g) ◦ T = (f ◦ T) ∗ (g ◦ T).

6.2 The Fourier Transform


Definition and Basic Properties

The Fourier transform $\mathcal{F}(f) = \hat f$ of a function f ∈ $L^1(\mathbb{R}^d)$ is defined by

\[ \mathcal{F}(f)(\xi) = \hat f(\xi) := \int e^{-2\pi i\,\xi\cdot x} f(x)\,dx, \qquad \xi \in \mathbb{R}^d.\,^{1} \tag{6.1} \]

It is clear that $\mathcal{F}$ is linear and

\[ \|\mathcal{F}(f)\|_\infty = \|\hat f\|_\infty \le \|f\|_1. \tag{6.2} \]

¹ Some authors omit the factor 2π in the exponent of (6.1). Its presence, however, simplifies the inversion
formula.
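Definition (6.1) can be tested numerically in dimension one. The sketch below is an illustration, not part of the text: `fourier_transform` and `indicator` are hypothetical helpers, the integral is approximated by a midpoint rule over a bounded interval, and the closed form $\sin(2\pi a\xi)/(\pi\xi)$ for the transform of $1_{[-a,a]}$ is the one stated in Ex. 6.12.

```python
import cmath
import math

def fourier_transform(f, xi, lo, hi, cells=4000):
    """Midpoint-rule approximation of ∫ e^{-2πi ξ x} f(x) dx over [lo, hi] (d = 1)."""
    h = (hi - lo) / cells
    total = 0j
    for i in range(cells):
        x = lo + (i + 0.5) * h
        total += cmath.exp(-2j * math.pi * xi * x) * f(x)
    return total * h

def indicator(a):
    """Indicator function of [-a, a]."""
    return lambda x: 1.0 if abs(x) <= a else 0.0

a, xi = 1.0, 0.3
numeric = fourier_transform(indicator(a), xi, -a, a)
exact = math.sin(2 * math.pi * a * xi) / (math.pi * xi)
print(numeric, exact)
```

The modulus of the computed value is also bounded by $\|f\|_1 = 2a$, as (6.2) requires.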

Additional properties of the transform are given in the next proposition. The following
notation will be needed:

\[ x^\alpha := x_1^{\alpha_1} \cdots x_d^{\alpha_d}, \quad \text{where } x = (x_1, \ldots, x_d) \text{ and } \alpha = (\alpha_1, \ldots, \alpha_d) \text{ is a multi-index.} \]

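6.2.1 Proposition. Let f, g ∈ $L^1(\mathbb{R}^d)$, φ ∈ $C_c^\infty(\mathbb{R}^d)$, T : $\mathbb{R}^d$ → $\mathbb{R}^d$ linear and nonsingular
with adjoint $T^*$, $T_a f(x) := f(a + x)$, and $D_r f(x) := f(rx)$. Then

(a) $\widehat{f * g} = \hat f\,\hat g$.                           (b) $\widehat{\partial^\alpha \phi}(\xi) = (2\pi i\,\xi)^\alpha\,\hat\phi(\xi)$.
(c) $\widehat{f \circ T} = |\det T|^{-1}\,\hat f \circ T^{*-1}$.        (d) $\widehat{T_a f}(\xi) = e^{2\pi i\,\xi\cdot a}\,\hat f(\xi)$.
(e) $T_a \hat f = \hat h$, $h(x) := e^{-2\pi i\,a\cdot x} f(x)$.    (f) $\widehat{D_r f} = r^{-d}\,D_{1/r} \hat f$.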

Proof. (a) By Fubini’s theorem and translation invariance,

\[ \widehat{f * g}(\xi) = \iint e^{-2\pi i\,\xi\cdot x} f(x-y)\,g(y)\,dy\,dx = \iint e^{-2\pi i\,\xi\cdot(x+y)} f(x)\,g(y)\,dx\,dy = \hat f(\xi)\,\hat g(\xi). \]

(b) An integration by parts yields

\[ \widehat{\frac{\partial\phi}{\partial x_1}}(\xi) = \int\!\cdots\!\int e^{-2\pi i\,\xi\cdot x}\,\phi_{x_1}(x)\,dx_1 \cdots dx_d = 2\pi i\,\xi_1 \int e^{-2\pi i\,\xi\cdot x}\,\phi(x)\,dx = 2\pi i\,\xi_1\,\hat\phi(\xi). \]

(The constant term is absent because φ has compact support.) The analogous result holds
for the remaining variables. The desired formula now follows by induction.
For (c), we apply the change of variable theorem:

\[ \widehat{f \circ T}(\xi) = \int e^{-2\pi i\,x\cdot\xi} f(Tx)\,dx = \int e^{-2\pi i\,Tx\cdot T^{*-1}\xi} f(Tx)\,dx = |\det T|^{-1}\,\hat f(T^{*-1}\xi). \]

Parts (d) – (f) are left as exercises (Ex. 6.11).


6.2.2 Theorem (Riemann-Lebesgue Lemma). The Fourier transform $\mathcal{F}$ is a continuous
linear mapping from the Banach space $L^1(\mathbb{R}^d)$ into the Banach space $C_0(\mathbb{R}^d)$.
Proof. We have already noted that $\mathcal{F}$ is linear. Let f ∈ $L^1$. Using the dominated convergence
theorem one easily establishes that $\hat f$ is continuous. Since $\|\hat f\|_\infty \le \|f\|_1$, $\mathcal{F}$ is a continuous
linear mapping from $L^1(\mathbb{R}^d)$ into $C_b(\mathbb{R}^d)$. It remains to show that $\hat f \in C_0(\mathbb{R}^d)$.
Suppose first that f ∈ $C_c^\infty(\mathbb{R}^d)$. Then $\widehat{\partial f/\partial x_j}$ is bounded, hence, by 6.2.1(b), there exists
a constant C such that $|\xi_j|\,|\hat f(\xi)| \le C$ for all j and ξ. This implies that $\hat f \in C_0(\mathbb{R}^d)$.
In the general case, given ε > 0 choose g ∈ $C_c^\infty(\mathbb{R}^d)$ such that $\|f - g\|_1 < \varepsilon$ (6.1.4). By
the preceding paragraph, $\hat g \in C_0(\mathbb{R}^d)$. Since $\|\hat f - \hat g\|_\infty \le \|f - g\|_1$, we have $\|\hat f - \hat g\|_\infty < \varepsilon$.
Therefore, $\hat f \in C_0(\mathbb{R}^d)$.

The Fourier Inversion Theorem

The inverse Fourier transform $\check f$ of a function f ∈ $L^1(\mathbb{R}^d)$ is defined by

\[ \check f(\xi) := \int e^{2\pi i\,\xi\cdot x} f(x)\,dx, \qquad \xi \in \mathbb{R}^d. \tag{6.3} \]

The next theorem describes one of the most important properties of the Fourier transform,
one that is largely responsible for the utility of the transform. For the proof we need the
following lemma.

6.2.3 Lemma. Let a > 0 and b ∈ R. Then

\[ \int \exp(ib\,\xi\cdot x - a|\xi|^2)\,d\xi = \Big(\frac{\pi}{a}\Big)^{d/2} \exp(-b^2|x|^2/4a). \]

Proof. Let F(x) denote the left side of the equation. Consider first the case d = 1. Differentiating
and then integrating by parts, we have

\begin{align*}
F'(x) &= ib \int \xi \exp(ib\xi x - a\xi^2)\,d\xi \\
&= -\frac{ib}{2a} \exp(ib\xi x - a\xi^2)\Big|_{-\infty}^{\infty} - \frac{b^2}{2a}\,x \int \exp(ib\xi x - a\xi^2)\,d\xi \\
&= \frac{-b^2}{2a}\,x F(x).
\end{align*}

It follows that the derivative of $F(x)\exp(b^2x^2/4a)$ is zero and so

\[ F(x)\exp(b^2x^2/4a) = F(0) = \int_{-\infty}^{\infty} e^{-a\xi^2}\,d\xi = \sqrt{\frac{\pi}{a}}, \]

the last equality from (3.16).


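In the general case, apply Fubini’s theorem and the case d = 1:

\begin{align*}
F(x) &= \int \prod_{k=1}^{d} \exp(ib\xi_k x_k - a\xi_k^2)\,d\xi = \prod_{k=1}^{d} \int \exp(ib\xi_k x_k - a\xi_k^2)\,d\xi_k \\
&= \prod_{k=1}^{d} \sqrt{\frac{\pi}{a}}\,\exp(-b^2x_k^2/4a) = \Big(\frac{\pi}{a}\Big)^{d/2} \exp(-b^2|x|^2/4a).
\end{align*}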
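The formula of Lemma 6.2.3 is easy to sanity-check for d = 1. The following sketch is illustrative only; the function names and the sample values of a, b, and x are our own choices, and the integral is truncated to a bounded interval on which the Gaussian factor is already negligible at the endpoints.

```python
import cmath
import math

def gaussian_integral(a, b, x, half_width=12.0, cells=24000):
    """Midpoint-rule value of ∫ exp(i b ξ x - a ξ^2) dξ over [-half_width, half_width]."""
    h = 2 * half_width / cells
    total = 0j
    for i in range(cells):
        xi = -half_width + (i + 0.5) * h
        total += cmath.exp(1j * b * xi * x - a * xi * xi)
    return total * h

def closed_form(a, b, x):
    """Right side of the lemma for d = 1: sqrt(pi/a) * exp(-b^2 x^2 / (4a))."""
    return math.sqrt(math.pi / a) * math.exp(-b * b * x * x / (4 * a))

a, b, x = 0.7, 2.0, 0.5
print(gaussian_integral(a, b, x), closed_form(a, b, x))
```

The imaginary part of the numerical value vanishes up to rounding, as it must, since the imaginary part of the integrand is odd in ξ.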

Here is the inversion theorem:

6.2.4 Theorem. If f and $\hat f$ are both in $L^1(\mathbb{R}^d)$, then $f = (\hat f)^{\vee}$, that is,

\[ f(x) = \int e^{2\pi i\,\xi\cdot x}\,\hat f(\xi)\,d\xi \quad \text{for a.a. } x \in \mathbb{R}^d. \]

Proof. We divide the proof into several steps.

(1) Set $\phi(x) := \exp(-\pi|x|^2)$. Then $\int \phi = 1$.
⟦For d = 1 this follows from (3.16). The general case follows from Fubini’s theorem.⟧

(2) For the function φ in (1) define $\phi_n(x) = n^d\phi(nx)$ as in 6.1.2. For n ∈ N and x ∈ $\mathbb{R}^d$,
define $\psi_{n,x}(\xi) = \exp(2\pi i\,\xi\cdot x - \pi n^{-2}|\xi|^2)$. Then $\hat\psi_{n,x}(y) = \phi_n(x - y)$.
⟦Take a = π/n² and b = 2π in 6.2.3 to obtain

\[ \hat\psi_{n,x}(y) = \int e^{2\pi i\,\xi\cdot(x-y) - \pi n^{-2}|\xi|^2}\,d\xi = n^d e^{-\pi n^2|x-y|^2} = \phi_n(x-y). \]⟧

(3) For g, h ∈ $L^1(\lambda^d)$, $\int \hat g\,h = \int g\,\hat h$.
⟦By Fubini’s theorem, $\iint g(x)h(y)e^{-2\pi i\,x\cdot y}\,dx\,dy = \iint g(x)h(y)e^{-2\pi i\,x\cdot y}\,dy\,dx$.⟧

(4) $\int \psi_{n,x}(\xi)\,\hat f(\xi)\,d\xi = f * \phi_n(x)$.
⟦By (2) and (3), the left side of (4) is $\int \hat\psi_{n,x}(y) f(y)\,dy = \int \phi_n(x-y) f(y)\,dy$, which
is the right side.⟧

To complete the proof of the theorem, let n → ∞ in (4). Since $\psi_{n,x}(\xi) \to \exp(2\pi i\,\xi\cdot x)$
and $\hat f \in L^1$, the left side tends to $(\hat f)^{\vee}(x)$ by the dominated convergence theorem. By 6.1.2 the
right side tends in $L^1$ to f, hence a subsequence tends to f(x) a.e. (4.3.3). Thus the two
functions are equal a.e.
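The inversion formula can be observed numerically for a concrete function. The sketch below is only an illustration under assumptions of our own: d = 1, the test function $f(x) = (1 + x)e^{-\pi x^2}$ (for which both f and $\hat f$ are integrable), and midpoint-rule quadrature on [−5, 5], outside of which the integrands are negligible. The helper names are hypothetical.

```python
import cmath
import math

def integrate(func, lo, hi, cells):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / cells
    return h * sum(func(lo + (i + 0.5) * h) for i in range(cells))

def f(x):
    """Test function, chosen to be neither even nor odd."""
    return (1.0 + x) * math.exp(-math.pi * x * x)

def f_hat(xi):
    """Numerical Fourier transform (6.1) of f."""
    return integrate(lambda x: cmath.exp(-2j * math.pi * xi * x) * f(x), -5.0, 5.0, 400)

def inverse_at(x):
    """Numerical value of ∫ e^{2πi ξ x} f_hat(ξ) dξ."""
    return integrate(lambda xi: cmath.exp(2j * math.pi * xi * x) * f_hat(xi), -5.0, 5.0, 400)

x0 = 0.3
print(inverse_at(x0), f(x0))
```

For this particular f one can also verify by hand that $\hat f(\xi) = (1 - i\xi)e^{-\pi\xi^2}$, which gives an independent check on `f_hat`.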

Exercises
6.10 The indicator function $h = 1_{(0,\infty)}$ is called the Heaviside function. Let f ∈ $L^1(\mathbb{R})$ be differentiable
with f′ ∈ $L^1(\mathbb{R})$. Find (f′h) ∗ h and h ∗ h ∗ · · · ∗ h (n factors).

6.11 Verify (d) – (f) of 6.2.1.

6.12 Let a > 0 and $f(x) = 1_{[-a,a]}(x)$. Show that $\hat f(\xi) = (\pi\xi)^{-1} \sin 2\pi a\xi$.

6.13 Let f ∈ $L^1(\lambda^d)$. Prove that f even (odd) ⇒ $\hat f$ even (odd).

6.14 Let $f(x) = \pi e^{-2\pi b|x|}$, b > 0. Show that $\hat f(\xi) = b/(b^2 + \xi^2)$.


b
6.15 Define fe(x) = f (−x). Verify that fe = fb and fb = fe(x).
b e

6.16 Let f ∈ $L^1$. Show that for an orthogonal linear transformation T, $\widehat{f \circ T} = \hat f \circ T$. Use this to show
that if f is a radial function (§3.6), then so is $\hat f$.
b
1 d
6.17 Let f ∈ L (R ). Show that f = fb.

6.18 [↑ 6.14] Let $g(x) = (1 + x^2)^{-1}$. Use the inversion formula to show that $g * g(x) = 2\pi/(4 + x^2)$.

6.3 Rapidly Decreasing Functions


These functions are particularly well behaved with respect to the Fourier inversion
operation, a fact that will allow a relatively simple proof of the Plancherel theorem.

Definition and Basic Properties

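For a function φ ∈ $C^\infty(\mathbb{R}^d)$, define seminorms

\[ p_{\alpha,\beta}(\phi) = \sup_{x\in\mathbb{R}^d} |x^\beta \partial^\alpha \phi(x)| \quad \text{and} \quad q_{\alpha,n}(\phi) = \sup_{x\in\mathbb{R}^d} \big(1 + |x|^2\big)^n |\partial^\alpha \phi(x)|, \tag{6.4} \]

where n ∈ $\mathbb{Z}^+$ and $\alpha = (\alpha_1, \ldots, \alpha_d)$ and $\beta = (\beta_1, \ldots, \beta_d)$ are multi-indices. The following
proposition makes a connection between these seminorms.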
6.3.1 Proposition. (a) For each n ∈ $\mathbb{Z}^+$, there exists a finite set S of multi-indices and a
constant A > 0, each depending only on n, such that

\[ q_{\alpha,n}(\phi) \le A \sum_{\beta\in S} p_{\alpha,\beta}(\phi) \quad \forall \text{ multi-index } \alpha \text{ and } \forall\,\phi \in C^\infty(\mathbb{R}^d). \]

(b) For each multi-index β, there exists a constant B > 0 and m ∈ N, each depending only
on β, such that

\[ p_{\alpha,\beta}(\phi) \le B\,q_{\alpha,m}(\phi) \quad \forall \text{ multi-index } \alpha \text{ and } \forall\,\phi \in C^\infty(\mathbb{R}^d). \]


Proof. The function $y \mapsto \sum_{k=1}^{d} |y_k|^{2n}$ has a positive minimum s on the unit sphere |y| = 1.
Taking y = x/|x| we see that $s^{-1} \sum_{k=1}^{d} |x_k|^{2n} \ge |x|^{2n}$ for all x. For a suitable constant A
depending only on n we then have

\[ \big(1 + |x|^2\big)^n \le 2^n \big(1 + |x|^{2n}\big) \le 2^n \Big(1 + s^{-1} \sum_{k=1}^{d} |x_k|^{2n}\Big) \le A \Big(1 + \sum_{k=1}^{d} |x_k|^{2n}\Big) = A \sum_{\beta\in S} |x^\beta|, \]

where S denotes the set of d + 1 multi-indices (0, 0, . . . , 0), (0, . . . , 0, 2n, 0, . . . , 0). Multiplying
by $|\partial^\alpha \phi(x)|$ and taking suprema yields the inequality in (a).
(b) For the multi-index β, set $m := \sum_{j=1}^{d} \beta_j$, $t_j := \beta_j/m$, and $t = (t_1, \ldots, t_d)$. By Ex. 4.28
and the Cauchy-Schwarz inequality,

\[ \prod_{j=1}^{d} |x_j|^{\beta_j} = \Big(\prod_{j=1}^{d} |x_j|^{t_j}\Big)^m \le \Big(\sum_{j=1}^{d} t_j |x_j|\Big)^m \le \Big(\sum_{j=1}^{d} t_j^2\Big)^{m/2} \Big(\sum_{j=1}^{d} |x_j|^2\Big)^{m/2} \le d^{m/2} \Big(\sum_{j=1}^{d} |x_j|^2\Big)^{m/2}. \]

Therefore, $|x^\beta| \le d^{m/2}(1 + |x|^2)^m$, proving (b).


As a consequence we have
6.3.2 Corollary. The following are equivalent for a function φ ∈ C ∞ (Rd ).
(a) pα,β (φ) < ∞ for all multi-indices α and β.

(b) qα,m (φ) < ∞ for all multi-indices α and all m ∈ Z+ .

A function φ that satisfies the equivalent conditions (a) and (b) of the corollary is called
a rapidly decreasing or Schwartz function. For example, $x^\alpha \exp(-|x|^2)$ is rapidly
decreasing. The collection of all rapidly decreasing functions is called the Schwartz space
on $\mathbb{R}^d$ and is denoted by $\mathcal{S} = \mathcal{S}(\mathbb{R}^d)$. Clearly, the following inclusions hold:

\[ C_c^\infty(\mathbb{R}^d) \subseteq \mathcal{S}(\mathbb{R}^d) \subseteq C_0(\mathbb{R}^d). \]

Moreover, from the sum and product rules for $\partial^\alpha$ it follows that $\mathcal{S}(\mathbb{R}^d)$ is an algebra and is
closed under the operations $\partial^\alpha$ and multiplication by $x^\alpha$.
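6.3.3 Proposition. Let 1 ≤ p < ∞. Then in the $L^p$ norm, $C_c^\infty(\mathbb{R}^d)$ is dense in $\mathcal{S}(\mathbb{R}^d)$ and
$\mathcal{S}(\mathbb{R}^d)$ is dense in $L^p(\mathbb{R}^d)$.
Proof. Let φ ∈ $\mathcal{S}(\mathbb{R}^d)$. Choose n > d/p and C > 0 so that $(1 + |x|)^n |\phi(x)| \le C$ for all x.
Then $|\phi(x)| \le C(1 + |x|)^{-n}$ with np > d, so φ ∈ $L^p$ by 3.6.3. Thus $\mathcal{S}(\mathbb{R}^d) \subseteq L^p$. Since $C_c^\infty(\mathbb{R}^d) \subseteq \mathcal{S}(\mathbb{R}^d)$
and $C_c^\infty(\mathbb{R}^d)$ is dense in $L^p(\mathbb{R}^d)$ (6.1.4), the assertions follow.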
The following result will be needed in the proof of the Plancherel theorem below.
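6.3.4 Theorem. $\hat{\mathcal{S}} = \mathcal{S}$.

Proof. Let φ ∈ S. Let α and β be any multi-indices and set c := −2πi. Differentiating under
the integral sign, we have

\begin{align*}
(c\xi)^\beta \partial^\alpha \hat\phi(\xi) &= (c\xi)^\beta \int \partial_\xi^\alpha\,e^{c\,\xi\cdot x} \phi(x)\,dx = \int (c\xi)^\beta e^{c\,\xi\cdot x} (cx)^\alpha \phi(x)\,dx \\
&= \int \partial_x^\beta \big(e^{c\,\xi\cdot x}\big)\,(cx)^\alpha \phi(x)\,dx.
\end{align*}

Performing a sequence of integrations by parts on the last integral, we obtain

\[ (c\xi)^\beta \partial^\alpha \hat\phi(\xi) = \int e^{c\,\xi\cdot x} \psi(x)\,dx = \hat\psi(\xi) \]

for some function ψ in S. In particular, $\xi^\beta \partial^\alpha \hat\phi(\xi)$ is bounded, hence $\hat\phi \in \mathcal{S}$. Therefore,
$\hat{\mathcal{S}} \subseteq \mathcal{S}$.
For the reverse inclusion, set $\phi_1(x) := \hat\phi(-x)$. By the first paragraph, $\phi_1 \in \mathcal{S}$, and by the
inversion theorem

\[ \hat\phi_1(x) = \int e^{-2\pi i\,\xi\cdot x}\,\hat\phi(-\xi)\,d\xi = \int e^{2\pi i\,\xi\cdot x}\,\hat\phi(\xi)\,d\xi = \phi(x). \]

Therefore, $\phi \in \hat{\mathcal{S}}$.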

The Plancherel Theorem


The following classical result is an easy consequence of 6.3.4.
6.3.5 Theorem (Plancherel). There exists a linear mapping T from $L^2(\mathbb{R}^d)$ onto $L^2(\mathbb{R}^d)$
such that $Tf = \hat f$ on $\mathcal{S}(\mathbb{R}^d)$ and $\|Tf\|_2 = \|f\|_2$ for all f ∈ $L^2(\mathbb{R}^d)$.
Proof. Let φ ∈ S. By the inversion theorem and Fubini’s theorem,
Z ZZ

kφk22 = φ(x)φ(x) dx = b
exp 2πix · ξ φ(ξ)φ(x) dξ dx
ZZ Z

= b dx dξ = φ(ξ)
exp − 2πix · ξ φ(x) φ(ξ) b φ(ξ)
b dξ

b 2.
= kφk2

Now define a linear mapping T : S → S = Sb by T φ = φ. b By what has just been proved,


kT φk2 = kφk2 . Given f ∈ L2 , choose a sequence (φn ) in S such that kf − φn k2 → 0. Then
kT φn − T φm k2 = kφn − φm k2 → 0 hence (T φn ) is a Cauchy sequence in L2 and therefore
converges to a unique L2 function T f . It follows that T is linear and kT f k2 = kf k2 .
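The isometry in the Plancherel theorem can be checked numerically on a single Schwartz function. The following is a minimal sketch, not part of the text: it assumes $d = 1$, takes the Gaussian $\varphi(x) = e^{-\pi x^2}$ as the test function, and replaces every integral by a Riemann sum on a truncated grid. The helper names `gaussian`, `fourier_transform`, and `l2_norm_squared`, as well as the grid sizes, are illustrative choices.

```python
import cmath
import math


def gaussian(x):
    """phi(x) = exp(-pi x^2), a Schwartz function on R."""
    return math.exp(-math.pi * x * x)


def fourier_transform(f, xi, half_width=6.0, step=0.01):
    """Riemann-sum approximation of the integral of exp(-2 pi i xi x) f(x) dx."""
    n = int(round(2 * half_width / step))
    total = 0j
    for k in range(n + 1):
        x = -half_width + k * step
        total += cmath.exp(-2j * math.pi * xi * x) * f(x)
    return total * step


def l2_norm_squared(values, step):
    """Riemann-sum approximation of the integral of |values|^2."""
    return sum(abs(v) ** 2 for v in values) * step


# ||phi||_2^2 on the grid x = -6, -5.99, ..., 6
x_step = 0.01
xs = [-6.0 + k * x_step for k in range(1201)]
norm_phi_sq = l2_norm_squared([gaussian(x) for x in xs], x_step)

# ||phi-hat||_2^2 on the coarser grid xi = -6, -5.95, ..., 6
xi_step = 0.05
xis = [-6.0 + k * xi_step for k in range(241)]
phi_hat = [fourier_transform(gaussian, xi) for xi in xis]
norm_phi_hat_sq = l2_norm_squared(phi_hat, xi_step)

print(norm_phi_sq, norm_phi_hat_sq)  # both should be close to 1/sqrt(2)
```

The agreement of the two numbers illustrates $\|\widehat{\varphi}\|_2 = \|\varphi\|_2$ for this one function only; the extension to all of $L^2$ is the density argument of the proof.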
Notation. For functions $f$ in $L^2(\mathbb{R}^d)$ one frequently writes $\widehat{f}$ for $Tf$ and $\check{f}$ for $T^{-1}f$, even
though $f$ may not lie in $L^1(\mathbb{R}^d)$. We then have $(\widehat{f}\,)^{\vee} = f = (\check{f}\,)^{\wedge}$ in $L^2$.
6.3.6 Corollary. For all $f, g \in L^2(\mathbb{R}^d)$, $\int f\,\widehat{g} = \int \widehat{f}\,g$ and $\int f\,\overline{g} = \int \widehat{f}\,\overline{\widehat{g}}$.
Proof. By step (3) of the proof of the inversion theorem, $\int f\,\widehat{g} = \int \widehat{f}\,g$ ($f, g \in S$). Since $S$
is dense in $L^2$, the first equality in the conclusion follows from the Plancherel theorem and
the $L^2$ continuity of the map $(f, g) \mapsto \int f g$ (Ex. 4.3). The second equality follows from the
first by replacing $g$ by $\overline{\widehat{g}}$ and using $\widehat{\overline{\widehat{g}}} = \overline{g}$ (Ex. 6.17).

Exercises
6.19 Prove that a C0 -function f on Rd is uniformly continuous. Conclude that a Schwartz function
is uniformly continuous.
6.20 [↑ 6.12, 3.43] Use the Plancherel theorem to show that
\[
\int_0^\infty \frac{\sin x}{x}\,dx = \int_0^\infty \Big(\frac{\sin x}{x}\Big)^{2} dx = \frac{\pi}{2}.
\]

6.21 Let $\varphi \in S$ and $\alpha$ a multi-index. Show that $\partial^\alpha \widehat{\varphi} = \widehat{\psi}$, where $\psi(x) := (-2\pi i x)^\alpha \varphi(x)$.

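6.22 Let $f, g \in L^2(\lambda^d)$. Show that $(f * g)(x) = \int \widehat{f}(\xi)\,\widehat{g}(\xi)\,e^{2\pi i\,\xi\cdot x}\,d\xi$. ⟦Fix $x$ and define $h(y) = \overline{g(x - y)}$.
Then $(f * g)(x) = \int f(y)\,g(x - y)\,dy = \int \widehat{f}(\xi)\,\overline{\widehat{h}(\xi)}\,d\xi$.⟧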
6.23 Let $f, g \in S$. Verify (a)–(e) below to conclude that $f * g \in S$, that is, $q_{\alpha,n}(f * g) < \infty$ for all $n$, $\alpha$.
(a) $\partial^\alpha (f * g) = (\partial^\alpha f) * g = f * (\partial^\alpha g)$. This reduces the argument to the case $\alpha = 0$.
(b) For each $n$, there exists a constant $C_n$ such that for all $x$,
\[
|(f * g)(x)| \le C_n \int (1 + |x - y|^2)^{-n} (1 + |y|^2)^{-n}\,dy.
\]
(c) Set $A := \{y : 2|y - x| \ge |x|\}$. Then
\[
\int_A (1 + |x - y|^2)^{-n} (1 + |y|^2)^{-n}\,dy \le \big(1 + \tfrac14 |x|^2\big)^{-n} \int_A (1 + |y|^2)^{-n}\,dy.
\]
(d) Set $B := \{y : 2|y - x| \le |x|\}$. Then
\[
\int_B (1 + |x - y|^2)^{-2n} (1 + |y|^2)^{-2n}\,dy \le \big(1 + \tfrac14 |x|^2\big)^{-n} \int_B (1 + |y|^2)^{-n}\,dy.
\]
(e) For sufficiently large $n$, there exists a constant $D_n$ depending on $d$ such that
$|(f * g)(x)| \le D_n (1 + |x|^2)^{-n}$ for all $x$.

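6.24 Let $\varphi_n, \varphi \in S$ and $\psi_n := \varphi_n - \varphi$. Define convergence $\varphi_n \to \varphi$ in $S$ to mean that $p_{\alpha,\beta}(\psi_n) \to 0$
for all multi-indices $\alpha$ and $\beta$, or equivalently, $q_{\alpha,m}(\psi_n) \to 0$ for all multi-indices $\alpha$ and all
$m \in \mathbb{Z}^+$. Let $\varphi_n \to \varphi$ in $S$. Show that $\widehat{\varphi_n} \to \widehat{\varphi}$ in $S$ by verifying the following:
(a) $\partial^\alpha \varphi_n \to \partial^\alpha \varphi$ in $L^p$ for all $1 \le p \le \infty$ and for all multi-indices $\alpha$.
(b) For any $\psi \in S$ and multi-indices $\alpha$, $\beta$,
\[
\begin{aligned}
(-2\pi i\,\xi)^\beta \partial^\alpha \widehat{\psi}(\xi) &= \int (-2\pi i\,x)^\alpha \psi(x)\,\partial_x^\beta e^{-2\pi i\,x\cdot\xi}\,dx \\
&= (-1)^{|\beta|} \int e^{-2\pi i\,x\cdot\xi}\,\partial_x^\beta\big[(-2\pi i\,x)^\alpha \psi(x)\big]\,dx.
\end{aligned}
\]
(c) For any $\psi \in S$ and multi-indices $\alpha$, $\beta$, there exists a constant $C$ and a finite set $F$ of
multi-indices such that
\[
\big\|\xi^\beta \partial^\alpha \widehat{\psi}(\xi)\big\|_\infty \le C \sum_{\alpha', \beta' \in F} \big\|x^{\beta'} \partial^{\alpha'} \psi(x)\big\|_1.
\]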

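6.25 (Heisenberg uncertainty principle). The principle states that a nonzero function and its Fourier
transform cannot both be sharply localized. The precise analytical statement takes the form
\[
\int |x|^2 |\varphi(x)|^2\,dx \cdot \int |\xi|^2 |\widehat{\varphi}(\xi)|^2\,d\xi \ge \frac{\|\varphi\|_2^4}{16\pi^2}.
\]
Establish this for $\varphi \in S(\mathbb{R})$ by verifying (a) and (b) and then using $\widehat{\varphi'}(\xi) = (2\pi i\,\xi)\,\widehat{\varphi}(\xi)$.
(a) $\int |\varphi(x)|^2\,dx = -2\,\mathrm{Re} \int x\,\varphi(x)\,\overline{\varphi'(x)}\,dx$.
(b) $\|\varphi\|_2^4 \le 4 \int x^2 |\varphi(x)|^2\,dx \int |\varphi'(x)|^2\,dx = 4 \int x^2 |\varphi(x)|^2\,dx \int |\widehat{\varphi'}(\xi)|^2\,d\xi$.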

6.4 Fourier Analysis of Measures on Rd


The notions of convolution and Fourier transform of functions have natural extensions
to measures. These have important applications in probability theory, for example, in the
proof of the central limit theorem (18.4.19).

Convolution of Measures
The convolution of complex measures $\mu$ and $\nu$ on $B(\mathbb{R}^d)$ is the complex measure
$\mu * \nu$ defined by
\[
(\mu * \nu)(E) = \int 1_E(x + y)\,d(\mu \otimes \nu)(x, y) = \iint 1_E(x + y)\,d\mu(x)\,d\nu(y), \quad E \in B(\mathbb{R}^d). \tag{6.5}
\]
Note that if $A : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is the addition operator $A(x, y) := x + y$, then $\mu * \nu = A(\mu \otimes \nu)$,
the image measure of $\mu \otimes \nu$ under $A$. Thus, by 3.2.15 and Fubini's theorem, for all suitable $h$
\[
\int h(z)\,d(\mu * \nu)(z) = \int h(x + y)\,d(\mu \otimes \nu)(x, y) = \iint h(x + y)\,d\mu(x)\,d\nu(y). \tag{6.6}
\]
A related notion is the convolution $f * \nu$ of $f \in L^1(\nu)$ and $\nu$ defined by
\[
(f * \nu)(x) = \int f(x - y)\,d\nu(y), \quad x \in \mathbb{R}^d.
\]

The following proposition gives the basic properties of measure convolution.


6.4.1 Proposition. Let µ, ν, and η be complex measures on B(Rd ), f , g ∈ L1 (ν), and
c ∈ C. Then
(a) µ ∗ ν = ν ∗ µ.
(b) (µ ∗ ν) ∗ η = µ ∗ (ν ∗ η).
(c) c(µ ∗ ν) = (cµ) ∗ ν = µ ∗ (cν).
(d) µ ∗ (ν + η) = µ ∗ ν + µ ∗ η.
(e) |µ ∗ ν| ≤ |µ| ∗ |ν|.
(f) ‖µ ∗ ν‖ ≤ ‖µ‖ ‖ν‖.
(g) c(f ∗ ν) = (cf ) ∗ ν = f ∗ (cν).
(h) (f + g) ∗ ν = f ∗ ν + g ∗ ν.
(i) f ∗ (µ + ν) = f ∗ µ + f ∗ ν.
(j) ‖f ∗ ν‖₁ ≤ ‖f‖₁ ‖ν‖.

Proof. Parts (a)–(d) and (g)–(i) are exercises (6.26). For (e) we use (6.6) and 5.18:
\[
\begin{aligned}
|\mu * \nu|(E) &= \sup_{|f| \le 1} \Big| \int_E f\,d(\mu * \nu) \Big| \le \sup_{|f| \le 1} \iint 1_E(x + y)\,|f(x + y)|\,d|\mu|(x)\,d|\nu|(y) \\
&\le \iint 1_E(x + y)\,d|\mu|(x)\,d|\nu|(y) = (|\mu| * |\nu|)(E).
\end{aligned}
\]
For (f) we have $|\mu * \nu|(\mathbb{R}^d) \le (|\mu| * |\nu|)(\mathbb{R}^d) = \iint 1_{\mathbb{R}^d}(x + y)\,d|\mu|\,d|\nu| = |\mu|(\mathbb{R}^d)\,|\nu|(\mathbb{R}^d)$.
For (j) note first that
\[
\iint |f(x - y)|\,d|\nu|(y)\,dx \le \iint |f(x)|\,dx\,d|\nu|(y) = \|f\|_1\,|\nu|(\mathbb{R}^d) < \infty,
\]
hence the function $\int f(x - y)\,d\nu(y)$ is defined for a.a. $x$ and is integrable. Moreover, this
calculation together with (5.4) shows that $\|f * \nu\|_1 \le \|f\|_1 \|\nu\|$.
From 6.4.1 and 5.2.5 we have
6.4.2 Corollary. The space M (Rd ) of complex measures on B(Rd ) is a commutative Banach
algebra under convolution and the total variation norm.
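For finitely supported measures, formula (6.5) reduces to a finite double sum, which makes some of the algebraic properties in 6.4.1 easy to experiment with. The sketch below is an illustration under stated assumptions, not the book's construction: measures on $\mathbb{R}$ are represented as Python dicts mapping atoms to complex masses, and the names `convolve`, `total_variation`, and `close` are hypothetical helpers introduced here.

```python
from collections import defaultdict


def convolve(mu, nu):
    """Convolution of finitely supported measures given as {atom: mass} dicts.

    Each pair of atoms x, y contributes mass mu{x} * nu{y} at the point x + y.
    """
    out = defaultdict(complex)
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] += a * b
    return dict(out)


def total_variation(mu):
    """Total variation norm of a finitely supported measure with distinct atoms."""
    return sum(abs(m) for m in mu.values())


def close(m1, m2, tol=1e-12):
    """Compare two measures atom by atom, treating missing atoms as mass 0."""
    return all(abs(m1.get(k, 0) - m2.get(k, 0)) <= tol for k in set(m1) | set(m2))


mu = {0: 1, 1: -1}        # delta_0 - delta_1
nu = {0: 1, 1: 1}         # delta_0 + delta_1
eta = {-1: 2j, 3: 0.5}    # a complex measure

commutes = close(convolve(mu, nu), convolve(nu, mu))
associates = close(convolve(convolve(mu, nu), eta), convolve(mu, convolve(nu, eta)))
norm_product = total_variation(mu) * total_variation(nu)
norm_convolution = total_variation(convolve(mu, nu))  # the mass at the atom 1 cancels
```

In this example the masses at the atom 1 cancel, so the norm of the convolution is strictly smaller than the product of the norms, consistent with the inequality in part (f).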

The Fourier-Stieltjes Transform


The Fourier-Stieltjes transform $\widehat{\mu}$ of a complex measure $\mu$ is defined by
\[
\widehat{\mu}(\xi) = \int e^{-2\pi i\,\xi\cdot x}\,d\mu(x), \quad \xi \in \mathbb{R}^d.
\]
Note that for the measure $d\mu = f\,d\lambda$ ($f \in L^1(\lambda^d)$) we have $\widehat{\mu} = \widehat{f}$, hence the Fourier-Stieltjes
transform may be seen as a generalization of the Fourier transform.
The Fourier-Stieltjes transform enjoys many of the properties of the Fourier transform, a
notable exception being the Riemann-Lebesgue lemma (consider $\delta_0$). The following proposition
summarizes the properties of the transform.
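6.4.3 Proposition. Let $\mu$ and $\nu$ be complex Borel measures on $\mathbb{R}^d$, $T : \mathbb{R}^d \to \mathbb{R}^d$ linear,
$a, b \in \mathbb{C}$, and $\alpha \in \mathbb{R}^d$. Then
(a) $\widehat{\mu}$ is continuous.
(b) $\|\widehat{\mu}\|_\infty \le \|\mu\|$.
(c) $\widehat{a\mu + b\nu} = a\,\widehat{\mu} + b\,\widehat{\nu}$.
(d) $\widehat{\mu * \nu} = \widehat{\mu}\,\widehat{\nu}$.
(e) $\widehat{T(\mu)} = \widehat{\mu} \circ T^*$.
(f) $\widehat{\mu}(\alpha + \xi) = \widehat{\nu}(\xi)$, where $d\nu(x) := e^{-2\pi i (\alpha\cdot x)}\,d\mu(x)$.

Proof. Part (a) follows from the dominated convergence theorem. Parts (b) and (c) are clear.
For (d) we have
\[
\widehat{\mu * \nu}(\xi) = \int e^{-2\pi i\,x\cdot\xi}\,d(\mu * \nu)(x) = \iint e^{-2\pi i\,(x + y)\cdot\xi}\,d\mu(x)\,d\nu(y) = \widehat{\mu}(\xi)\,\widehat{\nu}(\xi).
\]
For (e),
\[
\widehat{T\mu}(\xi) = \int e^{-2\pi i\,x\cdot\xi}\,dT(\mu)(x) = \int e^{-2\pi i\,T(x)\cdot\xi}\,d\mu(x) = \int e^{-2\pi i\,x\cdot T^*\xi}\,d\mu(x) = \widehat{\mu}(T^*\xi).
\]
Part (f) is left as an exercise.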
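Parts (b) and (d) of the proposition, and the failure of the Riemann-Lebesgue lemma for $\delta_0$, can be observed directly for finitely supported measures, where the defining integral is a finite sum. The following sketch assumes $d = 1$ and the dict representation `{atom: mass}`; the function names `fs_transform` and `convolve` and the sample frequencies are illustrative and not taken from the text.

```python
import cmath
import math


def fs_transform(mu, xi):
    """Fourier-Stieltjes transform at xi of a finitely supported measure on R."""
    return sum(m * cmath.exp(-2j * math.pi * xi * x) for x, m in mu.items())


def convolve(mu, nu):
    """Convolution of finitely supported measures given as {atom: mass} dicts."""
    out = {}
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] = out.get(x + y, 0) + a * b
    return out


mu = {0.0: 0.5, 1.5: -1.0}
nu = {-0.5: 2.0, 0.25: 1j}
delta0 = {0.0: 1.0}

frequencies = [-3.0, -0.7, 0.0, 0.4, 10.0, 1000.0]

# Property (d): the transform of a convolution is the product of the transforms.
product_rule_error = max(
    abs(fs_transform(convolve(mu, nu), xi) - fs_transform(mu, xi) * fs_transform(nu, xi))
    for xi in frequencies
)

# Property (b): |mu-hat(xi)| is bounded by the total variation norm of mu.
norm_mu = sum(abs(m) for m in mu.values())
sup_bound_holds = all(abs(fs_transform(mu, xi)) <= norm_mu + 1e-12 for xi in frequencies)

# The transform of delta_0 is identically 1, so it does not vanish at infinity.
delta0_values = [fs_transform(delta0, xi) for xi in frequencies]
```

A finite list of frequencies cannot establish any of these statements; it only shows the identities at work on concrete data.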


The next result has important applications to probability distributions (§18.1).
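6.4.4 Uniqueness Theorem for Fourier-Stieltjes Transforms. Let $\mu$ and $\nu$ be finite
measures on $B(\mathbb{R}^d)$ such that $\widehat{\mu} = \widehat{\nu}$. Then $\mu = \nu$.
Proof. Let $\varphi \in S$. By Fubini's theorem
\[
\begin{aligned}
\int \widehat{\varphi}(\xi)\,d\mu(\xi) &= \iint \exp(-2\pi i\,\xi\cdot x)\,\varphi(x)\,dx\,d\mu(\xi) = \int \widehat{\mu}(x)\,\varphi(x)\,dx = \int \widehat{\nu}(x)\,\varphi(x)\,dx \\
&= \int \widehat{\varphi}(\xi)\,d\nu(\xi).
\end{aligned}
\]
Since the Fourier transform $S \to S$ is surjective, $\int \varphi\,d\mu = \int \varphi\,d\nu$ for all $\varphi \in S$. Let
$\boldsymbol{\varepsilon} = (\varepsilon, \ldots, \varepsilon)$ and $a, b \in \mathbb{R}^d$ with $a_j < b_j$ for all $j$. Choose a $C^\infty$ function $\varphi_\varepsilon$ so that
$1_{[a,b]} \le \varphi_\varepsilon \le 1_{(a - \boldsymbol{\varepsilon},\, b + \boldsymbol{\varepsilon})}$ (0.14.5). By dominated convergence, $\lim_{\varepsilon \to 0} \int \varphi_\varepsilon\,d\mu = \mu[a, b]$ and
similarly for $\nu$. Therefore, $\mu[a, b] = \nu[a, b]$ for all $[a, b]$. By the uniqueness theorem for
measures (1.6.8), $\mu = \nu$.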

Exercises
6.26 Verify parts (a)–(d) and (g)–(i) of 6.4.1.

6.27 Let ν be a complex measure on Rd , 1 ≤ p ≤ ∞, and f ∈ Lp (Rd ). Show that (f ∗ ν)(x) exists
λ-a.e., f ∗ ν ∈ Lp (Rd ) and ‖f ∗ ν‖p ≤ ‖f‖p ‖ν‖. ⟦Let 1 ≤ p < ∞. Consider first the case f ≥ 0
and ν ≥ 0 and use Minkowski’s inequality for integrals to show that ‖f ∗ ν‖p ≤ ‖f‖p ‖ν‖. Apply
this to |f | and |ν| in the general case.⟧

6.28 Let f, g ∈ L1 and let ν be a finite measure. Prove:


(a) δa ∗ ν = Ta (ν) and δ0 ∗ ν = ν, where a ∈ Rd and Ta is translation by a.
(b) δa ∗ δb = δa+b .
(c) (f λd ) ∗ ν = (f ∗ ν)λd
(d) (f λd ) ∗ (g λd ) = (f ∗ g)λd .

6.29 Let T : Rd → Rd be linear, and µ, ν complex measures on B(Rd ). Show that
(T µ) ∗ (T ν) = T (µ ∗ ν).

6.30 Let µj , νj be finite measures on B(Rd ) with µj ≪ νj (j = 1, 2). Show that µ1 ∗ µ2 ≪ ν1 ∗ ν2 .

6.31 Let µ and ν be finite measures on B(Rp ) and B(Rq ). Express $\widehat{\mu \otimes \nu}$ in terms of $\widehat{\mu}$ and $\widehat{\nu}$.

6.32 Verify 6.4.3(f).

6.33 Let µ be a complex measure on B(Rd ) and f ∈ L1 (Rd ). Show that $\widehat{f * \mu} = \widehat{f} \cdot \widehat{\mu}$.

6.34 Let µ, ν ∈ M (Rd ) and µ ≪ λ. Show that µ ∗ ν ≪ λ and find d(µ ∗ ν)/dλ in terms of dµ/dλ.
Chapter 7
Measures on Locally Compact Spaces

In this chapter we describe a fundamental connection between topology and measure in the
setting of locally compact Hausdorff spaces. Many of the results will be seen as generalizations
of already established links between Borel measures on Rd and the Euclidean topology.

7.1 Radon Measures


Definition and Basic Properties
A Radon measure on a locally compact Hausdorff space X is a measure µ on B(X)
with the following properties:
(a) µ(K) < ∞ for all compact K ⊆ X.
(b) µ(U ) = sup{µ(K) : K ⊆ U, K compact} for all open U ⊆ X. (7.1)
(c) µ(E) = inf{µ(U ) : U ⊇ E, U open} for all E ∈ B(X).

Properties (b) and (c) assert, respectively, that µ is inner regular on open sets and
outer regular on Borel sets. If µ is a Radon measure on X, we shall call the pair (X, µ)
a Radon measure space.
If µ is a finite measure, then conditions (b) and (c) are equivalent to the assertion that
for each Borel set E and each ε > 0 there exist a compact set K and an open set U such
that K ⊆ E ⊆ U and µ(U \ K) < ε. It follows that if η is a measure with η ≤ µ, then η is a
Radon measure.
A Radon measure that satisfies (b) for every Borel set U is said to be regular. For
example, Lebesgue-Stieltjes measures on Rd are regular Radon measures (1.8.1). The following
proposition shows that a σ-finite Radon measure is regular.
7.1.1 Proposition. A Radon measure µ is inner regular on σ-finite sets E, that is,

µ(E) = sup{µ(K) : K ⊆ E, K compact}. (7.2)

In particular, a finite Radon measure is regular.


Proof. Let s denote the supremum. Clearly, s ≤ µ(E). For the reverse inequality, assume
first that µ(E) < ∞. Given ε > 0, choose an open set U ⊇ E such that µ(U ) < µ(E) + ε,
and a compact set K ⊆ U such that µ(K) > µ(U ) − ε. Since µ(U \ E) = µ(U ) − µ(E) < ε,
we may choose an open set V ⊇ U \ E with µ(V ) < ε. Then K \ V is compact, contained
in E, and s ≥ µ(K \ V ) = µ(K) − µ(K ∩ V ) > µ(U ) − µ(V ) − ε ≥ µ(E) − 2ε. Since ε was
arbitrary, s ≥ µ(E). Therefore, (7.2) holds for sets E of finite measure.
Now assume that µ(E) = ∞ and E is σ-finite. Let En ↑ E with µ(En ) < ∞ for every n.
Given k ∈ N, choose n so that µ(En ) > k and by the first paragraph choose a compact set
K ⊆ En such that s ≥ µ(K) > k. Since k is arbitrary, s = ∞.


Consequences of Regularity

The proof of the following result is the same as that for the special case X = Rd (4.2.2), since
the proof of the latter uses only the properties (7.1) of λd .

7.1.2 Theorem. Let (X, µ) be a Radon measure space and f ∈ Lp (µ) (1 ≤ p < ∞). Then
for each ε > 0 there exists g ∈ Cc (X) such that ‖f − g‖p < ε.
The following is an important application of the preceding theorem. The proof brings
together several familiar results on convergence of sequences of functions as well as Tietze’s
extension theorem.
7.1.3 Lusin’s Theorem. Let (X, µ) be a Radon measure space and f : X → C Borel
measurable such that µ{f ≠ 0} < ∞. Then for each ε > 0 there exists g ∈ Cc (X) such that
g = f except on a set of measure < ε. Moreover, if f is bounded, then g may be chosen so
that ‖g‖∞ ≤ ‖f‖∞ .

Proof. Set E := {f ≠ 0}. Suppose first that f is bounded. Then f ∈ L1 (µ), hence by 7.1.2
there exists a sequence of continuous functions $f_n$ with compact support such that $f_n \to f$ in $L^1$.
By 4.3.3, there exists a subsequence $(f_{n_k})$ that converges to f a.e. By Egoroff’s theorem
(2.4.5), there exists a set A ⊆ E with µ(E \ A) < ε/3 such that $f_{n_k} \to f$ uniformly on A. In
particular, f is continuous on A. By regularity, we may choose a compact set K and an open
set U such that K ⊆ A ⊆ E ⊆ U , µ(A \ K) < ε/3, and µ(U \ E) < ε/3, hence µ(U \ K) < ε.
By 0.12.8, there exists a continuous function F on X with compact support contained in U
such that F = f on K. Now define a continuous function φ : C → C by
\[
\varphi(z) =
\begin{cases}
z & \text{if } |z| \le \|f\|_\infty, \\
\|f\|_\infty \operatorname{sgn} z & \text{if } |z| > \|f\|_\infty
\end{cases}
\]
and set g := φ ◦ F . Then g = f on K, g = f = 0 on $U^c \subseteq E^c$, and {g ≠ f } ⊆ U \ K. The
function g satisfies the requirements.
In the unbounded case, set $E_n = \{0 < |f| < n\}$. Then $E_n \uparrow E$, hence we may choose n so
that $\mu(E \setminus E_n) < \varepsilon/2$. Since $f_n := f 1_{E_n}$ is bounded, by the first paragraph there exists a
continuous function $g_n$ with compact support such that $\mu\{g_n \ne f_n\} < \varepsilon/2$. Since $\{f_n \ne f\}$
is contained in $E \setminus E_n$,
\[
\{g_n \ne f\} \subseteq \{g_n \ne f_n\} \cup \big(\{g_n = f_n\} \cap \{f_n \ne f\}\big) \subseteq \{g_n \ne f_n\} \cup (E \setminus E_n),
\]
hence $\mu\{g_n \ne f\} < \varepsilon$.

The Space of Complex Radon Measures


A signed Radon measure on a locally compact Hausdorff space X is a signed Borel
measure µ whose positive and negative variations µ± are Radon measures. A complex
Radon measure on X is a complex Borel measure µ whose real and imaginary parts µr,i
are signed Radon measures. The collection of all complex Radon measures on X will be
denoted by Mra (X). We show in this subsection that Mra (X) is a Banach space under the
total variation norm. For this we prepare the following lemmas.
7.1.4 Lemma. Mra (X) is a linear space over C.
Proof. Let µj ∈ Mra (X), cj ∈ C, j = 1, 2. The proof that c1 µ1 + c2 µ2 ∈ Mra (X) is divided
into the following steps:

(1) If $\mu_j \ge 0$ and $c_j \ge 0$, then $c_1\mu_1 + c_2\mu_2 \in M_{ra}(X)$.
⟦Let $E \in B(X)$, $\varepsilon > 0$, $K_j$ compact, and $U_j$ open such that $K_j \subseteq E \subseteq U_j$ and
$\mu_j(U_j \setminus K_j) < \varepsilon (c_1 + c_2)^{-1}$. Set $K = K_1 \cup K_2$ and $U = U_1 \cap U_2$. Then $K \subseteq E \subseteq U$ and
$(c_1\mu_1 + c_2\mu_2)(U \setminus K) < \varepsilon$.⟧
(2) If $\mu_j \ge 0$, then $\mu_1 - \mu_2 \in M_{ra}(X)$.
⟦Set $\eta := \mu_1 - \mu_2$. By Ex. 5.7, $\mu_1 \ge \eta^+$ and $\mu_2 \ge \eta^-$. It follows that the measures $\eta^\pm$
are regular and so, by definition, $\eta$ is regular.⟧
(3) If $\mu_j$ are signed measures and $c_j \in \mathbb{R}$, then $c_1\mu_1 + c_2\mu_2 \in M_{ra}(X)$.
⟦Write $\mu_j = \mu_j^+ - \mu_j^-$ and $c_j = c_j^+ - c_j^-$ and use (1) and (2).⟧
(4) If $\mu_j$ are complex measures and $c_j \in \mathbb{C}$, then $c_1\mu_1 + c_2\mu_2 \in M_{ra}(X)$.
⟦Write $\mu_j = \mu_{jr} + i\,\mu_{ji}$ and $c_j = \mathrm{Re}\,c_j + i\,\mathrm{Im}\,c_j$. Then $c_1\mu_1 + c_2\mu_2$ is of the form $a\nu + ib\eta$,
where $a, b \in \mathbb{R}$ and $\nu$, $\eta$ are signed measures. By (3), $a\nu, b\eta \in M_{ra}(X)$.⟧
7.1.5 Lemma. $\mu \in M_{ra}(X)$ iff $|\mu| \in M_{ra}(X)$.
Proof. We may write $\mu = \mu_r^+ - \mu_r^- + i(\mu_i^+ - \mu_i^-)$, which, for notational convenience, we
express as $\mu = \mu_1 - \mu_2 + i(\mu_3 - \mu_4)$. If $\mu \in M_{ra}(X)$, then by definition $\mu_j \in M_{ra}(X)$,
hence given $E \in B(X)$ and $\varepsilon > 0$ there exist compact sets $K_j$ and open sets $U_j$ such that
$K_j \subseteq E \subseteq U_j$ and $\mu_j(U_j \setminus K_j) < \varepsilon$. Set $U = \bigcap_{j=1}^4 U_j$ and $K = \bigcup_{j=1}^4 K_j$ and let $E_1, \ldots, E_n$
be a measurable partition of $U \setminus K$. Then
\[
\sum_{j=1}^n |\mu(E_j)| \le \sum_{j=1}^n \sum_{i=1}^4 \mu_i(E_j) = \sum_{i=1}^4 \sum_{j=1}^n \mu_i(E_j) = \sum_{i=1}^4 \mu_i(U \setminus K) < 4\varepsilon.
\]
Taking the supremum over all such partitions yields $|\mu|(U \setminus K) \le 4\varepsilon$. Therefore, $|\mu| \in M_{ra}(X)$.
Conversely, let $|\mu| \in M_{ra}(X)$. The inequality $\mu_j(E) \le |\mu|(E)$ implies that $\mu_j \in M_{ra}(X)$.
By definition, $\mu \in M_{ra}(X)$.
7.1.6 Theorem. Mra (X) is a Banach space under the total variation norm.
Proof. By 7.1.4, Mra (X) is a linear subspace of M (X), the space of all complex Borel
measures on X. Since the latter is complete (5.2.5), it suffices to show that Mra (X) is closed
in M (X). Let µn ∈ Mra (X) and µ ∈ M (X) such that ‖µn − µ‖ = |µn − µ|(X) → 0. Let
ε > 0 and choose n so that |µn − µ|(X) < ε. Given E ∈ B(X), choose a compact set K and
an open set U such that K ⊆ E ⊆ U and |µn |(U \ K) < ε (7.1.5). Then
|µ|(U \ K) ≤ |µn − µ|(U \ K) + |µn |(U \ K) < 2ε,
hence µ ∈ Mra (X).

The Support of a Radon Measure


Let (X, µ) be a Radon measure space and set $U = \bigcup \{V : V \text{ is open in } X \text{ and } \mu(V) = 0\}$.
Then U is the largest open set such that µ(U ) = 0. To see this, let K ⊆ U be compact and
choose a finite cover V1 , . . . , Vn of K such that µ(Vj ) = 0. Then µ(K) = 0, and since µ is
inner regular, µ(U ) = 0.
The complement of U is called the support of µ and is denoted by supp µ:
\[
\operatorname{supp} \mu = \Big( \bigcup \{V : V \subseteq X \text{ open},\ \mu(V) = 0\} \Big)^{c} = \bigcap \{C : C \subseteq X \text{ closed},\ \mu(C^c) = 0\}.
\]

Thus supp(µ) is the smallest closed set on which the measure µ is concentrated. Exercise 7.2
gives various properties of the support.

Exercises
7.1 Let µ be a regular Radon measure on a locally compact, Hausdorff space X and let Y ⊆ X be
closed, hence locally compact (0.12.1). Show that the restriction ν of µ to B(Y ) = B(X) ∩ Y is
a Radon measure on Y .

7.2 Let (X, µ) be a Radon measure space.


(a) Show that if V is open in X and V ∩ supp µ ≠ ∅, then µ(V ) > 0.
(b) Show that $\operatorname{supp}(\mu) = \bigcap \big\{ f^{-1}(\{0\}) : f \in C_0(X),\ f \ge 0,\ \int f\,d\mu = 0 \big\}$.
(c) Let f be nonnegative and continuous on Rd . Show that supp(f λ) = supp(f ).
(d) Let µ1 and µ2 be Radon measures on X. Prove that

supp(µ1 ∨ µ2 ) = supp(µ1 ) ∪ supp(µ2 ) and supp(µ1 ∧ µ2 ) ⊆ supp(µ1 ) ∩ supp(µ2 )

and that the inclusion may be proper.

(e) Let x ∈ X, c > 0. Show that supp(µ + cδx ) = supp(µ) ∪ {x}. Conclude that the support of
$\sum_{j=1}^n c_j \delta_{x_j}$ ($c_j > 0$) is $\{x_1, \ldots, x_n\}$.

7.3 Let µ be a Radon measure on X and f ∈ L1 (X). Given ε > 0, show that there exists a compact
set K such that $\int_{X \setminus K} |f|\,d\mu < \varepsilon$.

7.4 The Baire σ-field Ba (X) is the smallest σ-field relative to which each member of Cc (X) is
measurable. Show that Ba (X) is generated by the compact Gδ sets.

7.5 [↑ 1.85] (Intermediate value property of measures). Let µ be a regular Borel measure on X with
the property that µ{x} = 0 for all x ∈ X. Let E be a Borel set and 0 < c < µ(E). Verify the
following assertions to show that there exists a compact subset C of E such that µ(C) = c.
(a) Let $\mathcal{A} := \{C : C \text{ is compact},\ C \subseteq E,\ \text{and } \mu(C) \ge c\}$. Then $\mathcal{A}$ is nonempty.
(b) Order $\mathcal{A}$ by reverse inclusion. If $\mathcal{C}$ is a chain in $\mathcal{A}$ and $B = \bigcap \mathcal{C}$, then $\mu(B) = \inf_{C \in \mathcal{C}} \mu(C)$.
Conclude that $\mathcal{A}$ has a minimal element C. ⟦Argue by contradiction, using outer regularity on
B and the finite intersection property for compact sets.⟧
(c) If µ(C) > c and x ∈ C, then there exists an open set U ∋ x such that µ(U ) < µ(C) − c.
Hence there exists a proper closed subset C1 of C such that µ(C1 ) > c.

7.6 Let µ be a σ-finite Radon measure on X and ν a complex measure such that ν ≪ µ. Show that
ν is a Radon measure.

7.7 Let X and Y be locally compact Hausdorff spaces and T : X → Y continuous. Let µ be a
regular Borel measure on X. Prove:
(a) T µ is inner regular on Borel subsets of Y . (b) If X and Y are compact, then T µ is regular.

7.2 The Riesz Representation Theorem


Let X be a locally compact Hausdorff space. A positive linear functional on Cc (X)
is a linear mapping I : Cc (X) → R such that f ≥ 0 ⇒ I(f ) ≥ 0. For example, if µ is a Borel
measure on X which is finite on compact sets, then $I(f) = \int f\,d\mu$ defines a positive linear
functional on Cc (X). Positive linear functionals of the latter type are said to be represented
by an integral. The goal of this section is to prove that every positive linear functional on
Cc (X) may be so represented. As we shall see in §7.5 below as well as in Chapter 8, this
result is one of several regarding concrete representations of linear functionals.

7.2.1 Riesz Representation Theorem. Let I be a positive linear functional on Cc (X).


Then there exists a unique Radon measure µ on X such that
Z
I(f ) = f dµ for all f ∈ Cc (X). (7.3)

Proof. The basic idea of the proof is to construct an outer measure from I and then use
Carathéodory’s theorem to obtain µ. This is accomplished in the following steps, the first of
which establishes uniqueness, for which regularity is crucial. The remaining steps establish
existence.

(1) Let µ be a Radon measure satisfying (7.3). For an open set U ⊆ X,

µ(U ) = sup{I(f ) : f ∈ CU }, where CU := {f ∈ Cc (X) : 0 ≤ f ≤ 1U }.

In particular, the Radon measure satisfying (7.3) is unique.


⟦Denote the supremum by s. If $f \in C_U$, then $I(f) = \int f\,d\mu \le \int 1_U\,d\mu = \mu(U)$, hence $s \le \mu(U)$.
For the reverse inequality, let K be a compact set contained in U and by Urysohn’s
lemma choose $f \in C_U$ such that $1_K \le f \le 1_U$. Then $\mu(K) \le I(f) \le s$. By inner
regularity, $\mu(U) \le s$. Two Radon measures satisfying (7.3) therefore agree on all open
sets and hence, by outer regularity, are equal.⟧
(2) For an open U ⊆ X, define µ(U ) by the equation in (1). Then µ is monotone, µ(∅) = 0,
and for any sequence of open sets Un with union U
X
µ(U ) ≤ µ(Un ).
n

⟦The first two assertions are clear. For the inequality, let $f \in C_U$ and set $K := \operatorname{supp}(f)$.
Since $K \subseteq U$, there exist a finite subcover $\{U_1, \ldots, U_n\}$ of $K$ and nonnegative $f_j \in C_c(X)$
such that $\operatorname{supp}(f_j) \subseteq U_j$ and $\sum_{j=1}^n f_j = 1$ on $K$ (0.14.1). Then $\sum_{j=1}^n f \cdot f_j = f$, and
since $f \cdot f_j \in C_{U_j}$,
\[
I(f) = \sum_{j=1}^n I(f \cdot f_j) \le \sum_{j=1}^n \mu(U_j) \le \sum_{j=1}^\infty \mu(U_j).
\]
Taking the supremum over all $f \in C_U$ and applying (1) yields the desired inequality.⟧
(3) For an arbitrary E ⊆ X, define µ∗ (E) by

µ∗ (E) = inf{µ(U ) : U ⊇ E, U open}.

Then µ∗ is an outer measure on X that agrees with µ on open sets.


⟦It is obvious that $\mu^*(U) = \mu(U)$ for open sets $U$. In particular, $\mu^*(\emptyset) = \mu(\emptyset) = 0$. It
follows from (2) that
\[
\mu^*(E) = \inf \Big\{ \sum_n \mu(U_n) : E \subseteq \bigcup_n U_n,\ U_n \text{ open} \Big\}.
\]
Thus by 1.5.1, $\mu^*$ is an outer measure.⟧


(4) B(X) ⊆ M(µ∗ ), hence µ∗ restricted to B(X) is a measure that extends µ. Denoting this
extension by µ, we have by step (3)

µ(E) = inf{µ(U ) : U ⊇ E, U open} for every E ∈ B(X).



⟦For the inclusion, it suffices to show that $M(\mu^*)$ contains all open sets $U$, that is,
\[
\mu^*(E) \ge \mu^*(E \cap U) + \mu^*(E \cap U^c) \quad \text{for all } E \subseteq X. \tag{†}
\]
For this we may assume that $\mu^*(E) < \infty$. Suppose first that $E$ is open. Then $V :=
E \cap U$ is open, so given $\varepsilon > 0$ there exists $f \in C_V$ such that $I(f) > \mu(V) - \varepsilon$. Also,
$W := E \setminus \operatorname{supp}(f)$ is open, so there exists $g \in C_W$ such that $I(g) > \mu(W) - \varepsilon$. Since
$f = 0$ on $V^c = E^c \cup U^c$ and $g = 0$ on $W^c = E^c \cup \operatorname{supp}(f)$, $f + g \in C_E$. Therefore,
\[
\mu(E) \ge I(f + g) = I(f) + I(g) > \mu(V) + \mu(W) - 2\varepsilon.
\]
Since $\operatorname{supp}(f) \subseteq U$, $W \supseteq E \setminus U$. We now have
\[
\mu(E) \ge \mu(E \cap U) + \mu^*(E \setminus U) - 2\varepsilon \quad \text{for all } \varepsilon > 0.
\]
This shows that (†) holds for open sets $E$.
Now let $E \subseteq X$ be arbitrary and let $A$ be open with $E \subseteq A$ and $\mu(A) < \mu^*(E) + \varepsilon$.
Then by the first part
\[
\mu^*(E) + \varepsilon > \mu(A) \ge \mu^*(A \cap U) + \mu^*(A \cap U^c) \ge \mu^*(E \cap U) + \mu^*(E \cap U^c).
\]
Therefore, (†) holds.⟧


(5) If K is compact, then µ(K) = inf{I(f ) : f ∈ Cc (X), f ≥ 1K }.
⟦Let $\alpha$ denote the infimum. For an open set $U \supseteq K$ choose $f \in C_c(X)$ such that
$1_K \le f \le 1_U$. Then $\alpha \le I(f) \le \mu(U)$, and taking the infimum over all such $U$ and using (4)
yields $\alpha \le \mu(K)$.
For the reverse inequality, let $f \in C_c(X)$ with $f \ge 1_K$, and for fixed $0 < r < 1$ set
$U := \{f > r\}$. If $g \in C_U$, then $g = 0$ on $\{f \le r\}$ and $g \le 1 < r^{-1} f$ on $\{f > r\}$, so
in each case $rg \le f$. Therefore $r I(g) \le I(f)$. Taking the supremum over all such $g$ we
have $r\mu(U) \le I(f)$. Therefore, $r\mu(K) \le I(f)$, and letting $r \to 1$ we obtain $\mu(K) \le I(f)$.
Taking the infimum over all such $f$ yields $\mu(K) \le \alpha$.⟧
(6) For every open set U ⊆ X, µ(U ) = sup{µ(K) : K compact and K ⊆ U }.
⟦Let $s$ denote the supremum. Clearly, $s \le \mu(U)$. For the reverse inequality, let $r < \mu(U)$
and choose $g \in C_U$ such that $I(g) > r$. Let $K = \operatorname{supp}(g)$ and $f \in C_c(X)$ with $f \ge 1_K$.
Then $f \ge g$, so $I(f) \ge I(g)$. Taking the infimum over all such $f$ we see from (5) that
$\mu(K) \ge I(g)$. Therefore, $s \ge \mu(K) > r$, and since $r$ was arbitrary, $s \ge \mu(U)$.⟧
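(7) If $f \in C_c(X)$ and $0 \le f \le 1$, then for each $n$ there exist $f_j \in C_c(X)$ and compact sets
$K_0 \supseteq K_1 \supseteq \cdots \supseteq K_n$ such that
\[
f = \sum_{j=1}^n f_j \quad \text{and} \quad n^{-1} 1_{K_j} \le f_j \le n^{-1} 1_{K_{j-1}}.
\]
⟦Set $\phi_0 = 0$, $K_0 = \operatorname{supp}(f)$, and for $j = 1, \ldots, n$ define
\[
\phi_j(x) := \min\{f(x), j/n\}, \quad f_j := \phi_j - \phi_{j-1}, \quad \text{and} \quad K_j := \{f \ge j/n\}.
\]
The sets $K_j$ are compact and decreasing, $\phi_j = f$ on $K_j^c$, $\phi_j = j/n$ on $K_j$, $\phi_j = 0$ on
$K_0^c$, and $\phi_n = f$ on $K_n$. The assertions follow.⟧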

(8) Let $f$, $f_j$, and $K_j$ be as in (7). Then for $j = 1, \ldots, n$,
\[
\frac{1}{n}\,\mu(K_j) \le \int f_j\,d\mu \le \frac{1}{n}\,\mu(K_{j-1}) \quad \text{and} \quad \frac{1}{n}\,\mu(K_j) \le I(f_j) \le \frac{1}{n}\,\mu(K_{j-1}).
\]
⟦The first set of inequalities is an immediate consequence of (7). For the second set,
observe that for any open set $U$ containing $K_{j-1}$, $n f_j \in C_U$, hence $I(n f_j) \le \mu(U)$.
Taking the infimum over $U$ and applying (4) gives the upper bound; the lower bound follows
from (5), since $n f_j \ge 1_{K_j}$.⟧
(9) $I(f) = \int f\,d\mu$ for all $f \in C_c(X)$.
⟦Let $f \in C_c(X)$. By considering positive and negative parts, we may assume that
$f \ge 0$. Furthermore, dividing by $\|f\|_\infty$, we may also assume that $f \le 1$. Summing the
inequalities in (8) and using $\sum_j f_j = f$, we obtain
\[
\frac{1}{n} \sum_{j=1}^{n} \mu(K_j) \le \int f\,d\mu \le \frac{1}{n} \sum_{j=0}^{n-1} \mu(K_j) \quad \text{and} \quad \frac{1}{n} \sum_{j=1}^{n} \mu(K_j) \le I(f) \le \frac{1}{n} \sum_{j=0}^{n-1} \mu(K_j),
\]
from which it follows that
\[
\Big| \int f\,d\mu - I(f) \Big| \le \frac{1}{n} \big( \mu(K_0) - \mu(K_n) \big) \le \frac{1}{n}\,\mu(K_0).
\]
Letting $n \to \infty$ yields the desired conclusion.⟧
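The layer decomposition of steps (7)–(9) can be tried out on a discrete model. The sketch below makes several simplifying assumptions that are not in the text: $X$ is replaced by a finite grid in $[-2, 2]$, $\mu$ by the measure giving each grid point mass $1/10$, $f$ by a tent function, and $K_0 = \operatorname{supp}(f)$ by the grid set $\{f > 0\}$. Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested without rounding; the names `phi`, `f_layer`, `in_K`, and `mu` are illustrative.

```python
from fractions import Fraction

n = 4
grid = [Fraction(k, 10) for k in range(-20, 21)]   # sample points in [-2, 2]
weight = Fraction(1, 10)                           # mass of each grid point


def f(x):
    """A tent function with values in [0, 1], supported on [-1, 1]."""
    return max(Fraction(0), 1 - abs(x))


def phi(j, x):
    """phi_j = min(f, j/n), as in step (7)."""
    return min(f(x), Fraction(j, n))


def f_layer(j, x):
    """f_j = phi_j - phi_{j-1}."""
    return phi(j, x) - phi(j - 1, x)


def in_K(j, x):
    """K_j = {f >= j/n} for j >= 1; K_0 is approximated on the grid by {f > 0}."""
    return f(x) > 0 if j == 0 else f(x) >= Fraction(j, n)


def mu(indicator):
    """Measure of a set under the discrete measure placing `weight` on each grid point."""
    return sum(weight for x in grid if indicator(x))


# Step (7): the layers add up to f and are sandwiched between scaled indicators.
layers_sum_to_f = all(sum(f_layer(j, x) for j in range(1, n + 1)) == f(x) for x in grid)
layers_are_sandwiched = all(
    Fraction(int(in_K(j, x)), n) <= f_layer(j, x) <= Fraction(int(in_K(j - 1, x)), n)
    for j in range(1, n + 1)
    for x in grid
)

# Step (9): the integral of f lies between the lower and upper layer sums.
integral_f = sum(f(x) * weight for x in grid)
lower = sum(mu(lambda x, j=j: in_K(j, x)) for j in range(1, n + 1)) / n
upper = sum(mu(lambda x, j=j: in_K(j, x)) for j in range(0, n)) / n
```

The gap `upper - lower` equals $(\mu(K_0) - \mu(K_n))/n$, which is the quantity that tends to zero in step (9) as $n \to \infty$.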

The following result is immediate from step (1) of the preceding proof.
7.2.2 Corollary. Let (X, µ) be a Radon measure space. Then for each open subset U of X,
\[
\mu(U) = \sup \Big\{ \int f\,d\mu : 0 \le f \le 1,\ \operatorname{supp}(f) \subseteq U \Big\}.
\]

Exercises
7.8 [↑ 7.1] Let µ be a regular Radon measure on X and Y a closed subset of X. For each $f \in C_c(Y)$,
define a measurable function $\tilde{f}$ on X by $\tilde{f} = f$ on Y and $\tilde{f} = 0$ on $X \setminus Y$. Then $I(f) = \int_X \tilde{f}\,d\mu$
defines a positive linear functional on $C_c(Y)$. Describe the corresponding Radon measure in
terms of µ and justify your assertion.

7.9 Let I be a positive linear functional on C0 (X) with corresponding Radon measure µ and let
U ⊆ X be open. Then U is a locally compact Hausdorff space. For $g \in C_0(U)$ define $\tilde{g}$ by
$\tilde{g} = g$ on U and $\tilde{g}(X \setminus U) = 0$.
(a) Show that $\tilde{g} \in C_0(X)$.
(b) Show that $J(g) = I(\tilde{g})$ defines a positive linear functional on $C_0(U)$.
(c) What is the connection between µ and the Radon measure corresponding to J?

7.10 [↑ 7.7] Let X and Y be compact Hausdorff spaces and T : X → Y continuous. Given a positive
linear functional I on Cc (X) define a positive linear functional J on Cc (Y ) by J(f ) = I(f ◦ T ).
Find a connection between the associated Radon measures and justify your assertions.

7.3 Products of Radon Measures


Finitely Many Measures
Let µ and ν be Borel measures on locally compact Hausdorff spaces X and Y, respectively.
The product measure µ ⊗ ν is then defined on B(X) ⊗ B(Y ). Denoting the collection of
open sets on a topological space Z by OZ , we have

B(X) ⊗ B(Y ) = σ(OX ) ⊗ σ(OY ) = σ(OX × OY ) ⊆ σ(OX×Y ) = B(X × Y ), (7.4)

the second equality by 1.2.4. If X and Y are second countable with countable bases (Un )
and (Vn ), respectively, then every open set in X × Y is a countable union of sets of the
form Un × Vm , hence the inclusion in (7.4) is equality and so B(X) ⊗ B(Y ) = B(X × Y ). In
general, however, the inclusion may be strict (see, for example, [20]), in which case µ ⊗ ν is
not a Borel measure on X × Y . In spite of this shortcoming, if µ and ν are Radon measures it
is possible to extend µ ⊗ ν to a Borel measure on X × Y . For this we need a preliminary result
which is of some independent interest. The development is facilitated by the introduction of
some standard notation.
Given functions g on X and h on Y , define the tensor product g ⊗ h of g and h on
X × Y by
(g ⊗ h)(x, y) = g(x)h(y), x ∈ X, y ∈ Y.
If G and H are linear spaces of functions on X and Y , respectively, the tensor product
G ⊗ H of G and H is the linear span of the set of all functions g ⊗ h, g ∈ G and h ∈ H.
7.3.1 Proposition. Cc (X) ⊗ Cc (Y ) is dense in Cc (X × Y ) in the uniform norm.
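Proof. Let $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ denote the projection mappings. For
$f \in C_c(X \times Y)$, the sets $K_X := \pi_X(\operatorname{supp}(f))$ and $K_Y := \pi_Y(\operatorname{supp}(f))$ are compact and
$\operatorname{supp}(f) \subseteq K_X \times K_Y$. Choose open sets $U_X \subseteq X$ and $U_Y \subseteq Y$ with compact closure
such that $K_X \subseteq U_X$ and $K_Y \subseteq U_Y$ and set $K := \operatorname{cl} U_X \times \operatorname{cl} U_Y$. By the Stone-Weierstrass
theorem, $C(\operatorname{cl} U_X) \otimes C(\operatorname{cl} U_Y)$ is dense in $C(K)$, hence, given $\varepsilon > 0$, there exists a function
\[
F := \sum_{i=1}^n g_i \otimes h_i \in C(\operatorname{cl} U_X) \otimes C(\operatorname{cl} U_Y)
\]
such that $|F - f| < \varepsilon$ on $K$ and so $|F| < \varepsilon$ on $K \setminus (K_X \times K_Y)$. By the Tietze extension
theorem, $g_i$ and $h_i$ extend to members of $C_b(X)$ and $C_b(Y)$, respectively. By Urysohn’s
lemma, there exist $g \in C_c(X)$ and $h \in C_c(Y)$ with $1_{K_X} \le g \le 1_{U_X}$ and $1_{K_Y} \le h \le 1_{U_Y}$.
Define $G \in C_c(X \times Y)$ by
\[
G = (g \otimes h) \sum_{i=1}^n g_i \otimes h_i = \sum_{i=1}^n (g g_i) \otimes (h h_i).
\]
Then $G = F$ on $K_X \times K_Y$, $G = f = 0$ on $(U_X \times U_Y)^c$, and $|G - f| = |G| \le |F| < \varepsilon$ on
$(U_X \times U_Y) \setminus (K_X \times K_Y)$. Therefore, $\|G - f\|_\infty < \varepsilon$.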
It follows from 7.3.1 that members of Cc (X × Y ) are B(X) ⊗ B(Y )-measurable. Indeed,
one need only show this for functions g ⊗ h ∈ Cc (X) ⊗ Cc (Y ), and such a function is
B(X) ⊗ B(Y )-measurable since it is the product of continuous functions g ◦ πX and h ◦ πY .
With this observation we may now prove the following version of Fubini’s theorem:

7.3.2 Theorem. Let (X, µ) and (Y, ν) be Radon measure spaces. Then Cc (X × Y ) ⊆
L1 (µ ⊗ ν), and for all f ∈ Cc (X × Y )
\[
\int f(x, y)\,d(\mu \otimes \nu)(x, y) = \iint f(x, y)\,d\mu(x)\,d\nu(y) = \iint f(x, y)\,d\nu(y)\,d\mu(x). \tag{7.5}
\]

Proof. In the notation of the proof of 7.3.1, f = 0 off KX × KY , which has finite measure.
Therefore, the inclusion holds and (7.5) is a consequence of Fubini’s theorem applied to
KX × KY .
Now define a positive linear functional I(f ) on Cc (X × Y ) by the common value in
(7.5). The corresponding measure from the Riesz representation theorem is then defined on
B(X × Y ) and is an extension of µ ⊗ ν. We denote this measure by µ⊗ν. In summary:
7.3.3 Corollary. There exists a unique Radon measure µ⊗ν on B(X × Y ) whose restriction
to B(X) ⊗ B(Y ) is µ ⊗ ν.

Infinitely Many Measures


The preceding results extend in the obvious way to finitely many Radon measure spaces.
More interestingly, the results extend to the case of arbitrarily many spaces. However, for
this we require that the spaces be compact and the measures be probability measures. Here
are the details:
Let $I$ be an arbitrary index set and for each $i \in I$ let $X_i$ be a compact Hausdorff space.
The product topological space $X := \prod_i X_i$ is compact by Tychonoff’s theorem. In what
follows we consider finite sequences
\[
s := (i_1, \ldots, i_n), \quad i_j \in I, \quad i_j \ne i_k \ (j \ne k),
\]
which we shall call index sequences. For such a sequence, define
\[
X_s := X_{i_1} \times \cdots \times X_{i_n} \quad \text{and} \quad \pi_s : X \to X_s, \quad \pi_s(x) := \big( x(i_1), \ldots, x(i_n) \big).
\]
Denote by $C_s(X)$ the subalgebra of $C(X)$ consisting of all functions of the form $f \circ \pi_s$,
$f \in C(X_s)$, and set $F := \bigcup_s C_s(X)$, the collection of the functions in $C(X)$ that depend on
only finitely many coordinates. Note that $s_1 \le s_2 \Rightarrow C_{s_1}(X) \subseteq C_{s_2}(X)$, where the notation
means that the terms of the sequence $s_1$ are contained in the sequence $s_2$. It follows that
$F$ is an algebra. Since $F$ contains the constant functions and separates points of $X$, $F$ is
dense in $C(X)$ by the Stone-Weierstrass theorem. With these preliminaries out of the way,
we may prove
7.3.4 Theorem. For each $i \in I$ let $X_i$ be a compact Hausdorff space and $\mu_i$ a Radon
probability measure on $X_i$. Set $X := \prod_i X_i$. Then there exists a unique Radon measure $\mu$
on $B(X)$ such that for all index sequences $s = (i_1, \ldots, i_n)$ the image measure $\pi_s(\mu)$ equals
$\mu_{i_1} \otimes \cdots \otimes \mu_{i_n}$. Equivalently,
\[
\int_X f \circ \pi_s\,d\mu = \int_{X_s} f\,d(\mu_{i_1} \otimes \cdots \otimes \mu_{i_n}) \tag{7.6}
\]
for all bounded Borel functions $f$ on $X_s$.


Proof. For $g = f \circ \pi_s \in C_s(X)$, $f \in C(X_s)$, let $I_s(g)$ denote the right side of (7.6). Clearly,
$I_s$ is a well-defined, positive linear functional on $C_s(X)$ and
\[
|I_s(g)| = |I_s(f \circ \pi_s)| \le \|f\|_\infty = \|g\|_\infty.
\]



Moreover, because the $\mu_i$ are probability measures, $s \le s' \Rightarrow I_s(g) = I_{s'}(g)$. Define a positive
linear functional $I$ on $F$ by
\[
I(g) = I_s(g), \quad g \in C_s(X).
\]
If also $g \in C_{s'}(X)$, then $I_s(g) = I_{s \cup s'}(g) = I_{s'}(g)$, hence $I$ is well-defined. Since $F$ is dense
in $C(X)$ and $|I(g)| \le \|g\|_\infty$, $I$ has an extension to a positive linear functional on $C(X)$.
Indeed, if $(g_n)$ is a sequence in $F$ and $g_n \to g \in C(X)$, then $\big(I(g_n)\big)$ is a Cauchy sequence
in $\mathbb{C}$ hence converges to some $I(g) \in \mathbb{C}$, independent of the sequence $(g_n)$, giving the desired
extension. By the Riesz representation theorem, there exists a unique Radon probability
measure $\mu$ on $X$ such that $I(g) = \int_X g\,d\mu$, $g \in C(X)$, which implies (7.6) for continuous $f$.
It remains to show that πs (µ) = µi1 ⊗ · · · ⊗ µin . Since these define equal positive linear
functionals on C(Xs ) and since µi1 ⊗ · · · ⊗ µin is a Radon measure on Xs , it suffices by the
uniqueness part of the Riesz representation theorem to show that πs (µ) is a Radon measure
on Xs . But since πs : X → Xs is continuous, this follows directly from Ex. 7.7.
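The consistency step in the proof (s ≤ s′ ⇒ Is(g) = Is′(g), which relies on each µi having total mass 1) can be checked numerically in a toy case. The following is a minimal sketch, assuming three two-point spaces {0, 1} with Bernoulli measures; the probabilities `p1, p2, p3`, the test function `f`, and the helper `bernoulli_product_integral` are hypothetical choices made only for illustration.

```python
from itertools import product


def bernoulli_product_integral(f, ps):
    """Integrate f over {0,1}^n against the product of Bernoulli(p_i) measures."""
    total = 0.0
    for point in product((0, 1), repeat=len(ps)):
        weight = 1.0
        for coord, p in zip(point, ps):
            # mu_i{1} = p_i and mu_i{0} = 1 - p_i
            weight *= p if coord == 1 else 1.0 - p
        total += f(*point) * weight
    return total


# Hypothetical success probabilities for three coordinates.
p1, p2, p3 = 0.3, 0.5, 0.8


def f(x1, x2):
    # A function of the first two coordinates only.
    return x1 + 2 * x2 + 3 * x1 * x2


def f_lifted(x1, x2, x3):
    # The same function, regarded as depending on three coordinates.
    return f(x1, x2)


# I_s(g) for s = (1, 2) and I_{s'}(g) for s' = (1, 2, 3).
integral_s = bernoulli_product_integral(f, [p1, p2])
integral_s_prime = bernoulli_product_integral(f_lifted, [p1, p2, p3])
```

The two integrals agree up to rounding because the extra factor integrates to µ3(X3) = 1; with a measure of total mass other than 1 the values would differ by that factor.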

Exercises
7.11 For each n ∈ N, let Xn be a compact Hausdorff space, µn a Radon probability measure on
Xn , πn : X → Xn the projection map, and (X, µ) the product measure space. Show that
the projection mappings πn : X → Xn are independent, that is, if n1 < n2 < · · · < nk and
Bj ∈ B(Xnj ), then
\[ \mu\big(\pi_{n_j} \in B_j,\ j = 1, \ldots, k\big) = \prod_{j=1}^{k} \mu\big(\pi_{n_j} \in B_j\big). \]

7.12 In the preceding exercise, assume that X1 = X2 = . . . and µ1 = µ2 = . . .. For E ∈ B(X1 ),


define
TE (x1 , x2 , . . .) = inf{n ∈ N : xn ∈ E},
where by convention inf ∅ = ∞. Show that TE is Borel measurable and find µ{TE = n}, the
probability that the first time a sequence enters E is at time n. The function TE is an example
of a stopping time. (See §18.5.)

7.13 [↑ 7.11] (Coin Toss) For each n ∈ N, let Xn = {0, 1} and define µn{1} = p and µn{0} = 1 − p,
where 0 < p < 1. Set $S_n = \sum_{j=1}^{n} \pi_j$.

(a) Show that Sn is binomially distributed, that is,

\[ \mu(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n. \]

(b) Show that $\int_X S_n \, d\mu = np$ and $\int_X S_n^2 \, d\mu = np + n(n-1)p^2$.
(c) Show that $n^{-1} S_n \xrightarrow{\mu} p$. ⟦Use 3.21.⟧
Part (c) is a special case of the weak law of large numbers. (See §18.4.)

7.14 For n ∈ N let Xn be a compact Hausdorff space and set Yn = X1 × · · · × Xn and $X = \prod_{n=1}^{\infty} X_n$.
Let µ1 be a probability measure on B(X1) and suppose for each n > 2 and (x1, …, xn−1) ∈ Yn−1
that there exists a probability measure µn−1(x1, …, xn−1, ·) = µn−1(x1, …, xn−1, dxn) on
B(Xn) such that µn−1(x1, …, xn−1, B) is Borel measurable in (x1, …, xn−1) for each B ∈ B(Xn).
Let Πn : X → Yn denote the projection mapping. Show that there exists a unique Radon
probability measure µ on X such that for all bounded Borel functions f on Yn,
\[ \int_X f \circ \Pi_n \, d\mu = \int_{X_1} \cdots \int_{X_n} f(x_1, \ldots, x_n) \, \mu_{n-1}(x_1, \ldots, x_{n-1}, dx_n) \cdots \mu_1(dx_1). \]

7.4 Vague Convergence


Let X be a locally compact Hausdorff space. A sequence of complex measures µn on
B(X) is said to converge vaguely to a complex measure µ, written $\mu_n \xrightarrow{v} \mu$, if
\[ \lim_n \int f \, d\mu_n = \int f \, d\mu \tag{7.7} \]
for all f ∈ C0(X). For example, if (xn) is a sequence in R^d and xn → x, then $\delta_{x_n} \xrightarrow{v} \delta_x$.
Note that since the measures µn and µ may be identified with continuous linear functionals
on C0 (X), vague convergence is simply weak∗ sequential convergence in the dual of C0 (X)
(see §10.2).
Vague convergence does not necessarily imply that (7.7) holds for all f ∈ Cb (X) (Ex. 7.16).
Additional conditions are needed, as described in the next theorem.
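The gap between C0(X) and Cb(X) can be seen in a small numerical sketch. It assumes the measures µn = δn (unit point masses at the integers) on R, which are not taken from the text; the test functions `f_vanishing` and `f_constant` are likewise hypothetical. Integrating against a Dirac measure is just evaluation, so the integrals of a function vanishing at infinity tend to 0 (the integral against the zero measure), while the integrals of the constant function 1 do not.

```python
def integrate_against_dirac(f, point):
    """The integral of f against the Dirac measure at `point` is f(point)."""
    return f(point)


def f_vanishing(x):
    # A member of C_0(R): continuous and tending to 0 at infinity.
    return 1.0 / (1.0 + x * x)


def f_constant(x):
    # A member of C_b(R) that is not in C_0(R).
    return 1.0


ns = [1, 10, 100, 1000]
vanishing_integrals = [integrate_against_dirac(f_vanishing, n) for n in ns]
constant_integrals = [integrate_against_dirac(f_constant, n) for n in ns]
```

Here the total masses ‖δn‖ = 1 do not converge to the mass 0 of the vague limit, which is the kind of additional condition the next theorem addresses.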
7.4.1 Theorem. Let µ, µn ∈ Mra(X) be nonnegative. Then (7.7) holds for all f ∈ Cb(X)
iff $\mu_n \xrightarrow{v} \mu$ and ‖µn‖ → ‖µ‖.
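Proof. The necessity is obvious. For the sufficiency, we may assume that ‖µ‖ > 0. Choose
0 < ε < ‖µ‖. By 7.2.2, there exists a φ ∈ Cc(X) with 0 ≤ φ ≤ 1 such that $\int \varphi \, d\mu > \mu(X) - \varepsilon$.
Let f ∈ Cb(X). Since µn(X) → µ(X),
\[ \limsup_n \Big| \int f(1 - \varphi) \, d\mu_n \Big| \le \|f\|_\infty \lim_n \int (1 - \varphi) \, d\mu_n = \|f\|_\infty \int (1 - \varphi) \, d\mu \le \varepsilon \|f\|_\infty. \tag{†} \]
Since $\int f\varphi \, d\mu_n \to \int f\varphi \, d\mu$ (because fφ ∈ Cc(X)), we see from the expansion
\[ \int f \, d\mu_n - \int f \, d\mu = \int f(1 - \varphi) \, d\mu_n - \int f(1 - \varphi) \, d\mu + \int f\varphi \, d\mu_n - \int f\varphi \, d\mu \]
and (†) that
\[ \limsup_n \Big| \int f \, d\mu_n - \int f \, d\mu \Big| \le 2\varepsilon \|f\|_\infty. \]
Since ε was arbitrary, (7.7) holds for f.
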

The following result gives a sufficient condition for vague convergence on B(Rd ) in terms of
Fourier-Stieltjes transforms. It will be needed later in the proof of the central limit theorem.

7.4.2 Theorem. Let µ, µ1, µ2, … be complex measures on B(R^d) such that sup_n ‖µn‖ < ∞
and $\hat{\mu}_n \to \hat{\mu}$ pointwise. Then $\mu_n \xrightarrow{v} \mu$.
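Proof. We use the Fourier inversion formula: For φ ∈ S(R^d) and any complex measure ν,
\[ \int \varphi(x) \, d\nu(x) = \iint \hat{\varphi}(\xi) e^{2\pi i \xi \cdot x} \, d\nu(x) \, d\xi = \int \hat{\varphi}(\xi) \, \hat{\nu}(-\xi) \, d\xi, \]
hence
\[ \Big| \int \varphi(x) \, d\mu_n(x) - \int \varphi(x) \, d\mu(x) \Big| \le \int |\hat{\varphi}(\xi)| \, |\hat{\mu}_n(-\xi) - \hat{\mu}(-\xi)| \, d\xi. \]
By hypothesis, the integrand on the right tends pointwise to 0. Since the integrand is
dominated by the L¹ function $(\sup_n \|\mu_n\| + \|\mu\|)\,|\hat{\varphi}(\xi)|$, the dominated convergence
theorem gives $\int \varphi \, d\mu_n \to \int \varphi \, d\mu$. Now let f ∈ Cc(R^d), let ε > 0, and
choose $\varphi \in C_c^\infty(\mathbb{R}^d)$ such that ‖f − φ‖∞ < ε (6.1.4). Then
\[ \Big| \int f \, d\mu_n - \int f \, d\mu \Big| \le \Big| \int (f - \varphi) \, d\mu_n \Big| + \Big| \int (f - \varphi) \, d\mu \Big| + \Big| \int \varphi \, d\mu_n - \int \varphi \, d\mu \Big| \]
\[ \le \|f - \varphi\|_\infty \big( \|\mu_n\| + \|\mu\| \big) + \Big| \int \varphi \, d\mu_n - \int \varphi \, d\mu \Big|. \]
Since $\int \varphi \, d\mu_n \to \int \varphi \, d\mu$, $\limsup_n \big| \int f \, d\mu_n - \int f \, d\mu \big| \le \varepsilon \big( \sup_n \|\mu_n\| + \|\mu\| \big)$.
As ε was arbitrary, $\int f \, d\mu_n \to \int f \, d\mu$ for every f ∈ Cc(R^d), and the assertion follows from Ex. 7.15.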

The following theorem characterizes vague convergence of finite measures on R in terms
of convergence of the associated distribution functions.
7.4.3 Theorem. Let µ, µ1, µ2, … be finite measures on B(R) such that sup_n ‖µn‖ < ∞.
Set F(x) := µ(−∞, x] and Fn(x) := µn(−∞, x]. Then $\mu_n \xrightarrow{v} \mu$ iff Fn(x) → F(x) at each
continuity point x of F.
Proof. Suppose Fn(x) → F(x) at each continuity point x of F. Because F has only countably
many discontinuities, Fn → F λ-a.e. Let $g \in C_c^1(\mathbb{R})$. Integrating by parts (§3.6) and applying
the dominated convergence theorem, we have
\[ \int g \, d\mu_n = \int g(x) \, dF_n(x) = -\int g'(x) F_n(x) \, dx \to -\int g'(x) F(x) \, dx = \int g \, d\mu. \]
Since an arbitrary f ∈ Cc(R) may be uniformly approximated by functions $g \in C_c^1(\mathbb{R})$ (6.1.4)
and since sup_n ‖µn‖ < ∞, it follows that $\mu_n \xrightarrow{v} \mu$.
For the converse, let x be a continuity point of F. Fix k ∈ N and δ > 0 and construct
a piecewise linear function f ∈ Cc(R) such that f = 1 on the interval [−k, x], and f = 0
on (−∞, −k − δ] ∪ [x + δ, ∞). Given ε > 0, choose N so that $\int f \, d\mu_n \le \int f \, d\mu + \varepsilon$ for all
n ≥ N. For such n and all k
\[ F_n(x) - F_n(-k) = \mu_n(-k, x] \le \int f \, d\mu_n \le \varepsilon + \int f \, d\mu \le \varepsilon + F(x + \delta) - F(-k - \delta), \]
hence
\[ F_n(x) \le \varepsilon + F(x + \delta) - F(-k - \delta) + F_n(-k). \]
Letting k → ∞ we have Fn(x) ≤ ε + F(x + δ) for all n ≥ N and so $\limsup_n F_n(x) \le F(x + \delta)$.
Letting δ → 0 we then have $\limsup_n F_n(x) \le F(x)$. Similarly, by taking g ∈ Cc(R) such that
g = 1 on [−k + δ, x − δ] and g = 0 on (−∞, −k] ∪ [x, ∞) and linear on the remaining intervals,
we see that $\liminf_n F_n(x) \ge F(x)$. Therefore, Fn(x) → F(x).
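The restriction to continuity points of F can be illustrated with point masses. The sketch below assumes µn = δ_{1/n} and µ = δ0 (so that µn → µ vaguely, by the example following (7.7)); the evaluation point 0.05 and the list `ns` are arbitrary choices made for the illustration.

```python
def F_n(x, n):
    """Distribution function of the Dirac measure at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0


def F(x):
    """Distribution function of the Dirac measure at 0."""
    return 1.0 if x >= 0.0 else 0.0


ns = [1, 10, 100, 1000, 10000]

# x = 0.05 is a continuity point of F: the values F_n(0.05) settle at F(0.05).
at_continuity_point = [F_n(0.05, n) for n in ns]

# x = 0 is the jump of F: F_n(0) stays at 0 although F(0) = 1.
at_jump = [F_n(0.0, n) for n in ns]
```

For large n the first list agrees with F(0.05), while the second list never reaches F(0), so convergence of the distribution functions can fail exactly where F jumps.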

Exercises
7.15 Let X be locally compact and Hausdorff and µ, µn complex measures with sup_n ‖µn‖ < ∞.
Show that the set V of all functions f ∈ Cb(X) for which $\int f \, d\mu_n \to \int f \, d\mu$ is a closed linear
subspace of Cb(X) in the uniform norm. Conclude that if $\int f \, d\mu_n \to \int f \, d\mu$ for all f ∈ Cc(X),
then $\mu_n \xrightarrow{v} \mu$.

7.16 Show that the condition ‖µn‖ → ‖µ‖ in 7.4.1 cannot be removed.

7.17 Show that the convergence Fn (x) → F (x) in 7.4.3 need not hold at points x where F is
discontinuous.
7.18 Consider the space [0, 1] with Lebesgue measure λ. Set $f_n = n 1_{[0,1/n]}$. Show that $f_n \cdot \lambda \xrightarrow{v} \delta_0$.

7.19 Let $\mu_n(E) = 2^{-n} \sum_{j=-\infty}^{\infty} 1_E(j/2^n)$, E ∈ B([a, b]). Show that $\mu_n \xrightarrow{v} \lambda|_{[a,b]}$.

7.20 [↑ 7.7] Let X and Y be compact Hausdorff spaces and T : X → Y continuous. Let µn and µ be
Radon measures on X. Show that if $\mu_n \xrightarrow{v} \mu$, then $T(\mu_n) \xrightarrow{v} T(\mu)$.
7.21 Let (X, F, µ) be a probability space and gn, g real-valued, measurable functions on X such
that $g_n \xrightarrow{\mu} g$. Show that $g_n(\mu) \xrightarrow{v} g(\mu)$. ⟦Let f ∈ C0(R), ε > 0, and set En = {|gn − g| ≥ δ}
for a suitable δ obtained from the uniform continuity of f.⟧ Show that the converse is false.
⟦Consider the space [0, 1) with Lebesgue measure λ. Set

\[ A_n = [0, 1/2^n) \cup [2/2^n, 3/2^n) \cup [4/2^n, 5/2^n) \cup \cdots \cup [(2^n - 2)/2^n, (2^n - 1)/2^n) \]

and $r_n = 1_{A_n}$. (The functions rn are called Rademacher functions.) Show that $r_n(\lambda) \xrightarrow{v} r_1(\lambda)$
but $r_n \not\xrightarrow{\lambda} f$ for any f.⟧
7.22 Let µ, µ1, µ2, … be probability measures on B(R) such that $\mu_n \xrightarrow{v} \mu$. Let F, Fn be as in 7.4.3.
Carry out the following steps to show that if F is continuous, then Fn (x) → F (x) uniformly on
R. Give an example to show that the continuity of F is needed here, that is, in general Fn need
not converge uniformly to F on the set of continuity points of F .
(a) Given ε > 0, choose a < b so that F (a) < ε and 1 − F (b) < ε. Then there exists a partition
P = {x0 = a < x1 < · · · < xk = b} such that |F (xi ) − F (xi−1 )| < ε for all i.
(b) There exists N such that |Fn (xi ) − F (xi )| < ε for all n ≥ N and all i. Fix such an n.
(c) If x ≤ a, then 0 ≤ F (x) < ε and 0 ≤ Fn (x) < 2ε.
(d) If x ≥ b, then 0 ≤ 1 − F (x) < ε and 0 ≤ 1 − Fn (x) < 2ε.
(e) If x ∈ [xi−1 , xi ], then F (xi−1 ) ≤ F (x) < F (xi−1 )+ε and F (xi−1 )−ε < Fn (x) ≤ F (xi−1 )+2ε.
(f) Conclude that |Fn (x) − F (x)| < 4ε for all x.

*7.5 The Daniell-Stone Representation Theorem


The Riesz representation theorem asserts that for a locally compact Hausdorff space X, a
positive linear functional I on Cc(X) may be represented as an integral against a Radon
measure. In this section we consider the representation problem in a more general setting,
one which admits integral representations of positive linear functionals on a variety of other
function spaces.
A (real) linear space L of real-valued functions on a set X is called a Stone vector
lattice if
\[ f, g \in L \ \Rightarrow\ f \vee g,\ f \wedge g,\ f \wedge 1 \in L. \]
The second requirement is actually redundant, since $f \wedge g = -\big[(-f) \vee (-g)\big]$. The third
requirement is also redundant if L contains the constant function 1; however, we do not
assume this. Note that the conditions on L imply the additional properties
f ∈ L ⇒ f + , f − , |f | ∈ L.
Here are some examples: The set of real-valued measurable functions on a measurable
space is a Stone vector lattice, as is the subspace of simple functions. If X is a topological
space, then Cb (X), C0 (X), and Cc (X) are Stone vector lattices. If (X, F, ν) is an arbitrary
measure space, then L1 (X, F, ν) is a Stone vector lattice.
A Daniell-Stone integral on a Stone vector lattice L is a linear functional I : L → R
with the following properties:

• I is positive: f ≥ 0 ⇒ I(f ) ≥ 0.
• I is continuous from above: fn ↓ 0 ⇒ I(fn ) → 0.
Note that I must then have the additional properties

• f ≤ g ⇒ I(f ) ≤ I(g) • fn ↓ f ⇒ I(fn ) ↓ I(f ) • fn ↑ f ⇒ I(fn ) ↑ I(f ),

as may be seen by considering differences.


7.5.1 Example. Let X be a locally compact Hausdorff topological space. A positive linear
functional I on Cc (X) is a Daniell-Stone integral. Indeed, if fn ↓ 0, then supp fn ⊆ K :=
supp f1 , hence by Dini’s theorem the convergence is uniform on K. Choose a continuous
function g with compact support such that 1K ≤ g ≤ 1 . Given ε > 0, choose N so that
fn ≤ ε/I(g) on K for all n ≥ N . For such n, fn ≤ εg/I(g) on X, hence I(fn ) ≤ ε. This
shows that I is continuous from above and hence is a Daniell-Stone integral. ♦
The preceding example shows that 7.2.1 is a special case of the following general result:
7.5.2 Daniell-Stone Representation Theorem. Let X be a nonempty set, I a Daniell-Stone
integral on a Stone vector lattice L of functions on X, and let F denote the σ-field
on X generated by L, that is, by the sets f⁻¹(B), f ∈ L, B ∈ B(R). Then there exists a
measure µ on F such that $I(f) = \int f \, d\mu$ for every f ∈ L.
Proof. We follow [26], where first a measure is constructed on the regions between the graphs
of functions in L. The proof is broken down into the following steps:

(1) For functions f, g : X → R with f ≤ g, define

(f, g] := {(x, t) ∈ X × R : f (x) < t ≤ g(x)}.

Then for any B ⊆ X and c > 0, (0, c1B ] = B × (0, c].


⟦(x, t) ∈ (0, c1B] iff 0 < t ≤ c1B(x) iff 1B(x) = 1 and 0 < t ≤ c iff (x, t) ∈ B × (0, c].⟧
(2) Let H denote the collection of all sets (f, g] with f, g ∈ L and f ≤ g. Then H is a
semiring on X × R.
⟦Clearly ∅ = (f, f] ∈ H. Let (f1, g1], (f2, g2] ∈ H. Then

\[ (f_1, g_1] \cap (f_2, g_2] = \big( f_1 \vee f_2,\ (f_1 \vee f_2) \vee (g_1 \wedge g_2) \big], \]

hence H is a π-system. Moreover,

\[ (f_1, g_1] \setminus (f_2, g_2] = \big( f_1,\ f_1 \vee (g_1 \wedge f_2) \big] \cup \big( g_1 \wedge (g_2 \vee f_1),\ g_1 \big]. \]

Indeed, (x, t) is in the left side iff f1(x) < t ≤ g1(x) and either t ≤ f2(x) or t > g2(x), that
is, iff (a) f1(x) < t ≤ f1(x) ∨ (g1(x) ∧ f2(x)) or (b) g1(x) ∧ (g2(x) ∨ f1(x)) < t ≤ g1(x).
Moreover, since (a) and (b) cannot occur simultaneously, the union is disjoint. Therefore
H is a semiring.⟧
(3) Define a set function ν on H by ν(f, g] = I(g − f ). Then ν is a measure on H and
hence, by 1.6.4, has an extension to σ(H).
⟦For countable additivity, let $(f, g] = \bigcup_{n=1}^{\infty} (f_n, g_n]$ (disjoint). Then for each x ∈ X,
$(f(x), g(x)] = \bigcup_{n=1}^{\infty} (f_n(x), g_n(x)]$ (disjoint). Applying Lebesgue measure λ, we have
$g(x) - f(x) = \sum_n \big[ g_n(x) - f_n(x) \big]$. Since the partial sums of the series increase
monotonically to g − f, $I(g - f) = \sum_n I(g_n - f_n)$, that is, $\nu(f, g] = \sum_n \nu(f_n, g_n]$.⟧

(4) Let f ∈ L with f ≥ 0 and c > 0. Then there exists a sequence of nonnegative functions
fn in L such that $f_n \uparrow 1_{\{f > 1\}}$, hence $(0, c f_n] \uparrow (0, c 1_{\{f > 1\}}] = \{f > 1\} \times (0, c]$ (by (1)).
⟦Define $f_n = \big[ n(f - f \wedge 1) \big] \wedge 1$. If f(x) ≤ 1, then fn(x) = 0 for all n. If f(x) > 1, then
eventually fn(x) = 1. Therefore, $f_n \uparrow 1_{\{f > 1\}}$.⟧
(5) σ(H) contains all sets of the form {a < f ≤ b} × (0, c], f ∈ L, 0 < a < b, c > 0.
⟦Since a ≥ 0, the sets are unchanged when f is replaced by f⁺, so we may assume that
f ≥ 0. By (4), {f > 1} × (0, c] ∈ σ(H). Since

\[ \{a < f \le b\} \times (0, c] = \big[ \{1 < a^{-1} f\} \times (0, c] \big] \setminus \big[ \{1 < b^{-1} f\} \times (0, c] \big], \]

the assertion follows.⟧


(6) σ(H) contains all sets of the form B × (0, c], c > 0, B ∈ F.
⟦For fixed c, the collection of all B ∈ F for which B × (0, c] ∈ σ(H) is a σ-field containing
the generators {a < f ≤ b} of F, hence must equal F.⟧

(7) Define a measure µ on F by $\mu(B) := \nu\big( B \times (0, 1] \big)$. Then for f ∈ L and f ≥ 0,

\[ \nu\big( \{a < f \le b\} \times (0, c] \big) = c \, \mu\{a < f \le b\}. \]

⟦Using the sequence in (4), we have

\[ \nu\big( \{f > 1\} \times (0, c] \big) = \lim_n \nu(0, c f_n] = \lim_n I(c f_n) = c \lim_n I(f_n) = c \, \nu\big( \{f > 1\} \times (0, 1] \big) = c \, \mu\{f > 1\}. \]

Now apply the set identities in the proof of (5).⟧


(8) $I(f) = \int f \, d\mu$ for every f ∈ L.
⟦We may take f ≥ 0. Moreover, since f ∧ n ↑ f, we may assume that f is bounded. Let
$h = \sum_{i=1}^{k} c_i 1_{C_i}$ be a simple function in standard form, where ci > 0, Ci := {ai < f ≤ bi},
and 0 ≤ ai < bi. Since (0, h] is a disjoint union of the sets $(0, c_i 1_{C_i}] = C_i \times (0, c_i]$, we
have by (7)
\[ \nu(0, h] = \sum_i \nu\big( C_i \times (0, c_i] \big) = \sum_i c_i \, \mu(C_i) = \int h \, d\mu. \]
By a minor modification of the proof of 2.3.1 (necessitated by the use of left open
rather than right open intervals in the definition of Ci), there exists a sequence (hn)
of such simple functions such that hn ↑ f. Taking limits in $\nu(0, h_n] = \int h_n \, d\mu$ yields
$I(f) = \nu(0, f] = \int f \, d\mu$.⟧
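The truncation used in step (4) of the proof, fn = [n(f − f ∧ 1)] ∧ 1, can be checked pointwise. This is a minimal sketch that works with the values f(x) rather than with functions on a set X; the names `f_n`, `indicator_f_gt_1`, `sample_values`, and `ns` are hypothetical and serve only the illustration.

```python
def f_n(value, n):
    """Value of [n(f - f ∧ 1)] ∧ 1 at a point where f takes `value`."""
    return min(n * (value - min(value, 1.0)), 1.0)


def indicator_f_gt_1(value):
    """Value of the indicator function of {f > 1} at the same point."""
    return 1.0 if value > 1.0 else 0.0


# Sample values of f(x), including the boundary case f(x) = 1.
sample_values = [0.0, 0.5, 1.0, 1.001, 1.5, 3.0]
ns = [1, 2, 10, 100, 1000, 10000]

# For each sample value, the sequence f_1(x), f_2(x), ... along `ns`.
table = {v: [f_n(v, n) for n in ns] for v in sample_values}
```

Each row of `table` is nondecreasing in n, stays at 0 when f(x) ≤ 1, and reaches 1 once n ≥ 1/(f(x) − 1) when f(x) > 1, in line with fn ↑ 1_{f>1}.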
Part II

Functional Analysis
Chapter 8
Banach Spaces

Several examples of Banach spaces have played important roles in Part I of the text, notably
Lp spaces and various spaces of continuous functions. In this chapter we develop the basic
properties of general normed spaces. Additional properties are considered in Chapters 10
and 14.

8.1 General Properties of Normed Spaces


We remind the reader that a norm ‖·‖ on a linear space X over K has the following
properties:

(a) ‖x‖ ≥ 0, (b) x ≠ 0 ⇒ ‖x‖ ≠ 0, (c) ‖cx‖ = |c| ‖x‖, (d) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

A seminorm has the same properties with the possible exception of (b). We also recall the
following variations of the triangle inequality:
\[ \Big\| \sum_{j=1}^{n} x_j \Big\| \le \sum_{j=1}^{n} \|x_j\| \quad\text{and}\quad \big| \|x\| - \|y\| \big| \le \|x - y\|. \tag{8.1} \]
For ease of reference, we list below the main examples of normed spaces discussed in the
first part of the text together with some new ones. All are Banach spaces except (d) and (j).
The sequence spaces (h) – (k) are special cases of the function spaces (a) – (e). We remind
the reader that ‖·‖p is in general only a seminorm unless one adopts the convention (which
we do) of identifying functions that are equal a.e.
8.1.1 Examples.
(a) L^p(X, F, µ) = {f : X → K : f is F-measurable and ‖f‖p < ∞}, where
\[ \|f\|_p = \Big( \int |f|^p \, d\mu \Big)^{1/p} \ (1 \le p < \infty), \qquad \|f\|_\infty = \sup\{t : \mu\{|f| > t\} > 0\}. \]

(b) B(X) = the space of all bounded functions f : X → C with norm ‖f‖∞ = sup |f(X)|,
where X is a nonempty set.
(c) Cb(X) = the space of all bounded continuous functions f : X → C with norm ‖·‖∞,
where X is a topological space.
(d) Cc(X) = {f ∈ Cb(X) : supp(f) is compact} with norm ‖·‖∞, where X is a locally
compact Hausdorff topological space.
(e) C0(X) = closure of Cc(X) in Cb(X), X a locally compact Hausdorff space.
(f) M(X) = space of complex measures on a measurable space (X, F) with the total variation
norm ‖µ‖ = |µ|(X).

(g) Mra(X) = space of complex Radon measures on B(X) with the total variation norm
‖µ‖ = |µ|(X), where X is a locally compact Hausdorff space.

(h) ℓ^p = ℓ^p(N) := {x = (xn) : ‖x‖p < ∞}, where
\[ \|x\|_p = \Big( \sum_{n=1}^{\infty} |x_n|^p \Big)^{1/p} \ (1 \le p < \infty), \qquad \|x\|_\infty = \sup_n |x_n|. \]

(i) ℓ^p(Z) = the space of all bilateral sequences x = (…, x−1, x0, x1, …) such that
‖x‖p < ∞, where
\[ \|x\|_p := \Big( \sum_{n=-\infty}^{\infty} |x_n|^p \Big)^{1/p} \ (1 \le p < \infty), \qquad \|x\|_\infty := \sup_{n \in \mathbb{Z}} |x_n|. \]

(j) c00 := {x = (xn) : xn = 0 for all but finitely many n}, ‖x‖∞ := sup_n |xn|.

(k) c0 := {x = (xn) : lim_n xn = 0}, ‖x‖∞ := sup_n |xn|.

(l) c := {x = (xn) : lim_n xn exists}, ‖x‖∞ := sup_n |xn|. ♦

The Topology and Geometry of Normed Spaces


Let X be a normed space. As noted in §0.4, the mapping (x, y) ↦ ‖x − y‖ is a metric.
The metric topology of X is called the norm topology. The second inequality in (8.1)
shows that ‖x‖ is a continuous function of x. Using the sequential form of continuity,
one may easily establish the continuity of the vector space operations (x, y) ↦ x + y and
(c, x) ↦ cx (Ex. 8.1). As a consequence, the closure of a subspace of X is a linear space
and the closure of a convex set is convex. Moreover, for fixed y and c ≠ 0, the mappings
x ↦ x + y and x ↦ cx are homeomorphisms, hence one has

cl(y + A) = y + cl(A), cl(cA) = c cl(A), int(y + A) = y + int(A) and int(cA) = c int(A).

The open ball, closed ball, and sphere of radius r and center x in a normed space X take
the forms

\[ B_r(x) := \{y \in X : \|x - y\| < r\}, \quad C_r(x) := \{y \in X : \|x - y\| \le r\}, \quad\text{and}\quad S_r(x) := \{y \in X : \|x - y\| = r\}. \]

In case of ambiguity, we include the norm symbol in the notation, as in Br(x, ‖·‖). We also
use the simplified notation

Br := Br (0), Cr := Cr (0), and Sr := Sr (0).

The ball B1 is called the open unit ball and C1 is called the closed unit ball. The
following relations are occasionally useful (Ex. 8.5):

Br (x) = x + rB1 , Cr (x) = x + rC1 , Sr (x) = x + rS1 . (8.2)

The reader may check that Cr (x) is the closure of Br (x) and Br (x) is the interior of Cr (x)
(Ex. 8.3), properties not shared by general metric spaces (consider a discrete space). The
balls Br (x) and Cr (x) are easily seen to be convex; Br and Cr have the additional property
of being balanced (see §0.2).

Separable Spaces
A normed linear space is separable if it is separable in the metric topology. Such spaces
are important in contexts where a metric is needed for the weak or weak∗ topologies discussed
in Chapter 10.
8.1.2 Examples.
(a) For 1 ≤ p < ∞, the space L^p(R^d) is separable. For example, the collection of all step
functions $\sum_{j=1}^{n} a_j 1_{I_j}$, where aj ∈ Q and Ij is a bounded open interval whose coordinate
intervals have rational endpoints, is dense in L^p.
(b) The space L^∞(R^d) is not separable. To see this for the case d = 1, let $f_t := 1_{(-\infty, t)}$ and
note that the balls $B_{1/2}(f_t)$ are disjoint. Since there are uncountably many of these, L^∞(R)
cannot contain a countable dense set.
(c) The space C[a, b] is separable under the uniform norm. Indeed, by the Weierstrass
approximation theorem, the set of polynomials on [a, b] with rational coefficients is dense in
C[a, b]. A similar argument shows that C(X) is separable for any compact subset X of Rd .
(d) The space Cb (R) of bounded continuous functions on R is not separable in the uniform
norm. The basic idea is a variation of the argument for L∞ : For each doubly infinite sequence
s = (…, s−1, s0, s1, …), where sn = 0 or 1, define fs ∈ Cb(R) such that fs(n) = sn and fs
is linear for n ≤ x ≤ n + 1 (n ∈ Z). Then ‖fs − ft‖ = 1 (s ≠ t), hence the balls $B_{1/2}(f_s)$ are
disjoint. Since the set of all such sequences is uncountable, Cb(R) cannot contain a countable
dense set.
(e) The disk algebra A(D) is the algebra of continuous functions on the closed unit disk
cl D that are analytic on D. We show that A(D) is separable in the uniform norm by showing
that the set of all polynomials P(z) is dense in A(D) (the polynomials with coefficients in
Q + iQ then form a countable dense set). To this end, let 0 < r < 1 and note
that if f ∈ A(D), then fr(z) := f(rz) is analytic on the disk r⁻¹D ⊇ cl D. The Taylor series
$\sum_{k=0}^{\infty} c_k z^k$ for fr therefore converges uniformly to fr on cl D. Since f is uniformly
continuous on cl D, fr → f uniformly on cl D as r → 1. Given ε > 0, choose r so that
|f(z) − fr(z)| < ε on cl D and then a partial sum Pn of the series for fr such that
|fr(z) − Pn(z)| < ε for all z ∈ D. Then |f(z) − Pn(z)| < 2ε on D. ♦

Equivalent Norms
Two norms ‖·‖ and |||·||| on a vector space X are said to be equivalent if the associated
metrics are equivalent, that is, if there exist positive real numbers a and b such that

\[ \|x\| \le a \, |||x||| \quad\text{and}\quad |||x||| \le b \, \|x\| \quad\text{for all } x \in X. \tag{8.3} \]

The notion of equivalence of norms is an equivalence relation on the collection of norms on
X (Ex. 8.7).
The norms ‖·‖p, 1 ≤ p ≤ ∞, on K^d are easily seen to be equivalent. For an infinite
dimensional example, let µ be a finite measure on B[0, 1], g a positive continuous function
on [0, 1], and ν = gµ. Then the L¹ norms of µ and ν are equivalent on C[0, 1]. On the
other hand, the norms ‖·‖1 and ‖·‖∞ on C[0, 1] satisfy ‖·‖1 ≤ ‖·‖∞ but are not equivalent
(Ex. 8.8).
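The equivalence of the p-norms on K^d comes from the standard two-sided bound ‖x‖∞ ≤ ‖x‖p ≤ d^{1/p} ‖x‖∞, which is not stated in the text but is easy to verify. The sketch below spot-checks it on random real vectors; the dimension `d`, the exponents tried, the tolerance, and the helper names `p_norm` and `bounds_hold` are assumptions made for the illustration, and a random check is of course no substitute for the proof.

```python
import random


def p_norm(x, p):
    """The l^p norm of a finite real vector; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


random.seed(0)  # fixed seed so the sample is reproducible
d = 5
vectors = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(1000)]


def bounds_hold(x, p, tol=1e-12):
    """Check ||x||_inf <= ||x||_p <= d^(1/p) ||x||_inf up to rounding."""
    sup = p_norm(x, float("inf"))
    value = p_norm(x, p)
    return sup <= value + tol and value <= d ** (1.0 / p) * sup + tol


all_hold = all(bounds_hold(x, p) for x in vectors for p in (1, 2, 3.5))
```

In the notation of (8.3), the bound gives a = 1 and b = d^{1/p} for the pair ‖·‖∞, ‖·‖p; the constant d^{1/p} grows with d, which is consistent with the failure of equivalence for ‖·‖1 and ‖·‖∞ on the infinite dimensional space C[0, 1].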
8.1.3 Proposition. Let ‖·‖ and |||·||| be norms on a vector space X. Each of the following
statements implies the other two.
(a) ‖·‖ and |||·||| are equivalent.

(b) The corresponding metric topologies are equal.

(c) For any sequence (xn) in X, ‖xn‖ → 0 iff |||xn||| → 0.


  
Proof. (a) ⇒ (b): If (8.3) holds, then $B_r(x, \|\cdot\|) \subseteq B_{br}(x, |||\cdot|||)$ and $B_r(x, |||\cdot|||) \subseteq B_{ar}(x, \|\cdot\|)$.
Since open sets are unions of open balls, the topologies coincide.
(b) ⇒ (c): Let ‖xn‖ → 0 and ε > 0. Since $B_\varepsilon(0, |||\cdot|||)$ is open in the ‖·‖-topology,
$B_\delta(0, \|\cdot\|) \subseteq B_\varepsilon(0, |||\cdot|||)$ for some δ > 0. Therefore, xn is eventually in $B_\varepsilon(0, |||\cdot|||)$, which
shows that |||xn||| → 0. The converse is similar.
(c) ⇒ (a): We claim that if (c) holds, then there exists m ∈ N such that
\[ B_{1/m}(0, \|\cdot\|) \subseteq B_1(0, |||\cdot|||). \tag{†} \]
Indeed, if the claim is false then for each n we could choose $x_n \in B_{1/n}(0, \|\cdot\|) \setminus B_1(0, |||\cdot|||)$
to obtain ‖xn‖ → 0 and |||xn||| ≥ 1, contradicting (c). Now let x ≠ 0 and set y :=
(2m‖x‖)⁻¹x. Then ‖y‖ < 1/m, hence, by (†), |||y||| < 1. Therefore, |||x||| < 2m‖x‖,
which is the second inequality in (8.3) with b = 2m. The verification of the first inequality is
similar.

Finite Dimensional Spaces


Let X be a finite dimensional vector space. If v1 , . . . , vd is a basis for X and 1 ≤ p ≤ ∞,
then a norm on X is given by

\[ |||x|||_p := \|\vec{x}\|_p, \quad\text{where } x := x_1 v_1 + \cdots + x_d v_d \text{ and } \vec{x} := (x_1, \ldots, x_d). \tag{8.4} \]

These norms are easily seen to be equivalent. A somewhat surprising result is the following:
8.1.4 Theorem. All norms on a finite dimensional vector space X are equivalent.
Proof. Let ‖·‖ be an arbitrary norm on X. It suffices to show that ‖·‖ is equivalent to the
complete norm |||·|||₂ defined in (8.4).
One inequality in (8.3) is easy: In the notation of (8.4), we have, by the triangle and CBS
inequalities,
\[ \|x\| \le \sum_{k=1}^{d} \|v_k\| \, |x_k| \le \Big( \sum_{k=1}^{d} \|v_k\|^2 \Big)^{1/2} \Big( \sum_{k=1}^{d} |x_k|^2 \Big)^{1/2} = a \, |||x|||_2, \]
where $a := \big( \sum_{k=1}^{d} \|v_k\|^2 \big)^{1/2}$. For the other inequality, define a function F : K^d → R⁺ by
$F(\vec{x}) = \|x\|$. Then
\[ |F(\vec{x}) - F(\vec{y})| \le \|x - y\| \le a \, \|\vec{x} - \vec{y}\|_2, \]
hence F is continuous. Moreover, if $\vec{x} \neq 0$ then, by linear independence, x ≠ 0. Thus
F is positive on the compact Euclidean sphere $\{\vec{x} : \|\vec{x}\|_2 = 1\}$ and so has a positive
minimum m there. For any $\vec{x} \neq 0$ we then have $\big\| x / \|\vec{x}\|_2 \big\| = F\big( \vec{x} / \|\vec{x}\|_2 \big) \ge m$, hence
$\|x\| \ge m \|\vec{x}\|_2 = m \, |||x|||_2$.
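The constants a and m in the proof can be computed in a concrete case. The sketch below assumes X = R² with the standard basis and takes the ℓ¹ norm as the "arbitrary" norm; both choices are made only for illustration. The minimum m over the Euclidean unit circle is approximated by sampling, so in general it is only an upper estimate of the true minimum (here the sample happens to contain a minimizer).

```python
import math


def norm(x):
    """The norm under study: here the l^1 norm on R^2 (an arbitrary choice)."""
    return abs(x[0]) + abs(x[1])


def euclid(x):
    """The Euclidean norm, playing the role of |||.|||_2 for the standard basis."""
    return math.hypot(x[0], x[1])


# a := (sum_k ||v_k||^2)^(1/2) for the standard basis v_1, v_2.
basis = [(1.0, 0.0), (0.0, 1.0)]
a = math.sqrt(sum(norm(v) ** 2 for v in basis))

# m := minimum of the norm over the Euclidean unit circle, estimated by sampling.
num_samples = 3600
m = min(
    norm((math.cos(2 * math.pi * k / num_samples),
          math.sin(2 * math.pi * k / num_samples)))
    for k in range(num_samples)
)

# Spot-check m * |||x|||_2 <= ||x|| <= a * |||x|||_2 on a few vectors.
test_points = [(3.0, -4.0), (0.5, 0.5), (-2.0, 0.0)]
sandwiched = all(
    m * euclid(x) - 1e-9 <= norm(x) <= a * euclid(x) + 1e-9
    for x in test_points
)
```

For this norm the computation gives a = √2 and m close to 1, matching the familiar inequalities ‖x‖₂ ≤ ‖x‖₁ ≤ √2 ‖x‖₂ on R².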
Theorem 8.1.4 shows that in a finite dimensional normed space X one may always choose
an equivalent norm relative to which X is isometrically isomorphic to a Euclidean space K^d.
This implies that the metric properties of K^d carry over to X. In particular,
8.1.5 Corollary. A finite dimensional normed space is complete, its subspaces are closed,
and its bounded sets are relatively compact.
Interestingly, the last assertion of the corollary actually characterizes finite dimensional
spaces: a normed space with a compact ball is finite dimensional. The proof depends on
the following result, which guarantees the existence of vectors in a normed space that are
“nearly orthogonal” to a given closed subspace.

8.1.6 Theorem (F. Riesz). Let Y be a proper closed subspace of a normed space X. Then
for each ε ∈ (0, 1) there exists xε ∈ X such that

\[ \|x_\varepsilon\| = 1 \quad\text{and}\quad \inf\{\|x_\varepsilon - y\| : y \in Y\} \ge 1 - \varepsilon. \tag{8.5} \]

Proof. Choose any x ∈ X \ Y and set d := inf{‖x − y‖ : y ∈ Y}. Since Y is closed, d > 0.

FIGURE 8.1: xε nearly orthogonal to Y

Choose y0 ∈ Y such that ‖x − y0‖ < d/(1 − ε) and set

\[ x_\varepsilon := \|x - y_0\|^{-1} (x - y_0). \]

Then for any y ∈ Y the vector z := y0 + ‖x − y0‖ y is in Y and

\[ y = \|x - y_0\|^{-1} (z - y_0), \]

hence
\[ \|x_\varepsilon - y\| = \frac{1}{\|x - y_0\|} \|x - z\| \ge \frac{d}{\|x - y_0\|} \ge 1 - \varepsilon. \]
8.1.7 Theorem. Let X be a normed space with S1 = {x ∈ X : ‖x‖ = 1} compact. Then
X is finite dimensional.
Proof. Assume that X is infinite dimensional. Choose x1 ∈ X with ‖x1‖ = 1. Since the
span of x1 is a proper closed subspace of X, by 8.1.6 there exists a vector x2 with ‖x2‖ = 1
such that ‖x2 − y‖ ≥ 1/2 for all y ∈ span{x1}. Proceeding by induction, we obtain an
infinite sequence (xn) in S1 such that

\[ \|x_{n+1} - y\| \ge 1/2 \quad\text{for all } y \in \operatorname{span}\{x_1, \ldots, x_n\}. \]

In particular, ‖xm − xn‖ ≥ 1/2 for all m ≠ n. On the other hand, the compactness of S1
implies that (xn) has a convergent subsequence. As these assertions are incompatible, X
must be finite dimensional.

Strictly Convex Spaces
A normed space is strictly convex if it satisfies the equivalent conditions in the following
proposition, these conditions asserting in various ways that a sphere does not contain line
segments.
8.1.8 Proposition. Let X be a normed space. The following statements are equivalent:

(a) x ≠ y and ‖x‖ = ‖y‖ = 1 ⇒ ‖x + y‖ < 2.

(b) x ≠ y and ‖x‖ = ‖y‖ = 1 ⇒ ‖(1 − t)x + ty‖ < 1 for all 0 < t < 1.

(c) x ≠ 0 and y ≠ 0 and ‖x + y‖ = ‖x‖ + ‖y‖ ⇒ x = ty for some t > 0.



Proof. (a) ⇒ (b): Let x ≠ y and ‖x‖ = ‖y‖ = 1. By hypothesis, the inequality in (b)
holds for t = 1/2. Now let 0 < t < 1/2. Then 0 < 2t < 1 and

\[ tx + (1 - t)y = t(x + y) + (1 - 2t)y, \]

hence
\[ \|tx + (1 - t)y\| \le t \|x + y\| + (1 - 2t) \|y\| < 2t + (1 - 2t) = 1. \]
Thus the inequality in (b) holds for 0 < t < 1/2. Similarly, if 1/2 < t < 1, then 0 < 2t − 1 < 1
and
\[ tx + (1 - t)y = (1 - t)(x + y) + (2t - 1)x, \]
hence

\[ \|tx + (1 - t)y\| \le (1 - t) \|x + y\| + (2t - 1) \|x\| < 2(1 - t) + 2t - 1 = 1. \]

Therefore the inequality in (b) holds for all t ∈ (0, 1).

(b) ⇒ (c): Let x ≠ 0 and y ≠ 0 and ‖x + y‖ = ‖x‖ + ‖y‖. Then

\[ \frac{\|x\|}{\|x + y\|} + \frac{\|y\|}{\|x + y\|} = 1, \]
which forces ‖x‖⁻¹x = ‖y‖⁻¹y; otherwise, by (b) with t := ‖y‖ ‖x + y‖⁻¹,

\[ 1 = \Big\| \frac{x + y}{\|x + y\|} \Big\| = \Big\| (1 - t) \frac{x}{\|x\|} + t \frac{y}{\|y\|} \Big\| < 1. \]

(c) ⇒ (a): Let ‖x‖ = ‖y‖ = 1, x ≠ y. Suppose for a contradiction that ‖x + y‖ = 2.
Then 2 = ‖x + y‖ ≤ ‖x‖ + ‖y‖ ≤ 2, hence ‖x + y‖ = ‖x‖ + ‖y‖ and so, by hypothesis,
x = ty for some t > 0. But then t = 1, contradicting that x ≠ y.

A Euclidean space is an example of a strictly convex space. By contrast, R^d (d ≥ 2) with the
norm $\|x\|_1 = \sum_{j=1}^{d} |x_j|$ or $\|x\|_\infty = \max_{1 \le j \le d} |x_j|$ is not strictly convex. A more interesting
example is L^p(X, F, µ). If 1 < p < ∞, then L^p is strictly convex by the second part of 4.1.3.
On the other hand, if p = 1 or ∞, then L^p is strictly convex only in trivial circumstances
(Ex. 8.23).
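Condition (a) of 8.1.8 can be tested directly on R². The sketch below uses specific pairs of unit vectors, chosen here only for illustration, to exhibit the failure of strict convexity for the 1-norm and the max norm and to contrast it with the Euclidean norm.

```python
def norm_1(x):
    return sum(abs(t) for t in x)


def norm_2(x):
    return sum(t * t for t in x) ** 0.5


def norm_inf(x):
    return max(abs(t) for t in x)


def add(x, y):
    return tuple(s + t for s, t in zip(x, y))


# Distinct unit vectors for the 1-norm whose sum has norm exactly 2.
x, y = (1.0, 0.0), (0.0, 1.0)
sum_norm_1 = norm_1(add(x, y))

# Distinct unit vectors for the max norm whose sum has norm exactly 2.
u, v = (1.0, 1.0), (1.0, -1.0)
sum_norm_inf = norm_inf(add(u, v))

# The same pair x, y in the Euclidean norm: the sum has norm sqrt(2) < 2.
sum_norm_2 = norm_2(add(x, y))
```

Geometrically, the segment joining x and y lies on the unit sphere of ‖·‖1, and the segment joining u and v lies on the unit sphere of ‖·‖∞, which is the kind of line segment that strict convexity rules out.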
We conclude this subsection with an application of strict convexity to approximation
theory. Let X be a normed space and C a nonempty convex subset of X. If x ∈ X, then an
element y0 ∈ C is called a best approximation to x out of C if

\[ \|x - y_0\| = \inf\{\|x - y\| : y \in C\}. \]

The relevance of this notion here is that if X is strictly convex then best approximations, if
they exist, are unique. To see this, let α denote the infimum and suppose that ‖x − z0‖ = α
for some point z0 ∈ C distinct from y0. Then x − z0 ≠ x − y0, and since ‖x − z0‖ =
‖x − y0‖ = α, we have, by strict convexity,

\[ \big\| x - \tfrac{1}{2}(z_0 + y_0) \big\| = \tfrac{1}{2} \|(x - z_0) + (x - y_0)\| < \alpha. \]

But this is impossible as ½(z0 + y0) ∈ C. We have proved


8.1.9 Proposition. Let C be a nonempty convex subset of a strictly convex space X. Then
each x ∈ X has at most one best approximation out of C.

Note that, as a special case, a nonempty convex subset of a strictly convex space X
cannot have more than one member with smallest norm.
While strict convexity guarantees uniqueness of best approximations, it does not guarantee
existence. For this, additional conditions must be placed on X. One such condition is uniform
convexity, discussed in §10.4. For now, we offer the following more modest result, the proof
of which is left to the reader as an exercise (8.25).
8.1.10 Proposition. Let X be a normed space and Y a finite dimensional subspace of X.
Then for each x ∈ X there exists a best approximation to x out of Y.
For example, if 1 < p < ∞ and f ∈ Lp [0, 1], then there exists a unique polynomial on
[0, 1] of degree ≤ n that best approximates f in Lp norm out of all polynomials of degree
≤ n.
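The interplay between 8.1.9 and 8.1.10 can be seen in R². In the sketch below, the point x = (0, 1) and the one-dimensional subspace Y = {(t, 0)} are hypothetical choices, and the search is restricted to a finite grid of values of t, so it only illustrates the statements: best approximations exist for both norms, but they are unique only for the strictly convex Euclidean norm.

```python
# Candidate points (t, 0) of the subspace Y, with t on a grid of exact floats.
ts = [k / 4 for k in range(-8, 9)]


def dist_sup(t):
    """Max-norm distance from x = (0, 1) to the point (t, 0) of Y."""
    return max(abs(0.0 - t), abs(1.0 - 0.0))


def dist_euclid(t):
    """Euclidean distance from x = (0, 1) to the point (t, 0) of Y."""
    return (t * t + 1.0) ** 0.5


best_sup = min(dist_sup(t) for t in ts)
minimizers_sup = [t for t in ts if dist_sup(t) == best_sup]

best_euclid = min(dist_euclid(t) for t in ts)
minimizers_euclid = [t for t in ts if dist_euclid(t) == best_euclid]
```

On this grid every t with |t| ≤ 1 attains the max-norm distance 1, whereas only t = 0 attains the Euclidean distance 1.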

Exercises
8.1 Prove that the operations of addition and scalar multiplication in a normed linear space are
continuous.

8.2 Let X be a normed space and x ≠ 0 ∈ X. Show that if (cn) is a sequence in K such that
cn x → y ∈ X, then c := limn cn exists in K and cx = y.

8.3 Show that in a normed linear space, Cr (x) = cl Br (x) and Br (x) = int Cr (x).

8.4 Let C be a nonempty, closed subset of a normed space X with the property x, y ∈ C ⇒
½(x + y) ∈ C. Show that C is convex. ⟦Consider the dyadic rationals.⟧

8.5 Verify the relations (8.2).

8.6 Let Y be a dense subspace of a normed linear space X. Show that the open unit ball B1 ∩ Y of
Y is dense in the open unit ball B1 of X.

8.7 Prove that equivalence of norms is an equivalence relation.

8.8 Show that the norms ‖·‖1 and ‖·‖∞ on C[0, 1] are not equivalent.

8.9 Let M > 0 and let (cn) be a sequence in C such that $\big| \sum_{n \in F} c_n \big| \le M$ for all finite F ⊆ N. Show
that $\sum_n |c_n| \le 4M$, hence (cn) ∈ ℓ¹(N).

8.10 Show that an infinite dimensional Banach space X has a nonclosed linear subspace. JUse the
Baire category theorem.K

8.11 Show that the linear space D[a, b] of differentiable functions on [a, b] is not complete in either
the uniform norm or the L1 norm.

8.12 Show that the space c of all convergent sequences in C is a Banach space under the sup norm.
8.13 Let 0 < α < 1. A function f ∈ Cb(R) is Hölder continuous of order α if

\[ \|f\|_{0,\alpha} := \sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} < \infty. \]

Show that the set $C^{0,\alpha}(\mathbb{R})$ of all such functions is a Banach space under the norm $|||f||| := \|f\|_{0,\alpha} + \|f\|_\infty$.

8.14 (a) Prove that $\|f\|_{bv} := |f(a)| + V_{[a,b]}(f)$ defines a norm on the space BV[a, b] of functions of
bounded variation (see §5.5) and that BV[a, b] is a Banach space under this norm.
(b) Show that the space AC[a, b] of absolutely continuous functions on [a, b] is a closed subspace
of the Banach space BV[a, b].
(c) Show that the norms ‖·‖bv and ‖·‖∞ on AC[a, b] are not equivalent.

8.15 Show that the spaces Cc (Rd ) and C0 (Rd ) are separable.
8.16 Prove that if X is a separable Banach space and Y is a closed subspace, then Y is separable.
8.17 Let A ⊆ R be Lebesgue measurable with λ(A) > 0. Show that Lp (A) is infinite dimensional,
1 ≤ p ≤ ∞.
8.18 Show that the Lp and Lq norms on C[0,1] are not equivalent if 1 ≤ p < q ≤ ∞.
8.19 Let X and Y be Hausdorff topological spaces and let Z be a dense subset of X. Let Cb (X)
and Cb (Y ) have the supremum norms. Suppose that f : X × Y → C is bounded such that
f ( · , y) ∈ Cb (X) for all y ∈ Y , f (z, · ) ∈ Cb (Y ) for all z ∈ Z, and the set of mappings
f (Z, · ) := {f (z, · ) : z ∈ Z} is relatively compact in Cb (Y ). Prove the following:
(a) The collection of mappings f (X, · ) = {f (x, · ) : x ∈ X} is relatively compact in Cb (Y ).
JLet x ∈ X and zα ∈ Z with zα → x. Then f (zβ , · ) → g ∈ Cb (Y ).K
(b) f ∈ Cb (X × Y ). JLet (xα , yα ) → (x0 , y0 ) in X × Y and use (a).K
(c) If Y is compact, then x → f (x, · ) maps X continuously into Cb (Y ). JArgue by contradiction,
using (b).K
8.20 Show that each of the conditions below is equivalent to strict convexity:
(i) x ≠ y and ‖x‖ = ‖y‖ ⇒ ‖x + y‖ < ‖x‖ + ‖y‖.
(ii) x ≠ y and ‖x‖, ‖y‖ ≤ 1 ⇒ ‖x + y‖ < 2.
(iii) x ≠ y and ‖x‖ = ‖y‖ = 1 ⇒ ‖(1 − s)x + sy‖ < 1 for some 0 < s < 1.
8.21 Prove the converse of 8.1.9: Let a normed space X have the property that for each closed convex
subset C, every x ∈ X has at most one best approximation out of C. Then X is strictly convex.
8.22 Show that for a locally compact Hausdorff space X containing at least two points, C_0(X) is not
strictly convex.
8.23 Show that L1 and L∞ are not strictly convex except in trivial cases.
8.24 Let X be a normed space and C a nonempty subset of X. Show that the set of best approxima-
tions to x out of C is convex.
8.25 Prove 8.1.10.

8.2 Bounded Linear Transformations

Throughout this section, X, Y, and Z denote normed spaces over K.

Recall from §0.4 that a linear transformation T : X → Y is continuous iff it is bounded,
that is, iff there exists a constant M ≥ 0 such that ‖Tx‖ ≤ M‖x‖ for all x ∈ X (0.4.8). The
set of all bounded linear transformations (operators) from X to Y is denoted by B(X, Y):

B(X, Y) := {T : X → Y : T is linear and bounded}.
Note that B(X, Y) is a vector space under pointwise addition and scalar multiplication:
(S + T )(x) := Sx + T x, (cT )x := c(T x), S, T ∈ B(X, Y), c ∈ K.
An operator T ∈ B(X, Y) is said to be invertible if T^{−1} exists and is a member of
B(Y, X). In this case T is a topological isomorphism from X onto Y. Recall that a special
case is a surjective isometry, which satisfies ‖Tx‖ = ‖x‖ for all x.

The Operator Norm


The operator norm of T ∈ B(X, Y) is defined by
‖T‖ := inf{M : ‖Tx‖ ≤ M‖x‖ for all x ∈ X}. (8.6)
The boundedness of T implies that the set on the right is nonempty, hence ‖T‖ < ∞. Taking
a sequence of M's in (8.6) tending to ‖T‖, we see that
‖Tx‖ ≤ ‖T‖ ‖x‖ for all x ∈ X. (8.7)
It follows that
‖T‖ = sup{‖Tx‖ : ‖x‖ ≤ 1}. (8.8)
Indeed, denoting the supremum by s, we have s ≤ ‖T‖ by (8.7). For the reverse inequality,
let x ≠ 0 and set y := x/‖x‖. Then ‖Ty‖ ≤ s, hence ‖Tx‖ ≤ s‖x‖. By (8.6), ‖T‖ ≤ s.
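For a concrete feel for (8.6) through (8.8), here is a minimal sketch for a matrix operator on R^n with the sup norm. It assumes the standard fact (not proved in the text) that for this norm the supremum in (8.8) is attained at a vertex of the cube [−1, 1]^n and equals the largest absolute row sum. The matrix A and all helper names are illustrative only.

```python
from itertools import product

def sup_norm(v):
    """Sup norm of a finite real vector."""
    return max(abs(t) for t in v)

def apply(A, x):
    """Matrix-vector product Ax for a matrix given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def op_norm_by_vertices(A):
    """Evaluate sup{||Ax|| : ||x|| <= 1} as in (8.8) over the vertices of the cube."""
    n = len(A[0])
    return max(sup_norm(apply(A, x)) for x in product((-1.0, 1.0), repeat=n))

def max_abs_row_sum(A):
    """Closed form of the same operator norm when both spaces carry the sup norm."""
    return max(sum(abs(a) for a in row) for row in A)

# Hypothetical example: an operator from (R^3, sup norm) to (R^2, sup norm).
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]

norm_A = op_norm_by_vertices(A)

# Inequality (8.7): ||Ax|| <= ||A|| ||x|| for a sample vector.
x = [0.3, -0.7, 0.2]
inequality_87_holds = sup_norm(apply(A, x)) <= norm_A * sup_norm(x) + 1e-12
```

The brute-force value and the row-sum formula agree here; the vertex enumeration is exponential in n and is meant only to mirror the definition, not to be an efficient method.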
8.2.1 Theorem. ‖T‖ is a norm on B(X, Y).
Proof. If ‖T‖ = 0, then T = 0 by (8.7). The triangle inequality follows from (8.8) and the
calculation ‖(S + T)x‖ ≤ ‖Sx‖ + ‖Tx‖ ≤ (‖S‖ + ‖T‖)‖x‖. Absolute homogeneity is an
easy consequence of definition (8.6).
8.2.2 Theorem. If Y is complete, then B(X, Y) is complete in the norm ‖T‖, that is,
B(X, Y) is a Banach space.
Proof. Let (T_n) be a Cauchy sequence in B(X, Y). In particular, M := sup_n ‖T_n‖ < ∞.
Since ‖T_n x − T_m x‖ ≤ ‖T_n − T_m‖ ‖x‖, (T_n x) is a Cauchy sequence in Y for each x ∈ X.
Thus there exists a unique element Tx ∈ Y such that T_n x → Tx. The mapping T : X → Y is
obviously linear, and the calculation ‖Tx‖ = lim_n ‖T_n x‖ ≤ M‖x‖ shows that T is bounded. To
see that ‖T_n − T‖ → 0, let ε > 0 and choose N so that ‖T_n − T_m‖ ≤ ε for all m, n ≥ N.
For such m and n and all x with ‖x‖ ≤ 1, ‖T_n x − T_m x‖ ≤ ε. Letting m → ∞ yields
‖(T_n − T)x‖ ≤ ε. Thus ‖T_n − T‖ ≤ ε for all n ≥ N.
8.2.3 Examples.
(a) (Convolution operator). For a fixed g ∈ L^1(R^d), the mapping T_g f := f ∗ g (§6.1) is a
bounded linear operator on L^1 with ‖T_g‖ = ‖g‖_1. The inequality ‖T_g‖ ≤ ‖g‖_1 follows from
6.1.1. For the reverse inequality, choose an approximate identity (φ_n) in L^1 (6.1.2). Then
lim_n ‖g ∗ φ_n − g‖_1 = 0, hence ‖T_g‖ ≥ ‖T_g φ_n‖_1 → ‖g‖_1.
(b) (Integral operator). Let X be a compact Hausdorff space, µ a Radon probability measure
on X, and k : X × X → C a continuous function. Define K on C(X) by

\[ (Kf)(x) = \int k(x, y)\, f(y)\, d\mu(y), \qquad f \in C(X). \]

Then K is a bounded linear operator with ‖K‖ ≤ ‖k‖_∞.


(c) (Multiplication operator). Let (X, F, µ) be a σ-finite measure space and φ ∈ L^∞(µ).
For 1 ≤ p < ∞, define a linear operator M_φ : L^p → L^p by M_φ f = φf. We claim that
‖M_φ‖ = ‖φ‖_∞. The inequality ‖M_φ‖ ≤ ‖φ‖_∞ follows from

\[ \|M_\varphi f\|_p = \Big( \int |\varphi f|^p \, d\mu \Big)^{1/p} \le \|\varphi\|_\infty \Big( \int |f|^p \, d\mu \Big)^{1/p} = \|\varphi\|_\infty \|f\|_p . \]

For the reverse inequality, recall that ‖φ‖_∞ = inf{t > 0 : |φ| ≤ t a.e.} (4.1.6). Thus for
0 < r < 1, the set on which |φ| ≥ r‖φ‖_∞ has positive measure. Since X is σ-finite, the
inequality holds on some set E of positive finite measure. Since f := µ(E)^{−1/p} 1_E has L^p
norm equal to one,

\[ \|M_\varphi\|^p \ge \|f\varphi\|_p^p = \mu(E)^{-1} \int_E |\varphi|^p \, d\mu \ge (r \|\varphi\|_\infty)^p . \]

Letting r → 1 shows that ‖M_φ‖ ≥ ‖φ‖_∞. ♦
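The identity ‖M_φ‖ = ‖φ‖_∞ can be illustrated in the simplest possible setting. The sketch below assumes counting measure on a four-point set, so that L^p is R^4 with the usual p-norm; the values of phi and p are hypothetical. The indicator of the point where |φ| is largest plays the role of the function f = µ(E)^{−1/p} 1_E in the proof.

```python
def p_norm(f, p):
    """p-norm of a finite real vector (counting measure)."""
    return sum(abs(t) ** p for t in f) ** (1.0 / p)

def multiply(phi, f):
    """The multiplication operator M_phi applied to f."""
    return [a * t for a, t in zip(phi, f)]

phi = [0.5, -3.0, 2.0, 1.0]   # hypothetical multiplier
p = 2.5
sup_phi = max(abs(a) for a in phi)

# Indicator of the point where |phi| is largest: a unit vector in the p-norm.
k = max(range(len(phi)), key=lambda i: abs(phi[i]))
f_peak = [1.0 if i == k else 0.0 for i in range(len(phi))]
ratio_at_peak = p_norm(multiply(phi, f_peak), p) / p_norm(f_peak, p)

# For any other nonzero test vector the ratio stays below the sup norm of phi.
samples = [[1.0, 1.0, 1.0, 1.0], [2.0, -1.0, 0.0, 3.0], [0.0, 0.5, -4.0, 0.1]]
ratios = [p_norm(multiply(phi, f), p) / p_norm(f, p) for f in samples]
```

In this finite setting the supremum is attained, which need not happen for a general measure space; that is why the proof works with the sets where |φ| ≥ r‖φ‖_∞.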



The Banach Algebra B(X)


We use the notation B(X) for B(X, X). Note that operator composition in B(X) satisfies
(RS)T = R(ST), R(aS + bT) = aRS + bRT, and (aS + bT)R = aSR + bTR.
Thus B(X) is an algebra under composition, with identity the identity operator I. Moreover,
one has
‖ST‖ ≤ ‖S‖ ‖T‖, T, S ∈ B(X). (8.9)
(Ex. 8.29.) Thus
8.2.4 Proposition. B(X) is a normed algebra under operator composition and a Banach
algebra if X is complete.
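Inequality (8.9) can be strict. As a small illustration, the sketch below takes X = R^2 with the sup norm, for which the operator norm of a matrix is assumed (a standard fact, not proved in the text) to be its largest absolute row sum. The matrices S and T are hypothetical examples; S is nilpotent, so ‖SS‖ = 0 < ‖S‖².

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def op_norm(A):
    """Operator norm on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

S = [[0.0, 1.0], [0.0, 0.0]]   # nilpotent: S composed with S is zero
T = [[2.0, -1.0], [1.0, 3.0]]

norm_ST = op_norm(matmul(S, T))   # equality in (8.9) for this pair
norm_TS = op_norm(matmul(T, S))   # strict inequality
norm_SS = op_norm(matmul(S, S))   # zero although ||S|| = 1
```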

The Dual Space X′

The space B(X, K) is called the dual space of X and is denoted by X′. If f ∈ X′ and
x ∈ X, the value f(x) is frequently written ⟨x, f⟩, depending on notational requirements.
Thus for a, b ∈ K, x, y ∈ X and f, g ∈ X′ we have the duality relations
⟨ax + by, f⟩ = ⟨ax, f⟩ + ⟨by, f⟩ and ⟨x, af + bg⟩ = ⟨x, af⟩ + ⟨x, bg⟩.
By Theorem 8.2.2, X′ is a Banach space under the norm
‖f‖ = sup{|⟨x, f⟩| : ‖x‖ ≤ 1}.
Moreover, we have the inequality
|⟨x, f⟩| ≤ ‖x‖ ‖f‖, x ∈ X, f ∈ X′.
We denote the closed unit ball in X′ by C′_1:
C′_1 := {x′ ∈ X′ : ‖x′‖ ≤ 1}.

Bilinear Transformations
Let X, Y, and Z be normed spaces over K. A mapping B : X × Y → Z is said to be
bilinear if B(x, y) is linear in x for each fixed y and linear in y for each fixed x. B is
said to be bounded if for some M > 0
‖B(x, y)‖ ≤ M‖x‖ ‖y‖ for all x ∈ X and y ∈ Y. (8.10)
The set BI(X × Y, Z) of all bounded bilinear mappings is easily seen to be a vector
space under pointwise addition and scalar multiplication. Defining ‖B‖ to be the infimum of the
constants M in (8.10), we have
8.2.5 Theorem. BI(X × Y, Z) is a normed space and
‖B‖ = sup{‖B(x, y)‖ : ‖x‖ ≤ 1, ‖y‖ ≤ 1}. (8.11)
Moreover, if Z is complete, then BI(X × Y, Z) is complete.
Proof. The proof is similar to that of 8.2.2. For example, to verify (8.11) let s denote
the supremum and let M be as in (8.10). If ‖x‖, ‖y‖ ≤ 1, then ‖B(x, y)‖ ≤ M, hence
s ≤ M. Taking the infimum of the M's yields s ≤ ‖B‖. Since ‖B(‖x‖^{−1}x, ‖y‖^{−1}y)‖ ≤ s
(x, y ≠ 0), we have ‖B(x, y)‖ ≤ s‖x‖ ‖y‖ and so ‖B‖ ≤ s by the definition of ‖B‖.
A bilinear transformation B : X × X → K is called a bilinear form on X. For example,
for f, g ∈ X′, the mapping f ⊗ g : X × X → K defined by (f ⊗ g)(x, y) = f(x)g(y) is a
bilinear form with ‖f ⊗ g‖ = ‖f‖ ‖g‖. The mapping (f, g) → ∫ fg dµ is a bilinear form on
L^2(µ) with |∫ fg dµ| ≤ ‖f‖_2 ‖g‖_2, hence with norm ≤ 1.

Exercises
8.26 Show that every infinite dimensional normed linear space has an unbounded operator. ⟦Extend
a linearly independent sequence (x_n) to a basis and start by defining Tx_n.⟧
8.27 Show that ‖T‖ = sup{‖Tx‖ : ‖x‖ = 1}.
8.28 Let X and Y be normed spaces and T : X → Y linear. Show that if T(S_r) is bounded for some
r > 0, then T ∈ B(X, Y).
8.29 Prove (8.9).

8.30 Let X and Y be normed spaces over K and T : X → Y linear. Prove that T is bounded iff
Σ_{n=1}^∞ Tx_n converges for all sequences (x_n) for which Σ_{n=1}^∞ x_n converges absolutely.

8.31 Let X and Y be normed spaces with X finite dimensional. Show that every linear transformation
T : X → Y is continuous.
8.32 [↑ 3.2.16] (Translation operator). For f : R^d → C, define (T_x f)(y) = f(x + y). Show that T_x
is an isometric isomorphism on both C_b(R^d) and L^p(λ^d) (1 ≤ p ≤ ∞). If we consider T_x on
C_b[0, ∞) and take x > 0, then ‖T_x‖ = 1 but T_x is not an isometry.
8.33 [↑ 3.2.16] (Dilation operator). For f : R^d → R, define (D_r f)(x) = f(rx), r ≠ 0. Show that D_r
is an isometric isomorphism on C_b(R^d) and |r|^{d/p} D_r is an isometric isomorphism on L^p(λ^d).
8.34 (Left and right shift operators). Define T_ℓ and T_r on sequences x = (x_n) by
T_ℓ(x_1, x_2, ...) = (x_2, x_3, ...) and T_r(x_1, x_2, ...) = (0, x_1, x_2, ...).
Clearly T_ℓ X ⊆ X and T_r X ⊆ X for the spaces ℓ^p, c_0, c. Show that in each case the operators
have norm one and that T_r is an isometry but T_ℓ is not. Show also that T_ℓ T_r = I ≠ T_r T_ℓ.
8.35 (Evaluation functional). For x ∈ [0, 1] define the linear functional x̂(f) = f(x), f ∈ C[0, 1].
Show that x̂ is continuous on C[0, 1] in the uniform norm but not in the L^1 norm.
8.36 Let P[0, 1] denote the space of all polynomials on [0, 1]. Show that the derivative operator
Df = f′ on P[0, 1] is unbounded in both the uniform norm and the L^1 norm.
8.37 Let X and Y be normed linear spaces such that B(X, Y) is complete. Show that if X′ ≠ {0},
then Y is complete.
8.38 Let X and Z be Banach spaces, Y a dense subspace of X, and T ∈ B(Y, Z). Show that T
extends uniquely to a member of B(X, Z) with the same norm.
8.39 Fix a ∈ ℓ^∞ and define T : ℓ^∞ → ℓ^∞ by T(x) = (a_1 x_1, a_2 x_2, ...). (a) Find ‖T‖. (b) Show that
ran(T) need not be closed. (c) If T is 1-1, show that T^{−1} may not be bounded on ran T.
8.40 For x ∈ c define L(x) = lim_n x_n. Show that L ∈ c′ and ‖L‖ = 1.
8.41 Let µ and ν be σ-finite measures on a measurable space (X, F) such that µ ≪ ν and set
ϕ = dµ/dν. Show that the mapping Tf = ϕ^{1/p} f is a linear isometry from L^p(µ) to L^p(ν). Show
that T is surjective iff ν ≪ µ.
8.42 Let (X, F, µ) be a σ-finite measure space and k : X × X → C measurable such that the functions
F(x) := ∫ |k(x, y)| dµ(y) and G(y) := ∫ |k(x, y)| dµ(x) are in L^∞. Let 1 < p < ∞ and let q be
conjugate to p. Show that the integral operator

\[ Kf(x) := \int k(x, y)\, f(y)\, d\mu(y), \qquad f \in L^p(\mu), \]

is bounded with norm ≤ ‖F‖_∞^{1/q} ‖G‖_∞^{1/p}.

8.43 Let X be strictly convex and P ∈ B(X) such that P² = P and ‖P‖ ≤ 1. Suppose that for
each x ∈ X there exists T_x ∈ B(X) with ‖T_x‖ ≤ 1 such that T_x Px = x. Show that P is the
identity operator.

8.3 Concrete Representations of Dual Spaces


Dual spaces play a fundamental role in functional analysis and appear frequently in the
development of the subject in the book. In this section we give several examples of concrete
representations of dual spaces. Later sections treat the properties of general dual spaces.
The arguments in the first two examples make frequent use of the signum function sgn z
defined in Chapter 0. We shall also need the special sequences

e_n := (0, ..., 0, 1, 0, ...) (with 1 in the nth position) and e := (1, 1, ...).

The Dual of c_0 is ℓ^1
For x = (x_1, x_2, ...) ∈ ℓ^1, define a linear map f_x by

\[ f_x(y) := \sum_{n=1}^{\infty} x_n y_n, \qquad y := (y_1, y_2, \ldots) \in c_0 . \]

We show that the mapping x → f_x is an isometric isomorphism of ℓ^1 onto c_0′ with inverse
f → x_f, where

\[ x_f := \big( f(e_1), f(e_2), \ldots \big). \]

Clearly, f_x ∈ c_0′ with ‖f_x‖ ≤ ‖x‖_1. Now let f ∈ c_0′ be arbitrary and set y_j := sgn f(e_j).
Then

\[ y^{(n)} := \sum_{j=1}^{n} y_j e_j = (y_1, \ldots, y_n, 0, 0, \ldots) \in c_0 \quad\text{and}\quad \|y^{(n)}\|_\infty \le 1, \]

hence

\[ \|f\| \ge |\langle y^{(n)}, f \rangle| = \Big| \sum_{j=1}^{n} y_j f(e_j) \Big| = \sum_{j=1}^{n} |f(e_j)|, \]

which shows that ‖f‖ ≥ ‖x_f‖_1.
Moreover, if z = (z_1, z_2, ...) ∈ c_0 and z^{(n)} := (z_1, ..., z_n, 0, ...), then ‖z^{(n)} − z‖_∞ → 0
and so f(z) = lim_n f(z^{(n)}) = lim_n Σ_{k=1}^n z_k f(e_k) = f_{x_f}(z). Therefore, f = f_{x_f}.
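The norming vectors y^{(n)} used in the argument above can be watched at work on a finitely supported example. The sketch below is only an illustration: the sequence x is hypothetical, lists stand in for sequences that vanish beyond their length, and sgn is assumed to follow the convention z · sgn z = |z| (that is, sgn z = conj(z)/|z| and sgn 0 = 0).

```python
def sgn(z):
    """Signum with z * sgn(z) = |z| (assumed convention); sgn(0) = 0."""
    return z.conjugate() / abs(z) if z != 0 else 0j

def pairing(x, y):
    """f_x(y) = sum of x_n * y_n for finitely supported sequences."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical element of l^1 with finite support.
x = [complex(3, 4), complex(0, -2), complex(-1, 0), 0j, complex(0.5, 0)]
l1_norm = sum(abs(a) for a in x)

def norming_vector(x, n):
    """y^(n) = (sgn x_1, ..., sgn x_n, 0, 0, ...), an element of c_0 of sup norm <= 1."""
    return [sgn(x[j]) if j < n else 0j for j in range(len(x))]

# |f_x(y^(n))| equals the partial sum |x_1| + ... + |x_n| of the l^1 norm.
values = [abs(pairing(x, norming_vector(x, n))) for n in range(len(x) + 1)]
```

The list `values` increases to the ℓ^1 norm of x, which is the finite-dimensional shadow of the inequality ‖f‖ ≥ ‖x_f‖_1.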

The Dual of c is ℓ^1
For x = (x_1, x_2, ...) ∈ ℓ^1 define a linear map f_x by

\[ f_x(y) := x_1 \lim_n y_n + \sum_{n=1}^{\infty} x_{n+1} y_n, \qquad y := (y_1, y_2, \ldots) \in c . \]

We show that the mapping x → f_x is an isometric isomorphism of ℓ^1 onto c′ with inverse
f → x_f, where

\[ x_f := \Big( f(e) - \sum_{n=1}^{\infty} f(e_n),\ f(e_1),\ f(e_2),\ \ldots \Big). \]

Clearly, x → f_x is linear. Moreover, since |lim_n y_n| ≤ ‖y‖_∞, ‖f_x‖ ≤ ‖x‖_1. Now let
f ∈ c′. As above, Σ_{j=1}^∞ |f(e_j)| < ∞, hence x_f ∈ ℓ^1. Set

\[ d_n = (0, \ldots, 0, 1, 1, \ldots) = e - \sum_{j=1}^{n} e_j \quad\text{and}\quad y^{(n)} = \operatorname{sgn}(f(d_n))\, d_n + \sum_{j=1}^{n} \operatorname{sgn} f(e_j)\, e_j , \]

where the first n entries of d_n are zero. Then y^{(n)} ∈ c and ‖y^{(n)}‖ ≤ 1, and since
f(d_n) → f(e) − Σ_{n=1}^∞ f(e_n) we have

\[ \|f\| \ge |\langle y^{(n)}, f \rangle| = |f(d_n)| + \sum_{j=1}^{n} |f(e_j)| \to \|x_f\|_1 . \]

Finally, if z = (z_1, z_2, ...) ∈ c, α := lim_n z_n, and z^{(n)} := (z_1, ..., z_n, α, α, ...), then
‖z^{(n)} − z‖ → 0, hence

\[ f(z) = \lim_n f(z^{(n)}) = \lim_n \Big( \alpha f(d_n) + \sum_{k=1}^{n} z_k f(e_k) \Big) = f_{x_f}(z). \]

Therefore, f = f_{x_f}, completing the proof.
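As a quick sanity check of the formulas for f_x and x_f, consider the limit functional L(y) = lim_n y_n on c (the functional of Ex. 8.40). The computation below is only an illustration; it uses nothing beyond L(e) = 1 and L(e_n) = 0 for every n.

```latex
\[
x_L = \Big( L(e) - \sum_{n=1}^{\infty} L(e_n),\ L(e_1),\ L(e_2),\ \ldots \Big) = (1, 0, 0, \ldots),
\qquad
f_{x_L}(y) = 1 \cdot \lim_n y_n + \sum_{n=1}^{\infty} 0 \cdot y_n = L(y),
\]
\[
\|L\| = \|x_L\|_1 = 1 .
\]
```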

The Dual of L^p is L^q
Let (X, F, µ) be a σ-finite measure space, 1 ≤ p < ∞, and let q be conjugate to p. For
g ∈ L^q define ϕ_g on L^p by

\[ \phi_g(f) = \int f g \, d\mu, \qquad f \in L^p . \]

We show that the mapping g ↦ ϕ_g is an isometric isomorphism from L^q onto (L^p)′.
Clearly, ϕ_g is linear, and by Hölder's inequality ‖ϕ_g‖ ≤ ‖g‖_q. Now let ϕ ∈ (L^p)′ be
arbitrary. It remains to find a g ∈ L^q such that ϕ = ϕ_g and ‖ϕ_g‖ ≥ ‖g‖_q.
Suppose that the existence of g has been established for finite measure spaces. For the
σ-finite case, let X_n ↑ X with 0 < µ(X_n) < ∞ for all n. Consider the restriction ϕ_n of ϕ to
the subspace L^p_n of L^p consisting of those functions that vanish outside X_n. We identify L^p_n
with L^p(X_n, F ∩ X_n, µ) and make the analogous identification for L^q_n. For each n choose a
function g_n in L^q_n such that

\[ \phi_n(f) = \int f g_n \, d\mu, \quad \phi_n := \phi|_{L^p_n}, \quad f \in L^p_n, \qquad \|g_n\|_q = \|\phi_n\| \le \|\phi\| . \]

Since L^p_n ⊆ L^p_{n+1}, g_{n+1} = g_n a.e. on X_n, hence we may define a measurable function g on
X such that g = g_n on X_n. Since |g_n| ≤ |g_{n+1}| on X_{n+1}, ‖g_n‖_q → ‖g‖_q by the monotone
convergence theorem, hence ‖g‖_q ≤ ‖ϕ‖. Furthermore, if f ∈ L^p(µ), then ‖f 1_{X_n} − f‖_p → 0 by
the dominated convergence theorem and so

\[ \phi(f) = \lim_n \phi(f 1_{X_n}) = \lim_n \int f g_n \, d\mu = \lim_n \int f 1_{X_n} g \, d\mu = \int f g \, d\mu = \phi_g(f). \]

Thus if the assertion holds for the finite case then it holds for the σ-finite case.
We now establish the existence of g for the case µ(X) < ∞. To this end, define a set
function ν on F by ν(E) = ⟨1_E, ϕ⟩. Then ν is countably additive. Indeed, if (E_n) is a disjoint
sequence in F with union E, then

\[ \int \Big| 1_E - \sum_{j=1}^{n} 1_{E_j} \Big|^p d\mu = \int \sum_{j>n} 1_{E_j} \, d\mu = \sum_{j>n} \mu(E_j) \to 0, \]

that is, Σ_{j=1}^n 1_{E_j} → 1_E in L^p. Countable additivity now follows from the linearity and
continuity of ϕ.
Next, observe that the inequality |ν(E)| ≤ ‖ϕ‖ ‖1_E‖_p implies that ν ≪ µ. Thus, by the
Radon-Nikodym theorem, there exists a function g ∈ L^1(µ) such that ⟨1_E, ϕ⟩ = ∫_E g dµ for
all E ∈ F. In particular, for all simple functions f, we have

\[ \phi(f) = \int f g \, d\mu. \tag{a} \]

We claim that fg ∈ L^1 and that (a) holds for all f ∈ L^p(µ). Define f_n = 1_{E_n} f, where
E_n = {|f| ≤ n}. For each n choose a sequence of simple functions (f_{n,k})_k such that
f_{n,k} → f_n a.e. and |f_{n,k}| ≤ |f_n| for all k (2.3.1). By the dominated convergence theorem,
lim_k ‖f_{n,k} − f_n‖_p = 0, hence from (a)

\[ \phi(f_n) = \lim_k \phi(f_{n,k}) = \lim_k \int f_{n,k}\, g \, d\mu = \int f_n g \, d\mu, \tag{b} \]

the last equality by the dominated convergence theorem, since |f_{n,k} g| ≤ |ng|. Now set
|g| = e^{iθ} g. Replacing f by e^{iθ}|f| in the above, we have

\[ \|\phi\| \|f\|_p \ge \|\phi\| \|f_n\|_p = \|\phi\| \, \big\| e^{i\theta} |f_n| \big\|_p \ge \big| \phi(e^{i\theta} |f_n|) \big| = \Big| \int g e^{i\theta} |f_n| \, d\mu \Big| = \int |g f_n| \, d\mu . \]

Letting n → ∞ and applying Fatou's lemma yields

\[ \|\phi\| \|f\|_p \ge \int |g f| \, d\mu, \tag{c} \]

hence fg ∈ L^1. Moreover, since ‖f − f_n‖_p^p = ∫_{|f|>n} |f|^p dµ → 0 we have ϕ(f_n) → ϕ(f). Using
the dominated convergence theorem in (b), we see that (a) holds for all f ∈ L^p(µ).
We now show that g ∈ L^q and that ‖g‖_q ≤ ‖ϕ‖, completing the argument. Suppose first
that q < ∞. Define g_n = g if |g| ≤ n and g_n = 0 otherwise, so that

\[ g_n \to g, \quad |g_n| \le |g|, \quad\text{and}\quad \|g\|_q \le \lim_n \|g_n\|_q, \tag{d} \]

the last by Fatou's lemma. By (c), ‖ϕ‖ ‖f‖_p ≥ ∫ |gf| dµ ≥ ∫ |g_n f| dµ (f ∈ L^p). Taking
f = |g_n|^{q/p} and applying Ex. 4.5 we have

\[ \|\phi\| \Big( \int |g_n|^q \Big)^{1/p} \ge \int |g_n| \cdot |g_n|^{q/p} = \Big( \int |g_n|^q \Big)^{1/p} \Big( \int |g_n|^q \Big)^{1/q}, \]

hence ‖ϕ‖ ≥ ‖g_n‖_q. Therefore, by (d), ‖ϕ‖ ≥ ‖g‖_q.
Now suppose q = ∞. Set A := {|g| > ‖ϕ‖ + ε} (ε > 0) and suppose that µ(A) > 0. Define
f(x) = µ(A)^{−1} sgn g(x) 1_A(x). Then ‖f‖_1 = 1 and ∫ fg dµ = µ(A)^{−1} ∫_A |g| dµ > ‖ϕ‖ + ε,
contradicting (c). Therefore, |g| ≤ ‖ϕ‖ + ε a.e. and so ‖g‖_∞ ≤ ‖ϕ‖, as required. ♦
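The norm equality ‖ϕ_g‖ = ‖g‖_q rests on the fact that Hölder's inequality becomes an equality for f = sgn(g)|g|^{q/p}, the kind of test function used in the argument above. The sketch below checks this numerically under simplifying assumptions: a finite measure space with four atoms of hypothetical weights w, a real-valued g, and p = 3.

```python
def integral(h, w):
    """Integral of h against the measure with weight w[i] at the i-th atom."""
    return sum(a * b for a, b in zip(h, w))

def lp_norm(h, w, r):
    """L^r norm with respect to the weighted counting measure."""
    return integral([abs(t) ** r for t in h], w) ** (1.0 / r)

def sign(t):
    """Real signum: 1, 0 or -1."""
    return (t > 0) - (t < 0)

w = [0.5, 1.0, 2.0, 0.25]     # hypothetical weights of the four atoms
g = [1.0, -2.0, 0.5, 4.0]     # hypothetical element of L^q
p = 3.0
q = p / (p - 1.0)             # conjugate exponent, here 1.5

# Extremal test function f = sgn(g) |g|^(q/p).
f = [sign(t) * abs(t) ** (q / p) for t in g]
phi_g_f = integral([a * b for a, b in zip(f, g)], w)
holder_bound = lp_norm(f, w, p) * lp_norm(g, w, q)

# For a generic f2 the inequality is strict.
f2 = [1.0, 1.0, -1.0, 0.5]
phi_g_f2 = integral([a * b for a, b in zip(f2, g)], w)
holder_bound2 = lp_norm(f2, w, p) * lp_norm(g, w, q)
```

Here `phi_g_f` and `holder_bound` agree up to rounding, so ϕ_g attains the value ‖g‖_q ‖f‖_p at this f.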

The Dual of C_0(X) is M_ra(X)

Let X be a locally compact Hausdorff space. For µ ∈ M_ra(X) define

\[ \phi_\mu(f) = \int f \, d\mu, \qquad f \in C_0(X). \]

We show that the mapping µ → ϕ_µ is an isometry from M_ra(X) onto C_0(X)′. This result is
known as the Riesz representation theorem.
Clearly, ϕ_µ is linear and |ϕ_µ(f)| ≤ ‖µ‖ ‖f‖_∞, hence ϕ_µ ∈ C_0(X)′ and ‖ϕ_µ‖ ≤ ‖µ‖. To
show equality, let µ = e^{iθ}|µ| be the polar decomposition of µ (5.3.6(b)). Since |µ| is a Radon
measure (7.1.5), by Lusin's theorem (7.1.3) given ε > 0 there exists g ∈ C_c(X) such that
|g| ≤ 1 and g = e^{−iθ} on a set E with |µ|(E^c) < ε/2. Then

\[ \|\mu\| = \int e^{-i\theta} \, d\mu \le \Big| \int g \, d\mu \Big| + \Big| \int_{E^c} (e^{-i\theta} - g) \, d\mu \Big| \le |\phi_\mu(g)| + 2|\mu|(E^c) \le \|\phi_\mu\| + \varepsilon, \]

hence ‖µ‖ ≤ ‖ϕ_µ‖.


It remains to show that if ϕ ∈ C_0(X)′ then ϕ = ϕ_µ for some µ ∈ M_ra(X). Assume first
that ϕ is real-valued on the real linear space C_0(X, R). Define ϕ^+ on C_0(X, [0, ∞)) by

ϕ^+(f) := sup{ϕ(g) : g ∈ C_0(X, R), 0 ≤ g ≤ f}.

Clearly ϕ^+(f) ≥ 0 and ϕ^+(cf) = cϕ^+(f) for c ≥ 0. We claim that

ϕ^+(f_1 + f_2) = ϕ^+(f_1) + ϕ^+(f_2). (†)

Let 0 ≤ g_j ≤ f_j. Then 0 ≤ g_1 + g_2 ≤ f_1 + f_2, hence ϕ(g_1) + ϕ(g_2) ≤ ϕ^+(f_1 + f_2). Taking
suprema over g_1 and g_2 yields ϕ^+(f_1) + ϕ^+(f_2) ≤ ϕ^+(f_1 + f_2). For the reverse inequality, let
0 ≤ g ≤ f_1 + f_2 and set g_1 = g ∧ f_1 and g_2 = g − g_1. Then 0 ≤ g_i ≤ f_i, hence
ϕ(g) = ϕ(g_1 + g_2) ≤ ϕ^+(f_1) + ϕ^+(f_2). Taking the supremum over g yields
ϕ^+(f_1 + f_2) ≤ ϕ^+(f_1) + ϕ^+(f_2).
Next, extend the definition of ϕ^+ to C_0(X, R) by defining ϕ^+(f) = ϕ^+(f^+) − ϕ^+(f^−).
Using (†) one shows by an argument entirely similar to the first part of the proof of 3.2.12
that ϕ^+ is linear on C_0(X, R). Defining ϕ^− = ϕ^+ − ϕ we now have ϕ = ϕ^+ − ϕ^−, where ϕ^±
are positive linear functionals on C_0(X, R).
By 7.2.1, there exist unique Radon measures µ^± such that

\[ \phi^+(f) = \int f \, d\mu^+ \quad\text{and}\quad \phi^-(f) = \int f \, d\mu^- \quad\text{for all } f \in C_c(X, \mathbb{R}). \]

Then µ^+ − µ^− is a signed Radon measure on X such that

\[ \phi(f) = \phi^+(f) - \phi^-(f) = \int f \, d(\mu^+ - \mu^-), \qquad f \in C_c(X, \mathbb{R}). \]

For a complex-valued ϕ, apply this result to Re ϕ and Im ϕ to obtain Radon measures µ_r,
µ_i such that for all f ∈ C_c(X, R),

\[ \phi(f) = \operatorname{Re} \phi(f) + i \operatorname{Im} \phi(f) = \int f \, d\mu_r + i \int f \, d\mu_i = \int f \, d(\mu_r + i\mu_i). \tag{$\ddagger$} \]

By considering real and imaginary parts of f ∈ C_c(X) we see that (‡) holds for f ∈ C_c(X).
Since C_c(X) is dense in C_0(X), (‡) holds for all f ∈ C_0(X). Therefore, ϕ = ϕ_µ with
µ = µ_r + iµ_i. ♦

Exercises
8.44 Show that the dual of c_00 is ℓ^1.
8.45 Let ba(N) denote the linear space of finitely additive, complex set functions µ on N with the
total variation norm ‖µ‖. (The latter is defined exactly as in the case of complex measures.) If
g = Σ_{j=1}^n a_j 1_{E_j} is a simple function in standard form, then ∫ g dµ may be defined as in §3.1.
Moreover, (a) and (b) of 3.1.1 (linearity) hold since only finite additivity is used in the proof.
Verify the following to show that the dual of ℓ^∞(N) is ba(N).
(a) For µ ∈ ba(N), ϕ_µ(g) := ∫ g dµ is a bounded linear functional on the subspace of ℓ^∞
consisting of simple functions. Therefore, ϕ_µ extends to a bounded linear functional on ℓ^∞ such
that |ϕ_µ(f)| ≤ ‖µ‖ ‖f‖_∞.
(b) ‖ϕ_µ‖ = ‖µ‖. ⟦Let E_1, ..., E_n be a partition of N with Σ_{j=1}^n |µ(E_j)| > ‖µ‖ − ε.⟧
(c) The mapping µ → ϕ_µ is an isometric isomorphism from ba(N) onto the dual of ℓ^∞.

8.46 Give the space C^k[0, 1] of k-times continuously differentiable functions on [0, 1] the norm

\[ |\!|\!| f |\!|\!| = \sum_{j=0}^{k} \|f^{(j)}\|_\infty . \]

(a) Show that C^k[0, 1] is a Banach space.
(b) Show that for any a ∈ [0, 1], an equivalent norm is Σ_{j=0}^{k−1} |f^{(j)}(a)| + ‖f^{(k)}‖_∞.
(c) Show that the dual of C^k[0, 1] consists of all functions of the form

\[ \phi_{\mathbf{a},\mu}(f) := \mathbf{a} \cdot \vec{f}(a) + \int f^{(k)} \, d\mu, \]

where µ is a complex Radon measure on [0, 1], 𝐚 = (a_0, ..., a_{k−1}) ∈ K^k is a vector, and
\vec{f}(a) = (f(a), f′(a), ..., f^{(k−1)}(a)).
(d) Show that the mapping S : (𝐚, µ) → ϕ_{𝐚,µ} is a topological isomorphism from the product
space K^k × M_ra[0, 1] onto the dual of C^k[0, 1].

8.4 Some Constructions


In this section we describe several standard ways of constructing new normed spaces from
given spaces.

Product Spaces
Let X and Y be normed linear spaces over K. The product vector space is the set
X × Y together with the operations

(x1 , y1 ) + (x2 , y2 ) := (x1 + x2 , y1 + y2 ), c(x, y) := (cx, cy).

There is no canonical norm for X × Y; however, the following equivalent norms are frequently
used:

\[ \|(x, y)\|_1 := \|x\| + \|y\|, \qquad \|(x, y)\|_2 := \sqrt{\|x\|^2 + \|y\|^2}, \qquad \|(x, y)\|_\infty := \max\{\|x\|, \|y\|\}. \]

Each of these norms induces the product topology on X × Y. More generally, we have the
following result, which may be seen as a direct consequence of 8.1.3. The proof is left as an
exercise (8.47).
8.4.1 Proposition. All norms on X × Y that generate the product topology are equivalent.
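The equivalence of the three norms listed before the proposition can be checked directly. The elementary inequalities max{a, b} ≤ √(a² + b²) ≤ a + b ≤ 2 max{a, b} for a, b ≥ 0, applied with a = ‖x‖ and b = ‖y‖, give the following chain (an illustration, not a substitute for the general statement 8.4.1):

```latex
\[
\|(x, y)\|_\infty \;\le\; \|(x, y)\|_2 \;\le\; \|(x, y)\|_1 \;\le\; 2\, \|(x, y)\|_\infty ,
\qquad (x, y) \in X \times Y .
\]
```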
The projection maps PX : X × Y → X and PY : X × Y → Y are defined by

PX (x, y) = x and PY (x, y) = y.

These are clearly linear and continuous in the product topology. The straightforward proof
of the following proposition is left to the reader.

8.4.2 Proposition. Let X, Z, and Y be normed spaces.


(a) X × Y is complete iff X and Y are complete.
(b) The projection mappings are open.
(c) If T : Z → X × Y is linear, then T is bounded iff PX T and PY T are bounded.
The preceding discussion may be generalized to a product X1 × · · · × Xn of finitely many
normed linear spaces. The analogs of 8.4.1 and 8.4.2 are easily seen to hold in this setting.
The details are left to the reader.

Direct Sums
Let Z be a vector space over K with subspaces X and Y. Then Z is said to be the
algebraic direct sum of X and Y if the following conditions hold:

Z = X + Y := {x + y : x ∈ X, y ∈ Y} and X ∩ Y = {0}. (8.12)

In this case we write


Z = X ⊕ Y.
Conditions (8.12) are equivalent to the property that every member of Z is uniquely
expressible as a sum x + y, x ∈ X, y ∈ Y. This in turn is equivalent to X × Y being
algebraically isomorphic to Z under the mapping

(x, y) ↦ x + y : X × Y → Z. (8.13)

Projection mappings PX and PY on X ⊕ Y are defined by analogy with product spaces:

PX (x + y) = x and PY (x + y) = y.

Uniqueness of representation implies that the mappings are well-defined. Moreover, the
mappings are easily seen to be linear. The identities

PX + PY = I, PX² = PX, PY² = PY, and PX PY = PY PX = 0 (8.14)

follow easily from the definitions. Furthermore,

X = ran PX = ker PY and Y = ran PY = ker PX . (8.15)

Conversely, if PX and PY are linear mappings on Z that satisfy (8.14), then Z = X ⊕ Y,
where X := ran PX and Y := ker PX. Indeed, writing P := PX, any z ∈ Z may be written
z = Pz + (z − Pz), where Pz ∈ X and z − Pz ∈ Y (because P(z − Pz) = Pz − P²z = 0).
Since z ∈ X ∩ Y ⇒ z = Pz = 0, the assertion follows.
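The identities (8.14) are easy to verify in a small concrete case. The sketch below is an illustration with hypothetical choices: Z = R^2, X = span{(1, 0)} and Y = span{(1, 1)}. Since (a, b) = (a − b)(1, 0) + b(1, 1), the projections are PX(a, b) = (a − b, 0) and PY(a, b) = (b, b), written here as 2 × 2 integer matrices.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# Projections associated with Z = X (+) Y, X = span{(1, 0)}, Y = span{(1, 1)}.
PX = [[1, -1], [0, 0]]   # (a, b) -> (a - b, 0)
PY = [[0, 1], [0, 1]]    # (a, b) -> (b, b)
IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# The identities (8.14).
identities_hold = (
    add(PX, PY) == IDENTITY
    and matmul(PX, PX) == PX
    and matmul(PY, PY) == PY
    and matmul(PX, PY) == ZERO
    and matmul(PY, PX) == ZERO
)
```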
Now let Z be a normed space with subspaces X and Y such that Z = X ⊕ Y algebraically, and
let X × Y have a norm that generates the product topology. The algebraic isomorphism
(x, y) ↦ x + y is obviously continuous. If the inverse x + y ↦ (x, y) is also continuous,
that is, if (x, y) ↦ x + y is a topological isomorphism, then Z is said to be the
topological direct sum of X and Y.
8.4.3 Proposition. Let Z be a normed space that is the algebraic direct sum of subspaces
X and Y. Then Z is a topological direct sum iff PX (equivalently PY) is continuous.
Proof. The map PX is the composition of the algebraic isomorphism x + y ↦ (x, y) with
the continuous map (x, y) ↦ x. Hence if (x, y) ↦ x + y is a topological isomorphism,
then PX is continuous. Conversely, if PX is continuous, then so is PY = I − PX, hence
x_n + y_n → x + y ⇒ x_n → x and y_n → y ⇒ (x_n, y_n) → (x, y). Therefore, (x, y) ↦
x + y is a topological isomorphism.

The notion of a normed linear space X as an algebraic direct sum X_1 ⊕ · · · ⊕ X_n of
subspaces X_j of X is defined analogously. The requirement here is that every member of X
be uniquely expressible as a sum x_1 + · · · + x_n with x_j ∈ X_j. The associated projection
mappings P_j, defined by P_j(x_1 + · · · + x_n) = x_j, satisfy

\[ \sum_{j=1}^{n} P_j = I, \qquad P_j^2 = P_j, \qquad\text{and}\qquad P_i P_j = 0, \quad i \ne j . \]

Topological direct sums are defined as above by requiring that the map x_1 + · · · + x_n ↦
(x_1, ..., x_n) be continuous. The proof of the following proposition is a straightforward
modification of that of 8.4.3.
8.4.4 Proposition. Let X be a normed space which is the algebraic direct sum of sub-
spaces X1 , . . . , Xn . Then X is the topological direct sum iff the projection mappings Pj are
continuous.

Quotient Spaces
Recall that if Y is a subspace of a linear space X, then X/Y is the linear space of all
equivalence classes x + Y with the operations
(x1 + Y) + (x2 + Y) = (x1 + x2 ) + Y and c(x + Y) = cx + Y.
Relative to these operations the quotient map
Q : X → X/Y, Qx := x + Y,
is linear with kernel Y. We show in this subsection that if X is a normed space, then the
quotient space has a natural norm, called the quotient norm, with respect to which Q is
continuous and open.
8.4.5 Theorem. Let Y be a closed linear subspace of a normed space X. Then
‖Qx‖ = ‖x + Y‖ := inf{‖x + y‖ : y ∈ Y} (8.16)
defines a norm on X/Y. Moreover, if X is complete, then so is X/Y.
Proof. Since 0 ∈ Y, ‖Q(0)‖ = 0. Let x, x_1, x_2 ∈ X and c ∈ K. If c ≠ 0, then
‖c(x + Y)‖ = inf{‖cx + y‖ : y ∈ Y} = |c| inf{‖x + c^{−1}y‖ : y ∈ Y} = |c| ‖x + Y‖.
For the triangle inequality, note that for any y_1, y_2 ∈ Y,
‖(x_1 + Y) + (x_2 + Y)‖ ≤ ‖x_1 + x_2 + y_1 + y_2‖ ≤ ‖x_1 + y_1‖ + ‖x_2 + y_2‖.
Taking infima over y_1 and y_2 yields ‖(x_1 + Y) + (x_2 + Y)‖ ≤ ‖x_1 + Y‖ + ‖x_2 + Y‖.
For positivity, assume that ‖Qx‖ = 0. Then there exists a sequence (y_n) in Y such that
‖x + y_n‖ → 0. Since Y is closed, x ∈ Y, hence x + Y = Y, that is, Qx = 0. Therefore,
(8.16) defines a norm.
Now assume that X is complete. To show that X/Y is complete, we use 0.4.3: Let (x_n)
be a sequence in X such that Σ_{n=1}^∞ ‖x_n + Y‖ < ∞. For each n choose y_n ∈ Y such that
‖x_n + y_n‖ < ‖x_n + Y‖ + 1/2^n. Then Σ_{n=1}^∞ ‖x_n + y_n‖ < ∞, so the sequence of partial
sums Σ_{j=1}^n (x_j + y_j) converges to some x ∈ X. Since

\[ \Big\| \sum_{j=1}^{n} (x_j + Y) - (x + Y) \Big\| \le \Big\| \sum_{j=1}^{n} (x_j + y_j) - x \Big\| , \]

the series Σ_{n=1}^∞ (x_n + Y) converges to x + Y.
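The quotient norm (8.16) is the distance from x to Y, which can be computed by hand in small examples. The sketch below assumes X = R^2 with the sup norm and Y = {(t, t) : t real}, a hypothetical choice for which ‖x + Y‖ = |x_1 − x_2|/2. The infimum is approximated by ternary search, which applies because t ↦ ‖x + t(1, 1)‖ is convex; the search bracket is an assumption that must contain a minimizer.

```python
def sup_norm(v):
    """Sup norm on R^2."""
    return max(abs(v[0]), abs(v[1]))

def quotient_norm(x, lo=-100.0, hi=100.0, iters=200):
    """Approximate inf{||x + t(1, 1)|| : t real} by ternary search on [lo, hi]."""
    def objective(t):
        return sup_norm((x[0] + t, x[1] + t))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # For a convex objective a minimizer lies in [lo, m2] when f(m1) <= f(m2).
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective((lo + hi) / 2.0)

x = (3.0, -1.0)
norm_Qx = quotient_norm(x)                 # close to |3 - (-1)| / 2 = 2
norm_Q_of_Y = quotient_norm((5.0, 5.0))    # close to 0, since (5, 5) lies in Y
```

In particular ‖Qx‖ ≤ ‖x‖ in this example, and elements of Y are sent to zero, as the quotient construction requires.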

8.4.6 Theorem. Let Y be a closed linear subspace of a normed space X and let X/Y have
the quotient norm. Then the quotient map Q has the following properties:
(a) Q is a bounded linear operator. If Y ≠ X, then ‖Q‖ = 1.
(b) Q is an open mapping.
(c) If Z is a normed space and T : X/Y → Z a linear mapping such that TQ is bounded,
then T is bounded.
Proof. (a) By (8.16), ‖Qx‖ ≤ ‖x‖, hence ‖Q‖ ≤ 1. If Y ≠ X, then ‖Qx‖ = 1 for some
x ∈ X and so for each r > 1 there exists y_r ∈ Y such that ‖x + y_r‖ < r. Then
‖Q‖ ‖x + y_r‖ ≥ ‖Q(x + y_r)‖ = ‖Qx‖ = 1, hence ‖Q‖ ≥ 1/‖x + y_r‖ > 1/r. Since r was
arbitrary, ‖Q‖ ≥ 1.
(b) Note first that Q(B_r(0)) = B_r(Q(0)) (Ex. 8.57). Since an open set is a union of open
balls and since translations of open balls are open balls, it follows that Q is open.
(c) If TQ is continuous and V is open in Z, then U := Q^{−1}(T^{−1}(V)) is open in X
and so T^{−1}(V) = Q(U) is open in X/Y by (b).
The machinery of quotient spaces allows a simple proof of the following result.
8.4.7 Proposition. Let X be a normed space, Y a closed subspace of X, and F a finite
dimensional subspace of X. Then Y + F is a closed subspace of X.
Proof. Since Q(F) is a finite dimensional subspace of X/Y, it is closed. Therefore, since Q
is continuous, Y + F = Q^{−1}(Q(F)) is closed.

Exercises
8.47 Prove 8.4.1.

8.48 Prove 8.4.2.

8.49 Prove that c is the topological direct sum Ke ⊕ c0 , where e = (1, 1, . . .).
8.50 Let (X_n) be a sequence of normed spaces. The product vector space X := ∏_n X_n is the
collection of all sequences (x_1, x_2, ...) (x_n ∈ X_n), with coordinate-wise addition and scalar
multiplication. Show that there exists a norm on X that induces the product topology of X iff
X_n = {0} for all sufficiently large n.

8.51 [↑ 8.50] Let (X_n) be a sequence of normed spaces and let ∏_n X_n have the product vector space
structure. For x = (x_1, x_2, ...) define ‖x‖_∞ := sup_n ‖x_n‖. Show that X := {x : ‖x‖_∞ < ∞}
is a normed space under ‖·‖_∞. Show also that X is complete iff each space X_n is complete.

8.52 Let Z be a linear space with Z = X ⊕ Y and let T : Z → Z be linear. Prove that T PX = PX T
iff T X ⊆ X and T Y ⊆ Y.

8.53 Let Z be a normed space that is a topological direct sum of closed subspaces X and Y. Show
that Z/X is topologically isomorphic to Y.

8.54 Let X denote any of the sequence spaces c0 , c, `1 , or `∞ . Show that X = X1 ⊕ X2 , where the
summands are closed subspaces of X isometrically isomorphic to X.

8.55 Let X be a normed space, Y a complete (hence closed) subspace of X. Show that if X/Y is
complete, then X is complete.

8.56 Let ‖·‖ be a seminorm on a linear space X. The notions of sequential convergence and Cauchy
sequence still make sense in this setting, except that limits, if they exist, may not be unique.
The device for handling this situation is as follows: Let Y = {y ∈ X : ‖y‖ = 0}. Show that Y
is a linear subspace of X and that ‖x + Y‖ = ‖x‖ defines a norm on Z := X/Y. Show also
that if the seminorm has the property that every Cauchy sequence converges in X then Z is
complete.
8.57 Let Y be a closed linear subspace of a normed space X and Q : X → X/Y the quotient map.
Show that Q(B_r(0)) = B_r(Q(0)).
8.58 Let X be a noncompact locally compact Hausdorff space and let X∞ := X ∪ {∞} be the
one-point compactification of X (§ 0.12). Show that C0 (X) is isometrically isomorphic to the
space {f ∈ C(X∞ ) : f (∞) = 0} and that C(X∞ ) is topologically isomorphic to the direct
product C0 (X) × C.
8.59 Let X and Y be normed spaces with dual spaces X′ and Y′, and let X × Y and X′ × Y′ have the
‖·‖_2-norms. Given z′ ∈ (X × Y)′, define Sz′ ∈ X′ and Tz′ ∈ Y′ by ⟨x, Sz′⟩ = ⟨(x, 0), z′⟩ and
⟨y, Tz′⟩ = ⟨(0, y), z′⟩. Show that the mapping Rz′ := (Sz′, Tz′) is an isometric isomorphism
from (X × Y)′ onto X′ × Y′ such that ⟨(x, y), z′⟩ = ⟨x, Sz′⟩ + ⟨y, Tz′⟩.
⟦For ‖z′‖ ≤ ‖Rz′‖ use the Cauchy-Schwarz inequality in K^2. For the reverse inequality
let ε > 0, and find x ∈ X and y ∈ Y with norm one such that ‖Sz′‖ ≤ |⟨x, Sz′⟩| + ε
and ‖Tz′‖ ≤ |⟨y, Tz′⟩| + ε. Choose |a| = |b| = 1 such that |⟨x, Sz′⟩| = a⟨x, Sz′⟩ and
|⟨y, Tz′⟩| = b⟨y, Tz′⟩ and consider ⟨(a‖Sz′‖x, b‖Tz′‖y), z′⟩.⟧

8.5 Hahn-Banach Extension Theorems


The Hahn-Banach theorem in its various forms guarantees the existence of a rich supply
of continuous linear functionals. The versions described in this section treat the problem
of extending a linear functional while preserving a certain crucial inequality. Geometric
versions in the form of separation theorems are considered in §9.3.
The version of the Hahn-Banach theorem for real linear spaces is based on the following
notion: A Minkowski functional on a real linear space X is a function p : X → R satisfying
• subadditivity: p(x + y) ≤ p(x) + p(y),
• positive homogeneity: p(tx) = t p(x), t ≥ 0.
Clearly, every seminorm is a Minkowski functional. The function p(f) = sup f(X) on B(X, R)
is an example of a Minkowski functional that is not a seminorm.

The Real Hahn-Banach Theorem


8.5.1 Theorem. Let X be a real linear space, p a Minkowski functional on X, and g a
real-valued linear functional on a subspace Y of X such that g ≤ p on Y. Then there exists
a real-valued linear functional f on X such that f = g on Y and f ≤ p on X.
Proof. Let x_0 ∉ Y. We show first that g extends linearly to a function f on Z := Rx_0 ⊕ Y
such that f ≤ p on Z. Since the representation z = cx_0 + y is unique, for any fixed a ∈ R
the function
f(cx_0 + y) := ac + g(y), c ∈ R, y ∈ Y,
is a well-defined linear functional on Z that extends g. We claim that a may be chosen so
that
f(cx_0 + y) ≤ p(cx_0 + y) for all c ∈ R and y ∈ Y. (†)
The inequality obviously holds if c = 0. If c ≠ 0, then, by positive homogeneity, (†) is
equivalent to

a + g(y/c) ≤ p(x0 + y/c) if c > 0 and − a + g(−y/c) ≤ p(−x0 − y/c) if c < 0.

These inequalities are clearly implied by the conditions

a + g(y) ≤ p(x0 + y) and − a + g(ỹ) ≤ p(−x0 + ỹ), y, ỹ ∈ Y,

or, equivalently,

−p(−x0 + ỹ) + g(ỹ) ≤ a ≤ p(x0 + y) − g(y), y, ỹ ∈ Y.

Such a choice of a is possible if
\[
\sup_{\tilde y\in Y}\bigl(-p(-x_0+\tilde y)+g(\tilde y)\bigr) \;\le\; \inf_{y\in Y}\bigl(p(x_0+y)-g(y)\bigr).
\]

But this inequality holds by virtue of the calculations

g(y) + g(ỹ) = g(y + ỹ) ≤ p(y + ỹ) = p(x0 + y − x0 + ỹ) ≤ p(x0 + y) + p(−x0 + ỹ).

Thus (†) holds, which shows that g has the required extension to Z.
Now consider the collection E of all real linear extensions f of g for which f ≤ p on dom(f).
For two such functions, write f1 ⪯ f2 if f2 is an extension of f1, that is, dom(f1) ⊆ dom(f2)
and f1 = f2 on dom(f1). Then ⪯ is a partial order on E such that every chain has an upper
bound, namely the common extension defined on the union of the domains of the members of
the chain. By Zorn’s lemma, there exists a maximal extension f ∈ E. From the first part of the
proof and maximality, dom(f) = X. Thus f is the desired extension of g.
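The one-step extension in the first part of the proof can be made concrete. The sketch below assumes the illustrative data X = R², p(x) = |x1| + |x2|, Y = R × {0}, g(t, 0) = t, and x0 = (0, 1); none of these choices come from the text. The supremum and infimum over Y are approximated over a finite grid, which happens to give the exact values −1 and 1 in this example.

```python
def p(x):
    """Seminorm p(x) = |x1| + |x2| on R^2 (in particular a Minkowski functional)."""
    return abs(x[0]) + abs(x[1])

def g(y):
    """Linear functional on Y = {(t, 0)}: g(t, 0) = t, so g <= p on Y."""
    return y[0]

x0 = (0.0, 1.0)
# sample Y on the grid t = -10, -9.75, ..., 10 (an approximation of all of Y)
Y = [(k / 4.0, 0.0) for k in range(-40, 41)]

# admissible interval for a:  sup(-p(-x0 + y) + g(y)) <= a <= inf(p(x0 + y) - g(y))
lower = max(-p((-x0[0] + y[0], -x0[1] + y[1])) + g(y) for y in Y)
upper = min(p((x0[0] + y[0], x0[1] + y[1])) - g(y) for y in Y)

# any a in [lower, upper] yields an extension f(c*x0 + y) = a*c + g(y)
a = 0.5 * (lower + upper)

def f(c, y):
    return a * c + g(y)

# check the dominating inequality (dagger) on sample points of Z = R*x0 + Y
dominated = all(
    f(c, y) <= p((c * x0[0] + y[0], c * x0[1] + y[1])) + 1e-12
    for c in (-2.0, -0.5, 0.0, 1.0, 3.0) for y in Y
)
```

In this example every a in [−1, 1] works, which also shows that the extension is in general far from unique.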

The Complex Hahn-Banach Theorem


For the seminorm version of the Hahn-Banach theorem, we need the following lemma.
8.5.2 Lemma (Bohnenblust-Sobczyk). Let X be a complex linear space and f a linear
functional on X. Then the real and imaginary parts fr and fi of f are real linear functionals
on X, considered as a linear space over R, and fi (x) = −fr (ix). Conversely, if fr is a real
linear functional on X, then the equation

f (x) = fr (x) − ifr (ix), x ∈ X, (8.17)

defines a complex linear functional on X.


Proof. That fr and fi are real linear functionals is clear. Moreover, for x ∈ X,

fr(ix) + ifi(ix) = f(ix) = if(x) = i(fr(x) + ifi(x)) = −fi(x) + ifr(x),

hence fr (ix) = −fi (x).


Conversely, if fr is a real linear functional on X, then the functional f defined by (8.17)
is additive and satisfies f (tx) = tf (x) for t ∈ R. The equality

f (ix) = fr (ix) − ifr (−x) = fr (ix) + ifr (x) = if (x)

then implies that f is a complex linear functional.
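Formula (8.17) is easy to test numerically. The following sketch assumes X = C³ and a functional f(x) = Σ wk xk with randomly chosen coefficients wk (an illustrative choice); it rebuilds f from its real part alone and compares the result with f.

```python
import random

random.seed(0)
n = 3
# coefficients of a sample complex linear functional on C^n (hypothetical data)
w = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

def f(x):
    """Complex linear functional f(x) = sum_k w_k x_k on C^n."""
    return sum(wk * xk for wk, xk in zip(w, x))

def fr(x):
    """Real part of f, a real linear functional on C^n viewed as a space over R."""
    return f(x).real

def rebuilt(x):
    """Formula (8.17): f(x) = fr(x) - i fr(ix)."""
    ix = [1j * xk for xk in x]
    return complex(fr(x), -fr(ix))

x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
err = abs(rebuilt(x) - f(x))   # should vanish up to rounding
```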


8.5.3 Theorem. Let X be a real or complex linear space and p a seminorm on X. Let Y
be a subspace of X and g a linear functional on Y such that |g| ≤ p on Y. Then there exists
a linear functional f on X such that f = g on Y and |f | ≤ p on X.

Proof. The real case follows from 8.5.1, so we may assume that X is a complex linear space.
By the lemma, gr := Re g and gi := Im g are real linear functionals on Y. Since gr ≤ p on
Y, there exists a real linear extension fr of gr such that fr ≤ p on X. Define f as in (8.17).
By the lemma,

f (y) = fr (y) − ifr (iy) = gr (y) − igr (iy) = g(y), y ∈ Y,

hence f is an extension of g. Writing f(x) = |f(x)|e^{−iθ}, we have

|f(x)| = f(x)e^{iθ} = f(e^{iθ}x) = fr(e^{iθ}x) ≤ p(e^{iθ}x) = |e^{iθ}|p(x) = p(x).

The Hahn-Banach Theorem for Normed Spaces


8.5.4 Theorem. Let Y be a subspace of a normed space X. If g ∈ Y′, then there exists an
f ∈ X′ that extends g such that ‖f‖ = ‖g‖.
Proof. Define a seminorm p(x) := ‖g‖ ‖x‖. Then |g| ≤ p on Y, hence by 8.5.3 g has a linear
extension f to X such that |f| ≤ p. It follows that f ∈ X′ and ‖f‖ ≤ ‖g‖. Since f extends g,
also ‖f‖ ≥ ‖g‖, so ‖f‖ = ‖g‖.
8.5.5 Corollary. Let Y be a closed subspace of a normed space X and x0 ∉ Y. Then there
exists f ∈ X′ such that ‖f‖ = 1, f(Y) = 0, and f(x0) = inf{‖x0 + y‖ : y ∈ Y}.
Proof. Define a linear functional g on the subspace Y0 := Kx0 ⊕ Y by g(cx0 + y) = cd,
where d is the above infimum. Since, for c ≠ 0,

|g(cx0 + y)| = |c|d ≤ |c| ‖x0 + c⁻¹y‖ = ‖cx0 + y‖,

‖g‖ ≤ 1. Now choose a sequence (yn) in Y such that ‖x0 + yn‖ → d. Then

d = g(x0 + yn) ≤ ‖g‖ ‖x0 + yn‖ → ‖g‖ d,

hence, since d > 0 (because Y is closed), ‖g‖ ≥ 1 and so ‖g‖ = 1. An application of 8.5.4
completes the argument.
The second part of the next corollary asserts that X′ separates points of X.
8.5.6 Corollary. For any x0 ≠ 0 in a normed space X, there exists f ∈ X′ such that
‖f‖ = 1 and f(x0) = ‖x0‖. In particular, if x1 ≠ x2 then there exists f ∈ X′ such that
f(x1) ≠ f(x2).
Proof. For the first part, take Y = {0} in 8.5.5. For the second part take x0 = x1 − x2.
8.5.7 Corollary. Let X be a normed space. If X′ is separable, then X is separable.
Proof. Let (fn) be dense in X′. For each n, choose xn ∈ X such that ‖xn‖ = 1 and
|⟨xn, fn⟩| ≥ ‖fn‖/2, and set Y = cl span {x1, x2, . . .}. We claim that Y = X. If not, then
by 8.5.5 we may choose f ∈ X′ with ‖f‖ = 1 and f(Y) = {0}. But then

1 − ‖f − fn‖ ≤ ‖fn‖ ≤ 2|⟨xn, fn⟩| = 2|⟨xn, f − fn⟩| ≤ 2‖f − fn‖,

hence ‖f − fn‖ ≥ 1/3 for all n, contradicting that (fn) is dense in X′.


8.5.8 Corollary. Let X be a normed space. Then for each x ∈ X,

‖x‖ = sup{|⟨x, f⟩| : f ∈ X′, ‖f‖ ≤ 1}.

Proof. Let s denote the supremum. Since |⟨x, f⟩| ≤ ‖x‖ ‖f‖, s ≤ ‖x‖. By 8.5.6, there
exists f ∈ X′ such that ‖f‖ = 1 and ⟨x, f⟩ = ‖x‖. Therefore, s ≥ ‖x‖.
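In concrete spaces the norming functional of 8.5.6 can often be written down explicitly. The sketch below assumes the finite-dimensional space X = (R⁴, ‖·‖p) with p = 3, whose dual is (R⁴, ‖·‖q) with 1/p + 1/q = 1 under the usual pairing; the vector x and the exponent are illustrative choices. It checks that the candidate functional has dual norm one and attains ‖x‖.

```python
def norm(v, r):
    """The l^r norm of a finite real sequence v."""
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

p = 3.0
q = p / (p - 1.0)            # conjugate exponent: 1/p + 1/q = 1
x = [1.0, -2.0, 0.5, 3.0]    # sample vector (hypothetical data)

nx = norm(x, p)
# candidate norming functional: f_k = sgn(x_k) |x_k|^(p-1) / ||x||_p^(p-1)
f = [(1.0 if t >= 0 else -1.0) * abs(t) ** (p - 1.0) / nx ** (p - 1.0) for t in x]

pairing = sum(a * b for a, b in zip(x, f))   # <x, f>, expected to equal ||x||_p
dual_norm = norm(f, q)                       # ||f||, computed as an l^q norm
```

With these values the supremum in 8.5.8 is attained at f, which is the typical situation in finite dimensions.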

The Bidual of a Normed Space


The bidual X″ of a normed space X is the dual of the dual: X″ := (X′)′. Given x ∈ X,
the evaluation functional corresponding to x is the linear functional x̂ defined by

⟨f, x̂⟩ = ⟨x, f⟩, f ∈ X′.

The collection of all evaluation functionals is denoted by X̂. Corollary 8.5.8 asserts that
X̂ ⊆ X″ and ‖x̂‖ = ‖x‖. For example, from §8.3 we see that the bidual of c0 may be
identified with ℓ^∞. To find ĉ0 in this identification, note that for x = (xn) ∈ c0, x̂ is the
mapping
\[
\langle y, \hat{x}\rangle = \langle x, y\rangle = \sum_{n=1}^{\infty} x_n y_n, \qquad y \in \ell^1 = c_0'.
\]
Now recall that in the identification of ℓ^∞ with (ℓ¹)′, a sequence (xn) in ℓ^∞ is identified
with the linear functional on ℓ¹ defined precisely by the above equation. Thus we see that
ĉ0 may be identified with the subspace c0 of ℓ^∞.
8.5.9 Theorem. Let X be a normed space. Then the mapping x → x̂ is a linear isometry
into the bidual of X. Thus the closure of X̂ in X″ is a concrete realization of the completion
of X.

Invariant Versions of the Hahn-Banach Theorem
A semigroup of operators on a vector space X is a set S of linear operators S : X → X
that is closed under composition. A subspace Y of X is said to be S-invariant if SY ⊆ Y
for all S ∈ S. A function G on an S-invariant subspace Y is said to be S-invariant if
G(Sy) = G(y) for all y ∈ Y and S ∈ S. The following versions of the Hahn-Banach
theorem, due to Agnew and Morse, address the problem of extending linear functionals that
are invariant under the action of a semigroup of operators.
8.5.10 Theorem. Let X be a real vector space, S a commutative semigroup of operators on
X, and p a Minkowski functional on X such that p(Sx) ≤ p(x) for all x ∈ X and S ∈ S.
Let Y be an S-invariant subspace of X and G an S-invariant, real-valued, linear functional
on Y such that G ≤ p on Y. Then G extends to a real-valued S-invariant linear functional
F on X such that F ≤ p on X.
Proof. We may assume that S contains the identity operator I. Let co S denote the set of
convex combinations of members of S:
\[
\operatorname{co} S = \Bigl\{\sum_{j=1}^{n} t_j S_j : S_j \in S,\ t_j \ge 0,\ \text{and } \sum_{j=1}^{n} t_j = 1\Bigr\}.
\]

Define q(x) on X by
q(x) := inf {p(T x) : T ∈ co S} .
By linearity of T , q is a Minkowski functional on X. Since G(y) = G(T y) ≤ p(T y) for all
T ∈ co S and y ∈ Y, G ≤ q on Y. By the Hahn-Banach theorem, G has a linear extension
F on X such that F ≤ q. It remains to show that F is S-invariant.
Fix S ∈ S and for each n define Tn = n⁻¹ Σ_{j=0}^{n−1} S^j, where S^0 := I. Then
\[
T_n(I-S) = \frac1n\Bigl(\sum_{j=0}^{n-1}S^j - \sum_{j=1}^{n}S^j\Bigr) = \frac1n\,(I-S^n),
\]
hence for all x
\[
F(x)-F(Sx) \le q\bigl((I-S)x\bigr) \le p\bigl(T_n(I-S)x\bigr) = \tfrac1n\,p\bigl(x-S^nx\bigr) \le \tfrac1n\,[p(x)+p(-x)].
\]
Letting n → ∞ we see that F(x) − F(Sx) ≤ 0. Replacing x by −x gives the reverse
inequality. Therefore, F is S-invariant.
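The telescoping identity Tn(I − S) = n⁻¹(I − Sⁿ) used above is purely algebraic and can be checked with exact rational arithmetic. In the sketch below the 3 × 3 matrix S and the value n = 5 are arbitrary illustrative choices, and the helper functions are hypothetical names introduced only for this check.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B, sign=1):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(t, A):
    return [[t * a for a in row] for row in A]

d = 3
I = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
# S: an arbitrary sample operator on R^3 (hypothetical choice)
S = [[Fraction(0), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(1)],
     [Fraction(1, 2), Fraction(0), Fraction(0)]]

n = 5
powers = [I]
for _ in range(n):
    powers.append(matmul(powers[-1], S))      # powers[j] = S^j

Tn = [[Fraction(0)] * d for _ in range(d)]
for j in range(n):                            # T_n = (1/n) * sum_{j=0}^{n-1} S^j
    Tn = add(Tn, powers[j])
Tn = scale(Fraction(1, n), Tn)

lhs = matmul(Tn, add(I, S, sign=-1))                      # T_n (I - S)
rhs = scale(Fraction(1, n), add(I, powers[n], sign=-1))   # (I - S^n) / n
```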
The proof of the following seminorm version of 8.5.10 may be modelled along the lines of
the proof of 8.5.3. The details are left to the reader.
8.5.11 Theorem. Let X be a real or complex linear space, S a commutative semigroup of
operators on X, and p a seminorm on X such that p(Sx) ≤ p(x) for all x ∈ X and S ∈ S.
Let Y be an S-invariant subspace of X and G an S-invariant linear functional on Y such
that |G| ≤ p on Y. Then G extends to an S-invariant linear functional F on X such that
|F | ≤ p on X.

Exercises
8.60 Show that the converse of 8.5.7 is false.
8.61 Let X be a normed space. Show that X̂ is closed in X″ iff X is a Banach space.

8.62 Let x1, . . . , xn be linearly independent vectors in a normed space X. Show that X′ has at least
n linearly independent vectors.

8.63 Show that if X is strictly convex, then for each f ∈ X′ with ‖f‖ > 0 there is at most one x
such that ‖x‖ = 1 and f(x) = ‖f‖.

8.64 (a) Show that for each n ∈ N there exists a probability measure µn on [0, 1] such that
∫_0^1 x^k dµ(x) = k + 1 for all integers 0 ≤ k ≤ n. Can this hold for all k ≥ 0? (b) Show that there
exists a probability measure µ on [0, 1] such that ∫_0^1 x^k dµ(x) = (k + 1)⁻¹ for all integers k ≥ 0.

8.65 (Krein) Let X be a set and G ⊆ F linear spaces of real-valued functions f on X such that for
each f ∈ F there exists g ∈ G with g ≥ f. Let I be a positive linear functional on G, that is,
I(g) ≥ 0 whenever g ∈ G and g ≥ 0. Show that I extends to a positive linear functional on F.
⟦Consider p(f) := inf{I(g) : g ∈ G and g ≥ f}.⟧

8.66 Show that a finite dimensional subspace F of a normed space X is complemented, that is,
there exists a closed subspace Y of X such that X = F ⊕ Y. ⟦Let {x1, . . . , xd} be a basis for
F and define suitable x′j ∈ X′.⟧

*8.6 Applications of the Hahn-Banach Theorem


The Moment Problem
Given a sequence (cn ) of real numbers, the classical version of the moment problem asks
when there exists a real-valued function F of bounded variation on [0, 1] such that
\[
\int_0^1 t^n\,dF(t) = c_n, \qquad n = 0, 1, \ldots
\]

If cn ∈ [0, 1] and c0 = 1, then the problem can be stated in probabilistic terms: When
does there exist a probability measure on [0, 1] with given moments cn ? Note that by the
Stone-Weierstrass theorem, the solution, if one exists, is unique.
Since the integral ∫_0^1 g dF defines a continuous linear functional on C[0, 1], the moment
problem may be stated somewhat more abstractly as follows: Given a sequence (cn), when
does there exist a continuous linear functional F on C[0, 1] such that ⟨t^n, F⟩ = cn for all n?
This suggests that the problem may be cast in a broader context, where C[0, 1] is replaced by
an arbitrary normed space X and the functions tn are replaced by members of X. Here is the
precise statement of the general moment problem, the resolution of which is a consequence
of the Hahn-Banach theorem.
8.6.1 Theorem. Let X be a normed space, I an arbitrary index set, {xi : i ∈ I} ⊆ X, and
{ci : i ∈ I} ⊆ K. Then the following statements are equivalent:
(a) There exists x′ ∈ X′ such that ⟨xi, x′⟩ = ci for all i ∈ I.

(b) There exists M > 0 such that for all finite subsets I0 ⊆ I and all tj ∈ K (j ∈ I0),
\[
\Bigl|\sum_{j\in I_0} t_j c_j\Bigr| \le M\,\Bigl\|\sum_{j\in I_0} t_j x_j\Bigr\|.
\]

Proof. (a) ⇒ (b): If such an x′ exists, then
\[
\Bigl|\sum_{j\in I_0} t_j c_j\Bigr| = \Bigl|\sum_{j\in I_0} \langle t_j x_j, x'\rangle\Bigr| \le \|x'\|\,\Bigl\|\sum_{j\in I_0} t_j x_j\Bigr\|.
\]

(b) ⇒ (a): Let Y be the linear span of the set {xi : i ∈ I}. A typical member of Y may
be written Σ_{j∈I0} tj xj, where I0 ⊆ I is finite. Define a mapping x′ on Y by
\[
x'\Bigl(\sum_{j\in I_0} t_j x_j\Bigr) = \sum_{j\in I_0} t_j c_j.
\]
The inequality in (b) implies that x′ is well-defined. Moreover, x′ is linear and (b) shows
that ‖x′‖ ≤ M. Therefore, by 8.5.4, x′ has an extension to a member of X′.

Invariant Means
Let S be a nonempty set. A mean on B(S, R) is a linear functional m such that

inf{f (s) : s ∈ S} ≤ m(f ) ≤ sup{f (s) : s ∈ S} for all f ∈ B(S, R).

Now let S be a semigroup. A mean m on B(S, R) is invariant if

m(Rs f ) = m(f ) = m(Ls f ) for all s ∈ S and f ∈ B(S, R),

where Rs and Ls are the right and left translation operators on B(S, R) defined by

Rs f (t) = f (ts) and Ls f (t) = f (st), s, t ∈ S.

8.6.2 Theorem. If S is a commutative semigroup, then B(S, R) has an invariant mean.


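Proof. Take p to be the functional p(f) = sup f on B(S, R) and m the identity function on
the space of constant functions on S, and apply 8.5.10 to the commutative semigroup of
translation operators {Rs : s ∈ S} (note that Rs = Ls because S is commutative).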
In certain circumstances the commutativity hypothesis may be removed,¹ but not generally,
as the following example shows.
¹ For example, every finite group has an invariant mean. (See Chapter 16.)

8.6.3 Example. Let S be the free group on two generators a and b. Thus S consists of
an identity 1 and all concatenations of the symbols a, b, a⁻¹, and b⁻¹, these concatenations
called words. A word may be reduced to a unique expression of the form s1^{ε1} s2^{ε2} · · · sn^{εn}, where
the εj are integers and a¹ := a, b¹ := b. (Any pairs aa⁻¹ etc. are omitted.) Assume B(S, R)
has an invariant mean m. Let B denote the subset of S consisting of the identity and all
reduced words starting with b. Since the sets a^j B are disjoint, 1 ≥ Σ_{j=1}^n 1_{a^j B}, hence
\[
1 = m(1) \ge \sum_{j=1}^{n} m(1_{a^jB}) = n\,m(1_B) \quad\text{for all } n
\]
and so m(1B) = 0. Now set
\[
f = \sum_{n\in\mathbb{Z}\setminus\{0\}} 1_{a^nB}.
\]
Since 0 ≤ L_{b⁻¹}f ≤ 1B we have 0 ≤ m(f) = m(L_{b⁻¹}f) ≤ m(1B) = 0, hence m(f) = 0. On
the other hand, from f + 1B = 1 we obtain the contradictory statement m(f) = 1. Thus
B(S, R) cannot have an invariant mean. ♦

Banach Limits

A Banach limit on B((0, ∞), R) is a linear functional, typically denoted by Lim_{t→∞} f(t),
with the following properties:
(a) Lim_{t→∞} f(t) = lim_{t→∞} f(t) whenever the limit on the right exists.

(b) Lim_{t→∞} f(t + s) = Lim_{t→∞} f(t) for all f ∈ B((0, ∞), R) and s ∈ (0, ∞).

(c) lim inf_{t→∞} f(t) ≤ Lim_{t→∞} f(t) ≤ lim sup_{t→∞} f(t) for all f ∈ B((0, ∞), R).
8.6.4 Theorem. Banach limits exist.

Proof. Define a Minkowski functional p on B((0, ∞), R) by p(f) = lim sup_{t→∞} f(t). Let F be
the subspace of all functions f : (0, ∞) → R such that the limit L(f) := lim_{t→∞} f(t) exists in
R. If Ts is the translation operator Ts f(t) = f(s + t), then Ts F ⊆ F, L(Ts f) = L(f) = p(f)
for all f ∈ F, and p(Ts f) = p(f) for all f ∈ B((0, ∞), R). An application of 8.5.10 yields
the desired functional.
The reader may easily formulate the analogous notion of Banach limit on `∞ (N), replacing
limt→∞ f (t) by limn xn , etc.

Invariant Set Functions


It follows from Ex. 3.72 and the material in §1.9 that there is no translation invariant
measure µ on P(R) with the property µ[a, b] = b − a for all a ≤ b, the countable additivity
requirement being essentially responsible for this state of affairs. One may then reasonably
ask if there exists a translation invariant finitely additive set function µ on P(R) with the
aforementioned length property. Banach has answered this question in the affirmative:
8.6.5 Theorem (Banach). There exists a finitely additive, translation invariant set function
µ on P(R) such that µ[a, b] = b − a for all a ≤ b.
Proof. Let X denote the real linear space of all bounded, real-valued functions on R with
period one. For f ∈ X, set p(f) = ‖f‖∞. Clearly, p(Tt f) = p(f), where Tt denotes the
translation operator Tt f(s) = f(s + t). Let Y be the subspace of continuous functions
and define a positive linear functional G on Y by G(f) = ∫_0^1 f(t) dt. Then G ≤ p, and by
periodicity G is translation invariant. By 8.5.10, G extends to a translation invariant positive
linear functional F on X. Now define µ(E) := F(1E), E ⊆ R.

Exercises
8.67 Show that a Banach limit on B(0, ∞) or ℓ^∞(N) is continuous.

8.68 Let aj ∈ R. Find the Banach limit of the sequence x = (a1, . . . , am, a1, . . . , am, . . .).

8.69 Show that there exists a continuous linear functional f → Lim_{t→0} f(t) on B := B((−1, 1), R)
with the following properties:
(a) Lim_{t→0} f(t) = lim_{t→0} f(t) whenever the limit on the right exists.
(b) Lim_{t→0} f(rt) = Lim_{t→0} f(t) for all f ∈ B and 0 < r < 1.
(c) lim inf_{t→0} f(t) ≤ Lim_{t→0} f(t) ≤ lim sup_{t→0} f(t) for all f ∈ B.

8.70 Show that there exists a finitely additive, translation invariant measure µ on P(R) such that
µ(E) = λ(E) for every bounded, Lebesgue measurable set E ⊆ R.

8.7 Baire Category in Banach Spaces


In this section we prove three basic results which, together with the Hahn-Banach theorem,
form the core of functional analysis.

The Uniform Boundedness Principle


The following theorem asserts that under suitable conditions a family of bounded linear
transformations that is pointwise bounded is uniformly bounded on bounded sets. The proof
depends on the Baire category theorem (0.3.12).

8.7.1 Uniform Boundedness Principle. Let X and Y be Banach spaces and let 𝒯 be a
subset of B(X, Y) such that sup_{T∈𝒯} ‖Tx‖ < ∞ for each x ∈ X. Then sup_{T∈𝒯} ‖T‖ < ∞.
Proof. The set Xn := {x ∈ X : ‖Tx‖ ≤ n ∀ T ∈ 𝒯} is closed and, by hypothesis,
X = ⋃n Xn. By Baire’s theorem, some Xn contains a closed ball C(x0, r). Thus

‖Ty‖ ≤ n for all T ∈ 𝒯 and all y ∈ X with ‖y − x0‖ ≤ r.

Now let x ≠ 0 ∈ X with ‖x‖ ≤ 1 and set y := x0 + r‖x‖⁻¹x. Then ‖y − x0‖ ≤ r and x =
r⁻¹‖x‖(y − x0), hence ‖Tx‖ ≤ r⁻¹‖x‖(‖Ty‖ + ‖Tx0‖) ≤ r⁻¹(n + sup_{S∈𝒯} ‖Sx0‖).
The following application will have important consequences later.

8.7.2 Banach-Steinhaus Theorem. Let X and Y be Banach spaces and let (Tn) be a
sequence in B(X, Y). Then limn Tn x exists in Y for all x ∈ X iff the following conditions
hold:
(a) sup_n ‖Tn‖ < ∞ and

(b) limn Tn u exists in Y for all u in a dense subset D of X.

Moreover, if (a) and (b) hold, then the pointwise limit T := limn Tn is a member of B(X, Y)
and ‖T‖ ≤ lim inf_n ‖Tn‖ ≤ sup_n ‖Tn‖.

Proof. If Tx := limn Tn x exists for all x ∈ X, then T is linear and sup_n ‖Tn x‖ < ∞ for
all x. Therefore, by the uniform boundedness principle, sup_n ‖Tn‖ < ∞. From ‖Tx‖ =
limn ‖Tn x‖ ≤ ‖x‖ lim inf_n ‖Tn‖ we have ‖T‖ ≤ lim inf_n ‖Tn‖.
Now assume (a) and (b) hold and set s = sup_n ‖Tn‖. For x ∈ X and ε > 0, choose u ∈ D
such that ‖x − u‖ < ε/s. Then

‖Tn x − Tm x‖ ≤ ‖Tn(x − u)‖ + ‖Tn u − Tm u‖ + ‖Tm(u − x)‖ ≤ 2ε + ‖Tn u − Tm u‖.

The expression on the right is < 3ε for all sufficiently large m and n, hence (Tn x) is a
Cauchy sequence. Since Y is complete, (Tn x) converges in Y.

The Open Mapping Theorem


Let X and Y be normed spaces. By definition, a mapping T : X → Y is open iff for each
x ∈ X and r > 0 the image T(Br(x)) contains an open ball Bs(Tx). If T is linear, then
one has the following simplification:
8.7.3 Proposition. A linear mapping T : X → Y is open iff T(B1) ⊇ Bt for some t > 0,
in which case T is surjective.
Proof. The necessity is clear. For the sufficiency, let T(B1) ⊇ Bt for some t. For any r > 0
and x ∈ X, Br(x) = x + rB1, hence by linearity

T(Br(x)) = Tx + rT(B1) ⊇ Tx + rBt = Brt(Tx).

Finally, for any y ≠ 0 ∈ Y, t(2‖y‖)⁻¹y ∈ Bt ⊆ T(B1), hence y ∈ ran T.


The proof of the open mapping theorem rests on the following lemma.
8.7.4 Lemma. Let X and Y be Banach spaces, T ∈ B(X, Y), and ε > 0. Suppose that
Bε ⊆ cl T(B1). Then Bε/2 ⊆ T(B1).
Proof. Let y ∈ Bε. By hypothesis, we have ‖y − y1‖ < ε/2 for some y1 ∈ T(B1). Then
y − y1 ∈ Bε/2 ⊆ cl T(B1/2), hence ‖y − y1 − y2‖ < ε/4 for some y2 ∈ T(B1/2). By
induction, we obtain a sequence (xn) in X such that

‖y − T(x1 + · · · + xn)‖ < ε/2^n, where xn ∈ B_{1/2^{n−1}}.

The sums x1 + · · · + xn form a Cauchy sequence with limit x, say. By continuity, Tx = y.
Since
\[
\|x\| = \lim_n \Bigl\|\sum_{k=1}^{n} x_k\Bigr\| \le \sum_{k=1}^{\infty}\|x_k\| < \sum_{k=1}^{\infty} \frac{1}{2^{k-1}} = 2,
\]
y ∈ T(B2). Therefore, Bε ⊆ T(B2), hence Bε/2 ⊆ T(B1).


8.7.5 Open Mapping Theorem. If X and Y are Banach spaces and T ∈ B(X, Y) is
surjective, then T is an open mapping.
Proof. By 8.7.3 and 8.7.4, it suffices to show that cl T(B1) ⊇ Bε for some ε. Now, by
surjectivity, Y = ⋃_{n=1}^∞ T(Bn), hence, by the Baire category theorem, for some n the closure
cl T(Bn) = n cl T(B1) contains an open ball. Thus for some y0 ∈ Y and ε > 0,

cl T(B1) ⊇ Bε(y0) = y0 + Bε, and cl T(B1) ⊇ −Bε(y0) = −y0 + Bε,

the second inclusion following from the first because −Br = Br. Thus for y ∈ Bε we have
y ± y0 ∈ cl T(B1), so by convexity y = ½(y + y0) + ½(y − y0) ∈ cl T(B1), as required.

The following fundamental result is immediate.

8.7.6 Banach Isomorphism Theorem. Let X and Y be Banach spaces and T : X → Y
a continuous algebraic isomorphism onto Y. Then T is a topological isomorphism.
8.7.7 Corollary. Let Z be a Banach space that is an algebraic direct sum of closed subspaces
X and Y. Then Z is a topological direct sum of X and Y.

Proof. Since the algebraic isomorphism (x, y) → x + y is continuous, the assertion follows
from the Banach isomorphism theorem.
8.7.8 Corollary. Let Z be a Banach space and P : Z → Z linear such that P² = P. If
ran P , ker P are closed, then P is continuous and Z is the topological direct sum of ran P
and ker P .

Proof. As noted earlier, Z is the algebraic direct sum of ran P and ker P . By the preceding
corollary, the sum is topological, hence P is continuous (8.4.3).
8.7.9 Corollary. Let X and Y be Banach spaces, let T ∈ B(X, Y) be surjective, and
let Q : X → X/ ker T be the quotient map. Then there exists a topological isomorphism
S : X/ ker T → Y such that SQ = T .

Proof. Define S by the equation SQ = T. Since ker Q = ker T, S is well-defined and
bijective. Moreover, S is clearly linear, and since SQ is continuous so is S (8.4.6(c)).
Banach’s isomorphism theorem now implies that S is a topological isomorphism.
8.7.10 Corollary (Sard Quotient Theorem). Let X, Y, and Z be Banach spaces and
let TXY ∈ B(X, Y) and TXZ ∈ B(X, Z) with TXY surjective. Then there exists TYZ ∈
B(Y, Z) such that TXZ = TYZ TXY iff ker TXY ⊆ ker TXZ.

FIGURE 8.2: Sard Quotient Theorem (commutative diagram relating X, Y, Z, and X/ker TXY via the maps TXY, TXZ, Q, SXZ, and SXY⁻¹)

Proof. The necessity is obvious. For the sufficiency use 8.7.9 to obtain a topological isomorphism
SXY : X/ker TXY → Y such that SXY Q = TXY, where Q : X → X/ker TXY is the
quotient map. Since ker Q = ker TXY ⊆ ker TXZ, we may define SXZ ∈ B(X/ker TXY, Z)
so that SXZ Q = TXZ. Since Q = SXY⁻¹ TXY, we have TXZ = SXZ SXY⁻¹ TXY. Therefore, SXZ SXY⁻¹
is the desired map TYZ.

8.7.11 Example. Let X be a compact Hausdorff space and Y ⊆ X closed. We show that
C(Y) is isometrically isomorphic to C(X)/𝒴, where 𝒴 = {g ∈ C(X) : g(Y) = 0}.
Define a bounded linear map T : C(X) → C(Y) by Tf = f|Y. Then ker T = 𝒴, and T
is surjective by Tietze’s extension theorem. Let Q : C(X) → C(X)/𝒴 denote the quotient
map. By 8.7.9, there exists a topological isomorphism S ∈ B(C(X)/ker T, C(Y)) such that
SQ = T. It remains to show S is an isometry, that is,

inf{‖f + g‖∞ : g ∈ 𝒴} = ‖f|Y‖∞, f ∈ C(X).

Let α denote the infimum. For any g ∈ 𝒴 and y ∈ Y, |f(y)| = |f(y) + g(y)| ≤ ‖f + g‖∞,
hence ‖f|Y‖∞ ≤ α. For the reverse inequality, let ε > 0 and set

U := {x ∈ X : |f(x)| < ‖f|Y‖∞ + ε}.

Then U is open and contains the compact set Y, hence there exists a continuous function h
such that 0 ≤ h ≤ 1, h = 0 on Y, and h = 1 on U^c. Setting g = −fh we have g ∈ 𝒴 and

|f(x) + g(x)| = |f(x)| |1 − h(x)| ≤ ‖f|Y‖∞ + ε, x ∈ X.

Therefore α ≤ ‖f + g‖∞ ≤ ‖f|Y‖∞ + ε. Since ε was arbitrary, α ≤ ‖f|Y‖∞. ♦

The Closed Graph Theorem


Let X and Y be Banach spaces and let X × Y have the product topology and vector
space structure. For a mapping T : X → Y, set GT := {(x, T x) : x ∈ X}. Note that GT is
closed iff the following condition holds:

(xn ) ⊆ X, (x, y) ∈ X × Y and (xn , T xn ) → (x, y) ⇒ T x = y. (8.18)

In particular, if T is continuous, then GT is closed. The converse holds for linear maps:
8.7.12 Closed Graph Theorem. Let X and Y be Banach spaces and let T : X → Y be
a linear map such that GT is closed in X × Y. Then T is continuous.
Proof. Give X × Y the norm ‖(x, y)‖ = max{‖x‖, ‖y‖}, which generates the product
topology (see §8.4). Since T is linear, GT is a linear subspace of X × Y. Define projection
mappings PX : GT → X and PY : GT → Y by PX(x, Tx) = x and PY(x, Tx) = Tx.
These maps are clearly linear, and both are continuous because ‖x‖ ≤ ‖(x, Tx)‖ and
‖Tx‖ ≤ ‖(x, Tx)‖. Moreover, because GT is closed in the Banach space X × Y, GT is itself
a Banach space. Since PX is a bijection, PX⁻¹ : X → GT is continuous by the
Banach isomorphism theorem. Thus T = PY PX⁻¹ is continuous.
The following corollary is sometimes called the two norm theorem.
8.7.13 Corollary. Let X be a Banach space with respect to norms ‖x‖ and |||x|||. Suppose
there exists a constant c such that |||x||| ≤ c‖x‖ for all x. Then the norms are equivalent.
Proof. We show that the identity map I : (X, |||·|||) → (X, ‖·‖) is continuous. It will follow
that ‖x‖ = ‖Ix‖ ≤ ‖I‖ |||x|||, proving the corollary.
Let (xn, Ixn) = (xn, xn) → (x, y) in (X, |||·|||) × (X, ‖·‖), so |||xn − x||| → 0 and
‖xn − y‖ → 0. Since |||xn − y||| ≤ c‖xn − y‖, we also have |||xn − y||| → 0. Therefore,
x = y. By the closed graph theorem, I is continuous.

Exercises
8.71 Define linear functionals fn(x) = Σ_{j=1}^n xj on c00. Show that sup_n |fn(x)| < ∞ for all x, yet
sup_n ‖fn‖ = ∞. Conclude that the completeness of X and Y in 8.7.1 is essential.

8.72 Let X be a normed space and A ⊆ X such that sup{|f(x)| : x ∈ A} < ∞ for
every f ∈ X′. Prove that sup{‖x‖ : x ∈ A} < ∞. Thus weak boundedness implies norm
boundedness.

8.73 Let X, Y be Banach spaces and T : X → Y linear such that T is weakly continuous, that
is, f ◦ T is continuous for each f ∈ Y′. Show that T is continuous.

8.74 [↑ 8.2] Let X, Y, Z be Banach spaces and let B : X × Y → Z be bilinear and separately
continuous, that is, continuous in x for each y and continuous in y for each x. Show that B is
bounded.

8.75 Let X, Y be Banach spaces and T ∈ B(X, Y) injective. Prove: T⁻¹ is continuous on ran(T) iff
ran(T) is closed.

8.76 [↓ 10.2.11] Let X, Y be Banach spaces and T ∈ B(X, Y) surjective. Show that there exists
c > 0 such that for each x there exists x1 with Tx1 = Tx and ‖x1‖ ≤ c‖Tx‖. ⟦Use 8.7.9.⟧

8.77 Let X and Y be Banach spaces and T : X → Y linear. Suppose T has the property that
xn → 0 and T xn → y ⇒ y = 0. Prove that T is continuous.

8.78 Let (X, F, µ) be a measure space and T : L¹(µ) → L¹(µ) linear with the property that if (fn)
is a sequence in L¹ with fn → 0 a.e., then Tfn → 0 a.e. Show that T is bounded.

8.79 Let X be a Banach space, T ∈ B(X) injective, and S : X → X linear with TS continuous.
Prove that S is continuous.

8.80 Let C¹[0, 1] and C[0, 1] have the sup norms. Show that the linear map D : C¹[0, 1] → C[0, 1],
Df = f′, has a closed graph but is unbounded. Thus the completeness hypothesis in the closed
graph theorem is essential.

8.81 Let ‖·‖ be a complete norm on C[0, 1] such that the evaluation maps x̂(f) = f(x), x ∈ [0, 1] ∩ Q,
are continuous on C[0, 1]. Show that ‖·‖ and ‖·‖∞ are equivalent.

8.82 Let (X, F, µ) be a measure space and g measurable such that fg ∈ L¹ for all f ∈ L¹. Show
that the linear mapping T : f → fg on L¹ is continuous and that g ∈ L^∞.

8.83 Let (X, F, µ) be a measure space and E ∈ F. Let Y = {g ∈ L¹ : g(E) = 0}. Show that
L¹(X, F, µ)/Y is isometrically isomorphic to L¹(E, F ∩ E, ν), where ν = µ|F∩E.

*8.8 Applications
Divergent Fourier Series
Let f : R → C be a periodic function with period 2π. The Fourier series of f is the
formal series
\[
f(t) \sim \sum_{k=-\infty}^{\infty} c_k e^{ikt}, \quad\text{where } c_k := \frac{1}{2\pi}\int_0^{2\pi} e^{-ikx} f(x)\,dx.
\]
The L² convergence of Fourier series is discussed in §11.3. Deeper questions center around
pointwise convergence. In the current subsection, we merely demonstrate the existence of a
continuous f for which the above series diverges at t = 0. The proof here does not actually
construct such a function; however, a concrete example was given by Fejér [19].
Let X be the space of all continuous functions f : R → C with period 2π. Then X is a
Banach space under the sup norm. For n ∈ N, define Fn ∈ X′ by
\[
F_n(f) = \sum_{k=-n}^{n} \frac{1}{2\pi}\int_0^{2\pi} e^{-ikx} f(x)\,dx,
\]
which is the nth partial sum of the Fourier series for f evaluated at t = 0. We show
that limn ‖Fn‖ = ∞. It will then follow from the uniform boundedness principle that
sup_n |Fn(f)| = ∞ for some f ∈ X, as claimed.

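As a first step, we express Fn in terms of the Dirichlet kernel
\[
D_n(t) = \begin{cases} \dfrac{\sin\bigl(n+\tfrac12\bigr)t}{2\sin\tfrac12 t} & \text{if } t\notin 2\pi\mathbb{Z},\\[1ex] n+\tfrac12 & \text{otherwise.}\end{cases} \tag{$\dagger$}
\]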

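Expand the numerator in (†) as
\[
\sin\bigl(n+\tfrac12\bigr)t = \sin\tfrac12 t + \sum_{k=1}^{n}\Bigl[\sin\bigl(k+\tfrac12\bigr)t - \sin\bigl(k-\tfrac12\bigr)t\Bigr] = \sin\tfrac12 t + 2\sum_{k=1}^{n}\cos kt\,\sin\tfrac12 t.
\]
Since 2 cos θ = e^{iθ} + e^{−iθ}, we then have
\[
D_n(t) = \frac12 + \sum_{k=1}^{n}\cos kt = \frac12\sum_{k=-n}^{n} e^{ikt}.
\]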

The nth partial sum of the Fourier series for f may now be written
\[
\sum_{k=-n}^{n}\Bigl(\frac{1}{2\pi}\int_0^{2\pi} e^{-ikx} f(x)\,dx\Bigr)e^{ikt} = \frac{1}{\pi}\int_0^{2\pi} f(x)\,D_n(t-x)\,dx.
\]
Setting t = 0 and noting that Dn is an even function, we have
\[
F_n(f) = \frac{1}{\pi}\int_0^{2\pi} f(x)\,D_n(x)\,dx, \qquad f\in X.
\]
We claim that
\[
\|F_n\| = \frac{1}{\pi}\int_0^{2\pi} |D_n(x)|\,dx. \tag{$\ddagger$}
\]
The inequality ‖Fn‖ ≤ π⁻¹ ∫_0^{2π} |Dn(x)| dx is clear. For the reverse inequality, define g(x) = 1
if Dn(x) > 0 and g(x) = −1 if Dn(x) ≤ 0, so that |Dn(x)| = g(x)Dn(x). Since Dn changes
sign at only finitely many points in [0, 2π], given ε > 0 there exists f ∈ X with norm one
such that ∫_0^{2π} |f(x) − g(x)| dx < επ/‖Dn‖∞. Therefore,
\[
\begin{aligned}
\frac1\pi\int_0^{2\pi}|D_n(x)|\,dx &\le |F_n(f)| + \Bigl|F_n(f) - \frac1\pi\int_0^{2\pi}|D_n(x)|\,dx\Bigr| \\
&= |F_n(f)| + \frac1\pi\Bigl|\int_0^{2\pi}[f(x)-g(x)]\,D_n(x)\,dx\Bigr| \\
&< \|F_n\| + \varepsilon,
\end{aligned}
\]
verifying (‡).
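Finally, from (†), (‡), and the inequality 2|sin ½x| ≤ |x| we have
\[
\begin{aligned}
\|F_n\| = \frac1\pi\int_0^{2\pi}|D_n(x)|\,dx &\ge \frac1\pi\int_0^{2\pi}\frac{\bigl|\sin\bigl(n+\tfrac12\bigr)x\bigr|}{x}\,dx = \frac1\pi\int_0^{(2n+1)\pi}\frac{|\sin t|}{t}\,dt \\
&= \frac1\pi\sum_{k=1}^{2n+1}\int_{(k-1)\pi}^{k\pi}\frac{|\sin t|}{t}\,dt \ge \frac1\pi\sum_{k=1}^{2n+1}\frac{1}{k\pi}\int_{(k-1)\pi}^{k\pi}|\sin t|\,dt \\
&= \sum_{k=1}^{2n+1}\frac{2}{k\pi^2}.
\end{aligned}
\]
Thus ‖Fn‖ → ∞, as required.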
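The growth of ‖Fn‖ can also be observed numerically. The sketch below approximates π⁻¹ ∫_0^{2π} |Dn(x)| dx by a midpoint rule (the step count is an arbitrary choice) and compares it with the harmonic lower bound (2/π²) Σ_{k=1}^{2n+1} 1/k, which is what the chain of inequalities yields when the factor 1/π is carried through every step. The function names are illustrative.

```python
import math

def dirichlet(n, t):
    """D_n(t) = sin((n + 1/2) t) / (2 sin(t/2)) for t not in 2*pi*Z."""
    return math.sin((n + 0.5) * t) / (2.0 * math.sin(0.5 * t))

def norm_Fn(n, steps=20000):
    """Midpoint-rule approximation of (1/pi) * integral over [0, 2*pi] of |D_n|."""
    h = 2.0 * math.pi / steps
    # midpoints (i + 1/2) h lie strictly inside (0, 2*pi), so sin(t/2) > 0 there
    total = sum(abs(dirichlet(n, (i + 0.5) * h)) for i in range(steps))
    return total * h / math.pi

def lower_bound(n):
    """(2 / pi^2) * sum_{k=1}^{2n+1} 1/k."""
    return 2.0 / math.pi ** 2 * sum(1.0 / k for k in range(1, 2 * n + 2))

ns = [1, 2, 4, 8, 16, 32]
values = [norm_Fn(n) for n in ns]       # approximate norms ||F_n||
bounds = [lower_bound(n) for n in ns]   # harmonic lower bounds
```

The computed norms increase with n and stay above the bounds, but only logarithmically, which is why divergence at a point cannot be seen from a few partial sums.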



Vector-Valued Analytic Functions


Let X be a complex Banach space, U an open subset of C, and f : U → X. Then f is
said to be strongly analytic on U if the limit
\[
\lim_{z\to z_0} \frac{f(z)-f(z_0)}{z-z_0}
\]
exists in the norm topology of X for each z0 ∈ U. Analogously, f is said to be weakly
analytic on U if the limit exists in the weak topology, that is, if x′ ◦ f is analytic on U for
each x′ ∈ X′. Clearly, strong analyticity implies weak analyticity. The following theorem is
the converse.
8.8.1 Theorem. A weakly analytic function f is strongly analytic.
Proof. Let z0 ∈ U. Since X is complete, it suffices to prove the Cauchy property
\[
\lim_{z,w\to z_0}\|g(z)-g(w)\| = 0, \quad\text{where } g(z) = \frac{f(z)-f(z_0)}{z-z_0}.
\]
Let C : z = z0 + re^{it} (0 ≤ t ≤ 2π) with r so small that C and its interior are contained in
U. By continuity, sup_{z∈C} |(x′ ◦ f)(z)| < ∞ for each x′ ∈ X′, hence, by the uniform boundedness
principle applied to the mappings Tz : x′ → (x′ ◦ f)(z), there exists a constant M > 0 such that
sup_{z∈C} |(x′ ◦ f)(z)| ≤ M for all x′ ∈ X′ with ‖x′‖ ≤ 1. Fix such an x′ and set h = x′ ◦ f.
By the Cauchy integral formula,
\[
h(z) = \frac{1}{2\pi i}\int_C \frac{h(\xi)}{\xi - z}\,d\xi, \qquad |z - z_0| < r,
\]
hence
\[
\frac{h(z)-h(z_0)}{z-z_0} = \frac{1}{2\pi i\,(z-z_0)}\int_C\Bigl(\frac{h(\xi)}{\xi-z}-\frac{h(\xi)}{\xi-z_0}\Bigr)d\xi = \frac{1}{2\pi i}\int_C\frac{h(\xi)}{(\xi-z)(\xi-z_0)}\,d\xi.
\]
Now let |z − z0| < r/2 and |w − z0| < r/2. Then for ξ ∈ C, |ξ − z| ≥ r/2 and |ξ − w| ≥ r/2,
so from the calculations
\[
\langle g(z)-g(w), x'\rangle = \frac{h(z)-h(z_0)}{z-z_0} - \frac{h(w)-h(z_0)}{w-z_0} = \frac{1}{2\pi i}\int_C \frac{z-w}{(\xi-z)(\xi-w)(\xi-z_0)}\,h(\xi)\,d\xi
\]
we see that |⟨g(z) − g(w), x′⟩| ≤ 4M r⁻²|z − w|. Since x′ with ‖x′‖ ≤ 1 was arbitrary,
‖g(z) − g(w)‖ ≤ 4M r⁻²|z − w| by 8.5.8, verifying the Cauchy property.

Summability
Let A be an infinite matrix with entries amn ∈ C. Then A maps sequences x = (x1, x2, . . .)
onto sequences y = Ax with mth term the series ym := Σ_{n=1}^∞ amn xn (which may or may
not converge). We denote the limit of a sequence x, if it exists, by lim x. The following
theorem characterizes those matrices that preserve limits. It asserts that the summability
property lim Ax = lim x holds iff the ℓ¹(N) norms of the rows of A are uniformly bounded,
the columns are members of c0, and the row sums tend to one.
8.8.2 Theorem (Silverman-Toeplitz). Ax ∈ c and lim Ax = lim x for all x ∈ c iff the
following conditions are satisfied:
\[
\text{(a) } \sup_m \sum_{n=1}^{\infty} |a_{mn}| < \infty, \qquad \text{(b) } \lim_m a_{mn} = 0 \ \ \forall\, n\in\mathbb{N}, \qquad \text{(c) } \lim_m \sum_{n=1}^{\infty} a_{mn} = 1.
\]
Proof. (Sufficiency) Let x ∈ c and set $\bar{x} := \lim x$. By (a), the series $y_m := \sum_{n=1}^\infty a_{mn}x_n$ is
absolutely convergent for each m. Now write
\[ y_m = \sum_{n=1}^\infty a_{mn}(x_n - \bar{x}) + \bar{x}\sum_{n=1}^\infty a_{mn}. \]
By (c), the second term on the right has limit $\bar{x}$ as m → ∞. Therefore, to show that $y_m \to \bar{x}$
it suffices to show that the first term on the right tends to zero as m → ∞. Let s denote the
supremum in (a). Given ε > 0, choose N so that $|x_n - \bar{x}| < \varepsilon/s$ for all n > N. Then
\[ \Big|\sum_{n=1}^\infty a_{mn}(x_n - \bar{x})\Big| \le \sum_{n=1}^N |a_{mn}|\,|x_n - \bar{x}| + \sum_{n>N} |a_{mn}|\,|x_n - \bar{x}| \le \sum_{n=1}^N |a_{mn}|\,|x_n - \bar{x}| + \varepsilon, \]
hence, by (b), $\limsup_m \big|\sum_{n=1}^\infty a_{mn}(x_n - \bar{x})\big| \le \varepsilon$, which implies the desired conclusion.
(Necessity) Fix n and let x := (0, . . . , 0, 1, 0, . . .), where the 1 is in the nth position. Then Ax = (a_{1n}, a_{2n}, . . .) and
lim x = 0, hence lim_{m→∞} a_{mn} = lim Ax = 0, proving (b). For (c), take x = (1, 1, . . .) and
argue similarly.
To prove (a), we show first that $\sum_{n=1}^\infty |a_{mn}| < \infty$ for each m. If this is not the case for some m, then there exists
a strictly increasing sequence of indices n_k such that $\sum_{j=n_k+1}^{n_{k+1}} |a_{mj}| > k$. Define
\[ x_j = 0, \ 1 \le j \le n_1, \qquad x_j = k^{-1}\operatorname{sgn} a_{mj}, \ n_k + 1 \le j \le n_{k+1}, \ k = 1, 2, \ldots. \]
Then x_j → 0, hence x = (x_j) ∈ c and lim Ax = 0. On the other hand,
\[ (Ax)_m = \sum_{j=1}^\infty a_{mj}x_j = \sum_{k=1}^\infty \sum_{j=n_k+1}^{n_{k+1}} \frac{|a_{mj}|}{k} = \infty. \]

This verifies the claim.


Now define a sequence of linear functionals x′_m on c by
\[ \langle x, x'_m \rangle = \sum_{n=1}^\infty a_{mn}x_n = (Ax)_m. \]
By the above, x′_m ∈ c′. Since lim_m ⟨x, x′_m⟩ exists for each x ∈ c, sup_m ‖x′_m‖ < ∞ by the
uniform boundedness principle. Since $\|x'_m\| = \sum_{n=1}^\infty |a_{mn}|$ (§8.3), (a) holds.
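
As a numerical illustration of Theorem 8.8.2, the following sketch checks conditions (a), (b), and (c) for the Cesàro matrix of Ex. 8.84 (a_{mn} = 1/m for n ≤ m and 0 otherwise) and applies it to a convergent sequence. It is only a finite computation: a few large row indices stand in for the limits, and the function names and the test sequence are hypothetical choices made for this example.

```python
# Illustration of the Silverman-Toeplitz conditions for the Cesaro matrix
# a_mn = 1/m (n <= m), a_mn = 0 (n > m). Rows are finitely supported, so
# every sum below is exact up to floating-point rounding.

def cesaro_entry(m, n):
    """Entry a_mn of the Cesaro matrix (indices start at 1)."""
    return 1.0 / m if n <= m else 0.0

def row_abs_sum(m):
    """l^1 norm of row m; only the first m entries are nonzero."""
    return sum(abs(cesaro_entry(m, n)) for n in range(1, m + 1))

def row_sum(m):
    """Sum of the entries of row m."""
    return sum(cesaro_entry(m, n) for n in range(1, m + 1))

def transform(x, m):
    """m-th term of Ax, where x is a function n -> x_n."""
    return sum(cesaro_entry(m, n) * x(n) for n in range(1, m + 1))

# A convergent sequence with limit 2 (an arbitrary choice for the example).
x = lambda n: 2.0 + (-1) ** n / n

rows = [10, 100, 1000, 10000]
cond_a = max(row_abs_sum(m) for m in rows)      # (a): bounded row norms
cond_b = [cesaro_entry(m, 3) for m in rows]     # (b): column 3 tends to 0
cond_c = [row_sum(m) for m in rows]             # (c): row sums tend to 1
means = [transform(x, m) for m in rows]         # (Ax)_m approaches lim x = 2

print(cond_a, cond_b, cond_c, means)
```

Here (Ax)_m is the arithmetic mean of x_1, . . . , x_m, so the computed values in `means` approach the limit of x, in line with the conclusion of Ex. 8.84.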

Schauder Bases
A sequence (e_n) in a normed space X is said to be a Schauder basis or simply a basis
for X if ‖e_n‖ = 1 for all n and if each x ∈ X can be represented uniquely as a series
$\sum_{k=1}^\infty c_k e_k$, that is, there exist unique scalars c_k ∈ K such that
\[ \lim_n \Big\| x - \sum_{k=1}^n c_k e_k \Big\| = 0. \]
For example, the sequences e_n := (0, . . . , 0, 1, 0, . . .), with the 1 in the nth position, form a basis for each of the spaces c0
and ℓ^p, 1 ≤ p < ∞. In c, one must augment this set by e = (1, 1, . . .) (Ex. 8.85, 8.86).
The uniqueness of the representation implies that the coefficients c_k depend linearly on x.
Thus we may write
\[ x = \sum_{k=1}^\infty c_k(x)e_k, \qquad c_k(e_j) = \delta_{jk}. \tag{8.19} \]

The functions ck are called coordinate functionals.
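
The expansion (8.19) can be watched numerically in the simplest case, the standard unit vectors in ℓ^2. The sketch below is an illustration under stated assumptions: the vector x_k = 1/k is truncated to `LENGTH` terms so that everything is finite, and the helper names are hypothetical. It computes the error of the nth partial sum and the supremum of the norms of the partial sums.

```python
# Partial sums of the expansion x = sum_k c_k(x) e_k in l^2 for the
# standard unit vectors e_k, with x_k = 1/k (truncated to LENGTH terms,
# an assumption made so that the computation is finite).
import math

LENGTH = 500
x = [1.0 / k for k in range(1, LENGTH + 1)]

def coordinate(k, vec):
    """Coordinate functional c_k for the standard basis: c_k(x) = x_k."""
    return vec[k - 1]

def partial_sum(n, vec):
    """n-th partial sum sum_{k<=n} c_k(x) e_k, stored as a list."""
    return [coordinate(k, vec) if k <= n else 0.0
            for k in range(1, len(vec) + 1)]

def l2_norm(vec):
    return math.sqrt(sum(t * t for t in vec))

def tail_error(n, vec):
    """|| x - sum_{k<=n} c_k(x) e_k ||_2."""
    s = partial_sum(n, vec)
    return l2_norm([a - b for a, b in zip(vec, s)])

errors = [tail_error(n, x) for n in (10, 100, 400)]
# Supremum over n of the norms of the partial sums.
sup_partial = max(l2_norm(partial_sum(n, x)) for n in range(1, LENGTH + 1))
print(errors, sup_partial, l2_norm(x))
```

For this particular basis the norms of the partial sums increase to ‖x‖, so the supremum coincides with ‖x‖; for a general basis the supremum can be strictly larger.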


A normed space with a basis is clearly separable. Schauder conjectured that the converse
holds: every separable Banach space has a basis. Later, Enflo disproved the conjecture by
exhibiting a separable Banach space with no basis. For details on Schauder bases, see [44].
Our goal in this subsection is simply to prove the following noteworthy result.
8.8.3 Theorem (Banach). In a Banach space X, the coordinate functionals of a basis are
continuous.
Proof. Define $|||x||| = \sup_n \big\|\sum_{j=1}^n c_j(x)e_j\big\|$. It is easy to check that ||| · ||| is a norm.
Moreover, since $|||x||| \ge \big\|\sum_{j=1}^n c_j(x)e_j\big\| \to \|x\|$, we have ‖x‖ ≤ |||x|||. We claim that X
is complete in the new norm. To this end, let (x_n) be a ||| · |||-Cauchy sequence. Given ε > 0,
choose N such that |||x_m − x_n||| < ε/2 for all m, n ≥ N. Writing a sum over q ≤ j ≤ p as the difference of two partial sums, we obtain
\[ \Big\|\sum_{j=q}^p c_j(x_m - x_n)e_j\Big\| < \varepsilon, \qquad p \ge q \ge 1, \ m, n \ge N. \tag{†} \]
In particular, for each j the sequence (c_j(x_n)e_j)_n is ‖·‖-Cauchy and so ‖·‖-converges. It
follows that the limit α_j := lim_n c_j(x_n) exists in K (Ex. 8.2). Moreover, since the series for x_N converges, for n ≥ N, all
sufficiently large q, and all p ≥ q,
\[ \Big\|\sum_{j=q}^p \alpha_j e_j\Big\| \le \Big\|\sum_{j=q}^p \big(\alpha_j - c_j(x_n)\big)e_j\Big\| + \Big\|\sum_{j=q}^p c_j(x_n - x_N)e_j\Big\| + \Big\|\sum_{j=q}^p c_j(x_N)e_j\Big\| \le \Big\|\sum_{j=q}^p \big(\alpha_j - c_j(x_n)\big)e_j\Big\| + 2\varepsilon. \]
Letting n → ∞, we obtain the inequality $\big\|\sum_{j=q}^p \alpha_j e_j\big\| \le 2\varepsilon$. Thus the partial sums of
the series $\sum_{j=1}^\infty \alpha_j e_j$ form a Cauchy sequence, so the series converges in (X, ‖·‖). Set
$x := \sum_{j=1}^\infty \alpha_j e_j$, so that c_j(x) = α_j = lim_m c_j(x_m). Letting m → ∞ in (†) with q = 1 we
have
\[ \Big\|\sum_{j=1}^p c_j(x - x_n)e_j\Big\| \le \varepsilon, \qquad n \ge N, \ p \ge 1. \]
Taking the supremum over all p, we have |||x − x_n||| ≤ ε, proving that x_n → x in (X, ||| · |||).
Therefore, (X, ||| · |||) is complete.
It now follows from 8.7.13 that ‖·‖ and ||| · ||| are equivalent. Since
$|c_k(x)| = \|c_k(x)e_k\| = \big\|\sum_{j=1}^k c_j(x)e_j - \sum_{j=1}^{k-1} c_j(x)e_j\big\| \le 2\,|||x|||$, c_k is ||| · |||-continuous, hence also ‖·‖-continuous.

Exercises
8.84 Show that the matrix A = [aij ], where aij = 1/i for j ≤ i and aij = 0 otherwise, satisfies (a),
(b), and (c) of 8.8.2. Conclude that limn (x1 + · · · + xn )/n = limn xn .
8.85 Show that {e_n = (0, . . . , 0, 1, 0, . . .) : n ∈ N}, where the 1 is in the nth position, is a basis for c0 and ℓ^p, 1 ≤ p < ∞, but not for ℓ^∞.
8.86 Show that the vectors e_n = (0, . . . , 0, 1, 0, . . .) together with e = (1, 1, . . .) form a basis for c.
8.87 Let d_1 = (1, 0, . . .) and d_n = (1, 0, . . . , 0, 1/n, 0, . . .), n ≥ 2, where 1/n is in the nth position.
(a) Show that (d_n) is a basis for c00.
[Consider $c_1(x) := x_1 - \sum_{k=2}^\infty k x_k$ and c_n(x) := n x_n (n ≥ 2).]
(b) Show that ‖d_n − d_1‖ → 0 but c_1(d_n − d_1) ↛ 0. Conclude that completeness of the
normed space X in 8.8.3 is essential.

8.9 The Dual Operator


Definition and Properties
Let X and Y be normed spaces. The dual of an operator T ∈ B(X, Y) is the mapping
T′ : Y′ → X′ defined by

⟨x, T′f⟩ = ⟨Tx, f⟩, x ∈ X, f ∈ Y′.

Clearly, T′ is linear. Moreover, by definition of the norm in X′,

‖T′f‖ = sup{|⟨x, T′f⟩| : ‖x‖ ≤ 1} = sup{|⟨Tx, f⟩| : ‖x‖ ≤ 1} ≤ ‖T‖ ‖f‖,

hence T′ ∈ B(Y′, X′) with ‖T′‖ ≤ ‖T‖. On the other hand, by 8.5.8,

‖Tx‖ = sup{|⟨Tx, f⟩| : ‖f‖ ≤ 1} = sup{|⟨x, T′f⟩| : ‖f‖ ≤ 1} ≤ ‖T′‖ ‖x‖,

hence ‖T‖ ≤ ‖T′‖. We have proved

8.9.1 Proposition. T′ ∈ B(Y′, X′) and ‖T′‖ = ‖T‖.
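
In finite dimensions the dual operator is given by the transposed matrix, which makes 8.9.1 easy to check by hand or by machine. The sketch below assumes X = R^3 and Y = R^2 with the max norm, so that X′ and Y′ carry the ℓ^1 norm and the pairing is ⟨x, f⟩ = Σ x_i f_i; the matrix, the vectors, and the helper names are hypothetical.

```python
# Finite-dimensional sketch: X = R^3 and Y = R^2 with the max norm, so that
# X' and Y' carry the l^1 norm and <x, f> = sum_i x_i f_i.
T = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]            # matrix of T : X -> Y

def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def pairing(vec, functional):
    """Dual pairing <x, f> = sum_i x_i f_i."""
    return sum(v * f for v, f in zip(vec, functional))

T_dual = transpose(T)             # matrix of T' : Y' -> X'

x = [0.5, -1.0, 2.0]
f = [2.0, -3.0]
lhs = pairing(x, apply(T_dual, f))   # <x, T'f>
rhs = pairing(apply(T, x), f)        # <Tx, f>

# Operator norm with respect to max norms: largest absolute row sum.
norm_T = max(sum(abs(a) for a in row) for row in T)
# Operator norm with respect to l^1 norms: largest absolute column sum.
norm_T_dual = max(sum(abs(a) for a in col) for col in zip(*T_dual))
print(lhs, rhs, norm_T, norm_T_dual)
```

The row sums of T are the column sums of its transpose, which is the finite-dimensional form of ‖T′‖ = ‖T‖ for this choice of norms.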
The elementary algebraic properties of the dual operation are given in the following
proposition. The proof is left as an exercise.
8.9.2 Proposition. Let X, Y, and Z be normed spaces, T, U ∈ B(X, Y), S ∈ B(Y, Z),
and c ∈ K. Then

(T + U)′ = T′ + U′, (cT)′ = cT′, and (ST)′ = T′S′.

Annihilators
Let X be a normed space. The annihilators A⊥ and ⊥B of subsets A ⊆ X and B ⊆ X′
are defined by

A⊥ := {x′ ∈ X′ : ⟨x, x′⟩ = 0 ∀ x ∈ A} and ⊥B := {x ∈ X : ⟨x, x′⟩ = 0 ∀ x′ ∈ B}.

Clearly, A⊥ and ⊥B are closed linear subspaces of X′ and X, respectively. Moreover, it is
easily established that

A ⊆ ⊥(A⊥), B ⊆ (⊥B)⊥, X⊥ = {0}, and ⊥X′ = {0},

the last property by 8.5.6.


8.9.3 Proposition. ⊥(A⊥) is the closed linear span of A ⊆ X. Thus the linear span of A
is dense in X iff A⊥ = {0}.
Proof. Let Z denote the closed linear span of A. Since ⊥(A⊥) is closed, linear, and contains
A, it must contain Z. Let f ∈ X′ with f(Z) = {0}. Then f ∈ A⊥, hence f = 0 on ⊥(A⊥).
By 8.5.5, Z = ⊥(A⊥). The last assertion follows from the obvious fact that ⊥{0} = X.
The proof of the next proposition is an exercise for the reader (8.96).
8.9.4 Proposition. Let X and Y be normed spaces and T ∈ B(X, Y). Then
ker T′ = (ran T)⊥ and ker T = ⊥[ran T′].

The following lemma is sometimes useful in establishing invertibility of an operator.


8.9.5 Lemma. Let X and Y be Banach spaces and T ∈ B(X, Y). Then T is injective
and ran T is closed iff there exists a > 0 such that ‖Tx‖ ≥ a‖x‖ for all x. In this case
T^{−1} ∈ B(ran T, X).
Proof. If T is injective and ran T is closed, then, by the Banach isomorphism theorem,
T^{−1} ∈ B(ran T, X) and ‖x‖ = ‖T^{−1}Tx‖ ≤ ‖T^{−1}‖ ‖Tx‖ for all x ∈ X. Conversely,
assume the inequality holds. Then T is obviously injective. If T xn → y, then the inequality
shows that (xn ) is a Cauchy sequence and so converges to some x ∈ X. Therefore,
y = T x ∈ ran T , hence ran T is closed.
8.9.6 Proposition. Let X and Y be Banach spaces and T ∈ B(X, Y). Then T is invertible
iff T′ is invertible, in which case (T^{−1})′ = (T′)^{−1}.

Proof. Assume that T′ is invertible and set c := ‖(T′)^{−1}‖. Then c‖T′y′‖ ≥ ‖y′‖ for all y′,
hence

c‖Tx‖ = sup{|⟨x, T′(cy′)⟩| : ‖y′‖ ≤ 1} = sup{|⟨x, T′y′⟩| : ‖y′‖ ≤ c}
≥ sup{|⟨x, T′y′⟩| : ‖T′y′‖ ≤ 1} = ‖x‖.

By the lemma, ran T is closed and T^{−1} ∈ B(ran T, X). Since (ran T)⊥ = ker T′ = {0},
ran T is also dense in Y. Therefore, ran T = Y, hence T is invertible. Since

T′(T^{−1})′ = (T^{−1}T)′ = I and (T^{−1})′T′ = (TT^{−1})′ = I,

(T^{−1})′ = (T′)^{−1}. A similar argument shows that if T is invertible, then T′ is invertible.

Duals of Quotient Spaces and Subspaces

The following theorem uses the quotient map to identify the dual of X/Y with Y ⊥ .
8.9.7 Theorem. Let X be a normed space, Y a closed subspace, and Q : X → X/Y the
quotient map. Then Q′ : (X/Y)′ → X′ is an isometry onto Y⊥.
Proof. We claim that the range of Q′ is Y⊥. Indeed, if ψ ∈ (X/Y)′ and y ∈ Y, then
⟨y, Q′ψ⟩ = ⟨Qy, ψ⟩ = 0, hence Q′ψ ∈ Y⊥. Conversely, if f ∈ Y⊥, then the equation
⟨Qx, ψ⟩ = ⟨x, f⟩ defines ψ ∈ (X/Y)′ with Q′ψ = f.
Now, since ‖Q‖ ≤ 1,

‖Q′ψ‖ = sup{|⟨x, Q′ψ⟩| : ‖x‖ ≤ 1} = sup{|⟨Qx, ψ⟩| : ‖x‖ ≤ 1} ≤ ‖ψ‖.

To see that ‖ψ‖ ≤ ‖Q′ψ‖, let 0 < r < 1. Since Q is surjective, we may choose Qx with
norm one such that |⟨Qx, ψ⟩| > r‖ψ‖. Since ‖Qx‖ < r^{−1} we may choose y ∈ Y with
‖x + y‖ < r^{−1}. Then

r‖ψ‖ < |⟨Qx, ψ⟩| = |⟨Q(x + y), ψ⟩| = |⟨x + y, Q′ψ⟩| ≤ r^{−1}‖Q′ψ‖,

and letting r → 1 yields ‖ψ‖ ≤ ‖Q′ψ‖.


The next result identifies the dual of a subspace with a quotient space. An analogous
result is given in Ex. 8.97.
8.9.8 Theorem. Let X be a normed space, Y a closed subspace, and Q : X′ → X′/Y⊥
the quotient map. Then there exists a linear isometry T from X′/Y⊥ onto Y′ such that
TQ : X′ → Y′ is the restriction mapping f ↦ f|_Y.

Proof. Let I : Y ↪ X denote the inclusion map. Then I′ : X′ → Y′ is the restriction
mapping f ↦ f|_Y, which has kernel Y⊥ and which is surjective by 8.5.4. By 8.7.9, there
exists a topological isomorphism T from X′/Y⊥ onto Y′ such that TQ = I′. To show that
T is an isometry, let f ∈ X′ and g ∈ Y⊥. Then

‖TQf‖ = ‖f|_Y‖ = sup{|⟨y, f⟩| : y ∈ Y, ‖y‖ ≤ 1} = sup{|⟨y, f + g⟩| : y ∈ Y, ‖y‖ ≤ 1} ≤ ‖f + g‖.

Taking the infimum on g yields ‖TQf‖ ≤ ‖Qf‖. On the other hand, given h ∈ Y′ there exists
an f ∈ X′ such that I′f = h and ‖f‖ = ‖h‖ (8.5.4), so ‖TQf‖ = ‖I′f‖ = ‖f‖ ≥ ‖Qf‖.

Exercises
8.88 Prove 8.9.2.

8.89 Let X, Y be normed linear spaces and T ∈ B(X, Y). Prove that $\|Tx\| = \|T''\hat{x}\|$.

8.90 Let X and Y be normed spaces and T ∈ B(X, Y). Prove that T is an isometry onto Y iff T′
is an isometry onto X′.

8.91 [↑ 8.34] Let T_r and T_ℓ be the right and left shift operators on c0. Identify c0′ with ℓ^1 as in §8.3.
Find T_r′ and T_ℓ′.

8.92 Find the dual of the multiplication map M_φ of 8.2.3(c) for the case 1 < p < ∞.

8.93 Let X be a compact Hausdorff space and ϕ : X → X continuous. Define T on C(X) by
Tf = f ◦ ϕ. Find T′ by identifying C(X)′ with M_ra(X).

8.94 Define T : L^1(0, ∞) → C0(0, ∞) by $(Tf)(x) = \int_x^\infty f(t)\,dt$. Find the dual of T.

8.95 Let 1 ≤ p < ∞, r ≠ 0, and let D_r : L^p(λ^d) → L^p(λ^d) be the dilation operator D_r f(x) = f(rx).
Find D_r′ by identifying the dual of L^p with L^q (p^{−1} + q^{−1} = 1).

8.96 Prove 8.9.4.

8.97 Let Y be a closed subspace of a normed space X. Prove that X/Y is isometrically isomorphic
to (Y ⊥ )0 .

8.98 Show that there is a norm one projection of X‴ onto X′ (identified with $\widehat{X'}$).

8.10 Compact Operators


Throughout this section, X, Y, and Z denote Banach spaces over K.
In this section we describe the basic properties of compact operators on Banach spaces.
A detailed analysis of compact operators on Hilbert spaces is given in Chapter 12.
An operator T ∈ B(X, Y) is said to be compact if T maps bounded sets onto relatively
compact sets. Equivalently, T is compact iff for any bounded sequence (xn ) in X the image
sequence (T xn ) has a convergent subsequence in Y. The collection of all compact operators
in B(X, Y) is denoted by B0 (X, Y):

B0 (X, Y) = {T ∈ B(X, Y) : T is compact}.

We write B0 (X) for B0 (X, X).



A simple yet fundamental example of a compact operator T ∈ B(X, Y) is an operator
with finite dimensional range. Such an operator is said to be of finite rank and may be
expressed in the form
\[ Tx = \sum_{j=1}^n \langle x, x'_j \rangle\, y_j, \]
where y_1, . . . , y_n is a basis for ran T and x′_j ∈ X′. The collection of all operators of finite
rank is denoted by B00(X, Y):

B00 (X, Y) = {T ∈ B(X, Y) : T has finite rank}.

We write B00 (X) for B00 (X, X).


The proof of the following proposition is an exercise for the reader.
8.10.1 Proposition. B00 (X, Y) and B0 (X, Y) are linear subspaces of B(X, Y). Moreover,
in the obvious notation,

B0 (X, Y) ◦ B(Z, X) ⊆ B0 (Z, Y) and B(Y, Z) ◦ B0 (X, Y) ⊆ B0 (X, Z),

with the analogous inclusions holding for B00 . In particular, B0 (X) and B00 (X) are ideals
in the Banach algebra B(X).
8.10.2 Theorem. B0 (X, Y) is operator-norm closed in B(X, Y).

Proof. Let T ∈ B(X, Y) and T_n ∈ B0(X, Y) with ‖T_n − T‖ → 0. Given ε > 0 choose n
such that ‖T_n − T‖ < ε, and let x_1, . . . , x_m ∈ C1 so that $T_n(C_1) \subseteq \bigcup_{j=1}^m B_\varepsilon(T_n x_j)$. Then
for each x ∈ C1 there exists j such that ‖T_n x − T_n x_j‖ < ε, hence

‖Tx − Tx_j‖ ≤ ‖Tx − T_n x‖ + ‖T_n x − T_n x_j‖ + ‖T_n x_j − Tx_j‖ < 3ε.

Therefore, $T(C_1) \subseteq \bigcup_{j=1}^m B_{3\varepsilon}(Tx_j)$. Since ε was arbitrary, T(C1) is totally bounded and
hence relatively compact.
Here is one of several similar examples that are prototypical compact operators.
8.10.3 Example. Let X be a compact Hausdorff topological space, µ a Borel probability
measure on X, and k ∈ L2 (X × X). Define
Z
(Kf )(x) = k(x, y)f (y) dµ(y), f ∈ L2 (µ).
X

An application of the CBS inequality shows that kKf k2 ≤ kkk2 kf k2 , hence K is a bounded
linear operator on L2 (µ) with kKk ≤ kkk2 . The operator K is called an integral operator
with kernel k. We show that K is compact.
First, assume that k is continuous. Then the collection of functions F := {k(·, y) : y ∈ X}
is compact in C(X), hence given ε > 0 there exist y_1, . . . , y_n ∈ X such that $F \subseteq \bigcup_{j=1}^n B_\varepsilon(k(\cdot, y_j))$.
Let

A_j = {y ∈ X : ‖k(·, y) − k(·, y_j)‖_∞ < ε}, B_1 := A_1, and B_j := A_j ∩ A_1^c ∩ · · · ∩ A_{j−1}^c.

Then X is the disjoint union of the sets B_j. Define
\[ Tf(x) = \sum_{j=1}^n k(x, y_j)\int_{B_j} f(y)\,d\mu(y), \quad f \in L^2(\mu). \]
Then T has finite rank, and for all f with ‖f‖_2 ≤ 1 and all x ∈ X,
\[ |Kf(x) - Tf(x)| \le \sum_{j=1}^n \int_{B_j} |k(x, y) - k(x, y_j)|\,|f(y)|\,d\mu(y) \le \varepsilon \int |f|\,d\mu \le \varepsilon. \]
It follows that ‖K − T‖ ≤ ε. Therefore, K can be approximated by finite rank operators
and so is compact.
In the general case, let kn be a sequence in C(X × X) such that kkn − kk2 → 0 (7.1.2).
By the preceding paragraph, the corresponding integral operators Kn are compact. From
kK − Kn k ≤ kk − kn k2 → 0 we see that K is compact. ♦
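
The finite rank approximation in 8.10.3 can be tried out numerically. The sketch below is an illustration under simplifying assumptions rather than the construction of the example itself: X = [0, 1] with Lebesgue measure, the kernel k(x, y) = cos(xy) is a hypothetical choice, and the sets B_j are replaced by n equal subintervals with midpoints y_j. The quantity sup |k(x, y) − k(x, y_j)| over x and y ∈ B_j plays the role of ε, and it is estimated on a grid.

```python
# Grid estimate of the epsilon in the finite rank approximation of an
# integral operator on [0, 1] (Lebesgue measure), using equal cells B_j.
import math

def kernel(x, y):
    """Hypothetical continuous kernel on [0, 1] x [0, 1]."""
    return math.cos(x * y)

def sup_deviation(n, grid=200):
    """Grid estimate of max over x, j, and y in B_j of |k(x,y) - k(x,y_j)|,
    where B_j is the j-th of n equal subintervals and y_j is its midpoint."""
    worst = 0.0
    for i in range(grid + 1):
        x = i / grid
        for l in range(grid + 1):
            y = l / grid
            j = min(int(y * n), n - 1)       # index of the cell containing y
            y_j = (j + 0.5) / n              # midpoint of that cell
            worst = max(worst, abs(kernel(x, y) - kernel(x, y_j)))
    return worst

bounds = {n: sup_deviation(n) for n in (2, 4, 8, 16)}
print(bounds)
```

Since |∂k/∂y| ≤ 1 for this kernel, each estimate is at most 1/(2n), so the bound on ‖K − T‖ obtained as in the example shrinks as the partition is refined.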
8.10.4 Theorem. Let T ∈ B(X, Y). Then T is compact iff T′ is compact.
Proof. Suppose T is compact. We show that if (f_n) ⊆ Y′ with s := sup_n ‖f_n‖ < ∞, then
(T′f_n) has a convergent subsequence. Since T is compact, ran T has a countable dense set,
say (y_n) (Ex. 8.101). A standard diagonal argument shows that (f_n) has a subsequence (g_n)
such that lim_n g_n(y_k) exists for all k. We claim that lim_n g_n(y) exists for all y ∈ cl ran T.
Indeed, for such y and any m, n, k,

|g_m(y) − g_n(y)| ≤ |g_m(y) − g_m(y_k)| + |g_m(y_k) − g_n(y_k)| + |g_n(y_k) − g_n(y)|
≤ 2s‖y − y_k‖ + |g_m(y_k) − g_n(y_k)|,

and since y may be approximated by a y_k we see that (g_n(y))_n is a Cauchy sequence,
verifying the claim.
Now let g(y) := lim_n g_n(y) (y ∈ cl ran T). Clearly, g is linear and |g(y)| = lim_n |g_n(y)| ≤
s‖y‖, hence g is continuous on cl ran T. Therefore, g ◦ T ∈ X′, and for any x ∈ X

lim_n ⟨x, T′g_n⟩ = lim_n ⟨Tx, g_n⟩ = ⟨x, g ◦ T⟩.

We claim that ‖T′g_n − g ◦ T‖ → 0. Suppose the claim is false. Then there exists ε > 0 such
that ‖T′g_n − g ◦ T‖ ≥ ε for infinitely many n, say for n ∈ S. For each n ∈ S choose x_n with
norm one such that

|g_n(Tx_n) − g(Tx_n)| = |⟨x_n, T′g_n − g ◦ T⟩| ≥ ε/2. (†)

Since T is compact, there exists a strictly increasing sequence (n_k)_k in S and y ∈ Y such that
Tx_{n_k} → y. Since sup_n ‖g_n‖ < ∞, g_{n_k}(Tx_{n_k}) → g(y), and g(Tx_{n_k}) → g(y) by continuity of g. But this contradicts (†). Therefore,
‖T′g_n − g ◦ T‖ → 0, hence T′ is compact. The proof that T′ compact ⇒ T compact is left
as an exercise (8.103).

Fredholm Alternative for Compact Operators
Let A be an n × n matrix. A standard argument shows that one of the following holds:
(i) The system of equations Ax = 0 has a nonzero solution in Kn .
(ii) The system Ax = y has a unique solution for each y ∈ Kn .
In this subsection we prove an infinite dimensional version of this result using the following
lemmas.
8.10.5 Lemma. Let T ∈ B(X) be compact, and for each x ∈ X let d(x) denote the
distance from x to ker(I − T ). Then there exists M > 0 such that d(x) ≤ M k(I − T )xk
for all x.

Proof. We may assume (I − T)x ≠ 0. If the conclusion of the lemma is false, then
the ratio d(x)/‖(I − T)x‖ is unbounded, so there exists a sequence (x_n) such that
d(x_n)/‖(I − T)x_n‖ → ∞. Since ker(I − T) is finite dimensional (Ex. 8.102), d(x_n) = ‖x_n − u_n‖ for some
u_n ∈ ker(I − T). Set y_n = d(x_n)^{−1}(x_n − u_n). We then have
\[ (I - T)y_n = \frac{1}{d(x_n)}(I - T)x_n \to 0. \tag{†} \]
Since ‖y_n‖ = 1 and T is compact, some subsequence of (Ty_n) converges, say Ty_{n_k} → y.
By (†), y_{n_k} → y, hence (I − T)y = 0 and so u_{n_k} + d(x_{n_k})y ∈ ker(I − T). But then
\[ \|y_{n_k} - y\| = \frac{1}{d(x_{n_k})}\big\|x_{n_k} - \big(u_{n_k} + d(x_{n_k})y\big)\big\| \ge 1, \]
contradicting that y_{n_k} → y.
8.10.6 Lemma. Let T ∈ B(X) be compact. Then ran(I − T′) is closed.
Proof. Set S := I − T. Since ran S′ ⊆ cl ran S′ = (ker S)⊥, it suffices to show that (ker S)⊥ ⊆
ran S′. Let f ∈ (ker S)⊥ and define a linear map g on ran S by g(Sx) = f(x). By 8.10.5,
there exists M > 0 and for each x ∈ X a member $\tilde{x}$ ∈ ker S such that $d(x) = \|x - \tilde{x}\| \le M\|Sx\|$.
Therefore,
\[ |g(Sx)| = |g(S(x - \tilde{x}))| = |f(x - \tilde{x})| \le \|f\|\,\|x - \tilde{x}\| \le M\|f\|\,\|Sx\|, \]
hence g is continuous on ran S and so has a continuous extension to cl ran S. By the
Hahn-Banach theorem, g extends to a member h of X′. Then for all x,

⟨x, f⟩ = ⟨Sx, g⟩ = ⟨Sx, h⟩ = ⟨x, S′h⟩,

hence f = S′h ∈ ran S′.
We may now prove
8.10.7 Theorem (Fredholm). Let T ∈ B0(X) and λ ≠ 0. Then λI − T is surjective iff
λI − T is injective. Thus one of the following holds:
(i) The equation T x − λx = 0 has a nonzero solution.
(ii) The equation T x − λx = y has a unique solution for any y ∈ X.
Proof. Since λI − T = λ(I − λ^{−1}T) and λ^{−1}T is compact, we may take λ = 1. Set S := I − T.
Suppose that S is surjective but not injective. Then Sx_1 = 0 for some x_1 ≠ 0. We claim that
the containment ker(S^{n−1}) ⊆ ker(S^n) is proper. Indeed, since S is surjective, there exists x_2
such that Sx_2 = x_1, and in general there exists a vector x_n such that Sx_n = x_{n−1}. Then
S^n x_n = S^{n−1}x_{n−1} = · · · = Sx_1 = 0 and S^{n−1}x_n = S^{n−2}x_{n−1} = · · · = Sx_2 = x_1 ≠ 0, so
x_n ∈ ker(S^n) \ ker(S^{n−1}), verifying the claim. By 8.1.6 there exists y_n ∈ ker(S^n) \ ker(S^{n−1})
such that
‖y_n‖ = 1 and inf{‖y_n − y‖ : y ∈ ker(S^{n−1})} ≥ 1/2.
Now write

Ty_n − Ty_m = (I − S)y_n − (I − S)y_m = y_n + [Sy_m − y_m − Sy_n].

The term in square brackets is in ker(S n−1 ) for all n > m, hence kT yn − T ym k ≥ 1/2. But
then (T yn ) has no convergent subsequence, contradicting that T is compact. Therefore, S
is injective.

Conversely, assume that S is injective. We claim that ran(S) is closed. To verify this, we
use the following simple observation regarding sequences (xn ) in X:

T xn → u and Sxn → v ⇒ xn → u + v ⇒ S(u + v) = v. (†)

Now let Sxn → v. We consider two cases:


Case 1. (xn ) has a bounded subsequence. By compactness of T , there exists a subsequence
(xnk ) of (xn ) such that T xnk → u for some u ∈ X. Applying (†) we then have v =
S(u + v) ∈ ran S, as required.
Case 2. (x_n) has no bounded subsequence. Then ‖x_n‖ → ∞, and setting y_n = ‖x_n‖^{−1}x_n
we have ‖y_n‖ = 1 and Sy_n → 0. Going to a subsequence if necessary, we may assume that
Ty_n → u for some u. Applying (†) with v = 0, we then have Su = 0 and y_n → u. But
the latter implies that u ≠ 0, which contradicts the injectivity of S. Therefore, Case 2 is
not possible, verifying the claim.
Now, since S is injective, cl ran(S′) = ker(S)⊥ = X′. But by 8.10.6, ran(S′) is closed.
Therefore, S′ is surjective. By the necessity of the theorem applied to S′, recalling that T′ is
compact, S′ is injective. Thus, since ran(S) is closed, ran(S) = ⊥ker(S′) = X, that is, S is
surjective, completing the proof.
The Fredholm alternative is typically applied to integral equations in L^2 of the form
λf − Kf = g, where K is defined as in 8.10.3. If |λ| > ‖K‖ it may be shown that (λI − K)^{−1}
exists and may be expanded into an operator norm convergent series:

(λI − K)^{−1} = λ^{−1}I + λ^{−2}K + λ^{−3}K^2 + · · ·

(see 13.1.3). The solution to the equation λf − Kf = g is therefore given by

f = λ^{−1}g + λ^{−2}Kg + λ^{−3}K^2 g + · · · .
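
The series solution can be tested on a finite-dimensional stand-in for K. In the sketch below a 2 × 2 matrix takes the place of the integral operator, which is an assumption made only to keep the computation finite; the matrix, λ, g, and the function names are hypothetical. The partial sums of the series are accumulated term by term and the residual λf − Kf − g is checked.

```python
# Neumann series for (lambda I - K)^{-1} g with a small matrix K standing in
# for the integral operator (an assumption made for the sake of a finite
# computation). Convergence is guaranteed when |lambda| > ||K||.
K = [[0.2, 0.1],
     [0.0, 0.3]]
lam = 1.0
g = [1.0, 2.0]

def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]

def neumann_solution(matrix, lam, g, terms=60):
    """Partial sum of f = lam^-1 g + lam^-2 K g + lam^-3 K^2 g + ..."""
    term = [v / lam for v in g]                         # lam^{-1} g
    total = term[:]
    for _ in range(terms - 1):
        term = [v / lam for v in apply(matrix, term)]   # next term of the series
        total = [s + t for s, t in zip(total, term)]
    return total

f = neumann_solution(K, lam, g)
Kf = apply(K, f)
residual = [lam * fi - kfi - gi for fi, kfi, gi in zip(f, Kf, g)]
print(f, residual)
```

Here the largest absolute row sum of K is 0.3 < |λ| = 1, so the terms decay geometrically and sixty terms are far more than needed.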

Exercises
8.99 Let X be a Banach space, (x_n), (x′_n) sequences in X and X′ with norm ≤ 1, and (c_n) ∈ ℓ^1(N).
Show that the operator $Tx = \sum_{k=1}^\infty c_k \langle x, x'_k \rangle x_k$ is compact.

8.100 Prove 8.10.1.

8.101 Prove that if T ∈ B(X, Y) is compact, then T has separable range.

8.102 Let X be a normed space, λ ≠ 0, and T ∈ B(X) compact. Prove that ker(λ − T)^m is finite
dimensional for all m ∈ N.

8.103 Let X and Y be Banach spaces and T ∈ B(X, Y). Prove: If T 0 is compact, then T is compact.
Chapter 9
Locally Convex Spaces

A locally convex linear space X is a generalization of a normed space, the topology on X
given by a family of seminorms rather than a single norm. These spaces occur in a variety of
contexts, including operator theory and distributions. In the present chapter we develop the
properties of locally convex spaces to a sufficient extent that will allow the discussion of weak
and weak∗ topologies in the next chapter and the material on distributions in Chapter 15 to
be seen from a general vantage point. Additional properties of locally convex spaces as well
as applications are considered in Chapter 14.

9.1 General Properties


Let X be a vector space over K. A vector topology on X is a topology relative to which
the vector space operations
(x, y) ↦ x + y : X × X → X and (c, x) ↦ cx : K × X → X
are continuous. A vector space X with a Hausdorff vector topology τ is called a topological
vector space (TVS) and is denoted by Xτ . We omit the subscript when there is no possibility
of confusion. Since translation x 7→ x + y in a TVS is a homeomorphism, a neighborhood
base at x is of the form x + U0 , where U0 is a neighborhood base at zero. It follows that if
X and Y are topological vector spaces, then a linear map T : X → Y is continuous iff it is
continuous at zero.
A locally convex topology is a vector topology with a neighborhood base at zero
consisting of open convex sets. A locally convex space (LCS) is a vector space with a
Hausdorff locally convex topology. Every normed space is a LCS since the balls Br are
convex. Additional examples are given in the exercises and in later chapters.

Geometry and Topology


The continuity of vector operations in a TVS implies that the closure of a subspace is
a linear space, the closure of a convex set is convex, and the closure of a balanced set is
balanced. The closed convex hull of a subset A of a TVS X is the intersection of all
closed convex subsets of X containing A. It may also be characterized as the closure of the
convex hull of A. Similarly, the closed, convex, balanced hull of A is the intersection of
all closed, convex, balanced subsets of X containing A and may be realized as the closure of
the convex balanced hull of A (Ex. 9.3).
The open, balanced neighborhoods of zero in a TVS form a neighborhood base at zero, as
do the open, convex, balanced neighborhoods of zero in a LCS. To establish these facts we
need the following lemma:
9.1.1 Lemma. Let X be a TVS and E a subset with nonempty interior. If E is balanced
(convex), then int E is balanced (convex).


Proof. Let E be balanced and 0 < |c| ≤ 1. Since x ↦ cx is a homeomorphism, c int E =
int cE ⊆ int E. Therefore, int E is balanced.
Now let E be convex and let x, y ∈ int E. Choose a neighborhood U of zero such that
x + U ⊆ E and y + U ⊆ E. For arbitrary u ∈ U, the vectors x + u and y + u are in E,
hence, by convexity, for 0 ≤ t ≤ 1,

tx + (1 − t)y + u = t(x + u) + (1 − t)(y + u) ∈ E.

Thus tx + (1 − t)y + U ⊆ E and so tx + (1 − t)y ∈ int E. Therefore, int E is convex.


9.1.2 Proposition. A TVS has a neighborhood base at zero of open (closed) balanced sets.
A LCS has a neighborhood base at zero of open (closed) convex balanced sets.
Proof. Let W be any neighborhood of zero. By continuity of scalar multiplication at zero,
there exists an open neighborhood V of zero and δ > 0 such that cV ⊆ W for all |c| ≤ δ.
Then $\bigcup_{0 < |c| \le \delta} cV$ is a balanced, open neighborhood of zero contained in W. Thus a TVS has
a neighborhood base at zero of open balanced sets.
By the first paragraph and continuity of addition at (0, 0), we may choose a balanced
neighborhood U of zero so that U + U ⊆ W. If x ∈ cl U, then (x + U) ∩ U ≠ ∅, hence
there exist u_1, u_2 ∈ U such that x + u_1 = u_2. Since U is balanced, −u_1 ∈ U, hence
x = u2 −u1 ∈ U +U ⊆ W . Therefore, cl U ⊆ W . This shows that a TVS has a neighborhood
base at zero of closed balanced sets.
For a LCS, we may take W to be convex. Let V be a balanced, open neighborhood of zero
contained in W . The convex hull co V is still balanced and co V ⊆ W . Since V ⊆ co V and
V is open, V ⊆ int co V . In particular, 0 ∈ int co V . Therefore, int co V is an open, convex,
balanced neighborhood of zero contained in W . This shows that a LCS has a neighborhood
base at zero of open, convex, balanced sets. An argument similar to that in the preceding
paragraph shows that a LCS has a neighborhood base at zero of closed, convex, balanced
sets.

Seminormed Spaces
Let P be a family of seminorms on a vector space X. The initial topology induced by
the collection of all functions of the form z ↦ p(z − y), where p ∈ P and y ∈ X, is called
the seminorm topology generated by P. The space X with this topology is called a
seminormed space. A neighborhood base at x for a seminorm topology consists of finite
intersections of sets of the form

{z ∈ X : |p(z − y) − p(x − y)| < ε}, p ∈ P, y ∈ X, ε > 0.

Taking y = x produces the smaller collection of sets

{z ∈ X : p(z − x) < ε}, p ∈ P, ε > 0.

It is sufficient to consider these, as the inequality |p(z − y) − p(x − y)| ≤ p(z − x)
implies that the smaller collection produces an equivalent neighborhood base at x. Thus a
neighborhood base at x consists of sets of the form
\[ \Big\{z \in X : \max_{1 \le j \le n} p_j(z - x) < \varepsilon\Big\} = x + U, \qquad U := \Big\{y \in X : \max_{1 \le j \le n} p_j(y) < \varepsilon\Big\}, \tag{9.1} \]

where pj ∈ P and ε > 0. In particular, a net (xα ) converges to x in this topology iff
p(xα − x) → 0 for all p ∈ P. It follows easily from properties of seminorms that the
seminorm topology on X is a vector topology.

A family P of seminorms on a vector space X is said to be separating if
\[ \bigcap_{p \in P} \{x : p(x) = 0\} = \{0\}; \]
equivalently, x ≠ y ⇒ p(x − y) > 0 for some p ∈ P. Setting ε = p(x − y) and defining
V := {z : p(z) < ε/2} we see that the neighborhoods x + V and y + V are disjoint, hence
the seminorm topology of a separating family P is Hausdorff. Conversely, if the topology is
Hausdorff, then for each x ≠ 0 in X there exists a set U as in (9.1) such that x ∉ U, hence
p_j(x) ≥ ε for some j. We have proved
9.1.3 Proposition. A seminormed space is Hausdorff iff the family of seminorms is
separating.
Now let X have the seminorm topology generated by a separating family P of seminorms.
The triangle inequality |p(x) − p(y)| ≤ p(x − y) implies that a seminorm is continuous.
Conversely, if q is any continuous seminorm on X, then the set {x : q(x) < ε} is open in X,
hence the enhanced family of seminorms P ∪ {q} generates no new open sets and therefore
produces the same topology as the original family. This observation is occasionally useful in
reducing the complexity of some arguments. For example, the basic neighborhood of zero U
in (9.1) may be described in terms of a single continuous seminorm p(x) = ε^{−1} max_j p_j(x)
as U = {x : p(x) < 1}.
We show next that every Hausdorff seminormed space is a locally convex space and vice
versa. For this we need the following notion: For a neighborhood of zero U in a TVS X,
define the Minkowski functional of U by

pU (x) := inf{t > 0 : x ∈ tU }. (9.2)

By continuity of scalar multiplication, sx ∈ U for sufficiently small s > 0, hence pU (x) < ∞.
The following result is the key to establishing the connection between locally convex spaces
and seminormed spaces.
9.1.4 Proposition. Let U be an open, convex, balanced neighborhood of zero in a TVS X.
Then the Minkowski functional p_U is a seminorm that is continuous in the topology of X. Moreover,

U = {x : pU (x) < 1}. (9.3)

Proof. To verify the subadditivity property, let x ∈ sU and y ∈ tU (s, t > 0). By convexity
of U,
\[ \frac{1}{s+t}(x + y) = \frac{s}{s+t}(s^{-1}x) + \frac{t}{s+t}(t^{-1}y) \in U, \]
hence p_U(x + y) ≤ s + t. Since s and t were arbitrary, p_U(x + y) ≤ p_U(x) + p_U(y).
For absolute homogeneity, let c ∈ K, c ≠ 0. Since U is balanced, c^{−1}U = |c|^{−1}U, hence

p_U(cx) = inf{t > 0 : cx ∈ tU} = inf{t > 0 : x ∈ tc^{−1}U} = inf{|c|s > 0 : x ∈ sU}
= |c| p_U(x).

Therefore, p_U is a seminorm.


To see that pU is continuous, let xα → x in X and ε > 0. Then ε−1 (x − xα ) → 0 hence,
eventually, ε−1 (x − xα ) ∈ U and so |pU (x) − pU (xα )| ≤ pU (x − xα ) ≤ ε.
To establish (9.3), let x ∈ U and note that by continuity of scalar multiplication there
exists t > 1 such that tx ∈ U , hence pU (x) ≤ 1/t < 1. Therefore, U ⊆ {x : pU (x) < 1}.
For the reverse inclusion, let pU (x) < 1 and choose t so that x ∈ tU and pU (x) < t < 1.
By the balanced property, x = t (x/t) ∈ U .
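
For a concrete feel for (9.2) and (9.3), the Minkowski functional can be approximated by bisection whenever membership in U can be tested. The sketch below uses a hypothetical example, the open elliptical region U = {(x, y) : x²/4 + y² < 1} in R², for which the functional is known in closed form; the function names and the search bound `hi` are assumptions made for the illustration.

```python
# Bisection approximation of the Minkowski functional of an open, convex,
# balanced neighborhood of zero in R^2.
import math

def in_U(v):
    """Membership in U = {(x, y) : x^2/4 + y^2 < 1} (a hypothetical example)."""
    x, y = v
    return x * x / 4.0 + y * y < 1.0

def minkowski(v, member=in_U, hi=1e6, steps=200):
    """Approximate p_U(v) = inf{t > 0 : v in tU} by bisection.
    Uses that {t > 0 : v in tU} is the interval (p_U(v), infinity)
    when U is open, convex, and balanced."""
    lo = 0.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if member((v[0] / mid, v[1] / mid)):   # is v in mid * U ?
            hi = mid
        else:
            lo = mid
    return hi

def gauge_exact(v):
    """Closed form of p_U for the elliptical region above."""
    return math.sqrt(v[0] ** 2 / 4.0 + v[1] ** 2)

u, w = (1.0, 0.5), (-3.0, 2.0)
p_u, p_w = minkowski(u), minkowski(w)
p_sum = minkowski((u[0] + w[0], u[1] + w[1]))       # subadditivity check
p_scaled = minkowski((-2.5 * u[0], -2.5 * u[1]))    # homogeneity check
print(p_u, p_w, p_sum, p_scaled)
```

The computed values agree with the closed form, satisfy p_U(u + w) ≤ p_U(u) + p_U(w) and p_U(−2.5u) = 2.5 p_U(u), and u lies in U exactly because p_U(u) < 1.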

9.1.5 Theorem. A Hausdorff seminormed space X is locally convex. Conversely, every
LCS is a Hausdorff seminormed space.
Proof. Let X be a Hausdorff seminormed space, and let x_α → x, y_α → y in X and c_α → c
in K. Then

p(x_α + y_α − (x + y)) ≤ p(x_α − x) + p(y_α − y) → 0 and
p(c_α x_α − cx) ≤ |c_α| p(x_α − x) + |c_α − c| p(x) → 0,
hence X is a TVS. As the sets U in (9.1) are open and convex, X is a LCS. Conversely, let
X be a LCS. By (9.3), the seminorms pU , where U runs through a base of open, convex,
balanced neighborhoods of zero, generate the given locally convex topology.
In view of Theorem 9.1.5, we may (and shall) abandon the phrase “seminormed space”
and use instead the LCS terminology.

Fréchet Spaces
If X is a LCS with a countable generating class (pn ) of seminorms (or, equivalently, a
countable basis of open convex neighborhoods of zero), then

\[ d(x, y) := \sum_{n=1}^\infty 2^{-n}\,\frac{p_n(x - y)}{1 + p_n(x - y)} \tag{9.4} \]

defines a metric for the locally convex topology of X, as is readily verified. If X is complete
in this metric, then X is called a Fréchet space. The metric d is not derived from a norm,
since homogeneity fails (dramatically). We shall call d the standard metric for X. Clearly
every Banach space is a Fréchet space. Here are some nontrivial examples:
9.1.6 Examples.
(a) The space C(U). Let U ⊆ R^d be open. Define compact subsets of U by
K_n := {x ∈ R^d : |x| ≤ n, d(x, U^c) ≥ 1/n}, n ∈ N.
Then K_n ⊆ int K_{n+1} and K_n ↑ U. Now define seminorms p_n on C(U) by
p_n(f) = sup{|f(x)| : x ∈ K_n}.
Since the sets {x : |x| < n, d(x, U^c) > 1/n} form an increasing open cover of U, every compact set
is contained in some K_n. Thus convergence in the locally convex topology generated by the
seminorms p_n is uniform convergence on compact subsets of U, also called local uniform
convergence. Since each space C(K_n) is complete, C(U) is a Fréchet space.
(b) The space H(U). For open U ⊆ C = R^2, the space H(U) of analytic (holomorphic) functions on U is a
closed subspace of C(U) in (a), since the property of analyticity is preserved under local uniform
convergence. Therefore H(U) is also a Fréchet space.
(c) The space C ∞ (U ). Let U and (Kn ) be as in (a). Define a countable family of seminorms
pm,α on C ∞ (U ) by
pm,α (f ) = sup{|∂ α f (x)| : x ∈ Km },
where α = (α1 , . . . , αd ) (αj ∈ Z+ ), is a multi-index. A sequence (fn ) converges to zero in
the locally convex topology generated by these seminorms iff ∂ α fn → 0 locally uniformly for
all α. To see that C ∞ (U ) is a Fréchet space, let (φn ) be a Cauchy sequence with respect to
the standard metric, so that
    \lim_{m,n} \sup_{x \in K_j} |\partial^{\alpha}\varphi_n(x) - \partial^{\alpha}\varphi_m(x)| = 0    for all j and all multi-indices α.

Since C(Kj ) is complete and Kj ↑ U , for each multi-index α there exists φα ∈ C(U ) such
that ∂ α φn → φα uniformly on each compact subset of U . Set φ = φ(0,...,0) , so φn → φ locally
uniformly. Letting n → ∞ in

    \varphi_n(x_1, \dots, x_d) = \int_0^{x_1} \partial^{(1,0,\dots,0)}\varphi_n(t_1, x_2, \dots, x_d)\, dt_1,

we obtain

    \varphi(x_1, \dots, x_d) = \int_0^{x_1} \varphi_{(1,0,\dots,0)}(t_1, x_2, \dots, x_d)\, dt_1.

This shows that ∂ (1,0,...,0) φ(x1 , . . . , xd ) exists and equals φ(1,0,...,0) (x1 , . . . , xd ). In a similar
manner, it may be shown that ∂ α φ(x1 , . . . , xd ) exists and equals φα (x1 , . . . , xd ) for all
multi-indices α. Therefore, C ∞ (U ) is complete.
For later reference we note that the space Cc∞ (U ) is dense in C ∞ (U ). Indeed, by
Urysohn’s lemma for C ∞ functions, for each n there exists a function φn ∈ Cc∞ (U ) such
that φn = 1 on Kn . For any f ∈ C ∞ (U ) we then have φn f ∈ Cc∞ (U ) and φn f = f on Kn ,
hence for n > m and all α, pm,α (φn f − f ) = 0.
(d) Schwartz space. The space S of rapidly decreasing functions is a Fréchet space under
the countable family of norms qα,m defined by

    q_{\alpha,m}(\varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|)^m |\partial^{\alpha}\varphi(x)|.

The proof that S is complete with respect to the standard metric is similar to that of (c).
By 6.3.2, the same Fréchet topology is obtained by using the countable family of norms

    p_{\alpha,\beta}(\varphi) = \sup_{x \in \mathbb{R}^d} |x^{\alpha} \partial^{\beta}\varphi(x)|.    ♦
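
The failure of homogeneity for the standard metric (9.4) is easy to observe numerically. The following minimal Python sketch works in a toy setting that is not taken from the text: vectors in R^3 with the three seminorms p_n(x) = |x_n|, so that the series in (9.4) is truncated to finitely many terms. The name standard_metric and the sample vector are illustrative assumptions.

```python
def standard_metric(x, y, seminorms):
    """Truncated form of (9.4): sum over n of 2^-n * p_n(x - y) / (1 + p_n(x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for n, p in enumerate(seminorms, start=1):
        v = p(diff)
        total += 2.0 ** (-n) * v / (1.0 + v)
    return total

# Assumed toy seminorms p_n(x) = |x_n| on R^3 (hypothetical choice for illustration).
seminorms = [lambda v, k=k: abs(v[k]) for k in range(3)]

x = [1.0, 2.0, 3.0]
zero = [0.0, 0.0, 0.0]

d_x = standard_metric(x, zero, seminorms)                      # d(x, 0)
d_2x = standard_metric([2 * c for c in x], zero, seminorms)    # d(2x, 0)

print("d(x, 0)  =", d_x)
print("d(2x, 0) =", d_2x, "versus 2*d(x, 0) =", 2 * d_x)
```

In this sketch d(2x, 0) is strictly smaller than 2 d(x, 0), and every distance stays below 1, which is why d cannot come from a norm.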

There are metrizable TVS that are not locally convex and hence not Fréchet spaces. Here
is one such example:
9.1.7 Example. Let (X, F, µ) be a finite measure space and let L0 = L0 (X, F, µ) denote
the linear space of measurable functions f : X → K. Then

    d(f, g) = \int \frac{|f - g|}{1 + |f - g|}\, d\mu

defines a metric on L0 (where, as usual, we identify functions equal a.e.). Convergence in
this metric is simply convergence in measure (Ex. 3.22). The inequalities
    µ{|(fn + gn ) − (f + g)| ≥ 2ε} ≤ µ{|f − fn | ≥ ε} + µ{|g − gn | ≥ ε}    and
    µ{|cn fn − cf | ≥ 2ε} ≤ µ{|cn fn − cn f | ≥ ε} + µ{|cn f − cf | ≥ ε}
                          ≤ µ{|fn − f | ≥ (|cn | + 1)^{−1} ε} + µ{|f | ≥ |cn − c|^{−1} ε}
then imply that L0 is a TVS under the usual pointwise operations.
Now consider the measure space ([0, 1], B[0, 1], λ). If L0 were locally convex, then the
open ball B1/2 (0) would contain an open convex neighborhood of zero, which in turn would
contain an open ball Br (0), whose convex hull is then contained in B1/2 (0). For each n > 1/r,
let fj = n^2 1[(j−1)/n,j/n) , 1 ≤ j ≤ n. Then

    d(f_j, 0) = \int \frac{|f_j|}{1 + |f_j|}\, d\lambda = \frac{n^2}{1 + n^2} \cdot \frac{1}{n} = \frac{n}{1 + n^2} < \frac{1}{n} < r,

hence the convex combination f := (1/n) \sum_{j=1}^{n} f_j is in B1/2 (0). But f = n a.e. and so

    d(f, 0) = \int \frac{n}{1 + n}\, d\lambda = \frac{n}{n + 1} > \frac{1}{2}    (n > 1),

a contradiction. ♦
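
The computation in Example 9.1.7 can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions: step functions on [0, 1) that are constant on the n cells [(j−1)/n, j/n) are stored as lists of heights, the bumps are given height n^2 (an assumed choice that makes each bump small in the metric d while their average is large), and the names d_zero and bump are hypothetical.

```python
from fractions import Fraction

def d_zero(heights):
    """d(f, 0) = integral of |f| / (1 + |f|) for a step function with the
    given heights on n equal cells of [0, 1)."""
    n = len(heights)
    return sum(Fraction(abs(h)) / (1 + abs(h)) for h in heights) / n

def bump(n, j, height):
    """Step function equal to `height` on the j-th cell (1-based), 0 elsewhere."""
    return [height if k == j - 1 else 0 for k in range(n)]

n = 10
bumps = [bump(n, j, n * n) for j in range(1, n + 1)]

# Convex combination f = (1/n) * (f_1 + ... + f_n), computed cell by cell.
average = [Fraction(sum(column), n) for column in zip(*bumps)]

d_bumps = [d_zero(b) for b in bumps]   # distance of each bump from zero
d_average = d_zero(average)            # distance of the average from zero

print("largest d(f_j, 0):", max(d_bumps))
print("d(f, 0):", d_average)
```

Each bump lies in a small ball around zero, yet their average is farther than 1/2 from zero, so no ball B_r(0) has its convex hull inside B_{1/2}(0).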

Exercises
9.1 Let X be a TVS and A, B ⊆ X with A compact and B closed. Show that A + B is closed.

9.2 Let 𝒰 be a neighborhood base at zero in a TVS X and A, B ⊆ X. Prove:

(a) cl A = ⋂_{U∈𝒰} (U + A).
(b) A + int B ⊆ int(A + B).
(c) If A is compact, B is closed, and A ∩ B = ∅, then (A + U ) ∩ (B + U ) = ∅ for some neighborhood
U of zero.

9.3 Show that the closed convex hull of a subset A of a TVS X is the closure of co A and that the
closed, convex, balanced hull of A is the closure of cobal A.

9.4 Let X be a linear space. Show that if U is balanced and |a| ≤ |b|, then aU ⊆ bU .

9.5 Let U and V be open, convex, balanced neighborhoods of zero in a TVS X. Show that
pU ∩V = max{pU , pV }.

9.6 A subset E of a TVS is bounded if for each neighborhood V of zero there exists t > 0 such
that E ⊆ tV . Verify the following:
(a) If E1 , . . . , En are bounded, then E1 + · · · + En , ⋃_{j=1}^{n} Ej , cE1 , and cl E1 are bounded.
(b) Every compact set K is bounded.
(c) E is bounded iff xn ∈ E, tn ∈ K and tn → 0 ⇒ tn xn → 0.
(d) In a LCS, E is bounded iff p(E) is bounded for every continuous seminorm p.

9.7 (Kolmogorov). Let X be a TVS with a bounded, convex, balanced neighborhood U of zero.
Show that pU is a norm that gives the original topology of X. ⟦For positivity, let x ≠ 0 and let V be
a balanced neighborhood of zero that does not contain x. If U ⊆ tV , then pU (x) ≥ 1/t. For the
equality of topologies, consider suitable nets.⟧

9.8 Let X be a LCS generated by a family of seminorms P and let Y be a linear subspace of X. Prove
that the relative topology of Y is the locally convex topology τ induced by the seminorms p|_Y
(p ∈ P).

9.9 Let p and q be seminorms on a vector space X such that {x : p(x) < 1} = {x : q(x) < 1}.
Show that p = q.

9.10 Let X be a TVS and p a seminorm on X such that the set {x ∈ X : p(x) < 1} is open. Show
that p is continuous.

9.11 Let X be a vector space with locally convex topologies τ1 and τ2 . Show that τ1 ≤ τ2 iff every
τ1 -continuous seminorm is τ2 -continuous.

9.2 Continuous Linear Functionals


Continuity on Topological Vector Spaces
As in the case of a normed space, the space of all continuous linear functionals on a TVS
X is called the dual space of X and is denoted by X′. We continue to use both notations
⟨x, f⟩ and f (x) (x ∈ X and f ∈ X′). The next two propositions summarize the general
properties of linear functionals on X. The first asserts that the kernel of a nontrivial (that
is, not identically zero) linear functional f has co-dimension one, hence the equation f = t
(t ∈ R) defines a hyperplane.

9.2.1 Proposition. Let f be a nontrivial linear functional on a TVS X. Then f is an open
map, and for each x0 ∉ ker f

    X = K x0 ⊕ ker f.    (9.5)

Proof. For the first part, it suffices to prove that f (U ) is open for every open neighborhood
U of zero. By 9.1.2, we may take U to be balanced. Choose x such that f (x) = 1. By
continuity of scalar multiplication at zero, for sufficiently small ε > 0 we have εx ∈ U . Since
U is balanced, |c| < ε ⇒ cx ∈ U ⇒ c = f (cx) ∈ f (U ). Therefore, f (U ) contains the open
disk Bε (0) in K and so is open.
To verify (9.5), for x ∈ X write

    x = \frac{f(x)}{f(x_0)}\, x_0 + \Bigl( x - \frac{f(x)}{f(x_0)}\, x_0 \Bigr)

and observe that the second term is in ker f . Therefore, X = K x0 + ker f . The sum is direct
since if cx0 ∈ ker f , then 0 = f (cx0 ) = cf (x0 ), hence c = 0.
9.2.2 Proposition. Let f be a linear functional on a TVS X. The following statements
are equivalent:
(a) f is continuous.

(b) ker f is closed.

(c) f is bounded on some neighborhood of zero.

Proof. That (a) ⇒ (b) is clear. For (b) ⇒ (c) we may assume that f is not identically zero.
Let x ∉ ker f and choose a neighborhood U of 0 such that (U + x) ∩ ker f = ∅. By 9.1.2,
we may assume that U is balanced. We claim that f is bounded on U . If not, then for any
c ∈ K there exists u ∈ U such that |f (u)| > |c|. Setting a := c/f (u) we have |a| < 1 and
so c = f (au) ∈ f (U ). Thus f (U ) = K, and in particular f (u) = −f (x) for some u ∈ U .
But this contradicts (U + x) ∩ ker f = ∅. Therefore, f (U ) must be bounded.
To prove (c) ⇒ (a), let |f (u)| < r for all u in a neighborhood U of zero. If xα → 0 and
ε > 0, then eventually (r/ε)xα ∈ U and so |f (xα )| < ε. Therefore, f is continuous.
It is possible for a TVS not to have any nontrivial continuous linear functionals, as the
following example demonstrates.
9.2.3 Example. We show that the space L0 [0, 1] of Example 9.1.7 has no nontrivial
continuous linear functionals. Let F be such a functional and choose f ∈ L0 [0, 1] such that
F (f ) ≠ 0. Next, choose whichever of the functions f 1[0,1/2) or f 1[1/2,1] , call it f1 , has the
property F (f1 ) ≠ 0, and note that λ{f1 ≠ 0} ≤ 1/2. By induction, we obtain a sequence
(fn ) such that αn := F (fn ) ≠ 0 and λ{fn ≠ 0} ≤ 1/2^n . Set gn := αn^{−1} fn . Then

    d(g_n, 0) = \int \frac{|f_n|}{|\alpha_n| + |f_n|}\, d\lambda = \int_{\{f_n \ne 0\}} \frac{|f_n|}{|\alpha_n| + |f_n|}\, d\lambda \le \lambda\{f_n \ne 0\} \to 0,

hence gn → 0 yet F (gn ) = 1 for all n. ♦

We shall see in the next section that, unlike the TVS case, a LCS always has a rich supply
of continuous linear functionals.

Continuity on Locally Convex Spaces


Continuity in a LCS may be conveniently characterized in terms of seminorms.
9.2.4 Proposition. Let X be a LCS with generating family P of seminorms and let f be
a linear functional on X. The following statements are equivalent:
(a) f is continuous.
(b) There exist p1 , . . . , pn ∈ P and M > 0 such that |f (x)| ≤ M maxj pj (x) for all x.
(c) There exists a continuous seminorm q on X such that |f (x)| ≤ q(x) for all x.

Proof. (a) ⇒ (b): By continuity of f at zero, there exists a basic neighborhood U of zero
as in (9.1), determined by seminorms p1 , . . . , pn ∈ P and some ε > 0, such that |f (u)| < 1
for all u ∈ U . Set p(x) = maxj pj (x). For any x ∈ X and
δ > 0, εx/(p(x) + δ) ∈ U hence |f (x)| < ε^{−1} (p(x) + δ). Letting δ → 0 yields (b) with
M = 1/ε.
(b) ⇒ (c): Take q = M maxj pj .
(c) ⇒ (a): If xα → 0, then q(xα ) → 0, hence f (xα ) → 0.

Continuity on Finite Dimensional Spaces


The following theorem shows that Hausdorff vector topologies on finite dimensional spaces
are unique.
9.2.5 Theorem. Let X be a finite dimensional TVS with dimension d. Then X is topologi-
cally isomorphic to Kd .
Proof. Let v1 , . . . , vd be a basis for X. We show that the algebraic isomorphism
T (x1 , . . . , xd ) = x1 v1 + · · · + xd vd is a homeomorphism. By continuity of the vector
operations, T is continuous. We show that there exists a neighborhood U of zero in X such
that U ⊆ T (B1 ), where B1 is the open unit ball in Euclidean space Kd , which will prove
that T is an open map and hence is a topological isomorphism.
Since the unit sphere S1 in Kd is compact, T (S1 ) is compact in X. Let 𝒰 be the collection
of all closed, balanced neighborhoods of zero in X. Since X is Hausdorff, ⋂𝒰 = {0}, hence
⋂_{U∈𝒰} (U ∩ T (S1 )) = ∅. Since T (S1 ) is compact, the family of closed sets U ∩ T (S1 ) (U ∈ 𝒰)
cannot have the finite intersection property,
hence there exist U1 , . . . , Un ∈ 𝒰 such that U := U1 ∩ . . . ∩ Un does not meet T (S1 ). Therefore,
U ⊆ T (S1^c ). Since U is balanced, U ⊆ T (B1 ), as required.

Since every linear functional on Euclidean space is continuous, we have


9.2.6 Corollary. A linear functional on a finite dimensional TVS is continuous.

Exercises
9.12 Let X be a real TVS and f a linear functional on X such that {f ≤ t} is closed for some t.
Show that f is continuous.

9.13 Let X be a TVS and f a linear functional on X such that ker f is not dense in X. Show that
f is continuous.

9.14 Let X and Y be locally convex spaces. Show that a linear transformation T : X → Y is
continuous iff p ◦ T is continuous for every continuous seminorm p on Y.

9.15 Let X be a TVS, f ∈ X′ a nontrivial real linear functional, and t ∈ R. Let C = {x : f (x) ≤ t}
and U = {x : f (x) < t}. Show that cl U = C and int C = U .

9.3 Hahn-Banach Separation Theorems


The theorems in this section consider the problem of separating disjoint convex sets by
hyperplanes. The proofs rely on the Minkowski functional of a neighborhood of zero, defined
in §9.1.

Weak Separation in a TVS


9.3.1 Theorem. Let A and B be nonempty, disjoint, convex subsets of a TVS X. If A is
open, then there exist f ∈ X′ and t ∈ R such that

Re f (x) < t ≤ Re f (y) for all x ∈ A and y ∈ B. (9.6)

FIGURE 9.1: Separation by a hyperplane. (Labels: the hyperplane f = t, the half-spaces f < t and f > t, and the set A.)

Proof. Suppose first that K = R. Fix x0 ∈ A and y0 ∈ B and let z0 := y0 − x0 . The set
U := A − B + z0 is convex, contains zero, and is open, the last property because U is a
union of the open sets A − y + z0 (y ∈ B). Let p be the Minkowski functional of U . Since
A and B are disjoint, z0 ∉ U , hence p(z0 ) ≥ 1 by (9.3). Define g on the one-dimensional
space Y := Rz0 by g(cz0 ) = c. Then g ≤ p on Y, hence g extends to a linear functional f
on X with f ≤ p (8.5.1). Since p < 1 on U , −ε < f < ε on the open set −εU ∩ εU , hence f
is continuous at zero and therefore everywhere. If x ∈ A and y ∈ B, then x − y + z0 ∈ U ,
hence
f (x) − f (y) = f (x − y + z0 ) − 1 ≤ p(x − y + z0 ) − 1 < 0
and so f (x) < f (y). Since convex sets are connected, f (A) and f (B) are disjoint intervals
in R, hence f (A) lies to the left of f (B). Moreover, since A is open and f is nontrivial, f (A)
is open. Therefore, we may take t in (9.6) to be the right endpoint of f (A).
For the case K = C, apply the first part to X as a real linear space to obtain a real
linear functional fr that satisfies fr (x) < t ≤ fr (y) for all x ∈ A and y ∈ B. Then
f (x) := fr (x) − ifr (ix) defines a complex linear functional satisfying (9.6).
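
The steps of the real case can be traced numerically in the plane. The following Python sketch is only an illustration under assumptions that are not in the text: A is the open unit disc centered at the origin, B is the closed unit disc centered at (3, 0), the Minkowski functional is approximated by bisection, and the functional f is one admissible extension chosen by hand. The names gauge, in_U, and sample_disc are hypothetical.

```python
import math
import random

def gauge(in_U, x, iters=60):
    """Minkowski functional p(x) = inf{t > 0 : x/t in U}, found by bisection.
    Assumes U is open, convex, and contains zero, so that membership of x/t
    in U is monotone in t."""
    hi = 1.0
    while not in_U([c / hi for c in x]):   # enlarge hi until x/hi lies in U
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if in_U([c / mid for c in x]):
            hi = mid
        else:
            lo = mid
    return hi

# Assumed data: x0 is the center of A, y0 the center of B, and z0 = y0 - x0.
x0, y0 = (0.0, 0.0), (3.0, 0.0)
z0 = (y0[0] - x0[0], y0[1] - x0[1])

def in_U(v):
    # For these discs, U = A - B + z0 is the open disc of radius 2 about zero.
    return math.hypot(v[0], v[1]) < 2.0

def f(v):
    # A hand-picked linear functional with f(z0) = 1 and f <= p.
    return v[0] / 3.0

random.seed(0)

def sample_disc(center, radius):
    r = radius * random.random()
    theta = 2.0 * math.pi * random.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

points_A = [sample_disc(x0, 0.99) for _ in range(200)]   # points inside A
points_B = [sample_disc(y0, 1.0) for _ in range(200)]    # points inside B
t = 1.0 / 3.0   # right endpoint of the interval f(A) = (-1/3, 1/3)

print("p(z0) is approximately", gauge(in_U, z0))
```

On the sampled points one finds f ≤ p and f(x) < t ≤ f(y), matching (9.6) in this special case.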

Strict Separation in a LCS


In Theorem 9.3.1 it is possible that the hyperplane f = t intersects B. The next theorem
asserts that under suitable conditions one actually has strict separation.
9.3.2 Theorem. Let A and B be nonempty disjoint closed convex subsets of a LCS X. If
A or B is compact, then there exists f ∈ X′ such that

sup{Re f (x) : x ∈ A} < inf{Re f (y) : y ∈ B}. (9.7)



Proof. Suppose A is compact. Let U0 be a neighborhood base at zero of open convex sets.
We claim that there exists U ∈ U0 such that (U + A) ∩ B = ∅. Assuming this and noting
that C := U + A is open and convex, we may choose by 9.3.1 f ∈ X′ and t ∈ R such that

Re f (x) < t ≤ Re f (y) for all x ∈ C and y ∈ B.

Since A is a compact subset of C,

sup{Re f (x) : x ∈ A} < t ≤ Re f (y) for all y ∈ B,

proving (9.7).
To verify the claim, for each x ∈ A ⊆ B c choose Vx ∈ U0 such that x + Vx ⊆ B c .
Next, choose Ux ∈ U0 so that Ux + Ux ⊆ Vx . This is possible by continuity of addition at
(0, 0). Then the sets x + Ux + Ux and B are disjoint. Moreover, by compactness, there exist
x1 , . . . , xn ∈ A such that A ⊆ ⋃_{j=1}^{n} (xj + U_{xj} ). Setting U = ⋂_{j=1}^{n} U_{xj} , we have

    A + U \subseteq \bigcup_{j=1}^{n} (x_j + U_{x_j} + U) \subseteq \bigcup_{j=1}^{n} (x_j + U_{x_j} + U_{x_j}) \subseteq B^c,

verifying the claim and completing the proof for case A compact.
If B is compact, then reversing the roles of A and B yields

sup{Re f (x) : x ∈ B} < inf{Re f (y) : y ∈ A}.

Equation (9.7) then holds with f replaced by −f .


Here is an important variant of the preceding theorem.
9.3.3 Theorem. Let A and B be nonempty, disjoint, closed, convex subsets of a LCS X.
If A is balanced and either A or B is compact, then there exists f ∈ X′ such that

sup{|f (x)| : x ∈ A} < inf{|f (y)| : y ∈ B}. (9.8)

Proof. Let f be as in 9.3.2 and choose t with

    sup{Re f (x) : x ∈ A} < t < inf{Re f (y) : y ∈ B}.

For x ∈ A, write |f (x)| = e^{iθ} f (x) = f (e^{iθ} x) = Re f (e^{iθ} x). Since A is balanced, e^{iθ} x ∈ A, so we have
sup{|f (x)| : x ∈ A} < t < Re f (y) ≤ |f (y)| for all y ∈ B, verifying (9.8).

Some Consequences of the Separation Theorems


The following are generalizations of results proved earlier for normed spaces.
9.3.4 Theorem. Let Y be a closed subspace of a LCS X and let x0 ∉ Y. Then there exists
an f ∈ X′ such that f (x0 ) = 1 and f (Y) = {0}. In particular, if x1 ≠ x2 then there exists
an f ∈ X′ such that f (x1 ) ≠ f (x2 ), that is, X′ separates points of X.
Proof. Take A = Y and B = {x0 } in 9.3.2 to obtain g ∈ X′ with

    sup{Re g(y) : y ∈ Y} < Re g(x0 ).

But because Y is a linear space, Re g(Y) cannot be bounded above unless Re g(Y) = {0},
which then implies that g(x0 ) ≠ 0. Since Im g(y) = −Re g(iy), we have g(Y) = {0}. Now
take f = g/g(x0 ).

9.3.5 Corollary. Let Y be a subspace of a LCS X and let g be a continuous linear functional on
Y. Then there exists an f ∈ X′ such that f = g on Y.
Proof. We may assume that g is not identically zero. Choose y1 ∈ Y such that g(y1 ) = 1.
Since g is continuous, Y0 := ker g is closed in Y, hence y1 is not in the closure of Y0 in X.
By 9.3.4 applied to the closure of Y0 in X, there exists f ∈ X′ such that f (y1 ) = 1
and f (Y0 ) = {0}. If y ∈ Y, then y − g(y)y1 ∈ Y0 and so

    f (y) − g(y) = f (y) − g(y)f (y1 ) = f (y − g(y)y1 ) = 0.

Therefore, f extends g.
9.3.6 Corollary. A finite dimensional subspace Y of a LCS X is closed.
Proof. Let y1 , . . . , yd be a basis for Y. Then y = \sum_{j=1}^{d} gj (y)yj (y ∈ Y), where gj is a
linear functional on Y. By 9.2.6, gj is continuous and so has a continuous extension fj ∈ X′.
Therefore, if yα ∈ Y and yα → x ∈ X we have

    x = \lim_{\alpha} \sum_{j} f_j(y_\alpha)\, y_j = \sum_{j} f_j(x)\, y_j \in Y.

9.3.7 Corollary. A LCS X is finite dimensional iff it has a compact neighborhood of zero.
Proof. If X is finite dimensional, then X is topologically isomorphic to Kd (9.2.5), proving
the necessity.
For the sufficiency, let V be a neighborhood of zero in X with compact closure. Then
there exists a finite subset F of X such that

    \operatorname{cl} V \subseteq \bigcup_{x \in F} \bigl( x + \tfrac{1}{2} V \bigr) = F + \tfrac{1}{2} V.

Let Y be the finite dimensional subspace of X spanned by F . We claim that

    V \subseteq Y + \frac{1}{2^n} V    for all n.    (†)

This is clear for n = 1. If the assertion holds for n, then

    \tfrac{1}{2} V \subseteq \tfrac{1}{2} Y + \frac{1}{2^{n+1}} V = Y + \frac{1}{2^{n+1}} V,

hence

    V \subseteq Y + \tfrac{1}{2} V \subseteq Y + Y + \frac{1}{2^{n+1}} V = Y + \frac{1}{2^{n+1}} V,

verifying the claim.


Now let p be any continuous seminorm on X. Since cl V is compact, s := sup p(V ) < ∞.
Let v ∈ V . Using (†), we may write v = yn + 2−n wn for some yn ∈ Y and wn ∈ V . Then
p(v − yn ) = 2−n p(wn ) ≤ 2−n s, so for sufficiently large n, p(v − yn ) < 1. Since p was
arbitrary, v ∈ cl Y. Since Y is finite dimensional, cl Y = Y (9.3.6). We have shown that
V ⊆ Y. Now, for any x ∈ X, eventually x/n ∈ V and so x ∈ Y. Therefore, X = Y, hence
X is finite dimensional.
9.3.8 Corollary. Let X be a LCS that is the algebraic direct sum of subspaces Y and Z
with Y finite dimensional. Then the direct sum is topological, that is, the linear isomorphism
(y, z) ↦ y + z is a homeomorphism in the product topology.
Proof. Let {y1 , . . . , yn } be a basis for Y. By 9.3.4 there exist f1 , . . . , fn ∈ X′ such that
fi (yj ) = δij . Define P x = \sum_{j=1}^{n} ⟨x, fj ⟩ yj . Then P is a continuous projection and the
continuity of y + z → (y, z) follows as in 8.4.3 (using nets).

The Bipolar Theorem


Let X = Xτ be a LCS. The polars of A ⊆ X and B ⊆ X′ are defined by

    A⁰ = {f ∈ X′ : |f (x)| ≤ 1 ∀ x ∈ A}    and    ⁰B = {x ∈ X : |f (x)| ≤ 1 ∀ f ∈ B}.

It is easy to check that each polar is convex and balanced.


Polars are related to annihilators, introduced in §8.9. Indeed, if A and B are linear
subspaces, then the two constructions coincide (Ex. 9.17). Here is the polar analog of 8.9.3.
9.3.9 Theorem. If A ⊆ X, then ⁰A⁰ is the τ -closed, convex, balanced hull of A.

Proof. Let C denote the closed convex balanced hull of A. Since A ⊆ ⁰A⁰ and ⁰A⁰ is
closed, convex, and balanced, C ⊆ ⁰A⁰. For the reverse inclusion, let y ∈ C^c and choose
f ∈ X′ and t ∈ R so that sup{|f (x)| : x ∈ C} < t < |f (y)| (9.3.3). Set g := f /t. Then
sup{|g(x)| : x ∈ A} < 1 < |g(y)|, hence g ∈ A⁰ and y ∉ ⁰A⁰.
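
A small finite dimensional instance makes the bipolar theorem concrete. The Python sketch below rests on an assumed example that is not in the text: A = {(1, 0), (0, 1)} in R^2, with functionals identified with pairs f = (f1, f2). For this A the polar is the square [−1, 1]^2 and the closed, convex, balanced hull is the unit ball of the ℓ1 norm. The function names are hypothetical.

```python
from itertools import product

# Assumed example set A in R^2; a functional f = (f1, f2) acts by f(x) = f1*x1 + f2*x2.
A = [(1.0, 0.0), (0.0, 1.0)]

def pairing(f, x):
    return f[0] * x[0] + f[1] * x[1]

def in_polar(f):
    """f lies in the polar of A iff |f(x)| <= 1 for every x in A."""
    return all(abs(pairing(f, x)) <= 1.0 for x in A)

# For this A the polar is the square [-1, 1]^2. A linear functional attains its
# maximum over the square at one of the four corners, so the corners suffice.
corners = list(product((-1.0, 1.0), repeat=2))

def in_bipolar(x):
    """x lies in the bipolar iff |f(x)| <= 1 for every f in the polar."""
    return all(abs(pairing(f, x)) <= 1.0 for f in corners)

def in_hull(x):
    """Closed, convex, balanced hull of A: the set |x1| + |x2| <= 1."""
    return abs(x[0]) + abs(x[1]) <= 1.0

# Compare the two membership tests on a grid of points with exact binary coordinates.
grid = [(i / 4.0, j / 4.0) for i in range(-6, 7) for j in range(-6, 7)]
agree = all(in_bipolar(p) == in_hull(p) for p in grid)
print("bipolar and hull agree on the grid:", agree)
```

The two membership tests agree on the grid, as the theorem predicts for this choice of A.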

Exercises
9.16 A half-space in Rd is a set of the form {x ∈ Rd : a1 x1 + · · · + ad xd ≤ a}. Show that a closed,
convex subset C of Rd is the intersection of all half-spaces that contain it.

9.17 Let X be a LCS, A ⊆ X, and B ⊆ X′. The annihilators of A and B are defined as for normed
spaces by

    A⊥ = {f ∈ X′ : ⟨x, f⟩ = 0 ∀ x ∈ A}    and    ⊥B = {x ∈ X : ⟨x, f⟩ = 0 ∀ f ∈ B}.

Prove that if Y is a subspace of X and Z is a subspace of X′, then Y⊥ = Y⁰ and ⊥Z = ⁰Z.

9.18 Show that Y is dense in X iff Y ⊥ = {0}.

9.19 Show that if A and B are open in 9.3.1, then there exists t ∈ R such that Re f (x) < t < Re f (y)
for all x ∈ A and y ∈ B.

9.20 Let A, B, and Ai (i ∈ I) be subsets of a LCS X. Prove that


(a) A ⊆ B ⇒ B⁰ ⊆ A⁰.    (b) (⋃_{i∈I} Ai )⁰ = ⋂_{i∈I} Ai⁰.    (c) (cA)⁰ = c^{−1} A⁰, c ≠ 0 ∈ K.

*9.4 Some Constructions


Product Spaces
Let {Xi } be a family of TVS and let X denote the product vector space ∏_{i∈I} Xi . If
πi : X → Xi denotes the projection map, then a net (fα ) converges to f in the product
topology iff πi (fα ) → πi (f ) for each i. Since the projections are linear, X is easily seen to
be a TVS in the product topology.
Now assume that each Xi is a locally convex space. Consider seminorms on X of the
form pF (f ) = max{pi (f (i)) : i ∈ F }, where pi is a continuous seminorm on Xi and F ⊆ I
is finite. By the first paragraph, the family (pF ) generates a locally convex topology which
is the product topology.

Quotient Spaces
The following results generalize theorems in §8.4 on quotients of normed spaces.
9.4.1 Theorem. Let X be a TVS, Y a closed subspace of X, and X/Y the algebraic
quotient space with quotient map Q : X → X/Y. Then X/Y is a TVS in the quotient
topology and Q is an open map. Moreover, if X is locally convex (Fréchet), then X/Y is
locally convex (Fréchet).
Proof. Recall that the quotient topology on X/Y is the strongest topology relative to which
Q is continuous; equivalently, W is open in X/Y iff Q^{−1}(W ) is open in X. Now, if U is
open in X, then

    Q^{-1}(Q(U)) = U + Y = \bigcup_{y \in Y} (U + y),

which is open in X. Therefore, Q(U ) is open in X/Y, hence Q is an open map.


To see that the quotient topology is Hausdorff, suppose that Q(x1 ) ≠ Q(x2 ), so that
x1 − x2 is in the open set Y^c . Choose a neighborhood U of zero in X such that x1 − x2 + U ⊆ Y^c .
By continuity of the vector difference operation, there exists a neighborhood V of zero such
that V − V ⊆ U . Then Q(x1 + V ) and Q(x2 + V ) are disjoint neighborhoods of Q(x1 ) and
Q(x2 ), respectively.
To see that the quotient topology is a vector topology, consider nets Q(xα ) → Q(x) and
Q(yα ) → Q(y). A typical neighborhood of Q(x) + Q(y) = Q(x + y) in X/Y is of the form
Q(x + y) + Q(U ), where U is a neighborhood of zero in X. By continuity of addition, there
exists a neighborhood V of zero in X such that V + V ⊆ U . Then, eventually,
    Q(xα + yα ) = Q(xα ) + Q(yα ) ∈ Q(x) + Q(V ) + Q(y) + Q(V ) ⊆ Q(x + y) + Q(U ),
which shows that Q(xα + yα ) → Q(x + y). Therefore, vector addition in X/Y is continuous.
A similar argument shows that scalar multiplication is continuous.
For the last assertion, note that if U is a basis of open, convex neighborhoods of zero in
X, then Q(U) is a basis of open, convex neighborhoods of zero in X/Y.
9.4.2 Corollary. Let X and Z be TVS and T : X → Z linear, continuous, surjective, and
open. Then X/ ker T is topologically isomorphic to Z.
Proof. Let Q denote the quotient map. Then S : Q(x) ↦ T (x) from X/ ker T to Z is well
defined, linear, bijective, and SQ = T . By definition of the final topology, S is continuous.
Since Q and T are open maps, S is open. Therefore, S is a topological isomorphism.
9.4.3 Corollary. Let X be a LCS and fj ∈ X′, j = 1, . . . , n. Let Y = ⋂_{j=1}^{n} ker fj . Then
X is topologically isomorphic to a direct product Y × Kd for some d ≤ n.
Proof. Let T (x) = (f1 (x), . . . , fn (x)). Then ker T = Y and T maps X onto a d-dimensional
subspace of Kn , which we may identify with Kd . Choose x0 ∈ X such that T (x0 ) =
(1, 1, . . . , 1) and let U be any convex balanced neighborhood of zero. Then (−ε, ε)x0 ⊆ U
for sufficiently small ε > 0, hence T (U ) contains the d-dimensional open neighborhood of
zero T ((−ε, ε)x0 ) = (−ε, ε) × · · · × (−ε, ε). Therefore, T : X → Kd is open. By 9.4.2, X/Y
is topologically isomorphic to Kd under a mapping S with SQ = T . In particular, there exist
x1 , . . . , xd ∈ X such that SQ(xj ) = T xj = (0, . . . , 0, 1, 0 . . . , 0), 1 in the jth position. Let
Pj denote the jth projection map on Kd and define

    Rx = \Bigl( x - \sum_{j=1}^{d} P_j(T(x))\, x_j,\; Tx \Bigr).

Then R is a linear isomorphism of X onto Y × Kd . Moreover, it is easy to see that xα → 0
iff Rxα → 0. Therefore, R is the desired topological isomorphism.

Strict Inductive Limits


Let X be a vector space and (Xn ) a sequence of subspaces with union X such that
Xn ⊊ Xn+1 for all n. Let Xn have a locally convex topology τ n such that the relative
topology on Xn induced by τ n+1 is τ n . The sequence (Xn , τ n ) is called a strict inductive
system for X. Such a system gives rise to a locally convex topology τ on X which will
have important applications in Chapter 15 on distributions. For the construction of τ we
need the following lemmas.
9.4.4 Lemma. Let Y be a LCS and Z a linear subspace of Y. If U is a convex, balanced,
open neighborhood of zero in Z in the relative topology induced by Y, then there exists a convex,
balanced, open neighborhood V of zero in Y such that V ∩ Z = U . Moreover, if y ∉ cl Z,
then V may be chosen to exclude y.
Proof. Choose a convex, balanced, open neighborhood W of zero in Y such that W ∩Z ⊆ U ,
and let V be the convex hull of U ∪ W . Since U and W are balanced, it is readily verified
that V is balanced. Moreover, V ∩ Z ⊇ U . To show equality, note that, by the convexity
of U and W , a member z of V ∩ Z may be written as z = tu + (1 − t)w, where u ∈ U ,
w ∈ W , and 0 ≤ t ≤ 1. If t = 1 then z = u ∈ U . If t < 1 then w = (1 − t)^{−1} (z − tu) ∈ Z,
hence w ∈ U and so, again, z ∈ U . Therefore, V ∩ Z = U .
To see that V is open, note first that for each t ∈ [0, 1), the set tU + (1 − t)W is open in
Y, as it is the sum of two sets, one of which is open in Y. By the observation in the first
paragraph, V = ⋃_{t∈[0,1]} [tU + (1 − t)W ]. We claim that, in fact, V = ⋃_{t∈[0,1)} [tU + (1 − t)W ].
For the verification, we show that each u ∈ U is in the set tU + (1 − t)W for some 0 < t < 1.
To this end choose r > 0 so that ru ∈ W and write

    u = t {t^{−1} [1 − r + tr]u} + (1 − t)ru,    0 < t < 1.

The braced expression is in Z and tends to u as t → 1. Since U is open in Z, the braced
expression is in U for some t < 1. Since ru ∈ W , we see that u ∈ tU + (1 − t)W , verifying
the claim and proving that V is open.
For the last assertion, choose W so that (y + W ) ∩ Z = ∅. If y = tu + (1 − t)w, where
u ∈ U , w ∈ W , and 0 ≤ t ≤ 1, then the vector y − (1 − t)w = tu is simultaneously in
y + W and U , impossible since U ⊆ Z.
9.4.5 Lemma. Every continuous seminorm q on Z extends to a continuous seminorm p
on Y. Moreover, if y ∉ cl Z, then p may be chosen so that p(y) ≥ 1.
Proof. Take U := {z ∈ Z : q(z) < 1} in 9.4.4 and set p = pV , the Minkowski functional of
the set V of 9.4.4. Since V is open, p is continuous (9.1.4). Since V ∩ Z = U ,

{z ∈ Z : p(z) < 1} = {z ∈ Z : q(z) < 1},



hence p|_Z = q (Ex. 9.9). If y ∉ cl Z and V is chosen so that y ∉ V , then p(y) ≥ 1.
Here is the main result of the subsection:

9.4.6 Theorem. Let (Xn , τ n ) be a strict inductive system for X.


(a) There exists a strongest locally convex topology τ on X such that the relative topology
on Xn induced by τ is τ n .
(b) A seminorm p on X is τ -continuous iff its restriction to Xn is τ n -continuous for each n.
(c) τ is Hausdorff iff each τ n is Hausdorff.

Proof. Let P denote the family of all seminorms p on X with the property that, for each n, the restriction
p|_{Xn} is a continuous seminorm on Xn . The identically zero seminorm obviously has this
property, hence P is nonempty. Let τ denote the locally convex topology on X generated by
P and let τ′n denote the relative topology on Xn induced by τ . By Ex. 9.8, τ′n is generated
by the collection P|_{Xn} . Since, by definition, the seminorms in P|_{Xn} are τ n -continuous,
τ′n ≤ τ n (Ex. 9.11). To show that τ n ≤ τ′n , it suffices to show that every τ n -continuous
seminorm pn may be extended to a τ -continuous seminorm p on X. Indeed, it will then
follow that pn is continuous in the relative topology, implying the inequality. To construct
the extension, we use 9.4.5. By induction, for each m ≥ n there exists a τ m+1 -continuous
seminorm pm+1 on Xm+1 such that pm+1 |_{Xm} = pm . Define p on X = ⋃_{m≥n} Xm so that
p = pm on each Xm . Then p is a well-defined seminorm on X and by construction p ∈ P.
Now let σ be a locally convex topology with property that the relative topology on Xn
induced by σ is τ n . If q is a σ -continuous seminorm on X, then U := {x ∈ X : q(x) < 1}
is σ -open, hence U ∩ Xn is τ n -open, which implies that q|_{Xn} is τ n -continuous (Ex. 9.10).
Thus q ∈ P, hence σ ≤ τ .
It remains to verify (c). Assume that Xn is Hausdorff for all n. Let x ∈ X and x ≠ 0.
Then x ∈ Xn for some n, hence there exists a continuous seminorm pn on Xn with pn (x) ≠ 0.
By the preceding, pn extends to a τ -continuous seminorm p on X. Since p(x) ≠ 0, τ is
Hausdorff. The converse is similar.
The space X with the topology τ is called the inductive limit of the system (Xn , τ n ).
9.4.7 Corollary. Let each Xn be a Fréchet space. Then the inductive limit topology τ has
the following properties:
(a) A sequence (xn ) τ -converges to x in X iff there exists a k such that (xn ) ⊆ Xk and
xn → x in the topology τ k .
(b) If T is a linear mapping from X to a LCS Y, then T is τ -continuous iff for each k the
restriction of T to Xk is τ k -continuous. In particular, τ -continuity and τ -sequential
continuity of linear maps on X are equivalent.
Proof. (a) The sufficiency is clear. For the necessity, we may take x = 0. Suppose, for a
contradiction, that the necessity is false. Thus for each k, xn ∉ Xk for infinitely many n. Set
Y1 = X1 and choose xn1 ∉ Y1 . Next, choose j > 1 such that xn1 ∈ Xj and set Y2 = Xj .
Continuing in this manner, we obtain a subsequence (yk := xnk ) of (xn ) and a subsequence
(Yk ) of (Xn ) such that Yk ↑ X and yk ∈ Yk+1 \ Yk . It is easy to see that the inductive limit
of (Yk ) is the same as that of (Xn ) (Ex. 9.21). Now let p1 be a continuous seminorm on Y1
such that p1 (y1 ) = 1. By the construction in the proof of 9.4.6, there exists a continuous
seminorm p on X that extends p1 such that p|Yk is a continuous seminorm on Yk for each
k. Incorporating the second assertion of 9.4.5 into this construction shows that p may be
chosen so that p(yk ) ≥ 1 for all k. Then (yk ) cannot converge to zero in X.
(b) The necessity is clear. For the sufficiency, let q be any continuous seminorm on Y.
Then p := q ◦ T is a seminorm on X. Since T |Xn is continuous, p|Xn is continuous, so p is
continuous on X. Therefore, T is continuous (Ex. 9.14).

Exercises
9.21 Let (Xn , τ n ) be a strict inductive system for X and let (nk ) be a strictly increasing sequence
of positive integers. Set Yk = Xnk and σk = τ nk . Show that the inductive limit of (Yk , σk ) is
the same as that of (Xn , τ n ).
9.22 Show that X is not a Fréchet space. ⟦Assume the contrary. Choose xn ∈ Xn+1 \ Xn and εn > 0
so that d(εn xn , 0) < 1/n and apply (a) of 9.4.7.⟧
Chapter 10
Weak Topologies on Normed Spaces

In this chapter we consider two important locally convex topologies: the weak topology on a
normed space X and the weak∗ topology on its dual X 0 . The chapter relies on some of the
material developed in Sections 9.1–9.3.

10.1 The Weak Topology


Definition and General Properties
The weak topology on a LCS X = Xτ is the initial topology induced by the family
of functions X′ (see §0.6). We denote this topology by w and the space X with the weak
topology by Xw . Net convergence in Xw is described by

    x_\alpha \xrightarrow{w} x    iff    f(x_\alpha) \to f(x) for every f ∈ X′.

In particular, if xα → x and yα → y weakly in X and cα → c in K, then, by applying continuous
linear functionals, we have

    x_\alpha + y_\alpha \xrightarrow{w} x + y    and    c_\alpha x_\alpha \xrightarrow{w} cx.

It follows that w is a vector topology. By 0.6.4, a neighborhood base at zero is given by the
open, convex, balanced sets

    U (f1 , . . . , fk ; ε) := {y : |fj (y)| < ε, j = 1, . . . , k},    fj ∈ X′, ε > 0.    (10.1)

Thus Xw is a LCS with generating seminorms pf (x) = |f (x)|. (The separating property is
a consequence of 9.3.4.)
By definition of initial topologies, every member of X′_τ is w-continuous, and since w ≤ τ ,
every member of X′_w is τ -continuous. Thus

    X′_w = X′_τ    and    (Xw )w = Xw .

For the remainder of the chapter, we shall be mainly concerned with the weak topology on
normed spaces rather than on general LCS. (We return to the general case in later chapters.)
For ease and uniformity of notation, we frequently denote the norm topology on X by s (for
strong topology). The following result shows that for infinite dimensional normed spaces it is
always the case that w < s.
10.1.1 Proposition. If X is a normed space, then w = s iff X is finite dimensional.
Proof. Assume w = s. Then U := {x : ‖x‖ < 1} is w-open and hence contains a neighborhood
of 0 of the form U0 := U (f1 , . . . , fn ; ε), as in (10.1). We then have ⋂_{j=1}^{n} ker fj ⊆ U0 ⊆ U ,
and since U is norm bounded, ⋂_{j=1}^{n} ker fj = {0}. The linear map x ↦ (f1 (x), . . . , fn (x))
from X to Kn is therefore 1-1 and so X is finite dimensional.
Conversely, assume that X is finite dimensional. We may then identify X with Euclidean
space Kd for some d. Since the open ball Bn with center 0 and radius n has compact closure,
the weak and norm topologies agree on Bn (0.8.5). Thus if U is norm open, then U ∩ Bn is
open in the weak topology for every n and so U = ⋃_n (U ∩ Bn ) is weakly open.

Weak Sequential Convergence


While sequences generally do not have the utility of nets, they have the advantage of
being easier to work with. The following theorem gives necessary and sufficient conditions
for a sequence to converge weakly. It is an immediate consequence of the Banach-Steinhaus theorem applied to the functionals $\hat{x}_n$ on the Banach space $X'$.
10.1.2 Theorem. Let $X$ be a normed space and $(x_n)$ a sequence in $X$. Then $x_n \xrightarrow{w} x \in X$ iff the following conditions hold:
(a) $\sup_n \|x_n\| < \infty$;

(b) $\langle x_n, x'\rangle \to \langle x, x'\rangle$ for all $x'$ in some subset $D$ of $X'$ with $\operatorname{cl}\operatorname{span} D = X'$.

If these hold, then $\|x\| \le \liminf_n \|x_n\|$.
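The text does not spell out the concluding norm estimate of 10.1.2 (read with a lim inf). One way to see it, offered here only as a sketch, is to use a norming functional of the kind invoked later in the proof of 10.4.5; the choice of $f$ below is an assumption of this sketch, not a step taken from the text.

```latex
% Sketch: assume x \neq 0 and pick f \in X' with \|f\| = 1 and f(x) = \|x\|
% (such an f exists by the Hahn-Banach theorem).
\[
  \|x\| \;=\; |f(x)| \;=\; \lim_{n} |f(x_n)|
        \;\le\; \liminf_{n} \|f\|\,\|x_n\|
        \;=\; \liminf_{n} \|x_n\| .
\]
```

The middle equality is weak convergence applied to the single functional $f$; the case $x = 0$ is trivial.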


10.1.3 Corollary. Let $(X, \mathcal{F}, \mu)$ be a σ-finite measure space. Then a sequence $(f_n)$ in $L^1$ converges weakly to some $f$ in $L^1$ iff the following conditions hold:
(a) $\sup_n \|f_n\|_1 < \infty$ and
(b) $\lim_n \int_E f_n \, d\mu$ exists for all $E \in \mathcal{F}$.
w
In particular, xn → x in `1 iff supn kxn k1 < ∞ and xn (j) → x(j) for each j.
Proof. The necessity is clear. For the sufficiency, define complex measures
$$\nu_n(E) = \int_E f_n(x)\, d\mu(x), \quad E \in \mathcal{F}.$$
By the Vitali-Hahn-Saks theorem (5.2.4), $\nu(E) := \lim_n \nu_n(E)$ defines a complex measure on $\mathcal{F}$. Moreover, $\nu \ll \mu$, hence $d\nu = f\, d\mu$ for some $f \in L^1(\mu)$. Thus
$$\lim_n \int f_n 1_E \, d\mu = \nu(E) = \int f 1_E \, d\mu, \quad E \in \mathcal{F}.$$
Taking $D$ in 10.1.2 to be the collection of measurable indicator functions, we see that $(f_n)$ converges weakly to $f$ in $L^1$.
Proposition 10.1.1, together with Corollary 0.5.6, implies that in every infinite dimensional normed space there are nets that converge weakly but not strongly. The same assertion cannot be made for sequences:
10.1.4 Theorem (Schur). A weakly convergent sequence in $\ell^1(\mathbb{N})$ converges in norm. Thus the notions of weak and norm sequential convergence in $\ell^1(\mathbb{N})$ coincide.
Proof. Suppose the assertion is false. Then there exist a sequence $(x_n)$ in $\ell^1$ and $\varepsilon > 0$ such that $x_n \xrightarrow{w} 0$ and $\|x_n\| \ge 5\varepsilon$ for all $n$. We construct a subsequence $(x_{n_k})$ and a member $y$ of the dual space $\ell^\infty$ such that $|\langle x_{n_k}, y\rangle| \ge \varepsilon$ for all $k$, producing the desired contradiction.
Since $x_n \xrightarrow{w} 0$, $x_n(j) \to 0$ for all $j$. Set $m_0 = n_0 = 1$. Let $n_1$ be an integer $> n_0$ such that $\sum_{j=1}^{m_0} |x_{n_1}(j)| = |x_{n_1}(m_0)| < \varepsilon$, and let $m_1$ be an integer $> m_0$ such that $\sum_{j=m_1+1}^{\infty} |x_{n_1}(j)| < \varepsilon$. Next, let $n_2$ be an integer $> n_1$ such that $\sum_{j=1}^{m_1} |x_{n_2}(j)| < \varepsilon$, and let $m_2$ be an integer $> m_1$ such that $\sum_{j=m_2+1}^{\infty} |x_{n_2}(j)| < \varepsilon$. In this way we construct strictly increasing sequences $(m_k)$ and $(n_k)$ such that
$$\sum_{j=1}^{m_{k-1}} |x_{n_k}(j)| < \varepsilon \quad\text{and}\quad \sum_{j=m_k+1}^{\infty} |x_{n_k}(j)| < \varepsilon \quad \forall\, k. \tag{$\dagger$}$$

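Now define $y \in \ell^\infty$ by $y(j) = \operatorname{sgn} x_{n_k}(j)$ ($m_{k-1} < j \le m_k$, $k \in \mathbb{N}$). Fix $k$ and set
$$\alpha_j := x_{n_k}(j)\,y(j) - |x_{n_k}(j)|.$$
Then $\alpha_j = 0$ for $m_{k-1} < j \le m_k$, hence
$$\sum_{j=1}^{\infty} \alpha_j = \sum_{j=1}^{m_{k-1}} \alpha_j + \sum_{j=m_k+1}^{\infty} \alpha_j. \tag{$\ddagger$}$$
Since $|y(j)| \le 1$, we have $|\alpha_j| \le 2|x_{n_k}(j)|$ and so from (†) and (‡), $\bigl|\sum_{j=1}^{\infty} \alpha_j\bigr| \le 4\varepsilon$. Therefore, by definition of $\alpha_j$,
$$|\langle x_{n_k}, y\rangle| = \Bigl|\sum_{j=1}^{\infty} x_{n_k}(j)\,y(j)\Bigr| \ge \sum_{j=1}^{\infty} |x_{n_k}(j)| - \Bigl|\sum_{j=1}^{\infty} \alpha_j\Bigr| \ge 5\varepsilon - 4\varepsilon = \varepsilon,$$
as required.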
Combining the last theorem with 10.1.3, we obtain
10.1.5 Corollary. A bounded sequence (xn ) in `1 (N) converges in norm to x iff xn (j) →
x(j) for each j.
Note that Theorem 10.1.4 does not hold in `p for 1 < p < ∞ (see Ex. 10.1).

Convexity and Closure in the Weak Topology


Since w ≤ s, every weakly closed subset of a normed space is norm closed. On the other
hand, 10.1.1 shows that in every infinite dimensional normed space there are norm closed
sets that are not weakly closed. Thus the notions of strong and weak closures in normed
spaces are generally distinct. However, for convex sets the two closures coincide. We prove
this in the general setting of a LCS:
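10.1.6 Theorem. Let $X = X_\tau$ be a LCS and $C$ a convex subset of $X$. Then $\operatorname{cl}_w C = \operatorname{cl}_\tau C$. Thus every τ-closed convex set is weakly closed.
Proof. Obviously, $\operatorname{cl}_\tau C \subseteq \operatorname{cl}_w C$. Now let $x_0 \in (\operatorname{cl}_\tau C)^c$. By 9.3.2, there exist $f \in X'$ and $t \in \mathbb{R}$ such that $\operatorname{Re} f(x_0) < t < \inf\{\operatorname{Re} f(y) : y \in \operatorname{cl}_\tau C\}$. The weak neighborhood $\{x : \operatorname{Re} f(x) < t\}$ of $x_0$ is therefore disjoint from $C$ and so $x_0 \in (\operatorname{cl}_w C)^c$.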
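To see why convexity matters in 10.1.6, the following worked example (not from the text; it uses the standard unit vectors $e_n$ of $\ell^2$ and the identification of the dual of $\ell^2$ with $\ell^2$) exhibits a norm closed set that is not weakly closed.

```latex
% Illustration: the unit sphere S = \{x \in \ell^2 : \|x\|_2 = 1\} is norm closed
% but not convex. For every y \in \ell^2,
\[
  \langle e_n, y \rangle = y(n) \longrightarrow 0 \qquad (n \to \infty),
\]
% so e_n \xrightarrow{w} 0 although \|e_n\|_2 = 1 for all n. Hence
\[
  0 \in \operatorname{cl}_w S \setminus S ,
  \qquad\text{while}\qquad \operatorname{cl}_s S = S .
\]
```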
10.1.7 Corollary (Mazur’s Theorem). Let $x_n \xrightarrow{w} x$ in a normed space $X$. Then there exists a sequence of convex combinations of members of $(x_n)$ that converges in norm to $x$.
Proof. Let $C$ denote the set of all convex combinations of members of the sequence $(x_n)$. Then $z_n := n^{-1}\sum_{j=1}^{n} x_j \in C$ and $z_n \xrightarrow{w} x$, hence by the theorem $x$ is in the norm closure of $C$, verifying the assertion.
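Mazur's theorem can be watched numerically in a simple case. The sketch below is only an illustration under stated assumptions: it works in $\ell^2$ truncated to finitely many coordinates, takes $x_n = e_n$ (which converges weakly but not in norm to 0), and uses the Cesàro means as the convex combinations. The function names are hypothetical helpers, not notation from the text.

```python
import math

def basis_vector(n, dim):
    """Standard basis vector e_n (1-indexed), truncated to `dim` coordinates."""
    return [1.0 if j == n else 0.0 for j in range(1, dim + 1)]

def cesaro_mean(n, dim):
    """z_n = (1/n)(e_1 + ... + e_n), a convex combination of e_1, ..., e_n."""
    z = [0.0] * dim
    for k in range(1, n + 1):
        e_k = basis_vector(k, dim)
        z = [zj + ej / n for zj, ej in zip(z, e_k)]
    return z

def l2_norm(x):
    """Euclidean (l^2) norm of a finite list of reals."""
    return math.sqrt(sum(t * t for t in x))

def norms_of_means(sizes):
    """Return [(n, ||e_n||_2, ||z_n||_2)] for each n in `sizes`."""
    dim = max(sizes)
    return [(n, l2_norm(basis_vector(n, dim)), l2_norm(cesaro_mean(n, dim)))
            for n in sizes]

if __name__ == "__main__":
    for n, norm_e, norm_z in norms_of_means([1, 4, 100]):
        print(n, norm_e, norm_z)
```

The norms of the $e_n$ stay equal to 1, while the norms of the means equal $n^{-1/2}$ up to rounding, so the convex combinations converge in norm to the weak limit 0.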

Application: Weak Bases
A sequence $(e_n)$ in a normed linear space $X$ is a weak basis if $\|e_n\| = 1$ and for each $x \in X$ there exists a unique sequence $(c_n)$ in $\mathbb{K}$ such that the sequence of partial sums $\sum_{j=1}^{n} c_j e_j$ converges weakly to $x$. By analogy with the strong case, we may then write
$$x = w\text{-}\sum_{j=1}^{\infty} c_j(x)\,e_j,$$
where the $c_j$ are linear functionals satisfying $c_j(e_i) = \delta_{ij}$. We show that if $X$ is a Banach space, then a weak basis is a Schauder basis.
Let $X^\infty$ denote the linear space of all functions $f = (f(1), f(2), \ldots) : \mathbb{N} \to X$ such that $\|f\|_\infty := \sup_n \|f(n)\| < \infty$. The space $X^\infty$ is easily seen to be a Banach space under this norm. Define a linear map $T : X \to X^\infty$ by
$$Tx = (S_1 x, S_2 x, \ldots), \quad\text{where } S_m x := \sum_{j=1}^{m} c_j(x)\,e_j \xrightarrow{w} x.$$

Note that $\|Tx\|_\infty = \sup_n \|S_n x\|$. We use the closed graph theorem to show that $T$ is continuous. Let $x_n \to x$ in $X$ and $Tx_n \to f$ in $X^\infty$. In particular, we have the coordinate-wise convergence
$$\lim_n S_m x_n = f(m) \quad\text{for each } m. \tag{$\dagger$}$$
We claim that
$$f(m) = \sum_{j=1}^{m} \alpha_j e_j \quad\text{for some } \alpha_j \in \mathbb{C}. \tag{$\ddagger$}$$

For $m = 1$, we have $c_1(x_n)e_1 = S_1 x_n \to f(1)$, hence $f(1) = \alpha_1 e_1$ for some $\alpha_1 \in \mathbb{C}$ (Ex. 8.2). If the assertion holds for $m$, then, since $c_{m+1}(x_n)e_{m+1} = S_{m+1}x_n - S_m x_n$ converges,
$$f(m+1) = \lim_n S_{m+1}x_n = \lim_n \bigl(S_m x_n + c_{m+1}(x_n)e_{m+1}\bigr) = \sum_{j=1}^{m+1} \alpha_j e_j$$
for some $\alpha_{m+1}$. Therefore, the claim holds by induction. Now let $x' \in X'$. From (†),

$$\lim_m \langle f(m), x'\rangle = \lim_m \lim_n \langle S_m x_n, x'\rangle = \lim_n \lim_m \langle S_m x_n, x'\rangle = \lim_n \langle x_n, x'\rangle = \langle x, x'\rangle.$$
The interchange of limits is justified because the convergence in (†) is uniform in $m$. Therefore, $x = w\text{-}\sum_{j=1}^{\infty} \alpha_j e_j$, hence, by uniqueness, $\alpha_j = c_j(x)$ for all $j$. By (‡) we then have $f(m) = S_m x$ for all $m$, that is, $f = Tx$, proving continuity of $T$.
It now follows that $\|S_m x - x\| \to 0$, as required. Indeed, since $x$ is in the weak closure of the span of $(e_n)$ and since the weak closure is the same as the norm closure (10.1.6), given $\varepsilon > 0$ we may choose $y_n := \sum_{j=1}^{n} a_j e_j$ such that $\|x - y_n\| < \varepsilon(\|T\| + 1)^{-1}$. But for $m \ge n$, $S_m y_n = y_n$ since $c_j(e_i) = \delta_{ij}$. Therefore, for such $m$,
$$\|S_m x - x\| \le \|S_m(x - y_n)\| + \|x - y_n\| \le \|T\|\,\|x - y_n\| + \|x - y_n\| \le \varepsilon.$$

Exercises
10.1 Find a sequence $(x_n)$ in $c_0$ that converges weakly but not strongly to zero. Do the same for $\ell^p$, $1 < p < \infty$.
10.2 (von Neumann). Let $1 < p < \infty$. For each pair $m, n \in \mathbb{N}$ with $1 \le m < n$, define $x_{m,n} \in \ell^p$ by $x_{m,n} := (0, \ldots, 0, \overset{m}{1}, 0, \ldots, 0, \overset{n}{m}, 0, \ldots)$, where the labels above the entries mark their positions. Let $A$ be the set of all $x_{m,n}$. Show that zero is in the weak closure of $A$ in $\ell^p(\mathbb{N})$, but no sequence in $A$ converges in norm to zero.
10.3 Show that the sequence of functions xn (t) = tn in C[0, 1], k·k∞ converges weakly but not


strongly.
10.4 Show that $x_n \xrightarrow{w} x$ in $c$ iff the following hold:
(a) $\sup_n \|x_n\| < \infty$, (b) $x_n(j) \to x(j)$ ∀ $j$, and (c) $\lim_n \lim_j x_n(j) = \lim_j x(j)$.
10.5 Let $(X, \mathcal{F}, \mu)$ be a σ-finite measure space and $1 < p < \infty$. Show that a sequence $(f_n)$ in $L^p$ converges weakly to $f \in L^p$ iff the following hold:
(a) $\sup_n \|f_n\|_p < \infty$, (b) $\int_E f_n \, d\mu \to \int_E f \, d\mu$ ∀ $E \in \mathcal{F}$ with $\mu(E) < \infty$.
10.6 Let $X$ be a locally compact Hausdorff space. Show that $f_n \xrightarrow{w} f$ in $C_0(X)$ iff $\sup_n \|f_n\|_\infty < \infty$ and $f_n \to f$ pointwise on $X$.
10.7 Let X and Y be Banach spaces and T : X → Y linear. Show that T is norm continuous iff T is
weak-weak continuous.
10.8 Let X be a normed space. Show that if C is weakly compact, then {cx : x ∈ C, |c| ≤ r} is
weakly compact.
10.9 Let $X$ be a Banach space. Prove the following:
(a) If $X$ is infinite dimensional, then every weak neighborhood $U$ of zero is unbounded.
(b) The weak topology of a normed space $X$ is metrizable iff $X$ is finite dimensional. [Consider $\{x \in X : d(x, 0) < 1/n\}$ and use the uniform boundedness principle.]
10.10 A sequence $(x_n)$ in a normed space $X$ is said to be weakly Cauchy if $\langle x_n, x'\rangle$ is Cauchy in $\mathbb{K}$ for all $x' \in X'$. The space $X$ is weakly sequentially complete if every weakly Cauchy sequence $(x_n)$ in $X$ converges weakly to a member of $X$. Prove:
(a) A weakly Cauchy sequence is norm bounded.
(b) $\ell^1$ is weakly sequentially complete.
(c) $c_0$ is not weakly sequentially complete.
(d) $C[0, 1]$ with the uniform norm is not weakly sequentially complete.
10.11 Let $X$ be compact and $(f_n)$ a bounded sequence in $C(X)$ that converges pointwise to $f \in C(X)$. Show that there exists a sequence of convex combinations of members of $(f_n)$ that converges in the uniform norm to $f$.
10.12 Prove that in an infinite dimensional normed space the weak closure of $S_1$ is $C_1$. [Suppose there exists $x_0 \in C_1 \setminus \operatorname{cl}_w S_1$. Choose an open, convex, weak neighborhood $U$ of zero such that $V := U + x_0$ does not meet $S_1$.]
10.13 Prove the following result on compact convergence of bounded nets: A bounded net $(x_\alpha)$ converges weakly to $x_0$ in a normed space $X$ iff $\langle x_\alpha, f\rangle \to \langle x_0, f\rangle$ uniformly in $f$ on compact subsets of $X'$.
[For each norm compact $K \subseteq X'$ and $\varepsilon > 0$, define
$$U(K; \varepsilon) := \bigl\{x \in X : \sup_{f \in K} |\langle x, f\rangle| < \varepsilon\bigr\}.$$
If $B$ is a bounded subset of $X$, then, for each $x_0 \in B$, the sets $(x_0 + U(K; \varepsilon)) \cap B$ form a neighborhood base of $x_0$ in the relative weak topology of $B$.]

10.2 The Weak∗ Topology


Definition and General Properties
Let $X$ be a normed space. The weak∗ topology on $X'$, denoted by $w^*$, is the initial topology with respect to the family of functions $\hat{X}$. Since these functions are weakly continuous on $X'$, $w^* \le w :=$ the weak topology on $X'$. Net convergence in the weak∗ topology is characterized by
$$f_\alpha \xrightarrow{w^*} f \ \text{ iff } \ \langle x, f_\alpha\rangle \to \langle x, f\rangle \ \text{ for every } x \in X.$$
It follows that $w^*$ is a vector topology. By 0.6.4, a neighborhood base at zero is given by the open, convex, balanced sets
$$U(x_1, \ldots, x_k; \varepsilon) := \{f \in X' : |f(x_j)| < \varepsilon,\ j = 1, \ldots, k\}. \tag{10.2}$$
Therefore, $X'_{w^*}$ is a LCS with generating seminorms $p_x(f) = |f(x)|$.
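The inequality $w^* \le w$ on a dual space can be strict. The following worked example is an illustration added here, not taken from the text; it relies on the standard identifications of the dual of $c_0$ with $\ell^1$ and of the dual of $\ell^1$ with $\ell^\infty$.

```latex
% View \ell^1 as the dual of c_0 and let e_n be the standard unit vectors in \ell^1.
% Weak* convergence tests against x \in c_0:
\[
  \langle x, e_n \rangle = x(n) \longrightarrow 0 \quad\text{for every } x \in c_0,
  \qquad\text{so } e_n \xrightarrow{w^*} 0 .
\]
% Weak convergence tests against every y \in \ell^\infty = (\ell^1)'.
% With y = (1, 1, 1, \ldots):
\[
  \langle e_n, y \rangle = 1 \not\longrightarrow 0 ,
  \qquad\text{so } e_n \not\xrightarrow{w} 0 .
\]
```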


The Dual of $X'_{w^*}$

10.2.1 Proposition. Let $X$ be a normed space. A $w^*$-continuous linear functional $\varphi$ on $X'$ is of the form $\hat{x}$ for some $x \in X$. Thus $(X'_{w^*})' = \hat{X}$.

Proof. By definition of the $w^*$-topology, there exist $x_j \in X$ and $\varepsilon > 0$ such that $|\varphi(f)| < 1$ for all $f \in U := U(x_1, \ldots, x_k; \varepsilon)$. In particular, if $f(x_j) = 0$ for all $j$ then $nf \in U$ for all $n \in \mathbb{N}$, hence $\varphi(f) = 0$. Thus $\bigcap_j \ker \hat{x}_j \subseteq \ker \varphi$, which implies that $\varphi$ is a linear combination of the $\hat{x}_j$, say $\varphi = \sum_{j=1}^{k} c_j \hat{x}_j$ (0.2.3). Therefore, $\varphi = \hat{x}$, where $x = \sum_{j=1}^{k} c_j x_j$.

10.2.2 Corollary. The space $\hat{X}$ is dense in the weak∗ topology of $X''$.
Proof. Suppose the assertion is false. Choose $x'' \in X''$ not in the $w^*$-closure of $\hat{X}$ in $X''$. By 9.3.3, there exists a weak∗ continuous linear functional $F$ on $X''$ such that
$$\sup\{|F(\hat{x})| : x \in X\} < |F(x'')|.$$
By the proposition, $F = \hat{f}$ for some $f \in X'$. Therefore, the preceding inequality becomes
$$\sup\{|f(x)| : x \in X\} < |x''(f)|.$$
But the left side is unbounded unless $f = 0$, in which case $x''(f) = 0$, impossible.
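10.2.3 Corollary. Let $A \subseteq X$ be finite. Then $A^{00} \subseteq \hat{X}$.
Proof. Let $\psi \in A^{00}$. We show that $\psi$ is weak∗ continuous on $X'$. Let $f_\alpha \xrightarrow{w^*} 0$ in $X'$ and $\varepsilon > 0$. Then $\varepsilon^{-1} f_\alpha(x) = f_\alpha(\varepsilon^{-1}x) \to 0$ for each $x \in X$, and since $A$ is finite, there exists $\alpha_0$ such that
$$\sup\{\varepsilon^{-1}|f_\alpha(x)| : x \in A\} \le 1 \quad\text{for all } \alpha \ge \alpha_0.$$
For such $\alpha$, $\varepsilon^{-1} f_\alpha \in A^0$, hence $|\psi(f_\alpha)| \le \varepsilon$. Therefore, $\psi(f_\alpha) \to 0$, and so $\psi \in \hat{X}$ by 10.2.1.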

The Banach-Alaoglu Theorem


The next theorem implies that $w^* < s$ for every infinite dimensional normed space.
10.2.4 Banach-Alaoglu Theorem. Let $X$ be a normed space. Then the norm closed unit ball $C_1'$ in $X'$ is weak∗ compact.
Proof. The proof uses Tychonoff’s theorem (0.8.9). For each $x \in X$, let $K_x$ denote the compact set $\{z \in \mathbb{K} : |z| \le \|x\|\}$ and let $K := \prod_{x \in X} K_x$. Then $K$ is the collection of all functions $f : X \to \mathbb{K}$, linear or not, such that $|f(x)| \le \|x\|$. In particular, $C_1' \subseteq K$. Since the product topology of $K$ is the topology of pointwise convergence on $X$, the relative topology on $C_1'$ from $K$ is precisely the weak∗ topology. Now let $(f_\alpha)$ be a net in $C_1'$ that converges pointwise to a member $f$ of $K$. Then $f$ is linear and since $|f(x)| \le \|x\|$ for all $x$, $f \in C_1'$. Therefore, $C_1'$ is closed in $K$. Since $K$ is compact in the topology of pointwise convergence, $C_1'$ is $w^*$-compact.
10.2.5 Corollary (Banach). If $X$ is separable, then the norm closed unit ball $C_1'$ in $X'$ is metrizable in the weak∗ topology and is weak∗ sequentially compact.
Proof. Let $(x_n)$ be a sequence with dense span in $X$. Then
$$d(f, g) = \sum_{k=1}^{\infty} 2^{-k}\,|f(x_k) - g(x_k)|$$
defines a metric on $C_1'$ such that $d(f_\alpha, f) \to 0$ iff $f_\alpha \xrightarrow{w^*} f$. Therefore, the metric and $w^*$-topologies agree on $C_1'$, and the conclusion follows from the theorem.
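The metric in the proof of 10.2.5 can be evaluated explicitly in a concrete case. The sketch below is an illustration under stated assumptions: $X = \ell^1$, the dense-span sequence is taken to be the unit vectors $x_k = e_k$, functionals on $\ell^1$ are represented by bounded sequences, and the series is truncated after `terms` summands. The function names are hypothetical.

```python
def weak_star_metric(f, g, terms=60):
    """Truncated d(f, g) = sum_{k>=1} 2^{-k} |f(x_k) - g(x_k)| with x_k = e_k in l^1.

    f and g map an index k to the k-th coordinate of a bounded sequence, viewed
    as a functional on l^1, so f(e_k) is simply the k-th coordinate.
    """
    return sum(2.0 ** -k * abs(f(k) - g(k)) for k in range(1, terms + 1))

def coordinate_functional(n):
    """The functional x -> x(n) on l^1, i.e. the sequence e_n in l^infinity."""
    return lambda k: 1.0 if k == n else 0.0

def zero_functional(k):
    """The zero functional on l^1."""
    return 0.0

if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, weak_star_metric(coordinate_functional(n), zero_functional))
```

Here the distance from the $n$-th coordinate functional to zero is $2^{-n}$, which tends to 0, although each coordinate functional has norm one; this matches the fact that these functionals converge weak∗, but not in norm, to zero.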

Application: Means on Function Spaces
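Let $S$ be a set and $\mathcal{F}$ a norm-closed, conjugate-closed, linear subspace of $B(S)$ that contains the constant functions. A mean on $\mathcal{F}$ is a linear functional $m$ on $\mathcal{F}$ such that
$$\inf_{s \in S} \operatorname{Re} f(s) \le m(\operatorname{Re} f) \le \sup_{s \in S} \operatorname{Re} f(s), \quad f \in \mathcal{F}. \tag{10.3}$$
(See 8.6.) The set of means on $\mathcal{F}$ is denoted by $M(\mathcal{F})$.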


10.2.6 Proposition. A mean m on F has the following properties:
(a) m(1) = 1.

(b) m is positive, that is, f ≥ 0 ⇒ m(f ) ≥ 0.

(c) $m(\operatorname{Re} f) = \operatorname{Re} m(f)$, $m(\operatorname{Im} f) = \operatorname{Im} m(f)$, and $m(\bar{f}) = \overline{m(f)}$.

(d) m is a bounded linear functional with norm one.

(e) m(f ) is in the closure of co f (S) in C.

Conversely, a linear functional m on F that satisfies (a) and (b) is a mean.


Proof. Parts (a) and (b) are clear from (10.3).
(c) If $f = g + ih$, where $g = \operatorname{Re} f = (f + \bar{f})/2$ and $h = \operatorname{Im} f = (f - \bar{f})/2i$, then $m(f) = m(g) + i\,m(h)$ and $m(\bar{f}) = m(g - ih) = m(g) - i\,m(h)$. Since (10.3) implies that $m(g)$ and $m(h)$ are real, (c) follows.
(d) Let $|m(f)| = e^{i\theta} m(f) = m(e^{i\theta} f)$. Since the last expression is real, we have from (c)
$$|m(f)| = m\bigl(\operatorname{Re}(e^{i\theta} f)\bigr) \le \sup_{s \in S} \operatorname{Re}\, e^{i\theta} f(s) \le \|f\|_\infty,$$
which shows that $\|m\| \le 1$. Since $m(1) = 1$, $\|m\| = 1$.
(e) Let $H$ be any closed half-space in $\mathbb{C} = \mathbb{R}^2$ containing $f(S)$, say $H = \{(x, y) : ax + by \le c\}$. If $f = g + ih$, then $ag + bh \le c$, hence by positivity, $a\,m(g) + b\,m(h) \le m(c) = c$ and so $m(f) = m(g) + i\,m(h) \in H$. Since the closed convex hull of $f(S)$ is the intersection of all closed half-spaces containing $f(S)$ (Ex. 9.16), part (e) follows.
The last assertion of the proposition is clear.
10.2.7 Theorem. $M(\mathcal{F})$ is convex and $w^*$-compact and is the $w^*$-closed convex hull of $\delta_S = \{\delta_s : s \in S\}$, where $\delta_s(f) := f(s)$.
Proof. It is easy to see that a convex combination of means is a mean. If $(m_\alpha)$ is a net in $M(\mathcal{F})$ and $m_\alpha \xrightarrow{w^*} m$ in $\mathcal{F}'$, then (10.3) holds for each $m_\alpha$ and so must hold for $m$. Therefore, $M(\mathcal{F})$ is $w^*$-closed, hence, by Alaoglu’s theorem, is $w^*$-compact. Let $C$ denote the $w^*$-closed convex hull of $\delta_S$. If $m \in M(\mathcal{F}) \setminus C$, then by the separation theorem there exists $f \in \mathcal{F}$ such that
$$\sup \operatorname{Re} f(S) = \sup\{\operatorname{Re}\langle f, \delta_s\rangle : s \in S\} < \operatorname{Re} m(f),$$
contradicting (10.3). Therefore, $M(\mathcal{F}) = C$.
The proof of the following corollary is an exercise (Ex. 10.27).
10.2.8 Corollary. Let G be a norm-closed, conjugate-closed subspace of F that contains
the constant functions. Then every mean on G extends to a mean on F.

Weak∗ Continuity
Here is a significant extension of 10.2.1 for Banach spaces:
10.2.9 Theorem. Let $X$ be a Banach space. If $\varphi$ is a linear functional on $X'$ whose restriction to the closed unit ball $C_1'$ is $w^*$-continuous, then $\varphi = \hat{x}$ for some $x \in X$.
Proof. Fix $n \in \mathbb{N}$. By hypothesis, the set
$$U := \{f \in C_1' : |\langle f, \varphi\rangle| < 1/n\}$$
is a $w^*$-neighborhood of zero in $C_1'$, hence there exists a weak∗ neighborhood $V$ of zero in $X'$ such that $V \cap C_1' \subseteq U$. We may assume that
$$V := \{f \in X' : |f(x)| \le 1,\ x \in A\} = A^0, \quad\text{where } A \subseteq X \text{ is finite.}$$
Note that by the bipolar theorem, ${}^0(V^0) = V$. We claim that
$$n\varphi \in V^0 + C_1'', \quad n \in \mathbb{N}, \tag{$\alpha$}$$
where $C_1''$ is the closed unit ball in $X''$. To see this, note first that $V^0 + C_1''$ is convex and balanced, and since $V^0$ is weak∗ closed and $C_1''$ is weak∗ compact, $V^0 + C_1''$ is weak∗ closed in $X''$ (Ex. 9.1). If $n\varphi \notin V^0 + C_1''$, then by 9.3.3 and 10.2.1 there exists $f \in X'$ such that
$$\sup\{|\hat{f}(\psi + \phi)| : \psi \in V^0,\ \phi \in C_1''\} < 1 < |\hat{f}(n\varphi)| = n|\varphi(f)|. \tag{$\beta$}$$
In particular, $\|f\| = \|\hat{f}\| = \sup\{|\hat{f}(\phi)| : \phi \in C_1''\} < 1$ and $\sup\{|\hat{f}(\psi)| : \psi \in V^0\} < 1$. These inequalities show that $f \in C_1' \cap {}^0(V^0) = C_1' \cap V \subseteq U$. But then $|\varphi(nf)| < 1$, contradicting (β). Therefore, (α) holds.
By 10.2.3, $V^0 = A^{00} \subseteq \hat{X}$, hence from (α) there exist $\hat{x}_n \in V^0$ and $\phi_n \in C_1''$ such that $n\varphi = \hat{x}_n + \phi_n$ ($n \in \mathbb{N}$). It follows that
$$|\varphi(f) - f(x_n/n)| = |\phi_n(f/n)| \le 1/n \quad \forall\, f \in C_1' \text{ and } n \in \mathbb{N}. \tag{$\gamma$}$$
By the triangle inequality,
$$\|x_n/n - x_m/m\| = \sup\bigl\{|f(x_n/n) - f(x_m/m)| : f \in C_1'\bigr\} \le 1/n + 1/m \quad \forall\, n, m,$$
hence $(x_n/n)$ is a Cauchy sequence. Setting $x := \lim_n x_n/n$, we have $\varphi = \hat{x}$ by (γ).
10.2.10 Corollary. Let $X$ and $Y$ be Banach spaces and $T : X' \to Y$ linear. If $T$ restricted to the closed unit ball $C_1'$ of $X'$ is weak∗-weak continuous, then $T$ is weak∗-weak continuous on $X'$.
Proof. For each $y' \in Y'$, the map $x' \mapsto \langle Tx', y'\rangle$ is $w^*$-continuous on $C_1'$, hence, by the theorem, there exists $x$ depending on $y'$ such that $\langle Tx', y'\rangle = \langle x, x'\rangle$ for all $x' \in X'$. Thus $\langle Tx', y'\rangle$ is weak∗ continuous in $x'$.

The Closed Range Theorem
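Let $X$ and $Y$ be normed spaces and $T \in B(X, Y)$. Recall that
$$[\operatorname{ran} T]^\perp = \ker T' \quad\text{and}\quad {}^\perp[\operatorname{ran} T'] = \ker T.$$
By the bipolar theorem applied to $\operatorname{ran} T \subseteq Y_s$ and $\operatorname{ran} T' \subseteq X'_{w^*}$ we then have
$$\operatorname{cl}_w \operatorname{ran} T = \operatorname{cl} \operatorname{ran} T = {}^\perp[\ker T'] \quad\text{and}\quad \operatorname{cl}_{w^*} \operatorname{ran} T' = [\ker T]^\perp.$$
With this context we may now prove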

With this context we may now prove


10.2.11 Theorem (Banach). Let $X$ and $Y$ be Banach spaces and $T \in B(X, Y)$. The following are equivalent:

(a) $\operatorname{ran} T$ is norm closed. (b) $\operatorname{ran} T'$ is $w^*$ closed. (c) $\operatorname{ran} T'$ is norm closed.

Proof. (a) ⇒ (b): By the preceding, it suffices to show that $[\ker T]^\perp \subseteq \operatorname{ran} T'$. To this end let $x' \in [\ker T]^\perp$ and define $g$ on $\operatorname{ran} T$ by $g(Tx) = \langle x, x'\rangle$. Then $g$ is well-defined and linear. We claim that $g$ is continuous. For the verification, we use Ex. 8.76, which asserts that for some $c > 0$ and each $y \in \operatorname{ran} T$, the inequality $\|x\| \le c\,\|y\|$ holds for some $x$ with $Tx = y$. Let $y_n \in \operatorname{ran} T$ such that $y_n \to 0$, and choose $x_n$ with $\|x_n\| \le c\,\|y_n\|$ such that $Tx_n = y_n$. Then $x_n \to 0$, hence $g(y_n) = \langle x_n, x'\rangle \to 0$, establishing the claim. By the Hahn-Banach theorem there exists $y' \in Y'$ that extends $g$, that is, $\langle Tx, y'\rangle = \langle x, x'\rangle$ for all $x$. It follows that $T'y' = x'$, hence $x' \in \operatorname{ran} T'$.
(b) ⇒ (c): This is clear, since $w^* \le s$ on $X'$.
(c) ⇒ (a): Let $S : X \to Z := \operatorname{cl}\operatorname{ran} T$ be the mapping $T$ but with the indicated new codomain. Let $I : Z \hookrightarrow Y$ denote the inclusion map, so that $T = IS$ and the dual map $I' : Y' \to Z'$ is the restriction mapping. By the Hahn-Banach theorem, $I'$ is surjective. It follows that $\operatorname{ran} S' = S'(Z') = S'(I'(Y')) = T'(Y') = \operatorname{ran} T'$, hence $\operatorname{ran} S'$ is closed. Moreover, if $z' \in Z'$ and $S'z' = 0$ then $z' = 0$ on $\operatorname{ran} S = \operatorname{ran} T$, hence $z' = 0$ on $Z$. Therefore, $S'$ is 1-1 and so $S' : Z' \to \operatorname{ran} S'$ is invertible. Thus there exists $\varepsilon > 0$ such that $\|S'z'\| \ge \varepsilon\,\|z'\|$ for all $z' \in Z'$. We claim that in the space $Z$, $B_\varepsilon \subseteq \operatorname{cl} S(B_1)$ $(= \operatorname{cl} S(C_1))$; it will follow from 8.7.4 that $S$ is surjective, hence $\operatorname{ran} T = S(X) = Z = \operatorname{cl}\operatorname{ran} T$, completing the proof. To verify the claim, let $z \in Z \setminus \operatorname{cl} S(C_1)$. By 9.3.3, there exists $z' \in Z'$ with norm one such that
$$\sup\{|\langle Sx, z'\rangle| : \|x\| \le 1\} < |\langle z, z'\rangle|.$$
The right side is $\le \|z\|$ and the left side equals $\sup\{|\langle x, S'z'\rangle| : \|x\| \le 1\} = \|S'z'\| \ge \varepsilon\,\|z'\| = \varepsilon$. Therefore, $\|z\| > \varepsilon$, as required.
10.2.12 Corollary. $T$ is surjective iff $\operatorname{ran} T'$ is closed and $T'$ is injective. In this case, $T'$ has a continuous inverse $(T')^{-1} : \operatorname{ran} T' \to Y'$.
Proof. (Necessity). By the theorem, $\operatorname{ran} T'$ is closed. Moreover, $\ker T' = [\operatorname{ran} T]^\perp = \{0\}$, hence $T'$ is injective. By the open mapping theorem, $(T')^{-1} : \operatorname{ran} T' \to Y'$ is continuous.
(Sufficiency). By the theorem, $\operatorname{ran} T$ is closed. Thus $\operatorname{ran} T = {}^\perp[\ker T'] = {}^\perp\{0\} = Y$.
10.2.13 Corollary. $T'$ is surjective iff $\operatorname{ran} T$ is closed and $T$ is injective. In this case, $T$ has a continuous inverse $T^{-1} : \operatorname{ran} T \to X$.
Proof. (Necessity). By the theorem, $\operatorname{ran} T$ is closed. Moreover, $\ker T = {}^\perp[\operatorname{ran} T'] = \{0\}$. Therefore, $T$ is injective and so has a continuous inverse $T^{-1} : \operatorname{ran} T \to X$.
(Sufficiency). By the theorem, $\operatorname{ran} T'$ is $w^*$-closed. Thus $\operatorname{ran} T' = [\ker T]^\perp = \{0\}^\perp = X'$.
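A concrete operator helps to see what the closed range theorem excludes. The example below is an illustration added here, not taken from the text; it assumes the usual bilinear identification of the dual of $\ell^2$ with $\ell^2$, under which the dual of a diagonal operator is given by the same formula.

```latex
% Define T : \ell^2 \to \ell^2 by (Tx)(j) = x(j)/j. Then \|T\| = 1 and T is injective.
% ran T contains every finitely supported sequence, hence is dense in \ell^2, but
\[
  y := \bigl(1, \tfrac12, \tfrac13, \ldots\bigr) \in \ell^2 \setminus \operatorname{ran} T ,
\]
% because the only candidate preimage is (1, 1, 1, \ldots) \notin \ell^2.
% Thus ran T is not closed. Under the stated identification T' = T, so
\[
  \operatorname{ran} T' \ \text{is not closed}, \qquad \ker T' = \{0\},
\]
% in agreement with 10.2.11 (all three conditions fail together) and with
% 10.2.12 (T' is injective, yet T is not surjective because ran T' is not closed).
```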

Exercises
10.14 [↑ 9.6] Show that the set $E := \{n e_n : n \in \mathbb{N}\}$ is bounded in the weak∗ topology of $\ell^1(\mathbb{N}) = c_{00}'$ but is not norm bounded.

10.15 Find an example of a Banach space X for which the unit sphere in X 0 is not weak∗ compact.

10.16 Show that `1 (N) has two distinct weak∗ topologies.

10.17 Formulate and prove the weak∗ analog of Ex. 10.13.

10.18 Let X be a normed space. Show that w∗ = s iff X is finite dimensional.

10.19 [↑ 10.9] Show that the weak∗ topology of X 0 is metrizable iff X is finite dimensional.

10.20 Let $X$ and $Y$ be Banach spaces with $Y$ separable and let $T \in B(X, Y)$. Prove that $T'$ is compact iff $T'$ carries weak∗ convergent sequences $(y_n')$ in $Y'$ onto norm convergent sequences $(T'y_n')$ in $X'$.

10.21 Let $X$ be a normed space and $E \subseteq X'$. Show that $E$ is weak∗ dense in $X'$ iff for every $x \ne 0$ there exists $x' \in E$ such that $\langle x, x'\rangle \ne 0$.

10.22 Let $Z$ be a subspace of $X'$. Show that $({}^\perp Z)^\perp$ is the $w^*$-closure of $Z$.

10.23 Show that the dual space of a Banach space $X$ is $w^*$-sequentially complete, that is, if $\langle x, x_n'\rangle$ is Cauchy in $\mathbb{K}$ for all $x \in X$, then $(x_n')$ in $X'$ weak∗ converges to a member of $X'$. Give an example to show that the assertion is generally false if $X$ is not complete.

10.24 Let X and Y be Banach spaces. Prove that a linear map T : Y 0 → X 0 is w∗ -w∗ continuous iff
T = S 0 for some S ∈ B(X, Y). Thus w∗ -w∗ continuity implies s-s continuity.

10.25 [↑ §7.4] Prove the analog of 10.1.2 for weak∗ sequential convergence. Conclude the following:
Let X be compact and Hausdorff and let (µn ) be a sequence in M (X). Then (µn ) converges in
the weak∗ topology iff supn kµn k < ∞ and limn µn (E) exists for every E ∈ F.

10.26 Let X be a locally compact Hausdorff space and µ a Radon measure on X. Prove that C0 (X)
is weak∗ dense in L∞ (µ).

10.27 Prove 10.2.8.

10.28 Show that F := {f dλ : f ∈ L1 [a, b]} is a norm-closed, non-weak∗ closed subspace of M [a, b].

10.29 Let $X$ be a locally compact Hausdorff space, $\{f_i : i \in I\} \subseteq C_0(X)$, and $\{c_i : i \in I\} \subseteq \mathbb{C}$. Suppose for each finite set $F \subseteq I$ there exists $\mu_F \in M_{ra}(X)$ with $\|\mu_F\| \le 1$ such that $\int f_i \, d\mu_F = c_i$ for all $i \in F$. Prove that there exists $\mu \in M_{ra}(X)$ with $\|\mu\| \le 1$ such that $\int f_i \, d\mu = c_i$ for all $i \in I$. Formulate more generally.

10.30 Let S be a nonempty set and F a conjugate closed, norm closed subspace of B(S). Show that
the convex balanced hull of {δs : s ∈ S} is weak∗ -dense in the closed unit ball C10 of F 0 .

10.31 Let X be a compact Hausdorff space and P(X) the space of probability measures on X.
Identifying P(X) with a subset of C(X)0 , show that P(X) is the w∗ -closed convex hull of the
set δX of all Dirac measures on X.

10.3 Reflexive Spaces


A normed space $X$ is said to be reflexive if $\hat{X} = X''$. The mapping $x \mapsto \hat{x}$ is then a linear isometry from $X$ onto $X''$ (§8.5). Identifying $X$ with $X''$ under this isometry, we see that a reflexive space is a dual space and hence is complete. Moreover, by Alaoglu’s theorem, the ball $C_1$ in a reflexive space $X$ is weakly compact.^1 Note that the property of reflexivity is invariant under a change to an equivalent norm. This is a consequence of the fact that dual spaces are defined topologically and hence remain the same under such a change.

Examples and Basic Properties

Every finite dimensional space $X$ is reflexive since $\hat{X}$ and $X''$ have the same dimension. The spaces $L^p$ ($1 < p < \infty$) are reflexive, as can be seen by identifying $(L^p)'$ with $L^q$ and $(L^q)'$ with $L^p$, where $q$ is conjugate to $p$. The space $L^1$ is not reflexive unless it is finite dimensional. This may be seen as a simple consequence of a later result on extreme points that implies in the infinite dimensional case that $L^1$ is not a dual space (see 14.4.7(b)). The spaces $c_0$ and $c$ are not reflexive, as their bidual is $\ell^\infty$ (§8.3). The spaces $C(X)$, $X$ compact, and $L^\infty$ are not reflexive unless they are finite dimensional (Ex. 10.42, 10.43).
The next theorem shows that the property of reflexivity is either common to both $X$ and $X'$ or to neither. The proof is a simple consequence of the following general result.
10.3.1 Lemma. In any normed space $X$, $X''' = \widehat{X'} \oplus \hat{X}^\perp$.

Proof. Let $\varphi \in X'''$ and define $f \in X'$ by $\langle x, f\rangle = \langle \hat{x}, \varphi\rangle$. Noting that $\langle \hat{x}, \hat{f}\rangle = \langle x, f\rangle$, we see that $\varphi - \hat{f} \in \hat{X}^\perp$ and so $X''' = \widehat{X'} + \hat{X}^\perp$. To see that the sum is direct, let $\varphi$ be in the intersection of the spaces, so that $\varphi = \hat{f}$, $f \in X'$ and $\hat{f} \in \hat{X}^\perp$. Then $\langle x, f\rangle = \langle \hat{x}, \hat{f}\rangle = 0$ for all $x$, hence $f = 0$ and therefore $\varphi = 0$.
10.3.2 Theorem. $X$ is reflexive iff $X'$ is reflexive.
Proof. By the lemma, $X'$ is reflexive iff $\hat{X}^\perp = \{0\}$ iff $\hat{X} = X''$.

10.3.3 Theorem. If X is reflexive and Y is a closed subspace of X, then Y is reflexive.


1 This property actually characterizes reflexivity. See 10.3.5.
Proof. Let $\varphi \in Y''$ and let $T : X' \to Y'$ be the restriction map $Tf = f|_Y$. Since $T'\varphi \in X''$, $T'\varphi = \hat{x}$ for some $x \in X$. We claim that $x \in Y$. Indeed, if $f \in X'$ and $f(Y) = \{0\}$, then $Tf = 0$ and so
$$\langle x, f\rangle = \langle f, \hat{x}\rangle = \langle f, T'\varphi\rangle = \langle Tf, \varphi\rangle = \langle 0, \varphi\rangle = 0.$$
An application of 8.5.6 verifies the claim.
For all $f \in X'$, we now have
$$\langle Tf, \varphi\rangle = \langle f, T'\varphi\rangle = \langle f, \hat{x}\rangle = \langle x, f\rangle = \langle x, Tf\rangle = \langle Tf, \hat{x}\rangle.$$
Since $T$ is surjective (8.5.4), $\varphi = \hat{x}$.

Weak Compactness and Reflexivity


As noted earlier, if $X$ is reflexive then the closed unit ball $C_1$ of $X$ is weakly compact. The converse also holds. For the proof we need the following result.
10.3.4 Lemma (Goldstine Theorem). Let $X$ be a normed space. The image $\hat{C}_1$ of $C_1$ under the canonical imbedding $x \mapsto \hat{x}$ is $w^*$-dense in the closed unit ball $C_1''$ of $X''$.

Proof. Let $K$ denote the $w^*$-closure of $\hat{C}_1$ in $X''$. Since $\hat{C}_1 \subseteq C_1''$ and $C_1''$ is $w^*$-closed, $K \subseteq C_1''$. Suppose there exists $\varphi \in C_1'' \setminus K$. By 9.3.2 and 10.2.1, we may choose $f \in X'$ such that
$$\sup\{|\langle \hat{x}, \hat{f}\rangle| : \|x\| \le 1\} < |\langle \varphi, \hat{f}\rangle|.$$
But the left side is $\|f\|$ while the right side is $\le \|f\|$, a contradiction. Therefore, $K = C_1''$.
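Goldstine's theorem can be made explicit for $X = c_0$, whose bidual is $\ell^\infty$ (§8.3). The following worked example is an illustration added here; the truncations $x_n$ are a choice made for this sketch.

```latex
% Let \varphi \in \ell^\infty with \|\varphi\|_\infty \le 1 and set
\[
  x_n := (\varphi(1), \ldots, \varphi(n), 0, 0, \ldots) \in c_0 ,
  \qquad \|x_n\|_\infty \le 1 .
\]
% For every f \in \ell^1 = (c_0)',
\[
  \langle \hat{x}_n, f \rangle = \sum_{j=1}^{n} \varphi(j) f(j)
  \;\longrightarrow\; \sum_{j=1}^{\infty} \varphi(j) f(j) = \langle \varphi, f \rangle ,
\]
% so \hat{x}_n \xrightarrow{w^*} \varphi in \ell^\infty. For \varphi = (1, 1, 1, \ldots),
% however, \|\hat{x}_n - \varphi\|_\infty = 1 for all n: the density is weak*, not in norm.
```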
10.3.5 Theorem. Let $X$ be a normed space. Then $X$ is reflexive iff $C_1$ is weakly compact.
Proof. We have already noted the necessity. For the sufficiency, if $C_1$ is weakly compact, then, by definition of the $w^*$-topology, $\hat{C}_1$ is weak∗-compact in $X''$. By the lemma, $\hat{C}_1 = C_1''$, hence $\hat{X} = X''$.

Theorem 10.3.5 holds if weak compactness is replaced by weak sequential compactness. We state and prove the necessity here; the sufficiency is a consequence of the Eberlein-Šmulian theorem (§14.1).

10.3.6 Theorem. The closed unit ball $C_1$ in a reflexive Banach space $X$ is weakly sequentially compact.
Proof. Let $x_n \in X$ with $\|x_n\| \le 1$ for all $n$. The closed linear span $Y$ of $(x_n)$ is separable and by 10.3.3 is also reflexive. Thus $\hat{Y} = Y''$ is separable and therefore so is $Y'$ (8.5.8). By 10.2.5, the closed unit ball in $Y''$ is weak∗ sequentially compact. By reflexivity, this is simply the assertion that the closed unit ball in $Y$ is weakly sequentially compact. Thus $(x_n)$ has a weakly convergent subsequence.

Exercises
10.32 Let X and Y be reflexive spaces. Show that T ∈ B(X, Y) is compact iff T (C1 ) is compact.

10.33 Let X be a normed linear space. Show that the weak and weak∗ topologies on X 0 are equal iff
X is reflexive.

10.34 Let $X$ be a Banach space and $T : X \to X''$, $x \mapsto \hat{x}$, the canonical embedding. Show that $X$ is reflexive iff the adjoint $T' : X''' \to X'$ is 1-1.

10.35 Let $Y$ be a subspace of a normed space $X$. Prove that $\hat{Y} \subseteq Y^{\perp\perp}$ and that equality holds iff $Y$ is reflexive. [If $Y$ is reflexive and $F \in Y^{\perp\perp}$, define $G$ on $Y'$ by $G(g) = F(\tilde{g})$, where $\tilde{g}$ is an extension of $g$ to $X$ with $\|\tilde{g}\| = \|g\|$.]

10.36 Prove that X is reflexive iff every norm closed subspace of X 0 is weak∗ closed.

10.37 Show that the weak∗ analog of 10.1.6 holds in X 0 iff X is reflexive.

10.38 Let X be reflexive and A ⊆ X 0 . Show that (⊥ A)⊥ is the norm closed linear span of A.

10.39 [↑ 10.10] Show that a reflexive space X is weakly sequentially complete.

10.40 Use 10.3.6 to show that `1 is not reflexive. Conclude that c0 and c are not reflexive.
10.41 Let $X$ be reflexive and $x' \in X'$. Prove that $\|x'\| = \langle x, x'\rangle$ for some $x$ with $\|x\| = 1$. (R.C. James showed that every space with this property is reflexive.) Give an example of a Banach space for which the assertion is false.

10.42 Argue as follows to show that $L^1[0, 1]$ is not reflexive: Let $A_n = \bigl(1/(n+1),\, 1/n\bigr]$, $n \in \mathbb{N}$, and define $T : \ell^1 \to L^1[0, 1]$ by $Tx = \sum_{n=1}^{\infty} x_n [\lambda(A_n)]^{-1} 1_{A_n}$. Show that $T$ is an isometric isomorphism into $L^1$.

10.43 Show that $M_{ra}[0, 1]$, and therefore $C[0, 1]$, is not reflexive by using the following argument: Consider first the version of $\ell^1$ that consists of real sequences and define $T : \ell^1 \to M_{ra}[0, 1]$ by $Tx = \sum_n x_n \delta_{1/n}$. Show that $T$ is an isometry. Then consider the complex case.

*10.4 Uniformly Convex Spaces


Definition and General Properties
A normed linear space $X$ is said to be uniformly convex if for each $\varepsilon \in (0, 2)$ there exists $\delta > 0$ such that
$$\|x\| \le 1,\ \|y\| \le 1,\ \text{and } \|x - y\| \ge \varepsilon \ \Rightarrow\ \bigl\|\tfrac12(x + y)\bigr\| \le 1 - \delta. \tag{10.4}$$
Geometrically, this says that the midpoints of line segments in the closed unit ball with lengths bounded away from zero are uniformly distant from the surface. For ease of reference we let $P(x, y)$ denote the antecedent of the implication (10.4) and $Q(x, y)$ the consequent.
A normed space that satisfies the parallelogram law^2
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$
is uniformly convex. Indeed, if $P(x, y)$ holds, then $\|x + y\|^2 \le 4 - \varepsilon^2$, hence $Q(x, y)$ holds for $\delta := 1 - \tfrac12\sqrt{4 - \varepsilon^2}$. In particular, $L^2(X, \mathcal{F}, \mu)$ is uniformly convex. More generally, Clarkson has shown that $L^p(X, \mathcal{F}, \mu)$ is uniformly convex for $1 < p < \infty$ [8]. (See Ex. 10.44 for the case $p \ge 2$.)
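The value of $\delta$ obtained from the parallelogram law can be checked numerically in the Euclidean plane. The sketch below is an illustration only: it assumes $X = \mathbb{R}^2$ with the Euclidean norm, and the helper names are hypothetical. It builds unit vectors at distance exactly $\varepsilon$ and confirms that their midpoint has norm exactly $1 - \delta$, so this $\delta$ cannot be improved in a Hilbert space.

```python
import math

def hilbert_delta(eps):
    """delta(eps) = 1 - (1/2) sqrt(4 - eps^2), as derived from the parallelogram law."""
    return 1.0 - 0.5 * math.sqrt(4.0 - eps * eps)

def norm(v):
    """Euclidean norm of a finite list or tuple of reals."""
    return math.sqrt(sum(t * t for t in v))

def extremal_pair(eps):
    """Unit vectors x, y in R^2 with ||x - y|| = eps, for 0 < eps < 2."""
    half = eps / 2.0
    c = math.sqrt(1.0 - half * half)
    return (c, half), (c, -half)

def midpoint_norm(x, y):
    """Norm of the midpoint (x + y) / 2."""
    return norm([(a + b) / 2.0 for a, b in zip(x, y)])

if __name__ == "__main__":
    for eps in (0.1, 1.0, 1.9):
        x, y = extremal_pair(eps)
        print(eps, midpoint_norm(x, y), 1.0 - hilbert_delta(eps))
```

The two printed columns agree up to rounding, since the midpoint is $(\sqrt{1 - \varepsilon^2/4},\, 0)$.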
Here is a useful sequential characterization of uniform convexity:

10.4.1 Proposition. A normed linear space $X$ is uniformly convex iff for any sequences $(x_n)$ and $(y_n)$ in $C_1$ with $\|\tfrac12(x_n + y_n)\| \to 1$ it follows that $\|x_n - y_n\| \to 0$.
2 Spaces that satisfy the parallelogram law are called inner product spaces. These are discussed in detail

in the next chapter.



Proof. Let $X$ be uniformly convex and let $(x_n)$, $(y_n)$ be sequences in $C_1$ such that $\|x_n - y_n\| \not\to 0$. Then there exists $\varepsilon > 0$ such that $\|x_n - y_n\| \ge \varepsilon$ for infinitely many $n$. By uniform convexity, there exists $\delta > 0$ such that $\|\tfrac12(x_n + y_n)\| \le 1 - \delta$ for infinitely many $n$. Therefore, $\|\tfrac12(x_n + y_n)\| \not\to 1$.
Now suppose that $X$ is not uniformly convex. Then there exist an $\varepsilon \in (0, 2)$ and sequences $(x_n)$ and $(y_n)$ with $\|x_n\| \le 1$, $\|y_n\| \le 1$, $\|x_n - y_n\| \ge \varepsilon$, and $\|\tfrac12(x_n + y_n)\| > 1 - 1/n$. It follows that $\|\tfrac12(x_n + y_n)\| \to 1$. Since $\|x_n - y_n\| \not\to 0$, the sequential criterion fails.

Connections with Strict Convexity


It follows directly from Ex. 8.20(ii) that a uniformly convex space is strictly convex. The
following example shows that the two notions are distinct.
10.4.2 Example. (A strictly convex non-uniformly convex Banach space). Consider the linear space $\ell^1$ with the norm $\|x\| := \|x\|_1 + \|x\|_2$. Since $\|x\|_2 \le \|x\|_1$, $\|\cdot\|$ is equivalent to $\|\cdot\|_1$. Recalling that $(\ell^1, \|\cdot\|_1)$ is not reflexive and that reflexivity is preserved under a change to an equivalent norm, we see that $(\ell^1, \|\cdot\|)$ is not reflexive. Thus by 10.4.6 below, $(\ell^1, \|\cdot\|)$ is not uniformly convex.
To see that $(\ell^1, \|\cdot\|)$ is strictly convex let $x \ne y$ with $\|x\| = \|y\| = 1$. We show that $\|x + y\| < 2$. Set $a_j = |x_j|$, $b_j = |y_j|$, $a = (a_1, a_2, \ldots)$, and $b = (b_1, b_2, \ldots)$. Then $\|a\| = \|x\| = 1$, $\|b\| = \|y\| = 1$, and $\|x + y\| \le \|a + b\|$. We consider two cases.
Suppose that $a \ne b$. Since $\|a\|_1 = 1 - \|a\|_2$, $\|b\|_1 = 1 - \|b\|_2$, and $a_j, b_j \ge 0$, we have

\[ \|a + b\| = \|a\|_1 + \|b\|_1 + \|a + b\|_2 = 2 - \|a\|_2 - \|b\|_2 + \|a + b\|_2 < 2, \]

the last inequality by the strict convexity of $\ell^2$. Therefore, $\|x + y\| < 2$.

Now suppose $a = b$, that is, $|x_j| = |y_j|$ for all j. Since $x_k \ne y_k$ for some k, it follows that $|x_k + y_k| < |x_k| + |y_k|$. Therefore $\|x + y\| < \|a + b\| \le 2$. ♦
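The effect of adding the Euclidean norm can be seen numerically in a small case. The following is only an illustrative sketch: it replaces $\ell^1$ by $\mathbb{R}^2$ (an assumption made so that the example is runnable), and the function names are hypothetical. For the unit vectors obtained from $(1,0)$ and $(0,1)$, the 1-norm of the sum equals 2, while the combined norm of the sum is strictly smaller than 2.

```python
import math

def norm1(v):
    """The l^1 norm of a finite vector."""
    return sum(abs(c) for c in v)

def norm2(v):
    """The Euclidean (l^2) norm of a finite vector."""
    return math.sqrt(sum(c * c for c in v))

def norm_sum(v):
    """The combined norm ||v|| := ||v||_1 + ||v||_2 used in the example."""
    return norm1(v) + norm2(v)

def unit(v, norm):
    """Rescale v so that it has norm one with respect to the given norm."""
    n = norm(v)
    return [c / n for c in v]

def norm_of_sum(x, y, norm):
    """Return ||ux + uy||, where ux and uy are the normalized x and y."""
    ux, uy = unit(x, norm), unit(y, norm)
    return norm([a + b for a, b in zip(ux, uy)])

x, y = [1.0, 0.0], [0.0, 1.0]
plain = norm_of_sum(x, y, norm1)        # the 1-norm is not strictly convex
combined = norm_of_sum(x, y, norm_sum)  # the combined norm gives a value below 2
```

This checks a single pair of vectors only; it illustrates the strict inequality but does not replace the two-case argument above.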
In the finite dimensional case, the two notions are equivalent:
10.4.3 Proposition. A finite dimensional strictly convex space is uniformly convex.
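Proof. Suppose X is not uniformly convex. By 10.4.1, we may choose sequences $(x_n)$ and $(y_n)$ in $C_1$ such that $\|\tfrac12(x_n + y_n)\| \to 1$ and $\|x_n - y_n\| \not\to 0$. Passing to subsequences and using the compactness of $C_1$, we may assume that $\|x_n - y_n\| \ge \varepsilon$ for some $\varepsilon > 0$ and all n, and that $x_n \to x$ and $y_n \to y$. Then $\|\tfrac12(x + y)\| = 1$ and $x \ne y$, so X is not strictly convex.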

Weak and Strong Convergence In Uniformly Convex Spaces


The next theorem makes an important connection between weak and strong sequential
convergence in uniformly convex spaces. For the proof we need
10.4.4 Lemma. Let X be uniformly convex. If $(y_n)$ is a sequence in $C_1$ such that $\lim_{m,n\to\infty} \|\tfrac12(y_n + y_m)\| = 1$, then $(y_n)$ is Cauchy.
Proof. If $(y_n)$ is not Cauchy, then there exist an $\varepsilon > 0$ and strictly increasing sequences of indices $n_k$ and $m_k$ such that $\|y_{n_k} - y_{m_k}\| \ge \varepsilon$ for all k. But this contradicts 10.4.1, since $\|\tfrac12(y_{n_k} + y_{m_k})\| \to 1$.
10.4.5 Theorem (Radon-Riesz). Let $(x_n)$ be a sequence in a uniformly convex Banach space X. Then $x_n \xrightarrow{s} x \in X$ iff $x_n \xrightarrow{w} x$ and $\|x_n\| \to \|x\|$.
Proof. The necessity is clear. For the sufficiency we may take $x \ne 0$, since otherwise there is nothing to prove. Then $\|x_n\| \ne 0$ for all large n, hence we may assume that $\|x_n\| \ne 0$ for all n. Set $y_n = \|x_n\|^{-1} x_n$ and $y = \|x\|^{-1} x$ and note that the hypotheses imply that $y_n \xrightarrow{w} y$. Thus for any $f \in X'$,

\[ \lim_{m,n\to\infty} f\big(\tfrac12(y_m + y_n)\big) = f(y). \tag{†} \]

Choose f so that $f(y) = \|f\| = 1$ (8.5.6). Then $|f(\tfrac12(y_m + y_n))| \le \|\tfrac12(y_m + y_n)\| \le 1$, hence from (†), $\lim_{m,n\to\infty} \|\tfrac12(y_m + y_n)\| = 1$. By the lemma, $(y_n)$ is Cauchy and so converges in norm to y. From the inequality

\[ \|x_n - x\| = \big\| \|x_n\| y_n - \|x\| y \big\| \le \big| \|x_n\| - \|x\| \big| \, \|y_n\| + \|x\| \, \|y_n - y\| \]

we see that $x_n \xrightarrow{s} x$.

Connection with Reflexivity


Here is the main result of the section. It is remarkable in the sense that uniform convexity
is a purely geometric notion, defined exclusively as a property of the norm, while reflexivity
may be viewed, via say 10.3.5, as being a topological property.
10.4.6 Theorem (Milman-Pettis). A uniformly convex Banach space is reflexive.
Proof. We follow [39]. Suppose for a contradiction that X is uniformly convex but not reflexive. Let $C_1$ and $C_1''$ denote the closed unit balls in X and $X''$, respectively. By assumption, $\widehat{C_1}$ is a proper subset of $C_1''$, hence since $\widehat{C_1}$ is closed there exists $x'' \in C_1''$ at a positive distance from $\widehat{C_1}$, that is,

\[ \|x'' - \widehat{x}\| \ge 2\varepsilon \quad \text{for all } x \in C_1. \tag{†} \]

Define
\[ \delta = \delta(\varepsilon) := \inf\big\{ 1 - \|\tfrac12(x + y)\| : x, y \in C_1,\ \|x - y\| \ge \varepsilon \big\}. \]
It follows easily from 10.4.1 that $\delta(\varepsilon) > 0$. Since $\|x''\| = 1$, we may choose $f \in C_1'$ such that $|\langle f, x''\rangle - 1| < \delta/2$. Thus $x''$ is in the weak∗ open set

\[ V := \{ y'' \in X'' : |\langle f, y''\rangle - 1| < \delta/2 \}. \]

By 10.3.4, $x''$ is in the weak∗ closure of $V \cap \widehat{C_1}$ (approximate $\langle f, x''\rangle$ by $\langle f, \widehat{x}\rangle$). Now, for any $\widehat{x}$ and $\widehat{y}$ in $V \cap \widehat{C_1}$,

\[ |\langle x + y, f\rangle| = |2 + (\langle f, \widehat{x}\rangle - 1) + (\langle f, \widehat{y}\rangle - 1)| \ge 2 - \delta, \]

hence $\|x + y\| \ge 2 - \delta$ and so $1 - \|\tfrac12(x + y)\| \le \tfrac12\delta < \delta$. From the definition of $\delta$, $\|x - y\| < \varepsilon$. Thus $V \cap \widehat{C_1} \subseteq \widehat{x} + \varepsilon C_1''$. Since $\widehat{x} + \varepsilon C_1''$ is weak∗ closed and since $x''$ is in the weak∗ closure of $V \cap \widehat{C_1}$, we conclude that $x'' \in \widehat{x} + \varepsilon C_1''$. But this contradicts (†).

Exercises
10.44 Verify steps (a)–(d) below and then use (d) to show that $L^p$ is uniformly convex for $p \ge 2$.
(a) For $c > 0$, the function $f(p) = (1 + c^p)^{1/p}$ is strictly decreasing in p on $[2, \infty)$.
(b) $(a^p + b^p)^{1/p} \le (a^2 + b^2)^{1/2}$ $(a \ge 0,\ b \ge 0,\ p \ge 2)$.
(c) $c^2 + d^2 \le 2^{1-2/p} (c^p + d^p)^{2/p}$ $(c \ge 0,\ d \ge 0)$. ⟦For $p > 2$, use Hölder's inequality.⟧
(d) $|s + t|^p + |s - t|^p \le 2^{p-1} (|s|^p + |t|^p)$ $(s, t \in \mathbb{R},\ p \ge 2)$.
10.45 Let X be a uniformly convex Banach space and let $f \in X'$, $f \ne 0$. Show that there exists a unique $x \in S_1$ such that $\|f\| = f(x)$. ⟦It suffices to consider the case $\|f\| = 1$. Let $x_n \in S_1$ and $f(x_n) \to 1$. Use 10.4.4 to show that $(x_n)$ is Cauchy. (One may also use 10.41.) For uniqueness, suppose also that $f(y) = 1$ for some $y \in S_1$ with $\|x - y\| > 0$ and consider $\tfrac12(x + y)$.⟧
Chapter 11
Hilbert Spaces

A Hilbert space is a Banach space whose norm is derived from an inner product. This
feature endows Hilbert spaces with rich geometric structure that accounts for the broad
applicability of the subject to areas such as harmonic analysis, differential equations, and
quantum mechanics. In this chapter we examine the structure of Hilbert spaces. The next
chapter treats operators on these spaces.

11.1 General Principles


Sesquilinear Forms
Let X and Y be vector spaces over K. A sesquilinear$^1$ functional on X × Y is a mapping B : X × Y → K that is linear in the first variable and conjugate linear in the second; that is, for all $x, x_j \in X$, $y, y_j \in Y$, and $c_j \in K$,
\[ B(c_1 x_1 + c_2 x_2, y) = c_1 B(x_1, y) + c_2 B(x_2, y), \quad\text{and} \]
\[ B(x, c_1 y_1 + c_2 y_2) = \overline{c_1}\, B(x, y_1) + \overline{c_2}\, B(x, y_2). \]
Of course, if K = R, then B is simply a bilinear functional. The set of sesquilinear functionals is easily seen to be a linear space over K under the usual pointwise operations.
A sesquilinear functional B on X × X is called a sesquilinear form on X. Such a mapping is said to be Hermitian or self-adjoint$^2$ if $B(x, y) = \overline{B(y, x)}$, and positive if $B(x, x) \ge 0$. The set of Hermitian sesquilinear forms is easily seen to be a real linear subspace of the vector space of all sesquilinear forms.
The following result is useful in reducing some proofs involving sesquilinear forms to
simpler arguments.
11.1.1 Theorem (Polarization Identities). Let B be a sesquilinear form on X.
(a) If K = R and B is symmetric, then $4B(x, y) = B(x + y, x + y) - B(x - y, x - y)$.
(b) If K = C, then $4B(x, y) = \sum_{k=0}^{3} i^k B(x + i^k y, x + i^k y)$.
Proof. For (b) we have
\[ B(x + i^k y, x + i^k y) = B(x, x) + i^k B(y, x) + \overline{i^k}\, B(x, y) + B(y, y). \]
Multiplying by $i^k$ and summing, we see that
\[ \sum_{k=0}^{3} i^k B(x + i^k y, x + i^k y) = 4B(x, y) + \sum_{k=0}^{3} \big[ i^k B(x, x) + (-1)^k B(y, x) + i^k B(y, y) \big]. \]

1 From the Latin “semis qui” meaning one and a half.


2 For K = R, the term symmetric is also used.

Since $\sum_{k=0}^{3} i^k = 0$ and $\sum_{k=0}^{3} (-1)^k = 0$, the desired formula follows.
11.1.2 Corollary. If K = C, then a sesquilinear form B on X is Hermitian iff B(x, x) is real for all x.
Proof. The necessity is clear. For the sufficiency, factor out $i^k$ to write the general term of the sum in (b) as $i^k B(\overline{i^k} x + y, \overline{i^k} x + y)$. Taking conjugates in (b) and using the hypothesis we have
\[ 4\overline{B(x, y)} = \sum_{k=0}^{3} \overline{i^k}\, B(\overline{i^k} x + y, \overline{i^k} x + y) = \sum_{k=0}^{3} i^k B(i^k x + y, i^k x + y) = 4B(y, x). \]

Semi-Inner-Product Spaces
For the remainder of the chapter, we use the notation $(\cdot \mid \cdot)$ for positive Hermitian sesquilinear forms. A vector space X over K equipped with such a form is called a semi-inner-product space. Define an associated function $\|\cdot\| : X \to [0, \infty)$ by
\[ \|x\| = \sqrt{(x \mid x)}, \quad x \in X. \tag{11.1} \]
The polarization identities may then be written as
\[ 4(x \mid y) = \|x + y\|^2 - \|x - y\|^2 \ \ (K = R) \quad\text{and}\quad 4(x \mid y) = \sum_{k=0}^{3} i^k \|x + i^k y\|^2 \ \ (K = C). \]

Moreover, a direct calculation yields the parallelogram law

\[ \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2. \tag{11.2} \]
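The two identities can be checked numerically for the Euclidean inner product on $\mathbb{C}^d$. The sketch below is an illustration under that assumption; the helper names (`inner`, `polarization`, `parallelogram_defect`) are hypothetical and not part of the text. It also evaluates the parallelogram defect for the 1-norm, which is nonzero for the pair $(1,0)$, $(0,1)$.

```python
import random

UNITS = [1, 1j, -1, -1j]  # the powers i^0, i^1, i^2, i^3

def inner(x, y):
    """Euclidean inner product on C^d: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

def add(x, y, c=1):
    """Return x + c*y componentwise."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    """(1/4) * sum over k of i^k * ||x + i^k y||^2."""
    return sum(u * norm_sq(add(x, y, u)) for u in UNITS) / 4

def parallelogram_defect(x, y, nsq):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for a squared norm nsq."""
    return nsq(add(x, y)) + nsq(add(x, y, -1)) - 2 * nsq(x) - 2 * nsq(y)

def one_norm_sq(x):
    """Square of the 1-norm, which is not induced by an inner product."""
    return sum(abs(a) for a in x) ** 2

random.seed(0)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]

polarization_error = abs(polarization(x, y) - inner(x, y))
euclidean_defect = parallelogram_defect(x, y, norm_sq)
one_norm_defect = parallelogram_defect([1, 0], [0, 1], one_norm_sq)
```

The first two quantities vanish up to rounding error, while the last one equals 4, so the 1-norm violates (11.2).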
The following inequality is one of the most important tools in Hilbert space theory. As we
shall see, it is the essential ingredient in frequent calculations involving operators on Hilbert
spaces.
11.1.3 CBS Inequality (Cauchy, Bunyakovsky, Schwarz). Let X be a semi-inner-product space. Then
\[ |(x \mid y)| \le \|x\| \, \|y\|, \quad x, y \in X. \tag{11.3} \]
Equality holds iff $\|tx + \alpha y\| = 0$ for some $t \in \mathbb{R}$ and $\alpha \in K$.
Proof. We may assume $(x \mid y) \ne 0$. Set $\alpha = (x \mid y)/|(x \mid y)|$. For any $t \in \mathbb{R}$,
\[ 0 \le \|tx + \alpha y\|^2 = (tx + \alpha y \mid tx + \alpha y) = t^2 \|x\|^2 + t\overline{\alpha}\,(x \mid y) + t\alpha\,(y \mid x) + |\alpha|^2 \|y\|^2 \]
\[ = t^2 \|x\|^2 + 2t\,|(x \mid y)| + \|y\|^2 =: at^2 + bt + c. \tag{†} \]
Since the quadratic in (†) is never negative, it has at most one real zero, hence the discriminant $b^2 - 4ac = 4|(x \mid y)|^2 - 4\|x\|^2 \|y\|^2$ cannot be positive. This establishes (11.3).
Equality holds in (11.3) iff the discriminant $b^2 - 4ac$ is zero. Letting t be the unique zero of $at^2 + bt + c$, we see that $\|tx + \alpha y\| = 0$.
11.1.4 Corollary. k·k is a seminorm.
Proof. The only property that is not immediately evident is the triangle inequality. For this we use the CBS inequality as follows:
\[ \|x + y\|^2 = (x + y \mid x + y) = \|x\|^2 + \|y\|^2 + 2\,\mathrm{Re}\,(x \mid y) \le \|x\|^2 + \|y\|^2 + 2|(x \mid y)| \]
\[ \le \|x\|^2 + \|y\|^2 + 2\|x\| \, \|y\| = \big( \|x\| + \|y\| \big)^2. \]
Taking square roots yields the desired inequality.

11.1.5 Corollary. $\|x\| = \sup\{ |(x \mid y)| : \|y\| \le 1 \}$.

Proof. Let s denote the supremum. By the CBS inequality, $s \le \|x\|$, hence the assertion holds if $\|x\| = 0$. If $\|x\| \ne 0$, take $y = \|x\|^{-1} x$ in the definition of s to get $\|x\| = |(x \mid y)| \le s$.

Inner Product Spaces. Hilbert Spaces


A positive Hermitian sesquilinear form on X whose associated seminorm (11.1) is a norm
is called an inner product. A vector space equipped with an inner product is called an
inner product space. An inner product space that is complete with respect to the induced
norm is called a Hilbert space.
Note that in an inner product space, the CBS inequality implies that (x | y) is continuous
in (x, y) in the norm topology, as may be seen by writing

\[ |(x \mid y) - (a \mid b)| \le |(x - a \mid y)| + |(a \mid y - b)|. \]

A norm on a vector space X is an inner product norm if there is an inner product


that induces the norm via (11.1). Such a norm must satisfy the parallelogram law (11.2).
The converse holds:
11.1.6 Theorem. A norm k·k on a vector space X over K that satisfies the parallelogram
law is an inner product norm.
Proof. (For the case K = C): Define $(x \mid y)$ by
\[ (x \mid y) = \frac14 \sum_{k=1}^{4} i^k \|x + i^k y\|^2. \tag{†} \]

Then
\[ 4(x \mid y) = \|x + y\|^2 - \|x - y\|^2 + i\big( \|x + iy\|^2 - \|x - iy\|^2 \big) \]
\[ = \|y + x\|^2 - \|y - x\|^2 - i\big( \|y + ix\|^2 - \|y - ix\|^2 \big) \]
\[ = 4\,\overline{(y \mid x)}. \]

In particular, from the first equality we have

\[ (-x \mid y) = -(x \mid y), \quad (ix \mid y) = i\,(x \mid y), \quad\text{and} \]

\[ (x \mid x) = \tfrac14 \Big[ \|2x\|^2 + i\big( \|(1 + i)x\|^2 - \|(1 - i)x\|^2 \big) \Big] = \|x\|^2. \]

To prove the additive property of inner products, use (†) to write

\[ 4(x \pm y \mid z) = \|x \pm y + z\|^2 - \|x \pm y - z\|^2 + i\big( \|x \pm y + iz\|^2 - \|x \pm y - iz\|^2 \big) \]

and then add to obtain

\[ 4(x + y \mid z) + 4(x - y \mid z) = \big[ \|(x + z) + y\|^2 + \|(x + z) - y\|^2 \big] - \big[ \|(x - z) + y\|^2 + \|(x - z) - y\|^2 \big] \]
\[ \qquad + i\big[ \|(x + iz) + y\|^2 + \|(x + iz) - y\|^2 \big] - i\big[ \|(x - iz) + y\|^2 + \|(x - iz) - y\|^2 \big]. \]

Applying the parallelogram identity to each bracketed expression reduces the right side to
\[ 2\big[ \|x + z\|^2 - \|x - z\|^2 \big] + 2i\big[ \|x + iz\|^2 - \|x - iz\|^2 \big] = 8(x \mid z). \]

We now have
\[ (x + y \mid z) + (x - y \mid z) = 2(x \mid z). \tag{‡} \]

Taking x = y and noting that $(0 \mid z) = 0$, we have $(2y \mid z) = 2(y \mid z)$, hence $2(\tfrac12 y \mid z) = (y \mid z)$ for all y. Setting x + y = u and x − y = v in (‡) yields

\[ (u \mid z) + (v \mid z) = 2\big( \tfrac12(u + v) \mid z \big) = (u + v \mid z). \]

This, together with $(y \mid x) = \overline{(x \mid y)}$, proves the biadditivity of $(\cdot \mid \cdot)$.


By induction, $(ny \mid z) = n(y \mid z)$ for all $n \in \mathbb{N}$, and since $(-y \mid z) = -(y \mid z)$, the equality holds for all integers n. Replacing y by (1/n)y yields $((1/n)y \mid z) = (1/n)(y \mid z)$, hence

\[ ((m/n)y \mid z) = m\,((1/n)y \mid z) = (m/n)(y \mid z) \quad\text{for all } m, n \in \mathbb{Z},\ n \ne 0. \]

Noting that $(y \mid z)$ is continuous in y we see that $(ay \mid z) = a(y \mid z)$ for all real a. Combining this with $(iy \mid z) = i(y \mid z)$ we obtain $(iby \mid z) = ib(y \mid z)$ and so $((a + ib)y \mid z) = (a + ib)(y \mid z)$ for all $a, b \in \mathbb{R}$, completing the proof.
11.1.7 Corollary. Let X be a Banach space and Y a dense subspace of X with an inner
product that induces the norm on Y. Then X is a Hilbert space. In particular, the norm
completion of an inner product space is a Hilbert space.
Proof. The norm of X satisfies the parallelogram law on the dense subspace Y. Since the
norm and vector operations are continuous, the law holds on X.
11.1.8 Examples.
(a) The space $K^d$ is a Hilbert space with respect to the Euclidean inner product
\[ (x \mid y) = \sum_{j=1}^{d} x_j \overline{y_j}, \quad x := (x_1, \ldots, x_d),\ y := (y_1, \ldots, y_d). \]

The associated norm is the Euclidean norm on $K^d$. Note that the parallelogram law fails for the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $K^d$, hence these are not inner product norms.
(b) Let $A := [a_{ij}]_{d \times d}$ be a matrix with entries in K that satisfies

• $a_{ji} = \overline{a_{ij}}$ (A is Hermitian or self-adjoint), and

• $\sum_{i,j=1}^{d} a_{ij} x_i \overline{x_j} > 0$ for all $(x_1, \ldots, x_d) \ne 0$ (A is positive definite).

Then
\[ (x \mid y) := \sum_{i=1}^{d} \sum_{j=1}^{d} a_{ij} x_i \overline{y_j}, \quad x := (x_1, \ldots, x_d),\ y := (y_1, \ldots, y_d), \]

defines an inner product on $K^d$. One obtains the Euclidean inner product of (a) by taking A to be the identity matrix.
(c) The trace tr(A) of a square matrix A is the sum of the diagonal elements of A. Clearly, tr(·) is linear and $\mathrm{tr}(A^*) = \overline{\mathrm{tr}(A)}$, where $A^*$ is the conjugate transpose of A. Let $M_{mn} = M_{mn}(\mathbb{C})$ denote the vector space over C of m × n complex matrices. For $A, B \in M_{mn}$, $(A \mid B) = \mathrm{tr}(B^* A)$ defines an inner product on $M_{mn}$ called the trace inner product.

(d) The space $L^2(X, \mathcal{F}, \mu)$ with the $L^2$ norm is a Hilbert space under the inner product
\[ (f \mid g) = \int f \overline{g} \, d\mu, \quad f, g \in L^2. \]

(As usual, we identify functions equal a.e.) In particular, $\ell^2$ is a Hilbert space. On the other hand, for $p \ne 2$ the $L^p$ norm is not induced by an inner product, since the parallelogram law fails (Ex. 11.3).
(e) Let U be open in C and let $A^2(U)$ be the space of functions in $L^2(U)$ that are analytic on U. Then $A^2(U)$ is closed in the $L^2$-norm and hence is a Hilbert space. To see this, we first establish the formula
\[ f(z) = \frac{1}{\pi r^2} \int_{C_r(z)} f(w) \, d\lambda^2(w), \quad z \in U,\ C_r(z) \subseteq U,\ f \in A^2(U). \tag{†} \]

In fact, by radial integration (§3.6) we have

\[ \int_{C_r(z)} f(w) \, d\lambda^2(w) = \int_{C_r(0)} f(w + z) \, d\lambda^2(w) = \int_0^r t \int_0^{2\pi} f(te^{i\theta} + z) \, d\theta \, dt, \]
and if $f(w) = \sum_{n=0}^{\infty} c_n (w - z)^n$ we can evaluate the inner integral by integrating term by term:
\[ \int_0^{2\pi} f(te^{i\theta} + z) \, d\theta = \sum_{n=0}^{\infty} c_n t^n \int_0^{2\pi} e^{in\theta} \, d\theta = 2\pi c_0 = 2\pi f(z). \]

Thus
\[ \int_{C_r(z)} f(w) \, d\lambda^2(w) = 2\pi f(z) \int_0^r t \, dt = \pi r^2 f(z), \]

which is (†).
Now let K be a compact subset of U and let $r := \tfrac12 \operatorname{dist}(K, U^c)$. If $z \in K$, then $C_r(z) \subseteq U$, hence from (†) and the CBS inequality,
\[ |f(z)| \le \frac{1}{\pi r^2} \int_{C_r(z)} |f(w)| \, d\lambda^2(w) \le \frac{1}{\sqrt{\pi}\, r} \left( \int_{C_r(z)} |f(w)|^2 \, d\lambda^2(w) \right)^{1/2} \le \frac{1}{\sqrt{\pi}\, r} \|f\|_2. \]

By considering a finite cover of K by disks Cr (z), we see from the above inequality that if
fn ∈ A2 (U ) and kfm − fn k2 → 0, then (fn ) is uniformly Cauchy on compact subsets of U
and therefore converges uniformly (and in L2 ) to a continuous function f . Thus f is analytic
and so A2 (U ) is closed in L2 (U ). ♦
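Examples (b) and (c) lend themselves to a small numerical check. The sketch below is illustrative only: the 2 × 2 matrix `A`, the test vectors, and the function names are assumptions chosen for the example (the matrix is Hermitian with eigenvalues 1 and 3, hence positive definite). It evaluates the form of (b), confirms conjugate symmetry and the CBS inequality for one pair of vectors, and computes the trace inner product of (c) from its definition.

```python
# A hypothetical Hermitian positive definite matrix (eigenvalues 1 and 3).
A = [[2, 1j], [-1j, 2]]

def inner_A(x, y):
    """(x | y) := sum over i, j of a_ij * x_i * conj(y_j), as in Example (b)."""
    d = len(A)
    return sum(A[i][j] * x[i] * complex(y[j]).conjugate()
               for i in range(d) for j in range(d))

def trace_inner(P, Q):
    """(P | Q) = tr(Q* P): sum of the diagonal entries of Q* P."""
    rows, cols = len(P), len(P[0])
    # The (j, j) entry of Q* P is the sum over i of conj(Q[i][j]) * P[i][j].
    return sum(complex(Q[i][j]).conjugate() * P[i][j]
               for j in range(cols) for i in range(rows))

x = [1 + 2j, -1j]
y = [3, 1 - 1j]

xy = inner_A(x, y)
yx = inner_A(y, x)
xx = inner_A(x, x)
yy = inner_A(y, y)

P = [[1, 2j], [0, 1]]
Q = [[1j, 1], [2, -1]]
pq = trace_inner(P, Q)
```

Here `xy` agrees with the conjugate of `yx`, the values `xx` and `yy` are real and positive, and `abs(xy) ** 2` does not exceed `xx.real * yy.real`, in line with 11.1.3.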

Isomorphisms of Hilbert Spaces


Hereafter, we use the notation H and K for Hilbert spaces. If there is a possibility of
ambiguity, we use a subscript for the inner product, as in (· | ·)H . An isomorphism of
H and K (both over K) is a bijective linear mapping T : H → K that preserves inner
products:
(T x | T y) = (x | y) , x, y ∈ H. (11.4)
If an isomorphism exists, then H and K are said to be isomorphic (as Hilbert spaces).
Thus isomorphic Hilbert spaces are “structurally” identical.
Taking x = y in (11.4), we see that an isomorphism of Hilbert spaces is an isometry.
The converse is true: every isometry of H onto K is a Hilbert space isomorphism. This
follows directly from the polarization identity.

Exercises
11.1 Verify the parallelogram law.

11.2 Show that the uniform norm on C[0, 1] is not an inner product norm.

11.3 Show that for 1 ≤ p < ∞, the Lp norm on C[0, 1] is not an inner product norm unless p = 2.

11.4 (Apollonius' identity). Prove that in an inner product space,
\[ \|x - z\|^2 + \|y - z\|^2 = \tfrac12 \|x - y\|^2 + 2\, \big\| \tfrac12(x + y) - z \big\|^2. \]

11.5 Let $(X, (\cdot \mid \cdot))$ be a semi-inner-product space with associated seminorm $\|\cdot\|$. By Ex. 8.56, $Y := \{x : \|x\| = 0\}$ is a subspace of X. Let Q : X → X/Y denote the quotient map. Show that $\langle Qx \mid Qy \rangle := (x \mid y)$ is a well-defined inner product on X/Y.

11.6 Let H denote the linear space of absolutely continuous functions f on [0, 1] such that f(0) = 0 and $f' \in L^2[0, 1]$. Show that $(f \mid g) = \int_0^1 f'(t)\, \overline{g'(t)} \, dt$ defines an inner product on H relative to which H is a Hilbert space.

11.7 Prove directly (without using uniform convexity) that an inner product space is strictly convex.

11.8 Let U : H → K be a bijection that preserves inner products: $(Ux \mid Uy) = (x \mid y)$ for all x, y ∈ H. Show that U must be linear.

11.2 Orthogonality

Throughout this section, H and K denote Hilbert spaces over K.

The central feature of a Hilbert space that accounts for its rich structure is the concept of
orthogonality. This leads to the notions of orthogonal complement and orthonormal bases,
considered in this section and the next.

Orthogonal Complements
Vectors x and y in H are said to be orthogonal, written x ⊥ y, if (x | y) = 0. The
following result on orthogonality generalizes the classical Pythagorean theorem.
11.2.1 Proposition. Let x, y ∈ H.
(a) If K = R, then x ⊥ y iff $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
(b) If K = C, then x ⊥ y iff $\|\alpha x + \beta y\|^2 = \|x\|^2 + \|y\|^2$ for all $\alpha, \beta \in \mathbb{T}$.

Proof. Part (a) follows from the expansion

\[ \|x + y\|^2 = (x + y \mid x + y) = \|x\|^2 + \|y\|^2 + 2(x \mid y). \]

For part (b), a similar calculation yields

\[ \|\alpha x + \beta y\|^2 = (\alpha x + \beta y \mid \alpha x + \beta y) = \|x\|^2 + \|y\|^2 + \alpha\overline{\beta}\,(x \mid y) + \overline{\alpha}\beta\, \overline{(x \mid y)} \]

for all $\alpha, \beta \in \mathbb{T}$. Thus the norm identity in (b) is equivalent to

\[ \alpha\overline{\beta}\,(x \mid y) + \overline{\alpha}\beta\, \overline{(x \mid y)} = 0 \quad\text{for all } \alpha, \beta \in \mathbb{T}. \tag{†} \]

This is trivially satisfied if x ⊥ y. Conversely, if (†) holds take $\alpha = \beta = 1$ to obtain $(x \mid y) + \overline{(x \mid y)} = 0$, and take $\alpha = i$, $\beta = 1$ to obtain $i(x \mid y) - i\,\overline{(x \mid y)} = 0$. Conclude that $(x \mid y) = 0$.
By induction, we have
11.2.2 Corollary. If $x_j \in H$ and $x_j \perp x_k$ for $j \ne k$, then $\big\| \sum_{j=1}^{n} x_j \big\|^2 = \sum_{j=1}^{n} \|x_j\|^2$.
If x ⊥ y for all y in a subset S of H, then x is said to be orthogonal to S, written
x ⊥ S. The collection of all vectors orthogonal to S is called the orthogonal complement
of S and is denoted by S ⊥ :

S ⊥ := {x ∈ H : (x | y) = 0 ∀ y ∈ S}.

A direct argument shows that S ⊥ is a closed linear subspace of H.


The theorem below regarding orthogonal complements is of fundamental importance in
Hilbert space theory. For the proof we need the following lemma.
11.2.3 Lemma. A nonempty, closed, convex subset K of H has a unique member of smallest norm.
Proof. Let $d := \inf\{ \|x\| : x \in K \}$. We claim that for x, y ∈ K,
\[ \|x - y\|^2 \le 2\|x\|^2 + 2\|y\|^2 - 4d^2. \tag{†} \]

Indeed, since K is convex, $\tfrac12(x + y) \in K$, hence

\[ 4d^2 \le (x + y \mid x + y) = \|x\|^2 + \|y\|^2 + 2\,\mathrm{Re}\,(x \mid y), \]

and so
\[ \|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,\mathrm{Re}\,(x \mid y) \le 2\|x\|^2 + 2\|y\|^2 - 4d^2. \]
Now let $x_n \in K$ and $\|x_n\| \to d$. Then $2\|x_n\|^2 + 2\|x_m\|^2 \to 4d^2$, hence from (†), $\|x_n - x_m\|^2 \to 0$. Since H is complete and K is closed, the limit $x := \lim_n x_n$ exists and is a member of K with smallest norm.
If y ∈ K also has smallest norm, then by (†), $\|x - y\|^2 \le 2d^2 + 2d^2 - 4d^2 = 0$, hence x = y.
11.2.4 Theorem. If M is a closed subspace of H, then $H = M \oplus M^\perp$. Moreover, if $x = m + m^\perp$, then m is the unique member of M closest to x.
Proof. For a fixed x ∈ H, there exists, by the lemma, a unique member y of x + M such that
\[ \|y\| \le \|x + m\| \quad\text{for all } m \in M. \]
We show that $y \in M^\perp$. Let m ∈ M. Since y + tm = x + (y − x + tm) ∈ x + M, the function
\[ f(t) := \|y + tm\|^2 = \|y\|^2 + 2t\,\mathrm{Re}\,(y \mid m) + t^2 \|m\|^2 \]
has minimum value $\|y\|^2 = f(0)$. It follows that $f'(0) = 0$ and so $\mathrm{Re}\,(y \mid m) = 0$. Replacing m by im yields $\mathrm{Im}\,(y \mid m) = 0$. Therefore, $(y \mid m) = 0$, hence $y \in M^\perp$.
We may now write x = (x − y) + y, which shows that $H = M + M^\perp$. The sum is direct since $z \in M \cap M^\perp \Rightarrow (z \mid z) = 0$. Since

\[ \|(x - y) - x\| = \|y\| \le \|m - x\| \quad\text{for all } m \in M, \]

x − y is the unique member of M closest to x.
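The decomposition can be made concrete in $\mathbb{R}^3$. The sketch below is an illustration under stated assumptions: M is taken to be the span of two orthogonal vectors `u1` and `u2` chosen for the example, and the component of x in M is computed with the standard formula $\sum_j (x \mid u_j)/(u_j \mid u_j)\, u_j$ for an orthogonal spanning set, which is not the construction used in the proof. Exact rational arithmetic is used so that orthogonality can be tested without rounding.

```python
from fractions import Fraction

def inner(x, y):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))

def project(x, orthogonal_spanning_set):
    """Component of x in the span of mutually orthogonal vectors."""
    m = [Fraction(0)] * len(x)
    for u in orthogonal_spanning_set:
        c = Fraction(inner(x, u), inner(u, u))
        m = [mi + c * ui for mi, ui in zip(m, u)]
    return m

def dist_sq(x, y):
    """Squared Euclidean distance between x and y."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

u1, u2 = [1, 1, 0], [1, -1, 1]      # hypothetical orthogonal vectors spanning M
x = [1, 2, 3]

m = project(x, [u1, u2])                       # the part of x in M
m_perp = [xi - mi for xi, mi in zip(x, m)]     # the part of x in M-perp

# Compare with a few other members s*u1 + t*u2 of M.
others = [[s * a + t * b for a, b in zip(u1, u2)]
          for s in range(-2, 3) for t in range(-2, 3)]
closest = all(dist_sq(x, m) <= dist_sq(x, v) for v in others)
```

The residual `m_perp` is orthogonal to both spanning vectors, and none of the sampled members of M is closer to x than m.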



11.2.5 Corollary. If M is a closed subspace of H, then M ⊥⊥ = M.


Proof. Let x ∈ M ⊥⊥ and write x = y + z, y ∈ M, z ∈ M ⊥ . Then

0 = (x | z) = (y | z) + (z | z) = (z | z) ,

hence z = 0 and so x = y ∈ M. Therefore, M ⊥⊥ ⊆ M. The reverse inclusion is clear.

The Riesz Representation Theorem


For y ∈ H define a mapping $f_y$ on H by

\[ f_y(x) := (x \mid y). \]

Then $f_y$ is a linear functional with $\|f_y\| = \|y\|$ (11.1.5). Furthermore, $f_{ay + bz} = \overline{a} f_y + \overline{b} f_z$. Thus the map $y \to f_y$ is a conjugate linear isometry from H into $H'$. The next theorem asserts that the mapping is surjective.
11.2.6 Riesz Representation Theorem. Every $f \in H'$ is of the form $f_y$ for some y ∈ H.
Proof. We may assume that f is not the zero functional. Then $H = \ker f \oplus (\ker f)^\perp$, where $(\ker f)^\perp$ has dimension one. Choose $z \in (\ker f)^\perp$ with f(z) = 1 and set $a = 1/\|z\|^2$. For x ∈ H we may write x = u + cz, where u ∈ ker f, hence

\[ f(x) = c f(z) = c = (cz \mid az) = (u + cz \mid az) = f_{az}(x). \]

Therefore, $f = f_{az}$.
Recall that a net $(x_\alpha)$ in a normed space X converges weakly to x ∈ X if $\langle x_\alpha, f \rangle \to \langle x, f \rangle$ for all $f \in X'$. Thus from the Riesz representation theorem we have
11.2.7 Corollary. A net (xα ) in H converges weakly to x iff (xα | y) → (x | y) ∀ y ∈ H.
11.2.8 Corollary. A Hilbert space is reflexive.
Proof. The dual space $H'$ is a Hilbert space under the inner product

\[ (f_x \mid f_y) = (y \mid x). \]

(The transposition of the elements on the right side is necessary to compensate for the conjugate linearity of the mapping $x \to f_x$.) Let $\varphi \in H''$. By the Riesz representation theorem applied to $H'$, there exists $f_y$ such that for all x,

\[ \varphi(f_x) = (f_x \mid f_y) = (y \mid x) = f_x(y) = \widehat{y}(f_x). \]

Therefore, $\varphi = \widehat{y}$ and so $H'' = \widehat{H}$.

Exercises
11.9 Let S be a subset of a Hilbert space H. Use 11.2.4 to prove that S ⊥⊥ = cl span(S).

11.10 Let X and Y be subspaces of H. Show that (X + Y)⊥ = X ⊥ ∩ Y ⊥ .

11.11 Prove that x ⊥ y iff kxk ≤ kx + cyk for all c ∈ C.


11.12 Show that in a Hilbert space, if $x_n \xrightarrow{w} x$ and $\|x_n\| \to \|x\|$ then $\|x_n - x\| \to 0$. Find an example of a Banach space for which the assertion is false.

11.13 A function $f \in L^2[-1, 1]$ is odd (even) if f(−t) = −f(t) (f(−t) = f(t)) for a.a. t ∈ [−1, 1]. Let O (E) denote the linear space of odd (even) functions. Show that each space is the orthogonal complement of the other and that $L^2[-1, 1] = O \oplus E$.

11.14 For each linear functional F on $\ell^2(\mathbb{N})$, find a function g such that $F(f) = (f \mid g)$ for all f.
(a) $F(f) = \sum_{j=1}^{m} f(j)$. (b) $F(f) = f(2) - f(1)$. (c) $F(f) = \sum_{n=1}^{\infty} 2^{-n} [f(n) - f(n + 1)]$.

11.15 Find an example of an inner product space X and a continuous linear functional $f \in X'$ such that no vector y ∈ X exists for which $f(x) = (x \mid y)$ for all x ∈ X.

11.16 Let H be the Hilbert space defined in Ex. 11.6. Let F be the evaluation functional F(f) = f(1/2). Find a function g such that $F(f) = (f \mid g)$ for all f ∈ H.

11.17 Let T ∈ B(H) be weak-norm continuous. Show that ran T is finite dimensional. ⟦There exist $x_j \in H$ and $\varepsilon > 0$ such that $|(x \mid x_j)| < \varepsilon$ $(1 \le j \le n)$ implies $\|Tx\| < 1$.⟧

11.18 [↑ 8.34] Let $T_r$ and $T_\ell$ denote the right and left shift operators on $\ell^2$ and let $x \in \ell^2$. Compute the weak limits $\lim_n T_\ell^n x$ and $\lim_n T_r^n x$.

11.3 Orthonormal Bases

Throughout this section, H and K denote Hilbert spaces over K.


A subset E of H is said to be orthonormal if for all e, f ∈ E
\[ (e \mid f) = \begin{cases} 1 & \text{if } e = f, \\ 0 & \text{otherwise.} \end{cases} \]

The scalars $(x \mid e)$ are called the Fourier coefficients of x with respect to E. An orthonormal basis (or, simply, basis) is an orthonormal set whose span is dense in H. We discuss the role of orthonormal bases in Fourier series on R below and in a more general setting in Chapter 16.
11.3.1 Proposition. An orthonormal set E is a basis iff E ⊥ = {0}.
Proof. Let M := cl span(E). Since H = M ⊕ M ⊥ and M ⊥ = E ⊥ , the assertion follows.
11.3.2 Proposition. Every orthonormal subset F of H is contained in an orthonormal
basis. In particular, every (nontrivial) Hilbert space has an orthonormal basis.
Proof. Order the family $\mathfrak{E}$ of all orthonormal subsets of H containing F upward by inclusion. The union of a chain in $\mathfrak{E}$ is clearly orthonormal, hence is an upper bound for the chain. By Zorn's lemma, $\mathfrak{E}$ has a maximal element E. If $x \in E^\perp$ and $x \ne 0$, then $E \cup \{\|x\|^{-1} x\}$ is an orthonormal set properly containing E, contradicting maximality. Therefore $E^\perp = \{0\}$, so E is an orthonormal basis by 11.3.1, proving the first assertion of the proposition. The second assertion follows from the first by taking F to consist of a single vector of norm one.
The next results depend on the material in §0.4 on unordered sums.
11.3.3 Theorem (Bessel's Inequality). Let E be an orthonormal set and x ∈ H. Then
\[ \sum_{e \in E} |(x \mid e)|^2 \le \|x\|^2, \]

where at most countably many of the terms in the sum are nonzero.
Proof. Let F ⊆ E be finite and set $y = \sum_{e \in F} (x \mid e)\, e$. By orthonormality and sesquilinearity,
\[ (y \mid y) = \sum_{e \in F} (x \mid e)\, \overline{(x \mid e)} = (x \mid y), \]

hence $(x - y \mid y) = 0$. Thus
\[ \|x\|^2 = \|x - y + y\|^2 = \|x - y\|^2 + \|y\|^2 \ge \|y\|^2 = \sum_{e \in F} |(x \mid e)|^2. \]

Since F was arbitrary, the assertions follow from 0.4.5.


11.3.4 Theorem. Let E be an orthonormal set in H. The following are equivalent:
(a) E is a basis.
(b) For each x ∈ H, $x = \sum_{e \in E} (x \mid e)\, e$ (Fourier expansion of x).
(c) For each pair x, y ∈ H, $(x \mid y) = \sum_{e \in E} (x \mid e)(e \mid y)$.
(d) For each x ∈ H, $\|x\|^2 = \sum_{e \in E} |(x \mid e)|^2$ (Parseval's identity).

In (b) – (d), at most countably many of the Fourier coefficients $(x \mid e)$ are nonzero.
Proof. (a) ⇒ (b): Denote the nonzero Fourier coefficients of x by $(x \mid e_n)$. We show that
\[ x = \sum_{n=1}^{\infty} (x \mid e_n)\, e_n. \]

By the Pythagorean relation,

\[ \Big\| \sum_{k=n}^{m} (x \mid e_k)\, e_k \Big\|^2 = \sum_{k=n}^{m} |(x \mid e_k)|^2, \quad m > n, \]

which, by Bessel's inequality, tends to 0 as n → ∞. Therefore, the sequence of partial sums $\sum_{k=1}^{n} (x \mid e_k)\, e_k$ is Cauchy and so converges to some y. It remains to show that y = x. Now, for any e ∈ E, by continuity of the inner product we have
\[ (y \mid e) = \lim_n \sum_{k=1}^{n} (x \mid e_k)(e_k \mid e). \]

If $e = e_m$ for some m, then the right side is $(x \mid e_m)$. If $e \ne e_m$ for all m, then both $(x \mid e)$ and $(y \mid e)$ are zero. Thus $(x - y \mid e) = 0$ for all e ∈ E. Since E is a basis, x = y.
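(b) ⇒ (c): Using a common sequence $(e_n)$ for x and y we have
\[ x = \sum_{n=1}^{\infty} (x \mid e_n)\, e_n \quad\text{and}\quad y = \sum_{n=1}^{\infty} (y \mid e_n)\, e_n. \]

Now observe that

\[ \Big( \sum_{j=1}^{n} (x \mid e_j)\, e_j \;\Big|\; \sum_{k=1}^{n} (y \mid e_k)\, e_k \Big) = \sum_{j,k} (x \mid e_j)(e_k \mid y)(e_j \mid e_k) = \sum_{j=1}^{n} (x \mid e_j)(e_j \mid y). \]

Letting n → ∞ and using the continuity of the inner product yields (c).
(c) ⇒ (d): Take y = x in (c).
(d) ⇒ (a): Then $\|x\|^2 = 0$ for every $x \in E^\perp$, hence E is a basis by 11.3.1.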
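In finite dimensions the equivalences reduce to exact arithmetic. The sketch below is an illustration, assuming a particular orthonormal basis of $\mathbb{R}^3$ with rational entries (the basis, the vector x, and the function names are chosen for the example and do not come from the text). It computes the Fourier coefficients, rebuilds x from its Fourier expansion, and compares both sides of Parseval's identity and of Bessel's inequality for a two-element orthonormal subset.

```python
from fractions import Fraction

def inner(x, y):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))

third = Fraction(1, 3)
# A hypothetical orthonormal basis of R^3 with rational entries.
basis = [
    [1 * third, 2 * third, 2 * third],
    [2 * third, 1 * third, -2 * third],
    [2 * third, -2 * third, 1 * third],
]

x = [3, -1, 4]

# Fourier coefficients (x | e) with respect to the basis.
coeffs = [inner(x, e) for e in basis]

# Fourier expansion: sum of (x | e) e over the basis.
expansion = [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(3)]

parseval_sum = sum(c * c for c in coeffs)     # sum of |(x | e)|^2 over the basis
norm_sq = inner(x, x)                         # ||x||^2
bessel_sum = sum(c * c for c in coeffs[:2])   # same sum over a proper subset
```

With the full basis the two sides of (d) coincide; dropping a basis vector leaves only the inequality of 11.3.3.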

The Dimension of a Hilbert Space


The notion of dimension rests on the following result.
11.3.5 Proposition. All bases in a Hilbert space have the same cardinality.
Proof. We may assume that H is not finite dimensional. Let E and F be bases with cardinality |E| and |F|, respectively. It suffices to show that |E| ≤ |F|. For each f ∈ F, define $E_f = \{ e \in E : (e \mid f) \ne 0 \}$. Then $E = \bigcup_{f \in F} E_f$, and since each $E_f$ is countable, the cardinality of the union is $\le |F| \cdot \aleph_0 = |F|$. Therefore, |E| ≤ |F|.
The cardinality of a basis in a Hilbert space H is called the dimension of H. The
following corollary shows that Hilbert spaces with the same dimension are “structurally
identical.”
11.3.6 Corollary. H and K have the same dimension iff they are isomorphic as Hilbert
spaces.
Proof. Let E and F be bases for H and K, respectively. If a Hilbert space isomorphism
T : H → K exists, then T (E) is an orthonormal set in K, hence |E| = |T (E)| ≤ |F|. But
T −1 is also a Hilbert space isomorphism, hence |F| ≤ |E|. Therefore, |E| = |F|.
Conversely, let E and F have the same cardinality and let Ψ : E → F be any bijection. Define T : H → K by
\[ Tx = \sum_{e \in E} (x \mid e)\, \Psi(e), \quad x \in H. \]

By Bessel's inequality, Tx is well-defined, and at most countably many terms are nonzero. By sesquilinearity and continuity of the inner product,
\[ (Tx \mid Ty) = \Big( \sum_{e \in E} (x \mid e)\, \Psi(e) \;\Big|\; \sum_{\tilde{e} \in E} (y \mid \tilde{e})\, \Psi(\tilde{e}) \Big) = \sum_{e \in E} (x \mid e)(e \mid y) = (x \mid y). \]

The Gram-Schmidt Process


The members of an orthonormal set are easily seen to be linearly independent. Indeed,
if the vectors e1 , . . . , en are orthonormal and c1 e1 + · · · + cn en = 0, then taking inner
products of both sides of the equation with ej shows that cj = 0. The following proposition
is a converse of sorts: it allows a finite set of linearly independent vectors to be replaced by
an orthonormal set without changing the span. The technique in the proof is known as the
Gram-Schmidt process.
11.3.7 Proposition. Let H be an inner product space and $A := \{x_1, x_2, \ldots\}$ a linearly independent set. Then there exists an orthonormal set $E := \{e_1, e_2, \ldots\}$ such that

\[ \operatorname{span}\{x_1, \ldots, x_n\} = \operatorname{span}\{e_1, \ldots, e_n\} \quad\text{for all } n. \]

Proof. Set $A_n := \{x_1, \ldots, x_n\}$. We construct E inductively. Define $e_1 = x_1/\|x_1\|$. Assume that the desired set $E_n := \{e_1, \ldots, e_n\}$ has been constructed. Then $x_{n+1} \notin \operatorname{span} E_n$, so the vector
\[ y_{n+1} := x_{n+1} - \sum_{j=1}^{n} (x_{n+1} \mid e_j)\, e_j \]

is not zero. Define $e_{n+1} = y_{n+1}/\|y_{n+1}\|$. Then $e_{n+1} \in \operatorname{span} A_{n+1}$, $(e_{n+1} \mid e_k) = 0$ for k ≤ n, and $\operatorname{span} A_{n+1} = \operatorname{span}\{e_1, \ldots, e_{n+1}\}$.

For example, applying the Gram-Schmidt process to the set of monomials $x_n = t^n$ (n ≥ 0) on [−1, 1] yields the Legendre polynomials
\[ e_n(t) := \left( \frac{2n + 1}{2} \right)^{1/2} \frac{1}{2^n n!} \frac{d^n}{dt^n} (t^2 - 1)^n, \quad n = 0, 1, \ldots. \]

For this and other interesting examples of orthonormal bases on $L^2[a, b]$, the reader is referred to [28].
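The Gram-Schmidt recursion for the monomials can be carried out exactly on a computer. The sketch below is an illustration with some assumptions: polynomials are stored as coefficient lists (a representation chosen for the example), the inner product is the $L^2[-1, 1]$ inner product for real polynomials, and the normalization step is omitted so that all arithmetic stays rational. Omitting it changes each output only by a positive scalar, since $(x \mid e_j)\, e_j = \frac{(x \mid y_j)}{(y_j \mid y_j)}\, y_j$ when $e_j = y_j/\|y_j\|$.

```python
from fractions import Fraction

def inner(p, q):
    """L^2[-1, 1] inner product of real polynomials given as coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers of t integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def axpy(c, p, q):
    """Return the coefficient list of c*p + q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [c * a + b for a, b in zip(p, q)]

def gram_schmidt(vectors):
    """Orthogonalize the given polynomials (without normalizing)."""
    ys = []
    for x in vectors:
        y = [Fraction(c) for c in x]
        for prev in ys:
            # Subtract the component of x along the earlier vector prev.
            y = axpy(-inner(x, prev) / inner(prev, prev), prev, y)
        ys.append(y)
    return ys

monomials = [[0] * n + [1] for n in range(4)]   # 1, t, t^2, t^3
ortho = gram_schmidt(monomials)
```

The third and fourth outputs are $t^2 - \tfrac13$ and $t^3 - \tfrac35 t$, which are positive multiples of $e_2$ and $e_3$ in the formula above.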
Most infinite dimensional Hilbert spaces one encounters in applications are separable.
Analysis of such spaces is somewhat easier because of the following result:
11.3.8 Proposition. If a Hilbert space H is separable, then it has a countable basis.
Proof. We may assume that H is not finite dimensional. Let $(x_n)$ be a dense sequence of nonzero vectors in H. If $x_2$ is a multiple of $x_1$, we may remove it without changing the span of $(x_n)$. Likewise, if $x_n$ is a linear combination of its predecessors, then it may be removed without affecting the span. By induction, we obtain a linearly independent subsequence $(y_n)$ of $(x_n)$ with span $(y_n)$ = span $(x_n)$. The Gram-Schmidt process may be applied to $(y_n)$ to obtain an orthonormal sequence $(e_n)$ such that span $(e_n)$ = span $(y_n)$ = span $(x_n)$. If $x \perp e_n$ for all n, then $x \perp x_n$ for all n, and since $(x_n)$ is dense in H, x = 0. Therefore, $(e_n)$ is a basis.
For example, the vectors $e_n = (0, \ldots, 0, 1, 0, \ldots)$ (n ∈ N) form an orthonormal basis in $\ell^2$. It follows from 11.3.8 that every separable infinite dimensional Hilbert space is isomorphic to $\ell^2$. This fact, however, does not necessarily lead to simplifications in the study of separable Hilbert spaces, as the isomorphism may obscure certain essential properties of concrete Hilbert spaces such as $L^2[0, 1]$. Nonetheless, it is of some interest to know that, structurally, all separable Hilbert spaces are "like" $\ell^2$.

Fourier Series
We show that the functions

\[ e_n(t) = e^{2\pi i n t}, \quad t \in [0, 1],\ n \in \mathbb{Z}, \tag{11.5} \]

form an orthonormal basis for $L^2[0, 1]$ with respect to Lebesgue measure. The calculation
\[ \int_0^1 e^{2\pi i n t}\, \overline{e^{2\pi i m t}} \, dt = \int_0^1 e^{2\pi i (n - m) t} \, dt \]

shows that $(e_n)_n$ is an orthonormal set. Let A denote the algebra of continuous functions f : [0, 1] → C with f(0) = f(1). Since C[0, 1] is dense in $L^2[0, 1]$, a simple linearization argument shows that the same is true for A. For each f ∈ A define $F_f : \mathbb{T} \to \mathbb{C}$ by

\[ F_f(e^{2\pi i t}) = f(t), \quad t \in [0, 1]. \]

By the periodicity of f, the function $F_f$ is well-defined and continuous. Moreover,

\[ F_\tau(z) = \sum_{k=-n}^{n} c_k z^k \quad (z \in \mathbb{T}), \quad\text{where } \tau(t) := \sum_{k=-n}^{n} c_k e^{2\pi i k t}. \]

Let $\mathcal{T}$ denote the collection of all such functions τ. By the Stone-Weierstrass theorem, $\{F_\tau : \tau \in \mathcal{T}\}$ is uniformly dense in $C(\mathbb{T})$. It follows that $\mathcal{T}$ is dense in A in the uniform norm and is therefore dense in $L^2[0, 1]$. Thus $(e_n)_{n \in \mathbb{Z}}$ is a basis, as claimed.

From 11.3.4 we see that every $f \in L^2[0, 1]$ has a Fourier series expansion

\[ f = \sum_{n=-\infty}^{\infty} \widehat{f}(n)\, e_n, \quad \widehat{f}(n) := (f \mid e_n) = \int_0^1 f(t)\, e^{-2\pi i n t} \, dt, \tag{11.6} \]

where convergence is in $L^2[0, 1]$. The function $\widehat{f}$ is called the Fourier transform of f. The convergence of the series in (11.6) implies that $\lim_n \widehat{f}(n) = 0$, which is the classical Riemann-Lebesgue lemma.
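For a trigonometric polynomial the coefficients in (11.6) and Parseval's identity can be reproduced numerically. The sketch below is an illustration under assumptions: the integral over [0, 1] is replaced by an equispaced Riemann sum with `N` points (which is exact, up to rounding, for the low frequencies used here), and the test function `f` with coefficients 0.5, 2, and −3i is chosen for the example.

```python
import cmath

N = 64  # number of equispaced sample points in [0, 1); an assumed resolution

def e(n, t):
    """The basis function e_n(t) = exp(2*pi*i*n*t)."""
    return cmath.exp(2j * cmath.pi * n * t)

def inner(f, g):
    """Riemann-sum approximation of the L^2[0, 1] inner product (f | g)."""
    return sum(f(k / N) * g(k / N).conjugate() for k in range(N)) / N

def f(t):
    """A sample trigonometric polynomial: 0.5*e_0 + 2*e_1 - 3i*e_{-2}."""
    return 0.5 * e(0, t) + 2 * e(1, t) - 3j * e(-2, t)

def fhat(n):
    """Approximate Fourier coefficient (f | e_n)."""
    return inner(f, lambda t: e(n, t))

coeffs = {n: fhat(n) for n in range(-3, 4)}
parseval_lhs = inner(f, f).real                            # ||f||_2^2
parseval_rhs = sum(abs(c) ** 2 for c in coeffs.values())   # sum of |fhat(n)|^2
```

Both sides of Parseval's identity come out as $0.25 + 4 + 9 = 13.25$, and the computed coefficients vanish at the frequencies not present in `f`.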
The following is an interesting application to the Fourier transform of a rapidly decreasing
function on R (see §6.3).
11.3.9 Theorem (Poisson Summation Formula). Let ϕ be a rapidly decreasing function on
R with Fourier transform ϕ.
b Then

X ∞
X
ϕ(n) = b
ϕ(n).
n=−∞ n=−∞

Proof. Define

$$f(t) := \sum_{n=-\infty}^{\infty} \varphi(t + n), \quad t \in \mathbb{R}. \tag{$\dagger$}$$

The rapidly decreasing property of $\varphi$ implies that the series, as well as all derived series,
converge absolutely and locally uniformly. Thus $f$ is a $C^\infty$ function. Moreover, $\sum_{n=-\infty}^{\infty} \widehat{\varphi}(n)$
converges because $\widehat{\varphi}$ is also rapidly decreasing. Since $f(t + 1) = f(t)$ for all $t$, we may consider
$f \in L^2[0, 1]$. Multiplying (†) by $e^{-2\pi i m t}$ and integrating term by term, we have
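$$\begin{aligned}
\widehat{f}(m) &= \sum_{n=-\infty}^{\infty} \int_0^1 \varphi(t + n)\,e^{-2\pi i m t}\,dt = \sum_{n=-\infty}^{\infty} \int_n^{n+1} \varphi(t)\,e^{-2\pi i m (t - n)}\,dt \\
&= \sum_{n=-\infty}^{\infty} \int_n^{n+1} \varphi(t)\,e^{-2\pi i m t}\,dt = \int_{-\infty}^{\infty} \varphi(t)\,e^{-2\pi i m t}\,dt \\
&= \widehat{\varphi}(m).
\end{aligned}$$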

Thus for a.a. $t$,

$$\sum_{n=-\infty}^{\infty} \varphi(t + n) = f(t) = \sum_{n=-\infty}^{\infty} \widehat{f}(n)\,e_n(t) = \sum_{n=-\infty}^{\infty} \widehat{\varphi}(n)\,e^{2\pi i n t}.$$

Since both series are continuous in t, the equation holds for all t. Setting t = 0 yields the
desired equality.
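A quick numerical check of the formula is possible for a Gaussian, whose transform is known in closed form. This is a sketch under assumptions not made in the text: the function φ(x) = exp(−πax²) with a = 2, the truncation of both sums at |n| ≤ 40, and the names `phi` and `phi_hat` are illustrative choices. With the convention φ̂(ξ) = ∫ φ(t) e^{−2πiξt} dt used in the proof, the transform is a^{−1/2} exp(−πξ²/a).

```python
import math

def phi(x, a=2.0):
    """Gaussian test function (an assumed example of a rapidly decreasing function)."""
    return math.exp(-math.pi * a * x * x)

def phi_hat(xi, a=2.0):
    """Closed-form Fourier transform of phi under the convention
    phi_hat(xi) = integral of phi(t) exp(-2 pi i xi t) dt."""
    return math.exp(-math.pi * xi * xi / a) / math.sqrt(a)

N = 40  # truncation level; the omitted terms are far below machine precision
lhs = sum(phi(n) for n in range(-N, N + 1))
rhs = sum(phi_hat(n) for n in range(-N, N + 1))
print(lhs, rhs, abs(lhs - rhs))
```

Both sums converge extremely fast here, so the two sides should agree to within rounding error.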

Exercises
11.19 Let $(e_n)_n$ be an orthonormal basis and fix $y \in H$. Show that the minimum value of the function
$x \mapsto \|x - y\|$ for $x \in \operatorname{span}\{e_1, \dots, e_m\}$ occurs when $x = \sum_{j=1}^{m} (y \mid e_j)\,e_j$.

11.20 Show that the sequence $1, z, z^2, \dots$ is orthogonal in $L^2(\mathbb{D}, \lambda^2)$. Is the normalized sequence
$z^n \|z^n\|_2^{-1}$ a basis?

11.21 A Hamel basis for a vector space is a linearly independent set that spans the space. Let H be
an infinite dimensional Hilbert space. Show that an orthonormal basis cannot be a Hamel basis.
Show that a Hamel basis in H is uncountable.
11.22 Show that in a Hilbert space, $x_n \xrightarrow{w} 0$ iff $\sup_n \|x_n\| < \infty$ and $(x_n \mid e) \to 0$ for every $e$ in an
orthonormal basis.

11.23 (Wirtinger’s inequality). Let $f \in C^1[0, a]$ with $f(0) = f(a) = 0$. Show that $\pi \|f\|_2 \le a \|f'\|_2$.
⟦Extend $f$ to $[-a, a]$ as an odd function. Use Parseval’s identity on $f \in L^2[-a, a]$ with the basis
$\frac{1}{\sqrt{2a}}\,e^{ibnt}$ ($b := \pi/a$) and integrate $\widehat{f}(n)$ by parts.⟧

11.24 Show that the Fourier transform is a linear isometry from $L^2[0, 1]$ onto $\ell^2(\mathbb{Z})$.

11.25 Let (X, F, µ) be σ-finite and φ ∈ L∞ (µ). Show that the range of the multiplication mapping
Mφ f := f φ on L2 (µ) is closed iff φ = 1E for some E ∈ F.

11.26 Let $(X, \mathcal{F}, \mu)$ be σ-finite and $\phi \in L^\infty(\mu)$. Show that $\phi^{-1} \in L^\infty$ iff $\sup_n \|f_n\|_2 < \infty$ for any
sequence $(f_n)$ in $L^2$ for which $(\phi f_n)$ converges in $L^2$. ⟦For the sufficiency, suppose $\phi^{-1} \notin L^\infty$.
Choose $A_n \in \mathcal{F}$ such that $A_n \subseteq \{|\phi| < 1/n^2\}$ and $0 < \mu(A_n) < \infty$ (how?) and set
$f_n = \sum_{k=1}^{n} \mu(A_k)^{-1/2}\,1_{A_k}$.⟧

11.4 The Hilbert Space Adjoint

Throughout this section, H and K denote complex Hilbert spaces.

The Hilbert space adjoint of an operator $T \in B(H)$ is closely related to the Banach space
dual operator $T'$, the essential difference being that the former acts on $H$ while the latter
acts on $H'$. The existence of an adjoint operation in $B(H)$ accounts to a large extent for
the rich structure of $B(H)$ and its various subalgebras, a structure that is absent in the Banach
space case. For the construction of the adjoint we need the following notion.

Bounded Sesquilinear Functionals


A sesquilinear functional $B$ on $H \times K$ is said to be bounded if

$$\|B\| := \sup\{|B(x, y)| : \|x\| \le 1,\ \|y\| \le 1\} < \infty. \tag{11.7}$$

For example, if $T \in B(K, H)$, then

$$B_T(x, y) := (x \mid Ty)_H, \quad x \in H,\ y \in K,$$

defines a bounded sesquilinear functional on $H \times K$ with

$$\|B_T\| = \sup\{|(x \mid Ty)_H| : \|x\| \le 1,\ \|y\| \le 1\} = \|T\|,$$

the last equality from 11.1.5. One easily checks that (11.7) defines a norm on the linear
space $S(H \times K)$ of all bounded sesquilinear functionals on $H \times K$ and that $S(H \times K)$ is
complete in this norm (Ex. 11.27). Moreover, the mapping $T \mapsto B_T$ is a conjugate linear
isometric isomorphism from $B(K, H)$ into $S(H \times K)$. The following theorem shows that
the mapping is surjective.
11.4.1 Theorem. If B is a bounded sesquilinear functional on H × K, then B = BT for
some T ∈ B(K, H).

Proof. Fix $y \in K$. Since $B(\cdot, y) \in H'$, by the Riesz representation theorem there exists a
unique vector $Ty \in H$ such that
$$(x \mid Ty)_H = B(x, y) \quad \text{for all } x \in H.$$
For each $x$, the right side is conjugate linear in $y$, so $T$ is linear. Moreover, since $\|B\| < \infty$,
$T$ is bounded.

The Lax-Milgram Theorem


The following consequence of 11.4.1 has important applications in the theory of partial
differential equations (see §15.6).
11.4.2 Theorem (Lax-Milgram). Let $B$ be a bounded sesquilinear form on $H$ such that
for some $c > 0$
$$B(x, x) \ge c\,\|x\|^2 \quad \text{for all } x \in H.$$
Then for each continuous linear functional $f$ on $H$ there exists a unique $y \in H$ such that
$B(x, y) = f(x)$ for all $x \in H$.
Proof. By 11.4.1, there exists $T \in B(H)$ such that $B(x, y) = (x \mid Ty)$. Also, by the Riesz
representation theorem, there exists $u \in H$ such that $f(x) = (x \mid u)$ for all $x \in H$. To
complete the proof we must therefore find a $y \in H$ such that $(x \mid Ty) = (x \mid u)$ for
all $x \in H$, that is, $Ty = u$. Now, from $c\,\|x\|^2 \le B(x, x) = (x \mid Tx) \le \|x\|\,\|Tx\|$ we
have $\|Tx\| \ge c\,\|x\|$, hence $\operatorname{ran} T$ is closed and $T$ has a continuous inverse on $\operatorname{ran} T$. If
$x \in (\operatorname{ran} T)^\perp$, then $c\,\|x\|^2 \le B(x, x) = (x \mid Tx) = 0$, hence $x = 0$. Therefore, $\operatorname{ran} T = H$
and so $T$ is invertible. Now take $y = T^{-1}u$.
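In finite dimensions the proof reduces to solving a linear system. The sketch below is an illustration under simplifying assumptions that are not in the text: the space is real R² rather than a complex Hilbert space, the form is B(x, y) = x · (Ay) for a hypothetical non-symmetric matrix A, and the functional is f(x) = x · u. Here T = A, and the vector y of the theorem is the solution of Ay = u.

```python
def bilinear(A, x, y):
    """B(x, y) = x . (A y) on R^2 (real stand-in for a sesquilinear form)."""
    Ay = [A[0][0] * y[0] + A[0][1] * y[1], A[1][0] * y[0] + A[1][1] * y[1]]
    return x[0] * Ay[0] + x[1] * Ay[1]

def solve_2x2(A, u):
    """Solve A y = u by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(u[0] * A[1][1] - A[0][1] * u[1]) / det,
            (A[0][0] * u[1] - A[1][0] * u[0]) / det]

# Assumed example: x . (A x) = 2 |x|^2, so the coercivity constant is c = 2,
# although A is not symmetric.
A = [[2.0, 1.0], [-1.0, 2.0]]
u = [1.0, 3.0]  # represents the functional f(x) = x . u

y = solve_2x2(A, u)
for x in ([1.0, 0.0], [0.0, 1.0], [0.3, -0.7]):
    # The two printed numbers should agree: B(x, y) = f(x).
    print(bilinear(A, x, y), x[0] * u[0] + x[1] * u[1])
```

The point of the theorem is that coercivity alone, without symmetry of B, already forces T to be invertible.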

Definition and Properties of the Adjoint


11.4.3 Theorem. Let T ∈ B(H, K). Then there exists a unique operator T ∗ ∈ B(K, H)
such that
(T x | y)K = (x | T ∗ y)H for all x ∈ H, y ∈ K. (11.8)
Proof. Take B(x, y) = (T x | y)K in 11.4.1, so that B = BT ∗ for some T ∗ .
The operator $T^* : K \to H$ is called the adjoint of $T$. The operation $T \mapsto T^*$ on the
Banach algebra $B(H)$ is an example of an involution. The properties of the involution
operation are summarized in the next theorem. Note that the last assertion of the theorem
implies that the involution operation is continuous in the operator norm.
11.4.4 Theorem. Let $S, T \in B(H)$ and $z \in \mathbb{C}$. Then
$$(S + T)^* = S^* + T^*, \quad (zT)^* = \bar{z}\,T^*, \quad (ST)^* = T^*S^*, \quad T^{**} = T, \quad \text{and} \quad \|T^*T\| = \|T\|^2 = \|T^*\|^2.$$
Proof. The verification of the first three equalities is an exercise for the reader. That $T^{**} = T$
follows from
$$(Tx \mid y) = (x \mid T^*y) = \overline{(T^*y \mid x)} = \overline{(y \mid T^{**}x)} = (T^{**}x \mid y), \quad x, y \in H.$$
The norm equality is proved as follows: By the CBS inequality we have
$$\|Tx\|^2 = (Tx \mid Tx) = (x \mid T^*Tx) \le \|x\|\,\|T^*Tx\| \le \|T^*T\|, \quad \|x\| \le 1.$$
Taking the sup over all such $x$ we obtain
$$\|T\|^2 \le \|T^*T\| \le \|T^*\|\,\|T\|. \tag{$\dagger$}$$
Thus $\|T\| \le \|T^*\|$. Replacing $T$ by $T^*$ and using $T^{**} = T$ we have $\|T^*\| \le \|T\|$. Therefore,
$\|T^*\| = \|T\|$. The inequalities in (†) are then equalities, giving the desired conclusion.
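For operators on C^d the adjoint is the conjugate transpose, so (11.8) and the rule (ST)* = T*S* can be checked directly. The following is an illustrative sketch only: the matrices S, T and the vectors x, y are arbitrary assumed examples, and the helper names are hypothetical. The inner product is taken linear in the first variable and conjugate linear in the second, matching the text.

```python
def inner(x, y):
    """(x | y): linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def adjoint(T):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T[0]))]

def matmul(S, T):
    """Matrix product S T."""
    return [[sum(S[i][k] * T[k][j] for k in range(len(T)))
             for j in range(len(T[0]))] for i in range(len(S))]

# Assumed sample data on C^2.
S = [[2 + 0j, 1j], [1 - 1j, 0j]]
T = [[1 + 2j, 3 + 0j], [0j, -1j]]
x = [1j, 2 + 0j]
y = [1 - 1j, 0.5 + 0j]

lhs = inner(apply(T, x), y)           # (T x | y)
rhs = inner(x, apply(adjoint(T), y))  # (x | T* y)
print(lhs, rhs)

left = adjoint(matmul(S, T))            # (S T)*
right = matmul(adjoint(T), adjoint(S))  # T* S*
print(left)
print(right)
```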

The following is the Hilbert adjoint analog of 8.9.2. The proof is an exercise for the reader.
11.4.5 Proposition. Let $T \in B(H)$. Then $\ker T^* = [\operatorname{ran} T]^\perp$ and $\ker T = [\operatorname{ran} T^*]^\perp$.

B(H) as a C ∗ -algebra
The properties in the conclusion of 11.4.4 assert that $B(H)$ is a $C^*$-algebra. A norm
closed subalgebra $C$ of $B(H)$ that is closed under the operation of involution is called a
$C^*$-subalgebra of $B(H)$. For example, if $T \in B(H)$ and $TT^* = T^*T$, then the closure in
$B(H)$ of the set of all polynomials in $T$, $T^*$ is a commutative $C^*$-algebra (see §13.1).
The following concept will occasionally be needed: The commutant of a subset $\mathcal{S}$ of
$B(H)$ is the set
$$\mathcal{S}' := \{T \in B(H) : TS = ST\ \ \forall\, S \in \mathcal{S}\}.$$
The notation is in conflict with that for dual spaces, but this should not be a problem,
as context will indicate the intended meaning. The bicommutant of $\mathcal{S}$ is defined by
$\mathcal{S}'' := (\mathcal{S}')'$, that is, the commutant of the commutant. The proof of the following is an exercise
(11.35).
11.4.6 Proposition. The commutant of $\mathcal{S} \subseteq B(H)$ is a $C^*$-subalgebra of $B(H)$ containing
the identity operator. Moreover, $\mathcal{S} \subseteq \mathcal{S}''$.

Exercises
11.27 Prove that (11.7) defines a complete norm on S(H).

11.28 Let $S, T \in B(H)$. Prove the polarization identity for operators

$$T^*S = \frac{1}{4} \sum_{k=0}^{3} i^k\,(S + i^k T)^*(S + i^k T).$$

11.29 Let H be a Hilbert space. Verify the following relations:


(a) (ran T )⊥ = ker T ∗ . (b) (ran T ∗ )⊥ = ker T . (c) (ker T )⊥ = cl(ran T ∗ ).
(d) (ker T ∗ )⊥ = cl(ran T ).
Conclude that T ∗ is injective iff ran T is dense in H, and T is injective iff ran T ∗ is dense in H.

11.30 Let T ∈ B(H). Suppose there exist a, b > 0 such that kT xk ≥ a kxk and kT ∗ xk ≥ b kxk for
all x. Show that T is invertible.

11.31 Let $H$ be a Hilbert space, $T \in B(H)$, and $M$ a closed subspace of $H$. Then $M$ is said to be
invariant under $T$ if $TM \subseteq M$. If both $M$ and $M^\perp$ are $T$-invariant, then $M$ is said to reduce
$T$. Let $P$ be the orthogonal projection onto $M$. Prove:

(a) $M$ is $T$-invariant iff $M^\perp$ is $T^*$-invariant. (b) $M$ is $T$-invariant iff $PTP = TP$.
(c) $M$ reduces $T$ iff $M^\perp$ reduces $T^*$. (d) $M$ reduces $T$ iff $PT = TP$.

11.32 Show that for T ∈ B(H), ker T ∗ T = ker T .

11.33 [↑ 8.34] Find the adjoints of the left and right shift operators $T_\ell$ and $T_r$ on $\ell^2$.

11.34 Show that $M_\phi^* = M_{\overline{\phi}}$ for the multiplication operator $M_\phi$ on $L^2(X, \mathcal{F}, \mu)$, where $\phi \in L^\infty$.

11.35 Prove 11.4.6.


Chapter 12
Operator Theory

The special structure of Hilbert spaces allows the construction of classes of operators that
have no analogs in general Banach spaces. In this chapter we discuss the main properties of
these operators and consider as well various algebras of operators on Hilbert spaces.

Throughout the chapter, H and K denote complex Hilbert spaces.

12.1 Classes of Operators


Normal Operators
An operator $T \in B(H)$ is said to be normal if $TT^* = T^*T$. For example, a normal
operator in Euclidean space $\mathbb{C}^d$ is a linear transformation whose matrix commutes with its
conjugate transpose. For an infinite dimensional example, consider a multiplication operator
$M_\phi$ on $L^2(X, \mathcal{F}, \mu)$, where $\phi \in L^\infty$. Since $M_\phi^* = M_{\overline{\phi}}$ (Ex. 11.34), we have $M_\phi M_\phi^* = M_{|\phi|^2} = M_\phi^* M_\phi$.
Here is an important characterization of normal operators in terms of norms.
12.1.1 Proposition. Let $T \in B(H)$. Then $T$ is normal iff $\|Tx\| = \|T^*x\|$ for all $x \in H$.
Proof. If $T$ is normal, then $\|Tx\|^2 = (Tx \mid Tx) = (T^*Tx \mid x) = (TT^*x \mid x) = \|T^*x\|^2$.
Conversely, if $\|Tx\| = \|T^*x\|$ for all $x$, then $(Tx \mid Tx) = (T^*x \mid T^*x)$, hence, by the
polarization identity, $(Tx \mid Ty) = (T^*x \mid T^*y)$. Therefore, for all $x$ and $y$,

$$(T^*Tx \mid y) = (Tx \mid Ty) = (T^*x \mid T^*y) = (TT^*x \mid y),$$

which shows that $T^*T = TT^*$.


12.1.2 Corollary. If $T \in B(H)$ is normal, then $\|T^2\| = \|T\|^2$.

Proof. By the proposition, $\|T^2x\| = \|T(Tx)\| = \|T^*(Tx)\|$. Taking the supremum over all
$x$ with $\|x\| = 1$, we obtain $\|T^2\| = \|T^*T\| = \|T\|^2$, the last equality by 11.4.4.
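The norm criterion of 12.1.1 is easy to test on matrices. The sketch below uses two assumed 2×2 examples that do not appear in the text: a normal matrix (the identity plus a skew-symmetric matrix) and the nilpotent matrix N, which is not normal. Helper names are hypothetical.

```python
import math

def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def adjoint(T):
    """Conjugate transpose."""
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T[0]))]

def norm(x):
    """Euclidean norm on C^d."""
    return math.sqrt(sum(abs(a) ** 2 for a in x))

T = [[1 + 0j, 1 + 0j], [-1 + 0j, 1 + 0j]]  # normal: T T* = T* T = 2 I
N = [[0j, 1 + 0j], [0j, 0j]]               # not normal

vectors = [[1 + 0j, 0j], [0j, 1 + 0j], [1 + 2j, -3j], [0.5j, 2 - 1j]]
for x in vectors:
    # For the normal matrix the two norms agree for every x.
    print(norm(apply(T, x)), norm(apply(adjoint(T), x)))

e1 = [1 + 0j, 0j]
# For N the criterion fails already at the first basis vector.
print(norm(apply(N, e1)), norm(apply(adjoint(N), e1)))
```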

Self-Adjoint Operators
An operator T ∈ B(H) is said to be self-adjoint if T ∗ = T . For example, a multiplication
operator Mφ is self-adjoint iff φ is real-valued. Clearly, every self-adjoint operator is normal.
On the other hand, the operator iI is normal but not self-adjoint.
It is clear that the sum of self-adjoint operators is self-adjoint. The product of self-adjoint
operators S, T need not be self-adjoint. Indeed, the equality (ST )∗ = T ∗ S ∗ = T S shows
that ST is self-adjoint iff ST = T S.
For any $S \in B(H)$, the operators $S^*S$, $SS^*$, $S + S^*$ and $i(S - S^*)$ are self-adjoint. These
examples suggest that self-adjoint operators may be viewed as the analogs of real numbers
in the complex number system, the adjoint operation being the analog of conjugation. The
following proposition strengthens this analogy. The proof is left as an exercise for the reader
(12.2).
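12.1.3 Proposition. For $T \in B(H)$, define

$$\operatorname{Re} T = \tfrac{1}{2}(T + T^*) \quad \text{and} \quad \operatorname{Im} T = \tfrac{1}{2i}(T - T^*).$$

Then $\operatorname{Re} T$ and $\operatorname{Im} T$ are self-adjoint and $T = \operatorname{Re} T + i \operatorname{Im} T$. Moreover, the decomposition
is unique. That is, if $T = A + iB$, where $A$ and $B$ are self-adjoint, then $A = \operatorname{Re} T$ and
$B = \operatorname{Im} T$.
The operators $\operatorname{Re} T$ and $\operatorname{Im} T$ in the statement of the proposition are called the real and
imaginary parts of $T$.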
Proposition 12.1.5 below gives a characterization of self-adjointness in terms of the inner
product. First, we prove

12.1.4 Lemma. Let $T \in B(H)$ be such that $(Tx \mid x) = 0$ for all $x$. Then $T = 0$.

Proof. For any $x, y \in H$ and $c \in \mathbb{C}$,

$$\begin{aligned}
0 = (T(cx + y) \mid cx + y) &= |c|^2 (Tx \mid x) + (Ty \mid y) + c\,(Tx \mid y) + \bar{c}\,(Ty \mid x) \\
&= c\,(Tx \mid y) + \bar{c}\,(Ty \mid x).
\end{aligned}$$

Taking $c = 1$ and $c = i$ yields $(Tx \mid y) + (Ty \mid x) = 0$ and $(Tx \mid y) - (Ty \mid x) = 0$,
respectively. Adding the last two equations and taking $y = Tx$, we see that $(Tx \mid Tx) = 0$.
Therefore, $Tx = 0$.
12.1.5 Proposition. An operator $T$ is self-adjoint iff $(Tx \mid x)$ is real for all $x \in H$.
Proof. From the calculation $(Tx \mid x) = (x \mid T^*x) = \overline{(T^*x \mid x)}$ we see that $(Tx \mid x)$ is real
iff $((T - T^*)x \mid x) = 0$. The conclusion now follows from the lemma.
Corollary 11.1.5 asserts that for any $T \in B(H)$, $\|T\| = \sup\{|(Tx \mid y)| : \|x\|, \|y\| \le 1\}$.
For self-adjoint operators, there is a considerable simplification:
12.1.6 Theorem (Rayleigh). Let $T \in B(H)$ be self-adjoint. Then

$$\|T\| = \sup\{|(Tx \mid x)| : \|x\| \le 1\}.$$

Proof. Let $s$ denote the supremum. Obviously, $s \le \|T\|$. For the reverse inequality, let
$x, y \in C_1$. Since $(Ty \mid x) = (y \mid Tx) = \overline{(Tx \mid y)}$, we have

$$(T(x + y) \mid x + y) - (T(x - y) \mid x - y) = 2\,(Tx \mid y) + 2\,(Ty \mid x) = 4 \operatorname{Re}\,(Tx \mid y).$$

By definition of $s$, the left side of the equation is not bigger than

$$s\,\|x + y\|^2 + s\,\|x - y\|^2 = 2s\,(\|x\|^2 + \|y\|^2) \le 4s.$$

Therefore, $\operatorname{Re}\,(Tx \mid cy) \le s$ for all $x$ and $y$ with norm $\le 1$ and all $c$ with $|c| = 1$. Choosing
$c$ so that $\operatorname{Re}\,(Tx \mid cy) = |(Tx \mid y)|$, we have $|(Tx \mid y)| \le s$. Taking the supremum over
all $x, y \in C_1$ shows that $\|T\| \le s$.
We give an application of Rayleigh’s theorem in 12.1.9. The theorem actually holds more
generally for normal operators (13.2.10), but the proof is considerably deeper, depending on
notions of spectral theory.
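Rayleigh's formula, and its failure without a hypothesis such as self-adjointness, can be seen on 2×2 real matrices. The following sketch is illustrative only: the matrices A and N, the sampling over real unit vectors (cos θ, sin θ), and the function name `quad_form_sup` are assumptions. For these two particular matrices the supremum over the complex unit ball is already attained on real unit vectors, so the restriction loses nothing here. The operator norms are ‖A‖ = 3 (largest eigenvalue of the symmetric matrix A) and ‖N‖ = 1.

```python
import math

def quad_form_sup(T, steps=3600):
    """Largest value of |(T x | x)| over sampled real unit vectors
    x = (cos t, sin t), 0 <= t < pi."""
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        Tx = (T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1])
        best = max(best, abs(Tx[0] * x[0] + Tx[1] * x[1]))
    return best

A = [[2.0, 1.0], [1.0, 2.0]]  # self-adjoint, eigenvalues 1 and 3
N = [[0.0, 1.0], [0.0, 0.0]]  # neither self-adjoint nor normal

print(quad_form_sup(A))  # compare with the norm of A, which is 3
print(quad_form_sup(N))  # compare with the norm of N, which is 1
```

For N the quadratic form never exceeds 1/2 in modulus, since |x₁x₂| ≤ (|x₁|² + |x₂|²)/2, so the supremum falls short of the norm.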

Positive Operators
An operator T is said to be positive, written T ≥ 0, if (T x | x) ≥ 0 for all x ∈ H. Thus
a positive operator is self-adjoint; the converse is trivially false.
If S ∈ B(H), then S ∗ S and SS ∗ are clearly positive. The next theorem shows that all
positive operators are of this form. The theorem reinforces the analogies between self-adjoint
operators and real numbers and between positive operators and nonnegative real numbers.
A direct proof of the theorem may be given now, but we prefer to wait until §13.6 when the
machinery for a simpler proof will be available.
12.1.7 Theorem. Let T ∈ B(H).
(a) If T is positive, then T has a unique positive square root, that is, a unique positive
operator T 1/2 that satisfies (T 1/2 )2 = T . Moreover, if T is invertible, then T −1 is
positive, T 1/2 is invertible, and (T 1/2 )−1 = (T −1 )1/2 .
(b) If T is self-adjoint, then there exists a unique pair of positive operators T + and T −
such that T = T + − T − and T + T − = T − T + = 0.
(c) The operators T 1/2 in (a) and T ± in (b) are members of the bicommutant {T }00 of T .

12.1.8 Corollary. The operator $|T| := (T^*T)^{1/2}$ is the unique positive operator $|T|$ with
the property $\|Tx\| = \|\,|T|x\,\|$ for all $x \in H$. Moreover, if $T$ is self-adjoint, then $|T| = T^+ + T^-$.
Proof. For the norm equality we have
$$\|Tx\|^2 = (Tx \mid Tx) = (T^*Tx \mid x) = \big(|T|^2 x \mid x\big) = (|T|x \mid |T|x) = \|\,|T|x\,\|^2.$$

If also $S \ge 0$ and $\|Sx\| = \|Tx\|$ for all $x$, then

$$\big(S^2 x \mid x\big) = (Sx \mid Sx) = \|Sx\|^2 = \|Tx\|^2 = (T^*Tx \mid x).$$

By 12.1.4, $S^2 = T^*T$, hence, by uniqueness of the positive square root, $S = (T^*T)^{1/2} = |T|$.

The last assertion of the corollary is left as an exercise (12.4).
For self-adjoint operators S, T ∈ B(H) we write S ≥ T if S − T is positive, that is,
(Sx | x) ≥ (T x | x) for all x ∈ H. The relation is a partial order on the set of all self-adjoint
operators on H (Ex. 12.8). The following theorem asserts the existence of a least upper
bound for an increasing sequence of self-adjoint operators bounded above.

12.1.9 Theorem. Let $R$ and $T_n$ be self-adjoint operators such that $T_n \le T_{n+1} \le R$ for all
$n$. Then there exists a self-adjoint operator $T$ such that
(a) Tn ≤ T for all n.

(b) If S is self-adjoint and Tn ≤ S for all n, then T ≤ S.

(c) kTn x − T xk → 0 for all x ∈ H.

Proof. Replacing $T_n$ by $T_n - T_1$ and $R$ by $R - T_1$, we may assume that $T_n \ge 0$ for all $n$. By
hypothesis, the sequence $(T_n x \mid x)$ is increasing and bounded, hence has a finite limit
$B(x, x) \ge 0$. The polarization identity
$$(x \mid T_n y) = (T_n x \mid y) = \frac{1}{4} \sum_{k=1}^{4} i^k \big(T_n(x + i^k y) \mid x + i^k y\big)$$
shows that $(T_n x \mid y)$ converges to a function $B(x, y)$ satisfying

$$\overline{B(y, x)} = B(x, y) = \frac{1}{4} \sum_{k=0}^{3} i^k\,B(x + i^k y,\, x + i^k y).$$

By Rayleigh’s theorem, $\|T_n\| = \sup\{(T_n x \mid x) : \|x\| \le 1\} \le \sup\{(Rx \mid x) : \|x\| \le 1\} = \|R\|$,
hence $c := \sup_n \|T_n\| < \infty$. Since $B(x, x) = \lim_n (T_n x \mid x) \le c\,\|x\|^2$, we have for
$\|x\|, \|y\| \le 1$,
$$|B(x, y)| \le \frac{1}{4} \sum_{k=0}^{3} |B(x + i^k y,\, x + i^k y)| \le \frac{c}{4} \sum_{k=0}^{3} \|x + i^k y\|^2 \le 4c < \infty.$$

By 11.4.1, there exists a self-adjoint operator $T$ such that $B(x, y) = (Tx \mid y)$ for all
$x, y \in H$. Thus $(T_n x \mid y) \to (Tx \mid y)$ and $(T_n x \mid x) \uparrow (Tx \mid x)$ for all $x, y$. In particular,
$T$ satisfies (a) and (b). Since $S_n := T - T_n \ge 0$, by the CBS inequality applied to the positive
sesquilinear form $(S_n x \mid y)$ we have, for any pair of unit vectors $x, y$,
$$|(S_n x \mid y)|^2 \le (S_n x \mid x)\,(S_n y \mid y) \le (S_n x \mid x)\,\|S_n\| \le (S_n x \mid x)\,(c + \|T\|).$$
Taking the sup over all such $y$ yields $\|S_n x\|^2 \le (S_n x \mid x)\,(c + \|T\|)$. Since $(S_n x \mid x) \to 0$
we see that $\|S_n x\| \to 0$, proving (c).

Orthogonal Projections and Idempotents

Let $M$ be a closed subspace of a Hilbert space $H$, so that $H = M \oplus M^\perp$. The projection
$P$ onto $M$ is called the orthogonal projection of $H$ onto $M$. Thus, in the obvious notation,
$P(m + m^\perp) = m$. By 8.7.8 and the last assertion of 11.2.4, we have
12.1.10 Proposition. Let $M$ be a closed subspace of $H$. Then the orthogonal projection
$P : H \to M$ is continuous, $Px$ is the unique member of $M$ nearest $x$, $\operatorname{ran} P = M$, and
$\ker P = M^\perp$.
An operator $T \in B(H)$ is called an idempotent if $T^2 = T$. For such an operator,
$\operatorname{ran} T = \{x : Tx = x\}$. Obviously, projection mappings are idempotents; the converse is
false (Ex. 12.19).
12.1.11 Proposition. Let $T \in B(H)$ be a nonzero idempotent. Then $\operatorname{ran} T$ is closed and
$\|T\| \ge 1$.

Proof. The first assertion is clear. For the second use $\|T\| = \|T^2\| \le \|T\|\,\|T\|$.
Here is a characterization of orthogonal projections in terms of idempotents.
12.1.12 Proposition. An operator $P \ne 0$ in $B(H)$ is an orthogonal projection iff $P$ is a
self-adjoint idempotent. In this case, $P \ge 0$ and $\|P\| = 1$.
Proof. Let $P$ be an idempotent and $M := \operatorname{ran} P = \{x : Px = x\}$. If $P$ is an orthogonal
projection and $x = m + m^\perp$, $y = n + n^\perp$ ($m, n \in M$), then
$$(Px \mid y) = (m \mid n + n^\perp) = (m \mid n) = (m + m^\perp \mid n) = (x \mid Py),$$
so $P$ is self-adjoint and hence positive as well: $(Px \mid x) = (P^2 x \mid x) = (Px \mid Px) \ge 0$.
Also, $\|x\|^2 = \|Px\|^2 + \|(I - P)x\|^2 \ge \|Px\|^2$, hence $\|P\| \le 1$. By 12.1.11, $\|P\| = 1$.
Conversely, assume $P$ is self-adjoint and positive. Since $Pm^\perp \in M$, we have
$$0 = (Pm^\perp \mid m^\perp) = (P^2 m^\perp \mid m^\perp) = (Pm^\perp \mid Pm^\perp).$$
Therefore $Pm^\perp = 0$ and so $P(m + m^\perp) = Pm = m$, as required.

Here are some additional characterizations of orthogonal projections:

12.1.13 Proposition. Let $P \ne 0$ be an idempotent. Then $P$ is an orthogonal projection iff
any one of the following holds.

(a) $P$ is positive. (b) $P$ is self-adjoint. (c) $P$ is normal.
(d) $\ker P = (\operatorname{ran} P)^\perp$. (e) $(\ker P)^\perp = \operatorname{ran} P$. (f) $\|P\| = 1$.

Proof. If $P$ is an orthogonal projection, then (a) – (f) obviously hold. If (c) holds, then
from 12.1.1, $Px = 0$ iff $P^*x = 0$, hence $\ker P = \ker P^* = (\operatorname{ran} P)^\perp$. Taking orthogonal
complements yields (e). Therefore, (c) implies (d) and (e). Conversely, if (e) holds, then we
have the orthogonal decomposition $H = \operatorname{ran} P \oplus \ker P$, hence $P$ is an orthogonal projection.
It follows that (a) – (e) are equivalent and imply that $P$ is an orthogonal projection.
Finally, we show that if $\|P\| \le 1$, then (e) holds. Let $x \in (\ker P)^\perp$. Since $x - Px \in \ker P$,
$$\|x\|^2 = (x - Px + Px \mid x) = (Px \mid x) \le \|Px\|\,\|x\| \le \|x\|^2,$$
hence $\|x\|^2 = \|Px\|^2 = (Px \mid x)$. Therefore,
$$\|x - Px\|^2 = \|x\|^2 + \|Px\|^2 - 2 \operatorname{Re}\,(Px \mid x) = 0$$

and so $x \in \operatorname{ran} P$. Thus $(\ker P)^\perp \subseteq \operatorname{ran} P$. For the reverse inclusion, let $x \in \operatorname{ran} P$ and write
$x = y + z$, where $y \in \ker P$ and $z \in (\ker P)^\perp \subseteq \operatorname{ran} P$. Then $x = Px = Py + Pz =
Pz = z$, hence $x \in (\ker P)^\perp$.

Unitary Operators
An operator U ∈ B(H) is said to be unitary if

U ∗ U = U U ∗ = I. (12.1)

Thus a unitary operator is an invertible normal operator with U −1 = U ∗ .


12.1.14 Proposition. An operator U ∈ B(H) is unitary iff it is a surjective isometry. In
this case,
(U x | U y) = (x | y) for all x, y ∈ H. (12.2)

Proof. The necessity is clear. Conversely, if U is a surjective isometry, then (12.2) holds by
the polarization identity, hence U ∗ U = I. Therefore, U ∗ = U −1 , hence U U ∗ = I.
For example, the translation operator and the Fourier transform are unitary operators on
$L^2(\mathbb{R}^d)$. The right shift on $\ell^2(\mathbb{N})$ is an isometry that is not unitary.
Note that the operator $\alpha I$ is unitary iff $|\alpha| = 1$. This suggests that the set of unitary
operators is the analog of the subset $\mathbb{T}$ of $\mathbb{C}$. The next proposition reinforces this analogy.
12.1.15 Proposition. The set U of all unitary operators in B(H) is a group under
composition.
Proof. If U ∈ U, then (U −1 )∗ U −1 = U ∗∗ U ∗ = U U ∗ = I and similarly U −1 (U −1 )∗ = I,
hence U −1 ∈ U. If V ∈ U , then (U V )∗ (U V ) = V ∗ U ∗ U V = V ∗ IV = I hence U V ∈ U.
Here is an application of unitary operators due to von Neumann. We give a generalization
in Corollary 17.6.9.

12.1.16 Mean Ergodic Theorem. Let $U \in B(H)$ be unitary and let $P : H \to M$ be the
orthogonal projection from $H$ to $M := \{m \in H : Um = m\}$. Then for every $x \in H$,
$$\lim_n S_n x = Px, \quad \text{where } S_n := \frac{1}{n} \sum_{k=0}^{n-1} U^k. \tag{12.3}$$

Proof. (F. Riesz). The set $K$ of all $x$ for which (12.3) holds is clearly a linear space containing
$M$. We claim that $Ux - x \in K$ for all $x \in H$. Indeed, the calculation

$$(Ux - x \mid m) = (x \mid U^{-1}m) - (x \mid m) = (x \mid m) - (x \mid m) = 0$$
shows that $Ux - x \perp M$, hence $P(Ux - x) = 0$, and because $\|U^n\| \le \|U\|^n \le 1$ we also
have
$$\lim_n S_n(Ux - x) = \lim_n \tfrac{1}{n}(U^n x - x) = 0,$$

verifying the claim.

Now let $x \in K^\perp$. By the first paragraph, $(Ux - x \mid x) = 0$, hence, by Ex. 12.5,
$\|Ux - x\|^2 = -2 \operatorname{Re}\,(Ux - x \mid x) = 0$. Thus $x \in M$, and since also $x \in M^\perp$ we conclude
that $x = 0$. Therefore, $K^\perp = \{0\}$, hence $K$ is dense in $H$. Since $\sup_n \|S_n\| \le 1$ and
$Tx := \lim_n S_n x$ exists for all $x$ in a dense subset of $H$, the limit exists for all $x \in H$ and
defines a bounded linear operator $T$. Since $T = P$ on $K$, $T = P$ on $H$.
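The convergence in (12.3) can be watched in the simplest nontrivial case. The sketch below assumes, for illustration only, that H = C² and U is the diagonal unitary with entries 1 and e^{i}; then M is spanned by the first basis vector and P x = (x₁, 0). The function name `cesaro_mean` and the sample vector are hypothetical.

```python
import cmath

def cesaro_mean(diag, x, n):
    """S_n x = (1/n) * sum_{k=0}^{n-1} U^k x for the diagonal operator
    U = diag(diag), computed coordinate by coordinate."""
    result = []
    for d, xi in zip(diag, x):
        power, total = 1 + 0j, 0j
        for _ in range(n):
            total += power   # adds d^k
            power *= d
        result.append(total * xi / n)
    return result

U_diag = [1 + 0j, cmath.exp(1j)]  # unitary: both entries have modulus 1
x = [2 - 1j, 3 + 4j]

for n in (10, 100, 10000):
    print(n, cesaro_mean(U_diag, x, n))

S = cesaro_mean(U_diag, x, 10000)  # should be close to P x = (x[0], 0)
```

The second coordinate decays like 1/n because the geometric sum of the powers of e^{i} stays bounded, which is the same mechanism as in the claim that Ux − x lies in K.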

Partial Isometries
An operator U ∈ B(H) is a partial isometry if U is an isometry on (ker U )⊥ . Thus U
is a partial isometry iff there exists a closed subspace K of H such that U is an isometry on
K and U = 0 on K ⊥ . If U is a partial isometry, then (ker U )⊥ is called the initial space
of U and ran U the final space.
The following proposition characterizes a partial isometry in terms of the orthogonal
projection onto the initial space.
12.1.17 Proposition. Let $U \in B(H)$ and set $P = U^*U$. Then $U$ is a partial isometry iff
$P$ is an orthogonal projection. In this case $(\ker U)^\perp = \operatorname{ran} P$.
Proof. In the proof we make frequent use of the identity
$$\|Ux\|^2 = (Ux \mid Ux) = (Px \mid x), \quad x \in H. \tag{$\dagger$}$$

Let $U$ be a partial isometry. Then $\|U\| \le 1$, and since $\|U^*\| = \|U\|$ we have $\|P\| \le
\|U^*\|\,\|U\| \le 1$. Now, for $x \in (\ker U)^\perp$, $\|x\|^2 = \|Ux\|^2 = (Px \mid x) \le \|Px\|\,\|x\| \le \|x\|^2$,
hence $\|Px\|^2 = (Px \mid x) = \|x\|^2$ and so
$$\|Px - x\|^2 = (Px - x \mid Px - x) = \|Px\|^2 + \|x\|^2 - (Px \mid x) - (x \mid Px) = 0.$$

Therefore, $P$ is the identity on $(\ker U)^\perp$. Since $P = 0$ on $\ker U$, $P$ is the orthogonal projection
onto $(\ker U)^\perp$.
Conversely, assume that $P$ is an orthogonal projection. By (†), $\|Ux\|^2 = \|x\|^2$ if $x \in
\operatorname{ran} P$, and $\|Ux\|^2 = 0$ if $x \in (\operatorname{ran} P)^\perp = \ker P$. Therefore, $U$ is a partial isometry with
initial space $\operatorname{ran} P$.

For example, the left shift $T_\ell(x_1, x_2, \dots) = (x_2, x_3, \dots)$ on $\ell^2(\mathbb{N})$ is a partial isometry
with final space $\ell^2$ and initial space consisting of all vectors of the form $(0, x_2, x_3, \dots)$. The
orthogonal projection $P$ is $T_\ell^* T_\ell x = T_r T_\ell x = (0, x_2, x_3, \dots)$.

12.1.18 Corollary. An operator U on H is a partial isometry iff U ∗ is a partial isometry.

Proof. If $U$ is a partial isometry and $P = U^*U$, then $I - P$ is a projection onto $(\operatorname{ran} P)^\perp =
\ker P = \ker U$, hence $U(I - P) = 0$. Therefore, $(UU^*)^2 - UU^* = U(P - I)U^* = 0$ and so
$UU^*$ is a self-adjoint idempotent, that is, an orthogonal projection. The converse follows
from $U^{**} = U$.
The next result is a generalization of the polar decomposition $z = e^{i\theta}|z|$ of a complex
number.
12.1.19 Polar Decomposition Theorem. Let $T \in B(H)$. Then there exists a unique
partial isometry $U \in B(H)$ such that $T = U|T|$ and $\ker U = \ker T$.
Proof. Define $U$ on $\operatorname{ran} |T|$ by $U|T|x = Tx$. Since $\|Tx\|^2 = \|\,|T|x\,\|^2$, $U$ is a well-defined
isometry on $\operatorname{ran} |T|$. As such, it has an extension to a linear isometry on $\operatorname{cl} \operatorname{ran} |T| = (\ker |T|)^\perp$.
Extend $U$ to all of $H$ by defining $U$ to be zero on $\ker |T|$. Since

$$(\operatorname{ran} |T|)^\perp = \ker |T| = \ker |T|^2 = \ker T^*T = \ker T,$$

we see that $U|T|x = Tx$ on $\ker |T|$, hence $T = U|T|$.

If also $T = V|T|$, where $V$ is a partial isometry and $\ker V = \ker T$, then $V = U$ on $\operatorname{ran} |T|$
and $V = 0 = U$ on $\ker |T| = (\operatorname{ran} |T|)^\perp$, hence $V = U$.
12.1.20 Corollary. Every invertible operator $T \in B(H)$ may be written uniquely as
$T = U|T|$, where $U$ is unitary.
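For an invertible real 2×2 matrix the factors of 12.1.20 can be computed by hand. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it works over the reals (so the adjoint is the transpose), takes a hypothetical sample matrix T, and computes the positive square root with the 2×2 identity √A = (A + sI)/t, where s = √(det A) and t = √(tr A + 2s), which follows from the Cayley-Hamilton theorem. All helper names are made up for the example.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def sqrt_spd_2x2(A):
    """Square root of a symmetric positive definite 2x2 matrix via
    sqrt(A) = (A + s I) / t, s = sqrt(det A), t = sqrt(trace A + 2 s)."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    return [[(A[0][0] + s) / t, A[0][1] / t],
            [A[1][0] / t, (A[1][1] + s) / t]]

def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

T = [[1.0, 2.0], [0.0, 1.0]]                   # invertible, not normal
abs_T = sqrt_spd_2x2(matmul(transpose(T), T))  # |T| = (T^T T)^(1/2)
U = matmul(T, inverse_2x2(abs_T))              # U = T |T|^(-1)

print(matmul(transpose(U), U))  # should be close to the identity
print(matmul(U, abs_T))         # should be close to T
```

Since T is invertible, so is |T|, and U = T|T|⁻¹ is the only possible choice; the first printout checks that this U is orthogonal, the real counterpart of unitary.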

Exercises
12.1 Show that if T is self-adjoint (normal), then T n is self-adjoint (normal) (n ∈ N).

12.2 Prove 12.1.3.

12.3 Let S, T ∈ B(H) be normal and T ∗ S = ST ∗ . Prove that S + T and ST are normal.

12.4 Let T ∈ B(H) be self-adjoint. Show that |T | = T + + T − .

12.5 Let $U \in B(H)$. Prove that $U$ is an isometry iff $\|Ux - x\|^2 = 2 \operatorname{Re}\,(x - Ux \mid x)$ for all $x \in H$.

12.6 Show that if Tn ∈ B(H) is normal for all n and Tn → T ∈ B(H), then T is normal.

12.7 Show that if S, T and ST are self-adjoint, then ST = T S.

12.8 Show that the relation T ≤ S iff S − T ≥ 0 is a partial order on the set of all self-adjoint
operators on H.

12.9 [↑ 12.8] Prove that $|S + T|^2 \le 2|S|^2 + 2|T|^2$. ⟦Consider $(S \pm T)^*(S \pm T)$.⟧

12.10 Show that |cT | = |c||T | (c ∈ C).

12.11 Prove that T − = (−T )+ .

12.12 Let S, T ∈ B(H) with S ≥ 0 and T ≥ 0. Show that ST ≥ 0 iff ST = T S. Show that one then
has (ST )1/2 = S 1/2 T 1/2 .

12.13 Show that T ∈ B(H) is normal iff the real and imaginary parts of T commute.

12.14 Let T ∈ B(H). Show that I + T ∗ T is invertible and (I + T ∗ T )−1 ≤ 1.


12.15 [↑ 12.8] Let T ∈ B(H) and 0 ≤ T ≤ I. Show that 0 ≤ T 2 ≤ T .



12.16 Let S, T ∈ B(H) with S ≥ 0 and T self-adjoint. Show that T ST ≥ 0.


12.17 (Hellinger-Toeplitz). Let $T : H \to H$ be linear and satisfy $(Tx \mid y) = (x \mid Ty)$ for all $x, y$.
Show that $T \in B(H)$. ⟦$T(C_1)$ is weakly bounded.⟧
12.18 Let $P$ be an orthogonal projection. Prove: If $\|Px\| = \|x\|$, then $Px = x$.
12.19 Give an example of an idempotent in a Hilbert space that is not an orthogonal projection.
12.20 Let $P$, $Q$ be orthogonal projections. Prove that the following are equivalent:
(a) $PQ$ is an orthogonal projection. (b) $QP$ is an orthogonal projection. (c) $PQ = QP$.
Show that if these hold, then $\operatorname{ran}(PQ) = \operatorname{ran} P \cap \operatorname{ran} Q$.
12.21 Let P , Q be orthogonal projections. Prove that the following are equivalent:
(a) P Q = 0. (b) QP = 0. (c) ran P ⊥ ran Q.
Show that if these hold, then P + Q is an orthogonal projection and ran (P + Q) = ran P ⊕ ran Q.
12.22 Let P and Q be orthogonal projections on a Hilbert space H. Show that Q − P is an orthogonal
projection iff any of the following holds:
(a) Q − P is positive. (b) ran P ⊆ ran Q. (c) P Q = P . (d) QP = P .
Show that if these hold, then ran (Q − P ) = ran Q ∩ ker P .
⟦For (a) ⇒ (b), let $x \in \operatorname{ran} P$ and use 12.18 to show that $Qx = x$.⟧
12.23 Let $M$ be a closed subspace of $H$ and $x \in H$. Prove that
$$d(x, M) = \sup\{|(x \mid y)| : y \in M^\perp,\ \|y\| = 1\}.$$

12.24 Let $T \in B(H)$ be normal. Show that if $T$ is invertible, then so is $|T|$ and $|T|^{-1} = |T^{-1}|$.
12.25 (a) Let T ∈ B(H) be self-adjoint. Show that T + iI is invertible.
(b) Define the Cayley transform U of T by U = (T − iI)(T + iI)−1 . Show that U is unitary.
(c) Let U be a unitary operator on H such that I − U is invertible. Show that the operator
T := i(I + U )(I − U )−1 is self-adjoint and that U is its Cayley transform.
12.26 Let φ, φ−1 ∈ L∞ (X, F, µ). Find the polar decomposition of the multiplication operator Mφ .
12.27 Let T = U |T | be the polar decomposition of T ∈ B(H). Show that
(a) U ∗ U |T | = |T |, U ∗ T = |T |, and U U ∗ T = T . (b) U |T |U ∗ = |T ∗ | (use uniqueness of |T ∗ |).
(c) T is normal iff |T ∗ | = |T |. (d) T is normal iff U |T | = |T |U and U U ∗ = U ∗ U .

12.2 Compact Operators and Operators of Finite Rank


Recall that an operator on a Banach space is compact if it maps bounded sets onto
relatively compact sets (§8.10). In a Hilbert space the property has an alternate formulation.
Let $T \in B(H, K)$. Since $C_1$ is weakly compact and $T$ is weak-weak continuous, $T(C_1)$ is
weakly compact and therefore norm closed. It follows that $T$ is compact iff $T(C_1)$ is norm
compact.
Referring to §8.10 we see that the collection $B_0(H, K)$ of compact operators in $B(H, K)$
is a closed linear space and that the set $B_{00}(H, K)$ of operators of finite rank is a linear
subspace of $B_0(H, K)$. Moreover, both $B_0(H)$ and $B_{00}(H)$ are ideals in the Banach algebra
$B(H)$. In this section we show that a compact operator in $B(H, K)$ may be approximated
in norm by members of $B_{00}(H, K)$.

Rank One Operators


For $x \in H$ and $y \in K$, define the rank one operator $x \otimes y \in B_{00}(H, K)$ by

$$(x \otimes y)\tilde{x} = (\tilde{x} \mid x)\,y, \quad \tilde{x} \in H. \tag{12.4}$$

It is easy to check that $x \otimes y$ is linear in $y$, conjugate linear in $x$ and that $\|x \otimes y\| =
\|x\|\,\|y\|$. Moreover,

$$T \circ (x \otimes y) = x \otimes (Ty) \quad \text{and} \quad (x \otimes y) \circ S = (S^*x) \otimes y, \quad T \in B(K),\ S \in B(H). \tag{12.5}$$

Clearly, every linear combination of rank one operators is of finite rank. Conversely, every
$T \in B_{00}(H, K)$ may be written
$$T = \sum_{j=1}^{n} x_j \otimes y_j \tag{12.6}$$

for suitable $x_j \in H$ and $y_j \in K$. Indeed, if $\{y_1, \dots, y_n\}$ is an orthonormal basis for the
finite dimensional space $T(H)$, then for all $x \in H$,
$$Tx = \sum_{j=1}^{n} (Tx \mid y_j)\,y_j = \sum_{j=1}^{n} (x \mid T^*y_j)\,y_j = \sum_{j=1}^{n} \big[(T^*y_j) \otimes y_j\big]\,x,$$

hence (12.6) holds with $x_j = T^*y_j$.
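On C² the operator x ⊗ y is the matrix with entries y_i·conj(x_k), and the decomposition (12.6) can be reassembled explicitly. The following is a small sketch with assumed data: the matrix T, the vectors, and the helper names are hypothetical, and the standard basis is used as the orthonormal basis of ran T = C² (T is invertible in this example).

```python
def inner(x, y):
    """(x | y): linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def rank_one(x, y):
    """Matrix of x (tensor) y, the operator sending z to (z | x) y."""
    return [[yi * complex(xk).conjugate() for xk in x] for yi in y]

def apply(T, z):
    return [sum(T[i][k] * z[k] for k in range(len(z))) for i in range(len(T))]

def adjoint(T):
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T[0]))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

T = [[1 + 1j, 2 + 0j], [0j, 3 - 1j]]
basis = [[1 + 0j, 0j], [0j, 1 + 0j]]  # orthonormal basis of ran T

# Rebuild T as the sum of (T* y_j) (tensor) y_j over the basis, as in (12.6).
rebuilt = [[0j, 0j], [0j, 0j]]
for e in basis:
    rebuilt = add(rebuilt, rank_one(apply(adjoint(T), e), e))

x, y, z = [1j, 1 + 0j], [1 + 0j, 2j], [2 - 1j, 1j]
print(apply(rank_one(x, y), z))         # (x tensor y) z
print([inner(z, x) * yi for yi in y])   # (z | x) y, the definition (12.4)
print(rebuilt)                          # should reproduce T
```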

An Approximation Theorem
Here is the main result of the section.
12.2.1 Theorem. $B_0(H, K)$ is the operator norm closure of $B_{00}(H, K)$.
Proof. We show that an arbitrary operator $T \in B_0(H, K)$ is the limit of a sequence of
operators of finite rank. Since $\operatorname{cl} \operatorname{ran} T$ is separable (Ex. 8.101), it has a countable orthonormal
basis $(e_n)$. For each $n$ define a finite rank operator
$$P_n := \sum_{k=1}^{n} (T^*e_k) \otimes e_k.$$

Since

$$Tx = \sum_{k=1}^{\infty} (Tx \mid e_k)\,e_k \quad \text{and} \quad P_n x = \sum_{k=1}^{n} (Tx \mid e_k)\,e_k,$$

by Parseval’s identity and Bessel’s inequality

$$\|(T - P_n)x\|^2 = \sum_{k>n} |(Tx \mid e_k)|^2 \to 0 \quad \text{and} \quad \|P_n x\|^2 = \sum_{k=1}^{n} |(Tx \mid e_k)|^2 \le \|Tx\|^2.$$

These facts, together with the compactness of $T(C_1)$, imply that $\|P_n - T\| \to 0$. Indeed,
given $\varepsilon > 0$, choose $x_1, \dots, x_m \in C_1$ such that $T(C_1) \subseteq \bigcup_{j=1}^{m} B_\varepsilon(Tx_j)$. Let $x \in C_1$ and
choose $j$ so that $\|Tx - Tx_j\| < \varepsilon$. Then

$$\|(T - P_n)x\| \le \|Tx - Tx_j\| + \|Tx_j - P_n x_j\| + \|P_n(x_j - x)\| \le 2\varepsilon + \|Tx_j - P_n x_j\|$$

and so $\|T - P_n\| \le 2\varepsilon + \max_j \|Tx_j - P_n x_j\|$. Therefore, $\limsup_n \|T - P_n\| \le 2\varepsilon$. Since $\varepsilon$ was
arbitrary, $\|T - P_n\| \to 0$.
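A diagonal operator gives a concrete picture of the approximation. The sketch below rests on assumptions that are not in the text: T is taken to be the diagonal operator on ℓ²(N) with T e_k = e_k / k (which is compact), only the first 1000 coordinates are represented, and the function names are hypothetical. For this T the operator P_n of the proof keeps the first n diagonal entries, and the norm of a diagonal operator is the supremum of the moduli of its entries, so ‖T − P_n‖ = 1/(n + 1).

```python
def diagonal_entries(size):
    """Diagonal entries 1, 1/2, 1/3, ... of the assumed operator T,
    truncated to finitely many coordinates for the computation."""
    return [1.0 / k for k in range(1, size + 1)]

def truncation_error(entries, n):
    """Operator norm of T - P_n for a diagonal T: the largest modulus
    among the diagonal entries that P_n discards."""
    tail = entries[n:]
    return max((abs(d) for d in tail), default=0.0)

entries = diagonal_entries(1000)
errors = [truncation_error(entries, n) for n in (1, 10, 100)]
print(errors)
```

The errors decrease to zero, which is the norm convergence asserted by the theorem; for a diagonal operator whose entries do not tend to zero, such as the identity, the same quantity would stay at 1.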



For the case K = H, we have the following version:

12.2.2 Theorem. Let $T \in B_0(H)$. Then there exists a net $(P_\alpha)$ of projections of finite
rank such that $\|P_\alpha T - T\| \to 0$.
Proof. Let $E$ be an orthonormal basis for $H$, and for each finite set $\alpha \subseteq E$ let $P_\alpha$ denote the
projection of $H$ onto $\operatorname{span} \alpha$. Then $(P_\alpha)$ is a net, where the indices are directed upward by
inclusion. Set $Q_\alpha := P_\alpha - I$. For each $x = \sum_{e \in E} (x \mid e)\,e$ we have, by Parseval’s identity,

$$\|Q_\alpha x\|^2 = \sum_{e \in E \setminus \alpha} |(x \mid e)|^2 \to 0. \tag{$\dagger$}$$

If it is not the case that $\|P_\alpha T - T\| \to 0$, then there exists an $\varepsilon > 0$, a subnet $(Q_\beta)$, and a
net $(x_\beta)$ of unit vectors with $\|Q_\beta T x_\beta\| \ge \varepsilon$ for all $\beta$. Since $T$ is compact we may assume
that $T x_\beta \to y$ for some $y$. But then

$$\varepsilon \le \|Q_\beta T x_\beta\| \le \|Q_\beta(T x_\beta - y)\| + \|Q_\beta y\| \le \|T x_\beta - y\| + \|Q_\beta y\| \to 0,$$

impossible. Thus $\|Q_\alpha T\| \to 0$.


A subset A of B(K) is said to be self-adjoint if T ∈ A ⇒ T ∗ ∈ A.
12.2.3 Proposition. B0 (H) and B00 (H) are self-adjoint.
Proof. If $T \in B_{00}(H)$, then $H = \operatorname{ran} T \oplus \ker T^*$, hence $T^*(H) = T^*(\operatorname{ran} T)$, which is finite
dimensional. Therefore $T^* \in B_{00}(H)$. Now let $T \in B_0(H)$ and let $(T_n)$ be a sequence in
$B_{00}(H)$ with $\|T_n - T\| \to 0$. Since $\|T_n^* - T^*\| = \|T_n - T\|$, we have $\|T_n^* - T^*\| \to 0$. Since
$T_n^* \in B_{00}(H)$, it follows that $T^*$ is compact.
12.2.4 Corollary. B0 (H) is a C ∗ -subalgebra.

Exercises
12.28 Let $T \in B(H)$. Show that the commutant of $B_{00}(H)$ is $\mathbb{C} I$, hence $B_{00}(H)'' = B(H)$.

12.29 Prove that $T \in B(H)$ is compact iff $x_n \xrightarrow{w} 0 \Rightarrow \|T x_n\| \to 0$. Show that this is false in $\ell^1(\mathbb{N})$.

12.30 Show that $T \in B(H)$ is compact iff the following condition holds:
\[ x_n \xrightarrow{w} x \ \text{ and } \ y_n \xrightarrow{w} y \ \Rightarrow \ (T x_n \mid y_n) \to (T x \mid y). \]

12.31 Let $\varphi \in c_0$. Show that the multiplication operator $M_\varphi$ on $\ell^2(\mathbb{N})$ is compact. Show that the
analogous assertion for $\varphi \in c$ is false.

12.32 Let S, T ∈ B(H). Prove: S ∗ S ≤ T ∗ T and T compact ⇒ S compact.

12.33 Let T ∈ B(H). Show that T ∗ T compact ⇒ T compact.

12.34 Let φ ∈ L∞ (0, 1). Show that if the multiplication operator Mφ on L2 (0, 1) is compact then
φ = 0 a.e. Find an example of a measure space (X, F, µ) for which the assertion is false in
L2 (X, F, µ).

12.35 Prove that kx ⊗ yk = kxk kyk.

12.36 Verify the assertions in (12.5).

12.37 Show that $T$ is compact (has finite rank) iff $|T|$ is compact (has finite rank). ⟦Use a polar
decomposition.⟧

12.3 The Spectral Theorem for Compact Normal Operators


Eigenvalues and Eigenvectors
An eigenvalue of $T \in B(H)$ is a complex number $\alpha$ such that $\ker(\alpha I - T) \ne \{0\}$.
The subspace $\ker(\alpha I - T)$ of $H$ is called the eigenspace of $T$ corresponding to $\alpha$. A
nonzero member of the eigenspace is called an eigenvector. The spectral theorem for
finite dimensional spaces asserts that a normal operator $T$ may be decomposed into a finite
sum $T = \sum_j \alpha_j P_{\alpha_j}$, where the $\alpha_j$ are the eigenvalues of $T$ and the operators $P_{\alpha_j}$ are the
projections onto the mutually orthogonal eigenspaces with $\sum_j P_{\alpha_j} = I$. In this section, we
prove an infinite dimensional version of this result for compact normal operators.¹
12.3.1 Proposition. Let T ∈ B(H) be normal.
(a) $\alpha$ is an eigenvalue of $T$ iff $\bar\alpha$ is an eigenvalue of $T^*$.

(b) If $\alpha$ and $\beta$ are distinct eigenvalues, then the eigenspaces $\ker(\alpha I - T)$ and $\ker(\beta I - T)$
are mutually orthogonal.
Proof. (a) Since $\alpha I - T$ is normal with adjoint $\bar\alpha I - T^*$, we have $\|(\alpha I - T)x\| =
\|(\bar\alpha I - T^*)x\|$. Therefore, $(\alpha I - T)x = 0$ iff $(\bar\alpha I - T^*)x = 0$.
(b) Let $Tx = \alpha x$ and $Ty = \beta y$, where $x, y \ne 0$. Then $T^* y = \bar\beta y$ by (a), so
\[ \alpha (x \mid y) = (Tx \mid y) = (x \mid T^* y) = (x \mid \bar\beta y) = \beta (x \mid y). \]
Since $\alpha \ne \beta$, $(x \mid y) = 0$.
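As a quick check of 12.3.1 (a worked example added for illustration, with the matrix chosen by us rather than taken from the text), let $H = \mathbb{C}^2$ and let $T$ be the rotation below. It is normal because $T^*T = TT^* = I$, and its two eigenvectors behave as the proposition predicts.

```latex
\[
T = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
x = \begin{pmatrix} 1 \\ -i \end{pmatrix}, \qquad
y = \begin{pmatrix} 1 \\ i \end{pmatrix},
\]
\[
Tx = \begin{pmatrix} i \\ 1 \end{pmatrix} = i\,x, \qquad
Ty = \begin{pmatrix} -i \\ 1 \end{pmatrix} = -i\,y, \qquad
T^* x = \begin{pmatrix} -i \\ -1 \end{pmatrix} = \bar{i}\,x,
\]
\[
(x \mid y) = 1 \cdot \overline{1} + (-i)\,\overline{i} = 1 + i^2 = 0 .
\]
```

Thus $i$ is an eigenvalue of $T$, its conjugate $-i$ is an eigenvalue of $T^*$ with the same eigenvector, and the eigenspaces for $i$ and $-i$ are orthogonal.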

Diagonalizable Operators
An operator $T \in B(H)$ is said to be diagonalizable if there exists an orthonormal basis
$\{e_i : i \in I\}$ of $H$ and a bounded set of complex numbers $\{\alpha_i : i \in I\}$ such that
\[ Tx = \sum_i \alpha_i (x \mid e_i)\, e_i \quad\text{for all } x \in H. \tag{12.7} \]

Since $\alpha_i = (T e_i \mid e_i)$ we may write
\[ Tx = \sum_i (T e_i \mid e_i)(x \mid e_i)\, e_i = \sum_i (T e_i \mid e_i)(e_i \otimes e_i)x, \]
or simply
\[ T = \sum_{i \in I} (T e_i \mid e_i)\, e_i \otimes e_i . \]
From (12.7) we see that $Tx = \alpha x$ iff $\sum_i (\alpha_i - \alpha)(x \mid e_i)\, e_i = 0$ iff $\sum_i |\alpha_i - \alpha|^2 |(x \mid e_i)|^2 = 0$.
Thus the eigenvalues of $T$ are the numbers $\alpha_i$. Moreover, since $x$ is an eigenvector
corresponding to $\alpha$ iff $(x \mid e_i) = 0$ for all $i$ with $\alpha_i \ne \alpha$, we see that the eigenspace
corresponding to $\alpha$ is the span of those $e_i$ for which $\alpha_i = \alpha$. Thus
\[ x = \sum_{i : \alpha_i = \alpha} (x \mid e_i)\, e_i , \quad x \in \ker(\alpha I - T). \tag{12.8} \]

The next two propositions give the basic properties of diagonalizable operators.
1 We remove the compactness requirement in §13.6.

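12.3.2 Proposition. If $T$ is diagonalizable relative to $\{e_i : i \in I\}$ and $\{\alpha_i : i \in I\}$, then
$T^*$ is diagonalizable relative to $\{e_i : i \in I\}$ and $\{\bar\alpha_i : i \in I\}$. In this case we have
\[ T^* x = \sum_i \bar\alpha_i (x \mid e_i)\, e_i \quad\text{for all } x \in H. \tag{12.9} \]
Proof. Writing $x = \sum_i (x \mid e_i)\, e_i$ and $y = \sum_j (y \mid e_j)\, e_j$, we have from (12.7)
\[ (x \mid T^* y) = (Tx \mid y) = \sum_{i,j} \alpha_i (x \mid e_i)\, \overline{(y \mid e_j)}\, (e_i \mid e_j) = \Bigl( x \Bigm| \sum_j \bar\alpha_j (y \mid e_j)\, e_j \Bigr), \]
hence $T^* y = \sum_j \bar\alpha_j (y \mid e_j)\, e_j$.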
12.3.3 Proposition. A diagonalizable operator T is normal. Moreover, T is self-adjoint iff
αi is real for every i, and T is positive iff αi ≥ 0 for all i.
Proof. From (12.7) and (12.9),
\[ T^* T x = \sum_i \alpha_i (x \mid e_i)\, T^* e_i = \sum_i \alpha_i \bar\alpha_i (x \mid e_i)\, e_i = T T^* x, \]
which shows that $T$ is normal. Equations (12.7) and (12.9) also prove the self-adjoint part
of the proposition. For the positivity part, use $(Tx \mid x) = \sum_i \alpha_i |(x \mid e_i)|^2$ and $(T e_i \mid e_i) = \alpha_i$.
For the spectral decomposition of compact diagonalizable operators, the following ter-
minology will be convenient. A set A of complex numbers is said to vanish at infinity if
{c ∈ A : |c| ≥ ε} is finite for all ε > 0. By taking ε = 1/n we see that all but countably many
members of such a set A are zero.
12.3.4 Lemma. A diagonalizable operator $T$ is compact iff the set $A := \{\alpha_i : i \in I\}$ of
eigenvalues of $T$ vanishes at infinity. In this case, the finite rank operator
\[ T_\varepsilon x := \sum_{|\alpha_i| \ge \varepsilon} \alpha_i (x \mid e_i)\, e_i \]
converges in operator norm to $T$ as $\varepsilon \to 0$.


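Proof. Suppose $A_\varepsilon := \{i \in I : |\alpha_i| \ge \varepsilon\}$ is infinite. Then there exists a sequence $(e_n)$ of
distinct members of $\{e_i : i \in I\}$ such that $\|T e_n\| = |\alpha_n| \ge \varepsilon$. Since $e_n \xrightarrow{w} 0$, $T$ cannot be
compact.
Conversely, suppose that $A_\varepsilon$ is finite for all $\varepsilon$. Then each operator $T_\varepsilon$ has finite rank.
Moreover, by Bessel's inequality,
\[ \|Tx - T_\varepsilon x\|^2 = \sum_{|\alpha_i| < \varepsilon} \|\alpha_i (x \mid e_i)\, e_i\|^2 \le \varepsilon^2 \|x\|^2 . \]
Therefore, $\|T - T_\varepsilon\| \le \varepsilon$, and since $\varepsilon$ was arbitrary, $T \in \operatorname{cl} B_{00}(H) = B_0(H)$.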
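The truncation in the lemma can be tried out numerically. The following is a minimal sketch, assuming a finite model in which $T e_i = \alpha_i e_i$ with the hypothetical eigenvalue list $\alpha_i = 1/i$ (our choice, not from the text). It uses only the fact that the operator norm of a diagonal operator is the largest $|\alpha_i|$, so $\|T - T_\varepsilon\|$ is the largest dropped $|\alpha_i|$.

```python
# Finite model of a diagonal operator T e_i = alpha_i e_i.
# For a diagonal operator the operator norm equals the largest |alpha_i|,
# so ||T - T_eps|| is the largest modulus among the dropped eigenvalues.

def truncation_error(alphas, eps):
    """Operator norm of T - T_eps, where T_eps keeps only |alpha_i| >= eps."""
    dropped = [abs(a) for a in alphas if abs(a) < eps]
    return max(dropped, default=0.0)

# Hypothetical eigenvalue list alpha_i = 1/i (an assumption for illustration).
alphas = [1.0 / i for i in range(1, 1001)]

for eps in (0.5, 0.1, 0.01):
    rank = sum(1 for a in alphas if abs(a) >= eps)  # rank of T_eps
    err = truncation_error(alphas, eps)
    print(f"eps={eps}: rank(T_eps)={rank}, ||T - T_eps||={err:.4f}")
```

In each case the reported error stays below $\varepsilon$, in line with the estimate $\|T - T_\varepsilon\| \le \varepsilon$ from the proof.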


Here is the main result regarding compact diagonalizable operators.
12.3.5 Theorem. A compact diagonalizable operator $T$ has only countably many distinct
eigenvalues. If the nonzero distinct eigenvalues are denoted by $\lambda_n$ and if $P_n$ denotes the
projection of $H$ onto the eigenspace $\ker(\lambda_n - T)$, then
\[ T = \sum_n \lambda_n P_n, \qquad P_n = \sum_{\alpha_i = \lambda_n} e_i \otimes e_i, \qquad\text{and}\qquad I = \sum_n P_n, \tag{12.10} \]
the first equation holding in the operator norm and the second and third holding pointwise in
the norm topology of $H$. Moreover, the sequence $(|\lambda_n|)$ may be taken to be decreasing, hence
in the infinite case $|\lambda_n| \downarrow 0$.
Proof. The first assertion follows from the preceding lemma. For the proof of (12.10), we
consider only the case where the sequence $(\lambda_n)$ is infinite. Collecting together the terms in
the expansion (12.7) corresponding to the same $\alpha_i$, we have
\[ Tx = \sum_n \lambda_n P_n x, \qquad P_n x = \sum_{\alpha_i = \lambda_n} (x \mid e_i)\, e_i = \sum_{\alpha_i = \lambda_n} (e_i \otimes e_i)x, \qquad\text{and}\qquad x = \sum_n P_n x. \]

Since $(\lambda_n)$ vanishes at infinity, given $0 < \varepsilon \le |\lambda_1|$ we may choose the smallest $n = n(\varepsilon)$ for
which $|\lambda_k| < \varepsilon$ for all $k > n$. For $k \le n$ we then have $|\alpha_i| \ge \varepsilon$ for all $\alpha_i$ coinciding with $\lambda_k$.
Thus $\sum_{k=1}^{n} \lambda_k P_k$ is the operator $T_\varepsilon$ in the lemma. Since $n(\varepsilon)$ increases as $\varepsilon$ decreases, the
lemma implies that $T = \sum_n \lambda_n P_n$ holds in the operator norm. By considering the finite sets
$\{|\lambda_n| \ge 1\} \subseteq \{|\lambda_n| \ge 1/2\} \subseteq \cdots$, we may arrange the sequence $(\lambda_n)$ so that $|\lambda_{n+1}| \le |\lambda_n|$
for all $n$.
The multiplicity of an eigenvalue λn is the dimension of ran Pn , where Pn is the
projection of the theorem.

The Spectral Theorem


For the main result of the section, we shall need some aspects of spectral theory, a subject
developed fully and in a more general setting in the next chapter.
The spectrum of T ∈ B(H) is the set

σ(T ) = {λ ∈ C : λI − T is not invertible in B(H)}.

The following result is proved in Chapter 13. It will be used here to prove the existence
of eigenvalues for a compact normal operator, the essential ingredient in the proof of the
spectral theorem.
12.3.6 Lemma. Let $T \in B(H)$. Then $\sigma(T)$ is nonempty and bounded. Moreover
\[ \sup\{|\lambda| : \lambda \in \sigma(T)\} = \lim_n \|T^n\|^{1/n}. \tag{12.11} \]

We shall need two more lemmas:


12.3.7 Lemma. Let $V$ be a normal operator on $H$. Then $V$ is invertible iff there exists
$c > 0$ such that $\|Vx\| \ge c\,\|x\|$ for all $x$.
Proof. The necessity follows by taking $c = \|V^{-1}\|^{-1}$. For the sufficiency, note that the
inequality implies that $V$ is injective and $\operatorname{ran} V$ is closed. Since $V$ is normal, $\|V^* x\| = \|Vx\|$,
hence $V^*$ is also injective and so $\operatorname{ran} V = \ker(V^*)^\perp = H$.
12.3.8 Lemma. A compact normal operator $T$ has an eigenvalue.
Proof. We may assume that $T \ne 0$. Noting that powers of a normal operator are normal
(Ex. 12.1), by iterating the equality $\|T^2\| = \|T\|^2$ we see that $\|T^n\| = \|T\|^n$ for $n = 2^k$.
Therefore, the limit in (12.11) is simply $\|T\|$, and since $\|T\| > 0$, $\sigma(T)$ must have a nonzero
member $\lambda$. Set $S := \lambda^{-1} T$. Then $S$ is compact and normal, and since $\lambda I - T$ is not invertible
neither is $I - S$. By 12.3.7 applied to $V = I - S$, for each $n$ there exists an $x_n$ with unit norm
such that $\|(I - S)x_n\| \le 1/n$. By compactness of $S$ we may take a convergent subsequence
$S x_{n_k} \to y$. We then have
\[ x_{n_k} = (I - S)x_{n_k} + S x_{n_k} \to y \]
and so $Sy = y$, that is, $Ty = \lambda y$, and $\|y\| = 1$.
Here is the main result of the section:
12.3.9 Theorem. Let T ∈ B(H) be compact and normal. Then T is diagonalizable.
Proof. Let $O$ denote the family of all orthonormal sets whose members are eigenvectors of $T$.
By 12.3.8, $O \ne \emptyset$. A standard Zorn's lemma argument shows that $O$ has a maximal member,
that is, an orthonormal set $E$ of eigenvectors that is not properly contained in a larger such
set. Let $K$ denote the closed linear span of $E$ and observe that $T(K) \subseteq K$. Also, by 12.3.2,
$T^*(K) \subseteq K$, hence $T(K^\perp) \subseteq K^\perp$. Since $T$ is diagonalizable on $K$ it therefore suffices to
show that $K^\perp = \{0\}$.
Suppose that $K^\perp \ne \{0\}$. We consider two cases: If $T|_{K^\perp} = 0$, then every unit vector
in $K^\perp$ is an eigenvector with eigenvalue zero. If $T|_{K^\perp} \ne 0$, then, by 12.3.8, $T$ has an
eigenvector in $K^\perp$. Each outcome contradicts the maximality of $E$, hence $K^\perp = \{0\}$.
The following application of the spectral theorem will be needed in the discussion of
Hilbert-Schmidt integral operators in the next section.
12.3.10 Corollary. If $T \in B_0(H, K)$ is not the zero operator, then there exist (possibly finite)
orthonormal sequences $(x_n) \subseteq H$ and $(y_n) \subseteq K$ and a sequence $(\alpha_n) \subseteq (0, \infty)$ such that in the operator
norm
\[ T = \sum_n \alpha_n (x_n \otimes y_n). \tag{12.12} \]
If the sequences are infinite, then $\alpha_n \downarrow 0$.
Proof. By the spectral theorem applied to $T^* T \in B_0(H)$, there exists an orthonormal
sequence $(x_n)$ of eigenvectors of $T^* T$ and a decreasing sequence of corresponding eigenvalues
$\beta_n > 0$ such that
\[ T^* T x = \sum_{n=1}^{\infty} \beta_n (x \mid x_n)\, x_n , \quad x \in H. \tag{$\dagger$} \]
We assume that $(\beta_n)$ is an infinite sequence (hence $\beta_n \downarrow 0$); otherwise the sum in (†) is finite
and the notation in the remainder of the proof may be adjusted accordingly.
Set $\alpha_n = \sqrt{\beta_n}$ and $y_n = \alpha_n^{-1} T x_n$. The calculation
\[ \alpha_m \alpha_n (y_m \mid y_n) = (T x_m \mid T x_n) = (T^* T x_m \mid x_n) = \alpha_m^2 (x_m \mid x_n) \]
implies that $(y_n)$ is orthonormal, hence it remains to show that (12.12) holds.
Now, by Bessel's inequality,
\[ \Bigl\| \sum_{k=n}^{m} \alpha_k (x \mid x_k)\, y_k \Bigr\|^2 = \sum_{k=n}^{m} \alpha_k^2 |(x \mid x_k)|^2 \le \beta_n \|x\|^2 \le \beta_n , \quad \|x\| \le 1, \]
hence the partial sums $\sum_{k=1}^{n} \alpha_k\, x_k \otimes y_k$ form a Cauchy sequence in $B_{00}(H, K)$. Let
\[ S := \lim_n \sum_{k=1}^{n} \alpha_k\, x_k \otimes y_k \quad\text{(operator norm convergence)}, \]
which is the operator on the right in (12.12). Since $S x_m = \alpha_m y_m = T x_m$, $T = S$ on the
closed linear span $X$ of $(x_n)$. Moreover, $T = S$ on $X^\perp$. Indeed, $S$ is obviously zero on $X^\perp$,
and (†) implies that $T^* T = 0$ on $X^\perp$ so that $T = 0$ on $X^\perp$ as well. This verifies (12.12).
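To see the construction in the proof at work, here is a small worked example (added for illustration; the matrix is our choice). Take $H = K = \mathbb{C}^2$ with standard basis $e_1, e_2$ and recall that $(x \otimes y)z = (z \mid x)\, y$.

```latex
\[
T = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}, \qquad
T^* T = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}, \qquad
T T^* = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix},
\]
\[
\beta_1 = 4, \quad x_1 = e_2, \quad \alpha_1 = \sqrt{\beta_1} = 2, \quad
y_1 = \alpha_1^{-1} T x_1 = e_1,
\]
\[
\alpha_1 (x_1 \otimes y_1) z = 2\,(z \mid e_2)\, e_1
= \begin{pmatrix} 2 z_2 \\ 0 \end{pmatrix} = Tz .
\]
```

Here $T$ is not normal and its only eigenvalue is $0$, so 12.3.9 does not apply to $T$ itself, yet the representation (12.12) holds with a single term.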

Exercises
12.38 Let (X, F, µ) be a σ-finite measure space and let φ ∈ L∞ (X, F, µ). Show that λ is an eigenvalue
of the multiplication operator Mφ on L2 iff φ = λ on a set of positive measure.

12.39 Find the eigenvalues of the left shift operator $T_\ell$ on $\ell^2$. Show that the right shift operator $T_r$
has no eigenvalues.

12.40 Let $f, g \in L^2[0, 1]$ and extend $f$ periodically to $\mathbb{R}$ so that the convolution operator $T_g f := f * g$
is defined on $[0, 1]$:
\[ (T_g f)(x) = \int_0^1 f(x - y)\, g(y)\, dy. \]

Referring to (11.5), show that $e_n$ is an eigenvector of $T_g$ with eigenvalue $\hat g(n)$.

12.41 Show that the operator $T$ on $L^2[0, 1]$ defined by $(Tf)(t) = t f(t)$ is self-adjoint with no
eigenvalues.

12.42 Show that the operator $T$ on $C[0, 1]$ defined by $Tf(x) = \int_0^x f(t)\, dt$ does not have an eigenvalue.

12.43 Let T ∈ B(H) be self-adjoint, λ ∈ C, and let P be the projection of H onto ker (λ − T ).
Show that S ∈ B(H) and ST = T S ⇒ SP = P S. Conclude in (12.10) that for T self-adjoint,
ST = T S ⇒ SPn = Pn S for all n. (By 13.6.2, these assertions hold for normal T .)

*12.4 Hilbert-Schmidt Operators


In this section, H, K, and L denote Hilbert spaces over C,
and E and F are orthonormal bases for H and K, respectively.

The Hilbert-Schmidt Norm


The Hilbert-Schmidt norm of an operator $T \in B(H, K)$ is defined by
\[ \|T\|_2 := \Bigl( \sum_{e \in E} \|Te\|^2 \Bigr)^{1/2}. \]

Note that by Parseval's equality and an interchange of summations,
\[ \sum_{e \in E} \|Te\|^2 = \sum_{e \in E} \sum_{f \in F} |(Te \mid f)|^2 = \sum_{e \in E} \sum_{f \in F} |(e \mid T^* f)|^2 = \sum_{f \in F} \|T^* f\|^2 . \]

Thus the definition of $\|T\|_2$ is independent of the choice of the orthonormal basis and
$\|T\|_2 = \|T^*\|_2$. If $\|T\|_2 < \infty$, then $T$ is called a Hilbert-Schmidt operator. The set of all
Hilbert-Schmidt operators is denoted by $B_2(H, K)$. It is easy to check that $B_2(H, K)$ is a
linear space and $\|\cdot\|_2$ is a norm. For example, the triangle inequality $\|T + S\|_2 \le \|T\|_2 + \|S\|_2$
follows easily from the CBS inequality in $H$ and the triangle inequality in $\ell^2(\mathbb{N})$.
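The basis independence of $\|T\|_2$ is easy to test in finite dimensions. The following is a minimal sketch, assuming a hypothetical $2 \times 2$ real matrix `T` and a rotated orthonormal basis with an arbitrary angle `theta` (both are our choices for illustration); it evaluates $\sum_e \|Te\|^2$ in the standard basis and in the rotated one.

```python
import math

def apply(T, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def hs_norm_sq(T, basis):
    """Sum of ||T e||^2 over the vectors e of an orthonormal basis."""
    return sum(sum(abs(c) ** 2 for c in apply(T, e)) for e in basis)

T = [[1.0, 2.0], [3.0, 4.0]]           # hypothetical operator on R^2
standard = [[1.0, 0.0], [0.0, 1.0]]    # standard orthonormal basis
theta = 0.7                            # arbitrary rotation angle
rotated = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

# Both sums equal the sum of the squared matrix entries.
print(hs_norm_sq(T, standard), hs_norm_sq(T, rotated))
```

Up to rounding, both values agree with the sum of the squared entries of the matrix, which is the finite dimensional form of the Hilbert-Schmidt norm squared.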
The following proposition makes important connections between the operator norm and
the Hilbert-Schmidt norm.
12.4.1 Proposition. If $S \in B(L, H)$, $T \in B_2(H, K)$, and $R \in B(K, L)$, then
\[ \|T\| \le \|T\|_2 , \qquad \|TS\|_2 \le \|S\|\,\|T\|_2 , \qquad\text{and}\qquad \|RT\|_2 \le \|R\|\,\|T\|_2 , \]
hence $TS \in B_2(L, K)$ and $RT \in B_2(H, L)$. In particular, $B_2(H)$ is an ideal in $B(H)$.



Proof. By Parseval's identity and the CBS inequality, for $x \in H$ with $\|x\| \le 1$,
\[ \|Tx\|^2 = \sum_f |(Tx \mid f)|^2 = \sum_f |(x \mid T^* f)|^2 \le \sum_f \|T^* f\|^2 = \|T^*\|_2^2 = \|T\|_2^2 , \]
proving the first inequality. For the remaining inequalities we have
\[ \|RT\|_2^2 = \sum_e \|RTe\|^2 \le \|R\|^2 \sum_e \|Te\|^2 = \|R\|^2 \|T\|_2^2 \]
and
\[ \|TS\|_2^2 = \|(S^* T^*)^*\|_2^2 = \|S^* T^*\|_2^2 \le \|S^*\|^2 \|T^*\|_2^2 = \|S\|^2 \|T\|_2^2 . \]

12.4.2 Theorem. The inclusions B00 (H, K) ⊆ B2 (H, K) ⊆ B0 (H, K) hold. Moreover,
under the Hilbert-Schmidt norm, B2 (H, K) is a Banach space and B00 (H, K) is dense in
B2 (H, K).
Proof. To show that $B_2(H, K)$ is complete, let $(T_n)$ be a Cauchy sequence in $B_2(H, K)$ with
respect to $\|\cdot\|_2$. Then $(T_n)$ is Cauchy with respect to the operator norm (12.4.1), hence there exists
$T \in B(H, K)$ such that $\|T_n - T\| \to 0$. Given $\varepsilon > 0$, choose $N$ so that $\|T_m - T_n\|_2 < \varepsilon$
for all $m, n \ge N$. For such $n$ and any finite $E_0 \subseteq E$,
\[ \sum_{e \in E_0} \|(T - T_n)e\|^2 = \lim_m \sum_{e \in E_0} \|(T_m - T_n)e\|^2 \le \lim_m \|T_m - T_n\|_2^2 \le \varepsilon^2 . \]

Since $E_0$ was arbitrary, $\|T - T_n\|_2^2 \le \varepsilon^2$. Therefore, $T = T - T_n + T_n \in B_2(H, K)$ and $T_n \to T$
in the Hilbert-Schmidt norm, proving that $B_2(H, K)$ is a Banach space.
Now let $T \in B_{00}(H, K)$ and choose an orthonormal basis $\{f_1, \dots, f_n\}$ in $\operatorname{ran} T$. Then
by Parseval's identity,
\[ \sum_{e \in E} \|Te\|^2 = \sum_{e \in E} \sum_{j=1}^{n} |(Te \mid f_j)|^2 = \sum_{j=1}^{n} \sum_{e \in E} |(e \mid T^* f_j)|^2 = \sum_{j=1}^{n} \|T^* f_j\|^2 < \infty. \]

Therefore, $B_{00}(H, K) \subseteq B_2(H, K)$.

If $T \in B_2(H, K)$, then there exists a sequence $(e_n) \subseteq E$ such that $Te = 0$ for all
$e \in E \setminus (e_n)$. If $T_n \in B_{00}(H, K)$ is defined so that
\[ T_n e_j = T e_j, \quad 1 \le j \le n, \quad\text{and}\quad T_n e = 0 \ \text{ for } e \in E \setminus \{e_1, \dots, e_n\}, \]
then $\lim_n \|T - T_n\|_2^2 = \lim_n \sum_{j>n} \|T e_j\|^2 = 0$, hence also $\lim_n \|T - T_n\| = 0$ and so
$T \in B_0(H, K)$.

The Hilbert-Schmidt Inner Product


Let $S, T \in B_2(H, K)$. The Hilbert-Schmidt inner product of $S$ and $T$ is defined by
\[ (S \mid T) := \sum_{e \in E} (Se \mid Te) = \sum_{e \in E} (T^* S e \mid e). \tag{12.13} \]

12.4.3 Proposition. $(S \mid T)$ is independent of the orthonormal basis and is a well-defined
inner product on $B_2(H, K)$ with associated norm $\|\cdot\|_2$. Moreover,
\[ (x \otimes y \mid u \otimes v) = \overline{(x \mid u)}\, (y \mid v), \quad x, u \in H,\ y, v \in K. \tag{12.14} \]
Proof. From the polarization identity $4(Se \mid Te) = \sum_{k=0}^{3} i^k \|Se + i^k Te\|^2$, we have
\[ 4(S \mid T) = \sum_{k=0}^{3} i^k \sum_{e \in E} \bigl( (S + i^k T)e \bigm| (S + i^k T)e \bigr) = \sum_{k=0}^{3} i^k \|S + i^k T\|_2^2 , \]
which shows that the series in (12.13) converges absolutely and that the definition of $(S \mid T)$
is independent of the basis. The proof that $(S \mid T)$ is an inner product is straightforward.
For the verification of (12.14), note that the left side is
\[ \sum_{e \in E} \bigl( (x \otimes y)e \bigm| (u \otimes v)e \bigr) = \sum_{e \in E} \bigl( (e \mid x)\, y \bigm| (e \mid u)\, v \bigr) = (y \mid v)(u \mid x). \]
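The last computation can be checked numerically in $\mathbb{C}^2$. The following is a minimal sketch, assuming the convention $(x \otimes y)z = (z \mid x)\, y$ used in the proof and an inner product that is linear in the first slot; the vectors `x`, `y`, `u`, `v` are arbitrary test data of our choosing.

```python
def inner(x, y):
    """Inner product (x | y): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def rank_one(x, y):
    """The operator x (tensor) y, acting as z -> (z | x) y."""
    return lambda z: [inner(z, x) * c for c in y]

def hs_inner(S, T, basis):
    """Hilbert-Schmidt inner product: sum of (S e | T e) over the basis."""
    return sum(inner(S(e), T(e)) for e in basis)

basis = [[1, 0], [0, 1]]           # standard orthonormal basis of C^2
x, y = [1 + 2j, 3j], [2, 1 - 1j]   # arbitrary test vectors
u, v = [1j, 1], [1 + 1j, 2j]

lhs = hs_inner(rank_one(x, y), rank_one(u, v), basis)
rhs = inner(u, x) * inner(y, v)    # (u | x)(y | v), as in the proof
print(lhs, rhs)
```

The two printed values coincide, matching the right side $(y \mid v)(u \mid x)$ obtained in the proof.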

12.4.4 Proposition. The set $G := \{e \otimes f : e \in E,\ f \in F\}$ is an orthonormal basis for
$B_2(H, K)$.
Proof. By (12.14), $G$ is orthonormal. Let $T \in G^\perp$ so that $(T \mid e \otimes f) = 0$ for all $e$ and $f$.
By sesquilinearity and joint continuity of the mapping $(y, x) \mapsto x \otimes y$,
\[ x \otimes y = \sum_{e, f} \overline{(x \mid e)}\, (y \mid f)\; e \otimes f, \]
hence $(T \mid x \otimes y) = 0$ for all $x$ and $y$. Since $B_{00}(H, K)$ is dense in $B_2(H, K)$, $(T \mid T) = 0$,
hence $T = 0$. Therefore, $G$ is a basis.
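12.4.5 Example. Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$ be measure spaces such that $L^2(\mu)$ and
$L^2(\nu)$ are separable with orthonormal bases $(\varphi_n)$ and $(\psi_n)$, respectively. We show that
$B_2\bigl(L^2(\nu), L^2(\mu)\bigr)$ and $L^2(\mu \otimes \nu)$ are isomorphic as Hilbert spaces under a mapping $U$ such
that $U(f \otimes g) = fg$, where $(fg)(x, y) = f(x) g(y)$.
The calculation
\begin{align*}
(\varphi_m \psi_n \mid \varphi_j \psi_k) &= \int_{X \times Y} (\varphi_m \psi_n)\, \overline{(\varphi_j \psi_k)} = \int_{X \times Y} \varphi_m(x) \psi_n(y)\, \overline{\varphi_j(x) \psi_k(y)}\, d\mu(x)\, d\nu(y) \\
&= \int_X \varphi_m \overline{\varphi_j} \cdot \int_Y \psi_n \overline{\psi_k} = (\varphi_m \mid \varphi_j)(\psi_n \mid \psi_k) \\
&= (\varphi_m \otimes \psi_n \mid \varphi_j \otimes \psi_k)
\end{align*}
shows that the image $(\varphi_m \psi_n)_{m,n}$ under $U$ of the orthonormal basis $(\varphi_m \otimes \psi_n)_{m,n}$ is an
orthonormal set. It remains then to show that the set is complete.
Let $f \in L^2(\mu \otimes \nu)$ be such that for all $m, n$,
\begin{align*}
0 = (\varphi_m \psi_n \mid f) &= \iint \varphi_m(x) \psi_n(y)\, \overline{f(x, y)}\, d\mu(x)\, d\nu(y) \\
&= \int \psi_n(y) \int \varphi_m(x)\, \overline{f(x, y)}\, d\mu(x)\, d\nu(y). \tag{$\dagger$}
\end{align*}
By Fubini's theorem,
\[ \iint |f(x, y)|^2\, d\mu(x)\, d\nu(y) = \|f\|_2^2 < \infty, \]
hence $f(\cdot, y) \in L^2(\mu)$ for a.a. $y$. For such $y$, by the CBS inequality
\[ \Bigl| \int \varphi_m(x)\, \overline{f(x, y)}\, d\mu(x) \Bigr| \le \int |\varphi_m(x)|\, |f(x, y)|\, d\mu(x) \le \|\varphi_m\|_2\, \|f(\cdot, y)\|_2 < \infty. \]

Thus the inner integral in (†) is an $L^2$ function of $y$ and so must be zero, by the completeness of
$(\psi_n)_n$. Using the completeness of $(\varphi_m)_m$, we conclude that $f = 0$ a.e. Therefore, $(\varphi_m \psi_n)_{m,n}$
is complete. ♦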

The Hilbert-Schmidt Operator A ⊗ B


The following construction is closely related to the tensor product of operators, a notion
that occurs frequently in physics, notably in quantum mechanics and electrodynamics. We
shall use the construction developed here in a crucial part of the proof of the Peter-Weyl
Theorem (16.5.18).
12.4.6 Theorem. Given A ∈ B(H) and B ∈ B(K), there exists a unique bounded linear
operator A ⊗ B on the Hilbert space B2 (H, K) such that

(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By), x ∈ H, y ∈ K. (12.15)

Proof. The mapping A ⊗ B is defined by

A ⊗ B : T → BT A∗ : B2 (H, K) → B2 (H, K).

Clearly A ⊗ B is linear in T and since kBT A∗ k2 ≤ kBk kT k2 kA∗ k (12.4.1), we see that
A ⊗ B is bounded with kA ⊗ Bk2 ≤ kAk kBk. By (12.5),

(A ⊗ B)(x ⊗ y) = B(x ⊗ y)A∗ = (Ax) ⊗ (By), x ∈ H, y ∈ K.

For uniqueness, simply note that a pair of bounded linear operators on B2 (H, K) that
agree on the set {x ⊗ y : x ∈ H, y ∈ K} must in fact be equal, since the span of this set
is dense in B2 (H, K) (12.4.2).
12.4.7 Proposition. The following properties hold:
(a) (A, B) → A ⊗ B is sesquilinear.

(b) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

(c) (A ⊗ B)∗ = A∗ ⊗ B ∗ .

(d) kA ⊗ Bk2 = kAk kBk.

(e) A ⊗ B is invertible iff both A and B are invertible, and then (A ⊗ B)−1 = A−1 ⊗ B −1 .

(f ) If A and B are unitary, then A ⊗ B is unitary.

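Proof. Parts (a)–(c) follow from uniqueness and the properties of rank one operators. For
example,
\begin{align*}
[(A_1 + A_2) \otimes B](x \otimes y) &= \bigl((A_1 + A_2)x\bigr) \otimes (By) = (A_1 x) \otimes (By) + (A_2 x) \otimes (By) \\
&= (A_1 \otimes B)(x \otimes y) + (A_2 \otimes B)(x \otimes y), \\
(A \otimes B)(C \otimes D)(x \otimes y) &= (A \otimes B)(Cx \otimes Dy) = (ACx) \otimes (BDy) \\
&= [(AC) \otimes (BD)](x \otimes y), \quad\text{and} \\
\bigl( (A \otimes B)^* (x \otimes y) \bigm| u \otimes v \bigr) &= \bigl( x \otimes y \bigm| (Au) \otimes (Bv) \bigr) = \overline{(x \mid Au)}\, (y \mid Bv) \\
&= \overline{(A^* x \mid u)}\, (B^* y \mid v) \\
&= \bigl( (A^* x) \otimes (B^* y) \bigm| u \otimes v \bigr).
\end{align*}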

For (d), we have already shown that kA ⊗ Bk2 ≤ kAk kBk. For the reverse inequality, let
kxk = kyk = 1. Then kx ⊗ yk = kxk kyk = 1, hence

kA ⊗ Bk2 ≥ kA ⊗ Bk ≥ k(A ⊗ B)(x ⊗ y)k = kAx ⊗ Byk = kAxk kByk .



Taking the supremum over all such x and y yields kA ⊗ Bk2 ≥ kAk kBk.
For (e), if $A$ and $B$ are invertible, then $(A \otimes B)(A^{-1} \otimes B^{-1}) = AA^{-1} \otimes BB^{-1} = I \otimes I$,
which is the identity operator on $B_2(H, K)$. Conversely, suppose that $A \otimes B$ is invertible.
Then
\[ I \otimes I = (A \otimes B)^{-1}(A \otimes B) = (A \otimes B)^{-1}(I \otimes B)(A \otimes I) \]
and
\[ I \otimes I = (A \otimes B)(A \otimes B)^{-1} = (A \otimes I)(I \otimes B)(A \otimes B)^{-1}, \]
hence $A \otimes I$ is invertible. Thus there exists $c > 0$ such that
\[ \|Ax\|\,\|y\| = \|(A \otimes I)(x \otimes y)\| \ge c\,\|x \otimes y\| = c\,\|x\|\,\|y\| . \]

Taking $y \ne 0$ we see that $\|Ax\| \ge c\,\|x\|$ for all $x$, which implies that $A$ is injective with closed range. Since
$(A \otimes B)^*$ is invertible and $(A \otimes B)^* = A^* \otimes B^*$, the same argument applied to $A^* \otimes B^*$ shows
that $A^*$ is injective. Therefore, $A$ is surjective and so is invertible. Similarly, $B$ is invertible.
Finally, if $A$ and $B$ are unitary, then $A^* A = I$ and $B^* B = I$, hence $(A^* \otimes B^*)(A \otimes B) =
A^* A \otimes B^* B = I \otimes I$ and so $A \otimes B$ is unitary, proving (f).

Note that the converse of (f) is false. (Take A = (1/2)I and B = 2I.)
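The defining property (12.15) can be checked with matrices. The following is a minimal sketch, assuming $H = K = \mathbb{C}^2$, the realization $A \otimes B : T \mapsto BTA^*$ from the proof of 12.4.6, and the matrix $y x^*$ for the rank one operator $x \otimes y$; the matrices `A`, `B` and vectors `x`, `y` are arbitrary test data, not taken from the text.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose of a matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rank_one(x, y):
    """Matrix of x (tensor) y : z -> (z | x) y, namely y x^*."""
    return [[yi * complex(xj).conjugate() for xj in x] for yi in y]

def tensor(A, B):
    """The map T -> B T A^* on 2x2 matrices."""
    return lambda T: matmul(matmul(B, T), adjoint(A))

A = [[1, 2j], [0, 1]]      # arbitrary test matrices
B = [[2, 0], [1j, 1]]
x, y = [1, 1j], [2, -1]    # arbitrary test vectors

lhs = tensor(A, B)(rank_one(x, y))                # (A tensor B)(x tensor y)
rhs = rank_one(matvec(A, x), matvec(B, y))        # (Ax) tensor (By)
print(lhs)
print(rhs)
```

Both matrices agree entry by entry, since $B\,(y x^*)\,A^* = (By)(Ax)^*$.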

Hilbert-Schmidt Integral Operators


The prototype of the Hilbert-Schmidt operator is the Hilbert-Schmidt integral operator,
which has applications in integral and differential equations. (See, for example, [9].)
Let $(X, \mathcal{F}, \mu)$ be a separable, $\sigma$-finite measure space and let $k \in L^2(\mu \otimes \mu)$. The
Hilbert-Schmidt integral operator $K$ with kernel $k$ is defined by
\[ Kf(x) = \int k(x, y) f(y)\, d\mu(y), \quad f \in L^2(\mu). \tag{12.16} \]

By Ex. 12.44, $K$ is bounded with $\|K\| \le \|k\|_2$. We show in this subsection that $K$ is a
Hilbert-Schmidt operator.
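First, we show that $K$ is compact. Let $(\varphi_n)_n$ be an orthonormal basis for $L^2(\mu)$ and
define $\varphi_n \varphi_m$ on $X \times X$ by
\[ (\varphi_n \varphi_m)(x, y) = \varphi_n(x)\, \overline{\varphi_m(y)}. \]

This is a slight variation of the definition given in (12.4.5), but still gives an orthonormal
basis for $L^2(\mu \otimes \mu)$. Thus we have the Fourier expansion
\[ k = \sum_{m,n=1}^{\infty} (k \mid \varphi_n \varphi_m)\, \varphi_n \varphi_m . \]

Moreover,
\[ (K\varphi_m \mid \varphi_n) = \iint k(x, y)\, \varphi_m(y)\, \overline{\varphi_n(x)}\, d\mu(y)\, d\mu(x) = (k \mid \varphi_n \varphi_m), \]
from which it follows that
\[ \sum_{m,n=1}^{\infty} |(K\varphi_m \mid \varphi_n)|^2 < \infty. \tag{$\dagger$} \]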

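Now let $P_n$ denote the orthogonal projection of $L^2(\mu)$ onto the span of $\{\varphi_1, \dots, \varphi_n\}$ and
set $K_n = KP_n + P_nK - P_nKP_n$. Then $K_n$ has finite rank, hence to show $K$ is compact it
suffices to show that $K_n \to K$ in operator norm. For $f \in L^2(\mu)$ and $c_k := (f \mid \varphi_k)$ we have
\begin{align*}
f &= \sum_{k=1}^{\infty} c_k \varphi_k, \qquad P_n f = \sum_{k=1}^{n} c_k \varphi_k, \\
Kf &= \sum_{k=1}^{\infty} c_k K\varphi_k, \qquad KP_n f = \sum_{k=1}^{n} c_k K\varphi_k, \\
P_n K f &= \sum_{k=1}^{n} (Kf \mid \varphi_k)\, \varphi_k = \sum_{k=1}^{n} \sum_{i=1}^{\infty} c_i (K\varphi_i \mid \varphi_k)\, \varphi_k, \quad\text{and} \\
P_n K P_n f &= \sum_{k=1}^{n} (KP_n f \mid \varphi_k)\, \varphi_k = \sum_{k=1}^{n} \sum_{i=1}^{n} c_i (K\varphi_i \mid \varphi_k)\, \varphi_k .
\end{align*}

Thus
\[ Kf - K_n f = (K - KP_n)f + (P_nKP_n - P_nK)f = \sum_{i>n} c_i K\varphi_i - \sum_{k=1}^{n} \sum_{i>n} c_i (K\varphi_i \mid \varphi_k)\, \varphi_k , \]
so for each $j$
\[ (Kf - K_n f \mid \varphi_j) = \sum_{i>n} c_i (K\varphi_i \mid \varphi_j) - \sum_{k=1}^{n} \sum_{i>n} c_i (K\varphi_i \mid \varphi_k)(\varphi_k \mid \varphi_j). \]
The right side is zero if $j \le n$ and equals $\sum_{i>n} c_i (K\varphi_i \mid \varphi_j)$ otherwise. By the CBS inequality
in $\ell^2(\mathbb{N})$ and Bessel's inequality, for $j > n$ we have
\[ |(Kf - K_n f \mid \varphi_j)|^2 \le \sum_{i>n} |c_i|^2 \sum_{i>n} |(K\varphi_i \mid \varphi_j)|^2 \le \|f\|_2^2 \sum_{i>n} |(K\varphi_i \mid \varphi_j)|^2 , \]
hence
\[ \|Kf - K_n f\|_2^2 = \sum_{j>n} |(Kf - K_n f \mid \varphi_j)|^2 \le \|f\|_2^2 \sum_{j>n} \sum_{i>n} |(K\varphi_i \mid \varphi_j)|^2 . \]

By (†), the term on the right tends to $0$ as $n \to \infty$, hence $\|K_n - K\| \to 0$. This shows that $K$
is compact.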
To show that $K$ is a Hilbert-Schmidt operator we use 12.3.10, which guarantees the
existence of orthonormal sequences $(\vartheta_n)$ and $(\psi_n)$ in $L^2(X)$ and $\lambda_n \downarrow 0$ such that
\[ Kf = \sum_n \lambda_n (f \mid \vartheta_n)\, \psi_n, \quad f \in L^2(X). \]
Now, for fixed $x$, $\lambda_n \psi_n(x) = K\vartheta_n(x) = \int k(x, y)\, \vartheta_n(y)\, d\mu(y)$, the integral being a Fourier
coefficient of the function $k(x, \cdot)$ with respect to the basis $(\vartheta_n)$. By Bessel's inequality,
\[ \sum_n |K\vartheta_n(x)|^2 = \sum_n |\lambda_n \psi_n(x)|^2 \le \|k(x, \cdot)\|_2^2 = \int |k(x, y)|^2\, d\mu(y). \]

Since this holds for a.a. $x$, integrating we obtain
\[ \sum_n \|K\vartheta_n\|_2^2 \le \iint |k(x, y)|^2\, d\mu(y)\, d\mu(x) = \|k\|_2^2 < \infty. \]

Therefore, $K$ is a Hilbert-Schmidt operator.
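As a concrete instance (an illustration added here; the kernel is our choice and is not taken from the text), let $X = [0, 1]$ with Lebesgue measure and $k(x, y) = \min(x, y)$. The $L^2$ norm of the kernel can be computed by symmetry:

```latex
\[
\|k\|_2^2 = \int_0^1 \!\! \int_0^1 \min(x, y)^2 \, dy \, dx
= 2 \int_0^1 \!\! \int_0^x y^2 \, dy \, dx
= 2 \int_0^1 \frac{x^3}{3} \, dx
= \frac{1}{6} .
\]
```

Hence $k \in L^2(\mu \otimes \mu)$, the operator $Kf(x) = \int_0^1 \min(x, y) f(y)\, dy$ is Hilbert-Schmidt, and the bound of Ex. 12.44(a) gives $\|K\| \le \|k\|_2 = 1/\sqrt{6}$.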



Exercises
12.44 Let $K$ be as in (12.16).
(a) Show that $K$ is bounded with $\|K\| \le \|k\|_2$.
(b) Compute the adjoint of $K$. When is $K$ self-adjoint?
(c) Let $L$ be the Hilbert-Schmidt operator with kernel $\ell$. Find the kernel of $LK$. Give a condition
on the kernels of $K$ and $L$ that implies $LK = KL$.
(d) Use (b) and (c) to give a sufficient condition on $k$ for $K$ to be normal.
(e) Show that the Volterra operator $(Kf)(t) = \int_0^t f(s)\, ds$ ($t \in [0, 1]$) is a Hilbert-Schmidt
integral operator on $L^2[0, 1]$.

12.45 Show that the Hilbert-Schmidt inner product satisfies (T ∗ | S ∗ ) = (S | T ).

12.46 Show that $T \in B_2(H)$ iff $|T| \in B_2(H)$, in which case $\|\,|T|\,\|_2 = \|T\|_2$.

12.47 Let $T$ be normal and $E$ an orthonormal basis such that $T = \sum_{e \in E} \alpha_e (e \otimes e)$. Show that
$\|T\|_2^2 = \sum_{e \in E} |\alpha_e|^2$.

*12.5 Trace Class Operators


The Trace Norm
Let $H$ be a complex Hilbert space with orthonormal basis $E$. The trace norm of an
operator $T \in B(H)$ is defined by
\[ \|T\|_1 := \sum_{e \in E} (|T|e \mid e). \tag{12.17} \]

If $\|T\|_1 < \infty$, then $T$ is said to be of trace class. The set of all trace class operators is
denoted by $B_1(H)$. The calculation
\[ \|T\|_1 = \sum_{e \in E} \bigl( |T|^{1/2} e \bigm| |T|^{1/2} e \bigr) = \sum_{e \in E} \bigl\| |T|^{1/2} e \bigr\|^2 = \bigl\| |T|^{1/2} \bigr\|_2^2 \]
shows that $\|T\|_1$ is independent of the choice of orthonormal basis and that $T \in B_1(H)$ iff
$|T|^{1/2} \in B_2(H)$.
We show below that $B_1(H)$ is a linear space and that $\|\cdot\|_1$ is indeed a norm on $B_1(H)$.
First, we establish some preliminary results.

12.5.1 Proposition. T ∈ B1 (H) iff any one (hence both) of the following conditions holds:
(a) T = AB for some A, B ∈ B2 (H).

(b) |T | = AB for some A, B ∈ B2 (H).

Proof. Let $T \in B_1(H)$. Using the polar decomposition $U|T|$ of $T$ we have $T = AB$ with
$A = U|T|^{1/2}$ and $B = |T|^{1/2}$. Since $|T|^{1/2} \in B_2(H)$ and $B_2(H)$ is an ideal, (a) holds. Since
$|T| = |T|^{1/2} |T|^{1/2}$, (b) also holds.
Conversely, if (b) holds, then
\[ \|T\|_1 = \sum_{e \in E} (Be \mid A^* e) \le \sum_{e \in E} \|Be\|\,\|A^* e\| \le \Bigl( \sum_{e \in E} \|Be\|^2 \Bigr)^{1/2} \Bigl( \sum_{e \in E} \|A^* e\|^2 \Bigr)^{1/2} = \|B\|_2\, \|A^*\|_2 , \]
hence $T \in B_1(H)$.
Finally, if (a) holds, then using the polar decomposition of $T$ again we have $|T| = U^* T =
(U^* A)B$ (Ex. 12.27), which gives (b).
12.5.2 Corollary. B00 (H) ⊆ B1 (H) ⊆ B2 (H) ⊆ B0 (H).
Proof. The second inclusion follows from the proposition and the fact that B2 (H) is an
algebra. For the first inclusion, let T ∈ B00 (H) and let T = U |T | be the polar decomposition
of T . From U ∗ T = |T | we see that ran |T | is finite dimensional. Thus we may choose an
orthonormal basis E for H so that some finite subset F is an orthonormal basis for ran |T |.
Since e ⊥ ran |T | for e ∈ E \ F , the sum in (12.17) is finite and so T ∈ B1 (H).
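In infinite dimensions the last two inclusions are proper. The following worked example is added for illustration; it uses 12.3.4, Ex. 12.47, and 12.5.5(h) below, and the operators $T_a$ are our choice. On $H = \ell^2(\mathbb{N})$ with standard basis $(e_n)$ and $a > 0$, let $T_a$ be the diagonalizable operator with $T_a e_n = n^{-a} e_n$.

```latex
\[
T_a \in B_0(H) \ \text{for all } a > 0, \qquad
\|T_a\|_2^2 = \sum_{n=1}^{\infty} n^{-2a} < \infty \iff a > \tfrac12 , \qquad
\|T_a\|_1 = \sum_{n=1}^{\infty} n^{-a} < \infty \iff a > 1 .
\]
```

Thus $T_1 \in B_2(H) \setminus B_1(H)$ and $T_{1/2} \in B_0(H) \setminus B_2(H)$.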
12.5.3 Theorem. B1 (H) is a self-adjoint ideal of B(H) and the trace norm is a norm.
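Proof. Absolute homogeneity of $\|\cdot\|_1$ follows from Ex. 12.10. For the triangle inequality,
let $S, T \in B_1(H)$ and let $S = U|S|$, $T = V|T|$, and $S + T = W|S + T|$ be the polar
decompositions. Then
\[ |S| = U^* S, \quad |T| = V^* T \quad\text{and}\quad |S + T| = W^*(S + T) = W^* U|S| + W^* V|T|, \]
hence for any $e$ in $E$,
\begin{align*}
(|S + T|e \mid e) &= \bigl( (W^* U|S|)e \bigm| e \bigr) + \bigl( (W^* V|T|)e \bigm| e \bigr) = (|S|e \mid U^* W e) + (|T|e \mid V^* W e) \\
&= \bigl( |S|^{1/2} e \bigm| |S|^{1/2} U^* W e \bigr) + \bigl( |T|^{1/2} e \bigm| |T|^{1/2} V^* W e \bigr).
\end{align*}

Thus by the CBS inequality,
\[ (|S + T|e \mid e) \le \bigl\| |S|^{1/2} e \bigr\|\, \bigl\| |S|^{1/2} U^* W e \bigr\| + \bigl\| |T|^{1/2} e \bigr\|\, \bigl\| |T|^{1/2} V^* W e \bigr\| . \]

Summing over a finite subset $F$ of $E$ we then have
\[ \sum_{e \in F} (|S + T|e \mid e) \le \sum_{e \in F} \bigl\| |S|^{1/2} e \bigr\|\, \bigl\| |S|^{1/2} U^* W e \bigr\| + \sum_{e \in F} \bigl\| |T|^{1/2} e \bigr\|\, \bigl\| |T|^{1/2} V^* W e \bigr\| . \tag{$\dagger$} \]

Applying the CBS inequality in $\ell^2$ to the first sum in (†), we have
\begin{align*}
\sum_{e \in F} \bigl\| |S|^{1/2} e \bigr\|\, \bigl\| |S|^{1/2} U^* W e \bigr\|
&\le \Bigl( \sum_{e \in F} \bigl\| |S|^{1/2} e \bigr\|^2 \Bigr)^{1/2} \Bigl( \sum_{e \in F} \bigl\| |S|^{1/2} U^* W e \bigr\|^2 \Bigr)^{1/2} \\
&\le \Bigl( \sum_{e \in E} \bigl\| |S|^{1/2} e \bigr\|^2 \Bigr)^{1/2} \Bigl( \sum_{e \in E} \bigl\| |S|^{1/2} U^* W e \bigr\|^2 \Bigr)^{1/2} \\
&= \bigl\| |S|^{1/2} \bigr\|_2\, \bigl\| |S|^{1/2} U^* W \bigr\|_2 .
\end{align*}

By 12.4.1 and the fact that $U$ and $W$ are partial isometries,
\[ \bigl\| |S|^{1/2} U^* W \bigr\|_2 \le \bigl\| |S|^{1/2} \bigr\|_2\, \|U^* W\| \le \bigl\| |S|^{1/2} \bigr\|_2 . \]
Similarly,
\[ \bigl\| |T|^{1/2} V^* W \bigr\|_2 \le \bigl\| |T|^{1/2} \bigr\|_2\, \|V^* W\| \le \bigl\| |T|^{1/2} \bigr\|_2 . \]
Since $F$ was arbitrary, we obtain from (†) the triangle inequality
\[ \|S + T\|_1 \le \bigl\| |S|^{1/2} \bigr\|_2^2 + \bigl\| |T|^{1/2} \bigr\|_2^2 = \|S\|_1 + \|T\|_1 . \]

In particular, $B_1(H)$ is a linear space.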


Now let $\|T\|_1 = 0$. Since $|T|$ is compact and normal it is diagonalizable, hence there exists
an orthonormal basis $F$ such that
\[ |T|x = \sum_{f \in F} \alpha_f (x \mid f)\, f \quad\text{for all } x \in H, \]
where the $\alpha_f$ are the eigenvalues of $|T|$ with corresponding eigenvectors $f$. Since
$\sum_{f \in F} (|T|f \mid f) = \|T\|_1 = 0$ and the terms $\alpha_f = (|T|f \mid f)$ are nonnegative, $\alpha_f = 0$
for all $f$. Therefore, $|T| = 0$ and so $T = 0$.
To show that $B_1(H)$ is an ideal in $B(H)$, let $T \in B_1(H)$ and $S \in B(H)$. By 12.5.1,
$T = AB$ for some $A, B \in B_2(H)$, hence $TS = A(BS)$. Thus $TS$ is a product of members
of $B_2(H)$, hence $B_1(H)B(H) \subseteq B_1(H)$. Similarly $B(H)B_1(H) \subseteq B_1(H)$. Therefore,
$B_1(H)$ is an ideal of $B(H)$. Since $T^* = B^* A^*$ and $A^*, B^* \in B_2(H)$, $T^* \in B_1(H)$.
Therefore, $B_1(H)$ is self-adjoint.

The Trace
The trace $\operatorname{tr} T$ of $T \in B_1(H)$ is defined in terms of the orthonormal basis $E$ by
\[ \operatorname{tr} T := \sum_{e \in E} (Te \mid e). \tag{12.18} \]

The following proposition shows that tr T is well-defined and independent of the basis.
12.5.4 Proposition. For $T \in B_1(H)$, the sum $\sum_{e \in E} (Te \mid e)$ converges absolutely. Moreover,
\[ \operatorname{tr}(B^* A) = (A \mid B), \quad A, B \in B_2(H), \tag{12.19} \]
where the right side is the Hilbert-Schmidt inner product of $A$ and $B$.
Proof. By 12.5.1, $T = B^* A$, where $A, B \in B_2(H)$. Then
\[ |(Te \mid e)| = |(Ae \mid Be)| \le \|Ae\|\,\|Be\| \le \tfrac12 \|Ae\|^2 + \tfrac12 \|Be\|^2 . \]

Summing over $e \in E$ we have
\[ \sum_{e \in E} |(Te \mid e)| \le \tfrac12 \sum_{e \in E} \|Ae\|^2 + \tfrac12 \sum_{e \in E} \|Be\|^2 = \tfrac12 \|A\|_2^2 + \tfrac12 \|B\|_2^2 < \infty. \]

This proves the first assertion of the proposition. The second assertion follows directly from
the definition of the trace and the Hilbert-Schmidt inner product.
Here are additional noteworthy properties of the trace and the trace norm.
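12.5.5 Theorem. Let $T \in B_1(H)$, $S \in B(H)$. Then
(a) $\operatorname{tr}(\cdot)$ is a linear functional on $B_1(H)$ and is positive, that is, $T \ge 0 \Rightarrow \operatorname{tr} T \ge 0$.

(b) $\operatorname{tr} T^* = \overline{\operatorname{tr} T}$ ($=: \overline{\operatorname{tr}}\, T$).

(c) $\operatorname{tr}(ST) = \operatorname{tr}(TS)$.

(d) $|\operatorname{tr}(ST)| \le \|S\|\,\|T\|_1$. In particular, $\operatorname{tr}(\cdot)$ is $\|\cdot\|_1$-continuous.

(e) $\|T^*\|_1 = \|T\|_1$.

(f) $\|ST\|_1 \le \|S\|\,\|T\|_1$ and $\|TS\|_1 \le \|S\|\,\|T\|_1$.

(g) $\|T\| \le \|T\|_1$, hence if $S \in B_1(H)$, then $\|ST\|_1 \le \|S\|_1 \|T\|_1$.

(h) If $T$ is normal, so that $T = \sum_{e \in E} \alpha_e (e \otimes e)$ ($Te = \alpha_e e$), then $\|T\|_1 = \sum_{e \in E} |\alpha_e|$.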
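Proof. Part (a) is clear. By 12.5.3, the left side of (b) is defined. A simple calculation shows
that the equality holds.
For (c), note first that by Ex. 12.45, (12.19), and (b),
\[ \operatorname{tr}(D^* C) = (C \mid D) = (D^* \mid C^*) = \operatorname{tr}(C D^*) = \operatorname{tr}[(D C^*)^*] = \overline{\operatorname{tr}(D C^*)}, \quad C, D \in B_2(H). \]

Now write $T = B^* A$ for some $A, B \in B_2(H)$ (12.5.1). Then
\[ \operatorname{tr}(ST) = \operatorname{tr}\bigl( (SB^*)A \bigr) = \overline{\operatorname{tr}\bigl( (BS^*)A^* \bigr)} = \overline{\operatorname{tr}\bigl( B(S^* A^*) \bigr)} = \operatorname{tr}\bigl( B^*(AS) \bigr) = \operatorname{tr}(TS). \]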

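For (d), let $T = U|T|$ be the polar decomposition of $T$. For any $e \in E$,
\[ |(STe \mid e)| = |(|T|e \mid U^* S^* e)| = \bigl| \bigl( |T|^{1/2} e \bigm| |T|^{1/2} U^* S^* e \bigr) \bigr| \le \bigl\| |T|^{1/2} e \bigr\|\, \bigl\| |T|^{1/2} U^* S^* e \bigr\| , \]
and summing over a finite subset $F$ of $E$ we have
\begin{align*}
\sum_{e \in F} |(STe \mid e)| &\le \sum_{e \in F} \bigl\| |T|^{1/2} e \bigr\|\, \bigl\| |T|^{1/2} U^* S^* e \bigr\| \\
&\le \Bigl( \sum_{e \in F} \bigl\| |T|^{1/2} e \bigr\|^2 \Bigr)^{1/2} \Bigl( \sum_{e \in F} \bigl\| |T|^{1/2} U^* S^* e \bigr\|^2 \Bigr)^{1/2} .
\end{align*}

Since $F$ was arbitrary, we see that
\[ |\operatorname{tr}(ST)| \le \bigl\| |T|^{1/2} \bigr\|_2\, \bigl\| |T|^{1/2} U^* S^* \bigr\|_2 \le \bigl\| |T|^{1/2} \bigr\|_2^2\, \|U^* S^*\| , \]
the last inequality from 12.4.1. Finally, $\bigl\| |T|^{1/2} \bigr\|_2^2 = \|T\|_1$ and $\|U^* S^*\| \le \|U^*\|\,\|S^*\| \le \|S^*\| = \|S\|$,
completing the proof of (d).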
For (e), (f), (g), and (h) let $T = U|T|$ be the polar decomposition of $T$. From (c) and the
equalities $U|T|U^* = |T^*|$ and $U^* U|T| = |T|$ (Ex. 12.27) we have
\[ \|T^*\|_1 = \operatorname{tr} |T^*| = \operatorname{tr}(U|T|U^*) = \operatorname{tr}(U^* U|T|) = \operatorname{tr} |T| = \|T\|_1 , \]
proving (e).
For (f), let $ST = V|ST|$ be the polar decomposition of $ST$. Then $|ST| = V^* ST =
V^* SU|T|$, hence, by (d), $\|ST\|_1 = \operatorname{tr}(V^* SU|T|) \le \|V^* SU\|\, \|\,|T|\,\|_1 \le \|S\|\, \|\,|T|\,\|_1 = \|S\|\,\|T\|_1$. Using this
result and (e) we also have $\|TS\|_1 = \|S^* T^*\|_1 \le \|S^*\|\,\|T^*\|_1 = \|S\|\,\|T\|_1$.


To prove (g), note that since $|T|$ is compact and normal it is diagonalizable:
\[ |T|x = \sum_{e \in E} \alpha_e (x \mid e)\, e, \qquad |T|e = \alpha_e e, \qquad \alpha_e \ge 0. \]

Then for $\|x\| \le 1$,
\[ (|T|x \mid x) = \sum_{e \in E} \alpha_e |(x \mid e)|^2 \le \sum_{e \in E} \alpha_e , \]
and using 12.1.6 we have
\[ \|T\| = \|U|T|\| \le \|\,|T|\,\| \le \sum_{e \in E} \alpha_e = \sum_{e \in E} (|T|e \mid e) = \|T\|_1 . \]

The verification of (h) is left as an exercise (12.49).
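Parts (b) and (c) are easy to sanity-check in finite dimensions, where every operator is of trace class and the trace is the sum of the diagonal entries. The following is a minimal sketch, assuming $H = \mathbb{C}^2$ with the standard basis; the matrices `S` and `T` are arbitrary test data of our choosing.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def trace(A):
    """Sum of (A e | e) over the standard basis, i.e. the diagonal sum."""
    return sum(A[i][i] for i in range(len(A)))

S = [[1, 2j], [3, 4 - 1j]]     # arbitrary test matrices
T = [[2, 1j], [1 + 1j, 5]]

tr_ST = trace(matmul(S, T))
tr_TS = trace(matmul(T, S))
tr_T_star = trace(adjoint(T))
tr_T_conj = complex(trace(T)).conjugate()

print(tr_ST, tr_TS)            # part (c): the two traces agree
print(tr_T_star, tr_T_conj)    # part (b): tr T* is the conjugate of tr T
```

This only illustrates the identities in a finite dimensional model; the content of the theorem is that they persist for trace class operators on an arbitrary Hilbert space.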


12.5.6 Theorem. B1 (H) is a Banach algebra in the trace norm.

Proof. By 12.5.5(g), $B_1(H)$ is a normed algebra. To show completeness, let $(T_n)$ be a
Cauchy sequence in $B_1(H)$ with respect to $\|\cdot\|_1$. Then $(T_n)$ is Cauchy with respect to the
operator norm (12.5.5(g)), hence there exists $T \in B(H)$ such that $\|T_n - T\| \to 0$. Given
$\varepsilon > 0$, choose $N$ so that $\|T_m - T_n\|_1 < \varepsilon$ for all $m, n \ge N$. Let $F \subseteq E$ be finite and let $P$ be
the projection of $H$ onto the span of $F$. For a fixed $n \ge N$, let $T - T_n = U|T - T_n|$ be the
polar decomposition of $T - T_n$. Then
\begin{align*}
\sum_{e \in F} (|T - T_n|e \mid e) &= \sum_{e \in F} (U^*(T - T_n)e \mid e) = \lim_m \sum_{e \in F} (U^*(T_m - T_n)e \mid e) \\
&= \lim_m \sum_{e \in E} (U^*(T_m - T_n)e \mid Pe) = \lim_m |\operatorname{tr}(P U^*(T_m - T_n))| \\
&\le \lim_m \|T_m - T_n\|_1 \le \varepsilon,
\end{align*}
the inequality by 12.5.5(d). Since $F$ was arbitrary, $\|T - T_n\|_1 = \operatorname{tr}(|T - T_n|) \le \varepsilon$.

The Dual Spaces B0 (H)0 and B1 (H)0


Define ΨA (T ) := tr(T A), where either

(1) A ∈ B1 (H) and T ∈ B0 (H) or (2) A ∈ B(H) and T ∈ B1 (H).

In (1), $|\Psi_A(T)| \le \|A\|_1 \|T\|$ and in (2) $|\Psi_A(T)| \le \|T\|_1 \|A\|$ (12.5.5). Thus we obtain linear
mappings

(1′) $\Psi : A \mapsto \Psi_A : B_1(H) \to B_0(H)'$, $\|\Psi_A\| \le \|A\|_1$, and

(2′) $\Psi : A \mapsto \Psi_A : B(H) \to B_1(H)'$, $\|\Psi_A\| \le \|A\|$.

The next theorem uses the map Ψ to identify $B_1(H)$ with $B_0(H)'$ and $B(H)$ with $B_1(H)'$.
12.5.7 Theorem. The mappings Ψ in (1′) and (2′) are isometric isomorphisms.
Proof. We follow the treatment in [36]. For (1′) we need to prove that $\|A\|_1 \le \|\Psi_A\|$ and
that every member ψ of $B_0(H)'$ is of the form $\Psi_A$ for some $A \in B_1(H)$.
Since $\|S\| \le \|S\|_2$ for $S \in B_2(H) \subseteq B_0(H)$, ψ restricted to $B_2(H)$ is a member of
$B_2(H)'$. By the Riesz representation theorem, there exists a $T$ in the Hilbert space $B_2(H)$
such that $\psi(\cdot) = (\cdot, T)$. Set $A := T^*$, so that $\psi(S) = \operatorname{tr}(AS)$ for all $S \in B_2(H)$. It remains
to show that $A \in B_1(H)$. For this let $A = U|A|$ be the polar decomposition of $A$. If $F \subseteq E$
is finite and $P$ is the projection of $H$ onto the span of $F$, then
\[
\sum_{e \in F} (|A|e \mid e) = \sum_{e \in F} (U^*Ae \mid e) = \sum_{e \in E} (PU^*Ae \mid e) = \operatorname{tr}(PU^*A) = \operatorname{tr}(APU^*) = \psi(PU^*).
\]
Since $|\psi(PU^*)| \le \|\psi\| \, \|PU^*\| \le \|\psi\|$, we have $\sum_{e \in F} (|A|e \mid e) \le \|\psi\|$ for all finite $F$.
Therefore $\|A\|_1 < \infty$, completing the proof of the first part of the theorem.

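For (2′), we need to prove that $\|A\| \le \|\Psi_A\|$ and that every member ψ of $B_1(H)'$ is of
the form $\Psi_A$ for some $A \in B(H)$. Now, for any $x, y \in H$, by direct calculation we have
\[
(y \otimes x)^* = x \otimes y \quad \text{and} \quad (x \otimes y)(y \otimes x) = \|x\|^2 (y \otimes y), \qquad (\dagger)
\]
hence
\[
|y \otimes x| = \big[ (y \otimes x)^* (y \otimes x) \big]^{1/2} = \big[ \|x\|^2 (y \otimes y) \big]^{1/2}. \qquad (\ddagger)
\]
The operator $P := \|y\| \, \|x\| \, \big( \|y\|^{-1} y \otimes \|y\|^{-1} y \big)$ is positive, and it is easily verified that
$P^2 = \|x\|^2 (y \otimes y)$. Thus by uniqueness of positive square roots, $P = \big[ \|x\|^2 (y \otimes y) \big]^{1/2}$
and so by (‡)
\[
|y \otimes x| = \|y\| \, \|x\| \, \big( \|y\|^{-1} y \otimes \|y\|^{-1} y \big).
\]
Now define a sesquilinear map $B(x, y) = \psi(y \otimes x)$. We then have
\[
|B(x, y)| \le \|\psi\| \, \|y \otimes x\|_1 = \|\psi\| \operatorname{tr}|y \otimes x| = \|x\| \, \|y\|^{-1} \|\psi\| \operatorname{tr}(y \otimes y) = \|x\| \, \|y\| \, \|\psi\|,
\]
the last equality from the calculation
\[
\operatorname{tr}(u \otimes v) = \sum_{e \in E} ((u \otimes v)e \mid e) = \sum_{e \in E} (e \mid u)(v \mid e) = (v \mid u).
\]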

Therefore, $B$ is bounded with $\|B\| \le \|\psi\|$. By 11.4.1, there exists an operator $S \in B(H)$
with $\|S\| = \|B\| \le \|\psi\|$ such that
\[
\psi(y \otimes x) = B(x, y) = (x \mid Sy), \qquad x, y \in H.
\]
Now, if $T \in B_1(H)$ is self-adjoint, then $T$ may be expressed as in 12.5.5(h), hence setting
$A = S^*$ we have $\|A\| \le \|\psi\|$ and
\[
\psi(T) = \sum_{e \in E} \alpha_e \psi(e \otimes e) = \sum_{e \in E} \alpha_e (e \mid Se) = \sum_{e \in E} (Te \mid Se) = \sum_{e \in E} (ATe \mid e) = \Psi_A(T),
\]
the last equality from 12.5.5(c). Since every operator in $B_1(H)$ is a linear combination of self-adjoint
operators in $B_1(H)$, $\psi = \Psi_A$.

Exercises
12.48 If H is finite dimensional, show that tr(T ) is the sum of the diagonal elements of the matrix of
T relative to any basis.

12.49 Verify part (h) of 12.5.5.

12.50 Verify the equations in (†) in the proof of 12.5.7.


Chapter 13
Banach Algebras

13.1 Introduction
In this chapter we develop the essential properties of commutative Banach algebras. The
main goal is the Gelfand representation theorem, which asserts that such an algebra may be
represented as the algebra of continuous functions on some topological space. Applications
to operator theory, including the spectral theorem for normal operators, are given in §13.6.

Definitions and Examples


Recall that a Banach algebra is an algebra A over C which is a Banach space relative to
a norm that satisfies
\[
\|xy\| \le \|x\| \, \|y\| \quad \text{for all } x, y \in A.
\]
The inequality implies that multiplication in a Banach algebra is jointly continuous (Ex. 13.1).
An involution on a Banach algebra A is a mapping $x \mapsto x^*$ on A with the properties
\[
(x + y)^* = x^* + y^*, \quad (cx)^* = \bar{c}\, x^*, \quad (xy)^* = y^* x^*, \quad x^{**} = x, \quad \text{and} \quad \|x^*\| = \|x\|.
\]

Note that if A is unital with identity e, then

e∗ = ee∗ = (ee∗ )∗ = e∗∗ = e.

A Banach algebra with an involution is called a Banach ∗-algebra. A Banach ∗-algebra
whose norm satisfies
\[
\|xx^*\| = \|x\|^2
\]
is called a C∗-algebra. Note that the identity e in a unital C∗-algebra A satisfies $\|e\| = \|ee^*\| = \|e\|^2$,
hence if A is nontrivial then $\|e\| = 1$.
A homomorphism from a Banach algebra A into a Banach algebra B is a bounded linear
transformation ϕ : A → B such that

ϕ(xy) = ϕ(x)ϕ(y) for all x, y ∈ A.

If A and B are Banach ∗-algebras and ϕ(x∗ ) = ϕ(x)∗ for all x ∈ A, then ϕ is called a
∗-homomorphism.
Recall that an ideal I of a Banach algebra A is a linear subspace such that xy, yx ∈ I
for all x ∈ A and y ∈ I. If I ≠ A, then I is called a proper ideal. If I is closed, then
A/I is a Banach algebra under multiplication (x + I)(y + I) = xy + I, and the quotient
map is an algebra homomorphism (Ex. 13.7). Quotient algebras will be of considerable
importance later in connection with maximal ideals and characters of a Banach algebra.
We have seen several examples of Banach algebras and C ∗ -algebras throughout the text.
For convenience, we include some of these in the following list.


13.1.1 Examples.
(a) If X is a (nontrivial) Banach space, then B(X) is a unital, noncommutative Banach
algebra under the operator norm and with respect to operator composition.
(b) If H is a Hilbert space, then B(H) is a C ∗ -algebra, where involution is the adjoint
operation. The spaces B00 (H), B0 (H), B1 (H), and B2 (H) are ideals of B(H).
(c) If X is a set, then B(X) is a unital, commutative C∗-algebra with involution $f \mapsto \bar{f}$.
(d) If X is a topological space, then Cb (X) is a unital, commutative C ∗ -subalgebra of B(X).
(e) If X is a noncompact, locally compact, Hausdorff topological space, then C0 (X) is a
non-unital C ∗ -subalgebra of Cb (X).
(f) $\ell^1$ group algebra. The space $\ell^1(\mathbb{Z})$ of all bilateral sequences $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$
with $\|x\|_1 := \sum_{k=-\infty}^{\infty} |x_k| < \infty$ is a commutative Banach ∗-algebra under the norm $\|\cdot\|_1$
with convolution product $x * y$ and involution $x^*$ defined by
\[
(x * y)(n) = \sum_{k=-\infty}^{\infty} x_{n-k}\, y_k, \quad \text{and} \quad x^*(n) = \overline{x(-n)}.
\]
Moreover, $\ell^1(\mathbb{Z})$ has identity $e_0 := (\ldots, 0, 1, 0, \ldots)$, with the 1 in position 0. In general
$\|x * x^*\| \ne \|x\|^2$, hence $\ell^1(\mathbb{Z})$ is not a C∗-algebra (Ex. 13.2).
(g) $L^1$ group algebra. The space $L^1(\mathbb{R}^d)$ is a commutative, non-unital Banach ∗-algebra
under convolution $f * g$ and involution $f^*$ defined by
\[
(f * g)(x) = \int f(x - y)\, g(y)\, dy, \quad \text{and} \quad f^*(x) = \overline{f(-x)}.
\]

(h) Measure algebra. The space M (Rd ) of complex Borel measures on Rd with the total
variation norm is a commutative Banach algebra under convolution.
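
The failure of the C∗-identity in example (f) can be checked on a finitely supported sequence. The following Python sketch is only an illustration: the dictionary representation of a sequence (index to value) and the helper names `convolve`, `involution`, and `norm1` are assumptions made for this example, not notation from the text.

```python
def convolve(x, y):
    """Convolution (x * y)(n) = sum_k x(n - k) y(k) for finitely supported sequences."""
    out = {}
    for i, xi in x.items():
        for j, yj in y.items():
            out[i + j] = out.get(i + j, 0) + xi * yj
    # Drop zero entries so that the support is reported faithfully.
    return {n: v for n, v in out.items() if v != 0}


def involution(x):
    """Involution x*(n) = conjugate of x(-n)."""
    return {-n: complex(v).conjugate() for n, v in x.items()}


def norm1(x):
    """The l^1 norm: sum of absolute values of the entries."""
    return sum(abs(v) for v in x.values())


x = {0: 1, 1: 1, 2: -1}          # x_0 = 1, x_1 = 1, x_2 = -1, all other entries 0
e0 = {0: 1}                      # the identity of the algebra

lhs = norm1(convolve(x, involution(x)))   # norm of x * x^*
rhs = norm1(x) ** 2                       # norm of x, squared
print(lhs, rhs)
```

For this particular x the two quantities differ, so the C∗-identity fails, while `convolve(x, e0)` returns x unchanged.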

The Group of Invertible Elements


A member x of a unital Banach algebra A is said to be invertible if there exists an
element y ∈ A, called the inverse of x, such that xy = yx = e. In this case, y is unique
and is denoted by $x^{-1}$. If A is a ∗-algebra and x ∈ A is invertible, then $x^*$ is invertible and
$(x^*)^{-1} = (x^{-1})^*$, as may be seen from the calculation $e = e^* = (xx^{-1})^* = (x^{-1})^* x^*$.
The set of invertible elements G = GA of A is easily seen to be a group with identity e.
The following theorem shows that members of A sufficiently near e are invertible, a fact
that is of critical importance in spectral theory.
13.1.2 Theorem (C. Neumann). If $\|e - x\| < 1$, then $x \in G$ and $x^{-1} = \sum_{n=0}^{\infty} (e - x)^n$.
Proof. Let $y := e - x$. By induction, $\|y^n\| \le \|y\|^n$. Since $\|y\| < 1$, the series
$s := \sum_{n=0}^{\infty} y^n$, where $y^0 := e$, converges absolutely and hence converges. Since the partial sum
$s_n := \sum_{k=0}^{n} y^k$ satisfies
\[
s_n (e - y) = \sum_{k=0}^{n} y^k - \sum_{k=1}^{n+1} y^k = e - y^{n+1}
\]
and since $\|y^{n+1}\| \le \|y\|^{n+1} \to 0$, we see that $sx = s(e - y) = \lim_n s_n (e - y) = e$.
Similarly, $xs = e$. Therefore, $s = x^{-1}$.
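
The Neumann series can be tried out in the Banach algebra of 2 × 2 real matrices. The sketch below is an illustration under stated assumptions: the matrix x, the number of terms, and the helper names are chosen for this example, and the maximum row sum norm of e − x is 0.3 < 1, so the hypothesis of 13.1.2 holds.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """The n x n identity matrix e."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(x, terms=200):
    """Partial sum sum_{k=0}^{terms} (e - x)^k of the Neumann series."""
    n = len(x)
    e = identity(n)
    y = [[e[i][j] - x[i][j] for j in range(n)] for i in range(n)]
    s, power = identity(n), identity(n)
    for _ in range(terms):
        power = mat_mul(power, y)  # power now holds the next power of y
        s = [[s[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return s


x = [[1.0, 0.3], [-0.2, 0.9]]    # e - x has maximum row sum 0.3
s = neumann_inverse(x)
product = mat_mul(x, s)          # should be close to the identity
print(product)
```

The partial sums converge geometrically, so a modest number of terms already gives the inverse to machine precision in this example.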

13.1.3 Corollary. If x ∈ A and z ∈ C with $|z| > \|x\|$, then ze − x is invertible and
\[
(ze - x)^{-1} = \sum_{n=0}^{\infty} z^{-n-1} x^n.
\]
Proof. Since $e - (e - z^{-1}x) = z^{-1}x$ has norm less than one, $e - z^{-1}x$ is invertible and
$(e - z^{-1}x)^{-1} = \sum_{n=0}^{\infty} z^{-n} x^n$. Multiplying by $z^{-1}$ yields the desired expansion.

13.1.4 Theorem. The group G of invertible elements in A is open and the map $x \mapsto x^{-1}$
on G is continuous.
Proof. Let $x_0 \in G$ and set $r = \|x_0^{-1}\|^{-1}$. Then G contains the open ball $B_r(x_0)$. Indeed, if
$\|x - x_0\| < r$, then
\[
\|x x_0^{-1} - e\| = \|(x - x_0) x_0^{-1}\| \le \|x - x_0\| \, \|x_0^{-1}\| < 1,
\]
hence $x x_0^{-1}$ is invertible. Denoting the inverse by y and setting $a = x_0^{-1} y$, we see that
$xa = x x_0^{-1} y = e$. A similar argument produces an element b such that bx = e. Thus
x is invertible, verifying the claim and proving that G is open.
To show continuity of the inverse at e, let $x_n \to e$ in G. By 13.1.2, for sufficiently large
n we have $x_n^{-1} = \sum_{k=0}^{\infty} (e - x_n)^k$, hence for $0 < \varepsilon < 1$ and $\|e - x_n\| < \varepsilon$,
\[
\|x_n^{-1} - e\| \le \sum_{k=1}^{\infty} \|e - x_n\|^k \le \frac{\varepsilon}{1 - \varepsilon}.
\]
Therefore $\limsup_n \|x_n^{-1} - e\| \le \varepsilon (1 - \varepsilon)^{-1}$, and letting $\varepsilon \to 0$ shows that $\lim_n x_n^{-1} = e$.
In the general case, let $x_n \to x$ in G. Then $x_n x^{-1} \to e$, hence, by the preceding
paragraph, $x x_n^{-1} = (x_n x^{-1})^{-1} \to e$ and so $x_n^{-1} \to x^{-1}$.

The Cauchy Product of Series


The Cauchy product of series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ in a Banach algebra A is the
series
\[
\sum_{n=0}^{\infty} c_n, \quad \text{where} \quad c_n = \sum_{k=0}^{n} a_k b_{n-k}.
\]
The following result for numerical series is due to Mertens. The proof in the general setting
of Banach algebras is the same.
13.1.5 Proposition. If the series $A := \sum_{n=0}^{\infty} a_n$ and $B := \sum_{n=0}^{\infty} b_n$ converge in the Banach algebra $\mathcal{A}$ and
at least one of the series converges absolutely, then the Cauchy product C converges and
C = AB.
Proof. Assume that $\sum_{n=0}^{\infty} a_n$ converges absolutely. Let
\[
A_n = \sum_{k=0}^{n} a_k, \quad B_n = \sum_{k=0}^{n} b_k, \quad C_n = \sum_{k=0}^{n} c_k, \quad \text{and} \quad \alpha = \sum_{n=0}^{\infty} \|a_n\|.
\]
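
As a numerical illustration of this statement, consider one absolutely convergent and one conditionally convergent series of real numbers. The choice of series below (a geometric series with sum 2 and the alternating harmonic series with sum ln 2) and the function names are assumptions made for this sketch.

```python
import math


def a(k):
    """Terms of an absolutely convergent series: sum_k 2^(-k) = 2."""
    return 0.5 ** k


def b(k):
    """Terms of a conditionally convergent series: sum_k (-1)^k / (k + 1) = ln 2."""
    return (-1) ** k / (k + 1)


def cauchy_partial_sum(n):
    """C_n = sum_{m=0}^{n} c_m, where c_m = sum_{k=0}^{m} a(k) * b(m - k)."""
    total = 0.0
    for m in range(n + 1):
        total += sum(a(k) * b(m - k) for k in range(m + 1))
    return total


c_n = cauchy_partial_sum(1000)
limit = 2.0 * math.log(2.0)      # the product AB of the two sums
print(c_n, limit)
```

The partial sums of the Cauchy product approach 2 ln 2, although only slowly, because the second series converges slowly.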

Then

\begin{align*}
C_n &= a_0 b_0 + (a_0 b_1 + a_1 b_0) + \cdots + (a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0) \\
&= a_0 B_n + a_1 B_{n-1} + \cdots + a_n B_0 \\
&= a_0 (B_n - B + B) + a_1 (B_{n-1} - B + B) + \cdots + a_n (B_0 - B + B) \\
&= a_0 (B_n - B) + a_1 (B_{n-1} - B) + \cdots + a_n (B_0 - B) + A_n B.
\end{align*}

Thus to show that $C_n \to AB$ it suffices to verify that
\[
X_n := a_0 (B_n - B) + a_1 (B_{n-1} - B) + \cdots + a_n (B_0 - B) \to 0.
\]
Given $\varepsilon > 0$, choose N such that
\[
\|B_n - B\| < \varepsilon / (2\alpha) \quad \text{for all } n > N. \qquad (\dagger)
\]
Since $\|a_n\| \to 0$, we may choose $N' > N$ so that for all $n > N'$
\[
\|a_n (B_0 - B) + a_{n-1} (B_1 - B) + \cdots + a_{n-N} (B_N - B)\| < \varepsilon / 2. \qquad (\ddagger)
\]
For such n, from (†) and (‡) we have
\begin{align*}
\|X_n\| &\le \|a_n (B_0 - B) + a_{n-1} (B_1 - B) + \cdots + a_{n-N} (B_N - B)\| \\
&\quad + \|a_{n-N-1}\| \, \|B_{N+1} - B\| + \|a_{n-N-2}\| \, \|B_{N+2} - B\| + \cdots + \|a_0\| \, \|B_n - B\| \\
&< \varepsilon.
\end{align*}

Exercises
13.1 Verify that multiplication in a Banach algebra is jointly continuous.

13.2 Verify that $\ell^1(\mathbb{Z})$ is a Banach ∗-algebra but not a C∗-algebra.

13.3 (Banach algebra generated by x and e). Let A be a commutative unital Banach algebra and
let x ∈ A. Show that the intersection B of all closed subalgebras of A containing x and e is
the closure of the set P of all polynomials $\sum_{j=0}^{n} c_j x^j$ in x, where $x^0 := e$.

13.4 The commutant of a nonempty subset E of a unital Banach algebra A is the set
$E' := \{x : xy = yx \ \forall\, y \in E\}$. The bicommutant $E''$ of E is the commutant of the commutant:
$E'' = (E')'$. Show that $E'$ is a closed unital subalgebra of A. Show also that if $x \in G_A$, then
$x^{-1} \in \{x\}''$.

13.5 Let X, Y be topological spaces and τ : Y → X a continuous function. Show that ϕ(f ) := f ◦ τ
defines a ∗-homomorphism from Cb (X) into Cb (Y ).

13.6 When is the dilation operator (Dr f )(x) = f (rx) (r > 0), a homomorphism on the group algebra
L1 (Rd )?

13.7 Let A be a Banach algebra and I a closed ideal in A. Show that the Banach space A/I is
a Banach algebra under multiplication (x + I)(y + I) = xy + I and that the quotient
map Q is a homomorphism. Show also that if A is a Banach ∗-algebra and I is closed under
involution, then A/I is a Banach ∗-algebra under involution $(x + I)^* = x^* + I$ and Q is a
∗-homomorphism.

13.8 [↓ 13.3.4] Let I be a proper ideal of a unital Banach algebra. Show that cl I is a proper ideal.

13.9 [↑ 8.46] Show that the space $C^n[0, 1]$ of n-times continuously differentiable functions on [0, 1] is
a Banach algebra with the norm $\|f\| = \sum_{k=0}^{n} \|f^{(k)}\|_\infty$.

13.10 Let X be a compact Hausdorff space, µ a probability Radon measure on X and k : X × X → C
continuous and never zero. Define K on $L^1(\mu)$ by $Kf(x) = f(x) + \lambda \int_X k(x, y) f(y)\, d\mu(y)$, where
$|\lambda| < \|k\|_\infty^{-1}$. Prove that $K \in B(L^1(\mu))$ and is invertible. If Kf = g show that for each x ∈ X
there exists a Borel measure $\mu_x$ on X such that $f(x) = \int_X g\, d\mu_x$.

13.11 Let A and B be unital Banach algebras and Φ : A → B a homomorphism that maps identity
onto identity. Show that Φ(GA ) ⊆ GB .
13.12 Let A denote the Banach algebra of bounded linear operators on $L^2[0, 1]$, set $e_n(t) = e^{2\pi i n t}$,
and define T ∈ A so that $T e_n = e_{n+1}$, that is, $Tx = \sum_{n=-\infty}^{\infty} (x \mid e_n)\, e_{n+1}$. Let B be the
Banach algebra generated by T and I. Show that $T \in G_A \setminus G_B$.

13.13 Let A be a unital Banach algebra and $(x_n) \subseteq G_A$ such that $x_n \to x \notin G_A$. Show that
$\|x_n^{-1}\| \to \infty$.

13.14 [↑ 8.1.2] (Disk algebra). Let A(D) denote the algebra of all bounded continuous functions on the
closed unit disk cl(D) that are analytic on D. Show that A(D) is a unital commutative Banach ∗-algebra
with respect to the sup norm and involution $f^*(z) = \overline{f(\bar{z})}$.

13.15 [↑ 6.4, 7.1.6] Show that the set of all measures $\mu \in M(\mathbb{R}^d)$ with $\mu \ll \lambda$ is an ideal in $M(\mathbb{R}^d)$.

13.16 (Arens multiplication). Let A be a Banach algebra. For f in the dual space $A'$ and x ∈ A,
define ${}_x f \in A'$ by ${}_x f(y) = f(xy)$. Next, for F, G in the bidual $A''$ and $f \in A'$ define $Gf \in A'$
by $Gf(x) = G({}_x f)$ and $FG \in A''$ by $FG(f) = F(Gf)$. Show that $A''$ is a Banach algebra under
the multiplication $(F, G) \mapsto FG$ and that the canonical embedding $x \mapsto \hat{x}$ is a homomorphism.

13.17 ($\ell^1$ semigroup algebra). Show that $\ell^1(\mathbb{Z}^+)$ is a non-unital commutative Banach algebra under
convolution $(f * g)(n) = \sum_{j+k=n} f(j) g(k)$. Determine whether the shift operators $T_r$ and $T_\ell$
are homomorphisms on $\ell^1(\mathbb{Z}^+)$.

13.2 Spectral Theory

In this section, A denotes a unital Banach algebra.

The Spectrum of an Element


Let x ∈ A. The following definitions are used throughout spectral theory:

• The spectrum of x: σ(x) = {z ∈ C : ze − x is not invertible}.


• The spectral radius of x: r(x) = sup{|z| : z ∈ σ(x)}.
• The resolvent set of x: ρ(x) = C \ σ(x) = {z ∈ C : ze − x is invertible}.
13.2.1 Proposition. σ(x) is compact and $r(x) \le \|x\|$. In particular,
\[
\{z \in \mathbb{C} : |z| > \|x\|\} \subseteq \{z \in \mathbb{C} : |z| > r(x)\} \subseteq \rho(x).
\]
Proof. By 13.1.3, if ze − x is not invertible, then $|z| \le \|x\|$. Therefore, σ(x) is bounded
and $r(x) \le \|x\|$. Since the mapping f(z) = ze − x is continuous and $\rho(x) = f^{-1}(G_A)$ is
open, σ(x) is closed.

The following lemma will be used to prove the key property that σ(x) ≠ ∅.
13.2.2 Lemma. Let x ∈ A and $\varphi \in A'$. Define f on the open set ρ(x) by
\[
f(z) = \langle (ze - x)^{-1}, \varphi \rangle. \qquad (13.1)
\]
Then f is analytic on ρ(x) and $f'(z) = -\langle (ze - x)^{-2}, \varphi \rangle$.

Proof. Fix z ∈ ρ(x). For h sufficiently small, take a = (z + h)e − x and b = ze − x in
the identity $a^{-1} - b^{-1} = a^{-1}(b - a)b^{-1}$ to obtain
\[
\big( (z + h)e - x \big)^{-1} - \big( ze - x \big)^{-1} = \big( (z + h)e - x \big)^{-1} (-he) \big( ze - x \big)^{-1},
\]
from which follows
\[
\frac{f(z + h) - f(z)}{h} = -\Big\langle \big( (z + h)e - x \big)^{-1} \big( ze - x \big)^{-1}, \varphi \Big\rangle.
\]
Letting $h \to 0$ and using the continuity of the inversion map we see that $f'(z)$ exists and
equals $-\langle (ze - x)^{-2}, \varphi \rangle$. Since the latter is continuous in z ∈ ρ(x), f is analytic.
13.2.3 Theorem. σ(x) is nonempty.
Proof. If σ(x) = ∅, then the mapping f in (13.1) is entire. Moreover, for $|z| > \|x\|$ and
$\|\varphi\| \le 1$,
\[
|f(z)| = |\langle (ze - x)^{-1}, \varphi \rangle| \le \|\varphi\| \, \|(ze - x)^{-1}\| \le \sum_{n=0}^{\infty} |z|^{-n-1} \|x\|^n = \frac{1}{|z| - \|x\|},
\]
hence $\lim_{|z| \to \infty} |f(z)| = 0$. By Liouville’s theorem, f is identically zero. Since ϕ was arbitrary,
$(ze - x)^{-1}$ is zero for all z, impossible.
13.2.4 Theorem (Gelfand-Mazur). If A is a division algebra (that is, every nonzero element
in A is invertible), then A = Ce.
Proof. Let x ∈ A. By 13.2.3 there exists z ∈ σ(x). Then ze − x is not invertible and so equals 0, that is, x = ze.

The Spectral Radius Formula


The theorem in this subsection gives an important formula for r(x). The formula has
already played a key role in the proof of the spectral theorem for compact normal operators
(§12.3) and will figure prominently in the proof of the Gelfand representation theorem later.
To establish the formula we need
13.2.5 Lemma. Let $g(z) = \sum_{n=0}^{\infty} a_n z^n$ be analytic in a region that contains the disk
$D_r := \{z \in \mathbb{C} : |z| \le r\}$. For x ∈ A with $\|x\| \le r$, set $g(x) := \sum_{n=0}^{\infty} a_n x^n$. Then
\[
g(\sigma(x) \cap D_r) \subseteq \sigma(g(x)). \qquad (13.2)
\]
Proof. Note first that for $\|x\| \le r$ the series g(x) is absolutely convergent, hence converges.
Now let $|z| \le r$. From the identity
\[
z^n e - x^n = (ze - x) y_n, \qquad y_n := z^{n-1} e + z^{n-2} x + \cdots + z x^{n-2} + x^{n-1},
\]
we have
\[
g(z)e - g(x) = \sum_{n=1}^{\infty} a_n (z^n e - x^n) = (ze - x) \sum_{n=1}^{\infty} a_n y_n.
\]
Since $\|y_n\| \le n r^{n-1}$, the series on the right converges to some y ∈ A which commutes with
(ze − x), that is,
\[
g(z)e - g(x) = (ze - x)y = y(ze - x).
\]
Thus if g(z)e − g(x) is invertible, then so is ze − x, verifying (13.2).
13.2.6 Theorem. $r(x) = \lim_n \|x^n\|^{1/n}$.
Proof. By 13.2.5, $z \in \sigma(x) \Rightarrow z^n \in \sigma(x^n) \Rightarrow |z^n| \le \|x^n\| \Rightarrow |z| \le \|x^n\|^{1/n}$. Therefore,
$r(x) \le \liminf_n \|x^n\|^{1/n}$.
To see that $\limsup_n \|x^n\|^{1/n} \le r(x)$, note first that if $|z| > \|x\|$, then the function f in (13.1)
with $\|\varphi\| \le 1$ is well-defined and $f(z) = \sum_{k=0}^{\infty} \langle x^k, \varphi \rangle z^{-k-1}$. By 13.2.2, f(z) is analytic on
the larger set $|z| > r(x)$. It follows that the preceding Laurent series expansion for f is valid
for $|z| > r(x)$ and converges uniformly on $|z| \ge r$ for any $r > r(x)$. Multiplying the series
expansion by $z^{n+1}$ and integrating term by term along the contour $z = re^{i\theta}$ yields
\[
\int_0^{2\pi} r^{n+1} e^{i(n+1)\theta} f(re^{i\theta})\, d\theta = \sum_{k=0}^{\infty} \langle x^k, \varphi \rangle\, r^{n-k} \int_0^{2\pi} e^{i(n-k)\theta}\, d\theta = 2\pi \langle x^n, \varphi \rangle.
\]
Now set $s := \sup_\theta \|(re^{i\theta} e - x)^{-1}\|$. Noting from (13.1) that $|f(z)| \le \|(ze - x)^{-1}\|$, we
have
\[
|\langle x^n, \varphi \rangle| = \frac{1}{2\pi} \left| \int_0^{2\pi} r^{n+1} e^{i(n+1)\theta} f(re^{i\theta})\, d\theta \right| \le r^{n+1} \sup_\theta |f(re^{i\theta})| \le r^{n+1} s.
\]
Since ϕ was arbitrary, $\|x^n\| \le r^{n+1} s$. Thus $\limsup_n \|x^n\|^{1/n} \le r$, and since $r > r(x)$ was
arbitrary, $\limsup_n \|x^n\|^{1/n} \le r(x)$, as required.
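
The spectral radius formula can be observed numerically in the algebra of 2 × 2 matrices. In the sketch below the matrix x, the use of the maximum row sum norm (a submultiplicative norm on matrices), and the helper names are assumptions chosen for illustration. The matrix is upper triangular with both diagonal entries 0.5, so its only eigenvalue is 0.5 even though its norm is 10.5.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm_inf(a):
    """Maximum absolute row sum, a submultiplicative matrix norm."""
    return max(sum(abs(v) for v in row) for row in a)


def root_estimates(x, n_max):
    """Return the list of values norm(x^n)^(1/n) for n = 1, ..., n_max."""
    estimates, power = [], x
    for n in range(1, n_max + 1):
        estimates.append(norm_inf(power) ** (1.0 / n))
        power = mat_mul(power, x)
    return estimates


x = [[0.5, 10.0], [0.0, 0.5]]    # spectrum {0.5}, but norm 10.5
estimates = root_estimates(x, 400)
print(estimates[0], estimates[-1])
```

The first estimate is the norm itself, while later estimates decrease toward the spectral radius 0.5, illustrating that r(x) can be much smaller than the norm of x.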

Normal Elements in a C ∗ -Algebra A


The following are generalizations of definitions given earlier for operators on a Hilbert
space. An element x in A is said to be
(a) normal if x∗ x = xx∗ .
(b) self-adjoint if x∗ = x.
(c) positive, written x ≥ 0, if x = y∗ y for some y ∈ A.
(d) unitary if A is unital and $x^* x = x x^* = e$.
(e) a projection if $x^2 = x = x^*$.
Note that for a unitary element x, $1 = \|e\| = \|xx^*\| = \|x\|^2$, hence $\|x\| = 1$.
If A is a unital C ∗ -algebra and x ∈ A is normal, we denote by C ∗ (x) the unital
commutative C ∗ -algebra generated by x, x∗ , and e. Thus C ∗ (x) is the closure in A of the
algebra of polynomials in x and x∗ .

13.2.7 Proposition. If x is self-adjoint, then the series
\[
\exp(ix) := \sum_{n=0}^{\infty} \frac{1}{n!} (ix)^n
\]
converges and is unitary.

Proof. The series clearly converges absolutely, hence converges. Set u := exp(ix). By
continuity of involution,
\[
u^* = \sum_{n=0}^{\infty} \frac{1}{n!} (-ix^*)^n = \sum_{n=0}^{\infty} \frac{1}{n!} (-ix)^n.
\]
Let $\sum_{n=0}^{\infty} v_n$ be the Cauchy product of the series for u and $u^*$. Then $v_0 = e$, and for n ≥ 1
\[
v_n = \sum_{k=0}^{n} \frac{1}{k!} (ix)^k \, \frac{1}{(n-k)!} (-ix)^{n-k} = \frac{(ix)^n}{n!} \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} = 0.
\]
Therefore, $uu^* = u^*u = \sum_{n=0}^{\infty} v_n = e$.
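
For a concrete check, take A to be the 2 × 2 complex matrices with the conjugate transpose as involution, and x a real symmetric matrix. The sketch below is an illustration only: the matrix x, the truncation at 60 terms, and the helper names are assumptions made for this example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(a):
    """Conjugate transpose, the involution on complex matrices."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def exp_i(x, terms=60):
    """Partial sum sum_{k=0}^{terms} (ix)^k / k! of the series for exp(ix)."""
    n = len(x)
    ix = [[1j * v for v in row] for row in x]
    term = [[complex(i == j) for j in range(n)] for i in range(n)]  # (ix)^0 / 0! = e
    total = [row[:] for row in term]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, ix)]  # (ix)^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


x = [[1.0, 2.0], [2.0, -0.5]]    # real symmetric, hence self-adjoint
u = exp_i(x)
uu_star = mat_mul(u, adjoint(u))
defect = max(abs(uu_star[i][j] - (1 if i == j else 0))
             for i in range(2) for j in range(2))
print(defect)
```

The printed defect measures how far the computed product of u with its adjoint is from the identity; it is at the level of rounding error.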
13.2.8 Theorem. Let A be a unital C∗-algebra. The following hold for a member x of A.
(a) $\sigma(x^*) = \overline{\sigma(x)} := \{\bar{z} : z \in \sigma(x)\}$.

(b) If x is unitary, then σ(x) ⊆ T.

(c) If x is self-adjoint, then σ(x) ⊆ R.

Proof. (a) $z \in \sigma(x^*)$ iff $ze - x^*$ is not invertible iff $\bar{z}e - x = (ze - x^*)^*$ is not invertible
iff $\bar{z} \in \sigma(x)$.
(b) If z ∈ σ(x), then z ≠ 0 because x is invertible, and $\bar{z} \in \sigma(x^*)$ by (a). Since $x^* = x^{-1}$, $\bar{z}^{-1} \in \sigma(x)$ by Ex. 13.21.
Therefore, $|z|$ and $|\bar{z}^{-1}| = |z|^{-1}$ are both $\le \|x\| = 1$ and so $|z| = 1$.
(c) If x is self-adjoint, then exp(ix) is unitary (13.2.7), hence σ(exp(ix)) ⊆ T by (b).
Now let z ∈ σ(x). By 13.2.5, $e^{iz} \in \sigma(\exp(ix))$. Since $|e^{iz}| = 1$, z ∈ R.
For a normal element x, the converses of (b) and (c) hold (Ex. 13.37). Moreover, if x is
self-adjoint, then x ≥ 0 iff σ(x) ⊆ R+ (Ex. 13.51). The proofs use the functional calculus
developed in §13.6.
13.2.9 Proposition. If x ∈ A is normal, then kxk = r(x).
Proof. If x is self-adjoint, then $\|x^2\| = \|x\|^2$; iterating yields $\|x^{2^n}\| = \|x\|^{2^n}$. In the
general case, apply this result to the self-adjoint element $x^*x$ using $\|x^*x\| = \|x\|^2$ to
obtain
\[
\|x\|^{2^{n+1}} = \|x^*x\|^{2^n} = \|(x^*x)^{2^n}\| = \|(x^{2^n})^* x^{2^n}\| = \|x^{2^n}\|^2.
\]
The assertion now follows from 13.2.6.
Here is an application of 13.2.9 to normal operators. The formula for the special case of a
self-adjoint operator was proved in 12.1.6.
13.2.10 Corollary. Let H be a complex Hilbert space and T ∈ B(H) normal. Then
\[
\|T\| = \sup\{ |(Tx \mid x)| : \|x\| = 1 \}.
\]

Proof. Let s denote the supremum. By 13.2.9, we may choose λ ∈ σ(T ) such that |λ| = kT k.
By 12.3.7, there exists a sequence (xn ) with unit norm such that kT xn − λxn k → 0. Then

(T xn | xn ) = (T xn − λxn | xn ) + (λxn | xn ) = (T xn − λxn | xn ) + λ → λ.

Therefore, s ≥ | (T xn | xn ) | → |λ| = kT k ≥ s.

Exercises
13.18 [↓ 13.6.1] Let A and B be unital Banach algebras and Φ : A → B a homomorphism that maps
the identity onto the identity. Show that $\sigma(\Phi(x)) \subseteq \sigma(x)$.

13.19 Let A and B be unital C∗-algebras and Φ : A → B a ∗-homomorphism that maps the identity
onto the identity. Show that $\|\Phi(x)\| \le \|x\|$ and hence that Φ is continuous. ⟦Consider $r(x^*x)$ and
$r(\Phi(x^*)\Phi(x))$.⟧

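13.20 Let A be the finite dimensional algebra of upper triangular matrices
\[
x = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
0 & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & x_{nn}
\end{pmatrix}.
\]
Show that $\sigma(x) = \{x_{11}, x_{22}, \ldots, x_{nn}\}$.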

13.21 Let A be a unital algebra and x ∈ A invertible. Show that σ(x−1 ) = {z : z −1 ∈ σ(x)}.

13.22 Let X be a nonempty set and f ∈ B(X). Show that σ(f) = cl f(X).

13.23 Let U be an open subset of C and let A be the Banach algebra of all bounded analytic functions
on U with the sup norm. Show that for any f ∈ A, σ(f ) = cl f (U ).

13.24 Define T on L2 [0, 1] by (T f )(x) = xf (x). Find σ(T ).

13.25 [↑ 8.34] Find the spectrum of the left shift and right shift operators on $\ell^2$.

13.26 Let A be a unital Banach algebra and x, y ∈ A. Prove:


(a) e − xy is invertible iff e − yx is invertible. ⟦If $z := (e - xy)^{-1}$, consider e + yzx.⟧
(b) σ(xy) \ {0} = σ(yx) \ {0}.
(c) r(xy) = r(yx).
(d) For the shift operators on $\ell^2(\mathbb{N})$, $\sigma(T_r T_\ell) \ne \sigma(T_\ell T_r)$.

13.27 Let x, y ∈ A such that xy = yx. Show that r(xy) ≤ r(x)r(y) and that equality holds if
x = y.

13.28 (Resolvent identity). The resolvent function of a member x of a unital Banach algebra is
the function R(z) = (ze − x)−1 , z ∈ ρ(x). Verify that R(z) − R(w) = (w − z)R(z)R(w).

13.29 Consider the Banach algebra C 1 [0, 1] of Ex. 13.9. Let f (x) = x. Show that r(f ) = 1 < kf k.

13.30 Let A be a unital C∗-algebra and B a closed C∗-subalgebra of A containing the identity. Let
x ∈ B. Obviously, $\sigma_A(x) \subseteq \sigma_B(x)$, hence $\rho_B(x) \subseteq \rho_A(x)$. Carry out the following steps to
prove that $\sigma_B(x) \subseteq \sigma_A(x)$ and hence that $\sigma_A(x) = \sigma_B(x)$.
(a) If U ⊆ V are open subsets of C and V ∩ bd(U) = ∅, then every component of U is a
component of V. ⟦If $U'$ is a component of U, then bd $U'$ ⊆ bd U.⟧
(b) If z is a boundary point of the open set $\rho_B(x)$ and $z_n \in \rho_B(x)$ with $z_n \to z$, then
$\|(z_n e - x)^{-1}\| \to \infty$. ⟦Use Ex. 13.13.⟧
(c) $\rho_A(x) \cap \operatorname{bd} \rho_B(x) = \emptyset$.
(d) $\sigma_B(x)$ is the union of $\sigma_A(x)$ and certain bounded components of $\rho_A(x)$.
(e) If x is self-adjoint, then $\sigma_A(x) = \sigma_B(x)$. ⟦$\rho_A(x)$ is connected.⟧
(f) If x is invertible in A, it is invertible in B. ⟦$x^*x$ is invertible in B.⟧
(g) $\sigma_A(x) = \sigma_B(x)$.

13.3 The Spectrum of an Algebra

In this section, A denotes a commutative, unital Banach algebra.

Characters
A character of A is a homomorphism χ from A into C that is not identically zero. Thus
χ(e) ≠ 0, and it follows from the calculation $\chi(e) = \chi(e^2) = \chi(e)^2$ that χ(e) = 1. The
collection of all characters of A is called the spectrum or character space of A and is
denoted by σ(A). For example, if X is a topological space and x ∈ X, then the mapping
$f \mapsto f(x)$ is a character of the Banach algebra $C_b(X)$.
13.3.1 Proposition. If χ is a character, then χ is continuous and kχk ≤ 1.
Proof. Let x ∈ A and suppose that |χ(x)| > kxk. Set α = 1/χ(x). Then kαxk < 1, so
e − αx is invertible. Denote the inverse by y, so that y − αyx = y(e − αx) = e. But
then 1 = χ(e) = χ(y) − αχ(y)χ(x) = χ(y) − χ(y) = 0. Therefore, |χ(x)| ≤ kxk, hence
kχk ≤ 1.
The preceding proposition shows that σ(A) is a subset of the closed unit ball of A 0 . As
such it inherits the weak∗ topology of A 0 , also called the Gelfand topology of σ(A).
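
As an illustration, consider the group algebra $\ell^1(\mathbb{Z})$ of 13.1.1(f). It is a standard fact, not established in this section, that for each real θ the map $x \mapsto \sum_n x_n e^{in\theta}$ is a character of this algebra. The Python sketch below checks multiplicativity and the bound of 13.3.1 on finitely supported sequences; the dictionary representation, the sample data, and the helper names are assumptions made for this example.

```python
import cmath


def convolve(x, y):
    """Convolution (x * y)(n) = sum_k x(n - k) y(k) for finitely supported sequences."""
    out = {}
    for i, xi in x.items():
        for j, yj in y.items():
            out[i + j] = out.get(i + j, 0) + xi * yj
    return out


def norm1(x):
    """The l^1 norm: sum of absolute values of the entries."""
    return sum(abs(v) for v in x.values())


def character(x, theta):
    """Evaluate chi_theta(x) = sum_n x_n * exp(i * n * theta)."""
    return sum(v * cmath.exp(1j * n * theta) for n, v in x.items())


x = {-1: 2.0, 0: 1.0, 3: -0.5}
y = {0: 1j, 2: 0.25}
theta = 0.7

lhs = character(convolve(x, y), theta)             # chi(x * y)
rhs = character(x, theta) * character(y, theta)    # chi(x) chi(y)
print(abs(lhs - rhs), abs(character(x, theta)), norm1(x))
```

The first printed number is zero up to rounding, and the second does not exceed the third, in agreement with 13.3.1.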
13.3.2 Example. The spectrum of C(X). Let X be a compact Hausdorff space. For x ∈ X
let $\hat{x}$ denote the character $\hat{x}(f) = f(x)$, f ∈ C(X). We show that the mapping $x \mapsto \hat{x}$ is a
homeomorphism onto the spectrum Σ := σ(C(X)) of C(X).
The mapping $x \mapsto \hat{x}$ is obviously continuous in the weak∗ topology of $C(X)'$. Moreover,
since the functions in C(X) separate points (Urysohn’s lemma), the mapping is 1-1. It
remains to verify surjectivity.
Let χ ∈ Σ. We claim that there exists $x_0 \in X$ such that $g(x_0) = 0$ for all g ∈ ker χ.
If this is not the case, then for each x ∈ X there exists $g_x \in \ker \chi$ such that $g_x(x) \ne 0$.
By continuity, there exists an open neighborhood $U_x$ of x such that $g_x \ne 0$ on $U_x$. By
compactness of X, there exist $x_1, \ldots, x_n \in X$ such that $X = U_{x_1} \cup \cdots \cup U_{x_n}$. Set $g_j = g_{x_j}$.
The function $g := \sum_{j=1}^{n} g_j \overline{g_j} = \sum_{j=1}^{n} |g_j|^2$ is then positive on X and hence invertible in
C(X). On the other hand, $\chi(g) = \sum_{j=1}^{n} \chi(g_j) \chi(\overline{g_j}) = 0$, impossible for an invertible element.
This verifies the claim.
Now let f ∈ C(X). Then h := f − χ(f) · 1 ∈ ker χ, hence $h(x_0) = 0$ and so $\hat{x}_0(f) = \chi(f)$.
Therefore, the mapping $x \mapsto \hat{x}$ is surjective. ♦

Maximal Ideals
A maximal ideal of A is a proper ideal that is not contained in a larger proper ideal.
Here is an interesting and illuminating example.
13.3.3 Example. Let X be a (nontrivial) compact Hausdorff space. For a subset Y of X,
set $I_Y := \{f \in C(X) : f(Y) = 0\}$. Then $I_Y$ is easily seen to be a proper ideal of C(X). We
show that $I_Y$ is maximal iff Y is a singleton.
To show that $I_y := I_{\{y\}}$ is maximal, suppose that $I_y$ is properly contained in an ideal I and let
$f \in I \setminus I_y$, so that f(y) ≠ 0. Define g(x) = f(x) − f(y) (x ∈ X). Then g(y) = 0, hence
$g \in I_y \subseteq I$. It follows that the nonzero constant function f − g = f(y) is in I, hence
I = C(X). Therefore, $I_y$ is maximal.
Conversely, if Y has more than one element and y ∈ Y, then by Urysohn’s lemma we can
construct a function $f \in I_y \setminus I_Y$. Then $I_Y$ is properly contained in $I_y$, so is not maximal. ♦

13.3.4 Proposition. Every proper ideal I is contained in a maximal ideal and every
maximal ideal is closed.
Proof. Partially order the collection of proper ideals of A containing I by inclusion. The
union J of a chain of proper ideals containing I is an ideal containing I and is proper
since e ∉ J. Therefore, J is an upper bound for the chain. By Zorn’s lemma, I is contained
in a maximal ideal.
For the second part of the proposition, let M be a maximal ideal that is not closed. Then
M is properly contained in cl(M). But cl(M) is a proper ideal (Ex. 13.8), contradicting the
maximality of M.
Recall that the quotient space A/I of A by a closed ideal I is a Banach algebra and
the quotient map Q : A → A/I is a continuous homomorphism (Ex. 13.7). The following
theorem will be needed in the proof of 13.3.6 below.

13.3.5 Theorem. A closed ideal I in A is maximal iff A/I is a field.


Proof. Assume that I is maximal. Let $Q = Q_I : A \to A/I$ denote the quotient map. If x ∈ A
and Q(x) is not invertible, then the ideal J := Q(Ax) = Q(A)Q(x) of A/I cannot contain
the identity and so is a proper ideal of A/I. It follows that $Q^{-1}(J)$ is a proper ideal of A
containing $I = Q^{-1}(0)$, so $Q^{-1}(J) = I$ by maximality of I. Since $x \in Q^{-1}(J)$, Q(x) = 0.
Therefore, the only non-invertible member of A/I is the zero element, hence A/I is a
field.
Conversely, suppose A/I is a field and I is not maximal. Let M be a maximal ideal
properly containing I. Then there exists a continuous homomorphism T : A/I → A/M
such that $T Q_I = Q_M$. Let x ∈ M \ I, so that $Q_M x = 0$ and $Q_I x \ne 0$. Then $(Q_I x)^{-1}$
exists and so
\[
Q_M e = T(Q_I x)\, T\big( (Q_I x)^{-1} \big) = (Q_M x)\, T\big( (Q_I x)^{-1} \big) = 0,
\]
impossible since A/M is nontrivial. Therefore, I must be maximal.


13.3.6 Theorem. The mapping χ → ker χ is a 1-1 correspondence between the spectrum
of A and the collection of all maximal ideals of A.
Proof. Since χ(e) = 1, ker χ is a proper ideal of A. Let I be an ideal properly containing
ker χ and let x ∈ I \ ker χ. Then every member a of A may be written
\[
a = \chi(a)\chi(x)^{-1} x + \big[ a - \chi(a)\chi(x)^{-1} x \big].
\]
The first term on the right is in I and the second is a member of ker χ ⊆ I. Therefore,
a ∈ I, proving that I = A. Therefore, ker χ is maximal.
Now let M be any maximal ideal. By 13.3.5, A/M is a field, hence, by the Gelfand-Mazur
theorem, A/M = {ze + M : z ∈ C}. Now define $\chi_0(ze + M) = z$ and set $\chi = \chi_0 \circ Q_M$.
Then χ is a character with kernel M.
Finally, if $\ker \chi_1 = \ker \chi_2$, then $c\chi_1 = \chi_2$ for some c ∈ C (0.2.3). Since $\chi_1(e) = \chi_2(e) = 1$,
$\chi_1 = \chi_2$.
Because of the 1-1 correspondence in 13.3.6, the spectrum σ(A) of A is also called the
maximal ideal space of A.

Exercises
13.31 Show that $G_A^c$ is the union of all maximal ideals in A.
13.32 Let A and B be commutative, unital Banach algebras and Φ : A → B a surjective homomorphism.
Prove: if M is a maximal ideal in A, then Φ(M) is a maximal ideal in B.
13.33 [↑ 13.14] Show that the evaluation mapping $z \mapsto \hat{z}$ is a homeomorphism from cl D onto the spectrum
Σ := σ(A(D)) of the disk algebra. ⟦For surjectivity, let χ ∈ Σ and show that there exists a
z ∈ cl D such that χ(P) = P(z) for every polynomial P on cl D.⟧
13.34 Let A be a unital Banach algebra, x ∈ A and let B be the closed subalgebra of A generated
by x and e. Show that the map F(χ) = χ(x) defines a homeomorphism from $\Sigma_B$ onto $\sigma_B(x)$.
13.35 Let χ be a linear functional on A with χ(e) ≠ 0. Prove that the following are equivalent.
(a) χ is a character of A. (b) ker χ is an ideal. (c) ker χ is a subalgebra and χ(e) = 1.
13.36 The radical of A is the intersection of all maximal ideals in A. Prove that the radical consists
of all x ∈ A such that $\lim_n (cx)^n = 0$ for all c ∈ C.

13.4 Gelfand Theory


The Representation Theorem
Here is the main result of the chapter. Applications illustrating the broad utility of the
theorem are given below and in §13.6.
13.4.1 Gelfand Representation Theorem. Let A be a unital, commutative Banach
algebra with spectrum Σ := σ(A).
(a) Σ is a weak∗-compact subset of the closed unit ball in $A'$.
(b) For each x ∈ A, the function $\hat{x}$ on Σ defined by $\hat{x}(\chi) = \chi(x)$ is a continuous function
such that $\hat{x}(\Sigma) = \sigma(x)$, hence $\|\hat{x}\|_\infty = r(x) \le \|x\|$.
(c) The map $x \mapsto \hat{x}$ is a continuous homomorphism from A into C(Σ).
(d) A member x of A is invertible iff $\hat{x}$ is invertible in C(Σ).
(e) If A is a C∗-algebra, then $x \mapsto \hat{x}$ is an isometric ∗-isomorphism of A onto C(Σ).
Proof. (a) Since the pointwise limit of a net of characters is a character, Σ is a w∗-closed
subset of the closed unit ball of $A'$ and so is w∗-compact by Alaoglu’s theorem.
(b) Clearly, $\hat{x}$ is w∗-continuous. If z ∈ σ(x), then e ∉ (ze − x)A, hence (ze − x)A is
a proper ideal and so is contained in a maximal ideal. Therefore, χ(ze − x) = 0 for some
character χ, hence $z = \hat{x}(\chi) \in \hat{x}(\Sigma)$. Conversely, if z ∉ σ(x), then ze − x is not contained
in any maximal ideal, hence χ(x) ≠ z for all χ, that is, $z \notin \hat{x}(\Sigma)$. Therefore, $\hat{x}(\Sigma) = \sigma(x)$.
Part (c) is clear. For (d), if x is invertible, then $1 = \widehat{xx^{-1}} = \hat{x}\, \widehat{x^{-1}}$, hence $(\hat{x})^{-1} = \widehat{x^{-1}}$.
Conversely, if $\hat{x}$ is invertible in C(Σ), then $0 \notin \hat{x}(\Sigma) = \sigma(x)$, so x is invertible.
(e) Every element of a commutative C∗-algebra is normal, so by 13.2.9, $r(x) = \|x\|$, hence by (b) the map $x \mapsto \hat{x}$ is an isometry. We prove next
that $\widehat{x^*} = \overline{\hat{x}}$. By 13.2.8, if x is self-adjoint, then the range σ(x) of $\hat{x}$ is real, so the desired
equation holds trivially. In the general case write x = y + iz, where $y = (x + x^*)/2$ and
$z = (x - x^*)/2i$. Since these are self-adjoint, $x^* = y - iz$, hence $\widehat{x^*} = \hat{y} - i\hat{z} = \overline{\hat{x}}$. We
have shown that $x \mapsto \hat{x}$ is an isometric ∗-isomorphism of A into C(Σ). It follows that $\hat{A}$
is a conjugate closed, norm closed subalgebra of C(Σ). Since $\hat{A}$ separates points of Σ and
contains the constant functions, $\hat{A} = C(\Sigma)$ by the Stone-Weierstrass theorem.

The function $\hat{x} \in C(\Sigma)$ in the representation theorem is called the Gelfand transform
of x. The map
\[
\Gamma : A \to C(\Sigma), \qquad \Gamma(x) = \hat{x},
\]
is called the Gelfand transform of A.

Application: The Stone-Čech Compactification


In this subsection, we apply the Gelfand representation theorem to unital $C^*$-subalgebras $F$ of $C_b(S)$, where $S$ is a topological space. We denote by $S^F$ the spectrum of $F$ with the Gelfand topology. For each $x \in S$, the mapping $\hat{x}$ on $F$ defined by
\[ \hat{x}(f) := f(x), \quad f \in F, \]
is clearly a member of $S^F$. The Gelfand transform $\hat{f} \in C(S^F)$ of $f \in F$ then satisfies $\hat{f}(\hat{x}) = \hat{x}(f) = f(x)$. The mapping
\[ \iota_S : S \to S^F, \quad \iota_S(x) = \hat{x}, \]
is called the canonical mapping from $S$ to $S^F$. The Gelfand representation theorem yields a simple proof of the following generalization of the Stone-Čech compactification theorem:
13.4.2 Theorem. Let $S$ be a topological space and $F$ a unital $C^*$-subalgebra of $C_b(S)$.
(a) $S^F$ is a compact Hausdorff topological space and $\iota_S$ is a continuous function from $S$ onto a dense subset of $S^F$.
(b) The adjoint map $\iota_S^* : C(S^F) \to F$ is a surjective isometric isomorphism.
(c) Let $T$ be a topological space, $G$ a unital $C^*$-subalgebra of $C_b(T)$, and $\varphi : S \to T$ a continuous function such that the dual map $\varphi^* : C_b(T) \to C_b(S)$ maps $G$ into $F$. Then there exists a continuous map $\tilde{\varphi} : S^F \to T^G$ such that the following diagram commutes:
\[
\begin{array}{ccc}
S^F & \xrightarrow{\ \tilde{\varphi}\ } & T^G \\
{\scriptstyle \iota_S}\big\uparrow & & \big\uparrow{\scriptstyle \iota_T} \\
S & \xrightarrow{\ \varphi\ } & T
\end{array}
\]
In particular, $S^F$ is unique up to homeomorphism.


Proof. The Gelfand transform $f \mapsto \hat{f} : F \to C(S^F)$ is an isometric isomorphism onto $C(S^F)$. To see that the mapping $\iota_S$ is continuous, let $x_\alpha \to x$ and note that $\hat{f}(\hat{x}_\alpha) = f(x_\alpha) \to f(x) = \hat{f}(\hat{x})$. Since $S^F$ is compact and every function in $C(S^F)$ is of the form $\hat{f}$, it follows that $\hat{x}_\alpha \to \hat{x}$ in $S^F$. From the identity $\iota_S^*(\hat{f})(x) = \hat{f}(\hat{x}) = f(x)$ we see that $\iota_S^*$ is the inverse of the Gelfand transform. Since every function $\hat{f}$ that is zero on $\iota_S(S)$ is identically zero, $\iota_S(S)$ is dense in $S^F$. This proves (a) and (b).
For (c), given $\chi \in S^F$ define $\tilde{\varphi}(\chi)$ to be the character $g \mapsto \chi(\varphi^*(g))$ on $G$. In particular, for $x \in S$ we have
\[ \tilde{\varphi}(\hat{x})(g) = \hat{x}(g \circ \varphi) = g(\varphi(x)) = \widehat{\varphi(x)}(g), \quad g \in G, \]
hence $\tilde{\varphi} \circ \iota_S = \iota_T \circ \varphi$. Clearly, $\tilde{\varphi}$ is continuous in the Gelfand topology.

The mapping $\iota = \iota_S$ need not be a homeomorphism or even 1-1. However, if $S$ is completely regular and $F = C_b(S)$, then $\iota$ does have these properties. Indeed, that $\iota$ is 1-1 in this case follows from the fact that $C_b(S)$ separates points of $S$. To see that $\iota$ is an open map onto $\iota(S)$ and hence a homeomorphism, let $U \subseteq S$ be open and $x_1 \in U$. By the complete regularity of $S$, there exists $f \in C_b(S)$ such that $f(x_1) = 1$ and $f = 0$ on $S \setminus U$. The set $\{\hat{x} : f(x) = \langle f, \hat{x} \rangle > 0\}$ is then a weak$^*$ neighborhood of $\hat{x}_1$ in $\iota(S)$ that is contained in $\iota(U)$. Thus $\iota(U)$ is open in $\iota(S)$. We have proved the following classical result:
13.4.3 Theorem (M. Stone, E. Čech). If S is completely regular and F = Cb (S), then
ιS : S → ιS (S) is a homeomorphism.
The space $S^{C_b(S)}$ in the preceding theorem is called the Stone-Čech compactification of $S$ and is denoted by $\beta S$. We shall also use this notation for the spectrum of $C_b(S)$ for any topological space $S$.

Application: Wiener’s Theorem


An absolutely convergent trigonometric series is an infinite series of the form
\[ f_x(t) = \sum_{k=-\infty}^{\infty} x_k e^{ikt}, \quad t \in \mathbb{R}, \quad \text{where } \|x\|_1 = \sum_{k=-\infty}^{\infty} |x_k| < \infty. \]

In this subsection we use the Gelfand representation theorem to prove the following classical
result:
13.4.4 Theorem (N. Wiener). The reciprocal of an absolutely convergent, nonvanishing
trigonometric series is an absolutely convergent trigonometric series.
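Proof. (Gelfand). We apply the representation theorem to the unital, commutative Banach algebra $\ell^1(\mathbb{Z})$ (13.1.1(f)). We claim that the characters of $\ell^1(\mathbb{Z})$ are the functions $\chi_z$ defined by
\[ \chi_z(x) = \sum_{n=-\infty}^{\infty} x_n z^n, \quad x \in \ell^1(\mathbb{Z}),\ z \in \mathbb{T}. \]
Clearly $\chi_z(e_0) = 1$. The calculation
\[ \chi_z(x * y) = \sum_{n=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} x_{n-k}\, y_k\, z^n = \sum_{k=-\infty}^{\infty} y_k z^k \sum_{n=-\infty}^{\infty} z^{n-k} x_{n-k} = \chi_z(x)\chi_z(y) \]
then shows that $\chi_z$ is a character. Now let $\chi$ be any character and define $e_n$ ($n \in \mathbb{Z}$) by $e_n(k) = \delta_{nk}$. Then $x = \sum_{n=-\infty}^{\infty} x_n e_n$ in the $\ell^1$-norm. Moreover, from the identity $e_n * e_m = e_{m+n}$ we see that $e_1 * \cdots * e_1 = e_n$ ($n$ factors) and $e_n * e_{-n} = e_0$, hence $\chi(e_n) = \chi(e_1)^n$ for all $n \in \mathbb{Z}$. Set $z = \chi(e_1)$. Then $z \in \mathbb{T}$, because $|\chi(e_1)| \le 1$ and $|\chi(e_1)|^{-1} = |\chi(e_{-1})| \le 1$, and we have
\[ \chi(x) = \sum_{n=-\infty}^{\infty} x_n \chi(e_n) = \sum_{n=-\infty}^{\infty} x_n z^n, \]
that is, $\chi = \chi_z$, verifying the claim.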


The spectrum of $\ell^1(\mathbb{Z})$ may now be identified with $\mathbb{T}$ under the bijection $\chi_z \mapsto z$. Consequently, the Gelfand transform $\hat{x}$ of $x \in \ell^1(\mathbb{Z})$ may be written as
\[ \hat{x}(z) = \hat{x}(\chi_z) = \sum_{n=-\infty}^{\infty} x_n z^n, \quad z \in \mathbb{T}. \]
Thus the absolutely convergent trigonometric series are precisely the Gelfand transforms $\hat{x}$, evaluated at $z = e^{it}$. If $\hat{x}$ is never zero, then by (d) of the representation theorem the reciprocal $1/\hat{x}$ is the Gelfand transform of a member of $\ell^1(\mathbb{Z})$, proving the theorem.
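
Wiener's theorem can be checked by hand in a simple case. The sketch below is an illustration, not part of the text. It takes $x = 2e_0 + e_1$, whose transform $2 + z$ has no zeros on the unit circle, and uses the geometric series to guess the inverse $y_n = \tfrac12(-\tfrac12)^n$ for $n \ge 0$. The truncation order `N` and the helper names `convolve` and `gelfand` are assumptions of this sketch.

```python
from cmath import exp, pi

N = 60  # truncation order for the candidate inverse (an assumption of this sketch)

# x = 2*e_0 + e_1, so x_hat(z) = 2 + z, which never vanishes for |z| = 1.
x = {0: 2.0, 1: 1.0}
# Coefficients of 1/(2 + z) = (1/2) * sum_{n >= 0} (-z/2)^n, truncated at N terms.
y = {n: 0.5 * (-0.5) ** n for n in range(N)}

def convolve(a, b):
    """Convolution of finitely supported sequences stored as {index: value}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0.0) + ai * bj
    return out

def gelfand(a, z):
    """Gelfand transform a_hat(z) = sum_n a_n z^n for z on the unit circle."""
    return sum(an * z ** n for n, an in a.items())

product = convolve(x, y)
# l^1 distance between x * y and the identity e_0; only truncation error remains.
error = sum(abs(v - (1.0 if n == 0 else 0.0)) for n, v in product.items())
l1_norm_y = sum(abs(v) for v in y.values())

z = exp(1j * pi / 3)  # a sample point of the circle
pointwise_gap = abs(gelfand(y, z) - 1 / gelfand(x, z))
print(error, l1_norm_y, pointwise_gap)
```

The finite $\ell^1$ norm of $y$ is the point of the theorem: the reciprocal series is again absolutely convergent.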

Exercises
13.37 Let $A$ be a unital $C^*$-algebra and $x \in A$ normal. Prove the following:
(a) $x$ is unitary iff $\sigma(x) \subseteq \mathbb{T}$.
(b) $x$ is self-adjoint iff $\sigma(x) \subseteq \mathbb{R}$.
(c) $x$ is a projection iff $\sigma(x) \subseteq \{0, 1\}$.
(d) If $A$ is commutative, then $x \ge 0$ iff $\sigma(x) \subseteq \mathbb{R}^+$. (See Ex. 13.51 for a strengthened version.)
⟦Consider $C^*(x)$.⟧

13.38 Let $A$ be a unital commutative $C^*$-algebra and let $x \in A$ be a projection such that $x \ne 0$ and $x \ne e$. Show that the spectrum of $x$ is disconnected.

13.39 Show that the spectrum of $x \in \ell^1(\mathbb{Z})$ consists of all numbers $\sum_{n=-\infty}^{\infty} x_n z^n$ with $z \in \mathbb{T}$.

13.40 Let $A$ be a unital Banach algebra and $x, y \in A$ with $xy = yx$. Prove: $r(x + y) \le r(x) + r(y)$.

13.41 Let $A$ be the Banach algebra $C^1[0, 1]$ with the norm $\|f\| = \|f\|_\infty + \|f'\|_\infty$. One may argue exactly as in 13.3.2 that the mapping $x \mapsto \hat{x}$ is a homeomorphism from $[0, 1]$ onto the spectrum of $A$, so the spectrum may be identified with $[0, 1]$. Show that the Gelfand transform $\Gamma : A \to C[0, 1]$ is neither surjective nor an isometry.

*13.5 The Non-unital Case


The Unitization of a Banach Algebra
Let $A$ be a non-unital, commutative Banach algebra. The spectrum $\sigma(A)$ of $A$ is defined exactly as in the unital case, namely, as the set of all non-identically zero homomorphisms $\chi : A \to \mathbb{C}$. To make use of the Gelfand machinery, we adjoin an identity to $A$ as follows: Let $A_1 = A \times \mathbb{C}$ be the product vector space and define multiplication in $A_1$ by
\[ (x, a)(y, b) = (xy + ay + bx,\ ab), \quad x, y \in A,\ a, b \in \mathbb{C}. \]
Then $A_1$ is an algebra with identity $(0, 1)$. Moreover, $(x, a) = (x, 0) + (0, a)(0, 1)$, so identifying $A \times \{0\}$ with $A$ and $\{0\} \times \mathbb{C}$ with $\mathbb{C}$ we may write $(x, a) = x + a$. With this algebraic identification, $A$ is a maximal ideal in $A_1$. Moreover, it is easy to check that $A_1$ is a Banach algebra under the norm
\[ \|(x, a)\| = \|x\| + |a| \]
and that $A$ is isometrically isomorphic to $A \times \{0\}$. (In Ex. 13.43, the reader is asked to verify these assertions.) The algebra $A_1$ is called the unitization of $A$.
The spectrum of $A_1$ is related to that of $A$ as follows: For $\chi \in \sigma(A)$ define
\[ \chi_1(x, a) = \chi(x) + a. \tag{13.3} \]
Then $\chi_1$ is easily seen to be a character of $A_1$ (Ex. 13.44). In particular, $|\chi(x)| = |\chi_1(x, 0)| \le \|\chi_1\| \, \|(x, 0)\| = \|x\|$, so $\|\chi\| \le 1$. Thus a character of $A$ is a member of the closed unit ball of $A'$. The spectrum of $A$ may not be closed, but it is the case that $\sigma(A) \cup \{0\}$ is closed and hence weak$^*$ compact. Indeed, if $(\chi_\alpha)$ is a net in $\sigma(A)$ and $\chi_\alpha \to \varphi$ in the weak$^*$ topology, then $\varphi$ is easily seen to be a homomorphism, hence either $\varphi = 0$ or $\varphi \in \sigma(A)$. Now let $\varphi$ be any member of $\sigma(A_1)$. Then
\[ \varphi(x, a) = \varphi(x, 0) + \varphi(0, a) = \varphi(x, 0) + a. \]
The map $x \mapsto \varphi(x, 0)$ is either a character of $A$ or the zero homomorphism. In the former case, $\varphi$ is of the form $\chi_1$ as in (13.3), and in the latter case $\varphi$ is the character $\varphi_0(x + a) := a$. Thus we see that
\[ \sigma(A_1) = \{\chi_1 : \chi \in \sigma(A)\} \cup \{\varphi_0\} \quad \text{and} \quad \sigma(A_1)\big|_A = \sigma(A) \cup \{0\}. \tag{13.4} \]
In this way we may identify $\sigma(A_1)$ with $\sigma(A) \cup \{0\}$. From (13.3), the Gelfand transforms $\Gamma : A \to C(\sigma(A))$ and $\Gamma_1 : A_1 \to C(\sigma(A_1))$ are related by
\[ \Gamma_1(x, a)(\chi_1) = \Gamma(x)(\chi) + a. \]
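
The formulas of this subsection are easy to test numerically. The following minimal sketch is not from the text. It models elements of $A$ as pairs of complex numbers with pointwise product and sup norm, which is purely a stand-in (that algebra happens to be unital, but the unitization formulas never use this). The names `u_mul`, `u_norm`, `chi`, and `chi1` are hypothetical helpers; `chi` is evaluation at the first coordinate.

```python
# Stand-in for A: pairs of complex numbers, pointwise product, sup norm (an assumption).
def a_add(x, y):
    return tuple(s + t for s, t in zip(x, y))

def a_scale(c, x):
    return tuple(c * s for s in x)

def a_mul(x, y):
    return tuple(s * t for s, t in zip(x, y))

def a_norm(x):
    return max(abs(s) for s in x)

def u_mul(p, q):
    """(x, a)(y, b) = (xy + ay + bx, ab) in the unitization A_1 = A x C."""
    (x, a), (y, b) = p, q
    return (a_add(a_mul(x, y), a_add(a_scale(a, y), a_scale(b, x))), a * b)

def u_norm(p):
    """||(x, a)|| = ||x|| + |a|."""
    x, a = p
    return a_norm(x) + abs(a)

def chi(x):
    """A character of the stand-in algebra: evaluation at the first coordinate."""
    return x[0]

def chi1(p):
    """The extension chi_1(x, a) = chi(x) + a of (13.3)."""
    x, a = p
    return chi(x) + a

def distance(p, q):
    """Unitization norm of p - q."""
    (x, a), (y, b) = p, q
    return u_norm((a_add(x, a_scale(-1, y)), a - b))

one = ((0j, 0j), 1 + 0j)                 # the adjoined identity (0, 1)
p = ((1 + 2j, -3j), 0.5j)
q = ((2 - 1j, 4 + 0j), -2 + 1j)
r = ((0.5 + 0j, 1j), 3 + 0j)

print(distance(u_mul(one, p), p))                         # (0, 1) acts as identity
print(u_norm(u_mul(p, q)) <= u_norm(p) * u_norm(q))       # submultiplicative norm
print(abs(chi1(u_mul(p, q)) - chi1(p) * chi1(q)))         # chi_1 is multiplicative
```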

The Non-unital Representation Theorem


13.5.1 Theorem. Let $A$ be a non-unital, commutative Banach algebra. Then
(a) The spectrum $\Sigma := \sigma(A)$ is locally compact in the $w^*$-topology.
(b) The Gelfand transform $\Gamma$ is a continuous algebra homomorphism of $A$ into $C_0(\Sigma)$.
(c) For all $x \in A$, $\|\hat{x}\|_\infty = \lim_n \|x^n\|^{1/n}$.
(d) If $A$ is a $*$-algebra and $\widehat{x^*} = \overline{\hat{x}}$ for all $x$, then $\Gamma(A)$ is dense in $C_0(\Sigma)$.

Proof. Since $\sigma(A) \cup \{0\}$ is weak$^*$ compact and since removing a point from a compact space produces a locally compact space, we see that $\sigma(A)$ is locally compact, proving (a). Thus if $\sigma(A)$ is not compact, then $\sigma(A) \cup \{0\}$ is the one-point compactification of $\sigma(A)$.
Part (b) is clear. To prove (c) recall that, by the unital case,
\[ \big\|\widehat{(x, 0)}\big\|_\infty = \lim_n \|(x, 0)^n\|^{1/n} = \lim_n \|x^n\|^{1/n}. \]
Furthermore,
\[ \widehat{(x, 0)}(\chi_1) = \Gamma_1((x, 0))(\chi_1) = \Gamma(x)(\chi) = \hat{x}(\chi) \quad \text{and} \quad \widehat{(x, 0)}(\varphi_0) = 0, \]
hence $\big\|\widehat{(x, 0)}\big\|_\infty = \|\hat{x}\|_\infty$. Therefore, (c) holds.
Finally, the hypothesis in (d) implies that $\Gamma(A)$ is conjugate closed. Since $\Gamma(A)$ trivially separates points and characters are not identically zero, the locally compact version of the Stone-Weierstrass theorem (0.12.13) implies that $\Gamma(A)$ is dense in $C_0(\Sigma)$.

The Spectrum of C0 (X)


Let $X$ be a noncompact, locally compact Hausdorff space. We show that the mapping $F : x \mapsto \hat{x}$ is a homeomorphism from $X$ onto the spectrum $\Sigma$ of $C_0(X)$, so that $\Sigma$ may be identified with $X$. The argument uses Example 13.3.2 applied to the one-point compactification $X_\infty = X \cup \{\infty\}$ of $X$.
The mapping $F$ is obviously a continuous injection into $\Sigma$. To show that $F$ is surjective, let $\chi \in \Sigma$ and define $\chi_\infty$ on $C(X_\infty)$ by
\[ \chi_\infty(f) = \chi(f_0) + f(\infty), \quad f_0 := f|_X - f(\infty), \quad f \in C(X_\infty). \]
Then $\chi_\infty$ agrees with $\chi$ on the space $C_0(X)$, which may be identified with the set of functions in $C(X_\infty)$ that are zero at $\infty$. We claim that $\chi_\infty$ is in the spectrum $\Sigma_\infty$ of $C(X_\infty)$. Clearly, $\chi_\infty$ is linear and $\chi_\infty(1) = 1$. From
\[ f_0 g_0 = \big(f|_X - f(\infty)\big)\big(g|_X - g(\infty)\big) = (fg)|_X - f(\infty) g_0 - g(\infty) f_0 - f(\infty) g(\infty) \]
we have
\[ (fg)_0 = (fg)|_X - f(\infty) g(\infty) = f_0 g_0 + f(\infty) g_0 + g(\infty) f_0. \]
Since $f_0, g_0 \in C_0(X)$,
\[ \chi_\infty(fg) = \chi\big((fg)_0\big) + (fg)(\infty) = \chi(f_0)\chi(g_0) + f(\infty)\chi(g_0) + g(\infty)\chi(f_0) + f(\infty) g(\infty) = \chi_\infty(f)\chi_\infty(g). \]
Therefore, $\chi_\infty \in \Sigma_\infty$ and so $\chi_\infty = \hat{\infty}$ or $\hat{x}$ for some $x \in X$. But if the former, then for all $f \in C(X_\infty)$ we have $f(\infty) = \chi_\infty(f) = \chi(f_0) + f(\infty)$, which implies that $\chi(g) = 0$ for all $g \in C_0(X)$, contrary to the definition of character. Thus $\chi_\infty = \hat{x}$ for some $x \in X$ and so $\chi = \hat{x} = F(x)$, proving that $F$ is surjective.
It remains to show that $F^{-1}$ is continuous. This follows from the implications $\hat{x}_\alpha \to \hat{x}$ in $\Sigma$ $\Rightarrow$ $f(x_\alpha) \to f(x)$ for all $f \in C_0(X)$ $\Rightarrow$ $f(x_\alpha) \to f(x)$ for all $f \in C(X_\infty)$ $\Rightarrow$ $\hat{x}_\alpha \to \hat{x}$ in $\Sigma_\infty$ $\Rightarrow$ $x_\alpha \to x$ in $X_\infty$ $\Rightarrow$ $x_\alpha \to x$ in $X$. ♦

The Spectrum of L1 (Rd )

We show that the spectrum $\Sigma$ of the Banach algebra $L^1(\mathbb{R}^d)$ (13.1.1(g)) may be identified with $\mathbb{R}^d$ as follows: For $t \in \mathbb{R}^d$, define a function $\varphi_t$ on $L^1(\mathbb{R}^d)$ by
\[ \varphi_t(f) := \int e^{i t \cdot x} f(x) \, dx. \tag{†} \]
Thus $\varphi_t(f)$ is simply a variation of the Fourier transform of $f$ evaluated at $t$ and so $\varphi_t$ is a member of $\Sigma$ (see 6.2.1(a)). Now define $F : \mathbb{R}^d \to \Sigma$, $F(t) = \varphi_t$. Then $F$ is continuous by the dominated convergence theorem. We show that $F$ is a homeomorphism, allowing us to identify the topological spaces $\mathbb{R}^d$ and $\Sigma$.
That $F$ is 1-1 follows from $(L^1, L^\infty)$ duality. Indeed, if $\varphi_{t_1} = \varphi_{t_2}$, then $e^{i x \cdot t_1} = e^{i x \cdot t_2}$ for a.a. $x$, hence, by continuity, for all $x$, and so $t_1 = t_2$. To show that $F$ is surjective, let $\varphi \in \Sigma$ and use duality again to choose $h \in L^\infty(\mathbb{R}^d)$ such that
\[ \varphi(f) = \int f(x) h(x) \, dx \quad \text{for all } f \in L^1(\mathbb{R}^d). \]
For $f \in L^1$ and $y \in \mathbb{R}^d$, set $f_y(x) = f(x - y)$. If $f, g \in L^1(\mathbb{R}^d)$ we have
\[ \varphi(f * g) = \varphi(f)\varphi(g) = \int g(y) \varphi(f) h(y) \, dy. \tag{†} \]
On the other hand, by Fubini's theorem,
\[ \varphi(f * g) = \int (f * g)(x) h(x) \, dx = \iint f(x - y) g(y) h(x) \, dy \, dx = \int g(y) \int f(x - y) h(x) \, dx \, dy = \int g(y) \varphi(f_y) \, dy. \tag{‡} \]
Therefore, from (†) and (‡),
\[ \int g(y) \varphi(f) h(y) \, dy = \int g(y) \varphi(f_y) \, dy \quad \forall\, g \in L^1(\mathbb{R}^d). \]
It follows that for each $f \in L^1$, $\varphi(f) h(y) = \varphi(f_y)$ for a.a. $y$. Choosing $f$ so that $\varphi(f) \ne 0$, we then have $h(y) = \varphi(f)^{-1} \varphi(f_y)$ for a.a. $y$. The right side of this equation is then a continuous version of $h$. Replace $h$ by this version and note that $h$, which is uniquely determined by $\varphi$, does not depend on $f$. Thus for all $f \in L^1$ and $y \in \mathbb{R}^d$, $\varphi(f) h(y) = \varphi(f_y)$ and so
\[ \varphi(f) h(x + y) = \varphi(f_{x+y}) = \varphi((f_x)_y) = \varphi(f_x) h(y) = \varphi(f) h(x) h(y). \]
Therefore, $h(x + y) = h(x) h(y)$ for all $x, y \in \mathbb{R}^d$. Since $h$ is continuous, $h(x) = \exp(i t \cdot x)$ for some $t \in \mathbb{R}^d$ (Ex. 13.42) and so $\varphi = \varphi_t$ from (†). Thus $F$ is surjective.
It remains to show that $F^{-1}$ is continuous. For ease of notation we do this for the case $d = 1$; the proof for the general case is entirely similar. Let $t \in \mathbb{R}$ and let $(t_\alpha)$ be a net in $\mathbb{R}$ such that
\[ F(t_\alpha)(f) - F(t)(f) = \int (e^{i t_\alpha x} - e^{i t x}) f(x) \, dx \to 0 \quad \text{for all } f \in L^1. \tag{‡} \]
Replacing $f(x)$ by $e^{-itx} f(x)$, we may take $t = 0$. Thus we must show that $t_\alpha \to 0$. Taking $f = 1_{[0,1]}$ in (‡) and integrating shows that the net $(t_\alpha)$ must be bounded. Let $(t_\beta)$ be any convergent subnet, say $t_\beta \to s$. Then $e^{i t_\beta x} \to e^{i s x}$ uniformly in $x \in [0, 1]$, and taking $f = (e^{-i t_\beta x} - 1) 1_{[0,1]}$ in (‡) we see that
\[ \int_0^1 |e^{i s x} - 1|^2 \, dx = 0. \]
Therefore, $e^{i s x} = 1$ for all $x \in [0, 1]$, which is possible only if $s = 0$. This shows that $t_\alpha \to 0$, completing the argument. ♦

Exercises
13.42 Let $f \in C_b(\mathbb{R}^d)$ such that $f(x) \ne 0$ and $f(x + y) = f(x) f(y)$ for all $x, y \in \mathbb{R}^d$. Carry out the following steps to prove that there exists $t \in \mathbb{R}^d$ such that $f(x) = \exp(i t \cdot x)$ for all $x$.
(a) There exists $a > 0$ such that $\alpha := \int_0^a \cdots \int_0^a f(y_1, \ldots, y_d) \, dy_1 \ldots dy_d \ne 0$.
(b) $\alpha f(x) = \int_{x_1}^{a + x_1} \cdots \int_{x_d}^{a + x_d} f(y_1, \ldots, y_d) \, dy_1 \ldots dy_d$, hence $f$ is continuously differentiable.
(c) There exists $z_j \in \mathbb{C}$ such that $\partial_j f(0, \ldots, 0, x_j, 0, \ldots, 0) = z_j f(0, \ldots, 0, x_j, 0, \ldots, 0)$.
(d) $f(0, \ldots, 0, x_j, 0, \ldots, 0) = e^{z_j x_j}$, $z_j = i t_j$, $t_j$ real. Draw the desired conclusion.

13.43 Let $A$ be a Banach algebra and $A_1$ the unitization of $A$. Prove the following:
(a) $A_1$ is an algebra with identity $1 := (0, 1)$.
(b) $(x, a) = (x, 0) + (0, a)(0, 1)$, so that identifying $A \times \{0\}$ with $A$ and $\{0\} \times \mathbb{C}$ with $\mathbb{C}$ we may write $(x, a) = x + a$.
(c) $A_1$ is commutative iff $A$ is commutative.
(d) $A_1$ is a Banach algebra with the norm $\|(x, a)\| = \|x\| + |a|$ and $A$ is isometrically isomorphic to $A \times \{0\}$.
(e) A is a maximal ideal of A1 .

13.44 Let A be a nonunital commutative Banach algebra with spectrum Σ and let A1 be the unitization
of A. Prove that χ1 is a character of A1 and that the mapping χ → χ1 is an injection from Σ
into the spectrum Σ1 of A1 .

13.6 Operator Calculus


In this section we use the Gelfand representation theorem to prove two general results
leading to the construction of functions of a normal operator on a Hilbert space.

The Continuous Functional Calculus


13.6.1 Theorem. Let $A$ be a unital $C^*$-algebra and let $x \in A$ be normal. Then there exists an isometric $*$-isomorphism $\Psi : f \mapsto f(x)$ from $C(\sigma(x))$ onto $C^*(x)$ such that the following hold:
(a) $\Psi(z) = x$ and $\Psi(1) = e$, where $z$ denotes the identity function on $\sigma(x)$ and $1$ denotes the constant function.
(b) For any polynomial $p(z, \bar{z})$ in $z$ and $\bar{z}$ on $\sigma(x)$, $\Psi(p(z, \bar{z})) = p(x, x^*)$.
(c) $\sigma(f(x)) = f(\sigma(x))$ for every $f \in C(\sigma(x))$.
(d) The isometric $*$-isomorphism $\Psi$ is unique with respect to property (a).
(e) If $B$ is a unital $C^*$-algebra and $\Phi : A \to B$ is a $*$-homomorphism that maps the identity onto the identity, then $\sigma(\Phi(x)) \subseteq \sigma(x)$ and $\Phi(f(x)) = f(\Phi(x))$ for $f \in C(\sigma(x))$.
Proof. Set $\Sigma := \sigma(C^*(x))$. By 13.4.1, there exists an isometric $*$-isomorphism $\Gamma : C^*(x) \to C(\Sigma)$ such that $\Gamma(x) = \hat{x}$ maps $\Sigma$ continuously onto $\sigma(x)$, where $\hat{x}(\chi) = \chi(x)$. If $\chi_1(x) = \chi_2(x)$, then, by taking complex conjugates, we have $\chi_1(x^*) = \chi_2(x^*)$. By considering limits of polynomials in $(x, x^*)$, we see that $\chi_1 = \chi_2$. Therefore, $\hat{x} : \Sigma \to \sigma(x)$ is a continuous bijection of compact Hausdorff spaces and hence a homeomorphism. Now define $\Psi : C(\sigma(x)) \to C^*(x)$ by
\[ \Psi(f) = f(x) := \Gamma^{-1}(f \circ \hat{x}), \quad f \in C(\sigma(x)). \]
Then $\Psi$ is an isometric $*$-isomorphism onto $C^*(x)$. Moreover,
\[ \Psi(z) = \Gamma^{-1}(\mathrm{id}_{\sigma(x)} \circ \hat{x}) = \Gamma^{-1}(\hat{x}) = x, \]
and since $\Gamma(e)$ is the constant function $1$,
\[ \Psi(1) = \Gamma^{-1}(1 \circ \hat{x}) = \Gamma^{-1}(1) = e. \]
Also, since $\Gamma(x^*) = \overline{\Gamma(x)} = \overline{\hat{x}} = \widehat{x^*}$ we see that
\[ \Psi(\bar{z}) = \Psi\big(\overline{\mathrm{id}_{\sigma(x)}}\big) = \Gamma^{-1}\big(\overline{\hat{x}}\big) = \Gamma^{-1}\big(\widehat{x^*}\big) = x^*. \]
Thus (b) holds.
For (c) we have $z \in \sigma(f(x))$ iff $f(x) - ze$ is not invertible in $C^*(x)$ iff $f - z \cdot 1$ is not invertible in $C(\sigma(x))$ (by the isomorphism $\Psi$) iff $f(w) = z$ for some $w \in \sigma(x)$ iff $z \in f(\sigma(x))$.
To verify (d), note that any two isometric $*$-isomorphisms that satisfy (a) must agree on polynomials in $(z, \bar{z})$ and hence, by continuity and the Stone-Weierstrass theorem, must be equal.
The inclusion in (e) is Ex. 13.18. The equality in (e) clearly holds for polynomials $f$ in $z$ and $\bar{z}$. The general equality then follows from the Stone-Weierstrass theorem.
The mapping $f \mapsto f(x)$ in the theorem is known as the continuous functional calculus.

Applications to Operators on Hilbert Space


The next few results use the functional calculus $f \mapsto f(T)$ to obtain several important
properties of normal operators T on a complex Hilbert space H, some of which were stated
without proof in Chapter 11. The functional calculus provides a unified approach to the
proofs.
13.6.2 Theorem (Fuglede-Putnam). Let $S, T \in B(H)$ with $T$ normal. If $ST = TS$, then $ST^* = T^*S$.
Proof. (Rosenblum [40]). For any $Q \in C^*(T)$, the series $\exp(Q) := \sum_{n=0}^{\infty} Q^n / n!$ converges absolutely and is a member of $C^*(T)$. For $z \in \mathbb{C}$ define
\[ R(z) := \exp(-zT^*)\, S \exp(zT^*) \quad \text{and} \quad U(z) := \exp(zT^* - \bar{z}T). \]
Since $-i(zT^* - \bar{z}T)$ is self-adjoint, $U(z)$ is unitary (13.2.7). Moreover, by the functional calculus,
\[ U(-z) \exp(-\bar{z}T) = \exp(-zT^* + \bar{z}T) \exp(-\bar{z}T) = \exp(-zT^*). \]
Therefore,
\[ R(z) = U(-z) \exp(-\bar{z}T)\, S\, U(z) \exp(\bar{z}T). \tag{†} \]
Since $S$ commutes with $T$ it commutes with $\exp(-\bar{z}T)$. Furthermore, by the functional calculus, $\exp(-\bar{z}T) \exp(\bar{z}T) = I$ and $\exp(-\bar{z}T)$ commutes with $U(z)$. It follows from (†) that $R(z) = U(-z) S U(z)$. Since $U(-z) = U(z)^{-1} = U(z)^*$, for fixed $x, y \in H$ the function
\[ f_{xy}(z) := (R(z)x \mid y) = (S U(z) x \mid U(z) y) \]
is bounded on $\mathbb{C}$. We claim that $f_{xy}$ is an entire function. Assuming this for the moment, we conclude from Liouville's theorem that $f_{xy}$ is constant. Therefore,
\[ (R(z)x \mid y) = (R(0)x \mid y) = (Sx \mid y) \]
for all $x, y$, that is, $R(z) = S$ for all $z$. Thus $S \exp(zT^*) = \exp(zT^*) S$ and so by induction
\[ S \exp(zT^*)^n = \exp(zT^*)^n S \quad \text{for all } n \text{ and } z. \tag{‡} \]
Setting $c_n := (S (T^*)^n x \mid y) - ((T^*)^n S x \mid y)$, we then have by (‡)
\[ \sum_{n=0}^{\infty} \frac{z^n}{n!} c_n = 0 \quad \text{for all } z. \]
Since for some $M > 0$, $|c_n| \le M \|T^*\|^n$, the series converges uniformly on bounded sets and therefore defines an analytic function of $z$. Since the function is identically zero, the coefficients $c_n$ are zero. In particular $c_1 = 0$, which implies the desired result.
To see that $f := f_{xy}$ is entire, set $c_{m,n} := ((T^*)^n S (T^*)^m x \mid y)$ and note that
\[ f(z) = (\exp(-zT^*)\, S \exp(zT^*) x \mid y) = \sum_{m,n} \frac{(-1)^n}{n!\, m!} z^{n+m} c_{m,n}. \]
Since for some $C > 0$, $|c_{m,n}| \le C \|T^*\|^{m+n}$, the series converges uniformly on bounded sets. It follows that $f$ is entire.
13.6.3 Corollary. Let $S, T \in B(H)$ with $T$ normal. If $ST = TS$, then $S f(T) = f(T) S$ for all $f \in C(\sigma(T))$. That is, $f(T) \in \{T\}''$.
Proof. By 13.6.2, $S$ commutes with all polynomials in $T$, $T^*$. Since these generate $C^*(T)$, the assertion follows.
13.6.4 Theorem. Let $T \in B(H)$. Then $T$ is a positive operator iff $T$ is positive as an element of the $C^*$-algebra $B(H)$. In this case, there exists a unique positive square root $T^{1/2}$, that is, a unique positive operator that satisfies $(T^{1/2})^2 = T$. Moreover, if $T$ is invertible, then $T^{-1}$ is positive, $T^{1/2}$ is invertible, and $(T^{1/2})^{-1} = (T^{-1})^{1/2}$.
Proof. If $T$ is positive in the $C^*$-algebra $B(H)$, then $T = S^*S$ for some $S \in B(H)$ and so $(Tx \mid x) = (Sx \mid Sx) \ge 0$ for all $x$. Therefore, $T$ is a positive operator.
Conversely, assume that $T$ is a positive operator. Then $\sigma(T) \subseteq \mathbb{R}^+$. Indeed, if $t < 0$ and $x \in H$, then
\[ \|tx - Tx\|^2 = \|Tx\|^2 - 2t (Tx \mid x) + t^2 \|x\|^2 \ge -2t (Tx \mid x) + t^2 \|x\|^2 \ge t^2 \|x\|^2, \]
hence $tI - T$ is invertible by 12.3.7. Applying the functional calculus to $f(t) = \sqrt{t}$ ($t \in \sigma(T)$) and setting $T^{1/2} := f(T)$, we have $T = f^2(T) = f(T)^2 = (T^{1/2})^2$. Therefore, $T$ is positive and has a positive square root.
To show uniqueness, let $A$ and $B$ be positive operators with $A^2 = B^2 = T$. Let $(p_n)$ be a sequence of polynomials converging uniformly on $\sigma(A) \cup \sigma(B)$ to the square root function. Since
\[ \sigma(A^2) = \sigma(A)^2 = \{t^2 : t \in \sigma(A)\} \]
and $p_n(t^2) \to t$ on $\sigma(A)$, $p_n(T) = p_n(A^2) \to A$ by the functional calculus. Similarly, $p_n(T) \to B$. Therefore, $A = B$, verifying uniqueness.
If $T$ is invertible, then $T^{-1}$ is a limit of polynomials in $T$ and is therefore a member of $C^*(T)$. Moreover, $\sigma(T^{-1}) \subseteq (0, \infty)$ by 13.2.8(b), hence $T^{-1}$ is positive. Therefore, $(T^{-1})^{1/2}$ is defined. The functional calculus with $f(t) = 1/\sqrt{t}$ proves the last part of the theorem.
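
In finite dimensions the functional calculus reduces to applying $f$ to the eigenvalues, which makes the square root easy to check. The sketch below is an illustration under stated assumptions, not the text's construction: $T$ is a real symmetric $2 \times 2$ matrix built as $R\,\mathrm{diag}(4, 9)\,R^{T}$ with $R$ a rotation, and the angle, the eigenvalues, and the helper `apply_function` are all chosen for this example.

```python
from math import cos, sin, sqrt

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def max_gap(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

def apply_function(f, eigenvalues, r):
    """f(T) = R diag(f(l1), f(l2)) R^T when T = R diag(l1, l2) R^T."""
    d = [[f(eigenvalues[0]), 0.0], [0.0, f(eigenvalues[1])]]
    return matmul(matmul(r, d), transpose(r))

theta = 0.7                                   # arbitrary rotation angle
r = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
eigs = (4.0, 9.0)                             # spectrum of T, inside (0, infinity)
identity = [[1.0, 0.0], [0.0, 1.0]]

t = apply_function(lambda s: s, eigs, r)
t_inv = apply_function(lambda s: 1 / s, eigs, r)
root = apply_function(sqrt, eigs, r)                        # T^(1/2)
inv_root = apply_function(lambda s: 1 / sqrt(s), eigs, r)   # f(t) = 1/sqrt(t)

print(max_gap(matmul(root, root), t))             # (T^(1/2))^2 = T
print(max_gap(matmul(root, inv_root), identity))  # (T^(1/2))^(-1) = f(T)
print(max_gap(matmul(inv_root, inv_root), t_inv)) # f(T) = (T^(-1))^(1/2)
```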
13.6.5 Theorem. Let $T \in B(H)$ be self-adjoint. Then there exists a unique pair of positive operators $T^+$ and $T^-$ such that $T = T^+ - T^-$ and $T^+ T^- = T^- T^+ = 0$. Moreover, $\|T\| = \max\{\|T^+\|, \|T^-\|\}$ and $|T| = T^+ + T^-$.
Proof. Apply the functional calculus to the nonnegative functions $x^+$ and $x^-$ on $\sigma(T) \subseteq \mathbb{R}$ using the relations
\[ x = x^+ - x^-, \quad |x| = x^+ + x^-, \quad x^+ x^- = 0, \quad \text{and} \quad \sup_{x \in \sigma(T)} |x| = \max\Big\{ \sup_{x \in \sigma(T)} x^+,\ \sup_{x \in \sigma(T)} x^- \Big\}. \]
This gives positive operators $T^+$ and $T^-$ with the desired properties.
To show uniqueness, let $T = A - B$, where $A$ and $B$ are positive operators with $AB = BA = 0$. Note that $\sigma(T) \subseteq \mathbb{R}$ and $\sigma(A) \cup \sigma(B) \subseteq [0, \infty)$. Using the above relations, we see that $T^k = A^k + (-B)^k$ for all $k \in \mathbb{N}$ and therefore $p(T) = p(A) + p(-B)$ for all polynomials $p$ with $p(0) = 0$. Let $X$ be a compact, symmetric subinterval of $\mathbb{R}$ containing $\sigma(A) \cup \sigma(-B) \cup \sigma(T)$. By the Stone-Weierstrass theorem there exists a sequence of polynomials $p_n$ with $p_n(0) = 0$ such that $p_n(x) \to x^+$ uniformly on $X$. In particular, $p_n(x) \to x$ on $\sigma(A)$ and $p_n(-x) \to 0$ on $\sigma(B)$. By the functional calculus, we then have $p_n(T) \to T^+$ and $p_n(T) = p_n(A) + p_n(-B) \to A$, hence $A = T^+$. Similarly, $B = T^-$.
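
The decomposition can be seen concretely for a symmetric $2 \times 2$ matrix with one positive and one negative eigenvalue. This is a sketch under assumptions made for illustration only: $T = R\,\mathrm{diag}(3, -2)\,R^{T}$ with $R$ a rotation, and the helper `apply_function` applies a scalar function to the eigenvalues.

```python
from math import cos, sin

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

def max_gap(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

def apply_function(f, eigenvalues, r):
    """f(T) = R diag(f(l1), f(l2)) R^T when T = R diag(l1, l2) R^T."""
    d = [[f(eigenvalues[0]), 0.0], [0.0, f(eigenvalues[1])]]
    return matmul(matmul(r, d), transpose(r))

theta = 1.1                                   # arbitrary rotation angle
r = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
eigs = (3.0, -2.0)                            # spectrum of the self-adjoint T
zero = [[0.0, 0.0], [0.0, 0.0]]

t = apply_function(lambda s: s, eigs, r)
t_plus = apply_function(lambda s: max(s, 0.0), eigs, r)    # x^+ applied to T
t_minus = apply_function(lambda s: max(-s, 0.0), eigs, r)  # x^- applied to T
t_abs = apply_function(abs, eigs, r)                       # |T|

print(max_gap(combine(t_plus, t_minus, -1), t))      # T = T+ - T-
print(max_gap(matmul(t_plus, t_minus), zero))        # T+ T- = 0
print(max_gap(combine(t_plus, t_minus, +1), t_abs))  # |T| = T+ + T-
```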
13.6.6 Theorem. Every operator $T$ on a Hilbert space $H$ is a linear combination of unitary operators.
Proof. Since $T$ is a linear combination of self-adjoint operators (12.1.3), we may assume $T$ is self-adjoint. Moreover, it is no loss of generality to assume that $\|T\| \le 1$. Then $\sigma(T) \subseteq [-1, 1]$, hence the function $f(t) = t + i\sqrt{1 - t^2}$ is defined on $\sigma(T)$. Since $f \bar{f} = 1$ and $\mathrm{id}_{[-1,1]} = \tfrac{1}{2}(f + \bar{f})$, by the functional calculus $f(T) f(T)^* = I$ and $T = \tfrac{1}{2}(f(T) + f(T)^*)$, proving the theorem.
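
The proof is constructive, and the construction can be checked on a small example. In this sketch (assumptions: a real symmetric $2 \times 2$ matrix $T = R\,\mathrm{diag}(0.6, -0.8)\,R^{T}$ with $\|T\| \le 1$, and a hypothetical helper `apply_function`), the operator $U = f(T)$ with $f(t) = t + i\sqrt{1 - t^2}$ is unitary and $T$ is the average of $U$ and $U^*$.

```python
from math import cos, sin, sqrt

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def max_gap(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

def apply_function(f, eigenvalues, r):
    """f(T) = R diag(f(l1), f(l2)) R^* when T = R diag(l1, l2) R^*."""
    d = [[f(eigenvalues[0]), 0j], [0j, f(eigenvalues[1])]]
    return matmul(matmul(r, d), adjoint(r))

def f(t):
    """f(t) = t + i*sqrt(1 - t^2), which has modulus 1 on [-1, 1]."""
    return complex(t, sqrt(1.0 - t * t))

theta = 0.3                                   # arbitrary rotation angle
r = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
eigs = (0.6, -0.8)                            # spectrum of T, inside [-1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]

t = apply_function(lambda s: complex(s), eigs, r)
u = apply_function(f, eigs, r)                # the unitary f(T)
u_star = adjoint(u)
average = [[(u[i][j] + u_star[i][j]) / 2 for j in range(2)] for i in range(2)]

print(max_gap(matmul(u, u_star), identity))   # U U^* = I
print(max_gap(average, t))                    # T = (U + U^*) / 2
```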

The Borel Functional Calculus


Let $H$ be a complex Hilbert space and $T \in B(H)$ normal. In this subsection we extend the continuous functional calculus $f \mapsto f(T)$ to Borel functions $f$ on $\sigma(T)$. For this we need the following lemma.
13.6.7 Lemma. Let K ⊆ C be compact. Then the space BL(K) of bounded complex-valued
Borel functions on K is the smallest conjugate closed algebra of bounded functions that (a)
contains all polynomials on K and (b) is closed under pointwise limits of uniformly bounded
sequences of functions on K.
Proof. Let $A(K)$ denote the intersection of all conjugate closed algebras of bounded functions on $K$ with properties (a) and (b). Then $A(K)$ has properties (a) and (b), and since $BL(K)$ has properties (a) and (b), $A(K) \subseteq BL(K)$. Moreover, $A(K)$ is an algebra, and by the Stone-Weierstrass theorem and (b), $A(K)$ contains $C(K)$. Now let $\mathcal{A}$ denote the set of all $E \subseteq K$ such that $1_E \in A(K)$. We claim that $\mathcal{A}$ is a $\sigma$-field that contains all the open sets. Assuming this for the moment, we see that $\mathcal{A}$ must then contain all Borel subsets of $K$. Therefore, $A(K)$ contains all simple functions, and since every bounded Borel function $f$ on $K$ is a pointwise limit of a uniformly bounded sequence of simple functions, $A(K) \supseteq BL(K)$, proving the lemma.
It remains to verify the claim. Let $U$ be open in $K$ and $K_n = \{x \in U : d(x, U^c) \ge 1/n\}$. Choosing $f_n \in C(K)$ such that $1_{K_n} \le f_n \le 1_U$, we have $f_1 \vee \cdots \vee f_n \uparrow 1_U$, hence $1_U \in A(K)$. Therefore, $\mathcal{A}$ contains all open sets. If $E, F \in \mathcal{A}$, then $1_{E^c} = 1 - 1_E$ and $1_{E \cap F} = 1_E 1_F$, hence $\mathcal{A}$ is closed under complements and finite intersections and therefore also under finite unions. If $E_n \in \mathcal{A}$ and $E_n \uparrow E$, then $1_{E_n} \uparrow 1_E$, hence $1_E \in A(K)$ and so $E \in \mathcal{A}$. Therefore, $\mathcal{A}$ is a $\sigma$-field.
13.6.8 Theorem. Let $T$ be a normal operator on a Hilbert space $H$ and set $K := \sigma(T)$. Then there exists a $*$-homomorphism $f \mapsto f(T)$ from $BL(K)$ into $B(H)$ such that
(a) $1(T) = I$ and $\mathrm{id}_K(T) = T$.
(b) If $(f_n)$ is a uniformly bounded sequence in $BL(K)$ that converges pointwise to $f$, then $f_n(T)x \xrightarrow{w} f(T)x$ for every $x \in H$.
(c) $\big(\sum_{k=0}^{n} a_k z^k\big)(T) = \sum_{k=0}^{n} a_k T^k$.
(d) $\|f(T)\| \le \|f\|_\infty$.
(e) If $S \in B(H)$ and $ST = TS$, then $S f(T) = f(T) S$ for every $f \in BL(K)$.
Moreover, the $*$-homomorphism $f \mapsto f(T)$ is unique with respect to properties (a) and (b).
Proof. (a) and (c) hold by the continuous functional calculus $f \mapsto f(T) : C(K) \to B(H)$. We extend this to $BL(K)$ as follows: For each pair $x, y \in H$, the mapping $f \mapsto (f(T)x \mid y)$ is a bounded linear functional on $C(K)$, hence, by the Riesz representation theorem, there exists a complex measure $\mu(x, y)$ on $K$ such that for each $f \in C(K)$
\[ (f(T)x \mid y) = \int_K f \, d\mu(x, y). \tag{13.5} \]
We claim that
(i) $\mu(ax + by, z) = a\mu(x, z) + b\mu(y, z)$.
(ii) $\mu(y, x) = \overline{\mu(x, y)}$.
(iii) $\mu(x, x) \ge 0$.
(iv) $d\mu(g(T)x, y) = g \, d\mu(x, y)$, $g \in C(K)$.
Indeed, by integrating against a continuous function $f$ and using (13.5), we see that (i) holds because $(f(T)x \mid y)$ is sesquilinear in $(x, y)$, and (ii) follows from the calculation
\[ \int_K f \, d\mu(y, x) = (f(T)y \mid x) = (y \mid f(T)^* x) = \big(y \mid \bar{f}(T)x\big) = \overline{\big(\bar{f}(T)x \mid y\big)} = \overline{\int_K \bar{f} \, d\mu(x, y)} = \int_K f \, d\,\overline{\mu(x, y)}. \]
For (iii), if $f \ge 0$ and $g = f^{1/2}$, then, by the continuous functional calculus, we have $f(T) = g^2(T) = g(T)g(T)$, hence
\[ \int_K f \, d\mu(x, x) = (f(T)x \mid x) = (g(T)x \mid g(T)x) \ge 0. \]
Finally, (iv) follows from the calculation
\[ \int_K f g \, d\mu(x, y) = ((fg)(T)x \mid y) = (f(T)g(T)x \mid y) = \int_K f \, d\mu(g(T)x, y). \]
Now, for $f \in BL(K)$ and $x, y \in H$ we have $\big|\int_K f \, d\mu(x, y)\big| \le \|f\|_\infty \|x\| \, \|y\|$, hence by (i) and (ii) and 11.4.1 there exists a unique $f(T) \in B(H)$ such that (13.5) holds. The mapping $f \mapsto f(T) : BL(K) \to B(H)$ is clearly linear and satisfies $\|f(T)\| \le \|f\|_\infty$ and $\bar{f}(T) = f(T)^*$, the latter by (ii). Moreover, if $g \in C(K)$, then from (iv) we have
\[ \int_K g f \, d\mu(x, y) = \int_K f \, d\mu(g(T)x, y) = (f(T)g(T)x \mid y) = (g(T)x \mid f(T)^* y) = \int_K g \, d\mu(x, f(T)^* y), \]
hence $f \, d\mu(x, y) = d\mu(x, f(T)^* y)$. Therefore, for all $f, g \in BL(K)$ and $x, y \in H$,
\[ (f(T)g(T)x \mid y) = (g(T)x \mid f(T)^* y) = \int_K g \, d\mu(x, f(T)^* y) = \int_K g f \, d\mu(x, y) = ((fg)(T)x \mid y), \]
which shows that $f(T)g(T) = (fg)(T)$. We have proved that the mapping $f \mapsto f(T)$ from $BL(K)$ into $B(H)$ is a $*$-homomorphism satisfying (a), (c), and (d).
To verify (b), we apply the dominated convergence theorem to obtain
\[ (f_n(T)x \mid y) = \int_K f_n \, d\mu(x, y) \to \int_K f \, d\mu(x, y) = (f(T)x \mid y). \]
For (e), if $ST = TS$ and $f \in C(K)$, then by the continuous functional calculus $S f(T) = f(T) S$ and so for all $x, y$,
\[ \int_K f \, d\mu(x, S^* y) = (f(T)x \mid S^* y) = (S f(T)x \mid y) = (f(T) S x \mid y) = \int_K f \, d\mu(Sx, y). \]
Therefore, $\mu(x, S^* y) = \mu(Sx, y)$, hence for $f \in BL(K)$
\[ (S f(T)x \mid y) = (f(T)x \mid S^* y) = \int_K f \, d\mu(x, S^* y) = \int_K f \, d\mu(Sx, y) = (f(T) S x \mid y), \]
proving (e).
It remains to show uniqueness with respect to properties (a) and (b). Let $f \mapsto \tilde{f}(T)$ be another $*$-homomorphism with these properties. Then the collection of all $f \in BL(K)$ for which $\tilde{f}(T) = f(T)$ is a conjugate closed algebra containing all polynomials on $K$ and is closed under pointwise limits of uniformly bounded sequences and so must coincide with $BL(K)$ by 13.6.7.
The mapping $f \mapsto f(T)$ in the above theorem is known as the Borel functional calculus.

The Spectral Theorem for Normal Operators


It is possible to recast the Borel functional calculus in terms of projection-valued measures,
as described in the following proposition.
13.6.9 Proposition. Let $T \in B(H)$ be normal with Borel functional calculus $f \mapsto f(T)$. For each Borel subset $E$ of $K := \sigma(T)$, set $P(E) := 1_E(T)$. Then the mapping $P : \mathcal{B}(K) \to B(H)$ has the following properties:
(a) $P(E)$ is an orthogonal projection.
(b) $P(\emptyset) = 0$, $P(K) = I$.
(c) $P(E \cap F) = P(E)P(F)$.
(d) If $E_1, E_2, \ldots$ are disjoint, and $E = \bigcup_n E_n$, then the series $\sum_{n=1}^{\infty} P(E_n)x$ converges in norm to $P(E)x$ for every $x$.
Proof. Parts (a) – (c) follow immediately from the Borel functional calculus, as does (d) for finite sequences. For infinite sequences, set $F_n = \bigcup_{j=1}^{n} E_j$. Then $1_{F_n} \to 1_E$ pointwise on $K$, hence $P(F_n)x \xrightarrow{w} P(E)x$ for all $x \in H$, by part (b) of 13.6.8. Set $T_n = P(E) - P(F_n) = P(E \setminus F_n)$. Then $T_n x \xrightarrow{w} 0$ and
\[ \|P(E)x - P(F_n)x\|^2 = (T_n x \mid T_n x) = (T_n^* T_n x \mid x) = (T_n x \mid x) \to 0, \]
proving (d).
We may now formulate the functional calculus in terms of integrals. For each $x, y \in H$, define $P_{x,y}(E) := (P(E)x \mid y)$. Then
\[ P_{x,y}(E) = (1_E(T)x \mid y) = \int_K 1_E \, d\mu(x, y) = \mu(x, y)(E), \]
so the set function $P_{x,y}$ is simply the measure $\mu(x, y)$ of the Borel functional calculus, and (13.5) may be written
\[ (f(T)x \mid y) = \int_{\sigma(T)} f(z) \, dP_{x,y}(z) \]
or simply
\[ f(T) := \int_{\sigma(T)} f(z) \, dP(z). \]
This expresses $f(T)$ as an integral with respect to the set function $P$, which is called the spectral measure for $T$. The special case
\[ I = \int_{\sigma(T)} 1 \, dP(z) \]
is the motivation for the alternate terminology spectral resolution of the identity. The special case $f(z) = z$ results in the spectral theorem for normal operators:

13.6.10 Theorem. If $T$ is a normal operator and $P$ is its spectral resolution of the identity, then
\[ T = \int_{\sigma(T)} z \, dP(z). \]
Note that if $T$ is compact, then $\sigma(T)$ is a sequence $(\lambda_n) \in c_0$, hence the last integral reduces to an infinite series, giving the spectral theorem of §12.3.
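
When $H$ is finite dimensional the spectral measure is a finite sum of eigenprojections, and the integrals above become finite sums. The sketch below illustrates this for an assumed $2 \times 2$ real symmetric example with eigenvalues $2$ and $-1$. The names `eigenprojection`, `spectral_measure`, and `calculus` are hypothetical helpers, and a Borel set $E$ is represented by its indicator function.

```python
from math import cos, sin

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def max_gap(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

theta = 0.4                                   # arbitrary rotation angle
r = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
eigs = (2.0, -1.0)                            # sigma(T) for this example
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

def eigenprojection(k):
    """Orthogonal projection onto the span of the k-th column of R."""
    return [[r[i][k] * r[j][k] for j in range(2)] for i in range(2)]

def spectral_measure(indicator):
    """P(E) = 1_E(T): sum of the eigenprojections whose eigenvalue lies in E."""
    total = zero
    for k, lam in enumerate(eigs):
        if indicator(lam):
            total = add(total, eigenprojection(k))
    return total

def calculus(f):
    """f(T) = sum_k f(lambda_k) P({lambda_k}), the integral of f against P."""
    total = zero
    for k, lam in enumerate(eigs):
        total = add(total, scale(f(lam), eigenprojection(k)))
    return total

t = calculus(lambda s: s)                     # T as the integral of z dP(z)
p_pos = spectral_measure(lambda s: s > 0)
p_neg = spectral_measure(lambda s: s < 0)
p_all = spectral_measure(lambda s: True)

print(max_gap(matmul(p_pos, p_pos), p_pos))   # P(E) is idempotent
print(max_gap(matmul(p_pos, p_neg), zero))    # P(E)P(F) = P(E and F) = 0 here
print(max_gap(p_all, identity))               # P(K) = I
print(max_gap(calculus(lambda s: s * s), matmul(t, t)))  # (z^2)(T) = T^2
```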

Exercises
13.45 Let A be a unital C ∗ algebra. Show that if x∗ = −x, then σ(x) ⊆ i R.

13.46 Let A be a unital C ∗ algebra, x ∈ A unitary and σ(x) 6= T. Show that x = eiy for some
self-adjoint y .

13.47 Verify the following assertions to obtain an alternate proof that the operators T ± are unique:
(a) Let T = A − B, for positive operators A and B with AB = 0. Then AT = T A and BT = T B.
(b) A and B commute with T ± .
(c) If C and D are positive operators and CD = DC, then CD is positive.
(d) Set S := T + − A = T − − B. Then 0 ≤ S ∗ S = S 2 = −(T − A + T + B) ≤ 0.
(e) S = 0.

13.48 [↑ 13.18] Let $A$ and $B$ be unital $C^*$-algebras and $\Phi : A \to B$ an injective $*$-homomorphism that maps identity onto identity. Show that for $x \in A$, $\sigma(\Phi(x)) = \sigma(x)$ and hence that $\Phi$ is an isometry. ⟦Assume first that $x$ is self-adjoint and use the functional calculus on $C(\sigma(x))$. Apply this to $x^*x$ for the general case.⟧

13.49 Let $A$ be a unital $C^*$-algebra, $x \in A$ normal, and $c \in \mathbb{C}$. Show that

\[ \|ce - x\| = \sup\{|c - z| : z \in \sigma(x)\}. \]

Use this to prove that if $x$ is self-adjoint, then $\sigma(x) \subseteq [0, \infty)$ iff $\|ce - x\| \le c$ for some (for every) $c \ge \|x\|$. ⟦Use 13.2.5.⟧

13.50 Let $A$ be a unital $C^*$-algebra and $x, y \in A$ positive. Use the preceding exercise to show that

\[ \big\| (\|x\| + \|y\|)e - (x + y) \big\| \le \|x\| + \|y\|. \]

Conclude that $\sigma(x + y) \subseteq [0, \infty)$.

13.51 Let $A$ be a unital $C^*$-algebra and let $x \in A$ be self-adjoint. Prove the following to conclude that $x \ge 0$ iff $\sigma(x) \subseteq \mathbb{R}^+$.
(a) If $\sigma(x) \subseteq \mathbb{R}^+$, then $x \ge 0$. ⟦Consider the functional calculus on $C^*(x)$.⟧
(b) Let $x \ge 0$. Then $x = x^+ - x^-$, where $\sigma(x^\pm) \subseteq \mathbb{R}^+$. ⟦Use the functional calculus exactly as in 13.6.5.⟧
(c) Set $z = y x^-$. Then $\sigma(z^* z) \subseteq (-\infty, 0]$.
(d) $\sigma(z^* z + z z^*) \subseteq [0, \infty)$. ⟦Write $z = u + iv$, where $u$ and $v$ are self-adjoint and use Ex. 13.50.⟧
(e) $\sigma(z z^*) \subseteq [0, \infty)$. ⟦Use (c), (d), and Ex. 13.50.⟧
(f) $z^* z = 0$. ⟦Use (c), (e) and Ex. 13.26.⟧
(g) $\sigma(x) \subseteq \mathbb{R}^+$.

13.52 Show that the definition $x \le y$ iff $y - x \ge 0$ gives a partial order on the set of self-adjoint members of a unital $C^*$-algebra such that $x \le y \Rightarrow z^* x z \le z^* y z$.

13.53 Let $A$ be a unital $C^*$-algebra. Show that if $x, y \in A$ are positive and $xy = yx$, then $xy$ is positive. ⟦First assume $A$ is commutative.⟧

13.54 Prove that if $0 \le S \le T$ and $S$ is invertible, then $T$ is invertible and $0 \le T^{-1} \le S^{-1}$. ⟦Consider $S^{-1/2}$ and use Ex. 12.14.⟧

13.55 Let $T \in B(H)$ be normal. Show that $(f \circ g)(T) = f(g(T))$, where $g$ is a bounded Borel function on $\sigma(T)$ and $f$ is a bounded Borel function on the closure $K$ of $g(\sigma(T))$. ⟦Fix $g$ and let $B$ denote the set of Borel functions $f$ on $K$ for which the equality holds. Then $B$ is a conjugate closed algebra with properties (a) and (b) of 13.6.7.⟧
Chapter 14
Miscellaneous Topics

In this chapter we consider some of the deeper aspects of functional analysis and give several
important applications. Additional applications may be found in Chapters 15, 16, and 17.

14.1 Weak Sequential Compactness


In this section we prove that weak compactness and weak sequential compactness in a
Banach space are equivalent. We first prove the analogous result for the pointwise topology
on the space of continuous functions on a compact Hausdorff space.

Pointwise Sequential Compactness


14.1.1 Theorem. Let X be a compact Hausdorff topological space and let A ⊆ C(X) be
norm bounded. Let p denote the topology of pointwise convergence on C(X). The following
statements are equivalent:
(a) A is relatively p-sequentially compact, that is, every sequence (f_n) in A has a subsequence
that p-converges in C(X).
(b) If (f_n) is a sequence in A and (x_m) is a sequence in X such that the iterated limits

\[ \lim_m \lim_n f_n(x_m) \quad\text{and}\quad \lim_n \lim_m f_n(x_m) \]

exist, then the limits are equal.


(c) A is relatively p-compact.

Proof. (a) ⇒ (b): Assume that the limits in (b) exist. By the hypothesis, (f_n) has a
p-sequential limit point, say lim_k f_{n_k} = f ∈ C(X). Let x ∈ X be a limit point of (x_m), say
x_{m_α} → x. Then

\[ \lim_k \lim_\alpha f_{n_k}(x_{m_\alpha}) = \lim_k f_{n_k}(x) = f(x) = \lim_\alpha f(x_{m_\alpha}) = \lim_\alpha \lim_k f_{n_k}(x_{m_\alpha}). \]

It follows that the limits in (b) are equal.


(b) ⇒ (c): Because A is norm bounded, A ⊆ K^X for some compact K ⊆ C. By Tychonoff’s
theorem, K^X is compact in the topology of pointwise convergence on X. To prove (c) it
therefore suffices to show that the closure B of A in K^X consists entirely of continuous
functions. Suppose, for a contradiction, that B contains a function f : X → K which is not
continuous at some x ∈ X. Thus there exists ε > 0 such that if D_ε := {c ∈ K : |c| < ε}, then

f(N_x) − f(x) ⊄ D_ε for every neighborhood N_x of x in X. (†)

By induction there exist sequences (fn ) in A and (xn ) in X such that


(i) |fj (x) − fj (xn )| < 1/n, j = 1, . . . , n,

(ii) |f (x) − f (xn )| ≥ ε, and

(iii) |f1 (x) − f (x)| < 1 and |fn+1 (y) − f (y)| < 1/(n + 1), y ∈ {x, x1 , . . . , xn }.

Indeed, the functions fj in (i) define a neighborhood

Nx := {y ∈ X : |fj (x) − fj (y)| < 1/n, j = 1, . . . , n}

of x which is used in (†) to obtain the point xn in (i) and (ii); and (iii) uses the fact that
f is in the pointwise closure of A. Now, since f is bounded, there exists a subsequence
(yk := xmk ) such that f (yk ) → c for some c ∈ K. Then, by (i) and (iii),

\[ \lim_k \lim_n f_n(y_k) = \lim_k f(y_k) = c \quad\text{and}\quad \lim_n \lim_k f_n(y_k) = \lim_n f_n(x) = f(x). \]

But by (ii), f(x) ≠ c, contradicting (b). Therefore, f must be continuous.


(c) ⇒ (a): Let (f_n) be a sequence in A. Suppose first that X is metrizable. It then has a
countable dense subset D. Since A is bounded, a standard diagonal argument shows that
(f_n) has a subsequence (f_{n_k}) that converges pointwise on D. Let C denote the p-closure of A
in C(X). Since C is p-compact, a subnet of (f_{n_k}) converges pointwise on X to a member f of C.
If another subnet of (f_{n_k}) converges pointwise on X to a member g of C, then f = g on D and so,
by continuity, f = g. It follows that (f_{n_k}) converges pointwise to f on X.
In the general case, define an equivalence relation on X by

x ∼ x′ iff f_n(x) = f_n(x′) for all n.

Give Z := X/∼ the quotient topology and let Q : X → Z denote the quotient map. Define
f̃_n on Z by f̃_n ◦ Q = f_n. Since f̃_n is continuous, the initial topology τ defined by (f̃_n) is
weaker than the quotient topology τ_q. Furthermore, τ is metrizable by

\[ d(Q(x), Q(y)) = \sum_{n=1}^{\infty} \frac{1}{2^n}\, \frac{|f_n(x) - f_n(y)|}{1 + |f_n(x) - f_n(y)|}. \]

Since τ is Hausdorff and τ_q is compact, τ = τ_q. Now, by (c), (f_n) has a p-limit point f
in C(X), say f_{n_α} → f. Define f̃ on Z so that f̃ ◦ Q = f. Then f̃ is well-defined, since
Q(x) = Q(y) ⇒ f(x) = lim_α f_{n_α}(x) = lim_α f_{n_α}(y) = f(y). Since f̃ is a p-limit point of
(f̃_n), by the preceding paragraph f̃_{n_k} → f̃ pointwise for some subsequence (f̃_{n_k}). Therefore
f_{n_k} → f pointwise, proving (a).
Part (b) of the theorem is known as Grothendieck’s double limit property.
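
The double limit property can be seen to fail in a familiar case. The following is a minimal numerical sketch, not taken from the text, assuming X = [0, 1], f_n(x) = x^n, and x_m = 1 − 1/m (the names f, x_seq, n_large, and m_large are illustrative). Large finite indices stand in for the inner limits, so the output only suggests the exact values 0 and 1.

```python
# Illustration only: f_n(x) = x**n on X = [0, 1] and x_m = 1 - 1/m.
# The family {f_n} is norm bounded, yet its iterated limits disagree.

def f(n: int, x: float) -> float:
    return x ** n

def x_seq(m: int) -> float:
    return 1.0 - 1.0 / m

# Inner limit in n first: for fixed m we have x_m < 1, so f_n(x_m) -> 0.
n_large = 10 ** 6
inner_n = [f(n_large, x_seq(m)) for m in (2, 10, 100)]

# Inner limit in m first: for fixed n, x_m -> 1 and f_n is continuous,
# so f_n(x_m) -> f_n(1) = 1.
m_large = 10 ** 12
inner_m = [f(n, x_seq(m_large)) for n in (1, 5, 50)]

print(inner_n)  # values near 0
print(inner_m)  # values near 1
```

Since the two iterated limits are 0 and 1, condition (b) fails, which is consistent with the fact that (x^n) has no subsequence converging pointwise to a continuous function on [0, 1].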

The Eberlein-Šmulian Theorem


Here is the aforementioned result on the equivalence of weak and sequential weak
compactness. The proof makes critical use of 14.1.1.
14.1.2 Theorem (Eberlein-Šmulian). Let X be a Banach space and A ⊆ X. Then A is
relatively weakly compact iff every sequence (x_n) in A has a weakly convergent subsequence.
Proof. For the necessity, it suffices to show that (x_n) has a subsequence (x_{n_k}) such that
for some x ∈ X, ⟨x_{n_k}, f⟩ → ⟨x, f⟩ for all f in the closed unit ball C₁′ of X′. We may
assume that X is the closed linear span of (x_n); otherwise we could simply consider weak
convergence in this space. It follows that C₁′, which is weak∗ compact, is metrizable under
the metric

\[ d(f, g) = \sum_{n=1}^{\infty} \frac{1}{2^n}\, \frac{|f(x_n) - g(x_n)|}{1 + |f(x_n) - g(x_n)|}. \]

In particular, C₁′ has a weak∗ dense sequence (f_m). By a diagonal argument, there exists a
subsequence (y_n) of (x_n) such that α_m := lim_n ⟨y_n, f_m⟩ exists for each m. Since (y_n) is
relatively weakly compact, there exist y ∈ X and a subnet (y_α) of (y_n) such that y_α → y weakly.
Therefore, ⟨y, f_m⟩ = lim_α ⟨y_α, f_m⟩ = α_m for all m. If z is another such limit point, then
⟨z, f_m⟩ = α_m for all m, hence y = z because (f_m) is weak∗ dense in C₁′. Therefore, (y_n)
has a unique weak limit point and so must converge weakly.
For the sufficiency, note that the hypothesis and 10.1.2 imply that A is norm bounded.
Let (x_α) be a net in A. Then (x̂_α) is a norm bounded net in Â and so has a subnet (x̂_β)
that weak∗ converges to some ϕ in X″. It remains to show that ϕ ∈ X̂, that is, ϕ is weak∗
continuous. By 10.2.9, it suffices to show that the restriction of ϕ to the closed unit ball C₁′
in X′ is w∗-continuous. But this topology is simply the topology of pointwise convergence
on C₁′. Thus we have reduced the problem to showing that Â|_{C₁′} is relatively p-compact in
the space of continuous functions on C₁′. But this follows from the hypothesis and 14.1.1,
since Â|_{C₁′} is relatively p-sequentially compact.

From Schur’s theorem (10.1.4) we have


14.1.3 Corollary. A subset of ℓ¹(N) is norm compact iff it is weakly compact.
14.1.4 Corollary. Let X be a compact Hausdorff space and A a norm-bounded subset of
C(X). Then A is relatively weakly compact iff A is relatively p-compact. In this case the
topologies p and w coincide on A.
Proof. The necessity is clear. For the sufficiency, it suffices to prove that A is relatively weakly
sequentially compact. Let (f_n) be a sequence in A. By hypothesis, there exists
a subsequence (fnk ) that converges pointwise on X to some f ∈ C(X). It follows from the
Riesz representation theorem and the dominated convergence theorem that (fnk ) converges
weakly to f .
14.1.5 Corollary. Let X and Y be compact Hausdorff topological spaces and F : X ×Y → C
a bounded, separately continuous function. Then the function y → F (·, y) : Y → C(X) is
continuous in the weak topology of C(X).
Proof. By separate continuity, the function y → F (·, y) : Y → C(X) is continuous in the
p-topology of C(X), hence A := F (·, Y ) is p-compact. Since it is also norm bounded, the
topologies p and w coincide on A.
Part of the next result has already been proved (10.3.6). An application of the
Eberlein-Šmulian theorem completes the proof.
14.1.6 Corollary. Let X be a Banach space. Then the following are equivalent.
(a) X is reflexive. (b) C1 is weakly compact. (c) C1 is weakly sequentially compact.

14.2 Weak Compactness in L1


Let (X, F, µ) be a σ-finite measure space. For 1 < p < ∞, the space Lp (X, F, µ) is
reflexive, hence the weakly relatively compact sets of Lp are simply the bounded sets. For
p = 1, the situation is more complicated. In this section we give a characterization of weak
compactness in L1 (µ) for probability measures µ in terms of uniform integrability. Our
treatment follows [14].

Weak Convergence and Uniform Integrability


14.2.1 Lemma. Let A0 be a countable collection of subsets of X. Then the field ϕ(A0 )
generated by A0 is countable.
Proof. Define sequences (An ) and (Bn ) of subsets of X by the following scheme:
• A1 := the collection of all finite unions of members of A0 ,
• B1 := the collection of all complements of members of A1 ,
• A2 := the collection of all finite unions of members of B1 ,
• B2 := the collection of all complements of members of A2 ,
• etc.
Note that A, B ∈ A_n ⇒ A ∪ B ∈ A_n, A^c ∈ A_{n+1}, and A ∩ B = (A^c ∪ B^c)^c ∈ A_{n+2}.
Therefore, ⋃_n A_n is a field and so coincides with ϕ(A_0). By induction, A_n is countable for
every n, completing the proof.
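
The alternating scheme of unions and complements can be run directly on a finite example. The sketch below is an illustration under assumptions not in the text: the ambient set is finite, sets are modeled as Python frozensets, and the function name generated_field is hypothetical. It repeats the two closure steps until nothing new appears.

```python
from itertools import combinations

def generated_field(universe, generators):
    """Smallest family containing `generators` that is closed under
    complements and finite unions (finite universe assumed)."""
    universe = frozenset(universe)
    family = {frozenset(g) for g in generators} | {frozenset(), universe}
    while True:
        # One round of the scheme: add complements, then pairwise unions.
        new = {universe - a for a in family}
        new |= {a | b for a, b in combinations(family, 2)}
        if new <= family:          # nothing new: the family is a field
            return family
        family |= new

field = generated_field(range(4), [{0}, {1}])
print(len(field))  # 8: all unions of the atoms {0}, {1}, {2, 3}
```

Closure under intersections is not imposed separately; as in the proof, it follows from De Morgan's law.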
14.2.2 Lemma. Let f be measurable. Then there exists a countable field A ⊆ F such that
f is measurable with respect to σ(A).
Proof. Choose a sequence of simple functions g_n = Σ_{j=1}^{k_n} a_{n,j} 1_{A_{n,j}} converging
pointwise to f. Let A_0 denote the countable collection of the sets A_{n,j}. Then A := ϕ(A_0) is
countable (14.2.1) and f is measurable with respect to σ(A).
Here is the key step needed for the proof of the Dunford-Pettis theorem below.
14.2.3 Lemma. Let (X, F, µ) be a probability space. If (gn ) converges weakly to g in L1 (µ),
then (gn ) is uniformly integrable.
Proof. Let dµ_n = g_n dµ and set

\[ \eta(A) := \sum_{n=1}^{\infty} 2^{-n}\, \frac{|\mu_n|(A)}{1 + |\mu_n|(X)}, \qquad A \in \mathcal{F}. \]

By the proof of the Vitali-Hahn-Saks theorem (5.2.4), given ε > 0 there exist δ > 0 and
m ∈ N such that

|µ_n(A)| ≤ |µ_m(A)| + 2ε ∀ A with η(A) ≤ δ and ∀ n ≥ m. (†)

Since |µ_j(A)| ≤ 2^j η(A)(1 + |µ_j|(X)), by taking a smaller δ if necessary we may assume that
η(A) ≤ δ ⇒ |µ_j(A)| < ε (j = 1, . . . , m). Combining this with (†), we have

|µ_n(A)| ≤ 3ε ∀ n and ∀ A with η(A) ≤ δ. (‡)

Now observe that η ≪ µ, hence we may choose δ_0 > 0 so that µ(A) < δ_0 ⇒ η(A) < δ. For
such A, sup_n |µ_n(A)| ≤ 3ε from (‡). Thus, by 4.4.2, (g_n) is uniformly integrable.
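
To make uniform integrability concrete, here is a small sketch that is not from the text. It assumes Lebesgue measure on [0, 1] and compares two illustrative families of step functions, f_n = n·1_[0,1/n] and g_n = √n·1_[0,1/n]. Both families are bounded in L¹. For a function equal to a constant `height` on a set of measure `width` and zero elsewhere, the tail integral over {|f| > c} is height·width if height > c and 0 otherwise. The supremum over n is truncated at a hypothetical cutoff N, so the printed values are only indicative.

```python
import math

def tail_integral(height: float, width: float, c: float) -> float:
    """Integral of |f| over {|f| > c} for f = height * indicator of a set
    of measure `width` (Lebesgue measure on [0, 1] assumed)."""
    return height * width if height > c else 0.0

def sup_tail(family, c):
    """Supremum of the tail integrals over a finite list of (height, width)."""
    return max(tail_integral(h, w, c) for h, w in family)

N = 10_000  # truncation of the index set, for illustration only
spikes = [(float(n), 1.0 / n) for n in range(1, N + 1)]      # f_n = n 1_[0,1/n]
mild = [(math.sqrt(n), 1.0 / n) for n in range(1, N + 1)]    # g_n = sqrt(n) 1_[0,1/n]

for c in (1, 10, 50):
    print(c, sup_tail(spikes, c), sup_tail(mild, c))
```

For the first family the supremum of the tails stays at about 1 for every c, so (f_n) is not uniformly integrable and, by the lemma, no subsequence of it can converge weakly in L¹. For the second family the tails are bounded by 1/c, which tends to 0.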

The Dunford-Pettis Theorem


Here is the main result of the section.
14.2.4 Theorem (Dunford-Pettis). Let (X, F, µ) be a probability space. Then a subset U
of L1 (µ) is relatively weakly compact iff U is uniformly integrable.
Proof. First, recall that U is uniformly integrable iff

\[ \lim_{n} \sup_{f \in U} \int_{\{|f| > n\}} |f| \, d\mu = 0. \]

Suppose U is weakly relatively compact but not uniformly integrable. Then there exists
δ > 0 such that

\[ \lim_{n \to \infty} \sup_{f \in U} \int_{\{|f| > n\}} |f| \, d\mu \ge 2\delta. \]

For sufficiently large n, choose f_n ∈ U such that

\[ \int_{\{|f_n| > n\}} |f_n| \, d\mu \ge \delta. \tag{α} \]

By the Eberlein-Šmulian theorem, (f_n) has a subsequence (g_n) that converges weakly to
some g. But then (g_n) is uniformly integrable by 14.2.3, contradicting (α).
Conversely, suppose that U is uniformly integrable. We show that a sequence (f_n) in U
has a weakly convergent subsequence. By considering real, imaginary, positive, and negative
parts, we may assume that f_n ≥ 0 for all n. Note that by 4.4.2

\[ \lim_{\mu(E) \to 0} \sup_n \|f_n 1_E\|_1 = 0. \tag{β} \]

Now, by 14.2.2, there exists a countable field A ⊆ F such that each f_n is F_0 := σ(A)
measurable. Since (∫_E f_n dµ) is a bounded sequence for each E and A is countable, we may
use a diagonal argument to extract a subsequence (g_n) of (f_n) such that the limit

\[ \eta(E) := \lim_n \mu_n(E), \qquad \mu_n(E) := \int_E g_n \, d\mu, \]

exists for each E ∈ A. Now let


• G = the collection of all G ∈ F such that η(G ∩ A) exists for every A ∈ A, and

• H = the collection of all H ∈ G such that η(H ∩ G) exists for every G ∈ G.


Clearly, A ⊆ H, and since A is closed under finite intersections, A ⊆ G and so H ⊆ G.
In particular, H is closed under finite intersections. Furthermore, since µ_n(H^c ∩ G) =
µ_n(G) − µ_n(H ∩ G), H is closed under complements. Also, if H_1, H_2 ∈ H and H_1 ∩ H_2 = ∅,
then from µ_n((H_1 ∪ H_2) ∩ G) = µ_n(H_1 ∩ G) + µ_n(H_2 ∩ G) we see that H_1 ∪ H_2 ∈ H. Since
an arbitrary union H_1 ∪ H_2 may be written as a disjoint union of members of H, the latter
is a field.
We show next that H is closed under increasing unions. It will follow that H is a σ-field
and hence contains F0 . Let Hk ∈ H, Hk ↑ H, and G ∈ G. To show that η(H ∩ G) exists,
set Ek = (H ∩ G) \ (Hk ∩ G) and note that µ(Ek ) ↓ 0. By (β), µn (Ek ) ↓k 0 uniformly in n,
hence µn (Hk ∩ G) ↑k µn (H ∩ G) uniformly in n. Let ε > 0 and choose k such that

|µn (H ∩ G) − µn (Hk ∩ G)| < ε ∀ n.


Then choose N so that |µn (Hk ∩ G) − µm (Hk ∩ G)| < ε for all m, n ≥ N . By the triangle
inequality, |µn (H ∩ G) − µm (H ∩ G)| < 3ε for all n, m ≥ N . Therefore the sequence
(µn (H ∩ G)) is Cauchy, hence H ∈ H.
We have shown that the limit η(E) exists for all E ∈ F_0. By the Vitali-Hahn-Saks
theorem, η is a measure, and clearly η ≪ µ. By the Radon-Nikodym theorem, there exists an
F_0-measurable function g such that dη = g dµ. Thus

\[ \int h g \, d\mu = \lim_n \int h g_n \, d\mu \tag{γ} \]

holds for all F_0-measurable indicator functions h, hence for all F_0-simple functions. Since
the simple functions are dense in L∞(F_0, µ) (4.2.1), an approximation argument shows that
(γ) holds for all h ∈ L∞(F_0, µ). Therefore, g_n → g weakly in the subspace L¹(F_0, µ), hence
also in the ambient space L¹(F, µ).

14.3 Convexity and Compactness


Recall that the closed convex hull of a subset A of a topological vector space Xτ is the
intersection of all closed, convex sets containing A. It may also be characterized as the
closure of co A (see Ex. 9.3). Furthermore, in a LCS the weak and norm closures of a convex
set coincide (10.1.6). In this section we consider the closed convex hull of weakly compact
subsets of Banach spaces and of τ -compact subsets of Fréchet spaces.

The Krein-Šmulian Theorem


14.3.1 Theorem (Krein-Šmulian). The closed convex hull of a weakly compact set K in a
Banach space X is weakly compact.
Proof. We may assume that K ⊆ C_1. Let K have the weak topology and let C₁′, the closed
unit ball of X′, have the weak∗ topology. Then K and C₁′ are compact and Hausdorff in these
topologies and the duality map ⟨x, x′⟩ restricted to K × C₁′ is separately continuous. By
14.1.5, the mapping C₁′ → C(K), x′ ↦ x′|_K, is continuous in the weak∗ topology of C₁′ and
the weak topology of C(K). Thus for each µ ∈ C(K)′, the linear functional x′ ↦ µ(x′|_K)
is weak∗ continuous on C₁′ and hence equals x̂_µ for a unique x_µ ∈ X (10.2.9). We now have

µ(x′|_K) = ⟨x_µ, x′⟩ ∀ µ ∈ C(K)′ and ∀ x′ ∈ X′ with ‖x′‖ ≤ 1. (†)

It follows that the mapping µ ↦ x_µ is affine and weak∗-weak continuous, hence the image
A of the closed unit ball in C(K)′ is weakly compact and convex. Since every x ∈ K is of
the form x_µ for µ = x̂ with ‖x̂‖ = ‖x‖ ≤ 1, the closed convex hull of K is contained in A
and so is weakly compact.
14.3.2 Corollary. The closure of the convex balanced hull of a weakly compact set K in a
Banach space X is weakly compact.
Proof. The convex balanced hull of K is the set B of all finite sums Σ_j α_j x_j, where x_j ∈ K
and Σ_j |α_j| ≤ 1. Such a sum may be written

\[ \sum_j t_j \beta_j x_j, \qquad t_j := |\alpha_j| \Big( \sum_i |\alpha_i| \Big)^{-1}, \qquad \beta_j := \alpha_j |\alpha_j|^{-1} \sum_i |\alpha_i|. \]

Since |β_j| ≤ 1, cl B is the closed convex hull of the weakly compact set (cl D) · K and hence
is weakly compact.

Mazur’s Theorem
Here is an analog of the Krein-Šmulian theorem for Fréchet spaces, but in the original
topology.

14.3.3 Theorem (Mazur). Let K be a compact subset of a Fréchet space X. Then cl co K
is compact.
Proof. It suffices to show that co K is totally bounded. Let U be a neighborhood of zero
in X and let V be a convex neighborhood of 0 such that V + V ⊆ U. Since K is totally
bounded, K ⊆ V + F for some finite set F = {x_1, . . . , x_n} ⊆ K. Since the mapping

\[ (t_1, \dots, t_n) \mapsto \sum_{j=1}^{n} t_j x_j \;:\; \Big\{ (t_1, \dots, t_n) : t_j \ge 0,\ \sum_{j=1}^{n} t_j = 1 \Big\} \to \operatorname{co} F \]

is continuous, co F is compact. Therefore, there exists a finite set E such that co F ⊆ V + E.
It follows that co K ⊆ U + E. Indeed, let y ∈ co K, say y = Σ_{j=1}^m t_j y_j, where y_j ∈ K,
t_j ≥ 0, and Σ_{j=1}^m t_j = 1. By choice of F, there exist z_j ∈ F such that y_j − z_j ∈ V. By
convexity of V,

\[ y = \sum_{j=1}^{m} t_j (y_j - z_j) + \sum_{j=1}^{m} t_j z_j \in V + \operatorname{co} F \subseteq V + V + E \subseteq U + E. \]

Therefore, co K is totally bounded.

The Finite Dimensional Case


For finite dimensional spaces we have the following stronger result:

14.3.4 Theorem. Let K be a compact subset of R^d. Then co K is compact.

Proof. Let A be any nonempty subset of R^d. We claim that for each x ∈ co A there exists
a subset B of A containing at most d + 1 points such that x ∈ co B. To see this, let
x = Σ_{j=1}^m t_j a_j, where m > d + 1, t_j ≥ 0 and Σ_{j=1}^m t_j = 1. We may assume that t_j > 0 for
all j. Since m − 1 > d, the vectors a_j − a_m (1 ≤ j ≤ m − 1) are linearly dependent, hence
there exist c_j not all 0 such that Σ_{j=1}^{m−1} c_j (a_j − a_m) = 0. Setting c_m = −Σ_{j=1}^{m−1} c_j, we have

\[ \sum_{j=1}^{m} c_j a_j = 0 \quad\text{and}\quad \sum_{j=1}^{m} c_j = 0. \tag{†} \]

We may assume each c_j ≠ 0, otherwise reduce the above sums accordingly. Choose k so that
|c_j/t_j| ≤ |c_k/t_k| for j = 1, . . . , m. Then t_j/t_k ≥ |c_j/c_k| ≥ c_j/c_k, hence, using (†), we have

\[ t_j - t_k \frac{c_j}{c_k} \ge 0, \qquad \sum_{j=1}^{m} \Big( t_j - t_k \frac{c_j}{c_k} \Big) = 1, \quad\text{and}\quad \sum_{j=1}^{m} \Big( t_j - t_k \frac{c_j}{c_k} \Big) a_j = x. \]

Since the kth coefficient in the last sum is 0, x is now expressed as a convex combination of
fewer than m vectors in A. Continuing this reduction process verifies the claim.
Now let S = {(t_1, . . . , t_{d+1}) : t_j ≥ 0, Σ_j t_j = 1}. By the result of the previous paragraph
applied to A = K, we see that co K is the image of the compact set S × K × · · · × K under the
continuous map (t_1, . . . , t_{d+1}, x_1, . . . , x_{d+1}) ↦ Σ_{j=1}^{d+1} t_j x_j. Therefore, co K is compact.
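
The reduction step in the first paragraph of the proof is effectively an algorithm. The following sketch is an illustration, not the author's code: it uses exact rational arithmetic, and the helper names null_vector and caratheodory are hypothetical. Instead of forming the differences a_j − a_m, it solves the equivalent linear system (†) directly, namely Σ c_j a_j = 0 together with Σ c_j = 0.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Nonzero rational solution c of the homogeneous system `rows` * c = 0,
    found by Gauss-Jordan elimination; None if only c = 0 solves it."""
    rows = [r[:] for r in rows]
    pivots, r = [], 0
    for col in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
        if r == len(rows):
            break
    pivot_cols = {c for _, c in pivots}
    free = next((c for c in range(ncols) if c not in pivot_cols), None)
    if free is None:
        return None
    sol = [Fraction(0)] * ncols
    sol[free] = Fraction(1)           # set one free variable to 1
    for row, col in pivots:
        sol[col] = -rows[row][free]   # back-substitute the pivot variables
    return sol

def caratheodory(points, weights):
    """Rewrite a convex combination in R^d using at most d + 1 of the points."""
    d = len(points[0])
    pairs = [([Fraction(v) for v in p], Fraction(w))
             for p, w in zip(points, weights) if w > 0]
    while len(pairs) > d + 1:
        m = len(pairs)
        rows = [[pairs[j][0][i] for j in range(m)] for i in range(d)]
        rows.append([Fraction(1)] * m)            # encodes sum_j c_j = 0
        c = null_vector(rows, m)
        k = max(range(m), key=lambda j: abs(c[j] / pairs[j][1]))
        tk, ck = pairs[k][1], c[k]
        pairs = [(p, w - tk * cj / ck) for (p, w), cj in zip(pairs, c)]
        pairs = [(p, w) for p, w in pairs if w > 0]   # drops index k at least
    return pairs

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
reduced = caratheodory(pts, [Fraction(1, 5)] * 5)
print(reduced)
```

Each pass removes at least the index k, so the loop stops after at most m − d − 1 passes. Exact fractions are used so that the vanishing kth coefficient is detected without a tolerance.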

14.4 Extreme Points


Definitions and Examples
An extreme point of a nonempty, convex subset K of a vector space X is a point z ∈ K
with the property

x, y ∈ K, 0 < t < 1, and z = tx + (1 − t)y ⇒ x = y.

We denote the set of extreme points of K by ex K. For example, if K is a triangle (including
interior) in R², then ex K consists of the vertices. The Krein-Milman theorem asserts that
a compact, convex subset of a LCS is the closed convex hull of its extreme points. The
theorem is among the most powerful tools in modern analysis with important applications
in many diverse areas, some of which we explore in this and later sections.
Here are useful alternate characterizations of extreme points.
14.4.1 Proposition. Let K be a nonempty convex subset of a vector space X. The following
are equivalent:

(a) z is an extreme point of K.

(b) If z = Σ_{j=1}^n t_j x_j, where x_j ∈ K, 0 < t_j < 1, and Σ_{j=1}^n t_j = 1, then z = x_j for
some j.

(c) If x, y ∈ K and z = ½x + ½y, then x = y.

Proof. Assume that (a) holds. The verification of (b) is a simple induction argument. Indeed,
the assertion is obviously true for n = 2, and if the assertion holds for n and z = Σ_{j=1}^{n+1} t_j x_j,
then, setting t = 1 − t_{n+1}, we have

\[ z = t \sum_{j=1}^{n} \frac{t_j}{t} x_j + (1 - t) x_{n+1}, \qquad \sum_{j=1}^{n} \frac{t_j}{t} = 1, \]

so the assertion holds for n + 1.

Clearly, (b) ⇒ (c). Now assume that (c) holds and let x, y ∈ K, 0 < t < 1, and
z = tx + (1 − t)y. We may assume that 1/2 ≤ t < 1. Then s := 2t − 1 ∈ [0, 1) and so
sx + (1 − s)y ∈ K. Since z = ½x + ½(sx + (1 − s)y), by hypothesis x = sx + (1 − s)y.
Therefore, x = y, hence z is extreme.
The following proposition shows, as one might expect, that an extreme point of a convex
set must lie on the boundary.
14.4.2 Proposition. Let K be a convex subset of a TVS. Then no interior point of K can
be an extreme point.

Proof. Let x ∈ int K and choose y ∈ int K with y ≠ x. Since the interior of a convex set
is convex (9.1.1), the function f(t) = y + t(x − y) = tx + (1 − t)y maps [0, 1] into int K.
Since f is continuous, f([0, 1 + ε]) ⊆ int K for some ε > 0. Set z = f(1 + ε) = (1 + ε)x − εy.
Then z, y ∈ K, z ≠ y, and x = ε(1 + ε)⁻¹y + (1 + ε)⁻¹z, hence x is not an extreme
point.

14.4.3 Examples.
(a) Let X be a strictly convex normed space. It follows from 8.1.8 that the extreme points
of the closed unit ball in X are the points on the boundary S1 . In particular, this holds for
Hilbert spaces and Lp spaces (1 < p < ∞).
(b) The closed unit ball C_1 in c_0 has no extreme points. Indeed, if x = (x_n) ∈ S_1 and n is
chosen so that |x_n| < 1/2, then the equation

x = ½(x_1, . . . , x_{n−1}, x_n − ½, x_{n+1}, . . .) + ½(x_1, . . . , x_{n−1}, x_n + ½, x_{n+1}, . . .)

expresses x as a nontrivial convex sum of members of C_1.

(c) The closed unit ball C_1 in L¹(R^d, λ^d) has no extreme points. To see this, let ‖f‖_1 = 1
and choose a measurable set E such that ∫_E |f| = 1/2 (Ex. 1.85). Set f_1 := 2f 1_E and
f_2 := 2f 1_{E^c}. Then f_1, f_2 ∈ C_1, f = ½f_1 + ½f_2, and f_1 ≠ f_2, so f is not an extreme point.
(d) The extreme points of C_1 in ℓ¹(N) are the points ce_n = (0, · · · , 0, c, 0, · · · ), |c| = 1. To
see that such a point is extreme, assume that ce_n = tx + (1 − t)y, ‖x‖_1 = ‖y‖_1 = 1,
so that c = tx_n + (1 − t)y_n. If, say, |x_n| < |y_n|, then |c| ≤ t|x_n| + (1 − t)|y_n| < |y_n| ≤ 1,
impossible. Therefore, |x_n| = |y_n|, hence c = x_n = y_n, since c is an extreme point of the set
{|z| ≤ 1}. Since ‖x‖_1 = ‖y‖_1 = 1 and |c| = 1, it follows that x_k = y_k = 0 for all k ≠ n.
No other point is extreme: Let x = (x_1, x_2, . . .) ∈ S_1 be an extreme point of C_1. It suffices
to show that no two members of the sequence can be nonzero. Suppose, for example, that
both x_1 and x_2 are nonzero. Set t = Σ_{n=2}^∞ |x_n|. Then 0 < t < 1 and

x = (1 − t)(sgn(x_1), 0, 0, . . .) + t(0, x_2/t, x_3/t, . . .),

which expresses x as a nontrivial convex combination of members of C_1.


(e) Let X be a compact Hausdorff topological space. The extreme points of the closed
unit ball C_1 in C(X) are the functions f such that |f| = 1. To see that such a function
is extreme, let f = tg + (1 − t)h, where ‖g‖_∞ = ‖h‖_∞ = 1 and 0 < t < 1. Then for each
x ∈ X, f(x) = tg(x) + (1 − t)h(x), hence, by the strict convexity of C = Euclidean space
R², f(x) = g(x) = h(x).
Conversely, let f ∈ C_1 with |f(x_0)| < 1 for some x_0. The set

C := {x ∈ X : |f(x) − f(x_0)| ≥ ½(1 − |f(x_0)|)}

is closed and does not contain x_0, hence we may choose a nonzero g ∈ C(X) such that g = 0
on C and ‖g‖_∞ < r, r > 0 to be determined. Now, f = ½(f + g) + ½(f − g), so if we can
choose r so that ‖f ± g‖_∞ ≤ 1 it will follow that f is not extreme. Thus it suffices to show
that for suitable r, |f(x)| + |g(x)| ≤ 1 for x ∈ C^c. But for such x,

|f(x)| ≤ |f(x) − f(x_0)| + |f(x_0)| ≤ ½(1 − |f(x_0)|) + |f(x_0)| = ½(1 + |f(x_0)|),

hence

|f(x)| + |g(x)| ≤ ½(1 + |f(x_0)|) + r.

Choosing r = ½(1 − |f(x_0)|) completes the argument.
A similar argument shows that the extreme points of the closed unit ball in L∞ are
the functions f with |f(x)| = 1 a.e. (Or one may use the fact that L∞ is isometric and
isomorphic to C(X), where X is the spectrum of the C∗-algebra L∞.)
(f) Let X be a compact Hausdorff space. Identify the dual of C(X) with the space of all
complex regular Borel measures µ on X with total variation norm |µ|(X). Let C₁′ denote
the closed unit ball in C(X)′ and P the convex subset of probability measures. We show:
(i) The extreme points of C₁′ are the complex measures cδ_x, c ∈ C, |c| = 1.
(ii) The extreme points of P are the Dirac measures δ_x.
To see that cδ_x is extreme in C₁′, let cδ_x = tµ + (1 − t)ν, where µ, ν ∈ C₁′ and 0 < t < 1.
For any Borel set E ∋ x,

c = cδ_x(E) = tµ(E) + (1 − t)ν(E), |µ(E)|, |ν(E)| ≤ 1,

hence µ(E) = ν(E) = c. In particular, µ(X) = ν(X) = c, hence µ(E^c) = µ(X) − µ(E) =
0 = ν(E^c). Therefore, µ = ν and so cδ_x is extreme in C₁′.
Conversely, suppose that ‖µ‖ = |µ|(X) = 1 and that the support K of |µ| contains at
least two points x and y. Choose disjoint open sets U ∋ x and V ∋ y. Then |µ|(U) > 0 and
|µ|(V) > 0, hence also |µ|(U^c) > 0. Define

ν(E) = µ(U ∩ E)/|µ|(U) and η(E) = µ(U^c ∩ E)/|µ|(U^c),

so that |ν|(X) = |η|(X) = 1. Setting t = |µ|(U), we then have µ = tν + (1 − t)η, hence µ is
not extreme. Therefore, |µ| = δ_x for some x ∈ X. Thus µ(E) = 0 if x ∉ E and µ(E) = µ({x})
if x ∈ E, so µ = µ({x})δ_x. This verifies (i). Similar arguments establish (ii). ♦
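
The decomposition used in Example (d) can be checked on a finitely supported vector. This is a small sketch under assumptions not in the text: the entries are real rationals, the vector has ℓ¹ norm 1 with x_1 ≠ 0 and a nonzero tail, and the helper name split_l1 is hypothetical.

```python
from fractions import Fraction

def split_l1(x):
    """Write x = (1 - t) * u + t * v with ||u||_1 = ||v||_1 = 1, where
    t is the l^1 norm of the tail (x_2, x_3, ...), as in Example (d).
    Assumes real entries, ||x||_1 = 1, x[0] != 0, and a nonzero tail."""
    x = [Fraction(v) for v in x]
    t = sum(abs(v) for v in x[1:])
    sgn = 1 if x[0] > 0 else -1
    u = [Fraction(sgn)] + [Fraction(0)] * (len(x) - 1)   # (sgn(x_1), 0, 0, ...)
    v = [Fraction(0)] + [xi / t for xi in x[1:]]         # (0, x_2/t, x_3/t, ...)
    return t, u, v

x = [Fraction(1, 2), Fraction(-1, 4), Fraction(1, 4)]
t, u, v = split_l1(x)
recombined = [(1 - t) * a + t * b for a, b in zip(u, v)]
print(t, u, v, recombined == x)
```

Because 0 < t < 1 and u ≠ v, the vector x is a nontrivial convex combination of two members of the unit ball, so it is not extreme.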
For the proof of the Krein-Milman theorem, we need the following generalization of the
notion of extreme point. Let A be a nonempty subset of a linear space X. A nonempty
subset E of A is an extreme subset of A if

x, y ∈ A, 0 < t < 1, and tx + (1 − t)y ∈ E ⇒ x, y ∈ E.

For example, a side E of a triangle A in R2 is an extreme subset.


14.4.4 Lemma. Let X be a LCS and A ≠ ∅ a compact subset of X. If f is a real continuous
linear functional on X and a := min{f(x) : x ∈ A}, then E := {x ∈ A : f(x) = a} is an
extreme subset of A.
Proof. Let x, y ∈ A, 0 < t < 1, and f (tx + (1 − t)y) = a. Then f (x), f (y) ≥ a
and the assumption that one or the other is greater than a leads to the contradiction
a = f (tx + (1 − t)y) = tf (x) + (1 − t)f (y) > ta + (1 − t)a = a. Therefore, x, y ∈ E.

The Krein-Milman Theorem


14.4.5 Theorem (Krein-Milman). Let X be a LCS over 𝕂 and let K ⊆ X be a nonempty
compact convex subset of X. Then K is the closed convex hull of its extreme points.
Proof. We first show that ex K ≠ ∅. Let ℰ be the collection of all nonempty, closed extreme
subsets of K. Since K is such a set, ℰ ≠ ∅. Partially order ℰ downward by inclusion, and let
ℰ_0 be a chain in ℰ. We show that E_0 := ⋂_{E∈ℰ_0} E ∈ ℰ. Now, ℰ_0 has the finite intersection
property since it is linearly ordered, hence by compactness E_0 ≠ ∅. Also, it is easy to check
that E_0 is an extreme set of K. Therefore, E_0 is a lower bound for ℰ_0 in ℰ. By Zorn’s lemma,
K has a closed extreme subset A that does not properly contain another such subset. We show
that A consists of a single point. Suppose to the contrary that A contains distinct points
x and y. Then there exists a real continuous linear functional f such that f(x) ≠ f(y).
By the lemma, the set E := {z ∈ A : f(z) = a}, where a := min{f(z) : z ∈ A}, is an extreme
subset of A. Since A is an extreme subset of K, it follows that E is an extreme subset of K.
But E cannot contain both x and y, hence E is a proper nonempty subset of A, contradicting
the minimality of A. Therefore, A has just one point, necessarily an extreme point of K.
Now let B denote the closed convex hull of the set of extreme points of K. Since K is closed
and convex, B ⊆ K. Suppose the containment is proper, and let x ∈ K \ B. By 9.3.2, there
exists a real continuous linear functional f such that f (x) < inf f (B). Now, since the set
C := {z ∈ K : f (z) = inf f (K)} is nonempty, compact, and convex, it has an extreme point
z, by the first paragraph. Since C is an extreme subset of K (by the lemma), z is an extreme
point of K. In particular z ∈ B, which is impossible, since f (z) = inf f (K) ≤ f (x) < f (y)
for all y ∈ B. Therefore, it must be the case that B = K.
The following theorem describes a minimality property of ex K. It asserts that the closure
of any subset E of K that “generates” K must already contain the extreme points of K.
14.4.6 Theorem. Let X be a LCS and let K ⊆ X be a nonempty, compact, convex, subset
of X. If K = cl co E, then ex K ⊆ cl E.
Proof. We may assume that E is closed, hence compact. Suppose for a contradiction that x
is an extreme point of K not contained in E. Let U be a closed, balanced, neighborhood of
zero such that (x + U ) ∩ E = ∅. By compactness, there exist z1 , . . . , zn ∈ E such that the
sets zj + U cover E. Set Ej := E ∩ (zj + U ), these sets being compact and contained in K.
Now, the mapping

\[ \Big\{ (t_1, \dots, t_n) : t_j \ge 0,\ \sum_{j=1}^{n} t_j = 1 \Big\} \times E_1 \times \cdots \times E_n \to K \;:\; (t_1, \dots, t_n, x_1, \dots, x_n) \mapsto \sum_{j=1}^{n} t_j x_j \]

is continuous and so has compact range co(⋃_{j=1}^n E_j). Since E ⊆ co(⋃_{j=1}^n E_j), we have

\[ K = \operatorname{cl\,co} E \subseteq \operatorname{cl\,co} \Big( \bigcup_{j=1}^{n} E_j \Big) = \operatorname{co} \Big( \bigcup_{j=1}^{n} E_j \Big). \]

Thus x may be expressed as x = Σ_{j=1}^n t_j x_j, where x_j ∈ E_j, t_j ≥ 0, and Σ_{j=1}^n t_j = 1. Since
x is extreme, x = x_j for some j. Thus x ∈ E_j ⊆ z_j + U ⊆ E + U. But then x = e + u for
some e ∈ E and u ∈ U, producing the contradiction x − u = e ∈ (x + U) ∩ E = ∅.
14.4.7 Remarks.
(a) The set of extreme points of a compact convex set need not be closed, even in the finite
dimensional case, as the figure illustrates.

FIGURE 14.1: A nonclosed set of extreme points. (Figure labels: “extreme”, “not extreme”.)

(b) If X is the dual of a normed space, then the closed unit ball C_1 is weak∗ compact and so
C_1 is the closed convex hull of its extreme points. Thus, by (b) and (c) of 14.4.3, L¹(R^d, λ^d)
and c_0 are not dual spaces.
(c) The space C(X, R), where X is a nontrivial, compact, connected, Hausdorff topological
space, is not a dual space. Indeed, the extreme points of C_1 are the functions f with |f| = 1.
For such a function, f⁻¹{−1} and f⁻¹{1} are disjoint open sets whose union is X, hence one
of these sets must equal X. Therefore, the extreme points of C_1 are the constant functions ±1
and so cl co(ex C_1) consists of constant functions. However, Urysohn’s lemma implies the
existence of nonconstant functions in C_1. Thus cl co(ex C_1) ≠ C_1, verifying the assertion. ♦

14.5 Applications of the Krein-Milman Theorem


The Existence of Ergodic Measures
Let X be a compact Hausdorff space and let P = P(X) denote the convex, w∗-compact
set of regular Borel probability measures on X. Let 𝒯 be a family of homeomorphisms on X.
A member µ of P is said to be 𝒯-invariant if T(µ) = µ for all T ∈ 𝒯. We denote by P_𝒯
the subset of P consisting of 𝒯-invariant measures. A direct argument shows that if P_𝒯 is
nonempty, then it is convex and weak∗-compact. A member µ of P_𝒯 is said to be ergodic
if µ(A) = 0 or 1 for all A ∈ F with T⁻¹(A) = A µ-a.e., that is, µ(A △ T⁻¹(A)) = 0. We
claim that if the mappings in 𝒯 commute, then P_𝒯 ≠ ∅ and the ergodic measures are the
extreme points of P_𝒯. In particular, the Krein-Milman theorem implies the existence of
ergodic probability measures.
Suppose first that 𝒯 consists of a single map T. Set P_T = P_{{T}}. Abusing notation slightly,
we let T also denote the weak∗ continuous, affine bijection T : µ ↦ T(µ) : P → P. To show
that P_T is nonempty, fix µ ∈ P and define µ_n ∈ P by

\[ \mu_n = \frac{1}{n} \sum_{j=0}^{n-1} T^j(\mu). \]

Let f ∈ C(X). Since T(µ_n) = (1/n) Σ_{j=1}^n T^j(µ), we have

\[ |\mu_n(f) - T(\mu_n)(f)| \le n^{-1} |\mu(f) - T^n(\mu)(f)| \le 2 n^{-1} \|f\|_\infty. \]

Thus if ν is any weak∗ limit point of (µ_n), then T(ν)(f) = ν(f) for all f ∈ C(X), hence P_T
is nonempty.
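
The averaging construction can be tried on the simplest compact space. The following sketch is an illustration under assumptions not in the text: X = {0, ..., N−1} with the discrete topology, T a permutation given as a list perm with T(i) = perm[i], and measures stored as lists of point masses. The function names push and cesaro are hypothetical. The quantity `gap` is the total variation of µ_n − T(µ_n), which bounds |µ_n(f) − T(µ_n)(f)| for ‖f‖_∞ ≤ 1.

```python
from fractions import Fraction

def push(perm, mu):
    """Image measure T(mu) on {0, ..., N-1}: T(mu)({perm[i]}) = mu({i})."""
    out = [Fraction(0)] * len(mu)
    for i, mass in enumerate(mu):
        out[perm[i]] += mass
    return out

def cesaro(perm, mu, n):
    """mu_n = (1/n) * sum_{j=0}^{n-1} T^j(mu)."""
    total = [Fraction(0)] * len(mu)
    current = list(mu)
    for _ in range(n):
        total = [a + b for a, b in zip(total, current)]
        current = push(perm, current)
    return [a / n for a in total]

perm = [1, 2, 0, 4, 3]   # a 3-cycle on {0, 1, 2} and a 2-cycle on {3, 4}
mu = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2), Fraction(0)]

gaps = {}
for n in (1, 2, 5, 6, 7):
    mu_n = cesaro(perm, mu, n)
    gaps[n] = sum(abs(a - b) for a, b in zip(mu_n, push(perm, mu_n)))
    print(n, gaps[n], gaps[n] <= Fraction(2, n))
```

In this finite setting the averages become exactly invariant once n is a multiple of the order of the permutation (here 6). In general one only obtains invariance for weak∗ limit points, as in the argument above.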
Now let µ ∈ P_T. The mapping Uf := f ◦ T maps L²(µ) onto L²(µ) and is unitary. By
the mean ergodic theorem (12.1.16),

\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} U^j f = P f \tag{†} \]

in L²(µ) norm, where P is the projection onto the closed linear subspace of L²(µ) consisting
of those g ∈ L² with Ug = g. Applying the continuous linear functional h ↦ ∫ h dµ = µ(h),
we see by invariance of µ that µ(Pf) = µ(f).
We claim that if µ is ergodic, then Pf must be constant and that constant must be
µ(f). To see this, observe that U maps real functions onto real functions, hence so does P.
By considering real and imaginary parts, we may take f to be real. Set g = Pf. Because
g = Ug = g ◦ T, the set A_n := {g ≥ µ(g) + 1/n} satisfies T⁻¹(A_n) = A_n µ-a.e. and so has
measure zero or one. If the measure were one, then by integrating we would obtain the
absurdity µ(g) ≥ µ(g) + 1/n. Therefore, µ(A_n) = 0 for all n and so g ≤ µ(g) µ-a.e. A similar
argument shows that g ≥ µ(g) µ-a.e. Thus Pf = µ(Pf) = µ(f) for all f ∈ C(X), verifying
the claim.

To show that an ergodic measure µ is an extreme point, let µ = tν + (1 − t)η, where


ν, η ∈ PT and 0 < t < 1. Since ν ≤ µ/t it follows that (†) also holds in the L2 (ν) norm.
Applying the linear functional ν to (†) and using P f = µ(f ) and the invariance of ν we see
that ν(f ) = ν(P f ) = µ(f ) for all f ∈ C(X). Therefore, µ = ν = η, establishing that every
ergodic measure is an extreme point. Conversely, suppose that µ ∈ PT is not ergodic. Then
there exists A ∈ F such that T (A) = A µ-a.e. and 0 < µ(A) < 1. The measure µA is then
invariant:
   
µA T (E) = µ A ∩ T (E) = µ T −1 (A ∩ T (E)) = µ (T −1 A) ∩ E = µA (E).

Similarly, µAc is invariant. Now define invariant probability measures


\[ \nu := \frac{\mu_A}{\mu(A)} \quad\text{and}\quad \eta := \frac{\mu_{A^c}}{\mu(A^c)}. \]

Setting t = µ(A) we have 0 < t < 1 and µ = tν + (1 − t)η, hence µ is not extreme. This
verifies the claim for the case T = {T }.
Now consider the general case. Let S, T ∈ T. Because the maps commute, T maps
PS onto itself, hence we may restrict the mapping µ → T (µ) in the argument of the
second paragraph to the set PS to conclude that µ is both T - and S-invariant, that is,
P{S,T} ≠ ∅. More generally, PF ≠ ∅ for every finite F ⊆ T. Since these sets are compact,
their intersection PT is nonempty. The entire argument in the preceding paragraphs then
goes through if {T } is replaced by T.

The Stone-Weierstrass Theorem


In this section we use the Krein-Milman theorem to give a relatively short proof, due to
de Branges, of Stone’s generalization of the Weierstrass approximation theorem.

14.5.1 Theorem (Stone-Weierstrass). Let X be a compact Hausdorff topological space and


A a conjugate-closed subalgebra of C(X) that contains the constant functions and separates
points of X. Then A is dense in C(X).
Proof. It is clear that the closure of A is a conjugate-closed subalgebra of C(X), hence we
may assume that A is closed and then prove that A = C(X).
By the Hahn-Banach theorem, it suffices to show that A ⊥ = {0}. Suppose that this is not
the case. By Alaoglu’s theorem, the closed unit ball C1′ of A⊥ ⊆ C(X)′ is weak∗ compact.
By the Krein-Milman theorem, there exists a nontrivial extreme point µ of C1′, which we
take to be a complex Radon measure on X with K := supp |µ| ≠ ∅. We show that K consists
of a single point x. It will then follow that µ = cδx, where |c| = 1. Since 1 ∈ A and µ ∈ A⊥,
we will have the contradiction $c = \int 1\,d\mu = 0$, completing the proof.
Assume that K contains distinct points x and y. We show that this is incompatible with
the fact that µ is an extreme point of C1′. Note first that for any h ∈ A the complex measure
h dµ is in A⊥. Indeed, since A is an algebra, if g ∈ A, then gh ∈ A and so $\int gh\,d\mu = 0$, as
required. Now, since A separates points, there exists a function f ∈ A such that f(x) ≠ f(y).
Let c = f(y). Then g := f − c ∈ A and g(y) = 0 ≠ g(x). Set
\[ h = \frac{1}{1 + \|g\bar g\|_\infty}\, g\bar g, \quad s := \int h\,d|\mu| = \|h\mu\|, \quad\text{and}\quad t := 1 - s = \int (1-h)\,d|\mu| = \|(1-h)\mu\|. \]

Then h, 1 − h ∈ A, hence h dµ, (1 − h) dµ ∈ A⊥. Moreover, h(y) = 0, 0 < h(x) < 1, and
0 ≤ h < 1. Choose 0 < r < h(x) and an open neighborhood U of x such that h ≥ r on
U. Then $s \ge \int_U h\,d|\mu| \ge r|\mu|(U) > 0$, the last inequality because U ∩ K ≠ ∅. A similar
calculation shows that t > 0. Since
\[ \mu = s\,\frac{h\mu}{\|h\mu\|} + t\,\frac{(1-h)\mu}{\|(1-h)\mu\|} \]
and µ is an extreme point of C1′, $\mu = \|h\mu\|^{-1} h\mu$. By uniqueness of densities, $\|h\mu\|^{-1} h = 1$ |µ|-a.e.
But then, by continuity, h = ‖hµ‖ = constant on K, contradicting that h(x) ≠ h(y).
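As a check on the decomposition used in the proof, the following sketch verifies that the displayed combination really is a proper convex combination equal to µ. It uses only the definitions s = ‖hµ‖ and t = 1 − s = ‖(1 − h)µ‖ from the text.

```latex
\begin{align*}
s\,\frac{h\mu}{\|h\mu\|} + t\,\frac{(1-h)\mu}{\|(1-h)\mu\|}
  &= \|h\mu\|\,\frac{h\mu}{\|h\mu\|}
     + \|(1-h)\mu\|\,\frac{(1-h)\mu}{\|(1-h)\mu\|} \\
  &= h\mu + (1-h)\mu = \mu,
  \qquad s + t = 1,\quad s,\,t > 0 .
\end{align*}
```

Both normalized measures have norm one and lie in A⊥, so they belong to the unit ball of A⊥, and extremality of µ forces each of them to equal µ.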

The Banach-Stone Isomorphism Theorem


Let X and Y be compact Hausdorff topological spaces. Given continuous mappings
σ : Y → X and τ : Y → C, define a mapping T(τ,σ) : C(X) → C(Y ) by
(T(τ,σ) f )(y) = τ (y)(f ◦ σ)(y), f ∈ C(X), y ∈ Y. (14.1)
Clearly, T(τ,σ) is linear and

\[ \|T_{(\tau,\sigma)} f\|_\infty = \sup\{|\tau(y)(f\circ\sigma)(y)| : y \in Y\} \le \|\tau\|_\infty \|f\|_\infty, \tag{14.2} \]
hence ‖T(τ,σ)‖ ≤ ‖τ‖∞. In this subsection we prove that every isometric isomorphism of
C(X) onto C(Y ) is of the form T(τ,σ) , where σ is a homeomorphism and τ (Y ) ⊆ T. We
isolate part of the proof in the following lemma.
14.5.2 Lemma. The linear map T := T(τ,σ) is an isometry onto C(Y ) iff σ is a homeomorphism onto X and τ (Y ) ⊆ T.
Proof. The sufficiency is clear. For the necessity, first take f = 1 to obtain |τ(y)| =
|(T1)(y)| ≤ ‖T1‖∞ ≤ 1. Next take g = 1 and set f = T⁻¹g. Since ‖g‖∞ = 1 and T is an
isometry, ‖f‖∞ = 1, hence 1 = g(y) = |τ(y)(f ◦ σ)(y)| ≤ |τ(y)|. Therefore, τ(Y ) ⊆ T.
Now suppose σ(Y ) ≠ X and let x ∈ X \ σ(Y ). Choose a continuous function f on X
such that f(x) = 1 and f = 0 on σ(Y ). Then Tf = 0, yet ‖f‖∞ ≥ 1. Therefore, σ must be
surjective.
Finally, to show that σ is injective and hence is a homeomorphism, let y0 6= y1 and let g
be a continuous function on Y such that 0 ≤ g ≤ 1, g(y0 ) = 0, and g(y1 ) = 1. Set f = T −1 g.
Then
\[ \tau(y_0)(f\circ\sigma)(y_0) = (Tf)(y_0) = 0 \quad\text{and}\quad \tau(y_1)(f\circ\sigma)(y_1) = (Tf)(y_1) = 1, \]
hence f(σ(y0)) = 0 ≠ f(σ(y1)) and so σ(y0) ≠ σ(y1).
We may now prove
14.5.3 Theorem (Banach-Stone). Let X and Y be compact Hausdorff topological spaces. If
T : C(X) → C(Y ) is a surjective, isometric isomorphism, then there exists a homeomorphism
σ : Y → X and a continuous function τ : Y → T such that T = T(τ,σ) .
Proof. The dual map T′ : Mra(Y ) → Mra(X) is an isometric isomorphism and hence maps
the closed unit ball C1,Y of Mra(Y ) onto the closed unit ball C1,X of Mra(X). Since T′ is
also a weak∗-homeomorphism that preserves convexity, it maps the extreme points of C1,Y
onto the extreme points of C1,X. By 14.4.3(f), the extreme points of these balls are of the
form cδz, |c| = 1, where δz is the Dirac measure at z. Thus for each y ∈ Y there exist unique
τ(y) ∈ T and σ(y) ∈ X such that T′δy = τ(y)δσ(y). Therefore, for all f ∈ C(X),
\[ (Tf)(y) = \langle f, T'\delta_y\rangle = \langle f, \tau(y)\delta_{\sigma(y)}\rangle = \tau(y)\, f(\sigma(y)). \]
Taking f = 1, we have (T1)(y) = τ(y), hence τ is continuous. It follows that f(σ(y)) =
[τ(y)]⁻¹(Tf)(y) is continuous in y for each f, which implies that σ is continuous. Since
T = T(τ,σ), σ is a homeomorphism by the lemma.

The Lyapunov Convexity Theorem


An R-valued measure µ on a measurable space (X, F) is said to be non-atomic if for each
A ∈ F with |µ|(A) > 0 there exists B ∈ F such that B ⊆ A and 0 < |µ|(B) < |µ|(A). For
example, for any real-valued g ∈ L1(λd) the measure g dλd on Rd is non-atomic (Ex. 1.85). On
the other hand, counting measure on N is obviously atomic. The theorem in this subsection
asserts that the range of a finite dimensional, non-atomic vector measure is compact and
convex. For the proof we need
14.5.4 Lemma. Let ν be a σ-finite, non-atomic measure on (X, F). If ν(E) > 0, then
L∞ (E) is infinite dimensional.

Proof. We identify the space L∞ (E) with the subspace of all f ∈ L∞ (X) such that f = 0 on
E c . Since ν is σ-finite, we may suppose that ν(E) < ∞, otherwise consider a subset F of E
with positive finite measure and work with the subspace L∞ (F ). Set E0 = E. Since ν is non-
atomic, we may choose measurable sets En such that En ⊆ En−1 and 0 < ν(En ) < ν(En−1 )
for all n. Set Fn = En−1 \ En . Then the sets Fn are disjoint and have positive measure
implying that the indicator functions 1Fn are linearly independent.
14.5.5 Theorem (Lyapunov). Let µ1 , . . . , µd be real-valued non-atomic measures on F.
For E ∈ F, define µ(E) = (µ1(E), . . . , µd(E)). Then the set µ(F) := {µ(E) : E ∈ F} is a
compact convex subset of Rd .
Proof. Set $\nu = \sum_{j=1}^{d} |\mu_j|$ and note that ν is a non-atomic measure on (X, F) with |µj| ≪ ν
for each j. By the Radon-Nikodym theorem, there exists gj ∈ L1(ν) such that dµj = gj dν,
so µ = (g1ν, . . . , gdν). Define a linear map
\[ T : L^\infty(\nu) \to \mathbb{R}^d, \quad Tf = \Big(\int_X f\,d\mu_1, \ldots, \int_X f\,d\mu_d\Big) = \Big(\int_X f g_1\,d\nu, \ldots, \int_X f g_d\,d\nu\Big). \]

Then T is continuous with respect to the weak∗ topology of L∞(ν) = (L1(ν))′ and the norm
topology of Rd. Moreover, T1E = µ(E) for all E ∈ F. Now consider the convex set
C := {f ∈ L∞(ν) : 0 ≤ f ≤ 1 ν-a.e.}. If (fα) is a net in C that w∗-converges to f, then
$0 \le \int_E f\,d\nu \le \int_E 1\,d\nu$ for all E, hence 0 ≤ f ≤ 1 ν-a.e. Therefore, C is w∗-closed and so is
w∗-compact, by the Banach-Alaoglu theorem. Thus T(C) is compact and convex in Rd.
We claim that µ(F) = T (C), which will prove the theorem. By definition of C, we
have µ(E) = T (1E ) ∈ T (C) for all E, that is, µ(F) ⊆ T (C). For the reverse inclusion, let
x ∈ T (C) and consider the convex, weak∗ compact set K := {f ∈ C : T f = x}. By the
Krein-Milman theorem, K has an extreme point g. We show that g is an indicator function.
If not, then ν{g(1 − g) ≠ 0} > 0 and so for some ε > 0 the set E := {ε ≤ g ≤ 1 − ε} has
positive ν measure. By the lemma, L∞(E) is infinite dimensional. Since T(L∞(E)) is finite
dimensional, it follows that T cannot be 1-1 on L∞(E), hence Th = 0 for some h ∈ L∞(E)
with ‖h‖∞ ≠ 0. Multiplying by $\varepsilon\|h\|_\infty^{-1}$, we may assume that −ε ≤ h ≤ ε. But then g ± h
are distinct members of K and $g = \tfrac12(g+h) + \tfrac12(g-h)$, contradicting that g is extreme in
K. Therefore, g is an indicator function, completing the proof.
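The claim that g ± h belong to K can be checked directly. The sketch below uses only the facts stated in the proof: ε ≤ g ≤ 1 − ε on E, −ε ≤ h ≤ ε, h = 0 off E, Th = 0, and Tg = x.

```latex
\begin{align*}
\text{on } E:&\quad 0 = \varepsilon - \varepsilon \le g \pm h
                \le (1-\varepsilon) + \varepsilon = 1,\\
\text{on } E^{c}:&\quad g \pm h = g \in [0,1]
                \quad (\text{since } h = 0 \text{ there}),\\
&\quad T(g \pm h) = Tg \pm Th = x \pm 0 = x .
\end{align*}
```

Hence 0 ≤ g ± h ≤ 1 ν-a.e. and T(g ± h) = x, which is exactly membership in K; the two functions differ because h is nonzero on a set of positive measure.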


The convexity theorem may be seen as the theoretical basis for the so-called “bang-bang”
principle in optimal control. Control theory considers how systems (in physics, economics,
etc.) that evolve in time are influenced by feedback. The bang-bang principle asserts that
optimal change of system in minimal time may be achieved by the extreme values of the
set of allowable controls. (A missile can find its target optimally by executing a sequence
of extreme left or extreme right rudder movements.) See, for example, [27].

The Ryll-Nardzewski Fixed Point Theorem


Let X be a LCS, C a nonempty, compact, convex subset of X, and T a family of
continuous affine mappings from C into itself. A point x ∈ C is called a fixed point of T
if T (x) = x for all T ∈ T. We denote the set of fixed points of a single mapping T by FT .
Then FT is a compact, convex (possibly empty) subset of C, and T has a fixed point iff
⋂{FT : T ∈ T} ≠ ∅. In this chapter we establish an important fixed point theorem for mappings
on C. The following lemma will be useful.
14.5.6 Lemma. Suppose that ⋂{FT : T ∈ F} ≠ ∅ for each finite subset F ⊆ T. Then T has a
fixed point.
Proof. By hypothesis, the collection {FT : T ∈ T} has the finite intersection property, hence,
by compactness, has a nonempty intersection.
For the theorems in this subsection, we require the following notions. A nonempty subset
X of C is said to be T-invariant if T X ⊆ X for all T ∈ T. A minimal invariant set is
an invariant set not properly contained in an invariant set. If X is T-invariant, then T is
said to be noncontracting on X if
x, y ∈ X and x ≠ y ⇒ 0 ∉ cl{Tx − Ty : T ∈ T};
equivalently, for any net (Tα ) in T,
Tα x − Tα y → 0 ⇒ x = y.
Here is the key step needed to establish the Ryll-Nardzewski fixed point theorem. Its
proof relies on the Krein-Milman theorem.
14.5.7 Lemma (Dugundji-Granas). Let T be a semigroup of continuous affine maps from
C into itself that is noncontracting on each minimal, closed T-invariant subset of C. Then
T has a fixed point.
Proof. Let A be the collection of all nonempty, compact, convex, T-invariant subsets of C.
In particular, C ∈ A, hence A ≠ ∅. Partially order A downward by inclusion. Clearly, a chain
A0 in A has lower bound ⋂A0, hence, by Zorn’s lemma, A has a minimal element E. Now
let B be the collection of all nonempty, closed, invariant subsets of E. Another application
of Zorn’s lemma shows that there is a closed, minimal, T-invariant subset X of E. We show
that X has a single member, completing the proof.
Assume that X has distinct members x and y. Since E is convex and invariant, the
set A := {T((x + y)/2) : T ∈ T} is contained in E, hence so is the closure cl A, which
is also invariant. Since each T is affine, cl co A is invariant. Thus, by minimality of E,
cl co A = E. Now let z be an extreme point of E. By 14.4.6, z ∈ cl A, so there exists a net
Tα((x + y)/2) → z. Taking subnets if necessary, we may suppose that Tα(x) → u and
Tα(y) → v for some u, v ∈ E. Thus z = ½u + ½v, and since z is extreme u = v. But
then Tαx − Tαy → 0, contradicting the noncontracting property of T on X.
14.5.8 Corollary (F. Hahn). Let T be a noncontracting semigroup of continuous affine
maps from C into itself. Then T has a fixed point.
14.5.9 Corollary (Kakutani). Let T be a group of continuous, affine maps from C onto
itself. Suppose that T is equicontinuous, that is, for each neighborhood V of 0 there exists a
neighborhood U of 0 such that
x − y ∈ U ⇒ Tx − Ty ∈ V ∀ T ∈ T.
Then T has a fixed point.

Proof. By 14.5.8, it suffices to show that T is noncontracting. Let x, y ∈ C and let (Tα)
be a net in T such that Tα x − Tα y → 0. Let V be an arbitrary neighborhood of 0 and
choose U as in the theorem. Next, choose α0 such that Tα (x) − Tα (y) ∈ U for all α ≥ α0 .
For such α, x − y = Tα−1 Tα x − Tα−1 Tα y ∈ V. Since V was arbitrary, x = y.
We are now in a position to prove the main result of the section. We give a nontrivial
application in Chapter 16.
14.5.10 Theorem (Ryll-Nardzewski). Let C be a nonempty, weakly compact, convex subset
of a locally convex space Xτ and let T be a τ -noncontracting semigroup of weakly continuous
affine maps from C into itself. Then T has a fixed point.
Proof. (Dugundji-Granas) By 14.5.6, it suffices to prove that ⋂{FT : T ∈ F} ≠ ∅, where F ⊆ T is
finite. Let S denote the subsemigroup of T generated by F. Then S consists of all products
of members of F. Choose any point x0 ∈ C. Since S is countable, the S-invariant convex
set K := clτ co(Sx0 ) ⊆ C is τ -separable. Moreover, by 10.1.6, K is weakly closed. Let X be
a weakly closed, minimal, S-invariant subset of K. We show that S is noncontracting on X
in the weak topology. It will follow from 14.5.7 that S has a fixed point in K, proving the
theorem.
Let x and y be distinct members of X. Since S is τ -noncontracting, there exists a τ -open,
convex neighborhood U of 0 such that the neighborhood V − V of zero is disjoint from
{Sx − Sy : S ∈ S}, where V := cl U . Since {z + U : z ∈ X} is a cover of X and X is
τ -separable, there exist countably many sets (zn + V ) ∩ X that cover X. Since V is τ -closed
and convex, it is weakly closed. Therefore, the weakly compact set X is a countable union of
weakly closed sets (zn + V ) ∩ X. By Baire’s theorem (0.12.5), some set (zn + V ) ∩ X contains
a nonempty, weakly open set W. Now, the collection {S⁻¹(W) : S ∈ S} of weakly open sets
covers X; otherwise X \ ⋃{S⁻¹(W) : S ∈ S} would be a weakly closed, nonempty, S-invariant
subset properly contained in the minimal set X. To show that S is noncontracting in the
weak topology, suppose for a contradiction that there exists a net (Sα) in S such that
w-limα [Sαx − Sαy] = 0. We may assume by the weak compactness of X that the limits
w-limα Sαx and w-limα Sαy exist and hence are equal. Let z denote their common value.
Since z ∈ X ⊆ ⋃{S⁻¹(W) : S ∈ S}, we may choose S so that Sz ∈ W, implying that SSαx
and SSαy are eventually in W ⊆ zn + V. But then SSαx − SSαy is eventually in V − V,
contradicting the choice of V .

14.6 Vector-Valued Integrals


Let X be a LCS and (X, F, µ) a measure space. A function f : X → X is said to be
weakly measurable (weakly integrable) if x′ ◦ f is measurable (integrable) for each
x′ ∈ X′. For a weakly integrable f we seek a member of X, which we denote by $\int_X f(x)\,d\mu(x)$
and call the weak integral of f with respect to µ, that satisfies
\[ \Big\langle \int_X f\,d\mu,\ x' \Big\rangle = \int_X \langle f, x'\rangle\,d\mu = \int_X \langle f(x), x'\rangle\,d\mu(x) \quad\text{for all } x' \in X'. \tag{14.3} \]
If (14.3) holds, then, because X′ separates points, the integral $\int_X f\,d\mu$ is unique, linear in f
and, in the case of a normed space, satisfies the inequality
\[ \Big\| \int_X f\,d\mu \Big\| \le \int_X \|f(x)\|\,d\mu(x). \tag{14.4} \]

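Moreover, if X is a Hilbert space and (14.3) holds for weakly integrable f and g, then
\[ \Big( \int_X f\,d\mu \ \Big|\ \int_X g\,d\mu \Big) = \int_X\!\int_X \big(f(x) \mid g(y)\big)\,d\mu(x)\,d\mu(y). \tag{14.5} \]
Indeed, taking $x'(\cdot) = \big(\,\cdot \mid \int_X g\,d\mu\big)$ in (14.3), we have
\[ \Big( \int_X f\,d\mu \ \Big|\ \int_X g\,d\mu \Big) = \int_X \Big( f(x) \ \Big|\ \int_X g(y)\,d\mu(y) \Big)\,d\mu(x), \]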

and using (14.3) on the inner integral yields (14.5). Finally, if T : X → Y is continuous and
linear and (14.3) holds for f and Tf, where (Tf)(x) = T(f(x)), then
\[ \Big\langle T\!\int_X f\,d\mu,\ y' \Big\rangle = \Big\langle \int_X f\,d\mu,\ T'y' \Big\rangle = \int_X \langle f, T'y'\rangle\,d\mu = \int_X \langle Tf, y'\rangle\,d\mu = \Big\langle \int_X Tf\,d\mu,\ y' \Big\rangle, \]
that is,
\[ T\!\int_X f(x)\,d\mu(x) = \int_X T(f(x))\,d\mu(x). \tag{14.6} \]
For the construction of the weak integral, we consider first the case of a Banach space.

Weak Integrals in Banach Spaces


Let X be a Banach space, (X, F, µ) a σ-finite measure space, and f : X → X weakly
integrable. The mapping
\[ T_f : x' \mapsto x' \circ f : X' \to L^1(\mu) \]
is clearly linear. Moreover, by the closed graph theorem, Tf is continuous. Indeed, if x′n → x′
in X′ and Tf x′n → g in L1(µ), then for some subsequence we have Tf x′nk = x′nk ◦ f → g a.e.,
hence Tf x′ = g a.e., as required. Identifying L1(µ)′ with L∞(µ), we see that the dual map
Tf′ : L∞(µ) → X″ satisfies
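\[ \langle x', T_f' g\rangle = \langle T_f x', g\rangle = \int_X \langle f(x), x'\rangle\, g(x)\,d\mu(x), \quad g \in L^\infty(\mu),\ x' \in X'. \]
In particular,
\[ \langle x', T_f' 1_E\rangle = \int_E \langle f, x'\rangle\,d\mu, \quad E \in F,\ x' \in X'. \]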
We denote Tf′1E by $\int_E f\,d\mu$. Thus $\int_E f\,d\mu$ is the unique member of X″ satisfying
\[ \Big\langle x', \int_E f\,d\mu \Big\rangle = \int_E \langle f(x), x'\rangle\,d\mu(x) \quad \forall\, E \in F \text{ and } x' \in X'. \tag{14.7} \]
The vector $\int_E f\,d\mu$ is called the Dunford integral of f over E. If $\int_E f\,d\mu \in X\ (= \widehat{X})$ for
all E, then f is said to be Pettis integrable and $\int_E f\,d\mu$ is called the Pettis integral of
f over E. In this case, (14.7) may be written
\[ \Big\langle \int_E f\,d\mu,\ x' \Big\rangle = \int_E \langle f(x), x'\rangle\,d\mu(x) \quad \forall\, E \in F \text{ and } x' \in X'. \tag{14.8} \]

If X is reflexive, then the Dunford and Pettis integrals clearly coincide. The following
example shows this is not necessarily the case for nonreflexive spaces.

14.6.1 Example. Let µ be counting measure on N and let a = (an) ∈ ℓ∞(N). Define
f : N → c0 by f(n) = an en. For any x = (xn) ∈ ℓ1(N) = c0′,
\[ \langle f(n), x\rangle = \sum_{j=1}^{\infty} a_j x_j e_n(j) = a_n x_n, \]
hence
\[ \int_N \langle f(n), x\rangle\,d\mu(n) = \sum_{n=1}^{\infty} a_n x_n = \langle x, a\rangle. \]
Therefore, f is Dunford integrable with $\int_N f\,d\mu = a \in \ell^\infty(N) = c_0''$, and f is Pettis integrable
iff a ∈ c0. ♦
The next theorem gives a simple sufficient condition for Pettis integrability.

14.6.2 Theorem. If X is a separable Banach space and f : X → X is weakly measurable


with kf (·)k ∈ L1 (µ), then f is Pettis integrable.
Proof. The condition ‖f(·)‖ ∈ L1(µ) implies that f is weakly integrable, hence (14.7) holds.
It remains to show that $x'' := \int_E f\,d\mu \in X$. For this it suffices by 10.2.9 to show that x″ is
weak∗ continuous on C1′.
By the separability of X, C1′ is metrizable in the weak∗ topology. Let x′n → x′ in
the weak∗ topology of C1′. Then ⟨f(x), x′n⟩ → ⟨f(x), x′⟩ for each x ∈ X, and since
|⟨f(x), x′n⟩| ≤ ‖f(x)‖ and ‖f(·)‖ is integrable, the dominated convergence theorem implies
that
\[ \langle x_n', x''\rangle = \Big\langle x_n', \int_E f\,d\mu \Big\rangle = \int_E \langle f(x), x_n'\rangle\,d\mu(x) \to \int_E \langle f(x), x'\rangle\,d\mu(x) = \langle x', x''\rangle, \]

as required.
The question of countable additivity of the Dunford and Pettis integrals is of critical
importance in applications. The Dunford integral is countably additive in the weak∗ sense
but not in the norm sense. To see the former, let (En) be a disjoint sequence in F and set
$F_n = \bigcup_{j=1}^{n} E_j$ and $E = \bigcup_n E_n$. Then for all x′,
\[ \lim_n \Big\langle x', \int_{F_n} f\,d\mu \Big\rangle = \lim_n \int_{F_n} \langle f(x), x'\rangle\,d\mu(x) = \int_E \langle f(x), x'\rangle\,d\mu(x) = \Big\langle x', \int_E f\,d\mu \Big\rangle, \]
which implies that
\[ \int_E f\,d\mu = w^*\text{-}\sum_{n=1}^{\infty} \int_{E_n} f\,d\mu. \]

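On the other hand, taking a = (1, 1, . . . ) in Example 14.6.1 we have for any n
\[ \Big\langle e_n, \int_N f\,d\mu \Big\rangle = \int_N \langle e_n, f(k)\rangle\,d\mu(k) = \int_N \langle e_n, e_k\rangle\,d\mu(k) = \sum_k \langle e_n, e_k\rangle = 1, \]
and for m < n
\[ \Big\langle e_n, \sum_{j=1}^{m} \int_{\{j\}} f\,d\mu \Big\rangle = \sum_{j=1}^{m} \int_{\{j\}} \langle e_n, e_k\rangle\,d\mu(k) = \sum_{j=1}^{m} \langle e_n, e_j\rangle = 0, \]
hence for all m
\[ \Big\| \int_N f\,d\mu - \sum_{j=1}^{m} \int_{\{j\}} f\,d\mu \Big\| \ge \sup_n \Big| \Big\langle e_n, \int_N f\,d\mu - \sum_{j=1}^{m} \int_{\{j\}} f\,d\mu \Big\rangle \Big| \ge 1. \]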

Therefore, the integral is not countably additive in the norm sense.


An argument similar to the above shows that the Pettis integral is countably additive in
the weak sense:
\[ \int_E f\,d\mu = w\text{-}\sum_{n=1}^{\infty} \int_{E_n} f\,d\mu. \]

It is a remarkable fact that, in contrast to the Dunford integral, the Pettis integral is also
countably additive in the norm sense. This may be seen as a consequence of the Orlicz-Pettis
theorem regarding weak subseries-convergence of sequences. For details the reader is referred
to [12] or [45].

Weak Integrals in Locally Convex Spaces


Weak integrals may be defined for a LCS; however, for this we require a Radon measure.
14.6.3 Theorem. Let X be a LCS, X a locally compact Hausdorff topological space, and µ
a Radon measure on X. If f : X → X is continuous with compact support Kf and cl co f(X)
is compact,¹ then there exists a unique member $\int_X f\,d\mu \in \operatorname{cl\,co} f(X)$ such that (14.3) holds.
Proof. By restricting the field of scalars to R and taking real parts of members of X 0 , we
may regard X as a real linear space. Moreover, by a scalar adjustment, we may suppose
that µ(Kf) = 1. Given a finite subset F = {x′1, . . . , x′n} of X′, let QF denote the closed set
of all x ∈ cl co f(X) with the property
\[ \langle x, x'\rangle = \int_{K_f} \langle f, x'\rangle\,d\mu \quad\text{for all } x' \in F. \tag{$\dagger$} \]
If each QF is nonempty, then by compactness ⋂F QF ≠ ∅, hence the unique member of this
set satisfies the requirements of the theorem. Set
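\[ a := (a_1, \ldots, a_n), \quad\text{where } a_j = \int_X \langle f, x_j'\rangle\,d\mu. \]
Then QF will be nonempty if
\[ a \in \operatorname{cl\,co}\big\{ \big(\langle f(x), x_1'\rangle, \ldots, \langle f(x), x_n'\rangle\big) : x \in X \big\}. \tag{$\ddagger$} \]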
Indeed, if (‡) holds, then a is the limit of a net of convex combinations of the form
\[ \sum_{k=1}^{m} t_k \big(\langle f(x_k), x_1'\rangle, \ldots, \langle f(x_k), x_n'\rangle\big) = \big(\langle y, x_1'\rangle, \ldots, \langle y, x_n'\rangle\big), \quad y := \sum_{k=1}^{m} t_k f(x_k), \]
hence the limit x of any convergent subnet of the corresponding y’s in co f(X) satisfies
(⟨x, x′1⟩, . . . , ⟨x, x′n⟩) = a, which is (†). Now, if (‡) does not hold, then, by the separation
theorem in Rn, there exists a vector b = (b1, . . . , bn) ∈ Rn such that
\[ \big(\langle f(x), x_1'\rangle, \ldots, \langle f(x), x_n'\rangle\big) \cdot b < a \cdot b \quad \forall\, x \in X. \]
The integral of the left side over Kf is just a · b. Recalling that µ(Kf) = 1, this is also the
integral of the right side. Since Kf is compact, integration preserves the strict inequality
and produces the contradiction a · b < a · b.
1 This is automatically the case if X is a Fréchet space (14.3.4).

14.6.4 Corollary. Let X be a LCS, X a compact Hausdorff topological space, and P(X)
the set of all Radon probability measures µ on X. If f : X → X is continuous and cl co f(X)
is compact, then the integral $\int_X f\,d\mu$ exists in X. Moreover, the mapping $\mu \mapsto \int_X f\,d\mu$ from
P(X) to X is w∗-w continuous.

The Bochner Integral


So far we have considered only weak integrals, that is, integrals defined in terms of
continuous linear functionals. In a Banach space, it is possible to construct a stronger form
of vector integral using limits of vector-valued simple functions in much the same way as
scalar integrals are constructed.
Let X be a Banach space and (X, F, µ) a measure space. An X-valued simple function
is a function f : X → X with finite range such that the set {f = x} is measurable for all
x ∈ X. If the distinct nonzero values of f are xk (1 ≤ k ≤ n), then f may be expressed in
standard form as
\[ f = \sum_{k=1}^{n} 1_{E_k} x_k, \quad E_k := \{f = x_k\}. \]

We say that f is Bochner integrable if µ(Ek ) < ∞ for all k. In this case we define the
Bochner integral of f by
\[ \int_X f\,d\mu = \sum_{k=1}^{n} \mu(E_k)\,x_k. \]
Note that
\[ \Big\| \int_X f\,d\mu \Big\| \le \sum_{k=1}^{n} \mu(E_k)\,\|x_k\| = \int_X \|f\|\,d\mu. \tag{14.9} \]
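A small worked instance may help fix the definition. The setting below is an illustrative assumption, not taken from the text: the measure space is [0, 1] with Lebesgue measure λ, the Banach space is R² with the Euclidean norm, and f takes the two values (1, 0) and (0, 2).

```latex
\begin{align*}
f &= 1_{[0,1/2)}\,(1,0) + 1_{[1/2,1]}\,(0,2),\\
\int_{[0,1]} f\,d\lambda
  &= \tfrac12\,(1,0) + \tfrac12\,(0,2) = \bigl(\tfrac12,\,1\bigr),\\
\Bigl\|\int_{[0,1]} f\,d\lambda\Bigr\|
  &= \sqrt{\tfrac14 + 1} = \tfrac{\sqrt5}{2}
   \;\le\; \tfrac12\cdot 1 + \tfrac12\cdot 2 = \tfrac32
   = \int_{[0,1]} \|f\|\,d\lambda .
\end{align*}
```

The inequality is strict here because the two values of f point in different directions, so the triangle inequality behind (14.9) is not an equality.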

An argument entirely similar to that of 3.1.1(b) shows that for Bochner integrable simple
functions f and g and scalars c,
\[ \int_X (f+g)\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu \quad\text{and}\quad \int_X cf\,d\mu = c\int_X f\,d\mu. \]

Thus the integral is linear on the vector space of all Bochner integrable simple functions.
A function f : X → X is strongly measurable if there exists a sequence of simple
functions fn : X → X such that limn ‖fn(x) − f(x)‖ = 0 for µ-a.a. x ∈ X. In this case we
write fn → f a.e. It is easy to check that the set of strongly measurable functions is a linear
space under pointwise operations. Moreover, since ‖fn‖ is measurable and ‖fn(x)‖ → ‖f(x)‖
a.e., we see that the norm of a strongly measurable function is measurable.
A strongly measurable function f : X → X is said to be Bochner integrable if there
exists a sequence of Bochner integrable simple functions fn such that
\[ f_n \to f \ \text{a.e.} \quad\text{and}\quad \lim_n \int_X \|f_n - f\|\,d\mu = 0. \]
In this case the Bochner integral of f is defined by
\[ \int_X f\,d\mu = \lim_n \int_X f_n\,d\mu. \tag{14.10} \]

We shall call the sequence of simple functions (fn ) in this definition a defining sequence
for the integral of f . To see that the limit in (14.10) exists, note that by (14.9)
\[ \Big\| \int_X f_m\,d\mu - \int_X f_n\,d\mu \Big\| \le \int_X \|f_m - f_n\|\,d\mu \le \int_X \|f_m - f\|\,d\mu + \int_X \|f_n - f\|\,d\mu, \]
hence $\big(\int_X f_n\,d\mu\big)$ is a Cauchy sequence and so converges. To see that the limit in (14.10) is
independent of the defining sequence (fn), let (gn) be another such sequence. Then
\[ \Big\| \int_X f_n\,d\mu - \int_X g_n\,d\mu \Big\| \le \int_X \|f_n - f\|\,d\mu + \int_X \|g_n - f\|\,d\mu \to 0. \]

We say that f is Bochner integrable over E ∈ F, if f 1E is Bochner integrable.


14.6.5 Proposition. The set of Bochner integrable functions is a linear space under
pointwise operations, and the Bochner integral is linear. Moreover, if f is Bochner integrable,
then it is integrable over every E ∈ F.
Proof. Let f and g be Bochner integrable with defining sequences (fn) and (gn), respectively.
Then fn + gn → f + g a.e. and $\lim_n \int_X \|f_n + g_n - (f+g)\|\,d\mu = 0$, hence f + g is Bochner
integrable. Since the integral is linear on simple functions,
\[ \int_X (f+g)\,d\mu = \lim_n \int_X (f_n + g_n)\,d\mu = \lim_n \int_X f_n\,d\mu + \lim_n \int_X g_n\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu, \]
and
\[ \int_X cf\,d\mu = \lim_n \int_X cf_n\,d\mu = \lim_n c\int_X f_n\,d\mu = c\int_X f\,d\mu. \]
The last assertion of the proposition follows from the obvious fact that (fn1E) is a defining
sequence for $\int_E f\,d\mu$.
Here is a useful characterization of Bochner integrability:
14.6.6 Theorem. Let f : X → X be strongly measurable. Then f is Bochner integrable iff
‖f(·)‖ is integrable. In this case, $\big\|\int_X f\,d\mu\big\| \le \int_X \|f\|\,d\mu$.
Proof. If f is Bochner integrable and (fn) is a defining sequence for $\int_X f\,d\mu$, then, by definition,
‖fn − f‖ is integrable. Since ‖f‖ ≤ ‖fn − f‖ + ‖fn‖, ‖f‖ is integrable.
Conversely, assume that ‖f‖ is integrable. Choose a sequence of simple functions fn
with ‖fn(x) − f(x)‖ → 0 µ-a.e. Set En = {x : ‖fn(x)‖ ≤ 2‖f(x)‖} and gn := fn1En.
Then gn is simple. If f(x) ≠ 0, then, because ‖fn(x)‖ → ‖f(x)‖, x ∈ En for all sufficiently
large n; if f(x) = 0, then gn(x) = 0 for every n. In either case ‖gn(x) − f(x)‖ → 0 a.e.
Since ‖gn(x) − f(x)‖ ≤ 3‖f(x)‖, $\int \|g_n - f\|\,d\mu \to 0$ by the
dominated convergence theorem. Therefore, f is Bochner integrable, $\int g_n\,d\mu \to \int f\,d\mu$, and
$\int \|g_n\|\,d\mu \to \int \|f\|\,d\mu$, the last limit by the dominated convergence theorem. Finally, taking
limits in $\big\|\int_X g_n\,d\mu\big\| \le \int_X \|g_n\|\,d\mu$, we obtain the desired inequality.
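The domination bound used in the proof can be checked by splitting into the two cases x ∈ En and x ∉ En. The sketch below uses only the definitions of En and gn given above.

```latex
\begin{align*}
x \in E_n:&\quad \|g_n(x) - f(x)\| = \|f_n(x) - f(x)\|
            \le \|f_n(x)\| + \|f(x)\| \le 2\|f(x)\| + \|f(x)\| = 3\|f(x)\|,\\
x \notin E_n:&\quad g_n(x) = 0, \quad
            \|g_n(x) - f(x)\| = \|f(x)\| \le 3\|f(x)\|,\\
\text{in both cases}:&\quad \|g_n(x)\| \le 2\|f(x)\| .
\end{align*}
```

The last line also shows that each nonzero value of gn is taken on a set of finite measure, so gn is a Bochner integrable simple function.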
Next, we prove a dominated convergence theorem for the Bochner integral. For this we
need the following lemma.
14.6.7 Lemma. Suppose that (X, F, µ) is σ-finite. If fn : X → X is strongly measurable
for all n and limn kfn (x) − f (x)k = 0 a.e., then f is strongly measurable.
Proof. Since (X, F, µ) is σ-finite, there exists a positive, integrable function ψ on X (Ex. 3.25).
Since kfn − fm k is measurable, kfn − f k = limm kfn − fm k is measurable. For measurable
functions g and h, define
\[ d(g,h) = \int \frac{\|g(x) - h(x)\|}{1 + \|g(x) - h(x)\|}\,\psi\,d\mu. \]
Since kf (x) − fn (x)k → 0 a.e., d(f, fn ) → 0 by the dominated convergence theorem. For
each n, let (gn,k )k be a sequence of simple functions converging a.e. in norm to fn . Then
limk d(fn , gn,k ) = 0. Since d(f, gn,k ) ≤ d(f, fn ) + d(fn , gn,k ), we may choose a sequence of
simple functions hn such that d(f, hn ) → 0. Passing to a subsequence if necessary, we may
assume (since ψ is positive) that khn − f k → 0 a.e. Therefore, f is strongly measurable.

Here is the promised convergence theorem.


14.6.8 Theorem. Let (X, F, µ) be σ-finite and let (fn : X → X) be a sequence of Bochner
integrable functions such that fn → f a.e. Suppose there exists g ∈ L1(µ) such that ‖fn‖ ≤ g
a.e. for each n. Then f is Bochner integrable and
\[ \lim_n \int_X f_n\,d\mu = \int_X f\,d\mu. \]

Proof. By the lemma, f is strongly measurable. Moreover, ‖f‖ ≤ g a.e., hence ‖f‖ is
integrable and so f is Bochner integrable (14.6.6). Moreover, since ‖f − fn‖ → 0 a.e. and
‖f − fn‖ ≤ 2g, the dominated convergence theorem implies that $\int \|f - f_n\|\,d\mu \to 0$. The
desired conclusion now follows from the inequality $\big\|\int_X (f - f_n)\,d\mu\big\| \le \int_X \|f - f_n\|\,d\mu$.
14.6.9 Theorem. Let T ∈ B(X). If f is Bochner integrable, then so is Tf and
\[ \int_X Tf\,d\mu = T\int_X f\,d\mu. \]
Proof. Let g be a Bochner integrable simple function, say $g = \sum_{k=1}^{n} x_k 1_{E_k}$, µ(Ek) < ∞.
Then $Tg = \sum_{k=1}^{n} (Tx_k) 1_{E_k}$, hence Tg is a Bochner integrable simple function and
\[ \int_X Tg\,d\mu = \sum_{k=1}^{n} (Tx_k)\,\mu(E_k) = T\sum_{k=1}^{n} x_k\,\mu(E_k) = T\int_X g\,d\mu. \]

Now let (fn ) be a defining sequence of simple functions for the integral of f . From
kT fn (x) − T f (x)k ≤ kT k kfn (x) − f (x)k, we see that (T fn ) is a defining sequence of simple
functions for the integral of T f . Therefore,
\[ \int_X Tf\,d\mu = \lim_n \int_X Tf_n\,d\mu = \lim_n T\int_X f_n\,d\mu = T\lim_n \int_X f_n\,d\mu = T\int_X f\,d\mu. \]

Here is the connection between the Pettis integral and the Bochner integral.
14.6.10 Proposition. If f : X → X is Bochner integrable, then f is Pettis integrable and
the Bochner and Pettis integrals coincide.
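Proof. Let g be a Bochner integrable simple function, say $g = \sum_{k=1}^{m} x_k 1_{E_k}$, µ(Ek) < ∞.
Then for any x′ ∈ X′,
\[ \Big\langle \int_X g\,d\mu,\ x' \Big\rangle = \Big\langle \sum_{k=1}^{m} \mu(E_k)x_k,\ x' \Big\rangle = \sum_{k=1}^{m} \mu(E_k)\langle x_k, x'\rangle = \int_X \sum_{k=1}^{m} \langle x_k, x'\rangle 1_{E_k}\,d\mu = \int_X \Big\langle \sum_{k=1}^{m} 1_{E_k} x_k,\ x' \Big\rangle\,d\mu = \int_X \langle g, x'\rangle\,d\mu. \tag{$\dagger$} \]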

Now let (gn ) be the defining sequence for the Bochner integral of f constructed in the
proof of 14.6.6. For any x′ ∈ X′, (x′ ◦ f)(x) = limn (x′ ◦ gn)(x) a.e. and |(x′ ◦ gn)(x)| ≤
‖x′‖ ‖gn(x)‖ ≤ 2‖x′‖ ‖f(x)‖, hence by the dominated convergence theorem and (†)
\[ \Big\langle \int_X f\,d\mu,\ x' \Big\rangle = \lim_n \Big\langle \int_X g_n\,d\mu,\ x' \Big\rangle = \lim_n \int_X \langle g_n, x'\rangle\,d\mu = \int_X \langle f, x'\rangle\,d\mu. \]
Replacing f by f1E shows that f is Pettis integrable and that the Bochner integral $\int_E f\,d\mu$
is the same as the Pettis integral.
Note that the function f in Example 14.6.1 is Bochner integrable iff a ∈ ℓ1(N) (14.6.6).
Choosing a ∈ c0 \ ℓ1(N) produces an example of a Pettis integrable function that is not
Bochner integrable.
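For a concrete instance of this remark, one may take the harmonic sequence; the choice a_n = 1/n is an illustrative assumption, and any member of c0 \ ℓ1(N) would serve equally well.

```latex
\begin{align*}
a &= \bigl(1, \tfrac12, \tfrac13, \ldots\bigr) \in c_0 \setminus \ell^1(\mathbb{N}),
   \qquad f(n) = \tfrac1n\,e_n,\\
\int_{\mathbb{N}} \|f(n)\|\,d\mu(n)
  &= \sum_{n=1}^{\infty} \tfrac1n\,\|e_n\|_\infty
   = \sum_{n=1}^{\infty} \tfrac1n = \infty,\\
\int_{\mathbb{N}} f\,d\mu &= a \in c_0 .
\end{align*}
```

By 14.6.6 the divergence of the norm integral rules out Bochner integrability, while Example 14.6.1 shows that the Pettis integral exists and equals a.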

14.7 Choquet’s Theorem


Let K be a nonempty, compact, convex subset of a real LCS X. Taking X = cl ex K and
f (x) = x in 14.6.4 and noting by the Krein-Milman theorem that cl co X = K, we have
\[ \int_X x\,d\mu(x) \in K, \quad \mu \in P(X). \]
A vector x0 ∈ K is said to be represented by µ ∈ P(X) if
\[ x_0 = \int_{X = \operatorname{cl\,ex} K} x\,d\mu(x). \tag{14.11} \]
Every point of K may be so represented. Indeed, since the mapping $F(\mu) := \int_X x\,d\mu(x)$ on
P(X) is affine, F(co δX) = co X, and since F is w∗-w continuous, it follows from 14.4.2(f)
and the Krein-Milman theorem that F(P(X)) = K. Choquet’s theorem asserts that if K is
metrizable, then the measure µ in (14.11) may be taken to have support in ex K rather than
simply in cl ex K.
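The simplest illustration of a representing measure is an interval on the real line. The example below is an assumption chosen for illustration (K = [−1, 1] ⊂ R, so that ex K = {−1, 1} is already closed), not one taken from the text.

```latex
\begin{align*}
K &= [-1,1], \qquad \operatorname{ex} K = \{-1,\,1\}, \qquad x_0 \in K,\\
\mu &:= \frac{1+x_0}{2}\,\delta_{1} + \frac{1-x_0}{2}\,\delta_{-1},
   \qquad \frac{1+x_0}{2},\ \frac{1-x_0}{2} \ge 0,
   \quad \frac{1+x_0}{2} + \frac{1-x_0}{2} = 1,\\
\int_{\operatorname{ex} K} x\,d\mu(x)
  &= \frac{1+x_0}{2}\cdot 1 + \frac{1-x_0}{2}\cdot(-1) = x_0 .
\end{align*}
```

Here µ is a probability measure carried by the two extreme points, and its barycenter is the prescribed point x0, which is exactly the representation (14.11).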
For the remainder of the section, K is assumed to be metrizable with metric d. For the
proof of Choquet’s theorem, we need the following lemmas.
14.7.1 Lemma. ex K is a Borel subset of K.
Proof. Let x ∈ K. Then x is not an extreme point of K iff there exist distinct points y
and z in K such that x = ½(y + z) (14.4.1). Therefore, the complement of ex K in K is
the Borel set ⋃n Fn, where Fn = {½(y + z) : y, z ∈ K, d(y, z) ≥ 1/n}. (Each Fn is the
continuous image of a closed subset of the compact set K × K and is therefore compact.)
14.7.2 Lemma. Let A(K, R) denote the set of real-valued, continuous, affine functions on K. For
f ∈ C(K, R), define a function c(f ) : K → R by

c(f )(x) := inf{h(x) : h ∈ A(K, R) and f ≤ h}, x ∈ K.

Then the following properties hold:


(a) f ≤ c(f) ≤ ‖f‖∞.

(b) c(f ) is concave and Borel measurable.

(c) If f is concave, then f = c(f ).

(d) c(f + g) ≤ c(f ) + c(g) and c(tf ) = tc(f ) (t > 0).

(e) c(f + h) = c(f ) + h if h ∈ A(K, R).

(f) c(f − g) ≤ ‖f − g‖∞.

Proof. (a) The first inequality is clear, and the second follows from the fact that the constant
function h(x) := ‖f‖∞ is affine.
(b) Let 0 < t < 1. If hj ∈ A(K, R) and hj ≥ fj, then th1 + (1 − t)h2 ∈ A(K, R) and
tf1 + (1 − t)f2 ≤ th1 + (1 − t)h2, hence
\[ c\big(tf_1 + (1-t)f_2\big)(x) \le th_1(x) + (1-t)h_2(x) \quad\text{for all } x \in K. \]
Taking infima over h1 and h2 yields
\[ c\big(tf_1 + (1-t)f_2\big)(x) \le t\,c(f_1)(x) + (1-t)\,c(f_2)(x). \]

For the second part of (b), note that if c(f )(x) < a, then h(x) < a for some h ∈ A(K, R)
with h ≥ f . Since h < a on some neighborhood U of x, the inequality c(f ) < a holds on U .
Thus {c(f ) < a} is open, hence measurable.
(c) Assume for a contradiction that $f(y) < c(f)(y)$ for some $y$. Since $f$ is continuous and concave, $C := \{(x, t) : t \le f(x)\}$ is a closed convex subset of the real LCS $X \times \mathbb{R}$. Since $\big(y, c(f)(y)\big) \notin C$, by the separation theorem there exist an $a \in \mathbb{R}$ and a continuous linear functional $F$ on $X \times \mathbb{R}$ such that

$$F(x, t) \le a < F\big(y, c(f)(y)\big) \quad \forall\, (x, t) \in C.$$

In particular, $F(y, f(y)) < F\big(y, c(f)(y)\big)$, and by subtracting and normalizing we see that

$$F(0, 1) = \frac{F\big(y, c(f)(y)\big) - F(y, f(y))}{c(f)(y) - f(y)} > 0.$$

Now define

$$h(x) := \frac{a - F(x, 0)}{F(0, 1)}, \quad x \in K.$$

Then

$$F\big(x, h(x)\big) = F\big(0, h(x)\big) + F(x, 0) = h(x)F(0, 1) + F(x, 0) = a.$$

If also $F(x, t) = a$, then $a - F(x, 0) = tF(0, 1)$, and dividing by $F(0, 1)$ shows that $t = h(x)$. Thus $h(x)$ is the unique real number satisfying $F(x, h(x)) = a$. It follows that $h \in A(K, \mathbb{R})$. Since for all $x \in K$,

$$a \ge F\big(x, f(x)\big) = F(x, 0) + f(x)F(0, 1) = a - h(x)F(0, 1) + f(x)F(0, 1),$$

we see that $f \le h$. Therefore, $c(f)(y) \le h(y)$. On the other hand,

$$F(y, 0) + h(y)F(0, 1) = a < F\big(y, c(f)(y)\big) = F(y, 0) + c(f)(y)F(0, 1),$$

hence $h(y) < c(f)(y)$. With this contradiction we see that (c) holds.
(d) The first inequality is proved by considering affine functions h and k majorizing f and
g, respectively, and noting that h + k is affine. The second follows from the fact that h is
affine iff th is affine.
(e) By (a) and (d) it suffices to show c(f ) + h ≤ c(f + h). But if k is affine and k ≥ f + h
then k − h is affine and majorizes f , so k − h ≥ c(f ), or k ≥ c(f ) + h. Taking infima over k
gives the desired inequality.
(f) By part (d), $c(f) = c(f - g + g) \le c(f - g) + c(g)$, hence $c(f) - c(g) \le c(f - g) \le \|f - g\|_\infty$, the last inequality by (a).
We may now prove
14.7.3 Theorem (Choquet). Let $X$ be a real LCS and $K$ a nonempty, compact, convex, metrizable subset of $X$. Then each $x_0 \in K$ is represented by a Radon probability measure $\mu$ on $\operatorname{cl}\operatorname{ex} K$ supported by $\operatorname{ex} K$.

Proof. Since $K$ is metrizable, $A(K, \mathbb{R})$ is separable. Let $(h_n)$ be a dense sequence in the unit sphere of $A(K, \mathbb{R})$ and set $g = \sum_n h_n^2/2^n$. We claim that $g$ is strictly convex on $K$. To see this, let $x \ne y \in K$ and $0 < t < 1$. Since $X'$ separates points of $X$, affine functions separate points of $K$, hence there exists an $h_n$ such that $h_n(x) \ne h_n(y)$. Since the function $x \mapsto x^2$ is strictly convex,

$$h_n^2\big(tx + (1 - t)y\big) = \big(th_n(x) + (1 - t)h_n(y)\big)^2 < th_n^2(x) + (1 - t)h_n^2(y).$$

Since weak inequality holds for the remaining functions $h_k$ in the definition of $g$, it follows that $g(tx + (1 - t)y) < tg(x) + (1 - t)g(y)$, verifying the claim.
Fix $x_0 \in K$ and define a functional $p$ on $C(K, \mathbb{R})$ by $p(f) = c(f)(x_0)$, where $c(f)$ is the function in 14.7.2. From 14.7.2(d), $p$ is subadditive and positively homogeneous. Define a linear functional $\phi$ on the subspace $B := A(K, \mathbb{R}) + \mathbb{R}g$ of $C(K, \mathbb{R})$ by $\phi(h + rg) = h(x_0) + r\,c(g)(x_0)$. In particular, $\phi(h) = h(x_0)$, $\phi(g) = c(g)(x_0)$, and $\phi(1) = 1$. We claim that $\phi \le p$ on $B$, that is,

$$h(x_0) + r\,c(g)(x_0) \le c(h + rg)(x_0) \quad \forall\, r \in \mathbb{R} \text{ and } h \in A(K, \mathbb{R}).$$

Indeed, if $r \ge 0$, then $c(h + rg) = h + r\,c(g)$ by 14.7.2(d,e), and if $r < 0$, then $rg$ is concave, hence $c(h + rg) = h + rg \ge h + r\,c(g)$ by 14.7.2(a,c), verifying the claim. By the Hahn-Banach theorem, $\phi$ extends to a linear functional $\mu$ on $C(K, \mathbb{R})$ such that $\mu \le p$ on $C(K, \mathbb{R})$. Noting that $f \le 0 \Rightarrow \mu(f) \le p(f) = c(f)(x_0) \le 0$, we see that $\mu$ is a positive linear functional on $C(K, \mathbb{R})$. Since $\mu(1) = \phi(1) = 1$, we may identify $\mu$ with a Radon probability measure on $K$. Since $h(x_0) = \phi(h) = \mu(h)$ for all $h \in A(K, \mathbb{R})$, $\mu$ represents $x_0$. It remains to show that $\operatorname{supp} \mu \subseteq \operatorname{ex} K$.
We claim that $\mu(g) = \mu\big(c(g)\big)$ (recalling that $c(g)$ is a bounded Borel function). Indeed, by 14.7.2(a), $\mu(g) \le \mu\big(c(g)\big)$. For the reverse inequality, let $h \in A(K, \mathbb{R})$ and $h \ge g$. Then $h \ge c(g)$, hence $h(x_0) = \phi(h) = \mu(h) \ge \mu\big(c(g)\big)$. Taking the infimum over all such $h$ yields $\mu(g) = \phi(g) = c(g)(x_0) \ge \mu\big(c(g)\big)$.
We now have $\int [c(g) - g]\, d\mu = 0$. Since $c(g) - g \ge 0$ (14.7.2(a)), $\mu\{g < c(g)\} = 0$. To complete the proof it therefore suffices to show that $K \setminus \operatorname{ex} K \subseteq \{g < c(g)\}$. But if $x \in K \setminus \operatorname{ex} K$ and $y \ne z \in K$ with $x = \tfrac12(y + z)$, then, by the strict convexity of $g$ and the concavity of $c(g)$, $g(x) < \tfrac12 g(y) + \tfrac12 g(z) \le \tfrac12 c(g)(y) + \tfrac12 c(g)(z) \le c(g)(x)$, as required.
The proof of Choquet’s Theorem given above is due to Bonsall. This, as well as a proof
of the more general Choquet-Bishop-deLeeuw theorem (where the metrizability hypothesis
on K is removed), may be found in [37], which contains many related results. Here is one of
particular interest:
14.7.4 Corollary (Rainwater). Let $X$ be a separable normed space, $x \in X$, and $(x_n)$ a bounded sequence in $X$. If $\lim_n \langle x_n, x' \rangle = \langle x, x' \rangle$ for every extreme point $x'$ of the closed unit ball $C_1'$ of $X'$, then $x_n \xrightarrow{w} x$.
Proof. We may assume that $X$ is a real normed space. It suffices to show that $\lim_n \langle x_n, y' \rangle = \langle x, y' \rangle$ for $y' \in C_1'$. Since $C_1'$ is compact, convex, and metrizable in the weak$^*$ topology, $y'$ may be represented by a probability measure $\mu$ on $C_1'$ supported by the extreme points:

$$y' = \int_{\operatorname{ex} C_1'} x' \, d\mu(x').$$

By the dominated convergence theorem,

$$\langle x_n, y' \rangle = \int_{\operatorname{ex} C_1'} \widehat{x_n}(x') \, d\mu(x') \to \int_{\operatorname{ex} C_1'} \widehat{x}(x') \, d\mu(x') = \langle x, y' \rangle.$$
Part III

Applications
Chapter 15
Distributions

Spaces of distributions are the duals of spaces of C ∞ functions on open subsets of Rd . The
operations of differentiation, convolution, and Fourier transform of functions may be extended
by duality to distributions, opening up the possibility of finding non-differentiable solutions,
so-called weak solutions, of differential equations that may not have smooth solutions. For
example, consider the partial differential equation

$$\sum_{\alpha \in S} \psi_\alpha(x)\, \partial^\alpha f(x) = g(x)$$

on some open set $U \subseteq \mathbb{R}^d$, where $S$ is a finite set of multi-indices, $\psi_\alpha \in C^\infty(U)$, and $g$ is locally integrable. Multiplying the equation by a function $\varphi$ in $C_c^\infty(U)$ and integrating by parts over $\mathbb{R}^d$ yields

$$\int_{\mathbb{R}^d} f \sum_{\alpha \in S} (-1)^{|\alpha|}\, \partial^\alpha(\psi_\alpha \varphi)\, d\lambda^d = \int_{\mathbb{R}^d} g\varphi \, d\lambda^d.$$

There are no boundary terms here because $\varphi$ has compact support. Functions $f$ that satisfy the last equation for every $\varphi \in C_c^\infty(U)$ are called weak solutions of the original PDE. There is no reason to assume that these solutions must be smooth.
It is beyond the scope of the text to delve into the distributional theory of PDEs. Our
goal here is merely to define the main distribution spaces, describe their functional analytic
properties, and discuss the standard operations on distributions. We do, however, give a
simple application to PDEs in §15.6.

15.1 General Theory


The Fréchet Space $C_K^\infty(U)$

Let $U$ be an open subset of $\mathbb{R}^d$. For a compact subset $K$ of $U$, let

$$C_K^\infty(U) := \{\varphi \in C^\infty(U) : \operatorname{supp} \varphi \subseteq K\}.$$

Then $C_K^\infty(U)$ is a Fréchet space with respect to the countable family of norms $\varphi \mapsto \|\partial^\alpha \varphi\|_\infty$, where $\alpha = (\alpha_1, \ldots, \alpha_d)$ is a multi-index. The verification that $C_K^\infty(U)$ is complete is entirely similar to the argument in 9.1.6(c) regarding the completeness of the Fréchet space $C^\infty(U)$.

We denote the topology of $C_K^\infty(U)$ by $\tau_K$. The following proposition is immediate from 9.2.4.

15.1.1 Proposition. Let $F$ be a linear functional on $C_K^\infty(U)$. Then $F$ is $\tau_K$-continuous iff there exist $m \in \mathbb{Z}^+$ and $M > 0$ such that $|\langle \varphi, F \rangle| \le M p_m(\varphi)$ for all $\varphi \in C_K^\infty(U)$, where

$$p_m(\varphi) := \max\{\|\partial^\alpha \varphi\|_\infty : |\alpha| \le m\}.$$

The Spaces $D(U)$ and $D'(U)$

Define an increasing sequence of compact subsets of $U$ as in 9.1.6(a) by

$$K_n := \{x \in \mathbb{R}^d : |x| \le n,\ d(x, U^c) \ge 1/n\}.$$

Clearly, $K_n \uparrow U$ and $C_c^\infty(U)$ is the union of the spaces $C_{K_n}^\infty(U)$. Since every compact subset $K$ of $U$ is contained in some $K_n$, the following result is an immediate consequence of 9.4.6 and 9.4.7.¹
15.1.2 Theorem. There exists a LCS topology $\tau$ on $C_c^\infty(U)$ such that the following hold:
(a) A sequence $(\varphi_n)$ $\tau$-converges to $\varphi$ in $C_c^\infty(U)$ iff there exists a compact set $K$ such that $(\varphi_n) \subseteq C_K^\infty(U)$ and $\partial^\alpha \varphi_n \to \partial^\alpha \varphi$ uniformly on $U$ for all multi-indices $\alpha$.
(b) If $T$ is a linear mapping from $C_c^\infty(U)$ to a LCS $X$, then $T$ is $\tau$-continuous iff for each compact subset $K$ of $U$ the restriction of $T$ to $C_K^\infty(U)$ is $\tau_K$-continuous, that is, iff $T\varphi_n \to T\varphi$ in $X$ whenever $(\varphi_n) \subseteq C_K^\infty(U)$ and $\partial^\alpha \varphi_n \to \partial^\alpha \varphi$ uniformly on $U$ for all multi-indices $\alpha$.
The space $C_c^\infty(U)$ with the topology $\tau$ is denoted by $D(U)$ and is called the space of test functions. A member of the dual $D'(U)$ of $D(U)$ is called a distribution or generalized function on $U$. It is customary in this setting to reverse the duality notation $\langle \varphi, F \rangle = F(\varphi)$ and write instead

$$\langle F, \varphi \rangle = F(\varphi), \quad F \in D'(U),\ \varphi \in D(U).$$

This convention frequently renders a distribution formula into a more readable and computationally convenient form. The following proposition follows directly from 15.1.1 and (b) of 15.1.2.
15.1.3 Proposition. Let $F$ be a linear functional on $D(U)$. Then $F \in D'(U)$ iff for each compact $K \subseteq U$ there exist $m \in \mathbb{Z}^+$ and $M > 0$ such that $|\langle F, \varphi \rangle| \le M p_m(\varphi)$ for all $\varphi \in C_K^\infty(U)$.

Examples of Distributions

(a) Let $f : U \to \mathbb{C}$ be locally Lebesgue integrable, that is, $f$ is measurable and $f|_K$ is Lebesgue integrable for every compact subset $K$ of $U$. Denote the space of all locally integrable functions by $L^1_{\mathrm{loc}}(U)$. For each $f \in L^1_{\mathrm{loc}}(U)$, the equation $\langle F_f, \varphi \rangle := \int_U f\varphi \, d\lambda^d$ defines a distribution, as may be seen by taking $m = 0$ in 15.1.3. Note that the mapping $f \mapsto F_f : L^1_{\mathrm{loc}}(U) \to D'(U)$ is linear. Moreover, if we identify functions that are equal a.e., then the map is 1-1. Indeed, this amounts to the assertion that $\int_U f\varphi = 0$ for all $\varphi \in C_c^\infty(U)$ $\Rightarrow$ $f = 0$ a.e., which is valid by a standard approximation argument, since $C_c^\infty(U)$ is dense in $L^1(U)$. In view of this correspondence and to simplify notation one frequently writes $f$ for $F_f$, so that

$$\langle f, \varphi \rangle = \langle F_f, \varphi \rangle = \int_U f\varphi \, d\lambda^d, \quad \varphi \in D(U). \tag{15.1}$$

(b) Let $\mu$ be a Radon measure on $U$. Then $\varphi \mapsto \int_U \varphi \, d\mu$ defines a distribution, again by taking $m = 0$ in 15.1.3. More generally, for fixed $\alpha$, the mapping $\varphi \mapsto \int_U \partial^\alpha \varphi \, d\mu$ defines a distribution, this time by taking $m = |\alpha|$.
(c) A special case of (b) is obtained by taking $\mu$ to be the Dirac measure $\delta_x$ at $x \in U$. This gives the Dirac delta distribution $\varphi \mapsto \varphi(x)$ at $x$.

¹ These refer to the existence and the basic properties of strict inductive limits. Since an understanding of the material in the current chapter does not depend on the abstract notion of inductive limit, the reader may simply accept the statement of Theorem 15.1.2.

15.1.4 Remarks. The Dirac delta distribution is not given by a function as in (a). To see this, take the special case $U = \mathbb{R}$ and $x = 0$. For $r > 0$ consider the test function

$$\varphi_r(x) = \begin{cases} \exp\big(1 - [1 - (x/r)^2]^{-1}\big) & \text{if } |x| < r, \\ 0 & \text{otherwise.} \end{cases}$$

If $\delta_0$ were given by a locally integrable function $f$ we would have

$$1 = \varphi_r(0) = \int_{|x| \le r} f(x)\varphi_r(x)\, dx \le \int_{|x| \le r} |f(x)|\, dx \to 0 \quad \text{as } r \to 0.$$

Similarly, if $\alpha \ne 0$ then the distribution $\varphi \mapsto \partial^\alpha \varphi(x)$ is not given by a measure $\mu$ as in Example (b). Otherwise, for the test function $\psi_r(x) := x\varphi_r(x)$, we would have

$$1 = \psi_r'(0) = \int_{\mathbb{R}} \psi_r(x)\, d\mu \le r \int_{|x| \le r} d\mu \to 0 \quad \text{as } r \to 0. \qquad ♦$$

15.2 Operations on Distributions


Derivative of a Locally Integrable Function
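Let $f \in C^\infty(U)$, $\alpha$ a multi-index, and $\varphi \in D(U)$. The classical integration by parts formula gives

$$\langle \partial^\alpha f, \varphi \rangle = \int_U \varphi\, \partial^\alpha f = (-1)^{|\alpha|} \int_U f\, \partial^\alpha \varphi = (-1)^{|\alpha|} \langle f, \partial^\alpha \varphi \rangle.$$

Here we have used the convention described in (15.1). The right side of the equation makes sense for any locally integrable function $f$. Thus we define the distributional or weak derivative $\partial^\alpha f$ of $f$ by

$$\langle \partial^\alpha f, \varphi \rangle = (-1)^{|\alpha|} \langle f, \partial^\alpha \varphi \rangle, \quad \varphi \in D(U),\ f \in L^1_{\mathrm{loc}}(U).$$

For example, the classical derivative of $f(x) = |x|$ does not exist on $\mathbb{R}$, but the distributional derivative of $f$ exists and equals $1_{(0,\infty)} - 1_{(-\infty,0)}$. Indeed, if $\varphi \in D(\mathbb{R})$, then integrating by parts we have

$$-\langle f, \varphi' \rangle = \int_{-\infty}^0 x\varphi'(x)\, dx - \int_0^\infty x\varphi'(x)\, dx = -\int_{-\infty}^0 \varphi + \int_0^\infty \varphi = \langle 1_{(0,\infty)} - 1_{(-\infty,0)}, \varphi \rangle.$$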
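The identity for the weak derivative of $|x|$ can be checked numerically. The sketch below is only an illustration: it uses the bump function of 15.1.4, shifted to be centred at 0.3 so that the test function is not symmetric about the kink, and a midpoint rule. The names `bump`, `bump_prime`, `midpoint`, and `sign`, as well as the centre, radius, and grid size, are assumptions introduced here.

```python
import math


def bump(x, center=0.3, r=1.0):
    """Smooth test function supported in [center - r, center + r]."""
    u = (x - center) / r
    if abs(u) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - u * u))


def bump_prime(x, center=0.3, r=1.0):
    """Classical derivative of bump, obtained from the chain rule."""
    u = (x - center) / r
    if abs(u) >= 1.0:
        return 0.0
    return bump(x, center, r) * (-2.0 * u / (1.0 - u * u) ** 2) / r


def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def sign(x):
    """The function 1_(0,inf) - 1_(-inf,0)."""
    return 1.0 if x > 0 else -1.0 if x < 0 else 0.0


# supp(bump) = [-0.7, 1.3]. With n = 20000 cells the kink of |x| at 0 lies on
# a cell boundary, so no midpoint falls on it.
lhs = -midpoint(lambda x: abs(x) * bump_prime(x), -0.7, 1.3, 20000)  # -<f, phi'>
rhs = midpoint(lambda x: sign(x) * bump(x), -0.7, 1.3, 20000)        # <sign, phi>
```

The two quantities `lhs` and `rhs` agree up to quadrature error, which is what the displayed computation predicts for this particular test function.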

Derivative of a Distribution
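Generalizing the preceding, we define the derivative $\partial^\alpha F$ of $F \in D'(U)$ by

$$\langle \partial^\alpha F, \varphi \rangle := (-1)^{|\alpha|} \langle F, \partial^\alpha \varphi \rangle, \quad \varphi \in D(U).$$

It follows directly from 15.1.3 that $\partial^\alpha F \in D'(U)$. For an example, take $H = 1_{[0,\infty)}$, the so-called Heaviside function on $\mathbb{R}$. For any $\varphi \in D(\mathbb{R})$ we have

$$\Big\langle \frac{d}{dx} F_H, \varphi \Big\rangle = -\int_0^\infty \varphi' = \varphi(0),$$

hence $F_H' = \delta_0$.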

Multiplication by a Smooth Function


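Given $F \in D'(U)$ and $f \in C^\infty(U)$, define $fF$ by

$$\langle fF, \varphi \rangle = \langle F, f\varphi \rangle, \quad \varphi \in D(U).$$

By 15.1.3, $fF \in D'(U)$. Note that if $g$ is locally integrable, then

$$\langle fF_g, \varphi \rangle = \langle F_g, f\varphi \rangle = \int g(f\varphi) = \langle F_{fg}, \varphi \rangle,$$

that is, $fF_g = F_{fg}$. Furthermore, for any $1 \le k \le d$ and $\varphi \in D(U)$, the classical product rule $\partial_k(f\varphi) = f\,\partial_k \varphi + \varphi\,\partial_k f$ implies that

$$\begin{aligned} \langle \partial_k(fF), \varphi \rangle &= -\langle fF, \partial_k \varphi \rangle = -\langle F, f\,\partial_k \varphi \rangle = -\langle F, \partial_k(f\varphi) \rangle + \langle F, \varphi\,\partial_k f \rangle \\ &= \langle \partial_k F, f\varphi \rangle + \langle (\partial_k f)F, \varphi \rangle = \langle f(\partial_k F), \varphi \rangle + \langle (\partial_k f)F, \varphi \rangle, \end{aligned}$$

that is,

$$\partial_k(fF) = f(\partial_k F) + (\partial_k f)F.$$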
This is the product rule for distributions.

Composition with Linear Maps

Let $T \in B(\mathbb{R}^d)$ be invertible and set $V := T(U)$, so that $T : U \to V$ is $C^\infty$ with $C^\infty$ inverse $T^{-1} : V \to U$. If $f \in C_c(V)$ and $\varphi \in D(U)$, then $F_f \in D'(V)$, $\varphi \circ T^{-1} \in D(V)$, and by the change of variables theorem (3.2.18),

$$F_f(\varphi \circ T^{-1}) = \int_V f \cdot (\varphi \circ T^{-1}) = |\det T| \int_U (f \circ T)\varphi = |\det T|\, F_{f \circ T}(\varphi),$$

which we write as

$$F_{f \circ T}(\varphi) = |\det T|^{-1} F_f(\varphi \circ T^{-1}), \quad \varphi \in D(U),\ f \in C_c(V).$$

The identification $f \leftrightarrow F_f$ then suggests the following definition of $F \circ T$ for an arbitrary distribution $F$:

$$F \circ T(\varphi) = |\det T|^{-1} F(\varphi \circ T^{-1}), \quad \varphi \in D(U),\ F \in D'(V).$$

One easily checks that $F \circ T \in D'(U)$. In particular, for the reflection $T(x) = -x$ we define the distribution $\widetilde{F}$ by

$$\widetilde{F}(\varphi) := F \circ T(\varphi) = F(\widetilde{\varphi}), \quad \text{where } \widetilde{\varphi}(x) := \varphi(-x).$$
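The formula $F_{f \circ T}(\varphi) = |\det T|^{-1} F_f(\varphi \circ T^{-1})$ that motivates the definition can be checked numerically in one dimension. The sketch below assumes $d = 1$, $U = V = \mathbb{R}$, and $T(x) = -2x$, so that $|\det T| = 2$. The functions `hat` (a continuous, compactly supported $f$) and `bump` (a test function), the constant `A`, and the quadrature parameters are hypothetical choices made only for illustration.

```python
import math


def bump(x):
    """Test function supported in [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - x * x))


def hat(y):
    """Continuous function with compact support [-0.5, 1.5]."""
    return max(0.0, 1.0 - abs(y - 0.5))


A = -2.0  # T(x) = A * x, so |det T| = |A|


def T(x):
    return A * x


def T_inv(y):
    return y / A


def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


# F_{f o T}(phi): integral of f(T(x)) * phi(x) dx.
lhs = midpoint(lambda x: hat(T(x)) * bump(x), -3.0, 3.0, 24000)

# |det T|^{-1} * F_f(phi o T^{-1}): integral of f(y) * phi(T^{-1}(y)) dy, scaled.
rhs = midpoint(lambda y: hat(y) * bump(T_inv(y)), -3.0, 3.0, 24000) / abs(A)
```

Up to quadrature error, `lhs` and `rhs` coincide, in line with the change of variables theorem. Taking `A = -1.0` instead gives the reflection case used to define $\widetilde{F}$.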

15.3 Distributions with Compact Support


Let $V$ be an open subset of $U$. A distribution $F \in D'(U)$ is said to be zero on $V$ if $\langle F, \varphi \rangle = 0$ for all $\varphi \in C_c^\infty(U)$ with $\operatorname{supp} \varphi \subseteq V$. The complement in $U$ of the union of all open sets on which $F$ is zero is called the support of $F$ and is denoted by $\operatorname{supp} F$. Thus $\operatorname{supp} F$ is the closed set defined by

$$U \setminus \operatorname{supp} F = \bigcup \{V : V \subseteq U \text{ is open and } F = 0 \text{ on } V\}.$$

The support of a distribution is a generalization of the notion of support of a function in the following sense:

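15.3.1 Proposition. If $f \in C(U)$ is locally integrable, then $\operatorname{supp}(F_f) = \operatorname{supp}(f)$.

Proof. Let $C := \operatorname{supp}(f)$. Since $\langle F_f, \varphi \rangle = \int_C f\varphi = 0$ for all $\varphi \in C_c^\infty(U)$ with support contained in the open set $U \setminus C$, $\operatorname{supp}(F_f) \subseteq C$.
For the reverse inclusion, let $V$ be any open set on which $F_f = 0$. Then

$$\int f\varphi \, d\lambda^d = \langle F_f, \varphi \rangle = 0 \quad \forall\, \varphi \in C_c^\infty(U) \text{ with } \operatorname{supp}(\varphi) \subseteq V.$$

Let $K \subseteq V$ be compact. If $f \in C^\infty(U)$, then, replacing $\varphi$ by $\overline{f}\varphi$ in the above, where $1_K \le \varphi \le 1_U$, we see that $\int_K |f|^2 \, d\lambda^d = 0$ and so $f = 0$ on $K$. Since a continuous function is uniformly approximable on $K$ by $C^\infty$ functions, the same result holds for continuous functions $f$. Since $K$ was arbitrary, $f = 0$ on $V$. Therefore, $\operatorname{supp}(f) \subseteq V^c$. Since $V$ was arbitrary, $\operatorname{supp}(f) \subseteq \operatorname{supp}(F_f)$.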
15.3.2 Proposition. $F = 0$ on $U \setminus \operatorname{supp} F$.
Proof. Let $\{V_i : i \in I\}$ be the collection of all open subsets of $U$ on which $F = 0$ and let $\varphi \in C_c^\infty(U)$ with $K := \operatorname{supp}(\varphi) \subseteq \bigcup_i V_i = U \setminus \operatorname{supp} F$. We show $F(\varphi) = 0$, which will prove the proposition.
By the partition of unity theorem (0.14.6) applied to the open cover $\{V_i\}$ of $K$ there exist a finite subcover $\{V_1, \ldots, V_p\}$ and nonnegative functions $\chi_j \in C_c^\infty(U)$ $(j = 1, \ldots, p)$ such that $\operatorname{supp}(\chi_j) \subseteq V_j$ and $\sum_{j=1}^p \chi_j = 1$ on $K$. Then, by definition of $V_j$, $F(\varphi\chi_j) = 0$, and since $\varphi = \sum_j \varphi\chi_j$ we have $F(\varphi) = \sum_j F(\varphi\chi_j) = 0$.
A distribution $F \in D'(U)$ is said to have compact support if $\operatorname{supp}(F)$ is compact. For example, by 15.3.1, members of $C_c(U)$, considered as distributions, have compact support. We denote the space of all such distributions by $E'(U)$:

$$E'(U) := \{F \in D'(U) : \operatorname{supp}(F) \text{ is compact}\}.$$

Recall from 9.1.6(c) that the topology on the Fréchet space $C^\infty(U)$ is defined by the seminorms

$$p_{m,\alpha}(f) = \sup\{|\partial^\alpha f(x)| : x \in K_m\}, \tag{15.2}$$

where the $K_m$ are compact, $K_m \subseteq \operatorname{int}(K_{m+1})$, and $K_m \uparrow U$, and that $C_c^\infty(U)$ is dense in $C^\infty(U)$. The next theorem asserts that the dual of $C^\infty(U)$ is $E'(U)$. For the statement, we employ the following convenient notation: Let $X$ be a LCS and $Y$ a linear subspace of $X$ with a locally convex topology with respect to which the inclusion mapping $Y \hookrightarrow X$ is continuous. This simply means that the given topology of $Y$ is stronger than the relative topology from $X$. It follows that the restriction to $Y$ of every member of the dual $X'$ of $X$ is a member of $Y'$. We express this by writing $X'|_Y \subseteq Y'$.

15.3.3 Theorem. The inclusion mapping $D(U) \hookrightarrow C^\infty(U)$ is continuous, and the restriction to $D(U)$ of a member $G$ of the dual of $C^\infty(U)$ is a distribution $F$. Moreover, $F$ has compact support, and every member of $D(U)'$ with compact support arises in this manner, that is, extends (uniquely) to a member of the dual of $C^\infty(U)$. Thus

$$C^\infty(U)'\big|_{D(U)} = E'(U).$$

Proof. Let $(\varphi_n)$ $\tau$-converge to 0 in $D(U)$ as in (a) of 15.1.2. Thus there exists a compact $K \subseteq U$ such that $(\varphi_n) \subseteq C_K^\infty(U)$ and $\partial^\alpha \varphi_n \to 0$ uniformly on $U$ for all multi-indices $\alpha$. Then, trivially, $\partial^\alpha \varphi_n \to 0$ uniformly on any compact subset of $U$. Thus by (b) of 15.1.2, $D(U) \hookrightarrow C^\infty(U)$ is continuous. This shows that every continuous linear functional $G$ on $C^\infty(U)$ restricts to a continuous linear functional $F$ on $D(U)$. To see that $F$ has compact support, by continuity of $G$ choose $C > 0$ and $m, N \ge 1$ such that $|G(\varphi)| \le C \max_{|\alpha| \le N} p_{m,\alpha}(\varphi)$ for all $\varphi \in C^\infty(U)$. If $\varphi \in C_c^\infty(U)$ and $\operatorname{supp}(\varphi) \subseteq K_m^c$, then $p_{m,\alpha}(\varphi) = 0$ and so $F(\varphi) = G(\varphi) = 0$. Therefore, $K_m^c$ is one of the open sets comprising $U \setminus \operatorname{supp}(F)$, hence $\operatorname{supp} F \subseteq K_m$.
Conversely, let $F \in D'(U)$ have compact support. Choose $\psi \in C_c^\infty(U)$ such that $\psi = 1$ on $\operatorname{supp} F$ and set $K := \operatorname{supp} \psi \supseteq \operatorname{supp} F$. Define $G$ on $C^\infty(U)$ by $G(f) = F(f\psi)$. By continuity of $F$ on $C_K^\infty(U)$, there exist $M > 0$ and $N \ge 1$ such that $|F(f\psi)| \le M \max_{|\alpha| \le N} \|\partial^\alpha(f\psi)\|_\infty$ for all $f \in C^\infty(U)$. Now, by the product rule, $\partial^\alpha(f\psi)$ is a sum of derivatives $(\partial^\beta f)(\partial^\gamma \psi)$ $(|\beta| + |\gamma| = |\alpha|)$, and each of the terms $\partial^\gamma \psi$ has support in $K$. Letting $M'$ be a bound for the sum of the terms $|\partial^\gamma \psi|$, we then have for sufficiently large $m$

$$|G(f)| = |F(f\psi)| \le MM' \max_{|\alpha| \le N} \sup_K |\partial^\alpha f| \le MM' \max_{|\alpha| \le N} p_{m,\alpha}(f), \quad f \in C^\infty(U),$$

proving that $G$ is continuous on $C^\infty(U)$.

Now note that $V := U \setminus K \subseteq U \setminus \operatorname{supp} F$, hence $F = 0$ on $V$ (15.3.2). Let $f \in C_c^\infty(U)$. Since $f\psi = f$ on $K$, $\operatorname{supp}(f\psi - f) \subseteq V$ and so $F(f\psi - f) = 0$, that is, $G(f) = F(f)$. Therefore, $G$ is an extension of $F$. Uniqueness of the extension follows from the fact that $C_c^\infty(U)$ is dense in $C^\infty(U)$ (9.1.6(b)).

15.4 Convolution of Distributions


The convolution of functions $f, g \in L^1(\mathbb{R}^d)$ was defined in §6.1. The same construction is valid for $f \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and $\psi \in C_c^\infty(\mathbb{R}^d)$:

$$f * \psi(x) = \int f(x - y)\psi(y)\, dy = \int f(y)\psi(x - y)\, dy = \int f(y)\psi_x(y)\, dy = F_f(\psi_x),$$

where $\psi_x(y) := \psi(x - y)$. This suggests the definition

$$(F * \psi)(x) := F(\psi_x), \quad x \in \mathbb{R}^d,$$

for an arbitrary distribution $F$. The function $F * \psi$ is called the convolution of $F$ with $\psi$. The basic properties of this convolution are given in the following proposition.
15.4.1 Proposition. Let $F \in D'(\mathbb{R}^d)$, $\psi \in D(\mathbb{R}^d)$. Then
(a) $F * \psi \in C^\infty(\mathbb{R}^d)$, (b) $\partial^\alpha(F * \psi)(x) = \langle F, (\partial^\alpha \psi)_x \rangle$, and (c) $(\partial^\alpha F) * \psi = F * (\partial^\alpha \psi)$,
where $(\partial^\alpha \psi)_x(y) = (\partial^\alpha \psi)(x - y)$.


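Proof. We show first that $F * \psi$ is continuous. Let $x_n \to x$ in $\mathbb{R}^d$, so that $(x_n) \subseteq C_r(x)$ for some $r$. Then the supports of $\psi_{x_n}$ and $\psi_x$ are contained in the compact set $C_r(x) - \operatorname{supp}(\psi)$. Moreover, from $\partial^\alpha \psi_{x_n} = (-1)^{|\alpha|}(\partial^\alpha \psi)_{x_n}$ and $\partial^\alpha \psi_x = (-1)^{|\alpha|}(\partial^\alpha \psi)_x$ we see that $\partial^\alpha \psi_{x_n} \to \partial^\alpha \psi_x$ uniformly on $\mathbb{R}^d$. Therefore, by 15.1.2(a), $\psi_{x_n} \xrightarrow{\tau} \psi_x$ and so $F(\psi_{x_n}) \to F(\psi_x)$.
Now observe that if $\mathbf{t} := (t, 0, \ldots, 0)$, then

$$(\partial/\partial x_1)(F * \psi)(x) = \lim_{t \to 0} t^{-1}\big[F(\psi_{x + \mathbf{t}}) - F(\psi_x)\big] = \lim_{t \to 0} F\big(t^{-1}[\psi_{x + \mathbf{t}} - \psi_x]\big).$$

An argument similar to that of the preceding paragraph (using the mean value theorem) shows that if $t_n \to 0$, then

$$t_n^{-1}\big[\psi_{x + \mathbf{t}_n}(y) - \psi_x(y)\big] = t_n^{-1}\big[\psi(x + \mathbf{t}_n - y) - \psi(x - y)\big] \to [(\partial/\partial x_1)\psi](x - y)$$

in $D(\mathbb{R}^d)$ and so $(\partial/\partial x_1)(F * \psi)(x) = F\big((\partial/\partial x_1 \psi)_x\big)$. Analogous arguments apply to the other variables. By induction we obtain $\partial^\alpha(F * \psi)(x) = F\big((\partial^\alpha \psi)_x\big)$, proving (a) and (b).
From the definitions of convolution and derivative,

$$[(\partial^\alpha F) * \psi](x) = (\partial^\alpha F)(\psi_x) = (-1)^{|\alpha|} F(\partial^\alpha \psi_x) = F\big((\partial^\alpha \psi)_x\big) = (F * \partial^\alpha \psi)(x),$$

verifying (c).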
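15.4.2 Proposition. If $F \in D'(\mathbb{R}^d)$ and $\psi \in C_c^\infty(\mathbb{R}^d)$, then $\operatorname{supp} F * \psi \subseteq \operatorname{supp} F + \operatorname{supp} \psi$. In particular, the members of $E'(\mathbb{R}^d) * C_c^\infty(\mathbb{R}^d)$ have compact support, that is, the inclusion $E'(\mathbb{R}^d) * C_c^\infty(\mathbb{R}^d) \subseteq E'(\mathbb{R}^d)$ holds.
Proof. Since $\operatorname{supp} F$ is closed and $\operatorname{supp} \psi$ is compact, the set $C := \operatorname{supp} F + \operatorname{supp} \psi$ is closed. Let $U$ be open with compact closure contained in $C^c$. Then $\operatorname{cl} U - \operatorname{supp} \psi$ is compact and does not meet the closed set $\operatorname{supp} F$, hence there exists $g \in C^\infty(\mathbb{R}^d)$ such that $g = 0$ on an open set $V \supseteq \operatorname{cl} U - \operatorname{supp} \psi$ and $g = 1$ on an open set $W \supseteq \operatorname{supp} F$. Then for all $\varphi \in D(\mathbb{R}^d)$, $\operatorname{supp}(g\varphi - \varphi) \subseteq W^c \subseteq (\operatorname{supp} F)^c$ and so $F(g\varphi - \varphi) = 0$. In particular, $F(g\psi_x - \psi_x) = 0$, that is, $F * \psi(x) = F(g\psi_x)$. But if $x \in U$, then $g\psi_x$ is identically equal to zero. Indeed, assume that $g(y)\psi_x(y) \ne 0$ for some $y$. Then $x - y \in \operatorname{supp} \psi$, hence $y \in V$. But $g = 0$ on $V$. Thus $F * \psi(x) = F(g\psi_x) = 0$ on $U$. Since $U$ was arbitrary, $F * \psi = 0$ on the open set $C^c$ and so $\operatorname{supp} F * \psi \subseteq C$.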
The following lemma will be used to prove the associative law for convolutions.
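15.4.3 Lemma. Let $F \in D'(\mathbb{R}^d)$ and $\psi, \varphi \in D(\mathbb{R}^d)$. Then $\langle F * \psi, \varphi \rangle = \langle F, \widetilde{\psi} * \varphi \rangle$, where $\widetilde{\psi}(x) := \psi(-x)$.
Proof. The left side of the desired equality may be written

$$\langle F * \psi, \varphi \rangle = \int (F * \psi)(x)\varphi(x)\, dx = \int \langle F, \varphi(x)\psi_x \rangle\, dx.$$

To obtain a like expression for the right side, note that

$$(\widetilde{\psi} * \varphi)(y) = \int \widetilde{\psi}(y - x)\varphi(x)\, dx = \int \psi(x - y)\varphi(x)\, dx = \int \varphi(x)\psi_x(y)\, dx,$$

so that the right side of the desired equation may be written

$$\langle F, \widetilde{\psi} * \varphi \rangle = \Big\langle F, \int \varphi(x)\psi_x(\cdot)\, dx \Big\rangle,$$

where the integral may be taken to be a Bochner integral. Thus we must show that

$$\int \langle F, \varphi(x)\psi_x \rangle\, dx = \Big\langle F, \int \varphi(x)\psi_x(\cdot)\, dx \Big\rangle.$$

To this end, note first that the integrand on the right, as a function of $y$, is supported in the compact set $K := \operatorname{supp}(\varphi) + \operatorname{supp}(\psi)$. Overlay $K$ with a grid $Q$ of cubes $Q_j$ with volumes $v_j$ and let $x_j \in Q_j$. Set

$$S(y, Q) := \sum_j \varphi(x_j)\psi_{x_j}(y)v_j.$$

Then

$$\int \varphi(x)\psi_x(y)\, dx - S(y, Q) = \sum_j \int_{Q_j} \big[\varphi(x)\psi_x(y) - \varphi(x_j)\psi_{x_j}(y)\big]\, dx.$$

By uniform continuity, the integrands on the right tend to zero uniformly in $y$ as $\|Q\| \to 0$. Therefore, the Riemann sums $S(y, Q)$ tend to $\int \varphi(x)\psi_x(y)\, dx$ uniformly in $y$. A similar argument shows that

$$\partial^\alpha S(y, Q) = \sum_j \varphi(x_j)\,\partial_y^\alpha \psi_{x_j}(y)v_j \to \int \varphi(x)\,\partial_y^\alpha \psi_x(y)\, dx = \partial_y^\alpha \int \varphi(x)\psi_x(y)\, dx$$

uniformly in $y$ as $\|Q\| \to 0$. Thus $S(\cdot, Q) \to \int \varphi(x)\psi_x\, dx$ in the topology of $D(\mathbb{R}^d)$ and so

$$\int \langle F, \varphi(x)\psi_x \rangle\, dx = \lim_{\|Q\| \to 0} \sum_j \langle F, \varphi(x_j)\psi_{x_j}v_j \rangle = \lim_{\|Q\| \to 0} \langle F, S(\cdot, Q) \rangle = \Big\langle F, \int \varphi(x)\psi_x\, dx \Big\rangle$$

as required.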
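For a distribution given by a function, the lemma reduces to an interchange of integrals, and the Riemann sums in the proof can be imitated directly. The sketch below is an illustration only: it takes $d = 1$ and $F = F_f$ with $f(x) = |x|$, and it evaluates both pairings with Riemann sums on one common lattice. On such a lattice the two double sums contain the same terms in a different order, so their agreement reflects the discrete analogue of the interchange rather than an independent verification. The names `bump`, `psi`, `phi`, `f`, `H`, and `GRID` are assumptions introduced here.

```python
import math


def bump(x, center, r):
    """Smooth function supported in [center - r, center + r]."""
    u = (x - center) / r
    if abs(u) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - u * u))


def psi(x):
    return bump(x, 0.2, 0.8)    # supported in [-0.6, 1.0]


def phi(x):
    return bump(x, -0.1, 1.0)   # supported in [-1.1, 0.9]


def f(x):
    return abs(x)               # locally integrable, not compactly supported


H = 0.02
GRID = [k * H for k in range(-150, 151)]   # lattice covering [-3, 3]


def conv_f_psi(x):
    """(f * psi)(x): Riemann sum for the integral of f(x - y) psi(y) dy."""
    return H * sum(f(x - y) * psi(y) for y in GRID)


def psi_tilde_conv_phi(y):
    """(psi~ * phi)(y): Riemann sum for the integral of psi(x - y) phi(x) dx."""
    return H * sum(psi(x - y) * phi(x) for x in GRID)


lhs = H * sum(conv_f_psi(x) * phi(x) for x in GRID)          # <f * psi, phi>
rhs = H * sum(f(y) * psi_tilde_conv_phi(y) for y in GRID)    # <f, psi~ * phi>
```

The lattice $[-3, 3]$ is chosen to contain $\operatorname{supp} \varphi - \operatorname{supp} \psi = [-2.1, 1.5]$, which is where $\widetilde{\psi} * \varphi$ can be nonzero for these particular choices.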
Here is the aforementioned associative law for convolutions:
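15.4.4 Theorem. Let $F \in D'(\mathbb{R}^d)$ and $\psi, \varphi \in D(\mathbb{R}^d)$. Then $F * (\psi * \varphi) = (F * \psi) * \varphi$.
Proof. For all $y$,

$$(\psi * \varphi)_x(y) = (\psi * \varphi)(x - y) = \int \psi(z)\varphi(x - y - z)\, dz = \int \widetilde{\psi}(z)\varphi_x(y - z)\, dz = (\widetilde{\psi} * \varphi_x)(y).$$

Therefore, by the lemma,

$$[F * (\psi * \varphi)](x) = \langle F, (\psi * \varphi)_x \rangle = \langle F, \widetilde{\psi} * \varphi_x \rangle = \langle F * \psi, \varphi_x \rangle = [(F * \psi) * \varphi](x).$$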

As in the classical case, convolution is continuous:


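15.4.5 Theorem. For $F \in D'(\mathbb{R}^d)$ the linear mapping $T : \varphi \mapsto F * \varphi : D(\mathbb{R}^d) \to C^\infty(\mathbb{R}^d)$ is continuous.
Proof. We show that for any compact set $K$, the restriction of $T$ to $C_K^\infty(\mathbb{R}^d)$ is continuous. The seminorms defining the topology of $C_K^\infty(\mathbb{R}^d)$ are of the form

$$p_{K,n}(\varphi) = \sup\{|\partial^\alpha \varphi(x)| : x \in \mathbb{R}^d,\ |\alpha| \le n\}.$$

The seminorms defining the topology of $C^\infty(\mathbb{R}^d)$ are of the form

$$q_{H,k}(\varphi) = \sup\{|\partial^\beta \varphi(x)| : x \in H,\ |\beta| \le k\}, \quad H \text{ compact},\ k \in \mathbb{Z}^+.$$

We show that given $q_{H,k}$ there exist $n \ge 0$ and $M > 0$ such that

$$q_{H,k}(F * \varphi) \le M p_{K,n}(\varphi) \quad \forall\, \varphi \in C_K^\infty(\mathbb{R}^d).$$

Now, $F$ restricted to $C_{H-K}^\infty(\mathbb{R}^d)$ is continuous, hence there exist $M > 0$ and $m \in \mathbb{N}$ such that

$$|\langle F, \psi \rangle| \le M \sup\{|\partial^\alpha \psi(y)| : y \in \mathbb{R}^d,\ |\alpha| \le m\} \quad \forall\, \psi \in C_{H-K}^\infty(\mathbb{R}^d).$$

In particular, if $\varphi \in C_K^\infty(\mathbb{R}^d)$ and $x \in H$, then $\varphi_x \in C_{H-K}^\infty(\mathbb{R}^d)$, hence, recalling that $\partial^\beta(F * \varphi)(x) = \langle F, (\partial^\beta \varphi)_x \rangle$, we have

$$\begin{aligned} |\partial^\beta(F * \varphi)(x)| &\le M \sup\{|\partial^\alpha(\partial^\beta \varphi)_x(y)| : y \in H - K,\ |\alpha| \le m\} \\ &= M \sup\{|\partial_y^\alpha(\partial^\beta \varphi)(x - y)| : y \in H - K,\ |\alpha| \le m\} \\ &= M \sup\{|(\partial^{\alpha+\beta} \varphi)(x - y)| : y \in H - K,\ |\alpha| \le m\} \end{aligned}$$

and so

$$\begin{aligned} q_{H,k}(F * \varphi) &= \sup\{|\partial^\beta(F * \varphi)(x)| : x \in H,\ |\beta| \le k\} \\ &\le M \sup\{|\partial^{\alpha+\beta} \varphi(x - y)| : x \in H,\ y \in H - K,\ |\alpha| \le m,\ |\beta| \le k\} \\ &\le M \sup\{|\partial^{\alpha+\beta} \varphi(z)| : z \in K,\ |\alpha| \le m,\ |\beta| \le k\} \\ &\le M p_{K,m+k}(\varphi). \end{aligned}$$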

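Recall that the space $C_c(\mathbb{R}^d)$ may be viewed as a subspace of $E'(\mathbb{R}^d)$ via the identification $f \leftrightarrow F_f$. Since $F * \psi \in C_c^\infty(\mathbb{R}^d)$ for $F \in E'(\mathbb{R}^d)$ and $\psi \in C_c^\infty(\mathbb{R}^d)$ (15.4.2), the following theorem implies that the space $C_c^\infty(\mathbb{R}^d)$ is weak$^*$ dense in $E'(\mathbb{R}^d)$.
15.4.6 Theorem. There exists a sequence $(\phi_n)$ in $C_c^\infty(\mathbb{R}^d)$ such that for every $F \in D'(\mathbb{R}^d)$, $F * \phi_n \xrightarrow{w^*} F$, that is,

$$\langle F, \varphi \rangle = \lim_n \langle F * \phi_n, \varphi \rangle = \lim_n \int (F * \phi_n)\varphi, \quad \varphi \in D(\mathbb{R}^d).$$

Proof. Let $(\phi_n) \subseteq C_c^\infty(\mathbb{R}^d)$ be a sequence such that $f * \phi_n \to f$ uniformly for all uniformly continuous and bounded functions $f$ on $\mathbb{R}^d$, where $\operatorname{supp}(\phi_n) \subseteq B_{1/n}(0)$ and each $\phi_n$ is an even function (6.1.3). By associativity,

$$\langle F * \phi_n, \varphi \rangle = \int (F * \phi_n)(x)\varphi(x)\, d\lambda^d(x) = [(F * \phi_n) * \widetilde{\varphi}](0) = [F * (\phi_n * \widetilde{\varphi})](0) = \langle F, \phi_n * \varphi \rangle,$$

the last equality from $(\phi_n * \widetilde{\varphi})_0 = \phi_n * \varphi$ (because $\phi_n$ is even). But the sequence $(\phi_n * \varphi)$ is supported in a compact set $K$ and $\partial^\alpha(\phi_n * \varphi) = \phi_n * \partial^\alpha \varphi \to \partial^\alpha \varphi$ uniformly on $K$ for all $\alpha$. Therefore, $\phi_n * \varphi \to \varphi$ in $D(\mathbb{R}^d)$ and so $\langle F * \phi_n, \varphi \rangle \to \langle F, \varphi \rangle$.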
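The simplest instance of the theorem is $F = \delta_0$ on $\mathbb{R}$, for which $(\delta_0 * \phi_n)(x) = \phi_n(x)$ and the assertion becomes $\int \phi_n \varphi \to \varphi(0)$. The sketch below illustrates this numerically under the assumption that $\phi_n$ is a normalized, rescaled copy of the bump function of 15.1.4; the names `bump`, `midpoint`, `MASS`, `mollifier`, `phi`, and `pairing`, and the sampled values of $n$, are hypothetical choices.

```python
import math


def bump(u):
    """Even smooth function: exp(1 - 1/(1 - u^2)) on (-1, 1), zero elsewhere."""
    if abs(u) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - u * u))


def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


MASS = midpoint(bump, -1.0, 1.0)   # normalizing constant


def mollifier(n):
    """Even function supported in [-1/n, 1/n] with integral 1."""
    return lambda x: n * bump(n * x) / MASS


def phi(x):
    """Test function: a bump centred at 0.3, so that phi is not even."""
    return bump(x - 0.3)


def pairing(n):
    """<delta_0 * phi_n, phi> = integral of phi_n(x) * phi(x) dx."""
    m = mollifier(n)
    return midpoint(lambda x: m(x) * phi(x), -1.0 / n, 1.0 / n)


# Distance from <delta_0, phi> = phi(0) for increasingly concentrated phi_n.
errors = [abs(pairing(n) - phi(0.0)) for n in (1, 4, 16, 64)]
```

The entries of `errors` shrink as $n$ grows, consistent with $\delta_0 * \phi_n \to \delta_0$ in the weak$^*$ sense when tested against this one $\varphi$.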
15.4.7 Remark. Lemma 15.4.3 suggests the following definition of convolution in $E'(\mathbb{R}^d)$:

$$\langle F * G, \varphi \rangle = \langle F, \widetilde{G} * \varphi \rangle, \quad \varphi \in C_c^\infty(\mathbb{R}^d),\ F, G \in E'(\mathbb{R}^d).$$

It may be shown that $F * G$ is a distribution with compact support and that convolution on $E'(\mathbb{R}^d)$ is commutative, associative, and bilinear. (See, for example, [48].) ♦

15.5 Tempered Distributions


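Recall that the Fréchet space $S(\mathbb{R}^d)$ of rapidly decreasing functions on $\mathbb{R}^d$ is a subspace of $C^\infty(\mathbb{R}^d)$ whose topology is given by the equivalent families of seminorms

$$\Big\{ q_{\alpha,m}(\varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^m |\partial^\alpha \varphi(x)| : m, \alpha \Big\} \quad \text{and} \quad \Big\{ p_{\alpha,\beta}(\varphi) = \sup_{x \in \mathbb{R}^d} |x^\alpha \partial^\beta \varphi(x)| : \alpha, \beta \Big\}.$$

Recalling the definition

$$p_{m,\alpha}(\varphi) = \sup_{x \in K_m} |\partial^\alpha \varphi(x)|, \quad m \in \mathbb{N},\ \varphi \in C^\infty(\mathbb{R}^d),$$

of a defining seminorm on the space $C^\infty(\mathbb{R}^d)$ (9.1.6(b)), we see that

$$p_{m,\alpha}(\varphi) \le q_{\alpha,n}(\varphi) \quad \forall\, m, n \text{ and } \varphi \in S(\mathbb{R}^d). \tag{15.3}$$

Moreover, if $\operatorname{supp}(\varphi) \subseteq K_m$, then the supremum in the definition of $q_{\alpha,n}(\varphi)$ may be taken over $K_m$, hence for a suitable $M > 0$ depending only on $m$,

$$q_{\alpha,n}(\varphi) = \sup_{x \in K_m} (1 + |x|^2)^n |\partial^\alpha \varphi(x)| \le M \sup_{x \in K_m} |\partial^\alpha \varphi(x)| = M p_{m,\alpha}(\varphi). \tag{15.4}$$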

Using these relations we prove

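15.5.1 Proposition. (a) $D(\mathbb{R}^d) \subseteq S(\mathbb{R}^d) \subseteq C^\infty(\mathbb{R}^d)$.

(b) The inclusion mappings $D(\mathbb{R}^d) \hookrightarrow S(\mathbb{R}^d) \hookrightarrow C^\infty(\mathbb{R}^d)$ are continuous, hence

$$C^\infty(\mathbb{R}^d)'\big|_{S(\mathbb{R}^d)} \subseteq S'(\mathbb{R}^d) \quad \text{and} \quad S'(\mathbb{R}^d)\big|_{D(\mathbb{R}^d)} \subseteq D'(\mathbb{R}^d).$$

(c) $D(\mathbb{R}^d)$ is dense in $S(\mathbb{R}^d)$ and $S(\mathbb{R}^d)$ is dense in the Fréchet space $C^\infty(\mathbb{R}^d)$.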

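Proof. Part (a) is clear. For (b) let $\varphi_n \to 0$ in $D(\mathbb{R}^d)$. Then there exists $m$ such that $\operatorname{supp}(\varphi_n) \subseteq K_m$ for all $n$, hence, by (15.4), $q_{\alpha,m}(\varphi_n) \to 0$. This shows that $D(\mathbb{R}^d) \hookrightarrow S(\mathbb{R}^d)$ is continuous. A similar argument using (15.3) shows that $S(\mathbb{R}^d) \hookrightarrow C^\infty(\mathbb{R}^d)$ is continuous.
(c) Since $C_c^\infty(\mathbb{R}^d)$ is contained in $S(\mathbb{R}^d)$ and is dense in $C^\infty(\mathbb{R}^d)$ (9.1.6(b)), $S(\mathbb{R}^d)$ must be dense in $C^\infty(\mathbb{R}^d)$. To show that $C_c^\infty(\mathbb{R}^d)$ is dense in $S(\mathbb{R}^d)$, let $f \in S(\mathbb{R}^d)$ and choose $\varphi \in C_c^\infty(\mathbb{R}^d)$ such that $\varphi(x) = 1$ for all $|x| \le 1$. The function $f_n(x) := f(x)\varphi(x/n)$ is in $C_c^\infty(\mathbb{R}^d)$, hence the desired conclusion will follow if we show that $f_n \to f$ in the topology of $S(\mathbb{R}^d)$, that is,

$$(1 + |x|^2)^k\, \partial^\alpha\big[f(x)\big(1 - \varphi(x/n)\big)\big] \to 0 \quad \text{uniformly on } \mathbb{R}^d.$$

Now, $\partial^\alpha\big[f(x)\big(1 - \varphi(x/n)\big)\big]$ is a sum of terms $\partial^\beta f(x) \cdot \partial^\gamma\big(1 - \varphi(x/n)\big)$. Moreover, for any compact set $K$, $\sup_{x \in K} |1 - \varphi(x/n)| = 0$ for all large $n$. Thus the sequence $\partial^\alpha\big[f(x)\big(1 - \varphi(x/n)\big)\big]$ converges uniformly to zero on compact sets. Since $(1 + |x|^2)^k\, \partial^\beta f(x)$ is in $C_0(\mathbb{R}^d)$, it follows that $(1 + |x|^2)^k\, \partial^\beta f(x) \cdot \partial^\gamma\big(1 - \varphi(x/n)\big)$ converges uniformly to zero on $\mathbb{R}^d$, completing the proof.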

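15.5.2 Proposition. Let $\beta$ be a multi-index, $f \in S(\mathbb{R}^d)$, and $g$ a polynomial on $\mathbb{R}^d$. Then the linear mappings $\varphi \mapsto \partial^\beta \varphi$, $\varphi \mapsto f\varphi$, and $\varphi \mapsto g\varphi$ on $S(\mathbb{R}^d)$ are continuous.
Proof. The equality

$$q_{\alpha,n}(\partial^\beta \varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^n |\partial^{\alpha+\beta} \varphi(x)| = q_{\alpha+\beta,n}(\varphi)$$

implies that $\partial^\beta \varphi \in S(\mathbb{R}^d)$ and that the function $\varphi \mapsto \partial^\beta \varphi$ is continuous. Now consider

$$q_{\alpha,n}(f\varphi) = \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^n |\partial^\alpha(f\varphi)(x)|.$$

By the product rule, $(1 + |x|^2)^n\, \partial^\alpha(f\varphi)(x)$ is a sum of products $(1 + |x|^2)^n\, \partial^\beta f(x) \cdot \partial^\gamma \varphi(x)$, which are majorized by $q_{\beta,n}(f) \cdot q_{\gamma,0}(\varphi)$. This shows that $f\varphi \in S(\mathbb{R}^d)$ and that $\varphi \mapsto f\varphi$ is continuous. A similar argument shows that $\varphi \mapsto g\varphi$ is continuous.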
The members of $S'(\mathbb{R}^d)$ are called tempered distributions. By 15.5.1(b), they may be viewed as distributions that are continuous in a weaker topology and with an enlarged space of test functions. Their importance derives from connections with Fourier analysis, discussed in the next subsection.

15.5.3 Examples.
(a) A distribution $F$ with compact support is tempered. To see this, let $K = \operatorname{supp}(F)$ and for $\varphi \in S(\mathbb{R}^d)$ set $G(\varphi) := F(\varphi\psi)$ where $\psi \in C_c^\infty(\mathbb{R}^d)$ and $\psi = 1$ on $K$. For any $\varphi \in C_c^\infty(\mathbb{R}^d)$, $\operatorname{supp}(\varphi(1 - \psi)) \subseteq K^c$, hence $F\big(\varphi(1 - \psi)\big) = 0$, that is, $G = F$ on $C_c^\infty(\mathbb{R}^d)$. Therefore, $G$ is a linear extension of $F$ to $S(\mathbb{R}^d)$. To see that $G$ is continuous, let $\varphi_n \to 0$ in $S(\mathbb{R}^d)$. Then $\partial^\alpha \varphi_n \to 0$ uniformly on $\mathbb{R}^d$, hence, by the product rule and the boundedness of the derivatives of $\psi$, $\partial^\alpha(\psi\varphi_n) \to 0$ uniformly on $\mathbb{R}^d$. Therefore, $\psi\varphi_n \to 0$ in $D(\mathbb{R}^d)$ and so $G(\varphi_n) = F(\psi\varphi_n) \to 0$.
(b) A polynomial $f$ on $\mathbb{R}^d$ is tempered. This is simply the assertion that the linear functional $\varphi \mapsto \int f\varphi$ on $S(\mathbb{R}^d)$ is continuous, that is, for some continuous seminorm $q_{\alpha,m}$,

$$\Big| \int f\varphi \Big| \le q_{\alpha,m}(\varphi) \quad \forall\, \varphi \in S(\mathbb{R}^d).$$

This is clear by taking $\alpha = 0$ and $m$ sufficiently large.

(c) Every $f \in L^p$ is tempered. It must be shown as in (b) that for suitable $m$,

$$\Big| \int f\varphi \Big| \le \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^m |\varphi(x)|.$$

For example, if $1 < p < \infty$ and $q$ is the conjugate exponent, then

$$\int |\varphi|^q = \int \frac{|\varphi(x)|^q (1 + |x|^2)^m}{(1 + |x|^2)^m}\, dx \le \sup_{y \in \mathbb{R}^d} (1 + |y|^2)^m |\varphi(y)|^q \int \frac{dx}{(1 + |x|^2)^m},$$

hence by Hölder’s inequality

$$\int |f\varphi| \le \|f\|_p \Big( \int \frac{dx}{(1 + |x|^2)^m} \Big)^{1/q} \sup_{y \in \mathbb{R}^d} (1 + |y|^2)^{m/q} |\varphi(y)|.$$

It therefore suffices to choose $m$ sufficiently large so that the term in parentheses is finite. (See 3.6.3.)
(d) If $F$ is a tempered distribution, then so are $\partial^\alpha F$, $fF$ ($f \in S(\mathbb{R}^d)$) and $gF$ ($g$ a polynomial). This follows immediately from 15.5.2. ♦

The Fourier Transform of a Tempered Distribution

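Recall that the Fourier transform of $f \in S = S(\mathbb{R}^d)$ is defined by

$$\widehat{f}(\xi) = \int e^{-2\pi i\, \xi \cdot x} f(x)\, dx, \quad \xi \in \mathbb{R}^d,$$

and the inverse by

$$\check{f}(x) = \int e^{2\pi i\, \xi \cdot x} f(\xi)\, d\xi, \quad x \in \mathbb{R}^d.$$

Moreover, the mappings $f \mapsto \widehat{f}$ and $f \mapsto \check{f}$ are continuous in the topology of $S$, and for $\varphi \in S$ we have

$$\langle \widehat{f}, \varphi \rangle = \int \widehat{f} \cdot \varphi = \int f \cdot \widehat{\varphi} = \langle f, \widehat{\varphi} \rangle$$

with a similar equation holding for $\check{f}$. (See the proof of 6.2.4.) This suggests the following definitions:
The Fourier transform $\widehat{F}$ and inverse Fourier transform $\check{F}$ of an arbitrary tempered distribution $F$ are defined, respectively, by

$$\langle \widehat{F}, \varphi \rangle = \langle F, \widehat{\varphi} \rangle \quad \text{and} \quad \langle \check{F}, \varphi \rangle = \langle F, \check{\varphi} \rangle, \quad \varphi \in S.$$

Many of the standard results of Fourier analysis carry over to tempered distributions. For example, the formulas $(\widehat{\varphi})^{\vee} = (\check{\varphi})^{\wedge} = \varphi$ and $(\widehat{\varphi})^{\wedge} = \widetilde{\varphi}$ imply, by duality, their distribution counterparts

$$(\widehat{F})^{\vee} = (\check{F})^{\wedge} = F \quad \text{and} \quad (\widehat{F})^{\wedge} = \widetilde{F},$$

as may be readily verified. As in the classical case, one sets $\mathcal{F}(F) := \widehat{F}$, so that $\mathcal{F}^{-1}(F) = \check{F}$.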
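The duality relation $\langle \widehat{f}, \varphi \rangle = \langle f, \widehat{\varphi} \rangle$ underlying these definitions can be illustrated numerically for Gaussians in one dimension. The sketch below assumes $f(x) = e^{-\pi x^2}$ and $\varphi(x) = e^{-\pi(x - 1/2)^2}$, for which completing the square gives the common value $e^{-\pi/8}/\sqrt{2}$ (using the standard fact that $e^{-\pi x^2}$ is its own Fourier transform in this normalization). The names `f`, `phi`, `fourier`, `H`, and `GRID` are hypothetical, and the transform is approximated by a plain Riemann sum on a truncated lattice.

```python
import cmath
import math

H = 0.05
GRID = [k * H for k in range(-80, 81)]   # lattice covering [-4, 4]


def f(x):
    return math.exp(-math.pi * x * x)


def phi(x):
    return math.exp(-math.pi * (x - 0.5) ** 2)


def fourier(func, xi):
    """Riemann-sum approximation of the integral of exp(-2 pi i xi x) func(x) dx."""
    return H * sum(cmath.exp(-2j * math.pi * xi * x) * func(x) for x in GRID)


lhs = H * sum(fourier(f, x) * phi(x) for x in GRID)       # <f^, phi>
rhs = H * sum(f(xi) * fourier(phi, xi) for xi in GRID)    # <f, phi^>

# Closed form obtained by completing the square in the product of Gaussians.
exact = math.exp(-math.pi / 8.0) / math.sqrt(2.0)
```

Both `lhs` and `rhs` are complex numbers whose imaginary parts are negligible; each agrees with `exact` to high accuracy because Riemann sums of Gaussians on a uniform lattice converge very rapidly.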

15.6 Sobolev Theory


Sobolev Spaces

For an open subset U of Rd define linear spaces



Lpk (U ) := f ∈ Lp (U ) : ∂ α f ∈ Lp (U ) ∀ |α| ≤ k , 1 ≤ p < ∞, k ∈ N.
The derivatives in the definition are assumed to be distributional derivatives. Thus ∂ α f ∈
Lp (U ) satisfies Z Z
φ ∂ α f = (−1)|α| f ∂ α φ, φ ∈ Cc∞ (U ).

The Sobolev inequalities, proved below, imply that one actually obtains ordinary derivatives
by taking f ∈ Lpm (U ) for sufficiently large m.
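Define a norm on $L^p_k(U)$ by
\[ \|f\|_{k,p} := \Big( \sum_{|\alpha| \le k} \int_U |\partial^\alpha f|^p \Big)^{1/p}. \]
Thus a sequence $(f_n)$ converges to $f$ in $L^p_k(U)$ iff $\partial^\alpha f_n \to \partial^\alpha f$ in $L^p(U)$ for all $\alpha$ with $|\alpha| \le k$. For $p = 2$, the norm is given by an inner product
\[ (f \mid g)_k := \sum_{|\alpha| \le k} \int_U (\partial^\alpha f)(\partial^\alpha g). \]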

Sobolev spaces, being defined in terms of Lp norms, tend to be somewhat easier to manage
than spaces of distributions. Moreover, they have an advantage over Lp spaces in that a
derivative of a member of Lpk (U ) is a member of Lpk−1 (U ). These features make Sobolev
spaces important tools in the study of weak solutions of PDEs.
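15.6.1 Theorem. $L^p_k(U)$ is a Banach space and $L^2_k(U)$ is a Hilbert space.
Proof. Let $(f_n)$ be a Cauchy sequence in $L^p_k(U)$. Then for each $\alpha$ with $|\alpha| \le k$, $(\partial^\alpha f_n)$ is a
Cauchy sequence in $L^p(U)$ and so converges to some $f_\alpha \in L^p(U)$. Let $f := f_0$ denote the limit of $(f_n)$ itself. For any $\varphi \in C_c^\infty(U)$ we
then have
\[ \langle f_\alpha, \varphi\rangle = \int f_\alpha \varphi = \lim_n \int (\partial^\alpha f_n)\varphi = \lim_n (-1)^{|\alpha|} \int f_n (\partial^\alpha \varphi) = (-1)^{|\alpha|} \int f (\partial^\alpha \varphi) = \langle \partial^\alpha f, \varphi\rangle, \]
hence $f_\alpha = \partial^\alpha f$. Thus $f_n \to f$ in $L^p_k(U)$, showing that $L^p_k(U)$ is complete and hence a
Banach space. For $p = 2$ the norm is induced by the inner product $(\cdot \mid \cdot)_k$, so $L^2_k(U)$ is a Hilbert space.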

Application: Elliptic PDEs


The space L2k (U ) is of particular interest in Sobolev theory because of the availability of
general Hilbert space techniques. We illustrate with the following application.
Define Hk2 (U ) to be the closure of Cc∞ (U ) in L2k (U ). In general, Hk2 (U ) is a proper subset
of L2k (U ). Of course, as a closed subspace of a Hilbert space, Hk2 (U ) is itself a Hilbert space.
Assume that $U$ is bounded and let $g_{ij} \in C^\infty(U_0)$, where $U_0$ is an open set containing the
compact set cl $U$. Consider the differential operator $P$ on $U$ defined by
\[ Pf = -\sum_{i,j=1}^{d} \partial_i (g_{ij}\, \partial_j f), \tag{15.5} \]
where $\partial_i = \partial/\partial x_i$. We assume all functions are real-valued. Further, we assume that the
matrix $[g_{ij}]$ is strictly positive definite, that is,
\[ \sum_{i,j=1}^{d} y_i y_j\, g_{ij}(x) > 0 \quad \text{for all } y = (y_1, \dots, y_d) \in \mathbb{R}^d \setminus \{0\} \text{ and } x \in U_0. \tag{15.6} \]
Under these circumstances, the operator $P$ in (15.5) is said to be strongly elliptic. We
show the existence of a unique weak solution $f$ of the PDE
\[ Pf + af = h, \quad \text{where } a \ge 0 \text{ and } h \in H^2_1(U). \tag{15.7} \]
Since the sum in (15.6) is continuous in $(x, y)$, it has a minimum $m > 0$ on the compact
set cl $U \times S^{d-1}$, where $S^{d-1}$ is the unit sphere of $\mathbb{R}^d$. Therefore
\[ \sum_{i,j=1}^{d} y_i y_j\, g_{ij}(x) \ge m|y|^2 \quad \text{for all } y \in \mathbb{R}^d \text{ and } x \in \text{cl } U. \tag{15.8} \]

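For $\varphi, \psi \in C_c^\infty(U)$, define a bilinear form $B$ on $C_c^\infty(U)$ by
\[ B(\varphi, \psi) := \langle (P + aI)\varphi, \psi\rangle = a\int \varphi\psi - \sum_{i,j=1}^{d} \int \psi\, \partial_i(g_{ij}\, \partial_j \varphi) = a\int \varphi\psi + \sum_{i,j=1}^{d} \int g_{ij} (\partial_j \varphi)(\partial_i \psi), \]
where the last equality comes from an integration by parts. Since the functions $g_{ij}$ are
bounded, it follows from the definition of inner product in $H^2_1(U)$ and the CBS inequality
that for some constant $c > 0$
\[ |B(\varphi, \psi)| \le c\, \|\varphi\|_{1,2}\, \|\psi\|_{1,2}. \]
Therefore, $B$ extends continuously to a sesquilinear form on $H^2_1(U)$. Furthermore, by (15.8),
\[ B(\varphi, \varphi) = a\int \varphi^2 + \sum_{i,j=1}^{d} \int g_{ij} (\partial_i \varphi)(\partial_j \varphi) \ge m \sum_{j=0}^{d} \int (\partial_j \varphi)^2 = m\, \|\varphi\|_{1,2}^2, \tag{15.9} \]
where $\partial_0 \varphi := \varphi$. The $j = 0$ term is covered when $a \ge m$; for $a > 0$ this may be arranged by replacing $m$ with $\min\{m, a\}$.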

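Now, since
\[ \Big| \int hg \Big| \le \Big( \int |h|^2 \Big)^{1/2} \Big( \int |g|^2 \Big)^{1/2} \le \Big( \int |h|^2 \Big)^{1/2} \|g\|_{1,2}, \]
the functional $g \mapsto \int hg$ is continuous on $H^2_1(U)$. By the Lax-Milgram theorem (11.4.2),
there exists a unique $f \in H^2_1(U)$ such that $B(f, g) = \int hg$ for all $g \in H^2_1(U)$. In particular, for all $\psi \in C_c^\infty(U)$,
\[ \langle Pf + af, \psi\rangle = B(f, \psi) = \int h\psi = \langle h, \psi\rangle, \]
which shows that $f$ is the desired weak solution of (15.7). ♦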
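To make the weak formulation concrete, here is a minimal sketch for the simplest instance of (15.7): $d = 1$, $U = (0, 1)$, $g_{11} \equiv 1$, so that the equation reads $-u'' + au = h$ with zero boundary values. The discretization (central finite differences solved by the Thomas algorithm), the choice $a = 1$, and the test right-hand side are assumptions made here for illustration; the text itself only proves existence and uniqueness.

```python
import math

def solve_dirichlet(a, h, n):
    """Approximate the solution of -u'' + a*u = h on (0, 1), u(0) = u(1) = 0.

    Uses n interior grid points and central differences; the resulting
    tridiagonal system is solved by the Thomas algorithm.
    """
    dx = 1.0 / (n + 1)
    diag = [2.0 / dx**2 + a] * n      # main diagonal
    off = -1.0 / dx**2                # sub- and super-diagonal entries
    rhs = [h((i + 1) * dx) for i in range(n)]
    # Forward elimination.
    for i in range(1, n):
        factor = off / diag[i - 1]
        diag[i] -= factor * off
        rhs[i] -= factor * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return dx, u

A_COEFF = 1.0  # the constant a in P f + a f = h (illustrative value)

def right_side(x):
    """Chosen so that the exact solution is sin(pi x)."""
    return (math.pi**2 + A_COEFF) * math.sin(math.pi * x)

dx, u = solve_dirichlet(A_COEFF, right_side, 199)
max_error = max(abs(u[i] - math.sin(math.pi * (i + 1) * dx)) for i in range(len(u)))
print(max_error)
```

The discrete system is symmetric and positive definite, which is the finite-dimensional counterpart of the coercivity estimate (15.9).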



Sobolev Inequalities
These inequalities are important tools in determining existence and uniqueness of solutions
of a variety of PDEs, as well as in the study of regularity properties of these solutions. In
this subsection we give the reader a flavor of the subject by proving two such inequalities.
15.6.2 Theorem. If f ∈ L1d (Rd ), then kf k∞ ≤ c kf k1,d and there exists g ∈ Cb (Rd ) such
that f = g a.e. Moreover, if f ∈ L1d+k (Rd ) (k ≥ 1), then one may take g ∈ Cbk (Rd ).

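Proof. Consider first the case $f \in C^\infty(\mathbb{R}^d)$ and $d = 2$. For any $\psi \in C_c^\infty(\mathbb{R}^2)$,
\[ \psi(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} \frac{\partial^2 \psi}{\partial x\, \partial y}(s, t)\, ds\, dt, \]
hence
\[ |\psi(x, y)| \le \int_{-\infty}^{y} \int_{-\infty}^{x} \Big| \frac{\partial^2 \psi}{\partial x\, \partial y}(s, t) \Big|\, ds\, dt \le \Big\| \frac{\partial^2 \psi}{\partial x\, \partial y} \Big\|_1. \]
Replacing $\psi$ by $f\psi$, we have
\[ |f\psi(x, y)| \le \Big\| \frac{\partial^2 (f\psi)}{\partial x\, \partial y} \Big\|_1, \quad x, y \in \mathbb{R}. \]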

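Since
\[ \frac{\partial^2 (f\psi)}{\partial x\, \partial y} = \psi\, \frac{\partial^2 f}{\partial x\, \partial y} + \frac{\partial f}{\partial x} \frac{\partial \psi}{\partial y} + \frac{\partial f}{\partial y} \frac{\partial \psi}{\partial x} + f\, \frac{\partial^2 \psi}{\partial x\, \partial y}, \]
we see that
\[ |f\psi(x, y)| \le \|\psi\|_\infty \Big\| \frac{\partial^2 f}{\partial x\, \partial y} \Big\|_1 + \Big\| \frac{\partial \psi}{\partial y} \Big\|_\infty \Big\| \frac{\partial f}{\partial x} \Big\|_1 + \Big\| \frac{\partial \psi}{\partial x} \Big\|_\infty \Big\| \frac{\partial f}{\partial y} \Big\|_1 + \|f\|_1 \Big\| \frac{\partial^2 \psi}{\partial x\, \partial y} \Big\|_\infty. \]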

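Now choose $\psi \in C_c^\infty(\mathbb{R}^2)$ with $0 \le \psi \le 1$ such that $\psi = 1$ on $[-1, 1] \times [-1, 1]$ and $\psi = 0$ outside $[-2, 2] \times [-2, 2]$.
Set $\psi_n(x, y) = \psi(x/n, y/n)$. Since the partial derivatives of $\psi$ are bounded, there exists a constant
$c$ depending only on $\psi$ such that for all $(x, y)$ and $n$,
\[ |f\psi_n(x, y)| \le c \Big( \Big\| \frac{\partial^2 f}{\partial x\, \partial y} \Big\|_1 + \Big\| \frac{\partial f}{\partial x} \Big\|_1 + \Big\| \frac{\partial f}{\partial y} \Big\|_1 + \|f\|_1 \Big). \]
Since $\psi_n(x, y) \to 1$, we obtain the Sobolev inequality for $d = 2$:
\[ \|f\|_\infty \le c \Big( \Big\| \frac{\partial^2 f}{\partial x\, \partial y} \Big\|_1 + \Big\| \frac{\partial f}{\partial x} \Big\|_1 + \Big\| \frac{\partial f}{\partial y} \Big\|_1 + \|f\|_1 \Big). \]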

For an arbitrary $d$, start off the argument with
\[ \psi(x_1, \dots, x_d) = \int_{-\infty}^{x_d} \cdots \int_{-\infty}^{x_1} \frac{\partial^d}{\partial x_1 \cdots \partial x_d} \psi(t_1, \dots, t_d)\, dt_1 \cdots dt_d \]
and proceed as above.
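The starting point of the argument, $|\psi(x)| \le \|\psi'\|_1$ in one variable, can be checked numerically. The sketch below does this for the rapidly decaying function $e^{-x^2}$ (used here in place of a compactly supported $\psi$, an assumption made for convenience), with a midpoint rule on a truncated interval; the names and grid parameters are illustrative only.

```python
import math

def f(x):
    """A smooth, rapidly decaying test function."""
    return math.exp(-x * x)

def df(x):
    """Its derivative, computed by hand."""
    return -2.0 * x * math.exp(-x * x)

HALF_WIDTH = 8.0   # f and f' are negligible outside [-8, 8]
CELLS = 16000      # even, so that 0 is a cell boundary
dx = 2.0 * HALF_WIDTH / CELLS
midpoints = [-HALF_WIDTH + (k + 0.5) * dx for k in range(CELLS)]

sup_norm = max(abs(f(x)) for x in midpoints)
l1_norm_derivative = sum(abs(df(x)) for x in midpoints) * dx

# One-dimensional analogue of the inequality: sup|f| <= integral of |f'|.
print(sup_norm, l1_norm_derivative)
```

For this particular $f$ the integral of $|f'|$ equals 2 while the supremum of $|f|$ is 1, so the inequality holds with room to spare.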


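Now consider the general case $f \in L^1_d(\mathbb{R}^d)$. Choose an approximate identity $(\varphi_n)$ in
$C_c^\infty(\mathbb{R}^d)$ for $L^1(\mathbb{R}^d)$. Since $f * \varphi_n$ is $C^\infty$, by the preceding paragraph
\[ \|f * \varphi_n\|_\infty \le c\, \|f * \varphi_n\|_{1,d} \quad \text{for all } n. \tag{†} \]
The norm on the right is a sum of terms that are $L^1$ norms of the derivatives
\[ \partial^\alpha (f * \varphi_n)(x) = \int f(y)\, \partial_x^\alpha \varphi_n(x - y)\, dy = (-1)^{|\alpha|} \int f(y)\, \partial_y^\alpha \varphi_n(x - y)\, dy = \int (\partial^\alpha f)(y)\, \varphi_n(x - y)\, dy = (\partial^\alpha f * \varphi_n)(x), \]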

where $\partial^\alpha f$ is the distributional derivative. Taking absolute values and integrating with
respect to $x$, recalling that $\|\varphi_n\|_1 = 1$, we see that $\|\partial^\alpha (f * \varphi_n)\|_1 \le \|\partial^\alpha f\|_1$. Taking the
sum over all $|\alpha| \le d$ and using (†) we obtain
\[ \|f * \varphi_n\|_\infty \le c\, \|f\|_{1,d} \quad \text{for all } n. \]
Since $f * \varphi_n \to f$ in $L^1$, there exists a subsequence such that $f * \varphi_{n_k} \to f$ a.e. Thus $\|f\|_\infty \le c\, \|f\|_{1,d}$.
Since $f - f * \varphi_n \in L^1_d$, we may replace $f$ in the last inequality by $f - f * \varphi_n$ to conclude that
\[ \|f - f * \varphi_n\|_\infty \le c\, \|f - f * \varphi_n\|_{1,d}. \]
The norm on the right is a sum of terms $\|\partial^\alpha f - \partial^\alpha (f * \varphi_n)\|_1 = \|\partial^\alpha f - (\partial^\alpha f) * \varphi_n\|_1$, which
tend to zero by the approximate identity property. Therefore, $\|f - f * \varphi_n\|_\infty \to 0$, hence $f$
has a version that is continuous. This proves the theorem for the case $k = 0$. For $k \ge 1$ one
need only replace $f$ in this argument by $\partial^\alpha f$, $|\alpha| \le k$.
Theorem 15.6.2 has general versions where $L^1_d$ is replaced by $L^p_m$, $1 \le p < \infty$, $m \ge d$. We
illustrate with the case $p = 2$, which has an elegant proof using Fourier transforms.
15.6.3 Theorem. Let $m > d/2$. If $f \in L^2_m(\mathbb{R}^d)$, then $\|f\|_\infty \le c\, \|f\|_{m,2}$ and there exists
$g \in C_0(\mathbb{R}^d)$ such that $f = g$ a.e. Moreover, if $f \in L^2_{m+k}(\mathbb{R}^d)$ ($k \ge 1$), then $g$ may be taken
to be a $C^k$ function.
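Proof. In the following we employ the convention of describing a function by displaying the
independent variable. We show first that $\hat f \in L^1(\mathbb{R}^d)$. From the formula
\[ \partial^\alpha \mathcal{F}(\varphi) = \mathcal{F}\big( (-2\pi i x)^\alpha \varphi(x) \big), \quad \varphi \in \mathcal{S}, \]
and the definitions of the distributional derivative and of the Fourier transform of a distribution we have for $|\alpha| \le m$
\[ \langle \mathcal{F}(\partial^\alpha f), \varphi\rangle = \langle \partial^\alpha f, \mathcal{F}(\varphi)\rangle = (-1)^{|\alpha|} \langle f, \partial^\alpha \mathcal{F}(\varphi)\rangle = (-1)^{|\alpha|} \langle f, \mathcal{F}((-2\pi i x)^\alpha \varphi(x))\rangle \]
\[ = \langle \mathcal{F}(f), (2\pi i x)^\alpha \varphi(x)\rangle = \langle (2\pi i \xi)^\alpha \mathcal{F}(f), \varphi\rangle. \]
Since $\mathcal{S}(\mathbb{R}^d)$ is dense in $L^2(\mathbb{R}^d)$, $\mathcal{F}(\partial^\alpha f) = (2\pi i \xi)^\alpha \mathcal{F}(f)$. Taking $L^2$ norms of the last
equation and using the Plancherel theorem $\|\mathcal{F}(\partial^\alpha f)\|_2 = \|\partial^\alpha f\|_2$, we have for a suitable
constant $M_1$
\[ \int |\xi^\alpha|^2\, |\hat f(\xi)|^2\, d\xi \le M_1\, \|\partial^\alpha f\|_2^2, \quad |\alpha| \le m. \]
Summing over $\alpha$ we obtain
\[ \int h(\xi)\, |\hat f(\xi)|^2\, d\xi \le M_1 \sum_{|\alpha| \le m} \|\partial^\alpha f\|_2^2, \quad \text{where } h(\xi) := \sum_{|\alpha| \le m} |\xi^\alpha|^2. \]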

We claim that $\|\hat f\|_1 < \infty$. By the CBS inequality,
\[ \|\hat f\|_1 = \int h(\xi)^{1/2}\, |\hat f(\xi)|\, h(\xi)^{-1/2}\, d\xi \le \Big( \int h(\xi)\, |\hat f(\xi)|^2\, d\xi \Big)^{1/2} \Big( \int h(\xi)^{-1}\, d\xi \Big)^{1/2}, \]
hence it suffices to show that the second factor on the right is finite. Now, by taking $\alpha$'s of
the form $(0, \dots, 0)$ and $(0, \dots, 0, m, 0, \dots, 0)$, we have $h(\xi) \ge 1 + \sum_{j=1}^{d} |\xi_j|^{2m}$. The inequality
\[ |\xi|^{2m} = \Big( \sum_{j=1}^{d} |\xi_j|^2 \Big)^{m} \le \Big( d \max_{1 \le j \le d} |\xi_j|^2 \Big)^{m} = d^m \max_{1 \le j \le d} |\xi_j|^{2m} \le d^m \sum_{j=1}^{d} |\xi_j|^{2m} \]
then shows that $h(\xi) \ge 1 + c|\xi|^{2m}$. By 3.6.3 (with $s = 2m$, $t = 1$, and $p = 2$) we see that
$\int h^{-1} < \infty$ and so for a suitable $M_2 > 0$
\[ \|\hat f\|_1 \le M_2 \sum_{|\alpha| \le m} \|\partial^\alpha f\|_2 < \infty. \]

It now follows from the Fourier inversion formula and the Riemann-Lebesgue lemma that
$f \in C_0(\mathbb{R}^d)$. Finally, since
\[ f(x) = \int \hat f(\xi)\, e^{2\pi i\, x\cdot\xi}\, d\xi, \]
we have $\|f\|_\infty \le \|\hat f\|_1 \le M_2 \sum_{|\alpha| \le m} \|\partial^\alpha f\|_2$. This completes the proof of the first part of the
theorem. The second part may be proved in a similar manner by replacing $f$ throughout by
$\partial^\beta f$, $|\beta| \le k$.
Chapter 16
Analysis on Locally Compact Groups

Lebesgue measure on R and counting measure on Z are examples of measures µ that are
translation invariant, that is, µ(B + x) = µ(B) for all Borel sets B. These are special cases
of a general construct called Haar measure. As we shall see, the existence of Haar measure
leads to a unification and generalization of Fourier analysis, the basic aspects of which are
presented in this chapter.

16.1 Topological Groups


Definitions and Basic Properties
A topological group is a group G with a topology relative to which the group operations

(s, t) → st : G × G → G and t → t−1 : G → G

are continuous. For example, a TVS, and in particular Kd , is an abelian topological group
under addition. The set of nonzero members of K is an abelian topological group under
multiplication. The set of n × n matrices over K with determinant one is a nonabelian
topological group under matrix multiplication.
Here are useful alternate characterizations of a topological group that will be needed in
the chapter.
16.1.1 Proposition. Let G be a group with a topology. The following are equivalent:

(a) G is a topological group.

(b) The map (s, t) 7→ s−1 t : G × G → G is continuous.

(c) The map (s, t) 7→ s−1 t : G × G → G is continuous at (e, e), and for each a ∈ G the
translation mappings x → ax and x → xa are continuous.

Proof. (a) ⇒ (b): The map is a composition of the continuous mapping (s, t) 7→ (s−1 , t) and
the multiplication map, hence is continuous.
(b) ⇒ (c): The first statement is clear. If xα → x, then (a−1 , xα ) → (a−1 , x), hence
axα → ax. Therefore, x 7→ ax is continuous. Similarly, x 7→ xa is continuous.
(c) ⇒ (a): If $s_\alpha \to s$, then, by the second part of the hypothesis, $(s^{-1}s_\alpha, s^{-1}s) \to (e, e)$.
Applying the first part of the hypothesis, we have $s_\alpha^{-1}s = (s^{-1}s_\alpha)^{-1}(s^{-1}s) \to e$, hence
$s_\alpha^{-1} \to s^{-1}$, which shows that inversion is continuous. Since multiplication is the composition
of the continuous maps $(s, t) \mapsto (s^{-1}, t)$ and $(s^{-1}, t) \mapsto st$, multiplication is continuous at
$(e, e)$. Now let $s_\alpha \to s$ and $t_\beta \to t$. Then $s^{-1}s_\alpha \to e$ and $t_\beta t^{-1} \to e$, hence $s^{-1}s_\alpha t_\beta t^{-1} \to e$
and so $s_\alpha t_\beta \to st$. Therefore, multiplication is continuous.

The basic properties of topological groups are given in the next proposition.
16.1.2 Proposition. Let G be a topological group and H a subgroup of G.
(a) For fixed a ∈ G, the mappings t → at, t → ta, and t → t−1 are homeomorphisms.
(b) Each neighborhood U of e contains a symmetric neighborhood of e, that is, a neigh-
borhood V of e such that V = V −1 (= {x−1 : x ∈ V }).
(c) Each neighborhood U of e contains a neighborhood V of e such that V V ⊆ U .
(d) The closure of H is a subgroup.
(e) If H is open, then it is also closed.
(f ) If G is Hausdorff and H is locally compact, then H is closed.
Proof. Part (a) follows from 16.1.1. For part (b), take $V = U \cap U^{-1}$. Part (c) follows from
the continuity of the mapping $(s, t) \mapsto st$ at $(e, e)$ and part (d) from the continuity of the
group operations. For (e), let $x \in \mathrm{cl}(H)$. Since $xH$ is a neighborhood of $x$, $xH \cap H \ne \emptyset$.
Then $xy \in H$ for some $y \in H$ and so $x = (xy)y^{-1} \in H$.
To prove (f), let $x \in \mathrm{cl}_G(H)$, $(x_\alpha) \subseteq H$, and $x_\alpha \to x$. Since $H$ is locally compact,
there exists an open neighborhood $V$ of $e$ in $G$ such that $\mathrm{cl}_H(V \cap H)$ is compact in $H$.
Therefore, $\mathrm{cl}_H(V \cap H)$ is compact in $G$, hence also closed. From $x_\alpha^{-1} \to x^{-1}$ and $x_\alpha^{-1} \in H$
we have $x^{-1} \in \mathrm{cl}_G(H)$. Thus since $Vx^{-1}$ is a neighborhood of $x^{-1}$, $H \cap Vx^{-1} \ne \emptyset$. Choose
$y \in H \cap Vx^{-1}$. Then $yx \in V$ so $yx_\alpha$ is eventually in $V \cap H$. Thus $yx$ is in the closed set
$\mathrm{cl}_H(V \cap H) \subseteq H$ and so $x = y^{-1}yx \in H$.

Translation and Uniform Continuity


The left and right translates of a function $f$ on a topological group $G$ are defined by¹
\[ L_t f(s) = f(ts) \quad\text{and}\quad R_t f(s) = f(st), \quad s, t \in G. \tag{16.1} \]
The set of all left (respectively, right) translates of f is denoted by LG f (respectively, RG f ).
A subset F of functions on G is said to be left translation invariant if LG f ⊆ F for all
f ∈ F. Right translation invariance is defined analogously. A subset that is both left and
right translation invariant is said to be translation invariant. For example, by continuity
of multiplication in G, the spaces Cb (G), C0 (G), and Cc (G) are translation invariant. Note
that the translation operators Lt and Rt satisfy
Lst = Lt Ls and Rst = Rs Rt .
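The order reversal in $L_{st} = L_t L_s$ is easy to get wrong, so a brute-force check on a small nonabelian group may help. The sketch below uses the symmetric group on three letters, with permutations stored as tuples; the helper names and the particular test function are choices made here for illustration.

```python
from itertools import permutations

GROUP = list(permutations(range(3)))  # the six elements of S3

def mul(s, t):
    """Product st in S3, as the permutation i -> s[t[i]]."""
    return tuple(s[t[i]] for i in range(3))

def left_translate(t, f):
    """L_t f, defined by (L_t f)(s) = f(ts)."""
    return lambda s: f(mul(t, s))

def right_translate(t, f):
    """R_t f, defined by (R_t f)(s) = f(st)."""
    return lambda s: f(mul(s, t))

VALUES = {g: k for k, g in enumerate(GROUP)}

def f(g):
    """An injective test function on the group."""
    return VALUES[g]

def same(f1, f2):
    """Pointwise equality of two functions on the group."""
    return all(f1(x) == f2(x) for x in GROUP)

pairs = [(s, t) for s in GROUP for t in GROUP]
law_left = all(same(left_translate(mul(s, t), f),
                    left_translate(t, left_translate(s, f))) for s, t in pairs)
law_right = all(same(right_translate(mul(s, t), f),
                     right_translate(s, right_translate(t, f))) for s, t in pairs)
# With the opposite order the left-translation law fails for some pair,
# because S3 is nonabelian and f separates points.
wrong_order_fails = any(not same(left_translate(mul(s, t), f),
                                 left_translate(s, left_translate(t, f))) for s, t in pairs)
print(law_left, law_right, wrong_order_fails)
```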
A function f : G → C is said to be right uniformly continuous on G if the function
t 7→ Rt f is norm continuous at e. Thus for each ε > 0 there exists a neighborhood V of e
such that kRt f − f k∞ < ε for all t ∈ V or, equivalently,
|f (st) − f (s)| < ε for all t ∈ V and s ∈ G.
Setting x = st and y = s we may write this as
|f (x) − f (y)| < ε for all x, y ∈ G with x ∈ yV .
Left uniform continuity is defined by replacing Rt by Lt , and for this the following
equivalent formulation holds:
|f (x) − f (y)| < ε for all x, y ∈ G with x ∈ V y.
A function is uniformly continuous if it is both left and right uniformly continuous.
1 Some authors give the alternate definition $L_t f(s) = f(t^{-1}s)$, which results in the pleasing relation
$L_{st} = L_s L_t$. Our choice is dictated by the desire for a uniform notation for groups and semigroups. (See
Chapter 17.)

16.1.3 Proposition. Let $G$ be a topological group. If $f \in C_c(G)$, then $f$ is uniformly
continuous.
Proof. It suffices to prove right uniform continuity. Let $K := \mathrm{supp}(f)$ and $\varepsilon > 0$. By continuity,
for each $s \in K$ there exists a neighborhood $U_s$ of $e$ such that $|f(st) - f(s)| < \varepsilon/2$ for all
$t \in U_s$. By 16.1.2, there exists a symmetric neighborhood $V_s$ of $e$ such that $V_s V_s \subseteq U_s$. Since
$\{sV_s : s \in K\}$ is an open cover of $K$, by compactness there exist $s_1, \dots, s_n \in K$ such that
$K \subseteq \bigcup_{j=1}^{n} s_j V_{s_j}$. Set $V := \bigcap_{j=1}^{n} V_{s_j}$. We claim that if $t \in V$, then $|f(st) - f(s)| < \varepsilon$ for all
$s \in G$. This obviously holds if both $s, st \notin K$. Now consider the remaining cases:
(a) $s \in K$. Then $s \in s_j V_{s_j}$ for some $j$, hence $x := s_j^{-1}s \in V_{s_j} \subseteq U_{s_j}$ and $y := (s_j^{-1}s)t \in V_{s_j}V_{s_j} \subseteq U_{s_j}$, so by definition of $U_{s_j}$
\[ |f(st) - f(s)| \le |f(st) - f(s_j)| + |f(s_j) - f(s)| = |f(s_j y) - f(s_j)| + |f(s_j) - f(s_j x)| < \varepsilon. \]
(b) $s \notin K$ and $st \in K$. Then $st = s_j t_j$ for some $t_j \in V_{s_j} \subseteq U_{s_j}$, hence
\[ |f(st) - f(s_j)| = |f(s_j t_j) - f(s_j)| < \varepsilon/2. \]
Moreover, since $V_{s_j}$ is symmetric, $x := s_j^{-1}s = (s_j^{-1}st)t^{-1} = t_j t^{-1} \in V_{s_j}V_{s_j} \subseteq U_{s_j}$, hence
\[ |f(s_j)| = |f(s) - f(s_j)| = |f(s_j x) - f(s_j)| < \varepsilon/2. \]
Therefore, $|f(st) - f(s)| \le |f(st) - f(s_j)| + |f(s_j)| < \varepsilon$, as required.

For a function $f$ on $G$, define $\tilde f$ by
\[ \tilde f(x) = f(x^{-1}), \quad x \in G. \]
Then $f$ is right uniformly continuous iff $\tilde f$ is left uniformly continuous. A function $f$ is
symmetric if $\tilde f = f$. For example, for any $g$ the functions $g + \tilde g$ and $g \cdot \tilde g$ are symmetric.

16.2 Haar Measure


Definition and Basic Properties
For the remainder of the chapter we restrict our attention to locally compact, Hausdorff
topological groups. Such a group is traditionally referred to as simply a locally compact
group. The fundamental property of a locally compact group that allows such a rich supply
of analytical techniques is the existence of translation invariant measures.
A Borel measure µ on a locally compact group G is said to be left invariant if

µ(sB) = µ(B) for all s ∈ G and all Borel sets B. (16.2)

Right invariance is defined by replacing sB by Bs. A nontrivial (that is, not identically
zero) left (right) invariant Radon measure on G is called a left (right) Haar measure.
A measure that is both a left Haar measure and a right Haar measure is called a Haar
measure. Lebesgue measure on $\mathbb{R}^d$ and counting measure on $\mathbb{Z}^d$ are Haar measures. One
may show directly that the set function $B \mapsto \int_B |x|^{-1}\, dx$ defines a Haar measure on the
multiplicative group of nonzero real numbers. We shall see other examples later.
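The invariance of $|x|^{-1}\,dx$ under multiplication can be observed numerically on intervals. The sketch below approximates the measure of $[1, 3]$ and of two multiplicative translates by a midpoint rule; the function name, the sample interval, and the scaling factors are illustrative assumptions.

```python
import math

def haar_mass(a, b, cells=200_000):
    """Midpoint-rule approximation of the integral of 1/|x| over the interval
    with endpoints a and b, which must not contain 0."""
    lo, hi = min(a, b), max(a, b)
    if lo <= 0.0 <= hi:
        raise ValueError("the interval must not contain 0")
    dx = (hi - lo) / cells
    return sum(1.0 / abs(lo + (k + 0.5) * dx) for k in range(cells)) * dx

base = haar_mass(1.0, 3.0)                     # close to log 3
scaled = haar_mass(2.5 * 1.0, 2.5 * 3.0)       # the translate 2.5 * [1, 3]
reflected = haar_mass(-0.4 * 1.0, -0.4 * 3.0)  # the translate -0.4 * [1, 3]
print(base, scaled, reflected, math.log(3.0))
```

All three values agree up to discretization error, whereas the Lebesgue measures of the three intervals (2, 5, and 0.8) differ.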
Now define a Borel measure $\tilde\mu$ by
\[ \tilde\mu(B) = \mu(B^{-1}), \quad B \in \mathcal{B}(G). \tag{16.3} \]
Then $\tilde\mu(Bs) = \mu\big((Bs)^{-1}\big) = \mu\big(s^{-1}B^{-1}\big)$, hence $\mu$ is left invariant iff $\tilde\mu$ is right invariant.
Moreover, since inversion is a homeomorphism, $\tilde\mu$ is regular iff $\mu$ is regular. Thus $\mu$ is a left
Haar measure iff $\tilde\mu$ is a right Haar measure. In view of this duality, we shall frequently state
results only for left Haar measures.
The next proposition summarizes the elementary properties of left Haar measure. For the
statement we need the following notation, which will be used throughout the chapter.

Cc+ = Cc+ (G) := {f ∈ Cc (G) : f ≥ 0 and kf k∞ > 0}.

16.2.1 Proposition. Let $\mu$ be a Radon measure on $G$. The following statements are
equivalent:
(a) $\mu$ is a left Haar measure.

(b) $\mu(sK) = \mu(K)$ for all compact $K \subseteq G$ and $s \in G$.

(c) $\mu(sU) = \mu(U)$ for all open $U \subseteq G$ and $s \in G$.

(d) $\int L_s f\, d\mu = \int f\, d\mu$ for all $f \in L^1(\mu)$ and $s \in G$.

(e) $\int L_s f\, d\mu = \int f\, d\mu$ for all $f \in C_c^+$ and $s \in G$.

Proof. That (a), (b), and (c) are equivalent follows easily from the regularity properties of
Radon measures (7.1). Clearly, (d) implies (b) and (e).
Now suppose that (a) holds. Then
\[ \int L_s 1_B\, d\mu = \int 1_B(st)\, d\mu(t) = \mu(s^{-1}B) = \mu(B), \]
hence (d) holds for indicator functions. The usual arguments then show that (d) holds for
all $f \in L^1$. That (e) ⇒ (d) follows by approximation (7.1.2).
16.2.2 Proposition. Let $\mu$ be a left Haar measure on $G$. Then
(a) $\mu(U) > 0$ for all nonempty open $U \subseteq G$.

(b) $\mu(G) < \infty$ iff $G$ is compact.

(c) If $E$ is a Borel set and $\mu(E) = 0$, then $E^c$ is dense in $G$.

Proof. (a) Suppose $\mu(U) = 0$ for some nonempty open set $U$. Since any compact set $K$ may
be covered by finitely many translates $sU$ and since $\mu(sU) = \mu(U) = 0$, $\mu(K) = 0$. By
regularity, $\mu(B) = 0$ for all Borel sets $B$, contradicting the definition of Haar measure.
(b) The sufficiency follows from the definition of Radon measure. For the necessity, assume
that $G$ is not compact. Choose any open neighborhood $U$ of $e$ with compact closure. Then
$G$ cannot be covered by finitely many left translates of $U$. Letting $s_1$ be arbitrary, we
may construct a sequence $(s_n)$ such that $s_{n+1} \notin \bigcup_{k=1}^{n} s_k U$. Now let $V$ be a symmetric
open neighborhood of $e$ with $VV \subseteq U$. The sets $s_n V$ are disjoint. Indeed, if $m > n$ and
$(s_n V) \cap (s_m V) \ne \emptyset$, then $s_n v_n = s_m v_m$ for some $v_n, v_m \in V$ and we have the contradiction
$s_m = s_n v_n v_m^{-1} \in s_n U$. Now set $B = \bigcup_n s_n V$. By left invariance, $\mu(B) = \sum_n \mu(V)$. But since
$\mu(B) < \infty$, $\mu(V) = 0$, contradicting (a).
Part (c) follows from (a).

Existence of Haar Measure


16.2.3 Theorem. Every locally compact group G has a left Haar measure and a right Haar
measure.
Proof. (Weil). It suffices to construct a left Haar measure on G. This is accomplished by
first constructing a left invariant positive linear functional I on Cc (G). An application of
the Riesz representation theorem then yields the desired measure.
To construct $I$, let $f, \varphi \in C_c^+$ and define the Haar covering number of $f$ with respect to $\varphi$
by
\[ (f : \varphi) = \inf \Big\{ \sum_{i=1}^{n} c_i : f \le \sum_{i=1}^{n} c_i L_{s_i}\varphi,\ n \in \mathbb{N},\ s_j \in G, \text{ and } c_j \ge 0 \Big\}. \]
To see that such constants $c_i$ exist and hence that $(f : \varphi) < \infty$, consider the nonempty open
set $U := \{t : \varphi(t) > \tfrac12 \|\varphi\|_\infty\}$ and let $s_j U$ be finitely many left translates of $U$ that cover
the compact set $\mathrm{supp}(f)$. If $x \in \mathrm{supp}\, f$, then $2\varphi(s_j^{-1}x) > \|\varphi\|_\infty$ for some $j$, and setting
$c := 2\|f\|_\infty \|\varphi\|_\infty^{-1}$ we have $f(x) \le \|f\|_\infty \le \sum_i c\,\varphi(s_i^{-1}x)$.
The basic idea in the construction of $I$ is to show that a suitably normalized version of
$(f : \varphi)$ tends to $I(f)$ as $\mathrm{supp}(\varphi)$ tends to $\{e\}$. This is accomplished in the following steps, the
first of which suggests that $(f : \varphi)$ is an appropriate precursor of $I$. (All functions considered
in these steps are assumed to be in $C_c^+$.)
(1) $(f : \varphi)$ has the following properties:

(a) $(f_1 + f_2 : \varphi) \le (f_1 : \varphi) + (f_2 : \varphi)$.   (b) $(cf : \varphi) = c(f : \varphi)$ ∀ $c > 0$.
(c) $f \le g \Rightarrow (f : \varphi) \le (g : \varphi)$.   (d) $(L_s f : \varphi) = (f : \varphi)$ ∀ $s \in G$.
(e) $(f : \varphi) \le (f : g)(g : \varphi)$.   (f) $(f : g) \ge \|f\|_\infty \|g\|_\infty^{-1}$.

⟦Parts (a)-(d) follow directly from the definition of $(f : \varphi)$. For (e), let
\[ f \le \sum_{i=1}^{m} a_i L_{s_i} g \quad\text{and}\quad g \le \sum_{j=1}^{n} b_j L_{t_j}\varphi. \]
Then $L_{s_i} g \le \sum_{j=1}^{n} b_j L_{t_j s_i}\varphi$, hence $f \le \sum_{i,j} a_i b_j L_{t_j s_i}\varphi$ and so
\[ (f : \varphi) \le \sum_{i,j} a_i b_j = \Big( \sum_{i=1}^{m} a_i \Big) \Big( \sum_{j=1}^{n} b_j \Big). \]
Taking infima over all sums $\sum_{i=1}^{m} a_i$ and $\sum_{j=1}^{n} b_j$ gives (e).
Now let $f \le \sum_{i=1}^{m} c_i L_{s_i} g$. Then $f(x) \le \|g\|_\infty \sum_{i=1}^{m} c_i$ for all $x$ and so we have
$\|f\|_\infty \le \|g\|_\infty \sum_{i=1}^{m} c_i$. Taking infima over the sums $\sum_{i=1}^{m} c_i$ yields (f).⟧
(2) Let $f_0$ be an arbitrary member of $C_c^+$ and define $I_\varphi(f) := \dfrac{(f : \varphi)}{(f_0 : \varphi)}$. Then $I_\varphi$ has the
following properties:
(a) $I_\varphi(f_1 + f_2) \le I_\varphi(f_1) + I_\varphi(f_2)$.   (b) $I_\varphi(cf) = cI_\varphi(f)$ ∀ $c > 0$.
(c) $f \le g \Rightarrow I_\varphi(f) \le I_\varphi(g)$.   (d) $I_\varphi(L_s f) = I_\varphi(f)$ ∀ $s \in G$.
(e) $(f_0 : f)^{-1} \le I_\varphi(f) \le (f : f_0)$.

⟦By (f) of (1), $(f_0 : \varphi) > 0$, hence $I_\varphi$ is well-defined. Properties (a)-(e) then follow
immediately from the corresponding parts (a)-(e) of (1).⟧

(3) For each $\varepsilon > 0$, there exists a neighborhood $V$ of $e$ such that
\[ I_\varphi(f_1) + I_\varphi(f_2) \le I_\varphi(f_1 + f_2) + \varepsilon \quad \text{for all } \varphi \text{ with } \mathrm{supp}(\varphi) \subseteq V. \]
⟦Let $g \in C_c^+$ such that $g = 1$ on $\mathrm{supp}(f_1 + f_2)$, and let $\delta > 0$ be arbitrary. Set
$h := f_1 + f_2 + \delta g$ and $h_k := f_k/h$. Note that if $h(x) = 0$, then $f_k(x) = 0$, in which
case the value of $h_k(x)$ is taken to be zero. With this definition, one easily checks that
$h_k \in C_c^+$. By 16.1.3, there exists a neighborhood $V$ of $e$ such that $|h_k(x) - h_k(y)| < \delta$
whenever $y^{-1}x \in V$. If $K := \mathrm{supp}(\varphi) \subseteq V$ and $h \le \sum_i c_i L_{s_i}\varphi$, then for $k = 1, 2$ we have
\[ f_k(x) = h(x)h_k(x) \le \sum_i c_i \varphi(s_i x) h_k(x). \]
Since the only contribution to the sum on the right comes from terms for which $s_i x \in K$,
and since for these $|h_k(x) - h_k(s_i^{-1})| < \delta$, we see that
\[ f_k(x) \le \sum_i c_i \varphi(s_i x) \big[ h_k(s_i^{-1}) + \delta \big]. \]
Therefore, $(f_k : \varphi) \le \sum_i c_i [h_k(s_i^{-1}) + \delta]$, hence
\[ (f_1 : \varphi) + (f_2 : \varphi) \le \sum_i c_i \big[ h_1(s_i^{-1}) + h_2(s_i^{-1}) + 2\delta \big] \le (1 + 2\delta) \sum_i c_i, \]
the last inequality because $h_1 + h_2 \le 1$. Taking the infimum over all such sums $\sum_i c_i$
and dividing by $(f_0 : \varphi)$ we have
\[ I_\varphi(f_1) + I_\varphi(f_2) \le (1 + 2\delta) I_\varphi(h) \le (1 + 2\delta) \big[ I_\varphi(f_1 + f_2) + \delta I_\varphi(g) \big] = I_\varphi(f_1 + f_2) + \big[ 2\delta I_\varphi(f_1 + f_2) + \delta(1 + 2\delta) I_\varphi(g) \big], \]
the second inequality by (a) and (b) of (2) applied to $h = f_1 + f_2 + \delta g$. By (e) of (2),
the term in square brackets is $\le 2\delta(f_1 + f_2 : f_0) + \delta(1 + 2\delta)(g : f_0)$. Choosing $\delta$ so that
this expression is less than $\varepsilon$ completes the proof of (3).⟧
(4) There exists a positive linear functional $I$ on $C_c(G)$ such that $I(L_s f) = I(f)$ for all
$s \in G$.
⟦The aforementioned limiting process $I_\varphi \to I$ is provided by Tychonoff's theorem, using
part (e) of (2): For each $f \in C_c^+$, let $J_f$ denote the interval $[(f_0 : f)^{-1}, (f : f_0)]$ and let
$X := \prod_{f \in C_c^+} J_f$. Then $X$ is compact in the product topology, that is, the topology with
basic open neighborhoods
\[ N_F := \{G \in X : |G(f_i) - F(f_i)| < \varepsilon,\ i = 1, \dots, n\}, \quad \varepsilon > 0,\ n \in \mathbb{N},\ f_i \in C_c^+. \tag{†} \]
Moreover, $I_\varphi \in X$ for each $\varphi \in C_c^+$. For each neighborhood $V$ of $e$ set
\[ C_V := \mathrm{cl}_X \{I_\varphi : \varphi \in C_c^+,\ \mathrm{supp}(\varphi) \subseteq V\}. \]
Then each $C_V$ is compact and the family of sets $C_V$ has the finite intersection property, since $C_{V_1} \cap \cdots \cap C_{V_n} \supseteq
C_{V_1 \cap \cdots \cap V_n}$. By compactness of $X$, $\bigcap_V C_V \ne \emptyset$. If $I$ is a member of this intersection, then,
from (†), for each $V$, $\varepsilon > 0$, and $f_i \in C_c^+$ there exists $\varphi$ with $\mathrm{supp}(\varphi) \subseteq V$ such that
\[ |I_\varphi(f_i) - I(f_i)| < \varepsilon, \quad i = 1, \dots, n. \]
It follows from (3) that $I$ is additive on $C_c^+$ and has properties (b)-(e) of (2). Extending
$I$ to $C_c(G)$ by defining $I(f) := I(f^+) - I(f^-)$ produces the desired functional.⟧

Essential Uniqueness of Haar Measure


Haar measure µ is not unique, since multiplying µ by a positive constant obviously
produces another Haar measure. However, Haar measure is essentially unique in the sense
that there is no other way of producing new Haar measures.
16.2.4 Theorem. If µ and ν are left Haar measures on a locally compact group G, then
µ = cν for some c > 0.
Proof. (Loomis). Let $f_1, f_2 \in C_c^+$ and let $U$ be a fixed compact, symmetric neighborhood of
$e$. For $i = 1, 2$, the set $K_i := [\mathrm{supp}(f_i)U] \cup [U\,\mathrm{supp}(f_i)]$ is compact. Moreover, for each $y \in U$
the functions $x \mapsto f_i(xy)$ and $x \mapsto f_i(yx)$ have support contained in $K_i$. We show that
\[ \frac{\int f_1\, d\mu}{\int f_1\, d\nu} = \frac{\int f_2\, d\mu}{\int f_2\, d\nu}. \tag{†} \]
Assuming (†) holds and setting the expression on the right equal to $c$, we then have
\[ \int f_1\, d\mu = c \int f_1\, d\nu \quad \text{for all } f_1 \in C_c^+. \]
It follows from the uniqueness part of 7.2.1 that $\mu = c\nu$, proving the theorem.
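To verify (†), for a given $\varepsilon > 0$ choose a compact symmetric neighborhood $V$ of $e$ contained
in $U$ such that
\[ |f_i(xy) - f_i(yx)| < \varepsilon \quad \text{for all } y \in V \text{ and } x \in G,\ i = 1, 2. \]
This is possible by the uniform continuity of $f_i$. Next, choose $g \in C_c^+$ such that $g(x) = g(x^{-1})$
and $\mathrm{supp}(g) \subseteq V$. (For example, one could choose $h \in C_c^+$ such that $1_{\{e\}} \le h \le 1_V$ and
then take $g(x) = h(x) + h(x^{-1})$.) By left invariance of $\mu$,
\[ \int g\, d\nu \int f_i\, d\mu = \iint g(y) f_i(x)\, d\mu(x)\, d\nu(y) = \iint g(y) f_i(yx)\, d\mu(x)\, d\nu(y), \]
and by left invariance of $\mu$ and $\nu$, the symmetry property of $g$, and Fubini's theorem for
Radon measures (7.3.2),
\[ \int g\, d\mu \int f_i\, d\nu = \iint g(x) f_i(y)\, d\mu(x)\, d\nu(y) = \iint g(y^{-1}x) f_i(y)\, d\mu(x)\, d\nu(y) \]
\[ = \iint g(x^{-1}y) f_i(y)\, d\nu(y)\, d\mu(x) = \iint g(y) f_i(xy)\, d\nu(y)\, d\mu(x) = \iint g(y) f_i(xy)\, d\mu(x)\, d\nu(y). \]
Thus
\[ \Big| \int g\, d\nu \int f_i\, d\mu - \int g\, d\mu \int f_i\, d\nu \Big| \le \int_V \int_G g(y)\, |f_i(yx) - f_i(xy)|\, d\mu(x)\, d\nu(y) \le \varepsilon \mu(K_i) \int g\, d\nu, \]
and so, dividing by $\int g\, d\nu \int f_i\, d\nu$,
\[ \Big| \frac{\int f_i\, d\mu}{\int f_i\, d\nu} - \frac{\int g\, d\mu}{\int g\, d\nu} \Big| \le \varepsilon\, \frac{\mu(K_i)}{\int f_i\, d\nu}, \quad i = 1, 2. \]
Therefore,
\[ \Big| \frac{\int f_1\, d\mu}{\int f_1\, d\nu} - \frac{\int f_2\, d\mu}{\int f_2\, d\nu} \Big| \le \varepsilon \Big( \frac{\mu(K_1)}{\int f_1\, d\nu} + \frac{\mu(K_2)}{\int f_2\, d\nu} \Big). \]
Letting $\varepsilon \to 0$ shows that the ratios on the left are equal.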

The Modular Function


The modular function makes an important connection between left and right Haar
measures. Let µ be a left Haar measure. Then for each x the set function B 7→ µ(Bx),
B ∈ B(G), is again a left Haar measure, hence, by essential uniqueness, there exists a unique
positive real number ∆(x) such that

µ(Bx) = ∆(x)µ(B) for all x ∈ G and all B ∈ B(G).

Since this obviously holds for µ replaced by cµ, c > 0 and since all left Haar measures are
of this form, we see that ∆ is independent of the measure µ. The function ∆ is called the
modular function of G. It is an intrinsic feature of G.
Theorem 16.2.6 below gives the key properties of the modular function. For the proof we
need the following lemma, a generalization of which is given later.
16.2.5 Lemma. Let $f \in C_c(G)$ and $1 \le p < \infty$. Then the mapping $x \mapsto R_x f$ is continuous
at $e$ in the $L^p$ norm.
Proof. Let $U$ be a compact, symmetric neighborhood of $e$ and set $K := \mathrm{supp}\, f$, so that $KU$ is
compact and $\mathrm{supp}(R_x f) \subseteq KU$ for $x \in U$. By uniform continuity of $f$ (16.1.3), given $\varepsilon > 0$ we
may choose a neighborhood $V$ of $e$ contained in $U$ such that $|f(yx) - f(y)| < \varepsilon\,(\mu(KU))^{-1/p}$
for all $x \in V$ and $y \in G$. For such $x$ we then have
\[ \|R_x f - f\|_p^p = \int_{KU} |f(yx) - f(y)|^p\, d\mu(y) \le \varepsilon^p. \]

16.2.6 Theorem. The modular function $\Delta$ is a continuous homomorphism from $G$ into
the group of positive reals under multiplication. Moreover,
\[ \int R_x f\, d\mu = \Delta(x^{-1}) \int f\, d\mu, \quad f \in L^1(\mu). \tag{16.4} \]
Proof. The calculation
\[ \Delta(xy)\mu(B) = \mu(Bxy) = \Delta(y)\mu(Bx) = \Delta(x)\Delta(y)\mu(B) \]
shows that $\Delta$ is a homomorphism. To prove (16.4), note that
\[ \int R_x 1_B(y)\, d\mu(y) = \int 1_{Bx^{-1}}(y)\, d\mu(y) = \mu(Bx^{-1}) = \Delta(x^{-1}) \int 1_B(y)\, d\mu(y), \]
hence (16.4) holds for measurable indicator functions $f$. The usual arguments then show
that the equation holds for all $f \in L^1(\mu)$.
Now take $f \in C_c(G)$ such that $\int f\, d\mu \ne 0$. By the lemma, the left side of (16.4) is
continuous in $x$ at $e$. It follows that $\Delta$ is continuous at $e$, and since $\Delta$ is a homomorphism,
it is continuous on $G$.
It follows directly from the definition that a left Haar measure is right invariant iff ∆(x) ≡ 1.
In this case, G is said to be unimodular. Abelian groups are obviously unimodular. Here
is another important class of unimodular groups.
16.2.7 Proposition. Every compact group is unimodular.

Proof. If G is compact, then 0 < µ(G) < ∞ (16.2.2). Since Gx = G, we have µ(G) =
µ(Gx) = ∆(x)µ(G), hence ∆(x) = 1.
For a compact group $G$, the unique Haar measure $\mu$ for which $\mu(G) = 1$ is called
normalized Haar measure. For a finite group $G = \{x_1, \dots, x_n\}$ normalized Haar measure
is given by
\[ \mu(B) = \frac{1}{n} \sum_{j=1}^{n} 1_B(x_j), \quad B \subseteq G. \]
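For a finite group the invariance of normalized counting measure can be confirmed by exhaustive search. The sketch below does so for the symmetric group on three letters, checking every subset and every translate; the representation of group elements as tuples and the helper names are choices made here for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations

GROUP = list(permutations(range(3)))  # the six elements of S3

def mul(s, t):
    """Product st in S3, as the permutation i -> s[t[i]]."""
    return tuple(s[t[i]] for i in range(3))

def haar(subset):
    """Normalized counting measure: (number of elements of the subset) / |G|."""
    return Fraction(len(set(subset)), len(GROUP))

SUBSETS = [B for r in range(len(GROUP) + 1) for B in combinations(GROUP, r)]

# mu(sB) = mu(B) and mu(Bs) = mu(B) for every s in G and every B.
left_invariant = all(haar([mul(s, x) for x in B]) == haar(B)
                     for s in GROUP for B in SUBSETS)
right_invariant = all(haar([mul(x, s) for x in B]) == haar(B)
                      for s in GROUP for B in SUBSETS)
print(left_invariant, right_invariant, haar(GROUP))
```

Both checks succeed because each translation is a bijection of the group, in agreement with 16.2.7.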

We conclude this section with a result that relates a left Haar measure $\mu$ to the right
Haar measure $\tilde\mu$ (see (16.3)).
16.2.8 Proposition. Let $\mu$ be a left Haar measure on $G$. If one side of the following
equation exists, then so does the other and the equality is then valid.
\[ \int f(y^{-1})\Delta(y^{-1})\, d\mu(y) = \int f(y)\, d\mu(y). \]
In particular, $\mu$ is inverse invariant, that is, $\tilde\mu = \mu$, iff $G$ is unimodular.
Proof. Replacing $f$ by $\tilde f$ shows that the assertion is equivalent to
\[ \int f(y)\Delta(y^{-1})\, d\mu(y) = \int f(y^{-1})\, d\mu(y) \ \Big( = \int f(y)\, d\tilde\mu(y) \Big). \]
For $f \in C_c(G)$, let $I(f)$ denote the left side of this equation and $\tilde I(f)$ the right side. For
$x \in G$,
\[ I(R_x f) = \int f(yx)\Delta(y^{-1})\, d\mu(y) = \Delta(x) \int f(yx)\Delta\big((yx)^{-1}\big)\, d\mu(y) = \Delta(x)\Delta(x^{-1}) \int f(y)\Delta(y^{-1})\, d\mu(y) = I(f), \]
the third equality by (16.4). Therefore, $I$ is a right Haar integral. Since $\tilde I$ is also a right Haar
integral, there exists $c > 0$ such that $I = c\tilde I$, that is,
\[ \int f(y)\Delta(y^{-1})\, d\mu(y) = c \int f(y^{-1})\, d\mu(y), \]
in the sense that if one side is finite then so is the other, in which case equality holds. In
particular, if $f$ is symmetric, then
\[ (1 - c) \int f\, d\mu = \int [1 - \Delta(y^{-1})] f(y)\, d\mu(y). \]
Let $\varepsilon > 0$ and let $U$ be a compact symmetric neighborhood of $e$ on which $|\Delta - 1| < \varepsilon$. Taking
$f = 1_U$ in the last equation we have $|1 - c|\mu(U) \le \varepsilon\mu(U)$. Since $\varepsilon$ was arbitrary, $c = 1$, hence
$\tilde I = I$, completing the proof.
Note that the conclusion of the proposition may be written
\[ \int f(y)\Delta(y)\, d\tilde\mu(y) = \int f(y)\, d\mu(y), \]
which shows that $\Delta\, d\tilde\mu = d\mu$.

16.3 Some Constructions


Haar Measure on Direct Products
The direct product of groups G and H is the group G × H with multiplication

(a, b)(x, y) = (ax, by), a, x ∈ G, b, y ∈ H.

If $G$ and $H$ are topological groups, then $G \times H$ is easily seen to be a topological group in
the product topology.
Now let $G$ and $H$ be locally compact groups and let $\mu$ and $\nu$ be left Haar measures on $G$
and $H$, respectively. As noted in 7.3.4, $\mu \otimes \nu$ on $\mathcal{B}(G) \otimes \mathcal{B}(H)$ extends to a Radon measure
$\mu \otimes \nu$ on $\mathcal{B}(G \times H)$ such that for all $f \in C_c(G \times H)$
\[ \int_{G \times H} f(x, y)\, d(\mu \otimes \nu)(x, y) = \int_G \int_H f(x, y)\, d\nu(y)\, d\mu(x) = \int_H \int_G f(x, y)\, d\mu(x)\, d\nu(y). \]
From the left invariance of $\mu$ and $\nu$ we then have
\[ \int_{G \times H} L_{(a,b)} f\, d(\mu \otimes \nu) = \int_{G \times H} f\, d(\mu \otimes \nu), \]
hence $\mu \otimes \nu$ is a left Haar measure on $G \times H$. Furthermore,
\[ \int_{G \times H} R_{(a,b)} f\, d(\mu \otimes \nu) = \int_H \int_G f(xa, yb)\, d\mu(x)\, d\nu(y) = \Delta_G(a^{-1})\, \Delta_H(b^{-1}) \int_H \int_G f(x, y)\, d\mu(x)\, d\nu(y), \]
hence
\[ \Delta_{G \times H}(a, b) = \Delta_G(a)\Delta_H(b). \]
It follows that $G \times H$ is unimodular iff both $G$ and $H$ are unimodular.

Haar Measure on Semidirect Products


Let $G$ and $H$ be groups and let $\sigma : H \times G \to H$ be a function such that for each $x \in G$
the map $\sigma_x := \sigma(\cdot, x)$ is a member of the automorphism group $\mathrm{Aut}(H)$ of $H$, that is, the
group of isomorphisms of $H$ onto $H$, and such that the mapping $x \mapsto \sigma_x : G \to \mathrm{Aut}(H)$ is a
homomorphism into $\mathrm{Aut}(H)$. Thus we require that
\[ \sigma(y_1 y_2, x) = \sigma(y_1, x)\sigma(y_2, x), \quad \sigma(e, x) = e, \quad \sigma(y, x_1 x_2) = \sigma\big(\sigma(y, x_1), x_2\big), \quad \sigma(\cdot, e) = \mathrm{id}_H. \]
Define multiplication on $G \times H$ by
\[ (a, b)(x, y) = \big(ax, \sigma(b, x)y\big) = \big(ax, \sigma_x(b)y\big), \quad a, x \in G,\ b, y \in H. \]
It is straightforward to check that under this multiplication $G \times H$ is a group with identity
$(e, e)$ and inverse
\[ (x, y)^{-1} = \big(x^{-1}, \sigma(y^{-1}, x^{-1})\big). \]
The group $G \times H$ is called a semidirect product of $G$ and $H$ and is denoted by $G \ltimes_\sigma H$.
The semidirect product reduces to the direct product by taking $\sigma_x = \mathrm{id}_H$ for all $x$. If $G$ and
$H$ are topological groups we require additionally that $\sigma$ be continuous, in which case $G \ltimes_\sigma H$
is easily seen to be a topological group in the product topology.
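Now let $G$ and $H$ be locally compact groups and let $\mu$ and $\nu$ be left Haar measures on $G$
and $H$, respectively. For $f \in C_c(G \times H)$,
\[ \int_{G \times H} L_{(a,b)} f(x, y)\, d(\mu \otimes \nu)(x, y) = \int_G \int_H f\big(ax, \sigma_x(b)y\big)\, d\nu(y)\, d\mu(x) = \int_G \int_H f(x, y)\, d\nu(y)\, d\mu(x), \]
hence $\mu \otimes \nu$ is a left Haar measure on $G \ltimes_\sigma H$. To find the modular function, let $\sigma_a(\nu)$ denote
the image measure on $\mathcal{B}(H)$:
\[ \sigma_a(\nu)(B) = \nu\big(\sigma_a^{-1}(B)\big) = \nu\big(\sigma_{a^{-1}}(B)\big), \quad B \in \mathcal{B}(H). \]
Then, by left invariance of $\nu$,
\[ \sigma_a(\nu)(yB) = \nu\big(\sigma_a^{-1}(y)\, \sigma_a^{-1}(B)\big) = \sigma_a(\nu)(B), \]
so $\sigma_a(\nu)$ is a left Haar measure. By essential uniqueness, $\sigma_a(\nu) = \delta(a)\nu$ for some $\delta(a) > 0$.
From
\[ \delta(ax)\nu = \sigma_{ax}(\nu) = (\sigma_a \circ \sigma_x)(\nu) = \sigma_a\big(\delta(x)\nu\big) = \delta(a)\delta(x)\nu \]
we see that $\delta : G \to (0, \infty)$ is a homomorphism. Moreover, from
\[ \delta(a) \int_H f\, d\nu = \int_H f\, d\sigma_a(\nu) = \int_H f\big(\sigma_a(y)\big)\, d\nu(y), \quad f \in C_c(H), \]
and the continuity of $\sigma$ it follows that $\delta$ is continuous. We now have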


Z Z Z

R(a,b) f (x, y) µ⊗ν(x, y) = f xa, σa (y)b dµ(x) dν(y)
G×H
Z Z H G
−1
 −1

= ∆G a ∆H b δ(a) f (x, y) dµ(x) dν(y),
H G

hence
∆G σ H
(a, b) = ∆G (a)∆H (b)δ(a).

It follows that if G σ H is unimodular iff both G and H are unimodular and δ ≡ 1.


16.3.1 Example. Let G be the group of nonzero real numbers under multiplication, let
H = (R, +), and take σ(y, x) = σx(y) = xy, x ∈ G, y ∈ H. Then multiplication in G ⋊_σ H
has the form
\[
(a,b)(x,y) = \big(ax, \sigma_x(b) + y\big) = (ax, bx + y).
\]
Since
\[
\begin{pmatrix} x & y \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} ax & bx + y \\ 0 & 1 \end{pmatrix},
\]
G ⋊_σ H may be realized concretely as a matrix group. ♦
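The multiplication rule of Example 16.3.1 can be checked mechanically on a finite analogue. The sketch below is an illustration only: it replaces R by the integers mod a prime (the modulus `P = 7` and all function names are assumptions, not part of the text), and verifies associativity, the inverse formula given above, and the matrix identity. Note that the matrix identity puts the factors in reversed order, so the matrix map reverses the order of products.

```python
from itertools import product

P = 7  # hypothetical prime modulus; Z_P stands in for the real line

G = range(1, P)   # nonzero residues under multiplication
H = range(P)      # all residues under addition

def mul(u, v):
    """(a, b)(x, y) = (ax, bx + y), the rule of Example 16.3.1, taken mod P."""
    (a, b), (x, y) = u, v
    return (a * x % P, (b * x + y) % P)

def inv(u):
    """(x, y)^{-1} = (x^{-1}, sigma(y^{-1}, x^{-1})) = (x^{-1}, -y x^{-1})."""
    x, y = u
    x_inv = pow(x, -1, P)  # modular inverse (Python 3.8+)
    return (x_inv, (-y * x_inv) % P)

def as_matrix(u):
    """The 2x2 matrix attached to the pair (x, y) in the example."""
    x, y = u
    return ((x, y), (0, 1))

def mat_mul(m, n):
    """2x2 matrix product mod P."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )

elements = list(product(G, H))
identity = (1, 0)

associative = all(
    mul(mul(u, v), w) == mul(u, mul(v, w))
    for u in elements for v in elements for w in elements
)
inverses_ok = all(
    mul(u, inv(u)) == identity == mul(inv(u), u) for u in elements
)
# Matrix of (a, b)(x, y) equals matrix(x, y) times matrix(a, b), as displayed above.
matrix_ok = all(
    as_matrix(mul(u, v)) == mat_mul(as_matrix(v), as_matrix(u))
    for u in elements for v in elements
)
```

All three flags evaluate to `True` for this finite model, which is consistent with the group axioms claimed for the semidirect product.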
Haar Measure on Quotient Groups


Let G be a group and H a subgroup of G. Recall that x ∼ y iff x−1 y ∈ H defines
an equivalence relation on G with equivalence classes xH (see §0.2). Moreover, if G is a
topological group, then the quotient map Q : G → G/H is continuous. The following theorem
summarizes the basic properties of quotients of topological groups.
16.3.2 Theorem. Let G be a topological group and H a subgroup of G.
(a) The quotient topology is the unique topology on G/H relative to which Q is open and
continuous.
(b) If H is closed, then G/H is Hausdorff.

(c) If H is normal in G, then G/H is a topological group.

(d) If G is locally compact, then so is G/H.

Proof. (a) If U is open in G, then Q⁻¹(Q(U)) = UH = ⋃_{y∈H} Uy, which is open in G.
Uniqueness follows easily from the fact that Q is open and continuous.
(b) If Q(x) ≠ Q(y), then x⁻¹y is in the open set Hᶜ. Since (s, t) ↦ sx⁻¹yt is continuous,
there exists a symmetric neighborhood U of e such that Ux⁻¹yU ⊆ Hᶜ. It follows that
Q(xU) and Q(yU) are disjoint neighborhoods of Q(x) and Q(y).
(c) By 16.1.1, it suffices to show that the map (Q(x), Q(y)) ↦ Q(x)⁻¹Q(y) is continuous
at (Q(e), Q(e)), and that for each a ∈ G the translation mappings Q(x) ↦ Q(ax) and
Q(x) ↦ Q(xa) are continuous. We establish the former and leave the latter as an exercise. An
arbitrary open neighborhood of Q(e) is of the form Q(U), where U is an open neighborhood
of e. Choose an open neighborhood V of e such that v, w ∈ V ⇒ v⁻¹w ∈ U and let
Q(x), Q(y) ∈ Q(V), say Q(x) = Q(v) and Q(y) = Q(w) (v, w ∈ V). Then x ∈ Hv and
y ∈ wH, hence x⁻¹y ∈ v⁻¹HwH = v⁻¹wH, where the last equality uses the normality of
H. Therefore, Q(x)⁻¹Q(y) = Q(v⁻¹w) ∈ Q(U).
(d) This follows from the continuity and openness of Q.
16.3.3 Theorem. Let G be a locally compact group, H a closed normal subgroup of G, ν
a left Haar measure on H, and η a left Haar measure on G/H. Then there exists a left Haar
measure µ on G such that
\[
\int_G f(x)\, d\mu(x) = \int_{G/H}\int_H f(xy)\, d\nu(y)\, d\eta(xH), \quad f \in C_c(G).
\]
Moreover, ∆G = ∆H on H.
Proof. We show first that the right side of the equation, which we denote by I(f), is
well-defined. Let F(x) denote the inner integral:
\[
F(x) = \int_H f(xy)\, d\nu(y) = \int_{(x^{-1}K)\cap H} f(xy)\, d\nu(y), \quad x \in G,\ K := \operatorname{supp}(f). \tag{$\dagger$}
\]
If aH = bH, then b⁻¹a ∈ H and so by left invariance of ν,
\[
F(b) = \int_H f(by)\, d\nu(y) = \int_H f\big(b(b^{-1}ay)\big)\, d\nu(y) = F(a).
\]
It follows that F = f′ ∘ Q for a unique function f′ ∈ Cb(G/H) depending only on f.
Now, y ∈ (x⁻¹K) ∩ H ⇒ Q(x) = Q(xy) ∈ Q(K), so (x⁻¹K) ∩ H = ∅ for all x for which
Q(x) ∈ Q(K)ᶜ. It follows from (†) that F(x) = 0 for such x and so supp(f′) ⊆ Q(K).
Therefore, f′ ∈ Cc(G/H) and
\[
I(f) = \int_{G/H} F(x)\, d\eta(xH) = \int_{G/H} f'(xH)\, d\eta(xH), \tag{$\ddagger$}
\]
which shows that I is well-defined.


Now, I is clearly a positive linear functional on Cc(G). We show that I is left invariant.
Let a ∈ G. In the notation of the first paragraph,
\[
(L_aF)(x) = F(ax) = \int_H f(axy)\, d\nu(y) = \int_H (L_af)(xy)\, d\nu(y).
\]
Comparing with (†) we see that (La f)′ ∘ Q = La F, that is,
\[
(L_af)'\big(Q(x)\big) = F(ax) = f'\big(Q(ax)\big) = f'\big(Q(a)Q(x)\big).
\]
Thus by (‡) and the left invariance of η,
\[
I(L_af) = \int_{G/H} (L_af)'\big(Q(x)\big)\, d\eta(xH) = \int_{G/H} f'\big(aH\,xH\big)\, d\eta(xH) = I(f),
\]
verifying the claim.


The first part of the theorem now follows from the Riesz representation theorem. For the
second part, if b ∈ H and f ∈ Cc(G), then
\[
\Delta_G(b^{-1})\, I(f) = I(R_bf) = \int_{G/H}\int_H f(xyb)\, d\nu(y)\, d\eta(xH) = \Delta_H(b^{-1})\, I(f).
\]
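A familiar instance of the formula in 16.3.3 is G = R, H = Z with counting measure, and G/H identified with [0, 1) carrying Lebesgue measure, in which case the formula reads ∫_R f = ∫_0^1 Σ_n f(x + n) dx. The following numerical sketch checks this for a Gaussian. It rests on assumptions that are not in the text: the Gaussian is not compactly supported (the sketch presumes the formula extends to integrable f), the sum over Z is truncated, and the outer integral is replaced by a midpoint rule. The names are illustrative.

```python
import math

def f(x):
    """Test function exp(-x^2); its integral over R is sqrt(pi)."""
    return math.exp(-x * x)

def inner_sum(x, n_max=12):
    """F(x) = sum over n in Z of f(x + n), truncated to |n| <= n_max."""
    return sum(f(x + n) for n in range(-n_max, n_max + 1))

def quotient_integral(steps=200):
    """Midpoint rule for the integral of F over [0, 1), a fundamental domain for R/Z."""
    h = 1.0 / steps
    return h * sum(inner_sum((k + 0.5) * h) for k in range(steps))

lhs = math.sqrt(math.pi)    # integral of f over G = R
rhs = quotient_integral()   # iterated integral over H = Z, then G/H
```

In this setup `lhs` and `rhs` agree to well within floating-point tolerance, which is what the theorem predicts when the three Haar measures are normalized compatibly.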

16.4 The L1 -Group Algebra


For this and the remaining sections of the chapter, G is a locally compact group with left
Haar measure, written as dx, and modular function ∆. Integrals with respect to dx will
frequently be denoted by ∫ f. The left Haar measure of a Borel set E is denoted by |E|.

Convolution and Involution


The convolution $f * g$ and involution $f^*$ of functions f, g : G → C are defined by
\[
f * g(x) = \int f(y)\, g(y^{-1}x)\, dy \quad\text{and}\quad f^*(x) = \Delta(x^{-1})\,\overline{f(x^{-1})}, \quad x \in G.
\]
If A, B ⊆ L¹(G), we set A ∗ B := {f ∗ g : f ∈ A, g ∈ B}. The next two propositions
summarize the basic properties of these operations.
16.4.1 Proposition. Let f, g, h ∈ L¹(G) and c ∈ C.
(a) f ∗ g ∈ L¹(G) and ‖f ∗ g‖₁ ≤ ‖f‖₁ ‖g‖₁.
(b) If G is abelian, then f ∗ g = g ∗ f.
(c) (f ∗ g) ∗ h = f ∗ (g ∗ h).
(d) f ∗ (cg) = (cf) ∗ g = c(f ∗ g).
(e) f ∗ (g + h) = f ∗ g + f ∗ h.
(f) Cc(G) ∗ Cc(G) ⊆ Cc(G).
Proof. The proofs of (a)–(e) are entirely similar to the corresponding parts of 6.1.1, except
that care must be taken to allow for the fact that the group is not necessarily abelian and dx
is not necessarily right invariant. For example, to prove (c) use left invariance and Fubini’s
theorem² to obtain
\[
\begin{aligned}
f * (g * h)(x) &= \int f(z)\,(g * h)(z^{-1}x)\, dz = \iint f(z)\, g(y)\, h(y^{-1}z^{-1}x)\, dy\, dz \\
&= \iint f(z)\, g(z^{-1}y)\, h(y^{-1}x)\, dy\, dz = \int (f * g)(y)\, h(y^{-1}x)\, dy \\
&= (f * g) * h(x).
\end{aligned}
\]
To prove (f), let ϕᵢ ∈ Cc(G) and set Kᵢ = supp ϕᵢ. From $\varphi_1 * \varphi_2(x) = \int_{K_1} \varphi_1(y)\varphi_2(y^{-1}x)\, dy$
we see that if y ∈ K₁ and x ∉ yK₂, then the integrand is zero. Therefore, supp ϕ₁ ∗ ϕ₂ ⊆ K₁K₂,
which is compact.
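On a finite group, counting measure is a Haar measure and the convolution integral becomes a finite sum, so parts (a) and (c) of 16.4.1 can be verified exactly. The sketch below does this on the symmetric group S3, represented as permutation tuples. The choice of group, the sample functions, and all names are illustrative assumptions. It also shows that convolution of point masses follows the group law, so convolution fails to commute when the group does.

```python
from itertools import permutations

GROUP = list(permutations(range(3)))  # the six elements of S3 as tuples

def compose(p, q):
    """Group law: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def convolve(f, g):
    """(f * g)(x) = sum over y of f(y) g(y^{-1} x), with counting measure."""
    return {x: sum(f[y] * g[compose(inverse(y), x)] for y in GROUP) for x in GROUP}

def norm1(f):
    """L^1 norm with respect to counting measure."""
    return sum(abs(v) for v in f.values())

def delta(a):
    """Indicator function of the single point a."""
    return {x: 1 if x == a else 0 for x in GROUP}

# Arbitrary integer-valued sample functions, so all arithmetic is exact.
f = dict(zip(GROUP, [1, 2, 0, -1, 3, 0]))
g = dict(zip(GROUP, [0, 1, -2, 0, 0, 4]))
h = dict(zip(GROUP, [2, 0, 0, 1, -1, 1]))

associative = convolve(convolve(f, g), h) == convolve(f, convolve(g, h))
submultiplicative = norm1(convolve(f, g)) <= norm1(f) * norm1(g)

a, b = (1, 0, 2), (0, 2, 1)  # two transpositions that do not commute
point_masses_multiply = convolve(delta(a), delta(b)) == delta(compose(a, b))
noncommutative = convolve(delta(a), delta(b)) != convolve(delta(b), delta(a))
```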

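16.4.2 Proposition. Let f, g, h ∈ L¹(G) and c ∈ C.
(a) $(f + g)^* = f^* + g^*$.
(b) $(cf)^* = \bar{c} f^*$.
(c) $f^{**} = f$.
(d) $(f * g)^* = g^* * f^*$.
(e) $\|f^*\|_1 = \|f\|_1$.
(f) $f, g \in L^1 \cap L^2 \Rightarrow (f^* * g)(x) = (R_x g \mid f)$.

Proof. Parts (a)–(c) are clear. For (d), we have
\[
\begin{aligned}
(f^* * g^*)(x) &= \int f^*(y)\, g^*(y^{-1}x)\, dy = \int \Delta(y^{-1})\,\Delta\big((y^{-1}x)^{-1}\big)\,\overline{f(y^{-1})}\;\overline{g(x^{-1}y)}\, dy \\
&= \Delta(x^{-1}) \int \overline{f(y^{-1})}\;\overline{g(x^{-1}y)}\, dy = \Delta(x^{-1}) \int \overline{g(y)}\;\overline{f(y^{-1}x^{-1})}\, dy \\
&= (g * f)^*(x).
\end{aligned}
\]
Parts (e) and (f) follow from 16.2.8:
\[
\|f^*\|_1 = \int \Delta(x^{-1})\,|f(x^{-1})|\, dx = \int |f(x)|\, dx = \|f\|_1
\]
and
\[
(f^* * g)(x) = \int \Delta(y^{-1})\,\overline{f(y^{-1})}\, g(y^{-1}x)\, dy = \int \overline{f(y)}\, g(yx)\, dy = (R_x g \mid f).
\]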
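Parts (d) and (f) of 16.4.2 can likewise be checked exactly on a finite group, where ∆ ≡ 1 and the involution reduces to $f^*(x) = \overline{f(x^{-1})}$. The following sketch uses S3 with Gaussian-integer values so that complex floating-point arithmetic stays exact. The group, the sample data, and the helper names are assumptions made for illustration.

```python
from itertools import permutations

GROUP = list(permutations(range(3)))  # S3 as permutation tuples

def compose(p, q):
    """Group law: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def convolve(f, g):
    """(f * g)(x) = sum over y of f(y) g(y^{-1} x)."""
    return {x: sum(f[y] * g[compose(inverse(y), x)] for y in GROUP) for x in GROUP}

def involution(f):
    """f^*(x) = conj(f(x^{-1})); the modular function is 1 on a finite group."""
    return {x: f[inverse(x)].conjugate() for x in GROUP}

def inner(u, v):
    """(u | v) = sum over y of u(y) conj(v(y))."""
    return sum(u[y] * v[y].conjugate() for y in GROUP)

def right_translate(x, g):
    """(R_x g)(y) = g(yx)."""
    return {y: g[compose(y, x)] for y in GROUP}

# Gaussian-integer sample values keep every product and sum exact.
f = dict(zip(GROUP, [1 + 2j, 0j, -1j, 3 + 0j, 1 - 1j, 2 + 0j]))
g = dict(zip(GROUP, [0j, 2 - 1j, 1 + 0j, -2j, 1 + 1j, -1 + 0j]))

# (d): the involution reverses the order of convolution.
anti_multiplicative = involution(convolve(f, g)) == convolve(involution(g), involution(f))

# (f): (f^* * g)(x) = (R_x g | f) for every x.
coefficient_identity = all(
    convolve(involution(f), g)[x] == inner(right_translate(x, g), f) for x in GROUP
)

# (c): the involution has period two.
involutive = involution(involution(f)) == f
```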

From the preceding propositions we have


16.4.3 Theorem. L1 (G) is a Banach ∗-algebra with respect to convolution and involution,
and is commutative if G is abelian.
The following result complements 16.4.1(f):
16.4.4 Proposition. Let G be unimodular and let p and q be conjugate exponents with
1 < p, q < ∞. If f ∈ Lᵖ(G) and g ∈ L^q(G), then f ∗ g ∈ C₀(G) and ‖f ∗ g‖∞ ≤ ‖f‖ₚ ‖g‖_q.
2 Fubini’s theorem requires the spaces to be σ-finite, which is not necessarily the case in the present setting.

However, the theorem is valid for functions f ∈ Lp , 1 ≤ p < ∞, since it may be shown that such functions
are zero outside a σ-compact set. We may therefore invoke Fubini’s theorem and shall do so without further
comment. For the technical details, the reader is referred to [21] or [34].
Proof. By Hölder’s inequality and translation and inversion invariance,
\[
|f * g(x)| \le \int |f(y)\, g(y^{-1}x)|\, dy \le \Big(\int |f(y)|^p\, dy\Big)^{1/p} \Big(\int |g(y^{-1}x)|^q\, dy\Big)^{1/q} = \|f\|_p\, \|g\|_q ,
\]
establishing the inequality.
Now let fₙ, gₙ ∈ Cc(G) be such that ‖fₙ − f‖ₚ → 0 and ‖gₙ − g‖_q → 0 (7.1.2). By the
inequality just established, ‖fₙ ∗ gₙ − f ∗ g‖∞ → 0. But by 16.4.1(f), fₙ ∗ gₙ ∈ Cc(G).
Therefore, f ∗ g ∈ C₀(G).

Approximate Identities
In this subsection we generalize to arbitrary locally compact groups the existence of an
approximate identity, established for the group Rᵈ in 6.1.2.³ The proof uses the following
lemma, which expresses an important continuity property of left and right translations,
extending 16.2.5.
16.4.5 Lemma. Let f ∈ Lᵖ(G), 1 ≤ p < ∞. Then the mappings x ↦ Lx f and x ↦ Rx f
are continuous in the Lᵖ norm.
Proof. We prove the right translation version. Let ε > 0, g ∈ Cc(G), U a compact
neighborhood of e, and x ∈ U. Then
\[
\begin{aligned}
\|R_xf - f\|_p &\le \|R_x(f - g)\|_p + \|R_xg - g\|_p + \|f - g\|_p \\
&\le \sup_{x\in U} \Delta(x^{-1})^{1/p}\, \|f - g\|_p + \|f - g\|_p + \|R_xg - g\|_p .
\end{aligned}
\]
Since Cc(G) is dense in Lᵖ we may choose g ∈ Cc(G) so that the sum of the first two terms in
the last expression is < ε/2. By 16.2.5, there exists a neighborhood V of e contained in U
such that the third term is < ε/2 for x ∈ V. For such x, ‖Rx f − f‖ₚ < ε, which shows that
x ↦ Rx f is Lᵖ continuous at e. Continuity at arbitrary x₀ follows from
\[
\|R_xf - R_{x_0}f\|_p = \Delta(x_0^{-1})^{1/p}\, \|R_{x_0^{-1}x}f - f\|_p .
\]

16.4.6 Theorem. Let f ∈ Lᵖ, 1 ≤ p < ∞, and ε > 0. Then there exists a neighborhood V
of the identity such that ‖f ∗ ψ − f‖ₚ < ε and ‖ψ ∗ f − f‖ₚ < ε for all symmetric ψ ∈ Cc⁺(G)
with supp(ψ) ⊆ V and ∫ψ = 1. Moreover, if p = ∞, then the first inequality holds if f is
right uniformly continuous, and the second holds if f is left uniformly continuous.
Proof. We prove only the part concerning f ∗ ψ. Given ε > 0, by the preceding lemma we
may choose a neighborhood V of e such that ‖Ry f − f‖ₚ < ε for all y ∈ V. If f is right
uniformly continuous, then we may choose V so that ‖Ry f − f‖∞ < ε. Now let ψ ∈ Cc⁺(G)
be symmetric with supp(ψ) ⊆ V and ∫ψ = 1. Then, by left invariance and symmetry of ψ,
\[
f * \psi(x) - f(x) = \int f(y)\,\psi(y^{-1}x)\, dy - f(x)\int \psi(y)\, dy = \int \big[f(xy) - f(x)\big]\,\psi(y)\, dy .
\]
By Minkowski’s integral inequality (4.1.5),
\[
\begin{aligned}
\|f * \psi - f\|_p &\le \Big(\int \Big|\int \big[f(xy) - f(x)\big]\,\psi(y)\, dy\Big|^p dx\Big)^{1/p}
 \le \int \Big(\int \big|f(xy) - f(x)\big|^p\, dx\Big)^{1/p} \psi(y)\, dy \\
&= \int_V \|R_yf - f\|_p\, \psi(y)\, dy < \varepsilon ,
\end{aligned}
\]
verifying the desired inequality for p < ∞. If f is right uniformly continuous, then
\[
\|f * \psi - f\|_\infty \le \int \|R_yf - f\|_\infty\, \psi(y)\, dy < \varepsilon .
\]

3 For a discrete group G, L¹(G) actually has an identity, namely the indicator function 1_{e}.
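The conclusion of Theorem 16.4.6 can be observed numerically on a discretized circle. The sketch below takes the cyclic group Z_N with normalized counting measure as a stand-in for the circle group (the value `N = 128`, the tent-shaped test function, and the box-shaped ψ are all illustrative assumptions), and computes the L¹ error ‖f ∗ ψ − f‖₁ exactly with rational arithmetic as the support of ψ shrinks toward the identity. When the support is the single point 0, ψ is a multiple of the indicator of the identity and the error vanishes, in line with the remark on discrete groups.

```python
from fractions import Fraction

N = 128  # hypothetical grid size; Z_N stands in for the circle group

def f(k):
    """Continuous tent function on the circle: f(k/N) = |k/N - 1/2|."""
    return Fraction(abs(2 * (k % N) - N), 2 * N)

def psi(w):
    """Symmetric, nonnegative bump supported in {-w, ..., w} with integral 1."""
    height = Fraction(N, 2 * w + 1)

    def bump(k):
        distance = min(k % N, -k % N)  # distance from 0 on the cycle
        return height if distance <= w else Fraction(0)

    return bump

def convolve(u, v):
    """(u * v)(x) = (1/N) * sum over y of u(y) v(x - y), normalized Haar measure."""
    return lambda x: sum(u(y) * v(x - y) for y in range(N)) / N

def l1_error(w):
    """Exact L^1 distance between f * psi_w and f."""
    smoothed = convolve(f, psi(w))
    return sum(abs(smoothed(x) - f(x)) for x in range(N)) / N

# Shrinking supports: the error decreases and is zero for the point mass at 0.
errors = {w: l1_error(w) for w in (16, 4, 1, 0)}
```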

Theorem 16.4.6 is typically used as follows: Since the set of all neighborhoods V of the
identity is directed downward by inclusion, we may form a net (ψ_V)_V, where ψ_V has the
properties in the theorem. We then have
\[
\lim_V f * \psi_V = f \ \text{ in } L^p, \quad 1 \le p \le \infty .
\]
(For p = ∞, f must be uniformly continuous.) The net (ψ_V) is called an approximate
identity for Lᵖ(G). More generally, this term is applied to any net (ψα) in Cc⁺(G) for which
f ∗ ψα → f in Lᵖ(G) and ∫ψα = 1. Here is an application:
16.4.7 Theorem. A closed subspace I of L1 (G) is a right (left) ideal iff it is closed under
right (left) translations.
Proof. We prove the right version. Suppose I is a right ideal. For f ∈ I and (ψα) an
approximate identity, we have
\[
R_x(f * \psi_\alpha)(y) = \int f(z)\,\psi_\alpha(z^{-1}yx)\, dz = \int f(z)\,(R_x\psi_\alpha)(z^{-1}y)\, dz = (f * R_x\psi_\alpha)(y).
\]
Since Rx is continuous on L¹, Rx f = limα Rx(f ∗ ψα) = limα (f ∗ Rxψα) ∈ I, proving that
I is closed under right translation.
Conversely, assume that I is right translation invariant. For f ∈ I and g ∈ Cc(G), we
have
\[
f * g(x) = \int f(y)\, g(y^{-1}x)\, dy = \int f(xy)\, g(y^{-1})\, dy = \int g(y^{-1})\,(R_yf)(x)\, dy .
\]
The function y ↦ g(y⁻¹)Ry f is L¹-continuous and has compact support, hence may be
weakly integrated as in 14.6.3. The result is a unique function F = ∫ g(y⁻¹)Ry f dy in the
closed linear span of the right translations of f, and therefore in I, such that
\[
\int h(x)F(x)\, dx = \iint h(x)\, g(y^{-1})\, R_yf(x)\, dy\, dx = \int h(x)\,(f * g)(x)\, dx, \quad h \in L^\infty(G).
\]
It follows that f ∗ g = F ∈ I.

The Measure Algebra


Let µ and ν be complex Radon measures on B(G). For φ ∈ Cc(G) define
\[
I(\varphi) := \iint \varphi(xy)\, d\mu(x)\, d\nu(y).
\]
Then I is a continuous linear functional on Cc(G) and so is given by a complex Radon
measure µ ∗ ν on B(G). Thus
\[
\int \varphi\, d(\mu * \nu) = \iint \varphi(xy)\, d\mu(x)\, d\nu(y), \quad \varphi \in C_c(G).
\]
The measure µ ∗ ν is called the convolution of µ and ν. By 7.3.2 and 7.3.3, µ ∗ ν may also be
seen as the image measure m(µ⊗ν) of µ⊗ν under the multiplication mapping m(x, y) = xy.
Therefore, we have
\[
\int h(z)\, d(\mu * \nu)(z) = \int h(xy)\, d(\mu\otimes\nu)(x,y)
\]
in the usual sense that whenever one side exists then so does the other and equality holds.
It is easy to check that the collection Mra(G) of complex Radon measures on G is a Banach
algebra under the operation of convolution. The proof is the same as for the special case
Mra(Rᵈ) (see 6.4.1). Moreover, Mra(G) is a ∗-algebra under the involution µ ↦ µ* defined by
\[
\mu^*(E) = \overline{\mu(E^{-1})}, \quad E \in \mathcal{B}(G),
\]
or, equivalently,
\[
\int \varphi(x)\, d\mu^*(x) = \int \varphi(x^{-1})\, d\overline{\mu}(x) = \overline{\int \overline{\varphi(x^{-1})}\, d\mu(x)}, \quad \varphi \in C_c(G).
\]
For example, the requirement $(\mu * \nu)^* = \nu^* * \mu^*$ follows from the calculations
\[
\int \varphi\, d(\mu * \nu)^* = \overline{\int \overline{\varphi(x^{-1})}\, d(\mu * \nu)(x)} = \overline{\iint \overline{\varphi\big((xy)^{-1}\big)}\, d\mu(x)\, d\nu(y)}
\]
and
\[
\int \varphi\, d(\nu^* * \mu^*) = \iint \varphi(yx)\, d\nu^*(y)\, d\mu^*(x) = \int \overline{\int \overline{\varphi(y^{-1}x)}\, d\nu(y)}\; d\mu^*(x)
= \overline{\iint \overline{\varphi\big(y^{-1}x^{-1}\big)}\, d\nu(y)\, d\mu(x)} .
\]
Finally, the Dirac measure δe is an identity for Mra(G) as is seen, for example, from
\[
\int \varphi\, d(\mu * \delta_e) = \iint \varphi(xy)\, d\mu(x)\, d\delta_e(y) = \int \varphi(xe)\, d\mu(x) = \int \varphi(x)\, d\mu(x).
\]

16.5 Representations
Positive-Definite Functions
A function φ : G → C is said to be positive definite if
\[
\sum_{j,k=1}^{n} c_j\,\overline{c_k}\;\varphi(x_k^{-1}x_j) \ge 0 \quad\text{for all } c_j \in \mathbb{C},\ x_j \in G, \text{ and } n \in \mathbb{N}. \tag{16.5}
\]
Using the Euclidean inner product, we may write this condition as (Ac | c) ≥ 0, where
c = (c₁, …, cₙ) and A = [a_jk]ₙₓₙ, a_jk := φ(x_k⁻¹x_j). Thus φ is a positive definite function
iff A is a positive definite matrix.
16.5.1 Proposition. Let φ be positive definite and x, y ∈ G. Then
(a) $\varphi(x^{-1}) = \overline{\varphi(x)}$.
(b) |φ(x)| ≤ φ(e), hence ‖φ‖∞ = φ(e).
(c) $|\varphi(x) - \varphi(y)|^2 \le 2\varphi(e)\,\mathrm{Re}\,[\varphi(e) - \varphi(xy^{-1})]$.

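Proof. For the parameters n = 1, x₁ = e, and c₁ = 1, we have φ(e) ≥ 0. Now take n = 2,
x₁ = x, x₂ = e. Then for |c| = 1,
\[
0 \le (A(1,c) \mid (1,c)) =
\begin{pmatrix} 1 & \bar{c} \end{pmatrix}
\begin{pmatrix} \varphi(e) & \varphi(x) \\ \varphi(x^{-1}) & \varphi(e) \end{pmatrix}
\begin{pmatrix} 1 \\ c \end{pmatrix}
= 2\varphi(e) + c\,\varphi(x) + \bar{c}\,\varphi(x^{-1}). \tag{$\dagger$}
\]
Therefore, $c\varphi(x) + \bar{c}\varphi(x^{-1})$ is real. Taking c = 1 and c = i shows that φ(x) + φ(x⁻¹) and
i[φ(x) − φ(x⁻¹)] are real, which implies (a). Choosing c in (†) so that cφ(x) = −|φ(x)| and
using (a), we have $0 \le 2\varphi(e) - |\varphi(x)| + \bar{c}\varphi(x^{-1}) = 2\varphi(e) - 2|\varphi(x)|$, proving (b).
For (c), take n = 3, x₁ = e, x₂ = x, x₃ = y. For |c| = 1 and t real,
\[
\begin{aligned}
0 &\le \big(A(1, t\bar{c}, -t\bar{c}) \mid (1, t\bar{c}, -t\bar{c})\big) =
\begin{pmatrix} 1 & tc & -tc \end{pmatrix}
\begin{pmatrix} \varphi(e) & \varphi(x^{-1}) & \varphi(y^{-1}) \\ \varphi(x) & \varphi(e) & \varphi(y^{-1}x) \\ \varphi(y) & \varphi(x^{-1}y) & \varphi(e) \end{pmatrix}
\begin{pmatrix} 1 \\ t\bar{c} \\ -t\bar{c} \end{pmatrix} \\
&= \varphi(e)\big(1 + 2t^2\big) + ct\big[\varphi(x) - \varphi(y)\big] + \bar{c}t\big[\varphi(x^{-1}) - \varphi(y^{-1})\big] - t^2\big[\varphi(y^{-1}x) + \varphi(x^{-1}y)\big] \\
&= \varphi(e)\big(1 + 2t^2\big) + 2t\,\mathrm{Re}\big(c\big[\varphi(x) - \varphi(y)\big]\big) - 2t^2\,\mathrm{Re}\,\varphi(y^{-1}x),
\end{aligned}
\]
the last equality by (a). Taking c = |φ(x) − φ(y)|[φ(x) − φ(y)]⁻¹ we have for all real t
\[
0 \le 2\big[\varphi(e) - \mathrm{Re}\,\varphi(y^{-1}x)\big]t^2 + 2\big|\varphi(x) - \varphi(y)\big|\,t + \varphi(e) =: \alpha t^2 + \beta t + \gamma .
\]
The discriminant β² − 4αγ is therefore ≤ 0, implying (c).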
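The three properties in 16.5.1 can be tested exactly on a small example. The sketch below works on S3 and uses the function φ(x) = Σ_y f(x⁻¹y) conj f(y), a coefficient of the left regular representation of the finite group; it is positive definite because the quadratic form in (16.5) collapses to Σ_y |Σ_j c_j f(x_j⁻¹y)|². The group, the Gaussian-integer data, and every name here are assumptions chosen for illustration, and the coefficient vectors are arbitrary test cases rather than a proof.

```python
from itertools import permutations

GROUP = list(permutations(range(3)))  # S3 as permutation tuples
E = (0, 1, 2)                         # identity element

def compose(p, q):
    """Group law: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

# Gaussian-integer values keep all complex arithmetic exact.
f = dict(zip(GROUP, [1 + 1j, 2 + 0j, -1j, 0j, 3 - 2j, -1 + 0j]))

def phi(x):
    """phi(x) = sum over y of f(x^{-1} y) conj(f(y))."""
    xi = inverse(x)
    return sum(f[compose(xi, y)] * f[y].conjugate() for y in GROUP)

PHI = {x: phi(x) for x in GROUP}

def abs2(z):
    """Squared modulus, computed without a square root."""
    return z.real ** 2 + z.imag ** 2

def quadratic_form(c):
    """Sum over j, k of c_j conj(c_k) phi(x_k^{-1} x_j), as in (16.5)."""
    return sum(
        c[j] * c[k].conjugate() * PHI[compose(inverse(xk), xj)]
        for j, xj in enumerate(GROUP)
        for k, xk in enumerate(GROUP)
    )

# (a) phi(x^{-1}) = conj(phi(x))
hermitian = all(PHI[inverse(x)] == PHI[x].conjugate() for x in GROUP)
# (b) |phi(x)| <= phi(e)
bounded = all(abs2(PHI[x]) <= PHI[E].real ** 2 for x in GROUP)
# (c) |phi(x) - phi(y)|^2 <= 2 phi(e) Re[phi(e) - phi(x y^{-1})]
modulus_bound = all(
    abs2(PHI[x] - PHI[y])
    <= 2 * PHI[E].real * (PHI[E] - PHI[compose(x, inverse(y))]).real
    for x in GROUP for y in GROUP
)

coefficients = [
    (1, 1j, -1, 0, 2, -1j),
    (1, 0, 0, 0, 0, -1),
    (2 - 1j, 1, 1, 1j, 0, 3),
]
forms = [quadratic_form(c) for c in coefficients]
```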

Functions of Positive Type


A function φ ∈ L∞(G) is said to be of positive type if
\[
\int (f^* * f)\,\varphi \ge 0 \quad\text{for all } f \in L^1(G).
\]
Since Cc(G) is dense in L¹(G), to test for this property it suffices to take f ∈ Cc(G).
Indeed, if fₙ ∈ Cc(G) satisfies the preceding inequality for all n and if ‖fₙ − f‖₁ → 0, then
$0 \le \int (f_n^* * f_n)\varphi \to \int (f^* * f)\varphi$ by L¹ continuity of convolution.
For future reference we note that
\[
\int (g^* * f)\,\varphi = \iint \Delta(y^{-1})\,\overline{g(y^{-1})}\, f(y^{-1}x)\,\varphi(x)\, dy\, dx = \iint \overline{g(y)}\, f(x)\,\varphi(y^{-1}x)\, dx\, dy, \tag{16.6}
\]
where we have used 16.2.8 and the left invariance of dx. Taking g = f and considering the
conjugate of the last integral, we see that φ is of positive type iff $\overline{\varphi}$ is of positive type.
We denote the set of all continuous functions of positive type by P(G):
\[
\mathcal{P}(G) := \Big\{ \varphi \in C_b(G) : \int (f^* * f)\,\varphi \ge 0 \text{ for all } f \in L^1(G) \Big\}.
\]
Note that P(G) is convex.


The following proposition shows that for bounded continuous functions, the notions of
positive-definite function and function of positive-type coincide.

16.5.2 Proposition. Let φ : G → C be bounded and continuous. Then φ is of positive type


iff φ is positive definite.
Proof. Let φ be positive definite and let f ∈ Cc(G), K := supp(f). Then the function
$g(x,y) := f(x)\,\overline{f(y)}\,\varphi(y^{-1}x)$, which is the integrand in (16.6), is continuous and has support
contained in K × K and so is uniformly continuous. Given ε > 0, choose a neighborhood V
of e such that
\[
|g(x,y) - g(a,b)| < \varepsilon \quad\text{whenever } x \in aV \text{ and } y \in bV.
\]
By compactness, K is a finite union of sets of the form aV ∩ K (a ∈ K), hence there exists a
measurable partition E₁, …, Eₙ of K and points x_j ∈ E_j such that |g(x, y) − g(x_j, x_k)| < ε
for all (x, y) ∈ E_j × E_k. We then have
\[
\begin{aligned}
I &:= \int (f^* * f)\,\varphi = \sum_{j,k} \iint 1_{E_j\times E_k}(x,y)\, g(x,y)\, dx\, dy, \\
S_\varepsilon &:= \sum_{j,k} |E_j|\,|E_k|\, g(x_j,x_k) = \sum_{j,k} |E_j|\, f(x_j)\; |E_k|\,\overline{f(x_k)}\;\varphi(x_k^{-1}x_j) \ge 0, \text{ and} \\
I - S_\varepsilon &= \sum_{j,k} \iint 1_{E_j\times E_k}(x,y)\,\big[g(x,y) - g(x_j,x_k)\big]\, dx\, dy .
\end{aligned}
\]
Since
\[
|I - S_\varepsilon| \le \sum_{j,k} \iint 1_{E_j\times E_k}(x,y)\,|g(x,y) - g(x_j,x_k)|\, dx\, dy \le \varepsilon\,|K|^2 ,
\]
I = I − Sε + Sε ≥ I − Sε → 0, hence I ≥ 0. Therefore, φ is of positive type.


Conversely, let φ be of positive type and let c_j ∈ C, x_j ∈ G. For an approximate identity
(ψ_U), set $f_U(x) := \sum_{j=1}^{n} c_j\,\psi_U(x_j^{-1}x)$ and $I_U := \int (f_U^* * f_U)\,\varphi$. By (16.6) and left invariance,
\[
I_U = \iint \overline{f_U(y)}\, f_U(x)\,\varphi(y^{-1}x)\, dx\, dy = \sum_{j,k} c_j\,\overline{c_k} \int_U\!\int_U \psi_U(x)\,\psi_U(y)\,\varphi(y^{-1}x_k^{-1}x_jx)\, dx\, dy .
\]
Since ∫ψ_U = 1 we see that
\[
I_U - \sum_{j,k} c_j\,\overline{c_k}\,\varphi(x_k^{-1}x_j) = \sum_{j,k} c_j\,\overline{c_k} \int_U\!\int_U \psi_U(x)\,\psi_U(y)\,\big[\varphi(y^{-1}x_k^{-1}x_jx) - \varphi(x_k^{-1}x_j)\big]\, dx\, dy .
\]
Given ε > 0, choose U₀ so that $|\varphi(y^{-1}x_k^{-1}x_jx) - \varphi(x_k^{-1}x_j)| < \varepsilon$ for all x, y ∈ U₀ and all j, k.
Then for any U ⊆ U₀,
\[
\Big| I_U - \sum_{j,k} c_j\,\overline{c_k}\,\varphi(x_k^{-1}x_j) \Big| \le \varepsilon \sum_{j,k} |c_j c_k| ,
\]
which shows that $\lim_U I_U = \sum_{j,k} c_j\,\overline{c_k}\,\varphi(x_k^{-1}x_j)$. Since I_U ≥ 0, the limit is nonnegative.

Unitary Representations
Let X be a normed space. The strong operator topology of B(X) is the locally convex
topology defined by the seminorms
\[
p(T) = \max\{\|T\mathbf{x}_j\| : \mathbf{x}_j \in X,\ 1 \le j \le n\}.
\]
The weak operator topology of B(X) is the locally convex topology defined by the
seminorms
\[
p(T) = \max\{|\langle T\mathbf{x}_j, \mathbf{x}_j' \rangle| : \mathbf{x}_j \in X,\ \mathbf{x}_j' \in X',\ 1 \le j \le n\}.
\]
Thus a net (Tα) in B(X) converges to T in the strong operator topology (resp., weak
operator topology) iff Tα𝐱 → T𝐱 in norm (resp., weakly) for each 𝐱 ∈ X.
A representation of G on X is a mapping π from G into B(X) such that
\[
\pi(xy) = \pi(x)\pi(y), \quad x, y \in G.
\]
If X = H is a Hilbert space and each π(x) is unitary, then π is called a unitary
representation of G. In this case we shall require that π be continuous in the strong operator
topology. Thus a unitary representation π : G → B(H) satisfies
\[
\pi(xy) = \pi(x)\pi(y), \quad \pi(x^{-1}) = \pi(x)^{-1} = \pi(x)^*, \quad\text{and}\quad x \mapsto \pi(x)\mathbf{x} \text{ is continuous for all } \mathbf{x} \in H.
\]

It is notable that continuity of π in the strong operator topology is equivalent to continuity
in the weak operator topology. Indeed, if (Uα) is a net of unitary operators converging in
the weak operator topology to a unitary operator U, then
\[
\begin{aligned}
\|U_\alpha\mathbf{x} - U\mathbf{x}\|^2 &= \|U_\alpha\mathbf{x}\|^2 - 2\,\mathrm{Re}\,(U_\alpha\mathbf{x} \mid U\mathbf{x}) + \|U\mathbf{x}\|^2 = 2\|\mathbf{x}\|^2 - 2\,\mathrm{Re}\,(U_\alpha\mathbf{x} \mid U\mathbf{x}) \\
&\to 2\|\mathbf{x}\|^2 - 2\,(U\mathbf{x} \mid U\mathbf{x}) = 0.
\end{aligned}
\]
An important example of a unitary representation of G is the left regular representation
π_L on L²(G) defined by
\[
\pi_L(x)f = L_{x^{-1}}f .
\]
Indeed, the unitary property follows immediately from the left invariance of the Haar measure:
\[
\big(\pi_L(x)f \mid \pi_L(x)g\big) = \int f(x^{-1}y)\,\overline{g(x^{-1}y)}\, dy = \int f(y)\,\overline{g(y)}\, dy = (f \mid g).
\]
Since $\big(\pi_L(x)f \mid g\big) = \int f(x^{-1}y)\,\overline{g(y)}\, dy = \int (L_{x^{-1}}f)(y)\,\overline{g(y)}\, dy$, continuity of π_L follows from
16.4.5.
There is a close connection between unitary representations and functions of positive-type.
Here is one part of the connection. A converse is given by 16.5.7.
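16.5.3 Proposition. Let π be a unitary representation of G on H. For a vector 𝐱 ∈ H
define
\[
\varphi(x) = (\pi(x)\mathbf{x} \mid \mathbf{x}), \quad x \in G. \tag{16.7}
\]
Then φ ∈ P(G).
Proof. By (16.6), for any f ∈ Cc(G)
\[
\begin{aligned}
\int (f^* * f)\,\varphi &= \iint \overline{f(y)}\, f(x)\,\varphi(y^{-1}x)\, dx\, dy = \iint \overline{f(y)}\, f(x)\,\big(\pi(y^{-1}x)\mathbf{x} \mid \mathbf{x}\big)\, dx\, dy \\
&= \iint \big(f(x)\pi(x)\mathbf{x} \mid f(y)\pi(y)\mathbf{x}\big)\, dx\, dy = (\mathbf{u} \mid \mathbf{u}) \ge 0,
\end{aligned}
\]
where 𝐮 is the weak vector integral ∫ f(x)π(x)𝐱 dx (14.6.3).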

16.5.4 Corollary. Let f ∈ L²(G) and $\tilde{f}(x) := \overline{f(x^{-1})}$ $(= \Delta(x) f^*(x))$. Then $f * \tilde{f} \in \mathcal{P}(G)$.
Proof. $f * \tilde{f}(x) = \int f(y)\,\overline{f(x^{-1}y)}\, dy = \big(f \mid \pi_L(x)f\big) = \overline{\big(\pi_L(x)f \mid f\big)}$, and the conjugate of a
function of positive type is again of positive type.
16.5.5 Corollary. Let Pc(G) := Cc(G) ∩ P(G). Then Cc(G) ∗ Cc(G) ⊆ span Pc(G).
Moreover, span Pc(G) is dense in Cc(G) in the uniform norm and is dense in Lᵖ(G) in the
Lᵖ norm for 1 ≤ p < ∞.
Proof. Let f ∈ Cc(G) and K := supp f. By 16.5.4, $f * \tilde{f} \in \mathcal{P}(G)$. Also, from
$f * \tilde{f}(x) = \int_K f(y)\,\overline{f(x^{-1}y)}\, dy$ we see that $\operatorname{supp}(f * \tilde{f}) \subseteq KK^{-1}$. Therefore, $f * \tilde{f} \in \mathcal{P}_c(G)$. Since the
mapping $(g, h) \mapsto g * \tilde{h}$ on Cc(G) × Cc(G) is sesquilinear, by the polarization identity we
have $g * \tilde{h} = \tfrac14 \sum_{k=1}^{4} i^k\, (g + i^k h) * (g + i^k h)^{\sim}$. Replacing h by $\tilde{h}$ we see that g ∗ h ∈ span Pc(G).
Taking h to be an approximate identity, we conclude that span Pc(G) is dense in Cc(G) in
the uniform and Lᵖ norms and hence is dense in Lᵖ.
The converse of 16.5.3 is considerably deeper. We shall need the following lemma.
16.5.6 Lemma. Let φ be a bounded Borel function on G such that ∫ fφ = 0 for all
f ∈ L¹(G). Then φ = 0 a.e. on each Borel set E with |E| < ∞. If, additionally, φ is
continuous, then φ = 0 everywhere.
Proof. Let K ⊆ E be compact and let g ∈ Cc(G) be nonnegative with g = 1 on K. Taking
$f = g\overline{\varphi}$ in the hypothesis, we have $\int_K |\varphi|^2 \le \int g|\varphi|^2 = 0$, hence φ = 0 a.e. on K. By regularity, there
exist compact Kₙ with Kₙ ↑ B ⊆ E and |Kₙ| ↑ |E|. Then φ = 0 a.e. on B and |E \ B| = 0,
hence φ = 0 a.e. on E. If φ is continuous, then φ is identically equal to zero on any compact
set K by 16.2.2(c), so φ is the zero function.
A cyclic vector for a unitary representation π : G → B(H) is a member 𝐱 of H such
that the linear span of π(G)𝐱 is dense in H. Here is the aforementioned converse of 16.5.3.
16.5.7 Theorem. Let φ be of positive type on G. Then there exists a Hilbert space Hφ,
a unitary representation πφ : G → B(Hφ), and a cyclic vector 𝐱 ∈ Hφ such that φ(·) =
(πφ(·)𝐱 | 𝐱)_φ a.e. on every Borel set E with |E| < ∞. Thus if φ ∈ P(G), then φ(·) =
(πφ(·)𝐱 | 𝐱)_φ everywhere.
Proof. Assume that φ ≠ 0. Define
\[
(f \mid g)_\varphi = \int (g^* * f)\,\varphi = \iint \overline{g(y)}\, f(x)\,\varphi(y^{-1}x)\, dx\, dy, \quad f, g \in L^1(G), \tag{16.8}
\]
where the second equality is from (16.6). Then (f | g)_φ is a positive sesquilinear form on
L¹(G) and by the CBS inequality
\[
|(f \mid g)_\varphi| \le (f \mid f)_\varphi^{1/2}\, (g \mid g)_\varphi^{1/2} \le \|f\|_1\, \|g\|_1\, \|\varphi\|_\infty . \tag{16.9}
\]
Now define N := {f ∈ L¹ : (f | f)_φ = 0}. Then N is a closed linear subspace of L¹(G). Let
f ↦ f̆ denote the quotient map L¹(G) → L¹(G)/N and define
\[
(\breve{f} \mid \breve{g})_\varphi = (f \mid g)_\varphi, \quad f, g \in L^1(G).
\]
If f̆₁ = f̆₂ and ğ₁ = ğ₂, then (f₁ − f₂ | f₁ − f₂)_φ = (g₁ − g₂ | g₁ − g₂)_φ = 0 and so by the
CBS inequality
\[
|(f_1 \mid g_1)_\varphi - (f_2 \mid g_2)_\varphi| \le |(f_1 - f_2 \mid g_1)_\varphi| + |(f_2 \mid g_1 - g_2)_\varphi| = 0.
\]
Therefore, (f̆ | ğ)_φ is well-defined. It is readily established that (f̆ | ğ)_φ is an inner product
on L¹(G)/N. Denote the Hilbert space completion of L¹(G)/N by Hφ (11.1.7). From (16.9),
\[
|(\breve{f} \mid \breve{g})_\varphi| \le \|g\|_1\, \|f\|_1\, \|\varphi\|_\infty .
\]
Next, for x ∈ G define L̆x on L¹(G)/N by L̆x f̆ = (Lx f)˘. By left invariance,
\[
(L_xf \mid L_xg)_\varphi = \iint \overline{g(xz)}\, f(xy)\,\varphi(z^{-1}y)\, dy\, dz = \iint \overline{g(z)}\, f(y)\,\varphi(z^{-1}y)\, dy\, dz = (f \mid g)_\varphi ,
\]
hence L̆x is well-defined, preserves the inner products, and therefore extends to a unitary
operator on Hφ. Now define a mapping πφ : G → B(Hφ) by πφ(x) = L̆_{x⁻¹}. Then
\[
\pi_\varphi(x)\pi_\varphi(y)\breve{f} = \breve{L}_{x^{-1}}(\breve{L}_{y^{-1}}\breve{f}) = \breve{L}_{x^{-1}}(L_{y^{-1}}f)^{\smile} = (L_{x^{-1}}L_{y^{-1}}f)^{\smile} = (L_{y^{-1}x^{-1}}f)^{\smile} = \pi_\varphi(xy)\breve{f},
\]
hence πφ is a unitary representation of G on Hφ.


It remains to find a cyclic vector 𝐱 ∈ Hφ such that (16.7) holds. To this end, let (ψα)
be an approximate identity in L¹(G). Then for f ∈ L¹(G), $(\breve{f} \mid \breve{\psi}_\alpha)_\varphi = \int (\psi_\alpha^* * f)\,\varphi \to \int f\varphi$.
Since the net (ψ̆α) is bounded in the norm of Hφ, there exists a subnet (ψ̆β) that converges
weakly to some 𝐱 ∈ Hφ. Thus
\[
(\breve{f} \mid \mathbf{x})_\varphi = \int f\varphi, \quad f \in L^1(G), \tag{$\dagger$}
\]
and so
\[
(\breve{f} \mid \pi_\varphi(y)\mathbf{x})_\varphi = (\pi_\varphi(y)^{-1}\breve{f} \mid \mathbf{x})_\varphi = \int f(yx)\,\varphi(x)\, dx = \int f(x)\,\varphi(y^{-1}x)\, dx .
\]
Therefore, for any f, g ∈ L¹(G),
\[
(\breve{f} \mid \breve{g})_\varphi = \iint \overline{g(y)}\, f(x)\,\varphi(y^{-1}x)\, dx\, dy = \int \overline{g(y)}\,\big(\breve{f} \mid \pi_\varphi(y)\mathbf{x}\big)_\varphi\, dy . \tag{$\ddagger$}
\]
It follows that if (f̆ | πφ(y)𝐱)_φ = 0 for all y, then f̆ = 0, which shows that the linear
span of πφ(G)𝐱 is dense in Hφ. Moreover, if g ∈ Cc(G), then the vector integral
I(g) := ∫ g(y)πφ(y)𝐱 dy exists, and from (‡) we have (f̆ | ğ)_φ = (f̆ | I(g))_φ for all f. Therefore,
\[
\int \big(\mathbf{x} \mid \pi_\varphi(y)\mathbf{x}\big)_\varphi\, \overline{g(y)}\, dy = \big(\mathbf{x} \mid I(g)\big)_\varphi = \lim_\beta \big(\breve{\psi}_\beta \mid I(g)\big)_\varphi = (\mathbf{x} \mid \breve{g})_\varphi = \overline{\int g\varphi},
\]
the last equality from (†). The desired conclusion now follows from the preceding lemma,
since Cc(G) is dense in L¹.
It is not necessarily the case that φ(·) = (πφ(·)𝐱 | 𝐱)_φ a.e. on G. Indeed, as the proof shows,
such a conclusion would depend on (L¹, L∞) duality, which holds generally only in the
σ-finite case.

Irreducible Representations
Let π be a unitary representation of G on a Hilbert space H. An invariant subspace for
π is a subspace M of H such that π(x)M ⊆ M for all x ∈ G. If the only invariant subspaces
for π are the trivial subspaces {0} and H, then π is said to be irreducible; otherwise π is
reducible. Also, call an operator in B(H) nontrivial if it is not a multiple of the identity
operator I. The following result is a fundamental tool in the study of representations.

16.5.8 Schur’s Lemma. A unitary representation π is reducible iff there exists a nontrivial
T ∈ B(H) that commutes with every π(x).
Proof. Assume that π is reducible and let M be a nontrivial closed subspace of H such that
π(x)M ⊆ M for all x ∈ G. For 𝐱 ∈ M and 𝐱⊥ ∈ M⊥, (𝐱 | π(x)𝐱⊥) = (π(x⁻¹)𝐱 | 𝐱⊥) = 0,
hence π(x)M⊥ ⊆ M⊥. If P denotes the orthogonal projection onto M, then
\[
\pi(x)P(\mathbf{x} + \mathbf{x}^\perp) = \pi(x)\mathbf{x} = P\pi(x)\mathbf{x} = P\pi(x)(\mathbf{x} + \mathbf{x}^\perp),
\]
hence P is a nontrivial operator commuting with every π(x).
Conversely, let T be a nontrivial operator commuting with every π(x). Then
\[
\pi(x)T^* = \big[T\pi(x^{-1})\big]^* = \big[\pi(x^{-1})T\big]^* = T^*\pi(x),
\]
hence T* commutes with each π(x). Therefore, the self-adjoint operators Tr := (T + T*)/2
and Ti := (T − T*)/(2i) commute with π(x). Since T = Tr + iTi, at least one of the operators
is nontrivial. Thus we may as well assume that the original operator T is self-adjoint. Now
consider the Borel functional calculus f ↦ f(T). Since π(x) commutes with T it commutes
with the projections P_E := 1_E(T), where E is a nontrivial Borel subset of σ(T). Then ran P_E
is a nontrivial subspace of H invariant under every π(x), hence π is reducible.

16.5.9 Corollary. If G is abelian and π is irreducible, then dim(H) = 1.


Proof. For each y ∈ G, π(y)π(x) = π(x)π(y) for all x ∈ G, hence there exists a constant
c(y) such that π(y) = c(y)I. Thus all subspaces of H are invariant and so, by irreducibility,
dim(H) = 1.
The connection between irreducible unitary representations and functions of positive-type
is given in the next theorem. For this, let C1 and S1 denote, respectively, the closed unit
ball and the unit sphere in L∞ (G).
16.5.10 Theorem. Let φ ∈ P(G) ∩ S1 and let πφ be as in 16.5.7. Then πφ is irreducible
iff φ is an extreme point of P ∩ S1 .
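Proof. Let πφ be reducible, so that Hφ = M ⊕ M⊥ for some nontrivial closed linear subspace
M of Hφ invariant under πφ. Let 𝐱 ∈ Hφ be a cyclic vector for πφ. Since the spaces M and
M⊥ are invariant and nontrivial, 𝐱 is a member of neither, hence 𝐱 = 𝐦 + 𝐦⊥, where
𝐦, 𝐦⊥ ≠ 0. Setting 𝐲 = 𝐦/‖𝐦‖ and 𝐳 = 𝐦⊥/‖𝐦⊥‖, we then have
\[
\begin{aligned}
\varphi(x) = (\pi_\varphi(x)\mathbf{x} \mid \mathbf{x})_\varphi &= (\pi_\varphi(x)\mathbf{m} \mid \mathbf{m})_\varphi + (\pi_\varphi(x)\mathbf{m}^\perp \mid \mathbf{m}^\perp)_\varphi \\
&= \|\mathbf{m}\|^2\, (\pi_\varphi(x)\mathbf{y} \mid \mathbf{y})_\varphi + \|\mathbf{m}^\perp\|^2\, (\pi_\varphi(x)\mathbf{z} \mid \mathbf{z})_\varphi . 
\end{aligned} \tag{$\dagger$}
\]
Since
\[
\|\mathbf{m}\|^2 + \|\mathbf{m}^\perp\|^2 = \|\mathbf{x}\|^2 = (\pi_\varphi(e)\mathbf{x} \mid \mathbf{x})_\varphi = \varphi(e) = 1,
\]
equation (†) exhibits φ as a proper convex combination of members of P ∩ S1. Therefore, φ
is not extreme.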
Now assume that πφ is irreducible and let φ = θ + ψ, θ, ψ ∈ P(G). Then, by (16.8),
(f | g)_φ = (f | g)_θ + (f | g)_ψ, which implies that (f | f)_θ ≤ (f | f)_φ and so
\[
|(f \mid g)_\theta|^2 \le (f \mid f)_\theta\, (g \mid g)_\theta \le (f \mid f)_\varphi\, (g \mid g)_\varphi .
\]
It follows that B(f̆, ğ) := (f | g)_θ is a well-defined bounded Hermitian sesquilinear form on
Hφ. By 11.4.1 there exists T ∈ B(Hφ) such that (T f̆ | ğ)_φ = (f | g)_θ for all f, g ∈ L¹(G).
Recalling that
\[
(\pi_\varphi(x)\breve{f} \mid \breve{g})_\varphi = (\breve{L}_{x^{-1}}\breve{f} \mid \breve{g})_\varphi = (L_{x^{-1}}f \mid g)_\varphi ,
\]
with the analogous equations holding for θ, we have
\[
\begin{aligned}
(T\pi_\varphi(x)\breve{f} \mid \breve{g})_\varphi &= (T(L_{x^{-1}}f)^{\smile} \mid \breve{g})_\varphi = (L_{x^{-1}}f \mid g)_\theta = (f \mid L_xg)_\theta = (T\breve{f} \mid \breve{L}_x\breve{g})_\varphi \\
&= (\pi_\varphi(x)T\breve{f} \mid \breve{g})_\varphi .
\end{aligned}
\]
Thus T commutes with πφ(x) for all x and so T = cI for some c ∈ C by Schur’s lemma.
Therefore,
\[
\int (g^* * f)\,\theta = (f \mid g)_\theta = (T\breve{f} \mid \breve{g})_\varphi = (c\breve{f} \mid \breve{g})_\varphi = (cf \mid g)_\varphi = \int (g^* * f)\, c\varphi
\]
for all f, g ∈ L¹. Since the functions $g^* * f$ form a dense subset of L¹, θ = cφ.
Now let φ = tθ + (1 − t)ψ, where θ, ψ ∈ P(G) ∩ S1 and 0 < t < 1. By the preceding
paragraph, tθ = aφ for some a ∈ C, hence (1 − a)φ = (1 − t)ψ. Taking norms, we have
|1 − a| = 1 − t. Therefore, φ = bψ for some b with |b| = 1 and so $b \int (f^* * f)\psi = \int (f^* * f)\varphi$
for all f ∈ L¹. Since the integrals are nonnegative, b > 0, hence b = 1 and φ = ψ. Therefore,
φ is an extreme point of P(G) ∩ S1.
The next theorem, a fundamental result in representation theory, asserts that the ir-
reducible unitary representations of G separate points of G. For the proof we need the
following.
16.5.11 Lemma. Let f ∈ Cc(G), f ≠ 0. Then there exists a ψ ∈ P(G) with $\int (f^* * f)\psi > 0$.
Proof. Observe that $f^* * f$ is continuous and $f^* * f(e) = \|f\|_2^2 > 0$. Thus there exists a
compact, symmetric neighborhood V of e on which $f^* * f \ge c > 0$. Take g ∈ Cc(G) with
g ≥ 0 and g = 1 on V and set ψ := g ∗ g. By left invariance,
\[
\int (f^* * f)\,\psi \ge \iint (f^* * f)(x)\, 1_V(x)\, 1_V(y^{-1}x)\, dy\, dx = \iint (f^* * f)(x)\, 1_V(x)\, 1_V(y^{-1})\, dy\, dx \ge c\,|V|^2 > 0.
\]
16.5.12 Theorem (Gelfand-Raikov). Given distinct points x, y ∈ G, there exists an
irreducible unitary representation π of G such that π(x) ≠ π(y).
Proof. Let a := x⁻¹y and choose g ∈ Cc(G) such that La g ≠ g. Set f := La g − g ∈ Cc(G)
and choose ψ ∈ P(G) as in the lemma. Normalizing, we may assume ψ ∈ P(G) ∩ S1. By
the Krein-Milman theorem, ψ is a weak* limit of convex combinations of extreme points of
P(G) ∩ S1, hence there must exist an extreme point φ such that $\int (f^* * f)\varphi > 0$. Thus, in
the notation of 16.5.7, (f | f)_φ > 0. Since $\|\pi_\varphi(a^{-1})\breve{g} - \breve{g}\|_\varphi^2 = (f \mid f)_\varphi > 0$, πφ(x)ğ ≠ πφ(y)ğ.
Finally, by 16.5.10, πφ is irreducible.

Unitary Representations of Compact Groups


In this subsection G is assumed to be compact and π denotes a generic unitary represen-
tation of G on a Hilbert space H.
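16.5.13 Lemma. For a fixed unit vector 𝐮 ∈ H, define T ∈ B(H) by the vector integral
\[
T\mathbf{x} = \int (\mathbf{x} \mid \pi(x)\mathbf{u})\, \pi(x)\mathbf{u}\, dx .
\]
Then T is a compact, positive, nonzero operator and Tπ(x) = π(x)T for all x ∈ G.
Proof. For any 𝐱, 𝐲 ∈ H,
\[
(T\mathbf{x} \mid \mathbf{y}) = \int (\mathbf{x} \mid \pi(x)\mathbf{u})\, (\pi(x)\mathbf{u} \mid \mathbf{y})\, dx .
\]
In particular, (T𝐱 | 𝐱) = ∫ |(𝐱 | π(x)𝐮)|² dx ≥ 0, and because |(𝐮 | π(x)𝐮)|² is continuous
in x and positive at x = e, (T𝐮 | 𝐮) > 0. Therefore, T is a nonzero, positive operator.
Furthermore, by translation invariance, for y ∈ G,
\[
\begin{aligned}
(T\pi(y)\mathbf{x} \mid \mathbf{y}) &= \int (\pi(y)\mathbf{x} \mid \pi(x)\mathbf{u})\, (\pi(x)\mathbf{u} \mid \mathbf{y})\, dx = \int \big(\mathbf{x} \mid \pi(y^{-1}x)\mathbf{u}\big)\, (\pi(x)\mathbf{u} \mid \mathbf{y})\, dx \\
&= \int (\mathbf{x} \mid \pi(x)\mathbf{u})\, (\pi(yx)\mathbf{u} \mid \mathbf{y})\, dx = \int (\mathbf{x} \mid \pi(x)\mathbf{u})\, \big(\pi(x)\mathbf{u} \mid \pi(y^{-1})\mathbf{y}\big)\, dx \\
&= \big(T\mathbf{x} \mid \pi(y^{-1})\mathbf{y}\big) = (\pi(y)T\mathbf{x} \mid \mathbf{y}).
\end{aligned}
\]
Therefore, Tπ(y) = π(y)T for every y ∈ G.
Now, because G is compact, the function π(·)𝐮 is uniformly continuous, hence given
ε > 0, there exists a measurable partition E₁, …, Eₙ of G and x_j ∈ E_j such that x ∈ E_j ⇒
‖π(x)𝐮 − π(x_j)𝐮‖ < ε. Now set
\[
T_j\mathbf{x} := \int_{E_j} (\mathbf{x} \mid \pi(x)\mathbf{u})\, \pi(x)\mathbf{u}\, dx \quad\text{and}\quad F_j\mathbf{x} := |E_j|\, (\mathbf{x} \mid \pi(x_j)\mathbf{u})\, \pi(x_j)\mathbf{u}
\]
and note that
\[
\|T_j\mathbf{x} - F_j\mathbf{x}\| \le \int_{E_j} \big\| (\mathbf{x} \mid \pi(x)\mathbf{u})\, \pi(x)\mathbf{u} - (\mathbf{x} \mid \pi(x_j)\mathbf{u})\, \pi(x_j)\mathbf{u} \big\|\, dx .
\]
For x ∈ E_j, the integrand is less than or equal to
\[
\big\| (\mathbf{x} \mid [\pi(x) - \pi(x_j)]\mathbf{u})\, \pi(x)\mathbf{u} \big\| + \big\| (\mathbf{x} \mid \pi(x_j)\mathbf{u})\, [\pi(x) - \pi(x_j)]\mathbf{u} \big\| \le 2\varepsilon\, \|\mathbf{x}\| ,
\]
hence ‖T_j𝐱 − F_j𝐱‖ < 2ε‖𝐱‖ |E_j| and so $\big\|T - \sum_{j=1}^{n} F_j\big\| < 2\varepsilon$. This shows that T may be
approximated in norm by operators of finite rank and hence is compact.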
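For a finite group with normalized counting measure, the vector integral defining T in 16.5.13 is a finite average, and the commutation property can be checked directly. The sketch below uses the permutation representation of S3 on three-dimensional space; the representation, the vector `u` (deliberately left unnormalized so that the arithmetic stays rational), and all names are assumptions for illustration. The resulting T is not a multiple of the identity, which is consistent with Schur's lemma because this representation is reducible.

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))  # S3 as permutation tuples

def rep(g):
    """Permutation matrix pi(g), defined by pi(g) e_j = e_{g(j)}."""
    return [[1 if i == g[j] else 0 for j in range(3)] for i in range(3)]

def mat_mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

u = [1, 2, 0]  # hypothetical real vector, not normalized

def averaging_operator():
    """Matrix of T x = (1/|G|) * sum over g of (x | pi(g)u) pi(g)u."""
    t = [[Fraction(0)] * 3 for _ in range(3)]
    for g in GROUP:
        v = apply(rep(g), u)
        for i in range(3):
            for j in range(3):
                t[i][j] += Fraction(v[i] * v[j], len(GROUP))
    return t

T = averaging_operator()

# T commutes with every pi(g), as the lemma asserts.
commutes = all(mat_mul(T, rep(g)) == mat_mul(rep(g), T) for g in GROUP)

# T is symmetric, and (T x | x) is positive for a sample vector x.
symmetric = all(T[i][j] == T[j][i] for i in range(3) for j in range(3))
x = [1, -1, 0]
quadratic_value = sum(x[i] * T[i][j] * x[j] for i in range(3) for j in range(3))
```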
16.5.14 Corollary. If π is irreducible, then H is finite dimensional.
Proof. The compact operator T of the lemma commutes with every π(x), hence is a nonzero
multiple of the identity. Thus the identity operator on H is compact, which implies that H
is finite dimensional.
The following lemma will allow us to express a representation in terms of irreducible
representations.
16.5.15 Lemma. If H is finite dimensional, then π is a direct sum of irreducible representations. That is, H = M1 ⊕ · · · ⊕ Mn , where π is irreducible on Mj .
Proof. If π is reducible, then it has a nontrivial invariant subspace M. Since M ⊥ is also
invariant, we may assume by induction that M and M ⊥ are each direct sums of irreducible
subspaces, hence so is H.
16.5.16 Theorem. Every representation π of G is a direct sum of irreducible (hence finite
dimensional) representations. That is, there exists a family of mutually orthogonal invariant
subspaces of H with linear span dense in H such that the restriction of π to each subspace
is irreducible.
Proof. Since the operator T of 16.5.13 is compact and positive, it has a nonzero eigenvalue
(12.3.8) and hence a finite dimensional eigenspace M. Since T commutes with π, M is
π-invariant. By 16.5.15, M is a direct sum of irreducible representations. In particular, there
exist irreducible subrepresentations.
Now consider families of mutually orthogonal irreducible π-invariant subspaces. Ordering
these families by inclusion and applying Zorn’s lemma yields a maximal family {Mi : i ∈ I}.
It then follows that H is the closed linear span S of all the Mi , otherwise S ⊥ would contain
a π-invariant irreducible subspace, contradicting maximality.
A coefficient of π is a function on G of the form (π(·)x | y), x, y ∈ H. Let $\mathcal{C}(G)$ denote the linear span of all coefficients of finite dimensional representations of G. By 16.5.15, $\mathcal{C}(G)$ is also the linear span of coefficients of all finite dimensional irreducible representations of G. The final and main theorem of this subsection asserts that $\mathcal{C}(G)$ is dense in $C(G)$. For this we need the following lemma.

16.5.17 Lemma. Let K be a finite dimensional complex Hilbert space and let $\mathcal{V}$ be a group of operators on K (under composition) whose identity is the identity operator. If $\mathcal{V}$ is compact in B(K), then there exists an inner product on K relative to which each member of $\mathcal{V}$ is unitary.
Proof. Clearly, $\mathcal{V}$ is a topological group under composition. If dV denotes normalized Haar measure on $\mathcal{V}$ and $(x \mid y)$ is the given inner product on K, then
\[ \langle x \mid y \rangle := \int_{\mathcal{V}} (Vx \mid Vy)\,dV \]
is the required new inner product on K. For example, the calculation
\[ \langle V_0x \mid V_0y \rangle = \int_{\mathcal{V}} (VV_0x \mid VV_0y)\,dV = \langle x \mid y \rangle \]
shows that $V_0 \in \mathcal{V}$ is unitary.
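The averaging construction in the proof can be tried on a finite group, where normalized Haar measure is just the uniform average over the group elements. The following is a minimal sketch under that assumption; the two-element matrix group and the test vectors are hypothetical choices made only for illustration.

```python
# Hypothetical example: the group {I, V} of 2x2 real matrices, where V*V = I
# but V does not preserve the standard inner product.
I2 = ((1.0, 0.0), (0.0, 1.0))
V = ((1.0, 1.0), (0.0, -1.0))
GROUP = (I2, V)

def apply(m, x):
    """Matrix-vector product for a 2x2 matrix m and a vector x."""
    return (m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1])

def dot(x, y):
    """The given (standard) inner product."""
    return x[0] * y[0] + x[1] * y[1]

def averaged(x, y):
    """New inner product: the average of (Wx | Wy) over all W in the group."""
    return sum(dot(apply(w, x), apply(w, y)) for w in GROUP) / len(GROUP)

x, y = (1.0, 2.0), (3.0, -1.0)
# V changes the standard inner product of x and y ...
original_gap = dot(apply(V, x), apply(V, y)) - dot(x, y)
# ... but leaves the averaged inner product unchanged.
averaged_gap = averaged(apply(V, x), apply(V, y)) - averaged(x, y)
```

Here `original_gap` is nonzero while `averaged_gap` vanishes, which is the finite-group analog of the invariance calculation in the proof.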
We may now prove
16.5.18 Theorem (Peter-Weyl). Let G be a compact topological group. Then $\mathcal{C}(G)$ is dense in $C(G)$.
Proof. By the Gelfand-Raikov theorem, $\mathcal{C} = \mathcal{C}(G)$ separates points of G. We show that $\mathcal{C}$ is closed under multiplication and complex conjugation. The desired conclusion will then follow from the Stone-Weierstrass theorem.
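The product of typical members of $\mathcal{C}$ is of the form (bold letters denote vectors, x denotes an element of G)
\[ \Big( \sum_{j=1}^n (\pi_j(x)\mathbf{x}_j \mid \mathbf{y}_j) \Big) \Big( \sum_{k=1}^m (\tilde\pi_k(x)\tilde{\mathbf{x}}_k \mid \tilde{\mathbf{y}}_k) \Big) = \sum_{j,k} (\pi_j(x)\mathbf{x}_j \mid \mathbf{y}_j)\,(\tilde\pi_k(x)\tilde{\mathbf{x}}_k \mid \tilde{\mathbf{y}}_k). \]
To prove closure under multiplication, it therefore suffices to show that if $\pi : G \to B(H)$ and $\tilde\pi : G \to B(\tilde H)$ are finite dimensional unitary representations of G, then the function $x \mapsto (\pi(x)\mathbf{x} \mid \mathbf{y})\,(\tilde\pi(x)\tilde{\mathbf{x}} \mid \tilde{\mathbf{y}})$ is a member of $\mathcal{C}$. For this we use 12.4.6, which implies that for each $x \in G$ there exists a unique bounded linear operator $\pi(x) \otimes \tilde\pi(x)$ on the finite dimensional Hilbert space $B_2(H, \tilde H)$ such that
\[ \big( (\pi(x) \otimes \tilde\pi(x))\,\mathbf{x} \otimes \tilde{\mathbf{x}} \mid \mathbf{y} \otimes \tilde{\mathbf{y}} \big) = (\pi(x)\mathbf{x} \mid \mathbf{y})\,(\tilde\pi(x)\tilde{\mathbf{x}} \mid \tilde{\mathbf{y}}). \]
This defines a unitary representation $\pi \otimes \tilde\pi$ on $B_2(H, \tilde H)$, and the product in question is one of its coefficients.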
To show that $\mathcal{C}(G)$ is closed under complex conjugation, it suffices to show that if $\pi : G \to B(H)$ is a finite dimensional unitary representation of G, then $(\mathbf{y}_0 \mid \pi(\cdot)\mathbf{x}_0)$ is a coefficient of a finite dimensional unitary representation of G. To this end, let F denote the finite dimensional subspace of $C(G)$ consisting of all functions $g_{\mathbf{x}}$ defined by
\[ g_{\mathbf{x}}(t) = (\mathbf{y}_0 \mid \pi(t)\mathbf{x}), \qquad t \in G,\ \mathbf{x} \in H. \]
Since F is right translation invariant, $t \mapsto R_t$ is a continuous representation of G on the space F. By the lemma, there exists an inner product $\langle \cdot \mid \cdot \rangle$ on F relative to which the operators $R_t$ are unitary. Since the evaluation map $\widehat{e}$ is a continuous linear functional on F, by the Riesz representation theorem there exists a member $g_{\mathbf{x}_1}$ of F such that
\[ g_{\mathbf{x}}(e) = \langle g_{\mathbf{x}} \mid g_{\mathbf{x}_1} \rangle, \qquad \mathbf{x} \in H. \]
It follows that
\[ (\mathbf{y}_0 \mid \pi(s)\mathbf{x}_0) = g_{\mathbf{x}_0}(s) = g_{\pi(s)\mathbf{x}_0}(e) = \langle g_{\pi(s)\mathbf{x}_0} \mid g_{\mathbf{x}_1} \rangle = \langle R_s g_{\mathbf{x}_0} \mid g_{\mathbf{x}_1} \rangle, \]
which shows that $(\mathbf{y}_0 \mid \pi(\cdot)\mathbf{x}_0)$ is a coefficient of the unitary representation R, completing the proof.

16.6 Locally Compact Abelian Groups


In this section, G is assumed to be abelian.

The Dual Group


A character of G is a continuous homomorphism ξ from G into the circle group T. The set of all characters is denoted by $\widehat{G}$. Two characters $\xi_1$ and $\xi_2$ may be multiplied together to form another character:
\[ (\xi_1\xi_2)(xy) = \xi_1(xy)\,\xi_2(xy) = [\xi_1(x)\xi_1(y)]\,[\xi_2(x)\xi_2(y)] = (\xi_1\xi_2)(x) \cdot (\xi_1\xi_2)(y). \]
Furthermore, the map $x \mapsto \xi^{-1}(x) = \overline{\xi(x)}$ is easily seen to define a character. Thus $\widehat{G}$ is an abelian group with identity the constant function 1. We show in this subsection that $\widehat{G}$ is locally compact under a natural topology. We use the standard notation
\[ \langle x, \xi \rangle = \xi(x), \qquad x \in G,\ \xi \in \widehat{G}. \]

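The Fourier transform $\widehat{f} : \widehat{G} \to \mathbb{C}$ of $f \in L^1(G)$ is defined by
\[ \widehat{f}(\xi) := \int \overline{\langle x, \xi \rangle}\,f(x)\,dx = \int \langle x, \xi^{-1} \rangle\,f(x)\,dx. \]
As in the case $G = \mathbb{R}^d$ (see 6.2.1),
\[ \widehat{f * g} = \widehat{f} \cdot \widehat{g} \quad\text{and}\quad \widehat{f^*} = \overline{\widehat{f}}. \tag{16.10} \]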
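Both identities in (16.10) can be checked numerically on a finite cyclic group, where every integral is a finite sum. The sketch below assumes G = Z_N with counting measure and characters x ↦ exp(2πikx/N); the value of N and the sample functions are hypothetical choices for illustration.

```python
import cmath

N = 6  # hypothetical example: G = Z_6 with counting measure as Haar measure

def pairing(x, k):
    """<x, xi_k> = exp(2*pi*i*k*x/N), the k-th character of Z_N evaluated at x."""
    return cmath.exp(2j * cmath.pi * k * x / N)

def fourier(f):
    """Fourier transform: sum over x of conj(<x, xi_k>) * f(x), for each k."""
    return [sum(pairing(x, k).conjugate() * f[x] for x in range(N)) for k in range(N)]

def convolve(f, g):
    """(f * g)(x) = sum over y of f(x - y) * g(y), with x - y taken mod N."""
    return [sum(f[(x - y) % N] * g[y] for y in range(N)) for x in range(N)]

def involution(f):
    """f*(x) = conj(f(-x))."""
    return [f[(-x) % N].conjugate() for x in range(N)]

f = [complex(v) for v in (1, 2j, 0, -1, 0.5, 3)]
g = [complex(v) for v in (2, 0, 1j, 1, -1, 0)]

f_hat, g_hat = fourier(f), fourier(g)
# Largest deviation between the transform of f * g and the product of transforms.
conv_error = max(abs(a - b * c) for a, b, c in zip(fourier(convolve(f, g)), f_hat, g_hat))
# Largest deviation between the transform of f* and the conjugate of the transform of f.
star_error = max(abs(a - b.conjugate()) for a, b in zip(fourier(involution(f)), f_hat))
```

Both `conv_error` and `star_error` are zero up to rounding, in agreement with (16.10).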

Now define $\Phi_\xi(f) = \widehat{f}(\xi)$. Clearly $|\Phi_\xi(f)| \le \|\widehat{f}\|_\infty \le \|f\|_1$. Moreover, the mapping $\xi \mapsto \Phi_\xi : \widehat{G} \to L^1(G)'$ is the restriction to $\widehat{G} \subseteq L^\infty(G)$ of the isometric isomorphism that identifies $L^\infty(G)$ with the dual of $L^1(G)$.⁴ More can be said:
16.6.1 Theorem. The function $\xi \mapsto \Phi_\xi$ is a bijection from $\widehat{G}$ onto the spectrum $\Sigma = \sigma(L^1(G))$ of the commutative Banach algebra $L^1(G)$.
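Proof. Recall that Σ is the set of continuous, nontrivial homomorphisms from $L^1(G)$ into $\mathbb{C}$, these being members of the dual of $L^1(G)$. The calculation
\begin{align*}
\Phi_\xi(f * g) &= \iint \overline{\langle x, \xi \rangle}\,f(y^{-1}x)\,g(y)\,dy\,dx = \iint \overline{\langle x, \xi \rangle}\,f(y^{-1}x)\,g(y)\,dx\,dy \\
&= \iint \overline{\langle xy, \xi \rangle}\,f(x)\,g(y)\,dx\,dy = \Phi_\xi(f)\,\Phi_\xi(g)
\end{align*}
shows that $\Phi_{\widehat{G}} \subseteq \Sigma$. For the reverse inclusion, let $\Phi \in \Sigma \subseteq L^1(G)'$ and choose $\phi \in L^\infty(G)$ (see footnote) such that
\[ \Phi(f) = \int \phi(y)\,f(y)\,dy, \qquad f \in L^1(G). \]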

4 In the non-σ-finite case, the assertion that the dual of $L^1(G)$ is $L^\infty(G)$ requires a modification of the definition of $L^\infty(G)$ using the notion of local measurability. We shall assume that $L^\infty(G)$ has been so modified (see [21]). Alternatively, the reader may simply assume in what follows that G is σ-finite.

Fix $g \in L^1(G)$ such that $\Phi(g) \ne 0$. For any $f \in L^1(G)$,
\begin{align*}
\int \phi(y)\,f(y)\,dy = \Phi(f) = \frac{\Phi(f * g)}{\Phi(g)} &= \frac{1}{\Phi(g)} \iint \phi(x)\,g(y^{-1}x)\,f(y)\,dy\,dx \\
&= \frac{1}{\Phi(g)} \iint \phi(x)\,g(y^{-1}x)\,f(y)\,dx\,dy = \frac{1}{\Phi(g)} \int \Phi(L_{y^{-1}}g)\,f(y)\,dy.
\end{align*}
Therefore, φ may be identified with, and hence replaced by, the continuous function $y \mapsto \Phi(g)^{-1}\,\Phi(L_{y^{-1}}g)$, which is a nonzero continuous homomorphism from G into $\mathbb{C}$. Since $\phi(y^n) = \phi(y)^n$ for every $n \in \mathbb{Z}$ and φ is bounded, we see that $|\phi(y)| = 1$, hence $\phi \in \widehat{G}$.
Recall that Σ is locally compact in the weak∗ (Gelfand) topology of the dual of $L^1(G)$. Let $\widehat{G}$ have the unique topology that makes the mapping $\xi \mapsto \Phi_\xi : \widehat{G} \to \Sigma$ a homeomorphism. Then $\widehat{G}$ is locally compact, and a basic neighborhood of $\xi_0 \in \widehat{G}$ is of the form
\[ V(\xi_0; f_1, \dots, f_n; \varepsilon) = \big\{ \xi \in \widehat{G} : |\widehat{f_j}(\xi) - \widehat{f_j}(\xi_0)| < \varepsilon,\ j = 1, \dots, n \big\}, \tag{16.11} \]
where $f_j \in L^1(G)$ and $\varepsilon > 0$. Thus a net $(\xi_\alpha)$ in $\widehat{G}$ converges to $\xi_0 \in \widehat{G}$ iff $\widehat{f}(\xi_\alpha) \to \widehat{f}(\xi_0)$ for all $f \in L^1(G)$. Note that, by virtue of the homeomorphism $\xi \mapsto \Phi_\xi$, the two meanings of $\widehat{f}$, one as the Gelfand transform of f and the other as the Fourier transform of f, coincide:
\[ \widehat{f}(\Phi_\xi) = \Phi_\xi(f) = \widehat{f}(\xi), \qquad \xi \in \widehat{G}. \]

From this identification and 13.5.1 we have

16.6.2 Proposition. The space of Fourier transforms of members of $L^1(G)$ is a conjugate closed subalgebra of $C_0(\widehat{G})$ that is dense in $C_0(\widehat{G})$.

We now show that $\widehat{G}$ is a topological group under the topology described in the preceding paragraph. For this it is helpful to introduce an equivalent neighborhood system on $\widehat{G}$. The following lemmas accomplish this.
16.6.3 Lemma. Every $\xi \in \widehat{G}$ is uniformly continuous. Moreover, $\langle x, \xi \rangle$ is jointly continuous in $(x, \xi) \in G \times \widehat{G}$.
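Proof. For $f \in L^1(G)$,
\[ \widehat{L_x f}(\xi) = \int f(xy)\,\overline{\xi(y)}\,dy = \int f(y)\,\overline{\xi(x^{-1}y)}\,dy = \xi(x) \int f(y)\,\overline{\xi(y)}\,dy = \xi(x)\,\widehat{f}(\xi), \]
hence if $\widehat{f}(\xi) \ne 0$, then $\xi(x) = \widehat{f}(\xi)^{-1}\,\widehat{L_x f}(\xi)$. Since
\[ \big| \widehat{L_x f}(\xi) - \widehat{L_y f}(\xi) \big| \le \|L_x f - L_y f\|_1, \]
ξ is uniformly continuous. The calculation
\[ \big| \widehat{L_x f}(\xi) - \widehat{L_{x_0} f}(\xi_0) \big| \le \|L_x f - L_{x_0} f\|_1 + \big| \widehat{L_{x_0} f}(\xi) - \widehat{L_{x_0} f}(\xi_0) \big| \]
shows that the map $(x, \xi) \mapsto \langle x, \xi \rangle$ is jointly continuous.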


16.6.4 Corollary. If $x_\alpha \to x_0$ in G, then $\langle x_\alpha, \xi \rangle \to \langle x_0, \xi \rangle$ uniformly in ξ on compact subsets of $\widehat{G}$.

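16.6.5 Lemma. The sets
\[ W(\xi_0, K, \varepsilon) := \Big\{ \xi \in \widehat{G} : \sup_{x \in K} |\langle x, \xi \rangle - \langle x, \xi_0 \rangle| < \varepsilon \Big\}, \]
where $K \subseteq G$ is compact and $\varepsilon > 0$, form a basis of open neighborhoods of $\xi_0 \in \widehat{G}$. Thus $\xi_\alpha \to \xi_0$ in $\widehat{G}$ iff $\langle x, \xi_\alpha \rangle \to \langle x, \xi_0 \rangle$ uniformly in x on each compact subset of G.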
Proof. Let $F(x, \xi) = |\langle x, \xi \rangle - \langle x, \xi_0 \rangle|$. By 16.6.3, F is continuous on $G \times \widehat{G}$. To show that $W := W(\xi_0, K, \varepsilon)$ is open in $\widehat{G}$, fix $\xi \in W$ and let $x \in K$. Then $F(x, \xi) < \varepsilon$, hence there exist an open set $U_x \subseteq G$ containing x and an open set $V_x \subseteq \widehat{G}$ containing ξ such that $F < \varepsilon$ on $U_x \times V_x$. Since K is compact, there exist $x_1, \dots, x_n \in K$ such that $K \subseteq \bigcup_j U_{x_j}$. Then $V := \bigcap_j V_{x_j}$ is a neighborhood of ξ in $\widehat{G}$ and $F < \varepsilon$ on $K \times V$, that is, $V \subseteq W(\xi_0, K, \varepsilon)$. Therefore, $W(\xi_0, K, \varepsilon)$ is open in $\widehat{G}$.
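It remains to show that every neighborhood $V(\xi_0; f_1, \dots, f_n; \delta)$ in (16.11) contains $W(\xi_0, K, \varepsilon)$ for suitable K and $\varepsilon > 0$. Since
\[ W(\xi_0, K_1 \cup K_2, \varepsilon_1 \wedge \varepsilon_2) \subseteq W(\xi_0, K_1, \varepsilon_1) \cap W(\xi_0, K_2, \varepsilon_2), \]
it suffices to show that, given $f \in L^1(G)$ and $\delta > 0$, $W(\xi_0, K, \varepsilon) \subseteq V(\xi_0; f; \delta)$ for some K and ε, that is,
\[ |\langle x, \xi \rangle - \langle x, \xi_0 \rangle| < \varepsilon \ \ \forall\, x \in K \ \Rightarrow\ |\widehat{f}(\xi) - \widehat{f}(\xi_0)| < \delta. \tag{†} \]
But for any compact $K \subseteq G$,
\[ |\widehat{f}(\xi) - \widehat{f}(\xi_0)| \le \int_K |(\xi - \xi_0) \cdot f| + \int_{K^c} |(\xi - \xi_0) \cdot f| \le \int_K |(\xi - \xi_0) \cdot f| + 2 \int_{K^c} |f|, \]
and choosing K so that the second term in the last inequality is $< \delta/2$ and taking $\varepsilon = \delta/(2\|f\|_1)$ we see that (†) holds.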
We may now prove the main result of the subsection:
16.6.6 Theorem. $\widehat{G}$ is a locally compact abelian topological group in the Gelfand topology.
Proof. All that needs to be proved is that $\widehat{G}$ is a topological group. This follows easily from the characterization of convergence given in 16.6.5: Let $\xi_\alpha \to \xi$ and $\zeta_\alpha \to \zeta$ uniformly on compact sets K. Then $\xi_\alpha^{-1} = \overline{\xi_\alpha} \to \overline{\xi} = \xi^{-1}$ uniformly on K, and from the inequality
\[ |\xi_\alpha\zeta_\alpha - \xi\zeta| \le |\xi_\alpha\zeta_\alpha - \xi\zeta_\alpha| + |\xi\zeta_\alpha - \xi\zeta| = |\xi_\alpha - \xi| + |\zeta_\alpha - \zeta| \]
we see that $\xi_\alpha\zeta_\alpha \to \xi\zeta$ uniformly on K.
The topological group $\widehat{G}$ is called the dual group of G. The following examples give concrete representations of various dual groups.
16.6.7 Examples.
(a) The dual of R is R: Every character of (R, +) is of the form $\xi_y(x) := e^{iyx}$, where $y \in \mathbb{R}$. Indeed, if ξ is a character of R, then for any $a, x \in \mathbb{R}$,
\[ \int_x^{a+x} \xi(t)\,dt = \int_0^a \xi(x + t)\,dt = \xi(x) \int_0^a \xi(t)\,dt. \]
Choosing a such that $\alpha := \int_0^a \xi(t)\,dt \ne 0$ (possible because ξ(0) = 1), we have
\[ \xi(x) = \frac{1}{\alpha} \int_x^{a+x} \xi(t)\,dt, \]
which shows that ξ is differentiable with derivative
\[ \xi'(x) = \alpha^{-1}\big[\xi(a + x) - \xi(x)\big] = \beta\,\xi(x), \qquad \beta := \alpha^{-1}[\xi(a) - 1]. \]
Therefore, $\xi = \xi_y$, where $y = \beta/i$, verifying the assertion. The mapping $y \mapsto \xi_y : \mathbb{R} \to \widehat{\mathbb{R}}$ is easily seen to be a homeomorphism and a group isomorphism. Thus (R, +) is its own dual.
(b) The dual of T is Z: Every character ξ of (T, ·) is of the form $\xi_n(z) = z^n$, where $n \in \mathbb{Z}$. Indeed, the mapping $x \mapsto \xi(e^{ix})$ is a character of R, hence $\xi(e^{ix}) = e^{ixy}$ for some $y \in \mathbb{R}$ and all x. Since $e^{2\pi i y} = \xi(e^{2\pi i}) = \xi(1) = 1$, y must be an integer n. Therefore, $\xi(e^{ix}) = (e^{ix})^n = \xi_n(e^{ix})$ for all x, verifying the assertion. Using 16.6.5, one easily sees that the topology on $\widehat{\mathbb{T}}$ is discrete and that the mapping $n \mapsto \xi_n : \mathbb{Z} \to \widehat{\mathbb{T}}$ is a group isomorphism.
(c) The dual of Z is T: Every character ξ of (Z, +) is of the form $\xi_z(n) = z^n$, where $|z| = 1$; simply take $z = \xi(1)$. The mapping $z \mapsto \xi_z : \mathbb{T} \to \widehat{\mathbb{Z}}$ is clearly a homeomorphism and a group isomorphism. ♦
Higher dimensional versions of the above examples may be obtained with the aid of the
following.
16.6.8 Proposition. Let $G_j$ be a locally compact abelian topological group ($1 \le j \le d$) and let G denote the product group $\prod_j G_j$. Then the product group $\prod_j \widehat{G_j}$ is isomorphic and homeomorphic to $\widehat{G}$ under the mapping $(\xi_1, \dots, \xi_d) \mapsto \xi_1 \otimes \cdots \otimes \xi_d$, where
\[ \langle (x_1, \dots, x_d),\ \xi_1 \otimes \cdots \otimes \xi_d \rangle := \prod_j \langle x_j, \xi_j \rangle. \]
Proof. $\xi_1 \otimes \cdots \otimes \xi_d$ is clearly a character, and an arbitrary character ξ is of this form, where
\[ \xi_j(x) := \xi(e, \dots, e, x, e, \dots, e) \quad (x \text{ in the } j\text{th position}). \]
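From the proposition and the above examples, we have the identifications
\[ \widehat{\mathbb{R}^d} \cong \mathbb{R}^d, \qquad \widehat{\mathbb{T}^d} \cong \mathbb{Z}^d, \qquad\text{and}\qquad \widehat{\mathbb{Z}^d} \cong \mathbb{T}^d. \]
Note that in each case $\widehat{\widehat{G}} \cong G$. That this holds in general is the content of the Pontrjagin Duality Theorem, proved later.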
We conclude this subsection with the following characterization of the dual of a quotient group.
16.6.9 Theorem. Let H be a closed subgroup of G, $Q : G \to G/H$ the quotient map, and set $H^\perp := \{\xi \in \widehat{G} : \xi(H) = \{1\}\}$. Then $\Psi(\zeta) := \zeta \circ Q$ defines a topological isomorphism of $\widehat{G/H}$ onto $H^\perp$.
Proof. First, $\zeta \circ Q$ is a continuous homomorphism and $(\zeta \circ Q)(H) = \{\zeta(Q(e))\} = \{1\}$, hence Ψ maps into $H^\perp$. Since $\Psi(\zeta_1\zeta_2) := (\zeta_1\zeta_2) \circ Q = (\zeta_1 \circ Q)(\zeta_2 \circ Q)$, Ψ is a homomorphism. Now let $\xi \in H^\perp$ and define ζ on G/H by $\zeta \circ Q = \xi$. Then ζ is well-defined, since $Q(x) = Q(y) \Rightarrow Q(xy^{-1}) = Q(e) \Rightarrow xy^{-1} \in H \Rightarrow \xi(x) = \xi(y)$. Also, ζ is a homomorphism into T with $\zeta(Q(e)) = 1$, and since Q is an open map, ζ is continuous. Thus Ψ maps $\widehat{G/H}$ onto $H^\perp$. Since $\Psi(\zeta) = 1$ implies $\zeta = 1$, Ψ is a group isomorphism.
It remains to show that Ψ is a homeomorphism, that is, $\zeta_\alpha \to 1$ uniformly on compact subsets of G/H iff $\zeta_\alpha \circ Q \to 1$ uniformly on compact subsets of G. The necessity is clear, since if K is compact in G, then Q(K) is compact in G/H. For the sufficiency, it suffices to show that if C is compact in G/H, then there exists a compact $K \subseteq G$ such that $Q(K) = C$. To construct K, let U be an open neighborhood of e in G with compact closure V. Since the open sets $Q(xU)$ ($x \in G$) cover C, there exist $x_1, \dots, x_n \in G$ such that $C \subseteq \bigcup_j Q(x_jU)$. Then $K := Q^{-1}(C) \cap \bigcup_j x_jV$ satisfies the requirements.

For example, by the theorem the dual of R/Z may be identified with the set of all characters on R of the form $x \mapsto e^{2\pi i n x}$ ($n \in \mathbb{Z}$), which implies that $\widehat{\mathbb{R}/\mathbb{Z}}$ is isomorphic to Z. The latter can also be seen from the fact that R/Z is topologically isomorphic to T under the map $x + \mathbb{Z} \mapsto e^{2\pi i x}$ and that the dual of T is Z.
16.6.10 Corollary. If $x \in G \setminus H$, then there exists $\xi \in H^\perp$ such that $\langle x, \xi \rangle \ne 1$.

Proof. By 16.5.9 and the Gelfand-Raikov theorem, the characters of a locally compact abelian group separate points. Thus we may choose $\zeta \in \widehat{G/H}$ such that $\zeta(xH) \ne 1$. Then $\xi := \zeta \circ Q$ has the desired properties.

Bochner’s Theorem
A function φ on G is said to be represented by $\mu \in M_{ra}(\widehat{G})$ if
\[ \phi(x) = \int \langle x, \xi \rangle\,d\mu(\xi), \qquad x \in G. \tag{16.12} \]
The theorem proved in this subsection gives necessary and sufficient conditions on φ for such a representation to exist. We shall need the following lemma.
16.6.11 Lemma. Let µ and ν be complex Radon measures on $\widehat{G}$ such that
\[ \int \langle x, \xi \rangle\,d\mu(\xi) = \int \langle x, \xi \rangle\,d\nu(\xi) \quad\text{for all } x \in G. \]
Then µ = ν.
Proof. First, note that for $f \in L^1(G)$,
\[ \iint f(x)\,\langle x, \xi \rangle\,d\mu(\xi)\,dx = \iint f(x)\,\langle x, \xi \rangle\,dx\,d\mu(\xi) = \int \widehat{f}(\xi^{-1})\,d\mu(\xi), \]
and similarly for ν. Thus $\int \widehat{f}(\xi^{-1})\,d\nu(\xi) = \int \widehat{f}(\xi^{-1})\,d\mu(\xi)$ for all $f \in L^1(G)$. Since the space of Fourier transforms is dense in $C_0(\widehat{G})$ (16.6.2), the measures µ and ν are equal.

16.6.12 Theorem (Bochner). A function φ on G is represented by $\mu \in M_{ra}(\widehat{G})$ iff $\phi \in P(G)$, in which case µ is unique. Moreover, if $\|\phi\|_\infty = 1$, then µ is a probability measure.
Proof. Uniqueness follows from 16.6.11. If (16.12) holds, then for any $f \in L^1(G)$,
\begin{align*}
\int (f^* * f)\,\phi &= \iint f(x)\,\overline{f(y)}\,\phi(y^{-1}x)\,dx\,dy = \iiint f(x)\,\overline{f(y)}\,\langle y^{-1}x, \xi \rangle\,d\mu(\xi)\,dx\,dy \\
&= \iiint f(x)\,\langle x, \xi \rangle\,\overline{f(y)\,\langle y, \xi \rangle}\,dx\,dy\,d\mu(\xi) = \int |\widehat{f}(\xi^{-1})|^2\,d\mu(\xi) \ge 0,
\end{align*}
hence φ is of positive type. That φ is continuous follows from inner regularity of µ and 16.6.4. Therefore, $\phi \in P(G)$.
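Conversely, let $\phi \in P(G)$. We may assume that $\|\phi\|_\infty = 1$. By the CBS inequality (see proof of 16.5.7),
\[ \Big| \int (g^* * f)\,\phi \Big|^2 \le \Big( \int (f^* * f)\,\phi \Big) \Big( \int (g^* * g)\,\phi \Big), \qquad f, g \in L^1(G). \tag{†} \]
Now let $\psi_V$ be an approximate identity in $L^1(G)$ and take $g = \psi_V$ in (†). Since $\|\phi\|_\infty = 1$ and $\int \psi_V^* * \psi_V = \big( \int \psi_V \big)^2 = 1$, we have
\[ \Big| \int (\psi_V^* * f)\,\phi \Big|^2 \le \int (f^* * f)\,\phi. \]
Letting $V \to e$ we obtain
\[ \Big| \int f\phi \Big| \le \Big( \int (f^* * f)\,\phi \Big)^{1/2}, \qquad f \in L^1(G). \]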

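Now set $h = f^* * f$ and $h^n = h * \cdots * h$ (n factors). Iterating the preceding inequality, noting that $h^* = h$, we have
\[ \Big| \int f\phi \Big| \le \Big( \int h\phi \Big)^{1/2} \le \Big( \int h^2\phi \Big)^{1/4} \le \cdots \le \Big( \int h^{2^n}\phi \Big)^{1/2^{n+1}} \le \|h^{2^n}\|_1^{1/2^{n+1}}. \]
By 13.5.1, $\|h^{2^n}\|_1^{1/2^{n+1}} \to \|\widehat{h}\|_\infty^{1/2} = \| |\widehat{f}|^2 \|_\infty^{1/2} = \|\widehat{f}\|_\infty$, hence
\[ \Big| \int f\phi \Big| \le \|\widehat{f}\|_\infty, \qquad f \in L^1(G). \]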
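Define a linear functional F on $L^1(G)^\wedge \subseteq C_0(\widehat{G})$ by $F(\widehat{f}) = \int f\phi$. The preceding inequality shows that F is well-defined and $|F(\widehat{f})| \le \|\widehat{f}\|_\infty$. Since $L^1(G)^\wedge$ is dense in $C_0(\widehat{G})$ (16.6.2), F has a continuous extension to $C_0(\widehat{G})$ with $\|F\| \le 1$. By the Riesz representation theorem, there exists a $\nu \in M_{ra}(\widehat{G})$ such that for all $f \in L^1(G)$,
\[ \int f\phi = F(\widehat{f}) = \int \widehat{f}\,d\nu = \iint \langle x, \xi^{-1} \rangle\,f(x)\,dx\,d\nu(\xi) = \int f(x) \int \langle x, \xi^{-1} \rangle\,d\nu(\xi)\,dx, \]
hence
\[ \phi(x) = \int \langle x, \xi^{-1} \rangle\,d\nu(\xi) = \int \langle x, \xi \rangle\,d\mu(\xi), \]
where $d\mu(\xi) := d\nu(\xi^{-1})$. Finally, from $\|\mu\| = \|\nu\| = \|F\| \le 1$ and $\mu(\widehat{G}) = \phi(e) = \|\phi\|_\infty = 1$ we see that $\mu \ge 0$.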
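The easy direction of Bochner's theorem can be observed directly on a finite cyclic group, where a measure on the dual is a list of weights. The sketch below assumes G = Z_N with counting measure; the weights and the test function c are hypothetical. It compares the quadratic form defining positive type with its spectral expression from the first part of the proof.

```python
import cmath

N = 5  # hypothetical example: G = Z_5, whose dual is again Z_5
weights = [0.5, 0.0, 0.25, 0.0, 0.25]  # hypothetical probability measure mu on the dual

def pairing(x, k):
    """<x, xi_k> = exp(2*pi*i*k*x/N)."""
    return cmath.exp(2j * cmath.pi * k * x / N)

def phi(x):
    """phi(x) = sum over k of <x, xi_k> * mu({xi_k}), the function represented by mu."""
    return sum(w * pairing(x, k) for k, w in enumerate(weights))

c = [complex(v) for v in (1, -2j, 0.5, 3, 1j)]  # plays the role of f

# Discrete form of the double integral of f(x) conj(f(y)) phi(y^{-1} x).
quadratic_form = sum(
    c[x] * c[y].conjugate() * phi(x - y) for x in range(N) for y in range(N)
)
# The same quantity written as an integral of a squared modulus against mu.
spectral_side = sum(
    w * abs(sum(c[x] * pairing(x, k) for x in range(N))) ** 2
    for k, w in enumerate(weights)
)
```

The two quantities agree up to rounding and are nonnegative, and `phi(0)` equals the total mass of the weights, matching $\phi(e) = \mu(\widehat{G})$.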

The Inversion Theorem


In this subsection we show that, for a suitable class of functions f , the Fourier transform
f 7→ fb may be inverted. The proof for the special case G = Rd given in Chapter 6 relied on
rapidly decreasing functions. As these are not available here, the proof for the general case
is based instead on functions of positive type. We begin with
16.6.13 Lemma. Let $K \subseteq \widehat{G}$ be compact. Then there exists $f \in P_c(G) := P(G) \cap C_c(G)$ such that $\widehat{f} \ge 0$ and $\widehat{f} > 0$ on K.
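Proof. Let $g = \psi^* * \psi$, where $\psi \in C_c(G)$ and $\int \psi = 1$. By translation invariance and unimodularity,
\begin{align*}
\widehat{g}(\xi) &= \int \overline{\langle x, \xi \rangle}\,g(x)\,dx = \iint \overline{\langle x, \xi \rangle}\,\overline{\psi(y^{-1})}\,\psi(y^{-1}x)\,dy\,dx = \iint \overline{\langle x, \xi \rangle}\,\overline{\psi(y)}\,\psi(yx)\,dy\,dx \\
&= \iint \overline{\langle x, \xi \rangle}\,\overline{\psi(y)}\,\psi(yx)\,dx\,dy = \iint \overline{\langle y^{-1}x, \xi \rangle}\,\overline{\psi(y)}\,\psi(x)\,dy\,dx = |\widehat{\psi}(\xi)|^2.
\end{align*}
In particular, $\widehat{g} \ge 0$ and $\widehat{g}(e) = 1$. By continuity, there exists an open neighborhood U of e in $\widehat{G}$ such that $\widehat{g} > 0$ on U. Since K is compact, there exist $\xi_j \in K$ such that $K \subseteq \bigcup_{j=1}^n \xi_jU$. Set $f = g \sum_{j=1}^n \xi_j$. Then $f \in C_c(G)$ and
\[ \widehat{f}(\xi) = \sum_{j=1}^n \int \overline{\langle x, \xi \rangle}\,\langle x, \xi_j \rangle\,g(x)\,dx = \sum_{j=1}^n \int \overline{\langle x, \xi\xi_j^{-1} \rangle}\,g(x)\,dx = \sum_{j=1}^n \widehat{g}(\xi\xi_j^{-1}), \]
hence $\widehat{f} \ge 0$ on $\widehat{G}$ and $\widehat{f} > 0$ on K. Finally, for any $\xi \in \widehat{G}$ and $h \in L^1(G)$,
\begin{align*}
\int (h^* * h)(\xi g) &= \iint \overline{h(y^{-1})}\,h(y^{-1}x)\,\xi(x)\,g(x)\,dy\,dx = \iint \overline{(\xi h)(y^{-1})}\,(\xi h)(y^{-1}x)\,g(x)\,dy\,dx \\
&= \int [(\xi h)^* * (\xi h)]\,g \ge 0,
\end{align*}
hence $\int (h^* * h)\,f \ge 0$ and so $f \in P(G)$.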
Here is the promised inversion theorem. For convenience, we indicate the property of the
function f described in the conclusion of the last lemma by writing f ∼ K.
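16.6.14 Theorem. If $f \in S := L^1(G) \cap \operatorname{span} P(G)$, then $\widehat{f} \in L^1(\widehat{G})$ and
\[ f(x) = \int \langle x, \xi \rangle\,\widehat{f}(\xi)\,d\xi, \qquad x \in G, \tag{16.13} \]
where dξ is a suitably normalized Haar measure on $\widehat{G}$.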
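Proof. We give the proof in several steps:
(1) For each $f \in S$ there exists $\mu_f \in M_{ra}(\widehat{G})$ such that $f(x) = \int \langle x, \xi \rangle\,d\mu_f(\xi)$. Moreover, $\widehat{f}\,d\mu_g = \widehat{g}\,d\mu_f$.
⟦The first assertion follows from Bochner's theorem. For the second, let $h \in L^1(G)$. Then
\[ \int \widehat{h}\,\widehat{g}\,d\mu_f = \int \widehat{h * g}\,d\mu_f = \iint \overline{\langle x, \xi \rangle}\,(h * g)(x)\,dx\,d\mu_f(\xi) = \int (h * g)(x)\,f(x^{-1})\,dx = [(h * g) * f](e). \]
Similarly, $\int \widehat{h}\,\widehat{f}\,d\mu_g = [(h * f) * g](e)$. Since $(h * g) * f = (h * f) * g$ we have
\[ \int \widehat{h}\,\widehat{g}\,d\mu_f = \int \widehat{h}\,\widehat{f}\,d\mu_g \quad\text{for all } h \in L^1(G). \]
Since the Fourier transforms $\widehat{h}$ are dense in $C_0(\widehat{G})$, $\widehat{g}\,d\mu_f = \widehat{f}\,d\mu_g$.⟧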
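(2) For $\varphi \in C_c(\widehat{G})$, by 16.6.13 choose $f \in P_c(G)$ such that $f \sim \operatorname{supp} \varphi$. Define
\[ I(\varphi) = \int \frac{\varphi}{\widehat{f}}\,d\mu_f = \int_{\operatorname{supp} \varphi} \frac{\varphi}{\widehat{f}}\,d\mu_f. \]
Then I is independent of the choice of f and is a positive linear functional on $C_c(\widehat{G})$.
⟦If also $g \sim \operatorname{supp} \varphi$, then by step (1),
\[ \int \frac{\varphi}{\widehat{f}}\,d\mu_f = \int \frac{\varphi}{\widehat{f}\,\widehat{g}}\,\widehat{g}\,d\mu_f = \int \frac{\varphi}{\widehat{f}\,\widehat{g}}\,\widehat{f}\,d\mu_g = \int \frac{\varphi}{\widehat{g}}\,d\mu_g. \]
Clearly, I is positive and $I(c\varphi) = cI(\varphi)$. To verify additivity, let $f \sim \operatorname{supp} \varphi_1 \cup \operatorname{supp} \varphi_2$. Then $f \sim \operatorname{supp} \varphi_j$, hence
\[ I(\varphi_1 + \varphi_2) = \int \frac{\varphi_1 + \varphi_2}{\widehat{f}}\,d\mu_f = \int \frac{\varphi_1}{\widehat{f}}\,d\mu_f + \int \frac{\varphi_2}{\widehat{f}}\,d\mu_f = I(\varphi_1) + I(\varphi_2). \]⟧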
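(3) For $g \in S$ and $\varphi \in C_c(\widehat{G})$, $I(\varphi\widehat{g}) = \int \varphi\,d\mu_g$. In particular, I is nontrivial.
⟦By step (1),
\[ I(\varphi\widehat{g}) = \int \frac{\varphi\,\widehat{g}}{\widehat{f}}\,d\mu_f = \int \frac{\varphi}{\widehat{f}}\,\widehat{f}\,d\mu_g = \int \varphi\,d\mu_g. \]
Now choose g and φ so that $\int \varphi\,d\mu_g \ne 0$.⟧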
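(4) I is translation invariant.
⟦Fix $\zeta \in \widehat{G}$ and set $\tau(\xi) := \xi\zeta$. For the image measure $\tau(\mu_f)$ we have
\[ \int \langle x, \xi \rangle\,d\mu_{\zeta f}(\xi) = (\zeta f)(x) = \int \langle x, \zeta\xi \rangle\,d\mu_f(\xi) = \int \langle x, \xi \rangle\,d\tau(\mu_f)(\xi), \]
hence $d\mu_{\zeta f} = d\tau(\mu_f)$ by 16.6.11. Therefore, if $\widehat{f} > 0$ on $\operatorname{supp} \varphi \cup \operatorname{supp} R_\zeta\varphi$, then
\[ I(R_\zeta\varphi) = \int \frac{\varphi(\xi\zeta)}{\widehat{f}(\xi)}\,d\mu_f(\xi) = \int \frac{\varphi(\xi)}{\widehat{f}(\xi\zeta^{-1})}\,d\mu_{\zeta f}(\xi) = \int \frac{\varphi(\xi)}{\widehat{(\zeta f)}(\xi)}\,d\mu_{\zeta f}(\xi) = I(\varphi). \]⟧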

To complete the proof of the theorem, let dξ denote the Haar measure corresponding to the linear functional I, and let $f \in S$. By step (3),
\[ \int \varphi(\xi)\,\widehat{f}(\xi)\,d\xi = \int \varphi(\xi)\,d\mu_f(\xi) \quad\text{for all } \varphi \in C_c(\widehat{G}). \]
It follows that $\widehat{f}(\xi)\,d\xi = d\mu_f(\xi)$ and $\widehat{f} \in L^1(\widehat{G})$. Recalling the defining property of $\mu_f$ in step (1), we see that (16.13) holds.
The following is a special case of the Gelfand-Raikov theorem. We give a simple independent
proof based on the preceding theorem.
16.6.15 Corollary. The characters of G separate points of G.
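Proof. Let $x, y \in G$ with $x \ne y$. Choose $f \in C_c(G)$ such that $f(x) \ne f(y)$. Since $\operatorname{span} P_c(G)$ is dense in $C_c(G)$, $g(x) \ne g(y)$ for some $g \in \operatorname{span} P_c(G)$. Since
\[ \int \langle x, \xi \rangle\,\widehat{g}(\xi)\,d\xi = g(x) \ne g(y) = \int \langle y, \xi \rangle\,\widehat{g}(\xi)\,d\xi, \]
$\langle x, \xi \rangle \ne \langle y, \xi \rangle$ for some $\xi \in \widehat{G}$.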
For a given Haar measure dx on G, the measure dξ for which the conclusion of the theorem holds is called the dual measure of dx. For example, in 6.2.4 we had the formulas
\[ \widehat{f}(\xi) = \int f(x)\,\overline{\langle x, \xi \rangle}\,dx \quad\text{and}\quad f(x) = \int \widehat{f}(\xi)\,\langle x, \xi \rangle\,d\xi, \qquad \langle x, \xi \rangle := e^{2\pi i \xi x}. \]
The map $\langle x, \xi \rangle$ identifies R with its dual, and under this identification the dual of Lebesgue measure is itself.

16.6.16 Proposition. If G is compact, then $\widehat{G}$ has the discrete topology. Moreover, if Haar measure on G is normalized so that |G| = 1, then the characters form an orthonormal set in $L^2(G) \subseteq L^1(G)$ and the dual measure is counting measure.
Proof. If |G| = 1 and $\xi \in \widehat{G}$, then $\xi \in L^1(G)$ and for all y
\[ \int \langle x, \xi \rangle\,dx = \int \langle xy, \xi \rangle\,dx = \langle y, \xi \rangle \int \langle x, \xi \rangle\,dx. \]
Thus if $\xi \ne 1$, then $\int_G \xi = 0$. It follows that $\int_G \xi\overline{\zeta} = 1$ or 0 according as $\xi = \zeta$ or $\xi \ne \zeta$, that is, the characters form an orthonormal set in $L^2(G)$.
Since the function $\phi \mapsto \int \phi$ is weak∗ continuous on C(G), $U := \{\xi \in \widehat{G} : |\int \xi - 1| < 1/2\}$ is open in $\widehat{G}$. But $\int \xi = 0$ or 1, hence U = {1}. Therefore, {1} is open, which implies that $\widehat{G}$ is discrete.
Now, if g = 1 on G, then $\widehat{g}(\xi) = \int \overline{\xi} = 1_{\{1\}}(\xi)$. Therefore, if dµ(ξ) denotes the dual measure on $\widehat{G}$, then, by the inversion theorem,
\[ 1 = g(e) = \int \langle e, \xi \rangle\,\widehat{g}(\xi)\,d\mu(\xi) = \mu(\{1\}), \]
where 1 denotes the identity of $\widehat{G}$. By translation invariance, $\mu(\{\xi\}) = 1$ for all $\xi \in \widehat{G}$. Thus µ is counting measure.
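The orthonormality statement is easy to check by hand for a finite cyclic group, which is compact and whose normalized Haar measure is counting measure divided by the group order. The following sketch assumes G = Z_N with characters x ↦ exp(2πikx/N); the choice N = 8 is hypothetical.

```python
import cmath

N = 8  # hypothetical example: G = Z_8 with Haar measure normalized so that |G| = 1

def pairing(x, k):
    """<x, xi_k> = exp(2*pi*i*k*x/N)."""
    return cmath.exp(2j * cmath.pi * k * x / N)

def inner(j, k):
    """(xi_j | xi_k) = (1/N) * sum over x of <x, xi_j> * conj(<x, xi_k>)."""
    return sum(pairing(x, j) * pairing(x, k).conjugate() for x in range(N)) / N

# Gram matrix of the characters; orthonormality means it is the identity matrix.
gram = [[inner(j, k) for k in range(N)] for j in range(N)]
max_deviation = max(
    abs(gram[j][k] - (1 if j == k else 0)) for j in range(N) for k in range(N)
)
```

Up to rounding, `gram` is the identity matrix, so `max_deviation` is numerically zero.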


Here is the dual of the preceding proposition.
16.6.17 Proposition. If G is discrete, then $\widehat{G}$ is compact. Moreover, if Haar measure on G is counting measure, then the dual measure on $\widehat{G}$ satisfies $|\widehat{G}| = 1$.

Proof. If G is discrete, then the Dirac function $\delta_e$ is an identity for $L^1(G)$, hence the spectrum $\widehat{G}$ of $L^1(G)$ is compact. If Haar measure dx on G is counting measure and $f = 1_{\{e\}}$, then
\[ \widehat{f}(\xi) = \int_G \overline{\langle x, \xi \rangle}\,f(x)\,dx = \overline{\langle e, \xi \rangle} = 1, \qquad \xi \in \widehat{G}, \]
hence
\[ 1 = f(e) = \int_{\widehat{G}} \langle e, \xi \rangle\,\widehat{f}(\xi)\,d\xi = \int_{\widehat{G}} 1\,d\xi = |\widehat{G}|. \]

For example, consider the compact group T with Haar measure dθ/2π and dual group Z with counting measure. The characters are, respectively, $\xi_n(z) = z^n$ and $\xi_\theta(n) = e^{in\theta}$; hence the inversion theorem in this setting is
\[ \widehat{f}(n) = \int_0^{2\pi} f(\theta)\,e^{-in\theta}\,\frac{d\theta}{2\pi}, \qquad f(\theta) = \sum_{n=-\infty}^{\infty} \widehat{f}(n)\,e^{in\theta}. \]
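For a trigonometric polynomial the two formulas above can be verified numerically, since an equally spaced Riemann sum computes the coefficient integral exactly (up to rounding) once the number of sample points exceeds the frequencies involved. The sketch below makes that assumption; the polynomial f, the sample count M, and the evaluation point are hypothetical.

```python
import cmath

M = 64  # number of equally spaced sample points (an arbitrary choice)

def f(theta):
    """Hypothetical trigonometric polynomial: 2 + e^{i theta} - 3 e^{-2 i theta}."""
    return 2 + cmath.exp(1j * theta) - 3 * cmath.exp(-2j * theta)

def coefficient(n):
    """Riemann-sum value of the integral of f(theta) e^{-i n theta} d(theta)/(2 pi)."""
    thetas = (2 * cmath.pi * m / M for m in range(M))
    return sum(f(t) * cmath.exp(-1j * n * t) for t in thetas) / M

# Coefficients for |n| <= 3; all other coefficients of this f vanish.
coeffs = {n: coefficient(n) for n in range(-3, 4)}

def reconstruct(theta):
    """Inversion: the sum over n of coefficient(n) * e^{i n theta}."""
    return sum(c * cmath.exp(1j * n * theta) for n, c in coeffs.items())

theta0 = 0.7
inversion_error = abs(reconstruct(theta0) - f(theta0))
```

The computed coefficients recover the values 2, 1 and −3 at n = 0, 1 and −2, and the reconstructed sum matches f at the chosen point.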

The Plancherel Theorem


The L2 properties of the Fourier transform on G are given in the following result:
16.6.18 Theorem (Plancherel). The Fourier transform $f \mapsto \widehat{f}$ on $L^1(G) \cap L^2(G)$ extends uniquely to a unitary transformation from $L^2(G)$ onto $L^2(\widehat{G})$.
Proof. Let $f \in L^1(G) \cap L^2(G)$. By 16.5.4, $f * f^* \in P(G)$, hence we may apply the inversion formula to $f * f^*$ to obtain
\[ \int |f(x)|^2\,dx = f * f^*(e) = \int \langle e, \xi \rangle\,\widehat{(f * f^*)}(\xi)\,d\xi = \int |\widehat{f}(\xi)|^2\,d\xi, \]
the last equality by (16.10). This shows that the Fourier transform is an $L^2$-isometry from $L^1(G) \cap L^2(G)$ to $L^2(\widehat{G})$. Since $L^1(G) \cap L^2(G)$ contains $C_c(G)$, which is dense in $L^2(G)$, the transform has a unique extension to an isometry T from $L^2(G)$ into $L^2(\widehat{G})$. It remains to show that T is surjective. For this it suffices to show that the image of $L^1(G) \cap L^2(G)$ under T has a trivial orthogonal complement. To this end, let $\varphi \in L^2(\widehat{G})$ with $\varphi \perp \big(L^1(G) \cap L^2(G)\big)^\wedge$. For any $x \in G$, $\xi \in \widehat{G}$, and $f \in L^1(G) \cap L^2(G)$,
\[ \widehat{(R_x f)}(\xi) = \int \overline{\langle y, \xi \rangle}\,f(yx)\,dy = \int \overline{\langle yx^{-1}, \xi \rangle}\,f(y)\,dy = \langle x, \xi \rangle \int \overline{\langle y, \xi \rangle}\,f(y)\,dy = \langle x, \xi \rangle\,\widehat{f}(\xi), \]
hence
\[ \int \varphi(\xi)\,\langle x, \xi \rangle\,\widehat{f}(\xi)\,d\xi = \int \varphi(\xi)\,\widehat{(R_x f)}(\xi)\,d\xi = 0. \]

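Since $\varphi\widehat{f} \in L^1(\widehat{G})$, $d\mu(\xi) := \varphi(\xi)\,\widehat{f}(\xi)\,d\xi$ is a complex Radon measure on $\widehat{G}$. We now have $\int \langle x, \xi \rangle\,d\mu = 0$ for all x, so by 16.6.11, µ is the zero measure. This implies that $\varphi\widehat{f} = 0$ a.e. Since this holds for all $f \in L^1(G) \cap L^2(G)$, it follows from 16.6.13 that $\varphi = 0$ a.e. on each compact subset of $\widehat{G}$. Therefore,
\[ (\varphi \mid g) = \int \varphi\,\overline{g} = 0 \quad\text{for all } g \in C_c(\widehat{G}). \]
Since $C_c(\widehat{G})$ is dense in $L^2(\widehat{G})$ (7.1.2), $\varphi = 0$ a.e. on $\widehat{G}$. Therefore, T is surjective.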

We shall use the notation $\widehat{f}$ to indicate the image of $f \in L^2(G)$ under the unitary transformation of the theorem. By the unitary property we have Parseval's formula
\[ \int f(x)\,\overline{g(x)}\,dx = \int \widehat{f}(\xi)\,\overline{\widehat{g}(\xi)}\,d\xi, \qquad f, g \in L^2(G). \tag{16.14} \]
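Parseval's formula can be illustrated on a finite cyclic group. The sketch assumes G = Z_N with counting measure, so that by 16.6.17 the dual measure has total mass 1, that is, each character carries mass 1/N. The value of N and the sample functions are hypothetical.

```python
import cmath

N = 5  # hypothetical example: G = Z_5 with counting measure

def pairing(x, k):
    """<x, xi_k> = exp(2*pi*i*k*x/N)."""
    return cmath.exp(2j * cmath.pi * k * x / N)

def fourier(f):
    """Fourier transform: sum over x of conj(<x, xi_k>) * f(x), for each k."""
    return [sum(pairing(x, k).conjugate() * f[x] for x in range(N)) for k in range(N)]

f = [complex(v) for v in (1, 2, -1j, 0.5, 3)]
g = [complex(v) for v in (0, 1j, 2, -1, 1)]

# Left side of (16.14): integral over G with respect to counting measure.
lhs = sum(a * b.conjugate() for a, b in zip(f, g))
# Right side: integral over the dual with respect to the dual measure (mass 1/N per point).
rhs = sum(a * b.conjugate() for a, b in zip(fourier(f), fourier(g))) / N
parseval_error = abs(lhs - rhs)
```

The two sides agree up to rounding. With a different normalization of Haar measure on G, the factor 1/N would move accordingly.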

16.6.19 Corollary. If G is compact with |G| = 1, then $\widehat{G}$ is an orthonormal basis for $L^2(G)$.
Proof. By 16.6.16, $\widehat{G}$ is orthonormal. To show that $\widehat{G}$ is complete, let $f \in L^2(G)$ such that $f \perp \widehat{G}$. Then $\widehat{f}(\xi) = \int f(x)\,\overline{\langle x, \xi \rangle}\,dx = 0$ for all ξ. Since $f \mapsto \widehat{f}$ is an isometry, f = 0.

The Pontrjagin Duality Theorem

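We showed earlier that the mapping $(x, \xi) \mapsto \langle x, \xi \rangle : G \times \widehat{G} \to \mathbb{T}$ is jointly continuous. Moreover, by definition of multiplication on $\widehat{G}$, for each $x \in G$, $\langle x, \xi\zeta \rangle = \langle x, \xi \rangle \langle x, \zeta \rangle$. Thus the function $\widehat{x} := \langle x, \cdot \rangle$ is a character on $\widehat{G}$, that is, $\widehat{x} \in \widehat{\widehat{G}}$. Moreover, the mapping
\[ \Phi : G \to \widehat{\widehat{G}}, \qquad \Phi(x) = \widehat{x}, \]
is a group homomorphism, since $\langle \xi, \widehat{xy} \rangle = \langle xy, \xi \rangle = \langle x, \xi \rangle \langle y, \xi \rangle = \langle \xi, \widehat{x} \rangle \langle \xi, \widehat{y} \rangle$. Since Φ is 1-1 (16.6.15), Φ is a group isomorphism of G onto Φ(G). In this section we show that $\Phi(G) = \widehat{\widehat{G}}$ and that Φ is a homeomorphism. For this we need the following: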
16.6.20 Lemma. If K is a proper closed subset of G, then there exist $\varphi_i \in C_c(G)$ such that $\varphi_1 * \varphi_2 \ge 0$, $\varphi_1 * \varphi_2 \ne 0$, and $\varphi_1 * \varphi_2 = 0$ on K.
Proof. Choose $x \in K^c$ and a symmetric neighborhood U of e such that $xUU \subseteq K^c$. Let $\varphi_i \in C_c^+(G)$ be nonzero functions such that $\operatorname{supp} \varphi_1 \subseteq xU$ and $\operatorname{supp} \varphi_2 \subseteq U$. Then $\varphi_1 * \varphi_2$ has the required properties. For example, from $\varphi_1 * \varphi_2(y) = \int_{xU} \varphi_1(z)\,\varphi_2(z^{-1}y)\,dz$ we see that if $y \in K$, then $y \notin xUU$, hence the integrand is zero over xU.
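16.6.21 Lemma. If $\phi_i \in C_c(\widehat{G})$, then $\phi_1 * \phi_2 = \widehat{f}$ for some $f \in L^1(G)$.
Proof. Set $\phi := \phi_1 * \phi_2$. For $x \in G$, define
\[ f_j(x) = \int \langle x, \xi \rangle\,\phi_j(\xi)\,d\xi \quad\text{and}\quad f(x) = \int \langle x, \xi \rangle\,\phi(\xi)\,d\xi. \]
Since $\phi_j(\xi)\,d\xi,\ \phi(\xi)\,d\xi \in M_{ra}(\widehat{G})$, $f_j$ and f are in the linear span of P(G) (16.6.12). Moreover, for any $g \in L^1(G) \cap L^2(G)$ we have
\[ \int_G f_j\,\overline{g} = \iint \langle x, \xi \rangle\,\phi_j(\xi)\,\overline{g(x)}\,d\xi\,dx = \int \phi_j\,\overline{\widehat{g}}. \]
By the CBS inequality and the Plancherel theorem, the absolute value of the last term is $\le \|\phi_j\|_2\,\|g\|_2$. Since g was arbitrary and $L^1(G) \cap L^2(G)$ is dense in $L^2(G)$, $f_j \in L^2(G)$. Since
\[ f(x) = \iint \langle x, \xi \rangle\,\phi_1(\xi\zeta^{-1})\,\phi_2(\zeta)\,d\zeta\,d\xi = \iint \langle x, \xi\zeta \rangle\,\phi_1(\xi)\,\phi_2(\zeta)\,d\xi\,d\zeta = f_1(x)\,f_2(x), \]
$f \in L^1(G)$. Since also $f \in \operatorname{span} P(G)$, by the inversion theorem $f(x) = \int \langle x, \xi \rangle\,\widehat{f}(\xi)\,d\xi$. Thus
\[ \int \langle x, \xi \rangle\,\phi(\xi)\,d\xi = f(x) = \int \langle x, \xi \rangle\,\widehat{f}(\xi)\,d\xi \quad\text{for all } x \in G \]
and so by 16.6.11, $\widehat{f} = \phi$.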
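16.6.22 Theorem (Pontrjagin). The mapping $\Phi : x \mapsto \langle x, \cdot \rangle$ is a homeomorphism and group isomorphism from G onto $\widehat{\widehat{G}}$.
Proof. Let $x_\alpha \to x$ in G. Since $\langle x_\alpha, \xi \rangle \to \langle x, \xi \rangle$ uniformly in ξ on compact subsets of $\widehat{G}$ (16.6.4), $\Phi(x_\alpha) \to \Phi(x)$ in $\widehat{\widehat{G}}$. Conversely, let $\Phi(x_\alpha) \to \Phi(x)$ in $\widehat{\widehat{G}}$. By 16.6.5,
\[ \langle x_\alpha, \xi \rangle = \langle \xi, \widehat{x_\alpha} \rangle \to \langle \xi, \widehat{x} \rangle = \langle x, \xi \rangle \]
uniformly in ξ on compact subsets of $\widehat{G}$. Thus for all $f \in \operatorname{span} P_c(G)$,
\[ f(x_\alpha) = \int \langle x_\alpha, \xi \rangle\,\widehat{f}(\xi)\,d\xi \to \int \langle x, \xi \rangle\,\widehat{f}(\xi)\,d\xi = f(x). \]
Since $\operatorname{span} P_c(G)$ is dense in $C_c(G)$ (16.5.5), $f(x_\alpha) \to f(x)$ for all $f \in C_c(G)$. This implies that $x_\alpha \to x$. Otherwise, there would exist a compact neighborhood U of x such that $x_\alpha$ is frequently in $U^c$, and we obtain a contradiction by choosing $f \in C_c(G)$ such that f(x) = 1 and f = 0 on $U^c$. Therefore, Φ is a homeomorphism of G onto Φ(G).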
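It remains to show that $\Phi(G) = \widehat{\widehat{G}}$. Now, because Φ is a homeomorphism and group isomorphism, Φ(G) is a locally compact subgroup of $\widehat{\widehat{G}}$ and hence is closed (16.1.2). Suppose for a contradiction that $\Phi(G) \subsetneq \widehat{\widehat{G}}$. By 16.6.20, there exists a nonzero convolution $\varphi := \varphi_1 * \varphi_2$ of functions $\varphi_j \in C_c(\widehat{\widehat{G}})$ that vanishes identically on Φ(G). By 16.6.21, $\varphi = \widehat{f}$ for some $f \in L^1(\widehat{G})$. In particular, for all $x \in G$,
\[ 0 = \varphi(\widehat{x}) = \int_{\widehat{G}} \overline{\langle \xi, \widehat{x} \rangle}\,f(\xi)\,d\xi = \int_{\widehat{G}} \overline{\langle x, \xi \rangle}\,f(\xi)\,d\xi. \]
But then by 16.6.11 (applied with $x^{-1}$ in place of x), f = 0, producing the contradiction $\varphi \equiv 0$. Therefore, $\Phi(G) = \widehat{\widehat{G}}$, completing the proof.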

Here is an application, the analog of 16.6.9.

16.6.23 Theorem. If H is a closed subgroup of G, then $\widehat{H}$ is topologically isomorphic to $\widehat{G}/H^\perp$ and $\widehat{H} = \widehat{G}\big|_H$.

Proof. Let $Q : \widehat{G} \to \widehat{G}/H^\perp$ denote the quotient map. By 16.6.9 applied to $\widehat{G}/H^\perp$, $(\widehat{G}/H^\perp)^\wedge$ is topologically isomorphic to $H^{\perp\perp}$ under the mapping $F \mapsto F \circ Q$. By 16.6.10, H is topologically isomorphic to $H^{\perp\perp}$ under duality. Therefore, H is topologically isomorphic to $(\widehat{G}/H^\perp)^\wedge$ under a mapping $x \mapsto F_x$, where $F_x \circ Q = \widehat{x}$. It follows that $\widehat{H}$ is topologically isomorphic to $(\widehat{G}/H^\perp)^{\wedge\wedge}$ under a mapping $\zeta \mapsto T_\zeta$, where $\zeta(x) = T_\zeta(F_x)$. By the duality theorem, $T_\zeta = \widehat{Q(\xi)}$ for some $\xi \in \widehat{G}$. Thus $\zeta(x) = \widehat{Q(\xi)}(F_x) = F_x(Q(\xi)) = \xi(x)$ ($x \in H$), that is, $\zeta = \xi\big|_H$.
Chapter 17
Analysis on Semigroups

In this chapter we study representations of semigroups with a topology. Some of the results here rely on, and indeed may be seen as extensions of, results in the Fourier analysis of groups discussed in the last chapter. In particular, compact topological groups and unitary representations play a central role.
Much of the material in this chapter is based on the papers [10], [11] and [16]. Generalizations and additional material, as well as detailed references, may be found in [4] and [41].

17.1 Semigroups with Topology


The underlying object of study in this chapter is the semitopological semigroup,
defined as a semigroup S with a topology relative to which multiplication (s, t) → st is
separately continuous. As the structure of a semitopological semigroup is not as rich as
that of a locally compact topological group, one must rely on extrinsic techniques, notably
functional analytic. In particular, the Gelfand theory applied to various C ∗ -algebras of
functions on S is an important tool. For this we shall initially rely on the left and right
translation operators on Cb (S). These are defined exactly as in the group case, as are the
notions of left translation invariance, right translation invariance, and translation invariance
of subsets of Cb (S). Note that the hypothesis of separate continuity of multiplication in S
implies that $C_b(S)$ is itself translation invariant. The sets $L_Sf$ and $R_Sf$ of left and right translates of f are left and right translation invariant, respectively, as may be seen from the inclusions
\[ L_tL_Sf = L_{St}f \subseteq L_Sf \quad\text{and}\quad R_tR_Sf = R_{tS}f \subseteq R_Sf. \]
Much of the material in the chapter depends on the following result obtained in 13.4.2. Given a unital $C^*$-subalgebra F of $C_b(S)$ with spectrum $S^F$, the evaluation mapping $\widehat{s}$ on F defined by
\[ \widehat{s}(f) := f(s), \qquad f \in F, \]
is a member of $S^F$, and the canonical mapping
\[ \iota = \iota_F : S \to S^F, \qquad \iota(s) = \widehat{s}, \]
is continuous with dense range and satisfies $\iota^*\big(C(S^F)\big) = F$.
In the sequel we shall also need the notions of semitopological group and topological semigroup. The former is a group with separately continuous multiplication (inversion is not assumed to be continuous) and the latter is a semigroup with a topology relative to which multiplication is jointly continuous.

For the remainder of the chapter, unless otherwise stated,


S denotes an arbitrary semitopological semigroup.


17.2 Weakly Almost Periodic Functions


Definition and Basic Properties
A function f ∈ Cb (S) is said to be weakly almost periodic if RS f is relatively weakly
compact in Cb (S). The set of all weakly almost periodic functions on S is denoted by
W AP (S). For example, if S is a compact semitopological semigroup, then W AP (S) = C(S),
as may be seen from the pointwise continuity of the mapping s → Rs f and 14.1.4.
17.2.1 Theorem. $WAP(S)$ is a translation invariant unital $C^*$-subalgebra of $C_b(S)$.
Proof. The relations

$R_S(cf + g) \subseteq cR_S f + R_S g$, $R_s \bar f = \overline{R_s f}$, $R_s R_S f = R_{sS} f \subseteq R_S f$, $R_S L_s f = L_s R_S f$, (17.1)

show that $WAP(S)$ is a conjugate closed translation invariant subspace of $C_b(S)$. Since $R_S(fg) \subseteq (R_S f)(R_S g)$, to show that $WAP(S)$ is an algebra it suffices to prove that the product $AB = \{fg : f \in A, g \in B\}$ of weakly compact subsets A and B of $C_b(S)$ is weakly compact.
Now, by 13.4.2, Cb (S) is (canonically) isometrically isomorphic to C(βS), where βS is the
spectrum of Cb (S). The images of A and B in C(βS) are then weakly compact in C(βS),
and the assertion follows easily from the equivalence of pointwise and weak compactness in
C(βS) (14.1.1).
To show that W AP (S) is closed, let fn ∈ W AP (S) and fn → f ∈ Cb (S). We show that an
arbitrary sequence (Rsk f ) of right translates of f has a weakly convergent subsequence. It will
follow from the Eberlein-S̆mulian theorem that RS f is relatively weakly compact, as required.
Now, since each $R_S f_n$ is relatively weakly compact, a standard diagonal argument produces a subsequence $(t_k)$ of $(s_k)$ and a sequence $(g_n)$ in $C_b(S)$ such that $g_n = \text{w-}\lim_k R_{t_k} f_n$ for all n. For any $\varphi \in C_b(S)'$ with $\|\varphi\| \le 1$ we then have

$|\varphi(g_n) - \varphi(g_m)| = \lim_k |\varphi(R_{t_k}(f_n - f_m))| \le \|f_n - f_m\|_\infty$,

hence $\|g_n - g_m\|_\infty \le \|f_n - f_m\|_\infty$. Thus $(g_n)$ converges in norm to some $g \in C_b(S)$. The inequality

$|\varphi(R_{t_k} f - g)| \le |\varphi(R_{t_k} f - R_{t_k} f_n)| + |\varphi(R_{t_k} f_n - g_n)| + |\varphi(g_n - g)| \le \|f_n - f\|_\infty + |\varphi(R_{t_k} f_n - g_n)| + \|g_n - g\|_\infty$

then shows that $\varphi(R_{t_k} f) \to \varphi(g)$. Therefore, $R_{t_k} f \xrightarrow{w} g$, as required.
17.2.2 Proposition. Let T be a semitopological semigroup and θ : S → T a continuous homomorphism. Then $\theta^*(WAP(T)) \subseteq WAP(S)$, where $\theta^* : C(T) \to C(S)$ is the dual map. In particular, if S is a subsemigroup of T, then $WAP(T)|_S \subseteq WAP(S)$.
Proof. The first assertion follows from the weak continuity of $\theta^*$ and the identity $R_s \theta^*(g) = \theta^*(R_{\theta(s)} g)$. For the second, take θ to be the inclusion map.

The Dual of the Space of Weakly Almost Periodic Functions


We now give the dual space $WAP(S)'$ a multiplication that makes it a Banach algebra. For this we introduce the following operators on $WAP(S)$: Given $\varphi \in WAP(S)'$ define $R_\varphi : WAP(S) \to B(S)$ by

$(R_\varphi f)(s) = \varphi(L_s f)$, $f \in WAP(S)$, $s \in S$.



Then $R_\varphi$ is obviously linear and satisfies

$\|R_\varphi\| \le \|\varphi\|$, $R_{\hat t} = R_t$, and $L_t R_\varphi = R_\varphi L_t$.

Additional properties of Rϕ are given in the following lemma.


17.2.3 Lemma. Let $C_1'$ denote the closed unit ball in $WAP(S)'$, $M := M(WAP(S))$ the set of means on $WAP(S)$, and $X := S^{WAP}$ the spectrum of $WAP(S)$. Let $f \in WAP(S)$.

(a) $R_X f$ is the weak closure of $R_S f$ and is weakly compact.

(b) $R_M f$ is the weakly closed convex hull of $R_S f$ and is weakly compact.

(c) $R_{C_1'} f$ is the weakly closed convex balanced hull of $R_S f$ and is weakly compact.

(d) $R_\varphi WAP(S) \subseteq WAP(S)$ for all $\varphi \in WAP(S)'$.

(e) The mapping $\varphi \mapsto R_\varphi f$ from $WAP(S)'$ into $WAP(S)$ is weak*-weak continuous.

Proof. For part (a), recall that X is the weak* closure of the set of mappings $\hat s : f \mapsto f(s)$ (13.4.2). Now let $\varphi \in X$ and let $\hat t_\alpha \to \varphi$ in the weak* topology of $WAP(S)'$. Then for each $s \in S$,
$R_{t_\alpha} f(s) = \hat t_\alpha(L_s f) \to \varphi(L_s f) = R_\varphi f(s)$,
that is, $R_{t_\alpha} f \to R_\varphi f$ pointwise. Since $R_S f$ is relatively weakly compact, the convergence is also in the weak topology, proving (a).
For (b), recall from 10.2.7 that the convex hull of the set $\{\hat s : s \in S\}$ is weak*-dense in M. Thus for each $\varphi \in M$ there exists a net of convex sums $\sum_j c_j^\alpha \hat s_j^\alpha$ such that $\sum_j c_j^\alpha g(s_j^\alpha) \to \varphi(g)$ for all $g \in WAP(S)$. Taking $g = L_s f$ we have the pointwise convergence

$\sum_j c_j^\alpha R_{s_j^\alpha} f(s) = \sum_j c_j^\alpha f(s s_j^\alpha) \to \varphi(L_s f) = R_\varphi f(s)$, $s \in S$.

By the Krein-Šmulian theorem, the convex hull of $R_S f$ is relatively weakly compact, hence the convergence of $\sum_j c_j^\alpha R_{s_j^\alpha} f$ is also in the weak topology. Thus $R_M f$ is the weak closure of the convex hull of $R_S f$ and hence is weakly compact, proving (b). Similar arguments prove (c) (see 10.30).
Part (d) follows from (c) and right translation invariance of $WAP(S)$.
For (e), it suffices by 10.2.10 to prove that the restriction of the mapping to $C_1'$ is weak*-weak continuous. But this follows because the mapping $\varphi \mapsto R_\varphi f : C_1' \to WAP(S)$ is w*-pointwise continuous and the range $R_{C_1'} f$ is weakly compact by (c).

By (c) of the lemma, we have


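17.2.4 Corollary. Let $\mathcal F$ be a translation invariant, norm closed, conjugate closed linear subspace of $WAP(S)$. Then $R_\varphi \mathcal F \subseteq \mathcal F$ for all $\varphi \in WAP(S)'$.
Now define multiplication $(\varphi_1, \varphi_2) \mapsto \varphi_1 \cdot \varphi_2$ in $WAP(S)'$ by

$(\varphi_1 \cdot \varphi_2)(f) = \varphi_1(R_{\varphi_2} f)$, $f \in WAP(S)$. (17.2)

The calculation $R_{\varphi_1 \cdot \varphi_2} f(s) = (\varphi_1 \cdot \varphi_2)(L_s f) = \varphi_1(R_{\varphi_2} L_s f) = \varphi_1(L_s R_{\varphi_2} f) = R_{\varphi_1}(R_{\varphi_2} f)(s)$ shows that

$R_{\varphi_1 \cdot \varphi_2} = R_{\varphi_1} R_{\varphi_2}$. (17.3)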
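For a finite semigroup with the discrete topology, every function is weakly almost periodic and every functional is given by a weight vector, so (17.2) and (17.3) can be checked by direct computation. The following Python sketch does this for the integers mod 4 under multiplication. The choice of semigroup, the weight-vector encoding, and all function names are assumptions made for illustration only.

```python
# Finite model (assumed example): S = {0, 1, 2, 3} under multiplication mod 4.
# A functional phi is stored as a weight list: phi(f) = sum_s phi[s] * f(s).
N = 4
S = range(N)

def op(s, t):
    """Semigroup operation."""
    return (s * t) % N

def left_translate(s, f):
    """(L_s f)(t) = f(st)."""
    return lambda t: f(op(s, t))

def pair(phi, f):
    """Evaluate the functional phi at the function f."""
    return sum(phi[s] * f(s) for s in S)

def R(phi, f):
    """(R_phi f)(s) = phi(L_s f)."""
    return lambda s: pair(phi, left_translate(s, f))

def indicator(t):
    """Indicator function of the point t."""
    return lambda u: 1 if u == t else 0

def mult(phi1, phi2):
    """Weights of phi1 . phi2, where (phi1 . phi2)(f) = phi1(R_phi2 f) as in (17.2)."""
    return [pair(phi1, R(phi2, indicator(t))) for t in S]

def delta(s):
    """Evaluation functional at s."""
    return [1 if u == s else 0 for u in S]

a, b, c = [1, 2, 0, -1], [0, 1, 1, 3], [2, 0, -1, 1]
f = lambda u: u * u + 1

# (17.3): R_{a.b} f and R_a(R_b f) agree at every point of S.
lhs = [R(mult(a, b), f)(s) for s in S]
rhs = [R(a, R(b, f))(s) for s in S]
```

In this model the product is ordinary convolution of weight vectors, and the product of two evaluation functionals is the evaluation at the product of the points.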



17.2.5 Theorem. Under multiplication defined in (17.2), $WAP(S)'$ is a Banach algebra and a semitopological semigroup in the weak* topology. Moreover, in the notation of 17.2.3, $C_1'$, M, and $S^{WAP}$ are weak* compact subsemigroups of $WAP(S)'$.
Proof. Associativity follows from (17.3):

$((\varphi_1 \cdot \varphi_2) \cdot \varphi_3)(f) = (\varphi_1 \cdot \varphi_2)(R_{\varphi_3} f) = \varphi_1(R_{\varphi_2} R_{\varphi_3} f) = \varphi_1(R_{\varphi_2 \cdot \varphi_3} f) = (\varphi_1 \cdot (\varphi_2 \cdot \varphi_3))(f)$.

It is immediate from the definition of multiplication that $(\varphi_1 + \varphi_2) \cdot \varphi_3 = \varphi_1 \cdot \varphi_3 + \varphi_2 \cdot \varphi_3$. Since $R_{\varphi_2 + \varphi_3} = R_{\varphi_2} + R_{\varphi_3}$, we also have $\varphi_1 \cdot (\varphi_2 + \varphi_3) = \varphi_1 \cdot \varphi_2 + \varphi_1 \cdot \varphi_3$. The remaining properties of normed algebra multiplication are clear.
That $WAP(S)'$ is a semitopological semigroup in the weak* topology may be seen directly from the definition of multiplication by applying 17.2.3(e). The last assertions of the theorem follow from the fact that the sets M, $C_1'$, and $S^{WAP}$ are closed under the multiplication $\varphi_1 \cdot \varphi_2$. Indeed, this is obviously the case for $\varphi_j = \hat s_j$, and taking various iterated weak* limits proves the general case.

The Weakly Almost Periodic Compactification

The following theorem summarizes the general properties of the spectrum $S^{WAP}$ of $WAP(S)$ and the canonical map $\iota = \iota_{WAP} : S \to S^{WAP}$, $\iota(s) = \hat s$.
17.2.6 Theorem. $S^{WAP}$ is a compact semitopological semigroup and ι is a continuous homomorphism onto a dense subset of $S^{WAP}$ such that $\iota^*(C(S^{WAP})) = WAP(S)$.
Proof. By 13.4.2, ι is a continuous mapping onto a dense subset of S W AP such that the
asserted equality holds. Moreover, for f ∈ W AP (S) and s1 , s2 ∈ S we have

ι(s1 ) · ι(s2 )(f ) = ι(s1 )(Rι(s2 ) f ) = (Rι(s2 ) f )(s1 ) = ι(s2 )(Ls1 f ) = f (s1 s2 ) = ι(s1 s2 )(f ),
hence ι is a homomorphism.
The pair (ι, S W AP ) is called the weakly almost periodic compactification of S. A
key feature of this compactification is the following extension property:
17.2.7 Theorem. Given a continuous homomorphism θ from S into a semitopological semigroup T, there exists a continuous homomorphism $\tilde\theta : S^{WAP} \to T^{WAP}$ such that the following diagram commutes:

$\begin{array}{ccc} S^{WAP} & \xrightarrow{\ \tilde\theta\ } & T^{WAP} \\ {\scriptstyle\iota}\uparrow & & \uparrow{\scriptstyle\iota} \\ S & \xrightarrow{\ \theta\ } & T \end{array}$

Proof. By 17.2.2, $\theta^*$ maps $WAP(T)$ into $WAP(S)$. Thus the assertion is an immediate consequence of 13.4.2.
17.2.8 Corollary. A function f ∈ Cb (S) is weakly almost periodic iff LS f is relatively
weakly compact.
Proof. Let $f \in WAP(S)$. Then the mapping $x \mapsto L_x \hat f$ on $X := S^{WAP}$ is pointwise continuous, hence weakly continuous by 14.1.4. Therefore $L_X \hat f$ is weakly compact in $C(X)$ and so $L_S f$ is relatively weakly compact in $WAP(S)$. The converse may be proved by considering the reverse semigroup obtained from S by reversing the order of multiplication.
Since a compact semitopological semigroup is its own W AP compactification, we have
17.2.9 Corollary. If T is a compact semitopological semigroup and θ : S → T is a continuous homomorphism, then there exists a continuous homomorphism $\tilde\theta : S^{WAP} \to T$ such that $\tilde\theta \circ \iota_{WAP} = \theta$.
17.2.10 Corollary (Eberlein). If G is a locally compact group, then every member of
W AP (G) is uniformly continuous.
Proof. Let $X = G^{WAP}$ and $f \in WAP(G)$. We show that the maps $s \mapsto L_s f$ and $s \mapsto R_s f$ are norm continuous at e.
Let ε > 0. Since the function $F(x, y) = \hat f(xy)$ is separately continuous, there exists a point $x_0 \in X$ such that F is jointly continuous at all points of $\{x_0\} \times X$ (B.0.14). By compactness of X, $x \mapsto L_x \hat f$ is norm continuous at $x_0$ (B.0.9). Therefore, there is an open neighborhood U of $x_0$ in X such that $\|L_x \hat f - L_{x_0} \hat f\|_\infty < \varepsilon/2$ for all $x \in U$. Since $\iota(G)$ is dense in X, $V := \iota^{-1}(U)$ is nonempty and open. Choose any $t \in V$. Then $V t^{-1}$ is a neighborhood of e in G, and for $s \in V t^{-1}$ we have

$\|L_s f - f\|_\infty = \|L_{st} f - L_t f\|_\infty \le \|L_{\widehat{st}} \hat f - L_{x_0} \hat f\|_\infty + \|L_{x_0} \hat f - L_{\hat t} \hat f\|_\infty < \varepsilon$.

Thus $s \mapsto L_s f$ is norm continuous at e. Similarly $s \mapsto R_s f$ is norm continuous at e.


The converse of 17.2.10 is false. (See example below.)

17.2.11 Corollary. If G is a noncompact, locally compact group, then $C_0(G) \subseteq WAP(G)$. In particular, if G is abelian, then the Fourier transform $\hat f$ of $f \in L^1(G)$ is weakly almost periodic.
Proof. Let X = G ∪ {∞} denote the one-point compactification of G. Extend multiplication in G to X by defining x · ∞ = ∞ · x = ∞ for all x. If $s_\alpha, s \in G$ and $s_\alpha \to \infty$, then for any compact K ⊆ G, eventually $s s_\alpha \notin K$, hence $s s_\alpha \to \infty$. Similarly, $s_\alpha s \to \infty$. Thus X is a compact semitopological semigroup. By 17.2.9, the inclusion map $\theta : G \hookrightarrow X$ extends to a continuous homomorphism $\tilde\theta : G^{WAP} \to X$ with $\theta = \tilde\theta \circ \iota_{WAP}$. Considering dual maps we then have

$C_0(G) \subseteq \theta^*(C(X)) = (\iota_{WAP}^* \circ \tilde\theta^*)(C(X)) \subseteq \iota_{WAP}^*(C(G^{WAP})) = WAP(G)$.

17.2.12 Corollary. The coefficients of a unitary representation π : G → B(H) of a locally compact group G are weakly almost periodic. In particular, every continuous function of positive type is weakly almost periodic.
Proof. Let π : G → B(H) be a unitary representation. Now, B(H) is a semitopological semigroup in the weak operator topology, that is, the operation $(T, S) \mapsto TS$ of composition is separately continuous in this topology. Indeed, if $T_\alpha \to T$ in that topology, then for all x and x′,

$\langle T_\alpha S x, x' \rangle \to \langle TSx, x' \rangle$ and $\langle S T_\alpha x, x' \rangle = \langle T_\alpha x, S^* x' \rangle \to \langle Tx, S^* x' \rangle = \langle STx, x' \rangle$,

hence $T_\alpha S \to TS$ and $S T_\alpha \to ST$ in the weak operator topology. Thus, by the extension property 17.2.9 applied to the closed unit ball of B(H), which contains π(G) and is a compact semitopological semigroup in the weak operator topology, there exists a continuous homomorphism $\tilde\pi : G^{WAP} \to B(H)$ extending π. Then any coefficient $(\pi(\cdot)x \mid y)$ extends to a continuous function $(\tilde\pi(\cdot)x \mid y)$ on $G^{WAP}$ and so is weakly almost periodic. The last assertion follows from 16.5.7.
From 17.2.6 we see that if f ∈ W AP (S) then |f | ∈ W AP (S). The converse is false:

17.2.13 Example. Let $f(x) = \tan^{-1}(x)$. By 17.2.11, |f| is weakly almost periodic on S := (R, +). On the other hand, while f is uniformly continuous, it is not weakly almost periodic. To see the latter, choose a subnet $(n_\alpha)$ of the sequence (1, 2, . . .) such that the limits $x = \lim_\alpha \iota(n_\alpha)$ and $y = \lim_\alpha \iota(-n_\alpha)$ exist in $S^{WAP}$. If $f \in WAP(S)$ we then have the contradiction

$-\frac{\pi}{2} = \lim_\alpha \lim_\beta f(n_\alpha - n_\beta) = \lim_\alpha \lim_\beta \hat f(\iota(n_\alpha)\iota(-n_\beta)) = \lim_\alpha \hat f(\iota(n_\alpha) y) = \hat f(xy)$, and
$\frac{\pi}{2} = \lim_\beta \lim_\alpha f(n_\alpha - n_\beta) = \lim_\beta \lim_\alpha \hat f(\iota(n_\alpha)\iota(-n_\beta)) = \lim_\beta \hat f(x\,\iota(-n_\beta)) = \hat f(xy)$. ♦
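The failure of the two iterated limits to agree can be seen numerically. The sketch below replaces the limits by fixed large integers (the constants LARGE and HUGE are arbitrary stand-ins, not quantities from the text), so it illustrates the example rather than proving it.

```python
import math

def f(x):
    """The function of Example 17.2.13."""
    return math.atan(x)

# Stand-ins for "n large" and "n much larger"; both values are arbitrary.
LARGE, HUGE = 10**3, 10**9

# Inner limit in beta first: n_beta dominates, so f(n_alpha - n_beta) is near -pi/2.
beta_first = f(LARGE - HUGE)

# Inner limit in alpha first: n_alpha dominates, so f(n_alpha - n_beta) is near +pi/2.
alpha_first = f(HUGE - LARGE)
```

The two values differ by nearly π, whereas for a weakly almost periodic function both iterated limits would have to equal the single value of the extension at xy.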

Invariant Means on Weakly Almost Periodic Functions


A mean m on a translation invariant, conjugate closed, unital subspace $\mathcal F$ of $C_b(S)$ is said to be left invariant if $m(L_s f) = m(f)$ for all $f \in \mathcal F$ and $s \in S$. The notion of right invariant mean is defined analogously. The set of left (respectively, right) invariant means is denoted by $M_\ell(\mathcal F)$ (respectively, $M_r(\mathcal F)$). A member of the intersection $M_\ell(\mathcal F) \cap M_r(\mathcal F)$ is called an invariant mean. It is easy to check that, if nonempty, these sets are convex and weak* closed. For example, if S is commutative, then $C_b(S)$ has an invariant mean (8.6.2), and the same holds if S is a compact Hausdorff topological group, since then the integral with respect to normalized Haar measure is an invariant mean.
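In the simplest compact case, a finite cyclic group with normalized counting measure as Haar measure, invariance of the mean is a finite computation. The following sketch checks it with exact rational arithmetic; the group Z_6, the test function, and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

N = 6  # the cyclic group Z_6 under addition mod 6 (assumed example)

def mean(f):
    """Integral of f against normalized counting (Haar) measure on Z_N."""
    return sum(Fraction(f(k)) for k in range(N)) / N

def left_translate(s, f):
    """(L_s f)(t) = f(s + t), with addition taken mod N."""
    return lambda t: f((s + t) % N)

f = lambda k: k * k - 3 * k

# The mean of every translate of f; each entry should equal mean(f).
values = [mean(left_translate(s, f)) for s in range(N)]
```

Since the group is commutative, left and right translates coincide here, so the same computation covers right invariance.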
Note that a mean m on $WAP(S)$ induces a mean on $C(S^{WAP})$, which in turn induces a probability measure µ on $S^{WAP}$:

$m(f) = \int_{S^{WAP}} \hat f(x)\,d\mu(x)$, $f \in WAP(S)$.

Thus for $y \in S^{WAP}$,

$m(R_y f) = \int_{S^{WAP}} \widehat{R_y f}(x)\,d\mu(x) = \int_{S^{WAP}} x(R_y f)\,d\mu(x) = \int_{S^{WAP}} (xy)(f)\,d\mu(x) = \int_{S^{WAP}} \hat f(xy)\,d\mu(x)$,

from which we conclude that $m \in M_r$ iff µ is a right invariant measure on $S^{WAP}$. The left version follows by considering the reverse semigroup of S.
The following result makes a connection between the invariance of means on W AP (S)
and multiplication in the Banach algebra W AP (S)0 .
17.2.14 Proposition. Let m be a mean on $WAP(S)$. Then $m \in M_\ell$ ($m \in M_r$) iff $\varphi \cdot m = m$ ($m \cdot \varphi = m$) for all $\varphi \in WAP(S)'$.
Proof. For any $f \in WAP(S)$ and $s \in S$,
$(\hat s \cdot m)(f) = \hat s(R_m f) = (R_m f)(s) = m(L_s f)$ and $(m \cdot \hat s)(f) = m(R_{\hat s} f) = m(R_s f)$.
Therefore, $m \in M_\ell$ ($m \in M_r$) iff $\hat s \cdot m = m$ ($m \cdot \hat s = m$) for all $s \in S$. The desired equivalence then follows by taking suitable limits, noting that $C_1'$ is the weak*-closed convex balanced hull of $\delta_S$ (Ex. 10.30).
17.2.15 Corollary. If W AP (S) has a left invariant mean and a right invariant mean, then
it has an invariant mean.
Proof. Let $m_\ell$ be a left invariant mean, $m_r$ a right invariant mean, and set $m := m_r \cdot m_\ell$. By 17.2.14,
$\varphi \cdot m = (\varphi \cdot m_r) \cdot m_\ell = m_\ell = m_r \cdot m_\ell = m$
and similarly
$m \cdot \varphi = m_r \cdot (m_\ell \cdot \varphi) = m_r = m_r \cdot m_\ell = m$,
hence m is an invariant mean.

17.2.16 Theorem. $WAP(S)$ has a left invariant mean iff for each $f \in WAP(S)$ the set $C(f) := \mathrm{cl}_w\,\mathrm{co}(R_S f)$ contains a constant function. The analogous assertion holds for right invariant means.
Proof. By 17.2.3(b), $C(f) = R_M f$. If m is a left invariant mean for $WAP(S)$, then $R_m f(s) = m(L_s f) = m(f)$ for all s, hence $R_m f$ is the required constant function.
Conversely, assume that $R_M f$ contains a constant function $R_{\mu_f} f$ for each $f \in WAP(S)$. Then for each $s \in S$, $R_{\mu_f}(L_s f) = L_s R_{\mu_f} f = R_{\mu_f} f$, hence the set

$M(f, s) := \{\mu \in M : R_\mu(f - L_s f) = 0\}$

is nonempty. Furthermore, $M(f, s)$ is weak* compact and $M(WAP(S)) \cdot M(f, s) \subseteq M(f, s)$, as may be seen from $R_{m \cdot \mu} = R_m R_\mu$. It follows by induction that

$\bigcap_{j=1}^{n} M(f_j, s_j) \neq \emptyset$, $f_j \in WAP(S)$, $s_j \in S$.

Indeed, if $\mu \in \bigcap_{j=1}^{n-1} M(f_j, s_j)$ and $\nu \in M(R_\mu f_n, s_n)$, then $\nu \cdot \mu \in \bigcap_{j=1}^{n} M(f_j, s_j)$. Thus the sets $M(f, s)$ have the finite intersection property, so by compactness their intersection contains a point η. Then $\eta^2(L_s f) = \eta(R_\eta L_s f) = \eta(R_\eta f) = \eta^2(f)$ for all $f \in WAP(S)$ and $s \in S$, hence $\eta^2$ is a left invariant mean.
17.2.17 Corollary. If S is a semitopological group, then W AP (S) has an invariant mean.
Proof. RS restricted to C(f ) is a group of weakly continuous, noncontracting affine maps
from C(f ) into itself. By the Ryll-Nardzewski fixed point theorem, C(f ) has a fixed point g.
Thus g(st) = g(s) for all s and t. Taking s to be the identity of S shows that g is a constant
function. By the theorem, W AP (S) has a left invariant mean. A similar argument shows
that W AP (S) has a right invariant mean. By 17.2.15, W AP (S) has an invariant mean.

17.3 Almost Periodic Functions


Definition and Basic Properties
A member f of $C_b(S)$ is said to be almost periodic if $R_S f$ is relatively compact in the norm topology of $C_b(S)$. The set of all almost periodic functions is denoted by $AP(S)$. For example, the characters of a locally compact abelian group are almost periodic, as is easily established. In particular, the function $n \mapsto e^{in}$ is almost periodic on (Z, +). On the other hand, the function $f(n) = e^{in^2}$ is not almost periodic on (Z, +). Indeed, by a result of Dirichlet,¹ $e^{i\mathbb N}$ is dense in $\mathbb T$, hence we may choose a sequence $(n_k)$ in $\mathbb N$ with $e^{in_k} \to 1$ and $e^{in_k^2} \to c \in \mathbb T$. Then the right translates $e^{i(n+n_k)^2} = e^{i(n^2 + 2nn_k + n_k^2)}$ tend pointwise to $c\,e^{in^2}$. If the convergence were uniform in $n \in \mathbb Z$, then $e^{2inn_k}$ would converge uniformly to 1 and so $z^{2n_k}$ would converge uniformly to 1 in $z \in \mathbb T$, or, equivalently, $e^{2n_k t i}$ would converge uniformly to 1 in $t \in \mathbb R$. But this is impossible, since $e^{2n_k t_k i} = -1$, where $t_k = \pi/(2n_k)$.
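The obstruction in this argument shows up numerically for a single candidate term of such a sequence. In the sketch below the value 44 is an assumed example (44 is close to 14π, so $e^{44i}$ is close to 1), and the finite range of n is an arbitrary truncation of the supremum over Z. Although $e^{in_k}$ is within 0.02 of 1, the factor $e^{2inn_k}$ is nowhere near 1 uniformly in n.

```python
import cmath

n_k = 44  # assumed example: 44 is close to 14*pi, so exp(i*n_k) is close to 1

def sup_defect(n_k, n_max):
    """Maximum over 0 <= n < n_max of |exp(2i n n_k) - 1|."""
    return max(abs(cmath.exp(2j * n * n_k) - 1) for n in range(n_max))

# How close exp(i*n_k) is to 1.
closeness = abs(cmath.exp(1j * n_k) - 1)

# How far exp(2i n n_k) strays from 1 as n ranges over a finite window.
defect = sup_defect(n_k, 200)
```

The defect comes out close to 2, the largest possible distance between two points of the unit circle, which is the numerical counterpart of the value −1 attained in the argument above.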
Clearly AP (S) ⊆ W AP (S). If S is a compact topological semigroup, then joint continuity
of multiplication in S implies that AP (S) = C(S), hence AP (S) = W AP (S). On the other
hand, we have
1 See, for example [], Example 8.3.9.

17.3.1 Proposition. If G is a noncompact, locally compact group, then $C_0(G) \setminus \{0\} \subseteq WAP(G) \setminus AP(G)$, hence $AP(G) \subsetneq WAP(G)$.
Proof. We have already seen that $C_0(G) \subseteq WAP(G)$ (17.2.11). Let $f \in C_0(G) \cap AP(G)$ and choose a net $s_\alpha \to \infty$. By almost periodicity, we may suppose that $\|R_{s_\alpha} f - g\|_\infty \to 0$ for some $g \in C_b(G)$. Given ε > 0, choose $\alpha_0$ so that $|f(s s_\alpha) - g(s)| < \varepsilon$ for all $\alpha \ge \alpha_0$ and $s \in G$. Taking limits shows that $|g(s)| \le \varepsilon$ for all s and ε, hence g = 0. Thus $|f(s s_\alpha)| < \varepsilon$ for all $\alpha \ge \alpha_0$ and s. Replacing s by $s s_\alpha^{-1}$, we have $|f(s)| < \varepsilon$ for all s and so f = 0.

17.3.2 Theorem. AP (S) is a translation invariant, unital C∗-subalgebra of Cb (S).


Proof. The relations in (17.1) show that AP (S) is a translation invariant, conjugate closed
subspace of Cb (S). Since multiplication in Cb (S) is a norm continuous operation, the product
of norm compact sets is norm compact, hence AP (S) is an algebra. Moreover, if fn ∈ AP (S)
and fn → f in Cb (S), then a straightforward total boundedness argument shows that
f ∈ AP (S). Therefore, AP (S) is closed in Cb (S).
17.3.3 Proposition. Let T be a semitopological semigroup and θ : S → T a continuous
homomorphism. Then θ∗ AP (T ) ⊆ AP (S), where θ∗ : C(T ) → C(S) is the dual map. In
particular, if S is a subsemigroup of T , then AP (T )|S ⊆ AP (S).

Proof. This follows from the norm continuity of $\theta^*$ and the identity $R_s \theta^*(g) = \theta^*(R_{\theta(s)} g)$.

The Almost Periodic Compactification

The following theorem summarizes the general properties of the spectrum S AP of AP (S)
and the canonical map ι = ιAP : S → S AP , ι(s) = sb.
17.3.4 Theorem. S AP is a compact topological semigroup and ι : S → S AP is a continuous
homomorphism onto a dense subsemigroup such that ι∗ C(S AP ) = AP (S).
Proof. By 17.2.4, $R_\varphi AP(S) \subseteq AP(S)$, hence multiplication $\varphi_1 \cdot \varphi_2$ is defined on $AP(S)'$. Thus, as in the WAP case, $AP(S)'$ is a semitopological semigroup in the weak* topology, $S^{AP}$ is a compact semitopological semigroup, and ι is a continuous homomorphism onto a dense subset of $S^{AP}$. It remains only to show that multiplication in $S^{AP}$ is jointly continuous. Let $f \in AP(S)$. By the relative norm compactness of $R_S f$, the map $\varphi \mapsto R_\varphi f$ on $C_1'$ is w*-norm continuous. It follows that $(\varphi_1 \cdot \varphi_2)(f) = \varphi_1(R_{\varphi_2} f)$ is jointly continuous in $(\varphi_1, \varphi_2)$ on $C_1'$ and the conclusion follows.
Note that the theorem implies that LS f is relatively compact for all f ∈ AP (S). Thus
the notions of right almost periodicity and left almost periodicity coincide.
The pair (ι, S AP ) is called the almost periodic compactification of S. Analogous to
the weakly almost periodic case we have the following extension property, which is immediate
from 17.3.3.
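17.3.5 Theorem. For each continuous homomorphism θ from S into a semitopological semigroup T, there exists a continuous homomorphism $\tilde\theta : S^{AP} \to T^{AP}$ such that the following diagram commutes:

$\begin{array}{ccc} S^{AP} & \xrightarrow{\ \tilde\theta\ } & T^{AP} \\ {\scriptstyle\iota}\uparrow & & \uparrow{\scriptstyle\iota} \\ S & \xrightarrow{\ \theta\ } & T \end{array}$
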
Since a compact topological semigroup is its own AP compactification, we have
17.3.6 Corollary. For each continuous homomorphism θ from S into a compact topological
semigroup T , there exists a continuous homomorphism θe : S AP → T such that θe ◦ ιAP = θ.

17.4 The Structure of Compact Semigroups


For deeper results we need to determine the algebraic structure of compact semitopological
semigroups. This structure is based largely on the existence of idempotents and properties
of closed ideals. We begin with an important result on joint continuity of multiplication in a
compact semitopological group.

Ellis’s Theorem
17.4.1 Theorem (Ellis). A compact Hausdorff semitopological group G is a topological
group.
Proof. To establish joint continuity of multiplication, it is enough to show that multiplication
is continuous at each point of {e} × G. Indeed, if xα → x and yα → y, then, by separate
continuity, x−1 xα → e, hence if multiplication is continuous at (e, y), then (x−1 xα )yα → y
and so xα yα → xy.
Fix y ∈ G. To verify continuity of multiplication at (e, y), we show first that for each x ∈ G with x ≠ y there are neighborhoods $N_x$ of e, $U_x$ of x, and $V_x$ of y such that $(N_x V_x) \cap U_x = \emptyset$. To see this, let g ∈ C(G) with ran g ⊆ [−1, 1], g(y) = 0 and g(x) ≠ g(y). Define f : G × G → [−1, 1] by f(s, z) = g(sz). By B.0.8, there exists a dense subset A of G such that f is jointly continuous at every point of A × G. Since {s ∈ G : f(s, x) ≠ f(s, y)} is open and nonempty, it contains a member s of A. Set

$\varepsilon := |f(s, x) - f(s, y)|$ ( > 0 ).

By joint continuity of f at (s, y), there exist neighborhoods $N_x$ of e in G and $V_x$ of y in G such that
$|f(t, v) - f(s, y)| < \varepsilon/2$ ∀ (t, v) ∈ $sN_x \times V_x$. (†)
Set
$U_x := \{u \in G : |f(s, x) - f(s, u)| < \varepsilon/2\}$. (‡)
Then $U_x$ is a neighborhood of x in G. Now suppose that $(N_x V_x) \cap U_x \neq \emptyset$. Then there exist $t \in N_x$ and $v \in V_x$ such that $u := tv \in U_x$. From (†) and (‡),

|f (s, x) − f (s, y)| ≤ |f (s, x) − f (s, tv)| + |f (s, tv) − f (s, y)|
= |f (s, x) − f (s, u)| + |f (st, v) − f (s, y)|
< ε/2 + ε/2 = ε,

contradicting the definition of ε. Therefore, (Nx Vx ) ∩ Ux = ∅, as claimed.


Now let W be an open neighborhood of y. Since G \ W is compact, there is a finite set F ⊆ G \ W such that $G \setminus W \subseteq \bigcup_{x \in F} U_x$. Set $N := \bigcap_{x \in F} N_x$ and $V := \bigcap_{x \in F} V_x$. Then N and V are neighborhoods of e and y, respectively, and N V ⊆ W. This completes the proof of joint continuity of multiplication in G.
To verify continuity of inversion, let $x_\alpha \to x \in G$. By compactness, we may assume that $x_\alpha^{-1} \to y$ for some y ∈ G. Then $xy = \lim_\alpha x_\alpha x_\alpha^{-1} = e$, hence $y = x^{-1}$ and so $x_\alpha^{-1} \to x^{-1}$.
Ellis has shown that the conclusion of the above theorem holds if the topology of G is
merely locally compact (and Hausdorff) [17], [18].

Existence of Idempotents
An idempotent in a semigroup is an element e satisfying e2 = e. A semigroup need not
have an idempotent, as is the case, for example, for (1, ∞) under multiplication or addition.
However, in the compact case one always has idempotents:
17.4.2 Lemma. A compact Hausdorff semitopological semigroup X has an idempotent.
Proof. Order the collection of closed subsemigroups of X downward by inclusion. If $\mathcal C$ is a chain of such semigroups, then $\bigcap \mathcal C \neq \emptyset$ by compactness. By Zorn's lemma, X has a minimal closed subsemigroup Y. Let e ∈ Y. Then eY is a closed subsemigroup of Y, hence eY = Y by minimality. Choose y ∈ Y such that e = ey. The set Z = {z ∈ Y : ez = e} is then a nonempty closed subsemigroup of Y and so Z = Y. In particular, e ∈ Z and so $e^2 = e$.
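A finite semigroup with the discrete topology is compact and Hausdorff, and there the idempotent promised by the lemma can be found by a direct search rather than by Zorn's lemma: the powers of any element form a finite subsemigroup, which must contain an idempotent. The sketch below uses this elementary search, which is a different technique from the proof above; the function name and the mod 12 example are assumptions for illustration.

```python
def idempotent_power(x, mul):
    """Return an idempotent among the powers x, x^2, x^3, ... of x.

    Assumes mul is associative and that x has only finitely many distinct powers.
    """
    powers = []
    p = x
    while p not in powers:      # collect the distinct powers of x
        powers.append(p)
        p = mul(p, x)
    for q in powers:            # search them for an element with q*q == q
        if mul(q, q) == q:
            return q
    raise ValueError("no idempotent found; is mul associative?")

# Example: integers mod 12 under multiplication.
mul12 = lambda a, b: (a * b) % 12
idempotents = {x: idempotent_power(x, mul12) for x in range(12)}
```

For instance, the powers of 2 mod 12 are 2, 4, 8, and the search returns 4, since 4 · 4 = 16 ≡ 4.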

Ideal Structure
A nonempty subset Y of a semigroup X is a left ideal if XY ⊆ Y. A left ideal is minimal if it properly contains no left ideal. Right ideals and minimal right ideals are defined similarly. An ideal is a subset of X that is both a left ideal and a right ideal. An ideal is minimal if it properly contains no ideal.
The left and right minimal ideal structures are given in the following theorems.

17.4.3 Theorem. Let X be a compact, Hausdorff semitopological semigroup.


(a) Minimal left (resp., right) ideals exist and are of the form Xe (resp., eX), where e is
an idempotent.

(b) Distinct minimal left (right) ideals are disjoint.

(c) If R is a minimal right ideal and L is a minimal left ideal, then RL is a topological
group. If e is the identity of RL, then RL = eXe.
Proof. (a) We prove the left case. A Zorn's lemma argument in the spirit of the proof of 17.4.2 shows that minimal closed left ideals L exist. If L′ is a left ideal contained in L and x ∈ L′, then Xx is a closed left ideal contained in L′, which forces Xx = L′ = L. Therefore, all minimal left ideals are closed. Taking x to be an idempotent completes the proof of (a).
(b) Let $L_1$ and $L_2$ be distinct minimal left ideals. Then $L_1$ and $L_2$ are disjoint; otherwise, by minimality, $L_1 = L_1 \cap L_2 = L_2$.
(c) Clearly RL ⊆ R ∩ L. Since LRL ⊆ L, (RL)(RL) = R(LRL) ⊆ RL, hence RL is a semigroup. We show next that RL is a group. Let t ∈ RL. Then Lt ⊆ L, hence, by minimality, Lt = L. Therefore, RLt = RL for all t ∈ RL. Similarly, tRL = RL for all t ∈ RL. Let e ∈ RL such that et = t. If s ∈ RL, there exists x ∈ RL such that s = tx, hence es = etx = tx = s. Similarly there exists e′ ∈ RL such that se′ = s for all s ∈ RL. Then e = ee′ = e′, so e is an identity for RL. To see that every t ∈ RL has an inverse, choose y, z ∈ RL such that yt = e = tz. Then z = ez = ytz = ye = y. Therefore, RL is a group. Since e ∈ L, Xe ⊆ L and so Xe = L by minimality. Similarly eX = R. Therefore, RL = eXXe ⊆ eXe ⊆ RL, so RL = eXe. Finally, by 17.4.1, eXe is a compact topological group.
17.4.4 Theorem. Let X be a compact Hausdorff semitopological semigroup and let K =
K(X) be the union of all minimal left ideals. Then K is also the union of all minimal right
ideals and is an ideal contained in every other ideal.

Proof. K is obviously a left ideal. Let Xe be a minimal left ideal and s ∈ X. We claim
that the left ideal Xes is minimal. To see this, let L be a left ideal contained in Xes. Every
member of L is of the form ys for some y ∈ Xe, hence the set {y ∈ Xe : ys ∈ L} is nonempty.
Since it is a left ideal it must equal Xe. Thus y ∈ Xe ⇒ ys ∈ L, that is, Xes ⊆ L. Therefore,
Xes ⊆ K, so K is a right ideal and hence is an ideal.
Now, if I is any ideal in X and Xe is a minimal left ideal, then IXe is a left ideal
contained in Xe and so IXe = Xe. Since also IXe ⊆ I, Xe ⊆ I. Therefore, K ⊆ I, so K
is contained in every ideal of X. Similar arguments show that the union K′ of all minimal right ideals is an ideal contained in every ideal of X. Therefore K = K′.

17.4.5 Corollary. K is the union of disjoint, compact topological groups eXe, where $e^2 = e \in K$.
Proof. By minimality $K^2 = K$. But $K^2$ is the union of disjoint topological groups RL.
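The ideal structure described in 17.4.3 through 17.4.5 can be computed explicitly for a small finite semigroup, which is compact and Hausdorff in the discrete topology. The sketch below takes X to be all self-maps of a three-point set under composition; this choice of X, the tuple encoding, and the helper names are illustrative assumptions. It uses the fact that K(X) lies in every ideal, so K(X) is the smallest of the principal ideals XxX.

```python
from itertools import product

n = 3
# X: all maps {0,1,2} -> {0,1,2}, stored as tuples (f(0), f(1), f(2)).
X = list(product(range(n), repeat=n))

def mul(f, g):
    """Composition: (fg)(i) = f(g(i))."""
    return tuple(f[g[i]] for i in range(n))

def ideal(x):
    """The two-sided ideal XxX generated by x."""
    return frozenset(mul(mul(a, x), b) for a in X for b in X)

# K(X) is contained in every ideal, so it is the smallest ideal of the form XxX.
K = min((ideal(x) for x in X), key=len)

idempotents_in_K = [e for e in K if mul(e, e) == e]

# Minimal left ideals Xe, minimal right ideals eX, and groups eXe for e in K.
min_left = {frozenset(mul(a, e) for a in X) for e in idempotents_in_K}
min_right = {frozenset(mul(e, a) for a in X) for e in idempotents_in_K}
groups = {frozenset(mul(mul(e, a), e) for a in X) for e in idempotents_in_K}
```

Here K consists of the three constant maps. There is a single minimal left ideal (all of K), three minimal right ideals (one for each constant map), and K is the disjoint union of three trivial groups eXe = {e}.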
17.4.6 Corollary. X is a topological group iff it satisfies the left and right cancellation laws

xy = xz ⇒ y = z and yx = zx ⇒ y = z.

Proof. For the sufficiency, let $e^2 = e \in K$. Then for any x ∈ X, eex = ex, hence ex = x and so X = eX. Similarly X = Xe. Thus X = eXe, so X is a group.

17.4.7 Corollary. $WAP(S)$ has an invariant mean iff $K(S^{WAP})$ is a compact topological group.

Proof. By the preceding, $K(S^{WAP})$ is a compact topological group iff $S^{WAP}$ has a unique minimal right ideal and a unique minimal left ideal.
Let m be an invariant mean on $WAP(S)$. If $L_1$ and $L_2$ are minimal left ideals of $S^{WAP}$, then, choosing any $\eta_j \in L_j$, we have $m = m \cdot \eta_j \in L_1 \cap L_2$ (17.2.14), so $L_1 = L_2$ by (b) of 17.4.3. Therefore, $S^{WAP}$ has a unique minimal left ideal. Similarly, $S^{WAP}$ has a unique minimal right ideal. Thus $K(S^{WAP})$ is a compact topological group.
Conversely, assume $K = K(S^{WAP})$ is a compact topological group. Define a mean on $WAP(S)$ by
$m(f) = \int_K \hat f(x)\,d\mu(x)$,
where µ is normalized Haar measure on K. Then m is invariant.

17.5 Strongly Almost Periodic Functions


Definition and Basic Properties
A unitary representation π of S on a Hilbert space H is defined exactly as in the case of a locally compact group, the difference being that, while the operator $\pi(s)^{-1}$ is defined for all s, it may not be in the range of π. The space $SAP(S)$ of strongly almost periodic functions on S is defined as the closed linear span of the set of coefficients of finite dimensional unitary representations of S. Since the unitary group in a finite dimensional space is compact, $SAP(S) \subseteq AP(S)$. Indeed, if π is a finite dimensional unitary representation of S on H and if $(s_\alpha)$ is a net in S, then a subnet $\pi(s_\beta)$ converges to some unitary operator U and so for x, y ∈ H,
$R_{s_\beta}(\pi(s)x \mid y) = (\pi(s_\beta)x \mid \pi(s)^{-1} y) \to (Ux \mid \pi(s)^{-1} y)$
uniformly in s.

For a compact Hausdorff topological group G, the Peter-Weyl theorem (16.5.18) implies
that SAP (G) = C(G), hence SAP (G) = AP (G). We show in 17.5.9 that
$C_0(\mathbb R^+) \setminus \{0\} \subseteq AP(\mathbb R^+, +) \setminus SAP(\mathbb R^+, +)$, (17.4)
hence $SAP(\mathbb R^+, +) \subsetneq AP(\mathbb R^+, +)$.
Our immediate goal is to show that SAP (S) is a unital C ∗ -subalgebra of Cb (S). For this
we need the following lemma.
17.5.1 Lemma. Let T be a compact topological semigroup and H a subgroup of T . Then
G := cl H is a topological group.
Proof. We show first that inversion may be extended to G. Given x ∈ G, let $x_\alpha \in H$ with $x_\alpha \to x$. By compactness, we may assume that $x_\alpha^{-1} \to y$ for some y ∈ G. Then $xy = \lim_\alpha x_\alpha x_\alpha^{-1} = e$. Similarly, zx = e for some z ∈ G. Therefore, G is a group. That inversion in G is continuous is proved as in 17.4.1.
We may now prove
17.5.2 Theorem. SAP (S) is a translation invariant unital C ∗ -subalgebra of Cb (S).
Proof. Let π be a finite dimensional unitary representation of S on H. The relations
Rs (π(t)x | y) = (π(t)π(s)x | y) and Ls (π(t)x | y) = (π(t)x | π(s)∗ y) show that SAP (S)
is translation invariant. Furthermore, the proof that SAP (S) is closed under multiplication
is the same as in the proof of the Peter-Weyl theorem (16.5.18).
It remains to show that SAP (S) is conjugate closed. For this it suffices to show that
if x0 , y0 ∈ H, then f (s) := (y0 | π(s)x0 ) is a coefficient of some unitary representation.
The proof of this is similar to but somewhat more involved than the corresponding part of
the proof of the Peter-Weyl theorem. As in the latter, let F denote the finite dimensional
subspace of Cb (S) consisting of all functions gx defined by
gx (s) = (y0 | π(s)x) , s ∈ S, x ∈ H.
Since $\mathcal F$ is right translation invariant, the mapping $s \mapsto R_s$ is a continuous representation of S on the space $\mathcal F$. Since $R_s g_x = g_{\pi(s)x}$, $R_s$ is surjective, hence invertible. Thus $R_S$ is contained in a bounded group of operators on $\mathcal F$ and hence, by 17.5.1, is contained in a compact group of such operators. Thus, by 16.5.17, there exists an inner product ⟨· | ·⟩ on $\mathcal F$ relative to which the operators $R_s$ are unitary. Since the closure of π(S) is a group, there exists a sequence $(s_n)$ in S such that $\pi(s_n) \to I$. We may assume that the evaluation maps $\hat s_n$ on $\mathcal F$ converge to a member of the dual space $\mathcal F'$, which, by the Riesz representation theorem, is given by a member $g_{x_1}$ of $\mathcal F$. Thus
$\lim_n g_x(s_n) = \langle g_x \mid g_{x_1} \rangle$, x ∈ H.

Since $\mathcal F$ is translation invariant, the limit relation holds for $R_s g_x$ as well. It follows that
$f(s) = (y_0 \mid \pi(s)x_0) = \lim_n (y_0 \mid \pi(s_n)\pi(s)x_0) = \lim_n R_s g_{x_0}(s_n) = \langle R_s g_{x_0} \mid g_{x_1} \rangle$,

which shows that f is a coefficient of the unitary representation R, completing the proof.
17.5.3 Proposition. Let T be a semitopological semigroup and θ : S → T a continuous homomorphism. Then $\theta^*(SAP(T)) \subseteq SAP(S)$, where $\theta^* : C(T) \to C(S)$ is the dual map. In particular, if S is a subsemigroup of T, then $SAP(T)|_S \subseteq SAP(S)$.
Proof. This follows essentially from the fact that if π is a continuous, finite dimensional
unitary representation of T , then π◦θ is a continuous finite dimensional unitary representation
of S.

The Strongly Almost Periodic Compactification

The following theorem summarizes the general properties of the spectrum S SAP of
SAP (S) and the canonical map ι = ιSAP : S → S SAP , ι(s) = sb.
17.5.4 Theorem. $S^{SAP}$ is a compact topological group and ι is a continuous homomorphism onto a dense subsemigroup such that $\iota^*(C(S^{SAP})) = SAP(S)$.

Proof. Since $SAP(S) \subseteq AP(S)$, $S^{SAP}$ is a topological semigroup. It remains to show that $S^{SAP}$ is a group. For this we show that $S^{SAP}$ has the cancellation properties in 17.4.6. We show that if yx = zx in $S^{SAP}$, then $\hat f(y) = \hat f(z)$ for all $f \in SAP(S)$, where $\iota^*(\hat f) = f$. It suffices to show this for $f(s) = (U_s x \mid y)$, where U is a continuous, finite dimensional, unitary representation of S. Let $\iota(s_\alpha) \to x$. We may assume that $U_{s_\alpha} \to V$ for some unitary operator V. Let g be the coefficient $g(s) = (U_s V^{-1} x \mid y)$. Then for all s

$R_x \hat g(\iota(s)) = \hat g(\iota(s)x) = \lim_\alpha \hat g(\iota(s s_\alpha)) = \lim_\alpha (U_s U_{s_\alpha} V^{-1} x \mid y) = (U_s x \mid y) = f(s)$,

hence $R_x \hat g = \hat f$ and so $\hat f(y) = \hat g(yx) = \hat g(zx) = \hat f(z)$.

The pair (ι, S SAP ) is called the strongly almost periodic compactification of S. As
in the W AP and AP cases, we have the following extension property, which may be proved
using 17.5.3.
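17.5.5 Theorem. For each continuous homomorphism θ from S into a semitopological
semigroup T , there exists a continuous homomorphism $\tilde\theta : S^{SAP} \to T^{SAP}$ such that the
following diagram commutes:
\[
\begin{array}{ccc}
S^{SAP} & \xrightarrow{\ \tilde\theta\ } & T^{SAP} \\
\iota\,\big\uparrow & & \big\uparrow\,\iota \\
S & \xrightarrow{\ \theta\ } & T
\end{array}
\]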
Since a compact topological group is its own SAP compactification, we have
17.5.6 Corollary. For each continuous homomorphism θ from S into a compact topological
group T , there exists a continuous homomorphism θe : S SAP → T such that θe ◦ ιSAP = θ.
17.5.7 Corollary. If S is a group, then AP (S) = SAP (S).
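Proof. By 17.5.1, $S^{AP}$ is a topological group. Applying 17.5.6 to $T = S^{AP}$ and $\theta = \iota_{AP}$, we
obtain a continuous homomorphism $\tilde\theta : S^{SAP} \to S^{AP}$ such that $\tilde\theta \circ \iota_{SAP} = \iota_{AP}$. Thus

\[ AP(S) = \iota_{AP}^*\big(C(S^{AP})\big) = (\iota_{SAP}^* \circ \tilde\theta^*)\big(C(S^{AP})\big) \subseteq SAP(S). \]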

17.5.8 Corollary. Let W AP (S) have an invariant mean m. Then

W AP (S) = SAP (S) ⊕ W AP (S)0 ,

where
W AP (S)0 := {f ∈ W AP (S) : m(|f |) = 0}.
Moreover, W AP (S)0 is an ideal of the C ∗ -algebra W AP (S). In particular, these assertions
hold if S is a group or is commutative.

Proof. By 17.4.7, the minimal ideal $K = K(S^{WAP})$ is a compact topological group. We
denote the identity in $S^{SAP}$ by 1 and the identity of K by e, so that $K = S^{WAP}e = eS^{WAP}e$.
The map θ(s) = ιW AP (s)e from S into K is a continuous homomorphism, hence, by 17.5.6,
there exists a continuous homomorphism θ̄ : S SAP → K such that θ = θ̄ ◦ ιSAP . Therefore,

θ∗ C(K) = (ι∗SAP ◦ θ̄∗ )(C(K)) ⊆ SAP (S).

In particular, if f ∈ W AP (S), then the function $R_e f(s) = \hat f(\iota_{WAP}(s)e) = \hat f(\theta(s))$ is strongly
almost periodic. Therefore, $R_e\,WAP(S) \subseteq SAP(S)$. Now let g ∈ SAP (S) and choose
$\hat g \in C(S^{SAP})$ such that $g = \iota_{SAP}^*(\hat g)$. If $\iota_{WAP}(t_\alpha) \to e$, then
$\iota_{SAP}(t_\alpha) = \tilde\theta \circ \iota_{WAP}(t_\alpha) \to \tilde\theta(e) = 1$, so
\[ R_e g(t) = \lim_\alpha \hat g\big(t\,\iota_{SAP}(t_\alpha)\big) = \hat g(t \cdot 1) = g(t). \]
We have proved that $R_e$ is a projection from W AP (S) onto SAP (S). It remains to show
that $\ker R_e = WAP(S)_0$ and that $WAP(S)_0$ is an ideal of W AP (S). Now,
\[ m(|f|) = \int_K |\hat f(x)|\,dx, \qquad f \in WAP(S), \]

where dx is Haar measure on K. Thus m(|f |) = 0 iff $\hat f(x) = 0$ for all x ∈ K iff $\hat f(xe) = 0$
for all $x \in S^{WAP}$ iff $R_e f = 0$. Therefore, $\ker R_e = WAP(S)_0$. That $WAP(S)_0$ is an ideal
follows from the inequality $m(|fg|) \le \|g\|_\infty\, m(|f|)$.

17.5.9 Corollary. $AP(\mathbb{R}^+, +) = AP(\mathbb{R}, +)\big|_{\mathbb{R}^+} \oplus C_0(\mathbb{R}^+)$.
Proof. Set S = R+ . Since S is commutative, Cb (S) has an invariant mean m. By an obvious
modification of the preceding corollary,

\[ AP(S) = SAP(S) \oplus AP(S)_0, \quad\text{where } AP(S)_0 = \{f \in AP(S) : m(|f|) = 0\} = \ker R_e. \]

Here $m(|f|) = \int_K |\hat f(x)|\,dx$, where $\hat f \circ \iota_{AP(S)} = f$ and dx is Haar measure on the compact
group $K = S^{AP}e$. Thus we need to show that $SAP(S) = AP(\mathbb{R})|_S$ and $AP(S)_0 = C_0(S)$.
For the first equality, note that by 17.5.7 and 17.5.3, AP (R)|R+ ⊆ SAP (S). Now let
f ∈ SAP (S), $f = \iota_{SAP}^*(\hat f)$. We show that f may be extended to a function g ∈ AP (R). To
this end, define $\varphi : \mathbb{R} \to S^{SAP}$ by
\[
\varphi(t) =
\begin{cases}
\iota_{SAP(S)}(t) & \text{if } t \ge 0, \\
\iota_{SAP(S)}(-t)^{-1} & \text{if } t < 0.
\end{cases}
\]
By considering cases, ϕ is easily seen to be a continuous homomorphism. By the extension
property, there exists a continuous homomorphism $\tilde\varphi : \mathbb{R}^{AP} \to S^{SAP}$ such that $\tilde\varphi \circ \iota_{AP} = \varphi$.
Then $g := \varphi^*(\hat f) \in AP(\mathbb{R})$, and for t ≥ 0, $g(t) = \hat f(\varphi(t)) = \hat f(\iota_{SAP}(t)) = f(t)$. Therefore,

\[ SAP(S) = AP(\mathbb{R})|_S. \]


Now, trivially, $C_0(S) \subseteq AP(S)$. We claim that $e \notin \iota_{AP(S)}(S)$. Assuming this we may then
choose a net $\iota_{AP(S)}(s_\alpha) \to e$ with $s_\alpha \to \infty$, and so $R_e f = 0$ iff $f(s + s_\alpha) \to 0$ iff $f \in C_0(S)$.
To verify the claim, assume $\iota_{AP(S)}(s_0) = e$ for some $s_0 \in S$. Then $\iota_{AP(S)}(s_0 + s_0) = e^2 =
e = \iota_{AP(S)}(s_0)$, and since $C_0(S)$ separates points of $\mathbb{R}^+$, $s_0 + s_0 = s_0$ and so $s_0 = 0$. Thus
$\iota_{AP(S)}(0) = e$, which implies that $S^{AP}e = S^{AP}$. Therefore, $S^{AP}$ is a topological group
with identity e. Now, the one-point compactification [0, ∞] of [0, ∞) is easily seen to be
a topological semigroup under s + ∞ := ∞, hence there exists a continuous surjective
homomorphism $\varphi : S^{AP} \to [0, \infty]$ such that $\varphi(\iota_{AP(S)}(s)) = s$ for all s (17.3.6). In particular,
[0, ∞] is a group with identity 0, impossible.

17.6 Semigroups of Operators


In this section we extend previous results based on the semigroup of operators RS to an
arbitrary semigroup of bounded linear operators on a Banach space X.

Definitions and Basic Properties


Let U be a semigroup of operators on X, that is, a nonempty subset of B(X) closed under
operator composition. A point x ∈ X is said to be almost periodic (weakly almost
periodic) if the set Ux := {U x : U ∈ U} is relatively compact in the norm (weak) topology
of X.
17.6.1 Proposition. Let U be uniformly bounded. The sets Xa and Xw of almost periodic
and weakly almost periodic vectors in X are closed, U-invariant linear subspaces of X.
Proof. We prove only the weakly almost periodic part. Clearly, 0 ∈ Xw . The relations

U(x + y) ⊆ Ux + Uy, U(cx) = cUx, and U(U x) ⊆ U(x), U ∈ U,

show that Xw is an invariant linear subspace of X. To show that Xw is closed in X, let (xn )
be a sequence in Xw converging in norm to x in X. By the Eberlein-Šmulian theorem, it
suffices to show that Ux is weakly relatively sequentially compact. Let (Un x) be a sequence
in Ux. Since each set Uxn is relatively weakly sequentially compact, a standard diagonal
argument shows that there exists a subsequence (Uk ) of (Un ) and a sequence (yn ) in X such
that $U_k x_n \to y_n$ weakly for each n. For any $x' \in X'$ with $\|x'\| \le 1$ we then have

\[ |\langle y_n, x'\rangle - \langle y_m, x'\rangle| = \lim_k |\langle U_k x_n, x'\rangle - \langle U_k x_m, x'\rangle| \le C\,\|x_n - x_m\|, \]

where $C = \sup_{U \in \mathcal{U}} \|U\|$. Therefore, $\|y_n - y_m\| \le C\,\|x_n - x_m\|$, which shows that (yn ) is
a Cauchy sequence. Let $y := \lim_n y_n$. Given ε > 0, choose n such that $\|x - x_n\| < \varepsilon$ and
$\|y - y_n\| < \varepsilon$. For all k,

\[
\begin{aligned}
|\langle U_k x, x'\rangle - \langle y, x'\rangle|
&\le |\langle U_k(x - x_n), x'\rangle| + |\langle U_k x_n, x'\rangle - \langle y_n, x'\rangle| + |\langle y_n - y, x'\rangle| \\
&\le (C + 1)\varepsilon + |\langle U_k x_n, x'\rangle - \langle y_n, x'\rangle|,
\end{aligned}
\]

hence

\[ \limsup_k |\langle U_k x, x'\rangle - \langle y, x'\rangle| \le (C + 1)\varepsilon + \lim_k |\langle U_k x_n, x'\rangle - \langle y_n, x'\rangle| = (C + 1)\varepsilon. \]

Since ε was arbitrary, $\langle U_k x, x'\rangle \to \langle y, x'\rangle$ and so x ∈ Xw .


A semigroup of operators U on X is said to be almost periodic (resp., weakly almost
periodic) if X = Xa (resp., X = Xw ). For example, if X is reflexive and U is uniformly
bounded, then U is weakly almost periodic. Here is an example for the nonreflexive case.
17.6.2 Example. Let (X, F, µ) be a probability space and S a semigroup under composition
of measurable transformations s : X → X. For each s ∈ S, let µs denote the image measure
of µ, so
µs (E) = µ(s−1 (E)), E ∈ F.

We assume that $\mu_s \ll \mu$ for each s ∈ S and that $c := \sup_{s \in S} \left\| \dfrac{d\mu_s}{d\mu} \right\|_\infty < \infty$. This is obviously
the case if the members of S are measure-preserving, i.e., µs = µ for all s ∈ S. Define
$U_s f = f \circ s$, $f \in L^1$. Then
\[ \|U_s f\|_1 = \int |f \circ s|\,d\mu = \int |f|\,d\mu_s = \int |f|\,\frac{d\mu_s}{d\mu}\,d\mu \le c\,\|f\|_1, \]

hence $U_S$ is uniformly bounded in $L^1$. Since $\|U_s 1_A\|_\infty \le 1$, the set $U_S 1_A$ is uniformly integrable and
so is relatively weakly compact, by the Dunford-Pettis theorem (14.2.4). Therefore, $U_S f$ is
weakly relatively compact for every simple function f . Since these are dense in $L^1$,
Proposition 17.6.1 shows that $U_S$ is weakly almost periodic on $L^1$. ♦
17.6.3 Theorem. Let U be a semigroup of operators on a Banach space X.
(a) If U is weakly almost periodic, then in the weak operator topology of B(X) the closure
Uw of U is a compact semitopological semigroup of uniformly bounded operators.
(b) If U is almost periodic, then in the strong operator topology of B(X) the closure Ua
of U is a compact topological semigroup of uniformly bounded operators.
Proof. The uniform boundedness principle shows that Uw and Ua are uniformly bounded.
For each x ∈ X, let Kx denote the closure of Ux in the weak topology of X. The product
space $K := \prod_{x \in X} K_x$ contains U and is compact by Tychonoff’s theorem. Therefore, the
closure cl(U) of U in K is compact. But cl(U) ⊆ B(X). To see this, let (Tα ) be a net in U
such that Tα → T in the product topology. Thus for all x, y ∈ X,
\[ T_\alpha(x + y) \xrightarrow{w} T(x + y), \quad T_\alpha(x) \xrightarrow{w} T(x), \quad\text{and}\quad T_\alpha(y) \xrightarrow{w} T(y). \]

It follows that T is linear, and an application of the uniform boundedness principle shows
that T is bounded. Therefore, Uw = cl(U), proving that Uw is compact in the weak operator
topology. A similar argument shows that Ua is compact in the strong operator topology.
We have already seen in the proof of 17.2.12 that operator composition in B(X) is weak
operator continuous. It follows that Uw is closed under operator composition and so is a
semitopological semigroup. It remains to show that operator composition in Ua is continuous
in the strong operator topology. But if Tα → T and Sα → S in that topology, then for all
x∈X

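\[
\begin{aligned}
\|T_\alpha S_\alpha x - TSx\| &\le \|T_\alpha S_\alpha x - T_\alpha Sx\| + \|T_\alpha Sx - TSx\| \\
&\le M\,\|S_\alpha x - Sx\| + \|T_\alpha Sx - TSx\| \to 0,
\end{aligned}
\]

where $M = \sup_{U \in \mathcal{U}} \|U\| < \infty$.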

Dynamical Properties of Semigroups of Operators


A representation of the semitopological semigroup S by operators on a Banach space
X is defined as in the group case, namely as a homomorphism U : s 7→ Us from S into
B(X). A representation U is said to be almost periodic (respectively, weakly almost
periodic) if it is continuous in the strong operator (respectively, weak operator) topology
and the semigroup U := US is almost periodic (respectively, weakly almost periodic). For example, the
representation s 7→ Rs is weakly almost periodic on W AP (S) and almost periodic on AP (S).
If X is reflexive, s 7→ Us is weak operator continuous, and US is uniformly bounded, then
US is weakly almost periodic.
Let U : s 7→ Us be a weakly almost periodic representation of S on X. The coefficient
algebra of the representation is the unital C ∗ -subalgebra AU of Cb (S) generated by the
coefficients $s \mapsto \langle U_s x, x'\rangle$.

17.6.4 Proposition. (a) AU is a translation invariant subalgebra of W AP (S).


(b) The map ψ : C(Uw ) → Cb (S) defined by ψ(g)(s) = g(Us ) is a C ∗ -algebra isomorphism
onto AU that commutes with translations.
(c) AU has an invariant mean m iff K := K(Uw ) is a compact topological group. In this
case,
\[ m\big(\psi(g)\big) = \int_K g(V)\,dV, \qquad g \in C(\mathcal{U}_w), \]
where dV is normalized Haar measure on K.
Proof. (a) Since right and left translations of coefficients are coefficients, AU is translation
invariant. Moreover, since U is weakly almost periodic, it follows exactly as in the proof of
17.2.12 that a coefficient is weakly almost periodic. Therefore, AU ⊆ W AP (S).
(b) Clearly, ψ is a C ∗ isomorphism into Cb (S). Let x, y ∈ X and define g ∈ C(Uw ) by
g(T ) = hT x, yi. Then ψ(g) is a coefficient, hence ran ψ contains AU . Since ψ −1 (AU ) is a
conjugate closed unital subalgebra of C(Uw ) that separates points of Uw , it must coincide
with C(Uw ).
(c) For any mean m on AU , ψ ∗ (m) is a mean on C(Uw ), and conversely. Since
ψ(RUt g)(s) = g(Us Ut ) = g(Ust ) = Rt ψ(g)(s) for all s, we have ψ ∗ (m)(RUt g) = m(Rt ψ(g)).
Similarly, ψ ∗ (m)(LUt g) = m(Lt ψ(g)). Thus m is an invariant mean iff ψ ∗ (m) is an invariant
mean. The first part of (c) now follows from 17.4.7 applied to the semigroup Uw . If m is
an invariant mean, then ψ ∗ (m) defines a normalized Haar measure dV , verifying the last
assertion.
A U-invariant finite dimensional subspace Y of X is said to be unitary if U|Y is contained
in a uniformly bounded group of operators on Y whose identity is the identity operator.
The space of strongly almost periodic vectors in X is the closed linear subspace Xp of
X generated by the unitary subspaces of X. The set of dissipative vectors in X is defined
by
X0 = {x ∈ X : 0 ∈ clw US x}.
The following theorem asserts that every member x of X is a unique sum of a strongly
almost periodic vector and a dissipative vector. Thus for some net (sα ), Usα x converges to
a vector xp with a “stable group orbit” Uw xp .
17.6.5 Theorem (deLeeuw-Glicksberg). Let U be weakly almost periodic and let AU have
an invariant mean m. Then
(a) X0 is a closed, U-invariant subspace of X and X = Xp ⊕ X0 .

(b) Xp is the largest closed, U-invariant subspace of X on which Uw acts as a group with
identity the identity operator,
 
(c) $X_0 = \big\{x \in X : m\big(|\langle U_{(\cdot)} x, x'\rangle|\big) = 0 \ \ \forall\, x' \in X'\big\}$.

Proof. By 17.6.4, there exists an idempotent E ∈ K := K(Uw ) such that K = Uw E =


EUw = EUw E is a compact topological group with identity E. Thus X = EX ⊕ (I − E)X.
We show that EX = Xp and (I − E)X = X0 .
For the inclusion Xp ⊆ EX, it suffices to show that every unitary subspace Y of X is
contained in EX. But U|Y is contained in a group with identity the identity operator I,
and since E 2 = E we have E|Y = I and so Y = EY ⊆ EX.
Next, we show that Z := EX ⊆ Xp . Since EV = V E = EV E for all V ∈ Uw , Z is
Uw -invariant and V = EV E on Z. Thus G := Uw |Z is a compact topological group. Let

dV be normalized Haar measure on G and let (φα ) be a symmetric approximate identity in


C(G). Then for f ∈ C(G),
\[ \int_G \phi_\alpha(V) f(V)\,dV = (\phi_\alpha * f)(E) \to f(E). \]
It follows that for fixed z ∈ Z, the vector integrals $V_{\phi_\alpha} z := \int_G \phi_\alpha(V)\,Vz\,dV$ converge
weakly to z:
\[ \langle V_{\phi_\alpha} z, x'\rangle = \int_G \phi_\alpha(V)\,\langle Vz, x'\rangle\,dV \to \langle Ez, x'\rangle = \langle z, x'\rangle. \]

It therefore suffices to show that $V_{\phi_\alpha} z \in X_p$. Now, C(G) = SAP (G) is generated by finite
dimensional, translation invariant subspaces, hence every φα is uniformly approximable by
functions φ from such spaces $\mathcal{G}$. Since $V_{\phi_\alpha} z$ is norm approximable by $V_\phi z$, it now suffices
to show that the finite dimensional space $\{V_\phi z : \phi \in \mathcal{G}\}$ is U invariant (hence unitary). But
this follows from
\[ W V_\phi z = \int_G \phi(V)\,WVz\,dV = \int_G \phi(W^{-1}V)\,Vz\,dV = \int_G L_{W^{-1}}\phi(V)\,Vz\,dV = V_{L_{W^{-1}}\phi}\,z. \]

This completes the proof that EX = Xp , which implies that Uw restricted to Xp is a group of
operators on Xp with identity the identity operator. Now let Y be any U-invariant subspace
on which Uw acts as a group with identity the identity operator. Since E 2 = E, E|Y = I
and so Y = EY ⊆ EX = Xp . Therefore, Xp is the largest such space.
Next, we show that (I − E)X (= ker E) = X0 . Since clw US x = Uw x it follows that
x ∈ X0 iff V x = 0 for some V ∈ Uw . Thus if x ∈ X0 , then {V ∈ Uw : V x = 0} is
nonempty, hence is a closed left ideal and so must contain the idempotent E. Therefore,
X0 = ker E.
Finally, let m be an invariant mean on AU and let $g(V) := |\langle Vx, x'\rangle|$, so that
$\psi(g)(s) = |\langle U_s x, x'\rangle|$. By (c) of 17.6.4, $m\big(|\langle U_{(\cdot)} x, x'\rangle|\big) = \int_K |\langle Vx, x'\rangle|\,dV$. It follows
that $m\big(|\langle U_{(\cdot)} x, x'\rangle|\big) = 0$ for all $x'$ iff V x = 0 for all V ∈ K iff Ex = 0 (since K = KE)
iff x ∈ (I − E)X = X0 .
The conclusions of the theorem hold if either S is commutative or a group, since in each
case, W AP (S) has an invariant mean. One also has

17.6.6 Corollary (deLeeuw-Glicksberg). If kUs k ≤ 1 for all s and if both X and X 0 are
strictly convex, then the conclusions of the theorem hold.
Proof. We show that E1 = E1 E2 = E2 for all idempotents E1 , E2 in K(Uw ). It will follow that
K(Uw ) is a compact topological group, and we can then apply the theorem.
By minimality, Uw E1 E2 = Uw E2 , hence we may choose V so that V E1 E2 = E2 . Then

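\[ \|E_2 x\| = \|V E_1 E_2 x\| \le \|E_1 E_2 x\| \le \|E_2 x\|, \]

so $\|E_2 x\| = \|E_1 E_2 x\|$. It follows that $E_2 x = E_1 E_2 x$; otherwise, by strict convexity of X,

\[ \|E_2 x\| = \left\| \tfrac12 E_1 (E_1 E_2 x + E_2 x) \right\| \le \left\| \tfrac12 (E_1 E_2 x + E_2 x) \right\| < \|E_2 x\|. \]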

To show that E1 = E1 E2 we use minimality again to choose V so that E1 E2 V = E1 .


Then V ∗ E2∗ E1∗ = E1∗ , and since E2∗ is a projection the argument of the preceding paragraph
shows that E2∗ E1∗ = E1∗ and so E1 E2 = E1 .

Ergodic Properties of Semigroups of Operators


Let U : s 7→ Us be a weakly almost periodic representation of S on a Banach space X.
Then V := co U is a semigroup of operators on X, and by the Krein-Šmulian theorem,
Vx = co Ux is relatively weakly compact. Therefore, V w is a weakly almost periodic
semigroup of operators on X. The results of the preceding subsection may then be applied
to V.
The coefficient space of the representation U is the closed linear subspace FU of Cb (S)
generated by the coefficients s 7→ hUs x, x0 i, their conjugates, and the constant functions.
Here is the appropriate analog of 17.6.4 in this setting.
17.6.7 Proposition. Let A(V w ) denote the space of continuous affine functions on V w .
(a) FU is a translation invariant subspace of W AP (S).

(b) The map ψ : A(V w ) → Cb (S) defined by ψ(g)(s) = g(Us ) is an isometry onto FU that
commutes with translations.
(c) FU has an invariant mean iff there exists an idempotent E in V w such that EV =
V E = E for all V ∈ V w .
Proof. The proof of (a) is essentially the same as that of part (a) of 17.6.4. The details are
left to the reader.
(b) That ψ is an isometry into Cb (S) is clear. Given a coefficient h(s) = hUs x, yi, define
g ∈ A(V w ) by g(V ) = hV x, yi. Then ψ(g) = h, which shows that FU ⊆ ran ψ and so
ψ −1 (FU ) ⊆ A(V w ). To show equality, let µ ∈ C(V w )0 such that µ = 0 on ψ −1 (FU ). We show
that µ = 0 on A(V w ); it will follow from the Hahn-Banach theorem that ψ −1 (FU ) = A(V w )
and hence that ran ψ = FU .
Now, µ may be identified with a complex measure on V w and hence may be written as a
linear combination of probability measures µj on V w , say

µ = a1 µ1 − a2 µ2 + i(a3 µ3 − a4 µ4 ), aj ≥ 0.

Since 1 ∈ ψ −1 (FU ), we have 0 = a1 − a2 + i(a3 − a4 ), hence a1 = a2 and a3 = a4 . Therefore,


we may assume that µ1 = µ2 and µ3 = µ4 on ψ −1 (FU ). By 10.31, each µj is in the
weak∗ closed convex hull C of the set δV w of all Dirac measures on V w . But since V w is closed
and convex, the restriction of C to A(V w ) is simply δV w . Thus each µj restricted to A(V w )
is a Dirac measure δVj . Therefore, f (V1 ) = f (V2 ) and f (V3 ) = f (V4 ) for all f ∈ ψ −1 (FU ).
But ψ −1 (FU ) contains all functions V → hV x, x0 i and so separates points of V w . Thus
V1 = V2 and V3 = V4 , proving that µ = 0 on A(V w ).
(c) For any mean m on FU , ψ ∗ (m) is a mean on A(V w ), and conversely. By the argument
in (b), ψ ∗ (m) = δE for some E ∈ V w . By the argument in (c) of 17.6.4, m is an invariant
mean on F iff ψ ∗ (m) is an invariant mean on A(V w ), that is, iff f (V E) = f (EV ) = f (E)
for all V ∈ V w . Since A(V w ) separates points of V w , (c) follows.
Here is the main result of the subsection.
17.6.8 Theorem (deLeeuw-Glicksberg). If FU has an invariant mean, then

X = {x : Us x = x ∀ s ∈ S} ⊕ cl span{Us x − x : x ∈ X, s ∈ S}.

Proof. By 17.6.7(c), there exists E ∈ V w such that EV = V E = E for all V . In particular,


E 2 = E, so X = EX ⊕ (I − E)X. Now, Us x = x for all s ∈ S iff V x = x for all V ∈ V w .
Since V E = E, the latter is equivalent to Ex = x. This shows that EX is the first space in
the above direct sum. Since the second space is the span of the vectors V x − x (V ∈ V w )
and since E(V x − x) = 0, the second space is ker E = (I − E)X, completing the proof.

The preceding theorem allows a simple proof of the following generalization of the mean
ergodic theorem of von Neumann.
17.6.9 Corollary. Let U ∈ B(X) such that the semigroup $\{U^n : n \in \mathbb{N}\}$ is weakly almost
periodic. Then $A_n = n^{-1} \sum_{j=0}^{n-1} U^j$ converges in the strong operator topology to a projection
E ∈ B(X) satisfying EU = U E = E.

Proof. Let E be the projection in the proof of the theorem for the representation $n \mapsto U^n$.
We need only show that for fixed k, $A_n(U^k x - x) \to 0$. This follows from the identity
$A_n(Ux - x) = \frac{1}{n}(U^n x - x)$ and the uniform boundedness of $U^{\mathbb{N}}$.
The preceding corollary holds for an operator U of norm ≤ 1 on a reflexive Banach space.
For a nonreflexive example, let (X, F, µ) be a probability space and ϕ : X → X measurable
such that $\mu(\varphi^{-1}(E)) \le \mu(E)$ for all E ∈ F. Define U in $L^1$ by U f = f ◦ ϕ. By 17.6.2, $U^{\mathbb{N}}$
is weakly almost periodic, hence the corollary is applicable and we have the $L^1$ convergence
\[ \lim_n \; n^{-1} \sum_{j=0}^{n-1} U^j f = Ef. \]
A more refined version of this result in the special case of a measure
preserving ϕ is proved in 18.5.
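
The convergence in 17.6.9 can be observed numerically in the simplest finite dimensional setting. The following is a minimal sketch, assuming a diagonal unitary operator U on C² with eigenvalues 1 and e^{i}; the operator, the vector x, and the function name cesaro_average are illustrative choices and do not come from the text. The fixed component of x survives the averaging, while the rotating component is driven to 0.

```python
import cmath


def cesaro_average(eigenvalues, x, n):
    """Apply A_n = (1/n) * sum_{j=0}^{n-1} U^j to x, where U = diag(eigenvalues)."""
    return [sum(lam ** j for j in range(n)) / n * xi
            for lam, xi in zip(eigenvalues, x)]


# Hypothetical operator: eigenvalue 1 (fixed vectors) and e^{i} (a rotation).
eigenvalues = [1, cmath.exp(1j)]
x = [2.0, 3.0]

averaged = cesaro_average(eigenvalues, x, 10_000)

# The limit projection E keeps the fixed coordinate and annihilates the other.
print(averaged[0])        # component along the fixed space
print(abs(averaged[1]))   # small: of order 1/n
```

In this example the fixed space is spanned by the first coordinate vector, and the second coordinate lies in the closed span of the vectors Ux − x, in agreement with the decomposition in 17.6.8.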
Chapter 18
Probability Theory

Probability theory has long been a subject of great interest, its roots dating back to the
analysis of games of chance in the sixteenth century. The development of modern probability
theory as a branch of measure theory was initiated by Kolmogorov in the early twentieth
century.
Intuitively, a probability is a number between 0 and 1 that expresses the likelihood of
an outcome in an experiment. In this context, the term experiment simply refers to a
repeatable procedure that has a well-defined set of outcomes; something as simple as tossing
a die or as complex as noting the first time a stock dips below a prescribed level. In practice,
the determination of probabilities may be based on logical deduction, analytical methods,
or statistical analysis (as in polling). For our purposes, we shall take as given a particular
assignment of probabilities and not be concerned with their origin. More precisely, our
development of the subject begins in the modern tradition with a given probability space
(Ω, F, P ).1

18.1 Random Variables


Many terms in modern probability theory reflect the classical origins of the subject as well
as its use in analysing real data. For example, the set Ω of a probability space (Ω, F, P ) is
called the sample space (in practice, the set of outcomes of an experiment), and members
of F are called events (sets of outcomes). Properties holding almost everywhere are said to
hold almost surely (a.s.). A real-valued (Borel) measurable function is called a random
variable and may be viewed as a numerical description of an outcome of an experiment. A
measurable function that takes values in Rd is called a d-dimensional random variable.
Random variables are typically denoted by letters X, Y , etc. A d-dimensional random
variable may be written (X1 , . . . , Xd ), where each Xj is a (1-dimensional) random variable. If
X is a family of d-dimensional random variables on Ω, we denote by σ(X) the σ-sub-algebra
of F generated by the sets {X ∈ B}, where X ∈ X and B ∈ B(Rd ).

Expectation and Variance


The integral with respect to P of a random variable X is called the mean or expectation
of X and is denoted by E(X) (or, simply, E X):
\[ E\,X = \int X\,dP. \]

1 Here, in keeping with standard conventions, we write Ω instead of X and use the symbol P for a probability

measure. Other changes of notation to accommodate convention, as well as changes in terminology, are given
in §18.1.


The variance of an L2 random variable X is defined by

\[ V(X) := E[X - E(X)]^2 = E(X^2) - [E(X)]^2. \tag{18.1} \]

Variance may be seen as a measure of the dispersion of the data X from the mean. The
quantity
\[ \sigma(X) := \sqrt{V(X)} \]
is called the standard deviation of X. The covariance of $L^2$ random variables X and Y
is the quantity

\[ \operatorname{cov}(X, Y) := E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y). \tag{18.2} \]

Covariance measures the degree of correlation between X and Y . For example, independent
random variables have covariance zero (see 18.2.3).
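
The two expressions for the variance in (18.1) and for the covariance in (18.2) can be checked by direct computation on a finite probability space. The sketch below assumes a sample space consisting of one roll of a fair die; the random variables X (face value) and Y (indicator of an odd face) and all helper names are illustrative assumptions. Exact rational arithmetic is used so that the identities hold with equality.

```python
from fractions import Fraction

# Hypothetical finite sample space: one roll of a fair die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}


def expectation(X):
    """E(X): the integral of X with respect to P, here a finite sum."""
    return sum(X(w) * P[w] for w in omega)


def variance(X):
    """V(X) = E[(X - E(X))^2], the first expression in (18.1)."""
    m = expectation(X)
    return expectation(lambda w: (X(w) - m) ** 2)


def covariance(X, Y):
    """cov(X, Y) = E[(X - E(X))(Y - E(Y))], the first expression in (18.2)."""
    mx, my = expectation(X), expectation(Y)
    return expectation(lambda w: (X(w) - mx) * (Y(w) - my))


X = lambda w: w        # face value
Y = lambda w: w % 2    # indicator of an odd face

var_X = variance(X)
var_X_alt = expectation(lambda w: X(w) ** 2) - expectation(X) ** 2
cov_XY = covariance(X, Y)
cov_XY_alt = expectation(lambda w: X(w) * Y(w)) - expectation(X) * expectation(Y)

print(var_X, var_X_alt)    # both forms of (18.1)
print(cov_XY, cov_XY_alt)  # both forms of (18.2)
```

Here the covariance is negative because odd faces are, on average, smaller than even ones; X and Y are not independent in this example.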
The characteristic function φX of a d-dimensional random variable X = (X1 , . . . , Xd )
is defined by
\[ \phi_X(t) = E\big(e^{i t \cdot X}\big). \]
Note that this is simply a variation of the Fourier transform of the image measure X(P )
(see next subsection).

Probability Distributions

A probability measure on B(Rd ) is called a d-dimensional probability distribution.


An important example is the image measure PX := X(P ) on B(Rd ) of a d-dimensional
random variable X = (X1 , . . . , Xd ). The measure PX is called the distribution of X or
the joint distribution of (X1 , . . . , Xd ). By the image measure theorem 3.2.15,
\[ E\,g(X_1, \ldots, X_d) = \int g(x_1, \ldots, x_d)\,dP_X(x_1, \ldots, x_d) \]

for any Borel function g for which one side or the other of the equation is defined. Every
probability distribution Q on B(Rd ) arises in this manner, that is, as the distribution of
a random variable X on a probability space (Ω, F, P ): simply take Ω = Rd , F = B(Rd ),
P = Q and X the identity mapping on Rd . A family X of d-dimensional random variables is
said to be identically distributed if PX = PY for all X, Y ∈ X.
For d = 1 the function

FX (x) = P (X ≤ x) = PX (−∞, x]

is called the cumulative distribution function (cdf) of X. In many cases of interest, the
cdf is given by a probability density fX , so that
\[ F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \quad\text{and}\quad E\,g(X) = \int_{\mathbb{R}} g(t) f_X(t)\,dt. \]

If ran X is countable, then the cdf is given by the probability mass function (pmf)

pX (x) := P (X = x).

In this case
\[ F_X(x) = \sum_{t \le x} p_X(t) \quad\text{and}\quad E\,g(X) = \sum_{x} g(x)\,p_X(x). \]

The following are standard distributions given in terms of the probability mass function or
density. In each case X denotes a random variable with the given distribution.

18.1.1 Examples.
• Bernoulli distribution with parameter p ∈ (0, 1):
\[ p_X(1) = 1 - p_X(0) = p. \]
For example, the number of heads (0 or 1) that appear on a single toss of a fair coin has a
Bernoulli distribution with parameter 1/2. By an easy calculation,
\[ E(X) = p, \quad V(X) = pq, \quad\text{and}\quad \phi_X(t) = e^{it} p + q, \quad\text{where } q := 1 - p. \]

• Binomial distribution with parameters (n, p), n ∈ N, 0 < p < 1:
\[ p_X(k) = \binom{n}{k} p^k q^{n-k}, \quad 0 \le k \le n, \quad q := 1 - p. \]
For example, the number of heads that appear in n tosses of a coin has a binomial
distribution, where p is the probability of a head on a single toss. One may check that
\[ E(X) = np, \quad V(X) = npq, \quad\text{and}\quad \phi_X(t) = (e^{it} p + q)^n. \]

• Geometric distribution with parameter p ∈ (0, 1):
\[ p_X(k) = q^k p, \quad k = 0, 1, 2, \ldots, \quad q := 1 - p. \]
For example, in repeated tosses of a coin, the number of tails that appear before the first
head is geometrically distributed. One easily calculates
\[ E(X) = \frac{q}{p}, \quad V(X) = \frac{q}{p^2}, \quad\text{and}\quad \phi_X(t) = \frac{p}{1 - q e^{it}}. \]
• Uniform distribution on the interval (a, b):
\[ f_X = \frac{1}{b - a}\,1_{(a,b)}. \]
For example, a number drawn randomly from the interval (a, b) is uniformly distributed.
For such a random variable,
\[ E(X) = \frac{a + b}{2}, \quad V(X) = \frac{(b - a)^2}{12}, \quad\text{and}\quad \phi_X(t) = \frac{e^{itb} - e^{ita}}{it(b - a)}. \]
• Exponential distribution with parameter λ > 0:
\[ f_X(x) = \lambda e^{-\lambda x}\,1_{[0,\infty)}(x). \]
The exponential distribution is often used to model the life of a biological, electrical, or
mechanical system. It is easily seen that
\[ E(X) = \frac{1}{\lambda}, \quad V(X) = \frac{1}{\lambda^2}, \quad\text{and}\quad \phi_X(t) = \frac{\lambda}{\lambda - it}. \]
• Normal distribution with mean m ∈ R and standard deviation σ > 0:
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{1}{2} \left( \frac{x - m}{\sigma} \right)^{2} \right). \]
For example, samples drawn randomly from a large population of independent data have
a nearly normal distribution. One calculates that
\[ E(X) = m, \quad V(X) = \sigma^2, \quad\text{and}\quad \phi_X(t) = e^{itm - \sigma^2 t^2/2}. \]
The first two equations may be established using $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$ and a substitution.
The third equation follows from 6.2.3. ♦
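
The formulas stated for the binomial distribution can be verified directly from the probability mass function. The following is a minimal sketch, assuming the illustrative parameters n = 10 and p = 1/3 and the test point t = 0.7; these values and the helper names are not from the text. The mean and variance are computed exactly with rational arithmetic, and the characteristic function is compared with its closed form in floating point.

```python
import cmath
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return {k: C(n, k) p^k q^(n-k)} for 0 <= k <= n, with q = 1 - p."""
    q = 1 - p
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}


# Illustrative parameters (assumptions, chosen only for the check).
n, p = 10, Fraction(1, 3)
pmf = binomial_pmf(n, p)

total = sum(pmf.values())
mean = sum(k * pk for k, pk in pmf.items())
var = sum(k**2 * pk for k, pk in pmf.items()) - mean**2


def char_fn_from_pmf(t):
    """phi_X(t) = E(e^{itX}) computed as a finite sum over the pmf."""
    return sum(cmath.exp(1j * t * k) * float(pk) for k, pk in pmf.items())


def char_fn_closed_form(t):
    """phi_X(t) = (e^{it} p + q)^n."""
    return (cmath.exp(1j * t) * float(p) + float(1 - p)) ** n


print(total, mean, var)
print(abs(char_fn_from_pmf(0.7) - char_fn_closed_form(0.7)))
```

The same pattern (sum against the pmf, compare with the closed form) applies to the Bernoulli distribution, which is the case n = 1.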

18.2 Independence
The notion of independence is specific to probability theory and may be seen as one of
several major points of departure of the subject from general measure theory.

Independent Events
Let (Ω, F, P ) be a probability space. A family {Ai : i ∈ I} of events in F is said to be
independent if
P (Ai1 ∩ · · · ∩ Ain ) = P (Ai1 ) · · · P (Ain )
for all choices of distinct indices ik in I. A family $\{\mathcal{A}_i : i \in I\}$ of subcollections $\mathcal{A}_i$ of F is
independent if the collection {Ai : i ∈ I} is independent for all choices $A_i \in \mathcal{A}_i$, i ∈ I.
For example, if (Ω, F, P ) = (Ω1 × Ω2 , F1 ⊗ F2 , P1 × P2 ), then, by definition of the product
measure, the σ-fields F1 × Ω2 and Ω1 × F2 are independent families. This is the basis of the
notion of independent trials. Indeed, if (ω1 , ω2 ) represents the outcome of a two stage
experiment, then in this model the events A1 × Ω2 and Ω1 × A2 , occurring in stages one
and two, respectively, are independent. This idea generalizes to arbitrary finite sequences of
trials and even to infinite sequences (see §18.4).
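
The independence of first-stage and second-stage events in a product space can be confirmed by enumeration. The sketch below assumes a two stage experiment consisting of two rolls of a fair die with the product measure; the particular events A1, A2 and the contrasting event (sum at least 10) are illustrative choices. Events of the form A1 × Ω2 and Ω1 × A2 satisfy the product rule, while an event that depends on both coordinates generally does not.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two stage experiment: two rolls of a fair die, product measure.
omega1 = omega2 = range(1, 7)
omega = list(product(omega1, omega2))
P = {w: Fraction(1, 36) for w in omega}


def prob(event):
    """P(event), where event is a predicate on outcomes (w1, w2)."""
    return sum(P[w] for w in omega if event(w))


A1 = {2, 4, 6}   # stage-one event: first roll is even
A2 = {5, 6}      # stage-two event: second roll is at least 5

first = lambda w: w[0] in A1                # the event A1 x Omega2
second = lambda w: w[1] in A2               # the event Omega1 x A2
both = lambda w: first(w) and second(w)

# An event involving both coordinates, for contrast.
large_sum = lambda w: w[0] + w[1] >= 10
first_and_large = lambda w: first(w) and large_sum(w)

print(prob(both), prob(first) * prob(second))
print(prob(first_and_large), prob(first) * prob(large_sum))
```

The first line prints two equal values, reflecting independence; the second prints two different values, so the events "first roll even" and "sum at least 10" are dependent.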
18.2.1 Proposition. Let (Ω, F, P ) be a probability space and $\{\mathcal{A}_i : i \in I\}$ an independent
family of π-systems contained in F. Then the family $\{\sigma(\mathcal{A}_i) : i \in I\}$ is independent.

Proof. We may suppose that $\Omega \in \mathcal{A}_i$ for every i, since adjoining Ω does not alter the
independence property. Since the notion of independence involves only finitely many sets at a
time, we may also assume that I is finite, say, I = {1, . . . , n}. The property of independence
may now be expressed as

\[ P(A_1 \cap \cdots \cap A_n) = P(A_1) \cdots P(A_n), \tag{†} \]

for all $A_j \in \sigma(\mathcal{A}_j)$, j = 1, . . . , n. Fix $A_j \in \mathcal{A}_j$, j = 2, . . . , n. By hypothesis, (†) holds for all
$A_1 \in \mathcal{A}_1$. Since each side is a finite measure in $A_1 \in \sigma(\mathcal{A}_1)$, it follows from the measure
uniqueness theorem (1.6.8) that (†) holds for all $A_1 \in \sigma(\mathcal{A}_1)$. Now fix $A_1 \in \sigma(\mathcal{A}_1)$ and
$A_j \in \mathcal{A}_j$, j > 2. Arguing as before, this time on $A_2$, we conclude that (†) holds for all
$A_1 \in \sigma(\mathcal{A}_1)$, $A_2 \in \sigma(\mathcal{A}_2)$, and $A_j \in \mathcal{A}_j$, j = 3, . . . , n. Continuing in this manner, we see
that (†) holds for all $A_j \in \sigma(\mathcal{A}_j)$ and all j.

Independent Random Variables


A collection {Xi : i ∈ I} of random variables is said to be independent if the family
$\{\mathcal{A}_i := X_i^{-1}(\mathcal{B}(\mathbb{R})) : i \in I\}$ is independent. In particular, finitely many random variables
X1 , . . . , Xn are independent iff

P (X1 ∈ B1 , . . . , Xn ∈ Bn ) = P (X1 ∈ B1 ) · · · P (Xn ∈ Bn )

for all Borel sets Bj . Note that by 18.2.1, to test for independence it suffices to take Bj
in a generating π-system. The preceding equation may be written in terms of probability
distributions as

P(X1 ,...,Xn ) (B1 × · · · × Bn ) = PX1 (B1 ) · · · PXn (Bn ) = PX1 ⊗ · · · ⊗ PXn (B1 × · · · × Bn ).

Thus we have

18.2.2 Proposition. Random variables X1 , . . . , Xn are independent iff

P(X1 ,...,Xn ) = PX1 ⊗ · · · ⊗ PXn .

Proposition 18.2.2 gives a precise characterization of the notion of independent trials. If


the random variable Xj is the numerical outcome of the jth stage of an experiment, then in
a model described by the law PX1 ⊗ · · · ⊗ PXn no trial will have influence on future trials.

18.2.3 Theorem. If X1 , . . . , Xn are independent and either Xj ≥ 0 for all j or Xj is


integrable for all j, then

E(X1 · · · Xn ) = E(X1 ) · · · E(Xn ).

Proof. We prove the second case. By 18.2.2,
\[
\begin{aligned}
E(|X_1 \cdots X_n|) &= \int |x_1 \cdots x_n|\,dP_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n) \\
&= \int \cdots \int |x_1| \cdots |x_n|\,dP_{X_1}(x_1) \cdots dP_{X_n}(x_n) \\
&= E|X_1| \cdots E|X_n| < \infty.
\end{aligned}
\]
By Fubini’s theorem, the absolute value signs in this equation may be removed, proving the
theorem.
18.2.4 Proposition. Let X1 , . . . , Xn be independent and Xj ∈ L2 (P ). Then

V (X1 + · · · + Xn ) = V (X1 ) + · · · + V (Xn ).

Proof. Set S := X1 + · · · + Xn , mj := E(Xj ), and m := m1 + · · · + mn . Then
\[ (S - m)^2 = \Big( \sum_{j=1}^{n} (X_j - m_j) \Big)^{2} = \sum_{j=1}^{n} (X_j - m_j)^2 + \sum_{i \ne j} (X_i - m_i)(X_j - m_j), \]
and the conclusion follows by taking expectations, noting that the expectation of the second
sum on the right is zero, by independence.
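
Proposition 18.2.4 can be illustrated on a small product space. The sketch below assumes two rolls of a fair die with the uniform measure on the 36 outcomes; the coordinate random variables X1, X2 and the helper names are illustrative. Variances add for the independent pair (X1, X2), while the dependent pair (X1, X1) shows that the independence hypothesis cannot simply be dropped.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two rolls of a fair die, all outcomes equally likely.
faces = range(1, 7)
omega = list(product(faces, faces))
P = Fraction(1, 36)


def expectation(Z):
    """E(Z) as a finite sum over the 36 outcomes."""
    return sum(Z(w) * P for w in omega)


def variance(Z):
    """V(Z) = E[(Z - E(Z))^2]."""
    m = expectation(Z)
    return expectation(lambda w: (Z(w) - m) ** 2)


X1 = lambda w: w[0]   # first roll
X2 = lambda w: w[1]   # second roll (independent of X1)

var_sum = variance(lambda w: X1(w) + X2(w))
var_parts = variance(X1) + variance(X2)

# Dependent case for contrast: X1 + X1 = 2 * X1.
var_dependent = variance(lambda w: X1(w) + X1(w))

print(var_sum, var_parts)   # equal
print(var_dependent)        # 4 * V(X1), not 2 * V(X1)
```

In the dependent case the cross term in the expansion of (S − m)² has expectation 2V(X1) rather than 0.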
18.2.5 Proposition. Let X1 , . . . , Xn be independent random variables. Then

PX1 +···+Xn = PX1 ∗ · · · ∗ PXn .

Proof. For A ∈ B(R),
\[
\begin{aligned}
P_{X_1 + \cdots + X_n}(A) &= \int 1_A(X_1 + \cdots + X_n)\,dP \\
&= \int 1_A(x_1 + \cdots + x_n)\,dP_{(X_1,\ldots,X_n)} \\
&= \int \cdots \int 1_A(x_1 + \cdots + x_n)\,dP_{X_1} \cdots dP_{X_n} \\
&= (P_{X_1} * \cdots * P_{X_n})(A),
\end{aligned}
\]
the third equality from 18.2.2.
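
For random variables with finitely many values, the convolution in 18.2.5 reduces to a finite double sum of probability mass functions. The following is a minimal sketch, assuming two independent rolls of a fair die; the function name convolve and the dictionary representation of a pmf are illustrative choices. The distribution of the sum obtained by convolution is compared with the one obtained by counting outcomes in the product space.

```python
from fractions import Fraction
from itertools import product


def convolve(p, q):
    """Convolution of two pmfs given as {value: probability} dictionaries."""
    out = {}
    for (x, px), (y, qy) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0) + px * qy
    return out


# Hypothetical distribution: a fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Distribution of X1 + X2 via convolution of the marginals.
sum_pmf = convolve(die, die)

# Distribution of X1 + X2 by direct enumeration of the product space.
direct = {}
for a, b in product(range(1, 7), repeat=2):
    direct[a + b] = direct.get(a + b, 0) + Fraction(1, 36)

# Three independent rolls: iterate the convolution.
three_pmf = convolve(sum_pmf, die)

print(sum_pmf == direct)
print(sum_pmf[7], sum(three_pmf.values()))
```

Iterating convolve handles any finite number of independent summands, mirroring the n-fold convolution in the proposition.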



18.3 Conditional Expectation


Let (Ω, F, P ) be a probability space, G a sub-σ-field of F, and X an $L^1$ random variable.
Recall from §5.3 that the conditional expectation of X given G is a G-random variable Y
such that $\int_A Y\,dP = \int_A X\,dP$ for all A ∈ G (see 5.3.6). The standard notation for Y is
E(X | G). Thus, by definition, E(X | G) is the unique (up to a set of G measure zero) G
random variable with the property
\[ \int_A E(X \mid \mathcal{G})\,dP = \int_A X\,dP \quad\text{for all } A \in \mathcal{G}. \tag{18.3} \]

Note that by uniqueness of measures (1.6.8), the equation holds for all A ∈ G iff it holds for A
in a generating π-system for G. In the special case G = σ(X1 , X2 , . . .), E(X | G) is called the
conditional expectation of X given X1 , X2 , . . . and is denoted by E(X | X1 , X2 , . . .).
To test whether 18.3 holds in this case, it suffices to restrict consideration to events A of the
form {X1 ∈ B1 , . . . , Xn ∈ Bn }.
A sub-σ-field G of F may be viewed as information regarding the location of an outcome.
For example, in the case of a repeated coin toss, the σ-field generated by all events of
the form {H} × A2 × A3 × · · · tells us with certainty that the first toss came up heads.
Conditional expectation generalizes the notion of standard expectation by incorporating
such information into its definition. It may be viewed as the best prediction of X given the
information G. The two extreme cases are E(X | {∅, Ω}) = E(X) and E(X | P(X)) = X.
In the first case, the σ-field {∅, Ω} provides no information, and one merely obtains the mean
of X. In the second case, the best prediction of X given all possible information is X itself.
The following theorem summarizes the main properties of conditional expectation. The
reader will note that several of these properties are analogs of those of ordinary expectation.
18.3.1 Theorem. Let X, Y ∈ L1 (Ω, F, P ) and let G and H be σ-fields with H ⊆ G ⊆ F.
(a) E(1 | G) = 1.

(b) E(aX + bY | G) = aE(X | G) + bE(Y | G), a, b ∈ R.

(c) X ≤ Y ⇒ E(X | G) ≤ E(Y | G).

(d) |E(X | G)| ≤ E(|X| | G).

(e) (Conditional Jensen’s Inequality) . If φ : R → R is convex and φ(X) ∈ L1 , then
φ(E(X | G)) ≤ E(φ(X) | G). In particular, if X ∈ Lp (1 ≤ p < ∞), then
‖E(X | G)‖p ≤ ‖X‖p .
(f ) (Factor Property) . If X is a G-random variable, then E(XY |G) = XE(Y |G). In
particular, E(X | G) = X.
(g) (Independence Property) . If σ(X) and G are independent, then E(X | G) = E(X).

(h) (Tower Property) . E [E(X | G) | H] = E(X | H).

(i) (Monotone convergence theorem) . Let (Xn ) be a sequence of nonnegative random
variables with Xn ↑ X. If X is integrable, then E(Xn | G) ↑ E(X | G).
Proof. Properties (a)–(c) follow by taking integrals over sets A ∈ G. Part (d) follows from
part (c) and the inequalities ±X ≤ |X|. Part (e) is proved the same way as the standard
Jensen’s inequality (4.5.4).

For (f), note first that the random variable XE(Y |G) is G-measurable. Now let A ∈ G.
To establish the required property that E[1A XE(Y |G)] = E(1A XY ), we may assume that
X, Y ≥ 0. Now, for G-simple functions $X = \sum_{j=1}^{n} a_j 1_{A_j}$, we have, by definition of E(Y | G)
and linearity,
\[
E[1_A X E(Y \mid G)] = \sum_{j=1}^{n} a_j E\big[ 1_{A \cap A_j} E(Y \mid G) \big] = \sum_{j=1}^{n} a_j E(1_{A \cap A_j} Y) = E(1_A X Y).
\]

The desired equality now follows by considering an increasing sequence of simple functions
Xn and applying the monotone convergence theorem.
For (g), simply note that by independence of G and σ(X) and by 18.2.3 we have

E(1A X) = (E 1A )(E X) = E[1A E(X)], A ∈ G.

Property (h) follows from


\[
\int_A E\big[ E(X \mid G) \mid H \big] \, dP = \int_A E(X \mid G) \, dP = \int_A X \, dP = \int_A E(X \mid H) \, dP, \quad A \in H.
\]

Finally, for (i) we apply (c) to conclude that E(Xn | G) ↑ Y for some G-random variable
Y . By the monotone convergence theorem, for any A ∈ G,
\[
\int_A Y \, dP = \lim_n \int_A E(X_n \mid G) \, dP = \lim_n \int_A X_n \, dP = \int_A X \, dP.
\]

Therefore, Y = E(X | G).
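On a finite sample space with the uniform measure, a σ-field corresponds to a partition and E(X | G) is the average of X over each block. The sketch below, under those assumptions (the function name `cond_exp` and the sample data are hypothetical), checks the tower property (h) on a small example.

```python
from fractions import Fraction

def cond_exp(X, partition):
    """E(X | G) on the uniform space {0, ..., len(X)-1}, G generated by `partition`.

    X is a list of values indexed by outcome; the result has the same shape
    and is constant on each block of the partition.
    """
    Y = [None] * len(X)
    for block in partition:
        avg = sum(Fraction(X[w]) for w in block) / len(block)
        for w in block:
            Y[w] = avg
    return Y

X = [1, 4, 2, 8, 5, 7]
G = [[0, 1], [2, 3], [4, 5]]      # finer partition
H = [[0, 1, 2, 3], [4, 5]]        # coarser partition, so H ⊆ G as σ-fields
tower_lhs = cond_exp(cond_exp(X, G), H)   # E[E(X | G) | H]
tower_rhs = cond_exp(X, H)                # E(X | H)
```

Taking the trivial partition consisting of the whole space returns the constant E(X), in agreement with the first extreme case mentioned before the theorem.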

18.4 Sequences of Independent Random Variables


A sequence of independent random variables may be viewed as the numerical outcomes
of independent trials of an experiment. In this section, we consider the main theorems
regarding such sequences, including zero-one laws, laws of large numbers, and the central
limit theorem. The first step is to construct the infinite product of a sequence of probability
spaces. The construction is motivated by the following example.
18.4.1 Example. Suppose a fair coin is tossed repeatedly. The outcomes of the experiment
may be identified with infinite sequences of heads (H) and tails (T ). We seek to assign
a probability to the event En that the first head appears on the nth toss. For this we
argue as follows: Since on the first toss the outcomes H or T are equally likely, we should
set P (E1 ) = 1/2. Similarly, the outcomes HH, HT , T H, T T of the first two tosses are
equally likely, hence P (E2 ), the probability that T H occurs, should be 1/4. In general, by
this argument we should set $P(E_n) = 2^{-n}$, n ≥ 1. If Ωn = {H, T } denotes the sample
space of outcomes on the nth toss and if Pn {H} = Pn {T } = 1/2, then En is of the form
En = A1 × · · · × An × Ωn+1 × · · · , where Aj ⊆ Ωj , and P (En ) = P1 (A1 ) · · · Pn (An ). The
last equation describes a general way of assigning probabilities P (En ). Theorem 18.4.4
below shows that such assignments of probabilities may be extended to obtain a probability
measure P on the σ-field generated by all sets En . Thus, in the example at hand, we conclude
that the probability of a head eventually appearing is $P\big(\bigcup_n E_n\big) = \sum_{n=1}^{\infty} 2^{-n} = 1$. ♦

Infinite Product Measures


Let (Ωn , Fn , Pn ) (n = 1, 2, . . .) be probability spaces and $\Omega := \prod_{n=1}^{\infty} \Omega_n$. A cylinder set
in Ω is a subset of the form

B × Ωn+1 × Ωn+2 × · · · , B ∈ F1 ⊗ · · · ⊗ Fn .

A special case is the rectangular cylinder set

A1 × · · · × An × Ωn+1 × Ωn+2 × · · · , Ak ∈ Fk . (†)

Interpreting Ak as an event that occurs at “time k”, cylinder sets may be seen as events
occurring in finite time. The σ-field generated by all the cylinder sets (hence also by the
rectangular cylinder sets) is called the product σ-field and is denoted by $\bigotimes_{n=1}^{\infty} F_n$. The
following analog of 2.1.5 is readily established.
18.4.2 Proposition. Let $F = \bigotimes_{n=1}^{\infty} F_n$ and let πn : Ω → Ωn be the nth projection map
πn (ω1 , ω2 , . . .) = ωn . Then πn is F/Fn -measurable. Moreover, if (Ω′ , F′ ) is a measurable
space, then a mapping T : Ω′ → Ω is F′ /F-measurable iff πn ◦ T is F′ /Fn -measurable for
every n.
For the construction of a suitable probability measure on $\bigotimes_{n=1}^{\infty} F_n$, we follow the elegant
argument of Saeki [42], which begins with the following lemma.
18.4.3 Lemma. Let Ω be a nonempty set and A a semiring of subsets of Ω containing Ω.
Let P be a set function on A such that P (∅) = 0 and $\sum_{n=1}^{\infty} P(A_n) = 1$ whenever (An ) is a
disjoint sequence in A with union Ω. Then P extends to a probability measure on σ(A).
Proof. Let Au denote the set of all finite disjoint unions of members of A. By the proof
of 1.6.4, Au is a field. Moreover, since ∅ ∈ A, every member A of Au can be written
(non-uniquely) as an infinite disjoint union of members An of A. We shall call (An ) a
representing sequence for A. Now extend P to Au by defining $P(A) = \sum_{n=1}^{\infty} P(A_n)$, where
(An ) is any representing sequence for A. To see that the extension is well-defined, write
Ac ∈ Au as a disjoint union B1 ∪ · · · ∪ Bm , Bj ∈ A. By hypothesis,
\[
\sum_{n=1}^{\infty} P(A_n) = 1 - \sum_{j=1}^{m} P(B_j).
\]

As the right side is independent of the representing sequence for A, the extension P is
well-defined. Since, by definition, P is countably additive on Au , Theorem 1.6.4 guarantees
the existence of an extension of P to σ(A).
We may now prove
18.4.4 Theorem. There exists a unique probability measure P on $\bigotimes_{n=1}^{\infty} F_n$ such that

\[
P(A_1 \times \cdots \times A_n \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots) = P_1(A_1) \cdots P_n(A_n)
\]

for all n and all Ak ∈ Fk .


Proof. Uniqueness follows from 1.6.8. To establish existence, for each n ∈ N let An denote
the collection of all rectangular cylinder sets of the form A1 × · · · × An × Ωn+1 × · · · , and
let A denote the collection of all rectangular cylinder sets. Since An ↑ A and each An is a
semiring, A is a semiring. Now set Qn = P1 ⊗ · · · ⊗ Pn and define a set function P on A by

P (A1 × · · · × An × Ωn+1 × Ωn+2 × · · · ) := Qn (A1 × · · · × An ).



In particular,

P (Ω1 × · · · × Ωn−1 × An × Ωn+1 × Ωn+2 × · · · ) = Pn (An ).

To see that P is well-defined, suppose that n < m and

A × Ωn+1 × Ωn+2 × · · · = B × Ωm+1 × Ωm+2 × · · · ,

where
A = A1 × · · · × An and B = B1 × · · · × Bm , Aj , Bj ∈ Fj .
Then B = A × Ωn+1 × · · · × Ωm and so

P (A × Ωn+1 × Ωn+2 × · · · ) = Qn (A) = Qm (B) = P (B × Ωm+1 × Ωm+2 × · · · ).

Next, we show that P has the property of the lemma. Let (An ) be a disjoint sequence in
A with union Ω. Then

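\[
A_n = \prod_{j=1}^{\infty} A_{nj}, \quad A_{nj} \in F_j, \quad A_{nj} = \Omega_j \ (j > j_n), \quad \text{and} \quad P(A_n) = \prod_{j=1}^{\infty} P(A_{nj}).
\]
Suppose, for a contradiction, that $\sum_{n=1}^{\infty} P(A_n) \ne 1$. Then there must exist an ω1 ∈ Ω1 such
that
\[
\sum_{n=1}^{\infty} 1_{A_{n1}}(\omega_1) \prod_{j=2}^{\infty} P(A_{nj}) \ne 1;
\]
otherwise, integrating over ω1 ∈ Ω1 would produce $\sum_{n=1}^{\infty} P(A_n) = 1$. It follows by similar
reasoning that there exists ω2 ∈ Ω2 such that
\[
\sum_{n=1}^{\infty} 1_{A_{n1}}(\omega_1) 1_{A_{n2}}(\omega_2) \prod_{j=3}^{\infty} P(A_{nj}) \ne 1.
\]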

By induction, we obtain a point ω = (ω1 , ω2 , . . .) ∈ Ω such that for all m,


\[
\sum_{n=1}^{\infty} \prod_{j=1}^{m} 1_{A_{nj}}(\omega_j) \prod_{j=m+1}^{\infty} P(A_{nj}) \ne 1. \tag{a}
\]

Now, ω is in Ap for some p. Recalling that
\[
A_p = A_{p1} \times \cdots \times A_{p j_p} \times \Omega_{j_p+1} \times \Omega_{j_p+2} \times \cdots,
\]
we see that
\[
\prod_{j=1}^{j_p} 1_{A_{pj}}(\omega_j) \prod_{j=j_p+1}^{\infty} P(A_{pj}) = 1. \tag{b}
\]
Next, we show that
\[
\prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{\infty} P(A_{nj}) = 0, \quad n \ne p. \tag{c}
\]
Let $\omega_j' \in \Omega_j$, $j > j_p$. Then $1_{A_p}(\omega_1, \ldots, \omega_{j_p}, \omega'_{j_p+1}, \ldots) = 1$, and since $\sum_{n=1}^{\infty} 1_{A_n} = 1$ it
follows that for $n \ne p$,
\[
\prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{\infty} 1_{A_{nj}}(\omega_j') = 1_{A_n}(\omega_1, \ldots, \omega_{j_p}, \omega'_{j_p+1}, \ldots) = 0. \tag{d}
\]
Let $N \ge j_p + 1$ and restrict $\omega_j'$ to lie in $A_{nj}$ for $j > N$. Then $\prod_{j>N} 1_{A_{nj}}(\omega_j') = 1$, hence
from (d)
\[
\prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{N} 1_{A_{nj}}(\omega_j') = 1_{A_n}(\omega_1, \ldots, \omega_{j_p}, \omega'_{j_p+1}, \ldots) = 0.
\]
Integrating with respect to Pj over all $\omega_j' \in \Omega_j$ (j = jp + 1, . . . , N ), we have
\[
\prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{N} P(A_{nj}) = 0.
\]
Since N was arbitrary, (c) holds. From (b) and (c) we have
\[
\sum_{n=1}^{\infty} \prod_{j=1}^{j_p} 1_{A_{nj}}(\omega_j) \prod_{j=j_p+1}^{\infty} P(A_{nj}) = 1.
\]
But this contradicts (a) with m = jp .


The probability measure P constructed in the theorem is called the product of the
probability measures (Pn ) and is denoted by $\bigotimes_{n=1}^{\infty} P_n$. The probability space
\[
\Big( \prod_{n=1}^{\infty} \Omega_n, \ \bigotimes_{n=1}^{\infty} F_n, \ \bigotimes_{n=1}^{\infty} P_n \Big)
\]
is called the product of the probability spaces (Ωn , Fn , Pn ). An important special case
is the countable product of probability spaces of the form (R, B(R), Pn ). Note that in this
case, $\bigotimes_{n=1}^{\infty} B(R) = B(R^{\infty})$, where R∞ is the topological Cartesian product of countably
many copies of R. This follows from the fact that a basis for the product topology consists
of countable unions of sets of the form U1 × · · · × Un × R × R × · · · , where Uj is in a
countable basis for R. Similar remarks apply to a countable product of probability spaces
(Rd , B(Rd ), Pn ).

The Distribution of a Sequence of Random Variables


Using 18.4.4, we may extend 18.2.2 to the infinite case as follows: Let X := (Xn ) be an
infinite sequence of random variables on a probability space (Ω, F, P ). The distribution of
X is the probability measure PX on $\bigotimes_{n=1}^{\infty} B(R)$ defined by

\[
P_X(B) = P\big( (X_1, X_2, \ldots) \in B \big).
\]

By definition, the random variables Xj are independent iff for each n and Bj ∈ B(R),

\[
P\big( (X_1, \ldots, X_n) \in B_1 \times \cdots \times B_n \big) = P(X_1 \in B_1) \cdots P(X_n \in B_n),
\]

which may be written

\[
P_X(B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots) = \Big( \bigotimes_{j=1}^{\infty} P_{X_j} \Big)(B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots).
\]

Thus, by the uniqueness of measures theorem, we have

18.4.5 Proposition. A sequence X = (Xn ) of random variables is independent iff
$P_X = \bigotimes_{j=1}^{\infty} P_{X_j}$.

The question still remains as to whether there exist sequences of independent random
variables. Theorem 18.4.4 neatly settles that question: Consider the sequence of probability
spaces (Ωn , Fn , Pn ), where Ωn = Rd , Fn = B(Rd ), and Pn is an arbitrary d-dimensional
probability distribution on Rd , and let (Ω, F, P ) denote the product space. The projection
maps Xn : Ω → Rd are then d-dimensional random variables such that
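\[
\{(X_1, \ldots, X_n) \in B_1 \times \cdots \times B_n\} = \{\omega : \omega_j \in B_j, \ 1 \le j \le n\} = B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \cdots.
\]
In particular, PXj = Pj , and the Xj are independent since
\[
P\{(X_1, \ldots, X_n) \in B_1 \times \cdots \times B_n\} = P(B_1 \times \cdots \times B_n \times \Omega_{n+1} \times \cdots) = \prod_{j=1}^{n} P_j(B_j) = \prod_{j=1}^{n} P(X_j \in B_j).
\]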

We have proved
18.4.6 Proposition. Given a sequence (Pn ) of d-dimensional probability distributions, there
exists a probability space (Ω, F, P ) and a sequence of independent d-dimensional random
variables Xn on (Ω, F, P ) such that PXn = Pn for all n.
We return to the coin toss experiment:
18.4.7 Example. By the proposition, there exists a probability space and a sequence of
independent random variables Xn such that P (Xn = 1) = p = 1 − P (Xn = 0), where
0 < p < 1. This may be taken as a model for an infinite sequence of coin tosses, where
Xn = 1 if the nth toss is heads, Xn = 0 if the nth toss is tails, and p is the probability of
heads on a single toss. Using this model, we may determine probabilities of various interesting
events. For example, the probability that a head occurs on an even toss is
\[
P(X_2 = 1) + P(X_2 = 0, X_4 = 1) + P(X_2 = X_4 = 0, X_6 = 1) + \cdots = p(1 + q + q^2 + \cdots) = 1,
\]
where q := 1 − p. The probability that the first head occurs on an even toss is
\[
P(X_1 = 0, X_2 = 1) + P(X_1 = X_2 = X_3 = 0, X_4 = 1) + \cdots = p(q + q^3 + q^5 + \cdots) = \frac{q}{1+q}.
\]
For a fair coin, the latter probability is 1/3. ♦
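The closed form q/(1 + q) in Example 18.4.7 can be compared with partial sums of the series p(q + q³ + q⁵ + · · ·). The following is a small sketch in exact rational arithmetic; the function name `first_head_even` and the truncation at 200 terms are arbitrary illustrative choices.

```python
from fractions import Fraction

def first_head_even(p, terms=200):
    """Partial sum of p * (q + q^3 + q^5 + ...), where q = 1 - p."""
    q = 1 - p
    return sum(p * q ** (2 * k + 1) for k in range(terms))

p = Fraction(1, 2)            # fair coin
q = 1 - p
approx = first_head_even(p)   # truncated series
closed = q / (1 + q)          # closed form from the example
print(closed)                 # 1/3
```

Since every omitted term is positive, the partial sum lies strictly below the closed form, and the gap shrinks geometrically with the number of terms.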

Zero-One Laws
The tail σ-field of a sequence of random variables Xn is the σ-field

\[
T = \bigcap_{n=1}^{\infty} \sigma(X_n, X_{n+1}, \ldots).
\]

Members of T are called tail events. Thus tail events are unaffected by changes that occur
in finite time. For example, the events
\[
\Big\{ \omega : \frac{1}{n} \sum_{k=1}^{n} X_k(\omega) \to \frac{1}{2} \Big\} \quad \text{and} \quad \{\omega : X_n(\omega) \to 0\}
\]
are tail events, but
\[
\Big\{ \omega : \sum_{n=1}^{\infty} X_n(\omega) = 0 \Big\}
\]
is not. Of particular interest are tail events for independent random variables. These have
considerably restricted probabilities, as the following theorem shows.

18.4.8 Kolmogorov’s Zero-One Law. If (Xn ) is a sequence of independent random
variables and A ∈ T, then P (A) = 0 or 1.
Proof. Since σ(X1 , X2 , . . .) is generated by the field $\bigcup_{n=1}^{\infty} \sigma(X_1, \ldots, X_n)$, there exists by 1.6.5
a strictly increasing sequence of integers nk and sets Ak ∈ σ(Xi : 1 ≤ i ≤ nk ) such that
$P(A \,\triangle\, A_k) < 1/k$. Now
\[
|P(A) - P(A_k)| = \Big| \int (1_A - 1_{A_k}) \, dP \Big| \le \int |1_A - 1_{A_k}| \, dP = P(A \,\triangle\, A_k)
\]
and
\[
|P(A) - P(A_k \cap A)| \le \int |1_A - 1_{A_k \cap A}| \, dP = \int 1_A |1_A - 1_{A_k}| \, dP \le P(A \,\triangle\, A_k),
\]
hence P (Ak ) → P (A) and P (Ak ∩ A) → P (A). But A ∈ σ(Xi : i > nk ), so by independence
P (Ak ∩ A) = P (Ak )P (A). Therefore, $P(A)^2 = P(A)$.
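For a simple application, consider an infinite series $\sum_{n=1}^{\infty} X_n$ of independent random
variables. Since $\sum_{n=1}^{\infty} X_n(\omega)$ converges iff $\sum_{n=m}^{\infty} X_n(\omega)$ converges, the event
\[
A = \Big\{ \omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges} \Big\}
\]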

is a tail event and so has probability 0 or 1. In 18.4.14 we give sufficient conditions for which
P (A) = 1, that is, for which the series converges almost surely.
The next result concerns a particularly important tail event and gives sufficient conditions
that determine the probability of the event.
18.4.9 Borel-Cantelli Lemma. Let (An ) be a sequence of events and let A = lim supn An ,
the event that An occurs infinitely often (i.o.).
(a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then P (A) = 0.
(b) If the events An are independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$, then P (A) = 1.
Proof. Part (a) follows from $P(A) \le \sum_{k=n}^{\infty} P(A_k)$ for all n. For (b) we have
\[
1 - P(A) = P\Big( \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c \Big) = \lim_n \lim_m P\Big( \bigcap_{k=n}^{m} A_k^c \Big) = \lim_n \lim_m \prod_{k=n}^{m} P(A_k^c),
\]

the last equality by independence of (An ). By the inequality ln(1 − x) ≤ −x on [0, 1) we
have for each n
\[
\lim_m \ln \prod_{k=n}^{m} P(A_k^c) = \lim_m \ln \prod_{k=n}^{m} (1 - P(A_k)) \le -\lim_m \sum_{k=n}^{m} P(A_k) = -\infty,
\]
which implies that
\[
\lim_m \prod_{k=n}^{m} P(A_k^c) = \lim_m \prod_{k=n}^{m} (1 - P(A_k)) = 0.
\]

Therefore, P (A) = 1.
Note that the independence hypothesis in (b) is crucial. For example, if P is Lebesgue
measure on [0, 1] and An = [0, 1/n], then $\sum_{n=1}^{\infty} P(A_n) = \infty$ but P (A) = 0.
18.4.10 Example. Let (Xn ) be a sequence of independent random variables such that
Xn → 0 a.s. Then $\sum_{n=1}^{\infty} P(X_n \ge \varepsilon) < \infty$ for every ε > 0. Otherwise, we would have
P (Xn ≥ ε i.o.) = 1 by 18.4.9(b) and so P (Xn → 0) = 0. ♦
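The key step in the proof of 18.4.9(b) is that the finite products of the P(A_k^c) tend to zero when the series of probabilities diverges. As a rough illustration, assume independent events with P(A_k) = 1/k for k ≥ 2 (a hypothetical choice); the product then telescopes to (n − 1)/m, which the sketch below confirms exactly.

```python
from fractions import Fraction

def prob_none_occur(n, m):
    """prod_{k=n}^{m} (1 - P(A_k)) for independent events with P(A_k) = 1/k."""
    prod = Fraction(1)
    for k in range(n, m + 1):
        prod *= 1 - Fraction(1, k)
    return prod

# For fixed n the product decreases to 0 as m grows, so P(A_k i.o.) = 1.
for m in (10, 100, 1000):
    print(m, prob_none_occur(2, m))
```

The harmonic series diverges only logarithmically, yet that is already enough to force the product to zero for every starting index n.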

Laws of Large Numbers


Consider the coin toss example 18.4.7, where Xn is the number of heads (0 or 1) appearing
on the nth toss. The average number of heads in n tosses is then (X1 + · · · + Xn )/n, and for
a fair coin we would expect this to be close to 1/2 for large n. Thus we should have, in some
sense,
\[
\frac{X_1 + \cdots + X_n}{n} \to \frac{1}{2}.
\]
In this subsection we derive several results regarding the convergence of such averages, these
known generally as laws of large numbers. For the first law we need
18.4.11 Chebyshev’s Inequality. Let X be an L2 random variable with mean zero and
variance σ². Then
\[
P(|X| \ge r) \le \frac{\sigma^2}{r^2}.
\]
Proof. $\sigma^2 = \int X^2 \, dP \ge \int_{\{|X| \ge r\}} X^2 \, dP \ge r^2 P(|X| \ge r)$.
18.4.12 Weak Law of Large Numbers. Let (Xn ) be a sequence of independent L2
random variables. If $v_n := n^{-2} \sum_{k=1}^{n} V(X_k) \to 0$, then
\[
Y_n := \frac{1}{n} \sum_{k=1}^{n} \big( X_k - E(X_k) \big) \xrightarrow{P} 0.
\]

Proof. Since Yn has mean zero and variance vn (18.2.4), P (|Yn | ≥ ε) ≤ vn /ε2 → 0.
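For fair coin tosses, V(X_k) = 1/4, so v_n = 1/(4n) and the Chebyshev bound in the proof reads P(|Y_n| ≥ ε) ≤ 1/(4nε²). The sketch below compares this bound with the exact binomial tail; the function names and the choice ε = 1/10 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def tail_prob(n, eps):
    """Exact P(|S_n/n - 1/2| >= eps) for n independent fair coin tosses."""
    total = sum(comb(n, k) for k in range(n + 1)
                if abs(Fraction(k, n) - Fraction(1, 2)) >= eps)
    return Fraction(total, 2 ** n)

def chebyshev_bound(n, eps):
    """v_n / eps^2 with v_n = n^{-2} * n * (1/4) = 1/(4n)."""
    return Fraction(1, 4 * n) / eps ** 2

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(tail_prob(n, eps)), float(chebyshev_bound(n, eps)))
```

The bound is crude (it exceeds 1 for small n), but it tends to zero, which is all that convergence in probability requires.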
By strengthening the hypothesis of the weak law of large numbers, we obtain more powerful
conclusions in the form of strong laws. For these laws we need the following generalization
of Chebyshev’s inequality.
18.4.13 Kolmogorov’s Inequality. Let X1 , . . . , Xn be independent L2 random variables
with mean 0 and set Sj := X1 + · · · + Xj , j = 1, . . . , n. Then
\[
P\Big( \max_{1 \le j \le n} |S_j| \ge \varepsilon \Big) \le \frac{1}{\varepsilon^2} \sum_{j=1}^{n} V(X_j).
\]

Proof. By replacing Xj by Xj − E(Xj ) and noting that V (X + c) = V (X), we may assume
that E(Xj ) = 0. Set
\[
A = \Big\{ \max_{1 \le j \le n} |S_j| \ge \varepsilon \Big\}, \quad B_1 = \{|S_1| \ge \varepsilon\}, \quad \text{and} \quad B_k = \{|S_k| \ge \varepsilon\} \cap \bigcap_{j=1}^{k-1} \{|S_j| < \varepsilon\} \quad (k \ge 2).
\]
Clearly, $A = \bigcup_{k=1}^{n} B_k$ (disjoint). Now, for k < n the random variables Sn − Sk and Sk 1Bk
are independent, being Borel functions of (Xk+1 , · · · , Xn ) and (X1 , · · · , Xk ), respectively.
Therefore, by 18.2.3,

\[
E\big[ (S_n - S_k) S_k 1_{B_k} \big] = E(S_n - S_k) \, E(S_k 1_{B_k}) = 0.
\]

Writing
\[
S_n^2 = (S_n - S_k + S_k)^2 = (S_n - S_k)^2 + 2(S_n - S_k) S_k + S_k^2,
\]
we then have
\[
E(S_n^2 1_{B_k}) = E\big[ (S_n - S_k)^2 1_{B_k} \big] + 2E\big[ (S_n - S_k) S_k 1_{B_k} \big] + E(S_k^2 1_{B_k}) \ge E(S_k^2 1_{B_k}) \ge \varepsilon^2 P(B_k),
\]
which implies
\[
\sum_{k=1}^{n} V(X_k) = V(S_n) \ge \int_A S_n^2 \, dP = \sum_k \int_{B_k} S_n^2 \, dP \ge \sum_k \varepsilon^2 P(B_k) = \varepsilon^2 P(A).
\]

The following is a precursor to the strong laws.


18.4.14 Theorem. Let (Xn ) be a sequence of independent L2 random variables. If
$\sum_{n=1}^{\infty} V(X_n) < \infty$, then $\sum_{n=1}^{\infty} \big( X_n - E(X_n) \big)$ converges a.s. Hence if $\sum_{n=1}^{\infty} E(X_n)$ also
converges, then $\sum_{n=1}^{\infty} X_n$ converges a.s.
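Proof. Let $S_n := \sum_{j=1}^{n} \big( X_j - E(X_j) \big)$. We claim that $\limsup_n S_n$ and $\liminf_n S_n$ are finite a.s. To
see this, note first that

\[
\big| \limsup_n S_n \big| \le \limsup_n |S_n|, \qquad \big| \liminf_n S_n \big| = \big| \limsup_n (-S_n) \big| \le \limsup_n |S_n|,
\]

and
\[
\limsup_n |S_n| \le \limsup_n |S_n - S_1| + |S_1| \le \sup_{n \ge 2} |S_n - S_1| + |S_1|.
\]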

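Therefore, the claim will follow if we show that $\sup_{n \ge 2} |S_n - S_1| < \infty$ a.s. Now, for r > 0,
\[
\begin{aligned}
P\Big( \sup_{n \ge 2} |S_n - S_1| \ge 2r \Big) &\le P\Big( \bigcup_{n=2}^{\infty} \{|S_n - S_1| \ge r\} \Big) = \lim_{N \to \infty} P\Big( \bigcup_{n=2}^{N} \{|S_n - S_1| \ge r\} \Big) \\
&\le \lim_{N \to \infty} P\Big( \max_{2 \le n \le N} |S_n - S_1| \ge r \Big) \\
&\le \frac{1}{r^2} \sum_{j=2}^{\infty} V(X_j), \qquad (\dagger)
\end{aligned}
\]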

the last inequality by Kolmogorov’s inequality applied to Xn − E(Xn ). Thus
\[
P\Big( \sup_{n \ge 2} |S_n - S_1| = \infty \Big) \le P\Big( \sup_{n \ge 2} |S_n - S_1| \ge 2r \Big) \le \frac{1}{r^2} \sum_{j=2}^{\infty} V(X_j)
\]
for all r and so $P\big( \sup_{n \ge 2} |S_n - S_1| = \infty \big) = 0$, verifying the claim.
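Now note that for any real sequence (an ), a ∈ R, and m ∈ N, if a − ε < an < a + ε for all
n ≥ m, then $a - \varepsilon \le \liminf_n a_n \le \limsup_n a_n \le a + \varepsilon$, hence $\limsup_n a_n - \liminf_n a_n \le 2\varepsilon$. Put another
way,
\[
\limsup_n a_n - \liminf_n a_n > 2\varepsilon \ \Rightarrow \ \sup_{n \ge m} |a_n - a| \ge \varepsilon.
\]
Thus for all m, by an obvious extension of (†),
\[
P\Big( \limsup_n S_n - \liminf_n S_n > 2\varepsilon \Big) \le P\Big( \sup_{n \ge m} |S_n - S_m| \ge \varepsilon \Big) \le \frac{4}{\varepsilon^2} \sum_{j \ge m} V(X_j).
\]
Letting m → ∞ shows that $P\big( \limsup_n S_n - \liminf_n S_n > 2\varepsilon \big) = 0$ for all ε. It follows that
$\liminf_n S_n = \limsup_n S_n$ almost surely.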
18.4.15 Kronecker’s Lemma. If the series $\sum_{n=1}^{\infty} a_n$ converges, then
\[
\lim_n \frac{1}{n} \sum_{k=1}^{n} k a_k = 0.
\]
Proof. Set $s := \sum_{n=1}^{\infty} a_n$ and $s_n := \sum_{k=1}^{n} a_k$. Since sn → s, it follows that the averages
tn := (s1 + · · · + sn )/n also tend to s. Then $(n-1) t_{n-1} = s_1 + \cdots + s_{n-1} = \sum_{k=1}^{n-1} (n-k) a_k$
and so
\[
\frac{1}{n} \sum_{k=1}^{n} k a_k = \frac{1}{n} \sum_{k=1}^{n} [n - (n-k)] a_k = s_n - \frac{1}{n} \sum_{k=1}^{n-1} (n-k) a_k = s_n - \frac{n-1}{n} t_{n-1} \to 0.
\]
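Both the conclusion of Kronecker's lemma and the identity used in its proof can be checked on a concrete convergent series. The sketch below assumes the alternating harmonic series a_k = (−1)^{k+1}/k as a hypothetical test case; the function names are illustrative.

```python
from fractions import Fraction

def a(k):
    """Terms of the alternating harmonic series, which converges."""
    return Fraction((-1) ** (k + 1), k)

def weighted_average(a, n):
    """(1/n) * sum_{k=1}^{n} k * a(k)."""
    return sum(k * a(k) for k in range(1, n + 1)) / n

def via_identity(a, n):
    """s_n - ((n-1)/n) * t_{n-1}, the expression from the proof (n >= 2)."""
    s = [Fraction(0)]
    for k in range(1, n + 1):
        s.append(s[-1] + a(k))            # s[k] is the k-th partial sum
    t_prev = sum(s[1:n]) / (n - 1)        # t_{n-1} = (s_1 + ... + s_{n-1})/(n-1)
    return s[n] - Fraction(n - 1, n) * t_prev

print(weighted_average(a, 1001))  # 1/1001
```

Here k·a_k = (−1)^{k+1}, so the weighted average is 0 for even n and 1/n for odd n, which makes the convergence to zero visible directly.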

We may now prove


18.4.16 L2 -Strong Law of Large Numbers. Let (Xn ) be a sequence of independent L2
random variables. If $\sum_{n=1}^{\infty} n^{-2} V(X_n) < \infty$, then
\[
\lim_n \frac{1}{n} \sum_{k=1}^{n} \big( X_k - E(X_k) \big) = 0 \quad \text{a.s.}
\]
Proof. By 18.4.14 applied to the sequence $(n^{-1} X_n)$, the series $\sum_{k=1}^{\infty} k^{-1} \big( X_k - E(X_k) \big)$
converges almost surely. Kronecker’s lemma completes the proof.
In the case of identically distributed random variables, the requirement of square
summability may be weakened. To prove this we need
18.4.17 Lemma. For n ∈ Z+ , let An = {|X| ≥ n} and Bn = {n ≤ |X| < n + 1}. Then

\[
\sum_{n=1}^{\infty} P(A_n) \le \sum_{n=1}^{\infty} n P(B_n) \le E|X| \le \sum_{n=0}^{\infty} (n+1) P(B_n) = \sum_{n=0}^{\infty} P(A_n).
\]
Proof. Since $P(A_m) = \sum_{n \ge m} P(B_n)$,
\[
1 + \sum_{m=1}^{\infty} P(A_m) = \sum_{m=0}^{\infty} P(A_m) = \sum_{n=0}^{\infty} (n+1) P(B_n) \le \sum_{n=1}^{\infty} n P(B_n) + 1,
\]
which proves the first inequality. For the remaining inequalities, note that
$n 1_{B_n} \le |X| 1_{B_n} \le (n+1) 1_{B_n}$, hence
\[
\sum_{n=1}^{\infty} n P(B_n) \le E|X| = \sum_{n=0}^{\infty} \int_{B_n} |X| \, dP \le \sum_{n=0}^{\infty} (n+1) P(B_n).
\]

18.4.18 L1 -Strong Law of Large Numbers. Let (Xn ) be a sequence of independent and
identically distributed L1 random variables. Then
\[
\lim_n \frac{1}{n} \sum_{k=1}^{n} X_k = E(X_1) \quad \text{a.s.}
\]
Proof. Set $Y_n := X_n 1_{\{|X_n| < n\}}$. Then $P(X_n \ne Y_n) = P(|X_n| \ge n) = P(|X_1| \ge n)$, hence, by
the lemma,
\[
\sum_{n=1}^{\infty} P(X_n \ne Y_n) \le E|X_1| < \infty.
\]
By the Borel-Cantelli lemma, Xn = Yn eventually with probability one, hence it suffices to
prove that $n^{-1} \sum_{k=1}^{n} Y_k \to E(X_1)$ a.s. Now $E(Y_n) = E(X_1 1_{\{|X_1| < n\}}) \to E(X_1)$, hence also
$n^{-1} \sum_{k=1}^{n} E(Y_k) \to E(X_1)$. We must therefore show that $n^{-1} \sum_{k=1}^{n} \big( Y_k - E(Y_k) \big) \to 0$ a.s.
For this it is sufficient by the L2 strong law to prove that $\sum_{k=1}^{\infty} V(Y_k)/k^2 < \infty$. Now,
\[
V(Y_k) = E(Y_k^2) - E^2(Y_k) \le E(Y_k^2) = E\big( X_k^2 1_{\{|X_k| < k\}} \big) = E\big( X_1^2 1_{\{|X_1| < k\}} \big),
\]

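hence, setting Bn = {n − 1 ≤ |X1 | < n}, we have
\[
\sum_{k=1}^{\infty} \frac{1}{k^2} V(Y_k) \le \sum_{k=1}^{\infty} \frac{1}{k^2} \int X_1^2 1_{\{|X_1| < k\}} \, dP = \sum_{n=1}^{\infty} \int_{B_n} \sum_{k=1}^{\infty} \frac{1}{k^2} X_1^2 1_{\{|X_1| < k\}} \, dP.
\]
Since $1_{\{|X_1| < k\}} 1_{B_n} = 0$ (k ≤ n − 1), we have
\[
\int_{B_n} \sum_{k=1}^{\infty} \frac{1}{k^2} X_1^2 1_{\{|X_1| < k\}} \, dP \le \int_{B_n} \sum_{k=n}^{\infty} \frac{n^2}{k^2} 1_{\{|X_1| < k\}} \, dP \le \int_{B_n} \sum_{k=n}^{\infty} \frac{n^2}{k^2} \, dP \le 2n P(B_n),
\]
where, for the last inequality, we have used the estimate $\sum_{k \ge n} k^{-2} \le 2/n$, as may be verified
by a simple induction argument. Thus
\[
\sum_{k=1}^{\infty} \frac{V(Y_k)}{k^2} \le 2 \sum_{n=1}^{\infty} n P(B_n),
\]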
k=1

which, by the lemma, is finite.

The Central Limit Theorem


The central limit theorem is one of the most important results in probability theory,
underlying much of statistical analysis. The theorem asserts (remarkably) that (suitably
adjusted) sample means of independent data are approximately normally distributed, regardless
of the actual distribution of the data. Here is the precise statement:
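18.4.19 Theorem. Let (Xn ) be a sequence of independent, identically distributed L2 random
variables with mean µ and variance σ². Set Sn := X1 + · · · + Xn . Then for all x
\[
\lim_n P\Big( \frac{S_n - n\mu}{\sigma \sqrt{n}} \le x \Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt. \tag{18.4}
\]
Proof. Since
\[
\frac{S_n - n\mu}{\sigma \sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \frac{X_k - \mu}{\sigma}, \quad E\Big( \frac{X_n - \mu}{\sigma} \Big) = 0, \quad \text{and} \quad V\Big( \frac{X_n - \mu}{\sigma} \Big) = 1,
\]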

the general result follows from the special case µ = 0 and σ = 1, which we assume. Let Fn (x)
denote the distribution function of $n^{-1/2} S_n$ and F the distribution function of a standard
normally distributed random variable. The assertion of the theorem is that Fn (x) → F (x)
for all x. By 7.4.3, this is equivalent to the vague convergence Qn → Q, where
\[
Q_n(B) = P\big( S_n \in \sqrt{n} B \big) \quad \text{and} \quad Q(B) = \frac{1}{\sqrt{2\pi}} \int_B e^{-t^2/2} \, dt.
\]
To establish vague convergence, it suffices by 7.4.2 to show that $\widehat{Q}_n(\xi) \to \widehat{Q}(\xi)$ for every
ξ ∈ R. Now, differentiating under the integral in
\[
\widehat{Q}_1(\xi) = \int e^{-2\pi i \xi x} \, dQ_1(x) = \int e^{-2\pi i \xi x} \, dP_{X_1}(x)
\]
we have $\widehat{Q}_1'(0) = -2\pi i E(X_1) = 0$ and $\widehat{Q}_1''(0) = -4\pi^2 V(X_1) = -4\pi^2$. By Taylor’s theorem,
\[
\widehat{Q}_1(\xi) = 1 + \tfrac{1}{2} \widehat{Q}_1''(\theta(\xi)) \xi^2, \quad \text{where } |\theta(\xi)| \le |\xi|.
\]
By independence, $Q_n(B) = P_{X_1} * \cdots * P_{X_n}(\sqrt{n} B)$, hence
\[
\begin{aligned}
\widehat{Q}_n(\xi) &= \int \cdots \int e^{-2\pi i n^{-1/2} \xi (x_1 + \cdots + x_n)} \, dP_{X_1}(x_1) \cdots dP_{X_n}(x_n) = \big[ \widehat{Q}_1(n^{-1/2} \xi) \big]^n \\
&= \Big[ 1 + \widehat{Q}_1''\big( \theta(n^{-1/2} \xi) \big) \frac{\xi^2}{2n} \Big]^n = \Big( 1 + \frac{a_n}{n} \Big)^n,
\end{aligned}
\]
where
\[
a_n := \widehat{Q}_1''\big( \theta(n^{-1/2} \xi) \big) \xi^2 / 2 \to \widehat{Q}_1''(0) \xi^2 / 2 = -2\pi^2 \xi^2.
\]
Also, by 6.2.3 with a = 1/2 and b = −2πi,
\[
\widehat{Q}(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2 - 2\pi i \xi x} \, dx = e^{-2\pi^2 \xi^2}.
\]
By a Taylor series expansion, ln(1 + z) = z + z² O(z). Taking z = an /n we see that
\[
\ln \widehat{Q}_n(\xi) = n \big[ n^{-1} a_n + (n^{-1} a_n)^2 O(n^{-1} a_n) \big] = a_n + a_n^2 O(n^{-1} a_n)/n \to -2\pi^2 \xi^2 = \ln \widehat{Q}(\xi).
\]
Therefore, $\widehat{Q}_n(\xi) \to \widehat{Q}(\xi)$, completing the proof.
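18.4.20 Remark. In the special case that the random variables Xj are Bernoulli with
parameter p ∈ (0, 1), (18.4) becomes
\[
\lim_{n \to \infty} P\Big( \frac{S_n - np}{\sqrt{npq}} \le x \Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt, \quad q := 1 - p,
\]
a result known as the DeMoivre-Laplace theorem. One can use this equation to obtain
the following approximation for the pmf of the binomial random variable Sn :
\[
\begin{aligned}
P(S_n = k) &= P(k - .5 < S_n < k + .5) = P\Big( \frac{k - .5 - np}{\sqrt{npq}} < \frac{S_n - np}{\sqrt{npq}} < \frac{k + .5 - np}{\sqrt{npq}} \Big) \\
&\approx \Phi\Big( \frac{k + .5 - np}{\sqrt{npq}} \Big) - \Phi\Big( \frac{k - .5 - np}{\sqrt{npq}} \Big), \quad k = 0, 1, \ldots, n,
\end{aligned}
\]
where
\[
\Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.
\]
In particular, for p = .5 we have the approximation
\[
P(S_n = k) \approx \Phi\Big( \frac{2k + 1 - n}{\sqrt{n}} \Big) - \Phi\Big( \frac{2k - 1 - n}{\sqrt{n}} \Big). \qquad ♦
\]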
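The quality of the approximation in Remark 18.4.20 can be inspected numerically. The sketch below uses only the Python standard library, expressing Φ through the error function as Φ(x) = (1 + erf(x/√2))/2; the function names and the test values n = 100, p = 0.5 are illustrative assumptions.

```python
from math import comb, erf, sqrt

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binom_pmf(n, k, p):
    """Exact P(S_n = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_approx(n, k, p):
    """Approximation of P(S_n = k) from Remark 18.4.20."""
    s = sqrt(n * p * (1 - p))
    return Phi((k + 0.5 - n * p) / s) - Phi((k - 0.5 - n * p) / s)

n, p = 100, 0.5
for k in (40, 50, 60):
    print(k, binom_pmf(n, k, p), normal_approx(n, k, p))
```

For these values the two columns agree to roughly three decimal places; the agreement generally degrades for p near 0 or 1 and for small n.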

The Individual Ergodic Theorem


A measurable transformation T : Ω → Ω on a probability space (Ω, F, P ) is said to be
measure preserving if T (P ) = P , that is, P (T −1 (A)) = P (A) for all A ∈ F. For the
purposes of this section, it is useful to think of T n := T ◦ T · · · ◦ T (n factors) as representing
the state of an evolving system at time n. If X is a random variable that measures some
aspect of the system at time 0, then Xn := X ◦ T n is that measurement at time n.

An event A ∈ F is said to be invariant under a measure preserving map T if T −1 (A) = A.


It is easy to see that the collection I of all invariant events is a σ-field. The transformation
T is said to be ergodic if P (A) = 0 or 1 for all A ∈ I. Thinking again of $X \circ T^n$ as
some measurement of an evolving system, we see that the quantity $\frac{1}{n} \sum_{k=1}^{n} X\big( T^{k-1}(\omega) \big)$ is
then a time average of the measurement over the states $\omega, T\omega, \ldots, T^{n-1}\omega$. Josiah Gibbs
argued that for large n the time average should approximate the space average E(X), an
approximation that may be viewed as a “uniform mixing” criterion.2 The fact that mixing
does not compress or dilate sets (that is, P (T −1 (A)) = P (A)) is then a natural condition in
this context. Moreover, for uniform mixing to occur, it is reasonable to expect the absence
of nontrivial invariant sets A (0 < P (A) < 1), these representing “unmixed pockets.” Thus
ergodicity is another natural requirement. The ergodic theorem of Birkhoff makes these
ideas precise. For the proof we need the following result.
18.4.21 Maximal Ergodic Theorem. Let T : Ω → Ω be measure preserving and let X
be an L1 random variable. Define random variables
\[
S_n := \sum_{k=0}^{n-1} X \circ T^k \quad \text{and} \quad M_n := \max\{0, S_1, S_2, \ldots, S_n\}.
\]
Then for all n, $\int_{\{M_n > 0\}} X \, dP \ge 0$.

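Proof. (Garsia). For 1 ≤ k ≤ n, $M_n(T\omega) \ge S_k(T\omega) = S_{k+1}(\omega) - X(\omega)$, hence

\[
X(\omega) \ge S_{k+1}(\omega) - M_n(T\omega).
\]

Since S1 (ω) = X(ω) and Mn (T ω) ≥ 0, the inequality also holds for k = 0. Therefore,

\[
X(\omega) \ge \max_{1 \le j \le n} S_j(\omega) - M_n(T\omega).
\]

Noting that $M_n(\omega) = \max_{1 \le j \le n} S_j(\omega)$ on the set {Mn > 0}, we have
\[
\begin{aligned}
\int_{\{M_n > 0\}} X \, dP &\ge \int_{\{M_n > 0\}} M_n(\omega) \, dP(\omega) - \int_{\{M_n > 0\}} M_n(T\omega) \, dP(\omega) \\
&\ge \int_{\{M_n > 0\}} M_n(\omega) \, dP(\omega) - \int M_n(T\omega) \, dP(\omega).
\end{aligned}
\]
Since T is measure preserving, the last difference reduces to $-\int_{\{M_n \le 0\}} M_n \, dP \ge 0$.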
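The simplest measure preserving maps are permutations of a finite set carrying the uniform measure, and on such a space the maximal ergodic inequality can be checked by direct enumeration. The following sketch assumes that setting; the permutation `T` and the values `X` are arbitrary hypothetical data.

```python
def maximal_integral(X, T, n):
    """N times the integral of X over {M_n > 0} on the uniform space {0, ..., N-1}.

    T is a permutation given as a list (w maps to T[w]), so it preserves
    the uniform measure.
    """
    total = 0
    for w in range(len(X)):
        s, m, state = 0, 0, w
        for _ in range(n):
            s += X[state]        # s runs through S_1(w), ..., S_n(w)
            m = max(m, s)        # m ends as M_n(w) = max{0, S_1(w), ..., S_n(w)}
            state = T[state]
        if m > 0:
            total += X[w]
    return total

T = [1, 2, 3, 4, 0, 6, 5]        # one 5-cycle and one 2-cycle
X = [3, -5, 1, -2, 1, -4, 2]
print([maximal_integral(X, T, n) for n in range(1, 6)])
```

Note that X itself has negative mean here; the inequality concerns only the integral over the set where some partial sum is positive.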

18.4.22 Birkhoff Ergodic Theorem. Let T : Ω → Ω be measure preserving and let
X ∈ L1 (P ). Set $S_n := \sum_{k=0}^{n-1} X \circ T^k$. Then

\[
\lim_n \frac{S_n}{n} = E(X \mid I) \quad \text{a.s.,} \tag{18.5}
\]
where I is the σ-field of all invariant events. In particular, if T is ergodic, then E(X | I) =
E(X).
Proof. First, note that E(X | I) ◦ T = E(X | I). Indeed, since T is I-measurable so is
E(X | I) ◦ T , and, by the measure preserving property of T ,
\[
\int_A E(X \mid I) \circ T \, dP = \int_A E(X \mid I) \, dP \quad \text{for all } A \in I.
\]

2 The cocktail example of Halmos [22] comes to mind.


We may now express (18.5) as $\lim_n n^{-1} \sum_{k=1}^{n} \big( X - E(X \mid I) \big) \circ T^k = 0$ a.s. Thus we may
assume in the statement of the theorem that E(X | I) = 0.
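We show first that $Y := \limsup_n S_n/n \le 0$. Given ε > 0, set A = {Y > ε}. Noting that
\[
Y(T\omega) = \limsup_n \frac{S_n(T\omega)}{n} = \limsup_n \frac{S_{n+1}(\omega) - X(\omega)}{n} = \limsup_n \frac{S_{n+1}(\omega)}{n+1} = Y(\omega),
\]
we see that A ∈ I. Now apply the maximal ergodic theorem to $\widetilde{X} := [X - \varepsilon] 1_A$ with
corresponding $\widetilde{M}_n$ and $\widetilde{S}_n$ to obtain
\[
\int_{\{\widetilde{M}_n > 0\}} \widetilde{X} \, dP \ge 0. \tag{α}
\]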
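By the invariance of A,
\[
\frac{1}{k} \widetilde{S}_k(\omega) = \frac{1}{k} \sum_{j=0}^{k-1} [X(T^j \omega) - \varepsilon] 1_A(T^j \omega) = \Big( \frac{1}{k} S_k(\omega) - \varepsilon \Big) 1_A(\omega),
\]
and since $\sup_k \frac{1}{k} S_k \ge Y > \varepsilon > 0$ on A we have
\[
\big\{ \widetilde{M}_n > 0 \big\} = \Big\{ \max_{1 \le k \le n} \widetilde{S}_k > 0 \Big\} \uparrow \Big\{ \sup_k \widetilde{S}_k > 0 \Big\} = \Big\{ \sup_k \frac{1}{k} \widetilde{S}_k > 0 \Big\} = A. \tag{β}
\]
Since $\widetilde{X} \in L^1$, (α) and (β) imply that
\[
0 \le \int_A \widetilde{X} \, dP = \int_A X \, dP - \varepsilon P(A) = \int_A E(X \mid I) \, dP - \varepsilon P(A) = -\varepsilon P(A).
\]
Therefore, P (A) = 0, that is, Y ≤ ε a.s. Since ε was arbitrary, $\limsup_k S_k/k \le 0$ a.s. Since the
argument holds for −X as well, we also have $\limsup_k (-S_k/k) \le 0$, that is, $\liminf_k (S_k/k) \ge 0$ a.s.
Therefore, $\lim_k S_k/k = 0$ a.s., as required.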
Finally, if T is ergodic, then I consists only of sets with measure zero or one, hence
E(X | I) = E(X).
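For a permutation T of a finite set with the uniform measure, the invariant sets are the unions of cycles, so E(X | I)(ω) is the mean of X over the cycle containing ω. The sketch below, under that assumption and with hypothetical data, compares time averages with these cycle means.

```python
from fractions import Fraction

def time_average(X, T, w, n):
    """S_n(w)/n = (1/n) * sum_{k=0}^{n-1} X(T^k w)."""
    s, state = 0, w
    for _ in range(n):
        s += X[state]
        state = T[state]
    return Fraction(s, n)

def cycle_average(X, T, w):
    """E(X | I)(w) for a permutation T: the mean of X over the orbit of w."""
    orbit, state = [], w
    while True:
        orbit.append(state)
        state = T[state]
        if state == w:
            break
    return Fraction(sum(X[v] for v in orbit), len(orbit))

T = [1, 2, 3, 4, 0, 6, 5]        # one 5-cycle and one 2-cycle, so T is not ergodic
X = [3, -5, 1, -2, 1, -4, 2]
print(time_average(X, T, 0, 1003), cycle_average(X, T, 0))
```

Because this T has two invariant cycles, the limit depends on the starting point and differs from E(X); for a single cycle the two would coincide, matching the ergodic case of the theorem.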
The following result is an analog of a result proved in the general setting of weakly almost
periodic semigroups of operators on Banach spaces. (See 17.6.9 and the paragraph following.)
We give a probabilistic proof here.

18.4.23 Corollary. The convergence in (18.5) is also in L1 .


Proof. Set $A_n(X) = n^{-1} \sum_{j=1}^{n} X \circ T^{j-1}$. If X is bounded, then so is the L1 random variable
An (X) − E(X | I), and since An (X) − E(X | I) → 0 a.s., the convergence is in the L1 norm
by the dominated convergence theorem.
In the general case, given ε > 0 choose a bounded random variable Y such that
kX − Y k1 < ε. Then

kAn (X) − E(X | I)k1 ≤ kAn (X − Y )k1 + kAn (Y ) − E(Y | I)k1 + kE(X − Y | I)k1 . (†)

We show that each of the terms on the right may be made arbitrarily small.
By the result of the first paragraph, for sufficiently large n,

kAn (Y ) − E(Y | I)k1 < ε.



Also, by the measure preserving property of T ,
\[
\|A_n(X - Y)\|_1 \le \frac{1}{n} \sum_{j=1}^{n} \big\| (X - Y) \circ T^{j-1} \big\|_1 = \frac{1}{n} \sum_{j=1}^{n} \|X - Y\|_1 < \varepsilon.
\]
Finally, by Fatou’s lemma and (18.5),
\[
\|E(X \mid I) - E(Y \mid I)\|_1 = \int \lim_n |A_n(X - Y)| \, dP \le \liminf_n \int |A_n(X - Y)| \, dP \le \varepsilon.
\]

Therefore, by (†), kAn (X) − E(X | I)k1 < 3ε for all sufficiently large n.

Stationary Processes
A sequence of random variables Xn on a probability space (Ω, F, P ) is called a stationary
process if
\[
P\big( (X_n, X_{n+1}, \ldots) \in B \big) = P\big( (X_1, X_2, \ldots) \in B \big) \quad \forall \, B \in B(R^{\infty}) \text{ and } n \in N. \tag{18.6}
\]
In particular, taking B = B1 × R × · · · , we see that P (Xn ∈ B1 ) = P (X1 ∈ B1 ), so that the
random variables Xn are identically distributed. In this section we prove an ergodic theorem
for stationary processes.
For an example of a stationary process, let T : Ω → Ω be a measure preserving transfor-
mation and X1 a random variable. Then the sequence (Xn := X1 ◦ T n−1 ) is a stationary
process. To see this, let
An := {(Xn , Xn+1 , . . .) ∈ B}
and note that because Xn = Xn−1 ◦ T we have T −1 (An−1 ) = An and so P (An ) =
P (T −1 (An−1 )) = P (An−1 ). Iterating, we obtain (18.6).
Now let X = (Xn ) be an arbitrary stationary process on (Ω, F, P ) and let T : R∞ → R∞
denote the left shift operator T (x1 , x2 , . . .) = (x2 , x3 , . . .). Thus for all n
(Xn+1 , Xn+2 , . . .) = T n (X1 , X2 , . . .) =: T n ◦ X.
Clearly, T is a measurable transformation on (R∞ , B(R∞ ), PX ). Moreover, T is measure
preserving. Indeed, from
T −1 (B1 × · · · × Bn × R × · · · ) = R × B1 × · · · × Bn × R × · · ·
we have, by stationarity,

T (PX )(B1 × · · · × Bn × R × · · · ) = P ((X1 , X2 , . . .) ∈ R × B1 × · · · × Bn × R × · · · )
= P ((X2 , X3 , . . .) ∈ B1 × · · · × Bn × R × · · · )
= P ((X1 , X2 , . . .) ∈ B1 × · · · × Bn × R × · · · )
= PX (B1 × · · · × Bn × R × · · · ).
Thus the measures T (PX ) and PX agree on the sets B1 × · · · × Bn × R × · · · and so are
equal, by the uniqueness theorem for measures.
Now call A ∈ F invariant if there exists a B ∈ B(R∞ ) such that
A = {(Xn+1 , Xn+2 , . . .) ∈ B} for all n ≥ 0.
The set I of all invariant sets is easily seen to be a σ-subfield of F. Since the preceding
relationship between A and B may be written as
1A = 1B ◦ T n ◦ X for all n ≥ 0,
the usual arguments show that a function $f$ on $\Omega$ is $\mathcal I$-measurable iff there exists a
$\mathcal B(\mathbb R^\infty)$-measurable function $g$ such that
$$f = g \circ T^n \circ X \quad \text{for all } n \ge 0.$$

With this background we may now prove


18.4.24 Ergodic Theorem for Stationary Processes. Let $(X_n)$ be a stationary process
on $(\Omega, \mathcal F, P)$. If $X_1 \in L^1(P)$ then
$$\lim_n \frac1n \sum_{j=1}^n X_j = E(X_1 \mid \mathcal I) \quad \text{in } L^1 \text{ and a.s.}$$

Proof. Let $\pi_n : \mathbb R^\infty \to \mathbb R$ denote the $n$th coordinate projection. Then $\pi_n$ is a random variable
on $(\mathbb R^\infty, \mathcal B(\mathbb R^\infty), P_X)$ and $\pi_n = \pi_1 \circ T^{n-1}$. Since $T$ is measure preserving, by the Birkhoff
ergodic theorem the averages $n^{-1}\sum_{j=1}^n \pi_j$ converge a.s. and in $L^1$ on $(\mathbb R^\infty, \mathcal B(\mathbb R^\infty), P_X)$.
Since the sequence $(\pi_n)$ has the same distribution $P_X$ as $(X_n)$, $n^{-1}\sum_{j=1}^n X_j$ converges a.s. and in $L^1$ to some random
variable $Y \in L^1(P)$. To see that $Y = E(X_1 \mid \mathcal I)$, note first that since $Y = g \circ T^n \circ X$ for
all $n$, where $g$ is the measurable function $g(x_1, x_2, \ldots) = \lim_k (x_1 + \cdots + x_k)/k$, the random
variable $Y$ is $\mathcal I$-measurable. It remains to show that
$$\int_A Y\,dP = \int_A X_1\,dP \quad \text{for all } A \in \mathcal I. \tag{†}$$

But if $A$ is invariant, say $A = \{(X_k, X_{k+1}, \ldots) \in B\}$ for all $k$, then, by the stationary
property of $X$,
$$\int_A X_k\,dP = \int_{\{(x_k, x_{k+1}, \ldots) \in B\}} x_k\,dP_X(x_1, x_2, \ldots) = \int_{\{(x_1, x_2, \ldots) \in B\}} x_1\,dP_X(x_1, x_2, \ldots) = \int_A X_1\,dP,$$
hence
$$\int_A \frac1n \sum_{k=1}^n X_k\,dP = \int_A X_1\,dP.$$
Taking limits yields (†).
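
The convergence of time averages can be observed numerically. The following is a minimal sketch under assumptions that are not part of the text: Ω = [0, 1) with Lebesgue measure, T the rotation ω ↦ ω + α (mod 1) with α irrational, and X_1(ω) = cos(2πω). The rotation is measure preserving, so X_n = X_1 ∘ T^{n−1} is stationary, and for this example the limit is the constant E(X_1) = 0. The names `birkhoff_average` and `alpha` are illustrative.

```python
import math


def birkhoff_average(omega: float, alpha: float, n: int) -> float:
    """Return (1/n) * sum_{j=1}^{n} X_j(omega), where X_j = X_1 o T^(j-1),
    T(w) = w + alpha (mod 1) and X_1(w) = cos(2*pi*w)."""
    total = 0.0
    for j in range(n):
        total += math.cos(2.0 * math.pi * ((omega + j * alpha) % 1.0))
    return total / n


alpha = math.sqrt(2.0) - 1.0  # an irrational rotation angle (assumed example)
for n in (10, 100, 1000, 10000):
    # The averages shrink toward E(X_1) = 0 as n grows.
    print(n, birkhoff_average(0.3, alpha, n))
```

For this particular X_1 the averages are bounded in absolute value by 1/(n |sin(πα)|), so the decay toward 0 is visible already for moderate n.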

18.5 Discrete-Time Martingales


A sequence (Xn ) of random variables on a probability space (Ω, F, P ) is called a discrete
time stochastic process or simply a process. If Xn ∈ Lp for all n, then (Xn ) is called an
Lp -process. For example, a stationary process is an L1 process, and L1 and L2 processes
arose in the context of the laws of large numbers. If we view a process as a model for the
numerical outcomes of an ongoing experiment, then a mathematical model of the history of
the experiment becomes important. Such a model is known as a filtration.

Filtrations
A (discrete-time) filtration on (Ω, F, P ) is a sequence of σ-fields Fn such that

Fn ⊆ Fn+1 ⊆ F for all n ∈ N.

A probability space with a filtration is called a filtered probability space and is denoted
by $(\Omega, \mathcal F, (\mathcal F_n), P)$. It is sometimes useful to view a filtration as a mathematical description of
the information produced by an experiment consisting of repeated trials. At the completion of
the nth trial, Fn encapsulates the information revealed by the outcome of this and previous
trials.
A stochastic process $(X_n)$ is said to be adapted to a filtration $(\mathcal F_n)$ if for all $n$ the
random variable $X_n$ is $\mathcal F_n$-measurable. For example, $(X_n)$ is clearly adapted to the filtration
$$\mathcal F^X = (\mathcal F^X_n), \qquad \mathcal F^X_n := \sigma(X_1, \ldots, X_n),$$

which is called the natural filtration of (Xn ). As noted above, a filtration models the
evolution of information. Thus if (Xn ) is adapted to a filtration (Fn ), then the σ-field Fn
includes all knowable information about the process up to time n. The natural filtration
includes all knowable information about the process up to time n but nothing more.

Definition and General Properties of Martingales



Let $(\Omega, \mathcal F, (\mathcal F_n), P)$ be a filtered probability space and let $(X_n)$ be an $L^1$-process adapted
to $(\mathcal F_n)$. Then the pair $(X_n, \mathcal F_n)$ is said to be a

• submartingale if Xn ≤ E(Xn+1 | Fn ) for all n ∈ N.


• supermartingale if Xn ≥ E(Xn+1 | Fn ) for all n ∈ N.
• martingale if Xn = E(Xn+1 | Fn ) for all n ∈ N.
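We omit $\mathcal F_n$ from the notation $(X_n, \mathcal F_n)$ if the filtration is understood or is the natural
filtration.
A submartingale has the multistep property
$$X_n \le E(X_m \mid \mathcal F_n) \quad \text{for all } m \ge n, \tag{18.7}$$
that is,
$$\int_A X_n\,dP \le \int_A X_m\,dP \quad \text{for all } A \in \mathcal F_n \text{ and } m \ge n.$$
Indeed, from $X_{m-1} \le E(X_m \mid \mathcal F_{m-1})$ and the tower property we have
$$X_{m-2} \le E(X_{m-1} \mid \mathcal F_{m-2}) \le E\big(E(X_m \mid \mathcal F_{m-1}) \mid \mathcal F_{m-2}\big) = E(X_m \mid \mathcal F_{m-2}).$$
Iterating we obtain (18.7). Supermartingales and martingales have analogous multistep properties,
with the inequality reversed or replaced by equality. Note that $(X_n, \mathcal F_n)$ is a submartingale iff $(-X_n, \mathcal F_n)$ is a supermartingale.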
We may think of a martingale as the accumulated winnings of a gambler in a sequence
of fair games. The martingale condition, which may be written E(Xn+1 − Xn | Fn ) = 0,
then asserts that the best prediction of the gain Xn+1 − Xn on the next play, based on the
information Fn obtained during the first n plays, is zero, the hallmark of a fair game. The
games favor the house (respectively, the player), if the winnings constitute a supermartingale
(respectively, a submartingale).
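
The fair-game interpretation can be checked by brute force in a small case. The sketch below assumes a hypothetical game in which each play independently pays +1 with probability `p_win` and −1 otherwise, so that X_n is the sum of the first n payoffs. It computes E(X_{n+1} − X_n | history) directly from the joint law of the plays and classifies the winnings accordingly. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def path_prob(path, p_win):
    """Probability of a sequence of independent plays (+1 = win, -1 = loss)."""
    prob = Fraction(1)
    for y in path:
        prob *= p_win if y == 1 else 1 - p_win
    return prob


def expected_next_gain(history, p_win):
    """E(X_{n+1} - X_n | first n plays = history), computed from the joint law."""
    numerator = sum(path_prob(history + (y,), p_win) * y for y in (1, -1))
    return numerator / path_prob(history, p_win)


def classify(p_win, n=4):
    """Classify the winnings by testing every history of length n (0 < p_win < 1)."""
    gains = [expected_next_gain(h, p_win) for h in product((1, -1), repeat=n)]
    if all(g == 0 for g in gains):
        return "martingale"
    if all(g >= 0 for g in gains):
        return "submartingale"
    if all(g <= 0 for g in gains):
        return "supermartingale"
    return "none"


for p in (Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)):
    print(p, classify(p))
```

Exact rational arithmetic is used so that the equality in the martingale case is tested without rounding error.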

18.5.1 Examples of Martingales.


(a) Let Y1 , Y2 , . . . be a sequence of independent L1 random variables on (Ω, F, P) with mean
one. Set Xn := Y1 · · · Yn . By the factor and independence properties

E(Xn+1 − Xn | X1 , . . . , Xn ) = Xn E(Yn+1 − 1 | X1 , . . . , Xn ) = Xn E(Yn+1 − 1) = 0,

which is the desired martingale property.


(b) Let (Yn ) be a sequence of independent L1 random variables on (Ω, F, P) with mean p
and set Xn = Y1 + Y2 + · · · + Yn − np. Then

E(Xn+1 − Xn | X1 , . . . , Xn ) = E(Yn+1 − p | X1 , . . . , Xn ) = E(Yn+1 − p) = 0.



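(c) Let $(\Omega, \mathcal F, (\mathcal F_n), P)$ be a filtered probability space and $X \in L^1(\Omega, \mathcal F, P)$. Define $X_n = E(X \mid \mathcal F_n)$. By the tower property,
$$E(X_{n+1} \mid \mathcal F_n) = E\big(E(X \mid \mathcal F_{n+1}) \mid \mathcal F_n\big) = E(X \mid \mathcal F_n) = X_n.$$
Note that $(X_n)$ is uniformly integrable, that is, $\lim_{t \to \infty} \sup_n \int_{\{|X_n| \ge t\}} |X_n|\,dP = 0$. Indeed,
since $|X_n| = |E(X \mid \mathcal F_n)| \le E(|X| \mid \mathcal F_n)$ and $\{|X_n| \ge t\} \in \mathcal F_n$,
$$\int_{\{|X_n| \ge t\}} |X_n|\,dP \le \int_{\{|X_n| \ge t\}} E(|X| \mid \mathcal F_n)\,dP = \int_{\{|X_n| \ge t\}} |X|\,dP,$$
and by the Markov inequality,
$$P\{|X_n| \ge t\} \le \frac1t \int |X_n|\,dP \le \frac1t \int E(|X| \mid \mathcal F_n)\,dP = \frac1t \int |X|\,dP \to 0.$$
Since the last bound does not depend on $n$ and $|X|$ is integrable, the integrals $\int_{\{|X_n| \ge t\}} |X|\,dP$ tend to zero uniformly in $n$ as $t \to \infty$.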

(d) Consider a sequence of finite partitions $\mathcal P_n := \{A_{n,1}, A_{n,2}, \ldots, A_{n,m_n}\}$ of $\Omega$ such that
each member of $\mathcal P_n$ is a union of members of $\mathcal P_{n+1}$. Let $Q$ be a probability measure such
that $Q \ll P$. Define
$$X_n = \sum_{j=1}^{m_n} a_{n,j} 1_{A_{n,j}}, \qquad a_{n,j} := \frac{Q(A_{n,j})}{P(A_{n,j})},$$
where $a_{n,j}$ is defined to be 0 if $P(A_{n,j}) = 0$. Clearly, $(X_n)$ is adapted to the filtration
$(\sigma(\mathcal P_n))$. If $A_{n,k} = \bigcup_{j \in F_k} A_{n+1,j}$, then $A_{n,k} \cap A_{n+1,j} = \emptyset$ unless $j \in F_k$, in which case the
intersection is $A_{n+1,j}$. Therefore, for all $k$
$$\int_{A_{n,k}} X_{n+1}\,dP = \sum_{j=1}^{m_{n+1}} a_{n+1,j} \int_{A_{n,k}} 1_{A_{n+1,j}}\,dP = \sum_{j \in F_k} Q(A_{n+1,j}) = Q(A_{n,k}) = \int_{A_{n,k}} X_n\,dP,$$
which implies the martingale property. ♦
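
Example (d) can be made concrete. The sketch below rests on assumptions not taken from the text: Ω = [0, 1), P is Lebesgue measure, Q has density 2x (so Q ≪ P and Q([a, b)) = b² − a²), and the partition P_n consists of the dyadic intervals of length 2^{−n}. It verifies with exact rational arithmetic that the integrals of X_{n+1} and X_n over each cell A_{n,k} agree. All function names are illustrative.

```python
from fractions import Fraction


def cell(n, j):
    """Endpoints of the j-th dyadic interval [j/2^n, (j+1)/2^n) of the partition P_n."""
    return Fraction(j, 2**n), Fraction(j + 1, 2**n)


def P(a, b):
    """Lebesgue measure of [a, b)."""
    return b - a


def Q(a, b):
    """Measure of [a, b) under the density 2x, so that Q << P."""
    return b * b - a * a


def X(n, j):
    """Value a_{n,j} = Q(A_{n,j}) / P(A_{n,j}) of X_n on the cell A_{n,j}."""
    a, b = cell(n, j)
    return Q(a, b) / P(a, b)


def integral_of_next(n, k):
    """Integral of X_{n+1} over A_{n,k}, whose pieces in P_{n+1} are cells 2k and 2k+1."""
    return sum(X(n + 1, j) * P(*cell(n + 1, j)) for j in (2 * k, 2 * k + 1))


def integral_of_current(n, k):
    """Integral of X_n over A_{n,k}."""
    return X(n, k) * P(*cell(n, k))


print(integral_of_next(3, 5), integral_of_current(3, 5), Q(*cell(3, 5)))
```

In this example X_n is the average of the density of Q over each cell, which hints at the role such martingales play in constructing Radon-Nikodym derivatives.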


The next result is a direct consequence of the linearity and order properties of conditional
expectation.
18.5.2 Proposition. Let (Xn ) and (Yn ) be processes on a filtered probability space
(Ω, F, (Fn ), P ) and set Zn := aXn + bYn , a, b ∈ R.
(a) If (Xn , Fn ), (Yn , Fn ) are martingales, then (Zn , Fn ) is a martingale.

(b) If (Xn , Fn ) and (Yn , Fn ) are sub (super) martingales and a, b ≥ 0, then (Zn , Fn ) is a
sub (super) martingale.
For the remainder of the subsection, we focus mainly on submartingales. Corresponding
results for supermartingales may be obtained by considering (−Xn ). The next result describes
several ways of generating submartingales.

18.5.3 Theorem. Let (Xn ) and (Yn ) be processes on a filtered probability space
(Ω, F, (Fn ), P ).
(a) If $(X_n, \mathcal F_n)$ and $(Y_n, \mathcal F_n)$ are submartingales, then $(X_n \vee Y_n, \mathcal F_n)$ is a submartingale.
In particular, $(X_n^+, \mathcal F_n)$ is a submartingale.
(b) If $(X_n, \mathcal F_n)$ is a submartingale, $\varphi$ is convex and increasing, and $\varphi(X_n) \in L^1$ for all $n$,
then $(\varphi(X_n), \mathcal F_n)$ is a submartingale.

(c) If $(X_n, \mathcal F_n)$ is a martingale, $\varphi$ is convex, and $\varphi(X_n) \in L^1$ for all $n$, then $(\varphi(X_n), \mathcal F_n)$
is a submartingale. In particular, $(|X_n|, \mathcal F_n)$ is a submartingale.
Proof. For (a) we have $E(X_{n+1} \vee Y_{n+1} \mid \mathcal F_n) \ge E(X_{n+1} \mid \mathcal F_n) \ge X_n$, with a similar
inequality for $Y$. Therefore, $E(X_{n+1} \vee Y_{n+1} \mid \mathcal F_n) \ge X_n \vee Y_n$. Part (b) follows from the
conditional form of Jensen’s inequality: $\varphi(X_n) \le \varphi\big(E(X_{n+1} \mid \mathcal F_n)\big) \le E\big(\varphi(X_{n+1}) \mid \mathcal F_n\big)$.
The proof of part (c) is similar.
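
Part (c) can be tested exhaustively in a small case. The sketch below assumes the fair walk X_n = Y_1 + ⋯ + Y_n with independent steps Y_i = ±1 of probability 1/2 each (a martingale) and checks the inequality φ(X_n) ≤ E(φ(X_{n+1}) | history) on every history up to a fixed horizon. The helper names are illustrative, and φ is assumed to return integers so that exact fractions can be used.

```python
from fractions import Fraction
from itertools import product


def cond_exp_next(phi, history):
    """E(phi(X_{n+1}) | Y_1..Y_n = history) for the fair +/-1 walk."""
    x = sum(history)
    return Fraction(phi(x + 1) + phi(x - 1), 2)


def is_submartingale(phi, horizon=6):
    """Check phi(X_n) <= E(phi(X_{n+1}) | history) on every history up to the horizon."""
    return all(
        phi(sum(h)) <= cond_exp_next(phi, h)
        for n in range(1, horizon + 1)
        for h in product((1, -1), repeat=n)
    )


print(is_submartingale(abs))                  # convex
print(is_submartingale(lambda x: x * x))      # convex
print(is_submartingale(lambda x: -abs(x)))    # concave, fails at X_n = 0
```

The concave choice −|x| fails on any history with X_n = 0, which shows that convexity cannot simply be dropped from the hypothesis.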
The following theorem asserts that reducing the amount of information provided by
a filtration preserves the submartingale property. (The same is not necessarily true if
information is increased.)
18.5.4 Theorem. Let (Gn ) and (Fn ) be filtrations with Gn ⊆ Fn ⊆ F. If (Xn ) is adapted
to (Gn ) and is a submartingale with respect to (Fn ), then it is also a submartingale with
respect to (Gn ).

Proof. $E(X_{n+1} \mid \mathcal G_n) = E\big(E(X_{n+1} \mid \mathcal F_n) \mid \mathcal G_n\big) \ge E(X_n \mid \mathcal G_n) = X_n$.

Stopping Times. Optional Sampling



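Let $(\Omega, \mathcal F, (\mathcal F_n), P)$ be a filtered probability space and let $\mathcal F_\infty$ denote the σ-field generated
by $\bigcup_n \mathcal F_n$. A stopping time (relative to the given filtration) is a function $\tau : \Omega \to \mathbb N \cup \{\infty\}$
such that
$$\{\tau = n\} \in \mathcal F_n \quad \text{for all } n \in \mathbb N \cup \{\infty\}.$$
Note that if $\tau$ is a stopping time, then the set $\{\tau \le n\}$ is a union of sets $\{\tau = j\} \in \mathcal F_j$, $j \le n$, and
so is a member of $\mathcal F_n$. It follows that $\{\tau > n\} = \{\tau \le n\}^c$ also lies in $\mathcal F_n$.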
The constant function τ = m, where m is a fixed positive integer, is trivially a stopping
time. Also, if τ and σ are stopping times, then so is τ ∧ σ, as may be seen from
{τ ∧ σ = n} = {τ = n, σ ≥ n} ∪ {σ = n, τ ≥ n}.
In particular, τ ∧ m is a stopping time.
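18.5.5 Example. Consider a stochastic process $(X_n)$. Let $\tau$ denote the first time $X_n < 0$.
Thus
$$\tau(\omega) = \begin{cases} \min\{n : X_n(\omega) < 0\} & \text{if } \{n : X_n(\omega) < 0\} \ne \emptyset, \\ \infty & \text{otherwise.} \end{cases}$$
The calculations
$$\{\tau = n\} = \{X_n < 0\} \cap \bigcap_{j=1}^{n-1} \{X_j \ge 0\} \quad \text{and} \quad \{\tau = \infty\} = \bigcap_{j=1}^{\infty} \{X_j \ge 0\}$$
show that $\tau$ is a stopping time relative to the natural filtration of $(X_n)$. In this connection,
note that the function
$$\sigma(\omega) = \begin{cases} \max\{n : X_n(\omega) < 0\} & \text{if } \{n : X_n(\omega) < 0\} \ne \emptyset, \\ \infty & \text{otherwise} \end{cases}$$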
is not a stopping time. This is a mathematical formulation of the self-evident fact that one
cannot predict the future: By knowing merely the past history of the process, one cannot
expect (in the absence of prescience) to know when the process will be negative for the last
time. ♦
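
The contrast between the first and the last negative time can be seen on sample paths. In the sketch below, finite lists stand in for paths of the process (an assumption made purely for illustration), and the two function names are hypothetical. Two paths that agree up to time 3 necessarily agree on whether τ = 3, but they may disagree on whether σ = 3.

```python
import math


def first_negative_time(path):
    """tau: 1-based index of the first negative entry, or math.inf if there is none."""
    for n, x in enumerate(path, start=1):
        if x < 0:
            return n
    return math.inf


def last_negative_time(path):
    """sigma: 1-based index of the last negative entry, or math.inf if there is none."""
    times = [n for n, x in enumerate(path, start=1) if x < 0]
    return times[-1] if times else math.inf


# Two paths with the same history up to time 3 but different futures.
path_a = [1.0, 2.0, -1.0, 3.0, 4.0]
path_b = [1.0, 2.0, -1.0, 3.0, -2.0]

# The event {tau = 3} is decided by the first three values alone.
print(first_negative_time(path_a[:3]), first_negative_time(path_b[:3]))
# The event {sigma = 3} is not: it depends on values after time 3.
print(last_negative_time(path_a), last_negative_time(path_b))
```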
One of the most important facts regarding stopping times is that they may be combined
with submartingales to produce submartingales indexed by random times, a construct useful
in contexts where one may wish to stop a process when a certain goal is achieved. (Think of
a gambler who resolves to stop playing as soon as he has amassed sufficient winnings.) We
shall need these so-called stopped processes in the proof of Doob’s martingale convergence
theorem below.
The main result of the current subsection depends on the following notions: Let
$(\Omega, \mathcal F, (\mathcal F_n), P)$ be a filtered probability space and let $(X_n)$ be a process adapted to $(\mathcal F_n)$. If
$\tau$ is a stopping time taking values in $\mathbb N \cup \{\infty\}$, then the stopped random variable $X_\tau$ is
defined by
$$X_\tau(\omega) := X_{\tau(\omega)}(\omega)\,1_{\{\tau < \infty\}}(\omega) = \sum_{j=1}^{\infty} 1_{\{\tau = j\}}(\omega)\,X_j(\omega).$$

The σ-field Fτ of events up to time τ is defined by

Fτ = {A ∈ F : A ∩ {τ ≤ n} ∈ Fn ∀ n ∈ N} .

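That $\mathcal F_\tau$ is indeed a σ-field is a straightforward calculation. For example, to see that $\mathcal F_\tau$ is
closed under complements, note that if $A \in \mathcal F_\tau$, then $A^c \cup \{\tau > n\} = (A \cap \{\tau \le n\})^c \in \mathcal F_n$,
hence $A^c \cap \{\tau \le n\} = (A^c \cup \{\tau > n\}) \cap \{\tau \le n\} \in \mathcal F_n$.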
18.5.6 Optional Sampling. If (Xn ) is a submartingale and τ and σ are stopping times with
σ ≤ τ , then Fσ ⊆ Fτ . If τ is bounded, then Xτ and Xσ are integrable and Xσ ≤ E(Xτ | Fσ ).
Proof. Let A ∈ Fσ . Then A ∩ {τ ≤ n} = A ∩ {σ ≤ n} ∩ {τ ≤ n} ∈ Fn , hence A ∈ Fτ .
Therefore, Fσ ⊆ Fτ .
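For the second assertion, assume that $\tau$ (hence also $\sigma$) takes values in $\{1, 2, \ldots, n\}$.
Suppose first that $\tau - \sigma \le 1$. Then for $A \in \mathcal F_\sigma$,
$$\int_A (X_\tau - X_\sigma)\,dP = \sum_{j=1}^{n-1} \int_{A \cap \{\sigma = j,\ \tau = j+1\}} (X_{j+1} - X_j)\,dP.$$
Since
$$A \cap \{\sigma = j,\ \tau = j+1\} = A \cap \{\sigma = j\} \cap \{\tau > j\} \in \mathcal F_j,$$
the terms in the above sum are nonnegative by the submartingale property. Therefore,
$X_\sigma \le E(X_\tau \mid \mathcal F_\sigma)$. For the general case, define stopping times $\rho_i = \tau \wedge (\sigma + i)$ $(0 \le i \le n)$.
Then $\sigma = \rho_0 \le \rho_1 \le \cdots \le \rho_n = \tau$ and $\rho_{i+1} - \rho_i \le 1$, hence, by the special case,
$$\int_A X_\sigma\,dP \le \int_A X_{\rho_1}\,dP \le \cdots \le \int_A X_\tau\,dP, \quad A \in \mathcal F_\sigma.$$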

18.5.7 Corollary. Let (Xn ) be a submartingale and let τ be a stopping time. Then (Xn∧τ )
is a submartingale.
Proof. This follows immediately from 18.5.6 and the inequality n ∧ τ ≤ (n + 1) ∧ τ .
The process (Xn∧τ ) in the corollary is called the stopped process relative to (Xn ) and τ .
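
A stopped process can be examined exactly in a small case. The sketch below assumes the fair ±1 walk (a martingale, hence in particular a submartingale) and the hypothetical rule "stop on first reaching +2"; the constant `LEVEL` and the function names are illustrative. It checks that the stopped process has zero conditional increments on every history and that E X_{n∧τ} stays equal to E X_1 = 0.

```python
from fractions import Fraction
from itertools import product

LEVEL = 2  # hypothetical goal: the gambler stops on first reaching +2


def stopped_value(steps):
    """X_{n ^ tau} for n = len(steps), where tau is the first time the walk equals LEVEL."""
    position = 0
    for y in steps:
        position += y
        if position == LEVEL:
            return position  # tau <= n: the process is frozen at X_tau
    return position          # tau > n: the stopped process equals X_n


def one_step_gap(history):
    """E(X_{(n+1) ^ tau} | history) - X_{n ^ tau} for fair independent steps."""
    up = stopped_value(history + (1,))
    down = stopped_value(history + (-1,))
    return Fraction(up + down, 2) - stopped_value(history)


def expected_stopped(n):
    """E X_{n ^ tau}, computed exactly by enumerating all 2^n step sequences."""
    paths = list(product((1, -1), repeat=n))
    return Fraction(sum(stopped_value(p) for p in paths), len(paths))


print([expected_stopped(n) for n in range(1, 9)])
```

Enumeration is exponential in n and is meant only as a check for small horizons.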

Upcrossings
The martingale convergence theorem, proved in the next subsection, is one of the key
results in martingale theory. The proof is based on Doob’s notion of upcrossings, which
we now describe.
Let $(x_n)$ be any sequence in $\mathbb R$. Given real numbers $a < b$, define a sequence $(\tau_n)$ with
values in $\mathbb N \cup \{\infty\}$ by
$$\begin{aligned}
\tau_1 &:= \inf\{j \ge 1 : x_j \le a\}, & \tau_2 &:= \inf\{j > \tau_1 : x_j \ge b\}, \\
\tau_{2n-1} &:= \inf\{j > \tau_{2n-2} : x_j \le a\}, & \tau_{2n} &:= \inf\{j > \tau_{2n-1} : x_j \ge b\}.
\end{aligned} \tag{18.8}$$
Here, as usual, we set $\inf \emptyset = \infty$. Clearly, the sequence $(\tau_n)$ is increasing, $\tau_n \ge n$, and
$$x_{\tau_{2n-1}} \le a < b \le x_{\tau_{2n}} \quad \text{if } \tau_{2n} < \infty.$$
From the definition we see that $\tau_1$ is the first time the sequence is at or below $a$, $\tau_2$ the first time
after $\tau_1$ that the sequence is at or above $b$, etc. It follows that $\tau_2$ is the time of the first upcrossing
of the interval $[a, b]$, $\tau_4$ the time of the second upcrossing, and in general $\tau_{2k}$ is the time of
the $k$th upcrossing. The number $U^n_{[a,b]}$ of upcrossings of the interval $[a, b]$ up to time $n$ by
the sequence $(x_n)$ is the largest $k$ for which $\tau_{2k} \le n$:
$$U^n_{[a,b]} := \sup\{k : \tau_{2k} \le n\}.$$
If the set in the definition is empty, we define $U^n_{[a,b]} = 0$. Obviously,
$$U^n_{[a,b]} \le n, \quad \text{and} \quad k > U^n_{[a,b]} \text{ iff } \tau_{2k} > n.$$
The total number of upcrossings is defined as
$$U_{[a,b]} = \sup_n U^n_{[a,b]} = \sup\{k : \tau_{2k} < \infty\}.$$
n

FIGURE 18.1: A sequence $(x_n)$ with the times $\tau_1, \ldots, \tau_6$ marked; $U^{16}_{[a,b]} = 3$.
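
The definition of the upcrossing count translates directly into a single pass over the sequence. The following is a minimal sketch of (18.8) for a finite sequence; the function name `upcrossings` and the sample data are illustrative and not taken from the text.

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] by the finite sequence xs = (x_1, ..., x_n),
    that is, the largest k with tau_{2k} <= n in the notation of (18.8)."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once some x_j <= a has occurred since the last upcrossing
    for x in xs:
        if not below:
            if x <= a:
                below = True   # an odd-numbered time tau_{2k-1}
        elif x >= b:
            count += 1         # an even-numbered time tau_{2k}
            below = False
    return count


sample = [1, 0, 3, 1, -1, 2, 0, 1]
print(upcrossings(sample, 0, 2))            # tau_1 = 2, tau_2 = 3, tau_3 = 5, tau_4 = 6
print(upcrossings([0, 1] * 5, 0, 1))        # an oscillating sequence
```

An oscillating sequence such as 0, 1, 0, 1, … accumulates upcrossings of [0, 1] without bound as it is extended.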

The connection between upcrossings and convergence of the sequence (xn ) is given in the
following lemma.
18.5.8 Lemma. A sequence $(x_n)$ of real numbers converges in $\overline{\mathbb R}$ iff $U_{[a,b]} < \infty$ for all
$a, b \in \mathbb Q$ with $a < b$.
Proof. Set $\alpha := \liminf_n x_n$ and $\beta := \limsup_n x_n$. If $U_{[a,b]} = \infty$ for some $a < b$, then $x_n \le a$ for
infinitely many $n$ and $x_n \ge b$ for infinitely many $n$, hence $\alpha \le a < b \le \beta$ and so $(x_n)$ cannot
converge in $\overline{\mathbb R}$. Conversely, if $(x_n)$ does not converge in $\overline{\mathbb R}$, then there exist rationals $a$ and $b$
such that $\alpha < a < b < \beta$. It follows that $x_n < a$ infinitely often and $x_n > b$ infinitely often,
hence $U_{[a,b]} = \infty$.

Now consider a process $X = (X_n)$. For each $\omega$ and pair of real numbers $a < b$, we may apply
the above construction to the sequence $(X_n(\omega))$ to obtain the $\mathbb N \cup \{\infty\}$-valued functions $\tau_n$,
$U^n_{[a,b]}$, and $U_{[a,b]}$ on $\Omega$. It is easily established by induction that $\tau_n$ is a stopping time. For
example,
$$\{\tau_1 = k\} = \{X_1 > a\} \cap \cdots \cap \{X_{k-1} > a\} \cap \{X_k \le a\} \in \mathcal F_k$$
and
$$\{\tau_2 = k\} = \bigcup_{j=1}^{k-1} \{\tau_1 = j\} \cap \{X_{j+1} < b\} \cap \cdots \cap \{X_{k-1} < b\} \cap \{X_k \ge b\} \in \mathcal F_k.$$
In particular, $U^n_{[a,b]}$ and $U_{[a,b]}$ are $\mathcal F_\infty$-measurable. Here is the key result regarding upcrossings.
18.5.9 Upcrossing Inequality (Doob). Let $(X_n)$ be a submartingale on a filtered probability
space $(\Omega, \mathcal F, (\mathcal F_n), P)$. Then, for any $a < b$,
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Big(E(X_n - a)^+ + |a|\Big) \le \frac{1}{b-a}\Big(E|X_n| + 2|a|\Big), \quad n \in \mathbb N.$$

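Proof. Set $X_0 = 0$ and $\widetilde X_n = (X_n - a)^+$ $(n \ge 0)$. By 18.5.3, $(\widetilde X_n)$ is a submartingale. Let
$(\tau_n(\omega))$ be defined as in (18.8) for the sequence $(\widetilde X_n(\omega))$ but with $a$ replaced by 0 and $b$
replaced by $c := b - a$, and set $\tau_0 = 0$. Let $\widetilde U^n_{[0,c]}$ be defined by $(\tau_n)$. Clearly, $\widetilde U^n_{[0,c]} = U^n_{[a,b]}$.
Let $2k > n$. Then $\tau_{2k} > n$, hence
$$\widetilde X_n - \widetilde X_0 = \sum_{j=1}^{2k} \big(\widetilde X_{\tau_j \wedge n} - \widetilde X_{\tau_{j-1} \wedge n}\big) = \sum_{j=1}^{k} \big(\widetilde X_{\tau_{2j} \wedge n} - \widetilde X_{\tau_{2j-1} \wedge n}\big) + \sum_{j=0}^{k-1} \big(\widetilde X_{\tau_{2j+1} \wedge n} - \widetilde X_{\tau_{2j} \wedge n}\big).$$
Denote the first sum on the right by $S_1$ and the second by $S_2$. Now, if $\tau_{2j-1} \ge n$, then
$\widetilde X_{\tau_{2j} \wedge n} - \widetilde X_{\tau_{2j-1} \wedge n} = 0$, and if $\tau_{2j-1} < n$ then $X_{\tau_{2j-1}} \le a$ so $\widetilde X_{\tau_{2j-1}} = 0$. Therefore,
$S_1$ includes all differences corresponding to the upcrossings of $[0, c]$ up to time $n$, hence
$S_1 \ge c\,\widetilde U^n_{[0,c]}$ and so $E\,S_1 \ge c\,E\,\widetilde U^n_{[0,c]}$. Moreover, by optional sampling (18.5.6), $E\,S_2 \ge 0$. Therefore,
$E(\widetilde X_n - \widetilde X_0) \ge E\,S_1 + E\,S_2 \ge c\,E\,\widetilde U^n_{[0,c]}$. Since $-\widetilde X_0 = -(0 - a)^+ \le |a|$, the desired inequalities
follow.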
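
The first bound in 18.5.9 can be compared with an exact computation in a small case. The sketch below assumes the fair ±1 walk X_1, …, X_n (a martingale, hence a submartingale) and integer levels a < b; it enumerates all 2^n paths and returns both E U^n_{[a,b]} and the bound (E(X_n − a)^+ + |a|)/(b − a). The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def upcrossings(xs, a, b):
    """Largest k with tau_{2k} <= len(xs), following (18.8)."""
    count, below = 0, False
    for x in xs:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            count += 1
            below = False
    return count


def upcrossing_check(n, a, b):
    """Return (E U^n_[a,b], bound) for the fair +/-1 walk, by exact enumeration."""
    paths = list(product((1, -1), repeat=n))
    total_up, total_pos = 0, 0
    for steps in paths:
        xs, position = [], 0
        for y in steps:
            position += y
            xs.append(position)
        total_up += upcrossings(xs, a, b)
        total_pos += max(xs[-1] - a, 0)   # (X_n - a)^+ on this path
    mean_up = Fraction(total_up, len(paths))
    bound = (Fraction(total_pos, len(paths)) + abs(a)) / (b - a)
    return mean_up, bound


for n in (4, 8):
    print(n, upcrossing_check(n, -1, 1))
```

The enumeration is feasible only for small n; for longer horizons one would switch to simulation, at the cost of exactness.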

Convergence of Martingales
Throughout this subsection, (Xn ) is an adapted process on a filtered probability space
(Ω, F, (Fn ), P ). There are several important results on the convergence of martingales. One
of the most basic is the following:
18.5.10 Martingale Convergence Theorem (Doob). Let $(X_n, \mathcal F_n)$ be a submartingale
such that $\sup_n \|X_n\|_1 < \infty$. Then $(X_n)$ converges almost surely to an $L^1$ random variable
$X_\infty$. If, additionally, $(X_n)$ is uniformly integrable, then the convergence is in $L^1$.
Proof. By 18.5.9, for each $n$
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Big(E|X_n| + 2|a|\Big) \le \frac{1}{b-a} \sup_n \|X_n\|_1 + \frac{2|a|}{b-a}.$$
Since $U^n_{[a,b]} \uparrow U_{[a,b]}$, by the monotone convergence theorem
$$E\,U_{[a,b]} \le \frac{1}{b-a} \sup_n \|X_n\|_1 + \frac{2|a|}{b-a} < \infty.$$
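Therefore, $U_{[a,b]}$ is finite a.s. for each pair of rationals $a < b$ and hence, since there are only countably
many such pairs, a.s. for all of them simultaneously. By 18.5.8, $(X_n)$ converges a.s. to a measurable function
$X_\infty : \Omega \to \overline{\mathbb R}$. By Fatou’s lemma, $\int |X_\infty|\,dP \le \liminf_n \int |X_n|\,dP < \infty$, hence $X_\infty \in L^1$. This
proves the first part of the theorem. The last part follows from 4.4.5.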
18.5.11 Corollary. Let $(X_n)$ be a submartingale such that $X_n \le 0$ for all $n$. Then $(X_n)$
converges almost surely to an $L^1$ random variable $X_\infty$.
Proof. By the submartingale property, $E\,X_1 \le E\,X_n$, hence $E|X_n| = -E\,X_n \le -E\,X_1$ and
so $\sup_n \|X_n\|_1 < \infty$.
18.5.12 Corollary. Let $1 \le p < \infty$ and let $(X_n)$ be a submartingale such that $(|X_n|^p)$ is
uniformly integrable. Then $(X_n)$ converges almost surely and in $L^p$ to an $L^p$ random variable
$X_\infty$.
Proof. By uniform integrability, $\sup_n \|X_n\|_p < \infty$ (4.4.2). Since $\|X_n\|_1 \le \|X_n\|_p$, $(X_n)$
converges almost surely to an $L^1$ random variable $X_\infty$. By 4.4.5, the convergence is in $L^p$
norm.

18.5.13 Corollary. Let $(X_n)$ be a uniformly integrable martingale. Then
$(X_n)$ converges almost surely and in $L^1$ to a random variable $X_\infty$ with the property that
$X_n = E(X_\infty \mid \mathcal F_n)$ a.s. for all $n$.
Proof. All but the last assertion follows from the preceding corollary. For the desired equality,
let $A \in \mathcal F_n$ and note that for all $m \ge n$,
$$\int_A X_n\,dP = \int_A X_m\,dP \xrightarrow{\,m \to \infty\,} \int_A X_\infty\,dP = \int_A E(X_\infty \mid \mathcal F_n)\,dP,$$
so $X_n = E(X_\infty \mid \mathcal F_n)$ a.s.
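18.5.14 Corollary. Let $X \in L^1$ and denote by $\mathcal F_\infty$ the σ-field generated by $\bigcup_n \mathcal F_n$. Then
$$E(X \mid \mathcal F_n) \to E(X \mid \mathcal F_\infty) \quad \text{a.s. and in } L^1.$$
Proof. Let $X_n = E(X \mid \mathcal F_n)$. Then $(X_n)$ is a uniformly integrable martingale (18.5.1(c))
and $\sup_n \|X_n\|_1 \le \|X\|_1 < \infty$, hence $(X_n)$ converges a.s. and in $L^1$ to some $\mathcal F_\infty$-measurable random
variable $X_\infty$. If $A \in \mathcal F_m$ and $n > m$, then
$$\int_A E(X \mid \mathcal F_\infty)\,dP = \int_A X\,dP = \int_A E(X \mid \mathcal F_n)\,dP \to \int_A X_\infty\,dP \quad (n \to \infty).$$
Therefore $\int_A E(X \mid \mathcal F_\infty)\,dP = \int_A X_\infty\,dP$ for all $A \in \bigcup_m \mathcal F_m$. Since this union is a field that
generates $\mathcal F_\infty$, the equality holds for all $A \in \mathcal F_\infty$ and so $X_\infty = E(X \mid \mathcal F_\infty)$ a.s.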
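
A concrete instance of 18.5.14 can be computed in closed form. The sketch below assumes Ω = [0, 1) with Lebesgue measure, X(ω) = ω², and F_n generated by the dyadic intervals of length 2^{−n}, so that F_∞ is the Borel σ-field and E(X | F_∞) = X. On each dyadic interval [a, b) the conditional expectation E(X | F_n) is the mean value (a² + ab + b²)/3 of ω² over that interval. The function name `cond_exp` and the sample point are illustrative.

```python
import math


def cond_exp(omega: float, n: int) -> float:
    """E(X | F_n)(omega) for X(w) = w**2 on [0, 1) with Lebesgue measure, where
    F_n is generated by the dyadic intervals [k/2^n, (k+1)/2^n)."""
    k = math.floor(omega * 2**n)
    a, b = k / 2**n, (k + 1) / 2**n
    return (a * a + a * b + b * b) / 3.0  # mean value of w**2 over [a, b)


omega = 0.3
errors = [abs(cond_exp(omega, n) - omega**2) for n in range(1, 21)]
for n in (1, 5, 10, 20):
    print(n, cond_exp(omega, n), errors[n - 1])
```

In this example the error at level n is at most b² − a² ≤ 2 · 2^{−n}, so the convergence is in fact uniform in ω.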

Recall that the $L^1$ strong law of large numbers asserts that for independent and identically
distributed (iid) $L^1$ random variables $X_n$, the sample averages $S_n/n$ tend to $E(X_1)$ a.s. or,
equivalently, $\frac1n \sum_{k=1}^n (X_k - E(X_k)) \to 0$ a.s. The following generalization removes the iid
requirement.
18.5.15 Corollary. Let $\sup_n \|X_n\|_1 < \infty$. Set $\mathcal F_0 = \{\emptyset, \Omega\}$. Then
$$\frac1n \sum_{j=1}^n \big(X_j - E(X_j \mid \mathcal F_{j-1})\big) \to 0 \quad \text{a.s.}$$
Proof. Set $Y_n = \sum_{j=1}^n j^{-1}\big[X_j - E(X_j \mid \mathcal F_{j-1})\big]$. Then
$$Y_{n+1} - Y_n = \frac{1}{n+1}\big[X_{n+1} - E(X_{n+1} \mid \mathcal F_n)\big],$$
hence $E(Y_{n+1} - Y_n \mid \mathcal F_n) = 0$, that is, $(Y_n)$ is a martingale. Since $\sup_n \|Y_n\|_1 < \infty$, $(Y_n)$
converges a.s. to a random variable $Y_\infty$. The conclusion now follows from Kronecker’s
lemma.

Reversed Martingales
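A reversed filtration on a probability space $(\Omega, \mathcal F, P)$ is a sequence of sub-σ-fields $\mathcal F_n$
of $\mathcal F$ such that
$$\cdots \subseteq \mathcal F_{n+1} \subseteq \mathcal F_n \subseteq \cdots \subseteq \mathcal F_1.$$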
For example, if (Xn ) is a sequence of random variables, then Fn := σ(Xk : k ≥ n) defines a
reversed filtration.
Now let (Xn ) be an L1 process such that Xn is Fn -measurable for each n. Then (Xn , Fn )
is a reversed martingale if

E(Xn | Fn+1 ) = Xn+1 , n ≥ 1.

Iterating, we obtain
E(Xn | Fn+p ) = Xn+p . (18.9)
One may also formulate in an analogous way the notions of reversed submartingales and
reversed supermartingales. We consider only the martingale case.
Here is the reversed martingale analog of Doob’s convergence theorem. Note that the
hypothesis $\sup_n \|X_n\|_1 < \infty$ is not needed in this setting.
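18.5.16 Theorem. Let $(X_n, \mathcal F_n)$ be a reversed martingale. Then there exists a random
variable $X_\infty$ such that
$$\lim_n X_n = X_\infty \quad \text{a.s. and in } L^1.$$
Moreover, $X_\infty = E(X_1 \mid \mathcal F_\infty)$ a.s., where $\mathcal F_\infty = \bigcap_n \mathcal F_n$.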
Proof. We apply Doob’s upcrossing inequality to the number $U^n_{[a,b]}$ of upcrossings of $[a, b]$
by the sequence $X_n, X_{n-1}, \ldots, X_1$. This gives
$$E\,U^n_{[a,b]} \le \frac{1}{b-a}\Big(E|X_1| + 2|a|\Big).$$
Since $U^n_{[a,b]} \uparrow U_{[a,b]}$, which is the number of upcrossings of the infinite sequence
$\ldots, X_n, X_{n-1}, \ldots, X_1$, we see that $U_{[a,b]} < \infty$ a.s. and so $X_n \to X_\infty$ a.s., as before. Since
$X_n = E(X_1 \mid \mathcal F_n)$, $(X_n)$ is uniformly integrable (18.5.1(c)), hence the convergence is also
in $L^1$.
Since $X_m$ is $\mathcal F_n$-measurable for all $m \ge n$, $X_\infty$ is $\mathcal F_n$-measurable for all $n$, that is, $X_\infty$ is
$\mathcal F_\infty$-measurable. Also, from $E(X_1 \mid \mathcal F_n) = X_n$ we have
$$E(X_1 \mid \mathcal F_\infty) = E\big(E(X_1 \mid \mathcal F_n) \mid \mathcal F_\infty\big) = E(X_n \mid \mathcal F_\infty),$$
hence
$$\int_A E(X_1 \mid \mathcal F_\infty)\,dP = \int_A X_n\,dP \to \int_A X_\infty\,dP, \quad \forall\, A \in \mathcal F_\infty.$$
Therefore, $X_\infty = E(X_1 \mid \mathcal F_\infty)$ a.s.


For an application, consider an iid process $(X_n)$ with $|E(X_1)| < \infty$ and set $S_n =
X_1 + \cdots + X_n$ and $Y_n = n^{-1} S_n - E(X_1)$. Then $(Y_n)$ is a reversed martingale with respect
to the reversed filtration $\mathcal F_n := \sigma(X_k : k \ge n)$. Indeed, from
$$n(n+1)(Y_{n+1} - Y_n) = nX_{n+1} - (X_1 + \cdots + X_n)$$
we see by applying (18.9) that
$$n(n+1)\,E(Y_{n+1} - Y_n \mid \mathcal F_{n+1}) = nX_{n+1} - nX_{n+1} = 0$$
and so $Y_{n+1} = E(Y_n \mid \mathcal F_{n+1})$. Since $E(Y_1) = 0$, the theorem implies that $Y_n \to 0$ a.s. and
in $L^1$, which is the law of large numbers with the added feature of $L^1$ convergence (which
could also have been established originally).

18.6 General Stochastic Processes


A d-dimensional stochastic process is an indexed family {Xi : i ∈ I} of d-dimensional
random variables Xi on some probability space (Ω, F, P ). We shall also use the notation
X(i) or X(i, ω). In applications, I is typically one of the sets N or [0, ∞). We have already
considered processes of the former type, so-called discrete-time processes. In this chapter we
consider general processes with particular emphasis on the important special case I = [0, ∞),
so-called continuous-time processes.
The set $\mathbb R^d$ in a d-dimensional process {Xi : i ∈ I} is called the state space of the
process. For each ω, the function i → Xi (ω) from I to $\mathbb R^d$ is called a path of the process.
For example, if Xt is the position at time t of a molecule of a gas, then a path t → Xt (ω)
represents one possible trajectory of the molecule. If Xt is the price of a stock at time t,
then a path symbolizes a particular market scenario.

The Consistency Conditions

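Let $\{X_i : i \in I\}$ be a d-dimensional process and set $S = \mathbb R^d$ and $\mathcal B = \mathcal B(\mathbb R^d)$. For
each n-tuple $(i_1, \ldots, i_n)$ of distinct members of $I$, let $P_{i_1 \cdots i_n}$ denote the distribution of
$(X_{i_1}, \ldots, X_{i_n})$, that is,
$$P_{i_1 \cdots i_n}(B) = P\big((X_{i_1}, \ldots, X_{i_n}) \in B\big), \quad B \in \mathcal B(S^n := S \times \cdots \times S). \tag{18.10}$$
The probability measures $P_{i_1 \cdots i_n}$ are called the finite dimensional distributions of the
process $X$. Note that these distributions satisfy the following consistency conditions: For
all $n$ and $B_j \in \mathcal B$,

C1. $P_{i_{\tau 1} \cdots i_{\tau n}}\big(B_{i_{\tau 1}} \times \cdots \times B_{i_{\tau n}}\big) = P_{i_1 \cdots i_n}\big(B_{i_1} \times \cdots \times B_{i_n}\big)$ ∀ permutation $\tau$ of $(1, \ldots, n)$.

C2. $P_{i_1 \cdots i_{n+1}}\big(B_{i_1} \times \cdots \times B_{i_n} \times S\big) = P_{i_1 \cdots i_n}\big(B_{i_1} \times \cdots \times B_{i_n}\big)$.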

The problem we consider in this section is the converse: Given an index set I and a family
D(I) of finite dimensional distributions satisfying the above consistency conditions, find a
probability space (Ω, F, P ) and a process such that (18.10) holds for all members of D(I).
To construct such a process, we must first define the product measurable space $S^I$ for a
general measurable space $(S, \mathcal F)$.

The Product of Measurable Spaces

Let $I$ be an arbitrary index set, $(S, \mathcal F)$ an arbitrary measurable space, and $S^I$ the
collection of all functions $f : I \to S$. For $n \in \mathbb N$, let $(S^n, \mathcal F^n) = (S \times \cdots \times S, \mathcal F \otimes \cdots \otimes \mathcal F)$
denote the n-fold product space with its product σ-field. In what follows, we consider finite sequences $(i_1, \ldots, i_n)$
of distinct members of $I$. These will be called index sequences. For such sequences, we
write $(i_1, \ldots, i_n) \subseteq (j_1, \ldots, j_p)$ if $\{i_1, \ldots, i_n\} \subseteq \{j_1, \ldots, j_p\}$. Define the projection map
corresponding to the index sequence $(i_1, \ldots, i_n)$ by
$$\pi_{i_1 \cdots i_n} : S^I \to S^n, \quad \pi_{i_1 \cdots i_n}(f) = \big(f(i_1), \ldots, f(i_n)\big).$$
A cylinder set over $A \in \mathcal F^n$ is a set of the form
$$\pi_{i_1 \cdots i_n}^{-1}(A) = \{f \in S^I : \big(f(i_1), \ldots, f(i_n)\big) \in A\}.$$

18.6.1 Lemma. If $(i_1, \ldots, i_n) \subseteq (j_1, \ldots, j_p)$ and $A \in \mathcal F^n$, then there exists $A' \in \mathcal F^p$ such
that
$$\pi_{i_1 \cdots i_n}^{-1}(A) = \pi_{j_1 \cdots j_p}^{-1}(A').$$
Proof. Let $\tau$ be a permutation of $\{1, \ldots, p\}$ such that the first $n$ coordinates of $(j_{\tau 1}, \ldots, j_{\tau p})$
are $i_1, \ldots, i_n$. Define
$$A' = \{(x_1, \ldots, x_p) : (x_{\tau 1}, \ldots, x_{\tau n}) \in A\}.$$
Then $A'$ is the preimage of $A$ under a measurable mapping $S^p \to S^n$, hence $A' \in \mathcal F^p$.
Moreover, $\big(f(i_1), \ldots, f(i_n)\big) = \big(f(j_{\tau 1}), \ldots, f(j_{\tau n})\big) \in A$ iff $\big(f(j_1), \ldots, f(j_p)\big) \in A'$, so the
desired equation holds.
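
The construction of A′ in the proof is a reindexing and can be mimicked directly. In the sketch below, which is illustrative only, a function f on I is modeled by a dictionary, a set A ⊆ S^n by a predicate on n-tuples, and the hypothetical helper `lift` builds the predicate A′ on p-tuples by picking out the coordinates that correspond to i_1, …, i_n.

```python
from itertools import product


def in_cylinder(f, index_seq, A):
    """Membership of f in the cylinder set pi_{i_1...i_n}^{-1}(A)."""
    return A(tuple(f[i] for i in index_seq))


def lift(index_seq, A, bigger_seq):
    """Return A' on tuples indexed by bigger_seq such that the cylinder sets
    over A (for index_seq) and over A' (for bigger_seq) coincide."""
    position = {j: r for r, j in enumerate(bigger_seq)}
    picks = [position[i] for i in index_seq]  # where each i_m sits in (j_1, ..., j_p)
    return lambda xs: A(tuple(xs[r] for r in picks))


index_seq = ("s", "t")
bigger_seq = ("u", "t", "s")
A = lambda x: x[0] + 2 * x[1] > 0          # a sample set A in S^2, given as a predicate
A_prime = lift(index_seq, A, bigger_seq)

f = {"s": 1, "t": -1, "u": 5}
print(in_cylinder(f, index_seq, A), in_cylinder(f, bigger_seq, A_prime))
```

Measurability of A′ is of course not visible in such a model; the code only illustrates the set-theoretic identity of the two cylinder sets.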
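18.6.2 Corollary. Given cylinder sets
$$\pi_{i_1 \cdots i_n}^{-1}(A) \quad \text{and} \quad \pi_{j_1 \cdots j_m}^{-1}(B), \quad A \in \mathcal F^n,\ B \in \mathcal F^m, \tag{18.11}$$
and $(k_1, \ldots, k_p) \supseteq (i_1, \ldots, i_n) \cup (j_1, \ldots, j_m)$, there exist $A', B' \in \mathcal F^p$ such that
$$\pi_{i_1 \cdots i_n}^{-1}(A) = \pi_{k_1 \cdots k_p}^{-1}(A') \quad \text{and} \quad \pi_{j_1 \cdots j_m}^{-1}(B) = \pi_{k_1 \cdots k_p}^{-1}(B'). \tag{18.12}$$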

We denote the collection of all cylinder sets by C(I). Except in trivial cases, C(I) is not a
σ-field. However,
18.6.3 Proposition. C(I) is a field.

Proof. Consider the cylinder sets in (18.11) represented as in (18.12). Then
$$\pi_{i_1 \cdots i_n}^{-1}(A) \cup \pi_{j_1 \cdots j_m}^{-1}(B) = \pi_{k_1 \cdots k_p}^{-1}(A' \cup B'),$$
which shows that $\mathcal C(I)$ is closed under finite unions. Since the complement of $\pi_{i_1 \cdots i_n}^{-1}(A)$ is
$\pi_{i_1 \cdots i_n}^{-1}(A^c)$, $\mathcal C(I)$ is also closed under complementation and hence is a field.

The σ-field generated by $\mathcal C(I)$ is called the product σ-field and is denoted by $\mathcal F^I$. The
equality
$$\pi_{i_1 \cdots i_n}^{-1}(B_1 \times \cdots \times B_n) = \bigcap_{j=1}^{n} \pi_{i_j}^{-1}(B_j)$$
shows that $\mathcal F^I$ is also the σ-field generated by all the projection mappings $\pi_i$. As a consequence,
we have

18.6.4 Proposition. The projection mappings are measurable transformations. A function
$T$ from a measurable space $(\Omega, \mathcal G)$ to $S^I$ is measurable iff $\pi_i \circ T : \Omega \to S$ is measurable for
every $i \in I$.
While the σ-field $\mathcal F^I$ is adequate for many purposes, some important sets may not be
members of $\mathcal F^I$. The reason for this shortcoming is the following result, which shows that
members of $\mathcal F^I$ are determined by countable subsets of $I$.
18.6.5 Proposition. For every member $A$ of $\mathcal F^I$ there is a countable subset $J_A$ of $I$
depending on $A$ such that
$$f \in A,\ g \in S^I, \text{ and } f(j) = g(j)\ \forall\, j \in J_A \ \Rightarrow\ g \in A.$$
Proof. Let $\mathcal G$ denote the collection of all subsets $A$ of $S^I$ with the stated property. We show
that $\mathcal F^I \subseteq \mathcal G$. To this end, note first that $\mathcal G$ contains all sets of the form $A := \pi_i^{-1}(B)$, $B \in \mathcal F$;
indeed, one need only take $J_A = \{i\}$. Since these are generators for $\mathcal F^I$, the desired inclusion
will follow if we show that $\mathcal G$ is a σ-field.
Suppose $A \in \mathcal G$ and let $f \in A^c$, $g \in S^I$. If $f(j) = g(j)$ for all $j \in J_A$, then $g \in A^c$;
otherwise, $g$ would lie in $A$, forcing $f$ to lie in $A$. Therefore, we may take $J_{A^c} = J_A$, showing
that $\mathcal G$ is closed under complementation. Now let $(A_n)$ be a sequence in $\mathcal G$ and set $A := \bigcup_n A_n$.
Let $f \in A$, $g \in S^I$, be such that $f(j) = g(j)$ for all $j \in \bigcup_n J_{A_n}$. Since $f \in A_m$ for some $m$,
$g \in A_m \subseteq A$. Therefore we may take $J_A = \bigcup_n J_{A_n}$. Since this is countable, $A \in \mathcal G$.

18.6.6 Corollary. $C[0, \infty)$ is not a member of $\mathcal B^{[0,\infty)}$.

Proof. Suppose, for a contradiction, that $C[0, \infty) \in \mathcal B^{[0,\infty)}$. By the proposition there exists
a countable subset $D$ of $[0, \infty)$ with the property
$$f \in C[0, \infty),\ g \in \mathbb R^{[0,\infty)}, \text{ and } f(t) = g(t)\ \forall\, t \in D \ \Rightarrow\ g \in C[0, \infty).$$
Now take $f \equiv 0$ and $g = 1_{\{s\}}$, where $s \notin D$. Then $f = g$ on $D$, yet $g$ is not continuous.

The Kolmogorov Extension Theorem

Set $S := \mathbb R^d$ and $\mathcal B := \mathcal B(\mathbb R^d)$. Here is the main result of the section.

18.6.7 Theorem (Kolmogorov). Let $I$ be an arbitrary nonempty index set and let $D(I)$ be
a collection of finite dimensional probability distributions satisfying the consistency conditions
C1 and C2. Then there exists a unique probability measure $P$ on the product space $(S^I, \mathcal B^I)$
such that for every index sequence $(i_1, \ldots, i_n)$,
$$P\big(\pi_{i_1 \cdots i_n}^{-1}(A)\big) = P_{i_1 \cdots i_n}(A), \quad A \in \mathcal B^n. \tag{18.13}$$
Proof. Define $P$ on $\mathcal C(I)$ by (18.13). To see that $P$ is well-defined, suppose that $\pi_{i_1 \cdots i_n}^{-1}(A) =
\pi_{j_1 \cdots j_m}^{-1}(B)$. Represent these cylinder sets as in (18.12). Then $A' = B'$, and it follows from
the consistency conditions that
$$P_{i_1 \cdots i_n}(A) = P_{k_1 \cdots k_p}(A') = P_{k_1 \cdots k_p}(B') = P_{j_1 \cdots j_m}(B).$$
We show next that $P$ is a probability measure on $\mathcal C(I)$. The conclusion of the theorem
will then follow from the measure extension theorem (1.6.4). Clearly $P(S^I) = 1$. To see that
$P$ is finitely additive on $\mathcal C(I)$, represent disjoint cylinder sets as in (18.12). Then $A'$ and $B'$
must be disjoint and
$$\pi_{i_1 \cdots i_n}^{-1}(A) \cup \pi_{j_1 \cdots j_m}^{-1}(B) = \pi_{k_1 \cdots k_p}^{-1}(A' \cup B'),$$
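hence
$$\begin{aligned}
P\big(\pi_{i_1 \cdots i_n}^{-1}(A) \cup \pi_{j_1 \cdots j_m}^{-1}(B)\big) &= P_{k_1 \cdots k_p}(A' \cup B') = P_{k_1 \cdots k_p}(A') + P_{k_1 \cdots k_p}(B') \\
&= P\big(\pi_{i_1 \cdots i_n}^{-1}(A)\big) + P\big(\pi_{j_1 \cdots j_m}^{-1}(B)\big).
\end{aligned}$$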

Since $\mathcal C(I)$ is a field and $P$ is finitely additive, $P$ is monotone. It remains to show that if
$(A_n)$ is a sequence in $\mathcal C(I)$ and $A_n \downarrow \emptyset$, then $P(A_n) \to 0$. Let $r := \lim_n P(A_n)$. We show that
the assumption $r > 0$ implies the contradiction $\bigcap_n A_n \ne \emptyset$. Now, by 18.6.1, it is possible,
without affecting monotonicity or changing the intersection $\bigcap_n A_n$, to precede the sequence
$(A_n)$ by terms $S^I$ and to insert duplicate terms $A_j$. Thus we may assume there exists an
infinite sequence of distinct indices $i_n$ such that
$$A_n = \pi_{I_n}^{-1}(B_n), \quad \text{where } I_n = (i_1, \ldots, i_n) \text{ and } B_n \in \mathcal B^n.$$
By regularity, choose a compact set $C_n \subseteq B_n$ with $P_{I_n}(B_n \setminus C_n) < r/2^{n+1}$ and set
$$D_n = \pi_{I_n}^{-1}(C_n) \quad \text{and} \quad E_n = \bigcap_{j=1}^{n} D_j.$$
Then by monotonicity
$$P(A_n \setminus E_n) = P\Big(\bigcup_{j=1}^{n} (A_n \setminus D_j)\Big) \le \sum_{j=1}^{n} P(A_j \setminus D_j) = \sum_{j=1}^{n} P_{I_j}(B_j \setminus C_j) \le r/2,$$
and since $E_n \subseteq A_n$ we see that $P(E_n) \ge P(A_n) - r/2 \ge r/2 > 0$. Therefore, $E_n \ne \emptyset$.
Choosing $f_n \in E_n$ we have
$$\big(f_n(i_1), \ldots, f_n(i_j)\big) \in C_j, \quad 1 \le j \le n.$$
In particular, $f_n(i_1) \in C_1$ for all $n \ge 1$, and since $C_1$ is compact there exists a subsequence
$(f_n^{(1)})$ of $(f_n)$ such that $f_n^{(1)}(i_1) \to x_1 \in C_1$ as $n \to \infty$. Likewise, since $\big(f_n^{(1)}(i_1), f_n^{(1)}(i_2)\big) \in C_2$ for all
$n \ge 2$, there exists a subsequence $(f_n^{(2)})$ of $(f_n^{(1)})$ such that $\big(f_n^{(2)}(i_1), f_n^{(2)}(i_2)\big) \to (x_1, x_2) \in C_2$.
By induction we may construct successive subsequences $(f_n^{(k)})$ such that for all $k$,
$$\big(f_n^{(k)}(i_1), f_n^{(k)}(i_2), \ldots, f_n^{(k)}(i_k)\big) \to (x_1, x_2, \ldots, x_k) \in C_k \quad (n \to \infty). \tag{†}$$
For each $k$, the diagonal sequence $\big(f_n^{(n)}(i_k)\big)_n$ then converges to $x_k$. Now choose any $f$
such that $f(i_j) = x_j$ for all $j$. Then by (†), $f \in D_k \subseteq A_k$ for all $k$, which is the desired
contradiction. This proves that $P$ is a probability measure on $\mathcal C(I)$.
Taking Xi to be the projection map πi , we now have the following resolution to the
problem stated at the beginning of the section.

18.6.8 Corollary. Given $D(I)$ as above, there exists a probability space $(\Omega, \mathcal F, P)$ and
a family of $\mathbb R^d$-valued random variables such that (18.10) holds for every finite sequence
$(i_1, \ldots, i_n)$.
The following version of the theorem is useful in the important special case I = (0, ∞)
and d = 1.

18.6.9 Corollary. Suppose that for each finite ordered sequence t1 < t2 < · · · < tn in (0, ∞)
there exists a probability distribution Pt1 ···tn with cdf Ft1 ···tn such that for all n and k with
1 ≤ k ≤ n,

lim Ft1 ···tn (x1 , . . . , xn ) = Ft1 ···tk−1 tk+1 ···tn (x1 , . . . , xk−1 , xk+1 , . . . , xn . (18.14)
xk →∞

Then there exists a unique probability measure P on the product space (R(0,∞) , B(R)(0,∞) )
such that for every sequence t1 < t2 < · · · < tn ,

P πt−1
1 ···tn
(A) = Pt1 ···tn (A) ∀ A ∈ B(Rn ).

Proof. The hypothesis implies that for all Bj ∈ B(R) and t1 < t2 < · · · < tn ,

Pt1 ···tk−1 tk+1 ,···tn (B1 × · · · × Bk−1 × Bk+1 · · · × Bn

= Pt1 ···tk−1 tk tk+1 ,...,tn ) (B1 × · · · × Bk−1 × R × Bk+1 × · · · × Bn . (†)

The idea is to enlarge the collection of probability distributions to include all index sequences
(s1 , . . . , sn ) and then apply the extension theorem. This is accomplished as follows: Given
an arbitrary sequence (s1 , . . . , sn ) of distinct sj , define

Ps1 ···sn (B1 × · · · × Bn ) = Psτ 1 ···sτ n (Bτ 1 × · · · × Bτ n ),

where τ is the unique permutation of (1, . . . , n) that orders s1 , . . . , sn , that is, that produces
the natural ordering sτ 1 < · · · < sτ n . If σ is any permutation of (1, . . . , n) and τ is the
permutation that orders sσ1 , . . . , sσn , then τ σ is the permutation that orders s1 , . . . , sn ,
hence

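\[ P_{s_{\sigma 1} \cdots s_{\sigma n}}(B_{\sigma 1} \times \cdots \times B_{\sigma n}) = P_{s_{\tau\sigma 1} \cdots s_{\tau\sigma n}}(B_{\tau\sigma 1} \times \cdots \times B_{\tau\sigma n}) = P_{s_1 \cdots s_n}(B_1 \times \cdots \times B_n). \]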

This shows that consistency condition C1 holds. To verify C2, we must show that

\[ P_{s_1 \cdots s_n}(B_1 \times \cdots \times B_{n-1} \times \mathbb{R}) = P_{s_1 \cdots s_{n-1}}(B_1 \times \cdots \times B_{n-1}). \]

But a permutation that orders s1 , . . . , sn then places R in some position k, and an application
of (†) yields the desired equation.

18.7 Brownian Motion


The phrase Brownian motion, in the classical sense, refers to a phenomenon discovered
in 1827 by the Scottish botanist Robert Brown, who observed that microscopic particles
suspended in a fluid (liquid or gas) exhibited highly irregular motion characterized by
seemingly independent random movements. Later it was determined that this motion
resulted from collisions of the particles with molecules in the ambient fluid. In 1905, Albert
Einstein gave a physical interpretation of Brownian motion. A rigorous mathematical model
of Brownian motion was developed in the 1920s by Norbert Wiener. The model, known as
a Brownian motion process or a Wiener process, has come to play an indispensable role
in many areas of pure and applied mathematics. For example, in pure mathematics the
process has spawned the study of continuous time martingales and stochastic calculus. In
applied mathematics the Wiener process is used as a mathematical model of “white noise.”
In mathematical finance, geometric Brownian motion is the fundamental component in the
Black-Scholes model for option pricing (discussed in §18.9). In the current section we consider
a one-dimensional version of (mathematical) Brownian motion, which may be viewed as a
model for the motion of a Brownian particle projected onto a vertical axis.
For a mathematical description of Brownian motion, we need to extend some earlier
terminology. A (continuous-time) filtration on a probability space (Ω, F, P ) is a family of
σ-fields Ft indexed by t ∈ [0, ∞) such that Fs ⊆ Ft ⊆ F for all 0 ≤ s ≤ t. A probability space
with a filtration is called a filtered probability space and is denoted by (Ω, F, (Ft )t≥0 , P ).
As in the case of discrete time, a filtration (Ft ) may be viewed as a mathematical model for
ever more precise information produced by an experiment evolving in time. An important
example is the natural filtration $\mathcal{F}^X = (\mathcal{F}^X_t)$ of a process X = (Xt ), where $\mathcal{F}^X_t$ is the
σ-field σ(Xs : 0 ≤ s ≤ t), which consists precisely of the information revealed by the process
up to time t. A stochastic process (Xt ) is said to be adapted to a filtration (Ft ) if for all
t the random variable Xt is Ft -measurable. For example, a process is always adapted to its
natural filtration, but there may be reason to consider larger filtrations.
A (one-dimensional) Brownian motion or Wiener process on a filtered probability
space (Ω, F, (Ft )t≥0 , P) is a stochastic process W = (W (t))t≥0 adapted to (Ft )t≥0 such that
the following conditions hold:

(a) W0 = 0;

(b) For 0 ≤ s < t, the increment W (t) − W (s) is normal with mean zero and variance t − s,
that is,
\[ P\big(W(t) - W(s) \in B\big) = \frac{1}{\sqrt{2\pi(t-s)}} \int_B \exp\Big[-\frac{x^2}{2(t-s)}\Big]\,dx, \quad B \in \mathcal{B}(\mathbb{R}). \]

(c) For 0 ≤ s < t, W (t) − W (s) is independent of Fs .

(d) The paths t → Wt (ω) of W are continuous.

Note that W (t) has independent increments, that is, if 0 < t1 < t2 < · · · < tn then
the random variables W (t1 ), W (t2 ) − W (t1 ), . . . , W (tn ) − W (tn−1 ) are independent. This
follows by induction from (c) and the fact that (Wt ) is adapted to (Ft ).
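
Conditions (a)–(c) suggest a simple way to sample a Brownian path on a finite grid: start at zero and add independent normal increments with variance equal to the time step. The following is a minimal sketch under that assumption only; the function name `brownian_path` and the fixed seed are illustrative choices, and a grid sample says nothing about path continuity (d).

```python
import math
import random

def brownian_path(T, n, rng):
    """Return grid times and sampled values W(t_0), ..., W(t_n) on [0, T].

    Each increment W(t_j) - W(t_{j-1}) is drawn independently from a normal
    distribution with mean 0 and variance dt, mirroring conditions (b), (c).
    """
    dt = T / n
    times = [j * dt for j in range(n + 1)]
    values = [0.0]                       # condition (a): W(0) = 0
    for _ in range(n):
        values.append(values[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return times, values

rng = random.Random(0)                   # fixed seed, only for reproducibility
times, values = brownian_path(T=1.0, n=1000, rng=rng)
print(times[-1], values[-1])
```

The successive differences of `values` play the role of the independent increments W (t1 ), W (t2 ) − W (t1 ), . . . described above.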

Construction of Brownian Motion


The existence of a process with properties (a)–(c) is a consequence of the Kolmogorov
extension theorem: For 0 < t1 < · · · < tn set
\[ f_{t_1 \cdots t_n}(x_1, \ldots, x_n) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi \Delta t_k}} \exp\Big(-\frac{[\Delta x_k]^2}{2\Delta t_k}\Big), \]

where ∆xk = xk − xk−1 and ∆tk = tk − tk−1 (x0 = t0 = 0). Then ft1 ···tn is a density for an
n-dimensional cdf Ft1 ···tn that satisfies the consistency condition (18.14). By 18.6.9 there
exists a probability space (Ω, F, P ) and a process (Xt ) such that
\[ P\big((X_{t_1}, \ldots, X_{t_n}) \in B\big) = \int_B f_{t_1 \cdots t_n}(x)\,dx, \quad B \in \mathcal{B}(\mathbb{R}^n). \]

Taking B = B1 × · · · × Bn and setting xj = y1 + · · · + yj we have, by a substitution,



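\begin{align*}
P\big((X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}) \in B\big)
&= \int 1_B(x_1, x_2 - x_1, \ldots, x_n - x_{n-1})\, f_{t_1 \cdots t_n}(x)\,dx \\
&= \int 1_B(y_1, y_2, \ldots, y_n) \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi \Delta t_k}} \exp\Big(-\frac{y_k^2}{2\Delta t_k}\Big)\,dy \\
&= \prod_{k=1}^{n} \int 1_{B_k}(y_k)\, \frac{1}{\sqrt{2\pi \Delta t_k}} \exp\Big(-\frac{y_k^2}{2\Delta t_k}\Big)\,dy_k \\
&= \prod_{k=1}^{n} P\big(X_{t_k} - X_{t_{k-1}} \in B_k\big).
\end{align*}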

This shows that (Xt ) satisfies (b) and (c) of the definition of Brownian motion. Setting
X0 = 0 completes the construction.
It remains to show that there exists a continuous process satisfying (a)–(c). The idea is
to modify the process (Xt ) obtained in the preceding paragraph on a set of probability zero
to produce the desired continuous process. This is accomplished by the following general
theorem. The proof depends on the density of the dyadic rationals $j/2^n$ (j ≥ 0, n ≥ 1) in
[0, ∞).
18.7.1 Theorem. Let g(t) and h(t) be nonnegative even functions on some interval (−a, a)
that are increasing on (0, a) such that g is continuous at zero and the series

\[ \sum_{n=1}^{\infty} g(2^{-n}) \quad\text{and}\quad \sum_{n=1}^{\infty} n 2^n h(2^{-n}) \]

converge. Let (Xt ) be a stochastic process on a probability space (Ω, F, P ) that satisfies

\[ P\big(|X_t - X_s| \ge g(t-s)\big) \le h(t-s) \quad\text{whenever } t, s \ge 0 \text{ and } |t-s| < a. \]

Then there exists a process (Yt ) on (Ω, F, P ) with continuous paths such that for any t,
Yt = Xt a.s. (the exceptional set depending on t). In particular, Y and X have the same
finite-dimensional distributions.
Proof. Set $t_{n,j} = j2^{-n}$ (n = 1, 2, . . . , j = 0, 1, . . .). For each n and ω, imbed in the path
t → Xt (ω) a polygonal line Xn (·, ω) with vertices $(t_{n,j}, X(t_{n,j}, \omega))$:

\[ X_n(t,\omega) := X(t_{n,j},\omega) + 2^n (t - t_{n,j})\big[X(t_{n,j+1},\omega) - X(t_{n,j},\omega)\big], \quad t_{n,j} \le t \le t_{n,j+1}. \]

This defines a sequence of processes Xn (·). Note that since $t_{n+1,2j} = t_{n,j}$,

\[ X_n(t,\omega) = X(t_{n+1,2j},\omega) + 2^n (t - t_{n+1,2j})\big[X(t_{n+1,2j+2},\omega) - X(t_{n+1,2j},\omega)\big]. \]

A direct calculation shows that if $t_{n,j} \le t \le t_{n,j+1}$, then

\[ \big|X_{n+1}(t,\omega) - X_n(t,\omega)\big| \le \Big|X(t_{n+1,2j+1},\omega) - \tfrac{1}{2}\big[X(t_{n+1,2j},\omega) + X(t_{n+1,2j+2},\omega)\big]\Big|. \tag{$\alpha$} \]

The figure below illustrates the idea. Here, A and B are consecutive points of the polygon
for the process Xn (·), and C is the interpolation point required to pass to the polygon for
the process Xn+1 (·).
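FIGURE 18.2: The consecutive vertices $A = (t_{n,j}, X(t_{n,j},\omega)) = (t_{n+1,2j}, X(t_{n+1,2j},\omega))$ and $B = (t_{n,j+1}, X(t_{n,j+1},\omega)) = (t_{n+1,2j+2}, X(t_{n+1,2j+2},\omega))$ of the polygon for $X_n$, the midpoint $(A+B)/2$ of the segment AB, and the interpolation point $C = (t_{n+1,2j+1}, X(t_{n+1,2j+1},\omega))$; the points $(t, X_n(t,\omega))$ and $(t, X_{n+1}(t,\omega))$ lie on the respective polygons.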

From (α) we have



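\begin{align*}
M_{n,j}(\omega) &:= \sup\big\{\big|X_{n+1}(t,\omega) - X_n(t,\omega)\big| : t_{n,j} \le t \le t_{n,j+1}\big\} \\
&\le \tfrac{1}{2}\big|X(t_{n+1,2j+1},\omega) - X(t_{n+1,2j},\omega)\big| + \tfrac{1}{2}\big|X(t_{n+1,2j+1},\omega) - X(t_{n+1,2j+2},\omega)\big|. \tag{$\beta$}
\end{align*}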

Since the processes Xn have continuous paths, Mn,j (ω) may be calculated as the supremum
over the rational interval [tn,j , tn,j+1 ] ∩ Q, hence Mn,j is measurable. Moreover, from (β),
  
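\begin{align*}
P\big(M_{n,j} \ge g(1/2^{n+1})\big) &\le P\big(\big|X(t_{n+1,2j+1}) - X(t_{n+1,2j})\big| \ge g(1/2^{n+1})\big) \\
&\quad + P\big(\big|X(t_{n+1,2j+1}) - X(t_{n+1,2j+2})\big| \ge g(1/2^{n+1})\big) \\
&\le 2h(1/2^{n+1}), \tag{$\gamma$}
\end{align*}

the last inequality holding for all n with $1/2^{n+1} < a$. Now set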

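\[ M_n(\omega) := \sup\big\{\big|X_{n+1}(t,\omega) - X_n(t,\omega)\big| : 0 \le t \le n\big\}. \]
Since $[0,n] = \bigcup_{j=0}^{n2^n-1} [t_{n,j}, t_{n,j+1}]$,
\[ \big\{M_n \ge g(1/2^{n+1})\big\} \subseteq \bigcup_{j=0}^{n2^n-1} \big\{M_{n,j} \ge g(1/2^{n+1})\big\}, \]

hence from (γ)

\[ P\big(M_n \ge g(1/2^{n+1})\big) \le \sum_{j=0}^{n2^n-1} P\big(M_{n,j} \ge g(1/2^{n+1})\big) \le (n+1)2^{n+1} h(1/2^{n+1}). \]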
j=0

It follows from the hypothesis that the series $\sum_{n=1}^{\infty} P\big(M_n \ge g(1/2^{n+1})\big)$ converges. By the
Borel-Cantelli lemma we then have
\[ P(A) = 0, \quad\text{where } A := \limsup_n \big\{M_n \ge g(1/2^{n+1})\big\}. \]

Now, let ω ∈ Ac and b > 0. For any p ∈ N, t ∈ [0, b], and all sufficiently large n we have
\[ \big|X_{n+p}(t,\omega) - X_n(t,\omega)\big| \le \sum_{k=1}^{p} \big|X_{n+k}(t,\omega) - X_{n+k-1}(t,\omega)\big| \le \sum_{k=1}^{p} M_{n+k-1}(\omega) \le \sum_{k=1}^{p} g(1/2^{n+k}). \]
Since the series $\sum_{k=1}^{\infty} g(1/2^k)$ converges, the preceding inequality implies that the sequence
(Xn (t, ω)) is uniformly Cauchy on [0, b] and therefore converges uniformly on [0, b]. Now
define
\[ Y(t,\omega) = \begin{cases} \lim_n X_n(t,\omega) & \omega \notin A, \\ 0 & \omega \in A. \end{cases} \]
Then (Yt ) has continuous paths and $Y(j/2^n, \omega) = X(j/2^n, \omega)$ for all ω ∈ Ac , n ∈ N, and
j ∈ Z+ .
It remains to show that Yt = Xt a.s. This is clear if t is a dyadic rational. For arbitrary
t, choose a sequence of dyadic rationals sn so that $0 \le t - s_n < 2^{-n}$. Since g and h are
increasing,
\[ P\big(|X(s_n,\omega) - X(t,\omega)| \ge g(1/2^n)\big) \le P\big(|X(s_n,\omega) - X(t,\omega)| \ge g(t - s_n)\big) \le h(t - s_n) \le h(1/2^n). \]
Since the series $\sum_n h(1/2^n)$ converges, by the Borel-Cantelli lemma again we have

\[ P(B) = 0, \quad\text{where } B := \limsup_n \big\{|X(s_n,\omega) - X(t,\omega)| \ge g(1/2^n)\big\}. \]

If ω ∉ B, then, eventually, $|X(s_n,\omega) - X(t,\omega)| < g(1/2^n) \to 0$, hence X(sn , ω) → X(t, ω).
If also ω ∉ A, then X(sn , ω) = Y (sn , ω) → Y (t, ω) by the continuity of the paths of Y .
Therefore, Y (t, ω) = X(t, ω) for ω ∈ (A ∪ B)c .
18.7.2 Theorem. Brownian motion exists.
Proof. Take X to be any process satisfying (a)–(c) of the definition of Brownian motion.
Define
\[ g(t) = |t|^{1/4} \quad\text{and}\quad h(t) = \sqrt{\frac{2}{\pi}}\, |t|^{1/4} \exp\Big(\frac{-1}{2\sqrt{|t|}}\Big). \]
We show that these functions satisfy the hypotheses of 18.7.1. We may then apply that
theorem to obtain a continuous version of X, which is the desired Brownian motion.
The functions g and h are clearly even and increasing in t > 0. Moreover,
\[ \sum_{n=1}^{\infty} g(1/2^n) = \sum_{n=1}^{\infty} \frac{1}{2^{n/4}} < \infty \quad\text{and}\quad \sum_{n=1}^{\infty} n 2^n h(1/2^n) = \sqrt{\frac{2}{\pi}} \sum_{n=1}^{\infty} n 2^{3n/4} \exp\Big(\frac{-2^{n/2}}{2}\Big) < \infty. \]

We claim that for t ≠ s,

\[ P\big(|X_t - X_s| \ge g(|t-s|)\big) \le h(|t-s|), \]

which may be written

\[ \frac{1}{2|t-s|^{1/2}} \int_{|x| \ge |t-s|^{1/4}} \exp\Big(-\frac{x^2}{2|t-s|}\Big)\,dx \le |t-s|^{1/4} \exp\Big(\frac{-1}{2|t-s|^{1/2}}\Big). \tag{$\dagger$} \]

Making the substitution $y = x|t-s|^{-1/2}$, setting $z := |t-s|^{-1/4}$, and integrating by parts,
the left side of the inequality becomes
\begin{align*}
\int_z^{\infty} \exp(-y^2/2)\,dy = -\int_z^{\infty} \frac{1}{y}\,\frac{d}{dy} \exp(-y^2/2)\,dy
&= \frac{1}{z}\exp(-z^2/2) - \int_z^{\infty} \frac{1}{y^2} \exp(-y^2/2)\,dy \\
&\le \frac{1}{z}\exp(-z^2/2),
\end{align*}
which is (†).
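
The tail estimate used in this proof, namely that the integral of exp(−y²/2) over [z, ∞) is at most exp(−z²/2)/z for z > 0, can be checked numerically. The sketch below is only an illustration: it evaluates the integral through the standard identity relating it to the complementary error function, and the helper names `gaussian_tail` and `tail_bound` are hypothetical.

```python
import math

def gaussian_tail(z):
    """Integral of exp(-y^2/2) over [z, infinity), computed via erfc."""
    return math.sqrt(math.pi / 2.0) * math.erfc(z / math.sqrt(2.0))

def tail_bound(z):
    """The bound exp(-z^2/2) / z obtained from the integration by parts."""
    return math.exp(-z * z / 2.0) / z

for z in (0.5, 1.0, 2.0, 4.0):
    # The first value should never exceed the second.
    print(z, gaussian_tail(z), tail_bound(z))
```

The bound is crude for small z but becomes sharp as z grows, which is the regime |t − s| → 0 that matters in the proof.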

Non-Differentiability of Brownian Paths


The following result models some aspects of the behavior observed by Brown in his
experiments with suspended particles.
18.7.3 Theorem. Let Wt be a Brownian motion. Then there exists a set B of probability
one such that, for each ω ∈ B, the path t → Wt (ω) is nowhere differentiable.
Proof. We follow the argument in [15]. Let ω ∈ Ω. If t → W (t, ω) has a derivative at some
point s and if $a > |W'(s, \omega)|$, then for all n ≥ some n(ω, s, a)
\[ |W(t,\omega) - W(s,\omega)| \le a|s - t| \quad\text{whenever } |t - s| < 1/2^{n-1}. \tag{$\dagger$} \]

For a > 0, let Ea denote the set of all ω such that (†) holds for some s and for all n ≥ n(ω, s, a).
Thus if $\omega \notin E := \bigcup_{a \in \mathbb{Q}^+} E_a$, then W (·, ω) is nowhere differentiable. We show that there
exists a set Aa of probability zero such that Ea ⊆ Aa . Setting $A := \bigcup_{a \in \mathbb{Q}^+} A_a$ we then have
E ⊆ A, P (A) = 0, and W (·, ω) is nowhere differentiable for every ω ∈ Ac ⊆ E c , completing
the proof.
Set $t_{j,n} := j/2^n$ and

\[ M_{n,k} := \max_{1 \le j \le 3} \big|W(t_{k+j,n}) - W(t_{k+j-1,n})\big|. \]

The increments in the definition are independent and have the distribution of $W(1/2^n)$,
which is the same as that of $2^{-n/2} W(1)$, these being normally distributed with mean zero
and variance $1/2^n$. Thus if g denotes the standard normal density, then
\[ P(M_{n,k} \le \varepsilon) = \Big[P\big(|W(1)| \le 2^{n/2}\varepsilon\big)\Big]^3 = \bigg(\int_{-2^{n/2}\varepsilon}^{2^{n/2}\varepsilon} g(t)\,dt\bigg)^3 \le (2^{1+n/2}\varepsilon)^3. \]

Setting $M_n = \min_{0 \le k < n2^n} M_{n,k}$ we have

\[ P(M_n \le \varepsilon) \le \sum_{k=0}^{n2^n-1} P(M_{n,k} \le \varepsilon) \le n2^n (2^{1+n/2}\varepsilon)^3. \tag{$\ddagger$} \]

Now let ω ∈ Ea , so that (†) holds for some s and for all n ≥ n(ω, s, a). We assume that
s > 0. (A separate, one-sided argument may be given for the case s = 0.) Then, for each
sufficiently large n, there exists k ≥ 0 such that $t_{k+1,n} \le s < t_{k+2,n}$. It follows that, for
0 ≤ j ≤ 3, $|t_{k+j,n} - s| < 1/2^{n-1}$, hence, by (†), $|W(t_{k+j,n},\omega) - W(s,\omega)| \le a/2^{n-1}$. By the
triangle inequality, $M_{n,k}(\omega) \le a/2^{n-2}$. Taking n > s we have $k < k+1 \le 2^n s \le n2^n$, hence
$M_n(\omega) \le a/2^{n-2}$. We have shown that
\[ E_a \subseteq A_a := \liminf_n A_n, \quad\text{where } A_n := \{M_n \le a/2^{n-2}\}. \]

By (‡), $P(A_n) \le n2^n (2^{1+n/2} a/2^{n-2})^3 \to 0$. It follows that P (Aa ) = 0, completing the
proof.

Variation of Brownian Paths


A useful way to quantify the volatile behavior of Brownian motion is by the variation of
its paths, defined as follows: Let [a, b] be a fixed interval and let
P = {a = t0 < t1 < · · · < tn = b}
be a partition of [a, b]. For p > 0, define the pth variation of W over P as the random
variable
\[ V_{\mathcal{P}}^{(p)} = \sum_{j=1}^{n} |\Delta W(t_j)|^p, \quad \Delta W(t_j) := W(t_j) - W(t_{j-1}). \]

A path t ↦ Wt (ω) is said to have bounded (unbounded) pth variation on [a, b] if the
quantities $V_{\mathcal{P}}^{(p)}(\omega)$, taken over all partitions P, form a bounded (unbounded) set of real
numbers. By 18.7.3 and 5.5.8 we have
18.7.4 Proposition. With probability one, the paths of Brownian motion have unbounded
first variation on every interval [a, b].
It may be shown that the paths of Brownian motion have unbounded pth variation for all
p ≤ 2 [47]. This state of affairs is partially redeemed by the following important result.
18.7.5 Theorem. $\lim_{\|\mathcal{P}\| \to 0} V_{\mathcal{P}}^{(2)} = b - a$ in $L^2(P)$.
Proof. Given a partition P as above, define
\[ A_{\mathcal{P}} = V_{\mathcal{P}}^{(2)}(W) - (b - a) = \sum_{j=1}^{n} D_j, \quad D_j := (\Delta W_{t_j})^2 - \Delta t_j. \]

By independent increments, for j ≠ k we have E(Dj Dk ) = (E Dj )(E Dk ), which equals zero
since $\Delta W_{t_k}$ has variance $\Delta t_k$. Therefore,
\[ E(A_{\mathcal{P}}^2) = \sum_{j,k} E(D_j D_k) = \sum_{j=1}^{n} E D_j^2 = \sum_{j=1}^{n} E(Z_j^2 - 1)^2 (\Delta t_j)^2, \quad Z_j := \frac{\Delta W_{t_j}}{\sqrt{\Delta t_j}}. \tag{18.15} \]

Since Zj is normal with mean zero and variance one, the quantity $c := E(Z_j^2 - 1)^2$ is finite,
as may be verified by expressing c as an integral, using the standard normal density. We
now have
\[ E(A_{\mathcal{P}}^2) \le c\,\|\mathcal{P}\| \sum_{j=1}^{n} \Delta t_j = c\,\|\mathcal{P}\|\,(b - a). \]

Letting $\|\mathcal{P}\| \to 0$ forces $E(A_{\mathcal{P}}^2) \to 0$, which is the assertion of the theorem.
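
The contrast between first, second, and higher variation can be observed on simulated increments. The sketch below assumes uniform partitions of [a, b] and independent normal increments sampled with Python's standard generator; the names `variation` and `brownian_increments` are illustrative, and a simulation can only suggest, not prove, the limiting behavior.

```python
import math
import random

def variation(increments, p):
    """p-th variation over a partition: sum of |Delta W(t_j)|^p."""
    return sum(abs(d) ** p for d in increments)

def brownian_increments(a, b, n, rng):
    """Independent N(0, dt) increments over a uniform partition of [a, b]."""
    dt = (b - a) / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

rng = random.Random(1)                   # fixed seed, only for reproducibility
a, b = 0.0, 2.0
for n in (100, 10_000, 100_000):
    incs = brownian_increments(a, b, n, rng)
    # As the mesh shrinks: p = 1 grows, p = 2 settles near b - a, p = 3 shrinks.
    print(n, variation(incs, 1), variation(incs, 2), variation(incs, 3))
```

For a uniform partition the estimate in the proof gives a mean square error of order (b − a)²/n for the second variation, which is consistent with the slow stabilization seen in the output.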


18.7.6 Corollary. For p > 2, $\lim_{\|\mathcal{P}\| \to 0} V_{\mathcal{P}}^{(p)} = 0$ a.s.
Proof. By the theorem, for any sequence of partitions with mesh tending to zero there exists
a subsequence (Pn ) such that $V_{\mathcal{P}_n}^{(2)} \to b - a$ a.s. The inequality
\[ V_{\mathcal{P}_n}^{(p)} = \sum_{j} |\Delta W_j|^{p-2} |\Delta W_j|^2 \le \max_j |W_{t_j} - W_{t_{j-1}}|^{p-2}\, V_{\mathcal{P}_n}^{(2)} \]

and the uniform continuity of the paths of W on [a, b] imply that $\lim_n V_{\mathcal{P}_n}^{(p)} = 0$ a.s.
The $L^2$ limit $\lim_{\|\mathcal{P}\| \to 0} V_{\mathcal{P}}^{(2)}$ is called the quadratic variation of Brownian motion on the
interval [a, b]. That Brownian motion has nonzero quadratic variation on any interval is a key
property of Brownian motion that accounts for some of the differences between stochastic
calculus, discussed below, and classical calculus.

Brownian Motion as a Martingale


The definitions of discrete-time martingales carry over in a natural way to the continuous
case: Let (Ω, F, (Ft ), P ) be a filtered probability space and let (Xt ) be an L1 process adapted
to (Ft ). Then (Xt ) is said to be a
• supermartingale if Xs ≥ E(Xt | Fs ) for all 0 ≤ s < t,
• submartingale if Xs ≤ E(Xt | Fs ) for all 0 ≤ s < t,
• martingale if Xs = E(Xt | Fs ) for all 0 ≤ s < t.

The continuous-time analogs of 18.5.2, 18.5.3, and 18.5.4 hold and are proved as before.
A martingale convergence theorem for continuous time is established below.
The following examples are taken relative to the natural (Brownian) filtration FW .
18.7.7 Examples. (a) Brownian motion (Wt ) is a martingale. Indeed, since Wt − Ws is
independent of $\mathcal{F}_s^W$ for all s ≤ t, $E(W_t - W_s \mid \mathcal{F}_s^W) = E(W_t - W_s) = 0$.

(b) The process $(W_t^2 - t)_{t \ge 0}$ is a martingale: For 0 ≤ s ≤ t write

\[ W_t^2 = [(W_t - W_s) + W_s]^2 = (W_t - W_s)^2 + 2W_s(W_t - W_s) + W_s^2. \]

Taking conditional expectations and using linearity and the factor and independence properties
yields

\[ E(W_t^2 \mid \mathcal{F}_s^W) = E(W_t - W_s)^2 + 2W_s E(W_t - W_s) + W_s^2 = t - s + W_s^2. \]


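(c) The exponential process $\big(\exp(aW_t - a^2 t/2)\big)_{t \ge 0}$ is a martingale. This follows from the
calculation (for t > s)
\[ E(e^{aW_t} \mid \mathcal{F}_s^W) = e^{aW_s} E(e^{aW_t - aW_s} \mid \mathcal{F}_s^W) = e^{aW_s} E(e^{a(W_t - W_s)}) = e^{aW_s + a^2(t-s)/2}. \]

The last equality is seen as follows: Set $\sigma = \sqrt{t - s}$, so that Wt − Ws is normal with mean zero and variance $\sigma^2$. Then
\begin{align*}
E(e^{a(W_t - W_s)}) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big(ax - \frac{x^2}{2\sigma^2}\Big)\,dx \\
&= \frac{\exp(a^2\sigma^2/2)}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\bigg(-\frac{1}{2}\Big(\frac{x - a\sigma^2}{\sigma}\Big)^2\bigg)\,dx \\
&= \exp(a^2\sigma^2/2). \qquad ♦
\end{align*}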
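
A martingale has constant expectation, so the three processes in 18.7.7 should satisfy E Wt = 0, E(Wt² − t) = 0, and E exp(aWt − a²t/2) = 1 for every t. The following Monte Carlo sketch checks only this consequence, not the full conditional statement; the function name `martingale_means`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random

def martingale_means(t, a, n_samples, rng):
    """Estimate E[W_t], E[W_t^2 - t] and E[exp(a W_t - a^2 t / 2)] by simulation."""
    s1 = s2 = s3 = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, math.sqrt(t))   # W_t is normal with mean 0, variance t
        s1 += w
        s2 += w * w - t
        s3 += math.exp(a * w - 0.5 * a * a * t)
    return s1 / n_samples, s2 / n_samples, s3 / n_samples

rng = random.Random(7)                     # fixed seed, only for reproducibility
for t in (0.5, 1.0, 2.0):
    # Expect values close to (0, 0, 1), up to sampling error.
    print(t, martingale_means(t, a=0.5, n_samples=100_000, rng=rng))
```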

Here is a continuous time analog of the martingale convergence theorem (18.5.10). As
with the latter, Doob’s notion of upcrossing figures prominently in the proof.
18.7.8 Martingale Convergence Theorem (Doob). Let (Xt )t≥0 be a submartingale on a
filtered probability space (Ω, F, (Ft )t≥0 , P ) such that supt≥0 kXt k1 < ∞. Then Xt converges
almost surely to an L1 random variable X∞ as t → ∞. If, additionally, (Xt ) is uniformly
integrable, then the convergence is in L1 .

Proof. For fixed m, let R := {r1 < · · · < rn } be a finite sequence of rationals contained in
[0, m] and let $U^R_{[a,b]}$ be the number of upcrossings of [a, b] by $X_{r_1}, \ldots, X_{r_n}$. By the upcrossing
lemma,
\[ E\,U^R_{[a,b]} \le \frac{1}{b-a}\big(E|X_m| + 2|a|\big). \]
Let $U^m_{[a,b]}$ denote the supremum of $U^R_{[a,b]}$ over all sets R. A sequence (Rk ) of such sets
increases to [0, m] ∩ Q, hence $U^{R_k}_{[a,b]} \uparrow U^m_{[a,b]}$. By the monotone convergence theorem,

\[ E\,U^m_{[a,b]} \le \frac{1}{b-a}\big(E|X_m| + 2|a|\big) \le \frac{1}{b-a}\Big(\sup_t \|X_t\|_1 + 2|a|\Big) < \infty. \]
Now let $U_{[a,b]} = \sup_m U^m_{[a,b]}$. By the monotone convergence theorem again,

\[ E\,U_{[a,b]} \le \frac{1}{b-a}\Big(\sup_t \|X_t\|_1 + 2|a|\Big). \]
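In particular, $U_{[a,b]}$ is finite a.s. Set
\[ S_{a,b} := \Big\{\omega : \liminf_{t\to\infty} X_t(\omega) < a < b < \limsup_{t\to\infty} X_t(\omega)\Big\} \quad\text{and}\quad S := \bigcup_{a,b \in \mathbb{Q},\, a<b} S_{a,b}. \]

For ω ∈ Sa,b , there exists a strictly increasing sequence (rn ) in Q+ tending to ∞ such that

\[ X_{r_{2n-1}}(\omega) < a < b < X_{r_{2n}}(\omega), \]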

implying that U[a,b] (ω) = ∞. Since U[a,b] is finite a.s., P (Sa,b ) = 0 and so P (S) = 0. One
argues as in the discrete case that X∞ (ω) := limt→∞ Xt (ω) exists in R for each ω ∈ S c .
Setting X∞ = 0 on S and letting t → ∞ through Q, we may apply Fatou’s lemma as in the
discrete case to obtain X∞ ∈ L1 (Ω, F∞ , P ).
The continuous time analogs of 18.5.11–18.5.14 are valid. The proofs follow from 18.7.8
in much the same way as before. The continuous time notion of reversed martingale may be
formulated as in the discrete case, and a martingale convergence theorem may be proved in
this setting.

18.8 Stochastic Integration


A subject that has attracted a great deal of interest over recent years is an extension of
classical calculus called stochastic calculus, loosely described as the calculus of continuous
time processes. The basis for this calculus is the Ito integral, which we construct in this
section. We assume throughout that W is a Brownian motion on a filtered probability space
(Ω, F, (Ft ), P ).

The Ito Integral of a Step Process


An Ito step process on a subinterval [a, b] of [0, ∞) is a process of the form
\[ f_t(\omega) = f(t,\omega) = \sum_{j=1}^{n} \xi_{j-1}(\omega)\, 1_{[t_{j-1},t_j)}(t), \quad t_0 = a < t_1 < \cdots < t_n = b, \tag{18.16} \]

where $\xi_{j-1}$ is $\mathcal{F}_{t_{j-1}}$-measurable and $\xi_{j-1} \in L^2(\Omega)$. The Ito integral of f on [a, b] is defined
by
\[ I_a^b(f) = \int_a^b f(t)\,dW(t) := \sum_{j=1}^{n} \xi_{j-1}\,\Delta W(t_j), \quad\text{where } \Delta W(t_j) := W(t_j) - W(t_{j-1}). \]

Note that by refining the partition in (18.16) one still has an Ito step process, and the Ito
integral is unchanged. For example, if a point s is inserted into (a, t1 ), then
\[ f(t,\omega) = \xi_0(\omega) 1_{[a,s)}(t) + \xi_0(\omega) 1_{[s,t_1)}(t) + \sum_{j=2}^{n} \xi_{j-1}(\omega) 1_{[t_{j-1},t_j)}(t). \]

In particular, if g(t, ω) is another Ito step process, then, by taking the common refinement
of the partitions, one may assume that f and g are defined on the same partition. It follows
that the collection S[a, b] of all Ito step processes on [a, b] is a linear space and $I_a^b$ is a linear
map on S[a, b]. Moreover, by Fubini’s theorem,
\[ \|f\|^2_{L^2([a,b]\times\Omega)} = \int_{[a,b]} E(f_t^2)\,dt = \sum_{j=1}^{n} (t_j - t_{j-1})\,E(\xi_{j-1}^2) < \infty, \tag{18.17} \]

hence $S[a,b] \subseteq L^2([a,b]\times\Omega)$.

The following proposition shows that $I_a^b : S[a,b] \to L^2(\Omega)$ is an isometry.
18.8.1 Proposition. Let f be as in (18.16). Then $I_a^b(f)$ has mean zero and variance
\[ \big\|I_a^b(f)\big\|^2_{L^2(\Omega)} = \|f\|^2_{L^2([a,b]\times\Omega)}. \tag{18.18} \]

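Proof. For the first assertion, we have

\[ E\big(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) = \xi_{j-1} E\big(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) = \xi_{j-1} E\big(\Delta W(t_j)\big) = 0. \]

Here we have used the independence and factor properties of conditional expectation and
the fact that Brownian increments have mean zero. Taking expectations yields
\[ E\,I_a^b(f) = \sum_{j=1}^{n} E\big(\xi_{j-1}\Delta W(t_j)\big) = 0. \]

To verify (18.18), note that

\[ I_a^b(f)^2 = \sum_{i \ne j} \xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j) + \sum_{j=1}^{n} \xi_{j-1}^2 [\Delta W(t_j)]^2, \]

hence
\[ E[I_a^b(f)]^2 = \sum_{i \ne j} E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j)\big) + \sum_{j=1}^{n} E\big(\xi_{j-1}^2 [\Delta W(t_j)]^2\big). \tag{$\alpha$} \]

If i < j, then, by conditioning and using the factor and independence properties again, we
have
\begin{align*}
E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j)\big) &= E\big[E\big(\xi_{i-1}\xi_{j-1}\Delta W(t_i)\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E\big[\xi_{i-1}\xi_{j-1}\Delta W(t_i)\, E\big(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E\big[\xi_{i-1}\xi_{j-1}\Delta W(t_i)\big]\, E\big[\Delta W(t_j)\big] \\
&= 0. \tag{$\beta$}
\end{align*}

Similarly,
\begin{align*}
E\big(\xi_{j-1}^2 [\Delta W(t_j)]^2\big) = E\big[E\big(\xi_{j-1}^2 [\Delta W(t_j)]^2 \mid \mathcal{F}_{t_{j-1}}\big)\big] &= E\big[\xi_{j-1}^2\, E\big([\Delta W(t_j)]^2 \mid \mathcal{F}_{t_{j-1}}\big)\big] \\
&= E(\xi_{j-1}^2)(t_j - t_{j-1}). \tag{$\gamma$}
\end{align*}

Thus from (α), (β), (γ)

\[ E[I_a^b(f)]^2 = \sum_{j=1}^{n} E(\xi_{j-1}^2)(t_j - t_{j-1}) = \|f\|^2_{L^2([a,b]\times\Omega)}, \]
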

the last equality by (18.17).
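
The two assertions of 18.8.1 can be illustrated by simulation for one concrete step process. The sketch below assumes a = 0 and takes ξj−1 = W (tj−1 ), which is Ftj−1 -measurable and square integrable; this choice, the partition, and the names `ito_step_integral` and `simulate` are illustrative and not part of the text.

```python
import math
import random

def ito_step_integral(xis, w):
    """Ito integral of a step process: sum of xi_{j-1} * (W(t_j) - W(t_{j-1}))."""
    return sum(xi * (w[j + 1] - w[j]) for j, xi in enumerate(xis))

def simulate(partition, n_samples, rng):
    """Monte Carlo estimates of E[I(f)] and E[I(f)^2] for xi_{j-1} = W(t_{j-1})."""
    total = total_sq = 0.0
    for _ in range(n_samples):
        w = [0.0]                                   # W at the partition points
        for t_prev, t_next in zip(partition, partition[1:]):
            w.append(w[-1] + rng.gauss(0.0, math.sqrt(t_next - t_prev)))
        integral = ito_step_integral(w[:-1], w)     # xi_{j-1} = W(t_{j-1})
        total += integral
        total_sq += integral * integral
    return total / n_samples, total_sq / n_samples

partition = [0.0, 0.25, 0.5, 0.75, 1.0]
# Right side of (18.18): sum of (t_j - t_{j-1}) * E(xi_{j-1}^2), using E W(t)^2 = t.
norm_sq = sum((t1 - t0) * t0 for t0, t1 in zip(partition, partition[1:]))
mean, second_moment = simulate(partition, 100_000, random.Random(11))
print(mean, second_moment, norm_sq)
```

Note that the integrand is evaluated at the left endpoint of each subinterval; this is what makes each product ξj−1 ∆W (tj ) have mean zero in the proof.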

The General Ito Integral

By (18.18), the mapping $I_a^b : S[a,b] \to L^2(\Omega)$ extends to an isometry on the closure
cl S[a, b] of S[a, b] in $L^2([a,b]\times\Omega)$. This defines the Ito integral for functions f ∈ cl S[a, b]:
\[ I_a^b(f) = \int_a^b f\,dW = \int_a^b f(t)\,dW(t) := \lim_n \int_a^b f_n(t)\,dW(t), \quad f \in \operatorname{cl} S[a,b], \]

where (fn ) is a sequence in S[a, b] such that fn → f in $L^2([a,b]\times\Omega)$, that is,
\[ \int_{[a,b]} E[f_n(t) - f(t)]^2\,dt \to 0. \]

By the isometric property,

\[ E\bigg(\int_a^b f\,dW\bigg)^2 = E[I_a^b(f)]^2 = \int_a^b E[f^2(t)]\,dt, \tag{18.19} \]

hence, by the polarization identity,

\[ E\bigg[\bigg(\int_a^b f\,dW\bigg)\bigg(\int_a^b g\,dW\bigg)\bigg] = \int_a^b E[f(t)g(t)]\,dt. \tag{18.20} \]

18.8.2 Proposition. Let f ∈ cl S[a, b] and a < c < b. Then

\[ \int_a^b f(t)\,dW(t) = \int_a^c f(t)\,dW(t) + \int_c^b f(t)\,dW(t). \]

Proof. Assume first that f ∈ S[a, b], as given in (18.16). If $c \in [t_{k-1}, t_k)$, then
\begin{align*}
\int_a^b f(t)\,dW(t) &= \sum_{j<k} \xi_{j-1}\Delta W(t_j) + \xi_{k-1}[W(c) - W(t_{k-1})] \\
&\quad + \xi_{k-1}[W(t_k) - W(c)] + \sum_{j>k} \xi_{j-1}\Delta W(t_j) \\
&= \int_a^c f(t)\,dW(t) + \int_c^b f(t)\,dW(t).
\end{align*}
In the general case, let fn ∈ S[a, b] such that $\int_{[a,b]} E(f_n(t) - f(t))^2\,dt \to 0$. Then clearly
the restrictions $f_n|_{[a,c]}$ and $f_n|_{[c,b]}$ are Ito step processes, and both $\int_{[a,c]} E(f_n(t) - f(t))^2\,dt \to 0$ and
$\int_{[c,b]} E(f_n(t) - f(t))^2\,dt \to 0$, hence
\[ \int_a^b f\,dW = \lim_n \int_a^b f_n\,dW = \lim_n \int_a^c f_n\,dW + \lim_n \int_c^b f_n\,dW = \int_a^c f\,dW + \int_c^b f\,dW. \]


The following proposition shows that in certain circumstances the Ito integral is a limit
of Riemann-Stieltjes sums.
18.8.3 Proposition. Let f ∈ L2 ([a, b] × Ω) such that ft is Ft -measurable and the mapping
(s, t) → E(fs ft ) is continuous. Then f ∈ cl S[a, b] and
\[ \int_a^b f(t)\,dW(t) = \lim_{\|\mathcal{P}\| \to 0} \sum_{j=1}^{n} f(t_{j-1})\,\Delta W(t_j), \quad\text{where } \mathcal{P} := \{a = t_0 < t_1 < \cdots < t_n = b\}. \]

Proof. Define an Ito step process fP by fP (t, ω) = f (tj−1 , ω) (tj−1 < t ≤ tj ), where the
(tj−1 , tj ] are the intervals of the partition P. Let (Pn ) be any sequence of partitions with
kPn k → 0 and set fn := fPn . From the calculation

\[ E[f(t) - f(s)]^2 = E f(t)^2 - 2E[f(s)f(t)] + E f(s)^2 \]

we see that $\lim_{s \to t} E[f(t) - f(s)]^2 = 0$. Since kPn k → 0 it follows that

\[ \lim_n E[f(t) - f_n(t)]^2 = 0. \]

From the inequality

\[ |f(t) - f_n(t)|^2 \le 2|f(t)|^2 + 2|f_n(t)|^2 \]
we have
\[ E|f(t) - f_n(t)|^2 \le 2E|f(t)|^2 + 2E|f_n(t)|^2 \le 4 \sup_{a \le s \le b} E|f(s)|^2. \]

By continuity the supremum is finite, so we may apply the dominated convergence theorem
to conclude that
\[ \lim_n \int_a^b E|f(t) - f_n(t)|^2\,dt = 0, \]
that is, fn → f in $L^2([a,b]\times\Omega)$. Therefore, f ∈ cl S[a, b] and $\int_a^b f_n\,dW \to \int_a^b f\,dW$. Since
the sequence (Pn ) was arbitrary, the conclusion follows.
18.8.4 Example. Let P = {a = t0 < t1 < · · · < tn = b} be an arbitrary partition of [a, b].
By direct expansion
\[ \sum_{j=1}^{n} W(t_{j-1})\,\Delta W_{t_j} = \tfrac{1}{2}\Big[W^2(b) - W^2(a) - V_{\mathcal{P}}^{(2)}(W)\Big]. \]

It follows from 18.8.3 and 18.7.5 that

\[ \int_a^b W(t)\,dW(t) = \frac{W^2(b) - W^2(a)}{2} - \frac{b - a}{2}. \qquad ♦ \]
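
The algebraic identity in 18.8.4 holds path by path for any partition, and on a fine partition the left-endpoint sums should be close to the stated value of the integral. The sketch below checks both on one simulated path; the uniform partition, the seed, and the helper names `left_point_sum` and `quadratic_variation` are illustrative assumptions.

```python
import math
import random

def left_point_sum(w):
    """Sum of W(t_{j-1}) * (W(t_j) - W(t_{j-1})) over the partition."""
    return sum(w[j] * (w[j + 1] - w[j]) for j in range(len(w) - 1))

def quadratic_variation(w):
    """Second variation of the sampled path over the partition."""
    return sum((w[j + 1] - w[j]) ** 2 for j in range(len(w) - 1))

rng = random.Random(5)                     # fixed seed, only for reproducibility
a, b, n = 0.0, 1.0, 10_000
dt = (b - a) / n
w = [0.0]                                  # sampled values W(t_0), ..., W(t_n)
for _ in range(n):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

riemann_sum = left_point_sum(w)
identity_rhs = 0.5 * (w[-1] ** 2 - w[0] ** 2 - quadratic_variation(w))
ito_value = 0.5 * (w[-1] ** 2 - w[0] ** 2) - 0.5 * (b - a)
print(riemann_sum, identity_rhs, ito_value)
```

The extra term −(b − a)/2, absent from the classical formula for ∫ x dx, comes entirely from the nonzero quadratic variation of the path.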

The Ito Integral as a Martingale


18.8.5 Theorem. Let f ∈ cl S[a, b]. Then the process $X_t = \int_a^t f\,dW$ is a martingale with
respect to the filtration (Ft ).
Proof. Let b ≥ t > s ≥ a. Since $X_t - X_s = \int_s^t f\,dW$, we need to show that
\[ E\bigg(\int_s^t f\,dW \,\Big|\, \mathcal{F}_s\bigg) = 0. \]

Assume first that f is an Ito step process, say


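\[ f_u(\omega) = \sum_{j=1}^{n} \xi_{j-1}(\omega)\, 1_{[t_{j-1},t_j)}(u), \quad s = t_0 < t_1 < \cdots < t_n = t. \]

Then
\begin{align*}
E\bigg(\int_s^t f\,dW \,\Big|\, \mathcal{F}_s\bigg) &= \sum_{j=1}^{n} E\big(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_s\big) \\
&= \sum_{j=1}^{n} E\big[E\big(\xi_{j-1}\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) \mid \mathcal{F}_s\big] \\
&= \sum_{j=1}^{n} E\big[\xi_{j-1}\, E\big(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}\big) \mid \mathcal{F}_s\big].
\end{align*}

The last sum is zero since, as noted earlier, $E(\Delta W(t_j) \mid \mathcal{F}_{t_{j-1}}) = 0$.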
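For a general f , let fn ∈ S[s, t] such that $\int_s^t E|f_n(u) - f(u)|^2\,du \to 0$. By the first
paragraph,
\[ E\bigg(\int_s^t f\,dW \,\Big|\, \mathcal{F}_s\bigg) = E\bigg(\int_s^t (f - f_n)\,dW \,\Big|\, \mathcal{F}_s\bigg). \tag{$\dagger$} \]

Now, by Jensen’s inequality,

\[ \bigg[E\bigg(\int_s^t (f - f_n)\,dW \,\Big|\, \mathcal{F}_s\bigg)\bigg]^2 \le E\bigg(\bigg(\int_s^t (f - f_n)\,dW\bigg)^2 \,\Big|\, \mathcal{F}_s\bigg). \]

Taking expectations we have

\[ E\bigg[E\bigg(\int_s^t (f - f_n)\,dW \,\Big|\, \mathcal{F}_s\bigg)\bigg]^2 \le E\bigg(\int_s^t (f - f_n)\,dW\bigg)^2 = \int_s^t E|f(u) - f_n(u)|^2\,du, \]

the last equality by (18.19). Thus

\[ E\bigg(\int_s^t (f - f_n)\,dW \,\Big|\, \mathcal{F}_s\bigg) \to 0 \quad\text{in } L^2, \]

hence a subsequence converges a.s. to zero. It follows from (†) that $E\big(\int_s^t f\,dW \mid \mathcal{F}_s\big) = 0$ as
required.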

It may be shown that almost all paths of the integral process X are continuous. (See, for
example, [29].)

18.9 An Application to Finance


In this section we outline the argument that leads to the Black-Scholes formula for the
price of a call option. For details the reader is referred to [24] or [43].

The Stock Price Process


Let W be a Brownian motion on a filtered probability space (Ω, F, (Ft ), P ), where we
take (Ft ) to be the natural filtration for W . In the Black-Scholes-Merton model, the price
(in dollars) of a single share of a stock at time t is assumed to be a random variable St
satisfying the stochastic integral equation
\[ S_t = S_0 + \sigma \int_0^t S(s)\,dW(s) + \mu \int_0^t S(s)\,ds. \tag{18.21} \]

Here σ and µ are constants called, respectively, the volatility and drift of the stock. The
integral equation is frequently written as a stochastic differential equation
\[ dS = \sigma S\,dW + \mu S\,dt \quad\text{or}\quad \frac{dS}{S} = \sigma\,dW + \mu\,dt. \tag{18.22} \]
The latter form expresses the fact that the relative change in the stock price has a deterministic
part µ dt, which accounts for the general trend of the stock, and a component σ dW , which
reflects the random nature of the stock.
The solution of (18.21) may be shown to be the geometric Brownian motion process
\[ S_t = S_0 \exp\Big(\sigma W_t + \big(\mu - \tfrac{1}{2}\sigma^2\big)t\Big). \tag{18.23} \]
Note that because of the relationship between St and Wt , Ft = σ(Ss : 0 ≤ s ≤ t). Thus the
Brownian filtration (Ft ) reveals stock price information. We show how these facts lead to a
formula for the price of an option.
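
One way to make the relation between (18.22) and (18.23) concrete is to step the differential form forward along a sampled Brownian path and compare the result with the closed form evaluated on the same path. The sketch below uses the Euler scheme S ← S + σS ∆W + µS ∆t, which is a standard discretization swapped in here for illustration and not a method taken from the text; the parameter values and function names are likewise assumptions.

```python
import math
import random

def gbm_exact(S0, mu, sigma, t, w_t):
    """Closed form (18.23) evaluated at time t for a given value of W_t."""
    return S0 * math.exp(sigma * w_t + (mu - 0.5 * sigma ** 2) * t)

def gbm_euler(S0, mu, sigma, T, increments):
    """Euler discretization of dS = sigma*S dW + mu*S dt on a uniform grid."""
    dt = T / len(increments)
    s = S0
    for dw in increments:
        s += sigma * s * dw + mu * s * dt
    return s

rng = random.Random(3)                     # fixed seed, only for reproducibility
S0, mu, sigma, T, n = 100.0, 0.1, 0.2, 1.0, 10_000
increments = [rng.gauss(0.0, math.sqrt(T / n)) for _ in range(n)]
w_T = sum(increments)                      # W_T on the same sampled path

euler_value = gbm_euler(S0, mu, sigma, T, increments)
exact_value = gbm_exact(S0, mu, sigma, T, w_T)
print(euler_value, exact_value)
```

The term −σ²/2 in the exponent of (18.23) is what keeps the two values close; dropping it would leave a systematic gap that does not shrink as the grid is refined.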

Self-Financing Portfolios
The key to determining the value of an option is the construction of a self-financing
portfolio based on the stock and a risk-free bond. Assuming that the bond earns interest
at a continuously compounded annual rate r and that the initial value of the bond is one
dollar, the value of the bond at time t is seen to be $B_t := e^{rt}$. Now let φ and θ be stochastic
processes adapted to the filtration (Ft ), these representing, respectively, the number of dollar
bonds and number of shares of the stock held at time t. The value of the portfolio at time t
is the random variable
Vt = φt Bt + θt St , 0 ≤ t ≤ T,
where V0 is the initial investment in the portfolio, assumed to be a constant. The portfolio
is said to be self-financing if
dV = φ dB + θ dS, (18.24)
where the differentials represent small changes. The equation may be best understood by
considering a discrete version at times t0 = 0 < t1 < t2 < · · · < tn = T . At time tj , the
value of the portfolio before the price Sj is known is

φj Bj−1 + θj Sj−1 ,

where we write Sj for Stj , etc. After Sj becomes known and the new bond value Bj is noted,
the portfolio has value
Vj = φj Bj + θj Sj .
At this time, stocks and bonds may be bought and sold (based on the information provided
by Ftj ). For the portfolio to be self-financing, this rebalancing must not change the current
value of the portfolio. Thus the new values φj+1 and θj+1 must satisfy

φj+1 Bj + θj+1 Sj = φj Bj + θj Sj .
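It follows that, with $\Delta V_j := V_{j+1} - V_j$, $\Delta B_j := B_{j+1} - B_j$, and $\Delta S_j := S_{j+1} - S_j$,

\begin{align*}
\Delta V_j &= \varphi_{j+1} B_{j+1} + \theta_{j+1} S_{j+1} - (\varphi_j B_j + \theta_j S_j) \\
&= \varphi_{j+1} B_{j+1} + \theta_{j+1} S_{j+1} - (\varphi_{j+1} B_j + \theta_{j+1} S_j) \\
&= \varphi_{j+1}\,\Delta B_j + \theta_{j+1}\,\Delta S_j,
\end{align*}
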

which is the discrete version of (18.24).
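
The discrete self-financing condition can be checked on a small numerical example. In the sketch below the bond and stock prices and the holdings are made-up numbers, and the helper `rebalance` is a hypothetical function that chooses the new bond holding so that the portfolio value is unchanged at the moment of rebalancing.

```python
def rebalance(phi, theta, B, S, new_theta):
    """Pick the new bond holding so that the portfolio value at prices (B, S)
    is the same before and after changing the stock holding to new_theta."""
    value = phi * B + theta * S
    new_phi = (value - new_theta * S) / B
    return new_phi, new_theta

# Illustrative prices at times t_j and t_{j+1} (assumed numbers).
B_j, B_next = 1.01, 1.0201
S_j, S_next = 104.0, 99.0

phi_j, theta_j = 50.0, 0.5                  # holdings in force up to time t_j
V_j = phi_j * B_j + theta_j * S_j

# Rebalance at time t_j without adding or withdrawing money.
phi_next, theta_next = rebalance(phi_j, theta_j, B_j, S_j, new_theta=0.8)
V_next = phi_next * B_next + theta_next * S_next

change = V_next - V_j
predicted = phi_next * (B_next - B_j) + theta_next * (S_next - S_j)
print(change, predicted)
```

The two printed numbers agree up to rounding: the change in value comes only from price movements of the assets held, never from the rebalancing itself.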

Call Options
A call option based on a stock is a contract made between two parties, the buyer (holder)
of the option and the seller (writer) of the option. The contract requires the writer to offer to
sell the stock to the holder at a future time T for a predetermined amount K. At this time,
the holder may or may not decide to exercise the option. Thus the payoff for the holder is
$(S_T - K)^+$. A self-financing portfolio may be used by the writer as a hedging strategy, that
is, an investment in shares of the stock and units of the bond devised to exactly cover the
writer’s obligation at maturity T . In this case, the portfolio is said to replicate the option.
The writer initiates the portfolio with an amount V0 , the price of the option (cost to the
holder). Here, V0 is chosen so that $V_T = (S_T - K)^+$, which is the cost to the writer of the
transaction. The law of one price (in an arbitrage-free market) then asserts that V0 is the
fair price of the option.

The Black-Scholes Option Price


To determine the fair price V0 of the option, one introduces a new probability measure
$P^*$ on (Ω, F), called the risk-neutral probability measure, defined by
\[ dP^* = Z_T\,dP, \quad\text{where } Z_T := \exp\big(-\alpha W_T - \tfrac{1}{2}\alpha^2 T\big) \text{ and } \alpha := \frac{\mu - r}{\sigma}. \]
The corresponding expectation operator is denoted by $E^*$. It may be shown that the process
\[ W_t^* := W_t + \alpha t, \quad 0 \le t \le T, \]
is a Brownian motion under $P^*$ on the interval [0, T ]. By (18.23),
\[ S_t = S_0 \exp\Big(\sigma W_t^* + \big(r - \tfrac{1}{2}\sigma^2\big)t\Big), \quad 0 \le t \le T. \tag{18.25} \]

Now form the discounted price process $\widetilde{S}$, given by

\[ \widetilde{S}_t := e^{-rt} S_t = S_0 \exp\big(\sigma W_t^* - \tfrac{1}{2}\sigma^2 t\big), \quad 0 \le t \le T. \]

By 18.7.7(c), $\widetilde{S}$ is a $P^*$-martingale. One may show, as a consequence, that the discounted
value process $\widetilde{V}$, given by
\[ \widetilde{V}_t := e^{-rt} V_t, \]
is also a $P^*$-martingale. This implies the key fact that $E^* \widetilde{V}_t$ is constant in t. In particular,

\[ V_0 = E^* \widetilde{V}_0 = E^* \widetilde{V}_T = e^{-rT} E^* V_T. \]

Since the portfolio value VT is assumed to be the payoff to the holder of the option,

\[ V_0 = e^{-rT} E^* (S_T - K)^+. \tag{18.26} \]

Now use (18.25) to write

\[ S_T = S_0 \exp\Big(\sigma\sqrt{T}\,Y + \big(r - \tfrac{1}{2}\sigma^2\big)T\Big), \quad Y := T^{-1/2} W_T^*. \]

Since Y is a standard normal random variable under $P^*$,

\[ E^*(S_T - K)^+ = \int_{-\infty}^{\infty} \Big(S_0 \exp\big\{\sigma\sqrt{T}\,y + \big(r - \tfrac{1}{2}\sigma^2\big)T\big\} - K\Big)^+ \varphi(y)\,dy, \tag{18.27} \]

where ϕ is the standard normal density. From (18.26) and (18.27) we see that the price of
the option is given by the formula
\[ V_0 = e^{-rT} \int_{-\infty}^{\infty} \Big(S_0 \exp\big\{\sigma\sqrt{T}\,y + \big(r - \tfrac{1}{2}\sigma^2\big)T\big\} - K\Big)^+ \varphi(y)\,dy. \tag{18.28} \]

A more succinct formula for the option price may be obtained as follows. Define

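\[ d_1 := \frac{\ln(S_0/K) + \big(r + \tfrac{1}{2}\sigma^2\big)T}{\sigma\sqrt{T}} \quad\text{and}\quad d_2 := \frac{\ln(S_0/K) + \big(r - \tfrac{1}{2}\sigma^2\big)T}{\sigma\sqrt{T}} = d_1 - \sigma\sqrt{T}. \]
Since the integrand in (18.28) is zero when $y < -d_2$, we may write the integral as
\begin{align*}
& S_0 \int_{-d_2}^{\infty} \exp\big\{\sigma\sqrt{T}\,y + \big(r - \tfrac{1}{2}\sigma^2\big)T\big\}\,\varphi(y)\,dy - K \int_{-d_2}^{\infty} \varphi(y)\,dy \\
&\quad = S_0\, \frac{e^{(r - \sigma^2/2)T}}{\sqrt{2\pi}} \int_{-d_2}^{\infty} \exp\big\{-\tfrac{1}{2}y^2 + \sigma\sqrt{T}\,y\big\}\,dy - K\,[1 - \Phi(-d_2)] \\
&\quad = S_0\, e^{rT}\, \Phi(d_1) - K\,\Phi(d_2),
\end{align*}
where $\Phi(x) = \int_{-\infty}^{x} \varphi(y)\,dy$. Using (18.28), we finally arrive at the celebrated Black-Scholes
option pricing formula
\[ V_0 = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2). \]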
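
As a consistency check, the closed-form price can be compared with a direct numerical evaluation of the integral in (18.28). The sketch below is illustrative only: the function names, the midpoint rule, the truncation of the integral to [−10, 10], and the sample parameters are all assumptions made here, and Φ is computed from the error function in the standard way.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S0, K, r, sigma, T):
    """Closed-form call price V0 = S0*Phi(d1) - K*exp(-rT)*Phi(d2)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def call_price_by_integration(S0, K, r, sigma, T, n=20_000, y_max=10.0):
    """Midpoint-rule approximation of (18.28), truncated to [-y_max, y_max]."""
    h = 2.0 * y_max / n
    total = 0.0
    for i in range(n):
        y = -y_max + (i + 0.5) * h
        s_T = S0 * math.exp(sigma * math.sqrt(T) * y + (r - 0.5 * sigma ** 2) * T)
        density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        total += max(s_T - K, 0.0) * density
    return math.exp(-r * T) * total * h

params = dict(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(black_scholes_call(**params), call_price_by_integration(**params))
```

Note that the drift µ does not appear among the inputs: after the change to the risk-neutral measure, only r and σ enter the price.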
Part IV

Appendices
Appendix A
Change of Variables Theorem

The goal of this appendix is to prove the following result.


A.0.1 Change of Variables Theorem. Let $U, V \subseteq \mathbb{R}^d$ be open and let ϕ : U → V be $C^1$
with $C^1$ inverse $\varphi^{-1} : V \to U$. If f : V → R is Lebesgue measurable and either f ≥ 0 or f is
integrable, then
\[ \int_V f(y)\,dy = \int_U (f \circ \varphi)(x)\,|J_\varphi(x)|\,dx, \tag{A.1} \]
where Jϕ is the Jacobian of ϕ on U .
We prove the theorem first for Borel functions. By the usual considerations, we may
assume that f ≥ 0. It then suffices to prove that
\[ \int_V f\,d\lambda^d \le \int_U (f \circ \varphi)\,|J_\varphi|\,d\lambda^d \tag{A.2} \]

for all Borel measurable functions f : V → [0, +∞]. Indeed, if this inequality holds for all f
and ϕ, then switching the roles of U and V we also have
\[ \int_U g\,d\lambda^d \le \int_V (g \circ \varphi^{-1})\,|J_{\varphi^{-1}}|\,d\lambda^d \]

for all Borel measurable g : U → [0, +∞]. Taking g = (f ◦ ϕ)|Jϕ | and recalling that
$J_\varphi J_{\varphi^{-1}} = 1$, we obtain the reverse of inequality (A.2). Finally, by the standard arguments, it
suffices to verify (A.2) for indicator functions f = 1B , where B ∈ B(V ). Then (A.2) reduces
to
\[ \lambda^d(B) \le \int_{\varphi^{-1}(B)} |J_\varphi|\,d\lambda^d, \quad B \in \mathcal{B}(V). \]

Taking B = ϕ(E) we obtain the equivalent statement

\[ \lambda^d\big(\varphi(E)\big) \le \int_E |J_\varphi|\,d\lambda^d, \quad E \in \mathcal{B}(U). \tag{A.3} \]

The proof of (A.3) is accomplished by a sequence of lemmas. The first treats the case of
a linear change of variable.
A.0.2 Lemma. If T : Rd → Rd is linear and nonsingular, then
λd (T (E)) = | det T |λd (E), E ∈ B(Rd ). (A.4)
Proof. Since T is a homeomorphism, T (E) ∈ B(Rd ), so the left side of (A.4) is defined.
Furthermore, if (A.4) holds for T1 and T2 , then
 
\[ \lambda^d\bigl(T_1T_2(E)\bigr) = |\det T_1|\,\lambda^d\bigl(T_2(E)\bigr) = |\det T_1|\,|\det T_2|\,\lambda^d(E) = |\det(T_1T_2)|\,\lambda^d(E). \]

Since T is a product of elementary linear transformations, we may therefore assume that T


is such a transformation. Thus we assume that the matrix of T is obtained from the identity
matrix by one of the following operations:


(a) Interchange of two rows.

(b) Multiplication of a row by a nonzero constant.

(c) Addition of one row to another.

To prove (A.4) in this setting, suppose first that $E = I_1 \times \cdots \times I_d$ is a bounded d-dimensional
interval. In case (a), $\det T = -1$ and $T(E)$ is the interval obtained from $E$ by
interchanging a pair of intervals Ii and Ij , hence (A.4) holds in this case. In (b), T (E) is
the interval obtained from E by multiplying one of the Ij by a nonzero constant a, hence
λd (T (E)) = |a|λd (E). Since | det T | = |a|, (A.4) holds in this case as well. For case (c),
suppose, for example, that the matrix of T is obtained by adding row two of the identity
matrix to row one. Then

\[ T(x_1, x_2, x_3, \ldots, x_d) = (x_1 + x_2, x_2, x_3, \ldots, x_d), \]

hence, by Fubini’s theorem and translation invariance,

\begin{align*}
\lambda^d\bigl(T(E)\bigr) &= \int 1_{T(E)}(x)\,dx = \int 1_E(x_1 - x_2, x_2, \ldots, x_d)\,dx \\
&= \int\!\!\int\cdots\int 1_{I_1}(x_1 - x_2)\,1_{I_2}(x_2)\cdots 1_{I_d}(x_d)\,dx_d\cdots dx_2\,dx_1 \\
&= |I_d|\cdots|I_3| \int 1_{I_2}(x_2)\int 1_{I_1}(x_1 - x_2)\,dx_1\,dx_2 \\
&= |I_d|\cdots|I_3|\,|I_2|\,|I_1| \\
&= \lambda^d(E).
\end{align*}

Since det T = 1, (A.4) holds in case (c). Therefore (A.4) holds for all nonsingular T and all
bounded intervals E.
Now let I be a fixed bounded interval and let GI denote the collection of all E ∈ B(Rd )
for which
λd (T (E ∩ I)) = | det T |λd (E ∩ I). (†)
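By the first part of the proof, $G_I$ contains the collection $\mathcal{I}$ of all intervals of $\mathbb{R}^d$. We show that
$G_I$ is a λ-system (see 1.5). Let $A, B \in G_I$ with $A \subseteq B$, and set $C = A \cap I$ and $D = B \cap I$.
Then $(B \setminus A) \cap I = D \setminus C$ and
\[ \lambda^d\bigl(T(D\setminus C)\bigr) = \lambda^d\bigl(T(D)\bigr) - \lambda^d\bigl(T(C)\bigr) = |\det T|\,\bigl(\lambda^d(D) - \lambda^d(C)\bigr) = |\det T|\,\lambda^d(D\setminus C), \]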

hence B \ A ∈ GI . Now let Ak ∈ GI , Ak ↑ A. Letting k → +∞ in

λd (T (Ak ∩ I)) = | det T |λd (Ak ∩ I)

we see that A ∈ GI . Therefore, GI is a λ-system. By Dynkin’s theorem (1.2.6), GI contains


σ(I) = B(Rd ). Thus (†) holds for every E ∈ B(Rd ). Taking a sequence of bounded intervals
I in (†) increasing to Rd we obtain (A.4).
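The three elementary cases of the proof can be illustrated in the plane with a short Python sketch. It maps the unit square under a row interchange, a row scaling, and a row addition, computes the area of each image polygon by the shoelace formula, and compares it with |det T|. The helper names and the sample matrices are illustrative assumptions; the check covers only d = 2 and a single interval E.

```python
def shoelace_area(vertices):
    # Area of a simple polygon given by its vertices in order.
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def apply(matrix, point):
    # Apply a 2x2 matrix ((a, b), (c, d)) to a point (x, y).
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

def det2(matrix):
    (a, b), (c, d) = matrix
    return a * d - b * c

def matmul(m1, m2):
    # Product m1 * m2 of two 2x2 matrices.
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def image_area(matrix, polygon):
    # A linear map sends a polygon to the polygon on the image vertices.
    return shoelace_area([apply(matrix, p) for p in polygon])

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # area 1

swap = ((0.0, 1.0), (1.0, 0.0))     # case (a): interchange of two rows
scale = ((-2.5, 0.0), (0.0, 1.0))   # case (b): a row multiplied by -2.5
shear = ((1.0, 1.0), (0.0, 1.0))    # case (c): row two added to row one
composite = matmul(shear, matmul(scale, swap))
```

Calling `image_area(T, unit_square)` for each of the four matrices should agree with `abs(det2(T))`, in line with (A.4) and with the multiplicativity step used to reduce to elementary transformations.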

For the next lemma, recall that dfx : Rd → Rd denotes the differential of a function
f : U → Rd at x, that is, the linear operator whose matrix is the Jacobian matrix of f
evaluated at x.
A.0.3 Lemma. Let $f : U \to \mathbb{R}^d$ be $C^1$ and let $K \subseteq U$ be compact and convex. Then
$M := \sup_{z\in K}\|df_z\| < \infty$ and $\|f(x) - f(y)\| \le M\|x - y\|$ for all $x, y \in K$.

Proof. Since $z \mapsto df_z$ is continuous and $K$ is compact, $M < \infty$. Let $x, y \in K$ and $u \in \mathbb{R}^d$
and set $\alpha(t) := tx + (1 - t)y$. By the mean value theorem applied to the scalar function
$g := u \cdot (f \circ \alpha)$, there exist $t \in (0, 1)$ and a point $z = \alpha(t) \in [x : y] \subseteq K$ such that
\[ u\cdot\bigl(f(x) - f(y)\bigr) = g(1) - g(0) = g'(t) = u\cdot df_z(x - y). \]

Taking $u = f(x) - f(y)$ and using the CBS and operator norm inequalities, we have
\[ \|f(x) - f(y)\|^2 = \bigl(f(x) - f(y)\bigr)\cdot df_z(x - y) \le M\,\|f(x) - f(y)\|\,\|x - y\|, \]
which gives the asserted inequality.

For the remaining lemmas, we use the following terminology and notation: The cube with
center y ∈ Rd and edge r > 0 is the half-closed interval

\[ Q = Q_r(y) := \{x \in \mathbb{R}^d : y_j - r/2 < x_j \le y_j + r/2,\ j = 1, \ldots, d\}. \]

Note that $\lambda^d(Q) = r^d$ and that the diameter of $Q$ is $r\sqrt{d}$. Thus

\[ B_{r/2}(y) \subseteq Q_r(y) \subseteq C_{r\sqrt{d}/2}(y). \tag{A.5} \]

A.0.4 Lemma. Let $\psi$ be $C^1$ on $U$, $Q$ a cube contained in $U$, and $I_d$ the identity operator
on $\mathbb{R}^d$. If $\|d\psi_x - I_d\| \le c$ for all $x \in Q$, then $\lambda^d\bigl(\psi(Q)\bigr) \le [(1 + c)d]^d\,\lambda^d(Q)$.
Proof. Let $\tilde\psi(x) = \psi(x) - x$, so that $d\tilde\psi_x = d\psi_x - I_d$. By A.0.3, for suitable $c > 0$,

\[ \|\tilde\psi(x) - \tilde\psi(y)\| \le c\,\|x - y\| \quad\text{for all } x, y \in Q. \]

Thus, if $Q$ has center $x_0$ and edge $r$, then recalling (A.5) we have for all $x \in Q$,

\[ \|\psi(x) - \psi(x_0)\| \le \|\tilde\psi(x) - \tilde\psi(x_0)\| + \|x - x_0\| \le (c + 1)\|x - x_0\| \le \tfrac12(c + 1)\,r\sqrt{d}. \]

Thus $\psi(Q)$ is contained in the closed ball $C$ with center $\psi(x_0)$ and radius $\tfrac12(c + 1)r\sqrt{d}$. Since
$C$ is contained in the cube with center $\psi(x_0)$ and edge $(c + 1)dr$, we have

\[ \lambda^d\bigl(\psi(Q)\bigr) \le [(c + 1)dr]^d = [(c + 1)d]^d\,\lambda^d(Q). \]

We call a finite collection Qr of pairwise disjoint cubes with edge r that covers a subset A
of Rd a paving of A. Pavings Qr = {Qr (xj ) : 1 ≤ j ≤ m} and Qs = {Qs (xj ) : 1 ≤ j ≤ m}
with the same centers are said to be concentric. Clearly, any bounded set has a paving Qr
with arbitrarily small r.
A.0.5 Lemma. Let $K \subseteq U$ be compact. Then, for all sufficiently small $\delta$ and each $0 < r < \delta$,
there exists a compact set $K_\delta$ and a paving $\mathcal{Q}_r$ of $K$ with $K \subseteq \bigcup\mathcal{Q}_r \subseteq K_\delta \subseteq U$.

Proof. Since $K$ is compact and $U^c$ is closed, $d(U^c, K) > 0$. For $0 < \delta < d(U^c, K)/\sqrt{d}$, let

\[ K_\delta = \{x : d(x, K) \le \delta\sqrt{d}\}. \]

Then $K_\delta$ is compact and $K \subseteq K_\delta \subseteq U$. Let $0 < r < \delta$ and let $Q$ be a cube with edge $r$. If
$x \in Q \cap K$ and $y \in Q \cap K_\delta^c$, then
\[ \delta\sqrt{d} < d(y, K) \le |x - y| \le r\sqrt{d}. \]

Therefore, if $r < \delta$ and $Q \cap K \ne \emptyset$, then $Q \cap K_\delta^c = \emptyset$, that is, $Q \subseteq K_\delta$. Since $K$ is bounded,
there exists a paving $\mathcal{Q}_r$ of $K$. Removing those members of $\mathcal{Q}_r$ that do not meet $K$ produces
a paving of $K$ contained in $K_\delta$.

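A.0.6 Lemma. Let $\psi : U \to \mathbb{R}^d$ be $C^1$ on $U$ and let $K \subseteq U$ be compact. Then for each
$\varepsilon > 0$ there exists $\delta > 0$, a compact set $K_\delta$ with $K \subseteq K_\delta \subseteq U$, and concentric pavings $\mathcal{Q}_r$,
$\mathcal{Q}_{dr}$ of $K$ contained in $K_\delta$ with arbitrarily small $r$ such that for any $Q_r(y) \in \mathcal{Q}_r$,
\[ \lambda^d\bigl(\varphi(Q_r(y))\bigr) \le (1 + \varepsilon)^d\,|J_\varphi(y)|\,\lambda^d\bigl(Q_{dr}(y)\bigr). \tag{A.6} \]

Moreover, $\delta$ may be chosen so that

\[ \int_{K_\delta} |J_\varphi(x)|\,dx < \int_K |J_\varphi(x)|\,dx + \varepsilon. \tag{A.7} \]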

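Proof. Let $M = \sup\{\|(d\varphi_y)^{-1}\| : y \in K_\delta\}$, where $K_\delta$ is chosen as in A.0.5. For $x, y \in U$
define
\[ \psi^y(x) = (d\varphi_y)^{-1}\bigl(\varphi(x) - \varphi(y)\bigr) = (d\varphi_y)^{-1}\bigl(\varphi(x)\bigr) - (d\varphi_y)^{-1}\bigl(\varphi(y)\bigr). \tag{a} \]
Since $(d\varphi_y)^{-1}$ is linear, by the chain rule

\[ d(\psi^y)_x = (d\varphi_y)^{-1}\circ d\varphi_x. \]

Thus for all $x \in U$, $y \in K_\delta$, and $z \in \mathbb{R}^d$,

\[ \|d(\psi^y)_x(z) - z\| = \bigl\|(d\varphi_y)^{-1}\bigl(d\varphi_x(z) - d\varphi_y(z)\bigr)\bigr\| \le M\,\|d\varphi_x - d\varphi_y\|\,\|z\| \]

and so, by definition of the operator norm,

\[ \|d(\psi^y)_x - I_d\| \le M\,\|d\varphi_x - d\varphi_y\|. \tag{b} \]

By the uniform continuity of $d\varphi$ on $K_\delta$ we may choose $0 < \delta_1 < \delta$ such that

\[ \|d\varphi_x - d\varphi_y\| \le \varepsilon/M \quad \forall\, x, y \in K_\delta \text{ with } \|x - y\| < \delta_1\sqrt{d}. \tag{c} \]

Let $r < \delta_1/d$, and by A.0.5 let $\mathcal{Q}_r$, $\mathcal{Q}_{dr}$ be concentric pavings of $K$ contained in $K_\delta$. If
$x \in Q := Q_r(y) \in \mathcal{Q}_r$, then $\|x - y\| < r\sqrt{d} < \delta_1\sqrt{d}$, hence, by (b) and (c), $\|d(\psi^y)_x - I_d\| \le \varepsilon$.
Applying A.0.4 we have
\[ \lambda^d\bigl(\psi^y(Q)\bigr) \le [(1 + \varepsilon)d]^d\,\lambda^d(Q) = (1 + \varepsilon)^d\,\lambda^d\bigl(Q_{dr}(y)\bigr). \tag{d} \]
But by (a), $\psi^y(Q) = (d\varphi_y)^{-1}\bigl(\varphi(Q)\bigr) - (d\varphi_y)^{-1}\bigl(\varphi(y)\bigr)$, hence, by translation invariance
and A.0.2,
\[ \lambda^d\bigl(\psi^y(Q)\bigr) = \lambda^d\bigl((d\varphi_y)^{-1}(\varphi(Q))\bigr) = |J_\varphi(y)|^{-1}\,\lambda^d\bigl(\varphi(Q)\bigr). \tag{e} \]
Inequality (A.6) now follows from (d) and (e).
For (A.7), note that since $K_{1/n} \downarrow K$ and $\mu(A) := \int_A |J_\varphi|\,d\lambda^d$ is a measure on the Borel
sets, $\mu(K_{1/n}) \downarrow \mu(K)$. Thus there exists $n$ such that $\mu(K_{1/n}) < \mu(K) + \varepsilon$. Taking $\delta < 1/n$
completes the proof.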
A.0.7 Lemma. If $K \subseteq U$ is compact, then
\[ \lambda^d\bigl(\varphi(K)\bigr) \le \int_K |J_\varphi(y)|\,dy. \]

Proof. Let ε > 0 and choose δ > 0 as in A.0.6. By uniform continuity of Jϕ (x) on Kδ , there
exists δ1 < δ such that

|Jϕ (x) − Jϕ (y)| < ε ∀ x, y ∈ Kδ with kx − yk < δ1 .



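Choose pavings $\mathcal{Q}_r = \{Q_r(y)\}_y$ and $\mathcal{Q}_{dr} = \{Q_{dr}(y)\}_y$ as in A.0.6. For $x \in Q_{dr}(y)$ we have
$|J_\varphi(y)| \le |J_\varphi(x) - J_\varphi(y)| + |J_\varphi(x)| < \varepsilon + |J_\varphi(x)|$, hence, applying (A.6),
\[ (1 + \varepsilon)^{-d}\,\lambda^d\bigl(\varphi(Q_r(y))\bigr) \le |J_\varphi(y)|\,\lambda^d\bigl(Q_{dr}(y)\bigr) \le \int_{Q_{dr}(y)} \bigl(|J_\varphi(x)| + \varepsilon\bigr)\,dx. \]

Therefore,
\begin{align*}
(1 + \varepsilon)^{-d}\,\lambda^d\bigl(\varphi(K)\bigr) &\le \sum_y (1 + \varepsilon)^{-d}\,\lambda^d\bigl(\varphi(Q_r(y))\bigr) \le \int_{K_\delta} \bigl(|J_\varphi(x)| + \varepsilon\bigr)\,dx \\
&\le \int_K |J_\varphi(x)|\,dx + \varepsilon\bigl(1 + \lambda^d(K_\delta)\bigr),
\end{align*}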

the last inequality by (A.7). Letting ε → 0 gives the desired inequality.


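To prove (A.3), use regularity to obtain an increasing sequence of compact sets $K_n \subseteq E$
such that $\lambda^d(K_n) \uparrow \lambda^d(E)$. Then $\lambda^d\bigl(\varphi(K_n)\bigr) \uparrow \lambda^d\bigl(\varphi(E)\bigr)$, hence by A.0.7 we obtain
\[ \lambda^d\bigl(\varphi(E)\bigr) = \lim_n \lambda^d\bigl(\varphi(K_n)\bigr) \le \lim_n \int_{K_n} |J_\varphi(y)|\,dy \le \int_E |J_\varphi(y)|\,dy, \]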

as required. This completes the proof of the change of variables theorem for the case f Borel.
Now let f ≥ 0 be Lebesgue measurable on V . Then f = g on V \ E, where g ≥ 0 is Borel
measurable, E ⊆ V , and λd (E) = 0. By the first part of the proof,
\[ \int_V g(y)\,dy = \int_U (g\circ\varphi)(x)\,|J_\varphi(x)|\,dx. \]
But the left side equals $\int_V f(y)\,dy$, and since $f\circ\varphi = g\circ\varphi$ on $U \setminus \varphi^{-1}(E)$, the right side
equals $\int_U (f\circ\varphi)(x)\,|J_\varphi(x)|\,dx$ provided we can show that

λd (ϕ−1 (E)) = 0.

To verify this, suppose first that E is bounded. Then E ⊆ K for a compact interval K with
λd (K) arbitrarily small. Applying A.0.1 “in reverse,” we have
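\[ \int_U h\,d\lambda^d = \int_V (h\circ\varphi^{-1})\,|J_{\varphi^{-1}}|\,d\lambda^d \]

for Borel functions $h \ge 0$ on $U$. Taking $h = 1_{\varphi^{-1}(K)}$ yields

\[ \lambda^d\bigl(\varphi^{-1}(E)\bigr) \le \lambda^d\bigl(\varphi^{-1}(K)\bigr) = \int_V \bigl(1_{\varphi^{-1}(K)}\circ\varphi^{-1}\bigr)\,|J_{\varphi^{-1}}|\,d\lambda^d = \int_K |J_{\varphi^{-1}}|\,d\lambda^d. \]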

Since the right side may be made arbitrarily small, $\lambda^d\bigl(\varphi^{-1}(E)\bigr) = 0$. If $E$ is unbounded, take
a sequence of bounded sets $E_n$ of measure zero with $E_n \uparrow E$.
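A familiar instance of (A.1) is the passage to polar coordinates, which can be checked numerically. The Python sketch below is a minimal illustration under assumed choices: U = (0, 1) × (0, 2π), ϕ(r, θ) = (r cos θ, r sin θ), so that V is the open unit disk with a slit removed (a set of measure zero), and f(y) = exp(−|y|²). Both sides of (A.1) are approximated by midpoint rules; the grid sizes and function names are illustrative.

```python
import math

def f(y1, y2):
    # Assumed test integrand on the unit disk.
    return math.exp(-(y1 * y1 + y2 * y2))

def lhs_cartesian(n=400):
    # Left side of (A.1): midpoint rule over [-1, 1]^2, restricted to the
    # open unit disk (the removed slit has measure zero and is ignored).
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        y1 = -1.0 + (i + 0.5) * h
        for j in range(n):
            y2 = -1.0 + (j + 0.5) * h
            if y1 * y1 + y2 * y2 < 1.0:
                total += f(y1, y2)
    return total * h * h

def rhs_polar(n_r=200, n_theta=200):
    # Right side of (A.1): integrate (f o phi) * |J_phi| over U, where
    # J_phi(r, theta) = det [[cos t, -r sin t], [sin t, r cos t]] = r.
    hr = 1.0 / n_r
    ht = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_theta):
            theta = (j + 0.5) * ht
            y1, y2 = r * math.cos(theta), r * math.sin(theta)
            total += f(y1, y2) * abs(r)
    return total * hr * ht

# Closed-form value of the integral for this particular f: pi * (1 - 1/e).
EXACT = math.pi * (1.0 - math.exp(-1.0))
```

The two approximations should agree with each other and with `EXACT` up to discretization error; the Cartesian sum is the less accurate of the two because the grid only approximates the circular boundary.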
Appendix B
Separate and Joint Continuity

In this appendix we prove the following theorem, which is used in Chapter 17 to establish
joint continuity of multiplication in certain algebraic structures.
B.0.8 Theorem. Let X and Y be topological spaces with X locally compact or a complete
metric space and Y compact Hausdorff. If f : X × Y → C is bounded and separately
continuous, then there exists a dense Gδ subset A of X such that f is jointly continuous at
every point of A × Y .
The proof is based on the following lemmas. For these, we assume the hypotheses of
the theorem, except we allow X to be an arbitrary topological space. We shall need the
functions F : X → C(Y ) and G : X → R+ defined by

\[ F(x) = f(x, \cdot) \quad\text{and}\quad G(x) = \inf_U \sup\{\|F(x') - F(x'')\|_\infty : x', x'' \in U\}, \]

where the infimum is taken over all neighborhoods U of x.


B.0.9 Lemma. The function f is jointly continuous at every point of {x} × Y iff F is
norm continuous at x.
Proof. If $F$ is not norm continuous at $x$, then there exists an $\varepsilon > 0$ and nets $(y_\alpha)$ and $(x_\alpha)$
with $x_\alpha \to x$ such that $|f(x_\alpha, y_\alpha) - f(x, y_\alpha)| \ge 2\varepsilon$ for all $\alpha$. By compactness of $Y$, we may
assume that $y_\alpha \to y \in Y$. Then, eventually, $|f(x, y_\alpha) - f(x, y)| < \varepsilon$ and so

\[ |f(x_\alpha, y_\alpha) - f(x, y)| \ge |f(x_\alpha, y_\alpha) - f(x, y_\alpha)| - |f(x, y_\alpha) - f(x, y)| > \varepsilon. \]

Therefore, $f$ is not jointly continuous at $(x, y)$.


Now assume that F is norm continuous at x and let xα → x and yα → y. Then from

|f (xα , yα ) − f (x, y)| ≤ |f (xα , yα ) − f (x, yα )| + |f (x, yα ) − f (x, y)|


≤ ‖F(xα) − F(x)‖∞ + |f(x, yα) − f(x, y)|

we see that f (xα , yα ) → f (x, y).
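The role of norm continuity of F can be seen on a standard example that is not taken from the text: on X = Y = [−1, 1], the function f(x, y) = 2xy/(x² + y²) with f(0, 0) = 0 is bounded and separately continuous but not jointly continuous at the origin. The Python sketch below, with illustrative names and an assumed sampling grid, estimates ‖F(x) − F(0)‖∞ and shows that it stays near 1 as x → 0, so F is not norm continuous at 0, while F(x) → F(0) pointwise.

```python
def f(x, y):
    # Separately continuous on [-1, 1]^2, but not jointly continuous at (0, 0).
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2.0 * x * y / (x * x + y * y)

def sup_distance(x, x_ref=0.0, n=2001):
    # Estimate of ||F(x) - F(x_ref)||_inf, where F(x) = f(x, .), using a
    # uniform grid on Y = [-1, 1]. The point y = x is added to the grid
    # because |f(x, y)| attains its maximum over y there.
    ys = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys.append(x)
    return max(abs(f(x, y) - f(x_ref, y)) for y in ys)

def pointwise_distance(x, y, x_ref=0.0):
    # |F(x)(y) - F(x_ref)(y)| at a single fixed y.
    return abs(f(x, y) - f(x_ref, y))
```

For x = 0.1, 0.01, 0.001 the value `sup_distance(x)` remains close to 1, whereas `pointwise_distance(x, 0.5)` tends to 0. For this example f is jointly continuous at every point of ([−1, 1] \ {0}) × Y, and [−1, 1] \ {0} is a dense open (hence Gδ) subset of X, in agreement with B.0.8.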


The next lemma follows from 0.6.6 and 0.6.7.
B.0.10 Lemma. Define sets

Aε := {x ∈ X : G(x) < ε} and A := {x ∈ X : G(x) = 0}.

Then Aε is open and A is a Gδ . Moreover, G(x) = 0 iff F is continuous at x.


B.0.11 Lemma. If K ⊆ C(Y ) is norm-compact and r > 0, then the set

Kr := {x ∈ X : d(F (x), K) ≤ r}

is closed in X. Moreover, if X is a Baire space and G ≥ ε > 0 on X, then int Kr = ∅ for


r = ε/12.


Proof. Let $x_0 \in K_r^c$ and $r < s < t < d(F(x_0), K)$. For any $g \in K$, because $Y$ is compact
there exists $y_0 \in Y$ such that

\[ d(F(x_0), K) = \min\{\|F(x_0) - h\|_\infty : h \in K\} \le \|F(x_0) - g\|_\infty = |f(x_0, y_0) - g(y_0)|. \]

Therefore, $|f(x_0, y_0) - g(y_0)| > t$, hence, by separate continuity of $f$, there exists a neighborhood
$U_g$ of $x_0$ such that
\[ |f(x, y_0) - g(y_0)| > t \quad \forall\, x \in U_g. \]
Thus if $h$ is in the ball $B_{t-s}(g)$ in $C(Y)$ and $x \in U_g$, then

\[ \|F(x) - h\|_\infty \ge \|F(x) - g\|_\infty - \|g - h\|_\infty \ge |f(x, y_0) - g(y_0)| - \|g - h\|_\infty > s. \tag{†} \]
Now, by compactness of $K$, there exist $g_1, \ldots, g_n \in K$ such that $K \subseteq \bigcup_j B_{t-s}(g_j)$. Therefore,
by (†),
\[ \|F(x) - h\|_\infty > s > r \quad \forall\, h \in K \text{ and } x \in U := \bigcap_j U_{g_j}. \]

Taking the infimum over all $h$ shows that the neighborhood $U$ of $x_0$ is contained in $K_r^c$.
Therefore, $K_r$ is closed.
Now assume that $X$ is a Baire space and $G \ge \varepsilon$ on $X$. Since $K$ is compact, we may
cover $K$ with closed balls $C_s(g_1), \ldots, C_s(g_k)$, where $g_j \in K$ and $s = \varepsilon/4$. It follows that for
$r = \varepsilon/12$, $\{h \in C(Y) : d(h, K) \le r\} \subseteq \bigcup_{j=1}^k C_s(g_j)$ and so

\[ K_r \subseteq F^{-1}\bigl(\{h \in C(Y) : d(h, K) \le r\}\bigr) \subseteq F^{-1}\Bigl(\bigcup_{j=1}^k C_s(g_j)\Bigr). \]

By the first paragraph, $K_r$ is closed, as are the sets $F^{-1}\bigl(C_s(g_j)\bigr)$ (take $K = \{g_j\}$). Since $X$
is a Baire space, if $\operatorname{int} K_r \ne \emptyset$, then $U := \operatorname{int} F^{-1}\bigl(C_s(g_j)\bigr) \ne \emptyset$ for some $j$. It follows that
$\|F(x') - F(x'')\|_\infty \le 2s$ ($x', x'' \in U$) and so $G(x) \le 2s = \varepsilon/2$ ($x \in U$), contradicting the
hypothesis.
B.0.12 Lemma. Let (xn ) be a sequence in X such that every subsequence has a cluster
point in X. If x0 is a cluster point of (xn ), then F (x0 ) is in the norm-closed convex hull of
the set {F (xn ) : n ∈ N}.
Proof. We show first that the set S := {F (xn ) : n ∈ N} is relatively sequentially compact in
the topology p of pointwise convergence in C(Y ). To see this, let (gk ) be a sequence in S. If (gk )
has infinitely many distinct terms, then it has a subsequence that is in fact a subsequence
of (F (xn )). Since F is clearly p-continuous, the hypothesis on (xn ) implies that (gk ) has a
subsequence that p-converges to some g ∈ C(Y ). On the other hand, if (gk ) has only finitely
many distinct terms, then it has a constant subsequence, and the same conclusion holds.
By 14.1.4, S is relatively w-compact in C(Y ), hence the weak and pointwise closures of S
coincide. Since F (x0 ) is in the pointwise closure and since the norm and weak closures of
co S are the same, the conclusion of the lemma follows.
The proof of B.0.8 is based on the following “game” on a topological space X. The game
has two players, α and β. Player β starts the game by choosing a nonempty open set U1 .
Player α then chooses a nonempty open set V1 ⊆ U1 and a point x1 ∈ V1 . Next, player β
chooses a nonempty open set U2 ⊆ V1 . In general, move n of β is the choice of an open
set Un ⊆ Vn−1 , and α’s subsequent move n is the choice (Vn , xn ), where Vn is open and
xn ∈ Vn ⊆ Un . In this way we obtain two decreasing sequences (Un ) and (Vn ) of open
sets and a sequence $(x_n)$ of points in $X$. Player α wins the game (and defeats β) if every
subsequence of $(x_n)$ has a cluster point in the common intersection $\bigcap_{n=1}^\infty U_n = \bigcap_{n=1}^\infty V_n$.
A strategy for α is a rule that governs each of α’s moves based only on the immediately
preceding move of β. A winning strategy for α is a strategy that results in the defeat of
β no matter how β moves. A topological space X for which a winning strategy for α exists
is called α-favorable.
B.0.13 Proposition. (a) A complete metric space is α-favorable.
(b) A locally compact Hausdorff space is α-favorable.

(c) A nonempty open subset $X'$ of an α-favorable space $X$ is α-favorable.

(d) An α-favorable space X is a Baire space.


Proof. (a) If β makes the move $U_n$, α counters it with the move $(V_n := B_{r_n}(x_n), x_n)$, where
$C_{r_n}(x_n) \subseteq U_n$ and $r_n \downarrow 0$. By Cantor’s intersection theorem, $\bigcap_{n=1}^\infty V_n = \bigcap_{n=1}^\infty C_{r_n}(x_n)$
contains a point $x$ and $x_n \to x$. Therefore, $X$ is α-favorable.
(b) If β makes the move $U_n$, α counters it with the move $(V_n, x_n)$, where $\operatorname{cl} V_n \subseteq U_n$
is compact and $x_n \in V_n$ is arbitrary (0.12.3). The compactness of $\operatorname{cl} V_1$ implies that every
subsequence of $(x_n)$ has a cluster point in the common intersection $\bigcap_{n=1}^\infty V_n = \bigcap_{n=1}^\infty \operatorname{cl} V_n$.
Therefore, $X$ is α-favorable.
(c) On the nth move in the game on $X'$, player β chooses an open subset $U_n$ of $X'$. Since
$U_n$ is open in $X$, player α may (and does) counter by a move $(V_n, x_n)$ from the winning
strategy.
(d) If $X$ is not a Baire space, then there exists a sequence of open, dense subsets $W_n$ of $X$
such that $W := \bigcap_n W_n$ is not dense in $X$. Let $U$ be a nonempty open set that does not meet
$W$. The initial move $U_1 = U$ of β then defeats α. Indeed, no matter how α moves at stage
$n - 1$, β need only choose $U_n = V_{n-1} \cap W_{n-1}$, which results in $\bigcap_n U_n \subseteq U \cap W = \emptyset$.
We may now prove the following generalization of B.0.8.
B.0.14 Theorem. Let X and Y be topological spaces with X α-favorable and Y compact
Hausdorff. If f : X × Y → C is bounded and separately continuous, then there exists a dense
Gδ subset A of X such that f is jointly continuous at every point of A × Y .
Proof. Let A and Aε be the sets in B.0.10. By B.0.9 and B.0.10, it suffices to show that A
is dense in $X$. We assume that this is not the case and seek a contradiction.
Since $X$ is a Baire space (B.0.13(d)), some $A_\varepsilon$ is not dense in $X$. Thus $G \ge \varepsilon$ on the nonempty open
set $X' := (\operatorname{cl} A_\varepsilon)^c$. Since $X'$ is α-favorable (B.0.13(c)), we may as well assume that $G \ge \varepsilon$
on $X$. To deduce the desired contradiction, we start the game.
Assume that α plays according to the winning strategy. For the first move, β chooses
U1 = X and α makes a move (V1 , x1 ) (x1 ∈ V1 ⊆ U1 ) from the winning strategy. Now
consider the game immediately after the moves U1 , . . . , Un−1 and (V1 , x1 ), . . . , (Vn−1 , xn−1 ).
For the nth move, β applies B.0.11 to the compact set $K_n := \operatorname{co}\{F(x_1), \ldots, F(x_{n-1})\}$ to
obtain a set $K_{n,r} = \{x \in X : d(F(x), K_n) \le r\}$ with void interior and then chooses the
nonempty open set $U_n = V_{n-1} \cap K_{n,r}^c$. Thus, for all $n$, $d(F(x), K_n) \ge r$ on $U_n$. Now, since
α plays with a winning strategy, $(x_n)$ has a cluster point $x_0 \in \bigcap_n U_n$. By B.0.12, a convex
combination of members of $\{F(x_n) : n \in \mathbb{N}\}$ is within $r/2$ of $F(x_0)$. But then for some $n$,
$d(F(x_0), K_n) < r$, the desired contradiction.
B.0.15 Remark. Joint continuity results like those considered here go back at least to
Baire [3], who proved B.0.8 for X = Y = [0, 1]. More refined results were obtained much
later by Ellis [18], Namioka [35], and Lawson [31], [32]. For additional references see [41].
The treatment in this appendix via the notion of topological game follows Christensen [7],
where f is allowed more generally to take values in a pseudo-metric space. ♦
References

[1] R. Ash and C. Doleans-Dade, Probability and Measure Theory, 2nd Ed., Academic
Press, San Diego, 2000.
[2] G. Bachman, and L. Narici, Functional Analysis, Academic Press, New York, 1966.
[3] R. Baire, Sur les fonctions de variables réelles, Ann. di Mat. 3, 1–123, 1899.
[4] J. Berglund, H. Junghenn, and P. Milnes, Analysis on Semigroups: Function Spaces,
Compactifications, Representations, Wiley, New York, 1988.
[5] P. Billingsley, Probability and Measure, Wiley, New York, 1979.
[6] H. Brezis, Functional Analysis, Sobolev Spaces, and Partial Differential Equations,
Springer-Verlag, New York, 2011.
[7] J.P.R. Christensen, Joint continuity of separately continuous functions, Proc. Amer.
Math. Soc. 82, 455–461, 1981.
[8] J. Clarkson, Uniformly convex spaces, Trans. Amer. Math. Soc. 40, 415–420, 1936.
[9] J. Conway, A Course in Functional Analysis, Springer-Verlag, New York, 1990.

[10] K. deLeeuw, and I. Glicksberg, Applications of almost periodic compactifications, Acta
Math. 105, 63–97, 1961.
[11] K. deLeeuw, and I. Glicksberg, Almost periodic functions on semigroups, Acta Math.
105, 99–140, 1961.
[12] J. Diestel and J.J. Uhl, Vector Measures, Mathematical Surveys, 15, Amer. Math. Soc.,
Providence, 1977.
[13] J. Dugundji and A. Granas, Fixed Point Theory, Springer-Verlag, New York, 2003.
[14] N. Dunford and J.T. Schwartz, Linear Operators, Vol. I, Wiley-Interscience, New York,
1958.

[15] A. Dvoretzky, P. Erdős, and S. Kakutani, Nonincrease everywhere of the Brownian
motion process, Proc. 4th Berkeley Symposium on Math Stat. and Prob., Vol. II, 103–116,
1961.
[16] W. Eberlein, Abstract ergodic theorems and weak almost periodic functions, Trans. Amer.
Math. Soc. 94, 217–240, 1949.
[17] R. Ellis, Locally compact transformation groups, Duke Math Journal, Vol. 24, Number
2, 119–125, 1957.
[18] R. Ellis, A note on the continuity of the inverse, Proc. Amer. Math. Soc. 8, 372–373,
1957.


[19] L. Fejér, Beispiele stetiger Funktionen mit divergenter Fourierreihe, J. Reine Angew.
Math. 137, 1–5, 1910.
[20] G. Folland, Real Analysis. Modern Techniques and Their Applications, 2nd Ed. John
Wiley & Sons, New York, 1999.
[21] G. Folland, A Course in Abstract Harmonic Analysis, CRC Press, Boca Raton, 1995.
[22] P. Halmos, Lectures on Ergodic Theory, Chelsea, New York, 1956.
[23] P. Halmos, Naive Set Theory, Springer-Verlag, New York, 1994.
[24] H. Junghenn, Option Valuation: A First Course in Financial Mathematics, CRC Press,
Boca Raton, 2012.
[25] H. Junghenn, A Course in Real Analysis, CRC Press, Boca Raton, 2015.
[26] J. Kindler, A simple proof of the Daniell-Stone representation theorem, Amer. Math.
Monthly 90, 396–397, 1983.
[27] I. Kluvánek and G. Knowles, Vector measures and control systems, North-Holland
Mathematics Studies 20, North-Holland, New York, 1976.
[28] E. Kreyszig, Introduction to Functional Analysis with Applications, John Wiley & Sons,
New York, 1978.
[29] H. Kuo, Introduction to Stochastic Integration, Springer-Verlag, New York, 2006.
[30] S. Lang, Real and Functional Analysis, 3rd Ed., Springer-Verlag, New York, 1993.
[31] J. D. Lawson, Joint continuity in semitopological semigroups, Illinois J. Math 18,
275–285, 1974.
[32] J. D. Lawson, Additional notes on continuity in semitopological semigroups, Semigroup
Forum 12, 265–280, 1976.
[33] P. Lax, Functional Analysis, 3rd Ed., Wiley Interscience, John Wiley & Sons, 2002.
[34] L. Loomis, An Introduction to Abstract Harmonic Analysis, D. Van Nostrand, Princeton,
1953.
[35] I. Namioka, Separate continuity and joint continuity, Pacific J. Math, 51, 515–531,
1974.
[36] G. Pedersen, Analysis Now, Springer-Verlag, New York, 1995.
[37] R. Phelps, Lectures on Choquet’s Theorem, 2nd Ed., Lecture Notes in Mathematics
1757, Springer-Verlag, New York, 2001.
[38] I. Rana, An Introduction to Measure and Integration, 2nd Ed., Graduate Studies in
Mathematics Vol. 45, AMS, Providence, 2002.
[39] J. Ringrose, A note on uniformly convex spaces, J. London Math. Soc. 34, p.92, 1959.
[40] M. Rosenblum, On a theorem of Fuglede and Putnam, J. London Math. Soc. 33, 376–377,
1958.
[41] W. Ruppert, Compact Semitopological Semigroups: An Intrinsic Theory, Lecture Notes
in Mathematics 1079, Springer-Verlag, New York, 1984.

[42] S. Saeki, A proof of the existence of infinite product probability measures, Amer. Math.
Monthly Vol. 103, No. 8, 682-683, Oct. 1996.
[43] S. Shreve, Stochastic Calculus for Finance, Springer-Verlag, New York, 2004.
[44] I. Singer, Bases in Banach Spaces I, Springer-Verlag, Heidelberg, 1970.
[45] C. Swartz, An Introduction to Functional Analysis, Marcel Dekker, New York, 1992.

[46] M. Taylor, Measure Theory and Integration, Graduate Studies in Mathematics Vol. 76,
American Mathematical Society, Providence, 2006.
[47] S. Taylor, Exact asymptotic estimates of Brownian path variation, Duke Math. J. Vol.
39, No. 2, 219–241, 1972.

[48] F. Treves, Topological Vector Spaces, Distributions, and Kernels, Academic Press, New
York, 1967.
List of Symbols

Analysis on Groups and Semigroups


b b

Lt f , Rt f , 386; f ∗ g, 169,397; f , 397; f , 387; F(f ), f , 171,411; f , 172; F , F , 380; P(G),
e b b
402; G,b 411; µ ∗ ν, 178,400; µ b, 179; W AP (S), 424; AP (S), 429; SAP (S), 433; S W AP , 426;
AP SAP
S , 430; S , 435.

Convergence
T a.e. µ a.u. Lp v
T -limα xα = x, xα → x, 21; fn → f , fn → f , fn → f , 85; fn → f , 131; µn → µ, 191;
w w∗
xα → x, 257; xα → x, 262.

Functions
idX , ιA : A ,→ X, δi,j , 1A , x+ , x− , 5; Re z, Im z, z, |z|, sgn(z), xb, δx , 6; f + , f − , f1 ∨· · ·∨fn ,
f1 ∧ · · · ∧ fn , supn fn , inf n fn , limn fn , limn fn , Re f , Im f , f and |f |, 6; xα , 172; ∆(x), 392;
f (A), f −1 (B), 5.

Function Spaces
B(X), 16; Cb (X), C(X), 25; Cc (X), 34; C0 (X), 35; C k (U ), C ∞ (U ), Cck (U ), Cc∞ (U ), 36;
Lp (X, F, µ), 123; L∞ (X, F, µ), 126; L0 (X, F, µ), 245; BV (I), 160; AC(I), 164; S = S(Rd ),

175; A(D), 201; `p (N), 127; `p (Z), c00 , c0 , c, 200; CK (U ), 369; D(U ), 370; Lpk (U ), 380; P(G),
402; G, 411; W AP (S), 424; AP (S), 429; SAP (S), 433.
b

Measure
σ(A), ϕ(A), B(X), 45; OI , CI , HI , 46; B(R), 46; A1 × · · · × Ad , F1 ⊗ · · · ⊗ Fd , 46; (X, F, µ)
51; δx , µE , 52; (X, Fµ , µ), M(µ∗ ), 54; µE , 52; µ∗ , 56; M(µ∗ ), 56; λ, λd , 63; F/G, 75; T (µ),
hµ, h dµ, 96; µ ⊗ ν, 112,188; µ⊗ν, 189; µ1 ⊗ · · · ⊗ µd , 114; µi1 ⊗ · · · ⊗ µin , 189; ν ⊥ η, 140; µ+ ,
µ− , 140; |µ|, 140,144; µr , µi , 143; M (X, F), 146; ν  µ, 148; dµdν
, 149; D(µ; x, r), D(µ; x, r),
D µ, D µ, D µ, 154; VI,P (f ), VI (f ), 159; Tf , 161; Mra (X), 182.

Metric Spaces
d(x, y), (X, d), 10; Br (x), Cr (x), Sr (x), 11; int(E), cl(E), bd(E), 12,19; d(A, B), d(x, A),
d(E), 14.

Integration
R R R R R Rb
f dµ, 91; S(f, P), S(f, P), a f ,
R
f, f dµ, E
f (x) dµ(x), E
f (x)µ(dx), f dF , 89; E
Rb Rb
f , a f , 101; kPk, 102; S(f, P, ξ), 103; X×Y f (x, y) d(µ⊗ν)(x, y), Y X f (x, y) dµ(x) dν(y),
R R R
a RR
112; f (x1 , . . . , xd ) dµ1 (x1 ) . . . dµd (xd ), 114.


Normed and Locally Convex Spaces


0 0 ⊥ ⊥ 0
Br , Cr , S
Qr , 200; C1 , 208; X × Y, 214; X ⊕ Y, 215; X/Y, 216; T , 234 A , B, 234; A ,
0
B, 252; i∈I Xi , 252; S ⊥ , 279; T ∗ 287; S0 , S00 , 288; σ(x), r(x), ρ(x), 319; σ(A), 324.

Norms, Seminorms, and Related Concepts


k · k, 15; k·k∞ , 16; | · |, 3; k·kp , 123; k·k∞ , 126; pα,β , qα,n , 174; kT k, 207; pU , 243; pm,α , 244;
(· | ·), 274; kT k2 , 303; kT k1 , 309; tr T , 311; kf kk,p , (f | g)k 380.

Probability
(Ω, F, P ), 443; E(X), 443; V (X), σ(X), cov(X, Y ), φX , FX , fX , PX , 444; N∞P(X1 ,...,Xn ) ,
Q∞ PX1 ⊗N
446; · · · ⊗ P Xn N, 447; PX1 ∗ · · · ∗ PXn , 447; E(X|G), E(X|Y ), 448; n=1 Fn , 450;
∞ ∞
n=1 Ω n , n=1 F n , n=1 P n , 452; P X1 ⊗X2 ⊗··· , 452; Ω, F, (F n ), P , 464; (Xn , Fn ), 464;
Xτ , Fτ , 467; U[a,b]
n
, U[a,b] , 468; P(i1 ...in ) , 472; (S n , Fn ), 473; FI , 473; Ω, F, (Ft )t≥0 , P , 477;

(p) Rb
W = (W (t))t≥0 , 477; VP , 482; FW , 483; ∆W (tj ), 484; Iab (f ), a f (t) dW (t), 484,486; (St ),
(Vt ), 489.

Sets
S
A ∩ B, A, P(X), 1; An ↑ A, An ↓ A, 2; N, Z, Q, R, C, D, T, 2; Z+ , R+ , K, Rd , Cd , Kd , R,
K, 3; A∗ , 3; Y X , 4; A 4 B, 44; limn An , limn An , 44; span A, [a : b], co A, cobal A, 8; ker ϕ,
7; ker T , 9; V/U 10; ex K, 348.

Spaces of Linear Mappings


B(X, Y), 206; B(X), 208; X 0 , 208; BI(X × Y, Z), 208; X 00 , 221; B0 (X, Y), B0 (X), 236;
Xτ , Xw , 257; Xw∗ , 262; B00 (H), B0 (H), 296; B2 (H, K), 303; B1 (H, K), 309; D0 (U ), 370;
E0 (U ), 373; S0 (Rd ), 378.

Topological Spaces
N(x), 20; B(x), 20; Fσ , Gδ , 25; (X∞ , T ∞ ), 35; supp(f ), 34; K(f, ε), 35; Cx , 39; βS, 328.
Index

abelian group, 7 Black-Scholes option pricing formula, 491


absolute convergence, 16 Borel σ-field, 45
absolutely continuous, 148, 164 Borel functional calculus, 338
absolutely convergent trigonometric series, boundary, 12, 19
328 bounded
absolutely convex combination, 9 linear transformation, 18
adapted to a filtration, 464 sesquilinear form, 286
adjoint set, 246
of a map, 6 variation, 160
of an operator, 287 Brownian motion, 477
algebra, 10
algebraic direct sum, 215 C ∗ -algebra, 288, 315
almost all (a.a), 55 call option, 490
almost everywhere (a.e), 55 canonical mapping, 327
almost periodic cardinality, 6
compactification, 430 Cartesian product of a family, 2
function, 429 Cauchy product, 317
semigroup of operators, 437 Cauchy sequence, 12
vector, 437 Cavalieri’s principle, 115
almost surely (a.s.), 443 Cayley transform, 296
α-favorable space, 503 CBS inequality, 274
annihilator, 234, 252 chain, 4
antisymmetric relation, 3 character, 324, 411
approximate identity, 170, 400 character space, 324
associative operation, 7 circle group, 2
axiom of choice, 2, 4 closed
ball, 11
Baire σ-field, 184 convex balanced hull, 241
Banach convex hull, 241
algebra, 18 set, 11, 19
limit, 224 unit ball, 200
space, 15 closure, 12, 19
∗-algebra, 315 cluster point, 22
base for a topology, 19 cocountable, 45
basis, 232, 281 coefficient
Bessel’s inequality, 281 algebra, 438
bicommutant, 288, 318 of a representation, 409
bidual, 221 space, 441
bijection, 4 cofinite, 45
bijective, 4 common refinement, 101
bilateral sequence, 200 commutant, 288, 318
bilinear form, 208 commutative
bilinear mapping, 208 algebra, 10


group, 7 covariance, 444


semigroup, 7 cube, 497
compact cyclic vector, 405
operator, 236
set, 27 d-dimensional
support, 34, 373 interval, 3
complemented subspace, 222 Lebesgue measure, 63
complete random variable, 443
measure space, 54 volume, 63
metric space, 12 decreasing sequence of sets, 2
completely regular topological space, 27 defining sequence, 361
completion dense subspace, 19
of a metric space, 12 diagonalizable operator, 299
of a normed space, 16 dilation property, 96
component, 39 Dirac delta distribution, 370
composition of functions, 5 direct product of groups, 394
conditional expectation, 150, 448 Dirichlet kernel, 230
conjugate exponents, 124 discounted
connected topological space, 39 price process, 490
consistency conditions, 472 value process, 490
continuous discrete
at a point, 13, 23 metric, 11
function, 13, 23 topology, 19
linear transformation, 18 disjoint sets, 2
on a set, 13, 23 disjoint union, 2
continuous functional calculus, 333 disk algebra, 201
convergence dissipative vector, 439
of a net, 21 distance between sets, 14
almost everywhere, 85 distribution, 370
almost uniformly, 85 d-dimensional probability, 444
in Lp norm, 131 Bernoulli, 445
in measure, 85 binomial, 445
of a sequence, 12 exponential, 445
of a series, 16 finite dimensional, 472
vague, 191 function, 67, 69
weak, 257 geometric, 445
weak∗ , 262 joint, 444
convex normal, 445
combination, 9 of a random variable, 444
function, 136 of a sequence of random variables, 444,
hull, 8 452
set, 8 uniform, 445
convex balanced hull, 8 distributional derivative, 371
convolution, 169, 178, 319, 374, 397, 400 dual
convolution operator, 207 group, 413
coordinate difference operator, 69 of a map, 6
coordinate functionals, 233 of an operator, 234
countably space, 208, 246
additive, 50, 139
infinite, 6 eigenspace, 299
subadditive, 50 eigenvalue, 299

eigenvector, 299 characteristic, 444


equicontinuous, 31 concave, 136
equivalence class, 4 convex, 136
equivalence relation, 4 cumulative distribution, 444
equivalent measures, 153 Heaviside, 371
equivalent norms, 201 identity, 5
ergodic transformation, 460 indicator, 5
essential range, 128 integrable, 90
essential uniqueness, 391 inverse of, 5
Euclidean modular, 392
inner product, 276 probability mass, 444
metric, 11 radial, 120
norm, 3 rapidly decreasing, 175
space, 3 Schwartz, 175
evaluation functional, 6, 221 signum, 6
events up to time τ , 467 simple, 82, 361
eventually, 12, 21 step, 131
expectation, 443 strictly concave, 136
extended strictly convex, 136
Borel σ-field, 46 strongly almost periodic, 433
Borel set, 46 symmetric, 387
extended real number system, 3 total variation, 161
extreme weakly almost periodic, 424
point, 348
subset, 350 Gaussian density, 116
Gelfand topology, 324
F-measurable sets, 44 Gelfand transform, 327
field, 44 generalized Cantor set, 73
filtered probability space, 464, 477 generalized function, 370
filtration generated
continuous time, 477 σ-field, 45
discrete-time, 464 field, 45
reversed, 471 geometric Brownian motion process, 489
final space, 294 greatest lower bound, 4
final topology, 24
finite intersection property (f.i.p.), 27 Hölder continuous, 205
finite rank operator, 237 Hahn decomposition, 140
finitely half-space, 252
additive, 50, 139 Hamel basis, 9, 285
subadditive, 50 Hausdorff topological space, 19
first countable, 21 hedging strategy, 490
fixed point, 356 Hermitian form, 273
Fourier series, 229 Hilbert space, 275
Fourier transform, 171, 380, 411 Hilbert-Schmidt
Fourier-Stieltjes transform, 179 inner product, 304
Fréchet space, 244 integral operator, 307
frequently, 12, 21 norm, 303
function operator, 303
affine, 9 homeomorphism, 13, 23
almost periodic, 429 homomorphism
Cantor, 72 of algebras, 10, 315

of semigroups, 7 subspace, 221, 406


inverse
ideal, 10, 315 Fourier transform, 172
idempotent, 292, 432 of an element, 7, 316
identically distributed, 444 of function, 5
identity, 10 invertible element, 316
identity element, 7 invertible operator, 206
iid random variables, 470 involution, 315, 397
imaginary part of an operator, 290 irreducible, 406
inclusion map, 5 isometric isomorphism, 18
increasing sequence of sets, 2 isometry, 18
independent isomorphism
events, 446 of algebras, 10
increments, 477 of Hilbert spaces, 277
random variables, 446 of linear spaces, 9
trials, 446, 447 of normed spaces, 18
index sequence, 473 of semigroups, 7
indiscrete topology, 19 Ito step process, 484
infimum, 4
initial space, 294 Jacobian, 97
initial topology, 24 Jensen’s inequality, 138
injection, 4 jointly measurable, 78
injective, 4 Jordan decomposition, 140
inner product, 275
inner product norm, 275 kernel
inner product space, 275 of a linear transformation, 9
inner regularity, 181 of a homomorphism, 7
integrability on E, 91 Kolmogorov’s inequality, 455
integral, 89
Bochner, 361 L∞ norm, 126
Daniell-Stone, 193 λ-system, 47
Darboux, 101 least upper bound, 4
Dunford, 358 Lebesgue fundamental theorem of calculus,
Ito, 484, 486 166
Lebesgue, 91 Lebesgue measurable sets, 65
Lebesgue-Stieltjes, 91 Lebesgue points, 159
lower, 101 left
of a complex function, 91 coset, 7
of a nonnegative function, 90 coset space, 7
of a real function, 90 ideal, 432
of a simple function, 89 invariant mean, 428
over E, 91 regular representation, 404
Pettis, 358 translate, 386
Riemann, 103 translates, 423
upper, 101 translation invariant, 386
weak, 357 translation operator, 223, 423
integral operator, 207, 237 uniformly continuous, 386
interior, 12, 19 Legendre polynomial, 284
invariant, 288 lemma
event, 460, 462 Borel-Cantelli, 454
mean, 223, 428 Fatou’s, 109
Riemann-Lebesgue, 172 dual, 418
Urysohn’s, 26, 34 ergodic, 352
Zorn’s, 4 finite, 140
limit generated outer, 56
of a net, 21 Haar, 387
of a sequence, 12 Hahn decomposition of, 140
of Riemann sums, 103 image of, 96
of upper and lower sums, 102, 103 invariant, 352
limit infimum of sets, 44 Jordan decomposition of, 140
limit supremum of sets, 44 Lebesgue, 65
line segment, 8 Lebesgue-Stieltjes, 66
linear Lebesgue-Stieltjes signed, 154
combination, 8 left Haar, 387
functional, 9 left invariant, 387
space, 8 negative variation of, 140
transformation, 9 non-atomic, 355
local uniform convergence, 244 normalized Haar, 393
locally outer, 56
compact group, 387 polar decomposition of, 150
compact space, 33 positive variation of, 140
convex space, 241 probability, 51
convex topology, 241 product probability, 452
Lebesgue integrable, 370 Radon, 181
Riemann integrable, 106 real and imaginary parts of, 143
lower bound, 4 regular, 181
lower sum, 101 right Haar, 387
Lp norm, 123 right invariant, 387
Lp-process, 463 risk-neutral probability, 490
signed, 139
martingale signed Radon, 182
continuous-time, 483 space, 51
discrete-time, 464 spectral, 338
reversed, 471 support of, 183, 184
maximal total variation, 140
element, 4 total variation of, 144
ideal, 324 upper and lower derivates of, 155
ideal space, 325 with density, 96
mean, 223, 263, 443 measure preserving transformation, 459
measurable mesh of a partition, 102
function, 78 metric, 10
partition, 44 metric space, 10
rectangles, 46 metric topology, 19
space, 44 minimal
transformation, 75 element, 4
measure, 50 ideal, 432
algebra, 316, 401 left ideal, 432
complex, 143 right ideal, 432
complex conjugate of, 143 minimality property, 45
complex Radon, 182 Minkowski functional, 218, 243
d-dimensional Lebesgue, 63 monotone class, 50
derivative of, 155 monotone set function, 50
µ∗-measurable, 56 path of the process, 472
multi-index, 36 paving, 50, 497
multiplication operator, 207 π-system, 47
multiplicity of an eigenvalue, 301 positive
multistep property, 464 element, 321
mutually singular, 140 form, 273
linear functional, 184
natural filtration, 464, 477 operator, 291
neighborhood, 11, 20 positive definite, 401
neighborhood base, 20 positive type, 402
neighborhood system, 20 probability density, 444
net, 21 probability measure, 51
noncontracting, 356 process, 463
nonnegative set function, 50 product
norm, 15 σ-field, 450
norm topology, 200 of measure spaces, 112
normal of probability spaces, 452
element, 321 ordering, 21
operator, 289 σ-field, 46, 473
subgroup, 7 topology, 24
topological space, 26 vector space, 214, 217
normed algebra, 18 product rule for distributions, 372
normed linear space, 15 projection, 321
projection map, 214, 473
one-point compactification, 35 proper ideal, 315
one-to-one correspondence, 4
onto, 4 quadratic variation, 482
open quotient
ball, 11 map, 4, 10
cover, 27 norm, 216
function, 23 space, 10
set, 11, 19 topology, 25
unit ball, 200
unit disk, 2 radical, 326
operator norm, 207 Radon measure space, 181
orthogonal, 278 Radon-Nikodym derivative, 149
orthogonal complement, 279 random variable, 443
orthogonal projection, 292 rank one operator, 297
orthonormal, 281 real part of an operator, 290
orthonormal basis, 281 reduce an operator, 288
outer regularity, 181 refinement of a partition, 101
reflection invariance, 96
P-variation, 159 reflexive relation, 3
pth variation, 482 reflexive space, 267
parallelogram law, 274 relation, 3
Parseval’s formula, 420 antisymmetric, 3
Parseval’s identity, 282 equivalence, 4
partial differential operator, 36 reflexive, 3
partial isometry, 294 symmetric, 3
partial order, 4 transitive, 3
partition of unity, 37 relative topology, 21
relatively Cantor ternary, 71
closed, 21 closed, 11
compact, 27 compact, 27
open, 21 convex, 8
replicating portfolio, 490 countable, 6
representation, 403 countably infinite, 6
resolvent function, 323 cylinder, 450, 473
reversed martingale, 471 dense, 12
Riemann sum, 103 diameter of, 14
right directed, 21
ideal, 432 extended Borel, 46
invariant mean, 428 finite, 6
translate, 386 Fσ, 25
translates, 423 Gδ, 25, 184
translation invariant, 386 image of, 4
translation operator, 223, 423 inclusion directed, 21
uniformly continuous, 386 invariant, 356
ring, 58 minimal invariant, 356
µ-null, 55
sample space, 443 open, 11
Schauder basis, 232 partially ordered, 4
Schwartz space, 175, 245 partition of, 2
second countable space, 19 polar of, 252
self-adjoint power, 1
element, 321 preimage, 4
form, 273 rectangular cylinder, 450
operator, 289 resolvent, 319
self-financing portfolio, 489 reverse inclusion directed, 21
semi-inner-product space, 274 self-adjoint, 298
semidirect product, 394 test, 57
semigroup, 7 totally ordered, 4
semigroup of operators, 221, 437 uncountable, 6
seminorm, 15 universal, 1
seminorm topology, 242 set function, 50
seminormed space, 242 shrink to x nicely, 159
semiring, 58 σ-field, 44
semitopological simplex, 115
group, 423 space average, 460
semigroup, 423 span, 8
separable, 12, 19 spectral radius, 319
separable normed space, 201 spectral resolution of the identity, 338
separately measurable, 78 spectrum, 319, 324
separating seminorms, 243 sphere, 11
separation of points, 32 standard
sequentially compact, 30 deviation, 444
sesquilinear form, 361
form, 273 metric, 244
functional, 273 normal density, 116
set representation, 82
balanced, 8 state space, 472
Borel, 45 stationary process, 462
stochastic differential equation, 489 test functions, 370
stochastic process theorem
d-dimensional, 472 Arzelà–Ascoli, 31
continuous time, 472 Baire category, 15, 33
discrete time, 463 Banach closed range, 265
Stone vector lattice, 193 Banach isomorphism, 227
Stone-Čech compactification, 327 Banach-Alaoglu, 263
stopped Banach-Steinhaus, 225
process, 467 Banach-Stone isomorphism, 354
random variable, 467 Beppo Levi, 93
stopping time, 466 bipolar, 252
strategy, 503 Birkhoff individual ergodic, 460
winning, 503 Bochner’s, 415
strict inductive system, 254 Cantor intersection, 14
strictly convex normed space, 203 Caratheodory, 56
strong operator topology, 403, 438 central limit, 458
strongly almost periodic change of variables, 97, 495
compactification, 435 Choquet, 365
function, 433 closed graph, 228
vector, 439 Daniell-Stone, 194
strongly analytic, 231 DeMoivre-Laplace, 459
strongly elliptic, 381 dominated convergence, 109
strongly measurable, 361 Dunford-Pettis, 345
subalgebra, 10 Dynkin π-λ, 48
subcover, 27 Eberlein-Šmulian, 342
subgroup, 7 Egoroff, 87
submartingale Fourier inversion, 173, 417
continuous-time, 483 Fredholm alternative, 239
discrete-time, 464 Fubini–Tonelli, 112, 114
subsemigroup, 7 Fuglede-Putnam, 334
subspace of a linear space, 8 Gelfand representation, 326, 330
supermartingale Gelfand-Mazur, 320
continuous-time, 483 Gelfand-Raikov, 408
discrete-time, 464 Hahn-Banach, 218–220
support Hahn-Jordan, 140
of a distribution, 372, 373 Heine-Borel, 31
of a function, 34 image measure, 96
of a measure, 183, 184 Kakutani fixed point, 356
supremum, 4 Kolmogorov extension, 474
surjection, 4 Kolmogorov zero-one law, 454
surjective, 4 Krein-Šmulian, 346
symmetric Krein-Milman, 350
difference, 44 Lax-Milgram, 287
neighborhood, 386 Lebesgue-Decomposition, 152
relation, 3 Lusin, 182
Lyapunov, 355
tagged partition, 103 martingale convergence, 469, 483
tail σ-field, 453 Mazur, 259, 347
tail events, 453 mean ergodic, 294
tempered distribution, 378 mean value for integrals, 100
tensor product of functions, 188 measure approximation, 61
measure completion, 54 topology, 19
measure differentiation, 157 total variation
measure extension, 60 of a function, 160
measure uniqueness, 62 of a measure, 140
measure zero criterion, 104 total variation norm, 146
Milman-Pettis, 271 totally bounded, 30
Minkowski’s inequality, 124, 125 trace, 1
Minkowski’s integral inequality, 126 class, 309
monotone class, 50 inner product, 276
monotone convergence, 108 norm, 309
Neumann series, 316 of a matrix, 276
open mapping, 226 of an operator, 311
optional stopping, 467 transitive relation, 3
orthogonal complement, 279 translation invariance, 96
partition of unity, 37, 38 triangle inequality, 15
Peter-Weyl, 410
Plancherel, 176, 419 unconditional convergence, 17
Poisson summation formula, 285 uniformly continuous, 14, 386
polar decomposition, 295 uniformly convex space, 269
polarization identity, 273 uniformly integrable (u.i.), 134
Pontrjagin duality, 421 unimodular, 392
product measure, 112, 114, 189 unital, 10
radial integration, 119 unitary
Radon-Nikodym, 149 element, 321
Radon-Riesz, 270 operator, 293
Rainwater, 366 representation, 404
Rayleigh, 290 space, 439
Riesz representation, 185, 280 unitization, 329
Ryll-Nardzewski fixed point, 357 unordered sum, 17
separation, 249, 250 upcrossings, 468
Silverman-Toeplitz, 231 upper bound, 4
Sobolev’s inequality, 382, 383 upper sum, 101
spectral, 302, 338 usual metric, 11
stationary ergodic, 463 usual topology, 19
Stone–Čech compactification, 328
Stone–Weierstrass, 32, 36, 353 vanish at infinity, 35, 300
strong law of large numbers, 457 variance, 444
Tietze extension, 27, 34 variation of Brownian motion, 482
Tychonoff, 29 vector space, 8
uniform boundedness, 225 vector topology, 241
Vitali, 132 Volterra operator, 309
Vitali-Hahn-Saks, 145
weak law of large numbers, 455
Wiener, 328
time average, 460
topological
direct sum, 215
group, 385
semigroup, 423
space, 19
vector space, 241
weak
basis, 260
derivative, 371
operator topology, 403
topology, 257
weak∗ topology, 262
weak∗-sequentially complete, 266
weakly
analytic, 231
Cauchy, 261
integrable, 357
measurable, 357
sequentially complete, 261
weakly almost periodic
compactification, 426
function, 424
semigroup of operators, 437
vector, 437
weakly continuous operator, 228
Wiener process, see Brownian motion
